What is NoSQL? — A Simple Intro to Scalable Database Systems
When we are first introduced to databases, we usually start off with a relational database like MySQL or Postgres. We have to sit down and think about the structure of our tables and how the entities in our database will relate to each other.
At times, we may even want to normalize our database structure. If you’re not familiar with the benefits of normalizing your database, check out this article. Essentially you get less redundancy of data and avoid “update anomalies”.
Storage Used to be Expensive
Back in the day, when the cost per gigabyte of storage was still high, it was extremely important for businesses to make sure that they optimized the crap out of their relational databases so they could save money. The mindset was to structure the data in a way that reduced data redundancy. Don’t store duplicates and the bill at the end of the month won’t be as high.
However, as we all know, the cost per one gigabyte of storage has dropped dramatically. I remember my dad paying $50 for an 8 megabyte PS2 memory chip in the early 2000s and now I can get a whole terabyte hard drive from Best Buy for nearly the same price.
This dramatic drop in price for storage made the action of carefully planning and normalizing your database a lot less important. From the perspective of a business or a C-suite executive, it was no longer data storage that was a big cost. The developers started becoming the biggest and primary cost in software development.
The Start of NoSQL
So in the late 2000s, as the cost of storage came down dramatically, NoSQL databases were created to take advantage of the abundance of storage.
NoSQL refers to “non”-SQL, or rather “non”-relational (it is also often read as “not only SQL”). Essentially, NoSQL databases store information in a different way than relational databases such as MySQL or Postgres.
NoSQL databases allow developers to store huge amounts of unstructured data, which provides a lot of flexibility. You no longer have to fix the full structure of your database up front.
Additionally, the Agile methodology started to become the industry standard around the same time and developers started to recognize the need to be able to quickly adapt to ever-changing requirements. The flexibility that NoSQL databases provided helped fill that need.
Diving Deeper into NoSQL
So NoSQL databases like MongoDB are non-tabular, essentially meaning that they do not use tables in the same way that relational databases would use them.
Given this, NoSQL databases allow you to use flexible schemas. You don’t have a rigid structure that you need to comply with.
Furthermore, because of the name “non-relational”, there’s a misconception that NoSQL databases cannot store relationships between data. NoSQL databases can store relationships between data. They just store them differently than relational databases would.
Finally, there are many types of NoSQL databases. Each type differs in its data model and how it structures data. These data models allow for related data to be nested within a single data structure which covers the need to store relationships between data.
Let’s take a closer look at each type of NoSQL database.
1. Document-Based NoSQL Databases
Data is stored in documents similar to JSON objects. Each document contains pairs of fields and values. The values can be anything from strings, numbers, booleans, arrays or objects.
The nice benefit of using a document-based NoSQL database is that the objects’ structures in each document will typically align with the structure of the objects that developers are working with within the code. This makes developers much more productive.
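To make the point about documents lining up with objects in code more concrete, here is a minimal sketch in plain Python. The `User` and `Address` classes and their fields are made up for illustration, and no real database driver is involved. The only point is that a nested object converts directly into a JSON-like document.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Address:
    # Hypothetical nested object, used only for illustration.
    city: str
    country: str


@dataclass
class User:
    # Hypothetical application object with string, number, boolean,
    # array and nested-object fields.
    name: str
    age: int
    is_active: bool
    tags: List[str] = field(default_factory=list)
    address: Optional[Address] = None


user = User("Ada", 36, True, ["admin", "beta"], Address("London", "UK"))

# asdict() walks nested dataclasses, so the result is a plain dict
# shaped like the document a document store would hold.
document = asdict(user)
print(json.dumps(document, indent=2))
```

The related address is nested inside the user document rather than living in a separate table, which is one way a document model can keep related data together.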
Furthermore, they can scale out horizontally to store and work with large data volumes by spreading data across multiple machines. Creating a few replicas of your database for fault tolerance is not a problem either.
Document-based databases also support a variety of field value types. Combined with powerful query languages, this means they can be applied to a wide array of use cases and can be considered general-purpose databases.
MongoDB is an example of a document-based NoSQL database.
2. Key-Value-Based NoSQL Databases
This is a much simpler type of NoSQL database where each item consists of a key and a value.
If you need to store large amounts of data but you don’t really need to perform a bunch of complex queries on the data, you should consider key-value-based NoSQL databases. For example, if you wanted to store preferences for a user or keep an activity log for a user, a key-value NoSQL database will be helpful.
Redis and DynamoDB are two popular key-value-based NoSQL databases.
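As a rough illustration of the key-value model, here is a tiny in-memory sketch in Python. The `KeyValueStore` class and the `user:42:prefs` key format are assumptions for this example. This is not the API of Redis or DynamoDB, just the basic idea of looking values up by key.

```python
from typing import Any, Dict


class KeyValueStore:
    """Hypothetical in-memory key-value store, for illustration only."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Writing an existing key simply overwrites the old value.
        self._items[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Lookups are by exact key only; there is no query language.
        return self._items.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._items:
            del self._items[key]
            return True
        return False


store = KeyValueStore()
store.put("user:42:prefs", {"theme": "dark", "language": "en"})
prefs = store.get("user:42:prefs")
print(prefs["theme"])
```

Notice that the only way to reach the preferences is through the exact key, which is why this model suits simple lookups better than complex queries.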
3. Wide-Column-Based NoSQL Stores
Wide-column-based NoSQL stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Essentially, each row can have its own structure.
These stores keep data in tables, rows and dynamic columns. Columns are grouped into column families, and a store such as HBase keeps each family in its own files on disk. Many consider wide-column databases to be “two-dimensional” key-value databases.
HBase and Apache Cassandra are two examples of wide-column-based NoSQL stores.
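The “two-dimensional” key-value idea can be sketched with a dictionary of dictionaries in Python. The row keys, column names and the `get_cell` helper below are made up for illustration, and this says nothing about how HBase or Cassandra lay data out internally.

```python
from typing import Dict, Optional

# Outer key: row key. Inner key: column name. Rows do not need to
# share the same set of columns.
table: Dict[str, Dict[str, str]] = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Linus", "phone": "555-0100", "city": "Helsinki"},
}


def get_cell(data: Dict[str, Dict[str, str]], row_key: str, column: str) -> Optional[str]:
    # A cell is addressed by two keys: the row key, then the column name.
    return data.get(row_key, {}).get(column)


print(get_cell(table, "user:1", "email"))
print(get_cell(table, "user:1", "phone"))  # this row has no such column
```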
4. Graph-Based NoSQL Databases
Graph-based NoSQL databases will store information in nodes and edges. Nodes store information about an object such as a person, a place or a thing. Edges will store information about the relationships between the nodes.
Graph-based NoSQL databases work really well in use cases where you need to traverse relationships between multiple nodes to look for patterns. For example, when social networks such as Facebook or LinkedIn want to suggest new friends or connections through mutual friends or connections, a graph-based NoSQL database can help in quickly finding these potential relationships.
Neo4j and JanusGraph are examples of graph-based NoSQL databases.
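The friend-suggestion idea above can be sketched with a small adjacency map in Python. The names and the `suggest_friends` function are hypothetical, and a real graph database would run this kind of traversal through its own query language. The sketch only shows the traversal itself: look at friends of friends and count the mutual connections.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Nodes are people; an edge means "is friends with".
friends: Dict[str, Set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def suggest_friends(graph: Dict[str, Set[str]], person: str) -> List[Tuple[str, int]]:
    """Rank people who are not yet friends by number of mutual friends."""
    mutual_counts: Counter = Counter()
    for friend in graph.get(person, set()):
        for candidate in graph.get(friend, set()):
            # Skip the person themselves and anyone already connected.
            if candidate != person and candidate not in graph[person]:
                mutual_counts[candidate] += 1
    # Most mutual friends first, names alphabetically on ties.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))


print(suggest_friends(friends, "alice"))  # [('dave', 2), ('erin', 1)]
```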
Reasons to Use NoSQL
There are three main reasons to use NoSQL: flexibility, scalability and high performance. Let’s go over each one.
1. Flexibility
NoSQL databases will generally allow you to get away with not defining a rigid structure for your data. Being able to use flexible schemas enables faster and more iterative development.
Given how flexible NoSQL databases can be, they are the ideal candidate for semi-structured and unstructured data.
2. Scalability
NoSQL databases were made with horizontal scaling in mind. They are generally designed to scale out across distributed clusters of computers instead of scaling vertically, which means moving to a more powerful and more expensive server.
Some cloud providers like Amazon Web Services will handle all of these operations behind the scenes for you as well. For example, with DynamoDB, you can easily scale it out on AWS to handle more traffic.
Being able to scale out horizontally is an extremely attractive benefit and one of the main reasons NoSQL databases are used.
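One simple way to picture scaling out is hash-based partitioning, where each key is routed to one of several nodes. The sketch below is an assumption-heavy toy in Python: the `ShardedStore` class is hypothetical, the “nodes” are just dictionaries in one process, and real systems typically use more careful schemes such as consistent hashing or managed partitioning.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Hypothetical store that spreads keys across several nodes."""

    def __init__(self, node_count: int) -> None:
        # Each dict stands in for a separate machine.
        self.nodes: List[Dict[str, str]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> int:
        # A stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key: str, value: str) -> None:
        self.nodes[self._node_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.nodes[self._node_for(key)].get(key)


sharded = ShardedStore(node_count=3)
for i in range(100):
    sharded.put(f"user:{i}", f"profile-{i}")

print(sharded.get("user:7"))
print([len(node) for node in sharded.nodes])  # how the keys were spread
```

Adding capacity here means adding more nodes rather than buying a bigger one, although with this naive modulo scheme changing the node count would move most keys.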
3. High Performance
NoSQL databases are generally optimized for a specific data model (document-based, key-value-based, etc.) and for specific access patterns, which enables higher performance than relational databases for similar functions.
For example, reading data from a key-value-based NoSQL database can be extremely fast. This can allow for a better user experience.
Disadvantages of NoSQL
As there are benefits, there are also downsides to using NoSQL.
1. Different Syntax Standards across NoSQL Databases
Each NoSQL database type can have its own syntax for querying and managing data. Migrating from one database system to another can mean having to relearn how to work with the new NoSQL database. In contrast, SQL generally has a consistent syntax across each SQL-based database.
2. Data Responsibilities Transition to Developers
Next, the lack of a rigid database schema and constraints gives more flexibility. However, it also removes the data integrity safeguards that relational databases typically provide.
So the responsibility falls on the developer to make sure that the structure of the data is sound. With a relational database, this is usually handled by the schema itself and a database administrator.
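In practice, this often means checking documents in application code before they are written. Here is a minimal sketch in Python. The `REQUIRED_FIELDS` mapping and the `validate_user` function are hypothetical, and many teams would reach for a validation library or the database’s own schema validation features instead.

```python
from typing import Any, Dict, List

# Hypothetical rules: the fields every user document must have,
# along with the type each one is expected to be.
REQUIRED_FIELDS = {"name": str, "email": str, "age": int}


def validate_user(document: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the document passed."""
    errors: List[str] = []
    for field_name, expected_type in REQUIRED_FIELDS.items():
        if field_name not in document:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(document[field_name], expected_type):
            errors.append(
                f"wrong type for {field_name}: expected {expected_type.__name__}"
            )
    return errors


print(validate_user({"name": "Ada", "email": "ada@example.com", "age": 36}))
print(validate_user({"name": "Ada", "age": "36"}))
```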
3. Eventual Consistency Model
Many NoSQL databases use the eventual consistency model. This model guarantees that when an update is made to a database in a distributed system, that change will eventually be reflected in all of the database nodes in the distributed system.
So at times, the data across a distributed network of NoSQL database nodes may not be the same. One node may have been updated while another has yet to apply the same change. This means that NoSQL databases are generally not the best solution when you require immediate consistency of some data, like bank transactions.
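The window in which two nodes disagree can be shown with a toy simulation in Python. Everything here is an assumption for illustration: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, replication is triggered by hand, and real databases replicate over a network with far more machinery.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class Replica:
    """Hypothetical database node holding its own copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Any] = {}


class EventuallyConsistentStore:
    def __init__(self) -> None:
        self.primary = Replica("primary")
        self.secondary = Replica("secondary")
        # Writes that the secondary has not applied yet.
        self._pending: Deque[Tuple[str, Any]] = deque()

    def write(self, key: str, value: Any) -> None:
        # The write is acknowledged once the primary has it.
        self.primary.data[key] = value
        self._pending.append((key, value))

    def replicate(self) -> None:
        # Later, the queued changes are applied to the secondary in order.
        while self._pending:
            key, value = self._pending.popleft()
            self.secondary.data[key] = value


store = EventuallyConsistentStore()
store.write("balance", 100)

stale = store.secondary.data.get("balance")  # secondary has not caught up
store.replicate()
fresh = store.secondary.data.get("balance")  # now both nodes agree

print(stale, fresh)
```

A read that hits the secondary before `replicate()` runs sees old data, which is exactly the situation that makes this model a poor fit for something like an account balance.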
4. Newer and Less Established
Relational databases like MySQL have been around for a while now and companies have come up with some industry standards on how to use them.
On the other hand, NoSQL databases are much newer, so there is not yet a comprehensive set of industry best practices and standards. Research is still being done and improvements are constantly being made.
Conclusion
NoSQL databases, unlike relational databases, allow you to store and manage unstructured data in a practical manner. Furthermore, NoSQL databases are built with scalability in mind.
If you’re looking to store large amounts of unstructured data or want the flexibility to quickly build a prototype of a product, I would recommend looking into NoSQL.