How to Choose the Right Database for Your Data Strategy

Posted by tina on August 16, 2016 Data Governance

Choosing the right database is a long term decision that will impact your business. You don’t want to start out on the path of implementing a database that your business will outgrow in a few months. And, rethinking your entire database strategy down the road isn’t ideal.

Databases are more than columns and rows. Whether you’re a scaling SaaS business or a Fortune 500 company, you’ll need a way to store all your data from email marketing lists to user IDs.

Thankfully, today’s database ecosystem is rich in mature options. Many, like PostgreSQL, have been on the market for 20 years and are still growing in popularity. Of all the available options, how do you choose the right database? Whether you’re starting from scratch or planning to migrate from one database to another, we’ve compiled a high-level database framework for you to consider.

In this blog post, we’ll explain how to go about choosing the right database for your company’s data strategy. 

1. Define Your Data Strategy

Before diving into operational databases, it’s crucial to define a data strategy that aligns with your business. Standard databases are used for online transactional processing (OLTP) and are optimized to record data from various sources. Meaning, your mobile e-commerce application will record the items left in a shopping cart by user 12345 and load that transactional data into your database.

So, understanding the type of transactional data that you’ll need to record is important in both defining your data strategy and evaluating operational databases.

With that in mind, defining a data strategy doesn’t have to be difficult. It can be as simple as evaluating your transactional data requirements and turning those requirements into the blueprint for your framework. Here are a few to help you define your requirements:

Structured Versus Unstructured Data

Your company, whether it’s a delivery app or a project management tool, the volume of data created and collected daily is growing exponentially. For data analysts, this is great news because it promises more data and paves the way for more complex analyses. However, with more data, it furthers the growing debate of structured versus unstructured data.

Structured data, such as a finance spreadsheet, is organized in rows and columns which makes it readily searchable for queries. Therefore, structured data is more compatible with relational databases (more on this later) and the most common type of data for businesses.

If structured data is easily searchable, then unstructured data is the complete opposite. Unstructured data does not abide by a predefined data model nor is it organized in a specific manner. Examples of unstructured data include social media posts, documents, images and metadata. Accordingly, unstructured data is rather difficult to run queries or get analytics from—unless you know what you’re looking for. So, knowing which type of data you’re most likely going to record is mission-critical.

Volume of Data in Your Database

When evaluating databases, you have to estimate the size of the database your business will need for the present and future. An agile database is meant to scale alongside your business growth and estimating the size for your future needs will enable you to grow without friction.

Estimating the volume of data in your database impacts performance and speed. So, it’s no surprise that queries run at a slower speed as data volume grows. However, each database offers storage a megabyte, terabyte and petabyte scales—allowing you to choose and adjust based on your needs.

In addition, standard relational databases allow you to run a VACUUM command to clean the data within a table for more refined storage.

Performance Issues via Unpredictable Queries

If your business needs access to its database in real-time in order to perform important tasks, it’s best to choose a database that’s optimized for analytics and aligned with standard semantics.

2. Relational versus Non-Relational Databases

Now that we’ve outlined the high-level requirements for a data strategy, let’s unpack the different types of databases available on the market. Remember, there are a lot of options, so choose one based on your needs.

Relational Databases

Relational databases (RDBMS) have been around since the early 1970s, and unsurprisingly were not designed to hold unstructured data. As aforementioned, the data within a relational database is structured and organized into one or more tables of columns and rows, which makes it readily available for querying.  

Popular relational databases include PostgreSQLMySQL and SQLite, all of which serve small to large data sets. These databases also support some secondary indexing, which allows for an easier way to access data. Here are a few key differences between the three mentioned databases:

Opensource Database

  • PostgreSQL - This database implements standard and advanced SQL more prominently than its competitors, while supporting a multitude of advanced data types.

  • MySQL - Designed to work well with web-based servers for web applications like transactional delivery companies.

  • SQLite - Is a popular choice as an embedded database for web browsers, rather than a database engine.

With PostgreSQL and MySQL as relational database leaders, it means that as a user you’ll get a mature product that has the ability to grow alongside your business and data set. Further, both databases have a large community of users with technical documentation for support.

Non-relational Databases

Non-relational databases, or often referred to as NoSQL databases, have grown in popularity over the years. This growth is primarily due to companies that are growing at such a speed that they don’t have time to slow down and work within data schemas. For NoSQL databases, scaling is a built-in functionality and is a sought after requirement for real-time companies such as Amazon, Facebook and Google.

NoSQL databases have the ability to incorporate any type of data, without loosing any of its scaling abilities and allows users to make changes in real-time. The most popular non-relational database is MongoDB. Others include Apache, Cassandra and Redis.

MongoDB is document based and retains a few SQL characteristics such an intuitive query language and the ability to index data. Users of MongoDB find it similar to MySQL, which allows for an easier transition from relational to non-relational databases (should your business require it).

Conclusion

Today’s database market is sprawling. Whether your business is better suited with a SQL or NoSQL database, it’s best to choose a database that suits your future needs. Either way, once your data is properly stored in an operational database, the next step in your strategy is to ready it for a data warehouse and analytics.

For more information on databases, watch our on demand webinar with Treasure Data as they explore the territory from data sources to databases.