December 28, 2022 By Matthew Rathbone *

A columnar database is a type of database management system (DBMS) that stores data in columns rather than rows. This allows for more efficient querying and data manipulation, as the DBMS only needs to access the relevant columns of data rather than entire rows.

Columnar database examples

Commercial examples of columnar databases include Amazon Redshift, Snowflake, Google BigQuery, and Microsoft Azure Synapse.

Open source examples include Apache Cassandra, Apache HBase, and ClickHouse. Also worth noting are columar file structures like Parquet and Delta.

Make Redshift Fun Again With Beekeeper Studio

Write SQL, create tables, edit data, and have fun doing it! Beekeeper Studio is available for MacOS, Linux, and Windows.

Use cases for columnar databases

Columnar databases are typically used to perform data analysis and reporting (rather than for operational trasactions) because they are able to quickly summarize and analyze large amounts of data. This is espcially useful for applications that require real-time analysis of data, such as financial reporting or market research.

Columnar vs transactional (row-based) databases

In a traditional row-based database, data is stored in rows, with each row representing a record or entry in the database. For example, a database table containing information about employees might have one row for each employee, with each row containing data such as the employee’s name, address, phone number, and job title. However, this type of storage can be inefficient for analytics or reporting queries, especially when dealing with large datasets.

In a columnar database, data is organized into columns rather than rows. So for example all the employee names are stored together, rather than being stored alongside the other data for each employee. This allows the DBMS to quickly access and manipulate only the relevant columns of data, rather than needing to read and process entire rows of data to extract a few slices. This can greatly improve the performance of analytics and reporting queries, especially those that only need to access a small number of columns from a large dataset.

Columnar databases can scale

Another advantage of columnar databases is that they can be easily scaled to handle large amounts of data. Since the data is stored in columns, it can be distributed across multiple servers or storage devices, allowing the database to handle much larger datasets than a traditional row-based database. This makes columnar databases well-suited for applications such as data warehousing and business intelligence, where large amounts of data need to be analyzed and queried in real-time.

Do you need a columnar database?

If your reporting and analytics queries are taking a long time to complete, or your DBA is complaining about the strain these queries put on the database, it might be time to use a Columnar database.

Columnar database vendors have many tools to help you migrate your data into their systems, but be prepared for a non-trivial amount of engineering effort in order to complete the migration.

Columnar database summary

In summary, a columnar database is a type of DBMS that stores data in columns rather than rows. This allows for more efficient querying and data manipulation, and enables the database to easily scale to handle large amounts of data. Examples of columnar databases include Amazon Redshift, Google BigQuery, and Microsoft Azure Synapse.