FeatureBase: The #1 ClickHouse Alternative

ClickHouse for Real-Time Analytics

ClickHouse is an open-source, column-oriented database for online analytical processing (OLAP). The technology was originally developed about 10 years ago at Yandex, Russia’s largest technology company. It uses its own columnar storage format; it can read and write Apache Arrow data, but it is not built on Arrow.

ClickHouse was designed with a single objective: to filter and aggregate as much data as possible as quickly as possible. It is often used for use cases requiring large-scale aggregations in real time (for example, server log analysis to determine error rates and response times). It does not support transactional workloads, and it applies updates asynchronously, which can create inconsistencies in the data and leads some users to look for a ClickHouse alternative. Because it is open-source, several SaaS offerings are built on ClickHouse, including Firebolt, Tinybird, and Altinity.

ClickHouse is DevOps friendly and almost unbeatable for complex grouping and aggregation queries, but it has weaknesses, most notably updates and deletes. If your workloads depend on those operations, weigh these weaknesses carefully when deciding whether to switch to a ClickHouse alternative.

Reasons to Consider Replacing ClickHouse: 

Molecula FeatureBase

FeatureBase is a real-time database that makes an organization’s freshest data immediately accessible, actionable, and reusable. It powers real-time analytics and machine learning applications by executing low-latency, high-throughput, and highly concurrent workloads simultaneously. 

FeatureBase ingests data continuously to execute on computationally intensive analytical workloads in real-time for the front lines of your business. It allows you to ingest millions of events per second with ACID transactions while simultaneously analyzing, transforming, and aggregating billions of rows of data and maintaining efficiency, making FeatureBase a great ClickHouse alternative.

 

ClickHouse vs. FeatureBase: 4 Key Technical Differences 

As a ClickHouse alternative, FeatureBase has key technical differences you should consider, including data ingestion, query capabilities, data modeling, and the data format. Let’s look at each.

Consideration #1 For Using a ClickHouse Alternative: Real-Time Data Ingestion

ClickHouse is not designed for real-time, record-by-record data ingestion. Instead, it recommends performing inserts in batches of at least 1,000 records, or no more than one insertion per second. Because of this, it’s essential to configure your ingestion pipeline to maximize the number of records per insertion. Depending on your use case and the throughput rate of your input data, even these configurations may not be sufficient to optimize writes into ClickHouse.
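To make the batching guidance concrete, here is a minimal sketch of a client-side buffer that flushes once it holds enough rows or once enough time has passed since the last flush. The class name, the `insert_batch` callback, and the default thresholds are assumptions for illustration; `insert_batch` stands in for whatever bulk-insert call your database client provides.

```python
import time
from typing import Callable, Dict, List

Row = Dict[str, object]


class BatchingWriter:
    """Buffers rows and hands them to a bulk-insert callback in batches.

    `insert_batch` is a hypothetical callback; replace it with the
    bulk-insert call of the client library you actually use.
    """

    def __init__(
        self,
        insert_batch: Callable[[List[Row]], None],
        min_rows: int = 1000,
        max_wait_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._insert_batch = insert_batch
        self._min_rows = min_rows
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._buffer: List[Row] = []
        self._last_flush = clock()

    def write(self, row: Row) -> None:
        """Add one row; flush if the batch is big enough or old enough."""
        self._buffer.append(row)
        batch_is_full = len(self._buffer) >= self._min_rows
        batch_is_old = self._clock() - self._last_flush >= self._max_wait_s
        if batch_is_full or batch_is_old:
            self.flush()

    def flush(self) -> None:
        """Send any buffered rows as a single insert and reset the timer."""
        if self._buffer:
            self._insert_batch(self._buffer)
            self._buffer = []  # new list, so the sent batch is not mutated
        self._last_flush = self._clock()
```

A caller would invoke `write()` once per incoming record and `flush()` once at shutdown. Note that the age check only runs when a new row arrives, so a production version would likely add a background timer.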

FeatureBase handles the ingestion of massive-scale streaming data while simultaneously allowing real-time inserts and updates to existing data schemas. FeatureBase is not explicitly optimized for writes, and it also ingests data in microbatches, but it scales out horizontally and employs several optimizations (like write-ahead logs) to support the required throughput. Additionally, FeatureBase can do much of its preprocessing on the client side, so users can not only scale out the database servers but also offload much of the computation to ingest servers. These ingest servers can be ephemeral, existing only while there is load, which makes ingestion elastic and further improves its efficiency.

Consideration #2 For Using a ClickHouse Alternative: Query Capabilities

ClickHouse is a column-oriented OLAP database, which means that it excels at analytical workloads, not transactional workloads. It is designed to analyze immutable data (e.g., logs, events, and metrics). ClickHouse is particularly good at aggregations and filters, but because it was intended for immutable data, it does not do well with update or delete queries.

FeatureBase also excels at analytical workloads, but it is built around a feature-oriented format that allows for several step-function improvements over column-oriented databases (we’ll dive into this in more detail later). As a result, it is extremely good at supporting live updates while maintaining low-latency queries. FeatureBase’s approach minimizes I/O on queries by allowing the database engine to read and write exactly the data it needs and to intelligently compress that data in memory.

Consideration #3 For Using a ClickHouse Alternative: Data Modeling 

ClickHouse is a column-oriented database that uses indexing structures and materialized views to speed up query execution when a filter is provided. In ClickHouse, batch deletes and updates occur asynchronously. That might seem trivial, but it can significantly affect materialized views, because the server cannot automatically update multiple tables at once. As a result, data can be inconsistent and unreliable.

FeatureBase models data in a novel way. Tables are typically modeled around entities (customers, patients, unique IDs, etc.) or events (transactions, etc.). In addition, tables can have multiple sources (batch and streaming) and update records or add new fields in real time. Mapping relational tables to FeatureBase can be as simple as a one-to-one mapping, but significant performance improvements can be made through the mapping and feature table structure depending on:

[Figure: factors that influence the mapping and feature table structure; image not reproduced]

Consideration #4 For Using a ClickHouse Alternative: Data Format

ClickHouse uses its own column-oriented data format, which has been tuned for particular performance optimizations. In its technical documentation, ClickHouse states that it is extremely fast because of a few key benefits that result from this optimized column-oriented format. These include:

Fig. 1 ClickHouse technical documentation FAQs

FeatureBase is built entirely on our proprietary feature-oriented format. The beauty of the feature-oriented format is that it takes all of the benefits that column-oriented formats provide over row-oriented formats (as listed above in ClickHouse’s technical documentation) and optimizes them even further, resulting in a 10-100X improvement in cost/performance.

For example, while column-oriented databases allow only relevant columns to be scanned to answer queries (instead of scanning every single row), FeatureBase takes that even further, requiring only the specific value to be scanned.

As another example, ClickHouse optimizes compression by storing different values of the same column together. FeatureBase stores data as compressed bits set within a bitmap. It can optimize that compression and storage even further by utilizing three different types of encodings that it intelligently adjusts based on the individual dataset.
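The article does not name the three encodings. Purely as an illustration, the sketch below assumes a Roaring-bitmap-style scheme (a sorted array of positions, a raw bitmap, or run-length pairs) and picks whichever is smallest for one chunk of 65,536 row positions. The chunk size, the byte-size formulas, and the function names are assumptions, not FeatureBase’s actual logic.

```python
from typing import Iterable, List, Tuple

CHUNK_BITS = 1 << 16  # assumed chunk size: 65,536 row positions


def count_runs(sorted_positions: List[int]) -> int:
    """Count maximal runs of consecutive positions, e.g. [1, 2, 3, 7] has 2."""
    runs = 0
    previous = None
    for pos in sorted_positions:
        if previous is None or pos != previous + 1:
            runs += 1
        previous = pos
    return runs


def choose_encoding(positions: Iterable[int]) -> Tuple[str, int]:
    """Return (encoding name, estimated size in bytes) for one chunk."""
    sorted_positions = sorted(set(positions))
    sizes = {
        # One 16-bit integer per set bit: good for sparse data.
        "array": 2 * len(sorted_positions),
        # One bit per possible position: fixed cost, good for dense data.
        "bitmap": CHUNK_BITS // 8,
        # One (start, length) pair of 16-bit integers per run:
        # good for long consecutive stretches.
        "run": 4 * count_runs(sorted_positions),
    }
    best = min(sizes, key=sizes.get)
    return best, sizes[best]
```

Under these assumed formulas, a handful of scattered positions comes out as an array, a long consecutive range as runs, and every other position in the chunk as a raw bitmap.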

 

Example:

If this is a bit confusing (it’s a new concept, after all!), it can be easiest to understand with an example. Let’s say we’re trying to count the number of people wearing green shirts. 

  • In a row-oriented database, the database would have to scan the entire “People” table row-by-row to find everywhere “Shirt Color” occurs and count each occurrence of “Green.” 
  • In a column-oriented database, it would have to scan the entire “Shirt Color” column to count all of the instances of “Green,” but it would be able to ignore everything else in the table. 

In FeatureBase, the database can go directly to the value “Green Shirt Color” to count the bits set in a compressed bitmap for that value. This means the database only has to deal with the data that tells it whether a person is wearing a green shirt or not. It can ignore everything else (including other shirt color options!).
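The three approaches above can be sketched in a few lines. This is a simplified illustration, not FeatureBase’s implementation: the sample data is made up, and a plain Python integer stands in for a compressed bitmap, with bit i set when person i has the given shirt color.

```python
from typing import Dict, List

# Made-up sample data for illustration.
people = [
    {"name": "Ana", "shirt_color": "Green", "age": 31},
    {"name": "Ben", "shirt_color": "Blue", "age": 45},
    {"name": "Cho", "shirt_color": "Green", "age": 27},
    {"name": "Dev", "shirt_color": "Red", "age": 52},
]

# Row-oriented: visit every full record and inspect one field of each.
row_count = sum(1 for person in people if person["shirt_color"] == "Green")

# Column-oriented: scan only the "Shirt Color" column, but all of it.
shirt_color_column = [person["shirt_color"] for person in people]
column_count = shirt_color_column.count("Green")


def build_bitmaps(values: List[str]) -> Dict[str, int]:
    """Build one bitmap per distinct value; bit i is set if row i has it."""
    bitmaps: Dict[str, int] = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


# Feature-oriented (simplified): built once at ingest time.
shirt_color_bitmaps = build_bitmaps(shirt_color_column)

# At query time, only the "Green" bitmap is read; counting its set bits
# answers the question without touching any other color.
bitmap_count = bin(shirt_color_bitmaps["Green"]).count("1")

print(row_count, column_count, bitmap_count)
```

All three counts agree; the difference is how much data each approach has to read to get there.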

When to Choose FeatureBase Over ClickHouse

Molecula’s FeatureBase is a real-time database built on bitmaps. FeatureBase continuously extracts and updates features from streaming technologies like Kafka and other data sources without the need for staging or preaggregation. This is a crucial differentiator when deciding whether your organization would be better suited to use a ClickHouse alternative such as FeatureBase.

 

If your organization needs the flexibility to run ad-hoc queries on the freshest data as soon as it hits your database, or if you’re highly dependent on updates and deletes, you will struggle with ClickHouse. And if you need to filter based on time ranges, FeatureBase excels, while ClickHouse falls short in all but a few use cases.

 
