Elasticsearch vs. FeatureBase

 

Elasticsearch is an excellent solution for free-text and unstructured data use cases, but it is not always the best choice for real-time analytics. For workloads that scale past a critical “tipping point,” a database designed explicitly for real-time analytics may be the better option. Below, we review the essential differences between Elasticsearch and Molecula’s FeatureBase and compare performance benchmarks.

Elasticsearch Overview

Elasticsearch is a distributed document store built on the Apache Lucene library. It specializes in full-text search over schema-free documents and provides access to raw event-level data. Rather than storing information in a columnar data format, Elasticsearch stores complex data structures that have been serialized as JSON documents.

FeatureBase Overview 

By comparison, FeatureBase’s feature-oriented data format departs from traditional tabular and columnar databases by storing data in a highly optimized format built for fast queries. Under the hood, the data format looks a lot like a bitmap index: it breaks out each unique value within each column and stores those values as machine-native 1s and 0s from the outset. Because filters and counts can then be expressed as bitwise operations, queries against the data run faster. FeatureBase’s binary format was purpose-built for large-scale, real-time analytics and is much more performant (in terms of query speeds) and efficient (in terms of data footprint) than columnar data formats.

 

Fig. 1: Illustration of a sample FeatureBase index
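
The bitmap-style layout described above can be sketched in a few lines of Python. This is a minimal illustration of the general bitmap-index idea, not FeatureBase’s actual storage format or API: the function names, the sample records, and the use of plain Python integers as bitmaps are all assumptions made for the example.

```python
from collections import defaultdict


def build_bitmap_index(records, columns):
    """Map (column, value) -> int bitmap; bit i is set when record i has that value."""
    index = defaultdict(int)
    for row_id, record in enumerate(records):
        for column in columns:
            index[(column, record[column])] |= 1 << row_id
    return index


def matching_rows(bitmap):
    """Expand a bitmap into the list of record positions whose bit is set."""
    rows = []
    position = 0
    while bitmap:
        if bitmap & 1:
            rows.append(position)
        bitmap >>= 1
        position += 1
    return rows


# Hypothetical records, invented purely for illustration.
records = [
    {"team": "hawks", "channel": "wifi"},
    {"team": "owls", "channel": "social"},
    {"team": "hawks", "channel": "social"},
    {"team": "hawks", "channel": "wifi"},
]

index = build_bitmap_index(records, ["team", "channel"])

# The filter "team = hawks AND channel = wifi" becomes a single bitwise AND.
hawks_on_wifi = index[("team", "hawks")] & index[("channel", "wifi")]
print(matching_rows(hawks_on_wifi))      # positions of the matching records
print(bin(hawks_on_wifi).count("1"))     # number of matching records
```

Each distinct value in each column gets its own bitmap, so combining filters never requires scanning the original records. A production system would typically use compressed bitmaps rather than raw integers, but the query logic follows the same pattern.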


 

 

We know this first-hand at Molecula. FeatureBase was invented at Umbel (now MVPIndex), a customer data platform serving the biggest names in sports, media, and entertainment. Umbel needed to deliver real-time queries on massive datasets, including hundreds of different data sources (think social graphs, behavioral graphs, in-arena WiFi data, etc.). These sources required ingesting datasets containing hundreds of millions of fans with hundreds of millions of attributes. Umbel’s job was to make all this data instantly accessible with real-time joining and querying so that departments across client organizations could make decisions on the same, most up-to-date data.

The Shortcomings of Elasticsearch

As the datasets at Umbel grew larger, the existing systems (Elasticsearch and Cassandra) could no longer support the necessary data ingest volumes while simultaneously maintaining the low-latency query times required by their customers. They had huge Cassandra and Elasticsearch clusters (20+ nodes), but their most essential queries were still taking longer and longer. Umbel began to explore preprocessing, preaggregating, and all of the “things that you do” to attempt to make big data faster, but each of these workarounds required hefty tradeoffs between the promise of low-latency querying, high ingest volumes, and highly concurrent usage. Because existing solutions did not solve those problems, Umbel’s engineering team invented a much more efficient and performant data format, now known as Molecula’s feature-oriented format.

Key Differences: Elasticsearch vs. FeatureBase

Listed below are key differences between Elasticsearch and FeatureBase:

[Image: FeatureBase vs. Elasticsearch comparison of key differences]

Optimal Use Cases

  • FeatureBase excels at powering large-scale analytical, structured, and semi-structured workloads where near real-time requirements are present.
  • Elasticsearch excels at free-text search use cases, logging, log analysis, and scraping web sources at lower volumes of data.

Benchmarks: Elasticsearch vs. FeatureBase

To perform the benchmarks, we ran Elasticsearch and FeatureBase each on its own three-node cluster on AWS EC2. The instance type we chose for the nodes was “r4.2xlarge,” a virtual machine with 8 vCPUs and 61 GiB RAM. In addition, we used a general-purpose EBS volume for the root storage volume. In these benchmarks, FeatureBase delivered far faster response times than Elasticsearch across large datasets:

[Image: benchmarks]

One of Molecula’s customers is a leader in outcome-based marketing that provides a platform for personalized consumer journeys. This customer could no longer power its workloads quickly and efficiently with Elasticsearch: ingestion, data preparation, query, and new attribute creation times were all slower than the business required. The customer therefore began looking for a solution that could power its massive workloads in real time and chose FeatureBase for its ability to provide ultra-low latency at any scale. Compared with Elasticsearch, FeatureBase sped up those workloads by the following percentages:

[Image: ingestion benchmarks]

New Attribute Creation Results:

[Image: new attribute creation results]

In summary, Elasticsearch and the ELK stack were designed to solve log-based and general text-based queries. FeatureBase was purpose-built to address the shortcomings of Elasticsearch and to remain performant as record counts grow beyond the tipping point that many users eventually reach. Past that point, Elasticsearch users may be forced to weigh the pros and cons of continuing to scale out or scale up just to keep queries performant. It is therefore crucial to select tooling that can meet your analytical goals. If you’re looking for a free-text search tool, Elasticsearch is amazing. However, if you plan to power high-volume, ultra-low-latency analytical use cases, Elasticsearch may not meet your needs.

 
