FeatureBase: An Elasticsearch Alternative For Real-Time Data
Elasticsearch is an excellent solution for free-text and unstructured data use cases, but for real-time analytics at scale, an Elasticsearch alternative can power your workload more efficiently. A database designed explicitly for real-time analytics may be a better option for many use cases that scale past a critical “tipping point.” Below, we review the essential differences between Elasticsearch and FeatureBase and compare performance benchmarks.
Elasticsearch is a distributed document store based on the Apache Lucene library that specializes in full-text search for schema-free documents and provides access to raw event-level data. Elasticsearch stores complex data structures that have been serialized as JSON documents instead of storing information in columnar data format.
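To make the layout distinction concrete, here is a minimal Python sketch that holds the same three records first as serialized JSON documents and then as columns. The records, field names, and the to_columns helper are hypothetical and exist only for illustration. This is a simplified picture of the two layouts, not a description of how Elasticsearch or any columnar database stores data internally.

```python
import json

# Hypothetical records, used only to illustrate the two layouts.
documents = [
    {"user": "a", "plan": "pro", "visits": 12},
    {"user": "b", "plan": "free", "visits": 3},
    {"user": "c", "plan": "pro", "visits": 7},
]

# Document-oriented layout: each record is serialized whole, so reading
# one field means parsing the entire document it belongs to.
serialized_docs = [json.dumps(doc) for doc in documents]


def to_columns(docs):
    """Pivot records into one list per field (a simple columnar layout).

    Assumes every record has the same fields, so positions line up.
    """
    columns = {}
    for doc in docs:
        for field, value in doc.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(documents)

# The same aggregate computed against each layout.
total_visits_from_docs = sum(json.loads(s)["visits"] for s in serialized_docs)
total_visits_from_columns = sum(columns["visits"])
```

Both totals are identical; the difference is that the columnar version touches only the "visits" values, while the document version has to deserialize every record in full.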
FeatureBase: An Elasticsearch Alternative for Real-Time Analytics
For comparison, FeatureBase, an extremely efficient real-time database, revolutionizes traditional tabular and columnar databases by storing data in a highly optimized format that enables blazing-fast queries. Under the hood, the data format looks a lot like a bitmap index: it breaks out each unique value within each column and stores those values in a machine-native format of 1s and 0s from the outset. Because filters and counts then reduce to bitwise operations, this approach speeds up most of what you do with the data. FeatureBase’s binary format was purpose-built for large-scale, real-time analytics and is much more performant (in terms of query speeds) and efficient (in terms of data footprint) than columnar data formats, making it a compelling Elasticsearch alternative for analytical workloads.
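The bitmap-index idea can be sketched in a few lines of Python. This is a generic illustration of the technique, not FeatureBase's actual storage format or API: the sample records, the build_bitmap_index and row_ids helpers, and the use of plain Python integers as bitsets are all assumptions made for the example.

```python
from collections import defaultdict


def build_bitmap_index(records, column):
    """Map each distinct value in `column` to an int used as a bitset.

    Bit i of a value's bitmap is 1 when records[i][column] equals that value.
    """
    index = defaultdict(int)
    for row_id, record in enumerate(records):
        index[record[column]] |= 1 << row_id
    return dict(index)


def row_ids(bitmap):
    """Decode a bitset back into an ascending list of row positions."""
    ids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


# Hypothetical fan records, invented for this example.
fans = [
    {"team": "Spurs", "channel": "wifi"},
    {"team": "Stars", "channel": "social"},
    {"team": "Spurs", "channel": "social"},
    {"team": "Spurs", "channel": "wifi"},
]

team_index = build_bitmap_index(fans, "team")
channel_index = build_bitmap_index(fans, "channel")

# "team = Spurs AND channel = wifi" becomes a single bitwise AND.
spurs_on_wifi = team_index["Spurs"] & channel_index["wifi"]
matching_rows = row_ids(spurs_on_wifi)
match_count = bin(spurs_on_wifi).count("1")
```

In this sketch a multi-attribute filter never scans the records themselves. It combines two precomputed bitmaps, and counting the matches is a population count over the result, which is why this family of formats suits segmentation-style analytical queries.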
Fig. 1: Illustration of a sample FeatureBase index
We know this first-hand. FeatureBase was invented at Umbel (now MVPIndex), a customer data platform serving the biggest names in sports, media, and entertainment. Umbel needed to deliver real-time queries on massive datasets, including hundreds of different data sources (think social graphs, behavioral graphs, in-arena WiFi data, etc.). These sources required ingesting datasets containing hundreds of millions of fans with hundreds of millions of attributes. Umbel’s job was to make all this data instantly accessible with real-time joining and querying so that departments across client organizations could make decisions on the same, most up-to-date data.
The Shortcomings of Elasticsearch
As the datasets at Umbel grew larger, the existing systems (Elasticsearch and Cassandra) could no longer support the necessary data ingest volumes while simultaneously maintaining the low-latency query times that Umbel’s customers required. Umbel ran large Cassandra and Elasticsearch clusters (20+ nodes), but its most essential queries kept taking longer and longer, which prompted the search for an Elasticsearch alternative. Umbel began to explore preprocessing, preaggregating, and all of the “things that you do” to attempt to make big data faster, but each of these workarounds required hefty tradeoffs between low-latency querying, high ingest volumes, and highly concurrent usage. Because existing solutions did not solve those problems, Umbel’s engineering team invented a much more efficient and performant data format, now known as our feature-oriented format.
Key Differences: Elasticsearch vs. FeatureBase
Listed below are key differences between Elasticsearch and FeatureBase:
Optimal Use Cases
- FeatureBase excels at powering large-scale analytical, structured, and semi-structured workloads where near real-time requirements are present.
- Elasticsearch excels at free-text search use cases, logging, log analysis, and scraping web sources at lower volumes of data.
Benchmarks: Elasticsearch vs FeatureBase
For both Elasticsearch and FeatureBase, we used a separate, three-node cluster running on AWS EC2 to perform the benchmarks. The instance type we chose for the nodes was “r4.2xlarge,” an 8-core virtual machine with 61 GiB RAM. In addition, we used a general-purpose EBS volume for the root storage volume. In these benchmarks, FeatureBase’s response times across large datasets were far faster than those of Elasticsearch:
One of our customers is a leader in outcome-based marketing that provides a platform for personalized consumer journeys. This customer could no longer power their workloads efficiently and quickly using Elasticsearch. Ingestion, data preparation, query, and new attribute addition times were slower than their business required. Hence, they began looking for Elasticsearch alternatives that could power their massive workloads in real time. They chose FeatureBase for its ability to provide ultra-low latency at any scale. FeatureBase increased the speed of their workloads by the following percentages compared to Elasticsearch:
New Attribute Creation Results:
In summary, Elasticsearch and the ELK stack were designed to solve log-based and general text-based queries. FeatureBase was purpose-built to solve the shortcomings of Elasticsearch when it comes to analytical workloads and to remain performant as record counts increase beyond what is, for many users, a tipping point. Elasticsearch users may be forced to weigh the pros and cons of either continuing to scale their Elasticsearch cluster out or up to retain performant querying, or switching to an Elasticsearch alternative.
Therefore, it’s crucial to select tooling that can meet your analytical goals. If you’re looking for a free-text search tool, Elasticsearch is amazing. However, if you plan to power high-volume, ultra-low-latency analytics, Elasticsearch will not meet your needs.