How to Reduce Latency: Approaches for Data Engineers

By: Molecula

TL;DR

  • Latency, the time delay between an action and a response, is a certainty, but there are ways to reduce it, and doing so is critical for many applications
  • Latency reduction strategies include “scale up,” “scale out,” and pre-aggregation, but each comes with its own set of compromises and flaws
  • A newer strategy reduces latency by changing how the data itself is stored

Computer scientists and software engineers across industries and technologies devote countless resources to reducing latency—the time delay between an action and a response—in nearly every imaginable application.

Reducing Latency in Machine Learning

Managing latency is mission-critical to any technology that relies on accessing and transferring massive amounts of data at scale. It has become a growing pain point for users who need access to data that are increasingly large, rapidly changing, and highly complex.

Current solutions to reduce latency, such as pairing columnar formats like Parquet and ORC with Apache Spark, can shrink queries that previously took days to run down to just seconds. That may sound like a successful ending to the latency story, but it is not enough for use cases that require millisecond speeds. It is still extremely difficult to push into sub-second latencies for analytical queries on huge data sets.
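As a rough illustration, here is a minimal PySpark sketch of the columnar approach. The file name and column names (events.parquet, user_id, country, amount) are hypothetical; the point is that because Parquet stores data by column, Spark only reads the columns this query touches.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("columnar-demo").getOrCreate()

# Parquet is columnar, so Spark reads only user_id, country, and
# amount from disk; every other column in the file is skipped.
events = spark.read.parquet("events.parquet")

totals = (events
          .filter(F.col("country") == "US")
          .groupBy("user_id")
          .agg(F.sum("amount").alias("total")))

totals.show()
spark.stop()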

Latency Reduction Strategies

There are a number of ways that organizations have attempted to reduce latency. Let’s take a look at a few of the most common:

  1. Scale Up: The “scaling up” approach refers to buying a bigger machine to house the database. While buying bigger machines improves latency up to a point, most demanding applications will hit that point sooner rather than later. One machine won’t support more than about 100 cores and a few terabytes of memory, and even if the required data set fits in memory, the amount of I/O and processing needed to serve a complex query may still take hours. For example, scaling from a machine with one core to a machine with 100 cores would yield a 100x performance improvement in the absolute best-case scenario.
  2. Scale Out: If the problem can’t be solved with a bigger machine, another solution is to spread the workload over many machines. This “scaling out” approach works pretty well: as the data are spread over more machines, each machine only needs to process a smaller chunk, and the machines save time by working in parallel. However, fanning a request out to hundreds or thousands of machines carries overhead: each machine must process the request and return its results, and those results must eventually be re-aggregated into a single answer (see the scatter-gather sketch after this list). This approach is also hindered by fundamental limits. A thousand machines don’t fit into a small space; there is necessarily distance between them, not to mention networking equipment, so fanning out a query and reducing the results may involve several network hops. Additionally, the more machines that are involved, the greater the chance that some will have failures or performance hiccups, adding to overall request latency.
  3. Pre-Process Your Data: The next often-used strategy is pre-processing the data, which includes techniques such as data marts and OLAP cubes. Pre-processed data can be queried and explored very quickly, as long as the specific needs have been articulated in advance and are supported by the processed version of the data set. Pre-processing typically involves aggregating data: the data set shrinks to a more manageable size, but the trade-off is a loss of data resolution, meaning granular views are not accessible and you’re limited by the preconceived data selections you’ve made (see the rollup sketch below). Technically, the latency is still there; it has simply been moved to a new location within the process.
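To make the scale-out trade-offs concrete, here is a toy scatter-gather sketch in plain Python. The shard count and simulated delays are invented for illustration; the takeaway is that end-to-end latency is gated by the slowest shard plus the re-aggregation step.

import random
import time
from concurrent.futures import ThreadPoolExecutor

# Eight "shards", each holding a chunk of the data set.
SHARDS = [list(range(i, 1_000_000, 8)) for i in range(8)]

def query_shard(chunk):
    time.sleep(random.uniform(0.01, 0.05))  # simulated per-shard hiccup
    return sum(chunk)                        # partial result for this shard

# Fan the request out; every shard works in parallel, but the caller
# waits for the slowest one (the tail) before it can aggregate.
with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
    partials = list(pool.map(query_shard, SHARDS))

total = sum(partials)  # the re-aggregation step
print(total)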
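And here is pre-aggregation in miniature, again with invented data. The rollup answers its intended question instantly, but the aggregation discards a column, so any more granular question can no longer be answered from it.

from collections import defaultdict

raw_events = [
    ("2024-01-01", "alice", 30),
    ("2024-01-01", "bob",   20),
    ("2024-01-02", "alice", 25),
]

# Roll raw events up into revenue per day ahead of time.
rollup = defaultdict(int)
for day, user, amount in raw_events:
    rollup[day] += amount            # aggregation discards the user column

print(rollup["2024-01-01"])          # instant answer: 50
# "How much did alice spend on Jan 1?" is unanswerable from the rollup;
# the latency hasn't vanished, it was paid earlier, at processing time.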

FeatureBase: A New Way to Crush Latency

The ultimate latency reduction strategy is to store data in the most efficient format possible for the job.

This idea goes back to some of the first databases and the notion of indexes. In many databases, indexes are created as auxiliary data structures that help look up data quickly for particular purposes. An index might help answer queries with sorted data, or avoid additional I/O by storing pointers to certain sections of the data based on query parameters.
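A minimal picture of that idea, with made-up rows: a dictionary mapping a column value to the row positions holding it, so a lookup hops straight to the matching rows instead of scanning every row.

rows = [
    {"id": 0, "city": "Austin"},
    {"id": 1, "city": "Boston"},
    {"id": 2, "city": "Austin"},
]

# Build an auxiliary index: value -> list of row positions.
index = {}
for pos, row in enumerate(rows):
    index.setdefault(row["city"], []).append(pos)

# One dictionary lookup replaces a full scan of the table.
austin_rows = [rows[pos] for pos in index.get("Austin", [])]
print(austin_rows)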

Indexes are helpful, but the real performance gains come when you start playing with how the data itself is stored.

FeatureBase, Molecula’s feature extraction and storage technology, breaks data out first by column, and then by each unique value within the column. By extracting features from source data without creating copies or moving the data itself, FeatureBase provides scale, performance, and increased control. All of this translates into faster data, more data, and easier-to-access data.

[Figure: the FeatureBase storage format]

The obvious advantages of FeatureBase’s format are extensions of the columnar advantages: it is only necessary to read the data needed for a particular query. Columnar data stores scan only the columns relevant to the query; FeatureBase takes this one step further, scanning only the data for the particular values relevant to the query.

In columnar stores, data in columns can often be compressed more efficiently because the values are closely related. With FeatureBase, the majority of the data becomes a set of feature maps, each describing which records have a particular feature. These feature maps remain independent of the features themselves and are compressed using the same highly optimized approach: a variant of Roaring Bitmaps. Roaring Bitmaps are a type of succinct data structure, a form of homomorphic compression that allows features to be read and written without decompressing the data.
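The sketch below illustrates the feature-map idea, not FeatureBase’s actual implementation: each unique value in a column gets its own bitmap of record IDs, modeled here with plain Python integers (bit i set means record i has that value). Real systems compress these bitmaps with Roaring so they can be operated on without full decompression.

from collections import defaultdict

column = ["red", "blue", "red", "green", "blue", "red"]  # one column

# One bitmap per unique value in the column.
feature_maps = defaultdict(int)
for record_id, value in enumerate(column):
    feature_maps[value] |= 1 << record_id   # set the bit for this record

# A query for color == "red" touches only that one bitmap; the data
# for every other value is never scanned at all.
red_bitmap = feature_maps["red"]
red_records = [i for i in range(len(column)) if red_bitmap >> i & 1]
print(red_records)   # [0, 2, 5]

# Boolean queries become cheap bitwise operations on the bitmaps.
red_or_blue = feature_maps["red"] | feature_maps["blue"]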

Applications of FeatureBase

FeatureBase is primarily focused on opening up new use cases for clients by shattering the latency floor set by legacy systems. Rather than ripping and replacing entire systems, IT departments using FeatureBase often find ways to replace OLAP Cubes, Analytical Data Lakes, and other redundant systems with FeatureBase; doing so can cut costs by 10-100x by eliminating the network and data-movement costs associated with information-era systems.

For example, in situations where FeatureBase has replaced Elasticsearch for analytical purposes, clients have seen a 10x reduction in data footprint and a 1000x improvement in performance, all without the typical pre-aggregation or pre-processing.

Creating Value 

Software engineers, data engineers, and machine learning engineers who are tasked with delivering data access to people or applications that need to query, segment, analyze, and make decisions on data in real time all stand to benefit from the strategies outlined above.  

For those seeking to deliver real-time insights, the same old data strategies are not enough. FeatureBase breaks the latency floor with an entirely new paradigm for continuous, real-time data analysis that reduces complexity and compresses days, hours, or minutes of processing time into milliseconds.

Breaking through the latency floor is mining for time. Every moment that is recaptured by reduced latency can be correlated with increased value, whether it is a better user experience, a more accurate prediction, a real-time report, or a research breakthrough. The new value to be created is only limited by the laws of the universe.

Download our white paper to read more about Breaking the Latency Floor.