Accelerate Your Delta Lake with FeatureBase

What is Databricks’ Delta Lake?

Delta Lake is a file-based, open-source storage format that enables organizations to build data products with a high level of governance and reliability. By replacing data silos with a single home for structured, semi-structured, and unstructured data, Delta Lake is the foundation of a cost-effective, highly scalable lakehouse.

Delta Lake is an amazing cloud data storage solution that empowers organizations to have a single source of truth that is ACID compliant, meets regulatory needs, unifies batch and streaming data within a single access point, and prepares organizations to begin addressing advanced analytics, AI, and ML use cases.

What is FeatureBase?

FeatureBase is the database for real-time decisions.

FeatureBase powers real-time analytics and machine learning applications, making data immediately accessible, actionable, and reusable. By eliminating time-consuming and costly pre-aggregation, FeatureBase unlocks your data to drive instant decisions.

FeatureBase + Delta Lake:

While Databricks’ Delta Lake has a full suite of capabilities and empowers users to unify data within a single data access point, there are opportunities to make Delta Lake far more powerful. Delta Lake excels at storing data within the data cloud layer, but when it comes to operating on that data, FeatureBase can remove much of the copying and pre-aggregation work that slows queries down.

Out of the box, Delta Lake includes several “types” of data tables or layers:

  • Bronze: Bronze data tables keep data in its raw, as-is form (e.g. JSON, Parquet, IoT data, XML, etc.)
  • Silver: Silver data tables provide a more refined view of an organization’s data. They can be queried directly, and the data is clean, normalized, and can be treated as a single source of truth.
  • Gold: Gold data tables are an aggregated data layer reserved for different business use cases.
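
To make the layering concrete, here is a minimal pure-Python sketch of the Bronze, Silver, and Gold flow. It does not use the Delta Lake API. The record fields (`device`, `temp`) and the function names `to_silver` and `to_gold` are hypothetical, chosen only to illustrate how each layer is a further-processed copy of the one before it.

```python
import json
from collections import defaultdict

# Hypothetical raw events kept as-is (the Bronze layer).
# Field names are assumptions made for this illustration.
BRONZE = [
    '{"device": "a", "temp": "20.5"}',
    '{"device": "a", "temp": "21.5"}',
    '{"device": "b", "temp": "19.0"}',
    'not valid json',
]

def to_silver(bronze_rows):
    """Parse and normalize raw rows; skip rows that cannot be parsed."""
    silver = []
    for raw in bronze_rows:
        try:
            rec = json.loads(raw)
            silver.append({"device": rec["device"], "temp": float(rec["temp"])})
        except (ValueError, KeyError, TypeError):
            continue
    return silver

def to_gold(silver_rows):
    """Pre-aggregate cleaned rows into the average temp per device."""
    sums = defaultdict(lambda: [0.0, 0])  # device -> [running total, count]
    for rec in silver_rows:
        sums[rec["device"]][0] += rec["temp"]
        sums[rec["device"]][1] += 1
    return {device: total / n for device, (total, n) in sums.items()}

silver = to_silver(BRONZE)  # cleaned copy of Bronze
gold = to_gold(silver)      # aggregated copy of Silver
```

Each step materializes a new copy of the data, which is where the extra latency discussed in the next section comes from.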

When Milliseconds Matter

Each layer of data within the Delta Lake adds latency (and copies!) and reduces data freshness. For some, this is okay, but for organizations that are trying to serve real-time results on the freshest data, or to minimize the risk exposure and data sprawl that copies create, FeatureBase can help. With Delta Lake as-is, reporting, analytics, and machine learning models require Gold tables, but creating Gold tables means pre-aggregating data, which negates much of the benefit of the Delta Lake itself.

Once FeatureBase is implemented as an overlay on Delta Lake, the pre-aggregation can occur directly from source data, without the three copy-and-materialize steps that Delta Lake currently requires, and FeatureBase can sync directly back to Delta Lake to return results as needed.
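
The freshness trade-off can be illustrated with a small pure-Python sketch. It shows only the general idea of a materialized aggregate going stale while a query against source data stays current. It is not FeatureBase's or Delta Lake's API, and the names `events`, `sum_by_key`, and `gold_snapshot` are hypothetical.

```python
# Hypothetical cleaned events as (key, value) pairs; purely illustrative.
events = [("a", 3), ("b", 5)]

def sum_by_key(rows):
    """Aggregate values per key over whatever rows are passed in."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals

# Materialize the aggregate once (stands in for building a Gold table).
gold_snapshot = sum_by_key(events)

# A new event arrives after the snapshot was built.
events.append(("a", 4))

# The snapshot does not see the new event; a query on source data does.
stale = gold_snapshot["a"]
fresh = sum_by_key(events)["a"]
```

Here `stale` still reflects the data as of the last materialization, while `fresh` includes the newest event. Closing that gap without rebuilding the aggregated table is the benefit the overlay approach is aiming for.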

FeatureBase + Delta Lake Benefits:

  • Access the data you need in the moment you need it (no more waiting on new table creation)
  • Operate/compute on data in its most optimized format
  • Reduce latency by bypassing steps in the process – compute directly on data in Bronze or Silver tables (and combine it with Gold, when necessary)
  • Improve data freshness by operating on your most up-to-date and relevant data, then storing it within the Delta Lake “single source of truth”

