The Origin of Feature Stores
Feature stores have quickly emerged as a new frontier for connecting models to real-time data. Feature stores, as the name implies, store features derived from raw data and serve them to downstream models for training and inference. Most reference architectures for feature stores built today are amalgamations of batch, streaming, caching, and database systems.
RISELab at UC Berkeley defines feature stores as follows:
“Feature stores are used to store and serve features across multiple branches of the pipeline, allowing for shared computation and optimizations. While different feature stores vary in their functionality, they typically manage the following:
- Serving features to meet varying query latency requirements — Features are usually placed in both a fast “online store” (to query during inference) and durable “offline store” (to query during training).
- Making features composable and extensible — Once a feature is defined, it should be easy to connect it to downstream models, derive additional features from it, or redefine the feature’s schema or featurization function.
- Maintaining features derived from real-time data — Maintaining features is resource intensive, but stale features can negatively affect prediction performance.”
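The online/offline split in the definition above can be made concrete with a small sketch. This is a toy, in-memory illustration only, not how any particular feature store (including Molecula's) is implemented. The class name `ToyFeatureStore`, its methods, and the feature name `purchases_7d` are all hypothetical, chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout for the offline history:
# (timestamp, entity_id, feature_name, value)
Row = Tuple[float, str, str, float]


@dataclass
class ToyFeatureStore:
    """Illustrative stand-in for an online/offline store pair (assumption, not a real API)."""

    # Online store: newest (timestamp, value) per (entity_id, feature_name), read at inference.
    online: Dict[Tuple[str, str], Tuple[float, float]] = field(default_factory=dict)
    # Offline store: append-only history, read when building training sets.
    offline: List[Row] = field(default_factory=list)

    def write(self, ts: float, entity_id: str, name: str, value: float) -> None:
        # Every write is kept in the durable history.
        self.offline.append((ts, entity_id, name, value))
        # The online copy is only replaced by a value that is at least as recent.
        key = (entity_id, name)
        current = self.online.get(key)
        if current is None or ts >= current[0]:
            self.online[key] = (ts, value)

    def get_online(self, entity_id: str, name: str) -> Optional[float]:
        # Single dictionary lookup: the low-latency path used during inference.
        hit = self.online.get((entity_id, name))
        return None if hit is None else hit[1]

    def get_training_rows(self, name: str, as_of: float) -> List[Row]:
        # Historical scan: only values known at or before `as_of` are returned,
        # so a training set never sees data from its own future.
        return [row for row in self.offline if row[2] == name and row[0] <= as_of]


store = ToyFeatureStore()
store.write(1.0, "user_1", "purchases_7d", 3.0)
store.write(2.0, "user_1", "purchases_7d", 4.0)

latest = store.get_online("user_1", "purchases_7d")
history = store.get_training_rows("purchases_7d", as_of=1.5)
```

Here `latest` is the most recent value for the entity, while `history` contains only the row written at timestamp 1.0. A production system would replace the dictionary with a low-latency key-value store and the list with durable columnar or warehouse storage, which is where the freshness, latency, and cost tradeoffs arise.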
Most feature stores today are built atop existing technologies. This creates constraints that can be problematic when serving feature store functions, so these systems have had to trade off data freshness, latency, and cost. Part of the challenge in scaling feature stores in production is that we continue to operate on information-era databases and formats designed for human-centric data. The systems that are typically used include:
How the Feature Store Works in Molecula
Molecula has introduced a platform designed to automatically convert all data into features and to sit at the intersection of Data Engineering and Data Science. We believe that Data Engineers are the key to correcting the imbalance between the work spent on our data and the value we extract from it. Data Engineers will be the modern corporate heroes when they can use feature stores to move away from deploying infrastructure for every single project and instead spend their day delivering model-ready data to the business.
When you can automate extraction of features directly from raw data sources and store only the features in a purpose-built feature storage system, all of your most important data (think customers, patients, inventory, supply chain, parts, etc.) is already in a computationally efficient form. Using this feature-first format as the basis for all workloads, you get huge cost, security, and performance benefits. Features are amazing and they absolutely will power the future of machine intelligence, but when you power your features with a data format designed specifically for ML, the benefits are exponential. Because of this, feature stores will become a foundational component of the entire ML lifecycle.