Is a Feature-Oriented Database Right for You? 9 Questions You Should Be Asking

“Any sufficiently advanced technology is indistinguishable from magic.”

-Arthur C. Clarke, Author and Futurist

FeatureBase, Molecula’s feature-oriented database, excels at complex analytical workloads where source data is fragmented across silos and where a user or machine wants to apply a number of filters or criteria to a query. Organizations that have billions of real-time data events joined with many fragmented, terabyte-scale data sources are the ones that benefit the most from embracing a feature-oriented architecture.

Wondering if you would benefit from a feature-oriented database? Ask yourself and your team the following: 

  • Do we have a single access point that powers all AI and ML projects?
  • Do we have to build new ETL pipelines and infrastructure for each project?
  • Are our queries taking seconds, minutes, or longer to return results?
  • Is the data we are analyzing real-time and fresh, or is it stale?
  • Are we having challenges with joining historical and streaming data?
  • What if we could speed up data processing by a factor of 10? 100? 1,000?
  • What if we could process our current overnight batches instantly, even fast enough to populate our UI in real time?
  • What ideas have we not considered because it would just be too hard to process the data with our current technology?
  • Once we train a model, does it take us months to get it into production?

Your answers to the above questions will start the discussion you should be having with your team and give you an idea of whether or not embracing a feature-oriented mindset could transform your business now or in the future.

A Look at Feature-Oriented Architecture in the Field

FeatureBase is Molecula’s real-time decision database product and is the best way we know of to embark on a feature-oriented database journey. It stores data in a highly efficient format designed specifically for machine-scale workloads. The format is efficient enough that complex queries and real-time computations can run directly on the stored data. In other words, it eliminates the need for preaggregation.

How Does a Feature-Oriented Database Work?

Let’s start at the beginning, with implementation. FeatureBase functions as an overlay to your existing data architecture (it lives between the data storage layer and the application layer). This means you do not need to disrupt your existing data collection, storage, backup systems, etc., to reap the benefits of a feature-oriented database right off the bat. It’s worth noting that after implementing FeatureBase, most organizations are able to eliminate much of their project-oriented infrastructure deployments (such as data warehouses, ETL pipelines, and batch processing) since they no longer need to make copies for preaggregation, but that’s a bonus, not the main attraction.

Once implemented, FeatureBase’s job is to monitor all of your data quietly behind the scenes and automatically convert and store it in a machine-centric format. All of it. All the time. FeatureBase utilizes a variety of ingest plugins, including bulk SQL loaders, Kafka connectors (supporting Avro and the Confluent schema registry), and change data capture (CDC) plugins.

Once data has been compressed into this binary format of 1s and 0s, computations run so much faster that it feels like magic. The format also minimizes the physical space required to store the data: a typical FeatureBase implementation reduces the overall data footprint by 90% or more compared with a traditional database. Of course, the “magic” isn’t actually magic; it is a practical way to get more value out of the same data with less computing power than you use today. Here’s what’s happening behind the scenes:

Figure: FeatureBase feature-oriented database architecture, showing feature extraction and storage

This diagram shows a high-level view of how FeatureBase works. Existing data sources are accessed through taps. The data is then automatically transformed into features before any processing is done. The features are then stored (while being continuously updated) and ready to be instantly put to use. Data scientists can directly access, query, and compute on the features representing 100% of the data without going back to the source.
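To make the idea of computing directly on features more concrete, here is a minimal Python sketch. It is not FeatureBase’s actual implementation. It assumes each feature, a (field, value) pair, is stored as a bitmap in which bit i is set when record i has that value; plain Python integers stand in for real compressed bitmaps, and the field names and record ids are hypothetical. Under that assumption, applying several filters reduces to bitwise AND (and AND NOT), and counting matches reduces to counting set bits.

```python
# Sketch: features as bitmaps, where bit i is set when record i has the feature.
# Plain Python ints stand in for compressed bitmaps (an assumption for illustration).

def bitmap(record_ids):
    """Build a bitmap with bit i set for every record id i."""
    bits = 0
    for rid in record_ids:
        bits |= 1 << rid
    return bits


def members(bits):
    """List the record ids whose bits are set, in ascending order."""
    ids, rid = [], 0
    while bits:
        if bits & 1:
            ids.append(rid)
        bits >>= 1
        rid += 1
    return ids


def popcount(bits):
    """Count set bits (works on any Python 3 version)."""
    return bin(bits).count("1")


# Hypothetical features for six records (ids 0-5).
features = {
    ("region", "west"): bitmap([0, 2, 3, 5]),
    ("plan", "premium"): bitmap([1, 2, 5]),
    ("churned", "yes"): bitmap([0, 2]),
}

# Filter 1 AND filter 2: western customers on the premium plan.
west_premium = features[("region", "west")] & features[("plan", "premium")]

# ...AND NOT churned: mask out the churned bitmap.
west_premium_active = west_premium & ~features[("churned", "yes")]

print(members(west_premium), popcount(west_premium))  # [2, 5] 2
print(members(west_premium_active))                   # [5]
```

Because each additional criterion is a single bitwise operation over whole bitmaps, adding filters to a query adds cheap CPU work rather than another scan of the source rows.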

FeatureBase serves continuous, computable features at the latency and freshness required by demanding AI, ML, and advanced analytics applications, from model development and training through deployment and maintenance. Homomorphic compression is one of the factors that make FeatureBase’s fully computable, highly compressed data format possible. In plain English, the data can be operated on directly, without spending any CPU time decompressing it first.
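The general idea of computing on data while it stays compressed can be illustrated with run-length encoding. This is a stand-in chosen for simplicity; the scheme FeatureBase actually uses is not specified here. In this sketch, a bitmap is a sorted list of half-open runs (start, end) of set bits. Intersection and counting work on the runs themselves, so a run covering a million records costs about the same as a run covering ten.

```python
# Sketch: computing on run-length-encoded bitmaps without decompressing them.
# A bitmap is a sorted list of non-overlapping half-open runs (start, end);
# RLE is an illustrative stand-in, not FeatureBase's documented format.
from typing import List, Tuple

Run = Tuple[int, int]


def intersect_runs(a: List[Run], b: List[Run]) -> List[Run]:
    """AND two RLE bitmaps with a two-pointer merge; the result stays RLE."""
    out: List[Run] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Move past whichever run ends first (b's run on a tie);
        # it cannot overlap any later run in the other list.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def count_runs(runs: List[Run]) -> int:
    """Number of set bits, read directly from the run lengths."""
    return sum(end - start for start, end in runs)


# Records 0-9 and 20-29 in one set; records 5-24 in another.
both = intersect_runs([(0, 10), (20, 30)], [(5, 25)])
print(both, count_runs(both))  # [(5, 10), (20, 25)] 10

# A run over a million records is still a single tuple of work.
big = intersect_runs([(0, 1_000_000)], [(500_000, 1_500_000)])
print(count_runs(big))  # 500000
```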

FeatureBase Feature Lifecycle

Extract Features

Automated taps extract features from raw data at the source, even if your data is spread across many data centers, and pull those features into FeatureBase. Feature extraction can run server-side or client-side; client-side extraction reduces the amount of data transferred. FeatureBase can ingest features continuously (streaming) or in batches, pull from sources such as Kafka, and ingest data formats such as CSV and JSON.
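A client-side extraction step might look roughly like the following sketch. It assumes a CSV source with an integer `customer_id` column used as the record id, and it turns every (column, value) pair into a bitmap. The column names, sample rows, and the `extract_features` helper are hypothetical, and the sketch covers only the CSV case, not JSON or Kafka.

```python
# Sketch: client-side feature extraction from CSV rows into bitmaps.
# Assumes an integer "customer_id" column that serves as the record id;
# the data and column names are hypothetical.
import csv
import io
from collections import defaultdict

RAW_CSV = """customer_id,region,plan
0,west,basic
1,east,premium
2,west,premium
3,west,basic
"""


def extract_features(csv_text):
    """Map each (column, value) pair to a bitmap of the records that have it."""
    features = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        rid = int(row.pop("customer_id"))
        for column, value in row.items():
            features[(column, value)] |= 1 << rid  # set bit rid for this feature
    return dict(features)


feats = extract_features(RAW_CSV)
for key, bits in sorted(feats.items()):
    print(key, format(bits, "04b"))  # e.g. ('region', 'west') 1101
```

Only the resulting bitmaps, rather than the raw rows, would need to leave the client, which is one way extraction at the source can reduce the amount of data transferred.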

Store & Manage Features

Features are stored in memory in a high-performance format suited to model development, training, and production. Users get low-latency exploration of features and can manage access and infrastructure through the control plane. FeatureBase uses a purpose-built feature storage layer rather than stitching together existing conventional technologies.

Consume Features

Consume features through APIs such as HTTP, gRPC (Python), and the PostgreSQL wire protocol for end-user applications, including real-time decisioning, analytical apps, and model training. Shifting queries and data transformation to this phase gives you more control within the relevant model or code.
Because extraction is automatic and continuous, consumption always reflects the freshest, real-time data values, so the same features are suitable for both training and production.

FeatureBase is NOT a Feature Store

You may be familiar with an offering in the AI technology space called a feature store. Several technology vendors have begun promoting feature stores in recent months. A feature store is a repository for persistently storing and managing collections of features extracted from raw data. Similar to FeatureBase, the data in a feature store can be used to create models. Unlike FeatureBase, most reference architectures for feature stores are built using amalgamations of batch, streaming, caching, and traditional database systems.

Feature stores seek to solve typical data engineering and data science problems, such as slow, manual data provisioning through IT; one-off efforts that can’t be reused from project to project; version tracking and lineage; and inconsistent features between training and production. Unfortunately, the infamous problems of speed at large scale continue to haunt data teams, even when a feature store is used.

Figure: Typical feature store diagram

Why?

Because the features themselves are still stored in the same old formats. There are real, physical limits on how fast that data can be retrieved and put to use, which forces tradeoffs among data freshness, latency, and cost.

The concept of a feature store is great, but unless it is paired with a feature-oriented architecture, it will never be as fast, lean, or flexible as it could be.

Feature stores are a centralized place for data scientists to store and manage features used for machine learning. While they solve problems such as feature re-use, they ultimately store the features in traditional, AKA slow, database formats. 

FeatureBase enables a feature-oriented database architecture by reducing the dimensionality of data to its simplest form, extracting only relationships from the data, and employing homomorphic compression. FeatureBase lives between the data storage layer and the application layer, functioning as a database for real-time decisions.

The Business Impact of a Feature-Oriented Database

Here are some examples of how implementing a feature-oriented database such as FeatureBase could impact day-to-day business:

  • Immediate, real-time access to all of your company’s relevant data in a format ready for data scientists to build models now, instead of waiting for data requests to be fulfilled.
  • Ability to preserve legacy infrastructure investments and architectures while also benefiting from a high-performance data format that provides the best price performance.
  • Extreme optimization and collaboration as features can be stored and reused across projects, and across the organization, so that data teams aren’t starting from scratch with each project.
  • Increased margins and competitive differentiation through promoting a feature-oriented culture where data can be used more experimentally by data scientists, positioning your organization to be at the forefront of innovation.

The Future of Compute in the Cloud

Implementing a feature-oriented database is one step closer to ubiquitous AI. Looking to the future, we envision a world where instant computation on all data is comparable to turning on the tap to fill a glass of water or plugging in a device to get electricity. That is when the super-evolution becomes possible. Collecting data from disparate sources and storing it in the cloud is now “business as usual” for data storage. What we do with that data next will define our success.

How will you apply AI to make better decisions in your business?

Figure: AI application layers in machine learning

When data is automatically transformed into features at scale, access becomes fast enough to compute in true real time: transforming, aggregating, and calculating on the data on the fly, without making copies or preprocessing. Imagine this functionality as a new “compute layer” that serves real-time analytics applications the answers they need to make real-time decisions.
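One common way to aggregate numeric values straight from bitmaps, with no preaggregated totals, is a bit-sliced index. Bit k of every value is stored as its own bitmap, so the sum over any filtered set of records is the sum, over k, of 2^k times the number of records set in both slice k and the filter. This is a standard bitmap-index technique used here for illustration. The sketch does not describe how FeatureBase itself stores integers, the spend values and record ids are hypothetical, and it handles non-negative integers only.

```python
# Sketch: on-the-fly SUM over a filtered set using a bit-sliced index (BSI).
# BSI is a standard bitmap-index technique used here for illustration;
# it handles non-negative integers only. Values and ids are hypothetical.

def bitmap(record_ids):
    """Build a bitmap with bit i set for every record id i."""
    bits = 0
    for rid in record_ids:
        bits |= 1 << rid
    return bits


def popcount(bits):
    """Count set bits."""
    return bin(bits).count("1")


def bsi_encode(values):
    """Slice k has bit rid set when bit k of values[rid] is 1."""
    width = max(values.values(), default=0).bit_length()
    slices = [0] * width
    for rid, value in values.items():
        for k in range(width):
            if (value >> k) & 1:
                slices[k] |= 1 << rid
    return slices


def bsi_sum(slices, filter_bits):
    """Sum of values for filtered records: sum over k of 2**k * |slice_k AND filter|."""
    return sum((1 << k) * popcount(s & filter_bits) for k, s in enumerate(slices))


spend = {0: 120, 1: 75, 2: 300, 3: 42, 4: 9, 5: 210}
slices = bsi_encode(spend)

west_premium = bitmap([2, 5])          # e.g. the result of an earlier filter
total = bsi_sum(slices, west_premium)  # 300 + 210
average = total / popcount(west_premium)
print(total, average)  # 510 255.0
```

The total is computed at query time from whatever filter bitmap the query produces, so a new combination of criteria needs no new preaggregated table.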
