FeatureBase—

a feature extraction and storage technology that enables real-time analytics and AI initiatives, making model-ready data accessible, usable, and re-usable across organizations

Using FeatureBase—

Purpose-built feature storage that automatically converts all of your data into a model-ready format.

[Diagram: taps run Feature Extraction against a Topic Stream and a Data Source and feed Feature Storage (Store + Manage Features), which includes Monitoring and Observability, Auth + User Management, and Feature Exploration (Python, SQL, Go). Consume/Output targets: Real-time Decisioning, Application Analytics, Model Training.]

Extract Features

Our taps extract features from raw data at the source, even if your data is spread across many data centers, and pull those features into FeatureBase. Feature extraction can run server-side or client-side; client-side extraction reduces the amount of data transferred. FeatureBase ingests features through continuous updates (streaming) and in batches, pulls from sources such as Kafka, and accepts data formats such as CSV and JSON.
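To illustrate why client-side extraction can reduce the amount of data transferred, here is a minimal Python sketch. It is not FeatureBase's tap implementation: the function name `extract_features`, the `column=value` feature encoding, and the sample CSV are all assumptions made for illustration. It reads raw CSV rows and emits only compact (record ID, feature) pairs for the columns of interest, leaving bulky fields behind at the source.

```python
import csv
import io


def extract_features(csv_text, id_column, feature_columns):
    """Yield (record_id, feature) pairs, where a feature is 'column=value'.

    Hypothetical helper for illustration only; not a FeatureBase API.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        record_id = int(row[id_column])
        for column in feature_columns:
            value = row[column].strip()
            if value:  # skip empty cells; they carry no feature
                yield record_id, f"{column}={value}"


# Made-up raw data: the 'name' and 'notes' columns never leave the source.
raw = """id,name,plan,region,notes
1,Ada,pro,eu,long free-text field that is not needed downstream
2,Bo,free,us,another long free-text field
3,Cy,pro,us,
"""

pairs = list(extract_features(raw, "id", ["plan", "region"]))
for record_id, feature in pairs:
    print(record_id, feature)
```

Only the selected features cross the network in this sketch, which is the general idea behind extracting at the source rather than shipping whole records.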

Store + Manage Features

Features are stored in memory in a high-performance, model-ready format. Users get low-latency feature exploration and can manage users and infrastructure via the control plane. FeatureBase uses purpose-built feature storage rather than cobbling together existing data-centric technologies.

Consume Features

Use our APIs, including HTTP, gRPC (Python), and the Postgres wire protocol (SQL), to serve end-user applications such as real-time decisioning, analytical applications, and model training. Queries and transforms happen in this step, within your model or code.

Product Overview

Core Technology

FeatureBase simplifies, accelerates, and improves control over data to power real-time analytics and AI. FeatureBase is an overlay to conventional big data systems that automatically extracts features, not data, from each of the underlying data sources or data lakes and stores them in one centralized feature storage platform. FeatureBase maintains up-to-the-millisecond data updates with little to no upfront data preparation. This is achieved by reducing the dimensionality of the original data, effectively collapsing conventional data models (such as relational or star schemas) into a highly optimized format.
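One way to picture "collapsing" a star schema is sketched below in Python. This is an illustrative assumption, not a description of FeatureBase internals: the tables, the `collapse` function, and the `customer.tier=gold` naming are hypothetical. The sketch folds a fact table and one dimension table into a single map from each feature to the set of fact-record IDs that have it, so dimension attributes can be filtered on without a join at read time.

```python
from collections import defaultdict

# Hypothetical star schema: one fact table plus one dimension table.
orders = [  # fact rows: (order_id, customer_id, channel)
    (100, 1, "web"),
    (101, 2, "store"),
    (102, 1, "web"),
]
customers = {  # dimension rows keyed by customer_id
    1: {"tier": "gold"},
    2: {"tier": "basic"},
}


def collapse(orders, customers):
    """Fold fact and dimension attributes into feature -> set of order IDs."""
    features = defaultdict(set)
    for order_id, customer_id, channel in orders:
        features[f"channel={channel}"].add(order_id)
        # Dimension attributes are attached directly to the fact record.
        for attr, value in customers[customer_id].items():
            features[f"customer.{attr}={value}"].add(order_id)
    return dict(features)


flat = collapse(orders, customers)
for feature in sorted(flat):
    print(feature, sorted(flat[feature]))
```

After collapsing, a question such as "which orders came from gold-tier customers" becomes a single lookup on `customer.tier=gold`.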

Capabilities—

Feature Extraction

‘AI Ready’ feature storage that continuously extracts and updates features in real-time

Time-Oriented Filtering

Track and filter time at a feature level

Supports ML Workloads

High concurrency queries for machine-scale analytics and ML

Single Point of Access

Centralized, ultra-low-latency access to all of your data

Reduces Footprint

Lossless reduction in data footprint, up to 85%, without copying or moving data

Eliminates Pre-Processing

Performant joins at query time, with no pre-aggregation or pre-processing

Underlay Implementation

Extension framework enabling seamless integration into existing environments

Environment

FeatureBase is beneficial for organizations that need to access large quantities of real-time data events each day joined with many fragmented, terabyte-scale data sources. FeatureBase excels at complex analytical workloads where source data is fragmented across silos, and where a user or machine wants to apply a number of filters or criteria to a query.

Implementation

We have a variety of ingest plugins, including bulk SQL loaders, Kafka connectors (supporting Avro and the Confluent Schema Registry), and change data capture (CDC) plugins. We are constantly adding new ingest plugins, so please ask us about our latest additions. Our experienced customer engineering managers are trained to bring these integrations, customized for your complex data environment, to production during the implementation period.

Format

FeatureBase stores data in a format that extracts features at the original data source and then homomorphically compresses them for transmission and storage. The core format allows granular, feature-by-feature scans, unlike a columnar or tabular data format. This enables breakthrough analytical performance, allowing for unprecedented iteration speed in feature engineering on the totality of large data sets.
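As a rough illustration of what a feature-by-feature scan can look like, the Python sketch below stores one bitmap per feature and answers a multi-criteria filter with a bitwise AND. The bitmap layout, the use of Python integers as bitsets, and the names `build_bitmaps` and `matching_ids` are assumptions made for this example; the source does not specify FeatureBase's actual storage layout or compression scheme.

```python
def build_bitmaps(records):
    """Map each feature to an int whose bit i is set when record i has it."""
    bitmaps = {}
    for record_id, features in enumerate(records):
        for feature in features:
            bitmaps[feature] = bitmaps.get(feature, 0) | (1 << record_id)
    return bitmaps


def matching_ids(bitmap):
    """Return the record IDs whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Made-up records; the list index serves as the record ID.
records = [
    {"plan=pro", "region=eu"},
    {"plan=free", "region=us"},
    {"plan=pro", "region=us"},
    {"plan=pro", "region=us"},
]

bitmaps = build_bitmaps(records)

# A filter on two criteria touches only the two relevant bitmaps.
hits = bitmaps["plan=pro"] & bitmaps["region=us"]
result = matching_ids(hits)
print(result)
```

The query reads only the bitmaps for the features named in the filter, rather than scanning whole rows or columns, which is the property the paragraph above describes.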

Licensing

FeatureBase can be deployed in hybrid environments, and is priced based on your organization’s unique consumption demands.
Molecula also offers a Managed Service option for FeatureBase. This offering is designed for companies that need fast, scalable analytics and don’t have the bandwidth to learn and implement a new solution.

Integrations—

Kafka

Ingest data from a Kafka topic into FeatureBase

Kafka Connect

Ingest data via Kafka Connect into FeatureBase

MySQL

Ingest data from a MySQL database into FeatureBase

SQL Server

Ingest data from a SQL Server database into FeatureBase

Snowflake

Ingest data from a Snowflake data warehouse into FeatureBase

Cassandra

Ingest data from a Cassandra database into FeatureBase

Teradata

Ingest data from a Teradata data warehouse into FeatureBase

Spark

Ingest Spark data streams into FeatureBase

Parquet

Ingest Parquet files into FeatureBase

S3

Ingest files from your S3 instances into FeatureBase

BigQuery

Ingest data from your BigQuery data warehouse into FeatureBase

ODBC Driver

Ingest data from any database into FeatureBase via ODBC

Prometheus

Monitor your instance with Prometheus

Splunk

Monitor your instance with Splunk

Jaeger

Monitor your instance with Jaeger

StatsD

Collect metrics about your instance with StatsD

OpenTracing

Monitor your instance with OpenTracing

Datadog

Monitor your instance with Datadog

Jupyter Notebook

Query your FeatureBase data from your Jupyter Notebook

Pandas

Create Pandas data frames from FeatureBase

Snowflake

Query your FeatureBase data using Snowflake

RStudio

Query your FeatureBase data using RStudio

RAPIDS

Query your FeatureBase data using RAPIDS

gRPC

Leverage our gRPC API to connect custom applications

Grafana

Visualize and query your real-time, mission-critical data


Read more about our product and related resources—

Molecula’s novel approach to data access breaks through the latency floor created by the zoo of legacy data processing technologies, eliminating the need to pre-aggregate, federate, copy, cache or move source data.

Unlock Human Potential with the Power of Real-Time Data: Molecula is an Operational AI company […]

In this paper we introduce a novel approach to data virtualization which is making strides to clear the log jam that has increasingly plagued data beneficiaries for years.

Calculate Your TCO with Molecula—

Molecula's novel approach to data access is game-changing for your machine-scale analytics and AI. By simplifying real-time analytics and AI infrastructure, Molecula can reduce footprint by 60-90% and save you time, resources, and headaches. Come see for yourself.