[ fee-cher ] n. A feature is an individual, measurable property, attribute, or characteristic of an observed phenomenon that serves as a computationally efficient input variable for a given system or model.
The Origin of Features
The technique of using features was pioneered by data scientists who needed to prepare data for demanding machine learning and AI workloads. Features have historically been extracted from source data in a process called feature extraction, which is one of the initial steps in the overall feature engineering lifecycle. Feature extraction has typically been a manual process, performed by data scientists once they have the source data, which data engineers usually export from IT databases in an ad hoc process. The data is then almost always moved to the data scientist's laptop for processing, which consists of several steps, including data preparation, feature ranking, feature selection, feature transformation, and feature reuse.
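Of the steps listed above, feature ranking and feature selection are the easiest to illustrate in a few lines. The sketch below is only one possible approach, not a description of any particular tool: it scores each numeric feature by the absolute Pearson correlation with a target and keeps the top k. The function names (`pearson`, `rank_features`, `select_top`) and the sample values are hypothetical and chosen for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal under this score
    return cov / sqrt(var_x * var_y)


def rank_features(features, target):
    """Feature ranking: sort features by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top(features, target, k):
    """Feature selection: keep the names of the k highest-ranked features."""
    return [name for name, _ in rank_features(features, target)[:k]]


# Hypothetical, made-up sample: three candidate features and a binary target.
features = {
    "age": [22, 38, 26, 35, 54],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86],
    "constant": [1, 1, 1, 1, 1],
}
target = [0, 1, 1, 1, 0]

selected = select_top(features, target, k=2)
print(selected)
```

Correlation is a deliberately simple ranking score; in practice the scoring function is a design choice, and the same rank-then-select structure applies with other measures.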
A feature, in its purest form, is information. It is an attribute or unique variable. A feature represents the presence of a particular attribute for a given record with 100% fidelity to the underlying dataset. A feature, which is conceptually interchangeable with a “column” in tabular data, represents a measurable piece of data that can be used for analysis: Name, Age, Sex, Fare, and so on.
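As a minimal sketch of the "feature as column" idea, the snippet below pulls one feature out of a handful of records and encodes the presence of an attribute value as 1 or 0 per record. The records are invented for illustration; only the field names (Name, Age, Sex, Fare) come from the text, and the helpers `column` and `presence` are hypothetical.

```python
# Hypothetical records; the values are made up, the field names follow the text.
records = [
    {"Name": "Alice", "Age": 29, "Sex": "female", "Fare": 71.28},
    {"Name": "Bob", "Age": 41, "Sex": "male", "Fare": 7.25},
    {"Name": "Carol", "Age": 35, "Sex": "female", "Fare": 53.10},
]


def column(rows, name):
    """Return one feature (column) as a list with one value per record."""
    return [row[name] for row in rows]


def presence(rows, name, value):
    """Encode 'this record has this attribute value' as 1 or 0 per record."""
    return [1 if row[name] == value else 0 for row in rows]


ages = column(records, "Age")                   # numeric feature
is_female = presence(records, "Sex", "female")  # presence feature

print(ages)
print(is_female)
```

Each resulting list lines up with the records by position, so a feature can be read, compared, or combined without touching the other fields of the record.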
Because the process of generating features has required considerable manual effort, only a small subset of the original data is usually extracted into features.
It is clear that the world is waking up to the power of features as the raw ingredient that fuels the entire ML lifecycle, but we haven’t yet made the full mindset shift from “data as fuel” to “features as fuel.” We continue to move data to laptops, manually extract features, and then fight IT to figure out how to productionize our work. Worse, we are storing these features in data stores, causing unnecessary latency and inefficiency.
How Features Work in Molecula
With Molecula, we have invented a technology called automated feature generation, which converts semi-structured and fully structured data into features. Once extracted, features are stored in a feature-first format that is managed inside Molecula’s feature store. This process is especially effective on datasets that are terabyte-scale and/or generate millions to billions of events per day.
When features are stored in and retrieved from Molecula’s feature store, they can securely serve analytical, high-concurrency workloads in milliseconds while occupying a 60-90% smaller footprint than the data they represent. This performance, which traditional information-era systems cannot achieve, allows transformations and joins to happen directly in your model, either in training or production.