One Company’s Secret to Segmenting Millions of Customer Records INSTANTLY

By: Molecula

Customer data footprint reduced from 700GB to <70GB.

350 million records queried in 9 milliseconds!

 

A large ad-tech company that provides a one-stop customer data platform for internal stakeholders struggled to gain valuable insights about its customers. A primary cause was that its data was split among several different time-series and reference datasets. Despite having a treasure trove of actionable data, along with multiple data warehouses and pipelining tools, the company could not match profiles across datasets, so it missed opportunities to better segment and analyze its data. Even when it could match profiles and segment customers, the queries took so long to run that the results were irrelevant by the time they returned (and cost a fortune!).

Implementing FeatureBase, Molecula’s database for real-time decisions, within the company’s infrastructure made it possible to run complex queries across more than 350 million customer records in milliseconds. Let’s walk through how:

 

The Existing Schema and Opportunity

With a strong emphasis on regulatory compliance and privacy, the company shared data in which emails were obfuscated with SHA-256 hashing. With the pre-existing data schema, ingestion could be completed via batch or via streaming through a common carrier such as Kafka. Sample dataset fields included:

  • Geographic ZIP information
  • Advertisement IDs
  • Device type
  • Browser type
  • URL visits 
  • Demographic data consisting of 60+ fields
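
The email obfuscation mentioned above can be sketched in a few lines. This is a minimal illustration only: the function name `hash_email` and the normalization step (trimming and lowercasing before hashing) are assumptions, not details of the customer's pipeline.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of a normalized email address.

    Normalization (strip + lowercase) is an assumption made here so that
    the same address always maps to the same join key.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two spellings of the same address produce the same 64-character key.
key_a = hash_email("Jane.Doe@example.com")
key_b = hash_email("  jane.doe@example.com ")
```

Because the digest is deterministic, the hashed value can serve as a shared key for matching profiles across datasets without exposing the raw address.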

The company believed stitching this data would inform platform users of advertisements viewed by target segments. They also wanted to refine targeting by geographic region, which has traditionally proven challenging.

Updated Data Schema:

[Image: updated data schema]

Using FeatureBase to Localize and Segment JOINs Based on Time

The ad-tech company was impressed that FeatureBase could not only link unique IDs or factual information about individuals and companies across datasets but also validate those links by associating each ID with a specific timestamp. With FeatureBase’s multi-layered, crosslinking functionality, the customer localizes and segments specific JOINs based on time. For example, a record can be associated with a specific timestamp, providing the flexibility to search through billions of records at granular windows of time. Because this kind of query involves a live JOIN, it traditionally requires preaggregating the time-window portion of the data; with FeatureBase’s highly performant, low-latency format, the JOIN is performed directly within the query.
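
To make the idea of a time-scoped JOIN concrete, here is a small plain-Python sketch. It is not FeatureBase code: the table contents, the IDs, and the `time_scoped_join` helper are all hypothetical, and the half-open window `[start, end)` is an assumption.

```python
from datetime import datetime

# Hypothetical reference data: hashed ID -> company.
audience_facts = {
    "id_1": "Acme",
    "id_2": "Globex",
    "id_3": "Acme",
}

# Hypothetical time-series data: (hashed ID, event timestamp).
ad_serving = [
    ("id_1", datetime(2021, 3, 1, 12, 0)),
    ("id_1", datetime(2021, 3, 5, 9, 30)),
    ("id_2", datetime(2021, 3, 10, 18, 45)),
    ("id_3", datetime(2021, 4, 2, 8, 0)),
]


def time_scoped_join(events, facts, start, end):
    """Join events to facts on ID, keeping only events in [start, end)."""
    return [
        (record_id, ts, facts[record_id])
        for record_id, ts in events
        if start <= ts < end and record_id in facts
    ]


matches = time_scoped_join(
    ad_serving,
    audience_facts,
    start=datetime(2021, 3, 2, 3, 0),
    end=datetime(2021, 3, 23, 3, 0),
)
```

The window is applied at query time, so nothing has to be preaggregated per time range; only the two events that fall inside the window survive the join.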

 

For a more detailed look, here are a couple of sample queries the customer found valuable:

 

[Audience Facts] Groupby(Company, filter=Distinct(Row(Device Platform='iOS Mobile',
              from='2021-03-02T03:00', to='2021-03-23T03:00'),
              field=ip address, table=Ad Serving))

The above query performs a Groupby of companies from the “Audience Facts” table and filters on the separate “Ad Serving” table, searching for a specific platform within a specified time range. To be more granular: the platform in the query above was “iOS Mobile,” and the goal was to associate the IP ranges of those devices with particular companies of interest in real time.
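
One plausible reading of the query above, emulated in plain Python, is a two-step operation: collect the distinct IP addresses from “Ad Serving” that match the platform and time window, then group the matching “Audience Facts” records by company. The sample rows, the use of the IP address as the link between tables, and the half-open time window are assumptions made for illustration; this is not FeatureBase syntax.

```python
from collections import Counter
from datetime import datetime

# Hypothetical "Ad Serving" rows: (ip address, device platform, timestamp).
ad_serving = [
    ("10.0.0.1", "iOS Mobile", datetime(2021, 3, 4, 10, 0)),
    ("10.0.0.1", "iOS Mobile", datetime(2021, 3, 6, 11, 0)),
    ("10.0.0.2", "Android", datetime(2021, 3, 7, 12, 0)),
    ("10.0.0.3", "iOS Mobile", datetime(2021, 3, 25, 9, 0)),
    ("10.0.0.4", "iOS Mobile", datetime(2021, 3, 12, 14, 0)),
]

# Hypothetical "Audience Facts" rows: ip address -> company.
audience_facts = {
    "10.0.0.1": "Acme",
    "10.0.0.2": "Acme",
    "10.0.0.3": "Globex",
    "10.0.0.4": "Globex",
}

start = datetime(2021, 3, 2, 3, 0)
end = datetime(2021, 3, 23, 3, 0)

# Step 1: distinct IPs seen on the target platform inside the time window.
distinct_ips = {
    ip
    for ip, platform, ts in ad_serving
    if platform == "iOS Mobile" and start <= ts < end
}

# Step 2: group the matching Audience Facts records by company.
counts_by_company = Counter(
    company for ip, company in audience_facts.items() if ip in distinct_ips
)
```

In this toy data, the repeated IP is counted once, the Android row is excluded by platform, and the late-March row is excluded by the time window.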

[Ad Serving] TopK(Row(Ad Place=Ad URL))

The above query is a TopK, an ordering operation that returns the top ad URLs across all records. Note: with FeatureBase, this query is scalable and performant even across billions of records! These queries are excellent examples of how the customer reduced preaggregation and used FeatureBase to discover and validate customer segments in milliseconds.
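
For readers unfamiliar with the operation, the semantics of a TopK can be sketched with Python's standard library. The URLs and the `top_k` helper are hypothetical, and this shows only what the result means, not how FeatureBase computes it at scale.

```python
from collections import Counter

# Hypothetical "Ad Serving" records, reduced to the ad URL field.
ad_urls = [
    "example.com/ad/shoes",
    "example.com/ad/hats",
    "example.com/ad/shoes",
    "example.com/ad/coats",
    "example.com/ad/shoes",
    "example.com/ad/hats",
]


def top_k(values, k):
    """Return the k most frequent values with their counts, highest first."""
    return Counter(values).most_common(k)


top_two = top_k(ad_urls, k=2)
```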

 

Reducing Data Footprint and Compute

With over 700GB of compressed data, the customer initially struggled to extract value. Data cardinality and scale made the original cloud data warehouse and pipelining tool slow and expensive on every query. FeatureBase enabled them to search across datasets, count intersections, and reduce preaggregation, all at ultra-low latency!
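
One common way to make intersection counts cheap is a bitmap-style index, where each segment is stored as a set of bits, one per record. The sketch below illustrates that general technique with Python integers. The article does not describe FeatureBase's internals, so treat this as an assumption-laden illustration; the helper names and record IDs are hypothetical.

```python
def to_bitmap(record_ids):
    """Pack a collection of non-negative record IDs into an int bitmap."""
    bitmap = 0
    for record_id in record_ids:
        bitmap |= 1 << record_id
    return bitmap


def count_intersection(bitmap_a, bitmap_b):
    """Count records present in both bitmaps with one AND and a popcount."""
    return bin(bitmap_a & bitmap_b).count("1")


# Hypothetical segments: record IDs of iOS users and of one ZIP region.
ios_users = to_bitmap([0, 2, 3, 7, 9])
zip_region = to_bitmap([2, 4, 7, 8, 9])

overlap = count_intersection(ios_users, zip_region)
```

Here the overlap is computed from a single bitwise AND rather than a row-by-row comparison, which is why this kind of layout suits segment intersection counts.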

With FeatureBase, the customer reduced their storage requirements by 10X and minimized compute consumption, all while accomplishing their primary objective: using the data to solve a real-time customer segmentation challenge.

 

Watch our webinar to learn more:

Real-Time Segmentation to Improve Customer Experience
