Enhance Snowflake for Real-Time Analytics with FeatureBase

By: Molecula

What is Snowflake?

Snowflake is a cloud data warehouse that can run on Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. Fundamentally, Snowflake was born out of the need to migrate legacy data warehouses to the cloud for reduced data storage costs. Snowflake’s breakthrough in the data warehousing market was driven by the concept of separating storage and compute, which allows for greater pricing flexibility and pay-per-second billing. This pay-per-use model is a paradigm shift in how organizations typically interact with data warehouse vendors. 

More than a cloud data warehouse, Snowflake has evolved into an intelligent analytics platform that powers a variety of use cases. However, even with the numerous benefits Snowflake offers, delivering on real-time use cases remains out of reach.

Why Enable Real-Time Capabilities with Snowflake?

The definition of real time can vary depending on the business outcome or use case. This malleable definition can create problems if supporting data is not fresh, low-latency, and actionable. For example, when implementing solutions to optimize real-time customer experience, real-time means serving results within seconds, and more often milliseconds, to mission-critical applications with high concurrency. 

As organizations progress in analytical maturity, from reporting trends to predicting and prescribing real-time business decisions, they typically see Snowflake consumption costs increase. In addition, achieving real-time decisions may require other technologies and tools, many of which only operate on the data stored within Snowflake. As a result, organizations are often unable to perform impactful real-time analytics: Snowflake’s traditional RDBMS data structure forces workarounds, like preaggregation, that achieve low latency at the cost of data freshness.

Reasons to consider enhancing Snowflake: 

  • Latency hinders analytics. Complex queries, such as those with unions or multiple JOINs, and many concurrent users running queries simultaneously can bog down performance. Lag times between queries and results may reach several minutes, or queries may time out and return no results at all. To work around system limitations and reduce this lag time, compute resources may need to be increased, or preaggregation tables may be developed and refreshed as new data becomes available. Unfortunately, this process can add minutes, hours, or even days between when data is generated and when it is available for analysis. 
  • Cost is impacted by the scale of data, the complexity of queries, performance requirements, and the added storage and processing of preaggregated copies. In addition to the computing costs on large volumes of data, scaling up performance or adding technologies to overcome freshness and latency issues may further increase costs.
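
The freshness trade-off of preaggregation can be illustrated with a minimal sketch. It is plain Python, not Snowflake or FeatureBase code; the class `PreaggregatedTotal` and its methods are hypothetical names chosen for illustration, and timestamps are simple numbers of seconds.

```python
from dataclasses import dataclass, field


@dataclass
class PreaggregatedTotal:
    """Hypothetical model of a preaggregation table holding one running total."""

    events: list = field(default_factory=list)  # raw rows: (timestamp_s, amount)
    cached_total: float = 0.0                   # value served to fast queries
    refreshed_at: float = 0.0                   # time of the last rebuild

    def ingest(self, timestamp_s: float, amount: float) -> None:
        # New raw data arrives; the cached aggregate is NOT updated here.
        self.events.append((timestamp_s, amount))

    def refresh(self, now_s: float) -> None:
        # Rebuild the aggregate from all raw rows (the expensive step).
        self.cached_total = sum(amount for _, amount in self.events)
        self.refreshed_at = now_s

    def read_cached(self) -> float:
        # Low latency, but only as fresh as the last refresh.
        return self.cached_total

    def read_fresh(self) -> float:
        # Always current, but scans every raw row on each query.
        return sum(amount for _, amount in self.events)

    def staleness(self, now_s: float) -> float:
        # Seconds elapsed since the cached value was last rebuilt.
        return now_s - self.refreshed_at


table = PreaggregatedTotal()
table.ingest(0, 10.0)
table.refresh(now_s=60)   # aggregate rebuilt at t = 60 s
table.ingest(90, 5.0)     # new data lands after the refresh
```

In this sketch, `table.read_cached()` still returns the total from the last refresh while `table.read_fresh()` includes the row ingested at t = 90 s. The gap between the two is the data freshness that preaggregation gives up in exchange for fast reads.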

FeatureBase + Snowflake:

Snowflake is an extremely powerful tool for enabling business intelligence initiatives, but it may benefit from specific enhancements because it tends to slow down when two or more of the following are true:

  • Queries become too complex 
  • Queries are highly concurrent 
  • Data ingestion volume is large
  • Computing on billions of records

Molecula’s FeatureBase addresses these challenges. FeatureBase is a feature-oriented database service purpose-built for real-time analytics and machine learning. It continuously extracts and updates features from Snowflake and other data sources without the need for staging or preaggregation. This allows Snowflake to serve real-time applications and analytics more efficiently than it could alone. 

How they work together: 

  • Snowflake: Ingest all data to maintain a Single Source of Truth (SSoT) and power analytics where low latency is unnecessary.
  • FeatureBase: Ingest large volumes of structured and semi-structured data very quickly, like streaming sources and IoT device logs, via SQL or Change Data Capture (CDC), to serve real-time or low-latency applications.
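
The division of labor above can be sketched as a simple routing rule. This is an illustrative sketch only: the function `choose_engine` and the cutoff `REAL_TIME_BUDGET_MS` are assumptions made for this example, not part of any Snowflake or FeatureBase API, and the returned strings are labels rather than client calls.

```python
# Hypothetical cutoff: a latency budget at or below this many milliseconds
# is treated as a real-time workload. The right value depends on the use case.
REAL_TIME_BUDGET_MS = 1_000


def choose_engine(latency_budget_ms: int) -> str:
    """Return which system a workload would be routed to in this sketch."""
    if latency_budget_ms <= 0:
        raise ValueError("latency budget must be positive")
    if latency_budget_ms <= REAL_TIME_BUDGET_MS:
        # Real-time or low-latency application: serve from FeatureBase.
        return "featurebase"
    # Analytics where low latency is unnecessary: query the Snowflake SSoT.
    return "snowflake"


# A personalization lookup with a 50 ms budget vs. a nightly report.
lookup_target = choose_engine(50)
report_target = choose_engine(15 * 60 * 1000)
```

Here the 50 ms lookup is routed to FeatureBase and the report with a 15-minute budget is routed to Snowflake. A real deployment would likely weigh concurrency and data volume as well as latency.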

FeatureBase + Snowflake enables organizations to graduate beyond human-scale queries and into real-time, high-concurrency queries for machine-scale analytics. 

FeatureBase + Snowflake Benefits:

  • Access the data you need at the moment you need it 
  • Improve data freshness by operating on your most up-to-date and relevant data, then store it within the Snowflake SSoT
  • Reduce latency and costs by computing on data in its most optimized format

 

Architecture:

Fig. 1: Diagram illustrating the benefits of Snowflake and FeatureBase together

Learn more about how FeatureBase enables real-time analytics!

 
