Molecula’s Latest Product Release (4.0): Consistency, Scalability


Bring All of Your Data, No In-Memory Limits Here!


March 10, 2021 – Our most recent product release focuses on both consistency and the ability to scale elegantly. It delivers a proprietary transactional backend that enables our customers to work with datasets larger than memory, along with enterprise-grade cluster management that lets users confidently react to and navigate hardware-related failure scenarios. Customers will also benefit from massively reduced system resource usage when using features like time quantums, which were previously prohibitively expensive at fine granularity and large data sizes.

Why is this important? Most in-memory data systems require you to decide which data is put “in memory.” This severely limits your work, and your ability to tackle multiple business problems at once, because memory typically tops out at gigabytes, while your business is likely collecting (and needs access to) terabytes or petabytes of data. Compounding this limitation, raw data is often 10x larger or more than the compute-optimized format used by Molecula. Current in-memory technologies force you to think long and hard about which data you will ingest and work with.

Molecula’s latest FeatureBase release, by contrast, uses an optimized in-memory format that enables high-performance in-memory processing while maintaining homomorphic compression (operations run directly on the compressed data, which makes extremely efficient use of your hardware).

Molecula’s proprietary data format and technology already reduce the data footprint by a typical 10x compared to other systems, so customers rarely hit memory limits. However, we are seeing more and more use cases that involve significantly more data, typically time-series use cases or high-volume use cases with large amounts of historical data. Both types of use case will benefit from the ability to scale beyond memory.

To put it in the simplest terms: bring all of your big data, because memory is no longer a limit!

Now let’s dive into some of the specific functionality arriving with this release:

FeatureBase’s Proprietary, Patent-Pending Transactional Backend

Our engineering team has built a proprietary, ACID-compliant transactional backend for low-level feature storage. This new storage engine allows customers to work with datasets that are larger than memory while maintaining blazing-fast read and write performance. There is no hard limit on how far a dataset can scale beyond memory, as long as the instance is configured to support it. The engine also handles memory allocation and management efficiently, reducing memory usage and freeing up CPU, which makes FeatureBase’s operational cost even more competitive.

Enterprise Grade Cluster Management

In this release, we have enhanced the platform’s cluster coordination and state management so that it recovers gracefully from scenarios such as hardware failures and network partitions. FeatureBase now embeds etcd, a distributed key-value store renowned for its reliability, to maintain and coordinate cluster state; projects including Kubernetes and Cloud Foundry are built on or rely on etcd. With this new implementation, our customers can prevent, troubleshoot, and respond to hardware failure scenarios with confidence that their cluster will remain stable.

UX Enhancements

While our focus with this release was on consistency and scalability, we have also added query and operational enhancements to make users’ lives easier:

  • Automatic generation of record IDs: when ingesting into FeatureBase, each record must be associated with a key; FeatureBase can now generate keys automatically for datasets that do not have them.
  • Added query functionality:
    • Query autocomplete in the UI
    • Exact, deterministic top values for a given filter
    • Support for percentile computation
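To make "exact and deterministic" top values and percentile computation concrete, here is a generic sketch in Python. It is not FeatureBase's implementation or query API; the function names `top_values` and `percentile`, the in-memory list of dict records, and the nearest-rank percentile definition are all illustrative assumptions. "Exact" here means every matching record is counted, and "deterministic" means ties are broken by the value itself, so repeated queries return the same order.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Sequence


def top_values(records: Iterable[dict], field: str,
               predicate: Callable[[dict], bool], n: int) -> list[tuple[str, int]]:
    """Hypothetical helper: the n most frequent values of `field` among matching records.

    Counts are exact (no sampling), and ties are broken by ascending value,
    so the same input always produces the same ordering.
    """
    counts = Counter(r[field] for r in records if predicate(r) and field in r)
    # Sort by descending count, then ascending value as a deterministic tie-break.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def percentile(values: Sequence[float], p: float) -> float:
    """Hypothetical helper: nearest-rank percentile (smallest value with >= p% of values <= it)."""
    if not values:
        raise ValueError("percentile of empty sequence")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing to limit floating-point error; rank is 1-based.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    records = [
        {"region": "east", "plan": "pro", "spend": 120},
        {"region": "east", "plan": "basic", "spend": 40},
        {"region": "west", "plan": "pro", "spend": 300},
        {"region": "east", "plan": "pro", "spend": 75},
        {"region": "east", "plan": "team", "spend": 90},
        {"region": "east", "plan": "basic", "spend": 60},
    ]
    # "basic" and "pro" tie at 2 in the east region; the tie-break orders them by name.
    print(top_values(records, "plan", lambda r: r["region"] == "east", 2))
    # prints [('basic', 2), ('pro', 2)]
    print(percentile([r["spend"] for r in records], 50))  # prints 75
```

Breaking ties on the value is just one simple way to get a stable answer; an approximate top-N (for example, one that merges per-shard estimates) could return different members or orderings across runs, which is the behavior "exact and deterministic" rules out.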

What does this mean for you?

In short:

  • Added stability when nodes go down and come back up (i.e., outages are far less of a concern)
    • Removes the possibility of getting inconsistent cluster-related information from different nodes
    • Removes the possibility of nodes no longer communicating or syncing data to each other
  • Say goodbye to in-memory limitations! Bring all the big data you’ve got: FeatureBase can now load datasets that are larger than memory.

If you have any questions about our latest release, we’d love to chat! Get in touch here.