2021 Reflections, Top 5 Data & Computing Predictions for 2022
2021 is coming to a close, and with it, a fantastic year full of change and inspiration. Here at Molecula, 2021 served as an inflection point: it allowed us to better understand our customers’ pain points and needs while propelling us toward a clear vision of our future state, and nothing could be more exciting.
First, let’s cover a few of our highlights…
Molecula’s Greatest Hits: 2021
Molecula was included not once, not twice, but three times in key Gartner reports, including “Cool Vendors in Data Management,” “Hype Cycle for Data Science and Machine Learning,” and “Feature Stores for Machine Learning.”
We joined the AI Infrastructure Alliance! This aligns us with best-of-breed infrastructure companies working towards a composable, canonical AI infrastructure that companies of all shapes and sizes can access and employ.
Our team grew from about 20 people to nearly 70!
We built and began launching customers on a cloud version of FeatureBase to serve as Molecula’s platform for all future endeavors.
While these accomplishments came with their fair share of growing pains, the effort that went into each and every one of them makes each win that much sweeter. But we’d be remiss if we thought our achievements were the only relevant ones. The entire advanced analytics, machine learning, AI, cloud infrastructure, and general data landscape made massive strides over the last year. Nevertheless, it’s clear that we’re just getting started. So with that, here are some of our thoughts, ideas, and predictions for the coming year (as influenced by many of our brilliant friends, mentors, thought leaders, and others).
Here’s a list of our top 5 big data and computing predictions for the coming year:
Power Shifts in the Cloud:
Businesses will take full advantage of cloud-native architectures
No one says it better than Erik Bernhardsson in his recent blog post “Storm in the Stratosphere: How the Cloud Will Be Reshuffled.” Cloud vendors have held far too much power over the last decade, but, as we like to say, startups are starting to out-cloud the cloud with the cloud. We’ve been saying it for years: most current cloud strategies are really “lift and shift,” meaning that while optimizations have occurred, software has not been built intentionally for the cloud and therefore does not capitalize on everything cloud-native architectures offer. With Snowflake leading the charge, this is starting to change, and we believe it will be a continuing theme in the coming years. Gone are the days of moving every piece of your on-prem infrastructure into the cloud and essentially recreating what you had — welcome to the new age: one where storage and compute are fully separated, and each can be scaled and optimized independently. Stay tuned…
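To make the storage/compute separation concrete, here is a minimal, purely conceptual sketch (hypothetical class names, not any vendor’s API): storage is a shared, durable layer, while compute is a pool of stateless workers that can be added or removed without moving any data.

```python
class ObjectStore:
    """Stands in for durable cloud storage (think an S3-style bucket)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class ComputeWorker:
    """Stateless worker: it holds no data of its own, so the compute pool
    can grow or shrink independently of the storage layer."""

    def __init__(self, store):
        self.store = store

    def aggregate(self, key):
        # All workers read from the same shared storage.
        return sum(self.store.get(key))


store = ObjectStore()
store.put("metrics/day1", [3, 5, 7])

# Scale compute from one worker to three without touching storage.
workers = [ComputeWorker(store) for _ in range(3)]
results = [w.aggregate("metrics/day1") for w in workers]
print(results)  # [15, 15, 15]
```

Contrast this with a lift-and-shift design, where each node bundles its own disk and CPU: there, adding compute means copying or resharding data, which is exactly the coupling cloud-native systems avoid.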
Abstracting Away the Nitty Gritty:
Simplified data infrastructure will triumph thanks to low code/no-code tools
We first saw this with the rise of serverless (a cloud-native development model that lets developers build and run applications without managing servers). Over the last year, we’ve seen continuous advances towards abstracting away the engineering-heavy aspects of data infrastructure and inviting new teams into the mix. Alongside serverless, we’ve seen the rise of low-code/no-code tools, with particular attention paid to Databricks’ platform and Amazon SageMaker Canvas. Both let users ingest and explore data and build ML models that run on that data through a drag-and-drop interface. With these advances in abstraction, new audiences for data products have emerged. We believe these shifts will enable the merging of business and data knowledge needed to properly implement advanced analytics, AI, and ML in a generalized way.
The Year of Convergences:
Data warehouses vs. data lakes, data scientists vs. data engineers – it’s all coming together
Over the last year, we’ve heard many convergence stories: the data warehouse is converging with the data lake, the transactional database is converging with the analytical database, the data engineer’s role is converging with the data scientist’s, and so on. As additional specialized tools emerge, the big players in the space are trying to cover more of what any single organization might need. The best example is the convergence between Databricks and Snowflake. Databricks started on the data lake side, mainly dealing with unstructured data without necessarily providing the warehousing capabilities organizations need for analytical workloads; meanwhile, Snowflake focused on the data warehouse role, serving structured and semi-structured data for analytical purposes. This past year, Snowflake worked overtime to add support for unstructured data to its toolbox. At the same time, Databricks hardened its Delta Lake offering, providing structured and even pre-aggregated tiers of data within its platform for analysis. Similarly, ClickHouse Co-founder and CEO Aaron Katz has often noted that NoSQL databases like MongoDB are adding SQL functionality, while SQL databases are starting to allow less structured queries.
The Edge Gets Edgier:
Edge computing will make one hell of a comeback
Market leaders are focusing more and more on expanding edge coverage and compute. During re:Invent, AWS announced 30 new Local Zones worldwide, along with new ways to connect to what it calls the “Internet of a Billion Things.” Before COVID-19, more and more companies were leaning into edge computing, establishing small, remote data centers in towns and cities like Socorro, New Mexico, to provide ultra-low-latency data access. But when COVID hit, people stayed home, and the urgency around edge computing ebbed. As the world returns to pre-COVID patterns, edge computing will re-emerge.
Efficiency will Reign Supreme:
Organizations with an advanced analytics strategy will unlock the power of AI/ML
For years, we’ve all heard that machine learning and AI will take over the world. In one version of the story, machines become our overlords while humans are relegated to cubicle-style rooms, never to emerge again. In the other, we achieve utopia: machines automatically know everything you want and need, making life simpler and more enjoyable. These stories have made headlines for years, but neither has come close to fruition because we do not yet have AI that can be implemented at scale. Why has this been so hard to achieve? The FAANGs make it seem possible, at times even easy, but unlike most organizations, the FAANGs do not have to care about efficiency, so they’ve set standards and precedents others cannot reach. Most organizations will not be able to implement advanced analytics and AI until the value those tools unlock exceeds the effort and resources put into the process. We believe the coming years will bring a push towards efficiency, not just performance, speed, and scale.
2021 has been a year of extreme acceleration on many fronts. We can only imagine it will reach stratospheric levels in 2022, and as we look ahead, we could not be more excited about what the future holds. Still, it’s tough to truly appreciate where we are today without taking a quick look into the past and understanding the strides we’ve made in the world of data infrastructure and computing. We invite you to read this Brief History of Data as a reminder, and if you enjoy it, download our Feature-First Field Guide.