The Data Engineer’s Social Dilemma


What do you do when someone asks, “What do you do?”

[Image: Cyanide and Happiness comic, ghost hunter]

Whether you love it or hate it, you’d probably recognize that look on an unsuspecting dinner party victim’s face after you tell them you’re a data engineer. Bewildered, bored, or beguiled, they likely have no idea what you actually “do” for a living. To be fair, you can hardly blame them. As Robert Vargus from the Center for Cognitive Brain Imaging at Carnegie Mellon explains, “most of our understanding of how the brain processes objects and concepts is based on how our five senses take in information.” Data engineering, like ghosts, can be pretty difficult to see, taste, hear, smell, or touch.

Of course, that doesn’t mean it isn’t real. There are about 176,000 people with “data engineer” in their title on LinkedIn, and no doubt countless others who perform data engineering tasks under a different title.

In fact, data engineering is the fastest-growing tech job in the world, thanks to the exploding fields of machine learning (ML) and AI, which require access to massive amounts of data. And while data scientists rightly get a lot of attention for developing the latest predictive models and neural nets, it’s the data engineers who wrangle and harness the raw data necessary to productionize AI and ML. So now that we’ve established that you are real, and that you are important, how do you tell people what you actually do?

What Does a Data Engineer Do?

In a nutshell, a data engineer builds and maintains the systems that ingest, route, store, and serve data across an organization, turning raw data into a usable data product. Data engineers ensure that the data is available, clean, reliable, prepared, and accessible for whatever the use cases require.
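For readers who think in code, here is a minimal sketch of that ingest, clean, store, and serve flow. Everything in it is an assumption made for illustration: the sample events, the `purchases` table, and the function names are hypothetical, and an in-memory SQLite database stands in for whatever storage a real organization would use.

```python
import sqlite3

# Hypothetical raw events, as they might arrive from an upstream source.
RAW_EVENTS = [
    {"user": " alice ", "amount": "19.99"},
    {"user": "bob", "amount": "5"},
    {"user": "", "amount": "3.50"},       # missing user: dropped
    {"user": "carol", "amount": "n/a"},   # unparseable amount: dropped
    {"user": "alice", "amount": "0.01"},
]


def clean(events):
    """Yield (user, amount) pairs, skipping records that cannot be repaired."""
    for event in events:
        user = event.get("user", "").strip()
        try:
            amount = float(event.get("amount", ""))
        except ValueError:
            continue
        if user:
            yield user, amount


def store(rows, conn):
    """Write cleaned rows to a table (an in-memory SQLite database here)."""
    conn.execute("CREATE TABLE IF NOT EXISTS purchases (user TEXT, amount REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    conn.commit()


def serve(conn):
    """Return total spend per user, the 'data product' a consumer would query."""
    cursor = conn.execute(
        "SELECT user, ROUND(SUM(amount), 2) FROM purchases "
        "GROUP BY user ORDER BY user"
    )
    return cursor.fetchall()


def run_pipeline(events):
    """Ingest raw events, clean them, store them, and serve a summary."""
    conn = sqlite3.connect(":memory:")
    store(clean(events), conn)
    return serve(conn)


if __name__ == "__main__":
    print(run_pipeline(RAW_EVENTS))
```

Real pipelines differ mainly in scale and tooling, but the shape is similar: bad records are filtered or repaired before they reach storage, so whoever consumes the served result can trust it.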

Analogies Make for Better Conversations

While the above description is generally accurate, it might not be the best way to explain what you do to a layperson. In his Ladder of Abstraction, S.I. Hayakawa argues that sophisticated thinking deals in abstraction, but good communication comes from supporting those abstractions with concrete details.

That’s why analogies can be so effective at communicating abstract concepts such as ‘big data,’ whether it is the “tsunami” we need to brace for or the “new oil” we need to extract. Here are some metaphors that might be helpful when describing what a data engineer does using more concrete terms: 

The Coffee Analogy

Dominic Ligot, a data analyst and developer, begins his data engineering story on Quora with coffee beans: you can’t just eat the raw beans (unless you’re a civet, of course). Someone has to select, roast, and grind the beans. Then they’re brewed, recipes are created, and so on, until there is finally a delicious, consumable beverage.

In this scenario, the data engineer is the selector, roaster, grinder, and brewer.

Data engineers are like coffee producers.

The Transportation Analogy

Data engineer Michael David Cobb Bowen likens data engineering to transporting people. In his analogy, people are data. They need to get places. A child going to school may encounter a bus driver, a crossing guard, and a playground monitor who all make sure the kid gets from point A to point B safely and efficiently. With big data, the analogy scales up to airports, highway systems, trains, etc.

The data engineers are the professionals who figure out how to move lots of people (data) safely and quickly to the right place at the right time while avoiding traffic jams.

Data engineers are like transportation planners

The Race Car Analogy

Dataquest describes data engineering as it relates to racing. In contrast to the data scientist who gets a thrill from speeding around the race track in front of an audience, a data engineer finds their joy in constantly tuning the engines, experimenting with exhaust systems, and building a more efficient machine.

Data engineers are like race car mechanics

The MacGyver Analogy

For the 1980s TV fans, we can describe data engineers as the MacGyvers of data. Engineers manipulate a myriad of software and hardware devices in creative ways, ingeniously improvising new techniques to solve data challenges by whatever means necessary. We count on them to outsmart the bad guys (difficult-to-access data) and accomplish the mission, while keeping all the systems alive and running.

Data engineers are the MacGyvers of data

What Makes a Successful Data Engineer?

Perhaps you can relate some of your data engineering responsibilities to the above analogies. And maybe you’re already great at explaining what you do to others. In any case, there is one aspect of data engineering that propels the best forward: mastery of data platforms and tools. The ability to seek out, learn, tweak, and implement the right software and hardware tools for each job is a critical artistry that makes the top data engineers stand out.

The Future Depends on Your Toolbox

AI and machine learning implementations will change our lives forever in nearly every category. Successful ML depends on data scientists. And data scientists are only as good as their data engineers. And data engineers are only as good as their tools.

Whether it’s collecting data, storing data, or figuring out how to get access to the right data in the right place at the right time, the tools you select, implement, and use to maintain all of the above will help define the future.

So the next time you’re at a dinner party, make an appearance at little Sally’s career day, or just want to understand your role better, a concrete analogy can go a long way. But keep in mind that while it’s a little harder to visualize or properly appreciate, the key to data engineering success isn’t knowing some particular programming language or setting up an amazing architecture for a single project. It’s your ability to be curious, stay up-to-date with new technologies, and have the courage to try new tools that will lead to next-generation solutions.