A High Performance Team
At Molecula, we’ve got a lot of high-performing people, and for a long time, I thought that’s all we needed. I thought that was all anyone needed to be successful.
I’ve spent a lot of my career focused on improving my personal productivity, and sharing strategies for others to improve their own productivity. I picked a powerful, extensible editor (Emacs, woop woop!) and learned how to use it. I optimized many aspects of my working environment–everything from monitor to keyboard to operating system. I experimented with half a dozen tiling window managers, various terminals, a great many programming languages, and a few different version control systems. I’ve learned observability tools, performance monitoring tools, debuggers, and more, all in support of the mission to become a more productive and efficient Software Engineer.
To be clear, I don’t see an end to this journey. Tooling, languages, abstractions, and strategies keep getting better, and individual developers now wield more power than ever before. It’s possible for a single person to produce an incredible amount of value on their own and progress in that area continues to excite me. That is actually why I got into this stuff in the first place.
What I have belatedly come to focus on, though, is not individual performance, but team performance. There is an entirely orthogonal dimension of software development productivity to be unlocked by focusing on building high-performance teams in addition to high-performance individuals. These goals aren’t mutually exclusive, though interesting tensions often emerge between the two. For example, Google essentially mandates that only a few different programming languages be used across the organization. There is a clear tradeoff here between individual flexibility and organizational flexibility. At Amazon, Bezos famously mandated that all teams expose any functionality through service interfaces that could just as easily be used by developers outside the company as by other teams inside the company.
These types of policies are not designed to improve the productivity of individual developers, but to improve the productivity and, perhaps more tellingly, the scalability of the organization as a whole.
(There is, perhaps, an interesting parallel here in scalability/efficiency tradeoffs for the software itself vs the software team, but I’ll save that for another blog post.)
I think there are probably at least three different levels at which you can optimize software development performance:
1. Individual
2. Team
3. Organization
There may be more, and things like big open source projects work differently from tightly controlled companies, but that’s beside the point. I want to talk about team performance, which is my current obsession.
This is a topic about which much has been written, but relatively little rigorous research has been done. How is it possible that multiple individual high performers can come together and produce less than the sum of their parts? Progress feels slow, confidence in the produced software decreases, work-in-progress gets stuck in code review, QA, or testing, priorities get muddied, and ownership becomes unclear. Maybe sometimes the same work gets done twice, while a critical issue remains unresolved. Does any of that feel familiar?
How do you measure all this misery? How do you quantify it, and how do you fix it? How do you take a group of individually intelligent and productive people and help them to produce effectively toward a common goal?
I certainly don’t have a single answer, but I am beginning to see a framework to rein in the chaos. I said that very little rigorous research has been done, but there has actually been some. In particular, Accelerate provides a solid foundation based on rigorous statistical analysis of broad survey data on what sets high-performing teams apart.
I won’t go into their methods, except to say that they do a great job of explaining and justifying them. If you’re skeptical, you should read the book (you should read it regardless).
Probably the most important gift they give you is telling you what to measure:
1. Deployment frequency
2. Deployment lead time
3. Change failure rate
4. Mean time to restore
High-performing teams have greater deployment frequency (multiple times per day), shorter lead times (less than an hour from merge to production deploy), a lower percentage of deployments leading to failures, and less time on average to restore availability when a failure is encountered.
These are the key metrics to track, make visible, and strive to improve. Doing so leads to more effective teams.
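To make the four metrics a bit more concrete, here is a minimal sketch of how they might be computed from a log of deployments. Everything in it is an assumption made for illustration: the `Deployment` record, its field names, and the `four_key_metrics` function are hypothetical and do not come from Accelerate or from any particular tool. It also assumes that lead time runs from merge to production deploy (as described above) and that a failure begins at the moment of the deployment that caused it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    merged_at: datetime                     # when the change was merged to main
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did this deployment cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarize the four metrics over a reporting period of `period_days` days."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    one_hour = timedelta(hours=1)
    failures = [d for d in deployments if d.failed]

    # Assumption: lead time is measured from merge to production deploy.
    lead_hours = [(d.deployed_at - d.merged_at) / one_hour for d in deployments]

    # Assumption: the outage starts when the failing deployment goes out.
    restore_hours = [
        (d.restored_at - d.deployed_at) / one_hour
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "deploys_per_day": len(deployments) / period_days,
        "mean_lead_time_hours": mean(lead_hours),
        "change_failure_rate": len(failures) / len(deployments),
        # Undefined (NaN) when nothing failed or nothing has been restored yet.
        "mean_time_to_restore_hours": mean(restore_hours) if restore_hours else float("nan"),
    }


if __name__ == "__main__":
    start = datetime(2021, 3, 1, 9, 0)
    log = [
        Deployment(merged_at=start, deployed_at=start + timedelta(minutes=30)),
        Deployment(
            merged_at=start + timedelta(hours=2),
            deployed_at=start + timedelta(hours=3),
            failed=True,
            restored_at=start + timedelta(hours=5),
        ),
    ]
    print(four_key_metrics(log, period_days=1))
```

The arithmetic is the easy part. The hard part is having trustworthy timestamps for merges, deployments, failures, and recoveries in the first place.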
Additionally, their research breaks out 24 capabilities that affect software delivery performance “in a statistically significant way.” Some of these are quite obvious, like “use version control,” though it’s nice that the research confirms our intuitions. Others might be considered more controversial, like “use trunk-based development methods,” meaning that most work should be merged to the main branch quickly, and other branches should be very short-lived (less than a day). The first 8 capabilities (of which those two are examples) fall under the umbrella of “continuous delivery,” and have fairly obvious effects on the 4 metrics. They also include automated deployments, automated testing, continuous integration, continuous delivery, test data management, and building security into the development process early on.
The second category of capabilities is architectural and includes “use a loosely coupled architecture” and “architect for empowered teams.” These are more organizational than team capabilities (by my hierarchy above), and Amazon’s push for SOA is a clear example of this. Individual teams can use whatever technology they want (empowerment), but they must expose their functionality through interfaces that can be consumed externally or internally (loose coupling).
The third category of capabilities is “product and process.” Customer feedback should be visible to the team, and the flow of work, including addressing that feedback, should be visible as well. Work should be done in small batches to enable fast feedback/iteration. Team empowerment also comes up here: teams should have enough autonomy to feel comfortable experimenting, not just with different technical solutions, but with changing product specifications as they uncover things during the development process. Obviously, this is more effective if the developers are hearing customer feedback and understanding the pain points!
The fourth category is “lean management and monitoring capabilities.” This is about limiting work in progress (small batch sizes), making key metrics visible, having lightweight code review/change approval, monitoring system health proactively, and probably most importantly, setting up a tight feedback loop from production so that defects and issues can come back to development quickly.
The fifth category is “cultural capabilities.” This is a bit fuzzier, but the authors have done quite a good job of carefully defining and measuring it. In brief, the culture should support learning, inquire into failures but do so without blame, foster collaboration between teams, and encourage experimentation.
I’ve listed all these capabilities out to demonstrate that there is a pretty core issue we need to solve first to do almost any of this stuff. Continuous delivery, monitoring, setting up feedback loops from production… What do we need to make all that happen? How can we track the four basic metrics?
We need to have a “production” environment.
We must have something to deploy to if we’re going to measure our deployments. “Production” can’t be a customer environment because there’s only so much control that we can exercise over customer environments and how we deploy to them. We must operate our own infrastructure that we “eat” on a daily basis. We have to monitor it, we have to use it, we have to benchmark it, deploy to it, test it, and so on.
To be clear, we’ve gotten pretty far with automated testing and tooling to simulate complex situations like flaky networks and we need to keep all that and keep adding to it, but it’s very difficult to simulate all the realities of long-running infrastructure which has been upgraded multiple times throughout its life. That’s what our customers are doing, so that’s what we need to do.
This is well worth spending time and money on because nearly everything else we might do to measure and improve our performance hinges on it.
Production, we need it™.