December 2014 – Dave's Blog

In my current role I’ve inherited responsibility for the build system for the products my company is working on. This roles has a pretty broad remit; I’m involved in lots of areas including how our modules our layed out, how code is generated, packaging, deployment and even how the build supports the software architecture

For context, the system I have inherited is quite complex; it’s a mixed Scala and C# (for Unity3d) project, with heaps of code generation tangled into the build. There is custom dependency management build on top of Maven/Ivy. Unity3d has its own build system. The build is used for both the components and the games themselves.

The previous approach was to create a monolithic build system with all of the components required to build games and libraries packaged up with the bottom layers of the software. Instead of a monolithic build, we should adopt a approach more like the Unix philosophy, that is, we should provide the tools that people can use to extend their own build instead of attempting the replace their entire process.

I’d like to rewrite this, but I don’t expect this to be easy; it’s a moving goal with many developers still extending this system. While thinking about this problem I wrote a list to describe what I think makes a good build system. By sharing this approach I wanted to get the other developers on board with the direction I’m going so that they don’t generate any more technical debt.

Hopefully this list is not controversial so I haven’t attempted to provide justification for each item. I’m hoping that the value of each should be obvious. These should serve as preferences or guidelines rather than rules. In some cases, there may be technical reasons that they’re not achievable.

I want to be able to check out the source that I need from a single repository. I want this to build and test with a single command. I want to go from nothing to a working development environment in very few steps.
I don’t want to have to open several IDEs; it should be optional. If I do have to open another, IDE the build should be self contained. I shouldn’t have to open the same IDE twice either. Using an IDE should be optional – command line should be an alternative.
I only want to check out the code that is relevant to me; if it’s not in the products I work on, I don’t need it. I want the area I work on to be “small”.
I want to continuously integrate with my team. I want to be able to branch and develop with the same abilities as master/develop.
Working across components should be rare, but if I need to work across components, I want to be able to do this with minimum fuss, all on my local machine. That is, with very few extra commands, without manual copying.
I want to be able to work on sub-projects like they are a top-level project. They should be self contained/independent. That is, I can build, test and publish with single commands. Their dependencies should be built. It’s just convenience that a sub project is grouped together with others.
I want to be able to leverage my skills for each platforms I develop on. For example, if I’m using C#, I want to use Visual Studio and MSBuild. I do not want to have to learn skills that are not transferable. I don’t want to learn a new platform if it isn’t essential to what I’m doing.
I want to be able to continuously deploy/integrate. I don’t want to have to deliver to other teams. I want to be able to add value myself. That is, teams should work on products, not components.

I wrote the list in a format a bit like user stories, where the users (or the “I”) is a developer.
Later, it struck me that this list reads like a manifesto. It is after all a declaration of my intentions.

I believe that the build is something critical to the success of a project. A build process done wrong can dramatically harm developer productivity.

This list should capture some of the most important points of an effective build system. I haven’t attempted to capture everything; they are from the context of where I am, there are some “obvious” points, such as reliable and reproducible that I haven’t documented here as we already have them.

This list should capture some guidelines/qualities that I often see overlooked. I might go into detail of each of the points in future posts.

Admitting something is hard to understand is often very difficult for software engineers, as no matter how complex a system is, if you put enough effort in you will eventually gain insight.

The best systems (not just software) are often easy to understand. If systems are easy to understand, this means you can more easily reason about them; you can track down bugs more easily, add features and optimise more effectively. Easy to understand systems are often a collection of well defined components that we can treat as black boxes; you do not have to understand the details to understand the system as a whole.

There are notable exceptions; for example, the Linux kernel and its driver space is notoriously difficult to understand. That complexity has been introduced over time. Despite the fact that the kernel is monolithic, it started out with manageable levels of complexity and reached a critical mass. Now there are many enterprises and individuals invested in its success. That said, not attempting to simplify is very risky; how do we know how complicated is too complicated?

Whilst, “hard to understand” can be subjective, its rare that easy to understand systems just emerge; you need to put effort into making them easy to understand. There are many tools we can use to better reason about a system. Abstraction underlies most techniques for understanding. By hiding extraneous detail we make systems easier to understand.

Layers constrain the dependencies between components. Using layers makes an architecture easier to understand because it aims to limit connectivity between components. Each connection between components has a complexity cost. That said, not all architectures need to be layered. Some systems are simple enough that they don’t need layers. Some systems require a certain structure that is inherently complex, and cannot be fitted into layers; luckily they are the exception. Layers themselves have a complexity cost and should be used sparingly.

To help our understanding and communication, we need to develop a common language to talk about the system (a.k.a. ubiquitous language); well defined terms will help us. Naming is important. For example, what is an Entity? Its one of the building blocks of the system I work on, and last time I asked this question, I did not get very consistent replies from the team. It doesn’t matter if you’re using the term Entity to be something more specific than the real world, but it should still be consistent with the rest of the world. For instance, I’ve seen many examples classes named Cache in different codebases that don’t fit the real-world definition.

The principles behind the Agile manifesto say that the best architectures come from self-organising teams. This means that the best architectures are not dictated to the team; the team have to do the architecture themselves. In some self organising teams a senior member will naturally fill the position of architect and in other teams all team members will take collective responsibility for the architecture. I believe architecture is evolutionary and that you discover it on the way; in the process of building your system.

Month: December 2014

My Build System Manifesto

Hard to understand systems