Dealing With The Polyrepo Concept
For many years, my organization kept each application in its own monorepo. We have now been using polyrepo successfully for four years. The polyrepo concept is simple: instead of using one big repository, you divide the codebase into multiple repositories. Almost every open source project on GitHub manages its code in a monorepo fashion. I haven’t witnessed it myself, but many gigantic organizations do the same. There is a great article about why you shouldn’t use a monorepo, and I strongly recommend reading it before you continue. The monorepo concept is straightforward, but it leaves a bulky VCS in the hands of a configuration specialist, a role that is generally filled by a senior engineer. This article is about what we have achieved with the polyrepo concept.
The software we deal with
Although we are not a very large team (roughly 30 developers), we develop 8 software entities, many of which are designed with a client-server architecture. We develop ground station software for both GEO (communication) and LEO (Earth observation) satellites. Some of our software is used on both GEO and LEO. In total, that makes roughly 16 independent applications.
The software entities are:
- Satellite Command & Control Software
- Archive and Data Analysis Software
- Satellite Mission Planning Software
- Payload Configuration Software
- Orbit Control Software
- Payload Data Processing Software
- Orbit Dynamics Service
What have we done?
We were already using a monorepo for each piece of software, and we split each one into at least 3-4 independent repositories. First, we created an independent repository for each application’s API. The API is required whenever one piece of software needs to communicate with another. Then we created a repository for the client, one for the server, and sometimes a common repository for the data structures used by both client and server. Some repositories, such as messaging and encryption, are used by every application, and they have their own separate repositories too. That makes roughly 25-30 individual repositories.
In this case, a single component of the software consists of at least 4 repositories:
- Messaging
- Encryption
- API
- Itself
…
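The composition above can be sketched as a small dependency map. This is only an illustration, not our actual tooling: the repository names (`command-control-api`, `command-control-server`) and the dependency edges are hypothetical, chosen to mirror the four-repository list.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: each repository -> the repositories it depends on.
DEPENDS_ON = {
    "messaging": set(),
    "encryption": set(),
    "command-control-api": set(),
    "command-control-server": {"messaging", "encryption", "command-control-api"},
}


def required_repos(component: str) -> set[str]:
    """Return the component plus every repository it depends on, transitively."""
    seen: set[str] = set()
    stack = [component]
    while stack:
        repo = stack.pop()
        if repo not in seen:
            seen.add(repo)
            stack.extend(DEPENDS_ON[repo])
    return seen


def build_order(component: str) -> list[str]:
    """Order the required repositories so dependencies come before dependents."""
    needed = required_repos(component)
    graph = {repo: DEPENDS_ON[repo] & needed for repo in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(build_order("command-control-server"))
```

With a map like this, the set of repositories a developer needs for one component, and a safe order to build them in, falls out of the data instead of living in someone’s head.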
In the beginning, it seems hard to build and discouraging to make a change. I can confidently say it’s not! Managing polyrepos is actually easier than managing monorepos. It reduces the complexity of the system. Since we have tests for each repository, the builds are quicker, and deployment is easier for minor changes. Since the repositories are smaller, tracking a change couldn’t be easier. I admit that we rely heavily on tests, but testing makes it easy to develop individual parts of the software. Apart from that, if one part of the whole system turns out to be weak, it is much easier to replace it with a new solution or approach.
With a monorepo, on the other hand, development requires much more attention. Each feature has its own branch, and merging them is painful and sometimes breaks other parts of the software. Versioning is awkward because a minor version introduces something new in only a small portion of the system, so the version number means little for the system as a whole. Tracking a change is not easy either: you may need a shovel and a pickaxe to dig up a feature branch that was merged in the past. Lastly, testing gets squeezed into unit tests, which sometimes do not make any sense to write.
The Homebrew project is a good example of the polyrepo concept.
I will compare the polyrepo concept with monorepo, going through the advantages of monorepo as stated on its Wikipedia page.
- Ease of code reuse: Polyrepo does not prevent that. In fact, it encourages it. You don’t need a package manager just because you split the code into more than one repository. Just import the repositories separately into your workspace.
- Simplified dependency management: I find that the polyrepo concept handles this issue better, too. The dependencies are linked to where they are actually needed, and you don’t need every dependency just to change one part of the software.
- Atomic commits: Most atomic commits do not have an impact on the software’s overall design. That means you can make atomic commits in a repository without affecting the interface or the overall system design. The polyrepo approach does not discourage that. It also puts a safety catch on changing interfaces between repositories. Converting an integer value to a short value may seem atomic, but if it is on the interface, I wouldn’t call it a small change. It should be reviewed and tested.
- Large-scale code refactoring: This may be harder with polyrepo than with monorepo, but that doesn’t mean it is impossible. It is manageable if you work in a workspace that includes all the required repositories.
- Collaboration across teams: I don’t understand why monorepo would have an advantage over polyrepo in terms of collaboration. You can still include the source of another repository in your workspace. That does not break anything. In fact, you can rely on the master branch always being stable, and you can check out the development branch of the included repository to test new features. I find the polyrepo concept better in terms of collaboration.
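To make the integer-to-short point from the atomic commits item concrete, here is a small sketch of what such a change does to a wire format. The message layout (a single big-endian counter field packed with Python’s `struct` module) is an assumption for illustration, not our real interface.

```python
import struct

# Hypothetical interface: a single big-endian counter field.
OLD_FORMAT = ">i"  # 32-bit signed integer
NEW_FORMAT = ">h"  # 16-bit signed integer ("short")


def encode(fmt: str, counter: int) -> bytes:
    """Pack the counter using the given wire format."""
    return struct.pack(fmt, counter)


def fits(fmt: str, counter: int) -> bool:
    """Report whether the counter can be encoded with the given format."""
    try:
        encode(fmt, counter)
    except struct.error:
        return False
    return True


if __name__ == "__main__":
    value = 40_000  # valid for a 32-bit int, out of range for a 16-bit short
    print(len(encode(OLD_FORMAT, value)))  # message size under the old format
    print(fits(NEW_FORMAT, value))         # whether the new format accepts it
```

Both the payload size and the accepted value range change, so every consumer of the interface is affected. That is why a one-line change like this deserves a review and tests across repositories.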
Drawbacks
As far as I have observed, the polyrepo concept has some drawbacks too. I’ll go over some of them.
Issue tracking is hard. We use self-hosted GitLab, which is a great tool, and we barely need an additional task management tool. But since many repositories belong to a single application, issue management gets complicated. It is hard to decide where to open an issue for an application-wide feature request or bug report.
Keeping all the repositories related to an application up to date takes effort. I wrote an application just to do this job better: gitbatch helps manage hundreds of repositories (e.g. sync with the remote, merge the latest changes, see changes across multiple repositories).
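As a rough idea of what keeping a workspace in sync involves, the sketch below finds Git working copies under a directory and fetches each one. It is not how gitbatch is implemented. The function names and the workspace layout (one repository per subdirectory) are assumptions, and the only Git command it relies on is the standard `git -C <path> fetch --all --prune`.

```python
import subprocess
from pathlib import Path


def find_repos(workspace: Path) -> list[Path]:
    """Return direct subdirectories of the workspace that contain a .git entry."""
    return sorted(
        p for p in workspace.iterdir() if p.is_dir() and (p / ".git").exists()
    )


def fetch(repo: Path) -> bool:
    """Fetch all remotes of one repository; return True if git exited cleanly."""
    result = subprocess.run(
        ["git", "-C", str(repo), "fetch", "--all", "--prune"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def fetch_all(workspace: Path) -> dict[str, bool]:
    """Fetch every repository in the workspace and report success per repository."""
    return {repo.name: fetch(repo) for repo in find_repos(workspace)}
```

Calling `fetch_all(Path("workspace"))` returns a mapping from repository name to whether its fetch succeeded, which is enough to spot the repositories that need attention.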
It’s not easy to just clone and start developing. First, you need to clone all the required repositories and link them into a single workspace. You may also need to read each repository’s instructions to understand what it takes to build it.
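The bootstrap step described above could be scripted along these lines. The base URL, repository names and workspace path are placeholders, not our real setup. The sketch only builds the `git clone` commands and leaves running them to the caller.

```python
from pathlib import Path

# Placeholder values for illustration only.
BASE_URL = "https://gitlab.example.com/ground-station"
REPOS = ["messaging", "encryption", "planning-api", "planning-server"]


def clone_commands(
    workspace: Path, repos: list[str], base_url: str = BASE_URL
) -> list[list[str]]:
    """Build a `git clone` command for each repository missing from the workspace."""
    commands = []
    for name in repos:
        target = workspace / name
        if target.exists():
            continue  # already present; leave it untouched
        commands.append(["git", "clone", f"{base_url}/{name}.git", str(target)])
    return commands


if __name__ == "__main__":
    for command in clone_commands(Path("workspace"), REPOS):
        print(" ".join(command))
```

Passing each command to `subprocess.run(command, check=True)` would perform the clones. Printing them first makes it easy to review what a new developer’s workspace will contain.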
Conclusion
We have been using the polyrepo approach with tens of engineers across multiple teams for almost four years now. It works great for us, and we’re planning to stick with the polyrepo concept for the foreseeable future. We have gotten used to it, and we do not miss monorepo in any sense. With this approach, we have testable repositories and better version control over the components of a single application. We also have integration tests, so merging into the master branch won’t break something elsewhere in the system. Admittedly, version control takes more work this way, but it’s well worth it. Scaling our VCS is easier now.
I would like to hear your thoughts about this post. If you want to share them, you can find me on Twitter.