Git Branching Models
One of git’s main benefits is also one of its biggest challenges: its distributed nature. The same freedom to work on an independent copy of the source, without having to deal with unrelated modifications until necessary, can create headaches when unrelated functionality is not so unrelated as it looks.
The central problem is that merging is no fun, and that careless branching and independent work ordinarily lead to a great deal of merging once a software team grows beyond a certain size. It’s easy to limit the amount of merging by enforcing strict limits on the amount of branching, but this robs git of its main advantage. A good git branching model, therefore, should limit the amount of complicated merging as much as possible, while also placing as few constraints as possible on branching.
Note that, of the three models I will discuss here, two are discussed in the context of external tools. Although the external tools are not strictly necessary—that is, all three workflows are perfectly usable with command-line git and some means of communication between developers—they do help. Git is a complicated version control tool, more complicated than non-distributed source management tools, and even more complicated than its fellow distributed source management tool, Mercurial. External tools ease the cognitive load on developers, and free them to do their jobs, rather than worry about source management.
Finally, before we start, let me define two terms. When I write of ‘merging’ in this post, I refer to the sort of merge that leaves a merge commit in the git history: that is, either a merge of two branches that have diverged (whether or not it produces conflicts), or git merge called with the --no-ff flag. When I write of ‘rebasing’, I refer to a merge that places the merged branch at the tip of the target branch, with no merge commit: that is, either a merge that can be fast-forwarded, or a rebase-and-merge.
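To make the distinction concrete, here is a minimal Python sketch of the two outcomes. It models history as a dictionary mapping each commit to its parents; the commit names and the helpers is_ancestor and merge_kind are hypothetical illustrations, not part of git. It assumes git's default behavior: a merge can be fast-forwarded only when the tip of the target branch is already an ancestor of the branch being merged.

```python
def is_ancestor(history, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack = [commit]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(history.get(current, []))
    return False


def merge_kind(history, target_tip, branch_tip, no_ff=False):
    """Classify what merging `branch_tip` into `target_tip` would produce."""
    if is_ancestor(history, branch_tip, target_tip):
        return "already up to date"
    if is_ancestor(history, target_tip, branch_tip) and not no_ff:
        # The target has not moved since the branch was cut ('rebasing' above).
        return "fast-forward"
    # Diverged branches, or --no-ff requested ('merging' above).
    return "merge commit"


# Hypothetical history: master is at B; C builds on B; D was cut from A.
history = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["A"],
    "D2": ["B"],  # D replayed on top of B, i.e. after a rebase
}

print(merge_kind(history, "B", "C"))              # fast-forward
print(merge_kind(history, "B", "C", no_ff=True))  # merge commit
print(merge_kind(history, "B", "D"))              # merge commit
print(merge_kind(history, "B", "D2"))             # fast-forward
```

The last two calls show why a rebase-and-merge counts as ‘rebasing’ in this post: once D has been replayed on top of B, the merge no longer needs a merge commit.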
With that in mind, let’s dig in.
The GitHub Workflow
The GitHub workflow originated with the developers behind GitHub, and, as such, uses GitHub. As it was designed to manage the source for the GitHub platform, it’s best suited to projects like GitHub: web applications, or other applications which can be instantly deployed and updated for all users, for reasons which will become clear. It revolves around its master branch, which is always deployment-ready. Any change to master, from a one-line, one-commit typo fix to a weeklong major feature, starts life in a descriptively-named branch, which can be freely pushed to the server, both so that other developers can have a look at it, and to serve as an ad-hoc work-in-progress list. As work progresses, the developer creates a GitHub pull request. Other developers review the changes contained in the pull request, and once the changes are approved, the branch is merged back into master.
In addition to its function as code review, the pull request supports discussion about the code changes. For instance, if the developer working on the branch needs help, they can open a pull request to get extra eyes on the source. Beyond that, the GitHub interface provides several excellent tools for managing and visualizing the source tree, including a repository graph, an essential part of the toolbox for working with any highly-branched source tree.
As mentioned above, the GitHub flow is excellent for rapid-deployment environments: it focuses on maintaining a source tree which is always ready for deployment, and indeed, GitHub often deploys tens of times per day. This permits an extraordinary amount of responsiveness, a superb fit for a web app, where updating a user’s copy is as simple as refreshing the page. It requires additional discipline, however, on projects with larger teams, a more monolithic release structure, or larger feature changes. Unless large features are broken into smaller portions, the merge at the end of the branch risks becoming just as messy as the merges we were hoping to avoid.
The Repo/Gerrit Workflow
Repo and gerrit are tools developed by Google to manage the Android operating system source tree. Taken together, they form a heavyweight toolset, built to maintain dozens to hundreds of git repositories, all of which are component parts of one larger whole. For very large projects, repo/gerrit may be the correct toolset, but, as most projects aren’t quite that large, we will focus on using gerrit as a standalone tool in this post.
Gerrit is a web-based git code review tool, similar in many ways to GitHub. It does not provide as many project management features as GitHub, but it is a robust code review and merge management tool. It is also capable of imposing several constraints on how a git source tree is managed, which yield some very interesting consequences.
At first glance, the gerrit workflow looks very similar to the GitHub workflow. Each feature and bugfix is developed on a named feature branch, which may be freely pushed to the server. There are two key differences. First, gerrit can (and should) be configured to require that every merge be a fast-forward. That is, every branch which is to be pulled into master must be clean relative to master: it must apply to the tip of the master branch with no merge conflicts. This makes for a master branch with no merge commits, and no situations where a bad merge introduces a bug in the merge commit itself. Bugs caused by merges will always show up in the commit which did not apply cleanly to master, and can be much more easily tracked down with tools like git bisect. It also maintains some of the flexibility of the GitHub model, while providing better support for long-running feature branches. Since rebasing requires developers to make changes to fix merge conflicts in the commits where the merge conflicts appear, it simplifies and compartmentalizes merge changes.
Second, gerrit enforces code review, by requiring that branches be reviewed and verified before they are merged: the code reviewer must provide a rating from -2 to +2, and the tester must provide a verification. Only when a reviewer has given a +2 and a tester has given a verification may the branch be merged. This helps to ensure that master stays clean.
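The review gate can be sketched as a small Python predicate. This is an illustration of the rule as described above, not gerrit's own implementation: the function name can_submit and its list-of-votes arguments are hypothetical. Two details are assumptions based on gerrit's usual default configuration rather than on anything stated here: a -2 review or a failed verification blocks the merge, and votes are not summed, so two +1 reviews do not make a +2.

```python
def can_submit(code_review_votes, verified_votes):
    """Return True when a change passes the review gate described above.

    code_review_votes: integers from -2 to +2, one per reviewer.
    verified_votes: integers from -1 to +1, one per tester.
    """
    has_approval = any(vote == 2 for vote in code_review_votes)
    is_verified = any(vote == 1 for vote in verified_votes)
    # Assumption: a -2 review or a -1 verification vetoes the change.
    has_veto = any(vote == -2 for vote in code_review_votes)
    failed_verification = any(vote == -1 for vote in verified_votes)
    return (has_approval and is_verified
            and not has_veto and not failed_verification)


print(can_submit([2], [1]))      # True: approved and verified
print(can_submit([1, 1], [1]))   # False: votes are not added together
print(can_submit([2, -2], [1]))  # False: blocked by a -2
print(can_submit([2], []))       # False: nobody has verified the change
```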
The gerrit workflow is something of a middle ground: although it provides for a cleaner and more regimented codebase than the GitHub workflow, it remains an agile, fast-moving flow. Branches should represent either bugfixes or complete features, and should be merged when done, and fairly rapid releases are the order of the day.
One additional caveat: multiple developers working on one branch can make for tricky rebasing, since rebasing is a history-rewriting operation. Whenever a rebase is required, all developers should either save their work in progress, or commit it and push it. One developer carries out the rebase, and the others all pull the new, rewritten branch.
The Gitflow Workflow
So far, we have discussed two workflows well-suited to projects with relatively rapid release schedules. Now, we will discuss a more heavyweight model, better suited to large projects with a more moderate pace of releases and a better-defined release process.
Minimally, the Gitflow workflow consists of two branches: a master branch, where releases live, and a development branch, where ongoing development happens. Features are developed in branches off of the development branch. To create a release, a pre-release branch is created from the development branch. Issues detected in pre-release testing are fixed in the pre-release branch, and the fixes are merged back to development. Upon release, the pre-release branch is merged into master. Fixes to bugs in a released version are developed in hotfix branches, created from the master branch, and merged both up to the development branch and into master. Releases along master are tagged so that they can be found easily. The flow in Gitflow describes the movement of commits: features flow from development down, and fixes flow from their particular branch up.
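The branching rules in the paragraph above can be summarized as a small lookup table. The following Python sketch is only an illustration: the slash-separated branch names (such as feature/login) and the helper names base_branch and merge_targets are hypothetical conventions, not something Gitflow requires in this form.

```python
# Where each kind of branch is created from, and where it is merged when done.
GITFLOW = {
    "feature": {"from": "development", "merge_into": ["development"]},
    "pre-release": {"from": "development", "merge_into": ["development", "master"]},
    "hotfix": {"from": "master", "merge_into": ["development", "master"]},
}


def branch_kind(branch_name):
    """Extract the kind from a name like 'hotfix/1.0.1' (assumed convention)."""
    kind = branch_name.split("/", 1)[0]
    if kind not in GITFLOW:
        raise ValueError(f"not a Gitflow working branch: {branch_name}")
    return kind


def base_branch(branch_name):
    """Return the branch that `branch_name` should be created from."""
    return GITFLOW[branch_kind(branch_name)]["from"]


def merge_targets(branch_name):
    """Return the branches that `branch_name` is merged into when finished."""
    return GITFLOW[branch_kind(branch_name)]["merge_into"]


print(base_branch("feature/login"))   # development
print(merge_targets("hotfix/1.0.1"))  # ['development', 'master']
```

A table like this could back a pre-merge check that rejects, say, a feature branch aimed directly at master.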
Gitflow, then, is a workflow designed with large, slow-releasing projects in mind. The master branch is isolated from all development by pre-release branches, and only hosts code that has undergone testing; as such, master is very safe code, and only mistakes (rather than the normal course of development) will see serious flaws introduced into otherwise release-ready code. It can also be extended: for one project, SDG used Gitflow with an additional QA branch in between development and master, which provided further isolation between work in progress and releases. Development was used like the development branch in standard Gitflow, feeding into QA, which was used to build releases for our partner to test and evaluate internally. QA, in turn, fed into master, from which releases to our partner’s customers were built.
Though Gitflow pushes stable release code better than either of the other workflows I have discussed, it does have two major downsides. First, it is relatively complicated, and requires good repository discipline from developers. A failure to enforce the workflow’s central tenets, the flow of features and fixes, can lead to two diverging branches, and the further such failures go, the harder it is to restore the source tree to its proper state. Second, it is difficult to visualize the workflow without a repository graphing tool, and visualization of the workflow is key to understanding it.
To close, it is important to remember that selecting a good workflow is not a one-time choice, and the workflow that is correct for one project may not be correct for another. Two developers working on a small internal project do not need as much process and structure as a team of twenty working with a partner across the country.