The transition from a centralized version control system (CVCS) to a distributed version control system (DVCS) can be daunting for a company. Choosing a migration tool, winning acceptance from executive leadership, earning developer buy-in, providing education, and a host of other non-technical factors all go into making the switch, and each is as integral as deciding on the actual day-to-day workflow Git users will follow.
While this blog post focuses on the latter, let’s briefly touch on why a decentralized model is a better fit for the enterprise.
Why Use Decentralized?
Perhaps the best reason to make the switch is the ease and encouragement of branching and merging. In this model, each developer has a clone of the repository locally, including all branches and tags. This means that, once cloned, the developer is able to commit work, create branches, merge features, and even run tests with the entire codebase at their disposal – all without an internet connection.
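As a minimal sketch of that claim, the script below clones a repository and then commits, branches, and merges without contacting anything beyond the local disk. The directory names (shared, dev), the branch names, and the file app.txt are hypothetical, and a local directory stands in for the shared repository so the example runs without a network.

```shell
#!/bin/sh
set -eu

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# A local directory stands in for the shared repository.
git init --quiet shared
cd shared
git symbolic-ref HEAD refs/heads/main   # pin the branch name regardless of Git defaults
echo "hello" > app.txt
git add app.txt
git commit --quiet -m "Initial commit"
git tag v1.0
git branch release-1.0
cd ..

# The clone receives the full history, every branch, and every tag.
git clone --quiet shared dev
cd dev
git branch -r   # includes origin/main and origin/release-1.0
git tag         # includes v1.0

# Everything from here on is purely local work.
git checkout --quiet -b feature/greeting
echo "hello, world" > app.txt
git commit --quiet -am "Expand greeting"
git checkout --quiet main
git merge --quiet feature/greeting
git log --oneline
```

The shared repository still has a single commit at the end. Nothing leaves the clone until the developer explicitly pushes.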
In a centralized world, the thought of having the entire repository on your local machine sounds absurd - and it is, for centralized repositories! When every commit must contact a central server, several problems follow:
- Developers must create a “devbranch” to store code that they don’t want to affect trunk, effectively treating the VCS as a file dump.
- Fear of merging encourages long-standing “islands” of code that exist only on developers’ computers instead of in a repository stored on a server.
- At the point of integration with trunk, updating your branch with the latest changes can result in local merge conflicts on files that you didn’t even touch, conflicts that may already have been resolved elsewhere.
This list could be longer; however, the focus of this post is on the workflow around using a DVCS in an enterprise context. For more information about why the DVCS design is superior to a CVCS, Linus Torvalds - the creator of Git - has a terrific (albeit biased) presentation on the subject that can be found on YouTube.
A DVCS solves the above problems and affords much more freedom to the developer. Since each person gets a clone of the repository, they are able to freely commit, branch, and merge locally without affecting anyone else, much like working in a “sandbox.” It is only at the point of integration that the developer must worry about having a branch in a suitable state for merging. At that point, and only at that point, the developer simply points the main repository at the curated branch in their clone. But now we’re getting a little ahead of ourselves. Instead, let’s take a step back and discuss how this switch from CVCS to DVCS can be made.
Paradigm Shift
The first major hurdle to really understanding any DVCS (like Git) is making the mental switch from centralized to decentralized.
To be fair, centralized models do have some attractive features:
- Design is usually consistent across SVN projects (tags, branches, trunk).
- Clear where different types of development belong within the repository.
- Commits become immediately available to everyone.
- Shorter learning curve.
But because these tenets are common among CVCSs, their shortcomings are also well-understood:
- Repository is 3x as complicated as needed when starting.
- Immediately available commits encourage increasingly divergent branches.
- New branches are entire copies of the existing branch, thereby grossly overusing disk space and “bloating” the repository.
- Merges:
  - Are usually avoided unless necessary.
  - Nightly merging is encouraged, but is rarely practiced.
  - Merging algorithms give metadata the same weight as file content, which can lead to strange merge conflicts over whitespace.
  - Inherent divergence of branches contributes to marathon merging sessions.
- Tags are only culturally enforced to be off-limits, and so are susceptible to changes.
Decentralized Git Workflow
Anyone who has already contributed to an open source project understands this workflow. For those who haven’t (which is most people), what does a decentralized Git workflow look like, and why should your team switch?
“Fork & PR”
The shortest way to describe the optimal Git workflow for an enterprise team, or any team of two or more people, is “fork & PR,” or more specifically, “create your own fork of the main codebase and use pull requests (PRs) to merge changes from your fork to the main repository.” Let’s break down what each of these means:
Fork It, Man!
Cloning is the foundation of any DVCS. You are either cloning or initializing a repository. “Forking,” in the sense used here, is a term popularized by GitHub to describe the act of cloning any given repository and saving that clone into your own (or a different) namespace. That new clone in the new namespace is referred to as a “fork.” As with any DVCS, because the fork is a clone, you receive all of the branches, tags, metadata, and source code that the “upstream” repository has.
We can see many examples of forks in the open source world, including:
- Icinga is a fork of Nagios, a popular monitoring application.
- Jenkins is a fork of Hudson, a known continuous integration app.
- Ubuntu is a fork of Debian, one of the most widely used Linux distributions.
The reason behind forking is to allow developers to make and commit changes without affecting the upstream repository or other authors. In addition to the advantages described earlier, forks also allow you to experiment with production-quality code in a development environment.
Once you have forked a repository, you will create a local clone of that fork on your workstation, making changes and developing as desired. At the point you feel the code is ready for integration, you would create a pull request to integrate your branch with that of the upstream repository.
Pull Request: When Worlds Collide
Pull requests, in their web-based form, originated with GitHub as a way to integrate changes from one branch into another, create visibility into those changes, and enable collaboration between affected developers (via the web interface). The bonus is that the branch can originate from any repository and from within any namespace. In addition, pull requests provide a standard way through which changes can be merged into the original repository.
The term “pull request” comes from the target branch’s point of view. The decision about whether or not to accept the changes in the request lies with the original repository’s creators/admins. It is their choice to “pull in” changes being requested from other branches, hence, “pull request.”
The actual action of merging a pull request is no different from a standard git merge (technically, git merge --no-ff); however, the additional features have made the pull request a preferred way to integrate changes. This is evidenced by competing applications to GitHub that include pull requests (e.g. Atlassian Stash), and by Git core itself, which ships a “git request-pull” command that generates a summary of pending changes to send to a maintainer.
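To illustrate what git merge --no-ff does differently from a plain merge, here is a small sketch under assumed names (a repository called demo, a branch called feature, and a file called file.txt, none of which come from the original workflow). Because main has not moved since the branch was created, a plain merge would fast-forward and leave no record that a branch ever existed. The --no-ff flag forces a merge commit instead.

```shell
#!/bin/sh
set -eu

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

git init --quiet demo
cd demo
git symbolic-ref HEAD refs/heads/main   # pin the branch name regardless of Git defaults
echo "base" > file.txt
git add file.txt
git commit --quiet -m "Base commit"

# Work happens on a separate branch.
git checkout --quiet -b feature
echo "change" >> file.txt
git commit --quiet -am "Feature work"

# Back on main, force a merge commit even though a fast-forward is possible.
git checkout --quiet main
git merge --quiet --no-ff -m "Merge branch 'feature'" feature

# The graph shows the branch and the merge commit that joined it.
git log --graph --oneline

# The tip of main is a merge commit: its own hash followed by two parent hashes.
git rev-list --parents -n 1 HEAD
```

The merge commit is what gives a merged pull request its visible footprint in history: the branch's commits stay grouped under a single integration point.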
What Does The Workflow Look Like Using Forks And PRs?
There are six easy steps:
- Fork the original (“upstream”) codebase: GitHub, Stash, Bitbucket, and other code repository hosting tools have a “fork” button that creates a copy of the selected repository in your personal namespace.
- Clone your fork (“origin”)
- Define upstream remote in your working copy
- Do Work!
- Update the working copy with the latest from upstream
- Push to origin
This workflow should be followed for even the smallest of changes, as it allows complete visibility, accountability, and traceability for all changes introduced into your codebase.
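The six steps above can be sketched end to end on a single machine. This is a simulation under stated assumptions, not a prescribed setup: local bare repositories (upstream.git, fork.git) stand in for the hosting service, a bare clone stands in for the “fork” button, and all branch and file names are hypothetical. The original steps do not say how to update from upstream; the sketch uses a rebase, and a merge of upstream/main would serve equally well.

```shell
#!/bin/sh
set -eu

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# Setup: create the "upstream" repository with one commit on main.
git init --quiet seed
(
  cd seed
  git symbolic-ref HEAD refs/heads/main
  echo "v1" > README
  git add README
  git commit --quiet -m "Initial commit"
)
git clone --quiet --bare seed upstream.git

# Step 1: fork upstream (simulated here with a bare clone).
git clone --quiet --bare upstream.git fork.git

# Step 2: clone your fork; it becomes the "origin" remote.
git clone --quiet fork.git workcopy
cd workcopy

# Step 3: define the upstream remote in your working copy.
git remote add upstream "$work/upstream.git"

# Step 4: do work on a branch of your own.
git checkout --quiet -b feature/readme
echo "Usage notes" > USAGE
git add USAGE
git commit --quiet -m "Add usage notes"

# Meanwhile, someone else lands a change upstream.
(
  cd "$work/seed"
  echo "Changes" > CHANGELOG
  git add CHANGELOG
  git commit --quiet -m "Add changelog"
  git push --quiet "$work/upstream.git" main
)

# Step 5: update the working copy with the latest from upstream.
git fetch --quiet upstream
git rebase --quiet upstream/main

# Step 6: push to origin (your fork), never directly to upstream.
git push --quiet origin feature/readme

git log --oneline
```

After the push, the branch exists only in the fork. Opening the pull request from that branch against upstream then happens in the hosting tool's web interface.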
What Do We Gain From This?
There are many benefits to using this workflow, including:
- Freedom to experiment, commit, and push without affecting the main project repository
- Ability to easily share and merge peers’ repositories/forks
- Forks are updated before each push
- Reduced “repository bloat”
- Not restricted to working in named devbranches
- Easy code review via pull requests
Beware - Gitflow: CVCS in DVCS Clothing
In an attempt to use a decentralized version control system (DVCS), companies and teams often misunderstand how to properly leverage its power, very much echoing James Shore’s Cargo Cult Agile post. A prime example of this can be seen in one of the most widely adopted helper tools used to implement a development workflow in Git: Gitflow.
Aside from the fact that Gitflow is an abstraction from Git itself, how does its workflow model compare with that of the centralized/SVN model outlined earlier?
| Subversion | Gitflow |
| --- | --- |
| Repository is divided into 3 parts | Repository is divided into 2 parts |
| Trunk has the most unstable code | Develop has the most unstable code |
| Long-running feature and release branches | Long-running feature and release branches |
| Nightly merges to trunk | Merge as new code is pushed to common ancestor |
| Tags are created at release time | Tags are created at release time |
| Merge to trunk after production releases | Merge to master after production releases |
The two models are almost identical, which raises the question: how is using Gitflow any better than using SVN? The answer? It’s not. In fact, it’s the worst of both worlds, as it tries to enforce a centralized workflow on a decentralized system. The biggest reason for its popularity is rooted in the users’ lack of Git knowledge. Given all of the factors involved with switching an enterprise development team from a CVCS to a DVCS, as discussed at the beginning of this post, users gravitate to the familiar and embrace Gitflow because it “feels” the most like the CVCS workflow to which they have become accustomed.
Closing Thoughts
After making the switch from CVCS to DVCS, users tend to realize how valuable the power, flexibility, and freedom of a DVCS are, and quickly take advantage of them. Particularly with overseas teams, this workflow is a scalable, simple, and effective way to organize the development of any size project for any size team.