The Dangers of Rebasing A Branch

When working with git, updating a feature, bug fix, or release branch to the latest version of the main branch can be tough. There are two ways to do it:

Merging
Rebasing

Merging takes the two branches, finds their common ancestor, combines the changes made on each side since then, and records the result as a new commit on your current branch. Rebasing instead replays your branch's commits, one by one, on top of the latest commit of the branch you are rebasing onto.

Both seem to be viable solutions to this problem, but the real deciding factor for me and my team at Keen has been how each one deals with publishing our code and with merge conflicts. On both counts, merge has a clear advantage over rebase.

But first, I would note some issues with merging in larger projects:

Primarily, the issue that many people have with merging is that it adds a new commit every time you bring your current branch up to date with another branch.

In comparison, rebasing can have a few more looming issues depending on how you view git.

Personally, having studied Merkle Trees and discussed a possible use-case for using git/Merkle Trees as a caching solution, I view git as an entirely immutable structure of your code. Rebases break this immutability of commits.

A git commit is identified by a hash of its contents: the ID of the previous commit, the underlying trees and blobs (folders and files) for that commit, and metadata such as the author and message. When you rebase, the idea is that you essentially only inject new nodes into the linked list of commits. But because each commit's ID depends on its parent's ID, you are actually replacing every commit you have made since you split off from the other branch.
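To make that concrete, here is a minimal sketch of a hash chain in Node.js. It is a toy model, not git's real object format: commitId, buildChain, and the snapshot strings are names I made up for illustration, and a real rebase would also change the snapshots themselves. It only shows why changing the parent of the first commit changes the ID of every commit after it.

```javascript
const { createHash } = require("node:crypto");

// Toy commit ID: a SHA-1 over the parent ID and the snapshot contents.
// Assumption: real git also hashes author, committer, timestamps, and message.
function commitId(parentId, snapshot) {
  return createHash("sha1")
    .update(`parent ${parentId}\n`)
    .update(`tree ${snapshot}\n`)
    .digest("hex");
}

// Build a chain of commits on top of a base ID and return their IDs in order.
function buildChain(baseId, snapshots) {
  const ids = [];
  let parent = baseId;
  for (const snapshot of snapshots) {
    parent = commitId(parent, snapshot);
    ids.push(parent);
  }
  return ids;
}

// Hypothetical feature branch with three commits, first built on an old base...
const featureSnapshots = ["add model", "add property", "add tests"];
const oldBase = commitId("root", "main at branch point");
const before = buildChain(oldBase, featureSnapshots);

// ...then "rebased" onto a newer tip of main: same snapshots, different parent.
const newBase = commitId(oldBase, "main after a teammate's PR");
const after = buildChain(newBase, featureSnapshots);

// Count the commits whose ID survived the rebase.
const unchanged = before.filter((id, i) => id === after[i]);
console.log(unchanged.length); // prints 0: every commit got a new ID
```

Nothing about the feature work changed between the two chains, yet none of the IDs match. That is the sense in which a rebase rewrites the whole branch rather than inserting a few nodes.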

Issues with public code

At Keen we try to push our code after nearly every commit. This gives us the opportunity for peer code review, and it means we can hand off a project at any time, even at the feature level. Since rebasing changes each commit in the history of a branch, this becomes a bit of a problem with shared branches (either on GitHub or some other remote).

In order to push rebased code to a remote where the branch already exists, you must force push, since the push has to replace every commit on that particular branch. In some teams this can be slightly excused if only one person is working on that branch. Otherwise, everyone else following the branch has to "force pull" (reset their local copy to match the remote) and risks losing any new commits they have made locally.
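Here is a rough sketch of why the remote refuses a normal push after a rebase. It is a toy model of the fast-forward rule, not git's actual implementation: the push and isAncestor functions and the commit objects are hypothetical names of my own.

```javascript
// Each commit is modelled as { id, parent }, where parent is a commit or null.
function isAncestor(ancestor, commit) {
  for (let c = commit; c !== null; c = c.parent) {
    if (c.id === ancestor.id) return true;
  }
  return false;
}

// A toy remote branch: a push is accepted only when the current remote tip
// is an ancestor of the new tip (a fast-forward), unless the push is forced.
function push(remote, newTip, { force = false } = {}) {
  if (!force && remote.tip !== null && !isAncestor(remote.tip, newTip)) {
    return "rejected (non-fast-forward)";
  }
  remote.tip = newTip;
  return force ? "forced" : "ok";
}

const base = { id: "base", parent: null };
const a = { id: "a", parent: base };          // original feature commit
const main2 = { id: "main2", parent: base };  // new commit on main
const aRebased = { id: "a'", parent: main2 }; // same change, new ID after the rebase
const b = { id: "b", parent: a };             // a teammate's commit built on the original

const remote = { tip: null };
const first = push(remote, a);                          // "ok"
const second = push(remote, aRebased);                  // "rejected (non-fast-forward)"
const third = push(remote, aRebased, { force: true });  // "forced"
const teammate = push(remote, b);                       // "rejected (non-fast-forward)"

console.log([first, second, third, teammate]);
```

The last line is the painful one: after the forced push, the teammate's work no longer descends from what is on the remote, so their ordinary push is refused too.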

The other option is to create a new branch every time you rebase and publish the rebased work there, but in my opinion that just fills your remote with random branches and doesn't solve the issue of simultaneous work on a branch.

So, for this case, the only place I would recommend git rebase is on local branches that have not been pushed to a remote in any way (which conflicts with our company's practice of pushing each commit).

Issues with merge conflicts

Probably the bigger issue that I have with rebasing is the way it handles merge conflicts. When you rebase, git first moves to the latest commit on the parent branch and then tries to apply the diff from each commit on your current branch, one at a time. If a diff fails to apply, git stops the rebase and asks you to resolve the conflict. Once you have fixed it and continued (git rebase --continue), your resolution is simply folded into the rewritten version of that old commit on your current branch.

To give you an example, let's look at a common place where merge conflicts arise: parameter declaration. In this case, our server-side engineer was creating a new model on the server side and kindly decided to add some properties to the client-side model as well. This separation of concerns in a branch is a discussion for another day; even at companies with great processes, it happens and has to be dealt with. So when rebasing we get something like this:

On master the property function looked like this:

commentsForAllPosts: function() {
    return this.posts.comments;
}

On our current branch, when building it for real, we wrote this instead:

commentsForAllPosts: computedProperty.mapAll('posts', 'comments')

Since commentsForAllPosts didn't exist on the model before our server-side engineer had his PR brought into the parent branch, both branches now add the same property and git freaks out with a conflict. Seeing that something already exists on the parent, we resolve the conflict by going with what is on the parent instead of the proper implementation (again, it will happen to even the most diligent teams).

The rest of the rebase goes along smoothly, tests all seem to pass. A PR is made.

And then some more code is written that relies on the commentsForAllPosts property and everything is broken. But who do we go and ask for help? git blame shows that the line was only ever written by the server-side engineer, and he throws up his hands.

Now your front-end engineer is out on vacation, sick leave, or who knows. No one can figure out what that code should look like!

Rebase has killed the team's ability to look at the history and find what went wrong: the conflict resolution is folded silently into a rewritten commit on the child branch, and the original code is gone from the shared history.

If this same merge conflict came up and merge was used, the history would still contain all three pieces: the commit on the parent branch, the original commit on the child branch, and the merge commit that reconciled them. Some toying around with the three permutations and you can get the original intent back into the code base and working without a ton of head scratching and finger pointing. And all it really cost you was another commit.
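The difference can be sketched with a toy commit graph. This is only a model under my own assumptions: the commit objects, their IDs, and the reachable helper are invented for illustration, and in both histories the conflict is resolved the wrong way, by keeping the parent branch's version of the line.

```javascript
// Collect every commit ID reachable from a tip by following parent links.
function reachable(tip) {
  const seen = new Set();
  const stack = [tip];
  while (stack.length > 0) {
    const commit = stack.pop();
    if (seen.has(commit.id)) continue;
    seen.add(commit.id);
    stack.push(...commit.parents);
  }
  return seen;
}

const base = { id: "base", parents: [] };

// The two commits that each added commentsForAllPosts.
const serverSide = { id: "server-side", parents: [base] };
const frontEnd = { id: "front-end", parents: [base] };

// Merge: one new commit with two parents, so both originals stay in history.
const merged = { id: "merge", parents: [frontEnd, serverSide] };

// Rebase: the front-end commit is replayed on top of the parent branch as a
// new commit, and nothing in the branch points at the original any more.
const rebased = { id: "front-end'", parents: [serverSide] };

const afterMerge = reachable(merged);
const afterRebase = reachable(rebased);

console.log(afterMerge.has("front-end"));  // prints true
console.log(afterRebase.has("front-end")); // prints false
```

With the merge, the proper implementation is still one git log away. With the rebase, the branch history only knows about the rewritten commit.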

Follow what the big boys do

One thing to note is how git itself works in terms of git pull. By default, git pull is really just shorthand for git fetch followed by git merge. Similarly, you have likely used PRs on GitHub and/or Bitbucket; by default these end in a merge as well, which once again creates a new merge commit. Merge is used throughout the core git flow and git internals… Rebase… Just isn't. So I take that as a bit of a guide for our workflow as well.

So what's your take on rebasing and merging on team and open source projects?