Git is a powerful tool. And, in my opinion, the only version control system software developers should be using in 2020. Perhaps there will be a new king in town one day. But in 2020, Git is the only solution worth investing your time in. Why? Because everyone uses it! Literally, everyone.
However, not everyone is an expert with git! These are three git commands that I believe you should NEVER use unless you truly understand what you’re doing!
Update 11/26/2020: This article got WAY more traction than I was expecting and there are a lot of comments with valid criticisms that are easier to address up front than individually. First, my apologies for the title and tone of the article. By “NEVER” I do not truly mean never. Although I am a fan of hyperbole, the idea I’m attempting to convey here is that these commands have the potential to increase complexity enough that it may be easier to just not use them and solve your problem more simply. For example, there are usually easier ways to handle pure dependencies than git submodule. Also, if you don’t have an overabundance of parallel active branches, I believe git merge to be generally easier to use and understand than git rebase.
However, I recognize that I am biased because I work almost exclusively on relatively small projects which solve relatively pointed problems with smallish teams of 1–7 engineers. I’m not the guy developing the next Google (yet!). I’m the guy solving specific problems for each of our clients. If you work on larger teams with highly complex projects, git rebase and git submodule may be necessary to allow the history for your project or your sub-components to be reasonably navigable. If you are in this camp, hopefully an article written on a bit of a whim by a random person on the internet should not change your mind on whether or not using git rebase or git submodule is the correct answer. Let the official git documentation guide you. https://git-scm.com/doc
Simply put, the git submodule command allows you to add a git repository into your git repository! However, managing a project where multiple submodules are used can be cumbersome.
git submodule add <repository location>
The above command adds the repository specified as a directory within your current git repository. This directory is itself a complete git repository. If you change directory into the new submodule you’ll notice that ALL of your git commands behave as if you just moved to a completely new repository. Because you have!
We’ll hereby reference the top-level repository as the “parent” repository and the submodule as the “child”. Within the parent repository, the directory which is the child submodule is treated more or less exactly how any other file within git would be treated. When the child repository is changed, you’ll see in the git status of the parent that the submodule has new commits or has modified content. Under the hood, from the perspective of the parent, a git submodule can be thought of as nothing more than a single entry that records a commit ID, plus a few lines in the .gitmodules file that record the remote location! The only way to update this entry, and thus the parent’s reference to the submodule, is to track a different commit ID. Only when the most recent commit (aka the HEAD) of the child repository changes can the parent repository git add/git commit the updated submodule.
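To make the "entry plus remote location" idea concrete, here is a minimal sketch that builds a throwaway parent and child and then looks at what the parent actually stores. The directory names (child, parent, libs/child) and the demo identity are made up for illustration, and the protocol.file.allow setting is only there because newer Git versions refuse to clone a submodule from a local path by default.

```shell
#!/bin/sh
# Sketch: what does the parent repository record about a submodule?
# Everything here lives in a temporary directory; all names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# 1. A small "child" repository with one commit.
git init -q child
echo "hello" > child/lib.txt
git -C child add lib.txt
git -C child commit -q -m "child: first commit"

# 2. A "parent" repository that adds the child as a submodule.
git init -q parent
cd parent
git -c protocol.file.allow=always submodule --quiet add "$work/child" libs/child
git commit -q -m "parent: add child submodule"

# 3. The remote location is recorded in the .gitmodules file ...
cat .gitmodules

# 4. ... and the pinned commit is recorded as a special tree entry
#    (mode 160000, type "commit") instead of a normal file or directory.
entry=$(git ls-tree HEAD libs/child)
echo "$entry"

mode=$(echo "$entry" | awk '{print $1}')
pinned=$(echo "$entry" | awk '{print $3}')
child_head=$(git -C "$work/child" rev-parse HEAD)
echo "parent pins $pinned, child HEAD is $child_head"
```

The last line should print the same commit ID twice: the parent holds no copy of lib.txt, only a pointer to one specific commit of the child.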
This intuitively means that if you are actively working within the child submodule, your workflow must also include managing the parent’s reference to the child. For instance, when you make a new commit in the child, you need to ALSO commit the parent’s reference to the child in order to keep the full project up to date. This can clearly become painful. Especially if you have many submodules or multiple levels of nested submodules. Each commit in any child repository equally requires updates to all references to that repository. This is why, in my opinion, submodules should only be used if the development team of the parent repository is different from the development team of the child. Submodules work well to keep track of references to external source code dependencies. However, I believe submodules do not work well to segment a single development team’s project into multiple sub-components. The resulting increase in complexity is not worth the additional effort required to maintain the parent/child relationship.
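The "commit twice" workflow described above can be sketched roughly as follows. This is an illustration under assumptions, not a recommended process: the repository names and paths are invented, and in a real project you would also push the child before pushing the parent so that the pinned commit exists on the remote.

```shell
#!/bin/sh
# Sketch: one change inside a submodule needs two commits, one per repository.
# All names and paths are hypothetical and live in a temporary directory.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a child repository and a parent that includes it as a submodule.
git init -q child
echo "v1" > child/lib.txt
git -C child add lib.txt
git -C child commit -q -m "child: v1"
git init -q parent
cd parent
git -c protocol.file.allow=always submodule --quiet add "$work/child" libs/child
git commit -q -m "parent: add child submodule"

# Commit number one: change and commit inside the submodule checkout.
echo "v2" > libs/child/lib.txt
git -C libs/child commit -q -am "child: v2"

# The parent now sees the submodule as modified, because its checked-out
# commit no longer matches the commit the parent has recorded.
before=$(git status --porcelain libs/child)
echo "before parent commit: '$before'"

# Commit number two: record the new child commit in the parent.
git add libs/child
git commit -q -m "parent: bump child to v2"

after=$(git status --porcelain libs/child)
echo "after parent commit:  '$after'"

pinned=$(git ls-tree HEAD libs/child | awk '{print $3}')
sub_head=$(git -C libs/child rev-parse HEAD)
```

Multiply the second commit by every parent that references the child, and by every level of nesting, and the bookkeeping cost becomes clear.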
An additional note of interest: when using git submodules you must recognize that the parent repository will store no backup of the contents of the submodule. Only a reference. If you do not own the location which the submodule references, there is no guarantee that what you’re referencing will continue to exist at that location forever. If you do not own the remote location, I highly suggest creating your own remote location which is a clone of the original location as opposed to depending on a third party to maintain their repository at that exact location forever.
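One way to follow that advice, sketched under assumptions: keep your own mirror of the third-party repository and point the submodule at the mirror instead of the original. Here "upstream" stands in for the third-party repository, "mirror.git" for a location you control, and vendor/lib for the submodule path; all three are hypothetical, and on a real server you would push the mirror to your own hosting rather than a temp directory.

```shell
#!/bin/sh
# Sketch: reference your own mirror of a dependency instead of the original.
# "upstream", "mirror.git" and "vendor/lib" are made-up names for this demo.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for a third-party repository you do not own.
git init -q upstream
echo "third-party code" > upstream/lib.txt
git -C upstream add lib.txt
git -C upstream commit -q -m "upstream: initial"

# Your own full copy of it (all branches and tags), under your control.
git clone -q --mirror "$work/upstream" "$work/mirror.git"

# The parent references the mirror, not the original location.
git init -q parent
cd parent
git -c protocol.file.allow=always submodule --quiet add "$work/mirror.git" vendor/lib
git commit -q -m "parent: add lib via our own mirror"

url=$(git config -f .gitmodules submodule.vendor/lib.url)
echo "submodule URL: $url"

upstream_head=$(git -C "$work/upstream" rev-parse HEAD)
mirror_head=$(git -C "$work/mirror.git" rev-parse HEAD)
```

If the original location ever disappears, the parent keeps working because the commit it pins still exists in the mirror.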
git filter-branch is a fascinatingly dangerous command which has an extremely useful niche use case.
git filter-branch --tree-filter '<terminal command>' HEAD
The above command will execute <terminal command> for every commit in the tree starting at and working backward from HEAD. The command will create a new commit with the new content for each commit. For example, say that you accidentally committed your private user credentials as a file in your git repository. You suddenly noticed this was the case several commits later. You could run the following command to re-write all your previous commits by removing that file from each commit.
git filter-branch --tree-filter 'rm -f filename' HEAD
Perfect! That file no longer exists in any commit of your rewritten history.
Why would you want to do this instead of just creating a new commit which removes that file? Because someone malicious could always go back to the previous commit and see your private credentials if you don’t remove the file from all past commits as well.
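Here is a small end-to-end sketch of that scenario in a throwaway repository, so you can watch the rewrite happen without risking real work. The file names (app.txt, secrets.txt) and the fake password are invented for the demo, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning pause that newer Git versions add before running filter-branch.

```shell
#!/bin/sh
# Sketch: remove an accidentally committed file from every commit on a branch.
# File names and contents are hypothetical; everything lives in a temp directory.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q repo
cd repo

echo "code" > app.txt
git add app.txt
git commit -q -m "add app"

echo "password=hunter2" > secrets.txt        # the mistake
git add secrets.txt
git commit -q -m "oops: add credentials"

echo "more code" >> app.txt
git commit -q -am "more work"                # noticed several commits later

old_head=$(git rev-parse HEAD)

# Re-create every commit reachable from HEAD with secrets.txt deleted.
git filter-branch -f --tree-filter 'rm -f secrets.txt' HEAD >/dev/null 2>&1

new_head=$(git rev-parse HEAD)

# No commit on the rewritten branch touches secrets.txt any more.
remaining=$(git log --oneline HEAD -- secrets.txt)
echo "commits still touching secrets.txt: '$remaining'"
echo "HEAD before: $old_head"
echo "HEAD after:  $new_head"
```

Note that the commit IDs change from the first rewritten commit onward, which is exactly what causes the collaboration problems described next.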
Great! We achieved what we wanted to do. This command seems wonderful! What’s the problem?!
The problem is that you’ve now essentially created a completely new repository with a completely new history. If you were working in a team with a shared remote repository, you just destroyed your ability to push or pull to/from the remote repository. To alleviate this, you could force push to the remote repository but then you’re potentially permanently destroying valuable changes.
Because there’s no longer a common history, comparing your local repository to the remote repository would need to be done manually or have been done prior to the git filter-branch. Additionally, after the force push, you’ll have destroyed your team’s ability to push/pull to/from the remote repository. This causes your team to need to delete their local branch or repository in order to clone/pull the new history.
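Also, the rewrite itself is not recorded as a commit, so nothing in your normal history shows what git filter-branch changed. Git does keep the pre-rewrite refs under refs/original/ (and in the reflog) for a while, but once those are cleaned up, or once you’ve force pushed over the shared remote, there’s no going back. You better be absolutely sure that the changes you’re introducing with git filter-branch are the changes you want!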
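One detail worth checking for yourself: to my understanding of the Git documentation, git filter-branch saves the pre-rewrite branch tips under refs/original/ in your local repository. The sketch below (same invented file names as before) shows that this backup is both a safety net and a leak, because the "removed" file can still be read from it until the backup ref and old objects are cleaned up.

```shell
#!/bin/sh
# Sketch: the pre-rewrite history survives locally under refs/original/.
# File names and contents are hypothetical; everything lives in a temp directory.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q repo
cd repo
echo "code" > app.txt
git add app.txt
git commit -q -m "add app"
echo "password=hunter2" > secrets.txt
git add secrets.txt
git commit -q -m "oops: add credentials"

branch=$(git symbolic-ref --short HEAD)      # master or main, depending on setup
old_head=$(git rev-parse HEAD)

git filter-branch -f --tree-filter 'rm -f secrets.txt' HEAD >/dev/null 2>&1

# The old branch tip is still reachable through the backup ref ...
backup=$(git rev-parse "refs/original/refs/heads/$branch")
echo "backup ref points at: $backup"

# ... and so is the file we thought we had removed.
leaked=$(git show "$backup:secrets.txt")
echo "still readable from backup: $leaked"

# To undo the rewrite locally you could reset the branch to the backup:
#   git reset --hard "refs/original/refs/heads/$branch"
# To really purge the old commits you would instead delete the backup ref,
# expire the reflog and garbage-collect (not run in this sketch).
```

Either way, credentials that ever reached a shared remote should be treated as compromised and rotated, regardless of what you do to the history.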
In general, re-writing history is ill-advised for all of the aforementioned reasons.
git rebase is perhaps the most controversial git command in existence. There are many development teams that mandate git rebase over git merge. Personally, I am a fan of the git merge approach. For one simple reason. git rebase rewrites history and rewriting history is inherently dangerous.
Let me explain the difference between git merge and git rebase before I further explain why I prefer git merge.
Both git merge and git rebase solve the same problem. They solve the problem of combining two branches with parallel changes into a single branch with one history. Let’s examine the case where we have a master branch and a feature branch as depicted in the following image.
The feature branch was created from the master branch at the “m2” commit. The feature branch then made changes to the repository in commits “f1” and “f2”. While the “f1” and “f2” changes were being developed, the master branch was updated with the “m3” commit. How do we update the master branch with the changes created in the feature branch?
git merge does this by preserving all of the commits of the feature branch. On the master branch, a new commit will be created which has a historical relationship to both the HEAD of the master branch and the HEAD of the feature branch. This merge commit exists to resolve the final states of the two branches. If the changes introduced between the two branches are not able to be resolved automatically, a merge conflict occurs and the user must resolve the conflict by deciding what changes should be kept prior to the creation of the merge commit. The direction of merging the master branch into the feature branch or merging the feature branch into the master branch generally doesn’t affect the resulting resolution. All that changes is on which branch the merge commit will be initially created.
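The merge case can be reproduced in a scratch repository. This sketch builds the m1/m2/m3 and f1/f2 history described above (each commit just adds a file named after itself, which is my own simplification) and then checks that the merge commit really has two parents while the feature commits stay untouched.

```shell
#!/bin/sh
# Sketch: merge a feature branch into master and inspect the merge commit.
# The commit() helper and the one-file-per-commit layout are demo assumptions.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master     # make sure the first branch is "master"

commit() {                                   # create <name>.txt and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit m1
commit m2
git checkout -q -b feature                   # feature branches off at m2
commit f1
commit f2
git checkout -q master
commit m3                                    # master moves on in parallel

m3=$(git rev-parse master)
f2=$(git rev-parse feature)

git merge -q --no-edit feature               # creates the merge commit on master

parent1=$(git rev-parse HEAD^1)              # first parent: old tip of master
parent2=$(git rev-parse HEAD^2)              # second parent: tip of feature
git --no-pager log --oneline --graph
```

The graph output shows the two lines of history joining at the merge commit, and f1 and f2 keep the commit IDs they had before the merge.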
git rebase, however, achieves resolving the two branches into one history by moving commits within history. For example, instead of the changes of the “f1” commit being applied to “m2”, git rebase will store the changes made in “f1” and “f2”, move the feature branch to instead be based upon “m3”, then apply “f1” and “f2” upon “m3”. If there is a conflict, you must resolve it one commit at a time as each commit is applied to the new base (in our example, conflicts may occur when applying “f1” and again when applying “f2” upon “m3”).
git rebase is great because the resulting git history is a straight line. There’s no single commit that has more than one parent. However, after the rebase, commits “f1” and “f2” are no longer true preservations of the original “f1” and “f2” commits. Their commit IDs will be different. Their content may be different. And as a result, you fall into the same issue which git filter-branch created. If you’re working with a shared remote repository and conducted the rebase locally, you’ll need to force push the rebased changes to the remote repository. A force push risks permanently removing valuable changes that may have been on the remote repository. Additionally, a force push destroys your collaborators’ ability to push/pull to/from that branch on the remote repository.
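The same scratch history can be rebased instead of merged to see the commit IDs change. As before, the commit() helper and the one-file-per-commit layout are my own simplification for the demo, and no remote is involved, so the force push itself is only mentioned in a comment.

```shell
#!/bin/sh
# Sketch: rebase feature onto master and compare commit IDs before and after.
# The commit() helper and file layout are demo assumptions.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master     # make sure the first branch is "master"

commit() {                                   # create <name>.txt and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit m1
commit m2
git checkout -q -b feature                   # feature branches off at m2
commit f1
commit f2
git checkout -q master
commit m3                                    # master moves on in parallel

m3=$(git rev-parse master)
old_f1=$(git rev-parse feature~1)
old_f2=$(git rev-parse feature)

git checkout -q feature
git rebase -q master                         # replay f1 and f2 on top of m3

new_f1=$(git rev-parse feature~1)
new_f2=$(git rev-parse feature)
base=$(git merge-base master feature)        # feature now starts at m3
merges=$(git rev-list --merges feature)      # empty: history is a straight line

echo "f1: $old_f1 -> $new_f1"
echo "f2: $old_f2 -> $new_f2"
git --no-pager log --oneline --graph

# If the old f1/f2 had already been pushed, updating the remote branch would
# now require a force push (for example: git push --force-with-lease).
```

Both feature commits get new IDs even though their changes are identical, which is why anyone who had pulled the old f1 and f2 ends up with a history that no longer matches the branch.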
Generally, teams restrict the use of git rebase to pull requests into the master branch, thus avoiding the potential for collaboration issues. However, I still prefer to never rewrite history.
When working with a team of developers, it is important for git to not get in the way of seamless collaboration. Generally speaking, behavior within git that re-writes history or alters the convention of how most people interact with git should be avoided. Git submodules alter the general git convention by causing changes to be tracked on multiple levels as opposed to just once for the entire project. git filter-branch and git rebase both re-write your git history. And we discussed that there are complications that arise when the git history is changed. In my opinion, git rebase has little advantage over git merge and should almost always be avoided. However, git filter-branch has its uses in dire situations. It’s an “Oh $#!&” button more than it is a useful tool. Good to know it exists, but hopefully you’ll never need to use it.