I recently had an interesting question presented to me:
What happens if you have tagged a commit and the branch containing it is later squashed?
When commits are squashed, their individual history is normally lost. If part of that history is tagged, is the tag lost? Does it get updated to reflect the new history? To understand what happens and why, we need to explore what really happens when you squash your commits.
Squashing
If you’re not familiar with the concept, a squash rewrites the history by combining one or more commits into a single commit. This alters the branch.
For example, a branch with three commits becomes a branch with a single commit after the squash.
At the end of the operation, the original commits B, C, and D are combined into a single commit, D'.
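To make the rewrite concrete, here is a minimal sketch in Python. It is a toy model, not Git’s real object format: the Commit class, its id property, and the squash helper are all hypothetical names invented for illustration. It assumes a commit is just a message, a snapshot of the files, and a link to its parent.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: a message, a full file snapshot, and a parent link."""
    message: str
    snapshot: tuple  # (path, content) pairs describing every file
    parent: Optional["Commit"] = None

    @property
    def id(self) -> str:
        # The id depends on the parent's id, so changing the ancestry
        # always produces a different id (an assumption that mirrors Git).
        parent_id = self.parent.id if self.parent else ""
        payload = repr((self.message, self.snapshot, parent_id)).encode()
        return hashlib.sha1(payload).hexdigest()[:8]


def squash(base: Commit, tip: Commit, message: str) -> Commit:
    """Create one new commit on top of base that carries tip's snapshot."""
    return Commit(message=message, snapshot=tip.snapshot, parent=base)


a = Commit("A", (("file.txt", "v0"),))
b = Commit("B", (("file.txt", "v1"),), a)
c = Commit("C", (("file.txt", "v2"),), b)
d = Commit("D", (("file.txt", "v3"),), c)

d_prime = squash(a, d, "D'")

print(d_prime.snapshot == d.snapshot)  # same files as D
print(d_prime.id != d.id)              # but a different commit
print(d_prime.parent is a)             # sitting directly on top of A
print(d.parent is c)                   # the original chain is untouched
```

The point of the sketch is that nothing is edited in place. The squash builds a brand-new commit next to the old ones, and B, C, and D are left exactly as they were.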
Squashing away the past
So what happens if I have a branch with a tag and it gets squashed? As an example, assume there is a main branch with three commits. The last one is tagged v1.
When those commits are squashed, a parallel history is created. The branch that was squashed now contains a new history, in which A3' represents the squashed versions of A1, A2, and A3. The tag, however, continues to reference A3. The original commits are separated from the main branch, and the tag continues to point to the original commit. The result looks like this:
The original commits are no longer part of any branch, but they continue to exist.
What happened?
The original commits normally seem to disappear because they are no longer referenced by any branch. They are dangling commits, meaning they have become unreachable unless you know the commit’s hash. A dangling commit will eventually be garbage collected by Git and removed, but not until the reflog entries that still mention it have expired. By default, Git keeps reflog entries for 90 days, or 30 days for entries that are no longer reachable from the branch’s current tip, which is the case for commits that were squashed away. That said, it can happen sooner, especially if you force Git to expire the reflog sooner.
When the squash occurs, a new commit is created that contains all of the changed files. After that, the branch is updated to point to the new commit instead of the original ones. Normally, that would leave the original commits unreferenced. Unreferenced commits are eligible for garbage collection.
That’s what happens in the first example. The commits B, C, and D still exist, but they aren’t referenced by any tag or branch. Without a reference, they eventually disappear. If this happens on the client side with commits that were never pushed, those commits will never reach the remote. It’s as if they never happened.
When a commit is tagged, a reference to that commit is created in Git. The tag points to the commit. This is why the original commits are still present. While the branch was updated to point to the new commit (A3'), the v1 tag continues to point to its original commit, A3. That commit points to A2, which in turn points to A1. None of those commits are dangling: the tag creates a reference that makes each of them reachable. Because those commits are referenced, they are not eligible for garbage collection.
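The reachability rule can be sketched in a few lines of Python. This is a simplified model rather than Git’s actual implementation: the reachable function and the dictionaries are hypothetical, and real Git also treats reflog entries and a few other things as starting points. The commit names match the tagged example above.

```python
def reachable(refs, parents):
    """Return every commit that can be reached by walking back from a ref."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# Each commit maps to its parents. A3' squashes the whole branch,
# so in this model it has no parent of its own.
parents = {
    "A1": [],
    "A2": ["A1"],
    "A3": ["A2"],
    "A3'": [],
}
all_commits = set(parents)

refs_without_tag = {"main": "A3'"}
refs_with_tag = {"main": "A3'", "v1": "A3"}

# Commits that no ref can reach are the candidates for garbage collection.
unreachable_without_tag = all_commits - reachable(refs_without_tag, parents)
unreachable_with_tag = all_commits - reachable(refs_with_tag, parents)

print(sorted(unreachable_without_tag))  # the three original commits
print(sorted(unreachable_with_tag))     # nothing: the tag keeps them alive
```

The tag only names A3 directly. A2 and A1 survive because the walk follows parent links, which is why a single tag is enough to keep the entire original history.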
The end result is that both histories remain, and the two histories are now divergent. Interestingly, the underlying blobs may remain unchanged. That’s because Git identifies each file’s contents by a SHA hash of those contents, so two identical files carry the same hash no matter which commit they belong to. That’s a story for a different post…
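As a small preview of that story, here is a sketch of how a blob’s ID can be computed. It assumes a repository that uses SHA-1, which is Git’s default object format; the git_blob_id function name and the sample file contents are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash file contents the way a SHA-1 repository names a blob."""
    # A blob is hashed as: the word "blob", the size in bytes,
    # a NUL byte, and then the raw contents.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The same contents, as they might appear in A3 and in the squashed A3'.
in_original_history = git_blob_id(b"release notes for v1\n")
in_squashed_history = git_blob_id(b"release notes for v1\n")
after_an_edit = git_blob_id(b"release notes for v2\n")

print(in_original_history == in_squashed_history)  # identical contents, identical ID
print(in_original_history != after_an_edit)        # any change yields a new ID
```

Nothing about the commit or the branch goes into the hash, only the file’s contents. That is why the squashed commit can share its blobs with the original history.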
Happy DevOp’ing!