Inside Git’s Brain: How Commits Are Stored and Linked
What Really Happens When You Hit Commit!
We’ve all been there: working on a feature, writing some amazing code, and shipping it. Some time later, BOOM! Something breaks.
Now we need to undo our changes and revert to a previous stable state, hoping this will fix the issue.
But “revert” strikes fear among developers (at least I used to fear it). I always thought that I would surely mess things up.
But here’s the truth: the only way to conquer that fear is to understand how Git actually works.
Not just the commands, but the WHY and HOW behind them.
What is inside Git?
Git has 4 types of objects, all stored in .git/objects:
Blob – file content (no filename, just raw data)
Tree – directory structure (lists blobs + sub-trees + names)
Commit – points to a tree, contains metadata (author, message, timestamp)
Tag – references another object (usually a commit), used for marking versions
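To make "stored in .git/objects" a bit more concrete, here is a minimal sketch of how an object gets its ID. It assumes a repository using the default SHA-1 object format; the helper name git_object_id is made up for this example. Git hashes a small header (the type, the size, and a NUL byte) followed by the raw content.

```python
import hashlib

def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 ID for an object: hash of '<type> <size>\\0' + content.

    Assumes the default SHA-1 object format; the function name is hypothetical.
    """
    header = f"{obj_type} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()

# The ID depends only on the content, never on a filename,
# so two files with identical content share one blob.
empty_blob = git_object_id("blob", b"")
hello_blob = git_object_id("blob", b"hello world\n")
print(empty_blob)
print(hello_blob)
```

The printed IDs can be compared with what git hash-object reports for files with the same content. On disk, the object is zlib-compressed and stored under .git/objects, in a directory named after the first two hex characters of the ID.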
But how does all this create such an intricate system? Let’s understand it.
Suppose we have the following directory —
repo/
├── a.txt
├── b.txt
└── subdir/
    └── c.txt
Each file (a.txt, b.txt, c.txt) is stored as a blob.
Each directory (subdir/) is stored as a tree, which lists the blobs (and other trees) it contains.
The root directory is also a tree, listing a.txt, b.txt, and the subdir tree.
We ran the following commands:
git init
git add .
git commit -m "Initial Commit"
We initialized git in this repo
We added all the files to git
We created a commit (let’s call it commit A)
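We can picture this commit and its tree snapshot as a graph: commit A points to the root tree, which points to the blobs for a.txt and b.txt and to the subdir tree, which in turn points to the blob for c.txt.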
Now, let’s do the following operation —
Change "a.txt"
git add a.txt
git commit -m "Changed file a"
How does the graph look now?
Graph after the next commit (Commit B)
A new commit B was created with commit A as its parent. Like every commit, B is just a pointer to a root tree snapshot, in this case a new one.
Only the changed file (a.txt) gets a new blob; the unchanged files reuse their existing blobs.
Git doesn’t store a diff here; the new blob holds the entire content of the file.
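The sharing of unchanged objects between commit A and commit B can be sketched with a toy in-memory object store. This is a simplified model, not Git’s real on-disk format: the names store, put, write_tree, write_commit, and snapshot are invented for the example, the file contents are made up, and real tree and commit objects carry more fields (modes, author, timestamps).

```python
import hashlib
from typing import Optional

store = {}  # hypothetical in-memory stand-in for .git/objects

def put(kind: str, payload: bytes) -> str:
    """Store an object under a content-derived ID and return that ID."""
    header = f"{kind} {len(payload)}".encode() + b"\x00"
    oid = hashlib.sha1(header + payload).hexdigest()
    store[oid] = (kind, payload)
    return oid

def write_tree(entries: dict) -> str:
    """entries maps a name to the ID of a blob or sub-tree (simplified format)."""
    body = "\n".join(f"{name} {oid}" for name, oid in sorted(entries.items()))
    return put("tree", body.encode())

def write_commit(tree: str, parent: Optional[str], message: str) -> str:
    """A commit points to one root tree and, except for the first, a parent."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    return put("commit", ("\n".join(lines) + f"\n\n{message}\n").encode())

def snapshot(a_content: bytes) -> dict:
    """Build the objects for repo/ with the given a.txt content."""
    a = put("blob", a_content)
    b = put("blob", b"content of b\n")
    c = put("blob", b"content of c\n")
    subdir = write_tree({"c.txt": c})
    root = write_tree({"a.txt": a, "b.txt": b, "subdir": subdir})
    return {"a": a, "b": b, "subdir": subdir, "root": root}

snap_a = snapshot(b"a, version 1\n")
commit_a = write_commit(snap_a["root"], None, "Initial Commit")
snap_b = snapshot(b"a, version 2\n")
commit_b = write_commit(snap_b["root"], commit_a, "Changed file a")

print(snap_a["a"] != snap_b["a"])            # a.txt got a new blob
print(snap_a["b"] == snap_b["b"])            # the b.txt blob is reused
print(snap_a["subdir"] == snap_b["subdir"])  # the subdir tree is reused
print(snap_a["root"] != snap_b["root"])      # the root tree is new
```

Because IDs are derived from content, writing the same blob or tree twice lands on the same key, which is why the second snapshot only adds one blob, one root tree, and one commit to the store.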
How does git know the exact lines/characters that changed?
In the above example, we changed a.txt. If we run git diff between the two commits (for example, git diff HEAD~1 HEAD), it shows exactly which lines changed.
Git stores the blobs of different versions of the file.
a.txt → v1 → blob1
a.txt → v2 → blob2
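Now the whole thing comes down to calculating the difference between these blobs, and for that, Git uses some clever comparison algorithms.
By default, Git uses the Myers diff algorithm for line-level comparison. Other algorithms, such as patience and histogram, can be selected (for example with git diff --histogram) and often produce more readable diffs when code has been moved around.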
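The idea that a diff is computed on demand from two full blobs can be illustrated in a few lines. Note the assumption: this sketch uses Python’s difflib, which has its own matching algorithm and is not Myers or histogram; the blob contents are made up. It only shows the principle of comparing two stored versions.

```python
import difflib

blob1 = "apple\nbanana\ncherry\n"     # a.txt, version 1 (made-up content)
blob2 = "apple\nblueberry\ncherry\n"  # a.txt, version 2 (made-up content)

# Compare the two versions line by line and emit a unified diff.
diff_lines = list(difflib.unified_diff(
    blob1.splitlines(keepends=True),
    blob2.splitlines(keepends=True),
    fromfile="a/a.txt",
    tofile="b/a.txt",
))
print("".join(diff_lines), end="")
```

Nothing about the change itself was stored anywhere; both versions exist in full, and the removed and added lines fall out of the comparison.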
Now that we are acquainted with the underlying mechanism that git is composed of, let’s look at working with it.
Lifecycle of a change in git
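Every change in Git moves through 3 areas: it starts in the working directory, is staged in the index, and ends up in a commit that HEAD points to. Let’s look at them, starting from HEAD: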
HEAD
A pointer to the latest commit on the current branch.
It’s what the repo is based on right now.
It’s like asking: "What version of the repo am I currently looking at?"
HEAD can be moved around using commands like checkout, reset, and revert.
INDEX (Staging Area)
A middle layer between your work and Git’s history.
It’s what gets included in the next commit.
When we do git add a.txt, Git copies the current version of a.txt into the index.
A commit is made from whatever is in the index, not necessarily what’s in the working directory. That’s why we need to “add” the changed files we want in the next commit.
Working Directory
What we actually see in the code editor.
These are files on disk, not in Git yet.
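The relationship between the three areas can be sketched with plain dictionaries. This is a hypothetical, heavily simplified model (each area maps a path to its content, and the names working_dir, index, commits, add, and commit are invented here), but it shows why an edit that was never added does not end up in the commit.

```python
# Hypothetical, simplified model: each area is a dict of path -> content.
working_dir = {"a.txt": "v1", "b.txt": "v1"}
index = dict(working_dir)   # staging area after the initial "git add ."
commits = [dict(index)]     # list of snapshots; the last one plays the role of HEAD

def add(path: str) -> None:
    """Like 'git add <path>': copy the working-dir version into the index."""
    index[path] = working_dir[path]

def commit() -> dict:
    """Like 'git commit': snapshot the index, not the working directory."""
    commits.append(dict(index))
    return commits[-1]

working_dir["a.txt"] = "v2"   # edit both files on disk
working_dir["b.txt"] = "v2"
add("a.txt")                  # but stage only a.txt
head = commit()

print(head)          # {'a.txt': 'v2', 'b.txt': 'v1'}
print(working_dir)   # {'a.txt': 'v2', 'b.txt': 'v2'}
```

The commit contains the new a.txt but the old b.txt, because only a.txt was copied into the index before committing.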
Running Commands
We will look at how the commit graph changes on running different commands.
This is how our initial commit graph looks: three commits in a line, A ← B ← C.
The current HEAD is at C, which is on the main branch.
Git Checkout
Let’s run the following commands and see how the above graph changes —
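1. The current branch is main and the current commit is C
2. git checkout -b feature (creates and switches to a new branch, starting from commit C on main)
3. git commit -m "new changes" (Commit D)
4. git commit -m "new changes again" (Commit E)
After these steps, feature points to E (C ← D ← E), while main still points to C.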
Git Revert
Let’s do the revert now:
git revert C
What does normal revert do?
A new commit C' is added.
C' undoes the changes introduced in C, but keeps history.
This is simple as long as commit C is not a merge commit. If it is, it’s an entirely different story I might cover in the next part.
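What “undoes the changes but keeps history” means can be sketched with a toy model. The assumptions are loud here: commits are reduced to (parent, snapshot) pairs, the revert works on whole files rather than on lines, the target must have a parent, and no later commit has touched the same files. Real git revert applies the inverse patch and can run into conflicts. The names commits, head, and revert are invented for this example.

```python
# Hypothetical toy model: a commit is (parent_id, snapshot),
# and a snapshot maps path -> content.
commits = {
    "A": (None, {"a.txt": "v1"}),
    "B": ("A", {"a.txt": "v1", "b.txt": "v1"}),
    "C": ("B", {"a.txt": "v2", "b.txt": "v1"}),
}
head = "C"

def revert(target: str, new_id: str) -> str:
    """Add a commit on top of head that undoes what `target` changed.

    File-level only; assumes `target` has a parent.
    """
    parent_id, after = commits[target]
    before = commits[parent_id][1]
    snapshot = dict(commits[head][1])
    for path in set(before) | set(after):
        if before.get(path) != after.get(path):
            if path in before:
                snapshot[path] = before[path]   # restore the old content
            else:
                snapshot.pop(path, None)        # target added the file, so remove it
    commits[new_id] = (head, snapshot)
    return new_id

head = revert("C", "C'")
print(commits[head])   # ('C', {'a.txt': 'v1', 'b.txt': 'v1'})
print("C" in commits)  # True
```

The new commit C' has C as its parent, so C is still part of the history; only the content goes back to what it was before C.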
Git Reset
Let’s do the infamous reset now:
git reset --hard HEAD~1
Let’s understand each part of the command first:
Reset
This moves the current branch, and HEAD along with it, to a different commit. Depending on the mode (--soft, --mixed, --hard), it also changes the state of:
Index (staging area)
Working directory
So, git reset <target> says:
“Pretend the last commit(s) never happened and go back to this <target>.”
--hard
--soft → only HEAD changes → keeps the staging area and working dir as-is
--mixed → HEAD + index change → resets the staging area, but keeps the working dir (this is the default mode)
--hard → HEAD + index + working dir change → everything goes back to the target commit, and files are overwritten
So --hard means:
Reset HEAD, wipe the staging area, and make the files on disk match the target commit.
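The difference between the three modes can be sketched as a small function over a toy repository state. This is a hypothetical model: snapshots, reset, and the state dictionary are names made up for the example, and each area is reduced to a dict of path to content.

```python
# Hypothetical toy model of the three reset modes; snapshots map path -> content.
snapshots = {"B": {"a.txt": "v1"}, "C": {"a.txt": "v2"}}

def reset(state: dict, target: str, mode: str = "mixed") -> dict:
    """Return the new repo state after 'git reset --<mode> <target>'."""
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    new = {
        "head": target,                            # every mode moves HEAD
        "index": dict(state["index"]),
        "working_dir": dict(state["working_dir"]),
    }
    if mode in ("mixed", "hard"):
        new["index"] = dict(snapshots[target])       # staging area is reset
    if mode == "hard":
        new["working_dir"] = dict(snapshots[target])  # files on disk are overwritten
    return new

# Start at C with a clean index and working directory.
at_c = {"head": "C", "index": {"a.txt": "v2"}, "working_dir": {"a.txt": "v2"}}

for mode in ("soft", "mixed", "hard"):
    print(mode, reset(at_c, "B", mode))
```

After a soft reset the v2 content is still staged, after a mixed reset it only survives in the working directory, and after a hard reset it is gone from both.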
~ (tilde)
This is the commit ancestry operator.
HEAD~1 means → “The first parent of the current commit.”
HEAD~2 means → “The grandparent (two steps back).”
In our above example —
HEAD~0 → C
HEAD~1 → B
HEAD~2 → A
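Resolving HEAD~n is just a walk along first-parent links. A minimal sketch, assuming the linear history from the example; the first_parent mapping and the ancestor function are hypothetical names for illustration.

```python
# Hypothetical linear history from the example: A <- B <- C,
# stored as child -> first parent.
first_parent = {"C": "B", "B": "A", "A": None}

def ancestor(commit: str, n: int) -> str:
    """Resolve '<commit>~n' by following the first parent n times."""
    for _ in range(n):
        commit = first_parent[commit]
        if commit is None:
            raise ValueError("history does not go back that far")
    return commit

head = "C"
print([ancestor(head, n) for n in range(3)])   # ['C', 'B', 'A']
```

Asking for HEAD~3 in this history raises an error in the sketch, since A has no parent to step back to.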
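Now, all together, git reset --hard HEAD~1 means:
HEAD and main move back to B.
C is no longer reachable from main. If nothing else references it, it is effectively lost (it stays recoverable through git reflog until Git garbage-collects it).
Working dir & index are forcibly set to the state at B.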
Git Merge
Suppose we have the following commit graph, where main is at C, feature is at E, and the two branches diverged at B:
Now, we want to merge the branch “feature” into “main”. We do the following —
git checkout main
git merge feature
The above graph will look like this now —
Git creates a new merge commit M.
M has two parents: C (from main) and E (from feature).
The histories of both branches are now joined.
What’s inside a merge commit?
Git compares the two branches (C and E) and finds their common ancestor (B).
It performs a three-way merge:
Base = B
Ours = C (the branch we’re on)
Theirs = E (the branch we're merging in)
It auto-resolves changes where possible and creates a new snapshot = M.
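The three-way rule (take whichever side changed relative to the base, and flag a conflict when both changed differently) can be sketched at file level. This is a simplification under stated assumptions: real Git applies the same idea line by line inside each file, the snapshot contents below are made up, and three_way_merge is a hypothetical name.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """File-level three-way merge. Returns (merged_snapshot, conflicting_paths)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o              # both sides agree (or neither changed)
        elif o == b:
            result = t              # only theirs changed it
        elif t == b:
            result = o              # only ours changed it
        else:
            conflicts.append(path)  # both changed it, in different ways
            continue
        if result is not None:      # None means the file was deleted
            merged[path] = result
    return merged, conflicts

base = {"a.txt": "v1", "b.txt": "v1"}                    # commit B
ours = {"a.txt": "v2", "b.txt": "v1"}                    # commit C on main
theirs = {"a.txt": "v1", "b.txt": "v1", "d.txt": "new"}  # commit E on feature

print(three_way_merge(base, ours, theirs))
# ({'a.txt': 'v2', 'b.txt': 'v1', 'd.txt': 'new'}, [])
```

Without the base, there would be no way to tell whether a.txt should be v1 or v2; the common ancestor is what reveals that only main changed it.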
Merge Conflicts
If both branches changed the same line or area, Git will pause and show conflicts like:
Auto-merging a.txt
CONFLICT (content): Merge conflict in a.txt
We must resolve manually and then do:
git add a.txt
git commit    # creates the merge commit
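Before we run git add, the conflicting file contains both versions, wrapped in conflict markers. The sketch below renders such a region in Git’s default marker style; the function name conflict_hunk and the file content are made up for illustration.

```python
def conflict_hunk(ours: str, theirs: str,
                  ours_label: str = "HEAD", theirs_label: str = "feature") -> str:
    """Render a conflicting region the way it appears inside the file."""
    return (
        f"<<<<<<< {ours_label}\n"
        f"{ours}\n"
        "=======\n"
        f"{theirs}\n"
        f">>>>>>> {theirs_label}\n"
    )

text = conflict_hunk("color = blue", "color = green")
print(text, end="")

# Resolving means replacing the whole marked region with the content we want.
resolved = "color = green\n"
```

Everything between the first marker and the separator is our side, and everything between the separator and the last marker is the incoming side. Once the markers are gone and the file holds the content we want, git add marks the conflict as resolved.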
In Brief
Git stores data as objects: blob (file content), tree (directories), commit (snapshot + metadata), and tag (reference for versioning).
Every file change creates a new blob. Git doesn't store diffs; it stores full snapshots.
Diffing is done by comparing blobs using algorithms like Myers and Histogram.
Three key areas in Git:
HEAD → points to the latest commit
INDEX (Staging Area) → holds changes ready to commit
Working Directory → actual files we’re editing
Revert creates a new commit that undoes changes from a previous commit without altering history.
Reset moves HEAD back to an earlier commit:
--soft: only updates HEAD
--mixed: resets index too
--hard: resets everything, including the working directory
Merge combines histories from two branches and creates a merge commit with two parents.
Three-way merge uses:
Base = common ancestor
Ours = current branch
Theirs = merging branch
Conflicts must be resolved manually when the same lines are changed on both branches.
Wrapping Up
As software engineers, we use Git multiple times a day, committing, pushing, pulling, and checking out branches.
But the moment we need to do something like revert or reset, many of us hesitate. Why? Because we’re not used to how these commands work under the hood.
By understanding how Git is structured internally (blobs, trees, commits, and the graph that links them), we stop relying on memorization and start using Git intuitively.
It becomes a tool we control, not something we’re afraid to break.
Feel free to let me know in the comments if you want me to cover some more operations in git in the same detail as I did here.
As always, thanks for reading this. And I will be back with another interesting topic next week.
Stay tuned!