Last modified 2023-08-01.
Time to read: 5 minutes.
When on an expert witness assignment, I often inspect git repositories provided by the opposing party's lawyers. Quite often, those repositories have issues that seriously impede git operation. Although the motivation for a legal team to hamper the work of an opposing expert is readily apparent, often it is technical ignorance and not malice that causes problems.
This page contains my notes on:
- Preventing problems for parties that need to share a git repository
- Overcoming issues with git repositories provided by the other party, whether deliberately caused or due to honest mistakes
Typical Git Files and Directories
Following are the files and directories found in the .git directory of a typical repository after checking in a few files.
The 256 subdirectories of the objects/ directory are not shown for clarity.
The logs/refs/remotes directories are also not shown, since they are unnecessary when there is no internet access.
$ find .git \
    -not -path ".git/objects/*" \
    -not -path ".git/hooks*" \
    -not -path ".git/logs/refs/remotes*" | \
  sed -E 's^.git/?^^' | \
  column -c 80
HEAD info/exclude COMMIT_EDITMSG objects ORIG_HEAD index
config refs logs refs/remotes logs/HEAD refs/remotes/origin
logs/refs logs/refs/heads refs/remotes/origin/master refs/tags
logs/refs/heads/master refs/heads description branches
refs/heads/master info FETCH_HEAD
Maximizing Good Will
This section is dedicated to preventing problems for parties that need to share a git repository.
Litigation is by nature an adversarial activity. When computer software changes evidence without notice or warning, people tend to blame each other.
A dangling commit is a commit that cannot be reached from any branch, tag, or other reference, and that no other commit points to. One way to make a dangling commit is to make a commit on a detached HEAD and then check out a branch.
Git runs garbage collection periodically. This happens without warning. One of the functions that git garbage collection performs is to delete (prune) dangling commits once they are no longer protected by the reflog and are older than a grace period (two weeks by default). There is no message to alert the user that dangling commits were found or that they were pruned.
This can lead to investigators experiencing files disappearing from a git repo after a period of time, as if they were written in the digital equivalent of disappearing ink. The party that obtained the git repository might accuse the other party of destroying evidence.
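The situation described above can be reproduced in a throwaway repository. This is only a minimal sketch: the file names, the identity settings, and the variables repo, orphan, and report are made up for illustration. It makes a commit on a detached HEAD, returns to master, and asks git fsck what can no longer be reached.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # use 'master' whatever the local default is
git config user.email "expert@example.com"
git config user.name "Example Expert"

echo one > a.txt && git add a.txt && git commit -q -m "first commit on master"

# Commit on a detached HEAD, then return to master: the new commit has no name.
git checkout -q --detach
echo two > b.txt && git add b.txt && git commit -q -m "made on a detached HEAD"
orphan="$(git rev-parse HEAD)"
git checkout -q master 2>/dev/null

# --no-reflogs ignores the reflog entries that still remember the commit.
report="$(git fsck --no-reflogs --unreachable 2>/dev/null)"
echo "$report"
```

The report should contain a line for the commit stored in orphan, along with the tree and blob that only that commit uses.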
To avoid this potentially very damaging accusation, three actions should be performed, in this order, before giving a git repository to another party:
- Name any dangling commits that you want to preserve
- Verify the integrity of the git object database
- Run the garbage collection with extra care
Naming Dangling Commits
If you want to preserve a dangling commit, give it a name. Do this before performing the other two actions, described next.
Giving names to dangling commits prevents the garbage collector from deleting them.
You can name a dangling commit by creating an annotated tag.
The following example creates an annotated tag called dangle1 that points to the dangling commit with the given hash:
$ git tag -m 'Named this dangling commit' -a dangle1 283492384928349823
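The effect of naming can be checked in a scratch repository. The sketch below is an assumption-laden illustration (made-up file names and identity; the variables orphan and report are hypothetical): it creates a dangling commit, tags it as dangle1, and then confirms that git fsck no longer lists it as unreachable.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email "expert@example.com"
git config user.name "Example Expert"

echo one > a.txt && git add a.txt && git commit -q -m "first commit on master"
git checkout -q --detach
echo two > b.txt && git add b.txt && git commit -q -m "made on a detached HEAD"
orphan="$(git rev-parse HEAD)"
git checkout -q master 2>/dev/null

# Give the dangling commit a name, as described in the text.
git tag -m 'Named this dangling commit' -a dangle1 "$orphan"

# The tag is a reference, so the commit is reachable again.
report="$(git fsck --no-reflogs --unreachable 2>/dev/null)"
tagged="$(git rev-parse 'dangle1^{commit}')"
echo "tag dangle1 points to $tagged"
```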
Verifying Data Integrity
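The following verifies the integrity of the repository's object database, and reports dangling and unreachable objects. It does not delete anything. The --no-reflogs option makes commits that are only remembered by the reflog count as unreachable, so they appear in the report.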
$ git fsck --unreachable --dangling --no-reflogs
Extra-Careful Garbage Collection
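The following runs garbage collection with the --aggressive option, which spends extra time optimizing how objects are packed. Dangling commits that were not named remain subject to pruning, which is why naming comes first: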
$ git gc --aggressive
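To see why the order of the three actions matters, the sketch below creates two dangling commits, names only one of them, and then forces immediate pruning. The reflog expire step and the --prune=now option are assumptions added purely so that the effect is visible right away; they are not part of the procedure above. All file and variable names are hypothetical.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email "expert@example.com"
git config user.name "Example Expert"
echo one > a.txt && git add a.txt && git commit -q -m "first commit on master"

# First dangling commit: this one will be named.
git checkout -q --detach
echo keep > k.txt && git add k.txt && git commit -q -m "dangling, will be named"
kept="$(git rev-parse HEAD)"
git checkout -q master 2>/dev/null

# Second dangling commit: this one stays unnamed.
git checkout -q --detach
echo lose > l.txt && git add l.txt && git commit -q -m "dangling, left unnamed"
lost="$(git rev-parse HEAD)"
git checkout -q master 2>/dev/null

git tag -m 'Named this dangling commit' -a dangle1 "$kept"

# Drop reflog protection and prune at once (demonstration only).
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

if git cat-file -e "$kept" 2>/dev/null; then kept_state=present; else kept_state=pruned; fi
if git cat-file -e "$lost" 2>/dev/null; then lost_state=present; else lost_state=pruned; fi
echo "named commit: $kept_state, unnamed commit: $lost_state"
```

On a real repository the same pruning happens later and silently, once the reflog entries and the grace period have expired.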
Contending With Inspection Problems
This section is dedicated to overcoming issues with git repositories provided by the other party, whether deliberately caused or due to honest mistakes.
The computers that are provided to software experts when visiting the opposition's clean room to inspect their client's software never have internet access.
This means that commands like git fetch and git clone cannot reach remote repositories.
The lack of connectivity restricts options for dealing with issues.
The git fsck command can be used to verify the integrity of a git repository.
It can also identify dangling and unreachable objects.
According to its documentation, git fsck tests SHA-1 and general object sanity, and it does full tracking of the resulting reachability and everything else. It prints out any corruption it finds (missing or bad objects), and if you use the --unreachable flag it will also print out objects that exist but are not reachable from any of the specified head nodes (or the default set).
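The documentation describes the options used in the command further down as follows:
- --lost-found: Write dangling objects into .git/lost-found/commit/ or .git/lost-found/other/, depending on type. If the object is a blob, the contents are written into the file, rather than its object name.
- --root: Report root nodes.
- --unreachable: Print out objects that exist but are not reachable from any of the reference nodes.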
The first commit of most git repos is the root commit.
It is possible for a git repo to have more than one root commit;
in that case you will have to examine them to determine which was 'first',
according to what you might mean by 'first'.
$ git fsck --lost-found --root --unreachable
root 3fa77c58f85c591f9c6a1b0510228e4aec704697
Checking object directories: 100% (256/256), done.
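When a repository has more than one root commit, it helps to see them side by side with their dates. The following sketch is one possible way to do that, not the only one: it builds a scratch repository with two unrelated histories (the branch name side and all file names are made up) and lists every parentless commit with git rev-list --max-parents=0.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email "expert@example.com"
git config user.name "Example Expert"

echo one > a.txt && git add a.txt && git commit -q -m "root of master"

# An orphan branch starts a second history with no parent commit.
git checkout -q --orphan side
echo two > b.txt && git add b.txt && git commit -q -m "root of side"
git checkout -q master

# Every commit without a parent, across all references.
roots="$(git rev-list --max-parents=0 --all)"
root_count="$(echo "$roots" | wc -l | tr -d ' ')"

# Show each root with its author date so they can be compared.
git --no-pager log --no-walk --date=iso --format='%h %ad %s' $roots
```

Author dates can be set to anything by whoever made the commit, so they are a hint about which root came 'first', not proof.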
If .git/HEAD has been deleted, then git commands give an error like the following:
$ git log
fatal: not a git repository (or any of the parent directories): .git
Recreate HEAD so that it points to the tip of the master branch like this:
$ echo "ref: refs/heads/master" > .git/HEAD
If the git project you are working with was created on GitHub recently, HEAD should probably point to the tip of the main branch instead:
$ echo "ref: refs/heads/main" > .git/HEAD
Now git commands should work, unless other problems are also present.
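Rather than guessing between master and main, the branch files can be inspected directly, since git commands do not work while HEAD is missing. The sketch below is a hedged illustration: it simulates the damage in a scratch repository and then recreates HEAD for the first candidate branch that actually exists. The candidate list (main, then master) and all other names are assumptions.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email "expert@example.com"
git config user.name "Example Expert"
echo one > a.txt && git add a.txt && git commit -q -m "first"

rm .git/HEAD   # simulate the damage

# Look for each candidate as a loose ref file or as a packed-refs entry.
for branch in main master; do
  if [ -f ".git/refs/heads/$branch" ] ||
     grep -q " refs/heads/$branch\$" .git/packed-refs 2>/dev/null; then
    echo "ref: refs/heads/$branch" > .git/HEAD
    break
  fi
done

restored="$(git symbolic-ref HEAD)"
echo "HEAD now points to $restored"
```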
If the staging area in .git/index has been deleted, the git status command shows all the files and directories in the project as having been deleted, and also shows those same files as being untracked.
Since a file or directory cannot both be deleted and untracked, this contradictory result indicates that .git/index was deleted or is damaged.
$ rm .git/index
$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        deleted:    .gitignore
        deleted:    .rspec
        deleted:    .rubocop.yml

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        .gitignore
        .rspec
        .rubocop.yml
To rebuild the index, without disturbing the worktree, type:
$ git reset --mixed
$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
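The symptom and the repair can be rehearsed in a scratch repository before touching evidence. This is a minimal sketch with made-up names; it uses git status --porcelain so that the before and after states are easy to compare in a script.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email "expert@example.com"
git config user.name "Example Expert"
echo one > a.txt && git add a.txt && git commit -q -m "first"

rm .git/index   # simulate the damage

# The same file is reported twice: staged as deleted, and untracked.
before="$(git status --porcelain)"
echo "$before"

# Rebuild the index from HEAD; files in the worktree are not modified.
git reset -q --mixed
after="$(git status --porcelain)"
echo "entries after reset: ${after:-none}"
```

Anything that had been staged but not committed was recorded only in the lost index, so the reset cannot bring that staging information back.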
Obtaining the Hash of the First Commit
The git fsck --root option shown above yields the hash of the first commit, but that value is mixed with other tokens, which are a pain to parse.
To display the hash of the first commit such that it can be easily stored into an environment variable, use the following incantation:
$ git log --reverse --format="%h" | head -n 1
Define the environment variable
COMMIT0 like this:
$ COMMIT0="$( git log --reverse --format="%h" | head -n 1 )"
Display Files in a Commit
The following incantation lists the filenames in a commit.
The --root option allows this to work with the root commit.
$ git show --format="" --name-only --root $COMMIT0
The following incantation displays the files changed by a commit.
$ git diff-tree -r --name-only --root $COMMIT0
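For any commit other than the root, the files changed by a commit and the files present in its snapshot are different lists. The sketch below illustrates the difference in a scratch repository; git ls-tree is brought in here as an addition to the commands above, and all file names are made up.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email "expert@example.com"
git config user.name "Example Expert"

echo one > a.txt && echo two > b.txt
git add a.txt b.txt && git commit -q -m "add two files"
echo changed >> b.txt && git commit -q -a -m "change b.txt only"

# Every file that exists in the commit's snapshot.
snapshot="$(git ls-tree -r --name-only HEAD)"

# Only the files the commit changed; --no-commit-id drops the leading hash line.
changed="$(git diff-tree -r --name-only --no-commit-id HEAD)"

echo "snapshot:"; echo "$snapshot"
echo "changed:";  echo "$changed"
```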
Display File From a Hash
Display the contents of a file, given the hash of its blob object:
$ git cat-file -p a997766
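If only a path is known, the blob hash can be looked up first. The following sketch, using a scratch repository and made-up names, resolves the hash with git rev-parse, checks the object type, and then prints the contents.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email "expert@example.com"
git config user.name "Example Expert"
printf 'hello\n' > README.md && git add README.md && git commit -q -m "add README"

# <commit>:<path> names the blob that holds the file in that commit.
blob="$(git rev-parse HEAD:README.md)"
kind="$(git cat-file -t "$blob")"   # object type
text="$(git cat-file -p "$blob")"   # object contents

echo "$blob is a $kind containing: $text"
```

git ls-tree HEAD shows the same blob hashes next to each file name, which is handy when several files need to be examined.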
Discovering the Commit that Added a File
The hash of the commit that added the first version of a file is easily discovered with the following incantation.
$ git log --format="%h" --diff-filter=A -- README.md
3fa77c5
We can save the result in an environment variable called COMMIT1.
This environment variable will be used in the remainder of this document.
$ COMMIT1="$( git log --format="%h" --diff-filter=A -- README.md )"
$ echo $COMMIT1
3fa77c5
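One caveat: if a file was deleted and later added again, the incantation prints one hash per addition, newest first, and the variable ends up holding several lines. The sketch below shows one way to keep only the earliest addition, using tail -n 1; the scratch repository and its history are made up for illustration.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email "expert@example.com"
git config user.name "Example Expert"

echo v1 > README.md && git add README.md && git commit -q -m "add README"
first_add="$(git rev-parse HEAD)"
git rm -q README.md && git commit -q -m "remove README"
echo v2 > README.md && git add README.md && git commit -q -m "add README again"

# Two additions are reported, newest first.
adds="$(git log --format="%h" --diff-filter=A -- README.md)"
add_count="$(echo "$adds" | wc -l | tr -d ' ')"

# Keep the oldest one.
COMMIT1="$(echo "$adds" | tail -n 1)"
echo "$add_count additions, earliest is $COMMIT1"
```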
Display File Version in a Commit
If you know the commit hash, the contents of a file as they existed in that commit can be displayed.
$ git show $COMMIT1:README.md
Diffs of a File Against the HEAD Version
To compare the version of the file in the commit to the currently checked out version of the file, provide the hash of the commit and the name of the file to the git diff command.
$COMMIT1 refers to the hash of the commit that contains the first version of README.md.
$ git diff $COMMIT1 -- README.md
Diffs of Any Versions of a File
To compare against another version of the same file, name the commit that holds that version
(for example, HEAD~2 for the version that existed before the previous 2 commits to the current branch).
Note that the version that existed 2 commits ago might be identical to the version pointed to by HEAD,
because there is no guarantee that those 2 commits modified this file.
$ git diff $COMMIT1 HEAD~2 -- README.md
It is often more useful to examine the changes to a file instead.
To obtain the hashes of all modifications to a file, excluding the commit that added the file to the repository, use --diff-filter=M:
$ README_MODS="$( git log --format="%h" --diff-filter=M -- README.md )"
$ echo "$README_MODS" # quotes keep each value on a separate line:
841c17a
7e30894
18a09e3
d71002d
$ echo "$README_MODS" | tac # Reverse the list
d71002d
18a09e3
7e30894
841c17a
To obtain the hash of the 2nd change to the file, which is the 3rd version of the file:
$ echo "$README_MODS" | tac | sed '2q;d'
18a09e3
To compare the 3rd version of the file (which has the hash immediately above) to the 4th version, first do some setup:
$ README3="$( echo "$README_MODS" | tac | sed '2q;d' )"
$ README4="$( echo "$README_MODS" | tac | sed '3q;d' )"
$ echo $README3 $README4
18a09e3 7e30894
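There are two types of incantations that can produce diffs of a file. The first type of incantation allows comparing two arbitrary versions. For this incantation, all that is required are the hashes of the commits that hold both file versions, and the name of the file. Without the file name, every file that differs between the two commits is shown.
$ git diff $README3 $README4 -- README.md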
The second type of incantation compares an arbitrary version against the version in a specific commit, here $COMMIT1.
For this incantation, provide the hash of the commit, the hash of the desired version to use as the basis for comparison, and the name of the file.
$ git diff $COMMIT1 $README3 -- README.md
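When every change to a file needs to be reviewed, the pairwise comparisons can be run in a loop. The sketch below is one possible arrangement, not a fixed recipe: it uses git log --reverse in place of tac to get the modifications oldest first, and walks from the commit that added the file through each later version. The scratch repository, its three versions of README.md, and the variables prev and steps are made up for illustration.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email "expert@example.com"
git config user.name "Example Expert"
for n in 1 2 3; do
  echo "version $n" > README.md
  git add README.md && git commit -q -m "README version $n"
done

COMMIT1="$(git log --format="%h" --diff-filter=A -- README.md)"

# Diff each version against the one before it, oldest first.
prev="$COMMIT1"
steps=0
for commit in $(git log --reverse --format="%h" --diff-filter=M -- README.md); do
  echo "== $prev -> $commit"
  git --no-pager diff "$prev" "$commit" -- README.md
  prev="$commit"
  steps=$((steps + 1))
done
echo "$steps diffs shown"
```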