Git has been the dominant source code control program since 2008. When on assignment as a software expert for a US federal court case, I always request entire git repositories instead of just the source code at one specific historical date. Having the entire git repository means that I can learn how a project was built, and who did what.
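Git repositories store data as immutable objects. The two types that hold content are blobs, which are files, and trees, which are directories; commits and annotated tags are stored as objects as well. The ‘trees’ are actually Merkle trees: each tree records the hashes of its children, so any change below a tree also changes the hash of the tree itself. Merkle trees are also used in cryptocurrency for similar reasons.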
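The Merkle property can be illustrated with a short Python sketch. It assumes a SHA-1 repository and uses plain alphabetical ordering of entries, which is a simplification of git's real ordering rule; the file names and contents are made up for illustration.

```python
import hashlib

def git_hash(kind: str, body: bytes) -> str:
    """Hash an object the way git does: SHA-1 over '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_hash(entries: dict) -> str:
    """Hash a directory. `entries` maps a name to bytes (a file) or to a nested dict (a directory)."""
    body = b""
    for name in sorted(entries):  # simplification: git's ordering treats directories slightly differently
        child = entries[name]
        if isinstance(child, dict):
            mode, child_hash = "40000", tree_hash(child)
        else:
            mode, child_hash = "100644", git_hash("blob", child)
        # Each entry stores the mode, the name, and the child's hash in binary form.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(child_hash)
    return git_hash("tree", body)

before = tree_hash({"README.md": b"hello\n", "src": {"main.c": b"int main(void) { return 0; }\n"}})
after = tree_hash({"README.md": b"hello\n", "src": {"main.c": b"int main(void) { return 1; }\n"}})
print(before != after)  # True: a change deep in the tree changes the root hash
```

Because the root hash depends on every hash beneath it, one 40-character value is enough to identify the exact state of an entire directory tree.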
Each node in a tree has a cryptographic hash, sometimes called a SHA after the SHA-1 algorithm that computes it, but most often referred to simply as a hash. Git evaluates the validity of the hash of each object each time it processes that object. Invalid hashes cause git to halt with an error message. This guarantees that if git continues a command to completion without halting with an error message, the objects that it processed are unchanged.
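The check itself is simple to sketch in Python. The following assumes a SHA-1 repository and the loose-object layout (a zlib-compressed header plus content); the function names are my own and are not part of git.

```python
import hashlib
import zlib

def encode_loose_object(kind: str, body: bytes) -> tuple[str, bytes]:
    """Return (hash, compressed bytes) in the form git uses for a loose object."""
    raw = f"{kind} {len(body)}".encode() + b"\0" + body
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)

def is_intact(expected_hash: str, stored: bytes) -> bool:
    """Recompute the hash of a stored object and compare it with the object's name."""
    try:
        raw = zlib.decompress(stored)
    except zlib.error:
        return False
    return hashlib.sha1(raw).hexdigest() == expected_hash

name, stored = encode_loose_object("blob", b"hello world\n")
print(name)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# Simulate tampering: change the content but keep the original object name.
tampered = zlib.compress(zlib.decompress(stored).replace(b"hello", b"HELLO"))
print(is_intact(name, stored), is_intact(name, tampered))  # True False
```

An object whose content no longer matches its name is detected immediately, which is why altered files cannot hide inside a repository.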
The git status command tells the user if any files or directories have been added, deleted or changed.
This is the message that results when a git repository has no uncommitted changes:
$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
This is the message that results when a git repository has a modified file:
$ git status
Refresh index: 100% (1756/1756), done.
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   what_yo_mama_said.txt

no changes added to commit (use "git add" and/or "git commit -a")
This is the message that results when a git repository has an extra file:
$ git status
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        extra.html

nothing added to commit but untracked files present (use "git add" to track)
This is the message that results when a git repository has a deleted file:
$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    README.md

no changes added to commit (use "git add" and/or "git commit -a")
Thus, if the opposing side does not provide source code for a given product or program in a git repository, not only is it impossible to examine the history of the product or program, it is also impossible to know that all of the files have been received, unchanged, or if extra files have been provided. I always press for receiving source code in git repositories instead of in unmanaged directories.
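When many repositories must be checked, the same information is easier to process from git status --porcelain, which prints one line per path with a two-character status code. The following Python sketch classifies such output; the function name and the sample text are illustrative, and only the most common codes are handled.

```python
def classify_status(porcelain: str) -> dict[str, list[str]]:
    """Group paths from `git status --porcelain` output by what happened to them."""
    result = {"modified": [], "deleted": [], "untracked": [], "other": []}
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        code, path = line[:2], line[3:]  # two status characters, a space, then the path
        if code == "??":
            result["untracked"].append(path)
        elif "D" in code:
            result["deleted"].append(path)
        elif "M" in code:
            result["modified"].append(path)
        else:
            result["other"].append(path)
    return result

# Sample text standing in for real command output.
sample = " M what_yo_mama_said.txt\n?? extra.html\n D README.md\n"
report = classify_status(sample)
print(report)
print("clean" if not any(report.values()) else "NOT clean")
```

Empty porcelain output corresponds to the "nothing to commit, working tree clean" message.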
When on an expert witness assignment, I often inspect git repositories provided by the opposing party's lawyers. Quite often, those repositories have issues that seriously impede git operation. Although the motivation for a legal team to hamper the work of an opposing expert is readily apparent, often it is technical ignorance and not malice that causes problems.
This page contains my notes on:
Following are the files and directories provided after a typical git init, and checking in a few files. The 256 subdirectories of the objects/ directory are not shown for clarity. The hooks/ and logs/refs/remotes directories are also not shown, since they are unnecessary when there is no internet access.
$ find .git -type d \
    -not -path ".git/objects/*" \
    -not -path ".git/hooks*" \
    -not -path ".git/logs/refs/remotes*" | \
  sed -E 's^.git/?^^' | \
  column -c 80
HEAD info/exclude COMMIT_EDITMSG objects ORIG_HEAD index config refs logs refs/remotes logs/HEAD refs/remotes/origin logs/refs logs/refs/heads refs/remotes/origin/master refs/tags logs/refs/heads/master refs/heads description branches refs/heads/master info FETCH_HEAD
This section is dedicated to preventing problems for parties that need to share a git repository.
Litigation is by nature an adversarial activity. When computer software changes evidence without notice or warning, people tend to blame each other.
A dangling commit is a commit that cannot be reached from any branch, tag or other reference. One way to make a dangling commit is to make a commit on a detached HEAD.
Git runs garbage collection periodically. This happens without warning. One of the functions that git garbage collection performs is to delete (prune) dangling commits once they are old enough. There is no message to alert the user that dangling commits were found or that they were pruned.
This can lead to investigators experiencing files disappearing from a git repo after a period of time, as if they were written in the digital equivalent of disappearing ink. The party that obtained the git repository might accuse the other party of destroying evidence.
To avoid this potentially very damaging accusation, 3 actions should be performed before giving a git repository to another party:
If you want to preserve a dangling commit, give it a name. Do this before performing the other two actions, described next.
Giving names to dangling commits prevents the garbage collector from deleting them.
You can name a dangling commit by creating an annotated tag.
The following example creates an annotated tag called dangle1:
$ git tag -m 'Named this dangling commit' -a dangle1 283492384928349823
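Why naming protects a commit can be seen in a small reachability sketch. It models history as a Python dictionary of parent links, which is an assumption made for illustration and not how git stores commits; the commit names c1 to c3 and d1 are hypothetical.

```python
def reachable(parents: dict[str, list[str]], refs: dict[str, str]) -> set[str]:
    """Return every commit that can be reached by walking parent links from any ref."""
    seen: set[str] = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen

# Hypothetical history: c1 <- c2 <- c3 on master, plus d1 committed on a detached HEAD.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "d1": ["c2"]}
refs = {"refs/heads/master": "c3"}

dangling = set(parents) - reachable(parents, refs)
print(dangling)  # {'d1'}: nothing refers to d1, so it is a candidate for pruning

refs["refs/tags/dangle1"] = "d1"  # naming the commit with a tag makes it reachable
print(set(parents) - reachable(parents, refs))  # set()
```

Garbage collection only considers objects outside the reachable set, so a tagged commit is left alone.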
The following verifies the integrity of the repository's object database, and reports unreachable and dangling objects. It does not delete anything.
$ git fsck --unreachable --dangling --no-reflogs
The following code runs git gc (garbage collection) with extra care and attention. It also expires the contents of the reflog; in other words, it empties the reflog.
For more information please see Configuring Garbage Collection.
git gc removes unreachable (“dangling”) objects, which might be commits, trees (directories), and blobs (files). An object is unreachable if it is not part of the history of some branch. git gc does not normally remove unreachable objects that are younger than two weeks, so we use --prune=now, which means “remove unreachable objects that were created before now”.
$ git gc --aggressive --prune=now
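We need to remove the reflogs to remove blobs that are not reachable from any branch, because git treats anything that a reflog still refers to as reachable. For that reason, if objects survive the git gc command shown above, run it again after expiring the reflogs. Please see Reflog Configuration for more information. We expire --all reflogs with --expire-unreachable=now.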
$ git reflog expire --expire-unreachable=now --all
This section is dedicated to overcoming issues with git repositories provided by the other party, whether deliberately caused or due to honest mistakes.
The computers that are provided to software experts when visiting the opposition's clean room to inspect their client's software never have internet access. This means that commands like git fetch and git clone are non-functional. The lack of connectivity restricts options for dealing with issues.
The git fsck command can be used to verify the integrity of a git repository. It can also identify dangling and unreachable objects.
git-fsck tests SHA-1 and general object sanity, and it does full tracking of the resulting reachability and everything else. It prints out any corruption it finds (missing or bad objects), and if you use the --unreachable flag it will also print out objects that exist, but aren’t reachable from any of the specified head nodes (or the default set, as mentioned above).
--lost-found: Write dangling objects into .git/lost-found/commit/ or .git/lost-found/other/, depending on type. If the object is a blob, the contents are written into the file, rather than its object name.
--root: Report root nodes.
--unreachable: Print out objects that exist but that aren’t reachable from any of the reference nodes.
The first commit of most git repos is the root node. It is possible for a git repo to have more than one root node; in that case you will have to examine them to determine which was 'first', according to what you might mean by 'first'.
$ git fsck --lost-found --root --unreachable
root 3fa77c58f85c591f9c6a1b0510228e4aec704697
Checking object directories: 100% (256/256), done.
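If the report needs to be processed by a script, the interesting lines can be picked out with a regular expression. This Python sketch assumes 40-character SHA-1 names and the line shapes shown above; the function name is my own, and the second hash in the sample is a placeholder.

```python
import re

# Matches lines such as "root <hash>" or "dangling blob <hash>"; other lines are ignored.
FSCK_LINE = re.compile(r"(root|dangling|unreachable|missing)\s+(?:(\w+)\s+)?([0-9a-f]{40})")

def parse_fsck(output: str) -> list[tuple[str, str, str]]:
    """Extract (status, object type, hash) triples from `git fsck` output."""
    findings = []
    for line in output.splitlines():
        match = FSCK_LINE.fullmatch(line.strip())
        if match:
            status, kind, sha = match.groups()
            findings.append((status, kind or "commit", sha))  # root nodes are always commits
    return findings

sample = """root 3fa77c58f85c591f9c6a1b0510228e4aec704697
Checking object directories: 100% (256/256), done.
unreachable blob 0123456789abcdef0123456789abcdef01234567
"""
roots = [sha for status, _, sha in parse_fsck(sample) if status == "root"]
print(roots[0])  # 3fa77c58f85c591f9c6a1b0510228e4aec704697
```

Progress lines such as "Checking object directories" do not match the pattern and are skipped.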
If .git/HEAD has been deleted, then git commands give an error, like the following:
$ git log
fatal: not a git repository (or any of the parent directories): .git
Recreate HEAD to point to the tip of the master branch like this:
$ echo "ref: refs/heads/master" > .git/HEAD
If the git project you are working with was created on GitHub recently, HEAD should probably point to the tip of the main branch instead:
$ echo "ref: refs/heads/main" > .git/HEAD
Now git commands should work, unless other problems are also present.
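The choice between master and main can be made by looking at which branch actually exists. The following Python sketch is one way to do that; the function names are hypothetical, and it only looks in refs/heads and packed-refs, which I assume are intact.

```python
import tempfile
from pathlib import Path

def branch_exists(git_dir: Path, branch: str) -> bool:
    """Look for the branch as a loose ref file, then in the packed-refs file."""
    if (git_dir / "refs" / "heads" / branch).is_file():
        return True
    packed = git_dir / "packed-refs"
    return packed.is_file() and any(
        line.rstrip().endswith(f" refs/heads/{branch}")
        for line in packed.read_text().splitlines()
    )

def recreate_head(git_dir: Path, candidates=("master", "main")) -> str:
    """Write HEAD so that it points at the first candidate branch that exists."""
    for branch in candidates:
        if branch_exists(git_dir, branch):
            (git_dir / "HEAD").write_text(f"ref: refs/heads/{branch}\n")
            return branch
    raise FileNotFoundError("no candidate branch found")

# Demonstration on a throwaway directory that mimics a damaged repository.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text("3fa77c58f85c591f9c6a1b0510228e4aec704697\n")
    print(recreate_head(git_dir))
    print((git_dir / "HEAD").read_text(), end="")
```

Work on a copy of the repository when trying this, so that the evidence as received stays untouched.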
If the staging area in .git/index has been deleted, the git status command shows all the files and directories in the project as having been deleted, and also shows those same files as being untracked. Since a file or directory cannot both be deleted and untracked, this contradictory result indicates that .git/index was deleted or is damaged.
$ rm .git/index
$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        deleted:    .gitignore
        deleted:    .rspec
        deleted:    .rubocop.yml

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        .gitignore
        .rspec
        .rubocop.yml
To rebuild index, without disturbing the worktree, type:
$ git reset --mixed
$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
The git fsck --root option shown above yields the hash of the first commit, but that value is mixed with other tokens which are a pain to parse. To display the hash of the first commit such that it can be easily stored into an environment variable, use the following incantation:
$ git log --reverse --format="%h" | head -n 1
Define the environment variable COMMIT0 like this:
$ COMMIT0="$( git log --reverse --format="%h" | head -n 1 )"
The following incantation lists the filenames in a commit. The --root option allows this to work with the root commit.
$ git show --format="" --name-only --root $COMMIT0
The following incantation displays the files changed by a commit.
$ git diff-tree -r --name-only --root $COMMIT0
Display the contents of a file, given its hash:
$ git cat-file -p a997766
The hash of the commit that added the first version of a file is easily discovered with the following incantation.
$ git log --format="%h" --diff-filter=A -- README.md
3fa77c5
We can save the result in an environment variable called COMMIT1. This environment variable will be used in the remainder of this document.
$ COMMIT1="$( git log --format="%h" --diff-filter=A -- README.md )"
$ echo $COMMIT1
3fa77c5
If you know the commit hash, the file contents as it existed in the commit can be displayed.
$ git show $COMMIT1:README.md
To compare the version of the file in the commit to the currently checked out version of the file, provide the hash of the commit and the name of the file to the git diff command. Recall that $COMMIT1 refers to the hash of the commit that contains the first version of README.md.
$ git diff $COMMIT1 README.md
To compare to another version of the same file (for example, the version that existed before the previous 2 commits to the current branch), provide both commits. Note that the version that existed 2 commits ago might be identical to the version pointed to by $COMMIT1, because there is no guarantee that those 2 commits modified this file.
$ git diff $COMMIT1 HEAD~2 -- README.md
It is often more useful to examine the changes to a file instead. To obtain the hashes of all modifications to a file, excluding the commit that added the file to the repository, use --diff-filter=M:
$ README_MODS="$( git log --format="%h" --diff-filter=M -- README.md )"
$ echo "$README_MODS" # quotes keep each value on a separate line:
841c17a
7e30894
18a09e3
d71002d
$ echo "$README_MODS" | tac # Reverse the list
d71002d
18a09e3
7e30894
841c17a
To obtain the hash of the 2nd change to the file, which is the 3rd version of the file:
$ echo "$README_MODS" | tac | sed '2q;d'
18a09e3
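The same selection is easy to express in Python when the hashes are being collected by a script. This is a minimal sketch; the function name is my own, and the input is the list shown above.

```python
def nth_modification(log_output: str, n: int) -> str:
    """Return the hash of the n-th change to a file, counting from the oldest (n starts at 1)."""
    hashes = log_output.split()   # git log lists the newest commit first
    oldest_first = hashes[::-1]   # same effect as piping through tac
    return oldest_first[n - 1]

readme_mods = "841c17a\n7e30894\n18a09e3\nd71002d\n"
print(nth_modification(readme_mods, 2))  # 18a09e3
```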
To compare the 3rd version of the file (which has the hash immediately above) to the 4th version, first do some setup:
$ README3="$( echo "$README_MODS" | tac | sed '2q;d' )"
$ README4="$( echo "$README_MODS" | tac | sed '3q;d' )"
$ echo $README3 $README4
18a09e3 7e30894
There are two types of incantations that can produce diffs of a file. The first type of incantation allows comparing two arbitrary versions. For this incantation, all that is required are the hashes of the commits that contain both file versions. Without a file name, the diff covers every file that differs between the two commits:
$ git diff $README3 $README4
The second type of incantation restricts the comparison to one file. For this incantation, provide the hash of the commit to use as the basis for comparison, the hash of the commit that contains the desired version, and the name of the file.
$ git diff $COMMIT1 $README3 -- README.md
This article is not meant to be exhaustive. These are just my notes about global settings that I use.
~/.gitconfig contains an OS user’s global settings. These settings are normally maintained by using the git config --global command, but the file could also be modified using a text editor.
[branch "master"]
  remote = origin
  merge = refs/heads/master
[core]
  filemode = false
  autocrlf = input
  safecrlf = false
  pager = less -F
[color]
  status = auto
  branch = auto
  ui = auto
[gui]
  trustmtime = true
[push]
  default = matching
  autoSetupRemote = true
[user]
  name = Mike Slinn
  email = mslinn@mslinn.com
[rebase]
  autostash = true
[diff "exif"]
  textconv = exiftool
[diff]
  compactionHeuristic = true
  colorMoved = zebra
[init]
  defaultBranch = master
[pull]
  rebase = true
[fetch]
  prune = true
The following commands created some of the above settings.
$ git config --global pull.rebase true
$ git config --global fetch.prune true
$ git config --global diff.colorMoved zebra
For an explanation of the above settings, please see Three Git Configurations that Should Be the Default.
$ git config --global rebase.autostash true
For an explanation of the above setting, please see this git tip by Chi Shang Cheng.
For an explanation of the core.pager setting, please see The Git Pager.
Some git commands automatically run git gc according to various configuration parameters. Although you can permanently disable this behavior, I do not recommend that you normally do this. However, if you need to preserve the state of a repository, perhaps to prepare a repository to be used as evidence in a legal proceeding, then this would be necessary.
Following are two equivalent syntaxes. Note that these commands only operate on the current git project. I do not advise making these settings global.
$ git config gc.auto never
$ git config gc.auto 0
If you do not want to type merge comments, you can use the --no-edit option each time you run git merge, like this:
$ git merge --no-edit
Or you can define a bash alias, like this:
alias git.merge='git merge --no-edit'
Or you can add the following to ~/.bashrc:
export GIT_MERGE_AUTOEDIT=no
The reflog records each change to the tips of branches and to HEAD, so it contains a trace of all local changes to a repository. Some commands that change a git repository’s history include:
git commit (appends blobs and perhaps trees to a branch)
git commit --amend (replaces the most recent commit)
git reset (undoes local changes to a Git repo)
git rebase (moves or combines a sequence of commits to a new base commit)
Stashes are implemented using the reflog, which means that they do not normally persist forever. Instead, their maximum lifespan is set by the gc.reflogExpire configuration setting, described next.
The default length of time that blobs that are reachable from a branch, or another HEAD such as a stash, are allowed to persist in the reflog is 90 days. After this time they are subject to garbage collection the next time one of several git commands runs. This time period value can be modified with the gc.reflogExpire configuration setting.
Unfortunately, the default value is not displayed when queried; you need to know that you should interpret the following lack of response to mean 90 days:
$ git config gc.reflogExpire
Various time intervals can be used, for example days, weeks, months, years and never. The default time interval is days.
Let’s change the default expiry period to 3 days, then query it:
$ git config gc.reflogExpire 3
$ git config gc.reflogExpire
3
We can display the changes to .git/config that resulted from the above command as follows:
$ awk '/\[gc\]/{f=1}f' .git/config
[gc]
  reflogExpire = 3
The following shows 3 syntaxes, all of which change the value for the current repository to expire after 3 days:
$ git config gc.reflogExpire 3
$ git config gc.reflogExpire 3.days
$ git config gc.reflogExpire '3 days'
Other units can be specified, and two syntaxes are supported:
$ git config gc.reflogExpire 3.weeks
$ git config gc.reflogExpire '3 weeks'
$ git config gc.reflogExpire 3.months
$ git config gc.reflogExpire '3 months'
$ git config gc.reflogExpire '3 years'
$ git config gc.reflogExpire 3.years
You can change gc.reflogExpire to never expire; however, be warned that this will cause your local copy of the git repository to grow without bounds, which will eventually cause your computer to grind very slowly every time you perform an operation on the repository.
$ git config gc.reflogExpire never
You can change the value for the current repository to expire immediately as follows. Because this is a permanent setting, this actually means that you do not want a reflog; which implies that the git stash command will be disabled, and refs that rely on the reflog will not be available.
$ git config gc.reflogExpire now
$ awk '/\[gc\]/{f=1}f' .git/config
[gc]
  reflogExpire = now
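The spellings described in this section can be summarized in a small Python sketch. It is a simplified model of the syntaxes listed above, not git's own date parser: the bare-number-means-days rule follows the description in this article, and months and years are approximated as 30 and 365 days. The function name is hypothetical.

```python
import re
from datetime import timedelta

UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}  # month and year are approximations

def parse_expiry(value: str):
    """Return a timedelta, or the strings 'never' / 'now' for the two special values."""
    text = value.strip().lower()
    if text in ("never", "now"):
        return text
    # Accepts "3", "3.days", "3 days", "3.weeks", and so on.
    match = re.fullmatch(r"(\d+)(?:[ .]?(day|week|month|year)s?)?", text)
    if not match:
        raise ValueError(f"unrecognised expiry value: {value!r}")
    count, unit = int(match.group(1)), match.group(2) or "day"
    return timedelta(days=count * UNIT_DAYS[unit])

for spelling in ("3", "3.days", "3 days", "3.weeks", "never", "now"):
    print(spelling, "->", parse_expiry(spelling))
```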
gc.reflogExpireUnreachable is used to set how long dangling commits should be preserved. This value defaults to 30 days. As with the gc.reflogExpire configuration, it is again unfortunate that the default value for gc.reflogExpireUnreachable is not displayed when queried. You need to know that the following lack of response means 30 days, not 90 days as is the case for gc.reflogExpire:
$ git config gc.reflogExpireUnreachable
The same time span syntax is used for gc.reflogExpire and gc.reflogExpireUnreachable, including the values now and never. Following is an example of setting the time span to 13 days:
$ git config gc.reflogExpireUnreachable 13
$ awk '/\[gc\]/{f=1}f' .git/config
[gc]
  reflogExpireUnreachable = 13
Many people use git-merge for years without realizing how complex this command is. Git-pull utilizes git-merge when merging changes.
Because libgit2 is a low-level API, higher-level functionality such as git’s git-pull is not provided. If you need higher-level functionality you must either write it yourself, or use whatever extra capability is provided by the language binding your project uses. This means it is important to understand the low-level complexity when using libgit2 and its language bindings, such as rugged or pygit2, to merge and pull updates.
Merging two commit sequences is generally performed when the sequences are on different branches. The commits on the branch to be merged are incorporated into the commits of the current branch, also known as the target branch. This article has several references to the target branch; remember that this is always the current branch. This is so the merge commit can be created, and the working tree and index can be updated.
When discussing the mechanics of a merge, the HEAD of the target branch is called the target HEAD, the branches to be merged are called merge branch(es), and the HEAD(s) of the merge branches are called merge HEAD(s).
If the git repository is not corrupt, the target branch and the branch to be merged share a common ancestor commit. This commit is called the merge base. The git-merge-base command shows this commit.
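The idea of a merge base can be sketched in Python over a toy history. The dictionary of parent links and the single-letter commit names are assumptions for illustration; git's own implementation is considerably more efficient.

```python
def ancestors(parents: dict[str, list[str]], start: str) -> set[str]:
    """Return `start` plus every commit reachable from it by following parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen

def merge_bases(parents: dict[str, list[str]], target_head: str, merge_head: str) -> set[str]:
    """Return the common ancestors that are not themselves ancestors of another common ancestor."""
    common = ancestors(parents, target_head) & ancestors(parents, merge_head)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }

# Hypothetical history: a <- b <- c (target branch) and b <- x <- y (branch to merge).
parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"], "y": ["x"]}
print(merge_bases(parents, "c", "y"))  # {'b'}
```

Two histories with no commit in common produce an empty set, which corresponds to unrelated histories.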
The git-merge command first analyzes the commit sequences to merge. It categorizes the situation into one of four possible scenarios:
All merge HEADs are reachable from the target HEAD. This means the target HEAD is up-to-date, and no action needs to be taken.
The target HEAD is an ancestor of the merge HEAD. The merge HEAD can be reached by fast-forwarding from the target HEAD, so the target HEAD can simply be set to the merge HEAD.
In this scenario, both the target branch and the commit sequence to merge have diverged from their common ancestor. A normal merge (sometimes called a true merge) is required to reconcile divergent branches.
A normal merge will also be performed even though fast-forwarding might be possible when:
The --no-ff option to the git-merge command is specified.
Git was configured to default to disabling fast-forward prior to running the git-merge command.
$ git config --add merge.ff false
The divergent commit sequences are merged by creating a new commit that joins them; this is called a merge commit. As you should know, commits are always made on the current branch. The reason why the target branch must be the current branch for a merge is so the merge commit is created on the proper branch.
The merge commit has a reference to the commit at the tip of the target branch (the target HEAD), and a reference to the commit at the tip of the sequence to merge (the merge HEAD); this is why we say the merge commit has two parents.
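In git's object format, a merge commit simply carries more than one parent line. The following Python sketch builds the text of such a commit; the hashes, identity and message are placeholders, and the function name is my own.

```python
def commit_text(tree: str, parents: list[str], who: str, message: str) -> str:
    """Build the body of a commit object; each parent gets its own `parent` line."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return "\n".join(lines) + "\n"

# Placeholder values, not taken from a real repository.
merge = commit_text(
    tree="4b825dc642cb6eb9a060e54bf8d69288fbee4904",
    parents=["1" * 40, "2" * 40],  # target HEAD first, then merge HEAD
    who="A U Thor <author@example.com> 1700000000 +0000",
    message="Merge branch 'feature'",
)
print(merge)
```

An ordinary commit has one parent line, and a root commit has none.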
Octopus commits have more than two parents. Because this article does not discuss merge strategies, no further mention will be made of Cthulhu.
At this point the newly created merge commit is HEAD. After the new commit is created, the working tree and the index are updated.
HEAD of the target branch is said to be unborn when it does not point to anything yet. Since there is no common ancestor for HEAD and the commit sequences to merge, a normal merge is not possible. If there is only one commit sequence to merge, the logical action would be to set HEAD of the current branch to the HEAD of the commit sequence to merge.
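The four scenarios can be summarized as a decision procedure. The Python sketch below works on a toy history held in a dictionary of parent links; the names are illustrative, and unlike the real analysis it returns a single label instead of a set of flags.

```python
def ancestors(parents: dict[str, list[str]], start: str) -> set[str]:
    """Return `start` plus every commit reachable from it by following parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen

def analyze_merge(parents: dict[str, list[str]], target_head, merge_head: str) -> str:
    """Classify a merge of `merge_head` into the branch whose tip is `target_head`."""
    if target_head is None:
        return "unborn"        # the target branch has no commits yet
    if merge_head in ancestors(parents, target_head):
        return "up_to_date"    # everything to merge is already in the target history
    if target_head in ancestors(parents, merge_head):
        return "fastforward"   # the target tip only needs to move forward
    return "normal"            # histories diverged; a merge commit is required

# Hypothetical history: a <- b <- c, plus x branching off from b.
parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"]}
print(analyze_merge(parents, "c", "b"))   # up_to_date
print(analyze_merge(parents, "b", "c"))   # fastforward
print(analyze_merge(parents, "c", "x"))   # normal
print(analyze_merge(parents, None, "c"))  # unborn
```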
Examining the libgit2 C source code for merge.c and merge.h should help you understand the mechanics better. Libgit2 has a method called git_merge_analysis that performs the same analysis as just described. It is used in the merge.c example.
My Introduction to Libgit2 article mentioned the following language bindings (wrappers) for libgit2:
git2go
LibGit2
libgit2sharp
nodegit
pygit2
rugged
git2-rs
My Merge and Pull: git CLI vs. Libgit2 Wrappers article discusses the mechanics of git merging and related concepts.
My libgit2 Examples article explores the C code examples written by the libgit2 team to demonstrate how to use their library.
Libgit2 is a low-level API, providing the ‘plumbing’ that is common to all of the language-specific wrappers. However, the git CLI also provides a set of high-level (‘porcelain’) commands, which are not implemented by libgit2. Examples are git-push, git-pull and git-merge. These high-level commands are user-friendly and would also be very useful if libgit2 implemented them. That is not the case, unfortunately, and there does not seem to be a high level of interest by the libgit2 developers in providing them.
Many of the language wrapper libraries have some degree of support for the high-level functionality. Unfortunately, there is no co-ordination amongst the development communities that have formed around the language-specific libraries. This means that the work that is in process now by the various groups of developers to eventually provide equivalent functionality to the git CLI does not share a common API. In some cases the approaches are completely different. As a result, the code bases for the libgit2 language wrappers are such that progress made by a team that is working with one language is not very helpful to other teams, working on other language wrappers.
A few weeks after publishing this article I found the libgit2 example code.
These examples are a mixture of basic emulation of core Git command line functions and simple snippets demonstrating libgit2 API usage (for use with Docurium). As a whole, they are not vetted carefully for bugs, error handling, and cross-platform compatibility in the same manner as the rest of the code in libgit2, so copy with caution.
That being said, you are welcome to copy code from these examples as desired when using libgit2. They have been released to the public domain, so there are no restrictions on their use.
I have not had time to see how these code examples compare to the production code for other language bindings, discussed below.
For example, the implementation of git-merge in pygit2, the wrapper library for Python, uses a completely different concept from that of rugged, the wrapper library for Ruby. Neither library provides support for a git-pull work-alike.
Although many have stated that “git-pull is just git-fetch followed by git-merge”, that is a radical over-simplification, and does not properly describe the work that git-pull actually does.
Michael Boselowitz implemented git-pull for pygit2. He naturally used pygit2’s Repository.merge method. That call appears as repo.merge(remote_master_id) in the following code.
import pygit2


def pull(repo, remote_name='origin', branch='master'):
    for remote in repo.remotes:
        if remote.name == remote_name:
            remote.fetch()
            remote_master_id = repo.lookup_reference('refs/remotes/origin/%s' % (branch)).target
            merge_result, _ = repo.merge_analysis(remote_master_id)
            # Up to date, do nothing
            if merge_result & pygit2.GIT_MERGE_ANALYSIS_UP_TO_DATE:
                return
            # We can just fastforward
            elif merge_result & pygit2.GIT_MERGE_ANALYSIS_FASTFORWARD:
                repo.checkout_tree(repo.get(remote_master_id))
                try:
                    master_ref = repo.lookup_reference('refs/heads/%s' % (branch))
                    master_ref.set_target(remote_master_id)
                except KeyError:
                    repo.create_branch(branch, repo.get(remote_master_id))
                repo.head.set_target(remote_master_id)
            elif merge_result & pygit2.GIT_MERGE_ANALYSIS_NORMAL:
                repo.merge(remote_master_id)

                if repo.index.conflicts is not None:
                    for conflict in repo.index.conflicts:
                        print('Conflicts found in:', conflict[0].path)
                    raise AssertionError('Conflicts, ahhhhh!!')

                user = repo.default_signature
                tree = repo.index.write_tree()
                commit = repo.create_commit('HEAD',
                                            user,
                                            user,
                                            'Merge!',
                                            tree,
                                            [repo.head.target, remote_master_id])
                # We need to do this or git CLI will think we are still merging.
                repo.state_cleanup()
            else:
                raise AssertionError('Unknown merge analysis result')
I wrote a literal translation of Michael Boselowitz’s Python code to Ruby, using rugged. As mentioned, rugged for Ruby implements merge differently. The problematic line is marked with a comment; scroll the source listing to see it.
require 'rainbow/refinement'
require 'rugged'
require_relative 'credentials'
require_relative 'repository'

class GitUpdate
  using Rainbow

  abort "Error: Rugged was not built with ssh support. Please see https://www.mslinn.com/git/4400-rugged.html".red \
    unless Rugged.features.include? :ssh

  # Just update the default branch
  def pull(repo, remote_name = 'origin') # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
    remote = repo.remotes[remote_name]
    unless remote.respond_to? :url
      puts " Remote '#{remote_name}' has no url defined. Skipping this repository."
      return
    end
    puts " remote.url=#{remote.url}".yellow
    default_branch = repo.head.name.split('/')[-1]
    refspec_str = "refs/remotes/#{remote_name}/#{default_branch}"
    begin
      success = remote.check_connection(:fetch, credentials: select_credentials)
      unless success
        puts " Error: remote.check_connection failed.".red
        return
      end
      remote.fetch(refspec_str, credentials: select_credentials)
    rescue Rugged::NetworkError => e
      puts " Error: #{e.full_message}".red
    end
    abort "Error: repo.ref(#{refspec_str}) for #{remote} is nil".red if repo.ref(refspec_str).nil?
    remote_master_id = repo.ref(refspec_str).target
    merge_result, = repo.merge_analysis remote_master_id
    case merge_result
    when :up_to_date
      # Nothing needs to be done
      puts " Repo at '#{repo.workdir}' was already up to date.".blue.bright
    when :fastforward
      repo.checkout_tree(repo.get(remote_master_id))
      master_ref = repo.lookup_reference 'refs/heads/master'
      master_ref.set_target remote_master_id
      repo.head.set_target remote_master_id
    when :normal
      repo.merge remote_master_id # rugged does not have this method
      raise "Problem: merging updates for #{repo.name} encountered conflicts".red if repo.index.conflicts?

      user = repo.default_signature
      tree = repo.index.write_tree
      repo.create_commit 'HEAD', user, user, 'Merge', tree, [repo.head.target, remote_master_id]
      repo.state_cleanup
    else
      raise AssertionError 'Unknown merge analysis result'.red
    end
  end

  def update_via_rugged(dir_name)
    repo = Rugged::Repository.new dir_name
    pull repo
  rescue StandardError => e
    puts "Ignoring #{dir_name} due to error: .#{e.full_message}".red
  end
end
Unfortunately, rugged does not offer a similar method. Instead, Tree.merge is provided (written in C), which accepts completely different parameter types. Here is an example of how to merge two trees using rugged:
tree = Rugged::Tree.lookup(repo, "d70d245ed97ed2aa596dd1af6536e4bfdb047b69")
diff = tree.diff(repo.index)
diff.merge!(tree.diff)
This means the code I wrote above (the literal translation) does not work because these two language libraries for libgit2 have diverged.
Michael Boselowitz’s Python code does not handle all possible scenarios. He provided valuable prototype code, not production code. I have attempted to illustrate important issues in the articles I write, but I am also not pretending to publish production code in these articles either. For more information, including a discussion of the missing scenarios, please read the aforementioned article Merge and Pull: git CLI vs. Libgit2 Wrappers.
I will have to significantly rewrite my Ruby code to accommodate the differences between implementations of higher-level git APIs for the various language binding libraries.
Progress would happen much faster if the communities downstream from libgit2 co-operated in a standards effort, and/or the libgit2 community authorized an architect/evangelist to help standardize higher-level APIs. Progress made by any project downstream would be available, after translation, to all other projects.
A lucid dream worthy of taking action on.
Libgit2 does not use the system ssh command. Instead, it is statically linked to libssh2, which does not read .ssh/config, so libgit2 does not support that either. In contrast, command line git uses the ssh command, and therefore supports .ssh/config.
Git can be configured to use a specific ssh key for authorization of a given repo. Within the repo, type:
$ git config core.sshCommand "ssh -i ~/.ssh/id_rsa"
The above causes the following to be added to .git/config:
[core]
sshCommand = ssh -i ~/.ssh/id_rsa
This is helpful for debugging ssh connectivity issues, for example by increasing ssh verbosity:
$ git config core.sshCommand "ssh -vi ~/.ssh/id_rsa"
Both command line git and libgit2 use any ssh-agent that is running.
# Launch ssh-agent; best to do this in ~/.bashrc
$ eval $(ssh-agent) > /dev/null
$ ssh-add
Identity added: /home/mslinn/.ssh/id_rsa (/home/mslinn/.ssh/id_rsa)
$ ssh-add -l
1024 SHA256:Xdv1AE4QTd0NfrwOGVTamF/wxnvFufCtsOIoOXtX5Mw /home/mslinn/.ssh/id_rsa (RSA)
$ echo $SSH_AGENT_PID
2196
If you have ever needed to work on a relatively small portion of a large git repository, you know how slow things can get, and how problems arise with large files and directories. Two new features, partial clone and sparse checkout, can be used together to dramatically speed things up. Also, significantly less storage will be required on your computing device!
Git added a partial clone feature in version 2.24, via git clone --filter. Git’s sparse checkout feature became user-friendly in version 2.25 with the addition of the git sparse-checkout and git clone --sparse porcelain commands.
By default, git repositories have up to 3 copies of every file. Copies can exist in git’s:
Working tree, at .git/.., which is the parent directory of the .git directory. The contents of the .git directory are not part of the working tree.
Index, at .git/index. When you run git add or git commit -a, a new snapshot of your working tree is saved to the index.
Object database, at .git/objects. When you run git commit, the contents of the snapshots in the index are saved to the object database.
If you want to work on a subdirectory of a large git project, you may not want to have the entire project’s repository on your device. A partial clone, combined with the git sparse checkout feature, allows you to just work on the subdirectory of interest in your repository.
By itself, sparse checkout only affects the working tree, and hence the index.
In contrast, git’s object database is by default complete.
Sparse checkout means that for this local repository,
only selected portions of the repository’s object database are instantiated in the working tree.
When you git push
from a sparse clone to a remote repository such as origin
,
the snapshots contained in the local repository’s entire object database
are copied to the remote repository.
The integrity of the entire original repo is maintained. If someone else checks out the new repository without performing the sparse checkout procedure, their working tree will be populated from the complete contents of the original repository’s object database.
As of the date this was written (2023-06-02),
the git-sparse-checkout
command was still marked experimental.
The features and syntax have changed significantly since it was first proposed.
The git sparse-checkout init
subcommand is
now deprecated and no longer recommended.
Non-cone mode is also deprecated.
Read about cones here.
Partial clones work by specifying a filter that limits which objects are fetched. In the following examples, <repo> stands for the URL of a remote repository:
$ # omit all blobs
$ git clone --filter=blob:none <repo>
$ # omit blobs larger than 1 MB
$ git clone --filter=blob:limit=1m <repo>
By default, partial clones retrieve missing objects when the user attempts to access them. Thus, a partial clone will grow larger over time unless sparse checkout is used in conjunction with a partial clone.
Sparse checkouts allow you to restrict the files and directories that git can retrieve from the remote repository. When sparse checkout is used with partial cloning, the two features work together so that not only is the size of the working tree reduced, but the git object database is also reduced in size, because only the requested objects are fetched from the remote repository, on demand.
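The effect of sparse checkout on the working tree can be tried without a network connection. The following is a minimal sketch, assuming git 2.25 or later. The scratch repositories, the directory names docs and lib, and the identity passed with -c are all hypothetical. It does not use --filter, because partial clone needs a server that permits filtering, so only the working tree is reduced here.

```shell
# Build a small upstream repository with two directories (hypothetical names).
upstream=$(mktemp -d)
git -C "$upstream" init -q
mkdir "$upstream/docs" "$upstream/lib"
echo 'readme' > "$upstream/README.md"
echo 'guide'  > "$upstream/docs/guide.md"
echo 'code'   > "$upstream/lib/main.rb"
git -C "$upstream" add .
git -C "$upstream" -c user.name='Example' -c user.email='example@example.com' \
  commit -q -m 'Initial commit'

# Clone it sparsely: subdirectories are not checked out at first.
parent=$(mktemp -d)
clone="$parent/clone"
git clone -q --sparse "$upstream" "$clone"

# Ask for just one directory; git checks it out immediately.
git -C "$clone" sparse-checkout set docs
sparse_dirs=$(git -C "$clone" sparse-checkout list)
echo "$sparse_dirs"
ls "$clone"
```

After this runs, docs/guide.md exists in the clone but lib/ does not, even though the commit that contains lib/ is present in the clone's object database.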
The project I wanted to work on was
Sinatra-ActiveRecord
and I wanted to play with the sample project for sqlite
.
The sample project was very small (too small to be useful, actually!),
so it made no sense to fill my computing device with an overly large repository.
I wanted to eventually create two git remotes:
upstream
– pointing to the original git repo,
sinatra-activerecord/sinatra-activerecord
.
origin
– pointing to a new repo in my GitHub account
that will contain the complete original repo's contents and history, plus my changes.
This repo will be called mslinn/sinatra-activerecord-sqlite
.
In the following command,
notice how I used the ‑‑origin
option to name the upstream
remote,
instead of using the default name, origin
.
$ git clone \ --filter=blob:none \ --origin upstream \ --sparse \ https://github.com/sinatra-activerecord/sinatra-activerecord/ Cloning into 'sinatra-activerecord'... remote: Enumerating objects: 1020, done. remote: Counting objects: 100% (145/145), done. remote: Compressing objects: 100% (74/74), done. remote: Total 1020 (delta 41), reused 123 (delta 38), pack-reused 875 Receiving objects: 100% (1020/1020), 131.40 KiB | 3.86 MiB/s, done. Resolving deltas: 100% (245/245), done. remote: Enumerating objects: 9, done. remote: Counting objects: 100% (6/6), done. remote: Compressing objects: 100% (6/6), done. remote: Total 9 (delta 0), reused 0 (delta 0), pack-reused 3 Receiving objects: 100% (9/9), 6.15 KiB | 6.15 MiB/s, done. $ cd sinatra-activerecord/
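The ‑‑sparse option in the above git clone command limited the initial population of the working tree to the top-level files.
The ‑‑filter=blob:none option is what kept the download small: file contents (blobs) are fetched only when a checkout needs them.
‑‑filter=tree:0 could have been used instead of ‑‑filter=blob:none; it defers trees as well as blobs.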
The only items in the working tree are the top-level files at this point:
$ ls -aF1 ./ ../ .git/ .gitignore Appraisals CHANGELOG.md CONTRIBUTING.md Gemfile LICENSE README.md Rakefile sinatra-activerecord.gemspec
Now we can ask for just the portions of the repository that interest us.
Notice that a checkout happens right after the git-sparse-checkout set
command.
Directories specified by git-sparse-checkout
must not have a leading slash.
$ git sparse-checkout set example/sqlite remote: Enumerating objects: 14, done. remote: Counting objects: 100% (1/1), done. remote: Total 14 (delta 0), reused 0 (delta 0), pack-reused 13 Receiving objects: 100% (14/14), 2.36 KiB | 2.36 MiB/s, done. Resolving deltas: 100% (1/1), done. $ git sparse-checkout list example/sqlite
Here are the files and directories that I just sparsely cloned from the repo:
$ ls -af example/sqlite/ README.md ./ config/ app.rb Gemfile config.ru bin/ ../ Rakefile db/
Next I used the GitHub CLI to create a repo in my GitHub account
for containing the complete repo, along with my modifications.
This command created a remote called origin
,
which points at the GitHub repo that was just created.
$ gh repo create --public --source=. --remote=origin ✓ Created repository mslinn/sinatra-activerecord-sqlite on GitHub ✓ Added remote git@github.com:mslinn/sinatra-activerecord-sqlite.git
The above gh repo create
command automatically names the repo from the current directory name.
I do this so often that I defined 2 bash aliases in ~/.bash_aliases
:
alias gh_new_private='gh repo create --private --source=. --remote=origin'
alias gh_new_public='gh repo create --public --source=. --remote=origin'
This article discusses low-level git
commands.
It builds on the material presented in Git Concepts.
The commands are presented in alphabetical order.
Unfortunately, I do not say much of interest in this page between git-cat-file
(next),
and git-rev-parse
.
This page will be updated.
The version of git
used in this article is:
$ git --version git version 2.37.2
Here is the help information for git-cat-file
.
$
This example obtains the type of the object stored for .gitignore at HEAD:
$ git cat-file -t HEAD:.gitignore
blob
Remember, a git blob
is just a stored file,
and a git tree
is just a stored directory.
Two syntaxes are possible to display the contents of a file, given its SHA:
$ git cat-file -p a997766
$ git cat-file blob a997766
This example lists the first 10 lines of the contents of the version of the
.gitignore
file at the HEAD
commit:
$ echo HEAD:.gitignore | git cat-file --batch | head 6e11c49ab69e6d2ca5109dffd269b0ce3e97f767 blob 583 .yarn/* !.yarn/cache !.yarn/patches !.yarn/plugins !.yarn/releases !.yarn/sdks !.yarn/versions .bsp/ exe/
$
The help information is:
Here is an example:
$ git describe HEAD~3
v1.3.0-4-g3325b45
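The output v1.3.0-4-g3325b45 packs three fields into one string: the nearest tag, the number of commits made since that tag, and an abbreviated object name prefixed with the letter g. As an illustrative sketch (the variable names are my own), POSIX parameter expansion can split the string apart:

```shell
desc='v1.3.0-4-g3325b45'

abbrev=${desc##*-g}   # everything after the last "-g": the abbreviated object name
rest=${desc%-g*}      # drop the trailing "-g<name>" part
count=${rest##*-}     # number of commits since the tag
tag=${rest%-*}        # what remains is the tag name

echo "tag=$tag count=$count abbrev=$abbrev"
```

When the described commit is itself tagged, git describe prints only the tag name, so this split does not apply in that case.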
Git-Diff
can create patches between two refs.
Here is the help message:
$
To create a patch between HEAD
and the previous commit:
$
Computes the object id (oid) for an object, and optionally stores it as a blob.
$
$ echo 'test content' | git hash-object --stdin d670460b4b4aece5915caf5c68d12f560a9fe3e4
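The object id is not a hash of the file contents alone. For a blob, git hashes a header consisting of the word blob, a space, the content length in bytes and a NUL byte, followed by the content itself. The following sketch reproduces the id shown above without using git. It assumes a SHA-1 repository and that the sha1sum utility from GNU coreutils is installed (on macOS, shasum is the usual substitute).

```shell
content='test content'

# echo appended a newline in the example above, so count that byte too.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Header is "blob <size>" followed by a NUL byte, then the content.
oid=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d ' ' -f 1)

echo "$oid"
```

Because the header is part of what gets hashed, a blob and a tree can never share an id merely because their payload bytes coincide.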
Git log
is flexible and powerful.
Here is the help message
$
The following uses git-log
to view the commit message, the commit date, the branch,
the remote branch and author information of the most recent commit on the current branch:
$ git log -1 commit d9eaf1372978872c232c2875c7376f4e9e2dbd7f Author: Mike Slinn <mslinn@mslinn.com> Date: Mon May 15 14:29:46 2023 -0400
this is a comment
Similarly, git log -N
returns information about the
N
most recent commits.
$ git log -3
commit 795e7d363e0677ebf64ff02fc8e2e03bccf1ff9f (HEAD -> master, origin/master, origin/HEAD)
Author: Mike Slinn <mslinn@mslinn.com>
Date: Thu Jul 13 14:20:50 2023 -0400
this is a comment
commit c0b0e51264c6077815241c8d0c584f2746cd30ab
Author: Mike Slinn <mslinn@mslinn.com>
Date: Thu Jul 13 14:03:44 2023 -0400
yet another comment
commit a34f1b489ae450edec45acfedd46177135ae4bd6
Author: Mike Slinn <mslinn@mslinn.com>
Date: Thu Jul 13 14:02:54 2023 -0400
yay, another comment
To get information about the first commit that has a specific file in it:
$ git log --diff-filter=A -- README.md
commit c8c92b5f19e49402aa367d470fb89b289ae788f0
Author: Mike Slinn <mslinn@mslinn.com>
Date: Sat Jul 11 21:11:38 2020 -0400
This is the comment
To specify a file path, do not provide a leading slash. For example, write the path this way:
$ git log --diff-filter=A -- path/to/file.ext
Do not write it this way:
$ git log --diff-filter=A -- /path/to/file.ext
To get the hash of the first commit that has the file in it:
$ git log --format="%h" --diff-filter=A -- path/to/file.ext c8c92b5
To get 9 digits of the hash, use the --abbrev
option:
$ git log --format="%h" --abbrev=9 --diff-filter=A -- path/to/file.ext c8c92b5f1
Here is the help information for git-ls-files
.
$
Git ls-tree
shows the contents of a tree object.
This is the git-ls-tree
man page:
$
-t seems to be the default
--abbrev=6 shortens the SHAs to the first 6 characters, which is usually enough to make them unique
HEAD
points to the snapshot of the most recent commit to a branch.
To see what the first 10 lines of that snapshot looks like,
run the following in the root directory of your repository:
$ git ls-tree --name-only HEAD | head .bundle .gitignore .rubocop.yml .ruby-version .shellcheckrc .vscode 404.html 670nm.html
Directory entries can be suppressed,
such that only files are recursively displayed,
via the -r
option:
$ git ls-tree --name-only -r HEAD | head
.bundle/config
.gitignore
.rubocop.yml
.ruby-version
.shellcheckrc
.vscode/launch.json
.vscode/settings.json
.vscode/tasks.json
404.html
670nm.html
The default is to display each entry’s mode, type and SHA as well as the file name:
$ git ls-tree HEAD | head 040000 tree aaf10b00a2daba90550321ed46912db00c690a62 .bundle 100644 blob 6e11c49ab69e6d2ca5109dffd269b0ce3e97f767 .gitignore 100644 blob f480670f694bc102098c514b695b177b6258cc3f .rubocop.yml 100644 blob fd2a01863fdd3035fac5918c59666363544bfe23 .ruby-version 100644 blob 4e0ef479723860d16a332347a691d422a0ef2770 .shellcheckrc 040000 tree bd6408dfd6ecaba73a77c4cff0c9a82dadff76d6 .vscode 100644 blob 14ad2b4cc7658f00f26da39cc70583f6750c2943 404.html 100644 blob 1fe1aa295d166663285b54cb94954bfe19da152c 670nm.html 100644 blob 02862b254846b5669596de4d2795d023ebc87c7c BingSiteAuth.xml 100644 blob e527be14f8b957cbbd3517a16198a701e38f5dd7 Gemfile
This is the help message for git-rev-parse
– the Swiss Army Knife for git
.
It supports revision syntax.
$
There are several ways to discover the SHA
of the HEAD
commit:
$ git rev-parse HEAD d9eaf1372978872c232c2875c7376f4e9e2dbd7f
$ git rev-parse --symbolic-full-name HEAD refs/heads/master
$ git rev-parse refs/heads/master d9eaf1372978872c232c2875c7376f4e9e2dbd7f
$ git rev-parse --short HEAD d9eaf137
$ git rev-parse --short master d9eaf137
$ git rev-parse --short refs/heads/master d9eaf137
$ git rev-parse --short @ d9eaf137
Read about accessing parent commits.
You can get the SHA of the penultimate commit (the parent of HEAD
),
and the SHA of the commit before that:
$ git rev-parse --short HEAD~ e4e53f69
$ git rev-parse --short HEAD~2 1d711f74
Use the reflog to access the snapshot of the previous value of HEAD
:
$ git rev-parse --short HEAD@{1} e4e53f69
This is the help message for git-show
:
$
This is the help message for git-show-ref
:
$
View all refs in a repo, including HEAD:
$ git show-ref --abbrev=8 --head d9eaf137 HEAD d9eaf137 refs/heads/master d9eaf137 refs/remotes/origin/HEAD d9eaf137 refs/remotes/origin/master e57fd0ba refs/stash c05d0d37 refs/tags/1
Show all references called master
, including tags, heads and any other refs, including remote refs:
$ git show-ref --abbrev=8 --head master d9eaf137 HEAD d9eaf137 refs/heads/master d9eaf137 refs/remotes/origin/master
Show the SHA
for HEAD
:
$ git show-ref -s --head HEAD d9eaf1372978872c232c2875c7376f4e9e2dbd7f d9eaf1372978872c232c2875c7376f4e9e2dbd7f
The help message is:
Obtain the value of your HEAD
:
$ git symbolic-ref HEAD refs/heads/master
Set the value of HEAD to the test branch.
(This is how git switch test changes HEAD; unlike git switch, git symbolic-ref does not update the index or the working tree.)
$ git symbolic-ref HEAD refs/heads/test
$ cat .git/HEAD ref: refs/heads/test
This is the help message:
$
Git-update-index
is a huge command.
Here are two use cases:
Similar to git add
, git update-index --add
adds a file in the working tree to the index.
Call git write-tree
to write the objects to the git
database.
Note that this does not make a commit.
Here are the steps to make a commit using low-level git
commands:
$ git update-index --add new.txt
$ SHA1="$( git write-tree )"
$ echo 'First commit' | git commit-tree $SHA1
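The three commands above leave the new commit unreferenced by any branch, because git commit-tree only writes a commit object and prints its id. The following sketch runs the whole sequence in a scratch repository and then points the current branch at the new commit with git update-ref. The temporary directory, the file contents and the identity passed with -c are assumptions made for illustration.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
echo 'hello' > "$repo/new.txt"

# Stage the file: writes a blob and records it in the index.
git -C "$repo" update-index --add new.txt

# Write the index out as a tree object and capture its id.
tree=$(git -C "$repo" write-tree)

# Create a commit object that points at the tree; its id is printed on stdout.
commit=$(echo 'First commit' |
  git -C "$repo" -c user.name='Example' -c user.email='example@example.com' \
    commit-tree "$tree")

# Point the current branch at the new commit so that git log can find it.
git -C "$repo" update-ref HEAD "$commit"
git -C "$repo" --no-pager log --oneline
```

Later commits made this way would also pass -p with the parent commit id to git commit-tree, otherwise each commit would start a new, unrelated history.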
If you need to make a temporary change to a file,
and you do not want to commit the change,
git update-index
has a handy option: --assume-unchanged
.
This option causes git
to temporarily ignore changes to that tracked file or directory.
Git
considers the changes to the files and directories you specify to be hidden,
even though the file system does not consider the file to be hidden.
Use it like this:
$ git update-index --assume-unchanged path/to/file.txt
Next time you run git status
you will see that git is no longer aware of the edit.
Any future changes to those files or directories will be ignored (hidden).
$ git commit -am "This is a comment"
$ git push
Actually, git
has not forgotten about the file,
it is merely ignoring changes to the hidden file.
The -v
option of the git-ls-files
command
causes the output to indicate hidden files with the h
flag:
$ git ls-files -v
h path/to/file.txt
To restore the file’s changes to being noticed by git
again,
use the --no-assume-unchanged
option:
$ git update-index --no-assume-unchanged path/to/file.txt
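The hide-and-restore cycle described above can be observed end to end in a scratch repository. This is a sketch under stated assumptions: the temporary directory, the file name settings.txt and the identity passed with -c are hypothetical.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
echo 'original' > "$repo/settings.txt"
git -C "$repo" add settings.txt
git -C "$repo" -c user.name='Example' -c user.email='example@example.com' \
  commit -q -m 'Add settings.txt'

# Hide future edits to the tracked file, then edit it.
git -C "$repo" update-index --assume-unchanged settings.txt
echo 'temporary local edit' > "$repo/settings.txt"

hidden_status=$(git -C "$repo" status --porcelain)  # the edit is not reported
flags=$(git -C "$repo" ls-files -v)                 # lowercase h marks the file

# Make git notice the file again.
git -C "$repo" update-index --no-assume-unchanged settings.txt
visible_status=$(git -C "$repo" status --porcelain)

echo "while hidden:   '$hidden_status'"
echo "ls-files -v:    $flags"
echo "after restore:  '$visible_status'"
```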
You can read more about this option here.
The help message is:
The following creates a new branch called my_branch
at the current HEAD
.
This will create a file called .git/refs/heads/my_branch
.
$ git update-ref refs/heads/my_branch HEAD
The above is equivalent to:
$ git branch my_branch
This is the help message:
$
This article discusses low- to high-level git
concepts:
hashes, refs, terms and revision parameters.
If you are new to git
, the following easy-to-read trilogy provides a nice explanation:
This section was paraphrased, updated and enhanced from the Git Internals - Plumbing and Porcelain chapter of the Pro Git book, written by Scott Chacon and Ben Straub and published by Apress. The book is licensed under the Creative Commons Attribution Non-Commercial Share Alike 3.0 license.
Git was initially a toolkit for a version control system, rather than being user-friendly.
From the beginning, its low-level subcommands were designed to be chained together UNIX-style,
or called from scripts.
The low-level git
subcommands are referred to as plumbing subcommands.
Starting from 2015, more user-friendly git subcommands were added;
continuing with the plumbing metaphor,
the more user-friendly git subcommands are called porcelain commands.
There were 43 main porcelain subcommands when this article was last updated.
Type git help -a
to see these subcommands,
along with many other categories of subcommands.
$
Git currently uses SHA-1 to identify all types of objects it stores (commits, trees, blobs and annotated tags). Git has symbolic names for branches and tags, to spare you the awkwardness of having to use long alphanumeric identifiers. Combinations of symbolic names are called refs, which is short for references. While most refs usually refer to commits, tags are a special kind of ref that can refer to any of the four object types.
Git v2.29 introduced experimental support for SHA-256 for object names and content.
This required a new repository format.
There is no interoperability between SHA-1 and SHA-256 repositories yet.
No major Git provider currently supports SHA-256-enabled repositories.
A revision is anything which may be resolved to some kind of object stored in a Git object database, using Git’s DSL.
Git implements a DSL that can be used by combining ref names, SHA-1 names and operators.
This is documented in the
gitrevisions
man page, which is dedicated to specifying revisions and ranges for Git.
This is a difficult document to read.
I have rewritten and paraphrased the material in the remainder of this article.
Within the .git/
directory of a git
project, many entries are possible:
$ tree .git -FL 1 .git ├── COMMIT_EDITMSG ├── FETCH_HEAD ├── HEAD ├── ORIG_HEAD ├── branches/ ├── config ├── description ├── hooks/ ├── index ├── info/ ├── logs/ ├── objects/ ├── packed-refs └── refs/
Only 4 files and subdirectories are important for this discussion:
HEAD: HEAD is a special reference. By definition, it always points to the currently checked out commit. However, this is not usually a direct pointer – instead, it is a symbolic reference, which means that it points to a branch whose tip commit is currently checked out.
index: The git staging area, also referred to by older documentation as the cache. This is the data that is committed when you run git commit. In general, when you commit, you commit the index.
objects/: The objects directory has 256 subdirectories, which contain the actual database files.
refs/
Every branch has a head, which is the pointer to the current branch reference,
which is in turn a pointer to the last commit made on that branch.
If the default branch is master
,
then the default head is the head of the master
branch.
The reference called HEAD
is equivalent to writing
something of the form HEAD/
.
If the current branch is master
you might write HEAD/
.
You could also be more precise by writing using the fully qualified form:
refs/
.
The value for HEAD
is persisted in .git/HEAD
:
$ cat .git/HEAD ref: refs/heads/master
The above defines HEAD
as refs/
.
This means the default branch is master
.
When a git repository has these contents in that file,
writing HEAD
is equivalent to heads/
and
refs/
.
$ git show-ref -s HEAD fcd6335681f917421ef3522bc9704c4800467aa0 $ git show-ref -s heads/master fcd6335681f917421ef3522bc9704c4800467aa0 $ git show-ref -s refs/heads/master fcd6335681f917421ef3522bc9704c4800467aa0
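The relationship between .git/HEAD and the current branch can be checked directly. The sketch below uses a freshly initialized scratch repository (a hypothetical temporary directory); the branch name it prints depends on your init.defaultBranch setting, so it may be master, main or something else. It assumes the default files-based ref storage.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q

# .git/HEAD is a one-line text file of the form "ref: refs/heads/<branch>".
head_file=$(cat "$repo/.git/HEAD")

# git symbolic-ref prints the same refname without the "ref: " prefix.
head_ref=$(git -C "$repo" symbolic-ref HEAD)

# The short branch name is the refname with refs/heads/ stripped.
branch=${head_ref#refs/heads/}

echo "$head_file"
echo "$branch"
```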
A refname is a symbolic reference name.
Examples include:
master
, heads/
,
refs/
and
refs/
.
The shorter refnames are convenient to write,
while using longer refnames avoids ambiguity.
The refname master
typically means the commit object referenced by
refs/
,
defined in the file .git/
.
However, the meaning of the short version of a refname might be ambiguous, depending on context.
For example, if a git
repository has both the refnames
heads/
(defined in the file .git/
) and
refs/
(defined in the file .git/
),
you can explicitly write heads/
or
refs/
to be precise.
Note that the SHAs of each reference are the same,
which would make sense if these repositories were both up-to-date sibling clones.
$ cat .git/refs/heads/master fcd6335681f917421ef3522bc9704c4800467aa0 $ cat .git/refs/remotes/origin/master fcd6335681f917421ef3522bc9704c4800467aa0 $ git show-ref master fcd6335681f917421ef3522bc9704c4800467aa0 refs/heads/master fcd6335681f917421ef3522bc9704c4800467aa0 refs/remotes/origin/master $ git show-ref -s heads/master fcd6335681f917421ef3522bc9704c4800467aa0 $ git show-ref -s refs/remotes/origin/master fcd6335681f917421ef3522bc9704c4800467aa0
When ambiguous, a refname is disambiguated by the contents of the first file found below:
.git/<refname>
Refname | Defined in | Description |
---|---|---|
HEAD |
.git/HEAD |
Names the commit on which you based the changes in the working tree. |
FETCH_HEAD |
.git/FETCH_HEAD |
Records the branch which you fetched from a remote repository with your last git fetch invocation. |
ORIG_HEAD |
.git/ORIG_HEAD |
This file and the refname are created by commands that move HEAD
in a drastic way, such as
git am ,
git merge ,
git rebase , and
git reset .
The purpose of this file and refname is to record the position of the HEAD before their operation,
so that you can easily change the tip of the branch back to the state before you ran them. |
MERGE_HEAD |
.git/MERGE_HEAD |
This file and refname record the commit(s) which you are merging into your branch when you run git merge. |
CHERRY_PICK_HEAD |
.git/CHERRY_PICK_HEAD |
This file and refname record the commit which you are cherry-picking when you run git cherry-pick . |
.git/refs/<refname>
.git/packed-refs/<refname>
.git/refs/tags/<refname>
.git/packed-refs/tags/<refname>
.git/refs/heads/<refname>
.git/packed-refs/heads/<refname>
.git/refs/remotes/<refname>
.git/packed-refs/remotes/<refname>
.git/refs/remotes/<refname>/HEAD
.git/packed-refs/remotes/<refname>/HEAD
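One consequence of this search order is that a tag wins over a branch with the same short name. The following sketch demonstrates that in a scratch repository. The name dup, the helper function make_commit and the identity passed with -c are hypothetical, chosen only for illustration.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q

# Hypothetical helper: make an empty commit with a fixed identity.
make_commit() {
  git -C "$repo" -c user.name='Example' -c user.email='example@example.com' \
    commit -q --allow-empty -m "$1"
}

make_commit 'first'
first=$(git -C "$repo" rev-parse HEAD)
make_commit 'second'
second=$(git -C "$repo" rev-parse HEAD)

# A branch and a tag share the short name "dup" but point at different commits.
git -C "$repo" branch dup "$second"
git -C "$repo" tag dup "$first"

# The short name resolves to the tag; git warns about the ambiguity on stderr.
resolved=$(git -C "$repo" rev-parse dup 2>/dev/null)

# Longer refnames are unambiguous.
as_tag=$(git -C "$repo" rev-parse refs/tags/dup)
as_branch=$(git -C "$repo" rev-parse refs/heads/dup)

echo "dup            -> $resolved"
echo "refs/tags/dup  -> $as_tag"
echo "refs/heads/dup -> $as_branch"
```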
The history of the previous values of references is stored in .git/
:
$
The above directory tree shows where the history of HEAD
is stored:
in .git/
.
Each git
branch has its own history,
under .git/
.
Internally, git
refers to branches as heads.
This might seem confusing; you will get used to it.
Each remote HEAD
has its history stored under
.git/
.
Each remote branch has its history stored under
.git/
.
Access the reference log by indexing a reference @{using braces}, preceded by an @ character.
For example, the previous HEAD
can be written like this: HEAD@{1}
.
The Git Glossary defines many terms, and StackOverflow clarifies them. I have expanded on some definitions:
<rev>:<path>
.:
),
followed by the name of the blob or tree at the given path.HEAD:README
, :README
,
master:path/to/file
, and master:README
.
HEAD
is assumed.
.git/
directory
contains the working tree.
Bare repositories
have no working tree.
.git/
directory is physically contained within the working tree,
but is not logically part of it.
The meaning of revision parameters depends on the git command they are used with. A revision parameter might denote:
git describe
:
a tag, optionally followed by a dash and a number of commits, followed by a dash, a g,
then an abbreviated object name.
git-log
,
which walk the revision graph,
revision parameters denote all commits which are reachable from that commit.
The range of revisions can also be explicitly specified.
git-cat-file
,
git-push
, git-show
,
and git-show-ref
,
accept revision parameters which denote types of objects other than commits.
For example, these commands can accept objects such as blobs (files) or trees (directories of files).
The syntax for revision parameters is easily confused with the syntax for a parent commit.
Revision parameters look like reference^{type}
,
whereas the parent of HEAD
is written as HEAD^
.
To be more specific, revision parameters are written with the following components:
{commit}
or {tree}
.
^0
is a shorthand for ^{commit}
.
The object is recursively dereferenced until an object of the desired type is found.
In the following example,
master^{tree}
returns the tree object associated with ref master
.
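To see the ^{type} syntax dereference a ref into different kinds of objects, git cat-file -t can report the type of whatever a revision parameter resolves to. This sketch uses HEAD rather than master so that it works regardless of the default branch name. The scratch repository, the file README.md and the identity passed with -c are assumptions.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
echo 'hello' > "$repo/README.md"
git -C "$repo" add README.md
git -C "$repo" -c user.name='Example' -c user.email='example@example.com' \
  commit -q -m 'Add README'

# Quote each revision so the shell leaves ^ and {} alone.
commit_type=$(git -C "$repo" cat-file -t 'HEAD^{commit}')
tree_type=$(git -C "$repo" cat-file -t 'HEAD^{tree}')
blob_type=$(git -C "$repo" cat-file -t 'HEAD:README.md')

# ^0 is shorthand for ^{commit}, so both name the same object.
short_form=$(git -C "$repo" rev-parse 'HEAD^0')
long_form=$(git -C "$repo" rev-parse 'HEAD^{commit}')

echo "$commit_type $tree_type $blob_type"
```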
$
$
The following table shows examples of revision syntax in the left column,
and the returned class from Rugged::Repository.rev_parse
in the right column.
Incantation | Returned Class |
---|---|
abf8efadc8 | Rugged::Commit |
abf8efadc8:README.md | Rugged::Blob |
abf8efadc8^ | Rugged::Commit |
abf8efadc8^{tree} | Rugged::Tree |
@ | Rugged::Commit |
HEAD | Rugged::Commit |
HEAD~3 | Rugged::Commit |
HEAD^ | Rugged::Commit |
HEAD^{tree} | Rugged::Tree |
master^{tree} | Rugged::Tree |
HEAD:README.md | Rugged::Blob |
master:README.md | Rugged::Blob |
HEAD@{0} | Rugged::Commit |
HEAD@{yesterday} | Rugged::Commit |
HEAD@{2 months ago} | Rugged::Commit |
HEAD@{1 month 2 weeks 3 days ago} | Rugged::Commit |
HEAD@{'Oct 15, 2021'} | Rugged::Commit |
HEAD@{'2021-10-15'}^{tree} | Rugged::Tree |
HEAD@{'2021-10-15'}:README.md | Rugged::Blob |
master@{yesterday} | Rugged::Commit |
master@{'2021-10-15'}:README.md | Rugged::Blob |
@{'2021-10-15'}:README.md | Rugged::Blob |
@{last week}:README.md | Rugged::Blob |
@{last month}:README.md | Rugged::Blob |
@{last year}:README.md | Rugged::Blob |
@{'2021-10-15 12:34'}:README.md | Rugged::Blob |
@{0} | Rugged::Commit |
v1.5.1 | Rugged::Commit |
v1.5.1^0 | Rugged::Commit |
v1.5.1^{} | Rugged::Commit |
:/bump | Rugged::Commit |
HEAD^{/bump} | Rugged::Commit |
The above table was produced by the following program and my
flexible_include
Jekyll plugin:
#!/usr/bin/env ruby

require 'rainbow/refinement'
require 'rugged'

class GitRevisionException < StandardError; end

using Rainbow

EXPRESSIONS = [
  'abf8efadc8',
  'abf8efadc8:README.md',
  'abf8efadc8^',
  'abf8efadc8^{tree}',
  '@',
  'HEAD',
  'HEAD~3',
  'HEAD^',
  'HEAD^{tree}',
  'master^{tree}',
  'HEAD:README.md',
  'master:README.md',
  'HEAD@{0}',
  'HEAD@{yesterday}',
  'HEAD@{2 months ago}',
  'HEAD@{1 month 2 weeks 3 days ago}',
  "HEAD@{'Oct 15, 2021'}",
  "HEAD@{'2021-10-15'}^{tree}",
  "HEAD@{'2021-10-15'}:README.md",
  'master@{yesterday}',
  "master@{'2021-10-15'}:README.md",
  "@{'2021-10-15'}:README.md",
  "@{last week}:README.md",
  "@{last month}:README.md",
  "@{last year}:README.md",
  "@{'2021-10-15 12:34'}:README.md",
  '@{0}',
  'v1.5.1',
  'v1.5.1^0',
  'v1.5.1^{}',
  ':/bump',
  'HEAD^{/bump}'
].freeze

def do_one(rev_str)
  rev_str.strip!
  return nil if rev_str.strip.empty?

  begin
    result = @repo.rev_parse(rev_str).class
    td = "<td>#{result}</td>"
  rescue StandardError => e
    td = "<td class='error' style='padding: 1px 3px;'>#{e.message}</td>"
  end
  " <tr class='code'><td>#{rev_str}</td> #{td}</tr>"
end

def expand_env(str)
  str.gsub(/\$([a-zA-Z_][a-zA-Z0-9_]*)|\${\g<1>}|%\g<1>%/) do
    ENV.fetch(Regexp.last_match(1), nil)
  end
end

begin
  git_dir = expand_env '$rugged'
  abort 'Error: the $rugged environment variable is not defined'.red if git_dir.empty?

  @repo = Rugged::Repository.new git_dir
  puts <<~END_OUTPUT
    <table class="condensed noborder table">
     <tr><th>Incantation</th> <th>Returned Class</th></tr>
    #{EXPRESSIONS.map { |x| do_one x }.compact.join("\n")}
    </table>
  END_OUTPUT
rescue StandardError => e
  raise GitRevisionException, "#{e.class}: #{e.full_message}".red, []
end
If you want to be able to run this program,
you first need to install its dependencies,
rainbow and rugged:
$ gem install rainbow rugged
gitrevisions(7)
man page.
By default, the git
commands that might generate lots of output use a pager,
so you can scroll through voluminous output easily.
These commands include:
git branch -l
git config --list
git diff
git log
The default pager is
less
.
To exit from less
,
you must press the q key, which is confusing for some people.
You can specify any shell command to use as the git
pager.
Many options exist for controlling the git
pager.
I prefer to use the last method shown.
A pager is never used if you pipe the git output to another process, or into a file, so no pager option is needed in these cases:
$ git log > my_log.txt
$ git diff | grep 'pattern'
To control the git
pager for just one git
command,
you have 2 options:
Pass the -P
option to git
, before the subcommand, like this:
$ git -P log ... lots of output...
$ git -P diff ... lots of output...
Control the pager by defining the GIT_PAGER
environment variable inline (before the command).
To disable the pager for the output of the current command only,
set GIT_PAGER=cat
, like this:
$ GIT_PAGER=cat git log ... lots of output...
$ GIT_PAGER=cat git diff ... lots of output...
To use a pager only if the output of the current command is longer than one terminal screen,
set GIT_PAGER="less -F"
, like this:
$ GIT_PAGER="less -F" git log ... lots of output...
$ GIT_PAGER="less -F" git diff ... lots of output...
To disable the pager for all git
commands in the current shell or script,
define GIT_PAGER=cat
on a separate line, like this:
$ export GIT_PAGER=cat
$ git branch -l ... lots of output...
$ git log ... lots of output...
To use a pager only if the output of the current command is longer than one terminal screen in the current shell or script,
define GIT_PAGER="less -F"
on a separate line, like this:
$ export GIT_PAGER="less -F"
$ git branch -l ... lots of output...
$ git log ... lots of output...
These methods only affect the copy of the git repository that you are working on; other clones of the repository are not affected.
To permanently configure git
to never use a pager for the current git
project,
configure this project’s configuration for core.pager
to have value "cat"
.
$ git config core.pager "cat"
$ git branch -l ... lots of output...
$ git log ... lots of output...
To configure git
to always use a pager if the output of a git command is
longer than one terminal screen in the current project,
configure this project’s configuration for core.pager
to have value
"less -F"
, like this:
$ git config core.pager "less -F"
$ git branch -l ... lots of output...
$ git log ... lots of output...
This method only affects the git
repositories that your OS userid works on;
other users are not affected.
You can permanently configure git
to never use a pager for any of your git
projects.
$ git config --global core.pager "cat"
$ git branch -l ... lots of output...
$ git log ... lots of output...
To permanently configure git
to only use a pager if the output of any git
command is
longer than one terminal screen in any project,
configure the global setting of core.pager
to have value "less -F"
, like this:
$ git config --global core.pager "less -F"
$ git branch -l ... lots of output...
$ git log ... lots of output...
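When several of the settings above are in play at once, git var GIT_PAGER reports which pager git would actually launch. The sketch below, run against a hypothetical scratch repository, shows that an inline GIT_PAGER takes precedence over the project's core.pager setting.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config core.pager 'less -F'

# With GIT_PAGER unset, the project's core.pager setting is used.
from_config=$(unset GIT_PAGER; git -C "$repo" var GIT_PAGER)

# An inline GIT_PAGER overrides core.pager for that one command.
from_env=$(GIT_PAGER=cat git -C "$repo" var GIT_PAGER)

echo "core.pager only: $from_config"
echo "with GIT_PAGER:  $from_env"
```

This makes it easy to confirm a pager configuration from a script, where the pager itself is never started because the output is not a terminal.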