Git and libgit2
Mike Slinn

Low-Level Git Concepts

Published 2023-03-13. Last modified 2023-04-25.

This article discusses low- to high-level git concepts: hashes, refs, terms and revision parameters.

If you are new to git, the following easy-to-read trilogy provides a nice explanation:

  1. A curious tale.
  2. Curious git.
  3. Types of git objects brings you right into this document.

Low- to High-Level User Interfaces

This section was paraphrased, updated and enhanced from the Git Internals - Plumbing and Porcelain chapter of the Pro Git book, written by Scott Chacon and Ben Straub and published by Apress. The book is licensed under the Creative Commons Attribution Non-Commercial Share Alike 3.0 license.

Git was initially a toolkit for building a version control system, rather than a complete, user-friendly VCS. From the beginning, its low-level subcommands were designed to be chained together UNIX-style, or called from scripts. The low-level git subcommands are referred to as plumbing subcommands.

Starting in 2005, more user-friendly git subcommands were added; continuing with the plumbing metaphor, these are called porcelain commands. There were 44 main porcelain subcommands when this article was last updated. Type git help -a to see these subcommands, along with many other categories of subcommands.

Shell
$ git help -a
See 'git help <command>' to read about a specific subcommand

Main Porcelain Commands
   add                     Add file contents to the index
   am                      Apply a series of patches from a mailbox
   archive                 Create an archive of files from a named tree
   bisect                  Use binary search to find the commit that introduced a bug
   branch                  List, create, or delete branches
   bundle                  Move objects and refs by archive
   checkout                Switch branches or restore working tree files
   cherry-pick             Apply the changes introduced by some existing commits
   citool                  Graphical alternative to git-commit
   clean                   Remove untracked files from the working tree
   clone                   Clone a repository into a new directory
   commit                  Record changes to the repository
   describe                Give an object a human readable name based on an available ref
   diff                    Show changes between commits, commit and working tree, etc
   fetch                   Download objects and refs from another repository
   format-patch            Prepare patches for e-mail submission
   gc                      Cleanup unnecessary files and optimize the local repository
   gitk                    The Git repository browser
   grep                    Print lines matching a pattern
   gui                     A portable graphical interface to Git
   init                    Create an empty Git repository or reinitialize an existing one
   log                     Show commit logs
   maintenance             Run tasks to optimize Git repository data
   merge                   Join two or more development histories together
   mv                      Move or rename a file, a directory, or a symlink
   notes                   Add or inspect object notes
   pull                    Fetch from and integrate with another repository or a local branch
   push                    Update remote refs along with associated objects
   range-diff              Compare two commit ranges (e.g. two versions of a branch)
   rebase                  Reapply commits on top of another base tip
   reset                   Reset current HEAD to the specified state
   restore                 Restore working tree files
   revert                  Revert some existing commits
   rm                      Remove files from the working tree and from the index
   scalar                  A tool for managing large Git repositories
   shortlog                Summarize 'git log' output
   show                    Show various types of objects
   sparse-checkout         Reduce your working tree to a subset of tracked files
   stash                   Stash the changes in a dirty working directory away
   status                  Show the working tree status
   submodule               Initialize, update or inspect submodules
   switch                  Switch branches
   tag                     Create, list, delete or verify a tag object signed with GPG
   worktree                Manage multiple working trees

Ancillary Commands / Manipulators
   config                  Get and set repository or global options
   fast-export             Git data exporter
   fast-import             Backend for fast Git data importers
   filter-branch           Rewrite branches
   mergetool               Run merge conflict resolution tools to resolve merge conflicts
   pack-refs               Pack heads and tags for efficient repository access
   prune                   Prune all unreachable objects from the object database
   reflog                  Manage reflog information
   remote                  Manage set of tracked repositories
   repack                  Pack unpacked objects in a repository
   replace                 Create, list, delete refs to replace objects

Ancillary Commands / Interrogators
   annotate                Annotate file lines with commit information
   blame                   Show what revision and author last modified each line of a file
   bugreport               Collect information for user to file a bug report
   count-objects           Count unpacked number of objects and their disk consumption
   diagnose                Generate a zip archive of diagnostic information
   difftool                Show changes using common diff tools
   fsck                    Verifies the connectivity and validity of the objects in the database
   gitweb                  Git web interface (web frontend to Git repositories)
   help                    Display help information about Git
   instaweb                Instantly browse your working repository in gitweb
   merge-tree              Perform merge without touching index or working tree
   rerere                  Reuse recorded resolution of conflicted merges
   show-branch             Show branches and their commits
   verify-commit           Check the GPG signature of commits
   verify-tag              Check the GPG signature of tags
   version                 Display version information about Git
   whatchanged             Show logs with difference each commit introduces

Interacting with Others
   archimport              Import a GNU Arch repository into Git
   cvsexportcommit         Export a single commit to a CVS checkout
   cvsimport               Salvage your data out of another SCM people love to hate
   cvsserver               A CVS server emulator for Git
   imap-send               Send a collection of patches from stdin to an IMAP folder
   p4                      Import from and submit to Perforce repositories
   quiltimport             Applies a quilt patchset onto the current branch
   request-pull            Generates a summary of pending changes
   send-email              Send a collection of patches as emails
   svn                     Bidirectional operation between a Subversion repository and Git

Low-level Commands / Manipulators
   apply                   Apply a patch to files and/or to the index
   checkout-index          Copy files from the index to the working tree
   commit-graph            Write and verify Git commit-graph files
   commit-tree             Create a new commit object
   hash-object             Compute object ID and optionally creates a blob from a file
   index-pack              Build pack index file for an existing packed archive
   merge-file              Run a three-way file merge
   merge-index             Run a merge for files needing merging
   mktag                   Creates a tag object with extra validation
   mktree                  Build a tree-object from ls-tree formatted text
   multi-pack-index        Write and verify multi-pack-indexes
   pack-objects            Create a packed archive of objects
   prune-packed            Remove extra objects that are already in pack files
   read-tree               Reads tree information into the index
   symbolic-ref            Read, modify and delete symbolic refs
   unpack-objects          Unpack objects from a packed archive
   update-index            Register file contents in the working tree to the index
   update-ref              Update the object name stored in a ref safely
   write-tree              Create a tree object from the current index

Low-level Commands / Interrogators
   cat-file                Provide content or type and size information for repository objects
   cherry                  Find commits yet to be applied to upstream
   diff-files              Compares files in the working tree and the index
   diff-index              Compare a tree to the working tree or index
   diff-tree               Compares the content and mode of blobs found via two tree objects
   for-each-ref            Output information on each ref
   for-each-repo           Run a Git command on a list of repositories
   get-tar-commit-id       Extract commit ID from an archive created using git-archive
   ls-files                Show information about files in the index and the working tree
   ls-remote               List references in a remote repository
   ls-tree                 List the contents of a tree object
   merge-base              Find as good common ancestors as possible for a merge
   name-rev                Find symbolic names for given revs
   pack-redundant          Find redundant pack files
   rev-list                Lists commit objects in reverse chronological order
   rev-parse               Pick out and massage parameters
   show-index              Show packed archive index
   show-ref                List references in a local repository
   unpack-file             Creates a temporary file with a blob's contents
   var                     Show a Git logical variable
   verify-pack             Validate packed Git archive files

Low-level Commands / Syncing Repositories
   daemon                  A really simple server for Git repositories
   fetch-pack              Receive missing objects from another repository
   http-backend            Server side implementation of Git over HTTP
   send-pack               Push objects over Git protocol to another repository
   update-server-info      Update auxiliary info file to help dumb servers

Low-level Commands / Internal Helpers
   check-attr              Display gitattributes information
   check-ignore            Debug gitignore / exclude files
   check-mailmap           Show canonical names and email addresses of contacts
   check-ref-format        Ensures that a reference name is well formed
   column                  Display data in columns
   credential              Retrieve and store user credentials
   credential-cache        Helper to temporarily store passwords in memory
   credential-store        Helper to store credentials on disk
   fmt-merge-msg           Produce a merge commit message
   hook                    Run git hooks
   interpret-trailers      Add or parse structured information in commit messages
   mailinfo                Extracts patch and authorship from a single e-mail message
   mailsplit               Simple UNIX mbox splitter program
   merge-one-file          The standard helper program to use with git-merge-index
   patch-id                Compute unique ID for a patch
   sh-i18n                 Git's i18n setup code for shell scripts
   sh-setup                Common Git shell script setup code
   stripspace              Remove unnecessary whitespace

User-facing repository, command and file interfaces
   attributes              Defining attributes per path
   cli                     Git command-line interface and conventions
   hooks                   Hooks used by Git
   ignore                  Specifies intentionally untracked files to ignore
   mailmap                 Map author/committer names and/or E-Mail addresses
   modules                 Defining submodule properties
   repository-layout       Git Repository Layout
   revisions               Specifying revisions and ranges for Git

Developer-facing file formats, protocols and other interfaces
   format-bundle           The bundle file format
   format-chunk            Chunk-based file formats
   format-commit-graph     Git commit-graph format
   format-index            Git index format
   format-pack             Git pack format
   format-signature        Git cryptographic signature formats
   protocol-capabilities   Protocol v0 and v1 capabilities
   protocol-common         Things common to various protocols
   protocol-http           Git HTTP-based protocols
   protocol-pack           How packs are transferred over-the-wire
   protocol-v2             Git Wire Protocol, Version 2

External commands
   fame
   filter-repo
   gui
   lfs
   remote-keybase
   tree-evars
   tree-exec
   tree-replicate

Command aliases
   br                      branch
   ci                      commit
   co                      checkout
   dc                      diff --cached
   df                      diff
   dif                     diff --word-diff=color --ignore-space-at-eol
   ign                     ls-files -o -i --exclude-standard
   lg                      log -p
   lol                     log --graph --decorate --pretty=oneline --abbrev-commit
   lola                    log --graph --decorate --pretty=oneline --abbrev-commit --all
   ls                      ls-files
   pwd                     !pwd
   st                      status

Fundamental Concepts

Git currently uses SHA-1 to identify all types of objects it stores (commits, trees, blobs and annotated tags). Git has symbolic names for branches and tags, to spare you the awkwardness of having to use long alphanumeric identifiers. These symbolic names are called refs, which is short for references. While most refs refer to commits, tags are a special kind of ref that can refer to any of the four object types.

Git v2.29 introduced experimental support for SHA-256 for object names and content. This required a new repository format. There is no interoperability between SHA-1 and SHA-256 repositories yet. No major Git provider supports SHA-256-enabled repositories yet.
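
To make the idea of a SHA-1 object name concrete, here is a minimal sketch that computes the name Git assigns to a blob. It relies only on Ruby's standard library; the method name blob_oid is my own, not part of any Git library. Git hashes a short header (the object type, the size in bytes, and a NUL byte) followed by the content.

```ruby
require 'digest/sha1'

# Compute the SHA-1 object name that Git assigns to a blob with this content.
# Git hashes the header "blob <size in bytes>\0" followed by the raw bytes.
def blob_oid(content)
  data = content.b # treat the content as raw bytes
  Digest::SHA1.hexdigest("blob #{data.bytesize}\0".b + data)
end

puts blob_oid("test content\n")
```

The result should match what git hash-object reports for a file with the same content.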

A revision is anything that can be resolved to an object stored in a Git object database, using Git’s domain-specific language (DSL) for revisions.

The DSL combines ref names, SHA-1 names and operators. It is documented in the gitrevisions man page, which is dedicated to specifying revisions and ranges for Git. That document is difficult to read, so I have rewritten and paraphrased the material in the remainder of this article.

Low-level Files and Directories

Within the .git/ directory of a git project, many entries are possible:

Shell
$ tree .git -FL 1
.git
├── COMMIT_EDITMSG
├── FETCH_HEAD
├── HEAD
├── ORIG_HEAD
├── branches/
├── config
├── description
├── hooks/
├── index
├── info/
├── logs/
├── objects/
├── packed-refs
└── refs/ 

Only 4 files and subdirectories are important for this discussion:

HEAD
This file is created by git init and names the current branch. HEAD is a special reference. By definition, it always points to the currently checked out commit. However, this is not usually a direct pointer – instead, it is a symbolic reference, which means that it points to a branch whose tip commit is currently checked out.
index
This file contains the git staging area, also referred to by older documentation as the cache. This is the data that is committed when you run git commit. In general, when you commit, you commit the index.
objects/
This subdirectory contains the git project’s object database. Loose objects are stored in up to 256 subdirectories, each named after the first two hexadecimal digits of the object names it holds.
refs/
This subdirectory stores refs: small files that hold string representations of pointers to objects (usually commits) in the object database. The refs are grouped into branches (heads), tags, and remotes.
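
As a rough illustration of how the objects/ directory is laid out, the following sketch builds the path where a loose object would be stored. The method name loose_object_path is hypothetical, and the sketch assumes a 40-character SHA-1 name. Objects that have been packed live in pack files under objects/pack instead, so the path might not exist.

```ruby
# Build the path where a loose object with the given ID would be stored.
# The first two hex digits name the subdirectory; the rest name the file.
def loose_object_path(git_dir, oid)
  raise ArgumentError, "not a full SHA-1: #{oid}" unless oid.match?(/\A[0-9a-f]{40}\z/)

  File.join(git_dir, 'objects', oid[0, 2], oid[2..])
end

puts loose_object_path('.git', 'fcd6335681f917421ef3522bc9704c4800467aa0')
```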

References

Every branch has a head: a reference that points to the last commit made on that branch. If the default branch is master, then the default head is the head of the master branch.

The reference called HEAD is equivalent to writing something of the form heads/<default_branch>. If the current branch is master you might write heads/master. You could also be more precise by using the fully qualified form: refs/heads/master.

The value for HEAD is persisted in .git/HEAD:

Shell
$ cat .git/HEAD
ref: refs/heads/master 

The above defines HEAD as refs/heads/master. This means the current branch is master. When a git repository has these contents in that file, writing HEAD is equivalent to heads/master and refs/heads/master.

Shell
$ git show-ref -s HEAD
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref -s heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref -s refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 
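
The contents of .git/HEAD shown earlier can be interpreted with a few lines of Ruby. This is only a sketch: parse_head is a name I made up, and it assumes SHA-1 object names. When HEAD holds a commit ID directly instead of naming a branch, Git calls the state a detached HEAD.

```ruby
# Interpret the contents of .git/HEAD.
# Returns [:symbolic, refname] when HEAD names a branch,
# or [:detached, sha] when HEAD holds a commit ID directly.
def parse_head(contents)
  line = contents.strip
  if line.start_with?('ref: ')
    [:symbolic, line.delete_prefix('ref: ').strip]
  elsif line.match?(/\A[0-9a-f]{40}\z/)
    [:detached, line]
  else
    raise ArgumentError, "unrecognized HEAD contents: #{line.inspect}"
  end
end

p parse_head("ref: refs/heads/master\n")
p parse_head("fcd6335681f917421ef3522bc9704c4800467aa0\n")
```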

Refnames

A refname is a symbolic reference name. Examples include: master, heads/master, refs/heads/master and refs/remotes/origin/master. The shorter refnames are convenient to write, while using longer refnames avoids ambiguity.

The refname master typically means the commit object referenced by refs/heads/master, defined in the file .git/refs/heads/master. However, the meaning of the short version of a refname might be ambiguous, depending on context.

For example, if a git repository has both the refnames heads/master (defined in the file .git/refs/heads/master) and refs/remotes/origin/master (defined in the file .git/refs/remotes/origin/master), you can explicitly write heads/master or refs/remotes/origin/master to be precise. Note that the SHAs of each reference are the same, which would make sense if these repositories were both up-to-date sibling clones.

Shell
$ cat .git/refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ cat .git/refs/remotes/origin/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref master
fcd6335681f917421ef3522bc9704c4800467aa0 refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 refs/remotes/origin/master 

$ git show-ref -s heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref -s refs/remotes/origin/master
fcd6335681f917421ef3522bc9704c4800467aa0 
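
Because loose refs are ordinary files, a listing similar to the one above can be approximated by reading them directly. The sketch below is an assumption-laden simplification: loose_refs is a hypothetical name, refs stored in the packed-refs file are ignored, and a symbolic ref such as refs/remotes/origin/HEAD is reported with its raw "ref: ..." contents rather than a SHA.

```ruby
# Collect loose refs (one file per ref) below <git_dir>/refs.
# Returns a hash that maps full refnames to the contents of each file.
# Refs that only exist in the packed-refs file are not reported.
def loose_refs(git_dir)
  Dir.glob('refs/**/*', base: git_dir).sort.each_with_object({}) do |refname, refs|
    path = File.join(git_dir, refname)
    refs[refname] = File.read(path).strip if File.file?(path)
  end
end

# Prints nothing when run outside the top-level directory of a git project.
loose_refs('.git').each { |refname, value| puts "#{value} #{refname}" }
```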

Refname Disambiguation Rules

When ambiguous, a refname is disambiguated by the contents of the first file found below:

  1. .git/<refname>
    These unqualified refnames are usually only useful for the following:
    Refname           Defined in             Description
    HEAD              .git/HEAD              Names the commit on which you based the changes in the working tree.
    FETCH_HEAD        .git/FETCH_HEAD        Records the branch which you fetched from a remote repository with your last git fetch invocation.
    ORIG_HEAD         .git/ORIG_HEAD         This file and the refname are created by commands that move HEAD in a drastic way, such as git am, git merge, git rebase, and git reset. The purpose of this file and refname is to record the position of the HEAD before their operation, so that you can easily change the tip of the branch back to the state before you ran them.
    MERGE_HEAD        .git/MERGE_HEAD        This file and refname record the commit(s) which you are merging into your branch when you run git merge.
    CHERRY_PICK_HEAD  .git/CHERRY_PICK_HEAD  This file and refname record the commit which you are cherry-picking when you run git cherry-pick.
  2. .git/refs/<refname>
    .git/packed-refs/<refname>
  3. .git/refs/tags/<refname>
    .git/packed-refs/tags/<refname>
  4. .git/refs/heads/<refname>
    .git/packed-refs/heads/<refname>
  5. .git/refs/remotes/<refname>
    .git/packed-refs/remotes/<refname>
  6. .git/refs/remotes/<refname>/HEAD
    .git/packed-refs/remotes/<refname>/HEAD
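
The search order above can be sketched as a list of candidate names that are tried in turn. This is a simplification under stated assumptions: candidate_refs and disambiguate are hypothetical names, and the known refs are supplied as a plain hash instead of being read from .git/refs and .git/packed-refs.

```ruby
# Candidate full names for a short refname, in the order listed above.
def candidate_refs(refname)
  [
    refname,
    "refs/#{refname}",
    "refs/tags/#{refname}",
    "refs/heads/#{refname}",
    "refs/remotes/#{refname}",
    "refs/remotes/#{refname}/HEAD"
  ]
end

# Return the first candidate that exists in known_refs (full refname => SHA),
# or nil when none of the candidates is known.
def disambiguate(refname, known_refs)
  candidate_refs(refname).find { |candidate| known_refs.key?(candidate) }
end

known = {
  'refs/heads/master' => 'fcd6335681f917421ef3522bc9704c4800467aa0',
  'refs/remotes/origin/master' => 'fcd6335681f917421ef3522bc9704c4800467aa0'
}
puts disambiguate('master', known)
puts disambiguate('origin/master', known)
```

Because refs/tags/ is searched before refs/heads/, a tag and a branch with the same short name would resolve to the tag.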

Reference Logs

The history of the previous values of references is stored in .git/logs:

Shell
$ tree --noreport .git/logs
.git/logs
├── HEAD
└── refs
    ├── heads
    │   └── master
    ├── remotes
    │   └── origin
    │       ├── HEAD
    │       └── master
    └── stash

The above directory tree shows where the history of HEAD is stored: in .git/logs/HEAD.

Each git branch has its own history, under .git/logs/refs/heads. Internally, git refers to branches as heads. This might seem confusing; you will get used to it.

Each remote HEAD has its history stored under .git/logs/refs/remotes/<remote_name>/HEAD.

Each remote branch has its history stored under .git/logs/refs/remotes/<remote_name>/<branch_name>.

Access the reference log by appending an index in braces, preceded by an @ character, to a reference. For example, the previous value of HEAD can be written like this: HEAD@{1}.
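
To show how an index such as @{1} relates to the log files, here is a small sketch. Each line of a reflog file starts with the old SHA and the new SHA of the ref, and new entries are appended at the end, so index 0 is the last line. The method name reflog_value, the SHAs, the identity and the timestamps below are all made up for illustration.

```ruby
# Return the value a ref had n changes ago, given the lines of its reflog file.
# Each line starts with "<old SHA> <new SHA> "; the newest entry is last.
def reflog_value(lines, n)
  return nil if n.negative?

  entry = lines.reverse[n]
  return nil if entry.nil?

  entry.split(' ', 3)[1] # the SHA the ref pointed to after that change
end

ZERO_SHA   = '0' * 40 # old value recorded when the ref was first created
FIRST_SHA  = 'a' * 40 # made-up commit IDs
SECOND_SHA = 'b' * 40

log = [
  "#{ZERO_SHA} #{FIRST_SHA} A U Thor <author@example.com> 1700000000 -0500\tcommit (initial): first",
  "#{FIRST_SHA} #{SECOND_SHA} A U Thor <author@example.com> 1700000600 -0500\tcommit: second"
]

puts reflog_value(log, 0) # corresponds to HEAD@{0}
puts reflog_value(log, 1) # corresponds to HEAD@{1}
```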

Terms

The Git Glossary defines many terms, and StackOverflow clarifies them. I have expanded on some definitions:

Commit-ish
The SHA of a commit, or an annotated tag that points at a commit. All commit-ish references are also tree-ish.
Tree-ish
Any identifier that points to a subdirectory tree. Git refers to directories as trees and tree objects. The general form is: <rev>:<path>.
To break this down: an optional <rev> prefix comes first, then a colon (:), then the path of the blob or tree.
For example: HEAD:README, :README, master:path/to/file, and master:README.

If the prefix is omitted, as in :README, the path is looked up in the index rather than in a commit.
Working tree
The directory tree of physical files. The working tree normally contains the contents of the HEAD commit’s tree, plus any local changes that you have made but not yet committed.

Unless submodules or worktrees are in play, the parent directory of the .git/ directory contains the working tree. Bare repositories have no working tree.

The .git/ directory is physically contained within the working tree, but is not logically part of it.
Repository
A repository is a collection of refs, together with an object database containing all objects which are reachable from the refs, possibly accompanied by metadata from one or more porcelains. A repository can share an object database with other repositories via an alternates mechanism. The repository proper does not include the index or the working tree; it mostly consists of the commits.
index
Also known as the staging area and the cache. A collection of files with status information, whose contents are stored as objects. The index is a stored version of the working tree. The index can also contain two or three versions of a working tree, for merging.
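
The <rev>:<path> form described under Tree-ish can be taken apart at its first colon. The sketch below uses a hypothetical method name, split_rev_path, and is deliberately naive: it does not understand the :/text search syntax, stage numbers, or revisions that themselves contain a colon, such as a reflog timestamp with a time of day.

```ruby
# Split a "<rev>:<path>" expression at its first colon.
# Returns [rev, path]; rev is nil when nothing precedes the colon.
def split_rev_path(expression)
  rev, separator, path = expression.partition(':')
  raise ArgumentError, "no colon in #{expression.inspect}" if separator.empty?

  [rev.empty? ? nil : rev, path]
end

p split_rev_path('master:path/to/file')
p split_rev_path(':README')
```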

Revision Parameters

The meaning of revision parameters depends on the git command they are used with. A revision parameter might denote:

  • A specific commit; the type of revision parameter could be specified as:
    • A SHA or ref.
    • Output from git describe: a tag, optionally followed by a dash and a number of commits, followed by a dash, a g, then an abbreviated object name.
  • For commands, such as git-log, which walk the revision graph, revision parameters denote all commits which are reachable from that commit. The range of revisions can also be explicitly specified.
  • Some Git commands, such as git-cat-file, git-push, git-show, and git-show-ref, accept revision parameters which denote types of objects other than commits. For example, these commands can accept objects such as blobs (files) or trees (directories of files).
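
The git describe format mentioned in the list above (a tag, optionally followed by a commit count and an abbreviated object name) can be recognized with a regular expression. This is a sketch only: parse_describe and the sample string are hypothetical, and suffixes that git describe can add on request, such as a dirty marker, are not handled.

```ruby
# A tag, optionally followed by "-<number of commits>-g<abbreviated object name>".
DESCRIBE_PATTERN = /\A(?<tag>.+?)(?:-(?<count>\d+)-g(?<oid>[0-9a-f]+))?\z/

# Split git describe output into its parts; returns nil for an empty string.
def parse_describe(text)
  match = DESCRIBE_PATTERN.match(text.strip)
  return nil unless match

  {
    tag: match[:tag],
    commits_since_tag: match[:count]&.to_i, # nil when the tag itself was described
    abbreviated_oid: match[:oid]
  }
end

p parse_describe('v1.5.1-14-g2414721')
p parse_describe('v1.5.1')
```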

The syntax for revision parameters is easily confused with the syntax for a parent commit. Revision parameters look like reference^{type}, whereas the parent of HEAD is written as HEAD^.

To be more specific, revision parameters are written with the following components:

  1. A reference.
  2. The character ^.
  3. An object type name enclosed in braces, for example: {commit} or {tree}.

^0 is a shorthand for ^{commit}.

The object is recursively dereferenced until an object of the desired type is found.
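
The recursive dereferencing can be pictured with a toy object database. Everything in this sketch is invented for illustration: the object names, the OBJECTS hash and the peel method are not how Git or Rugged store or expose objects. The idea it shows is that an annotated tag leads to its target, a commit leads to its tree, and the walk stops at the first object of the requested type.

```ruby
# A toy object database: each entry records its type and what it points to.
OBJECTS = {
  'tag1'    => { type: 'tag',    target: 'commit1' },
  'commit1' => { type: 'commit', target: 'tree1' },
  'tree1'   => { type: 'tree' }
}.freeze

# Follow an object's pointer until an object of the wanted type is reached.
def peel(objects, oid, wanted_type)
  object = objects.fetch(oid)
  return oid if object[:type] == wanted_type

  target = object[:target]
  raise ArgumentError, "#{oid} cannot be peeled to a #{wanted_type}" if target.nil?

  peel(objects, target, wanted_type)
end

puts peel(OBJECTS, 'tag1', 'commit') # like tag1^{commit}
puts peel(OBJECTS, 'tag1', 'tree')   # like tag1^{tree}
```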

In the following example, master^{tree} returns the tree object associated with ref master.

Shell
$ git cat-file -p master
tree d4702090a9dabcf2b2af16d30fed3b2a8452ed2f
parent 00f59a41ab11c0815ac9e94afe0be84551a3368a
author Mike Slinn <mslinn@mslinn.com> 1701095478 -0500
committer Mike Slinn <mslinn@mslinn.com> 1701095478 -0500

-

$ git cat-file -p master^{tree} | head
040000 tree eb81966ec081881a6e3d668ce07c089f1b506b18    .bundle
100644 blob 07ccc240d16f795ff1b81e2ba7264e9b449b4e46    .gitignore
100644 blob 68389f23bf4f4006577672914d3e5bd0b5ea65bd    .markdownlint.yaml
100644 blob 00407e6d086c986e4ac02e7df529995a0667a7b6    .rubocop.yml
100644 blob 4e0ef479723860d16a332347a691d422a0ef2770    .shellcheckrc
040000 tree 0df7a371457529ec1adf77378d378a7ff6bf4197    .vscode
100644 blob 779480bf02fc0102a240caa65523fc6c435954bf    404.html
100644 blob 1fe1aa295d166663285b54cb94954bfe19da152c    670nm.html
100644 blob 02862b254846b5669596de4d2795d023ebc87c7c    BingSiteAuth.xml
100644 blob 5eb771fd0bcbe15a32dace40730e7d0f7b62c908    Gemfile

Revision Syntax Examples

The following table shows examples of revision syntax in the left column, and the returned class from Rugged::Repository.rev_parse in the right column.

Incantation                         Returned Class
abf8efadc8                          Rugged::Commit
abf8efadc8:README.md                Rugged::Blob
abf8efadc8^                         Rugged::Commit
abf8efadc8^{tree}                   Rugged::Tree
@                                   Rugged::Commit
HEAD                                Rugged::Commit
HEAD~3                              Rugged::Commit
HEAD^                               Rugged::Commit
HEAD^{tree}                         Rugged::Tree
master^{tree}                       Rugged::Tree
HEAD:README.md                      Rugged::Blob
master:README.md                    Rugged::Blob
HEAD@{0}                            Rugged::Commit
HEAD@{yesterday}                    Rugged::Commit
HEAD@{2 months ago}                 Rugged::Commit
HEAD@{1 month 2 weeks 3 days ago}   Rugged::Commit
HEAD@{'Oct 15, 2021'}               Rugged::Commit
HEAD@{'2021-10-15'}^{tree}          Rugged::Tree
HEAD@{'2021-10-15'}:README.md       Rugged::Blob
master@{yesterday}                  Rugged::Commit
master@{'2021-10-15'}:README.md     Rugged::Blob
@{'2021-10-15'}:README.md           Rugged::Blob
@{last week}:README.md              Rugged::Blob
@{last month}:README.md             Rugged::Blob
@{last year}:README.md              Rugged::Blob
@{'2021-10-15 12:34'}:README.md     Rugged::Blob
@{0}                                Rugged::Commit
v1.5.1                              Rugged::Commit
v1.5.1^0                            Rugged::Commit
v1.5.1^{}                           Rugged::Commit
:/bump                              Rugged::Commit
HEAD^{/bump}                        Rugged::Commit

The above table was produced by the following program and my flexible_include Jekyll plugin:

#!/usr/bin/env ruby

require 'rainbow/refinement'
require 'rugged'

class GitRevisionException < StandardError; end

using Rainbow

EXPRESSIONS = [
  'abf8efadc8',
  'abf8efadc8:README.md',
  'abf8efadc8^',
  'abf8efadc8^{tree}',
  '@',
  'HEAD',
  'HEAD~3',
  'HEAD^',
  'HEAD^{tree}',
  'master^{tree}',
  'HEAD:README.md',
  'master:README.md',
  'HEAD@{0}',
  'HEAD@{yesterday}',
  'HEAD@{2 months ago}',
  'HEAD@{1 month 2 weeks 3 days ago}',
  "HEAD@{'Oct 15, 2021'}",
  "HEAD@{'2021-10-15'}^{tree}",
  "HEAD@{'2021-10-15'}:README.md",
  'master@{yesterday}',
  "master@{'2021-10-15'}:README.md",
  "@{'2021-10-15'}:README.md",
  "@{last week}:README.md",
  "@{last month}:README.md",
  "@{last year}:README.md",
  "@{'2021-10-15 12:34'}:README.md",
  '@{0}',
  'v1.5.1',
  'v1.5.1^0',
  'v1.5.1^{}',
  ':/bump',
  'HEAD^{/bump}'
].freeze

def do_one(rev_str)
  rev_str = rev_str.strip # work on a copy so the caller's string is not mutated
  return nil if rev_str.empty?

  begin
    result = @repo.rev_parse(rev_str).class
    td = "<td>#{result}</td>"
  rescue StandardError => e
    td = "<td class='error' style='padding: 1px 3px;'>#{e.message}</td>"
  end
  "  <tr class='code'><td>#{rev_str}</td> #{td}</tr>"
end

def expand_env(str)
  str.gsub(/\$([a-zA-Z_][a-zA-Z0-9_]*)|\${\g<1>}|%\g<1>%/) do
    ENV.fetch(Regexp.last_match(1), nil)
  end
end

begin
  git_dir = expand_env '$rugged'
  abort 'Error: the $rugged environment variable is not defined'.red if git_dir.empty?
  @repo = Rugged::Repository.new git_dir
  puts <<~END_OUTPUT
    <table class="condensed noborder table">
      <tr><th>Incantation</th> <th>Returned Class</th></tr>
    #{EXPRESSIONS.map { |x| do_one x }.compact.join("\n")}
    </table>
  END_OUTPUT
rescue StandardError => e
  raise GitRevisionException, "#{e.class}: #{e.full_message}".red, []
end

If you want to be able to run this program, you first need to install its dependencies, rainbow and rugged:

Shell
$ gem install rainbow rugged

References