Mike Slinn

Working With Git Repos Using Ruby's Rugged Gem

Published 2023-03-11.

This page is part of the git collection, categorized under Git, Open Source, Ruby.

This article builds on the preceding libgit2 article. It demonstrates how to work with the Ruby wrapper around libgit2, called rugged.

Exploring Rugged With irb

Just for fun, let’s look up the commit for the tag v1.5.1 in the git repository for rugged itself using the rugged Ruby gem.

ruby
$ irb
irb(main):001:0> require 'rugged'
=> true 
irb(main):003:0> repo = Rugged::Repository.new('/mnt/c/work/ruby/rugged')
=> #<Rugged::Repository:74460 {path: "/mnt/c/work/ruby/rugged/.git/"}>
irb(main):004:0> repo.tags.find { |tag| tag.name == 'v1.5.1' }.target
=> #<Rugged::Commit:81360 {message: "Merge pull request #945 from libgit2/cmn/bump-libgit2-15\n\nUpdate to v1.5.1", tree: #<Rugged::Tree:81380 {oid: 91fdc2d8b85409686fdb35e4bc380d48164355c3}> <".gitattributes" bcbb8a9f8b89305e8e98b7736d605d43b5824b8c> <".github" dce490129533cf36a61f872a55e8674cc8df01b2> <".gitignore" 429849f5370009d62c4293b417edffd8651c6cf5> <".gitmodules" 44fa6fdf6891e3217e712c91758f2a7e7a2141a2> <".yardopts" 427acd6d077569adc83b2fc22f7a2c5c80a113fc> <"CHANGELOG.md" 700f9627e0b8297896fbfe9d259b124e384825be> <"Gemfile" b6880a0a33e0666d938e49281563fa761b3fd98a> <"LICENSE" 9efa1d8c7baaef8538d6318fde8f560a142594ab> <"README.md" 5484bce591ee114e1f43904a94647b30a71f0dd6> <"Rakefile" 5319630e221c60adc5c1ba9ea1c5b2dc34e9b9e4> <"ext" bd1480be34805260af2f77a5ca0248ab47bb84b5> <"lib" 8bcbbe21243733f495f852e3fce48b01e332668d> <"rugged.gemspec" 05403f802f4d3ab1ea46e145738d8e78ccfb5aee> <"script" b3133486465e567fbb2475b1b99a698822f24c4a> <"test" 90ac2f2a21eedf7ffcb18dc8d06da230f99949e3> <"vendor" 75c3bbc422c544ce9fbb27b547f28eea7b9e1694> , parents: ["22122185dcf117866c68f34f5bbf50acbbb082e1", "9d5978bba108785feb5626a4f01bc860791985aa"]}>

Diff-Centric Dump of Changes

Following is a simple Ruby program that uses rugged v1.5.1 to dump out all the changes to a repository between HEAD and the previous commit. This is a diff-centric approach; all changes in every file associated with a diff are dumped.

Diff-centric Ruby/rugged program
require 'rugged'

repo = Rugged::Repository.new('.')
head = repo.head
target = head.target # Rugged::Commit
parent = head.target.parents.first # Rugged::Commit
puts "head.target.oid: #{target.oid}"
puts "parents.first.oid: #{parent.oid}"
puts "commit message: #{parent.message}"

head.target.diff(head.target.parents.first).each_patch do |patch|
  delta = patch.delta # Rugged::Diff::Delta
  new_path = delta.new_file[:path]
  patch.each_hunk do |hunk|
    hunk.each_line do |line|
      sign = line.addition? ? '+' : '-'
      lineno = line.new_lineno.to_s.rjust(4, ' ')
      puts "#{sign} #{new_path}:#{lineno} #{line.content}"
    end
  end
end

I ran the above program on git itself. Here is the output as of 2023-03-07:

head.target.oid: d15644fe0226af7ffc874572d968598564a230dd
parents.first.oid: ef7d4f53c2fd9e8186d093dea6d45a91ce57110e
commit message: Git 2.40-rc1\n\nSigned-off-by: Junio C Hamano <gitster@pobox.com>\n
- range-diff.c: 383 \tconst char *color_new = diff_get_color_opt(diffopt, DIFF_FILE_NEW);\n
- range-diff.c: 384 \tconst char *color_commit = diff_get_color_opt(diffopt, DIFF_COMMIT);\n
- range-diff.c: 385 \tconst char *color;\n
- range-diff.c:  -1 \tint abbrev = diffopt->abbrev;\n
+ range-diff.c:  -1 \tchar abbrev = diffopt->abbrev;\n
- range-diff.c: 387 \n
- range-diff.c: 388 \tif (abbrev < 0)\n
- range-diff.c: 389 \t\tabbrev = DEFAULT_ABBREV;\n
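
In the output above, several lines carry a new line number yet are marked with '-', which suggests they are unchanged context lines: the program treats every line that is not an addition as a deletion. The following is a minimal sketch of a three-way marker, assuming the line object responds to addition? and deletion? the way Rugged::Diff::Line does. The names sign_for and FakeLine are hypothetical; FakeLine only stands in for a rugged line so the sketch runs without a repository.

```ruby
# Stand-in for Rugged::Diff::Line so the sketch runs without a repository.
# Only the two predicates used below are modelled.
FakeLine = Struct.new(:origin) do
  def addition?
    origin == :addition
  end

  def deletion?
    origin == :deletion
  end
end

# Hypothetical helper: pick a marker for a diff line.
# Anything that is neither an addition nor a deletion is treated as context.
def sign_for(line)
  if line.addition?
    '+'
  elsif line.deletion?
    '-'
  else
    ' '
  end
end

%i[addition deletion context].each do |origin|
  puts "#{sign_for(FakeLine.new(origin))} #{origin}"
end
```

In the program above, replacing the ternary with a call such as sign_for(line) would be one way to keep context lines from being reported as deletions.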

A Bad Example

One of my pet peeves about open source software is that the documentation is often substandard and out-of-date. Rugged suffers from this issue.

Few complete examples of how to accomplish a task are available, and most of them do not work anymore because the library has evolved so much. Maintaining documentation is not something that most open-source software developers care about. That is a shame, because without good documentation few people are able to use such a library.

Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance.

Following is a revised and annotated example of a program that can be found in the rugged documentation. Unfortunately, it is not properly explained, and thus what exactly it does can be elusive. I spent the time to figure it out. Hopefully this explanation will be helpful.

To aid your learning experience I first use irb to present the concepts, then I present a similar working program.

Irb Exploration

First, let's make a new directory, /tmp/test, then initialize it as a bare git repository. The contents are then listed.

irb equivalent
$ mkdir /tmp/test
$ cd /tmp/test
$ irb
irb(main):001:0> require 'rugged'
=> true
irb(main):002:0> repo = Rugged::Repository.init_at('.', :bare)
=> #<Rugged::Repository:67820 {path: "/tmp/test/"}>
irb(main):003:0> Dir['./*'].each { |x| puts x }
./HEAD
./config
./description
./hooks
./info
./objects
./refs
=> ["./HEAD", "./config", "./description", "./hooks", "./info", "./objects", "./refs"]

Right now we have an empty bare repository. We can see that no files have been stored as blob objects in the repo so far because nothing has been hashed yet:

irb equivalent, continued
irb(main):004:0> Dir['./objects/**/*'].each { |x| puts x }
./objects
./objects/info
./objects/pack 
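
The directory listing above can be narrowed to just the loose objects, which git stores under two-character subdirectories of objects/. This is a sketch using only Ruby's standard library; loose_object_ids is a hypothetical helper name, and packed objects are deliberately ignored.

```ruby
# Hypothetical helper: list the ids of loose objects under a git directory.
# Loose objects live at objects/<first 2 hex chars>/<remaining hex chars>;
# the info and pack directories do not match the two-hex-character pattern.
def loose_object_ids(git_dir)
  pattern = File.join(git_dir, 'objects', '[0-9a-f][0-9a-f]', '*')
  Dir.glob(pattern).map do |path|
    File.basename(File.dirname(path)) + File.basename(path)
  end.sort
end

# Run from the top of a bare repository, or pass the path to a .git directory.
puts loose_object_ids('.').inspect
```

For a freshly initialized repository such as the one above, the list would be expected to be empty.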

Normally when working with git using the command line, one would:

  1. Create a file
  2. Git add it to the staging area (also known as the index)
  3. Commit the index

Instead, the misguided example skips the file creation process. A string is hashed and stored in the git object database, and the program pretends that the resulting blob holds the contents of a file called newfile.txt. The last command lists the files in the objects/ directory again:

irb equivalent, continued
irb(main):005:0> oid = repo.write("This blob will be written to the git stage as newfile.txt.", :blob)
=> "fad0c411fb5b035b58be3116f03c8bf76a1ae760" 
irb(main):006:0> Dir['./objects/**/*'].each { |x| puts x }
./objects/fa
./objects/fa/d0c411fb5b035b58be3116f03c8bf76a1ae760
./objects/info
./objects/pack

Now we see that a blob object was written to the object database. However, it has not been entered into the index. That would never happen when using git add.
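
The id returned by repo.write is not arbitrary. In git's object format, a blob id is the SHA-1 of a short header followed by the content, and the loose object is filed under a directory named after the first two hex characters. The sketch below uses only Ruby's standard library and assumes the repository uses SHA-1 object ids (git's default); git_blob_oid and loose_object_path are hypothetical helper names.

```ruby
require 'digest/sha1'

# Hypothetical helper: compute the id git assigns to a blob with this content.
# The hashed bytes are "blob <size in bytes>\0" followed by the content.
def git_blob_oid(content)
  bytes = content.b
  Digest::SHA1.hexdigest("blob #{bytes.bytesize}\0".b + bytes)
end

# Hypothetical helper: where a loose object with this id lives,
# relative to the git directory.
def loose_object_path(oid)
  File.join('objects', oid[0, 2], oid[2..-1])
end

oid = git_blob_oid('This blob will be written to the git stage as newfile.txt.')
puts oid
puts loose_object_path(oid)
```

Under those assumptions, the two lines printed should correspond to the id and the path shown in the irb session.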

This is how to add the blob to the index:

irb equivalent, continued
irb(main):007:0> repo.index.add path: 'newfile.txt', oid: oid, mode: 0o0100644
=> "fad0c411fb5b035b58be3116f03c8bf76a1ae760"
irb(main):008:0> curr_tree = repo.index.write_tree(repo)
=> "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
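
The mode passed to index.add above is one of a small set of octal file modes that git records in index and tree entries. The sketch below lists the common ones; GIT_FILE_MODES and describe_mode are hypothetical names used only for illustration.

```ruby
# File modes git stores in index and tree entries (octal).
# GIT_FILE_MODES is a hypothetical constant name for this sketch.
GIT_FILE_MODES = {
  0o100644 => 'regular file',
  0o100755 => 'executable file',
  0o120000 => 'symbolic link',
  0o040000 => 'directory (tree)',
  0o160000 => 'submodule (gitlink)'
}.freeze

# Hypothetical helper: describe a mode, falling back to its octal form.
def describe_mode(mode)
  GIT_FILE_MODES.fetch(mode) { format('unknown mode %o', mode) }
end

# 0o0100644 is the literal used in the session; it is the same value as 0o100644.
puts describe_mode(0o0100644) # prints "regular file"
```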

Now we can commit:

irb equivalent, continued
irb(main):009:0> author = {
  :email=>"mslinn@mslinn.com",
  :time=>Time.now,
  :name=>"Mike Slinn"
}
=> {:email=>"mslinn@mslinn.com", :time=>2023-03-19 20:34:05.588255505 -0400, :name=>"Mike Slinn"}
irb(main):010:0> new_commit = Rugged::Commit.create repo, author: author, message: 'This is a commit message.', parents: [], tree: curr_tree, update_ref: 'HEAD'
=> "7d3e484d4f136a9f1298f794dc43b0ebabc91d57"

Similar Ruby Program

In summary, this program creates files directly in the git stage and commits them. The ‘files’ are never physically created, so the working tree stays empty, much as if this were a bare repository. This becomes problematic if you attempt to mix the effects of the Ruby code with git commands. In general, there is little benefit in working this way.

require 'rugged'
require 'tmpdir'

author = { email: 'email@email.com', time: Time.now, name: 'username' }

Dir.mktmpdir do |temp_dir| # This directory will be deleted on exit
  Dir.chdir(temp_dir)
  puts "Working in #{temp_dir}"

  repo = Rugged::Repository.init_at temp_dir
  index = repo.index

  # Stage a file (newfile.txt) that does not actually exist.
  oid = repo.write("This blob will be written to the git stage as newfile.txt.", :blob)
  index.add path: 'newfile.txt', oid: oid, mode: 0o0100644
  curr_tree = index.write_tree(repo)
  new_commit = Rugged::Commit.create repo,
      author:     author,
      message:    "This is a commit message.",
      committer:  author,
      parents:    repo.empty? ? [] : [repo.head.target].compact,
      tree:       curr_tree,
      update_ref: 'HEAD'
  puts "Commit #{new_commit[0..7]} created.\n\n"

  # The git CLI shows newfile.txt as a deleted file:
  #   $ git status
  #   On branch master
  #   Changes to be committed:
  #     (use "git restore --staged <file>..." to unstage)
  #           deleted:    newfile.txt
  puts `git status`

  files = Dir["*"]
  if files.reject { |x| x == '.git' }.empty?
    puts "No physical files were created."
  else
    puts "Physical files are:\n  #{files.join("\n  ")}"
  end
end
Typical output
Working in /tmp/d20230311-1208322-ovp16x
Commit fab118b9 created.

On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	deleted:    newfile.txt

No physical files were created.