playing with libgit2

Following on my dorking around with bash prompts yesterday, I was wondering how fast a program could be at querying git state for the command prompt. The minimal __git_ps1 function I set up there takes about 75ms to run, and longer if I add more to my prompt. That function works by invoking git a bunch of times; maybe a custom C binary could do better?

I took a quick crack at it using libgit2. It’s been years since I wrote any C code and it was rough going. No exceptions, no useful string type, manually allocated and typed memory, ugh. Maybe I should break down and do C++. Anyway, I managed to scratch together something that prints out the branch name and hash for the current commit. With caches warm my program takes just about 30ms to run; compare 23ms for __git_ps1 if it’s only printing branch name, nothing else. Ouch, I’m slower!
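
For what it's worth, here is a rough sketch of one way to collect timings like these from C instead of eyeballing `time`. `best_time_ms` is a made-up helper, not anything from git or libgit2. It runs a command through the shell a few times and keeps the fastest run, which roughly approximates the warm-cache case. Because it goes through `system()`, every measurement also includes the cost of starting a shell, so treat the numbers as relative rather than absolute.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: run `cmd` through the shell `runs` times and return
 * the fastest wall-clock time in milliseconds, or -1.0 on failure. */
double best_time_ms(const char *cmd, int runs) {
    assert(cmd != NULL);
    if (runs <= 0)
        return -1.0;

    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        int status = system(cmd);
        clock_gettime(CLOCK_MONOTONIC, &end);
        if (status == -1)
            return -1.0;  /* the shell could not be started */

        double ms = (end.tv_sec - start.tv_sec) * 1000.0
                  + (end.tv_nsec - start.tv_nsec) / 1e6;
        if (best < 0.0 || ms < best)
            best = ms;    /* keep the fastest run seen so far */
    }
    return best;
}
```

Calling something like `best_time_ms("git symbolic-ref HEAD", 10)` from inside a repository would give a figure comparable to the ones quoted here, shell overhead included.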

Using set -x I can see what __git_ps1 is doing. The main command it’s forking is git symbolic-ref HEAD, which takes 10ms if I run it by itself. I wonder why my libgit2-based code is so much slower?
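
To make it concrete what that command has to figure out: in an ordinary checkout, `.git/HEAD` is a one-line text file that reads `ref: refs/heads/master` when a branch is checked out, or holds a bare hash when HEAD is detached. The sketch below parses that one line. `parse_head` is a hypothetical helper of my own naming, and it assumes HEAD is stored as a plain file; it ignores worktrees and other layouts where `.git` is not a directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the contents of .git/HEAD, copy the full
 * reference name (e.g. "refs/heads/master") into `out`.
 * Returns 1 for a symbolic HEAD, 0 for a detached HEAD (a raw hash)
 * or when `out` is too small to hold the name. */
int parse_head(const char *contents, char *out, size_t out_size) {
    static const char prefix[] = "ref: ";
    assert(contents != NULL && out != NULL);

    if (strncmp(contents, prefix, sizeof prefix - 1) != 0)
        return 0;                            /* detached HEAD: no ref name */
    contents += sizeof prefix - 1;

    size_t len = strcspn(contents, "\r\n");  /* stop at the line ending */
    if (len == 0 || len >= out_size)
        return 0;
    memcpy(out, contents, len);
    out[len] = '\0';
    return 1;
}
```

Reading the file and feeding its contents to a function like this is the cheap part; most of the time measured above presumably goes to process startup and repository discovery, not to the parsing.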

I still feel a custom C binary should be faster, particularly to support fancier output info. But it’s an awful lot of pain writing libgit2 code in C; I may not have the patience to press on.

Update: derp, my program isn’t slow because it’s slow. It’s slow because I was running it on my Mac off of an SMB mounted volume. If I copy my binary to local disk it’s faster than stock git as expected: 6ms vs. 10ms for git symbolic-ref HEAD. Duh.

One other note: I started playing with dtruss/dtrace to figure out what system calls the two programs were running. My code makes 135 system calls, stock git makes 178. My code opens 8 files, 3 of them from the repo (which is still SMB and therefore slow). git opens 14 files, 6 from the repo. It seems to be doing all the work on the repo files twice, not sure why. Anyway, now I’m reassured my libgit2 approach could be more efficient.

#include <stdio.h>
#include <stdlib.h>
#include <git2.h>

// Print a message and exit if a libgit2 call failed.
void check_return(int rc, const char * msg) {
    if (rc != 0) {
        fprintf(stderr, "%s. Code: %d\n", msg, rc);
        exit(1);
    }
}

int main(void) {
    int rc;
    git_repository *repo;
    git_reference *head;

    // libgit2 0.22 and later must be initialized before any other call
    git_libgit2_init();

    // open the repository
    rc = git_repository_open_ext(&repo, ".", 0, NULL);
    if (rc == GIT_ENOTFOUND) {
        // No git repo; that's OK.
        exit(0);
    }
    check_return(rc, "opening repository");

    // find HEAD
    rc = git_repository_head(&head, repo);
    check_return(rc, "finding HEAD");

    int ref_type = git_reference_type(head);
    if (ref_type == GIT_REF_SYMBOLIC) {
        check_return(-999, "Can't handle symbolic targets");
        // git_reference_symbolic_target() or git_reference_resolve()??
    } else if (ref_type == GIT_REF_OID) {
        // Get the name
        const char * name = git_reference_name(head);

        // Get the hash
        const git_oid * oid = git_reference_target(head);
        char hash[7];  // room for 6 hex digits plus the terminating NUL
        git_oid_tostr(hash, 7, oid);

        // Print out what we've learned
        printf("%s %s", name, hash);
    }

    // Release what libgit2 allocated for us
    git_reference_free(head);
    git_repository_free(repo);
    git_libgit2_shutdown();

    return 0;
}
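
The program above prints the full reference name (something like refs/heads/master) followed by six hex digits of the hash. For a prompt you would probably want the short branch name and git's usual seven-digit abbreviation instead. Here is a minimal sketch of that formatting step in plain C. `format_prompt` is a hypothetical helper, not part of libgit2, and it only knows about the `refs/heads/` prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libgit2: turn a full reference name and
 * a hex object id into a compact "branch hash" prompt fragment.
 * Returns the number of characters written, or -1 if `out` is too small. */
int format_prompt(const char *ref_name, const char *hex,
                  char *out, size_t out_size) {
    static const char heads[] = "refs/heads/";
    assert(ref_name != NULL && hex != NULL && out != NULL);

    /* Drop the "refs/heads/" prefix so "refs/heads/master" becomes "master". */
    if (strncmp(ref_name, heads, sizeof heads - 1) == 0)
        ref_name += sizeof heads - 1;

    /* "%.7s" prints at most 7 characters of the hash. */
    int n = snprintf(out, out_size, "%s %.7s", ref_name, hex);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

In the libgit2 program this would sit between fetching the name and hash and the final printf, with the full 40-character hex string as input rather than the truncated one.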

3 thoughts on “playing with libgit2”

  1. So, `git_repository_open_ext` always loads the git config data to check if you have a bare repository or a remote working directory or a variety of other configuration settings. The time to load config data (which also checks your home directory and other places) is significant.

    As an experiment, I added a new flag to `git_repository_open_ext` named `GIT_REPOSITORY_OPEN_SKIP_CONFIG` that allows you to skip over the config parsing and just assumes that the working directory is the parent of the “.git” directory. This is a bad idea in the general case, but for a small one-off app like yours, it can make a difference. (I also took the liberty of taking your above code and copying it into `examples/namehash.c` so I could play with it.)

    The timing on my machine is completely different from yours, but I see about a 40% performance increase. With the repository on an SMB mount, I would guess that it won’t matter quite as much.

    If you’re interested and you get a chance, check out https://github.com/arrbee/libgit2/tree/fast-open (you can run `git fetch https://github.com/arrbee/libgit2 fast-open` to get the code) and try the `GIT_REPOSITORY_OPEN_SKIP_CONFIG` flag – I’m curious if you see the same speed up.

  2. Thanks for the reply and the thoughtful comment! I’ve reimplemented most of the __git_ps1() function in a Python program now and while it’s a little faster, it’s not significantly faster. Once you start being interested in the dirty state of the repo or, even worse, the presence of untracked files it gets really slow as the whole repo has to be inspected.

    I’m also coming to the conclusion that optimizing for the SMB/NFS mounted case is foolish; git’s already a distributed filesystem of its own, and I imagine people using network mounted repo clones are in the minority.

  3. Yeah, once you start checking dirty state or untracked files, the amount of data you have to go through becomes quite large. Right now, libgit2 doesn’t have a flag to diff / status to stop processing files as soon as any actual diff is found, but I’ve thought about adding one so you could have a “quick” check of “was something dirty?” This is a good concrete use case for such a flag. It is still going to be hundreds of times slower than just a quick open to read the branch and SHA, but we might be able to keep performance acceptably fast.

    I do think that a lot of people like to see git status in their prompt, so this might be useful to lots of folks, but at the same time, I suspect that very few need the SMB/NFS optimization and fewer still would be willing to install a native binary just to accelerate their prompt. Ah well…
