One of the great things about the Java ecosystem is the plethora of available libraries to integrate with virtually any imaginable tool, and one of those libraries is JGit.
What is JGit?
JGit is, as you have probably guessed, a Java library that implements the Git version control system. You may be wondering why we may need such a library, given that there are plenty of Git command line and desktop tools already. What’s more, even if you needed something that is not currently provided by any existing tool, you could easily create a script that uses the Git command line interface to achieve your objectives (I should know, as I create my own Bash scripts to automate Git tasks).
However, there are plenty of situations where a script that manipulates the output of Git commands won’t be up to the task: maybe you’re creating an application, be it web, desktop, mobile, or otherwise, that needs to integrate transparently with a Git repository. Or maybe you’re trying to process Git data in a way that is complex enough to require a high-level programming language, not just a script. In these scenarios, being able to interact with the Git repository directly via Java classes and methods can be invaluable, so let’s take a look at how we would do that.
The JGit API is a fair mirror of the Git API itself, clearly differentiating between Plumbing and Porcelain; if you haven’t come across these concepts before, don’t worry, most people haven’t! Plumbing and Porcelain refer to the two levels at which you could be interacting with Git: at the low level, you could be interacting with hash objects, trees, blobs, references, and other Git internal representations. This is the Plumbing. And then, at a higher level, you could be interacting with files, commits, authors, branches, and other much more user-friendly concepts. This is the Porcelain.
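To make the Plumbing level a little more concrete, here is a minimal sketch in plain Java (no JGit involved) of how Git derives the ID of a blob: it hashes a small header followed by the file's bytes. The class and method names are made up for illustration, and the sketch assumes a repository that uses the default SHA-1 object format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only; it is not part of JGit.
class BlobIdSketch {

    /** Computes the object ID Git assigns to a blob with the given content. */
    static String blobId(byte[] content) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java runtime ships SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
        // Git hashes a header of the form "blob <size>\0" followed by the raw bytes.
        sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
        sha1.update(content);

        // Render the 20-byte digest as the familiar 40-character hex string.
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // An empty file always maps to the same well-known ID.
        System.out.println(blobId(new byte[0])); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    }
}
```

This is the kind of detail that Porcelain commands hide from you: you add and commit files, and Git takes care of turning them into hashed objects.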
Chances are you’ve never had to interact with Git Plumbing and, unless you’re doing some pretty advanced stuff, you probably never will. This is why, in this article, we will focus on the Porcelain.
Given the above, interacting with an existing Git repository using JGit is straightforward. The first step is to obtain a reference to the repository itself, which will give us an instance of the Repository class:
final Repository repository = FileRepositoryBuilder.create(new File(repositoryPath + "/.git"));
It is important to note here that, for JGit to really understand your repository’s status, it needs to grab hold of the .git directory within your checked out repository, which is where Git stores all its metadata.
Once we have a reference to the repository, we wrap it with an instance of the Git class, which will provide us with a Porcelain-style API to interact with our repository:
final Git git = new Git(repository);
And here is where the magic begins: the Git class has methods such as commit(), log(), merge(), branchCreate(), and many more, which cover most of what you can do at the command line. Moreover, these methods don’t simply execute the command; they create a Command object that lets you build options into it before actually executing it with call(), giving you all the necessary flexibility. For instance, if you wanted to obtain a list of commits pertaining to files within a particular folder, but ignoring those in a subfolder, you could achieve this with the following:
final Iterable<RevCommit> logResult = git.log().addPath("folder").excludePath("folder/subfolder").call();
At this point you may be wondering how you can obtain access to a repository that you haven’t checked out (i.e. cloned) yet. Easy! You clone it with the Git class (since clone is a Porcelain operation):
Git git = Git.cloneRepository().setURI(remoteRepoUri).call();
Or, if your repository doesn’t even exist yet, you can create it using the Porcelain init:
Git git = Git.init().setDirectory(directory).call();
The rest of the commands are very intuitive if you already have experience with Git, and for anything else JGit (and the Internet) have very good documentation. Let’s now see an example of something that we would struggle to do with simple Git commands.
Measuring developers’ legacy
I appreciate this is now dangerous territory: any attempt at measuring a developer’s legacy or performance needs to be surrounded by a number of caveats, among other things because the work of a developer goes beyond simply writing code: it’s about sharing knowledge, weighing options, supporting the business, etc. However, we can get a rough idea of part of a developer’s work by looking at their code.
The easiest way of using code to measure someone’s contributions is to count the number of lines of code each person writes. This, however, is not a good metric, among other reasons because not all lines of code are equal. People then try to factor quality into the equation, but this brings its own host of problems: What is quality? How do you define quality rules? How do you weigh some rules against others? How do you identify exceptions? And most importantly, what if your concept of “quality” changes over time?
To circumvent these dilemmas, we can forget about quality and instead measure a consequence of quality: if a team follows good practices, then good code stays, while bad code eventually disappears. This means that the older a particular piece of code is, the longer it has been exposed to the judgment of developers; if no one has decided to change it to date, chances are higher that it is considered a high-quality piece of code. So we can count how many surviving lines of code we have, check when and by whom they were written, and use this data to draw a picture of the lasting (code) legacy of each developer.
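The counting step can be sketched in plain Java. Everything in this sketch is an assumption made for illustration: the SurvivingLine type is hypothetical, and the score (the sum of the ages, in whole days, of each author's surviving lines) is just one possible way to reward code that has lasted, not necessarily the formula any existing tool uses. In a real program the input would come from a blame pass over the repository, for instance via JGit's git.blame(), rather than from hard-coded values.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the names and the scoring rule are assumptions.
class LegacySketch {

    /** One surviving line of code, as a blame pass might report it. */
    static final class SurvivingLine {
        final String author;
        final Instant writtenAt;

        SurvivingLine(String author, Instant writtenAt) {
            this.author = author;
            this.writtenAt = writtenAt;
        }
    }

    /** Sums, per author, the age in whole days of every surviving line. */
    static Map<String, Long> lineDaysByAuthor(List<SurvivingLine> lines, Instant now) {
        // LinkedHashMap keeps authors in the order they were first seen.
        Map<String, Long> totals = new LinkedHashMap<>();
        for (SurvivingLine line : lines) {
            long ageInDays = Duration.between(line.writtenAt, now).toDays();
            totals.merge(line.author, ageInDays, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-11T00:00:00Z");
        List<SurvivingLine> lines = Arrays.asList(
            new SurvivingLine("alice", Instant.parse("2024-01-01T00:00:00Z")), // 10 days old
            new SurvivingLine("alice", Instant.parse("2024-01-06T00:00:00Z")), // 5 days old
            new SurvivingLine("bob", Instant.parse("2024-01-10T00:00:00Z")));  // 1 day old
        System.out.println(lineDaysByAuthor(lines, now)); // {alice=15, bob=1}
    }
}
```

Weighting each line by its age means that a handful of old, untouched lines can count for more than a large batch written yesterday, which is the intuition behind measuring legacy rather than volume.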
This process is very easy to implement with JGit, and for an idea of what this would look like you can take a look at the Developer’s Legacy Index. I wouldn’t suggest that you take the scores calculated by this tool excessively seriously, but it can be a fun way to assess the potential of JGit and to peek at your team’s dynamics.