Git Commit Metrics

LoopedNetwork
Apr 20, 2022

I’m a fan of the Fediverse, as should be evidenced by the fact that I recently started my own Mastodon instance. If you’re curious, no, I’m not accepting sign-ups; I had this deployed for a handful of friends, but that will likely be another post in itself in the very near future.

This may just be due to the fact that I’m in a lot of free software communities online, but I sometimes feel like folks in the Fediverse get a little… unrealistic about:

  1. The number of people using websites with non-free software.
  2. The number of people wanting to contribute to open source projects.

I could easily be horribly wrong on both fronts, but that’s at least the impression I occasionally get. Case in point, I stumbled my way across a thread with a few people discussing the Mastodon project. I’m intentionally not linking to the thread because my goal isn’t to put anyone on blast but to instead write about the rabbit hole I went down as a result of the discussion. The prevailing sentiment in the thread was that while Mastodon seemed to be doing the right thing from the perspective of what the project was delivering, it was ultimately flawed because the project was hosted out of GitHub, which has been owned by Microsoft since 2018. The idea discussed in the thread was that, by making GitHub the home of the project, it was missing out on potential contributions from all of the developers who refused to make GitHub accounts.

I happen to support the Mastodon project through Patreon, and that prompted me to join their Discord server. A decent amount of discussion surrounding the development of the platform happens there, and for the most part there’s a relatively small number of folks who seem to be regular contributors. This in itself isn’t necessarily an accurate gauge of interest, though, since I’d have to imagine that people who wouldn’t make a GitHub account certainly wouldn’t be making Discord accounts. I found it a little hard to believe, though, that there were developers just chomping at the bit to take part in Mastodon who were solely held back by the fact that the project’s code resides on GitHub.

Ultimately, there isn’t really a good way to prove that idea one way or another. However, it got me thinking about the scope of the project, how many people were actually involved, and to what degree. Looking at the main project repository on GitHub shows that there have been 665 contributors.

I wanted to get a breakdown of what number of commits each person made, though, to see what the distribution would be. Number of commits is absolutely not the end all, be all of project activity, but I thought it would be interesting to see the scope of contributions.

At the time of this writing, the project has 11,457 commits. Of those, the project’s creator Gargron has made 3,602. The individual with the second most commits has 1,058, and the third most sits at 501 commits. The accounts (which I’m not sharing in this post because it’s interesting, not a competition) are ones I recognize from my time lurking in Discord. From there, numbers fall off at a fairly rapid pace. Only 11 developers, including the top 3 just mentioned, have more than 100 commits; 6 developers have between 50 and 99 commits, and 62 developers have between 10 and 49 commits. 247 developers have between 2 and 9 commits, while 328 developers have made a single commit. This is a little wonky because 343 commits have been made by accounts which no longer exist or otherwise don’t get included in the API call, so it’s difficult to figure out any ratios for a rather significant number of contributions.
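The tiers above are just a bucketing of per-author commit counts. As a minimal sketch of that step (not the code from my script), assuming you already have a mapping of account name to commit count: the names `TIERS`, `tier_for`, and `tier_sizes` are made up for illustration, the sample accounts are hypothetical, and I've assumed that someone with exactly 100 commits belongs in the top tier.

```python
from collections import Counter

# (minimum commits, label), checked from the highest tier down.
# Assumption: exactly 100 commits counts toward the top tier.
TIERS = [
    (100, "100 or more"),
    (50, "50-99"),
    (10, "10-49"),
    (2, "2-9"),
    (1, "1"),
]


def tier_for(commit_count):
    """Return the label of the tier a single commit count falls into."""
    for minimum, label in TIERS:
        if commit_count >= minimum:
            return label
    raise ValueError("commit_count must be at least 1")


def tier_sizes(commits_per_author):
    """Count how many authors land in each tier."""
    return Counter(tier_for(count) for count in commits_per_author.values())


# Hypothetical sample data, not real Mastodon contributors.
sample = {"alice": 120, "bob": 57, "carol": 9, "dave": 1, "erin": 1}
sizes = tier_sizes(sample)
```

With the sample data, `sizes` holds one author each in the "100 or more", "50-99", and "2-9" tiers and two authors in the "1" tier.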

Regardless, nearly half of the overall people that have made commits to the project made a single, one-off commit. Likewise, just 3 developers have made 45% of the commits. These numbers are a bit weird, though, because there are two bot accounts I’ve ignored in this write-up thus far.

Those two bots are responsible for 1,500 and 720 commits, respectively. If we take those out, that puts the commit count at 9,237. It would mean that, as far as human commits are concerned, the top 3 have supplied roughly 56% of them. Another relatively small number of folks have then made a very significant number of commits, with the number of contributions per individual decreasing as the tiers expand.
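If you want to check the arithmetic, the percentages fall straight out of the figures quoted in this post. This is only a sketch of that calculation; the variable names are mine, and every number comes from the paragraphs above.

```python
# Figures quoted in this post.
total_commits = 11_457
top_three = 3_602 + 1_058 + 501   # Gargron plus the next two contributors
bot_commits = 1_500 + 720         # the two bot accounts
contributors = 665
single_commit_authors = 328

# Commits left once the bots are taken out.
human_commits = total_commits - bot_commits

# Shares expressed as fractions of the relevant totals.
share_of_all = top_three / total_commits
share_of_human = top_three / human_commits
share_one_off = single_commit_authors / contributors

print(f"Human commits: {human_commits:,}")
print(f"Top 3 share of all commits: {share_of_all:.1%}")
print(f"Top 3 share of human commits: {share_of_human:.1%}")
print(f"Contributors with a single commit: {share_one_off:.1%}")
```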

Does any of this mean anything? Not really. But I thought it was interesting to gather the data and look through it. Furthermore, I recently started a new job, and most of my work has shifted from scripting in Groovy to scripting in Python. As a result, I saw this as an opportunity to brush a little bit of rust off of my Python skills. I have to assume there’s something out there which would have pulled these numbers for me, but instead I chose to write my own script for it. If you’re interested in seeing how bad it is, you can check out the code in this GitLab snippet. If you try to run it, just be mindful that you’ll need to supply a few switches:

  • -u: This is the URL you want to query. Be aware that it's looking for the URL of the repository's commits endpoint, not the project base. The URL for Mastodon, for example, is https://api.github.com/repos/mastodon/mastodon/commits
  • -a: The GitHub user account. You need to be authenticated since otherwise the rate limiting will smack you pretty hard for any decently sized repository.
  • -t: The GitHub Personal Access Token
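For anyone who doesn't feel like opening the snippet, here is a rough sketch of how a script taking those three switches could be put together. To be clear, this is not the code from my snippet: the function names are invented for illustration, it assumes GitHub's commits endpoint accepts the `per_page` and `page` query parameters and HTTP Basic authentication with the token as the password, and it assumes a commit whose `author` field comes back empty is one GitHub could not tie to an account.

```python
import argparse
import base64
import json
import urllib.request
from collections import Counter


def build_parser():
    """Define the three required switches described above."""
    parser = argparse.ArgumentParser(description="Count commits per author")
    parser.add_argument("-u", dest="url", required=True, help="commits API URL")
    parser.add_argument("-a", dest="account", required=True, help="GitHub user account")
    parser.add_argument("-t", dest="token", required=True, help="Personal Access Token")
    return parser


def tally_authors(commits, counts=None):
    """Add one page of commit objects to a running per-account tally."""
    counts = Counter() if counts is None else counts
    for commit in commits:
        # Assumption: "author" is empty when no GitHub account is linked.
        author = commit.get("author")
        login = author.get("login") if author else None
        counts[login] += 1
    return counts


def fetch_page(url, account, token, page):
    """Fetch one page of up to 100 commits using Basic authentication."""
    credentials = base64.b64encode(f"{account}:{token}".encode()).decode()
    request = urllib.request.Request(
        f"{url}?per_page=100&page={page}",
        headers={
            "Authorization": f"Basic {credentials}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main(argv=None):
    """Walk every page of commits and print the totals, largest first."""
    args = build_parser().parse_args(argv)
    counts = Counter()
    page = 1
    while True:
        commits = fetch_page(args.url, args.account, args.token, page)
        if not commits:  # an empty page means the history is exhausted
            break
        tally_authors(commits, counts)
        page += 1
    for login, total in counts.most_common():
        print(f"{login or '(no linked account)'}: {total}")
```

Nothing runs until `main()` is called, so adding the usual `if __name__ == "__main__": main()` at the bottom turns it into a command-line script. Keeping the tallying separate from the fetching means the counting logic can be tried out on a hand-written list of commit dictionaries without touching the API.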

Additionally, if you want you can specify:

This is probably not the cleanest Python around, so if anyone actually both 1.) reads this and 2.) looks at my code, feel free to drop some feedback. If absolutely nothing else, though, it gave me an excuse to write a little Python and a follow-up blog post.

Originally published at https://borked.sh on April 20, 2022.
