Measuring the Health of Git Repositories 🧑‍⚕️

by Patrick DeVivo


As I work more on tickgit 🎟️, a question that repeatedly arises is whether TODO comments are a useful indicator of the health of a codebase. Generally, I think not — at least on their own. This does have me thinking about the general question of measuring a codebase’s health, however.

How can we measure the “health” of a project via its GitHub page alone, without even looking at the content of the code? This is a typical question when comparing two projects or libraries that have similar functionality, in an effort to decide which one is “better” to use. If all else is basically equal in terms of functionality and use case, what factors do you typically use to gauge which project is “healthier”?

In this case, “healthy” to me means a project that is well maintained and will continue to be so. I think it’s not only important to evaluate whether a project has been recently healthy but to look for indicators that it will continue to be healthy.

I’ll list out some of the “metrics” or factors that seem important to me. I’ll try to be as language-agnostic as possible, as different language ecosystems have different practices and attitudes towards evaluating third-party dependencies.

The first measure is recency of activity. I think this is a pretty common and basic one to use when evaluating a project: has it been committed to recently? Have there been recent releases? This is probably a significant indicator of a project’s health, as it shows ongoing maintenance, feature development, and general “care” from the maintainers.

There’s some subtlety to this measure, though, because I don’t think “recency of last commit” makes sense on its own. It’s important to also judge when the latest meaningful commits occurred, not just superficial updates (say, README edits or low-impact code changes). This can be hard to discern; changelogs and actual releases are probably a good place to look.

Noticing that a project has not been updated in months or years, however, is a very good baseline for evaluating lack of health. If it’s been some years, it’s probably an abandoned project (to state the obvious).
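To make the “meaningful commit” idea a bit more concrete, here is a minimal Python sketch. It assumes commit data has already been pulled from somewhere (for example `git log`) into simple records. The `Commit` type, the `SUPERFICIAL_SUFFIXES` list, and the rule that a commit touching only docs-like files counts as superficial are all assumptions made for illustration, not an established definition.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence

# Assumption: a commit that touches only files with these suffixes is "superficial".
SUPERFICIAL_SUFFIXES = (".md", ".txt", ".rst")


@dataclass
class Commit:
    """Hypothetical record for one commit: when it landed and which files it touched."""
    when: datetime
    files: Sequence[str]


def is_meaningful(commit: Commit) -> bool:
    """True if the commit touches at least one file that is not docs-like."""
    return any(not f.lower().endswith(SUPERFICIAL_SUFFIXES) for f in commit.files)


def days_since_last_meaningful_commit(
    commits: Sequence[Commit], now: datetime
) -> Optional[int]:
    """Whole days between `now` and the newest meaningful commit, or None if there is none."""
    meaningful = [c.when for c in commits if is_meaningful(c)]
    if not meaningful:
        return None
    return (now - max(meaningful)).days


if __name__ == "__main__":
    now = datetime(2020, 1, 31, tzinfo=timezone.utc)
    history = [
        Commit(datetime(2020, 1, 30, tzinfo=timezone.utc), ["README.md"]),
        Commit(datetime(2020, 1, 1, tzinfo=timezone.utc), ["main.go"]),
    ]
    # The README-only commit from the day before is ignored.
    print(days_since_last_meaningful_commit(history, now))
```

A file-suffix rule is crude (a one-line code change is still counted as meaningful), so in practice this would probably be combined with release dates or changelog entries.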

Next is who maintains the project. Is it backed by a large tech company or an individual? Who are the primary committers? Is it one person’s side project, or are there multiple people contributing?

I think the underlying question here is about “key man risk.” Is this project dependent on one person, or is there enough momentum that maintenance load is spread across a team or group of contributors?

I don’t think it’s necessarily bad for a project to be dependent on one person, especially if it’s open-source and has some framework for receiving contributions. It is a risk, however. Perhaps it’s important to also evaluate the individual maintaining the project. Are they themselves working at a large company that uses this project? Do they maintain other “successful” projects? Do they have some level of “profile” or reputation as a developer?

I think that projects from established companies will likely be more reliable than those from individuals (though there are exceptions to this left and right). This is not to say, however, that there is no key man risk in those projects, as the “key man” may indeed be the company itself.

In general, making a judgment about who maintains a project is an important part of evaluating whether it will continue to be maintained. Perhaps it’s important to understand the reasons the maintainers contribute as well: is it core to their business or just a side project?
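One rough way to put a number on key man risk is to look at how concentrated the commit history is. The sketch below assumes you already have a list with one author name per commit. The function names and the 50% default threshold are illustrative choices, and commit counts are only a proxy for who actually carries the maintenance load.

```python
from collections import Counter
from typing import Sequence


def top_contributor_share(authors: Sequence[str]) -> float:
    """Fraction of all commits made by the single most active author (0.0 if no commits)."""
    if not authors:
        return 0.0
    _, top_count = Counter(authors).most_common(1)[0]
    return top_count / len(authors)


def contributors_covering(authors: Sequence[str], fraction: float = 0.5) -> int:
    """Smallest number of authors whose commits add up to at least `fraction` of the total."""
    counts = sorted(Counter(authors).values(), reverse=True)
    total = sum(counts)
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running >= fraction * total:
            return n
    return 0  # only reached when there are no commits


if __name__ == "__main__":
    authors = ["alice"] * 8 + ["bob", "carol"]
    print(top_contributor_share(authors))   # share held by the most active author
    print(contributors_covering(authors))   # how many people cover half the commits
```

A share close to 1.0, or a single contributor covering most commits, points at a project that depends on one person. As noted above, that is a risk to weigh, not automatically a verdict.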

Then there is the question of who uses the project. This can be hard to figure out if the project doesn’t advertise it, but I think it is an important indicator of how “battle-tested” a codebase may be. What other significant projects also rely on this library or tool?

High-profile dependents or a large number of users signal two things to me: validation and potential for continued maintenance.

Validation in that many others (or other high-profile projects) have pre-vetted this dependency and accepted it for themselves. “If it’s good enough for them, it’s good enough for me.”

Continued maintenance in that projects with many and/or important users will presumably be more motivated to stay maintained and to keep improving based on user feedback.

The last factor, which I’ll call the “care factor,” is certainly a more subjective “metric” to evaluate when looking at a repository. To me it is about looking for indicators that the maintainers actively want consumers of their product to have a great user experience, and that they actively care about the health of their code and its ease of use.

- Is the README thorough and well thought out?
- Are metrics upfront? Test coverage, CI status, download counts, badges, etc.
- How is the documentation? Getting started guides?
- Is there active discussion on issues and PRs?
- Does the project have its own website, email list, communities?
- Is there a clear roadmap of upcoming features, and a process in which changes/improvements are decided and worked on?

I appreciate you making it this far! Generally speaking, I’m interested in better understanding how we can measure the “health” of projects in a code-agnostic way. I’m very curious to know if the above considerations can be resolved into more quantitative values. What could a “scorecard” or graph capturing the above look like?
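As one possible starting point, here is a minimal sketch of such a scorecard in Python. Everything in it is an assumption made for illustration: the `Signals` fields, the weights, and the cut-offs (a year of inactivity scoring zero, 100 dependents scoring full marks) are placeholders, not calibrated values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Signals:
    """Hypothetical raw inputs, one per factor discussed above."""
    days_since_meaningful_commit: int
    top_contributor_share: float  # 0.0 to 1.0
    dependents: int               # known projects relying on this one
    care_checks_passed: int       # e.g. README, docs, CI badge, roadmap
    care_checks_total: int


# Assumption: illustrative weights that sum to 1.0.
WEIGHTS = {"recency": 0.30, "maintainers": 0.25, "users": 0.20, "care": 0.25}


def clamp01(x: float) -> float:
    """Clamp a value into the range [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


def scorecard(s: Signals) -> Dict[str, float]:
    """Normalize each factor to 0..1 and add a weighted 'overall' score."""
    scores = {
        # Assumption: a year or more without a meaningful commit scores 0.
        "recency": clamp01(1 - s.days_since_meaningful_commit / 365),
        # The more one person dominates the history, the lower the score.
        "maintainers": clamp01(1 - s.top_contributor_share),
        # Assumption: 100 or more dependents counts as full validation.
        "users": clamp01(s.dependents / 100),
        "care": (
            clamp01(s.care_checks_passed / s.care_checks_total)
            if s.care_checks_total
            else 0.0
        ),
    }
    scores["overall"] = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return scores


if __name__ == "__main__":
    example = Signals(
        days_since_meaningful_commit=30,
        top_contributor_share=0.8,
        dependents=12,
        care_checks_passed=4,
        care_checks_total=6,
    )
    for name, value in scorecard(example).items():
        print(f"{name:12s} {value:.2f}")
```

Keeping the per-factor scores alongside the overall number matters here, since a single blended score would hide exactly the trade-offs (say, a well-cared-for solo project) that the factors are meant to surface.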

