by Patrick DeVivo

Ever want to know what organizations are contributing to an open-source codebase? This AskGit query may be able to help:

Or, in other words: show me the email domains of commit authors (excluding merge commits), ordered by the most frequently occurring domain.
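As a rough illustration of the description above, here is a minimal sketch using Python's built-in sqlite3 module. This is not the AskGit query itself: the commits table, its author_email and parents columns, and the sample rows are all assumptions made up for this example.

```python
import sqlite3

# Hypothetical stand-in for a commits table; the table name, column names,
# and rows below are assumptions for illustration, not AskGit's real schema.
commit_db = sqlite3.connect(":memory:")
commit_db.execute("CREATE TABLE commits (author_email TEXT, parents INTEGER)")
commit_db.executemany(
    "INSERT INTO commits VALUES (?, ?)",
    [
        ("alice@example.com", 1),
        ("bob@example.com", 1),
        ("carol@example.org", 1),
        ("dave@example.org", 2),  # two parents: a merge commit, excluded below
    ],
)

DOMAIN_QUERY = """
SELECT
    lower(substr(author_email, instr(author_email, '@') + 1)) AS email_domain,
    count(*) AS commit_count
FROM commits
WHERE parents < 2            -- skip merge commits
GROUP BY email_domain
ORDER BY commit_count DESC, email_domain
"""

top_domains = commit_db.execute(DOMAIN_QUERY).fetchall()
for domain, commit_count in top_domains:
    print(domain, commit_count)
```

The domain is simply everything after the first "@" in the author email, so personal addresses (for example, public webmail domains) are counted alongside company domains.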

For instance, for the Kubernetes repo (see it live here):

Top email domains of contributors (by commit count) to the Kubernetes source code. Check it out

Some further angles to pursue

  • Look at more than commit count — lines of code added/removed, files modified, types of files modified, the actual content of contributions
  • Look…

by Patrick DeVivo

AskGit recently added support for querying against data in the GitHub API. Repositories and pull requests can be listed by calling several table-valued functions in your SQL queries. More resource types from the GitHub API are planned and will be integrated soon!

Using github_pull_requests('REPO_OWNER', 'REPO_NAME') in the FROM clause of a query will list all pull requests for the specified repository. There are optimizations in place to avoid a full table scan when possible (i.e. a paginated API walk for every pull request object, even if it won’t be used in the query). For instance:

Latest 100…
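To make the idea of avoiding a full paginated walk concrete, here is a small Python sketch of lazy pagination. It does not show how AskGit implements its optimization: iter_pull_requests, fake_fetch_page, and the page size of 50 are hypothetical names and values chosen for this example, and no real GitHub API call is made.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List


def iter_pull_requests(fetch_page: Callable[[int], List[Dict]]) -> Iterator[Dict]:
    """Yield pull requests one at a time, fetching a page only when needed."""
    page = 1
    while True:
        items = fetch_page(page)
        if not items:
            return
        yield from items
        page += 1


# Fake API for illustration: 5 pages of 50 pull requests each.
requested_pages: List[int] = []


def fake_fetch_page(page: int) -> List[Dict]:
    requested_pages.append(page)  # record which pages were actually requested
    if page > 5:
        return []
    start = (page - 1) * 50
    return [{"number": n} for n in range(start + 1, start + 51)]


# Taking only the first 100 rows stops the walk after two pages.
first_hundred = list(islice(iter_pull_requests(fake_fetch_page), 100))
print(len(first_hundred), requested_pages)
```

Because the generator only asks for the next page when a consumer needs more rows, a query that wants a fixed number of results never touches the remaining pages.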

by Patrick DeVivo

AskGit is a tool we’ve been building that makes it possible to run SQL queries against data in git repositories. Recently, we added support for a stats table, which tracks the lines of code added to and removed from each file, for every commit (in the current history). Think git log --stat, but as a table that can be queried with SQL.

AskGit commit stats table

We can use this table to find areas of “code churn” in a repository.
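One way to picture such a churn query is the sketch below, which uses Python's built-in sqlite3 module. The stats table layout here (commit_hash, file_path, additions, deletions) and the sample rows are assumptions for illustration and may not match AskGit's actual column names.

```python
import sqlite3

# Hypothetical stand-in for a per-commit, per-file stats table.
# Column names and rows are assumptions made up for this example.
stats_db = sqlite3.connect(":memory:")
stats_db.execute(
    "CREATE TABLE stats ("
    "commit_hash TEXT, file_path TEXT, additions INTEGER, deletions INTEGER)"
)
stats_db.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?)",
    [
        ("c1", "api/server.go", 120, 0),
        ("c2", "api/server.go", 30, 45),
        ("c2", "README.md", 5, 1),
        ("c3", "api/server.go", 10, 60),
        ("c3", "pkg/util.go", 40, 0),
    ],
)

CHURN_QUERY = """
SELECT
    file_path,
    count(DISTINCT commit_hash) AS commits,
    sum(additions + deletions)  AS lines_changed
FROM stats
GROUP BY file_path
ORDER BY lines_changed DESC
"""

churn = stats_db.execute(CHURN_QUERY).fetchall()
for file_path, commits, lines_changed in churn:
    print(file_path, commits, lines_changed)
```

Here churn is taken to mean total lines added plus lines removed per file; counting commits per file is a second, coarser signal that can be read from the same rows.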

Nicolas Carlo of Understand Legacy Code has a great article that advises focusing on hotspots in a codebase in order to proactively address…

by Patrick DeVivo

from — project management in your code with TODO comments

Have you ever come across a TODO comment in some code and had a chuckle? Maybe it looked like: // TODO fix this, with no context. Maybe it was added years ago by a developer who’s quit the project. Or better yet, maybe you added it yourself and haven’t given it a second thought since.

TODO comments are a low overhead way of saying “hmm, this isn’t perfect but I’m going to move on for now and hopefully make it better in the future.”

They come in a variety of forms: FIXME, HACK, OPTIMIZE, and they’re baked…
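As a minimal sketch of how comments like these could be picked out of source text, the Python snippet below scans lines with a regular expression. It is not how tickgit works: the pattern, the find_markers helper, and the sample source are assumptions for this example, and only // and # comment styles are handled.

```python
import re
from typing import List, Tuple

# Matches a // or # comment that starts with one of the marker words.
MARKER_PATTERN = re.compile(r"(?://|#)\s*(TODO|FIXME|HACK|OPTIMIZE)\b[:\s]*(.*)")


def find_markers(source: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, marker, comment_text) for each marker comment."""
    found = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = MARKER_PATTERN.search(line)
        if match:
            found.append((line_number, match.group(1), match.group(2).strip()))
    return found


# Sample text that deliberately mixes comment styles.
SAMPLE = """\
func main() {
    // TODO fix this
    run() // FIXME: handle the error
    # HACK works around a flaky test
}
"""

markers = find_markers(SAMPLE)
for line_number, marker, text in markers:
    print(line_number, marker, text)
```

A line-based regex like this will also match marker words inside string literals that happen to contain comment characters, so a real tool would more likely rely on a language-aware comment parser.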

by Patrick DeVivo

Photo by Jantine Doornbos on Unsplash

As I work more on tickgit 🎟️, a question that repeatedly arises is whether TODO comments are a useful indicator of the health of a codebase. Generally, I think not — at least on their own. This does have me thinking about the general question of measuring a codebase’s health, however.

How can we measure the “health” of a project via its GitHub page alone, without even looking at the content of the code? This is a typical question when looking to compare two projects or libraries that have similar functionality, in an effort to decide which…

About a week ago, we published an article looking at the (mostly forgotten) 2k+ TODO comments in the Kubernetes codebase. We chose to look at Kubernetes because of its high profile and “high scale” as a large, open-source project. What we found was interesting, but probably not all that surprising.

Big projects have lots of TODOs, and those TODOs are mostly forgotten. This is likely fairly intuitive to most software developers out there. We’ve all been guilty of the “chuck it over the fence” mentality of leaving TODO comments, and we’ve probably all encountered ones left by others. A discussion…

by Patrick DeVivo

Photo by Yancy Min on Unsplash

Kubernetes is a big project. Not only because it’s a big deal, but also in terms of its source code. At the time of writing, there are 86k+ commits, 2k+ contributors, 2k+ open issues, 1k+ open PRs, and 61k+ stars. This is all visible on the project’s GitHub page.

scc counts 4.3M+ lines of Go source code (5.2M+ total lines): 3M+ lines of “actual” code vs. 700k+ lines of comments, across 16k+ files in total. This includes the vendor/ directory.

We’ve been working on a project that surfaces TODO comments in a codebase to help developers do basic project management…

Augmentable Software

Building tools for software and data engineers
