AskGit recently added support for querying against data in the GitHub API. Repositories and pull requests can be listed by calling several table-valued functions in your SQL queries. More resource types from the GitHub API are planned and will be integrated soon!
Using github_pull_requests('REPO_OWNER', 'REPO_NAME') in the FROM clause of a query will list all pull requests for the specified repository. There are optimizations in place to avoid a full table scan when possible (i.e. a paginated API walk over every pull request object, even those the query won’t use). For instance:
AskGit is a tool we’ve been building that makes it possible to run SQL queries against data in git repositories. Recently, we added support for a stats table, which tracks the lines of code added to and removed from each file, for every commit in the current history. Think git log --stat, but as a table that can be queried with SQL.
We can use this table to find areas of “code churn” in a repository.
Nicolas Carlo of Understand Legacy Code has a great article that advises focusing on hotspots in a codebase in order to proactively address areas of technical debt. In particular, he looks at comparing code complexity with code churn to find which areas (files) are worth refactoring or re-examining. …
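As a rough illustration of the churn idea, here is a minimal sketch that runs a churn query against an in-memory SQLite table standing in for the stats table. The column names (commit_id, file, additions, deletions) and the sample rows are assumptions made for this example, not AskGit's confirmed schema, and churn_by_file is a hypothetical helper.

```python
import sqlite3

# Hypothetical stand-in for the stats table; the column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (commit_id TEXT, file TEXT, additions INTEGER, deletions INTEGER)"
)
# Made-up sample rows: one row per (commit, file) pair.
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?)",
    [
        ("c1", "main.go", 120, 0),
        ("c2", "main.go", 30, 45),
        ("c2", "README.md", 5, 1),
        ("c3", "main.go", 10, 60),
        ("c3", "util.go", 40, 0),
    ],
)


def churn_by_file(conn):
    """Return (file, commits, lines_changed) rows, most-churned first."""
    return conn.execute(
        """
        SELECT file,
               COUNT(DISTINCT commit_id) AS commits,
               SUM(additions + deletions) AS lines_changed
        FROM stats
        GROUP BY file
        ORDER BY lines_changed DESC, file ASC
        """
    ).fetchall()


for row in churn_by_file(conn):
    print(row)
```

Files that are touched by many commits and accumulate many changed lines float to the top, which makes them candidates for a closer look alongside a complexity measure.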
Have you ever come across a TODO comment in some code and had a chuckle? Maybe it looked like: // TODO fix this, with no context. Maybe it was added years ago by a developer who’s quit the project. Or better yet, maybe you added it yourself and haven’t given it a second thought since.
TODO comments are a low overhead way of saying “hmm, this isn’t perfect but I’m going to move on for now and hopefully make it better in the future.”
They come in a variety of forms: FIXME, HACK, OPTIMIZE, and they’re baked into the programming habits of enough developers that many IDEs come preloaded with support for finding and displaying them. …
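To make the idea of finding these markers concrete, here is a minimal sketch of a line-by-line scan. The regular expression, the find_markers helper, and the sample source are illustrative assumptions; this is not how tickgit or any particular IDE does it, and it only recognizes // and # comment styles.

```python
import re

# Markers named in the text; the pattern itself is an illustrative assumption.
MARKER = re.compile(r"(?://|#)\s*(TODO|FIXME|HACK|OPTIMIZE)\b[:\s]*(.*)")


def find_markers(source):
    """Return (line_number, marker, text) for each marker comment in source."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            found.append((number, match.group(1), match.group(2).strip()))
    return found


# Made-up sample input for demonstration.
sample = """\
func main() {
    // TODO fix this
    run() // FIXME: handle the error
}
"""

for hit in find_markers(sample):
    print(hit)
```

A scan like this says nothing about who left a comment or when, which is part of why such comments are so easy to forget.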
As I work more on tickgit 🎟️, a question that repeatedly arises is whether TODO comments are a useful indicator of the health of a codebase. Generally, I think not — at least on their own. This does have me thinking about the general question of measuring a codebase’s health, however.
How can we measure the “health” of a project via its GitHub page alone, without even looking at the content of the code? This is a typical question when looking to compare two projects or libraries that have similar functionality, in an effort to decide which one is “better” to use. …
About a week ago, we published an article looking at the (mostly forgotten) 2k+ TODO comments in the Kubernetes codebase. We chose to look at Kubernetes because of its high profile and “high scale” as a large, open-source project. What we found was interesting, but probably not all that surprising.
Big projects have lots of TODOs, and those TODOs are mostly forgotten. This is likely fairly intuitive to most software developers out there. We’ve all been guilty of the “chuck it over the fence” mentality of leaving TODO comments, and we’ve probably all encountered ones left by others. A discussion on Slashdot shows a decent diversity of opinion among developers regarding how they should be used, and what they represent. Are they tech debt? A normal part of software development? Basically a ticket? Less important than a ticket? A bad habit? …
by Patrick DeVivo
Kubernetes is a big project. Not only because it’s a big deal, but also in terms of its source code. At the time of writing, there are 86k+ commits, 2k+ contributors, 2k+ open issues, 1k+ open PRs, and 61k+ stars. All of this is visible on the project’s GitHub page.
scc counts 4.3M+ lines of Go source code (5.2M+ total lines): 3M+ lines of “actual” code vs. 700k+ lines of comments, across 16k+ files in total. This includes the