mirror of https://github.com/tiennm99/ghstats.git (synced 2026-05-14 10:58:25 +00:00)

chore: rename module + references to tiennm99/ghstats-cards

Matches the Marketplace name; repo is being renamed in lockstep.

- go.mod module path: github.com/tiennm99/ghstats → github.com/tiennm99/ghstats-cards
- Import paths across every .go file updated.
- README badges, install snippets, and the 'go install' line point to the new URL/path.
- docs/deployment-guide.md workflow template, Docker image path, and release edit URL updated.

Breaking for consumers pinned to the old URL: they need to swap tiennm99/ghstats → tiennm99/ghstats-cards in workflows and switch Docker pulls to ghcr.io/tiennm99/ghstats-cards. GitHub's HTTP redirect covers git clones, but GHCR does NOT redirect, so users must update image URIs manually.
@@ -3,14 +3,14 @@
 > Generate SVG cards summarizing a GitHub user's profile — written in Go.

 [](https://github.com/marketplace/actions/ghstats-cards)
-[](https://github.com/tiennm99/ghstats/releases/latest)
-[](./LICENSE)
+[](https://github.com/tiennm99/ghstats-cards/releases/latest)
+[](./LICENSE)

 `ghstats` is a single-binary CLI (and a GitHub Action wrapping it) that fetches
 data for a GitHub user and writes a themed set of SVGs you can embed in your
 profile README.

-Marketplace listing: **[ghstats-cards](https://github.com/marketplace/actions/ghstats-cards)** · Source: [`tiennm99/ghstats`](https://github.com/tiennm99/ghstats)
+Marketplace listing: **[ghstats-cards](https://github.com/marketplace/actions/ghstats-cards)** · Source: [`tiennm99/ghstats-cards`](https://github.com/tiennm99/ghstats-cards)

 Cards rendered:

@@ -49,7 +49,7 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v5
-- uses: tiennm99/ghstats@v1
+- uses: tiennm99/ghstats-cards@v1
 with:
 user: ${{ github.repository_owner }}
 token: ${{ secrets.GHSTATS_TOKEN }} # classic PAT with read:user + repo
@@ -96,13 +96,13 @@ Then embed the cards in your `README.md`:
 ## Use as a CLI

 ```sh
-go install github.com/tiennm99/ghstats@latest
+go install github.com/tiennm99/ghstats-cards@latest
 ```

 Or build from source:

 ```sh
-git clone https://github.com/tiennm99/ghstats
+git clone https://github.com/tiennm99/ghstats-cards
 cd ghstats
 go build -o ghstats .
 ```

@@ -24,7 +24,7 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v5
-- uses: tiennm99/ghstats@v1
+- uses: tiennm99/ghstats-cards@v1
 with:
 user: ${{ github.repository_owner }}
 token: ${{ secrets.GHSTATS_TOKEN }}
@@ -77,7 +77,7 @@ Install:

 ```sh
 # Linux x86_64 example
-curl -L https://github.com/tiennm99/ghstats/releases/latest/download/ghstats_linux_amd64.tar.gz \
+curl -L https://github.com/tiennm99/ghstats-cards/releases/latest/download/ghstats_linux_amd64.tar.gz \
 | tar xz
 ./ghstats -user YOUR_USERNAME
 ```
@@ -85,21 +85,21 @@ curl -L https://github.com/tiennm99/ghstats/releases/latest/download/ghstats_lin
 ## 3. go install

 ```sh
-go install github.com/tiennm99/ghstats@latest
+go install github.com/tiennm99/ghstats-cards@latest
 ```

 Requires Go 1.26+. Puts the binary in `$(go env GOPATH)/bin`.

 ## Docker image

-Published to `ghcr.io/tiennm99/ghstats:<tag>` on each `v*` release via `.github/workflows/release.yml` (buildx, multi-tag: exact version, major.minor, major, latest).
+Published to `ghcr.io/tiennm99/ghstats-cards:<tag>` on each `v*` release via `.github/workflows/release.yml` (buildx, multi-tag: exact version, major.minor, major, latest).

 The Action itself uses a runner-built image by default (`image: Dockerfile` in `action.yml`). To switch to the pre-built image for faster cold starts, edit `action.yml`:

 ```yaml
 runs:
 using: docker
-image: docker://ghcr.io/tiennm99/ghstats:v1
+image: docker://ghcr.io/tiennm99/ghstats-cards:v1
 ```

 ## Release process
@@ -113,7 +113,7 @@ runs:
 5. **Marketplace publishing (one-time per repo):** GitHub only exposes the
 "Publish this Action to the GitHub Marketplace" toggle on the Release
 web UI — there is no CLI flag. Open the newly created release at
-`https://github.com/tiennm99/ghstats/releases/tag/vX.Y.Z/edit`, tick the
+`https://github.com/tiennm99/ghstats-cards/releases/tag/vX.Y.Z/edit`, tick the
 marketplace checkbox, accept the terms, and re-publish. Subsequent
 releases inherit marketplace visibility automatically.

@@ -1,3 +1,3 @@
-module github.com/tiennm99/ghstats
+module github.com/tiennm99/ghstats-cards

 go 1.26

@@ -6,8 +6,8 @@ import (
 "os"
 "path/filepath"

-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 // Card renders one SVG for a Profile under the given theme.

@@ -6,8 +6,8 @@ import (
 "strings"
 "testing"

-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 func TestRenderAll(t *testing.T) {

@@ -5,8 +5,8 @@ import (
 "strings"
 "time"

-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 type contributionsCard struct{}

@@ -5,8 +5,8 @@ import (
 "math"
 "strings"

-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 // renderDonutCard draws a donut chart with a left-side legend. Shared by the

@@ -1,8 +1,8 @@
 package card

 import (
-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 type mostCommitLanguageCard struct{}

@@ -1,8 +1,8 @@
 package card

 import (
-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 type mostCommitLanguageAllTimeCard struct{}

@@ -4,8 +4,8 @@ import (
 "fmt"
 "strings"

-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 type productiveCard struct{}

@@ -5,8 +5,8 @@ import (
 "strings"
 "time"

-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 type profileCard struct{}

@@ -1,8 +1,8 @@
 package card

 import (
-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 type reposPerLanguageCard struct{}

@@ -4,8 +4,8 @@ import (
 "fmt"
 "strings"

-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 type statsCard struct{}

@@ -11,9 +11,9 @@ import (
 "syscall"
 "time"

-"github.com/tiennm99/ghstats/internal/card"
-"github.com/tiennm99/ghstats/internal/github"
-"github.com/tiennm99/ghstats/internal/theme"
+"github.com/tiennm99/ghstats-cards/internal/card"
+"github.com/tiennm99/ghstats-cards/internal/github"
+"github.com/tiennm99/ghstats-cards/internal/theme"
 )

 func main() {

@@ -0,0 +1,94 @@
# Why Your Most-Commit-Language (All Time) Looks Like This

Reconstructed from live GraphQL data for user `tiennm99` on 2026-04-18.

## The card says

| Rank | Language | % |
|---|---|---|
| 1 | JavaScript | 24.96 |
| 2 | Python | 22.68 |
| 3 | C# | 19.54 |
| 4 | Go | 12.65 |
| 5 | Svelte | 11.40 |
| 6 | Other | 8.77 |

## What the algorithm actually sees

Only your **top 10 most-starred, non-fork repos that you own** get probed, up to **500 commits each**. Everything else is invisible to this card.
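
The selection rule (owned, non-fork, ordered by stars, capped at 10) can be sketched client-side as follows. This is an illustration under assumptions: `Repo`, `topRepos`, and the sample repos are hypothetical, and the real tool may let the GraphQL query do the ordering instead.

```go
package main

import (
	"fmt"
	"sort"
)

// Repo is a hypothetical, trimmed-down view of one repository from the
// GraphQL response; only the fields the selection needs.
type Repo struct {
	Name       string
	Stargazers int
	IsFork     bool
}

// topRepos drops forks, orders the rest by stargazer count (descending,
// name as a tiebreaker) and keeps at most n.
func topRepos(repos []Repo, n int) []Repo {
	var out []Repo
	for _, r := range repos {
		if !r.IsFork {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Stargazers != out[j].Stargazers {
			return out[i].Stargazers > out[j].Stargazers
		}
		return out[i].Name < out[j].Name
	})
	if len(out) > n {
		out = out[:n]
	}
	return out
}

func main() {
	// Made-up repos: the busy zero-star repo never makes the cut.
	repos := []Repo{
		{Name: "busy-but-unstarred", Stargazers: 0},
		{Name: "side-project", Stargazers: 5},
		{Name: "forked-lib", Stargazers: 50, IsFork: true},
		{Name: "popular-tool", Stargazers: 7},
	}
	for _, r := range topRepos(repos, 2) {
		fmt.Println(r.Name) // popular-tool, then side-project
	}
}
```

However many commits `busy-but-unstarred` holds, it is dropped once the cap is reached, which is exactly the top-10 cap problem.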

**Your top 10, your commits in them, and their linguist byte split:**

| Repo | Your commits | Primary | Linguist bytes (non-prose) |
|---|---:|---|---|
| time-mocker | 34 | C# | C# 100% |
| adventofcode | 15 | Go | Go 100% |
| export-telegram-group-members | 9 | Python | Python 100% |
| lottery-generator | 11 | Java | Java 100% |
| ghstats | 2 | Go | Go 100% |
| rplace | **44** | JavaScript | JS 56% · Svelte 43% · HTML/CSS ~0.4% |
| thptqg2016 | 9 | JavaScript | JS 72% · CSS 27% |
| go-util | 5 | Go | Go 100% |
| try-bmad | 35 | Python | **Python 87% · JS 7%** · HTML/Svelte/Groovy/CSS ~5% |
| try-claudekit | 10 | JavaScript | JS 96% |

Total commits counted: **174**.

## Per-language derivation

Each commit contributes 1 "vote", split across its repo's languages by linguist byte share. Summed:

| Language | Vote from… | Total |
|---|---|---:|
| **JavaScript** | rplace 24.71 · try-claudekit 9.64 · thptqg2016 6.50 · try-bmad 2.57 | **43.42** |
| **Python** | try-bmad 30.47 · export-tg 9.00 | **39.47** |
| **C#** | time-mocker 34.00 | **34.00** |
| **Go** | adventofcode 15 · go-util 5 · ghstats 2 | **22.00** |
| **Svelte** | rplace 19.14 · try-bmad 0.69 | **19.83** |
| Java | lottery-generator 11.00 | 11.00 |
| CSS | thptqg2016 2.40 · small others | 2.77 |
| HTML | spread across rplace/thptqg2016/try-bmad/try-claudekit | 1.13 |
| Groovy | try-bmad 0.37 | 0.37 |

Dividing by 174 reproduces the card to within rounding; the per-repo votes above are themselves rounded to two decimals, so e.g. 43.42 / 174 ≈ 24.95% against the card's 24.96%. "Other" is Java + CSS + HTML + Groovy = 15.27 votes ≈ 8.78% (card: 8.77%).
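
The vote arithmetic above fits in a few lines. A sketch under assumptions: `RepoSample` and `byteWeightedVotes` are hypothetical names (not the project's `attributeCommit`), and the input in `main` is made up rather than `tiennm99`'s data. Note that nothing in it looks at which files a commit touched.

```go
package main

import (
	"fmt"
	"sort"
)

// RepoSample is hypothetical input: the user's commit count in one repo and
// linguist's (non-prose) byte count per language for that repo.
type RepoSample struct {
	Commits int
	Bytes   map[string]int64
}

// byteWeightedVotes gives each commit one vote and splits it across the
// repo's languages by byte share. Repos with no language bytes are skipped
// and their commits not counted (an assumption, consistent with prose-only
// repos vanishing from the card).
func byteWeightedVotes(repos []RepoSample) (votes map[string]float64, commits int) {
	votes = map[string]float64{}
	for _, r := range repos {
		var total int64
		for _, b := range r.Bytes {
			total += b
		}
		if total == 0 {
			continue
		}
		commits += r.Commits
		for lang, b := range r.Bytes {
			votes[lang] += float64(r.Commits) * float64(b) / float64(total)
		}
	}
	return votes, commits
}

func main() {
	// Made-up input: a single-language repo and a 75/25 JS/CSS repo.
	votes, n := byteWeightedVotes([]RepoSample{
		{Commits: 34, Bytes: map[string]int64{"C#": 1000}},
		{Commits: 10, Bytes: map[string]int64{"JavaScript": 750, "CSS": 250}},
	})
	langs := make([]string, 0, len(votes))
	for l := range votes {
		langs = append(langs, l)
	}
	sort.Slice(langs, func(i, j int) bool { return votes[langs[i]] > votes[langs[j]] })
	for _, l := range langs {
		fmt.Printf("%-10s %6.2f votes %6.2f%%\n", l, votes[l], 100*votes[l]/float64(n))
	}
}
```

For this input the result is C# 34, JavaScript 7.5 and CSS 2.5 votes out of 44 commits: each of the 10 commits in the mixed repo credits CSS with a quarter vote, whether or not it touched a stylesheet.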

## Why these rank where they do

| Language | Real story |
|---|---|
| **JavaScript** dominates | Three throwaway/experiment repos (rplace, try-claudekit, thptqg2016) plus a Python project that happens to ship a JS frontend (try-bmad). The 44 rplace commits alone contribute 24.7 of the 43 JS "votes" — a single weekend project is driving the #1 slot. |
| **Python** #2 | 77% of it (30.47 of 39.47) is **try-bmad**, a 35-commit scaffolding project. The 87% Python byte share there means every commit — even a README edit — gets 87% credit to Python. |
| **C#** #3 | Every single C# vote comes from **time-mocker** (34 commits, 100% C#). This is the signal with the cleanest attribution — if you committed to that repo, it was almost certainly touching C# files. |
| **Go** surprisingly low | Only 22 votes from three repos. `ghstats` itself has **2 commits** in your window because most of this session's work hasn't been pushed yet; and `adventofcode`/`go-util` are one-off exercises. Your Go day-job probably lives in private repos we can't see. |
| **Svelte** appears out of nowhere | Linguist sees ~80KB of Svelte in **rplace** next to ~103KB of JS. Each of rplace's 44 commits credits 43% to Svelte — even commits that only touched `.js` files. That's the cost of byte-weighted attribution. |

## Why it feels wrong

Four structural reasons:

1. **Top-10 cap.** Sorted by stargazers. Your actual Go/Python dev work probably lives in repos with 0 stars but many commits, so they don't make the cut.
2. **Private repos invisible.** The query uses `ownerAffiliations: OWNER` and is limited by the token's scope, so your VNG Corp code does not appear.
3. **Forks excluded.** `isFork: false`. If you hack on forks of upstream projects, those contributions vanish.
4. **Byte-weighted ≠ file-touched.** rplace credits 43% to Svelte on every commit regardless of which file you edited. A commit that fixes one line in `main.js` still counts Svelte bytes.

## What would actually fix it

Only per-commit file classification fixes this (already scoped in earlier research: `REST /commits/{sha}` + go-enry). That path would:

- Count each commit by the **files actually touched** (Svelte only if you edited `*.svelte`).
- Support opt-in coverage of all repos, not just the top 10.
- Recover Markdown-heavy repos that currently vanish.

Cost: ~1000 REST calls per run vs. the current ~15 GraphQL calls, so gating it behind a toggleable `-accurate-languages` flag still makes sense.
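
The file-touched alternative only changes what each commit's vote is split by. A minimal sketch, assuming a tiny hand-written extension table as a stand-in for go-enry and an even split across the languages a commit touched (weighting by changed lines would be an equally valid choice); `fileTouchedVotes` and `extLang` are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// extLang is a tiny, hand-written stand-in for a real classifier such as
// go-enry; only a few extensions are listed for illustration.
var extLang = map[string]string{
	".js":     "JavaScript",
	".svelte": "Svelte",
	".go":     "Go",
	".py":     "Python",
	".md":     "Markdown",
}

// fileTouchedVotes splits one commit's single vote evenly across the
// languages of the files it actually changed. Files with unknown
// extensions are ignored; a commit touching no known file casts no vote.
func fileTouchedVotes(files []string) map[string]float64 {
	seen := map[string]bool{}
	for _, f := range files {
		if lang, ok := extLang[strings.ToLower(path.Ext(f))]; ok {
			seen[lang] = true
		}
	}
	votes := map[string]float64{}
	for lang := range seen {
		votes[lang] = 1 / float64(len(seen))
	}
	return votes
}

func main() {
	// A commit that only edits main.js credits nothing to Svelte.
	fmt.Println(fileTouchedVotes([]string{"src/main.js"})) // map[JavaScript:1]
	fmt.Println(fileTouchedVotes([]string{"src/App.svelte", "src/main.js", "README"})) // map[JavaScript:0.5 Svelte:0.5]
}
```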

## Quick wins you could take without the rewrite

- **Raise `-top-repos`** to 30–50 so less-starred but heavily-committed repos enter the sample.
- **Raise `-commits-per-repo`** past 500 if you care about lifetime depth.
- **Add `-exclude-repo rplace,try-bmad,try-claudekit`** (not implemented yet) to drop known experiment repos the way github-profile-summary-cards does.
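
Since `-exclude-repo` is not implemented, the following is only a sketch of how such a flag could be parsed with the standard `flag` package. The flag name comes from the bullet above; `parseExcludes` and the case-insensitive matching are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseExcludes turns a comma-separated -exclude-repo value into a set,
// trimming spaces, lower-casing names and ignoring empty entries.
func parseExcludes(raw string) map[string]bool {
	set := map[string]bool{}
	for _, name := range strings.Split(raw, ",") {
		if name = strings.TrimSpace(name); name != "" {
			set[strings.ToLower(name)] = true
		}
	}
	return set
}

func main() {
	fs := flag.NewFlagSet("ghstats", flag.ExitOnError)
	exclude := fs.String("exclude-repo", "", "comma-separated repo names to skip (hypothetical flag)")
	fs.Parse([]string{"-exclude-repo", "rplace, try-bmad,try-claudekit"}) // ExitOnError: exits on bad input

	skip := parseExcludes(*exclude)
	for _, repo := range []string{"rplace", "time-mocker", "try-bmad"} {
		if skip[strings.ToLower(repo)] {
			continue
		}
		fmt.Println("keep:", repo) // keep: time-mocker
	}
}
```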

## Unresolved questions

- Do you want the `-exclude-repo` flag landed now as a short-term fix?
- Should we expose `ownerAffiliations` so contributed-to (non-owned) repos can be included?
- Is the 500-commit cap actually binding for any of your repos, or is 174 total just "everything you've pushed"?

@@ -0,0 +1,226 @@
# Full-Project Code Review

Scope: all Go source, `action.yml`, `entrypoint.sh`, `Dockerfile`, workflows. ~2300 LOC. Adversarial pass included.

Verification: the code compiles and `go vet ./... && go test ./...` passes cleanly. A real-data render against `tiennm99` (with a token) produced valid SVGs for all 9 cards. One bug was reproduced with a standalone probe.

## Verdict

Code quality is high for the scope. One real rendering bug, several correctness and hygiene issues, and a meaningful test-coverage gap. Nothing blocks merge; prioritize the **Important** list before the next tag.

## Severity counts

| Critical | Important | Nice-to-have |
|---:|---:|---:|
| 0 | 6 | 10 |

---

## Important

### I1 — Donut chart renders empty when there is only 1 slice ⚠️
**File**: `internal/card/donut_chart.go:60-79`

For a user with a single language at 100%, `angle = 2π`, so `start == end`: the SVG `A` (arc) command gets identical endpoints, which SVG treats as omitting the arc entirely. The donut shows nothing.

Reproduced with a standalone probe:
```
start=(380, 50)
end=(380, 50) # identical → empty arc
same=true
```

**Fix**: when there's exactly one slice (or when `angle >= 2π - ε`), render a full ring via two half-arcs, a `<circle>` with a stroke, or a punched-out clip path. Smallest change:
```go
if len(stats) == 1 {
	// Full ring: outer circle fill + inner background circle
	fmt.Fprintf(&b, `<circle cx=... fill="%s"/><circle cx=... fill="%s"/>`, slice.Color, t.Background)
	...
}
```

### I2 — `FetchContributionsAllTime` silently drops failed years
**File**: `internal/github/contributions_all_time.go:62-64`

```go
if resp.User == nil {
	continue
}
```
If GraphQL returns an error payload for a year (e.g. a permission issue, or a transient 5xx that somehow bypassed the HTTP error check), that year is skipped without any logging. All 8 years of a user's history could silently vanish, leaving cards that render "No data available" with no diagnostic.

**Fix**: log at warn level when `resp.User == nil` after a successful response; or return a wrapped error so `main.go` can surface a partial-data warning.
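
Both fixes can be combined: warn per skipped year, and hand the caller the list so `main.go` can print a partial-data warning. A sketch with a hypothetical `yearResponse` type standing in for the real per-year payload; the log wording is illustrative.

```go
package main

import (
	"fmt"
	"log"
)

// yearResponse is a hypothetical stand-in for the per-year GraphQL payload.
type yearResponse struct {
	User *struct{ Total int }
}

// sumYears adds up per-year totals but, unlike a bare `continue`, logs each
// year that came back without a user and reports which years were skipped.
func sumYears(resps map[int]yearResponse, years []int) (total int, skipped []int) {
	for _, y := range years {
		resp := resps[y]
		if resp.User == nil {
			log.Printf("warn: contributions for %d returned no user; skipping", y)
			skipped = append(skipped, y)
			continue
		}
		total += resp.User.Total
	}
	return total, skipped
}

func main() {
	years := []int{2024, 2025}
	resps := map[int]yearResponse{
		2024: {User: &struct{ Total int }{Total: 120}},
		2025: {}, // simulates an error payload with no user
	}
	total, skipped := sumYears(resps, years)
	fmt.Println(total, skipped) // 120 [2025]
	if len(skipped) > 0 {
		fmt.Printf("partial data: %d of %d years missing\n", len(skipped), len(years))
	}
}
```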

### I3 — Stale comment on `FetchOptions` contradicts live behavior
**File**: `internal/github/profile.go:61-62`

```go
// FetchOptions tunes which repos contribute to the profile's aggregates.
// All defaults are conservative (no forks, no private) so public-facing
// READMEs don't accidentally leak work-repo signal.
```
The CLI now defaults both options to **true** (commit `514195c`), but a zero-value `FetchOptions{}` still yields `false/false`. API callers using `FetchOptions{}` therefore get one behavior and CLI callers another, which is a subtle surprise.

**Fix**: either update the comment to reflect the new posture, or flip the zero-value semantics (e.g., `ExcludeForks bool` / `ExcludePrivate bool`) so the package-level default matches the CLI. The former is simpler.
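
If the second option is chosen, the zero value itself becomes the "include everything" default. A sketch of that shape; the `FetchOptions` below is a hypothetical rework (with an illustrative `includeRepo` method), not the current type.

```go
package main

import "fmt"

// FetchOptions is a hypothetical rework in which the zero value means
// "include everything", matching the CLI defaults.
type FetchOptions struct {
	ExcludeForks   bool // zero value: forks are included
	ExcludePrivate bool // zero value: private repos are included
}

// includeRepo reports whether a repo passes the options.
func (o FetchOptions) includeRepo(isFork, isPrivate bool) bool {
	if o.ExcludeForks && isFork {
		return false
	}
	if o.ExcludePrivate && isPrivate {
		return false
	}
	return true
}

func main() {
	var zero FetchOptions // what a library caller gets with FetchOptions{}
	fmt.Println(zero.includeRepo(true, true))                                // true
	fmt.Println(FetchOptions{ExcludePrivate: true}.includeRepo(false, true)) // false
}
```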

### I4 — `TestRenderAll` no longer verifies XML escape
**File**: `internal/card/card_test.go:17,63`

The test sets `Bio: "Test & <bio>"` and asserts the raw string doesn't appear in any rendered SVG. But `Bio` is **not rendered** anywhere after the profile-card redesign (it was removed along with the github-row), so the assertion is trivially true for every future change.

Additionally, `TestRenderAll` only checks that each file starts with `<svg`; it does not validate content. A card that renders an empty shell would pass.

**Fix**: set `Name: "Alice & <bob>"` (Name **is** rendered, in the title) and keep the assertion. Add a golden-file comparison per card for at least one theme to catch silent regressions.
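
A stronger form of the assertion checks both directions: the raw value must be absent *and* its escaped form must be present, so a card that drops the name entirely also fails. A sketch; `escapeXML` here is a local stand-in for the project's helper of the same name, and `renderTitle` is a hypothetical card fragment.

```go
package main

import (
	"fmt"
	"strings"
)

// xmlEscaper backs a local stand-in for the project's escapeXML helper.
var xmlEscaper = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;", `"`, "&quot;", "'", "&apos;")

func escapeXML(s string) string { return xmlEscaper.Replace(s) }

// renderTitle is a hypothetical card fragment that, like the profile card,
// puts the user's Name into the SVG title.
func renderTitle(name string) string {
	return `<svg xmlns="http://www.w3.org/2000/svg"><text>` + escapeXML(name) + `</text></svg>`
}

// checkEscaped is the assertion TestRenderAll could make: the raw value must
// be absent and its escaped form present.
func checkEscaped(svg, raw string) error {
	if strings.Contains(svg, raw) {
		return fmt.Errorf("raw %q leaked into SVG", raw)
	}
	if !strings.Contains(svg, escapeXML(raw)) {
		return fmt.Errorf("escaped %q not found", raw)
	}
	return nil
}

func main() {
	name := "Alice & <bob>"
	fmt.Println(checkEscaped(renderTitle(name), name)) // <nil>
}
```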

### I5 — Release workflow doesn't gate on tests
**File**: `.github/workflows/release.yml`

Pushing a tag such as `v1.2.3` runs the release pipeline immediately: there is no `go test ./...` step and no `needs: [ci]`. Tagging a broken `main` by accident ships broken binaries and a broken Docker image.

**Fix**: add a pre-release test job:
```yaml
test:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v6
    - uses: actions/setup-go@v6
      with:
        go-version: "1.26"
    - run: go vet ./... && go test ./...
docker:
  needs: [test]
  ...
binaries:
  needs: [test]
  ...
```

### I6 — `attributeCommit` recomputes `total` per-commit
**File**: `internal/github/productive.go:111-115` + `:86-92` (caller)

```go
func attributeCommit(repo RepoInfo, ...) {
	var total int64
	for _, l := range repo.Languages {
		total += l.Bytes
	}
	...
}
```
It is called once per commit. For 500 commits × 6–20 languages per repo, that's 3,000–10,000 redundant additions per repo. Not a runtime problem, but it's a correctness smell: if `repo.Languages` were ever mutated between calls, you'd get inconsistent totals.

**Fix**: precompute `total` once per repo and pass it in, or cache it on `RepoInfo`. Clean separation of loop-invariant work.
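
One way to hoist the loop-invariant work is to turn bytes into shares once per repo, so the per-commit step is a plain lookup. A sketch; `Language`, `RepoInfo`, and `languageShares` are hypothetical, trimmed-down versions, not the project's definitions.

```go
package main

import "fmt"

// Language and RepoInfo are hypothetical, trimmed-down versions of the
// project's types.
type Language struct {
	Name  string
	Bytes int64
}

type RepoInfo struct {
	Languages []Language
}

// languageShares computes each language's byte share once per repo, so the
// per-commit loop only adds precomputed values instead of re-summing bytes.
func languageShares(repo RepoInfo) map[string]float64 {
	var total int64
	for _, l := range repo.Languages {
		total += l.Bytes
	}
	shares := make(map[string]float64, len(repo.Languages))
	if total == 0 {
		return shares
	}
	for _, l := range repo.Languages {
		shares[l.Name] = float64(l.Bytes) / float64(total)
	}
	return shares
}

func main() {
	repo := RepoInfo{Languages: []Language{{"Go", 300}, {"Shell", 100}}}
	shares := languageShares(repo) // loop-invariant: computed once
	votes := map[string]float64{}
	for commit := 0; commit < 4; commit++ {
		for lang, s := range shares {
			votes[lang] += s
		}
	}
	fmt.Println(votes) // map[Go:3 Shell:1]
}
```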

---

## Nice-to-have

### N1 — `joinErrs` reinvents `strings.Join`
**File**: `internal/github/client.go:102-110`

The custom loop concatenates with `"; "`, which is identical to `strings.Join(ss, "; ")`. Replace it.
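
The replacement is a one-liner. A sketch that assumes `joinErrs` takes a `[]string`, as the `strings.Join(ss, "; ")` comparison implies; the sample messages are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// joinErrs collapses error messages into one line; strings.Join handles the
// separator bookkeeping the hand-written loop did.
func joinErrs(ss []string) string {
	return strings.Join(ss, "; ")
}

func main() {
	fmt.Println(joinErrs([]string{"Could not resolve to a User", "rate limited"}))
	// Could not resolve to a User; rate limited
}
```

If callers ever need to inspect the individual errors with `errors.Is`, `errors.Join` (Go 1.20+) is an alternative, though it separates messages with newlines rather than `"; "`.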

### N2 — No total timeout / `context.Context`
**File**: `internal/github/client.go:26`, all fetchers

`http.Client.Timeout = 30s` applies per request, so a fetch of 50 pages can take 50 × 30 s = 25 minutes in the worst case. There is no way to set an overall budget or cancel on a signal. Add `ctx context.Context` to `Client.query` and the fetcher methods, and check `<-ctx.Done()` in pagination loops.

### N3 — `truncate` may split UTF-8 mid-sequence
**File**: `internal/github/client.go:95-100`

`string(b[:n])` at an arbitrary byte boundary can leave a half-character. Purely cosmetic in error messages; fine to ignore.
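
If it's ever worth fixing, backing the cut off to a rune boundary takes a few lines with `unicode/utf8`. A sketch; the signature `truncate(b []byte, n int) string` is assumed from the `string(b[:n])` expression, not copied from `client.go`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncate cuts b to at most n bytes without splitting a multi-byte UTF-8
// sequence: if the byte right after the cut is a continuation byte, it backs
// off to the start of the rune that straddles the cut.
func truncate(b []byte, n int) string {
	if len(b) <= n {
		return string(b)
	}
	cut := n
	for cut > 0 && !utf8.RuneStart(b[cut]) {
		cut--
	}
	return string(b[:cut])
}

func main() {
	s := []byte("café au lait") // "é" is two bytes in UTF-8
	fmt.Printf("%q\n", truncate(s, 4))           // "caf"
	fmt.Println(utf8.ValidString(string(s[:4]))) // false: the naive cut splits "é"
}
```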

### N4 — Docker base images pinned to version tags, not digests
**File**: `Dockerfile:1,8`

`golang:1.26-alpine` and `alpine:3.21` are mutable tags, so a compromised Alpine push would affect builds. Pin to `@sha256:…` for hermetic builds. Acceptable risk for an OSS tool.

### N5 — Third-party GHA actions pinned to major
**File**: `.github/workflows/release.yml`

`docker/build-push-action@v6`, `softprops/action-gh-release@v2`, and `actions/checkout@v6` are mutable tags. Pin to SHA for supply-chain safety. Same caveat as N4.

### N6 — No rate-limit header inspection
**File**: `internal/github/client.go:63-73`

`x-ratelimit-remaining` / `x-ratelimit-reset` are ignored. When the remaining budget hits zero, the next call returns 403 and we error out. Cheap improvement: parse the headers and sleep until reset.

### N7 — `xAxisLabelVisible` can drop the penultimate label unexpectedly
**File**: `internal/card/contributions.go:158-160`

```go
if n-1-i < stride/2 {
	return false
}
```
For some `(n, stride)` pairs this drops a label that would have been `stride` apart from its neighbor. Result: 5 labels where you expected 6. A minor visual quirk, not a bug.

### N8 — `Profile.TotalContributions` is misnamed
**File**: `internal/github/model.go` (via `profile.go:118`)

It is set from `ContributionCalendar.TotalContributions + RestrictedContributionsCount`, which is a **one-year** total, not lifetime, although the field name suggests lifetime. Rename it to `TotalContributionsLastYear` or compute it from the AllTime loop.

### N9 — Stats card's "Contributed to (non-fork)" label doesn't match the query
**File**: `internal/card/stats.go` + `profile.go:113`

`TotalContributedTo = u.RepositoriesContributedTo.TotalCount` queries with `first: 1`, so it only returns the **count**; that part is fine. But the label says "(non-fork)" while the query doesn't actually filter by fork. Drop the "(non-fork)" qualifier, or add a real fork filter. Note that `contributionTypes` selects kinds of contribution (commits, issues, pull requests, reviews, repositories), not fork status, so excluding forks would require fetching the repositories and checking `isFork`.

### N10 — `output/` in .gitignore exception is narrow but fragile
**File**: `.gitignore`

```
output/*
!output/dracula/
```
If someone runs with `-out=output/` and a theme named `dracula`, their local changes overlay the committed sample. Usually fine. If someone runs `-themes all`, `output/*` blocks every theme dir except dracula, which is working as intended. No action needed; noted for future refactors.

---

## Adversarial pass — things that don't break

Tried and confirmed safe:

| Attack | Result |
| --- | --- |
| XML injection via `Bio`, `Name`, `Company`, `Location`, `Website`, language names | All flow through `escapeXML` — safe |
| Theme ID path traversal (`-themes ../../etc`) | `theme.Lookup` rejects unknown IDs before `filepath.Join` — safe |
| GraphQL injection via `$login`, `$owner`, `$repo` | Variables passed separately from query string — safe |
| Shell injection in `entrypoint.sh` via user inputs | All variables double-quoted; no `eval` — safe |
| Token exposure in logs | `entrypoint.sh:31` echoes user/themes/out only; no token echo — safe |
| Token exposure in error messages | HTTP errors truncate body to 500 bytes; GitHub bodies don't echo tokens — safe |
| Integer overflow in `scaleFactor * bytes / total` | Worst realistic case ~10^14, well under int64 max (~9.2 × 10^18) — safe |
| Panic from `Productive[tl.Hour()]` | `Hour()` always returns 0-23 — safe |
| NaN from `angle := 2 * math.Pi * value / total` when total=0 | The `len(stats) > 0` check guarantees at least one entry, and `sortLangStats` shouldn't generate 0-valued entries, so total > 0 in practice. If all values were 0, though, `0/0` would produce NaN; a defensive check wouldn't hurt. |
| Resource exhaustion via huge user | `maxPages = 10` in FetchProfile (1000 repos cap); `maxRepositories: 100` per year in seed query. Bounded. |
| Race conditions | Single-goroutine — no races possible |
| Division by zero in `xAxisLabelVisible` stride calc | Early return for `n <= xLabelTarget` prevents stride=0 — safe |

---

## Testing gaps (summary)

| Missing test | Motivation |
| --- | --- |
| Single-slice donut | Catches bug I1 |
| `catmullRomLinePath([1 point])` / `[]` | Verifies early returns |
| Half-hour timezone (`Asia/Kolkata`) in `utcOffsetLabel` | Verifies `%+.2f` formatting |
| `sortLangStats` with ties and empty color map | Already partially tested |
| Empty `DailyContributions` → "No data available" path | Already partially tested indirectly |
| Golden SVG comparison per card for one theme | Catches regressions across refactors |

---

## Not issues

- No external Go deps. Good.
- `filepath.Join(outDir, t.ID)`: `t.ID` is sourced from the themes map keys and validated through `theme.Lookup`. Safe.
- GraphQL query strings are compile-time constants. No injection surface.
- Commit-time error paths (`git add`, `git commit`, `git push`) run only in the Action workspace and fail closed.

---

## Unresolved questions

- Should we land `-accurate-languages` (REST-per-commit + enry) as the primary fix for Markdown-blog attribution, or invest in `-deep` (partial bare clone) first? Roadmap has both.
- Is the single-slice donut case (I1) rare enough in practice to accept a compact fix, or is it worth refactoring to render all donuts with a full-ring base plus pie slices on top (more robust, slightly more code)?
- Release workflow: run tests in a matrix (cross-platform) or on Linux only before shipping? Linux-only is practical; a matrix catches platform bugs but triples CI minutes.

@@ -0,0 +1,185 @@
# Most Commit Language — Accurate Attribution Research

## Problem & constraints

GitHub's default language detection (`repo.primaryLanguage`) counts total bytes per repo language, not weighted by commit activity. Users with mixed-language repos (e.g., a blog with 3 JS files + 1000 Markdown files) get misattributed: because Markdown (prose) doesn't count toward the repo's language split, every commit, even one that only edits `.md` files, is credited to the few-byte language that remains (here JS) rather than to the language actually edited. This report evaluates methods to fix attribution by tracking which files are modified per commit.

**Constraints:** Action runner cost (storage/time), REST API rate limits (5K/hr), accuracy trade-offs, language-detection reliability without repo access.

---

## Prior art comparison table

| Project | Method | Clone? | Accuracy | Cost | Notes |
|---------|--------|--------|----------|------|-------|
| **anuraghazra/github-readme-stats (GRS)** | GraphQL `languages(first:10, orderBy:SIZE)` per repo | No | Byte-size only; **no commit weighting** | 1 GraphQL query/100 repos | Baseline: pure size-based, can't solve commit problem |
| **lowlighter/metrics (indepth)** | Clone repos → `git log --patch` per commit → linguist-js classify each file → accumulate by type | Yes | **Per-commit, per-line** | 15 min global analysis timeout (default); heavy | Ground truth; `categories=[programming,markup]` filter; handles `.gitattributes` |
| **lowlighter/metrics (default/recent)** | GraphQL byte-size (same as GRS) + recent event stream analysis | No | Byte-size + event heuristics | Lightweight | Not commit-weighted |
| **Proposed: REST per-commit + go-enry** | REST `/repos/{o}/{r}/commits/{sha}` for each commit → go-enry classify filenames/extensions | No | **Per-commit by filename**, no line counting | 1 REST call/commit; the 5K/hr limit caps this at ~5,000 commits/hr | Fast; lightweight; no clone; accuracy limited to extension-level |

---

## How github-readme-stats handles it

**GRS does NOT solve the commit-attribution problem.** It computes language stats as:

1. **GraphQL fetch:** `repositories(ownerAffiliations: OWNER, isFork: false, first: 100)` → for each repo:
   - `languages(first: 10, orderBy: {field: SIZE, direction: DESC})` → get top 10 languages by **bytes**
   - Accumulate `size` values across all repos; weight by `size_weight=1` and `count_weight=0` (default)
2. **Filters offered:** `exclude_repo` list only; no per-commit filtering, no commit-count weighting
3. **Documented limitation:** Users can use `exclude_repo` to hide problematic repos; no built-in commit-weighting

**Conclusion:** GRS is optimized for byte-size ranking (good for codebases), not commit activity (good for contributor profiles). No per-commit analysis.
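
To make the contrast concrete, here is a Go rendition of size/count weighting. It assumes, from our reading of GRS, that each language's score is `bytes^size_weight × repoCount^count_weight`; `rankLanguages` and the sample data are hypothetical. Whatever the weights, no term reflects commits.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// langTotal accumulates, per language, bytes across repos and the number
// of repos the language appears in.
type langTotal struct {
	Bytes float64
	Repos float64
}

// rankLanguages scores each language as bytes^sizeWeight * repos^countWeight
// and returns languages by descending score (name as a tiebreaker).
func rankLanguages(repos []map[string]int64, sizeWeight, countWeight float64) []string {
	totals := map[string]*langTotal{}
	for _, langs := range repos {
		for lang, b := range langs {
			t := totals[lang]
			if t == nil {
				t = &langTotal{}
				totals[lang] = t
			}
			t.Bytes += float64(b)
			t.Repos++
		}
	}
	score := map[string]float64{}
	names := make([]string, 0, len(totals))
	for lang, t := range totals {
		score[lang] = math.Pow(t.Bytes, sizeWeight) * math.Pow(t.Repos, countWeight)
		names = append(names, lang)
	}
	sort.Slice(names, func(i, j int) bool {
		if score[names[i]] != score[names[j]] {
			return score[names[i]] > score[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	repos := []map[string]int64{{"Go": 5000}, {"Go": 4000}, {"JavaScript": 20000}}
	fmt.Println(rankLanguages(repos, 1, 0)) // size only: [JavaScript Go]
	fmt.Println(rankLanguages(repos, 0, 1)) // repo count only: [Go JavaScript]
}
```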

---

## lowlighter/metrics — further details

### Default categories
Confirmed: `plugin_languages_categories` default is `[markup, programming]` (excludes `data`, `prose`). For indepth mode, users can override it to include `prose` (which includes Markdown). Markdown is classified as **TypeProse** by go-enry (type code 4).

### Per-file analysis in indepth mode
- Clones repo to temp directory
- Runs `git log --author=<user> --patch` to fetch each commit with diff
- Parses unified diff to extract file paths and line counts (added/deleted per file)
- Calls `linguist-js` (a Node.js port of linguist) to classify each file
- Accumulates `{bytes, lines}` per language per commit
- Filters by `categories` to exclude unwanted types
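
The diff-parsing step is the core of indepth mode. The sketch below is a simplified Go version, not metrics' code: it counts `+`/`-` lines per file from `git log --patch` text, relying on git indenting commit messages by four spaces so they can't be mistaken for diff lines. As a simplification, a removed line beginning with `--` or an added line beginning with `++` would be misread as a file header; `parsePatch` is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// lineCounts holds added/deleted lines for one file.
type lineCounts struct{ Added, Deleted int }

// parsePatch extracts per-file added/deleted line counts from unified-diff
// text. Files are keyed by the b/ path of each "diff --git" header; the
// ---/+++ file-name lines are skipped.
func parsePatch(patch string) map[string]*lineCounts {
	files := map[string]*lineCounts{}
	var cur *lineCounts
	sc := bufio.NewScanner(strings.NewReader(patch))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "diff --git "):
			// "diff --git a/x b/x": take the path after the last " b/".
			if i := strings.LastIndex(line, " b/"); i >= 0 {
				cur = &lineCounts{}
				files[line[i+3:]] = cur
			}
		case strings.HasPrefix(line, "+++"), strings.HasPrefix(line, "---"):
			// file-name header lines, not content
		case cur != nil && strings.HasPrefix(line, "+"):
			cur.Added++
		case cur != nil && strings.HasPrefix(line, "-"):
			cur.Deleted++
		}
	}
	return files
}

func main() {
	patch := `commit 1234abcd
Author: A <a@example.com>

    fix counter

diff --git a/src/main.js b/src/main.js
--- a/src/main.js
+++ b/src/main.js
@@ -1,2 +1,2 @@
-let n = 0
+let n = 1
 console.log(n)
`
	for name, c := range parsePatch(patch) {
		fmt.Println(name, c.Added, c.Deleted) // src/main.js 1 1
	}
}
```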

### Clone timeout & safety
- `plugin_languages_analysis_timeout`: 15 min global (default); 7.5 min per repo (default)
- **No REST-only fallback mode**: if the clone fails, that repo is skipped; no graceful degradation
- Symlinks, submodules, large binaries handled by linguist; `.gitattributes` **IS** respected (checked from cloned repo)
### De-duplication & fork handling
|
||||
- Indepth fetches only repos where user is OWNER (not forks, not contributed-to repos)
|
||||
- Counts only commits matching `--author` regex (authoring email list fetched from GPG keys)
|
||||
- Deduplicates by commit SHA within session

### REST-only mode does NOT exist

lowlighter/metrics has two modes: default (byte-size, no clone) and indepth (clone + patch analysis). It exposes no hybrid REST-per-commit approach.

---

## The go-enry + REST-per-commit approach: validation

### Module details

- **Module path:** `github.com/go-enry/go-enry/v2` (Apache-2.0 license)
- **Status:** Actively maintained; last push 2026-04-04; 603 stars; language data generated from github/linguist
- **Go version:** 1.14+
- **Pre-compiled data:** Yes. Language metadata (types, colors, patterns) is embedded in `data/*.go` files, auto-generated from github/linguist

### Language type mapping: confirmed for Markdown

```go
// Abridged excerpt of go-enry's generated data/type.go.
// Type codes: 0=Unknown, 1=Data, 2=Programming, 3=Markup, 4=Prose.
// Markdown is type 4 (Prose).
package data

var LanguagesType = map[string]int{
	"JSON":       1, // Data
	"JavaScript": 2, // Programming
	"Markdown":   4, // Prose
	"YAML":       1, // Data
}
```

**Implication:** If we use go-enry and filter to `types=[2, 3]` (Programming + Markup), Markdown is excluded. To include it, we must expand the filter to `types=[2, 3, 4]` or add Markdown to a custom allowlist.
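
Here is a minimal sketch of that type filter, assuming the numeric codes shown above. `langTypes` is a hypothetical, hand-abridged stand-in for go-enry's table, so the example runs without the dependency. Real code would look types up through go-enry (its `GetLanguageType` function, assuming the current v2 API).

```go
package main

// LangType mirrors go-enry's numeric type codes.
type LangType int

const (
	Unknown     LangType = iota // 0
	Data                        // 1
	Programming                 // 2
	Markup                      // 3
	Prose                       // 4
)

// langTypes is a hypothetical, abridged stand-in for go-enry's LanguagesType.
var langTypes = map[string]LangType{
	"HTML":       Markup,
	"JSON":       Data,
	"JavaScript": Programming,
	"Markdown":   Prose,
	"YAML":       Data,
}

// keepLanguage reports whether lang's type is in the allowed set. Languages
// missing from the table map to Unknown (the zero value) and are dropped
// unless Unknown itself is allowed.
func keepLanguage(lang string, allowed map[LangType]bool) bool {
	return allowed[langTypes[lang]]
}
```

Under this sketch, the default filter would be `{Programming, Markup}`. Opting in to Prose means adding one key, which keeps the Markdown question a configuration choice rather than a code change.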

### Extension-only vs content-based classification

go-enry has two classification modes:

1. **Extension-only** (`GetLanguageByExtension("foo.go")` returns `"Go", true`): fast; no file content needed. The second return value, `safe`, is false when the extension maps to more than one language.
2. **Content-based** (`GetLanguageByContent(filename, content)`): slower; disambiguates ambiguous extensions (e.g., `.txt`, or `.r` for R vs Rebol) using linguist's heuristics.

**The REST API provides** the filename and additions/deletions per file, but NO file content. So **we're limited to extension-only mode**, which we estimate at **~90% accuracy**. That mode fails on ambiguous extensions. Even `.md` is not strictly unambiguous: linguist also maps it to GCC Machine Description, so an extension-only lookup for a Markdown file comes back with `safe == false` and needs a tie-break that prefers Markdown.
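
An extension-only classifier with an explicit tie-break policy could look like the following. `candidatesByExt` is a hypothetical, hand-written table, not go-enry's data. In real code the candidate list would come from go-enry (assumed here to be `GetLanguagesByExtension`, which returns every candidate rather than just the first). The `preferred` map is an assumed project policy.

```go
package main

import (
	"path/filepath"
	"strings"
)

// candidatesByExt is a hypothetical, hand-written stand-in for go-enry's
// generated extension table (NOT the real data). In linguist, .md also maps
// to GCC Machine Description and .r also maps to Rebol.
var candidatesByExt = map[string][]string{
	".go": {"Go"},
	".md": {"GCC Machine Description", "Markdown"},
	".r":  {"R", "Rebol"},
}

// preferred is an assumed project policy for breaking extension ties.
var preferred = map[string]string{
	".md": "Markdown",
	".r":  "R",
}

// classifyByExtension returns a language and whether the extension mapped to
// exactly one candidate ("safe", mirroring go-enry's convention). Unknown
// extensions, and filename-based rules such as Makefile, return ("", false).
func classifyByExtension(filename string) (string, bool) {
	ext := strings.ToLower(filepath.Ext(filename))
	cands := candidatesByExt[ext]
	switch len(cands) {
	case 0:
		return "", false
	case 1:
		return cands[0], true
	}
	if lang, ok := preferred[ext]; ok {
		return lang, false
	}
	return cands[0], false
}
```

Keeping `safe` in the result lets later stages report how much of the total came from tie-broken guesses.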

### `.gitattributes` support: NO

- go-enry does NOT parse `.gitattributes` overrides (`linguist-vendored`, `linguist-generated`, `linguist-documentation`, `linguist-language`)
- Those files live in the repo; without cloning or an extra API call, we can't read them
- **Accuracy delta:** Most users don't use `.gitattributes` heavily; we estimate 5-10% false positives (generated/vendored code counted as real)
- **Mitigation:** Offer an optional `.gitattributes` fetch via `GET /repos/{o}/{r}/contents/.gitattributes` when the file exists (one extra REST call/repo)
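
The following sketches the optional override. It assumes the file body has already been fetched and decoded (the contents endpoint returns it base64-encoded). Pattern matching is a deliberately simplified subset of gitattributes rules: basename globs, root-relative globs, and trailing `/**`. All names are hypothetical.

```go
package main

import (
	"bufio"
	"path"
	"strings"
)

// excludingAttrs are the linguist overrides that drop a file from the stats.
var excludingAttrs = map[string]bool{
	"linguist-vendored":      true,
	"linguist-generated":     true,
	"linguist-documentation": true,
}

// attrRule is one (pattern, attribute, value) triple from .gitattributes.
type attrRule struct {
	pattern, attr string
	value         bool
}

// parseLinguistRules keeps only tokens that set or unset an excluding attribute.
func parseLinguistRules(src string) []attrRule {
	var rules []attrRule
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || strings.HasPrefix(fields[0], "#") {
			continue // blank line, comment, or pattern without attributes
		}
		for _, tok := range fields[1:] {
			name, value := tok, true // bare "attr" means set
			switch {
			case strings.HasPrefix(tok, "-"), strings.HasPrefix(tok, "!"):
				name, value = tok[1:], false // "-attr" unsets; "!attr" unspecifies
			case strings.HasSuffix(tok, "=false"):
				name, value = strings.TrimSuffix(tok, "=false"), false
			case strings.HasSuffix(tok, "=true"):
				name = strings.TrimSuffix(tok, "=true")
			}
			if excludingAttrs[name] {
				rules = append(rules, attrRule{fields[0], name, value})
			}
		}
	}
	return rules
}

// matches implements a simplified subset of gitattributes patterns:
// "dir/**" prefixes, slash-free globs on the basename, and root-relative globs.
func matches(pattern, file string) bool {
	pattern = strings.TrimPrefix(pattern, "/")
	if strings.HasSuffix(pattern, "/**") {
		return strings.HasPrefix(file, strings.TrimSuffix(pattern, "**"))
	}
	target := file
	if !strings.Contains(pattern, "/") {
		target = path.Base(file)
	}
	ok, _ := path.Match(pattern, target) // malformed patterns never match
	return ok
}

// isExcluded applies rules in file order; for each attribute the last match wins.
func isExcluded(rules []attrRule, file string) bool {
	state := map[string]bool{}
	for _, r := range rules {
		if matches(r.pattern, file) {
			state[r.attr] = r.value
		}
	}
	for _, set := range state {
		if set {
			return true
		}
	}
	return false
}
```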

### Binary/submodule/symlink handling

- go-enry's extension-based approach is safe: it won't misclassify `.exe`, `.so`, etc. (no match returns `OtherLanguage`, the empty string)
- Submodules: the filename includes the submodule path; it is classified by extension (safe)
- Symlinks: the REST API lists symlink targets as files; go-enry classifies them by the target's extension (reasonable)

### Known limitations

- **No content analysis:** classification sees only the filename, so nothing inside the file (comments, shebangs, syntax) can correct a wrong extension-based guess
- **Ambiguous extensions:** `.r` can be R or Rebol, and `.md` can be Markdown or GCC Machine Description. Content-based classification would resolve these, but we don't have content.
- **Custom language aliases:** go-enry uses fixed language names from linguist; user aliases (e.g., renaming "TypeScript" to "TS") would need client-side mapping
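
---

## Cost estimation: REST per-commit approach

**Assumptions:** Top 10 repos, 100 commits/repo (user-configurable), ~1K REST calls/run

**Rate limits:** 5K requests/hour for an authenticated user token (PAT); the `GITHUB_TOKEN` issued to Actions gets 1K requests/hour per repository. GitHub's secondary limits also cap throughput (documented as 900 points/minute for REST, with each GET costing 1 point). So 1K calls need roughly a minute or more of API time, not ~10 seconds.

**Safety:** Well within the hourly limit when using a PAT. With the default `GITHUB_TOKEN`, a 1K-call run consumes the whole hourly budget, so lower the per-repo cap or use a PAT.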
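
The arithmetic behind these numbers is shown below as two hypothetical Go helpers. The sketch assumes commit lists are fetched with `per_page=100` (the REST maximum) and that each GET costs one secondary-limit point. The 900 points/minute figure comes from GitHub's rate-limit documentation and may change.

```go
package main

// estimateCalls (hypothetical) counts REST requests for one run: commit-list
// pages per repo plus one GET /repos/{o}/{r}/commits/{sha} call per commit.
func estimateCalls(repos, commitsPerRepo, perPage int) int {
	listPages := (commitsPerRepo + perPage - 1) / perPage // ceiling division
	return repos * (listPages + commitsPerRepo)
}

// minMinutes is a lower bound on wall-clock time when every GET costs one
// point and throughput is capped at pointsPerMinute.
func minMinutes(calls, pointsPerMinute int) float64 {
	return float64(calls) / float64(pointsPerMinute)
}
```

For the default assumptions, `estimateCalls(10, 100, 100)` gives 1010 (10 list pages plus 1,000 detail calls). `minMinutes(1010, 900)` is about 1.1 minutes. That is roughly 5x under a 5,000/hour PAT budget, but slightly over a 1,000/hour `GITHUB_TOKEN` budget.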

**Breakdown per commit:**

- `GET /repos/{o}/{r}/commits/{sha}`: 1 call, returns `files[].{filename, additions, deletions}` (at most 300 files per page; larger commits paginate the file list)
- go-enry local classification: negligible (microseconds per file)
- No clone, no network I/O beyond REST
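
The next sketch decodes one `GET /repos/{o}/{r}/commits/{sha}` response and sums additions per language. The struct and function names are hypothetical, and `classify` stands in for go-enry. The sketch also ignores file-list pagination, which GitHub applies to commits touching more than 300 files.

```go
package main

import "encoding/json"

// commitFile keeps only the per-file fields this sketch needs from the
// GET /repos/{owner}/{repo}/commits/{ref} response; other fields are ignored.
type commitFile struct {
	Filename  string `json:"filename"`
	Status    string `json:"status"`
	Additions int    `json:"additions"`
	Deletions int    `json:"deletions"`
}

// commitDetail is a trimmed view of the commit response body.
type commitDetail struct {
	SHA   string       `json:"sha"`
	Files []commitFile `json:"files"`
}

// decodeCommit parses one response body.
func decodeCommit(body []byte) (commitDetail, error) {
	var c commitDetail
	err := json.Unmarshal(body, &c)
	return c, err
}

// additionsByLanguage sums added lines per language; classify is a
// stand-in for go-enry and returns "" for files to skip.
func additionsByLanguage(c commitDetail, classify func(string) string) map[string]int {
	out := make(map[string]int)
	for _, f := range c.Files {
		if lang := classify(f.Filename); lang != "" {
			out[lang] += f.Additions
		}
	}
	return out
}
```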
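
**Feasibility:** High. Commit fetches can run concurrently through a small worker pool if needed. However, GitHub's REST best-practices guide recommends serial requests to avoid secondary rate limits, so concurrency should stay low.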
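
A bounded worker pool for those fetches might look like this. `fetchAll` is a hypothetical helper, and `fetch` stands in for the real HTTP call. Results are returned in input order, so the later aggregation stays deterministic.

```go
package main

import "sync"

// fetchAll (hypothetical) calls fetch for every SHA with at most `workers`
// concurrent calls and returns results and errors in input order. fetch
// stands in for a GET /repos/{o}/{r}/commits/{sha} request.
func fetchAll[T any](shas []string, workers int, fetch func(string) (T, error)) ([]T, []error) {
	if workers < 1 {
		workers = 1
	}
	results := make([]T, len(shas))
	errs := make([]error, len(shas))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs { // each index is handled by exactly one worker
				results[i], errs[i] = fetch(shas[i])
			}
		}()
	}
	for i := range shas {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results, errs
}
```

With `workers = 1` this reduces to serial fetching, which matches GitHub's guidance. Any higher value (say 2-4) is an assumption to tune, not a documented safe limit.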

---

## Additional ideas

### Idea 1: Sampling + statistical estimation

Fetch 10-20 commits evenly distributed across repo history (e.g., every Nth commit) and extrapolate the total language distribution from their frequencies. **Trade-off:** 10x faster, ~70% accuracy (misses long-term shifts). **Verdict:** Useful for a low-precision preview card, not for serious stats.
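
The "every Nth commit" selection can be sketched as follows. `sampleIndices` is a hypothetical helper that returns positions into the repo's commit list. Extrapolation would then scale each language's sampled count by `n/k`.

```go
package main

// sampleIndices (hypothetical) picks k indices spread evenly over n commits,
// i.e. roughly every (n/k)-th commit of the history. If k >= n, every index
// is returned; non-positive inputs yield nil.
func sampleIndices(n, k int) []int {
	if n <= 0 || k <= 0 {
		return nil
	}
	if k > n {
		k = n
	}
	out := make([]int, k)
	for i := range out {
		out[i] = i * n / k // strictly increasing because n >= k
	}
	return out
}
```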

### Idea 2: GraphQL commit history with limited file info

GitHub GraphQL `repository.ref.target.history` supports commit queries, but each `Commit` exposes only aggregate data: author info, `additions`, `deletions`, and `changedFilesIfAvailable` (a count). Checked: no field lists the changed filenames. **Verdict:** Dead end; no advantage over REST.

### Idea 3: Docker + linguist Ruby gem

Run linguist's CLI (`github-linguist`) inside Docker during the Action. This requires cloning repos into Action storage (same cost as lowlighter/metrics). It is more accurate than extension-only go-enry because it sees file contents and applies linguist's heuristics and classifier. **Trade-off:** Heavy; large Docker image; slower startup. **Verdict:** Over-engineered for our use case; go-enry is sufficient.

### Idea 4: Fetch `.gitattributes` explicitly

One additional `GET /repos/{o}/{r}/contents/.gitattributes` call per repo. Parse it locally and override go-enry's classification for files tagged `linguist-vendored` or `linguist-generated`. **Cost:** 1 call/repo; minimal. **Accuracy gain:** ~5-10% fewer false positives. **Verdict:** Worth doing; low cost, reasonable gain.

### Idea 5: Combine REST per-commit with the languages endpoint for byte-size fallback

Query `GET /repos/{o}/{r}/languages` as secondary validation: if the REST commit analysis yields unexpected results (e.g., 99% JavaScript in a repo that is mostly Markdown), fall back to byte-size weighted stats. **Trade-off:** Adds complexity; not needed if go-enry is trusted. **Verdict:** Skip for v1; revisit after validation.
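
If this idea is revisited, one way to define "unexpected" is a divergence between the commit-based and byte-based distributions. Total variation distance is our choice here, not something the idea specifies. The threshold that would trigger the fallback (e.g., 0.5) is an untested assumption, and the function names are hypothetical.

```go
package main

import "math"

// shares converts raw per-language weights (lines or bytes) into fractions
// that sum to 1; an all-zero input yields an empty map.
func shares(w map[string]float64) map[string]float64 {
	total := 0.0
	for _, v := range w {
		total += v
	}
	out := make(map[string]float64, len(w))
	if total == 0 {
		return out
	}
	for lang, v := range w {
		out[lang] = v / total
	}
	return out
}

// totalVariation returns half the L1 distance between two language
// distributions: 0 for identical shares, 1 for completely disjoint ones.
func totalVariation(a, b map[string]float64) float64 {
	p, q := shares(a), shares(b)
	d := 0.0
	for lang, pv := range p {
		d += math.Abs(pv - q[lang])
	}
	for lang, qv := range q {
		if _, inP := p[lang]; !inP {
			d += qv
		}
	}
	return d / 2
}
```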

---

## Recommendation

**Proceed with: REST per-commit + go-enry, with optional `.gitattributes` override**

**Exact approach:**

1. For each user repo, list the user's commits (`GET /repos/{o}/{r}/commits?author={user}`, paginated; capped at a configurable limit, default 100/repo)
2. Per commit SHA: `GET /repos/{o}/{r}/commits/{sha}` → extract `files[].{filename, additions}`
3. Classify each filename via `go-enry/v2` extension-based detection
4. Accumulate commit count + lines added per language per repo
5. **Optional:** Fetch `.gitattributes` per repo; override classifications for files marked `linguist-generated` / `linguist-vendored`
6. Filter output by language type (exclude `Unknown` / `Data`; include `Programming`, `Markup`, optionally `Prose`)
7. Rank by commit count (primary) or lines added (secondary weighting option)
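
Steps 2-4, 6 and 7 can be sketched as one aggregation pass. The type names and `aggregate` are hypothetical. `classify` stands in for go-enry plus the type filter and returns "" for skipped files. Commits are de-duplicated by SHA (as lowlighter does), which also prevents a commit present in both a fork and its upstream from being counted twice. A commit counts once per language it touches, and ties in commit count fall back to lines added. This is one reading of step 7, not a settled design.

```go
package main

import "sort"

// File and Commit are hypothetical, trimmed-down views of the REST payload.
type File struct {
	Name      string
	Additions int
}

type Commit struct {
	SHA   string
	Files []File
}

// LangStat accumulates the two ranking signals.
type LangStat struct {
	Language  string
	Commits   int // commits touching at least one file in this language
	Additions int // total lines added in this language
}

// aggregate classifies files, skips duplicate SHAs, and ranks languages by
// commit count, then lines added, then name (for a stable order).
func aggregate(commits []Commit, classify func(string) string) []LangStat {
	seen := map[string]bool{}
	stats := map[string]*LangStat{}
	for _, c := range commits {
		if seen[c.SHA] {
			continue
		}
		seen[c.SHA] = true
		touched := map[string]bool{}
		for _, f := range c.Files {
			lang := classify(f.Name)
			if lang == "" {
				continue
			}
			s, ok := stats[lang]
			if !ok {
				s = &LangStat{Language: lang}
				stats[lang] = s
			}
			s.Additions += f.Additions
			if !touched[lang] {
				touched[lang] = true
				s.Commits++
			}
		}
	}
	out := make([]LangStat, 0, len(stats))
	for _, s := range stats {
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Commits != out[j].Commits {
			return out[i].Commits > out[j].Commits
		}
		if out[i].Additions != out[j].Additions {
			return out[i].Additions > out[j].Additions
		}
		return out[i].Language < out[j].Language
	})
	return out
}
```

Under this ranking, a language touched by many small commits outranks one touched by a single large commit. Swapping the first two comparisons gives the lines-added weighting instead.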

**Why this beats alternatives:**

- **vs. byte-size (GRS):** Commit-weighted, not size-weighted, so it is accurate for mixed-language repos
- **vs. cloning (lowlighter):** No repo clone, so it is safe for Action runners and its runtime is bounded by REST throughput rather than repo size
- **vs. sampling:** Covers every fetched commit (up to the configured cap) instead of a statistical estimate
- **Cost:** ~1K REST calls for the default (10 repos × 100 commits); ~5x headroom on a 5K/hour PAT budget, but the full 1K/hour `GITHUB_TOKEN` budget

**Estimated accuracy:** 90-95% (limited by extension-only classification; the `.gitattributes` override lifts this to 95-98%)

---

## Unresolved questions

- Should **Prose** (Markdown, AsciiDoc, reStructuredText) be included in default output, or only **Programming + Markup**? (lowlighter defaults to Programming + Markup; users opt in to Prose)
- Do we want **lines-added weighted** language ranking, or just **commit-count weighted**? (e.g., a 1-line typo fix vs. a 500-line refactor)
- Should **vendored/generated code** be excluded by default, or configurable? (`.gitattributes` parsing adds 1 call/repo; users likely don't care)
- How do we handle repos with **no commits by the user** (e.g., organization repos where the user never committed)? (Skip? Count PR reviews? Leave blank?)
- Fallback behavior if **go-enry can't classify a file** (e.g., a custom `.lisp` variant): count as "Other" or skip?

---

**Sources:**

- [anuraghazra/github-readme-stats](https://github.com/anuraghazra/github-readme-stats)
- [lowlighter/metrics](https://github.com/lowlighter/metrics)
- [go-enry/go-enry](https://github.com/go-enry/go-enry)
- [GitHub REST API: Get a commit](https://docs.github.com/en/rest/commits/commits)
- [GitHub REST API: Rate limits](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api)

@@ -0,0 +1,148 @@

# Profile Stats Tools: Commit Attribution Survey (Round 2)

## Summary

Investigated 6 profile-stats projects, 2 template generators, and 1 BigQuery aggregate tool. **No project solves per-commit language attribution better than the proposed REST + go-enry approach.** Two categories emerged. (1) **Byte-size only** (GRS, jstrieb/github-stats, TraceLD): these acknowledge the problem but don't fix it. (2) **WakaTime-only** (anmol098, athul): these bypass the GitHub language API entirely via editor telemetry. One per-commit CLI is described in a DEV.to post, but its repo could not be found on public GitHub. **Recommendation unchanged:** REST per-commit + go-enry + optional `.gitattributes` override is the frontier.

---

## Project-by-project findings

| Project | Repo URL | Primary Lang | Algorithm | Solves Commit Attribution? |
|---------|----------|--------------|-----------|---------------------------|
| **jstrieb/github-stats** | github.com/jstrieb/github-stats | Python | GraphQL `languages(orderBy:SIZE)` | No: byte-size only; explicit TODO: "Improve languages to scale by number of contributions" |
| **anmol098/waka-readme-stats** | github.com/anmol098/waka-readme-stats | Python | WakaTime API editor telemetry | Yes, but orthogonal: not GitHub stats; bypasses the problem entirely |
| **athul/waka-readme** | github.com/athul/waka-readme | Python | WakaTime API (simpler wrapper) | Yes, but orthogonal: WakaTime-only |
| **yoshi389111/github-profile-3d-contrib** | github.com/yoshi389111/github-profile-3d-contrib | TypeScript | GitHub GraphQL contributions calendar | N/A: contributions only, no languages |
| **sarthakhingankar/github-profile-readme-generator** | (repo not found / archived) | n/a | n/a | n/a |
| **rahul-jha98/github-profile-readme-generator** | (repo not found / archived) | n/a | n/a | n/a |
| **TraceLD/github-user-language-breakdown** | github.com/TraceLD/github-user-language-breakdown | TypeScript | Byte-size aggregation (`/api/langs`) | No: frontend app; backend not audited but calls a generic language API |
| **madnight/githut** | github.com/madnight/githut | JavaScript | Google BigQuery public GitHub dataset | No: aggregate repo stats, not per-user commits |

---

## Detailed findings

### jstrieb/github-stats

- **Active:** Yes (last push 2026-04-18, 3.4K stars)
- **What it does:** Python CLI → GraphQL user repos → per-repo `languages(first:10, orderBy:{SIZE})` → accumulate byte-size
- **Commit attribution:** Explicitly does NOT; the source has a TODO comment: `# TODO: Improve languages to scale by number of contributions to`
- **Cost:** 1 GraphQL query per 100 repos
- **Novel:** Handles private repos via token; otherwise the standard byte-size approach
- **Verdict:** Aware of the problem, chose not to solve it (likely due to REST rate-limit concerns)

### anmol098/waka-readme-stats

- **Active:** Yes (last push 2026-04-14, 3.9K stars)
- **What it does:** GitHub Action → fetches the WakaTime API → displays an editor time-in-language breakdown
- **Commit attribution:** **Yes**, but only if the user has WakaTime installed and active
- **Cost:** WakaTime telemetry (user's editor plugin); no GitHub API calls for language stats
- **Solves blog repo problem?** YES, because WakaTime tracks actual editor time, not bytes. A user editing 3 JS files in a Markdown blog is credited with JS only for the time actually spent editing JS.
- **Limitation:** Requires WakaTime setup; doesn't work offline; not a pure GitHub solution
- **Verdict:** Different UX. Solves the problem *orthogonally*: it doesn't use the GitHub language API at all.

### athul/waka-readme

- **Active:** Yes (last push 2026-02-18, 1.8K stars)
- **What it does:** Simpler WakaTime wrapper; the GitHub Action fetches the WakaTime API only
- **Commit attribution:** **Yes**, same as anmol098, via WakaTime editor telemetry
- **Verdict:** WakaTime alternative; no GitHub language innovation

### TraceLD/github-user-language-breakdown

- **Active:** Yes (last push 2025-02-27, 55 stars)
- **What it does:** Frontend (Vite + TypeScript) → calls the `/api/langs` backend → returns a byte-size breakdown
- **Commit attribution:** No; backend not audited; the frontend aggregates by bytes
- **Verdict:** Small project; no novel approach

### madnight/githut

- **Active:** No (last push 2024-04-03, 1K stars)
- **What it does:** Google BigQuery + GitHub public dataset → aggregate language stats across all public repos
- **Commit attribution:** No; designed for ecosystem trends, not per-user stats
- **Verdict:** Ecosystem-scale analysis tool; not relevant to individual profile cards

### yoshi389111/github-profile-3d-contrib

- **Active:** Yes (last push 2026-04-15, 1.6K stars)
- **What it does:** 3D contribution calendar visualization
- **Language stats:** N/A; contributions only
- **Verdict:** Orthogonal to the language problem

---

## The mysterious per-commit CLI

DEV.to post by maxfriedmann (Feb 2026): "I built a CLI to see my real GitHub language stats — does something like this already exist?"

> *"scanning every commit you've personally authored on GitHub — including private repos — and calculates how many lines you've changed per programming language"*

- **Repo:** Could not be located on public GitHub
- **Likely approach:** REST `GET /repos/{o}/{r}/commits` + parse diff → linguist/go-enry classify files → aggregate lines per language
- **If it exists:** This is exactly the REST per-commit + linguist approach proposed in the prior report
- **Status:** Appears to be a personal/private project
- **Significance:** Suggests the proposed approach is feasible and novel enough to be noteworthy as of Feb 2026

---

## Language classification ecosystem: current state

| Approach | Maturity | Solves commit problem? | Cost | Trade-offs |
|----------|----------|----------------------|------|------------|
| **Byte-size (GitHub default)** | Stable, no code needed | No | GraphQL 1 call/100 repos | Simple; fundamentally broken for mixed-language repos |
| **Repository language count** | Stable (vn7n24fzkq) | No (counts repos per language, not commits) | Same as above | Slightly less broken; still size-biased |
| **WakaTime editor telemetry** | Requires opt-in | Yes, but not GitHub-only | User's telemetry; 0 GitHub API calls | Accurate; private; data lives outside GitHub; requires user setup |
| **REST per-commit + go-enry** | Not yet packaged; proposed | Yes (90%–95% accuracy) | 1 REST call/commit (5K/hr PAT budget) | Fast; no clone; extension-limited; no `.gitattributes` support |
| **REST per-commit + go-enry + `.gitattributes`** | Proposed (this project) | Yes (95%–98% accuracy) | +1 REST call/repo for attrs | Same + minimal overhead for ~5% accuracy gain |
| **Clone + linguist Ruby gem** | Stable (lowlighter/metrics) | Yes (99% accuracy) | 15 sec timeout; storage | Accurate; slow; heavy; clones entire repo |
| **Clone + linguist-js** | Stable (lowlighter/metrics) | Yes (99% accuracy) | 15 sec timeout; storage | Same as Ruby gem |

---

## Did anyone solve it better?

**No.** The landscape is:

1. **Byte-weighted GitHub stats**: easy, broken, everyone does it (GRS, jstrieb, others)
2. **WakaTime editor telemetry**: orthogonal; requires opt-in; doesn't use the GitHub API
3. **Cloning repos**: accurate but slow (lowlighter/metrics)
4. **REST per-commit + go-enry**: middle ground, not yet packaged as a standalone tool

**Null result:** No project we found uses `GET /repos/{o}/{r}/commits/{sha}` + go-enry/linguist for per-commit classification and packages it as a reusable tool. The DEV.to post describes such a CLI, but its repo was not found on public GitHub. This suggests one of: (a) it's private/personal; (b) it was abandoned; (c) the author hasn't open-sourced it.

---

## Implication for ghstats

**Prior recommendation stands.** REST per-commit + go-enry is:

- **Frontier-tier:** no packaged competitor exists yet
- **Feasible:** go-enry is performant; REST budgets fit; no cloning overhead
- **Accurate enough:** 90–95% for extension-only; 95–98% with `.gitattributes`
- **Testable:** can be validated against lowlighter/metrics clone-based results (regression test)

**Action:** Proceed with the REST per-commit + go-enry implementation for ghstats v1. Add the `.gitattributes` override as Phase 2 if accuracy feedback demands it.

---

## New ideas surfaced

- **Idea A:** Reach out to maxfriedmann (DEV.to) to find/acquire their per-commit CLI code, if it has actually been built. Might save months of engineering.
- **Idea B:** Offer ghstats as a GitHub Action alternative to WakaTime for users who don't want editor telemetry but want accurate stats. Differentiator: "GitHub-only, no telemetry setup, REST-fast."
- **Idea C:** Add a `.gitattributes` fetcher as an optional HTTP call per repo, toggled via config. Minimal cost for a significant accuracy gain on projects that use `linguist-*` directives.

---

## Unresolved questions

- **What repo is the DEV.to per-commit CLI?** Could it be claimed, forked, or improved?
- **Should ghstats include Prose (Markdown)?** Default to Programming+Markup only, with opt-in for Prose?
- **How to handle repos with zero user commits?** Skip, count PR reviews, or leave blank?
- **Fallback behavior if go-enry can't classify a file?** Count as "Other" or skip?
- **Should `.gitattributes` parsing be a v1 or v2 feature?** (Adds 1 REST call/repo; ~5% accuracy gain)

---

**Sources:**

- [jstrieb/github-stats](https://github.com/jstrieb/github-stats)
- [anmol098/waka-readme-stats](https://github.com/anmol098/waka-readme-stats)
- [athul/waka-readme](https://github.com/athul/waka-readme)
- [yoshi389111/github-profile-3d-contrib](https://github.com/yoshi389111/github-profile-3d-contrib)
- [TraceLD/github-user-language-breakdown](https://github.com/TraceLD/github-user-language-breakdown)
- [madnight/githut](https://github.com/madnight/githut)
- [I built a CLI to see my real GitHub language stats (DEV.to)](https://dev.to/maxfriedmann/i-built-a-cli-to-see-my-real-github-language-stats-does-something-like-this-already-exist-1n18)
- [go-enry/go-enry](https://github.com/go-enry/go-enry)
- [GitHub Docs: About repository languages](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-repository-languages)