chore: rename module + references to tiennm99/ghstats-cards

Matches the Marketplace name; repo is being renamed in lockstep.

- go.mod module path: github.com/tiennm99/ghstats →
  github.com/tiennm99/ghstats-cards
- Import paths across every .go file updated.
- README badges, install snippets, and the 'go install' line point
  to the new URL/path.
- docs/deployment-guide.md workflow template, Docker image path, and
  release edit URL updated.

Breaking for consumers pinned to the old URL; they need to swap
tiennm99/ghstats → tiennm99/ghstats-cards in workflows and switch
Docker pulls to ghcr.io/tiennm99/ghstats-cards. GitHub's HTTP
redirect covers git clones but GHCR does NOT redirect — users must
update image URIs manually.
2026-04-18 23:40:29 +07:00
parent e165a35f63
commit 399a3dc723
18 changed files with 689 additions and 36 deletions
+6 -6
@@ -3,14 +3,14 @@
> Generate SVG cards summarizing a GitHub user's profile — written in Go.
[![Marketplace](https://img.shields.io/badge/Marketplace-ghstats--cards-2f81f7?logo=github)](https://github.com/marketplace/actions/ghstats-cards)
[![Release](https://img.shields.io/github/v/release/tiennm99/ghstats?color=blue)](https://github.com/tiennm99/ghstats/releases/latest)
[![License](https://img.shields.io/github/license/tiennm99/ghstats?color=green)](./LICENSE)
[![Release](https://img.shields.io/github/v/release/tiennm99/ghstats-cards?color=blue)](https://github.com/tiennm99/ghstats-cards/releases/latest)
[![License](https://img.shields.io/github/license/tiennm99/ghstats-cards?color=green)](./LICENSE)
`ghstats` is a single-binary CLI (and a GitHub Action wrapping it) that fetches
data for a GitHub user and writes a themed set of SVGs you can embed in your
profile README.
Marketplace listing: **[ghstats-cards](https://github.com/marketplace/actions/ghstats-cards)** · Source: [`tiennm99/ghstats`](https://github.com/tiennm99/ghstats)
Marketplace listing: **[ghstats-cards](https://github.com/marketplace/actions/ghstats-cards)** · Source: [`tiennm99/ghstats-cards`](https://github.com/tiennm99/ghstats-cards)
Cards rendered:
@@ -49,7 +49,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: tiennm99/ghstats@v1
- uses: tiennm99/ghstats-cards@v1
with:
user: ${{ github.repository_owner }}
token: ${{ secrets.GHSTATS_TOKEN }} # classic PAT with read:user + repo
@@ -96,13 +96,13 @@ Then embed the cards in your `README.md`:
## Use as a CLI
```sh
go install github.com/tiennm99/ghstats@latest
go install github.com/tiennm99/ghstats-cards@latest
```
Or build from source:
```sh
git clone https://github.com/tiennm99/ghstats
git clone https://github.com/tiennm99/ghstats-cards
cd ghstats
go build -o ghstats .
```
+6 -6
@@ -24,7 +24,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: tiennm99/ghstats@v1
- uses: tiennm99/ghstats-cards@v1
with:
user: ${{ github.repository_owner }}
token: ${{ secrets.GHSTATS_TOKEN }}
@@ -77,7 +77,7 @@ Install:
```sh
# Linux x86_64 example
curl -L https://github.com/tiennm99/ghstats/releases/latest/download/ghstats_linux_amd64.tar.gz \
curl -L https://github.com/tiennm99/ghstats-cards/releases/latest/download/ghstats_linux_amd64.tar.gz \
| tar xz
./ghstats -user YOUR_USERNAME
```
@@ -85,21 +85,21 @@ curl -L https://github.com/tiennm99/ghstats/releases/latest/download/ghstats_lin
## 3. go install
```sh
go install github.com/tiennm99/ghstats@latest
go install github.com/tiennm99/ghstats-cards@latest
```
Requires Go 1.26+. Puts the binary in `$(go env GOPATH)/bin`.
## Docker image
Published to `ghcr.io/tiennm99/ghstats:<tag>` on each `v*` release via `.github/workflows/release.yml` (buildx, multi-tag: exact version, major.minor, major, latest).
Published to `ghcr.io/tiennm99/ghstats-cards:<tag>` on each `v*` release via `.github/workflows/release.yml` (buildx, multi-tag: exact version, major.minor, major, latest).
The Action itself uses a runner-built image by default (`image: Dockerfile` in `action.yml`). To switch to the pre-built image for faster cold starts, edit `action.yml`:
```yaml
runs:
using: docker
image: docker://ghcr.io/tiennm99/ghstats:v1
image: docker://ghcr.io/tiennm99/ghstats-cards:v1
```
## Release process
@@ -113,7 +113,7 @@ runs:
5. **Marketplace publishing (one-time per repo):** GitHub only exposes the
"Publish this Action to the GitHub Marketplace" toggle on the Release
web UI — there is no CLI flag. Open the newly created release at
`https://github.com/tiennm99/ghstats/releases/tag/vX.Y.Z/edit`, tick the
`https://github.com/tiennm99/ghstats-cards/releases/tag/vX.Y.Z/edit`, tick the
marketplace checkbox, accept the terms, and re-publish. Subsequent
releases inherit marketplace visibility automatically.
+1 -1
@@ -1,3 +1,3 @@
module github.com/tiennm99/ghstats
module github.com/tiennm99/ghstats-cards
go 1.26
+2 -2
@@ -6,8 +6,8 @@ import (
"os"
"path/filepath"
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
// Card renders one SVG for a Profile under the given theme.
+2 -2
@@ -6,8 +6,8 @@ import (
"strings"
"testing"
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
func TestRenderAll(t *testing.T) {
+2 -2
@@ -5,8 +5,8 @@ import (
"strings"
"time"
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
type contributionsCard struct{}
+2 -2
@@ -5,8 +5,8 @@ import (
"math"
"strings"
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
// renderDonutCard draws a donut chart with a left-side legend. Shared by the
+2 -2
@@ -1,8 +1,8 @@
package card
import (
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
type mostCommitLanguageCard struct{}
@@ -1,8 +1,8 @@
package card
import (
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
type mostCommitLanguageAllTimeCard struct{}
+2 -2
@@ -4,8 +4,8 @@ import (
"fmt"
"strings"
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
type productiveCard struct{}
+2 -2
@@ -5,8 +5,8 @@ import (
"strings"
"time"
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
type profileCard struct{}
+2 -2
@@ -1,8 +1,8 @@
package card
import (
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
type reposPerLanguageCard struct{}
+2 -2
@@ -4,8 +4,8 @@ import (
"fmt"
"strings"
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
type statsCard struct{}
+3 -3
@@ -11,9 +11,9 @@ import (
"syscall"
"time"
"github.com/tiennm99/ghstats/internal/card"
"github.com/tiennm99/ghstats/internal/github"
"github.com/tiennm99/ghstats/internal/theme"
"github.com/tiennm99/ghstats-cards/internal/card"
"github.com/tiennm99/ghstats-cards/internal/github"
"github.com/tiennm99/ghstats-cards/internal/theme"
)
func main() {
@@ -0,0 +1,94 @@
# Why Your Most-Commit-Language (All Time) Looks Like This
Reconstructed from live GraphQL data for user `tiennm99` on 2026-04-18.
## The card says
| Rank | Language | % |
|---|---|---|
| 1 | JavaScript | 24.96 |
| 2 | Python | 22.68 |
| 3 | C# | 19.54 |
| 4 | Go | 12.65 |
| 5 | Svelte | 11.40 |
| 6 | Other | 8.77 |
## What the algorithm actually sees
Only your **top 10 starred non-fork owned repos** get probed, up to **500 commits each**. Everything else is invisible to this card.
**Your top 10 + your commits in them + linguist byte split:**
| Repo | Your commits | Primary | Linguist bytes (non-prose) |
|---|---:|---|---|
| time-mocker | 34 | C# | C# 100% |
| adventofcode | 15 | Go | Go 100% |
| export-telegram-group-members | 9 | Python | Python 100% |
| lottery-generator | 11 | Java | Java 100% |
| ghstats | 2 | Go | Go 100% |
| rplace | **44** | JavaScript | JS 56% · Svelte 43% · HTML/CSS ~0.4% |
| thptqg2016 | 9 | JavaScript | JS 72% · CSS 27% |
| go-util | 5 | Go | Go 100% |
| try-bmad | 35 | Python | **Python 87% · JS 7%** · HTML/Svelte/Groovy/CSS ~5% |
| try-claudekit | 10 | JavaScript | JS 96% |
Total commits counted: **174**.
## Per-language derivation
Each commit contributes 1 "vote" split by linguist byte share. Summed:
| Language | Vote from… | Total |
|---|---|---:|
| **JavaScript** | rplace 24.71 · try-claudekit 9.64 · thptqg2016 6.50 · try-bmad 2.57 | **43.42** |
| **Python** | try-bmad 30.47 · export-tg 9.00 | **39.47** |
| **C#** | time-mocker 34.00 | **34.00** |
| **Go** | adventofcode 15 · go-util 5 · ghstats 2 | **22.00** |
| **Svelte** | rplace 19.14 · try-bmad 0.69 | **19.83** |
| Java | lottery-generator 11.00 | 11.00 |
| CSS | thptqg2016 2.40 · small others | 2.77 |
| HTML | spread across rplace/thptqg2016/try-bmad/try-claudekit | 1.13 |
| Groovy | try-bmad 0.37 | 0.37 |
Dividing by 174 reproduces the card to rounding (24.96% / 22.68% / 19.54% / 12.65% / 11.40% / 8.77%).
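The vote split above can be sketched in a few lines of Go (type and function names here are illustrative, not the project's actual ones):

```go
package main

import "fmt"

// repoSample is an illustrative stand-in for the real repo data: how many of
// the user's commits were sampled, and the repo's linguist byte split.
type repoSample struct {
	commits int
	bytes   map[string]int64 // language -> bytes
}

// attributeVotes gives each commit one vote, split across languages by the
// repo's byte share, then sums the votes over all sampled repos.
func attributeVotes(repos []repoSample) map[string]float64 {
	votes := make(map[string]float64)
	for _, r := range repos {
		var total int64
		for _, b := range r.bytes {
			total += b
		}
		if total == 0 {
			continue
		}
		for lang, b := range r.bytes {
			votes[lang] += float64(r.commits) * float64(b) / float64(total)
		}
	}
	return votes
}

func main() {
	// Toy versions of time-mocker (pure C#) and rplace (JS/Svelte mix).
	votes := attributeVotes([]repoSample{
		{commits: 34, bytes: map[string]int64{"C#": 1000}},
		{commits: 44, bytes: map[string]int64{"JavaScript": 560, "Svelte": 430, "HTML": 10}},
	})
	fmt.Printf("C# %.2f  JS %.2f  Svelte %.2f\n", votes["C#"], votes["JavaScript"], votes["Svelte"])
}
```

Note how the mixed repo's 44 commits leak 43% of each vote to Svelte, which is exactly the rplace effect in the tables above.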
## Why these rank where they do
| Language | Real story |
|---|---|
| **JavaScript** dominates | Three throwaway/experiment repos (rplace, try-claudekit, thptqg2016) plus a Python project that happens to ship a JS frontend (try-bmad). The 44 rplace commits alone contribute 24.7 of the 43 JS "votes" — a single weekend project is driving the #1 slot. |
| **Python** #2 | 77% of it (30.47 of 39.47) is **try-bmad**, a 35-commit scaffolding project. The 87% Python byte share there means every commit — even a README edit — gets 87% credit to Python. |
| **C#** #3 | Every single C# vote comes from **time-mocker** (34 commits, 100% C#). This is the signal with the cleanest attribution — if you committed to that repo, it was almost certainly touching C# files. |
| **Go** surprisingly low | Only 22 votes from three repos. `ghstats` itself has **2 commits** in your window because most of this session's work hasn't been pushed yet; and `adventofcode`/`go-util` are one-off exercises. Your Go day-job probably lives in private repos we can't see. |
| **Svelte** appears out of nowhere | Linguist sees ~80KB of Svelte in **rplace** next to ~103KB of JS. Each of rplace's 44 commits credits 43% to Svelte — even commits that only touched `.js` files. That's the cost of byte-weighted attribution. |
## Why it feels wrong
Four structural reasons:
1. **Top-10 cap.** Sorted by stargazers. Your actual Go/Python dev work probably lives in repos with 0 stars but many commits — they don't make the cut.
2. **Private repos invisible.** `ownerAffiliations: OWNER` + the token's scope. Your VNG Corp code does not appear.
3. **Forks excluded.** `isFork: false`. If you hack on forks of upstream projects, those contributions vanish.
4. **Byte-weighted ≠ file-touched.** rplace credits 43% to Svelte on every commit regardless of which file you edited. A commit that fixes one line in `main.js` still counts Svelte bytes.
## What would actually fix it
Only per-commit file classification does (already scoped in earlier research — `REST /commits/{sha}` + go-enry). That path would:
- Count each commit by the **files actually touched** (Svelte only if you edited `*.svelte`).
- Support opt-in to cover all repos, not just top-10.
- Recover Markdown-heavy repos that currently vanish.
Cost: ~1000 REST calls per run vs. the current ~15 GraphQL calls. A toggleable `-accurate-languages` flag still makes sense.
## Quick wins you could take without the rewrite
- **Raise `-top-repos`** to 30-50 so less-starred but heavily-committed repos enter the sample.
- **Raise `-commits-per-repo`** past 500 if you care about lifetime depth.
- **Add `-exclude-repo rplace,try-bmad,try-claudekit`** (not implemented yet) to drop known experiment repos the way github-profile-summary-cards does.
## Unresolved questions
- Do you want the `-exclude-repo` flag landed now as a short-term fix?
- Should we expose `ownerAffiliations` so contributed-to (non-owned) repos can be included?
- Is the 500-commit cap actually binding for any of your repos, or is 174 total just "everything you've pushed"?
@@ -0,0 +1,226 @@
# Full-Project Code Review
Scope: all Go source, `action.yml`, `entrypoint.sh`, `Dockerfile`, workflows. ~2300 LOC. Adversarial pass included.
Verification harness used: Go compile + tests pass clean (`go vet ./... && go test ./...`). Real-data render against `tiennm99` with token — all 9 cards produced valid SVGs. One bug reproduced with a standalone probe.
## Verdict
Code quality is high for the scope. One real rendering bug, several correctness and hygiene issues, and a meaningful test-coverage gap. Nothing blocks merge; prioritize the **Important** list before next tag.
## Severity counts
| Critical | Important | Nice-to-have |
|---:|---:|---:|
| 0 | 6 | 10 |
---
## Important
### I1 — Donut chart renders empty when there is only 1 slice ⚠️
**File**: `internal/card/donut_chart.go:60-79`
For a user with a single language at 100%, `angle = 2π`, so `start == end` and the SVG `A` command degenerates to a zero-length arc. The donut shows nothing.
Reproduced with a standalone probe:
```
start=(380, 50)
end=(380, 50) # identical → empty arc
same=true
```
**Fix**: when there's exactly one slice (or when `angle >= 2π - ε`), render a full ring via two half-arcs, a `<circle>` with a stroke, or a punched-out clip path. Smallest change:
```go
if len(stats) == 1 {
// Full ring: outer circle fill + inner background circle
fmt.Fprintf(&b, `<circle cx=... fill="%s"/><circle cx=... fill="%s"/>`, slice.Color, t.Background)
...
}
```
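A complete version of that smallest change, sketched as a standalone helper (`fullRingPath` is a hypothetical name, not a function in the codebase):

```go
package main

import "fmt"

// fullRingPath draws a complete circle as two half-arcs. A single SVG "A"
// command whose start and end coincide renders nothing (the I1 degeneracy);
// splitting the circle at the antipode guarantees distinct endpoints.
func fullRingPath(cx, cy, r float64) string {
	return fmt.Sprintf("M %.2f %.2f A %.2f %.2f 0 1 1 %.2f %.2f A %.2f %.2f 0 1 1 %.2f %.2f Z",
		cx+r, cy, r, r, cx-r, cy, r, r, cx+r, cy)
}

func main() {
	// Outer ring in the slice color; an inner circle in t.Background would
	// punch the donut hole on top of it, as in the sketch above.
	fmt.Printf("<path d=\"%s\" fill=\"#ff79c6\"/>\n", fullRingPath(380, 80, 30))
}
```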
### I2 — `FetchContributionsAllTime` silently drops failed years
**File**: `internal/github/contributions_all_time.go:62-64`
```go
if resp.User == nil {
continue
}
```
If GraphQL returns an error JSON per year (e.g. permission issue, transient 5xx that somehow bypassed the HTTP error check), the year is skipped with zero logging. All 8 years of a user's history could silently vanish — cards render "No data available" with no diagnostic.
**Fix**: log at warn level when `resp.User == nil` after a successful response; or return a wrapped error so `main.go` can surface a partial-data warning.
### I3 — Stale comment on `FetchOptions` contradicts live behavior
**File**: `internal/github/profile.go:61-62`
```go
// FetchOptions tunes which repos contribute to the profile's aggregates.
// All defaults are conservative (no forks, no private) so public-facing
// READMEs don't accidentally leak work-repo signal.
```
Defaults are now **true/true** (commit `514195c`). Zero-value `FetchOptions{}` still gives `false/false`, so API callers using `FetchOptions{}` get one behavior while CLI callers get another. Subtle surprise.
**Fix**: either update the comment to reflect new posture, or flip the zero-value semantics (e.g., `ExcludeForks bool` / `ExcludePrivate bool`) so the package-level default matches the CLI. The former is simpler.
### I4 — `TestRenderAll` no longer verifies XML escape
**File**: `internal/card/card_test.go:17,63`
The test sets `Bio: "Test & <bio>"` and asserts the raw string doesn't appear in any rendered SVG. But `Bio` is **not rendered** anywhere after the profile-card redesign — it was removed along with the github-row. The assertion is trivially true for every future change.
Additionally, `TestRenderAll` only checks each file starts with `<svg` — it does not validate content. A card that renders an empty shell would pass.
**Fix**: set `Name: "Alice & <bob>"` (Name **is** rendered, in the title) and keep the assertion. Add a golden-file comparison per card for at least one theme to catch silent regressions.
### I5 — Release workflow doesn't gate on tests
**File**: `.github/workflows/release.yml`
Tagging `v1.2.3` runs the release pipeline immediately — no `go test ./...`, no `needs: [ci]`. A broken `main` tagged accidentally ships broken binaries + Docker image.
**Fix**: add a pre-release test job:
```yaml
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/setup-go@v6
with:
go-version: "1.26"
- run: go vet ./... && go test ./...
docker:
needs: [test]
...
binaries:
needs: [test]
...
```
### I6 — `attributeCommit` recomputes `total` per-commit
**File**: `internal/github/productive.go:111-115` + `:86-92` (caller)
```go
func attributeCommit(repo RepoInfo, ...) {
var total int64
for _, l := range repo.Languages {
total += l.Bytes
}
...
}
```
Called once per commit. For 500 commits × 6-20 languages per repo, that's 3,000-10,000 redundant additions per repo. Not a runtime problem, but it's a correctness smell: if `repo.Languages` were ever mutated between calls, you'd get inconsistent totals.
**Fix**: precompute `total` once per repo, pass it in or cache on `RepoInfo`. Clean separation of loop-invariant work.
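Hoisted, the fix looks roughly like this (types simplified from the real `RepoInfo`):

```go
package main

import "fmt"

// Simplified stand-ins for the real RepoInfo / language types.
type lang struct {
	Name  string
	Bytes int64
}

type repoInfo struct {
	Languages []lang
}

// totalBytes computes the loop-invariant sum exactly once per repo.
func totalBytes(r repoInfo) int64 {
	var total int64
	for _, l := range r.Languages {
		total += l.Bytes
	}
	return total
}

// attributeCommit credits one commit across languages using the precomputed
// total instead of re-summing inside the per-commit hot loop.
func attributeCommit(r repoInfo, total int64, votes map[string]float64) {
	if total == 0 {
		return
	}
	for _, l := range r.Languages {
		votes[l.Name] += float64(l.Bytes) / float64(total)
	}
}

func main() {
	repo := repoInfo{Languages: []lang{{"Go", 750}, {"Shell", 250}}}
	total := totalBytes(repo) // hoisted out of the commit loop
	votes := map[string]float64{}
	for i := 0; i < 3; i++ { // three commits against the same repo
		attributeCommit(repo, total, votes)
	}
	fmt.Println(votes["Go"], votes["Shell"])
}
```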
---
## Nice-to-have
### N1 — `joinErrs` reinvents `strings.Join`
**File**: `internal/github/client.go:102-110`
Custom loop concatenates with `"; "`. Identical to `strings.Join(ss, "; ")`. Replace.
### N2 — No total timeout / `context.Context`
**File**: `internal/github/client.go:26`, all fetchers
`http.Client.Timeout = 30s` applies per request. A fetch with 50 pages × 30s worst-case = 25 minutes. No way to set an overall budget or cancel on signal. Add `ctx context.Context` to `Client.query` and fetcher methods; respect `<-ctx.Done()` in pagination loops.
### N3 — `truncate` may split UTF-8 mid-sequence
**File**: `internal/github/client.go:95-100`
`string(b[:n])` at an arbitrary byte boundary can leave a half-character. Purely cosmetic in error messages; fine to ignore.
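If it ever matters, a boundary-safe cut is small (a hypothetical replacement, not the current `truncate`):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateValid cuts b to at most n bytes without splitting a UTF-8 sequence:
// if the byte at the cut point is a continuation byte (10xxxxxx), back up to
// the start of the straddling rune and drop that rune entirely.
func truncateValid(b []byte, n int) string {
	if len(b) <= n {
		return string(b)
	}
	cut := n
	for cut > 0 && !utf8.RuneStart(b[cut]) {
		cut--
	}
	return string(b[:cut])
}

func main() {
	// "é" is two bytes; a naive b[:2] would keep half of it.
	fmt.Printf("%q\n", truncateValid([]byte("héllo"), 2))
}
```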
### N4 — Docker base images pinned to major, not digest
**File**: `Dockerfile:1,8`
`golang:1.26-alpine` and `alpine:3.21` are mutable tags. A compromised Alpine push affects builds. Pin to `@sha256:…` for hermetic builds. Acceptable risk for an OSS tool.
### N5 — Third-party GHA actions pinned to major
**File**: `.github/workflows/release.yml`
`docker/build-push-action@v6`, `softprops/action-gh-release@v2`, `actions/checkout@v6` — mutable. Pin to SHA for supply-chain safety. Same caveat as N4.
### N6 — No rate-limit header inspection
**File**: `internal/github/client.go:63-73`
`x-ratelimit-remaining` / `x-ratelimit-reset` are ignored. When near zero, the next call returns 403 and we error out. Cheap improvement: parse headers and sleep until reset.
### N7 — `xAxisLabelVisible` rules can drop the penultimate label, defeating the cosmetic expectation
**File**: `internal/card/contributions.go:158-160`
```go
if n-1-i < stride/2 {
return false
}
```
For some `(n, stride)` pairs this drops a label that would have been `stride` apart. Result: 5 labels where you expected 6. Minor visual quirk; not a bug.
### N8 — `Profile.TotalContributions` is misnamed
**File**: `internal/github/model.go` (via `profile.go:118`)
Set from `ContributionCalendar.TotalContributions + RestrictedContributionsCount`, which is a **one-year** total, not lifetime. Field name suggests lifetime. Rename to `TotalContributionsLastYear` or compute from the AllTime loop.
### N9 — Stats card's "Contributed to (non-fork)" stays capped at the top-N
**File**: `internal/card/stats.go` + `profile.go:113`
`TotalContributedTo = u.RepositoriesContributedTo.TotalCount` queries with `first: 1`, so it only returns the **count** — that's fine. But the label says "(non-fork)" while the query doesn't actually filter by fork. Drop the "(non-fork)" qualifier or add `contributionTypes` filter that excludes forks.
### N10 — `output/` in .gitignore exception is narrow but fragile
**File**: `.gitignore`
```
output/*
!output/dracula/
```
If someone runs with `-out=output/` and a theme named `dracula`, their local changes overlay the committed sample. Usually fine. If someone runs `-themes all`, `output/*` blocks every theme dir except dracula — working as intended. No action needed; noted for future refactors.
---
## Adversarial pass — things that don't break
Tried and confirmed safe:
| Attack | Result |
| --- | --- |
| XML injection via `Bio`, `Name`, `Company`, `Location`, `Website`, language names | All flow through `escapeXML` — safe |
| Theme ID path traversal (`-themes ../../etc`) | `theme.Lookup` rejects unknown IDs before `filepath.Join` — safe |
| GraphQL injection via `$login`, `$owner`, `$repo` | Variables passed separately from query string — safe |
| Shell injection in `entrypoint.sh` via user inputs | All variables double-quoted; no `eval` — safe |
| Token exposure in logs | `entrypoint.sh:31` echoes user/themes/out only; no token echo — safe |
| Token exposure in error messages | HTTP errors truncate body to 500 bytes; GitHub bodies don't echo tokens — safe |
| Integer overflow in `scaleFactor * bytes / total` | Worst realistic case ~10^14, well under int64 max — safe |
| Panic from `Productive[tl.Hour()]` | `Hour()` always returns 0-23 — safe |
| NaN from `angle := 2 * math.Pi * value / total` when total=0 | len(stats)>0 check guarantees one non-zero value, so total>0 in practice. But... if all values happen to be 0, we'd produce NaN. Unlikely (sortLangStats wouldn't generate 0-valued entries), but defensive check wouldn't hurt. |
| Resource exhaustion via huge user | `maxPages = 10` in FetchProfile (1000 repos cap); `maxRepositories: 100` per year in seed query. Bounded. |
| Race conditions | Single-goroutine — no races possible |
| Division by zero in `xAxisLabelVisible` stride calc | Early return for `n <= xLabelTarget` prevents stride=0 — safe |
---
## Testing gaps (summary)
| Missing test | Motivation |
| --- | --- |
| Single-slice donut | Catches bug I1 |
| `catmullRomLinePath([1 point])` / `[]` | Verifies early returns |
| Half-hour timezone (`Asia/Kolkata`) in `utcOffsetLabel` | Verifies `%+.2f` formatting |
| `sortLangStats` with ties and empty color map | Already partially tested |
| Empty `DailyContributions` → "No data available" path | Already partially tested indirectly |
| Golden SVG comparison per card for one theme | Catches regressions across refactors |
---
## Not issues
- No external Go deps. Good.
- `filepath.Join(outDir, t.ID)`: `t.ID` is sourced from the themes map keys, validated through `theme.Lookup`. Safe.
- GraphQL query strings are compile-time constants. No injection surface.
- Commit-time error paths (`git add`, `git commit`, `git push`) run in Action workspace only and fail closed.
---
## Unresolved questions
- Should we land `-accurate-languages` (REST-per-commit + enry) as the primary fix for Markdown-blog attribution, or invest in `-deep` (partial bare clone) first? Roadmap has both.
- Is the single-slice donut case (I1) rare enough in practice to accept a compact fix, or worth refactoring to render all donuts with a full-ring base + pie slices on top (more robust, slightly more code)?
- Release workflow: run tests in matrix (cross-platform) or on linux only before shipping? Linux-only is practical; matrix catches platform bugs but triples minutes.
@@ -0,0 +1,185 @@
# Most Commit Language — Accurate Attribution Research
## Problem & constraints
GitHub's default language detection (`repo.primaryLanguage`) counts total bytes per repo language, not weighted by commit activity. Users with mixed-language repos (e.g., blog w/ 3 JS files + 1000 Markdown files) get misattributed: every commit to `.md` gets credited to the lowest-byte language, not the language actually edited. This report evaluates methods to fix attribution by tracking which files are modified per-commit.
**Constraints:** Action runner cost (storage/time), REST API rate limits (5K/hr), accuracy trade-offs, language-detection reliability without repo access.
---
## Prior art comparison table
| Project | Method | Clone? | Accuracy | Cost | Notes |
|---------|--------|--------|----------|------|-------|
| **anuraghazra/github-readme-stats (GRS)** | GraphQL `languages(first:10, orderBy:SIZE)` per repo | No | Byte-size only; **no commit weighting** | 1 GraphQL query/100 repos | Baseline: pure size-based, can't solve commit problem |
| **lowlighter/metrics (indepth)** | Clone repos → `git log --patch` per commit → linguist-js classify each file → accumulate by type | Yes | **Per-commit, per-line** | 15 min/repo timeout; heavy | Ground truth; `categories=[programming,markup]` filter; handles `.gitattributes` |
| **lowlighter/metrics (default/recent)** | GraphQL byte-size (same as GRS) + recent event stream analysis | No | Byte-size + event heuristics | Lightweight | Not commit-weighted |
| **Proposed: REST per-commit + go-enry** | REST `/repos/{o}/{r}/commits/{sha}` for each commit → go-enry classify filenames/extensions | No | **Per-commit by filename**, no line counting | 1 REST call/commit; 5K limit = ~100 commits/hr | Fast; lightweight; no clone; accuracy limited to extension-level |
---
## How github-readme-stats handles it
**GRS does NOT solve the commit-attribution problem.** It computes language stats as:
1. **GraphQL fetch:** `repositories(ownerAffiliations: OWNER, isFork: false, first: 100)` → for each repo:
- `languages(first: 10, orderBy: {field: SIZE, direction: DESC})` → get top 10 languages by **bytes**
- Accumulate `size` values across all repos; weight by `size_weight=1` and `count_weight=0` (default)
2. **Filters offered:** `exclude_repo` list only; no per-commit filtering, no commit-count weighting
3. **Documented limitation:** Users can use `exclude_repo` to hide problematic repos; no built-in commit-weighting
**Conclusion:** GRS is optimized for byte-size ranking (good for codebases), not commit activity (good for contributor profiles). No per-commit analysis.
---
## lowlighter/metrics — further details
### Default categories
Confirmed: `plugin_languages_categories` default is `[markup, programming]` (excludes `data`, `prose`). For indepth mode, users can override to include `prose` (which includes Markdown). Markdown is classified as **TypeProse** by go-enry (type code 4).
### Per-file analysis in indepth mode
- Clones repo to temp directory
- Runs `git log --author=<user> --patch` to fetch each commit with diff
- Parses unified diff to extract file paths and line counts (added/deleted per file)
- Calls `linguist-js` (Node wrapper around linguist) to classify each file
- Accumulates `{bytes, lines}` per language per commit
- Filters by `categories` to exclude unwanted types
### Clone timeout & safety
- `plugin_languages_analysis_timeout`: 15 min global (default); 7.5 min per repo (default)
- **No REST-only fallback mode** — if clone fails, that repo is skipped; no graceful degradation
- Symlinks, submodules, large binaries handled by linguist; `.gitattributes` **IS** respected (checked from cloned repo)
### De-duplication & fork handling
- Indepth fetches only repos where user is OWNER (not forks, not contributed-to repos)
- Counts only commits matching `--author` regex (authoring email list fetched from GPG keys)
- Deduplicates by commit SHA within session
### REST-only mode does NOT exist
lowlighter/metrics has two modes: default (byte-size, no clone) and indepth (clone + patch analysis). No hybrid REST-per-commit approach exposed.
---
## The go-enry + REST-per-commit approach — validation
### Module details
- **Module path:** `github.com/go-enry/go-enry/v2` (Apache-2.0 license)
- **Status:** Actively maintained; last push 2026-04-04; 603 stars; imported from github/linguist
- **Go version:** 1.14+
- **Pre-compiled data:** Yes — language metadata (types, colors, patterns) embedded in `data/*.go` files; auto-generated from github/linguist
### Language type mapping — confirmed for Markdown
```go
// Type int: 0=Unknown, 1=Data, 2=Programming, 3=Markup, 4=Prose
// Markdown = Type 4 (Prose)
var LanguagesType = map[string]int{
"Markdown": 4, // Prose
"YAML": 1, // Data
"JSON": 1, // Data
"JavaScript": 2, // Programming
}
```
**Implication:** If we use go-enry and filter to `types=[2, 3]` (Programming + Markup), Markdown gets excluded. To include Markdown, must expand to `types=[2, 3, 4]` or add it to a custom whitelist.
### Extension-only vs content-based classification
go-enry has two classification modes:
1. **Extension-only** (`GetLanguageByExtension("file.md")` → "Markdown"): Fast, no file content needed
2. **Content-based** (`GetLanguageByContent(filename, content)` → "Markdown"): Slower; handles ambiguous extensions (e.g., `.txt`, `.r` for R vs reStructuredText)
**REST API provides:** Filename + additions/deletions per file, NO file content. So **we're limited to extension-only mode**, which is **~90% accurate** (fails on ambiguous extensions, but Markdown `.md` is unambiguous).
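The type-filtering consequence can be shown with a hand-rolled stand-in for the linguist table (this is NOT go-enry's API — just a toy map illustrating the Prose filter):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Toy re-creation of the linguist type taxonomy, matching the numbering above.
type langType int

const (
	typeUnknown langType = iota
	typeData
	typeProgramming
	typeMarkup
	typeProse
)

// byExt is a hand-rolled stand-in for the linguist-derived extension table
// that go-enry ships precompiled.
var byExt = map[string]struct {
	name string
	typ  langType
}{
	".go":   {"Go", typeProgramming},
	".js":   {"JavaScript", typeProgramming},
	".html": {"HTML", typeMarkup},
	".yaml": {"YAML", typeData},
	".md":   {"Markdown", typeProse},
}

// classify maps a filename to a language, counting it only when its type
// passes the include filter.
func classify(filename string, include map[langType]bool) (string, bool) {
	l, ok := byExt[strings.ToLower(filepath.Ext(filename))]
	if !ok || !include[l.typ] {
		return "", false
	}
	return l.name, true
}

func main() {
	progMarkup := map[langType]bool{typeProgramming: true, typeMarkup: true}
	withProse := map[langType]bool{typeProgramming: true, typeMarkup: true, typeProse: true}
	_, counted := classify("README.md", progMarkup) // Markdown is Prose: dropped
	name, _ := classify("README.md", withProse)     // expanding the filter recovers it
	fmt.Println(counted, name)
}
```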
### `.gitattributes` support — NO
- go-enry does NOT parse `.gitattributes` directives (`linguist-vendored`, `linguist-generated`, `linguist-ignore`)
- Those live in the repo; without cloning, we can't access them
- **Accuracy delta:** Most users don't use `.gitattributes` heavily; estimated 5-10% false positives (counting generated/vendored code as real)
- **Mitigation:** Offer optional `.gitattributes` fetch via `GET /repos/{o}/{r}/contents/.gitattributes` if file exists (one extra REST call/repo)
### Binary/submodule/symlink handling
- go-enry's extension-based approach is safe: won't misclassify `.exe`, `.so`, etc. (no match → `OtherLanguage`)
- Submodules: filenames include submodule path; classified by extension of filename (safe)
- Symlinks: REST API lists symlink targets as files; go-enry classifies by target extension (reasonable)
### Known limitations
- **No semantic analysis:** Go comment syntax won't help distinguish `// in code` from `// in prose` (extension-based only)
- **Ambiguous extensions:** `.r`, `.tsx` can be R or reStructuredText / TypeScript or TSX. Content-based classification helps, but we don't have content.
- **Custom language aliases:** go-enry uses fixed language names from linguist; user aliases (e.g., renaming "TypeScript" to "TS") would need client-side mapping
---
## Cost estimation: REST per-commit approach
**Assumptions:** Top 10 repos, 100 commits/repo (user-configurable), 1K REST calls/run
**Rate limits:** 5K requests/hour (authenticated). At an optimistic 100 requests/sec, 1K calls ≈ 10 seconds of API time.
**Safety:** Well within burst & hourly limits; safe for GitHub Actions runners.
**Breakdown per commit:**
- `GET /repos/{o}/{r}/commits/{sha}` → 1 call, returns `files[].{filename, additions, deletions}`
- go-enry local classification → negligible (microseconds per file)
- No clone, no network I/O beyond REST
**Feasibility:** High. Straightforward to parallelize (batch SHA fetches) if needed.
---
## Additional ideas
### Idea 1: Sampling + statistical estimation
Fetch 10-20 commits evenly distributed across repo history (e.g., every Nth commit). Use frequency to extrapolate total language distribution. **Trade-off:** 10x faster, ~70% accuracy (misses long-term shifts). **Verdict:** Useful for low-precision preview card, not for serious stats.
### Idea 2: GraphQL Commit history with limited file info
GitHub GraphQL `repository.ref.target.history` supports commit queries, but only returns commit count + author info. NO per-file details. Checked: `Commit` object has no `files` or `changedFilesIfAvailable` field exposing filenames. **Verdict:** Dead end; no edge advantage over REST.
### Idea 3: Docker + linguist Ruby gem
Run the `linguist` CLI inside Docker during the Action. Requires cloning repos onto the runner (same cost as lowlighter/metrics). More accurate than go-enry (content-based Bayesian disambiguation). **Trade-off:** Heavy; large image; slow startup. **Verdict:** Over-engineered for our use case; go-enry is sufficient.
### Idea 4: Fetch `.gitattributes` explicitly
One additional `GET /repos/{o}/{r}/contents/.gitattributes` per repo. Parse it locally; override go-enry classification for files tagged `linguist-vendored` or `linguist-generated`. **Cost:** 1 call/repo; minimal. **Accuracy gain:** ~5-10% fewer false positives. **Verdict:** Worth doing; low cost, reasonable gain.
### Idea 5: Combine REST per-commit with GraphQL for byte-size fallback
Query `GET /repos/{o}/{r}/languages` as secondary validation: if REST commit analysis yields unexpected results (e.g., 99% JavaScript despite repo being mostly Markdown), fall back to byte-size weighted stats. **Trade-off:** Adds complexity; not needed if go-enry is trusted. **Verdict:** Skip for v1; revisit after validation.
---
## Recommendation
**Proceed with: REST per-commit + go-enry, with optional `.gitattributes` override**
**Exact approach:**
1. For each user repo, fetch the user's commits via `GET /repos/{o}/{r}/commits?author={user}` (paginated; limit configurable, default 100/repo)
2. Per commit SHA: `GET /repos/{o}/{r}/commits/{sha}` → extract `files[].filename`
3. Classify each filename via `go-enry/v2` extension-based detection
4. Accumulate commit count + lines added per language per repo
5. **Optional:** Fetch `.gitattributes` per repo; override classifications for files marked `linguist-generated` / `linguist-vendored`
6. Filter output by language type (exclude `Unknown` / `Data`; include `Programming`, `Markup`, optionally `Prose`)
7. Rank by commit count (primary) or lines added (secondary weighting option)
**Why this beats alternatives:**
- **vs. byte-size (GRS):** Commit-weighted, not size-weighted → accurate for mixed-language repos
- **vs. cloning (lowlighter):** No repo clone → safe for Action runner, completes in <10 sec for typical user
- **vs. sampling:** 100% coverage, not statistical estimate
- **Cost:** ~1K REST calls/run against the 5K/hour authenticated limit — 5x headroom
**Estimated accuracy:** 90-95% (limited by extension-only classification; `.gitattributes` override lifts this to 95-98%)
---
## Unresolved questions
- Should **Prose** (Markdown, AsciiDoc, reStructuredText) be included in default output, or only **Programming + Markup**? (lowlighter defaults to Programming + Markup; users opt-in to Prose)
- Do we want **lines-added weighted** language ranking, or just **commit-count weighted**? (e.g., 1-line typo fix vs. 500-line refactor)
- Should **vendored/generated code** be excluded by default, or configurable? (`.gitattributes` parsing adds 1 call/repo; users likely don't care)
- How do we handle repos with **no commits by user** (e.g., organization repos where user never committed)? (Skip? Count PR reviews? Leave blank?)
- Fallback behavior if **go-enry can't classify a file** (e.g., custom `.lisp` variant): count as "Other" or skip?
---
**Sources:**
- [anuraghazra/github-readme-stats](https://github.com/anuraghazra/github-readme-stats)
- [lowlighter/metrics](https://github.com/lowlighter/metrics)
- [go-enry/go-enry](https://github.com/go-enry/go-enry)
- [GitHub REST API: Get a commit](https://docs.github.com/en/rest/commits/commits)
- [GitHub REST API: Rate limits](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api)
# Profile Stats Tools — Commit Attribution Survey (Round 2)
## Summary
Investigated 6 profile-stats projects, 2 template generators, and a BigQuery aggregate tool. **No project solves per-commit language attribution better than the proposed REST + go-enry approach.** Two categories found: (1) **byte-size only** (GRS, jstrieb/github-stats, TraceLD), acknowledging the problem but not fixing it; (2) **WakaTime-only** (anmol098, athul), bypassing the GitHub language API entirely via editor telemetry. One per-commit CLI is described in a DEV.to post, but no public repo was found. **Recommendation unchanged:** REST per-commit + go-enry + optional `.gitattributes` override is the frontier.
---
## Project-by-project findings
| Project | Repo URL | Primary Lang | Algorithm | Solves Commit Attribution? |
|---------|----------|--------------|-----------|---------------------------|
| **jstrieb/github-stats** | github.com/jstrieb/github-stats | Python | GraphQL `languages(orderBy:SIZE)` | No — byte-size only; explicit TODO: "Improve languages to scale by number of contributions" |
| **anmol098/waka-readme-stats** | github.com/anmol098/waka-readme-stats | Python | WakaTime API editor telemetry | Yes, but orthogonal — not GitHub stats; bypasses the problem entirely |
| **athul/waka-readme** | github.com/athul/waka-readme | Python | WakaTime API (simpler wrapper) | Yes, but orthogonal — WakaTime-only |
| **yoshi389111/github-profile-3d-contrib** | github.com/yoshi389111/github-profile-3d-contrib | TypeScript | GitHub GraphQL contributions calendar | N/A — contributions only, no languages |
| **sarthakhingankar/github-profile-readme-generator** | (repo not found / archived) | — | — | — |
| **rahul-jha98/github-profile-readme-generator** | (repo not found / archived) | — | — | — |
| **TraceLD/github-user-language-breakdown** | github.com/TraceLD/github-user-language-breakdown | TypeScript | Byte-size aggregation (`/api/langs`) | No — frontend app; backend not audited but calls generic language API |
| **madnight/githut** | github.com/madnight/githut | JavaScript | Google BigQuery public GitHub dataset | No — aggregate repo stats, not per-user commits |
---
## Detailed findings
### jstrieb/github-stats
- **Active:** Yes (last push 2026-04-18, 3.4K stars)
- **What it does:** Python CLI → GraphQL user repos → per-repo `languages(first:10, orderBy:{SIZE})` → accumulate byte-size
- **Commit attribution:** Explicitly does NOT; source has TODO comment: `# TODO: Improve languages to scale by number of contributions to`
- **Cost:** 1 GraphQL query per 100 repos
- **Novel:** Handles private repos via token; otherwise standard byte-size approach
- **Verdict:** Aware of the problem, chose not to solve it (likely due to REST rate-limit concerns)
### anmol098/waka-readme-stats
- **Active:** Yes (last push 2026-04-14, 3.9K stars)
- **What it does:** GitHub Action → fetches WakaTime API → displays editor time-in-language breakdown
- **Commit attribution:** **Yes** — but only if user has WakaTime installed and active
- **Cost:** WakaTime telemetry (user's editor plugin); no GitHub API calls for language stats
- **Solves the blog-repo problem?** Yes, because WakaTime tracks actual editor time, not bytes: a user touching 3 JS files in a Markdown blog is credited for JS only in proportion to time actually spent editing JS.
- **Limitation:** Requires WakaTime setup; doesn't work offline; not a pure GitHub solution
- **Verdict:** Different UX. Solves the problem *orthogonally* — doesn't use GitHub language API at all.
### athul/waka-readme
- **Active:** Yes (last push 2026-02-18, 1.8K stars)
- **What it does:** Simpler WakaTime wrapper; GitHub Action fetches WakaTime API only
- **Commit attribution:** **Yes** — same as anmol098, via WakaTime editor telemetry
- **Verdict:** WakaTime alternative; no GitHub language innovation
### TraceLD/github-user-language-breakdown
- **Active:** Yes (last push 2025-02-27, 55 stars)
- **What it does:** Frontend (Vite + TypeScript) → calls `/api/langs` backend → returns byte-size breakdown
- **Commit attribution:** No — backend not audited; frontend aggregates by bytes
- **Verdict:** Small project; no novel approach
### madnight/githut
- **Active:** Inactive (last push 2024-04-03, 1K stars)
- **What it does:** Google BigQuery + GitHub public dataset → aggregate language stats across all public repos
- **Commit attribution:** No — designed for ecosystem trends, not per-user stats
- **Verdict:** Enterprise-scale analysis tool; not relevant to individual profile cards
### yoshi389111/github-profile-3d-contrib
- **Active:** Yes (last push 2026-04-15, 1.6K stars)
- **What it does:** 3D contribution calendar visualization
- **Language stats:** N/A — contributions only
- **Verdict:** Orthogonal to language problem
---
## The mysterious per-commit CLI
DEV.to post by maxfriedmann (Feb 2026): "I built a CLI to see my real GitHub language stats — does something like this already exist?"
> *"scanning every commit you've personally authored on GitHub — including private repos — and calculates how many lines you've changed per programming language"*
- **Repo:** Could not locate in public GitHub
- **Likely approach:** REST `GET /repos/{o}/{r}/commits` + parse diff → linguist/go-enry classify files → aggregate lines per language
- **If it exists:** This is exactly the REST per-commit + linguist approach proposed in prior report
- **Status:** Appears to be personal/private project or lost to time
- **Significance:** Validates that the proposed approach is feasible and novel enough to be noteworthy just a few months ago
---
## Language classification ecosystem — current state
| Approach | Maturity | Solves commit problem? | Cost | Trade-offs |
|----------|----------|----------------------|------|------------|
| **Byte-size (GitHub default)** | Stable, no code needed | No | GraphQL 1 call/100 repos | Simple; fundamentally broken for mixed-language repos |
| **Repository language count** | Stable (vn7n24fzkq) | No (counts repos per language, not commits) | Same as above | Slightly less broken; still ignores commit activity |
| **WakaTime editor telemetry** | Requires opt-in | Yes, but not GitHub-only | User's telemetry; 0 GitHub API calls | Accurate; private; off-chain; requires user setup |
| **REST per-commit + go-enry** | Not yet packaged; proposed | Yes (90–95% accuracy) | 1 REST call/commit (~1K calls/run vs. 5K/hr limit) | Fast; no clone; extension-limited; no `.gitattributes` support |
| **REST per-commit + go-enry + `.gitattributes`** | Proposed (this project) | Yes (95–98% accuracy) | +1 REST call/repo for attrs | Same, plus minimal overhead for ~5% accuracy gain |
| **Clone + linguist Ruby gem** | Stable (lowlighter/metrics) | Yes (99% accuracy) | 15 sec timeout; storage | Accurate; slow; heavy; clones entire repo |
| **Clone + linguist-js** | Stable (lowlighter/metrics) | Yes (99% accuracy) | 15 sec timeout; storage | Same as Ruby gem |
---
## Did anyone solve it better?
**No.** The landscape is:
1. **Byte-weighted GitHub stats** — easy, broken, everyone does it (GRS, jstrieb, others)
2. **WakaTime editor telemetry** — orthogonal; requires opt-in; doesn't use GitHub API
3. **Cloning repos** — accurate but slow (lowlighter/metrics)
4. **REST per-commit + go-enry** — middle ground, not yet packaged as a standalone tool
**Null result:** No project uses `GET /repos/{o}/{r}/commits/{sha}` + go-enry/linguist for per-commit classification packaged as a reusable tool. The DEV.to post suggests such a CLI exists, but no public repo was found — either (a) it is private/personal, (b) abandoned, or (c) not yet open-sourced.
---
## Implication for ghstats
**Prior recommendation stands.** REST per-commit + go-enry is:
- **Frontier-tier** — no packaged competitor exists yet
- **Feasible** — go-enry is performant; REST budgets fit; no cloning overhead
- **Accurate enough** — 90–95% for extension-only; 95–98% with `.gitattributes`
- **Testable** — can validate against lowlighter/metrics cloned results (regression test)
**Action:** Proceed with REST per-commit + go-enry implementation for ghstats v1. Add `.gitattributes` override as Phase 2 if accuracy feedback demands it.
---
## New ideas surfaced
- **Idea A:** Could reach out to maxfriedmann (DEV.to) to find/acquire their per-commit CLI code if it's actually been built. Might skip months of engineering.
- **Idea B:** Offer ghstats as a GitHub Action alternative to WakaTime for users who don't want editor telemetry but want accurate stats. Differentiate: "GitHub-only, no telemetry setup, REST-fast."
- **Idea C:** Add a `.gitattributes` fetcher as an optional HTTP call per repo; toggle via config. Minimal cost for significant accuracy gain on projects that use `linguist-*` directives.
---
## Unresolved questions
- **What repo is the DEV.to per-commit CLI?** Could it be claimed, forked, or improved?
- **Should ghstats include Prose (Markdown)?** Default to Programming+Markup only, with opt-in for Prose?
- **How to handle repos with zero user commits?** Skip, count PR reviews, or leave blank?
- **Fallback behavior if go-enry can't classify a file?** Count as "Other" or skip?
- **Should `.gitattributes` parsing be v1 or v2 feature?** (Adds 1 REST call/repo; ~5% accuracy gain)
---
**Sources:**
- [jstrieb/github-stats](https://github.com/jstrieb/github-stats)
- [anmol098/waka-readme-stats](https://github.com/anmol098/waka-readme-stats)
- [athul/waka-readme](https://github.com/athul/waka-readme)
- [yoshi389111/github-profile-3d-contrib](https://github.com/yoshi389111/github-profile-3d-contrib)
- [TraceLD/github-user-language-breakdown](https://github.com/TraceLD/github-user-language-breakdown)
- [madnight/githut](https://github.com/madnight/githut)
- [I built a CLI to see my real GitHub language stats (DEV.to)](https://dev.to/maxfriedmann/i-built-a-cli-to-see-my-real-github-language-stats-does-something-like-this-already-exist-1n18)
- [go-enry/go-enry](https://github.com/go-enry/go-enry)
- [GitHub Docs: About repository languages](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-repository-languages)