diff --git a/README.md b/README.md index 682f54c1..06034aec 100644 --- a/README.md +++ b/README.md @@ -3,14 +3,14 @@ > Generate SVG cards summarizing a GitHub user's profile — written in Go. [![Marketplace](https://img.shields.io/badge/Marketplace-ghstats--cards-2f81f7?logo=github)](https://github.com/marketplace/actions/ghstats-cards) -[![Release](https://img.shields.io/github/v/release/tiennm99/ghstats?color=blue)](https://github.com/tiennm99/ghstats/releases/latest) -[![License](https://img.shields.io/github/license/tiennm99/ghstats?color=green)](./LICENSE) +[![Release](https://img.shields.io/github/v/release/tiennm99/ghstats-cards?color=blue)](https://github.com/tiennm99/ghstats-cards/releases/latest) +[![License](https://img.shields.io/github/license/tiennm99/ghstats-cards?color=green)](./LICENSE) `ghstats` is a single-binary CLI (and a GitHub Action wrapping it) that fetches data for a GitHub user and writes a themed set of SVGs you can embed in your profile README. -Marketplace listing: **[ghstats-cards](https://github.com/marketplace/actions/ghstats-cards)** · Source: [`tiennm99/ghstats`](https://github.com/tiennm99/ghstats) +Marketplace listing: **[ghstats-cards](https://github.com/marketplace/actions/ghstats-cards)** · Source: [`tiennm99/ghstats-cards`](https://github.com/tiennm99/ghstats-cards) Cards rendered: @@ -49,7 +49,7 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v5 - - uses: tiennm99/ghstats@v1 + - uses: tiennm99/ghstats-cards@v1 with: user: ${{ github.repository_owner }} token: ${{ secrets.GHSTATS_TOKEN }} # classic PAT with read:user + repo @@ -96,13 +96,13 @@ Then embed the cards in your `README.md`: ## Use as a CLI ```sh -go install github.com/tiennm99/ghstats@latest +go install github.com/tiennm99/ghstats-cards@latest ``` Or build from source: ```sh -git clone https://github.com/tiennm99/ghstats +git clone https://github.com/tiennm99/ghstats-cards -cd ghstats +cd ghstats-cards go build -o ghstats .
``` diff --git a/docs/deployment-guide.md b/docs/deployment-guide.md index 229dc5b3..28989b63 100644 --- a/docs/deployment-guide.md +++ b/docs/deployment-guide.md @@ -24,7 +24,7 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v5 - - uses: tiennm99/ghstats@v1 + - uses: tiennm99/ghstats-cards@v1 with: user: ${{ github.repository_owner }} token: ${{ secrets.GHSTATS_TOKEN }} @@ -77,7 +77,7 @@ Install: ```sh # Linux x86_64 example -curl -L https://github.com/tiennm99/ghstats/releases/latest/download/ghstats_linux_amd64.tar.gz \ +curl -L https://github.com/tiennm99/ghstats-cards/releases/latest/download/ghstats_linux_amd64.tar.gz \ | tar xz ./ghstats -user YOUR_USERNAME ``` @@ -85,21 +85,21 @@ curl -L https://github.com/tiennm99/ghstats/releases/latest/download/ghstats_lin ## 3. go install ```sh -go install github.com/tiennm99/ghstats@latest +go install github.com/tiennm99/ghstats-cards@latest ``` Requires Go 1.26+. Puts the binary in `$(go env GOPATH)/bin`. ## Docker image -Published to `ghcr.io/tiennm99/ghstats:` on each `v*` release via `.github/workflows/release.yml` (buildx, multi-tag: exact version, major.minor, major, latest). +Published to `ghcr.io/tiennm99/ghstats-cards:` on each `v*` release via `.github/workflows/release.yml` (buildx, multi-tag: exact version, major.minor, major, latest). The Action itself uses a runner-built image by default (`image: Dockerfile` in `action.yml`). To switch to the pre-built image for faster cold starts, edit `action.yml`: ```yaml runs: using: docker - image: docker://ghcr.io/tiennm99/ghstats:v1 + image: docker://ghcr.io/tiennm99/ghstats-cards:v1 ``` ## Release process @@ -113,7 +113,7 @@ runs: 5. **Marketplace publishing (one-time per repo):** GitHub only exposes the "Publish this Action to the GitHub Marketplace" toggle on the Release web UI — there is no CLI flag. 
Open the newly created release at - `https://github.com/tiennm99/ghstats/releases/tag/vX.Y.Z/edit`, tick the + `https://github.com/tiennm99/ghstats-cards/releases/tag/vX.Y.Z/edit`, tick the marketplace checkbox, accept the terms, and re-publish. Subsequent releases inherit marketplace visibility automatically. diff --git a/go.mod b/go.mod index 18d7eed6..fc920ee6 100644 --- a/go.mod +++ b/go.mod @@ -1,3 +1,3 @@ -module github.com/tiennm99/ghstats +module github.com/tiennm99/ghstats-cards go 1.26 diff --git a/internal/card/card.go b/internal/card/card.go index 47715a17..1c522393 100644 --- a/internal/card/card.go +++ b/internal/card/card.go @@ -6,8 +6,8 @@ import ( "os" "path/filepath" - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) // Card renders one SVG for a Profile under the given theme. diff --git a/internal/card/card_test.go b/internal/card/card_test.go index 91be3e37..03fbd0a4 100644 --- a/internal/card/card_test.go +++ b/internal/card/card_test.go @@ -6,8 +6,8 @@ import ( "strings" "testing" - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) func TestRenderAll(t *testing.T) { diff --git a/internal/card/contributions.go b/internal/card/contributions.go index 9f3dc0d7..1ba1f2d2 100644 --- a/internal/card/contributions.go +++ b/internal/card/contributions.go @@ -5,8 +5,8 @@ import ( "strings" "time" - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) type contributionsCard struct{} diff --git a/internal/card/donut_chart.go b/internal/card/donut_chart.go index 0c2aaa9d..c177f9cd 100644 --- 
a/internal/card/donut_chart.go +++ b/internal/card/donut_chart.go @@ -5,8 +5,8 @@ import ( "math" "strings" - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) // renderDonutCard draws a donut chart with a left-side legend. Shared by the diff --git a/internal/card/most_commit_language.go b/internal/card/most_commit_language.go index b036385e..6eda4fa0 100644 --- a/internal/card/most_commit_language.go +++ b/internal/card/most_commit_language.go @@ -1,8 +1,8 @@ package card import ( - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) type mostCommitLanguageCard struct{} diff --git a/internal/card/most_commit_language_all_time.go b/internal/card/most_commit_language_all_time.go index dec6962d..4795ff3d 100644 --- a/internal/card/most_commit_language_all_time.go +++ b/internal/card/most_commit_language_all_time.go @@ -1,8 +1,8 @@ package card import ( - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) type mostCommitLanguageAllTimeCard struct{} diff --git a/internal/card/productive.go b/internal/card/productive.go index cc0a5bc0..2fb2c151 100644 --- a/internal/card/productive.go +++ b/internal/card/productive.go @@ -4,8 +4,8 @@ import ( "fmt" "strings" - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) type productiveCard struct{} diff --git a/internal/card/profile.go b/internal/card/profile.go index d330d7a1..d671a40a 100644 --- a/internal/card/profile.go +++ 
b/internal/card/profile.go @@ -5,8 +5,8 @@ import ( "strings" "time" - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) type profileCard struct{} diff --git a/internal/card/repos_per_language.go b/internal/card/repos_per_language.go index efb1213f..c8d0171e 100644 --- a/internal/card/repos_per_language.go +++ b/internal/card/repos_per_language.go @@ -1,8 +1,8 @@ package card import ( - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) type reposPerLanguageCard struct{} diff --git a/internal/card/stats.go b/internal/card/stats.go index 40435eca..b679ba1b 100644 --- a/internal/card/stats.go +++ b/internal/card/stats.go @@ -4,8 +4,8 @@ import ( "fmt" "strings" - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) type statsCard struct{} diff --git a/main.go b/main.go index 121f993b..5afe4c32 100644 --- a/main.go +++ b/main.go @@ -11,9 +11,9 @@ import ( "syscall" "time" - "github.com/tiennm99/ghstats/internal/card" - "github.com/tiennm99/ghstats/internal/github" - "github.com/tiennm99/ghstats/internal/theme" + "github.com/tiennm99/ghstats-cards/internal/card" + "github.com/tiennm99/ghstats-cards/internal/github" + "github.com/tiennm99/ghstats-cards/internal/theme" ) func main() { diff --git a/plans/reports/analysis-260418-2140-most-commit-language-all-time.md b/plans/reports/analysis-260418-2140-most-commit-language-all-time.md new file mode 100644 index 00000000..b64843f6 --- /dev/null +++ b/plans/reports/analysis-260418-2140-most-commit-language-all-time.md @@ -0,0 +1,94 @@ +# Why Your Most-Commit-Language (All Time) 
Looks Like This + +Reconstructed from live GraphQL data for user `tiennm99` on 2026-04-18. + +## The card says + +| Rank | Language | % | +|---|---|---| +| 1 | JavaScript | 24.96 | +| 2 | Python | 22.68 | +| 3 | C# | 19.54 | +| 4 | Go | 12.65 | +| 5 | Svelte | 11.40 | +| 6 | Other | 8.77 | + +## What the algorithm actually sees + +Only your **top 10 starred non-fork owned repos** get probed, up to **500 commits each**. Everything else is invisible to this card. + +**Your top 10 + your commits in them + linguist byte split:** + +| Repo | Your commits | Primary | Linguist bytes (non-prose) | +|---|---:|---|---| +| time-mocker | 34 | C# | C# 100% | +| adventofcode | 15 | Go | Go 100% | +| export-telegram-group-members | 9 | Python | Python 100% | +| lottery-generator | 11 | Java | Java 100% | +| ghstats | 2 | Go | Go 100% | +| rplace | **44** | JavaScript | JS 56% · Svelte 43% · HTML/CSS ~0.4% | +| thptqg2016 | 9 | JavaScript | JS 72% · CSS 27% | +| go-util | 5 | Go | Go 100% | +| try-bmad | 35 | Python | **Python 87% · JS 7%** · HTML/Svelte/Groovy/CSS ~5% | +| try-claudekit | 10 | JavaScript | JS 96% | + +Total commits counted: **174**. + +## Per-language derivation + +Each commit contributes 1 "vote" split by linguist byte share. Summed: + +| Language | Vote from… | Total | +|---|---|---:| +| **JavaScript** | rplace 24.71 · try-claudekit 9.64 · thptqg2016 6.50 · try-bmad 2.57 | **43.42** | +| **Python** | try-bmad 30.47 · export-tg 9.00 | **39.47** | +| **C#** | time-mocker 34.00 | **34.00** | +| **Go** | adventofcode 15 · go-util 5 · ghstats 2 | **22.00** | +| **Svelte** | rplace 19.14 · try-bmad 0.69 | **19.83** | +| Java | lottery-generator 11.00 | 11.00 | +| CSS | thptqg2016 2.40 · small others | 2.77 | +| HTML | spread across rplace/thptqg2016/try-bmad/try-claudekit | 1.13 | +| Groovy | try-bmad 0.37 | 0.37 | + +Dividing by 174 reproduces the card (24.96% / 22.68% / 19.54% / 12.65% / 11.40% / 8.77%) to within 0.01 percentage points; the residue comes from rounding.
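The derivation above can be reproduced with a short Go sketch. It assumes that each repo is described by the user's commit count plus its non-prose linguist byte shares, given as fractions. The type and function names are hypothetical and do not come from the ghstats-cards source.

```go
package main

import "sort"

// repoSample describes one probed repo: the user's commit count in it and the
// linguist byte share of each non-prose language, as fractions summing to 1.
type repoSample struct {
	Commits int
	Shares  map[string]float64
}

// langShare is one row of the rendered card.
type langShare struct {
	Lang    string
	Percent float64
}

// byteWeightedShares gives every commit one vote, splits that vote by the
// repo's byte shares, converts totals to percentages of all commits, keeps
// the top `keep` languages and folds the remainder into "Other".
func byteWeightedShares(repos []repoSample, keep int) []langShare {
	votes := map[string]float64{}
	total := 0
	for _, r := range repos {
		total += r.Commits
		for lang, share := range r.Shares {
			votes[lang] += float64(r.Commits) * share
		}
	}
	if total == 0 {
		return nil
	}
	rows := make([]langShare, 0, len(votes))
	for lang, v := range votes {
		rows = append(rows, langShare{Lang: lang, Percent: 100 * v / float64(total)})
	}
	// Highest share first; ties broken by name so the order is deterministic.
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Percent != rows[j].Percent {
			return rows[i].Percent > rows[j].Percent
		}
		return rows[i].Lang < rows[j].Lang
	})
	if len(rows) <= keep {
		return rows
	}
	other := 0.0
	for _, r := range rows[keep:] {
		other += r.Percent
	}
	return append(rows[:keep:keep], langShare{Lang: "Other", Percent: other})
}
```

Call it with the ten repos from the table and `keep = 5`. The result should approximate the card. Exact agreement would need the unrounded byte shares rather than the rounded percentages listed above.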
+ +## Why these rank where they do + +| Language | Real story | +|---|---| +| **JavaScript** dominates | Three throwaway/experiment repos (rplace, try-claudekit, thptqg2016) plus a Python project that happens to ship a JS frontend (try-bmad). The 44 rplace commits alone contribute 24.7 of the 43 JS "votes" — a single weekend project is driving the #1 slot. | +| **Python** #2 | 77% of it (30.47 of 39.47) is **try-bmad**, a 35-commit scaffolding project. The 87% Python byte share there means every commit — even a README edit — gets 87% credit to Python. | +| **C#** #3 | Every single C# vote comes from **time-mocker** (34 commits, 100% C#). This is the signal with the cleanest attribution — if you committed to that repo, it was almost certainly touching C# files. | +| **Go** surprisingly low | Only 22 votes from three repos. `ghstats` itself has **2 commits** in your window because most of this session's work hasn't been pushed yet; and `adventofcode`/`go-util` are one-off exercises. Your Go day-job probably lives in private repos we can't see. | +| **Svelte** appears out of nowhere | Linguist sees ~80KB of Svelte in **rplace** next to ~103KB of JS. Each of rplace's 44 commits credits 43% to Svelte — even commits that only touched `.js` files. That's the cost of byte-weighted attribution. | + +## Why it feels wrong + +Four structural reasons: + +1. **Top-10 cap.** Sorted by stargazers. Your actual Go/Python dev work probably lives in repos with 0 stars but many commits — they don't make the cut. +2. **Private repos invisible.** `ownerAffiliations: OWNER` + the token's scope. Your VNG Corp code does not appear. +3. **Forks excluded.** `isFork: false`. If you hack on forks of upstream projects, those contributions vanish. +4. **Byte-weighted ≠ file-touched.** rplace credits 43% to Svelte on every commit regardless of which file you edited. A commit that fixes one line in `main.js` still counts Svelte bytes. 
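Reason 4 is easiest to see by comparing the two attribution rules on a single commit. The Go sketch below takes the rplace shares from the table above. Two parts are illustrative assumptions: the extension table, and the even split across touched languages. Neither describes how the card or go-enry actually works.

```go
package main

import "path"

// byteWeightedVote splits one commit across the repo's linguist byte shares,
// regardless of which files the commit actually changed.
func byteWeightedVote(repoShares map[string]float64) map[string]float64 {
	vote := make(map[string]float64, len(repoShares))
	for lang, share := range repoShares {
		vote[lang] = share
	}
	return vote
}

// fileTouchedVote splits one commit evenly across the languages of the files
// it changed. extLang is a hypothetical extension-to-language table; files
// with unknown extensions are ignored, so a docs-only commit casts no vote.
func fileTouchedVote(files []string, extLang map[string]string) map[string]float64 {
	langs := map[string]bool{}
	for _, f := range files {
		if lang, ok := extLang[path.Ext(f)]; ok {
			langs[lang] = true
		}
	}
	vote := make(map[string]float64, len(langs))
	for lang := range langs {
		vote[lang] = 1 / float64(len(langs))
	}
	return vote
}
```

Consider a one-line fix to `main.js` in rplace. Under byte weighting, 0.43 of that commit's vote goes to Svelte. Under the file-touched rule, the whole vote goes to JavaScript.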
+ +## What would actually fix it + +Only per-commit file classification does (already scoped in earlier research — `REST /commits/{sha}` + go-enry). That path would: + +- Count each commit by the **files actually touched** (Svelte only if you edited `*.svelte`). +- Support opt-in to cover all repos, not just top-10. +- Recover Markdown-heavy repos that currently vanish. + +Cost: ~1000 REST calls per run vs. the current ~15 GraphQL calls. A toggleable `-accurate-languages` flag still makes sense. + +## Quick wins you could take without the rewrite + +- **Raise `-top-repos`** to 30–50 so less-starred but heavily-committed repos enter the sample. +- **Raise `-commits-per-repo`** past 500 if you care about lifetime depth. +- **Add `-exclude-repo rplace,try-bmad,try-claudekit`** (not implemented yet) to drop known experiment repos the way github-profile-summary-cards does. + +## Unresolved questions + +- Do you want the `-exclude-repo` flag landed now as a short-term fix? +- Should we expose `ownerAffiliations` so contributed-to (non-owned) repos can be included? +- Is the 500-commit cap actually binding for any of your repos, or is 174 total just "everything you've pushed"? diff --git a/plans/reports/code-review-260418-2223-full-project.md b/plans/reports/code-review-260418-2223-full-project.md new file mode 100644 index 00000000..a93522a3 --- /dev/null +++ b/plans/reports/code-review-260418-2223-full-project.md @@ -0,0 +1,226 @@ +# Full-Project Code Review + +Scope: all Go source, `action.yml`, `entrypoint.sh`, `Dockerfile`, workflows. ~2300 LOC. Adversarial pass included. + +Verification harness used: Go compile + tests pass clean (`go vet ./... && go test ./...`). Real-data render against `tiennm99` with token — all 9 cards produced valid SVGs. One bug reproduced with a standalone probe. + +## Verdict + +Code quality is high for the scope. One real rendering bug, several correctness and hygiene issues, and a meaningful test-coverage gap. 
Nothing blocks merge; prioritize the **Important** list before next tag. + +## Severity counts + +| Critical | Important | Nice-to-have | +|---:|---:|---:| +| 0 | 6 | 10 | + +--- + +## Important + +### I1 — Donut chart renders empty when there is only 1 slice ⚠️ +**File**: `internal/card/donut_chart.go:60-79` + +For a user with a single language at 100%, `angle = 2π` → `start == end` → the SVG `A` command degenerates to a zero-length arc. The donut shows nothing. + +Reproduced with a standalone probe: +``` +start=(380, 50) +end=(380, 50) # identical → empty arc +same=true +``` + +**Fix**: when there's exactly one slice (or when `angle >= 2π - ε`), render a full ring via two half-arcs, a `<circle>` with a stroke, or a punched-out clip path. Smallest change: +```go +if len(stats) == 1 { + // Full ring: outer circle fill + inner background circle + fmt.Fprintf(&b, ``, slice.Color, t.Background) + ... +} +``` + +### I2 — `FetchContributionsAllTime` silently drops failed years +**File**: `internal/github/contributions_all_time.go:62-64` + +```go +if resp.User == nil { + continue +} +``` +If GraphQL returns an error JSON per year (e.g. permission issue, transient 5xx that somehow bypassed the HTTP error check), the year is skipped with zero logging. All 8 years of a user's history could silently vanish — cards render "No data available" with no diagnostic. + +**Fix**: log at warn level when `resp.User == nil` after a successful response; or return a wrapped error so `main.go` can surface a partial-data warning. + +### I3 — Stale comment on `FetchOptions` contradicts live behavior +**File**: `internal/github/profile.go:61-62` + +```go +// FetchOptions tunes which repos contribute to the profile's aggregates. +// All defaults are conservative (no forks, no private) so public-facing +// READMEs don't accidentally leak work-repo signal. +``` +Defaults are now **true/true** (commit `514195c`).
Zero-value `FetchOptions{}` still gives `false/false`, so API callers using `FetchOptions{}` get one behavior while CLI callers get another. Subtle surprise. + +**Fix**: either update the comment to reflect new posture, or flip the zero-value semantics (e.g., `ExcludeForks bool` / `ExcludePrivate bool`) so the package-level default matches the CLI. The former is simpler. + +### I4 — `TestRenderAll` no longer verifies XML escape +**File**: `internal/card/card_test.go:17,63` + +The test sets `Bio: "Test & "` and asserts the raw string doesn't appear in any rendered SVG. But `Bio` is **not rendered** anywhere after the profile-card redesign — it was removed along with the github-row. The assertion is trivially true for every future change. + +Additionally, `TestRenderAll` only checks each file starts with `"` (Name **is** rendered, in the title) and keep the assertion. Add a golden-file comparison per card for at least one theme to catch silent regressions. + +### I5 — Release workflow doesn't gate on tests +**File**: `.github/workflows/release.yml` + +Tagging `v1.2.3` runs the release pipeline immediately — no `go test ./...`, no `needs: [ci]`. A broken `main` tagged accidentally ships broken binaries + Docker image. + +**Fix**: add a pre-release test job: +```yaml +test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v6 + - uses: actions/setup-go@v6 + with: + go-version: "1.26" + - run: go vet ./... && go test ./... +docker: + needs: [test] + ... +binaries: + needs: [test] + ... +``` + +### I6 — `attributeCommit` recomputes `total` per-commit +**File**: `internal/github/productive.go:111-115` + `:86-92` (caller) + +```go +func attributeCommit(repo RepoInfo, ...) { + var total int64 + for _, l := range repo.Languages { + total += l.Bytes + } + ... +} +``` +Called once per commit. For 500 commits × 6-20 languages per repo, that's 3000–10000 redundant additions per repo. 
Not a runtime problem, but it's a correctness smell: if `repo.Languages` were ever mutated between calls, you'd get inconsistent totals. + +**Fix**: precompute `total` once per repo, pass it in or cache on `RepoInfo`. Clean separation of loop-invariant work. + +--- + +## Nice-to-have + +### N1 — `joinErrs` reinvents `strings.Join` +**File**: `internal/github/client.go:102-110` + +Custom loop concatenates with `"; "`. Identical to `strings.Join(ss, "; ")`. Replace. + +### N2 — No total timeout / `context.Context` +**File**: `internal/github/client.go:26`, all fetchers + +`http.Client.Timeout = 30s` applies per request. A fetch with 50 pages × 30s worst-case = 25 minutes. No way to set an overall budget or cancel on signal. Add `ctx context.Context` to `Client.query` and fetcher methods; respect `<-ctx.Done()` in pagination loops. + +### N3 — `truncate` may split UTF-8 mid-sequence +**File**: `internal/github/client.go:95-100` + +`string(b[:n])` at an arbitrary byte boundary can leave a half-character. Purely cosmetic in error messages; fine to ignore. + +### N4 — Docker base images pinned to major, not digest +**File**: `Dockerfile:1,8` + +`golang:1.26-alpine` and `alpine:3.21` are mutable tags. A compromised Alpine push affects builds. Pin to `@sha256:…` for hermetic builds. Acceptable risk for an OSS tool. + +### N5 — Third-party GHA actions pinned to major +**File**: `.github/workflows/release.yml` + +`docker/build-push-action@v6`, `softprops/action-gh-release@v2`, `actions/checkout@v6` — mutable. Pin to SHA for supply-chain safety. Same caveat as N4. + +### N6 — No rate-limit header inspection +**File**: `internal/github/client.go:63-73` + +`x-ratelimit-remaining` / `x-ratelimit-reset` are ignored. When near zero, the next call returns 403 and we error out. Cheap improvement: parse headers and sleep until reset. 
+ +### N7 — `xAxisLabelVisible` can drop the penultimate label, yielding one fewer label than expected +**File**: `internal/card/contributions.go:158-160` + +```go +if n-1-i < stride/2 { + return false +} +``` +For some `(n, stride)` pairs this drops a label that would have been `stride` apart. Result: 5 labels where you expected 6. Minor visual quirk; not a bug. + +### N8 — `Profile.TotalContributions` is misnamed +**File**: `internal/github/model.go` (via `profile.go:118`) + +Set from `ContributionCalendar.TotalContributions + RestrictedContributionsCount`, which is a **one-year** total, not lifetime. Field name suggests lifetime. Rename to `TotalContributionsLastYear` or compute from the AllTime loop. + +### N9 — Stats card's "Contributed to (non-fork)" label claims a filter the query doesn't apply +**File**: `internal/card/stats.go` + `profile.go:113` + +`TotalContributedTo = u.RepositoriesContributedTo.TotalCount` queries with `first: 1`, so it only returns the **count** — that's fine. But the label says "(non-fork)" while the query doesn't actually filter by fork. Drop the "(non-fork)" qualifier, or page through the repositories and skip those with `isFork: true` (the `contributionTypes` argument filters by contribution kind, not by fork status). + +### N10 — `output/` in .gitignore exception is narrow but fragile +**File**: `.gitignore` + +``` +output/* +!output/dracula/ +``` +If someone runs with `-out=output/` and a theme named `dracula`, their local changes overlay the committed sample. Usually fine. If someone runs `-themes all`, `output/*` blocks every theme dir except dracula — working as intended. No action needed; noted for future refactors.
+ +--- + +## Adversarial pass — things that don't break + +Tried and confirmed safe: + +| Attack | Result | +| --- | --- | +| XML injection via `Bio`, `Name`, `Company`, `Location`, `Website`, language names | All flow through `escapeXML` — safe | +| Theme ID path traversal (`-themes ../../etc`) | `theme.Lookup` rejects unknown IDs before `filepath.Join` — safe | +| GraphQL injection via `$login`, `$owner`, `$repo` | Variables passed separately from query string — safe | +| Shell injection in `entrypoint.sh` via user inputs | All variables double-quoted; no `eval` — safe | +| Token exposure in logs | `entrypoint.sh:31` echoes user/themes/out only; no token echo — safe | +| Token exposure in error messages | HTTP errors truncate body to 500 bytes; GitHub bodies don't echo tokens — safe | +| Integer overflow in `scaleFactor * bytes / total` | Worst realistic case ~10^14, well under int64 max — safe | +| Panic from `Productive[tl.Hour()]` | `Hour()` always returns 0-23 — safe | +| NaN from `angle := 2 * math.Pi * value / total` when total=0 | len(stats)>0 check guarantees one non-zero value, so total>0 in practice. But... if all values happen to be 0, we'd produce NaN. Unlikely (sortLangStats wouldn't generate 0-valued entries), but defensive check wouldn't hurt. | +| Resource exhaustion via huge user | `maxPages = 10` in FetchProfile (1000 repos cap); `maxRepositories: 100` per year in seed query. Bounded. 
| +| Race conditions | Single-goroutine — no races possible | +| Division by zero in `xAxisLabelVisible` stride calc | Early return for `n <= xLabelTarget` prevents stride=0 — safe | + +--- + +## Testing gaps (summary) + +| Missing test | Motivation | +| --- | --- | +| Single-slice donut | Catches bug I1 | +| `catmullRomLinePath([1 point])` / `[]` | Verifies early returns | +| Half-hour timezone (`Asia/Kolkata`) in `utcOffsetLabel` | Verifies `%+.2f` formatting | +| `sortLangStats` with ties and empty color map | Already partially tested | +| Empty `DailyContributions` → "No data available" path | Already partially tested indirectly | +| Golden SVG comparison per card for one theme | Catches regressions across refactors | + +--- + +## Not issues + +- No external Go deps. Good. +- `filepath.Join(outDir, t.ID)` — `t.ID` sourced from the themes map keys, validated through `theme.Lookup`. Safe. +- GraphQL query strings are compile-time constants. No injection surface. +- Commit-time error paths (`git add`, `git commit`, `git push`) run in Action workspace only and fail closed. + +--- + +## Unresolved questions + +- Should we land `-accurate-languages` (REST-per-commit + enry) as the primary fix for Markdown-blog attribution, or invest in `-deep` (partial bare clone) first? Roadmap has both. +- Is the single-slice donut case (I1) rare enough in practice to accept a compact fix, or worth refactoring to render all donuts with a full-ring base + pie slices on top (more robust, slightly more code)? +- Release workflow: run tests in matrix (cross-platform) or on linux only before shipping? Linux-only is practical; matrix catches platform bugs but triples minutes. 
diff --git a/plans/reports/researcher-260418-2001-accurate-language-stats.md b/plans/reports/researcher-260418-2001-accurate-language-stats.md new file mode 100644 index 00000000..87d7abb9 --- /dev/null +++ b/plans/reports/researcher-260418-2001-accurate-language-stats.md @@ -0,0 +1,185 @@ +# Most Commit Language — Accurate Attribution Research + +## Problem & constraints + +GitHub's default language detection (`repo.primaryLanguage`) ranks a repo's languages by total bytes, not by commit activity. Users with mixed-language repos (e.g., a blog with 3 JS files + 1000 Markdown files) get misattributed: Markdown is prose and is excluded from the byte stats, so every commit that only touches `.md` files is credited to JavaScript, even though JS is a tiny fraction of the repo's bytes and was not the language actually edited. This report evaluates methods to fix attribution by tracking which files each commit modifies. + +**Constraints:** Action runner cost (storage/time), REST API rate limits (5K/hr), accuracy trade-offs, language-detection reliability without repo access. + +--- + +## Prior art comparison table + +| Project | Method | Clone?
| Accuracy | Cost | Notes | +|---------|--------|--------|----------|------|-------| +| **anuraghazra/github-readme-stats (GRS)** | GraphQL `languages(first:10, orderBy:SIZE)` per repo | No | Byte-size only; **no commit weighting** | 1 GraphQL query/100 repos | Baseline: pure size-based, can't solve commit problem | +| **lowlighter/metrics (indepth)** | Clone repos → `git log --patch` per commit → linguist-js classify each file → accumulate by type | Yes | **Per-commit, per-line** | 15 min overall / 7.5 min per-repo timeouts (defaults); heavy | Ground truth; `categories=[programming,markup]` filter; handles `.gitattributes` | +| **lowlighter/metrics (default/recent)** | GraphQL byte-size (same as GRS) + recent event stream analysis | No | Byte-size + event heuristics | Lightweight | Not commit-weighted | +| **Proposed: REST per-commit + go-enry** | REST `/repos/{o}/{r}/commits/{sha}` for each commit → go-enry classify filenames/extensions | No | **Per-commit by filename**, plus per-file additions/deletions | 1 REST call/commit; 5K/hr limit ≈ 5,000 commits/hr | Fast; lightweight; no clone; accuracy limited to extension-level | + +--- + +## How github-readme-stats handles it + +**GRS does NOT solve the commit-attribution problem.** It computes language stats as: + +1. **GraphQL fetch:** `repositories(ownerAffiliations: OWNER, isFork: false, first: 100)` → for each repo: + - `languages(first: 10, orderBy: {field: SIZE, direction: DESC})` → get top 10 languages by **bytes** + - Accumulate `size` values across all repos; weight by `size_weight=1` and `count_weight=0` (default) +2. **Filters offered:** `exclude_repo` list only; no per-commit filtering, no commit-count weighting +3. **Documented limitation:** Users can use `exclude_repo` to hide problematic repos; no built-in commit-weighting + +**Conclusion:** GRS is optimized for byte-size ranking (good for codebases), not commit activity (good for contributor profiles). No per-commit analysis.
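The byte-size scoring described above takes only a few lines of Go. The sketch takes the `size_weight` / `count_weight` knobs at face value. It illustrates the scheme and is not a port of GRS's JavaScript. Neither knob looks at commits, which is the point made above.

```go
package main

import (
	"math"
	"sort"
)

// langTotals accumulates, per language, the summed byte size across repos and
// the number of repos the language appears in.
type langTotals struct {
	Size  float64
	Count float64
}

// rankBySizeAndCount scores each language as size^sizeWeight * count^countWeight
// and returns language names from highest to lowest score. sizeWeight=1,
// countWeight=0 is pure byte-size ranking.
func rankBySizeAndCount(repos []map[string]float64, sizeWeight, countWeight float64) []string {
	totals := map[string]*langTotals{}
	for _, langs := range repos {
		for lang, bytes := range langs {
			t, ok := totals[lang]
			if !ok {
				t = &langTotals{}
				totals[lang] = t
			}
			t.Size += bytes
			t.Count++
		}
	}
	score := make(map[string]float64, len(totals))
	names := make([]string, 0, len(totals))
	for lang, t := range totals {
		score[lang] = math.Pow(t.Size, sizeWeight) * math.Pow(t.Count, countWeight)
		names = append(names, lang)
	}
	sort.Slice(names, func(i, j int) bool {
		if score[names[i]] != score[names[j]] {
			return score[names[i]] > score[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}
```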
+ +--- + +## lowlighter/metrics — further details + +### Default categories +Confirmed: `plugin_languages_categories` default is `[markup, programming]` (excludes `data`, `prose`). For indepth mode, users can override to include `prose` (which includes Markdown). Markdown is classified as **TypeProse** by go-enry (type code 4). + +### Per-file analysis in indepth mode +- Clones repo to temp directory +- Runs `git log --author= --patch` to fetch each commit with diff +- Parses unified diff to extract file paths and line counts (added/deleted per file) +- Calls `linguist-js` (a Node.js port that reuses linguist's language data) to classify each file +- Accumulates `{bytes, lines}` per language per commit +- Filters by `categories` to exclude unwanted types + +### Clone timeout & safety +- `plugin_languages_analysis_timeout`: 15 min overall (default); a separate per-repository timeout defaults to 7.5 min +- **No REST-only fallback mode** — if clone fails, that repo is skipped; no graceful degradation +- Symlinks, submodules, large binaries handled by linguist; `.gitattributes` **IS** respected (checked from cloned repo) + +### De-duplication & fork handling +- Indepth fetches only repos where user is OWNER (not forks, not contributed-to repos) +- Counts only commits matching `--author` regex (authoring email list fetched from GPG keys) +- Deduplicates by commit SHA within session + +### REST-only mode does NOT exist +lowlighter/metrics has two modes: default (byte-size, no clone) and indepth (clone + patch analysis). No hybrid REST-per-commit approach exposed.
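The indepth step that parses unified diffs can be approximated in Go as shown below. This is a simplified sketch and not lowlighter/metrics' implementation, which is JavaScript. It only tallies added and deleted lines per path. Renames and binary files get no special handling, and all names are hypothetical.

```go
package main

import (
	"bufio"
	"strings"
)

// fileChurn holds added/deleted line counts for one path.
type fileChurn struct {
	Added, Deleted int
}

// parsePatchChurn tallies added/deleted lines per path from unified-diff text
// such as `git log --patch` output. File headers (---/+++) are only
// recognised before a file's first @@ hunk, so content lines that happen to
// start with "--" or "++" are still counted. Lines longer than
// bufio.Scanner's default 64 KiB limit stop the scan early.
func parsePatchChurn(patch string) map[string]*fileChurn {
	out := map[string]*fileChurn{}
	var cur *fileChurn
	oldPath, inHunk := "", false
	sc := bufio.NewScanner(strings.NewReader(patch))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "diff --git "):
			cur, oldPath, inHunk = nil, "", false
		case !inHunk && strings.HasPrefix(line, "--- "):
			oldPath = strings.TrimPrefix(strings.TrimPrefix(line, "--- "), "a/")
		case !inHunk && strings.HasPrefix(line, "+++ "):
			p := strings.TrimPrefix(strings.TrimPrefix(line, "+++ "), "b/")
			if p == "/dev/null" { // deleted file: charge lines to its old path
				p = oldPath
			}
			if out[p] == nil {
				out[p] = &fileChurn{}
			}
			cur = out[p]
		case strings.HasPrefix(line, "@@"):
			inHunk = true
		case inHunk && cur != nil && strings.HasPrefix(line, "+"):
			cur.Added++
		case inHunk && cur != nil && strings.HasPrefix(line, "-"):
			cur.Deleted++
		}
	}
	return out
}
```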
+ +--- + +## The go-enry + REST-per-commit approach — validation + +### Module details +- **Module path:** `github.com/go-enry/go-enry/v2` (Apache-2.0 license) +- **Status:** Actively maintained; last push 2026-04-04; 603 stars; language data imported from github/linguist +- **Go version:** 1.14+ +- **Pre-compiled data:** Yes — language metadata (types, colors, patterns) embedded in `data/*.go` files; auto-generated from github/linguist + +### Language type mapping — confirmed for Markdown +```go +// Type int: 0=Unknown, 1=Data, 2=Programming, 3=Markup, 4=Prose +// Markdown = Type 4 (Prose) +var LanguagesType = map[string]int{ + "Markdown": 4, // Prose + "YAML": 1, // Data + "JSON": 1, // Data + "JavaScript": 2, // Programming +} +``` + +**Implication:** If we use go-enry and filter to `types=[2, 3]` (Programming + Markup), Markdown gets excluded. To include Markdown, must expand to `types=[2, 3, 4]` or add it to a custom whitelist. + +### Extension-only vs content-based classification +go-enry has two classification modes: +1. **Extension-only** (`GetLanguageByExtension("file.md")` → "Markdown"): Fast, no file content needed +2. **Content-based** (`GetLanguageByContent(filename, content)` → "Markdown"): Slower; handles ambiguous extensions (e.g., `.r` for R vs Rebol) + +**REST API provides:** Filename + additions/deletions per file, NO file content. So **we're limited to extension-only mode**, which we estimate at **roughly 90% accurate** (fails on ambiguous extensions, but Markdown `.md` is unambiguous).
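Extension-only classification combined with the type filter can be sketched in Go without go-enry. The sketch uses a hand-written table that mirrors the type codes above. `extTable` is a deliberately tiny, hypothetical subset. In the proposed design, the language and type data would come from go-enry instead.

```go
package main

import (
	"path"
	"strings"
)

// langType mirrors the numeric codes quoted above: 0=Unknown, 1=Data,
// 2=Programming, 3=Markup, 4=Prose.
type langType int

const (
	typeUnknown langType = iota
	typeData
	typeProgramming
	typeMarkup
	typeProse
)

type langInfo struct {
	Name string
	Type langType
}

// extTable is a hypothetical, deliberately small extension table.
var extTable = map[string]langInfo{
	".go":   {"Go", typeProgramming},
	".js":   {"JavaScript", typeProgramming},
	".html": {"HTML", typeMarkup},
	".md":   {"Markdown", typeProse},
	".json": {"JSON", typeData},
	".yml":  {"YAML", typeData},
}

// classify returns the language for a filename by (case-insensitive)
// extension alone, plus whether its type is in the allowed set. Unknown
// extensions return ("", false).
func classify(filename string, allowed map[langType]bool) (string, bool) {
	info, ok := extTable[strings.ToLower(path.Ext(filename))]
	if !ok {
		return "", false
	}
	return info.Name, allowed[info.Type]
}
```

To opt in to Prose, add `typeProse` to the allowed set. Nothing else changes.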
+ +### `.gitattributes` support — NO +- go-enry does NOT parse `.gitattributes` directives (`linguist-vendored`, `linguist-generated`, `linguist-documentation`) +- Those live in the repo; without cloning, we can't access them +- **Accuracy delta:** Most users don't use `.gitattributes` heavily; estimated 5-10% false positives (counting generated/vendored code as real) +- **Mitigation:** Offer optional `.gitattributes` fetch via `GET /repos/{o}/{r}/contents/.gitattributes` if file exists (one extra REST call/repo) + +### Binary/submodule/symlink handling +- go-enry's extension-based approach is safe: won't misclassify `.exe`, `.so`, etc. (no match → `OtherLanguage`) +- Submodules: filenames include submodule path; classified by extension of filename (safe) +- Symlinks: the commit's file list shows the link's own path, so go-enry classifies by the link's extension rather than the target's (acceptable) + +### Known limitations +- **No semantic analysis:** classification never looks inside files (extension-based only) +- **Ambiguous extensions:** `.r` can be R or Rebol; `.ts` can be TypeScript or XML (Qt Linguist translation files). Content-based classification helps, but we don't have content. +- **Custom language aliases:** go-enry uses fixed language names from linguist; user aliases (e.g., renaming "TypeScript" to "TS") would need client-side mapping + +--- + +## Cost estimation: REST per-commit approach + +**Assumptions:** Top 10 repos, 100 commits/repo (user-configurable), 1K REST calls/run + +**Rate limits:** 5K requests/hour (authenticated), so one run uses about a fifth of the hourly budget. GitHub's secondary rate limits also cap REST traffic at about 900 points (roughly 900 GET requests) per minute, so 1K calls need at least a minute or so of API time; sequential calls take longer. + +**Safety:** Within the hourly limit, provided requests are throttled below the per-minute cap; safe for GitHub Actions runners. + +**Breakdown per commit:** +- `GET /repos/{o}/{r}/commits/{sha}` → 1 call, returns `files[].{filename, additions, deletions}` +- go-enry local classification → negligible (microseconds per file) +- No clone, no network I/O beyond REST + +**Feasibility:** High.
Straightforward to parallelize (batch SHA fetches) if needed. + +--- + +## Additional ideas + +### Idea 1: Sampling + statistical estimation +Fetch 10-20 commits evenly distributed across repo history (e.g., every Nth commit). Use frequency to extrapolate total language distribution. **Trade-off:** 10x faster, ~70% accuracy (misses long-term shifts). **Verdict:** Useful for low-precision preview card, not for serious stats. + +### Idea 2: GraphQL Commit history with limited file info +GitHub GraphQL `repository.ref.target.history` supports commit queries, but only returns commit count + author info. NO per-file details. Checked: `Commit` object has no `files` or `changedFilesIfAvailable` field exposing filenames. **Verdict:** Dead end; no edge advantage over REST. + +### Idea 3: Docker + linguist Ruby gem +Run `linguist` CLI inside Docker during Action. Requires cloning repos into Action storage (same cost as lowlighter/metrics). More accurate than go-enry (Bayesian disambiguation). **Trade-off:** Heavy; docker image size; slower startup. **Verdict:** Over-engineered for our use case; go-enry sufficient. + +### Idea 4: Fetch `.gitattributes` explicitly +One additional `GET /repos/{o}/{r}/contents/.gitattributes` per repo. Parse it locally; override go-enry classification for files tagged `linguist-vendored` or `linguist-generated`. **Cost:** 1 call/repo; minimal. **Accuracy gain:** ~5-10% fewer false positives. **Verdict:** Worth doing; low cost, reasonable gain. + +### Idea 5: Combine REST per-commit with GraphQL for byte-size fallback +Query `GET /repos/{o}/{r}/languages` as secondary validation: if REST commit analysis yields unexpected results (e.g., 99% JavaScript despite repo being mostly Markdown), fall back to byte-size weighted stats. **Trade-off:** Adds complexity; not needed if go-enry is trusted. **Verdict:** Skip for v1; revisit after validation. 
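Idea 4's override needs only a small local parser once the file's text is in hand (the contents endpoint returns it base64-encoded in `content`). The sketch below approximates git's rules and does not replicate them. `parseGitattributes`, `matches` and `excluded` are hypothetical helpers. The sketch handles only `linguist-vendored` and `linguist-generated` and ignores `!attr` (unspecified) and macro attributes. It also supports just three pattern forms. As in git, when several lines match a path, the last one wins for each attribute.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// rule is one .gitattributes line reduced to the linguist flags it sets.
type rule struct {
	pattern string
	attrs   map[string]bool // attribute -> set (true) or unset (false)
}

// parseGitattributes keeps only linguist-vendored / linguist-generated settings.
func parseGitattributes(src string) []rule {
	var rules []rule
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || strings.HasPrefix(fields[0], "#") {
			continue // blank line, comment, or pattern without attributes
		}
		r := rule{pattern: fields[0], attrs: map[string]bool{}}
		for _, a := range fields[1:] {
			val := true
			switch {
			case strings.HasPrefix(a, "-"): // "-attr" unsets the attribute
				a, val = a[1:], false
			case strings.HasSuffix(a, "=false"):
				a, val = strings.TrimSuffix(a, "=false"), false
			case strings.HasSuffix(a, "=true"):
				a = strings.TrimSuffix(a, "=true")
			}
			if a == "linguist-vendored" || a == "linguist-generated" {
				r.attrs[a] = val
			}
		}
		if len(r.attrs) > 0 {
			rules = append(rules, r)
		}
	}
	return rules
}

// matches is a simplified matcher: "dir/**" matches everything under dir,
// a pattern without "/" matches the basename at any depth, and any other
// pattern is matched with path.Match against the repo-relative path.
func matches(pattern, file string) bool {
	if strings.HasSuffix(pattern, "/**") {
		dir := strings.TrimPrefix(strings.TrimSuffix(pattern, "**"), "/")
		return strings.HasPrefix(file, dir)
	}
	if !strings.Contains(pattern, "/") {
		ok, _ := path.Match(pattern, path.Base(file))
		return ok
	}
	ok, _ := path.Match(strings.TrimPrefix(pattern, "/"), file)
	return ok
}

// excluded reports whether file should be dropped from language stats.
// Later matching lines override earlier ones, per attribute.
func excluded(rules []rule, file string) bool {
	state := map[string]bool{}
	for _, r := range rules {
		if matches(r.pattern, file) {
			for k, v := range r.attrs {
				state[k] = v
			}
		}
	}
	return state["linguist-vendored"] || state["linguist-generated"]
}

func main() {
	rules := parseGitattributes(`
# generated protobuf code
*.pb.go linguist-generated
third_party/** linguist-vendored
third_party/keep.go -linguist-vendored
`)
	for _, f := range []string{"api/user.pb.go", "third_party/lib/x.c", "third_party/keep.go", "main.go"} {
		fmt.Println(f, excluded(rules, f)) // true, true, false, false
	}
}
```

In the per-commit loop, `excluded` would run before classification. A production version would need git's full pattern semantics, including character classes, `**` in the middle of a path, and nested `.gitattributes` files.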
+
+---
+
+## Recommendation
+
+**Proceed with: REST per-commit + go-enry, with optional `.gitattributes` override**
+
+**Exact approach:**
+1. For each user repo, list commits authored by the user (`?author={user}`; paginated, limit configurable, default 100/repo)
+2. Per commit SHA: `GET /repos/{o}/{r}/commits/{sha}` → extract `files[].filename` and `files[].additions`
+3. Classify each filename via `go-enry/v2` extension-based detection
+4. Accumulate commit count + lines added per language per repo
+5. **Optional:** Fetch `.gitattributes` per repo; exclude files marked `linguist-generated` / `linguist-vendored`
+6. Filter output by language type (exclude `Unknown` / `Data`; include `Programming`, `Markup`, optionally `Prose`)
+7. Rank by commit count (primary) or lines added (secondary weighting option)
+
+**Why this beats alternatives:**
+- **vs. byte-size (GRS):** Commit-weighted, not size-weighted → accurate for mixed-language repos
+- **vs. cloning (lowlighter):** No repo clone → safe for an Action runner; roughly a minute or two of API time for a typical user (bounded by secondary rate limits)
+- **vs. sampling:** Covers every commit in the fetched window instead of extrapolating from a sample
+- **Cost:** ~1K REST calls/run, about 20% of the 5K/hour PAT budget
+
+**Estimated accuracy:** 90-95% (limited by extension-only classification); we expect the `.gitattributes` override to lift this to 95-98%
+
+---
+
+## Unresolved questions
+
+- Should **Prose** (Markdown, AsciiDoc, reStructuredText) be included in default output, or only **Programming + Markup**? (lowlighter defaults to Programming + Markup; users opt in to Prose)
+- Do we want **lines-added weighted** language ranking, or just **commit-count weighted**? (e.g., 1-line typo fix vs. 500-line refactor)
+- Should **vendored/generated code** be excluded by default, or configurable? (`.gitattributes` parsing adds 1 call/repo; users likely don't care)
+- How do we handle repos with **no commits by the user** (e.g., organization repos where the user never committed)? (Skip? Count PR reviews? Leave blank?)
+- Fallback behavior if **go-enry can't classify a file** (e.g., an unrecognized extension): count as "Other" or skip?
+
+---
+
+**Sources:**
+- [anuraghazra/github-readme-stats](https://github.com/anuraghazra/github-readme-stats)
+- [lowlighter/metrics](https://github.com/lowlighter/metrics)
+- [go-enry/go-enry](https://github.com/go-enry/go-enry)
+- [GitHub REST API: Get a commit](https://docs.github.com/en/rest/commits/commits)
+- [GitHub REST API: Rate limits](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api)
diff --git a/plans/reports/researcher-260418-2012-profile-stats-survey.md b/plans/reports/researcher-260418-2012-profile-stats-survey.md
new file mode 100644
index 00000000..bf7d4ebb
--- /dev/null
+++ b/plans/reports/researcher-260418-2012-profile-stats-survey.md
@@ -0,0 +1,148 @@
+# Profile Stats Tools: Commit Attribution Survey (Round 2)
+
+## Summary
+
+Investigated 6 profile-stats projects, 2 template generators and a BigQuery aggregate tool. **No project solves per-commit language attribution better than the proposed REST + go-enry approach.** Two categories emerged. (1) **Byte-size only** (GRS, jstrieb/github-stats, TraceLD): these acknowledge the problem without fixing it. (2) **WakaTime-only** (anmol098, athul): these bypass the GitHub language API entirely via editor telemetry. A DEV.to post describes one per-commit CLI, but its repo could not be found on public GitHub. **Recommendation unchanged:** REST per-commit + go-enry + optional `.gitattributes` override is the frontier.
+
+---
+
+## Project-by-project findings
+
+| Project | Repo URL | Primary Lang | Algorithm | Solves Commit Attribution? |
+|---------|----------|--------------|-----------|---------------------------|
+| **jstrieb/github-stats** | github.com/jstrieb/github-stats | Python | GraphQL `languages(orderBy:SIZE)` | No: byte-size only; explicit TODO: "Improve languages to scale by number of contributions" |
+| **anmol098/waka-readme-stats** | github.com/anmol098/waka-readme-stats | Python | WakaTime API editor telemetry | Yes, but orthogonal: not GitHub stats; bypasses the problem entirely |
+| **athul/waka-readme** | github.com/athul/waka-readme | Python | WakaTime API (simpler wrapper) | Yes, but orthogonal: WakaTime-only |
+| **yoshi389111/github-profile-3d-contrib** | github.com/yoshi389111/github-profile-3d-contrib | TypeScript | GitHub GraphQL contributions calendar | N/A: contributions only, no languages |
+| **sarthakhingankar/github-profile-readme-generator** | (repo not found / archived) | n/a | n/a | n/a |
+| **rahul-jha98/github-profile-readme-generator** | (repo not found / archived) | n/a | n/a | n/a |
+| **TraceLD/github-user-language-breakdown** | github.com/TraceLD/github-user-language-breakdown | TypeScript | Byte-size aggregation (`/api/langs`) | No: frontend app; backend not audited but calls a generic language API |
+| **madnight/githut** | github.com/madnight/githut | JavaScript | Google BigQuery public GitHub dataset | No: aggregate repo stats, not per-user commits |
+
+---
+
+## Detailed findings
+
+### jstrieb/github-stats
+- **Active:** Yes (last push 2026-04-18, 3.4K stars)
+- **What it does:** Python CLI → GraphQL user repos → per-repo `languages(first:10, orderBy:{SIZE})` → accumulate byte-size
+- **Commit attribution:** Explicitly does NOT; source has TODO comment: `# TODO: Improve languages to scale by number of contributions to`
+- **Cost:** 1 GraphQL query per 100 repos
+- **Novel:** Handles private repos via token; otherwise standard byte-size approach
+- **Verdict:** Aware of the problem, chose not to solve it (likely due to REST rate-limit concerns)
+
+### anmol098/waka-readme-stats
+- **Active:** Yes (last push 2026-04-14, 3.9K stars)
+- **What it does:** GitHub Action → fetches WakaTime API → displays editor time-in-language breakdown
+- **Commit attribution:** **Yes**, but only if the user has WakaTime installed and active
+- **Cost:** WakaTime telemetry (user's editor plugin); no GitHub API calls for language stats
+- **Solves blog repo problem?** YES, because WakaTime tracks actual editor time, not bytes. A user who edits 3 JS files in a Markdown blog repo is credited with JavaScript only for the time actually spent editing those files.
+- **Limitation:** Requires WakaTime setup; doesn't work offline; not a pure GitHub solution
+- **Verdict:** Different UX. It solves the problem *orthogonally* and doesn't use the GitHub language API at all.
+
+### athul/waka-readme
+- **Active:** Yes (last push 2026-02-18, 1.8K stars)
+- **What it does:** Simpler WakaTime wrapper; GitHub Action fetches WakaTime API only
+- **Commit attribution:** **Yes**, same as anmol098, via WakaTime editor telemetry
+- **Verdict:** WakaTime alternative; no GitHub language innovation
+
+### TraceLD/github-user-language-breakdown
+- **Active:** Yes (last push 2025-02-27, 55 stars)
+- **What it does:** Frontend (Vite + TypeScript) → calls `/api/langs` backend → returns byte-size breakdown
+- **Commit attribution:** No. The backend was not audited; the frontend aggregates by bytes
+- **Verdict:** Small project; no novel approach
+
+### madnight/githut
+- **Active:** Inactive (last push 2024-04-03, 1K stars)
+- **What it does:** Google BigQuery + GitHub public dataset → aggregate language stats across all public repos
+- **Commit attribution:** No; designed for ecosystem trends, not per-user stats
+- **Verdict:** Ecosystem-scale analysis tool; not relevant to individual profile cards
+
+### yoshi389111/github-profile-3d-contrib
+- **Active:** Yes (last push 2026-04-15, 1.6K stars)
+- **What it does:** 3D contribution calendar visualization
+- **Language stats:** N/A: contributions only
+- **Verdict:** Orthogonal to the language problem
+
+---
+
+## The mysterious per-commit CLI
+
+DEV.to post by maxfriedmann (Feb 2026): "I built a CLI to see my real GitHub language stats — does something like this already exist?"
+
+> *"scanning every commit you've personally authored on GitHub — including private repos — and calculates how many lines you've changed per programming language"*
+
+- **Repo:** Could not locate on public GitHub
+- **Likely approach:** REST `GET /repos/{o}/{r}/commits` + parse diff → linguist/go-enry classify files → aggregate lines per language
+- **If it exists:** This is exactly the REST per-commit + linguist approach proposed in the prior report
+- **Status:** Appears to be a personal/private project, or not yet published
+- **Significance:** Validates that the proposed approach is feasible, and novel enough to be noteworthy as recently as February 2026
+
+---
+
+## Language classification ecosystem: current state
+
+| Approach | Maturity | Solves commit problem? | Cost | Trade-offs |
+|----------|----------|----------------------|------|------------|
+| **Byte-size (GitHub default)** | Stable, no code needed | No | GraphQL 1 call/100 repos | Simple; fundamentally broken for mixed-language repos |
+| **Repository language count** | Stable (vn7n24fzkq) | No (counts repos per language, not commits) | Same as above | Slightly less broken; still size-biased |
+| **WakaTime editor telemetry** | Requires opt-in | Yes, but not GitHub-only | User's telemetry; 0 GitHub API calls | Accurate; private; off-GitHub; requires user setup |
+| **REST per-commit + go-enry** | Not yet packaged; proposed | Yes (est. 90%–95% accuracy) | 1 REST call/commit (5K/hr PAT budget) | Fast; no clone; extension-limited; no `.gitattributes` support |
+| **REST per-commit + go-enry + `.gitattributes`** | Proposed (this project) | Yes (est. 95%–98% accuracy) | +1 REST call/repo for attrs | Same + minimal overhead for ~5% accuracy gain |
+| **Clone + linguist Ruby gem** | Stable (lowlighter/metrics) | Yes (99% accuracy) | 15 sec timeout; storage | Accurate; slow; heavy; clones entire repo |
+| **Clone + linguist-js** | Stable (lowlighter/metrics) | Yes (99% accuracy) | 15 sec timeout; storage | Same as Ruby gem |
+
+---
+
+## Did anyone solve it better?
+
+**No.** The landscape is:
+1. **Byte-weighted GitHub stats:** easy, broken, everyone does it (GRS, jstrieb, others)
+2. **WakaTime editor telemetry:** orthogonal; requires opt-in; doesn't use the GitHub API
+3. **Cloning repos:** accurate but slow (lowlighter/metrics)
+4. **REST per-commit + go-enry:** middle ground, not yet packaged as a standalone tool
+
+**Null result:** No project uses `GET /repos/{o}/{r}/commits/{sha}` + go-enry/linguist for per-commit classification and packages it as a reusable tool. The DEV.to post describes such a CLI, but its repo could not be found on public GitHub. This suggests it is (a) private/personal, (b) abandoned, or (c) not yet open-sourced.
+
+---
+
+## Implication for ghstats
+
+**Prior recommendation stands.** REST per-commit + go-enry is:
+- **Frontier-tier:** no packaged competitor exists yet
+- **Feasible:** go-enry is performant; REST budgets fit; no cloning overhead
+- **Accurate enough:** an estimated 90–95% for extension-only; 95–98% with `.gitattributes`
+- **Testable:** can validate against lowlighter/metrics cloned results (regression test)
+
+**Action:** Proceed with REST per-commit + go-enry implementation for ghstats v1. Add the `.gitattributes` override as Phase 2 if accuracy feedback demands it.
+
+---
+
+## New ideas surfaced
+
+- **Idea A:** Reach out to maxfriedmann (DEV.to) to find or acquire their per-commit CLI code, if it has actually been built. This might save months of engineering.
+- **Idea B:** Offer ghstats as a GitHub Action alternative to WakaTime for users who don't want editor telemetry but want accurate stats. Differentiate: "GitHub-only, no telemetry setup, REST-fast."
+- **Idea C:** Add a `.gitattributes` fetcher as an optional HTTP call per repo; toggle via config. Minimal cost for a significant accuracy gain on projects that use `linguist-*` directives.
+
+---
+
+## Unresolved questions
+
+- **What repo is the DEV.to per-commit CLI?** Could it be claimed, forked, or improved?
+- **Should ghstats include Prose (Markdown)?** Default to Programming+Markup only, with opt-in for Prose?
+- **How to handle repos with zero user commits?** Skip, count PR reviews, or leave blank?
+- **Fallback behavior if go-enry can't classify a file?** Count as "Other" or skip?
+- **Should `.gitattributes` parsing be a v1 or v2 feature?** (Adds 1 REST call/repo; ~5% accuracy gain)
+
+---
+
+**Sources:**
+- [jstrieb/github-stats](https://github.com/jstrieb/github-stats)
+- [anmol098/waka-readme-stats](https://github.com/anmol098/waka-readme-stats)
+- [athul/waka-readme](https://github.com/athul/waka-readme)
+- [yoshi389111/github-profile-3d-contrib](https://github.com/yoshi389111/github-profile-3d-contrib)
+- [TraceLD/github-user-language-breakdown](https://github.com/TraceLD/github-user-language-breakdown)
+- [madnight/githut](https://github.com/madnight/githut)
+- [I built a CLI to see my real GitHub language stats (DEV.to)](https://dev.to/maxfriedmann/i-built-a-cli-to-see-my-real-github-language-stats-does-something-like-this-already-exist-1n18)
+- [go-enry/go-enry](https://github.com/go-enry/go-enry)
+- [GitHub Docs: About repository languages](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-repository-languages)