8 Commits

Author SHA1 Message Date
Thieu Nguyen c84f4ac905 docs: sync API endpoint documentation with codebase (#632)
Add ~55 undocumented REST endpoints and ~30 undocumented WS methods.
Remove 14 stale entries (custom tools HTTP, sessions HTTP, delegations WS).
New sections: tenants, system-configs, edition, TTS, browser, zalo, export/import.
2026-04-02 08:24:08 +07:00
Kai (Tam Nhu) Tran 3ca3bb2062 feat: add capability-aware reasoning effort controls (#593)
* feat(reasoning): add capability-aware effort resolution

- resolve requested reasoning levels against exact model capabilities

- persist requested effort on agents and expose effective effort in traces

- add backend tests for provider models, agent store, and resolution logic

Refs #591
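
The resolution step above could look roughly like the following. This is a minimal sketch, not the code from this commit: the level names, the `capabilities` table, the downgrade-to-nearest-lower rule, and the `ResolveEffort` signature are all assumptions made for illustration.

```go
package main

// effortOrder lists hypothetical effort levels from weakest to strongest.
var effortOrder = []string{"off", "low", "medium", "high"}

// capabilities maps an exact model ID to the efforts it supports.
// The entries are illustrative, not real provider metadata.
var capabilities = map[string][]string{
	"example-model-a": {"off", "low", "medium"},
	"example-model-b": {"off"},
}

// rank returns the position of an effort in effortOrder, or -1 if unknown.
func rank(effort string) int {
	for i, e := range effortOrder {
		if e == effort {
			return i
		}
	}
	return -1
}

// ResolveEffort maps a requested effort to the effective one for a model.
// Assumption: unknown models and unknown efforts resolve to "off".
func ResolveEffort(model, requested string) (effective, reason string) {
	supported, ok := capabilities[model] // exact match only, no prefix lookup
	if !ok {
		return "off", "unknown model"
	}
	want := rank(requested)
	if want < 0 {
		return "off", "unknown effort"
	}
	best := -1
	for _, s := range supported {
		// Keep the strongest supported level that does not exceed the request.
		if r := rank(s); r <= want && r > best {
			best = r
		}
	}
	switch {
	case best < 0:
		return "off", "no supported level"
	case best == want:
		return requested, "supported"
	default:
		return effortOrder[best], "downgraded"
	}
}
```

Under these assumptions, `ResolveEffort("example-model-a", "high")` yields `"medium"` with reason `"downgraded"`, which is the kind of requested-versus-effective pair a trace could expose.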

* feat(ui): gate reasoning controls by model capabilities

- only show supported reasoning levels when provider model metadata is available

- preserve expert reasoning selections during async model loading

- surface effective reasoning details in trace dialogs and localized copy

Refs #591

* docs(api): document capability-aware reasoning controls

- describe exact-match capability lookup and downgrade behavior

- update provider model metadata and trace response documentation

- refresh the generated OpenAPI spec for the new reasoning fields

Refs #591

* feat: add provider-first reasoning controls

* docs: refresh PR 593 UI evidence callouts

* refactor: deduplicate reasoning normalize functions and remove PR evidence

- Export NormalizeReasoningEffort/NormalizeReasoningFallback from providers
  package; store package now delegates instead of duplicating
- Store reasoning fallback constants alias providers canonical definitions
- Export deriveLegacyThinkingLevel from types/provider.ts; remove local
  copies from agent-advanced-dialog and provider-overview
- Remove unused _providerType param from useProviderModels hook
- Fix reasoning debug log to fire for all cases with a reason (not just
  non-off efforts)
- Remove docs/pr-593-evidence/ binary screenshots from repo

---------

Co-authored-by: viettranx <viettranx@gmail.com>
2026-03-31 07:56:01 +07:00
Kai (Tam Nhu) Tran 4c60dd021e fix: clarify container-scoped runtime warnings for minimal images (#395)
* fix(ui): clarify container-scoped runtime warnings

* docs(runtime): clarify docker image variant expectations

* test(tools): align media path expectations with workspace policy

* docs(tests): narrow message media path contract wording
2026-03-30 21:44:53 +07:00
Kai (Tam Nhu) Tran 30708ae79d feat(providers): support Codex OAuth pools with inherited routing defaults
* feat(auth): support named chatgpt oauth providers

- add provider-scoped ChatGPT OAuth routes and CLI support

- persist refresh tokens per provider and reject provider-type collisions

- wire provider OAuth setup flows in the dashboard and setup UI

Refs #448

* feat(agent): add chatgpt oauth account routing

- add agent other_config routing for manual and round-robin selection

- reuse routed provider resolution across resolver and pending loaders

- add router, parser, and agent advanced dialog coverage for multi-account use

Refs #448
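
Manual versus round-robin selection, as described above, can be sketched as follows. The `Pool` type, its constructor, and the `Pick` signature are hypothetical; the real router lives in the agent's other_config handling and may differ.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// Pool is a hypothetical set of named OAuth provider accounts.
type Pool struct {
	providers []string
	next      atomic.Uint64 // rotation counter, safe for concurrent callers
}

// NewPool builds a pool from provider names in their configured order.
func NewPool(providers ...string) *Pool {
	return &Pool{providers: providers}
}

// Pick returns the manually selected provider when manual is non-empty,
// otherwise the next provider in round-robin order.
func (p *Pool) Pick(manual string) (string, error) {
	if len(p.providers) == 0 {
		return "", errors.New("empty pool")
	}
	if manual != "" {
		for _, name := range p.providers {
			if name == manual {
				return name, nil
			}
		}
		return "", errors.New("provider not in pool")
	}
	n := p.next.Add(1) - 1 // first call yields index 0
	return p.providers[n%uint64(len(p.providers))], nil
}
```

With `NewPool("a", "b")`, three calls to `Pick("")` return `a`, `b`, `a`, while `Pick("b")` always returns `b`.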

* docs(api): describe chatgpt oauth routing

- document named-provider ChatGPT OAuth auth routes

- describe agent-side account routing and round-robin behavior

- update OpenAPI agent config schema and provider type enum

Refs #448

* fix(store): add missing agent key context helpers

* feat(ui): clarify chatgpt oauth account setup and routing

* docs(providers): align chatgpt oauth alias examples

* feat(agent): add codex pool activity dashboard

* fix(providers): harden codex oauth alias setup

* feat(codex-pool): improve routing dashboard UX

- redesign the Codex/OpenAI pool page around saved-pool checkpoints and live evidence

- add clearer selection, attention, and recent-proof states for pool members

- make the lower panels fill the remaining desktop viewport while staying responsive

* fix(store): resolve context helper merge duplication

* feat(oauth): add codex pool quota and observation APIs

- add quota inspection and observation endpoints for ChatGPT Subscription (OAuth) providers

- teach codex routing to surface pool activity, observation metadata, and quota-aware readiness

- extend tests and HTTP docs/OpenAPI for the new pool monitoring flows

* feat(web): add codex pool quota monitor and controls

- add provider quota fetching, readiness badges, and live routing evidence on the account pool page

- redesign pool setup and activity panels for multi-account management with localized copy updates

- keep the live monitor internally scrollable and compact the account cards for better viewport fit

* fix(web): clarify pool routing labels

- rename the recent request badge from Direct to Selected

- restore compact quota bars in the live pool cards

* feat(codex-pool): add runtime health dashboard

- derive per-provider success and failure health from routed Codex traces

- surface routing, quota, and recent request evidence in the pool UI

- align provider alias guidance and owner access with the dashboard role model

* docs(auth): document tenant scoping and key roles

* fix(auth): harden tenant and codex pool access control

* fix(providers): align codex pool runtime defaults

* feat(ui): tighten codex pool responsive layout

* feat(chatgpt-oauth): refine codex pool management UX

* feat(chatgpt-oauth): surface quota bars on provider pages

- add compact quota bars to Codex provider rows and provider detail

- fetch quota only for ready visible provider rows and ready detail aliases

- fix managed-member detail visibility and tighten provider locale copy
2026-03-27 09:35:57 +07:00
Viet Tran cd022699f6 feat: multi-tenant isolation — complete implementation (#359)
* feat(security): multi-tenant user data isolation (Plan 1)

Comprehensive user data isolation for non-owner system users:

- API key identity binding: owner_id column forces user_id on auth,
  prevents spoofing via X-GoClaw-User-Id header
- Sessions: ownership checks on list/preview/patch/delete/reset,
  non-admin users see only their own sessions
- Cron: user_id filtering on list, ownership checks on mutations
- Server-side WS event filtering: agent/chat/session/cron/team events
  scoped per-user instead of broadcast to all clients
- Web UI role guards: RequireAdmin on 15 admin-only pages, role
  propagated from WS connect response to auth store
- Tracing/activity: user_id enforcement for non-admin HTTP callers
- Teams: HasTeamAccess membership checks on get/delete/list
- Skills: fail-closed ownership check (deny non-admin if store
  doesn't support owner lookup)
- HTTP auth: requireAuthBearer now enforces owner_id + user context
  for file/media downloads (was missing)
- Dead code: removed delegation_history, handoff_routes tables and
  all related handlers/store code
- New: team_user_grants table for user-to-team access control

Migration 000026: api_keys.owner_id + team_user_grants + DROP legacy tables

* feat(security): multi-tenant foundation — tenants table, tenant_id propagation, permission cache (Plan 2)

Add tenant isolation infrastructure across the entire gateway:

Schema (migration 000027):
- Create tenants + tenant_users tables with master tenant seed
- Add tenant_id column to 30 user-scoped tables (NOT NULL DEFAULT master)
- api_keys.tenant_id nullable (NULL = system-level cross-tenant key)
- Create builtin_tool_tenant_configs + skill_tenant_configs for per-tenant overrides
- Drop custom_tools table (agent loop integration never wired)

Store layer:
- TenantStore interface + PGTenantStore (CRUD tenants + tenant_users)
- TenantID field on AgentData + APIKeyData
- tenant_id in agents/api_keys/skills SQL (Create, Get, List)

Context propagation:
- WithTenantID/TenantIDFromContext (uuid.Nil = fail-closed)
- WithCrossTenant/IsCrossTenant (owner/system admin flag)
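
A minimal sketch of the context helpers named above, assuming plain `context.WithValue` storage. `TenantID` is a stand-in for a UUID type whose zero value plays the role of uuid.Nil, and `AuthContext` and `RequireTenantID` are hypothetical helpers added only to show the fail-closed behavior.

```go
package main

import (
	"context"
	"errors"
)

// TenantID stands in for a UUID; the zero value plays the role of uuid.Nil.
type TenantID [16]byte

type ctxKey int

const (
	tenantKey ctxKey = iota
	crossTenantKey
)

// WithTenantID attaches a tenant to the context.
func WithTenantID(ctx context.Context, id TenantID) context.Context {
	return context.WithValue(ctx, tenantKey, id)
}

// TenantIDFromContext returns the zero TenantID when none was set.
func TenantIDFromContext(ctx context.Context) TenantID {
	id, _ := ctx.Value(tenantKey).(TenantID)
	return id
}

// WithCrossTenant marks the caller as an owner/system admin.
func WithCrossTenant(ctx context.Context) context.Context {
	return context.WithValue(ctx, crossTenantKey, true)
}

// IsCrossTenant reports whether the cross-tenant flag is set.
func IsCrossTenant(ctx context.Context) bool {
	ok, _ := ctx.Value(crossTenantKey).(bool)
	return ok
}

// AuthContext is a hypothetical helper mimicking what auth resolution does.
func AuthContext(id TenantID, crossTenant bool) context.Context {
	ctx := WithTenantID(context.Background(), id)
	if crossTenant {
		ctx = WithCrossTenant(ctx)
	}
	return ctx
}

// RequireTenantID fails closed: a zero tenant without the cross-tenant
// flag is an error, never "all tenants".
func RequireTenantID(ctx context.Context) (TenantID, error) {
	id := TenantIDFromContext(ctx)
	if id == (TenantID{}) && !IsCrossTenant(ctx) {
		return id, errors.New("tenant context missing")
	}
	return id, nil
}
```

The point of the zero-value convention is that a forgotten `WithTenantID` call produces an error at the store boundary instead of an unfiltered query.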

Auth tenant resolution:
- HTTP: resolveAuthBearer sets TenantID/CrossTenant on all 5 auth paths
- WS: handleConnect sets tenantID/crossTenant on Client
- API key 2-tier: NULL = cross-tenant (system), set = tenant-scoped

Runtime isolation:
- Event bus: TenantID field on Event, fail-closed filter in event_filter.go
- Cron: tenant context injected in RunJob handler
- Subagent: tenant validation prevents cross-tenant spawn
- Security logging: tenant_id in auth resolution logs

Tenant management:
- WS RPC: 7 methods (tenants.list/get/create/update, tenants.users.*)
- HTTP: 7 endpoints (/v1/tenants/*)
- Slug validation + path traversal prevention
- Role validation (owner/admin/operator/member/viewer)

Infrastructure:
- PermissionCache: 4 sub-caches (tenant resolve, role, agent access, team access)
- tenant_paths.go: filesystem path helpers with master-tenant backward compat
- i18n: MsgInvalidRole key + translations (en/vi/zh)

Dead code removed: custom_tools store, HTTP handler, DynamicToolLoader (-828 lines)

* feat(security): tenant query filtering + workspace isolation (Plan 3)

Add WHERE tenant_id filtering to all 30+ tenant-scoped store queries,
wire workspace filesystem isolation, and harden restrict_to_workspace.

Store query filtering:
- Add tenantClauseN/tenantIDForInsert/requireTenantID helpers
- Filter all SELECT/INSERT/UPDATE/DELETE by tenant_id for non-cross-tenant
- Refactor SessionStore.GetOrCreate and CronStore.AddJob/ListJobs to
  accept context.Context for tenant propagation
- System skills (is_system=true) bypass tenant filter for all tenants
- Special cases: GetByKey (channels), GetByHash (auth) skip filter
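
The query-filtering pattern above can be illustrated with a small query builder. The signature of `tenantClause` is an assumption (the commit only names tenantClauseN), tenant IDs are shown as strings for brevity, and the column list is illustrative.

```go
package main

import (
	"errors"
	"strconv"
)

var errNoTenant = errors.New("tenant_id required")

// tenantClause is a hypothetical stand-in for the tenantClauseN helper.
// It returns an SQL fragment using placeholder $n plus its argument.
// Cross-tenant callers get no filter; a missing tenant fails closed.
func tenantClause(n int, tenantID string, crossTenant bool) (string, []any, error) {
	if crossTenant {
		return "", nil, nil
	}
	if tenantID == "" {
		return "", nil, errNoTenant
	}
	return " AND tenant_id = $" + strconv.Itoa(n), []any{tenantID}, nil
}

// listAgentsQuery shows how a store method might append the clause.
func listAgentsQuery(tenantID string, crossTenant bool) (string, []any, error) {
	query := "SELECT id, agent_key FROM agents WHERE deleted_at IS NULL"
	clause, args, err := tenantClause(1, tenantID, crossTenant)
	if err != nil {
		return "", nil, err
	}
	return query + clause, args, nil
}
```

Keeping the clause and its argument together in one helper makes it harder to add the placeholder without also binding the tenant value.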

Workspace isolation:
- Resolver computes tenant-scoped workspace + dataDir for non-master tenants
- Add WithTenantSlug/TenantSlugFromContext to context propagation
- Add TenantStore + Workspace to ResolverDeps
- Force effectiveRestrict() to always return true (multi-tenant security)
- Remove restrict_to_workspace from agentAllowedFields

UI cleanup:
- Remove custom-tools pages, types, routes, constants (backend removed in Plan 2)
- Clean tool-name-select component of custom tools references

* feat(security): session ctx propagation + execMapUpdate tenant guard (Plan 4)

Session store:
- Add ctx to AddMessage, SetSessionMetadata, SetAgentInfo, List, Save
- List now filters by tenant_id for non-cross-tenant callers
- Save uses ExecContext for cancellation support
- All ~15 callers updated to pass ctx

execMapUpdate tenant guard:
- Remove deleted_at IS NULL from execMapUpdateWhereTenant (only the agents table has soft-delete)
- Migrate 8 callers to execMapUpdateWhereTenant: agent_links, channel_instances,
  mcp_servers, secure_cli, tracing, teams, skills_crud, cron_update
- Add ctx to UpdateSkill, UpdateJob interfaces + all callers

Deferred: cron scheduler global cache (correct by design — system process),
browser per-tenant isolation (separate plan).

* refactor(store): add context.Context to all SessionStore interface methods

Complete ctx propagation across all 24 SessionStore methods for:
- Future tenant-aware DB operations
- Request cancellation/timeout support
- Distributed tracing capability

Updated ~15 files including all callers in agent loop, gateway methods,
heartbeat ticker, tools, and CLI commands.

* fix(security): remove context.Background() shadowing in gateway handlers

Critical fix from code review: gateway agent handlers (create, update,
delete, identity, files, links, teams) were creating ctx := context.Background()
which shadowed the handler's ctx that carries tenant_id. This breaks
tenant-scoped agent queries for non-master tenants.

- Remove ctx shadowing in 7 agent handler files
- Add ctx param to resolveAgentUUID/resolveAgentInfo helpers
- Use store.WithCrossTenant in resolver (system-level operation)

* feat(security): tenant-scoped UNIQUE constraints for multi-tenant isolation

Update UNIQUE indexes to include tenant_id, allowing same names across tenants:
- agents: (agent_key) → (tenant_id, agent_key) WHERE deleted_at IS NULL
- sessions: (session_key) → (tenant_id, session_key)
- skills: (slug) → (tenant_id, slug)
- mcp_servers: (name) → (tenant_id, name)
- channel_contacts: (channel_type, sender_id) → (tenant_id, channel_type, sender_id)

Code changes:
- GetByKey now filters by tenant_id (same pattern as GetByID)
- ON CONFLICT clauses updated for sessions and skills
- Channel consumer uses WithCrossTenant for agent resolution
- Down migration restores original constraints

* fix(security): close remaining tenant isolation gaps from final audit

Critical fixes:
- gateway_setup: WithCrossTenant for default agent lookup at startup (C6)
- channel_contacts: ON CONFLICT updated to (tenant_id, channel_type, sender_id) (Q15)
- agents.Delete: tenant filter on DELETE (Q1)

High priority fixes:
- agents: List, GetDefault, ShareAgent, RevokeShare, ListShares, CanAccess,
  ListAccessible, Update unset-default — all now tenant-scoped
- skills_crud: DeleteSkill now takes ctx, verifies tenant ownership
- mcp_servers, channel_instances, secure_cli: Delete methods tenant-scoped
- WithCrossTenant added to: gateway team notifications, team_tool_cache,
  pending_messages GetDefault

* fix(migration): add tenant_id to usage_snapshots unique index

Update idx_usage_snapshots_unique to include tenant_id, preventing
cross-tenant upsert collisions when different tenants have agents
with same provider/model/channel combination.

* feat(security): cron tenant guard + browser per-tenant isolation

Phase 3 — Cron API tenant guard:
- Add ctx to 5 CronStore methods (GetJob, RemoveJob, EnableJob, RunJob, GetRunLog)
- All API-facing cron ops now filter by tenant_id (prevents cross-tenant CRUD)
- RemoveJob/EnableJob return "not found" on tenant mismatch (no enumeration)
- GetRunLog JOINs cron_jobs for tenant filtering
- UpdateJob internal reads scoped by tenant (defense-in-depth)
- Scheduler-internal methods (GetDueJobs, refreshJobCache) unchanged (system-level)

Phase 4 — Browser per-tenant isolation:
- Per-tenant incognito browser contexts via rod Incognito() (separate cookie jars)
- All page access (Snapshot, Screenshot, Navigate, Click, Type, etc.) validated
  via getPageForTenant — blocks cross-tenant access by targetID
- OpenTab creates pages in tenant's incognito context
- ListTabs scoped to tenant's incognito context
- ConsoleMessages validates page ownership
- Stop/reconnect properly cleans up incognito contexts

* feat(security): isolation gaps + per-tenant config (Plan 5)

Part A — Isolation Gap Fixes:
- Merge migration 028 into 027: add tenant_id to llm_providers +
  config_secrets, fix UNIQUE constraints for paired_devices +
  channel_instances
- providers.go: tenant filtering on all CRUD queries
- config_secrets.go: ON CONFLICT (key, tenant_id)
- pairing_store: add ctx to all 7 interface methods, remove hardcoded
  MasterTenantID, update ~15 channel caller files
- Session cache: prefix keys with tenantID to prevent cross-tenant
  collision. DB queries (loadFromDB, Save, Delete, LastUsedChannel)
  add tenant filter
- config_permissions cache: prefix keys with tenantID
- Cron ListJobs: fail-closed when tenant context missing
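
The cache-key prefixing in Part A amounts to something like the sketch below. The function name, the `:` separator, and the use of string IDs are assumptions made for illustration.

```go
package main

// sessionCacheKey is a hypothetical helper: it prefixes the session key
// with the tenant ID so two tenants using the same session key do not
// share a cache entry. The ":" separator is an assumption; it is safe
// as long as tenant IDs (UUIDs) never contain it.
func sessionCacheKey(tenantID, sessionKey string) string {
	return tenantID + ":" + sessionKey
}

// put and get wrap a plain map to show the key in use.
func put(cache map[string]string, tenantID, sessionKey, value string) {
	cache[sessionCacheKey(tenantID, sessionKey)] = value
}

func get(cache map[string]string, tenantID, sessionKey string) (string, bool) {
	v, ok := cache[sessionCacheKey(tenantID, sessionKey)]
	return v, ok
}
```

Two tenants that both use the session key `main` now occupy separate cache entries.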

Part B — Per-Tenant Configuration:
- Provider Registry: compound key tenantID/name with fallback to
  master tenant. GetForTenant/ListForTenant/RegisterForTenant
- Resolver: uses tenant-aware provider lookup + disabled tools query
- Agent loop: filter disabled tools from LLM tool definitions
- Builtin tool tenant configs: store interface + PG implementation +
  PUT/DELETE HTTP endpoints
- Skill tenant configs: store interface + PG + ListAccessible LEFT
  JOIN to exclude disabled skills per tenant
- OAuth: DBTokenSource with tenantID field for tenant-scoped token
  refresh
- All HTTP provider handlers use RegisterForTenant/UnregisterForTenant

* feat(security): channel tenant propagation + MCP per-user credentials (Plan 6)

- Propagate tenant_id from channel_instances through BaseChannel →
  InboundMessage → agent loop context (fixes 5-point break in tenant flow)
- Inject tenant context in WS router dispatch for all gateway methods
- Add MCP per-user credential overrides (api_key, headers, env) with
  AES-256-GCM encryption and HTTP API endpoints
- Rewrite MCP pool with tenant-scoped keys, slot semaphore, idle eviction,
  and credential rotation support (Evict per tenant+server)
- Bypass pool for users with custom credentials (separate connections)
- Fix MCP APIKey never passed to connections (inject as Authorization header)
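
For readers unfamiliar with AES-256-GCM, the credential encryption mentioned above can be sketched with the Go standard library. This is not the project's code: the function names and the nonce-prepended layout are assumptions, and key management is out of scope.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newGCM builds an AES-256-GCM AEAD; the key must be exactly 32 bytes.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt seals plaintext and returns nonce || ciphertext || tag.
// Storing the nonce in front of the ciphertext is an assumed layout.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt reverses Encrypt and fails if the data was tampered with.
func Decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}
```

Because GCM is authenticated, a modified ciphertext makes `Decrypt` return an error rather than corrupted credentials.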

* fix(security): close remaining tenant isolation gaps from Plan 1-6 audit

- Add tenant_id to 6 missing tables: agent_context_files,
  skill_agent_grants, mcp_agent_grants, team_tasks, spans,
  embedding_cache (migration 027)
- Fix tid==uuid.Nil fallback to fail-closed (return error) in 8 update
  methods: agent_links, teams, skills, channel_instances, secure_cli,
  cron, mcp_servers, tracing
- Add tenant filter to bare DELETEs: DeleteLink, DeleteTeam
- Add tenant filter to queries: ListChildTraces, GetMonthlyAgentCost,
  CountAgentGrantsByServer, ListAccessible (MCP), ReviewRequest,
  ResolveGroupTitles, buildTraceWhere
- Fix missing tenant_id in INSERTs: CreateSkill, GrantToUser,
  ReviewRequest grant INSERTs
- Add tenant filter to api_keys: List, Revoke, Delete
- Fix cron scanJob/RemoveJob/EnableJob fallthrough patterns

* fix(security): inject tenant context into channel handler entry points

Channel handlers used context.Background(), which dropped the tenant
context and caused store operations to either fail closed or default to
the master tenant. Now all 10 handler entry points inject tenant from BaseChannel.

* fix(security): tenant filters for teams, tasks, skills (Plan 6b audit)

- Teams: add tenant filter to GetTeamForAgent, ListMembers,
  ListIdleMembers, KnownUserIDs (JOIN agent_teams for tenant check)
- Teams: add tenant_id to GrantTeamAccess INSERT, tenant filter to
  RevokeTeamAccess, ListTeamGrants, HasTeamAccess
- Team tasks: add tenant_id to CreateTask INSERT, fail-closed
  UpdateTask, tenant filter on all 7 query/delete methods
- Skills: add tenant filter to RevokeFromAgent, ListAgentGrants
- Skills: add ctx param + tenant filter to ToggleSkill
- History: annotate context.Background() locations with TODOs for
  future tenant injection (requires PendingHistory struct refactor)

* fix(security): add tenant_id to 4 missing team tables + fix INSERTs

Add tenant_id column to: agent_team_members, team_task_comments,
team_task_events, team_task_attachments (migration 027).

Fix INSERT statements to include tenant_id: AddMember,
AddTaskComment, RecordTaskEvent, AttachFileToTask.

* fix(migration): cast UUID literals in tenant_users seed + usage_snapshots index

PostgreSQL doesn't auto-cast string to UUID in SELECT and expression
index contexts. Add explicit ::uuid casts to prevent migration failure.

* docs: add multi-tenant architecture guide for integrators

Comprehensive solution doc covering auth model, WS protocol, event
system, data isolation, API reference, and integration patterns.
Target audience: developers building custom frontends or SaaS on GoClaw.

* feat(ui): multi-tenant awareness + tenant admin page (Plan 7)

Backend:
- Enrich WS connect response with tenant_name, tenant_slug, cross_tenant
- Add tenants.mine WS method (any user, returns own memberships)
- Parse tenant_hint in connect params for browser pairing multi-tenant
- Wire tenantStore to MethodRouter for connect-time tenant lookup

Frontend:
- Auth store: tenantId, tenantName, tenantSlug, isCrossTenant, availableTenants
- WS client: capture tenant fields from connect, send tenant_hint
- WS provider: auto-fetch tenants.mine on connect
- useTenants() shared hook for all tenant-aware components
- Tenant indicator in sidebar connection status
- Tenant admin page (/admin/tenants) with list + create dialog
- Tenants nav in sidebar (cross-tenant admin only)
- i18n: tenants namespace (en/vi/zh)
- Type updates: tenant_id on AgentData, ApiKeyData

* refactor(ui): move tenant selector into user menu dropdown in topbar

Replace simple logout button with a Radix Popover user menu showing:
- User ID display
- Tenant selector (when multi-tenant: list all tenants with check mark)
- Logout button

Remove tenant indicator from connection-status.tsx (now in topbar).
Tenant switch saves slug to localStorage and reloads for reconnect.

* feat(ui): add logout confirmation dialog

Show destructive confirm dialog before logout via ConfirmDialog
component. Added logoutConfirm i18n key for en/vi/zh.

* fix(ui): security hardening — hide admin nav, fix route guard, fix refresh

- Hide System nav group for non-admin roles in sidebar (was visible to all)
- Replace RequireAdmin with RequireCrossTenant guard on /admin/tenants route
- Add RequireCrossTenant component to require-role.tsx
- Fix refresh button animation: use isFetching instead of isLoading
- Clean up connection-status.tsx (remove tenant indicator, now in topbar)

* feat: cross-tenant admin tenant scope selector

Backend: add tenant_scope connect param. Cross-tenant clients can
narrow their scope to a specific tenant (slug). applyTenantScope()
sets client.tenantID and clears crossTenant flag.

UI: user menu shows "All Tenants" option for cross-tenant admins.
Selecting a tenant saves slug to localStorage as tenant_scope,
reload reconnects with narrowed scope. "All Tenants" clears scope.

* feat: provisioning API key scope + tenant detail page (Plan 8)

Backend:
- Add operator.provision scope for limited tenant management
- Add HasScope() method to gateway Client
- Allow provision-scoped keys to create tenants + add users
- Allow provision-scoped keys to create tenant-bound API keys

Frontend:
- Tenant detail page with user management (list, add, remove)
- Clickable tenant list rows navigate to detail
- i18n: tenant detail keys (en/vi/zh)
- Route /admin/tenants/:id with RequireCrossTenant guard

* fix: tenant scope keeps admin privileges + UI pattern fixes

Backend:
- applyTenantScope keeps crossTenant=true (retains admin features)
- Router: scoped cross-tenant injects WithTenantID (filters data)
  while keeping admin role for method access

UI:
- Fix "All Tenants" check mark (compare against nil UUID string)
- Fix tenant label when scope active (show selected tenant name)
- Use ConfirmDialog for user removal (was hand-rolled)
- Add DialogDescription to add-user dialog (Radix a11y)
- Fix table min-w-[600px] consistency
- Fix column header mismatch (was "role", should be "created")

* fix(ui): clean up tenant detail header — remove redundant info panel

Remove duplicate slug/status/created panel. Info now shown in
PageHeader description (slug + date). Status badge removed (redundant
with description). Cleaner, consistent with other admin pages.

* fix(ui): redesign tenant detail with info cards + user cards

* feat(ui): tenant selection gate — require tenant before app access

- Add tenantSelected flag to auth store (persisted via localStorage)
- WS provider auto-selects: a single-tenant user gets that tenant, a
  cross-tenant admin defaults to "All Tenants", a zero-tenant user is blocked
- RequireAuth gate: redirect to /select-tenant when connected but
  no tenant selected
- New TenantSelectorPage: centered card layout matching login page,
  "All Tenants" amber card for cross-tenant admin, per-tenant cards
  with role badges, no-access state with logout button
- i18n: selectTenant, noAccess keys (en/vi/zh)

* fix(security): scope events for cross-tenant admin with tenant_scope

Event filter was checking !crossTenant before filtering — scoped
cross-tenant admins (crossTenant=true + tenantID set) bypassed
tenant event filtering. Now checks tenantID != Nil regardless of
crossTenant flag, ensuring scoped admins only see their chosen
tenant's events.

* fix(security): HTTP API now respects tenant_scope for gateway token

Root cause: UI uses HTTP API (/v1/agents, /v1/mcp/servers, etc.)
for data fetching. HTTP auth middleware with gateway token always
set CrossTenant=true with no tenant filtering. tenant_scope only
worked for WS connection, not HTTP requests.

Fix:
- HTTP client sends X-GoClaw-Tenant-Scope header from localStorage
- HTTP auth resolves header slug → tenant UUID via tenantStore
- requireAuth: CrossTenant + TenantID → WithTenantID (scoped)
- Wire InitTenantStore(pgStores.Tenants) in gateway startup

* feat(security): tenant-aware provider registry, event filter, and membership validation

- Refactor providers.Registry: Get(ctx, name) / List(ctx) extract tenant
  from context via injected TenantFromCtx func (avoids circular import)
- Event filter: fail-closed 3-mode tenant filtering
  Mode 1: unscoped admin sees all
  Mode 2: scoped admin sees tenant events + system events
  Mode 3: regular user sees only own tenant (fail-closed)
- WS connect: resolveTenantHint validates membership via GetUserRole
  with PermissionCache (30s TTL, bus invalidation)
- BroadcastForTenant helper for tenant-scoped event emission
- Session list: add TenantID to SessionListOpts from context
- Cron handleRun: preserve tenant in background goroutine context
- GOCLAW_LOG_LEVEL env var (debug|info|warn|error) for Docker/K8s
- Cache debug logging: tenant_cache, permission_cache, api_key_cache
- Friendly verify error: timeout → user-readable message
- Verify timeout: 15s → 30s
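
The three event-filter modes listed above can be written as one decision function. This is a sketch under assumptions: the `Client` and `Event` shapes, string tenant IDs, and treating an empty event tenant as a system event are illustrative choices, not the contents of event_filter.go.

```go
package main

// Client is a hypothetical view of a connected WS client.
type Client struct {
	TenantID    string // empty means no tenant (the uuid.Nil case)
	CrossTenant bool
}

// Event is a hypothetical bus event; an empty TenantID marks a system event.
type Event struct {
	TenantID string
}

// ShouldDeliver applies the three modes described in the commit message.
func ShouldDeliver(c Client, e Event) bool {
	switch {
	case c.CrossTenant && c.TenantID == "":
		// Mode 1: unscoped admin sees all events.
		return true
	case c.CrossTenant:
		// Mode 2: scoped admin sees the chosen tenant plus system events.
		return e.TenantID == "" || e.TenantID == c.TenantID
	default:
		// Mode 3: regular user sees only own tenant; no tenant means no events.
		return c.TenantID != "" && e.TenantID == c.TenantID
	}
}
```

Note that the scoped-admin case is decided by the tenant ID being set, not by the cross-tenant flag alone, which matches the earlier fix for scoped admins bypassing the filter.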

* feat(ui): setup wizard improvements + agent preset enrichment

- Setup: skip link with confirm dialog, language selector (en/vi/zh)
- Setup: card padding fix (py-0 gap-0 on Card, py-5 on CardContent)
- Setup: remove duplicate skip link from layout
- Step Model: verify countdown timer (30s), stops on result
- Step Agent: default Fox Spirit preset, selected state styling,
  hide agent key/name inputs, auto-derive from preset, emoji in config
- Summoning modal: elapsed timer (m:ss format)
- Agent presets: enriched prompts with human-like quirks
  Fox Spirit: playful personality, care reminders
  Artisan: portrait/banner/ads/logo expertise
  Astrologer: reference sites (astro.com, cafeastrology, labyrinthos)
- i18n: "triệu hồi linh hồn" fix, all 3 locales updated

* feat(ui): API Key tenant support + card layout + provider chain fix

- API Key create: tenant selector for cross-tenant admin, provision scope
- API Key create: redesigned dialog with scope cards, Radix Select, icons
- API Key list: card layout with badges (status, tenant, scopes)
- API Key: shortcut in user menu (topbar)
- API Key: keep "API Key" untranslated across all locales
- Provider chain: empty state fix — skip legacy entry when provider
  not found in current tenant
- i18n: form.cancel key added to all 3 locales

* fix(ui): add bottom padding to all page layouts + misc improvements

- Add pb-10 to all 24 page containers to prevent content touching
  bottom edge of viewport
- Various UI polish from user modifications (summoning colors,
  layout icon, agent cards, sidebar adjustments)

* feat(ui): MCP user credentials dialog + builtin tool tenant toggle

- MCP: per-user credentials dialog (api_key, headers, env KV editor)
  with status badges, delete all, save
- MCP: "My Credentials" button on each server row
- Builtin Tools: per-tenant enable/disable override toggle
  with "Using default" / "Enabled/Disabled for tenant" badges
  and reset-to-default button
- Setup: larger logo (h-16) and bolder title (text-4xl font-bold)
- i18n: all keys added to en/vi/zh for both features

* fix(ui): API key card spacing + remove pagination border

- Card padding: px-4 py-3.5 (was px-3 py-2), rows spaced with gap-2
- Scopes on separate row from dates for readability
- Card gap: space-y-2.5 between cards
- Pagination: add className prop, remove border-t on API keys page
- Badge/icon sizes bumped to text-xs / h-3.5 (was text-[10px] / h-3)

* fix(security): comprehensive tenant isolation audit — SQL, events, cache, skills, files

Defense-in-depth hardening across 12 audit phases:

- SQL: add tenant_id WHERE to teams_tasks lifecycle/activity/followup/progress/embedding (~30 functions)
- Events: broadcastTeamEvent + task_ticker + subagent announce now carry TenantID
- Cache: agentKeyCache scoped by tenant (agent keys per-tenant, not globally unique)
- Skills: SkillStore interface accepts ctx, SQL filter (is_system OR tenant_id=$N), per-tenant list cache, GrantToAgent includes tenant_id, tenant-scoped file storage
- Files: StorageHandler/FilesHandler/TeamAttachments/teamWorkspaceDir use config.TenantDataDir/TenantTeamDir
- Security: HMAC signed file tokens (file_token.go) replace gateway token in URLs
- Audit: AuditEventPayload carries TenantID for async subscriber tenant scoping
- InboundMessage: subagent/dispatch/validation/session_send propagate TenantID
- Pending messages: DeleteStale scoped by tenant

* fix(security): skip gateway token in URLs with signed file tokens

toFileUrl() now skips appending ?token=GATEWAY_TOKEN when the URL
already contains ?ft= (HMAC signed file token). Prevents gateway
token exposure via browser history, logs, and referrer headers.

* fix(security): stop persisting auth tokens in session media URLs

mediaToMarkdown() now stores clean paths (/v1/files/path) without
any auth tokens. Previously it embedded ?token=GATEWAY_TOKEN (or ?ft=)
into the markdown that is persisted in the session messages DB.

Frontend toFileUrl() adds auth at render time — tokens never stored.

* fix(security): migration 027 strips leaked gateway tokens from session URLs

Adds cleanup step to tenant foundation migration: removes ?token=xxx
from persisted media URLs in session messages. Old code embedded the
gateway token; new code stores clean paths only.

* fix(security): sign file URLs at delivery time, not persist time

Add SignFileURLs() utility that finds /v1/files/ and /v1/media/ URLs
in content and appends HMAC signed ?ft= tokens before delivery.

Applied at 4 delivery points:
- WS agent events (OnEvent callback in gateway_managed.go)
- WS chat.history response
- WS sessions.preview response
- HTTP /v1/chat/completions response

Sessions store clean paths only. Tokens are generated per-delivery
with 1h TTL — never persisted in DB. Frontend toFileUrl() skips
appending gateway token when ?ft= is already present.
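
Delivery-time signing with a 1h TTL can be sketched as below. The token format (`<expiry>.<hex MAC>`), the function names, and the use of Unix seconds are assumptions; only the HMAC-over-path idea and the TTL come from the commit message.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strconv"
	"strings"
)

// tokenTTLSeconds mirrors the 1h TTL mentioned in the commit message.
const tokenTTLSeconds = 3600

// mac computes a hex HMAC-SHA256 over the path and the expiry.
func mac(secret []byte, path, exp string) string {
	h := hmac.New(sha256.New, secret)
	h.Write([]byte(path + "\n" + exp))
	return hex.EncodeToString(h.Sum(nil))
}

// SignPath returns a hypothetical ft token of the form "<expiry>.<mac>".
func SignPath(secret []byte, path string, expiresUnix int64) string {
	exp := strconv.FormatInt(expiresUnix, 10)
	return exp + "." + mac(secret, path, exp)
}

// VerifyPath checks expiry and MAC. The path must be byte-for-byte the
// one that was signed, including any "/v1/files/" prefix.
func VerifyPath(secret []byte, path, token string, nowUnix int64) bool {
	exp, sig, ok := strings.Cut(token, ".")
	if !ok {
		return false
	}
	expires, err := strconv.ParseInt(exp, 10, 64)
	if err != nil || nowUnix > expires {
		return false
	}
	return hmac.Equal([]byte(sig), []byte(mac(secret, path, exp)))
}
```

Verifying against a different form of the path than the one signed (for example with the `/v1/files/` prefix dropped) fails the MAC check, so signer and verifier need to agree on the exact string.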

* fix: file token verify path must match signed path (/v1/files/ prefix)

SignFileURLs() signs the full URL path "/v1/files/{path}", but the
verification in files.go auth() used "/{path}" (without the prefix).
The resulting HMAC mismatch made every signed file token return 401.

* fix(security): scope storage size cache per-tenant

sizeCache was a single global entry — all tenants shared one cached
size. Changed to sync.Map keyed by tenantBaseDir so each tenant gets
its own cached size calculation.
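
A per-tenant size cache keyed by base directory might look like the following sketch. The `cachedSize` helper and its callback are hypothetical, and any expiry the real cache has is omitted here.

```go
package main

import "sync"

// sizeCache maps tenantBaseDir (string) to a cached size (int64).
// Assumption: entries never expire in this sketch.
var sizeCache sync.Map

// cachedSize returns the cached size for a tenant directory, computing
// and storing it on first use. compute is a caller-supplied function
// that would walk the directory in real code.
func cachedSize(tenantBaseDir string, compute func() int64) int64 {
	if v, ok := sizeCache.Load(tenantBaseDir); ok {
		return v.(int64)
	}
	size := compute()
	sizeCache.Store(tenantBaseDir, size)
	return size
}
```

With a single global entry, the second tenant to ask would have received the first tenant's number; keying by directory removes that leak.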

* feat(ui): redesign API keys page — table layout + code snippet dialog

Replace card-based API keys list with table layout matching MCP Servers
pattern. Add "API Key Usage" dialog with tabbed code snippets (cURL,
TypeScript, Go) showing gateway connection examples with syntax
highlighting and copy-to-clipboard.

* fix(builtin-tools): seed media tools disabled, fix tenant toggle, add unconfigured warning

- Seed media tools with Enabled=false and no default provider settings
  (user must configure provider chain before enabling)
- Fix provider chain form ghost entries: validate provider exists in
  tenant before showing (parseInitialEntries new-format path)
- Fix double toggle: show only tenant override OR global toggle, not both
- Fix list API: merge tenant_enabled from builtin_tool_tenant_configs
  into response when tenant-scoped (was always null)
- Add ListAll() to BuiltinToolTenantConfigStore for full override map
- Add amber warning banner for enabled media tools missing provider config

* feat(mcp): require_user_credentials setting + KeyValueEditor for user creds

- Add require_user_credentials setting in mcp_servers.settings JSONB
- Backend: skip MCP server in LoadForAgent when user lacks credentials
- Frontend: toggle in MCP form dialog, persisted in settings field
- Redesign MCP user credentials dialog: replace raw Textarea with
  KeyValueEditor (sensitive key masking for auth/token/secret fields)
- Add settings to mcpServerAllowedFields for HTTP update

* fix(security): restrict cross-tenant to owner IDs, config to owners only

- Gateway token + non-owner user ID: admin role but tenant-scoped
  (no cross-tenant access). Fallback: only "system" is owner when
  GOCLAW_OWNER_IDS not configured (fail-closed).
- Config page (WS config.* methods): wrapped with requireCrossTenant
  middleware — non-owner admins get permission denied
- Config sidebar link: hidden for non-cross-tenant users
- Logout: clear tenant_id and tenant_hint from localStorage
  (prevents tenant scope leak to next user session)
- Refactor: LOCAL_STORAGE_KEYS.TENANT_ID/TENANT_HINT constants
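
A rough sketch of the fail-closed owner fallback, assuming GOCLAW_OWNER_IDS holds a comma-separated list (the format is an assumption, as are the ownerSet and canCrossTenant names):

```go
package main

import (
	"fmt"
	"strings"
)

// ownerSet parses a comma-separated owner list (assumed format). When the
// list is empty, only "system" is treated as owner, so an unconfigured
// deployment never grants cross-tenant access to ordinary users.
func ownerSet(raw string) map[string]bool {
	owners := map[string]bool{}
	for _, id := range strings.Split(raw, ",") {
		if id = strings.TrimSpace(id); id != "" {
			owners[id] = true
		}
	}
	if len(owners) == 0 {
		owners["system"] = true // fail-closed default
	}
	return owners
}

// canCrossTenant reports whether userID may operate across tenants.
func canCrossTenant(owners map[string]bool, userID string) bool {
	return owners[userID]
}

func main() {
	// In a real service the raw value would come from the environment.
	unconfigured := ownerSet("")
	fmt.Println(canCrossTenant(unconfigured, "system")) // true
	fmt.Println(canCrossTenant(unconfigured, "alice"))  // false

	configured := ownerSet("alice, bob")
	fmt.Println(canCrossTenant(configured, "bob")) // true
}
```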

* fix(ui): chat bubble contrast, login logo, tenant no-access UX

- Chat bubble: use --chat-bubble-user CSS var (darker orange, L=0.50/0.52)
  with text-white for WCAG AA contrast (~5.5:1)
- Login page: logo h-20 w-20, title text-3xl font-bold
- Tenant selector no-access: shield icon + hint text explaining that the
  user needs an admin to add them to a tenant
- Sidebar: GoClaw text uses text-sidebar-primary (brand color)

* feat(contacts): merge/unmerge contacts to tenant users

Add API and UI for linking channel contacts to tenant_users identity,
enabling cross-channel user identification within a tenant.

Backend:
- POST /v1/contacts/merge — link contacts to existing or new tenant_user
- POST /v1/contacts/unmerge — remove merged_id from contacts
- GET /v1/contacts/merged/{id} — list contacts by tenant_user
- GET /v1/tenant-users — list users for current tenant
- Add display_name + metadata columns to tenant_users (migration 27)
- All endpoints enforce tenant isolation via context tenant_id

Frontend:
- Checkbox multi-select on contacts table
- Selection toolbar with Merge/Unmerge buttons
- Merge dialog: link to existing user or create new
- Link2 icon indicator for merged contacts
- i18n: en/vi/zh translations for merge section

* fix(security): add tenant_id to span and embedding_cache inserts

SpanData struct was missing TenantID field — all span inserts failed
with NOT NULL constraint violation after migration 027 dropped defaults.

Fix captures tenant_id from context at emit time (6 call sites in
loop_tracing.go + subagent_tracing.go), then includes it in both
CreateSpan() and BatchCreateSpans() SQL (25→26 columns).

Also fixes embedding_cache writeEmbeddingCache() which was missing
tenant_id in its batch INSERT — same class of bug.

Both use MasterTenantID fallback for backward compatibility.

* feat: introduce tenant switcher UI and enhance multi-tenant architecture documentation

* fix(security): enforce tenant scoping, fix session isolation and UI cleanup

- Force cross-tenant admins to always have a concrete tenant_id (default
  MasterTenantID) instead of unscoped WithCrossTenant — prevents mismatch
  between session listing (no filter) and writes (MasterTenantID fallback)
- Make agent router tenant-aware: Get(ctx, agentID) resolves agent for
  the caller's tenant, preventing cross-tenant agent cache collisions
- Fix context.Background() in title goroutine and summarization — now
  uses tenant-aware context (WithoutCancel) so titles and compaction
  persist to the correct tenant
- Add read-only SessionStore.Get() method; replace GetOrCreate in auth
  checks (preview/patch/delete/reset) to prevent phantom session creation
- Inject tenant from channel instance into inbound message processing
- Remove "All Tenants" option from tenant selector, topbar switcher,
  and ws-provider auto-select — admin must always operate within a tenant
- Fix contacts page selection toolbar layout shift (always rendered)
- Widen MCP credentials sensitive header regex to catch API_KEY etc.

* fix(security): propagate tenant_id in consumer handlers and background ops

- InjectTeamDispatch: use context.WithoutCancel instead of context.Background
  to preserve tenant_id while avoiding cancel propagation from HTTP/WS handlers
- handleTeammateMessage/handleSubagentAnnounce: inject tenant_id from msg
- Add nil guard for outcome.Result to prevent panic on agent-not-found
- Use BroadcastForTenant for EventTeamTaskFailed/Completed/LeaderProcessing
- Remove unnecessary WithCrossTenant in autoSetFollowup (ctx already scoped)
- resolveAgentByKey: accept ctx param for tenant-scoped agent lookup
- pending_messages: use request ctx instead of cross-tenant for GetDefault
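
The difference between context.Background and context.WithoutCancel can be illustrated as below. The tenantKey type and the helper names are hypothetical; only context.WithoutCancel (Go 1.21+) is the standard-library call referred to above.

```go
package main

import (
	"context"
	"fmt"
)

// tenantKey is a hypothetical context key for the tenant ID.
type tenantKey struct{}

func withTenant(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, tenantKey{}, id)
}

func tenantFrom(ctx context.Context) string {
	id, _ := ctx.Value(tenantKey{}).(string)
	return id
}

// detachedAfterCancel simulates a handler that hands a context to
// background work and then returns, cancelling its request context.
func detachedAfterCancel(tenantID string) (string, error) {
	reqCtx, cancel := context.WithCancel(withTenant(context.Background(), tenantID))
	bg := context.WithoutCancel(reqCtx) // keeps values, drops cancellation
	cancel()                            // the handler returns
	return tenantFrom(bg), bg.Err()
}

func main() {
	id, err := detachedAfterCancel("tenant-a")
	fmt.Println(id, err) // tenant-a <nil>

	// A fresh background context carries no tenant at all.
	fmt.Println(tenantFrom(context.Background()) == "") // true
}
```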

* fix(security): tenant-scope EnsureContact and PendingHistory DB operations

- All channel EnsureContact calls now use tenant-scoped ctx instead of
  context.Background (whatsapp, slack, discord, telegram, feishu, zalo)
- PendingHistory: add tenantID field, thread through constructors
- All PendingHistory DB ops (load, flush, compact, delete) use tenantCtx()
- Normalize timeouts: 10s for simple queries, 15s for batch writes

* feat(teams): auto-attach media, retry completed tasks, improve tool messages

- Auto-attach workspace media from any tool (create_image/audio/video) to
  team tasks via loop-level hook, not just write_file interceptor
- Store absolute paths in team_task_attachments instead of relative
- Extend retry action to support completed tasks (reopen for follow-up)
- Context-aware comment result messages with next-action guidance for
  leader vs member roles and task status
- All tool results include task_id for agent follow-up actions
- Use #N "subject" format instead of raw UUIDs in tool messages

* feat(multi-tenant): tenant isolation for media, events, providers and UI

- Tenant-scoped media store, event filter, provider registry
- Tenant header propagation in WS/HTTP clients
- UI: tenant-aware chat messages, markdown renderer improvements
- Protocol: tenant error codes and event definitions

* docs: add multi-tenant architecture documentation

* fix(teams): store absolute paths in team_task_attachments

- AutoAttachWorkspaceFile: use cleanPath consistently instead of raw absPath
- executeAttach: resolve relative paths to absolute via team workspace
- AfterWrite interceptor already uses filepath.Clean (verified)

* fix(teams): attachment download handles both absolute and relative paths

filepath.Join with an absolute att.Path discards the teamBase prefix,
causing path traversal check to fail and download to serve wrong file.
Now checks IsAbs first — uses path directly for new absolute entries,
falls back to legacy join for old relative entries.
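
The IsAbs-first resolution might look like the sketch below. resolveAttachment and the example paths are hypothetical, and the separate containment and IDOR checks are left out.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveAttachment returns the on-disk path for an attachment entry.
// New entries store absolute paths and are used directly; legacy entries
// store relative paths and are joined onto the team base directory.
func resolveAttachment(teamBase, attPath string) string {
	if filepath.IsAbs(attPath) {
		return filepath.Clean(attPath)
	}
	return filepath.Join(teamBase, attPath)
}

func main() {
	base := "/ws/teams/t1" // made-up team directory
	fmt.Println(resolveAttachment(base, "/ws/teams/t1/out.png"))
	fmt.Println(resolveAttachment(base, "out.png"))
}
```

On a Unix-like system both calls resolve to the same file, which is the point of handling the two storage formats separately.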

* fix(teams): attachment download validates against workspace root not tenant dir

Absolute paths stored in DB don't match TenantTeamDir structure
(master tenant has no tenants/ prefix). Now validates absolute paths
against dataDir (workspace root) instead. Legacy relative paths still
resolve via TenantTeamDir as before. IDOR check on att.TeamID ensures
cross-team isolation.

* fix(teams): attachment download uses workspace root, not data dir

Files are stored under GOCLAW_WORKSPACE/teams/, but the handler was passed
dataDir (GOCLAW_DATA_DIR), a completely different directory. It now receives the
workspace path. Legacy relative paths resolve via {workspace}/teams/{teamID}/{chatID}/{path}.

* fix(security): use HMAC-signed file tokens for attachment downloads

Replace gateway token exposure (?token=) with HMAC-signed short-lived
file tokens (?ft=) for team task attachment downloads — same mechanism
used by chat file URLs.

Backend:
- team_attachments auth: accept ?ft= signed token (priority 1), Bearer (priority 2)
- teams_tasks RPC: sign download_url with HMAC at delivery time
- Add fileTokenSecret to TeamsMethods, thread through wireChannelRPCMethods

Frontend:
- Use server-signed download_url from attachment data instead of ?token=
- Remove useAuthStore dependency from task-detail-dialog

* fix(security): decouple file token signing from gateway token

- Generate random 256-bit HMAC key at startup (crypto/rand, memory-only)
- All file signing/verification uses FileSigningKey() instead of gateway token
- Remove ?token= query param fallback from /v1/files/, /v1/media/, attachments
- Only ?ft= signed tokens and Bearer header accepted for file access
- Reduce file token TTL from 1h to 5min
- Frontend: remove gateway token from all file URLs and imports
- Note: tokens invalidate on restart (acceptable for 5min TTL + WS reconnect)
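
A minimal sketch of a memory-only signing key with a 5-minute token lifetime. The "<expiry>.<mac>" token layout and all function names are assumptions for illustration, not the project's actual format.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
	"time"
)

const tokenTTL = 5 * time.Minute

// newSigningKey returns a random 256-bit key that lives only in memory,
// so every restart invalidates previously issued tokens.
func newSigningKey() ([]byte, error) {
	key := make([]byte, 32)
	_, err := rand.Read(key)
	return key, err
}

// macFor binds both the path and the expiry into the signature.
func macFor(key []byte, path string, exp int64) string {
	m := hmac.New(sha256.New, key)
	fmt.Fprintf(m, "%s\n%d", path, exp)
	return hex.EncodeToString(m.Sum(nil))
}

// sign returns "<unix-expiry>.<hex-mac>" (hypothetical layout).
func sign(key []byte, path string, nowUnix int64) string {
	exp := nowUnix + int64(tokenTTL/time.Second)
	return strconv.FormatInt(exp, 10) + "." + macFor(key, path, exp)
}

// verify rejects malformed, expired, or tampered tokens.
func verify(key []byte, path, token string, nowUnix int64) bool {
	expStr, sig, ok := strings.Cut(token, ".")
	if !ok {
		return false
	}
	exp, err := strconv.ParseInt(expStr, 10, 64)
	if err != nil || nowUnix > exp {
		return false
	}
	return hmac.Equal([]byte(sig), []byte(macFor(key, path, exp)))
}

func main() {
	key, err := newSigningKey()
	if err != nil {
		panic(err)
	}
	now := time.Now().Unix()
	tok := sign(key, "/v1/files/a.pdf", now)
	fmt.Println(verify(key, "/v1/files/a.pdf", tok, now))     // true
	fmt.Println(verify(key, "/v1/files/a.pdf", tok, now+301)) // false: expired
}
```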

* fix(ui): use signed download_url for task attachments

- Add download_url to TeamTaskAttachment type
- Use a.download_url (server-signed ?ft=) instead of bare URL

* fix(security): tenant-scope team workspace paths + show user/tenant in topbar

- WorkspaceDir callers now use config.TenantWorkspace() to resolve
  tenant-scoped base dir (non-master tenants get workspace/tenants/{slug}/)
- Fixes: all tenants previously wrote to global /app/workspace/teams/
  without filesystem-level isolation
- Affected: loop.go (agent run), team_tasks_mutations.go (task creation)
- teams_workspace.go already correct (uses TenantTeamDir)
- UI topbar: show "userId (tenantName)" in user menu

* feat(ui): redesign task detail dialog with improved UX

- Split monolithic 343-line component into 5 focused files
- New header: subject as title, identifier + status badges above
- Metadata grid with soft bg-muted/30 background, priority icons
- Attachments: card-style with mime-type icons + proper download Button
- Description/Result: markdown rendering via MarkdownRenderer
- Comments: avatar circles + markdown rendering for content
- All sections collapsible with chevron + count badge
- Timeline: vertical dot-line pattern, collapsed by default
- Fix kanban card hover layout shift (opacity instead of display toggle)

* fix(security): tenant-scoped workspace paths and tool cache isolation

- Scope team workspace paths to tenant directory
- Add tenant isolation to tool cache and task reads
- Shell deny pattern improvements
- Agent resolver and context file tenant scoping
- Sidebar tenant/user display fix
- Add tests for workspace, boundary, and context file interceptor

* fix(ui): tenant visibility fallback, merge coming-soon, task detail tweaks

- Tenants page: try tenants.list (owner), fall back to tenants.mine
  for regular users; hide create button for non-owners
- Merge contacts dialog: add coming-soon banner (i18n en/vi/zh),
  disable form and submit button
- Task detail: collapse attachments/comments by default,
  guard download_url before rendering
2026-03-23 08:08:23 +07:00
viettranx 843b550651 feat: runtime packages UI, pkg-helper, configurable shell deny groups (#244)
Runtime package management with security hardening:

- pkg-helper: root-privileged daemon for apk install/uninstall via Unix socket
- HTTP API: /v1/packages (list/install/uninstall/runtimes), admin role required for writes
- Shell deny groups: 15 configurable groups (per-agent overrides via context)
- Packages UI: Web page for managing system/pip/npm packages with confirmation dialogs
- Docker: privilege separation (root entrypoint → su-exec drop), init for zombie reaping
- Security: umask socket creation, persist file validation, deny pattern hardening
  (Node.js fetch/http, Python from/import, curl localhost, sensitive env vars)
- Auth: empty gateway token → admin role (dev/single-user mode)
2026-03-17 19:50:26 +07:00
Viet Tran 037d18f711 docs: comprehensive audit and update of all documentation (#231)
* feat(ui): improve kanban UX, fix dialog scroll, remove delegation page

- Kanban: reorder columns (blocked after pending), show blocked-by info
  on cards, clickable blocker links in task detail, framer-motion card
  animation between columns
- Dialogs: standardize scroll pattern across all modals — header fixed,
  scrollbar flush with outer edge via negative margin trick
- Remove delegation page, types, events, i18n, routes, and all references
- Fix activity_logs NULL jsonb scan error (COALESCE)
- Board header: show text labels on action buttons (desktop)

* docs: comprehensive audit and update of all documentation

- Update Go 1.25 → 1.26, PostgreSQL 15+ → 18 across all docs
- Add 10 missing internal modules to CLAUDE.md project structure
- Expand provider docs from 2 to 6 packages (Anthropic, OpenAI, DashScope, Claude CLI, ACP, Codex)
- Add 8 missing store interfaces to data model docs (22 total)
- Update bootstrap files from 7 to 13 templates
- Expand tool inventory from ~35 to 60+ tools with media/KG/credential categories
- Fix Team Task Board: add blocked status, 3 missing actions, V2 versioning, delegate restrictions
- Remove all references to removed features: handoff, delegate_search, evaluate_loop, agent_links
- Fix lane defaults (2/4/1 → 30/50/100/30), ghost file references, models.list → providers.models
- Add SecureCLI, snapshot worker, cost calculation, pairing security docs
- Comprehensive changelog catch-up
- Trim docs/03-tools-system.md to 800-line limit
2026-03-16 22:51:57 +07:00
Goon 75c570e951 feat(security): credentialed exec + HTTP RBAC + API key cache (#197)
- Secure CLI credential injection via AES-256-GCM encrypted env vars
- API key management with fine-grained RBAC scopes
- resolveAuth/requireAuth middleware across all 25+ HTTP handlers
- In-memory API key cache with TTL, negative caching, pubsub invalidation
- Sandbox-first execution (fails if unavailable, no silent fallback)
- Credential scrubbing, constant-time token comparison, Admin-only CLI creds
- SQL migration 000020: secure_cli_binaries + api_keys tables
- 14 unit tests for cache and RBAC with race detector

Closes #197
2026-03-15 20:13:18 +07:00