8 Commits

Author SHA1 Message Date
viettranx 860bfff38f chore: minor formatting and cleanup across config, audio, cache, http, and provider modules
- config: align struct tag formatting in TelemetryConfig for readability
- audio tests: modernize test patterns
- cache, http, hooks, providers: small style and cleanup passes
- No functional changes
2026-04-23 19:16:12 +07:00
viettranx 3b4004946e chore: apply go fix syntax upgrades (min/max, switch patterns)
Apply Go 1.21+ min/max builtins and improved switch pattern matching
across test files and utilities. No functional changes.
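
To illustrate the kind of rewrite involved (a sketch only; clampOld and clampNew are hypothetical helpers, not code from this repository), a hand-written clamp collapses onto the Go 1.21 builtins:

```go
package main

import "fmt"

// clampOld is the pre-1.21 style: explicit comparisons.
func clampOld(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// clampNew uses the min/max builtins added in Go 1.21.
func clampNew(v, lo, hi int) int {
	return min(max(v, lo), hi)
}

func main() {
	for _, v := range []int{-5, 3, 42} {
		fmt.Println(clampOld(v, 0, 10), clampNew(v, 0, 10))
	}
}
```

Both helpers return the same value for any input where lo <= hi, which is why such a change carries no functional difference.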
2026-04-15 11:24:57 +07:00
viettranx 83a6b3c68f test(cache,sessions,knowledgegraph): slim wave 3 coverage push
Add Wave 3 package coverage:
- cache/permission_cache_test.go: PermissionCache 9 methods + invalidation
- sessions/key_extra_test.go: session key builders
- sessions/manager_extra_test.go: SetHistory, Save, loadAll
- knowledgegraph/extractor_helpers_test.go: Extract with mock provider, splitChunks, mergeResults
2026-04-11 21:22:23 +07:00
viettranx 8d37dc45ea feat(pipeline): per-provider context window, cache/tenant/pipeline hardening
Ship Group A (bug fixes) + Group B (pipeline enhancements) from
plans/260410-1009-openclaw-ts-feature-port — five changes that share
cmd/ wiring and pipeline plumbing so they commit as a unit.

cache: InMemoryCache now supports periodic sweep + max-size cap via
variadic options. PermissionCache wires 60s sweep + 10k entry cap +
Close() hook in gateway shutdown so long-running gateways don't leak
per-user permission entries. Backward compatible for zero-arg callers.

store: ContactCollector.seen cache key now includes tenant_id + channel_
instance so the same sender in different tenants (or different bot
instances in the same tenant) no longer silently skip upserts against
each other. Zero-tenant (Desktop) behaviour preserved.

pipeline: EffectiveContextWindow is resolved once per run in ContextStage
via a ResolveContextWindow callback (backed by providers.ModelRegistry)
so PruneStage bills history against the actual model window instead of
a stale static config. Nil resolver / unknown model fall back to
Config.ContextWindow for backward compatibility. Locked to the model
observed at context build time to prevent mid-run budget drift.
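
The fallback order can be sketched as follows. The callback signature and the helper name effectiveContextWindow are assumptions for illustration, and the registry contents are made up:

```go
package main

import "fmt"

// ResolveContextWindow reports the window for a model and whether the model
// is known. The exact signature is assumed here.
type ResolveContextWindow func(model string) (tokens int, ok bool)

// effectiveContextWindow prefers the per-model window and falls back to the
// static config value when the resolver is nil or the model is unknown.
func effectiveContextWindow(resolve ResolveContextWindow, model string, configWindow int) int {
	if resolve == nil {
		return configWindow
	}
	if n, ok := resolve(model); ok && n > 0 {
		return n
	}
	return configWindow
}

func main() {
	registry := map[string]int{"model-a": 200000} // hypothetical registry contents
	resolve := func(m string) (int, bool) {
		n, ok := registry[m]
		return n, ok
	}
	fmt.Println(effectiveContextWindow(resolve, "model-a", 32000))
	fmt.Println(effectiveContextWindow(resolve, "unknown", 32000))
	fmt.Println(effectiveContextWindow(nil, "model-a", 32000))
}
```

Calling the helper once and storing the result on the run state is what keeps the budget fixed for the rest of the run.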

pipeline: PipelineConfig.ReserveTokens carves out an optional safety
buffer subtracted from the history budget so compaction fires slightly
before the hard limit — protects against provider over-delivery and
token counter drift on streaming responses. Zero (default) preserves
legacy budget math.
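
As a rough sketch of the reserve arithmetic (historyBudget is a hypothetical helper; the real budget math may include further terms):

```go
package main

import "fmt"

// historyBudget subtracts an optional reserve from the context window.
// A zero or negative reserve leaves the budget unchanged, and the result
// never drops below zero.
func historyBudget(contextWindow, reserveTokens int) int {
	if reserveTokens <= 0 {
		return contextWindow
	}
	return max(contextWindow-reserveTokens, 0)
}

func main() {
	fmt.Println(historyBudget(128000, 0))    // default: legacy behaviour
	fmt.Println(historyBudget(128000, 4000)) // compaction triggers 4000 tokens early
}
```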

agent: ModelRegistry flows gateway → ResolverDeps → LoopConfig → Loop →
pipeline adapter so resolver can look up per-model capabilities at run
time without re-touching gateway internals.

20 regression tests across cache, contact collector, and pipeline cover
the critical paths: cross-tenant isolation, cache sweep + eviction +
Close idempotency, per-model window override + fallback, and reserve
token buffer behaviour. All passing with -race on both go build ./...
and go build -tags sqliteonly ./....
2026-04-10 12:00:34 +07:00
Viet Tran 8f56ddaa64 feat(v3): core architecture redesign — pipeline, memory, vault, evolution, providers, orchestration (#790)
* feat(v3): add core interface contracts and migration for v3 redesign

Foundation interfaces: TokenCounter, WorkspaceContext, DomainEventBus,
ProviderAdapter/Capabilities. Pipeline: Stage, RunState, MessageBuffer,
substates, Pipeline orchestrator. Memory: EpisodicStore, AutoInjector,
KG temporal extensions, consolidation workers. System integration:
PromptConfig, ToolCapability, Retriever. Orchestration: OrchestrationMode,
EvolutionMetrics/SuggestionStore. Migration 000037: episodic_summaries,
evolution tables, KG temporal columns. Schema version 36→37.

* refactor(plans): mark all v3 design phases complete with file references

* fix(v3): address code review findings on design contracts

- C1: add missing l0_abstract column to episodic_summaries migration
- C2: align EpisodicSummary ID/TenantID/AgentID to uuid.UUID
- H1: document tenant_id scoping requirement on EpisodicStore
- H2: add UNIQUE constraint on (agent_id, user_id, source_id) for dedup
- H4: clarify ProviderAdapter vs Provider relationship in doc
- M3: set state.ExitCode on BreakLoop/AbortRun in pipeline
- M6: store full PipelineConfig in Pipeline struct
- Edge: add WHERE embedding IS NOT NULL on HNSW index

* fix(v3): second-pass review fixes

- H1: use context.WithoutCancel for finalize + set ExitCode on ctx cancel
- H2: use utf8.RuneCountInString consistently in FallbackCounter
- H3: longest-prefix-match in ModelContextWindow (prevents wrong tokenizer)
- H4: return unsubscribe cleanup func from consolidation.Register
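
For H3, a longest-prefix match might look like the sketch below. The prefix table and its values are invented for illustration, and modelContextWindow is a hypothetical stand-in for ModelContextWindow:

```go
package main

import (
	"fmt"
	"strings"
)

// windows maps model-name prefixes to context windows (made-up values).
var windows = map[string]int{
	"alpha":     8000,
	"alpha-pro": 64000,
}

// modelContextWindow picks the longest matching prefix, so "alpha-pro-2"
// resolves via "alpha-pro" rather than the shorter "alpha".
func modelContextWindow(model string, fallback int) int {
	best, bestLen := fallback, -1
	for prefix, n := range windows {
		if strings.HasPrefix(model, prefix) && len(prefix) > bestLen {
			best, bestLen = n, len(prefix)
		}
	}
	return best
}

func main() {
	fmt.Println(modelContextWindow("alpha-pro-2", 4000))
	fmt.Println(modelContextWindow("alpha-2", 4000))
	fmt.Println(modelContextWindow("beta", 4000))
}
```

Tracking the matched length makes the result independent of map iteration order, which a first-match loop would not be.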

* feat(v3): implement DomainEventBus with worker pool, dedup, and retry

Worker pool processes events from buffered channel. SourceID-based dedup
prevents duplicate processing. Exponential backoff retry on handler error.
Panic recovery per handler. Graceful shutdown via Drain(). 8/8 tests pass
with race detector.

* feat(v3): implement ProviderAdapter for Anthropic, OpenAI, DashScope, Codex

Add CapabilitiesAware to all 6 providers. Create ProviderAdapter
implementations that delegate to existing buildRequestBody/parseResponse
for DRY. ClaudeCLI and ACP get capabilities only (subprocess transport).
DashScope wraps OpenAI adapter with StreamWithTools=false override.

* feat(v3): implement WorkspaceContext Resolver for 6 scenarios

Stateless resolver produces immutable WorkspaceContext at run start.
Handles personal/group/predefined/team-shared/team-isolated/delegation.
Wired into loop_context.go behind v3PipelineEnabled flag (additive,
v2 path unchanged). Includes delegation path boundary check,
master tenant bypass, and tenant slug path composition.

* feat(v3): implement tiktoken TokenCounter with BPE encoding + cache

Adds tiktoken-go for accurate cl100k_base/o200k_base token counting.
Per-message FNV-1a hash cache avoids re-encoding unchanged history.
Falls back to rune/3 heuristic for unknown models. NewTokenCounter
factory selects implementation at build time.
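
The caching and fallback ideas can be sketched without the tiktoken dependency. Here the fallback heuristic stands in for the real encoder, cachedCounter is a hypothetical type, and rounding up in the rune/3 estimate is an assumption:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"unicode/utf8"
)

// fallbackCount estimates tokens as roughly one per three runes, rounding up.
func fallbackCount(s string) int {
	return (utf8.RuneCountInString(s) + 2) / 3
}

// cachedCounter memoizes counts keyed by the FNV-1a hash of the message text.
// It is not safe for concurrent use, and a hash collision would return the
// count of a different message.
type cachedCounter struct {
	count  func(string) int
	cache  map[uint64]int
	misses int // cache misses, kept only for demonstration
}

func (c *cachedCounter) Count(msg string) int {
	h := fnv.New64a()
	h.Write([]byte(msg))
	key := h.Sum64()
	if n, ok := c.cache[key]; ok {
		return n
	}
	c.misses++
	n := c.count(msg)
	c.cache[key] = n
	return n
}

func main() {
	c := &cachedCounter{count: fallbackCount, cache: map[uint64]int{}}
	history := []string{"hello there", "how are you?", "hello there"}
	total := 0
	for _, m := range history {
		total += c.Count(m)
	}
	fmt.Println(total, c.misses)
}
```

Counting runes rather than bytes keeps the estimate stable for multi-byte text, which matches the earlier switch to utf8.RuneCountInString.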

* feat(v3): promote 12 other_config JSONB fields to dedicated agent columns

Extract emoji, agent_description, thinking_level, max_tokens,
self_evolve, skill_evolve, skill_nudge_interval, reasoning_config,
workspace_sharing, chatgpt_oauth_routing, shell_deny_groups, and
kg_dedup_config from the catch-all other_config JSONB into proper
columns with DB-level types and defaults.

- Migration: PG (000037) + SQLite (schema v6→7) with backfill
- Go: AgentData struct + simplified Parse* methods
- Store: SELECT/INSERT/scan updated for both PG and SQLite
- Gateway: create/update handlers accept promoted fields
- HTTP: export/import with legacy backward compat
- Web UI: all 15 frontend files read/write from top level

* feat(v3): implement Knowledge Vault with unified search, wikilinks, and FS sync

Migration 000038 adds vault_documents (FTS+pgvector), vault_links, vault_versions
tables. VaultStore interface with PG implementation for document CRUD, hybrid
FTS+vector search, and bidirectional link management. All queries enforce
tenant_id isolation including JOIN-based scoping on link operations.

FS sync layer: SHA-256 content hashing, VaultInterceptor hooks into write_file/
read_file for auto-registration and lazy sync, fsnotify watcher with 500ms
debounce. Wikilink engine parses [[target]] syntax, resolves targets via
3-step strategy, and maintains vault_links on write.
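
A minimal sketch of the hashing and wikilink parsing steps. The function names mirror those mentioned in this log, but the signatures, the Wikilink struct, and the regular expression are assumptions:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// wikilinkRe matches [[target]] and [[target|display text]].
var wikilinkRe = regexp.MustCompile(`\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]`)

type Wikilink struct {
	Target  string
	Display string
}

// ExtractWikilinks returns links in document order, skipping blank targets.
func ExtractWikilinks(text string) []Wikilink {
	var out []Wikilink
	for _, m := range wikilinkRe.FindAllStringSubmatch(text, -1) {
		target := strings.TrimSpace(m[1])
		if target == "" {
			continue
		}
		out = append(out, Wikilink{Target: target, Display: strings.TrimSpace(m[2])})
	}
	return out
}

// ContentHash returns the hex-encoded SHA-256 of the content, usable for
// detecting whether a file changed since it was last registered.
func ContentHash(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

func main() {
	doc := "See [[Project Plan|the plan]] and [[ Notes ]]."
	for _, l := range ExtractWikilinks(doc) {
		fmt.Printf("%q %q\n", l.Target, l.Display)
	}
	fmt.Println(ContentHash([]byte(doc)))
}
```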

VaultSearchService fans out queries across vault, episodic, and KG stores in
parallel with per-source score normalization and weighted merge. AutoInjector
and Retriever implementations for pipeline integration.

Three agent tools: vault_search (unified discovery), vault_link (explicit
linking), vault_backlinks (dependency tracing). Feature-flagged via
v3_vault_enabled agent setting.

* feat(v3): wire vault into gateway startup + add unit tests

Wire VaultStore embedding provider, VaultSearchService, VaultInterceptor
on read/write tools, and register vault_search/vault_link/vault_backlinks
tools in gateway_vault_wiring.go. All wiring gated by stores.Vault != nil.

Add 28 unit tests for ContentHash, ContentHashFile, and ExtractWikilinks
covering edge cases, unicode, display text, context windows, and offsets.

* feat(v3): implement stage-based pipeline loop with 8 pluggable stages

Decompose monolithic agent loop into internal/pipeline/ package:
- 6 stages: Context, Think, Prune+MemoryFlush, Tool, Observe+Checkpoint, Finalize
- Foundation types: Stage interface, RunState with 7 typed substates, MessageBuffer
- Pipeline orchestrator with setup/iteration/finalize 3-phase execution
- Callback-based PipelineDeps avoids circular import with agent package
- Feature-flagged via v3PipelineEnabled in Loop.Run()
- All 7 exit conditions preserved (no tools, max iter, truncation, loop kill,
  read-only streak, tool budget, ctx cancel)

* feat(v3): wire pipeline callbacks to Loop methods + add 71 unit tests

Wire 15 of 17 PipelineDeps callbacks from Loop methods via closures:
- Context: LoadContextFiles, BuildMessages, EnrichMedia, InjectReminders
- Think: BuildFilteredTools, CallLLM (stream/sync)
- Prune: PruneMessages, CompactMessages
- Memory: RunMemoryFlush
- Finalize: SanitizeContent, FlushMessages, UpdateMetadata, BootstrapCleanup, MaybeSummarize
- Remaining: ExecuteToolCall, CheckReadOnly (deep loop.go integration)

Add comprehensive test suite (71 tests, all passing with -race):
- MessageBuffer: 10 tests (append, flush, replace, counts)
- Pipeline.Run: 14 tests (3-phase flow, exit conditions, ctx cancel)
- Stage tests: 47 tests (ThinkStage nudges/truncation, PruneStage budget,
  ToolStage parallel/exit, ObserveStage content, CheckpointStage interval,
  FinalizeStage cleanup)

* feat(v3): wire remaining 2 callbacks (ExecuteToolCall, CheckReadOnly)

Complete callback wiring — 17/17 PipelineDeps callbacks now active:
- ExecuteToolCall: resolves tool name, executes via registry, processes
  result via existing processToolResult with loop detection bridge
- CheckReadOnly: delegates to checkReadOnlyStreak via bridge runState
- Bridge runState shares loop detection state between pipeline and agent

* fix(v3): eliminate data race in tool execution + capture injected messages

- Remove parallel tool execution path — serialize all tool calls to avoid
  data races on shared bridgeRS (loop detector, media results, deliverables)
- Loop kill checked after each tool (mid-batch early exit)
- BuildFilteredTools: capture and append injected tool-awareness messages
- Rename test to reflect sequential execution

* feat(v3): wire ResolveWorkspace, safe parallel tools, ContextStage tests

- Wire ResolveWorkspace callback via workspace.NewResolver() with
  ResolveParams from Loop fields (no longer a nil stub)
- Re-add safe parallel tool execution: split into ExecuteToolRaw
  (parallel I/O) + ProcessToolResult (sequential state mutation)
  with opaque rawData pass-through (no double execution)
- Add 12 unit tests for ContextStage (8) + MemoryFlushStage (3)
- Split tool callbacks to loop_pipeline_tool_callbacks.go (under 200 lines)
- Capture buildFilteredTools injected messages
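
The split between parallel I/O and sequential state mutation can be sketched as below. The types and the runTools helper are hypothetical; only the two-phase shape follows the description above:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type ToolCall struct{ Name, Arg string }

type rawResult struct {
	call ToolCall
	data string
}

// executeToolRaw does the I/O-bound work and touches no shared state.
func executeToolRaw(c ToolCall) rawResult {
	return rawResult{call: c, data: strings.ToUpper(c.Arg)}
}

// runState is shared and must only be mutated from one goroutine.
type runState struct {
	log []string
}

func (s *runState) processToolResult(r rawResult) {
	s.log = append(s.log, r.call.Name+"="+r.data)
}

// runTools executes every call concurrently, then applies the results to the
// shared state one at a time in the original call order.
func runTools(calls []ToolCall, state *runState) {
	raws := make([]rawResult, len(calls))
	var wg sync.WaitGroup
	for i, c := range calls {
		wg.Add(1)
		go func(i int, c ToolCall) {
			defer wg.Done()
			raws[i] = executeToolRaw(c) // each goroutine writes only its own slot
		}(i, c)
	}
	wg.Wait()
	for _, r := range raws {
		state.processToolResult(r)
	}
}

func main() {
	state := &runState{}
	runTools([]ToolCall{{"read_file", "a.txt"}, {"web_fetch", "example"}}, state)
	fmt.Println(state.log)
}
```

Because each raw result is passed through to the sequential phase, no tool runs twice and the shared state never sees concurrent writers.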

* feat(v3): add episodic memory store + temporal KG columns

Phase 1 — Episodic Store:
- Migration 000039: episodic_summaries table with pgvector, FTS, L0 abstracts
- EpisodicStore PG impl: CRUD, hybrid FTS+vector search, ExistsBySourceID,
  PruneExpired. Idempotent via source_id UNIQUE constraint.

Phase 2 — Temporal KG:
- Migration 000040: valid_from/valid_until on kg_entities + kg_relations,
  partial indexes for current-facts queries, epoch→timestamptz backfill
- ListEntitiesTemporal: current-only, point-in-time, or include-expired modes
- SupersedeEntity: atomic expire-old + insert-new in single transaction

Schema version bumped to 40.

* fix(v3): review fixes for episodic store + temporal KG

- C1: Fix column name mismatch turn_count vs message_count in Go SQL
- C2: Remove redundant migration 000040 (000037 already adds temporal KG columns)
- H1: Use time.Time not int64 for TIMESTAMPTZ columns in SupersedeEntity
- H2: Add tenant_id scoping to Get/Delete for tenant isolation
- M2: Fix scanEntityTemporal to convert TIMESTAMPTZ→UnixMilli correctly
- L1: Remove unused uuid import from episodic_search.go
- Schema version corrected to 39 (only 000039 is new)

* feat(v3): implement consolidation pipeline with 3 event-driven workers

Event chain: session.completed → EpisodicWorker → episodic.created →
SemanticWorker → entity.upserted → DedupWorker

- EpisodicWorker: reuses compaction summary or calls LLM, generates L0
  abstract (extractive), idempotent via source_id check
- SemanticWorker: extracts KG facts from episodic summary via existing
  Extractor, sets temporal valid_from, publishes entity.upserted
- DedupWorker: runs DedupAfterExtraction on new entity IDs (terminal)
- L0 abstract: sentence-based extraction (~50 tokens), no LLM needed
- All workers registered via DomainEventBus.Subscribe()

* feat(v3): implement progressive loading with L0 auto-inject + unified search

- AutoInjector: searches episodic store, builds L0 prompt section (~200 tokens),
  skips trivial messages via stopword filter
- L1Cache: in-memory LRU (500 entries, 1h TTL) for structured overviews
- UnifiedSearch: cross-tier search merging episodic + document results by score
- ContextStage integration: AutoInject callback appends memory section to system prompt
- MemorySection field added to ContextState for observability

* feat(v3): add memory_expand tool for L2 episodic retrieval

New tool: memory_expand(id) returns full episodic summary with metadata.
Complements memory_search L0/L1 results with deep L2 access.
Nil-safe: returns error message when episodic store not available.

Gateway wiring + memory_search depth param + kg_search temporal param
deferred to runtime integration phase.

* feat(v3): complete Phase 5 — tool extensions + gateway wiring

- memory_search: add depth param + episodic tier search merged with docs
- kg_search: add as_of temporal param, use ListEntitiesTemporal
- memory_expand: registered in gateway startup
- Gateway: Episodic field in Stores, PGEpisodicStore in factory,
  embedding provider wired, tools connected to episodic store

* fix(v3): Phase 3 review fixes — tenant isolation + AutoInject args

- C1: Add tenant_id filter to ftsSearch, vectorSearch, List queries
  (prevents cross-tenant episodic memory leaks)
- C2: Fix AutoInject callback signature — agent/tenant captured by
  closure, only userMessage + userID passed explicitly
- H1: Add tenant_id to List query

* feat(v3): wire per-agent v3 flags from DB into dual-mode gate

Parse v3_pipeline_enabled, v3_memory_enabled, v3_retrieval_enabled from
agent other_config JSONB via ParseV3Flags(). Resolver now sets all flags
on LoopConfig so the existing gate in loop_run.go reads from DB.

- V3Flags struct + ParseV3Flags() + ValidateV3Flags() in store layer
- v3MemoryEnabled/v3RetrievalEnabled added to Loop, LoopConfig, PipelineConfig
- Auto-inject gated on V3RetrievalEnabled (was unconditional)
- Structured perf logging for v3 pipeline runs
- v3 flag validation on both WS agent.update and HTTP PUT endpoints

* feat(v3): wire AutoInjector into pipeline for L0 memory auto-inject

Create AutoInjector at gateway startup from episodic store, pass through
ResolverDeps → LoopConfig → Loop. Pipeline adapter builds AutoInject
callback capturing agent/tenant context via closure.

ContextStage already gates on V3RetrievalEnabled + AutoInject != nil.

* feat(v3): add tool metadata map + capability-based deny rules

Registry gains per-tool ToolMetadata map with RegisterWithMetadata()
and GetMetadata() (infers defaults from tool name when not explicit).
PolicyEngine gains DenyCapability() for RBAC integration — tools with
denied capabilities filtered at step 8 after existing 7-step pipeline.

* fix(v3): add RWMutex to PolicyEngine capability deny fields

DenyCapability() and SetRegistry() now guarded by sync.RWMutex.
FilterTools reads snapshot under RLock. Prevents data race when
capability rules are modified concurrently with tool filtering.
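
A sketch of the locking scheme, assuming a simplified PolicyEngine that tracks only denied capabilities (the Tool struct and NewPolicyEngine are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
)

type Tool struct {
	Name       string
	Capability string
}

type PolicyEngine struct {
	mu     sync.RWMutex
	denied map[string]bool
}

func NewPolicyEngine() *PolicyEngine {
	return &PolicyEngine{denied: map[string]bool{}}
}

// DenyCapability takes the write lock because it mutates the rule set.
func (p *PolicyEngine) DenyCapability(capability string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.denied[capability] = true
}

// FilterTools copies the rules under the read lock, then filters against the
// snapshot so the lock is not held while iterating over tools.
func (p *PolicyEngine) FilterTools(tools []Tool) []Tool {
	p.mu.RLock()
	snapshot := make(map[string]bool, len(p.denied))
	for k := range p.denied {
		snapshot[k] = true
	}
	p.mu.RUnlock()

	var allowed []Tool
	for _, t := range tools {
		if !snapshot[t.Capability] {
			allowed = append(allowed, t)
		}
	}
	return allowed
}

func main() {
	p := NewPolicyEngine()
	tools := []Tool{{"read_file", "read"}, {"shell", "exec"}}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); p.DenyCapability("exec") }()
	go func() { defer wg.Done(); _ = p.FilterTools(tools) }() // safe alongside the writer
	wg.Wait()

	fmt.Println(p.FilterTools(tools))
}
```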

* feat(v3): implement delegate tool for inter-agent task delegation

New `delegate` tool wraps existing agent_links infrastructure
(CanDelegate, DelegateTargets). Supports async (fire-and-forget)
and sync (block with timeout) modes. Permission checked via
AgentLinkStore. Events emitted: delegate.sent/completed/failed.

DelegateRunFunc injected by gateway to avoid circular dependency.

* feat(v3): complete 3 deferred implementations

1. OrchestrationMode resolution: ResolveOrchestrationMode() checks
   team membership → delegate links → spawn (priority order).

2. PG EvolutionMetricsStore: RecordMetric, QueryMetrics, aggregate
   tool/retrieval metrics, TTL cleanup. All queries tenant-scoped.

3. BridgePromptBuilder: implements PromptBuilder interface by
   delegating to existing BuildSystemPrompt(). Appends v3 memory
   L0 section when enabled. Ready for template engine swap later.

* fix(v3): address code review findings on commits 5-6

- C1: CanDelegate now tenant-scoped (fail-closed on missing tenant)
- H1: Sync delegate timeout capped at 600s
- H2: Async goroutine gets 10min deadline (prevents leaks)
- H3: JSONB casts use COALESCE/NULLIF guards (handles missing fields)
- M1/M2: Remove dead code (formatVaultSection, memoryL0ToStrings)

* fix(teams): stop auto-creating agent_links for team members

Teams use agent_team_members table directly — agent_links caused
context confusion between team dispatch and delegation systems.

- Remove autoCreateTeamLinks() calls from team create + member add
- Remove link cleanup from member remove
- Remove dead autoCreateTeamLinks() function
- Append DELETE to migration 000039: clear team-created agent_links

* fix(v3): tenant isolation for all agent_links queries + PromptBuilder Instructions

- DelegateTargets, GetLinkBetween, SearchDelegateTargets,
  SearchDelegateTargetsByEmbedding, DeleteTeamLinksForAgent all now
  scoped by tenant_id (fail-closed on missing tenant)
- BridgePromptBuilder now maps Instructions/InstructionContent to
  AGENTS.md context file (was silently dropped)

* feat(v3): wire orchestration mode + evolution metrics into agent loop

- Orchestration mode: resolver resolves mode from team/links, tool filter
  hides delegate/team_tasks based on mode, prompt builder injects delegation
  targets section
- Evolution metrics: non-blocking goroutine records tool execution metrics
  (name, success, duration) via EvolutionMetricsStore in both v2 loop and
  v3 pipeline paths (sequential + parallel)
- Fix review findings: tenant ID propagated via store.WithTenantID in
  background goroutine, 5s timeout prevents goroutine leak

* feat(v3): implement suggestion engine with pluggable analysis rules

- PG EvolutionSuggestionStore: CRUD for agent_evolution_suggestions table
- SuggestionEngine: aggregates 7-day metrics, runs rules, deduplicates
  pending suggestions per type before creating new ones
- 3 initial rules: LowRetrievalUsage (usage_rate<0.2), ToolFailure
  (success_rate<0.1), RepeatedTool (>100 calls/week → suggest skill)
- EventSuggestionCreated event type added to eventbus
- Cron wiring deferred to gateway startup integration pass
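
The three rules and the pending-suggestion dedup can be sketched as follows. The thresholds come from the list above; WeeklyMetrics, Rule, and suggest are hypothetical, and the metrics are flattened to one struct for brevity:

```go
package main

import "fmt"

type WeeklyMetrics struct {
	RetrievalUsageRate float64
	ToolSuccessRate    float64
	ToolCalls          int
}

type Rule struct {
	Name  string
	Fires func(WeeklyMetrics) bool
}

var rules = []Rule{
	{"LowRetrievalUsage", func(m WeeklyMetrics) bool { return m.RetrievalUsageRate < 0.2 }},
	{"ToolFailure", func(m WeeklyMetrics) bool { return m.ToolSuccessRate < 0.1 }},
	{"RepeatedTool", func(m WeeklyMetrics) bool { return m.ToolCalls > 100 }},
}

// suggest returns the rule names that fire and have no pending suggestion
// yet, recording each new one in pending so a second run creates nothing.
func suggest(m WeeklyMetrics, pending map[string]bool) []string {
	var out []string
	for _, r := range rules {
		if r.Fires(m) && !pending[r.Name] {
			pending[r.Name] = true
			out = append(out, r.Name)
		}
	}
	return out
}

func main() {
	pending := map[string]bool{}
	m := WeeklyMetrics{RetrievalUsageRate: 0.1, ToolSuccessRate: 0.5, ToolCalls: 150}
	fmt.Println(suggest(m, pending)) // first run creates suggestions
	fmt.Println(suggest(m, pending)) // second run is deduplicated
}
```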

* feat(v3): implement auto-adapt guardrails with apply/rollback

- AdaptationGuardrails: max delta per cycle, min data points, locked
  params, rollback-on-drop percentage
- ApplySuggestion: applies threshold suggestions to agent other_config
  JSONB, stores baseline for rollback
- RollbackSuggestion: restores baseline values from suggestion params
- EvaluateApplied: compares post-apply metrics to baseline, auto-rolls
  back when quality drops beyond threshold
- Scope limited to retrieval params only (never security settings)

* feat(v3): wire evolution stores + daily/weekly cron for suggestions

- Add EvolutionMetrics + EvolutionSuggestions to Stores struct + PG factory
- Wire EvolutionMetricsStore into ResolverDeps (cmd/gateway_managed.go)
- Add gateway_evolution_cron.go: daily suggestion analysis + weekly
  evaluation/rollback for applied suggestions
- Cron runs as background goroutine with 5-min timeout per cycle

* fix(v3): address code review findings on evolution engine

- C1: persist baseline parameters before marking suggestion as applied
  (was building map but never saving — rollback would always fail)
- H1: add tenant_id isolation to UpdateSuggestionStatus, GetSuggestion,
  and new UpdateSuggestionParameters method

* test(v3): add unit tests for orchestration, suggestions, guardrails, prompt

- orchestration_mode_test: orchModeDenyTools (4 modes) + ResolveOrchestrationMode
  (4 scenarios with mock stores)
- suggestion_rules_test: LowRetrievalUsage, ToolFailure, RepeatedTool with
  threshold boundary tests (at/below/above min data points)
- evolution_guardrails_test: DefaultGuardrails values + CheckGuardrails
  (insufficient data, locked params, zero-min fallback)
- prompt_builder_orchestration_test: BridgePromptBuilder orchestration section
  presence/absence across 4 scenarios + target content verification

* test(v3): add integration tests for evolution metrics + suggestions

- Test helper: shared PG connection with sync.Once migration, per-test
  tenant+agent seed with cleanup
- Evolution metrics: RecordMetric, AggregateToolMetrics (success rate),
  Cleanup (TTL deletion)
- Evolution suggestions: full CRUD, UpdateSuggestionParameters (baseline
  persist), tenant isolation (cross-tenant read blocked)
- Pipeline E2E: seed 25 failed tools + 55 low-usage retrievals, verify
  SuggestionEngine creates suggestions, verify dedup on second run
- Fix: migration 039 de-duped (episodic_summaries already in 037)
- Fix: NULL reviewed_by scan via sql.NullString

* feat(v3): add HTTP API handlers for evolution, vault, episodic, orchestration, v3-flags

5 new handler files exposing v3 backend stores as REST endpoints:
- evolution_handlers.go: metrics query/aggregate + suggestions CRUD
- vault_handlers.go: cross-agent document listing + search + links
- episodic_handlers.go: episodic summaries list + hybrid search
- orchestration_handlers.go: computed mode + delegate targets (read-only)
- v3_flags_handlers.go: per-agent v3 feature flag get/toggle

Store fixes from code review:
- episodic FTS: use inline to_tsvector (no stored tsv column)
- episodic: conditional user_id filter in List + Search (admin view)
- episodic: add tenant_id to ExistsBySourceID + PruneExpired
- evolution: require tenant_id in context (no struct fallback)
- evolution: check RowsAffected on suggestion updates
- vault: optional agent_id filter in ListDocuments (cross-agent)

* feat(v3): add web UI for evolution tab, v3 settings, vault page, episodic memory

Agent Detail enhancements:
- V3 Settings section: pipeline/memory/retrieval flag toggles
- Orchestration section: mode badge + delegate targets display
- Evolution section: added metrics + suggestions v3 flag toggles
- Evolution tab: Recharts metrics charts + suggestion review table
  with approve/reject/rollback actions + guardrails card

New pages:
- /vault: Knowledge Vault document registry with cross-agent listing,
  hybrid search dialog, document detail with wikilinks
- Memory page: added Episodic Memory tab with summary cards,
  expandable details, key topic badges, and hybrid search

Infrastructure:
- HttpClient: added patch() method
- Query keys: v3Flags, orchestration, evolution namespaces
- 6 new hooks: use-v3-flags, use-orchestration, use-evolution-metrics,
  use-evolution-suggestions, use-vault, use-episodic
- i18n: vault namespace (en/vi/zh), agents + memory keys updated
- Reused formatRelativeTime from lib/format.ts (eliminated 3 duplicates)

* refactor(http): add bindJSON helper and migrate all decode call sites

Replace 36 json.NewDecoder(r.Body).Decode + error blocks with bindJSON
across 20 HTTP handler files. Standardizes decode error responses to
structured writeError format. Fixes unchecked decode in handleIndexAll.

* refactor(store): adopt sqlx for PG scan operations (Phase 1+2)

Add jmoiron/sqlx v1.4.0 with camelToSnake json tag mapper.
Migrate scan-heavy PG store methods to sqlx Get/Select:
- tracing.go: GetTrace, ListTraces, ListChildTraces, GetTraceSpans, GetCostSummary
- heartbeat.go: Get, ListDue, ListLogs
- providers.go: GetProvider, GetProviderByName, ListProviders, ListAllProviders
- mcp_servers.go: GetServer, GetServerByName, ListServers
- pairing.go: ListPending, ListPaired
- agents_export_queries.go: 5 export functions
- agents_export_team_queries.go: exportTeamMembers, ExportAgentLinks

All writes (INSERT/UPDATE/DELETE), execMapUpdate, and dynamic WHERE
builders remain raw SQL. Zero behavior change.

* refactor(store): adopt sqlx for SQLite scan operations (Phase 3)

Migrate SQLite store scan methods to sqlx Get/Select:
- providers.go: GetProvider, GetProviderByName, ListProviders, ListAllProviders
- tenants.go: GetTenant, GetTenantBySlug, ListTenants, GetTenantUser, ListUsers, ListUserTenants
- mcp_servers.go: GetServer, GetServerByName, ListServers

Create sqlx_scan_structs.go with sqliteTime-aware scan structs
(providerRow, tenantRow, tenantUserRow, mcpServerRow) to handle
SQLite TEXT timestamp parsing via StructScan.

* refactor(store): migrate PG bulk scan operations to sqlx (Phase 4)

Migrate scan-heavy methods across 6 PG store files:
- tenant_store.go: GetTenant, GetTenantBySlug, ListTenants, GetTenantUser,
  ListUsers, ListUserTenants — removed 3 scan helpers
- teams.go: ListTeams, GetTeam, ListMembers, ListMembersByTenant
- teams_tasks_activity.go: ListComments, ListEvents, ListFollowUps
- pending_message_store.go: ListPending, ListByHistoryKey
- skills_grants.go: ListAgentGrants
- config_permissions.go: CheckPermission

~20 scan ops converted. Files with encryption post-processing,
pq.Array, pgvector, or dynamic SQL kept raw.

* refactor(store): extract shared CamelToSnake mapper, add UUIDArray usage note

- Move camelToSnake to internal/store/column_mapper.go (DRY)
- Both pg and sqlitestore packages now import shared CamelToSnake
- Add planned-use comment on UUIDArray type

* refactor(cli): migrate commands from config.json to HTTP API, add providers/setup/TUI

- Add unified HTTP client (gateway_http_client.go) with auth, error parsing, typed generics
- Rewrite agent list/add/delete to use gateway HTTP API instead of config.json
- Rewrite channels list to HTTP API, add channels add/delete subcommands
- Replace models command with full providers CRUD (list/add/update/delete/verify)
- Add setup wizard command (provider → agent → channel post-onboard flow)
- Add Bubble Tea TUI behind build tag (tui/!tui with noop fallback)
- Update onboard next-steps to mention goclaw setup
- Add build-tui Makefile target
- Fix URL path injection (url.PathEscape on all user-supplied path segments)
- Fix UTF-8 truncation in skills description display
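
The path-injection fix can be illustrated with url.PathEscape from the standard library. The agentURL helper and the /v1/agents/ route are hypothetical:

```go
package main

import (
	"fmt"
	"net/url"
)

// agentURL builds a request URL for one agent. Escaping the ID keeps a value
// such as "a/b" inside a single path segment instead of adding a new one.
func agentURL(base, agentID string) string {
	return base + "/v1/agents/" + url.PathEscape(agentID)
}

func main() {
	fmt.Println(agentURL("http://localhost:8080", "my-agent"))
	fmt.Println(agentURL("http://localhost:8080", "a/b"))
}
```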

* refactor(store): add explicit db struct tags, fix sqlx mapper for heartbeat scan error

Switch sqlx mapper from NewMapperFunc("json", CamelToSnake) (which only
applies CamelToSnake to field names, not tag values) to
NewMapperFunc("db", CamelToSnake) with explicit db:"column_name" tags on
all store structs.

Root cause: NewMapperFunc("json", fn) sets mapFunc but not tagMapFunc,
so camelCase json tags like "agentId" were used as-is instead of being
converted to "agent_id", causing "missing destination name" scan errors.

Fix: use db struct tags as the source of truth for column mapping.
Every DB entity field gets db:"column_name", nested JSON configs and
runtime-only structs get db:"-".

* test(store): add integration tests for 13 store interfaces (70 tests)

Cover Tier 1 (critical) + Tier 2 (security) stores with integration tests
running against pgvector pg18. Coverage from 2.4% to ~54%.

Stores tested: Session, Agent, Team/Task, Memory, KnowledgeGraph, Vault,
MCP Server, API Key, ConfigPermission, Contact.

Infrastructure: fixture builders (seedTeam, seedMCPServer, etc.),
mock EmbeddingProvider, multi-tenant helpers, expanded cleanup.

* fix(store): resolve NULL scan bugs in MCP server and task metadata

- mcp_servers: COALESCE nullable TEXT columns (display_name, command,
  url, api_key, tool_prefix) to prevent sqlx scan failures
- mcp_servers_access: COALESCE nullable JSONB columns in ListAgentGrants
  (tool_allow, tool_deny, config_overrides) to prevent silent row drops
- teams_tasks: default task metadata to '{}' instead of nil to satisfy
  NOT NULL constraint on CreateTask
- sqlx_helpers: export InitSqlx for integration test setup

* feat(pipeline): fix v3 pipeline context injection, tracing, KG temporal filters

- Pipeline context: add InjectContext + LoadSessionHistory callbacks to
  ContextStage, propagate enriched ctx via state.Ctx for iteration stages
- Pipeline tracing: wrap makeCallLLM with emitLLMSpanStart/End, wrap
  makeExecuteToolCall/Raw with emitToolSpanStart/End
- Token counter: switch pipeline from FallbackCounter to TiktokenCounter
- KG temporal: add valid_until IS NULL filter to all entity/relation
  queries (list, search, vector, FTS, traversal CTE, stats)
- Skills: add SkillEmbedder interface for future hybrid BM25+vector search
- Cache: remove unused tenantResolve dead code from PermissionCache
- Store: fix NULL scan bugs in tracing metadata and agent skill_nudge
- Test: add TestStoreKG_TemporalFilter integration test
- UI: add v3 version badge, evolution section, memory/traces improvements

* refactor(store): migrate KG store from raw sql.Rows to sqlx StructScan

Migrate 6 knowledge graph store files from manual rows.Scan() to
pkgSqlxDB.GetContext/SelectContext with intermediate scan row structs.

- Add entityRow, relationRow, traversalRow, dedupCandidateRow structs
  with json.RawMessage for jsonb and time.Time for timestamptz columns
- Add toEntity()/toRelation() converters (UnixMilli + json.Unmarshal)
- Add sqlxTx() helper for wrapping *sql.Tx with sqlx mapper
- Fix ScanDuplicates passing time.Now().Unix() to TIMESTAMPTZ column
- Fix ListEntitiesTemporal missing tenant scope (scopeClause)
- Fix SupersedeEntity missing tenant scope and tenant_id on INSERT
- Fix DedupCandidate.CreatedAt using Unix() instead of UnixMilli()
- Update agents_export_queries.go to reuse new scan row structs
- Net -160 lines of manual scan boilerplate removed

* refactor(store): migrate memory, skills, agents, sessions, mcp, cron, vault stores to sqlx

Batch migration of 19 store files from raw rows.Scan() to
pkgSqlxDB.GetContext/SelectContext with intermediate scan row structs.

Groups migrated:
- Memory: memory_docs, memory_admin, memory_search, memory_embedding_cache
- Episodic: episodic_search, episodic_summaries
- Skills: skills, skills_admin, skills_embedding, skills_export_queries
- Agents: agents (backfill+shares), agents_context, agents_export_team_standalone
- Sessions: sessions_list (List, ListPaged, ListPagedRich)
- MCP: mcp_servers_access, mcp_export_queries
- Cron: cron_exec (GetRunLog)
- Vault: vault_documents (ListDocuments, ftsSearch, vectorSearch)
- Tenant: tenant_configs (ListDisabled, ListAll)

7 new scan row files created. Net -510 lines of manual scan boilerplate.
INSERT/UPDATE/DELETE and scalar COUNT queries kept as raw SQL.

* fix(store): fix 3 sqlx scan struct db tag issues found by audit

- Fix vault FTS alias mismatch: `AS rank` → `AS score` (critical: runtime scan error)
- Fix episodic key_topics type: json.RawMessage → pq.StringArray (TEXT[] column)
- Fix agentShareRow.CreatedAt: string → time.Time, wire to output struct

* feat(providers): implement Wave 2 provider resilience and intelligence

9-phase implementation covering:
- Request middleware chain with composable body transformers
- OpenAI prompt caching, service tier, and fast mode middlewares
- Error classification (9 categories) with two-tier failover
- Model registry with forward-compat resolvers (Anthropic + OpenAI)
- Embedding providers (OpenAI + Voyage) with 1536-dim validation
- Cooldown/probe system with per-provider:model state tracking
- Markdown-aware chunking shared across 5 channels
- Session recall via FTS + pgvector on episodic summaries
- Dreaming/promotion pipeline for long-term memory consolidation

Migrations: 000040 (episodic search index), 000041 (promoted_at column)
Schema version: 39 → 41

* feat(providers): wire model registry into gateway provider construction

Create InMemoryRegistry with Anthropic + OpenAI forward-compat resolvers
at gateway startup. Pass to all Anthropic and OpenAI providers created
from both config and DB sources.

* feat(consolidation): wire DomainEventBus and consolidation pipeline

Create DomainEventBus at gateway startup, thread through resolver →
LoopConfig → Loop → PipelineDeps. Emit session.completed event after
each run finalization. Register consolidation pipeline (episodic →
semantic → KG dedup → dreaming) with event bus subscriptions.

* fix(store): fix episodic key_topics pq.Array, ON CONFLICT, and migration 040 immutability

- episodic_summaries.go Create: json.Marshal(KeyTopics) → pq.Array (text[] column)
- episodic_search.go scanEpisodic/scanEpisodicRow: json.RawMessage → pq.StringArray
- episodic_summaries.go Create: ON CONFLICT add WHERE source_id IS NOT NULL for partial index
- migration 040: add immutable_array_to_string wrapper (array_to_string is STABLE in PG)

* test(store): add 17 integration tests for skills, cron, episodic, tenant configs

- Skills store: 6 tests (CRUD, grants, tenant isolation)
- Cron store: 4 tests (job CRUD, run log sqlx scan, pagination, tenant isolation)
- Episodic store: 4 tests (summary CRUD, list, FTS search, tenant isolation)
- Tenant configs: 3 tests (tool/skill disable, list, tenant isolation)
- Test helper: add cleanup for skills, cron, episodic tables

* fix(permissions): use cron-specific permission check for cron tool (#725)

* fix(security): harden exec path exemption matching (#721)

- Add absolute path exemption for dataDir/skills-store/ (fixes skill
  scripts using absolute paths like /app/data/skills-store/ being denied)
- Strip surrounding quotes before prefix matching (LLMs often quote paths)
- Reject path traversal ("..") in exempt fields to prevent escape
- Switch from "any field exempt → skip" to per-field matching: only exempt
  if ALL fields that match the deny pattern are individually exempt
- Closes pipe/comment bypass vectors where an exempt path in one argument
  would exempt the entire command including non-exempt paths

Includes 27 test cases covering: legitimate access, quoted paths,
path traversal, unicode bypass, pipe/comment bypass, mixed args.
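
The per-field rule above can be sketched as follows. This is a minimal illustration, not the shipped matcher: `fieldExempt`, `commandExempt`, and the substring-based deny check are hypothetical stand-ins (the real deny patterns are likely regexes).

```go
package main

import (
	"fmt"
	"strings"
)

// fieldExempt reports whether one command field is covered by an exempt
// prefix. Surrounding quotes are stripped first, and any ".." is rejected.
func fieldExempt(field string, exemptPrefixes []string) bool {
	f := strings.Trim(field, `"'`)
	if strings.Contains(f, "..") {
		return false
	}
	for _, p := range exemptPrefixes {
		if strings.HasPrefix(f, p) {
			return true
		}
	}
	return false
}

// commandExempt is true only when every field that matches the deny pattern
// is individually exempt. One exempt argument never covers the others.
// Assumption: the deny pattern is modelled as a plain substring.
func commandExempt(fields []string, denySubstr string, exemptPrefixes []string) bool {
	matched := false
	for _, f := range fields {
		if !strings.Contains(f, denySubstr) {
			continue
		}
		matched = true
		if !fieldExempt(f, exemptPrefixes) {
			return false
		}
	}
	return matched
}

func main() {
	exempt := []string{"/app/data/skills-store/"}
	fmt.Println(commandExempt([]string{"cat", `"/app/data/skills-store/a.py"`}, "/app/data", exempt))
	fmt.Println(commandExempt([]string{"cat", "/app/data/skills-store/a.py", "/app/data/secrets"}, "/app/data", exempt))
}
```

The second call mixes an exempt and a non-exempt path, which is the case the old "any field exempt" rule let through.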

* fix(permissions): use cron-specific permission check for cron tool

Cron tool was hardcoded to check `file_writer` configType via
CheckFileWriterPermission(), ignoring the `cron` configType that
the UI actually saves when granting cron permissions. This caused
agents in group chats to be denied cron access even with correct
permission configured.

Add ConfigTypeCron constant and CheckCronPermission() that checks
`cron` configType first, falling back to `file_writer`.
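
A rough sketch of the fallback order described above. Only `ConfigTypeCron` and `CheckCronPermission` are named in this commit; the `grants` type, the `ConfigTypeFileWriter` constant, and the function signature are assumptions made for illustration.

```go
package main

import "fmt"

const (
	ConfigTypeCron       = "cron"
	ConfigTypeFileWriter = "file_writer" // hypothetical constant name
)

// grants maps configType -> user ID -> allowed (hypothetical storage shape).
type grants map[string]map[string]bool

func (g grants) allowed(configType, userID string) bool {
	// Indexing a missing inner map yields false, so unknown types deny.
	return g[configType][userID]
}

// CheckCronPermission checks the cron configType first and only then
// falls back to file_writer, so older grants keep working.
func CheckCronPermission(g grants, userID string) bool {
	if g.allowed(ConfigTypeCron, userID) {
		return true
	}
	return g.allowed(ConfigTypeFileWriter, userID)
}

func main() {
	g := grants{ConfigTypeCron: {"alice": true}, ConfigTypeFileWriter: {"bob": true}}
	fmt.Println(CheckCronPermission(g, "alice"), CheckCronPermission(g, "bob"), CheckCronPermission(g, "carol"))
}
```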

---------

Co-authored-by: Viet Tran <viettranx@gmail.com>

* fix(chat): load message history on first conversation click (#730)

* fix(chat): load message history when selecting existing conversation from clean state

The skipNextHistoryRef was unconditionally set when sessionKey transitioned
from empty to non-empty. This prevented loadHistory() from running when
clicking an existing conversation from the initial /chat page. The skip
was only intended for the new-chat send flow where the optimistic message
is already displayed.

Guard the skip with expectingRunRef so it only activates when a message
send is in flight.

Closes #729

* docs: add UI diff evidence for PR #730

Before/after screenshots and HTML comparison report showing
first conversation click behavior fix.

* feat(whatsapp): port native WhatsApp channel with whatsmeow from dev

Cherry-pick 0db1e93a with manual conflict resolution:
- cmd/channels_cmd.go: kept dev-v3 HTTP API approach (not config-based)
- go.mod/go.sum: merged deps, ran go mod tidy

* feat(ui): v3 web UI enhancement — branded loading, rich markdown, vault graph, sidebar polish

- Branded loading: HTML pre-loader with logo pulse/shimmer, fade-out on app ready, PageLoader logo swap
- Rich markdown: wikilinks, mermaid (lazy-loaded), math/KaTeX, callouts/admonitions plugins
- Vault graph: force-directed document graph view with table/graph toggle
- Agent filters: type filter (open/predefined) on agents page
- Sidebar: tenant name + role badge in footer
- Query keys: add vault + episodic entries

* fix(ui): address code review — mermaid XSS, Safari compat, cache key

- Change mermaid securityLevel from "loose" to "strict" (XSS prevention)
- Add requestIdleCallback fallback for Safari < 17
- Fix vault all-links query key to use sorted doc IDs (stale cache fix)
- Remove dead abortRef from MermaidBlock
- Add prefers-reduced-motion to loader animation

* docs: update CLAUDE.md with v3 architecture + complete changelog

- Add 10 new v3 internal modules to project structure (pipeline, eventbus, consolidation, tokencount, vault, workspace, etc.)
- Add native WhatsApp channel, edition system, providerresolve, updater to structure
- Update Key Patterns with 8 v3-specific patterns (pipeline, eventbus, 3-tier memory, vault, evolution, orchestration, middleware)
- Add comprehensive V3 Redesign section to changelog covering:
  - 8-stage pipeline with dual-mode gate
  - DomainEventBus + consolidation workers
  - 3-tier memory (working/episodic/semantic)
  - Knowledge Vault with wikilinks
  - Self-evolution engine
  - Orchestration + delegate tool
  - WorkspaceContext resolver
  - ModelRegistry + provider adapter
  - Feature flags
  - Request middleware
  - sqlx migration + 70+ integration tests

* fix(security): harden file path validation, tenant isolation, and tool access

- handleSign: validate path within workspace/dataDir before signing HMAC token
- handleSign: enforce tenant-scoped restriction for RBAC-enabled editions
- handleServe: add workspace/dataDir boundary check for ft= signed requests
- handleServe: remove cross-tenant findInWorkspace fallback for ft= requests
- TenantDataDir/TenantWorkspace: guard against empty slug resolving to parent
- exec tool: add tenants/ to AllowPathExemptions for tenant skill execution
- list_files: add AllowPaths support and wire skills directory access
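
For reference, the boundary check and the empty-slug guard listed above could look roughly like this. `withinRoot` and `tenantDir` are hypothetical helpers, and the `tenants/` layout is assumed from the exemption path mentioned in the list.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// withinRoot reports whether path, once cleaned, stays inside root.
// Comparing the relative path avoids prefix tricks like /data/ws vs /data/wsx.
func withinRoot(root, path string) bool {
	rel, err := filepath.Rel(filepath.Clean(root), filepath.Clean(path))
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// tenantDir joins base/tenants/slug and refuses an empty slug, which would
// otherwise resolve to the shared parent directory.
func tenantDir(base, slug string) (string, error) {
	if strings.TrimSpace(slug) == "" {
		return "", errors.New("empty tenant slug")
	}
	return filepath.Join(base, "tenants", slug), nil
}

func main() {
	fmt.Println(withinRoot("/data/ws", "/data/ws/reports/a.txt"))
	fmt.Println(withinRoot("/data/ws", "/data/ws/../other/a.txt"))
	_, err := tenantDir("/data", "")
	fmt.Println(err)
}
```

A signing handler would run a check like `withinRoot` before issuing the HMAC token, so a token is never minted for a path outside the allowed roots.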

* docs(v3): update 16 docs + add 2 new docs for v3 architecture

Update all docs/ to reflect v3 implementation (64 commits, 31K insertions):
- 00: architecture overview with 7 new packages
- 01: agent loop with pipeline, orchestration, evolution
- 02: providers with Wave 2 resilience (middleware, failover, registry)
- 03: tools with delegate, vault_search, vault_link, memory_expand
- 04,08,10,11,14,23: targeted v3 additions
- 06: store with 6 new tables, promoted columns, sqlx
- 07: memory with 3-tier architecture, consolidation pipeline
- 18,19: HTTP/WS API with v3 endpoints and methods
- 21: evolution system (metrics, suggestions, auto-adapt)
- model-steering: model registry relationship

New docs:
- 22-v3-http-endpoints.md: 12 v3 HTTP endpoints
- 24-knowledge-vault.md: vault architecture, wikilinks, search

* feat(v3): wire delegate tool, fix pipeline callbacks, clean dead code

Wire delegate tool end-to-end:
- Register DelegateTool in gateway with DelegateRunFunc
- Implement runFn: resolve target agent, build session key, propagate tracing
- Add async announce via msgBus (reuses subagent announce handler)
- Populate context fields (TenantID, Channel, ChatID) for routing
- Set DelegationID + ParentAgentID on RunRequest for event correlation

Fix pipeline callbacks:
- BreakLoop now completes remaining stages in current iteration
- EnrichMedia signature updated to use RunState for message buffer access
- Add non-streaming event emission for channel compatibility
- Fix user message flush tracking in v3 pipeline

Clean dead code (510 LOC removed):
- Delete memory/l1_cache.go, unified_search.go (superseded by vault search)
- Delete vault/auto_injector_impl.go, retriever_impl.go (never wired)
- Delete vault/sync_worker.go (never started)
- Remove orphaned EventMemoryLint, EventSuggestionCreated constants

* feat(v3): finalize stage — emit session.completed, NO_REPLY, strip directives

- Emit session.completed event for consolidation pipeline (episodic → semantic → dreaming)
- Detect NO_REPLY before flush so silent content is persisted for context
- Strip [[...]] message directives from user-facing content (v2 parity)
- Wire StripMessageDirectives, IsSilentReply, EmitSessionCompleted callbacks
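
A small sketch of the two content checks named above. The directive grammar and the exact NO_REPLY comparison are assumptions here; `stripDirectives` and `isSilentReply` are illustrative stand-ins for the wired callbacks, not their real bodies.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// directiveRe matches a [[...]] span with no closing bracket inside it.
// Assumption: directives never nest and never span brackets.
var directiveRe = regexp.MustCompile(`\[\[[^\]]*\]\]`)

// stripDirectives removes [[...]] spans from user-facing content.
func stripDirectives(s string) string {
	return strings.TrimSpace(directiveRe.ReplaceAllString(s, ""))
}

// isSilentReply treats a message that is exactly NO_REPLY (ignoring
// surrounding whitespace) as silent. Assumption: no other spellings count.
func isSilentReply(s string) bool {
	return strings.TrimSpace(s) == "NO_REPLY"
}

func main() {
	fmt.Println(stripDirectives("[[reply_to:42]] Hello"))
	fmt.Println(isSilentReply("  NO_REPLY\n"))
}
```

Checking for the silent marker before the flush means the raw content can still be persisted for context while nothing is sent to the user.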

* feat(vault): full CRUD — backend endpoints, UI dialogs, content preview

Backend (5 new HTTP endpoints):
- POST/PUT/DELETE /v1/agents/{id}/vault/documents — create, update, delete
- POST/DELETE /v1/agents/{id}/vault/links — create, delete
- Server-side validation for doc_type and scope enums
- Agent ownership verification on link creation
- FK cascade handles link cleanup on document delete

Frontend (React):
- Create document dialog (title, path, type, scope)
- Edit mode in detail dialog (inline title/type/scope editing)
- Delete document with confirmation
- Create link dialog (from/to doc, link type, context)
- Delete link with inline confirmation on badges
- Content preview (collapsible, lazy-loads via /v1/storage/files/)
- Mutation hooks with query invalidation
- i18n keys for en/vi/zh

* fix(v3): critical pipeline parity fixes — ChatRequest, reasoning, passback, media

- Enrich ChatRequest with all provider options (temperature, sessionKey,
  agentID, userID, channel, workspace, tenantID) matching v2
- Add ResolveReasoningDecision for thinking models (o3, DeepSeek-R1, Kimi)
- Wire uniquifyToolCallIDs to prevent OpenAI 400 on duplicate IDs
- Add assistant message passback (Phase, RawAssistantContent) for Anthropic
- Emit block.reply for intermediate content (non-streaming channels)
- Add ContentSuffix append + ForwardMedia merge in FinalizeStage
- Build final assistant message with MediaRefs for session persistence
- Use effectiveMaxTokens() + OptMaxTokens constant
- Guard truncation retry on len(ToolCalls) > 0 + parseError check
- Accumulate ThinkingTokens in usage
- Deduplicate emitRun closure (shared from callbackSet)
- Fix EnrichMedia to receive RunState with actual messages
- Persist user message in makeFlushMessages (matching v2)
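
The duplicate-ID guard from the list above can be illustrated like this. It is a sketch under assumptions: the `ToolCall` shape and the `_N` suffix scheme are hypothetical, and the real uniquifyToolCallIDs may rename differently.

```go
package main

import "fmt"

// ToolCall is a trimmed-down, hypothetical shape of a provider tool call.
type ToolCall struct {
	ID   string
	Name string
}

// uniquify returns a copy where every ID is distinct. A repeated ID gets
// a numeric suffix, and the loop keeps counting until the candidate is free.
func uniquify(calls []ToolCall) []ToolCall {
	used := make(map[string]bool, len(calls))
	out := make([]ToolCall, len(calls))
	for i, c := range calls {
		id := c.ID
		for n := 1; used[id]; n++ {
			id = fmt.Sprintf("%s_%d", c.ID, n)
		}
		used[id] = true
		c.ID = id
		out[i] = c
	}
	return out
}

func main() {
	calls := []ToolCall{{"call_1", "exec"}, {"call_1", "read_file"}, {"call_2", "exec"}}
	for _, c := range uniquify(calls) {
		fmt.Println(c.ID, c.Name)
	}
}
```

Any tool result that refers back to a renamed call would need the same mapping applied; that bookkeeping is left out of the sketch.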

* fix(store): coerce NOT NULL JSONB columns to empty object on agent update

When switching provider away from ChatGPT OAuth, the UI sends
chatgpt_oauth_routing: null which violates the NOT NULL constraint.
Coerce null → '{}' for all NOT NULL JSONB promoted columns:
chatgpt_oauth_routing, reasoning_config, workspace_sharing,
shell_deny_groups, kg_dedup_config.

* feat(v3): v3 info modal redesign, agent links CRUD tab, sidebar rename

- Rewrite v3 info modal with 8 feature cards (pipeline, memory,
  retrieval, vault, evolution, orchestration, resilience, registry)
  with v2→v3 comparisons, stat badges, full i18n (en/vi/zh)
- Add tabbed layout to teams page: Agent Teams | Agent Links tabs
- Agent Links tab with full CRUD via existing WS RPC methods
  (list/create/edit/delete) with Radix Select, Combobox, mutual
  agent exclusion in create dialog
- Sidebar menu renamed to "Agent Link & Team"
- Backend: add source_display_name, source_emoji, target_emoji
  to agent link joined queries for consistent display

* fix(store): include personal chats in cron delivery targets

Add 'user' contact_type to ListDeliveryTargets SQL filter in both
PG and SQLite stores. Previously only group/topic contacts appeared
in cron channel/chat dropdowns.

* fix(v3): duplicate messages, missing thinking, span numbering

- ThinkStage: skip AppendPending for final answer (no tool calls),
  let FinalizeStage build the definitive message with sanitization
  and MediaRefs — fixes duplicate assistant messages in session history
- RunResult: add Thinking field, propagate through v2 finalizeRun,
  v3 convertRunResult, run.completed event, and chat.send response
- UI: capture thinkingRef before clearing on run.completed, include
  in final message object so thinking renders without page refresh
- Span numbering: pass Iteration+1 in v3 callback to match v2's
  1-based iteration display in trace span names

* fix(v3): wire delegation targets into BuildSystemPrompt

buildOrchestrationSection() was implemented and tested but never
wired into the actual BuildSystemPrompt() flow. Only the unused
BridgePromptBuilder had it. Add DelegateTargets + OrchMode fields
to SystemPromptConfig and inject "## Delegation Targets" section
so agents with agent_links see their delegation targets in prompt.

* fix(v3): sync mediaResults from bridgeRS to pipeline state

syncBridgeToState copied loopKilled, asyncToolCalls, deliverables
from the v2 bridgeRS but missed mediaResults. Tool results with
MEDIA: prefix were extracted by processToolResult into bridgeRS
but never propagated to state.Tool.MediaResults — causing
FinalizeStage to produce empty MediaRefs and RunResult.Media.

* fix(v3): populate SessionCompletedPayload in session.completed events

Both v2 loop and v3 pipeline emitted session.completed events with
nil Payload, causing episodicWorker type assertion to fail silently.
Episodic summaries were never created.

- Expand EmitSessionCompleted callback to pass msgCount, tokensUsed,
  compactionCount from pipeline state
- V3 path: build payload from state.Messages.TotalLen(),
  state.Think.TotalUsage, state.Compact.CompactionCount
- V2 path: build payload from history + rs.totalUsage +
  sessions.GetCompactionCount()

* fix(v3): remaining pipeline parity gaps — skill postscript, team task count

- Add SkillPostscript callback to PipelineDeps + FinalizeStage,
  matching v2's skill evolution nudge after complex tool runs
- Wire makeSkillPostscript() in adapter with same logic as v2
  (skillEvolve + skillNudgeInterval + totalToolCalls threshold)
- Sync teamTaskCreates from bridgeRS to pipeline EvolutionState
  (was already syncing teamTaskSpawns but missed creates)

* fix(evolution): add JSON struct tags to metric aggregates

ToolAggregate and RetrievalAggregate had no JSON tags, causing
Go to marshal field names as PascalCase (ToolName, CallCount)
while the UI expects snake_case (tool_name, call_count). Charts
rendered empty containers with no data bars.

Also change AvgDuration (time.Duration) to AvgDurationMs (float64)
for JSON-friendly serialization: time.Duration marshals as an integer
nanosecond count, which the frontend charts cannot use directly.

* fix(v3): add debug logging to episodic worker for consolidation pipeline

Add INFO/WARN logs at entry, summary decision, and creation points
to diagnose why episodic_summaries table stays empty in production.

* chore: silence noisy tenant_cache debug logs

* refactor(ui): replace v3 settings section with engine version picker + tabbed info modal

Replace flat toggle list with radio-style version cards (v2/v3) and
feature mini-cards. Redesign v3 info modal from single scroll to 3-tab
layout (Core Engine, Memory & Knowledge, Orchestration) with Lucide
icons and v2→v3 comparison cards.

- Add batchUpdate to use-v3-flags hook for atomic v2 switch
- Create engine-version-section with VersionCard + FeatureMiniCard
- Create v3-info-modal/ with 5 modular components (<45 LOC each)
- Add i18n keys (detail.engine + v3Info.tabs) for en/vi/zh
- Wire into agent-overview-tab, agent-header, agent-card
- Delete v3-settings-section.tsx + agent-v3-info-modal.tsx

* feat(v3): pass media files through delegate tool results

Delegate tool now carries media files (images, audio, etc.) produced
by the delegatee back to the parent agent. DelegateRunFunc returns
DelegateResult{Content, Media} instead of plain string.

- Add DelegateResult struct with Content + Media fields
- Convert agent.MediaResult to bus.MediaFile in gateway wire
- Attach media to sync result and async announce message
- Add MediaCount to DelegateCompletedPayload for observability
- Set MetaParentAgent in async announce metadata

* merge: bring main bug fixes into dev-v3

Cherry-pick 8 commits from origin/main:
- fix(telegram): handle group-to-supergroup migration (#698)
- feat(providers): add OpenRouter identification headers (#705)
- fix: deterministic prompt ordering for LLM cache hit (#719)
- fix(security): harden exec path exemption matching (#721)
- fix: invalidate storage size cache on delete and move (#726)
- fix: use errors.Is() for sentinel comparisons (#727)
- fix(desktop): add defaultValues to form dialogs (#737)
- Release: credential resolver, WhatsApp native, exec hardening (#754)

Conflict resolution:
- store/pg exports: accept main's errors.Is() additions
- shell.go: accept main's extracted matchesAnyPathExemption helper
- gateway_providers: merge both WithAnthropicName + WithAnthropicRegistry
- loop_types + resolver: merge v3 fields (OrchMode, DelegateTargets,
  EvolutionMetricsStore) with main's new UserResolver/ContactStore
- gateway_setup: keep dev-v3's tenant-scoped path exemptions
- channels_cmd: keep dev-v3's HTTP API approach

* feat(v3): add foundation packages for architecture refactor (Phase 1)

Purely additive — zero changes to existing files. Creates shared types
and helpers that Phase 2-4 will migrate callers to:

- internal/store/base/: Dialect interface, BuildMapUpdate, nullable/JSON
  helpers, scope clause builder, table metadata (44 tests)
- internal/orchestration/: ChildResult capture from v2/v3, media type
  conversion with round-trip tests (10 tests)
- internal/providers/sse_reader.go: Shared SSE scanner replacing inline
  bufio.Scanner boilerplate in 3 providers (8 tests)

* refactor(store): unify pg/ and sqlitestore/ helpers via base/ package (Phase 2)

- Create PG and SQLite dialect implementations (base.Dialect interface)
- Replace 15 duplicate helpers in pg/helpers.go with aliases to base.*
- Replace 13 duplicate helpers in sqlitestore/helpers.go with aliases
- Rewrite execMapUpdate and execMapUpdateWhereTenant to use
  base.BuildMapUpdate with dialect-specific placeholders
- Rewrite scopeClause/scopeClauseAlias as thin wrappers around
  base.BuildScopeClause
- Remove duplicate execMapUpdateWhereTenant from pg/agents.go

pg/helpers.go: 267→175 LOC, sqlitestore/helpers.go: 226→130 LOC

* refactor(orch): extract BatchQueue[T] generic for announce queues (Phase 3)

- Create internal/orchestration/batch_queue.go: generic producer-consumer
  queue replacing duplicated sync.Map+mutex pattern (10 tests, race-safe)
- Simplify cmd/gateway_announce_queue.go: team queue uses BatchQueue,
  removes announceQueueState/getOrCreate/drain/tryFinish (~40 LOC saved)
- Simplify cmd/gateway_subagent_announce_queue.go: subagent queue uses
  BatchQueue, same pattern reduction (~40 LOC saved)
- Update cmd/gateway_consumer_handlers.go: callers use new signatures

* fix: remove defer accumulation in announce loop, clean comment tombstone

- Remove `defer ptd.ReleaseTeamLock()` inside for loop that accumulated
  deferred calls per iteration (explicit ReleaseTeamLock already called)
- Remove dead comment tombstone in pg/agents.go

* fix: scope defer in announce loop via closure for panic safety

Wrap announce processing in closure so defer ptd.ReleaseTeamLock()
runs once per iteration instead of accumulating. Explicit release
still called for normal path; defer catches panics.
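
The closure pattern in isolation, as a sketch: `lock` and `processAll` are hypothetical stand-ins for the team lock and the announce loop, and the explicit normal-path release is omitted for brevity.

```go
package main

import "fmt"

// lock is a toy counter-based lock used to observe acquire/release pairing.
type lock struct {
	held     int
	releases int
}

func (l *lock) Acquire() { l.held++ }

func (l *lock) Release() {
	if l.held > 0 {
		l.held--
		l.releases++
	}
}

// processAll wraps each iteration in a closure so the deferred Release runs
// when that iteration ends (also on panic), not when processAll returns.
// It reports the highest number of simultaneously held locks it observed.
func processAll(l *lock, items []string, handle func(string)) (maxHeld int) {
	for _, it := range items {
		func() {
			l.Acquire()
			defer l.Release()
			if l.held > maxHeld {
				maxHeld = l.held
			}
			handle(it)
		}()
	}
	return maxHeld
}

func main() {
	l := &lock{}
	peak := processAll(l, []string{"a", "b", "c"}, func(s string) { fmt.Println("announce", s) })
	fmt.Println("peak held:", peak, "still held:", l.held)
}
```

With a bare `defer` in the loop body instead, every iteration would queue another release that only fires when the enclosing function returns.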

* refactor(providers): wire shared SSEScanner into 3 providers (Phase 4b)

Replace inline bufio.Scanner+SSE parsing boilerplate with shared
SSEScanner from sse_reader.go in:
- openai.go: data-only SSE (OpenAI, DashScope, Kimi)
- codex.go: data-only SSE (OpenAI Codex)
- anthropic_stream.go: event+data SSE (uses EventType() for switch)

Removes ~24 LOC of duplicated scanner setup + manual line parsing.

* refactor(agent): force v3 pipeline, remove v2 runLoop (Phase 4A)

- Delete runLoop() from loop.go (~745 LOC removed), keep shared helpers
  (resolveToolCallName, hasParseErrors, truncateToolArgs)
- Remove v2/v3 gate in loop_run.go: always call runViaPipeline()
- Remove v3PipelineEnabled field from Loop struct + LoopConfig + resolver
- Always resolve workspace in loop_context.go (was behind v3 gate)
- Deprecate PipelineEnabled in V3Flags (kept for backward compat parsing)

All agents now always use v3 pipeline. No behavioral change for agents
that already had v3_pipeline_enabled=true (which was all production agents).

* refactor(gateway): decompose gateway.go from 1295 to 476 LOC (Phase 4B)

Extract sections of runGateway() into focused files:
- gateway_deps.go: gatewayDeps struct for shared state
- gateway_http_wiring.go: wireHTTPHandlersOnServer (~207 LOC)
- gateway_events.go: event subscribers + teamTaskEventType (~367 LOC)
- gateway_lifecycle.go: signal handling, shutdown, server start (~222 LOC)
- gateway_tools_wiring.go: cron/heartbeat/session tool wiring (~116 LOC)

Also extracted: startCronAndHeartbeat, makeDelegateAnnounceCallback.
Pure structural refactoring — no behavior change.

* test(agent): add v3 force migration guard tests

Verify v2 runLoop is deleted, V3PipelineEnabled field removed from
LoopConfig, and V3Flags backward compat parsing still works.
Compile-time guards prevent accidental re-introduction of v2 code.

* feat(delegate): wire ChildResult + fix media passthrough (Phase 3 gap)

- Use orchestration.CaptureFromRunResult in delegate run callback
  to standardize result capture via ChildResult
- Use MediaResultToBusFiles in CaptureFromRunResult (DRY)
- Fix delegate_tool.go metadata key: delegate_id → delegation_id

* refactor(ui): remove v2/v3 pipeline toggle, always show V3 (Phase 4E)

- engine-version-section.tsx: remove pipeline toggle, show V3 read-only
- use-agent-version.ts: always return "v3" (no flag check)
- use-v3-flags.ts: remove v3_pipeline_enabled from toggleable flags
- Update i18n strings (en/vi/zh): remove v2-specific tooltip text

* test: add Phase 5 test infrastructure for v3 architecture refactor

- pg/pg_dialect_test.go: 4 tests for PG Dialect (placeholder, transform,
  returning, interface compliance)
- sse_reader_test.go: 4 edge case tests (empty data, scanner error,
  event type persistence, no data after [DONE])
- gateway_announce_format_test.go: 10 tests for team + subagent
  announce formatting (single/batch, success/failure, snapshot)

Coverage: base/ 96%, orchestration/ 100%, providers/ 57%

* chore: remove stale runLoop references + apply go fix (Phase 6)

- Update comments referencing deleted runLoop in pipeline callbacks,
  loop_types, stage.go
- go fix: reflect.TypeFor, range over int, strings.Builder

* docs: update architecture docs for v3 refactor completion (Phase 6)

- CLAUDE.md: add store/base/, orchestration/ to project structure;
  remove v2 runLoop and dual-mode gate references; rename Pipeline (v3)
  to Pipeline; add SSEScanner and BatchQueue to key patterns
- docs/00-architecture-overview.md: update module map with new packages,
  remove [V3] markers (now standard)
- docs/17-changelog.md: add V3 Architecture Refactor entry (6 phases)

* feat(evolution): skill draft auto-generation + go fix cleanup

- Add skill draft template generation from evolution suggestions
- Wire skill apply endpoint in evolution HTTP handlers
- Apply go fix across codebase (range-over-int, reflect.TypeFor, etc.)
- Minor refactors: simplify switch/case, reduce string builder allocs
- Gateway deps: add skills loader field

* perf(prompt): deterministic tool order + Anthropic cache boundary split

Sort tool names in buildToolingSection for cache-stable output.
Insert GOCLAW_CACHE_BOUNDARY marker before Time section; Anthropic
provider splits system prompt into 2 blocks (stable cached, dynamic not).
Backward compat: no marker → single cached block.
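
Both ideas in miniature, as a sketch only. The literal marker text, `promptBlock`, `splitPrompt`, and `toolingSection` are assumptions for illustration; only the GOCLAW_CACHE_BOUNDARY name and the sorted tool order come from this message.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cacheBoundary is the marker text. Assumption: the real marker may be
// wrapped differently (for example as a comment line).
const cacheBoundary = "GOCLAW_CACHE_BOUNDARY"

// promptBlock is one system-prompt block plus whether it should be cached.
type promptBlock struct {
	Text   string
	Cached bool
}

// splitPrompt returns a stable cached block and a dynamic uncached block.
// Without a marker the whole prompt stays a single cached block.
func splitPrompt(prompt string) []promptBlock {
	stable, dynamic, found := strings.Cut(prompt, cacheBoundary)
	if !found {
		return []promptBlock{{Text: prompt, Cached: true}}
	}
	return []promptBlock{
		{Text: strings.TrimSpace(stable), Cached: true},
		{Text: strings.TrimSpace(dynamic), Cached: false},
	}
}

// toolingSection sorts a copy of the tool names so the rendered text does
// not depend on registration or map iteration order.
func toolingSection(tools []string) string {
	sorted := append([]string(nil), tools...)
	sort.Strings(sorted)
	return "## Tooling\n- " + strings.Join(sorted, "\n- ")
}

func main() {
	prompt := toolingSection([]string{"exec", "cron"}) + "\n" + cacheBoundary + "\n## Time\n(now)"
	for _, b := range splitPrompt(prompt) {
		fmt.Printf("cached=%v %q\n", b.Cached, b.Text)
	}
}
```

Keeping the first block byte-identical across runs is what lets a provider-side prompt cache hit; anything time-dependent goes after the marker.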

* perf(prompt): optimize cache boundary position + add Execution Bias

Move Memory Recall and stable context files (AGENTS.md, TOOLS.md,
USER_PREDEFINED.md) above cache boundary. Dynamic per-user files
(USER.md, BOOTSTRAP.md) stay below. Add Execution Bias section
(full mode only) forcing action-oriented tool use. Fix duplicate
header when Project Context split across boundary.

* feat(prompt): add PromptMode task/none + 3-layer resolution

Expand PromptMode from full|minimal to full|task|minimal|none.
Task mode = enterprise automation: keeps Tooling, Execution Bias,
Safety-slim, Persona-slim, Skills-search, MCP-search, Memory-slim,
Workspace, Runtime, Delegation. Drops verbose sections (Tool Call
Style, Self-Evolution, Spawning, Recency, etc.).

None mode returns the identity line only. 3-layer mode resolution:
runtime override > auto-detect (subagent/cron) > agent config
(other_config.prompt_mode), falling back to the default (full).
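
The precedence order can be sketched as a first-non-empty lookup. `resolveMode` and its parameters are hypothetical; in particular, which mode auto-detection picks for subagent or cron runs is passed in rather than assumed.

```go
package main

import "fmt"

type PromptMode string

const (
	ModeFull    PromptMode = "full"
	ModeTask    PromptMode = "task"
	ModeMinimal PromptMode = "minimal"
	ModeNone    PromptMode = "none"
)

// resolveMode returns the first non-empty layer in priority order:
// runtime override, auto-detected mode, agent config. If all are empty
// it falls back to full.
func resolveMode(override, autoDetected, configured PromptMode) PromptMode {
	for _, m := range []PromptMode{override, autoDetected, configured} {
		if m != "" {
			return m
		}
	}
	return ModeFull
}

func main() {
	fmt.Println(resolveMode("", "", ModeTask))
	fmt.Println(resolveMode(ModeNone, ModeMinimal, ModeTask))
	fmt.Println(resolveMode("", "", ""))
}
```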

* feat(prompt): provider prompt contributions (stable/dynamic/overrides)

Add PromptContribution struct + PromptContributor interface for
provider-specific prompt customizations. Providers can inject
StablePrefix (before cache boundary), DynamicSuffix (after boundary),
or override sections by ID (e.g. execution_bias). Nil-safe: providers
that don't implement the interface get default behavior.
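
A sketch of the nil-safe lookup. The type names and the three fields follow this message; the `Contribute` method name, `contributionOf`, the `sectionContent` signature, and the two example providers are assumptions.

```go
package main

import "fmt"

// PromptContribution holds provider-specific prompt pieces.
type PromptContribution struct {
	StablePrefix     string            // placed before the cache boundary
	DynamicSuffix    string            // placed after the cache boundary
	SectionOverrides map[string]string // section ID -> replacement text
}

// PromptContributor is optional. Assumption: the method is called Contribute.
type PromptContributor interface {
	Contribute() *PromptContribution
}

// contributionOf returns nil for providers that do not implement the interface.
func contributionOf(provider any) *PromptContribution {
	if pc, ok := provider.(PromptContributor); ok {
		return pc.Contribute()
	}
	return nil
}

// sectionContent picks the override for id if one exists, else the default.
// It is safe to call with a nil contribution or a nil override map.
func sectionContent(c *PromptContribution, id, fallback string) string {
	if c != nil {
		if v, ok := c.SectionOverrides[id]; ok {
			return v
		}
	}
	return fallback
}

// plainProvider does not contribute; tunedProvider overrides one section.
type plainProvider struct{}
type tunedProvider struct{}

func (tunedProvider) Contribute() *PromptContribution {
	return &PromptContribution{SectionOverrides: map[string]string{"execution_bias": "Act first, explain after."}}
}

func main() {
	def := "Prefer tools over speculation."
	fmt.Println(sectionContent(contributionOf(plainProvider{}), "execution_bias", def))
	fmt.Println(sectionContent(contributionOf(tunedProvider{}), "execution_bias", def))
}
```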

* feat(prompt): pinned skills with hybrid inline+search mode

Add per-agent pinned_skills config (max 10, from other_config JSONB).
Pinned skills are always inlined in the prompt via BuildPinnedSummary.
Non-pinned skills are discovered via skill_search. The hybrid section
shows both the pinned XML and the search instructions. Works in full
and task modes.

* fix(prompt): validate prompt_mode from DB before cast

Reject unknown prompt_mode values from OtherConfig JSONB
(e.g. typos like "taks") by checking against validPromptModes set.
Invalid values default to "" (full mode). Prevents broken prompts
where no mode flags match.
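
A minimal sketch of the validation step, assuming other_config is raw JSON with a top-level `prompt_mode` key. `promptModeFromConfig` is a hypothetical helper; `validPromptModes` is the set named above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validPromptModes is the allowlist checked before the value is used.
var validPromptModes = map[string]bool{
	"full": true, "task": true, "minimal": true, "none": true,
}

// promptModeFromConfig reads prompt_mode from the JSONB payload and returns
// "" (treated as full mode by the caller) for unknown values or bad JSON.
func promptModeFromConfig(otherConfig []byte) string {
	var cfg struct {
		PromptMode string `json:"prompt_mode"`
	}
	if err := json.Unmarshal(otherConfig, &cfg); err != nil {
		return ""
	}
	if validPromptModes[cfg.PromptMode] {
		return cfg.PromptMode
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", promptModeFromConfig([]byte(`{"prompt_mode":"task"}`)))
	fmt.Printf("%q\n", promptModeFromConfig([]byte(`{"prompt_mode":"taks"}`)))
}
```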

* fix(prompt): wire SectionIDToolCallStyle for provider override

Tool Call Style section now uses sectionContent() like Execution Bias,
allowing providers to override it via PromptContribution.SectionOverrides.

* feat(store): implement 9 SQLite store backends for v3 parity

Close feature gap between PostgreSQL and SQLite (desktop/lite) editions.
96 methods across 9 stores: AgentLinks, SubagentTasks, SecureCLI,
SecureCLIGrants, EvolutionMetrics, EvolutionSuggestions, Episodic,
KnowledgeGraph, Vault. Schema v8→v9 adds 4 tables.

Key design decisions:
- LIKE-based search replaces tsvector/FTS5 (unavailable in modernc.org/sqlite)
- Go-side StringSimilarity for KG dedup (replaces pgvector cosine)
- Recursive CTE traversal with comma-delimited path cycle detection
- AES-256-GCM encryption on all SecureCLI read/write paths
- F15: SecureCLI disabled when EncryptionKey empty
- ON CONFLICT DO UPDATE (never INSERT OR REPLACE) to preserve FK cascades
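
For the encryption bullet, a generic AES-256-GCM round trip using only the Go standard library. This is not the store's actual code: `newGCM`, `encrypt`, `decrypt`, and the nonce-prefixed layout are assumptions, and the key-length check stands in for the "disabled when EncryptionKey empty" rule.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD. An empty or wrongly sized key is
// rejected so callers can treat the feature as disabled.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 needs a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns nonce || ciphertext || tag with a fresh random nonce.
func encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the nonce and authenticates before returning plaintext.
func decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // demo key only; real keys must be random
	sealed, err := encrypt(key, []byte("api-token"))
	if err != nil {
		fmt.Println("encrypt failed:", err)
		return
	}
	plain, err := decrypt(key, sealed)
	fmt.Println(string(plain), err)
}
```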

* docs(store): add SQLite parity section to store data model docs

Document 9 new SQLite store implementations, schema v9, and
feature parity gaps (LIKE vs FTS, Jaro-Winkler vs vector dedup).

* fix(docker): skip web-builder stage when ENABLE_EMBEDUI=false

Use BuildKit conditional stage pattern so web-builder is not executed
when embedding is disabled. Also update pnpm-lock.yaml for 6 new deps
that were missing from the lockfile (markdown/math/mermaid packages).

* feat(vault): embed metadata.summary for richer vector search

Include summary from metadata JSONB in embedding text (title + path +
summary) for better semantic search. Update tsvector generated column
to include summary in FTS index. Zero new columns — summary stored in
existing metadata JSONB field. Backward compat: docs without summary
still embed title+path only.

* fix(vault-ui): wider detail dialog, markdown rendering, create tooltip

- Detail dialog: sm:max-w-lg → sm:max-w-2xl, content area 200→300px
- Content preview: render markdown via MarkdownRenderer instead of <pre>
- Create button: add tooltip explaining "select agent first" when disabled

* feat(ui): prompt_mode dropdown + pinned_skills multi-select

Add PromptSettingsSection to agent overview tab:
- prompt_mode: Select dropdown (full/task/minimal/none)
- pinned_skills: Tag input with max 10, click to remove
Both save to agent other_config JSONB via existing onUpdate flow.
Self-contained save button appears only when values change.

* fix(vault-ui): enlarge dialog to max-w-4xl, constrain markdown heading sizes

Dialog sm:max-w-2xl → sm:max-w-4xl. Markdown headings capped at
text-base/text-sm via Tailwind child selectors to prevent oversized
h1/h2 in content preview. Content area raised to 400px.

* refactor(ui): move pinned skills to dedicated section with skill select

Split pinned_skills out of PromptSettingsSection into PinnedSkillsSection.
Uses useAgentSkills hook for proper dropdown with granted skills list.
Placed right below SkillsSection in agent overview tab. Badge chips
with X to remove, Select dropdown to add. Max 10 enforced.

* fix(ui): add border wrap to capabilities section for visual consistency

* feat(ui): redesign prompt mode as compact cards, replace v3 badge

- Rewrite prompt settings from select dropdown to 2×2 compact cards
  with lucide icons (Zap/Wrench/Package/CircleOff) and ring selection
- Move prompt settings to top of agent overview tab
- Replace v3 badge with prompt mode badge in header, card, and list row
- Add i18n keys for prompt mode labels/descriptions (en/vi/zh)
- Extract readPromptMode() to shared agent-display-utils
- Remove dead useAgentVersion hook and v3Tooltip/v2Tooltip i18n keys
- Remove v3 badge from EngineVersionSection (keep feature toggles)
- Use cn() for badge class composition with twMerge safety

* feat(ui): add section tags and token estimates to prompt mode cards

Each card now shows which system prompt sections are included
(Persona, Tools, Safety, Skills, MCP, Memory, etc.) as compact
tags, plus estimated base token range (~2-4K for full, ~10 for none).
Helps users understand the impact of each mode at a glance.

* fix(ui): correct token estimates with actual tiktoken measurements

Measured via tiktoken (cl100k) on realistic config:
full=~1.7K+, task=~1.1K+, minimal=~660, none=~6 base tokens.
The "+" suffix indicates context files/skills/MCP add more.

* fix(ui): use tiktoken-measured token ranges for prompt mode cards

Measured across 3 scenarios (bare/typical/heavy) via tiktoken cl100k:
full=~500-2.9K, task=~350-1.2K, minimal=~350-820, none=~6 tokens.
Ranges reflect real configs from minimal (3 tools) to heavy
(10 tools, long persona, MCP, sandbox, pinned skills).

* fix(ui): use production-measured token counts for prompt mode cards

Measured via tiktoken on real production agent (tieu-ho) with
4 context files (AGENTS.md, SOUL.md, IDENTITY.md, USER_PREDEFINED.md),
14 tools, memory, KG, skills, Telegram channel:
full=~3.1K, task=~2.2K, minimal=~1.9K, none=~6 tokens.

* refactor(ui): accurate section tags per prompt mode from systemprompt.go

Section tags now match exact gating logic in BuildSystemPrompt():
- full: persona, tools, exec bias, call style, safety, skills,
  MCP, memory, sandbox, evolution, channel hints (11 sections)
- task: style echo, tools, exec bias, safety (slim), skills (search),
  MCP (search), memory (slim) (7 sections)
- minimal: tools, safety (2 sections + shared context files)
- none: no tags shown, no token count (trivially ~6 tokens)

Removed workspace/identityOnly tags (shared across all modes).

* feat(ui): replace engine version toggles with V3 capabilities modal

- Remove 3 toggle switches (Pipeline, Memory, Retrieval) from agent
  detail — all features are now always-on for v3 agents
- Replace with compact static badges layout
- New V3 Capabilities modal with 4 tabs:
  Pipeline (8-stage flow), Memory (L0/L1/L2 tiers),
  Knowledge (KG, Vault, Dreaming), Orchestration (Delegate, Evolution)
- Each tab has Lucide icons and technical descriptions of how
  features actually work
- Separate i18n namespace v3-capabilities in en/vi/zh locales

* refactor(ui): move V3 badge to agent header, remove engine version section

- Add clickable V3 badge next to agent name in header
- Clicking opens V3 Capabilities modal
- Remove Engine Version section from overview tab entirely
- Remove unused EngineVersionSection import

* feat: v3 prompt engine overhaul — 7-phase restructuring

Phase 1: Fix mode resolution — subagent/cron cap at task (not minimal),
heartbeat stays minimal, pinned skills injected in all modes, USER_PREDEFINED
added to minimal allowlist.

Phase 2: Context file restructuring — new CAPABILITIES.md (domain expertise,
separated from SOUL.md), AGENTS_MINIMAL.md for heartbeat sessions, both added
to stable context files and minimal allowlist.

Phase 3: Summoner update — all 4 prompt builders generate CAPABILITIES.md,
fallback 2-call stores capabilities alongside SOUL.md.

Phase 4: Open agent deprecation — creation silently upgrades open→predefined
in both HTTP and WS endpoints.

Phase 5: Bootstrap auto-contact — sender name from channel metadata injected
into bootstrap context for 1-turn onboarding (DM only).

Phase 6: System prompt preview — GET /v1/agents/{id}/system-prompt-preview
endpoint with mode param, token counting, section parsing.

Phase 7: Agent creation UX — removed open agent type toggle, schema simplified,
description always required, prompt mode cards updated with v3 token estimates.
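
Phase 1's mode resolution rule can be sketched in Go as follows. This is a minimal illustration, not the project's code: the `PromptMode` type, its numeric ordering, the `capAt` and `resolveMode` helpers, and the session kind strings are all assumptions. It also assumes that "heartbeat stays minimal" means a cap rather than a forced value.

```go
package main

import "fmt"

// PromptMode orders the four modes from least to most context.
// The numeric ordering is an assumption made for this sketch.
type PromptMode int

const (
	ModeNone PromptMode = iota
	ModeMinimal
	ModeTask
	ModeFull
)

// capAt returns mode unless it exceeds ceiling.
func capAt(mode, ceiling PromptMode) PromptMode {
	if mode > ceiling {
		return ceiling
	}
	return mode
}

// resolveMode is a hypothetical helper: subagent and cron sessions are capped
// at task, heartbeat sessions at minimal, and other sessions keep their setting.
func resolveMode(configured PromptMode, sessionKind string) PromptMode {
	switch sessionKind {
	case "subagent", "cron":
		return capAt(configured, ModeTask)
	case "heartbeat":
		return capAt(configured, ModeMinimal)
	default:
		return configured
	}
}

func main() {
	fmt.Println(resolveMode(ModeFull, "cron") == ModeTask)
	fmt.Println(resolveMode(ModeFull, "heartbeat") == ModeMinimal)
}
```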

* fix(vault-ui): redesign detail dialog and link dialog UX

- Vault detail: show content preview by default (not collapsed),
  move type/scope to header badges, hash to subtle footer
- Link dialog: replace native <select> with searchable Combobox
  for target document and link type (5 presets + custom)
- Fix Vietnamese i18n: add proper diacritics to v3-capabilities

* fix(i18n): keep prompt section badges in English across all locales

Technical terms like Persona, Tools, Safety, Memory, Skills, MCP,
Sandbox, Evolution should not be translated — they are UI labels
matching system prompt section names.

* feat(ui): system prompt preview in Files tab + README restructure

Files tab: add "System Prompt" item in sidebar below context files.
When selected, shows readonly preview with mode selector (full/task/
minimal/none), token count badge, and cache boundary highlighting.
Fetches from GET /v1/agents/{id}/system-prompt-preview.

README: remove Claw Ecosystem comparison tables, remove OpenClaw port
reference, rename "What Makes It Different" to "Core Features" with v3
additions (8-stage pipeline, 4-mode prompt, 3-tier memory, knowledge
vault, self-evolution), slim built-in tools to category summary table.

* feat: replace architecture images with v3 sketchnotes

Add 9 new architecture sketchnote images generated for v3:
- 8-Stage Agent Pipeline
- 4-Mode Prompt System
- 3-Tier Memory Architecture
- Multi-Tenant Architecture
- Agent Orchestration
- Knowledge Vault
- Provider Adapter System
- Self-Evolution System
- DomainEventBus

Remove old architecture images (architecture.jpg, goclaw_multi_tenant.png,
agent-delegation.jpg, agent-teams.jpg).

Update README to reference new images in Architecture and Orchestration
sections. Consolidate orchestration section (remove separate delegation
and teams subsections).

* feat(readme): add remaining 4 architecture sketchnotes with sections

Add Knowledge Vault, Self-Evolution, Provider Adapters, and Event-Driven
Architecture sections with corresponding sketchnote images and concise
descriptions. All 9 v3 architecture diagrams now in README.

* feat: reorder README architecture images, evolution guardrails fix, memory tools enhancement

README: reorder architecture sketchnotes (Multi-Tenant first), remove
pinnedSkills from task mode sections badge.

Backend: evolution guardrails fix, memory auto-injector and tools
enhancement, gateway HTTP wiring update.

* feat(desktop): add 12 v3 feature sections to agent detail panel

Add evolution expansion (skill learning, v3 flags), prompt mode selector,
evolution dashboard tab (CSS bar charts, suggestions, guardrails),
thinking/reasoning config, orchestration display, context pruning,
compaction, subagents (lite-limited), tool policy, sandbox config,
and pinned skills management.

Extract agent detail state into use-agent-detail-state hook.
Add getWithParams to ApiClient. Add i18n keys to en/vi/zh locales.

* fix(vault): team-scope security — prevent cross-team data corruption and leaks

- Add team_id UUID + custom_scope to vault_documents (PG migration 043, SQLite migration 10)
- COALESCE-based UNIQUE prevents silent cross-team data overwrite on ON CONFLICT
- PG trigger auto-corrects scope to 'personal' on team deletion (ON DELETE SET NULL)
- Store layer: TeamID filter on all query methods, RunContext-based team scoping
- CreateLink validates same-tenant + same-team boundary (defense in depth)
- VaultInterceptor: infer scope from RunContext, add AfterWriteMedia for binary files
- Wire VaultInterceptor into 5 tools (create_image, create_video, create_audio, tts, edit)
- HTTP handlers: team membership validation via HasTeamAccess, non-owner defaults to personal
- GetBacklinks: single JOIN + LIMIT 100 replaces N+1, VaultBacklink struct with team_id
- vault_search/vault_link/vault_backlinks tools read TeamID from RunContext
- Backlinks filtered by team boundary to prevent title exfiltration

Addresses 7 original + 13 red-team findings (1 CRITICAL, 5 HIGH, 3 MEDIUM).

* feat(evolution): allow CAPABILITIES.md self-evolution, backfill existing agents, cleanup v3 dead flags

- Allow self-evolving predefined agents to read/write CAPABILITIES.md
  (domain expertise) in addition to SOUL.md (style/tone)
- Add CAPABILITIES.md to contextFileSet (DB routing) and protectedFileSet
  (group chat permission check)
- Update buildSelfEvolveSection() system prompt to mention CAPABILITIES.md
- Merge ensureUserPredefined + ensureCapabilities into single
  ensureBackfillFiles() — one DB query instead of two, with error logging
- Remove dead V3MemoryEnabled/V3RetrievalEnabled from PipelineConfig,
  Loop, LoopConfig, resolver, and adapter (always true at runtime)
- Keep fields in V3Flags struct for JSONB backward compat with
  deprecation comments
- Add 10 new tests covering interceptor read/write, prompt section,
  and backfill logic

* feat(export): add v3 sections to agent import/export pipeline

- Phase 1: KG temporal fields (valid_from/valid_until) in export/import
- Phase 2: Episodic summaries section (episodic/summaries.jsonl)
- Phase 3: Evolution metrics + suggestions (evolution/*.jsonl)
- Phase 4: Vault documents + links (vault/*.jsonl, two-pass link resolution)
- Phase 5: ImportSummary expanded to 17 fields, cron/overrides UPSERT dedup
- Add backup/pgpass.go: secure .pgpass credential handling for pg_dump

* feat(backup): system and tenant backup/restore with S3 support

Phase 6: System backup — pg_dump + filesystem tar.gz, .pgpass security,
  preflight check, CLI (goclaw backup) + HTTP API with SSE progress
Phase 7: System restore — psql restore, path traversal protection,
  active connection check, --force/--dry-run/--skip-db/--skip-files
Phase 8: S3 integration — upload/download via AWS SDK v2, credentials
  encrypted in config_secrets (AES-256-GCM), custom endpoint support
Phase 9: Tenant backup/restore — per-table JSONL export (43 tables,
  5-tier FK ordering), 3 restore modes (upsert/replace/new-tenant),
  tenant admin permission checks, CLI + HTTP endpoints
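
The path traversal protection mentioned in Phase 7 typically means rejecting archive entries that resolve outside the restore directory. A minimal sketch of that check, assuming a hypothetical `safeJoin` helper (the actual restore code is not shown in this log):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin resolves an archive entry name under destDir and rejects any
// entry whose cleaned path would land outside destDir.
func safeJoin(destDir, entryName string) (string, error) {
	target := filepath.Join(destDir, entryName) // Join also cleans ".." segments
	rel, err := filepath.Rel(destDir, target)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("archive entry %q escapes %q", entryName, destDir)
	}
	return target, nil
}

func main() {
	for _, name := range []string{"data/file.txt", "../etc/passwd"} {
		p, err := safeJoin("/restore", name)
		fmt.Println(p, err)
	}
}
```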

* feat(backup): add table registry validation + gate tenant backup for PG-only

- DiscoverTenantTables() queries information_schema for tables with tenant_id
- ValidateTableRegistry() cross-checks hardcoded registry vs actual schema,
  warns about unregistered tables to prevent silent data loss
- TenantBackup() runs validation before export
- Tenant backup/restore gated for PG-only — SQLite Lite edition has only
  master tenant, returns clear error directing to system backup instead
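
The registry cross-check amounts to a set difference between the tables discovered in the schema and the hardcoded registry. A rough sketch, where `unregisteredTables` is a hypothetical pure helper and the table names are sample data (the real `ValidateTableRegistry()` signature is not shown here):

```go
package main

import (
	"fmt"
	"sort"
)

// unregisteredTables returns the discovered tenant tables that are missing
// from the hardcoded registry, sorted for stable warning output.
func unregisteredTables(registry, discovered []string) []string {
	known := make(map[string]struct{}, len(registry))
	for _, t := range registry {
		known[t] = struct{}{}
	}
	var missing []string
	for _, t := range discovered {
		if _, ok := known[t]; !ok {
			missing = append(missing, t)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	registry := []string{"agents", "sessions"}
	discovered := []string{"sessions", "vault_documents", "agents"}
	for _, t := range unregisteredTables(registry, discovered) {
		fmt.Println("warning: table has tenant_id but is not in the backup registry:", t)
	}
}
```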

* fix(backup): address code review security + correctness findings

- C1: Remove backup_path from S3 upload — prevent file exfiltration,
  require backup_token only
- H1: createNewTenant fails explicitly on slug conflict instead of
  silent NOOP that orphans imported data
- H2: Validate JSONL column names against safe regex before SQL
  interpolation — prevent SQL injection from crafted archives
- H3: Replace EOF string comparison with io.Copy in addFileToTar
- H6: Add atomic concurrency guard — reject concurrent backup/restore
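
The H2 fix can be illustrated with an allowlist check run before any column name is interpolated into SQL. The regular expression and the `validateColumns` helper below are assumptions for illustration; the commit does not state the exact pattern used.

```go
package main

import (
	"fmt"
	"regexp"
)

// safeColumn accepts lowercase snake_case identifiers only (assumed pattern).
var safeColumn = regexp.MustCompile(`^[a-z_][a-z0-9_]*$`)

// validateColumns rejects the whole row if any column name is not a plain
// identifier, so crafted archive keys never reach string-built SQL.
func validateColumns(cols []string) error {
	for _, c := range cols {
		if !safeColumn.MatchString(c) {
			return fmt.Errorf("unsafe column name %q", c)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateColumns([]string{"id", "tenant_id"}))
	fmt.Println(validateColumns([]string{`name"; DROP TABLE agents; --`}))
}
```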

* feat(web): add backup & restore admin page with 4 tabs

New admin-only page at /backup-restore with System Backup, System
Restore, S3 Config, and Tenant Backup tabs. Reuses existing
useSseProgress hook and OperationProgress component for real-time
SSE streaming. Includes full i18n support (en/vi/zh).

* feat(web): system prompt preview modal + CAPABILITIES.md backfill + pipeline parity

- Add Eye button in agent header → opens wide modal with markdown-rendered system prompt preview
- CAPABILITIES.md one-time startup backfill for pre-v3 agents (single SQL INSERT WHERE NOT EXISTS)
- Add CAPABILITIES.md to allowedAgentFiles so it shows in Files tab
- Refactor preview API to reuse pipeline's BuildSystemPrompt via BuildPreviewPrompt()
- Resolve actual tool names from registry, provider contributions, sandbox config, shell deny groups
- Resolve team context (TEAM.md virtual file, members, workspace, delegation targets) from DB
- Add IsMinimalAllowed() for mode-aware context file filtering
- Support ?user_id= query param for per-user context file preview

* feat(prompt): redesign 4 system prompt modes with tiered context files

- Add AGENTS_CORE.md (minimal) and AGENTS_TASK.md (task) templates
- Implement ModeAllowlist() for per-mode context file filtering
- Wire filtering into pipeline (loop_history) and preview API
- Upgrade none mode from single-line to functional tool-call prompt
  with slim safety, pinned skills, MCP search, workspace, runtime
- Task mode now gets full persona (SOUL.md + IDENTITY.md)
- Add prompt_mode selector to agent creation dialog
- Mode upgrade warning toast on save
- Skip summoning modal for none/minimal mode agents
- SQL migration 000044: seed AGENTS_CORE/TASK, remove AGENTS_MINIMAL
- SQLite: fix migration v9 duplicate column, v10 COALESCE in UNIQUE
- Add ModeAllowlist tests, none mode tests, SQLite schema tests
- Update i18n (en/vi/zh) with new section keys and mode descriptions

Token targets: full ~4.8K, task ~1.3K, minimal ~570, none ~640

* refactor(web): extract PromptModeCards shared component

Reuse same card layout (icon + name + desc + tokens + section tags)
in both agent creation dialog and agent settings page.
Create dialog uses compact=true to hide section tags.

* fix(backup): align preflight API contract + add package guidance UX

Backend preflight endpoint now returns flat JSON matching frontend
interface (pg_dump_available, disk_space_ok, size metrics) instead of
internal checks array. Adds DirSize/FormatBytes helpers.

Frontend: PageHeader component for consistency, amber alert banner
with link to /packages when pg_dump missing, mobile grid fix.

* feat(vault): async enrich worker for auto summary + semantic linking

Add EventBus-driven worker that generates LLM summaries for vault
documents, embeds them via pgvector, and auto-creates semantic links
between related docs using cosine similarity search.

- Skip embedding in UpsertDocument when summary is empty
- Add UpdateSummaryAndReembed + FindSimilarDocs to VaultStore interface
- Wire enrichment events in AfterWrite/AfterWriteMedia interceptors
- BatchQueue batching for burst writes, bounded dedup (10K cap)
- 5-minute LLM timeout, 0.7 similarity threshold, top-5 neighbors
- SQLite: summary-only (no vector ops, graceful noop)
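
The linking rule (0.7 similarity threshold, top-5 neighbors) can be sketched in memory as below. The commit describes the real search as running through pgvector, so this Go version only illustrates the selection logic; `Neighbor`, `cosine`, and `similarDocs` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Neighbor is a candidate link target with its similarity score.
type Neighbor struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 when lengths differ or either vector is all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// similarDocs keeps documents scoring at least threshold and returns the
// k best, highest score first (ties broken by ID for determinism).
func similarDocs(query []float64, docs map[string][]float64, threshold float64, k int) []Neighbor {
	var out []Neighbor
	for id, vec := range docs {
		if s := cosine(query, vec); s >= threshold {
			out = append(out, Neighbor{ID: id, Score: s})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	docs := map[string][]float64{"a": {1, 0}, "b": {1, 1}, "c": {0, 1}}
	for _, n := range similarDocs([]float64{1, 0}, docs, 0.7, 5) {
		fmt.Printf("%s %.3f\n", n.ID, n.Score)
	}
}
```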

* refactor(web): phase 1 quick wins — grid breakpoints, silent catches, useQuery migration

- Fix 6 grid-cols-2 without mobile breakpoint → grid-cols-1 sm:grid-cols-2
- Replace 9 silent .catch(() => {}) with console.error for debuggability
- Migrate system-prompt-preview.tsx from useEffect+useState to useQuery
- Add agent column + path tail truncation to vault documents table

* refactor(web): phase 2 split god components into sub-components

- cron-overview-tab: extract schedule, delivery, lifecycle sections
- channel-detail: extract timeline hook + dialogs component
- agent-advanced: extract state utils (deriveState, buildPayload)
- heartbeat-config: add deriveFormDefaults helper
- tenant-backup: split into backup + restore sections
- board-container: extract useBoardTasks hook
- provider-form: extract standard form fields
- contacts: extract contacts table component

* refactor(web): phase 3 form standardization — RHF+Zod migration + field errors

- Migrate 6 forms from useState to React Hook Form + Zod validation
- Create 4 new schema files (api-key, vault, login, s3-config)
- Add inline field error display to memory, mcp, heartbeat, agent-create forms
- Replace raw <input> with <Input> in login forms for consistent styling

* refactor(web): phase 4 lazy-load 20 dialog components + memory co-location

- Convert 20 heavy dialogs from eager to React.lazy + Suspense
- All dialogs use named exports with .then(m => ({ default: m.X })) wrapper
- Suspense fallback={null} per-dialog (invisible during chunk load)
- Move memory page components into documents/, knowledge-graph/, episodic/ subdirs

* refactor(web): phase 5 data fetching polish — staleTime tiers + optimistic updates

- Apply 3-tier staleTime policy: static 5min, standard 60s, realtime 5-15s
- Update 31 hooks with explicit staleTime (37 useQuery call sites)
- Add optimistic updates to builtin-tools, v3-flags, cron, mcp toggles

* refactor(web): phase 6 styling standardization — CVA, design tokens, font utility

- Convert input, textarea, select trigger to CVA with size variants
- Add font-mono-code utility class, replace 4 inline fontFamily styles
- Add text-2xs (10px) and text-xs-plus (11px) design tokens
- Replace 241 arbitrary text-[10px]/text-[11px] with token classes
- Co-locate memory page knowledge-graph files into subdirectory

* feat(vault): pagination, team filter, graph upgrade, and link_type param

- Add CountDocuments to VaultStore interface (PG + SQLite)
- Wrap vault list response as {documents, total} for pagination
- Add optional link_type param to vault_link tool (wikilink/reference)
- Fix resolveOrRegister to use inferVaultDocType instead of hardcoded "note"
- Add team filter dropdown and pagination UI (100/page) to vault page
- Rewrite vault graph with KG-level features: zoom controls, node limit
  selector, click highlight/dim, double-click detail, link labels, stats bar
- Decouple graph data fetch from table (independent limit 500)
- Update VaultDocument type with team_id, summary, custom_scope, media

* fix(vault): graph not rendering due to containerRef timing issue

The early return for loading state prevented containerRef from mounting,
so useLayoutEffect and ResizeObserver never captured dimensions. Moved
loading/empty states inside the container div so the ref always exists.

* fix(vault): eliminate graph flicker on zoom by using ref instead of state

onZoom fired setZoomLevel on every frame → React re-render → new inline
callback references → ForceGraph2D flickered. Now zoom level is stored
in a ref and the display is updated via direct DOM mutation, avoiding
re-renders during continuous zoom interactions.

* refactor(backend): comprehensive audit — safety, god files, interfaces, BaseChannel, tests, benchmarks

Phase 1: ExportTokenStore with lifecycle management (replaces leaked globals)
Phase 2: Split 4 god files (agents_export, agents_import, openai, loop_history) into 17 focused files
Phase 3: Add testability interfaces (consolidation EntityExtractor, heartbeat ProviderResolver/EventPublisher/ActiveSessionChecker)
Phase 4: Consolidate 6 duplicated fields + policy/pairing logic into BaseChannel, migrate all 8 channels
Phase 5: 57 new unit tests (i18n, edition, channels/policy, consolidation, heartbeat) + CI coverage
Phase 6: 30 benchmark tests (tokencount, skills BM25, tool registry, agent loop)
Phase 7: Context propagation fix + 6 metadata key constants

69 files changed, +7361/-3828 (net +3533 lines)

* fix(vault): fetch graph links per-agent so all-agents mode shows links

useVaultAllLinks required a single agentId, so links never loaded in
all-agents mode (agentId=""). Replaced with inline per-agent fetching
that groups documents by agent_id and fetches links for each agent.

* docs: add vault enhancement changelog entry

* feat(agent): add displayName to Loop and SystemPrompt for runtime context

Pass agent display name through LoopConfig and SystemPromptConfig so
the runtime section can show a human-readable agent name.

* fix(vault): truncate path from head and hash from middle in detail dialog

Path now uses dir=rtl so ellipsis appears at the start, keeping the
meaningful filename visible. SHA-256 hash shows first 8 + last 8 chars
with ellipsis in the middle instead of truncating the tail.
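
The hash truncation rule (first 8 + last 8 characters) is a UI change, but the logic is small enough to sketch in Go. `truncateMiddle` is a hypothetical helper; byte slicing is safe here only because a hex-encoded hash is ASCII.

```go
package main

import "fmt"

// truncateMiddle keeps the first and last keep bytes of s and replaces the
// middle with an ellipsis. Strings that are already short are returned as-is.
func truncateMiddle(s string, keep int) string {
	if len(s) <= 2*keep {
		return s
	}
	return s[:keep] + "…" + s[len(s)-keep:]
}

func main() {
	fmt.Println(truncateMiddle("0123456789abcdef0123456789abcdef", 8))
}
```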

---------

Co-authored-by: Plateau Nguyen <nguyennlt.ncc@gmail.com>
Co-authored-by: Kai (Tam Nhu) Tran <61256810+kaitranntt@users.noreply.github.com>
2026-04-09 21:15:19 +07:00
Viet Tran cd022699f6 feat: multi-tenant isolation — complete implementation (#359)
* feat(security): multi-tenant user data isolation (Plan 1)

Comprehensive user data isolation for non-owner system users:

- API key identity binding: owner_id column forces user_id on auth,
  prevents spoofing via X-GoClaw-User-Id header
- Sessions: ownership checks on list/preview/patch/delete/reset,
  non-admin users see only their own sessions
- Cron: user_id filtering on list, ownership checks on mutations
- Server-side WS event filtering: agent/chat/session/cron/team events
  scoped per-user instead of broadcast to all clients
- Web UI role guards: RequireAdmin on 15 admin-only pages, role
  propagated from WS connect response to auth store
- Tracing/activity: user_id enforcement for non-admin HTTP callers
- Teams: HasTeamAccess membership checks on get/delete/list
- Skills: fail-closed ownership check (deny non-admin if store
  doesn't support owner lookup)
- HTTP auth: requireAuthBearer now enforces owner_id + user context
  for file/media downloads (was missing)
- Dead code: removed delegation_history, handoff_routes tables and
  all related handlers/store code
- New: team_user_grants table for user-to-team access control

Migration 000026: api_keys.owner_id + team_user_grants + DROP legacy tables

* feat(security): multi-tenant foundation — tenants table, tenant_id propagation, permission cache (Plan 2)

Add tenant isolation infrastructure across the entire gateway:

Schema (migration 000027):
- Create tenants + tenant_users tables with master tenant seed
- Add tenant_id column to 30 user-scoped tables (NOT NULL DEFAULT master)
- api_keys.tenant_id nullable (NULL = system-level cross-tenant key)
- Create builtin_tool_tenant_configs + skill_tenant_configs for per-tenant overrides
- Drop custom_tools table (agent loop integration never wired)

Store layer:
- TenantStore interface + PGTenantStore (CRUD tenants + tenant_users)
- TenantID field on AgentData + APIKeyData
- tenant_id in agents/api_keys/skills SQL (Create, Get, List)

Context propagation:
- WithTenantID/TenantIDFromContext (uuid.Nil = fail-closed)
- WithCrossTenant/IsCrossTenant (owner/system admin flag)
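
A minimal sketch of the fail-closed pattern described above, using only the standard library. `TenantID` stands in for `uuid.UUID` (its zero value plays the role of `uuid.Nil`), and `requireTenant`, `rootCtx`, and the error value are hypothetical names; the real function signatures may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// TenantID stands in for uuid.UUID; the zero value acts as uuid.Nil.
type TenantID [16]byte

type tenantKey struct{}

var errNoTenant = errors.New("tenant id missing from context")

// rootCtx carries no tenant, like a context that never passed through auth.
var rootCtx = context.Background()

// WithTenantID returns a child context carrying the tenant ID.
func WithTenantID(ctx context.Context, id TenantID) context.Context {
	return context.WithValue(ctx, tenantKey{}, id)
}

// TenantIDFromContext returns the zero TenantID when none was set.
func TenantIDFromContext(ctx context.Context) TenantID {
	id, _ := ctx.Value(tenantKey{}).(TenantID)
	return id
}

// requireTenant fails closed: a missing tenant is an error, never a default.
func requireTenant(ctx context.Context) (TenantID, error) {
	id := TenantIDFromContext(ctx)
	if id == (TenantID{}) {
		return id, errNoTenant
	}
	return id, nil
}

func main() {
	_, err := requireTenant(rootCtx)
	fmt.Println("without tenant:", err)

	id := TenantID{1}
	got, err := requireTenant(WithTenantID(rootCtx, id))
	fmt.Println("with tenant:", got == id, err)
}
```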

Auth tenant resolution:
- HTTP: resolveAuthBearer sets TenantID/CrossTenant on all 5 auth paths
- WS: handleConnect sets tenantID/crossTenant on Client
- API key 2-tier: NULL = cross-tenant (system), set = tenant-scoped

Runtime isolation:
- Event bus: TenantID field on Event, fail-closed filter in event_filter.go
- Cron: tenant context injected in RunJob handler
- Subagent: tenant validation prevents cross-tenant spawn
- Security logging: tenant_id in auth resolution logs

Tenant management:
- WS RPC: 7 methods (tenants.list/get/create/update, tenants.users.*)
- HTTP: 7 endpoints (/v1/tenants/*)
- Slug validation + path traversal prevention
- Role validation (owner/admin/operator/member/viewer)

Infrastructure:
- PermissionCache: 4 sub-caches (tenant resolve, role, agent access, team access)
- tenant_paths.go: filesystem path helpers with master-tenant backward compat
- i18n: MsgInvalidRole key + translations (en/vi/zh)

Dead code removed: custom_tools store, HTTP handler, DynamicToolLoader (-828 lines)

* feat(security): tenant query filtering + workspace isolation (Plan 3)

Add WHERE tenant_id filtering to all 30+ tenant-scoped store queries,
wire workspace filesystem isolation, and harden restrict_to_workspace.

Store query filtering:
- Add tenantClauseN/tenantIDForInsert/requireTenantID helpers
- Filter all SELECT/INSERT/UPDATE/DELETE by tenant_id for non-cross-tenant
- Refactor SessionStore.GetOrCreate and CronStore.AddJob/ListJobs to
  accept context.Context for tenant propagation
- System skills (is_system=true) bypass tenant filter for all tenants
- Special cases: GetByKey (channels), GetByHash (auth) skip filter
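
One way a helper like `tenantClauseN` could work is to return a SQL fragment plus its argument for a given placeholder position. The signature below is an assumption for illustration (the log names the helper but does not show it), and tenant IDs are plain strings to keep the sketch dependency-free.

```go
package main

import (
	"errors"
	"fmt"
)

var errTenantRequired = errors.New("tenant id required")

// tenantClauseN (hypothetical signature) returns " AND tenant_id = $n" with
// its argument. Cross-tenant callers get no filter; a missing tenant ID is
// an error rather than an unfiltered query.
func tenantClauseN(crossTenant bool, tenantID string, n int) (string, []any, error) {
	if crossTenant {
		return "", nil, nil
	}
	if tenantID == "" {
		return "", nil, errTenantRequired
	}
	return fmt.Sprintf(" AND tenant_id = $%d", n), []any{tenantID}, nil
}

func main() {
	clause, args, err := tenantClauseN(false, "tenant-a", 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	query := "SELECT id FROM agents WHERE agent_key = $1" + clause
	fmt.Println(query, append([]any{"helper-bot"}, args...))
}
```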

Workspace isolation:
- Resolver computes tenant-scoped workspace + dataDir for non-master tenants
- Add WithTenantSlug/TenantSlugFromContext to context propagation
- Add TenantStore + Workspace to ResolverDeps
- Force effectiveRestrict() to always return true (multi-tenant security)
- Remove restrict_to_workspace from agentAllowedFields

UI cleanup:
- Remove custom-tools pages, types, routes, constants (backend removed in Plan 2)
- Clean tool-name-select component of custom tools references

* feat(security): session ctx propagation + execMapUpdate tenant guard (Plan 4)

Session store:
- Add ctx to AddMessage, SetSessionMetadata, SetAgentInfo, List, Save
- List now filters by tenant_id for non-cross-tenant callers
- Save uses ExecContext for cancellation support
- All ~15 callers updated to pass ctx

execMapUpdate tenant guard:
- Remove deleted_at IS NULL from execMapUpdateWhereTenant (only agents has soft-delete)
- Migrate 8 callers to execMapUpdateWhereTenant: agent_links, channel_instances,
  mcp_servers, secure_cli, tracing, teams, skills_crud, cron_update
- Add ctx to UpdateSkill, UpdateJob interfaces + all callers

Deferred: cron scheduler global cache (correct by design — system process),
browser per-tenant isolation (separate plan).

* refactor(store): add context.Context to all SessionStore interface methods

Complete ctx propagation across all 24 SessionStore methods for:
- Future tenant-aware DB operations
- Request cancellation/timeout support
- Distributed tracing capability

Updated ~15 files including all callers in agent loop, gateway methods,
heartbeat ticker, tools, and CLI commands.

* fix(security): remove context.Background() shadowing in gateway handlers

Critical fix from code review: gateway agent handlers (create, update,
delete, identity, files, links, teams) declared ctx := context.Background(),
shadowing the handler's ctx that carries tenant_id. This broke
tenant-scoped agent queries for non-master tenants.

- Remove ctx shadowing in 7 agent handler files
- Add ctx param to resolveAgentUUID/resolveAgentInfo helpers
- Use store.WithCrossTenant in resolver (system-level operation)
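
The shadowing bug is easy to reproduce in isolation. In this sketch the handler names, `newRequestContext`, and `tenantOf` are hypothetical, and the fresh context is declared in an inner scope because Go rejects redeclaring a parameter with `:=` in the function's top-level block.

```go
package main

import (
	"context"
	"fmt"
)

type tenantKey struct{}

// newRequestContext simulates the gateway attaching a tenant to a request.
func newRequestContext(tenant string) context.Context {
	return context.WithValue(context.Background(), tenantKey{}, tenant)
}

// tenantOf reads the tenant back; an empty string means it was lost.
func tenantOf(ctx context.Context) string {
	t, _ := ctx.Value(tenantKey{}).(string)
	return t
}

// buggyUpdate declares a fresh ctx in an inner scope, shadowing the
// caller's ctx, so the store call below it sees no tenant.
func buggyUpdate(ctx context.Context, agentKey string) string {
	if agentKey != "" {
		ctx := context.Background() // shadows the tenant-carrying ctx
		return tenantOf(ctx)
	}
	return tenantOf(ctx)
}

// fixedUpdate simply passes the caller's ctx through.
func fixedUpdate(ctx context.Context, agentKey string) string {
	return tenantOf(ctx)
}

func main() {
	ctx := newRequestContext("tenant-a")
	fmt.Printf("buggy=%q fixed=%q\n", buggyUpdate(ctx, "agent-1"), fixedUpdate(ctx, "agent-1"))
}
```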

* feat(security): tenant-scoped UNIQUE constraints for multi-tenant isolation

Update UNIQUE indexes to include tenant_id, allowing same names across tenants:
- agents: (agent_key) → (tenant_id, agent_key) WHERE deleted_at IS NULL
- sessions: (session_key) → (tenant_id, session_key)
- skills: (slug) → (tenant_id, slug)
- mcp_servers: (name) → (tenant_id, name)
- channel_contacts: (channel_type, sender_id) → (tenant_id, channel_type, sender_id)

Code changes:
- GetByKey now filters by tenant_id (same pattern as GetByID)
- ON CONFLICT clauses updated for sessions and skills
- Channel consumer uses WithCrossTenant for agent resolution
- Down migration restores original constraints

* fix(security): close remaining tenant isolation gaps from final audit

Critical fixes:
- gateway_setup: WithCrossTenant for default agent lookup at startup (C6)
- channel_contacts: ON CONFLICT updated to (tenant_id, channel_type, sender_id) (Q15)
- agents.Delete: tenant filter on DELETE (Q1)

High priority fixes:
- agents: List, GetDefault, ShareAgent, RevokeShare, ListShares, CanAccess,
  ListAccessible, Update unset-default — all now tenant-scoped
- skills_crud: DeleteSkill now takes ctx, verifies tenant ownership
- mcp_servers, channel_instances, secure_cli: Delete methods tenant-scoped
- WithCrossTenant added to: gateway team notifications, team_tool_cache,
  pending_messages GetDefault

* fix(migration): add tenant_id to usage_snapshots unique index

Update idx_usage_snapshots_unique to include tenant_id, preventing
cross-tenant upsert collisions when different tenants have agents
with same provider/model/channel combination.

* feat(security): cron tenant guard + browser per-tenant isolation

Phase 3 — Cron API tenant guard:
- Add ctx to 5 CronStore methods (GetJob, RemoveJob, EnableJob, RunJob, GetRunLog)
- All API-facing cron ops now filter by tenant_id (prevents cross-tenant CRUD)
- RemoveJob/EnableJob return "not found" on tenant mismatch (no enumeration)
- GetRunLog JOINs cron_jobs for tenant filtering
- UpdateJob internal reads scoped by tenant (defense-in-depth)
- Scheduler-internal methods (GetDueJobs, refreshJobCache) unchanged (system-level)
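
Returning "not found" on tenant mismatch means a caller cannot tell a job that belongs to another tenant from one that does not exist. A small in-memory sketch of that behaviour; `JobStore`, its fields, and the string tenant IDs are assumptions, not the real CronStore:

```go
package main

import (
	"errors"
	"fmt"
)

var ErrJobNotFound = errors.New("cron job not found")

type Job struct {
	ID       string
	TenantID string
}

// JobStore is an in-memory stand-in for the real store.
type JobStore struct {
	jobs map[string]Job
}

// RemoveJob returns the same error for a missing job and for a job owned
// by another tenant, so job IDs cannot be enumerated across tenants.
func (s *JobStore) RemoveJob(tenantID, jobID string) error {
	job, ok := s.jobs[jobID]
	if !ok || job.TenantID != tenantID {
		return ErrJobNotFound
	}
	delete(s.jobs, jobID)
	return nil
}

func main() {
	store := &JobStore{jobs: map[string]Job{"j1": {ID: "j1", TenantID: "tenant-a"}}}
	fmt.Println(store.RemoveJob("tenant-b", "j1")) // other tenant's job
	fmt.Println(store.RemoveJob("tenant-a", "j1")) // owner
}
```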

Phase 4 — Browser per-tenant isolation:
- Per-tenant incognito browser contexts via rod Incognito() (separate cookie jars)
- All page access (Snapshot, Screenshot, Navigate, Click, Type, etc.) validated
  via getPageForTenant — blocks cross-tenant access by targetID
- OpenTab creates pages in tenant's incognito context
- ListTabs scoped to tenant's incognito context
- ConsoleMessages validates page ownership
- Stop/reconnect properly cleans up incognito contexts

* feat(security): isolation gaps + per-tenant config (Plan 5)

Part A — Isolation Gap Fixes:
- Merge migration 028 into 027: add tenant_id to llm_providers +
  config_secrets, fix UNIQUE constraints for paired_devices +
  channel_instances
- providers.go: tenant filtering on all CRUD queries
- config_secrets.go: ON CONFLICT (key, tenant_id)
- pairing_store: add ctx to all 7 interface methods, remove hardcoded
  MasterTenantID, update ~15 channel caller files
- Session cache: prefix keys with tenantID to prevent cross-tenant
  collision. DB queries (loadFromDB, Save, Delete, LastUsedChannel)
  add tenant filter
- config_permissions cache: prefix keys with tenantID
- Cron ListJobs: fail-closed when tenant context missing

Part B — Per-Tenant Configuration:
- Provider Registry: compound key tenantID/name with fallback to
  master tenant. GetForTenant/ListForTenant/RegisterForTenant
- Resolver: uses tenant-aware provider lookup + disabled tools query
- Agent loop: filter disabled tools from LLM tool definitions
- Builtin tool tenant configs: store interface + PG implementation +
  PUT/DELETE HTTP endpoints
- Skill tenant configs: store interface + PG + ListAccessible LEFT
  JOIN to exclude disabled skills per tenant
- OAuth: DBTokenSource with tenantID field for tenant-scoped token
  refresh
- All HTTP provider handlers use RegisterForTenant/UnregisterForTenant
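
The compound-key lookup with master-tenant fallback from Part B might look roughly like this. The method names follow the commit text, but their signatures, the `masterTenant` placeholder value, and the use of a string endpoint in place of a real provider object are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// masterTenant is a placeholder ID for the master tenant.
const masterTenant = "master"

// Registry maps "tenantID/name" to a provider (a string endpoint here).
type Registry struct {
	mu        sync.RWMutex
	providers map[string]string
}

func NewRegistry() *Registry {
	return &Registry{providers: make(map[string]string)}
}

func compoundKey(tenantID, name string) string {
	return tenantID + "/" + name
}

// RegisterForTenant stores a provider under the tenant's compound key.
func (r *Registry) RegisterForTenant(tenantID, name, endpoint string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.providers[compoundKey(tenantID, name)] = endpoint
}

// GetForTenant prefers the tenant's own provider and falls back to the
// master tenant's provider of the same name.
func (r *Registry) GetForTenant(tenantID, name string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if p, ok := r.providers[compoundKey(tenantID, name)]; ok {
		return p, true
	}
	p, ok := r.providers[compoundKey(masterTenant, name)]
	return p, ok
}

func main() {
	reg := NewRegistry()
	reg.RegisterForTenant(masterTenant, "openai", "https://shared.example")
	reg.RegisterForTenant("tenant-a", "openai", "https://tenant-a.example")

	fmt.Println(reg.GetForTenant("tenant-a", "openai"))
	fmt.Println(reg.GetForTenant("tenant-b", "openai"))
}
```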

* feat(security): channel tenant propagation + MCP per-user credentials (Plan 6)

- Propagate tenant_id from channel_instances through BaseChannel →
  InboundMessage → agent loop context (fixes 5-point break in tenant flow)
- Inject tenant context in WS router dispatch for all gateway methods
- Add MCP per-user credential overrides (api_key, headers, env) with
  AES-256-GCM encryption and HTTP API endpoints
- Rewrite MCP pool with tenant-scoped keys, slot semaphore, idle eviction,
  and credential rotation support (Evict per tenant+server)
- Bypass pool for users with custom credentials (separate connections)
- Fix MCP APIKey never passed to connections (inject as Authorization header)

* fix(security): close remaining tenant isolation gaps from Plan 1-6 audit

- Add tenant_id to 6 missing tables: agent_context_files,
  skill_agent_grants, mcp_agent_grants, team_tasks, spans,
  embedding_cache (migration 027)
- Fix tid==uuid.Nil fallback to fail-closed (return error) in 8 update
  methods: agent_links, teams, skills, channel_instances, secure_cli,
  cron, mcp_servers, tracing
- Add tenant filter to bare DELETEs: DeleteLink, DeleteTeam
- Add tenant filter to queries: ListChildTraces, GetMonthlyAgentCost,
  CountAgentGrantsByServer, ListAccessible (MCP), ReviewRequest,
  ResolveGroupTitles, buildTraceWhere
- Fix missing tenant_id in INSERTs: CreateSkill, GrantToUser,
  ReviewRequest grant INSERTs
- Add tenant filter to api_keys: List, Revoke, Delete
- Fix cron scanJob/RemoveJob/EnableJob fallthrough patterns

* fix(security): inject tenant context into channel handler entry points

Channel handlers used context.Background(), which lost tenant context,
causing store operations to either fail closed or default to the master
tenant. All 10 handler entry points now inject the tenant from BaseChannel.

* fix(security): tenant filters for teams, tasks, skills (Plan 6b audit)

- Teams: add tenant filter to GetTeamForAgent, ListMembers,
  ListIdleMembers, KnownUserIDs (JOIN agent_teams for tenant check)
- Teams: add tenant_id to GrantTeamAccess INSERT, tenant filter to
  RevokeTeamAccess, ListTeamGrants, HasTeamAccess
- Team tasks: add tenant_id to CreateTask INSERT, fail-closed
  UpdateTask, tenant filter on all 7 query/delete methods
- Skills: add tenant filter to RevokeFromAgent, ListAgentGrants
- Skills: add ctx param + tenant filter to ToggleSkill
- History: annotate context.Background() locations with TODOs for
  future tenant injection (requires PendingHistory struct refactor)

* fix(security): add tenant_id to 4 missing team tables + fix INSERTs

Add tenant_id column to: agent_team_members, team_task_comments,
team_task_events, team_task_attachments (migration 027).

Fix INSERT statements to include tenant_id: AddMember,
AddTaskComment, RecordTaskEvent, AttachFileToTask.

* fix(migration): cast UUID literals in tenant_users seed + usage_snapshots index

PostgreSQL doesn't auto-cast string to UUID in SELECT and expression
index contexts. Add explicit ::uuid casts to prevent migration failure.

* docs: add multi-tenant architecture guide for integrators

Comprehensive solution doc covering auth model, WS protocol, event
system, data isolation, API reference, and integration patterns.
Target audience: developers building custom frontends or SaaS on GoClaw.

* feat(ui): multi-tenant awareness + tenant admin page (Plan 7)

Backend:
- Enrich WS connect response with tenant_name, tenant_slug, cross_tenant
- Add tenants.mine WS method (any user, returns own memberships)
- Parse tenant_hint in connect params for browser pairing multi-tenant
- Wire tenantStore to MethodRouter for connect-time tenant lookup

Frontend:
- Auth store: tenantId, tenantName, tenantSlug, isCrossTenant, availableTenants
- WS client: capture tenant fields from connect, send tenant_hint
- WS provider: auto-fetch tenants.mine on connect
- useTenants() shared hook for all tenant-aware components
- Tenant indicator in sidebar connection status
- Tenant admin page (/admin/tenants) with list + create dialog
- Tenants nav in sidebar (cross-tenant admin only)
- i18n: tenants namespace (en/vi/zh)
- Type updates: tenant_id on AgentData, ApiKeyData

* refactor(ui): move tenant selector into user menu dropdown in topbar

Replace simple logout button with a Radix Popover user menu showing:
- User ID display
- Tenant selector (when multi-tenant: list all tenants with check mark)
- Logout button

Remove tenant indicator from connection-status.tsx (now in topbar).
Tenant switch saves slug to localStorage and reloads for reconnect.

* feat(ui): add logout confirmation dialog

Show destructive confirm dialog before logout via ConfirmDialog
component. Added logoutConfirm i18n key for en/vi/zh.

* fix(ui): security hardening — hide admin nav, fix route guard, fix refresh

- Hide System nav group for non-admin roles in sidebar (was visible to all)
- Replace RequireAdmin with RequireCrossTenant guard on /admin/tenants route
- Add RequireCrossTenant component to require-role.tsx
- Fix refresh button animation: use isFetching instead of isLoading
- Clean up connection-status.tsx (remove tenant indicator, now in topbar)

* feat: cross-tenant admin tenant scope selector

Backend: add tenant_scope connect param. Cross-tenant clients can
narrow their scope to a specific tenant (slug). applyTenantScope()
sets client.tenantID and clears crossTenant flag.

UI: user menu shows "All Tenants" option for cross-tenant admins.
Selecting a tenant saves slug to localStorage as tenant_scope,
reload reconnects with narrowed scope. "All Tenants" clears scope.

* feat: provisioning API key scope + tenant detail page (Plan 8)

Backend:
- Add operator.provision scope for limited tenant management
- Add HasScope() method to gateway Client
- Allow provision-scoped keys to create tenants + add users
- Allow provision-scoped keys to create tenant-bound API keys

Frontend:
- Tenant detail page with user management (list, add, remove)
- Clickable tenant list rows navigate to detail
- i18n: tenant detail keys (en/vi/zh)
- Route /admin/tenants/:id with RequireCrossTenant guard

* fix: tenant scope keeps admin privileges + UI pattern fixes

Backend:
- applyTenantScope keeps crossTenant=true (retains admin features)
- Router: scoped cross-tenant injects WithTenantID (filters data)
  while keeping admin role for method access

UI:
- Fix "All Tenants" check mark (compare against nil UUID string)
- Fix tenant label when scope active (show selected tenant name)
- Use ConfirmDialog for user removal (was hand-rolled)
- Add DialogDescription to add-user dialog (Radix a11y)
- Fix table min-w-[600px] consistency
- Fix column header mismatch (was "role", should be "created")

* fix(ui): clean up tenant detail header — remove redundant info panel

Remove duplicate slug/status/created panel. Info now shown in
PageHeader description (slug + date). Status badge removed (redundant
with description). Cleaner, consistent with other admin pages.

* fix(ui): redesign tenant detail with info cards + user cards

* feat(ui): tenant selection gate — require tenant before app access

- Add tenantSelected flag to auth store (persisted via localStorage)
- WS provider auto-selects: a single-tenant user gets that tenant
  automatically, a cross-tenant admin defaults to "All Tenants", and a
  zero-tenant user is blocked
- RequireAuth gate: redirect to /select-tenant when connected but
  no tenant selected
- New TenantSelectorPage: centered card layout matching login page,
  "All Tenants" amber card for cross-tenant admin, per-tenant cards
  with role badges, no-access state with logout button
- i18n: selectTenant, noAccess keys (en/vi/zh)

* fix(security): scope events for cross-tenant admin with tenant_scope

Event filter was checking !crossTenant before filtering — scoped
cross-tenant admins (crossTenant=true + tenantID set) bypassed
tenant event filtering. Now checks tenantID != Nil regardless of
crossTenant flag, ensuring scoped admins only see their chosen
tenant's events.

* fix(security): HTTP API now respects tenant_scope for gateway token

Root cause: UI uses HTTP API (/v1/agents, /v1/mcp/servers, etc.)
for data fetching. HTTP auth middleware with gateway token always
set CrossTenant=true with no tenant filtering. tenant_scope only
worked for WS connection, not HTTP requests.

Fix:
- HTTP client sends X-GoClaw-Tenant-Scope header from localStorage
- HTTP auth resolves header slug → tenant UUID via tenantStore
- requireAuth: CrossTenant + TenantID → WithTenantID (scoped)
- Wire InitTenantStore(pgStores.Tenants) in gateway startup
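
A minimal sketch of the header-based scoping described in this fix, assuming a plain slug-to-ID map in place of tenantStore. Only the X-GoClaw-Tenant-Scope header name comes from the commit; `resolveScope`, `scopeMiddleware`, `withTenantID`, and the 403 on an unknown slug are hypothetical choices made for illustration.

```go
package main

import (
	"context"
	"net/http"
)

type tenantCtxKey struct{}

// withTenantID is a stand-in for the WithTenantID helper named in the commit.
func withTenantID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, tenantCtxKey{}, id)
}

// tenantFromCtx returns the scoped tenant ID, or "" when the request is unscoped.
func tenantFromCtx(ctx context.Context) string {
	id, _ := ctx.Value(tenantCtxKey{}).(string)
	return id
}

// resolveScope maps a tenant slug to a tenant ID.
// An empty slug means "no header sent" and is accepted as unscoped.
// A non-empty slug that is not in bySlug is rejected (ok == false).
func resolveScope(slug string, bySlug map[string]string) (id string, ok bool) {
	if slug == "" {
		return "", true
	}
	id, ok = bySlug[slug]
	return id, ok
}

// scopeMiddleware narrows a cross-tenant request to the tenant named in the header.
// Rejecting unknown slugs with 403 is an assumption of this sketch.
func scopeMiddleware(bySlug map[string]string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, ok := resolveScope(r.Header.Get("X-GoClaw-Tenant-Scope"), bySlug)
		if !ok {
			http.Error(w, "unknown tenant scope", http.StatusForbidden)
			return
		}
		if id != "" {
			r = r.WithContext(withTenantID(r.Context(), id))
		}
		next.ServeHTTP(w, r)
	})
}
```

Downstream handlers would read the scope with `tenantFromCtx(r.Context())` and add it to their queries.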

* feat(security): tenant-aware provider registry, event filter, and membership validation

- Refactor providers.Registry: Get(ctx, name) / List(ctx) extract tenant
  from context via injected TenantFromCtx func (avoids circular import)
- Event filter: fail-closed 3-mode tenant filtering
  Mode 1: unscoped admin sees all
  Mode 2: scoped admin sees tenant events + system events
  Mode 3: regular user sees only own tenant (fail-closed)
- WS connect: resolveTenantHint validates membership via GetUserRole
  with PermissionCache (30s TTL, bus invalidation)
- BroadcastForTenant helper for tenant-scoped event emission
- Session list: add TenantID to SessionListOpts from context
- Cron handleRun: preserve tenant in background goroutine context
- GOCLAW_LOG_LEVEL env var (debug|info|warn|error) for Docker/K8s
- Cache debug logging: tenant_cache, permission_cache, api_key_cache
- Friendly verify error: timeout → user-readable message
- Verify timeout: 15s → 30s
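
The three filter modes listed above can be sketched as a single decision function. The `Event` and `Client` types here are hypothetical simplifications (string tenant IDs, empty string meaning "none"), not the gateway's real types.

```go
package main

// Event is a hypothetical, simplified event envelope.
// An empty TenantID marks a system-wide event.
type Event struct {
	Name     string
	TenantID string
}

// Client is a hypothetical view of a connected gateway client.
type Client struct {
	TenantID    string // empty means no tenant resolved
	CrossTenant bool   // true for cross-tenant admin connections
}

// shouldDeliver reports whether ev may be sent to c.
func shouldDeliver(c Client, ev Event) bool {
	switch {
	case c.CrossTenant && c.TenantID == "":
		// Mode 1: unscoped admin sees all events.
		return true
	case c.CrossTenant:
		// Mode 2: scoped admin sees the chosen tenant's events plus system events.
		return ev.TenantID == "" || ev.TenantID == c.TenantID
	default:
		// Mode 3: regular user sees only own tenant.
		// A client without a resolved tenant receives nothing (fail-closed).
		return c.TenantID != "" && ev.TenantID == c.TenantID
	}
}
```

The default branch is what makes the filter fail-closed: any client that does not match an explicit allow rule is denied.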

* feat(ui): setup wizard improvements + agent preset enrichment

- Setup: skip link with confirm dialog, language selector (en/vi/zh)
- Setup: card padding fix (py-0 gap-0 on Card, py-5 on CardContent)
- Setup: remove duplicate skip link from layout
- Step Model: verify countdown timer (30s), stops on result
- Step Agent: default Fox Spirit preset, selected state styling,
  hide agent key/name inputs, auto-derive from preset, emoji in config
- Summoning modal: elapsed timer (m:ss format)
- Agent presets: enriched prompts with human-like quirks
  Fox Spirit: playful personality, care reminders
  Artisan: portrait/banner/ads/logo expertise
  Astrologer: reference sites (astro.com, cafeastrology, labyrinthos)
- i18n: "triệu hồi linh hồn" fix, all 3 locales updated

* feat(ui): API Key tenant support + card layout + provider chain fix

- API Key create: tenant selector for cross-tenant admin, provision scope
- API Key create: redesigned dialog with scope cards, Radix Select, icons
- API Key list: card layout with badges (status, tenant, scopes)
- API Key: shortcut in user menu (topbar)
- API Key: keep "API Key" untranslated across all locales
- Provider chain: empty state fix — skip legacy entry when provider
  not found in current tenant
- i18n: form.cancel key added to all 3 locales

* fix(ui): add bottom padding to all page layouts + misc improvements

- Add pb-10 to all 24 page containers to prevent content touching
  bottom edge of viewport
- Various UI polish from user modifications (summoning colors,
  layout icon, agent cards, sidebar adjustments)

* feat(ui): MCP user credentials dialog + builtin tool tenant toggle

- MCP: per-user credentials dialog (api_key, headers, env KV editor)
  with status badges, delete all, save
- MCP: "My Credentials" button on each server row
- Builtin Tools: per-tenant enable/disable override toggle
  with "Using default" / "Enabled/Disabled for tenant" badges
  and reset-to-default button
- Setup: larger logo (h-16) and bolder title (text-4xl font-bold)
- i18n: all keys added to en/vi/zh for both features

* fix(ui): API key card spacing + remove pagination border

- Card padding: px-4 py-3.5 (was px-3 py-2), rows spaced with gap-2
- Scopes on separate row from dates for readability
- Card gap: space-y-2.5 between cards
- Pagination: add className prop, remove border-t on API keys page
- Badge/icon sizes bumped to text-xs / h-3.5 (was text-[10px] / h-3)

* fix(security): comprehensive tenant isolation audit — SQL, events, cache, skills, files

Defense-in-depth hardening across 12 audit phases:

- SQL: add tenant_id WHERE to teams_tasks lifecycle/activity/followup/progress/embedding (~30 functions)
- Events: broadcastTeamEvent + task_ticker + subagent announce now carry TenantID
- Cache: agentKeyCache scoped by tenant (agent keys per-tenant, not globally unique)
- Skills: SkillStore interface accepts ctx, SQL filter (is_system OR tenant_id=$N), per-tenant list cache, GrantToAgent includes tenant_id, tenant-scoped file storage
- Files: StorageHandler/FilesHandler/TeamAttachments/teamWorkspaceDir use config.TenantDataDir/TenantTeamDir
- Security: HMAC signed file tokens (file_token.go) replace gateway token in URLs
- Audit: AuditEventPayload carries TenantID for async subscriber tenant scoping
- InboundMessage: subagent/dispatch/validation/session_send propagate TenantID
- Pending messages: DeleteStale scoped by tenant

* fix(security): skip gateway token in URLs with signed file tokens

toFileUrl() now skips appending ?token=GATEWAY_TOKEN when the URL
already contains ?ft= (HMAC signed file token). Prevents gateway
token exposure via browser history, logs, and referrer headers.

* fix(security): stop persisting auth tokens in session media URLs

mediaToMarkdown() now stores clean paths (/v1/files/path) without
any auth tokens. Previously embedded ?token=GATEWAY_TOKEN (or ?ft=)
into markdown which gets persisted in session messages DB.

Frontend toFileUrl() adds auth at render time — tokens never stored.

* fix(security): migration 027 strips leaked gateway tokens from session URLs

Adds cleanup step to tenant foundation migration: removes ?token=xxx
from persisted media URLs in session messages. Old code embedded the
gateway token; new code stores clean paths only.

* fix(security): sign file URLs at delivery time, not persist time

Add SignFileURLs() utility that finds /v1/files/ and /v1/media/ URLs
in content and appends HMAC signed ?ft= tokens before delivery.

Applied at 4 delivery points:
- WS agent events (OnEvent callback in gateway_managed.go)
- WS chat.history response
- WS sessions.preview response
- HTTP /v1/chat/completions response

Sessions store clean paths only. Tokens are generated per-delivery
with 1h TTL — never persisted in DB. Frontend toFileUrl() skips
appending gateway token when ?ft= is already present.
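
A rough sketch of delivery-time signing, assuming stored content holds clean paths with no query string. The token layout (`<expiry>.<hex HMAC-SHA256>`), the regular expression, and the function names are assumptions of this sketch; the commit only states that /v1/files/ and /v1/media/ URLs receive an HMAC-signed ?ft= token with a TTL.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// fileURLPattern matches file and media paths, stopping at whitespace,
// quotes, ')' (markdown link end) or '?'.
var fileURLPattern = regexp.MustCompile(`/v1/(?:files|media)/[^\s"')?]+`)

// mac computes the hex HMAC-SHA256 over the full URL path and the expiry.
func mac(key []byte, path string, exp int64) string {
	h := hmac.New(sha256.New, key)
	fmt.Fprintf(h, "%s\n%d", path, exp)
	return hex.EncodeToString(h.Sum(nil))
}

// signFileURLs appends ?ft=<expiry>.<mac> to every matched path in content.
// now and ttlSeconds are Unix seconds, passed in so the result is deterministic.
func signFileURLs(content string, key []byte, now, ttlSeconds int64) string {
	exp := now + ttlSeconds
	return fileURLPattern.ReplaceAllStringFunc(content, func(p string) string {
		return fmt.Sprintf("%s?ft=%d.%s", p, exp, mac(key, p, exp))
	})
}

// verifyFileToken checks expiry and signature. path must be the same full
// URL path that was signed (including the /v1/files/ prefix).
func verifyFileToken(path, token string, key []byte, now int64) bool {
	expStr, sig, ok := strings.Cut(token, ".")
	if !ok {
		return false
	}
	exp, err := strconv.ParseInt(expStr, 10, 64)
	if err != nil || now > exp {
		return false
	}
	return hmac.Equal([]byte(sig), []byte(mac(key, path, exp)))
}
```

Because the signature covers the exact path string, the verifier has to rebuild the same string the signer used; signing "/v1/files/a.png" and verifying "/a.png" fails.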

* fix: file token verify path must match signed path (/v1/files/ prefix)

SignFileURLs() signs the full URL path "/v1/files/{path}" but the
verify in files.go auth() was using "/{path}" (without prefix).
HMAC mismatch caused all signed file tokens to return 401.

* fix(security): scope storage size cache per-tenant

sizeCache was a single global entry — all tenants shared one cached
size. Changed to sync.Map keyed by tenantBaseDir so each tenant gets
its own cached size calculation.
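
A small sketch of a per-tenant size cache keyed by tenantBaseDir, as described above. The `sizeEntry` type, the TTL parameter, and the `compute` callback are hypothetical; only the sync.Map keyed by base directory comes from the commit.

```go
package main

import (
	"sync"
	"time"
)

// sizeEntry is one tenant's cached size with its expiry.
type sizeEntry struct {
	bytes   int64
	expires time.Time
}

// sizeCache maps tenantBaseDir (string) to sizeEntry, one entry per tenant.
var sizeCache sync.Map

// cachedSize returns the cached size for tenantBaseDir, calling compute
// only when the tenant has no entry or its entry has expired.
func cachedSize(tenantBaseDir string, ttlSeconds int64, compute func(dir string) int64) int64 {
	now := time.Now()
	if v, ok := sizeCache.Load(tenantBaseDir); ok {
		if e := v.(sizeEntry); now.Before(e.expires) {
			return e.bytes
		}
	}
	n := compute(tenantBaseDir)
	ttl := time.Duration(ttlSeconds) * time.Second
	sizeCache.Store(tenantBaseDir, sizeEntry{bytes: n, expires: now.Add(ttl)})
	return n
}
```

Two concurrent misses for the same tenant may both call compute; that is harmless here because the later Store simply overwrites the earlier one.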

* feat(ui): redesign API keys page — table layout + code snippet dialog

Replace card-based API keys list with table layout matching MCP Servers
pattern. Add "API Key Usage" dialog with tabbed code snippets (cURL,
TypeScript, Go) showing gateway connection examples with syntax
highlighting and copy-to-clipboard.

* fix(builtin-tools): seed media tools disabled, fix tenant toggle, add unconfigured warning

- Seed media tools with Enabled=false and no default provider settings
  (user must configure provider chain before enabling)
- Fix provider chain form ghost entries: validate provider exists in
  tenant before showing (parseInitialEntries new-format path)
- Fix double toggle: show only tenant override OR global toggle, not both
- Fix list API: merge tenant_enabled from builtin_tool_tenant_configs
  into response when tenant-scoped (was always null)
- Add ListAll() to BuiltinToolTenantConfigStore for full override map
- Add amber warning banner for enabled media tools missing provider config

* feat(mcp): require_user_credentials setting + KeyValueEditor for user creds

- Add require_user_credentials setting in mcp_servers.settings JSONB
- Backend: skip MCP server in LoadForAgent when user lacks credentials
- Frontend: toggle in MCP form dialog, persisted in settings field
- Redesign MCP user credentials dialog: replace raw Textarea with
  KeyValueEditor (sensitive key masking for auth/token/secret fields)
- Add settings to mcpServerAllowedFields for HTTP update

* fix(security): restrict cross-tenant to owner IDs, config to owners only

- Gateway token + non-owner user ID: admin role but tenant-scoped
  (no cross-tenant access). Fallback: only "system" is owner when
  GOCLAW_OWNER_IDS not configured (fail-closed).
- Config page (WS config.* methods): wrapped with requireCrossTenant
  middleware — non-owner admins get permission denied
- Config sidebar link: hidden for non-cross-tenant users
- Logout: clear tenant_id and tenant_hint from localStorage
  (prevents tenant scope leak to next user session)
- Refactor: LOCAL_STORAGE_KEYS.TENANT_ID/TENANT_HINT constants
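
The fail-closed owner fallback can be illustrated as follows. This sketch assumes GOCLAW_OWNER_IDS holds a comma-separated list, which the commit does not state; `ownerSet` and `isOwner` are hypothetical helper names, and the raw value is passed in rather than read from the environment.

```go
package main

import "strings"

// ownerSet parses a comma-separated owner list (assumed format).
// When the list is empty or unset, only "system" is treated as an owner.
func ownerSet(raw string) map[string]bool {
	owners := make(map[string]bool)
	for _, id := range strings.Split(raw, ",") {
		if id = strings.TrimSpace(id); id != "" {
			owners[id] = true
		}
	}
	if len(owners) == 0 {
		owners["system"] = true // fail-closed default
	}
	return owners
}

// isOwner reports whether userID may act cross-tenant.
func isOwner(userID, raw string) bool {
	return ownerSet(raw)[userID]
}
```

With an unset variable, `isOwner("alice", "")` is false, so an unconfigured deployment never grants cross-tenant access to ordinary user IDs.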

* fix(ui): chat bubble contrast, login logo, tenant no-access UX

- Chat bubble: use --chat-bubble-user CSS var (darker orange, L=0.50/0.52)
  with text-white for WCAG AA contrast (~5.5:1)
- Login page: logo h-20 w-20, title text-3xl font-bold
- Tenant selector no-access: shield icon + hint text explaining
  user needs admin to add them to a tenant
- Sidebar: GoClaw text uses text-sidebar-primary (brand color)

* feat(contacts): merge/unmerge contacts to tenant users

Add API and UI for linking channel contacts to tenant_users identity,
enabling cross-channel user identification within a tenant.

Backend:
- POST /v1/contacts/merge — link contacts to existing or new tenant_user
- POST /v1/contacts/unmerge — remove merged_id from contacts
- GET /v1/contacts/merged/{id} — list contacts by tenant_user
- GET /v1/tenant-users — list users for current tenant
- Add display_name + metadata columns to tenant_users (migration 027)
- All endpoints enforce tenant isolation via context tenant_id

Frontend:
- Checkbox multi-select on contacts table
- Selection toolbar with Merge/Unmerge buttons
- Merge dialog: link to existing user or create new
- Link2 icon indicator for merged contacts
- i18n: en/vi/zh translations for merge section

* fix(security): add tenant_id to span and embedding_cache inserts

SpanData struct was missing TenantID field — all span inserts failed
with NOT NULL constraint violation after migration 027 dropped defaults.

Fix captures tenant_id from context at emit time (6 call sites in
loop_tracing.go + subagent_tracing.go), then includes it in both
CreateSpan() and BatchCreateSpans() SQL (25→26 columns).

Also fixes embedding_cache writeEmbeddingCache() which was missing
tenant_id in its batch INSERT — same class of bug.

Both use MasterTenantID fallback for backward compatibility.

* feat: introduce tenant switcher UI and enhance multi-tenant architecture documentation

* fix(security): enforce tenant scoping, fix session isolation and UI cleanup

- Force cross-tenant admins to always have a concrete tenant_id (default
  MasterTenantID) instead of unscoped WithCrossTenant — prevents mismatch
  between session listing (no filter) and writes (MasterTenantID fallback)
- Make agent router tenant-aware: Get(ctx, agentID) resolves agent for
  the caller's tenant, preventing cross-tenant agent cache collisions
- Fix context.Background() in title goroutine and summarization — now
  uses tenant-aware context (WithoutCancel) so titles and compaction
  persist to the correct tenant
- Add read-only SessionStore.Get() method; replace GetOrCreate in auth
  checks (preview/patch/delete/reset) to prevent phantom session creation
- Inject tenant from channel instance into inbound message processing
- Remove "All Tenants" option from tenant selector, topbar switcher,
  and ws-provider auto-select — admin must always operate within a tenant
- Fix contacts page selection toolbar layout shift (always rendered)
- Widen MCP credentials sensitive header regex to catch API_KEY etc.

* fix(security): propagate tenant_id in consumer handlers and background ops

- InjectTeamDispatch: use context.WithoutCancel instead of context.Background
  to preserve tenant_id while avoiding cancel propagation from HTTP/WS handlers
- handleTeammateMessage/handleSubagentAnnounce: inject tenant_id from msg
- Add nil guard for outcome.Result to prevent panic on agent-not-found
- Use BroadcastForTenant for EventTeamTaskFailed/Completed/LeaderProcessing
- Remove unnecessary WithCrossTenant in autoSetFollowup (ctx already scoped)
- resolveAgentByKey: accept ctx param for tenant-scoped agent lookup
- pending_messages: use request ctx instead of cross-tenant for GetDefault
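
To illustrate why context.WithoutCancel (Go 1.21+) fits background work better than context.Background, here is a minimal sketch. The `tenantKey` type and the helper names are hypothetical stand-ins for the project's own context helpers.

```go
package main

import "context"

type tenantKey struct{}

// withTenant stores a tenant ID in ctx (hypothetical helper).
func withTenant(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, tenantKey{}, id)
}

// tenantOf reads the tenant ID back, or "" when none is set.
func tenantOf(ctx context.Context) string {
	id, _ := ctx.Value(tenantKey{}).(string)
	return id
}

// detachedTenant simulates a handler that starts background work and returns.
// It reports the tenant visible to the background context and whether the
// background and request contexts ended up cancelled.
func detachedTenant(tenantID string) (tenant string, bgCancelled, reqCancelled bool) {
	reqCtx, cancel := context.WithCancel(withTenant(context.Background(), tenantID))
	bg := context.WithoutCancel(reqCtx) // keeps values, drops cancellation
	cancel()                            // the HTTP/WS handler returns
	return tenantOf(bg), bg.Err() != nil, reqCtx.Err() != nil
}
```

A context built from context.Background would also survive the handler returning, but `tenantOf` would return "" for it, which is the bug class these commits fix.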

* fix(security): tenant-scope EnsureContact and PendingHistory DB operations

- All channel EnsureContact calls now use tenant-scoped ctx instead of
  context.Background (whatsapp, slack, discord, telegram, feishu, zalo)
- PendingHistory: add tenantID field, thread through constructors
- All PendingHistory DB ops (load, flush, compact, delete) use tenantCtx()
- Normalize timeouts: 10s for simple queries, 15s for batch writes

* feat(teams): auto-attach media, retry completed tasks, improve tool messages

- Auto-attach workspace media from any tool (create_image/audio/video) to
  team tasks via loop-level hook, not just write_file interceptor
- Store absolute paths in team_task_attachments instead of relative
- Extend retry action to support completed tasks (reopen for follow-up)
- Context-aware comment result messages with next-action guidance for
  leader vs member roles and task status
- All tool results include task_id for agent follow-up actions
- Use #N "subject" format instead of raw UUIDs in tool messages

* feat(multi-tenant): tenant isolation for media, events, providers and UI

- Tenant-scoped media store, event filter, provider registry
- Tenant header propagation in WS/HTTP clients
- UI: tenant-aware chat messages, markdown renderer improvements
- Protocol: tenant error codes and event definitions

* docs: add multi-tenant architecture documentation

* fix(teams): store absolute paths in team_task_attachments

- AutoAttachWorkspaceFile: use cleanPath consistently instead of raw absPath
- executeAttach: resolve relative paths to absolute via team workspace
- AfterWrite interceptor already uses filepath.Clean (verified)

* fix(teams): attachment download handles both absolute and relative paths

filepath.Join(teamBase, att.Path) with an absolute att.Path does not
return att.Path: Go nests the absolute path under teamBase, so the
path traversal check and the download operated on the wrong path.
Now checks IsAbs first — uses path directly for new absolute entries,
falls back to legacy join for old relative entries.
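
A sketch of the IsAbs-first resolution described above, assuming Unix-style paths. The function name, its parameters, and the filepath.Rel containment check are illustrative choices; which directory serves as the validation root is left to the caller.

```go
package main

import (
	"errors"
	"path/filepath"
	"strings"
)

// resolveAttachment returns the on-disk path for a stored attachment path.
// Absolute entries are used directly; legacy relative entries are joined
// under legacyBase. The result must stay inside root.
func resolveAttachment(root, legacyBase, stored string) (string, error) {
	p := stored
	if !filepath.IsAbs(p) {
		p = filepath.Join(legacyBase, p) // legacy relative entry
	}
	p = filepath.Clean(p)

	// Containment check: the path relative to root must not climb out of it.
	rel, err := filepath.Rel(root, p)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("attachment path escapes root")
	}
	return p, nil
}
```

Checking containment after Clean means "../" sequences in legacy relative entries are resolved before the comparison.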

* fix(teams): attachment download validates against workspace root not tenant dir

Absolute paths stored in DB don't match TenantTeamDir structure
(master tenant has no tenants/ prefix). Now validates absolute paths
against dataDir (workspace root) instead. Legacy relative paths still
resolve via TenantTeamDir as before. IDOR check on att.TeamID ensures
cross-team isolation.

* fix(teams): attachment download uses workspace root, not data dir

Files are stored under GOCLAW_WORKSPACE/teams/ but handler was passed
dataDir (GOCLAW_DATA_DIR) — completely different directory. Now passes
workspace. Legacy relative paths resolve via {workspace}/teams/{teamID}/{chatID}/{path}.

* fix(security): use HMAC-signed file tokens for attachment downloads

Replace gateway token exposure (?token=) with HMAC-signed short-lived
file tokens (?ft=) for team task attachment downloads — same mechanism
used by chat file URLs.

Backend:
- team_attachments auth: accept ?ft= signed token (priority 1), Bearer (priority 2)
- teams_tasks RPC: sign download_url with HMAC at delivery time
- Add fileTokenSecret to TeamsMethods, thread through wireChannelRPCMethods

Frontend:
- Use server-signed download_url from attachment data instead of ?token=
- Remove useAuthStore dependency from task-detail-dialog

* fix(security): decouple file token signing from gateway token

- Generate random 256-bit HMAC key at startup (crypto/rand, memory-only)
- All file signing/verification uses FileSigningKey() instead of gateway token
- Remove ?token= query param fallback from /v1/files/, /v1/media/, attachments
- Only ?ft= signed tokens and Bearer header accepted for file access
- Reduce file token TTL from 1h to 5min
- Frontend: remove gateway token from all file URLs and imports
- Note: tokens invalidate on restart (acceptable for 5min TTL + WS reconnect)
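
A minimal sketch of a memory-only signing key, assuming lazy initialisation via sync.Once rather than explicit startup wiring. `fileSigningKey` is a lowercase stand-in for the FileSigningKey() accessor named above, not its actual implementation.

```go
package main

import (
	"crypto/rand"
	"sync"
)

var (
	signingKeyOnce sync.Once
	signingKey     []byte
)

// fileSigningKey returns a process-wide 256-bit key read from crypto/rand.
// The key lives only in memory, so every restart produces a new one.
func fileSigningKey() []byte {
	signingKeyOnce.Do(func() {
		k := make([]byte, 32) // 32 bytes = 256 bits
		if _, err := rand.Read(k); err != nil {
			panic("file signing key: " + err.Error())
		}
		signingKey = k
	})
	return signingKey
}
```

Since the key is never written to disk, tokens signed before a restart stop verifying afterwards, which matches the note about the 5min TTL.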

* fix(ui): use signed download_url for task attachments

- Add download_url to TeamTaskAttachment type
- Use a.download_url (server-signed ?ft=) instead of bare URL

* fix(security): tenant-scope team workspace paths + show user/tenant in topbar

- WorkspaceDir callers now use config.TenantWorkspace() to resolve
  tenant-scoped base dir (non-master tenants get workspace/tenants/{slug}/)
- Fixes: all tenants previously wrote to global /app/workspace/teams/
  without filesystem-level isolation
- Affected: loop.go (agent run), team_tasks_mutations.go (task creation)
- teams_workspace.go already correct (uses TenantTeamDir)
- UI topbar: show "userId (tenantName)" in user menu

* feat(ui): redesign task detail dialog with improved UX

- Split monolithic 343-line component into 5 focused files
- New header: subject as title, identifier + status badges above
- Metadata grid with soft bg-muted/30 background, priority icons
- Attachments: card-style with mime-type icons + proper download Button
- Description/Result: markdown rendering via MarkdownRenderer
- Comments: avatar circles + markdown rendering for content
- All sections collapsible with chevron + count badge
- Timeline: vertical dot-line pattern, collapsed by default
- Fix kanban card hover layout shift (opacity instead of display toggle)

* fix(security): tenant-scoped workspace paths and tool cache isolation

- Scope team workspace paths to tenant directory
- Add tenant isolation to tool cache and task reads
- Shell deny pattern improvements
- Agent resolver and context file tenant scoping
- Sidebar tenant/user display fix
- Add tests for workspace, boundary, and context file interceptor

* fix(ui): tenant visibility fallback, merge coming-soon, task detail tweaks

- Tenants page: try tenants.list (owner), fall back to tenants.mine
  for regular users; hide create button for non-owners
- Merge contacts dialog: add coming-soon banner (i18n en/vi/zh),
  disable form and submit button
- Task detail: collapse attachments/comments by default,
  guard download_url before rendering
2026-03-23 08:08:23 +07:00
viettranx 0d3230b2bf feat(cache): add build-tag-gated Redis cache backend
Add optional Redis cache support via `go build -tags redis`, following
the same paired-stub pattern as OTel and Tailscale. The Cache[V] interface
is unchanged; Redis and in-memory implementations are injected at startup
without altering usage logic.

- Add RedisCache[V] implementation with JSON serialization, fail-open on errors
- Add gateway_redis.go / gateway_redis_noop.go paired wiring files
- Refactor GroupWriterCache and ContextFileInterceptor to accept injected caches
- Add GOCLAW_REDIS_DSN env var, docker-compose.redis.yml overlay
- Update Dockerfile and GitHub Actions with ENABLE_REDIS build arg
- Add Redis variant to CI matrix (5 variants: latest, otel, tsnet, redis, full)
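
The fail-open behaviour with JSON serialization could look roughly like this. The log does not show the real Cache[V] interface or the Redis client, so the `Cache` methods, the `backend` interface, and `mapBackend` below are all hypothetical stand-ins.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Cache is a hypothetical minimal shape for the Cache[V] interface.
type Cache[V any] interface {
	Get(key string) (V, bool)
	Set(key string, v V)
}

// backend is a hypothetical byte store that can fail, standing in for Redis.
type backend interface {
	Load(key string) ([]byte, error)
	Store(key string, data []byte) error
}

// failOpenCache serializes values as JSON and treats every backend error as a miss.
type failOpenCache[V any] struct{ b backend }

func (c failOpenCache[V]) Get(key string) (V, bool) {
	var zero, v V
	data, err := c.b.Load(key)
	if err != nil || json.Unmarshal(data, &v) != nil {
		return zero, false // fail open: callers fall through to the source of truth
	}
	return v, true
}

func (c failOpenCache[V]) Set(key string, v V) {
	if data, err := json.Marshal(v); err == nil {
		_ = c.b.Store(key, data) // write errors are ignored on purpose
	}
}

// mapBackend is an in-memory backend with a switch to simulate an outage.
type mapBackend struct {
	data map[string][]byte
	down bool
}

func (m *mapBackend) Load(key string) ([]byte, error) {
	if d, ok := m.data[key]; ok && !m.down {
		return d, nil
	}
	return nil, errors.New("miss or backend down")
}

func (m *mapBackend) Store(key string, data []byte) error {
	if m.down {
		return errors.New("backend down")
	}
	m.data[key] = data
	return nil
}
```

In a paired-stub layout, a type like this would sit in a file guarded by `//go:build redis`, with a `//go:build !redis` counterpart supplying the in-memory wiring.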
2026-03-07 19:27:24 +07:00
Viet Tran 6895e369f6 refactor: remove standalone mode, consolidate to managed-only (PostgreSQL) (#70)
- Remove standalone mode code: file-based stores, standalone gateway,
  heartbeat service, SQLite memory, standalone docker-compose
- Rename docker-compose.managed.yml → docker-compose.postgres.yml
- Clean up ~130 Go comments referencing "managed mode" qualifier
- Simplify docker-compose.yml env vars (providers/channels via web UI)
- Update .env.example to essential vars only (token + encryption key)
- Add setup wizard UI (provider → agent → channel bootstrap flow)
- Add logs.tail WebSocket handler for live log streaming
- Add cursor-pointer to interactive UI components
- Clean up config page (remove standalone-only sections)
- Update README and docs for managed-only architecture
2026-03-06 18:51:11 +07:00