- Update lane name, env var, channel references across all docs - Mark delegate tool as removed in tools system docs - Update routing tables and architecture diagrams
18 KiB
00 - Architecture Overview
1. Overview
GoClaw is an AI agent gateway written in Go. It exposes a WebSocket RPC (v3) interface and an OpenAI-compatible HTTP API for orchestrating LLM-powered agents. The system uses PostgreSQL as its storage backend with full multi-tenant isolation, per-user context files, encrypted credentials, agent delegation, teams, and LLM call tracing.
2. Component Diagram
flowchart TD
subgraph Clients
WS[WebSocket Clients]
HTTP[HTTP Clients]
TG[Telegram]
DC[Discord]
FS[Feishu / Lark]
ZL[Zalo OA]
ZLP[Zalo Personal]
WA[WhatsApp]
SL[Slack]
end
subgraph Gateway["Gateway Server"]
WSS[WebSocket Server]
HTTPS[HTTP API Server]
MR[Method Router]
RL[Rate Limiter]
RBAC[Permission Engine]
end
subgraph Channels["Channel Manager"]
CM[Channel Manager]
PA[Pairing Service]
end
subgraph Core["Core Engine"]
BUS[Message Bus]
SCHED[Scheduler -- 4 Lanes]
AR[Agent Router]
LOOP[Agent Loop -- Think / Act / Observe]
end
subgraph Providers["LLM Providers"]
ANTH[Anthropic -- Native HTTP + SSE]
OAI["OpenAI-Compatible -- HTTP + SSE<br/>(OpenAI, Gemini, DeepSeek, DashScope, +8)"]
ACP["ACP -- JSON-RPC 2.0 stdio<br/>(Claude Code, Codex, Gemini CLI)"]
end
subgraph Tools["Tool Registry"]
FS_T[Filesystem]
EXEC[Exec / Shell]
WEB[Web Search / Fetch]
MEM[Memory]
SUB[Subagent]
DEL[Delegation]
TEAM_T[Teams]
TTS_T[TTS]
BROW[Browser]
SK[Skills]
MCP_T[MCP Bridge]
CT[Custom Tools]
end
subgraph Store["Store Layer"]
SESS[SessionStore]
AGENT_S[AgentStore]
PROV_S[ProviderStore]
CRON_S[CronStore]
MEM_S[MemoryStore]
SKILL_S[SkillStore]
TRACE_S[TracingStore]
MCP_S[MCPServerStore]
CT_S[CustomToolStore]
TM_S[TeamStore]
end
WS --> WSS
HTTP --> HTTPS
TG & DC & FS & ZL & ZLP & WA & SL --> CM
WSS --> MR
HTTPS --> MR
MR --> RL --> RBAC --> AR
CM --> BUS
BUS --> SCHED
SCHED --> AR
AR --> LOOP
LOOP --> Providers
LOOP --> Tools
Tools --> Store
LOOP --> Store
3. Module Map
| Module | Description |
|---|---|
internal/gateway/ |
WebSocket + HTTP server, client handling, method router |
internal/gateway/methods/ |
RPC method handlers: chat, agents, teams, delegations, sessions, config, skills, cron, pairing, exec approval, usage, send |
internal/agent/ |
Agent loop (think, act, observe), router, resolver, system prompt builder, sanitization, pruning, tracing, memory flush, DELEGATION.md + TEAM.md injection |
internal/providers/ |
LLM providers: Anthropic (native HTTP + SSE), OpenAI-compatible (HTTP + SSE, 12+ providers), DashScope (Qwen), ACP (JSON-RPC 2.0 subprocess), Claude CLI, Codex, extended thinking support, retry logic |
internal/providers/acp/ |
ACP protocol implementation: ProcessPool (subprocess lifecycle), ToolBridge (fs/terminal), session management |
internal/tools/ |
Tool registry, filesystem ops, exec/shell, policy engine, subagent, delegation manager, team tools, context file + memory interceptors, credential scrubbing, rate limiting, PathDenyable |
internal/tools/dynamic_loader.go |
Custom tool loader: LoadGlobal (startup), LoadForAgent (per-agent clone), ReloadGlobal (cache invalidation) |
internal/tools/dynamic_tool.go |
Custom tool executor: command template rendering, shell escaping, encrypted env vars |
internal/store/ |
Store interfaces: SessionStore, AgentStore, ProviderStore, SkillStore, MemoryStore, CronStore, PairingStore, TracingStore, MCPServerStore, TeamStore, ChannelInstanceStore, ConfigSecretsStore |
internal/store/pg/ |
PostgreSQL implementations (database/sql + pgx/v5) |
internal/bootstrap/ |
System prompt files (AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, BOOTSTRAP.md) + seeding + truncation |
internal/config/ |
Config loading (JSON5) + env var overlay |
internal/skills/ |
SKILL.md loader (5-tier hierarchy) + BM25 search + hot-reload via fsnotify |
internal/channels/ |
Channel manager + adapters: Telegram (forum topics, STT, bot commands), Feishu/Lark (streaming cards, media), Zalo OA, Zalo Personal, Discord, WhatsApp, Slack |
internal/mcp/ |
MCP server bridge (stdio, SSE, streamable-HTTP transports) |
internal/scheduler/ |
Lane-based concurrency control (main, subagent, cron, team lanes) with per-session serialization |
internal/memory/ |
Memory system (pgvector hybrid search) |
internal/permissions/ |
RBAC policy engine (admin, operator, viewer roles) |
internal/store/pg/pairing.go |
DM/device pairing service (8-character codes, database-backed) |
internal/sandbox/ |
Docker-based code execution sandbox |
internal/tts/ |
Text-to-Speech providers: OpenAI, ElevenLabs, Edge, MiniMax |
internal/http/ |
HTTP API handlers: /v1/chat/completions, /v1/agents, /v1/skills, /v1/traces, /v1/mcp, /v1/delegations, summoner |
internal/crypto/ |
AES-256-GCM encryption for API keys |
internal/tracing/ |
LLM call tracing (traces + spans), in-memory buffer with periodic store flush |
internal/tracing/otelexport/ |
Optional OpenTelemetry OTLP exporter (opt-in via build tags; adds gRPC + protobuf) |
internal/cache/ |
Caching layer for agent state and provider responses |
internal/bus/ |
Event pub/sub message bus for inter-component communication |
internal/knowledgegraph/ |
Knowledge graph storage and traversal |
internal/mcp/ |
Model Context Protocol bridge/server (stdio, SSE, streamable-HTTP) |
internal/media/ |
Media handling utilities |
internal/oauth/ |
OAuth authentication integration |
internal/sessions/ |
Session management and lifecycle |
internal/tasks/ |
Task management system |
internal/upgrade/ |
Database schema version tracking and migrations |
4. Multi-Tenant Identity Model
GoClaw uses the Identity Propagation pattern (also known as Trusted Subsystem). It does not implement authentication or authorization — instead, it trusts the upstream service that authenticates with the gateway token to provide accurate user identity.
flowchart LR
subgraph "Upstream Service (trusted)"
AUTH["Authenticate end-user"]
HDR["Set X-GoClaw-User-Id header<br/>or user_id in WS connect"]
end
subgraph "GoClaw Gateway"
EXTRACT["Extract user_id<br/>(opaque, VARCHAR 255)"]
CTX["store.WithUserID(ctx)"]
SCOPE["Per-user scoping:<br/>sessions, context files,<br/>memory, traces, agent shares"]
end
AUTH --> HDR
HDR --> EXTRACT
EXTRACT --> CTX
CTX --> SCOPE
Identity Flow
| Entry Point | How user_id is provided | Enforcement |
|---|---|---|
| HTTP API | X-GoClaw-User-Id header |
Required |
| WebSocket | user_id field in connect handshake |
Required |
| Channels | Derived from platform sender ID (e.g., Telegram user ID) | Automatic |
Compound User ID Convention
The user_id field is opaque to GoClaw — it does not interpret or validate the format. For multi-tenant deployments, the recommended convention is:
tenant.{tenantId}.user.{userId}
This hierarchical format ensures natural isolation between tenants. Since user_id is used as a scoping key across all per-user tables (user_context_files, user_agent_profiles, user_agent_overrides, agent_shares, sessions, traces), the compound format guarantees that users from different tenants cannot access each other's data.
Where user_id is used
| Component | Usage |
|---|---|
| Session keys | agent:{agentId}:{channel}:direct:{peerId} — peerId derived from user_id |
| Context files | user_context_files table scoped by (agent_id, user_id) |
| User profiles | user_agent_profiles table — first/last seen, workspace |
| User overrides | user_agent_overrides — per-user provider/model preferences |
| Agent shares | agent_shares table — user-level access control |
| Memory | Per-user memory entries via context propagation |
| Traces | traces table includes user_id for filtering |
| MCP grants | mcp_user_grants — per-user MCP server access |
| Skills grants | skill_user_grants — per-user skill access |
6. Gateway Startup Sequence
sequenceDiagram
participant CLI as CLI (cmd/root.go)
participant GW as runGateway()
participant PG as PostgreSQL
participant Engine as Core Engine
CLI->>GW: 1. Parse CLI flags + load config
GW->>GW: 2. Resolve workspace + data dirs
GW->>GW: 3. Create Message Bus
GW->>PG: 4. Connect to Postgres (pg.NewPGStores)
PG-->>GW: PG stores created
GW->>GW: 5. Start tracing collector
GW->>PG: 6. Register providers from DB
GW->>PG: 7. Wire embedding provider to PGMemoryStore
GW->>PG: 8. Backfill memory embeddings (background)
GW->>GW: 9. Register config-based providers
GW->>GW: 10. Create tool registry (filesystem, exec, web, memory, browser, TTS, subagent, MCP)
GW->>GW: 11. Load bootstrap files (DB)
GW->>GW: 12. Create skills loader + register skill_search tool
GW->>GW: 13. Wire skill embeddings
GW->>GW: 14. Create agents lazily (set ManagedResolver)
GW->>GW: 15. wireManagedExtras (interceptors, cache subscribers)
GW->>GW: 16. Wire managed HTTP handlers (agents, skills, traces, MCP)
GW->>Engine: 17. Create gateway server (WS + HTTP)
GW->>Engine: 18. Register RPC methods
GW->>Engine: 19. Register + start channels (Telegram, Discord, Feishu, Zalo, WhatsApp)
GW->>Engine: 20. Start cron, scheduler (4 lanes)
GW->>Engine: 21. Start skills watcher + inbound consumer
GW->>Engine: 22. Listen on host:port
7. Database Wiring
The wireManagedExtras() function in cmd/gateway_managed.go wires multi-tenant components:
flowchart TD
W1["1. ContextFileInterceptor<br/>Routes read_file / write_file to DB"] --> W2
W2["2. User Seeding Callback<br/>Seeds per-user context files on first chat"] --> W3
W3["3. Context File Loader<br/>Loads per-user vs agent-level files by agent_type"] --> W4
W4["4. ManagedResolver<br/>Lazy-creates agent Loops from DB on cache miss"] --> W5
W5["5. Virtual FS Interceptors<br/>Wire interceptors on read_file + write_file + memory tools"] --> W6
W6["6. Memory Store Wiring<br/>Wire PGMemoryStore on memory_search + memory_get tools"] --> W7
W7["7. Cache Invalidation Subscribers<br/>Subscribe to MessageBus events"] --> W8
W8["8. Delegation Tools<br/>DelegateManager + agent links"] --> W9
W9["9. Team Tools<br/>team_tasks + team_message + team auto-linking"] --> W10
W10["10. Team Mailbox<br/>team_message tool for peer communication"]
Cache Invalidation Events
| Event | Subscriber | Action |
|---|---|---|
cache:bootstrap |
ContextFileInterceptor | InvalidateAgent() or InvalidateAll() |
cache:agent |
AgentRouter | InvalidateAgent() -- forces re-resolve from DB |
cache:skills |
SkillStore | BumpVersion() |
cache:cron |
CronStore | InvalidateCache() |
cache:custom_tools |
DynamicToolLoader | ReloadGlobal() + AgentRouter.InvalidateAll() |
8. Scheduler Lanes
The scheduler uses a lane-based concurrency model. Each lane is a named worker pool with a bounded semaphore. Per-session queues control concurrency within each session.
flowchart TD
subgraph Main["Lane: main (concurrency 2)"]
M1[Channel messages]
M2[WebSocket requests]
end
subgraph Sub["Lane: subagent (concurrency 4)"]
S1[Subagent executions]
end
subgraph Del["Lane: team (concurrency 100)"]
D1[Delegation executions]
end
subgraph Cron["Lane: cron (concurrency 1)"]
C1[Cron job executions]
end
Main --> SEM1[Semaphore]
Sub --> SEM2[Semaphore]
Del --> SEM3[Semaphore]
Cron --> SEM4[Semaphore]
SEM1 --> Q[Per-Session Queue]
SEM2 --> Q
SEM3 --> Q
SEM4 --> Q
Q --> AGENT[Agent Loop]
Lane Defaults
| Lane | Concurrency | Env Override | Purpose |
|---|---|---|---|
main |
30 | GOCLAW_LANE_MAIN |
Primary user chat sessions |
subagent |
50 | GOCLAW_LANE_SUBAGENT |
Spawned subagents |
team |
100 | GOCLAW_LANE_TEAM |
Agent team/delegation executions |
cron |
30 | GOCLAW_LANE_CRON |
Scheduled cron jobs |
Session Queue Concurrency
Per-session queues now support configurable maxConcurrent:
- DMs:
maxConcurrent = 1(single-threaded per user) - Groups:
maxConcurrent = 3(multiple concurrent responses) - Adaptive throttle: When session history exceeds 60% of context window, concurrency drops to 1
Queue Modes
| Mode | Behavior |
|---|---|
queue |
FIFO -- new messages wait until the current run completes |
followup |
Merges incoming message into the pending queue as a follow-up |
interrupt |
Cancels the active run and replaces it with the new message |
Default queue config: capacity 10, drop policy old (drops oldest on overflow), debounce 800ms.
/stop and /stopall
/stop-- Cancel the oldest running task (others keep going)/stopall-- Cancel all running tasks + drain the queue
Both are intercepted before the debouncer to avoid being merged with normal messages.
9. Graceful Shutdown
When the process receives SIGINT or SIGTERM:
- Broadcast
shutdownevent to all connected WebSocket clients. channelMgr.StopAll()-- stop all channel adapters.cronStore.Stop()-- stop cron scheduler.sandboxMgr.Stop()+ReleaseAll()-- release Docker containers.cancel()-- cancel root context, propagating to consumer + scheduler.- Deferred cleanup: flush tracing collector, close memory store, close browser manager, stop scheduler lanes.
- HTTP server shutdown with a 5-second timeout (
context.WithTimeout).
10. Config System
Configuration is loaded from a JSON5 file with environment variable overlay. Secrets are never persisted to the config file.
flowchart TD
A{Config path?} -->|--config flag| B[CLI flag path]
A -->|GOCLAW_CONFIG env| C[Env var path]
A -->|default| D["config.json"]
B & C & D --> LOAD["config.Load()"]
LOAD --> S1["1. Set defaults"]
S1 --> S2["2. Parse JSON5"]
S2 --> S3["3. Env var overlay<br/>(GOCLAW_*_API_KEY)"]
S3 --> S4["4. Apply computed defaults<br/>(context pruning, etc.)"]
S4 --> READY[Config ready]
Key Config Sections
| Section | Purpose |
|---|---|
gateway |
host, port, token, allowed_origins, rate_limit_rpm, max_message_chars |
agents |
defaults (provider, model, context_window) + list (per-agent overrides) |
tools |
profile, allow/deny lists, exec_approval, web, browser, mcp_servers, rate_limit_per_hour |
channels |
Per-channel: enabled, token, dm_policy, group_policy, allow_from |
database |
postgres_dsn read only from env var |
Secret Handling
- Secrets exist only in env vars or
.env.local-- never inconfig.json. GOCLAW_POSTGRES_DSNis taggedjson:"-"and cannot be read from the config file.MaskedCopy()replaces API keys with"***"when returning config over WebSocket.StripSecrets()removes secrets before writing config to disk.- Config hot-reload via
fsnotifywatcher with 300ms debounce.
11. File Reference
| File | Purpose |
|---|---|
cmd/root.go |
Cobra CLI entry point, flag parsing |
cmd/gateway.go |
Gateway startup orchestrator (runGateway()) |
cmd/gateway_managed.go |
Database wiring (wireManagedExtras(), wireManagedHTTP()) |
cmd/gateway_callbacks.go |
Shared callbacks (user seeding, context file loading) |
cmd/gateway_consumer.go |
Inbound message consumer (subagent, teammate routing) |
cmd/gateway_providers.go |
Provider registration (config-based + DB-based) |
cmd/gateway_methods.go |
RPC method registration |
internal/config/config.go |
Config struct definitions |
internal/config/config_load.go |
JSON5 loading + env overlay |
internal/config/config_channels.go |
Channel config structs |
internal/gateway/server.go |
WS + HTTP server, CORS, rate limiter setup |
internal/gateway/client.go |
WebSocket client handling, read limit (512KB) |
internal/gateway/router.go |
RPC method routing |
internal/scheduler/lanes.go |
Lane definitions, semaphore-based concurrency |
internal/scheduler/queue.go |
Per-session queue, queue modes, debounce |
internal/store/stores.go |
Stores container struct (all 22+ store interfaces) |
internal/store/types.go |
StoreConfig, BaseModel |
Cross-References
| Document | Content |
|---|---|
| 01-agent-loop.md | Agent loop detail, sanitization pipeline, history management |
| 02-providers.md | LLM providers, retry logic, schema cleaning |
| 03-tools-system.md | Tool registry, policy engine, interceptors, custom tools, MCP grants |
| 04-gateway-protocol.md | WebSocket protocol v3, HTTP API, RBAC, identity propagation |
| 05-channels-messaging.md | Channel adapters, Telegram formatting, pairing, per-user scoping |
| 06-store-data-model.md | Store interfaces, PostgreSQL schema, session caching, custom tool store |
| 07-bootstrap-skills-memory.md | Bootstrap files, skills system, memory, skills grants |
| 08-scheduling-cron.md | Scheduler lanes, cron lifecycle |
| 09-security.md | Defense layers, encryption, rate limiting, RBAC, sandbox |
| 10-tracing-observability.md | Tracing collector, span hierarchy, OTel export, trace API |
| 11-agent-teams.md | Agent teams, task board, mailbox, delegation integration |
| 12-extended-thinking.md | Extended thinking, per-provider support, streaming |