mirror of https://github.com/tiennm99/goclaw.git synced 2026-06-11 16:11:18 +00:00

Thieu Nguyen 8ad580521d refactor: deprecate standalone mode, managed mode is now default (#126)

* refactor: remove managed/standalone mode distinction from codebase

Standalone mode is deprecated; managed mode is now the only mode.
Remove redundant "managed mode" qualifiers from comments, docs,
and error messages. Error strings now reference "database stores"
instead of "managed mode" for clarity.

* improve(onboard): streamline onboard process and env setup

Simplify onboard wizard, extract helpers to dedicated file,
update env example and entrypoint for default managed mode,
clean up prepare-env script, update i18n catalogs.

2026-03-11 07:27:38 +07:00

16 KiB

09 - Security

Defense in depth across five layers, from transport to isolation. The layers operate independently: if one is bypassed, the remaining layers continue to protect the system.

AES-256-GCM encryption protects secrets stored in PostgreSQL (LLM provider API keys, MCP server API keys, custom tool environment variables). Agent-level access control uses the 4-step CanAccess pipeline (see 06-store-data-model.md).

1. Five Defense Layers

flowchart TD
    REQ["Request"] --> L1["Layer 1: Transport<br/>CORS, message size limits, timing-safe auth"]
    L1 --> L2["Layer 2: Input<br/>Injection detection (6 patterns), message truncation"]
    L2 --> L3["Layer 3: Tool<br/>Shell deny patterns, path traversal, SSRF, exec approval"]
    L3 --> L4["Layer 4: Output<br/>Credential scrubbing, content wrapping"]
    L4 --> L5["Layer 5: Isolation<br/>Workspace isolation, Docker sandbox, read-only FS"]

Layer 1: Transport Security

Mechanism	Detail
CORS (WebSocket)	`checkOrigin()` validates against `allowed_origins` (empty = allow all for backward compatibility)
WS message limit	`SetReadLimit(512KB)` -- gorilla auto-closes connection on exceed
HTTP body limit	`MaxBytesReader(1MB)` -- error returned before JSON decode
Token auth	`crypto/subtle.ConstantTimeCompare` (timing-safe)
Rate limiting	Token bucket per user/IP, configurable via `rate_limit_rpm`
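
The token comparison and HTTP body limit from the table can be sketched as follows. This is a minimal illustration, not the gateway's actual code: `tokenMatches`, `limitBody`, and `maxBodyBytes` are hypothetical names, and hashing both tokens before comparing is an added assumption so that inputs of different lengths are still compared in constant time.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"net/http"
)

// maxBodyBytes mirrors the 1 MB HTTP body limit from the table (name is hypothetical).
const maxBodyBytes = 1 << 20

// tokenMatches compares two tokens in constant time. Hashing first gives both
// inputs the same length, so the comparison does not leak the token length.
func tokenMatches(presented, expected string) bool {
	p := sha256.Sum256([]byte(presented))
	e := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(p[:], e[:]) == 1
}

// limitBody (hypothetical middleware) caps the request body so an oversized
// payload fails while being read, before any JSON decoding completes.
func limitBody(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
		next.ServeHTTP(w, r)
	})
}
```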

Layer 2: Input -- Injection Detection

The input guard scans for 6 injection patterns.

Pattern	Detection Target
`ignore_instructions`	"ignore all previous instructions"
`role_override`	"you are now...", "pretend you are..."
`system_tags`	`<system>`, `[SYSTEM]`, `[INST]`, `<<SYS>>`
`instruction_injection`	"new instructions:", "override:", "system prompt:"
`null_bytes`	Null characters `\x00` (obfuscation attempts)
`delimiter_escape`	"end of system", `</instructions>`, `</prompt>`

Configurable action (gateway.injection_action):

Value	Behavior
`"log"`	Log info level, continue processing
`"warn"` (default)	Log warning level, continue processing
`"block"`	Log warning, return error, stop processing
`"off"`	Disable detection entirely

Message truncation: Messages exceeding max_message_chars (default 32K) are truncated (not rejected), and the LLM is notified of the truncation.
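
A rough sketch of how pattern scanning, the `block` action, and truncation could fit together. The regexes are illustrative guesses based on the detection targets listed above, not the expressions used in `input_guard.go`; `scanMessage`, `shouldBlock`, and `truncateMessage` are hypothetical names, and counting the limit in runes rather than bytes is an assumption.

```go
package main

import (
	"regexp"
	"strings"
)

type injectionPattern struct {
	name string
	re   *regexp.Regexp
}

// Illustrative regexes only; the real pattern set may differ.
var injectionPatterns = []injectionPattern{
	{"ignore_instructions", regexp.MustCompile(`(?i)ignore\s+(all\s+)?previous\s+instructions`)},
	{"role_override", regexp.MustCompile(`(?i)\b(you are now|pretend you are)\b`)},
	{"system_tags", regexp.MustCompile(`(?i)<system>|\[SYSTEM\]|\[INST\]|<<SYS>>`)},
	{"delimiter_escape", regexp.MustCompile(`(?i)end of system|</instructions>|</prompt>`)},
}

// scanMessage returns the names of all patterns that match msg.
func scanMessage(msg string) []string {
	var hits []string
	for _, p := range injectionPatterns {
		if p.re.MatchString(msg) {
			hits = append(hits, p.name)
		}
	}
	if strings.ContainsRune(msg, 0) { // null byte used for obfuscation
		hits = append(hits, "null_bytes")
	}
	return hits
}

// shouldBlock applies the configured action: only "block" stops processing.
func shouldBlock(action string, hits []string) bool {
	return action == "block" && len(hits) > 0
}

// truncateMessage cuts msg to maxChars runes and reports whether it was cut,
// so the caller can tell the LLM about the truncation.
func truncateMessage(msg string, maxChars int) (string, bool) {
	r := []rune(msg)
	if len(r) <= maxChars {
		return msg, false
	}
	return string(r[:maxChars]), true
}
```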

Layer 3: Tool Security

Shell deny patterns -- 7 categories of blocked commands:

Category	Examples
Destructive file ops	`rm -rf`, `del /f`, `rmdir /s`
Destructive disk ops	`mkfs`, `dd if=`, `> /dev/sd*`
System commands	`shutdown`, `reboot`, `poweroff`
Fork bombs	`:(){ ... };:`
Remote code execution	`curl | sh`, `wget -O - | sh`
Reverse shells	`/dev/tcp/`, `nc -e`
Eval injection	`eval $()`, `base64 -d | sh`
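
A deny list of this kind is typically a set of regexes checked before execution. The sketch below covers a few of the categories; the expressions and the name `isDeniedCommand` are assumptions for illustration and are not taken from `shell.go`. Pattern matching like this is a coarse filter, which is why the other layers still apply.

```go
package main

import "regexp"

// Illustrative deny patterns for a subset of the categories above.
var denyPatterns = []*regexp.Regexp{
	regexp.MustCompile(`\brm\s+-[a-zA-Z]*r[a-zA-Z]*f`),       // destructive file ops: rm -rf
	regexp.MustCompile(`\b(mkfs|shutdown|reboot|poweroff)\b`), // disk and system commands
	regexp.MustCompile(`\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b`),  // remote code execution: curl | sh
	regexp.MustCompile(`/dev/tcp/`),                           // reverse shells
}

// isDeniedCommand reports whether cmd matches any deny pattern.
func isDeniedCommand(cmd string) bool {
	for _, re := range denyPatterns {
		if re.MatchString(cmd) {
			return true
		}
	}
	return false
}
```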

SSRF protection -- 3-step validation:

flowchart TD
    URL["URL to fetch"] --> S1["Step 1: Check blocked hostnames<br/>localhost, *.local, *.internal,<br/>metadata.google.internal"]
    S1 --> S2["Step 2: Check private IP ranges<br/>10.0.0.0/8, 172.16.0.0/12,<br/>192.168.0.0/16, 127.0.0.0/8,<br/>169.254.0.0/16, IPv6 loopback/link-local"]
    S2 --> S3["Step 3: DNS Pinning<br/>Resolve domain, check every resolved IP.<br/>Also applied to redirect targets."]
    S3 --> ALLOW["Allow request"]
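
The three steps map fairly directly onto the standard `net` package. The sketch below is an assumption about structure, not the code in `web_fetch.go`: `blockedHost`, `blockedIP`, and `checkHost` are hypothetical names, and the resolver is passed in (its signature matches `net.LookupHost`) so the caller can then connect to the returned addresses rather than resolving again.

```go
package main

import (
	"errors"
	"net"
	"strings"
)

var errSSRFBlocked = errors.New("ssrf: destination blocked")

// blockedHost implements step 1: hostname deny list.
func blockedHost(host string) bool {
	h := strings.ToLower(strings.TrimSuffix(host, "."))
	return h == "localhost" || strings.HasSuffix(h, ".local") || strings.HasSuffix(h, ".internal")
}

// blockedIP implements step 2: loopback, RFC 1918 private, and link-local
// ranges (IPv4 and IPv6). IsPrivate requires Go 1.17 or later.
func blockedIP(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast()
}

// checkHost runs steps 1-3 and returns the vetted addresses to connect to.
// resolve has the same signature as net.LookupHost.
func checkHost(host string, resolve func(string) ([]string, error)) ([]string, error) {
	if blockedHost(host) {
		return nil, errSSRFBlocked
	}
	if ip := net.ParseIP(host); ip != nil { // IP literal: no DNS needed
		if blockedIP(ip) {
			return nil, errSSRFBlocked
		}
		return []string{host}, nil
	}
	addrs, err := resolve(host)
	if err != nil {
		return nil, err
	}
	for _, a := range addrs { // step 3: every resolved IP must pass
		ip := net.ParseIP(a)
		if ip == nil || blockedIP(ip) {
			return nil, errSSRFBlocked
		}
	}
	return addrs, nil
}
```

The same check would be repeated for each redirect target.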

Path traversal: resolvePath() applies filepath.Clean() then HasPrefix() to ensure all paths stay within the workspace. With restrict = true, any path outside the workspace is blocked.
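
A minimal sketch of a clean-then-prefix check, assuming the restricted mode. `resolveInWorkspace` is a hypothetical stand-in for `resolvePath()`; appending the path separator before the prefix comparison is an added precaution so that a sibling such as `/ws-evil` is not treated as being inside `/ws`. Symlinks are not resolved here.

```go
package main

import (
	"errors"
	"path/filepath"
	"strings"
)

var errOutsideWorkspace = errors.New("path escapes workspace")

// resolveInWorkspace joins p onto the workspace root, normalizes it, and
// rejects any result that is not the root itself or below it.
func resolveInWorkspace(workspace, p string) (string, error) {
	root := filepath.Clean(workspace)
	full := p
	if !filepath.IsAbs(p) {
		full = filepath.Join(root, p)
	}
	full = filepath.Clean(full)
	if full != root && !strings.HasPrefix(full, root+string(filepath.Separator)) {
		return "", errOutsideWorkspace
	}
	return full, nil
}
```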

PathDenyable -- An interface that lets filesystem tools reject specific path prefixes:

type PathDenyable interface {
    DenyPaths(...string)
}

All four filesystem tools (read_file, write_file, list_files, edit) implement PathDenyable. The agent loop calls DenyPaths(".goclaw") at startup to prevent agents from accessing internal data directories. list_files additionally filters denied directories from output entirely -- the agent does not see denied paths in directory listings.
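
As an illustration, a tool could implement the interface along these lines. `listTool`, `isDenied`, and `filterEntries` are hypothetical; only `PathDenyable` and `DenyPaths` come from the text above, and the prefix-matching rule is an assumption.

```go
package main

import (
	"path/filepath"
	"strings"
)

type PathDenyable interface {
	DenyPaths(...string)
}

// listTool is a hypothetical filesystem tool holding denied path prefixes.
type listTool struct {
	denied []string
}

// DenyPaths registers prefixes (relative to the workspace) to reject.
func (lt *listTool) DenyPaths(prefixes ...string) {
	lt.denied = append(lt.denied, prefixes...)
}

// isDenied reports whether rel equals a denied prefix or lies below it.
func (lt *listTool) isDenied(rel string) bool {
	rel = filepath.ToSlash(filepath.Clean(rel))
	for _, d := range lt.denied {
		if rel == d || strings.HasPrefix(rel, d+"/") {
			return true
		}
	}
	return false
}

// filterEntries drops denied entries so they never appear in a listing.
func (lt *listTool) filterEntries(entries []string) []string {
	var visible []string
	for _, e := range entries {
		if !lt.isDenied(e) {
			visible = append(visible, e)
		}
	}
	return visible
}
```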

Layer 4: Output Security

Mechanism	Detail
Credential scrubbing	Static regex detection of: OpenAI (`sk-...`), Anthropic (`sk-ant-...`), GitHub (`ghp_/gho_/ghu_/ghs_/ghr_`), AWS (`AKIA...`), generic key-value patterns, connection strings (`postgres://`, `mysql://`), env var patterns (`KEY=`, `SECRET=`, `DSN=`), long hex strings (64+ chars). All replaced with `[REDACTED]`.
Dynamic credential scrubbing	Runtime-registered values (e.g., server IPs) scrubbed alongside static patterns via `AddDynamicScrubValues()`
Web content wrapping	Fetched content wrapped in `<<<EXTERNAL_UNTRUSTED_CONTENT>>>` tags with security warning
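
Static and dynamic scrubbing can be combined in one pass, roughly as below. The regexes cover only some of the listed credential types and are illustrative guesses; `scrubber`, `addDynamic`, and `scrub` are hypothetical names (the real entry point for dynamic values is `AddDynamicScrubValues()`), and this sketch is not safe for concurrent registration.

```go
package main

import (
	"regexp"
	"strings"
)

const redacted = "[REDACTED]"

// Illustrative static patterns for a subset of the credential types above.
var staticPatterns = []*regexp.Regexp{
	regexp.MustCompile(`sk-[A-Za-z0-9_-]{20,}`),      // OpenAI / Anthropic style keys
	regexp.MustCompile(`gh[pousr]_[A-Za-z0-9]{36,}`), // GitHub tokens
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),           // AWS access key IDs
	regexp.MustCompile(`(postgres|mysql)://\S+`),     // connection strings
}

// scrubber holds runtime-registered literal values in addition to the regexes.
type scrubber struct {
	dynamic []string
}

// addDynamic registers literal values (for example server IPs) to redact.
func (s *scrubber) addDynamic(values ...string) {
	for _, v := range values {
		if v != "" { // an empty value would match everywhere
			s.dynamic = append(s.dynamic, v)
		}
	}
}

// scrub replaces every static match and every dynamic value with [REDACTED].
func (s *scrubber) scrub(text string) string {
	for _, re := range staticPatterns {
		text = re.ReplaceAllString(text, redacted)
	}
	for _, v := range s.dynamic {
		text = strings.ReplaceAll(text, v, redacted)
	}
	return text
}
```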

Layer 5: Isolation

Per-user workspace isolation -- Two levels prevent cross-user file access:

Level	Scope	Directory Pattern
Per-agent	Each agent gets its own base directory	`~/.goclaw/{agent-key}-workspace/`
Per-user	Each user gets a subdirectory within the agent workspace	`{agent-workspace}/user_{sanitized_id}/`

The workspace is injected into tools via WithToolWorkspace(ctx) context injection. Tools read the workspace from context at execution time (fallback to the struct field for backward compatibility). User IDs are sanitized: anything outside [a-zA-Z0-9_-] becomes an underscore (group:telegram:-1001234 → group_telegram_-1001234).
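
The sanitization rule and directory pattern can be expressed in a few lines. `sanitizeUserID` and `userWorkspace` are hypothetical names. Note that under this rule distinct IDs can map to the same directory name (`a:b` and `a/b` both become `a_b`); whether the real code guards against that is not stated here.

```go
package main

import (
	"path/filepath"
	"regexp"
)

// unsafeIDChars matches anything outside [a-zA-Z0-9_-].
var unsafeIDChars = regexp.MustCompile(`[^a-zA-Z0-9_-]`)

// sanitizeUserID replaces every disallowed character with an underscore.
func sanitizeUserID(id string) string {
	return unsafeIDChars.ReplaceAllString(id, "_")
}

// userWorkspace builds {agent-workspace}/user_{sanitized_id}.
func userWorkspace(agentWorkspace, userID string) string {
	return filepath.Join(agentWorkspace, "user_"+sanitizeUserID(userID))
}
```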

Docker sandbox -- Container-based isolation for shell command execution:

Hardening	Configuration
Read-only root FS	`--read-only`
Drop all capabilities	`--cap-drop ALL`
No new privileges	`--security-opt no-new-privileges`
Memory limit	512 MB
CPU limit	1.0
PID limit	Enabled
Network disabled	`--network none`
Tmpfs mounts	`/tmp`, `/var/tmp`, `/run`
Output limit	1 MB
Timeout	300 seconds
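
For orientation, the hardening table corresponds to a `docker run` argument list along these lines. This is a sketch: `sandboxRunArgs` is a hypothetical helper, the image name and PID limit are parameters because the table does not give their values, and the workspace mount is omitted.

```go
package main

import "strconv"

// sandboxRunArgs builds docker CLI arguments for a hardened container.
// image and pidsLimit are caller-supplied; their real defaults are not known here.
func sandboxRunArgs(image string, pidsLimit int) []string {
	return []string{
		"run", "-d",
		"--read-only",                         // read-only root FS
		"--cap-drop", "ALL",                   // drop all capabilities
		"--security-opt", "no-new-privileges", // no privilege escalation
		"--memory", "512m",
		"--cpus", "1.0",
		"--pids-limit", strconv.Itoa(pidsLimit),
		"--network", "none",
		"--tmpfs", "/tmp",
		"--tmpfs", "/var/tmp",
		"--tmpfs", "/run",
		image,
	}
}
```

The output limit (1 MB) and the 300-second timeout are enforced by the caller around `docker exec`, not by these flags.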

2. Encryption

AES-256-GCM encryption for secrets stored in PostgreSQL. Key provided via GOCLAW_ENCRYPTION_KEY environment variable.

What's Encrypted	Table	Column
LLM provider API keys	`llm_providers`	`api_key`
MCP server API keys	`mcp_servers`	`api_key`
Custom tool env vars	`custom_tools`	`env`

Format: "aes-gcm:" + base64(12-byte nonce + ciphertext + GCM tag)

Backward compatible: values without the aes-gcm: prefix are returned as plaintext (for migration from unencrypted data).
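
The storage format and the plaintext fallback can be sketched with the standard library as follows. `encryptSecret` and `decryptSecret` are hypothetical names, and the key is assumed to be 32 raw bytes; how `GOCLAW_ENCRYPTION_KEY` is decoded into key bytes is not specified in this document.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"strings"
)

const encPrefix = "aes-gcm:"

func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 { // 32 bytes selects AES-256
		return nil, errors.New("encryption key must be 32 bytes")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptSecret returns "aes-gcm:" + base64(nonce + ciphertext + tag).
func encryptSecret(key []byte, plaintext string) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	// Seal appends ciphertext and tag to its first argument (the nonce).
	sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil)
	return encPrefix + base64.StdEncoding.EncodeToString(sealed), nil
}

// decryptSecret returns values without the prefix unchanged (legacy plaintext).
func decryptSecret(key []byte, stored string) (string, error) {
	if !strings.HasPrefix(stored, encPrefix) {
		return stored, nil
	}
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	raw, err := base64.StdEncoding.DecodeString(strings.TrimPrefix(stored, encPrefix))
	if err != nil {
		return "", err
	}
	if len(raw) < gcm.NonceSize() {
		return "", errors.New("ciphertext too short")
	}
	nonce, ct := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
	pt, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return "", err
	}
	return string(pt), nil
}
```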

3. Rate Limiting -- Gateway + Tool

Protection at two levels: gateway-wide (per user/IP) and tool-level (per session).

flowchart TD
    subgraph "Gateway Level"
        GW_REQ["Request"] --> GW_CHECK{"rate_limit_rpm > 0?"}
        GW_CHECK -->|No| GW_PASS["Allow all"]
        GW_CHECK -->|Yes| GW_BUCKET{"Token bucket<br/>has capacity?"}
        GW_BUCKET -->|Available| GW_ALLOW["Allow + consume token"]
        GW_BUCKET -->|Exhausted| GW_REJECT["WS: INVALID_REQUEST error<br/>HTTP: 429 + Retry-After header"]
    end

    subgraph "Tool Level"
        TL_REQ["Tool call"] --> TL_CHECK{"Entries in<br/>last 1 hour?"}
        TL_CHECK -->|">= maxPerHour"| TL_REJECT["Error: rate limit exceeded"]
        TL_CHECK -->|"< maxPerHour"| TL_ALLOW["Record + allow"]
    end

Level	Algorithm	Key	Burst	Cleanup
Gateway	Token bucket	user/IP	5	Every 5 min (inactive > 10 min)
Tool	Sliding window	`agent:userID`	N/A	Manual `Cleanup()`

Gateway rate limiting applies to both WebSocket (chat.send) and HTTP (/v1/chat/completions) chat endpoints. Config: gateway.rate_limit_rpm (0 = disabled, any positive value = enabled).
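
The two algorithms in the table differ mainly in what they store per key. The following sketch shows both under simplifying assumptions: `gatewayLimiter` and `toolLimiter` are hypothetical types, the periodic cleanup of inactive buckets is omitted, and the refill rate is derived directly from `rate_limit_rpm`.

```go
package main

import (
	"math"
	"sync"
	"time"
)

type bucket struct {
	tokens float64
	last   time.Time
}

// gatewayLimiter is a per-key token bucket refilled at rpm tokens per minute.
type gatewayLimiter struct {
	mu      sync.Mutex
	rpm     float64
	burst   float64
	buckets map[string]*bucket
}

func newGatewayLimiter(rpm, burst int) *gatewayLimiter {
	return &gatewayLimiter{rpm: float64(rpm), burst: float64(burst), buckets: map[string]*bucket{}}
}

// allow consumes one token for key; rpm <= 0 disables limiting.
func (l *gatewayLimiter) allow(key string) bool {
	if l.rpm <= 0 {
		return true
	}
	now := time.Now()
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[key] = b
	}
	b.tokens = math.Min(l.burst, b.tokens+now.Sub(b.last).Minutes()*l.rpm)
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// toolLimiter is a sliding window: it keeps call timestamps from the last hour.
type toolLimiter struct {
	mu         sync.Mutex
	maxPerHour int
	calls      map[string][]time.Time
}

func newToolLimiter(maxPerHour int) *toolLimiter {
	return &toolLimiter{maxPerHour: maxPerHour, calls: map[string][]time.Time{}}
}

// allow records the call unless the key already has maxPerHour recent entries.
func (l *toolLimiter) allow(key string) bool {
	now := time.Now()
	cutoff := now.Add(-time.Hour)
	l.mu.Lock()
	defer l.mu.Unlock()
	recent := l.calls[key][:0]
	for _, ts := range l.calls[key] {
		if ts.After(cutoff) {
			recent = append(recent, ts)
		}
	}
	if len(recent) >= l.maxPerHour {
		l.calls[key] = recent
		return false
	}
	l.calls[key] = append(recent, now)
	return true
}
```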

4. RBAC -- 3 Roles

Role-based access control for WebSocket RPC methods and HTTP API endpoints. Roles are hierarchical: higher levels include all permissions of lower levels.

flowchart LR
    V["Viewer (level 1)<br/>Read-only access"] --> O["Operator (level 2)<br/>Read + Write"]
    O --> A["Admin (level 3)<br/>Full control"]

Role	Key Permissions
Viewer	agents.list, config.get, sessions.list, health, status, skills.list
Operator	+ chat.send, chat.abort, sessions.delete/reset, cron.*, skills.update
Admin	+ config.apply/patch, agents.create/update/delete, channels.toggle, device.pair.approve/revoke

Access Check Flow

flowchart TD
    REQ["Method call"] --> S1["Step 1: MethodRole(method)<br/>Determine minimum required role"]
    S1 --> S2{"Step 2: roleLevel(user) >= roleLevel(required)?"}
    S2 -->|Yes| ALLOW["Allow"]
    S2 -->|No| DENY["Deny"]
    S2 --> S3["Step 3 (optional):<br/>CanAccessWithScopes() for tokens<br/>with narrow scope restrictions"]

Token-based role assignment happens during the WebSocket connect handshake. Scopes include: operator.admin, operator.read, operator.write, operator.approvals, operator.pairing.
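
Steps 1 and 2 of the access check reduce to a level comparison. In this sketch the method table holds only a few entries from the permissions table, and defaulting unknown methods to Admin is an assumption, not documented behavior; `canAccess` and `methodRoles` are hypothetical names, and the scope check from step 3 is not shown.

```go
package main

type Role string

const (
	Viewer   Role = "viewer"
	Operator Role = "operator"
	Admin    Role = "admin"
)

// roleLevel maps a role to its hierarchy level; unknown roles get 0.
func roleLevel(r Role) int {
	switch r {
	case Viewer:
		return 1
	case Operator:
		return 2
	case Admin:
		return 3
	}
	return 0
}

// methodRoles lists the minimum role per method (partial, for illustration).
var methodRoles = map[string]Role{
	"agents.list":  Viewer,
	"config.get":   Viewer,
	"chat.send":    Operator,
	"config.apply": Admin,
}

// methodRole returns the minimum role; unlisted methods require Admin (assumption).
func methodRole(method string) Role {
	if r, ok := methodRoles[method]; ok {
		return r
	}
	return Admin
}

// canAccess implements step 2: the user's level must reach the required level.
func canAccess(user Role, method string) bool {
	return roleLevel(user) >= roleLevel(methodRole(method))
}
```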

5. Sandbox -- Container Lifecycle

Docker-based code isolation for shell command execution.

flowchart TD
    REQ["Exec request"] --> CHECK{"ShouldSandbox?"}
    CHECK -->|off| HOST["Execute on host<br/>timeout: 60s"]
    CHECK -->|non-main / all| SCOPE["ResolveScopeKey()"]
    SCOPE --> GET["DockerManager.Get(scopeKey)"]
    GET --> EXISTS{"Container exists?"}
    EXISTS -->|Yes| REUSE["Reuse existing container"]
    EXISTS -->|No| CREATE["docker run -d<br/>+ security flags<br/>+ resource limits<br/>+ workspace mount"]
    REUSE --> EXEC["docker exec sh -c [cmd]<br/>timeout: 300s"]
    CREATE --> EXEC
    EXEC --> RESULT["ExecResult{ExitCode, Stdout, Stderr}"]

Sandbox Modes

Mode	Behavior
`off` (default)	Execute directly on host
`non-main`	Sandbox all agents except main/default
`all`	Sandbox every agent

Container Scope

Scope	Reuse Level	Scope Key
`session` (default)	One container per session	sessionKey
`agent`	Shared across sessions for the same agent	`"agent:" + agentID`
`shared`	One container for all agents	`"shared"`
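
The mode and scope tables translate into two small decision functions. These are hypothetical stand-ins for `ShouldSandbox` and `ResolveScopeKey()`; the real signatures are not shown in this document.

```go
package main

// shouldSandbox decides whether an exec request runs in a container.
func shouldSandbox(mode string, isMainAgent bool) bool {
	switch mode {
	case "all":
		return true
	case "non-main":
		return !isMainAgent
	default: // "off"
		return false
	}
}

// resolveScopeKey picks the container reuse key for the configured scope.
func resolveScopeKey(scope, sessionKey, agentID string) string {
	switch scope {
	case "agent":
		return "agent:" + agentID
	case "shared":
		return "shared"
	default: // "session"
		return sessionKey
	}
}
```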

Workspace Access

Mode	Mount
`none`	No workspace access
`ro`	Read-only mount
`rw`	Read-write mount

Auto-Pruning

Parameter	Default	Action
`idle_hours`	24	Remove containers idle for more than 24 hours
`max_age_days`	7	Remove containers older than 7 days
`prune_interval_min`	5	Check every 5 minutes

FsBridge -- File Operations in Sandbox

Operation	Docker Command
ReadFile	`docker exec [id] cat -- [path]`
WriteFile	`docker exec -i [id] sh -c 'cat > [path]'`
ListDir	`docker exec [id] ls -la -- [path]`
Stat	`docker exec [id] stat -- [path]`

6. Security Logging Convention

All security events use slog.Warn with a security.* prefix for consistent filtering and alerting.

Event	Meaning
`security.injection_detected`	Prompt injection pattern detected
`security.injection_blocked`	Message blocked due to injection (when action = block)
`security.rate_limited`	Request rejected due to rate limit
`security.cors_rejected`	WebSocket connection rejected due to CORS policy
`security.message_truncated`	Message truncated because it exceeded the size limit

Filter all security events by grepping for the security. prefix in log output.
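
With `log/slog` (Go 1.21 or later), an event following this convention could be emitted as shown below. `renderSecurityEvent` is a hypothetical demo helper that writes one event to a string through a text handler, with the timestamp dropped so the line is reproducible; the attribute names are assumptions.

```go
package main

import (
	"log/slog"
	"strings"
)

// dropTime removes the top-level time attribute so output is deterministic.
func dropTime(groups []string, a slog.Attr) slog.Attr {
	if len(groups) == 0 && a.Key == slog.TimeKey {
		return slog.Attr{}
	}
	return a
}

// renderSecurityEvent logs one security.* event at Warn level and returns the line.
func renderSecurityEvent(event string, attrs ...any) string {
	var sb strings.Builder
	h := slog.NewTextHandler(&sb, &slog.HandlerOptions{ReplaceAttr: dropTime})
	slog.New(h).Warn(event, attrs...)
	return sb.String()
}
```

In production the logger would write to the normal log sink, and a filter such as `grep 'security\.'` selects these lines.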

7. Hook Recursion Prevention

The hook system (quality gates) can trigger infinite recursion: an agent evaluator delegates to a reviewer → delegation completes → fires quality gate → delegates to reviewer again → infinite loop.

A context flag hooks.WithSkipHooks(ctx, true) prevents this. Three injection points set the flag:

Injection Point	Why
Agent evaluator	Delegating to the reviewer for quality checks must not re-trigger gates
Evaluate-optimize loop	All internal generator/evaluator delegations skip gates
Agent eval callback (cmd layer)	When the hook engine itself triggers delegation

DelegateManager.Delegate() checks hooks.SkipHooksFromContext(ctx) before applying quality gates. If the flag is set, gates are skipped entirely.
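
The flag is an ordinary context value. A minimal sketch of the pattern, assuming an unexported key type: `WithSkipHooks` and `SkipHooksFromContext` are named in the text, while `delegate` and `gateRuns` are hypothetical helpers standing in for `DelegateManager.Delegate()` and a caller.

```go
package main

import "context"

// skipHooksKey is an unexported key type, so other packages cannot collide with it.
type skipHooksKey struct{}

// WithSkipHooks returns a context carrying the skip flag.
func WithSkipHooks(ctx context.Context, skip bool) context.Context {
	return context.WithValue(ctx, skipHooksKey{}, skip)
}

// SkipHooksFromContext reports whether the flag is set; absent means false.
func SkipHooksFromContext(ctx context.Context) bool {
	skip, _ := ctx.Value(skipHooksKey{}).(bool)
	return skip
}

// delegate runs the quality gate only when the flag is not set.
func delegate(ctx context.Context, qualityGate func()) {
	if !SkipHooksFromContext(ctx) {
		qualityGate()
	}
}

// gateRuns is a usage demo: it returns how often the gate fired for one delegation.
func gateRuns(skip bool) int {
	ctx := context.Background()
	if skip {
		ctx = WithSkipHooks(ctx, true)
	}
	n := 0
	delegate(ctx, func() { n++ })
	return n
}
```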

8. Group File Writer Restrictions

In group chats (Telegram), write-sensitive operations are restricted to designated writers. This prevents unauthorized users from modifying agent files or resetting sessions in shared groups.

flowchart TD
    CMD["Write-sensitive command<br/>(/reset, /addwriter, file writes)"] --> GROUP{"In group chat?"}
    GROUP -->|No| ALLOW["Allow (DM = no restriction)"]
    GROUP -->|Yes| CHECK["Check IsGroupFileWriter()<br/>(agentID, groupID, senderID)"]
    CHECK -->|Writer| ALLOW
    CHECK -->|Not writer| DENY["Deny operation"]
    CHECK -->|DB error| FALLBACK["Fail-open: Allow<br/>(log security.reset_writer_check_failed)"]

Group ID Format

group:{channel}:{chatID} — for example, group:telegram:-1001234567.

Managed Commands

Command	Restriction
`/reset`	Writers only in groups
`/addwriter`	Writers only (reply to target user to add)
`/removewriter`	Writers only
`/writers`	No restriction (informational)
File writes (exec)	Writers only in groups

Writers are managed via /addwriter (reply to a user's message) and /removewriter commands. The writer list is stored per-agent per-group in the agent store.
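
The decision flow above, including the fail-open branch, can be sketched like this. `allowWrite`, `writerCheck`, and `errStoreUnavailable` are hypothetical; `writerCheck` stands in for `IsGroupFileWriter()`, and detecting a group by the `group:` prefix is an assumption based on the group ID format.

```go
package main

import (
	"errors"
	"log/slog"
	"strings"
)

// errStoreUnavailable is a sample error for demonstrating the fail-open path.
var errStoreUnavailable = errors.New("writer store unavailable")

// writerCheck stands in for IsGroupFileWriter(agentID, groupID, senderID).
type writerCheck func(agentID, groupID, senderID string) (bool, error)

// isGroupChat relies on the group:{channel}:{chatID} format.
func isGroupChat(chatKey string) bool {
	return strings.HasPrefix(chatKey, "group:")
}

// allowWrite applies the writer restriction to a write-sensitive operation.
func allowWrite(check writerCheck, agentID, chatKey, senderID string) bool {
	if !isGroupChat(chatKey) {
		return true // DM: no restriction
	}
	ok, err := check(agentID, chatKey, senderID)
	if err != nil {
		// Fail-open, as in the flowchart: log and allow.
		slog.Warn("security.reset_writer_check_failed", "group", chatKey, "error", err)
		return true
	}
	return ok
}
```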

9. Delegation Security

Agent delegation uses directed permissions via the agent_links table.

Control	Scope	Description
Directed links	A → B	A single row `(A→B, outbound)` means A can delegate to B, not the reverse
Per-user deny/allow	Per-link	`settings` JSONB on each link holds per-user restrictions (premium users only, blocked accounts)
Per-link concurrency	A → B	`agent_links.max_concurrent` limits simultaneous delegations from A to B
Per-agent load cap	B (all sources)	`other_config.max_delegation_load` limits total concurrent delegations targeting B

When a concurrency limit is hit, the error message is phrased so the LLM can act on it: "Agent at capacity (5/5). Try a different agent or handle it yourself."
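
The two concurrency controls can be modeled as a pair of counters checked together. This is an in-memory sketch only: `delegationLimiter`, `acquire`, and `release` are hypothetical, the limits are passed in rather than read from `agent_links.max_concurrent` and `other_config.max_delegation_load`, and the directed-link and per-user checks are not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// delegationLimiter tracks active delegations per link and per target agent.
type delegationLimiter struct {
	mu        sync.Mutex
	perLink   map[string]int
	perTarget map[string]int
}

func newDelegationLimiter() *delegationLimiter {
	return &delegationLimiter{perLink: map[string]int{}, perTarget: map[string]int{}}
}

func capacityError(active, limit int) error {
	return fmt.Errorf("Agent at capacity (%d/%d). Try a different agent or handle it yourself.", active, limit)
}

// acquire reserves one slot from -> to, enforcing both limits.
func (l *delegationLimiter) acquire(from, to string, maxConcurrent, maxLoad int) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	link := from + "->" + to
	if n := l.perLink[link]; n >= maxConcurrent {
		return capacityError(n, maxConcurrent)
	}
	if n := l.perTarget[to]; n >= maxLoad {
		return capacityError(n, maxLoad)
	}
	l.perLink[link]++
	l.perTarget[to]++
	return nil
}

// release frees a slot previously reserved with acquire.
func (l *delegationLimiter) release(from, to string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	link := from + "->" + to
	if l.perLink[link] > 0 {
		l.perLink[link]--
		l.perTarget[to]--
	}
}
```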

File Reference

File	Description
`internal/agent/input_guard.go`	Injection pattern detection (6 patterns)
`internal/tools/scrub.go`	Credential scrubbing (regex-based redaction)
`internal/tools/shell.go`	Shell deny patterns, command validation
`internal/tools/web_fetch.go`	Web content wrapping, SSRF protection
`internal/permissions/policy.go`	RBAC (3 roles, scope-based access)
`internal/gateway/ratelimit.go`	Gateway-level token bucket rate limiter
`internal/sandbox/`	Docker sandbox manager, FsBridge
`internal/crypto/aes.go`	AES-256-GCM encrypt/decrypt
`internal/tools/types.go`	PathDenyable interface definition
`internal/tools/filesystem.go`	Denied path checking (`checkDeniedPath` helper)
`internal/tools/filesystem_list.go`	Denied path support + directory filtering
`internal/hooks/context.go`	WithSkipHooks / SkipHooksFromContext (recursion prevention)
`internal/hooks/engine.go`	Hook engine, evaluator registry

Cross-References

Document	Relevant Content
03-tools-system.md	Shell deny patterns, exec approval, PathDenyable, delegation system, quality gates
04-gateway-protocol.md	WebSocket auth, RBAC, rate limiting
06-store-data-model.md	API key encryption, agent access control pipeline, agent_links table
07-bootstrap-skills-memory.md	Context file merging, virtual files
08-scheduling-cron.md	Scheduler lanes, cron lifecycle, /stop and /stopall
10-tracing-observability.md	Tracing and OTel export
