docs: Add comprehensive documentation suite

- Project overview, system architecture, code standards
- API reference with 15+ examples
- Quick start guide with troubleshooting
- Updated README with feature highlights and compatibility matrix
2026-04-17 13:20:56 +00:00 · 2026-04-05 11:47:18 +07:00
parent a1113e02aa
commit 170cdb1324
8 changed files with 1929 additions and 10 deletions
--- a/docs/system-architecture.md
+++ b/docs/system-architecture.md
@@ -0,0 +1,419 @@
+# System Architecture
+
+## High-Level Overview
+
+Claude Central Gateway acts as a protocol translator between Anthropic's Messages API and OpenAI's Chat Completions API. Requests flow through a series of transformation stages with minimal overhead.
+
+```
+Client (Claude Code)
+    ↓
+HTTP Request (Anthropic API format)
+    ↓
+[Auth Middleware] → Validates x-api-key token
+    ↓
+[Model Mapping] → Maps claude-* model names to OpenAI models
+    ↓
+[Request Transformation] → Anthropic format → OpenAI format
+    ↓
+[OpenAI Client] → Sends request to OpenAI API
+    ↓
+OpenAI Response Stream
+    ↓
+[Response Transformation] → OpenAI format → Anthropic SSE format
+    ↓
+HTTP Response (Anthropic SSE or JSON)
+    ↓
+Client receives response
+```
+
+## Request Flow (Detailed)
+
+### 1. Incoming Request
+```
+POST /v1/messages HTTP/1.1
+Host: gateway.example.com
+x-api-key: my-secret-token
+Content-Type: application/json
+
+{
+  "model": "claude-sonnet-4-20250514",
+  "messages": [...],
+  "tools": [...],
+  "stream": true,
+  ...
+}
+```
+
+### 2. Authentication Stage
+- **Middleware**: `authMiddleware()` from `auth-middleware.js`
+- **Input**: HTTP request with headers
+- **Process**:
+  1. Extract `x-api-key` header or `Authorization: Bearer` header
+  2. Compare against `GATEWAY_TOKEN` using `timingSafeEqual()` (constant-time comparison)
+  3. If invalid: Return 401 Unauthorized
+  4. If valid: Proceed to next middleware
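+
+The authentication steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `auth-middleware.js`: the helper names (`extractToken`, `tokensMatch`, `authenticate`) are hypothetical, and headers are assumed to arrive as a plain object with lower-cased names. Because Node's `timingSafeEqual()` throws when its inputs differ in length, the sketch hashes both values to fixed-length digests first.
+
+```javascript
+const { createHash, timingSafeEqual } = require("node:crypto");
+
+// Pull the token from `x-api-key`, falling back to `Authorization: Bearer <token>`.
+// `headers` is assumed to be a plain object with lower-cased header names.
+function extractToken(headers) {
+  if (headers["x-api-key"]) return headers["x-api-key"];
+  const auth = headers["authorization"] || "";
+  return auth.startsWith("Bearer ") ? auth.slice("Bearer ".length) : null;
+}
+
+// Hash both values to equal-length digests, then compare in constant time.
+function tokensMatch(candidate, expected) {
+  if (typeof candidate !== "string" || typeof expected !== "string") return false;
+  const a = createHash("sha256").update(candidate).digest();
+  const b = createHash("sha256").update(expected).digest();
+  return timingSafeEqual(a, b);
+}
+
+// Returns the HTTP status the auth stage would produce (hypothetical helper).
+function authenticate(headers, gatewayToken) {
+  const token = extractToken(headers);
+  return token !== null && tokensMatch(token, gatewayToken) ? 200 : 401;
+}
+```
+
+For example, `authenticate({ "x-api-key": "wrong" }, "s3cret")` yields `401`, while a matching `Authorization: Bearer` header yields `200`.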
+
+### 3. Model Mapping
+- **Module**: `openai-client.js`
+- **Input**: Model name from request (e.g., `claude-sonnet-4-20250514`)
+- **Process**:
+  1. Check `MODEL_MAP` environment variable (format: `claude:gpt-4o,claude-opus:gpt-4-turbo`)
+  2. If mapping found: Use mapped model name
+  3. If no mapping: Use original model name as fallback
+- **Output**: Model name sent to OpenAI (e.g., `gpt-4o`); an unmapped name is passed through unchanged
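+
+A possible shape for the lookup described above is shown here. The function names are hypothetical, and the sketch assumes exact-match keys; the document does not say whether `openai-client.js` matches exactly or by prefix.
+
+```javascript
+// Parse "source:target,source:target" into a Map. Malformed pairs are skipped.
+function parseModelMap(raw) {
+  const map = new Map();
+  for (const pair of (raw || "").split(",")) {
+    const [from, to] = pair.split(":").map((s) => s.trim());
+    if (from && to) map.set(from, to);
+  }
+  return map;
+}
+
+// Exact-match lookup (an assumption); unmapped names pass through unchanged.
+function resolveModel(requested, modelMap) {
+  return modelMap.get(requested) ?? requested;
+}
+```
+
+Under the exact-match assumption, a key such as `claude` would not match `claude-sonnet-4-20250514`, so that request would be forwarded with its original model name.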
+
+### 4. Request Transformation
+- **Module**: `transform-request.js`, function `buildOpenAIRequest()`
+- **Input**: Anthropic request body + mapped model name
+- **Transformations**:
+
+  **Parameters** (direct pass-through with mappings):
+  - `max_tokens` → `max_tokens`
+  - `temperature` → `temperature`
+  - `top_p` → `top_p`
+  - `stream` → `stream` (and adds `stream_options: { include_usage: true }`)
+  - `stop_sequences` → `stop` array
+
+  **Tools**:
+  - Convert Anthropic tool definitions to OpenAI function tools
+  - Map `tool_choice` enum to OpenAI tool_choice format
+
+  **Messages Array** (complex transformation):
+  - **System message**: String or array of text blocks → Single system message
+  - **User messages**: Handle text, images, and tool_result blocks
+  - **Assistant messages**: Handle text and tool_use blocks
+
+  **Content Block Handling**:
+  - `text`: Preserved as-is
+  - `image` (base64 or URL): Converted to `image_url` format
+  - `tool_use`: Converted to OpenAI `tool_calls`
+  - `tool_result`: Split into separate tool messages
+  - Other content (e.g., `thinking` blocks, `cache_control` annotations): Filtered out
+
+- **Output**: OpenAI Chat Completions request payload (object, not stringified)
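+
+The parameter and tool mappings listed above might look something like the sketch below. It is not the real `buildOpenAIRequest()`: the helper names are hypothetical, message conversion is left out, and mapping Anthropic's `any` to OpenAI's `required` is this sketch's reading of "Map `tool_choice` enum".
+
+```javascript
+// Anthropic tool definition -> OpenAI function tool.
+function convertTool(tool) {
+  return {
+    type: "function",
+    function: {
+      name: tool.name,
+      description: tool.description,
+      parameters: tool.input_schema,
+    },
+  };
+}
+
+// Anthropic tool_choice object -> OpenAI tool_choice value.
+function convertToolChoice(choice) {
+  switch (choice?.type) {
+    case "auto": return "auto";
+    case "any": return "required";
+    case "none": return "none";
+    case "tool": return { type: "function", function: { name: choice.name } };
+    default: return undefined;
+  }
+}
+
+// Builds everything except `messages` (hypothetical helper).
+function buildParams(body, mappedModel) {
+  const out = { model: mappedModel, max_tokens: body.max_tokens };
+  if (body.temperature !== undefined) out.temperature = body.temperature;
+  if (body.top_p !== undefined) out.top_p = body.top_p;
+  if (body.stop_sequences?.length) out.stop = body.stop_sequences;
+  if (body.stream) {
+    out.stream = true;
+    out.stream_options = { include_usage: true };
+  }
+  if (body.tools?.length) {
+    out.tools = body.tools.map(convertTool);
+    const choice = convertToolChoice(body.tool_choice);
+    if (choice !== undefined) out.tool_choice = choice;
+  }
+  return out;
+}
+```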
+
+### 5. OpenAI API Call
+- **Module**: `routes/messages.js` route handler
+- **Process**:
+  1. Serialize payload to JSON
+  2. Send to OpenAI API with authentication header
+  3. If streaming: Request returns async iterable of chunks
+  4. If non-streaming: Request returns single response object
+
+### 6. Response Transformation
+
+#### Non-Streaming Path
+- **Module**: `transform-response.js`, function `transformResponse()`
+- **Input**: OpenAI response object + original Anthropic request
+- **Process**:
+  1. Extract first choice from OpenAI response
+  2. Build content blocks array:
+     - Extract text from `message.content` if present
+     - Extract tool_calls and convert to Anthropic `tool_use` format
+  3. Map OpenAI `finish_reason` to Anthropic `stop_reason`
+  4. Build response envelope with message metadata
+  5. Convert usage tokens (prompt/completion → input/output)
+- **Output**: Single Anthropic message response object
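+
+As an illustration of the non-streaming steps above, the sketch below builds an Anthropic-style message from an OpenAI response. It is not the real `transformResponse()`: the function name, the `msg_` id scheme, and the simplified `STOP_REASONS` lookup (which ignores the stop-sequence case) are assumptions made for the example.
+
+```javascript
+// Simplified finish_reason lookup; the stop-sequence case is not handled here.
+const STOP_REASONS = {
+  stop: "end_turn",
+  length: "max_tokens",
+  tool_calls: "tool_use",
+  content_filter: "end_turn",
+};
+
+function toAnthropicMessage(openaiResponse, requestedModel) {
+  const choice = openaiResponse.choices[0];
+  const content = [];
+  if (choice.message.content) {
+    content.push({ type: "text", text: choice.message.content });
+  }
+  for (const call of choice.message.tool_calls ?? []) {
+    content.push({
+      type: "tool_use",
+      id: call.id,
+      name: call.function.name,
+      // OpenAI sends arguments as a JSON string; Anthropic expects an object.
+      input: JSON.parse(call.function.arguments || "{}"),
+    });
+  }
+  return {
+    id: `msg_${openaiResponse.id}`, // id scheme is an assumption
+    type: "message",
+    role: "assistant",
+    content,
+    model: requestedModel,
+    stop_reason: STOP_REASONS[choice.finish_reason] ?? "end_turn",
+    usage: {
+      input_tokens: openaiResponse.usage?.prompt_tokens ?? 0,
+      output_tokens: openaiResponse.usage?.completion_tokens ?? 0,
+    },
+  };
+}
+```
+
+Note that `JSON.parse` throws on malformed tool arguments, so a production version would likely need a guard around it.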
+
+#### Streaming Path
+- **Module**: `transform-response.js`, function `streamAnthropicResponse()`
+- **Input**: Hono context + OpenAI response stream + original Anthropic request
+- **Process**:
+  1. Emit `message_start` event with empty message envelope
+  2. For each OpenAI chunk:
+     - Track `finish_reason` for final stop_reason
+     - Handle text deltas: Send `content_block_start`, `content_block_delta`, `content_block_stop`
+     - Handle tool_calls deltas: Similar sequencing, buffer arguments
+     - Track usage tokens from final chunk
+  3. Emit `message_delta` with final stop_reason and output tokens
+  4. Emit `message_stop` to mark end of stream
+- **Output**: Server-Sent Events stream (Content-Type: text/event-stream)
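+
+The event sequencing above can be illustrated with a small generator. This is a simplified sketch rather than the real `streamAnthropicResponse()`: it handles text deltas only (no tool calls), takes a plain array instead of an async iterable, and uses a placeholder message id.
+
+```javascript
+// Turns OpenAI streaming chunks into Anthropic-style event objects (text only).
+function* toAnthropicEvents(chunks, model) {
+  yield {
+    type: "message_start",
+    message: {
+      id: "msg_sketch", // placeholder id
+      type: "message",
+      role: "assistant",
+      content: [],
+      model,
+      stop_reason: null,
+      usage: { input_tokens: 0, output_tokens: 0 },
+    },
+  };
+  let blockOpen = false;
+  let finishReason = null;
+  let outputTokens = 0;
+  for (const chunk of chunks) {
+    const choice = chunk.choices?.[0];
+    if (choice?.delta?.content) {
+      if (!blockOpen) {
+        // Open the text block lazily, on the first text delta.
+        yield { type: "content_block_start", index: 0, content_block: { type: "text", text: "" } };
+        blockOpen = true;
+      }
+      yield { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: choice.delta.content } };
+    }
+    if (choice?.finish_reason) finishReason = choice.finish_reason;
+    // With include_usage, the final chunk carries usage and an empty choices array.
+    if (chunk.usage) outputTokens = chunk.usage.completion_tokens;
+  }
+  if (blockOpen) yield { type: "content_block_stop", index: 0 };
+  const stopReason = finishReason === "length" ? "max_tokens" : "end_turn";
+  yield {
+    type: "message_delta",
+    delta: { stop_reason: stopReason, stop_sequence: null },
+    usage: { output_tokens: outputTokens },
+  };
+  yield { type: "message_stop" };
+}
+```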
+
+### 7. HTTP Response
+```
+HTTP/1.1 200 OK
+Content-Type: text/event-stream (streaming) or application/json (non-streaming)
+
+event: message_start
+data: {"type":"message_start","message":{...}}
+
+event: content_block_start
+data: {"type":"content_block_start",...}
+
+event: content_block_delta
+data: {"type":"content_block_delta",...}
+
+event: content_block_stop
+data: {"type":"content_block_stop",...}
+
+event: message_delta
+data: {"type":"message_delta",...}
+
+event: message_stop
+data: {"type":"message_stop"}
+```
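+
+Each event in the stream above is an `event:` line, a `data:` line carrying JSON, and a blank line. A serializer for that framing could be as small as the following; `formatSSE` is a hypothetical name, not a function the document attributes to the gateway.
+
+```javascript
+// Serialize one Anthropic event object into Server-Sent Events framing.
+// The event name is taken from the payload's own `type` field.
+function formatSSE(payload) {
+  return `event: ${payload.type}\ndata: ${JSON.stringify(payload)}\n\n`;
+}
+```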
+
+## Tool Use Round-Trip (Special Case)
+
+Complete workflow for tool execution:
+
+### Step 1: Initial Request with Tools
+```
+Client sends:
+{
+  "messages": [{"role": "user", "content": "Search for X"}],
+  "tools": [{"name": "search", "description": "...", "input_schema": {...}}]
+}
+```
+
+### Step 2: Model Selects Tool
+```
+OpenAI responds:
+{
+  "choices": [{
+    "message": {
+      "content": null,
+      "tool_calls": [{"id": "call_123", "function": {"name": "search", "arguments": "{..."}}]
+    }
+  }]
+}
+```
+
+### Step 3: Transform & Return to Client
+```
+Gateway converts:
+{
+  "content": [
+    {"type": "tool_use", "id": "call_123", "name": "search", "input": {...}}
+  ],
+  "stop_reason": "tool_use"
+}
+```
+
+### Step 4: Client Executes Tool and Responds
+```
+Client sends:
+{
+  "messages": [
+    {"role": "user", "content": "Search for X"},
+    {"role": "assistant", "content": [{"type": "tool_use", "id": "call_123", ...}]},
+    {"role": "user", "content": [
+      {"type": "tool_result", "tool_use_id": "call_123", "content": "Result: ..."}
+    ]}
+  ]
+}
+```
+
+### Step 5: Transform & Forward to OpenAI
+```
+Gateway converts:
+{
+  "messages": [
+    {"role": "user", "content": "Search for X"},
+    {"role": "assistant", "content": null, "tool_calls": [...]},
+    {"role": "tool", "tool_call_id": "call_123", "content": "Result: ..."}
+  ]
+}
+```
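+
+The conversion from Step 4 to Step 5 could be sketched as below. This is an illustration under assumptions, not the code in `transform-request.js`: `convertMessages` is a hypothetical name, image blocks are ignored, and emitting tool messages before any remaining user text is a choice made for this sketch.
+
+```javascript
+function convertMessages(messages) {
+  const out = [];
+  for (const msg of messages) {
+    if (typeof msg.content === "string") {
+      out.push({ role: msg.role, content: msg.content });
+      continue;
+    }
+    const text = msg.content
+      .filter((b) => b.type === "text")
+      .map((b) => b.text)
+      .join("");
+    if (msg.role === "assistant") {
+      // tool_use blocks become OpenAI tool_calls with JSON-string arguments.
+      const toolCalls = msg.content
+        .filter((b) => b.type === "tool_use")
+        .map((b) => ({
+          id: b.id,
+          type: "function",
+          function: { name: b.name, arguments: JSON.stringify(b.input ?? {}) },
+        }));
+      const converted = { role: "assistant", content: text || null };
+      if (toolCalls.length) converted.tool_calls = toolCalls;
+      out.push(converted);
+    } else {
+      // Each tool_result becomes its own `tool` message.
+      for (const b of msg.content.filter((b) => b.type === "tool_result")) {
+        const content = typeof b.content === "string" ? b.content : JSON.stringify(b.content ?? "");
+        out.push({ role: "tool", tool_call_id: b.tool_use_id, content });
+      }
+      if (text) out.push({ role: "user", content: text });
+    }
+  }
+  return out;
+}
+```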
+
+### Step 6: Model Continues
+OpenAI processes tool result and continues conversation.
+
+## Stop Reason Mapping
+
+| OpenAI `finish_reason` | Anthropic `stop_reason` | Notes |
+|----------------------|----------------------|-------|
+| `stop` | `end_turn` | Normal completion |
+| `stop` (with stop_sequences) | `stop_sequence` | Hit user-specified stop sequence |
+| `length` | `max_tokens` | Hit max_tokens limit |
+| `tool_calls` | `tool_use` | Model selected a tool |
+| `content_filter` | `end_turn` | Content filtered by safety filters |
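+
+The table can be expressed as a small function. OpenAI reports `stop` both for a natural end and for a stop-sequence hit, so this sketch assumes, as one reading of the second row, that `stop` maps to `stop_sequence` whenever the request supplied `stop_sequences`. The name `mapStopReason` is hypothetical.
+
+```javascript
+// finishReason: OpenAI finish_reason; stopSequences: the request's stop_sequences.
+function mapStopReason(finishReason, stopSequences = []) {
+  switch (finishReason) {
+    case "stop":
+      // Assumption: any request with stop sequences reports "stop_sequence".
+      return stopSequences.length > 0 ? "stop_sequence" : "end_turn";
+    case "length":
+      return "max_tokens";
+    case "tool_calls":
+      return "tool_use";
+    case "content_filter":
+      return "end_turn";
+    default:
+      return "end_turn";
+  }
+}
+```
+
+One consequence of this rule is that a natural completion is reported as `stop_sequence` whenever stop sequences were configured; telling the two apart would need information the `finish_reason` field alone does not carry.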
+
+## Data Structures
+
+### Request Object (Anthropic format)
+```javascript
+{
+  model: string,
+  messages: [{
+    role: "user" | "assistant",
+    content: string | [{
+      type: "text" | "image" | "tool_use" | "tool_result",
+      text?: string,
+      source?: {type: "base64" | "url", media_type?: string, data?: string, url?: string},
+      id?: string,
+      name?: string,
+      input?: object,
+      tool_use_id?: string,
+      is_error?: boolean
+    }]
+  }],
+  system?: string | [{type: "text", text: string}],
+  tools?: [{
+    name: string,
+    description: string,
+    input_schema: object
+  }],
+  tool_choice?: {type: "auto" | "any" | "none" | "tool", name?: string},
+  max_tokens: number,
+  temperature?: number,
+  top_p?: number,
+  stop_sequences?: string[],
+  stream?: boolean
+}
+```
+
+### Response Object (Anthropic format)
+```javascript
+{
+  id: string,
+  type: "message",
+  role: "assistant",
+  content: [{
+    type: "text" | "tool_use",
+    text?: string,
+    id?: string,
+    name?: string,
+    input?: object
+  }],
+  model: string,
+  stop_reason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use",
+  usage: {
+    input_tokens: number,
+    output_tokens: number
+  }
+}
+```
+
+## Deployment Topology
+
+### Single-Instance Deployment (Typical)
+```
+                    ┌─────────────────────┐
+                    │   Claude Code       │
+                    │   (CLI client)      │
+                    └──────────┬──────────┘
+                               │ HTTP/HTTPS
+                               ▼
+                    ┌─────────────────────┐
+                    │  Claude Central     │
+                    │  Gateway (Vercel)   │
+                    │  ┌────────────────┐ │
+                    │  │ Auth           │ │
+                    │  │ Transform Req  │ │
+                    │  │ Transform Resp │ │
+                    │  └────────────────┘ │
+                    └──────────┬──────────┘
+                               │ HTTP/HTTPS
+                               ▼
+                    ┌─────────────────────┐
+                    │   OpenAI API        │
+                    │   chat/completions  │
+                    └─────────────────────┘
+```
+
+### Multi-Instance Deployment (Stateless)
+Multiple gateway instances can run independently. Requests distribute via:
+- Load balancer (Vercel built-in, Cloudflare routing)
+- Client-side retry on failure
+
+Each instance:
+- Shares same `GATEWAY_TOKEN` for authentication
+- Shares same `MODEL_MAP` for consistent routing
+- Connects independently to OpenAI
+
+No coordination required between instances.
+
+## Scalability Characteristics
+
+### Horizontal Scaling
+- ✅ Fully stateless: Add more instances without coordination
+- ✅ No shared state: Each instance owns only active requests
+- ✅ Database-free: No bottleneck or single point of failure
+
+### Rate Limiting
+- ⚠️ Currently none: Single token shared across all users
+- Recommendation: Implement per-token or per-IP rate limiting if needed
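+
+If rate limiting is added, one simple starting point is a fixed-window counter keyed by client IP or token. The sketch below is hypothetical and keeps its counters in process memory, so each gateway instance would enforce its own limit independently; a shared limit would need external storage.
+
+```javascript
+// Fixed-window counter kept in process memory, keyed by client identifier.
+// `now` is injectable so the behaviour can be exercised with a fake clock.
+function createRateLimiter({ limit, windowMs, now = Date.now }) {
+  const windows = new Map(); // key -> { start, count }
+  return function allow(key) {
+    const t = now();
+    const w = windows.get(key);
+    if (!w || t - w.start >= windowMs) {
+      // First request from this key, or the previous window has expired.
+      windows.set(key, { start: t, count: 1 });
+      return true;
+    }
+    if (w.count >= limit) return false;
+    w.count += 1;
+    return true;
+  };
+}
+```
+
+A request for which `allow(key)` returns `false` would typically be answered with HTTP 429 before any call to OpenAI is made.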
+
+### Performance
+- Latency: ~50-200ms overhead per request (serialization + HTTP)
+- Throughput: Limited by OpenAI API tier, not gateway capacity
+- Memory: ~20MB per instance (Hono + dependencies)
+
+## Error Handling Architecture
+
+### Authentication Errors
+```
+Client → Gateway (missing/invalid token)
+         └→ Return 401 with error details
+            No API call made
+```
+
+### Transform Errors
+```
+Client → Gateway → Transform fails (malformed request)
+                   └→ Return 400 Bad Request
+                      No API call made
+```
+
+### OpenAI API Errors
+```
+Client → Gateway → OpenAI API returns error
+                   └→ Convert to Anthropic error format
+                      └→ Return to client
+```
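+
+A conversion along these lines might look like the following. Anthropic error bodies have the shape `{"type": "error", "error": {"type": ..., "message": ...}}`; the status-to-type table and the `toAnthropicError` name are assumptions for this sketch, since the document does not show the gateway's actual mapping.
+
+```javascript
+// HTTP status -> Anthropic error type; anything unlisted falls back to api_error.
+const ERROR_TYPES = {
+  400: "invalid_request_error",
+  401: "authentication_error",
+  403: "permission_error",
+  404: "not_found_error",
+  429: "rate_limit_error",
+};
+
+// `openaiBody` is the parsed JSON error body returned by OpenAI, if any.
+function toAnthropicError(status, openaiBody) {
+  return {
+    type: "error",
+    error: {
+      type: ERROR_TYPES[status] ?? "api_error",
+      message: openaiBody?.error?.message ?? "Upstream request failed",
+    },
+  };
+}
+```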
+
+### Network Errors
+```
+Client → Gateway → OpenAI unreachable
+                   └→ Timeout or connection error
+                      └→ Return 500 Internal Server Error
+```
+
+## Security Model
+
+### Authentication
+- **Method**: Single shared token (`GATEWAY_TOKEN`)
+- **Comparison**: Constant-time, so response timing does not reveal how much of a guessed token is correct
+- **Suitable for**: Personal use, small teams with trusted members
+- **Not suitable for**: Multi-tenant, public access, high-security requirements
+
+### Token Locations
+- Client stores in `ANTHROPIC_AUTH_TOKEN` environment variable
+- Server validates against `GATEWAY_TOKEN` environment variable
+- Never logged or exposed in error messages
+
+### Recommendations for Production
+1. Use strong, randomly generated token (32+ characters)
+2. Rotate token periodically
+3. Use HTTPS only (Vercel provides free HTTPS)
+4. Consider rate limiting by IP if exposed to untrusted networks
+5. Monitor token usage logs for suspicious patterns
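+
+For the first recommendation, a token can be generated with Node's cryptographically secure random source. The helper name below is hypothetical; 32 random bytes encode to 64 hexadecimal characters.
+
+```javascript
+const { randomBytes } = require("node:crypto");
+
+// Generate a random token suitable for GATEWAY_TOKEN (hex-encoded).
+function generateGatewayToken(bytes = 32) {
+  return randomBytes(bytes).toString("hex");
+}
+```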
+
+## Monitoring & Observability
+
+### Built-in Logging
+- Hono logger middleware logs all requests (method, path, status, latency)
+- Errors logged to console with stack traces
+
+### Recommended Additions
+- Request/response body logging (for debugging, exclude in production)
+- Token usage tracking (prompt/completion tokens)
+- API error rate monitoring
+- Latency percentiles (p50, p95, p99)
+- OpenAI API quota tracking
+
+## Future Architecture Considerations
+
+### Potential Enhancements
+1. **Per-request authentication**: Support API keys per user/token
+2. **Request routing**: Route based on model, user, or other properties
+3. **Response caching**: Cache repeated identical requests
+4. **Rate limiting**: Token bucket or sliding window per client
+5. **Webhook logging**: Send detailed logs to external system
+6. **Provider abstraction**: Support multiple backends (Google, Anthropic, etc.)
+
+### Current Constraints Preventing Enhancement
+- Single-token auth: No per-user isolation
+- Minimal state: Cannot track usage per user
+- Stateless design: Cannot implement caching or rate limiting without storage
+- Simple model mapping: Cannot route intelligently
+
+These are intentional trade-offs prioritizing simplicity over flexibility.