mirror of https://github.com/tiennm99/claude-central-gateway.git synced 2026-04-17 17:21:03 +00:00

Files

tiennm99 170cdb1324 docs: Add comprehensive documentation suite

- Project overview, system architecture, code standards
- API reference with 15+ examples
- Quick start guide with troubleshooting
- Updated README with feature highlights and compatibility matrix

2026-04-05 11:47:18 +07:00

13 KiB

Raw Blame History

System Architecture

High-Level Overview

Claude Central Gateway acts as a protocol translator between Anthropic's Messages API and OpenAI's Chat Completions API. Requests flow through a series of transformation stages with minimal overhead.

Client (Claude Code)
    ↓
HTTP Request (Anthropic API format)
    ↓
[Auth Middleware] → Validates x-api-key token
    ↓
[Model Mapping] → Maps claude-* model names to openai models
    ↓
[Request Transformation] → Anthropic format → OpenAI format
    ↓
[OpenAI Client] → Sends request to OpenAI API
    ↓
OpenAI Response Stream
    ↓
[Response Transformation] → OpenAI format → Anthropic SSE format
    ↓
HTTP Response (Anthropic SSE or JSON)
    ↓
Client receives response

Request Flow (Detailed)

1. Incoming Request

POST /v1/messages HTTP/1.1
Host: gateway.example.com
x-api-key: my-secret-token
Content-Type: application/json

{
  "model": "claude-sonnet-4-20250514",
  "messages": [...],
  "tools": [...],
  "stream": true,
  ...
}

2. Authentication Stage

Middleware: authMiddleware() from auth-middleware.js
Input: HTTP request with headers
Process:
1. Extract x-api-key header or Authorization: Bearer header
2. Compare against GATEWAY_TOKEN using timingSafeEqual() (constant-time comparison)
3. If invalid: Return 401 Unauthorized
4. If valid: Proceed to next middleware

3. Model Mapping

Module: openai-client.js
Input: Model name from request (e.g., claude-sonnet-4-20250514)
Process:
1. Check MODEL_MAP environment variable (format: claude:gpt-4o,claude-opus:gpt-4-turbo)
2. If mapping found: Use mapped model name
3. If no mapping: Use original model name as fallback
Output: Canonical OpenAI model name (e.g., gpt-4o)

4. Request Transformation

Module: transform-request.js, function buildOpenAIRequest()
Input: Anthropic request body + mapped model name
Transformations:

Parameters (direct pass-through with mappings):
- max_tokens → max_tokens
- temperature → temperature
- top_p → top_p
- stream → stream (and adds stream_options: { include_usage: true })
- stop_sequences → stop array
Tools:
- Convert Anthropic tool definitions to OpenAI function tools
- Map tool_choice enum to OpenAI tool_choice format
Messages Array (complex transformation):
- System message: String or array of text blocks → Single system message
- User messages: Handle text, images, and tool_result blocks
- Assistant messages: Handle text and tool_use blocks
Content Block Handling:
- text: Preserved as-is
- image (base64 or URL): Converted to image_url format
- tool_use: Converted to OpenAI tool_calls
- tool_result: Split into separate tool messages
- Other blocks (thinking, cache_control): Filtered out
Output: OpenAI Chat Completions request payload (object, not stringified)

5. OpenAI API Call

Module: routes/messages.js route handler
Process:
1. Serialize payload to JSON
2. Send to OpenAI API with authentication header
3. If streaming: Request returns async iterable of chunks
4. If non-streaming: Request returns single response object

6. Response Transformation

Non-Streaming Path

Module: transform-response.js, function transformResponse()
Input: OpenAI response object + original Anthropic request
Process:
1. Extract first choice from OpenAI response
2. Build content blocks array:
  - Extract text from message.content if present
  - Extract tool_calls and convert to Anthropic tool_use format
3. Map OpenAI finish_reason to Anthropic stop_reason
4. Build response envelope with message metadata
5. Convert usage tokens (prompt/completion → input/output)
Output: Single Anthropic message response object

Streaming Path

Module: transform-response.js, function streamAnthropicResponse()
Input: Hono context + OpenAI response stream + original Anthropic request
Process:
1. Emit message_start event with empty message envelope
2. For each OpenAI chunk:
  - Track finish_reason for final stop_reason
  - Handle text deltas: Send content_block_start, content_block_delta, content_block_stop
  - Handle tool_calls deltas: Similar sequencing, buffer arguments
  - Track usage tokens from final chunk
3. Emit message_delta with final stop_reason and output tokens
4. Emit message_stop to mark end of stream
Output: Server-Sent Events stream (Content-Type: text/event-stream)

7. HTTP Response

HTTP/1.1 200 OK
Content-Type: text/event-stream (streaming) or application/json (non-streaming)

event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_start
data: {"type":"content_block_start",...}

event: content_block_delta
data: {"type":"content_block_delta",...}

event: message_delta
data: {"type":"message_delta",...}

event: message_stop
data: {"type":"message_stop"}

Tool Use Round-Trip (Special Case)

Complete workflow for tool execution:

Step 1: Initial Request with Tools

Client sends:
{
  "messages": [{"role": "user", "content": "Search for X"}],
  "tools": [{"name": "search", "description": "...", "input_schema": {...}}]
}

Step 2: Model Selects Tool

OpenAI responds:
{
  "choices": [{
    "message": {
      "content": null,
      "tool_calls": [{"id": "call_123", "function": {"name": "search", "arguments": "{..."}}]
    }
  }]
}

Step 3: Transform & Return to Client

Gateway converts:
{
  "content": [
    {"type": "tool_use", "id": "call_123", "name": "search", "input": {...}}
  ],
  "stop_reason": "tool_use"
}

Step 4: Client Executes Tool and Responds

Client sends:
{
  "messages": [
    {"role": "user", "content": "Search for X"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "call_123", ...}]},
    {"role": "user", "content": [
      {"type": "tool_result", "tool_use_id": "call_123", "content": "Result: ..."}
    ]}
  ]
}

Step 5: Transform & Forward to OpenAI

Gateway converts:
{
  "messages": [
    {"role": "user", "content": "Search for X"},
    {"role": "assistant", "content": null, "tool_calls": [...]},
    {"role": "tool", "tool_call_id": "call_123", "content": "Result: ..."}
  ]
}

Step 6: Model Continues

OpenAI processes tool result and continues conversation.

Stop Reason Mapping

OpenAI `finish_reason`	Anthropic `stop_reason`	Notes
`stop`	`end_turn`	Normal completion
`stop` (with stop_sequences)	`stop_sequence`	Hit user-specified stop sequence
`length`	`max_tokens`	Hit max_tokens limit
`tool_calls`	`tool_use`	Model selected a tool
`content_filter`	`end_turn`	Content filtered by safety filters

Data Structures

Request Object (Anthropic format)

{
  model: string,
  messages: [{
    role: "user" | "assistant",
    content: string | [{
      type: "text" | "image" | "tool_use" | "tool_result",
      text?: string,
      source?: {type: "base64" | "url", media_type?: string, data?: string, url?: string},
      id?: string,
      name?: string,
      input?: object,
      tool_use_id?: string,
      is_error?: boolean
    }]
  }],
  system?: string | [{type: "text", text: string}],
  tools?: [{
    name: string,
    description: string,
    input_schema: object
  }],
  tool_choice?: {type: "auto" | "any" | "none" | "tool", name?: string},
  max_tokens: number,
  temperature?: number,
  top_p?: number,
  stop_sequences?: string[],
  stream?: boolean
}

Response Object (Anthropic format)

{
  id: string,
  type: "message",
  role: "assistant",
  content: [{
    type: "text" | "tool_use",
    text?: string,
    id?: string,
    name?: string,
    input?: object
  }],
  model: string,
  stop_reason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use",
  usage: {
    input_tokens: number,
    output_tokens: number
  }
}

Deployment Topology

Single-Instance Deployment (Typical)

                    ┌─────────────────────┐
                    │   Claude Code       │
                    │   (Claude IDE)      │
                    └──────────┬──────────┘
                               │ HTTP/HTTPS
                               ▼
                    ┌─────────────────────┐
                    │  Claude Central     │
                    │  Gateway (Vercel)   │
                    │  ┌────────────────┐ │
                    │  │ Auth            │ │
                    │  │ Transform Req   │ │
                    │  │ Transform Resp  │ │
                    │  └────────────────┘ │
                    └──────────┬──────────┘
                               │ HTTP/HTTPS
                               ▼
                    ┌─────────────────────┐
                    │   OpenAI API        │
                    │   chat/completions  │
                    └─────────────────────┘

Multi-Instance Deployment (Stateless)

Multiple gateway instances can run independently. Requests distribute via:

Load balancer (Vercel built-in, Cloudflare routing)
Client-side retry on failure

Each instance:

Shares same GATEWAY_TOKEN for authentication
Shares same MODEL_MAP for consistent routing
Connects independently to OpenAI

No coordination required between instances.

Scalability Characteristics

Horizontal Scaling

✅ Fully stateless: Add more instances without coordination
✅ No shared state: Each instance owns only active requests
✅ Database-free: No bottleneck or single point of failure

Rate Limiting

⚠️ Currently none: Single token shared across all users
Recommendation: Implement per-token or per-IP rate limiting if needed

Performance

Latency: ~50-200ms overhead per request (serialization + HTTP)
Throughput: Limited by OpenAI API tier, not gateway capacity
Memory: ~20MB per instance (Hono + dependencies)

Error Handling Architecture

Authentication Errors

Client → Gateway (missing/invalid token)
         └→ Return 401 with error details
            No API call made

Transform Errors

Client → Gateway → Transform fails (malformed request)
                   └→ Return 400 Bad Request
                      No API call made

OpenAI API Errors

Client → Gateway → OpenAI API returns error
                   └→ Convert to Anthropic error format
                      └→ Return to client

Network Errors

Client → Gateway → OpenAI unreachable
                   └→ Timeout or connection error
                      └→ Return 500 Internal Server Error

Security Model

Authentication

Method: Single shared token (GATEWAY_TOKEN)
Comparison: Timing-safe to prevent brute-force via timing attacks
Suitable for: Personal use, small teams with trusted members
Not suitable for: Multi-tenant, public access, high-security requirements

Token Locations

Client stores in ANTHROPIC_AUTH_TOKEN environment variable
Server validates against GATEWAY_TOKEN environment variable
Never logged or exposed in error messages

Recommendations for Production

Use strong, randomly generated token (32+ characters)
Rotate token periodically
Use HTTPS only (Vercel provides free HTTPS)
Consider rate limiting by IP if exposed to untrusted networks
Monitor token usage logs for suspicious patterns

Monitoring & Observability

Built-in Logging

Hono logger middleware logs all requests (method, path, status, latency)
Errors logged to console with stack traces

Recommended Additions

Request/response body logging (for debugging, exclude in production)
Token usage tracking (prompt/completion tokens)
API error rate monitoring
Latency percentiles (p50, p95, p99)
OpenAI API quota tracking

Future Architecture Considerations

Potential Enhancements

Per-request authentication: Support API keys per user/token
Request routing: Route based on model, user, or other properties
Response caching: Cache repeated identical requests
Rate limiting: Token bucket or sliding window per client
Webhook logging: Send detailed logs to external system
Provider abstraction: Support multiple backends (Google, Anthropic, etc.)

Current Constraints Preventing Enhancement

Single-token auth: No per-user isolation
Minimal state: Cannot track usage per user
Stateless design: Cannot implement caching or rate limiting without storage
Simple model mapping: Cannot route intelligently

These are intentional trade-offs prioritizing simplicity over flexibility.

13 KiB Raw Blame History