Cherry-picked valuable changes from PR #206:
- hasReadImageProvider supports chain format {"providers":[...]} config
- create_image/video/audio verify file persistence after write with diagnostic logging
- HistoryEntry gains Media field + CollectMedia() for group media context on @mention
- Zalo extractContentAndMedia refactored: all media types via DetectMIMEType/BuildMediaTags, 20MB limit
- Discord/Zalo pass media paths to Record() and collect historical media on @mention
- Zalo send_helpers logs directory contents when checkFileSize stat fails
Media tools (create_image, create_video, create_audio, read_audio,
read_video, read_document) routed API calls based on provider name
pattern matching (e.g. strings.HasPrefix(name, "gemini")). This breaks
when users give custom names to DB providers — a Gemini provider named
"chatgpt-sap-het" would be misrouted to the OpenAI-compat endpoint,
causing 404 errors.
Fix: carry the DB provider_type through OpenAIProvider, resolve it via
typedProvider interface in ExecuteWithChain, and inject as _provider_type
param for callProvider routing. Name-based heuristic kept as fallback
for config-file providers that don't have a DB type.
Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
The per-agent `imageGen` and `vision` fields in `ToolPolicySpec` (stored
in agents.tools_config JSONB) were added in d5cc5a7 (Feb 26) as the
original way to configure image/vision providers. When the media provider
chain system was introduced in 5815437 (Mar 8), these fields were kept
"for backward compat" but became dead code with no UI to manage them.
This causes a hard-to-debug issue: if an agent's tools_config contains
stale imageGen/vision data (set via API or leftover from DB), it silently
overrides the global provider chain configured in the builtin tools UI.
Users see the correct chain in the UI but the tool calls a completely
different provider/model, with no indication of why.
Changes:
- Remove Vision and ImageGen fields + struct definitions from ToolPolicySpec
- Remove associated context helpers (WithVisionConfig, WithImageGenConfig, etc.)
- Remove per-agent override injection in agent loop
- Simplify create_image and read_image to use chain as sole source of truth
- UI: whitelist known tools_config fields on save to clean stale DB data
Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
ExecuteWithChain previously required all providers to implement
credentialProvider (APIKey/APIBase). OAuth-based providers like
CodexProvider (ChatGPT OAuth) don't expose static credentials,
causing all media tools to fail with "does not expose API credentials".
Make credentialProvider optional (nil when unsupported). Each
callProvider gracefully falls back to the provider's Chat() API
when credentials are unavailable. Generation tools (create_image,
create_video, create_audio) return a clear error since they require
direct API access with no Chat fallback.
Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
- Update go.mod and Dockerfile to Go 1.26
- Apply `go fix ./...` stdlib modernizations across 170+ files
- Add `go fix` to post-implementation checklist in CLAUDE.md
- Fix go fix misapplied rewrite in loop_history.go
Refactor create_image and create_video to use a shared provider chain system.
Each tool now supports an ordered list of providers with per-entry timeout,
max retries, and provider-specific params. Includes MiniMax and DashScope
image/video generation implementations.
- New media_provider_chain.go: shared chain resolution, retry execution, limitedReadAll
- create_image: refactored to ExecuteWithChain, added MiniMax + DashScope providers
- create_video: refactored to ExecuteWithChain, added MiniMax async video generation
- Backward compatible with legacy {provider, model} settings format
- Add persistent media storage (internal/media/) replacing temp file deletion
- Add MediaRef type for lightweight media references in session messages
- Refactor media pipeline to use bus.MediaFile{Path, MimeType} across all channels
- Add read_document builtin tool for PDF/DOCX/XLSX analysis via Gemini native API
- Move image sanitization from Telegram to shared agent/media layer
- Add media reload for multi-turn conversations (images from last 5 messages)
- Add reply-to-message media resolution for Telegram (re-download on reply)
- Add media inventory to compaction summary to preserve awareness after truncation
- Fix coreToolSummaries for read_image, read_document, create_image tools
- Add real-time trace update events via WebSocket broadcast
- Improve trace detail UI with media refs and tool result display
Root cause: create_image tool only set ForLLM:"MEDIA:path" but never
populated result.Media. The main agent loop parses MEDIA: prefix via
parseMediaResult(), but the subagent exec loop only checked result.Media
— so media paths were silently lost for all subagent/spawn workflows.
This caused the entire downstream pipeline (task.Media → AnnounceQueue →
PublishInbound → ContentSuffix → session) to receive empty media, making
images invisible in WS chat despite Telegram working fine.
Fixes:
- create_image.go: set result.Media = []string{imagePath}
- subagent_exec.go: add MEDIA: prefix fallback parsing as safety net
- Use errors.Is() instead of direct sentinel comparison (13 instances)
- Convert if/else-if chains to switch/case for same-variable comparisons
- Remove redundant bitwise OR with zero
- Add post-implementation checklist to CLAUDE.md
- Fix resolvePath for nested non-existent dirs (use resolveThroughExistingAncestors)
- Channel-isolated workspace: user_agent_profiles.workspace stores channel prefix,
used as source of truth with backward compat for existing users
- Loop caches workspace per-user with CacheKindUserWorkspace invalidation via pubsub
- ContractHome/ExpandHome for portable ~-based paths in DB
- create_image saves to workspace/generated/YYYY-MM-DD/ instead of OS temp dir
- SOUL.md template: add ## Expertise section for domain knowledge
- Summoner buildEditPrompt: section guide, complete file output, frontmatter update
- Bus: Topic* constants for Subscribe/Broadcast keys, CacheKind* for payload kinds
- Teams, delegates, sessions, agent links: various enhancements
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
create_image exclusively used /chat/completions with modalities:["image","text"]
which only works on OpenRouter. Gemini returns HTTP 400:
"Image generation is not yet supported on the chat.completions endpoint"
OpenAI's DALL-E models also require /images/generations, not /chat/completions.
Fix: route OpenRouter through /chat/completions (supports modalities),
route all other providers (Gemini, OpenAI, etc.) through the standard
/images/generations endpoint with response_format:"b64_json".
Also update default Gemini model from deprecated gemini-2.0-flash-exp
to gemini-2.5-flash-image.