Cherry-picked valuable changes from PR #206:
- hasReadImageProvider supports chain format {"providers":[...]} config
- create_image/video/audio verify that the file persisted after write, with diagnostic logging
- HistoryEntry gains Media field + CollectMedia() for group media context on @mention
- Zalo extractContentAndMedia refactored: all media types via DetectMIMEType/BuildMediaTags, 20MB limit
- Discord/Zalo pass media paths to Record() and collect historical media on @mention
- Zalo send_helpers logs directory contents when checkFileSize stat fails
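As a rough illustration of the HistoryEntry.Media / CollectMedia() idea above, here is a minimal sketch. The Sender and Text fields, the free-function form of CollectMedia, and the de-duplication step are assumptions made for the example, not taken from the actual code.

```go
package main

import "fmt"

// HistoryEntry is a hypothetical stand-in for the real type.
// Only the Media field is named in the change description.
type HistoryEntry struct {
	Sender string
	Text   string
	Media  []string // local paths of files attached to this message
}

// CollectMedia gathers media paths from all entries, oldest first.
// Skipping empty and duplicate paths is an assumption of this sketch.
func CollectMedia(entries []HistoryEntry) []string {
	seen := make(map[string]bool)
	var paths []string
	for _, e := range entries {
		for _, p := range e.Media {
			if p == "" || seen[p] {
				continue
			}
			seen[p] = true
			paths = append(paths, p)
		}
	}
	return paths
}

func main() {
	history := []HistoryEntry{
		{Sender: "alice", Media: []string{"/tmp/photo.jpg"}},
		{Sender: "bob", Text: "what is in that photo?"},
		{Sender: "carol", Media: []string{"/tmp/voice.ogg"}},
	}
	// On @mention, the collected paths would be handed to the agent as context.
	fmt.Println(CollectMedia(history))
}
```

Keeping the oldest-first order means the most recent file ends up last in the slice.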
Media tools (create_image, create_video, create_audio, read_audio,
read_video, read_document) routed API calls by pattern-matching on the
provider name (e.g. strings.HasPrefix(name, "gemini")). This breaks
when users give DB providers custom names: a Gemini provider named
"chatgpt-sap-het" would be misrouted to the OpenAI-compat endpoint,
causing 404 errors.
Fix: carry the DB provider_type through OpenAIProvider, resolve it via
the typedProvider interface in ExecuteWithChain, and inject it as the
_provider_type param for callProvider routing. The name-based heuristic
is kept as a fallback for config-file providers that have no DB type.
Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
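The routing fix described above can be sketched roughly as follows. The method name ProviderType(), the namedProvider interface, the two stub provider types, and the type strings "gemini" and "openai_compat" are all assumptions for illustration; only typedProvider and the HasPrefix heuristic come from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// namedProvider is the minimum every provider exposes in this sketch.
type namedProvider interface {
	Name() string
}

// typedProvider is implemented only by providers that carry a DB provider_type.
// The method name ProviderType is an assumption.
type typedProvider interface {
	ProviderType() string
}

// dbProvider stands in for a DB-backed provider with a user-chosen name.
type dbProvider struct{ name, providerType string }

func (p dbProvider) Name() string         { return p.name }
func (p dbProvider) ProviderType() string { return p.providerType }

// configProvider stands in for a config-file provider with no DB type.
type configProvider struct{ name string }

func (p configProvider) Name() string { return p.name }

// resolveProviderType prefers the explicit type and only falls back to the
// name-based heuristic when no type is available.
func resolveProviderType(p namedProvider) string {
	if tp, ok := p.(typedProvider); ok && tp.ProviderType() != "" {
		return tp.ProviderType()
	}
	if strings.HasPrefix(strings.ToLower(p.Name()), "gemini") {
		return "gemini"
	}
	return "openai_compat"
}

func main() {
	// A Gemini provider with a misleading custom name is still routed by type.
	fmt.Println(resolveProviderType(dbProvider{"chatgpt-sap-het", "gemini"}))
	// Config-file providers keep the old name-based behavior.
	fmt.Println(resolveProviderType(configProvider{"gemini-flash"}))
	fmt.Println(resolveProviderType(configProvider{"my-openai"}))
}
```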
ExecuteWithChain previously required all providers to implement
credentialProvider (APIKey/APIBase). OAuth-based providers like
CodexProvider (ChatGPT OAuth) don't expose static credentials,
causing all media tools to fail with "does not expose API credentials".
Make credentialProvider optional (nil when unsupported). When
credentials are unavailable, callProvider falls back to the
provider's Chat() API. Generation tools (create_image, create_video,
create_audio) are the exception: they require direct API access and
have no Chat fallback, so they return a clear error instead.
Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
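One way to picture the optional-credentials behavior is the sketch below. The Chat() signature, the stub provider types, and the helper names credentialsOf, analyze, and generate are hypothetical; only credentialProvider (APIKey/APIBase) and the Chat fallback come from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// chatProvider is what every provider offers in this sketch (signature assumed).
type chatProvider interface {
	Chat(prompt string) (string, error)
}

// credentialProvider is optional: OAuth-based providers do not implement it.
type credentialProvider interface {
	APIKey() string
	APIBase() string
}

// keyedProvider stands in for a provider with static credentials.
type keyedProvider struct{ key, base string }

func (p keyedProvider) Chat(prompt string) (string, error) { return "chat: " + prompt, nil }
func (p keyedProvider) APIKey() string                     { return p.key }
func (p keyedProvider) APIBase() string                    { return p.base }

// oauthProvider stands in for an OAuth provider with no static credentials.
type oauthProvider struct{}

func (oauthProvider) Chat(prompt string) (string, error) { return "chat: " + prompt, nil }

var errNoCredentials = errors.New("provider does not expose API credentials; generation requires direct API access")

// credentialsOf returns nil when the provider has no static credentials.
func credentialsOf(p chatProvider) credentialProvider {
	cp, _ := p.(credentialProvider)
	return cp
}

// analyze models a read_* tool: direct API when possible, Chat() otherwise.
func analyze(p chatProvider, prompt string) (string, error) {
	if cp := credentialsOf(p); cp != nil {
		return "direct call to " + cp.APIBase(), nil // placeholder for the real HTTP call
	}
	return p.Chat(prompt)
}

// generate models a create_* tool: there is no Chat fallback, so it errors.
func generate(p chatProvider, prompt string) (string, error) {
	cp := credentialsOf(p)
	if cp == nil {
		return "", errNoCredentials
	}
	return "direct call to " + cp.APIBase(), nil // placeholder for the real HTTP call
}

func main() {
	fmt.Println(analyze(oauthProvider{}, "describe this audio"))
	fmt.Println(generate(oauthProvider{}, "a cat on a bike"))
}
```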
- Update go.mod and Dockerfile to Go 1.26
- Apply `go fix ./...` stdlib modernizations across 170+ files
- Add `go fix` to post-implementation checklist in CLAUDE.md
- Fix a rewrite that `go fix` misapplied in loop_history.go
Refactor create_image and create_video to use a shared provider chain system.
Each tool now supports an ordered list of providers with per-entry timeout,
max retries, and provider-specific params. Includes MiniMax and DashScope
image/video generation implementations.
- New media_provider_chain.go: shared chain resolution, retry execution, limitedReadAll
- create_image: refactored to ExecuteWithChain, added MiniMax + DashScope providers
- create_video: refactored to ExecuteWithChain, added MiniMax async video generation
- Backward compatible with legacy {provider, model} settings format
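A minimal sketch of how an ordered chain with per-entry timeout and retries, plus the legacy {provider, model} format, might fit together. The JSON field names "timeout" and "max_retries", the helper names, and the placeholder model names are assumptions; this is not the contents of media_provider_chain.go.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// chainEntry is one provider in the ordered chain (field names assumed).
type chainEntry struct {
	Provider   string         `json:"provider"`
	Model      string         `json:"model"`
	TimeoutSec int            `json:"timeout"`
	MaxRetries int            `json:"max_retries"`
	Params     map[string]any `json:"params,omitempty"`
}

// chainSettings accepts both the chain format and the legacy format.
type chainSettings struct {
	Providers []chainEntry `json:"providers"`
	Provider  string       `json:"provider"` // legacy
	Model     string       `json:"model"`    // legacy
}

// resolveChain turns either settings format into an ordered chain.
func resolveChain(raw []byte) ([]chainEntry, error) {
	var s chainSettings
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	if len(s.Providers) > 0 {
		return s.Providers, nil
	}
	if s.Provider != "" {
		return []chainEntry{{Provider: s.Provider, Model: s.Model}}, nil
	}
	return nil, nil
}

type callFunc func(ctx context.Context, e chainEntry) ([]byte, error)

// callOnce applies the entry's timeout (if any) to a single attempt.
func callOnce(ctx context.Context, e chainEntry, call callFunc) ([]byte, error) {
	if e.TimeoutSec > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, time.Duration(e.TimeoutSec)*time.Second)
		defer cancel()
	}
	return call(ctx, e)
}

// executeChain tries each entry in order, making 1+MaxRetries attempts per entry.
func executeChain(ctx context.Context, chain []chainEntry, call callFunc) ([]byte, error) {
	lastErr := errors.New("no providers configured")
	for _, e := range chain {
		for attempt := 0; attempt <= e.MaxRetries; attempt++ {
			out, err := callOnce(ctx, e, call)
			if err == nil {
				return out, nil
			}
			lastErr = fmt.Errorf("%s/%s attempt %d: %w", e.Provider, e.Model, attempt+1, err)
		}
	}
	return nil, lastErr
}

// runDemo simulates a failing first provider and a working second one.
func runDemo() (string, int, error) {
	chain, err := resolveChain([]byte(`{"providers":[
		{"provider":"minimax","model":"m1","timeout":5,"max_retries":1},
		{"provider":"gemini","model":"g1","timeout":5}
	]}`))
	if err != nil {
		return "", 0, err
	}
	calls := 0
	out, err := executeChain(context.Background(), chain, func(_ context.Context, e chainEntry) ([]byte, error) {
		calls++
		if e.Provider == "minimax" {
			return nil, errors.New("simulated upstream failure")
		}
		return []byte("image from " + e.Provider), nil
	})
	return string(out), calls, err
}

func main() {
	out, calls, err := runDemo()
	fmt.Println(out, calls, err)
}
```

Treating a legacy setting as a one-entry chain keeps a single execution path for both formats.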
- Fix create_video: use predictLongRunning API instead of generateContent
(async polling flow: POST → poll every 10s → download video from URI)
- Fix durationSeconds: send it as an int (not a string), as the Gemini API actually requires
- Fix MediaRef collection order: historical first, current last, so
refs[len-1] always returns the most recent file (fixes read_audio
picking up old file instead of current voice message)
- Remove misleading "video not yet supported" text from Telegram handler
that prevented LLM from calling read_video tool
- Add isNonChatModel() to skip chat-based verify for generation models
(veo-*, dall-e-*, imagen-*, gemini-*-image)
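The async flow mentioned for create_video (start the job, poll at a fixed interval, then take the resulting URI) could look roughly like this. The operation struct, pollOperation, and simulatePoll are hypothetical names, and the fetch callback stands in for the real HTTP status request; the interval is a parameter so the sketch runs quickly, whereas the fix above polls every 10s.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// operation is a hypothetical, minimal view of a long-running job's status.
type operation struct {
	Done     bool
	VideoURI string
	Err      string
}

// pollOperation calls fetch until the job is done, waiting interval between
// checks and stopping early if ctx is cancelled or times out.
func pollOperation(ctx context.Context, interval time.Duration, fetch func(context.Context) (operation, error)) (string, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		op, err := fetch(ctx)
		if err != nil {
			return "", err
		}
		if op.Done {
			if op.Err != "" {
				return "", errors.New(op.Err)
			}
			return op.VideoURI, nil
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-ticker.C:
		}
	}
}

// simulatePoll fakes a job that reports done on the readyAfter-th status check.
func simulatePoll(readyAfter int) (string, int, error) {
	calls := 0
	fetch := func(context.Context) (operation, error) {
		calls++
		if calls < readyAfter {
			return operation{}, nil
		}
		return operation{Done: true, VideoURI: "https://example.invalid/video.mp4"}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	uri, err := pollOperation(ctx, time.Millisecond, fetch)
	return uri, calls, err
}

func main() {
	uri, calls, err := simulatePoll(3)
	fmt.Println(uri, calls, err)
}
```

Bounding the loop with a context deadline keeps a stuck job from blocking the tool indefinitely.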
- Add read_audio tool with Gemini File API, OpenAI input_audio, and fallback support
- Add read_video tool with Gemini File API and base64 fallback for video analysis
- Add create_video tool with Gemini Veo and OpenRouter chat completions support
- Add shared gemini_file_api.go for upload → poll → generateContent pipeline
- Add shared openai_compat_call.go for custom JSON chat completions
- Fix system prompt showing denied tools: use filteredToolNames() instead of tools.List()
- Wire audio/video MediaRef context propagation in agent loop
- Register new tools in seed data, policy groups, and web UI settings
- Enforce duration (max 30s) and aspect_ratio limits on create_video
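For the create_video limits in the last item, a guard along these lines is one possibility. Only the 30s maximum comes from the description; the set of allowed aspect ratios, the function name, and the choice to reject rather than clamp out-of-range values are assumptions.

```go
package main

import "fmt"

// maxVideoDurationSec is the 30s cap named in the change description.
const maxVideoDurationSec = 30

// allowedAspectRatios is an assumed set; the real list may differ.
var allowedAspectRatios = map[string]bool{"16:9": true, "9:16": true}

// validateVideoParams rejects out-of-range values instead of clamping them
// (an assumption of this sketch).
func validateVideoParams(durationSec int, aspectRatio string) error {
	if durationSec < 1 || durationSec > maxVideoDurationSec {
		return fmt.Errorf("duration must be between 1 and %d seconds, got %d", maxVideoDurationSec, durationSec)
	}
	if !allowedAspectRatios[aspectRatio] {
		return fmt.Errorf("unsupported aspect_ratio %q", aspectRatio)
	}
	return nil
}

func main() {
	fmt.Println(validateVideoParams(8, "16:9"))
	fmt.Println(validateVideoParams(45, "16:9"))
	fmt.Println(validateVideoParams(8, "4:3"))
}
```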