Commit Graph

8 Commits

Author SHA1 Message Date
viettranx 120fc2d09c fix(media): chain provider format, post-write verification, group media history (#206)
Cherry-picked valuable changes from PR #206:
- hasReadImageProvider supports chain format {"providers":[...]} config
- create_image/create_video/create_audio verify that the file persisted after writing, with diagnostic logging
- HistoryEntry gains a Media field and CollectMedia() to provide group media context on @mention
- Zalo extractContentAndMedia refactored: all media types via DetectMIMEType/BuildMediaTags, 20MB limit
- Discord/Zalo pass media paths to Record() and collect historical media on @mention
- Zalo send_helpers logs directory contents when checkFileSize stat fails
2026-03-18 08:12:10 +07:00
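A minimal Go sketch of accepting both settings shapes (the chain format and the legacy {provider, model} format), assuming each chain entry carries "provider" and "model" keys. hasConfiguredProvider, toolSettings, and chainEntry are hypothetical names, not the project's hasReadImageProvider.

```go
package main

import "encoding/json"

// chainEntry is one provider in the chain format (keys are assumptions).
type chainEntry struct {
	Provider string `json:"provider"`
	Model    string `json:"model"`
}

// toolSettings decodes both the chain format {"providers":[...]}
// and the legacy {"provider": ..., "model": ...} format.
type toolSettings struct {
	Providers []chainEntry `json:"providers"`
	Provider  string       `json:"provider"`
	Model     string       `json:"model"`
}

// hasConfiguredProvider (hypothetical) reports whether the settings name at
// least one provider, checking the chain format before the legacy fields.
func hasConfiguredProvider(raw []byte) bool {
	var s toolSettings
	if err := json.Unmarshal(raw, &s); err != nil {
		return false
	}
	for _, e := range s.Providers {
		if e.Provider != "" {
			return true
		}
	}
	return s.Provider != ""
}
```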
Luan Vu 405a753239 fix: resolve media provider type from DB instead of guessing from name (#154)
Media tools (create_image, create_video, create_audio, read_audio,
read_video, read_document) routed API calls based on provider name
pattern matching (e.g. strings.HasPrefix(name, "gemini")). This breaks
when users give custom names to DB providers — a Gemini provider named
"chatgpt-sap-het" would be misrouted to the OpenAI-compat endpoint,
causing 404 errors.

Fix: carry the DB provider_type through OpenAIProvider, resolve it via
the typedProvider interface in ExecuteWithChain, and inject it as the
_provider_type param for callProvider routing. The name-based heuristic
is kept as a fallback for config-file providers that don't have a DB type.

Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
2026-03-11 18:32:51 +07:00
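A minimal Go sketch of the resolution order described above: prefer the DB type exposed through a typedProvider-style interface, and fall back to the name prefix only when no type is present. The ProviderType() method, the Provider, dbProvider, and configProvider types, and the "gemini"/"openai_compat" values are assumptions for illustration.

```go
package main

import "strings"

// Provider is a minimal stand-in for the project's provider interface (assumed).
type Provider interface {
	Name() string
}

// typedProvider is satisfied by providers that carry a DB provider_type.
type typedProvider interface {
	ProviderType() string
}

// resolveProviderType prefers the DB type and uses the name heuristic only as a fallback.
func resolveProviderType(p Provider) string {
	if tp, ok := p.(typedProvider); ok && tp.ProviderType() != "" {
		return tp.ProviderType()
	}
	if strings.HasPrefix(strings.ToLower(p.Name()), "gemini") {
		return "gemini"
	}
	return "openai_compat"
}

// withProviderType copies params and injects _provider_type for callProvider routing.
func withProviderType(params map[string]any, p Provider) map[string]any {
	out := make(map[string]any, len(params)+1)
	for k, v := range params {
		out[k] = v
	}
	out["_provider_type"] = resolveProviderType(p)
	return out
}

// dbProvider models a DB-backed provider; configProvider has no DB type.
type dbProvider struct{ name, typ string }

func (d dbProvider) Name() string         { return d.name }
func (d dbProvider) ProviderType() string { return d.typ }

type configProvider struct{ name string }

func (c configProvider) Name() string { return c.name }
```

With this order, a Gemini provider named "chatgpt-sap-het" resolves to "gemini" from its DB type, and the prefix check only matters for config-file providers.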
Luan Vu fa5f51e72e fix: allow OAuth providers in media tool chain (read_audio, read_image, etc.) (#150)
ExecuteWithChain previously required all providers to implement
credentialProvider (APIKey/APIBase). OAuth-based providers like
CodexProvider (ChatGPT OAuth) don't expose static credentials,
causing all media tools to fail with "does not expose API credentials".

Make credentialProvider optional (nil when unsupported). Each
callProvider gracefully falls back to the provider's Chat() API
when credentials are unavailable. Generation tools (create_image,
create_video, create_audio) return a clear error since they require
direct API access with no Chat fallback.

Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
2026-03-11 16:40:35 +07:00
viettranx bdb60de7ae chore: upgrade Go 1.25 → 1.26 and apply go fix modernizations
- Update go.mod and Dockerfile to Go 1.26
- Apply `go fix ./...` stdlib modernizations across 170+ files
- Add `go fix` to the post-implementation checklist in CLAUDE.md
- Fix a rewrite that go fix misapplied in loop_history.go
2026-03-10 00:09:15 +07:00
viettranx 9d0af657e5 fix(tools): correct media provider params and UI fixes
- MiniMax audio: fix invalid params (send sample_rate/bitrate as int, auto-set
  instrumental when no lyrics are given, pass duration_seconds)
- MiniMax/DashScope image: map aspect_ratio to provider size format
- Gemini video: read person_generation from chain params instead of hardcoding it
- GetParamInt: add string-to-int coercion for UI select values
- UI: fix combobox portal selection bug, dropdown overflow clipping,
  provider chain form spacing, skill dialog min-height
- Update bitrate options to numeric bps values in the params schema
2026-03-08 22:32:08 +07:00
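A hedged Go sketch of the string-to-int coercion GetParamInt needs when UI select controls send numeric values as strings. The signature here (params map, key, default) is an assumption; the project's helper may differ.

```go
package main

import (
	"strconv"
	"strings"
)

// GetParamInt (signature assumed) returns params[key] as an int. It accepts ints,
// float64 values as produced by encoding/json (truncating any fraction), and
// numeric strings from UI selects. Anything else, including a missing key, yields def.
func GetParamInt(params map[string]any, key string, def int) int {
	switch v := params[key].(type) {
	case int:
		return v
	case int64:
		return int(v)
	case float64:
		return int(v)
	case string:
		if n, err := strconv.Atoi(strings.TrimSpace(v)); err == nil {
			return n
		}
	}
	return def
}
```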
viettranx 5815437f78 feat(tools): add media provider chain with ordered fallback and retry
Refactor create_image and create_video to use a shared provider chain system.
Each tool now supports an ordered list of providers with per-entry timeout,
max retries, and provider-specific params. Includes MiniMax and DashScope
image/video generation implementations.

- New media_provider_chain.go: shared chain resolution, retry execution, limitedReadAll
- create_image: refactored to ExecuteWithChain, added MiniMax + DashScope providers
- create_video: refactored to ExecuteWithChain, added MiniMax async video generation
- Backward compatible with legacy {provider, model} settings format
2026-03-08 20:09:43 +07:00
viettranx e1a6801a7a fix(tools): correct Veo API, media ref ordering, video tag, and model verify
- Fix create_video: use predictLongRunning API instead of generateContent
  (async polling flow: POST → poll every 10s → download video from URI)
- Send durationSeconds as an int (not a string), as the Gemini API actually requires
- Fix MediaRef collection order: historical first, current last, so
  refs[len-1] always returns the most recent file (fixes read_audio
  picking up old file instead of current voice message)
- Remove misleading "video not yet supported" text from the Telegram handler
  that prevented the LLM from calling the read_video tool
- Add isNonChatModel() to skip chat-based verify for generation models
  (veo-*, dall-e-*, imagen-*, gemini-*-image)
2026-03-08 15:21:08 +07:00
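A minimal Go sketch of two rules above: glob-matching generation-model names so the chat-based verify is skipped, and ordering MediaRefs so the newest file is last. It assumes plain glob matching via path.Match and a one-field MediaRef; nonChatPatterns and collectRefs are hypothetical names.

```go
package main

import (
	"path"
	"strings"
)

// nonChatPatterns are the generation-model patterns named in the commit message.
var nonChatPatterns = []string{"veo-*", "dall-e-*", "imagen-*", "gemini-*-image"}

// isNonChatModel reports whether a model matches a generation-only pattern, so a
// chat-based verify call can be skipped. path.Match's '*' matches any run of
// non-'/' characters; malformed patterns are treated as non-matching.
func isNonChatModel(model string) bool {
	m := strings.ToLower(model)
	for _, p := range nonChatPatterns {
		if ok, _ := path.Match(p, m); ok {
			return true
		}
	}
	return false
}

// MediaRef is a one-field stand-in for the project's media reference type.
type MediaRef struct{ Path string }

// collectRefs puts historical refs first and current refs last, so the final
// element is always the newest file (e.g. the voice message just received).
func collectRefs(historical, current []MediaRef) []MediaRef {
	refs := make([]MediaRef, 0, len(historical)+len(current))
	refs = append(refs, historical...)
	return append(refs, current...)
}
```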
viettranx 691ddce8fb feat(tools): add read_audio, read_video, create_video tools and fix system prompt tool filtering
- Add read_audio tool with Gemini File API, OpenAI input_audio, and fallback support
- Add read_video tool with Gemini File API and base64 fallback for video analysis
- Add create_video tool with Gemini Veo and OpenRouter chat completions support
- Add shared gemini_file_api.go for upload → poll → generateContent pipeline
- Add shared openai_compat_call.go for custom JSON chat completions
- Fix system prompt showing denied tools: use filteredToolNames() instead of tools.List()
- Wire audio/video MediaRef context propagation in agent loop
- Register new tools in seed data, policy groups, and web UI settings
- Enforce duration (max 30s) and aspect_ratio limits on create_video
2026-03-08 14:43:18 +07:00
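A hedged Go sketch of the create_video limits. The 30-second cap comes from the commit message; the allowed aspect ratios, rejecting (rather than clamping) out-of-range values, and the name validateVideoParams are assumptions.

```go
package main

import "fmt"

// maxVideoDurationSec is the cap stated in the commit message.
const maxVideoDurationSec = 30

// allowedAspectRatios is an assumed set; the commit does not list the permitted values.
var allowedAspectRatios = map[string]bool{"16:9": true, "9:16": true}

// validateVideoParams rejects non-positive or over-cap durations and unsupported aspect ratios.
func validateVideoParams(durationSec int, aspectRatio string) error {
	if durationSec <= 0 || durationSec > maxVideoDurationSec {
		return fmt.Errorf("duration must be between 1 and %d seconds, got %d", maxVideoDurationSec, durationSec)
	}
	if !allowedAspectRatios[aspectRatio] {
		return fmt.Errorf("unsupported aspect_ratio %q", aspectRatio)
	}
	return nil
}
```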