Commit Graph

15 Commits

Author SHA1 Message Date
viettranx 120fc2d09c fix(media): chain provider format, post-write verification, group media history (#206)
Cherry-picked valuable changes from PR #206:
- hasReadImageProvider now supports the chain-format {"providers":[...]} config
- create_image/create_video/create_audio verify file persistence after writing, with diagnostic logging
- HistoryEntry gains a Media field and CollectMedia() to supply group media context on @mention
- Zalo extractContentAndMedia refactored: all media types go through DetectMIMEType/BuildMediaTags, with a 20MB limit
- Discord/Zalo pass media paths to Record() and collect historical media on @mention
- Zalo send_helpers logs the directory contents when the checkFileSize stat fails
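A minimal Go sketch of how a provider check in the spirit of hasReadImageProvider could accept both the chain format and the legacy {provider, model} format. The names hasAnyProvider, toolSettings and chainEntry, and the per-entry `provider`/`model` JSON keys, are illustrative assumptions, not the project's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chainEntry is a hypothetical shape for one entry of {"providers":[...]}.
type chainEntry struct {
	Provider string `json:"provider"`
	Model    string `json:"model"`
}

// toolSettings decodes either the chain format or the legacy
// {provider, model} format. Both shapes are assumptions.
type toolSettings struct {
	Providers []chainEntry `json:"providers"` // chain format
	Provider  string       `json:"provider"`  // legacy format
	Model     string       `json:"model"`     // legacy format
}

// hasAnyProvider reports whether the raw settings name at least one provider,
// checking the chain format first and the legacy format second.
func hasAnyProvider(raw []byte) bool {
	var s toolSettings
	if err := json.Unmarshal(raw, &s); err != nil {
		return false
	}
	for _, e := range s.Providers {
		if e.Provider != "" {
			return true
		}
	}
	return s.Provider != ""
}

func main() {
	fmt.Println(hasAnyProvider([]byte(`{"providers":[{"provider":"gemini","model":"some-model"}]}`)))
	fmt.Println(hasAnyProvider([]byte(`{"provider":"openrouter","model":"some-model"}`)))
	fmt.Println(hasAnyProvider([]byte(`{"providers":[]}`)))
}
```

Checking the chain first means a settings blob that carries both shapes is treated as a chain, which matches the chain being the newer format.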
2026-03-18 08:12:10 +07:00
Luan Vu 405a753239 fix: resolve media provider type from DB instead of guessing from name (#154)
Media tools (create_image, create_video, create_audio, read_audio,
read_video, read_document) routed API calls based on provider name
pattern matching (e.g. strings.HasPrefix(name, "gemini")). This breaks
when users give custom names to DB providers: a Gemini provider named
"chatgpt-sap-het" would be misrouted to the OpenAI-compat endpoint,
causing 404 errors.

Fix: carry the DB provider_type through OpenAIProvider, resolve it via the
typedProvider interface in ExecuteWithChain, and inject it as the
_provider_type param used for callProvider routing. The name-based
heuristic is kept as a fallback for config-file providers that have no DB type.
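A minimal sketch of the resolution order described above: prefer the DB type exposed through a typedProvider-style interface, and fall back to the name heuristic only when no type is available. The method name ProviderType(), the dbProvider/configProvider types and the "openai_compat" default label are hypothetical; only typedProvider, _provider_type and the strings.HasPrefix heuristic come from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// typedProvider: a provider that knows its DB provider_type.
// The method name ProviderType is an assumption.
type typedProvider interface {
	ProviderType() string
}

// dbProvider is a hypothetical provider loaded from the database.
type dbProvider struct {
	name, providerType string
}

func (p dbProvider) ProviderType() string { return p.providerType }

// configProvider is a hypothetical config-file provider with no DB type.
type configProvider struct{ name string }

// resolveProviderType prefers the DB type and only falls back to the
// name-based heuristic when no type is available.
func resolveProviderType(p any, name string) string {
	if tp, ok := p.(typedProvider); ok && tp.ProviderType() != "" {
		return tp.ProviderType()
	}
	if strings.HasPrefix(name, "gemini") {
		return "gemini"
	}
	return "openai_compat" // hypothetical default label
}

func main() {
	// A Gemini provider with a custom name is still routed as Gemini.
	p := dbProvider{name: "chatgpt-sap-het", providerType: "gemini"}
	params := map[string]any{"_provider_type": resolveProviderType(p, p.name)}
	fmt.Println(params["_provider_type"])

	// A config-file provider has no DB type, so the name heuristic applies.
	c := configProvider{name: "gemini-flash"}
	fmt.Println(resolveProviderType(c, c.name))
}
```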

Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
2026-03-11 18:32:51 +07:00
Luan Vu 0592be359d fix: remove legacy per-agent imageGen/vision override from tools_config (#153)
The per-agent `imageGen` and `vision` fields in `ToolPolicySpec` (stored
in agents.tools_config JSONB) were added in d5cc5a7 (Feb 26) as the
original way to configure image/vision providers. When the media provider
chain system was introduced in 5815437 (Mar 8), these fields were kept
"for backward compat" but became dead code with no UI to manage them.

This causes a hard-to-debug issue: if an agent's tools_config contains
stale imageGen/vision data (set via the API or left over in the DB), it silently
overrides the global provider chain configured in the builtin tools UI.
Users see the correct chain in the UI but the tool calls a completely
different provider/model, with no indication of why.

Changes:
- Remove Vision and ImageGen fields + struct definitions from ToolPolicySpec
- Remove associated context helpers (WithVisionConfig, WithImageGenConfig, etc.)
- Remove per-agent override injection in agent loop
- Simplify create_image and read_image to use chain as sole source of truth
- UI: whitelist known tools_config fields on save to clean stale DB data

Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
2026-03-11 17:37:55 +07:00
Luan Vu fa5f51e72e fix: allow OAuth providers in media tool chain (read_audio, read_image, etc.) (#150)
ExecuteWithChain previously required all providers to implement
credentialProvider (APIKey/APIBase). OAuth-based providers like
CodexProvider (ChatGPT OAuth) don't expose static credentials,
causing all media tools to fail with "does not expose API credentials".

Make credentialProvider optional (nil when unsupported). Each
callProvider gracefully falls back to the provider's Chat() API
when credentials are unavailable. Generation tools (create_image,
create_video, create_audio) return a clear error since they require
direct API access with no Chat fallback.

Co-authored-by: Luvu182 <208665161+Luvu182@users.noreply.github.com>
2026-03-11 16:40:35 +07:00
viettranx bdb60de7ae chore: upgrade Go 1.25 → 1.26 and apply go fix modernizations
- Update go.mod and Dockerfile to Go 1.26
- Apply `go fix ./...` stdlib modernizations across 170+ files
- Add `go fix` to post-implementation checklist in CLAUDE.md
- Fix a rewrite that go fix misapplied in loop_history.go
2026-03-10 00:09:15 +07:00
viettranx 5815437f78 feat(tools): add media provider chain with ordered fallback and retry
Refactor create_image and create_video to use a shared provider chain system.
Each tool now supports an ordered list of providers with per-entry timeout,
max retries, and provider-specific params. Includes MiniMax and DashScope
image/video generation implementations.

- New media_provider_chain.go: shared chain resolution, retry execution, limitedReadAll
- create_image: refactored to ExecuteWithChain, added MiniMax + DashScope providers
- create_video: refactored to ExecuteWithChain, added MiniMax async video generation
- Backward compatible with legacy {provider, model} settings format
2026-03-08 20:09:43 +07:00
viettranx 0f2737ce53 feat(media): persistent media storage, read_document tool, and pipeline refactor
- Add persistent media storage (internal/media/) replacing temp file deletion
- Add MediaRef type for lightweight media references in session messages
- Refactor media pipeline to use bus.MediaFile{Path, MimeType} across all channels
- Add read_document builtin tool for PDF/DOCX/XLSX analysis via Gemini native API
- Move image sanitization from Telegram to shared agent/media layer
- Add media reload for multi-turn conversations (images from the last 5 messages)
- Add reply-to-message media resolution for Telegram (re-download on reply)
- Add media inventory to compaction summary to preserve awareness after truncation
- Fix coreToolSummaries for read_image, read_document, create_image tools
- Add real-time trace update events via WebSocket broadcast
- Improve trace detail UI with media refs and tool result display
2026-03-08 14:00:34 +07:00
viettranx 96845d1e44 fix(media): set result.Media on create_image and add MEDIA: fallback in subagent exec
Root cause: the create_image tool only set ForLLM:"MEDIA:path" and never
populated result.Media. The main agent loop parses the MEDIA: prefix via
parseMediaResult(), but the subagent exec loop only checked result.Media,
so media paths were silently lost in all subagent/spawn workflows.

As a result, the entire downstream pipeline (task.Media → AnnounceQueue →
PublishInbound → ContentSuffix → session) received empty media, so
images were invisible in WS chat even though Telegram worked fine.

Fixes:
- create_image.go: set result.Media = []string{imagePath}
- subagent_exec.go: add MEDIA: prefix fallback parsing as safety net
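A minimal sketch of the safety net: use result.Media when it is set, otherwise recover paths from "MEDIA:" lines in ForLLM. The toolResult type here keeps only the two fields named in the commit; mediaPaths is a hypothetical helper, not the project's parseMediaResult().

```go
package main

import (
	"fmt"
	"strings"
)

// toolResult is a hypothetical reduction of the tool result type.
type toolResult struct {
	ForLLM string
	Media  []string
}

// mediaPaths returns r.Media when set and otherwise falls back to parsing
// "MEDIA:<path>" lines from ForLLM.
func mediaPaths(r toolResult) []string {
	if len(r.Media) > 0 {
		return r.Media
	}
	var paths []string
	for _, line := range strings.Split(r.ForLLM, "\n") {
		if p, ok := strings.CutPrefix(strings.TrimSpace(line), "MEDIA:"); ok && p != "" {
			paths = append(paths, strings.TrimSpace(p))
		}
	}
	return paths
}

func main() {
	fmt.Println(mediaPaths(toolResult{ForLLM: "MEDIA:/ws/generated/2026-03-08/cat.png"}))
	fmt.Println(mediaPaths(toolResult{ForLLM: "MEDIA:/a.png", Media: []string{"/b.png"}}))
}
```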
2026-03-08 00:06:13 +07:00
viettranx b62d46e50e refactor(lint): apply Go best practices across codebase
- Use errors.Is() instead of direct sentinel comparison (13 instances)
- Convert if/else-if chains to switch/case for same-variable comparisons
- Remove redundant bitwise OR with zero
- Add post-implementation checklist to CLAUDE.md
2026-03-07 20:51:39 +07:00
viettranx 6ed62b8506 feat: channel-isolated workspace, resolvePath fix, create_image workspace, summoner Expertise section, bus Topic constants
- Fix resolvePath for nested non-existent dirs (use resolveThroughExistingAncestors)
- Channel-isolated workspace: user_agent_profiles.workspace stores the channel prefix
  and is used as the source of truth, with backward compatibility for existing users
- Loop caches the workspace per user, with CacheKindUserWorkspace invalidation via pubsub
- ContractHome/ExpandHome for portable ~-based paths in DB
- create_image saves to workspace/generated/YYYY-MM-DD/ instead of OS temp dir
- SOUL.md template: add ## Expertise section for domain knowledge
- Summoner buildEditPrompt: section guide, complete file output, frontmatter update
- Bus: Topic* constants for Subscribe/Broadcast keys, CacheKind* for payload kinds
- Teams, delegates, sessions, agent links: various enhancements
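A minimal sketch of what ContractHome/ExpandHome-style helpers might do to keep DB paths portable: swap a leading home directory for "~" on write and back on read. This is a hypothetical implementation; it takes the home directory as a parameter (a real helper would likely call os.UserHomeDir), assumes a clean home path without a trailing separator, and the example paths assume a Unix-style separator.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// contractHome replaces a leading home directory with "~".
func contractHome(path, home string) string {
	if path == home {
		return "~"
	}
	sep := string(filepath.Separator)
	if rest, ok := strings.CutPrefix(path, home+sep); ok {
		return "~" + sep + rest
	}
	return path // not under home: store as-is
}

// expandHome reverses contractHome.
func expandHome(path, home string) string {
	if path == "~" {
		return home
	}
	if rest, ok := strings.CutPrefix(path, "~"+string(filepath.Separator)); ok {
		return filepath.Join(home, rest)
	}
	return path
}

func main() {
	home := "/home/u"
	stored := contractHome("/home/u/workspace/telegram", home)
	fmt.Println(stored)
	fmt.Println(expandHome(stored, home))
}
```

Matching on home plus a separator, rather than on home alone, avoids contracting a sibling such as /home/user2 when home is /home/u.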

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 10:52:32 +07:00
viettranx 45ea0ee9a4 feat: Add native Gemini image generation support and refine media path stripping in agent output. 2026-02-28 18:44:51 +07:00
Michael 370c290642 fix(tools): use /images/generations endpoint for Gemini and OpenAI image gen (#9)
create_image exclusively used /chat/completions with modalities:["image","text"],
which only works on OpenRouter. Gemini returns HTTP 400:
  "Image generation is not yet supported on the chat.completions endpoint"
OpenAI's DALL-E models also require /images/generations, not /chat/completions.

Fix: route OpenRouter through /chat/completions (supports modalities),
route all other providers (Gemini, OpenAI, etc.) through the standard
/images/generations endpoint with response_format:"b64_json".

Also update default Gemini model from deprecated gemini-2.0-flash-exp
to gemini-2.5-flash-image.
2026-02-28 13:08:32 +07:00
viettranx 86d58e1021 feat: Introduce a new upgrade command and enhance built-in tool settings with provider and model configuration. 2026-02-27 11:38:04 +07:00
viettranx d65d792646 feat: Implement built-in tool management with persistence, API, and UI. 2026-02-27 10:19:19 +07:00
viettranx d5cc5a745d feat: Implement vision capabilities and image generation tools, adding media handling, dedicated configurations, and trace optimization for image data. 2026-02-26 22:28:27 +07:00