* Fix HostedVLLMRerankConfig will not be used
Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>
* Fix no usage statistics in rerank with hosted_vllm
Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>
* Revise typo in comment
Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>
---------
Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>
Update documentation to reflect actual API response format:
- Change singular 'image' field to plural 'images' array
- Add complete ImageURLListItem structure with index and type fields
- Update all code examples to use message.images instead of message.image
- Fix streaming examples to access images[0]["image_url"]["url"]
The documentation was incorrectly showing 'image' (singular object)
but the actual implementation returns 'images' (array of ImageURLListItem).
Related to issue #16227
Add "aembedding" to Literal type hints in ProxyLogging methods:
- pre_call_hook overloads (lines 872, 893, 913)
- during_call_hook (line 1052)
- _process_guardrail_callback (line 803)
Add type: ignore comments where ProxyLogging calls CustomLogger
callbacks (lines 1021, 1106) to handle type mismatch between
ProxyLogging's broader Literal (includes "aembedding") and
CustomLogger's narrower Literal (doesn't include "aembedding").
Related to PR #16328 which changed embeddings endpoint to use
call_type="aembedding" for async operations.
* Revert "Initial changes for supporting prompts to multiple models"
This reverts commit 0d8dee4401a410531ddc4a29ec11dc17f7807c4b.
* Add test for the single model select
* Fix Azure GPT-5 incorrectly routing to O-series config
GPT-5 models support reasoning but are NOT O-series models and DO support
temperature parameter. The previous routing logic in get_provider_responses_api_config()
was incorrectly sending Azure GPT-5 requests to AzureOpenAIOSeriesResponsesAPIConfig
which removes temperature from supported params.
This fix explicitly excludes GPT-5 models from O-series routing, ensuring they
use the standard AzureOpenAIResponsesAPIConfig which properly supports temperature.
Fixes: Azure GPT-5 throwing UnsupportedParamsError for temperature parameter
Tested: Added comprehensive unit tests for GPT-5 and O-series routing
* Apply suggestion from @xingyaoww
* Apply suggestion from @xingyaoww
* Improve Azure routing logic to use broader 'gpt' check for temperature support
Based on feedback from @krrishdholakia, updated the routing logic to check
for 'gpt' in model name instead of specifically 'gpt-5'. This approach is:
- More future-proof: covers all GPT models (gpt-3.5, gpt-4, gpt-5, future models)
- Simpler: single check for all GPT variants
- More maintainable: won't need updates for each new GPT model
Changes:
- litellm/utils.py: Changed from is_gpt5 to is_gpt_model check
- tests: Added comprehensive test for all GPT model variants (gpt-3.5 through gpt-5)
All tests pass:
- GPT models (gpt-3.5-turbo, gpt-4, gpt-4o, gpt-5) -> AzureOpenAIResponsesAPIConfig (supports temperature)
- O-series models (o1, o3) -> AzureOpenAIOSeriesResponsesAPIConfig (no temperature)
Co-authored-by: openhands <openhands@all-hands.dev>
---------
Co-authored-by: openhands <openhands@all-hands.dev>
* feat(guardrails): Add deduplication and session tracking
- Implement deduplication logic to prevent duplicate scans (via call_id; add _check_and_mark_scanned) caused by LiteLLM callback system
- Add session tracking using litellm_trace_id as AI Session ID for Prisma AIRS SCM logging
- Extract helper methods _extract_prompt_from_request maintainability
- Use httpxSpecialProvider import (LoggingCallback -> GuardrailCallback)
- Add comprehensive tests for deduplication and session tracking (7 new tests)
- Update documentation with multi-turn conversation tracking examples
* docs: update PANW Prisma AIRS multi-turn conversation example to use industry-standard terminology
- Clearer example for conversation tracking
- Updated terminology from 'AI Session ID' to 'Prisma AIRS AI Session ID' for clarity
* fix: remove unused asyncio import
* fix: correct mypy type ignore comment
* Fix embeddings endpoint call_type to use valid CallTypes enum value
Fixed bug where the `/embeddings` endpoint was passing `call_type="embeddings"`
to guardrail hooks, but "embeddings" is not a valid value in the CallTypes enum.
Changed to use `call_type="aembedding"` (async embedding) which is the correct
CallTypes enum value and matches the route_type used in the same function.
Added unit tests to verify:
- "embeddings" is not a valid CallTypes enum value
- "aembedding" is the correct valid value
- The fix prevents ValueError when guardrails are enabled
Fixes#16240
* Inline embeddings call type regression check
* Ensure embedding test preserves proxy metadata