* Fix HostedVLLMRerankConfig not being used
Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>
* Fix missing usage statistics in rerank with hosted_vllm
Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>
* Fix typo in comment
Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>
---------
Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>
* Fix Azure GPT-5 incorrectly routing to O-series config
GPT-5 models support reasoning, but they are NOT O-series models and they DO
support the temperature parameter. The previous routing logic in
get_provider_responses_api_config() incorrectly sent Azure GPT-5 requests to
AzureOpenAIOSeriesResponsesAPIConfig, which removes temperature from the supported params.
This fix explicitly excludes GPT-5 models from O-series routing, so they use the
standard AzureOpenAIResponsesAPIConfig, which supports temperature.
Fixes: Azure GPT-5 throwing UnsupportedParamsError for temperature parameter
Tested: Added comprehensive unit tests for GPT-5 and O-series routing
* Apply suggestion from @xingyaoww
* Apply suggestion from @xingyaoww
* Improve Azure routing logic to use broader 'gpt' check for temperature support
Based on feedback from @krrishdholakia, updated the routing logic to check
for 'gpt' in model name instead of specifically 'gpt-5'. This approach is:
- More future-proof: covers all GPT models (gpt-3.5, gpt-4, gpt-5, future models)
- Simpler: single check for all GPT variants
- More maintainable: won't need updates for each new GPT model
Changes:
- litellm/utils.py: Changed from is_gpt5 to is_gpt_model check
- tests: Added comprehensive test for all GPT model variants (gpt-3.5 through gpt-5)
All tests pass:
- GPT models (gpt-3.5-turbo, gpt-4, gpt-4o, gpt-5) -> AzureOpenAIResponsesAPIConfig (supports temperature)
- O-series models (o1, o3) -> AzureOpenAIOSeriesResponsesAPIConfig (no temperature)
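The routing rule described above can be sketched roughly as follows. This is a simplified illustration, not the actual code in litellm/utils.py: the function name `pick_azure_responses_config` and the `supports_reasoning` flag are hypothetical, and it assumes the O-series check was triggered by reasoning support, which the commit text implies but does not spell out.

```python
def pick_azure_responses_config(model: str, supports_reasoning: bool) -> str:
    """Return the name of the config class a model would be routed to.

    Hypothetical helper for illustration only; the real routing lives in
    get_provider_responses_api_config().
    """
    name = model.lower()
    # Broad check: any model with "gpt" in its name keeps temperature support.
    is_gpt_model = "gpt" in name
    # Assumed O-series signal: explicit marker or reasoning support.
    is_o_series = "o_series" in name or supports_reasoning
    if is_o_series and not is_gpt_model:
        return "AzureOpenAIOSeriesResponsesAPIConfig"
    return "AzureOpenAIResponsesAPIConfig"


# GPT-5 supports reasoning but must not be routed to the O-series config.
print(pick_azure_responses_config("gpt-5", supports_reasoning=True))
print(pick_azure_responses_config("o1", supports_reasoning=True))
```

The `not is_gpt_model` guard is what stops reasoning-capable GPT models from losing the temperature parameter.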
Co-authored-by: openhands <openhands@all-hands.dev>
---------
Co-authored-by: openhands <openhands@all-hands.dev>
* feat(guardrails): Add deduplication and session tracking
- Implement deduplication keyed on call_id (adds _check_and_mark_scanned) to prevent duplicate scans caused by the LiteLLM callback system
- Add session tracking using litellm_trace_id as AI Session ID for Prisma AIRS SCM logging
- Extract helper method _extract_prompt_from_request for maintainability
- Use httpxSpecialProvider import (LoggingCallback -> GuardrailCallback)
- Add comprehensive tests for deduplication and session tracking (7 new tests)
- Update documentation with multi-turn conversation tracking examples
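A minimal sketch of call_id-based deduplication, assuming an in-memory set guarded by a lock. The class name `ScanDeduplicator` is hypothetical, and the guardrail's real `_check_and_mark_scanned` may differ, for example in how it bounds memory or expires old entries.

```python
import threading


class ScanDeduplicator:
    """Hypothetical helper: remembers which call_ids were already scanned."""

    def __init__(self) -> None:
        self._scanned: set[str] = set()
        self._lock = threading.Lock()

    def check_and_mark_scanned(self, call_id: str) -> bool:
        """Return True if call_id was seen before; otherwise record it."""
        with self._lock:
            if call_id in self._scanned:
                return True
            self._scanned.add(call_id)
            return False


dedup = ScanDeduplicator()
for call_id in ["abc", "abc", "def"]:
    if dedup.check_and_mark_scanned(call_id):
        continue  # a duplicate callback for the same request; skip the scan
    print(f"scanning {call_id}")
```

Checking and marking in one locked step avoids a race where two callbacks for the same request both pass the check before either records the call_id.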
* docs: update PANW Prisma AIRS multi-turn conversation example to use industry-standard terminology
- Clearer example for conversation tracking
- Updated terminology from 'AI Session ID' to 'Prisma AIRS AI Session ID' for clarity
* fix: remove unused asyncio import
* fix: correct mypy type ignore comment
* Fix embeddings endpoint call_type to use valid CallTypes enum value
Fixed bug where the `/embeddings` endpoint was passing `call_type="embeddings"`
to guardrail hooks, but "embeddings" is not a valid value in the CallTypes enum.
Changed to use `call_type="aembedding"` (async embedding) which is the correct
CallTypes enum value and matches the route_type used in the same function.
Added unit tests to verify:
- "embeddings" is not a valid CallTypes enum value
- "aembedding" is the correct valid value
- The fix prevents ValueError when guardrails are enabled
Fixes #16240
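The failure mode can be illustrated with a reduced stand-in enum. The `CallTypes` class below is a two-member assumption for demonstration, not LiteLLM's full enum, and `is_valid_call_type` is a hypothetical helper.

```python
from enum import Enum


class CallTypes(str, Enum):
    """Reduced stand-in for illustration; the real enum has many more members."""

    embedding = "embedding"
    aembedding = "aembedding"


def is_valid_call_type(value: str) -> bool:
    """Return True if value maps to a CallTypes member."""
    try:
        CallTypes(value)
        return True
    except ValueError:
        # Enum lookup by value raises ValueError for unknown strings.
        return False


print(is_valid_call_type("embeddings"))
print(is_valid_call_type("aembedding"))
```

A guardrail hook that converts the string to the enum would raise ValueError on "embeddings", which is why the endpoint now passes "aembedding".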
* Inline embeddings call type regression check
* Ensure embedding test preserves proxy metadata
## Problem
The `extra_body` parameter in `litellm.responses()` and `litellm.aresponses()`
was being accepted but never passed to the HTTP request sent to the LLM provider.
This prevented users from sending custom/experimental parameters to provider APIs.
## Changes
- Added `data.update(extra_body)` in `async_response_api_handler` (line 2138)
- Added `data.update(extra_body)` in `response_api_handler` (line 2012)
- Added tests to `test_openai_responses_api.py` for extra_body functionality
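As a rough sketch of what the `data.update(extra_body)` step does, the hypothetical helper below merges extra parameters into the request body. The name `build_request_data` is an assumption for illustration; the real handlers build `data` from more inputs.

```python
from typing import Any, Optional


def build_request_data(
    params: dict[str, Any], extra_body: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Hypothetical helper: merge extra_body into the outgoing request body."""
    data = dict(params)  # copy so the caller's dict is not mutated
    if extra_body:
        # Same merge the fix adds; extra_body keys win on conflict.
        data.update(extra_body)
    return data


body = build_request_data(
    {"model": "gpt-4o", "input": "hello"},
    extra_body={"custom_param": "value"},
)
print(body)
```

Because `dict.update` overwrites existing keys, a key in `extra_body` that collides with a standard parameter replaces it in this sketch.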
## Testing
- Tests verify extra_body params are passed in both sync and async modes
- Existing Responses API tests continue to pass
- Manually verified with OpenAI API that custom params are sent correctly
## Impact
Users can now pass custom/experimental parameters via extra_body:
```python
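import asyncio
import litellm

async def main():
    return await litellm.aresponses(
        model="gpt-4o",
        input="hello",
        extra_body={"custom_param": "value"},  # now forwarded to the provider
    )
asyncio.run(main())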
```
This aligns with the OpenAI SDK pattern and matches behavior in other
LiteLLM endpoints (completion, embedding, etc.) that already support extra_body.
* Update MCP version from 1.10.1 to 1.20.0
- Update mcp dependency: 1.10.1 -> 1.20.0 in requirements.txt, pyproject.toml, and CI config
- Update uvicorn dependency: 0.29.0 -> 0.31.1 (required by MCP 1.20.0)
- Update PyJWT constraint to support newer versions required by MCP
- Update all CI pipeline references to MCP 1.20.0
- Add test to verify MCP version and import compatibility
MCP 1.20.0 requires uvicorn >=0.31.1 and PyJWT >=2.10.1.
MCP package remains Python >=3.10 only (no change to version constraint).
* Update poetry.lock for MCP 1.20.0
* Fix bug, add new unit test
* Extract payload builder code to a separate namespace
* Update opik.py to use logic from the new namespace
* Code cleanup, type hints improvements
* Run linter
* Log model name as span field
* Reformat arguments in payload builders
* Use dataclasses for payloads, use opik native client if it's available
* Add cost and provider
* Add provider mapping
- Add HCP_VAULT_MOUNT_NAME env var to override default 'secret' mount
- Add HCP_VAULT_PATH_PREFIX env var to add prefix to secret paths
- Update get_url() method to construct URLs with configurable mount and prefix
- Add test coverage for custom mount names and path prefixes
- Maintain backward compatibility with existing configurations
This allows users to configure Vault paths like:
- Custom mount: {VAULT_ADDR}/v1/{MOUNT_NAME}/data/{SECRET}
- With prefix: {VAULT_ADDR}/v1/secret/data/{PREFIX}/{SECRET}
- Both: {VAULT_ADDR}/v1/{MOUNT_NAME}/data/{PREFIX}/{SECRET}
Resolves issue where mount name was hardcoded and path prefixes weren't supported.
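The URL patterns above could be built along these lines. This is a sketch under assumptions, not the actual get_url() implementation: the function name `build_vault_url` and its parameters are hypothetical, and the real method reads HCP_VAULT_MOUNT_NAME and HCP_VAULT_PATH_PREFIX from the environment.

```python
from typing import Optional


def build_vault_url(
    vault_addr: str,
    secret_name: str,
    mount_name: str = "secret",
    path_prefix: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the documented URL patterns."""
    base = f"{vault_addr.rstrip('/')}/v1/{mount_name}/data"
    if path_prefix:
        # Strip stray slashes so the prefix joins cleanly.
        return f"{base}/{path_prefix.strip('/')}/{secret_name}"
    return f"{base}/{secret_name}"


# Default mount, no prefix: the backward-compatible form.
print(build_vault_url("https://vault.example.com", "api-key"))
# Custom mount and prefix together.
print(build_vault_url("https://vault.example.com", "api-key", "kv", "team-a"))
```

Defaulting `mount_name` to "secret" and leaving the prefix unset reproduces the previous hardcoded URL, which is how existing configurations keep working.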