Commit Graph

57 Commits

Author SHA1 Message Date
Dima-Mediator a0d4d0b304 Gemini models: capture image_tokens and support cost_per_output_image_token in costs calculations (#16912) 2025-11-21 19:59:24 -08:00
Jack Cherng 2ab34f9a52 Fix HostedVLLMRerankConfig will not be used (#16352)
* Fix HostedVLLMRerankConfig will not be used

Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>

* Fix no usage statistics in rerank with hosted_vllm

Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>

* Revise typo in comment

Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>

---------

Signed-off-by: Jun-Fei Cherng <jfcherng@realtek.com>
2025-11-07 19:11:59 -08:00
Sameer Kankute 0c743e1adc Add E2E Container API Support (#16136)
* Add v1 cut of container api

* fix lint errors

* Add proxy support to container apis & logging support (#16049)

* Add proxy support to container apis

* Add logging support

* Add cost tracking support for containers and documentation

* Add new constant documentation

* Add container cost in model map

* fix failing azure tests

* Update tests based on model map changes

* fix model map tests

* fix model map tests

* Container mode should be container

* Container tests fix

* Merge branch 'main' into litellm_sameer_oct_staging_2

---------

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
2025-11-01 14:03:51 -07:00
Ishaan Jaffer 33371d18f4 test fix claude-sonnet-4-5-20250929 2025-10-28 19:05:13 -07:00
Ishaan Jaffer 1b49dba1dd fix claude-sonnet-4-5 2025-10-28 17:37:08 -07:00
Krrish Dholakia da3988b768 fix: fix test 2025-10-25 15:43:16 -07:00
Sameer Kankute 0f9996a4d0 Litellm sameer oct staging (#15806)
* Add v2/chat support for cohere

* fix streaming

* Use v2_transformation for logging passthrough:

* Use v2_transformation for logging passthrough:

* Add test for checking if document and citation_options is getting passed

* Update the cohere model

* Add cost tracking for vertex ai passthrough batch jobs

* Add full passthrough support

* refactor code according to the comments

* Add passthrough handler

* remove invalid params

* Updated documentation

* Updated documentation

* Updated documentation

* Correct the import

* Add openai videos generation and retrieval support

* add retrieval endpoint

* Add docs

* Add imports

* remove orjson

* remove double import

* fix openai videos format

* remove mock code

* remove not required comments

* Add tests

* Add tests

* Add other video endpoints

* Fix cost calculation and transformation

* Fixed mypy tests

* remove not used imports

* fix documentation for get batch req (#15742)

* Add grounding info to responses API (#15737)

* Add grounding info to responses API

* fix lint errors

* Use typed objects for annotations

* Use typed objects for annotations

* fix mypy error

* Litellm fix json serialize alerting 2 (#15741)

* fix json serializable error for alerts

* Add test

* fix mypy errors

* fix mypy errors

* Add Qwen3 imported model support for AWS Bedrock (#15783)

* Add qwen imported model support

* fix mypy errors

* fix empty user message error (#15784)

* fix typed dict for list

* Add azure supported videos endpoint

* fix mapped tests

* add azure sora models to model map

* Add OpenAI video generation and content retrieval support (#15745)

* Add openai videos generation and retrieval support

* add retrieval endpoint

* Add docs

* Add imports

* remove orjson

* remove double import

* fix openai videos format

* remove mock code

* remove not required comments

* Add tests

* Add tests

* Add other video endpoints

* Fix cost calculation and transformation

* Fixed mypy tests

* remove not used imports

* fix typed dict for list

* fix mypy errors

* move directory

* make v2 chat default

* Fix mypy tests

* Fix mypy tests

* Fix mypy tests

* Fix mypy tests

* Revert "Add Azure Video Generation Support with Sora Integration"

* refactor videos repo

* add test

* Add azure openai videos support

* Add azure openai videos support

* Add router endpoint support for videos

* fix mypy error

* add azure models

* fix mapped test

* fix mypy error

* Add proxy router test

* Add proxy router test

* remove deprecated model name from tests

* fix import error

* fix import error

* Add guardrail integration in videos endpoint

* Add logging support for videos endpoint

* Add final documentation supporting videos integration

* fix model name and document input

* Update literals to avoid mypy errors

* Remove unused imports and print statements

* revert guardrail support for video generation and video remix

* revert guardrail support for video generation and video remix

* Fix failing mapped and llm translation tests
2025-10-24 12:17:22 -07:00
Ishaan Jaffer 74b8a1dbdf test_aaamodel_prices_and_context_window_json_is_valid 2025-10-23 08:47:08 -07:00
Ishaan Jaffer 8780b1af6f test fix 2025-10-17 18:55:05 -07:00
Sameer Kankute 138bdbb6d8 fix mapped tests 1 (#15445)
* fix mapped tests

* fix mapped tests
2025-10-11 08:33:08 -07:00
Ishaan Jaffer eb72990aa6 test_get_valid_models_with_cli_pattern 2025-09-23 16:46:40 -07:00
Ishaan Jaffer 8016bcb1b9 test fix 2025-09-23 14:41:30 -07:00
Ishaan Jaffer f8c1f519c9 test_aaamodel_prices_and_context_window_json_is_valid 2025-09-23 14:26:38 -07:00
Ishaan Jaff b9ffa98c55 [Feat] Proxy CLI: Create a python method to login using litellm proxy (#14782)
* fix: cli auth with SSO okta

* fix: add LITTELM_CLI_SERVICE_ACCOUNT_NAME

* fix: get_litellm_cli_user_api_key_auth

* use existing_key CLI

* fix: use existing key

* test auth commands

* test_cli_sso_callback_regenerate_vs_create_flow

* feat: add CLI Token Utilities

* fix: get_stored_api_key

* move file

* fix: get_valid_models

* fix config.yaml

* TestCLITokenUtils

* TestGetValidModelsWithCLI

* fix: tie user id to keys created through CLI

* fix: add teams interface to CLI

* add /keys/update to the list client commands

* fix /sso/cli/poll to return the user_id

* fix: working TeamsManagementClient

* fix CLI Login command

* fixes for auth

* Potential fix for code scanning alert no. 3400: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* ruff fix

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-09-22 21:28:38 -07:00
Krrish Dholakia 22d3e492f9 fix(test_utils.py): fix test 2025-09-16 19:17:44 -07:00
Krrish Dholakia e735808614 test(test_utils.py): fix cache creation tokens 2025-09-16 19:03:16 -07:00
Ishaan Jaff 69ef062f55 fix tiered_pricing test 2025-09-11 19:56:44 -07:00
Ishaan Jaff 51d5255452 [Bug]: Azure OpenAI & AI Foundry Reject Image Generation Payload Due to extra_body Injection in LiteLLM v1.76.3 (#14475)
* add request body azure img gen

* fix test_get_optional_params_image_gen_filters_empty_values

* test_azure_image_generation_request_body

* test_azure_image_generation_request_body
2025-09-11 19:39:06 -07:00
Ishaan Jaff 1227b54fa6 test_get_model_info_gemini 2025-09-06 13:43:42 -07:00
Krish Dholakia f67339a86c Merge pull request #14028 from onlylhf/volcengine-embedding-support
Add Volcengine embedding module with handler and transformation logic
2025-09-04 21:01:25 -07:00
Ishaan Jaff 1237be04a5 test_aaamodel_prices_and_context_window_json_is_valid 2025-09-04 07:58:20 -07:00
Ishaan Jaff be7c762882 add video_generation 2025-09-03 18:25:27 -07:00
Ishaan Jaff 61b2209827 test_proxy_function_calling_support_consistency 2025-09-02 07:22:52 -07:00
Ishaan Jaff d37be48a80 test: llama-3.3-70b-versatile 2025-09-01 20:14:12 -07:00
Ishaan Jaff 7656cb3d6e test fix 2025-09-01 17:04:47 -07:00
Ishaan Jaff de6b08e76e test_supports_tool_choice 2025-08-30 10:35:16 -07:00
Ishaan Jaff 562790cb31 fix: /v1/realtime check 2025-08-29 18:38:14 -07:00
李海峰 2d0a57a719 Add Volcengine embedding module with handler and transformation logic
- Implemented VolcEngineEmbeddingHandler for synchronous and asynchronous embedding requests.
- Created VolcEngineEmbeddingConfig for transforming requests and responses to/from Volcengine format.
- Added integration tests for embedding functionality, covering various scenarios including error handling and parameter validation.
- Established test structure for Volcengine embedding, ensuring compliance with LiteLLM testing patterns.
- Included comprehensive tests for parameter mapping, request transformation, and response handling.
2025-08-28 15:05:11 +08:00
Ishaan Jaff 76f78f7b2a fix: LITELLM_LOG_FILE test 2025-08-27 08:25:24 -07:00
Ishaan Jaff ab7efaa832 test_pre_process_non_default_params (#13990) 2025-08-26 19:15:17 -07:00
Krish Dholakia be30bc68ae Merge pull request #13759 from kankute-sameer/litellm_feat_correct_cost_calculations
Add long context support for claude-4-sonnet
2025-08-19 22:30:25 -07:00
Krrish Dholakia 7d09375d52 fix: fix gpt-5-chat mappings 2025-08-19 22:21:00 -07:00
Sameer Kankute 48622a4ee7 add cache above 200k keys in INTENDED_SCHEMA 2025-08-20 01:47:34 +05:30
Krish Dholakia f360e0ead2 Merge pull request #13590 from BerriAI/litellm_bedrock_api_header
[LLM translation] Refactor Anthropic Configurations and Add Support for `anthropic_beta` Headers
2025-08-14 11:32:18 -07:00
Ishaan Jaff 75bcfbb76a [Feat] New model vertex_ai/deepseek-ai/deepseek-r1-0528-maas (#13594)
* add vertex_ai/deepseek-ai/deepseek-r1-0528-maas

* fix init

* test_model_info_for_vertex_ai_deepseek_model
2025-08-13 13:44:45 -07:00
Jugal Bhatt c2310bcccc Refactor Anthropic Configurations in Tests
- Updated test cases to use the renamed `AmazonAnthropicClaudeConfig` instead of `AmazonAnthropicClaude3Config` for consistency with recent changes.
- Adjusted imports and assertions in test files to reflect the new configuration class name.
2025-08-13 11:52:41 -07:00
Ishaan Jaff 7695882d8a test_supports_tool_choice 2025-08-07 16:56:45 -07:00
Jugal D. Bhatt 29a8c583c2 added redis iam auth (#13275) 2025-08-05 10:56:34 -07:00
Jugal D. Bhatt de7108b5f8 input cost per token higher than 1 test (#13270) 2025-08-04 18:02:03 -07:00
Jugal D. Bhatt 3867813277 [Proxy]fix key mgmt (#13148)
* fix key mgmt

* Add unit test
2025-08-01 17:17:15 -07:00
Ishaan Jaff a8371d2cb1 [Feat] Add Google AI Studio Imagen4 model family (#13065)
* add gemini

* add init files

* add get_gemini_image_generation_config

* refactor transform

* TestGoogleImageGen

* fix transform

* fix transform

* add gemini_image_cost_calculator

* add cost tracking for gemini/imagen models

* docs image gen

* docs image gen

* test_get_model_info_gemini
2025-07-28 21:25:40 -07:00
Jugal D. Bhatt c2833e693e clean and verify key before inserting (#12840)
* clean and verify key

* change checking logic

* Add unit test
2025-07-25 10:09:28 -07:00
Ishaan Jaff 2d0187824c test_proxy_model_resolution_with_custom_names_documentation 2025-07-23 13:13:46 -07:00
Ishaan Jaff 00fd020291 fix test 2025-07-23 09:04:09 -07:00
Krish Dholakia 3ad4d9fc3e fix(router.py): use more descriptive error message (#12629)
* fix(router.py): use more descriptive error message

* fix(proxy/_types.py): note `/team/member_update` is a self-managed route

route has its own logic for rbac - enables team admins to update member permissions

Fixes issue where team admins on UI could not update member permissions

* fix(token_counter.py): move log line to being '.debug' instead of '.error'

Fixes https://github.com/BerriAI/litellm/issues/12269
2025-07-15 22:34:20 -07:00
Krish Dholakia 49e9b73fcb Claude 4 Bedrock /invoke route support + Bedrock application inference profile tool choice support (#12599)
* docs(config_settings.md): document enable_json_schema_validation

Closes https://github.com/BerriAI/litellm/issues/12518

* fix(utils.py): add claude-sonnet-4 on bedrock support

Fixes https://github.com/BerriAI/litellm/issues/12366

* refactor(utils.py): move list to getter in function

more maintainable

* fix(utils.py): handle bedrock_converse in provider check

Fixes https://github.com/BerriAI/litellm/issues/11751
2025-07-14 21:42:25 -07:00
Adam Holmberg 38278d9583 fix: make TextCompletionStreamWrapper conversion retain reasoning_content (#12377)
ref: #12375
2025-07-07 17:25:12 -07:00
Cole McIntosh b6589a72c9 fix: Add size parameter support for Vertex AI image generation (#12292)
- Added 'size' to supported parameters for vertex_ai in get_optional_params_image_gen
- Implemented mapping from OpenAI size format (e.g., '1024x1024') to Vertex AI aspectRatio format (e.g., '1:1')
- Supports common aspect ratios: 1:1 (square), 16:9 (landscape), 9:16 (portrait)
- Added comprehensive test coverage for the size parameter mapping

Fixes LIT-279: Vertex AI Image Generation Aspect Ratio Support
2025-07-03 21:43:37 -07:00
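The size mapping described in commit b6589a72c9 can be sketched roughly as follows. This is a minimal illustration, not the actual LiteLLM code: the helper name `size_to_aspect_ratio` is hypothetical, and reducing width and height by their greatest common divisor is an assumed way to derive the ratio. Only the three ratios named in the commit message are accepted.

```python
from math import gcd
from typing import Optional

# Ratios the commit message lists as supported.
SUPPORTED_ASPECT_RATIOS = {"1:1", "16:9", "9:16"}


def size_to_aspect_ratio(size: str) -> Optional[str]:
    """Map an OpenAI-style size such as '1024x1024' to an aspect ratio string.

    Hypothetical helper: returns None when the size is malformed or the
    reduced ratio is not one of the supported values.
    """
    try:
        width_text, height_text = size.lower().split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        # Wrong number of parts or non-numeric dimensions.
        return None
    if width <= 0 or height <= 0:
        return None
    divisor = gcd(width, height)
    ratio = f"{width // divisor}:{height // divisor}"
    return ratio if ratio in SUPPORTED_ASPECT_RATIOS else None
```

Returning `None` for unsupported sizes leaves the caller free to decide whether to raise or fall back to a default; that choice is also an assumption of this sketch.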
Cole McIntosh 8965cc3b6c Fix unpack_defs handling of nested $ref inside anyOf items (#11964)
* refactor(unpack_defs): enhance handling of schema properties and anyOf structures

- Improved the unpack_defs function to handle top-level properties and nested structures more effectively.
- Added recursion for items in schemas and refined the handling of anyOf branches to ensure proper unpacking of references.
- Streamlined the logic for resolving $ref keys and managing nested schemas.

* test(unpack_defs): add test for resolving nested $ref in anyOf items

- Introduced a new test to verify that unpack_defs correctly resolves references within items of anyOf structures, addressing a specific bug scenario (Issue #11372).
- The test includes a minimal schema to ensure proper unpacking and validation of the resolved items schema.

* refactor(unpack_defs): implement a generic resolver for $ref entries

- Redesigned the unpack_defs function to provide a more robust and dependency-free implementation for resolving all $ref entries in JSON schemas.
- Introduced a depth-first traversal method that efficiently handles nested structures, including anyOf, allOf, and items, while avoiding infinite recursion.
- Enhanced memory management by resolving nodes in-place without creating a full dereferenced copy, improving performance and reducing overhead.

* Remove test for unpack_defs resolving nested references in anyOf items from test_utils.py

* Add test for unpack_defs resolving nested references in anyOf items

This commit introduces a new test to ensure that the unpack_defs function correctly resolves $ref references within items of anyOf schemas, addressing issue #11372. The test verifies that the unpacked schema contains the expected properties and structure.
2025-06-24 09:08:15 -07:00
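The depth-first `$ref` resolution described in commit 8965cc3b6c might look something like the sketch below. It is not the real `unpack_defs`: the name `resolve_refs` and its signature are hypothetical, it assumes references of the form `#/$defs/Name`, and for simplicity it returns a new structure rather than resolving nodes in place as the commit message describes. A set of names currently being expanded guards against infinite recursion.

```python
from typing import Any, Dict, Optional, Set


def resolve_refs(
    node: Any, defs: Dict[str, Any], active: Optional[Set[str]] = None
) -> Any:
    """Depth-first replacement of {"$ref": "#/$defs/Name"} nodes.

    `active` holds the definition names being expanded on the current path,
    so a self-referencing schema is left as a $ref instead of looping forever.
    """
    active = set() if active is None else active
    if isinstance(node, list):
        # Covers anyOf / allOf branches, which are lists of schemas.
        return [resolve_refs(item, defs, active) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str):
        name = ref.split("/")[-1]
        if name in active or name not in defs:
            return node  # cyclic or unknown reference: leave untouched
        return resolve_refs(defs[name], defs, active | {name})
    # Covers properties, items, and any other nested mapping.
    return {key: resolve_refs(value, defs, active) for key, value in node.items()}


# A nested $ref inside the items of an anyOf branch, the case the commit targets.
defs = {"Item": {"type": "object", "properties": {"name": {"type": "string"}}}}
schema = {
    "type": "object",
    "properties": {
        "values": {
            "anyOf": [
                {"type": "array", "items": {"$ref": "#/$defs/Item"}},
                {"type": "null"},
            ]
        }
    },
}
resolved = resolve_refs(schema, defs)
```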
Cole McIntosh 02a095d4db feat: implement Perplexity citation tokens and search queries cost calculation (#11938)
* feat: add citation_cost_per_token and search_queries_cost_per_1000 fields to ModelInfoBase

- Add citation_cost_per_token field to ModelInfoBase for Perplexity citation token costs
- Add search_queries_cost_per_1000 field to ModelInfoBase for Perplexity search query costs
- Update _get_model_info_helper to include these fields in model info responses
- Enables proper cost calculation for Perplexity-specific usage metrics

* feat: update Perplexity sonar-deep-research model pricing configuration

- Update input/output token costs to / per million tokens respectively
- Add reasoning token cost at  per million tokens
- Add citation_cost_per_token at  per million tokens (same as input)
- Add search_queries_cost_per_1000 at $0.005 per 1000 search queries
- Remove deprecated search_context_cost_per_query structure
- Aligns with Perplexity's updated pricing model for deep research capabilities

* feat: implement Perplexity-specific cost calculator

- Create cost_per_token function for Perplexity provider
- Calculate standard input/output token costs
- Add citation token cost calculation using citation_cost_per_token rate
- Add reasoning token cost calculation with fallback to completion_tokens_details
- Add search query cost calculation using search_queries_cost_per_1000 rate
- Return separate prompt_cost and completion_cost for accurate billing
- Handles all Perplexity-specific usage metrics: citation_tokens, num_search_queries, reasoning_tokens
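
The calculation listed above could be sketched along these lines. This is an illustration only, not the actual `cost_per_token` implementation: the function name and its flat `rates` dictionary are hypothetical, the key `output_cost_per_reasoning_token` is an assumed name for the reasoning rate, and which of the two returned totals the citation and search-query costs belong to is also an assumption.

```python
from typing import Dict, Tuple


def perplexity_cost_sketch(
    rates: Dict[str, float],
    prompt_tokens: int,
    completion_tokens: int,
    citation_tokens: int = 0,
    reasoning_tokens: int = 0,
    num_search_queries: int = 0,
) -> Tuple[float, float]:
    """Return (prompt_cost, completion_cost) in USD. Missing rates count as 0."""
    # Input side: regular prompt tokens plus citation tokens (assumed grouping).
    prompt_cost = prompt_tokens * rates.get("input_cost_per_token", 0.0)
    prompt_cost += citation_tokens * rates.get("citation_cost_per_token", 0.0)

    # Output side: completion tokens, reasoning tokens, and search queries.
    completion_cost = completion_tokens * rates.get("output_cost_per_token", 0.0)
    completion_cost += reasoning_tokens * rates.get(
        "output_cost_per_reasoning_token", 0.0
    )
    completion_cost += (num_search_queries / 1000) * rates.get(
        "search_queries_cost_per_1000", 0.0
    )
    return prompt_cost, completion_cost
```

Any rates passed in are the caller's responsibility; the sketch does not encode real Perplexity pricing.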

* feat: integrate Perplexity cost calculator with main cost calculation system

- Import perplexity_cost_per_token function in main cost calculator
- Add perplexity provider case to cost_per_token function
- Enables automatic routing of Perplexity cost calculations to provider-specific logic
- Maintains compatibility with existing cost calculation patterns
- Supports all Perplexity-specific cost metrics through unified interface

* feat: enhance Perplexity response transformation to extract cost-related fields

- Override transform_response method to extract Perplexity-specific usage fields
- Add _enhance_usage_with_perplexity_fields method to process API responses
- Extract citation_tokens from citations array using character-based estimation (~4 chars/token)
- Extract num_search_queries from both usage field and root level with priority handling
- Create usage object when none exists to ensure cost fields are always captured
- Handle empty citations and missing fields gracefully
- Enables automatic extraction of cost metrics from Perplexity API responses
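
The character-based estimate mentioned above (~4 chars/token) can be illustrated with a small sketch. The function name is hypothetical, and rounding down with integer division is an assumption; the commit message does not say how fractions are handled.

```python
from typing import List

CHARS_PER_TOKEN = 4  # rough ratio quoted in the commit message


def estimate_citation_tokens(citations: List[str]) -> int:
    """Estimate a token count from the total characters in the citations.

    Non-string entries are skipped, and an empty list yields 0.
    """
    total_chars = sum(len(item) for item in citations if isinstance(item, str))
    return total_chars // CHARS_PER_TOKEN
```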

* test: add comprehensive test suite for Perplexity cost calculation features

Add 82 comprehensive tests across 3 test files:

- test_perplexity_cost_calculator.py (59 tests):
  * Cost calculation with citation tokens, search queries, reasoning tokens
  * Various combinations and edge cases
  * Integration with main cost calculator
  * Model info access and validation
  * Zero values and missing fields handling

- test_perplexity_chat_transformation.py (12 tests):
  * Citation token extraction from API responses
  * Search query extraction from usage and root fields
  * Priority handling and field aggregation
  * Empty citations and missing fields handling
  * Token estimation accuracy validation

- test_perplexity_integration.py (11 tests):
  * End-to-end cost calculation workflows
  * High-volume and edge case scenarios
  * Model info integration validation
  * Case-insensitive provider matching
  * Transformation preservation of existing fields

Ensures reliability and correctness of all Perplexity cost features with comprehensive coverage of happy path, edge cases, and error conditions.

* fix: remove unused Union import from Perplexity transformation

- Remove unused typing.Union import from litellm/llms/perplexity/chat/transformation.py
- Fixes F401 linting error: 'typing.Union imported but unused'
- Maintains only necessary imports: Any, List, Optional, Tuple

* Fix JSON schema validation and use web_search_requests field

- Add citation_cost_per_token and search_queries_cost_per_1000 to JSON schema
- Update Perplexity transformation to use web_search_requests in PromptTokensDetailsWrapper
- Update Perplexity cost calculator to read from web_search_requests field
- Maintain backward compatibility while using standard LiteLLM fields

* Fix type errors in Perplexity cost calculator

- Add null checks for token counts and cost values to prevent None multiplication errors
- Use .get() with fallback values instead of direct dictionary access
- Ensure all arithmetic operations handle None values safely

This fixes the failing job 44517525148 type errors.

* Refactor Perplexity cost calculation tests to improve accuracy and consistency

- Replace absolute difference assertions with math.isclose for better precision in cost comparisons
- Update tests to utilize PromptTokensDetailsWrapper for handling web search requests
- Ensure all test cases correctly reflect the new structure of usage fields, enhancing clarity and maintainability
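
One plausible reason for preferring `math.isclose` in cost tests is that per-token costs are very small, so a fixed absolute tolerance can hide large relative errors. The values and the `1e-6` tolerance below are made up for illustration and are not taken from the test suite.

```python
import math

expected_cost = 2e-7  # hypothetical expected cost in USD
wrong_cost = 3e-7     # 50% too high

# A fixed absolute tolerance accepts the wrong value because both are tiny.
passes_absolute = abs(wrong_cost - expected_cost) < 1e-6

# A relative tolerance scales with the magnitude and rejects it.
passes_relative = math.isclose(wrong_cost, expected_cost, rel_tol=1e-9)
```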

* fix: address type hinting issues in PerplexityChatConfig usage handling

- Add type ignore comments to model_response.usage assignments to resolve type checking errors
- Ensures compatibility with type definitions while maintaining existing functionality

* Update model pricing configuration in JSON backup

- Add citation_cost_per_token and search_queries_cost_per_1000 fields to enhance cost tracking
- Remove deprecated search_context_cost_per_query structure to streamline pricing model
- Aligns with recent updates in Perplexity's pricing strategy

* Update search queries cost structure in model_prices_and_context_window.json to use search_context_cost_per_query

* Refactor search queries cost structure in model_prices_and_context_window_backup.json and update related code to use search_queries_cost_per_query. Remove deprecated search_queries_cost_per_1000 references across model info and tests.

* Enhance cost calculation in cost_calculator.py by introducing a safe float casting function to handle potential None and invalid values. Update cost calculations for input, citation, output, reasoning, and search query tokens to use this new function, ensuring more robust handling of model pricing data.
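
A safe float cast of the kind described above might look like the following sketch. The name `safe_float` and the default of `0.0` are assumptions, not the identifiers used in `cost_calculator.py`.

```python
from typing import Any


def safe_float(value: Any, default: float = 0.0) -> float:
    """Convert value to float, falling back to default for None or bad input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None and other non-numeric types;
        # ValueError covers strings that are not valid numbers.
        return default
```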

* Refactor cost calculation in cost_calculator.py to support both legacy and current search cost keys. Enhance handling of search cost values by accommodating both dictionary and float formats, ensuring robust cost computation for search queries.
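
Supporting two key names and two value shapes could be handled roughly as below. This is a sketch under assumptions: the helper name is hypothetical, the two key names are the ones that appear elsewhere in this commit (which one counts as legacy is not stated here, so the lookup order is arbitrary), and `search_context_size_low` is an assumed sub-key for the dictionary form.

```python
from typing import Any, Mapping

# Key names that appear in this commit; the lookup order is an assumption.
SEARCH_COST_KEYS = ("search_context_cost_per_query", "search_queries_cost_per_query")


def search_cost_per_query(
    model_info: Mapping[str, Any], size_key: str = "search_context_size_low"
) -> float:
    """Return the per-query search cost, accepting a float or a dict of tiers."""
    for key in SEARCH_COST_KEYS:
        raw = model_info.get(key)
        if isinstance(raw, Mapping):
            # Dictionary form: pick the rate for the requested tier.
            raw = raw.get(size_key)
        if isinstance(raw, (int, float)) and not isinstance(raw, bool):
            return float(raw)
    return 0.0
```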

* Update test cases to reflect changes in cost structure, renaming search_queries_cost_per_query to search_context_cost_per_query for consistency with recent refactor. Ensure assertions in tests align with updated cost keys.

* Update test_perplexity_integration.py to rename search_queries_cost_per_query to search_context_cost_per_query, ensuring consistency with recent cost structure changes. Adjust assertions to align with updated cost keys.
2025-06-23 14:15:25 -07:00