* Add Azure OpenAI assistant features cost tracking
Implements cost tracking for Azure's new assistant features:
- File Search: $0.1 USD per 1 GB/Day (storage-based pricing)
- Code Interpreter: $0.03 USD per session
- Computer Use: $0.003 input + $0.012 output per 1K tokens
Features:
- Provider-specific pricing (Azure vs OpenAI)
- Model-specific pricing overrides via JSON config
- Environment variable configuration
- Backwards compatible with existing OpenAI pricing
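A minimal sketch of how the rates listed above combine, assuming the usage figures are already known. The function and parameter names are illustrative only and are not the actual LiteLLM API:

```python
from typing import Optional

# Rates quoted in this commit message (USD).
FILE_SEARCH_PER_GB_DAY = 0.1
CODE_INTERPRETER_PER_SESSION = 0.03
COMPUTER_USE_INPUT_PER_1K = 0.003
COMPUTER_USE_OUTPUT_PER_1K = 0.012


def azure_assistant_cost(
    storage_gb: Optional[float] = None,
    days: Optional[float] = None,
    sessions: Optional[int] = None,
    input_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
) -> float:
    """Hypothetical helper: sum the three feature costs, treating None as 0."""
    file_search = (storage_gb or 0) * (days or 0) * FILE_SEARCH_PER_GB_DAY
    code_interpreter = (sessions or 0) * CODE_INTERPRETER_PER_SESSION
    computer_use = (
        (input_tokens or 0) / 1000 * COMPUTER_USE_INPUT_PER_1K
        + (output_tokens or 0) / 1000 * COMPUTER_USE_OUTPUT_PER_1K
    )
    return file_search + code_interpreter + computer_use
```

Treating None as zero mirrors the edge-case handling covered by the tests in the next commit.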
* Add comprehensive tests for Azure assistant features cost tracking
- Unit tests for file search, code interpreter, computer use, vector store
- Integration tests for combined cost calculation
- Provider-specific pricing tests (Azure vs OpenAI)
- Model-specific pricing override tests
- Edge case handling (None inputs, zero values)
- All 17 tests passing
* Fix test and ensure all Azure assistant cost tracking tests pass
- Fixed integration test approach
- All 17 tests now passing
- Comprehensive coverage of Azure assistant features cost tracking
* Enhance cost tracking for Azure assistant features
- Safely convert and extract parameters for file search, computer use, and code interpreter sessions.
- Ensure model_info is consistently converted to a dictionary format.
- Improve error handling for input values to prevent type-related issues.
- Maintain compatibility with existing cost calculation methods.
* Refactor cost tracking for Azure assistant features
- Introduced separate methods for handling costs related to web search, file search, vector store, computer use, and code interpreter.
- Enhanced parameter extraction and conversion for file search and computer use.
- Improved error handling and type safety throughout the cost calculation process.
- Maintained compatibility with existing cost calculation methods while streamlining the overall structure.
* feat(model_prices_and_context_window.json): add mistral-small-3.2-24b-instruct model with token costs and chat mode support
* fix(model_prices_and_context_window.json): update model paths to include 'openrouter' prefix for mistral-small-3.1 and 3.2
* feat: add citation_cost_per_token and search_queries_cost_per_1000 fields to ModelInfoBase
- Add citation_cost_per_token field to ModelInfoBase for Perplexity citation token costs
- Add search_queries_cost_per_1000 field to ModelInfoBase for Perplexity search query costs
- Update _get_model_info_helper to include these fields in model info responses
- Enables proper cost calculation for Perplexity-specific usage metrics
* feat: update Perplexity sonar-deep-research model pricing configuration
- Update input/output token costs (priced per million tokens)
- Add reasoning token cost (priced per million tokens)
- Add citation_cost_per_token (priced per million tokens, same rate as input)
- Add search_queries_cost_per_1000 at $0.005 per 1000 search queries
- Remove deprecated search_context_cost_per_query structure
- Aligns with Perplexity's updated pricing model for deep research capabilities
* feat: implement Perplexity-specific cost calculator
- Create cost_per_token function for Perplexity provider
- Calculate standard input/output token costs
- Add citation token cost calculation using citation_cost_per_token rate
- Add reasoning token cost calculation with fallback to completion_tokens_details
- Add search query cost calculation using search_queries_cost_per_1000 rate
- Return separate prompt_cost and completion_cost for accurate billing
- Handles all Perplexity-specific usage metrics: citation_tokens, num_search_queries, reasoning_tokens
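The calculation described above can be sketched roughly as follows. The function name is hypothetical, the rates come from whatever model info dict is passed in, the reasoning-rate key name is assumed, and booking citation cost on the prompt side and reasoning/search cost on the completion side is an assumption of this sketch rather than confirmed behaviour:

```python
from typing import Dict, Optional, Tuple


def perplexity_cost_sketch(
    model_info: Dict[str, Optional[float]],
    prompt_tokens: int,
    completion_tokens: int,
    citation_tokens: Optional[int] = None,
    reasoning_tokens: Optional[int] = None,
    num_search_queries: Optional[int] = None,
) -> Tuple[float, float]:
    """Return (prompt_cost, completion_cost) in USD. Illustrative only."""

    def rate(key: str) -> float:
        # Missing or None rates are treated as 0.0.
        return float(model_info.get(key) or 0.0)

    prompt_cost = prompt_tokens * rate("input_cost_per_token")
    prompt_cost += (citation_tokens or 0) * rate("citation_cost_per_token")

    completion_cost = completion_tokens * rate("output_cost_per_token")
    # Key name for the reasoning rate is assumed for this sketch.
    completion_cost += (reasoning_tokens or 0) * rate("output_cost_per_reasoning_token")
    # search_queries_cost_per_1000 is a price per 1000 queries.
    completion_cost += (num_search_queries or 0) / 1000 * rate("search_queries_cost_per_1000")
    return prompt_cost, completion_cost
```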
* feat: integrate Perplexity cost calculator with main cost calculation system
- Import perplexity_cost_per_token function in main cost calculator
- Add perplexity provider case to cost_per_token function
- Enables automatic routing of Perplexity cost calculations to provider-specific logic
- Maintains compatibility with existing cost calculation patterns
- Supports all Perplexity-specific cost metrics through unified interface
* feat: enhance Perplexity response transformation to extract cost-related fields
- Override transform_response method to extract Perplexity-specific usage fields
- Add _enhance_usage_with_perplexity_fields method to process API responses
- Extract citation_tokens from citations array using character-based estimation (~4 chars/token)
- Extract num_search_queries from both usage field and root level with priority handling
- Create usage object when none exists to ensure cost fields are always captured
- Handle empty citations and missing fields gracefully
- Enables automatic extraction of cost metrics from Perplexity API responses
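A rough sketch of the two extraction steps described above. The helper names are hypothetical, and giving the usage field priority over the root-level field is an assumption, since the message only says that a priority order exists:

```python
from typing import Any, Dict, List, Optional

CHARS_PER_TOKEN = 4  # rough estimate quoted in this commit


def estimate_citation_tokens(citations: Optional[List[str]]) -> int:
    """Estimate tokens from total citation characters (~4 chars/token)."""
    if not citations:
        return 0
    total_chars = sum(len(c) for c in citations if isinstance(c, str))
    return total_chars // CHARS_PER_TOKEN


def extract_num_search_queries(raw_response: Dict[str, Any]) -> Optional[int]:
    """Prefer usage.num_search_queries, then fall back to the root-level field."""
    usage = raw_response.get("usage") or {}
    value = usage.get("num_search_queries")
    if value is None:
        value = raw_response.get("num_search_queries")
    return value
```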
* test: add comprehensive test suite for Perplexity cost calculation features
Add 82 comprehensive tests across 3 test files:
- test_perplexity_cost_calculator.py (59 tests):
* Cost calculation with citation tokens, search queries, reasoning tokens
* Various combinations and edge cases
* Integration with main cost calculator
* Model info access and validation
* Zero values and missing fields handling
- test_perplexity_chat_transformation.py (12 tests):
* Citation token extraction from API responses
* Search query extraction from usage and root fields
* Priority handling and field aggregation
* Empty citations and missing fields handling
* Token estimation accuracy validation
- test_perplexity_integration.py (11 tests):
* End-to-end cost calculation workflows
* High-volume and edge case scenarios
* Model info integration validation
* Case-insensitive provider matching
* Transformation preservation of existing fields
Ensures reliability and correctness of all Perplexity cost features with comprehensive coverage of happy path, edge cases, and error conditions.
* fix: remove unused Union import from Perplexity transformation
- Remove unused typing.Union import from litellm/llms/perplexity/chat/transformation.py
- Fixes F401 linting error: 'typing.Union imported but unused'
- Maintains only necessary imports: Any, List, Optional, Tuple
* Fix JSON schema validation and use web_search_requests field
- Add citation_cost_per_token and search_queries_cost_per_1000 to JSON schema
- Update Perplexity transformation to use web_search_requests in PromptTokensDetailsWrapper
- Update Perplexity cost calculator to read from web_search_requests field
- Maintain backward compatibility while using standard LiteLLM fields
* Fix type errors in Perplexity cost calculator
- Add null checks for token counts and cost values to prevent None multiplication errors
- Use .get() with fallback values instead of direct dictionary access
- Ensure all arithmetic operations handle None values safely
This fixes the type errors reported by failing job 44517525148.
* Refactor Perplexity cost calculation tests to improve accuracy and consistency
- Replace absolute difference assertions with math.isclose for better precision in cost comparisons
- Update tests to utilize PromptTokensDetailsWrapper for handling web search requests
- Ensure all test cases correctly reflect the new structure of usage fields, enhancing clarity and maintainability
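A small illustration of why math.isclose suits cost assertions better than a fixed absolute difference. The values are made up for the example; per-request costs are often far smaller than a typical absolute tolerance:

```python
import math

expected = 3e-7  # hypothetical per-request cost in USD
wrong = 2e-7     # a value that is off by a third

# An absolute tolerance larger than the values themselves hides the error.
assert abs(wrong - expected) < 1e-6
# A relative tolerance catches it.
assert not math.isclose(wrong, expected, rel_tol=1e-9)
# Ordinary floating-point rounding noise is still accepted.
assert math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)
```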
* fix: address type hinting issues in PerplexityChatConfig usage handling
- Add type ignore comments to model_response.usage assignments to resolve type checking errors
- Ensures compatibility with type definitions while maintaining existing functionality
* Update model pricing configuration in JSON backup
- Add citation_cost_per_token and search_queries_cost_per_1000 fields to enhance cost tracking
- Remove deprecated search_context_cost_per_query structure to streamline pricing model
- Aligns with recent updates in Perplexity's pricing strategy
* Update search queries cost structure in model_prices_and_context_window.json to use search_context_cost_per_query
* Refactor search queries cost structure in model_prices_and_context_window_backup.json and update related code to use search_queries_cost_per_query. Remove deprecated search_queries_cost_per_1000 references across model info and tests.
* Enhance cost calculation in cost_calculator.py by introducing a safe float casting function to handle potential None and invalid values. Update cost calculations for input, citation, output, reasoning, and search query tokens to use this new function, ensuring more robust handling of model pricing data.
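A safe float cast of the kind described above might look like the sketch below. The name and default are hypothetical and the real helper may differ:

```python
from typing import Any


def safe_float_cast(value: Any, default: float = 0.0) -> float:
    """Hypothetical helper: coerce a pricing value to float, falling back on bad input."""
    if value is None:
        return default
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers unsupported types (e.g. dict); ValueError covers bad strings.
        return default
```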
* Refactor cost calculation in cost_calculator.py to support both legacy and current search cost keys. Enhance handling of search cost values by accommodating both dictionary and float formats, ensuring robust cost computation for search queries.
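One way to read a per-query search cost from either key and either shape, as a sketch. The helper name, the lookup order of the two keys, and the tier names inside the dict form are all assumptions made for illustration:

```python
from typing import Any, Dict

# Assumed tier names for the dict form; this sketch picks the first one present.
_TIERS = (
    "search_context_size_low",
    "search_context_size_medium",
    "search_context_size_high",
)


def resolve_search_cost_per_query(model_info: Dict[str, Any]) -> float:
    """Hypothetical helper: return the per-query search cost, or 0.0 if unavailable."""
    raw = model_info.get("search_context_cost_per_query")
    if raw is None:
        raw = model_info.get("search_queries_cost_per_query")
    if isinstance(raw, dict):
        for tier in _TIERS:
            if raw.get(tier) is not None:
                raw = raw[tier]
                break
        else:
            raw = None  # dict form with no usable tier
    try:
        return float(raw) if raw is not None else 0.0
    except (TypeError, ValueError):
        return 0.0
```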
* Update test cases to reflect changes in cost structure, renaming search_queries_cost_per_query to search_context_cost_per_query for consistency with recent refactor. Ensure assertions in tests align with updated cost keys.
* Update test_perplexity_integration.py to rename search_queries_cost_per_query to search_context_cost_per_query, ensuring consistency with recent cost structure changes. Adjust assertions to align with updated cost keys.
* build(model_prices_and_context_window.json): mark all gemini-2.5 models as supporting pdf input
Closes https://github.com/BerriAI/litellm/issues/11881
* fix(anthropic_transformation.py): set custom llm provider custom property
Fixes https://github.com/BerriAI/litellm/issues/11861
* test: add unit test for checking supports_reasoning
* test: add test for vertex ai flow
* feat(bedrock/anthropic): ensure thinking param correctly passed for bedrock/invoke
* Added support for reasoning parameters in magistral models, including "reasoning_effort" and "thinking".
* Updated the MistralConfig class to handle reasoning system prompts.
* Implemented tests to verify reasoning functionality and ensure correct parameter mapping for magistral models.
* Enhanced the model prices JSON to reflect new reasoning capabilities.
* Add Deepgram provider to supported providers list and mappings
* add logo
* Add deepgram to model cost map
* ui - require api key for deepgram
* fix logo path
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
* build(model_prices_and_context_window.json): add o3-pro pricing
* build(model_prices_and_context_window.json): add updated o3 model pricing
* build(model_prices_and_context_window.json): add new o3-pro model version
* build(model_prices_and_context_window.json): add 'claude-opus-4' on vertexai (no @)
* build(model_prices_and_context_window.json): add claude sonnet 4 without 2
* feat(internal_user_endpoints.py): assign user to orgs on user creation
Allows a user to be made a member of orgs at creation time; groundwork for enabling default orgs in the UI.
* fix(internal_user_endpoints.py): fix http_request
* fix(vertex_and_google_ai_studio_gemini.py): add web search request tracking
Enables cost calculation for google web search
* fix(vertex_and_gemini): use common processing logic across stream / non-stream calls
* fix(vertex_and_google_ai_studio_gemini.py): fix initial choice
* fix: fix linting error
* fix: add initial support for google search cost tracking
* fix(tool_call_cost_tracking.py): working tool cost tracking for gemini
* fix(vertex_ai/gemini/cost_calculator.py): add google web search tool cost tracking for vertex ai
Closes LIT-210
* fix: fix check
* build(model_prices_and_context_window.json): fix amazon nova max output tokens
Closes https://github.com/BerriAI/litellm/issues/11441
* fix: fix ruff check
* fix: update model path for llama-v3p2-90b-vision-instruct in pricing configuration (missing fireworks_ai/ prefix)
* feat: add deepseek-r1-0528 model configuration to pricing JSON
* feat: add configurations for new Claude 4 model alias to pricing JSON
* undo prefix change
* fix: update supports_response_schema to false in pricing JSON for litellm_provider
* update supports_tool_choice and supports_response_schema
* Update model configuration to disable function calling and tool choice for multiple models in fireworks_ai. Adjusted supported parameters in FireworksAIConfig to conditionally include tools and tool_choice based on model compatibility.
* Refactor FireworksAIConfig to use supports_function_calling from utils
* Enhance FireworksAIConfig to conditionally support tool_choice based on model capabilities
- Updated model_prices_and_context_window.json to include embed-v4.0 with relevant pricing and metadata.
- Added embed-v4.0 to cohere_embedding_models in constants.py.
- Implemented comprehensive tests for Cohere Embed v4.0 in test_cohere.py, covering basic functionality, input types, error handling, and optional parameters.
* Update mistral-medium prices and context sizes
While testing the Mistral model, I noticed a discrepancy in the pricing shown on the logs screen. After reviewing the code, I confirmed that the pricing values were incorrect.
This PR corrects the input and output token pricing for the latest Mistral model and adds the newly released mistral-medium-2505 version.
* Adds tool calling flag to mistral-medium
* Adds mistral-medium price updates to the main model price file
* Update model_prices_and_context_window_backup.json
Sets the mistral-medium alias to the old values, as it probably still points to the old version.
* Update model_prices_and_context_window.json
* Update model_prices_and_context_window_backup.json
* Update model_prices_and_context_window.json