diff --git a/docs/my-website/docs/observability/callbacks.md b/docs/my-website/docs/observability/callbacks.md
index 69cb0d053e..040d83697d 100644
--- a/docs/my-website/docs/observability/callbacks.md
+++ b/docs/my-website/docs/observability/callbacks.md
@@ -4,9 +4,14 @@
 
 liteLLM provides `input_callbacks`, `success_callbacks` and `failure_callbacks`, making it easy for you to send data to a particular provider depending on the status of your responses.
 
+:::tip
+**New to LiteLLM Callbacks?** Check out our comprehensive [Callback Management Guide](./callback_management.md) to understand when to use different callback hooks like `async_log_success_event` vs `async_post_call_success_hook`.
+:::
+
 liteLLM supports:
 
 - [Custom Callback Functions](https://docs.litellm.ai/docs/observability/custom_callback)
+- [Callback Management Guide](./callback_management.md) - **Comprehensive guide for choosing the right hooks**
 - [Lunary](https://lunary.ai/docs)
 - [Langfuse](https://langfuse.com/docs)
 - [LangSmith](https://www.langchain.com/langsmith)
diff --git a/docs/my-website/docs/observability/custom_callback.md b/docs/my-website/docs/observability/custom_callback.md
index cc586b2e5d..c206c23d0f 100644
--- a/docs/my-website/docs/observability/custom_callback.md
+++ b/docs/my-website/docs/observability/custom_callback.md
@@ -4,7 +4,6 @@
 **For PROXY** [Go Here](../proxy/logging.md#custom-callback-class-async)
 :::
 
-
 ## Callback Class
 You can create a custom callback class to precisely log events as they occur in litellm.
 
@@ -57,6 +56,17 @@ def async completion():
 asyncio.run(completion())
 ```
 
+## Common Hooks
+
+- `async_log_success_event` - Log successful API calls
+- `async_log_failure_event` - Log failed API calls
+- `log_pre_api_call` - Log before API call
+- `log_post_api_call` - Log after API call
+
+**Proxy-only hooks** (only work with LiteLLM Proxy):
+- `async_post_call_success_hook` - Access user data + modify responses
+- `async_pre_call_hook` - Modify requests before sending
+
 ## Callback Functions
 
 If you just want to log on a specific event (e.g. on input) - you can use callback functions.
@@ -174,260 +184,87 @@ async def test_chat_openai():
 asyncio.run(test_chat_openai())
 ```
 
-:::info
+## What's Available in kwargs?
 
-We're actively trying to expand this to other event types. [Tell us if you need this!](https://github.com/BerriAI/litellm/issues/1007)
-:::
-
-## What's in kwargs?
-
-Notice we pass in a kwargs argument to custom callback.
-```python
-def custom_callback(
-    kwargs, # kwargs to completion
-    completion_response, # response from completion
-    start_time, end_time # start/end time
-):
-    # Your custom code here
-    print("LITELLM: in custom callback function")
-    print("kwargs", kwargs)
-    print("completion_response", completion_response)
-    print("start_time", start_time)
-    print("end_time", end_time)
-```
-
-This is a dictionary containing all the model-call details (the params we receive, the values we send to the http endpoint, the response we receive, stacktrace in case of errors, etc.).
-
-This is all logged in the [model_call_details via our Logger](https://github.com/BerriAI/litellm/blob/fc757dc1b47d2eb9d0ea47d6ad224955b705059d/litellm/utils.py#L246).
-
-Here's exactly what you can expect in the kwargs dictionary:
-```shell
-### DEFAULT PARAMS ###
-"model": self.model,
-"messages": self.messages,
-"optional_params": self.optional_params, # model-specific params passed in
-"litellm_params": self.litellm_params, # litellm-specific params passed in (e.g. metadata passed to completion call)
-"start_time": self.start_time, # datetime object of when call was started
-
-### PRE-API CALL PARAMS ### (check via kwargs["log_event_type"]="pre_api_call")
-"input" = input # the exact prompt sent to the LLM API
-"api_key" = api_key # the api key used for that LLM API
-"additional_args" = additional_args # any additional details for that API call (e.g. contains optional params sent)
-
-### POST-API CALL PARAMS ### (check via kwargs["log_event_type"]="post_api_call")
-"original_response" = original_response # the original http response received (saved via response.text)
-
-### ON-SUCCESS PARAMS ### (check via kwargs["log_event_type"]="successful_api_call")
-"complete_streaming_response" = complete_streaming_response # the complete streamed response (only set if `completion(..stream=True)`)
-"end_time" = end_time # datetime object of when call was completed
-
-### ON-FAILURE PARAMS ### (check via kwargs["log_event_type"]="failed_api_call")
-"exception" = exception # the Exception raised
-"traceback_exception" = traceback_exception # the traceback generated via `traceback.format_exc()`
-"end_time" = end_time # datetime object of when call was completed
-```
-
-
-### Cache hits
-
-Cache hits are logged in success events as `kwarg["cache_hit"]`.
-
-Here's an example of accessing it:
-
-  ```python
-  import litellm
-from litellm.integrations.custom_logger import CustomLogger
-from litellm import completion, acompletion, Cache
-
-class MyCustomHandler(CustomLogger):
-    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
-        print(f"On Success")
-        print(f"Value of Cache hit: {kwargs['cache_hit']"})
-
-async def test_async_completion_azure_caching():
-    customHandler_caching = MyCustomHandler()
-    litellm.cache = Cache(type="redis", host=os.environ['REDIS_HOST'], port=os.environ['REDIS_PORT'], password=os.environ['REDIS_PASSWORD'])
-    litellm.callbacks = [customHandler_caching]
-    unique_time = time.time()
-    response1 = await litellm.acompletion(model="azure/chatgpt-v-2",
-                            messages=[{
-                                "role": "user",
-                                "content": f"Hi 👋 - i'm async azure {unique_time}"
-                            }],
-                            caching=True)
-    await asyncio.sleep(1)
-    print(f"customHandler_caching.states pre-cache hit: {customHandler_caching.states}")
-    response2 = await litellm.acompletion(model="azure/chatgpt-v-2",
-                            messages=[{
-                                "role": "user",
-                                "content": f"Hi 👋 - i'm async azure {unique_time}"
-                            }],
-                            caching=True)
-    await asyncio.sleep(1) # success callbacks are done in parallel
-    print(f"customHandler_caching.states post-cache hit: {customHandler_caching.states}")
-    assert len(customHandler_caching.errors) == 0
-    assert len(customHandler_caching.states) == 4 # pre, post, success, success
-  ```
-
-### Get complete streaming response
-
-LiteLLM will pass you the complete streaming response in the final streaming chunk as part of the kwargs for your custom callback function.
+The kwargs dictionary contains all the details about your API call:
 
 ```python
-# litellm.set_verbose = False
-    def custom_callback(
-        kwargs, # kwargs to completion
-        completion_response, # response from completion
-        start_time, end_time # start/end time
-    ):
-        # print(f"streaming response: {completion_response}")
-        if "complete_streaming_response" in kwargs:
-            print(f"Complete Streaming Response: {kwargs['complete_streaming_response']}")
-
-    # Assign the custom callback function
-    litellm.success_callback = [custom_callback]
-
-    response = completion(model="claude-instant-1", messages=messages, stream=True)
-    for idx, chunk in enumerate(response):
-        pass
-```
-
-
-### Log additional metadata
-
-LiteLLM accepts a metadata dictionary in the completion call. You can pass additional metadata into your completion call via `completion(..., metadata={"key": "value"})`.
-
-Since this is a [litellm-specific param](https://github.com/BerriAI/litellm/blob/b6a015404eed8a0fa701e98f4581604629300ee3/litellm/main.py#L235), it's accessible via kwargs["litellm_params"]
-
-```python
-from litellm import completion
-import os, litellm
-
-## set ENV variables
-os.environ["OPENAI_API_KEY"] = "your-api-key"
-
-messages = [{ "content": "Hello, how are you?","role": "user"}]
-
-def custom_callback(
-    kwargs, # kwargs to completion
-    completion_response, # response from completion
-    start_time, end_time # start/end time
-):
-    print(kwargs["litellm_params"]["metadata"])
+def custom_callback(kwargs, completion_response, start_time, end_time):
+    # Access common data
+    model = kwargs.get("model")
+    messages = kwargs.get("messages", [])
+    cost = kwargs.get("response_cost", 0)
+    cache_hit = kwargs.get("cache_hit", False)
 
-
-# Assign the custom callback function
-litellm.success_callback = [custom_callback]
-
-response = litellm.completion(model="gpt-3.5-turbo", messages=messages, metadata={"hello": "world"})
+    # Access metadata you passed in
+    metadata = kwargs.get("litellm_params", {}).get("metadata", {})
 ```
 
-## Examples
+**Key fields in kwargs:**
+- `model` - The model name
+- `messages` - Input messages
+- `response_cost` - Calculated cost
+- `cache_hit` - Whether response was cached
+- `litellm_params.metadata` - Your custom metadata
 
-### Custom Callback to track costs for Streaming + Non-Streaming
-By default, the response cost is accessible in the logging object via `kwargs["response_cost"]` on success (sync + async)
+## Practical Examples
+
+### Track API Costs
 ```python
+def track_cost_callback(kwargs, completion_response, start_time, end_time):
+    cost = kwargs["response_cost"] # litellm calculates this for you
+    print(f"Request cost: ${cost}")
 
-# Step 1. Write your custom callback function
-def track_cost_callback(
-    kwargs, # kwargs to completion
-    completion_response, # response from completion
-    start_time, end_time # start/end time
-):
-    try:
-        response_cost = kwargs["response_cost"] # litellm calculates response cost for you
-        print("regular response_cost", response_cost)
-    except:
-        pass
-
-# Step 2. Assign the custom callback function
 litellm.success_callback = [track_cost_callback]
 
-# Step 3. Make litellm.completion call
-response = completion(
-    model="gpt-3.5-turbo",
-    messages=[
-        {
-            "role": "user",
-            "content": "Hi 👋 - i'm openai"
-        }
-    ]
-)
-
-print(response)
+response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hello"}])
 ```
 
-### Custom Callback to log transformed Input to LLMs
+### Log Inputs to LLMs
 ```python
-def get_transformed_inputs(
-    kwargs,
-):
+def get_transformed_inputs(kwargs):
     params_to_model = kwargs["additional_args"]["complete_input_dict"]
     print("params to model", params_to_model)
 
 litellm.input_callback = [get_transformed_inputs]
 
-def test_chat_openai():
-    try:
-        response = completion(model="claude-2",
-                              messages=[{
-                                  "role": "user",
-                                  "content": "Hi 👋 - i'm openai"
-                              }])
-
-        print(response)
-
-    except Exception as e:
-        print(e)
-        pass
+response = completion(model="claude-2", messages=[{"role": "user", "content": "Hello"}])
 ```
 
-#### Output
-```shell
-params to model {'model': 'claude-2', 'prompt': "\n\nHuman: Hi 👋 - i'm openai\n\nAssistant: ", 'max_tokens_to_sample': 256}
+### Send to External Service
+```python
+import requests
+
+def send_to_analytics(kwargs, completion_response, start_time, end_time):
+    data = {
+        "model": kwargs.get("model"),
+        "cost": kwargs.get("response_cost", 0),
+        "duration": (end_time - start_time).total_seconds()
+    }
+    requests.post("https://your-analytics.com/api", json=data)
+
+litellm.success_callback = [send_to_analytics]
 ```
 
-### Custom Callback to write to Mixpanel
+## Common Issues
+
+### Callback Not Called
+Make sure you:
+1. Register callbacks correctly: `litellm.callbacks = [MyHandler()]`
+2. Use the right hook names (check spelling)
+3. Don't use proxy-only hooks in library mode
+
+### Performance Issues
+- Use async hooks for I/O operations
+- Don't block in callback functions
+- Handle exceptions properly:
 
 ```python
-import mixpanel
-import litellm
-from litellm import completion
-
-def custom_callback(
-    kwargs, # kwargs to completion
-    completion_response, # response from completion
-    start_time, end_time # start/end time
-):
-    # Your custom code here
-    mixpanel.track("LLM Response", {"llm_response": completion_response})
-
-
-# Assign the custom callback function
-litellm.success_callback = [custom_callback]
-
-response = completion(
-    model="gpt-3.5-turbo",
-    messages=[
-        {
-            "role": "user",
-            "content": "Hi 👋 - i'm openai"
-        }
-    ]
-)
-
-print(response)
-
+class SafeHandler(CustomLogger):
+    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
+        try:
+            await external_service(response_obj)
+        except Exception as e:
+            print(f"Callback error: {e}") # Log but don't break the flow
 ```
 
-
-
-
-
-
-
-
-
-
-
-
diff --git a/docs/my-website/docs/proxy/call_hooks.md b/docs/my-website/docs/proxy/call_hooks.md
index b4e22027d1..aef33f8c70 100644
--- a/docs/my-website/docs/proxy/call_hooks.md
+++ b/docs/my-website/docs/proxy/call_hooks.md
@@ -6,6 +6,10 @@ import Image from '@theme/IdealImage';
 - Reject data before making llm api calls / before returning the response
 - Enforce 'user' param for all openai endpoint calls
 
+:::tip
+**Understanding Callback Hooks?** Check out our [Callback Management Guide](../observability/callback_management.md) to understand the differences between proxy-specific hooks like `async_pre_call_hook` and general logging hooks like `async_log_success_event`.
+:::
+
 See a complete example with our [parallel request rate limiter](https://github.com/BerriAI/litellm/blob/main/litellm/proxy/hooks/parallel_request_limiter.py)
 
 ## Quick Start