mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 20:48:32 +00:00
f9407bc036
* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214
The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).
Changes:
- Replace 26 hardcoded references to 888602223428 with 941277531214 across
8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
ARNs, batch execution role ARN, and example proxy config).
- The provisioned-model and imported-model ARNs are referenced only from
mocked unit tests — no AWS resources to recreate.
- The batch execution IAM role has been recreated in the new account with
the same name and equivalent permissions.
- The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
under the same names — see tools/agentcore-deploy/ in a follow-up.
CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.
Smoke-tested locally against the new account:
aws bedrock-runtime converse --region us-west-2 \
--model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
--messages '[{"role":"user","content":[{"text":"ping"}]}]'
→ 200, model returned 'pong'
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes
The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).
Deployed runtimes:
arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy
Both runtimes are status=READY and pass a smoke invoke:
$ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
→ 200, {"result": "echo: ping"}
The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(tests): point Bedrock batch tests at new-account S3 bucket
The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.
Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(tests): point live S3 logging test at new-account bucket
Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.
Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails
The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
- wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
with explicit inputAction=ANONYMIZE so masking applies to INPUT,
which is the source litellm's moderation hook sends)
- ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
to the exact string the tests assert on)
Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously-failing guardrail tests now pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bedrock): migrate legacy models to current inference profiles
The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
- anthropic.claude-3-sonnet-20240229 -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
- anthropic.claude-3-haiku-20240307 -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).
cohere.command-r-plus has no working replacement (all Cohere is legacy-
gated in the new account): swapped to claude-haiku-4-5 in provider-
agnostic param lists. amazon.titan-image-generator skipped (no working
replacement). Mocked/transformation/cost tests that reference the legacy
strings are intentionally left unchanged. Verified live against the new
account.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources
These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
- SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
-> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
- Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)
claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.
Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bedrock): swap/skip legacy-gated models unavailable on new CI account
The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:
- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
active us.anthropic.claude-sonnet-4-5 inference profile.
https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa
* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account
- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
is not authorized on account 941277531214) and migrate the missed
s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
output e2e test.
https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa
* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)
Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
instead of skipping, so the missing entitlement stays visible in CI; they
still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
transform + cost-tracking path stays under test without live model access
https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT
Co-authored-by: Claude <noreply@anthropic.com>
* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells
Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
795 lines
32 KiB
Python
795 lines
32 KiB
Python
import os
|
|
import sys
|
|
import traceback
|
|
|
|
from dotenv import load_dotenv
|
|
|
|
load_dotenv()
|
|
import io
|
|
import os
|
|
|
|
sys.path.insert(
|
|
0, os.path.abspath("../..")
|
|
) # Adds the parent directory to the system path
|
|
import pytest
|
|
from unittest.mock import patch, MagicMock, AsyncMock
|
|
import litellm
|
|
from litellm import RateLimitError, Timeout, completion, completion_cost, embedding
|
|
|
|
litellm.num_retries = 0
|
|
litellm.cache = None
|
|
# litellm.set_verbose=True
|
|
import json
|
|
|
|
# litellm.success_callback = ["langfuse"]
|
|
|
|
|
|
def get_current_weather(location, unit="fahrenheit"):
|
|
"""Get the current weather in a given location"""
|
|
if "tokyo" in location.lower():
|
|
return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
|
|
elif "san francisco" in location.lower():
|
|
return json.dumps(
|
|
{"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"}
|
|
)
|
|
elif "paris" in location.lower():
|
|
return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
|
|
else:
|
|
return json.dumps({"location": location, "temperature": "unknown"})
|
|
|
|
|
|
# Example dummy function hard coded to return the same weather
|
|
|
|
|
|
# In production, this could be your backend API or an external API
|
|
@pytest.mark.parametrize(
|
|
"model",
|
|
[
|
|
"gpt-3.5-turbo-1106",
|
|
"mistral/mistral-large-latest",
|
|
"claude-haiku-4-5-20251001",
|
|
"gemini/gemini-2.5-flash-lite",
|
|
"us.anthropic.claude-sonnet-4-5-20250929-v1:0",
|
|
],
|
|
)
|
|
@pytest.mark.flaky(retries=3, delay=1)
|
|
def test_aaparallel_function_call(model):
|
|
try:
|
|
litellm.set_verbose = True
|
|
litellm.modify_params = True
|
|
# Step 1: send the conversation and available functions to the model
|
|
messages = [
|
|
{
|
|
"role": "user",
|
|
"content": "What's the weather like in San Francisco, Tokyo, and Paris? - give me 3 responses",
|
|
}
|
|
]
|
|
tools = [
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "get_current_weather",
|
|
"description": "Get the current weather in a given location",
|
|
"parameters": {
|
|
"type": "object",
|
|
"properties": {
|
|
"location": {
|
|
"type": "string",
|
|
"description": "The city and state",
|
|
},
|
|
"unit": {
|
|
"type": "string",
|
|
"enum": ["celsius", "fahrenheit"],
|
|
},
|
|
},
|
|
"required": ["location"],
|
|
},
|
|
},
|
|
}
|
|
]
|
|
response = litellm.completion(
|
|
model=model,
|
|
messages=messages,
|
|
tools=tools,
|
|
tool_choice="auto", # auto is default, but we'll be explicit
|
|
)
|
|
print("Response\n", response)
|
|
response_message = response.choices[0].message
|
|
tool_calls = response_message.tool_calls
|
|
|
|
print("Expecting there to be 3 tool calls")
|
|
assert (
|
|
len(tool_calls) > 0
|
|
) # this has to call the function for SF, Tokyo and paris
|
|
|
|
# Step 2: check if the model wanted to call a function
|
|
print(f"tool_calls: {tool_calls}")
|
|
if tool_calls:
|
|
# Step 3: call the function
|
|
# Note: the JSON response may not always be valid; be sure to handle errors
|
|
available_functions = {
|
|
"get_current_weather": get_current_weather,
|
|
} # only one function in this example, but you can have multiple
|
|
messages.append(
|
|
response_message
|
|
) # extend conversation with assistant's reply
|
|
print("Response message\n", response_message)
|
|
# Step 4: send the info for each function call and function response to the model
|
|
for tool_call in tool_calls:
|
|
function_name = tool_call.function.name
|
|
if function_name not in available_functions:
|
|
# the model called a function that does not exist in available_functions - don't try calling anything
|
|
return
|
|
function_to_call = available_functions[function_name]
|
|
function_args = json.loads(tool_call.function.arguments)
|
|
function_response = function_to_call(
|
|
location=function_args.get("location"),
|
|
unit=function_args.get("unit"),
|
|
)
|
|
messages.append(
|
|
{
|
|
"tool_call_id": tool_call.id,
|
|
"role": "tool",
|
|
"name": function_name,
|
|
"content": function_response,
|
|
}
|
|
) # extend conversation with function response
|
|
print(f"messages: {messages}")
|
|
second_response = litellm.completion(
|
|
model=model,
|
|
messages=messages,
|
|
temperature=0.2,
|
|
seed=22,
|
|
# tools=tools,
|
|
drop_params=True,
|
|
) # get a new response from the model where it can see the function response
|
|
print("second response\n", second_response)
|
|
except litellm.InternalServerError as e:
|
|
print(e)
|
|
except litellm.RateLimitError as e:
|
|
print(e)
|
|
except Exception as e:
|
|
pytest.fail(f"Error occurred: {e}")
|
|
|
|
|
|
# test_parallel_function_call()
|
|
|
|
|
|
@pytest.mark.parametrize(
|
|
"model",
|
|
[
|
|
"anthropic/claude-haiku-4-5-20251001",
|
|
"bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0",
|
|
],
|
|
)
|
|
@pytest.mark.flaky(retries=3, delay=1)
|
|
def test_aaparallel_function_call_with_anthropic_thinking(model):
|
|
try:
|
|
litellm._turn_on_debug()
|
|
litellm.modify_params = True
|
|
# Step 1: send the conversation and available functions to the model
|
|
messages = [
|
|
{
|
|
"role": "user",
|
|
"content": "What's the weather like in San Francisco, Tokyo, and Paris? - give me 3 responses",
|
|
}
|
|
]
|
|
tools = [
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "get_current_weather",
|
|
"description": "Get the current weather in a given location",
|
|
"parameters": {
|
|
"type": "object",
|
|
"properties": {
|
|
"location": {
|
|
"type": "string",
|
|
"description": "The city and state",
|
|
},
|
|
"unit": {
|
|
"type": "string",
|
|
"enum": ["celsius", "fahrenheit"],
|
|
},
|
|
},
|
|
"required": ["location"],
|
|
},
|
|
},
|
|
}
|
|
]
|
|
response = litellm.completion(
|
|
model=model,
|
|
messages=messages,
|
|
tools=tools,
|
|
tool_choice="auto", # auto is default, but we'll be explicit
|
|
thinking={"type": "enabled", "budget_tokens": 1024},
|
|
)
|
|
print("Response\n", response)
|
|
response_message = response.choices[0].message
|
|
tool_calls = response_message.tool_calls
|
|
|
|
print("Expecting there to be 3 tool calls")
|
|
assert (
|
|
len(tool_calls) > 0
|
|
) # this has to call the function for SF, Tokyo and paris
|
|
|
|
# Step 2: check if the model wanted to call a function
|
|
print(f"tool_calls: {tool_calls}")
|
|
if tool_calls:
|
|
# Step 3: call the function
|
|
# Note: the JSON response may not always be valid; be sure to handle errors
|
|
available_functions = {
|
|
"get_current_weather": get_current_weather,
|
|
} # only one function in this example, but you can have multiple
|
|
messages.append(
|
|
response_message
|
|
) # extend conversation with assistant's reply
|
|
print("Response message\n", response_message)
|
|
# Step 4: send the info for each function call and function response to the model
|
|
for tool_call in tool_calls:
|
|
function_name = tool_call.function.name
|
|
if function_name not in available_functions:
|
|
# the model called a function that does not exist in available_functions - don't try calling anything
|
|
return
|
|
function_to_call = available_functions[function_name]
|
|
function_args = json.loads(tool_call.function.arguments)
|
|
function_response = function_to_call(
|
|
location=function_args.get("location"),
|
|
unit=function_args.get("unit"),
|
|
)
|
|
messages.append(
|
|
{
|
|
"tool_call_id": tool_call.id,
|
|
"role": "tool",
|
|
"name": function_name,
|
|
"content": function_response,
|
|
}
|
|
) # extend conversation with function response
|
|
print(f"messages: {messages}")
|
|
second_response = litellm.completion(
|
|
model=model,
|
|
messages=messages,
|
|
seed=22,
|
|
# tools=tools,
|
|
drop_params=True,
|
|
thinking={"type": "enabled", "budget_tokens": 1024},
|
|
) # get a new response from the model where it can see the function response
|
|
print("second response\n", second_response)
|
|
|
|
## THIRD RESPONSE
|
|
except litellm.InternalServerError as e:
|
|
print(e)
|
|
except litellm.RateLimitError as e:
|
|
print(e)
|
|
except Exception as e:
|
|
pytest.fail(f"Error occurred: {e}")
|
|
|
|
|
|
from litellm.types.utils import ChatCompletionMessageToolCall, Function, Message
|
|
|
|
_PARALLEL_TOOL_HISTORY_MESSAGES = [
|
|
{
|
|
"role": "user",
|
|
"content": "What's the weather like in San Francisco, Tokyo, and Paris? - give me 3 responses",
|
|
},
|
|
Message(
|
|
content="Here are the current weather conditions for San Francisco, Tokyo, and Paris:",
|
|
role="assistant",
|
|
tool_calls=[
|
|
ChatCompletionMessageToolCall(
|
|
index=1,
|
|
function=Function(
|
|
arguments='{"location": "San Francisco, CA", "unit": "fahrenheit"}',
|
|
name="get_current_weather",
|
|
),
|
|
id="tooluse_Jj98qn6xQlOP_PiQr-w9iA",
|
|
type="function",
|
|
)
|
|
],
|
|
function_call=None,
|
|
),
|
|
{
|
|
"tool_call_id": "tooluse_Jj98qn6xQlOP_PiQr-w9iA",
|
|
"role": "tool",
|
|
"name": "get_current_weather",
|
|
"content": '{"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"}',
|
|
},
|
|
]
|
|
|
|
|
|
@pytest.mark.parametrize(
|
|
"model, messages, expect_unsupported_params_error",
|
|
[
|
|
# Bedrock Converse still requires modify_params to inject the dummy tool.
|
|
(
|
|
"us.anthropic.claude-sonnet-4-5-20250929-v1:0",
|
|
_PARALLEL_TOOL_HISTORY_MESSAGES,
|
|
True,
|
|
),
|
|
# Anthropic Messages API: dummy tool is injected without modify_params.
|
|
(
|
|
"claude-haiku-4-5-20251001",
|
|
_PARALLEL_TOOL_HISTORY_MESSAGES,
|
|
False,
|
|
),
|
|
(
|
|
"us.anthropic.claude-sonnet-4-5-20250929-v1:0",
|
|
[
|
|
{
|
|
"role": "user",
|
|
"content": "What's the weather like in San Francisco, Tokyo, and Paris? - give me 3 responses",
|
|
}
|
|
],
|
|
False,
|
|
),
|
|
(
|
|
"claude-haiku-4-5-20251001",
|
|
[
|
|
{
|
|
"role": "user",
|
|
"content": "What's the weather like in San Francisco, Tokyo, and Paris? - give me 3 responses",
|
|
}
|
|
],
|
|
False,
|
|
),
|
|
],
|
|
)
|
|
def test_parallel_function_call_anthropic_error_msg(
|
|
model, messages, expect_unsupported_params_error
|
|
):
|
|
"""
|
|
Tool history without an explicit ``tools`` param:
|
|
|
|
- Bedrock **Converse** still raises ``UnsupportedParamsError`` unless
|
|
``litellm.modify_params`` is enabled (dummy tool is only added there).
|
|
- **Anthropic** (and Bedrock Invoke via ``AnthropicConfig.transform_request``)
|
|
always get a dummy tool so CLIs work with ``modify_params`` left off.
|
|
|
|
Reference Issue: https://github.com/BerriAI/litellm/issues/5747, https://github.com/BerriAI/litellm/issues/5388
|
|
"""
|
|
# Ensure modify_params is False so Bedrock Converse path still raises.
|
|
# (other tests in this file set it to True and don't reset it)
|
|
original_modify_params = litellm.modify_params
|
|
litellm.modify_params = False
|
|
try:
|
|
litellm.set_verbose = True
|
|
|
|
if expect_unsupported_params_error:
|
|
with pytest.raises(litellm.UnsupportedParamsError) as e:
|
|
second_response = litellm.completion(
|
|
model=model,
|
|
messages=messages,
|
|
temperature=0.2,
|
|
seed=22,
|
|
drop_params=True,
|
|
) # get a new response from the model where it can see the function response
|
|
print("second response\n", second_response)
|
|
else:
|
|
second_response = litellm.completion(
|
|
model=model,
|
|
messages=messages,
|
|
temperature=0.2,
|
|
seed=22,
|
|
drop_params=True,
|
|
) # get a new response from the model where it can see the function response
|
|
print("second response\n", second_response)
|
|
except litellm.InternalServerError as e:
|
|
print(e)
|
|
except litellm.RateLimitError as e:
|
|
print(e)
|
|
except Exception as e:
|
|
pytest.fail(f"Error occurred: {e}")
|
|
finally:
|
|
litellm.modify_params = original_modify_params
|
|
|
|
|
|
def test_parallel_function_call_stream():
|
|
try:
|
|
litellm.set_verbose = True
|
|
# Step 1: send the conversation and available functions to the model
|
|
messages = [
|
|
{
|
|
"role": "user",
|
|
"content": "What's the weather like in San Francisco, Tokyo, and Paris?",
|
|
}
|
|
]
|
|
tools = [
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "get_current_weather",
|
|
"description": "Get the current weather in a given location",
|
|
"parameters": {
|
|
"type": "object",
|
|
"properties": {
|
|
"location": {
|
|
"type": "string",
|
|
"description": "The city and state, e.g. San Francisco, CA",
|
|
},
|
|
"unit": {
|
|
"type": "string",
|
|
"enum": ["celsius", "fahrenheit"],
|
|
},
|
|
},
|
|
"required": ["location"],
|
|
},
|
|
},
|
|
}
|
|
]
|
|
response = litellm.completion(
|
|
model="gpt-3.5-turbo-1106",
|
|
messages=messages,
|
|
tools=tools,
|
|
stream=True,
|
|
tool_choice="auto", # auto is default, but we'll be explicit
|
|
complete_response=True,
|
|
)
|
|
print("Response\n", response)
|
|
# for chunk in response:
|
|
# print(chunk)
|
|
response_message = response.choices[0].message
|
|
tool_calls = response_message.tool_calls
|
|
|
|
print("length of tool calls", len(tool_calls))
|
|
print("Expecting there to be 3 tool calls")
|
|
assert (
|
|
len(tool_calls) > 1
|
|
) # this has to call the function for SF, Tokyo and parise
|
|
|
|
# Step 2: check if the model wanted to call a function
|
|
if tool_calls:
|
|
# Step 3: call the function
|
|
# Note: the JSON response may not always be valid; be sure to handle errors
|
|
available_functions = {
|
|
"get_current_weather": get_current_weather,
|
|
} # only one function in this example, but you can have multiple
|
|
messages.append(
|
|
response_message
|
|
) # extend conversation with assistant's reply
|
|
print("Response message\n", response_message)
|
|
# Step 4: send the info for each function call and function response to the model
|
|
for tool_call in tool_calls:
|
|
function_name = tool_call.function.name
|
|
function_to_call = available_functions[function_name]
|
|
function_args = json.loads(tool_call.function.arguments)
|
|
function_response = function_to_call(
|
|
location=function_args.get("location"),
|
|
unit=function_args.get("unit"),
|
|
)
|
|
messages.append(
|
|
{
|
|
"tool_call_id": tool_call.id,
|
|
"role": "tool",
|
|
"name": function_name,
|
|
"content": function_response,
|
|
}
|
|
) # extend conversation with function response
|
|
print(f"messages: {messages}")
|
|
second_response = litellm.completion(
|
|
model="gpt-3.5-turbo-1106", messages=messages, temperature=0.2, seed=22
|
|
) # get a new response from the model where it can see the function response
|
|
print("second response\n", second_response)
|
|
return second_response
|
|
except Exception as e:
|
|
pytest.fail(f"Error occurred: {e}")
|
|
|
|
|
|
# test_parallel_function_call_stream()
|
|
|
|
|
|
@pytest.mark.skip(
|
|
reason="Flaky test. Groq function calling is not reliable for ci/cd testing."
|
|
)
|
|
def test_groq_parallel_function_call():
|
|
litellm.set_verbose = True
|
|
try:
|
|
# Step 1: send the conversation and available functions to the model
|
|
messages = [
|
|
{
|
|
"role": "system",
|
|
"content": "You are a function calling LLM that uses the data extracted from get_current_weather to answer questions about the weather in San Francisco.",
|
|
},
|
|
{
|
|
"role": "user",
|
|
"content": "What's the weather like in San Francisco?",
|
|
},
|
|
]
|
|
tools = [
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "get_current_weather",
|
|
"description": "Get the current weather in a given location",
|
|
"parameters": {
|
|
"type": "object",
|
|
"properties": {
|
|
"location": {
|
|
"type": "string",
|
|
"description": "The city and state, e.g. San Francisco, CA",
|
|
},
|
|
"unit": {
|
|
"type": "string",
|
|
"enum": ["celsius", "fahrenheit"],
|
|
},
|
|
},
|
|
"required": ["location"],
|
|
},
|
|
},
|
|
}
|
|
]
|
|
response = litellm.completion(
|
|
model="groq/llama2-70b-4096",
|
|
messages=messages,
|
|
tools=tools,
|
|
tool_choice="auto", # auto is default, but we'll be explicit
|
|
)
|
|
print("Response\n", response)
|
|
response_message = response.choices[0].message
|
|
if hasattr(response_message, "tool_calls"):
|
|
tool_calls = response_message.tool_calls
|
|
|
|
assert isinstance(
|
|
response.choices[0].message.tool_calls[0].function.name, str
|
|
)
|
|
assert isinstance(
|
|
response.choices[0].message.tool_calls[0].function.arguments, str
|
|
)
|
|
|
|
print("length of tool calls", len(tool_calls))
|
|
|
|
# Step 2: check if the model wanted to call a function
|
|
if tool_calls:
|
|
# Step 3: call the function
|
|
# Note: the JSON response may not always be valid; be sure to handle errors
|
|
available_functions = {
|
|
"get_current_weather": get_current_weather,
|
|
} # only one function in this example, but you can have multiple
|
|
messages.append(
|
|
response_message
|
|
) # extend conversation with assistant's reply
|
|
print("Response message\n", response_message)
|
|
# Step 4: send the info for each function call and function response to the model
|
|
for tool_call in tool_calls:
|
|
function_name = tool_call.function.name
|
|
function_to_call = available_functions[function_name]
|
|
function_args = json.loads(tool_call.function.arguments)
|
|
function_response = function_to_call(
|
|
location=function_args.get("location"),
|
|
unit=function_args.get("unit"),
|
|
)
|
|
|
|
messages.append(
|
|
{
|
|
"tool_call_id": tool_call.id,
|
|
"role": "tool",
|
|
"name": function_name,
|
|
"content": function_response,
|
|
}
|
|
) # extend conversation with function response
|
|
print(f"messages: {messages}")
|
|
second_response = litellm.completion(
|
|
model="groq/llama2-70b-4096", messages=messages
|
|
) # get a new response from the model where it can see the function response
|
|
print("second response\n", second_response)
|
|
except Exception as e:
|
|
pytest.fail(f"Error occurred: {e}")
|
|
|
|
|
|
@pytest.mark.parametrize(
|
|
"model",
|
|
[
|
|
"bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0",
|
|
],
|
|
)
|
|
def test_passing_tool_result_as_list(model):
|
|
litellm.set_verbose = True
|
|
litellm._turn_on_debug()
|
|
messages = [
|
|
{
|
|
"content": [
|
|
{
|
|
"type": "text",
|
|
"text": "You are a helpful assistant that have the ability to interact with a computer to solve tasks.",
|
|
}
|
|
],
|
|
"role": "system",
|
|
},
|
|
{
|
|
"content": [
|
|
{
|
|
"type": "text",
|
|
"text": "Write a git commit message for the current staging area and commit the changes.",
|
|
}
|
|
],
|
|
"role": "user",
|
|
},
|
|
{
|
|
"content": [
|
|
{
|
|
"type": "text",
|
|
"text": "I'll help you commit the changes. Let me first check the git status to see what changes are staged.",
|
|
}
|
|
],
|
|
"role": "assistant",
|
|
"tool_calls": [
|
|
{
|
|
"index": 1,
|
|
"function": {
|
|
"arguments": '{"command": "git status", "thought": "Checking git status to see staged changes"}',
|
|
"name": "execute_bash",
|
|
},
|
|
"id": "toolu_01V1paXrun4CVetdAGiQaZG5",
|
|
"type": "function",
|
|
}
|
|
],
|
|
},
|
|
{
|
|
"content": [
|
|
{
|
|
"type": "text",
|
|
"text": 'OBSERVATION:\nOn branch master\r\n\r\nNo commits yet\r\n\r\nChanges to be committed:\r\n (use "git rm --cached <file>..." to unstage)\r\n\tnew file: hello.py\r\n\r\n\r\n[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]\nroot@openhands-workspace:/workspace # \n[Command finished with exit code 0]',
|
|
}
|
|
],
|
|
"role": "tool",
|
|
"tool_call_id": "toolu_01V1paXrun4CVetdAGiQaZG5",
|
|
"name": "execute_bash",
|
|
},
|
|
]
|
|
tools = [
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "execute_bash",
|
|
"description": 'Execute a bash command in the terminal.\n* Long running commands: For commands that may run indefinitely, it should be run in the background and the output should be redirected to a file, e.g. command = `python3 app.py > server.log 2>&1 &`.\n* Interactive: If a bash command returns exit code `-1`, this means the process is not yet finished. The assistant must then send a second call to terminal with an empty `command` (which will retrieve any additional logs), or it can send additional text (set `command` to the text) to STDIN of the running process, or it can send command=`ctrl+c` to interrupt the process.\n* Timeout: If a command execution result says "Command timed out. Sending SIGINT to the process", the assistant should retry running the command in the background.\n',
|
|
"parameters": {
|
|
"type": "object",
|
|
"properties": {
|
|
"thought": {
|
|
"type": "string",
|
|
"description": "Reasoning about the action to take.",
|
|
},
|
|
"command": {
|
|
"type": "string",
|
|
"description": "The bash command to execute. Can be empty to view additional logs when previous exit code is `-1`. Can be `ctrl+c` to interrupt the currently running process.",
|
|
},
|
|
},
|
|
"required": ["command"],
|
|
},
|
|
},
|
|
},
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "finish",
|
|
"description": "Finish the interaction.\n* Do this if the task is complete.\n* Do this if the assistant cannot proceed further with the task.\n",
|
|
},
|
|
},
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "str_replace_editor",
|
|
"description": "Custom editing tool for viewing, creating and editing files\n* State is persistent across command calls and discussions with the user\n* If `path` is a file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep\n* The `create` command cannot be used if the specified `path` already exists as a file\n* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`\n* The `undo_edit` command will revert the last edit made to the file at `path`\n\nNotes for using the `str_replace` command:\n* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!\n* If the `old_str` parameter is not unique in the file, the replacement will not be performed. Make sure to include enough context in `old_str` to make it unique\n* The `new_str` parameter should contain the edited lines that should replace the `old_str`\n",
|
|
"parameters": {
|
|
"type": "object",
|
|
"properties": {
|
|
"command": {
|
|
"description": "The commands to run. Allowed options are: `view`, `create`, `str_replace`, `insert`, `undo_edit`.",
|
|
"enum": [
|
|
"view",
|
|
"create",
|
|
"str_replace",
|
|
"insert",
|
|
"undo_edit",
|
|
],
|
|
"type": "string",
|
|
},
|
|
"path": {
|
|
"description": "Absolute path to file or directory, e.g. `/repo/file.py` or `/repo`.",
|
|
"type": "string",
|
|
},
|
|
"file_text": {
|
|
"description": "Required parameter of `create` command, with the content of the file to be created.",
|
|
"type": "string",
|
|
},
|
|
"old_str": {
|
|
"description": "Required parameter of `str_replace` command containing the string in `path` to replace.",
|
|
"type": "string",
|
|
},
|
|
"new_str": {
|
|
"description": "Optional parameter of `str_replace` command containing the new string (if not given, no string will be added). Required parameter of `insert` command containing the string to insert.",
|
|
"type": "string",
|
|
},
|
|
"insert_line": {
|
|
"description": "Required parameter of `insert` command. The `new_str` will be inserted AFTER the line `insert_line` of `path`.",
|
|
"type": "integer",
|
|
},
|
|
"view_range": {
|
|
"description": "Optional parameter of `view` command when `path` points to a file. If none is given, the full file is shown. If provided, the file will be shown in the indicated line number range, e.g. [11, 12] will show lines 11 and 12. Indexing at 1 to start. Setting `[start_line, -1]` shows all lines from `start_line` to the end of the file.",
|
|
"items": {"type": "integer"},
|
|
"type": "array",
|
|
},
|
|
},
|
|
"required": ["command", "path"],
|
|
},
|
|
},
|
|
},
|
|
]
|
|
for _ in range(2):
|
|
resp = completion(model=model, messages=messages, tools=tools)
|
|
print(resp)
|
|
|
|
if model == "claude-sonnet-4-5-20250929":
|
|
assert resp.usage.prompt_tokens_details.cached_tokens > 0
|
|
|
|
|
|
@pytest.mark.parametrize("sync_mode", [True, False])
|
|
@pytest.mark.asyncio
|
|
@pytest.mark.flaky(retries=6, delay=1)
|
|
async def test_watsonx_tool_choice(sync_mode, monkeypatch):
|
|
from litellm.llms.custom_httpx.http_handler import HTTPHandler, AsyncHTTPHandler
|
|
import json
|
|
from litellm import acompletion, completion
|
|
|
|
# Mock the IAM token generation to avoid actual API calls
|
|
monkeypatch.setenv("WATSONX_API_KEY", "mock-api-key")
|
|
monkeypatch.setenv("WATSONX_TOKEN", "mock-watsonx-token")
|
|
monkeypatch.setenv("WATSONX_API_BASE", "https://us-south.ml.cloud.ibm.com")
|
|
monkeypatch.setenv("WATSONX_PROJECT_ID", "mock-project-id")
|
|
|
|
litellm.set_verbose = True
|
|
tools = [
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "get_current_weather",
|
|
"description": "Get the current weather in a given location",
|
|
"parameters": {
|
|
"type": "object",
|
|
"properties": {
|
|
"location": {
|
|
"type": "string",
|
|
"description": "The city and state, e.g. San Francisco, CA",
|
|
},
|
|
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
|
|
},
|
|
"required": ["location"],
|
|
},
|
|
},
|
|
}
|
|
]
|
|
messages = [{"role": "user", "content": "What is the weather in San Francisco?"}]
|
|
|
|
client = HTTPHandler() if sync_mode else AsyncHTTPHandler()
|
|
with patch.object(client, "post", return_value=MagicMock()) as mock_completion:
|
|
try:
|
|
if sync_mode:
|
|
resp = completion(
|
|
model="watsonx/meta-llama/llama-3-1-8b-instruct",
|
|
messages=messages,
|
|
tools=tools,
|
|
tool_choice="auto",
|
|
client=client,
|
|
)
|
|
else:
|
|
resp = await acompletion(
|
|
model="watsonx/meta-llama/llama-3-1-8b-instruct",
|
|
messages=messages,
|
|
tools=tools,
|
|
tool_choice="auto",
|
|
client=client,
|
|
stream=True,
|
|
)
|
|
|
|
print(resp)
|
|
|
|
mock_completion.assert_called_once()
|
|
print(mock_completion.call_args.kwargs)
|
|
json_data = json.loads(mock_completion.call_args.kwargs["data"])
|
|
assert json_data["tool_choice_option"] == "auto"
|
|
except Exception as e:
|
|
print(e)
|
|
if "The read operation timed out" in str(e):
|
|
pytest.skip("Skipping test due to timeout")
|
|
else:
|
|
raise e
|