litellm/tests/local_testing/test_sagemaker.py
Mateo Wang f9407bc036 chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728)
* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214

The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).

Changes:
  - Replace 26 hardcoded references to 888602223428 with 941277531214 across
    8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
    ARNs, batch execution role ARN, and example proxy config).
  - The provisioned-model and imported-model ARNs are referenced only from
    mocked unit tests — no AWS resources to recreate.
  - The batch execution IAM role has been recreated in the new account with
    the same name and equivalent permissions.
  - The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
    hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
    under the same names — see tools/agentcore-deploy/ in a follow-up.

CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.

Smoke-tested locally against the new account:
  aws bedrock-runtime converse --region us-west-2 \
    --model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
    --messages '[{"role":"user","content":[{"text":"ping"}]}]'
  → 200, model returned 'pong'

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes

The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).

Deployed runtimes:
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy

Both runtimes are status=READY and pass a smoke invoke:
  $ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
  → 200, {"result": "echo: ping"}

The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): point Bedrock batch tests at new-account S3 bucket

The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.

Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.
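
The renaming rule above can be sketched as a small helper. This is only an illustration: `account_scoped_bucket` is a hypothetical name, not something in the LiteLLM repo, and the validation reflects the general S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens) rather than any check the CI performs.

```python
import re

# Hypothetical helper for illustration; not part of the LiteLLM test suite.
# General S3 rule: 3-63 chars, lowercase letters, digits, dots, hyphens,
# starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def account_scoped_bucket(base_name: str, account_id: str) -> str:
    """Append the 12-digit AWS account ID to a bucket base name."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"expected a 12-digit account ID, got {account_id!r}")
    name = f"{base_name}-{account_id}"
    if not _BUCKET_NAME_RE.match(name):
        raise ValueError(f"{name!r} is not a valid S3 bucket name")
    return name


print(account_scoped_bucket("litellm-proxy", "941277531214"))
```

Keeping the base name as a prefix means existing references stay greppable after the rename.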

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): point live S3 logging test at new-account bucket

Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.

Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails

The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
  - wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
    with explicit inputAction=ANONYMIZE so masking applies to INPUT,
    which is the source litellm's moderation hook sends)
  - ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
    to the exact string the tests assert on)

Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously-failing guardrail tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): migrate legacy models to current inference profiles

The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
  - anthropic.claude-3-sonnet-20240229    -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
  - anthropic.claude-3-haiku-20240307     -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).
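
A minimal sketch of the swap described above, assuming the legacy IDs may appear with a version suffix and an optional `bedrock/` route prefix. The mapping values are taken from this commit message; `migrate_model_id` and `LEGACY_TO_CURRENT` are hypothetical names used only for illustration.

```python
# Hypothetical mapping for illustration; the IDs come from the commit message.
LEGACY_TO_CURRENT = {
    "anthropic.claude-3-sonnet-20240229": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "anthropic.claude-3-haiku-20240307": "us.anthropic.claude-haiku-4-5-20251001-v1:0",
}

_ROUTE_PREFIX = "bedrock/"


def migrate_model_id(model: str) -> str:
    """Return the current inference-profile ID for a legacy model ID.

    Model strings that do not match a legacy ID are returned unchanged.
    """
    has_prefix = model.startswith(_ROUTE_PREFIX)
    bare = model[len(_ROUTE_PREFIX):] if has_prefix else model
    for legacy, current in LEGACY_TO_CURRENT.items():
        # startswith tolerates a trailing version suffix such as "-v1:0".
        if bare.startswith(legacy):
            bare = current
            break
    return (_ROUTE_PREFIX if has_prefix else "") + bare


print(migrate_model_id("bedrock/anthropic.claude-3-haiku-20240307-v1:0"))
```

Because the replacement IDs start with `us.`, running the helper twice leaves an already-migrated ID untouched.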

cohere.command-r-plus has no working replacement (all Cohere is legacy-
gated in the new account): swapped to claude-haiku-4-5 in provider-
agnostic param lists. amazon.titan-image-generator skipped (no working
replacement). Mocked/transformation/cost tests that reference the legacy
strings are intentionally left unchanged. Verified live against the new
account.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources

These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
  - SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
    -> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
  - Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
    vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)

claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.

Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.
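
Roughly, the mechanism could look like the sketch below. The real `ModelEntry` has more fields than shown here; the `model` field and the `honor_skip` helper are assumptions made for illustration, and only `skip_reason` and the use of `pytest.skip` come from this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    # Simplified, hypothetical shape; the real class carries more fields.
    model: str
    skip_reason: Optional[str] = None


def honor_skip(entry: ModelEntry) -> None:
    """Skip the current test cell when the entry carries a skip_reason."""
    if entry.skip_reason is not None:
        import pytest  # imported lazily; only needed when a skip is requested

        pytest.skip(entry.skip_reason)


entry = ModelEntry(
    model="bedrock-claude-opus-4-7",
    skip_reason="not entitled on 941277531214",
)
print(entry.skip_reason)
```

Since the field defaults to `None`, existing entries need no changes, and removing the one `skip_reason` argument restores coverage.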

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): swap/skip legacy-gated models unavailable on new CI account

The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:

- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
  legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
  authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
  us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
  active us.anthropic.claude-sonnet-4-5 inference profile.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account

- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
  is not authorized on account 941277531214) and migrate the missed
  s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
  us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
  output e2e test.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)

Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
  instead of skipping, so the missing entitlement stays visible in CI; they
  still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
  batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
  transform + cost-tracking path stays under test without live model access

https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT

Co-authored-by: Claude <noreply@anthropic.com>

* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells

Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.
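
Taken together with the previous commit, the per-cell decision might be summarised as below. This is a sketch under assumptions: `cell_outcome` is a hypothetical name, the credential check is reduced to two environment variables, and it assumes the skip for missing credentials still applies after the switch to `pytest.xfail`.

```python
import os
from typing import Mapping, Optional


def cell_outcome(
    fail_reason: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return "run", "xfail" or "skip" for one grid cell (hypothetical helper)."""
    if fail_reason is None:
        return "run"
    has_creds = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(
        env.get("AWS_SECRET_ACCESS_KEY")
    )
    # With credentials the known failure is reported as XFAIL;
    # without them (local dev) the cell is skipped.
    return "xfail" if has_creds else "skip"


print(cell_outcome("opus-4-7 not entitled", env={}))
```

In a test body, "xfail" would map to `pytest.xfail(fail_reason)` and "skip" to `pytest.skip(...)`.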

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
2026-05-25 12:03:17 -07:00

487 lines
16 KiB
Python
import json
import os
import sys
import traceback

from dotenv import load_dotenv

load_dotenv()
import io

import litellm
from test_streaming import streaming_format_tests

sys.path.insert(
    0, os.path.abspath("../..")
)  # Adds the parent directory to the system path
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from litellm import RateLimitError, Timeout, completion, completion_cost, embedding
from litellm.llms.custom_httpx.http_handler import AsyncHTTPHandler, HTTPHandler
from litellm.litellm_core_utils.prompt_templates.factory import anthropic_messages_pt

# litellm.num_retries =3
litellm.cache = None
litellm.success_callback = []
user_message = "Write a short poem about the sky"
messages = [{"content": user_message, "role": "user"}]
import logging

from litellm._logging import verbose_logger


def logger_fn(user_model_dict):
    print(f"user_model_dict: {user_model_dict}")


@pytest.fixture(autouse=True)
def reset_callbacks():
    print("\npytest fixture - resetting callbacks")
    litellm.success_callback = []
    litellm._async_success_callback = []
    litellm.failure_callback = []
    litellm.callbacks = []


@pytest.mark.asyncio()
@pytest.mark.parametrize("sync_mode", [True, False])
async def test_completion_sagemaker(sync_mode):
    try:
        litellm.set_verbose = True
        verbose_logger.setLevel(logging.DEBUG)
        print("testing sagemaker")
        if sync_mode is True:
            response = litellm.completion(
                model="sagemaker/litellm-ci-textgen",
                messages=[
                    {"role": "user", "content": "hi"},
                ],
                temperature=0.2,
                max_tokens=80,
                input_cost_per_second=0.000420,
            )
        else:
            response = await litellm.acompletion(
                model="sagemaker/litellm-ci-textgen",
                messages=[
                    {"role": "user", "content": "hi"},
                ],
                temperature=0.2,
                max_tokens=80,
                input_cost_per_second=0.000420,
            )
        # Add any assertions here to check the response
        print(response)
        cost = completion_cost(completion_response=response)
        print("calculated cost", cost)
        assert (
            cost > 0.0 and cost < 1.0
        )  # should never be > $1 for a single completion call
    except Exception as e:
        pytest.fail(f"Error occurred: {e}")


@pytest.mark.asyncio()
@pytest.mark.parametrize(
    "sync_mode",
    [True, False],
)
async def test_completion_sagemaker_messages_api(sync_mode):
    try:
        litellm.set_verbose = True
        verbose_logger.setLevel(logging.DEBUG)
        print("testing sagemaker")
        from litellm.llms.custom_httpx.http_handler import AsyncHTTPHandler, HTTPHandler

        if sync_mode is True:
            client = HTTPHandler()
            with patch.object(client, "post") as mock_post:
                try:
                    resp = litellm.completion(
                        model="sagemaker_chat/huggingface-pytorch-tgi-inference-2024-08-23-15-48-59-245",
                        messages=[
                            {"role": "user", "content": "hi"},
                        ],
                        temperature=0.2,
                        max_tokens=80,
                        client=client,
                    )
                except Exception as e:
                    print(e)
                mock_post.assert_called_once()
                json_data = json.loads(mock_post.call_args.kwargs["data"])
                assert (
                    json_data["model"]
                    == "huggingface-pytorch-tgi-inference-2024-08-23-15-48-59-245"
                )
                assert json_data["messages"] == [{"role": "user", "content": "hi"}]
                assert json_data["temperature"] == 0.2
                assert json_data["max_tokens"] == 80
        else:
            client = AsyncHTTPHandler()
            with patch.object(client, "post") as mock_post:
                try:
                    resp = await litellm.acompletion(
                        model="sagemaker_chat/huggingface-pytorch-tgi-inference-2024-08-23-15-48-59-245",
                        messages=[
                            {"role": "user", "content": "hi"},
                        ],
                        temperature=0.2,
                        max_tokens=80,
                        num_retries=0,
                        client=client,
                    )
                except Exception as e:
                    print(e)
                mock_post.assert_called_once()
                json_data = json.loads(mock_post.call_args.kwargs["data"])
                assert (
                    json_data["model"]
                    == "huggingface-pytorch-tgi-inference-2024-08-23-15-48-59-245"
                )
                assert json_data["messages"] == [{"role": "user", "content": "hi"}]
                assert json_data["temperature"] == 0.2
                assert json_data["max_tokens"] == 80
    except Exception as e:
        pytest.fail(f"Error occurred: {e}")


@pytest.mark.asyncio()
@pytest.mark.parametrize("sync_mode", [False, True])
@pytest.mark.parametrize(
    "model",
    [
        # "sagemaker_chat/huggingface-pytorch-tgi-inference-2024-08-23-15-48-59-245",
        "sagemaker/litellm-ci-textgen",
    ],
)
# @pytest.mark.flaky(retries=3, delay=1)
async def test_completion_sagemaker_stream(sync_mode, model):
    try:
        litellm.set_verbose = False
        print("testing sagemaker")
        verbose_logger.setLevel(logging.DEBUG)
        full_text = ""
        if sync_mode is True:
            response = litellm.completion(
                model=model,
                messages=[
                    {"role": "user", "content": "hi - what is ur name"},
                ],
                temperature=0.2,
                stream=True,
                max_tokens=80,
                input_cost_per_second=0.000420,
            )
            for idx, chunk in enumerate(response):
                print(chunk)
                streaming_format_tests(idx=idx, chunk=chunk)
                full_text += chunk.choices[0].delta.content or ""
            print("SYNC RESPONSE full text", full_text)
        else:
            response = await litellm.acompletion(
                model=model,
                messages=[
                    {"role": "user", "content": "hi - what is ur name"},
                ],
                stream=True,
                temperature=0.2,
                max_tokens=80,
                input_cost_per_second=0.000420,
            )
            print("streaming response")
            idx = 0
            async for chunk in response:
                print(chunk)
                streaming_format_tests(idx=idx, chunk=chunk)
                full_text += chunk.choices[0].delta.content or ""
                idx += 1
            print("ASYNC RESPONSE full text", full_text)
    except Exception as e:
        pytest.fail(f"Error occurred: {e}")


@pytest.mark.asyncio()
@pytest.mark.parametrize("sync_mode", [False, True])
@pytest.mark.parametrize(
    "model",
    [
        # "sagemaker_chat/huggingface-pytorch-tgi-inference-2024-08-23-15-48-59-245",
        "sagemaker/litellm-ci-textgen",
    ],
)
async def test_completion_sagemaker_streaming_bad_request(sync_mode, model):
    litellm.set_verbose = True
    print("testing sagemaker")
    if sync_mode is True:
        with pytest.raises(litellm.BadRequestError):
            response = litellm.completion(
                model=model,
                messages=[
                    {"role": "user", "content": "hi"},
                ],
                stream=True,
                max_tokens=8000000000000000,
            )
    else:
        with pytest.raises(litellm.BadRequestError):
            response = await litellm.acompletion(
                model=model,
                messages=[
                    {"role": "user", "content": "hi"},
                ],
                stream=True,
                max_tokens=8000000000000000,
            )


@pytest.mark.asyncio
async def test_acompletion_sagemaker_non_stream():
    mock_response = AsyncMock()

    def return_val():
        return {
            "generated_text": "This is a mock response from SageMaker.",
            "id": "cmpl-mockid",
            "object": "text_completion",
            "created": 1629800000,
            "model": "sagemaker/litellm-ci-textgen",
            "choices": [
                {
                    "text": "This is a mock response from SageMaker.",
                    "index": 0,
                    "logprobs": None,
                    "finish_reason": "length",
                }
            ],
            "usage": {"prompt_tokens": 1, "completion_tokens": 8, "total_tokens": 9},
        }

    mock_response.json = return_val
    mock_response.status_code = 200
    expected_payload = {
        "inputs": "hi",
        "parameters": {"temperature": 0.2, "max_new_tokens": 80},
    }
    with patch(
        "litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
        return_value=mock_response,
    ) as mock_post:
        # Act: Call the litellm.acompletion function
        response = await litellm.acompletion(
            model="sagemaker/litellm-ci-textgen",
            messages=[
                {"role": "user", "content": "hi"},
            ],
            temperature=0.2,
            max_tokens=80,
            input_cost_per_second=0.000420,
        )
        # Print what was called on the mock
        print("call args=", mock_post.call_args)
        # Assert
        mock_post.assert_called_once()
        _, kwargs = mock_post.call_args
        args_to_sagemaker = json.loads(kwargs["data"])
        print("Arguments passed to sagemaker=", args_to_sagemaker)
        assert args_to_sagemaker == expected_payload
        assert (
            kwargs["url"]
            == "https://runtime.sagemaker.us-west-2.amazonaws.com/endpoints/litellm-ci-textgen/invocations"
        )


@pytest.mark.asyncio
async def test_completion_sagemaker_non_stream():
    mock_response = MagicMock()

    def return_val():
        return {
            "generated_text": "This is a mock response from SageMaker.",
            "id": "cmpl-mockid",
            "object": "text_completion",
            "created": 1629800000,
            "model": "sagemaker/litellm-ci-textgen",
            "choices": [
                {
                    "text": "This is a mock response from SageMaker.",
                    "index": 0,
                    "logprobs": None,
                    "finish_reason": "length",
                }
            ],
            "usage": {"prompt_tokens": 1, "completion_tokens": 8, "total_tokens": 9},
        }

    mock_response.json = return_val
    mock_response.status_code = 200
    expected_payload = {
        "inputs": "hi",
        "parameters": {"temperature": 0.2, "max_new_tokens": 80},
    }
    with patch(
        "litellm.llms.custom_httpx.http_handler.HTTPHandler.post",
        return_value=mock_response,
    ) as mock_post:
        # Act: Call the synchronous litellm.completion function
        response = litellm.completion(
            model="sagemaker/litellm-ci-textgen",
            messages=[
                {"role": "user", "content": "hi"},
            ],
            temperature=0.2,
            max_tokens=80,
            input_cost_per_second=0.000420,
        )
        # Print what was called on the mock
        print("call args=", mock_post.call_args)
        # Assert
        mock_post.assert_called_once()
        _, kwargs = mock_post.call_args
        args_to_sagemaker = json.loads(kwargs["data"])
        print("Arguments passed to sagemaker=", args_to_sagemaker)
        assert args_to_sagemaker == expected_payload
        assert (
            kwargs["url"]
            == "https://runtime.sagemaker.us-west-2.amazonaws.com/endpoints/litellm-ci-textgen/invocations"
        )


@pytest.mark.asyncio
@pytest.mark.flaky(retries=3, delay=1)
async def test_completion_sagemaker_prompt_template_non_stream():
    mock_response = MagicMock()

    def return_val():
        return {
            "generated_text": "This is a mock response from SageMaker.",
            "id": "cmpl-mockid",
            "object": "text_completion",
            "created": 1629800000,
            "model": "sagemaker/litellm-ci-textgen",
            "choices": [
                {
                    "text": "This is a mock response from SageMaker.",
                    "index": 0,
                    "logprobs": None,
                    "finish_reason": "length",
                }
            ],
            "usage": {"prompt_tokens": 1, "completion_tokens": 8, "total_tokens": 9},
        }

    mock_response.json = return_val
    mock_response.status_code = 200
    # Expected prompt after the Deepseek Coder chat template is applied
    expected_payload = {
        "inputs": "<begin▁of▁sentence>You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer\n\n### Instruction:\nhi\n\n\n### Response:\n",
        "parameters": {"temperature": 0.2, "max_new_tokens": 80},
    }
    with patch(
        "litellm.llms.custom_httpx.http_handler.HTTPHandler.post",
        return_value=mock_response,
    ) as mock_post:
        # Act: Call the synchronous litellm.completion function
        response = litellm.completion(
            model="sagemaker/deepseek_coder_6.7_instruct",
            messages=[
                {"role": "user", "content": "hi"},
            ],
            temperature=0.2,
            max_tokens=80,
            hf_model_name="deepseek-ai/deepseek-coder-6.7b-instruct",
        )
        # Print what was called on the mock
        print("call args=", mock_post.call_args)
        # Assert
        mock_post.assert_called_once()
        _, kwargs = mock_post.call_args
        args_to_sagemaker = json.loads(kwargs["data"])
        print("Arguments passed to sagemaker=", args_to_sagemaker)
        assert args_to_sagemaker == expected_payload


@pytest.mark.asyncio
async def test_completion_sagemaker_non_stream_with_aws_params():
    mock_response = MagicMock()

    def return_val():
        return {
            "generated_text": "This is a mock response from SageMaker.",
            "id": "cmpl-mockid",
            "object": "text_completion",
            "created": 1629800000,
            "model": "sagemaker/litellm-ci-textgen",
            "choices": [
                {
                    "text": "This is a mock response from SageMaker.",
                    "index": 0,
                    "logprobs": None,
                    "finish_reason": "length",
                }
            ],
            "usage": {"prompt_tokens": 1, "completion_tokens": 8, "total_tokens": 9},
        }

    mock_response.json = return_val
    mock_response.status_code = 200
    expected_payload = {
        "inputs": "hi",
        "parameters": {"temperature": 0.2, "max_new_tokens": 80},
    }
    with patch(
        "litellm.llms.custom_httpx.http_handler.HTTPHandler.post",
        return_value=mock_response,
    ) as mock_post:
        # Act: Call the synchronous litellm.completion function
        response = litellm.completion(
            model="sagemaker/litellm-ci-textgen",
            messages=[
                {"role": "user", "content": "hi"},
            ],
            temperature=0.2,
            max_tokens=80,
            input_cost_per_second=0.000420,
            aws_access_key_id="gm",
            aws_secret_access_key="s",
            aws_region_name="us-west-5",
        )
        # Print what was called on the mock
        print("call args=", mock_post.call_args)
        # Assert
        mock_post.assert_called_once()
        _, kwargs = mock_post.call_args
        args_to_sagemaker = json.loads(kwargs["data"])
        print("Arguments passed to sagemaker=", args_to_sagemaker)
        assert args_to_sagemaker == expected_payload
        # The URL must be built from the explicitly passed aws_region_name
        assert (
            kwargs["url"]
            == "https://runtime.sagemaker.us-west-5.amazonaws.com/endpoints/litellm-ci-textgen/invocations"
        )