fix(team): keep team_alias cache in sync on _cache_team_object writes (#28737)

* fix(team): keep team_alias cache in sync on _cache_team_object writes _cache_team_object wrote only to the team_id:<id> cache key, but the JWT auth path that uses team_alias_jwt_field reads from a separate team_alias:<alias> key (get_team_object_by_alias caches under both keys on miss, but reads only the alias-keyed one). After any team-mutation endpoint (team_model_add, team_model_delete, update_team, the two access-group writes) the team_id cache was refreshed but the team_alias cache stayed stale until TTL — JWT callers using team_alias_jwt_field kept seeing the pre-mutation team for the full cache window. Mirror the write under the alias key inside _cache_team_object so every existing caller stays in sync without further changes. Skip the alias write when team_alias is None/empty so we don't collide across alias-less teams. Surfaced testing the LIT-3244 cherry-pick on patch/1.86.0: the LIT-3244 fix correctly invalidated the team_id cache but the customer's JWT used team_alias_jwt_field, so they kept hitting the stale alias-keyed entry. * fix(team): delete (not overwrite) team_alias cache on _cache_team_object The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias> from _cache_team_object. team_alias is NOT unique in the schema (no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises). Writing the alias-keyed cache from the generic refresh path bypassed that check: a team admin renaming their team to collide with another team's alias could silently overwrite the cached team for JWT-by-alias auth, swapping the resolved team under that alias for the cache window. Switch the alias-keyed operation from a write to a delete (mirroring the dual-cache delete pattern in _delete_cache_key_object). After every team write, the next JWT-by-alias reader cache-misses and falls through to get_team_object_by_alias, which (a) re-fetches the fresh team from DB, closing the LIT-3244 staleness gap that motivated this PR, and (b) enforces alias uniqueness before populating either cache key. team_id:<id> writes are unchanged — team_id is the table PK and is guaranteed unique. Surfaced in veria-ai review on #28739. * fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)` which substring-matches the `model_id,` inside the file-ID encoding's `llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id then fed that deployment UUID back into the auth path as a model candidate via _extract_models_from_managed_resource_id, and every team-BYOK file attach 403'd with: team not allowed to access model. This team can only access models=['openai/*']. Tried to access <deployment-uuid> The team's models list correctly contains the public name (`openai/*`) that target_model_names matches, but the bogus UUID candidate fails the wildcard check first. Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it matches the legitimate top-level `model_id,<value>` field on vector_store unified IDs and skips substring matches inside other fields. File-IDs (which have no top-level `model_id` field) now return None and contribute no spurious UUID candidate. Surfaced reproducing LIT-3244 on patch/1.86.0 with the customer's exact flow: team with openai/* BYOK deployment, JWT-scoped user, POST /v1/vector_stores/{id}/files attaching a file uploaded with target_model_names=openai/gpt-4o.
2026-08-02 12:21:10 +00:00 · 2026-05-25 19:16:36 -07:00
parent 0d5040fc06
commit 7cd98508e7
4 changed files with 264 additions and 5 deletions
@@ -177,8 +177,14 @@ def extract_model_id_from_unified_id(
        if decoded_id:
            unified_id = decoded_id

-        # Extract model ID
-        match = re.search(r"model_id,([^;]+)", unified_id)
+        # Extract model ID. Anchor to a field boundary (start of string or
+        # after `;`) so this regex doesn't substring-match the `model_id,`
+        # inside file_id encodings' `llm_output_file_model_id,<deployment_uuid>`
+        # field — that would feed the deployment UUID as a model candidate
+        # into the team-access check and 403 every team-BYOK file attach
+        # with `Tried to access <uuid>` (LIT-3244 patch/1.86.0 second-order
+        # finding).
+        match = re.search(r"(?:^|;)model_id,([^;]+)", unified_id)
        if match:
            return match.group(1).strip()

@@ -1765,19 +1765,39 @@ async def _cache_team_object(
    user_api_key_cache: UserApiKeyCache,
    proxy_logging_obj: Optional[ProxyLogging],
 ):
-    key = "team_id:{}".format(team_id)
-
    ## CACHE REFRESH TIME!
    team_table.last_refreshed_at = time.time()

+    # team_id is the table primary key — guaranteed unique, safe to write.
    await _cache_management_object(
-        key=key,
+        key="team_id:{}".format(team_id),
        value=team_table,
        user_api_key_cache=user_api_key_cache,
        proxy_logging_obj=proxy_logging_obj,
        model_type=LiteLLM_TeamTableCachedObj,
    )

+    # Invalidate the alias-keyed cache so the JWT auth path with
+    # `team_alias_jwt_field` (which reads via `get_team_object_by_alias`)
+    # doesn't keep serving the pre-mutation team after every team-write
+    # endpoint (team_model_add, team_model_delete, update_team, etc.).
+    #
+    # Why DELETE and not WRITE: `team_alias` has no UNIQUE constraint in
+    # schema.prisma. Writing this cache from the generic refresh path
+    # would let a team admin who renamed their team to collide with
+    # another team's alias silently overwrite the cached team for
+    # JWT-by-alias auth (veria-ai review on #28739). Deleting forces the
+    # next reader through `get_team_object_by_alias`, which DOES enforce
+    # uniqueness (len(teams) > 1 raises HTTPException) before populating
+    # the cache from a verified single row.
+    if team_table.team_alias:
+        alias_key = "team_alias:{}".format(team_table.team_alias)
+        user_api_key_cache.delete_cache(key=alias_key)
+        if proxy_logging_obj is not None:
+            await proxy_logging_obj.internal_usage_cache.dual_cache.async_delete_cache(
+                key=alias_key
+            )
+

 async def _cache_key_object(
    hashed_token: str,
@@ -0,0 +1,134 @@
+"""
+Tests for `litellm.llms.base_llm.managed_resources.utils.extract_model_id_from_unified_id`.
+
+The regex inside this helper is shared by both the vector-store unified-ID
+format (`...;model_id,<value>;...`) and the file-ID format (`...;llm_output_file_model_id,<uuid>`).
+A naive regex (`r"model_id,([^;]+)"`) substring-matches the latter and
+returns the deployment UUID, which then gets fed as a model candidate
+into the team-access check and 403s every team-BYOK file attach
+(LIT-3244 patch/1.86.0 second-order finding). These tests pin the
+field-boundary anchor that prevents that.
+"""
+
+import pytest
+
+from litellm.llms.base_llm.managed_resources.utils import (
+    encode_unified_id,
+    extract_model_id_from_unified_id,
+)
+
+# ---------------------------------------------------------------------------
+# Vector-store unified-ID shape — has a top-level `model_id,<value>` field.
+# Existing behavior must be preserved: returns the value.
+# ---------------------------------------------------------------------------
+
+
+def test_extract_model_id_returns_value_for_vector_store_unified_id():
+    unified_id = (
+        "litellm_proxy:vector_store"
+        ";unified_id,abc-123"
+        ";target_model_names,gpt-4,gemini"
+        ";resource_id,vs_xyz"
+        ";model_id,deployment-uuid-456"
+    )
+    assert extract_model_id_from_unified_id(unified_id) == "deployment-uuid-456"
+
+
+def test_extract_model_id_returns_value_when_field_is_first():
+    """`model_id` is the very first field after the prefix (anchor must accept start-of-string)."""
+    unified_id = "litellm_proxy:vector_store;model_id,first-field-value;unified_id,abc"
+    # First field after the prefix is preceded by `;`, so it matches via the
+    # `;model_id,` branch. Pin that the anchor isn't accidentally too strict.
+    assert extract_model_id_from_unified_id(unified_id) == "first-field-value"
+
+
+# ---------------------------------------------------------------------------
+# File-ID shape — has `llm_output_file_model_id,<uuid>` but no top-level
+# `model_id,` field. Must return None (the previous regex would have
+# substring-matched and returned the deployment UUID).
+# ---------------------------------------------------------------------------
+
+
+def test_extract_model_id_returns_none_for_file_id_without_model_id_field():
+    """Regression pin for LIT-3244 patch/1.86.0.
+
+    File-IDs constructed via `LITELLM_MANAGED_FILE_COMPLETE_STR` have
+    `llm_output_file_model_id,<deployment_uuid>` but no top-level
+    `model_id,` field. The previous regex matched the substring and
+    returned the UUID, which then 403'd team-BYOK file attaches with
+    `Tried to access <uuid>`.
+    """
+    file_id = (
+        "litellm_proxy:text/plain"
+        ";unified_id,file-uuid-123"
+        ";target_model_names,openai/gpt-4o"
+        ";llm_output_file_id,file-OpenAIReturnedId"
+        ";llm_output_file_model_id,813bf25f-e5a7-4658-8253-a6f677be8eb5"
+    )
+    assert extract_model_id_from_unified_id(file_id) is None, (
+        "File-ID has no top-level `model_id,` field — the deployment UUID "
+        "in `llm_output_file_model_id,` must NOT be returned. Returning it "
+        "feeds the UUID as a model candidate into the team-access check "
+        "and 403s every team-BYOK file attach (LIT-3244 patch/1.86.0)."
+    )
+
+
+def test_extract_model_id_returns_none_for_file_id_with_model_id_value_null():
+    """The current file-ID builder writes `llm_output_file_model_id,None`
+    (the Python `None` stringified) when the upstream model_id isn't known.
+    Still no top-level `model_id,` field → must return None.
+    """
+    file_id = (
+        "litellm_proxy:text/plain"
+        ";unified_id,uuid"
+        ";target_model_names,openai/gpt-4o"
+        ";llm_output_file_id,file-Y"
+        ";llm_output_file_model_id,None"
+    )
+    assert extract_model_id_from_unified_id(file_id) is None
+
+
+# ---------------------------------------------------------------------------
+# Base64-encoded inputs must decode and apply the same anchor.
+# ---------------------------------------------------------------------------
+
+
+def test_extract_model_id_decodes_base64_then_anchors():
+    file_id_plain = (
+        "litellm_proxy:text/plain"
+        ";unified_id,uuid"
+        ";target_model_names,openai/gpt-4o"
+        ";llm_output_file_id,file-Y"
+        ";llm_output_file_model_id,813bf25f-e5a7-4658-8253-a6f677be8eb5"
+    )
+    encoded = encode_unified_id(file_id_plain)
+    assert extract_model_id_from_unified_id(encoded) is None
+
+    vector_store_plain = (
+        "litellm_proxy:vector_store"
+        ";unified_id,abc"
+        ";target_model_names,gpt-4"
+        ";resource_id,vs_xyz"
+        ";model_id,real-model-id"
+    )
+    encoded_vs = encode_unified_id(vector_store_plain)
+    assert extract_model_id_from_unified_id(encoded_vs) == "real-model-id"
+
+
+# ---------------------------------------------------------------------------
+# Defensive: malformed / non-string inputs must not raise.
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.parametrize("bad_input", [None, 42, b"bytes-not-str", []])
+def test_extract_model_id_returns_none_for_non_string_input(bad_input):
+    assert extract_model_id_from_unified_id(bad_input) is None  # type: ignore[arg-type]
+
+
+def test_extract_model_id_returns_none_when_field_absent():
+    assert (
+        extract_model_id_from_unified_id(
+            "litellm_proxy:other;unified_id,abc;some_field,whatever"
+        )
+        is None
+    )
@@ -3370,3 +3370,102 @@ async def test_resolve_end_user_reraises_budget_exceeded(
            prisma_client=MagicMock(),
            user_api_key_cache=cache,
        )
+
+
+@pytest.mark.asyncio
+async def test_cache_team_object_writes_team_id_and_invalidates_team_alias():
+    """
+    Regression pin for LIT-3244 patch/1.86.0 follow-up.
+
+    `_cache_team_object` is the canonical "refresh this team" primitive.
+    Two cache keys are in play:
+      - "team_id:<id>"    — used by `get_team_object(team_id=...)`,
+                            i.e. API-key auth and JWT-with-team_id_jwt_field
+      - "team_alias:<alias>" — used by `get_team_object_by_alias(team_alias=...)`,
+                               i.e. JWT-with-team_alias_jwt_field
+
+    Invariants this test pins:
+      1. Writes the team_id-keyed entry with the refreshed object (team_id
+         is the table PK — guaranteed unique, safe to write).
+      2. DELETES (does NOT write) the team_alias-keyed entry. `team_alias`
+         has no UNIQUE constraint in schema.prisma, so writing it from
+         this generic refresh path would let a team admin who renames
+         their team to collide with another team's alias silently
+         overwrite the cached team for JWT-by-alias auth (veria-ai
+         review on #28739). Deleting forces the next JWT-by-alias
+         reader through `get_team_object_by_alias`, which enforces
+         len(teams)==1 before populating the cache.
+      3. When team_alias is None, NO alias-key operation happens (no
+         delete of an empty-keyed entry, no spurious write).
+    """
+    from unittest.mock import AsyncMock, MagicMock
+
+    from litellm.proxy._types import LiteLLM_TeamTableCachedObj
+    from litellm.proxy.auth.auth_checks import _cache_team_object
+
+    base_team_row = {
+        "team_id": "team-1234",
+        "team_alias": "H-Capacity",
+        "models": ["openai/*", "bedrock-claude-sonnet-4"],
+    }
+
+    # ===== team_alias is set =====
+    team_table = LiteLLM_TeamTableCachedObj(**base_team_row)
+    cache = MagicMock()
+    cache.async_set_cache = AsyncMock()
+    cache.delete_cache = MagicMock()
+    logging_obj = MagicMock()
+    logging_obj.internal_usage_cache.dual_cache.async_delete_cache = AsyncMock()
+
+    await _cache_team_object(
+        team_id="team-1234",
+        team_table=team_table,
+        user_api_key_cache=cache,
+        proxy_logging_obj=logging_obj,
+    )
+
+    # (1) team_id-keyed write fires with the refreshed object
+    written_keys = [
+        (c.kwargs.get("key") or c.args[0])
+        for c in cache.async_set_cache.await_args_list
+    ]
+    assert written_keys == ["team_id:team-1234"], (
+        "Only the team_id-keyed write should fire; the alias key must be "
+        "deleted, NOT written. "
+        f"Got writes: {written_keys}"
+    )
+    written_value = (
+        cache.async_set_cache.await_args.kwargs.get("value")
+        or cache.async_set_cache.await_args.args[1]
+    )
+    assert written_value is team_table
+
+    # (2) team_alias-keyed entry is deleted in BOTH the in-memory cache
+    # and the Redis dual cache (mirrors _delete_cache_key_object pattern).
+    cache.delete_cache.assert_called_once_with(key="team_alias:H-Capacity")
+    logging_obj.internal_usage_cache.dual_cache.async_delete_cache.assert_awaited_once_with(
+        key="team_alias:H-Capacity"
+    )
+
+    # ===== team_alias is None: no alias-key operation =====
+    aliasless = LiteLLM_TeamTableCachedObj(**{**base_team_row, "team_alias": None})
+    cache2 = MagicMock()
+    cache2.async_set_cache = AsyncMock()
+    cache2.delete_cache = MagicMock()
+    logging_obj2 = MagicMock()
+    logging_obj2.internal_usage_cache.dual_cache.async_delete_cache = AsyncMock()
+
+    await _cache_team_object(
+        team_id="team-no-alias",
+        team_table=aliasless,
+        user_api_key_cache=cache2,
+        proxy_logging_obj=logging_obj2,
+    )
+
+    cache2.delete_cache.assert_not_called()
+    logging_obj2.internal_usage_cache.dual_cache.async_delete_cache.assert_not_awaited()
+    written_keys_aliasless = [
+        (c.kwargs.get("key") or c.args[0])
+        for c in cache2.async_set_cache.await_args_list
+    ]
+    assert written_keys_aliasless == ["team_id:team-no-alias"]