litellm/tests/test_litellm/proxy/memory/__init__.py at 8f41fdbd218341fa4fec33e6df2d6a4e8cd65a8f - litellm - Gitea: Git with a cup of tea

tiennm99/litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-17 14:48:44 +00:00

Files

T

Krrish Dholakia 70492cee42 feat(proxy): add /v1/memory CRUD endpoints (#26218 )

* feat(proxy): add /v1/memory CRUD endpoints with user/team scoping

New LiteLLM_MemoryTable stores user/team-scoped key/value entries with
optional JSON metadata. Value is a String (LLM-readable text) and metadata
is an optional Json? envelope, matching the Letta + mem0 hybrid model so
future structured fields can be added without a schema migration.

Endpoints:
  POST   /v1/memory         - create
  GET    /v1/memory         - list (caller-scoped; admins see all)
  GET    /v1/memory/{key}   - fetch one
  PUT    /v1/memory/{key}   - upsert
  DELETE /v1/memory/{key}   - delete

Non-admin callers cannot set a user_id/team_id other than their own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy/memory): omit metadata field when None on create

Prisma's Python client rejects `metadata=None` on a `Json?` field with
"A value is required but not set" — the field must be omitted from the
`data` dict entirely to store SQL NULL. Build the create payload
conditionally in both `create_memory` and the PUT-create branch of
`upsert_memory`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): add Memory page to view/manage /v1/memory entries

Adds a new "Memory" sidebar item under Tools so users can see what their
agents have stored. Lists all memories visible to the caller (scoped by
the backend), with a key-search filter, preview column, scope tags, and
view/edit/delete actions. Create modal accepts optional JSON metadata.

- networking.tsx: fetchMemoryList / createMemory / updateMemory / deleteMemory
  wired to the /v1/memory CRUD endpoints.
- MemoryView + MemoryEditModal: new antd-based components (per CLAUDE.md:
  use antd for new UI, not tremor).
- page.tsx + leftnav.tsx: wire the "memory" route + sidebar entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(memory): add key_prefix filter + promote Memory to AI GATEWAY nav

Backend:
- GET /v1/memory now accepts `key_prefix` for Redis-style namespace
  scans (e.g. `?key_prefix=user:`). When both `key` and `key_prefix`
  are passed, `key_prefix` wins.
- Prefix filter sits under the visibility filter in the Prisma where
  clause, so it can never leak rows across user/team scopes.
- New tests: prefix match, and cross-scope isolation (another user's
  `user:*` rows must not appear in the caller's results).

UI:
- Memory moved from a Tools submenu to a top-level AI GATEWAY item
  (alongside Agents, MCP Servers, Skills) — it's an API primitive,
  not a tool-management surface.
- Search box now drives prefix search, matching the Redis mental
  model ("type the namespace, see everything under it").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): enforce unique key per scope by using NULLS NOT DISTINCT

The unique constraint `(key, user_id, team_id)` on LiteLLM_MemoryTable
silently allowed duplicates when user_id or team_id was NULL, because
Postgres treats every NULL as distinct by default (ANSI semantics). A
caller with no team_id could POST the same key three times and get
three rows.

Migration:
1. Dedupe existing rows, keeping the most recent per (key, user_id,
   team_id), using `IS NOT DISTINCT FROM` so NULL == NULL.
2. Drop the old unique index.
3. Recreate it with `NULLS NOT DISTINCT` (Postgres 15+).

No code change: POST already returns 409 on unique-violation error
messages — it just wasn't firing before because the constraint didn't
catch the NULL-team case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): make key globally unique, 409 on any duplicate

Switches from the compound unique `(key, user_id, team_id)` to a simple
`key @unique`. The compound form silently allowed duplicates when
user_id or team_id was NULL (Postgres treats each NULL as distinct), so
callers could POST the same key repeatedly. Globally-unique key means
one row per key, period — any duplicate create → 409.

- schema.prisma (×3): `key String @unique`, drop `@@unique(...)`.
- initial add_memory_table migration: unique index on (key) only.
- Remove the now-unused follow-up NULLS NOT DISTINCT migration.
- Endpoint error message simplified ("already exists" — no "for this scope").
- Test fake's create() now enforces global key uniqueness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): full-width layout + user/teams-style columns

- Add `w-full` to the MemoryView outer div so the page fills the
  flex-flex-1 container (was collapsing to intrinsic width).
- Replace the combined "Scope" column with separate User ID / Team ID
  columns, matching the layout of the Users / Teams pages: ID, Name,
  Preview, User ID, Team ID, Updated, Actions.
- IDs render with a truncated mono label + copy-to-clipboard button,
  same pattern as view_users.
- Detail drawer now shows Memory ID / User ID / Team ID as separate
  fields instead of stacked color tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): use clean MCP-style ID pill, drop copy icons

The ID / User ID / Team ID columns showed a mono text blob with a
copy-to-clipboard icon next to each value — too busy compared to the
MCP Servers page. Swap the renderer for MCP's pill style:

- Truncated mono ID inside a blue Tailwind pill
  (`font-mono text-blue-600 bg-blue-50 ... rounded-md border`).
- No copy icon. Full ID surfaces via tooltip.
- ID column is a button that opens the detail drawer on click;
  user/team ID pills are static (not clickable).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): address greptile review feedback

Addresses 5 greptile findings (3/5 → higher confidence target):

1. Identity-less orphan rows (P1): non-admin callers with no user_id AND
   no team_id could create rows that the visibility filter would never
   match again. Now rejected up front with 400 — caller must authenticate
   with a scoped key or act as PROXY_ADMIN.

2. Upsert race returning 500 (P1): PUT's check-then-create isn't atomic;
   a concurrent writer could slip a row in between the 404-check and the
   create call. Now catch unique-violation on create, re-read, and fall
   through to update — PUT stays idempotent. If the conflicting row
   belongs to a different scope, surface a 409 instead of 500.

3. PUT-create scope inconsistency (P2): PUT's create branch always used
   the caller's own user_id/team_id, so admins couldn't bootstrap rows
   scoped elsewhere via PUT (only POST). Now PUT-create calls the shared
   `_resolve_scope()` helper, matching POST semantics.

4. Stale schema comment (P2): schema said "Keyed by (key, user_id,
   team_id)" but `key` is globally unique. Updated all three schema
   copies to reflect the actual design.

5. UI silently truncated at 200 (P2): MemoryView fetched pageSize=200
   with no load-more. Swapped to real server-side pagination driven by
   `data.total`; page size is now 50 and the pager is a real AntD
   control.

Also extracts a shared `_resolve_scope()` helper and `_is_unique_violation()`
from create_memory so POST and PUT don't drift on the scope/error logic.

Tests: +3 new (identity-less 400, PUT admin bootstrap, PUT race →
update), 18/18 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): typed Prisma error + explicit-null metadata on PUT

Two more greptile threads from the last review:

- Unique-violation detection was string-matching "Unique"/"UniqueViolation"
  in the exception message, fragile across Prisma/driver versions. Now
  check the typed error `code == "P2002"` first, with string fallback.

- PUT could not distinguish "metadata omitted" from "metadata: null" —
  both parsed as `None`, so callers had no way to clear stored metadata.
  Switch to Pydantic v2's `model_fields_set` to tell which fields the
  caller actually sent; explicit null now clears the column.

New tests:
- explicit null clears metadata
- omitted metadata preserves existing value

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): send explicit null when user clears metadata

Addresses the remaining P1 from the last greptile review:

When the edit modal's metadata textarea was cleared and saved,
`metadataParsed` stayed `undefined`, `JSON.stringify` dropped the key
entirely, and the backend's `model_fields_set` guard therefore left
the stored metadata untouched — UI showed success but nothing changed.

Now: empty textarea on edit → send explicit `null` so the backend
sees `metadata` in `model_fields_set` and clears the column.
Empty textarea on create still maps to `undefined` (field omitted)
to avoid Prisma's `Json? = None` quirk on insert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): preserve slashes in key path encoding

The backend route `/v1/memory/{key:path}` supports keys with slashes,
but `encodeURIComponent` encoded `/` as `%2F`. Some proxies (nginx
default, CloudFlare, AWS ALB) reject or re-decode `%2F` mid-flight,
so UI update/delete calls on slash-containing keys could fail or
silently misroute.

New helper `encodeMemoryKeyForPath` splits by `/`, URL-encodes each
segment, then rejoins with literal `/`. Every other unsafe char
(spaces, `?`, `#`, `%`) stays encoded per-segment; slashes stay as
path delimiters, matching what the `:path` converter expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): drop misleading client-side column sorters

With server-side pagination, client sorters on `key` and `updated_at`
only reorder the current page while pretending to sort the full
dataset — users would see "sorted by name" but only the visible 50
rows would actually be sorted.

Remove the sorters. The backend already returns rows in
`updated_at DESC` order (sensible default for a memory view), and
users can narrow the result with the key-prefix filter.

Greptile also flagged missing `@@map` on the new model as a
"consistency" issue, but only 1 of 59 tables in this repo uses
`@@map` — the dominant pattern is to rely on Prisma's default
(model name == table name). Skipping that finding as a
false-positive on convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): compose visibility + key filters via explicit AND

Greptile P1 (filter-fragility): `where.update(vis)` was semantically
correct today, but dict-merging by key meant any future visibility
filter that grew a new top-level "OR" would silently clobber the
existing key filter.

Compose explicitly instead:

    where = {"AND": [key_filter, vis]}

Applied to both `list_memory` and `_find_memory_for_caller`. When
either side is empty (admin has no visibility filter; list has no
key filter), skip the wrapper and use the non-empty side directly
to keep the generated SQL clean.

Test fake's `_matches` now understands top-level `AND` too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(ui/memory): wrap write helpers with react-query useMutation

Previously the Memory view read via `useQuery` but called the raw
create/update/delete fetch helpers directly in handlers, tracking
loading state with a local `submitting` flag and invalidating state
via `refetch()`. That mixes two concerns:

- it skips react-query's mutation state (isPending / isError / isSuccess)
- `refetch()` only retouches the currently-mounted query instance, not
  other cached pages, so navigating back to an older page could show
  stale rows

Switch the three write paths to `useMutation`:

- `createMutation`, `updateMutation`, `deleteMutation` — each owns
  the mutation fn, success toast, and error toast.
- Success handlers invalidate the whole `["memoryList", ...]` prefix
  via `queryClient.invalidateQueries`, so every cached page refetches
  (pagination + filter-aware).
- Refresh button now invalidates instead of `refetch()`, keeping all
  behavior consistent.
- handleSave/handleDelete become thin adapters that call `.mutateAsync`;
  their errors are swallowed locally since the mutation's onError has
  already surfaced the toast.

Also tightened the edit modal's key-field tooltip to reflect the
actual global-unique semantics (was "Unique per user/team scope").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): close cross-user write gap + sanitize 500 errors (Veria)

Addresses two Veria findings:

**High — cross-user memory tampering via team membership.** The
visibility filter uses an OR (`user_id == caller OR team_id == caller`)
so team members can SEE each other's team-scoped rows. That's
intentional for list/get. But because PUT/DELETE used the same filter
to find the target row, any team member could overwrite or delete a
teammate's *personal* row whenever both `user_id` and `team_id` were
stamped on it — broader visibility was being silently treated as
broader authority.

New `_assert_write_access(row, caller)` enforces ownership for
mutations. Non-admin rules:

- The row's `user_id` must match the caller (personal ownership), OR
- The row has no `user_id` and its `team_id` matches the caller's
  team (a "pure team row" intended for shared writes).

Admins bypass the check. The same gate runs in PUT (both regular
and post-race-recovery branches) and DELETE.

**Medium — DB internals leaked through 500 detail.** Every `except`
block was raising `HTTPException(500, detail=str(e))`, which surfaces
Prisma error strings (table/column names, host:port, error class
names) to API callers. New `_internal_error()` helper logs the real
exception server-side and returns a generic, caller-safe `detail`.
Applied to create, list, upsert (general fallthrough), and delete.

Also tightened the race-recovery 409 message to drop the "in a
different scope" wording — the caller never needs to know whose
scope it lives in.

Tests (+5):
- teammate cannot overwrite personal row → 403
- teammate cannot delete personal row → 403
- teammate CAN modify pure team row (no user_id stamped) → 200
- admin bypasses write-auth → 200
- 500 response never echoes Prisma internals (table/host/class names)

25/25 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): require team admin to modify pure team rows

Tightens the write-authorization rule for "pure team rows" (rows with
no user_id stamped, only team_id) to match the pattern used by
team-management endpoints (`_is_user_team_admin` + `_is_user_org_admin_for_team`):

- Plain team members can READ team rows via the OR visibility filter
  (intentional, unchanged).
- Only PROXY_ADMIN, team admins of the row's team_id, or org admins
  for the team's organization may MODIFY them. Plain members get 403.

`_assert_write_access` is now async and takes the prisma_client so it
can fetch the team and run the existing `_is_user_team_admin` /
`_is_user_org_admin_for_team` helpers from
`litellm.proxy.management_endpoints.common_utils`. The org-admin path
is best-effort: it calls `get_user_object`, which depends on the
proxy_server module being initialized, so any exception there is
treated as "not an org admin" rather than crashing the request.

Tests:
- team admin can modify pure team row → 200
- plain team member cannot modify pure team row → 403
- plain team member cannot delete pure team row → 403

Updates the test fake to add a tiny `litellm_teamtable.find_unique`
implementation and a `_make_team(team_id, admin_user_ids=[...])`
helper.

27/27 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: mypy + UI page-metadata sync for memory page

Two CI failures:

1. mypy: `_find_memory_for_caller` had `key_filter` inferred as
   `dict[str, str]` (literal type) and the conditional `{"AND": [key_filter, vis]}`
   returned `dict[str, list[...]]`, so the join site failed
   `dict-item` typing. Annotate both intermediates as `dict` so mypy
   widens the value type.

2. UI test (`page_utils.test.ts > should have descriptions for all
   pages`): every leftnav entry must have a description in
   `page_metadata.ts`, and `memory` was missing. Added a one-line
   description, matching the style of neighboring entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Feat] Day-0 support for GPT-5.5 and GPT-5.5 Pro (#26449)

* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro

Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:

- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
  input/output/cached input
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
  per 1M input/output/cached input

Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.

No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.

Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields

* fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants

gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the
supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and
supports_minimal_reasoning_effort flags that their non-dated
counterparts define. Reasoning-effort routing in OpenAIGPT5Config is
fully capability-driven from these JSON flags — since an absent flag
is treated as False for opt-in levels (xhigh), users pinning to a
dated snapshot would silently lose xhigh support and diverge from the
base alias on logprobs + flexible temperature handling.

Copy the flags onto both dated variants so every dated snapshot
inherits the base model's reasoning-effort capability profile.

Adds a parametrized regression test that asserts
supports_{none,minimal,xhigh}_reasoning_effort parity between each
dated variant and its non-dated counterpart, preventing future drift
when new snapshots are added.

* fix(schema): close LiteLLM_MemoryTable model brace dropped during merge

The rebase against `litellm_internal_staging` (which added
`LiteLLM_AdaptiveRouterState` / `LiteLLM_AdaptiveRouterSession`) left
the closing brace of `LiteLLM_MemoryTable` missing in all three
schema copies — the next model declaration ended up parsed as a field
of the memory table, surfacing as the CI prisma error:

    error: This line is not a valid field or attribute definition.
      -->  schema.prisma:1250
       |
    1249 | // Per-(router, request_type, model) Beta posterior for the adaptive router.
    1250 | model LiteLLM_AdaptiveRouterState {

Add the missing `}` (and the standard blank line) after the memory
table's `@@index([team_id])` in `schema.prisma`,
`litellm/proxy/schema.prisma`, and
`litellm-proxy-extras/litellm_proxy_extras/schema.prisma`.

`prisma generate --schema litellm/proxy/schema.prisma` now runs clean;
27/27 memory unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>

2026-04-24 18:38:07 -07:00

0 lines

0 B

Python

Raw Blame History

The file is empty.