Semantic search¶
SLayer ships a search tool that lets agents find both memories and
entities (datasources, models, columns, named measures, custom
aggregations) using up to three parallel retrieval channels merged by
Reciprocal Rank Fusion. It is the only retrieval surface — there is
no separate recall tool.
When you have entity references in hand, the BM25 channel pulls back the most relevant memories. When you don't yet know which entity to look at, the tantivy full-text channel surfaces entities matching your natural-language question. Both run together when both inputs are supplied.
A third channel (dense embeddings via litellm) is gated behind the
optional embedding_search extra. When the extra is not installed or
no provider API key is configured, the embedding channel emits a
warning into SearchResponse.warnings and search falls back to
BM25 + tantivy alone.
The three retrieval channels¶
Channel 1 — entity-overlap BM25 with implicit self-references¶
Inputs are resolved to canonical entity strings (<ds>, <ds>.<model>,
or <ds>.<model>.<leaf> — see
memories.md) and scored against
each memory's stored entity tags via BM25Plus. Memories with zero
overlap are excluded.
Implicit self-references (DEV-1513). Channel 1 contributes to BOTH the memory ranking AND the entity ranking via a single unifying model: every doc is conceptually tagged with an implicit reference to itself.
- A memory M is treated as having effective tags M.entities ∪ {memory:<M.id>}, so an entities=["memory:<id>"] ref surfaces the named memory itself at the top of the memory BM25 ranking.
- An entity E is treated as having a single tag {<canonical_of_E>}, so an entities=["<ds>.<model>.<col>"] ref surfaces the named entity at the top of the entities bucket.
Concretely:
{
"call": {"entities": ["mydb.orders.amount"], "max_memories": 0},
"response": {
"entities": [{"id": "mydb.orders.amount", "kind": "column"}]
}
}
{
"call": {"entities": ["memory:42"], "max_entities": 0},
"response": {
"memories": [{"id": "42", "matched_entities": ["memory:42"]}]
}
}
Filter rules for the new entity surfacing:
- memory:<id> refs participate in the memory ranking only; they never appear in the entities bucket.
- Refs not rooted at datasource (when set) drop with the warning "entity '<X>' is not rooted at datasource '<ds>'; dropped from entities bucket." The memory side fires the symmetric "memory:<id> is not rooted at datasource '<ds>'; dropped." when the named memory has no entities rooted at the requested datasource.
- Refs on a hidden model or hidden column drop from the entities bucket with "entity '<X>' is on a hidden model/column; dropped from entities bucket." BM25 over original memory tags is unaffected: memories tagged with that canonical still surface.
- An explicitly-named memory:<id> whose attached Memory.query has stale references emits the standard stale-query warning regardless of max_example_queries (the user explicitly asked for that memory; they deserve to know the query is broken).
Activated when entities and/or query is supplied to search.
Channel 2 — tantivy full-text over memories ∪ entities¶
A fresh tantivy in-memory index is built per call covering:
- one doc per non-hidden datasource;
- one doc per non-hidden model (excluding model meta);
- one doc per non-hidden column on each non-hidden model (including the cached Column.sampled snapshot, the column's sql expression, and its description / label / format / allowed_aggregations);
- one doc per named ModelMeasure (formula + description + label);
- one doc per custom Aggregation (formula + params + description);
- one doc per memory (learning text + canonical entity tags).
The index uses tantivy's en_stem analyzer (Porter stemmer + default
tokenizer that splits on _ and .), so a search for "shipped"
matches docs containing "shipping", and "customer" matches
customer_id. An exact-match canonical field also lets agents paste
a literal canonical string and get the doc back.
Activated when question is supplied.
Channel 3 — dense embedding similarity¶
A persistent embeddings sidecar table holds one row per indexable doc
(memory or non-hidden datasource / model / column / measure /
aggregation) under each configured embedding_model_name. On search,
the question is embedded once, the corpus matrix is loaded fresh, and
top-k cosine similarities are computed with numpy.
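The ranking step of this channel can be sketched as follows. This is a minimal illustration, not SLayer's implementation: it uses plain Python lists instead of numpy so that it runs without extra dependencies, and the function names, the warning wording, and the tie-break by id are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    question_vec: List[float],
    corpus: Dict[str, List[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Rank corpus rows by cosine similarity to the question; keep the best k.

    Rows whose dimension differs from the question vector are skipped and
    reported as warnings instead of raising (hypothetical warning wording).
    """
    warnings: List[str] = []
    scored: List[Tuple[str, float]] = []
    for doc_id, vec in corpus.items():
        if len(vec) != len(question_vec):
            warnings.append(f"dimension mismatch for '{doc_id}'; row skipped")
            continue
        scored.append((doc_id, cosine(question_vec, vec)))
    # Highest similarity first; ties broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k], warnings


corpus = {
    "mydb.orders": [1.0, 0.0],
    "mydb.orders.amount": [0.6, 0.8],
    "memory:42": [0.0, 1.0],
}
hits, warnings = top_k([1.0, 0.0], corpus, k=2)
```

With the toy vectors above, the two best hits are mydb.orders followed by mydb.orders.amount, and no warnings are produced.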
Activated when all of the following hold:
- question is supplied;
- the embedding_search extra is installed (pip install motley-slayer[embedding_search]);
- at least one embedding row exists for the active model name;
- the query-embedding call succeeds.
When any precondition is not met, the channel emits a one-line warning
into SearchResponse.warnings and contributes no rankings — search
continues via channels 1 and 2.
Configuration. SLAYER_EMBEDDING_MODEL (env var) selects the
embedding model, in <provider>/<model-name> litellm format. Defaults
to openai/text-embedding-3-small. Provider credentials
(OPENAI_API_KEY, AZURE_API_KEY, etc.) are read by litellm itself.
Refresh. Embedding rows are refreshed inline on the same write-side
edges as Column.sampled:
- slayer ingest / ingest_datasource_models MCP / POST /ingest: refreshes the datasource doc plus every visible model + its visible children (columns, named measures, aggregations);
- edit_model: refreshes the model's whole subtree;
- save_memory: refreshes that one memory.
Each refresh hashes the rendered indexed text and compares it to the
stored content_hash; the litellm call is skipped when the source text
hasn't changed since the last refresh, so idempotent re-runs are cheap.
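The hash-skip behaviour can be illustrated with a small sketch. It assumes a SHA256 content hash and uses a plain dict as a hypothetical stand-in for the embeddings table; the refresh function and the embed callback are names invented for this example, not SLayer APIs.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for the embeddings table:
# canonical_id -> (content_hash, vector)
Store = Dict[str, Tuple[str, List[float]]]


def content_hash(rendered_text: str) -> str:
    """Hex SHA256 digest of the rendered indexed text."""
    return hashlib.sha256(rendered_text.encode("utf-8")).hexdigest()


def refresh(
    store: Store,
    canonical_id: str,
    rendered_text: str,
    embed: Callable[[str], List[float]],
) -> bool:
    """Re-embed only when the rendered text changed.

    Returns True when embed() was called, False when the call was skipped.
    """
    new_hash = content_hash(rendered_text)
    existing = store.get(canonical_id)
    if existing is not None and existing[0] == new_hash:
        return False  # unchanged text: no provider call
    store[canonical_id] = (new_hash, embed(rendered_text))
    return True
```

Calling refresh twice with the same text triggers a single embed call; changing the text triggers a second one.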
Model changes. Switching SLAYER_EMBEDDING_MODEL mid-project leaves
old rows in place but inert — the search service reads only rows
matching the active model name. Re-run slayer ingest (or re-save
memories) to populate the new model's rows. A dimension-mismatch
between the question embedding and stored rows is detected and emits a
warning instead of crashing.
Failure mode. Per-entity embed failures (rate limits, transient
network errors, bad keys) are non-fatal: the failing row is simply not
written, and a warning is appended to the surfaced response (or to
IdempotentIngestResult.errors on ingest).
Reciprocal Rank Fusion¶
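Memory rankings from every active channel are fused via RRF (k = 60):

$$\mathrm{score}(d) = \sum_{c \,\in\, \text{channels ranking } d} \frac{1}{k + \mathrm{rank}_c(d)}, \qquad k = 60$$

Here rank_c(d) is the position of document d in channel c's ranking (counted from 1 in the standard RRF formulation); channels that did not return d contribute nothing.
Entity rankings from channels 1, 2, and 3 are RRF-fused the same way. Channel 1's entity ranking is the user-supplied canonical refs in supplied order (DEV-1513); channels 2 and 3 contribute fuzzy hits.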
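A minimal sketch of Reciprocal Rank Fusion with k = 60, assuming 1-based ranks and an alphabetical tie-break (both are assumptions of this example; the function name rrf_fuse is hypothetical):

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings into one list via RRF.

    Each channel adds 1 / (k + rank) to a document's score, where rank is the
    document's 1-based position in that channel's list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


channel_1 = ["mydb.orders.amount"]                    # user-supplied refs
channel_2 = ["mydb.orders", "mydb.orders.amount"]     # full-text hits
channel_3 = ["mydb.orders.amount", "mydb.customers"]  # embedding hits
fused = rrf_fuse([channel_1, channel_2, channel_3])
```

In this toy input mydb.orders.amount appears in all three lists, so it accumulates the largest score and ranks first, ahead of mydb.orders and mydb.customers.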
Per-bucket ranking invariance (DEV-1414)¶
Each channel produces a full per-kind ranking — channel 2 runs as
two kind-filtered tantivy queries (one over memory docs only, one over
entity docs only), and channel 3 partitions the embedding corpus by
entity_kind and ranks each side independently. There is no shared
candidate-pool budget across kinds, so for a fixed
(question, datasource, max_X) the membership and order of the
returned X bucket (memories / example_queries / entities) is a
pure function of the corpus + question + that one cap. Varying the
other two caps cannot move an id in or out of the returned list nor
reorder it. The max_* caps are pure post-fusion slice operations on
the three independent ranked lists.
Tool surface¶
search(
    entities: Optional[List[str]] = None,
    query: Optional[Union[SlayerQuery, dict]] = None,
    question: Optional[str] = None,
    datasource: Optional[str] = None,
    max_memories: int = 5,
    max_example_queries: int = 2,
    max_entities: int = 5,
) -> SearchResponse
| Surface | How to call |
|---|---|
| MCP | search(entities=[...], question="...") tool |
| REST | POST /search with SearchRequest body |
| CLI | slayer search --entity <e> --question "..." [--format json] |
| Python client | await client.search(entities=[...], question="...") |
datasource filter (DEV-1409)¶
All four surfaces accept an optional datasource: Optional[str] = None
argument. When set, every channel pre-filters its corpus to that one
datasource:
- Entity hits (channels 1, 2, and 3) include only docs whose canonical_id is rooted at the requested datasource: exact name match (<ds>) or strict dotted-path descendant (<ds>.<model>, <ds>.<model>.<leaf>). Character-prefix matches do NOT qualify, so datasource="prod" excludes a sibling datasource named prod_v2. Channel 1 (DEV-1513) drops a user-supplied entities= ref that isn't rooted at the requested datasource with a warning rather than silently surfacing it.
- Memory hits (channels 1, 2, 3, and the recency fallback) include only memories whose entities list has at least one entry rooted at the requested datasource. A memory that references both prod.* and staging.* surfaces from each datasource when each is filtered independently; an untagged memory drops out under any filter. A user-supplied entities=["memory:<id>"] ref whose memory was filtered out emits a symmetric warning.
- BM25 and tantivy IDF statistics reflect the filtered subset only (pre-filter, not post-filter). The embedding cosine corpus (channel 3) is filtered before the numpy matrix is built, so cosine scores are computed only against eligible rows.
Unknown datasource → ValueError (HTTP 400 on REST, MCP-formatted error
on the MCP tool). Validation runs before any corpus walk so typos
surface fast.
canonical_id rooting uses slayer.memories.resolver.canonical_id_rooted_at,
which encodes the same dotted-namespace rule the embedding cascade-delete
already enforces (DEV-1405). Datasource names cannot contain . (rejected
by DatasourceConfig.name + SlayerModel.data_source validators), so the
prefix match is unambiguous.
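The rooting rule described above can be sketched in a few lines. This is an illustration of the rule only, not the body of slayer.memories.resolver.canonical_id_rooted_at; the function names below are hypothetical.

```python
from typing import List


def is_rooted_at(canonical_id: str, datasource: str) -> bool:
    """True for the datasource itself or a strict dotted-path descendant.

    A bare character-prefix match (e.g. "prod_v2" against "prod") is rejected
    because the next character must be the "." separator.
    """
    return canonical_id == datasource or canonical_id.startswith(datasource + ".")


def memory_visible(memory_entities: List[str], datasource: str) -> bool:
    """A memory survives the filter when at least one of its tags is rooted."""
    return any(is_rooted_at(entity, datasource) for entity in memory_entities)
```

Because datasource names cannot contain a dot, appending "." to the name is enough to separate prod from prod_v2. An untagged memory has an empty entity list, so memory_visible returns False for it under any filter.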
Behaviour matrix¶
| entities/query | question | Result |
|---|---|---|
| set | set | All eligible channels run. Memories RRF-fused (channels 1 + 2 + 3); entities RRF-fused (channels 1 + 2 + 3, DEV-1513). Channel 3 is skipped with a warning when the embedding_search extra is missing. Query-bearing memories partitioned out to example_queries. |
| set | unset/empty | Channel 1 only. Memories partitioned by query presence; entity hits = the named refs themselves (DEV-1513). |
| unset/empty | set | Channels 2 and 3 (when eligible). Memories RRF-fused; entities RRF-fused. |
| unset/empty | unset/empty | Recency fallback: newest max_memories learning-only memories + newest max_example_queries query-bearing memories, with a warning. |
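The matrix can be read as a small decision function. The sketch below is an assumption-labelled paraphrase of the table, with hypothetical channel labels and a single embeddings_ready flag standing in for all of channel 3's preconditions:

```python
from typing import List


def plan_channels(
    has_entities_or_query: bool,
    has_question: bool,
    embeddings_ready: bool,
) -> List[str]:
    """Return which retrieval paths run for a given combination of inputs."""
    channels: List[str] = []
    if has_entities_or_query:
        channels.append("bm25")           # channel 1
    if has_question:
        channels.append("tantivy")        # channel 2
        if embeddings_ready:
            channels.append("embedding")  # channel 3, otherwise skipped with a warning
    # With no inputs at all, search falls back to the newest memories.
    return channels or ["recency_fallback"]
```

For example, plan_channels(True, False, True) yields only the BM25 channel, because channels 2 and 3 both require a question.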
Response shape¶
Memories are partitioned by Memory.query is None: learning-only
memories land in memories, query-bearing memories in
example_queries. The two lists are capped independently so a few
bulky example queries cannot crowd out small learning-only notes.
class MemoryHit(BaseModel):
    id: str                      # memory id (forget_memory(id=hit.id) works)
    score: float                 # RRF-fused (or single-channel raw)
    text: str                    # full indexed text (no truncation)
    matched_entities: List[str]  # canonical entities that channel-1 input
                                 # overlapped with the memory's tags;
                                 # stale tags are filtered before this is
                                 # computed (DEV-1428 lazy GC).

class ExampleQueryHit(BaseModel):
    id: str                      # memory id
    score: float                 # RRF-fused
    text: str                    # full indexed text
    matched_entities: List[str]
    query: SlayerQuery           # always set on this hit type

class EntityHit(BaseModel):
    id: str                      # canonical entity string
    kind: str                    # "datasource"|"model"|"column"|"measure"|"aggregation"
    score: float                 # RRF-fused across channels 1+2+3
                                 # (DEV-1513), or single-channel raw
                                 # when only one channel contributed
    text: str                    # full indexed text (no truncation)

class SearchResponse(BaseModel):
    memories: List[MemoryHit]               # learning-only (query is None)
    example_queries: List[ExampleQueryHit]  # query-bearing
    entities: List[EntityHit]
    resolved_input_entities: List[str]      # echo of the resolver output
    warnings: List[str]
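The partition-then-cap behaviour can be sketched as below. RankedMemory is a hypothetical stand-in for a fused memory result, and the query payloads are placeholders rather than real SlayerQuery objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RankedMemory:
    """Hypothetical fused memory result; the input list is ordered best first."""
    id: str
    query: Optional[dict] = None  # None means a learning-only memory


def partition(
    ranked: List[RankedMemory],
    max_memories: int = 5,
    max_example_queries: int = 2,
) -> Tuple[List[str], List[str]]:
    """Split by query presence, then cap each list independently."""
    learning = [m.id for m in ranked if m.query is None]
    examples = [m.id for m in ranked if m.query is not None]
    return learning[:max_memories], examples[:max_example_queries]


ranked = [
    RankedMemory("7", query={"note": "placeholder query"}),
    RankedMemory("3"),
    RankedMemory("9", query={"note": "placeholder query"}),
    RankedMemory("1"),
]
memories, example_queries = partition(ranked, max_memories=1, max_example_queries=2)
```

Since each cap is a slice on its own list, changing max_example_queries cannot alter which ids end up in memories, and vice versa.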
Lenient input validation (DEV-1428)¶
Unresolved entity / memory references in search(entities=...) and
search(query=...) are demoted to warnings rather than raising. The
dropped token does not appear in resolved_input_entities, but the
search proceeds against whatever did resolve. Examples:
- entities=["mydb.orders.amount", "memory:nonexistent"] returns a normal response; warnings includes "entity 'memory:nonexistent' dropped: No memory with id 'nonexistent'."
- A stale entity tag inside a saved memory does not contribute to channel-1 BM25 ranking, and is excluded from any hit's matched_entities list.
- An example_queries hit whose attached Memory.query references a vanished column gets the warning "example_query memory:<id>: attached query has stale references (...); re-save to clean." but is still surfaced with its stored query intact.
memory:<id> is also accepted in entities (cross-memory linking) —
the resolver checks the memory exists and the canonical form
round-trips as-is.
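The demote-to-warning pattern can be sketched as follows. This is a simplified illustration: it assumes inputs are already in canonical form, looks them up in plain sets, and uses a made-up warning for unknown non-memory entities. Only the memory warning wording is taken from the example above.

```python
from typing import Iterable, List, Set, Tuple


def resolve_leniently(
    refs: Iterable[str],
    known_entities: Set[str],
    known_memory_ids: Set[str],
) -> Tuple[List[str], List[str]]:
    """Keep refs that resolve; turn the rest into warnings instead of raising."""
    resolved: List[str] = []
    warnings: List[str] = []
    for ref in refs:
        if ref.startswith("memory:"):
            memory_id = ref[len("memory:"):]
            if memory_id in known_memory_ids:
                resolved.append(ref)  # memory refs round-trip as-is
            else:
                warnings.append(
                    f"entity '{ref}' dropped: No memory with id '{memory_id}'"
                )
        elif ref in known_entities:
            resolved.append(ref)
        else:
            # Hypothetical wording for an unknown datasource/model/column ref.
            warnings.append(f"entity '{ref}' dropped: not found")
    return resolved, warnings


resolved, warnings = resolve_leniently(
    ["mydb.orders.amount", "memory:nonexistent"],
    known_entities={"mydb.orders.amount"},
    known_memory_ids={"42"},
)
```

The dropped token is absent from resolved, which mirrors how resolved_input_entities echoes only what actually resolved.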
Sample-value cache¶
For richer search results, every column carries three optional sample-value fields:
- Column.sampled: a formatted text snapshot. For categorical columns, the top-20 most-common values comma-joined; for high-cardinality categoricals (> 50 distinct), the top-20 plus a "... (N distinct)" suffix carrying the true total. For numeric / temporal columns, the min .. max range.
- Column.sampled_values (DEV-1480): the structured top-50-by-frequency list for categorical columns. Stays None for numeric / temporal columns. Consumers comparing predicate literals against actual stored values should read this field directly, because text-splitting sampled is ambiguous for values that themselves contain commas (e.g. "R$ 1,000–3,000").
- Column.distinct_count (DEV-1480): the true total cardinality at profile time. Set for every profiled categorical column (computed via a secondary count_distinct query when overflow is detected so the count is exact, not capped). Stays None for numeric / temporal columns.
All three are populated:
- on every slayer ingest / ingest_datasource_models MCP call / POST /ingest for every table-backed model in the touched datasource;
- on slayer search refresh-samples [--data-source X] [--model M ...];
- on edit_model (column edits → that column; model-level filter / sql / source-query body change → every column);
- lazily on inspect_model when the cached value is missing (write-back best-effort);
- lazily inside search() itself for any column hit whose persisted sampled_values is stale (DEV-1516). The post-fusion column-hit hook groups hits by (data_source, model_name): refreshes within a model serialise (the storage write is a model-level read-modify-write); refreshes across different models run concurrently via asyncio.gather. When search() is constructed without an engine (storage-only contexts), the hook is a silent no-op.
Cache validity for categorical columns requires sampled_values is not None —
v6 (legacy sampled only) models re-profile on the next inspect_model
or search() column hit so the structured field gets populated.
sql-mode and query-backed models are silently skipped in v1.
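The concurrency shape of the lazy refresh inside search() (serial within a model, concurrent across models) might look like the sketch below. The ColumnRef tuple, the function names, and the refresh_one callback are hypothetical; only the grouping key and the use of asyncio.gather come from the description above.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Tuple

ColumnRef = Tuple[str, str, str]  # (data_source, model_name, column_name)


async def refresh_stale_columns(
    hits: List[ColumnRef],
    refresh_one: Callable[[ColumnRef], Awaitable[None]],
) -> None:
    """Refresh column hits: one at a time per model, models in parallel."""
    by_model: Dict[Tuple[str, str], List[ColumnRef]] = defaultdict(list)
    for hit in hits:
        by_model[(hit[0], hit[1])].append(hit)

    async def refresh_model(columns: List[ColumnRef]) -> None:
        # Sequential awaits keep the model-level read-modify-write from racing.
        for column in columns:
            await refresh_one(column)

    # One coroutine per model; gather runs them concurrently.
    await asyncio.gather(*(refresh_model(cols) for cols in by_model.values()))
```

Serialising within a model avoids two refreshes overwriting each other's changes to the same stored model, while unrelated models do not wait on one another.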
How sample values surface in search results¶
The per-column doc rendered by slayer/search/render.py:render_column_text
prefers the structured sampled_values list (full top-50) over the
20-truncated sampled text. When sampled_values is populated:
Column: warehouse.orders.status
Type: TEXT
Description: Order status.
Sample values: ["paid", "refunded", "cancelled", "pending", …] ← JSON-encoded, all 50
Distinct count: 12345 ← only when distinct_count > len(sampled_values)
The list is rendered as a JSON array (not comma-joined) so values that
themselves contain commas — "R$ 1,000–3,000", locale-formatted numbers,
multi-clause labels — survive unambiguously to the consumer. This is why
DEV-1480 introduced the structured sampled_values field in the first
place; comma-joining it back to a flat string would re-introduce the
exact ambiguity it was meant to solve.
When sampled_values is None (numeric / temporal columns, or legacy
v6 data, or rare overflow-with-failed-count_distinct rows), the renderer
falls back to the persisted sampled text — which already carries the
... (N distinct) suffix for the legacy overflow case, so no extra
Distinct count line is emitted. An empty sampled_values=[] list is
authoritative-empty: the line is skipped entirely (no fallback to stale
sampled).
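The fallback rules above can be summarised in a small sketch. It is not the body of render_column_text; the function name is hypothetical, and reusing the "Sample values:" label for the legacy text fallback is an assumption.

```python
import json
from typing import List, Optional


def render_sample_lines(
    sampled_values: Optional[List[str]],
    sampled: Optional[str],
    distinct_count: Optional[int],
) -> List[str]:
    """Return the sample-related lines of a column doc."""
    if sampled_values is None:
        # Numeric / temporal or legacy data: fall back to the persisted text,
        # which already carries any "... (N distinct)" suffix.
        return [f"Sample values: {sampled}"] if sampled else []
    if not sampled_values:
        # Authoritative-empty list: skip the line, never fall back to stale text.
        return []
    lines = ["Sample values: " + json.dumps(sampled_values, ensure_ascii=False)]
    if distinct_count is not None and distinct_count > len(sampled_values):
        lines.append(f"Distinct count: {distinct_count}")
    return lines
```

The three branches correspond to the three cases in the text: structured list present, structured list absent, and structured list present but empty.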
This same text feeds both the per-column search index doc AND
EntityHit.text returned by search() — single renderer, single
source of truth. inspect_model's markdown ## Columns table is the
all-columns-at-once surface and continues to show the 20-truncated
sampled text per column for readability on wide models. JSON
inspect_model output already carries the full sampled_values list.
Known limitation. The refresh hook runs after RRF fusion, on the
top-K hits being returned. Ranking (BM25 / tantivy / embeddings) still
operates on whatever the corpus held at index-build time. A query whose
only match against a column is a newly-revealed value in positions 21-50
may still fail to surface that column. The text the agent sees IS
refreshed; tantivy / embeddings will catch up on the next
slayer ingest content-hash pass.
Index design notes¶
- The tantivy index is built fresh on every search call in v1 (no persistence, no invalidation logic). For typical SLayer setups (tens to low-hundreds of models, tens to low-thousands of memories) this is fast; persistent on-disk indexing is a future follow-up.
- meta is excluded from indexed text because it is arbitrary user JSON.
- Hidden models and hidden columns are skipped entirely from the index.
- Each tantivy doc has four schema fields: id (raw), kind (raw), canonical (raw, exact-match), text (en-stemmed + tokenised).
Embedding sidecar design notes¶
- Stored, not rebuilt per call. Rows live in an indexed embeddings SQLite table: in the main .db file for SQLiteStorage, or at <base_dir>/embeddings.db for YAMLStorage (DEV-1405). Keyed by (canonical_id, embedding_model_name). Both backends share the same SQL through a SidecarEmbeddingStore helper. Search loads the corpus matrix fresh per call and runs cosine similarity in numpy.
- Same render pipeline as tantivy (slayer/search/render.py): every doc that goes into the tantivy index also feeds the embedding text.
- Refresh is inline on the same write-side edges as Column.sampled: ingest, edit_model, save_memory. SHA256 content hash makes idempotent re-runs cheap. The hot path (EmbeddingService._apply_pending) issues one batched get_embeddings_for_canonical_ids for the hash-skip filter and one batched save_embeddings for the persist step (DEV-1405), so refresh cost is independent of subtree size. Memories are included in the slayer ingest / --ingest-on-startup per-datasource refresh (DEV-1416), filtered to memories with at least one canonical entity rooted at the current datasource, so embeddings.db can be repaired by re-running ingest, with no separate slayer embeddings refresh step required.
- Cascade semantics (DEV-1405 fix): delete_embeddings_for_canonical matches the canonical id exactly OR as a strict dotted-path descendant (<root>.<...>), never as a character prefix. So delete_memory(4) removes only memory:4 (not memory:42, memory:43, …); delete_datasource("orders") does not touch a sibling datasource named orders_archive; delete_model("orders", "customers") does not touch a sibling customers_v2.
- Optional pip extra: pip install motley-slayer[embedding_search] installs litellm + numpy. When omitted, the embedding channel emits a one-line warning and contributes nothing.
- Storage shape: embeddings are stored as JSON lists of floats (portable, debuggable, dialect-neutral). ~6 KB per 1536-dim row.