SLayer in 10 minutes¶
An agent that writes raw SQL can produce a perfectly valid query for the wrong metric — and lose the context behind every choice the next time it's invoked. SLayer takes a different bet: give the agent a typed map of the data, metrics that compose at query time, and a memory of the business context it has accumulated. The result is queries that stay consistent across calls and an agent that gets sharper every session.
Why just asking your agent to generate SQL is not optimal¶
- Business-logic complexity — agents must juggle objective rules (currency conversion, time zones) alongside undocumented conventions (which rows count as a "real" order, what defines a funnel stage).
- Undocumented table relationships — joins often exist by convention rather than as foreign keys, so schema introspection misses them.
- Consistency across calls — several reasonable SQL translations exist for the same natural-language question; the agent must land on the same one every time or it looks unreliable.
- Prompt overload — packing every rule into the system prompt means the model quietly ignores some of them once the context gets crowded.
- Hard to validate — writing general checks that catch wrong-but-syntactically-fine SQL (off-by-one time shifts, rows silently dropped by a bad JOIN) is far from trivial.
Each of the next sections shows how SLayer takes one of these off the agent's plate.
What an agent needs to actually analyse data¶
- See what data is available
- Get trusted metric definitions
- Ask using ad-hoc queries — including complex ones
- Know which metric to choose when
- Have business context for specific cases
- Find the relevant bits of all of the above quickly
The rest of the talk walks through each one.
Setup¶
We'll use the Jaffle Shop dataset — synthetic coffee-shop orders across 6 stores, 3 years of data, ~960k orders. Every cell below runs against it.
Prerequisites: pip install 'motley-slayer[embedding_search]' jafgen. OPENAI_API_KEY is optional — without it, search still works via BM25 + tantivy; with it, the dense-embedding channel also activates.
import os, sys
sys.path.insert(0, os.path.join(os.getcwd(), '..', '..', '..'))
sys.path.insert(0, os.getcwd())
from setup_talk import ensure_lightning_talk_demo
from slayer.async_utils import run_sync
from slayer.core.errors import MemoryNotFoundError
engine, storage, client, models = ensure_lightning_talk_demo()
See — auto-ingestion gave us models for free¶
Point SLayer at a database, it walks the schema, infers types, and discovers foreign keys. The agent gets a typed graph of your tables without anyone writing YAML. The setup cell above already did this — there's nothing else to configure.
You will probably want to add custom metrics and otherwise customize the models, but auto-ingestion allows you to get started quickly.
See Auto-Ingestion docs.
print(f'{len(models)} models ingested:')
for m in sorted(models, key=lambda m: m.name):
print(f' - {m.name:>10} ({len(m.columns)} columns, {len(m.joins)} joins)')
orders = next(m for m in models if m.name == 'orders')
print(f'orders: {len(orders.columns)} columns, {len(orders.joins)} direct joins')
print()
print('Columns:')
for col in orders.columns:
pk = ' [PK]' if col.primary_key else ''
print(f' {col.name:<14} {str(col.type):<10}{pk}')
print()
print('Joins discovered automatically:')
for j in orders.joins:
src, tgt = j.join_pairs[0]
print(f' orders.{src} -> {j.target_model}.{tgt}')
7 models ingested: - customers (2 columns, 0 joins) - items (3 columns, 2 joins) - orders (7 columns, 2 joins) - products (5 columns, 0 joins) - stores (4 columns, 0 joins) - supplies (5 columns, 1 joins) - tweets (4 columns, 1 joins) orders: 7 columns, 2 direct joins Columns: id TEXT [PK] customer_id TEXT ordered_at DATE store_id TEXT subtotal DOUBLE tax_paid DOUBLE order_total DOUBLE Joins discovered automatically: orders.customer_id -> customers.id orders.store_id -> stores.id
Get — one metric definition, the right rollup per question¶
order_total is a column — defined once in the model. The aggregation (sum / average / median / weighted-average / …) is the agent's choice at query time. The metric never drifts across calls because the definition lives in the model; the agent isn't forced to pre-commit to one aggregation method.
result = engine.execute_sync(query={
'source_model': 'orders',
'measures': [
{'formula': 'order_total:sum', 'name': 'revenue'},
{'formula': 'order_total:avg', 'name': 'avg_order'},
{'formula': 'order_total:median', 'name': 'median_order'},
],
'dimensions': ['stores.name'],
'order': [{'column': 'revenue', 'direction': 'desc'}],
})
assert len(result.data) > 0, 'multi-aggregation query returned no rows'
import pandas as pd
pd.DataFrame(result.data)
| orders.stores.name | orders.revenue | orders.avg_order | orders.median_order | |
|---|---|---|---|---|
| 0 | Brooklyn | 1037680.12 | 12.664211 | 7.28 |
| 1 | Philadelphia | 802124.24 | 12.768003 | 7.42 |
| 2 | Chicago | 438820.68 | 13.193250 | 7.43 |
| 3 | San Francisco | 352668.92 | 13.212038 | 7.52 |
| 4 | New Orleans | 55789.74 | 12.760691 | 7.28 |
Ask — complex queries compose like Lego¶
Real questions stack things: group by month, aggregate, compare to the prior month, compare to last year, filter to growth, take the top N. Doing this in raw SQL means CTEs, window functions, self-joins. SLayer lets the agent build the question out of primitives.
hero = {
'source_model': 'orders',
'measures': [
{'formula': 'order_total:sum', 'name': 'revenue'},
{'formula': 'change_pct(order_total:sum)', 'name': 'mom_growth'},
{
'formula': "order_total:sum / time_shift(order_total:sum, -1, 'year') - 1",
'name': 'yoy_growth',
},
],
'dimensions': ['stores.name'],
'time_dimensions': [{'dimension': 'ordered_at', 'granularity': 'month'}],
'filters': ['change_pct(order_total:sum) > 0'],
'order': [{'column': 'mom_growth', 'direction': 'desc'}],
'limit': 10,
}
result = engine.execute_sync(query=hero)
assert len(result.data) > 0, 'hero query returned no rows'
display(pd.DataFrame(result.data))
| orders.stores.name | orders.ordered_at | orders.revenue | orders.mom_growth | orders.yoy_growth | |
|---|---|---|---|---|---|
| 0 | Philadelphia | 2023-06-01 | 6498.32 | 15.660223 | NaN |
| 1 | Chicago | 2025-02-01 | 12728.15 | 2.163515 | NaN |
| 2 | San Francisco | 2025-06-01 | 15231.01 | 0.886053 | NaN |
| 3 | Chicago | 2025-06-01 | 19681.54 | 0.816735 | NaN |
| 4 | Brooklyn | 2024-06-01 | 24083.25 | 0.815521 | NaN |
| 5 | San Francisco | 2025-07-01 | 27545.57 | 0.808519 | NaN |
| 6 | Philadelphia | 2024-06-01 | 19436.72 | 0.789620 | 1.991038 |
| 7 | Philadelphia | 2023-07-01 | 11601.89 | 0.785368 | NaN |
| 8 | Philadelphia | 2025-07-01 | 32888.09 | 0.784384 | 0.140402 |
| 9 | Brooklyn | 2025-06-01 | 32618.51 | 0.668387 | 0.354406 |
# And this is the SQL generated by SLayer for the above, which neither you, nor the agent had to write.
print(result.sql)
SELECT
"orders.stores.name",
"orders.ordered_at",
"orders.revenue",
"orders.mom_growth",
"orders.yoy_growth"
FROM (
SELECT *
FROM (
WITH base AS (
SELECT
stores.name AS "orders.stores.name",
DATE_TRUNC('MONTH', orders.ordered_at) AS "orders.ordered_at",
SUM(orders.order_total) AS "orders.revenue"
FROM orders AS orders
LEFT JOIN stores AS stores
ON orders.store_id = stores.id
GROUP BY
stores.name,
DATE_TRUNC('MONTH', orders.ordered_at)
),
shifted__ts_mom_growth AS (
SELECT
stores.name AS "orders.stores.name",
DATE_TRUNC('MONTH', CAST(orders.ordered_at + INTERVAL '1' MONTH AS TIMESTAMP)) AS "orders.ordered_at",
SUM(orders.order_total) AS "orders.revenue"
FROM orders AS orders
LEFT JOIN stores AS stores
ON orders.store_id = stores.id
GROUP BY
stores.name,
DATE_TRUNC('MONTH', CAST(orders.ordered_at + INTERVAL '1' MONTH AS TIMESTAMP))
),
sjoin__ts_mom_growth AS (
SELECT base."orders.ordered_at", base."orders.revenue", base."orders.stores.name", shifted__ts_mom_growth."orders.revenue" AS "orders._ts_mom_growth"
FROM base
LEFT JOIN shifted__ts_mom_growth
ON base."orders.ordered_at" = shifted__ts_mom_growth."orders.ordered_at" AND base."orders.stores.name" = shifted__ts_mom_growth."orders.stores.name"
),
shifted__t0 AS (
SELECT
stores.name AS "orders.stores.name",
DATE_TRUNC('MONTH', CAST(orders.ordered_at + INTERVAL '1' YEAR AS TIMESTAMP)) AS "orders.ordered_at",
SUM(orders.order_total) AS "orders.revenue"
FROM orders AS orders
LEFT JOIN stores AS stores
ON orders.store_id = stores.id
GROUP BY
stores.name,
DATE_TRUNC('MONTH', CAST(orders.ordered_at + INTERVAL '1' YEAR AS TIMESTAMP))
),
sjoin__t0 AS (
SELECT sjoin__ts_mom_growth."orders._ts_mom_growth", sjoin__ts_mom_growth."orders.ordered_at", sjoin__ts_mom_growth."orders.revenue", sjoin__ts_mom_growth."orders.stores.name", shifted__t0."orders.revenue" AS "orders._t0"
FROM sjoin__ts_mom_growth
LEFT JOIN shifted__t0
ON sjoin__ts_mom_growth."orders.ordered_at" = shifted__t0."orders.ordered_at" AND sjoin__ts_mom_growth."orders.stores.name" = shifted__t0."orders.stores.name"
),
shifted__ts__ft0 AS (
SELECT
stores.name AS "orders.stores.name",
DATE_TRUNC('MONTH', CAST(orders.ordered_at + INTERVAL '1' MONTH AS TIMESTAMP)) AS "orders.ordered_at",
SUM(orders.order_total) AS "orders.revenue"
FROM orders AS orders
LEFT JOIN stores AS stores
ON orders.store_id = stores.id
GROUP BY
stores.name,
DATE_TRUNC('MONTH', CAST(orders.ordered_at + INTERVAL '1' MONTH AS TIMESTAMP))
),
sjoin__ts__ft0 AS (
SELECT sjoin__t0."orders._t0", sjoin__t0."orders._ts_mom_growth", sjoin__t0."orders.ordered_at", sjoin__t0."orders.revenue", sjoin__t0."orders.stores.name", shifted__ts__ft0."orders.revenue" AS "orders._ts__ft0"
FROM sjoin__t0
LEFT JOIN shifted__ts__ft0
ON sjoin__t0."orders.ordered_at" = shifted__ts__ft0."orders.ordered_at" AND sjoin__t0."orders.stores.name" = shifted__ts__ft0."orders.stores.name"
),
step2 AS (
SELECT
"orders._t0",
"orders._ts__ft0",
"orders._ts_mom_growth",
"orders.ordered_at",
"orders.revenue",
"orders.stores.name",
CASE WHEN "orders._ts_mom_growth" != 0 THEN ("orders.revenue" - "orders._ts_mom_growth") * 1.0 / "orders._ts_mom_growth" END AS "orders.mom_growth",
"orders.revenue" / "orders._t0" - 1 AS "orders.yoy_growth",
CASE WHEN "orders._ts__ft0" != 0 THEN ("orders.revenue" - "orders._ts__ft0") * 1.0 / "orders._ts__ft0" END AS "orders._ft0"
FROM sjoin__ts__ft0
)
SELECT
"orders._ft0",
"orders._t0",
"orders._ts__ft0",
"orders._ts_mom_growth",
"orders.mom_growth",
"orders.ordered_at",
"orders.revenue",
"orders.stores.name",
"orders.yoy_growth"
FROM step2
) AS _filtered
WHERE "orders._ft0" > 0
) AS _outer
ORDER BY
"orders.mom_growth" DESC
LIMIT 10
Queries as models — unlimited depth¶
Every SLayer query also defines a model: its result columns become dimensions, its numeric columns become measure sources, and the next query can source_model it by name. No bolt-on multi-stage syntax — this falls out of the design.
Example: average monthly revenue per store. Stage 1 sums revenue by store-month; stage 2 averages those monthly sums per store.
from IPython.display import Markdown
result = engine.execute_sync(query=[
{
'name': 'monthly_store_revenue',
'source_model': 'orders',
'measures': ['order_total:sum'],
'dimensions': ['stores.name'],
'time_dimensions': [{'dimension': 'ordered_at', 'granularity': 'month'}],
},
{
'source_model': 'monthly_store_revenue',
'measures': ['order_total_sum:avg'],
'dimensions': ['stores__name'],
'order': [{'column': 'order_total_sum_avg', 'direction': 'desc'}],
},
])
assert len(result.data) > 0, 'multistage query returned no rows'
# Note the pretty formatting of the numbers if you export them as markdown
Markdown(result.to_markdown())
| monthly_store_revenue.stores__name | monthly_store_revenue.order_total_sum_avg |
|---|---|
| Brooklyn | 34.6K |
| Chicago | 25.8K |
| San Francisco | 22.0K |
| Philadelphia | 21.7K |
| New Orleans | 9298 |
Have & Find — text context plus a way to find it¶
Most semantic layers stop at a description field per metric and leave the agent to guess which one to read. SLayer ships two things on top:
- Memories — free-form notes the agent writes, each linked to whichever entities it's about (and optionally carrying an example query).
- Search — one tool that retrieves both memories and canonical entities, fusing three retrieval channels.
See Memories docs and Search docs.
# You can save a learning linked to multiple entities
brooklyn = run_sync(client.save_memory(
learning=(
'Brooklyn switched to a new POS system in late 2024. '
'Order totals before 2025-01-01 are from the legacy system; '
'they exclude tax and use a different currency-rounding rule. '
'Exclude pre-2025-01-01 Brooklyn data when comparing to other stores.'
),
linked_entities=[
'jaffle_shop.orders.order_total',
'jaffle_shop.stores.name',
],
id='lightning.brooklyn_pos',
))
print(f'Saved memory id={brooklyn.memory_id}')
print(f'Linked to: {brooklyn.resolved_entities}')
if brooklyn.warnings:
print(f'Warnings: {brooklyn.warnings}')
Saved memory id=lightning.brooklyn_pos Linked to: ['jaffle_shop.orders.order_total', 'jaffle_shop.stores.name']
# You can save an example query with annotation, too, and it will be linked to the entities it contains
top_customers = run_sync(client.save_memory(
learning=(
'Top-5 customers by lifetime spend - known-good query pattern '
'used in the weekly CRM-ops dashboard.'
),
linked_entities={
'source_model': 'orders',
'measures': [{'formula': 'order_total:sum', 'name': 'lifetime_spend'}],
'dimensions': ['customers.name'],
'order': [{'column': 'lifetime_spend', 'direction': 'desc'}],
'limit': 5,
},
id='lightning.top_customers',
))
print(f'Saved query-bearing memory id={top_customers.memory_id}')
print(f'Auto-extracted entities: {top_customers.resolved_entities}')
Saved query-bearing memory id=lightning.top_customers Auto-extracted entities: ['jaffle_shop.orders', 'jaffle_shop.customers.name', 'jaffle_shop.orders.order_total']
Find — one tool, three channels¶
search retrieves both memories and canonical entities, merging three channels via Reciprocal Rank Fusion:
- BM25 over each memory's stored entity tags — strongest when the agent already has an entity reference in hand.
- tantivy full-text over learning text and canonical entities — natural-language questions.
- embeddings (cosine over a sidecar table) — dense similarity. Optional; degrades gracefully if
OPENAI_API_KEYisn't set.
The agent doesn't pick a channel. It asks once.
# You can search using a text query
resp = run_sync(client.search(
question='What should I know before comparing Brooklyn revenue to other stores?',
max_memories=3,
max_example_queries=2,
max_entities=0,
))
print('Memories:')
for hit in resp.memories:
print(f' [{hit.id}] score={hit.score:.3f}')
print(f' -> {hit.text[:100]}...')
assert any(m.id == 'lightning.brooklyn_pos' for m in resp.memories), \
'Brooklyn memory must surface in resp.memories for this question'
Memories:
[lightning.brooklyn_pos] score=0.033
-> Brooklyn switched to a new POS system in late 2024. Order totals before 2025-01-01 are from the lega...
# And/or entity references
resp = run_sync(client.search(
entities=['jaffle_shop.orders.order_total'],
max_memories=3,
max_entities=0,
))
print('Memories anchored to jaffle_shop.orders.order_total:')
for hit in resp.memories:
print(f' [{hit.id}] matched_entities={hit.matched_entities}')
print(f' -> {hit.text[:100]}...')
assert any(m.id == 'lightning.brooklyn_pos' for m in resp.memories), \
'Brooklyn memory must surface via the order_total entity tag'
Memories anchored to jaffle_shop.orders.order_total:
[lightning.brooklyn_pos] matched_entities=['jaffle_shop.orders.order_total']
-> Brooklyn switched to a new POS system in late 2024. Order totals before 2025-01-01 are from the lega...
# And specifically look for memories with queries
resp = run_sync(client.search(
question='How have analysts queried customer lifetime spend before?',
max_memories=0,
max_example_queries=2,
max_entities=5,
))
print('Example queries:')
for eq in resp.example_queries:
print(f' [{eq.id}] score={eq.score:.3f}')
print(f' -> {eq.text[:100]}...')
print()
print('Canonical entities:')
for ent in resp.entities:
print(f' [{ent.kind}] {ent.id} (score={ent.score:.3f})')
assert any(eq.id == 'lightning.top_customers' for eq in resp.example_queries), \
'top-customers example query must surface in resp.example_queries'
assert len(resp.entities) > 0, 'expected at least one canonical entity hit'
Example queries:
[lightning.top_customers] score=0.033
-> Top-5 customers by lifetime spend - known-good query pattern used in the weekly CRM-ops dashboard.
T...
Canonical entities:
[column] jaffle_shop.orders.customer_id (score=0.016)
[model] jaffle_shop.customers (score=0.016)
[model] jaffle_shop.orders (score=0.016)
[column] jaffle_shop.customers.name (score=0.016)
[column] jaffle_shop.customers.id (score=0.015)
Wrap-up¶
Here today:
- Open source (MIT)
- Interfaces: MCP, CLI, REST, Python, Flight SQL
- Auto-ingestion
- Postgres / MySQL/ DuckDB / ClickHouse / SQLite
- Multistage queries, named measures, custom aggregations,
- Memories with embedding + entity + full text search.
Coming next: proper knowledge graph support, so memories and entities form an explicit graph the agent can traverse.
Try it in Claude Code¶
One command wires SLayer up as an MCP server with the Jaffle Shop demo preloaded:
claude mcp add slayer -- uvx --from motley-slayer slayer mcp --demo
Then ask Claude in any project: "What stores are in jaffle_shop and which one has the highest revenue?" — it will call the same tools used above.
# Idempotent teardown: both ids may already be absent (e.g. partial prior
# run, or this cell rerun standalone). MemoryNotFoundError is the only
# expected failure mode here.
try:
run_sync(client.forget_memory('lightning.brooklyn_pos'))
except MemoryNotFoundError:
pass
try:
run_sync(client.forget_memory('lightning.top_customers'))
except MemoryNotFoundError:
pass
print('Memories removed. Re-running this notebook is idempotent.')
Memories removed. Re-running this notebook is idempotent.