Auto-Ingestion: From Database to Semantic Models¶
TL;DR: SLayer can introspect a database schema and automatically generate models with dimensions, measures, and join relationships matching the schema.
This notebook shows what happens under the hood.
We use the Jaffle Shop dataset — a synthetic e-commerce schema with 7 tables and foreign key relationships between them.
Prerequisites: pip install motley-slayer
See also: Auto-Ingestion docs | Models docs
import os
import sys
import duckdb
import sqlalchemy as sa
sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", ".."))
sys.path.insert(0, os.path.join(os.getcwd(), "..", "jaffle_data"))
from setup_jaffle import ensure_jaffle_shop, DB_PATH
engine, storage, models = ensure_jaffle_shop()
The Jaffle Shop Schema¶
The database has 7 tables connected by foreign keys:
customers <── orders ──> stores
│
items ──> products <── supplies
customers <── tweets
Let's peek at the data:
conn = duckdb.connect(DB_PATH, read_only=True)
for table in ["customers", "stores", "products", "orders", "items", "supplies", "tweets"]:
count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
print(f"{table:>15}: {count:>8,} rows")
conn.close()
customers: 2,590 rows
stores: 6 rows
products: 10 rows
orders: 661,146 rows
items: 981,268 rows
supplies: 65 rows
tweets: 215,535 rows
Step 1: FK Graph Discovery¶
The first thing auto-ingestion does is introspect foreign key constraints and build a directed dependency graph. Each edge means "this table has a FK pointing to that table."
The graph must be acyclic — if cycles are found, SLayer raises a RollupGraphError.
For each table, SLayer computes the transitive closure: all tables reachable by following FK chains. This is used for column introspection (knowing which tables' columns might be queried), but joins are only created for direct FKs — multi-hop paths are resolved at query time by walking the join graph.
from slayer.core.models import DatasourceConfig
from slayer.engine.ingestion import _build_fk_graph, _compute_transitive_closure
ds = DatasourceConfig(name="jaffle_shop", type="duckdb", database=DB_PATH)
sa_engine = sa.create_engine(ds.resolve_env_vars().get_connection_string())
inspector = sa.inspect(sa_engine)
table_names = inspector.get_table_names()
fk_graph = _build_fk_graph(inspector=inspector, table_names=table_names, schema=None)
print("FK Graph (table -> tables it references):")
for table in sorted(fk_graph):
refs = sorted(fk_graph[table])
print(f" {table} -> {refs}")
print()
print("Tables with no FK references (leaf tables):")
for t in sorted(table_names):
if t not in fk_graph:
print(f" {t}")
FK Graph (table -> tables it references): items -> ['orders', 'products'] orders -> ['customers', 'stores'] supplies -> ['products'] tweets -> ['customers'] Tables with no FK references (leaf tables): customers products stores
# Transitive closure: what tables are reachable from items?
for table in ["orders", "items"]:
reachable = _compute_transitive_closure(fk_graph, table)
print(f"{table} can reach: {sorted(reachable)}")
sa_engine.dispose()
orders can reach: ['customers', 'stores'] items can reach: ['customers', 'orders', 'products', 'stores']
Notice that items transitively reaches customers and stores through orders, even though it has no direct FK to them. At query time, SLayer walks the join graph through orders to reach these tables.
Step 2: Join Generation¶
Each table's own FK relationships become ModelJoin objects — one per FK. Each join specifies:
- target_model: the table being joined
- join_pairs:
[[source_column, target_column]]— the columns to join on
Only direct FKs are stored. Tables reachable via multiple hops (e.g. items → orders → customers) are resolved at query time by walking each intermediate model's own joins.
orders_model = next(m for m in models if m.name == "orders")
items_model = next(m for m in models if m.name == "items")
print("=== orders model joins (direct FKs only) ===")
for j in orders_model.joins:
src, tgt = j.join_pairs[0]
print(f" -> {j.target_model} ON {src} = {tgt}")
print()
print("=== items model joins (direct FKs only) ===")
for j in items_model.joins:
src, tgt = j.join_pairs[0]
print(f" -> {j.target_model} ON {src} = {tgt}")
print()
print("Note: items has no join to customers or stores —")
print("those are reached at query time via orders.joins.")
=== orders model joins (direct FKs only) === -> customers ON customer_id = id -> stores ON store_id = id === items model joins (direct FKs only) === -> orders ON order_id = id -> products ON sku = sku Note: items has no join to customers or stores — those are reached at query time via orders.joins.
Step 3: Dimension & Measure Generation¶
For each table, SLayer generates:
- Dimensions for every column. See the joins on how to refer to joined dimensions and measures.
- Measures:
countalways; one measure per numeric non-ID column (withsqlset to the column name); non-numeric non-ID columns also get a measure
# Iterate the unified `columns` list — v2 merges v1's separate dimensions and
# row-level measures into one list. Each Column carries data type, primary_key,
# allowed_aggregations (whitelist), and filter (CASE-WHEN at aggregation time).
print("=== orders model columns ===")
print(f"{'name':<25} {'sql':<25} {'type':<10}")
print("-" * 70)
for col in orders_model.columns:
pk = " [PK]" if col.primary_key else ""
sql_str = col.sql or col.name
print(f"{col.name:<25} {sql_str:<25} {str(col.type):<10}{pk}")
=== orders model columns === name sql type ---------------------------------------------------------------------- id id string [PK] customer_id customer_id string ordered_at ordered_at date store_id store_id string subtotal subtotal number tax_paid tax_paid number order_total order_total number
# In v2 `model.measures` holds *named formulas* (ModelMeasure objects), not
# row-level measure definitions. After auto-ingest the list is empty by
# default; agents/users populate it via edit_model with reusable formulas
# like `{"name": "aov", "formula": "revenue:sum / *:count"}`.
print("=== orders model measures (named formulas) ===")
print(f"{len(orders_model.measures)} formula(s) defined.")
=== orders model measures (named formulas) === 0 formula(s) defined.
Joined dimensions aren't stored on the model — they're resolved at query time.
Use dot syntax to walk the join graph, including through multiple hops.
# 1-hop: items -> products
result = engine.execute_sync(
query={
"source_model": "items",
"measures": ["*:count"],
"dimensions": ["products.name"],
"order": [{"column": {"name": "_count"}, "direction": "desc"}],
"limit": 5,
}
)
print("=== 1-hop: items -> products ===")
for row in result.data:
print(
f" {row['items.products.name']}: {row['items._count']} items"
)
# Multi-hop: items -> orders -> customers
result = engine.execute_sync(
query={
"source_model": "items",
"measures": ["*:count"],
"dimensions": ["orders.customers.name"],
"order": [{"column": {"name": "_count"}, "direction": "desc"}],
"limit": 5,
}
)
print("\n=== Multi-hop: items -> orders -> customers ===")
for row in result.data:
print(
f" {row['items.orders.customers.name']}: {row['items._count']} items"
)
=== 1-hop: items -> products === chai and mighty: 151699 items adele-ade: 151175 items vanilla ice: 150853 items for richer or pourover : 150795 items tangaroo: 150723 items === Multi-hop: items -> orders -> customers === Nicole Jackson: 1706 items Jacob Brown: 1464 items Michael Conner: 1440 items Kyle Evans: 1431 items Timothy Evans: 1390 items
Summary¶
Auto-ingestion discovers the database structure and generates a complete semantic layer:
- FK graph identifies table relationships and validates they're acyclic
- Transitive closure determines which tables are reachable (used for column introspection)
- Direct joins — one
ModelJoinper FK on the source table; multi-hop paths are resolved at query time by walking the graph - Dimensions and measures are generated with proper naming conventions
No join-related SQL is baked into the models — JOINs are constructed dynamically at query time from each model's direct join metadata.
Next: See the Example Queries notebook to query this data, or the Joins notebook for a deep dive into join mechanics.