Joins in SLayer¶
TL;DR: Use the dot syntax to refer to measures/dimensions from joined models inside queries; use the __ syntax to refer to the tables underlying joined models when writing SQL snippets inside dimension, measure, model filter definitions.
Joins connect models so that dimensions and measures from one model are accessible when querying another. SLayer only supports LEFT JOINs — the kind most commonly used for data enrichment.
If you think of left joins as directed edges of a graph whose vertices are models, then the dimensions, measures, and filters in a model have access to columns from any model reachable in the join graph.
This notebook illustrates every aspect of joins with working code.
See also: Models — Joins | Queries — Cross-Model Measures | Ingestion — Diamond Joins
Prerequisites: pip install motley-slayer
import os
import sys
sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", ".."))
sys.path.insert(0, os.path.join(os.getcwd(), "..", "jaffle_data"))
from setup_jaffle import ensure_jaffle_shop
engine, storage, models = ensure_jaffle_shop()
Basic Joins¶
A join is defined by a target model and a list of join pairs — column pairs to join on. The orders model has two joins, auto-generated from its foreign keys to customers and stores:
orders_model = next(m for m in models if m.name == "orders")
print("orders model joins:")
for j in orders_model.joins:
print(f" -> {j.target_model} ON {j.join_pairs}")
# Query using a joined dimension
result = engine.execute_sync(
query={
"source_model": "orders",
"measures": ["*:count", "order_total:sum"],
"dimensions": ["customers.name"],
"order": [{"column": "order_total_sum", "direction": "desc"}],
"limit": 5,
}
)
print("\nTop 5 customers by revenue:")
for row in result.data:
print(f" {row['orders.customers.name']}: {row['orders._count']} orders, ${row['orders.order_total_sum']:,.2f}")
orders model joins: -> customers ON [['customer_id', 'id']] -> stores ON [['store_id', 'id']] Top 5 customers by revenue: Michael Conner: 144 orders, $13,357.66 Nicole Jackson: 817 orders, $13,349.50 Timothy Evans: 139 orders, $12,724.40 Jacob Brown: 722 orders, $12,381.73 Erik Gordon: 132 orders, $12,310.45
Referencing Joined Models in Queries (Dot Syntax)¶
SLayer uses dot syntax to reference dimensions and measures from joined models:
- 1-hop:
customers.name(orders -> customers) - Multi-hop:
orders.customers.name(items -> orders -> customers)
The full path avoids ambiguity when there are multiple ways to reach a model.
# 1-hop: orders -> stores
result = engine.execute_sync(
query={
"source_model": "orders",
"measures": ["*:count"],
"dimensions": ["stores.name"],
}
)
print("1-hop (orders -> stores):")
for row in result.data:
print(f" {row['orders.stores.name']}: {row['orders._count']} orders")
1-hop (orders -> stores): Brooklyn: 256532 orders San Francisco: 90401 orders New Orleans: 15473 orders Chicago: 106290 orders Philadelphia: 192450 orders
# Multi-hop: items -> orders -> customers
result = engine.execute_sync(
query={
"source_model": "items",
"measures": ["*:count"],
"dimensions": ["orders.customers.name"],
"order": [{"column": "_count", "direction": "desc"}],
"limit": 5,
}
)
print("Multi-hop (items -> orders -> customers):")
for row in result.data:
print(
f" {row['items.orders.customers.name']}: {row['items._count']} line items"
)
Multi-hop (items -> orders -> customers): Nicole Jackson: 1706 line items Jacob Brown: 1464 line items Michael Conner: 1440 line items Kyle Evans: 1431 line items Timothy Evans: 1390 line items
Referencing Joined Models in SQL Snippets¶
When defining dimensions and measures at the model level (in YAML or Python), their sql fields use double underscores instead of dots for joined table aliases. This is because dots aren't valid in SQL identifiers.
| Context | Syntax | Example |
|---|---|---|
| Query dimensions/measures | dots | orders.customers.name |
| Model SQL expressions | __ aliases |
orders__customers.name |
SLayer substitutes the __ aliases for correct table aliases at query time.
# The __ alias convention is for model-level SQL definitions (YAML / Python).
# Auto-ingestion doesn't generate these — joined columns are resolved at query time.
#
# Here's a manual example showing the two syntaxes side by side:
from slayer.core.models import Column
manual_col = Column(
name="customer_name",
sql="orders__customers.name", # __ alias: used in model SQL definitions
type="string",
)
print(f"Model-level SQL (__ alias): sql={manual_col.sql!r}")
print("Query-level (dot syntax): orders.customers.name")
# At query time, use dot syntax — SLayer resolves the __ aliases in generated SQL:
result = engine.execute_sync(
query={
"source_model": "items",
"measures": ["*:count"],
"dimensions": ["orders.customers.name"],
"limit": 3,
}
)
print(f"\nResult keys: {list(result.data[0].keys())}")
for row in result.data:
print(f" {row['items.orders.customers.name']}: {row['items._count']} items")
Model-level SQL (__ alias): sql='orders__customers.name' Query-level (dot syntax): orders.customers.name Result keys: ['items.orders.customers.name', 'items._count'] Michelle Bentley: 347 items William Lewis: 344 items Megan Allen: 272 items
Auto-Ingesting Schemas¶
When auto-ingesting a database, SLayer introspects FK constraints and creates a ModelJoin for each direct FK on the source table. Multi-hop reachability (e.g. items → orders → customers) is resolved at query time by walking each intermediate model's own joins — it is not baked into the source model's join list.
import sqlalchemy as sa
from setup_jaffle import DB_PATH
from slayer.core.models import DatasourceConfig
from slayer.engine.ingestion import _build_fk_graph
ds = DatasourceConfig(name="jaffle_shop", type="duckdb", database=DB_PATH)
sa_engine = sa.create_engine(ds.resolve_env_vars().get_connection_string())
inspector = sa.inspect(sa_engine)
fk_graph = _build_fk_graph(inspector=inspector, table_names=inspector.get_table_names(), schema=None)
sa_engine.dispose()
print("FK graph:")
for table in sorted(fk_graph):
print(f" {table} -> {sorted(fk_graph[table])}")
items_model = next(m for m in models if m.name == "items")
orders_model = next(m for m in models if m.name == "orders")
print("\nitems joins (direct FKs only):")
for j in items_model.joins:
src, tgt = j.join_pairs[0]
print(f" -> {j.target_model:<12} ON {src:<20} = {tgt}")
print("\norders joins (direct FKs only):")
for j in orders_model.joins:
src, tgt = j.join_pairs[0]
print(f" -> {j.target_model:<12} ON {src:<20} = {tgt}")
print("\nMulti-hop: items reaches customers via orders.joins")
FK graph: items -> ['orders', 'products'] orders -> ['customers', 'stores'] supplies -> ['products'] tweets -> ['customers'] items joins (direct FKs only): -> orders ON order_id = id -> products ON sku = sku orders joins (direct FKs only): -> customers ON customer_id = id -> stores ON store_id = id Multi-hop: items reaches customers via orders.joins
Diamond Joins¶
A diamond join occurs when the same table is reachable via multiple FK paths. For example:
orders -> customers -> regions
orders -> warehouses -> regions
SLayer treats each path as a separate copy of the target table, using path-based aliases to disambiguate:
customers.regions.name-> table aliascustomers__regionswarehouses.regions.name-> table aliaswarehouses__regions
The Jaffle Shop schema doesn't have a natural diamond, so let's construct another schema for illustration:
from joins_utils import setup_diamond_example
diamond_engine, diamond_storage, diamond_models, diamond_db_path, diamond_work_dir = setup_diamond_example()
diamond_orders = next(m for m in diamond_models if m.name == "orders")
diamond_customers = next(m for m in diamond_models if m.name == "customers")
diamond_warehouses = next(m for m in diamond_models if m.name == "warehouses")
print("orders joins (direct FKs only):")
for j in diamond_orders.joins:
src, tgt = j.join_pairs[0]
print(f" -> {j.target_model:<12} ON {src:<25} = {tgt}")
print("\ncustomers joins:")
for j in diamond_customers.joins:
src, tgt = j.join_pairs[0]
print(f" -> {j.target_model:<12} ON {src:<25} = {tgt}")
print("\nwarehouses joins:")
for j in diamond_warehouses.joins:
src, tgt = j.join_pairs[0]
print(f" -> {j.target_model:<12} ON {src:<25} = {tgt}")
print("\nDiamond: orders reaches regions via customers.joins AND warehouses.joins")
print("Each path gets a unique alias: customers__regions vs warehouses__regions")
orders joins (direct FKs only): -> customers ON customer_id = id -> warehouses ON warehouse_id = id customers joins: -> regions ON region_id = id warehouses joins: -> regions ON region_id = id Diamond: orders reaches regions via customers.joins AND warehouses.joins Each path gets a unique alias: customers__regions vs warehouses__regions
# Query both paths simultaneously
result = diamond_engine.execute_sync(
query={
"source_model": "orders",
"measures": ["*:count", "amount:sum"],
"dimensions": [
"customers.regions.name",
"warehouses.regions.name",
],
}
)
print(f"{'Customer Region':<18} {'Warehouse Region':<18} {'Orders':>7} {'Amount':>10}")
print("-" * 56)
for row in result.data:
cr = row["orders.customers.regions.name"]
wr = row["orders.warehouses.regions.name"]
print(f"{cr:<18} {wr:<18} {row['orders._count']:>7} ${row['orders.amount_sum']:>9,.2f}")
Customer Region Warehouse Region Orders Amount -------------------------------------------------------- East West 2 $ 600.00 Central West 1 $ 225.00 Central Central 1 $ 150.00 West East 3 $ 750.00 West West 1 $ 175.00 East Central 2 $ 375.00
Recombining Diamond Joins with Filters¶
By default, each path to the same table produces independent copies. If you want to enforce that the two paths resolve to the same row (re-creating a true diamond), add a filter equating the two paths.
Since this filter equates two SQL-level table aliases (customers__regions vs warehouses__regions), it belongs on the model — added via ModelExtension at query time. Model filters use the __ alias syntax for multi-hop join paths (see SQL vs DSL):
If you want to make that filter permanent, you can of course just add it to the model in question (orders in the example below) instead of adding it inside a query.
# Only keep orders where customer and warehouse are in the same region.
# The filter equates two SQL aliases, so it goes on the model via ModelExtension.
result = diamond_engine.execute_sync(
query={
"source_model": {
"source_name": "orders",
"filters": ["customers__regions.name = warehouses__regions.name"],
},
"measures": ["*:count", "amount:sum"],
"dimensions": ["customers.regions.name"],
}
)
print("Orders where customer and warehouse are in the same region:")
for row in result.data:
print(f" {row['orders.customers.regions.name']}: {row['orders._count']} orders, ${row['orders.amount_sum']:,.2f}")
Orders where customer and warehouse are in the same region: West: 4 orders, $925.00 East: 4 orders, $975.00 Central: 2 orders, $375.00
Dynamic Joins (ModelExtension)¶
Joins can be added at query time via ModelExtension, without modifying the stored model. This is useful for:
- Ad-hoc joins to lookup tables
- Joining to models created dynamically from queries
- Adding context-specific enrichment
More on this in the post (and companion notebook) on multistage queries.
Summary¶
SLayer's join system provides:
| Feature | Description |
|---|---|
| LEFT JOINs | The only join type, used for data enrichment |
| Dot syntax | customers.name, orders.customers.regions.name in queries |
__ aliases |
orders__customers.name in model SQL definitions |
| Auto-ingestion | FK constraints become ModelJoin objects automatically |
| Diamond joins | Same table via multiple paths gets separate path-based aliases |
| Dynamic joins | ModelExtension adds joins at query time without modifying models |
See the Models docs and Ingestion docs for the full reference.