Joins in SLayer¶

TL;DR: Use the dot syntax to refer to measures/dimensions from joined models inside queries; use the __ syntax to refer to the tables underlying joined models when writing SQL snippets inside dimension, measure, model filter definitions.

Joins connect models so that dimensions and measures from one model are accessible when querying another. SLayer only supports LEFT JOINs — the kind most commonly used for data enrichment.

If you think of left joins as directed edges of a graph whose vertices are models, then the dimensions, measures, and filters in a model have access to columns from any model reachable in the join graph.

This notebook illustrates every aspect of joins with working code.

Prerequisites: pip install motley-slayer

In [1]:

Copied!

import os
import sys

sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", ".."))
sys.path.insert(0, os.path.join(os.getcwd(), "..", "jaffle_data"))

from setup_jaffle import ensure_jaffle_shop

engine, storage, models = ensure_jaffle_shop()
import os
import sys

sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", ".."))
sys.path.insert(0, os.path.join(os.getcwd(), "..", "jaffle_data"))

from setup_jaffle import ensure_jaffle_shop

engine, storage, models = ensure_jaffle_shop()

Basic Joins¶

A join is defined by a target model and a list of join pairs — column pairs to join on. The orders model has two joins, auto-generated from its foreign keys to customers and stores:

In [2]:

Copied!





orders_model = next(m for m in models if m.name == "orders")

print("orders model joins:")
for j in orders_model.joins:
    print(f"  -> {j.target_model}  ON {j.join_pairs}")

# Query using a joined dimension
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum"],
        "dimensions": ["customers.name"],
        "order": [{"column": "order_total_sum", "direction": "desc"}],
        "limit": 5,
    }
)

print("\nTop 5 customers by revenue:")
for row in result.data:
    print(f"  {row['orders.customers.name']}: {row['orders._count']} orders, ${row['orders.order_total_sum']:,.2f}")
orders_model = next(m for m in models if m.name == "orders")

print("orders model joins:")
for j in orders_model.joins:
    print(f"  -> {j.target_model}  ON {j.join_pairs}")

# Query using a joined dimension
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum"],
        "dimensions": ["customers.name"],
        "order": [{"column": "order_total_sum", "direction": "desc"}],
        "limit": 5,
    }
)

print("\nTop 5 customers by revenue:")
for row in result.data:
    print(f"  {row['orders.customers.name']}: {row['orders._count']} orders, ${row['orders.order_total_sum']:,.2f}")

orders model joins:
  -> customers  ON [['customer_id', 'id']]
  -> stores  ON [['store_id', 'id']]

Top 5 customers by revenue:
  Michael Conner: 144 orders, $13,357.66
  Nicole Jackson: 817 orders, $13,349.50
  Timothy Evans: 139 orders, $12,724.40
  Jacob Brown: 722 orders, $12,381.73
  Erik Gordon: 132 orders, $12,310.45

Referencing Joined Models in Queries (Dot Syntax)¶

SLayer uses dot syntax to reference dimensions and measures from joined models:

1-hop: customers.name (orders -> customers)
Multi-hop: orders.customers.name (items -> orders -> customers)

The full path avoids ambiguity when there are multiple ways to reach a model.

In [3]:

Copied!





# 1-hop: orders -> stores
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count"],
        "dimensions": ["stores.name"],
    }
)

print("1-hop (orders -> stores):")
for row in result.data:
    print(f"  {row['orders.stores.name']}: {row['orders._count']} orders")
# 1-hop: orders -> stores
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count"],
        "dimensions": ["stores.name"],
    }
)

print("1-hop (orders -> stores):")
for row in result.data:
    print(f"  {row['orders.stores.name']}: {row['orders._count']} orders")

1-hop (orders -> stores):
  Brooklyn: 256532 orders
  San Francisco: 90401 orders
  New Orleans: 15473 orders
  Chicago: 106290 orders
  Philadelphia: 192450 orders

In [4]:

Copied!





# Multi-hop: items -> orders -> customers
result = engine.execute_sync(
    query={
        "source_model": "items",
        "measures": ["*:count"],
        "dimensions": ["orders.customers.name"],
        "order": [{"column": "_count", "direction": "desc"}],
        "limit": 5,
    }
)

print("Multi-hop (items -> orders -> customers):")
for row in result.data:
    print(
        f"  {row['items.orders.customers.name']}: {row['items._count']} line items"
    )
# Multi-hop: items -> orders -> customers
result = engine.execute_sync(
    query={
        "source_model": "items",
        "measures": ["*:count"],
        "dimensions": ["orders.customers.name"],
        "order": [{"column": "_count", "direction": "desc"}],
        "limit": 5,
    }
)

print("Multi-hop (items -> orders -> customers):")
for row in result.data:
    print(
        f"  {row['items.orders.customers.name']}: {row['items._count']} line items"
    )

Multi-hop (items -> orders -> customers):
  Nicole Jackson: 1706 line items
  Jacob Brown: 1464 line items
  Michael Conner: 1440 line items
  Kyle Evans: 1431 line items
  Timothy Evans: 1390 line items

Referencing Joined Models in SQL Snippets¶

When defining dimensions and measures at the model level (in YAML or Python), their sql fields use double underscores instead of dots for joined table aliases. This is because dots aren't valid in SQL identifiers.

Context	Syntax	Example
Query dimensions/measures	dots	`orders.customers.name`
Model SQL expressions	`__` aliases	`orders__customers.name`

SLayer substitutes the __ aliases for correct table aliases at query time.

In [5]:

Copied!





# The __ alias convention is for model-level SQL definitions (YAML / Python).
# Auto-ingestion doesn't generate these — joined columns are resolved at query time.
#
# Here's a manual example showing the two syntaxes side by side:

from slayer.core.models import Column

manual_col = Column(
    name="customer_name",
    sql="orders__customers.name",  # __ alias: used in model SQL definitions
    type="string",
)
print(f"Model-level SQL (__ alias):  sql={manual_col.sql!r}")
print("Query-level (dot syntax):    orders.customers.name")

# At query time, use dot syntax — SLayer resolves the __ aliases in generated SQL:
result = engine.execute_sync(
    query={
        "source_model": "items",
        "measures": ["*:count"],
        "dimensions": ["orders.customers.name"],
        "limit": 3,
    }
)

print(f"\nResult keys: {list(result.data[0].keys())}")
for row in result.data:
    print(f"  {row['items.orders.customers.name']}: {row['items._count']} items")
# The __ alias convention is for model-level SQL definitions (YAML / Python).
# Auto-ingestion doesn't generate these — joined columns are resolved at query time.
#
# Here's a manual example showing the two syntaxes side by side:

from slayer.core.models import Column

manual_col = Column(
    name="customer_name",
    sql="orders__customers.name",  # __ alias: used in model SQL definitions
    type="string",
)
print(f"Model-level SQL (__ alias):  sql={manual_col.sql!r}")
print("Query-level (dot syntax):    orders.customers.name")

# At query time, use dot syntax — SLayer resolves the __ aliases in generated SQL:
result = engine.execute_sync(
    query={
        "source_model": "items",
        "measures": ["*:count"],
        "dimensions": ["orders.customers.name"],
        "limit": 3,
    }
)

print(f"\nResult keys: {list(result.data[0].keys())}")
for row in result.data:
    print(f"  {row['items.orders.customers.name']}: {row['items._count']} items")

Model-level SQL (__ alias):  sql='orders__customers.name'
Query-level (dot syntax):    orders.customers.name

Result keys: ['items.orders.customers.name', 'items._count']
  Michelle Bentley: 347 items
  William Lewis: 344 items
  Megan Allen: 272 items

Auto-Ingesting Schemas¶

When auto-ingesting a database, SLayer introspects FK constraints and creates a ModelJoin for each direct FK on the source table. Multi-hop reachability (e.g. items → orders → customers) is resolved at query time by walking each intermediate model's own joins — it is not baked into the source model's join list.

In [6]:

Copied!





import sqlalchemy as sa

from setup_jaffle import DB_PATH
from slayer.core.models import DatasourceConfig
from slayer.engine.ingestion import _build_fk_graph

ds = DatasourceConfig(name="jaffle_shop", type="duckdb", database=DB_PATH)
sa_engine = sa.create_engine(ds.resolve_env_vars().get_connection_string())
inspector = sa.inspect(sa_engine)
fk_graph = _build_fk_graph(inspector=inspector, table_names=inspector.get_table_names(), schema=None)
sa_engine.dispose()

print("FK graph:")
for table in sorted(fk_graph):
    print(f"  {table} -> {sorted(fk_graph[table])}")

items_model = next(m for m in models if m.name == "items")
orders_model = next(m for m in models if m.name == "orders")

print("\nitems joins (direct FKs only):")
for j in items_model.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<20} = {tgt}")

print("\norders joins (direct FKs only):")
for j in orders_model.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<20} = {tgt}")

print("\nMulti-hop: items reaches customers via orders.joins")
import sqlalchemy as sa

from setup_jaffle import DB_PATH
from slayer.core.models import DatasourceConfig
from slayer.engine.ingestion import _build_fk_graph

ds = DatasourceConfig(name="jaffle_shop", type="duckdb", database=DB_PATH)
sa_engine = sa.create_engine(ds.resolve_env_vars().get_connection_string())
inspector = sa.inspect(sa_engine)
fk_graph = _build_fk_graph(inspector=inspector, table_names=inspector.get_table_names(), schema=None)
sa_engine.dispose()

print("FK graph:")
for table in sorted(fk_graph):
    print(f"  {table} -> {sorted(fk_graph[table])}")

items_model = next(m for m in models if m.name == "items")
orders_model = next(m for m in models if m.name == "orders")

print("\nitems joins (direct FKs only):")
for j in items_model.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<20} = {tgt}")

print("\norders joins (direct FKs only):")
for j in orders_model.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<20} = {tgt}")

print("\nMulti-hop: items reaches customers via orders.joins")

FK graph:
  items -> ['orders', 'products']
  orders -> ['customers', 'stores']
  supplies -> ['products']
  tweets -> ['customers']

items joins (direct FKs only):
  -> orders       ON order_id             = id
  -> products     ON sku                  = sku

orders joins (direct FKs only):
  -> customers    ON customer_id          = id
  -> stores       ON store_id             = id

Multi-hop: items reaches customers via orders.joins

Diamond Joins¶

A diamond join occurs when the same table is reachable via multiple FK paths. For example:

orders -> customers -> regions
orders -> warehouses -> regions

SLayer treats each path as a separate copy of the target table, using path-based aliases to disambiguate:

customers.regions.name -> table alias customers__regions
warehouses.regions.name -> table alias warehouses__regions

The Jaffle Shop schema doesn't have a natural diamond, so let's construct another schema for illustration:

In [7]:

Copied!





from joins_utils import setup_diamond_example

diamond_engine, diamond_storage, diamond_models, diamond_db_path, diamond_work_dir = setup_diamond_example()

diamond_orders = next(m for m in diamond_models if m.name == "orders")
diamond_customers = next(m for m in diamond_models if m.name == "customers")
diamond_warehouses = next(m for m in diamond_models if m.name == "warehouses")

print("orders joins (direct FKs only):")
for j in diamond_orders.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<25} = {tgt}")

print("\ncustomers joins:")
for j in diamond_customers.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<25} = {tgt}")

print("\nwarehouses joins:")
for j in diamond_warehouses.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<25} = {tgt}")

print("\nDiamond: orders reaches regions via customers.joins AND warehouses.joins")
print("Each path gets a unique alias: customers__regions vs warehouses__regions")
from joins_utils import setup_diamond_example

diamond_engine, diamond_storage, diamond_models, diamond_db_path, diamond_work_dir = setup_diamond_example()

diamond_orders = next(m for m in diamond_models if m.name == "orders")
diamond_customers = next(m for m in diamond_models if m.name == "customers")
diamond_warehouses = next(m for m in diamond_models if m.name == "warehouses")

print("orders joins (direct FKs only):")
for j in diamond_orders.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<25} = {tgt}")

print("\ncustomers joins:")
for j in diamond_customers.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<25} = {tgt}")

print("\nwarehouses joins:")
for j in diamond_warehouses.joins:
    src, tgt = j.join_pairs[0]
    print(f"  -> {j.target_model:<12} ON {src:<25} = {tgt}")

print("\nDiamond: orders reaches regions via customers.joins AND warehouses.joins")
print("Each path gets a unique alias: customers__regions vs warehouses__regions")

orders joins (direct FKs only):
  -> customers    ON customer_id               = id
  -> warehouses   ON warehouse_id              = id

customers joins:
  -> regions      ON region_id                 = id

warehouses joins:
  -> regions      ON region_id                 = id

Diamond: orders reaches regions via customers.joins AND warehouses.joins
Each path gets a unique alias: customers__regions vs warehouses__regions

In [8]:

Copied!





# Query both paths simultaneously
result = diamond_engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "amount:sum"],
        "dimensions": [
            "customers.regions.name",
            "warehouses.regions.name",
        ],
    }
)

print(f"{'Customer Region':<18} {'Warehouse Region':<18} {'Orders':>7} {'Amount':>10}")
print("-" * 56)
for row in result.data:
    cr = row["orders.customers.regions.name"]
    wr = row["orders.warehouses.regions.name"]
    print(f"{cr:<18} {wr:<18} {row['orders._count']:>7} ${row['orders.amount_sum']:>9,.2f}")
# Query both paths simultaneously
result = diamond_engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "amount:sum"],
        "dimensions": [
            "customers.regions.name",
            "warehouses.regions.name",
        ],
    }
)

print(f"{'Customer Region':<18} {'Warehouse Region':<18} {'Orders':>7} {'Amount':>10}")
print("-" * 56)
for row in result.data:
    cr = row["orders.customers.regions.name"]
    wr = row["orders.warehouses.regions.name"]
    print(f"{cr:<18} {wr:<18} {row['orders._count']:>7} ${row['orders.amount_sum']:>9,.2f}")

Customer Region    Warehouse Region    Orders     Amount
--------------------------------------------------------
East               West                     2 $   600.00
Central            West                     1 $   225.00
Central            Central                  1 $   150.00
West               East                     3 $   750.00
West               West                     1 $   175.00
East               Central                  2 $   375.00

Recombining Diamond Joins with Filters¶

By default, each path to the same table produces independent copies. If you want to enforce that the two paths resolve to the same row (re-creating a true diamond), add a filter equating the two paths.

Since this filter equates two SQL-level table aliases (customers__regions vs warehouses__regions), it belongs on the model — added via ModelExtension at query time. Model filters use the __ alias syntax for multi-hop join paths (see SQL vs DSL):

If you want to make that filter permanent, you can of course just add it to the model in question (orders in the example below) instead of adding it inside a query.

In [9]:

Copied!





# Only keep orders where customer and warehouse are in the same region.
# The filter equates two SQL aliases, so it goes on the model via ModelExtension.
result = diamond_engine.execute_sync(
    query={
        "source_model": {
            "source_name": "orders",
            "filters": ["customers__regions.name = warehouses__regions.name"],
        },
        "measures": ["*:count", "amount:sum"],
        "dimensions": ["customers.regions.name"],
    }
)

print("Orders where customer and warehouse are in the same region:")
for row in result.data:
    print(f"  {row['orders.customers.regions.name']}: {row['orders._count']} orders, ${row['orders.amount_sum']:,.2f}")
# Only keep orders where customer and warehouse are in the same region.
# The filter equates two SQL aliases, so it goes on the model via ModelExtension.
result = diamond_engine.execute_sync(
    query={
        "source_model": {
            "source_name": "orders",
            "filters": ["customers__regions.name = warehouses__regions.name"],
        },
        "measures": ["*:count", "amount:sum"],
        "dimensions": ["customers.regions.name"],
    }
)

print("Orders where customer and warehouse are in the same region:")
for row in result.data:
    print(f"  {row['orders.customers.regions.name']}: {row['orders._count']} orders, ${row['orders.amount_sum']:,.2f}")

Orders where customer and warehouse are in the same region:
  West: 4 orders, $925.00
  East: 4 orders, $975.00
  Central: 2 orders, $375.00

Dynamic Joins (ModelExtension)¶

Joins can be added at query time via ModelExtension, without modifying the stored model. This is useful for:

Ad-hoc joins to lookup tables
Joining to models created dynamically from queries
Adding context-specific enrichment

More on this in the post (and companion notebook) on multistage queries.

Summary¶

SLayer's join system provides:

Feature	Description
LEFT JOINs	The only join type, used for data enrichment
Dot syntax	`customers.name`, `orders.customers.regions.name` in queries
`__` aliases	`orders__customers.name` in model SQL definitions
Auto-ingestion	FK constraints become `ModelJoin` objects automatically
Diamond joins	Same table via multiple paths gets separate path-based aliases
Dynamic joins	`ModelExtension` adds joins at query time without modifying models

See the Models docs and Ingestion docs for the full reference.