Measures and Aggregations¶
In SLayer, a measure is a named row-level SQL expression. The aggregation (sum, avg, count, etc.) is specified at query time using colon syntax: revenue:sum, *:count, tax_rate:weighted_avg(weight=subtotal).
This notebook demonstrates the full range of aggregation features with working code.
Prerequisites: pip install motley-slayer
import os
import sys
sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", ".."))
sys.path.insert(0, os.path.join(os.getcwd(), "..", "jaffle_data"))
from setup_jaffle import ensure_jaffle_shop
engine, storage, models = ensure_jaffle_shop()
Basic Colon Syntax¶
Every field formula specifies both the measure and the aggregation, separated by a colon. *:count is the universal row count.
result = engine.execute_sync(
query={
"source_model": "orders",
"measures": ["*:count", "order_total:sum", "order_total:avg", "order_total:min", "order_total:max"],
}
)
row = result.data[0]
print(f"Orders: {row['orders._count']:,}")
print(f"Total: ${row['orders.order_total_sum']:,.2f}")
print(f"Average: ${row['orders.order_total_avg']:.2f}")
print(f"Min: ${row['orders.order_total_min']:.2f}")
print(f"Max: ${row['orders.order_total_max']:.2f}")
Orders: 661,146 Total: $7,317,140.30 Average: $11.07 Min: $0.00 Max: $105.35
Multiple Aggregations on One Measure¶
One measure, many aggregations — no need to define order_total_sum, order_total_avg, etc. as separate measures.
result = engine.execute_sync(
query={
"source_model": "orders",
"measures": ["*:count", "order_total:sum", "order_total:avg"],
"dimensions": ["stores.name"],
"order": [{"column": "order_total_sum", "direction": "desc"}],
"limit": 5,
}
)
print(f"{'Store':<20} {'Orders':>8} {'Total':>14} {'Average':>10}")
print("-" * 54)
for row in result.data:
print(
f"{row['orders.stores.name']:<20} {row['orders._count']:>8,} ${row['orders.order_total_sum']:>13,.2f} ${row['orders.order_total_avg']:>9.2f}"
)
Store Orders Total Average ------------------------------------------------------ Brooklyn 256,532 $ 2,793,658.90 $ 10.89 Philadelphia 192,450 $ 2,099,292.44 $ 10.91 Chicago 106,290 $ 1,183,765.61 $ 11.14 San Francisco 90,401 $ 1,059,640.66 $ 11.72 New Orleans 15,473 $ 180,782.69 $ 11.68
Arithmetic on Aggregated Measures¶
Aggregated measures compose naturally in arithmetic expressions.
result = engine.execute_sync(
query={
"source_model": "orders",
"measures": [
"*:count",
"order_total:sum",
{"formula": "order_total:sum / *:count", "name": "aov", "label": "Avg Order Value"},
],
"dimensions": ["stores.name"],
"order": [{"column": "order_total_sum", "direction": "desc"}],
"limit": 5,
}
)
for row in result.data:
print(f"{row['orders.stores.name']}: AOV = ${row['orders.aov']:.2f}")
Brooklyn: AOV = $10.89 Philadelphia: AOV = $10.91 Chicago: AOV = $11.14 San Francisco: AOV = $11.72 New Orleans: AOV = $11.68
Transforms on Aggregated Measures¶
All transform functions (cumsum, change, time_shift, rank, etc.) wrap aggregated measure refs.
result = engine.execute_sync(
query={
"source_model": "orders",
"measures": [
"order_total:sum",
{"formula": "cumsum(order_total:sum)", "name": "running_total"},
{"formula": "change(order_total:sum)", "name": "mom_change"},
],
"time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}],
"order": [{"column": "ordered_at", "direction": "asc"}],
"limit": 6,
}
)
print(f"{'Month':<12} {'Revenue':>14} {'Running Total':>14} {'MoM Change':>12}")
print("-" * 54)
for row in result.data:
month = str(row["orders.ordered_at"])[:7]
rev = row["orders.order_total_sum"]
running = row["orders.running_total"]
change = row["orders.mom_change"]
change_str = f"${change:>11,.2f}" if change is not None else f"{'N/A':>12}"
print(f"{month:<12} ${rev:>13,.2f} ${running:>13,.2f} {change_str}")
Month Revenue Running Total MoM Change ------------------------------------------------------ 2023-05 $ 15,102.20 $ 15,102.20 N/A 2023-06 $ 23,098.27 $ 38,200.47 $ 7,996.07 2023-07 $ 29,880.94 $ 68,081.41 $ 6,782.67 2023-08 $ 38,628.40 $ 106,709.81 $ 8,747.46 2023-09 $ 44,370.20 $ 151,080.01 $ 5,741.80 2023-10 $ 47,540.71 $ 198,620.72 $ 3,170.51
Cross-Model Aggregated Measures¶
Cross-model measures use dot syntax for the join path, colon for the aggregation. customers.*:count means COUNT(*) on the joined customers model.
result = engine.execute_sync(
query={
"source_model": "orders",
"measures": [
"*:count",
"order_total:sum",
"customers.*:count",
],
"time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}],
"order": [{"column": "ordered_at", "direction": "asc"}],
"limit": 6,
}
)
print(f"{'Month':<12} {'Orders':>8} {'Revenue':>14} {'Customers':>10}")
print("-" * 46)
for row in result.data:
month = str(row["orders.ordered_at"])[:7]
orders = row["orders._count"]
rev = row["orders.order_total_sum"]
custs = row["orders.customers._count"]
print(f"{month:<12} {orders:>8,} ${rev:>13,.2f} {custs:>10,}")
Month Orders Revenue Customers ---------------------------------------------- 2023-05 1,335 $ 15,102.20 2,590 2023-06 2,107 $ 23,098.27 2,590 2023-07 2,671 $ 29,880.94 2,590 2023-08 3,466 $ 38,628.40 2,590 2023-09 3,988 $ 44,370.20 2,590 2023-10 4,352 $ 47,540.71 2,590
Inspecting Model Measures¶
Auto-ingested models now have one measure per column — not five.
# In v2 every column carries its data type and (optionally) an explicit
# `allowed_aggregations` whitelist. There's no separate row-level measures
# list — every column can be aggregated at query time via colon syntax.
orders_model = next(m for m in models if m.name == "orders")
print(f"orders_model has {len(orders_model.columns)} columns:")
for col in orders_model.columns:
sql_str = col.sql or col.name
print(f" {col.name:<20} sql: {sql_str:<20} type: {col.type}")
print("\nAny column can be aggregated at query time via colon syntax,")
print("subject to the column's type and allowed_aggregations whitelist.")
print("Primary keys default to count / count_distinct only.")
print("Plus *:count is always available for COUNT(*).")
orders_model has 7 columns: id sql: id type: string customer_id sql: customer_id type: string ordered_at sql: ordered_at type: date store_id sql: store_id type: string subtotal sql: subtotal type: number tax_paid sql: tax_paid type: number order_total sql: order_total type: number Any column can be aggregated at query time via colon syntax, subject to the column's type and allowed_aggregations whitelist. Primary keys default to count / count_distinct only. Plus *:count is always available for COUNT(*).
Custom Aggregation: Weighted Average¶
Let's define a model with a custom weighted average aggregation.
# Define a custom aggregation alongside built-ins. v2 unifies row-level
# definitions into Column; the custom Aggregation lives on the model.
#
# Each store has a `tax_rate`, but stores differ in size — a simple AVG of
# tax_rate would give every store equal weight. We want the *effective* tax
# rate paid across the business, weighted by order revenue. We define this on
# a custom orders model so both `tax_rate` (via the join to stores) and the
# weight `subtotal` are available row-level.
from slayer.core.enums import DataType
from slayer.core.models import Aggregation, AggregationParam, Column, ModelJoin, SlayerModel
orders_custom = SlayerModel(
name="orders_custom",
sql_table="orders",
data_source="jaffle_shop",
columns=[
Column(name="id", sql="id", type=DataType.TEXT, primary_key=True),
Column(name="store_id", sql="store_id", type=DataType.TEXT),
Column(name="subtotal", sql="subtotal", type=DataType.DOUBLE),
Column(name="store_tax_rate", sql="stores.tax_rate", type=DataType.DOUBLE),
],
joins=[ModelJoin(target_model="stores", join_pairs=[["store_id", "id"]])],
aggregations=[
Aggregation(
name="weighted_avg_tax",
formula="SUM({value} * {weight}) / NULLIF(SUM({weight}), 0)",
params=[AggregationParam(name="weight", sql="subtotal")],
description="Tax rate weighted by order subtotal (effective tax rate)",
),
],
)
print(f"Custom aggregations on orders_custom: {[a.name for a in orders_custom.aggregations]}")
Custom aggregations on orders_custom: ['weighted_avg_tax']
Allowed Aggregations Whitelist¶
Restrict which aggregations make sense for a given measure.
# Restrict per-column aggregation eligibility via `allowed_aggregations`. The
# whitelist overrides the type-default eligibility rule (numeric columns get
# sum/avg/min/max/etc by default; PK columns are always count-only).
from slayer.core.enums import DataType
from slayer.core.models import Column, SlayerModel
restricted_model = SlayerModel(
name="restricted_orders",
sql_table="orders",
data_source="jaffle_shop",
columns=[
Column(name="store_id", sql="store_id", type=DataType.TEXT),
Column(
name="subtotal", sql="subtotal", type=DataType.DOUBLE,
allowed_aggregations=["sum", "avg"],
),
],
)
sub_col = next(c for c in restricted_model.columns if c.name == "subtotal")
print(f"subtotal column allows: {sub_col.allowed_aggregations}")
subtotal column allows: ['sum', 'avg']
Generated SQL¶
The colon syntax compiles to standard SQL aggregation functions.
result = engine.execute_sync(
query={
"source_model": "orders",
"measures": ["*:count", "order_total:sum", "order_total:avg"],
"dimensions": ["stores.name"],
"limit": 3,
}
)
print("Generated SQL:")
print(result.sql)
Generated SQL: SELECT stores.name AS "orders.stores.name", COUNT(*) AS "orders._count", SUM(orders.order_total) AS "orders.order_total_sum", AVG(orders.order_total) AS "orders.order_total_avg" FROM orders AS orders LEFT JOIN stores AS stores ON orders.store_id = stores.id GROUP BY stores.name LIMIT 3
Summary¶
| Concept | Syntax | Example |
|---|---|---|
| Aggregation at query time | measure:agg |
revenue:sum, price:avg |
| Row count | *:count |
Always available, no measure needed |
| Non-null count | col:count |
email:count |
| Distinct count | col:count_distinct |
customer_id:count_distinct |
| First/last by time | col:first, col:last |
balance:last(updated_at) |
| Weighted average | col:weighted_avg(weight=w) |
tax_rate:weighted_avg(weight=subtotal) |
| Percentile | col:percentile(p=0.95) |
latency:percentile(p=0.99) |
| Median | col:median |
Shorthand for percentile(p=0.5) |
| Custom formula | Define in model aggregations |
score:trimmed_mean(lo=10, hi=90) |
| Whitelist | allowed_aggregations on measure |
Restricts valid aggregations |
See the aggregations guide for the full explanation.