Measures and Aggregations¶

In SLayer, a measure is a named row-level SQL expression. The aggregation (sum, avg, count, etc.) is specified at query time using colon syntax: revenue:sum, *:count, tax_rate:weighted_avg(weight=subtotal).

This notebook demonstrates the full range of aggregation features with working code.

Prerequisites: pip install motley-slayer

In [1]:

Copied!

import os
import sys

sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", ".."))
sys.path.insert(0, os.path.join(os.getcwd(), "..", "jaffle_data"))

from setup_jaffle import ensure_jaffle_shop

engine, storage, models = ensure_jaffle_shop()
import os
import sys

sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", ".."))
sys.path.insert(0, os.path.join(os.getcwd(), "..", "jaffle_data"))

from setup_jaffle import ensure_jaffle_shop

engine, storage, models = ensure_jaffle_shop()

Basic Colon Syntax¶

Every field formula specifies both the measure and the aggregation, separated by a colon. *:count is the universal row count.

In [2]:

Copied!





result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum", "order_total:avg", "order_total:min", "order_total:max"],
    }
)

row = result.data[0]
print(f"Orders:  {row['orders._count']:,}")
print(f"Total:   ${row['orders.order_total_sum']:,.2f}")
print(f"Average: ${row['orders.order_total_avg']:.2f}")
print(f"Min:     ${row['orders.order_total_min']:.2f}")
print(f"Max:     ${row['orders.order_total_max']:.2f}")
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum", "order_total:avg", "order_total:min", "order_total:max"],
    }
)

row = result.data[0]
print(f"Orders:  {row['orders._count']:,}")
print(f"Total:   ${row['orders.order_total_sum']:,.2f}")
print(f"Average: ${row['orders.order_total_avg']:.2f}")
print(f"Min:     ${row['orders.order_total_min']:.2f}")
print(f"Max:     ${row['orders.order_total_max']:.2f}")

Orders:  661,146
Total:   $7,317,140.30
Average: $11.07
Min:     $0.00
Max:     $105.35

Multiple Aggregations on One Measure¶

One measure, many aggregations — no need to define order_total_sum, order_total_avg, etc. as separate measures.

In [3]:

Copied!





result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum", "order_total:avg"],
        "dimensions": ["stores.name"],
        "order": [{"column": "order_total_sum", "direction": "desc"}],
        "limit": 5,
    }
)

print(f"{'Store':<20} {'Orders':>8} {'Total':>14} {'Average':>10}")
print("-" * 54)
for row in result.data:
    print(
        f"{row['orders.stores.name']:<20} {row['orders._count']:>8,} ${row['orders.order_total_sum']:>13,.2f} ${row['orders.order_total_avg']:>9.2f}"
    )
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum", "order_total:avg"],
        "dimensions": ["stores.name"],
        "order": [{"column": "order_total_sum", "direction": "desc"}],
        "limit": 5,
    }
)

print(f"{'Store':<20} {'Orders':>8} {'Total':>14} {'Average':>10}")
print("-" * 54)
for row in result.data:
    print(
        f"{row['orders.stores.name']:<20} {row['orders._count']:>8,} ${row['orders.order_total_sum']:>13,.2f} ${row['orders.order_total_avg']:>9.2f}"
    )

Store                  Orders          Total    Average
------------------------------------------------------
Brooklyn              256,532 $ 2,793,658.90 $    10.89
Philadelphia          192,450 $ 2,099,292.44 $    10.91
Chicago               106,290 $ 1,183,765.61 $    11.14
San Francisco          90,401 $ 1,059,640.66 $    11.72
New Orleans            15,473 $   180,782.69 $    11.68

Arithmetic on Aggregated Measures¶

Aggregated measures compose naturally in arithmetic expressions.

In [4]:

Copied!





result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": [
            "*:count",
            "order_total:sum",
            {"formula": "order_total:sum / *:count", "name": "aov", "label": "Avg Order Value"},
        ],
        "dimensions": ["stores.name"],
        "order": [{"column": "order_total_sum", "direction": "desc"}],
        "limit": 5,
    }
)

for row in result.data:
    print(f"{row['orders.stores.name']}: AOV = ${row['orders.aov']:.2f}")
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": [
            "*:count",
            "order_total:sum",
            {"formula": "order_total:sum / *:count", "name": "aov", "label": "Avg Order Value"},
        ],
        "dimensions": ["stores.name"],
        "order": [{"column": "order_total_sum", "direction": "desc"}],
        "limit": 5,
    }
)

for row in result.data:
    print(f"{row['orders.stores.name']}: AOV = ${row['orders.aov']:.2f}")

Brooklyn: AOV = $10.89
Philadelphia: AOV = $10.91
Chicago: AOV = $11.14
San Francisco: AOV = $11.72
New Orleans: AOV = $11.68

Transforms on Aggregated Measures¶

All transform functions (cumsum, change, time_shift, rank, etc.) wrap aggregated measure refs.

In [5]:

Copied!





result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": [
            "order_total:sum",
            {"formula": "cumsum(order_total:sum)", "name": "running_total"},
            {"formula": "change(order_total:sum)", "name": "mom_change"},
        ],
        "time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}],
        "order": [{"column": "ordered_at", "direction": "asc"}],
        "limit": 6,
    }
)

print(f"{'Month':<12} {'Revenue':>14} {'Running Total':>14} {'MoM Change':>12}")
print("-" * 54)
for row in result.data:
    month = str(row["orders.ordered_at"])[:7]
    rev = row["orders.order_total_sum"]
    running = row["orders.running_total"]
    change = row["orders.mom_change"]
    change_str = f"${change:>11,.2f}" if change is not None else f"{'N/A':>12}"
    print(f"{month:<12} ${rev:>13,.2f} ${running:>13,.2f} {change_str}")
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": [
            "order_total:sum",
            {"formula": "cumsum(order_total:sum)", "name": "running_total"},
            {"formula": "change(order_total:sum)", "name": "mom_change"},
        ],
        "time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}],
        "order": [{"column": "ordered_at", "direction": "asc"}],
        "limit": 6,
    }
)

print(f"{'Month':<12} {'Revenue':>14} {'Running Total':>14} {'MoM Change':>12}")
print("-" * 54)
for row in result.data:
    month = str(row["orders.ordered_at"])[:7]
    rev = row["orders.order_total_sum"]
    running = row["orders.running_total"]
    change = row["orders.mom_change"]
    change_str = f"${change:>11,.2f}" if change is not None else f"{'N/A':>12}"
    print(f"{month:<12} ${rev:>13,.2f} ${running:>13,.2f} {change_str}")

Month               Revenue  Running Total   MoM Change
------------------------------------------------------
2023-05      $    15,102.20 $    15,102.20          N/A
2023-06      $    23,098.27 $    38,200.47 $   7,996.07
2023-07      $    29,880.94 $    68,081.41 $   6,782.67
2023-08      $    38,628.40 $   106,709.81 $   8,747.46
2023-09      $    44,370.20 $   151,080.01 $   5,741.80
2023-10      $    47,540.71 $   198,620.72 $   3,170.51

Cross-Model Aggregated Measures¶

Cross-model measures use dot syntax for the join path, colon for the aggregation. customers.*:count means COUNT(*) on the joined customers model.

In [6]:

Copied!





result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": [
            "*:count",
            "order_total:sum",
            "customers.*:count",
        ],
        "time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}],
        "order": [{"column": "ordered_at", "direction": "asc"}],
        "limit": 6,
    }
)

print(f"{'Month':<12} {'Orders':>8} {'Revenue':>14} {'Customers':>10}")
print("-" * 46)
for row in result.data:
    month = str(row["orders.ordered_at"])[:7]
    orders = row["orders._count"]
    rev = row["orders.order_total_sum"]
    custs = row["orders.customers._count"]
    print(f"{month:<12} {orders:>8,} ${rev:>13,.2f} {custs:>10,}")
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": [
            "*:count",
            "order_total:sum",
            "customers.*:count",
        ],
        "time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}],
        "order": [{"column": "ordered_at", "direction": "asc"}],
        "limit": 6,
    }
)

print(f"{'Month':<12} {'Orders':>8} {'Revenue':>14} {'Customers':>10}")
print("-" * 46)
for row in result.data:
    month = str(row["orders.ordered_at"])[:7]
    orders = row["orders._count"]
    rev = row["orders.order_total_sum"]
    custs = row["orders.customers._count"]
    print(f"{month:<12} {orders:>8,} ${rev:>13,.2f} {custs:>10,}")

Month          Orders        Revenue  Customers
----------------------------------------------
2023-05         1,335 $    15,102.20      2,590
2023-06         2,107 $    23,098.27      2,590
2023-07         2,671 $    29,880.94      2,590
2023-08         3,466 $    38,628.40      2,590
2023-09         3,988 $    44,370.20      2,590
2023-10         4,352 $    47,540.71      2,590

Inspecting Model Measures¶

Auto-ingested models now have one measure per column — not five.

In [7]:

Copied!





# In v2 every column carries its data type and (optionally) an explicit
# `allowed_aggregations` whitelist. There's no separate row-level measures
# list — every column can be aggregated at query time via colon syntax.
orders_model = next(m for m in models if m.name == "orders")

print(f"orders_model has {len(orders_model.columns)} columns:")
for col in orders_model.columns:
    sql_str = col.sql or col.name
    print(f"  {col.name:<20} sql: {sql_str:<20} type: {col.type}")

print("\nAny column can be aggregated at query time via colon syntax,")
print("subject to the column's type and allowed_aggregations whitelist.")
print("Primary keys default to count / count_distinct only.")
print("Plus *:count is always available for COUNT(*).")
# In v2 every column carries its data type and (optionally) an explicit
# `allowed_aggregations` whitelist. There's no separate row-level measures
# list — every column can be aggregated at query time via colon syntax.
orders_model = next(m for m in models if m.name == "orders")

print(f"orders_model has {len(orders_model.columns)} columns:")
for col in orders_model.columns:
    sql_str = col.sql or col.name
    print(f"  {col.name:<20} sql: {sql_str:<20} type: {col.type}")

print("\nAny column can be aggregated at query time via colon syntax,")
print("subject to the column's type and allowed_aggregations whitelist.")
print("Primary keys default to count / count_distinct only.")
print("Plus *:count is always available for COUNT(*).")

orders_model has 7 columns:
  id                   sql: id                   type: string
  customer_id          sql: customer_id          type: string
  ordered_at           sql: ordered_at           type: date
  store_id             sql: store_id             type: string
  subtotal             sql: subtotal             type: number
  tax_paid             sql: tax_paid             type: number
  order_total          sql: order_total          type: number

Any column can be aggregated at query time via colon syntax,
subject to the column's type and allowed_aggregations whitelist.
Primary keys default to count / count_distinct only.
Plus *:count is always available for COUNT(*).

Custom Aggregation: Weighted Average¶

Let's define a model with a custom weighted average aggregation.

In [8]:

Copied!





# Define a custom aggregation alongside built-ins. v2 unifies row-level
# definitions into Column; the custom Aggregation lives on the model.
#
# Each store has a `tax_rate`, but stores differ in size — a simple AVG of
# tax_rate would give every store equal weight. We want the *effective* tax
# rate paid across the business, weighted by order revenue. We define this on
# a custom orders model so both `tax_rate` (via the join to stores) and the
# weight `subtotal` are available row-level.
from slayer.core.enums import DataType
from slayer.core.models import Aggregation, AggregationParam, Column, ModelJoin, SlayerModel

orders_custom = SlayerModel(
    name="orders_custom",
    sql_table="orders",
    data_source="jaffle_shop",
    columns=[
        Column(name="id", sql="id", type=DataType.TEXT, primary_key=True),
        Column(name="store_id", sql="store_id", type=DataType.TEXT),
        Column(name="subtotal", sql="subtotal", type=DataType.DOUBLE),
        Column(name="store_tax_rate", sql="stores.tax_rate", type=DataType.DOUBLE),
    ],
    joins=[ModelJoin(target_model="stores", join_pairs=[["store_id", "id"]])],
    aggregations=[
        Aggregation(
            name="weighted_avg_tax",
            formula="SUM({value} * {weight}) / NULLIF(SUM({weight}), 0)",
            params=[AggregationParam(name="weight", sql="subtotal")],
            description="Tax rate weighted by order subtotal (effective tax rate)",
        ),
    ],
)
print(f"Custom aggregations on orders_custom: {[a.name for a in orders_custom.aggregations]}")
# Define a custom aggregation alongside built-ins. v2 unifies row-level
# definitions into Column; the custom Aggregation lives on the model.
#
# Each store has a `tax_rate`, but stores differ in size — a simple AVG of
# tax_rate would give every store equal weight. We want the *effective* tax
# rate paid across the business, weighted by order revenue. We define this on
# a custom orders model so both `tax_rate` (via the join to stores) and the
# weight `subtotal` are available row-level.
from slayer.core.enums import DataType
from slayer.core.models import Aggregation, AggregationParam, Column, ModelJoin, SlayerModel

orders_custom = SlayerModel(
    name="orders_custom",
    sql_table="orders",
    data_source="jaffle_shop",
    columns=[
        Column(name="id", sql="id", type=DataType.TEXT, primary_key=True),
        Column(name="store_id", sql="store_id", type=DataType.TEXT),
        Column(name="subtotal", sql="subtotal", type=DataType.DOUBLE),
        Column(name="store_tax_rate", sql="stores.tax_rate", type=DataType.DOUBLE),
    ],
    joins=[ModelJoin(target_model="stores", join_pairs=[["store_id", "id"]])],
    aggregations=[
        Aggregation(
            name="weighted_avg_tax",
            formula="SUM({value} * {weight}) / NULLIF(SUM({weight}), 0)",
            params=[AggregationParam(name="weight", sql="subtotal")],
            description="Tax rate weighted by order subtotal (effective tax rate)",
        ),
    ],
)
print(f"Custom aggregations on orders_custom: {[a.name for a in orders_custom.aggregations]}")

Custom aggregations on orders_custom: ['weighted_avg_tax']

Allowed Aggregations Whitelist¶

Restrict which aggregations make sense for a given measure.

In [9]:

Copied!





# Restrict per-column aggregation eligibility via `allowed_aggregations`. The
# whitelist overrides the type-default eligibility rule (numeric columns get
# sum/avg/min/max/etc by default; PK columns are always count-only).
from slayer.core.enums import DataType
from slayer.core.models import Column, SlayerModel

restricted_model = SlayerModel(
    name="restricted_orders",
    sql_table="orders",
    data_source="jaffle_shop",
    columns=[
        Column(name="store_id", sql="store_id", type=DataType.TEXT),
        Column(
            name="subtotal", sql="subtotal", type=DataType.DOUBLE,
            allowed_aggregations=["sum", "avg"],
        ),
    ],
)
sub_col = next(c for c in restricted_model.columns if c.name == "subtotal")
print(f"subtotal column allows: {sub_col.allowed_aggregations}")
# Restrict per-column aggregation eligibility via `allowed_aggregations`. The
# whitelist overrides the type-default eligibility rule (numeric columns get
# sum/avg/min/max/etc by default; PK columns are always count-only).
from slayer.core.enums import DataType
from slayer.core.models import Column, SlayerModel

restricted_model = SlayerModel(
    name="restricted_orders",
    sql_table="orders",
    data_source="jaffle_shop",
    columns=[
        Column(name="store_id", sql="store_id", type=DataType.TEXT),
        Column(
            name="subtotal", sql="subtotal", type=DataType.DOUBLE,
            allowed_aggregations=["sum", "avg"],
        ),
    ],
)
sub_col = next(c for c in restricted_model.columns if c.name == "subtotal")
print(f"subtotal column allows: {sub_col.allowed_aggregations}")

subtotal column allows: ['sum', 'avg']

Generated SQL¶

The colon syntax compiles to standard SQL aggregation functions.

In [10]:

Copied!





result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum", "order_total:avg"],
        "dimensions": ["stores.name"],
        "limit": 3,
    }
)

print("Generated SQL:")
print(result.sql)
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum", "order_total:avg"],
        "dimensions": ["stores.name"],
        "limit": 3,
    }
)

print("Generated SQL:")
print(result.sql)

Generated SQL:
SELECT
  stores.name AS "orders.stores.name",
  COUNT(*) AS "orders._count",
  SUM(orders.order_total) AS "orders.order_total_sum",
  AVG(orders.order_total) AS "orders.order_total_avg"
FROM orders AS orders
LEFT JOIN stores AS stores
  ON orders.store_id = stores.id
GROUP BY
  stores.name
LIMIT 3

Summary¶

Concept	Syntax	Example
Aggregation at query time	`measure:agg`	`revenue:sum`, `price:avg`
Row count	`*:count`	Always available, no measure needed
Non-null count	`col:count`	`email:count`
Distinct count	`col:count_distinct`	`customer_id:count_distinct`
First/last by time	`col:first`, `col:last`	`balance:last(updated_at)`
Weighted average	`col:weighted_avg(weight=w)`	`tax_rate:weighted_avg(weight=subtotal)`
Percentile	`col:percentile(p=0.95)`	`latency:percentile(p=0.99)`
Median	`col:median`	Shorthand for percentile(p=0.5)
Custom formula	Define in model `aggregations`	`score:trimmed_mean(lo=10, hi=90)`
Whitelist	`allowed_aggregations` on measure	Restricts valid aggregations

See the aggregations guide for the full explanation.