Logo
  • Home
  • Getting Started

  • MCP (AI Agents)
  • CLI (Terminal)
  • REST API (Any Language)
  • Python SDK

Concepts

  • Terminology
  • Models
  • Queries
  • Formulas
  • References (SQL vs DSL)
  • Auto-Ingestion
  • Schema Drift
  • Memories
  • Search

Reference

  • MCP Server
  • REST API
  • Python Client
  • CLI
  • Database Support

Wire Protocols (BI Tools)

  • Flight SQL
  • Postgres Facade

dbt

  • SLayer vs dbt
  • Importing dbt definitions

Tutorials

  • Make it dynamic
  • SQL vs DSL
    • Notebook
  • Auto-Ingestion
    • Notebook
  • Time Dimensions
    • Notebook
  • Joins
    • Notebook
  • Joined Measures
    • Notebook
  • Multistage Queries
    • Notebook
  • Aggregations
    • Notebook
      • Basic Colon Syntax
      • Multiple Aggregations on One Measure
      • Arithmetic on Aggregated Measures
      • Transforms on Aggregated Measures
      • Cross-Model Aggregated Measures
      • Inspecting Model Measures
      • Custom Aggregation: Weighted Average
      • Allowed Aggregations Whitelist
      • Generated SQL
      • Summary
  • Lightning Talk
    • Notebook
  • Schema Drift (worked example)

Configuration

  • Datasources
  • Storage Backends
  • Development
SLayer
  • Tutorials
  • Aggregations
  • Notebook
  • Edit on motleyai/slayer

Measures and Aggregations¶

In SLayer, a measure is a named row-level SQL expression. The aggregation (sum, avg, count, etc.) is specified at query time using colon syntax: revenue:sum, *:count, tax_rate:weighted_avg(weight=subtotal).

This notebook demonstrates the full range of aggregation features with working code.

Prerequisites: pip install motley-slayer

In [1]:
Copied!
import os
import sys

sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", ".."))
sys.path.insert(0, os.path.join(os.getcwd(), "..", "jaffle_data"))

from setup_jaffle import ensure_jaffle_shop

engine, storage, models = ensure_jaffle_shop()
import os import sys sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", "..")) sys.path.insert(0, os.path.join(os.getcwd(), "..", "jaffle_data")) from setup_jaffle import ensure_jaffle_shop engine, storage, models = ensure_jaffle_shop()

Basic Colon Syntax¶

Every field formula specifies both the measure and the aggregation, separated by a colon. *:count is the universal row count.

In [2]:
Copied!
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum", "order_total:avg", "order_total:min", "order_total:max"],
    }
)

row = result.data[0]
print(f"Orders:  {row['orders._count']:,}")
print(f"Total:   ${row['orders.order_total_sum']:,.2f}")
print(f"Average: ${row['orders.order_total_avg']:.2f}")
print(f"Min:     ${row['orders.order_total_min']:.2f}")
print(f"Max:     ${row['orders.order_total_max']:.2f}")
result = engine.execute_sync( query={ "source_model": "orders", "measures": ["*:count", "order_total:sum", "order_total:avg", "order_total:min", "order_total:max"], } ) row = result.data[0] print(f"Orders: {row['orders._count']:,}") print(f"Total: ${row['orders.order_total_sum']:,.2f}") print(f"Average: ${row['orders.order_total_avg']:.2f}") print(f"Min: ${row['orders.order_total_min']:.2f}") print(f"Max: ${row['orders.order_total_max']:.2f}")
Orders:  661,146
Total:   $7,317,140.30
Average: $11.07
Min:     $0.00
Max:     $105.35

Multiple Aggregations on One Measure¶

One measure, many aggregations — no need to define order_total_sum, order_total_avg, etc. as separate measures.

In [3]:
Copied!
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum", "order_total:avg"],
        "dimensions": ["stores.name"],
        "order": [{"column": "order_total_sum", "direction": "desc"}],
        "limit": 5,
    }
)

print(f"{'Store':<20} {'Orders':>8} {'Total':>14} {'Average':>10}")
print("-" * 54)
for row in result.data:
    print(
        f"{row['orders.stores.name']:<20} {row['orders._count']:>8,} ${row['orders.order_total_sum']:>13,.2f} ${row['orders.order_total_avg']:>9.2f}"
    )
result = engine.execute_sync( query={ "source_model": "orders", "measures": ["*:count", "order_total:sum", "order_total:avg"], "dimensions": ["stores.name"], "order": [{"column": "order_total_sum", "direction": "desc"}], "limit": 5, } ) print(f"{'Store':<20} {'Orders':>8} {'Total':>14} {'Average':>10}") print("-" * 54) for row in result.data: print( f"{row['orders.stores.name']:<20} {row['orders._count']:>8,} ${row['orders.order_total_sum']:>13,.2f} ${row['orders.order_total_avg']:>9.2f}" )
Store                  Orders          Total    Average
------------------------------------------------------
Brooklyn              256,532 $ 2,793,658.90 $    10.89
Philadelphia          192,450 $ 2,099,292.44 $    10.91
Chicago               106,290 $ 1,183,765.61 $    11.14
San Francisco          90,401 $ 1,059,640.66 $    11.72
New Orleans            15,473 $   180,782.69 $    11.68

Arithmetic on Aggregated Measures¶

Aggregated measures compose naturally in arithmetic expressions.

In [4]:
Copied!
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": [
            "*:count",
            "order_total:sum",
            {"formula": "order_total:sum / *:count", "name": "aov", "label": "Avg Order Value"},
        ],
        "dimensions": ["stores.name"],
        "order": [{"column": "order_total_sum", "direction": "desc"}],
        "limit": 5,
    }
)

for row in result.data:
    print(f"{row['orders.stores.name']}: AOV = ${row['orders.aov']:.2f}")
result = engine.execute_sync( query={ "source_model": "orders", "measures": [ "*:count", "order_total:sum", {"formula": "order_total:sum / *:count", "name": "aov", "label": "Avg Order Value"}, ], "dimensions": ["stores.name"], "order": [{"column": "order_total_sum", "direction": "desc"}], "limit": 5, } ) for row in result.data: print(f"{row['orders.stores.name']}: AOV = ${row['orders.aov']:.2f}")
Brooklyn: AOV = $10.89
Philadelphia: AOV = $10.91
Chicago: AOV = $11.14
San Francisco: AOV = $11.72
New Orleans: AOV = $11.68

Transforms on Aggregated Measures¶

All transform functions (cumsum, change, time_shift, rank, etc.) wrap aggregated measure refs.

In [5]:
Copied!
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": [
            "order_total:sum",
            {"formula": "cumsum(order_total:sum)", "name": "running_total"},
            {"formula": "change(order_total:sum)", "name": "mom_change"},
        ],
        "time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}],
        "order": [{"column": "ordered_at", "direction": "asc"}],
        "limit": 6,
    }
)

print(f"{'Month':<12} {'Revenue':>14} {'Running Total':>14} {'MoM Change':>12}")
print("-" * 54)
for row in result.data:
    month = str(row["orders.ordered_at"])[:7]
    rev = row["orders.order_total_sum"]
    running = row["orders.running_total"]
    change = row["orders.mom_change"]
    change_str = f"${change:>11,.2f}" if change is not None else f"{'N/A':>12}"
    print(f"{month:<12} ${rev:>13,.2f} ${running:>13,.2f} {change_str}")
result = engine.execute_sync( query={ "source_model": "orders", "measures": [ "order_total:sum", {"formula": "cumsum(order_total:sum)", "name": "running_total"}, {"formula": "change(order_total:sum)", "name": "mom_change"}, ], "time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}], "order": [{"column": "ordered_at", "direction": "asc"}], "limit": 6, } ) print(f"{'Month':<12} {'Revenue':>14} {'Running Total':>14} {'MoM Change':>12}") print("-" * 54) for row in result.data: month = str(row["orders.ordered_at"])[:7] rev = row["orders.order_total_sum"] running = row["orders.running_total"] change = row["orders.mom_change"] change_str = f"${change:>11,.2f}" if change is not None else f"{'N/A':>12}" print(f"{month:<12} ${rev:>13,.2f} ${running:>13,.2f} {change_str}")
Month               Revenue  Running Total   MoM Change
------------------------------------------------------
2023-05      $    15,102.20 $    15,102.20          N/A
2023-06      $    23,098.27 $    38,200.47 $   7,996.07
2023-07      $    29,880.94 $    68,081.41 $   6,782.67
2023-08      $    38,628.40 $   106,709.81 $   8,747.46
2023-09      $    44,370.20 $   151,080.01 $   5,741.80
2023-10      $    47,540.71 $   198,620.72 $   3,170.51

Cross-Model Aggregated Measures¶

Cross-model measures use dot syntax for the join path, colon for the aggregation. customers.*:count means COUNT(*) on the joined customers model.

In [6]:
Copied!
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": [
            "*:count",
            "order_total:sum",
            "customers.*:count",
        ],
        "time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}],
        "order": [{"column": "ordered_at", "direction": "asc"}],
        "limit": 6,
    }
)

print(f"{'Month':<12} {'Orders':>8} {'Revenue':>14} {'Customers':>10}")
print("-" * 46)
for row in result.data:
    month = str(row["orders.ordered_at"])[:7]
    orders = row["orders._count"]
    rev = row["orders.order_total_sum"]
    custs = row["orders.customers._count"]
    print(f"{month:<12} {orders:>8,} ${rev:>13,.2f} {custs:>10,}")
result = engine.execute_sync( query={ "source_model": "orders", "measures": [ "*:count", "order_total:sum", "customers.*:count", ], "time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}], "order": [{"column": "ordered_at", "direction": "asc"}], "limit": 6, } ) print(f"{'Month':<12} {'Orders':>8} {'Revenue':>14} {'Customers':>10}") print("-" * 46) for row in result.data: month = str(row["orders.ordered_at"])[:7] orders = row["orders._count"] rev = row["orders.order_total_sum"] custs = row["orders.customers._count"] print(f"{month:<12} {orders:>8,} ${rev:>13,.2f} {custs:>10,}")
Month          Orders        Revenue  Customers
----------------------------------------------
2023-05         1,335 $    15,102.20      2,590
2023-06         2,107 $    23,098.27      2,590
2023-07         2,671 $    29,880.94      2,590
2023-08         3,466 $    38,628.40      2,590
2023-09         3,988 $    44,370.20      2,590
2023-10         4,352 $    47,540.71      2,590

Inspecting Model Measures¶

Auto-ingested models now have one measure per column — not five.

In [7]:
Copied!
# In v2 every column carries its data type and (optionally) an explicit
# `allowed_aggregations` whitelist. There's no separate row-level measures
# list — every column can be aggregated at query time via colon syntax.
orders_model = next(m for m in models if m.name == "orders")

print(f"orders_model has {len(orders_model.columns)} columns:")
for col in orders_model.columns:
    sql_str = col.sql or col.name
    print(f"  {col.name:<20} sql: {sql_str:<20} type: {col.type}")

print("\nAny column can be aggregated at query time via colon syntax,")
print("subject to the column's type and allowed_aggregations whitelist.")
print("Primary keys default to count / count_distinct only.")
print("Plus *:count is always available for COUNT(*).")
# In v2 every column carries its data type and (optionally) an explicit # `allowed_aggregations` whitelist. There's no separate row-level measures # list — every column can be aggregated at query time via colon syntax. orders_model = next(m for m in models if m.name == "orders") print(f"orders_model has {len(orders_model.columns)} columns:") for col in orders_model.columns: sql_str = col.sql or col.name print(f" {col.name:<20} sql: {sql_str:<20} type: {col.type}") print("\nAny column can be aggregated at query time via colon syntax,") print("subject to the column's type and allowed_aggregations whitelist.") print("Primary keys default to count / count_distinct only.") print("Plus *:count is always available for COUNT(*).")
orders_model has 7 columns:
  id                   sql: id                   type: string
  customer_id          sql: customer_id          type: string
  ordered_at           sql: ordered_at           type: date
  store_id             sql: store_id             type: string
  subtotal             sql: subtotal             type: number
  tax_paid             sql: tax_paid             type: number
  order_total          sql: order_total          type: number

Any column can be aggregated at query time via colon syntax,
subject to the column's type and allowed_aggregations whitelist.
Primary keys default to count / count_distinct only.
Plus *:count is always available for COUNT(*).

Custom Aggregation: Weighted Average¶

Let's define a model with a custom weighted average aggregation.

In [8]:
Copied!
# Define a custom aggregation alongside built-ins. v2 unifies row-level
# definitions into Column; the custom Aggregation lives on the model.
#
# Each store has a `tax_rate`, but stores differ in size — a simple AVG of
# tax_rate would give every store equal weight. We want the *effective* tax
# rate paid across the business, weighted by order revenue. We define this on
# a custom orders model so both `tax_rate` (via the join to stores) and the
# weight `subtotal` are available row-level.
from slayer.core.enums import DataType
from slayer.core.models import Aggregation, AggregationParam, Column, ModelJoin, SlayerModel

orders_custom = SlayerModel(
    name="orders_custom",
    sql_table="orders",
    data_source="jaffle_shop",
    columns=[
        Column(name="id", sql="id", type=DataType.TEXT, primary_key=True),
        Column(name="store_id", sql="store_id", type=DataType.TEXT),
        Column(name="subtotal", sql="subtotal", type=DataType.DOUBLE),
        Column(name="store_tax_rate", sql="stores.tax_rate", type=DataType.DOUBLE),
    ],
    joins=[ModelJoin(target_model="stores", join_pairs=[["store_id", "id"]])],
    aggregations=[
        Aggregation(
            name="weighted_avg_tax",
            formula="SUM({value} * {weight}) / NULLIF(SUM({weight}), 0)",
            params=[AggregationParam(name="weight", sql="subtotal")],
            description="Tax rate weighted by order subtotal (effective tax rate)",
        ),
    ],
)
print(f"Custom aggregations on orders_custom: {[a.name for a in orders_custom.aggregations]}")
# Define a custom aggregation alongside built-ins. v2 unifies row-level # definitions into Column; the custom Aggregation lives on the model. # # Each store has a `tax_rate`, but stores differ in size — a simple AVG of # tax_rate would give every store equal weight. We want the *effective* tax # rate paid across the business, weighted by order revenue. We define this on # a custom orders model so both `tax_rate` (via the join to stores) and the # weight `subtotal` are available row-level. from slayer.core.enums import DataType from slayer.core.models import Aggregation, AggregationParam, Column, ModelJoin, SlayerModel orders_custom = SlayerModel( name="orders_custom", sql_table="orders", data_source="jaffle_shop", columns=[ Column(name="id", sql="id", type=DataType.TEXT, primary_key=True), Column(name="store_id", sql="store_id", type=DataType.TEXT), Column(name="subtotal", sql="subtotal", type=DataType.DOUBLE), Column(name="store_tax_rate", sql="stores.tax_rate", type=DataType.DOUBLE), ], joins=[ModelJoin(target_model="stores", join_pairs=[["store_id", "id"]])], aggregations=[ Aggregation( name="weighted_avg_tax", formula="SUM({value} * {weight}) / NULLIF(SUM({weight}), 0)", params=[AggregationParam(name="weight", sql="subtotal")], description="Tax rate weighted by order subtotal (effective tax rate)", ), ], ) print(f"Custom aggregations on orders_custom: {[a.name for a in orders_custom.aggregations]}")
Custom aggregations on orders_custom: ['weighted_avg_tax']

Allowed Aggregations Whitelist¶

Restrict which aggregations make sense for a given measure.

In [9]:
Copied!
# Restrict per-column aggregation eligibility via `allowed_aggregations`. The
# whitelist overrides the type-default eligibility rule (numeric columns get
# sum/avg/min/max/etc by default; PK columns are always count-only).
from slayer.core.enums import DataType
from slayer.core.models import Column, SlayerModel

restricted_model = SlayerModel(
    name="restricted_orders",
    sql_table="orders",
    data_source="jaffle_shop",
    columns=[
        Column(name="store_id", sql="store_id", type=DataType.TEXT),
        Column(
            name="subtotal", sql="subtotal", type=DataType.DOUBLE,
            allowed_aggregations=["sum", "avg"],
        ),
    ],
)
sub_col = next(c for c in restricted_model.columns if c.name == "subtotal")
print(f"subtotal column allows: {sub_col.allowed_aggregations}")
# Restrict per-column aggregation eligibility via `allowed_aggregations`. The # whitelist overrides the type-default eligibility rule (numeric columns get # sum/avg/min/max/etc by default; PK columns are always count-only). from slayer.core.enums import DataType from slayer.core.models import Column, SlayerModel restricted_model = SlayerModel( name="restricted_orders", sql_table="orders", data_source="jaffle_shop", columns=[ Column(name="store_id", sql="store_id", type=DataType.TEXT), Column( name="subtotal", sql="subtotal", type=DataType.DOUBLE, allowed_aggregations=["sum", "avg"], ), ], ) sub_col = next(c for c in restricted_model.columns if c.name == "subtotal") print(f"subtotal column allows: {sub_col.allowed_aggregations}")
subtotal column allows: ['sum', 'avg']

Generated SQL¶

The colon syntax compiles to standard SQL aggregation functions.

In [10]:
Copied!
result = engine.execute_sync(
    query={
        "source_model": "orders",
        "measures": ["*:count", "order_total:sum", "order_total:avg"],
        "dimensions": ["stores.name"],
        "limit": 3,
    }
)

print("Generated SQL:")
print(result.sql)
result = engine.execute_sync( query={ "source_model": "orders", "measures": ["*:count", "order_total:sum", "order_total:avg"], "dimensions": ["stores.name"], "limit": 3, } ) print("Generated SQL:") print(result.sql)
Generated SQL:
SELECT
  stores.name AS "orders.stores.name",
  COUNT(*) AS "orders._count",
  SUM(orders.order_total) AS "orders.order_total_sum",
  AVG(orders.order_total) AS "orders.order_total_avg"
FROM orders AS orders
LEFT JOIN stores AS stores
  ON orders.store_id = stores.id
GROUP BY
  stores.name
LIMIT 3

Summary¶

Concept Syntax Example
Aggregation at query time measure:agg revenue:sum, price:avg
Row count *:count Always available, no measure needed
Non-null count col:count email:count
Distinct count col:count_distinct customer_id:count_distinct
First/last by time col:first, col:last balance:last(updated_at)
Weighted average col:weighted_avg(weight=w) tax_rate:weighted_avg(weight=subtotal)
Percentile col:percentile(p=0.95) latency:percentile(p=0.99)
Median col:median Shorthand for percentile(p=0.5)
Custom formula Define in model aggregations score:trimmed_mean(lo=10, hi=90)
Whitelist allowed_aggregations on measure Restricts valid aggregations

See the aggregations guide for the full explanation.

Previous Next

Built with MkDocs using a theme provided by Read the Docs.