Columns and Aggregations Are Separate Things¶
Most semantic layers force you to bake the aggregation into the column definition. You want revenue? Define revenue_sum. Want average revenue too? Define revenue_avg. Five aggregation types per numeric column, times twenty columns, and you're staring at a hundred definitions before you've written a single query.
SLayer takes a different approach: a column is just a named SQL expression — a row-level fact about your data. The aggregation — how you want to roll it up — is specified when you query, not when you define the model.
What this looks like¶
A model defines columns as bare expressions:
columns:
- name: subtotal
sql: subtotal
type: number
- name: tax_paid
sql: tax_paid
type: number
- name: order_total
sql: order_total
type: number
No type: sum or type: avg. Just what the column is.
At query time, you pick the aggregation with colon syntax:
{
"source_model": "orders",
"measures": ["subtotal:sum", "subtotal:avg", "order_total:min", "order_total:max"],
"dimensions": ["stores.name"]
}
subtotal:sum means "take the subtotal column and SUM it." order_total:min means "take the order_total column and find the MIN." One column definition, as many aggregations as you need.
COUNT(*) and the star measure¶
COUNT(*) doesn't aggregate a specific column — it counts rows. In SLayer, * is the "all rows" placeholder:
*:count produces COUNT(*). Result column: orders._count (the underscore prefix distinguishes it from any dimension that might happen to be called count).
Note:
*can only be used withcount. Combinations like*:sumor*:avgare invalid — use a named measure instead.
You can also count non-null values of a specific column: email:count produces COUNT(email). And customer_id:count_distinct gives you COUNT(DISTINCT customer_id).
Built-in aggregations¶
These are always available — no definition needed:
| Aggregation | What it does |
|---|---|
sum |
SUM(expr) |
avg |
AVG(expr) |
sum(window='90d') / avg(window='90d') |
trailing range SUM/AVG ending at each output bucket |
min / max |
MIN/MAX(expr) |
count |
COUNT(expr), or COUNT(*) with * |
count_distinct |
COUNT(DISTINCT expr) |
first / last |
Value from the earliest/latest record per group (by time) |
weighted_avg |
SUM(expr * weight) / SUM(weight) |
median |
PERCENTILE_CONT(0.5) — see database support below |
percentile |
PERCENTILE_CONT(p) — specify p as an argument; see database support below |
stddev_samp / stddev_pop |
Sample / population standard deviation |
var_samp / var_pop |
Sample / population variance |
corr |
price:corr(other=quantity) — Pearson correlation between two columns |
covar_samp / covar_pop |
price:covar_samp(other=quantity) — sample / population covariance |
Database support for median / percentile¶
| Engine | Supported? | How |
|---|---|---|
| Postgres | yes | Native PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x). |
| DuckDB | yes | sqlglot rewrites ordered-set percentiles to DuckDB's QUANTILE_CONT(x, p ORDER BY x) syntax. |
| SQLite | yes | Python aggregate UDFs registered on every connection by SLayer. |
| ClickHouse | yes | Native median(x) and parametric quantile(p)(x). |
| MySQL | no | No native function and no Python-UDF mechanism — SLayer raises NotImplementedError. Use MariaDB or compute client-side. |
sum and avg can take a trailing time window when the query has a time
dimension:
{
"measures": [
{"formula": "revenue:sum(window='30d')", "name": "revenue_30d"},
{"formula": "revenue:avg(window='1y2m3w5d6h7min8s')", "name": "avg_window"}
],
"time_dimensions": [{"dimension": "created_at", "granularity": "month"}]
}
Duration units are y, m, w, d, h, min, and s. The window is
computed over raw source rows, so it may be larger, equal to, or smaller than
the query's time granularity.
Custom aggregations¶
The built-ins cover common cases. When they don't, define your own:
aggregations:
- name: trimmed_mean
formula: "AVG(CASE WHEN {value} BETWEEN {lo} AND {hi} THEN {value} END)"
params:
- name: lo
sql: "0"
- name: hi
sql: "1000"
{value} is the measure's SQL expression. {lo} and {hi} are parameters with defaults that can be overridden at query time:
You can also override built-in aggregation defaults. If weighted_avg should default to a specific weight column in your model:
Now tax_rate:weighted_avg uses subtotal as the weight without you specifying it every time. But you can still override: tax_rate:weighted_avg(weight=order_total).
Controlling which aggregations apply¶
Not every aggregation makes sense for every column. customer_id:avg? Probably not useful. The allowed_aggregations field lets you whitelist:
columns:
- name: customer_id
sql: customer_id
type: number
allowed_aggregations: [count, count_distinct]
- name: revenue
sql: amount
type: number
allowed_aggregations: [sum, avg, min, max, weighted_avg]
SLayer validates this at query time and at model creation — if you try customer_id:sum, you get a clear error listing the valid options.
first and last¶
first and last return the value from the earliest or latest record in each group, ordered by a time column. They need a time dimension to know what "earliest" and "latest" mean:
{
"measures": ["balance:last", "balance:first"],
"time_dimensions": [{"dimension": "updated_at", "granularity": "month"}]
}
If you want to use a specific time column (overriding the query's time dimension), pass it as an argument:
This explicit time argument takes priority over everything — query-level time_dimensions, main_time_dimension, and the model's default_time_dimension.
Don't confuse the last aggregation (balance:last) with the last() transform (last(revenue:sum)). The aggregation picks the latest record's value within each time bucket. The transform broadcasts the latest time bucket's aggregated value to every row. Different operations, different use cases.
Percentiles¶
median is built in, but you might want the 95th percentile, or Q1/Q3:
Composing with transforms and arithmetic¶
Arithmetic:
Transforms:
{"formula": "cumsum(revenue:sum)"}
{"formula": "change(revenue:sum)"}
{"formula": "time_shift(revenue:sum, -1, 'year')"}
Cross-model:
Result column naming¶
The colon becomes an underscore in result keys:
| Formula | Result key |
|---|---|
revenue:sum |
orders.revenue_sum |
*:count |
orders._count |
revenue:avg |
orders.revenue_avg |
customers.*:count |
orders.customers._count |
When a query is saved as a model (create_model with a query parameter), these canonical names become the new model's column names.
See the companion notebook for runnable code demonstrating all of the above.