Models

A model is SLayer's view of a database table or an underlying SQL query. It declares the columns, named metric formulas, joins, and always-applied filters that queries can build on. Models are defined as YAML (one file per model under models/<data_source>/) or created via API/MCP — the two paths produce the same persisted object.

A tiny example to anchor what follows:

name: orders
sql_table: public.orders
data_source: my_postgres
columns:
  - {name: id, type: number, primary_key: true}
  - {name: status, type: string}
  - {name: revenue, sql: amount, type: number}
measures:
  - {name: aov, formula: "revenue:sum / *:count"}

A query then asks for revenue:sum (aggregate the revenue column), aov (the saved formula), or status (group by it). Same model, different roles per query.

Fields at a glance

Field Type Required Description
name string Yes Unique model name
sql_table string One of A physical database table (e.g. public.orders)
sql string these A SQL subquery to use as the source
source_queries list[SlayerQuery] three Saved query stages — makes the model query-backed
data_source string Yes Datasource name
columns list[Column] No Column definitions. For query-backed models this is an engine-managed cache
measures list[ModelMeasure] No Named formula library — referenced by bare name in queries
aggregations list[Aggregation] No Custom aggregation operators usable via colon syntax
joins list[ModelJoin] No LEFT JOIN relationships to other models
filters list[str] No Model-level WHERE filters (always applied)
default_time_dimension string No Default time dim for time-dependent formulas
query_variables dict No Defaults for {var} placeholders (query-backed models only)
backing_query_sql string No Engine-managed cache of the rendered backing query
description string No Helps agents and users understand the model
hidden bool No Hide from listings
meta dict No Arbitrary JSON metadata for caller bookkeeping
version int No Schema version stamp (currently 6)

Source modes

A model has exactly one source — set by one of three mutually exclusive fields:

  • sql_table — a physical database table.
  • sql — an explicit SQL subquery (a SELECT statement). Useful when the model's underlying shape requires cleaning or joining beyond what SLayer expresses natively.
  • source_queries — one or more saved SlayerQuery stages. Makes the model query-backed: see Query-backed models.

Validators reject empty source_queries=[], multiple sources, or missing names on non-final stages.

Columns

A column is the unit of structure on the model. The same column entry can serve as a group-by key in one query and as input to an aggregation in another — the role is decided per query, not declared up front. What the column carries is its identity (name), how to compute it from the source (sql), what data type to expect, and a handful of policy fields (which aggregations are allowed, whether it's a primary key, whether it's hidden).

Fields

Field Type Required Default Description
name string Yes Unique within the model. Must not contain .
description string No Clarifies meaning for agents and users
label string No Display name; propagates into query result metadata
sql string No (bare column name) SQL expression — defaults to the column's name
type string No string string, number, boolean, time, date
primary_key bool No false Restricts aggregation to count / count_distinct
hidden bool No false Hide from listings
format dict No NumberFormat used by response metadata
allowed_aggregations list[str] No Whitelist (must be a subset of the type-default eligibility set, or a custom aggregation defined on this model)
filter string No SQL condition applied inside CASE WHEN at aggregation time. See Filtered columns
meta dict No Arbitrary JSON metadata
sampled string No Cached sample-value text snapshot (top-20 by frequency joined, or top20 ... (N distinct) on overflow, or min .. max for numeric/temporal); populated by slayer ingest and friends
sampled_values list[str] No Structured top-50-by-frequency list (categorical only); the unambiguous counterpart to sampled for consumers that need to compare predicate literals against stored values. None for numeric/temporal columns
distinct_count int No True total cardinality at profile time (categorical only). Set via a secondary count_distinct query when overflow is detected, so it's exact rather than capped. None for numeric/temporal columns

Data types

Type Description SQL examples
string Text values VARCHAR, TEXT, CHAR
number Numeric values INTEGER, FLOAT, NUMERIC
boolean True/false BOOLEAN
time Timestamp TIMESTAMP, DATETIME
date Date only DATE

Aggregation eligibility

A column with no explicit allowed_aggregations whitelist gets a default set based on its data type. The defaults are deliberately liberal for numerics and strict for the rest, so an agent doesn't accidentally SUM a string column.

Type Default eligible aggregations
number sum, avg, min, max, count, count_distinct, median, weighted_avg, percentile, first, last, stddev_samp, stddev_pop, var_samp, var_pop, corr, covar_samp, covar_pop
string count, count_distinct, first, last, min, max
boolean count, count_distinct, sum, min, max, first, last
date / time count, count_distinct, first, last, min, max

Primary-key columns are always restricted to count / count_distinct regardless of type. When allowed_aggregations is set, every entry must already be eligible under the type-default map (or be a custom aggregation defined on this model); violations are caught at model construction time, so query-time validation is a single membership check.

Filtered columns

A column can carry a filter — a SQL condition wrapped around the column inside an aggregation via CASE WHEN. This is how you express business metrics that apply to a row subset without a separate model:

columns:
  - name: active_revenue
    sql: amount
    type: number
    filter: "status = 'active'"
  - name: completed_count
    sql: id
    type: number
    filter: "status = 'completed'"

active_revenue:sum then generates SUM(CASE WHEN status = 'active' THEN amount END). The filter does nothing when the column is used as a group-by dimension — it fires only inside aggregations.

Filters can reference joined columns via dot syntax (categories.type = 'electronics'). Filtered and unfiltered columns coexist freely in the same query and combine cleanly in arithmetic formulas (e.g. {"formula": "active_revenue:sum / total_revenue:sum"}).

Derived Columns Referencing Other Derived Columns

A Column.sql may reference any other column on the same model or on a joined model — including columns that are themselves derived (have their own sql expression rather than being a bare base-table column). The engine recursively inlines those references at query time, so chains stay DRY.

# Model: stations
columns:
  - name: foo_raw                 # base column
    sql: "foo_raw"
    type: number

  - name: foo_normalized          # derived on stations
    sql: "foo_raw / 100.0"
    type: number

# Model: telescopes — joined to stations
columns:
  - name: aoi_ratio               # derived on telescopes, references the
                                  # *derived* stations.foo_normalized
    sql: "telescopes.aperture / stations.foo_normalized"
    type: number
joins:
  - target_model: stations
    join_pairs: [["station_id", "id"]]

At query time, aoi_ratio expands to telescopes.aperture / (stations.foo_raw / 100.0). The same applies to local-model chains (a column on the source model referencing another derived column on the same model) and to multi-hop join paths (use the __-delimited form, e.g., B__C.x_derived, when crossing more than one join).

Same-model references may be written bare (just the column name) or qualified with the host alias — both forms expand the same way. So given bucket.sql = "raw_a / 10", a sibling rn.sql = "ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY id)" correctly expands bucket to the inlined body. Bare references inside a nested scope (sub-query, UNION branch, CTE, VALUES) are NOT inlined — those identifiers belong to the inner rowset, not the host model — so Column.sql = "(SELECT MAX(score) FROM other) + score" inlines the outer score but leaves the inner one alone.

Cycles in the reference graph (e.g., c1.sql = "c2 + 1" and c2.sql = "c1 - 1") are rejected at save_model time and raise ColumnCycleError (which subclasses both SlayerError and ValueError) with the cycle path in the message — so a broken chain never reaches a query. The compile-time guard remains as defence in depth. Save-time validation stays within the model's data_source; unresolved cross-datasource refs are silently skipped. The same expansion is applied to filters and to colon-aggregated measures, so "B.foo_normalized:sum" produces SUM(B.foo_raw / 100.0).

Window functions in Column.sql

A column's sql may contain a window function (row_number() over (...), dense_rank() over (...), etc.). The column behaves like any other column when used in dimensions / SELECT.

Filtering on a windowed column is rejected. A query filter naming a Column whose sql contains a window function (e.g. {"filters": ["rn <= 3"]} against a column whose sql is row_number() over (...)) raises with a clear message. Use {"filters": ["rank(<measure>) <= 3"]} (see formulas.md) — the rank-family transforms cover the top-N case in pure DSL — or factor the column into a multi-stage source_queries model.

SQL expression conventions

Bare column names auto-qualify against the model's own table — single references ("amount") and arithmetic alike ("amount * quantity"). Explicit model.column works too but adds nothing. For joined columns inside a model's own sql, use the __ alias form (customers__regions.name) — see Joins.

SQLite JSON extraction

json_extract(col, '$.path') in a Column.sql is preserved verbatim on SQLite — SLayer does not rewrite it to col -> '$.path'. The -> operator in SQLite returns the JSON-quoted form ('"Owned"', with literal quotes), which silently breaks equality and CASE WHEN matches against bare-string literals. The function form returns the unquoted scalar.

columns:
  - name: tier
    type: string
    sql: "json_extract(payload, '$.tier')"
  - name: is_gold
    type: number
    sql: "CASE LOWER(json_extract(payload, '$.tier')) WHEN 'gold' THEN 1 ELSE 0 END"

If you specifically want SQLite's JSON-scalar operator, write ->> (exp.JSONExtractScalar) directly — SLayer leaves it untouched.

Measures vs aggregations — the disambiguation

SLayer has two list fields on a model that both relate to metrics, and the names don't make the difference obvious. The split is real and load-bearing:

  • measures is a library of named formulas — saved expressions like aov = revenue:sum / *:count. Queries reference them by bare name and the formula expands inline. Think what to compute, at the metric level.
  • aggregations is a registry of custom operators — definitions like trimmed_mean(p) or weighted_avg(weight=…). Once defined, they become usable as colon suffixes inside any formula: revenue:trimmed_mean(p=0.1). Think how to aggregate, at the operator level.

A typical model uses zero or a few entries in each. They compose:

aggregations:
  - name: trimmed_mean
    formula: "AVG(CASE WHEN {expr} BETWEEN {low} AND {high} THEN {expr} END)"

measures:
  - name: clean_aov
    formula: "revenue:trimmed_mean(low=0, high=1e6) / *:count"

Measures (named formulas)

A measure is a saved formula. Its shape is identical to an inline SlayerQuery.measures entry; the only difference is scope — model-level measures are reusable across queries.

Field Type Required Description
formula string Yes e.g. "revenue:sum / *:count", "cumsum(revenue:sum)"
name string No Queries reference this by bare name (auto-derived if omitted)
label string No Display name
description string No Explanatory text
type DataType No Declares the formula's result type (drives outer CAST)
meta dict No Arbitrary JSON metadata

Column and measure names share a namespace within a model — you can't have a column aov and a measure aov. A measure can use any other measure by bare name, including inside transforms and arithmetic:

"measures": [
  {"formula": "aov"},
  {"formula": "cumsum(aov)"},
  {"formula": "aov * 1.1", "name": "aov_with_markup"}
]

Expansion is purely textual at parse time, so the SQL is identical to writing the longhand formula. Cycles (a → b → a) are detected and rejected with the cycle in the error message. Names that would shadow built-in transforms (cumsum, change, rank, first, last, etc.) are rejected at model construction time.

See formulas.md for the full formula grammar — operators, transforms, time-shifted forms, and the Mode B DSL rules.

Aggregations

Aggregations are the operators that turn a column expression into a value: :sum, :avg, :percentile(p=…), and so on. They're applied at query time via colon syntax — measure:aggregation — and the same operator works on any compatible column.

Built-in aggregations

Aggregation Colon syntax SQL
count *:count COUNT(*) — counts all rows
count col:count COUNT(col) — counts non-null values
count_distinct col:count_distinct COUNT(DISTINCT col)
sum revenue:sum SUM(revenue)
avg revenue:avg AVG(revenue)
min / max revenue:min MIN(revenue) / MAX(revenue)
first / last col:first(time_col) Earliest / latest record's value, time-ordered
weighted_avg price:weighted_avg(weight=quantity) SUM(price * quantity) / SUM(quantity)
median revenue:median Median value
percentile revenue:percentile(p=0.95) 95th percentile
stddev_samp / stddev_pop latency:stddev_samp Sample / population standard deviation
var_samp / var_pop latency:var_samp Sample / population variance
corr price:corr(other=quantity) Pearson correlation between two columns
covar_samp / covar_pop price:covar_samp(other=quantity) Sample / population covariance

*:count is always available with no measure definition. * means "all rows" and is only valid with count*:sum and friends are rejected. Detailed NULL / N=1 semantics for the statistical aggregations are documented in database-support.md.

The first and last aggregations

first and last return the value from the earliest or most recent record within each grouped bucket — like min/max, but ordered by time instead of value. Useful for snapshot metrics like balances, inventory counts, or status fields where you want the latest state:

columns:
  - {name: balance, sql: balance, type: number}

balance:last(updated_at) gives the most recent balance per group; balance:first(updated_at) the earliest. When grouped by month, each month returns the latest (or earliest) record's balance in that month. If no time column is specified, ordering resolves via: query's main_time_dimension → first time/date dimension in the query → first time dimension in filters → model's default_time_dimension.

Not to be confused with the last() formula function — a window-function transform that broadcasts a value across all rows. Same name, different layer.

Custom aggregations

Add your own operators via the aggregations list. Each entry has a name and a SQL formula template using {expr} for the measure expression and named placeholders for kwargs:

aggregations:
  - name: weighted_avg
    formula: "sum({expr} * {weight}) / sum({weight})"
  - name: trimmed_mean
    formula: "avg(CASE WHEN {expr} BETWEEN {low} AND {high} THEN {expr} END)"

Use at query time: price:weighted_avg(weight=quantity), revenue:trimmed_mean(low=10, high=1000). An aggregation entry can also override a built-in's default parameters without redefining the SQL. Like columns and measures, aggregations accept an optional meta dict for caller bookkeeping.

Joins

Models declare explicit LEFT JOINs to other models:

name: orders
sql_table: public.orders
joins:
  - target_model: customers
    join_pairs: [["customer_id", "id"]]
  - target_model: products
    join_pairs: [["product_id", "id"]]

Joins enable cross-model measures — querying a measure from a joined model alongside the main model's data. See Cross-Model Measures. During auto-ingestion, joins are generated automatically from foreign-key relationships; multi-hop paths are resolved at query time by walking each intermediate model's own joins.

Path-based table aliases

Joined tables use __-delimited path aliases in generated SQL so diamond joins stay unambiguous — when the same table is reachable via multiple paths. For example, if orders joins both customers and warehouses, each referencing regions:

  • customers.regions.name → table alias customers__regions
  • warehouses.regions.name → table alias warehouses__regions

The convention is split by context: queries use dots for paths (customers.regions.name), model-internal SQL uses __ for the alias (customers__regions.name). See Diamond Joins for details.

Model filters

Model filters are SQL conditions always-applied to the underlying table:

name: active_orders
sql_table: public.orders
filters:
  - "deleted_at IS NULL"
  - "status <> 'test'"

These are SQL-mode expressions (Mode A): any valid SQL the underlying dialect accepts — function calls (json_extract, coalesce, …), CASE WHEN, joined-column references via the __ alias. Aggregation colon syntax and SLayer transforms are rejected here — those are DSL constructs (Mode B) and belong in query-level filters or ModelMeasure.formula. See references.md for the full Mode A / Mode B table. Multi-hop joined references (customers.regions.name) are auto-converted to the __ form with a warning.

Query-backed models

A query-backed model is a queryable relation whose rows are the final-stage result of one or more saved SlayerQuery stages. You can save any query as a model, then run it directly by name or use it as source_model in another query — exactly like any table-backed model.

Saving a query as a model

await engine.create_model_from_query(
    query={
        "source_model": "orders",
        "measures": [{"formula": "amount:sum"}],
        "dimensions": ["region"],
        "time_dimensions": [{"dimension": "ordered_at", "granularity": "month"}],
    },
    name="monthly_revenue",
    description="Monthly revenue by region",
    variables={"region": "US"},  # default placeholder values, optional
)

This saves the query structure in model.source_queries, saves any defaults in model.query_variables, runs save-time validation (any unresolved {var} placeholder defaults to '0' so SQL generation succeeds), and caches the resulting columns and rendered backing_query_sql on the model for fast inspection.

create_model_from_query accepts a single SlayerQuery or a list of stages; for multi-stage queries, every non-final stage must have a name so it can be referenced. Stages form a DAG: any stage may use a prior named sibling as source_model or as joins.target_model. Forward and self references are rejected.

Two ways to use a saved query

Run the backing query directly by name — returns the final-stage result:

await engine.execute("monthly_revenue", variables={"region": "US"})

Or use the saved result as a model in another query:

{
  "source_model": "monthly_revenue",
  "measures": [{"formula": "amount_sum:avg"}],
  "dimensions": ["region"]
}

Variable precedence

When a query-backed model references {var} placeholders, values flow in this order (highest first):

  1. Runtime kwargvariables= on engine.execute(...) (also via REST /query, MCP query / create_model, CLI --variables / --variables-json). Wins at every nesting level.
  2. Stage .variables — set on an individual SlayerQuery stage.
  3. Outer query .variables — when a query-backed model is used as source_model in another query.
  4. Model defaultsmodel.query_variables.

Unresolved placeholders raise a clear error at execute time, naming the model and the missing variable. Runtime-kwarg variables that don't appear anywhere are silently ignored.

What gets cached

For a query-backed model the engine caches model.columns (final-stage output columns — a discoverability snapshot) and model.backing_query_sql (the rendered backing query). The cache is populated only on save through engine.save_model (REST POST/PUT /models, MCP create_model/edit_model). Read operations never write storageengine.execute, inspect_model, get_column_types, MCP query, and REST /query will never modify the persisted cache. Writing a query-backed model directly to storage outside the engine leaves the cache stale until the next engine save.

You cannot supply columns or backing_query_sql yourself when creating a query-backed model — both are engine-managed, and any user-supplied value is rejected with a clear error.

Column naming in query-derived models

A query result is a self-contained table — it no longer has the joins the source model may have had. Dimensions and measures that came from joined models use __ to encode the original join path in their name:

Inner query field Virtual model column name
stores.name stores__name
customers.regions.name customers__regions__name
customer_id customer_id
*:count (measure) count
revenue:sum (measure) revenue_sum
{"formula": "revenue:sum", "name": "rev"} rev

This uses the same __ convention as SQL-level join path aliases. When referencing these columns in an outer query, use the __ name directly (e.g., {"name": "stores__name"}), not dot syntax — dots would imply a join to a model that doesn't exist on the virtual table.

An explicit name on a measure spec overrides the canonical naming above, for both arithmetic/transform formulas and simple aggregations. This is what lets multi-stage source_queries rename inner-stage outputs cleanly:

{
  "source_queries": [
    {
      "name": "raw",
      "source_model": "orders",
      "dimensions": ["region"],
      "measures": [{"formula": "amount:sum", "name": "rev"}]
    },
    {
      "source_model": "raw",
      "measures": [{"formula": "rev:sum"}]
    }
  ]
}

The inner stage emits a column named rev (not amount_sum), and the outer stage references it by that chosen name. See the multistage queries example for working examples.

Result column format

Query results use model_name.column_name keys. Colon syntax is converted: revenue:sum becomes orders.revenue_sum; *:count becomes orders._count (leading underscore so the alias never collides with a user column literally named count). Multi-hop joined dimensions keep their full path:

{"orders.status": "completed", "orders._count": 42, "orders.revenue_sum": 1500}
{"orders.customers.regions.name": "US", "orders._count": 3}

Schema versioning

Every persisted SLayer entity (SlayerModel, SlayerQuery, DatasourceConfig) carries a version: int field — currently 6 for SlayerModel, 3 for SlayerQuery, 1 for DatasourceConfig. Behaviour:

  • On save, SLayer always writes the current schema version.
  • On load, an older version triggers a chain of pure dict→dict converters before Pydantic validation. Hand-edited and older files keep working as the schema evolves.
  • Round-tripping an older file (load → save) upgrades it on disk to the current schema.
  • Forward tolerance. A file with a higher version than this SLayer knows about loads on a best-effort basis. For SlayerModel and DatasourceConfig, unknown fields are ignored. SlayerQuery v3 sets extra="forbid", so any unknown field on a future-version query raises a ValidationError rather than being silently dropped — this catches typos but means a future schema's new fields will not load on an older SLayer.

The v2→v3 converter drops the legacy dry_run / explain fields from SlayerQuery — they were execution-mode flags that had no business being persisted; pass them as kwargs to engine.execute(query, dry_run=..., explain=...) instead. Each migrated query emits one logger.warning and one DeprecationWarning on first load.

Migrations are defined in slayer/storage/migrations.py and apply at the Pydantic-validation layer, so every storage backend (YAML, SQLite, third-party backends registered via register_storage, plus HTTP API, MCP server, and dbt importer) gets them automatically.

Keeping models in sync with the live schema

When the live database schema changes — a column drops, a type bucket flips, a table goes away — persisted models stop being valid. SLayer surfaces this as a first-class concept: slayer validate-models returns a structured diff, slayer ingest is idempotent (additive only), and query-time DBAPI errors get attributed to the right model with a SchemaDriftError. See Schema Drift for the full diff / cascade contract and the --force-clean apply path.