yads Specification
This page is the practical authoring guide for the yads YAML specification. Use
it while drafting or reviewing specs so every table definition is predictable,
well-typed, and friendly to downstream converters.
Current spec format
Target yads_spec_version: 0.0.2
The specification JSON schema is available in the GitHub repo.
Required header
Top-level named parameters describing the spec.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Fully-qualified identifier [catalog].[database].[table]. |
| version | integer | Yes | Monotonic registry version, starting at 1. |
| yads_spec_version | string | No | Spec format version used to validate the document. When not specified, the loader will default to the current version of the specification. |
| columns | array | Yes | Ordered list of column definitions. |
Required header example
name: "catalog.db.table_name"
version: 3
yads_spec_version: "0.0.2"
columns:
  - name: "id"
    type: "bigint"
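The header rules above can be sketched as a small pre-flight check on an already-parsed spec. This is only an illustration: `check_header` is a hypothetical helper, not part of yads, and it assumes the bracket notation in the name field means one to three dot-separated parts.

```python
# Hypothetical helper, not part of yads: checks the required header fields
# on a spec that has already been parsed into a plain dict.
def check_header(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the header looks fine."""
    problems = []

    name = spec.get("name")
    parts = name.split(".") if isinstance(name, str) else []
    # Assumption: one to three non-empty parts (catalog, database, table).
    if not (1 <= len(parts) <= 3) or not all(parts):
        problems.append("name must be a dotted identifier such as catalog.db.table")

    version = spec.get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int) or version < 1:
        problems.append("version must be an integer >= 1")

    if not isinstance(spec.get("columns"), list):
        problems.append("columns must be a list of column definitions")

    return problems


spec = {
    "name": "catalog.db.table_name",
    "version": 3,
    "columns": [{"name": "id", "type": "bigint"}],
}
print(check_header(spec))
```

yads_spec_version is deliberately not checked here because it is optional.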
Identity and metadata
Optional top-level named parameters.
| Field | Type | Required | Description |
|---|---|---|---|
| description | string | No | Short summary of table intent. |
| external | boolean | No | Emit CREATE EXTERNAL for compatible converters when true. Defaults to false. |
| metadata | map | No | Arbitrary container for metadata as key-value pairs. |
Identity and metadata example
description: "Customer transaction facts."
external: true
metadata:
  owner: "data-team"
  sensitivity: "internal"
Storage layout
Optional top-level block for storage-related metadata.
| Field | Type | Required | Description |
|---|---|---|---|
| storage.format | string | No | Storage file format such as parquet, iceberg, orc, csv. |
| storage.location | string | No | URI or path for the table root. |
| storage.tbl_properties | map | No | Engine-specific key/value properties. |
Storage block
storage:
  format: "parquet"
  location: "/data/warehouse/customers"
  tbl_properties:
    write_compression: "snappy"
Partitioning
Optional top-level block for physical table partition information.
| Field | Type | Required | Description |
|---|---|---|---|
| partitioned_by[].column | string | Yes | Column backing the partition. Required when the partitioned_by block is defined. |
| partitioned_by[].transform | string | No | Transform name (month, year, truncate, bucket, ...). |
| partitioned_by[].transform_args | array | No | Unnamed arguments passed to the transform. |
Partition definitions
partitioned_by:
  - column: "event_date"
    transform: "month"
  - column: "country_code"
    transform: "truncate"
    transform_args: [2]
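A quick review aid for partition blocks is to confirm that every partition entry points at a declared column. The sketch below works on a parsed spec dict; `missing_partition_columns` is a hypothetical helper, and the rule that partition columns must appear under columns is an assumption rather than something this page states.

```python
# Hypothetical helper, not part of yads. Assumption: each partitioned_by
# entry should reference a column declared in the top-level columns list.
def missing_partition_columns(spec: dict) -> list[str]:
    """Return partition column names that are not declared in spec['columns']."""
    declared = {column["name"] for column in spec.get("columns", [])}
    return [
        entry["column"]
        for entry in spec.get("partitioned_by", [])
        if entry["column"] not in declared
    ]


spec = {
    "columns": [
        {"name": "event_date", "type": "date"},
        {"name": "amount", "type": "decimal"},
    ],
    "partitioned_by": [
        {"column": "event_date", "transform": "month"},
        {"column": "country_code", "transform": "truncate", "transform_args": [2]},
    ],
}
print(missing_partition_columns(spec))
```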
Table constraints
Optional top-level block for Primary Key and Foreign Key table constraints.
| Field | Type | Required | Description |
|---|---|---|---|
| table_constraints[].type | string | Yes | primary_key or foreign_key. Required when the table_constraints block is defined. |
| table_constraints[].name | string | Yes | Stable constraint identifier. Required when the table_constraints block is defined. |
| table_constraints[].columns | array(string) | Yes | Participating columns. Required when the table_constraints block is defined. |
| table_constraints[].references.table | string | Foreign keys | Referenced table. Required for foreign_key table constraints. |
| table_constraints[].references.columns | array(string) | No | Referenced columns on foreign_key table constraints. Defaults to matching columns when not specified. |
Composite constraints
table_constraints:
  - type: "primary_key"
    name: "pk_orders"
    columns: ["order_id", "order_date"]
  - type: "foreign_key"
    name: "fk_customer"
    columns: ["customer_id"]
    references:
      table: "dim_customers"
      columns: ["id"]
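The default for references.columns can be made explicit with a small normalization step. The sketch below reads "matching columns" as the same column names the constraint itself lists, which is an interpretation of the table above; `resolve_references` is a hypothetical helper, not a yads function.

```python
# Hypothetical helper, not part of yads. Assumption: "matching columns"
# means the referenced columns share the names of the constraint's columns.
def resolve_references(constraint: dict) -> dict:
    """Return a copy of a foreign_key constraint with references.columns filled in."""
    if constraint.get("type") != "foreign_key":
        raise ValueError("expected a foreign_key constraint")
    references = dict(constraint.get("references") or {})
    if "table" not in references:
        raise ValueError("references.table is required for foreign keys")
    # Only fill the default when the author did not list columns explicitly.
    references.setdefault("columns", list(constraint["columns"]))
    return {**constraint, "references": references}


fk = {
    "type": "foreign_key",
    "name": "fk_customer",
    "columns": ["customer_id"],
    "references": {"table": "dim_customers"},
}
print(resolve_references(fk)["references"])
```

The input constraint is left untouched, so the authored spec and the resolved view can be compared side by side.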
Column reference
Each entry in columns captures a single field plus metadata. The spec is strict:
unrecognized keys within a column block cause validation failures.
Column fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Column identifier. |
| type | string | Yes | Case-insensitive token from the type catalog. |
| params | map | No | Type-specific arguments (length, precision, tz, ...). |
| element | column | Arrays or tensors | Nested type for arrays or tensors. |
| fields | array(column) | Structs | Ordered struct fields. |
| key | column | Maps | Map key type. |
| value | column | Maps | Map value type. |
| description | string | No | Free-text description of the column. |
| constraints | map | No | See column constraints. |
| generated_as | map | No | See generated columns. |
| metadata | map | No | Arbitrary per-column metadata. |
Column entry
columns:
  - name: "customer_id"
    type: "bigint"
    description: "Surrogate primary key."
    constraints:
      primary_key: true
      not_null: true
  - name: "created_at"
    type: "timestamptz"
    params:
      tz: "UTC"
  - name: "created_date"
    type: "date"
    generated_as:
      column: "created_at"
      transform: "cast"
      transform_args: ["date"]
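Because unrecognized keys in a column block fail validation, it can help to spot typos before handing a spec to the loader. The following sketch compares a parsed column dict against the field names listed in the column fields table; `unknown_column_keys` is a hypothetical helper and only inspects the top level of one column.

```python
# Field names copied from the column fields table above.
ALLOWED_COLUMN_KEYS = {
    "name", "type", "params", "element", "fields", "key", "value",
    "description", "constraints", "generated_as", "metadata",
}


# Hypothetical helper, not part of yads: nested columns (element, fields,
# key, value) are not inspected in this sketch.
def unknown_column_keys(column: dict) -> list[str]:
    """Return the keys of a column block that the spec does not recognize."""
    return sorted(set(column) - ALLOWED_COLUMN_KEYS)


column = {"name": "customer_id", "type": "bigint", "desc": "Surrogate key."}
print(unknown_column_keys(column))
```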
Column constraints
| Field | Type | Required | Description |
|---|---|---|---|
| not_null | boolean | No | Disallow nulls. |
| primary_key | boolean | No | Declare a single-column primary key (prefer table-level blocks for composites). |
| default | literal | No | Literal default consumed by downstream systems. |
| identity.start | integer | No | Starting value for identity sequences. |
| identity.increment | integer | No | Step for identity sequences. |
| identity.always | boolean | No | Whether identity is always generated. Defaults to true when an identity constraint is defined. |
| foreign_key.name | string | No | Column-level foreign key name. |
| foreign_key.references.table | string | Foreign keys | Referenced foreign key table name. |
| foreign_key.references.columns | array(string) | No | Referenced foreign key column names. |
Column constraints
- name: "submission_id"
  type: "bigint"
  constraints:
    primary_key: true
    not_null: true
    identity:
      start: 1
      increment: 1
Generated columns
| Field | Type | Required | Description |
|---|---|---|---|
| column | string | Yes | Source column supplying values. |
| transform | string | No | Transform name (upper, month, cast, ...). |
| transform_args | array | No | Extra positional arguments. |
Generated column
- name: "created_date"
  type: "date"
  generated_as:
    column: "created_at"
    transform: "cast"
    transform_args: ["date"]
Type catalog
Type tokens are lower-case by convention but case-insensitive. Each table below
lists the keys accepted under params for a column entry. Additional keys
(such as element or fields) are called out in their sections.
Scalar types
string
UTF-8 text with an optional maximum length.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| length | Maximum characters allowed. | integer | No | Unlimited |
string
- name: "email"
  type: "string"
  params:
    length: 320
binary
Byte arrays or VARBINARY columns.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| length | Maximum number of bytes. | integer | No | Unlimited |
binary
- name: "payload"
  type: "binary"
  params:
    length: 16
boolean
True/false values. No additional parameters.
boolean
- name: "is_active"
  type: "boolean"
integer
Signed or unsigned whole numbers. Aliases include tinyint, smallint,
int, bigint, int8, uint32, etc.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| bits | Bit width (8, 16, 32, 64). | integer | No | Target default |
| signed | Include negative values. | boolean | No | true |
integer
- name: "partition_bucket"
  type: "int"
  params:
    bits: 32
    signed: false
float
IEEE floating point numbers.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| bits | Bit width (16, 32, 64). | integer | No | Target default |
float
- name: "confidence"
  type: "float"
  params:
    bits: 32
decimal
Exact precision decimals. precision and scale are optional, but when one is set the other must be set too.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| precision | Total digits. | integer | Only together with scale | — |
| scale | Digits to the right of the decimal point (can be negative). | integer | Only together with precision | — |
| bits | Storage width (128 or 256). | integer | No | Target default |
decimal
- name: "completion_percent"
  type: "decimal"
  params:
    precision: 5
    scale: 2
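The pairing rule for precision and scale is easy to miss in review, so here is a minimal sketch of it as a check over a column's params map. `check_decimal_params` is a hypothetical helper, not part of yads, and only covers the rules listed in the table above.

```python
from typing import Optional

ALLOWED_DECIMAL_BITS = (128, 256)  # storage widths listed in the table above


# Hypothetical helper, not part of yads.
def check_decimal_params(params: Optional[dict]) -> list[str]:
    """Return a list of problems found in a decimal column's params."""
    params = params or {}
    problems = []
    # precision and scale must appear together or not at all.
    if ("precision" in params) != ("scale" in params):
        problems.append("precision and scale must be set together")
    bits = params.get("bits")
    if bits is not None and bits not in ALLOWED_DECIMAL_BITS:
        problems.append("bits must be 128 or 256")
    return problems


print(check_decimal_params({"precision": 5, "scale": 2}))
print(check_decimal_params({"precision": 5}))
```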
Temporal types
date
Calendar date.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| bits | Logical width (32 or 64). | integer | No | 32 |
date
- name: "invoice_date"
  type: "date"
  params:
    bits: 64
time
Wall-clock time with fractional precision.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| unit | Granularity s, ms, us, ns. | string | No | ms |
| bits | Storage width (32 or 64). | integer | No | Target default |
time
- name: "alarm_time"
  type: "time"
  params:
    unit: "us"
    bits: 64
timestamp
Timezone-naive timestamp.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| unit | Granularity s, ms, us, ns. | string | No | ns |
timestamp
- name: "processed_at"
  type: "timestamp"
  params:
    unit: "ms"
timestamptz
Timestamp with explicit timezone.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| unit | Granularity s, ms, us, ns. | string | No | ns |
| tz | IANA timezone name. | string | No | UTC |
timestamptz
- name: "submitted_at"
  type: "timestamptz"
  params:
    tz: "UTC"
timestampltz
Timestamp interpreted in the session's timezone.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| unit | Granularity s, ms, us, ns. | string | No | ns |
timestampltz
- name: "user_created_at"
  type: "timestampltz"
  params:
    unit: "us"
timestampntz
Timestamp with explicit "no timezone" semantics.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| unit | Granularity s, ms, us, ns. | string | No | ns |
timestampntz
- name: "materialized_at"
  type: "timestampntz"
  params:
    unit: "s"
duration
Elapsed amount of time.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| unit | Granularity s, ms, us, ns. | string | No | ns |
duration
- name: "session_length"
  type: "duration"
  params:
    unit: "ms"
interval
SQL-style intervals bounded by start and optional end units.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| interval_start | Most significant unit. Year-Month: [YEAR, MONTH], or Day-Time: [DAY, HOUR, MINUTE, SECOND]. | string | Yes | — |
| interval_end | Least significant unit. Must be same category as interval_start. | string | No | Single-unit interval |
interval
- name: "contract_term"
  type: "interval"
  params:
    interval_start: "YEAR"
    interval_end: "MONTH"
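The same-category rule for interval units can be sketched as follows. `interval_category` and `check_interval` are hypothetical helpers, not part of yads; the sketch compares unit names exactly as written (upper-case, as in the table) and does not check that the end unit is less significant than the start unit.

```python
YEAR_MONTH = ("YEAR", "MONTH")
DAY_TIME = ("DAY", "HOUR", "MINUTE", "SECOND")


# Hypothetical helpers, not part of yads.
def interval_category(unit: str) -> str:
    """Map a unit name to its category, or raise ValueError if unknown."""
    if unit in YEAR_MONTH:
        return "year-month"
    if unit in DAY_TIME:
        return "day-time"
    raise ValueError(f"unknown interval unit: {unit!r}")


def check_interval(params: dict) -> str:
    """Validate interval params and return the interval's category."""
    if "interval_start" not in params:
        raise ValueError("interval_start is required")
    category = interval_category(params["interval_start"])
    end = params.get("interval_end")
    # interval_end is optional; when present it must share the category.
    if end is not None and interval_category(end) != category:
        raise ValueError("interval_end must be in the same category as interval_start")
    return category


print(check_interval({"interval_start": "YEAR", "interval_end": "MONTH"}))
print(check_interval({"interval_start": "DAY"}))
```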
Collection types
array
Ordered list of values sharing the same element type.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| element | Nested type definition. | column | Yes | — |
| params.size | Max array length for fixed-size arrays. | integer | No | Unlimited |
array
- name: "tags"
  type: "array"
  element:
    type: "string"
  params:
    size: 10
struct
Named grouping of heterogeneous fields.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| fields | Ordered list of embedded column definitions. | array(column) | Yes | — |
struct
- name: "address"
  type: "struct"
  fields:
    - name: "street"
      type: "string"
    - name: "city"
      type: "string"
    - name: "postal_code"
      type: "string"
map
Key/value pairs.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| key | Type definition for map keys. | column | Yes | — |
| value | Type definition for map values. | column | Yes | — |
| params.keys_sorted | Whether keys are guaranteed sorted. | boolean | No | false |
map
- name: "attributes"
  type: "map"
  key:
    type: "string"
  value:
    type: "variant"
  params:
    keys_sorted: true
tensor
Multi-dimensional numeric data.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| element | Base type for tensor entries. | column | Yes | — |
| params.shape | Positive integers describing each dimension. | array(integer) | Yes | — |
tensor
- name: "embedding"
  type: "tensor"
  element:
    type: "float"
  params:
    shape: [3, 128]
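To make the shape parameter concrete, the sketch below validates that every dimension is a positive integer and computes how many elements one tensor value holds. `tensor_element_count` is a hypothetical helper, not part of yads.

```python
import math


# Hypothetical helper, not part of yads.
def tensor_element_count(shape: list) -> int:
    """Validate a tensor shape and return the number of elements it describes."""
    for dim in shape:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(dim, bool) or not isinstance(dim, int) or dim <= 0:
            raise ValueError(f"shape entries must be positive integers, got {dim!r}")
    return math.prod(shape)


print(tensor_element_count([3, 128]))  # 3 * 128 = 384
```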
Semi-structured, spatial, and identifiers
json
Semi-structured JSON payload. No additional parameters.
json
- name: "event_payload"
  type: "json"
variant
Union-style semi-structured payload. No additional parameters.
variant
- name: "document"
  type: "variant"
uuid
128-bit identifiers formatted as canonical UUID strings.
uuid
- name: "record_uuid"
  type: "uuid"
  constraints:
    not_null: true
void
Represents NULL or placeholder fields.
void
- name: "reserved"
  type: "void"
geometry
Planar geometry column.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| srid | Spatial reference identifier. | integer or string | No | — |
geometry
- name: "parcel_shape"
  type: "geometry"
  params:
    srid: 4326
geography
Spherical geometry column.
| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| srid | Spatial reference identifier. | integer or string | No | — |
geography
- name: "customer_location"
  type: "geography"
  params:
    srid: 4326
Complete spec example
Below is a complete example of the yads specification.
name: "catalog.db.full_spec"
version: 1
yads_spec_version: "0.0.2"
description: "A full spec with all features."
external: true
metadata:
  owner: "data-team"
  sensitive: false
storage:
  format: "parquet"
  location: "/data/full.spec"
  tbl_properties:
    write_compression: "snappy"
partitioned_by:
  - column: "c_string_len"
  - column: "c_string"
    transform: "truncate"
    transform_args: [10]
  - column: "c_date"
    transform: "month"
table_constraints:
  - type: "primary_key"
    name: "pk_full_spec"
    columns: ["c_uuid", "c_date"]
  - type: "foreign_key"
    name: "fk_other_table"
    columns: ["c_int64"]
    references:
      table: "other_table"
      columns: ["id"]
columns:
  - name: "c_string"
    type: "string"
    description: "A string column with a default value."
    constraints:
      default: "default_string"
  - name: "c_string_len"
    type: "string"
    params:
      length: 255
    description: "A string column with a specific length."
  - name: "c_string_upper"
    type: "string"
    description: "A string column with a generated value."
    generated_as:
      column: "c_string"
      transform: "upper"
  - name: "c_int8"
    type: "tinyint"
    description: "A tiny integer."
  - name: "c_int16"
    type: "smallint"
    description: "A small integer."
  - name: "c_int32_identity"
    type: "int"
    description: "An integer with an identity constraint."
    constraints:
      identity:
        always: true
        start: 1
        increment: 1
  - name: "c_int64"
    type: "bigint"
    description: "A big integer, part of a foreign key."
  - name: "c_float32"
    type: "float"
    description: "A 32-bit float."
  - name: "c_float64"
    type: "double"
    description: "A 64-bit float."
  - name: "c_decimal"
    type: "decimal"
    description: "A decimal with default precision/scale."
  - name: "c_decimal_ps"
    type: "decimal"
    params:
      precision: 10
      scale: 2
    description: "A decimal with specified precision/scale."
  - name: "c_boolean"
    type: "boolean"
    description: "A boolean value."
  - name: "c_binary"
    type: "binary"
    description: "A binary data column."
  - name: "c_binary_len"
    type: "binary"
    params:
      length: 8
    description: "A fixed-length binary column."
  - name: "c_date"
    type: "date"
    description: "A date value, part of the primary key."
    constraints:
      not_null: true
  - name: "c_date_month"
    type: "int"
    description: "An integer column with a generated value."
    generated_as:
      column: "c_date"
      transform: "month"
  - name: "c_time"
    type: "time"
    description: "A time value."
  - name: "c_timestamp"
    type: "timestamp"
    description: "A timestamp."
  - name: "c_timestamp_date"
    type: "date"
    description: "A date column with a generated value."
    generated_as:
      column: "c_timestamp"
      transform: "cast"
      transform_args: ["date"]
  - name: "c_timestamp_tz"
    type: "timestamptz"
    description: "A timestamp with timezone."
  - name: "c_timestamp_ltz"
    type: "timestampltz"
    description: "A timestamp with local timezone."
  - name: "c_timestamp_ntz"
    type: "timestampntz"
    description: "A timestamp without timezone."
  - name: "c_interval_ym"
    type: "interval"
    params:
      interval_start: "YEAR"
      interval_end: "MONTH"
    description: "A year-month interval."
  - name: "c_interval_d"
    type: "interval"
    params:
      interval_start: "DAY"
    description: "A day interval."
  - name: "c_array"
    type: "array"
    element:
      type: "int"
    description: "An array of integers."
  - name: "c_array_sized"
    type: "array"
    element:
      type: "string"
    params:
      size: 2
    description: "A fixed-size array of strings."
  - name: "c_struct"
    type: "struct"
    fields:
      - name: "nested_int"
        type: "int"
        description: "A nested integer."
      - name: "nested_string"
        type: "string"
        description: "A nested string."
    description: "A struct column."
  - name: "c_map"
    type: "map"
    key:
      type: "string"
    value:
      type: "double"
    description: "A map from string to double."
  - name: "c_json"
    type: "json"
    description: "A JSON column."
  - name: "c_geometry"
    type: "geometry"
    params:
      srid: 4326
    description: "A geometry column."
  - name: "c_geography"
    type: "geography"
    params:
      srid: 4326
    description: "A geography column."
  - name: "c_uuid"
    type: "uuid"
    description: "Primary key part 1"
    constraints:
      not_null: true
  - name: "c_void"
    type: "void"
    description: "A void column."
  - name: "c_variant"
    type: "variant"
    description: "A variant column."