User-Defined Functions
User-defined scalar functions (UDFs) and table functions (UDTFs) declared in a spicepod's `functions:` section — SQL, Remote HTTP, and WebAssembly tiers.
Spice.ai supports user-defined functions declared declaratively in a spicepod's functions: section. Each function can be a scalar UDF (one value per input row) or a table UDTF (a relation returned in a SQL FROM clause), and is dispatched through one of three execution tiers:
SQL (
from: sql) — in-process function whose body is a DataFusion SQL expression (scalar) or query (table).Remote (
from: http://…orfrom: https://…) — async function that invokes a remote HTTP endpoint with a JSON request and receives a JSON response.WebAssembly (
from: wasm) — function backed by a sandboxed WASM module invoked with Arrow IPC batches.
Functions are automatically registered into the SQL session, exposed via the list_udfs() UDTF and the GET /v1/functions HTTP endpoint, and (when scalar) surfaced to LLM tool-calling. They hot-reload when the spicepod changes on disk.
Distribution availability. Inline SQL functions (from: sql) are available in all Spice distributions, including open source. Remote HTTP (from: http://…) and WebAssembly (from: wasm) tiers ship by default in Spice.ai Enterprise and Spice Cloud Platform distributions; open source users can enable them by building locally with the http-functions and wasm-functions cargo features. See Distributions.
User-defined functions are currently in ALPHA. Behaviour, on-disk schema, and APIs may change without notice. A one-time warning is logged when the first functions: entry is registered.
Set runtime.functions.enabled: true in spicepod.yaml to register user-defined functions. Without the flag, functions: entries log an error and are skipped at startup, and tools: entries with as_sql: true log a warning and remain LLM-only.
Quickstart
Declare a function in spicepod.yaml:
version: v2
kind: Spicepod
name: my_app
runtime:
functions:
enabled: true
functions:
- name: haversine_km
from: sql
description: Haversine distance in kilometres.
volatility: immutable
signature:
args:
- { name: lat1, type: float64 }
- { name: lon1, type: float64 }
- { name: lat2, type: float64 }
- { name: lon2, type: float64 }
returns: float64
body: |
6371 * acos(
cos(radians(lat1)) * cos(radians(lat2)) *
cos(radians(lon2) - radians(lon1)) +
sin(radians(lat1)) * sin(radians(lat2))
)Use it in SQL:
Schema Reference
Each entry in functions: is a Function object. Fields are strictly validated (deny_unknown_fields is enforced).
name
string
yes
Identifier the function is registered under. Referenced by that name in SQL.
from
string
yes
Source URI selecting the execution tier. sql, http://…, https://…, wasm.
enabled
bool
no
Defaults to true. Set to false to keep the declaration in the spicepod without registering it for SQL, tool exposure, list_udfs(), or /v1/functions.
description
string
no
Free-form description surfaced in list_udfs() and GET /v1/functions.
kind
enum
no
scalar (default) or table. Both are wired today. aggregate and window are reserved and rejected at registration with a clear error.
signature
object
yes
Typed signature. See below.
body
string
tier-dep
Inline SQL expression (scalar) or SELECT query (table). Required for from: sql unless body_ref is set. Optional for from: wasm to supply a table input from SQL. Forbidden for from: http*. Mutually exclusive with body_ref.
body_ref
string
tier-dep
Path to a file whose contents are the function body. Resolved relative to the runtime's CWD. Same tier rules as body. Mutually exclusive with body.
metadata
map
no
Free-form metadata surfaced alongside the declaration.
params
map
no
Tier-specific knobs. Supports ${ secrets:KEY } / ${ env:KEY } interpolation. See Remote params and WebAssembly params.
dependsOn
string[]
no
Names of spicepod components that must load before this function. Inferred from SQL bodies and params.input_table when omitted.
metrics
object
no
Per-function metrics configuration.
as_tool
bool
no
Expose the function as an LLM tool. Defaults to true for scalar functions; table functions are always SQL-only. See LLM Tool Exposure.
signature
signaturetables
Optional list of declared table inputs. When present, table inputs are passed before scalar arguments at call sites. Used by table functions and scalar functions that consume a relation.
tables[].name
Logical input name exposed to the backend.
tables[].columns
Arrow schema declared for that input.
args
Positional scalar argument list. Empty for niladic functions.
args[].name
Argument name. Referenced by name in SQL bodies.
args[].type
Arrow logical-type string (e.g. float64, utf8, list<int64>, decimal(38, 10), timestamp(us, utc)). See Supported types.
returns
For kind: scalar, a single Arrow type string. For kind: table, a list of named output columns (each with name and type).
Execution Tiers
SQL (from: sql)
from: sql)The body is parsed into a DataFusion logical expression against a schema derived from the argument list, then lowered to a physical expression at build time. The standard DataFusion scalar functions (math, string, datetime), Spark built-ins, and datafusion-functions-json are all available in bodies.
The body runs entirely in-process with no sandbox. Prefer SQL-tier UDFs for anything that can be expressed in a SQL expression — they're fastest, type-checked at startup, and don't leave the runtime.
SQL functions support the full set of Arrow logical types accepted by DataFusion, including primitives, list<…>, large_list<…>, struct<…>, decimal(p, s), decimal256(p, s), and timestamp(unit[, timezone]).
body_ref for longer bodies
body_ref for longer bodiesKeep non-trivial SQL in its own file with proper editor support:
body and body_ref are mutually exclusive; exactly one must be set when from: sql.
Remote (from: http://… | https://…)
from: http://… | https://…)The remote tier requires the http-functions cargo feature. It is enabled by default in Spice.ai Enterprise and Spice Cloud Platform distributions. Open source users must build locally with --features http-functions; the prebuilt OSS images and install script do not ship this tier. See Distributions.
The remote tier invokes an external HTTP endpoint, batching rows for scalar invocations and issuing a single request for table inputs. Powered by DataFusion's AsyncScalarUDFImpl, so the query scheduler overlaps remote calls with other work.
Wire contract
Scalar functions without table arguments issue one HTTP request per batch:
Request —
POST <endpoint>,Content-Type: application/json, body:Response — HTTP
200with body:values.lengthmust equalrows.length. A mismatch is treated as an execution error.
Scalar functions with dynamic table arguments and table functions (kind: table) issue a single request per invocation with both scalar arguments and table inputs:
Request body:
The
tablesfield is omitted when no table inputs are declared. Each row in a table input is a JSON object whose keys match the declaredtables[].columns.Response body —
{"rows": [{"<output_col>": <value>, ...}, ...]}for table functions,{"values": [<result>]}for scalar.
Non-2xx responses surface as query errors, with the response body snippet (up to 256 chars, one line) included in the error for debugging. Argument and return values are encoded through Arrow's JSON reader/writer, so any Arrow type with a JSON representation (including list<…>, struct<…>, and timestamps) is permitted.
Remote params
timeout
integer seconds or duration string (2s, 500ms)
30s
Per-call HTTP timeout.
batch_size
integer, 1..=100000
1024
Rows per HTTP request for scalar functions. Large inputs are chunked.
batch_concurrency
integer, 1..=64
4
Maximum in-flight HTTP batches per invocation. Results are appended in input order.
max_response_bytes
integer bytes, up to 1 GiB
10MiB
Maximum decoded response body size per HTTP call. Larger responses fail the query.
max_rows
integer, up to 1000000
100000
Cap on rows returned by a table-function call when the query has no LIMIT. Ignored for scalar batches.
auth_bearer
string (supports ${secrets:…})
none
Sets Authorization: Bearer <value> on every request.
WebAssembly (from: wasm)
from: wasm)The WASM tier requires the wasm-functions cargo feature (compiling Rust source additionally requires wasm-functions-compile). It is enabled by default in Spice.ai Enterprise and Spice Cloud Platform distributions. Open source users must build locally to enable it. See Distributions.
The WebAssembly tier executes user code inside a sandboxed wasmtime module, exchanging Arrow IPC streams with the host. Both scalar and table kinds are supported; a single guest export serves both shapes.
WebAssembly params
module
string
—
Path to a precompiled .wasm artifact. Mutually exclusive with source. One of the two is required.
source
string
—
Path to source code to compile to WASM at registration. Requires wasm-functions-compile.
language
string
rust
Source language when source is set. Only rust is supported today.
entrypoint
string
spice_transform
Exported guest function name.
input_table
string
—
Name of an existing table to use as the table input. Mutually exclusive with body / body_ref.
fuel
integer
100000000
Per-invocation wasmtime fuel budget. Exhaustion fails the call with an interrupt error.
max_memory_bytes
integer
67108864 (64MiB)
Hard cap on guest linear-memory growth.
max_table_elements
integer
1000000
Hard cap on guest WASM-table growth.
The guest module must export memory, spice_alloc, spice_dealloc, and the configured entrypoint. The entrypoint receives (input_ptr, input_len, args_ptr, args_len) and returns a packed (output_ptr << 32) | output_len pointer to an Arrow IPC stream matching the declared return schema.
Table Functions
Set kind: table to register a user-defined table function (UDTF). Table functions return a relation instead of a single value and are invoked in a SQL FROM clause:
The signature.returns field becomes a list of output columns. The function may take any combination of scalar arguments and declared signature.tables inputs; table inputs always precede scalar arguments at call sites.
SQL table functions
The body is a single SELECT query. Scalar arguments are visible through the reserved args table; declared signature.tables inputs are visible by their declared names.
To pass a relation into a table function, declare a signature.tables entry and reference the input table inside the body. The argument at the call site can be a table name or an inline subquery:
Remote and WebAssembly table functions
Remote (from: http://…) and WebAssembly (from: wasm) table functions follow the same kind: table shape. Inputs and outputs are exchanged over the tier's native wire format — JSON for remote, Arrow IPC for WASM — as described in Execution Tiers.
Table functions are always SQL-only. They are not exposed to LLM tool-calling regardless of the as_tool setting.
Supported Types
All tiers share a single Arrow type parser. Type names are case-insensitive and accept shorthand aliases (string ↔ utf8, bool ↔ boolean, int ↔ int32, double ↔ float64).
int8, int16, int32, int64
uint8, uint16, uint32, uint64
float32, float64
utf8 / string, large_utf8
boolean / bool
binary, large_binary
date32, date64
timestamp(s), timestamp(ms), timestamp(us), timestamp(ns)
Optional second argument for a timezone, e.g. timestamp(us, utc).
decimal(p, s), decimal128(p, s), decimal256(p, s)
list<T>, large_list<T>
struct<name:T, name2:T2, …>
Field names may be unquoted or quoted ("name":T / 'name':T).
Return types are coerced implicitly where DataFusion knows how (e.g. Int32 → Int64, Float32 → Float64, Utf8 → LargeUtf8) — a literal like 6371 widening to Float64 is fine.
Tier notes:
SQL tier lowers the body to a DataFusion physical expression, so any DataFusion-recognised type round-trips natively.
Remote tier serializes arguments and results through Arrow's JSON reader/writer, so any Arrow type with a JSON representation is acceptable. Plan for wire size when passing nested structures or large lists.
WebAssembly tier uses Arrow IPC streams as the host/guest ABI; the guest module must understand the declared input and output schemas.
Volatility
Volatility governs caching, constant-folding, and plan-time evaluation. Defaults to volatile — the safest choice.
immutable
Same inputs always yield the same output (e.g. abs, upper).
Constant-fold, cache forever.
stable
Stable within a single query, may change across queries (e.g. now()).
Cache per query.
volatile
Unpredictable on every call (e.g. random()).
Never cache.
Remote tier auto-caps at stable. A user-declared immutable or stable remote function is lowered to DataFusion's Stable volatility — constant-folding is disabled because the runtime cannot prove the remote service is deterministic or stable across process lifetimes. Queries within a single execution still benefit from common-subexpression elimination.
Federation Deny-List
Every user function is automatically added to Spice's federation deny-list. This prevents the query planner from attempting to push a user function down into a federated data source (e.g. Postgres, MySQL, Databricks) — the remote source doesn't know about it, and a pushdown would either fail or, worse, silently call a different function with the same name.
Built-in Spice UDFs (ai, embed, bucket, identity SQL functions, etc.) are likewise denied. Functions are added on registration and removed on deregistration / hot-reload removal. No manual configuration is required.
Hot-Reload
When the spicepod file changes on disk, Spice diffs the functions: section against the live set:
Removed functions are deregistered from DataFusion, removed from the deny-list, removed from
USER_FUNCTION_INFO, and dropped from the tool registry.Changed functions are deregistered and re-registered (equality is structural — any field change triggers a re-register).
Added functions are built and registered.
Failures to build a new or changed function are logged at ERROR and do not halt the diff — other functions still apply.
Introspection
SQL: list_udfs()
list_udfs()A built-in UDTF returns every function registered in the current session — Spice built-ins, DataFusion standard library, and user-declared functions together.
name
utf8, NOT NULL
UDF identifier.
source
utf8, NOT NULL
"builtin" or "user".
kind
utf8, nullable
"scalar" | "aggregate" | "window" | "table". NULL for built-ins.
volatility
utf8, nullable
"immutable" | "stable" | "volatile". NULL for built-ins.
from
utf8, nullable
Source URI for user functions. NULL for built-ins.
description
utf8, nullable
Free-form description.
HTTP: GET /v1/functions
GET /v1/functionsReturns only the user-declared functions (not built-ins). Intended for operational tooling.
The OpenAPI spec exposes this under operation_id: list_functions, tag Functions.
LLM Tool Exposure
When as_tool: true (the default), every declared scalar function is also registered as an LLM tool. The tool invokes the function through a SELECT fn_name(arg0, …) AS result query over the runtime's DataFusion session, so all execution tiers dispatch transparently. Table functions (kind: table) are always SQL-only and are not surfaced to the LLM.
Argument values are interpolated as typed SQL literals by the tool bridge — integers, floats, strings, booleans. No free-form SQL from the model is ever concatenated.
The tool's JSON Schema
parametersis derived from the function's Arrow signature. Types outside the JSON-encodable primitive set (int64,float64,utf8,boolean) are not eligible for tool exposure — the function still registers for SQL use, but tool registration is skipped with aWARNlog.Name collisions with a previously-registered tool (built-in or
tools:entry) are skipped with aWARNlog. Rename one, or setas_tool: falseon the function.
Exposing Tools as SQL UDFs
The inverse of LLM Tool Exposure: set as_sql: true on a tools: entry to register the tool as a SQL scalar UDF. The UDF dispatches to the tool per row, packing each row's arguments into a JSON object that matches the supplied signature.
Tool fields for SQL exposure
as_sql
bool
Defaults to false. Set to true to register a scalar UDF alongside the LLM tool. Requires runtime.functions.enabled: true and a signature: block.
Behaviour
A
signature:block is required whenas_sql: true— a tool's free-form JSON Schema does not uniquely map to a SQL signature, so argument and return Arrow types must be pinned explicitly.Supported Arrow types are the same primitive set as the Remote tier:
int64,float64,utf8,boolean. Other types are rejected at build time with anUnsupported*Typeerror naming the tool.The derived UDF is always
volatile. Constant-folding is disabled and per-row results are not memoized because the underlying tool is non-deterministic RPC.Per-query dispatch is capped at 16 concurrent in-flight calls, with row order preserved on output.
Tool-backed UDFs are automatically added to the federation deny-list and are never pushed down to remote data sources.
If
runtime.functions.enabledisfalse, orsignature:is missing, SQL registration is skipped with aWARNlog and the tool remains callable as an LLM tool only.
Distributed Execution
In a Spice.ai Enterprise distributed cluster, UDFs declared on the scheduler are automatically propagated to executors as part of the GetAppDefinition bootstrap RPC — executors load the full Spicepod definition (datasets, catalogs, views, and UDFs) from the scheduler at startup, so query planning and execution use a consistent view of the function set across all nodes.
User functions are deny-listed from federation pushdown on every node, so there's no risk of a stray executor attempting to push a user function into a downstream source.
Limitations
Current (ALPHA) scope:
kind: scalarandkind: tableare implemented.aggregateandwindoware reserved kinds — they parse but are rejected at registration so forward-compatible spicepods still load.Supported
from:schemes aresql,http://,https://, andwasm. The HTTP and WASM tiers are gated behind thehttp-functionsandwasm-functionscargo features respectively (default-on in Enterprise and Cloud, opt-in for OSS builds). Other schemes (e.g.grpc://,flight://) parse but are rejected.LLM tool exposure (
as_tool: true) is limited to scalar functions with primitive Arrow types in their signature (int64,float64,utf8,boolean). Functions with richer signatures remain callable from SQL.Remote tier issues batches in parallel with a default
batch_concurrencyof 4 (max 64). Retries, circuit-breakers, per-function rate limits, andhttp-arrow/ Flight protocols are not yet implemented.No aggregate/window semantics, UDAF pushdown, or execution-engine delegation beyond what DataFusion's scalar UDF and table function paths provide.
Last updated
Was this helpful?