Storage

Storage and acceleration tiering for production Spice.ai Enterprise deployments.

Spice.ai accelerations are latency- and IOPS-sensitive. Picking the right storage tier is the single highest-impact decision for query performance in a production deployment. This page is the reference for choosing storage for SpicepodSet, SpicepodCluster, and standalone runtime deployments.

Storage tiers

Choose storage in this order of preference:

1. Node-local NVMe

Node-local NVMe SSDs deliver the lowest latency and highest IOPS available on every major cloud and on-prem platform. Local NVMe is the default recommendation for accelerator volumes and for executor shuffle scratch in SpicepodCluster.

| Cloud | Instance families with attached NVMe |
| --- | --- |
| AWS | i4i, i7ie, i8g, and any d-suffixed family (m6id, m7gd, c7gd, r7gd, m6gd, c6gd, r6gd). |
| GCP | Local SSD machine types: *-lssd variants of N2, C3, and Z3. |
| On-prem | Any node with locally-attached NVMe SSDs (Intel Optane, Samsung PM9A3, Micron 7450, etc.). |

Expose local NVMe to Kubernetes using the Local Volume Static Provisioner as a local-storage StorageClass, then point the SpicepodSet volume.storage_class_name at it:

```yaml
apiVersion: spice.ai/v1
kind: SpicepodSet
metadata:
  name: prod-accelerator
spec:
  replicas: 1
  volume:
    storage_class_name: local-storage
    storage_requests: 500Gi
  spicepod: |
    name: prod-accelerator
    kind: Spicepod
    version: v1
```
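
The local-storage StorageClass referenced above is not shown on this page. A minimal sketch, assuming the conventions of the Local Volume Static Provisioner (no dynamic provisioning, and binding delayed until a pod is scheduled), might look like this. The reclaim policy is an assumption; choose it to match how the node disks are recycled.

```yaml
# Sketch of a StorageClass for statically provisioned local NVMe volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage          # must match volume.storage_class_name in the SpicepodSet
provisioner: kubernetes.io/no-provisioner   # local PVs are created by the static provisioner, not on demand
volumeBindingMode: WaitForFirstConsumer     # bind only once the pod is scheduled to a node
reclaimPolicy: Delete          # assumption; Retain is the other supported policy
```

WaitForFirstConsumer matters here because a local volume is tied to one node, so the scheduler has to pick the node before the claim is bound.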

2. High-IOPS network block storage

When persistence must survive node replacement (for example, large file-based accelerations or SpicepodCluster scheduler state), use a high-IOPS block storage class. Network block storage is slower than local NVMe but recoverable.

| Cloud | First choice | Fallback |
| --- | --- | --- |
| AWS | EBS gp3 with provisioned IOPS | |
| Azure | Premium SSD v2 (sub-ms, 80K IOPS) | Premium SSD (managed-csi-premium) |
| GCP | Persistent Disk SSD (pd-ssd) | |

Provision a custom StorageClass per cloud:
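
The following is a minimal sketch of one such class per cloud, assuming the standard CSI drivers (AWS EBS CSI, Azure Disk CSI, GCE Persistent Disk CSI) are installed in the cluster. The class names and the IOPS and throughput figures are illustrative assumptions, not values taken from this page; tune them to the workload.

```yaml
# AWS: EBS gp3 with provisioned IOPS (figures are illustrative)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: spice-gp3              # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "10000"
  throughput: "500"            # MiB/s
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# Azure: Premium SSD v2 (figures are illustrative)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: spice-premium-v2       # hypothetical name
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS
  cachingMode: None            # host caching is not supported on Premium SSD v2
  DiskIOPSReadWrite: "5000"
  DiskMBpsReadWrite: "200"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# GCP: Persistent Disk SSD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: spice-pd-ssd           # hypothetical name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Apply only the class for the target cloud, then set volume.storage_class_name in the SpicepodSet to its name.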

3. Cayenne shared object storage (Cayenne only)

For Cayenne acceleration that must be shared across replicas or persisted independently of pod lifecycle, point Cayenne at object storage:

  • AWS: S3 Express One Zone directory buckets deliver single-digit-millisecond latency.

  • Azure: ADLS Gen2 hot tier, or Premium block blob.

  • GCP: Cloud Storage with the Standard storage class.

Object-store-backed Cayenne is the recommended pattern for SpicepodCluster deployments where multiple executors must share the same accelerated dataset.

Avoid: network file systems

Network file systems trade latency for sharing semantics. Do not use them as acceleration storage classes:

  • AWS EFS: NFS-style latency. Acceptable for stateless artefacts but not accelerators.

  • Azure Files (azurefile-csi): SMB / NFS protocol overhead. Use only for ReadWriteMany shared artefacts.

  • GCP Filestore: same trade-off as the above.

Sizing accelerations

Size acceleration storage as dataset_size_at_max_lookback * compression_ratio * 1.3. The 1.3 multiplier leaves 30% headroom for query temp files.

| Acceleration engine | Typical compression vs. raw Parquet | Notes |
| --- | --- | --- |
| Cayenne | 1.0x – 1.5x larger | Uses Vortex columnar format. Best for analytical workloads and shared persistence. |
| DuckDB | 0.7x – 1.2x | Best for general-purpose analytical workloads. |
| SQLite / Postgres | 1.5x – 3x | Use only for OLTP-shaped workloads. |
| Arrow (memory) | 1.5x – 2x | Memory-resident. Sized as memory, not disk. |

Validate sizing under representative load using runtime.task_history and the Grafana dashboard.
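
As a worked example of the sizing formula, assume (hypothetically) 200 GiB of raw Parquet at maximum lookback, accelerated with DuckDB at the upper end of its range (1.2x): 200 GiB * 1.2 * 1.3 = 312 GiB. The fragment below rounds that up in the storage_requests field. The dataset size and the rounding are illustrative assumptions, and only the volume-related fields of the SpicepodSet are shown.

```yaml
# Sizing sketch (illustrative numbers, not measured):
#   200 GiB raw Parquet x 1.2 (DuckDB upper bound) x 1.3 headroom = 312 GiB
apiVersion: spice.ai/v1
kind: SpicepodSet
metadata:
  name: prod-accelerator
spec:
  replicas: 1
  volume:
    storage_class_name: local-storage
    storage_requests: 320Gi    # 312 GiB rounded up
```

Using the upper bound of the compression range keeps the estimate conservative until real usage has been measured.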

Object store for SpicepodCluster shared state

SpicepodCluster requires an S3-compatible object store for shared scheduler state and shuffle data. Configure it once per cluster:

| Property | Recommended value |
| --- | --- |
| Bucket type | Single-purpose bucket per cluster. |
| Region | Same region as the executors. Cross-region object stores will dominate query latency. |
| Versioning | Enabled. Required to recover from corrupted writes. |
| Lifecycle | Expire shuffle/* after 7 days; expire scheduler/state/* after 30 days. |
| Encryption | Server-side encryption with KMS-managed keys. |
| Access | Workload identity / IRSA on the cluster's ServiceAccount. No static keys. |
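
The versioning, lifecycle, and encryption recommendations can be sketched as a CloudFormation template. This assumes AWS S3 and CloudFormation, which this page does not prescribe; the logical ID and rule IDs are hypothetical, and the prefixes follow the lifecycle recommendation above.

```yaml
# Sketch: single-purpose S3 bucket for SpicepodCluster shared state (AWS only).
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  SpiceClusterStateBucket:           # hypothetical logical ID
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled              # lets corrupted writes be rolled back
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: aws:kms  # server-side encryption with KMS-managed keys
      LifecycleConfiguration:
        Rules:
          - Id: expire-shuffle
            Status: Enabled
            Prefix: shuffle/
            ExpirationInDays: 7
          - Id: expire-scheduler-state
            Status: Enabled
            Prefix: scheduler/state/
            ExpirationInDays: 30
```

Because versioning is enabled, expiring a current object only adds a delete marker. Reclaiming the space also requires a noncurrent-version expiration rule, which is not shown here because the retention period is a local decision.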

Disaster recovery

Production Spice.ai Enterprise deployments are recoverable from three sources of truth:

  1. Spicepod manifests in Git: the canonical source of dataset, model, and acceleration configuration. Drives the Argo CD / Flux pipeline.

  2. Object-store-backed cluster state: for SpicepodCluster, the scheduler state survives full pod loss.

  3. Upstream data sources: accelerations re-hydrate from the configured connector on startup. RTO is bounded by the time to refresh the largest accelerated dataset.

Recovery procedure:

  1. Restore the Kubernetes cluster (or fail over to the secondary region's cluster).

  2. Reapply the operator chart and the Spicepod manifests via the GitOps controller.

  3. Wait for executors to attach to the existing object-store state, or for SpicepodSet accelerations to refresh from upstream.
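
Step 2 of the procedure can be sketched as an Argo CD Application that re-syncs the Spicepod manifests from Git onto the restored cluster. This is a minimal sketch, not a documented Spice.ai pipeline: the application name, repository URL, path, and namespaces are hypothetical placeholders, and a Flux setup would use its own resources instead.

```yaml
# Sketch: Argo CD Application that reapplies Spicepod manifests from Git.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: spice-prod                   # hypothetical
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/spice-deploy.git   # hypothetical repository
    targetRevision: main
    path: clusters/prod/spicepods    # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: spice                 # hypothetical namespace
  syncPolicy:
    automated:
      prune: true                    # remove resources that were deleted from Git
      selfHeal: true                 # revert drift on the cluster
    syncOptions:
      - CreateNamespace=true
```

With automated sync enabled, pointing the controller at the restored cluster is enough to trigger the reapply; no manual kubectl step is needed.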

For RPO-sensitive deployments, run a warm-standby cluster in a second region with the same Spicepod manifests and a cross-region replicated object store.
