ABFS

Azure BlobFS Data Connector Documentation

The Azure BlobFS (ABFS) Data Connector enables federated SQL queries on files stored in Azure Blob-compatible endpoints. This includes Azure BlobFS (abfss://) and Azure Data Lake (adl://) endpoints.

When a folder path is provided, all files in the folder are loaded.
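For example, a minimal sketch pointing `from` at a folder (the container, prefix, and account names here are hypothetical) loads every file under that prefix:

```yaml
datasets:
  # Loads all files under the reports/ prefix in my_container
  - from: abfs://my_container/reports/
    name: reports
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: parquet
```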

File formats are specified using the `file_format` parameter, as described in Object Store File Formats.

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
```

Configuration

from

Defines the ABFS-compatible URI to a folder or object:

  • `from: abfs://<container>/<path>`, with the account name configured via the `abfs_account` parameter, or

  • `from: abfs://<container>@<account_name>.dfs.core.windows.net/<path>`
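As a sketch, the fully qualified form embeds the account name in the URI (hypothetical container and account names shown), so no separate `abfs_account` parameter is needed:

```yaml
datasets:
  # Account name (spiceadls) is taken from the URI itself
  - from: abfs://foocontainer@spiceadls.dfs.core.windows.net/taxi_sample.csv
    name: azure_test
    params:
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
```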

name

Defines the dataset name, which is used as the table name within Spice.

Example:

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: cool_dataset
    params: ...
```

```sql
SELECT COUNT(*) FROM cool_dataset;
```

```
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
```

params

Basic parameters

| Parameter name | Description |
| --- | --- |
| `file_format` | Specifies the data format. Required if not inferrable from `from`. Options: `parquet`, `csv`. Refer to Object Store File Formats for details. |
| `abfs_account` | Azure storage account name |
| `abfs_sas_string` | SAS (Shared Access Signature) token to use for authorization |
| `abfs_endpoint` | Storage endpoint, default: `https://{account}.blob.core.windows.net` |
| `abfs_use_emulator` | Use `true` or `false` to connect to a local emulator |
| `abfs_authority_host` | Alternative authority host, default: `https://login.microsoftonline.com` |
| `abfs_proxy_url` | Proxy URL |
| `abfs_proxy_ca_certificate` | CA certificate for the proxy |
| `abfs_proxy_exludes` | A list of hosts to exclude from proxy connections |
| `abfs_disable_tagging` | Disable tagging objects. Use this if your backing store doesn't support tags |
| `allow_http` | Allow insecure HTTP connections |
| `hive_partitioning_enabled` | Enable hive-style partitioning from the folder structure. Defaults to `false` |
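As a sketch, with hive-style folder names (the `year=.../month=...` layout below is hypothetical), enabling the flag exposes the folder keys as partition columns:

```yaml
# Files laid out as, e.g., trips/year=2024/month=01/data.parquet
datasets:
  - from: abfs://foocontainer/trips/
    name: trips
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: parquet
      hive_partitioning_enabled: true
```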

Authentication parameters

The following parameters are used when authenticating with Azure. Only one of these parameters can be set at a time:

  • `abfs_access_key`

  • `abfs_bearer_token`

  • `abfs_client_secret`

  • `abfs_skip_signature`

If none of these are set, the connector defaults to using a managed identity.

| Parameter name | Description |
| --- | --- |
| `abfs_access_key` | Secret access key |
| `abfs_bearer_token` | BEARER access token for user authentication. The token can be obtained from the OAuth2 flow. |
| `abfs_client_id` | Client ID for the client authentication flow |
| `abfs_client_secret` | Client secret for the client authentication flow |
| `abfs_tenant_id` | Tenant ID for the client authentication flow |
| `abfs_skip_signature` | Skip credentials and request signing for public containers |
| `abfs_msi_endpoint` | Endpoint for managed identity tokens |
| `abfs_federated_token_file` | File path for a federated identity token in Kubernetes |
| `abfs_use_cli` | Set to `true` to use the Azure CLI to acquire access tokens |
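For example, a sketch that relies on an existing Azure CLI session (assumes you have already run `az login` with an identity that can read the storage account) instead of explicit credentials:

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_use_cli: true  # acquire tokens from the local Azure CLI login
      file_format: csv
```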

Retry parameters

| Parameter name | Description |
| --- | --- |
| `abfs_max_retries` | Maximum number of retries |
| `abfs_retry_timeout` | Total timeout for retries (e.g., `5s`, `1m`) |
| `abfs_backoff_initial_duration` | Initial retry delay (e.g., `5s`) |
| `abfs_backoff_max_duration` | Maximum retry delay (e.g., `1m`) |
| `abfs_backoff_base` | Exponential backoff base (e.g., `0.1`) |
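A sketch combining the retry parameters above (the values are illustrative, not recommendations):

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
      # Retry up to 3 times over at most 1 minute total,
      # backing off exponentially from 5s up to 30s between attempts
      abfs_max_retries: 3
      abfs_retry_timeout: 1m
      abfs_backoff_initial_duration: 5s
      abfs_backoff_max_duration: 30s
      abfs_backoff_base: 0.1
```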

Authentication

The ABFS connector supports multiple authentication methods, as detailed in the authentication parameters above.

Service principal authentication

Configure service principal authentication by setting the abfs_client_secret parameter.

  1. Create a new Azure AD application in the Azure portal and generate a client secret under Certificates & secrets.

  2. Grant the application read access to the storage account under Access Control (IAM); this is typically done with the built-in Storage Blob Data Reader role.

Access key authentication

Configure access key authentication by setting the `abfs_access_key` parameter to an Azure Storage account access key.

Supported file formats

Specify the file format using the `file_format` parameter. More details in Object Store File Formats.

Examples

Reading a CSV file with an Access Key

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:ACCESS_KEY }
      file_format: csv
```

Using Public Containers

```yaml
datasets:
  - from: abfs://pubcontainer/taxi_sample.csv
    name: pub_data
    params:
      abfs_account: spiceadls
      abfs_skip_signature: true
      file_format: csv
```

Connecting to the Storage Emulator

```yaml
datasets:
  - from: abfs://test_container/test_csv.csv
    name: test_data
    params:
      abfs_use_emulator: true
      file_format: csv
```

Using secrets for Account name

```yaml
datasets:
  - from: abfs://my_container/my_csv.csv
    name: prod_data
    params:
      abfs_account: ${ secrets:PROD_ACCOUNT }
      file_format: csv
```

Authenticating using Client Authentication

```yaml
datasets:
  - from: abfs://my_data/input.parquet
    name: my_data
    params:
      abfs_tenant_id: ${ secrets:MY_TENANT_ID }
      abfs_client_id: ${ secrets:MY_CLIENT_ID }
      abfs_client_secret: ${ secrets:MY_CLIENT_SECRET }
```

