ABFS

Azure BlobFS Data Connector Documentation

Last updated 3 months ago


The Azure BlobFS (ABFS) Data Connector enables federated SQL queries on files stored in Azure Blob-compatible endpoints. This includes Azure BlobFS (abfss://) and Azure Data Lake (adl://) endpoints.

When a folder path is provided, all the contained files will be loaded.

File formats are specified using the file_format parameter, as described in Object Store File Formats.

datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
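
When the from path points to a folder rather than a single file, the format may not be inferrable from the path, so it helps to set file_format explicitly. A minimal sketch, assuming a hypothetical taxi_data/ folder in the same container; the dataset name is also an assumption:

```yaml
datasets:
  # Hypothetical folder path: all files contained in taxi_data/ are loaded.
  - from: abfs://foocontainer/taxi_data/
    name: azure_taxi_folder
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      # Set explicitly, since a folder path carries no file extension.
      file_format: csv
```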

Configuration

from

Defines the ABFS-compatible URI to a folder or object:

  • from: abfs://<container>/<path> with the account name configured using the abfs_account parameter, or

  • from: abfs://<container>@<account_name>.dfs.core.windows.net/<path>
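
The second form carries the account name in the URI itself, so abfs_account presumably does not need to be set separately. A sketch reusing the container, account, and secret from the example above; the dataset name is hypothetical:

```yaml
datasets:
  # The account name (spiceadls) is part of the URI host, so abfs_account is omitted.
  - from: abfs://foocontainer@spiceadls.dfs.core.windows.net/taxi_sample.csv
    name: azure_test_full_uri
    params:
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
```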

name

Defines the dataset name, which is used as the table name within Spice.

Example:

datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: cool_dataset
    params: ...

SELECT COUNT(*) FROM cool_dataset;
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+

params

Basic parameters

| Parameter name | Description |
| --- | --- |
| file_format | Specifies the data format. Required if it cannot be inferred from the from path. Options: parquet, csv. Refer to Object Store File Formats for details. |
| abfs_account | Azure storage account name |
| abfs_sas_string | SAS (Shared Access Signature) token to use for authorization |
| abfs_endpoint | Storage endpoint, default: https://{account}.blob.core.windows.net |
| abfs_use_emulator | Use true or false to connect to a local emulator |
| abfs_authority_host | Alternative authority host, default: https://login.microsoftonline.com |
| abfs_proxy_url | Proxy URL |
| abfs_proxy_ca_certificate | CA certificate for the proxy |
| abfs_proxy_excludes | A list of hosts to exclude from proxy connections |
| abfs_disable_tagging | Disable tagging objects. Use this if your backing store doesn't support tags |
| allow_http | Allow insecure HTTP connections |
| hive_partitioning_enabled | Enable hive-style partitioning based on the folder structure. Defaults to false |
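
As an illustration of the basic parameters, the sketch below authorizes with a SAS token and enables hive-style partitioning on a folder. The folder layout, dataset name, and secret name are hypothetical; only the parameter names come from the list above.

```yaml
datasets:
  # Hypothetical layout: trips/year=2024/month=01/<file>.parquet, and so on.
  - from: abfs://foocontainer/trips/
    name: trips_partitioned
    params:
      abfs_account: spiceadls
      # SAS token read from a secret (the secret name is an assumption).
      abfs_sas_string: ${ secrets:SAS_TOKEN }
      file_format: parquet
      # Partition using the folder structure (defaults to false).
      hive_partitioning_enabled: true
```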

Authentication parameters

The following parameters are used when authenticating with Azure. Only one of these parameters can be used at a time:

  • abfs_access_key

  • abfs_bearer_token

  • abfs_client_secret

  • abfs_skip_signature

If none of these are set, the connector defaults to using a managed identity.

| Parameter name | Description |
| --- | --- |
| abfs_access_key | Secret access key |
| abfs_bearer_token | BEARER access token for user authentication. The token can be obtained from the OAuth2 flow (see access token authentication). |
| abfs_client_id | Client ID for client authentication flow |
| abfs_client_secret | Client Secret to use for client authentication flow |
| abfs_tenant_id | Tenant ID to use for client authentication flow |
| abfs_skip_signature | Skip credentials and request signing for public containers |
| abfs_msi_endpoint | Endpoint for managed identity tokens |
| abfs_federated_token_file | File path for federated identity token in Kubernetes |
| abfs_use_cli | Set to true to use the Azure CLI to acquire access tokens |
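
Two of these options need no credential stored in the Spicepod. The sketch below shows one dataset that acquires tokens through the Azure CLI and one that sets no authentication parameter, relying on the managed identity default. Dataset names are hypothetical, and the CLI variant assumes the Azure CLI is installed and already logged in on the host.

```yaml
datasets:
  # Access tokens are acquired through the Azure CLI.
  - from: abfs://foocontainer/taxi_sample.csv
    name: cli_auth_data
    params:
      abfs_account: spiceadls
      abfs_use_cli: true
      file_format: csv

  # No authentication parameter is set, so the connector defaults to a managed identity.
  - from: abfs://foocontainer/taxi_sample.csv
    name: managed_identity_data
    params:
      abfs_account: spiceadls
      file_format: csv
```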

Retry parameters

| Parameter name | Description |
| --- | --- |
| abfs_max_retries | Maximum retries |
| abfs_retry_timeout | Total timeout for retries (e.g., 5s, 1m) |
| abfs_backoff_initial_duration | Initial retry delay (e.g., 5s) |
| abfs_backoff_max_duration | Maximum retry delay (e.g., 1m) |
| abfs_backoff_base | Exponential backoff base (e.g., 0.1) |
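
A sketch of how the retry parameters might be combined on a dataset. The values are illustrative only, not recommendations, and the dataset name is hypothetical; abfs_backoff_base is left unset here.

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test_retries
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
      # Illustrative values: up to 3 retries within a 1 minute total budget,
      # starting at a 5 second delay and capping each delay at 30 seconds.
      abfs_max_retries: 3
      abfs_retry_timeout: 1m
      abfs_backoff_initial_duration: 5s
      abfs_backoff_max_duration: 30s
```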

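Authentication

The ABFS connector supports three types of authentication, as detailed in the authentication parameters.

Service principal authentication

Configure service principal authentication by setting the abfs_client_secret parameter.

  1. Create a new Azure AD application in the Azure portal and generate a client secret under Certificates & secrets.

  2. Grant the Azure AD application read access to the storage account under Access Control (IAM). This can typically be done using the Storage Blob Data Reader built-in role.

Access key authentication

Configure access key authentication by setting the abfs_access_key parameter to the Azure Storage Account Access Key.

Supported file formats

Specify the file format using the file_format parameter. More details in Object Store File Formats.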

Examples

Reading a CSV file with an Access Key

datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:ACCESS_KEY }
      file_format: csv

Using Public Containers

datasets:
  - from: abfs://pubcontainer/taxi_sample.csv
    name: pub_data
    params:
      abfs_account: spiceadls
      abfs_skip_signature: true
      file_format: csv

Connecting to the Storage Emulator

datasets:
  - from: abfs://test_container/test_csv.csv
    name: test_data
    params:
      abfs_use_emulator: true
      file_format: csv

Using secrets for Account name

datasets:
  - from: abfs://my_container/my_csv.csv
    name: prod_data
    params:
      abfs_account: ${ secrets:PROD_ACCOUNT }
      file_format: csv

Authenticating using Client Authentication

datasets:
  - from: abfs://my_data/input.parquet
    name: my_data
    params:
      abfs_tenant_id: ${ secrets:MY_TENANT_ID }
      abfs_client_id: ${ secrets:MY_CLIENT_ID }
      abfs_client_secret: ${ secrets:MY_CLIENT_SECRET }
