# ABFS

The Azure BlobFS (ABFS) Data Connector enables federated SQL queries on files stored in Azure Blob-compatible endpoints. This includes Azure BlobFS (`abfss://`) and Azure Data Lake (`adl://`) endpoints.

When a folder path is provided, all the contained files will be loaded.

File formats are specified using the `file_format` parameter, as described in [Object Store File Formats](/building-blocks/data-connectors.md#object-store-file-formats).

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:access_key }
      file_format: csv
```

## Configuration

### `from`

Defines the ABFS-compatible URI to a folder or object:

* `from: abfs://<container>/<path>` with the account name configured using `abfs_account` parameter, or
* `from: abfs://<container>@<account_name>.dfs.core.windows.net/<path>`

### `name`

Defines the dataset name, which is used as the table name within Spice.

Example:

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: cool_dataset
    params: ...
```

```sql
SELECT COUNT(*) FROM cool_dataset;
```

```shell
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
```

### `params`

#### Basic parameters

| Parameter name              | Description                                                                                                                                                                                                    |
| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `file_format`               | Specifies the data format. Required if not inferrable from `from`. Options: `parquet`, `csv`. Refer to [Object Store File Formats](/building-blocks/data-connectors.md#object-store-file-formats) for details. |
| `abfs_account`              | Azure storage account name                                                                                                                                                                                     |
| `abfs_sas_string`           | SAS (Shared Access Signature) Token to use for authorization                                                                                                                                                   |
| `abfs_endpoint`             | Storage endpoint, default: `https://{account}.blob.core.windows.net`                                                                                                                                           |
| `abfs_use_emulator`         | Use `true` or `false` to connect to a local emulator                                                                                                                                                           |
| `abfs_authority_host`       | Alternative authority host, default: `https://login.microsoftonline.com`                                                                                                                                       |
| `abfs_proxy_url`            | Proxy URL                                                                                                                                                                                                      |
| `abfs_proxy_ca_certificate` | CA certificate for the proxy                                                                                                                                                                                   |
| `abfs_proxy_exludes`        | A list of hosts to exclude from proxy connections                                                                                                                                                              |
| `abfs_disable_tagging`      | Disable tagging objects. Use this if your backing store doesn't support tags                                                                                                                                   |
| `allow_http`                | Allow insecure HTTP connections                                                                                                                                                                                |
| `hive_partitioning_enabled` | Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false`                                                                                                               |

#### Authentication parameters

The following parameters are used when authenticating with Azure. Only one of these parameters can be used at a time:

* `abfs_access_key`
* `abfs_bearer_token`
* `abfs_client_secret`
* `abfs_skip_signature`

If none of these are set the connector will default to using a [managed identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview)

| Parameter name              | Description                                                                                                                                                      |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `abfs_access_key`           | Secret access key                                                                                                                                                |
| `abfs_bearer_token`         | `BEARER` access token for user authentication. The token can be obtained from the OAuth2 flow (see [access token authentication](#access-token-authentication)). |
| `abfs_client_id`            | Client ID for client authentication flow                                                                                                                         |
| `abfs_client_secret`        | Client Secret to use for client authentication flow                                                                                                              |
| `abfs_tenant_id`            | Tenant ID to use for client authentication flow                                                                                                                  |
| `abfs_skip_signature`       | Skip credentials and request signing for public containers                                                                                                       |
| `abfs_msi_endpoint`         | Endpoint for managed identity tokens                                                                                                                             |
| `abfs_federated_token_file` | File path for federated identity token in Kubernetes                                                                                                             |
| `abfs_use_cli`              | Set to `true` to use the Azure CLI to acquire access tokens                                                                                                      |

#### Retry parameters

| Parameter name                  | Description                                  |
| ------------------------------- | -------------------------------------------- |
| `abfs_max_retries`              | Maximum retries                              |
| `abfs_retry_timeout`            | Total timeout for retries (e.g., `5s`, `1m`) |
| `abfs_backoff_initial_duration` | Initial retry delay (e.g., `5s`)             |
| `abfs_backoff_max_duration`     | Maximum retry delay (e.g., `1m`)             |
| `abfs_backoff_base`             | Exponential backoff base (e.g., `0.1`)       |

## Authentication

ABFS connector supports three types of authentication, as detailed in the [authentication parameters](#authentication-parameters)

### Service principal authentication

Configure service principal authentication by setting the `abfs_client_secret` parameter.

1. Create a new Azure AD application in the [Azure portal](https://portal.azure.com/#view/Microsoft_AAD_IAM/ActiveDirectoryMenuBlade/~/Overview) and generate a `client secret` under `Certificates & secrets`.
2. Grant the Azure AD application read access to the storage account under `Access Control (IAM)`, this can typically be done using the `Storage Blob Data Reader` built-in role.

### Access key authentication

Configure service principal authentication by setting the `abfs_access_key` parameter to [Azure Storage Account Access Key](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal)

## Supported file formats

Specify the file format using `file_format` parameter. More details in [Object Store File Formats](/building-blocks/data-connectors.md#object-store-file-formats).

## Examples

### Reading a CSV file with an Access Key

```yaml
datasets:
  - from: abfs://foocontainer/taxi_sample.csv
    name: azure_test
    params:
      abfs_account: spiceadls
      abfs_access_key: ${ secrets:ACCESS_KEY }
      file_format: csv
```

### Using Public Containers

```yaml
datasets:
  - from: abfs://pubcontainer/taxi_sample.csv
    name: pub_data
    params:
      abfs_account: spiceadls
      abfs_skip_signature: true
      file_format: csv
```

### Connecting to the Storage Emulator

```yaml
datasets:
  - from: abfs://test_container/test_csv.csv
    name: test_data
    params:
      abfs_use_emulator: true
      file_format: csv
```

### Using secrets for Account name

```yaml
datasets:
  - from: abfs://my_container/my_csv.csv
    name: prod_data
    params:
      abfs_account: ${ secrets:PROD_ACCOUNT }
      file_format: csv
```

### Authenticating using Client Authentication

```yaml
datasets:
  - from: abfs://my_data/input.parquet
    name: my_data
    params:
      abfs_tenant_id: ${ secrets:MY_TENANT_ID }
      abfs_client_id: ${ secrets:MY_CLIENT_ID }
      abfs_client_secret: ${ secrets:MY_CLIENT_SECRET }
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.spice.ai/building-blocks/data-connectors/abfs.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
