# Delta Lake

The Delta Lake data connector enables SQL queries on [Delta Lake](https://delta.io/) tables.

```yaml
datasets:
  - from: delta_lake:s3://my_bucket/path/to/s3/delta/table/
    name: my_delta_lake_table
    params:
      delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
      delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}
```

## Configuration

### `from`

The `from` field for the Delta Lake connector takes the form `delta_lake:path`, where `path` is either a local filesystem path or a cloud storage location (Amazon S3, Azure Blob Storage, or Google Cloud Storage). See the [examples](#examples) section below.

### `name`

The dataset name. It is used as the table name within Spice.

Example:

```yaml
datasets:
  - from: delta_lake:s3://my_bucket/path/to/s3/delta/table/
    name: cool_dataset
    params: ...
```

```sql
SELECT COUNT(*) FROM cool_dataset;
```

```shell
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
```

### `params`

Use the [secret replacement syntax](https://github.com/spicehq/docs/blob/trunk/building-blocks/secret-stores/index.md) to reference a secret, e.g. `${secrets:aws_access_key_id}`.

| Parameter Name   | Description                                                                                            |
| ---------------- | ------------------------------------------------------------------------------------------------------ |
| `client_timeout` | Optional. The timeout for object store operations. Defaults to `30s`. E.g. `client_timeout: 60s`.      |
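As a minimal sketch, the dataset below raises the object store timeout above the `30s` default for an S3-backed table. The bucket path and the secret names `aws_access_key_id` and `aws_secret_access_key` are placeholders, not required values:

```yaml
datasets:
  - from: delta_lake:s3://my_bucket/path/to/s3/delta/table/
    name: my_delta_lake_table
    params:
      # Allow object store operations up to 60 seconds instead of the 30s default
      client_timeout: 60s
      # Placeholder secret names; substitute the keys defined in your secret store
      delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
      delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}
```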

## Delta Lake object store parameters

### AWS S3

| Parameter Name                     | Description                                                                        |
| ---------------------------------- | ---------------------------------------------------------------------------------- |
| `delta_lake_aws_region`            | Optional. The AWS region for the S3 object store. E.g. `us-west-2`.                |
| `delta_lake_aws_access_key_id`     | The access key ID for the S3 object store.                                         |
| `delta_lake_aws_secret_access_key` | The secret access key for the S3 object store.                                     |
| `delta_lake_aws_endpoint`          | Optional. The endpoint for the S3 object store. E.g. `s3.us-west-2.amazonaws.com`. |

### Azure Blob

{% hint style="info" %}
**Note:** One of the following sets of authentication parameters must be provided for Azure Blob:

* `delta_lake_azure_storage_account_key`,
* `delta_lake_azure_storage_client_id` and `delta_lake_azure_storage_client_secret`, or
* `delta_lake_azure_storage_sas_key`.
{% endhint %}

| Parameter Name                           | Description                                                            |
| ---------------------------------------- | ---------------------------------------------------------------------- |
| `delta_lake_azure_storage_account_name`  | The Azure Storage account name.                                        |
| `delta_lake_azure_storage_account_key`   | The Azure Storage master key for accessing the storage account.        |
| `delta_lake_azure_storage_client_id`     | The service principal client ID for accessing the storage account.     |
| `delta_lake_azure_storage_client_secret` | The service principal client secret for accessing the storage account. |
| `delta_lake_azure_storage_sas_key`       | The shared access signature key for accessing the storage account.     |
| `delta_lake_azure_storage_endpoint`      | Optional. The endpoint for the Azure Blob storage account.             |
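For illustration, a sketch of a dataset that authenticates with only one of the options above, a shared access signature. It assumes the SAS token is stored in a secret named `my_sas_key` (a hypothetical name); the container, account, and table path are placeholders:

```yaml
datasets:
  - from: delta_lake:abfss://my_container@my_account.dfs.core.windows.net/path/to/azure/delta/table/
    name: my_delta_lake_table
    params:
      delta_lake_azure_storage_account_name: my_account
      # Hypothetical secret name holding the SAS token; no account key or client secret is set
      delta_lake_azure_storage_sas_key: ${secrets:my_sas_key}
```

Keeping the SAS token in a secret store, rather than inline in the YAML, follows the same secret replacement syntax used for the AWS credentials.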

### Google Storage (GCS)

| Parameter Name                           | Description                                                  |
| ---------------------------------------- | ------------------------------------------------------------ |
| `delta_lake_google_service_account_path` | Filesystem path to the Google service account JSON key file. |

## Examples

### Delta Lake + Local

```yaml
- from: delta_lake:/path/to/local/delta/table # A local filesystem path to a Delta Lake table
  name: my_delta_lake_table
```

### Delta Lake + S3

```yaml
- from: delta_lake:s3://my_bucket/path/to/s3/delta/table/ # A reference to a table in S3
  name: my_delta_lake_table
  params:
    delta_lake_aws_region: us-west-2 # Optional
    delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
    delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}
    delta_lake_aws_endpoint: s3.us-west-2.amazonaws.com # Optional
```

### Delta Lake + Azure Blob

```yaml
- from: delta_lake:abfss://my_container@my_account.dfs.core.windows.net/path/to/azure/delta/table/ # A reference to a table in Azure Blob
  name: my_delta_lake_table
  params:
    # Account Name + Key
    delta_lake_azure_storage_account_name: my_account
    delta_lake_azure_storage_account_key: ${secrets:my_key}

    # OR Service Principal + Secret
    delta_lake_azure_storage_client_id: my_client_id
    delta_lake_azure_storage_client_secret: ${secrets:my_secret}

    # OR SAS Key
    delta_lake_azure_storage_sas_key: my_sas_key
```

### Delta Lake + Google Storage

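```yaml
- from: delta_lake:gs://my_bucket/path/to/gcs/delta/table/ # A reference to a table in Google Cloud Storage
  name: my_delta_lake_table
  params:
    delta_lake_google_service_account_path: /path/to/service-account.json
```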

## Types

The table below lists the supported Delta Lake data types and the Apache Arrow type each one maps to in Spice.

| Delta Lake Type | Arrow Type                            |
| --------------- | ------------------------------------- |
| `String`        | `Utf8`                                |
| `Long`          | `Int64`                               |
| `Integer`       | `Int32`                               |
| `Short`         | `Int16`                               |
| `Byte`          | `Int8`                                |
| `Float`         | `Float32`                             |
| `Double`        | `Float64`                             |
| `Boolean`       | `Boolean`                             |
| `Binary`        | `Binary`                              |
| `Date`          | `Date32`                              |
| `Timestamp`     | `Timestamp(Microsecond, Some("UTC"))` |
| `TimestampNtz`  | `Timestamp(Microsecond, None)`        |
| `Decimal`       | `Decimal128`                          |
| `Array`         | `List`                                |
| `Struct`        | `Struct`                              |
| `Map`           | `Map`                                 |

## Limitations

* The Delta Lake connector does not support reading Delta tables with the `V2Checkpoint` feature enabled. To use the connector with such a table, drop the `V2Checkpoint` feature by running the following command:

  ```sql
  ALTER TABLE <table-name> DROP FEATURE v2Checkpoint [TRUNCATE HISTORY];
  ```

  For more details on dropping Delta table features, see the official Delta Lake documentation: [Drop Delta table features](https://docs.delta.io/latest/delta-drop-feature.html).

