Data Connectors

Learn how to use Data Connectors to query external data.

Data Connectors provide connections to databases, data warehouses, and data lakes for federated SQL queries and data replication.
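As an illustration, a data connector is declared as a dataset in a Spicepod configuration. The following is a minimal sketch of a `spicepod.yaml` dataset using the `postgres` connector; the table reference, dataset name, and connection parameter values are illustrative assumptions, not values from this page — consult the connector's reference page for the exact keys.

```yaml
# Minimal sketch of a spicepod.yaml dataset using the postgres connector.
# Table, dataset, and connection parameter values are illustrative.
datasets:
  - from: postgres:public.orders   # <connector>:<table reference>
    name: orders                   # queried as "orders" via federated SQL
    params:
      pg_host: localhost
      pg_port: "5432"
      pg_db: shop
```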

Supported Data Connectors include:

| Name | Description | Protocol/Format |
| ---- | ----------- | --------------- |
| `databricks` (mode: delta_lake) | [Databricks][databricks] | S3/Delta Lake |
| `delta_lake` | Delta Lake | Delta Lake |
| `dremio` | [Dremio][dremio] | Arrow Flight |
| `duckdb` | DuckDB | Embedded |
| `github` | GitHub | GitHub API |
| `postgres` | PostgreSQL | |
| `s3` | [S3][s3] | Parquet, CSV |
| `mysql` | MySQL | |
| `graphql` | GraphQL | JSON |
| `databricks` (mode: spark_connect) | [Databricks][databricks] | [Spark Connect][spark] |
| `flightsql` | FlightSQL | Arrow Flight SQL |
| `mssql` | Microsoft SQL Server | Tabular Data Stream (TDS) |
| `snowflake` | Snowflake | Arrow |
| `spark` | Spark | [Spark Connect][spark] |
| `spice.ai` | [Spice.ai][spiceai] | Arrow Flight |
| `iceberg` | [Apache Iceberg][iceberg] | Parquet |
| `abfs` | Azure BlobFS | Parquet, CSV |
| `clickhouse` | Clickhouse | |
| `debezium` | Debezium CDC | Kafka + JSON |
| `dynamodb` | DynamoDB | |
| `ftp`, `sftp` | FTP/SFTP | Parquet, CSV |
| `http`, `https` | HTTP(s) | Parquet, CSV |
| `sharepoint` | Microsoft SharePoint | Unstructured UTF-8 documents |

Object Store File Formats

For object store compatible data connectors, if a folder path is provided, the file format must be specified with `params.file_format`.

If a file path is provided, the file format is inferred and `params.file_format` is not required.
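A minimal sketch of both cases, assuming an S3 bucket with illustrative paths and dataset names:

```yaml
datasets:
  # Folder path: the format cannot be inferred, so file_format is required.
  - from: s3://my-bucket/events/
    name: events
    params:
      file_format: parquet

  # Single file: the format is inferred, so file_format may be omitted.
  - from: s3://my-bucket/reports/summary.csv
    name: summary
```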

File formats currently supported are:

| Name | Parameter | Supported | Is Document Format |
| ---- | --------- | --------- | ------------------ |
| Apache Parquet | `file_format: parquet` | ✅ | ❌ |
| CSV | `file_format: csv` | ✅ | ❌ |
| Apache Iceberg | `file_format: iceberg` | Roadmap | ❌ |
| JSON | `file_format: json` | Roadmap | ❌ |
| Microsoft Excel | `file_format: xlsx` | Roadmap | ❌ |
| Markdown | `file_format: md` | ✅ | ✅ |
| Text | `file_format: txt` | ✅ | ✅ |
| PDF | `file_format: pdf` | Alpha | ✅ |
| Microsoft Word | `file_format: docx` | Alpha | ✅ |

File formats support additional parameters in `params` (such as `csv_has_header`), described in File Formats.
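For instance, a CSV dataset might combine `file_format` with a format-specific option such as `csv_has_header`; the bucket and dataset names below are illustrative:

```yaml
datasets:
  - from: s3://my-bucket/exports/
    name: exports
    params:
      file_format: csv
      csv_has_header: "true"   # format-specific parameter, see File Formats
```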

If a format is a document format, each file will be treated as a document, as per document support below.

Note: Document formats in Alpha (e.g. `pdf`, `docx`) may not parse all structure or text from the underlying documents correctly.
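As a rough sketch, a document-format dataset is declared the same way as any other object store dataset; the path and name below are illustrative:

```yaml
datasets:
  - from: s3://my-bucket/contracts/
    name: contracts
    params:
      file_format: pdf   # document format: each file is treated as a document
```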
