# Data Connectors
Learn how to use Data Connectors to query external data.
Data Connectors provide connections to databases, data warehouses, and data lakes for federated SQL queries and data replication.
Supported Data Connectors include:

| Connector | Source | Protocol/Format |
| --- | --- | --- |
| `databricks` (mode: `delta_lake`) | [Databricks][databricks] | S3/Delta Lake |
| `databricks` (mode: `spark_connect`) | [Databricks][databricks] | [Spark Connect][spark] |
| `delta_lake` | Delta Lake | Delta Lake |
| `dremio` | [Dremio][dremio] | Arrow Flight |
| `duckdb` | DuckDB | Embedded |
| `github` | GitHub | GitHub API |
| `postgres` | PostgreSQL | |
| `s3` | [S3][s3] | Parquet, CSV |
| `mysql` | MySQL | |
| `graphql` | GraphQL | JSON |
| `flightsql` | FlightSQL | Arrow Flight SQL |
| `mssql` | Microsoft SQL Server | Tabular Data Stream (TDS) |
| `snowflake` | Snowflake | Arrow |
| `spark` | Spark | [Spark Connect][spark] |
| `spice.ai` | [Spice.ai][spiceai] | Arrow Flight |
| `iceberg` | [Apache Iceberg][iceberg] | Parquet |
| `abfs` | Azure BlobFS | Parquet, CSV |
| `clickhouse` | ClickHouse | |
| `debezium` | Debezium CDC | Kafka + JSON |
| `dynamodb` | DynamoDB | |
| `ftp`, `sftp` | FTP/SFTP | Parquet, CSV |
| `http`, `https` | HTTP(S) | Parquet, CSV |
| `sharepoint` | Microsoft SharePoint | Unstructured UTF-8 documents |
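A connector is selected by the scheme in a dataset's `from` field. As a minimal sketch of a `spicepod.yaml` dataset using the `s3` connector (the bucket path and dataset name are illustrative):

```yaml
datasets:
  - from: s3://my-bucket/sales/   # illustrative path; the s3:// prefix selects the s3 connector
    name: sales
    params:
      file_format: parquet        # required here because a folder, not a file, is referenced
```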
## Object Store File Formats
For data connectors that are object store compatible, if a folder is provided, the file format must be specified with `params.file_format`. If a file is provided, the file format will be inferred, and `params.file_format` is unnecessary.
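A sketch of the two cases (paths are illustrative):

```yaml
datasets:
  - from: s3://my-bucket/events/         # folder: format cannot be inferred
    name: events
    params:
      file_format: csv                   # so file_format is required
  - from: s3://my-bucket/report.parquet  # single file: format inferred from the extension
    name: report
```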
File formats currently supported are:

| Format | Parameter | Supported | Is Document Format |
| --- | --- | --- | --- |
| JSON | `file_format: json` | Roadmap | ❌ |
| Microsoft Excel | `file_format: xlsx` | Roadmap | ❌ |
| Markdown | `file_format: md` | ✅ | ✅ |
| Text | `file_format: txt` | ✅ | ✅ |
| PDF | `file_format: pdf` | Alpha | ✅ |
| Microsoft Word | `file_format: docx` | Alpha | ✅ |
File formats support additional parameters in `params` (like `csv_has_header`), described in File Formats.
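For example, a sketch of passing the `csv_has_header` parameter named above alongside the file format (the path and flag value are illustrative):

```yaml
datasets:
  - from: s3://my-bucket/logs/
    name: logs
    params:
      file_format: csv
      csv_has_header: false   # first row contains data, not column names
```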
If a format is a document format, each file will be treated as a document, as per document support below.
Note: Document formats in Alpha (e.g. `pdf`, `docx`) may not correctly parse all structure or text from the underlying documents.