LogoLogo
BlogTwitterDiscordTelegramSignup/Login
  • Getting Started
    • Welcome to Spice.ai Cloud
    • Getting Started
      • Sign in with GitHub
      • Create a Spice app
      • Add a Dataset and query data
      • Add AI Model and chat with your data
      • Next Steps
    • FAQ
  • Features
    • Federated SQL Query
    • Data Acceleration
      • In-Memory Arrow Data Accelerator
      • DuckDB Data Accelerator
      • PostgreSQL Data Accelerator
      • SQLite Data Accelerator
    • Search & Retrieval
    • AI Gateway
    • Semantic Models
    • ML Models
    • Observability
      • Task History
      • Zipkin
  • Building Blocks
    • Data Connectors
      • ABFS
      • ClickHouse
      • Databricks
      • Delta Lake
      • Dremio
      • DuckDB
      • DynamoDB
      • FlightSQL
      • FTP
      • GitHub
      • GraphQL
      • HTTPS
      • LocalPod
      • Memory
      • MSSQL
      • MySQL
      • ODBC
      • Postgres
      • S3
      • SharePoint
      • Snowflake
      • Spark
      • SpiceAI
    • Model Providers
      • Anthropic
      • Azure
      • Hugging Face
      • OpenAI
      • Perplexity
      • SpiceAI
      • XAI
  • API
    • SQL Query API
      • HTTP API
      • Apache Arrow Flight API
    • OpenAI API
    • Health API
  • Portal
    • Playground
      • SQL Query
      • AI Chat
    • Organizations
    • Apps
      • API keys
      • Secrets
      • Connect GitHub
      • Transfer
    • Public Apps
    • App Spicepod
      • Spicepod Configuration
      • Deployments
      • Spice Runtime Versions
    • Monitoring
    • Profile
      • Personal Access Tokens
  • Use-Cases
    • Agentic AI Apps
    • Database CDN
    • Data Lakehouse
    • Enterprise Search
    • Enterprise RAG
  • SDKs
    • Python SDK
      • Streaming
    • Node.js SDK
      • Streaming
      • API Reference
    • Go SDK
    • Rust SDK
    • Dotnet SDK
    • Java SDK
  • Integrations
    • GitHub Copilot
    • Grafana
  • REFERENCE
    • Core Concepts
      • Duration Literals
    • SQL Reference
      • Data Types
      • SQL Functions
        • Aggregate
          • APPROX_COUNT_DISTINCT
          • AVG
          • BIT_AND
          • BIT_OR
          • CORR
          • COUNT
          • COVAR_POP
          • COVAR_SAMP
          • HLL
          • LISTAGG
          • MAX
          • MIN
          • NDV
          • STDDEV
          • STDDEV_POP
          • STDDEV_SAMP
          • SUM
          • VAR_POP
          • VAR_SAMP
        • Binary
          • BASE64
          • BIT_LENGTH
          • FROM_HEX
          • HEX
          • TO_HEX
          • UNBASE64
          • UNHEX
        • Bitwise
          • BIT_AND
          • BIT_OR
          • LSHIFT
          • RSHIFT
          • XOR
        • Boolean
          • IS [NOT] DISTINCT FROM
          • ISFALSE
          • IS [NOT] NULL
          • ISNUMERIC
          • ISTRUE
          • IS_MEMBER
        • Conditional
          • BOOL_AND
          • BOOL_OR
          • CASE
          • COALESCE
          • GREATEST
          • LEAST
          • NULLIF
        • Conversion
          • BINARY_STRING
          • CAST
          • CONVERT_FROM
          • CONVERT_REPLACEUTF8
          • CONVERT_TIMEZONE
          • CONVERT_TO
          • FLATTEN
          • FROM_HEX
          • HASH
          • HEX
          • TOASCII
          • TO_CHAR
          • TO_DATE
          • TO_HEX
          • TO_NUMBER
          • TO_TIME
          • TO_TIMESTAMP
          • UNHEX
        • Cryptography
          • AES_DECRYPT
          • AES_ENCRYPT
          • MD5
          • SHA
          • SHA1
          • SHA256
          • SHA512
        • Data Generation
          • RANDOM
        • Datatype
          • IS_BIGINT
          • IS_DATE
          • IS_INT
          • IS_VARCHAR
          • SIZE
          • TYPEOF
        • Date/Time
          • CONVERT_TIMEZONE
          • CURRENT_DATE
          • CURRENT_DATE_UTC
          • CURRENT_TIME
          • CURRENT_TIMESTAMP
          • DATEDIFF
          • DATE_ADD
          • DATE_DIFF
          • DATE_PART
          • DATE_SUB
          • DATE_TRUNC
          • DAY
          • DAYOFMONTH
          • DAYOFWEEK
          • DAYOFYEAR
          • EXTRACT
          • HOUR
          • LAST_DAY
          • MINUTE
          • MONTH
          • MONTHS_BETWEEN
          • NEXT_DAY
          • QUARTER
          • SECOND
          • TIMESTAMPADD
          • TIMESTAMPDIFF
          • TO_DATE
          • TO_TIME
          • TO_TIMESTAMP
          • UNIX_TIMESTAMP
          • WEEK
          • WEEKOFYEAR
          • YEAR
        • Math
          • ABS
          • ACOS
          • ASIN
          • ATAN
          • CBRT
          • CEILING
          • COS
          • COSH
          • COT
          • DEGREES
          • E
          • EXP
          • FLOOR
          • LOG
          • LOG10
          • MOD
          • PI
          • POWER
          • RADIANS
          • ROUND
          • SIGN
          • SIN
          • SINH
          • SQRT
          • STDDEV
          • STDDEV_POP
          • STDDEV_SAMP
          • TAN
          • TANH
          • TRUNCATE
        • Percentile
          • MEDIAN
          • PERCENTILE_CONT
          • PERCENTILE_DISC
        • Regular Expressions
          • REGEXP_EXTRACT
          • REGEXP_LIKE
          • REGEXP_MATCHES
          • REGEXP_REPLACE
          • REGEXP_SPLIT
        • Semistructured Data
          • ARRAY_CONTAINS
          • MAP_KEYS
          • MAP_VALUES
        • String
          • ASCII
          • BASE64
          • BTRIM
          • CHARACTER_LENGTH
          • CHAR_LENGTH
          • CHR
          • COL_LIKE
          • CONCAT
          • CONCAT_WS
          • ENDS_WITH
          • FROM_HEX
          • HEX
          • ILIKE
          • INITCAP
          • INSTR
          • IS_UTF8
          • LCASE
          • LEFT
          • LENGTH
          • LEVENSHTEIN
          • LIKE
          • LOCATE
          • LOWER
          • LPAD
          • LTRIM
          • MASK
          • MASK_FIRST_N
          • MASK_HASH
          • MASK_LAST_N
          • MASK_SHOW_FIRST_N
          • MASK_SHOW_LAST_N
          • OCTET_LENGTH
          • POSITION
          • QUOTE
          • REGEXP_EXTRACT
          • REGEXP_LIKE
          • REGEXP_MATCHES
          • REGEXP_REPLACE
          • REGEXP_SPLIT
          • REPEAT
          • REPEATSTR
          • REPLACE
          • REVERSE
          • RIGHT
          • RPAD
          • RTRIM
          • SIMILAR_TO
          • SOUNDEX
          • SPLIT_PART
          • STARTS_WITH
          • STRPOS
          • SUBSTRING
          • SUBSTRING_INDEX
          • TOASCII
          • TO_HEX
          • TRANSLATE
          • TRIM
          • UCASE
          • UNBASE64
          • UNHEX
          • UPPER
        • Window
          • COUNT
          • COVAR_POP
          • COVAR_SAMP
          • CUME_DIST
          • DENSE_RANK
          • FIRST_VALUE
          • HLL
          • LAG
          • LEAD
          • MAX
          • MIN
          • NDV
          • NTILE
          • PERCENT_RANK
          • RANK
          • ROW_NUMBER
          • SUM
          • VAR_POP
          • VAR_SAMP
      • SQL Commands
        • SELECT
        • USE
        • SHOW
        • DESCRIBE
        • WITH
    • Release Notes
  • Pricing
    • Paid Plans
    • Community Plan
  • Support
    • Support
  • Security
    • Security at Spice AI
    • Report a vulnerability
  • Legal
    • Privacy Policy
    • Website Terms of Use
    • Terms of Service
    • End User License Agreement
Powered by GitBook
On this page
  • Common Configuration
  • Configuration
  • from
  • name
  • params
  • Filter Push Down
  • Examples
  • Querying GitHub Files
  • Querying GitHub Issues
  • Querying GitHub Pull Requests
  • Querying GitHub Commits
  • Querying GitHub stars (Stargazers)

Was this helpful?

Edit on GitHub
Export as PDF
  1. Building Blocks
  2. Data Connectors

GitHub

GitHub Data Connector Documentation

Last updated 3 months ago

Was this helpful?

The GitHub Data Connector enables federated SQL queries on various GitHub resources such as files, issues, pull requests, and commits by specifying github as the selector in the from value for the dataset.

Common Configuration

Configuration

from

The from field takes the form of github:github.com/<owner>/<repo>/<content> where content could be files, issues, pulls, commits, stargazers. See for more configuration detail.

name

The dataset name. This will be used as the table name within Spice.

params

Personal Access Token

Parameter Name
Description

github_token

GitHub App Installation

GitHub Apps provide a secure and scalable way to integrate with GitHub's API. .

Parameter Name
Description

github_client_id

Required. Specifies the client ID for GitHub App Installation auth mode.

github_private_key

Required. Specifies the private key for GitHub App Installation auth mode.

github_installation_id

Required. Specifies the installation ID for GitHub App Installation auth mode.

Limitations

With GitHub App Installation authentication, the connector's functionality depends on the permissions and scope of the GitHub App. Ensure that the app is installed on the repositories and configured with content, commits, issues and pull permissions to allow the corresponding datasets to work.

Common Parameters

Parameter Name
Description

github_query_mode

owner

Required. Specifies the owner of the GitHub repository.

repo

Required. Specifies the name of the GitHub repository.

Filter Push Down

GitHub queries support a github_query_mode parameter, which can be set to either auto or search for the following types:

  • Issues: Defaults to auto. Query filters are only pushed down to the GitHub API in search mode.

  • Pull Requests: Defaults to auto. Query filters are only pushed down to the GitHub API in search mode.

  • author and state; supports exact matches, or NOT matches. For example, WHERE author = 'peasee' or WHERE author <> 'peasee'.

  • body and title; supports exact matches, or LIKE matches. For example, WHERE body LIKE '%duckdb%'.

All other filters are supported when github_query_mode is set to search, but cannot be pushed down to the GitHub API for improved performance.

Limitations

  • GitHub has a limitation in the Search API where it may return more stale data than the standard API used in the default query mode.

Examples

Querying GitHub Files

Limitations

  • content column is fetched only when acceleration is enabled.

  • Querying GitHub files does not support filter push down, which may result in long query times when acceleration is disabled.

  • Setting github_query_mode to search is not supported.

  • ref - Required. Specifies the GitHub branch or tag to fetch files from.

  • include - Optional. Specifies a pattern to include specific files. Supports glob patterns. If not specified, all files are included by default.

datasets:
  - from: github:github.com/<owner>/<repo>/files/<ref>
    name: spiceai.files
    params:
      github_token: ${secrets:GITHUB_TOKEN}
      include: '**/*.json; **/*.yaml'
    acceleration:
      enabled: true

Schema

Column Name
Data Type
Is Nullable

name

Utf8

YES

path

Utf8

YES

size

Int64

YES

sha

Utf8

YES

mode

Utf8

YES

url

Utf8

YES

download_url

Utf8

YES

content

Utf8

YES

Example

datasets:
  - from: github:github.com/spiceai/spiceai/files/v0.17.2-beta
    name: spiceai.files
    params:
      github_token: ${secrets:GITHUB_TOKEN}
      include: '**/*.txt' # include txt files only
    acceleration:
      enabled: true
sql> select * from spiceai.files
+-------------+-------------+------+------------------------------------------+--------+-------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+
| name        | path        | size | sha                                      | mode   | url                                                                                             | download_url                                                               | content     |
+-------------+-------------+------+------------------------------------------+--------+-------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+
| version.txt | version.txt | 12   | ee80f747038c30e776eecb2c2ae155dec9a68187 | 100644 | https://api.github.com/repos/spiceai/spiceai/git/blobs/ee80f747038c30e776eecb2c2ae155dec9a68187 | https://raw.githubusercontent.com/spiceai/spiceai/v0.17.2-beta/version.txt | 0.17.2-beta |
|             |             |      |                                          |        |                                                                                                 |                                                                            |             |
+-------------+-------------+------+------------------------------------------+--------+-------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+

Time: 0.005067 seconds. 1 rows.

Querying GitHub Issues

Limitations

datasets:
  - from: github:github.com/<owner>/<repo>/issues
    name: spiceai.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true

Schema

Column Name
Data Type
Is Nullable

assignees

List(Utf8)

YES

author

Utf8

YES

body

Utf8

YES

closed_at

Timestamp

YES

comments

List(Struct)

YES

created_at

Timestamp

YES

id

Utf8

YES

labels

List(Utf8)

YES

milestone_id

Utf8

YES

milestone_title

Utf8

YES

comments_count

Int64

YES

number

Int64

YES

state

Utf8

YES

title

Utf8

YES

updated_at

Timestamp

YES

url

Utf8

YES

Example

datasets:
  - from: github:github.com/spiceai/spiceai/issues
    name: spiceai.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
sql> select title, state, labels from spiceai.issues where title like '%duckdb%'
+-----------------------------------------------------------------------------------------------------------+--------+----------------------+
| title                                                                                                     | state  | labels               |
+-----------------------------------------------------------------------------------------------------------+--------+----------------------+
| Limitation documentation duckdb accelerator about nested struct and decimal256                            | CLOSED | [kind/documentation] |
| Inconsistent duckdb connector params: `params.open` and `params.duckdb_file`                              | CLOSED | [kind/bug]           |
| federation across multiple duckdb acceleration tables.                                                    | CLOSED | []                   |
| Integration tests to cover "On Conflict" behaviors for duckdb accelerator                                 | CLOSED | [kind/task]          |
| Permission denied issue while using duckdb data connector with spice using HELM for Kubernetes deployment | CLOSED | [kind/bug]           |
+-----------------------------------------------------------------------------------------------------------+--------+----------------------+

Time: 0.011877542 seconds. 5 rows.

Querying GitHub Pull Requests

Limitations

datasets:
  - from: github:github.com/<owner>/<repo>/pulls
    name: spiceai.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}

Schema

Column Name
Data Type
Is Nullable

additions

Int64

YES

assignees

List(Utf8)

YES

author

Utf8

YES

body

Utf8

YES

changed_files

Int64

YES

closed_at

Timestamp

YES

comments_count

Int64

YES

commits_count

Int64

YES

created_at

Timestamp

YES

deletions

Int64

YES

hashes

List(Utf8)

YES

id

Utf8

YES

labels

List(Utf8)

YES

merged_at

Timestamp

YES

number

Int64

YES

reviews_count

Int64

YES

state

Utf8

YES

title

Utf8

YES

url

Utf8

YES

Example

datasets:
  - from: github:github.com/spiceai/spiceai/pulls
    name: spiceai.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
sql> select title, url, state from spiceai.pulls where title like '%GitHub connector%'
+---------------------------------------------------------------------+----------------------------------------------+--------+
| title                                                               | url                                          | state  |
+---------------------------------------------------------------------+----------------------------------------------+--------+
| GitHub connector: convert `labels` and `hashes` to primitive arrays | https://github.com/spiceai/spiceai/pull/2452 | MERGED |
+---------------------------------------------------------------------+----------------------------------------------+--------+

Time: 0.034996667 seconds. 1 rows.

Append Example

datasets:
  - from: github:github.com/spiceai/spiceai/pulls
    name: spiceai.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
      github_query_mode: search
    time_column: created_at
    acceleration:
      enabled: true
      refresh_mode: append
      refresh_check_interval: 6h # check for new results every 6 hours
      refresh_data_window: 90d # at initial load, load the last 90 days of pulls

Querying GitHub Commits

Limitations

  • Setting github_query_mode to search is not supported.

datasets:
  - from: github:github.com/<owner>/<repo>/commits
    name: spiceai.commits
    params:
      github_token: ${secrets:GITHUB_TOKEN}

Schema

Column Name
Data Type
Is Nullable

additions

Int64

YES

author_email

Utf8

YES

author_name

Utf8

YES

committed_date

Timestamp

YES

deletions

Int64

YES

id

Utf8

YES

message

Utf8

YES

message_body

Utf8

YES

message_head_line

Utf8

YES

sha

Utf8

YES

Example

datasets:
  - from: github:github.com/spiceai/spiceai/commits
    name: spiceai.commits
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
sql> select sha, message_head_line from spiceai.commits limit 10
+------------------------------------------+------------------------------------------------------------------------+
| sha                                      | message_head_line                                                      |
+------------------------------------------+------------------------------------------------------------------------+
| 2a9fab7905737e1af182e17f40aecc5c4b5dd236 |  wait 2 seconds for the status to turn ready in refreshing status tes… |
| b9c210a818abeaf14d2493fde5227781f47faed8 | Update README.md - Remove bigquery from tablet of connectors (#1434)   |
| d61e1af61ebf826f83703b8dd939f19e8b2ba426 | Add databricks_use_ssl parameter (#1406)                               |
| f1ec55c5986e3e5d57eff94197182ffebbae1045 | wording and logs change reflected on readme (#1435)                    |
| bfc74185584d1e048ef66c72ce3572a0b652bfd9 | Update acknowledgements (#1433)                                        |
| 0d870f1791d456e7924b4ecbbda5f3b762db1e32 | Update helm version and use v0.13.0-alpha (#1436)                      |
| 12f930cbad69833077bd97ea43599a75cff985fc | Enable push-down federation by default (#1429)                         |
| 6e4521090aaf39664bd61d245581d34398ce77db | Add functional tests for federation push-down (#1428)                  |
| fa3279b7d9fcaa5e8baaa2425f69b556bb30e309 | Add LRU cache support for http-based sql queries (#1410)               |
| a3f93dde9d1312bfbf14f7ae3b75bdc468289212 | Add guides and examples about error handling (#1427)                   |
+------------------------------------------+------------------------------------------------------------------------+

Time: 0.0065395 seconds. 10 rows.

Querying GitHub stars (Stargazers)

Limitations

  • Setting github_query_mode to search is not supported.

datasets:
  - from: github:github.com/<owner>/<repo>/stargazers
    name: spiceai.stargazers
    params:
      github_token: ${secrets:GITHUB_TOKEN}

Schema

Column Name
Data Type
Is Nullable

starred_at

Timestamp

YES

login

Utf8

YES

email

Utf8

YES

name

Utf8

YES

company

Utf8

YES

x_username

Utf8

YES

location

Utf8

YES

avatar_url

Utf8

YES

bio

Utf8

YES

Example

datasets:
  - from: github:github.com/spiceai/spiceai/stargazers
    name: spiceai.stargazers
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
sql> select starred_at, login from spiceai.stargazers order by starred_at DESC limit 10
+----------------------+----------------------+
| starred_at           | login                |
+----------------------+----------------------+
| 2024-09-15T13:22:09Z | cisen                |
| 2024-09-14T18:04:22Z | tyan-boot            |
| 2024-09-13T10:38:01Z | yofriadi             |
| 2024-09-13T10:01:33Z | FourSpaces           |
| 2024-09-13T04:02:11Z | d4x1                 |
| 2024-09-11T18:10:28Z | stephenakearns-insta |
| 2024-09-09T22:17:42Z | Lrs121               |
| 2024-09-09T19:56:26Z | jonathanfinley       |
| 2024-09-09T07:02:10Z | leookun              |
| 2024-09-09T03:04:27Z | royswale             |
+----------------------+----------------------+

Time: 0.0088075 seconds. 10 rows.

Required. GitHub personal access token to use to connect to the GitHub API. .

Optional. Specifies whether the connector should use the GitHub for improved filter performance. Defaults to auto, possible values of auto or search.

Commits only supports auto mode. Query with filter push down is only enabled for the committed_date column. commited_date supports exact matches, or greater/less than matches for dates provided in format, like WHERE committed_date > '2024-09-24'.

When set to search, Issues and Pull Requests will use the GitHub for improved filter performance when querying against the columns:

updated_at, created_at, merged_at and closed_at; supports exact matches, or greater/less than matches with dates provided in format. For example, WHERE created_at > '2024-09-24'.

GitHub has a limitation in the Search API where it only returns a maximum of 1000 results for a query. Use to retrieve more results over time. See the for pull requests.

Querying with filters using date columns requires the use of . For example, WHERE created_at > '2024-09-24'.

Querying with filters using date columns requires the use of . For example, WHERE created_at > '2024-09-24'.

Querying with filters using date columns requires the use of . For example, WHERE committed_date > '2024-09-24'.

Querying with filters using date columns requires the use of . For example, WHERE starred_at > '2024-09-24'.

ISO8601
Search API
ISO8601
ISO8601 formatted dates
ISO8601 formatted dates
ISO8601 formatted dates
ISO8601 formatted dates
Learn more
examples
append mode acceleration
append example
Learn more
search API