GitHub
GitHub Data Connector Documentation
The GitHub Data Connector enables federated SQL queries on GitHub resources such as files, issues, pull requests, and commits by specifying github as the selector in the from value for the dataset.
Common Configuration
Configuration
from
The from field takes the form github:github.com/<owner>/<repo>/<content>, where content is one of files, issues, pulls, commits, or stargazers. See the examples for more configuration detail.
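As a rough illustration of the format above, the following Python sketch assembles a from value and rejects content types that the text does not list. The function name github_from and the example-org / example-repo values are hypothetical and are not part of Spice.

```python
# Content types listed in the text above.
CONTENT_TYPES = ("files", "issues", "pulls", "commits", "stargazers")


def github_from(owner: str, repo: str, content: str) -> str:
    """Build a value of the form github:github.com/<owner>/<repo>/<content>."""
    if not owner or not repo:
        raise ValueError("owner and repo must be non-empty")
    if content not in CONTENT_TYPES:
        raise ValueError(f"unsupported content type: {content!r}")
    return f"github:github.com/{owner}/{repo}/{content}"


# Hypothetical owner and repository names, for illustration only.
print(github_from("example-org", "example-repo", "issues"))
# prints: github:github.com/example-org/example-repo/issues
```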
name
The dataset name. This will be used as the table name within Spice.
params
Personal Access Token
github_token
Required. GitHub personal access token used to connect to the GitHub API.
GitHub App Installation
GitHub Apps provide a secure and scalable way to integrate with GitHub's API.
github_client_id
Required. Specifies the client ID for GitHub App Installation auth mode.
github_private_key
Required. Specifies the private key for GitHub App Installation auth mode.
github_installation_id
Required. Specifies the installation ID for GitHub App Installation auth mode.
Limitations
With GitHub App Installation authentication, the connector's functionality depends on the permissions and scope of the GitHub App. Ensure that the app is installed on the repositories and configured with content, commits, issues and pull permissions to allow the corresponding datasets to work.
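The two authentication modes can be told apart by which parameters are set. The sketch below is illustrative only and is not the connector's own validation logic: it reports which modes have all of their required parameters present in a params mapping. The function name, the mode labels, and the placeholder values are hypothetical.

```python
# Parameter names come from the text above; the mode labels are illustrative.
AUTH_MODES = {
    "personal_access_token": {"github_token"},
    "github_app_installation": {
        "github_client_id",
        "github_private_key",
        "github_installation_id",
    },
}


def configured_auth_modes(params: dict) -> list:
    """Return the auth modes whose required parameters are all non-empty."""
    present = {key for key, value in params.items() if value}
    return [mode for mode, required in AUTH_MODES.items() if required <= present]


# Placeholder values only; real secrets should never be hard-coded.
app_params = {
    "github_client_id": "<client-id>",
    "github_private_key": "<private-key>",
    "github_installation_id": "<installation-id>",
}
print(configured_auth_modes(app_params))
print(configured_auth_modes({"github_client_id": "<client-id>"}))
```

An empty list means neither mode has all of its required parameters. What the connector does when both modes are configured is not stated in this document, so the sketch simply lists both.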
Common Parameters
github_query_mode
Optional. Specifies whether the connector should use the GitHub Search API for improved filter performance. Possible values are auto and search. Defaults to auto.
owner
Required. Specifies the owner of the GitHub repository.
repo
Required. Specifies the name of the GitHub repository.
Filter Push Down
GitHub queries support a github_query_mode parameter, which can be set to either auto or search for the following types:
Issues: Defaults to auto. Query filters are only pushed down to the GitHub API in search mode.
Pull Requests: Defaults to auto. Query filters are only pushed down to the GitHub API in search mode.
Commits only support auto mode. Filter push down is only enabled for the committed_date column. committed_date supports exact matches, or greater/less than matches for dates provided in ISO8601 format, like WHERE committed_date > '2024-09-24'.
When set to search, Issues and Pull Requests use the GitHub Search API for improved filter performance when querying against the following columns:
author and state: support exact matches or NOT matches. For example, WHERE author = 'peasee' or WHERE author <> 'peasee'.
body and title: support exact matches or LIKE matches. For example, WHERE body LIKE '%duckdb%'.
updated_at, created_at, merged_at and closed_at: support exact matches, or greater/less than matches with dates provided in ISO8601 format. For example, WHERE created_at > '2024-09-24'.
All other filters are supported when github_query_mode is set to search, but cannot be pushed down to the GitHub API for improved performance.
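The push down rules above can be summarized as a lookup from column to operators. The following Python sketch encodes only what this section states; it is not the connector's implementation, and the function and table names are hypothetical. It treats "greater/less than" as the operators > and < only, since the text does not say whether >= and <= are also pushed down.

```python
# Operators described for date columns: exact match, greater than, less than.
DATE_OPS = {"=", ">", "<"}

# Columns pushed down for issues and pulls when github_query_mode is "search".
# merged_at appears only in the pull request schema.
SEARCH_MODE_PUSHDOWN = {
    "author": {"=", "<>"},
    "state": {"=", "<>"},
    "body": {"=", "LIKE"},
    "title": {"=", "LIKE"},
    "updated_at": DATE_OPS,
    "created_at": DATE_OPS,
    "merged_at": DATE_OPS,
    "closed_at": DATE_OPS,
}


def is_pushed_down(content: str, query_mode: str, column: str, operator: str) -> bool:
    """Return True if the text above says this filter reaches the GitHub API."""
    op = operator.strip().upper()
    if content in ("issues", "pulls"):
        if query_mode != "search":
            return False  # filters are only pushed down in search mode
        return op in SEARCH_MODE_PUSHDOWN.get(column, set())
    if content in ("commits", "files"):
        if query_mode != "auto":
            raise ValueError(f"{content} only supports auto mode")
        # Files have no push down; commits push down committed_date only.
        return content == "commits" and column == "committed_date" and op in DATE_OPS
    raise ValueError(f"push down for {content!r} is not described in this section")


print(is_pushed_down("issues", "search", "author", "<>"))   # True
print(is_pushed_down("issues", "auto", "author", "="))      # False
print(is_pushed_down("pulls", "search", "labels", "="))     # False
```

A False result means the filter is still expected to work, but is applied after the data is fetched rather than by the GitHub API.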
Limitations
The GitHub Search API may return staler data than the standard API used in the default query mode.
The GitHub Search API returns a maximum of 1000 results per query. Use append mode acceleration to retrieve more results over time. See the append example for pull requests.
Examples
Querying GitHub Files
Limitations
The content column is fetched only when acceleration is enabled.
Querying GitHub files does not support filter push down, which may result in long query times when acceleration is disabled.
Setting github_query_mode to search is not supported.
ref - Required. Specifies the GitHub branch or tag to fetch files from.
include - Optional. Specifies a pattern to include specific files. Supports glob patterns. If not specified, all files are included by default.
Schema
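| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| name | Utf8 | YES |
| path | Utf8 | YES |
| size | Int64 | YES |
| sha | Utf8 | YES |
| mode | Utf8 | YES |
| url | Utf8 | YES |
| download_url | Utf8 | YES |
| content | Utf8 | YES |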
Example
Querying GitHub Issues
Limitations
Querying with filters using date columns requires ISO8601 formatted dates. For example, WHERE created_at > '2024-09-24'.
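One way to avoid malformed date literals is to format them from a date object rather than by hand. The sketch below is a hypothetical helper, not part of Spice, that renders a WHERE clause with an ISO8601 date using Python's standard library. It accepts only the exact, greater than, and less than comparisons described for date columns in this document.

```python
from datetime import date


def date_filter(column: str, operator: str, day: date) -> str:
    """Render a WHERE clause comparing a date column to an ISO8601 date."""
    if operator not in ("=", ">", "<"):
        raise ValueError(f"unsupported operator: {operator!r}")
    if not column.isidentifier():
        raise ValueError(f"unexpected column name: {column!r}")
    # date.isoformat() yields YYYY-MM-DD, for example 2024-09-24.
    return f"WHERE {column} {operator} '{day.isoformat()}'"


print(date_filter("created_at", ">", date(2024, 9, 24)))
# prints: WHERE created_at > '2024-09-24'
```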
Schema
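| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| assignees | List(Utf8) | YES |
| author | Utf8 | YES |
| body | Utf8 | YES |
| closed_at | Timestamp | YES |
| comments | List(Struct) | YES |
| created_at | Timestamp | YES |
| id | Utf8 | YES |
| labels | List(Utf8) | YES |
| milestone_id | Utf8 | YES |
| milestone_title | Utf8 | YES |
| comments_count | Int64 | YES |
| number | Int64 | YES |
| state | Utf8 | YES |
| title | Utf8 | YES |
| updated_at | Timestamp | YES |
| url | Utf8 | YES |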
Example
Querying GitHub Pull Requests
Limitations
Querying with filters using date columns requires ISO8601 formatted dates. For example, WHERE created_at > '2024-09-24'.
Schema
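| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| additions | Int64 | YES |
| assignees | List(Utf8) | YES |
| author | Utf8 | YES |
| body | Utf8 | YES |
| changed_files | Int64 | YES |
| closed_at | Timestamp | YES |
| comments_count | Int64 | YES |
| commits_count | Int64 | YES |
| created_at | Timestamp | YES |
| deletions | Int64 | YES |
| hashes | List(Utf8) | YES |
| id | Utf8 | YES |
| labels | List(Utf8) | YES |
| merged_at | Timestamp | YES |
| number | Int64 | YES |
| reviews_count | Int64 | YES |
| state | Utf8 | YES |
| title | Utf8 | YES |
| url | Utf8 | YES |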
Example
Append Example
Querying GitHub Commits
Limitations
Querying with filters using date columns requires ISO8601 formatted dates. For example, WHERE committed_date > '2024-09-24'.
Setting github_query_mode to search is not supported.
Schema
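| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| additions | Int64 | YES |
| author_email | Utf8 | YES |
| author_name | Utf8 | YES |
| committed_date | Timestamp | YES |
| deletions | Int64 | YES |
| id | Utf8 | YES |
| message | Utf8 | YES |
| message_body | Utf8 | YES |
| message_head_line | Utf8 | YES |
| sha | Utf8 | YES |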
Example
Querying GitHub stars (Stargazers)
Limitations
Querying with filters using date columns requires ISO8601 formatted dates. For example, WHERE starred_at > '2024-09-24'.
Setting github_query_mode to search is not supported.
Schema
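| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| starred_at | Timestamp | YES |
| login | Utf8 | YES |
|  | Utf8 | YES |
| name | Utf8 | YES |
| company | Utf8 | YES |
| x_username | Utf8 | YES |
| location | Utf8 | YES |
| avatar_url | Utf8 | YES |
| bio | Utf8 | YES |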
Example