GitHub

GitHub Data Connector Documentation

The GitHub Data Connector enables federated SQL queries over GitHub resources such as files, issues, pull requests, commits, and stargazers. To use it, specify github as the selector in the dataset's from value.

Common Configuration

Configuration

from

The from field takes the form github:github.com/<owner>/<repo>/<content>, where <content> is one of files, issues, pulls, commits, or stargazers. See the examples for more configuration detail.

name

The dataset name. This will be used as the table name within Spice.
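
As a rough sketch of how from and name pair up, the fragment below declares one dataset per content type. The repository my-org/my-repo and the table names are hypothetical placeholders, and the enclosing datasets list is assumed to sit in a Spicepod file. Authentication params are left out here, and files is omitted because it also needs a ref.

```yaml
# Hypothetical repository and table names, for illustration only.
datasets:
  - from: github:github.com/my-org/my-repo/issues      # <content> = issues
    name: my_repo_issues                                # queried as table my_repo_issues
  - from: github:github.com/my-org/my-repo/pulls       # <content> = pulls
    name: my_repo_pulls
  - from: github:github.com/my-org/my-repo/commits     # <content> = commits
    name: my_repo_commits
  - from: github:github.com/my-org/my-repo/stargazers  # <content> = stargazers
    name: my_repo_stargazers
```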

params

Personal Access Token

| Parameter Name | Description |
| --- | --- |
| github_token | Required. GitHub personal access token used to connect to the GitHub API. |
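
A minimal sketch of personal access token auth might look like the following. The repository and table name are hypothetical, and the ${secrets:...} reference assumes Spice's secret-substitution syntax with a secret named GITHUB_TOKEN. Adjust it to however secrets are supplied in your deployment.

```yaml
datasets:
  - from: github:github.com/my-org/my-repo/issues   # hypothetical repository
    name: my_repo_issues
    params:
      # Assumed secret reference; avoids writing the token into the file.
      github_token: ${secrets:GITHUB_TOKEN}
```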

GitHub App Installation

GitHub Apps provide a secure and scalable way to integrate with GitHub's API.

| Parameter Name | Description |
| --- | --- |
| github_client_id | Required. Specifies the client ID for GitHub App Installation auth mode. |
| github_private_key | Required. Specifies the private key for GitHub App Installation auth mode. |
| github_installation_id | Required. Specifies the installation ID for GitHub App Installation auth mode. |
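
For GitHub App Installation auth, the three parameters above would replace github_token. The sketch below assumes each value is stored as a secret. The secret names, repository, and table name are hypothetical.

```yaml
datasets:
  - from: github:github.com/my-org/my-repo/pulls   # hypothetical repository
    name: my_repo_pulls
    params:
      # All three are required in this auth mode; secret names are assumptions.
      github_client_id: ${secrets:GITHUB_CLIENT_ID}
      github_private_key: ${secrets:GITHUB_PRIVATE_KEY}
      github_installation_id: ${secrets:GITHUB_INSTALLATION_ID}
```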

Common Parameters

| Parameter Name | Description |
| --- | --- |
| github_query_mode | Optional. Specifies whether the connector uses the GitHub Search API for improved filter performance. Possible values: auto (default) or search. |
| owner | Required. Specifies the owner of the GitHub repository. |
| repo | Required. Specifies the name of the GitHub repository. |

Filter Push Down

GitHub queries support a github_query_mode parameter, which can be set to either auto or search for the following types:

  • Issues: Defaults to auto. Query filters are only pushed down to the GitHub API in search mode.

  • Pull Requests: Defaults to auto. Query filters are only pushed down to the GitHub API in search mode.

Commits support only auto mode. Filter push down is enabled only for the committed_date column. committed_date supports exact matches, or greater-than/less-than comparisons against dates in ISO 8601 format, for example WHERE committed_date > '2024-09-24'.

When set to search, Issues and Pull Requests will use the GitHub Search API for improved filter performance when querying against the columns:

  • author and state; supports exact matches, or NOT matches. For example, WHERE author = 'peasee' or WHERE author <> 'peasee'.

  • body and title; supports exact matches, or LIKE matches. For example, WHERE body LIKE '%duckdb%'.

  • updated_at, created_at, merged_at and closed_at; supports exact matches, or greater/less than matches with dates provided in ISO8601 format. For example, WHERE created_at > '2024-09-24'.

All other filters are supported when github_query_mode is set to search, but cannot be pushed down to the GitHub API for improved performance.
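
To make the push-down rules concrete, the sketch below enables search mode on an issues dataset and lists, as comments, queries whose filters fall into each case. The repository, table name, and secret reference are hypothetical. The filter expressions are the ones described above.

```yaml
datasets:
  - from: github:github.com/my-org/my-repo/issues   # hypothetical repository
    name: my_repo_issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}         # assumed secret reference
      github_query_mode: search                     # default is auto
# Filters pushed down to the GitHub API in search mode:
#   SELECT title FROM my_repo_issues WHERE author = 'peasee';
#   SELECT title FROM my_repo_issues WHERE body LIKE '%duckdb%';
#   SELECT title FROM my_repo_issues WHERE created_at > '2024-09-24';
# Supported but not pushed down (number is not one of the listed columns):
#   SELECT title FROM my_repo_issues WHERE number = 1;
```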

Examples

Querying GitHub Files

  • ref - Required. Specifies the GitHub branch or tag to fetch files from.

  • include - Optional. Specifies a pattern to include specific files. Supports glob patterns. If not specified, all files are included by default.

Schema

| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| name | Utf8 | YES |
| path | Utf8 | YES |
| size | Int64 | YES |
| sha | Utf8 | YES |
| mode | Utf8 | YES |
| url | Utf8 | YES |
| download_url | Utf8 | YES |
| content | Utf8 | YES |

Example
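
A possible files dataset is sketched below. The text above does not say where ref is supplied, so this assumes it is appended as the last path segment of from and that include is set under params. The branch main, the repository, and the glob are hypothetical.

```yaml
datasets:
  # Assumption: the ref (here the branch "main") follows /files in the from path.
  - from: github:github.com/my-org/my-repo/files/main
    name: my_repo_files
    params:
      github_token: ${secrets:GITHUB_TOKEN}   # assumed secret reference
      include: "**/*.md"                      # optional glob; omit to include all files
# Example query against the schema above:
#   SELECT path, size FROM my_repo_files ORDER BY size DESC LIMIT 10;
```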

Querying GitHub Issues

Schema

| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| assignees | List(Utf8) | YES |
| author | Utf8 | YES |
| body | Utf8 | YES |
| closed_at | Timestamp | YES |
| comments | List(Struct) | YES |
| created_at | Timestamp | YES |
| id | Utf8 | YES |
| labels | List(Utf8) | YES |
| milestone_id | Utf8 | YES |
| milestone_title | Utf8 | YES |
| comments_count | Int64 | YES |
| number | Int64 | YES |
| state | Utf8 | YES |
| title | Utf8 | YES |
| updated_at | Timestamp | YES |
| url | Utf8 | YES |

Example
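
A minimal issues dataset could be declared as follows. The repository, table name, and secret reference are hypothetical. The commented query uses only columns from the schema above.

```yaml
datasets:
  - from: github:github.com/my-org/my-repo/issues   # hypothetical repository
    name: my_repo_issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}         # assumed secret reference
# Example query: the ten most recently created issues.
#   SELECT number, title, author, created_at
#   FROM my_repo_issues
#   ORDER BY created_at DESC
#   LIMIT 10;
```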

Querying GitHub Pull Requests

Schema

| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| additions | Int64 | YES |
| assignees | List(Utf8) | YES |
| author | Utf8 | YES |
| body | Utf8 | YES |
| changed_files | Int64 | YES |
| closed_at | Timestamp | YES |
| comments_count | Int64 | YES |
| commits_count | Int64 | YES |
| created_at | Timestamp | YES |
| deletions | Int64 | YES |
| hashes | List(Utf8) | YES |
| id | Utf8 | YES |
| labels | List(Utf8) | YES |
| merged_at | Timestamp | YES |
| number | Int64 | YES |
| reviews_count | Int64 | YES |
| state | Utf8 | YES |
| title | Utf8 | YES |
| url | Utf8 | YES |

Example
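
A pull requests dataset might be sketched like this, with search mode enabled so that a date filter on merged_at can be pushed down. The repository, table name, and secret reference are hypothetical.

```yaml
datasets:
  - from: github:github.com/my-org/my-repo/pulls   # hypothetical repository
    name: my_repo_pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}        # assumed secret reference
      github_query_mode: search                    # enables filter push down for pulls
# Example query: pull requests merged after a given date (ISO 8601).
#   SELECT number, title, author, merged_at
#   FROM my_repo_pulls
#   WHERE merged_at > '2024-09-24';
```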

Append Example
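
This page does not describe append behaviour, so the fragment below is only a sketch. It assumes Spice's dataset acceleration settings (acceleration.enabled, refresh_mode, refresh_check_interval) and the dataset-level time_column field, none of which are defined above. The repository, table name, and interval are hypothetical.

```yaml
datasets:
  - from: github:github.com/my-org/my-repo/pulls   # hypothetical repository
    name: my_repo_pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}        # assumed secret reference
    # Assumed column used to detect rows newer than those already loaded.
    time_column: created_at
    acceleration:                                  # assumed acceleration settings
      enabled: true
      refresh_mode: append                         # add new rows instead of reloading everything
      refresh_check_interval: 6h                   # hypothetical interval between checks
```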

Querying GitHub Commits

Schema

| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| additions | Int64 | YES |
| author_email | Utf8 | YES |
| author_name | Utf8 | YES |
| committed_date | Timestamp | YES |
| deletions | Int64 | YES |
| id | Utf8 | YES |
| message | Utf8 | YES |
| message_body | Utf8 | YES |
| message_head_line | Utf8 | YES |
| sha | Utf8 | YES |

Example
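
A commits dataset could look like the sketch below. The repository, table name, and secret reference are hypothetical. github_query_mode is left unset because commits support only auto mode, and the commented query filters on committed_date, the one column with push down.

```yaml
datasets:
  - from: github:github.com/my-org/my-repo/commits   # hypothetical repository
    name: my_repo_commits
    params:
      github_token: ${secrets:GITHUB_TOKEN}          # assumed secret reference
# Example query: commits after a given date (filter is pushed down).
#   SELECT sha, author_name, message_head_line, committed_date
#   FROM my_repo_commits
#   WHERE committed_date > '2024-09-24';
```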

Querying GitHub stars (Stargazers)

Schema

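| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| starred_at | Timestamp | YES |
| login | Utf8 | YES |
| email | Utf8 | YES |
| name | Utf8 | YES |
| company | Utf8 | YES |
| x_username | Utf8 | YES |
| location | Utf8 | YES |
| avatar_url | Utf8 | YES |
| bio | Utf8 | YES |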

Example
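
A stargazers dataset might be declared as follows. The repository, table name, and secret reference are hypothetical. The commented query uses columns from the schema above.

```yaml
datasets:
  - from: github:github.com/my-org/my-repo/stargazers   # hypothetical repository
    name: my_repo_stargazers
    params:
      github_token: ${secrets:GITHUB_TOKEN}             # assumed secret reference
# Example query: the ten most recent stargazers.
#   SELECT login, company, starred_at
#   FROM my_repo_stargazers
#   ORDER BY starred_at DESC
#   LIMIT 10;
```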
