How to connect GitHub to AskVault

4 min read

What gets ingested

Three GitHub content types map to AskVault knowledge:

  • README.md at the repo root. Most common case.
  • Markdown files in docs/ directories. The default pattern docs/**/*.md picks up every *.md file under docs/. Docs kept under documentation/ or wiki/ paths need a custom file pattern.
  • GitHub Wiki pages if the repo has a Wiki enabled.

Code files (.py, .js, .ts, etc.) are not indexed by default. To include them, add patterns under Integrations > GitHub > File Patterns. This is useful when technical documentation lives in inline code comments, but it also exposes internal code to the chatbot's retrieval surface, so enable it with care.
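The default selection rule can be sketched in a few lines. This is only an illustration, assuming the defaults named in this article (README.md and docs/**/*.md) are anchored at the repository root; the function name indexed_by_default is hypothetical and not part of AskVault.

```python
from pathlib import PurePosixPath


def indexed_by_default(path: str) -> bool:
    """Approximate the default rule: root README.md plus *.md under docs/.

    Assumption: patterns are anchored at the repo root.
    """
    p = PurePosixPath(path)
    if p.suffix.lower() != ".md":
        return False  # code files such as .py, .js, .ts are skipped
    if p.parts == ("README.md",):
        return True  # README.md at the repo root
    # Any Markdown file at any depth under a top-level docs/ directory.
    return len(p.parts) >= 2 and p.parts[0] == "docs"


for candidate in ["README.md", "docs/api/auth.md", "src/app.py", "documentation/setup.md"]:
    print(candidate, indexed_by_default(candidate))
```

Under these assumptions, documentation/setup.md is skipped until a matching pattern is added.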

Setup

Setup takes about four minutes end to end.

  1. Open Knowledge > Add Source > GitHub in AskVault. Click Connect with GitHub.
  2. OAuth consent screen. Sign in with GitHub. Grant the AskVault app access to your organizations or specific repos.
  3. Pick repos. Either select specific repositories or grant access to all repos in an org. Org-wide access is convenient but exposes everything; per-repo access is safer.
  4. Configure file patterns. Default: README.md, docs/**/*.md. Adjust per your needs.
  5. Enable webhook re-sync (optional). AskVault sets up a webhook on each connected repo so commits trigger re-indexing within about 60 seconds. Without the webhook, the integration re-syncs once a day.

Test by asking the bot a question whose answer is in your README. The bot should retrieve and cite back to the GitHub file URL.

Scopes requested

The GitHub OAuth flow asks for:

  • repo (for private repos) or public_repo (public repos only).
  • read:org for organization-level repo discovery.

For maximum security, grant access only to specific repos rather than org-wide. AskVault never writes to your repos.
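GitHub reports the scopes an OAuth token carries in the X-OAuth-Scopes response header of its REST API. The sketch below compares such a header value against the scopes listed above. It assumes you already have the header string in hand; the function names are hypothetical.

```python
def parse_scopes(header_value: str) -> set[str]:
    """Split a comma-separated X-OAuth-Scopes value into a set of scope names."""
    return {s.strip() for s in header_value.split(",") if s.strip()}


def missing_scopes(header_value: str, private_repos: bool) -> set[str]:
    """Return the scopes from this article that the token does not carry."""
    required = {"repo" if private_repos else "public_repo", "read:org"}
    granted = parse_scopes(header_value)
    if "repo" in granted:
        # In GitHub's scope model, `repo` includes `public_repo`.
        granted.add("public_repo")
    return required - granted


print(missing_scopes("public_repo, read:org", private_repos=True))
```

An empty result means the token covers everything the integration asks for in that mode.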

Sync behavior

Two sync mechanisms:

  • Daily sync (default). Runs at midnight UTC. Re-checks every connected repo for changed files. Incremental: only changed files re-index.
  • Webhook sync (recommended). AskVault registers a push webhook on each connected repo. New commits to the default branch trigger re-indexing within about 60 seconds. Other branches are ignored.

A manual sync is available at any time under Knowledge Hub > GitHub > Resync.
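The webhook behavior described above (default branch only, changed files only) can be sketched against GitHub's push event payload, which carries ref, repository.default_branch, and per-commit added, modified, and removed lists. This is a minimal illustration and not AskVault's actual handler; plan_reindex is a hypothetical name.

```python
def plan_reindex(payload: dict) -> tuple[set[str], set[str]]:
    """Return (paths to re-index, paths to drop) for a GitHub push event.

    Pushes to any branch other than the default branch yield empty sets.
    """
    default_branch = payload["repository"]["default_branch"]
    if payload.get("ref") != f"refs/heads/{default_branch}":
        return set(), set()

    reindex: set[str] = set()
    remove: set[str] = set()
    # Commits arrive oldest first, so later changes override earlier ones.
    for commit in payload.get("commits", []):
        for path in commit.get("added", []) + commit.get("modified", []):
            reindex.add(path)
            remove.discard(path)
        for path in commit.get("removed", []):
            remove.add(path)
            reindex.discard(path)
    return reindex, remove


push = {
    "ref": "refs/heads/main",
    "repository": {"default_branch": "main"},
    "commits": [
        {"added": ["docs/new.md"], "modified": ["README.md"], "removed": []},
        {"added": [], "modified": [], "removed": ["docs/old.md"]},
    ],
}
print(plan_reindex(push))
```

The resulting paths would still need to pass the configured file patterns before anything is indexed.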

Branch handling

By default AskVault indexes content from the default branch (typically main or master). Configure under Integrations > GitHub > Branch Selection:

  • Default branch only (recommended for most cases).
  • Specific branch per repo. Useful for teams with a docs branch that's separate from feature work.
  • Tag-based. Index only content from tagged releases (e.g., v1.2.0). Useful for versioned documentation.
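Each of the three options resolves to a Git ref. The sketch below shows that mapping under assumed names: BranchSelection and its mode values are hypothetical and do not reflect AskVault's real configuration schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BranchSelection:
    mode: str  # hypothetical values: "default", "branch", or "tag"
    name: Optional[str] = None  # branch or tag name; unused for "default"


def ref_to_index(selection: BranchSelection, default_branch: str) -> str:
    """Map a branch-selection setting to the fully qualified Git ref to index."""
    if selection.mode == "default":
        return f"refs/heads/{default_branch}"
    if not selection.name:
        raise ValueError(f"mode {selection.mode!r} needs a branch or tag name")
    if selection.mode == "branch":
        return f"refs/heads/{selection.name}"
    if selection.mode == "tag":
        return f"refs/tags/{selection.name}"
    raise ValueError(f"unknown mode {selection.mode!r}")


print(ref_to_index(BranchSelection("default"), "main"))
print(ref_to_index(BranchSelection("branch", "docs"), "main"))
print(ref_to_index(BranchSelection("tag", "v1.2.0"), "main"))
```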

Private repo handling

Private repos work the same as public, with the appropriate OAuth scopes granted. Content from private repos is workspace-isolated like any other knowledge source. The bot only retrieves it for verified visitors with the right audience tags.

For sensitive internal documentation, combine private-repo ingestion with identity verification and audience-tagging.
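As a rough mental model of how verification and audience tags gate retrieval, consider the filter below. Everything in it is an assumption made for illustration: the Chunk and Visitor types, the rule that untagged chunks are public, and the any-tag-in-common match are not taken from AskVault's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source_url: str
    audience_tags: frozenset = frozenset()  # assumption: empty means public


@dataclass(frozen=True)
class Visitor:
    verified: bool
    audience_tags: frozenset = frozenset()


def visible_chunks(chunks: list, visitor: Visitor) -> list:
    """Keep public chunks, plus tagged chunks for verified visitors who share a tag."""
    allowed = []
    for chunk in chunks:
        if not chunk.audience_tags:
            allowed.append(chunk)
        elif visitor.verified and chunk.audience_tags & visitor.audience_tags:
            allowed.append(chunk)
    return allowed


chunks = [
    Chunk("Install with pip.", "https://github.com/acme/public/blob/main/README.md"),
    Chunk("Internal runbook.", "https://github.com/acme/private/blob/main/docs/ops.md",
          frozenset({"employees"})),
]
anonymous = Visitor(verified=False)
print([c.text for c in visible_chunks(chunks, anonymous)])
```

In this model an anonymous visitor only ever sees the untagged chunk, which matches the behavior described above.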

GitHub Enterprise Server

GitHub Enterprise Server (self-hosted) is supported under Enterprise contracts. The connection uses a personal access token instead of OAuth, because a self-hosted instance's OAuth endpoints aren't always reachable from the public internet. Contact sales@askvault.co for setup.

What the bot can answer from GitHub content

Common patterns:

  • API documentation questions when the docs live in a docs/ directory of your repo.
  • Setup and installation steps from your README.
  • Troubleshooting from CONTRIBUTING.md or known-issues docs.
  • Architecture overview from a high-level design doc.
  • Versioned release notes from CHANGELOG.md.

The bot retrieves chunks and cites back to the GitHub file URL. Visitors clicking through land on the rendered Markdown on GitHub.
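GitHub serves rendered files at a predictable /blob/ URL, so a citation link can be built from the repo coordinates and the file path. The helper below is a sketch; github_file_url is a hypothetical name, and the host parameter is included only so a self-hosted hostname could be substituted.

```python
from urllib.parse import quote


def github_file_url(owner: str, repo: str, ref: str, path: str,
                    host: str = "github.com") -> str:
    """Build the web URL of a file at a given branch or tag."""
    # quote() leaves "/" intact by default, so nested paths stay readable.
    return f"https://{host}/{quote(owner)}/{quote(repo)}/blob/{quote(ref)}/{quote(path)}"


print(github_file_url("acme", "widgets", "main", "docs/getting started.md"))
# https://github.com/acme/widgets/blob/main/docs/getting%20started.md
```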

Limits

  • GitHub API rate limits. 5,000 requests per hour for authenticated requests. AskVault uses about 5 to 20 requests per repo per sync.
  • Repo count per workspace. No hard cap, but performance degrades past about 200 repos. Most teams need 5 to 30.
  • File size cap. Files larger than 1 MB get truncated. The first 1 MB indexes; the rest is skipped.
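The figures above make for a quick back-of-the-envelope check. The sketch assumes all connected repos draw on a single 5,000-requests-per-hour budget and that 1 MB means 1,048,576 bytes; neither detail is stated in this article.

```python
GITHUB_REQUESTS_PER_HOUR = 5_000
REQUESTS_PER_REPO_PER_SYNC = (5, 20)  # low and high estimates from the list above
MAX_FILE_BYTES = 1024 * 1024  # assumption: 1 MB is treated as 1 MiB


def sync_request_range(repo_count: int) -> tuple[int, int]:
    """Estimated (min, max) GitHub API requests for one full sync."""
    low, high = REQUESTS_PER_REPO_PER_SYNC
    return repo_count * low, repo_count * high


def fits_hourly_budget(repo_count: int) -> bool:
    """True if even the high estimate stays within the hourly rate limit."""
    return sync_request_range(repo_count)[1] <= GITHUB_REQUESTS_PER_HOUR


def indexed_portion(content: bytes) -> bytes:
    """Keep only the first MAX_FILE_BYTES of a file; the rest is skipped."""
    return content[:MAX_FILE_BYTES]


print(sync_request_range(200))   # (1000, 4000)
print(fits_hourly_budget(200))   # True
print(fits_hourly_budget(251))   # False
```

At 200 repos the high estimate is 4,000 requests, which still fits one hour's budget. Note that a byte-level cut can split a multi-byte character, so a real implementation would decode the truncated bytes tolerantly.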

Common pitfalls

Files match the pattern but don't appear in Knowledge Hub. Either the files aren't on the branch you indexed, or the webhook hasn't fired yet. Check Knowledge Hub > GitHub > Last Sync Time.

Webhook setup fails with "AskVault couldn't reach the webhook URL". This usually means GitHub Enterprise Server sits behind a firewall. Use daily sync only, without the webhook.

README appears but other docs don't. The file pattern doesn't match. The default is README.md plus docs/**/*.md. If your docs live in documentation/ or wiki/, adjust the pattern.

Bot cites old content. The most recent commit hasn't synced yet. Check webhook delivery in GitHub's repo settings, or trigger a manual resync.

FAQ

Does this work with GitHub Actions or CI/CD content?

Workflow YAML files (.github/workflows/*.yml) aren't indexed by default. They're code-like content rather than narrative documentation. Add them to the file pattern if you want them.

Can the bot answer code questions?

For documentation about code (README, docs/), yes. For inline code-block questions ("how does function X work?"), the bot can retrieve from the README/docs but won't read the source files unless you add code-file patterns. Be careful enabling code-file indexing if you have proprietary algorithms.

Will this expose private repo content via the chatbot?

Only to visitors authorized via identity verification and audience tags. By default, an anonymous visitor to your public website widget won't get private-repo content even if it's indexed. The verification layer enforces it.

How do I unindex a repo?

Go to Knowledge Hub > GitHub > [repo] > Disconnect. This removes the webhook and deletes indexed content from your vector store within 5 minutes. Backups are purged within 30 days.

Does this work with monorepos?

Yes. Use file patterns to select which paths get indexed. For a typical monorepo, you might pick packages/*/README.md and docs/**/*.md, skipping the rest.
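To preview which monorepo paths a pattern set would select, a small glob matcher is enough. The sketch below supports only * (within one path segment) and **/ (zero or more directories), which is an assumption about how AskVault interprets patterns; glob_to_regex and is_selected are hypothetical names.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple glob into a regex. Supports only `*` and `**/`."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any run of characters within one segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_selected(path: str, patterns: list) -> bool:
    """True if the whole path matches at least one pattern."""
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


patterns = ["packages/*/README.md", "docs/**/*.md"]
for path in ["packages/api/README.md", "packages/api/src/index.ts",
             "docs/guide.md", "docs/adr/0001-storage.md"]:
    print(path, is_selected(path, patterns))
```

Here packages/api/src/index.ts is the only path left out, which is the "skipping the rest" behavior described above.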
