
Ingest knowledge from Confluence


What gets indexed

For each connected Confluence space or page tree:

  • Page titles and bodies.
  • Child pages. Followed recursively up to 5 levels by default.
  • Tables. Cell content preserved with column headers.
  • Code blocks.
  • Attachments (as captions; binary content not analyzed today).
  • Page labels (used for filtering, not as content).
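
The recursive child-page rule can be pictured as a depth-limited tree walk. The sketch below is illustrative only: `Page` and `collect_pages` are hypothetical names, not AskVault APIs, and it assumes the connected page counts as level 0, so "5 levels" means five levels of descendants.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a Confluence page and its child pages."""
    title: str
    children: list[Page] = field(default_factory=list)


def collect_pages(root: Page, max_depth: int = 5) -> list[str]:
    """Return titles of the root and its descendants, at most max_depth levels down."""
    titles: list[str] = []
    stack = [(root, 0)]  # (page, depth below the root)
    while stack:
        page, depth = stack.pop()
        titles.append(page.title)
        if depth < max_depth:
            # Reverse so children are visited in their original order.
            for child in reversed(page.children):
                stack.append((child, depth + 1))
    return titles
```

Under that assumption, a page sitting six levels below the connected root would be skipped until you connect a deeper page tree directly.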

What's not indexed:

  • Comments on pages, unless you toggle comment indexing on (off by default).
  • Spaces the OAuth user can't see.
  • Personal drafts.

Setup walkthrough

Setup takes about 15 minutes:

Step 1: connect Confluence

For Confluence Cloud (Atlassian Cloud):

  1. Open Knowledge Hub > Add Source > Confluence.
  2. Click "Connect Confluence Cloud".
  3. Sign in to Atlassian.
  4. Pick the Confluence site. Approve scopes:
    • read:confluence-content.all.
    • read:confluence-space.summary.
  5. Done. The OAuth token refreshes automatically.

For Confluence Data Center (self-hosted):

  1. Generate an API token in Confluence (Settings > Personal Access Tokens).
  2. Enter Confluence URL plus the API token in AskVault.
  3. AskVault tests the connection.

Self-hosted Confluence requires network reachability from AskVault (public URL or VPN).
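
AskVault runs the connection test itself, but a similar check run by hand can help separate network problems from token problems. The following is a minimal sketch, assuming a Data Center instance that accepts personal access tokens as a Bearer header on its REST API; the base URL and token are placeholders, and `build_space_request` and `check_connection` are hypothetical helper names.

```python
import urllib.request


def build_space_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a request for one space from the Confluence REST API."""
    url = base_url.rstrip("/") + "/rest/api/space?limit=1"
    headers = {
        "Authorization": f"Bearer {token}",  # personal access token
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def check_connection(base_url: str, token: str, timeout: float = 10.0) -> bool:
    """Return True if the instance answers the request with HTTP 200."""
    request = build_space_request(base_url, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False
```

Run it from a machine on the same network path AskVault would use. A `False` result from inside the VPN but not from the public internet usually points at reachability rather than the token.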

Step 2: select content

After connecting:

  • Pick spaces to index.
  • Or pick specific page trees under a space.
  • Set glob filters for path matching (e.g., index only /Support/**).

Tip: start with the support-relevant subset. Indexing your entire Confluence is rarely the right move.

Step 3: configure sync

  • Sync frequency. Default 6 hours; configurable to 1 hour, daily, weekly.
  • Audience tag. Per space or per page.

Step 4: trigger initial sync

Click "Sync now". A 500-page source indexes in about 8 minutes.

Status visible under Knowledge Hub > Confluence Source > Pages.
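
The 8-minute figure matches the rate listed under Limits (about 1 minute per 60 pages). A rough estimate for other source sizes, assuming that rate holds linearly (real throughput will vary with page size and API latency), could be computed as follows; `estimate_sync_minutes` is a hypothetical helper.

```python
def estimate_sync_minutes(page_count: int, pages_per_minute: float = 60.0) -> float:
    """Estimate initial sync duration in minutes at a constant indexing rate."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count / pages_per_minute


print(round(estimate_sync_minutes(500), 1))   # 8.3
print(round(estimate_sync_minutes(5000), 1))  # 83.3
```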

Sync behavior

Confluence Cloud doesn't expose webhooks for page changes by default (without a separate Forge app). Sync runs on schedule.

To keep the index fresher:

  • Set sync frequency to 1 hour, or schedule daily syncs at off-hours.
  • Trigger manual sync after major content updates.
  • Real-time sync isn't available today; a webhook bridge is on the roadmap.

Space-level audience tagging

Most teams structure Confluence by space:

  • Engineering Wiki space → audience internal-engineering.
  • HR Handbook space → audience internal-hr.
  • Customer Help Center space → audience public.

Set per-space audience under Knowledge Hub > Confluence Source > Spaces > [space] > Audience. See audience tags.
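
Conceptually, per-space tagging is a lookup from space to audience that is consulted before content is served. The sketch below illustrates that idea with the three example spaces; the matching rule (a viewer sees a space when they hold its audience) is an assumption for illustration, and `SPACE_AUDIENCES` and `visible_spaces` are hypothetical names rather than AskVault configuration.

```python
# Example mapping taken from the spaces listed above.
SPACE_AUDIENCES = {
    "Engineering Wiki": "internal-engineering",
    "HR Handbook": "internal-hr",
    "Customer Help Center": "public",
}


def visible_spaces(viewer_audiences: set[str]) -> list[str]:
    """Return the spaces whose audience tag the viewer holds, sorted by name."""
    return sorted(
        space
        for space, audience in SPACE_AUDIENCES.items()
        if audience in viewer_audiences
    )


print(visible_spaces({"public"}))
# ['Customer Help Center']
print(visible_spaces({"public", "internal-hr"}))
# ['Customer Help Center', 'HR Handbook']
```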

Sample bot interaction

For an engineering helpdesk:

Engineer: "What's our process for deploying a hotfix?"

Bot: "Hotfix deploys follow the emergency-deploy runbook: 1. Page eng on-call. 2. Create the fix on a hotfix-XXX branch. 3. Open PR with [hotfix] tag. 4. Get one approval and CI green. 5. Deploy via emergency-deploy.sh. See full runbook: Emergency Hotfix Procedure. Last updated 8 days ago."

The bot includes the source page link and last-updated date so the engineer knows the freshness.

Multi-instance Confluence

For organizations with multiple Atlassian instances (e.g., separate per region or per acquisition):

  • Connect each instance as a separate source.
  • Available on Business and above.

Permission inheritance

Confluence's own permission model:

  • AskVault reads with the connecting user's permissions.
  • A page restricted at the Confluence level isn't indexed.
  • Spaces with anonymous access are crawlable; private spaces need an authenticated connection (OAuth on Cloud, API token on Data Center).

Combine with AskVault audience tags for additional segmentation.

Plan availability

  • Free, Starter. No Confluence integration.
  • Growth. Up to 1,000 pages indexed.
  • Business. Up to 5,000 pages, multi-instance, advanced filters.
  • Enterprise. Unlimited pages, on-prem support.
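
When sizing a rollout, the page limits above can be checked against the number of pages you intend to index. A minimal sketch, using only the limits listed here (`None` stands for unlimited; `PLAN_PAGE_LIMITS` and `smallest_plan` are hypothetical names):

```python
from typing import Optional

# Plans that include the Confluence integration, smallest first.
PLAN_PAGE_LIMITS: dict[str, Optional[int]] = {
    "Growth": 1000,
    "Business": 5000,
    "Enterprise": None,  # unlimited
}


def smallest_plan(page_count: int) -> str:
    """Return the first plan whose page limit covers page_count."""
    for plan, limit in PLAN_PAGE_LIMITS.items():
        if limit is None or page_count <= limit:
            return plan
    raise ValueError("no plan covers this page count")


print(smallest_plan(800))   # Growth
print(smallest_plan(1001))  # Business
print(smallest_plan(5001))  # Enterprise
```

This only compares page counts; features such as multi-instance may require a higher plan regardless of size.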

Confluence content types

How different content types index:

Standard pages. Title + body indexed normally.

Templates. Indexed if pages created from them aren't already covered.

Blueprints. Decision trees, retrospectives, runbooks. Indexed as structured pages.

Attachments. File names indexed. Content extraction for PDF and DOCX attachments is on the roadmap (see Planned features).

Whiteboards (Cloud feature). Indexed as plain-text representations.

Databases (new Confluence feature). Indexed as structured rows.

Linked-page following

When a Confluence page links to others:

  • AskVault follows internal links within the same space.
  • Cross-space links followed if both spaces are indexed.
  • External links not followed (use URL crawling for those).

On-prem (Data Center) considerations

For self-hosted Confluence Data Center:

  • AskVault must reach the Confluence URL. Either expose publicly with auth, or set up VPN.
  • API token authentication. Per-user. Rotate every 6 to 12 months.
  • No OAuth flow today; API token only.
  • Sync frequency same as Cloud.

For air-gapped Confluence (no internet), the standard integration doesn't work. Contact us about Enterprise on-prem AskVault deployment.

Sync conflicts

What happens when content changes:

  • Page updated in Confluence. Re-indexes on next sync.
  • Page deleted. Removes from index on next sync.
  • Page moved between spaces. Tracked via page ID; index updates with new space membership.
  • Space renamed. Title updates; pages keep indexing.
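
Because pages are tracked by page ID, each scheduled sync can be thought of as a diff between what is indexed and what Confluence currently reports. The sketch below assumes changes are detected by comparing page version numbers, which is an illustration rather than a description of AskVault internals; `plan_sync` and its data shapes are hypothetical.

```python
# Each snapshot maps page ID -> (version number, space key).
Snapshot = dict[str, tuple[int, str]]


def plan_sync(indexed: Snapshot, remote: Snapshot) -> dict[str, list[str]]:
    """Compare the index with Confluence and list the page IDs to act on."""
    plan: dict[str, list[str]] = {"add": [], "reindex": [], "move": [], "delete": []}
    for page_id, (version, space) in sorted(remote.items()):
        if page_id not in indexed:
            plan["add"].append(page_id)
            continue
        old_version, old_space = indexed[page_id]
        if version != old_version:
            plan["reindex"].append(page_id)  # page updated in Confluence
        if space != old_space:
            plan["move"].append(page_id)     # same ID, new space membership
    # Pages that no longer exist in Confluence leave the index.
    plan["delete"] = sorted(set(indexed) - set(remote))
    return plan
```

Keying on page ID rather than title or path is what lets a moved page keep its index entry instead of being deleted and re-added.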

Audit and compliance

Every retrieval logs:

  • Which Confluence page.
  • Visitor ID.
  • Timestamp.

Useful for compliance audits. Retained 365 days standard.
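
If you export these logs for an audit, a record with the three fields above and a retention check might look like the following. The field names, `RetrievalLog`, and `is_retained` are hypothetical; only the three logged values and the 365-day window come from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # standard retention window


@dataclass(frozen=True)
class RetrievalLog:
    page_id: str         # which Confluence page was retrieved
    visitor_id: str      # who triggered the retrieval
    timestamp: datetime  # when it happened (timezone-aware)


def is_retained(entry: RetrievalLog, now: datetime) -> bool:
    """Return True if the entry is still inside the retention window."""
    return now - entry.timestamp <= RETENTION


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
recent = RetrievalLog("12345", "visitor-1", now - timedelta(days=364))
print(is_retained(recent, now))  # True
```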

Planned features (on the roadmap)

These features aren't available yet. Each entry states today's behavior and what's planned:

  • Webhook-based real-time sync. Today, scheduled. A planned Forge app brings real-time on Confluence Cloud.
  • Comment threading as Q&A. Today, off by default. Planned to optionally treat comment threads as FAQ entries.
  • Page-property-aware filtering. Today, body text only. Planned: filter by page metadata (labels, owner, category).
  • Native PDF and DOCX attachment indexing. Today, file names only. Full content extraction planned.

Limits

  • Pages per source. 5,000 max (the plan page limits under Plan availability also apply).
  • Spaces per source. Up to 50.
  • Sync frequency. As fast as every 1 hour.
  • Initial sync speed. About 1 minute per 60 pages.
  • API token rotation. Up to you; recommended every 6 to 12 months.

Common pitfalls

Pages missing from index. OAuth user doesn't have access. Re-authorize as a broader user.

Stale answers despite recent edits. Sync hasn't run. Trigger manual sync, or bump frequency to every hour.

Restricted pages missing from an indexed space. Pages with page-level restrictions aren't indexed even if their space is. Check the page's restrictions in Confluence.

Self-hosted Confluence unreachable. Network issue. AskVault must be able to HTTPS-reach the Confluence URL.

FAQ

Does this work with Confluence Cloud and Data Center?

Yes for both. Cloud uses OAuth; Data Center uses API token.

Can I index public Confluence spaces without auth?

Yes via URL crawling. Slower than the native integration.

How fresh are bot answers?

Within 6 hours (default). Configure to 1 hour if needed. Real-time webhook sync planned.

Can I exclude specific pages?

Yes via per-page audience tag, or by setting page restrictions in Confluence (which AskVault inherits).

Does this work with Jira too?

Jira is a separate integration. See Jira ingest (also on the Atlassian platform).
