Ingest knowledge from Confluence

Written by Aashiq, Founder, AskVault · Reviewed by Aashiq

Last updated: May 15, 2026 · 4 min read

What gets indexed

For each connected Confluence space or page tree:

Page titles and bodies.
Child pages. Followed recursively up to 5 levels by default.
Tables. Cell content preserved with column headers.
Code blocks.
Attachments (as captions; binary content not analyzed today).
Page labels (used for filtering, not as content).
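
The recursive child-page rule can be pictured as a depth-limited walk over the page tree. The sketch below is illustrative only: `collect_pages`, the `children` mapping, and `max_depth` are hypothetical names, not AskVault's implementation, and it assumes the selected root page counts as level 0.

```python
from collections import deque

def collect_pages(root, children, max_depth=5):
    """Return page IDs reachable from root within max_depth levels of children.

    Assumption: the root is level 0, so max_depth=5 reaches down to the
    fifth level below the root and no further.
    """
    seen = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend past the configured limit
        for child in children.get(page, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append((child, depth + 1))
    return order

# Hypothetical page tree: a single chain, p0 -> p1 -> ... -> p7.
chain = {f"p{i}": [f"p{i + 1}"] for i in range(7)}
print(collect_pages("p0", chain))  # p0 through p5; p6 and p7 are skipped
```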

What's not indexed:

Comments on pages (off by default; can be toggled on).
Spaces the OAuth user can't see.
Personal drafts.

Setup walkthrough

Setup takes about 15 minutes.

Step 1: connect Confluence

For Confluence Cloud (Atlassian Cloud):

Open Knowledge Hub > Add Source > Confluence.
Click "Connect Confluence Cloud".
Sign in to Atlassian.
Pick the Confluence site. Approve scopes:
- read:confluence-content.all
- read:confluence-space.summary
Done. OAuth token auto-refreshes.
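
For readers curious what the scope approval corresponds to, here is a rough sketch of an Atlassian OAuth 2.0 (3LO) authorization URL carrying the two scopes above. AskVault builds this for you, so nothing here needs to be run. The `client_id`, `redirect_uri`, and `state` values are placeholders, and the `offline_access` scope is an assumption about how a refreshable token would be requested, not a documented AskVault detail.

```python
from urllib.parse import urlencode

SCOPES = [
    "read:confluence-content.all",
    "read:confluence-space.summary",
    "offline_access",  # assumption: requested so a refresh token is issued
]

def build_authorize_url(client_id, redirect_uri, state):
    """Assemble the consent-screen URL for Atlassian's 3LO flow."""
    params = {
        "audience": "api.atlassian.com",
        "client_id": client_id,
        "scope": " ".join(SCOPES),
        "redirect_uri": redirect_uri,
        "state": state,  # opaque value echoed back to guard against CSRF
        "response_type": "code",
        "prompt": "consent",
    }
    return "https://auth.atlassian.com/authorize?" + urlencode(params)

# Placeholder values for illustration only.
url = build_authorize_url("YOUR_CLIENT_ID", "https://app.example.com/callback", "xyz123")
print(url)
```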

For Confluence Data Center (self-hosted):

Generate an API token (a personal access token) in Confluence (Settings > Personal Access Tokens).
Enter the Confluence URL and the API token in AskVault.
AskVault tests the connection.

Self-hosted Confluence requires network reachability from AskVault (public URL or VPN).
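
Before entering the token in AskVault, you can check reachability and the token yourself. The sketch below builds a request against the Confluence REST endpoint that lists spaces, sending the personal access token as a Bearer header. The host and token are placeholders, `build_connection_check` is a hypothetical helper, and which endpoint AskVault's own connection test calls is not documented here.

```python
import urllib.request

def build_connection_check(base_url, token):
    """Build a request that lists one space: a cheap reachability and auth check."""
    url = base_url.rstrip("/") + "/rest/api/space?limit=1"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )

# Placeholder host and token; replace with your own before sending.
req = build_connection_check("https://confluence.example.com/", "YOUR_TOKEN")
print(req.full_url)

# To actually send it from a machine on the same network path:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)  # 200 means the URL is reachable and the token is accepted
```

If the instance is served under a context path (for example `/confluence`), include it in the base URL.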

Step 2: select content

After connecting:

Pick spaces to index.
Or pick specific page trees under a space.
Set glob filters for path matching (e.g., index only /Support/**).
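
To get a feel for how a filter such as /Support/** selects pages, here is a minimal glob matcher. The semantics are an assumption (`**` crosses `/` boundaries, `*` stays inside one path segment); AskVault's exact matching rules may differ, and `glob_to_regex` and `matches` are hypothetical helpers.

```python
import re

def glob_to_regex(pattern):
    """Translate a path glob into a compiled regex.

    Assumed semantics: `**` crosses `/` boundaries, `*` stays within one
    path segment, and `?` matches a single non-slash character.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def matches(pattern, path):
    """True when the whole path matches the glob."""
    return glob_to_regex(pattern).fullmatch(path) is not None

print(matches("/Support/**", "/Support/Billing/Refunds"))  # True
print(matches("/Support/*", "/Support/Billing/Refunds"))   # False
print(matches("/Support/**", "/Engineering/Runbooks"))     # False
```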

Tip: start with the support-relevant subset. Indexing your entire Confluence is rarely the right move.

Step 3: configure sync

Sync frequency. Default is every 6 hours; configurable to hourly, daily, or weekly.
Audience tag. Per space or per page.

Step 4: trigger initial sync

Click "Sync now". 500 pages index in about 8 minutes.

Status visible under Knowledge Hub > Confluence Source > Pages.
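
The 8-minute figure follows from the rate given under Limits (about 1 minute per 60 pages). A small sketch of the arithmetic, treating that rate as a rough average rather than a guarantee:

```python
PAGES_PER_MINUTE = 60  # rate stated under Limits; real speed will vary

def estimated_sync_minutes(page_count, pages_per_minute=PAGES_PER_MINUTE):
    """Rough initial-sync duration in minutes for a given number of pages."""
    return page_count / pages_per_minute

print(round(estimated_sync_minutes(500), 1))   # 8.3
print(round(estimated_sync_minutes(5000), 1))  # 83.3
```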

Sync behavior

Confluence Cloud doesn't expose webhooks for page changes by default (without a separate Forge app). Sync runs on schedule.

To force a fresher index:

Schedule daily syncs at off-hours.
Trigger manual sync after major content updates.
Real-time sync isn't available yet; a webhook bridge is on the roadmap.

Space-level audience tagging

Most teams structure Confluence by space:

Engineering Wiki space → audience internal-engineering.
HR Handbook space → audience internal-hr.
Customer Help Center space → audience public.

Set per-space audience under Knowledge Hub > Confluence Source > Spaces > [space] > Audience. See audience tags.
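
Conceptually, the space-to-audience mapping behaves like a lookup table. The sketch below uses the three example spaces above. `resolve_audience` is a hypothetical helper, and the rule that a page-level tag takes precedence over the space tag is an assumption, not documented behavior.

```python
SPACE_AUDIENCE = {
    "Engineering Wiki": "internal-engineering",
    "HR Handbook": "internal-hr",
    "Customer Help Center": "public",
}

def resolve_audience(space, page_override=None, default=None):
    """Pick the audience for a page.

    Assumption: a page-level tag, when set, wins over the space-level tag.
    """
    if page_override is not None:
        return page_override
    return SPACE_AUDIENCE.get(space, default)

print(resolve_audience("HR Handbook"))                        # internal-hr
print(resolve_audience("Customer Help Center", "internal"))   # internal
print(resolve_audience("Untagged Space"))                     # None
```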

Sample bot interaction

For an engineering helpdesk:

Engineer: "What's our process for deploying a hotfix?"

Bot: "Hotfix deploys follow the emergency-deploy runbook: 1. Page eng on-call. 2. Create the fix on a hotfix-XXX branch. 3. Open PR with [hotfix] tag. 4. Get one approval and CI green. 5. Deploy via emergency-deploy.sh. See full runbook: Emergency Hotfix Procedure. Last updated 8 days ago."

The bot includes the source page link and last-updated date so the engineer knows the freshness.

Multi-instance Confluence

For organizations with multiple Atlassian instances (e.g., separate per region or per acquisition):

Connect each instance as a separate source.
Available on Business and above.

Permission inheritance

Confluence's own permission model:

AskVault reads with the connecting user's permissions.
A page restricted at the Confluence level isn't indexed.
Spaces with anonymous access are crawlable; private spaces need OAuth.

Combine with AskVault audience tags for additional segmentation.

Plan availability

Free, Starter. No Confluence integration.
Growth. Up to 1,000 pages indexed.
Business. Up to 5,000 pages, multi-instance, advanced filters.
Enterprise. Unlimited pages, on-prem support.
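
If you are sizing a rollout, the plan caps above can be checked with a small lookup. This is a sketch only: `PLAN_PAGE_LIMITS` and `within_plan` are hypothetical names, `None` stands for "unlimited", and Free and Starter are modeled as a cap of zero because they have no Confluence integration.

```python
PLAN_PAGE_LIMITS = {
    "Free": 0,           # no Confluence integration
    "Starter": 0,        # no Confluence integration
    "Growth": 1000,
    "Business": 5000,
    "Enterprise": None,  # unlimited
}

def within_plan(plan, page_count):
    """True when page_count fits under the plan's indexed-page cap."""
    limit = PLAN_PAGE_LIMITS[plan]
    return limit is None or page_count <= limit

print(within_plan("Growth", 800))         # True
print(within_plan("Growth", 1200))        # False
print(within_plan("Enterprise", 250000))  # True
```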

Confluence content types

How different content types index:

Standard pages. Title + body indexed normally.

Templates. Indexed if pages created from them aren't already covered.

Blueprints. Decision trees, retrospectives, runbooks. Indexed as structured pages.

Attachments. File names indexed. Content extraction for PDF and DOCX attachments is listed under planned features.

Whiteboards (Cloud feature). Indexed as plain-text representations.

Databases (new Confluence feature). Indexed as structured rows.

Linked-page following

When a Confluence page links to others:

AskVault follows internal links within the same space.
Cross-space links followed if both spaces are indexed.
External links not followed (use URL crawling for those).
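
The three link rules can be expressed as a single decision function. The sketch below assumes Confluence Cloud style URLs of the form https://<site>/wiki/spaces/<SPACEKEY>/...; `should_follow` and its parameters are hypothetical, not AskVault's API.

```python
from urllib.parse import urlparse

def should_follow(link, source_space, indexed_spaces, site_host):
    """Decide whether a link found on a page in source_space is followed.

    Assumes links look like https://<site_host>/wiki/spaces/<SPACEKEY>/...
    """
    parsed = urlparse(link)
    if parsed.netloc != site_host:
        return False  # external link: not followed
    segments = [s for s in parsed.path.split("/") if s]
    if "spaces" not in segments:
        return False  # not a recognizable space URL
    idx = segments.index("spaces")
    if idx + 1 >= len(segments):
        return False
    target_space = segments[idx + 1]
    if target_space == source_space:
        return True  # internal link within the same space
    # Cross-space link: followed only when both spaces are indexed.
    return source_space in indexed_spaces and target_space in indexed_spaces

host = "acme.atlassian.net"  # placeholder site
print(should_follow(f"https://{host}/wiki/spaces/SUP/pages/1/A", "SUP", {"SUP"}, host))         # True
print(should_follow(f"https://{host}/wiki/spaces/ENG/pages/2/B", "SUP", {"SUP", "ENG"}, host))  # True
print(should_follow(f"https://{host}/wiki/spaces/ENG/pages/2/B", "SUP", {"SUP"}, host))         # False
print(should_follow("https://example.com/doc", "SUP", {"SUP"}, host))                           # False
```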

On-prem (Data Center) considerations

For self-hosted Confluence Data Center:

AskVault must reach the Confluence URL. Either expose publicly with auth, or set up VPN.
API token authentication. Per-user. Rotate every 6 to 12 months.
No OAuth flow today; API token only.
Sync frequency same as Cloud.

For air-gapped Confluence (no internet), the standard integration doesn't work. Contact us about Enterprise on-prem AskVault deployment.

Sync conflicts

What happens when content changes:

Page updated in Confluence. Re-indexes on next sync.
Page deleted. Removes from index on next sync.
Page moved between spaces. Tracked via page ID; index updates with new space membership.
Space renamed. Title updates; pages keep indexing.
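
Because pages are tracked by page ID, the cases above amount to a diff between the stored index and a fresh snapshot. A minimal sketch, assuming each side maps page ID to a space key and version number (the data shape and `plan_index_changes` are hypothetical):

```python
def plan_index_changes(indexed, current):
    """Compare the stored index with a fresh snapshot, keyed by page ID.

    Both arguments map page_id -> {"space": str, "version": int}.
    """
    changes = {"reindex": [], "remove": [], "move": [], "add": []}
    for page_id, old in indexed.items():
        new = current.get(page_id)
        if new is None:
            changes["remove"].append(page_id)   # deleted in Confluence
            continue
        if new["space"] != old["space"]:
            changes["move"].append(page_id)     # same ID, new space membership
        if new["version"] > old["version"]:
            changes["reindex"].append(page_id)  # content was updated
    changes["add"] = [pid for pid in current if pid not in indexed]
    return changes

indexed = {
    "101": {"space": "ENG", "version": 3},
    "102": {"space": "ENG", "version": 1},
    "103": {"space": "HR", "version": 2},
}
current = {
    "101": {"space": "ENG", "version": 4},  # updated
    "103": {"space": "ENG", "version": 2},  # moved from HR to ENG
    "104": {"space": "HR", "version": 1},   # newly created
}
print(plan_index_changes(indexed, current))
```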

Audit and compliance

Every retrieval logs:

Which Confluence page.
Visitor ID.
Timestamp.

Useful for compliance audits. Retained 365 days standard.
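
For teams exporting these logs into their own tooling, a retrieval record and the 365-day retention window might be modeled as follows. The field names and `is_retained` are hypothetical; only the three logged fields and the retention period come from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # standard retention stated above

@dataclass(frozen=True)
class RetrievalLog:
    page_id: str         # which Confluence page was retrieved
    visitor_id: str      # who triggered the retrieval
    timestamp: datetime  # when it happened (UTC assumed)

def is_retained(entry, now):
    """True while the entry is still inside the retention window."""
    return now - entry.timestamp <= RETENTION

now = datetime(2026, 5, 15, tzinfo=timezone.utc)
recent = RetrievalLog("101", "visitor-42", datetime(2026, 1, 10, tzinfo=timezone.utc))
old = RetrievalLog("102", "visitor-7", datetime(2025, 3, 1, tzinfo=timezone.utc))
print(is_retained(recent, now))  # True
print(is_retained(old, now))     # False
```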

Planned features (on the roadmap)

These are not available today. They are listed so current behavior is clear:

Webhook-based real-time sync. Today, scheduled. A planned Forge app brings real-time on Confluence Cloud.
Comment threading as Q&A. Today, off by default. Planned to optionally treat comment threads as FAQ entries.
Page-property-aware filtering. Today, body text only. Planned: filter by page metadata (labels, owner, category).
Native PDF and DOCX attachment indexing. Today, file names only. Full content extraction planned.

Limits

Pages per source. 5,000 max.
Spaces per source. Up to 50.
Sync frequency. As fast as every 1 hour.
Initial sync speed. About 1 minute per 60 pages.
API token rotation. Up to you; recommended every 6 to 12 months.

Common pitfalls

Pages missing from index. The OAuth user doesn't have access to them. Re-authorize as a user with broader access.

Stale answers despite recent edits. Sync hasn't run. Trigger manual sync, or bump frequency to every hour.

Page-level restrictions causing surprises. Restricted pages aren't indexed even if their space is. Check the page's restrictions in Confluence.

Self-hosted Confluence unreachable. This is a network issue: AskVault must be able to reach the Confluence URL over HTTPS.

FAQ

Does this work with Confluence Cloud and Data Center?

Yes, both. Cloud uses OAuth; Data Center uses an API token.

Can I index public Confluence spaces without auth?

Yes, via URL crawling, though it is slower than the native integration.

How fresh are bot answers?

Within 6 hours (default). Configure to 1 hour if needed. Real-time webhook sync planned.

Can I exclude specific pages?

Yes, via a per-page audience tag, or by setting page restrictions in Confluence (which AskVault inherits).

Does this work with Jira too?

Jira is a separate integration. See Jira ingest (also on the Atlassian platform).
