Ingest knowledge from Confluence
What gets indexed
For each connected Confluence space or page tree:
- Page titles and bodies.
- Child pages. Followed recursively up to 5 levels by default.
- Tables. Cell content preserved with column headers.
- Code blocks.
- Attachments (file names only; binary content isn't analyzed today).
- Page labels (used for filtering, not as content).
What's not indexed:
- Comments on pages (off by default; can be toggled on).
- Spaces the OAuth user can't see.
- Personal drafts.
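The child-page rule (followed up to 5 levels) is a depth-limited walk of the page tree. A minimal sketch, assuming a hypothetical in-memory `Page` type and counting the selected root page as level 0; whether AskVault counts the root as level 0 or level 1 isn't stated:

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical in-memory stand-in for a Confluence page."""
    title: str
    children: list["Page"] = field(default_factory=list)


def collect_pages(root: Page, max_depth: int = 5) -> list[str]:
    """Return titles of the root and its descendants up to max_depth levels below it."""
    titles: list[str] = []

    def walk(page: Page, depth: int) -> None:
        titles.append(page.title)
        if depth == max_depth:
            return  # stop descending: deeper child pages are not collected
        for child in page.children:
            walk(child, depth + 1)

    walk(root, 0)
    return titles
```

With the default depth, a chain of eight nested pages yields the root plus five levels of descendants; the last two pages are skipped.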
Setup walkthrough
Setup takes about 15 minutes.
Step 1: connect Confluence
For Confluence Cloud (Atlassian Cloud):
- Open Knowledge Hub > Add Source > Confluence.
- Click "Connect Confluence Cloud".
- Sign in to Atlassian.
- Pick the Confluence site. Approve the scopes read:confluence-content.all and read:confluence-space.summary.
- Done. OAuth token auto-refreshes.
For Confluence Data Center (self-hosted):
- Generate an API token in Confluence (Settings > Personal Access Tokens).
- Enter the Confluence URL and the API token in AskVault.
- AskVault tests the connection.
Self-hosted Confluence requires network reachability from AskVault (public URL or VPN).
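Before entering the token in AskVault, you can check reachability and the token yourself from a machine on a comparable network path. A minimal sketch using only the Python standard library; it assumes a Confluence Data Center version that supports personal access tokens (sent as a Bearer token) and probes the core REST endpoint `/rest/api/space`. The host `wiki.example.com` is a placeholder, and this is not how AskVault's own connection test works:

```python
import json
import urllib.request


def build_probe(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET request that lists at most one space using a personal access token."""
    url = base_url.rstrip("/") + "/rest/api/space?limit=1"
    # Personal access tokens are passed as a Bearer token in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def probe(base_url: str, token: str, timeout: float = 10.0) -> int:
    """Send the probe and return the HTTP status; raises on network or HTTP errors (e.g. 401)."""
    with urllib.request.urlopen(build_probe(base_url, token), timeout=timeout) as resp:
        json.load(resp)  # a JSON body suggests we reached the REST API, not a login page
        return resp.status
```

A call such as `probe("https://wiki.example.com", token)` that returns 200 from a host outside your network suggests AskVault can reach the same URL.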
Step 2: select content
After connecting:
- Pick spaces to index.
- Or pick specific page trees under a space.
- Set glob filters for path matching (e.g., index only /Support/**).
Tip: start with the support-relevant subset. Indexing your entire Confluence is rarely the right move.
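How AskVault interprets glob syntax isn't specified beyond the /Support/** example. A minimal sketch of common glob semantics, assuming pages are addressed by a slash-separated title path such as `/Support/Billing/Refunds` (an assumption): `**` matches across levels, while `*` and `?` stay within one level. The functions are hypothetical:

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob: ** spans levels, * and ? stay within one level."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # any characters within a single level
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # exactly one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def path_matches(path: str, pattern: str) -> bool:
    return glob_to_regex(pattern).fullmatch(path) is not None
```

Under these semantics `/Support/**` matches every page below /Support but not the /Support page itself; `/Support/*` matches direct children only.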
Step 3: configure sync
- Sync frequency. Default every 6 hours; configurable to hourly, daily, or weekly.
- Audience tag. Per space or per page.
Step 4: trigger initial sync
Click "Sync now". Indexing 500 pages takes about 8 minutes.
Status is visible under Knowledge Hub > Confluence Source > Pages.
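The 8-minute figure follows from the throughput listed under Limits (about 1 minute per 60 pages). A back-of-the-envelope estimate that treats that rate as constant; real syncs may run faster or slower, and the function name is hypothetical:

```python
PAGES_PER_MINUTE = 60  # from Limits: about 1 minute per 60 pages


def estimate_initial_sync_minutes(page_count: int) -> float:
    """Rough initial-sync duration at the documented throughput."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count / PAGES_PER_MINUTE
```

For 500 pages this gives about 8.3 minutes; a full 5,000-page source works out to roughly 83 minutes.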
Sync behavior
Confluence Cloud doesn't expose webhooks for page changes by default (without a separate Forge app). Sync runs on a schedule.
To keep the index fresher:
- Schedule daily syncs at off-hours.
- Trigger manual sync after major content updates.
- For real-time updates, a webhook bridge is planned (on the roadmap).
Space-level audience tagging
Most teams structure Confluence by space:
- Engineering Wiki space → audience internal-engineering.
- HR Handbook space → audience internal-hr.
- Customer Help Center space → audience public.
Set per-space audience under Knowledge Hub > Confluence Source > Spaces > [space] > Audience. See audience tags.
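The space mapping above is a simple lookup. A minimal sketch, assuming (the document doesn't say) that a per-page audience tag overrides the space-level tag; the space names come from the example list, and `resolve_audience` is hypothetical, not an AskVault API:

```python
from typing import Optional

# Example mapping from the list above.
SPACE_AUDIENCE = {
    "Engineering Wiki": "internal-engineering",
    "HR Handbook": "internal-hr",
    "Customer Help Center": "public",
}


def resolve_audience(space: str, page_id: str, page_overrides: dict[str, str]) -> Optional[str]:
    """Assumed precedence: a page-level tag wins, else the space tag, else None (untagged)."""
    return page_overrides.get(page_id, SPACE_AUDIENCE.get(space))
```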
Sample bot interaction
For an engineering helpdesk:
Engineer: "What's our process for deploying a hotfix?"
Bot: "Hotfix deploys follow the emergency-deploy runbook: 1. Page eng on-call. 2. Create the fix on a hotfix-XXX branch. 3. Open PR with [hotfix] tag. 4. Get one approval and CI green. 5. Deploy via emergency-deploy.sh. See full runbook: Emergency Hotfix Procedure. Last updated 8 days ago."
The bot includes the source page link and last-updated date so the engineer can judge how current the answer is.
Multi-instance Confluence
For organizations with multiple Atlassian instances (e.g., one per region or per acquisition):
- Connect each instance as a separate source.
- Available on Business and above.
Permission inheritance
AskVault follows Confluence's own permission model:
- AskVault reads with the connecting user's permissions.
- A page restricted at the Confluence level isn't indexed.
- Spaces with anonymous access are crawlable; private spaces need an authenticated connection (OAuth on Cloud, an API token on Data Center).
Combine with AskVault audience tags for additional segmentation.
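The two layers apply at different times: Confluence permissions decide what enters the index, and audience tags decide who can retrieve it. A minimal sketch of that ordering; the `ConfluencePage` record and both filter functions are hypothetical, and matching a visitor against a set of allowed audiences is an assumption about how tags are applied:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfluencePage:
    page_id: str
    restricted_for_connector: bool  # True if the connecting user can't view the page
    audience: str                   # AskVault audience tag after resolution


def index_time_filter(pages: list[ConfluencePage]) -> list[ConfluencePage]:
    """Layer 1: pages the connecting user can't see never enter the index."""
    return [p for p in pages if not p.restricted_for_connector]


def query_time_filter(indexed: list[ConfluencePage], visitor_audiences: set[str]) -> list[str]:
    """Layer 2: only pages tagged for one of the visitor's audiences are retrievable."""
    return [p.page_id for p in indexed if p.audience in visitor_audiences]
```

A restricted page never reaches layer 2, so no audience tag can expose it.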
Plan availability
- Free, Starter. No Confluence integration.
- Growth. Up to 1,000 pages indexed.
- Business. Up to 5,000 pages, multi-instance, advanced filters.
- Enterprise. Unlimited pages, on-prem support.
Confluence content types
How different content types index:
Standard pages. Title + body indexed normally.
Templates. Indexed if pages created from them aren't already covered.
Blueprints. Decision trees, retrospectives, runbooks. Indexed as structured pages.
Attachments. File names indexed; PDF and DOCX attachments optionally extracted (Business and above).
Whiteboards (Cloud feature). Indexed as plain-text representations.
Databases (new Confluence feature). Indexed as structured rows.
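Tables (cell content preserved with column headers) and database rows both need each cell tied to its column. One plausible text rendering, shown as a minimal sketch; AskVault's actual serialization isn't documented here, so the `Header: value` format and `flatten_table` are assumptions:

```python
def flatten_table(headers: list[str], rows: list[list[str]]) -> list[str]:
    """Render each row as 'Header: value' pairs so every cell keeps its column name."""
    lines = []
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"row has {len(row)} cells, expected {len(headers)}")
        lines.append("; ".join(f"{h}: {v}" for h, v in zip(headers, row)))
    return lines
```

Keeping the header next to each value lets a retrieved row answer a question on its own, without the rest of the table.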
Linked-page following
When a Confluence page links to others:
- AskVault follows internal links within the same space.
- Cross-space links followed if both spaces are indexed.
- External links not followed (use URL crawling for those).
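The three rules reduce to a small decision function. A minimal sketch; how AskVault classifies a link isn't documented, so the function takes the target's space key (or `None` for an external link) as an assumed input, and the space keys in the test are illustrative:

```python
from typing import Optional


def should_follow(source_space: str, target_space: Optional[str], indexed_spaces: set[str]) -> bool:
    """Apply the link-following rules; target_space is None for external links."""
    if target_space is None:
        return False  # external link: use URL crawling instead
    if target_space == source_space:
        return True   # internal link within the same space
    # Cross-space link: follow only when both spaces are indexed.
    return source_space in indexed_spaces and target_space in indexed_spaces
```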
On-prem (Data Center) considerations
For self-hosted Confluence Data Center:
- AskVault must reach the Confluence URL. Either expose it publicly behind authentication, or set up a VPN.
- API token authentication. Per-user. Rotate every 6 to 12 months.
- No OAuth flow today; API token only.
- Sync frequency same as Cloud.
For air-gapped Confluence (no internet), the standard integration doesn't work. Contact us about Enterprise on-prem AskVault deployment.
Sync conflicts
What happens when content changes:
- Page updated in Confluence. Re-indexes on next sync.
- Page deleted. Removed from the index on next sync.
- Page moved between spaces. Tracked via page ID; index updates with new space membership.
- Space renamed. Title updates; pages keep indexing.
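Because pages are tracked by page ID, each sync can be planned as a diff between the previous snapshot and what Confluence currently returns. A minimal sketch, assuming each page exposes a space and a version number (Confluence pages do carry a version number, but how AskVault detects changes isn't documented); `PageState` and `plan_sync` are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageState:
    space: str
    version: int


def plan_sync(indexed: dict[str, PageState], current: dict[str, PageState]) -> dict[str, list[str]]:
    """Diff the index against Confluence, keyed by page ID."""
    plan: dict[str, list[str]] = {"add": [], "update": [], "move": [], "delete": []}
    for page_id, now in current.items():
        before = indexed.get(page_id)
        if before is None:
            plan["add"].append(page_id)         # new page since last sync
            continue
        if now.space != before.space:
            plan["move"].append(page_id)        # same ID, new space membership
        if now.version != before.version:
            plan["update"].append(page_id)      # content edited since last sync
    plan["delete"] = sorted(set(indexed) - set(current))  # gone from Confluence
    return plan
```

Keying on page ID is what lets a moved page keep its index entry instead of being deleted and re-added.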
Audit and compliance
Every retrieval logs:
- Which Confluence page.
- Visitor ID.
- Timestamp.
Useful for compliance audits. Logs are retained for 365 days by default.
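Each log entry holds the three fields listed above. A minimal sketch of such a record and of the 365-day retention rule; `RetrievalLog` and `prune` are hypothetical illustrations, not AskVault's log schema or export format:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=365)  # standard retention stated above


@dataclass(frozen=True)
class RetrievalLog:
    page_id: str         # which Confluence page was retrieved
    visitor_id: str
    timestamp: datetime  # timezone-aware, UTC


def prune(logs: list[RetrievalLog], now: Optional[datetime] = None) -> list[RetrievalLog]:
    """Keep only entries inside the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return [log for log in logs if log.timestamp >= cutoff]
```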
Planned features (on the roadmap)
Documented for accuracy:
- Webhook-based real-time sync. Today, scheduled. A planned Forge app brings real-time on Confluence Cloud.
- Comment threading as Q&A. Today, off by default. Planned to optionally treat comment threads as FAQ entries.
- Page-property-aware filtering. Today, body text only. Planned: filter by page metadata (labels, owner, category).
- Native PDF and DOCX attachment indexing. Today, file names only. Full content extraction planned.
Limits
- Pages per source. 5,000 max.
- Spaces per source. Up to 50.
- Sync frequency. As fast as every 1 hour.
- Initial sync speed. About 1 minute per 60 pages.
- API token rotation. Up to you; recommended every 6 to 12 months.
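The per-source limits can be checked before connecting. A minimal sketch that encodes the numbers above; `SourceConfig` and `limit_violations` are hypothetical, and plan-level page caps (see Plan availability) may be lower than the per-source maximum:

```python
from dataclasses import dataclass
from datetime import timedelta

MAX_PAGES_PER_SOURCE = 5_000
MAX_SPACES_PER_SOURCE = 50
MIN_SYNC_INTERVAL = timedelta(hours=1)


@dataclass
class SourceConfig:
    """Hypothetical description of one Confluence source."""
    spaces: list[str]
    page_count: int
    sync_interval: timedelta


def limit_violations(cfg: SourceConfig) -> list[str]:
    """Return a human-readable message for each limit the config exceeds."""
    problems = []
    if cfg.page_count > MAX_PAGES_PER_SOURCE:
        problems.append(f"{cfg.page_count} pages exceeds {MAX_PAGES_PER_SOURCE}")
    if len(cfg.spaces) > MAX_SPACES_PER_SOURCE:
        problems.append(f"{len(cfg.spaces)} spaces exceeds {MAX_SPACES_PER_SOURCE}")
    if cfg.sync_interval < MIN_SYNC_INTERVAL:
        problems.append("sync interval is shorter than 1 hour")
    return problems
```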
Common pitfalls
Pages missing from index. OAuth user doesn't have access. Re-authorize as a user with broader access.
Stale answers despite recent edits. Sync hasn't run. Trigger manual sync, or bump frequency to every hour.
Unexpected page-level restrictions. Restricted pages aren't indexed even if their space is. Check the page's restrictions in Confluence.
Self-hosted Confluence unreachable. Network issue. AskVault must be able to HTTPS-reach the Confluence URL.
FAQ
Does this work with Confluence Cloud and Data Center?
Yes for both. Cloud uses OAuth; Data Center uses API token.
Can I index public Confluence spaces without auth?
Yes, via URL crawling. It's slower than the native integration.
How fresh are bot answers?
Within 6 hours (default). Configure to 1 hour if needed. Real-time webhook sync planned.
Can I exclude specific pages?
Yes, via a per-page audience tag, or by setting page restrictions in Confluence (which AskVault inherits).
Does this work with Jira too?
Jira is a separate integration. See Jira ingest (also on the Atlassian platform).