The Knowledge Hub
What you see in the Knowledge Hub
A single sortable table of every knowledge source in your workspace. Each row carries:
- Source name. URL, filename, integration source, snippet title, or the question of a Q&A pair.
- Type icon. URL, PDF, DOCX, Notion, GitHub, snippet, Q&A pair, etc.
- Status. Queued, indexing, ready, failed, syncing, or paused.
- Last sync timestamp.
- Audience tags. Visible as colored chips.
- Chunk count. How many vector chunks this source produces.
- Actions. Resync, edit, delete.
Above the table, a top bar shows aggregate stats: total sources, total chunks, storage used vs plan cap.
Source types
Eight source types appear in the same table:
- Website crawl. Walked URLs from a configured root.
- Sitemap crawl. Bulk import from /sitemap.xml.
- File upload. PDF, DOCX, TXT, MD, CSV.
- Integration. Notion, Confluence, GitHub, WordPress, Zendesk, etc.
- Snippet. Freeform text entries.
- Q&A pair. Curated question-answer entries.
- API import. Documents uploaded programmatically via the API.
- Webhook ingest. Real-time content pushed from your system to AskVault.
All eight share the same vector index, so retrieval pulls from any of them. Audience tags work uniformly across types.
Add a new source
The + Add Source button in the top right opens the source-type picker. Pick the type you want, then follow its setup flow.
The most common starting flows:
- Website crawl. Paste a URL, pick crawl scope, click Start.
- File upload. Drag-and-drop one or more files.
- Notion integration. Connect through the OAuth flow.
- Q&A pair. Type the question and answer directly.
Detailed setup steps for each type are covered in the knowledge management articles.
Monitor indexing status
When you add a source, it moves through these states:
- Queued. Waiting to start.
- Indexing. Crawling, extracting, chunking, embedding.
- Ready. Available for retrieval. Bot can cite from it.
- Failed. Indexing did not complete. Click the row for error details.
Status updates in real time. A re-sync shows "Syncing" without dropping the existing "Ready" state, so the bot keeps answering from the old version until the new one is fully indexed.
For transient failures, AskVault retries automatically on an exponential schedule until the source's retry budget is used up. Permanent failures (404, robots-disallowed) are not retried; click the row to see the specific failure code.
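The "keep answering from the old version" behavior can be pictured as a two-slot swap: retrieval reads a live copy while the new copy is built on the side. The minimal Python sketch below assumes a hypothetical `Source` class with `live_chunks` and `pending_chunks` fields; it illustrates the idea and is not AskVault's internal model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    """Hypothetical source model for illustration; not AskVault's schema."""

    name: str
    status: str = "queued"
    # Chunks that retrieval reads from right now.
    live_chunks: List[str] = field(default_factory=list)
    # Chunks of the new version, built while a re-sync runs.
    pending_chunks: Optional[List[str]] = None

    def start_resync(self) -> None:
        # Only the status changes; live_chunks still serves answers.
        self.status = "syncing"
        self.pending_chunks = []

    def add_pending_chunk(self, text: str) -> None:
        if self.pending_chunks is None:
            raise RuntimeError("no re-sync in progress")
        self.pending_chunks.append(text)

    def finish_resync(self) -> None:
        # Swap in the new version only after it is fully indexed.
        if self.pending_chunks is None:
            raise RuntimeError("no re-sync in progress")
        self.live_chunks = self.pending_chunks
        self.pending_chunks = None
        self.status = "ready"

    def retrievable_chunks(self) -> List[str]:
        return list(self.live_chunks)


src = Source("Pricing page", status="ready", live_chunks=["Old pricing"])
src.start_resync()
src.add_pending_chunk("New pricing")
print(src.status, src.retrievable_chunks())  # syncing ['Old pricing']
src.finish_resync()
print(src.status, src.retrievable_chunks())  # ready ['New pricing']
```

Swapping the whole chunk list at the end means a half-finished re-sync is never visible to retrieval.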
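An exponential schedule doubles the wait after each transient failure, while permanent failures skip retrying entirely. The sketch below assumes a 30-second base delay, a doubling factor, a one-hour cap, and hypothetical failure-code strings; AskVault's actual schedule and codes are not documented here.

```python
from typing import List

# Hypothetical codes for failures that retrying cannot fix.
PERMANENT_FAILURES = {"http_404", "robots_disallowed"}


def retry_delays(
    budget: int,
    base_seconds: float = 30.0,
    factor: float = 2.0,
    cap_seconds: float = 3600.0,
) -> List[float]:
    """Wait before each retry: base, base*factor, base*factor**2, ..., capped."""
    return [min(base_seconds * factor**attempt, cap_seconds) for attempt in range(budget)]


def should_retry(failure_code: str, attempts_so_far: int, budget: int) -> bool:
    # Permanent failures never retry, whatever budget remains.
    if failure_code in PERMANENT_FAILURES:
        return False
    # Transient failures retry until the budget is spent.
    return attempts_so_far < budget


print(retry_delays(5))  # [30.0, 60.0, 120.0, 240.0, 480.0]
```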
Trigger a re-sync
To pull the latest content from a source:
- Hover the row, click the Resync icon.
- Or open the source detail and click Resync now.
For sources with scheduled sync (daily or weekly), a manual resync runs in addition to the schedule, not instead of it. This is useful when you've just made a meaningful content update and want it indexed immediately.
Re-sync usually completes in 1 to 10 minutes depending on source size. Status shows progress live.
Bulk actions
Select multiple sources (Shift+click, or use the checkbox in the leftmost column), then apply one of:
- Bulk delete. Removes selected sources and their chunks from the vector index.
- Bulk audience tag. Apply a tag to many sources at once.
- Bulk resync. Trigger re-sync on a set of sources.
- Bulk pause. Temporarily exclude a set of sources from retrieval.
Up to 100 sources per bulk action.
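Because each bulk action takes at most 100 sources, a larger cleanup has to be split into several actions. This small Python helper shows one way to plan those batches; `bulk_batches` is a hypothetical name and the source IDs are placeholders.

```python
from typing import List

MAX_SOURCES_PER_BULK_ACTION = 100  # limit stated above


def bulk_batches(source_ids: List[str], limit: int = MAX_SOURCES_PER_BULK_ACTION) -> List[List[str]]:
    """Split a selection into consecutive batches of at most `limit` sources."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [source_ids[i : i + limit] for i in range(0, len(source_ids), limit)]


ids = [f"src-{n}" for n in range(250)]
print([len(batch) for batch in bulk_batches(ids)])  # [100, 100, 50]
```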
Storage usage indicator
The top bar shows current storage usage vs your plan cap:
- Used. Total MB across all sources.
- Cap. 5 MB on Free, 15 MB on Starter, 40 MB on Growth, 100 MB on Business.
- Percentage. Visual bar.
When usage reaches 90% of your cap, AskVault sends a warning email. At 100%, new uploads fail until you delete content or upgrade; existing content keeps working.
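The thresholds above reduce to a simple percentage check. This sketch uses the plan caps listed in this section; the function name and the returned state labels are hypothetical.

```python
from typing import Tuple

# Plan caps in MB, as listed above.
PLAN_CAPS_MB = {"Free": 5, "Starter": 15, "Growth": 40, "Business": 100}


def storage_status(used_mb: float, plan: str) -> Tuple[float, str]:
    """Return (percent of cap used, state) for a workspace."""
    cap_mb = PLAN_CAPS_MB[plan]
    percent = used_mb * 100 / cap_mb
    if percent >= 100:
        state = "uploads_blocked"  # new uploads fail; existing content still works
    elif percent >= 90:
        state = "warning"  # email warning threshold
    else:
        state = "ok"
    return percent, state


print(storage_status(36, "Growth"))  # (90.0, 'warning')
```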
Audience tags at the source level
Click any source to expand its details, then set audience tags:
- A single tag, such as hr_team.
- Multiple tags, for content shared across teams.
- No tags, for content available to all visitors, verified and anonymous.
Combined with identity verification, audience tags become enforceable: a visitor without the hr_team audience can't retrieve hr_team-tagged content. Available on Growth and higher plans.
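The enforcement rule can be expressed as a set check. This sketch assumes that a source with several tags is retrievable by a visitor who holds any one of them, and that unverified visitors carry no audiences; both are assumptions, and `can_retrieve` is a hypothetical helper.

```python
from typing import Set


def can_retrieve(source_tags: Set[str], visitor_audiences: Set[str], verified: bool) -> bool:
    # Untagged content is open to every visitor, verified or anonymous.
    if not source_tags:
        return True
    # Assumption: without identity verification, a visitor holds no audiences.
    if not verified:
        return False
    # Assumption: sharing any one tag is enough for a multi-tag source.
    return bool(source_tags & visitor_audiences)


print(can_retrieve({"hr_team"}, {"sales"}, verified=True))  # False
print(can_retrieve({"hr_team", "finance"}, {"hr_team"}, verified=True))  # True
```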
Chunk inspection
To debug why the bot is giving a wrong answer, inspect individual chunks:
- Click a source row to expand.
- View Chunks opens a paginated list of every chunk extracted from this source.
- Each chunk shows token count, parent heading prefix, and the raw text.
This is useful for spotting chunking issues, missing context, or content that didn't extract correctly.
Search across sources
The Knowledge Hub search bar searches both source names and chunk content:
- By name. "shopify" finds the Shopify integration source.
- By content. "refund policy" finds any chunk containing those words.
Search results show the source plus highlighted matching chunks.
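A rough model of the combined search: a source is returned when its name matches or when any of its chunks match, together with the matching chunks. The sketch assumes case-insensitive substring matching on every query word; AskVault's real matching and ranking may differ, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IndexedSource:
    name: str
    chunks: List[str]


def _matches(text: str, words: List[str]) -> bool:
    # Every query word must appear somewhere in the text (substring match).
    lowered = text.lower()
    return all(word in lowered for word in words)


def search(sources: List[IndexedSource], query: str) -> List[Tuple[str, List[str]]]:
    words = query.lower().split()
    if not words:
        return []  # an empty query matches nothing
    results = []
    for src in sources:
        hits = [chunk for chunk in src.chunks if _matches(chunk, words)]
        if hits or _matches(src.name, words):
            results.append((src.name, hits))
    return results


sources = [
    IndexedSource("Shopify integration", ["Order sync runs hourly."]),
    IndexedSource("Help center", ["Our refund policy lasts 30 days.", "Shipping is free."]),
]
print(search(sources, "Refund policy"))  # [('Help center', ['Our refund policy lasts 30 days.'])]
```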
Limits
- Plan availability. The Knowledge Hub is available on every plan, including Free.
- Source count. No hard cap, but performance is optimized for workspaces with fewer than 500 sources.
- Re-sync rate. Up to 3 concurrent sync operations per workspace.
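The three-sync limit behaves like a semaphore: further syncs wait until a slot frees up. The asyncio sketch below assumes a hypothetical `sync_one` coroutine standing in for a real sync, and records the peak number of syncs running at once.

```python
import asyncio
from typing import Awaitable, Callable, List

MAX_CONCURRENT_SYNCS = 3  # per-workspace limit stated above


async def run_syncs(
    source_ids: List[str],
    sync_one: Callable[[str], Awaitable[str]],
    limit: int = MAX_CONCURRENT_SYNCS,
) -> List[str]:
    """Run every sync, but never more than `limit` at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(source_id: str) -> str:
        async with semaphore:  # waits while `limit` syncs are running
            return await sync_one(source_id)

    # gather returns results in the order of source_ids.
    return await asyncio.gather(*(guarded(s) for s in source_ids))


async def demo() -> tuple:
    running = 0
    peak = 0

    async def fake_sync(source_id: str) -> str:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for crawling and embedding
        running -= 1
        return f"{source_id}: ready"

    results = await run_syncs([f"src-{i}" for i in range(8)], fake_sync)
    return results, peak


results, peak = asyncio.run(demo())
print(peak)  # 3
```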
Common pitfalls
Bot answers wrong despite the right content existing. Check Knowledge Hub > [source] > Chunks to confirm the right text is indexed. If chunking is off, re-upload the content in a different format (e.g., DOCX instead of PDF) or adjust the source.
Sync stuck on "Indexing" for hours. The source likely has a parsing issue. Click the row for error details. Common causes: a corrupted PDF, an image-only PDF without OCR, or a password-protected file.
Sources missing audience tags. Tags must be set explicitly. By default a source has no tags, so it is visible to all visitors.
Storage cap hit unexpectedly. Deleted sources don't free up storage immediately, because the cleanup job runs every 10 minutes. Refresh after a few minutes.
FAQ
Can I export everything in the Knowledge Hub?
Yes. Knowledge Hub > Export All generates a zip file with every source and its chunks. Each export can be up to 100 MB.
How do I know which source the bot cited in a conversation?
Open the conversation in Live Chat. Each bot message lists the sources it used. Click a source to jump to its row in the Knowledge Hub.
Can I share sources between workspaces?
Not directly. Each workspace has its own Knowledge Hub. For shared knowledge, use an integration that both workspaces ingest from (e.g., the same Notion workspace).
Does the bot prefer some sources over others?
Higher-priority sources (Q&A pairs, recently updated content) surface earlier in retrieval ranking. Configure priority weighting under Knowledge > Settings > Retrieval Priority.
What happens to deleted sources?
Their chunks are removed from the vector index within 5 minutes, and backups are purged within 30 days. After 30 days, deleted content is unrecoverable.