Recurring sync for AskVault knowledge sources
Why content freshness matters
A bot trained on outdated content gives outdated answers. Pricing changed last month and the bot still quotes old prices. A new product launched and the bot doesn't know about it. A policy got updated and customers get the stale version.
Recurring sync solves this. AskVault re-checks your source content on a schedule, picks up changes, re-embeds them, and updates the vector index. The bot stays current without you doing anything manual.
Available on Growth and above.
Three sync modes
Three options per knowledge source:
- Daily sync (recommended for active content). Runs once per day at midnight UTC. Picks up every change from the previous 24 hours.
- Weekly sync (right for stable content). Runs Sunday at midnight UTC. Lower processing cost; fine if your content rarely changes.
- Manual sync. No automatic re-crawl. You trigger sync from the dashboard when you've made meaningful changes.
Configure per source under Knowledge Hub > [source] > Sync Schedule. Mix and match: daily for your blog, weekly for static policy docs, manual for one-off uploads.
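To make the three schedules concrete, here is a minimal sketch of how the next run time could be computed. It is an illustration, not AskVault's scheduler: the function name and the mode strings are hypothetical, and it assumes "Sunday at midnight UTC" means 00:00 UTC at the start of Sunday.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_scheduled_run(mode: str, now: datetime) -> Optional[datetime]:
    """Return the next scheduled run in UTC, or None for manual sources.

    `mode` is one of "daily", "weekly", "manual" (hypothetical labels).
    `now` must be a timezone-aware datetime.
    """
    if mode == "manual":
        return None  # manual sources never run on a schedule

    now_utc = now.astimezone(timezone.utc)
    # The first midnight UTC strictly after `now`.
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if mode == "daily":
        return next_midnight
    if mode == "weekly":
        # datetime.weekday(): Monday == 0 ... Sunday == 6.
        # Assumption: the weekly run is at 00:00 UTC at the start of Sunday.
        days_until_sunday = (6 - next_midnight.weekday()) % 7
        return next_midnight + timedelta(days=days_until_sunday)
    raise ValueError(f"unknown sync mode: {mode!r}")
```

For a Wednesday afternoon, the daily source would next run at the following midnight UTC and the weekly source at the start of the coming Sunday.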
Incremental sync
Re-sync is incremental, not full. Only changed content is re-processed:
- Unchanged pages. Skipped. No re-embed, no index update.
- Modified pages. Re-extracted, re-chunked, re-embedded. Old chunks replaced.
- New pages. Crawled and indexed for the first time.
- Deleted pages. Removed from the index within one sync cycle.
For a 1,000-page site where 20 pages changed, incremental sync processes about 20 pages, not 1,000. Typical re-sync takes 5 to 10% of the initial crawl time.
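The four cases above can be sketched as a comparison between what is already indexed and what the latest crawl found. This is a simplified illustration under stated assumptions: the names are hypothetical, and each page is represented by an opaque fingerprint string (for example an ETag or a content hash).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyncPlan:
    unchanged: List[str] = field(default_factory=list)  # skipped entirely
    modified: List[str] = field(default_factory=list)   # re-extract, re-chunk, re-embed
    new: List[str] = field(default_factory=list)        # index for the first time
    deleted: List[str] = field(default_factory=list)    # remove from the index


def plan_incremental_sync(indexed: Dict[str, str], crawled: Dict[str, str]) -> SyncPlan:
    """Classify pages by comparing stored fingerprints with fresh ones.

    Both arguments map URL -> fingerprint. `indexed` is the state from the
    previous sync; `crawled` is what the current crawl observed.
    """
    plan = SyncPlan()
    for url, fingerprint in crawled.items():
        if url not in indexed:
            plan.new.append(url)
        elif indexed[url] != fingerprint:
            plan.modified.append(url)
        else:
            plan.unchanged.append(url)
    # Anything indexed before but missing from this crawl is treated as deleted.
    plan.deleted = [url for url in indexed if url not in crawled]
    return plan
```

Only the `modified` and `new` lists cause embedding work, which is why the cost tracks the number of changed pages rather than the size of the site.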
Webhook-triggered sync
For content you want indexed in near-real-time, configure webhook-triggered sync. Three patterns:
- WordPress webhook. Install the WP Webhooks plugin, point it at AskVault. New posts trigger re-indexing within 60 seconds.
- GitHub webhook. AskVault auto-installs this when you connect the GitHub integration. Commits to the default branch trigger re-indexing.
- Custom webhook. For any system that can POST when content changes, configure under Knowledge Hub > [source] > Webhook URL.
Webhook-triggered sync is incremental, the same as scheduled sync. Only changed content is re-processed.
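For the custom webhook pattern, a sender might look roughly like the sketch below. The payload shape (`changed_urls`) and the placeholder URL are assumptions made for illustration; this article does not specify the request body or any authentication headers, so use the Webhook URL shown in the dashboard and whatever format it documents.

```python
import json
import urllib.request
from typing import List


def build_webhook_request(webhook_url: str, changed_urls: List[str]) -> urllib.request.Request:
    """Build a JSON POST announcing that content changed.

    The {"changed_urls": [...]} body is a hypothetical payload shape.
    """
    payload = json.dumps({"changed_urls": changed_urls}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_content_changed(webhook_url: str, changed_urls: List[str], timeout: float = 10.0) -> int:
    """Send the webhook and return the HTTP status code."""
    request = build_webhook_request(webhook_url, changed_urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage (placeholder URL; copy the real one from the Webhook URL setting):
# notify_content_changed("https://example.invalid/your-webhook-url",
#                        ["https://example.com/pricing"])
```

Calling this from a CMS publish hook or a deploy script is one way to get sub-daily updates without waiting for the scheduled run.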
Detecting changes
AskVault uses three signals to detect content changes:
- HTTP ETag. Most modern hosts return ETags. Unchanged ETag = unchanged content.
- Last-Modified header. Fallback when ETag isn't available.
- Content hash. For sources without HTTP metadata (file uploads, API integrations), AskVault hashes the content and compares.
If a page reports unchanged but the content actually changed (a rare bug in caching layers), trigger a manual full re-sync under Knowledge Hub > [source] > Force Resync.
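The order of the three signals can be sketched as a fallback chain. This is an illustrative reading of the list above, not AskVault's implementation: the names are hypothetical, SHA-256 is an assumed choice of hash, and treating a page with no usable signal as changed is a conservative assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageState:
    etag: Optional[str] = None            # HTTP ETag header, if the host sent one
    last_modified: Optional[str] = None   # HTTP Last-Modified header, if present
    content_hash: Optional[str] = None    # hash of the body, for sources without HTTP metadata


def sha256_hex(body: bytes) -> str:
    """Hash raw content (SHA-256 is an assumption; any stable hash would do)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous: PageState, current: PageState) -> bool:
    """Compare two observations of the same page, strongest signal first."""
    if previous.etag and current.etag:
        return previous.etag != current.etag
    if previous.last_modified and current.last_modified:
        return previous.last_modified != current.last_modified
    if previous.content_hash and current.content_hash:
        return previous.content_hash != current.content_hash
    # No comparable signal: assume changed so the page is re-processed.
    return True
```

A caching layer that keeps serving an old ETag after the body changed would make this function return False, which is the case the forced resync exists for.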
Prune dead links
During re-sync, AskVault detects URLs that 404 (deleted pages) and removes them from the vector index. Default behavior.
Configure under Knowledge Hub > [source] > Prune Dead Links:
- Enabled (default). Deleted pages get removed within one sync cycle.
- Disabled. Deleted pages stay in the index until you manually remove them. Useful when you want to preserve historical content the bot can still cite.
For sites with planned URL changes (redirects), set Follow Redirects to On so the crawler indexes the new URL and removes the old one.
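How the two settings interact can be sketched as follows. The function and type names are hypothetical, and two details are assumptions rather than documented behavior: only permanent redirects (301, 308) are treated as URL moves, and with Follow Redirects off the old URL is simply left in place.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class CrawlResult:
    status: int                        # HTTP status seen during re-sync
    redirect_to: Optional[str] = None  # Location target for redirects, if any


def reconcile_index(
    indexed_urls: Set[str],
    results: Dict[str, CrawlResult],
    prune_dead_links: bool = True,
    follow_redirects: bool = False,
) -> Set[str]:
    """Return the set of URLs that should be in the index after this sync."""
    kept: Set[str] = set()
    for url in indexed_urls:
        result = results.get(url)
        if result is None:
            kept.add(url)  # not re-checked in this run; leave as-is
        elif result.status == 404:
            if not prune_dead_links:
                kept.add(url)  # pruning disabled: keep historical content
        elif result.status in (301, 308) and result.redirect_to and follow_redirects:
            kept.add(result.redirect_to)  # index the new URL, drop the old one
        else:
            kept.add(url)
    return kept
```

With the defaults, a page that now returns 404 disappears from the result, while a redirected page keeps its old URL until Follow Redirects is switched on.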
Sync schedule per source
Different sources can have different schedules within the same workspace:
- Blog (changes daily). Daily sync.
- Help articles (changes weekly). Weekly sync.
- Privacy policy (changes rarely). Manual sync.
- Stripe-connected product catalog. Webhook-triggered sync.
Mix freely. The total sync cost stays proportional to actual changes, not the number of sources.
Sync log and audit
Every sync run logs:
- Start and end time.
- Pages crawled, new, updated, removed.
- Errors per page (timeouts, 5xx, parsing failures).
- Total time.
Visible under Knowledge Hub > [source] > Sync History. Retained 90 days. Useful for debugging when content seems stale.
Cost considerations
Re-sync uses AskVault compute and counts marginally toward your monthly query quota:
- Crawl + parse costs a few cents per 100 pages.
- Re-embedding counts as approximately 1 query per changed page (charged against your monthly quota).
For a 1,000-page workspace with 50 daily changes, expect about 1,500 queries per month consumed by sync. Plan accordingly.
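The estimate above is simple arithmetic: changed pages per day, times days in the month, times roughly one query per changed page. A small helper makes it easy to plug in your own numbers; the function name and the 30-day month are assumptions for illustration.

```python
def monthly_sync_queries(
    changed_pages_per_day: int,
    days_per_month: int = 30,
    queries_per_changed_page: float = 1.0,
) -> float:
    """Estimate the queries consumed by re-embedding in one month.

    Uses the article's approximation of about 1 query per changed page.
    Unchanged pages are skipped by incremental sync and cost no queries.
    """
    return changed_pages_per_day * days_per_month * queries_per_changed_page


# 50 changed pages per day over a 30-day month:
print(monthly_sync_queries(50))  # 1500.0
```

Note that the total number of pages in the workspace does not appear in the formula; only the change rate does.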
Limits
- Sync frequency cap. Scheduled sync runs at most once per day; sub-daily updates require a webhook trigger.
- Concurrent syncs per workspace. Up to 3 sources can sync in parallel.
- Per-sync timeout. 4 hours. Sources that don't complete within this get split into multiple sync runs.
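The concurrency limit behaves like a fixed-size gate: a fourth source waits until one of the three running syncs finishes. The sketch below models that with an asyncio semaphore. It illustrates the behavior only; `run_syncs` and `sync_one` are hypothetical names, not part of any AskVault API.

```python
import asyncio
from typing import Awaitable, Callable, List

MAX_CONCURRENT_SYNCS = 3  # per-workspace cap from the limits above


async def run_syncs(
    sources: List[str],
    sync_one: Callable[[str], Awaitable[str]],
) -> List[str]:
    """Run `sync_one` for every source, at most three at a time."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_SYNCS)

    async def guarded(source: str) -> str:
        # Sources beyond the cap wait here until a slot frees up.
        async with semaphore:
            return await sync_one(source)

    # gather() returns results in the same order as `sources`.
    return await asyncio.gather(*(guarded(s) for s in sources))
```

With five sources queued at once, three start immediately and the remaining two start as earlier ones complete.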
Common pitfalls
Bot still quotes old content after sync. The vector index might be cached at the retrieval layer. Trigger a forced resync under Knowledge Hub > [source] > Force Resync; this rebuilds the index from scratch.
Sync never finishes. Source has rate-limit issues, JS-rendering timeouts, or auth problems. Check Knowledge Hub > [source] > Sync Errors for specifics.
Daily sync runs at the wrong time. All scheduled syncs run at midnight UTC by default. For per-source scheduling at different times, contact support@askvault.co (Enterprise can customize).
Webhook sync fires but no update happens. Webhook URL is misconfigured or auth fails. Check Knowledge Hub > [source] > Webhook Deliveries for delivery status.
FAQ
Can I pause re-sync temporarily?
Yes. Toggle Knowledge Hub > [source] > Sync Schedule > Paused. Resume by switching back to Daily/Weekly.
Does re-sync replace audience tags?
No. Audience tags are workspace-side configuration; re-sync only re-indexes content. Your tags persist across syncs.
What happens to chunks from deleted pages?
Removed from the vector index. The bot stops retrieving them. Any conversation already in flight that cited the page keeps the citation; the citation link 404s if the customer clicks.
Can I see what changed in the last sync?
Yes. Knowledge Hub > [source] > Sync History > Latest > Diff shows pages added, modified, and removed.
Does re-sync count toward my query quota?
Yes, marginally. About 1 query per changed page. A 1,000-page workspace with 50 daily changes uses approximately 1,500 queries per month for sync.