Batch URL ingestion
When to use batch ingest
Three patterns:
- Migrating from another platform. You have a list of URLs from Chatbase/SiteGPT to re-index.
- Onboarding without a sitemap. You have a known set of important URLs to index.
- Targeted re-index of a subset after content changes.
For full-site crawls without a URL list, see full-site crawl.
Method 1: CSV upload
- Prepare a CSV with one URL per line. Optional columns: audience,tags.
- Knowledge Hub > Add Source > Batch URL > Upload CSV.
- Click Upload.
- AskVault validates each URL (format and reachability).
- Parallel crawl starts within 60 seconds.
CSV size cap: 10 MB (about 100,000 URLs).
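Checking the list locally before upload can catch malformed URLs and an oversized file early. The sketch below is a minimal pre-flight check, not AskVault's validator: the function names are hypothetical, it only checks URL format (not reachability), and it assumes the 10 MB cap means 10 * 1024 * 1024 bytes.

```python
from urllib.parse import urlsplit

# Assumption: the documented 10 MB cap is interpreted as 10 * 1024 * 1024 bytes.
MAX_CSV_BYTES = 10 * 1024 * 1024


def is_well_formed(url: str) -> bool:
    """Return True for absolute http(s) URLs that include a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:  # raised for some malformed hosts, e.g. bad IPv6 brackets
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_csv(urls):
    """Return (csv_text, rejected): one valid URL per line, plus the rejects."""
    valid, rejected = [], []
    for url in urls:
        (valid if is_well_formed(url) else rejected).append(url.strip())
    csv_text = "\n".join(valid) + ("\n" if valid else "")
    return csv_text, rejected


def fits_size_cap(csv_text: str) -> bool:
    """Check the encoded size against the assumed byte cap."""
    return len(csv_text.encode("utf-8")) <= MAX_CSV_BYTES
```

Write `csv_text` to a file only if `fits_size_cap` returns True; otherwise split the list into several uploads.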
Method 2: paste list
For smaller batches:
- Knowledge Hub > Add Source > Batch URL > Paste.
- Paste up to 1,000 URLs (one per line).
- Click Start.
Method 3: API
For programmatic ingestion:
curl -X POST https://api.askvault.co/v1/documents/batch-crawl \
  -H "Authorization: Bearer ak_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://yoursite.co/page1", "https://yoursite.co/page2"],
    "audience": "public"
  }'
Returns a job ID; poll for completion.
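For lists longer than the per-call limit of 1,000 URLs (see Limits), the same request can be prepared in chunks. The following is a rough sketch using only the Python standard library. The endpoint, headers, and body fields come from the curl example above; the helper names are hypothetical, and the code only builds the requests without sending them.

```python
import json
import urllib.request

API_URL = "https://api.askvault.co/v1/documents/batch-crawl"
MAX_URLS_PER_CALL = 1000  # per-call limit listed under Limits


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_requests(urls, api_key, audience="public"):
    """Build one POST request per chunk of up to 1,000 URLs (nothing is sent)."""
    requests = []
    for chunk in chunked(list(urls), MAX_URLS_PER_CALL):
        body = json.dumps({"urls": chunk, "audience": audience}).encode("utf-8")
        requests.append(urllib.request.Request(
            API_URL,
            data=body,
            method="POST",
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        ))
    return requests
```

Each request can then be sent with `urllib.request.urlopen(req)`. The response field that carries the job ID and the polling endpoint are not specified here, so take them from the API reference rather than guessing. Because a workspace allows 3 concurrent batch jobs, submitting more than three chunks at once may require waiting for earlier jobs to finish.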
Crawl rate
Configurable:
- Default 30 URLs in flight.
- Per-host throttle. Max 8 URLs in flight per origin (prevents hammering your own server).
- Customizable under Source Settings.
For a 1,000-URL batch on a healthy site: about 5 to 15 minutes to complete.
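As a back-of-the-envelope check on that figure, batch duration can be approximated as total URLs times average seconds per page, divided by the number of URLs in flight. The sketch below encodes that simple model. It is an assumption about how the limits combine (the global limit of 30 and the per-host limit of 8 times the number of hosts, whichever is lower), and the per-page times are hypothetical inputs, not AskVault measurements.

```python
def estimate_minutes(total_urls, seconds_per_url, hosts=1,
                     global_limit=30, per_host_limit=8):
    """Rough batch duration in minutes under a simple steady-state model."""
    # Effective concurrency: capped globally, per host, and by the batch size.
    in_flight = min(global_limit, per_host_limit * hosts, total_urls)
    if in_flight <= 0:
        return 0.0
    return total_urls * seconds_per_url / in_flight / 60


# 1,000 URLs spread over 4 hosts at a hypothetical 9 s per page: 30 in flight.
print(estimate_minutes(1000, 9, hosts=4))   # 5.0
# The same batch on a single host is capped at 8 in flight.
print(estimate_minutes(1000, 3, hosts=1))   # 6.25
```

Under this model, a single-host batch is bound by the per-host throttle rather than the global limit, which is why raising that throttle is the usual fix for a slow batch.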
Audience and tags per URL
CSV with metadata:
url,audience,tags
https://yoursite.co/pricing,public,pricing
https://yoursite.co/enterprise-docs,enterprise,enterprise|docs
https://yoursite.co/internal-runbook,internal,runbook
Each URL inherits the specified audience and tags.
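A metadata CSV in this shape can be generated with Python's csv module, which also handles quoting if a value ever contains a comma. This is a minimal sketch; `write_batch_csv` is a hypothetical helper, and joining multiple tags with `|` is inferred from the `enterprise|docs` example above.

```python
import csv
import io


def write_batch_csv(rows):
    """Render (url, audience, tags) tuples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "audience", "tags"])
    for url, audience, tags in rows:
        # Multiple tags share one column, separated by "|" as in the example.
        writer.writerow([url, audience, "|".join(tags)])
    return buffer.getvalue()


rows = [
    ("https://yoursite.co/pricing", "public", ["pricing"]),
    ("https://yoursite.co/enterprise-docs", "enterprise", ["enterprise", "docs"]),
    ("https://yoursite.co/internal-runbook", "internal", ["runbook"]),
]
print(write_batch_csv(rows), end="")
```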
Progress monitoring
Dashboard shows:
- Total URLs in batch.
- Queued / Indexing / Ready / Failed counts.
- ETA based on current rate.
- Per-URL detail (click a URL to open it).
Failed URLs surface with the reason (404, timeout, blocked, etc.).
Re-running
Options for re-crawling URLs in a batch:
- All URLs: trigger full re-crawl.
- Failed only: retry just the failed ones.
- Stale only: re-crawl URLs not synced in N days.
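The three modes amount to three filters over the batch's URL records. The sketch below illustrates that selection logic locally. It is not AskVault's implementation: the record shape (`url`, `status`, `last_synced`) is hypothetical, and treating never-synced URLs as stale is an assumption.

```python
from datetime import datetime, timedelta, timezone


def select_for_recrawl(records, mode, stale_days=30, now=None):
    """Return the URLs a given re-run mode would pick from `records`."""
    now = now or datetime.now(timezone.utc)
    if mode == "all":
        return [r["url"] for r in records]
    if mode == "failed":
        return [r["url"] for r in records if r["status"] == "failed"]
    if mode == "stale":
        cutoff = now - timedelta(days=stale_days)
        # Assumption: a URL that has never synced counts as stale.
        return [
            r["url"] for r in records
            if r["last_synced"] is None or r["last_synced"] < cutoff
        ]
    raise ValueError(f"unknown mode: {mode!r}")
```

Here `stale_days` plays the role of N in "not synced in N days".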
Limits
- CSV size. 10 MB.
- URLs per single API call. 1,000.
- Concurrent batch jobs. 3 per workspace.
- Crawl rate. Up to 30 in flight; 8 per host.
Common pitfalls
Rate limited by your own server. Lower the per-host throttle.
Many failures. The site is probably blocking our crawler. Allowlist our crawler IPs or use a BYOK scraper.
Slow batch. The per-host throttle is the likely bottleneck. Increase it under Source Settings.
Duplicates. The same URL appears with different query strings. Normalize URLs before upload.
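One way to normalize before upload is to lowercase the scheme and host, drop fragments and tracking parameters, and sort the remaining query parameters so equivalent URLs compare equal. The sketch below does this with urllib.parse. Which parameters count as tracking noise is an assumption here (`utm_*`, `gclid`, `fbclid`), so adjust the list for your site, and skip the sorting step if parameter order matters on your pages.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change page content on your site.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}


def normalize(url: str) -> str:
    """Return a canonical form of `url` for duplicate detection."""
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_PARAMS
    ]
    query.sort()  # make parameter order irrelevant
    path = parts.path or "/"
    # The empty last element drops any #fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def dedupe(urls):
    """Return normalized URLs in first-seen order, without duplicates."""
    seen, result = set(), []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```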
FAQ
Can I cancel a batch mid-flight?
Yes. Use the Cancel button in the dashboard. In-flight URLs complete; queued ones are dropped.
Will batch affect my plan's MB cap?
Yes. Each crawled page counts toward the MB cap.
Can I pause and resume?
Pause is supported on Business plans and above. Pausing holds the queue; resuming picks up where it left off.