Advanced scraping settings
When to use
For sites that crawl poorly with defaults:
- Anti-bot protection (Cloudflare, hCaptcha challenges).
- JS-heavy rendering (React/Vue/Angular SPAs with no SSR).
- Login-walled content.
- Rate-limited sites that drop our requests.
Settings available
Six knobs:
- User-agent string. Default is AskVault-Bot/1.0. Override for sites that allowlist specific UAs.
- JS rendering. Toggle headless rendering on/off. Slower but handles SPAs.
- Retry behavior. Number of retries on 5xx, timeout window.
- BYOK scraper. Bring your own ScrapingBee/Bright Data API key for harder sites.
- Per-host throttle. Concurrent fetches per origin.
- Custom headers. Add cookies, auth tokens (see cookies for login crawls).
JS rendering
By default, AskVault tries a lightweight HTTP fetch first and falls back to headless rendering when it detects JS-required content.
Force JS rendering if your site is purely SPA-based:
- Settings > Advanced Scraping > Force JS Rendering.
- Set per source or globally.
Trade-off: 2 to 5x slower than HTTP fetch.
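The fetch-first, render-on-fallback decision can be sketched roughly as below. This is an illustration only, not AskVault's actual detection logic: the function names, the "very little visible text" heuristic, and the 200-character threshold are all assumptions made for the example.

```python
from html.parser import HTMLParser


class _VisibleTextCounter(HTMLParser):
    """Counts visible text characters, skipping script/style/noscript bodies."""

    _SKIPPED = ("script", "style", "noscript")

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.visible_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_js_required(html: str, min_visible_chars: int = 200) -> bool:
    """Hypothetical heuristic: an almost empty body suggests an unrendered SPA shell."""
    counter = _VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.visible_chars < min_visible_chars


def choose_fetch_mode(html: str, force_js: bool = False) -> str:
    """Return "headless" when forced, or when the HTTP response looks like a shell."""
    if force_js or looks_js_required(html):
        return "headless"
    return "http"
```

With Force JS Rendering on, the check is skipped entirely, which is why forcing it only makes sense for sources known to be SPAs.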
BYOK scraper
For aggressively-protected sites (Cloudflare Pro, custom anti-bot):
- Settings > BYOK Scraper > Provider.
- Pick provider (ScrapingBee, Bright Data, etc.).
- Paste your API key.
- AskVault routes through your account for protected URLs.
You pay the upstream provider. AskVault doesn't surcharge.
Per-host throttle
For respecting your own server's capacity:
- Default 8 in flight per host.
- Lower to 3 if your server struggles.
- Raise to 20 if your server has capacity.
Configure per source.
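To make "in flight per host" concrete, here is a minimal sketch of a per-host concurrency cap built on one semaphore per host. The class and method names are hypothetical, and keying on the URL's host and port is an assumption; AskVault's internal implementation may differ.

```python
import asyncio
from urllib.parse import urlsplit


class PerHostThrottle:
    """Caps concurrent fetches per host (hypothetical helper, not an AskVault API)."""

    def __init__(self, default_limit: int = 8):
        self.default_limit = default_limit
        self._overrides = {}   # host -> custom limit
        self._semaphores = {}  # host -> asyncio.Semaphore

    def set_limit(self, host: str, limit: int) -> None:
        """Override the limit for one host; call before the first fetch to that host."""
        self._overrides[host] = limit

    def _semaphore_for(self, url: str) -> asyncio.Semaphore:
        host = urlsplit(url).netloc
        if host not in self._semaphores:
            limit = self._overrides.get(host, self.default_limit)
            self._semaphores[host] = asyncio.Semaphore(limit)
        return self._semaphores[host]

    async def run(self, url: str, fetch):
        """Await fetch(url) while holding one of the host's slots."""
        async with self._semaphore_for(url):
            return await fetch(url)
```

Lowering the limit from 8 to 3 means at most three requests hit the server at once, so the crawl takes longer but the load stays flatter.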
Custom headers
For sites with custom auth or rate limiting:
{
  "Authorization": "Bearer xxx",
  "X-API-Key": "yyy",
  "Cookie": "session=zzz"
}
Sent on every crawl request. See cookies for login crawls for cookies specifically.
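Before pasting a header set, it can help to check it locally. The sketch below validates a header dictionary against the 10-header limit listed under Limits and shows what a redacted copy might look like in a log. The function names, the set of header names treated as sensitive, and the "[REDACTED]" placeholder are assumptions for illustration, not AskVault's actual behavior.

```python
MAX_CUSTOM_HEADERS = 10  # from the Limits section
# Assumed list of names to treat as sensitive; adjust to your own setup.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def validate_headers(headers: dict) -> dict:
    """Reject header sets that exceed the limit or contain line breaks."""
    if len(headers) > MAX_CUSTOM_HEADERS:
        raise ValueError(
            f"{len(headers)} headers given; at most {MAX_CUSTOM_HEADERS} allowed"
        )
    for name, value in headers.items():
        if any(ch in name + value for ch in "\r\n"):
            raise ValueError(f"header {name!r} contains a line break")
    return dict(headers)


def redact_for_log(headers: dict) -> dict:
    """Return a copy that is safe to print, with sensitive values masked."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Header names are compared case-insensitively, since HTTP header names are not case-sensitive.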
Retry behavior
For transient errors:
- 5xx. Retry 3 times with exponential backoff.
- 429 rate limit. Wait per Retry-After header.
- Timeout. 30 seconds default; configurable.
About 95% of crawls complete successfully with defaults.
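The retry rules above can be sketched as a small loop. This is a simplified model under stated assumptions: the function names are hypothetical, the backoff base of 1 second is invented for the example, 429 responses are counted against the same retry budget as 5xx, and only the numeric form of Retry-After is parsed.

```python
import time
from typing import Callable, Optional, Tuple

Response = Tuple[int, dict]  # (status code, response headers)


def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... (base is an assumed value)."""
    return base * (2 ** attempt)


def retry_after_seconds(headers: dict) -> Optional[float]:
    """Parse a numeric Retry-After value; the HTTP-date form is not handled here."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None


def fetch_with_retries(
    fetch: Callable[[], Response],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call fetch() until it succeeds or the retry budget is spent."""
    attempt = 0
    while True:
        status, headers = fetch()
        if status == 429:
            # Prefer the server's hint; fall back to backoff if it is missing.
            delay = retry_after_seconds(headers)
            if delay is None:
                delay = backoff_delay(attempt)
        elif 500 <= status < 600:
            delay = backoff_delay(attempt)
        else:
            return status, headers
        if attempt >= max_retries:
            return status, headers  # give up and surface the last response
        sleep(delay)
        attempt += 1
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.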
Limits
- JS rendering capacity. About 100 concurrent renderings per workspace.
- BYOK provider plan limits apply.
- Custom headers count. Up to 10.
- Retry attempts. 3 over 30 seconds.
- Setup time. About 15 minutes.
Common pitfalls
JS render too slow. Most pages don't need it. Only force for SPAs.
BYOK provider quota. Monitor with provider; AskVault doesn't track your quota.
Throttle too tight. Crawl takes hours. Raise the per-host limit.
Secrets in headers showing up in the audit log. Sensitive headers are redacted there.
FAQ
Will I get banned from my own site?
Configure the user-agent so your site can identify the crawler, and set the per-host throttle so the crawl respects your server's capacity.
Can I use my own ScrapingBee free tier?
Yes. AskVault works with any compatible BYOK provider.
Does JS rendering work for React, Vue, Angular?
Yes. Headless Chromium handles all standard frameworks.