
Advanced scraping settings


When to use

For sites that crawl poorly with defaults:

  • Anti-bot protection (Cloudflare, hCaptcha challenges).
  • JS-heavy rendering (React/Vue/Angular SPAs with no SSR).
  • Login-walled content.
  • Rate-limited sites that drop our requests.

Settings available

Six knobs:

  1. User-agent string. Default is AskVault-Bot/1.0. Override for sites that allowlist specific UAs.
  2. JS rendering. Toggle headless rendering on/off. Slower but handles SPAs.
  3. Retry behavior. Number of retries on 5xx, timeout window.
  4. BYOK scraper. Bring your own ScrapingBee/Bright Data API key for harder sites.
  5. Per-host throttle. Maximum concurrent fetches per origin.
  6. Custom headers. Add cookies, auth tokens (see cookies for login crawls).
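For orientation, here is a minimal sketch of the six settings gathered into one Python object. The class, its field names, and the "auto"/"force"/"off" rendering modes are hypothetical, not AskVault's API. Only the defaults (AskVault-Bot/1.0, 3 retries, a 30-second timeout, 8 fetches per host) and the 10-header cap come from this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_CUSTOM_HEADERS = 10  # "Up to 10" custom headers, per the Limits section


@dataclass
class ScrapeSettings:
    """Hypothetical model of the six settings; names are illustrative only."""

    user_agent: str = "AskVault-Bot/1.0"
    js_rendering: str = "auto"            # assumed modes: "auto", "force", "off"
    max_retries: int = 3                  # retries on 5xx responses
    timeout_seconds: float = 30.0
    byok_provider: Optional[str] = None   # e.g. "scrapingbee"; None = no BYOK routing
    byok_api_key: Optional[str] = None
    per_host_concurrency: int = 8         # fetches in flight per origin
    custom_headers: Dict[str, str] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the settings look valid."""
        found = []
        if self.js_rendering not in ("auto", "force", "off"):
            found.append(f"unknown JS rendering mode: {self.js_rendering!r}")
        if self.max_retries < 0:
            found.append("max_retries must be >= 0")
        if self.timeout_seconds <= 0:
            found.append("timeout_seconds must be > 0")
        if self.per_host_concurrency < 1:
            found.append("per_host_concurrency must be >= 1")
        if len(self.custom_headers) > MAX_CUSTOM_HEADERS:
            found.append(f"at most {MAX_CUSTOM_HEADERS} custom headers are allowed")
        if self.byok_provider and not self.byok_api_key:
            found.append("a BYOK provider needs an API key")
        return found
```

Returning a list of problems instead of raising lets a settings form show every issue at once.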

JS rendering

By default, AskVault tries a lightweight HTTP fetch first and falls back to headless rendering if it detects content that requires JavaScript.
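AskVault doesn't document how it detects JS-required content. As an illustration only, one common heuristic treats a fetched page as an unrendered SPA shell if it contains an empty mount point (such as `<div id="root"></div>`) or very little visible text. The function name and the 200-character threshold are assumptions.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristic; not AskVault's actual detection logic.
SKIPPED_TAGS = {"script", "style", "noscript"}
EMPTY_MOUNT_POINT = re.compile(
    r'<div\s+id=["\'](?:root|app)["\']\s*>\s*</div>', re.IGNORECASE
)


class VisibleTextCollector(HTMLParser):
    """Collects text nodes, ignoring the contents of script/style/noscript."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: list = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def needs_js_rendering(html: str, min_text_chars: int = 200) -> bool:
    """True if the HTML from a plain fetch looks like an unrendered SPA shell."""
    collector = VisibleTextCollector()
    collector.feed(html)
    collector.close()
    # Collapse whitespace so indentation doesn't count as content.
    text = " ".join(" ".join(collector.chunks).split())
    return bool(EMPTY_MOUNT_POINT.search(html)) or len(text) < min_text_chars
```

A very short static page would also trip the text threshold. That costs one unnecessary render, not a broken crawl.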

Force JS rendering if your site is purely SPA-based:

  1. Settings > Advanced Scraping > Force JS Rendering.
  2. Set per source or globally.
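The page doesn't say which setting wins when a source and the workspace disagree. A small sketch, assuming a per-source value overrides the global one and "auto" defers to detection (both function names are hypothetical):

```python
from typing import Optional


def effective_js_mode(source_mode: Optional[str], global_mode: str = "auto") -> str:
    """Assumed precedence: a per-source setting, when present, beats the global one."""
    return source_mode if source_mode is not None else global_mode


def should_render(mode: str, looks_js_dependent: bool) -> bool:
    """'force' always renders, 'off' never does, 'auto' follows detection."""
    if mode == "force":
        return True
    if mode == "off":
        return False
    return looks_js_dependent
```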

Trade-off: rendering is 2 to 5x slower than a plain HTTP fetch.

BYOK scraper

For aggressively protected sites (Cloudflare Pro, custom anti-bot systems):

  1. Settings > BYOK Scraper > Provider.
  2. Pick a provider (ScrapingBee, Bright Data, etc.).
  3. Paste your API key.
  4. AskVault routes requests for protected URLs through your provider account.

You pay the upstream provider directly. AskVault adds no surcharge.
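To show what "routing through your account" means in practice, here is a sketch that rewrites a protected URL into a ScrapingBee request. It assumes ScrapingBee's HTML API endpoint and its `api_key`/`url` query parameters; verify both against ScrapingBee's current documentation. `fetch_url_for` and the `protected_hosts` set are hypothetical and are not how AskVault decides which URLs count as protected.

```python
from typing import Optional, Set
from urllib.parse import urlencode, urlsplit

# ScrapingBee's HTML API endpoint (check the provider's docs before relying on it).
SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def fetch_url_for(target: str, protected_hosts: Set[str], api_key: Optional[str]) -> str:
    """Return the URL to request: the target itself, or a provider URL for protected hosts."""
    host = (urlsplit(target).hostname or "").lower()
    if api_key and host in protected_hosts:
        # urlencode percent-encodes the target so its own query string survives.
        query = urlencode({"api_key": api_key, "url": target})
        return f"{SCRAPINGBEE_ENDPOINT}?{query}"
    return target
```

Unprotected hosts are fetched directly, so only the hard URLs count against your provider quota.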

Per-host throttle

Use this to stay within your own server's capacity:

  • Default 8 in flight per host.
  • Lower to 3 if your server struggles.
  • Raise to 20 if your server has spare capacity.

Configure per source.
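A per-host throttle is usually built from one semaphore per origin. The sketch below illustrates the technique with `asyncio`; it is not AskVault's crawler. `PerHostLimiter`, `crawl`, and the injected `fetch` coroutine are hypothetical, and the default of 8 mirrors the setting above.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit


class PerHostLimiter:
    """Caps concurrent fetches per origin (scheme + host + port)."""

    def __init__(self, limit: int = 8) -> None:
        self._limit = limit
        # Semaphores are created lazily, inside the running event loop.
        self._sems = defaultdict(lambda: asyncio.Semaphore(self._limit))

    def slot(self, url: str) -> asyncio.Semaphore:
        parts = urlsplit(url)
        return self._sems[(parts.scheme, parts.netloc)]


async def crawl(urls, fetch, limit: int = 8):
    """Fetch all URLs concurrently, never exceeding `limit` in flight per origin."""
    limiter = PerHostLimiter(limit)

    async def one(url):
        async with limiter.slot(url):
            return await fetch(url)

    return await asyncio.gather(*(one(u) for u in urls))
```

Different origins never wait on each other, so a slow server only throttles its own URLs.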

Custom headers

For sites that use custom authentication or rate limiting:

{
  "Authorization": "Bearer xxx",
  "X-API-Key": "yyy",
  "Cookie": "session=zzz"
}

These headers are sent on every crawl request. For cookies specifically, see cookies for login crawls.
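The page doesn't say how custom headers combine with the crawler's own. A minimal sketch, assuming custom headers are layered on top of the default User-Agent and win on a name clash. Because HTTP header names are case-insensitive, a clash is matched case-insensitively. `build_request_headers` is a hypothetical helper.

```python
from typing import Dict, Optional

DEFAULT_HEADERS = {"User-Agent": "AskVault-Bot/1.0"}


def build_request_headers(custom: Dict[str, str], user_agent: Optional[str] = None) -> Dict[str, str]:
    """Merge defaults, an optional UA override, and custom headers (assumed precedence)."""
    headers = dict(DEFAULT_HEADERS)
    if user_agent:
        headers["User-Agent"] = user_agent
    for name, value in custom.items():
        # Drop any existing header whose name differs only in case.
        for existing in [k for k in headers if k.lower() == name.lower()]:
            del headers[existing]
        headers[name] = value
    return headers
```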

Retry behavior

For transient errors:

  • 5xx. Retry 3 times with exponential backoff.
  • 429 rate limit. Wait as long as the Retry-After header specifies.
  • Timeout. 30 seconds default; configurable.

About 95% of crawls complete successfully with the default settings.
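The retry rules above can be sketched as a function that returns how long to wait before the next attempt. This illustrates the technique, not AskVault's code. The 1-second base delay and the function name are assumptions. The three-retry cap and the Retry-After handling follow the list. Retry-After may hold either a number of seconds or an HTTP date, so the sketch handles both. Timeouts are left to the HTTP client (30 seconds by default).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_RETRIES = 3
BASE_DELAY = 1.0  # seconds; hypothetical, the docs don't state the base delay


def retry_delay(status: int, attempt: int, retry_after: Optional[str] = None,
                now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if attempt >= MAX_RETRIES:
        return None
    if status == 429 and retry_after:
        value = retry_after.strip()
        if value.isdigit():  # delta-seconds form, e.g. "120"
            return float(value)
        try:  # HTTP-date form, e.g. "Mon, 01 Jan 2024 00:00:30 GMT"
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    if status == 429 or 500 <= status <= 599:
        # Exponential backoff: 1 s, 2 s, 4 s for attempts 0, 1, 2.
        return BASE_DELAY * (2 ** attempt)
    return None  # other statuses (e.g. 404) are not retried
```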

Limits

  • JS rendering capacity. About 100 concurrent renderings per workspace.
  • BYOK provider plan limits apply.
  • Custom header count. Up to 10.
  • Retry attempts. 3 over 30 seconds.
  • Setup time. About 15 minutes.

Common pitfalls

JS rendering too slow. Most pages don't need it; force it only for SPAs.

BYOK provider quota. Monitor usage with your provider; AskVault doesn't track your quota.

Throttle too tight. If a crawl takes hours, raise the per-host limit.

Secrets leaking into the audit log through headers. Sensitive header values are redacted in the audit log.
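Which headers AskVault treats as sensitive isn't listed here. As an illustration, audit-log redaction usually replaces the values of a fixed set of credential-bearing headers, matched case-insensitively. The set below and `redact_for_audit` are hypothetical.

```python
from typing import Dict

# Hypothetical list of credential-bearing header names (compared lowercased).
SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "cookie", "x-api-key"}


def redact_for_audit(headers: Dict[str, str]) -> Dict[str, str]:
    """Copy of `headers` that is safe to log: sensitive values are masked."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Header names stay visible, so the log still shows that a credential was sent without exposing it.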

FAQ

Will I get banned from my own site?

Set a user-agent that identifies the crawler and a per-host throttle your server can handle.

Can I use my own ScrapingBee free tier?

Yes. AskVault works with any compatible BYOK provider.

Does JS rendering work for React, Vue, Angular?

Yes. Headless Chromium handles all standard frameworks.
