
Cookies for authenticated crawls


When to use

Cookie authentication fits three cases:

  1. Customer portal docs that require login.
  2. Paid-content pages for subscriber sites.
  3. Internal wiki accessible only to authenticated users.

For sources that support OAuth (Notion, Confluence, etc.), prefer OAuth. Cookies are for sites without API access.

Setup

Setup takes about 10 minutes:

Step 1: log in to the target site

In your browser, log in normally.

Step 2: extract cookies

Use a browser extension such as "Cookie Editor" to export the site's cookies as JSON, or open DevTools > Application > Cookies and copy the values into the JSON format below.

[
  {"name": "session_id", "value": "abc123...", "domain": ".yoursite.co"},
  {"name": "auth_token", "value": "xyz789...", "domain": ".yoursite.co"}
]

Step 3: paste into AskVault

  1. Knowledge Hub > URL Crawl source > Advanced > Authentication.
  2. Pick "Cookies".
  3. Paste the JSON.
  4. Save.

Step 4: trigger crawl

Start the crawl. AskVault sends the cookies with every request, so pages behind the login can now be crawled.
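To make "sends the cookies with every request" concrete, here is a minimal sketch of how an exported cookie list becomes a single Cookie request header. The function name `cookie_header` is hypothetical, and this illustrates the general HTTP mechanism rather than AskVault's internal code.

```python
import json

def cookie_header(exported_json: str) -> str:
    """Build a Cookie request header value from an exported JSON cookie list."""
    cookies = json.loads(exported_json)
    # Each entry is expected to carry at least "name" and "value".
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)

# Placeholder values, shaped like the export shown in Step 2.
exported = """[
  {"name": "session_id", "value": "abc123", "domain": ".yoursite.co"},
  {"name": "auth_token", "value": "xyz789", "domain": ".yoursite.co"}
]"""

header = cookie_header(exported)
print(header)  # session_id=abc123; auth_token=xyz789
```

Running this locally before pasting is also a quick way to confirm the export is valid JSON with the expected fields.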

Cookies expire, and the type determines when:

  • Session cookies. Expire when the browser closes.
  • Persistent cookies. Last days to months.

AskVault stores cookies encrypted at rest. When they expire, crawls fail with a 401 error. Re-extract the cookies and update the source.
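A local check can flag cookies that are about to expire before a crawl starts failing. The sketch below assumes the export includes an `expirationDate` field in Unix seconds, as some browser extensions emit; the sample JSON above does not show that field, so verify it against your own export. The function name and the 7-day window are illustrative.

```python
import json
import time

def expiring_cookies(exported_json, within_days=7, now=None):
    """Return (name, reason) pairs for cookies that need attention."""
    now = time.time() if now is None else now
    cutoff = now + within_days * 86400
    flagged = []
    for cookie in json.loads(exported_json):
        expires = cookie.get("expirationDate")  # assumed field, Unix seconds
        if expires is None:
            # No recorded expiry: treat as a session cookie.
            flagged.append((cookie["name"], "session cookie (no expiry recorded)"))
        elif expires <= cutoff:
            flagged.append((cookie["name"], "expires soon"))
    return flagged

sample = json.dumps([
    {"name": "auth_token", "value": "x", "expirationDate": 1_000_000 + 86400},
    {"name": "prefs", "value": "y", "expirationDate": 1_000_000 + 30 * 86400},
    {"name": "session_id", "value": "z"},
])
print(expiring_cookies(sample, within_days=7, now=1_000_000))
```

With the fixed `now` used here, `auth_token` is flagged as expiring soon, `prefs` passes, and `session_id` is flagged as a session cookie.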

Auto-refresh

Some sites rotate cookies automatically. AskVault captures these rotations during crawls:

  • Set-Cookie response headers are processed.
  • New cookies are stored for subsequent requests.

This is useful for sticky-session sites that rotate cookies every few hours.
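The rotation capture described above can be sketched with Python's standard `http.cookies` module. This is an illustration of the general technique, not AskVault's implementation; `apply_set_cookie` and the plain-dict cookie jar are hypothetical, and the sketch ignores attributes such as Max-Age, Domain, and Path.

```python
from http.cookies import SimpleCookie

def apply_set_cookie(jar: dict, set_cookie_headers: list) -> dict:
    """Return a copy of the jar updated with values from Set-Cookie headers."""
    updated = dict(jar)
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)  # parses "name=value; Path=/; ..." into morsels
        for name, morsel in parsed.items():
            updated[name] = morsel.value  # rotated value replaces the old one
    return updated

jar = {"session_id": "abc123", "auth_token": "xyz789"}
rotated = apply_set_cookie(jar, ["session_id=def456; Path=/; HttpOnly; Secure"])
print(rotated)  # {'session_id': 'def456', 'auth_token': 'xyz789'}
```

The next request would then be built from `rotated`, so the crawl continues with the fresh `session_id` while `auth_token` is left unchanged.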

Security considerations

Cookies are sensitive:

  • Treat them like passwords. Don't share them or commit them to version control.
  • AskVault stores them encrypted at rest.
  • Rotate them every 30 days as a best practice.
  • Use a dedicated service-account login rather than your personal account.

Limits

  • Cookies per source. 50.
  • Cookie value length. 4 KB each.
  • Re-extraction frequency. Up to you; recommended every 30 days.
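The limits above can be checked locally before pasting an export. This is a minimal sketch: `check_cookie_export` is a hypothetical helper, and reading "4 KB" as 4096 bytes of UTF-8 is an assumption.

```python
import json

MAX_COOKIES = 50             # cookies per source, per the limits above
MAX_VALUE_BYTES = 4 * 1024   # assumption: "4 KB" means 4096 bytes

def check_cookie_export(exported_json: str) -> list[str]:
    """Return a list of problems; an empty list means the export looks fine."""
    cookies = json.loads(exported_json)
    problems = []
    if len(cookies) > MAX_COOKIES:
        problems.append(f"{len(cookies)} cookies exceeds the limit of {MAX_COOKIES}")
    for i, cookie in enumerate(cookies):
        missing = [k for k in ("name", "value", "domain") if k not in cookie]
        if missing:
            problems.append(f"cookie #{i} is missing {', '.join(missing)}")
            continue
        size = len(cookie["value"].encode("utf-8"))
        if size > MAX_VALUE_BYTES:
            problems.append(
                f"{cookie['name']}: value is {size} bytes, limit is {MAX_VALUE_BYTES}"
            )
    return problems

oversized = json.dumps([{"name": "auth_token", "value": "x" * 5000, "domain": ".yoursite.co"}])
print(check_cookie_export(oversized))
```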

Alternatives

If cookies are too brittle:

  • OAuth. For Notion, Confluence, GitHub, Salesforce. Preferred.
  • API tokens. For sites that offer an API with token authentication (Jira, Stripe, etc.).
  • Direct content upload. Manually upload PDFs of the behind-login content.
  • Public mirror. Some docs sites maintain a public mirror for SEO; crawl that instead.

Common pitfalls

Cookies expire during crawl. The first pages succeed and later ones fail. Re-extract the cookies and run the crawl again.

Cookie matches wrong domain. The cookie's domain must match the site's domain exactly or be a parent domain of it (for example, .yoursite.co covers docs.yoursite.co).
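The domain rule can be expressed as a short check. This sketch follows the usual cookie domain-matching logic (exact match, or the host ends with a dot plus the cookie domain); `domain_matches` is a hypothetical helper, it does not special-case IP addresses, and it is not taken from AskVault.

```python
def domain_matches(cookie_domain: str, host: str) -> bool:
    """True if a cookie with this domain would be sent to the given host."""
    cookie_domain = cookie_domain.lstrip(".").lower()  # a leading dot is ignored
    host = host.lower()
    # Exact match, or host is a subdomain of the cookie's domain.
    return host == cookie_domain or host.endswith("." + cookie_domain)

print(domain_matches(".yoursite.co", "docs.yoursite.co"))   # True
print(domain_matches(".yoursite.co", "notyoursite.co"))     # False
print(domain_matches("docs.yoursite.co", "yoursite.co"))    # False
```

The last case is the common mistake: a cookie scoped to a subdomain is not sent to the parent domain.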

MFA-protected accounts. Session cookies obtained after an MFA login skip the MFA prompt, but only briefly. Use a service account without MFA, or re-extract cookies more often.

Cookies leak via logs. AskVault redacts cookies in audit logs.
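The same precaution applies to any scripts you write around cookie extraction, since their output sits outside AskVault's audit logs. A minimal sketch of redacting Cookie and Set-Cookie header values from log text follows; the function name and the `[REDACTED]` marker are illustrative, and this is not AskVault's redaction logic.

```python
import re

# Matches a "Cookie:" or "Set-Cookie:" header line, case-insensitively.
_COOKIE_HEADER = re.compile(r"(?im)^((?:set-)?cookie:\s*)(.*)$")
# Matches each name=value pair inside the header value.
_PAIR = re.compile(r"([^=;\s]+)=([^;]*)")

def redact_cookies(text: str) -> str:
    """Replace every value in Cookie/Set-Cookie header lines with a marker."""
    def _redact_line(match):
        redacted = _PAIR.sub(lambda p: p.group(1) + "=[REDACTED]", match.group(2))
        return match.group(1) + redacted
    return _COOKIE_HEADER.sub(_redact_line, text)

log = "GET /docs\nCookie: session_id=abc123; auth_token=xyz789"
print(redact_cookies(log))
```

Cookie names are kept so the log remains useful for debugging; only the values are replaced.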

FAQ

Is this safe?

Yes, if you use a dedicated service account. Don't use your personal admin account.

Can I use this for a site I don't own?

Only if you have explicit authorization. Respect the site's Terms of Service.

What if the site rotates cookies aggressively?

AskVault's rotation capture (see Auto-refresh) helps. On sites that rotate very aggressively, the crawl may still break frequently; in that case, consider one of the alternatives.
