Cookies for authenticated crawls
When to use
Three cases:
- Customer portal docs that require login.
- Paid-content pages for subscriber sites.
- Internal wiki accessible only to authenticated users.
For OAuth-supporting sources (Notion, Confluence, etc.), prefer OAuth. Cookies are for sites without API access.
Setup
Setup takes about 10 minutes:
Step 1: log in to the target site
In your browser, log in normally.
Step 2: extract cookies
Use a browser extension like "Cookie Editor" or open DevTools > Application > Cookies. Export as JSON.
[
  {"name": "session_id", "value": "abc123...", "domain": ".yoursite.co"},
  {"name": "auth_token", "value": "xyz789...", "domain": ".yoursite.co"}
]
Step 3: paste into AskVault
- Knowledge Hub > URL Crawl source > Advanced > Authentication.
- Pick "Cookies".
- Paste the JSON.
- Save.
Step 4: trigger crawl
AskVault sends the cookies with every request. Behind-login pages now crawl successfully.
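To illustrate what "sends the cookies with every request" means at the HTTP level, here is a minimal sketch that turns an exported cookie list into a single Cookie header value. The function name build_cookie_header and the cookie values are hypothetical placeholders; this is not AskVault's actual implementation, only the standard name=value; name=value header format.

```python
import json

def build_cookie_header(exported_json: str) -> str:
    """Turn an exported cookie list (JSON text) into one Cookie header value."""
    cookies = json.loads(exported_json)
    # Each entry contributes one name=value pair; pairs are joined with "; ".
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)

# Placeholder values for illustration only; real values come from your browser export.
exported = json.dumps([
    {"name": "session_id", "value": "example-session-value", "domain": ".yoursite.co"},
    {"name": "auth_token", "value": "example-token-value", "domain": ".yoursite.co"},
])

header = build_cookie_header(exported)
print("Cookie:", header)
```

A crawler that attaches this header to each request appears to the site as the logged-in browser session the cookies were taken from.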
Cookie lifespan
How long a cookie lasts depends on its type:
- Session cookies. Expire on browser close.
- Persistent cookies. Last days to months.
AskVault stores cookies encrypted at rest. When they expire, crawls fail with 401. Re-extract and update.
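Before pasting cookies, it can help to check which ones are session cookies and which have already expired. The sketch below assumes the export includes an expirationDate field holding a Unix timestamp in seconds, as some browser extensions produce; if your export uses a different field name, adjust accordingly. The function classify_cookie is a hypothetical helper, not part of AskVault.

```python
import time
from typing import Optional

def classify_cookie(cookie: dict, now: Optional[float] = None) -> str:
    """Return "session", "expired", or "persistent" for one exported cookie."""
    now = time.time() if now is None else now
    # Assumption: "expirationDate" is a Unix timestamp in seconds.
    expires = cookie.get("expirationDate")
    if expires is None:
        return "session"      # no expiry recorded: ends with the browser session
    if expires <= now:
        return "expired"      # already past its expiry time
    return "persistent"       # still valid until the recorded expiry

# Fixed reference time so the example is reproducible.
reference_time = 1_700_000_000
examples = [
    {"name": "session_id", "value": "placeholder"},
    {"name": "auth_token", "value": "placeholder", "expirationDate": reference_time + 86_400},
    {"name": "old_token", "value": "placeholder", "expirationDate": reference_time - 1},
]
kinds = [classify_cookie(c, now=reference_time) for c in examples]
print(kinds)
```

Cookies reported as expired will not authenticate the crawl, so re-extract them before saving.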
Auto-refresh
Some sites auto-rotate cookies. AskVault captures rotations during crawls:
- Set-Cookie response headers are processed.
- New cookies are stored for subsequent requests.
Useful for sticky-session sites that rotate every few hours.
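The rotation capture described above can be sketched as follows, using Python's standard http.cookies module to parse Set-Cookie headers and update a stored cookie set. The function apply_set_cookie and the dictionary-based store are assumptions made for illustration; AskVault's internal handling is not documented here.

```python
from http.cookies import SimpleCookie

def apply_set_cookie(store: dict, set_cookie_headers: list) -> dict:
    """Return a copy of the cookie store updated with rotated values."""
    updated = dict(store)
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)  # parses "name=value; Path=/; HttpOnly" style headers
        for name, morsel in parsed.items():
            # A rotated cookie replaces the stored value under the same name.
            updated[name] = morsel.value
    return updated

# Placeholder values for illustration only.
store = {"session_id": "old-value", "auth_token": "token-value"}
rotated = apply_set_cookie(store, ["session_id=new-value; Path=/; HttpOnly"])
print(rotated)
```

Only the cookie named in the response changes; the others carry over to the next request unchanged.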
Security considerations
Cookies are sensitive:
- Treat like passwords. Don't share or commit.
- AskVault stores them encrypted at rest.
- Rotate every 30 days as best practice.
- Use a dedicated service-account login rather than your personal account.
Limits
- Cookies per source. 50.
- Cookie value length. 4 KB each.
- Re-extraction frequency. Up to you; recommended every 30 days.
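A quick pre-flight check against the limits above might look like the following sketch. It assumes "4 KB" means 4096 bytes measured on the UTF-8 encoded value, which the limits do not spell out; check_limits is a hypothetical helper, not an AskVault API.

```python
MAX_COOKIES_PER_SOURCE = 50
MAX_VALUE_BYTES = 4 * 1024  # assumption: 4 KB = 4096 bytes of UTF-8

def check_limits(cookies: list) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(cookies) > MAX_COOKIES_PER_SOURCE:
        problems.append(
            f"{len(cookies)} cookies exceeds the limit of {MAX_COOKIES_PER_SOURCE}"
        )
    for cookie in cookies:
        size = len(cookie["value"].encode("utf-8"))
        if size > MAX_VALUE_BYTES:
            problems.append(
                f"cookie {cookie['name']!r} value is {size} bytes, over {MAX_VALUE_BYTES}"
            )
    return problems

# Placeholder cookies for illustration only.
ok = [{"name": "session_id", "value": "placeholder", "domain": ".yoursite.co"}]
too_big = [{"name": "blob", "value": "x" * 5000, "domain": ".yoursite.co"}]
print(check_limits(ok))
print(check_limits(too_big))
```

Running this on the exported JSON before pasting catches oversized exports early, for example when an extension dumps analytics cookies along with the session cookies.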
Alternatives
If cookies are too brittle:
- OAuth. For Notion, Confluence, GitHub, Salesforce. Preferred.
- API tokens. For sites that expose an API with token authentication (Jira, Stripe, etc.).
- Direct content upload. Manually upload PDFs of the behind-login content.
- Public mirror. Many docs sites maintain a public mirror for SEO.
Common pitfalls
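Cookies expire during crawl. Initial pages succeed; later pages fail. Re-extract the cookies and re-run the crawl.
Cookie matches wrong domain. The cookie's domain must match the site's domain exactly or be a parent domain of it (for example, .yoursite.co covers docs.yoursite.co).
MFA-protected accounts. A cookie taken from a session that has already passed MFA lets the crawl skip the MFA prompt, but only briefly, until that session expires. Use service accounts without MFA, or refresh cookies more often.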
Cookies leak via logs. AskVault redacts cookies in audit logs.
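For the wrong-domain pitfall above, the matching rule can be sketched in a few lines. This follows the usual cookie domain-matching convention (exact match, or the host is a subdomain of the cookie's domain); domain_matches is a hypothetical helper, and AskVault's exact matching logic may differ.

```python
def domain_matches(cookie_domain: str, host: str) -> bool:
    """True if a cookie set for cookie_domain would be sent to host."""
    # A leading dot (".yoursite.co") is treated the same as no dot.
    cookie_domain = cookie_domain.lstrip(".").lower()
    host = host.lower()
    # Exact match, or host is a subdomain of the cookie's domain.
    return host == cookie_domain or host.endswith("." + cookie_domain)

print(domain_matches(".yoursite.co", "docs.yoursite.co"))
print(domain_matches("docs.yoursite.co", "yoursite.co"))
```

The first call matches because the cookie's domain is a parent of the host; the second does not, because a cookie scoped to a subdomain is not sent to its parent.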
FAQ
Is this safe?
If using a dedicated service account, yes. Don't use your personal admin account.
Can I use this for a site I don't own?
Only if you have explicit authorization. Respect Terms of Service.
What if the site rotates cookies aggressively?
AskVault's rotation-capture helps. For very aggressive sites, the crawl may break frequently; consider alternatives.