Ingest knowledge from Google Drive

What gets indexed

  • Google Docs. Full body, headings, lists.
  • Google Sheets. First sheet, rows as chunks.
  • Google Slides. Slide titles plus body text.
  • PDFs stored in Drive. Text-based PDFs only; scanned PDFs need OCR before upload.
  • DOCX, XLSX, PPTX uploaded to Drive.

What's not indexed:

  • Images and videos, apart from their captions.
  • Files in trashed folders.
  • Shared-with-me content unless explicitly selected.
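
The lists above can be sketched as a small filter. The MIME type strings are the values Google Drive reports for these file kinds; the `is_indexable` helper and the idea that the connector filters this way are assumptions made for illustration, not a description of the actual implementation.

```python
# Drive MIME types for the file kinds listed above.
INDEXABLE_MIME_TYPES = {
    "application/vnd.google-apps.document": "Google Docs",
    "application/vnd.google-apps.spreadsheet": "Google Sheets",
    "application/vnd.google-apps.presentation": "Google Slides",
    "application/pdf": "PDF",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "DOCX",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "XLSX",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": "PPTX",
}


def is_indexable(mime_type: str, trashed: bool = False) -> bool:
    """Hypothetical helper: True if a Drive file would qualify for indexing."""
    if trashed:
        # Trashed content is never indexed.
        return False
    return mime_type in INDEXABLE_MIME_TYPES


print(is_indexable("application/pdf"))        # a PDF qualifies
print(is_indexable("image/png"))              # images do not
print(is_indexable("application/pdf", True))  # trashed files do not
```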

Setup

Setup takes 10 to 15 minutes:

  1. Knowledge Hub > Add Source > Google Drive.
  2. Click "Connect Google Drive".
  3. Sign in with Google.
  4. Approve scopes: drive.readonly.
  5. Pick folder(s) to index.
  6. Trigger the initial sync. About 100 docs index in roughly 5 minutes.
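
For planning a larger first sync, the figure in step 6 can be extrapolated. This is a rough sketch that assumes indexing time scales linearly with document count, which the figure above does not guarantee; `estimate_initial_sync_minutes` is a hypothetical helper.

```python
# Reference figure from the setup steps: about 100 docs in about 5 minutes.
REFERENCE_DOCS = 100
REFERENCE_MINUTES = 5


def estimate_initial_sync_minutes(doc_count: int) -> float:
    """Linear extrapolation (an assumption) of initial sync time in minutes."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    return doc_count * REFERENCE_MINUTES / REFERENCE_DOCS


print(estimate_initial_sync_minutes(400))
```

Treat the result as an order-of-magnitude guide; large files or many Sheets may behave differently.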

Folder selection

Three options:

  • Specific folder plus subfolders.
  • Shared Drive (Team Drive).
  • Whole "My Drive" (rarely a good choice; usually too noisy).

Recommended: a dedicated "AskVault Knowledge" folder where you curate what's indexed.

Sync behavior

  • Scheduled sync. Every 6 hours by default.
  • Webhook-triggered sync. Within 60 seconds of a file edit.
  • Manual sync any time.
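
The scheduled part of this behavior can be sketched as follows. The 6-hour default and the 1-hour minimum come from this page; the function name and the choice to reject shorter intervals with an error are assumptions for illustration.

```python
from datetime import datetime, timedelta

DEFAULT_INTERVAL_HOURS = 6  # default scheduled sync interval
MIN_INTERVAL_HOURS = 1      # shortest interval allowed (see Limits)


def next_scheduled_sync(last_sync: datetime,
                        interval_hours: float = DEFAULT_INTERVAL_HOURS) -> datetime:
    """Hypothetical helper: when the next scheduled sync would run."""
    if interval_hours < MIN_INTERVAL_HOURS:
        raise ValueError("sync interval cannot be shorter than 1 hour")
    return last_sync + timedelta(hours=interval_hours)


last = datetime(2024, 1, 1, 9, 0)
print(next_scheduled_sync(last))     # default interval
print(next_scheduled_sync(last, 1))  # shortest allowed interval
```

Webhook-triggered and manual syncs run independently of this schedule.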

Use cases

Internal helpdesk. Index "Employee Handbook" Drive folder. Employees query via Slack bot.

Sales playbook. Index sales-team Drive folder. Sales reps query the bot for objection-handling answers.

Customer-facing docs. Marketing keeps public-facing Drive docs indexed for the website bot.

Audience tags

Tags are set per folder:

  • internal for employee-only Drive folders.
  • public for marketing-facing docs.
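
One way to picture how tags might gate retrieval: each indexed chunk carries its folder's tag, and each bot is given the set of tags it may read. The `Chunk` type and `visible_chunks` function are hypothetical, and whether an internal bot also sees public content is an assumption you would decide per deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    audience: str  # "internal" or "public", inherited from the folder tag


def visible_chunks(chunks: list[Chunk], allowed_audiences: set[str]) -> list[Chunk]:
    """Keep only chunks whose audience tag the caller is allowed to read."""
    return [c for c in chunks if c.audience in allowed_audiences]


chunks = [
    Chunk("Vacation policy", "internal"),
    Chunk("Pricing overview", "public"),
]

# Website bot: public content only.
print([c.text for c in visible_chunks(chunks, {"public"})])
# Slack helpdesk bot: assumed here to read both tags.
print([c.text for c in visible_chunks(chunks, {"internal", "public"})])
```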

Permissions

The connector requests the OAuth scope drive.readonly:

  • Read access only.
  • Limited to files the OAuth user can see.
  • No write or share permissions.
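
For readers curious what the consent step looks like underneath, here is a minimal sketch of a Google OAuth 2.0 authorization URL that requests only the read-only Drive scope. The endpoint, scope URL, and parameter names are Google's; the client ID and redirect URI are placeholders, and the connector builds the real URL for you, so this is illustrative only.

```python
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
DRIVE_READONLY_SCOPE = "https://www.googleapis.com/auth/drive.readonly"


def build_consent_url(client_id: str, redirect_uri: str) -> str:
    """Build an authorization URL asking for read-only Drive access."""
    params = {
        "client_id": client_id,        # placeholder value in this sketch
        "redirect_uri": redirect_uri,  # placeholder value in this sketch
        "response_type": "code",
        "scope": DRIVE_READONLY_SCOPE,
        "access_type": "offline",      # asks for a refresh token for scheduled syncs
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


print(build_consent_url("example-client-id", "https://example.com/oauth/callback"))
```

Because only drive.readonly is requested, a token issued from this flow cannot modify, delete, or share files.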

Limits

  • Files per source. 1,000.
  • File size per file. 50 MB.
  • Minimum sync interval. 1 hour.
  • OAuth re-auth. Every 6 months recommended.
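
A pre-flight check against the file-count and file-size limits might look like the sketch below. `check_source` is a hypothetical helper, and reading "50 MB" as 50 × 1024 × 1024 bytes is an assumption; the page does not say which definition of MB applies.

```python
MAX_FILES_PER_SOURCE = 1_000
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: MB interpreted as MiB


def check_source(file_sizes: dict[str, int]) -> list[str]:
    """Return a list of limit violations for a folder; empty if it fits."""
    problems = []
    if len(file_sizes) > MAX_FILES_PER_SOURCE:
        problems.append(
            f"{len(file_sizes)} files exceeds the {MAX_FILES_PER_SOURCE}-file limit"
        )
    for name, size in file_sizes.items():
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{name} is larger than 50 MB")
    return problems


folder = {"handbook.pdf": 2_000_000, "export.xlsx": 60 * 1024 * 1024}
for problem in check_source(folder):
    print(problem)
```

Running a check like this before the first sync helps explain why some files never appear in the index.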

Planned features

  • Image OCR for scanned PDFs in Drive.
  • Real-time Drive change notifications.
  • Sheets formula-aware retrieval.

Common pitfalls

Files missing. The OAuth user can't see them. Grant access in Drive, then re-sync.

Scanned PDFs empty. OCR has not been run. Run OCR on the files before uploading them.

Sheets too large. The cap is 50 MB per file. Filter columns or split the sheet.

FAQ

Does this work with Google Workspace?

Yes. Same OAuth flow as personal Google.

Can I index a shared Drive?

Yes, if the OAuth user is a member.

Will Drive changes by other people propagate?

Yes, via webhook (within 60 seconds) or the next scheduled sync.
