Crawl whole sites.
In one job.
From a 500-page documentation site to a 100,000-page knowledge base, the crawler walks the link graph, respects robots.txt by default, and delivers clean results to your webhook when the job is done.
curl -X POST https://api.datasonar.dev/v1/crawl \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "max_pages": 500,
    "depth": 3,
    "concurrency": 10,
    "respect_robots": true,
    "webhook_url": "https://yourapp.com/datasonar-callback"
  }'
# Returns: { "status": "queued", "job_id": "..." }
Use cases for site-wide crawling
LLM training corpora
Crawl entire documentation sites, knowledge bases, or product catalogs. Receive clean markdown for every page, ready to embed.
Competitor monitoring
Snapshot a competitor's site on a schedule. Diff structural changes, new pages, removed pages, modified pricing.
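As a rough illustration of the diffing step, the sketch below compares two crawl snapshots. It assumes each snapshot has already been reduced to a mapping of page URL to page content; that shape, the function name diff_snapshots, and the sample URLs are hypothetical and are not the crawler's actual result format.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash page content so large pages can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {url: content} snapshots (hypothetical shape).

    Returns the URLs that were added, removed, or modified between them.
    """
    old_urls, new_urls = set(old), set(new)
    modified = [
        url
        for url in old_urls & new_urls
        if content_hash(old[url]) != content_hash(new[url])
    ]
    return {
        "added": sorted(new_urls - old_urls),
        "removed": sorted(old_urls - new_urls),
        "modified": sorted(modified),
    }


# Made-up sample data, for illustration only.
last_week = {
    "https://competitor.example/pricing": "Pro plan: $49/month",
    "https://competitor.example/about": "About us",
    "https://competitor.example/legacy": "Old product page",
}
this_week = {
    "https://competitor.example/pricing": "Pro plan: $59/month",
    "https://competitor.example/about": "About us",
    "https://competitor.example/launch": "New product page",
}

changes = diff_snapshots(last_week, this_week)
print(changes)
```

Hashing is optional at this scale, but it lets you store only a digest per page between runs instead of the full content.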
Internal search indexing
Build full-text search over content you do not own. The crawler returns a structured map of every page it discovers, ready for ingestion into your index.
Archival and compliance
Capture full-site snapshots for legal hold, regulatory archive, or pre-acquisition due diligence. Webhook delivery means no long-held connections.
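Because results arrive as a POST to your own endpoint, the receiver will usually want to check that each delivery is genuine before trusting it. Webhooks are signed, but the signing scheme is not described on this page, so the sketch below is only an assumption: it supposes an HMAC-SHA256 of the raw request body, hex-encoded, using a shared secret. The function name verify_webhook, the secret, and the sample payload are all hypothetical; check the API reference for the real header name and algorithm.

```python
import hashlib
import hmac


def verify_webhook(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Return True if signature_hex equals HMAC-SHA256(secret, raw_body).

    ASSUMPTION: hex-encoded HMAC-SHA256 over the raw body. The actual
    scheme may differ; treat this as a template, not the documented API.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing leaks.
    return hmac.compare_digest(expected, signature_hex)


# Simulate a delivery locally with made-up values.
secret = b"example-webhook-secret"
body = b'{"job_id": "job_123", "status": "completed"}'
good_signature = hmac.new(secret, body, hashlib.sha256).hexdigest()

is_valid = verify_webhook(body, good_signature, secret)
is_forged = verify_webhook(body, "0" * 64, secret)
is_tampered = verify_webhook(body + b" ", good_signature, secret)
print(is_valid, is_forged, is_tampered)
```

Verify against the raw bytes of the request body rather than a re-serialized JSON object, since re-encoding can change whitespace or key order and invalidate the signature.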
Crawler questions
How big can a crawl be?
Does the crawler respect robots.txt?
Yes, by default. Use the respect_robots: false flag in cases where you have explicit permission to crawl, such as your own site or a partner's.
Can I scope the crawl to a single subdomain?
Set same_host: false to follow links across subdomains, or use include_patterns and exclude_patterns for finer control.
How does webhook delivery work?
Pass a webhook_url with the request. When the job completes, we send a single POST to your URL with the full result body and a header containing the job id. Webhooks are signed so you can verify the origin.
What happens to a crawl if I hit my monthly quota midway?
quota_exceeded flag. You can upgrade your plan and resume the job with the same job id.
How fast is a crawl?
Start your first crawl free.
The free tier covers 1,000 pages a month. Plenty to prove value.