Scrape any page.
Get clean data.

Five output formats, smart routing, native action macros, and an async pattern that handles batches of any size. One endpoint family for everything from a single URL to a hundred million.

Five output formats

Markdown for LLMs, HTML for fidelity, plain text for indexing, link arrays for crawling, and the original response for diagnostics. Choose per request. No transcoding pipeline needed.
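Switching output is a one-field change. A minimal sketch that asks for the link array instead of markdown, using the same /v1/scrape endpoint shown further down:

curl -X POST https://api.datasonar.dev/v1/scrape \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "links"}'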

Smart routing

The smart-scrape endpoint detects whether a page needs JavaScript and routes accordingly. Static pages skip the browser and return in milliseconds. Dynamic pages get the full stealth browser treatment automatically.
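A minimal smart-scrape sketch, assuming /v1/scrape/smart accepts the same url and format fields as /v1/scrape:

curl -X POST https://api.datasonar.dev/v1/scrape/smart \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "markdown"}'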

Action macros

Drive clicks, typing, waits, scrolls, and form submissions from a JSON array. No Puppeteer or Playwright code. The same description works for every site.
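A sketch of a form-submission macro. Only the click and wait action types appear in the example further down; the type action and its selector and value fields are illustrative, not a confirmed schema:

curl -X POST https://api.datasonar.dev/v1/scrape \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/search",
    "format": "markdown",
    "actions": [
      {"type": "type", "selector": "input[name=q]", "value": "quarterly revenue"},
      {"type": "click", "selector": "button[type=submit]"},
      {"type": "wait", "ms": 1500}
    ]
  }'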

Async with webhooks

Fire large batches, walk away, get a clean POST to your endpoint when each job completes. Built for production pipelines that should not hold connections open.
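A sketch of an async submission with a webhook, assuming the batch endpoint takes a urls array and a webhook_url field; both field names are illustrative rather than a confirmed contract:

curl -X POST https://api.datasonar.dev/v1/scrape/batch \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/a", "https://example.com/b"],
    "format": "markdown",
    "webhook_url": "https://yourapp.example/hooks/datasonar"
  }'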

Examples that work today.

Drop any of the snippets on this page into a terminal or notebook.

curl -X POST https://api.datasonar.dev/v1/scrape \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown",
    "stealth": true,
    "actions": [
      {"type": "click", "selector": ".load-more"},
      {"type": "wait", "ms": 2000}
    ]
  }'

Scraping questions

What formats does /v1/scrape return?
Five: markdown for LLM-friendly text, html for raw page source, text for stripped plain text, links for the URL graph extracted from the page, and original for the raw response without transformation. Set the format field in the request body.
When should I use /v1/scrape vs /v1/scrape/smart?
Use smart when you do not know whether the target page needs JavaScript. It is faster on static pages and falls back to the full browser when needed. Use scrape directly when you already know the page is dynamic, or when you need to pass a custom action macro.
How big can a batch be?
Up to 100 URLs per synchronous batch call. For larger jobs use the async batch endpoint, which queues the work and posts to a webhook when each chunk completes — no upper bound beyond your monthly quota.
Does DataSonar handle JavaScript-rendered single page apps?
Yes. The default browser engine waits for network idle and DOM stability before extracting. You can also pass wait_until values for finer control and use action macros for click-and-wait patterns common to SPAs.
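A sketch that tightens the wait condition for a heavy SPA. The wait_until parameter is the one mentioned above; the "networkidle" value here is illustrative:

curl -X POST https://api.datasonar.dev/v1/scrape \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://app.example.com/dashboard", "format": "markdown", "wait_until": "networkidle"}'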
What about pages that require login?
Use the action macro to drive the login form, then continue with subsequent steps in the same session. For pages that need a persistent authenticated session across many requests, talk to us about enterprise options.
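A sketch of a login macro; the type action name and its value field are illustrative, not a confirmed schema, and the credentials are placeholders:

curl -X POST https://api.datasonar.dev/v1/scrape \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/login",
    "format": "markdown",
    "actions": [
      {"type": "type", "selector": "#email", "value": "user@example.com"},
      {"type": "type", "selector": "#password", "value": "..."},
      {"type": "click", "selector": "button[type=submit]"},
      {"type": "wait", "ms": 3000}
    ]
  }'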
How do I scrape paginated content?
Two patterns. For numbered pagination, send the page-2, page-3, page-N URLs as a batch. For infinite-scroll pagination, use the action macro to scroll and wait until no new items load (or the load-more button stops appearing), as in the sketch below.
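A scroll-and-wait sketch for an infinite feed. The scroll action name and its "to" field are illustrative; repeat the pair as many times as the feed needs:

curl -X POST https://api.datasonar.dev/v1/scrape \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/feed",
    "format": "markdown",
    "actions": [
      {"type": "scroll", "to": "bottom"},
      {"type": "wait", "ms": 1500},
      {"type": "scroll", "to": "bottom"},
      {"type": "wait", "ms": 1500}
    ]
  }'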
Is there a request timeout limit?
Each request has a per-call timeout parameter you can set up to 120 seconds. The default is 30 seconds for /v1/scrape and 120 seconds for /v1/scrape/batch.
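A sketch that raises the per-call ceiling for a slow page, assuming the parameter is named timeout and takes seconds:

curl -X POST https://api.datasonar.dev/v1/scrape \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/slow-report", "format": "markdown", "timeout": 120}'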

Start pulling clean data in minutes.

1,000 requests free every month. No credit card required.