Clean article extraction
Readability extraction that strips navigation, ads, and sidebars.
POST /v1/extract/clean
Returns the main article body of a page along with the title, word count, and estimated reading time. Built for LLM ingestion and content indexing.
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | yes | — | URL of the article. |
| stealth | boolean | no | true | Apply stealth countermeasures. |
| timeout | integer | no | 30 | Per-request timeout. |
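The table above maps directly onto a JSON request body. The following is a minimal sketch of building that body with the documented defaults; `build_payload` is a hypothetical helper for illustration, not part of any official client, and the validation rules are assumptions drawn only from the Type and Required columns.

```python
import json


def build_payload(url, stealth=True, timeout=30):
    """Build the JSON body for POST /v1/extract/clean (hypothetical helper)."""
    # url is the only required parameter.
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    if not isinstance(stealth, bool):
        raise TypeError("stealth must be a boolean")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int):
        raise TypeError("timeout must be an integer")
    return {"url": url, "stealth": stealth, "timeout": timeout}


if __name__ == "__main__":
    # Defaults from the table are applied when arguments are omitted.
    print(json.dumps(build_payload("https://en.wikipedia.org/wiki/Web_scraping")))
```

Sending the defaults explicitly is a design choice here; since both optional fields have server-side defaults, omitting them from the body should be equivalent.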
Request
curl -X POST https://api.datasonar.dev/v1/extract/clean \
  -H "Authorization: Bearer osk_..." \
  -d '{"url": "https://en.wikipedia.org/wiki/Web_scraping"}'
Response
{
  "status": "success",
  "title": "Web scraping - Wikipedia",
  "content_html": "<div>...</div>",
  "content_text": "Method of extracting data from websites...",
  "word_count": 3873,
  "reading_time_min": 16
}
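For callers working in Python, the request and response above might be wrapped roughly as follows. This is a sketch under stated assumptions, not an official client: `CleanArticle`, `parse_clean_response`, and `extract_clean` are hypothetical names, the `Content-Type: application/json` header is an assumption (the curl example does not set it), the 60-second client-side timeout is arbitrary, and treating any `status` other than `"success"` as an error is a guess, since the error format is not documented here.

```python
import json
import urllib.request
from dataclasses import dataclass

API_URL = "https://api.datasonar.dev/v1/extract/clean"


@dataclass
class CleanArticle:
    """Fields taken from the sample response (hypothetical container)."""
    title: str
    content_html: str
    content_text: str
    word_count: int
    reading_time_min: int


def parse_clean_response(raw):
    """Parse a response body string into a CleanArticle."""
    data = json.loads(raw)
    # Assumption: any status other than "success" signals a failed extraction.
    if data.get("status") != "success":
        raise RuntimeError(f"extraction failed: status={data.get('status')!r}")
    return CleanArticle(
        title=data["title"],
        content_html=data["content_html"],
        content_text=data["content_text"],
        word_count=data["word_count"],
        reading_time_min=data["reading_time_min"],
    )


def extract_clean(url, api_key):
    """POST the URL to the endpoint and return the parsed article."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API accepts an explicit JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # Client-side socket timeout; separate from the API's `timeout` parameter.
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_clean_response(response.read().decode("utf-8"))
```

Keeping the parsing separate from the network call means the field handling can be exercised against a saved response without an API key.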