Clean article extraction

Readability extraction: strips navigation, ads, and sidebars.

POST /v1/extract/clean

Returns the main article body of a page along with the title, word count, and estimated reading time. Built for LLM ingestion and content indexing.

Parameters

Name	Type	Required	Default	Description
url	string	yes	—	URL of the article.
stealth	boolean	no	true	Apply stealth countermeasures.
timeout	integer	no	30	Per-request timeout.

Request

curl -X POST https://api.qcrawl.com/v1/extract/clean \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://en.wikipedia.org/wiki/Web_scraping"}'
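The same request can be sketched in Python with only the standard library. The helper name `build_clean_request` is hypothetical, the JSON `Content-Type` header is an assumption about what the API expects, and the defaults simply mirror the parameter table above.

```python
import json
import urllib.request

API_URL = "https://api.qcrawl.com/v1/extract/clean"


def build_clean_request(url, token, stealth=True, timeout=30):
    """Build (but do not send) a POST request for /v1/extract/clean.

    Hypothetical helper; defaults mirror the documented parameter table.
    """
    payload = {"url": url, "stealth": stealth, "timeout": timeout}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_clean_request(
        "https://en.wikipedia.org/wiki/Web_scraping", "osk_..."
    )
    # The client-side socket timeout here is separate from the API's
    # `timeout` parameter sent in the request body.
    with urllib.request.urlopen(req, timeout=60) as resp:
        print(json.load(resp)["title"])
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.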

Response

{
  "status": "success",
  "title": "Web scraping - Wikipedia",
  "content_html": "<div>...</div>",
  "content_text": "Method of extracting data from websites...",
  "word_count": 3873,
  "reading_time_min": 16
}
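A minimal sketch of consuming this response in Python, for example before handing the text to an indexer. The `CleanArticle` type and `parse_clean_response` function are hypothetical names, and treating any `status` other than `"success"` as a failure is an assumption, since the error shape is not documented here.

```python
import json
from dataclasses import dataclass


@dataclass
class CleanArticle:
    """Fields taken from the documented success response."""
    title: str
    content_html: str
    content_text: str
    word_count: int
    reading_time_min: int


def parse_clean_response(body: str) -> CleanArticle:
    """Parse a JSON response body from /v1/extract/clean."""
    data = json.loads(body)
    # Assumption: anything other than "success" indicates a failed extraction.
    if data.get("status") != "success":
        raise ValueError(f"extraction failed: status={data.get('status')!r}")
    return CleanArticle(
        title=data["title"],
        content_html=data["content_html"],
        content_text=data["content_text"],
        word_count=data["word_count"],
        reading_time_min=data["reading_time_min"],
    )


if __name__ == "__main__":
    sample = """{
      "status": "success",
      "title": "Web scraping - Wikipedia",
      "content_html": "<div>...</div>",
      "content_text": "Method of extracting data from websites...",
      "word_count": 3873,
      "reading_time_min": 16
    }"""
    article = parse_clean_response(sample)
    print(article.title, article.word_count, article.reading_time_min)
```

For LLM ingestion, `content_text` is usually the field to pass on, while `content_html` preserves markup for indexing or rendering.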

Related

POST /v1/extract/structured
Structured data extraction

POST /v1/scrape
Scrape a single URL