Getting Started

BlazeCrawl turns any website into clean, LLM-ready data. Get started in under two minutes: create an account, grab your API key, and make your first request.

1. Create an Account

Head to blazecrawl-dev.web.app/login and sign up with your email, Google, or GitHub account. No credit card required.

2. Get Your API Key

After signing in, navigate to the Dashboard and create a new API key. Your key will look like bc_live_xxxxxxxxxxxxxxxx. Keep it safe — treat it like a password.

3. Make Your First Request

Scrape any URL and get markdown back. Here is a complete example:

Your first scrape
curl -X POST https://blazecrawl-dev.web.app/api/v1/scrape \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'

You will receive a JSON response like this:

Response
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "description": "Example Domain",
      "statusCode": 200,
      "url": "https://example.com"
    }
  }
}

The free tier includes 500 credits per month with 2 concurrent requests. Upgrade anytime from your dashboard.

Authentication

All API requests require a Bearer token in the Authorization header. You can create and manage API keys from your dashboard.

Authorization Header
Authorization: Bearer bc_live_xxxxxxxxxxxxxxxx

API Key Types

Prefix     Environment  Usage
bc_live_   Production   Live API access, metered usage
bc_test_   Test         Sandbox access, no credits consumed

Never expose your API key in client-side code. Always call the BlazeCrawl API from your backend server or a serverless function.
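As a quick illustration, a small server-side helper can build the header and reject keys that do not match the documented prefixes, so a truncated or misconfigured key fails fast instead of producing 401s at runtime. This is a hypothetical helper, not part of any official SDK:

```python
def auth_headers(api_key: str) -> dict:
    """Build the Authorization header for a BlazeCrawl request.

    Accepts only the documented key prefixes (bc_live_, bc_test_)
    so a malformed key is caught locally.
    """
    if not api_key.startswith(("bc_live_", "bc_test_")):
        raise ValueError("API key must start with bc_live_ or bc_test_")
    return {"Authorization": f"Bearer {api_key}"}
```

Pass the returned dict as the `headers` argument to your HTTP client, and load the key from an environment variable rather than hardcoding it.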

Scrape

The Scrape endpoint converts a single URL into clean markdown, HTML, or structured data. It handles JavaScript rendering, anti-bot bypassing, and dynamic content automatically.

POST /api/v1/scrape

Request Body

Parameter    Type      Required  Description
url          string    Required  The URL to scrape
format       string    Optional  Output format: "markdown" (default), "html", "text", "screenshot"
includeTags  string[]  Optional  Only include content from these CSS selectors
excludeTags  string[]  Optional  Exclude content matching these CSS selectors
waitFor      number    Optional  Wait time in ms after page load (for dynamic content)
timeout      number    Optional  Maximum request timeout in ms (default: 30000)
headers      object    Optional  Custom HTTP headers to send with the request
renderJs     boolean   Optional  Use Playwright for JavaScript rendering
screenshot   boolean   Optional  Include a base64 screenshot in the response
pdf          boolean   Optional  Include a base64 PDF in the response
skipCache    boolean   Optional  Bypass the response cache and fetch fresh content

Example Request

Scrape with options
curl -X POST https://blazecrawl-dev.web.app/api/v1/scrape \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com/getting-started",
    "format": "markdown",
    "excludeTags": ["nav", "footer", ".sidebar"],
    "waitFor": 2000
  }'

Response

200 OK
{
  "success": true,
  "data": {
    "markdown": "# Getting Started\n\nWelcome to the documentation...",
    "html": "<h1>Getting Started</h1><p>Welcome to the documentation...</p>",
    "metadata": {
      "title": "Getting Started - Example Docs",
      "description": "Learn how to get started with Example.",
      "language": "en",
      "statusCode": 200,
      "url": "https://docs.example.com/getting-started"
    }
  }
}

Use excludeTags to remove navigation, footers, and sidebars for cleaner LLM input. This can reduce token usage by 30-50%.
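When calling the endpoint directly, it helps to omit unset optional parameters so the server-side defaults apply. A minimal sketch of a request-body builder (a hypothetical helper, covering only a few of the parameters above):

```python
def scrape_payload(url, format="markdown", exclude_tags=None,
                   wait_for=None, timeout=None):
    """Assemble a /api/v1/scrape request body, omitting unset
    optionals so server defaults (e.g. timeout: 30000) apply."""
    body = {"url": url, "format": format}
    if exclude_tags:
        body["excludeTags"] = exclude_tags
    if wait_for is not None:
        body["waitFor"] = wait_for
    if timeout is not None:
        body["timeout"] = timeout
    return body
```

Send the result as the JSON body of a POST with your Authorization header attached.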

Crawl

The Crawl endpoint discovers and scrapes all pages on a website. It follows links, respects your configuration, and delivers results via webhook or polling. Ideal for indexing an entire docs site or building a knowledge base.

Start a Crawl

POST /api/v1/crawl

Parameter     Type      Required  Description
url           string    Required  The starting URL to crawl
maxPages      number    Optional  Maximum pages to crawl (default: 100)
maxDepth      number    Optional  Maximum link depth from the starting URL (default: 3)
includePaths  string[]  Optional  Only crawl URLs matching these glob patterns
excludePaths  string[]  Optional  Skip URLs matching these glob patterns
format        string    Optional  Output format for each page: "markdown" (default), "html", "text"
webhook       string    Optional  URL to receive crawl completion notification

Example Request

Start a crawl
curl -X POST https://blazecrawl-dev.web.app/api/v1/crawl \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxPages": 500,
    "maxDepth": 5,
    "includePaths": ["/docs/*", "/guides/*"],
    "excludePaths": ["/blog/*"],
    "format": "markdown"
  }'

Response (Crawl Started)

202 Accepted
{
  "success": true,
  "id": "crawl_abc123xyz",
  "url": "https://blazecrawl-dev.web.app/api/v1/crawl/crawl_abc123xyz"
}

Check Crawl Status

GET /api/v1/crawl/:id

Poll crawl status
curl https://blazecrawl-dev.web.app/api/v1/crawl/crawl_abc123xyz \
  -H "Authorization: Bearer bc_live_xxx"

200 OK — In Progress
{
  "success": true,
  "status": "crawling",
  "pagesFound": 142,
  "pagesCrawled": 87,
  "data": [
    {
      "url": "https://docs.example.com/intro",
      "markdown": "# Introduction\n\n...",
      "metadata": { "title": "Introduction", "statusCode": 200 }
    }
  ]
}

Use the webhook parameter instead of polling. BlazeCrawl will POST to your URL when the crawl completes, saving you from building a polling loop.
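If you do poll, keep the loop bounded and stop as soon as the status leaves "crawling". A sketch with an injectable status fetcher (the `fetch_status` callable stands in for a GET to /api/v1/crawl/:id; names here are illustrative, not SDK API):

```python
import time

def wait_for_crawl(fetch_status, interval=2.0, max_wait=600.0, sleep=time.sleep):
    """Poll a crawl until it leaves the "crawling" state or max_wait
    elapses. fetch_status returns the parsed status JSON; sleep is
    injectable so the loop is testable without real delays."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("status") != "crawling":
            return status
        if waited >= max_wait:
            raise TimeoutError("crawl did not finish in time")
        sleep(interval)
        waited += interval
```

In production, prefer the webhook parameter and reserve polling for local scripts and debugging.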

Map

The Map endpoint discovers every URL on a site without scraping content. It parses sitemaps, follows links, and returns a complete URL map. Use this to plan your crawl or understand site structure before consuming credits.

POST /api/v1/map

Parameter     Type      Required  Description
url           string    Required  The website URL to map
maxUrls       number    Optional  Maximum URLs to discover (default: 1000)
includePaths  string[]  Optional  Only include URLs matching these patterns
excludePaths  string[]  Optional  Exclude URLs matching these patterns
useSitemap    boolean   Optional  Parse sitemap.xml if available (default: true)

Example Request

Map a website
curl -X POST https://blazecrawl-dev.web.app/api/v1/map \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "maxUrls": 500,
    "includePaths": ["/docs/*"]
  }'

Response

200 OK
{
  "success": true,
  "count": 47,
  "urls": [
    "https://example.com/docs",
    "https://example.com/docs/getting-started",
    "https://example.com/docs/authentication",
    "https://example.com/docs/api-reference",
    "https://example.com/docs/sdks/python",
    "https://example.com/docs/sdks/node"
  ]
}

Map is free — it does not consume credits. Use it to discover URLs before crawling, so you only pay for the pages you actually need.
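One common pattern is to map first, filter the URL list locally, and feed only the survivors into a crawl or batch scrape. A sketch of client-side path filtering, approximating the includePaths/excludePaths glob semantics with Python's fnmatch (the exact server-side matching rules are an assumption here):

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def filter_urls(urls, include_paths=None, exclude_paths=None):
    """Apply includePaths/excludePaths-style glob patterns to a list
    of mapped URLs, matching against each URL's path component."""
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        if include_paths and not any(fnmatch(path, p) for p in include_paths):
            continue
        if exclude_paths and any(fnmatch(path, p) for p in exclude_paths):
            continue
        kept.append(url)
    return kept
```

Note that fnmatch's `*` also matches `/`, so "/docs/*" matches nested pages like "/docs/sdks/python".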

Extract

The Extract endpoint uses AI to pull structured data from any webpage. Define a JSON schema and BlazeCrawl will return perfectly formatted data — powered by Claude AI. Ideal for price monitoring, lead generation, and data pipelines.

POST /api/v1/extract

Parameter  Type    Required  Description
url        string  Required  The URL to extract data from
schema     object  Required  JSON Schema describing the data structure you want
prompt     string  Optional  Additional instructions for the AI extraction model
format     string  Optional  Source format for extraction: "markdown" (default), "html"

Example Request

Extract product data
curl -X POST https://blazecrawl-dev.web.app/api/v1/extract \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://store.example.com/product/wireless-headphones",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "rating": { "type": "number" },
        "reviewCount": { "type": "integer" },
        "inStock": { "type": "boolean" },
        "features": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    "prompt": "Extract the main product details from this page."
  }'

Response

200 OK
{
  "success": true,
  "data": {
    "name": "ProSound Wireless Headphones X3",
    "price": 149.99,
    "currency": "USD",
    "rating": 4.7,
    "reviewCount": 2847,
    "inStock": true,
    "features": [
      "Active noise cancellation",
      "40-hour battery life",
      "Bluetooth 5.3",
      "Multi-device pairing",
      "Foldable design"
    ]
  }
}

The Extract endpoint costs 5 credits per page (vs 1 for Scrape) because it uses AI processing. Use Scrape for simple content and Extract for structured data.
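To budget a pipeline before running it, you can tally the per-operation costs stated in these docs. A minimal estimator covering the costs mentioned so far (a hypothetical helper; extend the table for other operations):

```python
# Per-unit credit costs as documented: Scrape 1 per page, Extract 5
# per page, Interact 2 per request.
CREDIT_COSTS = {"scrape": 1, "extract": 5, "interact": 2}

def estimate_credits(operations):
    """Estimate credit spend for a list of (operation, count) pairs."""
    return sum(CREDIT_COSTS[op] * n for op, n in operations)
```

For example, scraping 100 pages and extracting from 10 of them would cost an estimated 150 credits.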

Search

The Search endpoint performs a web search and returns scraped, LLM-ready content for each result. Combine search with scraping in a single API call — perfect for RAG pipelines and research agents.

POST /api/v1/search

Parameter  Type    Required  Description
query      string  Required  The search query string
limit      number  Optional  Maximum number of results to return (default: 5)
format     string  Optional  Output format: "markdown" (default), "html", "text"
country    string  Optional  Country code for localized results (e.g., "us", "gb", "de")
lang       string  Optional  Language code for results (e.g., "en", "fr", "ja")

Example Request

Search the web
curl -X POST https://blazecrawl-dev.web.app/api/v1/search \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "best practices for web scraping 2026",
    "limit": 3,
    "format": "markdown",
    "country": "us",
    "lang": "en"
  }'

Response

200 OK
{
  "success": true,
  "data": {
    "query": "best practices for web scraping 2026",
    "results": [
      {
        "url": "https://example.com/web-scraping-guide",
        "title": "Web Scraping Best Practices in 2026",
        "description": "A comprehensive guide to ethical and efficient web scraping...",
        "markdown": "# Web Scraping Best Practices\n\nIn 2026, the landscape of web scraping...",
        "metadata": {
          "statusCode": 200,
          "language": "en"
        }
      }
    ]
  }
}

Search costs 1 credit per result returned. If Google API credentials are not configured, BlazeCrawl automatically falls back to DuckDuckGo for search results.

Interact

The Interact endpoint lets you perform browser actions on a page — click buttons, fill forms, scroll, take screenshots, and more. Powered by Playwright, it enables scraping of content that requires user interaction to reveal.

POST /api/v1/interact

Parameter  Type      Required  Description
url        string    Required  The URL to interact with
actions    Action[]  Required  Array of actions to perform sequentially
format     string    Optional  Output format: "markdown" (default), "html", "text"

Action Types

Type        Fields           Description
click       selector         Click an element matching the CSS selector
type        selector, value  Type text into an input field
scroll      direction        Scroll the page ("up" or "down")
wait        milliseconds     Wait for a specified duration
press       key              Press a keyboard key (e.g., "Enter", "Tab")
screenshot  (none)           Capture a screenshot at this step

Example Request

Interact with a page
curl -X POST https://blazecrawl-dev.web.app/api/v1/interact \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/search",
    "actions": [
      { "type": "type", "selector": "input[name=q]", "value": "BlazeCrawl" },
      { "type": "press", "key": "Enter" },
      { "type": "wait", "milliseconds": 2000 },
      { "type": "screenshot" }
    ],
    "format": "markdown"
  }'

Response

200 OK
{
  "success": true,
  "data": {
    "url": "https://example.com/search?q=BlazeCrawl",
    "content": "# Search Results\n\n1. BlazeCrawl - Turn websites into LLM-ready data...",
    "format": "markdown",
    "screenshots": [
      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
    ]
  }
}

Interact costs 2 credits per request. It requires Playwright to be available on the server — if Playwright is not installed, the API returns a 501 Not Implemented response.
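Since each Interact request costs credits, it pays to validate the action list locally before sending it. A sketch that checks actions against the documented action types (a hypothetical client-side check, not an SDK feature):

```python
# Required fields per action type, from the Action Types table above.
REQUIRED_FIELDS = {
    "click": {"selector"},
    "type": {"selector", "value"},
    "scroll": {"direction"},
    "wait": {"milliseconds"},
    "press": {"key"},
    "screenshot": set(),
}

def validate_actions(actions):
    """Check an Interact action list before sending, so a typo fails
    locally instead of costing credits on a rejected request."""
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind not in REQUIRED_FIELDS:
            raise ValueError(f"action {i}: unknown type {kind!r}")
        missing = REQUIRED_FIELDS[kind] - set(action)
        if missing:
            raise ValueError(f"action {i}: missing fields {sorted(missing)}")
    return actions
```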

Batch Scrape

The Batch Scrape endpoint lets you scrape up to 100 URLs in a single request. Jobs are processed asynchronously — poll for status or provide a webhook URL to be notified when the batch completes.

Start a Batch

POST /api/v1/batch/scrape

Parameter  Type      Required  Description
urls       string[]  Required  Array of URLs to scrape (max 100)
format     string    Optional  Output format: "markdown" (default), "html", "text"
webhook    string    Optional  URL to receive a POST when the batch completes
renderJs   boolean   Optional  Use Playwright for JavaScript rendering

Example Request

Start a batch scrape
curl -X POST https://blazecrawl-dev.web.app/api/v1/batch/scrape \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page-1",
      "https://example.com/page-2",
      "https://example.com/page-3"
    ],
    "format": "markdown",
    "webhook": "https://your-server.com/webhook/batch-complete"
  }'

Response (Batch Started)

202 Accepted
{
  "success": true,
  "data": {
    "batchId": "batch_abc123xyz",
    "status": "processing",
    "totalUrls": 3
  }
}

Check Batch Status

GET /api/v1/batch/scrape/:id

Poll batch status
curl https://blazecrawl-dev.web.app/api/v1/batch/scrape/batch_abc123xyz \
  -H "Authorization: Bearer bc_live_xxx"

200 OK — Completed
{
  "success": true,
  "data": {
    "batchId": "batch_abc123xyz",
    "status": "completed",
    "totalUrls": 3,
    "completedUrls": 3,
    "results": [
      {
        "url": "https://example.com/page-1",
        "markdown": "# Page 1\n\nContent...",
        "metadata": { "title": "Page 1", "statusCode": 200 }
      }
    ]
  }
}

Batch Scrape costs 1 credit per URL. Use the webhook parameter to avoid polling — BlazeCrawl will POST the full results to your URL when the batch completes.
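If you have more than 100 URLs, split the list into compliant chunks and submit one batch per chunk. A minimal sketch:

```python
def batch_url_chunks(urls, max_per_batch=100):
    """Split a URL list into chunks that respect the 100-URL-per-request
    limit on /api/v1/batch/scrape."""
    return [urls[i:i + max_per_batch] for i in range(0, len(urls), max_per_batch)]
```

For example, 250 URLs become three batches of 100, 100, and 50.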

Agent

The Agent endpoint uses AI to autonomously browse the web and complete research tasks. Give it a prompt, and the agent will plan its approach, visit multiple pages, extract data, and synthesize results. Ideal for complex research that spans multiple sites.

Start an Agent Job

POST /api/v1/agent

Parameter  Type    Required  Description
prompt     string  Required  Natural language description of the research task
maxSteps   number  Optional  Maximum steps the agent can take (default: 10, max: 50)
maxUrls    number  Optional  Maximum URLs the agent can visit (default: 5)

Example Request

Start an agent job
curl -X POST https://blazecrawl-dev.web.app/api/v1/agent \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the pricing pages for the top 3 web scraping APIs and compare their free tier limits",
    "maxSteps": 20,
    "maxUrls": 10
  }'

Response (Job Started)

202 Accepted
{
  "success": true,
  "data": {
    "jobId": "agent_abc123xyz",
    "status": "running"
  }
}

Check Agent Status

GET /api/v1/agent/:id

Poll agent status
curl https://blazecrawl-dev.web.app/api/v1/agent/agent_abc123xyz \
  -H "Authorization: Bearer bc_live_xxx"

200 OK — Completed
{
  "success": true,
  "data": {
    "jobId": "agent_abc123xyz",
    "status": "completed",
    "stepsUsed": 12,
    "urlsVisited": 6,
    "result": "## Web Scraping API Pricing Comparison\n\n| Provider | Free Tier | Rate Limit |\n|---|---|---|\n| BlazeCrawl | 500 credits/mo | 2 concurrent |\n| ...",
    "sources": [
      "https://blazecrawl.dev/pricing",
      "https://competitor1.com/pricing",
      "https://competitor2.com/pricing"
    ]
  }
}

The Agent endpoint costs 5 credits per step and requires a Growth tier subscription or above. Use maxSteps and maxUrls to control costs.

SDKs

Official SDKs, CLI tools, and integrations make it easy to use BlazeCrawl from any environment. Each SDK wraps the REST API with idiomatic methods, type safety, automatic retries, and built-in error handling.

SDKs & Tools

Node.js SDK (npm):   npm install blazecrawl
Python SDK (PyPI):   pip install blazecrawl
CLI (npm):           npm install -g blazecrawl-cli
MCP Server (npm):    npm install blazecrawl-mcp
LangChain (PyPI):    pip install langchain-blazecrawl
LlamaIndex (PyPI):   pip install llama-index-blazecrawl

Node.js SDK

Node.js / TypeScript
import BlazeCrawl from "blazecrawl";

const client = new BlazeCrawl({ apiKey: "bc_live_xxx" });

// Scrape a single page
const result = await client.scrape({
  url: "https://example.com",
  format: "markdown",
});
console.log(result.markdown);

// Crawl an entire site
const crawl = await client.crawl({
  url: "https://docs.example.com",
  maxPages: 100,
  format: "markdown",
});
for (const page of crawl.data) {
  console.log(`${page.url}: ${page.markdown.length} chars`);
}

// Search the web
const search = await client.search({
  query: "web scraping best practices",
  limit: 5,
});
for (const r of search.results) {
  console.log(`${r.title}: ${r.url}`);
}

Python SDK

Python SDK
from blazecrawl import BlazeCrawl

client = BlazeCrawl(api_key="bc_live_xxx")

# Scrape a single page
result = client.scrape(
    url="https://example.com",
    format="markdown"
)
print(result.markdown)

# Crawl an entire site
crawl = client.crawl(
    url="https://docs.example.com",
    max_pages=100,
    format="markdown"
)
for page in crawl.data:
    print(f"{page.url}: {len(page.markdown)} chars")

# Extract structured data
product = client.extract(
    url="https://store.example.com/product/1",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"}
        }
    }
)
print(f"{product.data['name']}: ${product.data['price']}")

CLI

BlazeCrawl CLI
# Install globally
npm install -g blazecrawl-cli

# Authenticate
blazecrawl auth set bc_live_xxx

# Scrape a URL
blazecrawl scrape https://example.com --format markdown

# Crawl a site
blazecrawl crawl https://docs.example.com --max-pages 50

# Search the web
blazecrawl search "web scraping tutorials" --limit 5

MCP Server

MCP Server for AI Agents
# Install the MCP server
npm install blazecrawl-mcp

# Run as a Model Context Protocol server
npx blazecrawl-mcp --api-key bc_live_xxx

# Add to your Claude Desktop / AI agent config:
# {
#   "mcpServers": {
#     "blazecrawl": {
#       "command": "npx",
#       "args": ["blazecrawl-mcp", "--api-key", "bc_live_xxx"]
#     }
#   }
# }

LangChain Integration

LangChain
from langchain_blazecrawl import BlazeCrawlLoader

loader = BlazeCrawlLoader(
    api_key="bc_live_xxx",
    url="https://docs.example.com",
    mode="crawl",
    params={"max_pages": 50, "format": "markdown"}
)
documents = loader.load()

# Use with any LangChain chain or retriever
for doc in documents:
    print(f"{doc.metadata['url']}: {len(doc.page_content)} chars")

LlamaIndex Integration

LlamaIndex
from llama_index_blazecrawl import BlazeCrawlReader

reader = BlazeCrawlReader(api_key="bc_live_xxx")

# Load documents from a website
documents = reader.load_data(
    url="https://docs.example.com",
    max_pages=50
)

# Build an index
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("How do I authenticate?")

Rate Limits

Rate limits are enforced per API key. When you exceed your limit, the API returns a 429 Too Many Requests response with a Retry-After header.

Pricing Tiers

Tier        Credits/Mo  Concurrent  Price
Free        500         2           $0
Hobby       3,000       5           $16/mo
Standard    100,000     50          $83/mo
Growth      500,000     100         $333/mo
Scale       1,000,000   150         $599/mo
Enterprise  Unlimited   500         Custom

Credit Costs

Different operations consume different amounts of credits. Plan your usage accordingly:

Operation     Credits
Scrape        1
Crawl         1 per page
Map           Free
Search        1 per result
Interact      2
Batch Scrape  1 per URL
Extract       5
Agent         5 per step

Rate Limit Headers

Every response includes headers to help you track your usage:

Response headers
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1711500000
Retry-After: 12

Our SDKs handle rate limits automatically with exponential backoff. If you are using the REST API directly, check the Retry-After header and wait before retrying.
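For direct REST usage, the retry logic the SDKs implement looks roughly like this: honor Retry-After when present, otherwise fall back to jittered exponential backoff. A sketch with an injectable request function (the `send` callable and its return shape are illustrative assumptions):

```python
import random
import time

def request_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a request on 429 responses, honoring the Retry-After
    header when present and using jittered exponential backoff
    otherwise. send() returns (status_code, headers, body)."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_retries:
            raise RuntimeError("rate limited: retries exhausted")
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            # Exponential backoff with jitter to avoid thundering herds.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
        sleep(delay)
```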

Errors

BlazeCrawl uses standard HTTP status codes. All error responses include a JSON body with a human-readable message and an error code for programmatic handling.

Error Response Format

Error response
{
  "success": false,
  "error": {
    "code": "INVALID_URL",
    "message": "The provided URL is not valid or not reachable.",
    "statusCode": 422
  }
}

Error Codes

HTTP Status  Code                  Description
400          BAD_REQUEST           Request body is malformed or missing required fields
401          UNAUTHORIZED          Missing or invalid API key
402          INSUFFICIENT_CREDITS  Not enough credits; top up or upgrade your plan
403          FORBIDDEN             API key does not have permission for this action
404          NOT_FOUND             The requested resource (crawl job, etc.) was not found
408          TIMEOUT               The scrape timed out; increase the timeout parameter
422          INVALID_URL           The provided URL is not valid or not reachable
429          RATE_LIMITED          Too many requests; check the Retry-After header
500          INTERNAL_ERROR        Something went wrong on our end; please retry

If you receive a 500 INTERNAL_ERROR, please retry with exponential backoff. If the error persists, contact support with the request ID from the X-Request-Id response header.
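In client code, the documented error shape maps naturally onto a typed exception, which makes retry decisions programmatic. A sketch (the class and helper are hypothetical, not part of any official SDK):

```python
class BlazeCrawlError(Exception):
    """API error carrying the documented code/message/statusCode fields."""
    def __init__(self, code, message, status_code):
        super().__init__(f"{code} ({status_code}): {message}")
        self.code = code
        self.status_code = status_code

# Codes worth retrying with backoff, per the error table above.
RETRYABLE = {"RATE_LIMITED", "INTERNAL_ERROR", "TIMEOUT"}

def raise_for_error(payload):
    """Raise a typed exception for error-shaped response bodies;
    pass successful responses through unchanged."""
    if payload.get("success"):
        return payload
    err = payload["error"]
    raise BlazeCrawlError(err["code"], err["message"], err["statusCode"])
```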

Ready to start scraping?

Get 500 free credits. No credit card required.