Getting Started
BlazeCrawl turns any website into clean, LLM-ready data. Get started in under two minutes: create an account, grab your API key, and make your first request.
1. Create an Account
Head to blazecrawl-dev.web.app/login and sign up with your email, Google, or GitHub account. No credit card required.
2. Get Your API Key
After signing in, navigate to the Dashboard and create a new API key. Your key will look like bc_live_xxxxxxxxxxxxxxxx. Keep it safe — treat it like a password.
3. Make Your First Request
Scrape any URL and get markdown back. Here is a complete example:
curl -X POST https://blazecrawl-dev.web.app/api/v1/scrape \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'

You will receive a JSON response like this:
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "description": "Example Domain",
      "statusCode": 200,
      "url": "https://example.com"
    }
  }
}

The free tier includes 500 credits per month with 2 concurrent requests. Upgrade anytime from your dashboard.
Authentication
All API requests require a Bearer token in the Authorization header. You can create and manage API keys from your dashboard.
Authorization: Bearer bc_live_xxxxxxxxxxxxxxxx
API Key Types
| Prefix | Environment | Usage |
|---|---|---|
| bc_live_ | Production | Live API access, metered usage |
| bc_test_ | Test | Sandbox access, no credits consumed |
Never expose your API key in client-side code. Always call the BlazeCrawl API from your backend server or a serverless function.
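Since every call needs this header, a small helper keeps key handling in one place. The sketch below is illustrative and not part of any SDK; the `auth_header` and `is_test_key` names are our own, and only the documented `bc_live_`/`bc_test_` prefixes are assumed:

```python
def auth_header(api_key: str) -> dict:
    """Build the Authorization header, rejecting keys without a documented prefix."""
    if not api_key.startswith(("bc_live_", "bc_test_")):
        raise ValueError("API key must start with bc_live_ or bc_test_")
    return {"Authorization": f"Bearer {api_key}"}

def is_test_key(api_key: str) -> bool:
    """Test keys hit the sandbox and consume no credits."""
    return api_key.startswith("bc_test_")

headers = auth_header("bc_test_0123456789abcdef")
print(headers["Authorization"])
```

Validating the prefix up front turns a misconfigured key into an immediate error instead of a 401 at request time.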
Scrape
The Scrape endpoint converts a single URL into clean markdown, HTML, or structured data. It handles JavaScript rendering, anti-bot bypassing, and dynamic content automatically.
/api/v1/scrape

Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to scrape |
| format | string | Optional | Output format: "markdown" (default), "html", "text", "screenshot" |
| includeTags | string[] | Optional | Only include content from these CSS selectors |
| excludeTags | string[] | Optional | Exclude content matching these CSS selectors |
| waitFor | number | Optional | Wait time in ms after page load (for dynamic content) |
| timeout | number | Optional | Maximum request timeout in ms (default: 30000) |
| headers | object | Optional | Custom HTTP headers to send with the request |
| renderJs | boolean | Optional | Use Playwright for JavaScript rendering |
| screenshot | boolean | Optional | Include a base64 screenshot in the response |
| pdf | boolean | Optional | Include a base64 PDF in the response |
| skipCache | boolean | Optional | Bypass response cache and fetch fresh content |
Example Request
curl -X POST https://blazecrawl-dev.web.app/api/v1/scrape \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com/getting-started",
    "format": "markdown",
    "excludeTags": ["nav", "footer", ".sidebar"],
    "waitFor": 2000
  }'

Response
{
  "success": true,
  "data": {
    "markdown": "# Getting Started\n\nWelcome to the documentation...",
    "html": "<h1>Getting Started</h1><p>Welcome to the documentation...</p>",
    "metadata": {
      "title": "Getting Started - Example Docs",
      "description": "Learn how to get started with Example.",
      "language": "en",
      "statusCode": 200,
      "url": "https://docs.example.com/getting-started"
    }
  }
}

Use excludeTags to remove navigation, footers, and sidebars for cleaner LLM input. This can reduce token usage by 30-50%.
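The success envelope shown above is easy to unwrap defensively. A minimal sketch; the `markdown_or_raise` helper is hypothetical and assumes only the response shape documented here:

```python
import json

# The documented shape of a successful /api/v1/scrape response.
sample = json.loads("""{
  "success": true,
  "data": {
    "markdown": "# Getting Started\\n\\nWelcome to the documentation...",
    "metadata": {"title": "Getting Started - Example Docs", "statusCode": 200}
  }
}""")

def markdown_or_raise(resp: dict) -> str:
    """Return the markdown payload, failing loudly on unsuccessful responses."""
    if not resp.get("success"):
        raise RuntimeError(resp.get("error", {}).get("message", "scrape failed"))
    return resp["data"]["markdown"]

print(markdown_or_raise(sample))
```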
Crawl
The Crawl endpoint discovers and scrapes all pages on a website. It follows links, respects your configuration, and delivers results via webhook or polling. Ideal for indexing an entire docs site or building a knowledge base.
Start a Crawl
/api/v1/crawl

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The starting URL to crawl |
| maxPages | number | Optional | Maximum pages to crawl (default: 100) |
| maxDepth | number | Optional | Maximum link depth from the starting URL (default: 3) |
| includePaths | string[] | Optional | Only crawl URLs matching these glob patterns |
| excludePaths | string[] | Optional | Skip URLs matching these glob patterns |
| format | string | Optional | Output format for each page: "markdown" (default), "html", "text" |
| webhook | string | Optional | URL to receive crawl completion notification |
Example Request
curl -X POST https://blazecrawl-dev.web.app/api/v1/crawl \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxPages": 500,
    "maxDepth": 5,
    "includePaths": ["/docs/*", "/guides/*"],
    "excludePaths": ["/blog/*"],
    "format": "markdown"
  }'

Response (Crawl Started)
{
  "success": true,
  "id": "crawl_abc123xyz",
  "url": "https://blazecrawl-dev.web.app/api/v1/crawl/crawl_abc123xyz"
}

Check Crawl Status
/api/v1/crawl/:id

curl https://blazecrawl-dev.web.app/api/v1/crawl/crawl_abc123xyz \
  -H "Authorization: Bearer bc_live_xxx"
{
  "success": true,
  "status": "crawling",
  "pagesFound": 142,
  "pagesCrawled": 87,
  "data": [
    {
      "url": "https://docs.example.com/intro",
      "markdown": "# Introduction\n\n...",
      "metadata": { "title": "Introduction", "statusCode": 200 }
    }
  ]
}

Use the webhook parameter instead of polling. BlazeCrawl will POST to your URL when the crawl completes, saving you from building a polling loop.
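If you do poll, a bounded loop keeps you from hammering the status endpoint. A sketch, assuming only the documented `status` field; `fetch_status` stands in for whatever HTTP client you use to GET the status URL:

```python
import time

def poll_until_done(fetch_status, interval=2.0, max_attempts=30):
    """Poll a status-returning callable until the crawl leaves the 'crawling' state.

    fetch_status is any function returning the JSON body of
    GET /api/v1/crawl/:id as a dict.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("status") != "crawling":
            return status
        time.sleep(interval)
    raise TimeoutError("crawl did not finish within the polling budget")

# Stubbed example: the crawl finishes on the third poll.
states = iter([
    {"status": "crawling"},
    {"status": "crawling"},
    {"status": "completed", "pagesCrawled": 142},
])
final = poll_until_done(lambda: next(states), interval=0.0)
print(final["status"])  # completed
```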
Map
The Map endpoint discovers every URL on a site without scraping content. It parses sitemaps, follows links, and returns a complete URL map. Use this to plan your crawl or understand site structure before consuming credits.
/api/v1/map

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The website URL to map |
| maxUrls | number | Optional | Maximum URLs to discover (default: 1000) |
| includePaths | string[] | Optional | Only include URLs matching these patterns |
| excludePaths | string[] | Optional | Exclude URLs matching these patterns |
| useSitemap | boolean | Optional | Parse sitemap.xml if available (default: true) |
Example Request
curl -X POST https://blazecrawl-dev.web.app/api/v1/map \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "maxUrls": 500,
    "includePaths": ["/docs/*"]
  }'

Response
{
  "success": true,
  "count": 47,
  "urls": [
    "https://example.com/docs",
    "https://example.com/docs/getting-started",
    "https://example.com/docs/authentication",
    "https://example.com/docs/api-reference",
    "https://example.com/docs/sdks/python",
    "https://example.com/docs/sdks/node"
  ]
}

Map is free — it does not consume credits. Use it to discover URLs before crawling, so you only pay for the pages you actually need.
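One way to use a Map result is to filter it locally before starting a paid crawl. The sketch below mirrors the includePaths/excludePaths-style glob patterns using Python's fnmatch; the exact server-side matching rules are an assumption:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def filter_urls(urls, include=None, exclude=None):
    """Keep URLs whose path matches an include pattern and no exclude pattern."""
    def path_of(url):
        return urlparse(url).path or "/"
    kept = []
    for url in urls:
        p = path_of(url)
        if include and not any(fnmatch(p, pat) for pat in include):
            continue
        if exclude and any(fnmatch(p, pat) for pat in exclude):
            continue
        kept.append(url)
    return kept

urls = [
    "https://example.com/docs/getting-started",
    "https://example.com/blog/launch",
    "https://example.com/docs/sdks/python",
]
print(filter_urls(urls, include=["/docs/*"], exclude=["/docs/sdks/*"]))
```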
Extract
The Extract endpoint uses AI to pull structured data from any webpage. Define a JSON schema and BlazeCrawl will return perfectly formatted data — powered by Claude AI. Ideal for price monitoring, lead generation, and data pipelines.
/api/v1/extract

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to extract data from |
| schema | object | Required | JSON Schema describing the data structure you want |
| prompt | string | Optional | Additional instructions for the AI extraction model |
| format | string | Optional | Source format for extraction: "markdown" (default), "html" |
Example Request
curl -X POST https://blazecrawl-dev.web.app/api/v1/extract \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://store.example.com/product/wireless-headphones",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "rating": { "type": "number" },
        "reviewCount": { "type": "integer" },
        "inStock": { "type": "boolean" },
        "features": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    "prompt": "Extract the main product details from this page."
  }'

Response
{
  "success": true,
  "data": {
    "name": "ProSound Wireless Headphones X3",
    "price": 149.99,
    "currency": "USD",
    "rating": 4.7,
    "reviewCount": 2847,
    "inStock": true,
    "features": [
      "Active noise cancellation",
      "40-hour battery life",
      "Bluetooth 5.3",
      "Multi-device pairing",
      "Foldable design"
    ]
  }
}

The Extract endpoint costs 5 credits per page (vs 1 for Scrape) because it uses AI processing. Use Scrape for simple content and Extract for structured data.
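Because the AI returns data shaped by your schema, a cheap local type check can catch drift before it reaches your pipeline. An illustrative sketch, not part of any SDK, covering only top-level property types:

```python
# Map of JSON Schema type names to Python types. Booleans are checked
# separately because bool is a subclass of int in Python.
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}

def check_types(data: dict, schema: dict) -> list:
    """Return the property names whose extracted value does not match the schema type."""
    mismatches = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue
        value, expected = data[name], TYPE_MAP[spec["type"]]
        if isinstance(value, bool) and spec["type"] != "boolean":
            mismatches.append(name)
        elif not isinstance(value, expected):
            mismatches.append(name)
    return mismatches

schema = {"type": "object", "properties": {"price": {"type": "number"}, "inStock": {"type": "boolean"}}}
print(check_types({"price": 149.99, "inStock": True}, schema))   # []
print(check_types({"price": "149.99", "inStock": True}, schema)) # ['price']
```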
Search
The Search endpoint performs a web search and returns scraped, LLM-ready content for each result. Combine search with scraping in a single API call — perfect for RAG pipelines and research agents.
/api/v1/search

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Required | The search query string |
| limit | number | Optional | Maximum number of results to return (default: 5) |
| format | string | Optional | Output format: "markdown" (default), "html", "text" |
| country | string | Optional | Country code for localized results (e.g., "us", "gb", "de") |
| lang | string | Optional | Language code for results (e.g., "en", "fr", "ja") |
Example Request
curl -X POST https://blazecrawl-dev.web.app/api/v1/search \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "best practices for web scraping 2026",
    "limit": 3,
    "format": "markdown",
    "country": "us",
    "lang": "en"
  }'

Response
{
  "success": true,
  "data": {
    "query": "best practices for web scraping 2026",
    "results": [
      {
        "url": "https://example.com/web-scraping-guide",
        "title": "Web Scraping Best Practices in 2026",
        "description": "A comprehensive guide to ethical and efficient web scraping...",
        "markdown": "# Web Scraping Best Practices\n\nIn 2026, the landscape of web scraping...",
        "metadata": {
          "statusCode": 200,
          "language": "en"
        }
      }
    ]
  }
}

Search costs 1 credit per result returned. If Google API credentials are not configured, BlazeCrawl automatically falls back to DuckDuckGo for search results.
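For RAG pipelines, a common next step is stitching results into a single prompt context. A hedged sketch: the `build_context` helper and its character budget are our own convention, assuming only the documented result fields:

```python
def build_context(results, char_budget=2000):
    """Concatenate search-result markdown into one context string,
    labeling each chunk with its source URL and stopping at a character budget."""
    chunks = []
    used = 0
    for r in results:
        chunk = f"Source: {r['url']}\n{r['markdown']}\n"
        if used + len(chunk) > char_budget:
            break
        chunks.append(chunk)
        used += len(chunk)
    return "\n".join(chunks)

results = [
    {"url": "https://example.com/a", "markdown": "# A\nshort"},
    {"url": "https://example.com/b", "markdown": "# B\nalso short"},
]
ctx = build_context(results)
print("https://example.com/b" in ctx)  # True
```

A token-based budget (via your model's tokenizer) would be more precise; characters keep the sketch dependency-free.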
Interact
The Interact endpoint lets you perform browser actions on a page — click buttons, fill forms, scroll, take screenshots, and more. Powered by Playwright, it enables scraping of content that requires user interaction to reveal.
/api/v1/interact

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to interact with |
| actions | Action[] | Required | Array of actions to perform sequentially |
| format | string | Optional | Output format: "markdown" (default), "html", "text" |
Action Types
| Type | Fields | Description |
|---|---|---|
| click | selector | Click an element matching the CSS selector |
| type | selector, value | Type text into an input field |
| scroll | direction | Scroll the page ("up" or "down") |
| wait | milliseconds | Wait for a specified duration |
| press | key | Press a keyboard key (e.g., "Enter", "Tab") |
| screenshot | — | Capture a screenshot at this step |
Example Request
curl -X POST https://blazecrawl-dev.web.app/api/v1/interact \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/search",
    "actions": [
      { "type": "type", "selector": "input[name=q]", "value": "BlazeCrawl" },
      { "type": "press", "key": "Enter" },
      { "type": "wait", "milliseconds": 2000 },
      { "type": "screenshot" }
    ],
    "format": "markdown"
  }'

Response
{
  "success": true,
  "data": {
    "url": "https://example.com/search?q=BlazeCrawl",
    "content": "# Search Results\n\n1. BlazeCrawl - Turn websites into LLM-ready data...",
    "format": "markdown",
    "screenshots": [
      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
    ]
  }
}

Interact costs 2 credits per request. It requires Playwright to be available on the server — if Playwright is not installed, the API returns a 501 Not Implemented response.
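Since a malformed action fails the whole request, validating the sequence client-side saves credits. A sketch based on the Action Types table above; the validator itself is illustrative, not an official check:

```python
# Required fields per action type, taken from the Action Types table.
REQUIRED = {
    "click": {"selector"},
    "type": {"selector", "value"},
    "scroll": {"direction"},
    "wait": {"milliseconds"},
    "press": {"key"},
    "screenshot": set(),
}

def validate_actions(actions):
    """Raise ValueError on the first action with an unknown type or a missing field."""
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind not in REQUIRED:
            raise ValueError(f"action {i}: unknown type {kind!r}")
        missing = REQUIRED[kind] - action.keys()
        if missing:
            raise ValueError(f"action {i}: missing {sorted(missing)}")

validate_actions([
    {"type": "type", "selector": "input[name=q]", "value": "BlazeCrawl"},
    {"type": "press", "key": "Enter"},
    {"type": "screenshot"},
])
print("actions look valid")
```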
Batch Scrape
The Batch Scrape endpoint lets you scrape up to 100 URLs in a single request. Jobs are processed asynchronously — poll for status or provide a webhook URL to be notified when the batch completes.
Start a Batch
/api/v1/batch/scrape

| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Required | Array of URLs to scrape (max 100) |
| format | string | Optional | Output format: "markdown" (default), "html", "text" |
| webhook | string | Optional | URL to receive a POST when batch completes |
| renderJs | boolean | Optional | Use Playwright for JavaScript rendering |
Example Request
curl -X POST https://blazecrawl-dev.web.app/api/v1/batch/scrape \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page-1",
      "https://example.com/page-2",
      "https://example.com/page-3"
    ],
    "format": "markdown",
    "webhook": "https://your-server.com/webhook/batch-complete"
  }'

Response (Batch Started)
{
  "success": true,
  "data": {
    "batchId": "batch_abc123xyz",
    "status": "processing",
    "totalUrls": 3
  }
}

Check Batch Status
/api/v1/batch/scrape/:id

curl https://blazecrawl-dev.web.app/api/v1/batch/scrape/batch_abc123xyz \
  -H "Authorization: Bearer bc_live_xxx"
{
  "success": true,
  "data": {
    "batchId": "batch_abc123xyz",
    "status": "completed",
    "totalUrls": 3,
    "completedUrls": 3,
    "results": [
      {
        "url": "https://example.com/page-1",
        "markdown": "# Page 1\n\nContent...",
        "metadata": { "title": "Page 1", "statusCode": 200 }
      }
    ]
  }
}

Batch Scrape costs 1 credit per URL. Use the webhook parameter to avoid polling — BlazeCrawl will POST the full results to your URL when the batch completes.
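To scrape more than 100 URLs, split the list before submitting. A minimal sketch, assuming only the documented 100-URL ceiling:

```python
def chunk_urls(urls, size=100):
    """Split a URL list into request-sized batches;
    /api/v1/batch/scrape accepts at most 100 URLs per call."""
    if size < 1 or size > 100:
        raise ValueError("batch size must be between 1 and 100")
    return [urls[i:i + size] for i in range(0, len(urls), size)]

urls = [f"https://example.com/page-{n}" for n in range(250)]
batches = chunk_urls(urls)
print([len(b) for b in batches])  # [100, 100, 50]
```

Submit each batch as its own request; with a webhook configured you will receive one completion POST per batch.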
Agent
The Agent endpoint uses AI to autonomously browse the web and complete research tasks. Give it a prompt, and the agent will plan its approach, visit multiple pages, extract data, and synthesize results. Ideal for complex research that spans multiple sites.
Start an Agent Job
/api/v1/agent

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Required | Natural language description of the research task |
| maxSteps | number | Optional | Maximum steps the agent can take (default: 10, max: 50) |
| maxUrls | number | Optional | Maximum URLs the agent can visit (default: 5) |
Example Request
curl -X POST https://blazecrawl-dev.web.app/api/v1/agent \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the pricing pages for the top 3 web scraping APIs and compare their free tier limits",
    "maxSteps": 20,
    "maxUrls": 10
  }'

Response (Job Started)
{
  "success": true,
  "data": {
    "jobId": "agent_abc123xyz",
    "status": "running"
  }
}

Check Agent Status
/api/v1/agent/:id

curl https://blazecrawl-dev.web.app/api/v1/agent/agent_abc123xyz \
  -H "Authorization: Bearer bc_live_xxx"
{
  "success": true,
  "data": {
    "jobId": "agent_abc123xyz",
    "status": "completed",
    "stepsUsed": 12,
    "urlsVisited": 6,
    "result": "## Web Scraping API Pricing Comparison\n\n| Provider | Free Tier | Rate Limit |\n|---|---|---|\n| BlazeCrawl | 500 credits/mo | 2 concurrent |\n| ...",
    "sources": [
      "https://blazecrawl.dev/pricing",
      "https://competitor1.com/pricing",
      "https://competitor2.com/pricing"
    ]
  }
}

The Agent endpoint costs 5 credits per step and requires a Growth tier subscription or above. Use maxSteps and maxUrls to control costs.
SDKs
Official SDKs, CLI tools, and integrations make it easy to use BlazeCrawl from any environment. Each SDK wraps the REST API with idiomatic methods, type safety, automatic retries, and built-in error handling.
SDKs & Tools
npm install blazecrawl
pip install blazecrawl
npm install -g blazecrawl-cli
npm install blazecrawl-mcp
pip install langchain-blazecrawl
pip install llama-index-blazecrawl

Node.js SDK
import BlazeCrawl from "blazecrawl";

const client = new BlazeCrawl({ apiKey: "bc_live_xxx" });

// Scrape a single page
const result = await client.scrape({
  url: "https://example.com",
  format: "markdown",
});
console.log(result.markdown);

// Crawl an entire site
const crawl = await client.crawl({
  url: "https://docs.example.com",
  maxPages: 100,
  format: "markdown",
});
for (const page of crawl.data) {
  console.log(`${page.url}: ${page.markdown.length} chars`);
}

// Search the web
const search = await client.search({
  query: "web scraping best practices",
  limit: 5,
});
for (const r of search.results) {
  console.log(`${r.title}: ${r.url}`);
}

Python SDK
from blazecrawl import BlazeCrawl

client = BlazeCrawl(api_key="bc_live_xxx")

# Scrape a single page
result = client.scrape(
    url="https://example.com",
    format="markdown"
)
print(result.markdown)

# Crawl an entire site
crawl = client.crawl(
    url="https://docs.example.com",
    max_pages=100,
    format="markdown"
)
for page in crawl.data:
    print(f"{page.url}: {len(page.markdown)} chars")

# Extract structured data
product = client.extract(
    url="https://store.example.com/product/1",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"}
        }
    }
)
print(f"{product.data['name']}: ${product.data['price']}")

CLI
# Install globally
npm install -g blazecrawl-cli

# Authenticate
blazecrawl auth set bc_live_xxx

# Scrape a URL
blazecrawl scrape https://example.com --format markdown

# Crawl a site
blazecrawl crawl https://docs.example.com --max-pages 50

# Search the web
blazecrawl search "web scraping tutorials" --limit 5
MCP Server
# Install the MCP server
npm install blazecrawl-mcp
# Run as a Model Context Protocol server
npx blazecrawl-mcp --api-key bc_live_xxx
# Add to your Claude Desktop / AI agent config:
# {
#   "mcpServers": {
#     "blazecrawl": {
#       "command": "npx",
#       "args": ["blazecrawl-mcp", "--api-key", "bc_live_xxx"]
#     }
#   }
# }

LangChain Integration
from langchain_blazecrawl import BlazeCrawlLoader

loader = BlazeCrawlLoader(
    api_key="bc_live_xxx",
    url="https://docs.example.com",
    mode="crawl",
    params={"max_pages": 50, "format": "markdown"}
)
documents = loader.load()

# Use with any LangChain chain or retriever
for doc in documents:
    print(f"{doc.metadata['url']}: {len(doc.page_content)} chars")

LlamaIndex Integration
from llama_index_blazecrawl import BlazeCrawlReader

reader = BlazeCrawlReader(api_key="bc_live_xxx")

# Load documents from a website
documents = reader.load_data(
    url="https://docs.example.com",
    max_pages=50
)

# Build an index
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("How do I authenticate?")

Rate Limits
Rate limits are enforced per API key. When you exceed your limit, the API returns a 429 Too Many Requests response with a Retry-After header.
Pricing Tiers
| Tier | Credits/Mo | Concurrent | Price |
|---|---|---|---|
| Free | 500 | 2 | $0 |
| Hobby | 3,000 | 5 | $16/mo |
| Standard | 100,000 | 50 | $83/mo |
| Growth | 500,000 | 100 | $333/mo |
| Scale | 1,000,000 | 150 | $599/mo |
| Enterprise | Unlimited | 500 | Custom |
Credit Costs
Different operations consume different amounts of credits. Plan your usage accordingly:
| Operation | Credits |
|---|---|
| Scrape | 1 |
| Crawl | 1 per page |
| Map | 1 |
| Search | 1 per result |
| Interact | 2 |
| Batch Scrape | 1 per URL |
| Extract | 5 |
| Agent | 5 per step |
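Budgeting is simple multiplication over this table. An illustrative helper, not part of any SDK:

```python
# Credit costs per operation, from the Credit Costs table.
CREDIT_COSTS = {
    "scrape": 1,        # per request
    "crawl": 1,         # per page
    "map": 1,           # per request (Map is free per its docs; listed as 1 in the table)
    "search": 1,        # per result
    "interact": 2,      # per request
    "batch_scrape": 1,  # per URL
    "extract": 5,       # per page
    "agent": 5,         # per step
}

def estimate_credits(operation: str, units: int = 1) -> int:
    """Estimate credit usage; units is pages, results, URLs, or steps
    depending on the operation."""
    return CREDIT_COSTS[operation] * units

# A 200-page crawl plus extraction on 10 of those pages:
total = estimate_credits("crawl", 200) + estimate_credits("extract", 10)
print(total)  # 250
```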
Rate Limit Headers
Every response includes headers to help you track your usage:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1711500000
Retry-After: 12
Our SDKs handle rate limits automatically with exponential backoff. If you are using the REST API directly, check the Retry-After header and wait before retrying.
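For direct REST callers, the delay logic can be sketched like this, assuming the Retry-After value is in seconds as shown in the headers above:

```python
def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Pick a wait (in seconds) before retrying a 429: honor Retry-After
    when the server sends it, else use capped exponential backoff."""
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt))

print(retry_delay(0, "12"))  # 12.0
print(retry_delay(3))        # 8.0
```

Adding random jitter to the backoff branch helps avoid synchronized retries across workers.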
Errors
BlazeCrawl uses standard HTTP status codes. All error responses include a JSON body with a human-readable message and an error code for programmatic handling.
Error Response Format
{
  "success": false,
  "error": {
    "code": "INVALID_URL",
    "message": "The provided URL is not valid or not reachable.",
    "statusCode": 422
  }
}

Error Codes
| HTTP Status | Code | Description |
|---|---|---|
| 400 | BAD_REQUEST | Request body is malformed or missing required fields |
| 401 | UNAUTHORIZED | Missing or invalid API key |
| 402 | INSUFFICIENT_CREDITS | Not enough credits. Top up or upgrade your plan. |
| 403 | FORBIDDEN | API key does not have permission for this action |
| 404 | NOT_FOUND | The requested resource (crawl job, etc.) was not found |
| 408 | TIMEOUT | The scrape timed out. Increase the timeout parameter. |
| 422 | INVALID_URL | The provided URL is not valid or not reachable |
| 429 | RATE_LIMITED | Too many requests. Check the Retry-After header. |
| 500 | INTERNAL_ERROR | Something went wrong on our end. Please retry. |
If you receive a 500 INTERNAL_ERROR, please retry with exponential backoff. If the error persists, contact support with the request ID from the X-Request-Id response header.
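When wiring up automatic retries, only some of these codes are worth retrying. A sketch; the retryable set is our reading of the table above, not an official classification:

```python
# Error codes that are transient and worth retrying automatically.
RETRYABLE = {"RATE_LIMITED", "TIMEOUT", "INTERNAL_ERROR"}

def should_retry(error_body: dict) -> bool:
    """Decide from the documented error envelope whether a request is worth retrying."""
    return error_body.get("error", {}).get("code") in RETRYABLE

print(should_retry({"success": False, "error": {"code": "RATE_LIMITED", "statusCode": 429}}))  # True
print(should_retry({"success": False, "error": {"code": "INVALID_URL", "statusCode": 422}}))   # False
```

Client errors such as BAD_REQUEST and INVALID_URL will fail the same way on every attempt, so retrying them only burns time.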
Ready to start scraping?
Get 500 free credits. No credit card required.