Getting Started

BlazeCrawl turns any website into clean, LLM-ready data. Get started in under two minutes: create an account, grab your API key, and make your first request.

1. Create an Account

Head to blazecrawl-dev.web.app/login and sign up with your email, Google, or GitHub account. No credit card required.

2. Get Your API Key

After signing in, navigate to the Dashboard and create a new API key. Your key will look like bc_live_xxxxxxxxxxxxxxxx. Keep it safe — treat it like a password.

3. Make Your First Request

Scrape any URL and get markdown back. Here is a complete example:

Your first scrape
curl -X POST https://blazecrawl-dev.web.app/api/v1/scrape \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'

You will receive a JSON response like this:

Response
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "description": "Example Domain",
      "statusCode": 200,
      "url": "https://example.com"
    }
  }
}

The free tier includes 500 credits per month with 2 concurrent requests. Upgrade anytime from your dashboard.

Authentication

All API requests require a Bearer token in the Authorization header. You can create and manage API keys from your dashboard.

Authorization Header
Authorization: Bearer bc_live_xxxxxxxxxxxxxxxx

API Key Types

Prefix     Environment  Usage
bc_live_   Production   Live API access, metered usage
bc_test_   Test         Sandbox access, no credits consumed

Never expose your API key in client-side code. Always call the BlazeCrawl API from your backend server or a serverless function.
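As a quick illustration, a small server-side helper can build the header and reject keys that do not match the documented prefixes, so a truncated or misconfigured key fails fast instead of producing 401s at runtime. This is a hypothetical helper, not part of any official SDK:

```python
def auth_headers(api_key: str) -> dict:
    """Build the Authorization header for a BlazeCrawl request.

    Accepts only the documented key prefixes (bc_live_, bc_test_)
    so a malformed key is caught locally.
    """
    if not api_key.startswith(("bc_live_", "bc_test_")):
        raise ValueError("API key must start with bc_live_ or bc_test_")
    return {"Authorization": f"Bearer {api_key}"}
```

Pass the returned dict as the `headers` argument to your HTTP client, and load the key from an environment variable rather than hardcoding it.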

Scrape

The Scrape endpoint converts a single URL into clean markdown, HTML, or structured data. It handles JavaScript rendering, anti-bot bypassing, and dynamic content automatically.

POST /api/v1/scrape

Request Body

Parameter    Type      Required  Description
url          string    Required  The URL to scrape
format       string    Optional  Output format: "markdown" (default), "html", "text", "screenshot"
includeTags  string[]  Optional  Only include content from these CSS selectors
excludeTags  string[]  Optional  Exclude content matching these CSS selectors
waitFor      number    Optional  Wait time in ms after page load (for dynamic content)
timeout      number    Optional  Maximum request timeout in ms (default: 30000)
headers      object    Optional  Custom HTTP headers to send with the request
renderJs     boolean   Optional  Use Playwright for JavaScript rendering
screenshot   boolean   Optional  Include a base64 screenshot in the response
pdf          boolean   Optional  Include a base64 PDF in the response
skipCache    boolean   Optional  Bypass the response cache and fetch fresh content

Example Request

Scrape with options
curl -X POST https://blazecrawl-dev.web.app/api/v1/scrape \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com/getting-started",
    "format": "markdown",
    "excludeTags": ["nav", "footer", ".sidebar"],
    "waitFor": 2000
  }'

Response

200 OK
{
  "success": true,
  "data": {
    "markdown": "# Getting Started\n\nWelcome to the documentation...",
    "html": "<h1>Getting Started</h1><p>Welcome to the documentation...</p>",
    "metadata": {
      "title": "Getting Started - Example Docs",
      "description": "Learn how to get started with Example.",
      "language": "en",
      "statusCode": 200,
      "url": "https://docs.example.com/getting-started"
    }
  }
}

Use excludeTags to remove navigation, footers, and sidebars for cleaner LLM input. This can reduce token usage by 30-50%.
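When calling the endpoint directly, it helps to omit unset optional parameters so the server-side defaults apply. A minimal sketch of a request-body builder (a hypothetical helper, covering only a few of the parameters above):

```python
def scrape_payload(url, format="markdown", exclude_tags=None,
                   wait_for=None, timeout=None):
    """Assemble a /api/v1/scrape request body, omitting unset
    optionals so server defaults (e.g. timeout: 30000) apply."""
    body = {"url": url, "format": format}
    if exclude_tags:
        body["excludeTags"] = exclude_tags
    if wait_for is not None:
        body["waitFor"] = wait_for
    if timeout is not None:
        body["timeout"] = timeout
    return body
```

Send the result as the JSON body of a POST with your Authorization header attached.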

Crawl

The Crawl endpoint discovers and scrapes all pages on a website. It follows links, respects your configuration, and delivers results via webhook or polling. Ideal for indexing an entire docs site or building a knowledge base.

Start a Crawl

POST /api/v1/crawl

Parameter     Type      Required  Description
url           string    Required  The starting URL to crawl
maxPages      number    Optional  Maximum pages to crawl (default: 100)
maxDepth      number    Optional  Maximum link depth from the starting URL (default: 3)
includePaths  string[]  Optional  Only crawl URLs matching these glob patterns
excludePaths  string[]  Optional  Skip URLs matching these glob patterns
format        string    Optional  Output format for each page: "markdown" (default), "html", "text"
webhook       string    Optional  URL to receive crawl completion notification

Example Request

Start a crawl
curl -X POST https://blazecrawl-dev.web.app/api/v1/crawl \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxPages": 500,
    "maxDepth": 5,
    "includePaths": ["/docs/*", "/guides/*"],
    "excludePaths": ["/blog/*"],
    "format": "markdown"
  }'

Response (Crawl Started)

202 Accepted
{
  "success": true,
  "id": "crawl_abc123xyz",
  "url": "https://blazecrawl-dev.web.app/api/v1/crawl/crawl_abc123xyz"
}

Check Crawl Status

GET /api/v1/crawl/:id

Poll crawl status
curl https://blazecrawl-dev.web.app/api/v1/crawl/crawl_abc123xyz \
  -H "Authorization: Bearer bc_live_xxx"

200 OK — In Progress
{
  "success": true,
  "status": "crawling",
  "pagesFound": 142,
  "pagesCrawled": 87,
  "data": [
    {
      "url": "https://docs.example.com/intro",
      "markdown": "# Introduction\n\n...",
      "metadata": { "title": "Introduction", "statusCode": 200 }
    }
  ]
}

Use the webhook parameter instead of polling. BlazeCrawl will POST to your URL when the crawl completes, saving you from building a polling loop.
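If you do poll, keep the loop bounded and stop as soon as the status leaves "crawling". A sketch with an injectable status fetcher (the `fetch_status` callable stands in for a GET to /api/v1/crawl/:id; names here are illustrative, not SDK API):

```python
import time

def wait_for_crawl(fetch_status, interval=2.0, max_wait=600.0, sleep=time.sleep):
    """Poll a crawl until it leaves the "crawling" state or max_wait
    elapses. fetch_status returns the parsed status JSON; sleep is
    injectable so the loop is testable without real delays."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("status") != "crawling":
            return status
        if waited >= max_wait:
            raise TimeoutError("crawl did not finish in time")
        sleep(interval)
        waited += interval
```

In production, prefer the webhook parameter and reserve polling for local scripts and debugging.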

Map

The Map endpoint discovers every URL on a site without scraping content. It parses sitemaps, follows links, and returns a complete URL map. Use this to plan your crawl or understand site structure before consuming credits.

POST /api/v1/map

Parameter     Type      Required  Description
url           string    Required  The website URL to map
maxUrls       number    Optional  Maximum URLs to discover (default: 1000)
includePaths  string[]  Optional  Only include URLs matching these patterns
excludePaths  string[]  Optional  Exclude URLs matching these patterns
useSitemap    boolean   Optional  Parse sitemap.xml if available (default: true)

Example Request

Map a website
curl -X POST https://blazecrawl-dev.web.app/api/v1/map \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "maxUrls": 500,
    "includePaths": ["/docs/*"]
  }'

Response

200 OK
{
  "success": true,
  "count": 47,
  "urls": [
    "https://example.com/docs",
    "https://example.com/docs/getting-started",
    "https://example.com/docs/authentication",
    "https://example.com/docs/api-reference",
    "https://example.com/docs/sdks/python",
    "https://example.com/docs/sdks/node"
  ]
}

Map is free — it does not consume credits. Use it to discover URLs before crawling, so you only pay for the pages you actually need.
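One common pattern is to map first, filter the URL list locally, and feed only the survivors into a crawl or batch scrape. A sketch of client-side path filtering, approximating the includePaths/excludePaths glob semantics with Python's fnmatch (the exact server-side matching rules are an assumption here):

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def filter_urls(urls, include_paths=None, exclude_paths=None):
    """Apply includePaths/excludePaths-style glob patterns to a list
    of mapped URLs, matching against each URL's path component."""
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        if include_paths and not any(fnmatch(path, p) for p in include_paths):
            continue
        if exclude_paths and any(fnmatch(path, p) for p in exclude_paths):
            continue
        kept.append(url)
    return kept
```

Note that fnmatch's `*` also matches `/`, so "/docs/*" matches nested pages like "/docs/sdks/python".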

Extract

The Extract endpoint uses AI to pull structured data from any webpage. Define a JSON schema and BlazeCrawl will return perfectly formatted data — powered by Claude AI. Ideal for price monitoring, lead generation, and data pipelines.

POST /api/v1/extract

Parameter  Type    Required  Description
url        string  Required  The URL to extract data from
schema     object  Required  JSON Schema describing the data structure you want
prompt     string  Optional  Additional instructions for the AI extraction model
format     string  Optional  Source format for extraction: "markdown" (default), "html"

Example Request

Extract product data
curl -X POST https://blazecrawl-dev.web.app/api/v1/extract \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://store.example.com/product/wireless-headphones",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "rating": { "type": "number" },
        "reviewCount": { "type": "integer" },
        "inStock": { "type": "boolean" },
        "features": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    "prompt": "Extract the main product details from this page."
  }'

Response

200 OK
{
  "success": true,
  "data": {
    "name": "ProSound Wireless Headphones X3",
    "price": 149.99,
    "currency": "USD",
    "rating": 4.7,
    "reviewCount": 2847,
    "inStock": true,
    "features": [
      "Active noise cancellation",
      "40-hour battery life",
      "Bluetooth 5.3",
      "Multi-device pairing",
      "Foldable design"
    ]
  }
}

The Extract endpoint costs 5 credits per page (vs 1 for Scrape) because it uses AI processing. Use Scrape for simple content and Extract for structured data.
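To budget a pipeline before running it, you can tally the per-operation costs stated in these docs. A minimal estimator covering the costs mentioned so far (a hypothetical helper; extend the table for other operations):

```python
# Per-unit credit costs as documented: Scrape 1 per page, Extract 5
# per page, Interact 2 per request.
CREDIT_COSTS = {"scrape": 1, "extract": 5, "interact": 2}

def estimate_credits(operations):
    """Estimate credit spend for a list of (operation, count) pairs."""
    return sum(CREDIT_COSTS[op] * n for op, n in operations)
```

For example, scraping 100 pages and extracting from 10 of them would cost an estimated 150 credits.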

Search

The Search endpoint performs a web search and returns scraped, LLM-ready content for each result. Combine search with scraping in a single API call — perfect for RAG pipelines and research agents.

POST /api/v1/search

Parameter  Type    Required  Description
query      string  Required  The search query string
limit      number  Optional  Maximum number of results to return (default: 5)
format     string  Optional  Output format: "markdown" (default), "html", "text"
country    string  Optional  Country code for localized results (e.g., "us", "gb", "de")
lang       string  Optional  Language code for results (e.g., "en", "fr", "ja")

Example Request

Search the web
curl -X POST https://blazecrawl-dev.web.app/api/v1/search \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "best practices for web scraping 2026",
    "limit": 3,
    "format": "markdown",
    "country": "us",
    "lang": "en"
  }'

Response

200 OK
{
  "success": true,
  "data": {
    "query": "best practices for web scraping 2026",
    "results": [
      {
        "url": "https://example.com/web-scraping-guide",
        "title": "Web Scraping Best Practices in 2026",
        "description": "A comprehensive guide to ethical and efficient web scraping...",
        "markdown": "# Web Scraping Best Practices\n\nIn 2026, the landscape of web scraping...",
        "metadata": {
          "statusCode": 200,
          "language": "en"
        }
      }
    ]
  }
}

Search costs 1 credit per result returned. If Google API credentials are not configured, BlazeCrawl automatically falls back to DuckDuckGo for search results.

Interact

The Interact endpoint lets you perform browser actions on a page — click buttons, fill forms, scroll, take screenshots, and more. Powered by Playwright, it enables scraping of content that requires user interaction to reveal.

POST /api/v1/interact

Parameter  Type      Required  Description
url        string    Required  The URL to interact with
actions    Action[]  Required  Array of actions to perform sequentially
format     string    Optional  Output format: "markdown" (default), "html", "text"

Action Types

Type        Fields           Description
click       selector         Click an element matching the CSS selector
type        selector, value  Type text into an input field
scroll      direction        Scroll the page ("up" or "down")
wait        milliseconds     Wait for a specified duration
press       key              Press a keyboard key (e.g., "Enter", "Tab")
screenshot  (none)           Capture a screenshot at this step

Example Request

Interact with a page
curl -X POST https://blazecrawl-dev.web.app/api/v1/interact \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/search",
    "actions": [
      { "type": "type", "selector": "input[name=q]", "value": "BlazeCrawl" },
      { "type": "press", "key": "Enter" },
      { "type": "wait", "milliseconds": 2000 },
      { "type": "screenshot" }
    ],
    "format": "markdown"
  }'

Response

200 OK
{
  "success": true,
  "data": {
    "url": "https://example.com/search?q=BlazeCrawl",
    "content": "# Search Results\n\n1. BlazeCrawl - Turn websites into LLM-ready data...",
    "format": "markdown",
    "screenshots": [
      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
    ]
  }
}

Interact costs 2 credits per request. It requires Playwright to be available on the server — if Playwright is not installed, the API returns a 501 Not Implemented response.
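Since each Interact request costs credits, it pays to validate the action list locally before sending it. A sketch that checks actions against the documented action types (a hypothetical client-side check, not an SDK feature):

```python
# Required fields per action type, from the Action Types table above.
REQUIRED_FIELDS = {
    "click": {"selector"},
    "type": {"selector", "value"},
    "scroll": {"direction"},
    "wait": {"milliseconds"},
    "press": {"key"},
    "screenshot": set(),
}

def validate_actions(actions):
    """Check an Interact action list before sending, so a typo fails
    locally instead of costing credits on a rejected request."""
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind not in REQUIRED_FIELDS:
            raise ValueError(f"action {i}: unknown type {kind!r}")
        missing = REQUIRED_FIELDS[kind] - set(action)
        if missing:
            raise ValueError(f"action {i}: missing fields {sorted(missing)}")
    return actions
```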

Batch Scrape

The Batch Scrape endpoint lets you scrape up to 100 URLs in a single request. Jobs are processed asynchronously — poll for status or provide a webhook URL to be notified when the batch completes.

Start a Batch

POST /api/v1/batch/scrape

Parameter  Type      Required  Description
urls       string[]  Required  Array of URLs to scrape (max 100)
format     string    Optional  Output format: "markdown" (default), "html", "text"
webhook    string    Optional  URL to receive a POST when the batch completes
renderJs   boolean   Optional  Use Playwright for JavaScript rendering

Example Request

Start a batch scrape
curl -X POST https://blazecrawl-dev.web.app/api/v1/batch/scrape \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page-1",
      "https://example.com/page-2",
      "https://example.com/page-3"
    ],
    "format": "markdown",
    "webhook": "https://your-server.com/webhook/batch-complete"
  }'

Response (Batch Started)

202 Accepted
{
  "success": true,
  "data": {
    "batchId": "batch_abc123xyz",
    "status": "processing",
    "totalUrls": 3
  }
}

Check Batch Status

GET /api/v1/batch/scrape/:id

Poll batch status
curl https://blazecrawl-dev.web.app/api/v1/batch/scrape/batch_abc123xyz \
  -H "Authorization: Bearer bc_live_xxx"

200 OK — Completed
{
  "success": true,
  "data": {
    "batchId": "batch_abc123xyz",
    "status": "completed",
    "totalUrls": 3,
    "completedUrls": 3,
    "results": [
      {
        "url": "https://example.com/page-1",
        "markdown": "# Page 1\n\nContent...",
        "metadata": { "title": "Page 1", "statusCode": 200 }
      }
    ]
  }
}

Batch Scrape costs 1 credit per URL. Use the webhook parameter to avoid polling — BlazeCrawl will POST the full results to your URL when the batch completes.
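If you have more than 100 URLs, split the list into compliant chunks and submit one batch per chunk. A minimal sketch:

```python
def batch_url_chunks(urls, max_per_batch=100):
    """Split a URL list into chunks that respect the 100-URL-per-request
    limit on /api/v1/batch/scrape."""
    return [urls[i:i + max_per_batch] for i in range(0, len(urls), max_per_batch)]
```

For example, 250 URLs become three batches of 100, 100, and 50.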

Agent

The Agent endpoint uses AI to autonomously browse the web and complete research tasks. Give it a prompt, and the agent will plan its approach, visit multiple pages, extract data, and synthesize results. Ideal for complex research that spans multiple sites.

Start an Agent Job

POST /api/v1/agent

Parameter  Type    Required  Description
prompt     string  Required  Natural language description of the research task
maxSteps   number  Optional  Maximum steps the agent can take (default: 10, max: 50)
maxUrls    number  Optional  Maximum URLs the agent can visit (default: 5)

Example Request

Start an agent job
curl -X POST https://blazecrawl-dev.web.app/api/v1/agent \
  -H "Authorization: Bearer bc_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the pricing pages for the top 3 web scraping APIs and compare their free tier limits",
    "maxSteps": 20,
    "maxUrls": 10
  }'

Response (Job Started)

202 Accepted
{
  "success": true,
  "data": {
    "jobId": "agent_abc123xyz",
    "status": "running"
  }
}

Check Agent Status

GET /api/v1/agent/:id

Poll agent status
curl https://blazecrawl-dev.web.app/api/v1/agent/agent_abc123xyz \
  -H "Authorization: Bearer bc_live_xxx"

200 OK — Completed
{
  "success": true,
  "data": {
    "jobId": "agent_abc123xyz",
    "status": "completed",
    "stepsUsed": 12,
    "urlsVisited": 6,
    "result": "## Web Scraping API Pricing Comparison\n\n| Provider | Free Tier | Rate Limit |\n|---|---|---|\n| BlazeCrawl | 500 credits/mo | 2 concurrent |\n| ...",
    "sources": [
      "https://blazecrawl.dev/pricing",
      "https://competitor1.com/pricing",
      "https://competitor2.com/pricing"
    ]
  }
}

The Agent endpoint costs 5 credits per step and requires a Growth tier subscription or above. Use maxSteps and maxUrls to control costs.

SDKs

Official SDKs, CLI tools, and integrations make it easy to use BlazeCrawl from any environment. Each SDK wraps the REST API with idiomatic methods, type safety, automatic retries, and built-in error handling.

SDKs & Tools

Node.js SDK (npm):   npm install blazecrawl
Python SDK (PyPI):   pip install blazecrawl
CLI (npm):           npm install -g blazecrawl-cli
MCP Server (npm):    npm install blazecrawl-mcp
LangChain (PyPI):    pip install langchain-blazecrawl
LlamaIndex (PyPI):   pip install llama-index-blazecrawl

Node.js SDK

Node.js / TypeScript
import BlazeCrawl from "blazecrawl";

const client = new BlazeCrawl({ apiKey: "bc_live_xxx" });

// Scrape a single page
const result = await client.scrape({
  url: "https://example.com",
  format: "markdown",
});
console.log(result.markdown);

// Crawl an entire site
const crawl = await client.crawl({
  url: "https://docs.example.com",
  maxPages: 100,
  format: "markdown",
});
for (const page of crawl.data) {
  console.log(`${page.url}: ${page.markdown.length} chars`);
}

// Search the web
const search = await client.search({
  query: "web scraping best practices",
  limit: 5,
});
for (const r of search.results) {
  console.log(`${r.title}: ${r.url}`);
}

Python SDK

Python SDK
from blazecrawl import BlazeCrawl

client = BlazeCrawl(api_key="bc_live_xxx")

# Scrape a single page
result = client.scrape(
    url="https://example.com",
    format="markdown"
)
print(result.markdown)

# Crawl an entire site
crawl = client.crawl(
    url="https://docs.example.com",
    max_pages=100,
    format="markdown"
)
for page in crawl.data:
    print(f"{page.url}: {len(page.markdown)} chars")

# Extract structured data
product = client.extract(
    url="https://store.example.com/product/1",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"}
        }
    }
)
print(f"{product.data['name']}: ${product.data['price']}")

CLI

BlazeCrawl CLI
# Install globally
npm install -g blazecrawl-cli

# Authenticate
blazecrawl auth set bc_live_xxx

# Scrape a URL
blazecrawl scrape https://example.com --format markdown

# Crawl a site
blazecrawl crawl https://docs.example.com --max-pages 50

# Search the web
blazecrawl search "web scraping tutorials" --limit 5

MCP Server

MCP Server for AI Agents
# Install the MCP server
npm install blazecrawl-mcp

# Run as a Model Context Protocol server
npx blazecrawl-mcp --api-key bc_live_xxx

# Add to your Claude Desktop / AI agent config:
# {
#   "mcpServers": {
#     "blazecrawl": {
#       "command": "npx",
#       "args": ["blazecrawl-mcp", "--api-key", "bc_live_xxx"]
#     }
#   }
# }

LangChain Integration

LangChain
from langchain_blazecrawl import BlazeCrawlLoader

loader = BlazeCrawlLoader(
    api_key="bc_live_xxx",
    url="https://docs.example.com",
    mode="crawl",
    params={"max_pages": 50, "format": "markdown"}
)
documents = loader.load()

# Use with any LangChain chain or retriever
for doc in documents:
    print(f"{doc.metadata['url']}: {len(doc.page_content)} chars")

LlamaIndex Integration

LlamaIndex
from llama_index_blazecrawl import BlazeCrawlReader

reader = BlazeCrawlReader(api_key="bc_live_xxx")

# Load documents from a website
documents = reader.load_data(
    url="https://docs.example.com",
    max_pages=50
)

# Build an index
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("How do I authenticate?")

Rate Limits

Rate limits are enforced per API key. When you exceed your limit, the API returns a 429 Too Many Requests response with a Retry-After header.

Pricing Tiers

Tier        Credits/Mo  Concurrent  Price
Free        500         2           $0
Hobby       3,000       5           $16/mo
Standard    100,000     50          $83/mo
Growth      500,000     100         $333/mo
Scale       1,000,000   150         $599/mo
Enterprise  Unlimited   500         Custom

Credit Costs

Different operations consume different amounts of credits. Plan your usage accordingly:

Operation     Credits
Scrape        1
Crawl         1 per page
Map           Free
Search        1 per result
Interact      2
Batch Scrape  1 per URL
Extract       5
Agent         5 per step

Rate Limit Headers

Every response includes headers to help you track your usage:

Response headers
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1711500000
Retry-After: 12

Our SDKs handle rate limits automatically with exponential backoff. If you are using the REST API directly, check the Retry-After header and wait before retrying.
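For direct REST usage, the retry logic the SDKs implement looks roughly like this: honor Retry-After when present, otherwise fall back to jittered exponential backoff. A sketch with an injectable request function (the `send` callable and its return shape are illustrative assumptions):

```python
import random
import time

def request_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a request on 429 responses, honoring the Retry-After
    header when present and using jittered exponential backoff
    otherwise. send() returns (status_code, headers, body)."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_retries:
            raise RuntimeError("rate limited: retries exhausted")
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            # Exponential backoff with jitter to avoid thundering herds.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
        sleep(delay)
```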

Errors

BlazeCrawl uses standard HTTP status codes. All error responses include a JSON body with a human-readable message and an error code for programmatic handling.

Error Response Format

Error response
{
  "success": false,
  "error": {
    "code": "INVALID_URL",
    "message": "The provided URL is not valid or not reachable.",
    "statusCode": 422
  }
}

Error Codes

HTTP Status  Code                  Description
400          BAD_REQUEST           Request body is malformed or missing required fields
401          UNAUTHORIZED          Missing or invalid API key
402          INSUFFICIENT_CREDITS  Not enough credits; top up or upgrade your plan
403          FORBIDDEN             API key does not have permission for this action
404          NOT_FOUND             The requested resource (crawl job, etc.) was not found
408          TIMEOUT               The scrape timed out; increase the timeout parameter
422          INVALID_URL           The provided URL is not valid or not reachable
429          RATE_LIMITED          Too many requests; check the Retry-After header
500          INTERNAL_ERROR        Something went wrong on our end; please retry

If you receive a 500 INTERNAL_ERROR, please retry with exponential backoff. If the error persists, contact support with the request ID from the X-Request-Id response header.
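In client code, the documented error shape maps naturally onto a typed exception, which makes retry decisions programmatic. A sketch (the class and helper are hypothetical, not part of any official SDK):

```python
class BlazeCrawlError(Exception):
    """API error carrying the documented code/message/statusCode fields."""
    def __init__(self, code, message, status_code):
        super().__init__(f"{code} ({status_code}): {message}")
        self.code = code
        self.status_code = status_code

# Codes worth retrying with backoff, per the error table above.
RETRYABLE = {"RATE_LIMITED", "INTERNAL_ERROR", "TIMEOUT"}

def raise_for_error(payload):
    """Raise a typed exception for error-shaped response bodies;
    pass successful responses through unchanged."""
    if payload.get("success"):
        return payload
    err = payload["error"]
    raise BlazeCrawlError(err["code"], err["message"], err["statusCode"])
```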

Ready to start scraping?

Get 500 free credits. No credit card required.