Crawlee

An open-source Python library for building reliable crawlers and browser automation with async support, proxy rotation, and persistent storage.

Apify · Since 2024-01-10



Overview

Crawlee is an open-source library for production-grade web crawling and browser automation. It provides unified interfaces for HTTP and headless browser crawlers and supports concurrency, proxy rotation, retries, and persistent request queues.
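Concurrency with retries, as described above, can be illustrated with a small standard-library sketch. This is not Crawlee's API or implementation: `run_with_retries` and `flaky_handler` are hypothetical names, and the handler simulates failures instead of fetching real pages. The sketch caps in-flight work with an `asyncio.Semaphore` and retries each failed call with exponential backoff.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_with_retries(
    urls: list[str],
    handler: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
    max_retries: int = 3,
) -> tuple[dict[str, str], dict[str, Exception]]:
    """Run `handler` on every URL, at most `max_concurrency` at a time.

    Each URL gets one initial attempt plus up to `max_retries` retries.
    """
    semaphore = asyncio.Semaphore(max_concurrency)  # caps in-flight requests
    results: dict[str, str] = {}
    failures: dict[str, Exception] = {}

    async def worker(url: str) -> None:
        async with semaphore:
            for attempt in range(max_retries + 1):
                try:
                    results[url] = await handler(url)
                    return
                except Exception as exc:  # sketch: treats every error as retryable
                    if attempt == max_retries:
                        failures[url] = exc  # give up after the last retry
                    else:
                        await asyncio.sleep(0.01 * 2**attempt)  # exponential backoff

    await asyncio.gather(*(worker(url) for url in urls))
    return results, failures


# Usage with a simulated handler (no network access).
attempts: dict[str, int] = {}


async def flaky_handler(url: str) -> str:
    attempts[url] = attempts.get(url, 0) + 1
    if url.endswith("/broken"):
        raise ValueError("simulated permanent failure")
    if url.endswith("/flaky") and attempts[url] < 3:
        raise ConnectionError("simulated transient failure")
    return f"page:{url}"


results, failures = asyncio.run(
    run_with_retries(
        ["https://example.com/ok", "https://example.com/flaky", "https://example.com/broken"],
        flaky_handler,
        max_concurrency=2,
    )
)
```

In this sketch a request keeps its concurrency slot while it backs off; a production crawler might instead return the failed request to its queue so other work can proceed in the meantime.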

Key Features

Multiple crawler types: high-performance HTTP crawlers and Playwright-based browser crawlers.
Async-first with type hints for improved developer experience and IDE support.
Built-in retries, request routing, and proxy/session management, the latter helping to reduce blocking.
Persistent storage options for datasets and key-value stores.
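The routing and proxy/session features listed above can be sketched in plain Python. This illustrates the general ideas only and is not Crawlee's implementation or API: `Request`, `Router`, `ProxyRotator`, and the proxy URLs are all hypothetical. Requests are dispatched to a handler by a label, and proxies are handed out round-robin, except that requests sharing a session ID reuse the same proxy so the target site sees a consistent client.

```python
from collections.abc import Callable
from dataclasses import dataclass
from itertools import cycle
from typing import Optional


@dataclass
class Request:
    url: str
    label: Optional[str] = None  # selects which handler processes the page
    session_id: Optional[str] = None  # requests in one session share a proxy


Handler = Callable[[Request], str]


class Router:
    """Dispatches each request to the handler registered for its label."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}
        self._default: Optional[Handler] = None

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        def register(fn: Handler) -> Handler:
            self._handlers[label] = fn
            return fn

        return register

    def default_handler(self, fn: Handler) -> Handler:
        self._default = fn
        return fn

    def __call__(self, request: Request) -> str:
        fn = self._handlers.get(request.label) if request.label else None
        fn = fn or self._default  # fall back when no label or no match
        if fn is None:
            raise LookupError(f"no handler for label {request.label!r}")
        return fn(request)


class ProxyRotator:
    """Round-robin proxy choice; a session keeps the first proxy it was given."""

    def __init__(self, proxy_urls: list[str]) -> None:
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._pool = cycle(proxy_urls)
        self._sticky: dict[str, str] = {}

    def proxy_for(self, request: Request) -> str:
        if request.session_id is None:
            return next(self._pool)
        if request.session_id not in self._sticky:
            self._sticky[request.session_id] = next(self._pool)
        return self._sticky[request.session_id]


# Usage: placeholder proxy URLs, no network access.
router = Router()


@router.handler("DETAIL")
def detail(req: Request) -> str:
    return f"detail:{req.url}"


@router.default_handler
def listing(req: Request) -> str:
    return f"listing:{req.url}"


rotator = ProxyRotator(["http://proxy-a:8000", "http://proxy-b:8000"])
queue = [
    Request("https://example.com/"),
    Request("https://example.com/item/1", label="DETAIL", session_id="s1"),
    Request("https://example.com/item/2", label="DETAIL", session_id="s1"),
    Request("https://example.com/page/2"),
]
log = [(router(req), rotator.proxy_for(req)) for req in queue]
```

Keeping a session on one proxy is a deliberate choice: a client whose IP changes between requests that share cookies can look suspicious to some sites.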

Use Cases

Large-scale web scraping for training data, RAG pipelines, or analytics.
JavaScript-heavy pages and user-interaction simulation (PlaywrightCrawler).
Deploying long-running crawlers on the Apify platform or in self-hosted environments.

Technical Details

Python implementation built on asyncio that integrates with Playwright, BeautifulSoup, and async HTTP clients.
CLI templates and quickstart tools to bootstrap crawler projects.
Extensible storage backends and robust error handling for production deployments.
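The idea of persistent, swappable storage can be illustrated with a small standard-library sketch. It does not reproduce Crawlee's storage classes or on-disk layout: `DatasetBackend`, `MemoryDataset`, `JsonLinesDataset`, `JsonKeyValueStore`, and `save_results` are hypothetical. Crawler-side code depends only on a `Protocol`, so an in-memory backend and a file-backed one are interchangeable, and the file-backed versions keep data across process restarts.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Protocol


class DatasetBackend(Protocol):
    """Any object with these two methods can serve as a dataset."""

    def append(self, item: dict[str, Any]) -> None: ...
    def items(self) -> list[dict[str, Any]]: ...


class MemoryDataset:
    """Keeps items in memory only; useful for tests."""

    def __init__(self) -> None:
        self._items: list[dict[str, Any]] = []

    def append(self, item: dict[str, Any]) -> None:
        self._items.append(dict(item))

    def items(self) -> list[dict[str, Any]]:
        return list(self._items)


class JsonLinesDataset:
    """Appends each item as one JSON line, so data survives restarts."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def append(self, item: dict[str, Any]) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")

    def items(self) -> list[dict[str, Any]]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


class JsonKeyValueStore:
    """Stores each key as its own JSON file (keys assumed to be safe file names)."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, value: Any) -> None:
        (self.directory / f"{key}.json").write_text(json.dumps(value), encoding="utf-8")

    def get(self, key: str, default: Any = None) -> Any:
        path = self.directory / f"{key}.json"
        return json.loads(path.read_text(encoding="utf-8")) if path.exists() else default


def save_results(dataset: DatasetBackend, rows: list[dict[str, Any]]) -> int:
    """Crawler-side code sees only the protocol, not the concrete backend."""
    for row in rows:
        dataset.append(row)
    return len(dataset.items())


# Usage in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save_results(
        JsonLinesDataset(root / "datasets" / "default.jsonl"),
        [{"url": "https://example.com/", "title": "Example"}],
    )
    # A new object on the same file sees the earlier item.
    persisted = JsonLinesDataset(root / "datasets" / "default.jsonl").items()
    kv = JsonKeyValueStore(root / "kv")
    kv.put("STATE", {"pages_done": 1})
    state = kv.get("STATE")
    missing = kv.get("NOPE", default={})

memory_count = save_results(MemoryDataset(), [{"a": 1}, {"a": 2}])
```

Appending one JSON line per item avoids rewriting the file on every write; `items()` re-reads the whole file, which is fine for a sketch but would not suit very large datasets.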

Related Resources

json-render

UI/UX Pro Max Skill

aicodeprep-gui