DocPull

Web crawler for AI-ready Markdown

DocPull is an async web scraping package that converts entire sites into clean, AI-ready Markdown. It is built for feeding documentation, knowledge bases, and web content into LLMs and RAG pipelines.

Features

  • Async crawling with configurable concurrency
  • Converts HTML to clean, structured Markdown
  • MCP server support
  • Crawl profiles for different site types
  • Built-in caching layer
  • RAG-optimized output format
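
To illustrate what "configurable concurrency" means in an async crawler, here is a minimal sketch using only Python's standard library. It is not DocPull's API: the names `crawl`, `fake_fetch`, and the `concurrency` parameter are hypothetical, and the fetch function is a stand-in that makes no network requests.

```python
import asyncio


async def crawl(urls, fetch, concurrency=4):
    """Fetch every URL with at most `concurrency` fetches in flight.

    Hypothetical helper for illustration only; not part of DocPull.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded_fetch(url):
        # The semaphore blocks here once `concurrency` fetches are running.
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(pairs)


async def fake_fetch(url):
    # Stand-in for a real HTTP request (assumption: no network access).
    await asyncio.sleep(0.01)
    return f"# Page at {url}"


pages = asyncio.run(
    crawl(
        ["https://example.com/a", "https://example.com/b"],
        fake_fetch,
        concurrency=2,
    )
)
```

Raising `concurrency` lets more pages download in parallel, while lowering it is gentler on the target site. A real crawler would replace `fake_fetch` with an HTTP client call and an HTML-to-Markdown conversion step.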

Why we built it

We needed a reliable way to turn documentation sites into Markdown that LLMs could actually use. Existing scrapers produced messy output full of navigation chrome and broken formatting. DocPull strips each page down to the content that matters.