Web Crawler
Crawl websites, documentation sites, and help centers starting from one or more seed URLs.
Crawl Scope
| Scope | Allows | Blocks |
|---|---|---|
| Same Domain | All subdomains under the root domain | Other domains |
| Same Hostname | Exact subdomain only | Other subdomains, other ports |
| Same Origin | Exact hostname, protocol, and port | Everything else |
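The three scopes can be sketched as URL checks. This is a hypothetical illustration of the rules in the table above (function and scope names are made up, and the root-domain extraction is deliberately naive), not the crawler's actual implementation:

```python
from urllib.parse import urlsplit

def in_scope(seed: str, candidate: str, scope: str) -> bool:
    """Return True if candidate falls inside the given crawl scope of seed."""
    s, c = urlsplit(seed), urlsplit(candidate)
    if scope == "same_origin":
        # Protocol, hostname, and port must all match exactly.
        return (s.scheme, s.hostname, s.port) == (c.scheme, c.hostname, c.port)
    if scope == "same_hostname":
        # Exact subdomain only: docs.example.com does not match example.com,
        # and a different port is out of scope.
        return (s.hostname, s.port) == (c.hostname, c.port)
    if scope == "same_domain":
        # Any subdomain under the seed's root domain. Naive last-two-labels
        # split; a real implementation would consult the Public Suffix List
        # to handle domains like example.co.uk.
        root = ".".join(s.hostname.split(".")[-2:])
        return c.hostname == root or c.hostname.endswith("." + root)
    raise ValueError(f"unknown scope: {scope}")
```

For example, with a seed of `https://example.com/`, Same Domain admits `https://docs.example.com/guide` while Same Hostname rejects it.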
Render Mode
| Mode | Use when |
|---|---|
| Automatic | Default. Decides per page whether to render JavaScript |
| JavaScript | Single-page apps and dynamic content |
| No JavaScript | Static sites (faster) |
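A crude sketch of the kind of per-page decision Automatic mode makes. The actual detection logic is not documented here; this heuristic (strip scripts and tags, then measure remaining visible text, with a made-up threshold) is purely illustrative:

```python
import re

def choose_render_mode(html: str) -> str:
    """Guess whether a fetched page needs JavaScript rendering."""
    # Remove script blocks, then all remaining tags, leaving visible text.
    visible = re.sub(r"<script\b.*?</script>", "", html, flags=re.S | re.I)
    visible = re.sub(r"<[^>]+>", " ", visible)
    word_count = len(visible.split())
    # A single-page-app shell typically ships almost no server-rendered
    # text; 50 words is an arbitrary cutoff for this sketch.
    return "javascript" if word_count < 50 else "no_javascript"
```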
URL Controls
- URL Limit. Maximum pages to crawl (1–20,000). Start small (50–100) to verify your config, then increase.
- Concurrency Limit. Simultaneous requests (1–50). Use lower values to avoid overloading the target site.
- Query Aware. Treat URLs with different query strings as separate pages.
- Fragment Aware. Treat URL fragments (#section) as separate pages.
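The two awareness toggles effectively change what counts as "one page" for deduplication. A minimal sketch, assuming a canonical-key approach (the function name and defaults are hypothetical, not the crawler's API):

```python
from urllib.parse import urlsplit, urlunsplit

def page_key(url: str, query_aware: bool = False,
             fragment_aware: bool = False) -> str:
    """Canonical dedup key for a crawled URL.

    With both toggles off, URLs differing only in query string or
    fragment collapse to a single page.
    """
    parts = urlsplit(url)
    query = parts.query if query_aware else ""
    fragment = parts.fragment if fragment_aware else ""
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, fragment))
```

With the defaults, `https://example.com/docs?page=2` and `https://example.com/docs` map to the same key and are crawled once; enabling Query Aware keeps them distinct.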