Q: Limit to 600 page crawls per day?

As a crawler operator, I want to be respectful of crawling limits and of the operators of the websites I'm crawling. I found a way of limiting that is new to me: pages per day. What is the best way to achieve this?

Would it be setting the page delay to ~150 s? Since 86,400 s / 600 pages = 144 s per page, a 150 s delay should keep the crawl just under 600 pages per rolling 24-hour period.
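A minimal sketch of that arithmetic (the function names are illustrative, not part of Browsertrix): converting a daily page budget into a per-page delay, and back.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def delay_for_page_budget(pages_per_day: int) -> float:
    """Minimum per-page delay (in seconds) to stay within the daily budget."""
    return SECONDS_PER_DAY / pages_per_day

def pages_per_day_for_delay(delay_seconds: float) -> int:
    """Pages completed in 24 hours if the crawler fetches one page per delay."""
    return int(SECONDS_PER_DAY // delay_seconds)

print(delay_for_page_budget(600))   # 144.0 s exactly for 600 pages/day
print(pages_per_day_for_delay(150)) # 576 pages/day with a 150 s delay
```

So a 150 s delay leaves a little headroom (576 pages) below the 600-page ceiling.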

Interesting. We don't currently have that as a feature, though it seems somewhat related to this issue: [Feature]: Only Archive New URLs · Issue #1372 · webrecorder/browsertrix-cloud · GitHub

Right now I think you've come up with a good workaround! It will make the crawl take a long time, but perhaps that's not an issue for you since you run your own infrastructure?