Hi all,
I’m a new user of the Browsertrix Crawler project - and it is incredibly impressive!
I’m wondering if there is a way to retry pages that fail during a crawl. For example, during a recent crawl of a website with 815 pages, I got a single error:
Load timeout for https://www.examplewebsite.com/fatigued-driving.html TimeoutError: Navigation timeout of 90000 ms exceeded
at /app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111
URL Load Failed: https://www.examplewebsite.com/fatigued-driving.html, Reason: Error: Timeout hit: 180000
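For context, this is roughly how I invoked the crawler, following the Docker example from the README (the collection name and local volume path are placeholders; I left the timeout and everything else at the defaults):

# Original crawl (placeholder collection name and local path)
docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl \
  --url https://www.examplewebsite.com/ \
  --collection example-site \
  --generateWACZ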
There’s no issue with the page itself: I can access it in a browser and it loads in a reasonable amount of time, yet for whatever reason the crawler’s request timed out. That leads me to two questions:
- Is there a mechanism for retrying failed pages?
- If not, is there a way to update an existing archive? That is, to recrawl the failed pages separately and then add them to the archive (roughly along the lines of the sketch below)?
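To make that second question concrete, this is the kind of thing I had in mind, assuming (and this is exactly the part I’m unsure about) that recrawling just the failed URL into the same collection, with a longer --timeout and --scopeType page, would let its WARC data end up alongside the original crawl’s when the WACZ is regenerated:

# Hypothetical retry of only the failed page, written into the existing collection
docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl \
  --url https://www.examplewebsite.com/fatigued-driving.html \
  --scopeType page \
  --timeout 180 \
  --collection example-site \
  --generateWACZ

If there’s a supported way to do this, or a built-in retry option I’ve overlooked, that would be even better.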
Any guidance would be appreciated, and thanks for the amazing project!