Crawl not running according to scope

I ran a crawl with the Custom Page Prefix set to https://med.stanford.edu/whsdm.html and Extra URL Prefixes set to https://med.stanford.edu/whsdm/ and https://med.stanford.edu/content/sm/whsdm/.

However, I noticed that many out-of-scope pages were included in the crawl, such as https://med.stanford.edu/profiles and https://med.stanford.edu/health-care.html.
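
For reference, here is a minimal sketch of the behavior I expected, assuming scope is decided by a plain string-prefix match against the configured prefixes. This is only an illustration, not Browsertrix's actual implementation; the `in_scope` helper and `PREFIXES` list are hypothetical names.

```python
# Hypothetical illustration of prefix-based scoping.
# Assumption: a URL is in scope if it starts with any configured prefix.
# This is NOT Browsertrix's real scoping code.

PREFIXES = [
    "https://med.stanford.edu/whsdm.html",         # Custom Page Prefix
    "https://med.stanford.edu/whsdm/",             # Extra URL Prefix
    "https://med.stanford.edu/content/sm/whsdm/",  # Extra URL Prefix
]


def in_scope(url: str, prefixes: list[str] = PREFIXES) -> bool:
    """Return True if the URL begins with at least one of the prefixes."""
    return any(url.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    for url in [
        "https://med.stanford.edu/whsdm/about.html",
        "https://med.stanford.edu/profiles",
        "https://med.stanford.edu/health-care.html",
    ]:
        print(url, "in scope" if in_scope(url) else "out of scope")
```

Under that assumption, neither of the two pages mentioned above would match any prefix, which is why I expected them to be skipped.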

Any idea what might have gone wrong?


Hi Peter,

I was able to reproduce this issue and found the underlying bug. I’ve filed an issue for it and will submit a fix shortly for inclusion in an upcoming release: "[Bug]: Custom Page Prefix scope expands unexpectedly" (Issue #2721, webrecorder/browsertrix on GitHub).

Thanks for pointing this out!
