I ran a crawl with the Custom Page Prefix set to https://med.stanford.edu/whsdm.html and Extra URL Prefixes set to https://med.stanford.edu/whsdm/ and https://med.stanford.edu/content/sm/whsdm/.
However, I noticed that many out-of-scope pages were included in the crawl, such as https://med.stanford.edu/profiles and https://med.stanford.edu/health-care.html.
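
As a sanity check, here is a minimal sketch of the scoping I expected. It assumes a URL is in scope only if it starts with one of the configured prefixes. The `in_scope` helper is hypothetical and is just plain string matching, not the crawler's actual logic. Under that assumption, neither of the two example pages should have been crawled.

```python
# Scope prefixes exactly as configured for the crawl.
PREFIXES = [
    "https://med.stanford.edu/whsdm.html",
    "https://med.stanford.edu/whsdm/",
    "https://med.stanford.edu/content/sm/whsdm/",
]


def in_scope(url: str) -> bool:
    """Hypothetical helper: assumes scoping is a plain string-prefix match."""
    return any(url.startswith(prefix) for prefix in PREFIXES)


# The two pages that unexpectedly showed up in the crawl.
for url in [
    "https://med.stanford.edu/profiles",
    "https://med.stanford.edu/health-care.html",
]:
    print(url, "in scope" if in_scope(url) else "out of scope")
```
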
Any idea what might have gone wrong?