Hello! This may be related to this topic, but I wasn’t sure, so I started a new thread:
I’m trying to crawl a set of pages using a config YAML file. Based on how the set of pages is architected, I think I need to start with a seed and take two hops, while excluding all pages that do not start with one of two patterns. (A domain-type crawl will not work.)
I’m using these patterns under the exclude portion of the config file:
^pattern1.org.*
caret pattern2 dot org dot asterisk (Spelling it out bc I hit an error if I submit more than 2 "url"s.)
(Edit: I’ve tried various other versions of “not this.”)
These regexes work in a regex checker, though I understand mileage may vary with regex! I’d hoped they would exclude the capture of all pages not under one of the two domains, but it’s not working that way.
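To make the question concrete, here is a minimal Python sketch of how I understand the matching. It rests on assumptions I have not confirmed: that the crawler tests each exclude regex against the full URL including the scheme, and that a match means "skip this page." The URLs and the `inverted` pattern are hypothetical illustrations, not anything taken from the crawler's documentation.

```python
import re

# Hypothetical URLs; pattern1.org and pattern2.org are the placeholders from this post.
urls = [
    "https://pattern1.org/page",
    "https://pattern2.org/docs/a",
    "https://other.example/page",
]

# The two patterns as written in the exclude section.
as_written = [re.compile(r"^pattern1.org.*"), re.compile(r"^pattern2.org.*")]

# Assumed alternative: a negative lookahead that matches any URL which does
# NOT start with one of the two domains (so those URLs would be excluded).
inverted = re.compile(r"^(?!https?://(pattern1\.org|pattern2\.org)(/|$))")


def excluded_by(patterns, url):
    """Return True if any pattern matches the URL (i.e. the URL would be excluded)."""
    return any(p.search(url) for p in patterns)


def report(candidates):
    """Map each URL to (excluded by the patterns as written, excluded by the lookahead)."""
    return {u: (excluded_by(as_written, u), bool(inverted.search(u))) for u in candidates}


if __name__ == "__main__":
    for url, (written, lookahead) in report(urls).items():
        print(f"{url}: as written={written}, lookahead={lookahead}")
```

If those assumptions hold, the patterns as written never match a full URL, because it begins with `https://` rather than the domain. Even against a bare `pattern1.org/...` string they would exclude the pages I want to keep rather than everything else. The lookahead version is one way to express "not under either domain," if the crawler's regex engine supports lookaheads.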
Is the kind of crawl I’m hoping to do possible? Is there a problem with the regex?
Thanks!