I've been struggling to crawl a WordPress site whose crappy theme produces an endless loop of links with many identical path segments.
What would be the preferred way to implement something like Heritrix's PathologicalPathDecideRule in Browsertrix? A JS behaviour?
For reference, PathologicalPathDecideRule REJECTs any URI that contains an excessive number of identical, consecutive path segments (e.g. http://example.com/a/a/a/boo.html has 3 consecutive '/a' segments).
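
To make clear what I'm after, here is a rough Python sketch of the check itself, independent of any crawler. The function names and the default threshold of 2 are my own assumptions for illustration; this is not Heritrix's actual code, nor anything that exists in Browsertrix.

```python
from urllib.parse import urlsplit


def max_consecutive_repeats(url: str) -> int:
    """Return the length of the longest run of identical, consecutive path segments."""
    # Split the path on "/" and ignore empty segments (leading or doubled slashes).
    segments = [s for s in urlsplit(url).path.split("/") if s]
    longest = 0
    run = 0
    previous = None
    for segment in segments:
        # Extend the current run if the segment repeats, otherwise start a new run.
        run = run + 1 if segment == previous else 1
        longest = max(longest, run)
        previous = segment
    return longest


def is_pathological(url: str, max_repetitions: int = 2) -> bool:
    """Hypothetical reject test: True when a segment repeats more than max_repetitions times in a row."""
    # The threshold of 2 is an assumed default, chosen only for this example.
    return max_consecutive_repeats(url) > max_repetitions


if __name__ == "__main__":
    for candidate in (
        "http://example.com/a/a/a/boo.html",
        "http://example.com/a/b/a/boo.html",
    ):
        print(candidate, is_pathological(candidate))
```

With the example URL above, `/a` repeats three times in a row, so it would be rejected under the assumed threshold, while `/a/b/a/` would pass because the repeats are not consecutive.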