PathologicalPathDecideRule on Browsertrix

I’ve been struggling to crawl a WordPress site whose poorly built theme produces an endless loop of links containing many identical path segments.

What would be the preferred way to implement something like Heritrix’s PathologicalPathDecideRule in Browsertrix? A JS behaviour?

PathologicalPathDecideRule

Rule REJECTs any URI which contains an excessive number of identical, consecutive path-segments (e.g. http://example.com/a/a/a/boo.html == 3 ‘/a’ segments)

https://heritrix.readthedocs.io/en/latest/bean-reference.html?highlight=PathologicalPathDecideRule#pathologicalpathdeciderule
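
To make the behaviour I’m after concrete, here is a rough Python sketch of the check as I read the description above. It is not Heritrix’s actual implementation: the function name `is_pathological`, the `min_run` parameter, and its default of 3 are my own assumptions for illustration. The pattern relies only on a backreference and a bounded quantifier, both of which JavaScript regexes also support, so the same idea might carry over to a regex-based URL exclusion rule if the crawler accepts one (I have not verified that).

```python
import re
from urllib.parse import urlsplit


def is_pathological(url: str, min_run: int = 3) -> bool:
    """Return True if the URL path repeats one segment min_run or more times in a row.

    min_run is a hypothetical threshold chosen for this sketch.
    """
    if min_run < 2:
        raise ValueError("min_run must be at least 2")

    # Capture one segment with its trailing slash, then require at least
    # (min_run - 1) immediate copies of it via the backreference \1.
    pattern = re.compile(r"/([^/]+/)\1{" + str(min_run - 1) + r",}")

    # Only look at the path, and append a slash so the final segment
    # is also delimited and can take part in a run.
    path = urlsplit(url).path
    if not path.endswith("/"):
        path += "/"

    return pattern.search(path) is not None


if __name__ == "__main__":
    for candidate in (
        "http://example.com/a/a/a/boo.html",
        "http://example.com/a/a/boo.html",
        "http://example.com/blog/page/page/page/page/",
    ):
        print(candidate, is_pathological(candidate))
```

Note that this only catches a single segment repeated back to back; a loop that alternates segments (for example /a/b/a/b/a/b/) would need a broader pattern.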
