Browsertrix has been very impressive. There were a few pages that weren’t indexed. I can’t tell why from the logs, but let’s just assume it was a hiccup. Is there a way to run the crawl again but only add new or modified resources? I see that if I go to the Workflow I can hit “Run Crawl”, but I was afraid doing so would add duplicates of all my pages, even the unchanged ones, doubling the size of my WACZ.
Running the crawl again will create a new archived item with its own WACZ files. It will likely result in capturing a lot of duplicate content.
Is there a way to run the crawl again but only add new or modified resources?
At present, no. Browsertrix does not deduplicate content.
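For anyone wondering what “only add new or modified resources” would involve: the usual approach is to compare each re-crawled resource against a record of what was already captured, keyed by URL and a digest of the response body. WARC files do record payload digests, but Browsertrix does not currently use them to skip unchanged content. The following is a minimal, hypothetical Python sketch of that comparison, not a Browsertrix feature or API. It assumes a SHA-256 digest per resource, and the names `payload_digest` and `new_or_modified` are made up for illustration.

```python
import hashlib
from typing import Dict, List, Tuple


def payload_digest(body: bytes) -> str:
    """Hypothetical helper: SHA-256 digest string for a response body."""
    return "sha256:" + hashlib.sha256(body).hexdigest()


def new_or_modified(
    previous: Dict[str, str],
    current: List[Tuple[str, bytes]],
) -> List[str]:
    """Return URLs from a re-crawl whose content is new or changed.

    previous: URL -> digest from an earlier crawl (assumed input format).
    current:  (URL, response body) pairs captured by the re-crawl.
    """
    changed = []
    for url, body in current:
        digest = payload_digest(body)
        # Keep the resource only if the URL was never captured
        # or its content digest differs from the earlier capture.
        if previous.get(url) != digest:
            changed.append(url)
    return changed


if __name__ == "__main__":
    first_crawl = {
        "https://example.com/": payload_digest(b"<h1>Home</h1>"),
        "https://example.com/about": payload_digest(b"<p>About</p>"),
    }
    recrawl = [
        ("https://example.com/", b"<h1>Home</h1>"),          # unchanged, skipped
        ("https://example.com/about", b"<p>About us</p>"),   # modified, kept
        ("https://example.com/missed", b"<p>New page</p>"),  # previously missed, kept
    ]
    print(new_or_modified(first_crawl, recrawl))
    # prints ['https://example.com/about', 'https://example.com/missed']
```

Without a step like this, every page in the re-crawl is written to the new WACZ regardless of whether it changed, which is why running the workflow again duplicates content.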