Known bug(s)?: Replay component Pages-search capped at 25; Resources search very limited

We’re using Replay components to archive some multi-page projects, like this one:

The 500s are mostly CSV render paths, a known bug in the original app; extraPages.jsonl has 818 of them. But the search features can’t reliably help get that number on their own, despite indicating they should be able to:

  • The Pages search caps at 25 no matter what; replayweb.page struggles with this, too, on any uploaded WACZ
  • The Resources search can’t seem to match in-HTML or URL patterns, not without hijacking the Replay component’s menu bar with a browser-level query like search://query=including&view=resources&currMime=text/html,text/xhtml&urlSearchType=contains (and a csv query returns 0 results anyway)

Which are bugs and which are under-documented features? It seems like the Replay component is essentially querying against /pages/ data, without needing a CLI or a multi-step Sheets import. So I hope to use it to train digital-friendly non-coders to QA these crawls.
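
For comparison, here is a rough sketch of the scripted fallback that the in-component search would ideally make unnecessary: it reads pages/extraPages.jsonl straight out of the WACZ (which is a ZIP file) and tallies pages by status and by URL substring. The `status` field name, the `iter_pages` and `summarize` helpers, and the `archive.wacz` filename are assumptions for illustration; the actual fields in a given crawl's page list may differ.

```python
import io
import json
import zipfile
from collections import Counter


def iter_pages(wacz_path, member="pages/extraPages.jsonl"):
    """Yield one dict per page record from a JSONL page list inside a WACZ."""
    with zipfile.ZipFile(wacz_path) as zf:
        with zf.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                # The first line is a header describing the list, not a page,
                # so skip anything without a URL.
                if "url" not in record:
                    continue
                yield record


def summarize(pages, pattern="csv"):
    """Count pages by status and collect URLs containing `pattern`."""
    by_status = Counter()
    matches = []
    for page in pages:
        # `status` is an assumed field name; records lacking it count as None.
        by_status[page.get("status")] += 1
        if pattern.lower() in page["url"].lower():
            matches.append(page["url"])
    return by_status, matches
```

Calling `summarize(iter_pages("archive.wacz"))` would return the per-status counts and the list of URLs containing "csv", which is the number the Pages and Resources searches should ideally surface without a script.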

Hi,

Thanks for reaching out - excited to see archives used in ProPublica!

Yes, this is likely a bug. It should be possible to see the total number of pages and to scroll beyond 25; 25 was just the initial view for faster rendering, but perhaps something ended up broken.

This is probably a bug as well. The resource search view probably needs improvement; it is one area we haven’t had an opportunity to update recently. But deepLink should allow linking to that particular view, so it’s likely a fixable bug.

Thanks for reporting these issues - if you don’t mind opening an issue on GitHub, that would be easier to track, or we can open one as well!

Will do, thank you for the quick reply.

GitHub · Where software is built, correct?
