We’re using Replay components to archive some multi-page projects, like this one:
The 500s are mostly CSV render paths, a known bug in the original app; extraPages.jsonl has 818 of them. But the search features can’t reliably produce that number on their own, despite indicating that they should be able to:
The Pages search caps at 25 no matter what; replayweb.page struggles with this, too, on any uploaded WACZ.
The Resources search can’t seem to match in-HTML or URL patterns, not without hijacking the Replay component’s menu bar with a browser-level query like search://query=including&view=resources&currMime=text/html,text/xhtml&urlSearchType=contains (and a csv query returns 0 results, anyway).
Which are bugs and which are under-documented features? It seems like the Replay component is essentially querying against /pages/ data, without needing a CLI or a multi-step Sheets import. So I hope to be able to use it to train digital-friendly non-coders how to QA these crawls.
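For cross-checking the 818 figure outside the Replay component, here is a minimal Python sketch that tallies status codes straight from a pages JSONL file. It assumes each page record carries a `url` and a numeric `status` field and that any header line lacks a `url` key; that may not hold for every crawler version. The function names are hypothetical, just for illustration.

```python
import json
from collections import Counter


def count_statuses(lines):
    """Tally the `status` value of each page record in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        # Assumption: a header line (if present) has no "url" key, so skip it.
        if "url" not in record:
            continue
        # Assumption: page records expose the HTTP status as "status";
        # records without it are tallied under None.
        counts[record.get("status")] += 1
    return counts


def count_statuses_in_file(path):
    """Open a JSONL file and tally its page statuses."""
    with open(path, encoding="utf-8") as fh:
        return count_statuses(fh)
```

Calling `count_statuses_in_file("extraPages.jsonl")[500]` would then give the number of pages recorded with a 500 status, if the fields match the assumptions above.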
Thanks for reaching out - excited to see archives used in ProPublica!
Yes, this is likely a bug. It should be possible to see the total number of pages and to scroll beyond 25; 25 was just the initial view for faster rendering, but perhaps something ended up being broken.
This is probably a bug as well. The resource search view probably needs improvement; it’s one area we haven’t had an opportunity to update recently. But deepLink should allow linking to that particular view, so it’s likely a fixable bug.
Thanks for reporting these issues. If you don’t mind opening an issue on GitHub, that would be easier to track, or we can open one as well!