Replay a 440GB WACZ?

Hi everyone,
I have been archiving a large website using Browsertrix. It finished the job after 3 days, successfully creating a 440 GB WACZ file from 425 WARC files. I would like to open and review it, so I tried the Webrecorder desktop app, but so far loading has failed: after about 5 minutes it just shows a blank screen.
I also tried uploading it to a locally deployed Browsertrix Cloud instance, but after the upload reaches 440 of 440 GB nothing else happens; it just gets stuck there.
Both methods worked perfectly for a 14 GB test WACZ of that website generated with the same configuration.
Any ideas how I could handle such a large file?

On our server we set the crawler to create a new WACZ file after crawling 10GB of content to avoid these issues. Multiple WACZs can then be loaded with a replay.json file in ReplayWeb.page.
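
For anyone who wants to try this, here is a rough sketch of generating such a JSON file from a folder of WACZ files. The key names (`profile`, `resources`, `name`, `path`, `bytes`) reflect my understanding of the multi-WACZ format and should be treated as assumptions; `build_manifest`, `write_manifest`, and `base_url` are hypothetical names made up for this example. Please check the ReplayWeb.page documentation before relying on it.

```python
import json
from pathlib import Path


def build_manifest(wacz_dir, base_url=""):
    """Return a dict listing every .wacz file in wacz_dir, sorted by name."""
    resources = []
    for path in sorted(Path(wacz_dir).glob("*.wacz")):
        resources.append({
            "name": path.name,
            # base_url is a hypothetical prefix, e.g. where the files are served from
            "path": base_url + path.name,
            "bytes": path.stat().st_size,
        })
    # ASSUMPTION: the key names and profile value are not confirmed in this thread
    return {"profile": "multi-wacz-package", "resources": resources}


def write_manifest(wacz_dir, out_name="replay.json", base_url=""):
    """Write the manifest next to the WACZ files and return its path."""
    out_path = Path(wacz_dir) / out_name
    manifest = build_manifest(wacz_dir, base_url)
    out_path.write_text(json.dumps(manifest, indent=2))
    return out_path
```

Calling `write_manifest("/path/to/waczs")` would leave a `replay.json` in that folder, which can then be loaded in ReplayWeb.page in place of a single WACZ.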

If you open the WACZ file, extract the WARCs, and repackage them with py-wacz into a bunch of smaller WACZ files, that may get around some of the issues here? As for exactly what those issues are, I'm unsure.
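
A minimal sketch of the extract-and-split step, assuming the WARCs sit under `archive/` inside the WACZ (which is a ZIP file) and that roughly 10 GB per output is the target. The helper names `extract_warcs` and `batch_by_size` are made up for illustration. Each resulting batch would then be passed to py-wacz's `wacz create`; check `wacz create --help` for the exact arguments, since I have not spelled them out here.

```python
import shutil
import zipfile
from pathlib import Path


def extract_warcs(wacz_path, out_dir):
    """Copy every file under archive/ in the WACZ into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(wacz_path) as zf:
        for info in zf.infolist():
            if info.filename.startswith("archive/") and not info.is_dir():
                target = out_dir / Path(info.filename).name
                # Stream the member to disk so large WARCs are not held in memory
                with zf.open(info) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(target)
    return extracted


def batch_by_size(files_with_sizes, limit_bytes):
    """Group (name, size) pairs, in order, into batches of at most limit_bytes.

    A single file larger than the limit still gets a batch of its own.
    """
    batches, current, total = [], [], 0
    for name, size in files_with_sizes:
        if current and total + size > limit_bytes:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        batches.append(current)
    return batches
```

For example, `batch_by_size([(p, p.stat().st_size) for p in extract_warcs("crawl.wacz", "warcs")], 10 * 1024**3)` would give lists of WARC paths, one list per smaller WACZ to create (`crawl.wacz` and `warcs` are placeholder names).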

I think I'd file the uploading issue as a bug report? If files that big are out of scope, we should impose a size limit.

Thanks for the reply. I tried your method of loading a replay.json and it worked like a charm!

Hooray! I hope to have this documented better in the coming weeks :slight_smile: