Export a site out from a WACZ archive

wwahammy · October 8, 2025, 5:57pm

So I’m trying to export a site out of a WACZ archive captured with Browsertrix. As background, I have a website that is going to be shut down, and it has images and scripts from all kinds of domains. Browsertrix is perfect for that. The problem is we don’t want it to stay in a WACZ archive or be loaded inside a ReplayWeb.page page from a WACZ archive: we want to export everything out onto a static HTML site where everything is on the same domain. Since the page doesn’t really load any content via AJAX requests, I think this should be workable, but… is there a way to do that with a WACZ archive from Browsertrix?

I could just manually try to rewrite all of the references to all of the scripts but I figured if I could use Browsertrix for that, it’d be ideal. Any ideas?

ilya · October 10, 2025, 5:46pm

We don’t have a generic way of extracting data in this way. unwarcit can do this with WARC files, but the main issue is that there’s not a good way to do it in general: even without AJAX requests, the resources from other domains won’t work, or there may be JavaScript that makes certain assumptions, so there’s no way to guarantee the result.
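
If you want to try the unwarcit route anyway, note that a WACZ is a ZIP file that keeps its raw WARC files under an archive/ directory, so they can be pulled out first. Here is a minimal Python sketch of that step, using only the standard library. The function name extract_warcs and the paths in the usage note are made up for illustration, not part of any Webrecorder tool.

```python
import zipfile
from pathlib import Path


def extract_warcs(wacz_path, out_dir):
    """Copy every WARC file stored under archive/ in a WACZ into out_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(wacz_path) as wacz:
        for name in wacz.namelist():
            # A WACZ stores its raw captures under archive/; skip indexes, pages, etc.
            if name.startswith("archive/") and name.endswith((".warc", ".warc.gz")):
                target = out / Path(name).name
                target.write_bytes(wacz.read(name))
                extracted.append(target)
    return extracted
```

For example, extract_warcs("my-crawl.wacz", "warcs") (both names are placeholders) would leave the WARC files in warcs/, and each one could then be handed to unwarcit. Check the unwarcit README for the exact command, and expect the caveats above about cross-domain resources to still apply.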

But, we may have just the solution that does what you want!
You could try using: GitHub - webrecorder/web-archive-site-mirror
This will allow you to host the website from a WACZ file, including on GitHub Pages with your own domain.
You just need to fill in the details in the init.js and put the WACZ file somewhere, possibly on GitHub too if it’s small enough.

Here’s a concrete example: GitHub - webrecorder/infinite-ulysses-replay
with the site hosted at: https://infiniteulysses.webrecorder.net/

Hope this helps!