Wacz to Website

France · October 28, 2024, 1:43am

Hi everyone,

I’m a young French guy learning web dev and I really need some advice. I have a .wacz file, and I’m looking for a way to transform it into a full website, with an index.html file and all the assets (CSS, JS, images) already set up.

I’m working on this to better understand how web pages are built and how they work offline.

If anyone has some ideas or knows of tools that could help me, it would be very nice!

Thanks a lot in advance!

Hank · October 28, 2024, 4:00pm

Hey there (bonjour!) I think I can answer this, but there are a few things to know first:

WACZ files store their data in WARC files. WACZs are fundamentally ZIP files with a different extension, and include indexing data to ensure that only the data that users request is fetched from the file using HTTP range requests. This index allows them to be embedded on the web much more efficiently than WARC files themselves.
WARC files are basically really, really long text files. When you archive a website, every network request you make and every response you receive is saved to the WARC file. They are not simply a folder with a bunch of files in it. You can check this out for yourself by opening a WARC in a text editor, but I would recommend using one that is made for opening very large text files, like glogg.
In order to interpret WARC or WACZ files, you’ll need software that can reassemble an interactive website from the recorded network requests and responses. This is the job of ReplayWeb.page, pywb, or non-Webrecorder software such as Internet Archive’s Wayback Machine.
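
If you want to see the "ZIP with WARCs inside" structure for yourself, here is a minimal Python sketch using only the standard library. It assumes the WARC files sit under an archive/ folder inside the WACZ and end in .warc or .warc.gz; the function name summarize_wacz is just something I made up for illustration, not part of any Webrecorder tool.

```python
import gzip
import zipfile


def summarize_wacz(path):
    """Return (member_names, first_line_of_first_warc) for a WACZ file.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(path) as zf:  # a WACZ opens like any other ZIP
        names = zf.namelist()
        # Assumption: WARCs live under archive/ and end in .warc or .warc.gz
        warcs = [
            n for n in names
            if n.startswith("archive/") and n.endswith((".warc", ".warc.gz"))
        ]
        first_line = None
        if warcs:
            with zf.open(warcs[0]) as raw:
                # Gzipped WARCs are decompressed on the fly
                stream = gzip.open(raw) if warcs[0].endswith(".gz") else raw
                first_line = stream.readline().decode("utf-8", "replace").strip()
    return names, first_line
```

Calling summarize_wacz("my-archive.wacz") (the filename is a placeholder) should give you the list of entries in the ZIP plus the first line of the first WARC, which is normally a version line starting with "WARC/".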

Now that the primer is out of the way, there are two possible answers to your question.

You could embed the WACZ file with ReplayWeb.page and run it in replayonly mode with a configuration attribute. This will keep any captured server-side interactions intact, but it has a loading screen and may not behave exactly as the original website did when it comes to following links that lead outside of the archive, etc.
You could try using Emma Dickson’s unwarcit command line tool to extract the files from the WACZ to a standard directory and serve it as you would any other static website. This would likely do what you ask for in your post, but it may not give you the highest-fidelity version of the archived content.
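
For the second option, once the files are extracted you need some way to serve the folder locally. Here is a rough Python sketch using only the standard library. It assumes you already have a directory containing an index.html; the function name make_static_server is made up for illustration, and this is meant for local testing, not production hosting.

```python
import functools
import http.server


def make_static_server(directory, port=0):
    """Build an HTTP server that serves files from `directory`.

    Hypothetical helper for illustration. port=0 lets the OS pick a free port.
    """
    # Bind the handler to the extracted folder instead of the current directory
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To use it, call something like server = make_static_server("extracted-site", port=8000) and then server.serve_forever(), where "extracted-site" is a placeholder for wherever your files ended up. Opening http://127.0.0.1:8000/ in a browser should then show index.html.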

These two approaches have upsides and downsides. Try both and see what works best for your use case?