Replaying one capture that is broken down into many WARCs

Hi all,

I was wondering whether it is the case that ReplayWeb.page and ArchiveWeb.page cannot replay a capture that is split across many WARC or WARC.GZ files. When I click a link whose target is stored in a different file from the one currently being replayed, I get a "Resource not found"-type message.
I know that bundling all the files into a WACZ solves this replay issue, but I would also like to know whether there is any way to do this without using WACZ.

Of course, I am super happy WACZ exists and I hope it becomes the standard, but from a long-term archiving point of view, I think it is important that captured websites can be replayed in their entirety without depending on any one specific format!

Thanks for any responses!


It’s not possible yet, as we’d need additional UI for adding/removing WARC files in a single collection, but it’s something that could be supported with UI improvements, or with a manifest file.

Of course, the issue with multiple WARC files is that they all need to be read in full, which may take a long time; WACZ solves that by bundling an index alongside the WARC data.
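To illustrate why plain WARCs are slow to work with: without an index, the only way to find which file holds a given URL is to decompress and scan every file end to end. The minimal shell sketch below assumes GNU `gzip` and `grep`, gzip-compressed `.warc.gz` inputs, and header lines of the form `WARC-Target-URI: <url>` (with or without angle brackets). The function name `find_warc_for_url` is hypothetical, and this is not how ReplayWeb.page loads files internally; it can, however, help track down which file holds a link that shows "Resource not found".

```shell
# Hypothetical helper: print every .warc.gz file whose headers contain
# a WARC-Target-URI line exactly matching the given URL.
# Usage: find_warc_for_url URL FILE...
# Every file is decompressed and read to the end, since there is no index.
find_warc_for_url() {
    url=$1
    shift
    for f in "$@"; do
        # WARC header lines end in CRLF, so strip \r before whole-line matching.
        # Some writers wrap the URI in angle brackets, so accept both forms.
        if gzip -dc -- "$f" | tr -d '\r' |
            grep -x -F -e "WARC-Target-URI: $url" -e "WARC-Target-URI: <$url>" >/dev/null
        then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `find_warc_for_url https://example.com/about *.warc.gz` lists the files that contain a record for that URL. The cost grows with the total size of all WARCs, which is exactly what an index avoids.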

A quick solution is to combine all WARC files into one, which can be done from the command line, for example:
mkdir -p ./combined
cat *.warc.gz > ./combined/all.warc.gz
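Concatenating works because the gzip format allows several compressed members to follow one another in a single valid file, and `.warc.gz` files are normally compressed one record per member, so record boundaries survive the merge. Below is a slightly more defensive sketch, assuming GNU `gzip` and coreutils and that every input is a `.warc.gz` (mixing in uncompressed `.warc` files would produce a broken result). The function name `combine_warcs` is hypothetical.

```shell
# Hypothetical helper: concatenate gzip-compressed WARCs into one file.
# Usage: combine_warcs OUTPUT INPUT...
combine_warcs() {
    out=$1
    shift
    # Refuse to start if any input is not a valid gzip stream.
    for f in "$@"; do
        if ! gzip -t -- "$f"; then
            echo "not a valid gzip file: $f" >&2
            return 1
        fi
    done
    # Create the output directory if it does not exist yet.
    mkdir -p -- "$(dirname -- "$out")" || return 1
    # Concatenated gzip members form a valid multi-member gzip file.
    cat -- "$@" > "$out"
}
```

For example, `combine_warcs ./combined/all.warc.gz *.warc.gz`. Writing the output into a separate directory keeps it out of the `*.warc.gz` glob, so the combined file is never read as one of its own inputs.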

Will let you know if/when this feature is added!
