Differences between WACZ- and WARC-file for X

Hi everyone,

We captured several x.com accounts and noticed that we could replay the WACZ file, but the WARC file gives an error. Do you have any idea why the replay is different for WACZ and WARC files? For Instagram we don’t have this issue. This matters to us because we want to index these archives in SolrWayback, which uses WARC files.

Thank you very much for your help!

Sophie Bossaert (ADVN)

Hi Sophie,

Were you using ReplayWeb.page to view the WACZ and WARC files, or another replay system? One of the advantages of WACZ is that it bundles its own indexes and page lists to support easier replay. If you uploaded just the WARC to ReplayWeb.page (RWP), I suspect the replay would be much better in a system like pywb, where the WARC is also indexed. More details would help confirm this or point to other potential causes :)
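
To see what "bundles its own indexes and page lists" means in practice: a WACZ is a ZIP file, so you can list its contents and check for the `indexes/` and `pages/` directories next to the WARC data in `archive/`. The following is only a rough sketch using Python's standard library. The function names `summarize_wacz` and `has_bundled_index` are made up for illustration and are not part of any Webrecorder tool.

```python
import zipfile
from collections import defaultdict


def summarize_wacz(path_or_file):
    """Group the entries of a WACZ (a ZIP file) by top-level directory.

    Hypothetical helper for illustration; accepts a path or a file-like object.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top, sep, _rest = name.partition("/")
            # Files without a directory (e.g. datapackage.json) go under "(root)"
            groups[top if sep else "(root)"].append(name)
    return dict(groups)


def has_bundled_index(groups):
    """True if the archive has entries under both indexes/ and pages/."""
    return bool(groups.get("indexes")) and bool(groups.get("pages"))


# Example usage (the file name is a placeholder):
# groups = summarize_wacz("my-capture.wacz")
# for top, names in sorted(groups.items()):
#     print(top, names)
# print("bundled index and page list:", has_bundled_index(groups))
```

A bare WARC has no such companion files, so a replay tool has to work out the index and the list of pages on its own when the file is loaded.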

Hi Tessa,

Thank you for your response. I was using ReplayWeb.page to replay the WACZ and WARC files. I haven’t installed pywb yet. Would it help if I sent you the files?

Regards,

Sophie

Hi Sophie,

No need to install pywb. The point I was trying to convey is that replaying a WARC by itself in ReplayWeb.page will be a different experience than replaying it in a system like pywb or SolrWayback that takes advantage of indexes (which WACZs bundle in the same file). I suspect the WARC replay will be better in SolrWayback with indexes, though I haven’t used SolrWayback personally, so I’m not sure exactly how well its replay system handles X.

If you’d like to send the files to tessa at webrecorder dot org, I’d be happy to take a look.

Hi Tessa,

Thank you for the information! I will send you the files today.

Regards,

Sophie