How to Modify WACZ Files

I would like to modify WACZ files programmatically, both to remove certain URLs so the dumps are anonymized and to strip out data that is irrelevant to the dump.

It doesn’t look like any libraries exist for modifying WACZ files. The only solution I can think of is to extract the WARC file from a WACZ, manually modify the WARC file, and then convert it back.
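As a rough sketch of the extraction step: a WACZ is a ZIP file whose WARC files live under `archive/`, so Python's standard `zipfile` module is enough to pull them out. The function name `extract_warcs` and the output directory are illustrative assumptions, not part of any existing tool.

```python
import shutil
import zipfile
from pathlib import Path


def extract_warcs(wacz_path, out_dir):
    """Copy every WARC stored under archive/ in a WACZ into out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(wacz_path) as zf:
        for name in zf.namelist():
            # WACZ keeps its WARC files in the archive/ directory of the ZIP
            if name.startswith("archive/") and name.endswith((".warc", ".warc.gz")):
                # Use only the base name so a crafted member path cannot escape out_dir
                target = out / Path(name).name
                with zf.open(name) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                written.append(target)
    return written
```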


This would indeed be the way to go. Would recommend checking out warcio if you haven’t already.