Continued problems with warcit and wacz

I’ve finally gotten warcit to create warc files. The attached Goldenage warc seems to work fine loading the URLs, although no pages seem to have been defined. The WTICAlumni warc won’t load any of the pages it finds. Also, the resulting wacz doesn’t show anything, even when I used the --detect-pages option. Since I can’t upload files here, a small 780K file is at https://www.wticalumni.com/warc/Warc-Wacz_Problems.zip
Thanks.

Hey! I’ve also run into this issue on a personal archiving project. The way that warcit creates records isn’t completely compatible with py-wacz or ReplayWeb.page right now. The WARC file should work in ReplayWeb.page if you manually navigate to the correct URL, but it won’t show up in the pages list.
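In case it helps in the meantime, here is a rough sketch of writing a pages list by hand. It assumes the pages.jsonl layout from the WACZ spec as I understand it (one header line, then one JSON object per page with an id, url, and optional title and ts). The function name build_pages_jsonl and the example.com URLs are made up for illustration, and I haven’t checked this against warcit output specifically.

```python
import json


def build_pages_jsonl(pages):
    """Return pages.jsonl text: one header line, then one JSON line per page.

    `pages` is a list of dicts with a required "url" and optional
    "title" and "ts" (ISO 8601 timestamp) keys. The field names follow
    the WACZ pages.jsonl layout as assumed above.
    """
    header = {"format": "json-pages-1.0", "id": "pages", "title": "All Pages"}
    lines = [json.dumps(header)]
    for index, page in enumerate(pages):
        record = {
            "id": str(index),  # any id unique within the file
            "url": page["url"],
            # fall back to the URL when no title is known
            "title": page.get("title", page["url"]),
        }
        if "ts" in page:
            record["ts"] = page["ts"]
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical entry pages; replace with URLs that exist in your WARC.
    example_pages = [
        {"url": "https://example.com/", "title": "Home"},
        {"url": "https://example.com/about.html"},
    ]
    with open("pages.jsonl", "w", encoding="utf-8") as handle:
        handle.write(build_pages_jsonl(example_pages))
```

As far as I know, py-wacz can take a pages file like this instead of detecting pages itself, but check wacz create --help for the exact option in your version before relying on that.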

Ultimately, I think warcit needs to be updated? Unfortunately this is a very low priority task for the team right now as we are focused on getting Browsertrix out the door!

I know we’ll get there eventually, but it may take us a little bit of time. :\

Thanks for the reply. At least it’s not me : ) I needed it for a test project with a very small wacz file, so I had been hoping to get my warcs combined into a wacz. I need a wacz because I have a combined wacz that links two related websites, and the calls from one site to the other work great, but I don’t want people to have to download a 25 GB warc file!

I’m working with the State of Connecticut (USA) Digital Archive, which is interested in hosting the wacz file. I had hoped to test with a small combined file first, so for now I gave them a small single file to test with instead. We’ll see how the project goes.