Hello!
I’m wondering if anyone has been able to successfully upload WARC files created in Webrecorder (using the Chrome plug-in) into Archive-It and get them to display correctly in Wayback? So far I’ve only tried downloaded WARC 1.1 files (not WARC 1.0 files), without success. Any workflow suggestions would be greatly appreciated!
Hi @Hwang, I don’t personally know of anyone who has done this, but I would be interested to hear what went wrong and whether you were able to get any feedback from people at Archive-It.
Hi @edsu, I’ve been using Webrecorder’s plug-in to crawl sites that Archive-It has issues crawling: mainly Wix, Instagram, and Facebook thus far. The ArchiveWeb.page sessions replay successfully in ReplayWeb.page, but when I upload the WARC files into Archive-It the crawls don’t display fully in Wayback, e.g. Instagram tiles don’t load and I’m unable to scroll down a feed in Facebook.
As for Archive-It’s response, they didn’t go into any detail about the replay issue. It seems that WARC 1.0 is supported (no explanation given), and the only way to suppress an uploaded WARC from the seed’s Wayback calendar is to submit a request ticket.
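Since the version question came up, here is a rough sketch for checking which WARC version a downloaded file declares before uploading it. It relies only on the fact that a WARC record starts with a version line such as `WARC/1.0` or `WARC/1.1`; the function name `warc_version` is made up for illustration, and it only inspects the first record in the file.

```python
import gzip


def warc_version(path):
    """Return the version line (e.g. 'WARC/1.1') of the first record in a WARC file.

    Hypothetical helper for illustration; handles both .warc and .warc.gz.
    """
    # Detect gzip by its two magic bytes rather than trusting the extension.
    with open(path, "rb") as f:
        magic = f.read(2)
    opener = gzip.open if magic == b"\x1f\x8b" else open

    # The first line of the first record is the WARC version line.
    with opener(path, "rb") as f:
        first_line = f.readline().strip()

    if not first_line.startswith(b"WARC/"):
        raise ValueError(f"{path} does not start with a WARC version line")
    return first_line.decode("ascii")
```

Calling `warc_version("my-capture.warc.gz")` (the filename is a placeholder) returns the declared version as a string. It does not convert between versions or say anything about why replay fails; it only confirms what the file claims to be.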
I have yet to test scheduled crawls with Archive-It after uploading a WARC file generated with ArchiveWeb.page… to be explored!
Hi @Hwang, thanks for the extra details. I’m fairly sure that Archive-It doesn’t use the ReplayWeb.page component in their interface, so it’s quite possible that their replay environment is missing some functionality that allows the browser to fetch resources from the web archive correctly. Is there a public URL for your Archive-It content?
Hi @edsu, the collection is currently private, but once I have the green light I’ll loop back with a link. Thanks for your interest in this!