Trouble recording pages in the Wayback Machine with the ArchiveWeb.page extension

Hi - I am creating a collection that requires recording pages that are only accessible in the Wayback Machine. When I try with the browser extension, the recording only shows the Wayback Machine banner and a blank page. I understand there’s a workaround for this?
Thanks for your suggestions/help!
Cate

Hey, sorry this didn’t get looked at earlier! Archiving an already-archived website with web archiving tools generally presents a bunch of challenges… Most of them are out of scope for us to fix within ArchiveWeb.page or Browsertrix, but they can be solved in other ways with different tools!

To the best of my knowledge, Internet Archive doesn’t allow just anyone to download original WARC files from the Wayback Machine. If the site was archived by you through an Archive-It subscription, see their help documentation for downloading your data. If the site happened to be archived by ArchiveTeam, you can sometimes get the WARC files from them; check the domain here.


You may also wish to check out the Wayback Machine Downloader command line tool if you’re familiar with command line applications. It isn’t Webrecorder software, so we can’t provide support for it, and it is also unsupported by Internet Archive. This tool downloads the original files rather than the ones with rewritten URLs that you see in Internet Archive’s archive viewer. These can then be repackaged as WARC files with warcit, another command line tool, which Webrecorder does make.
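For anyone who wants to script those two steps, here is a rough Python sketch that only builds the two command lines, assuming both tools are already installed and on your PATH. The function names are made up for illustration, and the `./websites/<host>/` output folder is my assumption about where Wayback Machine Downloader puts files by default, so check where your files actually landed before running warcit.

```python
from typing import List, Optional
from urllib.parse import urlparse


def download_command(site_url: str) -> List[str]:
    """Build the command that asks Wayback Machine Downloader for a site's files."""
    return ["wayback_machine_downloader", site_url]


def default_download_dir(site_url: str) -> str:
    """Guess the download folder for a site.

    Assumption: files are saved under ./websites/<host>/ . Pass an explicit
    directory to warcit_command() if your files ended up somewhere else.
    """
    host = urlparse(site_url).netloc
    return f"./websites/{host}/"


def warcit_command(site_url: str, directory: Optional[str] = None) -> List[str]:
    """Build the warcit command: a URL prefix followed by the directory to package."""
    return ["warcit", site_url, directory or default_download_dir(site_url)]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, downloader first and warcit second. Building the commands as lists rather than one shell string avoids quoting problems with URLs that contain `?` or `&`.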

There’s a reasonable path to doing what you want, but unfortunately I don’t have a good solution if you aren’t yet confident with using command line interfaces. :pensive:


We do have an experimental tool for doing that, but we haven’t updated it recently. It’s possible, it just hasn’t been our focus. The data needs to be retrieved in a special way so that the original page is archived.

Here’s an example: https://express.archiveweb.page/#1996/http://www.geocities.com/

You can try appending /#<timestamp>/<url> to https://express.archiveweb.page in the format shown above, or you can go to https://express.archiveweb.page/ and enter the timestamp and URL via the UI.
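If you have a list of pages to try, a small helper like this can assemble the links in the same format as the example above. This is only a sketch: `express_url` is a hypothetical name, and the check that the timestamp is 4 to 14 digits (a year up to a full YYYYMMDDhhmmss value) is my assumption based on the usual Wayback Machine timestamp style, not something the tool documents here.

```python
EXPRESS_BASE = "https://express.archiveweb.page/"


def express_url(timestamp: str, url: str) -> str:
    """Build a link of the form <base>#<timestamp>/<url>."""
    # Assumption: timestamps are plain digits, from a 4-digit year
    # up to a 14-digit YYYYMMDDhhmmss value.
    if not (timestamp.isascii() and timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError(f"unexpected timestamp: {timestamp!r}")
    return f"{EXPRESS_BASE}#{timestamp}/{url}"
```

Calling `express_url("1996", "http://www.geocities.com/")` gives back the example link from this thread.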

Unfortunately, we don’t have time to update this functionality, but we may be able to help with a specific request, depending on how big it is…
