Webrecorder to save locally stored HTML pages

alphie · May 24, 2021, 11:40pm

Hey Webrecorder team,

I have an unusual use case, and I’m curious whether Webrecorder could be a solution for it. We have a campus newsletter that is being retired and that I’d like to preserve. It only exists in an email format, which I can save locally either as a PDF or as HTML.

When loading the locally stored HTML page, Webrecorder just says “Can’t record this page.” because, I’m assuming, it’s not actually downloading anything. My hope was that I’d be able to save this locally stored HTML as part of the WACZ file and use that as a jumping off point to preserve those URLs within the newsletter, to recreate a self-contained file for every weekly newsletter.

Is this not possible? Does anyone have another solution that might work towards my goal of creating a single file a user could load to view all of the preserved URLs?

Thanks in advance,

edsu · May 25, 2021, 1:33am

Assuming the diagnosis is correct, perhaps a simple solution would be to put these files on a web server somewhere so that they could be recorded. You could even run a local web server from the directory that holds the saved HTML:

python3 -m http.server
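
If you would rather start the server from a script than from the shell, here is a minimal sketch using only the Python standard library (3.7 or later). The function name `serve_directory`, the default port 8000, and the directory name in the usage note are my own assumptions for illustration, not anything Webrecorder requires.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, port=8000):
    """Serve the files in `directory` over HTTP on localhost.

    Runs the server in a background daemon thread and returns the
    server object so the caller can stop it later with shutdown().
    Passing port=0 asks the OS to pick a free port.
    """
    # Bind the handler to the given directory instead of the current one.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `serve_directory("newsletters")` (a hypothetical folder name) should make a saved file such as `issue.html` reachable at `http://127.0.0.1:8000/issue.html`, which is a regular HTTP URL that can be opened in the browser and recorded. Keep the script running while you record, and call `server.shutdown()` when you are done.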

alphie · May 25, 2021, 1:57am

Not sure if my reply from my email worked but it looks like this solution works well. Thank you!

raffaele · May 26, 2021, 2:23pm

check this:

warcit is a command-line tool that converts on-disk directories of web documents (commonly HTML, web assets, and any other data files) into ISO-standard web archive (WARC) files.