I have a small archiving use case: I just want to archive a few pages now and then.
I installed pywb and followed the docs to create a web archive like so:
wb-manager init my-web-archive
And then captured a page:
wayback --record --live -a --auto-interval 10
Then I navigated to http://localhost:8080/my-web-archive/record/https://github.com/webrecorder/pywb
Replay works fine at http://localhost:8080/my-web-archive/https://github.com/webrecorder/pywb. However, http://localhost:8080/my-web-archive cannot show a list of the captured pages, so I have to know the URL beforehand. I guess I am meant to use ReplayWeb.Page for that.
I have two issues at this point. First, when I capture any web page with the process above and load the warc.gz that pywb generates into ReplayWeb.Page, I see no “Pages” defined. (The message is: No Pages are defined in this archive. The archive may be empty. Try browsing by URL.) I have to browse by URL, which lists every static resource as well.
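As a rough workaround for the missing page list, the HTML pages can be pulled out of the recorded WARCs with a small shell function. This is a heuristic sketch, not a pywb or ReplayWeb.Page feature: list_html_pages is a made-up name, it assumes pywb's default layout where recorded WARCs land in collections/my-web-archive/archive/, and it scans the decompressed WARC text line by line rather than properly parsing each record by its Content-Length.

```shell
# list_html_pages FILE...: print the target URL of every HTML response
# record found in the given .warc.gz files, one per line, deduplicated.
# Heuristic: reads the decompressed WARC text line by line instead of
# parsing records by Content-Length, so unusual bodies may confuse it.
list_html_pages() {
  for f in "$@"; do
    gzip -dc "$f"   # a multi-member gzip file decompresses as one stream
  done | LC_ALL=C awk '
    { sub(/\r$/, "") }                          # WARC and HTTP lines end in CRLF
    /^WARC\/1\.[01]$/ { state = "warc"; type = ""; uri = ""; next }
    state == "warc" {                           # WARC record headers
      if ($0 == "") { state = (type == "response") ? "http" : "body"; next }
      l = tolower($0)
      if (l ~ /^warc-type:/) type = tolower($2)
      if (l ~ /^warc-target-uri:/) uri = $2
      next
    }
    state == "http" {                           # HTTP headers of a response
      if ($0 == "") { state = "body"; next }
      if (tolower($0) ~ /^content-type:[ \t]*text\/html/) print uri
    }
  ' | sort -u
}
```

For example: list_html_pages collections/my-web-archive/archive/*.warc.gz. Any response served as text/html is printed, so HTML subresources such as iframes will show up alongside the top-level pages.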
The second issue is that every time I capture a new page (or maybe every time I stop and restart wayback), I end up with a new warc.gz. Each one has to be loaded into ReplayWeb.Page separately, so I cannot view a single list of all the pages I have recorded; I have to pick a warc.gz first.
Is there some way to make pywb save all captures into the same warc.gz? Also, how can I give the archive a nice name that appears in ReplayWeb.Page, like the “Temporary Collection” demo? As it is, I just pick loaded WARCs by filename.
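If merging after the fact is acceptable, separate .warc.gz files can usually be combined by plain concatenation, since a gzip stream may consist of several concatenated members and a WARC file is just a sequence of records. A minimal sketch, assuming the per-session files sit in collections/my-web-archive/archive/ (merge_warcs is a made-up helper, not part of pywb):

```shell
# merge_warcs OUTPUT INPUT...: concatenate several .warc.gz files into one.
# Valid because gzip allows multiple concatenated members and WARC records
# are independent of each other.
merge_warcs() {
  out=$1
  shift
  # Refuse to overwrite an existing file (for example one of the inputs).
  if [ -e "$out" ]; then
    echo "merge_warcs: $out already exists" >&2
    return 1
  fi
  # Concatenate, then check that the result is still a readable gzip stream.
  cat "$@" > "$out" && gzip -t "$out"
}
```

For example: merge_warcs all-captures.warc.gz collections/my-web-archive/archive/*.warc.gz. Writing the merged file outside the archive directory presumably keeps pywb from indexing the same captures twice. This only produces a single file to load; it does not by itself add a page list or a title in ReplayWeb.Page.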