I have a small archiving use case: I just want to archive a few pages now and then.
I am using pywb and followed the docs to create a web archive like so:
wb-manager init my-web-archive
And then captured a page:
wayback --record --live -a --auto-interval 10
I can replay fine with http://localhost:8080/my-web-archive/https://github.com/webrecorder/pywb. However, I notice that http://localhost:8080/my-web-archive does not show a list of captured pages; I have to know the URL beforehand. I guess I am meant to use ReplayWeb.Page for that.
I have two issues at this point. First, when I capture any web page using the process above and load the warc.gz that pywb generates into ReplayWeb.Page, I see no “Pages” defined. (The message is: No Pages are defined in this archive. The archive may be empty. Try browsing by URL.) I have to browse by URL, which shows all static resources.
The second issue is that every time I capture a new page (or maybe every time I stop and restart wayback) I end up with a new warc.gz. Each one has to be loaded into ReplayWeb.Page separately, so I cannot view a single list of all the pages I have recorded; I have to pick a warc.gz.
Is there some way to save all captures with pywb into the same warc.gz? Also, how can I give the WARC a nice name that will appear in ReplayWeb.Page, like the “Temporary Collection” demo? As is, I just pick loaded WARCs by filename.
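The only stopgap I can think of is joining the files by hand. This is just a sketch, not something taken from the pywb docs: it assumes the recorded WARCs are gzip-compressed record by record, so that plain byte concatenation yields a valid warc.gz. The helper name merge_warcs and the output filename are hypothetical.

```shell
# Hypothetical helper: join several gzip-compressed WARC files into one.
# Usage: merge_warcs OUTPUT.warc.gz INPUT1.warc.gz [INPUT2.warc.gz ...]
merge_warcs() {
    out="$1"
    shift
    # Concatenated gzip members form a valid gzip stream, so cat is enough
    # (assumption: each input is an ordinary gzip-compressed WARC).
    cat "$@" > "$out"
}
```

For example, `merge_warcs ~/all-captures.warc.gz collections/my-web-archive/archive/*.warc.gz`, assuming the recordings live in the collection's archive directory and the output file is written somewhere outside it so it is not picked up by the glob. This would not solve the naming or the missing "Pages" list, so I would still prefer a built-in way.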