Web Scraper Regression Testing

Context:
I have web scrapers built with Puppeteer whose flows include the following (a simplified sketch follows the list):

  1. Logging in
  2. Clicking buttons
  3. Querying a search engine within the site
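
For reference, a simplified sketch of one such flow (the URLs, selectors, and credentials below are placeholders, not the real site):

```js
// One static test flow: login -> click buttons -> query the site's search.
// All URLs, selectors, and credentials are placeholders.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // 1. Login with a fixed set of credentials
  await page.goto('https://example.com/login', { waitUntil: 'networkidle0' });
  await page.type('#username', 'test-user');
  await page.type('#password', 'test-pass');
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    page.click('#login-button'),
  ]);

  // 2. Click buttons in a specific order
  await page.click('#open-dashboard');

  // 3. Query the site's internal search engine
  await page.type('#search-box', 'specific query string');
  await page.keyboard.press('Enter');
  await page.waitForSelector('.search-results');

  await browser.close();
})();
```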

I’m trying to use an offline archived version of the websites for regression testing.

  1. I only need to test static flows (e.g. log in with a specific set of credentials, click buttons in a specific order, query specific strings in the search engine, etc.)
  2. Ideally, I want to be able to record a WACZ file for a running scraper (one possible setup is sketched after this list)
  3. The idea is to have separate WACZ files for tests 1, 2, 3 …
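
One setup I imagine for recording alongside the scraper (rather than browsing manually) is to route the scraper's traffic through a local capture proxy and package the proxy's output into a WACZ afterwards. Below is a minimal sketch of the Puppeteer side only; it assumes some recording proxy (for example pywb in record mode) is already listening on localhost:8080, which I have not verified against these sites:

```js
// Sketch: run the existing scraper through a local capture proxy so the
// proxy, not the browser, records all traffic, including the login requests.
// Assumes a recording proxy is already listening on localhost:8080 (not shown).
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--proxy-server=localhost:8080',
      // The proxy re-signs HTTPS traffic, so for a test run either trust its
      // certificate or ignore certificate errors.
      '--ignore-certificate-errors',
    ],
  });

  const page = await browser.newPage();
  // ...same login / click / search flow as the sketch above...
  await page.goto('https://example.com/login', { waitUntil: 'networkidle0' });

  await browser.close();
})();
```

Each test flow would record into its own collection, so the resulting archives stay separate per test (point 3 above).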

Problem: In my experience so far, ArchiveWeb.page (both the desktop and extension versions) fails to record the login authentication. The authentication is the most important part: the post-login pages check whether the user is authenticated both client-side and server-side, so unless the login is recorded, nothing after it can be recorded either.

How I’ve been using ArchiveWeb.page: I use the desktop app on Windows 11, and after each new page loads during archiving, I wait for the status “Idle, Continue Browsing.”

Questions

  1. Is it possible to record alongside my own running web scraper (not Browsertrix Crawler)?
  2. Am I using the tool incorrectly?
  3. Is archiving the wrong approach? Any suggestions?

Reply:

Do you have the “Archive Cookies” and “Archive local storage” options checked in ArchiveWeb.page? They are generally needed to archive logins.

In general, we aim to archive a site after login rather than the actual login process, since we want to avoid archiving credentials, but it is possible, depending on the site. It’s very hard to say what’s going wrong without looking at the exact example.