Context:
I have webscrapers built with Puppeteer that include the following:
- Login
- Clicking buttons
- Querying a search engine within the site
I’m trying to use an offline archived version of the websites for regression testing.
- I only need to test static flows (e.g. log in with a specific set of credentials, click buttons in a specific order, query specific strings in the search engine, etc.)
- Ideally, I want to be able to record a WACZ file for a running scraper
- The idea is to have separate WACZ files for tests 1, 2, 3 …
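To make the layout I have in mind concrete, here is a rough sketch of how each static flow would map to its own WACZ file. Everything in it is a placeholder for illustration (the `flows` array, the step fields, the selectors, the credentials, and the `test-N.wacz` naming are all assumptions, not my real code), and it only builds the plan; it does not record anything.

```javascript
// Hypothetical per-test definitions: each static flow gets its own archive.
// All names, selectors, and values below are placeholders for illustration.
const flows = [
  {
    id: 1,
    steps: [
      { action: "login", username: "test-user", password: "test-pass" },
      { action: "click", selector: "#button-a" },
      { action: "search", query: "example query" },
    ],
  },
  {
    id: 2,
    steps: [
      { action: "login", username: "test-user", password: "test-pass" },
      { action: "click", selector: "#button-b" },
    ],
  },
];

// Derive the archive file name for a flow (assumed naming scheme).
function archiveNameFor(flow) {
  return `test-${flow.id}.wacz`;
}

// Build a plan that pairs each test with the archive it should replay against.
const plan = flows.map((flow) => ({
  id: flow.id,
  archive: archiveNameFor(flow),
  stepCount: flow.steps.length,
}));

console.log(plan);
```

The point of keeping one archive per flow is that each regression test replays only the requests its own fixed sequence of steps produced.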
Problem: In my experience so far, ArchiveWeb.page (both the desktop and extension versions) fails to record the login authentication. The login is the most important part, because the post-login pages check whether the user is authenticated both client-side and server-side (meaning the login must be recorded for anything after it to be recorded).
How I’ve been using ArchiveWeb.page: I use the desktop app on Windows 11. Each time a new page loads during archiving, I wait for the status to show “Idle, Continue Browsing” before moving on.
Questions:
- Is it possible to record alongside my own running webscraper (not the Browsertrix Crawler)?
- Am I using the tool incorrectly?
- Is archiving the wrong approach? Any suggestions?