Dealing with POSTs to URLs in data attributes - in crawl, or in playback?

i’m trying to use Browsertrix Crawler to crawl a Spotlight-based site, which includes URLs in data attributes to which a POST request is being sent. these requests are used to track clicks for the browser session to help navigate search results (e.g., an item display page can then have “next” and “previous” links to help the user navigate through a result set).
even if i’ve configured Browsertrix Crawler appropriately (adding something like a[data-context-href]->@data-context-href to selectLinks), i’m seeing errors in the logs like the following:

{"timestamp":"2025-02-12T21:52:19.997Z","logLevel":"warn","context":"general","message":"Invalid Page - not a valid URL","details":{"url":"/catalog/34-1805/track?counter=-1","page":"https://exhibits.lib.berkeley.edu/spotlight/weichafe/feature/the-funeral-of-weichafe","workerid":2}}
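
for reference, the url in that warning is a root-relative path rather than an absolute URL. a minimal python sketch of how such a path resolves against the page URL under standard URL resolution rules (this only illustrates the resolution itself; it is not a claim about what the crawler does internally):

```python
from urllib.parse import urljoin

# Both values are copied from the log line above.
page = "https://exhibits.lib.berkeley.edu/spotlight/weichafe/feature/the-funeral-of-weichafe"
tracking_path = "/catalog/34-1805/track?counter=-1"

# A path starting with "/" replaces the page's whole path and keeps scheme + host.
resolved = urljoin(page, tracking_path)
print(resolved)
# → https://exhibits.lib.berkeley.edu/catalog/34-1805/track?counter=-1
```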

generally speaking, the user-facing search functionality will otherwise not work, which is an acceptable compromise for crawling the site. however, on playback, a JavaScript event fires and makes the POST request to this endpoint, which is then supposed to redirect to the item page. given that the POST request isn’t crawled, playback for the item page fails when the JS event fires. (this is similar to an issue that Thib at LOCKSS mentioned in this CNI presentation; see pages 12-14.)

what’s the best way to proceed? i am imagining one of the following options:

  1. ignore the event-based JavaScript in playback, and ensure the tracking requests don’t get crawled.
  2. rewrite the data attribute during playback to be something the JS won’t recognize.
  3. rewrite the data attribute during crawling. as best as i can tell, that’s not really possible.
  4. figure out how to include the tracking URLs in the crawl. this does not seem preferable for the reasons listed above.
  5. rewrite the data attribute pre-crawl by modifying the application. this is also less desirable because we’d like to sunset this application.
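
to make options 2, 3, and 5 a bit more concrete, here is a rough python sketch of the string transformation itself: renaming the attribute so that a script looking for `data-context-href` would no longer find it. the function name and the replacement attribute name `data-context-href-disabled` are hypothetical, and the sketch says nothing about where such a rewrite could be hooked in (playback, crawl, or the application); it also assumes the JS keys off that attribute name, as described above.

```python
import re

# Match the attribute name only when it is preceded by whitespace and followed by "=".
ATTR_NAME_RE = re.compile(r"(?<=\s)data-context-href(?=\s*=)")

def disable_context_href(html: str) -> str:
    """Rename data-context-href to a (hypothetical) name the site's JS would not look for."""
    return ATTR_NAME_RE.sub("data-context-href-disabled", html)

sample = '<a data-context-href="/catalog/34-1805/track?counter=-1" href="/catalog/34-1805">item</a>'
rewritten = disable_context_href(sample)
print(rewritten)
```

a regex is used here only to keep the sketch short; a real rewrite would more likely use an HTML parser so that attribute-like text in scripts or comments is left alone.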

thanks for any suggestions - i know this is not just a browsertrix-crawler question! :slight_smile: