Set URL to load for public collection based on query param

So this seems like it should be easy, but I can’t quite figure it out.

We have a public collection on Browsertrix for a site we want to decommission. When we decommission the site, we want to create an AWS Lambda function that redirects to the collection page and automatically loads the specific page in the collection. For example, we get a request to https://old.domain/path/to/page, our Lambda redirects it to the Browsertrix collection page, and when that page loads, the replay-web-page component replays https://old.domain/path/to/page.

If we used the replay-web-page web component on our own site, we’d simply set its url attribute to the originally requested URL, and the collection page has good instructions for embedding the web component. But what I’m looking for is whether there’s a defined, permanent URL interface for doing this on the Browsertrix collection page itself.

When I play around with the collection, I do see the page URL change, and it appears to follow the format #url=url_safe_copy_of_url. Is that the URL format you use, or is it just incidental and I’m totally misunderstanding it? And if it is the format, can I consider it “stable”, or should I expect it to change?

Any help would be greatly appreciated. Thanks.

Yes, the hash (#) part of the public collection link is designed to specify the URL to load. For example, https://app.browsertrix.com/explore/usgov-archive/collections/global-change-research-program#url=https%3A%2F%2Fglobalchange.gov%2F&ts=20241116083045

links to a version of https://globalchange.gov in the global-change-research-program collection on our usgov-archive account. By changing the hash, you can navigate to a different URL in the collection, so that should work for your redirect.
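For the Lambda redirect described in the question, a minimal sketch of building that hash is below. It assumes the #url=...&ts=... format from the example link, a Lambda function URL or API Gateway HTTP API (payload v2.0) event that exposes rawPath and rawQueryString, and placeholder values for COLLECTION_PAGE and OLD_ORIGIN; buildCollectionUrl is a hypothetical helper name, not a Browsertrix API.

```javascript
// Placeholders (assumptions): substitute your own collection page and old origin.
const COLLECTION_PAGE =
  "https://app.browsertrix.com/explore/usgov-archive/collections/global-change-research-program";
const OLD_ORIGIN = "https://old.domain";

// Build the collection link for an original URL, optionally pinned to a
// capture timestamp in the 14-digit form used in the example (YYYYMMDDhhmmss).
function buildCollectionUrl(originalUrl, ts) {
  let hash = "url=" + encodeURIComponent(originalUrl);
  if (ts) {
    hash += "&ts=" + encodeURIComponent(ts);
  }
  return COLLECTION_PAGE + "#" + hash;
}

// Assumes a payload v2.0 event (Lambda function URL / HTTP API), which
// carries the request path and query string as rawPath and rawQueryString.
async function handler(event) {
  const query = event.rawQueryString ? "?" + event.rawQueryString : "";
  const originalUrl = OLD_ORIGIN + (event.rawPath || "/") + query;
  // The fragment survives the redirect: the browser keeps it when following Location.
  return {
    statusCode: 302,
    headers: { Location: buildCollectionUrl(originalUrl) },
  };
}

module.exports = { buildCollectionUrl, handler };
```

Omitting ts leaves the choice of capture to the replay page; pass a timestamp only if you want to pin a specific version.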

But if you’re trying to automatically redirect an entire domain, you might be interested in the web archive site mirror system for your use case. It lets you build a fully static site on a new domain that loads from the web archive, so that a request for https://new.domain/path/to/page is served from the same collection. Or, if you still have access to https://old.domain, you can repurpose it to serve the web archive itself. The mirror system provides a starter page to do that.

https://globalchange.govarchive.us/ is an example of this for the gov archive site, loading the same data as the collection on Browsertrix.

Here are some slides about this system from a recent presentation.
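To make the mirror idea concrete: a path on the new domain stands in for the same path on the original domain, and links inside replayed pages need to be mapped the other way. The helpers below are a hypothetical sketch of that mapping only (toOriginalUrl and toMirrorUrl are illustrative names, not part of the mirror template or wabac.js), assuming the mirror keeps paths, query strings, and fragments unchanged.

```javascript
// Hypothetical sketch, NOT the mirror template's code.

// https://new.domain/path/to/page -> https://old.domain/path/to/page
function toOriginalUrl(mirrorUrl, originalOrigin) {
  const u = new URL(mirrorUrl);
  return new URL(originalOrigin).origin + u.pathname + u.search + u.hash;
}

// Inverse direction, as needed when rewriting links inside replayed pages.
// Compares hostnames, so http:// and https:// links to the original site both
// map onto the mirror; links to any other host are returned unchanged.
function toMirrorUrl(originalUrl, originalOrigin, mirrorOrigin) {
  const u = new URL(originalUrl);
  if (u.hostname !== new URL(originalOrigin).hostname) {
    return originalUrl; // external link: leave as-is
  }
  return new URL(mirrorOrigin).origin + u.pathname + u.search + u.hash;
}
```

A link that never goes through something like toMirrorUrl keeps pointing at the original host, which is the “breaking out” symptom described later in this thread.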


Thank you @ilya! Thanks for confirming the hash format, and an even bigger thank you for the tip on the mirror system. I didn’t realize there was a template for setting up a mirror like that. I’ll talk to my client, but it looks like it could be a great result.

I did have one question: is there a particular reason globalchange.govarchive.us opens every internal link in a new window, or is that just how the original website was designed?

Yes, that just comes from the original site. Other mirrors like https://usaid.govarchive.us/ open in the same tab.


@ilya I’m trying out the web archive site mirror system and struggling a bit: some links are rewritten to the new domain, while others still point to the old domain. In effect, the site is “breaking out” of the service worker. I don’t see that behavior on the govarchive.us sites. The only thing I changed was the init file:

init(
  "https://app.browsertrix.com/api/orgs/8d573a81-af6f-4cde-9fa7-d18d8c55d65a/collections/e14e300a-95c0-4e2d-a227-f2a6d6014a2d/public/replay.json",
  "https://mnartists.walkerart.org"
);

I’m happy to debug it myself (I’m a senior software engineer, after all), but I don’t even know where to start. Any suggestions?

@ilya I did a bit more debugging. If I navigate directly to a page, all of the domains are properly replaced. If I navigate to a page and then click a link to go to another page, all of the links on that page point to the wrong domain. I don’t know if that helps narrow down what might be going on.
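One way to narrow down a rewriting problem like this is to list the links that still point at the original host, once on a directly loaded page and once after clicking through. The function below is a small, hypothetical debugging helper (findUnrewrittenLinks is not part of wabac.js). It assumes the links are read from the replayed page’s own document, so select the right frame in DevTools if the page is framed.

```javascript
// Hypothetical debugging helper: return the hrefs whose host is still the
// original site's, i.e. links the replay rewriting appears to have missed.
function findUnrewrittenLinks(hrefs, oldHost) {
  return hrefs.filter((href) => {
    try {
      return new URL(href).hostname === oldHost;
    } catch {
      return false; // ignore values that are not absolute URLs
    }
  });
}

// Example, from the DevTools console of a replayed page (a.href is already
// resolved to an absolute URL by the browser):
// findUnrewrittenLinks(
//   [...document.querySelectorAll("a[href]")].map((a) => a.href),
//   "mnartists.walkerart.org"
// );
```

An empty result on the directly loaded page and a non-empty one after a click would confirm that the problem is tied to in-page navigation rather than to the init settings.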

Thanks for the tip! There was an edge-case bug in the replay system. It should be fixed in the latest release of wabac.js via Pull Request #329 in webrecorder/wabac.js on GitHub, “proxy rewrite mode: fix element rewriting” by ikreymer.

The default template pulls the latest release, so it should work now (or try an incognito window to make sure the latest version loads right away).


Thanks a ton @ilya, that fixes it in incognito!

Just so I know for the future, how long does it take for a cached version of the service worker to be updated? I haven’t worked with service workers much, so this is a bit new to me.

You can pick up the latest version of wabac.js by refreshing https://cdn.jsdelivr.net/npm/@webrecorder/wabac/dist/sw.js in your browser locally, or, for production, you can pin a specific release, e.g. by changing your local sw.js to use that version:

importScripts("https://cdn.jsdelivr.net/npm/@webrecorder/wabac@2.26.1/dist/sw.js");

That should update it right away.
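On the general cache question: the browser re-checks a registered service worker on its own schedule, but you can also trigger a check from the page. The sketch below uses the standard ServiceWorkerContainer.getRegistrations() and ServiceWorkerRegistration.update() APIs; checkForServiceWorkerUpdates is a hypothetical name. Two caveats, as documented on MDN: with the default updateViaCache setting ("imports"), scripts pulled in via importScripts() may still be served from the HTTP cache, and a new worker usually waits until pages controlled by the old one are closed unless it calls skipWaiting(). That is why pinning a version in your own sw.js, which changes the main script itself, takes effect more predictably than relying on an unversioned CDN URL.

```javascript
// Ask the browser to re-check every service worker registered for this origin.
// `container` defaults to navigator.serviceWorker; it is a parameter only so
// the function can be exercised outside a browser.
async function checkForServiceWorkerUpdates(
  container = globalThis.navigator?.serviceWorker
) {
  if (!container) {
    return 0; // service workers are not available in this environment
  }
  const registrations = await container.getRegistrations();
  // update() re-fetches the registered script and installs a new worker if it changed.
  await Promise.all(registrations.map((reg) => reg.update()));
  return registrations.length;
}

// Example, from the DevTools console on the mirror site:
// await checkForServiceWorkerUpdates();
// then close and reopen the tab so the new worker can take control.
```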