I have a site that uses two different URL syntaxes for the same page. When the web archive was crawled a few years ago, only one of the two syntaxes was captured, so depending on which link a page uses, it sometimes resolves and sometimes shows a “this page is not part of the archive” message. Is there a way to set up the replay so that, say, any link/URL with “x” structure redirects to “y”?
As an example, “http://constructingthesacred.supdigital.org/cts/exploring-an-ancient-site-with-a-3d-model?path=introduction” should redirect to “Exploring an Ancient Site with a 3D Model”, essentially just deleting the string “?path=introduction”. (The word “introduction” varies depending on which area of the site the page is in, but the pattern is the same across the entire archive.)
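To make the desired mapping concrete, here is a minimal Python sketch of the rewrite itself: it drops the “path” query parameter and leaves the rest of the URL alone. The function name `strip_path_param` is made up for illustration, and this only shows the URL transformation, not how a replay tool would apply it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_path_param(url: str, param: str = "path") -> str:
    """Return the URL with the given query parameter removed."""
    parts = urlsplit(url)
    # Keep every query pair except the one we want to drop.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    # An empty query string means urlunsplit emits no trailing "?".
    return urlunsplit(parts._replace(query=urlencode(kept)))


crawled_variant = (
    "http://constructingthesacred.supdigital.org/cts/"
    "exploring-an-ancient-site-with-a-3d-model?path=introduction"
)
print(strip_path_param(crawled_variant))
```

Because the parameter is matched by name rather than by value, the same function covers “?path=introduction” and any other “path” value used elsewhere on the site.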
FWIW, this is a Scalar site. Also, assume recrawling is not an option. Public web archive can be seen at Constructing the Sacred | Archive
I think this is an unsolved problem with web archive curation? A tool to create WARC records to generate redirects post-archiving seems possible but to my knowledge no such tool exists. I can say with certainty that this isn’t on our roadmap at the moment, but I would love to see it happen!
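For what it’s worth, here is a rough, untested sketch of what such a synthetic record might look like, using only the Python standard library. It builds one uncompressed WARC/1.0 “response” record containing an HTTP 301 that points the “?path=” variant at the plain URL. The function name `redirect_record` is hypothetical, and a lot is assumed here: that the replay system will serve a hand-made record like this, that extra WARC files can be added to the collection at all, and that the record date (set to “now” below) doesn’t need to sit near the original crawl date. You would also need one record per distinct “?path=” URL.

```python
import uuid
from datetime import datetime, timezone


def redirect_record(source_url: str, target_url: str) -> bytes:
    """Build one WARC/1.0 response record holding an HTTP 301 redirect."""
    # The record block is a complete HTTP response with an empty body.
    http_block = (
        "HTTP/1.1 301 Moved Permanently\r\n"
        f"Location: {target_url}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode("utf-8")
    # Assumption: the current time is acceptable as the capture date.
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    warc_headers = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>\r\n"
        f"WARC-Date: {warc_date}\r\n"
        f"WARC-Target-URI: {source_url}\r\n"
        "Content-Type: application/http; msgtype=response\r\n"
        f"Content-Length: {len(http_block)}\r\n"
        "\r\n"
    ).encode("utf-8")
    # Each WARC record ends with two CRLF pairs after the block.
    return warc_headers + http_block + b"\r\n\r\n"


source = (
    "http://constructingthesacred.supdigital.org/cts/"
    "exploring-an-ancient-site-with-a-3d-model?path=introduction"
)
target = (
    "http://constructingthesacred.supdigital.org/cts/"
    "exploring-an-ancient-site-with-a-3d-model"
)
record = redirect_record(source, target)
# To try it out, write the bytes to a file, e.g. open("redirects.warc", "wb").
```

Treat this as an illustration of the idea rather than a working tool; it hasn’t been run against any replay software.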
If you do find a solution that does what I mention above, please do send it over!
I’m not familiar with the details of this syntax, or what it does, but I can’t help noticing the terms “path” and “[?]”, which are both items related to this issue. Is there something I can tweak or add here that will initiate the kind of archive-internal redirect I’m looking for?