I have a site that uses two different URL syntaxes for the same page. The web archive, crawled a few years ago, captured only one of the two syntaxes, so depending on which link is used on a page, it sometimes resolves and sometimes shows a “this page is not part of the archive” message. Is there a way to set up the replay so that, say, any link/URL with “x” structure redirects to “y”?
For example, “http://constructingthesacred.supdigital.org/cts/exploring-an-ancient-site-with-a-3d-model?path=introduction” should redirect to “Exploring an Ancient Site with a 3D Model”, essentially just deleting the string “?path=introduction”. (The word “introduction” varies depending on which area of the site the page is in, but the pattern is the same across the entire archive.)
FWIW, this is a Scalar site. Also, assume recrawling is not an option. The public web archive can be seen at Constructing the Sacred | Archive
I think this is an unsolved problem with web archive curation? A tool to create WARC records to generate redirects post-archiving seems possible but to my knowledge no such tool exists. I can say with certainty that this isn’t on our roadmap at the moment, but I would love to see it happen!
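To make the idea a bit more concrete, here is a rough sketch of what such a tool might emit for a single URL: one uncompressed WARC/1.0 response record whose payload is an HTTP 301 pointing at the URL that was actually crawled. This is only an illustration under assumptions, not an existing tool. The function name `build_redirect_record` is made up, the record is assembled by hand using only the standard library, and I have not checked how any particular replay system treats a record added this way. In practice a WARC library such as warcio would probably be a better way to write the records.

```python
import uuid
from datetime import datetime, timezone


def build_redirect_record(source_url: str, target_url: str) -> bytes:
    """Return one uncompressed WARC/1.0 response record holding a 301 redirect.

    Hypothetical helper: source_url is the uncrawled variant,
    target_url is the variant that exists in the archive.
    """
    # HTTP payload: a bare redirect with an empty body.
    http_block = (
        "HTTP/1.1 301 Moved Permanently\r\n"
        f"Location: {target_url}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode("utf-8")

    # WARC-Date here is the time the record is generated, not the crawl time.
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    warc_headers = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>\r\n"
        f"WARC-Date: {warc_date}\r\n"
        f"WARC-Target-URI: {source_url}\r\n"
        "Content-Type: application/http; msgtype=response\r\n"
        f"Content-Length: {len(http_block)}\r\n"
        "\r\n"
    ).encode("utf-8")

    # A WARC record is terminated by two CRLF pairs after its block.
    return warc_headers + http_block + b"\r\n\r\n"
```

Calling it with the “?path=introduction” URL from the question as `source_url` and the same URL without the query string as `target_url` gives the bytes for one record. A real tool would have to do this for every uncrawled variant and append the records to a WARC file that the replay system loads alongside the original crawl.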
If you do find a solution that does what I mention above, please do send it over!
I’m not familiar with the details of this syntax, or what it does, but I can’t help noticing the terms “path” and “[?]”, which are both items related to this issue. Is there something I can tweak or add here that will initiate the kind of archive-internal redirect I’m looking for?
Wondering if anyone can help identify what the first pair in the “Fuzzy” line should be doing in the replay. Should it be changing the URLs with query parameters to URLs without those query parameters? I believe this might have worked on a previous version of replay but appears to be no longer supported. Is that the case?
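To make the question concrete, this is the behavior I am hoping for, sketched in Python. It is not the replay tool's actual code or configuration syntax. The names `strip_query` and `resolve` and the `archived` set are made up for illustration, and I am assuming the rule drops the whole query string rather than only the “path” parameter.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


def resolve(url: str, archived: set) -> "str | None":
    """Look up a requested URL, falling back to the query-less form.

    `archived` stands in for whatever index of captured URLs the replay uses.
    """
    if url in archived:
        return url  # exact capture exists, serve it as-is
    fallback = strip_query(url)
    # Only fall back if the query-less page was actually crawled.
    return fallback if fallback in archived else None
```

With this behavior, a request for a page URL ending in “?path=introduction” would be served from the capture of the same page without the query string, and a URL with no matching capture in either form would still show the “not part of the archive” message.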
Hi Jasmine,
Yes, this is a sort of undocumented feature that I implemented once for the Scalar use case. We should probably fix and fully document support for this, but it requires custom configuration at replay. In theory, it should still work, but I haven’t tested it in a long time.
Is it no longer working on Constructing the Sacred with the latest replay? Or for a different Scalar site?
It is not currently working in Constructing the Sacred or in When Melodies Gather (probably not in any of the other 5 Scalar projects either, but I haven’t checked those in a while). I figured something had changed in an update that no longer supported it. If that functionality could be restored, it would ensure all those publications are truly archived again! We’re trying to roll over to archived versions of those two projects as the live publications are degrading, but we can’t as long as so many of the pages aren’t displaying. And sadly, I’m unable to re-crawl Melodies to just create a new version altogether.