Error 500 (internal server error) while browsing in replay

Hi
I am doing my first test crawls and am running into a problem:

  • I display / review the crawl in replay
  • I click on sub-pages in the menu to access another page
  • and get an error “500 internal server error”
  • there is a “technical details” display, which reads like this:

Error: Error in Route Query
    at Pa (http://localhost:5471/w/id-5b2107bb837a/:37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f/20241128123521esm_/https://www.bs.ch/_nuxt/CMh-WLqc.js:22:7297)
    at _i (http://localhost:5471/w/id-5b2107bb837a/:37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f/20241128123521esm_/https://www.bs.ch/_nuxt/CMh-WLqc.js:22:9809)
    at X (http://localhost:5471/w/id-5b2107bb837a/:37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f/20241128123521esm_/https://www.bs.ch/_nuxt/CJ3gAP8Z.js:2:3875)
    at async setup (http://localhost:5471/w/id-5b2107bb837a/:37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f/20241128123521esm_/https://www.bs.ch/_nuxt/BlLpqCQd.js:3:1484)

This happens both in the integrated replay app in Browsertrix and in the local app after downloading the WACZ.
The pages themselves are there and archived correctly; I can access them through the URL menu of replay.

Any ideas?
Thanks for your help, Oliver

Unfortunately, the issue is that this site’s navigation is not crawler-friendly: it behaves differently when a link is clicked in the page than when a page is loaded directly via its URL.

There is no 500 error on the server; it is actually a 404, but the page frontend reports it as a 500. It looks like the site performs a different GraphQL query on every link click (to speed up loading), which the crawler never does, since it follows links by loading each page directly.

Here’s an example: the 404 (routeNodePage?..) is displayed as a 500.
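
To make the failure mode concrete, here’s a hypothetical sketch of what the frontend may be doing. Only the query name routeNodePage comes from the failing request above; the endpoint, fields, and error handling are assumptions for illustration, not the site’s actual code:

```ts
// Hypothetical sketch of the click-time code path. Only the name
// "routeNodePage" is taken from the failing request; the /graphql
// endpoint and the query shape are assumed for illustration.
async function routeNodePage(path: string): Promise<unknown> {
  // Fired only when a link is clicked and the client-side router
  // handles the navigation -- a direct URL load renders server-side
  // and never sends this request.
  const res = await fetch("/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query:
        "query routeNodePage($path: String!) { node(path: $path) { title content } }",
      variables: { path },
    }),
  });
  if (!res.ok) {
    // The crawler only ever loads pages directly, so this response was
    // never archived; in replay the fetch comes back 404, and the
    // frontend's error handler surfaces it as a 500.
    throw new Error(`Error in Route Query (${res.status})`);
  }
  return (await res.json()).data;
}
```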

This is something that could be fixed with a custom behavior, which we are adding support for soon, but it requires some custom work because of how the site is built. When you see this error, refreshing the page will make it load, because the crawler loads each page directly.
We’ll consider if there’s any other way to address this.
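
For a sense of what such a custom behavior would have to do, here is a minimal sketch in plain in-page TypeScript. The nav a[href] selector and the timings are assumptions, and the actual custom-behavior interface may look different:

```ts
// Minimal sketch of click-through logic for a custom behavior; the
// selector and delays are assumptions, not a final behavior API.
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function clickMenuLinks(selector = "nav a[href]"): Promise<void> {
  const links = Array.from(
    document.querySelectorAll<HTMLAnchorElement>(selector),
  );
  for (const link of links) {
    // A real click runs the site's client-side router, which issues the
    // per-link GraphQL query, so its response gets captured.
    link.click();
    // Crude settle delay; a real behavior would wait for network idle.
    await sleep(2000);
    // Go back so the next menu link is clicked from the same page.
    history.back();
    await sleep(1000);
  }
}
```

Clicking links (rather than only loading URLs) makes the site issue the same per-link queries a visitor would, so their responses end up in the archive alongside the direct page loads.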

Hi Ilya,
Thanks for the quick response. There’s the workaround of reloading, but still, for a public web archive, usability is very limited. You mentioned custom behaviours as a solution: do you have a timeframe for when a fix for this could be tested and used?

I will test the page with our Heritrix instance, but I guess it won’t behave any better. So for the moment, this site (the brand-new main site of our state administration) will not be archived. We can surely wait a few months.

Thanks for keeping me up to date.
Oliver

We are actually testing a version that will be able to click links, and we are using your site as one of the test cases; so far it looks promising! We hope to make it available as a ‘Beta’ crawler channel at some point soon.

Hi Oliver,

We deployed a test version of the crawler, available under the ‘Dev’ channel, and ran a quick test crawl on your account to check the links; it looks promising. Will reach out with more over email.