Archiving a Wix blog

Hello all,
I’ve recently experienced some difficulties capturing a blog/simple website which was built using the Wix platform.

After some troubleshooting and several repeat captures, I’ve produced a satisfactory capture using a blend of the Webrecorder Desktop app and Rhizome’s hosted service Conifer, via a Firefox browser*.

But one persistent issue remains: I am unable to capture the header strip image. While it loads during capture (after a small delay), in playback it renders as a blank, plain grey banner. The playback issue reproduces in Webrecorder Desktop, in the ReplayWeb.Page desktop app, at https://replayweb.page/ in the browser, and when I load my WARC into Conifer.

Wondered if anyone else has attempted to capture a Wix site in the past? Have you experienced anything similar? Any thoughts on what this blockage might be, or how to get around it?

With thanks for your thoughts/ideas! Anisa

Hi Anisa,
I’ve definitely been able to archive Wix sites in general, so perhaps there’s an issue with this particular site? Could you share the site and/or WARC file for it? Hopefully something that’s a simple fix.

Thank you, @ilya. The site is: https://www.projetomediojurua.org/. Conifer keeps crashing today… Maybe some back-end work taking place… I will share my warc with you asap! A.

Thanks for sharing the WARC! It turns out the main issue was that Wix loaded different versions of an image based on the display width. This was done using custom JS, and not using <img srcset> (which would be more standard).

For example, it was loading at least 3 different widths (and probably more):

  • https://static.wixstatic.com/media/094d02_ef76640c8586410282c65c4aaabd8014~mv2.jpg/v1/fill/w_980,h_278,al_c,q_85,usm_0.66_1.00_0.01/094d02_ef76640c8586410282c65c4aaabd8014~mv2.webp
  • https://static.wixstatic.com/media/094d02_ef76640c8586410282c65c4aaabd8014~mv2.jpg/v1/fill/w_1079,h_278,al_c,q_85,usm_0.66_1.00_0.01/094d02_ef76640c8586410282c65c4aaabd8014~mv2.webp
  • https://static.wixstatic.com/media/094d02_ef76640c8586410282c65c4aaabd8014~mv2.jpg/v1/fill/w_1184,h_278,al_c,q_85,usm_0.66_1.00_0.01/094d02_ef76640c8586410282c65c4aaabd8014~mv2.webp

Fortunately, it wasn’t too bad to add a custom fuzzy matching rule that looks for the beginning of the URL, e.g. https://static.wixstatic.com/media/094d02_ef76640c8586410282c65c4aaabd8014~mv2.jpg, and matches on that.
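
To make the idea concrete, here is a minimal Python sketch of prefix-based fuzzy matching. It is not the actual rule shipped in ReplayWeb.page or pywb; the names `wix_media_key` and `find_fuzzy_match` and the regular expression are assumptions made up for illustration. It treats everything up to the end of the first path segment after /media/ as the key, and returns any captured URL that shares that key.

```python
import re

# Hypothetical pattern: keep the host, "/media/", and the first path segment
# (the original image name); drop the "/v1/fill/w_...,h_.../..." suffix.
WIX_MEDIA_PREFIX = re.compile(r"^(https?://static\.wixstatic\.com/media/[^/?#]+)")


def wix_media_key(url):
    """Return the URL prefix that identifies the image, or None if not a Wix media URL."""
    match = WIX_MEDIA_PREFIX.match(url)
    return match.group(1) if match else None


def find_fuzzy_match(requested_url, captured_urls):
    """Return an exact match if one was captured, else the first capture with the same key."""
    if requested_url in captured_urls:
        return requested_url
    key = wix_media_key(requested_url)
    if key is None:
        return None
    for candidate in captured_urls:
        if wix_media_key(candidate) == key:
            return candidate
    return None


base = "https://static.wixstatic.com/media/094d02_ef76640c8586410282c65c4aaabd8014~mv2.jpg"
suffix = ",h_278,al_c,q_85,usm_0.66_1.00_0.01/094d02_ef76640c8586410282c65c4aaabd8014~mv2.webp"
captured = [base + "/v1/fill/w_980" + suffix]      # width seen at capture time
requested = base + "/v1/fill/w_1184" + suffix      # width requested at replay time
print(find_fuzzy_match(requested, captured))
```

With this approach, a replay browser that asks for the 1184-pixel variant would be served the 980-pixel variant that was actually captured, instead of getting nothing.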

This fix is now in the latest ReplayWeb.page release and in ReplayWeb.page App 1.1.1, and will also be in the next pywb release.

Hello there @Ilya. I really appreciate your support with this! All is looking great via ReplayWeb.page, except that I am still experiencing an issue where the PDFs within the site do not load consistently. Someone else in my network mentioned that she too was having difficulties with capturing/replaying PDFs (she was working with Conifer). Let me know if you have any ideas about what might be causing this or how to get around it?

No problem! I think the PDF issue is actually due to a bug in Chrome. Testing in Firefox and in the next Chrome Beta, it appears to be fixed, so hopefully it will be addressed with the next release of Chrome.

Excellent! Fingers crossed!

I’ve not had any luck in Chrome Beta (I’ve got Version 87.0.4280.40 (x86_64)).

The PDFs do load fully in Firefox (v.82.0.2) via Conifer. For some reason I’m not able to load my WARC into ReplayWeb.Page on Firefox at the moment; loading stalls at 1%…

I’ll keep my eye out for the next Chrome release, re-test and let you know.

Visited the forum to ask about custom URL fuzzy matching and came across this post, which happens to be the exact reason I was asking about fuzzy matching! The fix didn’t work in my case because there is a variation with Wix images where “fill” is replaced by “fit” e.g.
https://static.wixstatic.com/media/nsplsh_5033504669385448625573~mv2_d_6000_3376_s_4_2.jpg/v1/fit/w_6000,h_3376,q_90/file.jpg
I think that it could be handled by changing fill in the match rule to (fill|fit) but I’m not a confident regexer. Anyway, thought it might be useful context for this thread.
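
For anyone wanting to check the (fill|fit) idea before touching a real rule, here is a small Python sketch. The pattern and the name `wix_base` are hypothetical and are not the rule used by ReplayWeb.page or pywb, whose exact pattern isn't shown in this thread; the sketch only demonstrates that an alternation accepts both URL shapes and reduces them to the same kind of base URL.

```python
import re

# Hypothetical rule: capture the image base, then require "/v1/" followed by
# either "fill" or "fit". (?:...) groups the alternatives without capturing.
WIX_RESIZE = re.compile(
    r"^(https?://static\.wixstatic\.com/media/[^/]+)/v1/(?:fill|fit)/"
)


def wix_base(url):
    """Return the base image URL for a Wix fill/fit URL, or None if it doesn't match."""
    match = WIX_RESIZE.match(url)
    return match.group(1) if match else None


fit_url = (
    "https://static.wixstatic.com/media/"
    "nsplsh_5033504669385448625573~mv2_d_6000_3376_s_4_2.jpg"
    "/v1/fit/w_6000,h_3376,q_90/file.jpg"
)
fill_url = (
    "https://static.wixstatic.com/media/"
    "094d02_ef76640c8586410282c65c4aaabd8014~mv2.jpg"
    "/v1/fill/w_980,h_278,al_c,q_85,usm_0.66_1.00_0.01/"
    "094d02_ef76640c8586410282c65c4aaabd8014~mv2.webp"
)
print(wix_base(fit_url))
print(wix_base(fill_url))
```

Any other operation name after /v1/ would still fall through unmatched, so if Wix uses further variants they would need adding to the alternation too.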