YouTube videos captured with Browsertrix-Crawler not playable in pywb

I crawled a blog with embedded YouTube videos, which were then missing from the archive. I therefore extracted all YouTube embed URLs (https://www.youtube.com/embed/) from the logs to capture them again.

I found that when the YouTube embed URL is given directly as a seed to browsertrix-crawler, the video can be archived. I see that the behavior for YouTube is failing right now (Autoplay Behavior: YouTube Embed · Workflow runs · webrecorder/browsertrix-behaviors · GitHub), but I was still able to capture the YouTube pages this way, as a functional replay in ReplayWeb.page shows.

This leads to the replay issue…

Replay issue

The youtube captures are functional in ReplayWeb.page but not in pywb.

| Capture | pywb (version 2.8.0) | ReplayWeb.page (v2.3.4) |
| --- | --- | --- |
| Browsertrix-Crawler capture (1.5.8, with warcio.js 2.4.3) | not playable | playable |
| ArchiveWeb.page capture (0.14.2, using warcio.js 2.4.2) | playable | playable |

I think it has something to do with this resource: https://www.youtube.com/youtubei/v1/player?prettyPrint=false. For the combinations of tools where the video is playable, the response to this request is 200. With the combination of browsertrix-crawler and pywb, the response is 404.

Browsertrix Capture

pywb replay: https://webarchives.rhizome.org/youtube_embeds_5_1741774579/20250312101726/https://www.youtube.com/embed/n7ky-nuw-us
zipped pywb collection: https://monaulrich.online/web_archives/youtube_embeds_5_1741774579.zip

ArchiveWeb.page Capture

downloaded wacz from AWP: https://monaulrich.online/web_archives/youtube_embeds_5_awp.wacz
pywb replay (reindexed): https://webarchives.rhizome.org/youtube_embeds_5_awp/20250312101726/https://www.youtube.com/embed/n7ky-nuw-us
zipped pywb collection: https://monaulrich.online/web_archives/youtube_embeds_5_awp.zip

Index & WARCs Checks

I have also checked the resource (https://www.youtube.com/youtubei/v1/player?prettyPrint=false) in both WARCs and indexes. The pywb index of the Browsertrix capture does not contain the whole POST request header and payload, whereas the pywb index of the ArchiveWeb.page capture does contain it.
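For anyone who wants to repeat the index comparison, here is a minimal sketch of how it could be scripted. It assumes the index is in CDXJ form (key, timestamp, then a JSON object on each line) and that POST captures carry `method` and `requestBody` fields in that JSON. Those field names, the function names, and the two sample lines are assumptions for illustration, not copied from my actual indexes.

```python
import json

# Resource suspected of causing the 404 (query string left out for matching).
PLAYER_URL = "https://www.youtube.com/youtubei/v1/player"


def parse_cdxj_line(line):
    """Split one CDXJ line into (key, timestamp, fields)."""
    key, timestamp, raw_json = line.strip().split(" ", 2)
    return key, timestamp, json.loads(raw_json)


def summarize_player_entries(lines, url_prefix=PLAYER_URL):
    """Report status, method and request-body presence for matching entries."""
    summary = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _key, timestamp, fields = parse_cdxj_line(line)
        if not fields.get("url", "").startswith(url_prefix):
            continue
        summary.append({
            "timestamp": timestamp,
            "status": fields.get("status"),
            # Assumed field names for indexed POST data.
            "method": fields.get("method"),
            "has_request_body": bool(fields.get("requestBody")),
        })
    return summary


# Hypothetical sample lines, shortened; real index lines carry more fields.
SAMPLE_WITH_POST = (
    "com,youtube)/youtubei/v1/player?__wb_method=post&prettyprint=false "
    "20250312101726 "
    '{"url": "https://www.youtube.com/youtubei/v1/player?prettyPrint=false", '
    '"status": "200", "method": "POST", "requestBody": "videoId=n7ky-nuw-us"}'
)
SAMPLE_WITHOUT_POST = (
    "com,youtube)/youtubei/v1/player?prettyprint=false "
    "20250312101726 "
    '{"url": "https://www.youtube.com/youtubei/v1/player?prettyPrint=false", '
    '"status": "200"}'
)

if __name__ == "__main__":
    for label, line in [("with POST data", SAMPLE_WITH_POST),
                        ("without POST data", SAMPLE_WITHOUT_POST)]:
        print(label, summarize_player_entries([line]))
```

To run it against a real collection, pass an open index file instead of the sample lines, for example `summarize_player_entries(open("collections/youtube_embeds_5_awp/indexes/index.cdxj"))`. That path follows the usual pywb collection layout and may differ in other setups.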

If I can provide any more details, or if I should check something else, please let me know.
I am also not 100% sure that this is the right track.
Thank you very much in advance!

I wonder if client side replay in pywb will fix this?