YouTube videos captured with Browsertrix-Crawler not playable in pywb

I crawled a blog with embedded YouTube videos, which were then missing from the archive. Therefore, I extracted all YouTube embed URLs (https://www.youtube.com/embed/) from the logs to capture them again.

I found that when the YouTube embed URL is given directly as a seed to browsertrix-crawler, the video can be archived. I see that the behavior for YouTube is failing right now (Autoplay Behavior: YouTube Embed · Workflow runs · webrecorder/browsertrix-behaviors · GitHub), but I was still able to capture the YouTube pages this way, as a functional replay in ReplayWeb.page shows.

This leads to the replay issue…

Replay issue

The YouTube captures are functional in ReplayWeb.page but not in pywb.

| Capture | pywb (version 2.8.0) | ReplayWeb.page (v2.3.4) |
|---|---|---|
| Browsertrix-Crawler capture (1.5.8, with warcio.js 2.4.3) | not playable | playable |
| ArchiveWeb.page capture (0.14.2, using warcio.js 2.4.2) | playable | playable |

I think it has something to do with the resource https://www.youtube.com/youtubei/v1/player?prettyPrint=false. For the combinations of tools where the video is playable, the response to this request is 200. In the combination of browsertrix-crawler and pywb, the response is 404.

Browsertrix Capture

pywb replay: https://webarchives.rhizome.org/youtube_embeds_5_1741774579/20250312101726/https://www.youtube.com/embed/n7ky-nuw-us
zipped pywb collection: https://monaulrich.online/web_archives/youtube_embeds_5_1741774579.zip

ArchiveWeb.page Capture

downloaded wacz from AWP: https://monaulrich.online/web_archives/youtube_embeds_5_awp.wacz
pywb replay (reindexed): https://webarchives.rhizome.org/youtube_embeds_5_awp/20250312101726/https://www.youtube.com/embed/n7ky-nuw-us
zipped pywb collection: https://monaulrich.online/web_archives/youtube_embeds_5_awp.zip

Index & WARCs Checks

I have also checked the resource (https://www.youtube.com/youtubei/v1/player?prettyPrint=false) in both WARCs and indexes. The pywb index of the Browsertrix capture does not contain the whole POST request header and payload, whereas the pywb index of the ArchiveWeb.page capture does contain it.
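
For anyone who wants to repeat this check, here is a minimal sketch of how the index entries for the player resource could be inspected. It assumes the usual CDXJ layout (search key, timestamp, JSON block separated by spaces) and assumes that an entry indexed with its POST data carries `__wb_method=post` in the search key. The two sample lines are made up for illustration and are not copied from my indexes; the function names are hypothetical.

```python
import json

# SURT-style search key prefix of the YouTube player resource.
PLAYER_KEY_PREFIX = "com,youtube)/youtubei/v1/player"


def parse_cdxj_line(line):
    """Split a CDXJ line into (search key, timestamp, JSON fields)."""
    key, timestamp, fields = line.strip().split(" ", 2)
    return key, timestamp, json.loads(fields)


def player_entries(lines):
    """Yield (key, timestamp, has_post_data) for each player resource entry."""
    for line in lines:
        if not line.startswith(PLAYER_KEY_PREFIX):
            continue
        key, timestamp, _fields = parse_cdxj_line(line)
        # Assumption: a POST-aware entry has "__wb_method=post" in its key.
        yield key, timestamp, "__wb_method=post" in key.lower()


# Illustrative sample lines only (invented, not real index output).
sample = [
    'com,youtube)/youtubei/v1/player?prettyprint=false 20250312101726 '
    '{"url": "https://www.youtube.com/youtubei/v1/player?prettyPrint=false", "status": "200"}',
    'com,youtube)/youtubei/v1/player?__wb_method=post&prettyprint=false&videoid=n7ky-nuw-us '
    '20250312101726 '
    '{"url": "https://www.youtube.com/youtubei/v1/player?prettyPrint=false", "status": "200"}',
]

for key, timestamp, has_post_data in player_entries(sample):
    print(timestamp, has_post_data, key)
```

To run it against a real collection, pass the lines of the collection's index file to `player_entries` instead of `sample`.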

If I can provide any more details, or if I should check something else, please let me know.
I am also not 100% sure whether this is the right track.
Thank you very much in advance!

I wonder if client side replay in pywb will fix this?

Hello, thank you very much for your fast response.

I’ve checked the client-side replay: I installed pywb from source and changed the config.yaml to “client_side_replay: true”. But the video is still not playable.

I found the issue.
It is the index entry of the player resource. When the POST request body is added to the URL search key (the first part of the index entry), the resource can be found and the video is playable.
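
To illustrate the idea, here is a rough sketch of folding POST data into the search key of a CDXJ line. It is a simplified approximation, not pywb's own encoding: the `__wb_method=post` marker, the lowercasing and the sorting of query arguments are assumptions about how the key is built, and the `videoId` parameter stands in for the real request body, which is much larger. The function names are hypothetical.

```python
from urllib.parse import urlencode


def add_post_data_to_key(key, post_params):
    """Return the search key with a POST marker and body parameters in its query."""
    path, _, query = key.partition("?")
    args = [arg for arg in query.split("&") if arg]
    # Assumption: POST entries are marked with "__wb_method=post" in the key.
    args.append("__wb_method=post")
    # Encode each body parameter and lowercase it like the rest of the key.
    args.extend(urlencode({name: value}).lower() for name, value in post_params.items())
    # Assumption: query arguments in the search key are kept in sorted order.
    return path + "?" + "&".join(sorted(args))


def fix_cdxj_line(line, post_params):
    """Rewrite only the search key (first field) of a CDXJ line."""
    key, rest = line.split(" ", 1)
    return add_post_data_to_key(key, post_params) + " " + rest


# Invented example line; only the key layout matters here.
line = 'com,youtube)/youtubei/v1/player?prettyprint=false 20250312101726 {"status": "200"}'
print(fix_cdxj_line(line, {"videoId": "n7ky-nuw-us"}))
```

The timestamp and JSON block are left untouched, so only the lookup key changes. The attached fixed index entry shows the actual key that worked for me.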

player resource in index and warc

fixed index entry: btc_index_entry_player_resource_fixed.txt

GitHub Issue: Post Request Body missing in index entry · Issue #941 · webrecorder/pywb · GitHub