I crawled a blog with embedded YouTube videos, which were then missing from the archive. Therefore I extracted all YouTube embed URLs (https://www.youtube.com/embed/) from the logs to capture them again.
I found that when the YouTube embed URL is given directly as a seed to browsertrix-crawler, the video can be archived. I see that the behavior for YouTube is currently failing (Autoplay Behavior: YouTube Embed · Workflow runs · webrecorder/browsertrix-behaviors · GitHub), but I was still able to capture the YouTube pages this way, as a functional replay in ReplayWeb.Page shows.
This leads to the replay issue…
Replay issue
The YouTube captures are functional in ReplayWeb.Page but not in pywb.
| Capture | pywb (version 2.8.0) | ReplayWeb.Page (v2.3.4) |
| --- | --- | --- |
| Browsertrix-Crawler capture (1.5.8, with warcio.js 2.4.3) | not playable | playable |
| ArchiveWeb.Page capture (0.14.2, using warcio.js 2.4.2) | playable | playable |
I think it has something to do with the resource https://www.youtube.com/youtubei/v1/player?prettyPrint=false. For the tool combinations where the video is playable, the response to this URL request is 200. In the combination browsertrix-crawler & pywb, the response is 404.
I have also checked this resource (https://www.youtube.com/youtubei/v1/player?prettyPrint=false) in both WARCs and indexes. The pywb index of the Browsertrix capture does not contain the whole POST request header and payload, whereas the pywb index of the ArchiveWeb.Page capture does.
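For anyone who wants to repeat this comparison, here is a minimal sketch of how the player entries could be pulled out of a CDXJ index. It assumes the usual CDXJ layout of one entry per line (search key, timestamp, JSON block, separated by spaces). The sample lines and the helper names `parse_cdxj_line` and `player_entries` are made up for illustration and are not taken from my actual index.

```python
import json


def parse_cdxj_line(line):
    """Split one CDXJ line into (search key, timestamp, JSON fields)."""
    key, timestamp, raw_json = line.strip().split(" ", 2)
    return key, timestamp, json.loads(raw_json)


def player_entries(lines, needle="youtubei/v1/player"):
    """Yield parsed entries whose search key mentions the player endpoint."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, timestamp, fields = parse_cdxj_line(line)
        if needle in key:
            yield key, timestamp, fields


# Illustrative sample lines (hypothetical, not copied from a real index).
sample = [
    'com,youtube)/embed/abc 20240101000000 '
    '{"url": "https://www.youtube.com/embed/abc", "status": "200"}',
    'com,youtube)/youtubei/v1/player?prettyprint=false 20240101000001 '
    '{"url": "https://www.youtube.com/youtubei/v1/player?prettyPrint=false", '
    '"status": "200", "method": "POST"}',
]

for key, timestamp, fields in player_entries(sample):
    # Print the search key next to the recorded status and method so that
    # entries from two different indexes can be compared side by side.
    print(key, timestamp, fields.get("status"), fields.get("method"))
```

To check a real index, the lines of the index file can be passed in place of `sample`, once for each capture.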
…
If I can provide any more details, or if I should check something else, please let me know.
I am also not 100% sure whether this is the right track.
Thank you very much in advance!
Hello, thank you very much for your fast response.
I’ve checked the client-side replay: I installed pywb from source and changed config.yaml to
"client_side_replay: true". But the video is still not playable.
I found the issue.
It is the index entry of the player resource. When the POST request body is added to the URL search key (the first part of the index entry), the resource can be found and the video is playable.
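To illustrate the idea, here is a rough sketch of how a POST body can be folded into the URL so that the lookup key differs per request. This is not pywb's actual code. The `__wb_method` marker is my understanding of the convention used by the Webrecorder tools and should be treated as an assumption, as should the helper name `append_post_body` and the example body field. The real player request body is JSON and more complex than this.

```python
from urllib.parse import urlencode


def append_post_body(url, body_params, method="POST"):
    """Return the URL with a method marker and the body fields appended.

    The marker name "__wb_method" is an assumption about the convention;
    the body fields are sorted so the same body always gives the same key.
    """
    extra = urlencode([("__wb_method", method)] + sorted(body_params.items()))
    separator = "&" if "?" in url else "?"
    return url + separator + extra


url = "https://www.youtube.com/youtubei/v1/player?prettyPrint=false"
body = {"videoId": "VIDEO_ID"}  # hypothetical, simplified request body

# The resulting URL is what would be canonicalized into the search key.
print(append_post_body(url, body))
```

Without the appended body, every POST to the player endpoint maps to the same key, so the lookup cannot tell requests apart. With it, each request body maps to its own index entry.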