This custom Vimeo behavior was designed to work on embed URLs like https://player.vimeo.com/video/26405583, not on the pages in which the videos are embedded. For those pages, it would probably need to be adjusted.
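Because the behavior only targets embed URLs, it can help to check a seed before running it. The Python sketch below is an illustration only (the function name `embed_video_id` is hypothetical and not part of the behavior). It accepts URLs of the `player.vimeo.com/video/<id>` form shown above and returns the video ID. Any other URL, such as a page that merely embeds a video, returns None.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Embed URLs look like https://player.vimeo.com/video/<numeric id>
EMBED_PATH_RE = re.compile(r"/video/(\d+)/?")


def embed_video_id(url: str) -> Optional[str]:
    """Return the numeric video ID for a player.vimeo.com embed URL, else None."""
    parts = urlsplit(url)
    if parts.hostname != "player.vimeo.com":
        return None
    # Query strings such as ?byline=0&portrait=0 are ignored; only the path is checked.
    match = EMBED_PATH_RE.fullmatch(parts.path)
    return match.group(1) if match else None
```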
It reads the playerConfig to get the URL of the playlist.json, extracts the audio and video segment URLs from the playlist.json, and requests them. It also extracts other relevant file URLs from the playerConfig.
What it does in detail:
1. Click play
The script clicks the play button so that the playlist.json file, which contains the audio and video segment URLs, gets loaded. (This is a leftover from an older version of the script: the URL of the playlist.json is now extracted from the playerConfig, so clicking the play button may no longer be necessary.)
2. Read the playerConfig
It reads the playerConfig, which is inside a script tag in the HTML file, and extracts [1] relevant URLs such as player.js and [2] the URL of the playlist.json.
[1] These URLs are requested:
```json
"urls": {
  "js": "https://f.vimeocdn.com/p/4.40.43/js/player.js",
  "js_base": "https://f.vimeocdn.com/p/4.40.43/js",
  "js_module": "https://f.vimeocdn.com/p/4.40.43/js/player.module.js",
  "js_vendor_module": "https://f.vimeocdn.com/p/4.40.43/js/vendor.module.js",
  "locales_js": {
    "de-DE": "https://f.vimeocdn.com/p/4.40.43/js/player.de-DE.js",
    "en": "https://f.vimeocdn.com/p/4.40.43/js/player.js",
    "es": "https://f.vimeocdn.com/p/4.40.43/js/player.es.js",
    "fr-FR": "https://f.vimeocdn.com/p/4.40.43/js/player.fr-FR.js",
    "ja-JP": "https://f.vimeocdn.com/p/4.40.43/js/player.ja-JP.js",
    "ko-KR": "https://f.vimeocdn.com/p/4.40.43/js/player.ko-KR.js",
    "pt-BR": "https://f.vimeocdn.com/p/4.40.43/js/player.pt-BR.js",
    "zh-CN": "https://f.vimeocdn.com/p/4.40.43/js/player.zh-CN.js"
  },
  "ambisonics_js": "https://f.vimeocdn.com/p/external/ambisonics.min.js",
  "barebone_js": "https://f.vimeocdn.com/p/4.40.43/js/barebone.js",
  "chromeless_js": "https://f.vimeocdn.com/p/4.40.43/js/chromeless.js",
  "three_js": "https://f.vimeocdn.com/p/external/three.rvimeo.min.js",
  "vuid_js": "https://f.vimeocdn.com/js_opt/modules/utils/vuid.min.js",
  "hive_sdk": "https://f.vimeocdn.com/p/external/hive-sdk.js",
  "hive_interceptor": "https://f.vimeocdn.com/p/external/hive-interceptor.js",
  "proxy": "https://player.vimeo.com/static/proxy.html",
  "css": "https://f.vimeocdn.com/p/4.40.43/css/player.css",
  "chromeless_css": "https://f.vimeocdn.com/p/4.40.43/css/chromeless.css",
  "fresnel": "https://arclight.vimeo.com/add/player-stats",
  "player_telemetry_url": "https://arclight.vimeo.com/player-events",
  "telemetry_base": "https://lensflare.vimeo.com",
  "fresnel_manifest_url": "https://lensflare.vimeo.com/add/playback_manifest",
  "fresnel_chunk_url": "https://lensflare.vimeo.com/add/chunk_downloads",
  "test_imp": "https://lensflare.vimeo.com/add/player-test-impression"
},
```
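As a rough illustration of step 2, the Python sketch below pulls the playerConfig out of the embed HTML and gathers every URL string from a part of it, such as the "urls" object above. It assumes the config is assigned as `window.playerConfig = {...}` inside a `<script>` tag. That pattern and the function names are assumptions, not taken from the behavior itself.

```python
import json
import re
from typing import Any, List

# Assumption: the page contains <script>window.playerConfig = {...}</script>.
# The non-greedy match stops at the first "}" that is followed by "</script>".
PLAYER_CONFIG_RE = re.compile(
    r"window\.playerConfig\s*=\s*(\{.*?\})\s*;?\s*</script>", re.DOTALL
)


def extract_player_config(html: str) -> dict:
    """Parse the playerConfig object embedded in the page HTML."""
    match = PLAYER_CONFIG_RE.search(html)
    if match is None:
        raise ValueError("playerConfig not found in HTML")
    return json.loads(match.group(1))


def collect_urls(node: Any) -> List[str]:
    """Recursively collect all http(s) strings from nested dicts and lists."""
    found: List[str] = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(collect_urls(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(collect_urls(value))
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        found.append(node)
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(found))
```

Passing only the "urls" sub-object restricts the result to the files listed there. Where that object sits inside the full config (for example under a "request" key) is an assumption to verify. Duplicates such as player.js, which appears both under "js" and under "locales_js", are returned only once.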
3. Build Segment URLs
[2] It then parses the playlist.json to extract the audio and video segments.
The segment URLs have this structure:
https://vod-adaptive-ak.vimeocdn.com/exp=1744989363~acl=%2Fe9a353c8-0bb6-492a-8357-0e99365cf002%2F*~hmac=5fc420131100a7d5dc4c2c26eb8228f0fe263910ad6d40a49a1e429ec4b61e53/e9a353c8-0bb6-492a-8357-0e99365cf002/v2/remux/avf/ceec4e91-401b-4d5f-a791-56ea52fb6e19/segment.m4s?pathsig=8c953e4f~LLQR-DIKRF9HhlYkF5O-juknrHf4gIb4P6HBG6bkV_w&r=dXMtd2VzdDE%3D&sid=3&st=video
The playlist.json contains only the last part of each URL ("segment.m4s…"), so the full URL has to be built.
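Here is a sketch of the parsing part in Python. The assumed playlist.json layout has top-level "video" and "audio" lists of streams. Each stream has a "base_url" and a "segments" list, and each segment entry holds the relative segment "url". This layout is an assumption and should be checked against a real file. `iter_segment_parts` is a hypothetical name.

```python
from typing import Iterator, Tuple


def iter_segment_parts(playlist: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (kind, stream base_url, segment part) for every audio and video segment.

    Assumed layout (unverified): {"video": [stream, ...], "audio": [stream, ...]},
    where each stream is {"base_url": ..., "segments": [{"url": ...}, ...]}.
    """
    for kind in ("video", "audio"):
        for stream in playlist.get(kind, []):
            base_url = stream.get("base_url", "")
            for segment in stream.get("segments", []):
                # Only the last part of the URL, e.g. "segment.m4s?...", is stored here.
                yield kind, base_url, segment["url"]
```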
To build the full segment URLs, the URL of the playlist.json is cut after the hmac part.
Full playlist.json URL:
Cut playlist.json URL:
Then the "base_url" (ID) of the segment and the segment part are appended:
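The URL building described above can be sketched in Python as follows. Only the rule "cut after the hmac part, then append base_url and segment part" comes from the description. The function names are hypothetical, and the host and path in the test data are made up. The sketch does not check whether a stream's base_url needs further resolution, for example if it is relative.

```python
def cut_after_hmac(playlist_url: str) -> str:
    """Keep the playlist.json URL up to and including the "/" that ends the hmac part."""
    hmac_start = playlist_url.index("~hmac=")  # raises ValueError if there is no hmac part
    slash = playlist_url.index("/", hmac_start)  # first "/" after the hmac value
    return playlist_url[: slash + 1]


def build_segment_url(playlist_url: str, base_url: str, segment_part: str) -> str:
    """Join the cut playlist URL, the stream's base_url (ID) and the segment part."""
    return cut_after_hmac(playlist_url) + base_url + segment_part
```

The "acl" value is percent-encoded (%2F), so the first literal "/" after "~hmac=" marks the end of the token part of the URL.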
4. Request Segment URLs
All audio and video segment URLs are then requested.
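Step 4 can be sketched as a loop in Python. In the real behavior, the requests presumably have to be issued from inside the browser page so that the crawler's recorder captures them. Here `fetch` is a hypothetical injected function that returns an HTTP status code, for example a thin wrapper around an HTTP client. The sketch only shows deduplication and how to record which URLs failed so they could be retried.

```python
from typing import Callable, Dict, Iterable, List, Optional


def request_all(
    urls: Iterable[str], fetch: Callable[[str], int]
) -> Dict[str, Optional[int]]:
    """Request every unique URL once and map it to its status code (None on error)."""
    results: Dict[str, Optional[int]] = {}
    for url in dict.fromkeys(urls):  # drop duplicates, keep order
        try:
            results[url] = fetch(url)
        except OSError:  # network errors, e.g. urllib.error.URLError (an OSError subclass)
            results[url] = None
    return results


def failed(results: Dict[str, Optional[int]]) -> List[str]:
    """URLs worth retrying: network errors or non-2xx status codes."""
    return [
        url
        for url, status in results.items()
        if status is None or not 200 <= status < 300
    ]
```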
Results
1. Working examples
pywb collections:
https://monaulrich.online/web_archives/vimeo_1_1743805875.tar
https://monaulrich.online/web_archives/vimeo_4_1743805875.tar
https://monaulrich.online/web_archives/vimeo_9_1743805875.tar
Of about 900 Vimeo videos, two thirds could be archived in the first run.
2. Remaining issue: Cloudflare Turnstile
For the remaining third of the URLs, the following issue occurred.
The replay behaves the same way the live page did during capture: the video does not load, and after a while an error message appears. So it is not a replay issue. However, when the page (https://player.vimeo.com/video/54854426) is opened in a "normal" browser, the video loads. The error message reads:
"We couldn’t verify the security of your connection.
Access to this content has been restricted. Contact your internet service provider for help."
The crawl logs show that the Cloudflare resources are skipped by the recorder:
```json
{"timestamp":"2025-04-18T15:15:33.090Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/g/turnstile/if/ov2/av0/rcv/theco/0x4AAAAAAAbaszMygKLnGbeo/auto/fbE/new/normal/auto/","frameId":"E09C4FB49D09776AEF37B02739BDB6A9"}}
```
I think these requests belong to the Cloudflare Turnstile check.
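To find out how often a crawl ran into the check, the crawl log can be scanned for these entries. This Python sketch assumes the log is JSON Lines (one entry per line) with the fields shown in the log entry above. `skipped_challenge_requests` is a hypothetical helper. It can be fed any iterable of lines, such as an open log file.

```python
import json
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

CHALLENGE_HOST = "challenges.cloudflare.com"
SKIP_MESSAGE = "Skipping URL from unknown frame"


def skipped_challenge_requests(log_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (url, frameId) for recorder entries that skipped a Cloudflare challenge URL."""
    hits: List[Tuple[str, str]] = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(entry, dict):
            continue
        details = entry.get("details")
        if not isinstance(details, dict):
            continue
        url = details.get("url")
        if not isinstance(url, str):
            continue
        if (
            entry.get("context") == "recorder"
            and entry.get("message") == SKIP_MESSAGE
            and urlsplit(url).hostname == CHALLENGE_HOST
        ):
            hits.append((url, details.get("frameId", "")))
    return hits
```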
It seems that these checks are not made on every Vimeo page. On a second (and third …) run, I was also able to archive some of the pages that had failed because of these checks in the first run.
When these checks are made (on the live web), the requested URL (https://player.vimeo.com/video/37317482?byline=0&portrait=0) changes to:
https://player.vimeo.com/video/37317482?byline=0&portrait=0&turnstile=0.obLJDazyTY5BysU8yeYo4sycQtWnnJneWe67-iXG4wiHOqUfMOtQQWY1NA2vdz-sS2quDvmqhSGOT_oTvS8SdsKFflAURcm2tSphOUwJV6H2w9KG5D6jaVHRU1g78wQdweBvMcg6SnC3HpQdy_1eeb1AYB7T84CkupcTyhCGJDPztASt4oTGd4--V7Y0Jl61GLl9j-pRIsmy_2cRRxP5iSoFGpmtRx5h1YBLbic087hytbQtExWORDw2T7ripcvVXyS2uQUiShF7CLwzAWvXGpWil6vRKWM-nyWmjHs8WEe2bXfFDK4efUSmEXk2QgHnvJKaiXMMwr3ZSZ7GyZqxmj1n3BbqM4BUHQ8KSXY4L83i2tdNTvekvbOt7uUV6IFGeQblTMXHG_zI7fQpudj33QFNG_riZVNHCX2dIZFSJjvo_IyxHjRAXJh0Lg6ecGzvwiOICqfxVwwjQzGJSCYV7kix3uFDWY6DtA5GhnmnqEJuieXhASFURXaIA0QGS5tFhkUkR00aUmbNNZhVwRKCmBMp-VnSMOqAtbxodWIIwv8_jMfvBhMClBYpE8RxwjEAfAZ5ogi647ESR4pRRTAJ2JVJJHxuRGJlXodMBEwtUjOwQHQxCfoDh70u98gdwyobQ-RIV3EQaMpvsoGHwTgAX5H8T3wtcfRMWXNK-4AifU8SvMTYGDAePbwnjH7Dk4DF4jVwc9nqS9n-i-lxSrmizKaBn3Yi7hPe6l8kku_oesrbfWoh27vYPlvvnYR7IkLLjFMC7xqCKhMUwm6Pzy-n5f-82Xa-pi11Y7QBXqoVonWX5V_sHTNJ_8j-g7GQ9wOLHy5n7xn9pQULlwxXKUxd844NPJ0tIpRNAGGqkLLEb1ZjZZBhViUchyLsFc5pIY7Z.eP-me2M0SpuSXj8e1VywOg.359e17cd9ebdd8136fdec9efcbc18ed21058f0fb6628825b535bc20f3f56f2c3&ref=
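The added turnstile query parameter makes affected page URLs easy to recognize. The small Python sketch below uses hypothetical helper names. It detects the parameter and maps such a URL back to the originally requested one, for example to match captured pages against the seed list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PARAM = "turnstile"


def has_turnstile_param(url: str) -> bool:
    """True if the URL carries the "turnstile" query parameter."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(key == PARAM for key, _ in pairs)


def strip_turnstile_param(url: str) -> str:
    """Drop the "turnstile" parameter and keep the other parameters in order."""
    parts = urlsplit(url)
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != PARAM
    ]
    # urlencode re-encodes the values, so percent-encoding may differ from the input.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment)
    )
```

`keep_blank_values=True` preserves empty parameters such as `ref=`. Without it, they would be dropped from the rebuilt URL.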
So this issue remains.