Yes, good eye! The Content-Length header in the HTTP Response is required by the replay mechanism in order to determine how to perform Range Requests to fetch parts of the WACZ on demand, rather than downloading the entire WACZ file from the server.
The WACZ is a ZIP file, and the ZIP “directory” (a manifest of the contained files and their locations) is located at the end of the file. To read specific files from within the ZIP, ReplayWebPage first needs to read the directory, which it does with a Range Request that reads backwards from the end of the file, so it needs to know the Content-Length. Sorry if that’s TMI.
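To make the "read backwards from the end" step concrete, here is a rough Python sketch. It is not ReplayWebPage's actual code: the function names `tail_range_header` and `parse_eocd` are made up for illustration, a small in-memory ZIP stands in for the WACZ, and ZIP64 archives are ignored. It assumes the client turns Content-Length into absolute byte offsets for the Range header.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"   # marks the End of Central Directory record
EOCD_MIN_SIZE = 22               # fixed part of the record, without the comment
MAX_COMMENT_SIZE = 65535         # the optional ZIP comment can be this long


def tail_range_header(content_length, tail_size=EOCD_MIN_SIZE + MAX_COMMENT_SIZE):
    """Build a Range header value covering the last `tail_size` bytes of the file."""
    start = max(0, content_length - tail_size)
    return f"bytes={start}-{content_length - 1}"


def parse_eocd(tail):
    """Return (entry_count, directory_size, directory_offset) from the file tail."""
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or len(tail) - pos < EOCD_MIN_SIZE:
        raise ValueError("no End of Central Directory record found")
    fields = struct.unpack("<4sHHHHIIH", tail[pos:pos + EOCD_MIN_SIZE])
    _sig, _disk, _cd_disk, _disk_entries, entry_count, cd_size, cd_offset, _comment = fields
    return entry_count, cd_size, cd_offset


# Hypothetical stand-in for a WACZ on a server: a tiny ZIP built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("datapackage.json", "{}")
    zf.writestr("archive/data.warc", "WARC/1.1")
data = buf.getvalue()

# Step 1: use the total size to ask for the tail of the file.
range_header = tail_range_header(len(data))
# Step 2: find the directory's size and offset inside that tail.
entry_count, cd_size, cd_offset = parse_eocd(data)
print(range_header, entry_count, cd_size, cd_offset)
```

With the directory offset and size in hand, a second Range Request can fetch just the directory, and later requests can fetch individual files. In this sketch, the starting offset cannot be computed without the total size.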
ReplayWebPage does a HEAD request to the WACZ URL:
$ curl --head https://collections.digital.utsc.utoronto.ca/system/files/2023-10/serai.wacz
HTTP/2 200
cache-control: private
date: Thu, 18 Apr 2024 12:36:27 GMT
content-language: en
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
expires: Sun, 19 Nov 1978 05:00:00 GMT
x-generator: Drupal 10 (https://www.drupal.org)
accept-ranges: bytes
content-security-policy: upgrade-insecure-requests;
access-control-allow-headers: x-requested-with, Content-Type, origin, authorization, accept, client-security-token, X-ISLANDORA-TOKEN, X-Forwarded-For
strict-transport-security: max-age=63072000
last-modified: Mon, 30 Oct 2023 17:49:49 GMT
vary: User-Agent,Origin
content-security-policy: frame-ancestors 'self';
content-type: application/gzip
server: Apache
Sure enough, Content-Length is not there. But it does seem to be there for the older server?
$ curl --head https://memory.digital.utsc.utoronto.ca/sites/default/files/2023-11/utsc_pulse.wacz
HTTP/2 200
content-security-policy: upgrade-insecure-requests;
access-control-allow-headers: x-requested-with, Content-Type, origin, authorization, accept, client-security-token, X-ISLANDORA-TOKEN, X-Forwarded-For
strict-transport-security: max-age=63072000
x-content-type-options: nosniff
last-modified: Thu, 30 Nov 2023 15:50:04 GMT
etag: "4ba323e9-60b609aefb4ff"
accept-ranges: bytes
content-length: 1268982761
cache-control: max-age=31536000
expires: Fri, 18 Apr 2025 13:18:38 GMT
vary: User-Agent,Origin
content-security-policy: frame-ancestors 'self';
content-type: application/x-zip
date: Thu, 18 Apr 2024 13:18:38 GMT
server: Apache
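If it helps to compare the two responses side by side, the following Python sketch pulls out the headers that matter for Range Requests. The helper names `parse_headers` and `range_readiness` are hypothetical, and the two header strings are abridged copies of the curl output above rather than live requests.

```python
def parse_headers(raw):
    """Parse `curl --head` style output into {lowercase name: [values]}."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:   # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


def range_readiness(raw):
    """Summarize the headers relevant to fetching parts of a WACZ on demand."""
    h = parse_headers(raw)
    return {
        "content_length": int(h["content-length"][0]) if "content-length" in h else None,
        "accept_ranges": "bytes" in h.get("accept-ranges", []),
        "content_type": h.get("content-type", [None])[0],
    }


# Abridged from the two HEAD responses shown above.
NEW_SERVER = """HTTP/2 200
accept-ranges: bytes
content-type: application/gzip
server: Apache
"""

OLD_SERVER = """HTTP/2 200
accept-ranges: bytes
content-length: 1268982761
content-type: application/x-zip
server: Apache
"""

new_info = range_readiness(NEW_SERVER)
old_info = range_readiness(OLD_SERVER)
print("new:", new_info)
print("old:", old_info)
```

Both servers advertise `accept-ranges: bytes`, so the differences that stand out are the missing `content-length` and the different `content-type` on the new one.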
I wonder if your new web server (Apache or Nginx?) is configured to try to gzip compress the ZIP file, and Drupal is deciding it cannot determine the Content-Length?
https://www.drupal.org/project/drupal/issues/3396559