What is the purpose of the proxy in browsertrix-crawler?

I get ERR_TUNNEL_CONNECTION_FAILED when trying to run browsertrix-crawler crawl with docker (podman).

I see the environment variables PROXY_HOST=localhost and PROXY_PORT=8080.

What proxy is this supposed to be? I don’t see the proxy discussed in the project’s README.

It was an SELinux-related problem. Whatever the proxy was, maybe it didn’t start properly without the ability to write to the crawls directory.

Hi, the proxy is internal to the Docker container. It captures HTTPS network traffic, and that’s how the archive is created. It sounds like it was a permissions issue with writing to the crawls directory…
Is there anything you needed to do to make it work? We can definitely add it to the docs.

To make it work I added a :Z suffix to the -v (volume) option. I think this is specific to Red Hat’s podman and only applicable where SELinux is in use.
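
For anyone hitting the same error, the change looks roughly like the sketch below. Treat it as an illustration only: the image name webrecorder/browsertrix-crawler, the /crawls/ container path, the --url flag with an example URL, and the volume_arg helper are assumptions, not details confirmed in this thread. The script prints the command instead of running it, so it can be checked first.

```shell
#!/bin/sh
# Sketch only. Image name, container path, and URL are assumptions.
CRAWLS_DIR="${CRAWLS_DIR:-$PWD/crawls}"

# Hypothetical helper: builds the value passed to -v.
# The trailing :Z asks the container runtime to relabel the host
# directory for SELinux so the container is allowed to write to it.
volume_arg() {
  printf '%s:/crawls/:Z' "$1"
}

# Print the command rather than executing it.
echo podman run -v "$(volume_arg "$CRAWLS_DIR")" -it \
  webrecorder/browsertrix-crawler crawl --url https://example.com/
```

The uppercase Z requests a private SELinux label for the host directory; a lowercase z requests a shared label instead, which matters if more than one container needs the same directory.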

What might help more generally is making sure the proxy can never fail to start without throwing an error. Possibly this has already been done; 0.2.1-beta.0 seems to emit more useful errors than 0.1.2.
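
As a rough illustration of that idea, a startup script could refuse to continue when the output directory is not writable, rather than leaving the proxy to fail silently. This is a hypothetical sketch, not the crawler's actual startup code; the require_writable name and the /crawls path are assumptions.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not taken from browsertrix-crawler.
# Tries to create a probe file so that any denial (ordinary permissions
# or an SELinux policy) shows up as a clear error.
require_writable() {
  dir="$1"
  probe="$dir/.write-probe.$$"
  # The subshell keeps a failed redirection from aborting the script.
  if ! ( : > "$probe" ) 2>/dev/null; then
    echo "error: cannot write to $dir" >&2
    return 1
  fi
  rm -f "$probe"
}

# Example use at the top of an entrypoint script (assumed path):
#   require_writable /crawls || exit 1
```

With a check like this, a mislabeled volume would produce a message about the directory instead of a later ERR_TUNNEL_CONNECTION_FAILED from the browser.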