I’d also be interested if it’s related to the local IP being used. Did you try the host file edit method?
I did, but to no avail. As in, your /etc/hosts trick gives me a prettier URL, but still results in the weird behavior I described (i.e. ReplayWeb.page failing to load pages after I Ctrl-c/kill my webserver, be it http-server or darkhttpd).
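For reference, this is roughly the setup I mean (hostname as in my tests below; the directory being served and the port are just placeholders):

# /etc/hosts entry giving the crawl a prettier URL
192.168.0.70   www.bestemmingsplannen.archive

# local webserver that I later Ctrl-c/kill, either of:
$ darkhttpd ./crawls --port 8080
$ npx http-server ./crawls -p 8080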
I would like to test whether this also happens if I use 127.0.0.1 instead of 192.168.0.70 (i.e. my host IP address on my home LAN), but I hit a Docker wall when I try:
$ sudo docker run --network=host -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --url http://127.0.0.1 --generateWACZ --collection localhost_test
{"timestamp":"2024-10-29T15:56:14.079Z","logLevel":"info","context":"general","message":"Browsertrix-Crawler 1.2.3 (with warcio.js 2.2.1)","details":{}}
{"timestamp":"2024-10-29T15:56:14.081Z","logLevel":"info","context":"general","message":"Seeds","details":[{"url":"http://127.0.0.1/","scopeType":"prefix","include":["/^https?:\\/\\/127\\.0\\.0\\.1\\//"],"exclude":[],"allowHash":false,"depth":-1,"sitemap":null,"auth":null,"_authEncoded":null,"maxExtraHops":0,"maxDepth":1000000}]}
{"timestamp":"2024-10-29T15:56:14.155Z","logLevel":"warn","context":"redis","message":"ioredis error","details":{"error":"[ioredis] Unhandled error event:"}}
{"timestamp":"2024-10-29T15:56:14.156Z","logLevel":"warn","context":"state","message":"Waiting for redis at redis://localhost:6379/0","details":{}}
{"timestamp":"2024-10-29T15:56:15.221Z","logLevel":"warn","context":"state","message":"Waiting for redis at redis://localhost:6379/0","details":{}}
{"timestamp":"2024-10-29T15:56:16.241Z","logLevel":"warn","context":"state","message":"Waiting for redis at redis://localhost:6379/0","details":{}}
^C{"timestamp":"2024-10-29T15:56:16.616Z","logLevel":"info","context":"general","message":"SIGINT received...","details":{}}
{"timestamp":"2024-10-29T15:56:16.616Z","logLevel":"error","context":"general","message":"error: no crawler running, exiting
Same thing if I add 127.0.0.1 www.bestemmingsplannen.archive to my /etc/hosts and then run the crawler with --url http://bestemmingsplannen.archive/: I get stuck at Waiting for redis at redis://localhost:6379/0.
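In case anyone wants to reproduce this: one check I can think of is whether something on the host is already listening on the redis port when --network=host is used (purely an illustrative check, I haven't confirmed this is the cause):

$ sudo ss -tlnp | grep ':6379'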
Crawling websites from my home LAN IP does work fine with /etc/hosts modifications, though, as long as I make sure to include --network=host.
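For completeness, the working variant looks roughly like this (same /etc/hosts mapping to my LAN IP as above; the collection name here is just an example):

$ sudo docker run --network=host -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --url http://www.bestemmingsplannen.archive/ --generateWACZ --collection lan_test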
I’ve reproduced this behavior on multiple LANs, btw.