Debugging a local k8s install: No crawlers start after creating a workflow

Hey y’all,

I’m looking for a bit of advice on getting unstuck with a local install using the helm charts + minikube.

I’ve gotten the whole cluster up and running on minikube (on an M2 MacBook Pro), the cluster health seems fine, and I can log in to the front end and create & save workflows.

However, the crawls never actually run; they just sit waiting to start. I assume some crawler pods are supposed to spawn and pick the jobs up, but I’m not certain where to start looking to figure out what’s stuck.

I’m looking for recommendations on logs to poke through, things to check/touch, and any other general advice while I halfheartedly poke through the Kubernetes documentation :slight_smile:

We should add some docs on debugging, but until then, a good place to start would be to print out what you see when you run:

  • kubectl get pods -n crawlers
  • kubectl get cjs -n crawlers
  • logs from kubectl logs svc/browsertrix-cloud-backend -c op

This should provide some insight as to what might be going wrong…
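
If those all look normal, a couple of extra commands can narrow things down further. This is just a rough sketch assuming the default chart setup (crawl jobs in the crawlers namespace, backend/operator in the default namespace) - adjust names to your install:

# Describe the stuck CrawlJob to see its status and any attached events
$ kubectl describe cjs -n crawlers

# Recent events in the crawlers namespace (scheduling failures, image pull errors, etc.)
$ kubectl get events -n crawlers --sort-by=.lastTimestamp

# Check that the backend/operator pods themselves are running
$ kubectl get pods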

I think it’s related to v1.16.2. I experienced the same issue and downgraded to v1.16.0. I was in a hurry to run some crawls and didn’t have time to debug it.
I will try again on a disposable VM.

Here are some logs; this is 1.16.2:

$ sudo kubectl get pods -n crawlers
No resources found in crawlers namespace.
$ sudo kubectl get cjs -n crawlers
NAME                                          STATE      PAGES DONE   PAGES FOUND   SIZE   TIME STARTED   TIME FINISHED   STOPPING   FILES ADDED   SCALE
crawljob-manual-20250602070102-3a6f719f-9a7   starting   0            0                    6m35s                          false      0             1
$ sudo kubectl logs --tail=20  svc/browsertrix-cloud-backend -c op
10.42.0.172:41486 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.1:39504 - "GET /healthz HTTP/1.1" 200
10.42.0.1:39506 - "GET /healthz HTTP/1.1" 200
10.42.0.172:42134 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.172:49982 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.172:40530 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.1:40504 - "GET /healthz HTTP/1.1" 200
10.42.0.1:40508 - "GET /healthz HTTP/1.1" 200
10.42.0.172:54746 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.172:36462 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.172:55324 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.1:57852 - "GET /healthz HTTP/1.1" 200
10.42.0.1:57868 - "GET /healthz HTTP/1.1" 200
10.42.0.172:53360 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.172:54894 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.172:42688 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.1:44450 - "GET /healthz HTTP/1.1" 200
10.42.0.1:44462 - "GET /healthz HTTP/1.1" 200
10.42.0.172:37994 - "POST /op/crawls/sync HTTP/1.1" 200
10.42.0.172:50316 - "POST /op/crawls/sync HTTP/1.1" 200

Thanks for your help @raffaele, it turned out that the published Helm chart got clobbered, breaking the 1.16.2 release in this case. It should now be fixed; please try again @knowtheory - hopefully it’ll just work now.
(See: [Bug]: Publish helm chart CI action runs on main, overrides helm chart · Issue #2642 · webrecorder/browsertrix · GitHub for more details on the issue)
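
For anyone who hit this on 1.16.2: upgrading the release should pull down the re-published chart package. A rough sketch, assuming a release named btrix installed from the GitHub release artifact (adjust the release name, chart URL, and values file to match your install):

# Re-apply the fixed 1.16.2 chart package (release name and values file are examples)
$ helm upgrade --install btrix \
    https://github.com/webrecorder/browsertrix/releases/download/v1.16.2/browsertrix-v1.16.2.tgz \
    -f ./my-values.yaml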

Hey, I was having the same problem as @knowtheory, although I’m using Docker Desktop instead of minikube. But as of today, instead of the crawl starting up and not doing anything, I simply get the error ‘Sorry, couldn’t run crawl at this time.’

  • kubectl get pods -n crawlers
    No resources found in crawlers namespace.

  • kubectl get cjs -n crawlers

NAME                                          STATE      PAGES DONE   PAGES FOUND   SIZE   TIME STARTED   TIME FINISHED   STOPPING   FILES ADDED   SCALE
crawljob-manual-20250602123743-2603af6f-a95   starting   0            0                    43h                            false      0             1

I assume this is the crawl I did before the Helm fix? The one that got stuck on starting.

  • kubectl logs --tail=20 svc/browsertrix-cloud-backend -c op
10.1.0.80:48918 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.1:39650 - "GET /healthz HTTP/1.1" 200
10.1.0.1:39660 - "GET /healthz HTTP/1.1" 200
10.1.0.80:34528 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.80:57030 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.80:54176 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.1:34108 - "GET /healthz HTTP/1.1" 200
10.1.0.1:34122 - "GET /healthz HTTP/1.1" 200
10.1.0.80:41126 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.80:50868 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.80:49412 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.1:58906 - "GET /healthz HTTP/1.1" 200
10.1.0.1:58922 - "GET /healthz HTTP/1.1" 200
10.1.0.80:42456 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.80:47170 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.80:42062 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.1:60462 - "GET /healthz HTTP/1.1" 200
10.1.0.1:60468 - "GET /healthz HTTP/1.1" 200
10.1.0.80:35124 - "POST /op/crawls/finalize HTTP/1.1" 400
10.1.0.80:44176 - "POST /op/crawls/finalize HTTP/1.1" 400

Does anyone have any idea why the crawl is not starting? Many thanks in advance!

@ilya do you perhaps have any idea what the problem could be? Many thanks!

We just released 1.17.0; please try this version to see if it works now. It sounds like a different issue, though. The finalize endpoint returning 400 suggests something got into a bad state. Try with a fresh install.
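
If it helps, a fresh install roughly means clearing out the stuck CrawlJobs and the old release before installing 1.17.0. Sketch only, assuming a release named btrix in the default namespace; deleting the PVCs wipes all stored data, so skip that step if you need to keep anything:

# Remove stuck crawl jobs and the old release
$ kubectl delete cjs --all -n crawlers
$ helm uninstall btrix

# Optional: also delete persistent volume claims for a truly clean slate (destroys all data)
$ kubectl delete pvc --all

# Then install 1.17.0 following the normal install steps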

Thank you! Seems like all that was needed was a fresh install indeed.
