Browsertrix deployment stalls in the migrations init container

I am trying to deploy Browsertrix on Red Hat Enterprise Linux 8.10 using k3s and helm.

I have followed the instructions for remote deployment and managed to get some services running, but not the backend.

Running kubectl get pods shows that the two backend pods are stuck in Init status:

NAME                                          READY   STATUS     RESTARTS   AGE
browsertrix-cloud-backend-7d5f9c6cc-tw4pn     0/2     Init:0/1   0          2m8s
browsertrix-cloud-backend-b895df648-xvwff     0/2     Init:0/1   0          2m8s
browsertrix-cloud-frontend-76f584cbbc-q69t5   1/1     Running    0          2m7s
btrix-metacontroller-helm-0                   1/1     Running    0          2m6s
local-minio-7cf5b78b4b-29v56                  1/1     Running    0          2m7s
local-mongo-0                                 1/1     Running    0          2m6s

The frontend is definitely running because I get the login screen at http://129.67.246.94:30870/log-in

Running kubectl describe pod on one of the initializing pods shows:

Name:             browsertrix-cloud-backend-b895df648-xvwff
Namespace:        default
Priority:         0
Service Account:  default
Node:             algorithmic-archive-dev/129.67.246.94
Start Time:       Tue, 10 Jun 2025 14:49:36 +0100
Labels:           app=browsertrix-cloud
                  pod-template-hash=b895df648
                  role=backend
Annotations:      helm.update: gB6A0
Status:           Pending
IP:               10.42.0.21
IPs:
  IP:           10.42.0.21
Controlled By:  ReplicaSet/browsertrix-cloud-backend-b895df648
Init Containers:
  migrations:
    Container ID:  containerd://511b8fad7e8a558bcf0417e16239620840ae4c9b9dc403ac74059f629fb07f4e
    Image:         docker.io/webrecorder/browsertrix-backend:1.16.2
    Image ID:      docker.io/webrecorder/browsertrix-backend@sha256:a2c5b1c8e915a31e5b05c7820058d6e2952a079540782a38d7d898d1a99b5071
    Port:          <none>
    Host Port:     <none>
    Command:
      python3
      -m
      btrixcloud.main_migrations
    State:          Running
      Started:      Tue, 10 Jun 2025 14:49:37 +0100
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  350Mi
    Requests:
      cpu:     100m
      memory:  350Mi
    Environment Variables from:
      backend-env-config  ConfigMap  Optional: false
      backend-auth        Secret     Optional: false
      mongo-auth          Secret     Optional: false
    Environment:
      MOTOR_MAX_WORKERS:  1
    Mounts:
      /app/btrixcloud/email-templates/ from email-templates (rw)
      /app/btrixcloud/templates/ from app-templates (rw)
      /config from config-volume (rw)
      /ops-configs/ from ops-configs (rw)
      /ops-proxy-configs/ from ops-proxy-configs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mslvv (ro)
Containers:
  api:
    Container ID:  
    Image:         docker.io/webrecorder/browsertrix-backend:1.16.2
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      gunicorn
      btrixcloud.main:app_root
      --bind
      0.0.0.0:8000
      --access-logfile
      -
      --workers
      1
      --worker-class
      uvicorn.workers.UvicornWorker
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  350Mi
    Requests:
      cpu:      100m
      memory:   350Mi
    Liveness:   http-get http://:8000/healthz delay=5s timeout=1s period=30s #success=1 #failure=15
    Readiness:  http-get http://:8000/healthz delay=5s timeout=1s period=30s #success=1 #failure=5
    Startup:    http-get http://:8000/healthzStartup delay=0s timeout=1s period=10s #success=1 #failure=8640
    Environment Variables from:
      backend-env-config  ConfigMap  Optional: false
      backend-auth        Secret     Optional: false
      mongo-auth          Secret     Optional: false
    Environment:
      MOTOR_MAX_WORKERS:       1
      BTRIX_SUBS_APP_API_KEY:  <set to the key 'BTRIX_SUBS_APP_API_KEY' in secret 'btrix-subs-app-secret'>  Optional: true
      BTRIX_SUBS_APP_URL:      <set to the key 'BTRIX_SUBS_APP_URL' in secret 'btrix-subs-app-secret'>      Optional: true
    Mounts:
      /app/btrixcloud/email-templates/ from email-templates (rw)
      /app/btrixcloud/templates/ from app-templates (rw)
      /ops-configs/ from ops-configs (rw)
      /ops-proxy-configs/ from ops-proxy-configs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mslvv (ro)
  op:
    Container ID:  
    Image:         docker.io/webrecorder/browsertrix-backend:1.16.2
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      gunicorn
      btrixcloud.main_op:app_root
      --bind
      0.0.0.0:8756
      --access-logfile
      -
      --workers
      1
      --worker-class
      uvicorn.workers.UvicornWorker
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  350Mi
    Requests:
      cpu:      100m
      memory:   350Mi
    Liveness:   http-get http://:8756/healthz delay=5s timeout=1s period=30s #success=1 #failure=15
    Readiness:  http-get http://:8756/healthz delay=5s timeout=1s period=30s #success=1 #failure=5
    Startup:    http-get http://:8756/healthz delay=5s timeout=1s period=5s #success=1 #failure=5
    Environment Variables from:
      backend-env-config  ConfigMap  Optional: false
      backend-auth        Secret     Optional: false
      mongo-auth          Secret     Optional: false
    Environment:
      MOTOR_MAX_WORKERS:  1
    Mounts:
      /app/btrixcloud/email-templates/ from email-templates (rw)
      /app/btrixcloud/templates/ from app-templates (rw)
      /config from config-volume (rw)
      /ops-configs/ from ops-configs (rw)
      /ops-proxy-configs/ from ops-proxy-configs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mslvv (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 False 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      shared-job-config
    Optional:  false
  ops-configs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ops-configs
    Optional:    false
  ops-proxy-configs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ops-proxy-configs
    Optional:    true
  app-templates:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      app-templates
    Optional:  false
  email-templates:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      email-templates
    Optional:  false
  kube-api-access-mslvv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  3m23s  default-scheduler  Successfully assigned default/browsertrix-cloud-backend-b895df648-xvwff to algorithmic-archive-dev
  Normal  Pulled     3m23s  kubelet            Container image "docker.io/webrecorder/browsertrix-backend:1.16.2" already present on machine
  Normal  Created    3m23s  kubelet            Created container: migrations
  Normal  Started    3m22s  kubelet            Started container migrations

So I can see the migrations init container has started, but the pod never moves past this state; it stays stuck initializing indefinitely.

I tried to check the logs, but I can't get anything because the pod is still initializing.
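
For reference, this is roughly what I ran (targeting the api container listed in the describe output above):

kubectl logs browsertrix-cloud-backend-b895df648-xvwff -c api

which just returns: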

Error from server (BadRequest): container "api" in pod "browsertrix-cloud-backend-b895df648-xvwff" is waiting to start: PodInitializing

Can anyone help me work out what might be going on here?

Checking again today: I have now upgraded to Browsertrix 1.17.1 via helm, and I still have the same problem with the backend pods never finishing initialization.

Similar to this error, I'm getting a repeating cycle of 'Connection accepted' / 'Connection ended' log messages in the local-mongo-0 pod:

{"t":{"$date":"2025-06-19T10:08:09.917+00:00"},"s":"I",  "c":"REPL",     "id":5853300, "ctx":"initandlisten","msg":"current featureCompatibilityVersion value","attr":{"featureCompatibilityVersion":"6.0","context":"startup"}}
{"t":{"$date":"2025-06-19T10:08:09.917+00:00"},"s":"I",  "c":"STORAGE",  "id":5071100, "ctx":"initandlisten","msg":"Clearing temp directory"}
{"t":{"$date":"2025-06-19T10:08:09.942+00:00"},"s":"I",  "c":"CONTROL",  "id":20536,   "ctx":"initandlisten","msg":"Flow Control is enabled on this deployment"}
{"t":{"$date":"2025-06-19T10:08:09.943+00:00"},"s":"I",  "c":"FTDC",     "id":20625,   "ctx":"initandlisten","msg":"Initializing full-time diagnostic data capture","attr":{"dataDirectory":"/data/db/diagnostic.data"}}
{"t":{"$date":"2025-06-19T10:08:09.953+00:00"},"s":"I",  "c":"REPL",     "id":6015317, "ctx":"initandlisten","msg":"Setting new configuration state","attr":{"newState":"ConfigReplicationDisabled","oldState":"ConfigPreStart"}}
{"t":{"$date":"2025-06-19T10:08:09.953+00:00"},"s":"I",  "c":"STORAGE",  "id":22262,   "ctx":"initandlisten","msg":"Timestamp monitor starting"}
{"t":{"$date":"2025-06-19T10:08:09.963+00:00"},"s":"I",  "c":"NETWORK",  "id":23015,   "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-27017.sock"}}
{"t":{"$date":"2025-06-19T10:08:09.963+00:00"},"s":"I",  "c":"NETWORK",  "id":23015,   "ctx":"listener","msg":"Listening on","attr":{"address":"0.0.0.0"}}
{"t":{"$date":"2025-06-19T10:08:09.963+00:00"},"s":"I",  "c":"NETWORK",  "id":23016,   "ctx":"listener","msg":"Waiting for connections","attr":{"port":27017,"ssl":"off"}}
{"t":{"$date":"2025-06-19T10:08:51.912+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:50378","uuid":"b397efaf-7148-4f9e-a876-6eafbcea6a29","connectionId":1,"connectionCount":1}}
{"t":{"$date":"2025-06-19T10:08:51.916+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn1","msg":"client metadata","attr":{"remote":"127.0.0.1:50378","client":"conn1","doc":{"application":{"name":"mongosh 1.8.2"},"driver":{"name":"nodejs|mongosh","version":"5.3.0|1.8.2"},"platform":"Node.js v16.19.1, LE","os":{"name":"linux","architecture":"x64","version":"4.18.0-553.54.1.el8_10.x86_64","type":"Linux"}}}}
{"t":{"$date":"2025-06-19T10:08:51.928+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:50384","uuid":"e4956b9e-8f0c-40b9-a877-d73bbe56d3eb","connectionId":2,"connectionCount":2}}
{"t":{"$date":"2025-06-19T10:08:51.928+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:50394","uuid":"6efec8af-d3a2-4c91-8b6c-1dbaa300ff7c","connectionId":3,"connectionCount":3}}
{"t":{"$date":"2025-06-19T10:08:51.929+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn2","msg":"client metadata","attr":{"remote":"127.0.0.1:50384","client":"conn2","doc":{"application":{"name":"mongosh 1.8.2"},"driver":{"name":"nodejs|mongosh","version":"5.3.0|1.8.2"},"platform":"Node.js v16.19.1, LE","os":{"name":"linux","architecture":"x64","version":"4.18.0-553.54.1.el8_10.x86_64","type":"Linux"}}}}
{"t":{"$date":"2025-06-19T10:08:51.930+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn3","msg":"client metadata","attr":{"remote":"127.0.0.1:50394","client":"conn3","doc":{"application":{"name":"mongosh 1.8.2"},"driver":{"name":"nodejs|mongosh","version":"5.3.0|1.8.2"},"platform":"Node.js v16.19.1, LE","os":{"name":"linux","architecture":"x64","version":"4.18.0-553.54.1.el8_10.x86_64","type":"Linux"}}}}
{"t":{"$date":"2025-06-19T10:08:51.932+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:50400","uuid":"7d2027c6-7347-41af-b191-b89dc1ee2112","connectionId":4,"connectionCount":4}}
{"t":{"$date":"2025-06-19T10:08:51.935+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn4","msg":"client metadata","attr":{"remote":"127.0.0.1:50400","client":"conn4","doc":{"application":{"name":"mongosh 1.8.2"},"driver":{"name":"nodejs|mongosh","version":"5.3.0|1.8.2"},"platform":"Node.js v16.19.1, LE","os":{"name":"linux","architecture":"x64","version":"4.18.0-553.54.1.el8_10.x86_64","type":"Linux"}}}}
{"t":{"$date":"2025-06-19T10:08:52.175+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn2","msg":"Connection ended","attr":{"remote":"127.0.0.1:50384","uuid":"e4956b9e-8f0c-40b9-a877-d73bbe56d3eb","connectionId":2,"connectionCount":3}}
{"t":{"$date":"2025-06-19T10:08:52.175+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn1","msg":"Connection ended","attr":{"remote":"127.0.0.1:50378","uuid":"b397efaf-7148-4f9e-a876-6eafbcea6a29","connectionId":1,"connectionCount":0}}
{"t":{"$date":"2025-06-19T10:08:52.175+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn3","msg":"Connection ended","attr":{"remote":"127.0.0.1:50394","uuid":"6efec8af-d3a2-4c91-8b6c-1dbaa300ff7c","connectionId":3,"connectionCount":1}}
{"t":{"$date":"2025-06-19T10:08:52.175+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn4","msg":"Connection ended","attr":{"remote":"127.0.0.1:50400","uuid":"7d2027c6-7347-41af-b191-b89dc1ee2112","connectionId":4,"connectionCount":2}}
{"t":{"$date":"2025-06-19T10:09:31.919+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:56232","uuid":"7ea86a3c-0f3a-4391-b292-db8a5347bc53","connectionId":5,"connectionCount":1}}
{"t":{"$date":"2025-06-19T10:09:31.923+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn5","msg":"client metadata","attr":{"remote":"127.0.0.1:56232","client":"conn5","doc":{"application":{"name":"mongosh 1.8.2"},"driver":{"name":"nodejs|mongosh","version":"5.3.0|1.8.2"},"platform":"Node.js v16.19.1, LE","os":{"name":"linux","architecture":"x64","version":"4.18.0-553.54.1.el8_10.x86_64","type":"Linux"}}}}
{"t":{"$date":"2025-06-19T10:09:31.935+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:56234","uuid":"2475fa7a-81e6-4e8b-bc0c-e12ff9a229f6","connectionId":6,"connectionCount":2}}
{"t":{"$date":"2025-06-19T10:09:31.935+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:56248","uuid":"d0c3879a-c642-4961-a431-46db1dea1033","connectionId":7,"connectionCount":3}}
{"t":{"$date":"2025-06-19T10:09:31.936+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn6","msg":"client metadata","attr":{"remote":"127.0.0.1:56234","client":"conn6","doc":{"application":{"name":"mongosh 1.8.2"},"driver":{"name":"nodejs|mongosh","version":"5.3.0|1.8.2"},"platform":"Node.js v16.19.1, LE","os":{"name":"linux","architecture":"x64","version":"4.18.0-553.54.1.el8_10.x86_64","type":"Linux"}}}}
{"t":{"$date":"2025-06-19T10:09:31.937+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn7","msg":"client metadata","attr":{"remote":"127.0.0.1:56248","client":"conn7","doc":{"application":{"name":"mongosh 1.8.2"},"driver":{"name":"nodejs|mongosh","version":"5.3.0|1.8.2"},"platform":"Node.js v16.19.1, LE","os":{"name":"linux","architecture":"x64","version":"4.18.0-553.54.1.el8_10.x86_64","type":"Linux"}}}}
{"t":{"$date":"2025-06-19T10:09:31.938+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:56260","uuid":"466412a8-b266-4056-a6e6-70695b23d7e3","connectionId":8,"connectionCount":4}}
{"t":{"$date":"2025-06-19T10:09:31.942+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn8","msg":"client metadata","attr":{"remote":"127.0.0.1:56260","client":"conn8","doc":{"application":{"name":"mongosh 1.8.2"},"driver":{"name":"nodejs|mongosh","version":"5.3.0|1.8.2"},"platform":"Node.js v16.19.1, LE","os":{"name":"linux","architecture":"x64","version":"4.18.0-553.54.1.el8_10.x86_64","type":"Linux"}}}}
{"t":{"$date":"2025-06-19T10:09:32.181+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn8","msg":"Connection ended","attr":{"remote":"127.0.0.1:56260","uuid":"466412a8-b266-4056-a6e6-70695b23d7e3","connectionId":8,"connectionCount":3}}
{"t":{"$date":"2025-06-19T10:09:32.181+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn7","msg":"Connection ended","attr":{"remote":"127.0.0.1:56248","uuid":"d0c3879a-c642-4961-a431-46db1dea1033","connectionId":7,"connectionCount":2}}
{"t":{"$date":"2025-06-19T10:09:32.181+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn5","msg":"Connection ended","attr":{"remote":"127.0.0.1:56232","uuid":"7ea86a3c-0f3a-4391-b292-db8a5347bc53","connectionId":5,"connectionCount":0}}
{"t":{"$date":"2025-06-19T10:09:32.181+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn6","msg":"Connection ended","attr":{"remote":"127.0.0.1:56234","uuid":"2475fa7a-81e6-4e8b-bc0c-e12ff9a229f6","connectionId":6,"connectionCount":1}}

Sorry you're still having issues! The log to check in this case is the migrations init container:

kubectl logs -f deploy/browsertrix-cloud-backend -c migrations

This should hopefully have some clues on what’s failing…
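
If the deployment matches more than one pod and you want a specific one, the per-pod form is (pod name is a placeholder for one of yours):

kubectl logs -f <backend-pod-name> -c migrations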

Thanks Ilya,

That helps narrow it down; those logs show:

Found 2 pods, using pod/browsertrix-cloud-backend-6f6b7c7dd5-zvq8l
/usr/local/lib/python3.12/site-packages/passlib/pwd.py:16: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Waiting DB

The warning about passlib is a separate issue. The main thing is that the backend is stuck in a retry loop, waiting for the database to be ready.

MongoDB is up and running, so I think it must be a connection problem between the database and the backend.

To narrow down the problem, I tried redeploying with MicroK8s rather than k3s, on an entirely clean config with Browsertrix v1.17.1, and I'm still hitting the same problem: the backend hangs waiting for MongoDB.

NAME                                          READY   STATUS     RESTARTS      AGE
browsertrix-cloud-backend-6b77df945b-n2s9f    0/2     Init:0/1   0             17m
browsertrix-cloud-backend-bff855958-q4tnw     0/2     Init:0/1   0             17m
browsertrix-cloud-frontend-679956968b-t8rn7   1/1     Running    2 (16m ago)   17m
btrix-metacontroller-helm-0                   1/1     Running    0             17m
local-minio-7cf5b78b4b-lxhsr                  1/1     Running    0             17m
local-mongo-0                                 1/1     Running    0             17m

If I run nslookup through a busybox image, I do get an IP address for local-mongo.
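
For reference, the busybox pod was just a throwaway interactive shell, launched roughly like this (pod name and image tag are arbitrary):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- sh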

/ # nslookup local-mongo
Server:    10.152.183.10
Address 1: 10.152.183.10 kube-dns.kube-system.svc.cluster.local

Name:      local-mongo
Address 1: 10.1.64.130 local-mongo-0.local-mongo.default.svc.cluster.local

And if I log into one of the backend containers, I can see they’re getting the right environment values to connect to the database.
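
Since the main containers haven't started yet, that means exec'ing into the running migrations init container, something like:

kubectl exec -it browsertrix-cloud-backend-bff855958-q4tnw -c migrations -- sh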

echo $MONGO_INITDB_ROOT_PASSWORD
PASSWORD!
echo $MONGO_INITDB_ROOT_USERNAME
root
echo $MONGO_HOST
local-mongo

Is there any way to get more logging information out of the Browsertrix backend?

Hm, it looks like it doesn't get to the 'Retrying…' message, so perhaps it just hangs there on the initial ping?

One quick thing to try is whether specifying the FQDN works better, e.g. mongo_host: local-mongo.default.svc.cluster.local in your helm overrides, or --set mongo_host=local-mongo.default.svc.cluster.local on the command line, in case that makes a difference?
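
For example, with the release name and chart reference as placeholders for whatever you originally deployed with:

# release name (btrix), chart reference and overrides file are placeholders
helm upgrade btrix <chart-reference> -f ./local-overrides.yaml \
  --set mongo_host=local-mongo.default.svc.cluster.local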

I don't think we've seen this issue before. Are there any other settings that are changed or overridden, or is it a stock deployment otherwise?

I was puzzled, because the pod itself was responding to pings and that internal domain name was resolving correctly. But I also created a mongosh pod, and when I tried to connect to MongoDB on port 27017, the connection was dropped. This was a stock deployment with no changed or overridden settings.
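
For reference, the mongosh check was along these lines (pod name, image tag, and credentials are placeholders for my actual values):

kubectl run -it --rm mongo-test --image=mongo:6.0 --restart=Never -- \
  mongosh "mongodb://root:<password>@local-mongo.default.svc.cluster.local:27017/?authSource=admin" --eval 'db.adminCommand({ ping: 1 })'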

So, in the end I found firewall rules on RHEL were blocking traffic on port 27017. Of course! The problem was firewalld all along.

To anyone else encountering this on Red Hat and related systems, there's a helpful guide in the MicroK8s documentation showing how to allow the pod subnet through the host firewall so it doesn't block traffic between pods.
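
As a rough sketch of the kind of firewalld change involved (assuming the default MicroK8s pod and service subnets, which match the addresses in my nslookup output above; see the guide for the full details):

sudo firewall-cmd --permanent --zone=trusted --add-source=10.1.0.0/16      # MicroK8s pod subnet
sudo firewall-cmd --permanent --zone=trusted --add-source=10.152.183.0/24  # MicroK8s service subnet
sudo firewall-cmd --reload
# (on k3s the defaults are 10.42.0.0/16 for pods and 10.43.0.0/16 for services)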

Glad you found the cause! I suppose we could add a note to our docs for users running on RHEL to look into this. It happens with both MicroK8s and k3s, right?

Thanks! I’m looking forward to finally trying out Browsertrix with colleagues at the Bodleian :slight_smile:

This does happen with both MicroK8s and k3s. I've opened a PR to add a warning box to the documentation.