Upgrade issue workaround

While upgrading browsertrix-cloud from the Jan 24 version to current master, I ran into an issue where migration 0003 failed with a KeyError:

INFO:     Application startup complete.
Current database version before migration: 0002
Migration available to apply: 0001
No migration to apply - skipping
Current database version before migration: 0002
Migration available to apply: 0002
No migration to apply - skipping
Current database version before migration: 0002
Migration available to apply: 0003
Performing migration up
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<update_and_prepare_db() done, defined at /app/./btrixcloud/db.py:50> exception=KeyError('jobType')>
Traceback (most recent call last):
  File "/app/./btrixcloud/db.py", line 70, in update_and_prepare_db
    if await run_db_migrations(mdb):
  File "/app/./btrixcloud/db.py", line 100, in run_db_migrations
    if await migration.run():
  File "/app/./btrixcloud/migrations/__init__.py", line 61, in run
    await self.migrate_up()
  File "/app/btrixcloud/migrations/migration_0003_mutable_crawl_configs.py", line 69, in migrate_up
    "jobType": config_result["jobType"],
KeyError: 'jobType'

It looks like a migration may have been needed somewhere before migrations 1-3, at the point when jobType was introduced. Alternatively, there might have been some other data corruption.

Anyway, in case other people run into this, I solved it with the following Python script. After I ran it and restarted the backend, the migrations worked.

#!/usr/bin/env python3
#
# Populate jobType in crawler configs.
#
# Run for example as:
#
#   cat fix.py | kubectl exec -i deploy/browsertrix-cloud-backend -- python3
#
import os
import urllib.parse
from pymongo import MongoClient

# from btrixcloud/db.py
def resolve_db_url():
    """get the mongo db url, either from MONGO_DB_URL or
    from separate username, password and host settings"""
    db_url = os.environ.get("MONGO_DB_URL")
    if db_url:
        return db_url

    mongo_user = urllib.parse.quote_plus(os.environ["MONGO_INITDB_ROOT_USERNAME"])
    mongo_pass = urllib.parse.quote_plus(os.environ["MONGO_INITDB_ROOT_PASSWORD"])
    mongo_host = os.environ["MONGO_HOST"]

    return f"mongodb://{mongo_user}:{mongo_pass}@{mongo_host}:27017"

client = MongoClient(resolve_db_url())
db = client['browsertrixcloud']

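# A filter on None (null) matches documents where jobType is null
# as well as documents where the field is missing entirely.
print(db.crawl_configs.update_many(
    { "jobType": None },
    {
        "$set": {
            "jobType": "seed-crawl"
        }
    },
))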
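
Before applying the update, it may be worth checking how many configs it would touch. A minimal sketch, assuming the same crawl_configs collection as in the script above; count_missing_job_type is a hypothetical helper name, while count_documents is the regular pymongo collection method:

```python
def count_missing_job_type(collection):
    """Count documents whose jobType is null or absent.

    `collection` is expected to be a pymongo Collection (or anything
    exposing count_documents); this helper's name is hypothetical.
    """
    # In MongoDB, matching a field against null also matches documents
    # where the field does not exist at all.
    return collection.count_documents({"jobType": None})
```

With the db handle from the script, print(count_missing_job_type(db.crawl_configs)) reports how many documents the update filter would match.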

Ah thanks for reporting this! That field is still optional in the config itself, but should be getting set. We should probably do something similar to what you have to fix this in the migration itself.
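
For illustration, a fallback inside the migration might look roughly like the sketch below. This is not the actual migration code: job_type_or_default and DEFAULT_JOB_TYPE are hypothetical names, and using "seed-crawl" as the default simply mirrors the workaround script above, so whether it is the right value for every config is an assumption.

```python
DEFAULT_JOB_TYPE = "seed-crawl"  # assumption: same value as the workaround script


def job_type_or_default(config_result, default=DEFAULT_JOB_TYPE):
    """Return the config's jobType, falling back to a default when the
    field is missing or null, instead of raising KeyError."""
    job_type = config_result.get("jobType")
    return default if job_type is None else job_type


# Placeholder documents for illustration only.
old_config = {"name": "example"}  # no jobType field at all
new_config = {"name": "example", "jobType": "placeholder-type"}

print(job_type_or_default(old_config))
print(job_type_or_default(new_config))
```

Using .get() with an explicit None check covers both a missing field and a field stored as null, which are the same two cases the workaround's update filter matches.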
