While upgrading browsertrix-cloud from the Jan 24 version to current master, I ran into a migration that fails on backend startup:
INFO: Application startup complete.
Current database version before migration: 0002
Migration available to apply: 0001
No migration to apply - skipping
Current database version before migration: 0002
Migration available to apply: 0002
No migration to apply - skipping
Current database version before migration: 0002
Migration available to apply: 0003
Performing migration up
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<update_and_prepare_db() done, defined at /app/./btrixcloud/db.py:50> exception=KeyError('jobType')>
Traceback (most recent call last):
  File "/app/./btrixcloud/db.py", line 70, in update_and_prepare_db
    if await run_db_migrations(mdb):
  File "/app/./btrixcloud/db.py", line 100, in run_db_migrations
    if await migration.run():
  File "/app/./btrixcloud/migrations/__init__.py", line 61, in run
    await self.migrate_up()
  File "/app/btrixcloud/migrations/migration_0003_mutable_crawl_configs.py", line 69, in migrate_up
    "jobType": config_result["jobType"],
KeyError: 'jobType'
It looks like a migration may have been needed at some point before migrations 0001-0003, when jobType was introduced, so that crawl configs created earlier would get the field. Alternatively, there may have been some other data corruption.
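For illustration, the lookup that fails at line 69 of migration 0003 could tolerate older documents along these lines. This is only a sketch: `job_type_or_default` is a hypothetical helper, not code from the migration, and falling back to "seed-crawl" is my assumption.

```python
# Hypothetical helper, not part of btrixcloud: read jobType from a crawl
# config document without raising KeyError when the field is missing.
DEFAULT_JOB_TYPE = "seed-crawl"  # assumed default


def job_type_or_default(config_result, default=DEFAULT_JOB_TYPE):
    """Return the document's jobType, or the default if it is missing or null."""
    job_type = config_result.get("jobType")
    return job_type if job_type is not None else default
```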
In case other people run into this, I solved it with the following Python script. After running it, restart the backend and the migrations should apply.
#!/usr/bin/env python3
#
# Populate jobType in crawler configs.
#
# Run for example as:
#
#   cat fix.py | kubectl exec -i deploy/browsertrix-cloud-backend -- python3
#
import os
import urllib.parse

from pymongo import MongoClient


# from btrixcloud/db.py
def resolve_db_url():
    """get the mongo db url, either from MONGO_DB_URL or
    from separate username, password and host settings"""
    db_url = os.environ.get("MONGO_DB_URL")
    if db_url:
        return db_url

    mongo_user = urllib.parse.quote_plus(os.environ["MONGO_INITDB_ROOT_USERNAME"])
    mongo_pass = urllib.parse.quote_plus(os.environ["MONGO_INITDB_ROOT_PASSWORD"])
    mongo_host = os.environ["MONGO_HOST"]
    return f"mongodb://{mongo_user}:{mongo_pass}@{mongo_host}:27017"


client = MongoClient(resolve_db_url())
db = client["browsertrixcloud"]

# {"jobType": None} matches documents where jobType is missing or null
result = db.crawl_configs.update_many(
    {"jobType": None},
    {"$set": {"jobType": "seed-crawl"}},
)
print(f"matched {result.matched_count}, modified {result.modified_count}")
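Before running the update, it may be worth checking how many crawl configs it would touch. A minimal sketch, assuming the same `db` handle as in the script above; `count_missing_job_type` is a hypothetical helper name, and it only relies on pymongo's `Collection.count_documents`.

```python
def count_missing_job_type(collection):
    """Return how many documents have jobType missing or set to null."""
    # Same filter as the update: in MongoDB, matching a field against
    # null selects documents where the field is absent or null.
    return collection.count_documents({"jobType": None})


# Usage against a live database (not executed here):
#   print(count_missing_job_type(db.crawl_configs))
```

If this reports 0, the KeyError probably has a different cause than missing jobType values.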