Browser disconnection during crawl in Browsertrix Cloud

Hi all, I’m very new to all this, so I hope this isn’t an obvious question:

I’ve just installed Browsertrix Cloud and started playing with it, using All projects | Hackaday.io as a test bed for learning the ropes. However, I’ve noticed a consistent drop in capture performance after a minute or so of running. I thought this might be rate limiting, so I dropped down to a single crawler (the options seem to be 1, 2, or 3) and added a 10 second delay after behaviors before loading a new page. I’m still getting slow captures and skipped pages. I reviewed the error log, and this is what it looks like. Any idea what might be going on? I can’t imagine I’m getting rate limited at 2 pages every 30 seconds…


Not an obvious question! Doesn’t sound like a rate limiting issue…

I don’t think I’ve ever gotten the browser disconnected error, which leads me to think this may be a problem with your setup. Are you running this locally or on cloud servers?

Hi Hank, thanks for the response.

I’m running this locally on my home computer. It’s pretty capable specs-wise, but I have to admit that I’m completely new to Kubernetes. I don’t fully understand the how or why of assigning resources to individual “machines” or pods, or whatever fancy name Kubernetes likes to use. Is it possible there are too few VMs? Or that they don’t have enough resources available to them?
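Is there a standard way to check that? My best guess would be something like the following, just standard kubectl through MicroK8s (I’m not sure these are even the right things to look at, so treat it as a guess):

microk8s kubectl get pods --all-namespaces
microk8s kubectl describe nodes | grep -A 8 "Allocated resources"
microk8s kubectl top pods --all-namespaces

(I think the last one needs the metrics-server addon enabled.)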

(As a side note, Kubernetes is probably nice for developers and scaling, but for a regular joe, it’s kind of a nightmare to understand and configure. The front end is much, MUCH nicer to use than a command line though, so I’m willing to learn!)

The issue was likely that the browser memory was configured too low, hence the browser crashes. Which version were you deploying?
We’ve made a lot of improvements here over the last couple of weeks, so you should try the latest 1.7.0 beta: Release Browsertrix Cloud 1.7.0 Beta 2 · webrecorder/browsertrix-cloud · GitHub
We should have the 1.7.0 release ready fairly soon as well.
You should be able to upgrade directly by running the helm upgrade command in the release.
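For reference, the upgrade command looks roughly like this (same chart URL as in the Beta 2 release notes; adjust if you’re installing a different version):

helm upgrade --install btrix https://github.com/webrecorder/browsertrix-cloud/releases/download/v1.7.0-beta.2/browsertrix-cloud-v1.7.0-beta.2.tgz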


Hi ilya,

The upgrade wasn’t working, so I uninstalled and reinstalled from scratch using 1.7.0 Beta 2. It runs as before, but I’m still getting lots of errors resulting in skipped pages, mostly the following two:

1 - Page Load Timeout, skipping page (MSG: Navigation failed because browser has disconnected!)
— This is by far the most common error. After a few minutes of running, it seems to happen to almost every page. The description makes me think this is the same error as before, just phrased differently. Is that a correct analysis?

2 - Crawler Instance Crashed (REASON: oom)
— This also seems to be an Out of Memory error.

Is there anything that can be done to modify the memory available, or to see what’s happening here? For reference, I’m running a 12-core (24-thread) Ryzen 9 5900X with 64GB of RAM, using Pop!_OS as the operating system. I’m running 1.7.0 Beta 2 on MicroK8s with the bundled helm3 setup.

It sounds like you should bump the default memory requirements for your deployment.
We are working on making this auto-resize in the future, but for now you can override the default settings from values.yaml by creating a local overrides file.
The values below are the defaults from values.yaml; you can try increasing them, especially the memory.
For example, place these in a my-local-overrides.yaml file:

crawler_extra_cpu_per_browser: 600m
crawler_extra_memory_per_browser: 768Mi

You can then deploy with: helm upgrade --install btrix https://github.com/webrecorder/browsertrix-cloud/releases/download/v1.7.0-beta.2/browsertrix-cloud-v1.7.0-beta.2.tgz -f my-local-overrides.yaml

(Any setting from https://github.com/webrecorder/browsertrix-cloud/blob/main/chart/values.yaml can be overridden this way)
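If you want to see what’s actually happening to the crawler pods (e.g. to confirm they are being OOM-killed), you can inspect them with kubectl. A rough sketch, assuming the default “crawlers” namespace from values.yaml (adjust if your deployment uses a different one):

kubectl get pods -n crawlers
kubectl describe pod <crawl-pod-name> -n crawlers | grep -i -A 3 "last state"
kubectl top pods -n crawlers

A pod that was killed for memory will show Reason: OOMKilled in its last state. (kubectl top needs the metrics-server addon enabled.)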

Thanks for the info, this solved the problem! I created a .yaml file in notepad that had:

crawler_extra_memory_per_browser: 10000Mi

(ridiculous quantity, I’m sure, but I can spare it on my machine) and used the redeployment command you listed. I’ve had no errors since!