I work at a small institutional repository that recently deployed a self-hosted instance of Browsertrix on our server, following the steps in the documentation for MicroK8s and Helm 3.
All of the core functionality is working as intended, but I’m having trouble figuring out exactly how to set up and access our instance’s API. Aside from the information at /api/docs, there isn’t much in the documentation about API setup or access.
The API use case is to populate metadata fields in our processing tracker spreadsheet, e.g. crawl ID, crawl status, QA status, pages crawled, crawl size, replay link, etc.
Is this something that is possible? Does anyone have information or guidance on how to do this? Any help is very much appreciated.
Thanks for writing, and for pointing out this gap in our documentation.
The backend API is available at /api. You may find the ReDoc documentation a little easier to read at /api/redoc.
Requests to the API require a bearer token, which is returned in the access_token field of the response from the /api/auth/jwt/login endpoint. At this point, the best reference for example API usage is the backend test code, e.g.: browsertrix/backend/test/conftest.py at main · webrecorder/browsertrix · GitHub.
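If you’d rather script this than use cURL, here is a minimal sketch of the login step using only the Python standard library. The base URL and credentials below are placeholders, not real values; the endpoint and the access_token field are as described above.

```python
import json
import urllib.parse
import urllib.request


def build_login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build the POST to /api/auth/jwt/login; credentials go as form data."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/api/auth/jwt/login",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def token_from_response(response_body: bytes) -> str:
    """Pull the bearer token out of the login response JSON."""
    return json.loads(response_body)["access_token"]


# To actually log in (not executed here; substitute your own instance and account):
# with urllib.request.urlopen(
#     build_login_request("https://btrix.example.edu", "you@example.edu", "secret")
# ) as resp:
#     token = token_from_response(resp.read())
```

Every subsequent API request then sends the token in an `Authorization: Bearer <token>` header.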
Using cURL, an example request to get a paginated list of the crawls in a Browsertrix organization would look like:
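A sketch, with hypothetical placeholders for the instance URL, account credentials, and org UUID, and using jq to extract the token; check the schema at /api/docs for the exact query parameters your version supports:

```shell
# Placeholders -- substitute your instance URL, account credentials, and org UUID.
BTRIX_URL="https://btrix.example.edu"
ORG_ID="<your-org-uuid>"

# 1. Log in; the bearer token is in the access_token field of the JSON response.
TOKEN=$(curl -s -X POST "$BTRIX_URL/api/auth/jwt/login" \
  --data-urlencode "username=you@example.edu" \
  --data-urlencode "password=<your-password>" | jq -r .access_token)

# 2. Fetch the first page of the org's crawls (25 per page).
curl -s -H "Authorization: Bearer $TOKEN" \
  "$BTRIX_URL/api/orgs/$ORG_ID/crawls?page=1&pageSize=25"
```

The crawl objects in the response should carry the kind of metadata you mentioned (crawl ID, state, size, and so on); the full response schema is listed in the /api/docs.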