Browsertrix-crawler behaviors

Sophie · February 2, 2023, 1:07pm

Hi, we would like to use Browsertrix-crawler to archive TikTok, Instagram and Twitter pages.
Can we do this with a terminal command by adding the flag "--behaviors siteSpecific"? Is a list of seed URLs (the --seedFile flag) necessary to be exhaustive? When I'm logged in to e.g. Twitter in the Chrome browser, Browsertrix does not seem to be logged in when archiving the page. How can I change this? Many thanks for your response!

svoboda · March 8, 2024, 2:28pm

Hi,

The Instagram and Twitter behaviours are enabled by default, so you don't have to pass the siteSpecific option.
I don't know about TikTok, but autoscroll and clicking on content work. The real struggle is login: it doesn't work.
Regarding login, you could use the profile creation feature described on GitHub.
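
For reference, the profile route could look roughly like the sketch below. It assumes the Docker image webrecorder/browsertrix-crawler and the create-login-profile and --profile options as described in the project README; the function names, the PROFILE_DIR folder and the collection name are made up for illustration, and ports or paths may differ between versions. The commands are wrapped in functions so nothing runs until you call them.

```shell
#!/usr/bin/env bash
# Sketch only: image name, ports and paths follow the browsertrix-crawler
# README as I understand it; verify them against the version you run.

# Assumed host folder where the saved browser profile ends up.
PROFILE_DIR="$PWD/crawls/profiles"

# Step 1: start an interactive browser, log in by hand, then save the profile.
# By default the profile is written to /crawls/profiles/profile.tar.gz.
create_profile() {
  login_url="$1"
  docker run -p 6080:6080 -p 9223:9223 \
    -v "$PROFILE_DIR:/crawls/profiles/" \
    -it webrecorder/browsertrix-crawler create-login-profile \
    --url "$login_url"
}

# Step 2: crawl with the saved profile so the crawler's browser is logged in.
# The --behaviors list is spelled out here, although these are the defaults.
crawl_with_profile() {
  seed_url="$1"
  docker run \
    -v "$PROFILE_DIR:/crawls/profiles/" \
    -v "$PWD/crawls:/crawls/" \
    -it webrecorder/browsertrix-crawler crawl \
    --profile /crawls/profiles/profile.tar.gz \
    --url "$seed_url" \
    --behaviors autoscroll,autoplay,autofetch,siteSpecific \
    --generateWACZ --collection logged-in-test
}
```

You would call create_profile "https://twitter.com/" first, log in through the browser it exposes (in recent versions this should be reachable at http://localhost:9223/), save the profile, and then run crawl_with_profile with the page you want to archive. To crawl many pages, --url can be swapped for --seedFile pointing at a text file with one URL per line, placed somewhere inside the mounted /crawls folder.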