Replay Instagram account

Hello 🙂

I’ve captured an Instagram account with browsertrix-crawler using a browser profile.

The WACZ is 1 GB in size.

I’ve added the WACZ to replayweb.page. When I click on a post to open it, I get the login pop-up instead of the post. But if I use pywb, everything works fine: all the data is there and I can view the posts without getting the login pop-up.

Is this a known issue? Is there something wrong with my files?

THX
mona


Thanks for raising this, @mona. I can confirm that I see this issue as well when using a logged-in Instagram user profile with the latest browsertrix-crawler and replayweb.page.

Hi, would you be able to share the WACZ file in question? I haven’t been able to repro the issue where the replay is logged in with pywb but not logged in with replayweb.page.

Does the pywb collection have any other data in it? If so, perhaps you could try a new collection, just in case data is being loaded from a previous capture.

Actually, I was mistaken; I guess my Instagram profile needed to be refreshed. I crawled the Instagram page with the latest browsertrix-crawler using the following configuration:

collection: ichbinsophiescholl
generateWACZ: true
text: true
behaviors:
  - autoscroll
  - siteSpecific
  - autoplay
  - autofetch
behaviorTimeout: 0
timeout: 36000
profile: /crawls/profiles/instagram-edsuarchivist.tar.gz
screencastPort: 9037
scopeType: page
seeds:
  - url: https://www.instagram.com/ichbinsophiescholl/

And you can see that it seems to play back OK with the latest ReplayWebPage:

https://inkdroid.org/web-archives/archive/?source=https%3A%2F%2Fedsu-webarchives.s3.amazonaws.com%2Fichbinsophiescholl.wacz

If you like, you can download the WACZ from https://edsu-webarchives.s3.amazonaws.com/ichbinsophiescholl.wacz and try it yourself. Please let me know when you would like me to delete it.

Hello Ed,
Hello Ilya,

thank you for your quick replies.
Here is my WACZ file: http://monaulrich.online/web_archives/ichbinsophiescholl_account.wacz

I’ve created it with the following command:
docker run -p 9037:9037 -v $PWD:/crawls/ -it webrecorder/browsertrix-crawler crawl --url [url] --limit 1 --generateWACZ --text --collection ichbinsophiescholl _20221005 --behaviors autoscroll,siteSpecific --profile /crawls/profiles/profile_insta.tar.gz --screencastPort 9037 --timeout 1000000 --behaviorTimeout 0 --scopeType page --saveState

Here are some Screenshots:



@Ed: The WACZ you’ve created also works in my environment. Thank you very much.
I captured the page again with the config you posted, and the replay with replayweb.page works fine too.

So, it seems like the error is in my WACZ file. I tried to reproduce the error with the command above, but I ran into a problem during crawling. Once I have reproduced it, I will post it.
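
In case it helps anyone comparing a working WACZ with a broken one: a WACZ is a ZIP file, so its contents can be listed without any special tooling. Below is a minimal sketch in Python, not an official Webrecorder tool. The function name `summarize_wacz` is made up for illustration, and it assumes the usual WACZ layout (`datapackage.json`, `archive/`, `indexes/`, `pages/`). The `software` and `created` fields are read only if present.

```python
import json
import zipfile


def summarize_wacz(path):
    """Return a small summary of a WACZ file (hypothetical helper).

    Assumes the usual WACZ layout: datapackage.json at the top level,
    WARC files under archive/, indexes under indexes/, page lists under pages/.
    """
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        summary = {
            "entries": len(names),
            "has_datapackage": "datapackage.json" in names,
            "warcs": sorted(
                n for n in names
                if n.startswith("archive/") and n.endswith((".warc", ".warc.gz"))
            ),
            "indexes": sorted(
                n for n in names
                if n.startswith("indexes/") and not n.endswith("/")
            ),
            "pages": sorted(
                n for n in names
                if n.startswith("pages/") and not n.endswith("/")
            ),
        }
        if summary["has_datapackage"]:
            # Optional metadata; .get() returns None when a field is missing.
            meta = json.loads(zf.read("datapackage.json"))
            summary["software"] = meta.get("software")
            summary["created"] = meta.get("created")
    return summary


# Example usage (file names are placeholders for the two downloads):
# for name in ("ichbinsophiescholl.wacz", "ichbinsophiescholl_account.wacz"):
#     print(name, summarize_wacz(name))
```

Running this on both files shows whether one of them is missing WARCs, indexes, or page lists. It does not explain differences in replay behaviour on its own.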

THX
Mona
