I archived a Facebook post with ArchiveWeb.page in Chrome. When I replay the post, the screen greys out. After I click anywhere on the screen, the grey overlay disappears and I can see the archived post clearly. Any idea how to fix this issue?
Here is the link to the wacz file
Not sure why this is happening, possibly a minor replay issue. Facebook is always a difficult one.
More importantly, however, I’ve removed the link to the archive because it seemed to be captured with your personal account as a logged-in Facebook user. These archives are generally not advisable to share publicly, as doing so can expose user session credentials, depending on the site and what it stores in cookies that may be present in the archive. Going forward, we always recommend dedicated throwaway archiving accounts to minimize the risk of session hijacking!
Thanks, Hank, for removing the link. I’ve also attempted to archive the same pages using Browsertrix Cloud on my desktop without a Facebook profile. However, a popup box appears, requesting users to log in or sign up. Is there a way to archive Facebook pages without logging in and eliminate this login popup? Also, is there a way to remove the login information in the wacz file?
Is there a way to archive Facebook pages without logging in and eliminate this login popup?
Not that we’re aware of. Most of the time when a company sets up a login-gate the only way around it is an account! This is what browser profiles are intended to solve within Browsertrix.
Also, is there a way to remove the login information in the wacz file?
Not an easy one. You could unzip the WACZ and manually search the component WARC files for session tokens to remove, but I don’t really have instructions for finding that data. Modifying the component WARCs will also invalidate the archive’s cryptographic verification, since the contents will have been tampered with, which may or may not be a concern depending on your use case.
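As a starting point for locating that data, here is a minimal, read-only sketch in Python that opens a WACZ as a ZIP file and lists which WARCs contain cookie-style HTTP headers. It assumes the WARCs sit under the `archive/` directory of the WACZ, and it assumes that `Cookie`, `Set-Cookie`, and `Authorization` headers are the places worth checking first. The function name `find_session_headers` is made up for illustration. It will not find tokens embedded in page bodies or URLs, and it does not modify anything.

```python
import gzip
import re
import zipfile

# Assumption: session credentials most often appear in these HTTP headers.
SESSION_HEADER = re.compile(rb"^(set-cookie|cookie|authorization):", re.IGNORECASE)


def find_session_headers(wacz_path):
    """Return (warc_name, header_name) pairs for each cookie-like header line found."""
    hits = []
    with zipfile.ZipFile(wacz_path) as wacz:
        for name in wacz.namelist():
            # Assumption: WARC files are stored under archive/ inside the WACZ.
            if not name.startswith("archive/"):
                continue
            if not name.endswith((".warc", ".warc.gz")):
                continue
            with wacz.open(name) as raw:
                # Compressed WARCs are gzip streams; plain ones are read directly.
                stream = gzip.open(raw) if name.endswith(".gz") else raw
                for line in stream:
                    match = SESSION_HEADER.match(line)
                    if match:
                        hits.append((name, match.group(1).decode().lower()))
    return hits


if __name__ == "__main__":
    import sys

    for warc_name, header in find_session_headers(sys.argv[1]):
        print(f"{warc_name}: {header}")
```

Run it as `python scan_wacz.py my-archive.wacz` (both file names are placeholders). Any hits only tell you where to start looking. Actually removing the values still means rewriting the WARC records by hand, with the verification caveat above.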
As an aside, I really like Glogg for viewing WARCs! Text editors sometimes struggle with super long text files, but this one doesn’t!
Outside of session tokens, a personal social account also contains other things, such as messages, followers, and posts, that could be revealed in the archive even though they would otherwise be private or at least viewable only by a restricted audience.
Our recommendations are generally as follows:
Log in with a burner account created specifically for archiving!
This pretty much eliminates the risk of accounts you care about getting session hijacked.
It also reduces the risk of accounts you care about getting banned by the site in question for bot activity. We’ve never seen this happen with ArchiveWeb.page, since you are a human browsing the site, but Browsertrix is unquestionably a crawling bot, and it doesn’t really announce itself as such unless instructed to with a custom user agent.