In 2022 we had a student spend a few months capturing the Facebook page of the Saskatchewan premier during the pandemic. The files are large, a couple of GB, but when I use the Webrecorder tool or warcio to try to retrieve the posts and comments, it seems like only the first two posts and a single comment were captured.

I've since spent some time exploring the source code behind a Facebook page, so I have some understanding of why this is so difficult, particularly as the web address stays the same while you scroll through dynamically loaded content. Still, when I use the browser extension tool and select audio/video content, I find hundreds of meme videos that must have been embedded in the comments. So this suggests the data is all in the WARC somewhere.

Can anyone point me to a solution that would let me parse the WACZ/WARC files and extract the posts, the comments on each post, and basic metadata like dates? Or is it impossible to reassemble the data given the dynamic nature of Facebook pages?
At the end of the day, I’m wondering if we should keep this data in our archive or write it off as an unsuccessful attempt to capture this essential public square discussion during the pandemic.