Has anyone here scraped a subreddit before? I’m having considerable difficulty crafting a crawl config that scrapes a single subreddit (its main pages, its comments, and its posted links) without inadvertently scraping other subreddits.
My crawl config currently looks something like this:
combineWARC: true
seeds:
  - url: https://old.reddit.com/r/BrownU/
    scopeType: "prefix"
    extraHops: 1
    include:
      - https://old.reddit.com/r/BrownU/*
    exclude:
      - https://old.reddit.com/$
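This appears to be a Browsertrix-crawler config (the `combineWARC`, `scopeType`, and `extraHops` keys suggest so), where `include` and `exclude` entries are regular expressions rather than globs. The minimal Python sketch below shows what the patterns above match. It assumes each pattern is tested as an unanchored regex search, as with JavaScript's `RegExp.test`. Under that assumption, `/*` means "zero or more slashes", so the include pattern also matches a different subreddit whose name merely starts with `BrownU`. The helper `in_scope`, the `STRICT_INCLUDE` pattern, and the example URLs (including `r/BrownUniversity` and the post ID `abc123`) are hypothetical and only for illustration.

```python
import re

# Patterns copied verbatim from the config above.
# Assumption for this sketch: each entry is a regular expression tested with an
# unanchored search (like JavaScript's RegExp.test), not a shell-style glob.
INCLUDE = [r"https://old.reddit.com/r/BrownU/*"]
EXCLUDE = [r"https://old.reddit.com/$"]

# Hypothetical stricter alternative: anchored with ^, dots escaped, and a
# required trailing slash so r/BrownU does not also match r/BrownUniversity.
STRICT_INCLUDE = [r"^https://old\.reddit\.com/r/BrownU/"]


def in_scope(url, include, exclude):
    """Hypothetical helper: True if url matches some include and no exclude."""
    if any(re.search(p, url) for p in exclude):
        return False
    return any(re.search(p, url) for p in include)


if __name__ == "__main__":
    # Hypothetical example URLs, not taken from a real crawl.
    urls = [
        "https://old.reddit.com/r/BrownU/",
        "https://old.reddit.com/r/BrownU/comments/abc123/some_post/",
        "https://old.reddit.com/r/BrownUniversity/",
        "https://old.reddit.com/",
    ]
    for url in urls:
        print(
            f"{url:60} loose={in_scope(url, INCLUDE, EXCLUDE)} "
            f"strict={in_scope(url, STRICT_INCLUDE, EXCLUDE)}"
        )
```

Under these assumptions, the original include pattern accepts `https://old.reddit.com/r/BrownUniversity/`, while the anchored pattern with a required trailing slash rejects it. Separately, `extraHops: 1` asks the crawler to also visit pages one link beyond the defined scope, so it may be another way pages from other subreddits end up in the crawl.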