This question is about a modification I'm planning for Common Crawl's pywb instance, and I'm looking for advice.
We make each time-based crawl a separate collection in pywb, and by now we have nearly 100 of them. To make it easier for CDX clients to fetch by date, I'd like to add a start and end timestamp to each collection's entry in collinfo.json. Adding fields should not break any CDX client software, since clients normally ignore keys they don't recognize, but who knows.
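To make the idea concrete, here is a rough sketch of how a client might use such fields. Everything specific in it is an assumption for illustration only: the field names `from` and `to`, the 14-digit timestamp format, the collection ids, the example.org URLs, and the helper `collections_for_range` are all hypothetical, and the dates are made up.

```python
import json

# Illustrative collinfo.json content. The "from"/"to" keys are HYPOTHETICAL
# field names for the proposed start/end timestamps; ids, URLs and dates
# are invented for this example.
SAMPLE_COLLINFO = """
[
  {"id": "EXAMPLE-2024-B", "name": "Example crawl B",
   "cdx-api": "https://example.org/EXAMPLE-2024-B-index",
   "from": "20240410000000", "to": "20240425000000"},
  {"id": "EXAMPLE-2024-A", "name": "Example crawl A",
   "cdx-api": "https://example.org/EXAMPLE-2024-A-index",
   "from": "20240220000000", "to": "20240306000000"}
]
"""


def collections_for_range(collinfo, start, end):
    """Return ids of collections whose time span overlaps [start, end].

    Timestamps are assumed to be 14-digit strings (YYYYMMDDhhmmss), so
    plain string comparison orders them chronologically.
    """
    hits = []
    for coll in collinfo:
        c_from = coll.get("from")
        c_to = coll.get("to")
        if c_from is None or c_to is None:
            # Entry without the new fields: it cannot be ruled out,
            # so keep it and let the caller query it as before.
            hits.append(coll["id"])
        elif c_from <= end and start <= c_to:
            # Standard interval-overlap test.
            hits.append(coll["id"])
    return hits


collinfo = json.loads(SAMPLE_COLLINFO)
matching = collections_for_range(collinfo, "20240301000000", "20240305000000")
```

With fields like these, a client could skip the collections that cannot contain captures in the requested window instead of querying all of them. Entries without the fields fall through to the old behavior, which is why I expect the change to be backward compatible.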
So:
- Is changing collinfo.json a good idea?
- Should I make a new .json file instead? (I already have a graphinfo.json for our web graph.)
- Are there other pywb installs that might want this feature, or that have already done something similar?
I've also asked on the IIPC Slack. Thanks in advance for your replies! – greg