To understand the code and the decisions we made in creating True Business Data, please refer to the TBD handbook:
The TBD Handbook: Spectacles at the ready!
All the code and instructions necessary to run True Business Data from the Common Crawl are available on GitHub:
The TBD Git Repo: git --enjoy!
We ran our data for Berkeley across 8 sets of the Common Crawl. The example output data can be downloaded below, along with an example of this data plotted on a map of Berkeley:
Download data