To understand the code and the decisions we made in creating True Business Data, please refer to the TBD handbook:

All the code and instructions necessary to run True Business Data on the Common Crawl are available on GitHub:

We ran True Business Data for Berkeley across 8 sets of the Common Crawl. The example output data is available below, along with an example of this data plotted on a map of Berkeley:

