As I’ve ironed out the basic issues in hbg-crime.org’s data collection process, I’ve been starting to think about adding some slightly more sophisticated analytics. One feature I had in mind from the beginning is classifying crime reports based on their neighborhood, so I dug around and found this decent approximation of Harrisburg neighborhood boundaries. Google allows you to export these maps as KML files, so I just extracted the coordinates into two-dimensional vectors directly in source:
The next step is finding an algorithm to determine whether a given point is inside a given polygon, and as all of the neighborhoods are simple (non-self-intersecting) polygons, I used the “Crossing Number” algorithm detailed here. The algorithm asks you to count the number of times a ray from the point crosses the polygon’s boundaries, and if that number is odd, it’s inside. So the base of the algorithm looks like this:
This is just a simple sum across each side of the polygon, where each side is just a set of two adjacent points (note that in the definition of the polygon above, the first point and the last point are the same, closing the shape).
Then it’s just a matter of implementing the crossing number test. I won’t repeat the logic here, it’s better explained on the geomalgorithms.com site than I would be able to do, but here’s my Clojure version of it:
Destructuring the two vector arguments to this function makes it read really nicely, I think.