As I've ironed out the basic issues in hbg-crime.org's data collection process, I've been starting to think about adding some slightly more sophisticated analytics. One feature I had in mind from the beginning is classifying crime reports based on their neighborhood, so I dug around and found this decent approximation of Harrisburg neighborhood boundaries. Google allows you to export these maps as KML files, so I just extracted the coordinates into two-dimensional vectors directly in source:
(def uptown [ [-76.899979,40.277908] [-76.905258,40.288906] [-76.906013,40.290821] [-76.906357,40.293571] [-76.905853,40.295715] [-76.902512,40.303967] [-76.900864,40.308659] [-76.898247,40.310394] [-76.896751,40.310566] [-76.893913,40.309837] [-76.893478,40.301262] [-76.890831,40.292564] [-76.887260,40.281490] [-76.899979,40.277908] ])
The next step is finding an algorithm to determine whether a given point is inside a given polygon, and as all of the neighborhoods are simple (non-self-intersecting) polygons, I used the "Crossing Number" algorithm detailed here. The algorithm asks you to count the number of times a ray from the point crosses the polygon's boundaries, and if that number is odd, it's inside. So the base of the algorithm looks like this:
(defn inside? "Is point inside the given polygon?" [point polygon] (odd? (reduce + (for [n (range (- (count polygon) 1))] (crossing-number point [(nth polygon n) (nth polygon (+ n 1))])))))
This is just a simple sum across each side of the polygon, where each side is just a set of two adjacent points (note that in the definition of the polygon above, the first point and the last point are the same, closing the shape).
Then it's just a matter of implementing the crossing number test. I won't repeat the logic here, it's better explained on the geomalgorithms.com site than I would be able to do, but here's my Clojure version of it:
(defn- crossing-number "Determine crossing number for given point and segment of a polygon. See http://geomalgorithms.com/a03-_inclusion.html" [[px py] [[x1 y1] [x2 y2]]] (if (or (and (<= y1 py) (> y2 py)) (and (> y1 py) (<= y2 py))) (let [vt (/ (- py y1) (- y2 y1))] (if (< px (+ x1 (* vt (- x2 x1)))) 1 0)) 0))
Destructuring the two vector arguments to this function makes it read really nicely, I think.