As I’ve ironed out the basic issues in hbg-crime.org’s data collection process, I’ve started thinking about adding some slightly more sophisticated analytics. One feature I had in mind from the beginning is classifying crime reports by neighborhood, so I dug around and found this decent approximation of Harrisburg neighborhood boundaries. Google allows you to export these maps as KML files, so I just extracted the coordinates into two-element vectors (longitude first, then latitude, as KML orders them) directly in source:
(def uptown
  [[-76.899979 40.277908]
   [-76.905258 40.288906]
   [-76.906013 40.290821]
   [-76.906357 40.293571]
   [-76.905853 40.295715]
   [-76.902512 40.303967]
   [-76.900864 40.308659]
   [-76.898247 40.310394]
   [-76.896751 40.310566]
   [-76.893913 40.309837]
   [-76.893478 40.301262]
   [-76.890831 40.292564]
   [-76.887260 40.281490]
   [-76.899979 40.277908]])
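If you would rather not copy the coordinates by hand, something like the following might work. It is only a sketch: `parse-kml-coordinates` is a name I made up for illustration, and it assumes you have already pulled the text of a `<coordinates>` element out of the KML file as a non-empty string of whitespace-separated `lng,lat[,alt]` tuples.

```clojure
(require '[clojure.string :as str])

(defn parse-kml-coordinates
  "Hypothetical helper: turn the text of a KML <coordinates> element
  into a vector of [lng lat] vectors, dropping any altitude value."
  [s]
  (->> (str/split (str/trim s) #"\s+")            ; one tuple per token
       (mapv (fn [tuple]
               (let [[lng lat] (str/split tuple #",")]
                 [(Double/parseDouble lng)
                  (Double/parseDouble lat)])))))

(parse-kml-coordinates "-76.899979,40.277908,0 -76.905258,40.288906,0")
;; => [[-76.899979 40.277908] [-76.905258 40.288906]]
```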
The next step is finding an algorithm to determine whether a given point is inside a given polygon, and as all of the neighborhoods are simple (non-self-intersecting) polygons, I used the “Crossing Number” algorithm detailed here. The algorithm asks you to count the number of times a ray from the point crosses the polygon’s boundary; if that number is odd, the point is inside. So the base of the algorithm looks like this:
(defn inside?
  "Is point inside the given polygon?"
  [point polygon]
  (odd? (reduce + (for [n (range (- (count polygon) 1))]
                    (crossing-number point [(nth polygon n)
                                            (nth polygon (+ n 1))])))))
This is just a simple sum across each side of the polygon, where each side is just a pair of adjacent points (note that in the definition of the polygon above, the first point and the last point are the same, closing the shape).
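To make the "pair of adjacent points" idea concrete, here is a small sketch of how the sides fall out of a closed list of vertices. The `edges` helper is hypothetical (it is not part of the code above), and it uses `partition` with a step of 1 as one possible alternative to indexing with `nth`.

```clojure
(defn edges
  "Hypothetical helper: the sides of a closed polygon as pairs of
  adjacent vertices. Assumes the last vertex repeats the first."
  [polygon]
  (partition 2 1 polygon))

;; A triangle, closed by repeating its first vertex:
(edges [[0 0] [4 0] [4 4] [0 0]])
;; => (([0 0] [4 0]) ([4 0] [4 4]) ([4 4] [0 0]))
```

Because the shape is already closed, n + 1 vertices give exactly n sides, which is why the loop in `inside?` stops one short of the vertex count.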
Then it’s just a matter of implementing the crossing number test. I won’t repeat the logic here, since it’s explained better on the geomalgorithms.com site than I could manage, but here’s my Clojure version of it:
(defn- crossing-number
  "Determine crossing number for given point and segment of a polygon.
  See http://geomalgorithms.com/a03-_inclusion.html"
  [[px py] [[x1 y1] [x2 y2]]]
  (if (or (and (<= y1 py) (> y2 py))
          (and (> y1 py) (<= y2 py)))
    (let [vt (/ (- py y1) (- y2 y1))]
      (if (< px (+ x1 (* vt (- x2 x1))))
        1 0))
    0))
Destructuring the two vector arguments to this function makes it read really nicely, I think.
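For a sense of how this might plug into the neighborhood classification, here is a rough sketch. It repeats compact versions of the two functions so it runs on its own (with `partition` standing in for the `nth` indexing), and everything else is an assumption for illustration: the `neighborhoods` map, the `neighborhood-of` function, the two test points, and the idea that a report's location arrives as a `[lng lat]` vector.

```clojure
(defn- crossing-number
  "1 if a rightward ray from the point crosses the segment, else 0."
  [[px py] [[x1 y1] [x2 y2]]]
  (if (or (and (<= y1 py) (> y2 py))      ; upward crossing
          (and (> y1 py) (<= y2 py)))     ; downward crossing
    (let [vt (/ (- py y1) (- y2 y1))]
      (if (< px (+ x1 (* vt (- x2 x1)))) 1 0))
    0))

(defn inside?
  "Is point inside the given closed polygon?"
  [point polygon]
  (odd? (reduce + (map #(crossing-number point %)
                       (partition 2 1 polygon)))))

(def uptown
  [[-76.899979 40.277908] [-76.905258 40.288906]
   [-76.906013 40.290821] [-76.906357 40.293571]
   [-76.905853 40.295715] [-76.902512 40.303967]
   [-76.900864 40.308659] [-76.898247 40.310394]
   [-76.896751 40.310566] [-76.893913 40.309837]
   [-76.893478 40.301262] [-76.890831 40.292564]
   [-76.887260 40.281490] [-76.899979 40.277908]])

;; Hypothetical lookup table; only one neighborhood is filled in here.
(def neighborhoods {:uptown uptown})

(defn neighborhood-of
  "Hypothetical: key of the first neighborhood containing point, or nil."
  [point]
  (some (fn [[nbhd polygon]]
          (when (inside? point polygon) nbhd))
        neighborhoods))

(neighborhood-of [-76.8990 40.2950]) ;; => :uptown
(neighborhood-of [-76.8500 40.2950]) ;; => nil
```

Returning nil for a point outside every polygon seems like a reasonable default, since the boundaries are only an approximation and some reports will fall between them.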