Now that I’ve ironed out the basic issues in hbg-crime.org’s data collection process, I’ve started thinking about adding some slightly more sophisticated analytics. One feature I had in mind from the beginning is classifying crime reports by neighborhood, so I dug around and found this decent approximation of Harrisburg neighborhood boundaries. Google allows you to export these maps as KML files, so I just extracted the coordinates into vectors of [longitude latitude] pairs directly in source:

(def uptown
  [
   [-76.899979,40.277908]
   [-76.905258,40.288906]
   [-76.906013,40.290821]
   [-76.906357,40.293571]
   [-76.905853,40.295715]
   [-76.902512,40.303967]
   [-76.900864,40.308659]
   [-76.898247,40.310394]
   [-76.896751,40.310566]
   [-76.893913,40.309837]
   [-76.893478,40.301262]
   [-76.890831,40.292564]
   [-76.887260,40.281490]
   [-76.899979,40.277908]
   ])
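
For anyone who would rather not copy coordinates by hand, here is a minimal sketch of how the text of a KML coordinates element could be turned into the same shape of data. It assumes the usual KML layout of whitespace-separated longitude,latitude[,altitude] tuples; parse-kml-coordinates is a hypothetical helper, not something from the site's code, and it does no error handling for empty or malformed input.

```clojure
(require '[clojure.string :as str])

(defn parse-kml-coordinates
  "Hypothetical helper: turn the text of a KML <coordinates> element
   into a vector of [longitude latitude] pairs, dropping any altitude."
  [s]
  (mapv (fn [tuple]
          ;; Each tuple looks like \"lng,lat\" or \"lng,lat,alt\".
          (let [[lng lat] (str/split tuple #",")]
            [(Double/parseDouble lng) (Double/parseDouble lat)]))
        (str/split (str/trim s) #"\s+")))

(parse-kml-coordinates
 "-76.899979,40.277908,0.0 -76.905258,40.288906,0.0")
;; => [[-76.899979 40.277908] [-76.905258 40.288906]]
```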

The next step is finding an algorithm to determine whether a given point is inside a given polygon. Since all of the neighborhoods are simple (non-self-intersecting) polygons, I used the “Crossing Number” algorithm detailed here. The algorithm counts the number of times a ray extending from the point crosses the polygon’s boundary; if that number is odd, the point is inside. So the base of the algorithm looks like this:

(defn inside?
  "Is point inside the given polygon?"
  [point polygon]
  (odd? (reduce + (for [n (range (dec (count polygon)))]
                    (crossing-number point [(nth polygon n)
                                            (nth polygon (inc n))])))))

This is a simple sum over the sides of the polygon, where each side is a pair of adjacent points (note that in the polygon definition above, the first and last points are the same, closing the shape).
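
To make the "pairs of adjacent points" idea concrete, here is a small sketch using clojure.core's partition with a step of 1, which is an alternative to indexing with nth. The sides function and the triangle are hypothetical, purely for illustration.

```clojure
(defn sides
  "Hypothetical helper: the adjacent-point pairs of a closed polygon,
   i.e. one [start end] pair per side."
  [polygon]
  ;; (partition 2 1 coll) slides a window of size 2 forward one item at a time.
  (partition 2 1 polygon))

;; A closed triangle: the first point is repeated at the end.
(sides [[0 0] [4 0] [4 4] [0 0]])
;; => (([0 0] [4 0]) ([4 0] [4 4]) ([4 4] [0 0]))
```

A closed polygon with n points listed (including the repeated first point) yields n - 1 sides, which matches the range used in the index-based version.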

Then it’s just a matter of implementing the crossing number test. I won’t repeat the logic here, since the geomalgorithms.com site explains it better than I could, but here’s my Clojure version of it:

(defn- crossing-number
  "Determine crossing number for given point and segment of a polygon.
   See http://geomalgorithms.com/a03-_inclusion.html"
  [[px py] [[x1 y1] [x2 y2]]]
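  ;; Only segments that straddle the horizontal line y = py can be crossed:
  ;; an upward segment includes its start but not its end, a downward one
  ;; the reverse, so a shared vertex is never counted twice.
  (if (or (and (<= y1 py) (> y2 py))
          (and (> y1 py) (<= y2 py)))
    ;; vt is how far along the segment the line is hit; count the crossing
    ;; only if that intersection lies strictly to the right of the point.
    (let [vt (/ (- py y1) (- y2 y1))]
      (if (< px (+ x1 (* vt (- x2 x1))))
        1 0))
    0))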

Destructuring the two vector arguments to this function makes it read really nicely, I think.
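
As a rough end-to-end sketch, here is how the pieces might fit together to classify a point. The two square "neighborhoods" and the neighborhood-of function are hypothetical stand-ins for illustration, not real Harrisburg boundaries or the site's actual code. Note that within a single file, crossing-number has to be defined (or declared) before inside?, because Clojure resolves symbols when a form is compiled.

```clojure
(defn- crossing-number
  "1 if a rightward ray from the point crosses the segment, else 0."
  [[px py] [[x1 y1] [x2 y2]]]
  (if (or (and (<= y1 py) (> y2 py))
          (and (> y1 py) (<= y2 py)))
    (let [vt (/ (- py y1) (- y2 y1))]
      (if (< px (+ x1 (* vt (- x2 x1))))
        1 0))
    0))

(defn inside?
  "Is point inside the given closed polygon?"
  [point polygon]
  (odd? (reduce + (for [n (range (dec (count polygon)))]
                    (crossing-number point [(nth polygon n)
                                            (nth polygon (inc n))])))))

;; Hypothetical data: two adjacent squares standing in for neighborhoods.
(def neighborhoods
  {:west [[0.0 0.0] [4.0 0.0] [4.0 4.0] [0.0 4.0] [0.0 0.0]]
   :east [[4.0 0.0] [8.0 0.0] [8.0 4.0] [4.0 4.0] [4.0 0.0]]})

(defn neighborhood-of
  "Hypothetical helper: key of the first polygon containing point, or nil."
  [point]
  (some (fn [[label polygon]]
          (when (inside? point polygon) label))
        neighborhoods))

(neighborhood-of [2.0 2.0]) ;; => :west
(neighborhood-of [6.0 2.0]) ;; => :east
(neighborhood-of [9.0 2.0]) ;; => nil
```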