One thing I’ve wanted to visualize from the hbg-crime.org dataset is which times of day see the most and least crime in each part of the city. Julia’s Gadfly plotting package makes that easy.

First, pull down the current dataset:

$ wget http://hbg-crime.org/reports.csv

Then launch Julia and import the libraries we’ll be using:

using DataFrames
using Datetime
using Gadfly

We’ll read the reports into a DataFrame:

data = readtable("reports.csv")

Then we need to convert the time of the report into an hour of the day, from 0 (midnight to 1:00 am) to 23 (11:00 pm to midnight):

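# Timestamp layout used in the "End" column
formatter = "yyyy-MM-ddTHH:mm:ss"
# Parse one timestamp string and return its hour of day (0-23)
function hourofday(d::String)
    Datetime.hour(Datetime.datetime(formatter, d))
end
# Generate a method that applies hourofday to every element of an array
@vectorize_1arg String hourofday
# Add an "Hour" column computed from the "End" column
@transform(data, Hour => hourofday(End))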

This defines a quick function that takes a String timestamp, converts it to a DateTime, and extracts the hour. We then vectorize that function so it accepts a whole column, and apply it to the “End” column of the data to produce a new “Hour” column.
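If you’re on a newer Julia, the same idea can be sketched with the Dates standard library and dot-broadcasting instead of the Datetime package and @vectorize_1arg. This is only a rough equivalent, not the code used for the chart: the sample timestamps are made up for illustration, and the format string assumes the same layout as the formatter above. Note that Dates writes months as lowercase mm and minutes as uppercase MM.

```julia
using Dates  # standard library in Julia 1.x

# Assumed timestamp layout, mirroring the formatter string in the post.
# In Dates, lowercase `mm` is the month and uppercase `MM` is the minute.
timestamp_format = dateformat"yyyy-mm-ddTHH:MM:SS"

# Parse one timestamp string and return its hour of day (0 through 23).
hourofday(ts::AbstractString) = Dates.hour(DateTime(ts, timestamp_format))

# Hypothetical sample timestamps, not taken from reports.csv.
samples = ["2014-03-01T00:45:00", "2014-03-01T13:05:30", "2014-03-01T23:59:59"]

# The dot syntax broadcasts hourofday over every element of the array.
hours = hourofday.(samples)
```

Here hours holds one integer per timestamp, which is the shape of data the “Hour” column needs.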

The last step in preparing the data is to group those results by Neighborhood and Hour and count the reports in each group:

results = by(data, ["Neighborhood", "Hour"], nrow)
complete_cases!(results)
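To make the grouping step concrete, here’s a minimal sketch of roughly what those two lines compute, written with a plain Dict and no packages. The sample rows and the countbygroup helper are hypothetical, and missing stands in for an unclassified report; it illustrates the counting logic rather than how DataFrames implements by.

```julia
# Hypothetical (neighborhood, hour) pairs; `missing` marks an unclassified report.
reports = [
    ("Downtown", 1),
    ("Downtown", 1),
    ("Midtown", 14),
    (missing, 1),
    ("Downtown", 2),
]

# Count reports per (neighborhood, hour) pair, skipping incomplete rows.
function countbygroup(rows)
    counts = Dict{Tuple{String,Int},Int}()
    for (neighborhood, hour) in rows
        # Similar in spirit to complete_cases!: ignore rows with a missing field.
        (ismissing(neighborhood) || ismissing(hour)) && continue
        key = (neighborhood, hour)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

counts = countbygroup(reports)
```

Each key in counts corresponds to one row of results, and its value corresponds to the count column that gets plotted.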

The complete_cases! function removes every row that contains a missing value, which here means reports that were never classified into a neighborhood; those rows tend to give Gadfly some problems. Speaking of which, all that’s left is to create the plot and draw it to an SVG file:

p = plot(results, y="x1", x="Hour", color="Neighborhood", Guide.XLabel("Hour of Day"), Guide.YLabel("Number of Reports"), Geom.bar(position=:dodge))
draw(SVG("results.svg", 6inch, 6inch), p)

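The color= attribute tells Gadfly to color the bars by the “Neighborhood” column, and Geom.bar(position=:dodge) draws each neighborhood’s bar side by side within an hour instead of stacking them. The y="x1" argument refers to the count column that by produced.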

[Figure: Crime By Hour Chart]

Crime spikes everywhere after dark and decreases during the day, but unsurprisingly Downtown sees a disproportionate spike around 1:00-2:00 am when the bars let out.

Full source is available on GitHub.