Increasingly Functional.
by Joshua Miller | on Twitter | on the web | on github

Multivariate Gaussian Distributions in Julia

February 17th 2016

Tagged: julia

I recently worked my way through the latest incarnation of Andrew Ng's Machine Learning Course on Coursera. One particular method stood out to me as having some potential real-life applications for me: anomaly detection with multivariate Gaussian distributions.

The class is conducted in Octave, which I find a little annoying to deal with as a language, so to play around with some data, I wanted to convert the procedure to Julia. Not having found anything prebuilt from quick Googling, I translated it myself, and leave it here for anyone who'd like to refer to it later. The mathematical formulas I'm working from are in the above-referenced Wikipedia article, at "Non-degenerate case". The set of data I'll represent as X.

First we calculate our μ, just the mean of each column in X:

mu = mean(X, 1)

Having found our mean, we can now calculate the covariance matrix:

function covariance(x, mu)
  n = length(x[:,1])
  (x .- mu)' * (x .- mu) ./ n
end

sigma = covariance(X, mu)

And having both μ and Σ, we can calculate the probability for any vector x against our distribution:

function probability(mu, sigma, x)
  n = length(mu)
  exp(-0.5 * sum((x .- mu)' .* inv(sigma) .* (x .- mu))) *
  (2 * pi) ^ (-0.5 * n) *
  det(sigma) ^ -0.5
end