I recently worked my way through the latest incarnation of Andrew Ng’s Machine Learning Course on Coursera. One particular method stood out to me as having some potential real-life applications: anomaly detection with multivariate Gaussian distributions.
The class is conducted in Octave, which I find a little annoying to deal with as a language, so to play around with some data I wanted to convert the procedure to Julia. Not having found anything prebuilt after some quick Googling, I translated it myself, and I’m leaving it here for anyone who’d like to refer to it later. The mathematical formulas I’m working from are in the above-referenced Wikipedia article, under “Non-degenerate case”.
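For reference, the density given in that section, for a $k$-dimensional vector $x$ with mean vector $\mu$ and a symmetric positive-definite covariance matrix $\Sigma$, is:

$$
p(x) = \frac{1}{\sqrt{(2\pi)^k \, |\Sigma|}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^\mathsf{T} \Sigma^{-1} (x - \mu) \right)
$$

Here $|\Sigma|$ is the determinant of $\Sigma$, and $k$ is the number of features.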
I’ll represent the data set as $X$, with one observation per row and one feature per column.
First we calculate our μ, which is just the mean of each column in $X$:
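A minimal sketch of this step in Julia, assuming $X$ is an $m \times k$ `Matrix{Float64}` with observations in rows. The helper name `feature_means` and the small example matrix are hypothetical, for illustration only; `mean` comes from the `Statistics` standard library.

```julia
using Statistics

# Hypothetical helper: column means of an m×k data matrix.
# mean(X; dims=1) returns a 1×k row matrix; vec turns it into a length-k vector μ.
feature_means(X::AbstractMatrix) = vec(mean(X; dims=1))

# Hypothetical example data: 3 observations (rows) of 2 features (columns).
X = [1.0 2.0;
     3.0 4.0;
     5.0 9.0]
μ = feature_means(X)
```

Here μ comes out as `[3.0, 5.0]`, one entry per feature.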
Having found our mean, we can now calculate the covariance matrix:
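One way to sketch this in Julia, assuming the $1/m$ (maximum-likelihood) normalization $\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^\mathsf{T}$, which as far as I recall is the form used in the course. The helper name `covariance` is hypothetical. `Statistics.cov` divides by $m - 1$ by default, so it only matches this when called with `corrected=false`.

```julia
using Statistics

# Hypothetical helper: covariance matrix with 1/m normalization.
# Assumes X is m×k with observations in rows.
function covariance(X::AbstractMatrix)
    m = size(X, 1)
    μ = mean(X; dims=1)      # 1×k row of column means
    D = X .- μ               # broadcasting subtracts μ from every row
    return (D' * D) ./ m     # k×k matrix Σ
end

# Hypothetical example data, same shape as before.
X = [1.0 2.0;
     3.0 4.0;
     5.0 9.0]
Σ = covariance(X)
```

Writing it out with the centered matrix `D` keeps the correspondence to the summation formula visible; in practice `cov(X; corrected=false)` should give the same result.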
With both μ and Σ in hand, we can calculate the probability density of any vector $x$ under our distribution:
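A minimal Julia sketch of the non-degenerate density formula, assuming Σ is symmetric positive definite (if it is singular, the density is undefined). The function name `gaussian_pdf` is hypothetical. It solves Σz = (x − μ) with the backslash operator rather than forming `inv(Σ)` explicitly.

```julia
using LinearAlgebra

# Hypothetical helper: density of a non-degenerate multivariate Gaussian N(μ, Σ) at x.
# Assumes length(x) == length(μ) == size(Σ, 1) and Σ symmetric positive definite.
function gaussian_pdf(x::AbstractVector, μ::AbstractVector, Σ::AbstractMatrix)
    k = length(μ)
    d = x - μ
    quad = dot(d, Σ \ d)                 # (x - μ)ᵀ Σ⁻¹ (x - μ)
    return exp(-quad / 2) / sqrt((2π)^k * det(Σ))
end
```

For example, `gaussian_pdf([0.0, 0.0], [0.0, 0.0], Matrix(1.0I, 2, 2))` evaluates to 1/(2π). Note that this is a density rather than a probability, so it can exceed 1 when Σ is small. For anomaly detection, the idea from the course is to flag an example as anomalous when its density falls below some threshold ε chosen on validation data; the choice of ε is not covered by this sketch.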