NMLS - 2. Anomaly Detection - Unsupervised Learning - Introductory Machine Learning
In 2022 I started the new Machine Learning Specialization by Andrew Ng.
I completed its first two courses in 2022, and I'll share the Jupyter notebooks from this specialization soon.
Now, in 2023, when I am finally free from my M1, I am going to complete the specialization. Here is the second assignment of Course 3 (the last course), which is on anomaly detection.
Problem Statement:
We were given 307 measurements of two features of several servers:
- throughput (mb/s)
- latency (ms)
The task was to detect servers behaving anomalously.
Procedure:
Since, in addition to the unlabelled data, we also had some labelled data, we built a training dataset from the unlabelled data and a cross-validation set from the labelled data.
First, we used NumPy to compute the mean and variance of each feature.
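A minimal NumPy sketch of this step, assuming the data is an (m, n) array with one row per measurement and one column per feature. The function name `estimate_gaussian` and the toy numbers are illustrative, not the assignment's data:

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature mean and variance of an (m, n) data matrix."""
    mu = X.mean(axis=0)   # shape (n,)
    var = X.var(axis=0)   # divides by m (NumPy's default ddof=0), shape (n,)
    return mu, var

# Hypothetical toy data: 4 samples x 2 features (throughput, latency)
X_train = np.array([[13.0, 14.0],
                    [14.0, 15.0],
                    [15.0, 13.0],
                    [14.0, 14.0]])
mu, var = estimate_gaussian(X_train)
print(mu, var)  # [14. 14.] [0.5 0.5]
```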
Then, using these means and variances, we built a Gaussian probability density function for each feature.
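For reference, assuming a Gaussian model, the density for feature $j$ with mean $\mu_j$ and variance $\sigma_j^2$ is:

```latex
p(x_j; \mu_j, \sigma_j^2) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)
```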
Assuming that all features are statistically independent, the total probability of a sample was calculated by simply multiplying the probabilities of each of its features.
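Under the independence assumption, the probability of a sample $x$ with $n$ features is $p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$. A hedged NumPy sketch of this product follows; the function names and toy values are illustrative:

```python
import numpy as np

def gaussian_pdf(X, mu, var):
    """Per-feature Gaussian densities for an (m, n) array, returned as shape (m, n)."""
    return np.exp(-(X - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def joint_probability(X, mu, var):
    """p(x) for each row: product of per-feature densities (independence assumption)."""
    return np.prod(gaussian_pdf(X, mu, var), axis=1)

# Hypothetical values: one sample at the mean, one far away from it
X = np.array([[14.0, 14.0],
              [20.0, 5.0]])
mu = np.array([14.0, 14.0])
var = np.array([0.5, 0.5])
p = joint_probability(X, mu, var)  # p[0] = 1/pi, p[1] is tiny
```

With many features the product of small densities can underflow; summing log-densities is a common alternative, though with only two features the plain product is fine.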
Probabilities for all of the samples were computed.
The difference between the highest and lowest probabilities was divided by 1000 to get a step size.
A loop was then run from the lowest value to the highest value in increments of this step size.
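The candidate thresholds from these two steps can be generated roughly as below. `n_steps=1000` matches the text; the function name and the zero-range guard are my own hypothetical additions:

```python
import numpy as np

def candidate_thresholds(p_val, n_steps=1000):
    """Epsilon candidates from min(p_val) up to (not including) max(p_val)."""
    p_min, p_max = p_val.min(), p_val.max()
    step = (p_max - p_min) / n_steps
    if step == 0:
        # Hypothetical guard: all probabilities equal, so only one candidate makes sense
        return np.array([p_min])
    return np.arange(p_min, p_max, step)

p_val = np.array([0.0, 0.25, 1.0])  # toy probabilities
eps_candidates = candidate_thresholds(p_val)
```

Because `np.arange` uses a floating-point step, rounding can occasionally add or drop one element; `np.linspace(p_min, p_max, n_steps, endpoint=False)` gives exactly `n_steps` values if that matters.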
During each iteration, the threshold epsilon (samples with a probability below epsilon are declared anomalous, the rest non-anomalous) was set equal to the current loop value, 'iter'.
The F1 score was computed for each threshold value, and the threshold that gave the highest F1 score was selected.
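Putting the loop and the F1 computation together gives something like the following sketch. It assumes `y_val` holds the cross-validation labels (1 = anomalous, 0 = normal) and `p_val` the probabilities of the same samples; the names are hypothetical:

```python
import numpy as np

def select_threshold(y_val, p_val, n_steps=1000):
    """Return (best_epsilon, best_f1) over evenly spaced candidate thresholds."""
    best_epsilon, best_f1 = 0.0, 0.0
    step = (p_val.max() - p_val.min()) / n_steps
    for epsilon in np.arange(p_val.min(), p_val.max(), step):
        preds = p_val < epsilon                  # flag low-probability samples
        tp = np.sum(preds & (y_val == 1))        # true positives
        fp = np.sum(preds & (y_val == 0))        # false positives
        fn = np.sum(~preds & (y_val == 1))       # false negatives
        if tp == 0:
            continue                             # F1 would be zero or undefined
        prec = tp / (tp + fp)
        rec = tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_epsilon, best_f1 = epsilon, f1
    return best_epsilon, best_f1

# Toy cross-validation data: the last two samples are the anomalies
y_val = np.array([0, 0, 0, 1, 1])
p_val = np.array([0.9, 0.8, 0.7, 0.01, 0.02])
epsilon, f1 = select_threshold(y_val, p_val)
```

Samples whose probability falls below the returned `epsilon` would then be flagged as anomalous.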
Relevant GitHub link (my GitHub repo)