This document discusses anomaly detection techniques. It begins with an introduction to anomaly detection and its applications in areas like intrusion detection, fraud detection, and healthcare. It then discusses the use of anomaly detection in AIOps and with graph databases. The document categorizes anomalies as point, contextual, or collective and describes methods for identifying outliers like extreme value analysis. It also discusses techniques for anomaly detection in time series data, including using recurrent neural networks, historical analysis with DBSCAN clustering, and time shift detection using cosine similarity. The document compares pros and cons of time shift detection and DBSCAN for anomaly detection.
2. Agenda
2
• Part 1
• Introduction
• Application for Anomaly Detection
• AIOps
• GraphDB
• Part 2
• Type Of Anomaly Detection
• How to Identify Outliers in your Data
• Part 3
• Anomaly Detection for Timeseries Technique
7. Applications of Anomaly Detection in the Business
world
7
• Intrusion detection
• Attacks on computer systems and computer networks.
• Fraud detection
• The purchasing behavior of some one who steals a credit card is probably
different from that of the original owner.
• Ecosystem Disturbance.
• Hurricanes , floods , heat waves … etc
• Medicine.
• Unusual symptoms or test result may indicate potential health problem.
https://zindi.africa/blog/introduction-to-anomaly-detection-using-machine-learning-with-a-case-study
20. Anomalies can be broadly categorized as
20
1. Point Anomalies
2. Contextual Anomalies
3. Collective Anomalies
If an individual data instance can be considered as
anomalous with respect to the rest of data, then the instance is
termed as a point anomaly.
Chandola, V., Banerjee, A. & Kumar, V., 2009. Anomaly Detection: A Survey. ACM Computing Surveys, July. 41(3).
21. Anomalies can be broadly categorized as
21
1. Point Anomalies
2. Contextual Anomalies
3. Collective Anomalies
If a data instance is anomalous in a specific context
(but not otherwise), then it is termed as a
contextual anomaly
Chandola, V., Banerjee, A. & Kumar, V., 2009. Anomaly Detection: A Survey. ACM Computing Surveys, July. 41(3).
22. Anomalies can be broadly categorized as
22
1. Point Anomalies
2. Contextual Anomalies
3. Collective Anomalies
Chandola, V., Banerjee, A. & Kumar, V., 2009. Anomaly Detection: A Survey. ACM Computing Surveys, July. 41(3).
If a collection of related data instances is
anomalous with respect to the entire data set, it
is termed as a collective anomaly
23. How to Identify Outliers in your Data
24
1. Extreme Value Analysis
2. Proximity Methods
3. Projection Methods
4. Methods Robust to Outliers
https://machinelearningmastery.com/how-to-identify-outliers-in-your-data/
24. How to Identify Outliers in your Data
25
1. Extreme Value Analysis
2. Proximity Methods
3. Projection Methods
4. Methods Robust to Outliers
https://machinelearningmastery.com/how-to-identify-outliers-in-your-data/
1. Focus on univariate methods
2. Visualize the data using scatterplots, histograms and box
and whisker plots and look for extreme values
3. Assume a distribution (Gaussian) and look for values more
than 2 or 3 standard deviations from the mean or 1.5 times
from the first or third quartile
4. Filter out outliers candidate from training dataset and
assess your models performance
https://en.wikipedia.org/wiki/Multivariate_normal_distribution
25. How to Identify Outliers in your Data
26
1. Extreme Value Analysis
2. Proximity Methods
3. Projection Methods
4. Methods Robust to Outliers
https://machinelearningmastery.com/how-to-identify-outliers-in-your-data/
26. How to Identify Outliers in your Data
27
1. Extreme Value Analysis
2. Proximity Methods
3. Projection Methods
4. Methods Robust to Outliers
https://machinelearningmastery.com/how-to-identify-outliers-in-your-data/
27. How to Identify Outliers in your Data
28
1. Extreme Value Analysis
2. Proximity Methods
3. Projection Methods
4. Methods Robust to Outliers
https://machinelearningmastery.com/how-to-identify-outliers-in-your-data/
29. Anomaly based IDS using Backpropagation Neural Network
https://pdfs.semanticscholar.org/3ac9/37cb50bad238091fba6d1076a78136fe8691.pdf
• Vrushali D. Mane ME (Electronics) JNEC, Aurangabad
• S.N. Pawar Associate Professor JNEC, Aurangabad
• Neural Network detect 98% accuracy
50. Pros & Cons
Time shift Detection DB-Scan
Work well on slope data like
temperature
Like Time shift compare
Not work on many spikes data Must defined eps and min_samples
Type I - Extreme value anomaly: A case with an extremely high, low or otherwise rare value for one or multiple individual numerical attributes. A case can be an anomaly with respect to one individual variable, so Type I anomalies do not depend on relationships between attributes. Such a case has one or more values that can be considered extreme or rare when the entire dataset is taken into account. Traditional univariate statistics typically considers this type of outlier, e.g. by using a measure of central tendency plus or minus 3 times the standard deviation or the median absolute deviation.
Type II - Rare class anomaly: A case with an uncommon class value for one or multiple categorical variables. A case can be an anomaly with respect to one individual attribute, so Type II anomalies do not depend on relationships between attributes.
Type III - Simple mixed data anomaly: A case that is both a Type I and Type II anomaly, i.e. with at least one extreme value and one rare class. This anomaly type deviates with regard to multiple data types. This requires deviant values for at least two attributes, each anomalous in its own right. These can thus be analyzed separately; analyzing the attributes jointly is not necessary because the case is not anomalous in terms of a combination of values.
Type IV - Multidimensional numerical anomaly: A case that does not conform to the general patterns when the relationship between multiple continuous attributes is taken into account, but which does not have extreme values for any of the individual attributes that partake in this relationship. The anomalous nature of a case of this type lies in the deviant or rare combination of its continuous attribute values, and as such hides in multidimensionality. It therefore requires several continuous attributes to be analyzed jointly to detect this type.
Type V - Multidimensional rare class anomaly: A case with a rare combination of class values. In datasets with independent data points a minimum of two substantive categorical attributes needs to be analyzed jointly to discover a multidimensional rare class anomaly. An example is this curious combination of values from three attributes used to describe dogs: 'MALE', 'PUPPY' and 'PREGNANT'.
Type VI - Multidimensional mixed data anomaly: A case with a deviant relationship between its continuous and categorical attributes. The anomalous case generally has a categorical value or a combination of categorical values that in itself is not rare in the dataset as a whole, but is only rare in its neighborhood (numerical area) or local pattern. As with Type IV and V anomalies, such cases hide in multidimensionality and multiple attributes need thus to be jointly taken into account to identify them. In fact, multiple datatypes need to be used, as a Type VI anomaly per definition requires both numerical and categorical data.