Algorithm To Deduce Parameter From Data
1. Introduction
● Objective: The goal of this project was to optimize the hyperparameters for multiple
outlier detection models using Optuna.
2. Dataset
● Description: The dataset contains two scaled features (OC, IC) used for clustering and
outlier detection.
● Data Preparation: Data was scaled and preprocessed as required for each model
type.
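The scaling step above can be sketched as follows, assuming scikit-learn's StandardScaler; the column names "OC" and "IC" come from the dataset description, and the sample values are placeholders:

```python
# Minimal sketch of the data-preparation step: standardize the two features
# (zero mean, unit variance). The sample values below are illustrative only.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"OC": [1.0, 2.0, 3.0, 100.0],
                   "IC": [10.0, 12.0, 11.0, 500.0]})

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[["OC", "IC"]])  # shape (n_samples, 2)
```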
3. Models and Hyperparameters
● DBSCAN:
○ eps: Maximum distance between two samples for one to be
considered as in the neighborhood of the other.
○ min_samples: Minimum number of samples in a neighborhood for a
point to be considered a core point.
● KMeans:
○ n_clusters: Number of clusters to form.
○ init: Method for initialization.
○ max_iter: Maximum number of iterations of the k-means algorithm
for a single run.
● Isolation Forest:
○ n_estimators: Number of base estimators in the ensemble.
○ max_samples: Number of samples to draw to train each base
estimator.
● ABOD:
○ n_neighbors: Number of neighbors to use for the angle-based
calculation.
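The hyperparameters listed above map directly onto the model constructors. A minimal sketch using scikit-learn (ABOD is not shown executable here, since it comes from the separate pyod package via `from pyod.models.abod import ABOD`); the parameter values are placeholders, not the tuned results:

```python
# How the listed hyperparameters appear in the model constructors.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.ensemble import IsolationForest

dbscan = DBSCAN(eps=0.5, min_samples=5)  # neighborhood radius / core-point size
kmeans = KMeans(n_clusters=3, init="k-means++", max_iter=300, n_init=10)
iforest = IsolationForest(n_estimators=100, max_samples=64, random_state=0)
# ABOD(n_neighbors=10) would come from pyod, if installed.
```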
4. Optimization Methodology
● Optuna Framework: Optuna was used to automate the hyperparameter tuning process.
The study was configured to maximize outlier detection performance, evaluated
through metrics such as the mean anomaly score, the Davies-Bouldin index, or other
relevant outlier-detection metrics.
● Search Space:
○ DBSCAN: eps and min_samples.
○ KMeans: n_clusters, init, and max_iter.
○ Isolation Forest: n_estimators and max_samples.
○ ABOD: n_neighbors.
● Optimization Algorithm: Optuna's Tree-structured Parzen Estimator (TPE)
was used for efficient exploration of the hyperparameter space.
5. Results
● Best Hyperparameters:
○ DBSCAN:
■ eps: [Optimal Value]
■ min_samples: [Optimal Value]
○ KMeans:
■ n_clusters: [Optimal Value]
■ init: [Optimal Method]
■ max_iter: [Optimal Value]
○ Isolation Forest:
■ n_estimators: [Optimal Value]
■ max_samples: [Optimal Value]
○ ABOD:
■ n_neighbors: [Optimal Value]
● Performance Metrics:
○ [Include relevant performance metrics for each model before and
after optimization.]
6. Conclusion
● Summary: This approach works well for all models except the clustering-based
algorithms, because clustering-based algorithms require an additional threshold
through which the anomalies are identified.
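The conclusion's point can be sketched as follows: clustering models such as KMeans do not emit outlier labels directly, so an extra threshold is needed on top of the clustering. The 95th-percentile cutoff below is a hypothetical choice for illustration, not a value from the report:

```python
# Turning a KMeans clustering into anomaly labels requires a threshold on the
# distance to the nearest centroid; the cutoff here is an assumed example.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),   # cluster 1
               rng.normal(5, 0.3, (50, 2)),   # cluster 2
               [[10.0, 10.0]]])               # one far outlier

km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
dist = np.min(km.transform(X), axis=1)   # distance to nearest centroid
threshold = np.percentile(dist, 95)      # assumed cutoff, not from the report
is_anomaly = dist > threshold
```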
Hyperopt Hyperparameter Optimization Report
1. Introduction
● Objective: The purpose of this project was to optimize the hyperparameters for several
outlier detection models using Hyperopt.
2. Dataset
● Description: The dataset comprises two scaled features (OC, IC) used for clustering and
outlier detection.
● Data Preparation: Data was preprocessed and scaled appropriately for each model.
3. Models and Hyperparameters
● DBSCAN:
○ eps: The maximum distance between two samples for one to be
considered as in the neighborhood of the other.
○ min_samples: The minimum number of samples in a neighborhood
for a point to be considered a core point.
● KMeans:
○ n_clusters: Number of clusters to form.
○ init: Method for initialization.
○ max_iter: Maximum number of iterations of the k-means algorithm.
● Isolation Forest:
○ n_estimators: Number of base estimators in the ensemble.
○ max_samples: Number of samples to draw to train each base
estimator.
● ABOD:
○ n_neighbors: Number of neighbors to use for the angle-based
calculation.
4. Optimization Methodology
5. Results
● Best Hyperparameters:
○ DBSCAN:
■ eps: [Optimal Value]
■ min_samples: [Optimal Value]
○ KMeans:
■ n_clusters: [Optimal Value]
■ init: [Optimal Method]
■ max_iter: [Optimal Value]
○ Isolation Forest:
■ n_estimators: [Optimal Value]
■ max_samples: [Optimal Value]
○ ABOD:
■ n_neighbors: [Optimal Value]
● Performance Metrics:
○ [Include relevant performance metrics for each model before and
after optimization.]
6. Conclusion
● Summary: This approach works well for all models except the clustering-based
algorithms, because clustering-based algorithms require an additional threshold
through which the anomalies are identified.