A Nonlinear Regression Application Via Machine Learning Techniques For Geomagnetic Data Reconstruction Processing

The document discusses using machine learning algorithms like support vector regression, random forest regression, gradient boosting regression, and LSTM for geomagnetic data reconstruction to predict missing values. Geomagnetic data is collected from sensors at regular time intervals but sometimes values are missing. The algorithms are trained on existing data and can then predict the target values for missing data points. Based on the RMSE errors, the LSTM algorithm provided the best predictions with the least error compared to the other algorithms.

Uploaded by Sindhu Pranathi
© All Rights Reserved

A Nonlinear Regression Application via Machine Learning Techniques for Geomagnetic Data Reconstruction Processing

Geomagnetic data contains measurements of the Earth's magnetic field recorded by sensors. Using these data, scientists can learn about the state of the Earth, for example to anticipate eruptions in volcanic areas. Sensors are configured to record the magnetic field at fixed time intervals, such as every second, minute or hour. Sometimes a sensor fails to report some readings, and this missing data can cause serious issues, such as missing information about a volcanic eruption. Various techniques have been introduced to overcome such issues, but they require heavy manpower and are time consuming.

To overcome this problem, the author proposes using machine learning algorithms to predict the target information for the missing values. In this paper the author evaluates the performance of several machine learning algorithms: Support Vector Regression, Random Forest Regression, Gradient Boosting Regression and the deep learning LSTM algorithm. Of all these algorithms, LSTM gives the lowest prediction error.

In this project we use a geomagnetic dataset obtained from sensors, downloaded from the following website:

https://www.intermagnet.org/data-donnee/download-eng.php#view

Sample dataset values:

DATE, TIME, sensor id, HYBX, HYBY, HYBZ, HYBF
2020-01-14, 00:00:00.000, 014, 46.13, 4.16, 71.96, 43615.62
2020-01-14, 00:01:00.000, 014, 46.25, 4.13, 71.95, 43615.72
2020-01-14, 00:02:00.000, 014, 46.30, 4.07, 71.95, 43615.78
2020-01-14, 00:03:00.000, 014, 46.38, 3.99, 71.89, 43615.81
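
A minimal Python sketch of loading records in the layout shown above and separating the three input columns from the HYBF target. It assumes the comma-separated layout of the sample; files downloaded from INTERMAGNET may use a different, whitespace-separated layout with a header block, which this sketch does not handle. The names `load_records`, `FEATURES` and `TARGET` are illustrative, not taken from the project's code.

```python
import csv
import io

# The four sample records shown above, in the same column order.
SAMPLE = """DATE, TIME, sensor id, HYBX, HYBY, HYBZ, HYBF
2020-01-14, 00:00:00.000, 014, 46.13, 4.16, 71.96, 43615.62
2020-01-14, 00:01:00.000, 014, 46.25, 4.13, 71.95, 43615.72
2020-01-14, 00:02:00.000, 014, 46.30, 4.07, 71.95, 43615.78
2020-01-14, 00:03:00.000, 014, 46.38, 3.99, 71.89, 43615.81
"""

FEATURES = ("HYBX", "HYBY", "HYBZ")  # model inputs
TARGET = "HYBF"  # value to reconstruct


def load_records(text):
    """Split comma-separated records into feature rows and target values."""
    # skipinitialspace drops the blank after each comma, in the header too.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in FEATURES])
        y.append(float(row[TARGET]))
    return X, y


X, y = load_records(SAMPLE)
print(len(X), X[0], y[0])  # 4 [46.13, 4.16, 71.96] 43615.62
```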

The dataset above was obtained from the Hyderabad sensor, so its column names carry the station code HYB. Following the INTERMAGNET naming convention, HYBX, HYBY and HYBZ are the north (X), east (Y) and vertical (Z) components of the Earth's magnetic field, and HYBF is the total field intensity (F), which is the target value in the examples below. Each record also carries the date, the time and the sensor ID. In this dataset the sensor records a value every minute, so a value at a half-minute mark is not available. For example, in the dataset above:

First record: time = 00:00:00, target value = 43615.62
Second record: time = 00:01:00, target value = 43615.72

The value at 00:00:30, midway between minute 0 and minute 1, is missing, and it can be obtained by applying regression algorithms. The regression algorithms are trained on past values and can then predict the target value of the missing record.
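
As a point of reference for the 00:00:30 example, the simplest estimate is linear interpolation between the two neighbouring readings. This is not one of the regression algorithms the paper evaluates; it is a baseline sketch, and `interpolate` is a hypothetical helper.

```python
from datetime import datetime


def interpolate(t, t0, v0, t1, v1):
    """Linearly interpolate the value at time t between (t0, v0) and (t1, v1)."""
    fraction = (t - t0).total_seconds() / (t1 - t0).total_seconds()
    return v0 + fraction * (v1 - v0)


fmt = "%H:%M:%S"
t0 = datetime.strptime("00:00:00", fmt)       # first record
t1 = datetime.strptime("00:01:00", fmt)       # second record
missing = datetime.strptime("00:00:30", fmt)  # the gap to fill

estimate = interpolate(missing, t0, 43615.62, t1, 43615.72)
print(f"{estimate:.2f}")  # 43615.67
```

A regression model can beat this baseline only if the other columns (HYBX, HYBY, HYBZ) carry information about the target that the two neighbouring HYBF values do not.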

Support Vector Regression Algorithm

Support Vector Regression (SVR) is quite different from other regression models. It adapts the Support Vector Machine (SVM), originally a classification algorithm, to predict a continuous variable. While other regression models, such as ordinary linear regression, try to minimize the error between the predicted and the actual values, SVR tries to fit the best function within a predefined error threshold, ε. Conceptually, SVR places a tube of width ε on either side of the fitted function (in the linear case, the space between two parallel lines). Training points that fall inside the tube are considered close enough, and their errors are ignored. Points on or outside the tube boundary, where the difference between the predicted and the actual value reaches or exceeds the threshold, are penalized; these points become the support vectors that determine the fitted function, which is then used to predict unknown or missing values.
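
The ε-tube idea corresponds to the ε-insensitive loss. The sketch below computes that loss for a few predictions and flags the points that fall on or outside the tube; for a fitted SVR, only such points can be support vectors. The predictions and the value ε = 0.05 are made-up numbers for illustration, not results from the paper. In scikit-learn, the corresponding threshold is the `epsilon` parameter of `sklearn.svm.SVR`.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """SVR loss: zero inside the epsilon-tube, growing linearly outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)


def outside_tube(actuals, predictions, epsilon):
    """Indices of points on or outside the tube (the only support vector candidates)."""
    return [
        i for i, (a, p) in enumerate(zip(actuals, predictions))
        if abs(a - p) >= epsilon
    ]


# HYBF values from the sample records, and hypothetical predictions.
actual = [43615.62, 43615.72, 43615.78, 43615.81]
predicted = [43615.65, 43615.70, 43615.90, 43615.80]
EPSILON = 0.05

losses = [epsilon_insensitive_loss(a, p, EPSILON) for a, p in zip(actual, predicted)]
print([round(v, 2) for v in losses])  # [0.0, 0.0, 0.07, 0.0]
print(outside_tube(actual, predicted, EPSILON))  # [2]
```

Only the third prediction misses by more than ε, so it alone contributes to the loss.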

Random Forest Regression Algorithm

Random forest is a bagging technique, not a boosting technique. The trees in a random forest are built independently, so they can be trained in parallel, and there is no interaction between them while they are built. The algorithm constructs a multitude of decision trees at training time and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees. A random forest is a meta-estimator (i.e. it combines the results of multiple predictions) that aggregates many decision trees, with some helpful modifications:

The number of features that can be split on at each node is limited to some percentage of the total (this limit is a hyperparameter). This ensures that the ensemble does not rely too heavily on any individual feature and makes fair use of all potentially predictive features.

Each tree draws a random sample, with replacement, from the original data set when generating its splits, adding a further element of randomness that helps prevent overfitting.
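
The two modifications above, a bootstrap sample per tree and a limited set of candidate features, can be sketched in plain Python. To stay short, each tree here is a decision stump (a single split), so drawing the features once per tree is the same as drawing them at its only node; a real random forest grows much deeper trees. All function names and the settings `n_trees=25` and `max_features=2` are illustrative assumptions; in scikit-learn these ideas correspond to the `n_estimators`, `max_features` and `bootstrap` parameters of `RandomForestRegressor`.

```python
import random
from statistics import mean


def fit_stump(X, y, feature_ids):
    """Find the single split (feature, threshold) with the lowest squared error."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send records both ways
            left_mean, right_mean = mean(left), mean(right)
            sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    if best is None:
        # No usable split (e.g. every sampled row identical): predict the mean.
        return (None, None, mean(y), mean(y))
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    if f is None:
        return left_mean
    return left_mean if row[f] <= threshold else right_mean


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Train n_trees stumps, each on a bootstrap sample with a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        features = rng.sample(range(d), max_features)  # limit candidate features
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict_forest(forest, row):
    """Regression output: the mean of the individual tree predictions."""
    return mean(predict_stump(stump, row) for stump in forest)


X = [[46.13, 4.16, 71.96], [46.25, 4.13, 71.95], [46.30, 4.07, 71.95], [46.38, 3.99, 71.89]]
y = [43615.62, 43615.72, 43615.78, 43615.81]
forest = fit_forest(X, y)
prediction = predict_forest(forest, [46.58, 3.92, 71.95])
print(f"{prediction:.2f}")
```

Because every stump predicts an average of training targets, the forest's output always stays within the range of HYBF values it was trained on.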

Deep Learning LSTM (Long Short-Term Memory)

With the Long Short-Term Memory (LSTM) neural network algorithm, we build and train a model to predict the target geomagnetic value of unseen or missing records.
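
An LSTM processes a sequence one time step at a time, and its memory cell is controlled by forget, input and output gates. The sketch below shows the forward computation of a single-unit LSTM cell in plain Python. It is an illustration only: the weights are hypothetical, there is no training, and in practice a deep-learning framework learns the weights and adds an output layer that maps the hidden state to the predicted value. The names `lstm_step` and `run_sequence` are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    x is the input vector (e.g. [HYBX, HYBY, HYBZ] for one minute);
    params maps each gate to (input weights, recurrent weight, bias).
    """
    def pre_activation(gate):
        w, u, b = params[gate]
        return sum(wi * xi for wi, xi in zip(w, x)) + u * h_prev + b

    f = sigmoid(pre_activation("forget"))       # share of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # share of the candidate to write
    g = math.tanh(pre_activation("candidate"))  # candidate cell content
    o = sigmoid(pre_activation("output"))       # share of the cell state to expose
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_sequence(xs, params):
    """Feed a sequence through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h, c


rows = [[46.13, 4.16, 71.96], [46.25, 4.13, 71.95], [46.30, 4.07, 71.95], [46.38, 3.99, 71.89]]
# Hypothetical weights, chosen only so the example runs without saturating.
params = {
    "forget": ([0.01, 0.01, 0.01], 0.5, 0.0),
    "input": ([0.01, -0.02, 0.01], 0.5, 0.0),
    "candidate": ([-0.01, 0.02, 0.005], 0.3, 0.0),
    "output": ([0.005, 0.01, 0.005], 0.2, 0.0),
}
h, c = run_sequence(rows, params)
print(round(h, 4), round(c, 4))
```

The forget gate is what lets the cell carry information across many minutes of readings, which is the property that distinguishes an LSTM from the other models in this comparison.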

Screenshots

To run this project, double-click the ‘run.bat’ file to open the screen below.

In the screen above, click the ‘Upload Geomagnetic Dataset’ button and upload the dataset.

In the screen above I am uploading the ‘dataset.txt’ file; after the upload, the following screen appears.

In the screen above we can see the loaded dataset. Now click ‘Run Support Vector Regression’ to train the SVR model on it.
In the screen above we can see the SVR RMSE (root mean squared error, a measure of prediction error: the lower it is, the more accurately the algorithm predicts the target value of a missing record). The screen also shows the total dataset size and how many records the algorithm used for training and testing. All algorithms in this application use 80% of the dataset records for training and 20% for testing to compute accuracy and RMSE. Now click the ‘Run Random Forest Algorithm’ button to get its RMSE.
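
The RMSE reported on each screen is the square root of the mean squared difference between actual and predicted values over the n test records, RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²). The sketch below computes it and performs an 80/20 split. Whether the application shuffles records before splitting is not stated; this sketch assumes a split that keeps time order, and the helper names and predicted values are hypothetical.

```python
import math


def train_test_split_80_20(X, y):
    """Keep the first 80% of records for training and the last 20% for testing."""
    cut = int(len(X) * 0.8)
    return X[:cut], X[cut:], y[:cut], y[cut:]


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted) ** 2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Two HYBF values from the sample and hypothetical model predictions for them.
actual = [43615.78, 43615.81]
predicted = [43615.75, 43615.85]
print(round(rmse(actual, predicted), 4))  # 0.0354
```

Squaring before averaging means a few large misses raise RMSE more than many small ones, which is worth keeping in mind when comparing the four algorithms.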

In the screen above we can see the Random Forest RMSE. Now click the ‘Run Gradient Boosting Regression’ button to get its RMSE.
In the screen above we can see the Gradient Boosting RMSE. Now click the ‘Run LSTM Deep Learning Algorithm’ button to get the LSTM RMSE.

In the screen above we can see the LSTM RMSE. Now click the ‘Upload Test Value & Reconstruct Data For Missing Values’ button and upload the ‘test_dataset’ file. This file contains records whose target value is missing, and the application predicts the target value for them. Below are some records from the test file:
HYBX, HYBY, HYBZ, HYBF
80.00, 23.74, 70.33
46.58, 3.92, 71.95

In the test dataset above, each record has only three values and the fourth (target) value, HYBF, is missing. Date, time and sensor ID are not required, so they are omitted.
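
Reconstruction amounts to feeding each three-value test record to a trained model and appending the predicted HYBF. To keep the sketch self-contained it uses a k-nearest-neighbour regressor trained on the four sample records; this is a swapped-in technique and not one of the four models evaluated in the paper, and `knn_predict` and `reconstruct` are hypothetical names. The first test record lies far outside the range of the training features, so any model is extrapolating there.

```python
import math

# Training records from the sample dataset: (HYBX, HYBY, HYBZ) -> HYBF.
TRAIN_X = [
    [46.13, 4.16, 71.96],
    [46.25, 4.13, 71.95],
    [46.30, 4.07, 71.95],
    [46.38, 3.99, 71.89],
]
TRAIN_Y = [43615.62, 43615.72, 43615.78, 43615.81]


def knn_predict(row, k=2):
    """Average HYBF of the k training records closest to row (Euclidean distance)."""
    order = sorted(range(len(TRAIN_X)), key=lambda i: math.dist(row, TRAIN_X[i]))
    return sum(TRAIN_Y[i] for i in order[:k]) / k


def reconstruct(lines):
    """Append a predicted fourth value to every three-value test record."""
    records = []
    for line in lines:
        features = [float(v) for v in line.split(",")]
        records.append(features + [knn_predict(features)])
    return records


test_lines = ["80.00, 23.74, 70.33", "46.58, 3.92, 71.95"]
for record in reconstruct(test_lines):
    print(", ".join(f"{v:.2f}" for v in record))
```

With only four training records, the second test row's prediction is simply the mean of the two closest sample values; the models in the paper are trained on the full dataset instead.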

In the screen above I am uploading the ‘test_dataset’ file; the resulting values are shown below.

In the screen above we obtained the missing fourth value, which we call the reconstructed or predicted value. Similarly, you can add more interval values to the test dataset file and get their missing target values. Now click the ‘RMSE Comparison Graph’ button to get the graph below.

In the graph above, the x-axis shows the algorithm names and the y-axis shows the RMSE. The graph shows that LSTM has the lowest RMSE and therefore gives the best predictions for missing values.
