A Nonlinear Regression Application Via Machine Learning Techniques For Geomagnetic Data Reconstruction Processing
Geomagnetic data records the Earth's magnetic field as measured by sensors.
Scientists use this data to monitor the state of the Earth, for example to
anticipate when an eruption may occur in explosive areas such as volcanoes.
Sensors are configured to read the Earth's magnetic field at fixed time
intervals, such as every second, minute, or hour. Sometimes a sensor fails to
report a reading, and the missing data can cause serious problems, such as
missing information about a volcanic eruption. Various techniques have been
introduced to overcome this problem, but they require heavy manpower and are
time-consuming.
In this project we use a geomagnetic dataset recorded by sensors and
downloaded from the website link below.
https://www.intermagnet.org/data-donnee/download-eng.php#view
Dataset values
The above dataset was obtained from the HYDERABAD area sensor, so its column
names carry the station ID HYB. HYBX, HYBY, and HYBZ are the X, Y, and Z
components of the Earth's magnetic field, and HYBF is the total field
intensity, which is the target value. Along with this data we can see the
sensing date and time with the sensor ID. In the above dataset the sensor is
configured to record a value every minute, so a value at a half-minute mark is
missing. For example, in the above dataset:
First record time = 00:00:00 and its target value = 43615.62
Second record time = 00:01:00 and its target value = 43615.72
The value at 00:00:30, midway between these two records, is missing. We can
obtain this missing value by applying regression algorithms. A regression
algorithm is trained on past values and can then predict the target value of
the missing record.
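As a point of comparison for the regression models, the half-minute value in
the example above can also be estimated by plain linear interpolation between
the two neighbouring records. The sketch below is only an illustrative
baseline and not the method used by this project; the function name
interpolate_linear is hypothetical.

```python
def interpolate_linear(t0, v0, t1, v1, t):
    """Linearly interpolate the value at time t between (t0, v0) and (t1, v1).

    Times are given in seconds since midnight.
    """
    if t1 == t0:
        raise ValueError("t0 and t1 must differ")
    fraction = (t - t0) / (t1 - t0)
    return v0 + fraction * (v1 - v0)


# The two records from the example: 00:00:00 and 00:01:00.
# We ask for the value at 00:00:30 (30 seconds).
estimate = interpolate_linear(0, 43615.62, 60, 43615.72, 30)
print(f"{estimate:.2f}")  # 43615.67
```

A trained regression model is expected to improve on this baseline when the
field does not change linearly between two readings.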
Random forest is a bagging technique, not a boosting technique. The trees in a
random forest are built in parallel, with no interaction between them during
training. It operates by constructing a multitude of decision trees at
training time and outputting the mode of the classes predicted by the
individual trees (classification) or the mean of their predictions
(regression). A random forest is a meta-estimator (i.e. it combines the
results of multiple predictions) that aggregates many decision trees, with
some helpful modifications:
The number of features that can be split on at each node is limited to some
percentage of the total (this percentage is a hyperparameter). This ensures
that the ensemble model does not rely too heavily on any individual feature
and makes fair use of all potentially predictive features.
Each tree draws a random sample from the original dataset when generating its
splits, adding a further element of randomness that helps prevent overfitting.
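The two modifications described above can be sketched in pure Python. This is
a deliberately tiny illustration under stated assumptions, not the
implementation used by this project: every tree is a one-split "stump" (so the
per-node feature limit is applied once per tree), the data is made up, and all
function names and parameters (fit_stump, fit_forest, feature_fraction, and so
on) are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_ids):
    """Fit a one-split regression tree using only the given feature indices."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, targets) if row[f] <= threshold]
            right = [y for row, y in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        m = mean(targets)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=25, feature_fraction=0.67, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, round(feature_fraction * n_features))
    forest = []
    for _ in range(n_trees):
        # Each tree trains on a bootstrap sample (drawn with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        # Only a random subset of the features may be split on.
        feature_ids = rng.sample(range(n_features), k)
        forest.append(fit_stump(sample_rows, sample_targets, feature_ids))
    return forest


def predict_forest(forest, row):
    # Regression output is the mean of the individual tree predictions.
    return mean(predict_stump(stump, row) for stump in forest)


# Toy, made-up data (not real geomagnetic readings): the target depends
# only on feature 0.
rows = [[float(i), float(i % 3), float((i * 7) % 5)] for i in range(20)]
targets = [10.0 if row[0] < 10 else 20.0 for row in rows]
forest = fit_forest(rows, targets)
low = predict_forest(forest, [2.0, 1.0, 3.0])
high = predict_forest(forest, [17.0, 1.0, 3.0])
print(low, high)
```

Because the trees never share state, the loop in fit_forest could be run in
parallel, which is the property that separates bagging from boosting.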
Deep Learning LSTM (Long Short Term Memory)
With the Long Short Term Memory (LSTM) neural network algorithm we build and
train a model to predict the target geomagnetic value of unseen/missing
records.
Screenshots
To run this project, double-click the ‘run.bat’ file to get the screen below.
In the above screen we can see the dataset loaded. Now click the ‘Run Support
Vector Regression’ button to train the SVR model on the loaded dataset.
In the above screen we can see the SVR RMSE (root mean squared error). RMSE
measures the prediction error: the lower it is, the more accurately the
algorithm predicts the target value of a missing record. We can also see the
total dataset size and how many records the algorithm used for training and
testing. All algorithms in this application use 80% of the dataset records for
training and 20% for testing to compute the RMSE. Now click the ‘Run Random
Forest Algorithm’ button to get its RMSE.
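The 80/20 split and the RMSE computation can be sketched as follows. RMSE is
the square root of the mean of the squared differences between actual and
predicted values. This is a minimal illustration: the function names are
hypothetical, and shuffling the records before splitting is an assumption, as
the application's own splitting code is not shown here.

```python
import math
import random


def train_test_split_80_20(records, seed=42):
    """Shuffle a copy of the records and split them 80% / 20%."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# 100 placeholder records split into 80 for training and 20 for testing.
train, test = train_test_split_80_20(range(100))
print(len(train), len(test))  # 80 20

# Two exact predictions and one that is off by 2.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

RMSE is expressed in the same unit as the target value, so a lower RMSE means
the predicted values lie closer to the recorded ones.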
In the above screen we can see the Random Forest RMSE. Now click the ‘Run
Gradient Boosting Regression’ button to get its RMSE.
In the above screen we can see the Gradient Boosting RMSE. Now click the ‘Run
LSTM Deep Learning Algorithm’ button to get the LSTM RMSE.
In the above screen we can see the LSTM RMSE. Now click the ‘Upload Test Value
& Reconstruct Data For Missing Values’ button and upload the ‘test_dataset’
file. This file contains records whose target value is missing, and the
application will predict the target value for those records. See the records
below from the test file:
HYBX, HYBY, HYBZ, HYBF
80.00, 23.74, 70.33
46.58, 3.92, 71.95
In the above test dataset each record has only three values; the fourth
value, the target, is missing. The date, time, and sensor ID are not required,
so they are omitted.
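Reading such a test file could look like the sketch below. It assumes the
file is comma-separated text with one header line, as in the sample records
above; the actual layout of ‘test_dataset’ beyond that sample is not known,
and the function name read_test_records is hypothetical.

```python
import csv
import io

# The two sample records from the test file shown above.
TEST_FILE_TEXT = """HYBX, HYBY, HYBZ, HYBF
80.00, 23.74, 70.33
46.58, 3.92, 71.95
"""


def read_test_records(text):
    """Return (feature_names, target_name, feature_rows) from test-file text."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    # The last header column is the target, which the rows do not contain.
    feature_names, target_name = header[:-1], header[-1]
    rows = []
    for line in reader:
        if not line:
            continue  # skip blank lines
        values = [float(v) for v in line]
        if len(values) != len(feature_names):
            raise ValueError(
                f"expected {len(feature_names)} values, got {len(values)}"
            )
        rows.append(values)
    return feature_names, target_name, rows


feature_names, target_name, rows = read_test_records(TEST_FILE_TEXT)
print(feature_names, target_name, rows)
```

Each row returned here is a feature vector that a trained model would take as
input to predict the missing HYBF value.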
In the above screen I am uploading the ‘test_dataset’ file, and below are the
resulting values.
In the above graph the x-axis represents the algorithm names and the y-axis
represents the RMSE. From the graph we can see that LSTM has the lowest RMSE
and therefore the best prediction accuracy for missing values.