Rainfall Prediction Using Machine Learning
ISSN No:-2456-2165
Abstract:- As global warming increases, detecting and predicting rainfall is becoming a major problem in countries that lack access to proper technology. Accurate prediction can help them in several areas such as farming, health and drinking water, among many others. For this purpose we predict the rainfall of the coming year using the SVR, SVM and KNN machine learning algorithms and compare the results produced by each algorithm.

I. INTRODUCTION

As global warming has increased, so has the earth's temperature, and as a result the yearly rainfall patterns of local regions have also been affected. This harms the population living in these areas, such as farmers and other people who rely heavily on rainfall for all their water-based needs, so accurate prediction of rainfall is of utmost importance in these regions. While there are many ways of predicting rainfall, the one chosen for this study is to observe and collect rainfall data (in mm) from previous years and then predict the rainfall (in mm) for the coming year. Though it is not a foolproof method and may become unreliable due to factors such as a "sudden increase in CO2 levels", it is cheap and may help in countries where seasons are consistent and which are far behind the rest of the world in terms of technological advancement, such as the rural areas of those countries.

... called the kernel trick, mapping their inputs to high-dimensional feature spaces.

When data is unlabeled, supervised learning is not possible, and an unsupervised learning approach is required, which attempts to find a natural clustering of the data into groups and then map new data to these formed groups. The support-vector clustering algorithm applies the statistics of support vectors, developed in the support-vector machine algorithm, to categorize unlabeled data, and is one of the most widely used clustering algorithms in industrial applications.

SVR
A variant of SVM for regression was proposed in 1996 by Vladimir N. Vapnik, Harris Drucker, Christopher J. C. Burges, Linda Kaufman and Alexander J. Smola. This method is called Support-Vector Regression (SVR). The model produced by support-vector classification depends only on a subset of the training data, because the cost function for building the model does not depend on training data that lie beyond the margin. Similarly, the model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model's prediction.
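The way the SVR cost function ignores training points close to the prediction can be illustrated with the epsilon-insensitive loss. The following is a minimal sketch, not the code used in this study: the function name, the tube width `epsilon` and the rainfall values are all assumptions chosen for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample epsilon-insensitive loss: zero inside the tube, linear outside."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Hypothetical monthly rainfall values in mm (illustrative, not from the data set).
actual = [10.0, 52.5, 120.0, 300.0]
predicted = [12.0, 50.0, 140.0, 301.0]

# Assumed tube width of 5 mm: errors smaller than this cost nothing.
losses = epsilon_insensitive_loss(actual, predicted, epsilon=5.0)

# Points with zero loss lie inside the tube and do not influence the fitted model.
inside_tube = [loss == 0.0 for loss in losses]
```

Only the third point, whose error of 20 mm exceeds the assumed 5 mm tube, contributes to the loss, which is why the fitted model depends on just a subset of the training data.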
IJISRT19MY198 www.ijisrt.com 56
Volume 4, Issue 5, May – 2019 International Journal of Innovative Science and Research Technology
III. DATA SET

The data refers to district-wise rainfall (in mm) in India for the period 1951-2015. [2]

The data set is released under the National Data Sharing and Accessibility Policy (NDSAP) [2] and is contributed by Dr. K. Somasundar (Chief Data Officer), Ministry of Earth Science, India [1].

IV. MODELS AND THEIR RESULTS

Model formalization
The initial step is data cleaning, using Python as the main language in the software "Jupyter notebook" [8].

The data is then divided into training and testing data, i.e. data from 1951 to 2014 acts as training data and data of the year 2015 acts as testing data. We then train each machine learning model and obtain the results in the form of a graph.

Results
As can be seen from the graphs below, SVR is the most efficient and SVM is more efficient than KNN. This is because the data set being used is time-series based, and the SVR machine learning algorithm is the most accurate for the required task. For practical purposes, however, one can argue that SVM is the best: even though SVR is the most accurate, in real life there are errors in recorded data, and although we may not know their magnitude, there is always a bias in natural rainfall patterns. Hence a near-perfect model may prove to be inefficient, so the best approach would be to add a bias to the SVM model and create a lower and upper limit.

Graphs
The graphs plot rainfall (in mm) against the month of the year 2015.
Blue Line - Predicted Values
Green Line/Red Dots - Actual Values
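The year-based split described under Model formalization (1951 to 2014 for training, 2015 for testing) can be sketched as follows. This is a minimal illustration under assumed names: the record layout, the field names `year`, `month` and `rainfall_mm`, and the sample values are hypothetical and are not taken from the actual data set.

```python
def split_by_year(records, first_year=1951, test_year=2015):
    """Split records into training (first_year..test_year-1) and testing (test_year)."""
    train = [r for r in records if first_year <= r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


# Hypothetical records; the real data set holds district-wise rainfall in mm.
records = [
    {"year": 1951, "month": 1, "rainfall_mm": 12.0},
    {"year": 2014, "month": 7, "rainfall_mm": 310.5},
    {"year": 2015, "month": 1, "rainfall_mm": 9.8},
    {"year": 2015, "month": 7, "rainfall_mm": 280.1},
]

train, test = split_by_year(records)
```

Splitting by year rather than at random keeps the test year strictly after every training year, which suits time-series data such as this.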
Graph 1:- Graph plotted using SVM machine learning algorithm
Graph 2:- Graph plotted using SVR machine learning algorithm
Graph 3:- Graph plotted using KNN machine learning algorithm
V. CONCLUSION
REFERENCES