International Journal of Computer Applications (0975 – 8887)
Volume 61– No.14, January 2013

KNN Technique for Analysis and Prediction of Temperature and Humidity Data

Sagar S. Badhiye, Asst. Prof., Department of CT, YCCE, Nagpur
Nilesh U. Sambhe, Asst. Prof., Department of CT, YCCE, Nagpur
P. N. Chatur, PhD., Head, Department of CSE, GCOE, Amravati

ABSTRACT
This research investigates the data mining technique K-Nearest Neighbor (KNN) as a predictor for numerical series. The series used in the experiments come from climatic data, which is usually hard to forecast because of its uncertainty.

One approach to prediction is to spot patterns in the past, where it is known what followed them, and to verify them on more recent data. If a pattern is followed by the same outcome frequently enough, it can be concluded that the relationship is genuine. Because this approach does not assume any special knowledge or form of the regularities, the method is quite general and applicable to other series, not just climate.

The research aims at automated pattern spotting and applies the K-Nearest Neighbor technique to the prediction of temperature and humidity data for a specific region. The results of temperature and humidity prediction by K-Nearest Neighbor were satisfactory, given that no forecasting technique can be assumed to be 100 % accurate.

Keywords
Data Mining, K-Nearest Neighbor, Numerical Series.

1. INTRODUCTION
As computers, sensors and information distribution channels proliferate, there is an increasing flood of data [1]. However, the data is of little use unless it is analyzed and exploited. There is indeed little use in just gathering the telltale signals of a volcanic eruption, heart attack, or stock exchange crash unless they are recognized and acted upon in advance. This is where prediction steps in.

To be effective, a prediction system requires good input data, good pattern-spotting ability, and good evaluation of the discovered patterns, among other things. The input data needs to be preprocessed, and perhaps enhanced by a domain expert's knowledge [2].

The prediction algorithms can be provided by methods from statistics, machine learning, and analysis of dynamical systems, together known as data mining, which is concerned with extracting useful information from raw data. Predictions need to be carefully evaluated to see whether they fulfill criteria of significance, novelty, usefulness, etc. In other words, prediction is not an ad hoc procedure [3]. It is a process involving a number of premeditated steps and domains, all of which influence the quality of the outcome.

The process is far from automatic. A particular prediction task requires experimentation to assess what works best. Part of the assessment comes from intelligent but to some extent artful exploratory data analysis. If the task is poorly addressed by existing methods, the exploration might lead to the development of a new algorithm.

This research work describes how a data mining technique, K-Nearest Neighbor (KNN), is used to develop a system that uses numeric historical data to forecast the climate of a specific region or city. KNN [4], which is based on the Euclidean distance formula, is used to find hidden patterns inside the large dataset so as to turn the retrieved information into usable knowledge for predicting temperature and humidity values and for classifying the climate condition as Hot, Warm or Cold based on the predicted values. The classification task assigns each data record to one of three classes: Hot (temperature higher than 23°C), Warm (between 16°C and 23°C) or Cold (below 16°C) [5].

2. MATERIALS AND METHODS
2.1 The Dataset
The most important part of implementing any data-related project is the collection of proper data for analysis with the chosen technique (e.g. data mining). To test the algorithms in this research work, a large amount of temperature and humidity data was required, covering a large number of days or years. Hence, a dataset spanning three years was collected from the following website.
http://www.wunderground.com/history/airport/VANP/2011/10/17/DailyHistory.html

Main aspects of the data:
i. Data is recorded over 3 years and 2 months, from 01/01/2009 to 29/02/2012, at Nagpur, Sonegoan, India.
ii. The data for various parameters is obtained in Excel format from the website, as shown in Fig. 1. The required parameters, i.e. temperature and humidity, are extracted from this dataset and stored in Matlab files, which are then available for analysis.

Fig. 1 Sample Dataset
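
The three climate classes defined in Section 1 (Hot above 23°C, Warm between 16°C and 23°C, Cold below 16°C) can be sketched as a simple threshold rule. This is a minimal Python illustration and not the authors' Matlab code. The function name `classify_climate` is hypothetical, and treating exactly 16°C and exactly 23°C as Warm is an assumption, since the paper does not state how the boundaries are handled.

```python
def classify_climate(temperature_c):
    """Map a temperature in degrees Celsius to Hot, Warm or Cold.

    Thresholds follow Section 1 of the paper. Boundary handling
    (16 and 23 counted as Warm) is an assumption of this sketch.
    """
    if temperature_c > 23:
        return "Hot"
    if temperature_c >= 16:
        return "Warm"
    return "Cold"


# Illustrative inputs only, not values from the paper's dataset.
labels = [classify_climate(t) for t in (30.0, 20.0, 10.0)]
```

In the system described here, the input to such a rule would be a predicted temperature value rather than a measured one.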

2.2 Aggregation, Converting the Raw Data
The algorithm in this research work takes monthly temperature and humidity data as input. The data is available in Excel format, and for analysis of temperature and humidity variation throughout the year the monthly data needs to be aggregated in one file. After aggregation, three matrices were formed, for the years 2009, 2010 and 2011. The columns of each matrix represent the date (day of month/year), temperature and humidity, and each row holds the values for a particular day.

For temperature and humidity prediction, only these parameters are required from the raw dataset, and hence they must be extracted. Thus, the temperature and humidity data for each month is extracted and stored in matrix format, named after the particular month. It is assumed that the values of temperature or humidity in a particular month of a year will most closely resemble the values of that same month in any other year. Hence twelve matrices are created, one for each month of the year (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sept, Oct, Nov and Dec), each consisting of the temperature and humidity data of that month for the complete duration of three years 2009, 2010 and 2011. Data for more years can be added. These dataset matrices are then used for prediction of temperature and humidity.

2.3 Implementing K-Nearest Neighbor for temperature and humidity prediction
Input:
dataMatrix // Candidate trace data matrix for a particular month for a duration of 3 years (2009, 2010, 2011)
queryMatrix // Reference trace data matrix consisting of data for the 3 days preceding the day of prediction
K // Number of neighbors, K=4 in this research work
Output:
mtP[4] // Predicted temperature values for 4 days
mhP[4] // Predicted humidity values for 4 days

KNN Algorithm: // Algorithm to predict temperature and humidity
Step 1: Initialize variables
    numDataVectors = size of dataMatrix
    numQueryVectors = size of queryMatrix
Step 2: For i = 1 to numQueryVectors
    Calculate Euclidean distance.
    Sort Euclidean distances and neighborIds in ascending order.
    Calculate NeighborDistance(i) = sqrt(sortval(1 to K))
    End for loop
Step 3: For i = 1 to 3
    For j = 1 to 4
        tP(i, j) = dataMatrix(neighborIds(i, j), 2)
        hP(i, j) = dataMatrix(neighborIds(i, j), 3)
    end loop
    end loop
Step 4: Calculate predicted temperature and humidity
    mtP(j) = (tP(1, j) + tP(2, j) + tP(3, j)) / 3
    mhP(j) = (hP(1, j) + hP(2, j) + hP(3, j)) / 3
    return predicted temperature and humidity.
Step 5: Exit

2.4 Working of KNN Algorithm
Fig. 2 shows the working of the KNN algorithm for temperature and humidity prediction.

Fig. 2(a) shows two matrices, the dataMatrix and the queryMatrix. The first consists of temperature and humidity data for the month for which the prediction is to be made. For example, if the prediction is to be made for 28-2-2012, then dataMatrix consists of temperature and humidity values for the month of February for the years 2009, 2010, 2011 and 2012; the matrix so formed is of size 113 x 2. The queryMatrix consists of temperature and humidity values for 25-2-2012, 26-2-2012 and 27-2-2012. The KNN algorithm calculates the four nearest neighbors of the temperature and humidity data for each day of the queryMatrix. The indices of these neighbors for each day are shown in the neighbors matrix in Fig. 2(b), and the Euclidean distances are shown in the Dist matrix in Fig. 2(c). The ith row of the Dist matrix corresponds to the ith day in the queryMatrix, and the jth column corresponds to the jth nearest neighbor for that day.
Fig. 2 Working of KNN Algorithm
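
The procedure of Sections 2.3 and 2.4 can be sketched in Python as follows. This is a hedged illustration, not the authors' Matlab implementation: the function name `knn_predict` and the sample numbers are hypothetical, each row is assumed to be a (temperature, humidity) pair, and the averaging over the query days for each neighbor rank follows the description of the tP and hP matrices in Section 2.4.

```python
import math


def knn_predict(data_matrix, query_matrix, k=4):
    """Return (mtP, mhP, neighbor_ids, neighbor_dist).

    data_matrix:  historical (temperature, humidity) rows for one month.
    query_matrix: (temperature, humidity) rows for the days before the forecast.
    """
    neighbor_ids, neighbor_dist = [], []
    for q in query_matrix:
        # Euclidean distance from this query day to every historical day.
        dists = [math.sqrt((q[0] - r[0]) ** 2 + (q[1] - r[1]) ** 2)
                 for r in data_matrix]
        # Indices of the k closest historical days, nearest first.
        order = sorted(range(len(data_matrix)), key=lambda idx: dists[idx])[:k]
        neighbor_ids.append(order)
        neighbor_dist.append([dists[idx] for idx in order])

    n = len(query_matrix)
    # tP[i][j] / hP[i][j]: values of the j-th neighbor of query day i.
    tP = [[data_matrix[idx][0] for idx in ids] for ids in neighbor_ids]
    hP = [[data_matrix[idx][1] for idx in ids] for ids in neighbor_ids]
    # Average each column over the query days to get one value per forecast day.
    mtP = [sum(tP[i][j] for i in range(n)) / n for j in range(k)]
    mhP = [sum(hP[i][j] for i in range(n)) / n for j in range(k)]
    return mtP, mhP, neighbor_ids, neighbor_dist


# Illustrative values only, not taken from the paper's dataset.
history = [(20.0, 50.0), (22.0, 55.0), (30.0, 40.0), (21.0, 52.0), (25.0, 45.0)]
recent = [(20.0, 50.0), (30.0, 40.0)]
mtP, mhP, ids, dist = knn_predict(history, recent, k=2)
```

With K=4 and three query days, as in the paper, `mtP` and `mhP` would each hold four values, one per forecast day.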
Fig. 2(d) and (e) show the temperature and humidity values from the dataMatrix for the indices obtained in the neighbors matrix. The average of the values in the ith column of the tP and hP matrices gives the predicted temperature and humidity for the ith day. Fig. 2(e) shows the predicted values for temperature and humidity for four days, i.e. 28-2-2012 to 2-3-2012. The mean square error calculated for the above prediction was found to be 0.143 for temperature prediction and 3.601 for humidity prediction.

3. RESULTS
Fig. 3 shows the graphical user interface of the temperature and humidity prediction system using data mining.
Fig. 3 User Interface
Fig. 4 Plotted Graph for Temperature Analysis for the year 2009
3.1 Temperature and Humidity Analysis
Fig. 4 shows the plotted temperature data for the complete year 2009, showing the variation in temperature throughout the year. Similarly, graphs of temperature and humidity can be plotted for other years. Graphs can also be plotted for monthly data. The plotted graphs can thus be used by analysts and researchers in their work. For example, the graph shows that the temperature in Nagpur was highest during days 100 to 165 of the year 2009, i.e. from the 2nd week of April to the 2nd week of June, after which the temperature falls.

3.2 Adding new files
The database needs to be kept up to date, and hence new files can be added to the database by clicking on the ‘Add to Database’ button, after which the database can be used for analysis. Fig. 5 and Fig. 6 show the procedure for adding files to the database.

The data for the month of February 2012 was added to the database as shown in Fig. 5.
Fig. 5 Adding New Files to the Database
Fig. 6 Pop-up Window showing message that ‘Data is added to Database’
3.3 Clustering
In this research work the datasets were divided into a number of clusters based on the type of analysis required; the ‘Clustering’ button is used for this purpose. Fig. 7 shows the procedure to create a cluster for the month of April. The output of clustering is a data matrix of size 90 x 3, which consists of the date, temperature and humidity values of the month of April for the years 2009, 2010 and 2011. The cluster formed is shown in Fig. 8. Twelve such clusters, one for each month from January to December, are already created in the system based on the three-year dataset. In the same way, 38 clusters for all months from January 2009 to February 2012 were created, and 6 clusters for temperature and humidity, 2 for each of the years 2009, 2010 and 2011, were already stored in the database.
Fig. 7 Clustering of Datasets for April month
Fig. 8 Cluster for April Month
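
The monthly clusters described above amount to grouping daily records by calendar month across years. The following Python sketch shows one way this could be done. It is an assumption-laden illustration, not the system's code: the function name `group_by_month` is hypothetical, records are assumed to be (date, temperature, humidity) tuples, and the zero values are placeholders rather than measurements.

```python
from datetime import date, timedelta


def group_by_month(records):
    """Group (date, temperature, humidity) records by calendar month.

    Returns a dict mapping month number (1..12) to a list of records
    from all years, sorted by date.
    """
    months = {m: [] for m in range(1, 13)}
    for day, temperature, humidity in records:
        months[day.month].append((day, temperature, humidity))
    for rows in months.values():
        rows.sort(key=lambda row: row[0])
    return months


# Placeholder records for every day of 2009, 2010 and 2011 (3 x 365 days).
start = date(2009, 1, 1)
records = [(start + timedelta(days=n), 0.0, 0.0) for n in range(1095)]
clusters = group_by_month(records)
april = clusters[4]  # 30 days x 3 years = 90 rows of 3 values each
```

The April group produced this way has 90 rows of three values, which matches the 90 x 3 matrix size stated in the text.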

3.4 Prediction
Temperature and humidity for four days were predicted using the Temperature and Humidity Prediction System. The date from which the four-day prediction is to be made is entered in the pop-up window shown in Fig. 9, and when the ‘Ok’ button is clicked the predicted values for temperature and humidity appear in two pop-up windows: one showing the output in graphical form, with actual and predicted values plotted on the graph as shown in Fig. 10, and the other showing the predicted values of temperature and humidity for the four days as shown in Fig. 11. For the above prediction the mean square error was calculated; an error of 0.143 was found for temperature and 3.601 for humidity. Predicting such values for a number of samples, it was found that the results of KNN for temperature prediction were better than those for humidity prediction: accuracy between 88 % and 92 % was found for temperature prediction, and between 85 % and 90 % for humidity prediction.
Fig. 9 Input date for Temperature and Humidity Prediction
Fig. 10 Temperature and Humidity Graph with Actual and Predicted values
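
The mean square error used to compare actual and predicted values is the average of the squared differences between the two series. A minimal Python sketch is given below. The function name `mean_square_error` and the sample numbers are hypothetical, and the paper does not give its exact computation, so this is the standard definition rather than a confirmed reproduction of the authors' code.

```python
def mean_square_error(actual, predicted):
    """Average of squared differences between two equal-length series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Illustrative four-day series, not the values behind the reported errors.
actual_temperature = [24.0, 25.0, 26.0, 25.0]
predicted_temperature = [24.5, 25.0, 25.5, 25.0]
error = mean_square_error(actual_temperature, predicted_temperature)
```

Because the differences are squared, a single large miss, such as a sudden change in humidity, raises the error much more than several small ones.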
Fig. 11 Predicted Values for Temperature and Humidity

4. CONCLUSION
KNN can predict temperature and humidity with satisfying accuracy at times, but at other times the accuracy of prediction drops because of the uncertain behavior of climatic conditions. It was observed in some instances that for humidity prediction the difference between predicted and actual values was larger. This can be due to sudden changes in atmospheric humidity, which occur more frequently than sudden changes in temperature. Prediction of temperature and humidity was done for a number of days, and it was found that K-Nearest Neighbor produced satisfying results, with accuracy of 88% to 92% for temperature prediction and 85% to 90% for humidity prediction. The results obtained were satisfying, given that no forecasting system can be assumed to be 100 % accurate due to the uncertainty of climatic parameters.

5. FUTURE SCOPE
At present the system is able to predict temperature and humidity data for four days by using the K-Nearest Neighbor algorithm with satisfying accuracy of prediction. In future the following things can be implemented in this research:
i. Adding other climatic parameters such as dew point, pressure, light intensity etc. for prediction and increasing the duration of prediction.
ii. Using larger units of analysis, i.e. analyzing the results of K-Nearest Neighbor prediction on datasets from a number of other cities or places.
iii. Providing signal analysis tools for automatic analysis of variation in the pattern of temperature and humidity data and validating resulting output patterns to researchers.
iv. KNN can be combined with some other techniques such as Fuzzy Logic, which can increase the accuracy of prediction.
v. The software could be embedded with hardware and used as a complete unit of prediction. Additional features helpful to farmers can be implemented, such as prediction of the type of crop that should be planted based on the predicted values of atmospheric parameters.

6. REFERENCES
[1] Larose D. T.: Discovering Knowledge in Data: An Introduction to Data Mining, Wiley, Chichester 2005
[2] S. Kotsiantis et al., “Using Data Mining Techniques for Estimating Minimum, Maximum and Average Daily Temperature Values”, World Academy of Science, Engineering and Technology 2007 pp. 450-454
[3] Han J., Kamber M.: Data Mining Concepts and Techniques, Elsevier Science and Technology, Amsterdam 2006
[4] Cover T., Hart P. (1967) “Nearest neighbor pattern classification”. IEEE Trans Inform Theory Volume 13(1) pp. 21–27
[5] Badhiye S. S., et al., ‘Temperature and Humidity Data Analysis for Future Value Prediction using Clustering Technique: An Approach’, International Journal of Emerging Technology and Advanced Engineering, 2(1), pp. 88-91, 2012.