
Prediction of Plantation and Their Profits


Prediction of All India State-wise Production of Plantation and Their Profits

ABSTRACT:

In this project we build a dashboard that helps predict plantation in hectares and the
profits from its cultivation, state-wise across India. The application uses data mining
techniques to predict plantation and profits, considering datasets from the years 2002 to
2019 and predicting the result for the year 2018. The datasets cover 36 regions (29 states
plus 7 union territories) and their districts. The application covers each district's seasons
(summer, kharif, and rabi), and each season has different crops in each district. Based on
the year, we predict the result for each crop.

Introduction

In India, multiple types of plantation are grown across the states. India is famous for
spices. Every year we grow plantations (spices), and we make profits or losses based on
market values and other factors. In this project we build an application which helps
predict spice production in hectares and market values for the upcoming year.

Data mining is a particular data analysis technique that focuses on modeling and
knowledge discovery for predictive rather than purely descriptive purposes, while business
intelligence covers data analysis that relies heavily on aggregation, focusing on business
information.

Motivation

India is famous for spice plantation. Different spices are grown in every state, and the
profit or loss depends on the climate and other conditions. Our application collects
datasets from the different states and analyzes their state-wise plantation and profits.
With the help of this data we predict the current year's yield and profit, which helps the
government concentrate on particular states.

Existing System:

 The government website gives plantation details hectare-wise throughout each state.
It contains only previous years' datasets, which does not help the farmer or the
government analyze trends.

 Production keeps increasing and decreasing from year to year.

 There is no future prediction.

Proposed System:

 The model helps in predicting the result in hectares and in predicting the price.

 The model helps the government to take action based on the predicted result.

 Clustering gives state-wise results.

Objective:

 Analyzing the data for previous years and predicting the result based on the other
parameters.

 Differentiating plantation state-wise based on high and low spice rates.

 Calculating profit hectare-wise based on previous costs.

 Developing a dashboard for the above.

Parameters:

Season:

 Summer
 Kharif
 Rabi
 Whole year

Crop:

 Ragi
 Paddy
 Wheat
 Jowar
 Soyabean
 Cashewnut
 Coconut
 Cocoa, etc.

Literature review:

India is an agricultural country. Farmers are the life-blood of the nation, but the current
condition of a farmer in India is very pathetic. Today farmers are not able to enjoy the
yield produced by them. Farmers should be introduced to modern farming techniques,
because the welfare of the nation depends upon their well-being. Here, ICT plays a very
important role in meeting these challenges. Precision farming and modern society can
play important roles in promoting ICT in agriculture, but adoption is very slow due to a
number of unresolved issues discussed in earlier projects. The paper presents a generic
framework for an e-agricultural system comprising a knowledge management and
monitoring system, and also gives a brief description of the application interface.

India is a country where agriculture and agriculture-related industries are the major source
of living for the people. Agriculture is a major source of the country's economy. India also
suffers from major natural calamities like drought or flood, which damage the crops. This
leads to huge financial losses for farmers, sometimes driving them to suicide. Predicting
the crop yield well in advance of its harvest can help farmers and government
organizations to make appropriate plans such as storing, selling, fixing the minimum
support price, and importing or exporting. Predicting a crop well in advance requires a
systematic study of huge data coming from various variables like soil quality, pH, EC, N,
P and K. As crop prediction deals with large datasets, it is a perfect candidate for the
application of data mining. Through data mining we extract knowledge from huge
volumes of data. This paper presents a study of the various data mining techniques used
for predicting crop yield. The success of any crop yield prediction system relies heavily on
how accurately the features have been extracted and how appropriately classifiers have
been employed. The paper summarizes the results obtained by the various algorithms used
by various authors for crop yield prediction, with their accuracy and recommendations.
1. Title: “Data Mining Techniques and Applications to Agricultural Yield Data”.
International Journal of Advanced Research in Computer and Communication
Engineering Vol. 2, Issue 9, September 2013.
Author: D Ramesh, B Vishnu Vardhan.
Summary: In this paper author has focused on the applications of Data Mining
techniques in agricultural field. Different Data Mining techniques are used, such as
K-Means, K-Nearest Neighbor (KNN), Artificial Neural Networks (ANN) and
Support Vector Machines (SVM) for very recent applications of Data Mining
techniques in agriculture field. In this paper they have considered the problem of
predicting yield production. This work aims at finding suitable data models that
achieve a high accuracy and a high generality in terms of yield prediction capabilities.
For this purpose, different types of Data Mining techniques were evaluated on
different data sets.
2. Title: “Brief Survey Of Data Mining Techniques Applied To Applications Of
Agriculture”. International Journal of Advanced Research in Computer and
Communication Engineering Vol. 5, Issue 2, February 2016
Author: Ami Mistry, Vinita Shah
Summary: In this paper the authors present some of the most used data mining
techniques in the field of agriculture. The intersection of information technology and
agriculture will be an increasingly interesting area of research in the near future. The
main aim of this work is to improve and substantiate the validity of yield prediction,
which is useful for farmers. Agricultural crop production depends on various factors such
as biology, climate, economy and geography. Several factors have different impacts on
agriculture, which can be quantified using appropriate statistical methodologies.
Agronomic traits such as yield can be affected by a large number of variables. In this
survey, the authors analyzed data mining methods like clustering and classification
models to select the most relevant method for yield prediction.
3. Title: “Applying Data Mining Techniques To Predict Annual Yield Of Major Crops
And Recommend Planting Different Crops In Different Districts In Bangladesh”.
Department of Electrical and Computer Engineering, North South University,
Bangladesh.
Author: A.T.M Shakil Ahamed, Navid Tanzeem Mahmood, Nazmul Hossain
Summary: In this paper, the authors focus on the application of data mining techniques to
extract knowledge from agricultural data to estimate crop yield for major cereal crops in
major districts of Bangladesh. The prediction is done using clustering (K-means), the
K-NN algorithm, linear regression and neural networks.
4. Title: “Analysis of Soil Behavior and Prediction of Crop Yield Using Data Mining
Approach”. 2015 International Conference on Computational Intelligence and
Communication Networks.
Author: Monali Paul, Santosh K, Vishvakarma and Ashok Verma
Summary: This work presents a system, which uses data mining techniques in order
to predict the category of the analyzed soil datasets. The category, thus predicted will
indicate the yielding of crops. The problem of predicting the crop yield is formalized
as a classification rule, where Naive Bayes and K-Nearest Neighbor methods are
used.

Source of data Collection:

Datasets

A dataset (or data set) is a collection of data, usually presented in tabular form. Each column
represents a particular variable, and each row corresponds to a given member of the dataset
in question. It lists values for each of the variables, such as the height and weight of an
object. In the development of the predictive model, the datasets were collected in secondary
form. Secondary data means statistical material or information not originated or obtained by
the investigator himself, but obtained from someone else's records or published sources such
as central government agencies.

Data source:

https://data.gov.in/search/site?query=planttion+an+profits

Feasibility Study
The feasibility study helps to find solutions to the problems of the project and describes
what the new system will look like.

Technical Feasibility
The project entitled “Prediction of plantation and Profit” is technically feasible because of
the features mentioned below. The project is developed in Java. The web server used to
develop “Prediction of plantation and Profit” is a local server. The local server neatly
coordinates the design and coding parts: it provides a graphical user interface to design the
application while the coding is done in Java. At the same time, it provides a high level of
reliability, availability and compatibility.

Economic Feasibility

In economic feasibility, a cost-benefit analysis is done in which expected costs and benefits
are evaluated. Economic analysis is used to judge the effectiveness of the proposed system,
and the cost-benefit analysis is its most important part. The system “Prediction of plantation
and Profit using Data Mining Techniques” is feasible because it does not exceed the
estimated cost and the estimated benefits justify that cost.

Operational Feasibility

The project entitled “Prediction of plantation and Profit using Data Mining Techniques” is
operationally feasible because of the features mentioned below. The system predicts
plantation and profit from the collected cultivation data, and the details are stored in the
database. The performance of the data mining techniques is compared based on execution
time and displayed through graphs.


Behavior Feasibility

The project entitled “Prediction of plantation and Profit using Data Mining Techniques”
is beneficial because it satisfies the objectives when developed and installed.
SOFTWARE REQUIREMENT ANALYSIS

2.1 INTRODUCTION TO SRS

The introduction of the Software Requirements Specification (SRS) provides an overview of
the entire SRS with purpose, scope, definitions, acronyms, abbreviations, references and an
overview of the SRS. The aim of this document is to gather, analyze, and give an in-depth
insight into the complete “Prediction of plantation and Profit” system by defining the
problem statement in detail. The detailed requirements of the Prediction of plantation and
Profit user-related functions are provided in this document.

2.2 PURPOSE

The purpose of the Software Requirements Specification for Prediction of plantation and
Profit is to describe the technical, functional and non-functional features required to develop
the web application. In short, the purpose of this SRS document is to provide a detailed
overview of our software product, its parameters and goals. This document describes the
project’s target audience and its user interface, hardware and software requirements. It
defines how our client, team and audience see the product and its functionality.

Scope

The scope of this system is to present a review of the data mining techniques used for the
Prediction of plantation and Profit. It is evident from the system that data mining
techniques, such as classification, are highly efficient in the Prediction of plantation and
Profit.
SOFTWARE ARCHITECTURE:

Technology used

NetBeans :

It is an integrated development environment (IDE) for Java. NetBeans allows applications to
be developed from a set of modular software components called modules. NetBeans runs on
Microsoft Windows, macOS, Linux and Solaris. In addition to Java development, it has
extensions for other languages like PHP, C, C++, HTML5, Javadoc and JavaScript.
Applications based on NetBeans, including the NetBeans IDE, can be extended by third-party
developers.

JAVA Servlets:
Java Servlets are server-side Java program modules that process and answer client requests
and implement the servlet interface. It helps in enhancing Web server functionality with
minimal overhead, maintenance and support. A servlet acts as an intermediary between the
client and the server. As servlet modules run on the server, they can receive and respond to
requests made by the client. Request and response objects of the servlet offer a convenient
way to handle HTTP requests and send text data back to the client. Since a servlet is
integrated with the Java language, it also possesses all the Java features such as high
portability, platform independence, security and Java database connectivity.

JSP (Java Server Pages):

Java Server Pages (JSP) is a technology for developing webpages that support dynamic
content. It helps developers insert Java code into HTML pages by making use of special JSP
tags, most of which start with <% and end with %>. A Java Server Pages component is a type
of Java servlet that is designed to fulfill the role of a user interface for a Java web application.
Web developers write JSPs as text files that combine HTML or XHTML code, XML
elements, and embedded JSP actions and commands. Using JSP, you can collect input from
users through webpage forms, present records from a database or another source, and create
webpages dynamically.

Highcharts:

Highcharts is a pure JavaScript based charting library meant to enhance web applications by
adding interactive charting capability. Highcharts provides a wide variety of charts. For
example, line charts, spline charts, area charts, bar charts, pie charts and so on.

MySql:

MySQL is an open source relational database management system (RDBMS) based on
Structured Query Language (SQL). MySQL runs on virtually all platforms, including Linux,
UNIX, and Windows. Although it can be used in a wide range of applications, MySQL is
most often associated with web-based applications and online publishing, and is an important
component of the open source enterprise stack called LAMP. LAMP is a web development
platform that uses Linux as the operating system, Apache as the web server, MySQL as the
relational database management system and PHP as the object-oriented scripting language.

Functional Requirements:

Pre-Processing:

Data pre-processing is a data mining technique used to transform raw data into a useful and
efficient format. The datasets collected from the above website have some null values, which
have to be filled using pre-processing techniques.
Steps in data Preprocessing:

 Data cleaning

The data can have many irrelevant and missing parts. To handle this, data cleaning is
done. It involves handling missing data, noisy data, etc.

 Missing Data

This situation arises when some values are missing in the data.

 Noisy Data

Noisy data is meaningless data that cannot be interpreted by machines. It can be
generated due to faulty data collection, data entry errors, etc.
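As a rough illustration of the missing-value handling described above, the following Java sketch fills entries recorded as 0 with the mean of the remaining values in the same column. The class and method names are illustrative, not the project's actual code:

```java
import java.util.Arrays;

public class Preprocess {

    // Replace missing entries (recorded as 0 in the dataset, as noted in the
    // methodology) with the mean of the non-missing values in the column.
    public static double[] fillMissing(double[] column) {
        double sum = 0;
        int count = 0;
        for (double v : column) {
            if (v != 0) {          // 0 marks a missing value
                sum += v;
                count++;
            }
        }
        double mean = (count == 0) ? 0 : sum / count;
        double[] cleaned = Arrays.copyOf(column, column.length);
        for (int i = 0; i < cleaned.length; i++) {
            if (cleaned[i] == 0) cleaned[i] = mean;
        }
        return cleaned;
    }
}
```

For example, `fillMissing(new double[]{10, 0, 20, 30})` fills the missing second entry with the mean of 10, 20 and 30.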

Data Model:

After the preprocessing steps, the attributes are selected based on the result. In our
application we used K-means clustering and polynomial regression.
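Since the data model mentions polynomial regression, a minimal sketch of a quadratic least-squares fit in Java is shown below. It solves the 3x3 normal equations with Cramer's rule; the class name and API are assumptions for illustration, not the project's actual code:

```java
public class PolyFit {

    // Fit y = c0 + c1*x + c2*x^2 by least squares, solving the 3x3
    // normal equations with Cramer's rule. Returns {c0, c1, c2}.
    public static double[] fitQuadratic(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sx2 = 0, sx3 = 0, sx4 = 0, sy = 0, sxy = 0, sx2y = 0;
        for (int i = 0; i < n; i++) {
            double xi = x[i], xi2 = xi * xi;
            sx += xi; sx2 += xi2; sx3 += xi2 * xi; sx4 += xi2 * xi2;
            sy += y[i]; sxy += xi * y[i]; sx2y += xi2 * y[i];
        }
        double[][] m = {{n, sx, sx2}, {sx, sx2, sx3}, {sx2, sx3, sx4}};
        double[] b = {sy, sxy, sx2y};
        double d = det(m);
        double[] c = new double[3];
        for (int j = 0; j < 3; j++) {
            // Cramer's rule: replace column j of m with b
            double[][] mj = {m[0].clone(), m[1].clone(), m[2].clone()};
            for (int r = 0; r < 3; r++) mj[r][j] = b[r];
            c[j] = det(mj) / d;
        }
        return c;
    }

    private static double det(double[][] m) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    }
}
```

Fitting data that lies exactly on y = 1 + 2x + 3x² recovers those coefficients, which is a quick sanity check of the normal equations.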

Visualization:

 The obtained results are shown with visualizations which give the complete report.
SYSTEM DESIGN

Introduction to Design document


The Software Design will be used to aid software development for the application by
providing the details of how the application should be built. Within the Software Design,
specifications are narrative and graphical documentation of the software design for the
project, including use case models, sequence diagrams and other supporting requirement
information.

Scope
This Software Design Document is for a base-level system which will work as a proof of
concept for building a system that provides a base level of functionality, to show
feasibility for large-scale production use. In this Software Design Document, the focus is
placed on the generation and modification of the documents. The system will be used in
conjunction with other pre-existing systems and will consist largely of a document
interaction facade that abstracts document interactions and the handling of the document
objects. This document provides the design specifications of Plantation and Profits.
Work flow Diagram
The approach we took for our study follows the traditional data analysis steps
Data Preparation
Missing Value Numeric  Nominal

Modeling

Regression K-means Clustering

Visualization/Result Analysis

Work Flow:

Methodology:

DATA PREPARATION
Data preparation was performed before each model construction. All records with missing
values (usually represented by 0 in the dataset) in the chosen attributes were removed. All
numerical values were converted to nominal values according to the data dictionary.

 Missing Values:
Occur when no data value is stored for the observation.
Modeling
We first calculate several statistics from the dataset to show the basic characteristics of
the plantation data, then apply regression and clustering to find relationships among the
attributes and the patterns.

Result Analysis
The results of our analysis include prediction rules among the variables and clustering of
states and districts in India based on their plantation totals and rates. We used the charting
tool Highcharts to present these analyses.

DATA FLOW DIAGRAM

A data flow diagram (DFD) is a way of representing the flow of data of a process or a
system (usually an information system). The DFD also provides information about the
outputs and inputs of each entity and the process itself. A data flow diagram has no control
flow: there are no decision rules and no loops. Specific operations based on the data can be
represented by a flowchart. There are several notations for displaying data flow diagrams.

DFD notation: function, database, flow.

DFD: Level 0

DFD Level 0: Clustering

ACTIVITY DIAGRAM

An activity diagram visually presents a series of actions or flow of control in a system,
similar to a flowchart or a data flow diagram. Activity diagrams are often used in business
process modelling. They can also describe the steps in a use case diagram. Activities
modelled can be sequential and concurrent. In both cases an activity diagram will have a
beginning (an initial state) and an end (a final state).

ACTIVITY DIAGRAM:
CLUSTERING:

4.1 SEQUENCE DIAGRAM


Sequence diagrams describe interactions among classes in terms of an exchange of
messages over time. They're also called event diagrams. A sequence diagram is a good way
to visualize and validate various runtime scenarios. These can help to predict how a system
will behave and to discover responsibilities a class may need to have in the process of
modelling a new system.

Sequence Diagram:

USE CASE DIAGRAM:
DATABASE Table:

Attribute        Data Type   Size

States           varchar     100
District Name    varchar     100
Crop Year        int         10
Season           varchar     20
Crop             int         10
Plantation       int         10
Profits          int         10

TABLE: PLANTATION

IMPLEMENTATION
4.1 Introduction

The project is implemented using Java, which is an object-oriented programming language.
Object-oriented programming is an approach that provides a way of modularizing programs
by creating partitioned memory areas for both data and functions that can be used as a
template for creating copies of such modules on demand.

This project is implemented using the Java programming language. Both servlet and JSP
technologies are used to create the web application. Servlets are precompiled Java programs
which can create dynamic web content. There are many interfaces and classes in the servlet
API, such as HttpServlet, ServletRequest and ServletResponse. JSP is used to create a web
application just as servlets are; it can be thought of as an extension to servlets because it
provides more functionality. MySQL Server is used as the backend.

4.2 Overview

A regression algorithm is designed to find the historical relationship between an
independent and a dependent variable to predict the future values of the dependent
variable. A regression models the past relationship between variables to predict their
future behavior.

The K-means clustering algorithm was used to investigate the high- and low-frequency
plantation locations, and to recognize how the various factors related to plantation differ
between places with varying plantation occurrences.
Result based on parameters

Two algorithms are implemented for plantation. The linear regression algorithm helps in
predicting the result based on the parameters; in this application we use all-India results,
district-wise and season-wise. The clustering algorithm is used to differentiate between
the districts year-wise.

Result based on k-means clustering

K-means clustering was implemented to distinguish between high- and low-frequency
datasets for every state. The algorithm's result helps in finding the high and low
crop-growing districts, state-wise.

4.3 Algorithm to depict overall steps:

Algorithm for Regression Techniques

Input: plantation data set from 2010-2020

Parameters:

Season:

 Summer
 Kharif
 Rabi
 Whole year

Crop:

 Ragi
 Paddy
 Wheat
 Jowar
 Soyabean
 Cashewnut
 Coconut
 Cocoa, etc.

Output: Prediction of the result for 2020, shown with a bar graph.

Procedure:

Step 1: Scan the transaction dataset.

Step 2: Handle missing data (records with missing values are represented as 0 in the
dataset).

Step 3: Estimate the mean and the variance of both the input and output variables from
the training data.

Step 4: Store the mean and variance results for the x-axis.

Step 5: Similarly, store the results for the y-axis.

Step 6: Calculate the covariance using the mean results of the x and y axes.

Step 7: Calculate the coefficient values b1 and b0 using the covariance.

Step 8: Once the coefficients are estimated, make the prediction:

Y = b0 + b1(x).
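Steps 3 to 8 can be sketched in Java as follows; the class and method names are illustrative, not the project's actual code:

```java
public class SimpleRegression {

    // Steps 3-7: estimate b0 and b1 from training data. Returns {b0, b1}.
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double xbar = 0, ybar = 0;
        for (int i = 0; i < n; i++) { xbar += x[i]; ybar += y[i]; }
        xbar /= n;                                  // mean of x (Step 4)
        ybar /= n;                                  // mean of y (Step 5)
        double cov = 0, varx = 0;
        for (int i = 0; i < n; i++) {
            cov  += (x[i] - xbar) * (y[i] - ybar);  // covariance (Step 6)
            varx += (x[i] - xbar) * (x[i] - xbar);  // variance of x
        }
        double b1 = cov / varx;                     // coefficients (Step 7)
        double b0 = ybar - b1 * xbar;
        return new double[]{b0, b1};
    }

    // Step 8: Y = b0 + b1(x)
    public static double predict(double[] coef, double x) {
        return coef[0] + coef[1] * x;
    }
}
```

Fitting points that lie exactly on y = 2x + 1 returns b0 = 1 and b1 = 2, which is a quick check of the covariance formula.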

4.3.2 K-means clustering

Input: plantation data set from 2010-2020


Parameters:

 State

 District

 Crop year

 Season

 Crop

 Plantation (hectares)

 Profit

Output: clusters of high- and low-frequency plantation, district-wise.

Procedure:

Step 1: Scan the transaction dataset.

Step 2: Calculate the average value from the total plantation across the total number of
states.

Step 3: Assume that average value is the centroid 'c', and select 'C' cluster centers.

Step 4: Calculate the distance between each data point and the cluster centers.

Step 5: Assign each data point to the cluster center whose distance from it is the
minimum over all the cluster centers.

Step 6: Recalculate each new cluster center as the mean of the data points assigned to it,
vi = (1/ci) Σ xj, where 'ci' represents the number of data points in the ith cluster.

Step 7: Recalculate the distance between each data point and the newly obtained cluster
centers.

Step 8: If no data point was reassigned then stop; otherwise repeat from Step 5.
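The procedure above can be sketched in Java for the two-cluster, one-dimensional case (high versus low plantation totals per state). The class name and the choice of the first two values as initial centers are illustrative assumptions:

```java
public class TwoMeans {

    // One-dimensional k-means with k = 2: split the per-state totals into
    // a low and a high cluster. Returns the final centroids {low, high}.
    public static double[] cluster(double[] a) {
        double c1 = a[0], c2 = a[1];   // initial cluster centers (Step 3)
        int[] label = new int[a.length];
        boolean changed = true;
        while (changed) {              // Step 8: stop when nothing moves
            changed = false;
            // Steps 4-5: assign each point to the nearer center
            for (int i = 0; i < a.length; i++) {
                int l = Math.abs(a[i] - c1) <= Math.abs(a[i] - c2) ? 0 : 1;
                if (l != label[i]) { label[i] = l; changed = true; }
            }
            // Step 6: recompute each center as its cluster mean
            double s1 = 0, s2 = 0;
            int n1 = 0, n2 = 0;
            for (int i = 0; i < a.length; i++) {
                if (label[i] == 0) { s1 += a[i]; n1++; }
                else               { s2 += a[i]; n2++; }
            }
            if (n1 > 0) c1 = s1 / n1;
            if (n2 > 0) c2 = s2 / n2;
        }
        return new double[]{Math.min(c1, c2), Math.max(c1, c2)};
    }
}
```

On the values {1, 2, 10, 11} the loop converges to the centroids 1.5 and 10.5, i.e. a low cluster {1, 2} and a high cluster {10, 11}.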

Pseudo code for Algorithm used

Regression Algorithm

// Initialization
x[] ← x-axis values (years)
y[] ← y-axis values
n ← length of x[]

// First pass: accumulate sums
sumx ← 0, sumy ← 0, sumx2 ← 0
for i ← 0 to n - 1
    sumx ← sumx + x[i]
    sumy ← sumy + y[i]
    sumx2 ← sumx2 + x[i] * x[i]
xbar ← sumx / n
ybar ← sumy / n

// Second pass: squared deviations and cross-products
xxbar ← 0, yybar ← 0, xybar ← 0
for i ← 0 to n - 1
    xxbar ← xxbar + (x[i] - xbar) * (x[i] - xbar)
    yybar ← yybar + (y[i] - ybar) * (y[i] - ybar)
    xybar ← xybar + (x[i] - xbar) * (y[i] - ybar)
slope ← xybar / xxbar              // b1
intercept ← ybar - slope * xbar    // b0

// More statistical analysis
rss ← 0    // residual sum of squares
ssr ← 0    // regression sum of squares
for i ← 0 to n - 1
    fit ← slope * x[i] + intercept
    rss ← rss + (fit - y[i]) * (fit - y[i])
    ssr ← ssr + (fit - ybar) * (fit - ybar)
Example

Input:

Consider the datasets of Plantation from 2008 to 2017

Table:
X(year) Y(value)
2008 3496
2009 3500
2010 3987
2011 2987
2012 3019
2013 3999
2014 4015
2015 4786
2016 4018
2017 4445

// Calculate the mean of X and Y

Sum of x = 20125

Sum of y = 38252

Mean of X = 20125/10 = 2012.5

Mean of Y = 38252/10 = 3825.2

(The means are rounded to 2012 and 3825 in the table below.)


X (year)  Y (value)  A1 = x - 2012  B1 = y - 3825  A1*B1   (A1)²  (B1)²
2008      3496       -4             -329           1316    16     108241
2009      3500       -3             -325           975     9      105625
2010      3987       -2             162            -324    4      26244
2011      2987       -1             -838           838     1      702244
2012      3019       0              -806           0       0      649636
2013      3999       1              174            174     1      30276
2014      4015       2              190            380     4      36100
2015      4786       3              961            2883    9      923521
2016      4018       4              193            772     16     37249
2017      4445       5              620            3100    25     384400
Total                5              2              10114   85     3003536

Y = b0 + b1(x)   // x is the prediction (independent) value

b1 = Σ(A1*B1) / Σ(A1)²

   = 10114/85 ≈ 118.99

b0 = mean of y - b1(mean of x)

   = 3825 - (118.99 × 2012)

   ≈ -235582.9

Y = b0 + b1(X)

Y = -235582.9 + (118.99 × 2020)

Y ≈ 4777   // this is the predicted value for 2020

Output:
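The worked example can be re-checked with a short program. This sketch follows the table's rounded means (2012 and 3825), so its coefficients differ slightly from an exact least-squares fit; the class and method names are illustrative:

```java
public class ExampleCheck {

    // Recompute b1, b0 and the 2020 prediction from the example table,
    // using the rounded means 2012 and 3825 as the table does.
    public static double predict2020() {
        double[] x = {2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017};
        double[] y = {3496, 3500, 3987, 2987, 3019, 3999, 4015, 4786, 4018, 4445};
        double xbar = 2012, ybar = 3825;   // rounded means from the table
        double sab = 0, sa2 = 0;
        for (int i = 0; i < x.length; i++) {
            sab += (x[i] - xbar) * (y[i] - ybar);   // sum of A1*B1 = 10114
            sa2 += (x[i] - xbar) * (x[i] - xbar);   // sum of (A1)^2 = 85
        }
        double b1 = sab / sa2;           // about 118.99
        double b0 = ybar - b1 * xbar;    // about -235582.9
        return b0 + b1 * 2020;           // about 4777
    }
}
```

Running this gives a 2020 estimate of roughly 4777 hectares under these rounded means.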

Pseudo code for k-means clustering

// Initialization
n ← number of states
A[] ← load the sub-category (plantation) value for each state
a ← A[0]    // initial center of cluster 1
b ← A[1]    // initial center of cluster 2

do
    cluster1[] ← empty
    cluster2[] ← empty

    // Assignment: each value goes to the nearer center
    for i ← 0 to n - 1
        if |A[i] - a| <= |A[i] - b|
            add A[i] to cluster1[]
        else
            add A[i] to cluster2[]

    // Update: recompute each center as its cluster mean
    sum1 ← 0
    for each v in cluster1[]
        sum1 ← sum1 + v
    sum2 ← 0
    for each v in cluster2[]
        sum2 ← sum2 + v
    mean1 ← sum1 / length of cluster1[]    // new center 1
    mean2 ← sum2 / length of cluster2[]    // new center 2
    a ← mean1
    b ← mean2
while any value changed cluster

print cluster1[]    // high-frequency plantation states
print cluster2[]    // low-frequency plantation states
Testing

6.1 Introduction

Web applications run on devices with limited memory, CPU power and power supply. The
behaviour of the application also depends on external factors like connectivity, general
system utilization, etc.

Therefore, it is very important to debug, test and optimize a web application. Having
reasonable test coverage for a web application helps to enhance and maintain it. As it is not
possible to test web applications on all possible device configurations, it is common practice
to run them on typical device configurations. The application should be tested on at least one
device with the lowest possible configuration, and in addition on one device with the highest
available configuration (e.g., pixel density, screen resolution) to ensure that it works fine on
these devices.

6.2 Testing Concepts

Web application testing is based on units. In general, a unit test is a method whose
statements test a part of the application. Test methods are organized into classes called test
cases, and test cases are grouped into test suites.

6.2.1 Unit tests

Local Unit Tests


Unit tests that run on the local machine only. These tests are compiled to run locally on
NetBeans to minimize execution time. Use this approach to run unit tests that have no
dependencies on the web framework, or that have dependencies which mock objects can
satisfy.

Instrumented unit tests

Unit tests that run on a device. These tests have access to instrumentation information, such
as the Context of the application under test. Use this approach to run unit tests that have web
application dependencies which mock objects cannot easily satisfy.

6.2.2 Integration Tests

This type of test verifies that the target app behaves as expected when a user performs a
specific action or enters a specific input in its activities. For example, it allows checking that
the target app returns the correct UI output in response to user interactions in the app’s
activities. UI testing frameworks like Espresso allow programmatically simulating user
actions and testing complex intra-app user interactions.

6.2.3 Cross-app Components

This type of test verifies the correct behaviour of interactions between different user apps or
between user apps and system apps. For example, one might want to test that the app behaves
correctly when the user performs an action in the Settings menu. UI testing frameworks that
support cross-app interactions, such as UI Automator, allow creating tests for such scenarios.

6.3 Test Cases:


A test case is a set of conditions or variables under which a tester will determine whether a
system under test satisfies requirements or works correctly. The process of developing test
cases can also help find problems in the requirements or design of an application.

The following table shows the various test case scenarios that are generated, along with the
required inputs for the given scenarios, expected outputs, actual outputs, and whether the
test passes or fails.

Test cases with positive scenarios:

TC No  Scenario                 Required Input      Expected Output              Actual Output           Test Result
1      Enter prediction values  Enter valid values  Should predict successfully  Predicted successfully  Pass
2      Enter clustering values  State, year, type   Should cluster successfully  Clustered successfully  Pass
3      Enter prediction values  Enter valid values  Should predict successfully  Database error          Fail
4      Enter clustering values  State, year, type   Should cluster successfully  Database error          Fail