Leaf Disease Detection Using Machine Learning and Python
INTRODUCTION
farming activities directly or indirectly. Many varieties of fruits, cereals and vegetables are
produced here and exported to other countries. Hence it is necessary to produce high
quality products with an optimum yield. As diseases of the plants are unavoidable,
detection of plant diseases is essential in the field of Agriculture. In plants, diseases can be
found in various parts such as fruits, stems and leaves. The main diseases of plants are
viral, fungal and bacterial diseases such as Alternaria, Anthracnose, bacterial spot and
canker. Viral disease is attributed to environmental changes, fungal disease to the
presence of fungus in the leaf, and bacterial disease to the presence of germs in the leaf or
plant. The proposed framework can be used to identify leaf diseases. Automatic detection
of plant diseases is an essential area since it is able to automatically detect the diseases
photographs of plant leaves, in which color channel manipulation & Boolean operation are
segmentation of plant leaf disease symptoms in which the histograms of the H and color
crop leaf spot disease images by integrating local threshold and seeded region growing.
Singh and Misra proposed detection of plant leaf diseases using soft computing
leaf disease segmentation approach in which texture based clustering for segmentation is
done. Du & Zhang proposed a technique to segment leaf image with non-uniform
illumination based on maximum entropy and genetic algorithm. Dhaygude & Kumbhar
proposed agricultural plant leaf disease detection using image processing in which the
texture statistics are computed from spatial gray-level dependence matrices (SGDM). Diao
et al. reviewed different methods, including edge-based, region-based and Artificial Neural
Network (ANN) approaches, for segmentation of plant disease spots. Different methods for
automatic leaf image segmentation and disease identification have been proposed in the
literature. In this system, an automatic method of leaf disease classification and prediction is
Texture features are extracted using GLCM and then classification is done using SVM and
processing, digital image processing has many advantages over analog image processing.
It allows a much wider range of algorithms to be applied to the input data and can avoid
problems such as the build-up of noise and signal distortion during processing. Since
images are defined over two dimensions (perhaps more) digital image processing may be
Image processing is a method of converting an image into digital form and performing
operations on it in order to obtain an enhanced image or to extract useful information
from it. It is a type of signal processing in which the input is an image, such as a video frame or
photograph, and the output may be an image or characteristics associated with that image.
An image processing system usually treats images as two-dimensional signals
and applies established signal processing methods to them. It is among the rapidly
growing technologies today, with applications in various aspects of business. Image
processing also forms a core research area within the engineering and computer science
disciplines.
The two types of methods used for image processing are analog and digital image
processing. Analog, or visual, techniques of image processing can be used for hard
copies such as printouts and photographs. Image analysts use various fundamentals of
interpretation while using these visual techniques. The image processing is not just
another important tool in image processing through visual techniques. So analysts apply
deficiencies. To overcome such flaws and recover the original information, the data has to
undergo various phases of processing. The three general phases that all types of data
undergo when using digital techniques are pre-processing, enhancement and
specific task. Machine learning algorithms build a mathematical model of sample data,
explicitly programmed to perform the task. Machine learning algorithms are used in the
methods, theory and application domains to the field of machine learning. Data mining is
Machine learning tasks are classified into several broad categories. In supervised
learning, the algorithm builds a mathematical model of a set of data that contains both
the inputs and the desired outputs. For example, if the task were determining whether an
image contained a certain object, the training data for a supervised learning algorithm
would include images with and without that object (the input), and each image would
have a label (the output) designating whether it contained the object. In special cases, the
learning algorithms develop mathematical models from incomplete training data, where
Classification algorithms are used when the outputs are restricted to a limited set of
values. For a classification algorithm that filters emails, the input would be an incoming
email, and the output would be the name of the folder in which to file the email. For an
algorithm that identifies spam emails, the output would be the prediction of either
zero. Regression algorithms are named for their continuous outputs, meaning they may
have any value within a range. Examples of a continuous value are the temperature,
which contains only inputs and no desired outputs. Unsupervised learning algorithms are
used to find structure in the data, like grouping or clustering of data points. Unsupervised
learning can discover patterns in the data, and can group the inputs into categories, as
Active learning algorithms access the desired outputs (training labels) for a limited set of
inputs based on a budget, and optimize the choice of inputs for which it will acquire
training labels. When used interactively, these can be presented to a human user for
labeling. Reinforcement learning algorithms are given feedback in the form of positive
algorithms in machine learning include topic modeling, where the computer program is
given a set of natural language documents and finds other documents that cover similar
topics. Machine learning algorithms can be used to find the unobservable probability
density function in density estimation problems. Meta learning algorithms learn their
learning algorithms generate their own sequences of learning experiences, also known as
social interaction with humans. These robots use guidance mechanisms such as active
Savita N. Ghaiwat et al. present a survey of different classification techniques that can be
used for plant leaf disease classification. For a given test example, the k-nearest-neighbor
method seems to be suitable as well as the simplest of all algorithms for class prediction.
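
The k-nearest-neighbor idea mentioned in the survey can be illustrated with a minimal sketch. The feature vectors, their values and the labels below are made up for illustration, and the helper name knn_predict is hypothetical; Euclidean distance and k = 3 are assumptions, not choices stated in the survey.

```python
from collections import Counter
import math


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training samples nearest to query."""
    # Euclidean distance from the query to every training sample.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest samples and let them vote on the class.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors with made-up values.
train_features = [(0.10, 0.90), (0.15, 0.85), (0.20, 0.80),
                  (0.80, 0.20), (0.85, 0.25), (0.90, 0.10)]
train_labels = ["healthy", "healthy", "healthy",
                "diseased", "diseased", "diseased"]

prediction = knn_predict(train_features, train_labels, (0.82, 0.18), k=3)
print(prediction)
```

Because the method stores the training samples and defers all work to prediction time, it needs no training step, which is one reason it is often described as the simplest classifier.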
There are four main steps in the developed processing scheme. In the first step,
a color transformation structure is created for the input RGB image, because RGB is
used for color generation and the transformed or converted image of RGB, that is, HSI, is
used as a color descriptor. In the second step, by using a threshold value, green pixels are
green pixels and masking is done for the useful segments that are extracted first in this
step, while the image is segmented. In the fourth and last main step, the segmentation is done.
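
The color transformation and green-pixel masking steps can be sketched as follows. This is only an illustration under stated assumptions: it uses the HSV model from Python's standard colorsys module as a stand-in for HSI, and the hue band and saturation threshold are hypothetical values, not the ones used in the surveyed work.

```python
import colorsys

# Assumed thresholds (not from the source): hue band treated as "green",
# expressed as a fraction of the hue circle, and a minimum saturation.
GREEN_HUE_RANGE = (0.20, 0.45)
MIN_SATURATION = 0.25


def is_green(pixel):
    """Return True when an 8-bit (R, G, B) pixel falls inside the green band."""
    r, g, b = (channel / 255.0 for channel in pixel)
    hue, saturation, _ = colorsys.rgb_to_hsv(r, g, b)
    in_band = GREEN_HUE_RANGE[0] <= hue <= GREEN_HUE_RANGE[1]
    return in_band and saturation >= MIN_SATURATION


def mask_green(image):
    """Set green pixels to black so that only non-green regions remain."""
    return [[(0, 0, 0) if is_green(p) else p for p in row] for row in image]


# Tiny made-up image: two healthy green pixels and two discolored pixels.
image = [
    [(40, 160, 50), (120, 80, 30)],
    [(35, 140, 45), (200, 190, 60)],
]
masked = mask_green(image)
```

After masking, the pixels that survive are the candidates for diseased tissue, which is what the later segmentation steps operate on.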
In the Indian economy, a machine-learning-based recognition system that classifies and identifies
the different diseases affecting plants will prove very useful, as it
saves effort, money and time. The approach given for feature set extraction is
the Color Co-occurrence Method. For automatic detection of diseases in leaves, neural
networks are used. The proposed approach can significantly support accurate
detection of leaf, and seems to be an important approach in the case of stem and root
The disease identification process includes several steps, of which the four main steps are as
follows: first, a color transformation structure is created for the input RGB image;
then, using a specific threshold value, the green pixels are masked and removed;
this is followed by the segmentation process; and, to obtain useful segments, the texture
statistics are computed. Finally, a classifier is applied to the extracted features to
classify the disease. The robustness of the proposed algorithm is proved by using
First, the images of various leaves are acquired using a digital camera with
the resolution required for good quality. The input image is then resized to 256x256
The image database has to be carefully constructed in that it generally decides the
Image pre-processing is used to enhance the quality of the image for
further processing and analysis. It includes color space conversion and image
enhancement. The RGB images of leaves are converted into the L*a*b* color space.
layers. The color space conversion is used for the enhancement of visual analysis.
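
The RGB to L*a*b* conversion can be sketched for a single pixel as below. This assumes the input is 8-bit sRGB and uses the D65 reference white; the source does not state which RGB variant or white point was used, and the function name srgb_to_lab is hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (assumed D65 white point)."""

    def to_linear(channel):
        # Undo the sRGB gamma curve.
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear RGB to CIE XYZ (sRGB primaries, D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Cube root with a linear segment near zero, as in the CIE definition.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    lightness = 116 * fy - 16
    a_star = 500 * (fx - fy)   # green (negative) to red (positive)
    b_star = 200 * (fy - fz)   # blue (negative) to yellow (positive)
    return lightness, a_star, b_star


white = srgb_to_lab(255, 255, 255)
black = srgb_to_lab(0, 0, 0)
green = srgb_to_lab(0, 255, 0)
```

Separating lightness from the two color axes is what makes this space convenient for color-based clustering, since illumination changes mostly affect L* rather than a* and b*.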
into a meaningful form, for example to highlight the object of interest against the background. The
squares of distances between the image intensities and the cluster centroids. K
partitions the data and assigns n observations to precisely one of k clusters defined
by centroids.
4. Compute the mean of the observations in each cluster to obtain k new centroid
locations.
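
The assignment and centroid-update steps of K-means can be sketched on scalar intensities as below. The initialisation rule, the iteration cap and the function name kmeans_1d are assumptions made for the sketch; the source clusters image data, for which each observation would be a color vector rather than a single number.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar intensities; returns (labels, centroids)."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: spread the k starting centroids over the range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment: each observation joins the centroid with the smallest
        # squared distance, so it belongs to exactly one cluster.
        labels = [
            min(range(k), key=lambda c: (v - centroids[c]) ** 2)
            for v in values
        ]
        # Update: each centroid moves to the mean of its cluster members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                new_centroids.append(sum(members) / len(members))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids


intensities = [10, 12, 11, 200, 202, 199, 13, 201]
labels, centroids = kmeans_1d(intensities, k=2)
```

Each iteration can only lower or keep the sum of squared distances to the centroids, which is why the loop stops once the centroids stay unchanged.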
After segmentation, the GLCM features are extracted from the image. The Gray-Level
Co-Occurrence Matrix (GLCM) is a statistical method of investigating texture
that considers the spatial relationship of pixels [15]. The GLCM functions
characterize the texture of images by computing the spatial relationship among the
pixels in the images. The statistical measures are extracted from this matrix. In the
varying direction and distance have to be specified. In the proposed method, four
features are extracted: contrast, energy, homogeneity and correlation.
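
A minimal sketch of computing a normalized GLCM and the four features is given here. Several choices are assumptions rather than details from the source: a horizontal offset of one pixel, a non-symmetric matrix, energy taken as the sum of squared entries, and homogeneity weighted by 1 / (1 + |i - j|). The 4x4 image with four gray levels is a toy example.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, homogeneity and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v ** 2 for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        # Undefined for a constant image, where both deviations are zero.
        "correlation": covariance / (sd_i * sd_j),
    }


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image, levels=4)
features = glcm_features(matrix)
```

In practice the matrix is usually computed for several directions and distances and the features are averaged or concatenated, which is why both parameters have to be specified.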
Let Pij represent the (i, j)th entry in the normalized Gray-Level Co-Occurrence
Matrix. N represents the number of distinct gray levels in the quantized image. The
1) Pre-processing Phase
This phase involves image processing steps to extract features from the image by
2) Background Subtraction
This phase involves removing unwanted background details from image for
3) Blob Analysis
A blob is a region whose properties and pixel values are constant or
vary within a prescribed range. This step discovers the region of interest for further
processing by finding all connected parts of the frame and choosing the biggest
(largest area) amongst them (since the lesion is the largest area). Blob analysis is
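
Selecting the largest connected region can be sketched as follows. The sketch assumes a binary foreground mask and 4-connectivity, neither of which is specified in the source, and the function name largest_blob is hypothetical.

```python
from collections import deque


def largest_blob(mask):
    """Return the (row, col) pixels of the largest 4-connected foreground region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected component.
            blob, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep the component with the largest area seen so far.
            if len(blob) > len(best):
                best = blob
    return best


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
lesion = largest_blob(mask)
```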
4) Noise Reduction
Noise reduction filters out discontinuities and noise by using a smoothing
Gaussian filter. This filter removes the noise by a smoothing operation. The
Gaussian kernel size used for this filter is 3. This process will smoothen the image
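
A 3x3 Gaussian smoothing pass can be sketched as below. The source gives only the kernel size; the integer weights used here (a common 3x3 approximation that sums to 16) and the border handling by edge replication are assumptions.

```python
# Assumed 3x3 Gaussian approximation; the weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def gaussian_blur_3x3(image):
    """Smooth a grayscale image, replicating edge pixels at the border."""
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so that out-of-range reads reuse the nearest edge pixel.
        return image[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    return [
        [
            sum(KERNEL[dr + 1][dc + 1] * pixel(r + dr, c + dc)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 16
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# A single bright noise pixel on a flat background.
noisy = [
    [10, 10, 10],
    [10, 90, 10],
    [10, 10, 10],
]
smoothed = gaussian_blur_3x3(noisy)
```

The isolated spike is spread over its neighbors and its peak value drops, which is the smoothing effect described above.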
5) Greyscale Conversion
This step converts the color image into a grayscale image, which helps in further
of bits required to store a grayscale image is lower than the number of bits required to store a
color image.
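
Grayscale conversion can be sketched with a weighted sum of the three channels. The luminosity weights 0.299, 0.587 and 0.114 are a common convention and an assumption here, since the source does not say which weighting was used.

```python
def to_grayscale(image):
    """Map each (R, G, B) pixel to one 8-bit intensity using luminosity weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


color = [
    [(255, 255, 255), (0, 0, 0)],
    [(0, 255, 0), (255, 0, 0)],
]
gray = to_grayscale(color)

# Storage per pixel drops from three 8-bit channels to one.
bits_color = 3 * 8 * len(color) * len(color[0])
bits_gray = 8 * len(gray) * len(gray[0])
```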
Images acquired in low illumination have close contrast values hence there is a
7) Image Scaling
Image scaling is done to reduce the computational effort needed for image
processing. Every image is scaled to a size of 100x100 for further processing
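
Scaling to a fixed size can be sketched with nearest-neighbor sampling. The interpolation method is an assumption, as the source does not name one, and the 256x256 input below is synthetic.

```python
def resize_nearest(image, new_rows=100, new_cols=100):
    """Resize by picking, for each output pixel, the nearest source pixel."""
    rows, cols = len(image), len(image[0])
    return [
        [image[r * rows // new_rows][c * cols // new_cols]
         for c in range(new_cols)]
        for r in range(new_rows)
    ]


# Synthetic 256x256 grayscale image used only to exercise the function.
source = [[(r + c) % 256 for c in range(256)] for r in range(256)]
scaled = resize_nearest(source)
```

Working on 10,000 pixels instead of 65,536 reduces the cost of every later per-pixel step by roughly a factor of six and a half.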
8) Training stage
In this phase, a machine learning classifier is trained on the data. We use a
number of images of each disease and train the classifier on all of them. The
classifier learns from the samples and predicts the class of a new input sample. Here
9) Testing Stage
Once training is over, the classifier is able to distinguish between the different
diseases. Testing is performed on new input images. Output of this phase is in a
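
The training and testing stages can be sketched with a small linear SVM fitted by sub-gradient descent on the hinge loss. This is an illustration only: the linear kernel, the two-class setup with labels +1 and -1, the learning rate, the regularisation strength and all feature values are assumptions, and the function names are hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b on the L2-regularised hinge loss (labels +1/-1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation shrinks the weights slightly on every step.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # Sample is misclassified or inside the margin: move the boundary.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Training stage: made-up two-dimensional features, +1 = diseased, -1 = healthy.
train_samples = [(0.80, 0.20), (0.10, 0.90), (0.90, 0.10), (0.20, 0.80),
                 (0.85, 0.25), (0.15, 0.85), (0.70, 0.30), (0.30, 0.70)]
train_labels = [1, -1, 1, -1, 1, -1, 1, -1]
w, b = train_linear_svm(train_samples, train_labels)

# Testing stage: new samples the classifier has not seen.
test_samples = [(0.75, 0.15), (0.25, 0.90)]
test_labels = [1, -1]
predictions = [predict(w, b, x) for x in test_samples]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
```

Keeping the test samples separate from the training samples is what makes the accuracy figure a fair estimate of how the classifier behaves on new leaf images.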