Leaf Disease Detection Using Machine Learning and Python

LEAF DISEASE DETECTION USING MACHINE LEARNING AND PYTHON

INTRODUCTION

In India, agriculture is the backbone of the economy. 50% of the population is involved in

farming activities directly or indirectly. Many varieties of fruits, cereals and vegetables are

produced here and exported to other countries. Hence it is necessary to produce high

quality products with an optimum yield. As diseases of the plants are unavoidable,

detection of plant diseases is essential in the field of Agriculture. In plants, diseases can be

found in various parts such as fruits, stems and leaves. The main diseases of plants are

viral, fungal and bacterial diseases such as Alternaria, Anthracnose, bacterial spot and

canker. Viral disease is attributed to environmental changes, fungal disease to the

presence of fungus in the leaf, and bacterial disease to the presence of germs in the leaf or

plant. The proposed framework can be used to identify leaf diseases. Automatic detection

of plant diseases is an essential research area because it identifies diseases directly

from the symptoms that appear on the plant leaves.

Barbedo proposed an automatic method of disease symptom segmentation in digital

photographs of plant leaves, in which color channel manipulation and Boolean operations are

applied to a binary mask of leaf pixels. He also proposed a method of semi-automatic

segmentation of plant leaf disease symptoms in which the histograms of the H and color

channels are manipulated. Pang et al. proposed a method of automatic segmentation of

crop leaf spot disease images by integrating local thresholding and seeded region growing.

Singh and Misra proposed detection of plant leaf diseases using soft computing

techniques. Prasad et al. proposed an unsupervised, resolution-independent approach to natural plant

leaf disease segmentation in which texture-based clustering is used for segmentation.

Du & Zhang proposed a technique to segment leaf images with non-uniform

illumination based on maximum entropy and a genetic algorithm. Dhaygude & Kumbhar

proposed agricultural plant leaf disease detection using image processing in which the

texture statistics are computed from spatial gray-level dependence matrices (SGDM). Diao

et al. reviewed different methods, including edge-based, region-based and Artificial Neural

Network (ANN) methods, for segmentation of plant disease spots. Different methods for
automatic leaf image segmentation and disease identification have been proposed in

literature. In this system, an automatic method of leaf disease classification and prediction is

proposed. In this method, segmentation of leaves is done using the K-means algorithm.

Texture features are extracted using GLCM, classification is done using SVM, and

machine learning is used to predict the disease.

3.1 THEORETICAL BACKGROUND

In computer science, digital image processing is the use of computer algorithms to

perform image processing on digital images. As a subcategory or field of digital signal

processing, digital image processing has many advantages over analog image processing.

It allows a much wider range of algorithms to be applied to the input data and can avoid

problems such as the build-up of noise and signal distortion during processing. Since

images are defined over two dimensions (perhaps more), digital image processing may be

modeled in the form of multidimensional systems (Wikipedia, image processing).

Image processing is a method to convert an image into digital form and perform some

operations on it, in order to get an enhanced image or to extract some useful information

from it. It is a type of signal processing in which the input is an image, such as a video frame or

photograph, and the output may be an image or characteristics associated with that image.

Usually an image processing system treats images as two-dimensional signals

while applying established signal processing methods to them. It is among the rapidly

growing technologies today, with applications in various aspects of business. Image

processing also forms a core research area within the engineering and computer science

disciplines.

The two types of methods used for image processing are analog and digital image

processing. Analog or visual techniques of image processing can be used for hard

copies such as printouts and photographs. Image analysts use various fundamentals of

interpretation while using these visual techniques. The image processing is not

confined to the area that has to be studied; it also depends on the knowledge of the analyst. Association is

another important tool in image processing through visual techniques. So analysts apply

a combination of personal knowledge and collateral data to image processing.

Digital processing techniques help in the manipulation of digital images by using

computers. Raw data from imaging sensors on a satellite platform contains

deficiencies. To get over such flaws and recover the original information, the data has to

undergo various phases of processing. The three general phases that all types of data have to

undergo while using digital techniques are pre-processing, enhancement and

display, and information extraction.

Machine learning (ML) is the study of algorithms and mathematical

models that computer systems use to progressively improve their performance on a

specific task. Machine learning algorithms build a mathematical model of sample data,

known as "training data", in order to make predictions or decisions without being

explicitly programmed to perform the task. Machine learning algorithms are used in the

applications of email filtering, detection of network intruders, and computer vision,

where it is infeasible to develop an algorithm of specific instructions for performing the

task. Machine learning is closely related to computational statistics, which focuses on

making predictions using computers. The study of mathematical optimization delivers

methods, theory and application domains to the field of machine learning. Data mining is

a field of study within machine learning, and focuses on exploratory data

analysis through unsupervised learning.

Machine learning tasks are classified into several broad categories. In supervised

learning, the algorithm builds a mathematical model of a set of data that contains both

the inputs and the desired outputs. For example, if the task were determining whether an

image contained a certain object, the training data for a supervised learning algorithm

would include images with and without that object (the input), and each image would

have a label (the output) designating whether it contained the object. In special cases, the

input may be only partially available, or restricted to special feedback. Semi-supervised

learning algorithms develop mathematical models from incomplete training data, where

a portion of the sample inputs are missing the desired output.

Classification algorithms and regression algorithms are types of supervised learning.

Classification algorithms are used when the outputs are restricted to a limited set of

values. For a classification algorithm that filters emails, the input would be an incoming

email, and the output would be the name of the folder in which to file the email. For an
algorithm that identifies spam emails, the output would be the prediction of either

"spam" or "not spam", represented by the Boolean values one and

zero. Regression algorithms are named for their continuous outputs, meaning they may

have any value within a range. Examples of a continuous value are the temperature,

length, or price of an object.

In unsupervised learning, the algorithm builds a mathematical model of a set of data

which contains only inputs and no desired outputs. Unsupervised learning algorithms are

used to find structure in the data, like grouping or clustering of data points. Unsupervised

learning can discover patterns in the data, and can group the inputs into categories, as

in feature learning. Dimensionality reduction is the process of reducing the number of

"features", or inputs, in a set of data (Wikipedia, image processing).

Active learning algorithms access the desired outputs (training labels) for a limited set of

inputs based on a budget, and optimize the choice of inputs for which they will acquire

training labels. When used interactively, these can be presented to a human user for

labeling. Reinforcement learning algorithms are given feedback in the form of positive

or negative reinforcement in a dynamic environment, and are used in autonomous vehicles or in

learning to play a game against a human opponent. Other specialized

algorithms in machine learning include topic modeling, where the computer program is

given a set of natural language documents and finds other documents that cover similar

topics. Machine learning algorithms can be used to find the unobservable probability

density function in density estimation problems. Meta learning algorithms learn their

own inductive bias based on previous experience. In developmental robotics, robot

learning algorithms generate their own sequences of learning experiences, also known as

a curriculum, to cumulatively acquire new skills through self-guided exploration and

social interaction with humans. These robots use guidance mechanisms such as active

learning, maturation, motor synergies, and imitation.

Savita N. Ghaiwat et al. present a survey of different classification techniques that can be

used for plant leaf disease classification. For a given test example, the k-nearest-neighbor

method seems to be suitable as well as the simplest of all algorithms for class prediction.

If the training data is not linearly separable, then it is difficult to determine optimal

parameters in SVM, which appears as one of its drawbacks.

The developed processing scheme has four main steps. First,

for the input RGB image, a color transformation structure is created: RGB is

used for color generation, and the transformed image, that is, HSI, is

used as the color descriptor. In the second step, green pixels are

masked and removed using a threshold value. In the third step, using a pre-computed threshold level, the

removal and masking of green pixels is applied to the useful segments that are extracted first in this

step, while the image is segmented. In the fourth and last main step, the segmentation is done.

In the Indian economy, a machine-learning-based recognition system to classify and identify

the different diseases that affect plants will prove to be very useful, as it

saves effort, money and time. The approach given here for feature set extraction is

the Color Co-occurrence Method. For automatic detection of diseases in leaves, neural

networks are used. The proposed approach can significantly support accurate

detection of leaf diseases, and seems to be an important approach for stem and root

diseases as well, requiring less computational effort.

The disease identification process includes several steps, of which the four main ones are as

follows: first, a color transformation structure is taken for the input RGB image;

then, using a specific threshold value, the green pixels are masked and removed; this is

followed by the segmentation process; and, to obtain useful segments, the texture

statistics are computed. Finally, a classifier is applied to the extracted features to

classify the disease. The robustness of the proposed algorithm is demonstrated by

experimental results on about 500 plant leaves in a database.
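
The green-pixel masking step can be illustrated with a minimal pure-Python sketch. The rule used here (green channel above a threshold and larger than both red and blue), the default threshold of 100 and the function name `mask_green_pixels` are illustrative assumptions; the text does not state the actual threshold or rule.

```python
def mask_green_pixels(image, threshold=100):
    """Zero out pixels whose green channel dominates and exceeds `threshold`.

    `image` is a list of rows, each row a list of (R, G, B) tuples in 0..255.
    The dominance rule and the default threshold are assumptions for illustration.
    """
    masked = []
    for row in image:
        new_row = []
        for r, g, b in row:
            is_green = g > threshold and g > r and g > b
            # Mostly-green (healthy) pixels are replaced by black.
            new_row.append((0, 0, 0) if is_green else (r, g, b))
        masked.append(new_row)
    return masked


# Tiny hypothetical image: two healthy green pixels, a brown lesion, a yellow spot.
leaf = [
    [(40, 180, 50), (120, 90, 40)],
    [(200, 190, 60), (30, 150, 35)],
]
result = mask_green_pixels(leaf)
```

Only the non-green pixels survive, so later segmentation and texture steps operate on the regions that are likely to be diseased.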


i. Image Acquisition

Firstly, the images of various leaves are acquired using a digital camera with

required resolution for better quality. The input image is then resized to 256x256

pixels. The construction of an image database depends on the required application.

The image database has to be carefully constructed in that it generally decides the

efficiency of the classifier and performance of the proposed method.

ii. Image Pre-Processing

Image pre-processing is used to enhance the quality of the image necessary for

further processing and analysis. It includes color space conversion and image

enhancement. The RGB images of leaves are converted into L*a*b* color space.

The color transformation is done to determine the luminosity and chromaticity

layers. The color space conversion is used for the enhancement of visual analysis.
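
The conversion to L*a*b* separates the luminosity layer (L*) from the chromaticity layers (a*, b*). A minimal pure-Python sketch for a single pixel is shown below, assuming 8-bit sRGB input and the D65 reference white; the function name `srgb_to_lab` is illustrative, and a practical system would more likely call an image library routine.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (D65 white point assumed)."""

    def to_linear(c):
        # Undo the sRGB gamma curve.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear RGB -> CIE XYZ (sRGB primaries, D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3 * delta ** 2) + 4.0 / 29.0

    # Normalize by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    lightness = 116.0 * fy - 16.0
    a_star = 500.0 * (fx - fy)
    b_star = 200.0 * (fy - fz)
    return lightness, a_star, b_star
```

In this color space a negative a* value indicates green and a positive value indicates red, which is why the chromaticity layers are convenient for separating healthy tissue from lesions.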

iii. Image Segmentation

Image segmentation is the process used to simplify the representation of an image

into a meaningful form, such as highlighting the object of interest against the background. The

K-means clustering algorithm performs segmentation by minimizing the sum of

squares of distances between the image intensities and the cluster centroids. The K-means

clustering algorithm, or Lloyd's algorithm, is an iterative algorithm that

partitions the data, assigning each of n observations to precisely one of k clusters defined

by centroids.

The steps in the algorithm are given below.

1. Choose k initial cluster centers (centroid).

2. Compute point-to-cluster-centroid distances of all observations to each centroid.

3. Assign each observation to the cluster with the closest centroid.

4. Compute the mean of the observations in each cluster to obtain k new centroid

locations.

5. Repeat steps 2 through 4 until there is no change in the cluster assignments or

the maximum number of iterations is reached.
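
The five steps above can be sketched in pure Python as follows. The function name `kmeans`, the explicit `centroids` argument for the initial centers and the sample points are assumptions for illustration; the points stand in for two-dimensional pixel features such as chromaticity values.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    `centroids` supplies the k initial cluster centers (step 1).
    Returns (assignments, centroids).
    """
    assignments = None
    for _ in range(max_iter):
        # Steps 2-3: assign each point to the centroid with the smallest
        # squared Euclidean distance.
        new_assignments = [
            min(
                range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Step 5: stop when the cluster assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's center
        centroids = new_centroids
    return assignments, centroids


# Hypothetical feature pairs forming two well-separated groups.
pixels = [(10, 12), (11, 10), (9, 11), (50, 52), (51, 49), (49, 50)]
labels, centers = kmeans(pixels, centroids=[(0, 0), (60, 60)])
```

For segmentation, each pixel would then be painted with the index of its cluster, and the cluster containing the lesion would be passed on to feature extraction.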

iv. Feature Extraction

After segmentation, the GLCM features are extracted from the image. The Gray-Level

Co-Occurrence Matrix (GLCM) is a statistical method of investigating texture

that considers the spatial relationship of pixels [15]. The GLCM functions

characterize the texture of images by computing the spatial relationship among the

pixels in the images. The statistical measures are extracted from this matrix. In the

creation of GLCMs, an array of offsets that describe pixel relationships of

varying direction and distance has to be specified. In the proposed method, four

features are extracted: contrast, energy, homogeneity and correlation.

Let P(i, j) represent the (i, j)th entry in the normalized Gray-Level Co-Occurrence

Matrix, and let N represent the number of distinct gray levels in the quantized image. The

different features extracted are defined as follows.
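
One commonly used set of definitions for these four features, assumed here rather than taken from the text, is sketched below in pure Python. The sketch builds a non-symmetric GLCM for a single offset (one pixel to the right by default) and uses the homogeneity form with 1 + |i - j| in the denominator; some texts use 1 + (i - j)^2 instead. The function name `glcm_features` and the sample image are illustrative.

```python
import math


def glcm_features(image, levels, offset=(0, 1)):
    """Contrast, energy, homogeneity and correlation from a normalized GLCM.

    `image` is a list of rows of integer gray levels in 0..levels-1 and
    `offset` is the (row, column) displacement between paired pixels.
    Correlation is undefined (division by zero) for a constant image.
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))

    # cells holds (i, j, P(i, j)) for every entry of the normalized matrix.
    cells = [
        (i, j, counts[i][j] / total) for i in range(levels) for j in range(levels)
    ]
    mu_i = sum(i * p for i, j, p in cells)
    mu_j = sum(j * p for i, j, p in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, j, p in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for i, j, p in cells))
    return {
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        "energy": sum(p * p for i, j, p in cells),
        "homogeneity": sum(p / (1 + abs(i - j)) for i, j, p in cells),
        "correlation": sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells)
        / (sd_i * sd_j),
    }


# Small 4-level test image (N = 4).
sample = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(sample, levels=4)
```

The resulting four numbers form the texture feature vector that is passed to the classifier.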


3.2 SYSTEM ANALYSIS
3.2.1 MODULE DESCRIPTION

1) Pre-processing Phase

This phase involves image processing steps to extract features from the image by

performing background subtraction, blob analysis, noise reduction, grayscale

conversion, brightness normalization and scaling operations one by one. The

common pre-processing operations are resizing and normalizing.

2) Background Subtraction

This phase involves removing unwanted background details from the image so that

only the essential details are extracted.

3) Blob Analysis

A blob is a region whose properties and pixel values are constant or

vary within a prescribed range. This step discovers the region of interest for further

processing by finding all connected parts of the frame and choosing the biggest

(largest area) amongst them (since the lesion is the largest area). Blob analysis is

applicable in the field of object recognition or object tracking.
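
Selecting the largest connected region can be sketched with a breadth-first flood fill over a binary mask. The choice of 4-connectivity, the function name `largest_blob` and the sample mask are assumptions for illustration.

```python
from collections import deque


def largest_blob(mask):
    """Return the set of (row, col) cells in the largest 4-connected region of 1s."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    best = set()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or (r, c) in seen:
                continue
            # Flood fill from an unvisited foreground pixel.
            blob = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                blob.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob
    return best


# Hypothetical mask with a single stray pixel and one five-pixel region.
mask = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
region = largest_blob(mask)
```

The stray pixel in the top-left corner is discarded, and only the five-pixel region is kept as the region of interest.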

4) Noise Reduction

Noise reduction filters out discontinuities and noise by using a smoothing

Gaussian filter. This filter removes the noise by a smoothing operation. The

Gaussian kernel size used for this filter is 3. This process smooths the image.
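
A 3x3 Gaussian smoothing pass can be sketched as below. The integer kernel (1 2 1 / 2 4 2 / 1 2 1, divided by 16) is a common approximation and is an assumption here, as is the border handling by edge replication; the text only specifies the kernel size.

```python
# Common 3x3 integer approximation of a Gaussian kernel; weights sum to 16.
KERNEL = [
    [1, 2, 1],
    [2, 4, 2],
    [1, 2, 1],
]


def gaussian_blur_3x3(image):
    """Smooth a grayscale image (list of rows of numbers) with a 3x3 kernel."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse the nearest edge value.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += KERNEL[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = acc / 16
    return out


# A single bright noise pixel on a flat background.
noisy = [
    [10, 10, 10],
    [10, 90, 10],
    [10, 10, 10],
]
smooth = gaussian_blur_3x3(noisy)
```

The isolated spike of 90 is pulled down to 30 while its neighbours rise slightly, which is the smoothing effect described above.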

5) Grayscale Conversion
This step converts the color image into a grayscale image, which helps in further

calculations on pixel operations and in interrelating diseases. The memory, in terms

of bits, required to store a grayscale image is less than that required to store a

color image.
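
A minimal sketch of the conversion is given below, assuming 8 bits per channel and the widely used luminance weights 0.299, 0.587 and 0.114; the text does not say which weighting is applied. With these assumptions each pixel shrinks from three 8-bit values to one.

```python
def to_grayscale(image):
    """Convert rows of (R, G, B) tuples to rows of single 0..255 gray values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


# One row holding white, black, pure red, pure green and pure blue.
colors = [[(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]]
gray = to_grayscale(colors)
```

Green contributes most to the gray value under these weights, so healthy leaf tissue stays comparatively bright after conversion.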

6) Brightness and Contrast Normalization

Images acquired in low illumination have close contrast values; hence there is a

need to adjust pixel intensity values. Histogram equalization is performed in order

to adjust and normalize the brightness and contrast of the frame being processed.
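
Histogram equalization remaps gray levels through the cumulative histogram so that the used intensities spread over the full range. The sketch below assumes an 8-bit grayscale image stored as rows of integers; the function name `equalize_histogram` is illustrative.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as rows of ints in 0..levels-1."""
    flat = [v for row in image for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1

    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Lookup table mapping each old level to its equalized level.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[v] for v in row] for row in image]


# A dim, low-contrast patch whose values sit within four gray levels.
dim = [
    [50, 51],
    [52, 53],
]
bright = equalize_histogram(dim)
```

The four nearly identical input levels are stretched to 0, 85, 170 and 255, which makes differences between lesion and healthy tissue easier to detect.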

7) Image Scaling

Image scaling is done to reduce the computational effort needed for image

processing. Every image is scaled to 100x100 pixels for further processing.
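
Scaling to a fixed size can be sketched with nearest-neighbor sampling, which is the simplest interpolation choice and an assumption here; the text does not name the interpolation method.

```python
def resize_nearest(image, new_rows, new_cols):
    """Resize an image (list of rows) by nearest-neighbor sampling."""
    rows, cols = len(image), len(image[0])
    return [
        [image[r * rows // new_rows][c * cols // new_cols] for c in range(new_cols)]
        for r in range(new_rows)
    ]


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
small = resize_nearest(grid, 2, 2)

# The same call scales a larger frame down to the 100x100 size used here.
frame = [[(r + c) % 256 for c in range(256)] for r in range(256)]
scaled = resize_nearest(frame, 100, 100)
```

A 256x256 frame holds 65,536 pixels while the 100x100 version holds 10,000, so every later per-pixel step does roughly one sixth of the work.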

8) Training stage

In this phase the data is trained with a machine learning classifier. We use a

number of images of each disease and train the classifier on all of them. The

classifier learns from the samples and predicts the class of a new input sample. Here

we use an SVM classifier to train on the data.

9) Testing Stage

Once training is over, the classifier is trained to distinguish between different

diseases. Testing is performed on new input images. The output of this phase is a

number which represents a disease ID.
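
The training and testing stages can be illustrated with a small two-class linear SVM trained by sub-gradient descent on the hinge loss. This is a simplified stand-in written in pure Python, not the classifier configuration used in this work; the feature values, the learning rate, the regularization strength and the mapping to disease IDs 0 and 1 are all hypothetical. Distinguishing more than two diseases would need an extension such as one-vs-rest.

```python
def train_linear_svm(samples, labels, epochs=1000, lr=0.01, lam=0.01):
    """Fit weights w and bias b with hinge-loss sub-gradient descent.

    `labels` must be -1 or +1. `lr` and `lam` are illustrative settings.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin: move the boundary toward fixing it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is fine: apply only the regularization shrink.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical two-dimensional feature vectors for two classes.
features = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(features, labels)

DISEASE_ID = {-1: 0, 1: 1}  # hypothetical mapping from SVM output to disease ID


def disease_id_for(x):
    """Testing stage: classify a new feature vector and return its disease ID."""
    return DISEASE_ID[predict(w, b, x)]
```

In the testing stage a new image would go through the same pre-processing and feature extraction, and its feature vector would be passed to `disease_id_for` to obtain the numeric disease ID.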
