Final Project Report
Euphoria GenX
----------------------------X--------------------------------
Index:-
1. Abstract
2. Acknowledgement
3. SDK
4. Model
5. Machine Learning
7. Python
8. Workflow Project
9. Elbow Method
10. Distribution of Agricultural Conditions
11. Predictions of Crops
12. Confusion Matrix
13. Classification Report
14. Source Code
15. Conclusion
16. Future Scope
------------------------------------------------------------------------X-----------------------------------------------------------------------------------
1.Abstract:-
Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and methods that drive it, from processing the massive piles of data generated each day to learning useful characteristics and features from them. Python is one of the most preferred languages for machine learning. This paper offers insight into the field of machine learning with Python, taking a tour through the important topics and Python libraries that enable the development of machine learning applications, along with various machine learning algorithms. Finally, we will look at one of the most used models, i.e., Linear Regression.
Linear regression predicts the value of one variable based on the value of another variable. The variable you want to predict is called the dependent variable; the variable you are using to predict the other variable's value is called the independent variable. The relationship is modeled by the equation
Y = mx + c
where m is the slope of the line and c is the intercept. At the end of this paper, we will build a linear regression model for an ice-cream selling company that predicts its sales, as a worked example of Linear Regression.
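As a preview, here is a minimal sketch of simple linear regression (Y = m*x + c) with scikit-learn; the temperature and sales figures below are hypothetical, purely for illustration:-
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[20], [24], [28], [32], [36]])  # independent variable: temperature (hypothetical)
y = np.array([120, 160, 220, 270, 330])       # dependent variable: ice-creams sold (hypothetical)
model = LinearRegression().fit(X, y)
print("slope m:", model.coef_[0])
print("intercept c:", model.intercept_)
print("predicted sales at 30 degrees:", model.predict([[30]])[0])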
2.Acknowledgement:-
computer software programs to 'learn' different tasks making use of the available data. ML
programs become more accurate the more they train on the relevant test data.
Many types of Software Development Tools are there.Some of them which is used in
Programming are:-
1.numpy:-
Work:-In Python we have lists that serve the purpose of arrays, but they are slow to process. NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. The array object in NumPy is called ndarray; it provides a lot of supporting functions that make working with ndarray very easy. Arrays are very frequently used in data science, where speed and resources are very important.
Language:-Python.
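For illustration, a minimal ndarray sketch:-
import numpy as np
# Create an ndarray from a Python list
arr = np.array([1, 2, 3, 4, 5])
# Vectorized operations run in compiled code, much faster than looping over a list
print(arr * 2)     # [ 2  4  6  8 10]
print(arr.mean())  # 3.0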
2.pandas:-
Work:-Pandas allows us to analyze big data and draw conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Language:-Python.
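For illustration, a minimal cleaning sketch (the values are made up):-
import pandas as pd
# A small DataFrame with one missing value
df = pd.DataFrame({"crop": ["wheat", "rice", "corn"], "yield": [2.5, None, 3.1]})
# Drop rows with missing values and summarize the rest
df = df.dropna()
print(df.describe())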
3.Matplotlib:-
Work:-Matplotlib is a plotting library built on NumPy arrays and designed to work with the broader SciPy stack. It provides several plot types like line, bar, scatter, histogram, etc.
Language:-Python.
4.Pyplot:-
Work:-Pyplot is a collection of functions in matplotlib that makes it work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, or decorates the plot with labels.
Language:-Python.
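For illustration, a minimal pyplot sketch:-
import matplotlib.pyplot as plt
plt.figure()                    # creates a figure
plt.plot([1, 2, 3], [2, 4, 1])  # plots a line in the plotting area
plt.xlabel("x")                 # decorates the plot with labels
plt.ylabel("y")
plt.show()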
5.Seaborn:-
Work:-Seaborn is a statistical data visualization library built on top of matplotlib that integrates closely with pandas data structures. Seaborn helps you explore and understand your data.
Language:-Python.
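For illustration, a minimal seaborn sketch; it uses seaborn's bundled "tips" example dataset, which load_dataset fetches on first use:-
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")   # example pandas DataFrame bundled with seaborn
sns.histplot(tips["total_bill"])  # seaborn plots directly from DataFrame columns
plt.show()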
3.SDK:-
Google Colab is a cloud-hosted notebook environment that provides free access to computing resources including CPU, GPU, and TPU. It allows users to write, run, and share Python code in a collaborative and interactive manner. Colab is hosted on Google Drive, and notebooks can be stored and shared there.
While Google Colab does not have an official SDK, it provides a Python library called "google.colab" for interacting with the Colab environment.
Some of the common tasks that can be performed using the "google.colab" library include:-
Importing and exporting files:-The library allows you to upload and download files to and from the Colab environment. For example, you can use the "files.upload()" function to upload files from your local machine to Colab, and the "files.download()" function to download files back to your machine.
Installing packages:-You can install Python packages directly from within the Colab environment using the "!pip install" command.
Managing Colab sessions:-The library allows you to manage the lifecycle of a Colab session. You can use functions like "drive.mount()" to mount your Google Drive. It also provides functionality to connect to external services like Google Drive and Google Sheets, allowing you to read and write data to these services from within a Colab notebook.
Interacting with Colab UI:-The library allows you to interact with the Colab user interface; for example, you can display images, videos, and other media in the output of a Colab cell.
Overall, while Google Colab does not have a standalone SDK, the "google.colab" library provides a convenient way to interact with the Colab environment programmatically and automate various tasks within Colab notebooks. You can import the "google.colab" library in your Python code and use its functions to perform operations within the Colab environment.
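For illustration, a short sketch of the calls mentioned above; it runs only inside a Colab notebook, and 'results.csv' is a hypothetical file name:-
# Run inside a Google Colab notebook
from google.colab import files, drive
uploaded = files.upload()      # upload files from the local machine into Colab
drive.mount('/content/drive')  # mount Google Drive into the session
files.download('results.csv')  # download a file back to the local machine (hypothetical name)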
4.Model:-
SDLC Model:-
Waterfall Model:-
The waterfall is a universally accepted SDLC model. In this method, the whole process of software development is divided into separate phases, seen as flowing steadily downwards (like a waterfall) through the steps of requirements analysis, design, implementation, testing, deployment, and maintenance.
Linear ordering of activities has some significant consequences. First, to identify the end of a phase and the beginning of the next, some certification techniques have to be employed at the end of each step. This is usually done through verification and validation, which ensure that the output of a stage is consistent with its input (which is the output of the previous step), and that the output of the stage is consistent with the overall requirements of the system.
RAD Model:-
RAD (Rapid Application Development) is an incremental SDLC model that targets developing software in a short period. The RAD model is based on the concept that a better system can be developed in lesser time by using focus groups to gather system requirements. The phases of the RAD model are:-
o Business Modeling
o Data Modeling
o Process Modeling
o Application Generation
o Testing and Turnover
Spiral Model:-
The spiral model is a risk-driven process model. This SDLC model helps the team adopt elements of one or more process models like waterfall, incremental, etc., and couples iterative development with risk analysis across the development activities.
Each cycle in the spiral begins with the identification of objectives for that cycle, the different alternatives that are possible for achieving the goals, and the constraints that exist. The next step in the cycle is to evaluate these different alternatives based on the objectives and constraints. The focus of evaluation in this step is based on the risk perception for the project.
The next step is to develop strategies that resolve uncertainties and risks. This step may involve activities such as benchmarking, simulation, and prototyping.
V-Model:-
In this type of SDLC model, testing and development steps are planned in parallel. So, there are verification phases on one side and validation phases on the other side; the V-Model joins the two in the coding phase.
Incremental Model:-
The incremental model is not a separate model. It is essentially a series of waterfall cycles. The requirements are divided into groups at the start of the project. For each group, the SDLC model is followed to develop software. The SDLC process is repeated, with each release adding more functionality until all requirements are met. In this method, each cycle acts as the maintenance phase for the previous software release. A modification to the incremental model allows development cycles to overlap; a subsequent cycle may begin before the previous cycle is complete.
Agile Model:-
The Agile model promotes continuous interaction of development and testing during the SDLC process of any project. In the Agile method, the entire project is divided into small incremental builds. All of these builds are provided in iterations, and each iteration is short. Any agile software process is characterized in a manner that addresses several key assumptions about most software projects:-
1. It is difficult to think in advance which software requirements will persist and which will change. It is equally difficult to predict how user priorities will change as the project proceeds.
2. For many types of software, design and development are interleaved. That is, both activities should be performed in tandem so that design models are proven as they are created. It is difficult to think about how much design is necessary before construction is used to prove the design.
3. Analysis, design, development, and testing are not as predictable (from a planning point of view) as we might like.
Iterative Model:-
The iterative model starts with an initial, simplified implementation, which then progressively gains more complexity and a broader feature set until the final system is complete. In short, iterative development is a way of breaking down the software development of a large application into smaller pieces.
Big Bang Model:-
The Big Bang model focuses all available resources on software development and coding, with no or very little planning. The requirements are understood and implemented as they come.
This model works best for small projects with a small development team working together. It is also useful for academic software development projects. It is an ideal model where requirements are either unknown or no final release date is given.
Prototype Model:-
The prototyping model starts with the requirements gathering. The developer and the user
meet and define the purpose of the software, identify the needs, etc.
A 'quick design' is then created. This design focuses on those aspects of the software that
will be visible to the user. It then leads to the development of a prototype. The customer then
checks the prototype, and any modifications or changes that are needed are made to the
prototype.
Looping takes place in this step, and better versions of the prototype are created. These are
continuously shown to the user so that any new changes can be updated in the prototype.
This process continues until the customer is satisfied with the system. Once a user is
satisfied, the prototype is converted to the actual system with all considerations for quality
and security.
Machine Learning Life Cycle:-
1. Gathering Data:-
Data Gathering is the first step of the machine learning life cycle. The goal of this step is to identify and collect all the data related to the problem.
In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The quantity and quality of the collected data will determine the efficiency of the output: the more data there is, the more accurate the prediction will be.
By performing the above task, we get a coherent set of data, also called a dataset. It will be used in further steps.
2. Data Preparation:-
After collecting the data, we need to prepare it for further steps. Data preparation is a step
where we put our data into a suitable place and prepare it to use in our machine learning
training.
In this step, first, we put all data together, and then randomize the ordering of data.
o Data exploration:-
It is used to understand the nature of the data that we have to work with. We need to understand the characteristics, format, and quality of the data.
o Data pre-processing:-
Now the next step is preprocessing of data for its analysis.
3. Data Wrangling:-
Data wrangling is the process of cleaning and converting raw data into a useable format. It is
the process of cleaning the data, selecting the variable to use, and transforming the data in a
proper format to make it more suitable for analysis in the next step. It is one of the most
important steps of the complete process. Cleaning of data is required to address the quality
issues.
The data we have collected is not always useful to us, as some of it may not be relevant. In real-world applications, collected data may have various issues, including:
o Missing Values
o Duplicate data
o Invalid data
o Noise
It is mandatory to detect and remove the above issues because they can negatively affect the quality of the outcome.
4. Data Analysis:-
Now the cleaned and prepared data is passed on to the analysis step. This step involves selecting analytical techniques, building models, and reviewing the result.
The aim of this step is to build a machine learning model to analyze the data using various analytical techniques and review the outcome. It starts with the determination of the type of problem, such as Classification, Regression, Cluster analysis, Association, etc.; the model is then built using suitable algorithms.
Hence, in this step, we take the data and use machine learning algorithms to build the model.
5. Train Model:-
Now the next step is to train the model. In this step we train our model to improve its performance for better outcomes.
We use datasets to train the model using various machine learning algorithms. Training a model is required so that it can understand the various patterns, rules, and features.
6. Test Model:-
Once our machine learning model has been trained on a given dataset, then we test the
model. In this step, we check for the accuracy of our model by providing a test dataset to it.
Testing the model determines the percentage accuracy of the model as per the requirement
of project or problem.
7. Deployment:-
The last step of machine learning life cycle is deployment, where we deploy the model in the
real-world system.
If the above-prepared model is producing an accurate result as per our requirement with
acceptable speed, then we deploy the model in the real system. But before deploying the
project, we will check whether it is improving its performance using available data or not.
The deployment phase is similar to making the final report for a project.
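For illustration, a minimal sketch of the Train Model and Test Model steps using scikit-learn; the bundled Iris dataset stands in for a real project dataset:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
X, y = load_iris(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)      # Train Model step
y_pred = model.predict(X_test)   # Test Model step
print("Accuracy:", accuracy_score(y_test, y_pred))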
5.Machine Learning:-
ML-based deep learning can simplify the task of crop breeding. Algorithms simply collect field data on plant behavior and use that data to develop a probabilistic model.
Crop yield prediction is another instance of machine learning in the agriculture sector. The technology amplifies decisions on what crop species to grow and what activities to perform during the growing season. Tech-wise, crop yield is used as a dependent variable when making predictions. The major factors include temperature, soil type, rainfall, and actual crop information. Based on these inputs, ML algorithms like neural networks generate the yield predictions.
Machine learning has also been used to detect plant and crop rows. Aerial images, taken by an Unmanned Aerial Vehicle (UAV), were processed to detect the plants present in the images. A Hough Transform (HT) approach was used to detect the orientation of the crop rows and rotate the images so that the rows became parallel to the x-axis. The result of applying different segmentation methods to the images was then used in estimating the positions of the plants, and the centroids of the detected plants were then computed. Each centroid was associated with a crop row, and centroids lying outside the rows were treated as outliers.
7.Python:-
Python is also being used for developing IoT devices. AI is assisting IoT in enabling real-time data analytics that help farmers make informed decisions. Precision agriculture or smart agriculture relies on emerging technologies such as AI, ML and data analytics.
8. Workflow Project:-
A workflow management system for crop prediction involves the coordination and automation of tasks and processes related to crop cultivation, monitoring, and prediction of yields. Here's a general outline of a typical workflow management system for crop prediction:-
Data Collection:-Data related to various factors that influence crop growth and yield, such as weather conditions, soil characteristics, historical crop data, and satellite imagery, are collected and integrated into the workflow management system. This data can be collected from field sensors, public weather services, and remote sensing sources.
Data Preprocessing:-The collected data is preprocessed to clean and transform it into a format suitable for analysis. This may involve data cleaning, normalization, aggregation, and feature engineering.
Data Analysis:- The preprocessed data is analyzed using various statistical and machine
learning techniques to identify patterns, trends, and correlations between different variables.
For example, machine learning algorithms such as decision trees, random forests, and neural
networks can be used to predict crop yields based on historical data and environmental
factors.
Crop Prediction:- Based on the analysis results, the workflow management system can
generate crop prediction models that can forecast crop yields for different crops and regions.
These models can be continuously updated with new data to improve their accuracy over
time.
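As a sketch of the Data Analysis and Crop Prediction steps, the snippet below trains a random forest on field data; the file name ('field_data.csv') and the column names (temperature, rainfall, soil_moisture, yield) are assumptions for illustration only:-
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
df = pd.read_csv("field_data.csv")                    # hypothetical file
X = df[["temperature", "rainfall", "soil_moisture"]]  # hypothetical feature columns
y = df["yield"]                                       # hypothetical target column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))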
Decision Support:- The workflow management system can provide decision support to
farmers by presenting them with insights and recommendations based on the crop prediction
models. For example, it can suggest optimal planting times, irrigation schedules, and
fertilization plans based on the predicted crop yields and current weather conditions.
Task Automation:-The workflow management system can automate various tasks related to crop cultivation, such as scheduling irrigation, applying fertilizers, and monitoring pest control, based on the predicted crop yields and environmental conditions. This can help reduce manual effort and improve efficiency.
Monitoring and Feedback:-The system can collect actual crop growth and yield data and compare it with the predicted results. This feedback loop allows for ongoing validation and refinement of the prediction models, and helps farmers fine-tune their management practices.
Reporting and Visualization:- The workflow management system can generate reports and
visualizations to provide farmers and other stakeholders with a clear understanding of the
crop prediction results, trends, and performance metrics. This can help farmers evaluate the
effectiveness of their crop management strategies and make data-driven decisions for future
seasons.
Integration with Crop Management Tools:-The workflow management system can be integrated with other crop management tools, such as farm management software and precision agriculture equipment.
Continuous Improvement:-The system can be continuously improved by incorporating new data sources, updating prediction models, and refining decision support algorithms based on feedback from farmers and other stakeholders. This iterative process helps ensure that the system remains accurate, reliable, and relevant over time.
9.Elbow Method:-
The Elbow Method is a commonly used technique in data science and machine learning to determine the optimal number of clusters or groups in a dataset. It can also be applied in crop prediction workflows that rely on clustering.
For each value of k, run the clustering algorithm and compute the sum of squared distances (SSE) of each data point to its centroid within each cluster. Plot the SSE values against the number of clusters k; the "elbow" point, where the curve begins to flatten, indicates the appropriate k.
The Elbow Method can help in optimizing the clustering process and improving the accuracy of crop prediction models by identifying the appropriate number of clusters or groups in the dataset. It can also aid in making informed decisions related to crop management and resource allocation.
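A minimal sketch of the method, using synthetic blob data as a stand-in for agricultural measurements:-
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # synthetic stand-in data
sse = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse.append(km.inertia_)  # SSE of points to their closest centroid
plt.plot(range(1, 11), sse, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("SSE (inertia)")
plt.show()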
10.Distribution of Agricultural Conditions:-
The distribution of agricultural conditions can vary greatly depending on various factors such as climate, soil type, topography, water availability, and human intervention. Here are some of these factors:-
Climate:-Climate is one of the most important factors influencing where farming is viable. For example, areas with moderate temperatures, adequate rainfall, and ample sunlight are often conducive to agriculture, while regions with harsh climates such as deserts or extreme cold are less suitable.
Soil type:- Soil type is another critical factor that influences agricultural conditions. Different
crops require different types of soils for optimal growth. For example, crops like rice and
cranberries thrive in acidic soils, while crops like wheat and corn prefer well-drained loamy
soils. Agricultural areas are often found in regions with fertile soils that provide essential nutrients for plant growth.
Topography:-Topography also influences agricultural conditions. Flat or gently sloping lands are generally more suitable for agriculture as they allow for easier irrigation and cultivation. Steep slopes or rugged terrains may pose challenges in terms of soil erosion, water runoff, and accessibility, which can limit agricultural activities.
Water availability:-Access to water is critical for agriculture. Regions with ample water
resources such as rivers, lakes, or groundwater reserves are often conducive to agriculture.
Irrigation systems are often developed in areas with limited rainfall to support crop growth.
In contrast, areas with limited water resources may face challenges in agricultural
production.
Human intervention:-Human intervention, such as irrigation systems, fertilizers, and crop management practices, can enhance agricultural productivity and expand the potential for agriculture in regions with suboptimal conditions. Human settlements and infrastructure, such as roads and markets, also play a role in supporting agricultural activity.
Understanding these factors is crucial for planning and managing agricultural activities and for predicting crop production.
11.Predictions of Crops:-
Predictions can be made about potential trends and factors that may impact crop production in the future. However, it's important to note that crop production depends on many variables, including climate conditions, technological advancements, economic factors, and policy changes, which can all influence crop yields. Unpredictable events, such as extreme weather or disease outbreaks, can also significantly impact crop yields. With these considerations in mind, some potential trends are outlined below.
Climate-resilient crops:-There may be a growing demand for climate-resilient crops that are adapted to changing weather patterns. Advances in plant breeding and genetic engineering may lead to the development of genetically modified crops that are better able to withstand extreme weather conditions, helping to ensure stable crop production in the future.
Vertical farming:-Vertical farming, which involves growing crops indoors in stacked layers using artificial lighting, may become more widespread due to its potential for year-round, resource-efficient production. Advances in LED lighting technology, automation, and data analytics may drive increased adoption of vertical farming, allowing for the cultivation of a wide variety of crops in urban environments.
Organic and regenerative agriculture:-There may be a growing demand for organic and regenerative agricultural practices that prioritize soil health, biodiversity, and ecosystem resilience. Consumer preferences for sustainably produced food may drive demand for crops grown using organic or regenerative practices, which can promote soil fertility, reduce chemical inputs, and enhance overall ecosystem resilience.
Precision agriculture:-Precision agriculture, which uses technologies such as drones, sensors, and data analytics to optimize crop management, may continue to gain momentum. Advancements in remote sensing, data analytics, and artificial intelligence may enable more precise irrigation, nutrient management, and pest control, resulting in improved crop yields, reduced input use, and enhanced sustainability.
Alternative protein crops:-As global demand for protein-rich foods continues to rise, there may be an increasing focus on alternative protein crops, such as legumes, insects, and algae. These crops are rich in protein and require fewer resources to produce compared to traditional animal agriculture.
Traditional and indigenous crops:-There may be renewed interest in traditional and indigenous crops that are well adapted to local climates and have genetic diversity. These crops may be seen as more resilient to changing environmental conditions.
Genetically modified crops:-Continued advances in biotechnology may lead to increased adoption of genetically modified crops with enhanced traits, such as resistance to pests, diseases, or environmental stress. However, the adoption of genetically modified crops may continue to be a topic of debate, with concerns about safety, environmental impact, and regulation.
It's important to note that these predictions are speculative and may be subject to change as
new technologies, policies, and environmental factors emerge. The future of crop production
will likely be shaped by a complex interplay of various factors, and careful monitoring and
adaptive management will be necessary to ensure sustainable and resilient crop production
systems.
12.Confusion Matrix:-
A confusion matrix, also known as an error matrix, is a commonly used evaluation metric in machine learning and data mining to assess the performance of a classification model. K-means, however, is an unsupervised clustering algorithm that does not inherently provide labels or ground truth for classification. Therefore, a confusion matrix cannot be used directly with K-means.
However, if you are interested in evaluating the performance of a classification model that is trained using K-means clustering as a feature extraction step, you can follow these steps to do so:-
Perform K-means clustering:-Use the K-means algorithm to cluster your data into K groups. The clusters obtained from K-means can be treated as pseudo-labels for your data.
Train a classifier:-Use the cluster assignments obtained from K-means as features and train a classification model, such as logistic regression, decision tree, or support vector machine (SVM), using a labeled dataset. The labeled dataset should have true class labels for each instance.
Make predictions:-Use the trained classifier to make predictions on a test dataset. The predicted class labels can be obtained from the output of the classifier.
Create a confusion matrix:-Compare the predicted class labels with the true class labels from the test dataset to create a confusion matrix. The confusion matrix will have rows representing the true class labels and columns representing the predicted class labels. The diagonal elements of the confusion matrix represent the number of correct predictions, while the off-diagonal elements represent misclassifications.
Calculate performance metrics:-Use the values in the confusion matrix to calculate various performance metrics such as accuracy, precision, recall, and F1 score, which provide insights into the classification performance of the model. Here's an example of how you can create a confusion matrix using K-means clustering as a feature extraction step in Python.
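A minimal sketch of these steps follows; it uses synthetic data from make_classification as a stand-in for a real crop dataset and appends the K-means cluster assignment as an extra feature:-
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
X, y = make_classification(n_samples=500, n_features=6, n_informative=4, random_state=0)
# Step 1: K-means clustering; cluster ids act as pseudo-labels / extra features
km = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X).reshape(-1, 1)
X_aug = np.hstack([X, cluster_ids])
# Step 2: train a classifier on the augmented features
X_train, X_test, y_train, y_test = train_test_split(X_aug, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Steps 3-4: predict on the test set and build the confusion matrix
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)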
A confusion matrix, also known as an error matrix, is a performance evaluation tool used in
machine learning and statistics to assess the accuracy of a classification model. It is a table
that displays the true positive (TP), true negative (TN), false positive (FP), and false negative
(FN) values for a set of predictions compared to the actual ground truth.
The matrix is laid out as follows (rows are the predicted classes, columns the actual classes):-
Predicted \ Actual | Positive | Negative
Positive | TP | FP
Negative | FN | TN
Each cell in the confusion matrix represents the count or percentage of instances that fall
into a specific category based on the model's predictions and the actual ground truth. The four categories are:-
True Positive (TP):-The number of instances that are actually positive and are correctly classified as positive.
True Negative (TN):-The number of instances that are actually negative and are correctly classified as negative.
False Positive (FP):-The number of instances that are actually negative but are incorrectly classified as positive.
False Negative (FN):-The number of instances that are actually positive but are incorrectly classified as negative.
The confusion matrix provides valuable insights into the performance of a classification model, allowing for the calculation of various performance metrics such as accuracy, precision, recall, F1 score, and specificity, which help in understanding the model's strengths and weaknesses. It is a useful tool for evaluating and fine-tuning machine learning models. A confusion matrix can be used with any classification model, such as logistic regression; it shows the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for a given set of predictions.
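These metrics follow directly from the four counts; a small sketch with made-up example counts:-
# Deriving common metrics from confusion-matrix counts (example numbers)
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return accuracy, precision, recall, f1, specificity
print(metrics(tp=40, tn=45, fp=5, fn=10))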
Here's an example of how you can create a confusion matrix using logistic regression in
Python:-
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
# Load your dataset
# X is the feature matrix, y is the target variable
X, y = load_your_dataset()  # placeholder: replace with your own data-loading code
# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the logistic regression model
logreg = LogisticRegression()
# Train the model
logreg.fit(X_train, y_train)
# Make predictions on the test set
y_pred = logreg.predict(X_test)
# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Extract values from the confusion matrix
tn, fp, fn, tp = cm.ravel()
# Print the confusion matrix
print("Confusion Matrix:")
print("True Negatives (TN):", tn)
print("False Positives (FP):", fp)
print("False Negatives (FN):", fn)
print("True Positives (TP):", tp)
# You can also visualize the confusion matrix using a heatmap
import seaborn as sns
import matplotlib.pyplot as plt
# Create a heatmap of the confusion matrix
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
In the example above, we first load our dataset and split it into train and test sets using
train_test_split from Scikit-learn. Then we initialize a logistic regression model, fit it to the
training data, and make predictions on the test data. We create a confusion matrix using
confusion_matrix from Scikit-learn, and then extract the values for TN, FP, FN, and TP from the confusion matrix. Finally, we print the values and visualize the confusion matrix using a heatmap.
13.Classification Report:-
Scikit-learn's classification_report function generates a report with metrics such as precision, recall, F1-score, and support for each class in a classification problem. You can interpret the report to assess the performance of your logistic regression model.
Here's an example of how you can generate a classification report for agriculture and
crop production using logistic regression. Please note that this is a hypothetical
example and the data and results are not based on actual data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
# Load the dataset (example data)
data = pd.read_csv('agriculture_dataset.csv')
# Split the data into features and target variable
X = data.drop('Crop_Type', axis=1) # Features
y = data['Crop_Type'] # Target variable
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict on the testing data
y_pred = model.predict(X_test)
# Generate the classification report
report = classification_report(y_test, y_pred)
# Print the classification report
print(report)
In this example, the classification_report function generates the report, which provides metrics such as precision, recall, F1-score, and support for each class in the target variable (Crop_Type in this case). The report gives an overview of the performance of the logistic regression model in predicting the crop type based on the given features.
14.Source Code:-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the crop dataset (file name assumed; columns: N, P, K, temperature, humidity, ph, rainfall, label)
data = pd.read_csv('agriculture_dataset.csv')

# Histograms of each feature, laid out on one 2x4 grid
plt.figure(figsize=(16, 8))
plt.subplot(2, 4, 1)
sns.histplot(data['N'], color="green")
plt.xlabel("Nitrogen")
plt.grid()
plt.subplot(2, 4, 2)
sns.histplot(data['P'], color="red")
plt.xlabel("P")
plt.grid()
plt.subplot(2, 4, 3)
sns.histplot(data['K'], color="yellow")
plt.xlabel("K")
plt.grid()
plt.subplot(2, 4, 4)
sns.histplot(data['ph'], color="blue")
plt.xlabel("PH")
plt.grid()
plt.subplot(2, 4, 5)
sns.histplot(data['temperature'], color="yellow")
plt.xlabel("temperature")
plt.grid()
plt.subplot(2, 4, 6)
sns.histplot(data['humidity'], color="green")
plt.xlabel("humidity")
plt.grid()
plt.subplot(2, 4, 7)
sns.histplot(data['rainfall'], color="blue")
plt.xlabel("rainfall")
plt.grid()
plt.show()

# Elbow method: compute within-cluster sum of squares (WCSS) for k = 1..10
from sklearn.cluster import KMeans
x = data.drop(['label'], axis=1)
x = x.values
wcss = []
for i in range(1, 11):
    km = KMeans(n_clusters=i, init="k-means++", max_iter=2000, n_init=10, random_state=0)
    km.fit(x)
    wcss.append(km.inertia_)
plt.plot(range(1, 11), wcss)
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS")
plt.show()

# Cluster the crops into 4 groups and list the crops falling in each cluster
km = KMeans(n_clusters=4, init="k-means++", max_iter=2000, n_init=10, random_state=0)
y_means = km.fit_predict(x)
a = data['label']
y_means = pd.DataFrame(y_means)
z = pd.concat([y_means, a], axis=1)
z = z.rename(columns={0: 'cluster'})
print("Cluster 1", z[z['cluster'] == 0]['label'].unique())
print("Cluster 2", z[z['cluster'] == 1]['label'].unique())
print("Cluster 3", z[z['cluster'] == 2]['label'].unique())
print("Cluster 4", z[z['cluster'] == 3]['label'].unique())

# Train a logistic regression classifier to predict the crop label
y = data['label']
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=0)
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000)  # higher iteration cap to help convergence
model.fit(x_train, y_train)

# Predict the crop for one sample: [N, P, K, temperature, humidity, ph, rainfall]
y_pred = model.predict(np.array([[40, 40, 40, 40, 100, 7, 200]]))
print(y_pred)

# Evaluate on the held-out test set
y_pred = model.predict(x_test)
from sklearn.metrics import classification_report
cr = classification_report(y_test, y_pred)
print(cr)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True)
print(cm)
Output:-Running the script prints the crops grouped into each of the four clusters, the predicted crop for the sample input, the classification report, and the confusion matrix, and displays the histogram, elbow, and heatmap plots.
15.Conclusion:-
In conclusion, machine learning has emerged as a promising tool for predicting crop yields
and improving agricultural practices. By leveraging large datasets and sophisticated
algorithms, machine learning models can analyze various factors such as weather patterns,
soil conditions, historical crop data, and management practices to make accurate yield predictions.
One key benefit of crop prediction using machine learning is its potential to optimize
agricultural practices. Farmers can use these predictions to make informed decisions about
planting schedules, irrigation, fertilization, and pest management, leading to more efficient
resource allocation and higher yields. Additionally, machine learning can help farmers
identify early warning signs of crop stress or disease outbreaks, allowing for timely intervention.
Machine learning in crop prediction also has the potential to contribute to sustainable
agriculture by optimizing resource use. For example, by predicting crop water requirements,
farmers can implement targeted irrigation strategies, minimizing water waste and conserving
this precious resource. Similarly, by predicting crop nutrient needs, farmers can apply
fertilizers more judiciously, reducing the risk of nutrient runoff and environmental pollution.
However, it's important to note that machine learning models for crop prediction are not
without limitations. Accurate predictions depend on the availability of reliable data, and in
many regions, data may be sparse or inconsistent. Additionally, machine learning models are
not immune to biases and may suffer from limitations in generalization, especially when
applied to different regions or crop varieties. Therefore, it's crucial to continue refining and validating these models.
In summary, machine learning has the potential to revolutionize crop prediction and
sustainable agriculture. However, ongoing research, data collection, and model validation
are necessary to ensure their reliability and effectiveness in real-world farming scenarios.
16.Future scope:-
The future scope of machine learning in crop prediction is promising and holds significant potential for revolutionizing agriculture and improving crop production. Here are some key areas where machine learning can play a significant role in the future:-
Precision agriculture:-Machine learning can analyze large amounts of farm data, including soil quality, weather patterns, pest and disease prevalence, and plant growth rates, to provide precise recommendations for planting, irrigation, fertilization, and pest control. This can optimize resource usage, reduce input costs, and increase crop yields.
Disease and pest management:-Machine learning algorithms can analyze data on crop diseases and pests and create predictive models that can help farmers anticipate disease outbreaks and pest infestations. This can enable early intervention and prevent crop losses, reducing the reliance on chemical pesticides and minimizing environmental impact.
Climate adaptation:-As growing conditions become more variable, machine learning can help farmers adapt by providing predictive models that take into account changing weather patterns, temperature fluctuations, and rainfall variability. This can enable farmers to make informed decisions about crop selection, planting times, and irrigation strategies.
Yield forecasting:-Machine learning models can analyze historical yield data, weather patterns, and other factors to create accurate crop yield forecasts. This can help farmers with crop planning, marketing, and financial decision-making.
Crop breeding:-Machine learning can support crop breeding by analyzing genetic data and identifying optimal combinations of traits for crop improvement. This can accelerate the development of new crop varieties with improved yield, disease resistance, and climate resilience.
Crop monitoring:-Machine learning can process remote sensing data, including satellite imagery, to monitor crop health and detect stressors such as nutrient deficiencies, water stress, and disease outbreaks. This can help farmers make data-driven management decisions.
Decision support systems:-Machine learning can power decision support systems that provide farmers with real-time recommendations and insights for crop management. These systems can integrate data from various sources and provide personalized recommendations for each farm.
In conclusion, machine learning has a bright future in crop prediction and agriculture, and it has the potential to significantly improve crop production, optimize resource usage, and support sustainable farming. Continued advances in algorithms, data collection, and analytics are expected to drive further innovation in this field.