MLP Proj
MACHINE LEARNING
A PROJECT REPORT
Submitted by
BACHELOR OF TECHNOLOGY IN
BONAFIDE CERTIFICATE
Certified that this project report Bank Customer churn prediction using machine
This is to further certify to the best of my knowledge that this project has not been
SIGNATURE
Mrs.P.ANURADHA
Assistant professor
Certified that the above-mentioned project has been duly carried out as per the
norms of the college and statutes of the university.
SIGNATURE
Dr. P. SUBRAT KUMAR
Associate professor
SIGNATURE
DR.P.A. SUNNY DAYAL
Dean Associate professor
HEAD OF THE DEPARTMENT / DEAN OF THE SCHOOL
Professor of Computer Science and Engineering
DEPARTMENT SEAL
ACKNOWLEDGEMENTS
I thank Prof. Dr. Subrat Kumar Parida, Head of the Department of Computer
Science and Engineering, SoET, Vizianagaram Campus for extending his support
during the course of this investigation.
I thank Dr. P. A. Sunny Dayal, Dean of SoET, Vizianagaram Campus for their
invaluable guidance, insightful feedback, and continuous support throughout the course of
this project. Your expertise and mentorship have been invaluable.
I thank Dr. P. Pallavi, Registrar, CUTM, Vizianagaram Campus for their assistance and
cooperation in facilitating the necessary resources and administrative support essential for
the successful execution of this project.
I am sincerely grateful to each one of you for your contributions, guidance, and
unwavering support, without which this project would not have been possible.
DECLARATION
We hereby declare that the work described in this project work, entitled " BANK
The work is original and has not been submitted for any degree of this or any other
university.
Submitted by,
P. TEJESH (221801370066)
Y. CHANDU (221801370076)
V.SRAVAN KUMAR (221801370037)
ABSTRACT
Customer churn, or the rate at which customers cease doing business with a company, is a
critical concern for banks, as retaining existing customers is often more cost-effective than
acquiring new ones. In this study, we apply machine learning techniques to predict
customer churn in the banking sector. We explore various features such as demographic
information, transaction history, and customer interactions to develop predictive models.
Specifically, we employ algorithms including logistic regression, random forest, and
gradient boosting machines to build and evaluate the models. Additionally, we investigate
the impact of feature engineering, hyperparameter tuning, and model interpretability
techniques on the performance and interpretability of the models. Our findings
demonstrate the effectiveness of machine learning in identifying customers at risk of
churn, enabling proactive retention strategies and ultimately contributing to improved
customer satisfaction and business profitability in the banking industry.
TABLE OF CONTENTS
Chapter 1 Introduction
2.3 Algorithm
3.2 Modules
5.2 Coding
Conclusion
Future Scope
Bibliography
References
List of Diagrams
List of Tables
The heart is a muscular organ which pumps blood into the body and
is the central part of the body's cardiovascular system, which also contains the lungs,
veins, arteries, and capillaries. These blood vessels deliver blood all over the body.
Abnormalities in normal blood flow from the heart cause several types of heart
disease, which are the main reasons for death worldwide. According to a survey by the
World Health Organization (WHO), 17.5 million total global deaths occur because
of heart attacks and strokes. More than 75% of deaths from cardiovascular diseases
(CVDs) occur in middle-income and low-income countries. Also, 80% of the deaths
that occur due to CVDs are because of stroke and heart attack.
Therefore, predicting cardiac abnormalities at an early stage, together with tools for
predicting heart disease, can save many lives and help doctors design an
effective treatment plan, which ultimately reduces the mortality rate due to
analyzing big data from an assorted perspective and encapsulating it into useful
patients etc. are generated by healthcare industries. Data mining provides a number
patterns can be utilized for healthcare diagnosis. However, the available raw
medical data are widely distributed, voluminous and heterogeneous in nature. This
data needs to be collected in an organized form. This collected data can be then
significant role in data mining. This paper analyzes the heart disease predictions
using classification algorithms. These invisible patterns can be utilized for health
The primary goal of this study is to develop a heart disease prediction system. The
system can find information related to heart disease from the historical heart disease data
set to implement a classifier that classifies the disease according to the
input of the user and reduces the cost of medical tests. The scope of the
problems.
SYSTEM ANALYSIS
experience rather than on the knowledge rich data hidden in the database. This
practice leads to unwanted biases, errors and excessive medical costs which affects
the quality of service provided to patients. There are many ways that a medical
misdiagnosis of a serious illness can have very extreme and harmful effects. The
National Patient Safety Foundation cites that 42% of medical patients feel they have
negligently given the back seat for other concerns, such as the cost of medical tests,
drugs, and operations. Medical Misdiagnoses are a serious risk to our healthcare
profession. If they continue, then people will fear going to the hospital for
treatment. We can put an end to medical misdiagnosis by informing the public and
Disadvantages:
Any error made by the doctor or hospital staff in predicting could lead to
fatal incidents.
the patient to find out if he/she has any chances to get heart disease in future.
2.2 PROPOSED SYSTEM
This section gives an overview of the proposed system and illustrates
all of the components, techniques and tools that are used for developing the entire
an efficient software tool is needed in order to train huge datasets and compare
multiple machine learning algorithms. After choosing the robust algorithm with best
the smart phone-based application for detecting and predicting heart disease risk
level.
2.3 ALGORITHMS
the form of probabilities of an event occurring, i.e. the probability of y=1, given
certain values of input variables x. Thus, the results of LogR range between 0 and 1.
LogR models the data points using the standard logistic function, which
is an S-shaped curve also called the sigmoid curve and is given by the equation:
σ(z) = 1 / (1 + e^(-z)), where z is a linear combination of the input variables x.
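The mapping from input values to a probability between 0 and 1 can be illustrated with a minimal sketch. It assumes a single input variable (age) and hypothetical coefficients chosen only for illustration; they are not fitted to the heart dataset, and the function names are not from the report.

```python
import math

def sigmoid(z: float) -> float:
    """Standard logistic function: maps a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x: float, intercept: float, coef: float) -> float:
    """P(y = 1 | x) for a logistic regression model with one input variable."""
    return sigmoid(intercept + coef * x)

# Hypothetical coefficients for illustration only (not learned from data).
INTERCEPT, COEF = -4.0, 0.08

for age in (30, 50, 70):
    p = predict_proba(age, INTERCEPT, COEF)
    label = 1 if p >= 0.5 else 0  # 0.5 is the common default decision threshold
    print(f"age={age}: P(y=1)={p:.3f} -> class {label}")
```

A fitted model would learn the intercept and coefficients from training data; the threshold of 0.5 can be moved if false negatives are more costly than false positives.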
Even though logistic (logit) regression is frequently used for binary variables
(2 classes), it can be used for categorical dependent variables with more than 2
classification as well as regression. However, it is mainly used for classification
problems. A forest is made up of trees, and more trees mean a more
robust forest. Similarly, random forest creates decision trees on data samples,
gets the prediction from each of them, and finally selects the best solution by
means of voting. It is an ensemble method which is better than a single decision tree
Next, this algorithm will construct a decision tree for every sample. Then it
At last, select the most voted prediction result as the final prediction result.
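The sample-then-vote procedure described above can be sketched in plain Python. This is a simplified illustration, not scikit-learn's RandomForestClassifier: each "tree" is a one-split decision stump rather than a full decision tree, and the toy rows (age, cholesterol), labels, and function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, n_candidate_features, rng):
    """Fit a one-split 'tree' (decision stump) using a random subset of features."""
    features = rng.sample(range(len(rows[0])), n_candidate_features)
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train one stump per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], 1, rng))
    return forest

def predict(forest, row):
    """Collect one vote per tree and return the most voted class."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (age, cholesterol) -> 1 means heart disease risk.
rows = [(35, 180), (40, 190), (45, 200), (38, 185),
        (50, 240), (60, 260), (65, 280), (70, 300)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict(forest, (30, 170)), predict(forest, (72, 310)))
```

Using an odd number of trees avoids ties in a two-class vote; a real random forest grows deeper trees and usually considers several candidate features at each split.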
REQUIREMENT HARDWARE
Processor - 1.6 GHz or Faster Processor
RAM - 8 GB or above
Hard disk
System Design
3.2 MODULES
They are:
a. Data Pre-processing
b. Feature
c. Classification
d. Prediction
Heart attack prediction using machine learning involves the application of various
make predictions
or feature importance analysis are used to select relevant features from the
language.
Scikit-learn: This library provides simple and efficient tools for data
Pandas: Used for data manipulation and analysis. Install it using pip:
Matplotlib and seaborn: These libraries are used for data visualization.
4.3 MANUAL
Prediction: Input the medical data of a new individual into the trained
Classifier”.
a. Train data
5.2 CODING
Sample code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
heart_df=pd.read_csv("heartnew.csv")
heart_df
heart_df.head()
heart_df.isnull()
heart_df.isnull().sum()
heart_df.info()
dict_names = {
'age': 'Age',
'sex': 'Sex',
'cp': 'Chest_Pain',
'trtbps': 'Resting_Pressure',
'chol': 'Cholesterol',
'fbs': 'Fasting_Blood_Sugar',
'restecg': 'Resting_Ecg_Results',
'thalachh': 'Maximum_Heart_Rate',
'exng': 'Exercise_Induced_Angina',
'oldpeak': 'Old_Peak',
'slp': 'Slope',
'caa': 'Major_Vessels',
'thall': 'Thallium_Rate',
'output': 'Target'
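}
# Apply the descriptive column names used in the rest of the code
heart_df.rename(columns=dict_names, inplace=True)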
heart_df.shape
heart_df.info()
heart_df.describe()
heart_df.duplicated().sum()
heart_df.drop_duplicates(inplace=True)
heart_df.shape
heart_df['Target'].value_counts()
x = heart_df.drop(columns='Target')
plt.figure(figsize=(13,4))
age_counts = heart_df['Age'].value_counts().sort_index()
plt.bar(age_counts.index, age_counts.values,
color=plt.cm.viridis(np.linspace(0, 1, len(age_counts))))
plt.title('Frequency Of Each Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
plt.figure(figsize=(13,4))
sns.histplot(heart_df['Maximum_Heart_Rate'], bins=20, kde=True,
color='pink')
plt.title('Distribution of Maximum Heart Rate')
plt.xlabel('Maximum Heart Rate')
plt.ylabel('Frequency')
plt.show()
# Count of patients by sex
plt.figure(figsize=(7,4))
sns.countplot(x='Sex', data=heart_df, palette='Set1')
plt.title('Distribution of Sex')
plt.xlabel('Sex (0 = Female, 1 = Male)')
plt.ylabel('Count')
plt.show()
plt.figure(figsize=(8, 6))
chest_pain_counts = heart_df['Chest_Pain'].value_counts().sort_index()
colors = plt.cm.viridis(np.linspace(0, 1, len(chest_pain_counts)))
ax = chest_pain_counts.plot(kind='bar', width=0.9, color=colors)
ax.set_title('Chest Pain Levels Frequency')
ax.set_xlabel('Chest Pain Level')
ax.set_ylabel('Frequency')
plt.show()
plt.figure(figsize=(13, 4))
sns.histplot(heart_df['Resting_Pressure'], kde=True, color='skyblue')
plt.title('Distribution of Resting Pressure')
plt.xlabel('Resting Pressure')
plt.ylabel('Frequency')
plt.show()
plt.figure(figsize=(13, 4))
sns.histplot(heart_df['Cholesterol'], kde=True, color='red')
plt.title('Distribution of Cholesterol')
plt.xlabel('Cholesterol Level')
plt.ylabel('Frequency')
plt.show()
x = heart_df.drop(columns=['Target'])
y = heart_df['Target']
x.shape
y.shape
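from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Standardize the features to zero mean and unit variance
scaler = StandardScaler()
x = scaler.fit_transform(x)
# The original split is not shown; the 80/20 ratio and random_state are assumptions
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
# Logistic regression
logistic_model = LogisticRegression()
logistic_model.fit(x_train, y_train)
y_pred = logistic_model.predict(x_test)
print('Logistic Regression accuracy:', accuracy_score(y_test, y_pred))
# Random forest
rf_model = RandomForestClassifier()
rf_model.fit(x_train, y_train)
y_pred = rf_model.predict(x_test)
print('Random Forest accuracy:', accuracy_score(y_test, y_pred))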
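To compare two classifiers, their predictions on the same test set can be scored with accuracy and a confusion count. The following is a minimal plain-Python sketch of those two measures; the label lists are made-up placeholders, not results from this project, and the function names are illustrative.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_counts(y_true, y_pred):
    """Counts keyed by (true label, predicted label)."""
    return Counter(zip(y_true, y_pred))

# Made-up labels for illustration only; these are NOT the project's results.
y_test   = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_a = [1, 0, 0, 1, 0, 1, 1, 0]  # predictions of a hypothetical model A
y_pred_b = [1, 0, 1, 1, 0, 1, 1, 0]  # predictions of a hypothetical model B

for name, y_pred in (("A", y_pred_a), ("B", y_pred_b)):
    counts = confusion_counts(y_test, y_pred)
    print(f"model {name}: accuracy={accuracy(y_test, y_pred):.3f}, "
          f"false negatives={counts[(1, 0)]}, false positives={counts[(0, 1)]}")
```

In a medical setting false negatives (a patient at risk predicted as healthy) usually matter more than overall accuracy, so it helps to report both.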
In this project, we introduce a heart disease prediction system that uses
different classification techniques for the prediction of heart disease. The techniques
are Random Forest and Logistic Regression. We have analyzed that the Random
irrelevant attributes from the dataset and only picking those that
As illustrated earlier, the system can be used as a clinical assistant by any
clinician.
The disease prediction through risk factors can be hosted online, so any
internet user can access the system through a web browser and understand the risk
of heart disease. The proposed model can be implemented for any real-time
application. Using the proposed model, other types of heart disease can also be
disease can be identified. Other health care systems can be formulated using this
proposed model in order to identify the diseases in the early stage. The proposed
implement it in real time. The proposed model has wide area of application like
called Random Forest and AdaBoost. By ensembling these two algorithms we will
achieve high performance.
BIBLIOGRAPHY
heart disease prediction based on ensemble deep learning and feature fusion,"
2020.
2016.
Techniques," 2015.
algorithm," 2015.
https://www.kaggle.com/code/kanncaa1/heart-attack-analysis-prediction
https://www.youtube.com/watch?v=tSBAag6lAQo