G3 - Final Report
A PROJECT REPORT
Submitted by
AMREEN.R [211420243003]
BACHELOR OF TECHNOLOGY
IN
MARCH 2024
PANIMALAR ENGINEERING COLLEGE
(An Autonomous Institution, Affiliated to Anna University, Chennai)
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Dr. S. MALATHI, M.E., Ph.D. Dr.S.CHAKARAVARTHI,M.E.,Ph.D.
HEAD OF THE DEPARTMENT PROFESSOR
Certified that the above-mentioned student was examined in End Semester Project
Work (AD8811) viva-voce held on .
me.
AMREEN.R
ACKNOWLEDGEMENT
We would like to express our deep gratitude to our respected Secretary and
Correspondent Dr. P. CHINNADURAI, M.A., Ph.D. for his kind words and
KUMAR B.E., M.B.A., Ph.D., for providing us with the necessary facilities to
We also express our gratitude to our Principal Dr. K. MANI, M.E., Ph.D. who
We thank the Head of the AI&DS Department, Dr. S. MALATHI, M.E., Ph.D.,
and all the faculty members of the Department of AI&DS for their advice and
ABSTRACT
TABLE OF CONTENTS
CHAPTER NO. TITLE
ABSTRACT
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 Overview
1.2 Problem Definition
1.3 Objective
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
4. DATASETS
5. SYSTEM DESIGN
5.1 ER diagram
5.2 Class Diagram
6. SYSTEM ARCHITECTURE
6.2 Algorithm
7. EVALUATION METRICS
7.2 Accuracy
7.3 Recall
7.4 Precision
7.5 F1 Score
8. SYSTEM IMPLEMENTATION
9. SYSTEM TESTING
11. CONCLUSION
11.1 Conclusion
11.2 Future Work
12. APPENDICES
13. REFERENCES
LIST OF TABLES
LIST OF FIGURES
FIGURE NO. TITLE
3.1 System Architecture for proposed system
LIST OF ABBREVIATIONS
Abbreviation Meaning
AD Alzheimer’s Disease
MRI Magnetic Resonance Imaging
CNN Convolutional Neural Network
CN Cognitively Normal
MCI Mild Cognitive Impairment
EMCI Early Mild Cognitive Impairment
LMCI Late Mild Cognitive Impairment
MMSE Mini-Mental State Exam
PET Positron Emission Tomography
LLM Large Language Model
ADNI Alzheimer's Disease Neuroimaging Initiative
NLP Natural Language Processing
MLP Multilayer Perceptron
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
1.2 PROBLEM DEFINITION
The need for Alzheimer's detection is paramount given the profound impact of
this neurodegenerative disease on individuals, families, and society. Alzheimer's
is a progressive condition that gradually impairs cognitive function, memory, and
daily living activities, ultimately leading to severe disability and dependency.
Early detection of Alzheimer's is crucial as it allows for timely intervention and
management strategies to slow disease progression and improve quality of life for
affected individuals. Additionally, early diagnosis enables patients and their
families to make informed decisions regarding care planning, access support
services, and participate in clinical trials for potential treatments. Addressing
this problem requires a multifaceted approach that integrates various data
sources, including medical records, imaging studies, genetic information, and
cognitive assessments. The overarching goal is to construct predictive models
capable of reliably diagnosing Alzheimer's disease at its incipient stages, thus
facilitating timely intervention and personalized treatment strategies.
Furthermore, the problem statement includes the exploration of novel biomarkers,
risk factors, and disease mechanisms through advanced data analytics and
machine learning techniques. By addressing these challenges, the aim is to
enhance our understanding of Alzheimer's disease, improve early detection rates,
and ultimately mitigate its impact on individuals, families, and society as a whole.
1.3 OBJECTIVE
The primary objective of this project is to advance the diagnostic capabilities for
Alzheimer's disease (AD) by leveraging modern deep learning technology and
methodologies. Convolutional neural networks (CNNs) are used to process and
analyze the complex spatial information in MRI brain images, enabling the
extraction of subtle neuroanatomical changes associated with AD and thereby
facilitating early detection and intervention. Moreover, a generative chatbot
interface, built with the LangChain framework and powered by the Llama 2
language model, enhances the interaction between users, including physicians
and patients, and the diagnostic system. This interface supports real-time
communication, helping users interpret diagnostic results and receive immediate
feedback, which improves the overall diagnostic process. Furthermore, a
prediction mechanism aims to forecast disease progression, enabling proactive
management strategies. Additionally, a real-time news feed keeps users informed
of the latest advancements and findings in AD research. Overall, this project
represents a multidimensional approach to advancing Alzheimer's disease
diagnosis, combining deep learning with an interactive communication interface
and predictive capabilities to improve patient outcomes and clinical
decision-making.
CHAPTER 2
LITERATURE SURVEY
Tetiana Habuza et al. [1] survey the use of deep learning in analyzing brain scans
and cognitive tests for Alzheimer's disease (AD) detection. The survey discusses
the urgency of AD detection, the potential of deep learning in medical image
analysis, and the role of MRI and cognitive tests in AD diagnosis. It also
acknowledges limitations such as large data requirements and potential bias.
Cristina L. Saratxaga et al. [3] review current methods for diagnosing
Alzheimer's disease (AD), focusing on the limitations of traditional cognitive
tests and the role of Magnetic Resonance Imaging (MRI) in identifying brain
abnormalities. The review highlights recent advancements in deep learning,
particularly Convolutional Neural Networks (CNNs), and their potential for AD
diagnosis, while acknowledging challenges such as the need for large datasets and
potential biases in deep learning algorithms. Its aim is to establish the need for
improved AD diagnosis and to explore the potential of deep learning in MRI
analysis.
Nasir Rahim et al. [4] discuss advancements in Alzheimer's disease diagnosis and
prognosis using deep learning techniques, multimodal data fusion, longitudinal
analysis, and explainable AI methods. Researchers use datasets such as ADNI to
develop models that analyze MRI scans and identify patterns associated with AD
progression. Longitudinal studies track changes over time, while explainable AI
helps clinicians interpret model predictions. This integrated approach improves
early detection and the understanding of AD's underlying mechanisms.
Yan Zhao et al. [5] suggest that integrating PET/MR imaging with deep learning
(DL) can improve Alzheimer's disease diagnosis and treatment. Despite challenges
such as data heterogeneity and limited interpretability, DL has shown potential in
improving the efficiency and quality of AD imaging. The authors argue that future
research should address these challenges to fully exploit DL in AD diagnosis,
prognosis, and treatment strategies.
Naveen Sundar Gnanadesigan et al. [6] proposed the DC-GC model, a novel method
for identifying candidate genes associated with Alzheimer's disease (AD) using
network topology measures and machine learning techniques. The model ranks genes
based on their connectivity and physicochemical properties, and it outperforms
existing classifiers such as ANN, KNN, SVM, and decision trees in identifying
candidate genes. These results suggest that it can advance the understanding of
AD pathogenesis and facilitate the development of targeted therapeutic
interventions.
Zhen Zhao et al. [7] review various machine learning techniques for Alzheimer's
disease classification and prediction, including SVM, RF, CNN, autoencoders,
other deep learning models, and Transformers. The review discusses feature
extractors, input formats, and pre-processing techniques, addresses challenges
such as class imbalance and data leakage, and examines the trade-offs between
deep learning and conventional methods. It offers insights for addressing these
issues, exploring new techniques, and selecting appropriate input types for AD
diagnosis and prediction.
Pradnya Borkar et al. [8] aim to develop noninvasive and cost-effective methods for
early detection of Alzheimer's disease (AD). Brain characteristics are extracted
from MRI scans and used to train a model based on convolutional neural networks
and long short-term memory networks. The model is reported to outperform current
diagnostic methods, reaching an accuracy of 99.7% while remaining noninvasive
and cost-effective. This approach contributes to the growing literature on deep
learning for early detection and intervention in AD.
Daichi Shigemizu et al. [9] explore the genetic structure of late-onset
Alzheimer's disease (LOAD) using genome-wide association study data from
Japanese cohorts. Two distinct groups of LOAD patients were identified: one
exhibited LOAD risk genes and immune-related genes, while the other displayed
genes associated with kidney disorders. Impaired kidney function was identified as
a potential contributor to LOAD pathogenesis. The researchers also developed a
prediction model using a deep neural network, providing new insights into LOAD's
pathogenic mechanisms.
CHAPTER 3
SYSTEM ANALYSIS
Existing systems for Alzheimer's detection [2] and prediction use machine
learning and deep learning techniques to analyze complex patterns and biomarkers
associated with the disease. These systems use data from neuroimaging, genetic
markers, and clinical assessments to predict the risk of Alzheimer's onset or
progression, with the aim of enhancing early detection, prognosis, and
personalized treatment strategies, since manual diagnosis is error-prone and
time-consuming. The work in [2], for instance, proposes a deep learning-based
solution using the DenseNet-169 and ResNet-50 CNN architectures for the
diagnosis and classification of Alzheimer's disease. The model categorizes
Alzheimer's into Non-Dementia, Very Mild Dementia, Mild Dementia, and Moderate
Dementia stages. DenseNet-169 outperformed ResNet-50 in both the training and
testing phases, demonstrating potential for real-time analysis and classification
of Alzheimer's disease.
Data pre-processing is the stage that follows data collection. The information
gathered in the earlier phase comes from various sources and may not be in a
format that is suitable to work with. It may contain a variety of irregularities,
such as missing values, outliers, and redundant data, so each of these
irregularities in the dataset must be handled at this point. Unnecessary columns,
such as those that merely record the year, are therefore deleted. Finally, the
feature columns (X) and the target column (y) are defined, where y is the output
variable, also known as the dependent variable, and X stands for the input
variables, also known as the independent variables.
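As a rough illustration of these cleaning steps, the sketch below uses pandas to drop year-only columns, duplicate rows, and rows with missing values, and then separates X from y. The `preprocess` helper and the column names (`Age`, `Years_of_Education`, `MMSE`, `Year`, `Diagnosis`) are hypothetical and chosen only for the toy example; they are not the actual ADNI schema, and outlier handling is left out for brevity.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, target: str, year_columns: list):
    """Clean a raw tabular dataset and split it into features X and target y."""
    cleaned = (
        df.drop(columns=year_columns)  # drop columns that only record the year
        .drop_duplicates()             # remove redundant (duplicate) rows
        .dropna()                      # drop rows with missing values
    )
    X = cleaned.drop(columns=[target])  # independent (input) variables
    y = cleaned[target]                 # dependent (output) variable
    return X, y


# Hypothetical toy data; the column names are illustrative, not the ADNI schema
raw = pd.DataFrame({
    "Age": [65, 72, 72, 80, np.nan],
    "Years_of_Education": [20, 12, 12, 16, 14],
    "MMSE": [25, 29, 29, 21, 27],
    "Year": [2010, 2011, 2011, 2012, 2013],
    "Diagnosis": ["LMCI", "CN", "CN", "AD", "MCI"],
})
X, y = preprocess(raw, target="Diagnosis", year_columns=["Year"])
print(X.shape, list(y))
```

The duplicate CN row and the row with a missing age are removed, leaving three rows and three feature columns.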
process, enabling it to make predictions on new, unseen data.
The graphs below depict the relationships between the given parameters-
nonexistent, which is inappropriate. For this reason, standardization is
required. We apply the standardization formula to the observations in each of the
20 selected columns.
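The standardization referred to here is usually the z-score transformation z = (x − μ) / σ, where μ and σ are the mean and standard deviation of a column. A minimal NumPy sketch, assuming the population standard deviation (ddof = 0, the same convention used by scikit-learn's `StandardScaler`); the `standardize` helper and the toy matrix are illustrative only:

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Apply z = (x - mean) / std to every column of X (population std, ddof=0)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns centred but unscaled
    return (X - mean) / std


# Two toy columns (for example age and years of education) for three patients
X = np.array([[60.0, 20.0], [70.0, 24.0], [80.0, 28.0]])
Z = standardize(X)
print(Z.round(3))
```

After the transformation every column has mean 0 and standard deviation 1. A common variant computes μ and σ on the training split only and reuses them for the test split, so that test-set statistics do not leak into training.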
The dataset must be split into training and testing datasets as the next stage
in the data modeling process. The data can be divided in a variety of ways, for
example, by applying ratios like 80:20, 75:25, 70:30, etc. We selected a 70:30
ratio to divide the data for our model, meaning that 70% of the data will be used
for training and 30% for evaluating the model's performance.
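A 70:30 split like the one described can be produced with scikit-learn's `train_test_split`. In this sketch the data are random stand-ins, and `stratify=y` (which keeps the class proportions similar in both subsets) and `random_state=42` are our own choices rather than settings taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 samples, 20 standardized features, 5 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = np.repeat(np.arange(5), 20)

# 70:30 split; stratify keeps class proportions equal in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```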
Developing the model is one of the most crucial phases of our system. After
dividing the data into training and testing sets, we use a variety of supervised
learning methods to model the data. We applied the following techniques-
Logistic Regression
Random Forest Classifier
MLP Classifier
Linear Discriminant Analysis
Gradient Boosting Classifier
Convolutional Neural Network
Sequential model
The following sections describe the above algorithms and their workflow. All of
the implemented models are evaluated, and the model with the highest accuracy is
chosen to proceed with the deployment process.
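The selection step can be sketched as a loop that fits each candidate on the training split and keeps the one with the highest test accuracy. This is an illustration under several assumptions: it uses a synthetic dataset from `make_classification` in place of the ADNI features, mostly default hyperparameters, and only the scikit-learn models from the list (the CNN and Keras Sequential model would be trained separately).

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the tabular features (20 columns, 5 classes)
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Train every candidate and record its test-set accuracy
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Keep the model with the highest accuracy for deployment
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Selecting on a single test split is simple but can be noisy; cross-validated accuracy is a more stable basis for the comparison when data are limited.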
CHAPTER 4
DATASETS
Fig 4.1. Dataframe created from the ADNI dataset
Another dataset utilized in our study is the MRI scan dataset obtained from the
Alzheimer's Disease Neuroimaging Initiative (ADNI) database. This dataset
consists of MRI scans collected from participants diagnosed with various
cognitive states, including cognitively normal (CN), Alzheimer's disease (AD),
mild cognitive impairment (MCI), early mild cognitive impairment (EMCI), and
late mild cognitive impairment (LMCI). The source of this dataset is the ADNI
database, a longitudinal multicenter study aimed at understanding the progression
of Alzheimer's disease. The MRI scan dataset contains approximately 10,000
scans with detailed imaging features extracted, including measures of brain
volume, cortical thickness, and white matter integrity. Each scan is associated
with demographic information such as age, gender, and clinical assessments. The
dataset was initially provided in DICOM format and later transformed into a
structured format suitable for analysis. This dataset serves as a valuable resource
for studying the neuroanatomical changes associated with Alzheimer's disease
progression across different cognitive states.
Fig 4.2. Images from the MRI scans dataset
CHAPTER 5
SYSTEM DESIGN
5.1. ER DIAGRAM
Figure 5.1 depicts the relationship between the user, administrator, and the
Alzheimer's prediction and detection system. Users input personal data, while the
system utilizes pre-trained models to predict Alzheimer's risk. The administrator
monitors the process and accesses results, including risk levels and potential
indicators, facilitating early intervention strategies.
5.2 CLASS DIAGRAM
The following diagram depicts the class diagram of the Alzheimer diagnostic system-
Fig 5.2 shows the functions of each class used in our system. The classes are
home_page, prediction_page, alzheimer_detection, chat_bot, and news_page, and
each class has its own attributes. The prediction_page class predicts the
patient's Alzheimer's status using several characteristics. The
alzheimer_detection class checks whether the given MRI scan image of the patient
is healthy or not. The generative chatbot and the real-time news update classes
are additional features of the system.
CHAPTER 6
SYSTEM ARCHITECTURE
Fig 6.1 describes the working of the system architecture for the Alzheimer
Prediction and Detection System. The architecture involves a sequential process
that starts with data collection, followed by pre-processing to clean and
standardize the data. Exploration and visualization help to understand the
characteristics of the data before feature scaling ensures uniformity. The dataset
is then split into training and testing subsets for model training and evaluation.
Using machine learning algorithms, the model is built to predict and detect
Alzheimer's disease based on the processed data. Finally, the trained model is
deployed for practical use in clinical or research settings, aiding in early
diagnosis and intervention.
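One way to keep the scaling and modelling stages together, so that the deployed artifact applies exactly the same preprocessing as during training, is a scikit-learn `Pipeline`. The sketch below is hypothetical: synthetic data, logistic regression as the final step, and `pickle` for persistence are our choices, not details confirmed by the architecture above. Note that the pipeline fits the scaler on the training split only, whereas the architecture described scales before splitting.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the collected and cleaned data
X, y = make_classification(n_samples=300, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

# Feature scaling and model training chained into one estimator
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)  # evaluation on the held-out split
print("test accuracy:", round(accuracy, 3))

# "Deployment": serialize the fitted pipeline and load it back for predictions
blob = pickle.dumps(pipeline)
deployed = pickle.loads(blob)
print(deployed.predict(X_test[:5]))
```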
B). Generative Chatbot Support: The Generative Alzheimer's Chatbot
Support System is designed to provide personalized assistance and support for
individuals affected by Alzheimer's disease and their caregivers. Harnessing
generative chatbot technology, the system interacts with users in natural
language, offering empathetic and informative responses to queries related to
Alzheimer's symptoms, treatments, caregiving strategies, and emotional
support. The chatbot employs a deep learning-based generative model trained
on a diverse corpus of Alzheimer's-related information, ensuring accurate and
contextually relevant responses. Additionally, the system integrates sentiment
analysis capabilities to gauge the emotional state of users and tailor responses
accordingly, offering empathetic and supportive interactions. Through
continuous learning and refinement, the chatbot evolves to better understand
and address the unique needs and challenges faced by individuals living with
Alzheimer's disease and their caregivers. This system aims to provide accessible
and reliable support, helping to alleviate stress, improve caregiver well-being,
and enhance the overall quality of life for those affected by Alzheimer's disease.
the likelihood of Alzheimer's disease occurrence over time, taking into account
longitudinal data and temporal dynamics. By capitalizing on a combination of
demographic, genetic, cognitive, and personal information, the Predictive
Analysis System offers valuable insights into individualized risk assessment for
Alzheimer's disease, empowering healthcare professionals and individuals to
proactively manage and mitigate the impact of the disease through personalized
interventions and lifestyle modifications.
6.2 ALGORITHM
The main purpose of this system is to detect and predict the likelihood of
Alzheimer's disease in individuals and offer performance analysis. We have
employed several supervised learning algorithms tailored for classification
problems. These models are thoroughly evaluated, and the model that
demonstrates superior performance is chosen for deployment.
Fig. 6.2.1: Logistic Regression
4. Model training and model evaluation.
5. Parameter tuning.
6. Prediction on new data.
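Steps 5 and 6 (parameter tuning and prediction on new data) might look like the following for logistic regression. The grid of `C` values, 5-fold cross-validation, and the synthetic data are assumptions made for illustration; the project's actual tuning procedure is not described.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; X_new plays the role of unseen patients
X, y = make_classification(n_samples=300, n_features=20, random_state=2)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.30, random_state=2)

# Step 5: tune the inverse regularization strength C with 5-fold cross-validation
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)
print("best C:", search.best_params_["C"], "cv accuracy:", round(search.best_score_, 3))

# Step 6: predict on new, unseen data with the best model (refitted on all training data)
predictions = search.predict(X_new)
print(predictions[:10])
```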
CHAPTER 7
EVALUATION METRICS
The performance of our model was assessed using a number of evaluation metrics.
This section discusses these common evaluation metrics.
They are computed from the four values of the confusion matrix, namely True
Positive, True Negative, False Positive and False Negative, where-
True Positive (TP) = Observation is positive and is predicted to be positive.
False Negative (FN) = Observation is positive but is predicted to be negative.
True Negative (TN) = Observation is negative and is predicted to be negative.
False Positive (FP) = Observation is negative but is predicted to be positive.
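To make the four definitions concrete, the hypothetical helper below tallies TP, FN, TN and FP for a binary problem by comparing each true label with its prediction. The labels are toy values, and label `1` is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN and FP for a binary problem with the given positive label."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # positive observation, predicted positive
            else:
                fn += 1  # positive observation, predicted negative
        else:
            if predicted == positive:
                fp += 1  # negative observation, predicted positive
            else:
                tn += 1  # negative observation, predicted negative
    return tp, fn, tn, fp


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # → (3, 1, 3, 1)
```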
The confusion matrix for the Logistic Regression model can be given as-
The confusion matrix for the Ridge Classifier model can be given as-
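7.2 Accuracy
Accuracy is the proportion of all observations that are classified correctly.
Accuracy can be calculated as:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$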
7.3 Recall
Recall is defined as the number of correctly classified positive observations
divided by the total number of actual positive observations.
Recall can be calculated as:
$$\text{Recall} = \frac{TP}{TP + FN}$$
7.4 Precision
Precision can be defined as the total number of correctly classified positive observations
divided by the total number of predicted positive observations.
Precision can be calculated as:
$$\text{Precision} = \frac{TP}{TP + FP}$$
7.5 F1 Score
The F1 score is the harmonic mean of precision and recall. It provides a balanced
measure of a model's performance, especially when dealing with imbalanced datasets
where one class is much more frequent than the other.
F1 Score can be calculated using the following formula:
$$F_1 = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
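As a worked example of the formulas above, the sketch computes all four metrics from hypothetical counts (TP = 40, TN = 45, FP = 5, FN = 10; these are not results from this project). With these counts, accuracy is 85/100 = 0.85, recall is 40/50 = 0.80, precision is 40/45 ≈ 0.889, and F1 = 2TP / (2TP + FP + FN) = 80/95 ≈ 0.842.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, recall, precision and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of actual positives found
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of predicted positives that are correct
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)             # harmonic mean of precision and recall
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```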
CHAPTER 8
SYSTEM IMPLEMENTATION
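# Import OpenCV for image loading and matplotlib for display
import os
import cv2
import matplotlib.pyplot as plt
# Lists that store the grayscale images and their class labels
images = []
labels = []
# Hypothetical folder layout: DATA_DIR/<class name>/<image file>
DATA_DIR = "dataset"
for class_name in sorted(os.listdir(DATA_DIR)):
    class_dir = os.path.join(DATA_DIR, class_name)
    for file_name in sorted(os.listdir(class_dir)):
        img = cv2.imread(os.path.join(class_dir, file_name), cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue  # skip files OpenCV cannot read
        images.append(cv2.resize(img, (200, 200)))
        labels.append(class_name)
# Sample Images: display 20 scans in a 4 x 5 grid
plt.figure(figsize=(15, 15))
for i in range(20):
    plt.subplot(4, 5, i + 1)
    plt.imshow(images[10 + i * 3], cmap="gray")
    plt.axis("off")
plt.show()
# Shape of the Images
print(f"Shape of each image is = {images[1000].shape}")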
# MODEL LAYERS
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
model = Sequential()
# Convolutional Layers: 25 and 75 filters of size 3x3 on 200x200 grayscale input
model.add(Conv2D(25, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu',
                 input_shape=(200, 200, 1)))
model.add(Conv2D(75, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Fully connected layers with dropout for regularization
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(250, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(100, activation='relu'))
# Output layer: softmax over the 5 diagnostic classes
model.add(Dense(5, activation='softmax'))
model.summary()
# MODEL COMPILE
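import streamlit as st
from PIL import Image

# Main function for Streamlit app
def main():
    st.title("Alzheimer's Disease Prediction")
    st.write("Upload an image to predict the result")
    # File uploader
    uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg"])
    if uploaded_file is not None:
        # Open the uploaded scan and display it
        image = Image.open(uploaded_file)
        st.image(image, caption="Uploaded image")
        # Predict result (predict_alzheimer is defined elsewhere in the project)
        result = predict_alzheimer(image)
        st.write("Predicted class:", result)

if __name__ == "__main__":
    main()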
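The Streamlit code above relies on a `predict_alzheimer` function that is not shown. A possible shape for it is sketched below under several assumptions: the model is a Keras-style object whose `predict` method takes a `(1, 200, 200, 1)` array, the class order is alphabetical (as scikit-learn's `LabelEncoder` would produce), and pixels are scaled to [0, 1]. Unlike the one-argument call in `main()`, this hypothetical version takes the model as a second parameter so it can be exercised with a stand-in model instead of TensorFlow; `DummyModel` exists purely for demonstration.

```python
import numpy as np
from PIL import Image

# Assumed class order: alphabetical, as LabelEncoder would produce during training
CLASS_ORDER = ["AD", "CN", "EMCI", "LMCI", "MCI"]
CLASS_NAMES = {
    "AD": "Alzheimer's Disease",
    "CN": "Cognitively Normal",
    "EMCI": "Early Mild Cognitive Impairment",
    "LMCI": "Late Mild Cognitive Impairment",
    "MCI": "Mild Cognitive Impairment",
}


def predict_alzheimer(image: Image.Image, model) -> str:
    """Hypothetical helper: preprocess an uploaded MRI slice and return a class name."""
    gray = image.convert("L").resize((200, 200))      # match the CNN input size
    x = np.asarray(gray, dtype="float32") / 255.0      # scale pixels to [0, 1]
    x = x.reshape(1, 200, 200, 1)                      # add batch and channel axes
    probabilities = model.predict(x)[0]                # softmax output with 5 values
    return CLASS_NAMES[CLASS_ORDER[int(np.argmax(probabilities))]]


class DummyModel:
    """Stand-in for the trained CNN that always favours the second class (CN)."""

    def predict(self, x):
        self.last_shape = x.shape
        return np.array([[0.05, 0.80, 0.05, 0.05, 0.05]])


dummy = DummyModel()
print(predict_alzheimer(Image.new("RGB", (256, 256)), dummy))  # → Cognitively Normal
```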
CHAPTER 9
SYSTEM TESTING
In this chapter, the results are checked for inputs provided by the user.
[1] Input
Upload an image (CN image)
[1] Output
Predicted Class: Cognitively Normal
[2] Input
Age: 65
Gender: Male
Years of Education: 20
Ethnicity: Hisp/Latino
Race Category: White
APOE Allele Type: APOE4_0
APOE4 Genotype: 2,2
Imputed Genotype: True
MMSE Score: 25
[2] Output
Detected Class: Late Mild Cognitive Impairment (LMCI)
CHAPTER 10
In this study, logistic regression was employed to predict the presence or absence
of Alzheimer's disease based on a comprehensive set of features including patient
information, demographics, ethnicity, and genetic information.
                Precision   Recall   F1-score   Support
Accuracy                               0.87         53
Macro avg         0.86       0.86      0.86         53
Weighted avg      0.87       0.87      0.87         53
The above table represents the classification report for the Logistic Regression model.
The model achieved an accuracy of 87%, indicating a substantial level of
predictive capability. This suggests that the variables included in the model are
informative in distinguishing between individuals with and without Alzheimer's
disease.
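The macro and weighted averages in such a report differ only in how the per-class scores are combined: the macro average is the plain mean over classes, while the weighted average weights each class by its support (its number of test samples). The small sketch below uses hypothetical per-class F1 scores of 0.90 and 0.60 with supports of 40 and 10, not values from this report, giving a macro average of 0.75 and a weighted average of (0.90 × 40 + 0.60 × 10) / 50 = 0.84.

```python
def macro_and_weighted(scores, supports):
    """Return (macro average, support-weighted average) of per-class scores."""
    macro = sum(scores) / len(scores)  # every class counts equally
    weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)  # larger classes count more
    return macro, weighted


# Hypothetical per-class F1 scores and supports (not the report's values)
macro, weighted = macro_and_weighted([0.90, 0.60], [40, 10])
print(round(macro, 2), round(weighted, 2))
```

When the two averages are close, as in the table above (0.86 and 0.87), performance does not depend strongly on class size.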
Our approach involved the utilization of a Convolutional Neural Network (CNN)
model, constructed using the Keras deep learning API, to analyze and classify
brain images associated with Alzheimer's disease progression. We preprocessed
the images using OpenCV, a computer vision library, to enhance their quality and
extract relevant features. Subsequently, we employed scikit-learn, a machine
learning library, to split the dataset into training and testing sets, ensuring robust
evaluation of the model's performance.
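The preprocessing and splitting steps described here can be sketched with NumPy and scikit-learn. This is a hedged illustration rather than the project's code: the `prepare_cnn_data` helper, the random stand-in images, scaling pixels to [0, 1], and the stratified 70:30 split are assumptions, while the 200 × 200 single-channel shape matches the `input_shape` of the CNN in Chapter 8.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def prepare_cnn_data(images, labels, test_size=0.30, seed=42):
    """Turn grayscale images and string labels into CNN-ready train/test arrays."""
    X = np.stack(images).astype("float32") / 255.0   # scale pixels to [0, 1]
    X = X[..., np.newaxis]                            # (n, 200, 200) -> (n, 200, 200, 1)
    encoder = LabelEncoder()
    y = encoder.fit_transform(labels)                 # class names -> integer ids
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    return X_train, X_test, y_train, y_test, encoder.classes_


# Tiny random stand-in for the OpenCV-loaded MRI slices
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(200, 200), dtype=np.uint8) for _ in range(20)]
labels = ["AD", "CN", "EMCI", "LMCI", "MCI"] * 4
X_train, X_test, y_train, y_test, classes = prepare_cnn_data(images, labels)
print(X_train.shape, X_test.shape, list(classes))
```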
Upon evaluation, the model achieved an accuracy of 98% in classifying the
different stages of cognitive impairment. This indicates that the model can
discern subtle variations in brain imaging patterns indicative of Alzheimer's
disease progression.
GenAI models are evaluated on four benchmark datasets: ARC (25-shot), HellaSwag
(10-shot), MMLU (5-shot), and TruthfulQA (0-shot). Models are ranked by their
average score across these benchmarks, which gives a holistic view of their
capabilities.
The following table is the accuracy report of the chatbot-
CHAPTER 11
CONCLUSION
11.1 CONCLUSION
11.2 FUTURE WORK
CHAPTER 12
APPENDICES
Screenshot of Alzheimer Prediction page:
Screenshot of AlzNewsFeed:
CHAPTER 13
REFERENCES