Machine Learning-Based Diabetes Detection Using Photoplethysmography Signal Features

A Preprint
Filipe A. C. Oliveira, filipe.acoliveira@hc.fm.usp.br
Felipe M. Dias, f.dias@hc.fm.usp
Marcelo A. F. Toledo, marcelo.arruda@hc.fm.usp.br
Diego A. C. Cardenas, diego.cardona@hc.fm.usp.br
Douglas A. Almeida, douglas.andrade@hc.fm.usp.br
Estela Ribeiro, estela.ribeiro@hc.fm.usp.br
Jose E. Krieger, j.krieger@hc.fm.usp.br
Marco A. Gutierrez, marco.gutierrez@incor.usp.br

Heart Institute (InCor), Clinics Hospital University of Sao Paulo Medical School, Sao Paulo - SP - Brazil

arXiv:2308.01930v1 [cs.LG] 2 Aug 2023

Abstract
Diabetes is a prevalent chronic condition that compromises the health of millions of people worldwide. Minimally invasive methods are needed to prevent and control diabetes, but most devices for measuring glucose levels are invasive and not amenable to continuous monitoring. Here, we present an alternative method to overcome these shortcomings, based on non-invasive optical photoplethysmography (PPG), for detecting diabetes. We classify non-Diabetic and Diabetic patients by training Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost) algorithms on PPG signal features and metadata. We used PPG signals from a publicly available dataset. To obtain a robust performance estimate and guard against overfitting, we divided the data into five folds for cross-validation. By ensuring that patients in the training set are not in the testing set, the model's performance can be evaluated on unseen subjects' data, providing a more accurate assessment of its generalization. Our models achieved an F1-score and AUC of 58.8 ± 20.0% and 79.2 ± 15.0% for LR, and 51.7 ± 16.5% and 73.6 ± 17.0% for XGBoost, respectively. Feature analysis suggested that PPG morphological features contain diabetes-related information alongside metadata. Our results are within the range reported in the literature, indicating that machine learning methods are promising for developing remote, non-invasive, and continuous measurement devices for detecting and preventing diabetes.
Keywords Photoplethysmography · Wearable devices · Diabetes.

1 Introduction
Diabetes is a chronic disease characterized by insufficient insulin release by the pancreas or a lack of tissue response to insulin, leading to elevated blood glucose levels (BGL) DeFronzo et al. [1992]. When undetected or untreated, this elevated glucose concentration may lead to several vascular complications, such as kidney and heart diseases. Diabetes management includes preventing such complications through regular BGL self-monitoring Kirk and Stegner [2010].
Current methods for assessing a patient's BGL are either invasive, such as the fasting plasma glucose test, which requires blood sample collection by a healthcare professional, or minimally invasive, such as finger-prick blood tests and continuous glucose monitors (CGM) for self-monitoring at home. However, all of these methods require skin perforation, and only CGM provides continuous measurements. Due to the high cost, discomfort, and risks associated with CGM, frequent BGL monitoring is not widely adopted. A non-invasive approach to detecting diabetes would be valuable for identifying early stages of the disease in healthy and pre-Diabetic subjects, contributing to global efforts to prevent it and helping to monitor therapeutic interventions LaMonte et al. [2005].
Photoplethysmography (PPG) is one of the non-invasive approaches that have been studied for the detection of diabetes and pre-diabetes Zanelli et al. [2022]. PPG is an optical technique for measuring changes in blood volume in the microvascular bed of tissue. It typically involves emitting light onto the tissue and measuring changes in light absorption, reflection, or scattering to determine changes in blood volume Mejía-Mejía et al. [2022]. Moreover, certain molecules in the blood can be quantified by using light at wavelengths near their absorption peaks Mukkamala et al. [2022]; the canonical case is pulse oximetry, which estimates oxygen saturation (SpO2). Beyond this, the PPG waveform can be used to measure other important cardiovascular and respiratory parameters, such as heart rate (HR), respiration rate, and blood pressure Nitzan and Ovadia-Blechman [2022].
PPG can also serve as an indirect measure of a subject's hemodynamics, conveying information about the condition of the blood vessels, such as arterial stiffness, which has been found to be elevated in individuals with diabetes Pilt et al. [2013]. Some works suggest that Diabetic subjects have a less prominent dicrotic notch Spigulis et al. [2002], although these analyses included only a small number of subjects. Other work shows that the area under a PPG pulse decreases as the HbA1c (glycated hemoglobin) level increases Usman et al. [2011]. HbA1c provides an estimate of a person's average blood glucose level over the past two to three months and is used for diabetes diagnosis and management.
An advantage of PPG-based methods for assessing BGL is that they can provide continuous, non-invasive measurements. These are important not only for monitoring but also for detecting diabetes in its early stages. Reddy et al. [2017] applied a support vector machine (SVM) to predict diabetes using features related to heart rate variability (HRV) and PPG morphology, obtained from a private fingertip pulse oximeter dataset. Hettiarachchi and Chitraranjan [2019] proposed predicting diabetes from morphological features extracted from PPG signals of a public dataset, Liang et al. [2018], using Linear Discriminant Analysis. Moreno et al. [2017] proposed a method for screening patients for diabetes by extracting features from the signal of a pulse oximeter; these features served as input to different classification algorithms, such as random forest and gradient boosting. Srinivasan and Foroozan [2021] performed a frequency-domain analysis using a convolutional neural network (CNN) fed with 30-second PPG scalograms and metadata from a large dataset. More recently, Zanelli et al. [2023] proposed a transfer learning approach to detect diabetes from 1-second raw PPG signals.
In this work, we present a method to classify non-Diabetic and Diabetic patients using a set of features extracted from PPG signals, together with metadata, to train machine learning models with the Logistic Regression (LR) Tolles and Meurer [2016] and eXtreme Gradient Boosting (XGBoost) Chen and Guestrin [2016a] algorithms. We aimed to relate the morphological characteristics of PPG signals to the subject's diabetes status, adjusted for the individual's metadata. The proposed method uses short PPG segments, each lasting 2.1 seconds. This may ease integration into various wearable devices and potentially reduce computational cost and processing time.
2 Materials and Methods
In this section, we describe the publicly available dataset used to classify PPG signals into non-Diabetic and Diabetic patients, as well as the data selection criteria. Next, we describe the proposed methodology for diabetes classification, based on signal preprocessing and feature extraction from PPG signals, and the two algorithms used in the classification step. Fig. 1 shows the proposed methodology for classification between the non-Diabetic and Diabetic groups.
[Figure 1 panels: Data Selection (Guilin Dataset; N = 86 subjects after selection criteria; three 2.1 s PPG segments per subject); Preprocessing (low-pass Butterworth filtering, baseline correction, heartbeat segmentation into cycles 1 to M); Feature Extraction (PPG features and metadata); Classification (XGBoost and Logistic Regression; Diabetes vs. non-Diabetes)]
Figure 1: The proposed methodology for the classification of patients into Diabetes or non-Diabetes groups.
2.1 Data Selection
We used a publicly available dataset containing physiological information and PPG data from 219 patients of Guilin People's Hospital in China Liang et al. [2018]. The PPG signal was collected from the fingertip in transmission mode at an infrared wavelength (905 nm) with a sampling rate of 1 kHz. Signals were recorded from patients at rest, with three 2.1-second PPG segments captured from each patient.
The dataset contains information about three disease classes (diabetes, hypertension, and cerebrovascular disease), along with metadata such as age, weight, height, body mass index (BMI), arterial blood pressure (ABP), and a unique patient ID number.
For this study, we divided the patients into two groups: non-Diabetic and Diabetic. The non-Diabetic group comprised 59 healthy subjects, obtained by excluding all patients with any stage of hypertension or any cerebrovascular disease, and the Diabetic group comprised all 38 patients diagnosed with diabetes. Due to segmentation errors and noisy signals, 11 subjects were excluded from the analysis. Table 1 shows a description of the data used in this study.
Table 1: Summary of the data used.
Class | Subjects | Male | Cycles | Age (years) | Height (cm) | Weight (kg) | HR (bpm) | BMI (kg/m²)
Non-Diabetic | 54 | 22 | 281 | 45 ± 16 | 161 ± 8 | 56 ± 11 | 73 ± 11 | 22 ± 4
Diabetic | 32 | 14 | 172 | 59 ± 12 | 160 ± 8 | 62 ± 12 | 74 ± 11 | 24 ± 4
Total | 86 | 36 | 453 | 50 ± 16 | 161 ± 8 | 59 ± 11 | 74 ± 11 | 23 ± 4
2.2 Preprocessing
To remove high-frequency artifacts from the PPG, we preprocessed each signal segment with a 6th-order Butterworth low-pass filter with a cutoff frequency of 16 Hz. During acquisition, various artifacts may cause baseline oscillations in the PPG signal. To address this, we applied the Fitting-based Sliding Window (FSW) algorithm instead of a high-pass filter to remove the baseline, as a high-pass filter may eliminate important low-frequency characteristics of the signal Zhang et al. [2020].
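The low-pass step can be sketched as follows. This is a minimal example assuming SciPy; applying the filter forward-backward with `scipy.signal.sosfiltfilt` (zero phase) is our assumption, since the text does not say how the filter was applied, and the synthetic test signal and constant names are purely illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_HZ = 1000.0     # sampling rate of the Liang et al. [2018] recordings
CUTOFF_HZ = 16.0   # low-pass cutoff stated in the text
ORDER = 6          # Butterworth order stated in the text


def lowpass_ppg(segment, fs=FS_HZ):
    """6th-order Butterworth low-pass on one PPG segment.

    Zero-phase (forward-backward) filtering is an assumption of this sketch.
    """
    sos = butter(ORDER, CUTOFF_HZ, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(segment, dtype=float))


# Example: a 2.1-s synthetic segment, a 1.2 Hz "pulse" plus 50 Hz interference.
t = np.arange(0, 2.1, 1 / FS_HZ)
clean = np.sin(2 * np.pi * 1.2 * t)
noisy = clean + 0.3 * np.sin(2 * np.pi * 50 * t)
filtered = lowpass_ppg(noisy)
```

Forward-backward filtering avoids shifting the pulse in time, which would bias timing features, but it squares the magnitude response; a causal `sosfilt` would keep the stated 6th-order response at the cost of a phase delay.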
The FSW algorithm uses a sliding window to identify the local minima between consecutive cycles (heartbeats) in each signal segment and uses them to fit the baseline. The fitted baseline is then subtracted from the signal, removing baseline fluctuations while preserving low-frequency characteristics. The valleys between cycles were also used to segment the PPG cycles in each signal segment. Details of the FSW algorithm are given in Zhang et al. [2020].
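To make valley-anchored baseline removal and segmentation concrete, the sketch below uses a deliberately simplified stand-in, not the FSW algorithm of Zhang et al. [2020]: valleys are detected with `scipy.signal.find_peaks` on the inverted signal, joined by straight lines to form the baseline, and consecutive valleys delimit the cycles. The function name and the `min_beat_interval_s` parameter are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def remove_baseline_and_segment(ppg, fs=1000.0, min_beat_interval_s=0.4):
    """Simplified valley-based baseline removal and beat segmentation.

    NOT the FSW algorithm: valleys are joined by straight lines instead of
    being fitted inside a sliding window. `min_beat_interval_s` is a
    hypothetical lower bound on the beat period (0.4 s, i.e. 150 bpm).
    """
    ppg = np.asarray(ppg, dtype=float)
    # Valleys (cycle onsets) are the peaks of the inverted signal.
    valleys, _ = find_peaks(-ppg, distance=int(min_beat_interval_s * fs))
    if len(valleys) < 2:
        return ppg.copy(), []  # no complete cycle in this segment
    # Piecewise-linear baseline through the valleys (held flat beyond them).
    baseline = np.interp(np.arange(len(ppg)), valleys, ppg[valleys])
    corrected = ppg - baseline
    # Each cycle runs from one valley to the next; partial edge cycles are dropped.
    cycles = [corrected[a:b + 1] for a, b in zip(valleys[:-1], valleys[1:])]
    return corrected, cycles


# Example: a 1.25 Hz synthetic pulse train riding on a linear drift.
fs = 1000.0
t = np.arange(0, 2.1, 1 / fs)
ppg = np.cos(2 * np.pi * 1.25 * t) + 0.5 * t
corrected, cycles = remove_baseline_and_segment(ppg, fs=fs)
```

As in the text, partial cycles before the first and after the last detected valley are discarded, so a 2.1 s segment yields only a few usable cycles.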
We visually inspected the cycles extracted from the PPG signals of all patients in the study. During this process, six Diabetic and five non-Diabetic patients were removed from the experiment due to segmentation flaws or very noisy signals.
2.3 Feature extraction
Each PPG cycle segmented by the FSW algorithm served as input to the feature extraction algorithm. We used the same features presented in Costa et al. [2023], resulting in 104 features. The features were computed from each signal cycle (heartbeat) and its first and second derivatives, and include measures such as pulse width, area, intervals, peak-to-peak interval, and systolic amplitude. A detailed description of the features used here can be found in Costa et al. [2023], Chowdhury et al. [2020], El-Hajj and Kyriacou [2021], and Lin et al. [2020].
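As an illustration of computing features from a cycle and its derivatives, the hypothetical helper below derives the first and second derivatives with `np.gradient` and computes a handful of simple measures. It is a sketch under our own simplified definitions, not the 104-feature implementation of Costa et al. [2023].

```python
import numpy as np


def basic_cycle_features(cycle, fs=1000.0):
    """Illustrative per-cycle features (hypothetical helper, simplified definitions)."""
    cycle = np.asarray(cycle, dtype=float)
    dt = 1.0 / fs
    d1 = np.gradient(cycle, dt)  # first derivative (velocity PPG)
    d2 = np.gradient(d1, dt)     # second derivative (acceleration PPG)
    # Trapezoidal area under the cycle, in amplitude units x seconds.
    area = float(np.sum((cycle[1:] + cycle[:-1]) / 2.0) * dt)
    return {
        "duration_s": (len(cycle) - 1) * dt,
        "area": area,
        "max_d1": float(d1.max()),  # steepest upstroke
        "max_d2": float(d2.max()),
    }


# Example: a 0.8-s half-sine "beat" sampled at 1 kHz (801 samples).
t = np.linspace(0.0, 0.8, 801)
beat = np.sin(np.pi * t / 0.8)
feats = basic_cycle_features(beat)
```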
The metadata (sex, age, height, weight, heart rate, and BMI) were used as features in the classification algorithms along with the PPG signal features. However, blood pressure data were not included because of their strong relation to hypertension and pre-hypertension. Including them could bias the classifiers, as high blood pressure could be falsely associated with diabetes: most Diabetic patients in the experiment have some degree of hypertension, whereas hypertensive patients were excluded from the non-Diabetic group.
This feature extraction step produced 453 feature sets (one per PPG cycle) used to detect diabetes from PPG signals, each comprising 6 metadata features and 104 features extracted from the PPG signal cycle.
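One way to assemble these feature sets is to repeat each patient's metadata vector for every cycle of that patient and concatenate it with the cycle's PPG features. The sketch assumes a hypothetical dictionary layout for the metadata (sex encoded as 0/1) and uses made-up values; blood pressure is deliberately absent, as described above.

```python
import numpy as np

N_PPG_FEATURES = 104
N_METADATA = 6  # sex, age, height, weight, heart rate, BMI


def build_feature_matrix(ppg_features, cycle_patient_ids, metadata_by_patient):
    """Concatenate per-cycle PPG features with the owning patient's metadata.

    ppg_features: (n_cycles, 104) array, one row per heartbeat.
    cycle_patient_ids: patient ID of each row.
    metadata_by_patient: hypothetical dict {patient ID: length-6 vector}.
    """
    meta_rows = np.array([metadata_by_patient[pid] for pid in cycle_patient_ids], dtype=float)
    if meta_rows.shape[1] != N_METADATA:
        raise ValueError("expected 6 metadata values per patient")
    return np.hstack([np.asarray(ppg_features, dtype=float), meta_rows])


# Example with made-up values: two patients, five cycles in total.
rng = np.random.default_rng(0)
ppg_feats = rng.normal(size=(5, N_PPG_FEATURES))
ids = ["p1", "p1", "p2", "p2", "p2"]
meta = {
    "p1": [0, 45, 161, 56, 73, 21.6],  # hypothetical patient
    "p2": [1, 59, 160, 62, 74, 24.2],  # hypothetical patient
}
X = build_feature_matrix(ppg_feats, ids, meta)  # shape (5, 110)
```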
2.4 Classification
We employed two well-known algorithms commonly used in machine learning classification tasks: Logistic Regression Stoltzfus [2011] and XGBoost Chen and Guestrin [2016b]. XGBoost is known for its high accuracy, ability to handle complex data, and scalability, but it is less interpretable. Logistic Regression, on the other hand, is a simpler model with better interpretability, but it may not perform as well on complex datasets.
For the LR algorithm, we used an L1 penalty and the LIBLINEAR solver. For the XGBoost algorithm, we used the tree booster with a learning rate of 0.1 and a maximum tree depth of 30. For both classifiers, the class weights were adjusted according to the number of samples in each class.
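A minimal configuration sketch with scikit-learn, assuming that adjusting weights "according to the number of samples" means inverse-frequency weights (scikit-learn's `class_weight="balanced"`). The XGBoost settings are written as a plain dictionary that would be passed to `xgboost.XGBClassifier`, and mapping the class weighting to `scale_pos_weight` is likewise our assumption. Cycle counts come from Table 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Cycle counts from Table 1: 281 non-Diabetic (label 0), 172 Diabetic (label 1).
y = np.array([0] * 281 + [1] * 172)

# "balanced" weights: n_samples / (n_classes * n_samples_in_class).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

# LR as described: L1 penalty with the LIBLINEAR solver, plus class weighting.
lr = LogisticRegression(penalty="l1", solver="liblinear", class_weight="balanced")

# XGBoost settings (assumed usage: xgboost.XGBClassifier(**xgb_params)).
xgb_params = {
    "booster": "gbtree",
    "learning_rate": 0.1,
    "max_depth": 30,
    "scale_pos_weight": 281 / 172,  # assumption: n_negative / n_positive
}
```

With these counts, each Diabetic cycle weighs about 1.63 times as much as a non-Diabetic one, the same ratio that `scale_pos_weight` encodes.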
The performance of the LR and XGBoost models was evaluated using the following metrics: Accuracy (Acc), Sensitivity (Se), Specificity (Sp), F1-score, Positive Predictive Value (PPV), and the Area Under the ROC Curve (AUC). The “Diabetes” column of the dataset was used as the label, with 1 for Diabetic and 0 for non-Diabetic patients.
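These metrics can be computed from the confusion matrix, with AUC obtained from the continuous classifier scores rather than the thresholded predictions. A minimal sketch with scikit-learn using the label convention above; the helper name and toy data are illustrative, and the helper does not guard against empty denominators.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def binary_metrics(y_true, y_pred, y_score):
    """Se, Sp, PPV, F1 and Acc from the confusion matrix, plus AUC from scores.

    Labels: 1 = Diabetic, 0 = non-Diabetic.
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    se = tp / (tp + fn)              # sensitivity (recall)
    sp = tn / (tn + fp)              # specificity
    ppv = tp / (tp + fp)             # positive predictive value (precision)
    f1 = 2 * ppv * se / (ppv + se)   # harmonic mean of PPV and Se
    acc = (tp + tn) / (tp + tn + fp + fn)
    auc = roc_auc_score(y_true, y_score)  # threshold-free
    return {"Se": se, "Sp": sp, "PPV": ppv, "F1": f1, "Acc": acc, "AUC": auc}


# Toy example: 3 non-Diabetic and 3 Diabetic cycles.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.2, 0.6, 0.7, 0.8, 0.4])
m = binary_metrics(y_true, y_pred, y_score)
```

Here Se, Sp, PPV, F1 and Acc all equal 2/3, while AUC is 8/9: eight of the nine Diabetic/non-Diabetic score pairs are correctly ordered.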
Classification was performed and evaluated using a 5-fold cross-validation strategy, in which the dataset is split into five folds of approximately equal size. In each iteration, four folds are used for training and the remaining fold for testing. This process is repeated five times so that each fold serves as the testing set once. The performance of each classifier (LR and XGBoost) is then averaged across the five iterations.
Feature sets obtained from PPG cycles were split into folds patient-wise. This way, features extracted from signals of the same patient are placed in either the training folds or the testing fold, never in both. By assigning patients to training and testing sets according to their IDs, we avoided the classification bias caused by mixing information from the same patient in the training and testing folds of an iteration Costa et al. [2023], de Chazal et al. [2004].
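Patient-wise folds can be obtained with scikit-learn's `GroupKFold`, using the patient ID as the group. The sketch below uses a hypothetical layout of 86 patients with a random number of cycles each and checks that every patient lands in exactly one test fold; the text does not state which splitter the authors used.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
# Hypothetical layout: 86 patients with 3-8 cycles each, one row per cycle.
patient_ids = np.repeat(np.arange(86), rng.integers(3, 9, size=86))
X = rng.normal(size=(len(patient_ids), 110))  # 104 PPG + 6 metadata features

test_patients_per_fold = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, groups=patient_ids):
    train_p = set(patient_ids[train_idx].tolist())
    test_p = set(patient_ids[test_idx].tolist())
    # Patient-wise split: no patient contributes cycles to both sides.
    assert train_p.isdisjoint(test_p)
    test_patients_per_fold.append(test_p)
```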
The overall performance of each classifier is reported as the mean and standard deviation of each metric over the five iterations. Moreover, we used the sklearn.inspection.permutation_importance function to determine feature importance for both classifiers. Because of the 5-fold strategy, the feature importances were averaged over the five folds.
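Putting the evaluation together, the sketch below trains the LR configuration on each patient-wise fold, records the fold AUC, runs `sklearn.inspection.permutation_importance` on the test fold, and averages the importances over the five folds. The data are synthetic (feature 0 is made informative on purpose), and the default scoring (accuracy) and `n_repeats=10` are our assumptions, as the text does not specify them.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(42)
n_patients, cycles_each, n_features = 40, 5, 8
patient_ids = np.repeat(np.arange(n_patients), cycles_each)
patient_label = rng.integers(0, 2, size=n_patients)
y = patient_label[patient_ids]           # every cycle inherits its patient's label
X = rng.normal(size=(len(y), n_features))
X[:, 0] += 1.5 * y                       # synthetic signal in feature 0 only

fold_aucs, fold_importances = [], []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=patient_ids):
    model = LogisticRegression(penalty="l1", solver="liblinear", class_weight="balanced")
    model.fit(X[train_idx], y[train_idx])
    if len(np.unique(y[test_idx])) == 2:  # AUC is undefined for a single-class fold
        scores = model.predict_proba(X[test_idx])[:, 1]
        fold_aucs.append(roc_auc_score(y[test_idx], scores))
    # Default scoring is model.score (accuracy); assumed, not stated in the text.
    result = permutation_importance(model, X[test_idx], y[test_idx], n_repeats=10, random_state=0)
    fold_importances.append(result.importances_mean)

mean_importance = np.mean(fold_importances, axis=0)  # averaged over the 5 folds
ranking = np.argsort(mean_importance)[::-1]          # most important first
print(f"AUC: {np.mean(fold_aucs):.3f} ± {np.std(fold_aucs):.3f}")
```

`np.std` here is the population standard deviation (`ddof=0`); the text does not state which estimator was used for the reported ± values.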
3 Results
This study used data from 86 subjects, with 453 cycles extracted using the FSW method. Out of these cycles, 172 belong
to Diabetic patients. A total of 110 features were used as input for the classifiers, with 104 of these being morphological
features extracted from each PPG cycle and the remaining 6 features comprising metadata of the patients.
The goal of the study was to classify non-Diabetic and Diabetic patients using LR and XGBoost classifiers trained on features extracted from PPG cycles. The 5-fold cross-validation strategy allowed the classifiers' performance to be evaluated over multiple train/test partitions, reducing the dependence of the results on any single split of the data.
Table 2 summarizes the overall performance of the proposed methods for the binary Diabetic/non-Diabetic classification. The metrics are presented as the mean and standard deviation over the five iterations of the cross-validation strategy. In addition, Figs. 2 and 3 show the ROC curves and AUC of each fold for the LR and XGBoost methods, respectively.
Table 2 shows the classification metrics of this study: an F1-score and AUC of 58.8 ± 20.0% and 79.2 ± 15.0% for LR, and 51.7 ± 16.5% and 73.6 ± 17.0% for XGBoost, respectively. The AUC achieved by the LR model is comparable to the AUC of 79 ± 15% reported by Hettiarachchi and Chitraranjan [2019], who used the same dataset for diabetes detection.
Additionally, we present feature importance measures, showing the 7 most significant features for each model's predictions. The importance and name of each feature are shown in Fig. 4 for Logistic Regression and in Fig. 5 for XGBoost. Based on the feature importance results obtained from both algorithms, we compiled a summary of the six most significant PPG features in Table 3. We excluded from this table the features associated with metadata, namely age, weight, height, and BMI, as they are self-explanatory.

Table 2: Comparison of the performance results of our two proposed algorithms for Diabetic/non-Diabetic classification.
Metric | Logistic Regression | XGBoost
Se | 66.4 ± 26.6% | 56.3 ± 22.1%
Spe | 75.1 ± 18.9% | 70.5 ± 16.0%
F1-score | 58.9 ± 20.0% | 51.7 ± 16.5%
Acc | 70.0 ± 12.2% | 64.5 ± 9.8%
PPV | 61.1 ± 24.1% | 54.0 ± 21.3%
AUC | 79.2 ± 15.0% | 73.6 ± 17.0%

Figure 2: Receiver Operating Characteristic (ROC) curve and AUC for each fold of the Logistic Regression method.
Table 3: Description of the six most significant PPG signal features of the experiment.
Name | Description
der_1_PI | Intensity of the 1st inflection point in the 1st derivative of the PPG cycle
AS | Slope of the systolic portion of the PPG cycle (from the beginning of the cycle to the systolic peak)
der_1_AS | Slope of the line up to the 1st inflection point (from the beginning of the 1st derivative to the 1st inflection point)
DID | Difference in intensity between the systolic peak and the end of the PPG cycle
AID | Difference in intensity between the beginning of the PPG cycle and the systolic peak
PI | Intensity of the systolic peak in the PPG cycle normalized by the average value (baseline)
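Four of these descriptions translate directly into arithmetic on a single cycle. The sketch below is our reading of Table 3, assuming a cycle that starts and ends at its valleys, a systolic peak at the global maximum, and the cycle mean as the "average value (baseline)" used for PI; the two derivative-based features are omitted because locating the inflection point is not specified here. The helper name is hypothetical.

```python
import numpy as np


def table3_like_features(cycle, fs=1000.0):
    """AS, AID, DID and PI as read from the Table 3 descriptions (a sketch)."""
    cycle = np.asarray(cycle, dtype=float)
    onset, end = cycle[0], cycle[-1]
    peak_idx = int(np.argmax(cycle))  # systolic peak assumed at the global maximum
    if peak_idx == 0:
        raise ValueError("systolic peak coincides with the cycle onset")
    peak = cycle[peak_idx]
    t_peak = peak_idx / fs            # time from onset to systolic peak (s)
    return {
        "AS": (peak - onset) / t_peak,  # systolic upstroke slope
        "AID": peak - onset,            # onset-to-peak intensity difference
        "DID": peak - end,              # peak-to-end intensity difference
        "PI": peak / cycle.mean(),      # peak normalized by the cycle mean (assumed)
    }


# Example: a piecewise-linear beat rising 0 -> 1 in 0.2 s, then decaying to 0.2 at 0.8 s.
up = np.linspace(0.0, 1.0, 201)
down = np.linspace(1.0, 0.2, 601)[1:]
f = table3_like_features(np.concatenate([up, down]))
```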
Figure 3: Receiver Operating Characteristic (ROC) curve and AUC for each fold of the XGBoost method.
Figure 4: Importance of the 7 most significant features for Logistic Regression. The values represent the mean over the five folds.
4 Discussion
In this study, we proposed a method to aid in the diagnosis of diabetes based on PPG signals. This is an exploratory study that relates morphological features extracted from PPG signals, together with the subjects' metadata, to their diabetes status.
We show that it is possible to achieve results comparable to the state of the art on a limited dataset, using features extracted from single heartbeats (cycles) of the PPG signals. Table 4 compares works that propose detecting diabetes from PPG signals. Hettiarachchi and Chitraranjan [2019] used the same dataset as our study; however, they used features extracted from the whole available 2.1 s segment, reducing the input size of their models. Moreno et al. [2017], on the other hand, used a private dataset with one-minute segments recorded from 1,170 subjects and achieved results in the same range as ours (AUC 69.4%, Se 65%, Spe 64%) using different machine learning techniques. Zanelli et al. [2023], the most recent work on this topic, used 1 s segments of raw PPG signals from 100 subjects to predict diabetes. Even though they proposed a transfer learning strategy, their best-performing model was the one without transfer learning, combining age and sex metadata.

Figure 5: Importance of the 7 most significant features for XGBoost. The values represent the mean over the five folds.
Unlike the other works in Table 4, Nirala et al. [2019] achieved Acc 97.87%, Se 98.78%, and Spe 96.61% using data from 141 subjects. However, they randomly partitioned the data into k equal folds, which implies that data from the same subject were present in both the training and test partitions. Gupta et al. [2022] followed a similar approach without clearly specifying how the dataset was divided. Previous research has shown that using data from the same subject in both the training and test sets can inflate performance, as data leakage may occur due to the high interdependence among intra-subject heartbeats de Chazal et al. [2004], Costa et al. [2023]. Reddy et al. [2017], on the other hand, achieved Acc 89%, Se 90%, and Spe 88% using data from 100 subjects. Nevertheless, they did not state whether, when partitioning the data for the k-fold strategy, they ensured that signals from the same subject were not in both the training and test sets.
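The leakage described above is easy to reproduce. The sketch below counts, over all folds, how many patients have cycles in both the training and test sets under a shuffled record-wise `KFold` versus a patient-wise `GroupKFold`; the patient layout is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

rng = np.random.default_rng(0)
# Hypothetical layout: 86 patients, 3-8 cycles each, one row per heartbeat.
patient_ids = np.repeat(np.arange(86), rng.integers(3, 9, size=86))
X = np.zeros((len(patient_ids), 1))  # feature values do not affect the split


def leaked_patients(splitter, **split_kwargs):
    """Patients with cycles on both sides of a split, summed over folds."""
    total = 0
    for train_idx, test_idx in splitter.split(X, **split_kwargs):
        train_p = set(patient_ids[train_idx].tolist())
        test_p = set(patient_ids[test_idx].tolist())
        total += len(train_p & test_p)
    return total


random_kfold = leaked_patients(KFold(n_splits=5, shuffle=True, random_state=0))
patient_wise = leaked_patients(GroupKFold(n_splits=5), groups=patient_ids)
print(random_kfold, patient_wise)
```

Under the record-wise split, most test patients also contribute cycles to training, so the classifier can match a patient's own heartbeats instead of learning diabetes-related patterns.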
Table 4: Comparison of the performance of our two proposed algorithms for diabetes detection from PPG signals with related state-of-the-art works. Metrics are in %; s denotes the number of subjects.
Reference | F1-score | AUC | Accuracy | Sensitivity | Specificity | Dataset | Method
Moreno et al. [2017] | – | 70.0 | – | 80.0 | 48.0 | Private (s = 1,170) | RF and GB - PPG features and metadata
Reddy et al. [2017] | – | – | 89.0 | 90.0 | 88.0 | Private (s = 100) | SVM - PPG features
Hettiarachchi and Chitraranjan [2019] | 71.0 ± 15.0 | 79.0 ± 15.0 | 71.0 | – | – | Liang et al. [2018] (s = 64) | LDA - PPG features and metadata
Hettiarachchi and Chitraranjan [2019] | 69.0 ± 10.0 | 74.0 ± 17.0 | 79.0 | – | – | Liang et al. [2018] (s = 64) | SVM - PPG features and metadata
Nirala et al. [2019] | 98.18 | 97.69 | 97.87 | 98.78 | 96.61 | Private (s = 141) | SVM - 37 PPG features
Avram et al. [2020] | – | 76.6 | – | 75.0 | 65.4 | Private (s = 55,433) | CNN - PPG raw signal and metadata
Srinivasan and Foroozan [2021] | – | 83.0 | 76.34 | 76.66 | 76.11 | MIMIC III (s = 808) | 2D CNN - PPG scalogram and metadata
Gupta et al. [2022] | 99.0 | – | 98.52 | 99.0 | 96.0 | Liang et al. [2018] (s = 219) | RF - PPG features
Zanelli et al. [2023] | 46.15 | 56.50 | – | 75.0 | 76.0 | Private (s = 100) | CNN - PPG raw signal, age and sex
Zanelli et al. [2023] | 30.77 | 61.0 | – | 50.0 | 72.0 | Private (s = 100) | CNN and Transfer Learning - PPG raw signal, age and sex
Our method: XGBoost | 51.7 ± 16.5 | 73.6 ± 17.0 | 64.5 ± 9.8 | 56.3 ± 22.1 | 70.5 ± 16.0 | Liang et al. [2018] (s = 86) | XGBoost - 104 PPG features and 6 metadata
Our method: LR | 58.9 ± 20.0 | 79.2 ± 15.0 | 70.0 ± 12.2 | 66.4 ± 26.6 | 75.1 ± 18.9 | Liang et al. [2018] (s = 86) | LR - 104 PPG features and 6 metadata
Nirala et al. [2019] suggest that the absence of a dicrotic notch in PPG waves can be observed in Diabetic subjects. Zanelli et al. [2023] also compare waveforms from a Diabetic and a non-Diabetic subject, highlighting the absence of a dicrotic notch in the former; however, they imply that these differences are not always readily apparent. In contrast, Srinivasan and Foroozan [2021] report that, despite using a much larger dataset than the aforementioned studies, they could not observe this dicrotic notch difference between the two groups. Our work aligns with Srinivasan's findings. In Figure 6, we present the average heartbeat of the Diabetic and non-Diabetic subjects in our dataset, which shows no discernible difference regarding the presence of the dicrotic notch. Moreover, the absence of a dicrotic notch may be related to aging or other factors Millasseau et al. [2002].
Figure 6: Average heartbeat (red) for Diabetic and non-Diabetic groups. All PPG cycles were normalized and aligned
by the highest peak in the cycle.
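A minimal sketch of how such an average heartbeat can be built: each cycle is min-max normalized, aligned on its highest peak, and averaged sample by sample. Min-max normalization, the hypothetical `half_width` window, and NaN padding for cycles too short to fill the window are our assumptions, not details given in the text.

```python
import numpy as np


def average_aligned_beat(cycles, half_width=300):
    """Min-max normalize each cycle, align it on its highest peak, and average.

    `half_width` (samples kept on each side of the peak) is hypothetical;
    positions a cycle does not reach are NaN and ignored by the mean.
    """
    n = 2 * half_width + 1
    stack = np.full((len(cycles), n), np.nan)
    for i, c in enumerate(cycles):
        c = np.asarray(c, dtype=float)
        c = (c - c.min()) / (c.max() - c.min())  # normalize to [0, 1]
        p = int(np.argmax(c))                     # alignment point: highest peak
        lo, hi = max(0, p - half_width), min(len(c), p + half_width + 1)
        start = half_width - (p - lo)             # window index where c[lo] lands
        stack[i, start:start + (hi - lo)] = c[lo:hi]
    return np.nanmean(stack, axis=0)              # NaN-aware mean per position


# Example: two Gaussian "beats" with different timing, scale and offset.
t = np.linspace(0, 1, 700)
beat_a = np.exp(-((t - 0.3) / 0.08) ** 2)
beat_b = 2.0 * np.exp(-((t - 0.4) / 0.08) ** 2) + 0.5
avg = average_aligned_beat([beat_a, beat_b], half_width=200)
```

After normalization and alignment, the two example beats become nearly identical, so their average keeps the common shape with its peak fixed at the window center.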
Srinivasan and Foroozan [2021] suggest that their approach outperformed studies based on PPG features because of this similarity in PPG wave morphology between the groups. Although Zanelli et al. [2023] also used a deep learning approach, they used a much shorter input signal (1 s vs. 30 s). Moreover, Srinivasan's study implies that frequency-domain information may be important for diabetes detection, which future work should investigate.
Heart rate variability (HRV) is broadly used as a feature in diabetes detection and BGL estimation studies Reddy et al. [2017], Chu et al. [2021]. However, HRV cannot be reliably estimated from short segments such as those available in our dataset. For this reason, HRV was not estimated in this experiment.
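A quick calculation with the mean heart rate in Table 1 (74 bpm) illustrates the problem:

$$
\overline{RR} = \frac{60}{74}\ \text{s} \approx 0.81\ \text{s}, \qquad
\left\lfloor \frac{2.1\ \text{s}}{0.81\ \text{s}} \right\rfloor = \lfloor 2.59 \rfloor = 2
$$

so a single 2.1 s segment holds at most two complete RR intervals, far too few for a stable HRV estimate.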
The small number of patients in the dataset raises concerns about potential bias in the classifiers' performance. The classifiers may learn to estimate other clinical parameters, such as blood pressure, or other clinical conditions instead of focusing solely on diabetes classification. In addition, several studies on diabetes classification do not explicitly consider the bias caused by mixing PPG signals from the same patient in the training and testing steps Panwar et al. [2020], which may produce overly optimistic metrics. We addressed this issue by adopting a 5-fold cross-validation strategy that splits the data according to the patients' identification numbers.
Comparing the feature importances shown in Figs. 4 and 5, the Logistic Regression algorithm considered mostly PPG-related features as important for predicting diabetes, whereas the XGBoost algorithm based its predictions almost exclusively on the patients' metadata. The better results of the LR algorithm suggest that PPG cycles may indeed carry diabetes-related information. Both classification algorithms rank two specific PPG features as important: der_1_PI and AID. These features use information from the systolic wave between its onset and peak, indicating that this portion of the PPG wave may be worth further analysis.
One reason for the lack of studies using this dataset may be its small number of patients and short signals: each patient has only about six seconds of PPG signal, divided into three non-contiguous segments. Visual analysis shows that most PPG segments contain incomplete cycles at the signal edges, which led to large portions of PPG data being discarded. This may impair the performance of segmentation algorithms such as the FSW. In this work, such impairment led to the exclusion of eleven patients from the experiment due to segmentation errors and noisy signals. The FSW algorithm was first introduced in Zhang et al. [2020] for removing the baseline and segmenting PPG signals. However, when applied to signals with a small number of cycles, it tends to discard the edges, reducing the data passed to the feature extraction algorithm and, consequently, the number of computed feature sets, which negatively affects the performance and generalization of the classifiers.
The clinical use of PPG-based diabetes detection algorithms demands high accuracy and reliability across patients with varying medical conditions. Therefore, to enhance the effectiveness, reliability, and generalizability of these findings, future studies on the detection of diabetes using wearable devices should be conducted on larger and more diverse datasets, including patients with different medical conditions, ages, and ethnicities. We thus emphasize the need for publicly available, well-annotated datasets on this topic to improve PPG-based diabetes detection algorithms and facilitate their translation into clinical practice.
5 Conclusion
Our study proposed a method for detecting diabetes based on PPG signals, using morphological features extracted from single beats of the PPG signals. We achieved results comparable to state-of-the-art studies using a limited dataset. Comparison with other studies showed that our approach yielded promising accuracy, sensitivity, and specificity values. However, the small number of patients in our dataset and the short duration of the available PPG signals may have affected our results. Additionally, publicly available, well-annotated datasets are needed to further advance research in this area.
References
Ralph A DeFronzo, Riccardo C Bonadonna, and Eleuterio Ferrannini. Pathogenesis of NIDDM: A Balanced Overview.
Diabetes Care, 15(3):318–368, 03 1992. doi:10.2337/diacare.15.3.318.
Julienne K. Kirk and Jane Stegner. Self-monitoring of blood glucose: Practical aspects. Journal of Diabetes Science
and Technology, 4(2):435–439, 2010. doi:10.1177/193229681000400225.
Michael J. LaMonte, Steven N. Blair, and Timothy S. Church. Physical activity and diabetes prevention. Journal of
Applied Physiology, 99(3):1205–1213, 2005. ISSN 8750-7587, 1522-1601. doi:10.1152/japplphysiol.00193.2005.
Serena Zanelli, Mehdi Ammi, Magid Hallab, and Mounim A. El Yacoubi. Diabetes detection and management through
photoplethysmographic and electrocardiographic signals analysis: A systematic review. Sensors, 22(13):4890, 2022.
ISSN 1424-8220. doi:10.3390/s22134890.
Elisa Mejía-Mejía, John Allen, Karthik Budidha, Chadi El-Hajj, Panicos A. Kyriacou, and Peter H. Charlton. 4 - Photoplethysmography signal processing and synthesis. In John Allen and Panicos Kyriacou, editors, Photoplethysmography, pages 69–146. Academic Press, 2022. ISBN 978-0-12-823374-0. doi:10.1016/B978-0-12-823374-0.00015-3.
Ramakrishna Mukkamala, Jin-Oh Hahn, and Anand Chandrasekhar. 11 - photoplethysmography in noninvasive blood
pressure monitoring. In John Allen and Panicos Kyriacou, editors, Photoplethysmography, pages 359–400. Academic
Press, 2022. ISBN 978-0-12-823374-0. doi:10.1016/B978-0-12-823374-0.00010-4.
Meir Nitzan and Zehava Ovadia-Blechman. 9 - physical and physiological interpretations of the ppg signal. In
John Allen and Panicos Kyriacou, editors, Photoplethysmography, pages 319–340. Academic Press, 2022. ISBN
978-0-12-823374-0. doi:10.1016/B978-0-12-823374-0.00009-8.
Kristjan Pilt, Rain Ferenets, Kalju Meigas, Lars-Goran Lindberg, Kristina Temitski, and Margus Viigimaa. New
photoplethysmographic signal analysis algorithm for arterial stiffness estimation. The Scientific World Journal, 2013:
1–9, 2013. doi:10.1155/2013/169035.
Janis Spigulis, Indulis Kukulis, Eva Fridenberga, and Girts Venckus. Potential of advanced photoplethysmography
sensing for noninvasive vascular diagnostics and early screening. In Gerald E. Cohn, editor, Clinical Diagnostic
Systems: Technologies and Instrumentation, volume 4625, pages 38 – 43. International Society for Optics and
Photonics, SPIE, 2002. doi:10.1117/12.469789.
Sahnius Usman, MMBI Reaz, and MABM Ali. Repeated measurement analysis of the area under the curve of
photoplethysmogram among diabetic patients. Life Sci. J, 10:532–539, 2011.
V. Ramu Reddy, Anirban Dutta Choudhury, Srinivasan Jayaraman, Naveen Kumar Thokala, Parijat Deshpande, and Venkatesh Kaliaperumal. Perdmcs: Weighted fusion of ppg signal features for robust and efficient diabetes mellitus classification. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - SmartMedDev, pages 553–560. SciTePress, 2017. ISBN 978-989-758-213-4. doi:10.5220/0006297205530560.
Chirath Hettiarachchi and Charith Chitraranjan. A machine learning approach to predict diabetes using short recorded photoplethysmography and physiological characteristics. In Artificial Intelligence in Medicine, volume 11526, pages 322–327. Springer International Publishing, 2019. ISBN 978-3-030-21641-2 978-3-030-21642-9. doi:10.1007/978-3-030-21642-9_41.
Yongbo Liang, Zhengcheng Chen, Guiyong Liu, and Mohamed Elgendi. A new, short-recorded photoplethysmogram dataset for blood pressure monitoring in China. Scientific Data, 8:180020, 2018.
Enrique Monte Moreno, Maria Jose Anyo Lujan, Montse Torrres Rusinol, Paqui Juarez Fernandez, Pilar Nunez
Manrique, Cristina Aragon Trivino, Magda Pedrosa Miquel, Marife Alvarez Rodriguez, and M. Jose Gonzalez
Burguillos. Type 2 diabetes screening test by means of a pulse oximeter. IEEE Transactions on Bio-Medical
Engineering, 64(2):341–351, 2017. ISSN 1558-2531 0018-9294. doi:10.1109/TBME.2016.2554661.
Venkatesh Bharadwaj Srinivasan and Foroohar Foroozan. Deep learning based non-invasive diabetes predictor using
photoplethysmography signals. In 2021 29th European Signal Processing Conference (EUSIPCO), pages 1256–1260,
2021. doi:10.23919/EUSIPCO54536.2021.9616351.
Serena Zanelli, Mounim A. El Yacoubi, Magid Hallab, and Mehdi Ammi. Type 2 diabetes detection with light CNN
from single raw PPG wave. IEEE Access, 11:57652–57665, 2023. doi:10.1109/ACCESS.2023.3274484.
Juliana Tolles and William J. Meurer. Logistic Regression: Relating Patient Characteristics to Outcomes. JAMA,
316(5):533–534, August 2016. ISSN 0098-7484. doi:10.1001/jama.2016.7653.
Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794. Association for
Computing Machinery, 2016a. ISBN 9781450342322. doi:10.1145/2939672.2939785.
Gaobo Zhang, Zhen Mei, Yuan Zhang, Xuesheng Ma, Benny Lo, Dongyi Chen, and Yuanting Zhang. A noninvasive
blood glucose monitoring system based on smartphone PPG signal processing and machine learning. IEEE
Transactions on Industrial Informatics, 16(11):7209–7218, 2020. doi:10.1109/TII.2020.2975222.
Thiago Bulhões Da Silva Costa, Felipe Meneguitti Dias, Diego Armando Cardona Cardenas, Marcelo Arruda Fiuza De
Toledo, Daniel Mário De Lima, Jose Eduardo Krieger, and Marco Antonio Gutierrez. Blood pressure estimation
from photoplethysmography by considering intra- and inter-subject variabilities: Guidelines for a fair assessment.
IEEE Access, 11:57934–57950, 2023. doi:10.1109/ACCESS.2023.3284458.
Moajjem Hossain Chowdhury, Md Nazmul Islam Shuzan, Muhammad E.H. Chowdhury, Zaid B. Mahbub, M. Monir
Uddin, Amith Khandakar, and Mamun Bin Ibne Reaz. Estimating blood pressure from the photoplethysmogram
signal and demographic features using machine learning techniques. Sensors, 20(11), 2020. ISSN 1424-8220.
doi:10.3390/s20113127.
C. El-Hajj and P. A. Kyriacou. Cuffless blood pressure estimation from PPG signals and its derivatives using
deep learning models. Biomedical Signal Processing and Control, 70:102984, 2021. ISSN 1746-8094.
doi:10.1016/j.bspc.2021.102984.
Wan-Hua Lin, Xiangxin Li, Yuanheng Li, Guanglin Li, and Fei Chen. Investigating the physiological mechanisms of
the photoplethysmogram features for blood pressure estimation. Physiological Measurement, 41(4):044003, May
2020. doi:10.1088/1361-6579/ab7d78.
Jill C. Stoltzfus. Logistic regression: A brief primer. Academic Emergency Medicine, 18(10):1099–1104, 2011.
doi:10.1111/j.1553-2712.2011.01185.x.
Tianqi Chen and Carlos Guestrin. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. ACM, August 2016b. doi:10.1145/2939672.2939785.
Philip de Chazal, M. O’Dwyer, and R. B. Reilly. Automatic classification of heartbeats using ECG morphology
and heartbeat interval features. IEEE Transactions on Biomedical Engineering, 51(7):1196–1206, 2004.
doi:10.1109/TBME.2004.827359.
Neelamshobha Nirala, R. Periyasamy, Bikesh Kumar Singh, and Awanish Kumar. Detection of type-2 diabetes using
characteristics of toe photoplethysmogram by applying support vector machine. Biocybernetics and Biomedical
Engineering, 39(1):38–51, 2019. ISSN 0208-5216. doi:10.1016/j.bbe.2018.09.007.
Shresth Gupta, Anurag Singh, Abhishek Sharma, and Rajesh Kumar Tripathy. dSVRI: A PPG-based novel feature for
early diagnosis of type-II diabetes mellitus. IEEE Sensors Letters, 6(9):1–4, 2022. doi:10.1109/LSENS.2022.3203609.
Robert Avram, Jeffrey E. Olgin, Peter Kuhar, J. Weston Hughes, Gregory M. Marcus, Mark J. Pletcher, Kirstin
Aschbacher, and Geoffrey H. Tison. A digital biomarker of diabetes from smartphone-based vascular signals. Nature
Medicine, 26(10):1576–1582, October 2020. doi:10.1038/s41591-020-1010-5.
S. C. Millasseau, R. P. Kelly, J. M. Ritter, and P. J. Chowienczyk. Determination of age-related increases in large
artery stiffness by digital pulse contour analysis. Clinical Science, 103(4):371–377, August 2002.
doi:10.1042/cs1030371.
Justin Chu, Wen-Tse Yang, Tung-Han Hsieh, and Fu-Liang Yang. One-minute finger pulsation measurement for
diabetes rapid screening with 1.3% to 13% false-negative prediction rate. Biomedical Statistics and Informatics, 6(1):
6, 2021. ISSN 2578-871X. doi:10.11648/j.bsi.20210601.12.
Madhuri Panwar, Arvind Gautam, Rashi Dutt, and Amit Acharyya. CardioNet: Deep learning framework for prediction
of CVD risk factors. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–5. IEEE,
2020. ISBN 978-1-72813-320-1. doi:10.1109/ISCAS45731.2020.9180636.