Article history:
Received 3 September 2019
Revised 17 March 2020
Accepted 24 March 2020
Available online 21 May 2020

Keywords: Coronary heart disease; Machine learning; LASSO regression; Convolutional neural network; Artificial intelligence; NHANES

Abstract

This study proposes an efficient neural network with convolutional layers to classify significantly class-imbalanced clinical data. The data is curated from the National Health and Nutritional Examination Survey (NHANES) with the goal of predicting the occurrence of Coronary Heart Disease (CHD). While the majority of the existing machine learning models that have been used on this class of data are vulnerable to class imbalance even after the adjustment of class-specific weights, our simple two-layer CNN exhibits resilience to the imbalance with fair harmony in class-specific performance. Given a highly imbalanced dataset, it is often challenging to simultaneously achieve a high class 1 (true CHD prediction rate) accuracy along with a high class 0 accuracy as the test data size increases. We adopt a two-step approach: first, we employ least absolute shrinkage and selection operator (LASSO) based feature weight assessment, followed by majority-voting based identification of important features. Next, the important features are homogenized by a fully connected layer, a crucial step before passing the output of the layer to successive convolutional stages. We also propose a per-epoch training routine, akin to a simulated annealing process, to boost the classification accuracy.

Despite the high class imbalance in the NHANES dataset, the investigation confirms that our proposed CNN architecture correctly classifies the presence of CHD with 77% accuracy and the absence of CHD with 81.8% accuracy on a testing set comprising 85.70% of the total dataset. This result signifies that the proposed architecture can be generalized to other studies in healthcare with a similar order of features and imbalance. While the recall values obtained from other machine learning methods, such as SVM and random forest, are comparable to that of our proposed CNN model, our model predicts the negative (Non-CHD) cases with higher accuracy. Our model architecture exhibits a way forward to develop better investigative tools, improved medical treatment and lower diagnostic costs by incorporating a smart diagnostic system into healthcare. The balanced accuracy of our model (79.5%) is also better than the individual accuracies of the SVM and random forest classifiers. The CNN classifier results in high specificity and test accuracy along with high values of recall and area under the curve (AUC).

https://doi.org/10.1016/j.eswa.2020.113408
© 2020 Elsevier Ltd. All rights reserved.
(Zeiher, Drexler, Saurbier, and Just, 1993). Ahmed et al. (2017) show that Body Mass Index (BMI) and systolic blood pressure are the two most critical factors affecting hypertension. Fava et al. (2013) conclude a significant association of age, sex, BMI and heart rate with hypertension. Studies in the general population indicate that a high level of creatinine in blood can increase the risk of CHD (Irie et al., 2006; Wannamethee, Shaper, and Perry, 1997). Additionally, blood cholesterol and glycohaemoglobin levels are found to be persistently and significantly high in patients with CHD (Burchfiel, Tracy, Chyou, and Strong, 1997; Meigs et al., 1997). Several researchers have used statistical and machine learning models on echocardiography images (Madani, Arnaout, Mofrad, and Arnaout, 2018; Nakanishi et al., 2018) and electrocardiography signals (Jin, Sun, and Cheng, 2009; Shen et al., 2016) to predict clinically significant parameters related to CHD in patients, such as heart rate and axis deviation. Boosted algorithms such as gradient boost and logit boost have been used in the literature to predict FFR and cardiovascular events (Goldstein, Navar, and Carter, 2017; Weng, Reps, Kai, Garibaldi, and Qureshi, 2017). Frizzell et al. and Mortazavi et al. built prediction models to determine the presence of cardiovascular disease using the 30-day readmission electronic data for patients with heart failure. The reported C-statistic of the models varied from 0.533 to 0.628, showing an improvement in prediction with the machine learning approach over traditional statistical methods.

Numerous risk factor variables often make the prediction of CHD difficult, which, in turn, increases the cost of diagnosis and treatment. In order to resolve the complexities and cost of diagnosis, advanced machine learning models are being widely used by researchers to predict CHD from clinical data of patients. Kurt, Ture, and Kurum (2008) compared the prediction performances of a number of machine learning models, including the multilayer perceptron (MLP) and radial basis function (RBF), to predict the presence of CHD in 1245 subjects. The MLP was found to be the most efficient method, yielding an area under the receiver operating characteristic (ROC) curve of 0.78. Kahramanli and Allahverdi (2008), Shilaskar and Ghatol (2013), and Haq, Li, Memon, Nazir, and Sun (2018) proposed hybrid forward selection techniques wherein they were able to select smaller feature subsets and increase the accuracy of predicting the presence of cardiovascular disease with a reduced number of attributes. Several other groups have reported techniques such as artificial neural networks (ANN), fuzzy logic (FL) and deep learning (DL) methods to improve heart disease diagnosis (Das, Turkoglu, & Sengur, 2009; Olaniyi, Oyedotun, & Khashman, 2015; Uyar, 2017; Venkatesh, 2017). Nonetheless, in most of the previous studies, the patient cohort was limited to a few thousand with limited risk factors.

We propose an efficient neural network with convolutional layers using the NHANES dataset to predict the occurrence of CHD. A complete set of clinical, laboratory and examination data is used in the analysis along with a feature selection technique based on LASSO regression. Data preprocessing is performed using LASSO followed by a feature voting and elimination technique. The performance of the network is compared to several existing traditional ML models in conjunction with the identification of a set of important features for CHD. Our architecture is simple in design, elegant in concept, sophisticated in training schedule, and effective in outcome, with far-reaching applicability in problems with unbalanced datasets. Our research contributes to the existing studies in three primary ways: 1) our model uses a variable elimination technique based on LASSO and feature voting as preprocessing steps; 2) we leverage a shallow neural network with convolutional layers, which improves CHD prediction rates compared to existing models with comparable subjects (the 'shallowness' is dictated by the scarcity of class-specific data, to prevent overfitting of the network during training); 3) in conjunction with the architecture, we propose a simulated annealing-like training schedule that is shown to minimize the generalization error between train and test losses.

It is important to note that our work is not intended to provide a sophisticated architecture using a neural network. We also do not focus on providing a theoretical explanation of how our network offers resistance to data imbalance. Instead, our goal is to establish that under certain constraints one can apply convolutional stages despite the scarcity of data and the absence of well-defined data augmentation techniques, and to show that shallow layers of convolution indeed offer resilience to the data imbalance problem by dint of a training schedule. The proposed pipeline contributes to improving CHD prediction rates in imbalanced clinical data, based on a robust feature selection technique using LASSO and shallow convolutional layers. This serves to improve prediction algorithms included in smart healthcare devices, where sophisticated neural algorithms can learn from past user data to predict the probability of heart failure and strokes. Prediction rates could be integrated into healthcare analytics to provide real-time monitoring, which benefits not only patients but also medical practitioners for efficient operations. The present research also focuses on a systematic training schedule which can be incorporated in smart devices to improve tracking of different predictor variable levels for heart failure. The rest of the paper is organized as follows: Section 2 explains data preparation and the preprocessing techniques. In Section 3, we illustrate the convolutional neural network architecture with details on the training and testing methodology. In Section 4 we demonstrate the results obtained from our model with performance evaluation metrics and compare it with existing models. Section 5 is the conclusion and discussion section; here, several extensions to the research are proposed.

2. Data preprocessing

Our study uses the NHANES data from 1999–2000 to 2015–2016. The dataset is compiled by combining the demographic, examination, laboratory and questionnaire data of 37,079 (CHD – 1300, Non-CHD – 35,779) individuals, as shown in Fig. 1. Demographic variables include age and gender of the survey participants at the time of screening. Participant weight, height, blood pressure and body mass index (BMI) from the examination data are also considered as a set of risk factor variables to study their effect on cardiovascular diseases. NHANES collects laboratory and survey data from participants once every two years depending on their age and gender. In addition, based on already existing validated experimental research, a comprehensive list of risk factor variables is selected from the laboratory tests conducted. Questionnaire data comprises questions asked at home by interviewers using a Computer-Assisted Personal Interview (CAPI) system, as mentioned on the NHANES website (NHANES, 2015). A total of 5 dichotomous predictor categorical variables are selected from the questionnaire data which have been shown to affect CHD (references required). In all, 30 continuous and 6 categorical independent variables are used to predict the likelihood of coronary heart disease. For this study, coronary heart disease (CHD) is used as the dichotomous dependent variable. Awareness of CHD is defined as a "yes" response to the question "Have you ever been told you had coronary heart disease?" Table 1 shows the categorical independent and dependent variables in the dataset considered for model development.

The exhaustive list of variables is: gender, age, annual family income, ratio of family income to poverty, 60-second pulse rate, systolic, diastolic, weight, height, body mass index, white blood cells, lymphocyte, monocyte, eosinophils, basophils, red blood cells, hemoglobin, …
Fig. 1. Data compilation from National Health and Nutritional Survey (NHANES). The data is acquired from 1999 to 2016 in three categories – Demography, Examination and
Laboratory. Based on the nature of the factors that are considered, the dataset contains both the quantitative and the qualitative variables.
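To make the compilation in Fig. 1 concrete, the sketch below merges NHANES categories with pandas, assuming the cycle files are the SAS transport (.XPT) releases keyed by the respondent identifier SEQN; the specific file names (DEMO_I, BMX_I, BIOPRO_I) are illustrative for a single cycle and not a claim about the authors' exact inputs.

import pandas as pd

# Each NHANES category ships as a separate SAS transport file per cycle.
demo = pd.read_sas("DEMO_I.XPT")      # demographics (age, gender, income)
bmx = pd.read_sas("BMX_I.XPT")        # examination: body measures
biopro = pd.read_sas("BIOPRO_I.XPT")  # laboratory: standard biochemistry

# Merge the categories on the unique participant identifier SEQN.
cohort = demo.merge(bmx, on="SEQN", how="inner").merge(biopro, on="SEQN",
                                                       how="inner")
print(cohort.shape)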
Table 1.
Description of the risk factor independent variables and the dependent variable.
Fig. 2. Proposed convolutional neural network architecture. The 'Input' is a 1D numerical array corresponding to all the factors/variables from the LASSO-Majority Voting preprocessing stage. The 'Dense' layer, immediately after the 'Input', combines all the factors, and each neuron (computing node) at the output of the 'Dense' layer is a weighted combination of all the variables, indicating a homogeneous mix of different variable types. The next two convolution layers seek representation of the input variables via the 'Dense' layer. The next two 'Dense' layers are followed by the 'Softmax' layer. The last two 'Dense' layers (before the 'Softmax' layer) can be retrained for transfer learning in case new data is obtained. The associated training parameters, such as dropout probability, number of neurons, activation function (we used ReLU), pooling types, and convolution filter parameters are shown in the above figure. Owing to the large number of parameters, which can lead to overfitting of training data points, we propose a training schedule in Section 3.2.1.
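As one concrete reading of Fig. 2, the following Keras sketch wires an input dense layer into two shallow convolution stages and two final dense layers. The filter counts (2 and 4), dense widths (512 and 64), kernel size, pooling and dropout rate are our assumptions for illustration and are not guaranteed to reproduce the parameter counts reported later; the Adam settings are taken from the text.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(n_features=30, n_classes=2):
    inputs = layers.Input(shape=(n_features,))
    # 'Dense' layer that homogenizes the mixed-type input variables.
    x = layers.Dense(64, activation="relu")(inputs)
    x = layers.Reshape((64, 1))(x)  # 1D sequence for the Conv1D stages
    # Two shallow convolution stages (C2-C4 in the paper's notation).
    x = layers.Conv1D(2, kernel_size=3, activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(4, kernel_size=3, activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Flatten()(x)
    # Last two dense layers; these can be retrained for transfer learning.
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.005,
                                           beta_1=0.9, beta_2=0.999),
        loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model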
The model is trained using the Adam optimizer with learning rate 0.005, β1 = 0.9, β2 = 0.999 and zero decay. Our proposed architecture consists of 32,642 trainable and 1164 non-trainable parameters. We experiment with several hyperparameters associated with our model to obtain consistent class-wise accuracy. We provide results by varying the subsampling of input data, the number of epochs, the class weights, the number of neurons in each dense layer except the last one, and the number of filters in each convolution layer during training.

3.3.1. Training schedule

During training, the class weight ratio, which is adjusted as a penalty factor due to class imbalance, is defined as the ratio of the CHD and Non-CHD datasets. For example, a class weight ratio of 10:1 indicates that any misclassification of a CHD training sample will be penalized 10 times more than a misclassified Non-CHD sample during the error calculation at the output, prior to backpropagation after each epoch. Although we use dropout layers in our CNN model, we also use this training schedule in order to further reduce possible overfitting. The intuition is to initially train the model with a large (N:1) weight ratio for a sufficiently large number of epochs and then gradually reduce the weight ratio, with a steady decline in the number of epochs as well (cf. Table 3). Let the target class weight ratio be ρ0:1, which we take as a factor ρ0.

Fitting our CNN model, M, by varying the number of epochs (ω) and the weight ratio (ρ):
1. Initialize ρ = N, ω = N (a large number), M, end_iter (5–10 depending on the instance), i = 1
2. While ρ ≥ ρ0:
     M.fit(Data, ω, ρ)
     ρ ← floor(ρ/2)
     ω ← floor(ω/2)
3. While (i ≤ end_iter) and TrainLoss(i) ≤ TrainLoss(i − 1):
     M.fit(Data, i, ρ0)
     i ← i + 1
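A minimal Python rendering of the schedule above, assuming a Keras-style model with class_weight support; the choice N = 64 and the loss bookkeeping are illustrative readings of the pseudocode rather than a released implementation.

import math

def fit_with_schedule(model, X, y, rho0=3, N=64, end_iter=5):
    rho = omega = N
    # Stage 1: halve the class weight ratio and the epoch budget together
    # until the target ratio rho0 is reached.
    while rho >= rho0:
        model.fit(X, y, epochs=omega,
                  class_weight={0: 1.0, 1: float(rho)}, verbose=0)
        rho = math.floor(rho / 2)
        omega = math.floor(omega / 2)
    # Stage 2: short refinement runs at the target ratio while the
    # training loss keeps decreasing.
    prev_loss = float("inf")
    for i in range(1, end_iter + 1):
        hist = model.fit(X, y, epochs=i,
                         class_weight={0: 1.0, 1: float(rho0)}, verbose=0)
        loss = hist.history["loss"][-1]
        if loss > prev_loss:
            break
        prev_loss = loss
    return model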
Fig. 3. Correlation table for the independent predictor variables. In this table, moderately strong correlations are observed among a few pairs (Glucose and Glycohemoglobin, Red blood cells and Hemoglobin, ALT and AST, Weight and Body-Mass-Index). The rest of the pairs show fairly low correlation values, implying that the variables after the LASSO-Majority voting stage are sufficiently decorrelated.
3.4. Competitive approaches

Machine learning classification methods have been shown to potentially improve prediction outcomes in coronary heart disease. Such classification methods include logistic regression, support vector machines, random forests, boosting methods and the multilayer perceptron (Goldstein et al., 2017). Logistic regression models the prediction of a binomial outcome with one or more explanatory variables, using a standard logistic function which measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities. The logistic function is given by f(x) = 1/(1 + e^(−x)), which in common practice is known as the sigmoid curve. The support vector machine (SVM) is a binary classification algorithm which generates an (N−1)-dimensional hyperplane to separate points into two distinct classes in an N-dimensional space. The classification hyperplane is constructed in a high-dimensional space so that it represents the largest separation, or margin, between the two classes.

Random forests are an ensemble learning algorithm in which decision trees that grow deep are averaged and trained on different parts of the training set to reduce variance and avoid overfitting. The random forests algorithm employs bagging (bootstrap aggregating), and at each split a random subset of features is selected. Bagging is a parallel ensemble because each model is built independently. Boosting, on the other hand, is a sequential ensemble where each model is built by correcting the misclassifications of the previous model. In boosting methods, weights are initialized on the training samples and, for n iterations, a classifier is trained using a single feature and the training error evaluated. The classifier with the lowest error is then chosen and the weights are updated accordingly; the final classifier is formed as a linear combination of the n classifiers. A boosted classifier has the form F_T(x) = Σ_{t=1}^{T} f_t(x), where each f_t is a weak learner with x as input. Each weak learner produces an output hypothesis, h(x_i), for each sample in the training set. At each iteration t, a weak learner is selected and assigned a coefficient α_t such that the total training error E_t of the resulting t-stage boosted classifier is minimized. A multilayer perceptron (MLP) is a feedforward artificial neural network (ANN) which consists of an input layer, an output layer and one or more hidden layers, and utilizes backpropagation for training. The MLP commonly uses a nonlinear activation function which maps the weighted inputs to the output of each neuron in the hidden layers. In an MLP, the connection weights are updated based on the error between the generated output and the expected result. Two of the most common activation functions are the rectified linear unit (ReLU), f(x) = max(0, x), and the hyperbolic tangent, y(x_i) = tanh(x_i).

Data augmentation demands attention in the context of data imbalance. Algorithms such as random oversampling (ROS), the synthetic minority over-sampling technique (SMOTE) (Chawla, Bowyer, Hall, and Kegelmeyer, 2002), and adaptive synthetic sampling (ADASYN) (He, Bai, Garcia, and Li, 2008) augment the minority class data by either replicating or synthesizing new data. One pertinent issue with regard to synthesized data is that, unlike images, data in the context of biological factors (variables) may be implausible, as it is difficult to verify the authenticity of newly augmented data, especially when both classes of data are closely spaced. In this paper, we provide comparative results using the above algorithms. In addition, we provide the corresponding visualization of the augmented data via t-SNE. Please keep in mind that t-SNE is a non-convex algorithm, generating an embedding that depends on the initialization of the low-dimensional embedding. We employ a random undersampling strategy to select a subset of data for training the CNN. Similar to data augmentation, there are several data undersampling strategies. We compare our results with edited nearest neighbor (EDN) (Wilson, 1972), instance hardness threshold (IHT) (Smith, Martinez, and Giraud-Carrier, 2014) and three versions of near-miss (NM-v1, v2 and v3) (Mani, 2003) algorithms.
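The comparison described above can be reproduced, under assumptions, with the imbalanced-learn package; X and y denote the preprocessed NHANES features and CHD labels, and the sampler settings shown are illustrative defaults rather than tuned values.

from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN
from imblearn.under_sampling import (EditedNearestNeighbours,
                                     InstanceHardnessThreshold, NearMiss)

samplers = {
    "ROS": RandomOverSampler(random_state=0),
    "SMOTE": SMOTE(random_state=0),
    "ADASYN": ADASYN(random_state=0),
    "ENN": EditedNearestNeighbours(),
    "IHT": InstanceHardnessThreshold(random_state=0),
    "NM-v1": NearMiss(version=1),
    "NM-v2": NearMiss(version=2),
    "NM-v3": NearMiss(version=3),
}

def resample_all(X, y):
    # One rebalanced (X, y) pair per strategy, for downstream CNN training.
    return {name: s.fit_resample(X, y) for name, s in samplers.items()}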
Fig. 4. Model accuracy as a function of the majority voting threshold. The threshold value of majority voting affects the classification accuracy of CHD, as the selection of this value controls the number of variables that are channeled to our CNN model: the smaller the threshold value, the larger the set of variables. Based on the training loss, training accuracy and test accuracy, a threshold value between 16.67 (100/6) and 20 (100/5), combining 100 instances of LASSO, appears suitable for obtaining balanced per-class (CHD and Non-CHD) classification accuracy.
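A hedged sketch of the LASSO-majority-voting stage that Fig. 4 refers to: LASSO is fit on many 1:1 CHD:Non-CHD subsamples, and a variable is kept when its selection frequency exceeds the voting threshold. The regularization strength alpha and n_runs = 100 are illustrative choices, not the authors' exact settings.

import numpy as np
from sklearn.linear_model import Lasso

def lasso_majority_vote(X, y, n_runs=100, threshold=20.0, alpha=0.01, seed=0):
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    votes = np.zeros(X.shape[1])
    for _ in range(n_runs):
        # Balanced 1:1 subsample for each LASSO instance.
        sub = np.concatenate([pos, rng.choice(neg, size=len(pos),
                                              replace=False)])
        coef = Lasso(alpha=alpha, max_iter=10000).fit(X[sub], y[sub]).coef_
        votes += (coef != 0)
    # Keep variables selected in more than `threshold` percent of the runs.
    return np.where(100.0 * votes / n_runs > threshold)[0]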
Table 2.
Training schedule for increasing class weight ratio and sampling for the optimal threshold. A maximum training accuracy of 83.51% and a minimum training loss of 0.489 are achieved with a misclassification penalty of 3:1 (CHD:Non-CHD) and a sampling ratio of 1300:4000 (CHD:Non-CHD).
Class Weight Sampling TPR TNR Train Acc (%) Test Acc (%) Training Loss
The performance of our binary classifier is calculated by computing the ROC curve (Yang, Zhang, Lu, Zhang, and Kalui, 2017). The area under the curve (AUC) value of the ROC curve is the probability that our proposed CNN classifier ranks a randomly chosen positive case (CHD) higher than a randomly chosen negative case (Non-CHD) (Tom, 2005). Thus, the ROC curve behaves as a tool to select the possible optimal models and to reject the suboptimal ones, independently of the class distribution. It does so by plotting parametrically the true positive rate (TPR) vs. the false positive rate (FPR) at various threshold settings, as shown in Fig. 4 (Right). The calculated AUC is 0.767, or 76.7%, which is comparable to previous studies related to CHD (Martinez, Schwarcz, Valdez, and Diaz, 2018). In highly imbalanced data sets, balanced accuracy is often considered to be a more accurate metric than normal accuracy itself.
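As a usage note, the ROC curve and AUC quoted here can be computed with scikit-learn; `model` and the test arrays below are placeholders for the trained CNN and the held-out cohort.

from sklearn.metrics import roc_curve, auc

def roc_auc(model, X_test, y_test):
    # Probability of the positive (CHD) class from the softmax output.
    scores = model.predict(X_test)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, scores)
    return fpr, tpr, auc(fpr, tpr)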
Table 4.
Confusion matrix for the CNN classifier for coronary heart disease. Out of 208 coronary heart disease cases in the sample cohort, 161 cases were predicted correctly by the classifier. The proposed classifier also correctly predicts 25,828 cases where the patient did not report coronary heart disease.

True condition      Predicted CHD                 Predicted Non-CHD
Presence of CHD     True Positive (TP) = 161      False Negative (FN) = 47
Absence of CHD      False Positive (FP) = 5743    True Negative (TN) = 25,828
Fig. 5. Training and test accuracies with varying misclassification penalties for class 1 and class 0. The minimum difference between training and test accuracies is obtained with a class weight of 3:1 (CHD:Non-CHD) and a training loss of 0.489. The model is trained with a constant optimized learning rate of 0.006 and 60 epochs.

Table 3.
Training schedule for increasing class weight ratio and the optimal sampling ratio attained from Table 2. The difference between the training (83.17%) and test (82.32%) accuracies attains its minimum when the misclassification penalty of class 1 is set three times higher than that of class 0.

Class Weight   TPR     TNR     Train Acc (%)   Test Acc (%)   Training Loss
50:1           0.980   0.383   59.43           44.48          1.591
25:1           0.923   0.568   67.06           58.04          1.236
12:1           0.860   0.664   71.98           66.60          0.965
8:1            0.836   0.686   75.08           69.76          0.765
6:1            0.817   0.740   80.13           74.12          0.620
4:1            0.788   0.770   81.00           77.10          0.550
3:1            0.773   0.818   83.17           82.32          0.489

Table 5.
Comparison of machine learning models for coronary heart disease prediction. As compared to traditional machine learning models, our proposed model attains a recall value of 0.77, which is comparable to the SVM classifier. However, the specificity (0.81) and test accuracy (0.82) of our model are significantly higher than those of the SVM classifier.

Model                 Recall (%)   Specificity (%)   Test Acc (%)   AUC
Logistic Regression   51.44        91.15             90.89          71.29
SVM                   77.40        77.87             77.87          77.64
Random Forest         76.44        76.06             76.06          76.25
AdaBoost              52.88        90.36             90.12          71.63
MLP                   66.34        78.88             78.80          72.61
Our model             77.3         81.8              81.78          76.78
The balanced accuracy of the model is determined to be (TPR + TNR)/2 = 0.795, or 79.5%. The fall-out rate, or Type-I error, of the model is 5743/31,571 = 18.2%, and the miss rate, or Type-II error, of the model is 47/208 = 22.6%. The positive likelihood ratio of the predicted model is 4.27, indicating that there is almost a 30% increase in post-diagnosis probability when predicting the presence of CHD in patients. A negative likelihood ratio of 0.27 was calculated, which signifies that there is approximately a 30% decrease in post-diagnosis probability when predicting the absence of CHD in patients.

4.3. Comparison of ML models

4.3.1. Comparison with state-of-the-art ML models

Machine learning models discussed in Section 3.3 are implemented and tested on our test cohort. The prediction results from these methods are then compared with the results of our proposed CNN architecture. All models are implemented with optimized parameters and then compared based on the true positive rate (recall) and the true negative rate (specificity). Corresponding test accuracies and AUC values are also determined. Logistic regression and AdaBoost classification result in the highest test accuracies, but these classifiers suffer from low recall values, i.e., the true positive rate for coronary heart disease detection. While the recall values obtained from SVM and random forest are comparable to that of our proposed CNN model, our model predicts the negative (Non-CHD) cases with higher accuracy, as shown in Table 5. The balanced accuracy of our model (79.5%) is also higher than the individual accuracies of the SVM and random forest classifiers. An optimized two-layer multilayer perceptron resulted in a low recall value of 66.34% when tested on our test cohort. Results in Table 5 show that the SVM and random forest classifiers perform better than the logistic, AdaBoost and MLP classifiers, but their specificity and test accuracy are significantly lower as compared to our designed CNN classifier. The CNN classifier results in high specificity and test accuracy along with high values of recall and AUC.

These results confirm that the CNN classifier outperforms all existing commonly used machine learning models for coronary heart disease prediction in terms of prediction accuracy for both the CHD and Non-CHD classes.
Fig. 6. Results using three oversampling techniques for the data augmentation of the minority class. For each technique, the results provide training accuracy, test accuracy, training loss, CHD accuracy (class-specific) and no-CHD accuracy (class-specific) over a number of epochs, together with a t-SNE low-dimensional embedding for data visualization in 3D. (a) t-SNE visualization of the 90% of the original data used for training; 10% of the data is reserved for testing. (b) Results using random oversampling (ROS). Note that we do not provide the t-SNE visualization for ROS since, in ROS, data samples from the minority class are randomly picked and added to the data, thereby maintaining the same data with redundant samples; the visualization is therefore the same as the original data in (a). (c), (d) Results using SMOTE, with visualization. (e), (f) Results using ADASYN, with visualization.
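The 3D embeddings in Figs. 6 and 7 can be generated along the lines of the sketch below; the perplexity is an illustrative setting, and, as cautioned in Section 3.4, the non-convex optimization makes the embedding depend on its random initialization.

from sklearn.manifold import TSNE

def tsne_embed_3d(X, random_state=0):
    tsne = TSNE(n_components=3, perplexity=30.0, init="random",
                random_state=random_state)
    return tsne.fit_transform(X)  # (n_samples, 3) array for a 3D scatter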
Fig. 7. t-SNE visualization of the five undersampling techniques for the data reduction of the majority class. (a), (e) and (f) Near-miss using k-nearest neighbors (versions 1, 2 and 3). (b) Random subsampling with 3:1 no-CHD:CHD data samples (one instance). (c) Edited nearest neighbor (EDN). (d) Instance hardness threshold (IHT).
Fig. 8. Results using the undersampling algorithms of Fig. 7. (a) CHD detection accuracies over epochs using these algorithms. (b) Detection accuracies on no-CHD test data. Among all the competitive undersampling strategies that we compare our results with, near-miss (version 3) works best at improving both class-specific accuracies.
we need more refined approaches to address resilience to data imbalance. This is because in 2D cases, for example image data, there exists spatial correlation among pixels that needs to be taken into account, whereas we are considering mixed-type 1D variables that may or may not be correlated at all.

The small number of samples in the minority class and the infeasibility of data augmentation prohibit us from designing deep networks. Therefore, we turn our attention to shallow networks in this context. The results of various sequential convolutional networks are enumerated in Table 7.
Table 7.
Experiments with different shallow layers on the CHD dataset. I = input, O = output. C2 = a convolution layer with 2 filters, C4 = a convolution layer with 4 filters, C8 = a convolution layer with 8 filters. 64, 128, 512 = dense layers.

No-CHD acc (%)   CHD acc (%)   Overall acc (%)   Acc difference (%)   No. of parameters
Table 7 suggests that, if properly trained, the MLP indeed shows improvement in the accuracy of the majority class, which unfortunately affects the accuracy of the minority class. The difference between classwise accuracies is 27.14% for MLP-I. MLP-II, with an extra deep layer compared to MLP-I, shows a decline in the accuracy difference (5.73%). However, this is achieved only after careful training, and there is a significant chance of overfitting, as the number of trainable parameters of MLP-II is 45,442, approximately 9 times the amount of input data.

It can be noticed that the convolutional layers provide surprising resilience to class imbalance in terms of the difference between classwise accuracies. Conv-I, II and III yield 2.04%, 2.4%, and 0.07% accuracy differences, respectively. However, this comes at the cost of lower overall accuracy scores. Restricting ourselves to sequential design for simplicity, we investigate two possible architectures, Conv-IV and the last one in Table 7. After rigorous training, it is observed that the sequential placement of C2-C4 yields better accuracy.

Note that the total number of parameters of our architecture is 32,066, which is significantly higher than Conv-I, II and III, but moderately lower than MLP-I and II. Such a large number of parameters is due to the presence of the dense layers (64 and 512). While C2-C4 attempts to minimize the difference between classwise accuracies, the dense layers try to improve the overall test accuracy.

Another point worth mentioning is the stability of the accuracy achieved by each individual model. While training MLP-I and II, it is observed that the training and test accuracies have the tendency to monotonically increase over epochs when we gradually decrease the weight ratio to 13:40 (see Table 3 for the weight ratios) according to the previously mentioned training schedule. This is degenerative because after each epoch the accuracies (train and test) tend to be higher than those at the previous epoch (destabilization) while the accuracy of the minority class starts plummeting. This degeneration is strikingly diminished after the inclusion of multiple convolution layers. In short, the accuracies that our proposed model yields are stable for a fairly large number of epochs.

5. Conclusion, limitations and future research

In this paper, we propose a multi-stage model to predict CHD using highly imbalanced clinical data containing both qualitative and quantitative attributes. For such clinical data, imbalance is an imminent challenge that exists due to the limited availability of data. Such data imbalance adversely affects the performance of any state-of-the-art clinical classification model. As a remedy to the imbalance problem, one cannot efficiently apply conventional techniques, such as a data augmentation strategy, due to the biological implausibility of replication of several attributes in the clinical data. By way of extensive experimentation and validation, we establish that a special-purpose, shallow convolutional neural network exhibits a considerable degree of resilience towards data imbalance, thereby producing classification accuracy superior to the existing machine learning models (Table 5). Our model is simple in concept, modular in design, and offers moderate resilience to data imbalance.

The proposed model initiates with the application of LASSO regression in order to identify the contribution of significant variables or attributes to the data variation. Using multiple instances of randomly subsampled datasets, LASSO is performed repeatedly to check the consistency of the variable contribution, which is a crucial step in our algorithm to control the true-negatives of variable selection. Finally, a majority voting algorithm is applied to extract the significant variables of interest, a step that achieves dimensionality reduction by excising unimportant variables. We do not follow conventional dimension reduction techniques, such as Local Linear Embedding (LLE) and Principal Component Analysis (PCA), because these methods generally provide dimensions that are linear or nonlinear combinations of the data variables, leading to a lack of interpretability of the derived dimensions. For example, a linear combination of BMI and alkaline phosphatase (ALP) is difficult to interpret. Rather, we explicitly use t-SNE for the visualization of under- and over-sampled data generated by applying state-of-the-art algorithms. As we utilize LASSO, a potential research avenue would be to test whether LASSO reflects the true importance of a variable through its shrinkage, and if not, this would call for the construction of an appropriate optimization function.

Once we obtain the significant predictor variables with LASSO and feature voting, we feed them to our 1-D feedforward convolutional neural network. We substantiate that shallow convolutional layers provide adaptability to data imbalance in terms of our results in Table 5, where, in contrast to logistic regression and AdaBoost, our model provides a balanced classwise classification accuracy. In a cohort of 37,079 individuals with high imbalance between presence of CHD and Non-CHD, we show that it is possible to predict the CHD cases with 77.3% and the Non-CHD cases with 81.8% accuracy, indicating that the prediction of the CHD class, which is deficient in the number of reported patient samples, does not suffer significantly from the data imbalance. The preprocessing stage, consisting of repeated LASSO and majority voting, is pivotal in filtering out highly correlated variables, setting flags only for the uncorrelated ones to be fed to our CNN model. This is clearly observed from Fig. 3. Each LASSO stage maintains a 1:1 ratio of CHD:Non-CHD data to avoid the adverse effect of data imbalance on the final shrinkage parameters γ. However, the 1:1 ratio does not encapsulate enough variation in Non-CHD classification, as this class contains a large number of reported cases in our dataset. We repeat LASSO with randomly sampled subsets and eventually apply majority voting to assess the importance of a variable. Once the dominant predictor variables are identified, we discard their LASSO values in this paper. In future work, instead of disregarding the LASSO-majority-voting generated weight values, one can integrate them into the subsequent CNN model as priors, and test whether that leads to an enhancement in the performance of the model.

A potential problem might arise while using LASSO due to the linear nature of the estimator. LASSO is a penalized regression technique, where sparsity of variables is enforced by a convex ℓ1 penalty. Competitive methods, such as cross-correlation based variable selection, also support the linear map, however, with a little difference. LASSO exploits the partial correlation between the factors, which caters to the relevant prediction of output responses, whereas cross-correlation computes, in a sense, the marginal
correlation between each pair of factors, which might not be monotonic and linear all the time. Nonetheless, the assumption of a linear relationship between the input factors and output labels may have some consequential limitations, and the number of factors might be significantly greater than what is present in the current data. A further refined approach, in this case, would be a two-step, nonlinear reduction of dimension, where we can use techniques such as sure independence screening (SIS), conditional SIS or graphical LASSO to approximate the partial correlation/covariance among the factors. A suitable threshold would give a reduced dimension to which LASSO can afterwards be applied for further reduction.

A possible future direction of this work is to consider nutrition and dietary data recorded by NHANES as additional predictor variables for CHD prediction. Dietary factors play an important role in CHD occurrence (Bhupathiraju, 2011; Masironi, 1970), and the prediction accuracy of CHD with additional dietary variables could be explored. For example, until very recently, several prospective studies concluded that total dietary fat was not significantly associated with CHD mortality (Howard, Van Horn, and Hsia, 2006; Skeaff, 2009). However, according to the American Heart Association (AHA), it is the quality of fat which determines CHD risk (Lichtenstein, Appel, and Brands, 2006; USDA, 2010). Individual experiments performed with NHANES dietary data have discussed the association of cholesterol, LDL, HDL, amino acids and dietary supplements with CHD (references). However, individual consumption of nutrients takes place collectively in the form of meals consisting of combinations of nutrients (Hu, 2002; Sacks, Obarzanek, and Windhauser, 1995). This may lead to multi-collinearity among factors, and thus a more complex dietary pattern analysis, controlling for multicollinearity of CHD-associated significant nutrients, could lead to a more comprehensive approach to CHD prevention. Additionally, some of the clinical predictor variables included in the classification model of CHD prediction may themselves be impacted by certain dietary habits of patients. Thus, inclusion of dietary data of patients along with clinical predictor variables in the prediction of CHD can also lead to potential endogeneity issues. However, with appropriate treatment of endogeneity, dietary data inclusion is expected to provide further insights and improved accuracy of CHD diagnosis.

Finally, the preferred selection between data augmentation and data subsampling is much debated and demands attention in this section. Our argument in favor of subsampling is as follows: as observed from the t-SNE figures in the results section, the CHD and no-CHD classes are densely interspersed. Moreover, the class-specific clusters are highly non-convex and extremely hard to separate using naïve nonlinear classifiers. Synthetic data samples generated by strategies such as random sampling on the line connecting an arbitrary pair of data samples (used in SMOTE, ADASYN) might receive the wrong label. This is because the newly sampled data point is likely to be labeled as "0" (for training) if the pair of data samples belongs to class "0". However, the data sample may be biologically implausible or, in case of potential plausibility, may actually be a sample from class "1", as both classes are densely mixed. Especially when the data is significantly imbalanced, as in the case of our data, the number of synthesized data samples of the minority class is large. A countable fraction of such newly synthesized, incorrectly labeled data imposes a large bias on the trained network and increases the probability of misclassification. Therefore, we prefer to adopt the sub-sampling strategy, where the authenticity of data is preserved, barring the measurement and acquisition noise. It is an interesting avenue to explore whether the extension of shallow CNN models, in terms of architecture and data sub-sampling, to neural net-based learning on similar clinical datasets improves the prediction accuracy of the classification process. As explained earlier, our model can also be used as a transfer learning model, and the last two dense layers can be retrained for new data. Thus, a significant future research direction would be to implement CNNs for predictions from similar clinical datasets where such imbalanced numbers of positive and negative classifications exist.

Credit author statement

Aniruddha Dutta: Conceptualization, Data curation, Investigation, Formal analysis, Writing - original draft; Tamal Batabyal: Methodology, Software, Validation, Writing - original draft; Meheli Basu: Data curation, Investigation, Writing - review & editing; Scott Acton: Writing - review & editing, Supervision. Aniruddha Dutta and Tamal Batabyal are equal contributors.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors acknowledge the high-performance computational support from The Center for Advanced Computing (CAC) at Queen's University, Canada and the Center for Research Computing at University of Pittsburgh, USA. This research is not funded by any external research grant.

References

Ahmed, M. A., Yasmeen, A. A., Awadalla, H., Elmadhoun, W. M., Noor, S. K., & Almobarak, A. O. (2017). Prevalence and trends of obesity among adult Sudanese individuals: Population based study. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 11(2), 963–967. doi:10.1016/j.dsx.2017.07.023.
Benjamin, E. J., Muntner, P., Alonso, A., Bittencourt, M. S., Callaway, C. W., Carson, A. P., et al. (2019). Heart disease and stroke statistics—2019 update: A report from the American Heart Association. Circulation, 139, e56–e528. doi:10.1161/CIR.0000000000000659.
Bhupathiraju, S. N., & Tucker, K. L. (2011). Coronary heart disease prevention: Nutrients, foods, and dietary patterns. Clinica Chimica Acta, 412(17-18), 1493–1514. doi:10.1016/j.cca.2011.04.038.
Burchfiel, C. M., Tracy, R. E., Chyou, P., & Strong, J. P. (1997). Cardiovascular risk factors and hyalinization of renal arterioles at autopsy. Arteriosclerosis, Thrombosis, and Vascular Biology, 17(4), 760–768. doi:10.1161/01.ATV.17.4.760.
Burke, A. P., Farb, A., Malcom, G. T., Liang, Y. H., Smialek, J., & Virmani, R. (1997). Coronary risk factors and plaque morphology in men with coronary disease who died suddenly. The New England Journal of Medicine, 336(18), 1276–1282. doi:10.1056/NEJM199705013361802.
Celermajer, D. S., Sorensen, K. E., Georgakopoulos, D., Bull, C., Thomas, O., Robinson, J., & Deanfield, J. E. (1993). Cigarette smoking is associated with dose-related and potentially reversible impairment of endothelium-dependent dilation in healthy young adults. Circulation, 88(5), 2149–2155. doi:10.1161/01.CIR.88.5.2149.
Center for Nutrition Policy and Promotion. (2010). Dietary guidelines for Americans. US Department of Agriculture.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. doi:10.1613/jair.953.
Chobanian, A. V., Bakris, G. L., Black, H. R., Cushman, W. C., Green, L. A., Izzo Jr, J. L., Jones, D. W., Materson, B. J., Oparil, S., Wright Jr, J. T., & Roccella, E. J., National High Blood Pressure Education Program Coordinating Committee. (2003). Seventh report of the joint national committee on prevention, detection, evaluation, and treatment of high blood pressure. Hypertension, 42(6), 1206–1252. doi:10.1161/01.HYP.0000107251.49515.c2.
Chonchol, M., & Nielson, C. (2008). Hemoglobin levels and coronary artery disease. American Heart Journal, 155(3), 494–498. doi:10.1016/j.ahj.2007.10.031.
Clifton, P. M. (2011). Protein and coronary heart disease: The role of different protein sources. Current Atherosclerosis Reports, 13(6), 493–498. doi:10.1007/s11883-011-0208-x.
Das, R., Turkoglu, I., & Sengur, A. (2009). Effective diagnosis of heart disease through neural networks ensembles. Expert Systems with Applications, 36(4), 7675–7680. doi:10.1016/j.eswa.2008.09.013.
Fava, A., Plastino, M., Cristiano, D., Spanò, A., Cristofaro, S., Opipari, C., Chillà, A., Casalinuovo, F., Colica, C., Bartolo, M. D., Pirritano, D., & Bosco, D. (2013). Insulin resistance possible risk factor for cognitive impairment in fibromialgic patients. Metabolic Brain Disease, 28(4), 619–627. doi:10.1007/s11011-013-9421-3.
Goldstein, B. A., Navar, A. M., & Carter, R. E. (2017). Moving beyond regression techniques in cardiovascular risk prediction: Applying machine learning to address analytic challenges. European Heart Journal, 38(23), 1805–1814. doi:10.1093/eurheartj/ehw302.
Goodfellow, I., Bengio, Y., Courville, A., & Bengio, Y. (2016). Deep learning: 1. Cambridge: MIT Press.
Haq, A. U., Li, J. P., Memon, M. H., Nazir, S., & Sun, R. (2018). A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mobile Information Systems, 21 pages. doi:10.1155/2018/3860146.
Haskell, W. L., Alderman, E. L., Fair, J. M., Maron, D. J., Mackey, S. F., Superko, H. R., Williams, P. T., Johnstone, I. M., Champagne, M. A., & Krauss, R. M. (1994). Effects of intensive multiple risk factor reduction on coronary atherosclerosis and clinical cardiac events in men and women with coronary artery disease. The Stanford Coronary Risk Intervention Project (SCRIP). Circulation, 89(3), 975–990. doi:10.1161/01.CIR.89.3.975.
He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence) (pp. 1322–1328). IEEE. doi:10.1109/IJCNN.2008.4633969.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Howard, B. V., Van Horn, L., Hsia, J., et al. (2006). Low-fat dietary pattern and risk of cardiovascular disease: The women's health initiative randomized controlled dietary modification trial. JAMA, 295, 655–666. doi:10.1001/jama.295.6.655.
Hu, F. B. (2002). Dietary pattern analysis: A new direction in nutritional epidemiology. Current Opinion in Lipidology, 13, 3–9. doi:10.1097/00041433-200202000-00002.
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint.
Irie, F., Iso, H., Sairenchi, T., Fukasawa, N., Yamagishi, K., Ikehara, S., & Kanashiki, M. (2006). The relationships of proteinuria, serum creatinine, glomerular filtration rate with cardiovascular disease mortality in Japanese general population. Kidney International, 69(7), 1264–1271. doi:10.1038/sj.ki.5000284.
Jin, Z., Sun, Y., & Cheng, A. C. (2009). Predicting cardiovascular disease from real-time electrocardiographic monitoring: An adaptive machine learning approach on a cell phone. In Proceedings of international conference of the IEEE engineering in medicine and biology society (pp. 6889–6892). doi:10.1109/IEMBS.2009.5333610.
Kahramanli, H., & Allahverdi, N. (2008). Design of a hybrid system for the diabetes and heart diseases. Expert Systems with Applications, 35(1-2), 82–89. doi:10.1016/j.eswa.2007.06.004.
Kannel, W. B. (1996). Blood pressure as a cardiovascular risk factor: Prevention and treatment. Journal of the American Medical Association, 275(20), 1571–1576. doi:10.1001/jama.1996.03530440051036.
Kannel, W. B., Castelli, W. P., Gordon, T., & McNamara, P. M. (1971). Serum cholesterol, lipoproteins, and the risk of coronary heart disease. The Framingham study. Annals of Internal Medicine, 74(1), 1–12. doi:10.7326/0003-4819-74-1-1.
Kopel, E., Kivity, S., Morag-Koren, N., Segev, S., & Sidi, Y. (2012). Relation of serum lactate dehydrogenase to coronary artery disease. The American Journal of Cardiology, 110(12), 1717–1722. doi:10.1016/j.amjcard.2012.08.005.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th international conference on neural information processing systems (pp. 1097–1105).
Kurt, I., Ture, M., & Kurum, A. T. (2008). Comparing performances of logistic regression, classification and regression tree, and neural networks for predicting coronary artery disease. Expert Systems with Applications, 34(1), 366–374. doi:10.1016/j.eswa.2006.09.004.
LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time series. In M. A. Arbib (Ed.), The handbook of brain theory and neural networks (pp. 255–258). Cambridge, MA: MIT Press.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444. doi:10.1038/nature14539.
Lichtenstein, A. H., Appel, L. J., Brands, M., et al. (2006). Diet and lifestyle recommendations revision 2006: A scientific statement from the American Heart Association Nutrition Committee. Circulation, 114, 82–96. doi:10.1161/CIRCULATIONAHA.106.176158.
Madani, A., Arnaout, R., Mofrad, M., & Arnaout, R. (2018). Fast and accurate view classification of echocardiograms using deep learning. npj Digital Medicine, 1, 6. doi:10.1038/s41746-017-0013-1.
Madjid, M., & Fatemi, O. (2013). Components of the complete blood count as risk predictors for coronary heart disease. Texas Heart Institute Journal, 40(1), 17–29.
Mani, I., & Zhang, I. (2003). kNN approach to unbalanced data distributions: A case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets: 126.
Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint.
Martinez, F. L., Schwarcz, A., Valdez, E. R., & Diaz, V. G. (2018). Machine learning classification analysis for a hypertensive population as a function of several risk factors. Expert Systems with Applications, 110, 206–215. doi:10.1016/j.eswa.2018.06.006.
Masironi, R. (1970). Dietary factors and coronary heart disease. Bulletin of the World Health Organization, 42(1), 103–114.
Meigs, J. M., D'Agostino Sr, R. B., Wilson, P. W. F., Cupples, L. A., Nathan, D. A., & Singer, D. E. (1997). Risk variable clustering in the insulin resistance syndrome: The Framingham offspring study. Diabetes, 46(10), 1594–1600. doi:10.2337/diacare.46.10.1594.
Mordvintsev, A., Olah, C., & Tyka, M. (2015). Inceptionism: Going deeper into neural networks. Google AI Blog.
Nakanishi, R., Dey, D., Commandeur, F., Slomka, P., Betancur, J., Gransar, H., Dailing, C., Osawa, K., Berman, D., & Budoff, M. (2018). Machine learning in predicting coronary heart disease and cardiovascular disease events: Results from the multi-ethnic study of atherosclerosis (MESA). Journal of the American College of Cardiology, 71(11), Supplement. doi:10.1016/S0735-1097(18)32024-2.
National Center for Health Statistics. (2015). https://wwwn.cdc.gov/nchs/nhanes/ContinuousNhanes/Questionnaires.aspx?BeginYear=2015
Neilson, C., Lange, T., & Hadjokas, N. (2006). Blood glucose and coronary artery disease in nondiabetic patients. Diabetes Care, 29(5), 998–1001. doi:10.2337/dc05-1902.
Olaniyi, E. O., Oyedotun, O. K., & Khashman, A. (2015). Heart diseases diagnosis using neural networks arbitration. International Journal of Intelligent Systems and Applications, 7(12), 75–82. doi:10.5815/ijisa.2015.12.08.
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint.
Roger, V. L. (2010). The heart failure epidemic. International Journal of Environmental Research and Public Health, 7(4), 1807–1830. doi:10.3390/ijerph7041807.
Sacks, F. M., Obarzanek, E., Windhauser, M. M., et al. (1995). Rationale and design of the dietary approaches to stop hypertension trial (DASH). A multicenter controlled-feeding study of dietary patterns to lower blood pressure. Annals of Epidemiology, 5, 108–118. doi:10.1016/1047-2797(94)00055-X.
Shaper, A. G., Wannamethee, S. G., & Whincup, P. H. (2004). Serum albumin and risk of stroke, coronary heart disease, and mortality: The role of cigarette smoking. Journal of Clinical Epidemiology, 57(2), 195–202. doi:10.1016/j.jclinepi.2003.07.001.
Shen, J., Zhang, J., Wen, J., Ming, Q., Zhang, J., & Xu, Y. (2015). Correlation of serum alanine aminotransferase and aspartate aminotransferase with coronary heart disease. International Journal of Clinical and Experimental Medicine, 8(3), 4399–4404.
Shen, Y., Yang, Y., Parish, S., Chen, Z., Clarke, R., & Clifton, D. A. (2016). Risk prediction for cardiovascular disease using ECG data in the China Kadoorie Biobank. In Proceedings of 38th annual international conference of the IEEE engineering in medicine and biology society (EMBC). doi:10.1109/EMBC.2016.7591218.
Shilaskara, S., & Ghatol, A. (2013). Feature selection for medical diagnosis: Evaluation for cardiovascular diseases. Expert Systems with Applications, 40(10), 4146–4153. doi:10.1016/j.eswa.2013.01.032.
Skeaff, C. M., & Miller, J. (2009). Dietary fat and coronary heart disease: Summary of evidence from prospective cohort and randomised controlled trials. Annals of Nutrition and Metabolism, 55, 173–201. doi:10.1159/000229002.
Smith, M. R., Martinez, T., & Giraud-Carrier, C. (2014). An instance level analysis of data complexity. Machine Learning, 95(2), 225–256. doi:10.1007/s10994-013-5422-z.
Stamler, J., Vaccaro, O., Neaton, J. D., & Wentworth, D. (1993). Diabetes, other risk factors, and 12-yr cardiovascular mortality for men screened in the multiple risk factor intervention trial. Diabetes Care, 16(2), 434–444. doi:10.2337/diacare.16.2.434.
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the thirty-first AAAI conference on artificial intelligence (pp. 4278–4284).
Uyar, K., & Ilhan, A. (2017). Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks. Procedia Computer Science, 120, 588–593. doi:10.1016/j.procs.2017.11.283.
Vasan, R. S., Larson, M. G., Leip, E. P., Evans, J. C., O'Donnell, C. J., Kannel, W. B., & Levy, D. (2001). Impact of high-normal blood pressure on the risk of cardiovascular disease. The New England Journal of Medicine, 345(18), 1291–1297. doi:10.1056/NEJMoa003417.
Venkatesh, B. A., et al. (2017). Cardiovascular event prediction by machine learning. The multi-ethnic study of atherosclerosis. Circulation Research, 121(9), 1092–1101. doi:10.1161/CIRCRESAHA.117.311312.
Wannamethee, S. G., Shaper, A. G., & Perry, I. J. (1997). Serum creatinine concentration and risk of cardiovascular disease: A possible marker for increased risk of stroke. Stroke, 28(3), 557–563. doi:10.1161/01.STR.28.3.557.
Weng, S. F., Reps, J., Kai, J., Garibaldi, J. M., & Qureshi, N. (2017). Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One, 12(4), e0174944. doi:10.1371/journal.pone.0174944.
Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 3, 408–421. doi:10.1109/TSMC.1972.4309137.
Yang, Z., Zhang, T., Lu, J., Zhang, D., & Kalui, D. (2017). Optimizing area under the ROC curve via extreme learning machines. Knowledge-Based Systems, 130, 74–89. doi:10.1016/j.knosys.2017.05.013.
Zeiher, A. M., Drexler, H., Saurbier, B., & Just, H. (1993). Endothelium-mediated coronary blood flow modulation in humans. Effects of age, atherosclerosis, hypercholesterolemia, and hypertension. The Journal of Clinical Investigation, 92(2), 652–662. doi:10.1172/JCI116634.