Osteoporosis Detection Using Machine and Deep Learning Techniques

Abstract. The images were acquired at the hospital of Orleans (France) from postmenopausal women. The high degree
of similarity between images of a healthy bone (CS) and a diseased one (OP) makes classification a challenge. A good
bone texture characterization technique is essential for identifying osteoporosis cases. Standard texture feature
extraction techniques such as Local Binary Pattern (LBP), oriented Basic Image Features (oBIF), Line Local Binary
Pattern (LLBP), and Binarized Statistical Image Features (BSIF), combined with random subspace and bagging methods,
have been used for this purpose. Validation on the test set showed that the ensemble classifiers predicted osteoporosis
risk with an AUC of ----%, accuracy of ----%, sensitivity of ----%, and specificity of ----%, compared to the results
obtained by the SVM classifier alone.
This paper describes the process of using a deep learning framework to train and test convolutional neural network
models (VGG-16, ResNet…. GoogLeNet) to classify osteoporosis…. To improve the results, we created an ensemble
and also averaged over K nearest neighbors and …..
1. Introduction
Osteoporosis is a condition that affects the bones. Its name comes from
Greek roots meaning "porous bone." Osteoporosis can occur in people of all
ages, but it is more common in older adults, particularly women. More than
53 million people in the United States either have osteoporosis or are at
high risk of developing it.
Osteoporosis is a disease that affects the skeletal structure and is
characterized by low bone mineral density (BMD) and deterioration of the bone
microarchitecture, increasing the risk of fractures. Clinically, evaluation
of the disease combines evidence of fragility fractures with measurement of
BMD [1].
Dual-energy X-ray absorptiometry (DXA) is the standard test for diagnosing
osteoporosis; it measures bone mineral density (BMD) in the proximal femur
and lumbar spine. However, BMD alone accounts for only about 60% of fracture
risk prediction [2]. BMD measured by DXA has other limitations as well: it
gives different values for bones of different sizes, and the equipment is
costly. The description of trabecular bone microarchitecture has been
recognized as a significant factor that complements osteoporosis diagnosis
based on BMD [3-9].
These limitations create a need for an alternative diagnostic tool for
osteoporosis.
Recently, the list of models that aim to capture the risk of fragility fracture
has grown considerably through the application of Artificial Intelligence
tools, in the hope of making the predictions more useful. Because they learn
from data, AI tools have the potential to capture underlying trends and
patterns that previous modeling tools could not [10-12].
Osteoporosis prediction can be treated as a machine learning task.

Ensemble learning algorithms are among the most attractive methods for
prediction and data classification problems. An ensemble learning framework
combines several classifiers to perform a classification task together.
These methods have properties that make them well suited to many datasets
[13]. The main goal of ensemble learning is to reduce the prediction error
of an individual learner on the classification task [14]: in boosting, the
learners evolve over time and finally produce a weighted result; in bootstrap
aggregation, or bagging (e.g. bagged decision trees), the prediction is
averaged over a collection of decision trees fitted to bootstrap samples.
In this work, random subspace and bagging methods were used to build ensemble
classifiers that categorize subjects into two classes (osteoporosis: OP;
control subject: CS) [15].
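The difference between bagging and the random subspace method can be sketched in a few lines. The example below is a minimal illustration, not the configuration used in this work: the nearest-centroid base learner, the synthetic two-class data, and all names (`fit_ensemble`, `feature_fraction`, and so on) are assumptions chosen to keep the code short. Bagging resamples the training rows with replacement; the random subspace method keeps all rows but trains each member on a random subset of the feature columns.

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in base learner: mean feature vector per class (0 = CS, 1 = OP)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_centroids(centroids, X):
    """Assign each sample to the class with the nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def fit_ensemble(X, y, n_learners=25, bootstrap=True, feature_fraction=1.0, seed=0):
    """bootstrap=True, feature_fraction=1.0 -> bagging;
    bootstrap=False, feature_fraction<1.0 -> random subspace."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(round(feature_fraction * d)))
    members = []
    for _ in range(n_learners):
        rows = rng.integers(0, n, size=n) if bootstrap else np.arange(n)
        while len(np.unique(y[rows])) < 2:        # redraw if a class is missing
            rows = rng.integers(0, n, size=n)
        cols = np.sort(rng.choice(d, size=k, replace=False))
        members.append((cols, fit_centroids(X[rows][:, cols], y[rows])))
    return members

def predict_ensemble(members, X):
    """Majority vote over all ensemble members."""
    votes = np.stack([predict_centroids(c, X[:, cols]) for cols, c in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Synthetic stand-in for texture feature vectors (not the Orleans data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 20)), rng.normal(1.0, 1.0, (100, 20))])
y = np.repeat([0, 1], 100)

bagging = fit_ensemble(X, y, bootstrap=True, feature_fraction=1.0)
subspace = fit_ensemble(X, y, bootstrap=False, feature_fraction=0.5)
acc_bagging = (predict_ensemble(bagging, X) == y).mean()
acc_subspace = (predict_ensemble(subspace, X) == y).mean()
```

Both variants share the same voting step; only the way each member's training view is drawn differs.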
The input to our models is a set of images of patients screened for
osteoporosis. Each image belongs to one of the two classes described above
(OP or CS). We then use different types of convolutional neural networks
(CNN), namely VGG-16, ResNet, and GoogLeNet, to predict the class to which
each image belongs. The output is a list of predicted class labels and the
corresponding probabilities (confidence) for all images. Our goal in this
project is to detect osteoporosis cases by using various machine learning
models to classify the provided images as osteoporotic or control.
………………………………………………………………………………………………
2. Related Work
Many solutions already exist because this problem was a public challenge.
Many of the top solutions used pre-trained CNN models; the most popular are
VGG-16 and ResNet, which were state of the art one or two years ago and
have continued to improve. Besides these models, two of the top-performing
solutions we consulted for our project also used several good ideas:
ensembling, K nearest neighbors (KNN), and data augmentation [4][5]. First,
because the test set is about four times as large as the training set, it is
easy to overfit. Creating an ensemble of models can reduce variance and
alleviate this problem. Second, because the images are taken from a ………,
many images can be very similar or almost identical, and they should be
assigned to the same class. Applying KNN can yield more stable results.
Previous solutions applied KNN either to the test set or to the training set
(as data augmentation), with different rationales, and the results were good
in both cases.
……………………………………………………………………………………………………………
…………………………………………………………………………………………………………….
3. Dataset and Features
The images were acquired at the hospital of Orleans (France) from
postmenopausal women.
……………………………………………………………………………………………………………
……………………………………………………………………………………………………….
Dataset Visualization
3.1 Data Pre-processing

Images are resized with OpenCV (cv2) to improve the computational efficiency
of the classifier.
Stratified splitting is used to split the dataset into a ….. training-testing
ratio. The training dataset is further split into a …. training-validation
ratio. Thus, the final training set has ……. images, the final validation set
has ….. images, and the final testing set has 4485 images.
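The two-stage stratified split described above can be sketched as follows. The split fractions (0.2 and 0.25), the toy label vector, and the helper name `stratified_split` are illustrative assumptions only; the actual ratios used in this work are not stated here. The point of the sketch is that each class is shuffled and divided separately, so the OP/CS proportion is preserved in every subset.

```python
import numpy as np

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx); each class keeps the same share in both parts."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Toy labels: 60 control (0) and 40 osteoporotic (1) subjects (made-up counts).
labels = np.array([0] * 60 + [1] * 40)

# Stage 1: hold out a test set. Stage 2: split the remainder into train/validation.
train_val_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_rel, val_rel = stratified_split(labels[train_val_idx], test_fraction=0.25, seed=1)
train_idx, val_idx = train_val_idx[train_rel], train_val_idx[val_rel]
```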
3.2 Workflow
Detecting osteoporosis — Flow Chart
4. Feature Extraction
The following feature extraction techniques were used to extract features
from the images:
4.1 HOG
The histogram of oriented gradients (HOG) is a feature descriptor used in
computer vision and image processing for object detection. The technique
counts occurrences of gradient orientations in localized portions of an
image.
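To make the idea concrete, the sketch below computes a simplified HOG-style descriptor with NumPy. It is an assumption-laden illustration rather than the extractor used in this work: it omits the block normalization and bin interpolation of the full HOG pipeline, and the cell size, bin count, and function names are hypothetical.

```python
import numpy as np

def orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    gy, gx = np.gradient(cell.astype(float))          # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0    # unsigned orientation in degrees
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    return hist

def hog_descriptor(image, cell_size=8, n_bins=9):
    """Concatenate per-cell histograms and L2-normalize the result."""
    h, w = image.shape
    cells = [orientation_histogram(image[r:r + cell_size, c:c + cell_size], n_bins)
             for r in range(0, h - cell_size + 1, cell_size)
             for c in range(0, w - cell_size + 1, cell_size)]
    feat = np.concatenate(cells)
    return feat / (np.linalg.norm(feat) + 1e-12)

# A horizontal intensity ramp: every gradient points along the x axis (0 degrees).
ramp = np.tile(np.arange(32, dtype=float), (32, 1))
descriptor = hog_descriptor(ramp)
first_cell = orientation_histogram(ramp[:8, :8])
```

For the ramp image, all gradient energy falls into the first orientation bin of every cell, which is what a texture with a single dominant direction would produce.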
4.2 LBP
Local Binary Pattern (LBP) is a simple yet very efficient texture operator
which labels the pixels of an image by thresholding the neighbourhood of
each pixel against its center value and treating the result as a binary
number.
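The thresholding step can be sketched with NumPy for the basic 3x3 neighbourhood. This is a minimal illustration under assumptions: the bit ordering (clockwise from the top-left neighbour), the `>=` comparison, and the function names are choices made here, and rotation-invariant or uniform LBP variants are not covered.

```python
import numpy as np

def lbp_codes(image):
    """8-neighbour LBP code for every interior pixel of a 2-D grayscale array."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (row, column), clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.uint8) << np.uint8(bit)
    return codes

def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature vector."""
    hist = np.bincount(lbp_codes(image).ravel(), minlength=256).astype(float)
    return hist / hist.sum()

flat = np.full((5, 5), 7.0)                           # uniform patch
peak = np.array([[0, 0, 0], [0, 9, 0], [0, 0, 0]])    # center brighter than all neighbours
corner = np.array([[9, 0, 0], [0, 5, 0], [0, 0, 0]])  # only the top-left neighbour is brighter
```

The histogram of codes, rather than the code image itself, is what is typically fed to a classifier.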
4.3 SURF
The SURF method (Speeded Up Robust Features) is a fast and robust
algorithm for local, similarity invariant representation and comparison of
images. The main interest of the SURF approach lies in its fast computation
of operators using box filters, thus enabling real-time applications such as
tracking and object recognition.
4.4 Color Histogram
A color histogram is a representation of the distribution of colors in an
image. For digital images, a color histogram represents the number of pixels
that have colors in each of a fixed list of color ranges, that span the image’s
color space.
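A minimal sketch of such a histogram is given below. It assumes 8-bit images and builds one histogram per channel (a marginal histogram) rather than a joint histogram over the full color space; the bin count and the function name are hypothetical. For single-channel radiographs it reduces to a plain intensity histogram.

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """Concatenate per-channel intensity histograms of an 8-bit image, normalized to sum to 1."""
    image = np.atleast_3d(image)                      # (H, W) becomes (H, W, 1)
    hists = [np.histogram(image[..., ch], bins=bins_per_channel, range=(0, 256))[0]
             for ch in range(image.shape[2])]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()

black_rgb = np.zeros((4, 4, 3), dtype=np.uint8)       # all pixels fall in the first bin
gray = np.full((4, 4), 200, dtype=np.uint8)           # single-channel example
rgb_feature = color_histogram(black_rgb)
gray_feature = color_histogram(gray)
```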
4.5 KAZE
KAZE is a 2D feature detection and description method that operates
completely in nonlinear scale space.
Features extracted using Feature Extraction Techniques
4.6 Normalization
In min-max normalization, a linear transformation is performed on the
original data. The minimum and maximum values of the data are computed and
each value is replaced according to the following formula:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
where x is the original value and x' is the normalized value.
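The transformation described here is min-max scaling, which maps each feature to the range [0, 1]. A minimal NumPy sketch follows; the function name and the handling of constant columns are assumptions made for the example.

```python
import numpy as np

def min_max_normalize(features):
    """Rescale each column of a (samples x features) array to the range [0, 1]."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    # Constant columns would divide by zero; map them to 0 instead.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (features - lo) / span

raw = np.array([[1.0, 10.0, 5.0],
                [2.0, 20.0, 5.0],
                [3.0, 30.0, 5.0]])
scaled = min_max_normalize(raw)
```

In a train/test setting, the minimum and maximum would normally be computed on the training set and reused for validation and test data.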
4.7 Dimensionality Reduction

We have used three dimensionality reduction techniques which are stated
below:
4.7.1 PCA
Principal component analysis (PCA) is a technique for reducing the
dimensionality of datasets, increasing interpretability but at the same time
minimizing information loss.
After feature extraction, PCA with n_components = 100 is applied to the
features extracted by each technique individually, and the reduced features
are then concatenated to form a combined feature set of 700 features.
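The per-technique reduction followed by concatenation can be sketched as below. This is an illustration under assumptions: PCA is computed directly from the SVD of the centered data, and the feature blocks, their sizes, and the reduced dimension of 5 are made up to keep the example small (the text above uses 100 components per technique).

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project centered data onto its top principal directions (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

rng = np.random.default_rng(0)
# Hypothetical feature blocks, one per extraction technique (sizes are made up).
blocks = {
    "hog": rng.normal(size=(50, 40)),
    "lbp": rng.normal(size=(50, 30)),
    "kaze": rng.normal(size=(50, 20)),
}

# Reduce each block separately, then stack the reduced blocks side by side.
reduced = [pca_fit_transform(X, n_components=5) for X in blocks.values()]
combined = np.hstack(reduced)
```

Reducing each block before concatenation keeps any single high-dimensional descriptor from dominating the combined feature set.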
4.7.2 LDA
Linear discriminant analysis (LDA) is a generalization of Fisher’s linear
discriminant, a method used in statistics and other fields, to find a linear
combination of features that characterizes or separates two or more classes
of objects or events.
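For a two-class problem such as OP versus CS, LDA yields a single discriminant axis. The sketch below computes that axis as Fisher's direction, proportional to the inverse within-class scatter times the difference of class means. The synthetic data, the small ridge term added for numerical stability, and the function name are assumptions of this example.

```python
import numpy as np

def fisher_direction(X, y):
    """Two-class Fisher discriminant: w is proportional to Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix, summed over both classes.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Small ridge term (an assumption) keeps the solve stable if Sw is near-singular.
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
shift = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
X = np.vstack([rng.normal(size=(40, 5)), rng.normal(size=(40, 5)) + shift])
y = np.repeat([0, 1], 40)

w = fisher_direction(X, y)
z = X @ w            # one-dimensional projection of every sample
```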
4.7.3 LDA over PCA

LDA is applied to the combined features obtained after PCA in order to
further reduce the number of features, improve class separation, and
increase computational efficiency.
5. Models
The following models were used to classify our dataset:
5.1 Traditional ML Models
5.1.1 Decision Tree
A decision tree is an important non-parametric method for image mining.
It applies a series of simple decision rules to the image features to
provide a path for image classification.
5.1.2 Support Vector Machine

SVMs are supervised learning models with associated learning algorithms
that analyze data for classification and regression analysis.
5.1.3 KNN
The KNN algorithm assumes that similar things exist in close proximity; in
other words, similar things are near each other. KNN captures the idea of
similarity (sometimes called distance, proximity, or closeness).
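The proximity idea translates directly into code: a query sample receives the majority label of its k closest training samples. The sketch below is a minimal version assuming Euclidean distance and non-negative integer labels; the toy points and the function name are illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]            # indices of the k closest samples
    votes = y_train[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])

# Two well-separated toy clusters: class 0 near the origin, class 1 near (5, 5).
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.2, 0.1], [5.5, 5.5]])

predicted = knn_predict(X_train, y_train, X_query, k=3)
```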
5.2 Ensembling Methods
5.2.1 XGBoost
XGBoost is a decision-tree-based ensemble machine learning algorithm that
uses a gradient boosting framework. In prediction problems involving
unstructured data (images, text, etc.), artificial neural networks tend to
outperform other algorithms or frameworks; here, XGBoost is applied to the
extracted feature vectors rather than to the raw images.
5.2.2 Bagging
Bootstrap aggregating, also called bagging, is a machine learning ensemble
meta-algorithm designed to improve the stability and accuracy of machine
learning algorithms used in statistical classification and regression. It
also reduces variance and helps to avoid overfitting.
5.2.3 AdaBoost
AdaBoost, short for Adaptive Boosting, is a boosting technique used as an
ensemble method in machine learning. It is called adaptive because the
weights are re-assigned to each instance after every round, with higher
weights given to incorrectly classified instances.
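One round of this re-weighting can be sketched as follows, using the standard discrete AdaBoost update for binary labels. The function name and the toy predictions are assumptions of this example; a full implementation would also fit a weak learner on the weighted data in each round and combine the learners using the returned coefficients.

```python
import numpy as np

def adaboost_reweight(weights, y_true, y_pred):
    """Return (new_weights, alpha) after one round of discrete AdaBoost."""
    miss = y_true != y_pred
    err = np.sum(weights[miss]) / np.sum(weights)      # weighted error of the weak learner
    err = np.clip(err, 1e-10, 1 - 1e-10)               # avoid log(0) and division by zero
    alpha = 0.5 * np.log((1.0 - err) / err)            # learner's say in the final vote
    # Misclassified samples are scaled up by e^alpha, correct ones down by e^-alpha.
    new_w = weights * np.exp(np.where(miss, alpha, -alpha))
    return new_w / new_w.sum(), alpha

y_true = np.array([1, 1, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0])                     # only the last sample is wrong
weights = np.full(5, 0.2)

new_weights, alpha = adaboost_reweight(weights, y_true, y_pred)
```

After the update, the misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.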
5.3 Deep Learning Techniques
5.3.1 CNN
Convolutional Neural Networks (CNNs) have proven to be very effective in
areas such as image recognition and classification. The advantage of using
a CNN is that it can automatically learn complex features using a large
number of simple neurons trained with backpropagation.
The CNN architecture that was implemented for classification is shown below:

CNN Architecture
5.3.2 Transfer Learning
Transfer learning is a technique in deep learning where a model trained on
one task is repurposed for another task. ResNet-101 was applied with two
strategies:
5.3.2.1 Strategy-1
Retrain only the last classifier layer of the pre-trained model and freeze all
other parameters.
5.3.2.2 Strategy-2
Retrain the last few layers of the model including the classifier layer.
6. Results and Analysis
6.1 Grid Search

The optimal parameters obtained after hyperparameter tuning with
GridSearchCV were:
Hyper-parameters obtained after tuning
6.2 ROC curves
ROC curves for each model after applying PCA
ROC curves for each model after applying LDA

ROC curves for each model after applying LDA over PCA
6.3 Classification Report
After applying PCA
After applying LDA

After applying LDA over PCA
6.4 Plots of Training and Validation Loss
CNN
ResNet-101: Strategy 1 (left) and Strategy 2 (right)
6.5 Confusion Matrix

CNN
ResNet-101: Strategy 1 (left) and Strategy 2 (right)
6.6 CAM
Class activation maps (CAMs) are a way to highlight the regions within an
image that a CNN uses to make a classification decision for that particular
image.
Class Activation Map for ResNet-101 — S2
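For a network that ends in global average pooling followed by a single fully connected layer, as ResNet does, a class activation map is the sum of the final convolutional feature maps weighted by the classifier weights of the chosen class. The sketch below illustrates that computation on toy arrays; the array shapes, names, and values are assumptions, and extracting the real feature maps and weights from a trained ResNet-101 is not shown.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """CAM(x, y) = sum_k fc_weights[class_idx, k] * feature_maps[k, x, y], rescaled to [0, 1].

    feature_maps: array of shape (K, H, W) from the last convolutional layer.
    fc_weights:   array of shape (num_classes, K) from the final classifier layer.
    """
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: two 2x2 feature maps, two classes.
feature_maps = np.array([[[1.0, 0.0], [0.0, 0.0]],     # map 0 fires at the top-left
                         [[0.0, 0.0], [0.0, 1.0]]])    # map 1 fires at the bottom-right
fc_weights = np.array([[1.0, 0.0],                     # class 0 relies on map 0
                       [0.0, 1.0]])                    # class 1 relies on map 1

cam_class0 = class_activation_map(feature_maps, fc_weights, class_idx=0)
cam_class1 = class_activation_map(feature_maps, fc_weights, class_idx=1)
```

In practice the low-resolution map is upsampled to the input image size and overlaid as a heat map.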
7. Conclusion
PCA gave better results than LDA and LDA over PCA. Combining features from
the various feature extraction techniques gave better accuracy than using
individual features. AdaBoost performed poorly for classification on this
dataset; boosting techniques usually perform better on high-variance
datasets, whereas our dataset has low variance. After plotting the ROC
curves for the traditional models, the best AUC-ROC score was found for
SVM. ResNet-101 S2 performed much better than ResNet-101 S1, which in turn
achieved higher accuracy than the CNN. Data augmentation and feature
extraction techniques helped the classifiers perform better.
