Data Analysis and Related Applications 1
Big Data, Artificial Intelligence and Data Analysis Set
coordinated by
Jacques Janssen
Volume 9
Edited by
Konstantinos N. Zafeiris
Christos H. Skiadas
Yiannis Dimotikalis
Alex Karagrigoriou
Christiana Karagrigoriou-Vonta
First published 2022 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted
under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or
transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the
case of reprographic reproduction in accordance with the terms and licenses issued by the
CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the
undermentioned address:
www.iste.co.uk www.wiley.com
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the
author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.
Preface
Konstantinos N. ZAFEIRIS, Yiannis DIMOTIKALIS, Christos H. SKIADAS, Alex KARAGRIGORIOU
and Christiana KARAGRIGORIOU-VONTA
Part 1
1.1. Introduction
1.2. Data understanding
1.3. Modeling
1.4. Findings
1.5. Conclusion
1.6. References
3.1. Introduction
3.2. Background
3.2.1. Blockchain
3.2.2. Blockchain types
3.2.3. Blockchain-based web applications
3.2.4. Blockchain consensus algorithms
3.2.5. Other consensus algorithms
3.3. Analysis stack
3.3.1. Art Shop web application
3.3.2. SQL-based application
3.3.3. NoSQL-based application
3.3.4. Blockchain-based application
3.4. Analysis
3.4.1. Adding records
3.4.2. Query
3.4.3. Functionality
3.4.4. Security
3.5. Conclusion
3.6. References
4.1. Introduction
4.2. Discrete-time model with reinsurance and bank loans
4.2.1. Model description
4.2.2. Optimization problem
4.2.3. Model stability
4.3. Continuous-time insurance model with dividends
4.3.1. Model description
4.3.2. Optimal barrier strategy
4.3.3. Special form of claim distribution
4.3.4. Numerical analysis
5.1. Introduction
5.2. Data
5.3. Methodology
5.3.1. Main limit results
5.3.2. Block maxima method
5.3.3. Largest order statistics method
5.3.4. Estimation of other tail parameters
5.4. Results and conclusion
5.5. Acknowledgements
5.6. References
6.1. Introduction
6.2. Nearest neighbor methods
6.2.1. Background of the NN methods
6.2.2. The k-nearest neighbors method
6.2.3. The fixed-radius NN method
6.2.4. The kernel-NN method
6.2.5. Algorithms of the three considered NN methods
6.2.6. Parameter and distance metric selection
6.3. Experimental results
6.3.1. Dataset description
6.3.2. Variable selection and data splitting
6.3.3. Results
6.3.4. A discussion and comparison of results
6.4. Conclusion
6.5. References
7.1. Introduction
7.2. Methods
7.2.1. Participants
7.2.2. Instrument
7.2.3. Statistical analyses
7.3. Results
7.3.1. EFA results
7.3.2. CFA results
7.3.3. Scale construction and assessment
7.4. Conclusion
7.5. Funding
7.6. References
Part 2
Chapter 12. Invariant Description for a Batch Version of the UCB Strategy with Unknown Control Horizon
Sergey GARBAR
Part 3
Part 4
Chapter 27. High Speed and Secured Network Connectivity for Higher Education Institutions Using Software Defined Networks
Lincoln S. PETER and Viranjay M. SRIVASTAVA
Index
The field of data analysis has grown enormously over recent decades due to the
rapid growth of the computer industry, the continuous development of innovative
algorithmic techniques and recent advances in statistical tools and methods. Due to
the wide applicability of data analysis, a collective work is always needed to bring
all recent developments in the field, from all areas of science and engineering, under
a single umbrella.
Part 1 focuses mainly on computational data analysis and related fields, with
nine chapters covering machine learning algorithms, web applications, spatial
analysis, multivariate regression, factor analysis, mixture models, non-parametric
techniques and tail distributions.
Part 2 focuses mainly on stochastic and algorithmic data analysis and related
fields, with nine chapters covering volatility, calibration, segmentation, Markov
chains, genetic algorithms, classification algorithms, batch processing, entropies and
pseudodistances.
Part 3 focuses mainly on applied statistical data analysis and related fields, with
five chapters covering spatial statistics, Monte Carlo methods, machine learning
methods, time series analysis and gas analysis.
Part 4 focuses mainly on economic and numerical data analysis and related
fields, with six chapters covering economic downturn, cyber systems, morbidity,
fixed-income market, Bayesian inference and reliability analysis.
Konstantinos N. ZAFEIRIS
Yiannis DIMOTIKALIS
Christos H. SKIADAS
Alex KARAGRIGORIOU
Christiana KARAGRIGORIOU-VONTA
April 2022
PART 1
Data Analysis and Related Applications 1, First Edition. Edited by Konstantinos N. Zafeiris,
Christos H. Skiadas, Yiannis Dimotikalis, Alex Karagrigoriou and Christiana Karagrigoriou-Vonta.
© ISTE Ltd 2022. Published by ISTE Ltd and John Wiley & Sons, Inc.
1
Performance of Evaluation of Diagnosis of Various Thyroid Diseases
Thyroid cancer is the second most prevalent cancer type among women in
Turkey. According to a report published by the American Cancer Society, an
estimated 44,280 people were diagnosed with thyroid cancer in the United States
in 2021. The risk of thyroid cancer can be reduced by early diagnosis and
treatment. This study focuses on predicting five different thyroid diseases based
on various symptoms and thyroid reports. Several machine learning
algorithms, namely support vector machine, k-nearest neighbors, artificial neural
network and decision tree, are used to diagnose various thyroid diseases, and
their classification performances are compared. For this purpose,
a thyroid disease dataset gathered from the Department of Nuclear Medicine and
Endocrinology in Istanbul University-Cerrahpaşa Faculty of Medicine was used.
1.1. Introduction
Chapter written by Burcu Bektas GÜNEŞ, Evren BURSUK and Rüya ŞAMLI.
There are four basic steps in the decision-making process providing diagnosis in
medicine. These are: cue acquisition, hypothesis generation, cue interpretation and
hypothesis evaluation. In modern times, the wide variety of diseases (differential
diagnosis), complicated disease states (the presence of more than one disease in the
same person), selectivity in perception, variety/size of medical data, insufficient
time allocated to the evaluation processes and the need for these processes to be
done in a limited time are all factors that may cause errors in the steps of this
decision-making process. Physical or emotional changes due to human nature such
as stress, fatigue, distraction, illness or inexperience can also increase the likelihood
of these diagnostic errors. With today’s technology, various computer-aided
systems are used to reduce these errors, and new systems are added
every day (Bursuk 1999; Nohria 2015). In addition, machine learning (ML), a
branch of artificial intelligence, is used in recently designed programs and in
an increasingly wide range of applications.
In this study, we explored the use of machine learning methodology for the
automatic classification of thyroid diseases using 10 attributes. We used a private
dataset containing the records of 130 patients from the Department of Nuclear
Medicine and Endocrinology in Istanbul University-Cerrahpaşa Faculty of
Medicine, Turkey (IUC). After the pre-processing stages, several ML algorithms
were adapted to and trained on the data. The results of this research indicated
that by using all the findings (physical examination, laboratory findings and
radiologic findings) together, various types of thyroid disease can be diagnosed and
that ML provides almost 100% correct answers.
1.2. Data understanding
This research was carried out using physical examination, laboratory findings
and radiologic findings, depicted in Table 1.1. Data were obtained from IUC after
the Ethical Committee’s approval.
This dataset contains five diseases: Plummer disease, toxic
multi-nodular goiter, Hashimoto’s disease, Graves’ disease and subacute thyroiditis.
For the multi-class classification task, the number of samples per target class is
seven for Plummer disease, 40 for toxic multi-nodular goiter, 32 for Hashimoto’s
disease, 48 for Graves’ disease and three for subacute thyroiditis, as shown in
Figure 1.1.
Figure 1.1. Class visualization for the whole dataset. For a color
version of this figure, see www.iste.co.uk/zafeiris/data1.zip
1.3. Modeling
For the five diseases, analyses were performed using four machine learning
methods: support vector machine (SVM), k-nearest neighbors (KNN), artificial
neural network (ANN) and decision tree (DT). Fivefold cross-validation was
used to evaluate the performance of each model on the dataset. In this method,
the dataset is divided into five equal parts; in each round, one part is held out
for testing and the other four are used as training data.
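The splitting scheme described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `k_fold_splits`, the random shuffling and the seed are hypothetical choices, since the chapter does not say how the 130 records were assigned to folds or whether the folds were stratified by disease.

```python
import random


def k_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Assumption: records are shuffled before being dealt into folds.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds folds of (almost) equal size.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k, test_fold in enumerate(folds):
        # Every fold except the k-th one is used for training.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, list(test_fold)


# 130 patients, as in the dataset described in the text.
splits = list(k_fold_splits(130))
for round_number, (train, test) in enumerate(splits, start=1):
    print(f"round {round_number}: {len(train)} training, {len(test)} test samples")
```

With 130 samples and five folds, each round trains on 104 samples and tests on 26. With classes as small as three samples, a stratified split would arguably be preferable, but that is a design choice the source does not discuss.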
The accuracy metric in equation [1.1], the precision metric in equation [1.2], the
recall metric in equation [1.3] and F-measure metric in equation [1.4] are widely
used for model performance. In this study, accuracy was selected as the model
performance evaluation metric.
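\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad \text{[1.1]} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \qquad \text{[1.2]} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \qquad \text{[1.3]} \]

\[ \text{F-measure} = 2 * \frac{\text{Precision} * \text{Recall}}{\text{Precision} + \text{Recall}} \qquad \text{[1.4]} \]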
True positive (TP) is the number of samples whose true label is positive and
that the classifier also predicts as positive. True negative (TN) is the number of
samples whose true label is negative and that the classifier also predicts as
negative. False positive (FP) is the number of samples whose true label is negative
but that the classifier incorrectly predicts as positive. False negative (FN) is the
number of samples whose true label is positive but that the classifier
incorrectly predicts as negative (Bulut et al. 2020).
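As a small worked example of the four metrics, the sketch below computes accuracy, precision, recall and F-measure from TP, TN, FP and FN counts. The function name `classification_metrics` and the counts are hypothetical and are not taken from the study; returning 0.0 when a denominator is zero is also an assumption made for the sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F-measure from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (assumption: report 0.0 in that case).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a multi-class problem such as the one studied here, precision, recall and F-measure are usually computed per class by treating that class as positive and all others as negative.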
SVM is a supervised machine learning algorithm that can be used for both
classification and regression, and is most often applied to classification
problems. Each data item is plotted as a point in n-dimensional space, with the value
of each feature being the value of a particular coordinate. Classification then
takes place by finding the hyperplane that best separates the classes (Razia
et al. 2018; Raisinghani et al. 2019; Dharmarajan et al. 2020).
leaves represent a class. The DT algorithm commonly uses the Gini index,
information gain, chi-square and reduction in variance to make a strategic split
(Raisinghani et al. 2019; Chaubey et al. 2021). In this study, the J48 decision tree
algorithm was used.
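To make two of the split criteria named above concrete, the following sketch computes the Gini index and the information gain of a candidate split. It is a generic illustration, not the J48 implementation used in the study; the function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent node minus the weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy node with two equally frequent classes, split into two pure children.
parent = ["Graves"] * 4 + ["Hashimoto"] * 4
left, right = parent[:4], parent[4:]

print("Gini index of parent:", gini_index(parent))
print("Gini index of left child:", gini_index(left))
print("Information gain of the split:", information_gain(parent, [left, right]))
```

A split that separates the classes perfectly drives the impurity of each child to zero, so its information gain equals the entropy of the parent node.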
1.4. Findings
The performance of the models is assessed using the accuracy metric. The results
are shown in Table 1.2 and Figure 1.2. The SVM algorithm achieved 100%
accuracy. Figure 1.2 compares the accuracy of the ML algorithms
with each other.
[Figure 1.2: accuracy (vertical axis, 0.94 to 1.02) of SVM, ANN, KNN and Decision Tree]
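Predicted label (columns) against true label (rows)

True label                   Graves’   Hashimoto’s   Subacute      Toxic multi-     Plummer
                             disease   disease       thyroiditis   nodular goiter   disease
Graves’ disease              48        0             0             0                0
Hashimoto’s disease          0         32            0             0                0
Subacute thyroiditis         0         0             3             0                0
Toxic multi-nodular goiter   0         0             0             40               0
Plummer disease              0         0             0             0                7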
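Predicted label (columns) against true label (rows)

True label                   Graves’   Hashimoto’s   Subacute      Toxic multi-     Plummer
                             disease   disease       thyroiditis   nodular goiter   disease
Graves’ disease              48        0             0             0                0
Hashimoto’s disease          0         32            0             0                0
Subacute thyroiditis         0         0             2             0                1
Toxic multi-nodular goiter   0         0             0             40               0
Plummer disease              0         0             0             0                7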
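Predicted label (columns) against true label (rows)

True label                   Graves’   Hashimoto’s   Subacute      Toxic multi-     Plummer
                             disease   disease       thyroiditis   nodular goiter   disease
Graves’ disease              48        0             0             0                0
Hashimoto’s disease          0         32            0             0                0
Subacute thyroiditis         3         0             0             0                0
Toxic multi-nodular goiter   0         0             0             40               0
Plummer disease              0         0             0             0                7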
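Predicted label (columns) against true label (rows)

True label                   Graves’   Hashimoto’s   Subacute      Toxic multi-     Plummer
                             disease   disease       thyroiditis   nodular goiter   disease
Graves’ disease              48        0             0             0                0
Hashimoto’s disease          0         32            0             0                0
Subacute thyroiditis         0         0             2             0                1
Toxic multi-nodular goiter   0         0             0             40               0
Plummer disease              0         0             0             0                7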
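As an illustration of how such a confusion matrix translates into the reported metrics, the sketch below recomputes accuracy, per-class recall and per-class precision for the matrix above in which one subacute thyroiditis case is predicted as Plummer disease. The class order is an assumption inferred from the class sizes given earlier in the chapter (48, 32, 3, 40 and 7), and the helper names are hypothetical.

```python
# Assumed class order, inferred from the class sizes stated in the chapter.
CLASSES = [
    "Graves' disease",
    "Hashimoto's disease",
    "Subacute thyroiditis",
    "Toxic multi-nodular goiter",
    "Plummer disease",
]

# Rows are true labels, columns are predicted labels.
CONFUSION = [
    [48, 0, 0, 0, 0],
    [0, 32, 0, 0, 0],
    [0, 0, 2, 0, 1],
    [0, 0, 0, 40, 0],
    [0, 0, 0, 0, 7],
]


def accuracy(matrix):
    """Share of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def per_class_recall(matrix):
    """Diagonal entry divided by the row total (all samples of that true class)."""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]


def per_class_precision(matrix):
    """Diagonal entry divided by the column total; None if a class is never predicted."""
    result = []
    for j in range(len(matrix)):
        column_total = sum(row[j] for row in matrix)
        result.append(matrix[j][j] / column_total if column_total else None)
    return result


print(f"accuracy: {accuracy(CONFUSION):.4f}")
for name, rec, prec in zip(CLASSES, per_class_recall(CONFUSION),
                           per_class_precision(CONFUSION)):
    print(f"{name}: recall={rec:.3f}, precision={prec:.3f}")
```

A single misclassified sample out of 130 barely moves overall accuracy, yet it lowers the recall of the three-sample subacute thyroiditis class to two thirds. This is one reason per-class metrics are informative on a dataset this imbalanced.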
1.5. Conclusion
In this study, we explored the use of machine learning methodologies for the
automatic classification of thyroid diseases using 10 attributes. We used a private
dataset containing the records of 130 patients from IUC. After the pre-processing
stages, several ML algorithms were adapted to and trained on the data. The
results of this research indicated that by using all the findings (physical examination,
laboratory findings and radiologic findings) together, various types of thyroid
disease can be diagnosed and that ML provides almost 100% correct answers. The
classes in the IUC dataset were well separated by disease label, which explains
the very high performance of the ML algorithms.
Overfitting was not observed. This system can be developed further by using a larger and
more balanced dataset. Further development could also apply image processing
to thyroid ultrasound scans to predict thyroid nodules, which cannot be
recognized in laboratory findings.
1.6. References
Bulut, B., Kalın, V., Güneş, B.B., Khazhin, R. (2020). Deep learning approach for detection
of retinal abnormalities based on color fundus images. 2020 Innovations in Intelligent
Systems and Applications Conference, 1–6, Istanbul, 15–17 October 2020.
Bursuk, E. (1999). A diagnostic expert system for cardiological, respiratory, vascular and
hematological diseases. Master’s thesis, Institute of Biomedical Engineering, Bosphorus
University, Istanbul.
Chaubey, G., Bisen, D., Arjaria, S., Yadav, V. (2021). Thyroid disease prediction using
machine learning approaches. Natl. Acad. Sci. Lett., 44(3), 233–238.
Dharmarajan, K., Balasree, K., Arunachalam, A.S., Abirmai, K. (2020). Thyroid disease
classification using decision tree and SVM. Indian J. Public Health Res. Dev., 11, 229.
Godara, S. and Kumar, S. (2018). Prediction of thyroid disease using machine learning
techniques. International Journal of Electronics Engineering, 10(2), 787–793.
Hameed, M.A. (2017). Artificial neural network system for thyroid diagnosis. Eng. Sci.,
11(25), 518–528.
Haykin, S.S. (2009). Neural Networks and Learning Machines, 3rd edition.
Prentice Hall, New York.
Nohria, R. (2015). Medical expert system – A comprehensive review. Int. J. Comput. Appl.,
130(7), 44–50.
Raisinghani, S., Shamdasani, R., Motwani, M., Bahreja, A., Raghavan Nair Lalitha, P. (2019).
Thyroid prediction using machine learning techniques. In ICACDS 2019: Advances in
Computing and Data Sciences, Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T.,
Kashyap, R. (eds). Springer, Singapore.
Razia, S., Swathi Prathyusha, P., Krishna, N.V., Sumana, N. (2018). A comparative study of
machine learning algorithms on thyroid disease prediction. International Journal of
Engineering & Technology, 7(2.8), 315–319.
Reza Obeidavi, M., Rafiee, A., Mahdiyar, O. (2017). Diagnosing thyroid disease by neural
networks. Biomed. Pharmacol. J., 10(2), 509–524.
Wang, Y., Yue, W., Li, X., Liu, S., Guo, L., Xu, H., Zhang, H., Yang, G. (2020). Comparison
study of radiomics and deep learning-based methods for thyroid nodules classification
using ultrasound images. IEEE Access, 8, 52010–52017.
2
Exploring Chronic Diseases’ Spatial Patterns
Spatial analyses of infectious diseases have a long tradition, and with the
contemporary increasing incidences of chronic and degenerative diseases, consistent
interest has emerged regarding the geography of these types of non-infectious
pathologies and their environmental correlations. In this work, we explore spatial
variations in the prevalence of thyroid cancer, taking into account the demographic
heterogeneity in the at-risk population at the small-area level.
This work aims to enhance the existing research surrounding thyroid cancer incidence in
volcanic areas by analyzing spatial patterns of thyroid cancer cases in Mount Etna’s
area, in the eastern part of Sicily. It is known from the medical literature that several
constituents of volcanic lava and ashes, such as radioactive and heavy metals, are
involved in the pathogenesis of thyroid cancer via the biocontamination of
atmosphere, soil and aquifers. Here, we exploit a unique dataset that allowed us to
geocode the geographic location of cases at the household level, whereas all studies
that we are aware of use aggregated data. Applying the local Moran’s I statistic as a
means for detecting spatial clustering, we aimed to disentangle the spatial
aggregation of thyroid cancer cases due to the proximity to a volcanic area from that
due to the geographic variations in the density of the population at risk and other
concomitant environmental risk factors.
Our preliminary findings seem to confirm a vast empirical literature that has
revealed an increased thyroid cancer incidence in volcanic areas, such as Iceland,
Hawaii and the Philippines, where intense basaltic volcanic activity has long
been observed; furthermore, parts of the Etna volcanic area seem to be more affected
than others.
2.1. Introduction
At the end of the 18th century, Dr. Valentine Seaman mapped yellow fever cases
in New York and thus succeeded in highlighting a possible correlation between the
sites of various dumps and the location of the cases (Stevenson 1965). About
60 years later, John Snow mapped the cholera cases that were plaguing Soho
(London) at the time and realized that the epidemic originated from a specific
public fountain. By closing the fountain, he
managed to stop the infection (Snow 1855; Walter 2000). These are just two of the
first attempts to use cartography as a tool to provide epidemiological information.
From that time on, geographic maps have increasingly been adopted as a traditional
tool to visualize the spatial distribution of diseases in the field of health. In general,
considerable effort has been devoted to the development of geographic information
systems (GIS) that facilitate the understanding of public health problems and foster
collaboration between physicians, epidemiologists and geographers to map and
predict disease risk (Croner et al. 1996). As a result of the epidemiological
transition, the long tradition of using geographic techniques to analyze
infectious diseases has encouraged their application to the geographic distribution of
chronic diseases such as cancer and various types of heart disease (Ghosh et al.
1999; Wakefield 2007). Many environmental risk factors are included among
the possible concurrent causes of non-infectious pathologies, and geographical
representations constitute a valid tool for conducting exploratory analyses on the
spatial distribution of cases. In particular, May (1950) emphasized how a disease is
the product of the interaction between pathological factors (such as vectors and
genetic causes) and geographical factors acting on a physical, biological and social
level.
To date, many epidemiological studies suggest that the etiology of thyroid cancer
(TC) includes the presence of an active volcano among several factors such as the
technological improvement of screening systems, iodine consumption and others
(Marcello et al. 2014; Vigneri et al. 2015). TC is the most widespread endocrine
neoplasm, whose incidence has grown steadily around the world in recent decades
(Curado et al. 2007; Kilfoy et al. 2009; Fitzmaurice et al. 2015; Liu et al. 2017).
local Moran’s I index. The local Moran’s I statistic is able to detect the presence of
spatial autocorrelation at the level of sub-areas, which may not emerge at the global
level. Although TC case maps and cluster analysis cannot prove the causal
mechanisms underlying the investigated phenomenon, we rely on these
methodologies to provide further evidence regarding the volcano–TC relationship
and to support decision-making in the public health sector. Our results show
areas of greater risk that suggest a possible effect of proximity to
Mount Etna and also to Mount Vulcano, although the latter is less active
than the former. Nevertheless, given the exploratory
nature of our work, a more in-depth study is required to gain a greater
understanding of the phenomenon.
This work is organized as follows: the second section describes the available
data and the salient features of the area under analysis; the third section reports the
methodology applied, with particular mention of SIR and local Moran’s I index; the
fourth section illustrates and discusses the distribution of TC in the eastern part of
Sicily and shows the presence of clusters of high- and low-risk areas; and the fifth
and last section summarizes and concludes the work.
TC is the most widespread endocrine neoplasm in the world and has been
increasing steadily in recent decades (Curado et al. 2007; Kilfoy et al. 2009;
Fitzmaurice et al. 2015). Incidence rates significantly higher than the national
averages were recorded in various volcanic areas such as the area that we consider in
this work, eastern Sicily. This area includes four provinces: Messina, Catania, Enna
and Siracusa. The volcanic area around Mount Etna, the highest active
volcano in Europe, is located in the province of Catania but also extends into
the southern part of the province of Messina. Pellegriti et al. (2009) report a
considerably higher incidence rate of TC than the Italian average,
especially in the province of Catania. The Sicilian TC incidence figures are made
public in the Health Atlas of Sicily, published by the Department for Health
Activities and Epidemiological Observatory (Regional Health Department 2016).
Table 2.1 shows the TC incidence rate for the provinces of eastern Sicily (calculated
for the period 2003–2011 and standardized on the new European population, per
100,000 inhabitants), as disclosed in the Health Atlas. The rate is always higher for
women than for men, as known in the literature, and higher than the regional value in
the provinces of Catania and Messina, for both sexes.
Several studies have revealed, over time, the presence of high levels of heavy
metals in the volcanic area, as a result of the continuous emissions of gas (mainly
CO2 and SO2), ash and lava by Mount Etna (Buat-Ménard
and Arnold 1978; Cimino and Ziino 1983; Caltabiano et al. 2004; Andronico et al.
2009; D’Aleo et al. 2016). Such heavy metals include among others arsenic,
cadmium, chromium, cobalt, mercury, tungsten and zinc which, in high
concentrations, could contaminate soil, water and the atmosphere, eventually
entering the food chain (Vigneri et al. 2017). These works indicate that the presence
of an active volcano could contaminate the surrounding area through the repeated
emissions leading to potential repercussions for human health.
The territory of these provinces is heterogeneous and includes the volcanic area
as well as urban, rural and industrial regions (Istat 2013). As a result, the resident
population and the cases of TC are distributed in a non-homogeneous way according
to the characteristics of the urban morphology and of the natural environment
(Figure 2.1).
2.3. Methodology
neighboring areas, such as the presence of volcanic areas adjacent to coastal and
plain areas. Therefore, the expected risk of cancer will be higher where the
population at risk is large and the environmental risk factors are nearby. Conversely, the
risk will be relatively lower in sparsely populated areas or where the natural causes
of the risk are absent.
In this case, it is possible that neighboring areas with similar population density,
or with the same presence (or absence) of other risk factors, give rise to actual clusters of high,
medium and low risk of TC. The analysis of the similarity of the attributes of nearby
geographic areas is generally part of the study of spatial autocorrelation, which
evaluates the spatial distribution of a particular process in terms of relationships,
mutual influences and distance (Cressie 1991; Anselin and Rey 2010; Borruso and
Murgante 2012).
The risk of TC was represented through the production of maps showing the
spatial distribution, for each census tract, of the standardized incidence ratio (SIR).
The SIRs were calculated for each inhabited census tract by indirect standardization
(Waller and Gotway 2004, pp. 12–15), using the incidence rate of TC observed in
the same period (2003–2016) in the whole of eastern Sicily. SIR is the ratio between
observed TC cases and expected TC cases in each census tract i
where Oi is the number of cases observed for census tract i and Ei is the number of
cases expected in the same census tract i. The number of expected cases is calculated
as the product of the population at risk (and therefore the entire resident population)
in the given census tract i and the general incidence rate for the entire investigated
area:
$$E_i = P_i \, r_+,$$
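where Pi is the population at risk in the specific census tract i and r+ is the general
incidence rate of TC, calculated for the four provinces of interest as a whole, as
the ratio of all observed cases to the total resident population:
$$r_+ = \frac{\sum_i O_i}{\sum_i P_i}.$$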
The SIR index suffers from limits in terms of variability: sparsely populated
areas are likely to yield a markedly high index, suggesting a
spurious increase in the risk of TC. Furthermore, by construction, the standard
error of SIR tends to be large for sparsely populated areas and small for densely
populated ones. As a result, the confidence intervals of SIR will attribute
significance mostly to the highly populated areas (Haining 2003). On the whole,
areas with low population density often result in extreme values of SIR while highly
populated areas are mostly associated with SIR significantly different from 1. To
overcome these issues and contain the variability in the spatial distribution of the
population, we will consider only the census tracts with more than 30 residents for
the calculation of SIR. By contrast, when computing the expected global
number of cases for each stratum, rj, we will consider the totality of TC cases and
the resident population.
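The indirect standardization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the toy counts and the Poisson-based standard error under the null hypothesis (1/sqrt(Ei)) are not taken from the study; only the ratio Oi/Ei, the product Pi r+ and the threshold of 30 residents come from the text.

```python
import math


def standardized_incidence_ratios(observed, population, min_residents=30):
    """Return (r_plus, results); results maps tract index -> (E_i, SIR_i, se_null)."""
    if len(observed) != len(population):
        raise ValueError("observed and population must have the same length")
    # The general rate uses ALL cases and ALL residents, small tracts included.
    r_plus = sum(observed) / sum(population)
    results = {}
    for i, (o_i, p_i) in enumerate(zip(observed, population)):
        if p_i <= min_residents:
            continue  # SIR is computed only for tracts with more than 30 residents
        e_i = p_i * r_plus          # expected cases in tract i
        sir_i = o_i / e_i           # observed / expected
        # Assumption: O_i ~ Poisson(E_i) under the null, so Var(SIR_i) = 1 / E_i.
        se_null = 1.0 / math.sqrt(e_i)
        results[i] = (e_i, sir_i, se_null)
    return r_plus, results


# Hypothetical counts for four census tracts (not real data).
observed = [0, 3, 12, 1]
population = [25, 1500, 8000, 475]
r_plus, results = standardized_incidence_ratios(observed, population)
for i, (e_i, sir_i, se_null) in results.items():
    print(f"tract {i}: E={e_i:.2f} SIR={sir_i:.3f} se_null={se_null:.3f}")
```

In this toy example the smallest retained tract has the largest standard error, which is the variability problem discussed above: a single case is enough to push its SIR above 1.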
The local Moran’s I indicator belongs to the so-called LISA (Local Indicators of
Spatial Association) or local indicators of spatial autocorrelation proposed by
Anselin (1995). It is calculated with the following formula:
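$$I_i = \frac{x_i - \bar{x}}{m_2} \sum_{j} w_{ij} (x_j - \bar{x}), \qquad m_2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \bar{x})^2,$$
where $x_i$ is the value of the variable under study in region i, $\bar{x}$ is its mean over the n regions and $w_{ij}$ are the spatial weights linking region i to its neighbors j.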
Positive and high values of the local Moran’s I index indicate that a given region
is surrounded by neighboring regions with similar high (or low) values of the
variable under study. In this case, the spatial groups detected are defined as
“high–high” (region with a high value surrounded by regions with high values) or
“low–low” (region with low value surrounded by regions with low values). In terms
of cancer risk, a “high–high” cluster would indicate a high-risk area, while a
“low–low” cluster would denote a low-risk area. Negative values of the local
Moran’s I reveal that the region under examination is a spatial outlier. A spatial
outlier is an area that has a markedly different value from that of its neighbors
(Cerioli and Riani 1999). Spatial outliers are divided into “high–low” (high value
surrounded by neighbors with low values) and “low–high” (low value surrounded by
neighbors with high values).
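As a minimal sketch of how the local statistic and the four cluster types can be obtained, the Python function below assumes the common form of Anselin's statistic, in which the deviation of a region from the mean is multiplied by the weighted sum of its neighbors' deviations and divided by the second moment of the variable. The function name, the toy values and the chain-shaped neighborhood with row-standardized weights are illustrative assumptions, not the data or software used in the study.

```python
def local_morans_i(values, weights):
    """Return a list of (I_i, label) pairs, one per region.

    values  : list of floats, one per region
    weights : dict mapping region i -> {neighbor j: w_ij}
              (assumed row-standardized, i.e. each row sums to 1)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # deviations from the mean
    m2 = sum(d * d for d in z) / n            # second moment of the variable
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    results = []
    for i in range(n):
        # Spatial lag: weighted sum of the neighbors' deviations.
        lag = sum(w_ij * z[j] for j, w_ij in weights[i].items())
        i_local = z[i] / m2 * lag
        if z[i] > 0 and lag > 0:
            label = "high-high"
        elif z[i] < 0 and lag < 0:
            label = "low-low"
        elif z[i] > 0 and lag < 0:
            label = "high-low"
        elif z[i] < 0 and lag > 0:
            label = "low-high"
        else:
            label = "undefined"               # region or lag exactly at the mean
        results.append((i_local, label))
    return results


# Hypothetical values for five regions arranged along a line (0-1-2-3-4).
values = [9.0, 8.0, 1.0, 2.0, 10.0]
weights = {
    0: {1: 1.0},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
    4: {3: 1.0},
}
results = local_morans_i(values, weights)
for i, (i_local, label) in enumerate(results):
    print(f"region {i}: I={i_local:+.3f} {label}")
```

The sign of the statistic alone separates clusters (positive) from spatial outliers (negative); the label additionally needs the sign of the region's own deviation, which is why both are returned.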
The local Moran’s I can be standardized so that its significance can be tested
under normal distribution assumption. However, its distribution under the null
In eastern Sicily from 2003 to 2016, 7,182 individuals were affected by TC. The
etiology of this tumor is complex and varied: as already mentioned, it may involve
genetic factors, prevention-related factors, dietary causes and others. In the case of
Sicily, the distribution of TC cases could also be conditioned by two geographical
components:
– the spatial arrangement of the resident population, with particular reference to
the female part, which is known to be the most affected by TC (Parkin et al. 2005).
Where the population is more concentrated or where the female population is
predominant, it will be more likely to record a high incidence of TC;
– the presence of environmental factors such as the volcanic nature of the
territory. The fumes emitted by an active volcano, such as Mount Etna, are able to
transport heavy metals and radioactive substances capable of contaminating the air,
water and soil of the surrounding areas (Fiore et al. 2019).
Figures 2.2(a) and 2.2(b) show, respectively, SIR by census section and the
relative confidence intervals. From the SIR representation alone (Figure 2.2(a)),
different risk areas emerge, namely those with an SIR value greater than 1. These
areas are located around Mount Etna as well as in the non-volcanic
provinces, especially in those of Enna and Messina. The consideration of the
confidence intervals for SIR (Figure 2.2(b)) instead highlights the area south-east of
Mount Etna and different sections belonging mainly to the Messina province.
In both maps, it is evident that, while in the non-volcanic provinces the census sections
with SIR greater than 1 are randomly scattered across the territory, in the province of
Catania the risk sections are concentrated in an area close to Mount Etna, leaving
the rest of the province almost free. Furthermore, the location of the risk areas along
the NW–SE axis could suggest that persistent winds blowing toward the SE carry
the toxic substances emitted by the volcano, thereby polluting the atmosphere of
the territories positioned along this corridor, as highlighted in Boffetta et al. (2020).
It is also interesting to note that the census sections on the island of Lipari show a
high and significant SIR. Indeed, this area is also volcanic and is located in
the immediate vicinity of Mount Vulcano, an active volcano with only modest
activity compared to that of Mount Etna. The island of Vulcano is home to
numerous sulfurous fumaroles as well as a field of frequent submarine volcanic CO2
emissions, whose spatial distribution follows the direction given by persistent winds
blowing from the NW (Vizzini et al. 2020). Moreover, Vizzini et al. (2013) stated
that the area experiences “low”-level contamination due to elements such as Ba, Fe,
As and Cd. Overall, the significance of SIR in Lipari seems to further corroborate
the idea that a volcano can influence the incidence of TC nearby.
Figure 2.3(a) shows the local Moran’s I statistic, while Figure 2.3(b) shows the
pseudo p-values obtained from the conditional permutation procedure. Low-risk
census sections surrounded by low-risk census sections are represented in bright
yellow; high-risk sections with high-risk neighbors are in brown; low-risk
sections surrounded by neighboring high-risk sections are colored light orange and
high-risk ones with a low-risk neighborhood appear in dark orange. Figure 2.3(a)
shows a variation in the risk between the northeast and the southwest: southern and
western internal areas do not host high-risk clusters, while the eastern and northern
ones present different high-risk clusters. In particular, there are extensive low-risk
clusters along the eastern coast of Messina and Syracuse, whereas high-risk groups
emerge in the area SSE of Mount Etna, in the Aeolian Islands to the north and on the
northern coast near Barcellona Pozzo di Gotto. Figure 2.3(b) illustrates that the
sections constituting the high- and low-risk clusters are significant at a level of
at most α = 0.05. Finally, it should be noted that most of the considered sections
did not show a statistically significant risk, as can be seen from the large gray areas
present in both maps.
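The pseudo p-values mentioned above come from a conditional permutation procedure, which can be sketched as follows. The sketch rests on several assumptions that the text does not confirm: the value of region i is held fixed while the values of the other regions are randomly reassigned to its neighbors, 999 permutations are drawn, and the pseudo p-value is the folded one-sided quantity (M + 1)/(R + 1), where M counts the permuted statistics at least as extreme as the observed one. The function name and the toy data are hypothetical.

```python
import random


def conditional_permutation_pvalues(values, weights, n_perm=999, seed=0):
    """Pseudo p-values of the local Moran's I by conditional permutation.

    values  : list of floats, one per region
    weights : dict mapping region i -> {neighbor j: w_ij}
    """
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    p_values = []
    for i in range(n):
        neighbors = list(weights[i].keys())
        w = [weights[i][j] for j in neighbors]
        observed = z[i] / m2 * sum(w_ij * z[j] for w_ij, j in zip(w, neighbors))
        # Region i keeps its own value; only the other n - 1 values are shuffled.
        others = [z[j] for j in range(n) if j != i]
        larger = 0
        for _ in range(n_perm):
            draw = rng.sample(others, len(neighbors))
            simulated = z[i] / m2 * sum(w_ij * z_j for w_ij, z_j in zip(w, draw))
            if simulated >= observed:
                larger += 1
        # Fold the count so that both tails are treated symmetrically.
        larger = min(larger, n_perm - larger)
        p_values.append((larger + 1) / (n_perm + 1))
    return p_values


# Hypothetical values for six regions along a line: three high, then three low.
values = [9.0, 8.0, 9.5, 1.0, 2.0, 1.5]
weights = {
    0: {1: 1.0},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
    4: {3: 0.5, 5: 0.5},
    5: {4: 1.0},
}
p_values = conditional_permutation_pvalues(values, weights)
print([round(p, 3) for p in p_values])
```

With R permutations the smallest attainable pseudo p-value is 1/(R + 1), so the number of permutations bounds the significance levels that a map of this kind can display.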
The cluster analysis could confirm the hypothesis according to which persistent
winds in the SE direction would push the radioactive substances emitted by the
volcano towards areas that report a high risk. A similar suggestion seems to apply to
the Aeolian Islands and the sections near Barcellona Pozzo di Gotto.
2.5. Conclusion
146. Distances.
1. The distances between the several bodies in which troops are
distributed for attack depend upon the nature of the ground, and the
weapons of the enemy, and they must be fixed by the officers in
immediate command.
2. The scouts should be sufficiently far in advance of, and on the
exposed flanks of, the firing-line, to protect it from surprise. In close
or undulating country it will be necessary to provide for connecting
links in order that there may be no danger of touch with the
advanced scouts being lost, and of reports, verbal or by signal,
failing to reach the commanders of the firing-line. In wooded country
the distance may be decreased.
3. In close country, and in wood-fighting, the distances between
the several bodies into which an attacking force is divided should
seldom exceed 200 yards. In open ground greater distances are
necessary, except against a badly-armed enemy.
4. The distance of the general reserve should be usually greater
than that between the other bodies in order that it may not be
prematurely drawn into the fight.
5. The general rule is that the troops in rear should be brought
closer to the firing-line, the nearer the moment for the assault
approaches.
147. Intervals.
An arbitrary rule as regards intervals is undesirable. Each portion
of the force engaged will generally be told off to attack a particular
section of the enemy’s line, and the frontage to be occupied by each
left to the discretion of their commanding officers. It is essential that
there should be a clear understanding as to responsibility for
searching, and, if necessary, clearing, all dangerous ground which
lies between units. This should be notified in the orders for attack.
3. When two or more officers are present with a company, one will
always be with the firing line.
4. Half-company commanders in the firing line will place
themselves where they can best supervise the skirmishers. Their
duties in action are as follows:—
(i) They must be constantly on the look out for the signals of the
company commander, and of the scouts.
(ii) They must maintain the direction.
(iii) They will see that fire is not wasted, and that it is concentrated on
important targets.
(iv) They will observe the enemy’s movements, and report at once to
the company commander.
(v) If the assault succeeds, they will lose no time in rallying and
reforming their half-companies.
(vi) During the advance they will take all leaderless men of other
companies and corps under their command, and keep them until
the action is over, or the force re-forms.
5. The frontage occupied by a company acting independently
depends on the nature of the operation. There may be a
considerable gap between the frontal and the flank attacks; and a
portion of the company, extended at wide intervals, may be told off
merely to hold the enemy, while the remainder, at closer intervals,
make the decisive attack.
The rule that a strong firing-line should be established in a good
fire-position at a decisive range must always be observed by the
portion of the company which is told off for the decisive attack; and
although the men need not be so close as in the case of larger
forces, still, to dislodge an enemy of nearly equal strength, the firing-
line, at decisive range, should not be weaker than one rifle to every
two or three yards of front.
6. When the company is acting in concert with the remainder of
the battalion, its frontage, as a rule, will be assigned by the battalion
commander.
7. The company commander must always be guided by
circumstances in deciding on the strength of his firing-line, and on
the formation of the remainder. The general procedure will be to
gradually reinforce the scouts, when they are checked by the
enemy’s fire, and thus build up a firing line, which, at decisive range,
shall be strong enough to gain superiority over the enemy’s fire. This
procedure is, however, by no means to be regarded as invariable. It
might be desirable, for instance, to deploy the whole company at
once in the firing line.
S. 153 (3). This may sometimes be advisable on open ground
without cover, when less loss would be incurred than by gradually
reinforcing a weaker firing-line.
8. In order that tactical unity may be maintained as long as
possible, it will usually be advisable that complete squads or
sections be extended on the first advance, further reinforcements
being furnished by the other squads of the same sections, or other
sections of the same half company.
157. Instruction.
It is always advisable, in instructing a battalion, to hand over the
entire control of the companies in firing-line or reserve, with the
exception of the portion retained at the disposal of the officer
commanding, to their own leaders, and to give each of the latter a
free hand in carrying out the task assigned to him. Such a method,
with inexperienced company officers, may at first lead to mistakes
and misunderstandings; but as soon as these officers gain
confidence, become accustomed to working in concert, and
understand what is required of them, energetic combination will take
the place of hesitation and bewilderment, and the officer
commanding will find himself supported by a body of zealous and
self-reliant assistants, capable of executing his intentions without
depending on continual instructions.
Moreover, the practice of carrying out an attack by the co-
operation of several independent units is the only method possible in
a hotly contested action.
It must be made clear whether the battalion is supposed to be
acting alone or in conjunction with other troops.
THE DEFENCE.
160. Distribution of Infantry for defence.
1. Infantry detailed for the defence of the entrenchments will
generally be distributed in two bodies, viz.,
(i) Firing Line and Supports.
(ii) Local Reserves.
For the decisive counter-attack, a separate body, The General
Reserve, which has nothing to do with the immediate defence of the
entrenchments, will be retained in the hands of the officer
commanding.
2. The strength of the firing line will depend entirely on the extent
of the field of fire and the character of the cover. If the conditions are
favourable to the defence a few men can easily protect a wide front.
If there is any chance of a surprise, or of the position being attacked
by a sudden rush, the firing line should be as dense as is compatible
with the free use of the rifle by every man engaged.
3. The duty of the supports is to replace casualties in the firing
line, and they should therefore be posted near at hand and under
cover. In strong positions very small supports will be quite sufficient,
or they may even be dispensed with altogether.
4. The duties of the local reserves are to deliver local counter-
attacks, to reinforce the firing line at critical moments, and to protect
the flanks; they will also furnish the outposts and supply
detachments to occupy temporary positions, either in front or beyond
the flanks of the entrenchments. S. 161 (7), also “Combined
Training,” 125 (4). Local reserves should be well covered, especially
from artillery fire; but there should be no obstacle to their being
brought rapidly to the front.
163. Fire.
1. As the difficulties of ammunition supply and want of knowledge
of ranges are not so great as in the attack, it will often be expedient
to open fire at long ranges in order to oblige the assailant to deploy
and adopt a definite course of action which it will be difficult for him
to rectify when exposed to fire.
Long-range fire may also be used to deceive the enemy as to the
dispositions and strength of the defender, and to check the advance
of reinforcements.
The employment of long-range fire must, however, be regulated
by the effect produced on the enemy. If this is observed to be small,
it will be wiser to reserve ammunition for closer ranges where better
results may be expected, and on occasion it may be advisable to
encourage the enemy’s advance by a weak fire or by withholding it
entirely, and to receive him at decisive ranges with a fire of the
greatest intensity possible.