Data Analysis and Related Applications, Volume 1: Computational, Algorithmic and Applied Economic Data Analysis
1st Edition, Konstantinos N. Zafeiris
Data Analysis and Related Applications 1
Big Data, Artificial Intelligence and Data Analysis Set
coordinated by
Jacques Janssen

Volume 9

Data Analysis and Related Applications 1

Computational, Algorithmic and Applied Economic Data Analysis

Edited by
Konstantinos N. Zafeiris
Christos H. Skiadas
Yiannis Dimotikalis
Alex Karagrigoriou
Christiana Karagrigoriou-Vonta
First published 2022 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted
under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or
transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the
case of reprographic reproduction in accordance with the terms and licenses issued by the
CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the
undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com

© ISTE Ltd 2022


The rights of Konstantinos N. Zafeiris, Christos H. Skiadas, Yiannis Dimotikalis, Alex Karagrigoriou and
Christiana Karagrigoriou-Vonta to be identified as the authors of this work have been asserted by them in
accordance with the Copyright, Designs and Patents Act 1988.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the
author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.

Library of Congress Control Number: 2022935196

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78630-771-2
Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Konstantinos N. ZAFEIRIS, Yiannis DIMOTIKALIS, Christos H. SKIADAS, Alex KARAGRIGORIOU
and Christiana KARAGRIGORIOU-VONTA

Part 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Chapter 1. Performance of Evaluation of Diagnosis of Various Thyroid
Diseases Using Machine Learning Techniques . . . . . . . . . . . . . . . . . . 3
Burcu Bektas GÜNEŞ, Evren BURSUK and Rüya ŞAMLI

1.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2. Data understanding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4. Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Chapter 2. Exploring Chronic Diseases’ Spatial Patterns:
Thyroid Cancer in Sicilian Volcanic Areas . . . . . . . . . . . . . . . . . . . . . 13
Francesca BITONTI and Angelo MAZZA
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2. Epidemiological data and territory . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3. Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.1. Spatial inhomogeneity and spatial dependence . . . . . . . . . . . . . . . . 18
2.3.2. Standardized incidence ratio (SIR) . . . . . . . . . . . . . . . . . . . . . . 19
2.3.3. Local Moran’s I statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4. Spatial distribution of TC in eastern Sicily . . . . . . . . . . . . . . . . . . . . 22
2.4.1. SIR geographical variation . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.2. Estimate of the spatial attraction . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Chapter 3. Analysis of Blockchain-based Databases in Web
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Orhun Ceng BOZO and Rüya ŞAMLI

3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2. Background. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1. Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.2. Blockchain types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.3. Blockchain-based web applications . . . . . . . . . . . . . . . . . . . . . . 33
3.2.4. Blockchain consensus algorithms . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.5. Other consensus algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3. Analysis stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.1. Art Shop web application . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2. SQL-based application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.3. NoSQL-based application . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3.4. Blockchain-based application . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4.1. Adding records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4.2. Query. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4.3. Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.4. Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Chapter 4. Optimization and Asymptotic Analysis
of Insurance Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Ekaterina BULINSKAYA

4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2. Discrete-time model with reinsurance and bank loans . . . . . . . . . . . . . . 44
4.2.1. Model description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.2. Optimization problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2.3. Model stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3. Continuous-time insurance model with dividends . . . . . . . . . . . . . . . . 48
4.3.1. Model description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.3.2. Optimal barrier strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3.3. Special form of claim distribution . . . . . . . . . . . . . . . . . . . . . . . 50
4.3.4. Numerical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.4. Conclusion and further research directions . . . . . . . . . . . . . . . . . . . . 55
4.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Chapter 5. Statistical Analysis of Traffic Volume in
the 25 de Abril Bridge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Frederico CAEIRO, Ayana MATEUS and Conceicao VEIGA de ALMEIDA

5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2. Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.3. Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3.1. Main limit results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3.2. Block maxima method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3.3. Largest order statistics method. . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3.4. Estimation of other tail parameters . . . . . . . . . . . . . . . . . . . . . . 63
5.4. Results and conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

Chapter 6. Predicting the Risk of Gestational Diabetes Mellitus through
Nearest Neighbor Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Louisa TESTA, Mark A. CARUANA, Maria KONTORINAKI and Charles SAVONA-VENTURA

6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2. Nearest neighbor methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2.1. Background of the NN methods . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2.2. The k-nearest neighbors method . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2.3. The fixed-radius NN method. . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2.4. The kernel-NN method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.2.5. Algorithms of the three considered NN methods. . . . . . . . . . . . . . . 72
6.2.6. Parameter and distance metric selection . . . . . . . . . . . . . . . . . . . 74
6.3. Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.3.1. Dataset description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.3.2. Variable selection and data splitting. . . . . . . . . . . . . . . . . . . . . . 75
6.3.3. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3.4. A discussion and comparison of results . . . . . . . . . . . . . . . . . . . . 78
6.4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Chapter 7. Political Trust in National Institutions: The Significance
of Items’ Level of Measurement in the Validation of Constructs . . . . . . . 81
Anastasia CHARALAMPI, Eva TSOUPAROPOULOU, Joanna TSIGANOU and
Catherine MICHALOPOULOU

7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7.2. Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2.1. Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2.2. Instrument . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.2.3. Statistical analyses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.3. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.3.1. EFA results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.3.2. CFA results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.3.3. Scale construction and assessment . . . . . . . . . . . . . . . . . . . . . . 91
7.4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.5. Funding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Chapter 8. The State of the Art in Flexible Regression Models for
Univariate Bounded Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Agnese Maria DI BRISCO, Roberto ASCARI, Sonia MIGLIORATI and Andrea ONGARO
8.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.2. Regression model for bounded responses . . . . . . . . . . . . . . . . . . . . . 101
8.2.1. Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.2.2. Main distributions on the bounded support . . . . . . . . . . . . . . . . . . 103
8.2.3. Inference and fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.3. Case studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.3.1. Stress data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.3.2. Reading data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.4. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

Chapter 9. Simulation Studies for a Special Mixture Regression Model
with Multivariate Responses on the Simplex . . . . . . . . . . . . . . . . . . . 115
Agnese Maria DI BRISCO, Roberto ASCARI, Sonia MIGLIORATI and Andrea ONGARO
9.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9.2. Dirichlet and EFD distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 116
9.3. Dirichlet and EFD regression models . . . . . . . . . . . . . . . . . . . . . . . 118
9.3.1. Inference and fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
9.4. Simulation studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
9.4.1. Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
9.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

Part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

Chapter 10. Numerical Studies of Implied Volatility Expansions
Under the Gatheral Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Marko DIMITROV, Mohammed ALBUHAYRI, Ying NI and Anatoliy MALYARENKO
10.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
10.2. Asymptotic expansions of implied volatility . . . . . . . . . . . . . . . . . . . 137
10.3. Performance of the asymptotic expansions . . . . . . . . . . . . . . . . . . . 139
10.4. Calibration using the asymptotic expansions . . . . . . . . . . . . . . . . . . 141
10.4.1. A partial calibration procedure . . . . . . . . . . . . . . . . . . . . . . . . 142
10.4.2. Calibration to synthetic and market data . . . . . . . . . . . . . . . . . . 143
10.5. Conclusion and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
10.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

Chapter 11. Performance Persistence of Polish Mutual Funds:
Mobility Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Dariusz FILIP

11.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
11.2. Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
11.3. Dataset and empirical design . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11.4. Empirical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
11.5. Monthly perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
11.6. Quarterly perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
11.7. Yearly perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
11.8. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
11.9. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

Chapter 12. Invariant Description for a Batch Version of the UCB Strategy
with Unknown Control Horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Sergey GARBAR

12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
12.2. UCB strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
12.3. Batch version of the strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
12.4. Invariant description with a unit control horizon . . . . . . . . . . . . . . . . 166
12.5. Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
12.6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
12.7. Affiliations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
12.8. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

Chapter 13. A New Non-monotonic Link Function for
Beta Regressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Gloria GHENO

13.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
13.2. Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
13.3. Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
13.4. Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
13.5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
13.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

Chapter 14. A Method of Big Data Collection and Normalization
for Electronic Engineering Applications . . . . . . . . . . . . . . . . . . . . . . 187
Naveenbalaji GOWTHAMAN and Viranjay M. SRIVASTAVA

14.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
14.2. Machine learning (ML) in electronic engineering . . . . . . . . . . . . . . . . 189
14.2.1. Data acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
14.2.2. Accessing the data repositories . . . . . . . . . . . . . . . . . . . . . . . . 191
14.2.3. Data storage and management . . . . . . . . . . . . . . . . . . . . . . . . 192
14.3. Electronic engineering applications – data science . . . . . . . . . . . . . . . 193
14.4. Conclusion and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
14.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

Chapter 15. Stochastic Runge–Kutta Solvers Based on Markov
Jump Processes and Applications to Non-autonomous Systems
of Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
Flavius GUIAŞ
15.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
15.2. Description of the method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
15.2.1. The direct simulation method. . . . . . . . . . . . . . . . . . . . . . . . . 201
15.2.2. Picard iterations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
15.2.3. Runge–Kutta steps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
15.3. Numerical examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
15.3.1. The Lorenz system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
15.3.2. A combustion model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
15.4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
15.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

Chapter 16. Interpreting a Topological Measure of Complexity for
Decision Boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Alan HYLTON, Ian LIM, Michael MOY and Robert SHORT
16.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
16.2. Persistent homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
16.3. Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
16.3.1. Neural networks and binary classification . . . . . . . . . . . . . . . . . . 213
16.3.2. Persistent homology of a decision boundary . . . . . . . . . . . . . . . . 213
16.3.3. Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
16.4. Experiments and results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
16.4.1. Three-dimensional binary classification . . . . . . . . . . . . . . . . . . . 215
16.4.2. Data divided by a hyperplane. . . . . . . . . . . . . . . . . . . . . . . . . 217
16.5. Conclusion and discussion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
16.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

Chapter 17. The Minimum Renyi’s Pseudodistance Estimators for
Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
María JAENADA and Leandro PARDO
17.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
17.2. The minimum RP estimators for the GLM model: asymptotic
distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
17.3. Example: Poisson regression model . . . . . . . . . . . . . . . . . . . . . . . 230
17.3.1. Real data application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
17.4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
17.5. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
17.6. Appendix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
17.6.1. Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
17.7. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

Chapter 18. Data Analysis based on Entropies and Measures
of Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Christos MESELIDIS, Alex KARAGRIGORIOU and Takis PAPAIOANNOU
18.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
18.2. Divergence measures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
18.3. Tests of fit based on Φ−divergence measures . . . . . . . . . . . . . . . . . . 241
18.4. Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
18.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

Part 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

Chapter 19. Geographically Weighted Regression for Official Land
Prices and their Temporal Variation in Tokyo . . . . . . . . . . . . . . . . . . . 261
Yuta KANNO and Takayuki SHIOHAMA
19.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
19.2. Models and methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
19.3. Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
19.3.1. Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
19.3.2. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
19.4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
19.5. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
19.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

Chapter 20. Software Cost Estimation Using Machine
Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
Sukran EBREN KARA and Rüya ŞAMLI

20.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
20.2. Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
20.2.1. Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
20.2.2. Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
20.2.3. Evaluating the performance of the model . . . . . . . . . . . . . . . . . . 278
20.3. Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
20.4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
20.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

Chapter 21. Monte Carlo Accuracy Evaluation of Laser
Cutting Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Samuel KOSOLAPOV
21.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
21.2. Mathematical model of a pintograph . . . . . . . . . . . . . . . . . . . . . . . 286
21.3. Monte Carlo simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
21.4. Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
21.5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
21.6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
21.7. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

Chapter 22. Using Parameters of Piecewise Approximation by
Exponents for Epidemiological Time Series Data Analysis . . . . . . . . . . 297
Samuel KOSOLAPOV

22.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
22.2. Deriving equations for moving exponent parameters . . . . . . . . . . . . . . 298
22.3. Validation of derived equations by using synthetic data . . . . . . . . . . . . 300
22.4. Using derived equations to analyze real-life Covid-19 data . . . . . . . . . . 302
22.5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
22.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306

Chapter 23. The Correlation Between Oxygen Consumption and
Excretion of Carbon Dioxide in the Human Respiratory Cycle . . . . . . . . 307
Anatoly KOVALENKO, Konstantin LEBEDINSKII and Verangelina MOLOSHNEVA

23.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
23.2. Respiratory function physiology: ventilation–perfusion ratio . . . . . . . . . 309
23.3. The basic principle of operation of artificial lung ventilation devices:
patient monitoring parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
23.4. The algorithm for monitoring the carbon emissions and oxygen
consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
23.5. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
23.6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
23.7. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

Part 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

Chapter 24. Approximate Bayesian Inference Using the
Mean-Field Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
Antonin DELLA NOCE and Paul-Henry COURNÈDE
24.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
24.2. Inference problem in a symmetric population system . . . . . . . . . . . . . . 321
24.2.1. Example of a symmetric system describing plant competition . . . . . . 321
24.2.2. Inference problem of the Schneider system, in a more
general setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
24.3. Properties of the mean-field distribution . . . . . . . . . . . . . . . . . . . . . 325
24.4. Mean-field approximated inference. . . . . . . . . . . . . . . . . . . . . . . . 327
24.4.1. Case of systems admitting a mean-field limit . . . . . . . . . . . . . . . . 327
24.5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
24.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

Chapter 25. Pricing Financial Derivatives in the Hull–White Model Using
Cubature Methods on Wiener Space . . . . . . . . . . . . . . . . . . . . . . . . 333
Hossein NOHROUZIAN, Anatoliy MALYARENKO and Ying NI
25.1. Introduction and outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
25.2. Cubature formulae on Wiener space . . . . . . . . . . . . . . . . . . . . . . . 335
25.2.1. A simple example of classical Monte Carlo estimates . . . . . . . . . . . 335
25.2.2. Modern Monte Carlo estimates via cubature method. . . . . . . . . . . . 336
25.2.3. An application in the Black–Scholes SDE . . . . . . . . . . . . . . . . . 338
25.2.4. Trajectories of the cubature formula of degree 5 on Wiener space . . . . 339
25.2.5. Trajectories of price process given in equation [25.7] . . . . . . . . . . 340
25.2.6. An application on path-dependent derivatives . . . . . . . . . . . . . . . 341
25.2.7. Trinomial tree (model) via cubature formulae of degree 5 . . . . . . . . . 342
25.3. Interest-rate models and Hull–White one-factor model . . . . . . . . . . . . . 343
25.3.1. Equilibrium models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
25.3.2. No-arbitrage models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
25.3.3. Forward rate models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
25.3.4. Hull–White one-factor model . . . . . . . . . . . . . . . . . . . . . . . . 345
25.3.5. Discretization of the Hull–White model via Euler scheme . . . . . . . . 346
25.3.6. Hull–White model for bond prices . . . . . . . . . . . . . . . . . . . . . . 346
25.4. The Hull–White model via cubature method. . . . . . . . . . . . . . . . . . . 349
25.4.1. Simulating SDE [25.15] and ODE [25.24] . . . . . . . . . . . . . . . . 350
25.4.2. The Hull–White interest-rate tree via iterated cubature formulae:
some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
25.5. Discussion and future works . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
25.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

Chapter 26. Differences in the Structure of Infectious Morbidity
of the Population during the First and Second Half of
2020 in St. Petersburg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
Vasilii OREL, Olga NOSYREVA, Tatiana BULDAKOVA, Natalya GUREVA, Viktoria SMIRNOVA,
Andrey KIM and Lubov SHARAFUTDINOVA

26.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360


26.2. Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
26.2.1. Characteristics of the territory of the district . . . . . . . . . . . . . . . . 360
26.2.2. Demographic characteristics of the area . . . . . . . . . . . . . . . . . . . 360
26.2.3. Characteristics of the district medical service . . . . . . . . . . . . . . . . 361
26.2.4. The procedure for collecting primary information on cases of diseases
of the population with a new coronavirus infection . . . . . . . . . . . . . . . . . 361
26.3. Results of the analysis of the incidence of acute respiratory viral infectious
diseases, new coronavirus infection Covid-19 and community-acquired
pneumonia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
26.4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367


26.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

Chapter 27. High Speed and Secured Network Connectivity for Higher
Education Institutions Using Software Defined Networks . . . . . . . . . . . 371
Lincoln S. PETER and Viranjay M. SRIVASTAVA

27.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372


27.2. Existing model review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
27.3. Selection of a suitable model . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
27.4. Conclusion and future recommendations . . . . . . . . . . . . . . . . . . . . . 376
27.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376

Chapter 28. Reliability of a Double Redundant System Under the
Full Repair Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Vladimir RYKOV and Nika IVANOVA
28.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
28.2. Problem statement, assumptions and notations . . . . . . . . . . . . . . . . . 381
28.3. Reliability function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
28.4. Time-dependent system state probabilities . . . . . . . . . . . . . . . . . . . . 386
28.4.1. General representation of t.d.s.p.s . . . . . . . . . . . . . . . . . . . . . . 386
28.4.2. T.d.s.p.s in a separate regeneration period. . . . . . . . . . . . . . . . . . 387
28.5. Steady-state probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
28.6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
28.7. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393

Chapter 29. Predicting Changes in Depression Levels Following the
European Economic Downturn of 2008 . . . . . . . . . . . . . . . . . . . . . . . 395
Eleni SERAFETINIDOU and Georgia VERROPOULOU

29.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396


29.1.1. Aims of the study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
29.2. Data and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
29.2.1. Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
29.2.2. Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
29.3. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
29.3.1. Descriptive findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
29.3.2. Non-respondents compared to respondents at baseline (wave 2) . . . . . 403
29.3.3. Descriptive findings for respondents – analysis by gender . . . . . . . . 405
29.3.4. Findings regarding decreasing depression levels – analysis for the
total sample and by gender . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
29.3.5. Findings regarding increasing depression levels – analysis for the
total sample and by gender . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
29.4. Discussion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413


29.5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
29.6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
29.7. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415

List of Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419

Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425

Summary of Volume 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429


Preface

This book is a collective work with contributions by leading experts on “Data
Analysis and Related Applications: Theory and Practice”.

The field of data analysis has grown enormously over recent decades due to the
rapid growth of the computer industry, the continuous development of innovative
algorithmic techniques and recent advances in statistical tools and methods. Due to
the wide applicability of data analysis, a collective work is always needed to bring
all recent developments in the field, from all areas of science and engineering, under
a single umbrella.

The contributions to this collective work are by a number of leading scientists,
analysts, engineers, demographers, health experts, mathematicians and statisticians
who have been working at the forefront of data analysis. The chapters included in
this collective volume represent a cross-section of current concerns and research
interests in the scientific areas mentioned. The material is divided into four parts
and 29 chapters in a form that will provide the reader with both methodological and
practical information on data analytic methods, models and techniques, together
with a wide range of appropriate applications.

Part 1 focuses mainly on computational data analysis and related fields, with
nine chapters covering machine learning algorithms, web applications, spatial
analysis, multivariate regression, factor analysis, mixture models, non-parametric
techniques and tail distributions.

Part 2 focuses mainly on stochastic and algorithmic data analysis and related
fields, with nine chapters covering volatility, calibration, segmentation, Markov
chains, genetic algorithms, classification algorithms, batch processing, entropies and
pseudodistances.

Part 3 focuses mainly on applied statistical data analysis and related fields, with
five chapters covering spatial statistics, Monte Carlo methods, machine learning
methods, time series analysis and gas analysis.

Part 4 focuses mainly on economic and numerical data analysis and related
fields, with six chapters covering economic downturn, cyber systems, morbidity,
fixed-income market, Bayesian inference and reliability analysis.

Konstantinos N. ZAFEIRIS
Yiannis DIMOTIKALIS
Christos H. SKIADAS
Alex KARAGRIGORIOU
Christiana KARAGRIGORIOU-VONTA

April 2022
PART 1

Additive Manufacturing of Metal Alloys 1: Processes, Raw Materials and Numerical Simulation,
First Edition. Edited by Konstantinos N. Zafeiris, Christos H. Skiadas, Yiannis Dimotikalis Alex
Karagrigoriou and Christiana Karagrigoriou-Vonta.
© ISTE Ltd 2022. Published by ISTE Ltd and John Wiley & Sons, Inc.
1

Performance of Evaluation of Diagnosis
of Various Thyroid Diseases Using
Machine Learning Techniques

Thyroid cancer is the second most prevalent cancer type among women in
Turkey. The number of people diagnosed with thyroid cancer in the United States in
2021 was estimated at 44,280, according to the report published by the American
Cancer Society. The risk of thyroid cancer can be reduced by early diagnosis and
treatment. This study is focused on predicting five different thyroid diseases, based
on various symptoms and reports of the thyroid. Several machine learning
algorithms, such as support vector machine, k-nearest neighbors, artificial neural
network and decision tree are used for diagnosis of various thyroid diseases, and
their classification performances are compared with each other. For this purpose,
a thyroid disease dataset gathered from the Department of Nuclear Medicine and
Endocrinology in Istanbul University-Cerrahpaşa Faculty of Medicine was used.

1.1. Introduction

According to Feigenbaum, a pioneer of artificial intelligence, “an expert
system, which is one of the branches of Artificial Intelligence, is an intelligent
computer program that uses knowledge and inference procedures to solve problems
that are difficult enough to require significant human expertise for their solution”
(Bursuk 1999). The first expert system was DENDRAL, developed in 1965 by Joshua
Lederberg and colleagues to describe chemical molecular structures. Since then, the
spectrum of artificial intelligence, and of expert systems in particular, has expanded
with technological developments (Bursuk 1999; Nohria 2015).

Chapter written by Burcu Bektas GÜNEŞ, Evren BURSUK and Rüya ŞAMLI.

Medicine is a major application area for artificial intelligence and expert systems.
They are used to diagnose diseases that can have serious or even fatal consequences,
such as cancer, at an early stage and to select the right treatment method, so that
patients can lead a good quality of life and their survival rates increase
(Bursuk 1999; Nohria 2015).

There are four basic steps in the decision-making process that leads to a diagnosis
in medicine: cue acquisition, hypothesis generation, cue interpretation and
hypothesis evaluation. Today, several factors may cause errors in these steps: the
wide variety of diseases (differential diagnosis), complicated disease states (the
presence of more than one disease in the same person), selectivity in perception,
the variety and size of medical data, and the limited time available for the
evaluation processes. Physical or emotional changes due to human nature, such as
stress, fatigue, distraction, illness or inexperience, can also increase the
likelihood of these diagnostic errors. Various computer-aided systems are used to
reduce these errors, and new ones are added every day (Bursuk 1999; Nohria 2015).
In addition, machine learning (ML), another branch of artificial intelligence, is
used in recently designed programs and in an increasingly wide range of applications.

There are a number of studies on the classification of thyroid diseases in
the literature. Wang et al. proposed a deep learning-based method to classify
thyroid nodules as benign or malignant using ultrasound images. They compared
radiomics and deep learning-based approaches, and deep learning turned out to be
the better approach (Wang et al. 2020). Godara and Kumar used logistic regression
and support vector machine (SVM) ML techniques to analyze a thyroid dataset.
They compared these two algorithms based on precision, recall, F-measure, receiver
operating characteristic (ROC) curve and root-mean-square (RMS) error. Logistic
regression turned out to be the better classifier (Godara and Kumar 2018). Obeidavi
et al. proposed a neural network-based method to diagnose the types of thyroid
disease. In that research, a dataset consisting of T3UR, FTI, FT4, FT3, T4, T3 and
TSH measurements was collected from 244 subjects. The results indicated that, by
combining hormone tests with neural networks, various types of thyroid diseases can
be diagnosed and that the neural network provides almost 100% correct answers (Reza
Obeidavi et al. 2017).

In this study, we explored the use of machine learning methodology for the
automatic classification of thyroid diseases using 10 attributes. We used a private
dataset that contains the information of 130 patients from the Department of Nuclear
Medicine and Endocrinology in Istanbul University-Cerrahpaşa Faculty of
Medicine, Turkey (IUC). After the pre-processing stages, the models were trained by
adapting several ML algorithms to our data. The results of this research indicated
that by using all the findings (physical examination, laboratory findings and
radiologic findings) together, various types of thyroid disease can be diagnosed and
that ML provides almost 100% correct answers.

1.2. Data understanding

This research was carried out using physical examination, laboratory findings
and radiologic findings, depicted in Table 1.1. Data were obtained from IUC after
the Ethical Committee’s approval.

This dataset contains 10 attributes of 130 patients. Each measurement vector
consists of 10 values: seven attributes are binary and three attributes are
continuous. Both the binary and the continuous attribute values are mapped to zero
and one, where zero refers to false (normal) and one refers to true (abnormal).

Attribute                              Domain              Mapped domain

Physical examination
  Hypothyroid findings                 [0, 1]              [0, 1]
  Hyperthyroid findings                [0, 1]              [0, 1]
  Ophthalmopathy                       [0, 1]              [0, 1]
  Past viral inflammation              [0, 1]              [0, 1]
  Goiter                               [0, 1]              [0, 1]
  Bilateral                            [0, 1]              [0, 1]
Laboratory findings
  Thyroid-stimulating hormone (TSH)    (<0.0002 – >100)    [0, 1]
  Triiodothyronine (TT3)               [0.532 – >800)      [0, 1]
  Total thyroxine (TT4)                [0.2 – >30)         [0, 1]
Radiologic findings
  Nodular thyroid                      [0, 1]              [0, 1]

Table 1.1. Dataset attribute description

This dataset contains five diseases: Plummer disease, toxic multi-nodular goiter,
Hashimoto’s disease, Graves’ disease and subacute thyroiditis. For multi-class
classification, the class sizes are seven patients with Plummer disease, 40 with
toxic multi-nodular goiter, 32 with Hashimoto’s disease, 48 with Graves’ disease and
three with subacute thyroiditis, as shown in Figure 1.1.

Figure 1.1. Class visualization for the whole dataset. For a color
version of this figure, see www.iste.co.uk/zafeiris/data1.zip

1.3. Modeling

Analyses of the five diseases were performed using machine learning methods:
SVM, k-nearest neighbors (KNN), artificial neural network (ANN) and decision tree
(DT). For each algorithm, fivefold cross-validation was used as the performance
evaluation method. In this method, the dataset is divided into five equal parts;
in each round, one part is held out for testing and the other four are used as
training data.
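
The splitting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `five_fold_splits`, the shuffling and the seed are assumptions, and the chapter does not say whether the folds were shuffled or stratified by class.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold.

    Hypothetical helper: shuffling with a fixed seed is an assumption,
    the chapter does not describe how samples were assigned to folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes
    # differ by at most one sample.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 130 patients, as in the dataset described in the text.
splits = list(five_fold_splits(130))
for train, test in splits:
    print(len(train), len(test))
```

With 130 patients each round trains on 104 samples and tests on 26. Since one class has only three patients, it cannot be present in every test fold, which is worth keeping in mind when reading per-class results.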

The accuracy metric in equation [1.1], the precision metric in equation [1.2], the
recall metric in equation [1.3] and F-measure metric in equation [1.4] are widely
used for model performance. In this study, accuracy was selected as the model
performance evaluation metric.

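\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1.1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{1.2} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{1.3} \]

\[ \text{F-measure} = 2 \ast \frac{\text{Precision} \ast \text{Recall}}{\text{Precision} + \text{Recall}} \tag{1.4} \]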

True positive (TP): the number of samples whose true label is positive and that
the classifier also predicts as positive. True negative (TN): the number of samples
whose true label is negative and that the classifier also predicts as negative.
False positive (FP): the number of samples whose true label is negative but that
the classifier incorrectly predicts as positive. False negative (FN): the number of
samples whose true label is positive but that the classifier incorrectly predicts
as negative (Bulut et al. 2020).
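
As a small illustration of the four metrics named in equations [1.1] to [1.4], the sketch below computes them from the four counts using their standard definitions. The function name, the zero-division convention and the example counts are assumptions made for illustration; the counts are not taken from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f_measure) from the four counts.

    Returning 0.0 when a denominator is zero is a convention chosen
    here, not something stated in the chapter.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return accuracy, precision, recall, f_measure


# Hypothetical counts, for illustration only.
accuracy, precision, recall, f_measure = classification_metrics(tp=8, tn=5, fp=2, fn=0)
print(accuracy, precision, recall, f_measure)
```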

SVM, KNN, ANN and DT were selected as the classification models.

SVM is a supervised machine learning algorithm that can be used for both
classification and regression, although it is mostly used for classification
problems. Each data item is plotted as a point in n-dimensional space, with the value
of each feature being the value of a particular coordinate. Classification then
takes place by finding the hyperplane that best separates the classes (Razia
et al. 2018; Raisinghani et al. 2019; Dharmarajan et al. 2020).
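
To make the role of the hyperplane concrete, the sketch below shows only the prediction step of a linear two-class SVM: a point is assigned to one side of the hyperplane w·x + b = 0. The weights and bias are hypothetical values chosen for illustration; finding them from training data, and extending the idea to five classes, is the part that an SVM library handles and is not shown here.

```python
def svm_predict(weights, bias, point):
    """Classify a point by the side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1


# Hypothetical hyperplane x1 - x2 = 0 in a two-feature space.
weights, bias = [1.0, -1.0], 0.0
print(svm_predict(weights, bias, (1.0, 0.0)))  # point below the line x2 = x1
print(svm_predict(weights, bias, (0.0, 1.0)))  # point above the line x2 = x1
```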

KNN is a simple, supervised machine learning algorithm that can be used to
solve both classification and regression problems. A case is classified by a
majority vote of its neighbors: it is assigned to the class that is most
common among its k nearest neighbors, as measured by a distance function.
If k = 1, then the case is simply assigned to the class of its nearest neighbor.
The three commonly used distance measures are valid only for continuous variables
(Dharmarajan et al. 2020). In this study, the k value was taken as 3.
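
The voting rule can be sketched in a few lines. This is an illustrative sketch only: Euclidean distance is assumed because the chapter does not name the distance function used, and the points and the labels "A" and "B" are toy data, not patient records.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance is an assumption; the chapter does not name the metric.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    # On a tie in votes, most_common keeps the label seen first,
    # which here is the label of the nearer neighbor.
    return votes.most_common(1)[0][0]


# Toy data, for illustration only.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (0.2, 0.1), k=3))
print(knn_predict(points, labels, (5.5, 5.5), k=3))
```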

ANN is a well-known artificial intelligence technique for solving problems that
are difficult for human beings or conventional computational algorithms to solve
(Hameed 2017). ANN can learn and adjust itself to solve different nonlinear
problems by modifying certain weights during the training process with offline data.
There are many existing architectures of ANN. In general, the fundamental
architectures of ANN are single-layer feedforward, multi-layer feedforward and
recurrent (Haykin 2009). In this study, a multi-layer feedforward ANN is used to
recognize the type of thyroid disease. Different trials showed that four hidden
layers (h = 4) and a learning rate (lr) of 0.3 gave the best results. Therefore, a
structure with four hidden layers was established. Back propagation is used as the
learning algorithm to train the ANN. First, synaptic weights are initialized with
random values. Then, at each iteration of the back propagation algorithm, one input
sample is applied to the ANN to produce the actual output. After that, the error
between the actual output and the desired output is computed. Depending on this
error, the synaptic weights are updated to minimize the error (Hameed 2017).
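
The update loop described above can be illustrated on the smallest possible case. The sketch below is an assumption-laden simplification: it trains a single sigmoid neuron rather than the four-hidden-layer network used in the study, it minimizes a squared error, and it starts from fixed rather than random weights so that the run is reproducible. Only the learning rate of 0.3 is taken from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, x, target, lr=0.3):
    """One update for a single sigmoid neuron on one input sample.

    Returns the updated weights and bias, plus the squared-error loss
    measured before the update.
    """
    output = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    error = output - target
    # Gradient of 0.5 * error**2 with respect to the neuron's pre-activation.
    delta = error * output * (1.0 - output)
    new_weights = [w - lr * delta * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * delta
    return new_weights, new_bias, 0.5 * error ** 2


# Hypothetical starting weights and a single toy sample.
weights, bias = [0.1, -0.2], 0.0
losses = []
for _ in range(20):
    weights, bias, loss = train_step(weights, bias, x=[1.0, 0.0], target=1.0)
    losses.append(loss)
print(losses[0], losses[-1])
```

In a multi-layer network the same rule is applied layer by layer, with the error term propagated backwards from the output layer.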

DT is one of the most important classification and prediction methods in
supervised learning. A decision tree classifier has a tree-type structure that
provides stability and high accuracy. Decision trees are formed of two elements,
nodes and leaves: nodes test a particular attribute, and leaves represent a class.
DT algorithms commonly use the Gini index, information gain, chi-square or
reduction in variance to choose a split (Raisinghani et al. 2019; Chaubey et al.
2021). In this study, the J48 decision tree algorithm was used.
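
Two of the split criteria mentioned above can be sketched directly from class counts. The function names are illustrative. The class counts are the five class sizes reported for the dataset, while the two-class split at the end is a toy example. J48 is Weka's implementation of C4.5, which builds on the entropy-based criterion shown here; the sketch does not reproduce J48 itself.

```python
import math


def gini_index(counts):
    """Gini impurity of a node holding the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy (in bits) of a node holding the given class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(parent_counts, children_counts):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted


# Class sizes reported for the dataset: 48, 32, 3, 40 and 7 patients.
class_counts = [48, 32, 3, 40, 7]
root_gini = gini_index(class_counts)
print(round(root_gini, 4))

# Toy split that separates two balanced classes perfectly.
toy_gain = information_gain([5, 5], [[5, 0], [0, 5]])
print(toy_gain)
```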

1.4. Findings

The performance of the models was assessed using the accuracy metric. The results
are shown in Table 1.2, and Figure 1.2 compares the accuracies of the ML
algorithms with each other. The SVM algorithm achieved 100% accuracy.

Algorithm used           Accuracy

SVM                      1
ANN (h = 4, lr = 0.3)    0.992
KNN (k = 3)              0.9769
Decision tree (J48)      0.9923

Table 1.2. Result analysis

Figure 1.2. Accuracy comparison of SVM, ANN, KNN and decision tree (vertical axis:
accuracy). For a color version of this figure, see www.iste.co.uk/zafeiris/data1.zip

The confusion matrix is used to evaluate the effectiveness of the classification
model. The matrix compares the actual target values with the predictions of the
machine learning algorithm. The confusion matrices for our dataset are
shown in Figures 1.3–1.6.

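                                                    Predicted label
True label                    Graves’    Hashimoto’s    Subacute       Toxic multi-      Plummer
                              disease    disease        thyroiditis    nodular goiter    disease

Graves’ disease               48         0              0              0                 0
Hashimoto’s disease           0          32             0              0                 0
Subacute thyroiditis          0          0              3              0                 0
Toxic multi-nodular goiter    0          0              0              40                0
Plummer disease               0          0              0              0                 7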

Figure 1.3. Confusion matrix for SVM

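                                                    Predicted label
True label                    Graves’    Hashimoto’s    Subacute       Toxic multi-      Plummer
                              disease    disease        thyroiditis    nodular goiter    disease

Graves’ disease               48         0              0              0                 0
Hashimoto’s disease           0          32             0              0                 0
Subacute thyroiditis          0          0              2              0                 1
Toxic multi-nodular goiter    0          0              0              40                0
Plummer disease               0          0              0              0                 7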

Figure 1.4. Confusion matrix for ANN

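                                                    Predicted label
True label                    Graves’    Hashimoto’s    Subacute       Toxic multi-      Plummer
                              disease    disease        thyroiditis    nodular goiter    disease

Graves’ disease               48         0              0              0                 0
Hashimoto’s disease           0          32             0              0                 0
Subacute thyroiditis          3          0              0              0                 0
Toxic multi-nodular goiter    0          0              0              40                0
Plummer disease               0          0              0              0                 7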

Figure 1.5. Confusion matrix for KNN


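                                                    Predicted label
True label                    Graves’    Hashimoto’s    Subacute       Toxic multi-      Plummer
                              disease    disease        thyroiditis    nodular goiter    disease

Graves’ disease               48         0              0              0                 0
Hashimoto’s disease           0          32             0              0                 0
Subacute thyroiditis          0          0              2              0                 1
Toxic multi-nodular goiter    0          0              0              40                0
Plummer disease               0          0              0              0                 7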

Figure 1.6. Confusion matrix for DT (J48)
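
The accuracies in Table 1.2 can be recomputed from the four confusion matrices, as in the sketch below. The matrix values are copied from Figures 1.3 to 1.6. The assumption that rows are true labels and columns are predicted labels, both ordered as Graves’ disease, Hashimoto’s disease, subacute thyroiditis, toxic multi-nodular goiter and Plummer disease, is inferred from the class sizes rather than stated explicitly.

```python
# Rows: true labels; columns: predicted labels (order inferred from class sizes):
# Graves', Hashimoto's, subacute thyroiditis, toxic multi-nodular goiter, Plummer.
confusion = {
    "SVM": [[48, 0, 0, 0, 0], [0, 32, 0, 0, 0], [0, 0, 3, 0, 0],
            [0, 0, 0, 40, 0], [0, 0, 0, 0, 7]],
    "ANN": [[48, 0, 0, 0, 0], [0, 32, 0, 0, 0], [0, 0, 2, 0, 1],
            [0, 0, 0, 40, 0], [0, 0, 0, 0, 7]],
    "KNN": [[48, 0, 0, 0, 0], [0, 32, 0, 0, 0], [3, 0, 0, 0, 0],
            [0, 0, 0, 40, 0], [0, 0, 0, 0, 7]],
    "DT (J48)": [[48, 0, 0, 0, 0], [0, 32, 0, 0, 0], [0, 0, 2, 0, 1],
                 [0, 0, 0, 40, 0], [0, 0, 0, 0, 7]],
}


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def class_recall(matrix, class_index):
    """Share of one true class (one row) that was predicted correctly."""
    return matrix[class_index][class_index] / sum(matrix[class_index])


for name, matrix in confusion.items():
    # Index 2 is subacute thyroiditis, the class with only three patients.
    print(name, round(accuracy(matrix), 4), round(class_recall(matrix, 2), 4))
```

Under this reading, every misclassification in the four matrices concerns subacute thyroiditis, the class with only three patients, so overall accuracy alone says little about how well that class is recognized.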

1.5. Conclusion

In this study, we explored the use of machine learning methodologies for the
automatic classification of thyroid diseases using 10 attributes. We used a private
dataset that contains the information of 130 patients from IUC. After the
pre-processing stages, the models were trained by adapting several ML algorithms to
our data. The results of this research indicated that by using all the findings
(physical examination, laboratory findings and radiologic findings) together,
various types of thyroid disease can be diagnosed and that ML provides almost 100%
correct answers. The IUC dataset was sufficiently differentiated according to the
disease for which it was labeled. For this reason, the ML algorithms showed very
high performance. Overfitting was not observed. This system can be improved by
using a larger and more balanced dataset. Further development could use image
processing of thyroid ultrasound scans to detect thyroid nodules, which cannot be
recognized from laboratory findings.

1.6. References

Bulut, B., Kalın, V., Güneş, B.B., Khazhin, R. (2020). Deep learning approach for detection
of retinal abnormalities based on color fundus images. 2020 Innovations in Intelligent
Systems and Applications Conference, 1–6, Istanbul, 15–17 October 2020.
Bursuk, E. (1999). A diagnostic expert system for cardiological, respiratory, vascular and
hematological diseases. Master’s thesis, Institute of Biomedical Engineering, Bosphorus
University, Istanbul.
Chaubey, G., Bisen, D., Arjaria, S., Yadav, V. (2021). Thyroid disease prediction using
machine learning approaches. Natl. Acad. Sci. Lett., 44(3), 233–238.
Dharmarajan, K., Balasree, K., Arunachalam, A.S., Abirmai, K. (2020). Thyroid disease
classification using decision tree and SVM. Indian J. Public Health Res. Dev., 11, 229.
Godara, S. and Kumar, S. (2018). Prediction of thyroid disease using machine learning
techniques. International Journal of Electronics Engineering, 10(2), 787–793.
Hameed, M.A. (2017). Artificial neural network system for thyroid diagnosis. Eng. Sci.,
11(25), 518–528.
Haykin, S.S. (2009). Neural Networks and Learning Machines, 3rd edition.
Prentice Hall, New York.
Nohria, R. (2015). Medical expert system – A comprehensive review. Int. J. Comput. Appl.,
130(7), 44–50.
Raisinghani, S., Shamdasani, R., Motwani, M., Bahreja, A., Raghavan Nair Lalitha, P. (2019).
Thyroid prediction using machine learning techniques. In ICACDS 2019: Advances in
Computing and Data Sciences, Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T.,
Kashyap, R. (eds). Springer, Singapore.
Razia, S., Swathi Prathyusha, P., Krishna, N.V., Sumana, N. (2018). A comparative study of
machine learning algorithms on thyroid disease prediction. International Journal of
Engineering & Technology, 7(2.8), 315–319.
Reza Obeidavi, M., Rafiee, A., Mahdiyar, O. (2017). Diagnosing thyroid disease by neural
networks. Biomed. Pharmacol. J., 10(2), 509–524.
Wang, Y., Yue, W., Li, X., Liu, S., Guo, L., Xu, H., Zhang, H., Yang, G. (2020). Comparison
study of radiomics and deep learning-based methods for thyroid nodules classification
using ultrasound images. IEEE Access, 8, 52010–52017.
2

Exploring Chronic Diseases’
Spatial Patterns: Thyroid Cancer
in Sicilian Volcanic Areas

Spatial analyses of infectious diseases have a long tradition, and with the
contemporary increasing incidence of chronic and degenerative diseases, considerable
interest has emerged regarding the geography of these types of non-infectious
pathologies and their environmental correlates. In this work, we explore spatial
variations in the prevalence of thyroid cancer, taking into account the demographic
heterogeneity in the at-risk population at the small-area level.

This work aims to enhance the existing research surrounding thyroid cancer incidence
in volcanic areas by analyzing spatial patterns of thyroid cancer cases in Mount Etna’s
area, in the eastern part of Sicily. It is known from the medical literature that several
constituents of volcanic lava and ashes, such as radioactive and heavy metals, are
involved in the pathogenesis of thyroid cancer via the biocontamination of
atmosphere, soil and aquifers. Here, we exploit a unique dataset that allowed us to
geocode the geographic location of cases at the household level, whereas all studies
that we are aware of use aggregated data. Applying the local Moran’s I statistic as a
means for detecting spatial clustering, we aimed to disentangle the spatial
aggregation of thyroid cancer cases due to the proximity to a volcanic area from that
due to the geographic variations in the density of the population at risk and other
concomitant environmental risk factors.

Chapter written by Francesca BITONTI and Angelo MAZZA.


For a color version of all the figures in this chapter, see www.iste.co.uk/zafeiris/data1.zip.


Our preliminary findings seem to confirm a vast empirical literature that has
revealed an increased thyroid cancer incidence in volcanic areas, such as Iceland,
Hawaii and the Philippines, where intense basaltic volcanic activity has long been
recorded; furthermore, parts of the Etna volcanic area seem to be more affected
than others.

2.1. Introduction

At the end of the 18th century, Dr. Valentine Seaman mapped yellow fever cases
in New York and thus succeeded in highlighting a possible correlation between the
sites of various dumps and the location of the cases (Stevenson 1965). About
60 years later, John Snow had the idea of creating a map of the cholera
cases that were plaguing Soho (London) at the time, and he realized that the source
of the epidemic was a specific public water pump. By having the pump closed, he
managed to stop the infection (Snow 1855; Walter 2000). These are just two of the
first attempts to use cartography as a tool to provide epidemiological information.
From that time on, geographic maps have increasingly been adopted as a traditional
tool to visualize the spatial distribution of diseases in the field of health. In general,
considerable effort has been devoted to the development of geographic information
systems (GIS) that facilitate the understanding of public health problems and foster
collaboration between physicians, epidemiologists and geographers to map and
predict disease risk (Croner et al. 1996). As a result of the epidemiological
transition, the long tradition of using geographic techniques for the analysis of
infectious diseases has been extended to the geographic distribution of
chronic diseases such as cancer and various types of heart disease (Ghosh et al.
1999; Wakefield 2007). There are many environmental risk factors included among
the possible concurrent causes of non-infectious pathologies, and geographical
representations constitute a valid tool for conducting exploratory analyses on the
spatial distribution of cases. In particular, May (1950) emphasized how a disease is
the product of the interaction between pathological factors (such as vectors and
genetic causes) and geographical factors acting on a physical, biological and social
level.

To date, many epidemiological studies suggest that the etiology of thyroid cancer
(TC) includes the presence of an active volcano among several factors such as the
technological improvement of screening systems, iodine consumption and others
(Marcello et al. 2014; Vigneri et al. 2015). TC is the most widespread endocrine
neoplasm, whose incidence has grown steadily around the world in recent decades
(Curado et al. 2007; Kilfoy et al. 2009; Fitzmaurice et al. 2015; Liu et al. 2017).

An extremely high incidence of TC was found in Hawaii (Goodman et al. 1988;
Kolonel et al. 1990; Hawai’i Tumor Registry 2019), Iceland (Arnbjörnsson et al.
1986; Hrafnkelsson et al. 1989; Bray et al. 2017), the Philippines (Duntas and
Doumas 2009; Caguioa et al. 2019) and Sicily (Pellegriti et al. 2009; Malandrino
et al. 2013; Vigneri et al. 2017); all regions whose common denominator is the
presence of active volcanoes (Duntas and Doumas 2009). Although the underlying
causes of the progressive increase in the incidence of TC are still poorly defined and
greatly debated, many studies have suggested a potential relationship between
volcanic activity and the increase in the incidence of TC. Kung et al. (1981)
analyzed data of cancer registries from various areas, including Hawaii and Iceland,
and identified elements present in volcanic gases as plausible etiological agents of
TC. The research by Goodman et al. (1988) showed that the incidence of TC among
Hawaiian residents was higher than that of people of the same ethnic group but
residing elsewhere. This result supports the idea that environmental risk factors,
such as the volcanic nature of the territory, can play a critical role in increasing the
risk of TC. The same phenomenon of increased risk of TC emerged in other
volcanic areas such as the Vesuvius area in Campania (Biondi et al. 2019), New
Caledonia (Truong et al. 1985; Bray et al. 2017) and French Polynesia (Curado et al.
2007). The area around Mount Etna, in Sicily, has recently been monitored because
figures from the Cancer Registry of Eastern Sicily (CRES) show that the incidence
of TC in the vicinity of the volcano is double that recorded for
Sicily as a whole (Pellegriti et al. 2009). Several analyses of Sicilian data have
found a possible association between the volcanic environment and the increased
risk of TC in the proximity of Mount Etna (Vigneri et al. 2015; Malandrino et al.
2016).

All the aforementioned studies reinforce the hypothesis of a volcano–TC
relationship but lack a geographical approach to analyze the phenomenon of interest.
The epidemiological data of volcanic areas require, in our opinion, a geographical
investigation capable of offering a new vision of the risk of TC. When mentioning
the concepts of proximity and spatial variations, we cannot neglect the geographical
tools and approaches of spatial statistics. Our work represents an attempt to fill this
gap in the literature, introducing the geographical perspective in the study of the
distribution of TC in space. In particular, after having georeferenced the data from
CRES using the Google Maps Geocoding API interface, we have created maps
describing the risk of TC at the census tract level in the provinces of Messina,
Catania, Enna and Siracusa during the period 2003–2016. The chosen risk indicator
is the standardized incidence ratio (SIR), calculated by indirect standardization.
Using additional maps, we have shown which sections record a statistically
significant increase in incidence compared to the expected one.
Subsequently, to evaluate the presence of clusters of high-risk areas we applied the
16 Data Analysis and Related Applications 1

local Moran’s I index. The local Moran’s I statistic is able to detect the presence of
spatial autocorrelation at the level of sub-areas, which may not emerge at the global
level. Although TC case maps and cluster analysis cannot prove the causal
mechanisms underlying the investigated phenomenon, we rely on these
methodologies to provide further evidence regarding the volcano–TC relationship
and to support decision-making in the public health sector. Our results show the
presence of areas of greater risk that would suggest a possible effect of proximity to
Mount Etna and also to Mount Vulcano, although the latter is less active than the
former. Nevertheless, given the exploratory
contribution of our work, a more in-depth study is required to gain a greater
understanding of the phenomenon.

This work is organized as follows: the second section describes the available
data and the salient features of the area under analysis; the third section reports the
methodology applied, with particular mention of SIR and local Moran’s I index; the
fourth section illustrates and discusses the distribution of TC in the eastern part of
Sicily and shows the presence of clusters of high- and low-risk areas; and the fifth
and last section summarizes and concludes the work.

2.2. Epidemiological data and territory

TC is the most widespread endocrine neoplasm in the world and has been
increasing steadily in recent decades (Curado et al. 2007; Kilfoy et al. 2009;
Fitzmaurice et al. 2015). Incidence rates significantly higher than the national
averages were recorded in various volcanic areas such as the area that we consider in
this work, eastern Sicily. This area includes four provinces: Messina, Catania, Enna
and Siracusa. The volcanic area that refers to Mount Etna, the highest active
European volcano, is located in the province of Catania but involves some other
areas of the southern province of Messina. Pellegriti et al. (2009) actually report a
considerable increase in the incidence rate of TC compared to the Italian average,
especially in the province of Catania. The Sicilian TC incidence figures are made
public in the Health Atlas of Sicily, published by the Department for Health
Activities and Epidemiological Observatory (Regional Health Department 2016).
Table 2.1 shows the TC incidence rate for the provinces of eastern Sicily (calculated
for the period 2003–2011 by standardization on the new European population per
100,000 inhabitants), disclosed in the Health Atlas. The rate is always higher for
women than men, as known in the literature, and higher than the regional value in
the provinces of Catania and Messina, for both sexes.
Exploring Chronic Diseases’ Spatial Patterns 17

Several studies have revealed, over time, the presence of high levels of heavy
metals in the volcanic area, as a result of the continuous emissions of gas (mainly
composed of gases such as CO2 and SO2), ash and lava by Mount Etna (Buat-Ménard
and Arnold 1978; Cimino and Ziino 1983; Caltabiano et al. 2004; Andronico et al.
2009; D’Aleo et al. 2016). Such heavy metals include among others arsenic,
cadmium, chromium, cobalt, mercury, tungsten and zinc which, in high
concentrations, could contaminate soil, water and the atmosphere, eventually
entering the food chain (Vigneri et al. 2017). These works indicate that the presence
of an active volcano could contaminate the surrounding area through the repeated
emissions leading to potential repercussions for human health.

Area        Males   Females
Catania      10.6      35.5
Enna          7.7      25.5
Messina       9.6      29.1
Siracusa      5.2      18.9
Sicily        7.7      25.6

Table 2.1. TC age-standardized incidence rate (per 100,000 inhabitants) by
geographical area. Source: Health Atlas of Sicily, 2016

The territory of these provinces is heterogeneous and includes the volcanic area
as well as urban, rural and industrial regions (Istat 2013). As a result, the resident
population and the cases of TC are distributed in a non-homogeneous way according
to the characteristics of the urban morphology and of the natural environment
(Figure 2.1).

The analyzed cases of TC were recorded by CRES and refer to individuals
residing in the four provinces of interest, aged between 5 and 95 years, and who
manifested the disease in the period 2003–2016. The residential addresses of
individuals have been geocoded using the Google Maps Geocoding API interface
(https://cloud.google.com/maps-platform/). The data concerning the population
residing in the same provinces, on the other hand, come from the 15th General
Population Census carried out by Istat (Italian National Institute of Statistics) in
2011.

Figure 2.1. Spatial arrangement of resident population.
Source: 15th Italian General Population Census

2.3. Methodology

2.3.1. Spatial inhomogeneity and spatial dependence

The spatial distribution of cancer can be represented through a planar point
process, displayed as a series of points on a map in which the points, strictly called
“events”, represent precisely the cancer cases. The probability of finding a cancer
case changes according to the geographical distribution of the population and to the
presence of environmental risk factors. It is well known that the population is not
uniformly spread over the territory but is concentrated in localized densely
populated urban areas, leaving large rural and mountain areas mostly deserted. The
morphology of the territory can also present considerable differences even between
neighboring areas, such as the presence of volcanic areas adjacent to coastal and
plain areas. Therefore, the expected risk of cancer will be higher where the
population at risk is large and the environmental risk factors are close. Conversely, the
risk will be relatively lower in sparsely populated areas or where the natural causes
of the risk are missing.

The variability in the distribution of tumor events is described by the
non-homogeneous Poisson point process. In this model, the number of events N(U) in a
given area U ⊆ R, where R is the entire study region, follows a Poisson distribution
with variable spatial intensity λ(u). Therefore, the expected number of events is

$$E[N(U)] = \int_U \lambda(u)\,du.$$

In this case, it is possible that neighboring areas with similar population density,
or sharing the presence (or absence) of other risk factors, give rise to actual clusters
of high, medium and low risk of TC. The analysis of the similarity of the attributes of nearby
geographic areas is generally part of the study of spatial autocorrelation, which
evaluates the spatial distribution of a particular process in terms of relationships,
mutual influences and distance (Cressie 1991; Anselin and Rey 2010; Borruso and
Murgante 2012).

2.3.2. Standardized incidence ratio (SIR)

The risk of TC was represented through the production of maps showing the
spatial distribution, for each census tract, of the standardized incidence ratio (SIR).
The SIRs were calculated for each inhabited census tract by indirect standardization
(Waller and Gotway 2004, pp. 12–15), using the incidence rate of TC observed in
the same period (2003–2016) in the whole of eastern Sicily. SIR is the ratio between
observed TC cases and expected TC cases in each census tract i:

$$\mathrm{SIR}_i = \frac{O_i}{E_i},$$

where Oi is the number of cases observed for census tract i and Ei is the number of
cases expected in the same census tract i. The number of expected cases is calculated
as the product of the population at risk (and therefore the entire resident population)
in the given census tract i and the general incidence rate for the entire investigated
area:

$$E_i = P_i\, r_+,$$

where Pi is the population at risk in the specific census tract i and r+ is the general
incidence rate of TC, calculated for the four provinces of interest as a whole, as

$$r_+ = \frac{O_+}{P_+},$$

where O+ corresponds to the number of cases of TC observed and P+ is the resident
population in the whole of eastern Sicily. The subscript + indicates that the variables
are calculated for the totality of the study area. Hence, the SIR of a single census
tract is calculated as

$$\mathrm{SIR}_i = \frac{O_i}{E_i} = \frac{O_i}{P_i\, r_+}.$$

When the characteristics of the population determine a subdivision into strata
with different risk levels, it is necessary to give a proper weight to each stratum
based on its own specific risk. In this case, instead of calculating a general rate of
incidence r+ for the entire reference area, a different rate is calculated for each
stratum j as $r_j = \sum_i O_{ij} / \sum_i P_{ij}$, where $O_{ij}$ and $P_{ij}$ are the
observed cases and the population at risk of stratum j in census tract i. Hence, the
expected number of cases in census tract i is given by $E_i = \sum_j r_j P_{ij}$.
When the number of expected cases Ei is very low, as in the case
of many types of tumor, it is generally assumed that the number of observed cases Oi
comes from a Poisson distribution with mean θi Ei, where θi is the relative risk of the
section i. Therefore, the relative risk of a specific census section equal to 1 implies
that this risk is equal to the risk of the entire reference area. It is therefore of interest
to locate the areas in which the relative risk, estimated by SIR, is greater than 1 and
therefore greater than expected (Banerjee et al. 2004, pp. 150–152; Bivand et al.
2008, pp. 320–323). By exploiting the fact that the TC cases follow a Poisson
distribution, it is possible to construct 95% confidence intervals for the
SIRs with an exact method, using the “pois.exact” function of the “epitools”
package for the R software (R Core Team 2014). The exact method was
preferred to the normal approximation since the number of cases observed in many
census sections was found to be small. In fact, when the number of observed cases
Oi is low, the Poisson distribution is strongly asymmetric and therefore it cannot be
approximated to a normal distribution (Breslow and Day 1987).
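
The calculations described in this section can be sketched in a few lines of Python. This is not the authors' code (the chapter relies on the “pois.exact” function of the R package “epitools”); the function names, the invented toy counts and the bisection solver used to obtain the exact Poisson limits are assumptions made purely for illustration.

```python
import math

def stratum_rates(observed, population):
    """r_j = sum_i O_ij / sum_i P_ij, computed over the whole study area."""
    n_strata = len(population[0])
    return [
        sum(row[j] for row in observed) / sum(row[j] for row in population)
        for j in range(n_strata)
    ]

def expected_cases(population_i, rates):
    """E_i = sum_j r_j * P_ij for a single census tract."""
    return sum(r * p for r, p in zip(rates, population_i))

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); adequate for the small counts seen per tract."""
    term = math.exp(-mu)
    total = term
    for x in range(1, k + 1):
        term *= mu / x
        total += term
    return total

def solve_monotone(f, lo, hi, iters=100):
    """Root of a monotone function on [lo, hi] by bisection."""
    sign_lo = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided limits for a Poisson mean given an observed count k."""
    hi = 2 * k + 20  # generous upper bracket (assumption, fine for small counts)
    lower = 0.0 if k == 0 else solve_monotone(
        lambda mu: 1 - poisson_cdf(k - 1, mu) - alpha / 2, 0.0, hi)
    upper = solve_monotone(lambda mu: poisson_cdf(k, mu) - alpha / 2, 0.0, hi)
    return lower, upper

def sir_with_ci(o_i, e_i, alpha=0.05):
    """SIR_i = O_i / E_i with limits obtained by rescaling the Poisson limits by E_i."""
    lower, upper = exact_poisson_ci(o_i, alpha)
    return o_i / e_i, lower / e_i, upper / e_i

# Invented toy data: 2 census tracts (rows) x 2 strata (columns).
observed = [[1, 3], [0, 2]]
population = [[1000, 1200], [800, 500]]

rates = stratum_rates(observed, population)
expected = [expected_cases(p_i, rates) for p_i in population]
for o_row, e_i in zip(observed, expected):
    print(sir_with_ci(sum(o_row), e_i))
```

A tract would be flagged as having a significantly raised risk when the lower limit of its interval exceeds 1. With indirect standardization over the whole study area, the expected counts sum to the total number of observed cases, which is a convenient sanity check.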

The SIR index suffers from limits in terms of variability: sparsely populated
areas have a high probability of resulting in a significantly high index, showing a
fallacious increase in the risk of TC. Furthermore, by construction, the standard
error of SIR tends to be large for sparsely populated areas and small for densely
populated ones. As a result, the confidence intervals of SIR will attribute
significance mostly to the highly populated areas (Haining 2003). On the whole,
areas with low population density often result in extreme values of SIR while highly
populated areas are mostly associated with SIR significantly different from 1. To
overcome these issues and contain the variability in the spatial distribution of the
population, we will consider only the census tracts with more than 30 residents for
the calculation of SIR. By contrast, when computing the global incidence rate
for each stratum, rj, we will consider the totality of TC cases and
the resident population.

2.3.3. Local Moran’s I statistic

The local Moran’s I indicator belongs to the so-called LISA (Local Indicators of
Spatial Association) or local indicators of spatial autocorrelation proposed by
Anselin (1995). It is calculated with the following formula:

$$I_i = \frac{x_i - \bar{x}}{S_i^2} \sum_{j=1,\, j \neq i}^{n} w_{ij}\,(x_j - \bar{x}),$$

where n is the number of geographical units, xi is the value of the variable x in
region i, x̄ is the sample mean of the variable, xj is the value of the variable x in all
other regions (where j ≠ i), S_i^2 is the sample variance of the variable x and wij is a
weight that can be defined as the inverse of the distance between the various
regions. There are other ways to define wij, some contemplate choosing a limit
distance to define the neighborhood of a given region: the regions that fall within the
limit distance take on a weight equal to one, while the external regions take on a
weight equal to zero.

Positive and high values of the local Moran’s I index indicate that a given region
is surrounded by neighboring regions with similar high (or low) values of the
variable under study. In this case, the spatial groups detected are defined as
“high–high” (region with a high value surrounded by regions with high values) or
“low–low” (region with low value surrounded by regions with low values). In terms
of cancer risk, a “high–high” cluster would indicate a high-risk area, while a
“low–low” cluster would denote a low-risk area. Negative values of the local
Moran’s I reveal that the region under examination is a spatial outlier. A spatial
outlier is an area that has a markedly different value from that of its neighbors
(Cerioli and Riani 1999). Spatial outliers are divided into “high–low” (high value
surrounded by neighbors with low values) and “low–high” (low value surrounded by
neighbors with high values).
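
The classification into clusters and outliers can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the five values and the row-standardized weights of a simple chain of regions are invented, the helper names are hypothetical, and the variance term is assumed to be computed over the other regions and divided by n − 1.

```python
def local_moran(values, weights, i):
    """Return (I_i, spatial lag of the deviations) for region i."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: variance over the regions other than i, divided by n - 1.
    s2 = sum((values[j] - mean) ** 2 for j in range(n) if j != i) / (n - 1)
    lag = sum(weights[i][j] * (values[j] - mean) for j in range(n) if j != i)
    return (values[i] - mean) / s2 * lag, lag

def quadrant(values, weights, i):
    """Label region i from the signs of its deviation and of its neighbors' lag."""
    mean = sum(values) / len(values)
    z_i = values[i] - mean
    _, lag = local_moran(values, weights, i)
    if z_i == 0 or lag == 0:
        return "undefined"
    if z_i > 0:
        return "high-high" if lag > 0 else "high-low"
    return "low-low" if lag < 0 else "low-high"

# Invented SIR-like values for five regions arranged in a chain 0-1-2-3-4.
values = [2.0, 1.8, 1.6, 0.2, 0.4]
weights = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]

labels = [quadrant(values, weights, i) for i in range(len(values))]
print(labels)
```

In this toy chain, the region in the middle has a value above the mean but neighbors that are, on average, below it, so its local index is negative and it is labeled as a “high–low” outlier.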

The local Moran’s I can be standardized so that its significance can be tested
under normal distribution assumption. However, its distribution under the null
hypothesis of absence of spatial autocorrelation may not be normal, especially in the
presence of highly asymmetric data. For this reason, it is possible to adopt the
method of conditional permutation (Anselin 1995), which does not presuppose
assumptions on the data. According to this approach, when the value of an attribute
of a given region is evaluated, its value is kept fixed, and all other values (from other
regions) are randomly permuted without repetition. Each time the other values are
permuted, the local Moran’s I index is calculated to form an empirical reference
distribution. The significance level (called “pseudo p-value”) can be estimated by
comparing the index actually observed on the data with the empirical distribution
created by conditional permutation (Anselin 2005). Each pseudo p-value is
computed as (M + 1)/(R + 1), where R is the number of permutations and M is the
number of instances where a statistic computed via permutations is equal to or
greater than the observed value (for positive index values) or less than or equal to
the observed value (for negative index values). In this study, all local Moran’s I
indices were tested using 999 permutations and the significance level was set at
α = 0.05.
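
The conditional permutation procedure can be sketched as follows. This is a minimal illustration and not the software used in the study: the function names, the toy data, the fixed random seed and the small tolerance used when comparing floating-point values are all assumptions.

```python
import random

def local_moran_from_dev(dev, weights_row, i, s2):
    """I_i computed from deviations from the mean and the i-th row of weights."""
    lag = sum(w * d for j, (w, d) in enumerate(zip(weights_row, dev)) if j != i)
    return dev[i] / s2 * lag

def pseudo_p_value(values, weights, i, permutations=999, seed=0, tol=1e-12):
    """Pseudo p-value (M + 1) / (R + 1) for region i under conditional permutation."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # The mean and this variance term do not change when the other values are shuffled.
    s2 = sum(d * d for j, d in enumerate(dev) if j != i) / (n - 1)
    observed = local_moran_from_dev(dev, weights[i], i, s2)

    others = [d for j, d in enumerate(dev) if j != i]
    rng = random.Random(seed)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(others)  # permute the other regions without repetition
        permuted = others[:i] + [dev[i]] + others[i:]  # region i keeps its value
        stat = local_moran_from_dev(permuted, weights[i], i, s2)
        if observed >= 0 and stat >= observed - tol:
            extreme += 1
        elif observed < 0 and stat <= observed + tol:
            extreme += 1
    return (extreme + 1) / (permutations + 1)

# Invented values for five regions in a chain, with row-standardized weights.
values = [2.0, 1.8, 1.6, 0.2, 0.4]
weights = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]

p = pseudo_p_value(values, weights, i=0)
print(p)
```

With R = 999 permutations the smallest attainable pseudo p-value is 1/1000, which is why the number of permutations bounds the resolution of the test.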

2.4. Spatial distribution of TC in eastern Sicily

2.4.1. SIR geographical variation

In eastern Sicily from 2003 to 2016, 7,182 individuals were affected by TC. The
etiology of this tumor is complex and varied, and can be genetic as well as
preventive, come from dietary causes, etc. as already mentioned. In the case of
Sicily, the distribution of TC cases could also be conditioned by two geographical
components:
– the spatial arrangement of the resident population, with particular reference to
the female part, which is known to be the most affected by TC (Parkin et al. 2005).
Where the population is more concentrated or where the female population is
predominant, it will be more likely to record a high incidence of TC;
– the presence of environmental factors such as the volcanic nature of the
territory. The fumes emitted by an active volcano, such as Mount Etna, are able to
transport heavy metals and radioactive substances capable of contaminating the air,
water and soil of the surrounding areas (Fiore et al. 2019).

In an attempt to distinguish the effects of the two geographical components on
the spatial distribution of TC cases, we propose maps of the SIR by census tract and
its significant confidence intervals. SIRs were computed by dividing the population
into strata based on age and sex, to reflect the variation in the risk of TC due to these
two demographic variables. Therefore, a different overall risk rate was calculated for
each stratum (see section 2.3.2).

Figures 2.2(a) and 2.2(b) show, respectively, the SIR by census section and the
relative confidence intervals. From the mere SIR representation (Figure 2.2(a)),
different risk areas emerge, namely those with an SIR value greater than 1. These
areas are located in the area around Mount Etna as well as in the non-volcanic
provinces, especially in those of Enna and Messina. The consideration of the
confidence intervals for SIR (Figure 2.2(b)) instead highlights the area south-east of
Mount Etna and different sections belonging mainly to the Messina province.
In both maps, it is evident that, while in the non-volcanic provinces the census sections
with SIR greater than 1 are randomly arranged on the territory, in the province of
Catania, the risk sections are concentrated in an area close to Mount Etna, leaving
the rest of the province almost free. Furthermore, the location of the risk areas along
the NW–SE axis could suggest that persistent winds in the SE direction could carry
the toxic substances emitted by the volcano, therefore polluting the atmosphere of
the territories positioned along this corridor, as highlighted in Boffetta et al. (2020).
It is also interesting to note that the census sections on the island of Lipari show a
high and significant SIR. Indeed, this area is also of a volcanic type and is located in
the immediate vicinity of Mount Vulcano, an active volcano that shows only modest
activity compared to that of Mount Etna. The island of Vulcano is home to
numerous sulfurous fumaroles as well as a field of frequent submarine volcanic CO2
emissions, whose spatial distribution follows the direction given by persistent winds
blowing from the NW (Vizzini et al. 2020). Moreover, Vizzini et al. (2013) stated
that the area experiences “low”-level contamination due to elements such as Ba, Fe,
As and Cd. Overall, the significance of SIR in Lipari seems to further corroborate
the idea that a volcano can influence the incidence of TC nearby.

Figure 2.2. SIR distribution by census section (a) and representation of
its significance (b). Source: authors’ elaboration on CRES data

2.4.2. Estimate of the spatial attraction

To visualize the presence of clusters of TC cases in the area in question and to
analyze their arrangement in relation to the proximity of Mount Etna, we mapped
the local Moran’s I index of the previously calculated SIRs. To date, in the
literature, there is no empirical method or clear theoretical foundation to guide the
choice of the “correct” spatial weight matrix (Anselin and Bera 1998); for this
reason, it is common practice to experiment with different types of the matrix. On
the other hand, LeSage and Pace (2014) found no solid theoretical basis showing
that estimates and inferences from spatial regression models are sensitive to a
particular specification of the spatial weight matrix. In this study, we employed a
row-standardized binary spatial weights matrix, based on the 20th-order queen
contiguity criterion (also including intermediate orders) for the local Moran’s I
index calculations. Contiguity based on the queen criterion is often selected to analyze
areal data. The decision to include orders up to the 20th comes from the necessity to
consider the inverse correlation between the population residing in a specific section
and the area of the section itself. Generally, in fact, small census sections are
densely populated, while the large ones coincide with sparsely populated rural areas.
If we had considered a lower order, the smaller census sections with higher
population density (and therefore with a large population at risk) would have had a
restricted neighborhood, while the large and relatively less populated sections would
have had an extended neighborhood as a result of their own size. The
consideration of the 20th order allowed us to build neighborhoods that were
“comparable” to each other in terms of geographical extension for all census
sections.
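
To make the construction of the weights concrete, the sketch below builds higher-order queen neighborhoods (including intermediate orders) and row-standardizes the resulting binary weights. It is an illustration only: real census sections are irregular polygons, whereas a regular grid is assumed here so that queen contiguity (a shared edge or corner) is easy to derive, and all function names are hypothetical.

```python
from collections import deque

def queen_adjacency(n_rows, n_cols):
    """First-order queen contiguity on a regular grid (shared edge or corner)."""
    adjacency = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacency[(r, c)] = [
                (r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < n_rows
                and 0 <= c + dc < n_cols
            ]
    return adjacency

def neighbors_up_to_order(adjacency, start, max_order):
    """All units reachable in at most max_order contiguity steps (start excluded)."""
    order = {start: 0}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        if order[unit] == max_order:
            continue  # do not expand beyond the requested order
        for neighbor in adjacency[unit]:
            if neighbor not in order:
                order[neighbor] = order[unit] + 1
                queue.append(neighbor)
    return {unit for unit in order if unit != start}

def row_standardized_weights(adjacency, max_order):
    """Binary weights over the extended neighborhood, scaled so each row sums to 1."""
    weights = {}
    for unit in adjacency:
        neighbors = neighbors_up_to_order(adjacency, unit, max_order)
        weights[unit] = {nb: 1 / len(neighbors) for nb in neighbors}
    return weights

grid = queen_adjacency(5, 5)
w2 = row_standardized_weights(grid, max_order=2)
print(len(w2[(0, 0)]), len(w2[(2, 2)]))
```

On this 5 × 5 grid with orders up to 2, a corner cell ends up with 8 neighbors and the central cell with 24, each row of weights summing to 1. The study used orders up to the 20th on the actual census sections.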

Figure 2.3(a) shows the local Moran’s I statistic, while Figure 2.3(b) shows the
pseudo p-values obtained from the conditional permutation procedure. Low-risk
census sections surrounded by low-risk census sections are represented in bright
yellow; those of high risk with high-risk neighbors are in brown; low-risk
sections surrounded by neighboring high-risk sections are colored light orange and
high-risk ones with a low-risk neighborhood appear in dark orange. Figure 2.3(a)
shows a variation in the risk between the northeast and the southwest: southern and
western internal areas do not host high-risk clusters, while the eastern and northern
ones present different high-risk clusters. In particular, there are extensive low-risk
clusters along the eastern coast of Messina and Siracusa, whereas high-risk groups
emerge in the area SSE of Mount Etna, in the Aeolian Islands to the north and on the
northern coast near Barcellona Pozzo di Gotto. Figure 2.3(b) illustrates that the
sections constituting the high- and low-risk clusters are significant at a level of
at most α = 0.05. Finally, it should be noted that most of the considered sections
were found not to be significant, as can be seen from the large gray areas
present in both maps.

Figure 2.3. Risk cluster map (a) and relative p-values
(b). Source: authors’ elaboration on CRES data

The cluster analysis could confirm the hypothesis according to which persistent
winds in the SE direction would push the radioactive substances emitted by the
volcano towards areas that report a high risk. A similar suggestion seems to apply to
the Aeolian Islands and the sections near Barcellona Pozzo di Gotto.

2.5. Conclusion

The study of the geographical spread of infectious diseases has a consolidated
tradition. The growing incidence of chronic degenerative diseases (mostly cancers
and cardiovascular pathologies) has led to the application of methodologies
typically used to study the diffusion of infections in this area as well. When environmental factors are
included among the contributing causes of similar diseases, such as TC, the
geographical analysis is a fundamental step to obtain a greater understanding of the
distribution of risk and incidence. One of the environmental factors often cited as a
possible cause of the onset of TC is the presence of an active volcano. In various
volcanic regions around the world, various studies carried out on data from local
cancer registries have reported a significant increase in the incidence of TC (see
section 2.1). In this work, we mapped the TC cases in eastern Sicily to visualize the
risk areas and relate them to their proximity to Mount Etna. The health data
analyzed were released by the CRES. The geocoding of the TC cases’
addresses has allowed us to work at the census tract level and build, therefore,
indexes and maps of extreme geographic precision. To quantify the risk, we adopted
SIR weighted for different strata of the population and calculated by indirect
forces.

146. Distances.
1. The distances between the several bodies in which troops are
distributed for attack depend upon the nature of the ground, and the
weapons of the enemy, and they must be fixed by the officers in
immediate command.
2. The scouts should be sufficiently far in advance of, and on the
exposed flanks of, the firing-line, to protect it from surprise. In close
or undulating country it will be necessary to provide for connecting
links in order that there may be no danger of touch with the
advanced scouts being lost, and of reports, verbal or by signal,
failing to reach the commanders of the firing-line. In wooded country
the distance may be decreased.
3. In close country, and in wood-fighting, the distances between
the several bodies into which an attacking force is divided should
seldom exceed 200 yards. In open ground greater distances are
necessary, except against a badly-armed enemy.
4. The distance of the general reserve should be usually greater
than that between the other bodies in order that it may not be
prematurely drawn into the fight.
5. The general rule is that the troops in rear should be brought
closer to the firing-line, the nearer the moment for the assault
approaches.

147. Intervals.
An arbitrary rule as regards intervals is undesirable. Each portion
of the force engaged will generally be told off to attack a particular
section of the enemy’s line, and the frontage to be occupied by each
left to the discretion of their commanding officers. It is essential that
there should be a clear understanding as to responsibility for
searching, and, if necessary, clearing, all dangerous ground which
lies between units. This should be notified in the orders for attack.

148. Direction and Pace.


1. Each unit should be given a point to move on. Nevertheless, in
moving through woods, or over ground so close that it is
impracticable to fix a point to march upon, a unit of direction, which
should march by compass bearing, or by some well-defined
landmark, such as a road or stream, is the only means of avoiding
confusion and delay, S. 131 (3).
2. A change in the direction of the line of march is effected by
giving a fresh point or points to move on.
3. When once a firing-line has been formed, a change of direction
under fire will be effected either as described in S. 48, or by forming
a new firing-line in the required direction from the troops in rear, the
old firing-line being withdrawn.
4. Undue rapidity tends to exhaust the men, and thus impair the
accuracy of their fire. During the earlier stages of the attack, the
ordinary pace should, therefore, be maintained.
5. When the defender’s fire begins to tell seriously the advance
must be continued according to circumstances as laid down in S.
136 (3).

149. Machine Guns.


1. The effective use of a machine gun depends on the
promptitude of its commander in utilising opportunities which are, as
a rule, very brief.
2. Machine guns form an integral part of the battalion to which
they belong, and will, as a rule, be employed under the orders of its
commander. This should not, however, prevent general officers
commanding brigades from detaching machine guns from their
battalions, especially in the case of reserve battalions, and
employing them either massed or in groups should the tactical
situation so demand. It must be remembered, however, that when
massed their position will be more easily discovered, and they will
form a large and vulnerable target for the enemy’s fire.
3. Machine guns may be employed with advantage in the attack in
the following conditions:—
(i) To cover the advance of the firing line by engaging the enemy from
positions in close support of it.
The gun should generally be regarded as a long range weapon
and in ordinary open ground it would rarely be advisable to push it
into the firing line, where it would offer a conspicuous target to the
fire of the enemy, but in a broken or enclosed country, where the
gun could be brought up under cover, occasions may arise where
it could be usefully employed in a forward position.
When the ground is favourable, the gun would with advantage
accompany that portion of the reserves told off to cover the
advance of the remainder by long-range fire.
(ii) To bring a concentrated fire on any particular spot.
(iii) To assist in repelling counter-attacks to which the firing line may
suddenly become exposed, and in the protection of the flanks
against cavalry or counter-attack.
(iv) To bring fire to bear upon an enemy from a position on a flank of
the battalion. The gun would, when so placed, be less liable to
draw the fire of the enemy upon the infantry which it is supporting.
(v) To give effect to holding attacks by sudden outbursts of fire.
(vi) To establish possession of points gained.

4. The machine gun commander must be fully acquainted with the
orders given to the infantry he is acting with and with all subsequent
orders issued. It is his business to watch his infantry, and conform to
their movements and keep touch generally. He should be allowed
great liberty of action.
5. Especial care must be exercised to bring the gun into action
without exposing it, and to screen it when in action. Machine guns
should generally be used singly, though occasions may occur when
it may be advisable to use them in pairs.

THE COMPANY IN ATTACK.


150. General Rules.
1. In executing an attack independently, the company commander
will employ his four sections in accordance with the principles laid
down in the preceding pages. He will see that his advance is
protected by scouts; and after as thorough a reconnaissance as his
means permit, he will carefully explain to the subordinate leaders
and men the object to be attained and the plan of action, and will
make certain that all understand what is expected of them. He will
tell off the company into firing-line and support, arrange, if possible,
for outflanking the enemy, keep a small reserve in his own hand, and
act generally in the same manner as the commander of a
considerable force.
2. In executing an attack in conjunction with the remainder of the
battalion, the company commander must explain to his subordinates
and men the orders he has received, and the method in which he
intends to carry them out. During the advance he should place
himself where he can best watch the firing-line and the enemy, and
at the same time issue orders to his support. His duties in action are
as follows:—
(i) He will detach scouts to the front, and if necessary to the flanks, to
cover his advance.
(ii) He will be careful to co-operate with the companies on his flanks,
to cover their advance by fire, and to maintain the direction.
(iii) He will keep the battalion commander acquainted with any change
in the dispositions of the enemy, and pass on any useful
information received from the scouts.
(iv) He is responsible that his supply of ammunition is complete, and
will make the necessary arrangements to bring up a further
supply; he will also ensure that the ammunition of disabled men is
collected and distributed.
(v) He will exercise a general control over the fire of his company.
(vi) He will, if opportunity offers, lend aid to other companies by
enfilading, or firing obliquely on, a portion of the enemy’s line.
(vii) He will lead his company in the assault.
(viii) If the assault succeeds, he will lose no time in rallying and
reforming his company, in replenishing ammunition, and if
necessary securing the position against counter-attack by means
of entrenchments.

3. When two or more officers are present with a company, one will
always be with the firing line.
4. Half-company commanders in the firing line will place
themselves where they can best supervise the skirmishers. Their
duties in action are as follows:—
(i) They must be constantly on the look out for the signals of the
company commander, and of the scouts.
(ii) They must maintain the direction.
(iii) They will see that fire is not wasted, and that it is concentrated on
important targets.
(iv) They will observe the enemy’s movements, and report at once to
the company commander.
(v) If the assault succeeds, they will lose no time in rallying and
reforming their half-companies.
(vi) During the advance they will take all leaderless men of other
companies and corps under their command, and keep them until
the action is over, or the force re-forms.
5. The frontage occupied by a company acting independently
depends on the nature of the operation. There may be a
considerable gap between the frontal and the flank attacks; and a
portion of the company, extended at wide intervals, may be told off
merely to hold the enemy, while the remainder, at closer intervals,
make the decisive attack.
The rule that a strong firing-line should be established in a good
fire-position at a decisive range must always be observed by the
portion of the company which is told off for the decisive attack; and
although the men need not be so close as in the case of larger
forces, still, to dislodge an enemy of nearly equal strength, the firing-
line, at decisive range, should not be weaker than one rifle to every
two or three yards of front.
6. When the company is acting in concert with the remainder of
the battalion, its frontage, as a rule, will be assigned by the battalion
commander.
7. The company commander must always be guided by
circumstances in deciding on the strength of his firing-line, and on
the formation of the remainder. The general procedure will be to
gradually reinforce the scouts, when they are checked by the
enemy’s fire, and thus build up a firing line, which, at decisive range,
shall be strong enough to gain superiority over the enemy’s fire. This
procedure is, however, by no means to be regarded as invariable. It
might be desirable, for instance, to deploy the whole company at
once in the firing line, S. 153 (3).
This may sometimes be advisable on open ground
without cover, when less loss would be incurred than by gradually
reinforcing a weaker firing-line.
8. In order that tactical unity may be maintained as long as
possible, it will usually be advisable that complete squads or
sections be extended on the first advance, further reinforcements
being furnished by the other squads of the same sections, or other
sections of the same half company.

THE BATTALION IN ATTACK.


151. General Rules.
1. The battalion commander is practically in the same position as
the commander of a brigade, with the exception that he has under
him eight small units instead of four large units.
2. Nevertheless, so limited are his powers of personal control
upon the field of battle, that success, as a rule, will depend on the
clearness and comprehensiveness of the order which commits his
companies to the attack, as well as on the manner in which he has
trained his company leaders. It is of importance, therefore, that the
battalion should never be hurried into action; but that time should be
taken for a survey of the ground, for the issue of orders, and for the
instructions to be given by the company leaders to their subordinates
and the men.
3. A battalion, whether acting alone or forming part of a larger
force engaged in an attack, will be sub-divided into three bodies, viz.,
firing-line, supports and reserves, on the same principle as laid down
in S. 129. The firing-line, which in the first instance will not exceed a
quarter of the whole battalion, will usually be furnished by the same
companies as the supports, whilst the reserves will be supplied by
the remainder, and be under the direct control of the battalion
commander.
When the battalion forms part of a larger force, the commander
will employ his reserves in strengthening such portions of his firing-
line as most require reinforcement, the whole battalion, as a rule,
being eventually absorbed into the firing-line.
4. When the battalion is acting independently, the commanding
officer will act on the same principles as the commander of a larger
force. He will detail certain companies for the flank attack, and others,
if necessary, for a holding attack, or for a feint. He will make
arrangements from the companies of the reserve for the protection of
the flanks against counter-attack, and if the ground permits, for
covering the advance by long-range fire. He will retain a portion of
his battalion as a general reserve at his own disposal; and select a
portion of the enemy’s line against which the decisive attack will be
pressed home.
He will assign a portion of the objective to each company that
forms part of the firing-line; but it should seldom be necessary for
him, if his company leaders are well trained, to indicate the formation
to be adopted.
5. The battalion, on reaching the zone of distant fire, will form
lines of company columns, preceded, and, if necessary, flanked by
scouts.
6. It is impossible to lay down any rule as to the number of
companies in the firing-line. But it is always advisable, when the
battalion first forms for attack, whether it is acting alone or with
others, to put in no more than are actually required at the moment;
the remainder being kept well in hand, but in such formations as will
enable them to take advantage of cover, and avoid unnecessary
loss.
Before the enemy’s exact position is ascertained, the advance
must be cautious and deliberate, and it is dangerous in such
circumstances to place several companies alongside one another on
a broad frontage.
152. Orders.
The orders issued to a battalion will differ in degree but not in
principle from those given to a larger force, S. 131; as a rule they will
be issued verbally, but in any case they should be personally
explained by the commanding officer when the position comes into
view.

153. Distribution of the companies in the decisive attack.


1. In order to establish a strong firing-line within decisive range of
the enemy’s position, it is desirable that, making allowance for
losses, there should be, at the commencement of the attack, at least
125 rifles to every 100 yards of front, exclusive of that portion of the
reserves which will furnish the final reinforcement necessary to
deliver the assault. These 125 men are disposed in several bodies,
the bodies in rear, i.e., the supports and reserves, supplying the
successive reinforcements which gradually build up the firing-line to
its maximum strength.
2. Whether these 125 rifles are furnished by two or more
companies must be determined by the commanding officer.
3. To extend whole companies in the firing-line at the outset, the
supports being formed from other companies, is a proceeding which
can seldom be justified; leading as it must to a premature admixture
of tactical units, and to the surrender, at an unnecessarily early
period, of the control of the firing-line. The rule that all
reinforcements should be furnished as long as possible by the same
unit should never be infringed.
4. When the battalion is acting in concert with other units, the
frontage assigned to it, if the attack is intended to be decisive, must
be in proportion to its strength.
5. Battalions should be constantly exercised in forming for the
attack from a position of assembly, the frontage being always varied,
and a different number of companies told off to the firing-line and the
reserve. It is only by practice that a commanding officer can acquire
the facility of recognising at once how many companies should be
extended in firing-line, and how many allotted to the reserve.
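As a rough illustration of the arithmetic in paragraph 1 of this section, the short Python sketch below works out the least number of rifles needed for a given frontage at the stated density of 125 rifles to every 100 yards. The function name, the use of whole yards, and the rounding-up convention are assumptions made for illustration only; the text itself gives nothing beyond the density figure.

```python
import math

# Minimum density given in S. 153 (1): 125 rifles to every 100 yards of front.
RIFLES_PER_100_YARDS = 125


def minimum_rifles(frontage_yards: int) -> int:
    """Least whole number of rifles for the frontage.

    Rounding up to a whole rifle is an assumption; the text only says
    "at least 125 rifles to every 100 yards of front".
    """
    if frontage_yards < 0:
        raise ValueError("frontage must be non-negative")
    return math.ceil(frontage_yards * RIFLES_PER_100_YARDS / 100)


if __name__ == "__main__":
    for yards in (100, 250, 400):
        print(yards, "yards of front:", minimum_rifles(yards), "rifles")
```

The figure excludes that portion of the reserves which furnishes the final reinforcement for the assault, so the result is a floor for the firing-line, supports and remaining reserves together, not a total for the battalion.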
154. The firing-line and supports.
The formation of the firing-line and supports, and the distance of
the latter from the firing-line, will be determined by the company
commanders. There is no necessity that these should be the same in
every company so long as the general principles laid down for the
attack are intelligently applied.

155. The reserve.


1. Next to the conception of a sound plan of attack, and the issue
of clear and comprehensive orders to the company commanders, the
most important duty of the officer commanding a battalion is the
handling of his reserve. It is by means of the reserve that he makes
his influence felt in action, and by reinforcing the firing-line at the
right time and at the right place keeps the attack moving and
eventually attains the superiority of fire. But judicious feeding of the
firing-line is not all that is required. Not only must its flanks be
protected, and its advance covered by long-range fire; but if the
enemy is well-trained, counter-attack is always to be apprehended;
and—what is also dangerous—a sudden reinforcement of the
defence, when the struggle for fire-superiority is at its height, may
take place. It should be the aim, then, of the officer commanding, so
to husband his reserve, that while prosecuting the attack with vigour
by means of timely reinforcements, he may still have a sufficient
force at his disposal to meet emergencies. From first to last,
therefore, he should retain at least a portion of the reserve in his own
hand, for even a half-company may be of the greatest service in
repelling a sudden counter-attack, or in forming a rallying point if the
attack is repulsed.
2. If heavy losses are to be expected before a strong firing-line
can be established within decisive range of the enemy’s position, the
reserve should be stronger than the firing-line and supports. If, on
the other hand, the opposition is weak, or decisive range can be
reached under cover, the reserve may be of the same strength as
the firing-line and supports. It is to be observed, however, that the
firing-line and supports here alluded to are those engaged in the
decisive attack; companies engaged in a holding attack, or in a
feigned or false attack, are not to be counted when calculating the
strength of the reserve.
3. It may be advisable to divide the reserve into two distinct
bodies, one following the flank attack, the other the frontal attack.
4. When the flanks (or flank) of the battalion are exposed, a
portion of the reserve will be told off as a protection against counter-
attack and for extended patrolling.
5. On open ground, in order to avoid unnecessary loss, the
reserve must advance in several lines of skirmishers. In close
country, the reserve should move in as compact a formation as the
ground will permit, due regard being paid to the protection of the
flanks.
6. The initial formation of the companies in reserve will be decided
by the officer commanding, and will depend altogether upon the
ground. Wide intervals are not so essential as for the firing-line and
supports, and on ground which is little exposed to fire company
columns, or columns of fours may be resorted to with advantage. It
will seldom be necessary, however, that the formation of each
successive line should be identical; and, during the advance, the
formation of each company will be altered, in order to take
advantage of cover or to avoid shot-swept spaces, at the discretion
of its own commander.
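The rule of proportion in paragraph 2 of this section may be sketched in Python as follows. Counting strength in whole companies, and reading "stronger than" as "at least one company more", are assumptions made for illustration; the text lays down only the comparison, not a unit of measure. The function name is likewise hypothetical.

```python
def minimum_reserve_companies(decisive_attack_companies: int,
                              heavy_losses_expected: bool) -> int:
    """Smallest reserve, in whole companies, consistent with S. 155 (2).

    decisive_attack_companies counts only the firing-line and supports of
    the decisive attack; companies told off for a holding, feigned or
    false attack are not to be included.
    """
    if decisive_attack_companies < 0:
        raise ValueError("number of companies must be non-negative")
    if heavy_losses_expected:
        # "the reserve should be stronger than the firing-line and supports"
        # (assumed here to mean at least one company more)
        return decisive_attack_companies + 1
    # Weak opposition, or decisive range reachable under cover:
    # "the reserve may be of the same strength"
    return decisive_attack_companies


if __name__ == "__main__":
    print(minimum_reserve_companies(3, heavy_losses_expected=True))
    print(minimum_reserve_companies(3, heavy_losses_expected=False))
```

The result is a guide to proportion only; the commanding officer must still be guided by the ground and by the portion of the reserve he retains in his own hand.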

156. Holding Attack.


The holding attack will be carried out by a battalion in accordance
with the principles already laid down; the frontage being larger, and
the reserve smaller than in a decisive attack, S. 139.

157. Instruction.
It is always advisable, in instructing a battalion, to hand over the
entire control of the companies in firing-line or reserve, with the
exception of the portion retained at the disposal of the officer
commanding, to their own leaders, and to give each of the latter a
free hand in carrying out the task assigned to him. Such a method,
with inexperienced company officers, may at first lead to mistakes
and misunderstandings; but as soon as these officers gain
confidence, become accustomed to working in concert, and
understand what is required of them, energetic combination will take
the place of hesitation and bewilderment, and the officer
commanding will find himself supported by a body of zealous and
self-reliant assistants, capable of executing his intentions without
depending on continual instructions.
Moreover, the practice of carrying out an attack by the co-
operation of several independent units is the only method possible in
a hotly contested action.
It must be made clear whether the battalion is supposed to be
acting alone or in conjunction with other troops.

THE BRIGADE IN ATTACK.

158. General Rules.


1. The rules for the battalion in attack apply in all respects to the
brigade, and even to larger forces of infantry, with the exception that
in the position of assembly the brigade or division will usually be
drawn up in line, or lines, of battalions in quarter column or lines of
company columns, and will advance as far as the zone of distant fire
in this formation. The commander assigns to the battalion leaders
their respective tasks, leaving them perfect freedom as to the
manner of execution, and the way in which they form their
commands.
2. The frontage of the brigade will depend on the situation, as also
the strength of the brigade reserve. The latter should always consist
of a complete unit or units, of which a small portion may be kept
back at the crisis of the attack to form a rallying point in case of
reverse, S. 132 (3).
3. The orders issued by the brigadier will be in the same form as
those issued by the officer commanding a larger force, S. 131.
4. The brigadier will be accompanied by signallers, who will
maintain communication with all the battalions of the brigade during
the attack.
THE DIVISION IN ATTACK.
159. General Rules.
1. The best battle-formation for the infantry of a division engaged
in a decisive attack, and, generally speaking, in all attacks, is the two
brigades placed alongside each other, dividing the front, and
regulating their own reserves. If the division is acting alone, it is
important that, as a general rule, the divisional commander should
retain two complete battalions as general reserve. During the action
the divisional commander should be accompanied by a party of
mounted signallers.
2. The divisional commander assigns to the brigadiers their
respective tasks, leaving them perfect freedom as to the manner of
execution.
3. The orders issued by the divisional commander will be drawn
up and issued as directed in S. 131, and “Combined Training,” S.
115.
4. If the divisional commander finds it necessary to detach a
battalion, or in an unforeseen contingency to give an individual
battalion direct orders for the execution of some movement, he
should at once inform the brigadier to whom the battalion belongs.
5. In the instruction of the brigade or division a most useful
exercise is to practise deployments for attack under different
suppositions, such as an enemy occupying different extents of front,
an enemy suddenly discovered in position half-right, half-left, or
flanking the line of march.

THE DEFENCE.
160. Distribution of Infantry for defence.
1. Infantry detailed for the defence of the entrenchments will
generally be distributed in two bodies, viz.,
(i) Firing Line and Supports.
(ii) Local Reserves.
For the decisive counter-attack, a separate body, The General
Reserve, which has nothing to do with the immediate defence of the
entrenchments, will be retained in the hands of the officer
commanding.
2. The strength of the firing line will depend entirely on the extent
of the field of fire and the character of the cover. If the conditions are
favourable to the defence a few men can easily protect a wide front.
If there is any chance of a surprise, or of the position being attacked
by a sudden rush, the firing line should be as dense as is compatible
with the free use of the rifle by every man engaged.
3. The duty of the supports is to replace casualties in the firing
line, and they should therefore be posted near at hand and under
cover. In strong positions very small supports will be quite sufficient,
or they may even be dispensed with altogether.
4. The duties of the local reserves are to deliver local counter-
attacks, to reinforce the firing line at critical moments, and to protect
the flanks; they will also furnish the outposts and supply
detachments to occupy temporary positions, either in front or beyond
the flanks of the entrenchments. S. 161 (7), also “Combined
Training,” S. 125 (4). Local reserves should be well covered, especially
from artillery fire; but there should be no obstacle to their being
brought rapidly to the front.

161. Occupation of the position.


1. It is far more important that every man should see well to the
front, and be well covered, than that the front should be regular and
continuous.
Each section, or even each squad, may have its own
entrenchment. These entrenchments will not necessarily be in one
general line. The main consideration is a good field of fire and
provision of oblique or enfilade fire to support other parts of the line.
2. Weak points should be strongly held. Between the points held
spaces may be left unoccupied, provided they can be swept by an
effective cross-fire.
3. When there is no time to entrench, every man must improvise
cover for himself, and this should be constantly practised in peace.
When the troops occupy entrenchments, every man should see that
he can use his rifle effectively and, if necessary, make the
improvements required to enable him to do so.
4. The distance of all prominent objects and exposed points on
the probable lines of attack should be carefully ascertained, noted
and communicated to the men.
If time permit, these distances should be defined by marks.
5. If possible, objects which might assist the enemy in finding the
range should be removed, and all works and entrenchments should
be hidden with bushes, grass, &c.
6. Arrangements should be made to ensure that men, if suddenly
called on at night to man the entrenchments, fire in the required
direction. This can best be done by so designing parapets,
loopholes, &c., that the line of fire of a rifle resting on them grazes
the ground in advance for some distance.
7. In addition to the outposts, which will protect the front and
flanks of the position, troops may be specially detailed to take up
temporary positions to mislead the enemy, embarrass his
reconnoitring patrols and delay his advance, S. 160 (4); such troops
should be withdrawn before they become seriously engaged, care
being taken not to mask the fire of the main position during the
retirement.

162. Entrenching a Position.


1. Trenches on the sky-line afford so excellent a target, that such
a position, especially if the enemy has good artillery, should always
be avoided. They may, however, be constructed on the sky-line and
left unoccupied for the purpose of deceiving the enemy.
2. When placed at the foot of slopes that trend towards the enemy
they have the advantage that fire from them is more grazing than it
would be if they were placed higher up the slope, but a retreat under
fire from them will probably involve heavy loss. It is, as a rule, easy
to conceal them in such a position. On the other hand, the field of fire
from low-lying trenches is often very limited, and it is generally more
easy to open up communication with the rear when trenches are
close to the crest-line than when they are much in advance of it.
3. Trenches which can bring fire to bear at decisive range on to
the ground over which the attack must pass, and which are
themselves concealed from the attackers in the early stages, are
most valuable in surprising the enemy at the most critical period of
the attack.
4. Trenches should always be concealed and head cover provided
when possible; covered communication from the rear should also be
provided when time is available.
5. Important tactical points and such others which, owing to the
dead ground in their vicinity, constitute a weakness to the position,
should be further strengthened by placing barbed wire
entanglements or abattis in front of them, trenches being so placed
as to bring an effective fire to bear on such obstacles.
6. When time admits, deep trenches just in rear of the crest-line
may be usefully provided to give cover to the supports or the
garrisons of the advanced trenches till they are required.

163. Fire.
1. As the difficulties of ammunition supply and want of knowledge
of ranges are not so great as in the attack, it will often be expedient
to open fire at long ranges in order to oblige the assailant to deploy
and adopt a definite course of action which it will be difficult for him
to rectify when exposed to fire.
Long-range fire may also be used to deceive the enemy as to the
dispositions and strength of the defender, and to check the advance
of reinforcements.
The employment of long-range fire must, however, be regulated
by the effect produced on the enemy. If this is observed to be small,
it will be wiser to reserve ammunition for closer ranges where better
results may be expected, and on occasion it may be advisable to
encourage the enemy’s advance by a weak fire or by withholding it
entirely, and to receive him at decisive ranges with a fire of the
greatest intensity possible.

164. Machine Guns.


1. In defence, as in attack, machine guns may be employed
singly, or in pairs.
2. They should not be isolated, and are not adapted for use
against lines of widely extended skirmishers, but are most suitable to
protect flanks, to flank salients or portions of the line, to cover
obstacles, to deny the passage of defiles to the enemy, or to bring a
heavy fire to bear from ground which, owing to its narrow frontage, is
unsuited for the deployment of infantry. If employed in the firing line,
they should be carefully concealed. Alternative positions connected
by a covered communication should be prepared, if possible, for use
in case they come under effective artillery fire.
3. If not utilised for these purposes they should be retained as a
reserve of fire, either in the hands of the battalion commander or
brigadier, to be used to check the advance of hostile reinforcements,
to meet turning movements, to support the firing-line in crises, to
prepare and cover the counter-attack, or against close deep
formations at long range.
4. In pursuit they should endeavour to operate against the flanks
of the enemy from decisive range.

165. Position of the Reserves.


1. The local reserves will be in their respective sections. For the
protection of the flanks they will be echeloned in rear of the flanks or
of one flank if the other is unassailable. If the flanks are secure, the
most suitable position for them, if it provides good cover, is in rear of
the centre of the section to which they belong.
2. The general reserve will usually be posted in rear of the centre
of the position, until the direction of the counter-attack can be
decided; but in certain cases, as when, for instance, the defender is
equal or superior in numbers, it may be echeloned in rear of that
flank where the ground offers the greatest facilities for the counter-
attack.
3. The reserves should be most carefully hidden until the moment
for action arrives. If no natural cover is available, artificial cover
should, if possible, be provided for them.

166. Duties of Officers Commanding Reserves.


1. The officers commanding the reserves, whether local or
general, must make themselves acquainted with all ground over
which they may have to act. They should know the direction of all
roads and tracks; and they must keep a watch, by means of staff
officers and patrols, on the progress of the engagement, so that they
may anticipate orders, and have their troops formed up ready to
move as soon as they are called for.

167. Local Counter-attacks.


1. Local counter-attacks, which are the special duty of the local
reserves, may be made at any moment. Should the enemy gain
some local success either in the position itself, or on ground close to
it, whence he could seriously threaten the defence of the position,
the necessity for counter-attack becomes imperative. In such cases,
the sooner the attack is delivered the better, so that the enemy may
have no opportunity of strengthening the ground he has gained.
2. Local counter-attacks are delivered on the initiative of the
officers in charge of sections of the defensive line. They should
seldom be carried far in advance of the entrenchments; and directly
the enemy’s firing-line falls back, the troops should be reformed as
rapidly as possible.
4. Local counter-attacks should also be delivered when the enemy
advances to the assault. Bayonets will be fixed when his line arrives
within a few hundred yards of the position, every available man
brought up into the firing-line, and the charge met with rapid fire, and
if that fails to stop him, with a counter-charge. In this counter-charge,
which should be practised at all manœuvres, the men will cheer,
bugles be sounded, and pipes played.
168. Decisive Counter-attack.
1. The decisive counter-attack will be delivered by the general
reserve; it will usually be directed against the enemy’s flanks, and in
such a manner as to threaten his line of retreat, although
opportunities for breaking the centre may sometimes occur. The
counter-attack should come, if possible, in the form of a surprise,
and should be carried through with the utmost vigour and resolution;
all ranks should understand that they must press forward until the
last reserve has been thrown in.
2. To judge the right time for the decisive counter-attack is as
difficult as it is important. The most favourable moment is when the
enemy has expended his reserves in endeavouring to storm the
entrenchments. If, however, the defending force is carefully
concealed, or if the enemy is led to believe that the front is much
longer than it really is, he may commit mistakes such as exposing a
portion of his force without hope of support from the remainder,
extending his front so far that the greater part of his force is in the
firing-line, exposing his flanks, or posting his reserves in the wrong
place; and these mistakes, all of which are favourable to the counter-
attack, may occur at any period of the engagement. It is important,
therefore, that the course of the action should be closely watched,
that the staff should make arrangements for incessant patrolling,
constant observation, and the rapid transmission of reports, and that
the general reserve should be prepared for immediate action
throughout the fight.
3. When launched to the attack the firing-line, as a rule, should be
thicker than at the commencement of an ordinary attack, and it is
unnecessary that it should be preceded, though it must always be
flanked, by scouts. A portion of the force should be echeloned in
rear, in order to deal with the enemy’s reserves.
4. The formation in which the general reserve will carry out the
counter-attack cannot be laid down; but care should be taken that
the troops composing it are formed up in such a manner as to be
able to advance and come into action in any direction with the least
possible delay.
5. It is possible that there will be little time for issuing detailed
orders, but the direction and manner of carrying out the counter-
attack should be carefully pointed out to all subordinate
commanders, who will explain the same to the troops, and impress
on them the importance of getting to close quarters as quickly as
possible.

THE COMPANY IN DEFENCE.

169. General Rules.


1. When acting independently the company will act in accordance
with the principles enumerated in S. 160. The reserve will, as a rule,
undertake the defence of the flanks, in addition to its other duties. It
may often be conveniently placed in rear of the centre.
To deceive the enemy as to the extent of the position scouts must
be employed in place of larger bodies, and they should be
encouraged to use all sorts of stratagems, such as constantly
changing their positions, opening rapid fire, &c., &c., in order to
effect their purpose. Concealment is imperative.
All dead ground in front or on the flanks of the position should be
carefully observed.
2. When acting in battalion, a company told off to furnish a portion
of the firing-line will usually keep a part in support. But it will often be
advisable to extend only a few men at first, and to retain the
remainder in rear until the enemy’s infantry advance to the attack;
they should, however, be able to reach their places in the firing line
without being observed by the enemy.
3. The occupation of the ground allotted to a company will be
carried out in accordance with S. 161.

170. Duties of the Subalterns, Section, and Squad Leaders.


1. They are responsible that communication is maintained
between the different portions of the company, that all movements of
the company are at once reported, that the fire is kept under control,
that the men aim at the targets pointed out to them, and that all
instructions as regards cover, concealment, ranges, and water are
scrupulously observed.
2. They will see:—
(i) That every man has good cover.
(ii) That the firing-line is well hidden, the existence of entrenchments
concealed, and every man is in such a position that he can use
his rifle.
(iii) That ranges are taken and communicated to the men.
(iv) That every man has plenty of ammunition and a full water-bottle,
and that the ammunition from the killed and wounded is collected
and distributed.
(v) That the support knows the position of the firing-line.

3. They will ascertain the position of the dressing station and of
the reserve ammunition.
4. They will report to the company commander all movements of
the enemy and any opportunity which appears to be favourable to
counter-attack.
5. They will see that their flanks, if exposed, are protected by
scouts.
6. They will be careful to keep in communication with the
companies on either flank.

171. Duties of the Company Leader.


Nothing in the previous section is intended to relieve the company
commander of his responsibility in all that concerns his command.
He will make arrangements for the distribution of fresh supplies of
ammunition, but it is important he should not allow himself to
become too much engrossed in details which should be looked to by
his subordinates.
In defence, the occupation, to the best advantage, of the ground
allotted to him, is the company commander’s first duty.

THE BATTALION IN DEFENCE.
