Developmental Dyslexia Detection Using Machine Learning Techniques: A Survey
Shahriar Kaisar
PII: S2405-9595(20)30101-6
DOI: https://doi.org/10.1016/j.icte.2020.05.006
Reference: ICTE 258
Please cite this article as: S. Kaisar, Developmental dyslexia detection using machine learning
techniques: A survey, ICT Express (2020), doi: https://doi.org/10.1016/j.icte.2020.05.006.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the
addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive
version of record. This version will undergo additional copyediting, typesetting and review before it
is published in its final form, but we are providing this version to give early visibility of the article.
Please note that, during the production process, errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
© 2020 The Korean Institute of Communications and Information Sciences (KICS). Publishing
services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Journal Pre-proof
Shahriar Kaisar
School of Business IT and Logistics, RMIT University, Melbourne, Australia
Abstract
Developmental dyslexia is a learning disability that occurs mostly in children during their early childhood.
Dyslexic children face difficulties while reading, spelling and writing words despite having average or
above-average intelligence. As a consequence, dyslexic children often suffer from negative feelings, such as
low self-esteem, frustration, and anger. Therefore, early detection of dyslexia is very important to support
dyslexic children right from the start. Researchers have proposed a wide range of techniques to detect
developmental dyslexia, including game-based techniques, reading and writing tests, facial image capture and
analysis, eye tracking, magnetic resonance imaging (MRI) and electroencephalography (EEG) scans. This survey
paper critically analyzes recent contributions in detecting dyslexia using machine learning techniques and
identifies potential opportunities for future research.
Keywords: Dyslexia, machine learning, survey, EEG
1. Introduction

The word 'Dyslexia' originates from the Greek language and means difficulty with words. Dyslexia is a type of specific learning difficulty (SLD) in which a person has difficulty in fluently reading, spelling, and writing despite having average [...]

[...] sources, such as reading/writing tests, web-based word games, eye tracking while reading/writing, MRI and EEG scans, video and image capture while reading/writing. Recently, machine learning approaches have become popular in detecting dyslexia as they provide higher detection accuracy and better [...]
Table 1

Article | Year | No. of users | User's age | Language | Data type | Test type | Machine learning technique | Performance metric
Graphical models (Lakretz et al. [7]) | 2015 | 313 | 7-62 | Hebrew | Text | Reading | LDA and Naive Bayes | Accuracy, perplexity
Detect readers (Rello and Ballesteros [8]) | 2015 | 97 | 11-54 | Spanish | Eye tracking | Reading | SVM | Accuracy
Eye-tracking (Benfatto et al. [9]) | 2016 | 185 | 9-10 | Swedish | Eye tracking | Reading | SVM | Accuracy
WM connectivity (Cui et al. [10]) | 2016 | 61 | 10-14.7 | Mandarin | MRI scans | Reading | SVM, logistic regression | Accuracy, sensitivity, specificity, PPV and NPV
Multi-Parameter (Plonski et al. [11]) | 2017 | 236 | 8.5-13.7 | French, German and Polish | MRI scans | Reading | SVM, logistic regression and random forest | Accuracy and area under curve
DCS (Khan et al. [12]) | 2018 | 857 | 7 (avg.) | Malay | Text | Reading | K-NN | Accuracy
Dytective (Rello et al. [13]) | 2018 | 267 | 7-60 | English | Text | Online game | SVM | Accuracy, precision, recall
ERP (Frid and Manevitz [14]) | 2018 | 32 | grades 6-7 | Hebrew | EEG scans | Reading | SVM, neural network | Confusion matrix
EEG pattern (Perera et al. [15]) | 2018 | 32 | ≥ 18 | English | EEG scans | Typing and writing | SVM | Accuracy, sensitivity and specificity
Adaptive learning (Hamid et al. [16]) | 2018 | 30 | 7-12 | Malay | Video and image | Reading | SVM, Naive Bayes and K-NN | Accuracy
DysLexML (Asvestopoulou et al. [17]) | 2019 | 69 | 8.5-12.5 | Greek | Eye tracking | Reading | SVM, Naive Bayes and K-means | Accuracy, MSE
EEG local network (Rezvani et al. [18]) | 2019 | 44 | grade 3 | Dutch | EEG scans | Reading | SVM and K-NN | Accuracy, sensitivity, specificity and precision
Handwriting (Spoon et al. [19]) | 2019 | 150 | grades K-6 | English | Image | Handwriting | Convolutional neural network | Accuracy
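Most of the performance metrics listed in the table above (accuracy, sensitivity, specificity, PPV and NPV) can be derived from the four counts of a binary confusion matrix. The following is a minimal sketch in Python, not code from any of the surveyed studies. The function name `screening_metrics` and the example counts are hypothetical, and the sketch assumes that "dyslexic" is the positive class and that no denominator is zero.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return common screening metrics from binary confusion-matrix counts.

    tp: dyslexic users correctly flagged      fn: dyslexic users missed
    tn: non-dyslexic users correctly cleared  fp: non-dyslexic users flagged
    Assumes every denominator below is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # identical to precision
        "npv": tn / (tn + fn),
    }


# Hypothetical counts chosen only for illustration (not from any study).
example = screening_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for screening data, because accuracy alone can look high when one class dominates the sample.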
[...] less time consuming and often inexpensive. In this case, a wide range of tests, such as reading [7, 12, 17, 9, 8, 10, 11, 14, 18, 16], writing and typing [15], handwriting [19], and a web-based game [13], are conducted to collect various types of data, such as text [7, 12, 15, 13], eye movement [9, 8, 17], MRI scans [10, 11], EEG scans [14, 15, 18], and images [16, 19]. The age of the participants ranged from 7 to 62, while the native languages of the participants across different studies were Spanish, Hebrew, Swedish, Mandarin, French, German, Polish, Malay, English, Greek, and Dutch. However, most of these studies were conducted in a specific language, and hence a game-based language-independent test [20] would be a better choice in this regard. Although language-independent data collection is used in [20], that work did not use any machine learning algorithm. A few approaches also require the use of customized tools, such as an EEG headset [15], a customized camera [17], an infrared corneal reflection system [9], an MRI scanner [10] and an eye tracker [8], in a lab environment to collect EEG or MRI scan data. Although these approaches achieve higher accuracy, they are expensive, can only cover a small set of users and may result in participants behaving in an unusual way under observation or in a test environment. In this regard, computer-based reading or writing tests along with game-based approaches can be more beneficial. Nowadays smart mobile devices are becoming popular, and hence an app-based data collection technique will also help to reach a broader user base.

[...] values can be numerical or categorical. The number of features varied across different studies from 12 to 226 [8, 13]. The next step is to identify the set of dominant features that are more important for determining the class of the object. To achieve this, a few studies used manual selection [14] while others used techniques such as the least absolute shrinkage and selection operator (LASSO) [17] and SVM-RFE [9]. LASSO can improve both accuracy and interpretability, as it performs regularization and variable selection simultaneously; it is suitable for regression models. On the other hand, SVM-RFE selects features according to their importance for an SVM classifier in separating classes. This technique starts with the full feature set and eliminates a number of features in consecutive iterations. Appropriate feature selection is an important task when the number of features is high, due to the computational complexity. However, a comparative performance analysis of different feature selection techniques is not presented in existing works.

2.3. System training and classification

After feature selection, system training and classification are conducted using machine learning algorithms. The dataset is divided into training and testing parts. Existing literature mostly used 10-fold cross-validation, where the dataset is divided into 10 equal parts, 9 of which are used for training the algorithm while the remaining part is used for testing its performance [8, 11, 13], while others used a different split (e.g., five-fold [17], leave-one-out cross-validation (LOOCV) [18, 10, 11], and 70-30 [12]). Since the training dataset already contains the class information, i.e., dyslexic or non-dyslexic, supervised classification algorithms are used for this purpose. Existing studies mostly used support vector machine (SVM), Naive Bayes, logistic regression, neural network, K-nearest neighbour (K-NN) and linear discriminant analysis (LDA) as the machine learning algorithm to classify participants. SVM was the most common algorithm used across multiple studies. Since the problem is essentially a binary classification problem, i.e., identifying dyslexic and non-dyslexic users, SVM is expected to provide good performance when the number of dimensions is higher than the number of samples and the feature space is sparse. However, interpreting an SVM is a complex task, and it does not perform well when the dataset is noisy. On the other hand, a method such as logistic regression is easier to implement and understand and is expected to provide a very good solution for binary classification problems. Overall, the selection of an appropriate classification technique essentially depends on the data itself, and hence studies should report a comparative performance analysis showing the outcome of different machine learning models rather than reporting the performance of a selected one. In this regard, the application of ensemble methods can also be beneficial to achieve better performance.

2.4. Performance evaluation

Existing literature used MATLAB, WEKA, and Python-based tools for performance evaluation. Different metrics were used for evaluating the performance of dyslexia detection techniques using machine learning approaches. These include accuracy, sensitivity, specificity, precision, recall, mean square error (MSE), positive predictive value (PPV), negative predictive value (NPV) and area under the receiver operating characteristic (ROC) curve. Accuracy measures the ratio of correctly classified objects to the total number of objects, while sensitivity and specificity measure the ratio of correctly identified dyslexic and non-dyslexic users, respectively. Precision, or positive predictive value, refers to the fraction of correctly identified dyslexic users with respect to the total number of users identified as dyslexic, while recall is the fraction of the total number of dyslexic users that were correctly identified. EEG-based methods achieved 60-80% accuracy in different works [15], while the MRI scan-based method achieved an accuracy of 83.61% [10] and the game-based technique achieved an accuracy of 80.24% [8]. Overall, it would be interesting to see how the performance of these techniques changes if data from multiple sources are combined. Table 1 highlights different aspects of the techniques proposed for dyslexia detection using machine learning approaches.

3. Future direction and conclusion

Dyslexia is a learning disability that affects about 10% of the world population. It is highly important to identify dyslexic children at an early stage to provide them with appropriate learning facilities. Researchers have proposed several techniques to identify dyslexic children. This paper summarized existing dyslexia detection techniques that use machine learning approaches. Although these approaches attain acceptable accuracy and success rates, their performance can be further improved. In this regard, data collected from multiple sources (e.g., image, text, game and scans) can be combined to make the prediction models work better. The development of a language-independent data collection method would also be helpful in this case. It would be interesting to see the impact of ensemble methods, where predictions from multiple models are combined to improve the accuracy of machine learning techniques. Overall, a combination of the above-mentioned techniques is expected to provide better results in detecting dyslexia.

Conflict of interest

The authors declare that there is no conflict of interest in this paper.
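The cross-validation schemes mentioned in Section 2.3 (10-fold, five-fold and LOOCV) differ only in how many folds the dataset is cut into. The following is a minimal sketch of the index bookkeeping in plain Python, assuming a simple shuffled split without stratification. The helper name `k_fold_indices` and the dataset sizes are hypothetical and are not taken from any surveyed study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Train on all folds except the one held out for testing.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 10-fold split of a hypothetical dataset of 97 participants.
splits = list(k_fold_indices(97, k=10))

# Leave-one-out cross-validation is the special case k == n_samples.
loocv = list(k_fold_indices(5, k=5))
```

Each participant appears in exactly one test fold, so every sample is used for testing once and for training k - 1 times. With small, imbalanced samples a stratified split (keeping the class ratio in each fold) is often preferable; that refinement is omitted here.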
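Sections 2.3 and 3 suggest ensemble methods, in which predictions from multiple models are combined. One simple form is majority voting, sketched below in plain Python. This is an illustration under stated assumptions, not a method used by the surveyed studies: the function name `majority_vote` and the three lists of model outputs are hypothetical, and each model is assumed to emit one label per participant in the same order.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per participant.

    predictions: list of equally long label lists, one list per model.
    """
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the most frequent label; with tied counts,
        # the label seen first wins, so an odd number of models avoids ties.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers for four participants.
model_outputs = [
    ["dyslexic", "non-dyslexic", "dyslexic", "non-dyslexic"],
    ["dyslexic", "dyslexic", "non-dyslexic", "non-dyslexic"],
    ["non-dyslexic", "dyslexic", "dyslexic", "non-dyslexic"],
]
ensemble = majority_vote(model_outputs)
```

The same pattern could combine models trained on different data sources (e.g., text, eye tracking and scans), provided their predictions refer to the same participants.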
[...] org.au/dyslexia-in-australia/, accessed on 28 February 2020.
[3] S. S. A. Hamid, N. Admodisastro, A. Kamaruddin, A study of computer-based learning model for students with dyslexia, in: 2015 9th Malaysian Software Engineering Conference (MySEC), IEEE, 2015, pp. 284–289.
[4] Dyslexia, https://www.open.edu/openlearn/education-development/education/understanding-dyslexia/content-section-1.7.2, accessed on 28 February 2020.
[5] H. Perera, M. F. Shiratuddin, K. W. Wong, Review of EEG-based pattern classification frameworks for dyslexia, Brain Informatics 5 (2) (2018) 4.
[6] H. Perera, M. F. Shiratuddin, K. W. Wong, Review of the role of modern computational technologies in the detection of dyslexia, in: Information Science and Applications (ICISA) 2016, Springer, 2016, pp. 1465–1475.
[7] Y. Lakretz, G. Chechik, N. Friedmann, M. Rosen-Zvi, Probabilistic graphical models of dyslexia, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1919–1928.
[8] L. Rello, M. Ballesteros, Detecting readers with dyslexia using machine learning with eye tracking measures, in: Proceedings of the 12th Web for All Conference, 2015, pp. 1–8.
[9] M. N. Benfatto, G. Ö. Seimyr, J. Ygge, T. Pansell, A. Rydberg, C. Jacobson, Screening for dyslexia using eye tracking during reading, PloS one 11 (12).
[10] Z. Cui, Z. Xia, M. Su, H. Shu, G. Gong, Disrupted white matter connectivity underlying developmental dyslexia: a machine learning approach, Human Brain Mapping 37 (4) (2016) 1443–1458.
[11] P. Płoński, W. Gradkowski, I. Altarelli, K. Monza- [...]
[...] padopouli, DysLexML: Screening tool for dyslexia using machine learning, arXiv preprint arXiv:1903.06274.
[18] Z. Rezvani, M. Zare, G. Žarić, M. Bonte, J. Tijms, M. Van der Molen, G. F. González, Machine learning classification of dyslexic children based on EEG local network features, bioRxiv (2019) 1–23.
[19] K. Spoon, D. Crandall, K. Siek, Towards detecting dyslexia in children's handwriting using neural networks, in: Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 1–5.
[20] M. Rauschenberger, L. Rello, R. Baeza-Yates, J. P. Bigham, Towards language independent detection of dyslexia with a web-based game, in: Proceedings of the Internet of Accessible Things, 2018, pp. 1–10.