Authors:
Souad Taouti¹, Hadda Cherroun¹ and Djelloul Ziadi²
Affiliations:
¹ LIM, Université UATL Laghouat, Algeria
² Groupe de Recherche Rouennais en Informatique Fondamentale, Université de Rouen Normandie, France
Keyword(s):
Kernel Methods, Structured Data Kernels, Tree Kernels, Tree Series, Root Weighted Tree Automata, MapReduce, Spark, Parallel Automata Intersection.
Abstract:
Tree kernels are fundamental tools that have been leveraged in many applications, particularly machine-learning-based Natural Language Processing tasks. In this paper, we devise a parallel implementation of the sequential algorithm for computing some tree kernels of two finite sets of trees (Ouali-Sebti, 2015). Our comparison focuses on a sequential implementation of the SubTree kernel computation, which essentially reduces to an intersection of weighted tree automata. Our approach exploits the data parallelism inherent in this computation by deploying both the MapReduce paradigm and the Spark framework. One of the key benefits of our approach is its versatility: it can be adapted to a wide range of substructure tree kernel-based learning methods. To evaluate the efficacy of our parallel approach, we conducted a series of experiments comparing it against the sequential version on a diverse set of synthetic tree language datasets that were manually crafted for our analysis. The results show that the proposed parallel algorithm outperforms the sequential one in terms of latency.
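The data parallelism mentioned above can be illustrated with a minimal sketch, assuming trees are encoded as nested tuples `(label, child_1, ..., child_k)`. It does not reproduce the authors' root weighted tree automata construction. Instead, a dictionary keyed by complete subtrees stands in for the automaton states, and the per-subtree occurrence counts stand in for the root weights. The SubTree kernel of two finite tree sets X and Y is then the sum, over every subtree t, of (occurrences of t in X) × (occurrences of t in Y). The function names `map_tree`, `reduce_kernel` and `subtree_kernel` are hypothetical, and the map and reduce phases run serially here.

```python
from collections import Counter, defaultdict
from typing import Iterable, Iterator, List, Tuple

# Hypothetical encoding: a tree is a nested tuple (label, child_1, ..., child_k);
# a leaf is a 1-tuple such as ("b",). Tuples are hashable, so they serve as keys.
Tree = Tuple


def subtrees(tree: Tree) -> Iterator[Tree]:
    """Yield the complete subtree rooted at every node of `tree`."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def map_tree(tree: Tree, side: str) -> Iterator[Tuple[Tree, Tuple[str, int]]]:
    """Map phase (hypothetical): emit (subtree, (side, count)) for one input tree.

    Each call depends only on its own tree, so these calls are independent
    and could be distributed across workers.
    """
    for sub, count in Counter(subtrees(tree)).items():
        yield sub, (side, count)


def reduce_kernel(pairs: Iterable[Tuple[Tree, Tuple[str, int]]]) -> int:
    """Reduce phase: group by subtree key, total each side, multiply, and sum.

    Only subtrees that occur on both sides contribute a non-zero product,
    which mirrors keeping the states shared by the two automata.
    """
    totals = defaultdict(lambda: {"X": 0, "Y": 0})
    for sub, (side, count) in pairs:
        totals[sub][side] += count
    return sum(t["X"] * t["Y"] for t in totals.values())


def subtree_kernel(xs: List[Tree], ys: List[Tree]) -> int:
    """SubTree kernel of two finite tree sets, computed serially in this sketch."""
    pairs = [p for x in xs for p in map_tree(x, "X")]
    pairs += [p for y in ys for p in map_tree(y, "Y")]
    return reduce_kernel(pairs)


if __name__ == "__main__":
    X = [("a", ("b",), ("c",))]
    Y = [("a", ("b",), ("c",)), ("b",)]
    print(subtree_kernel(X, Y))  # → 4
```

In the example, the shared subtrees are `a(b, c)` (1 × 1), `b` (1 × 2) and `c` (1 × 1), which gives 4. This equals the sum of the pairwise kernels, 3 + 1. In a Spark deployment, the map step would plausibly correspond to a `flatMap` over an RDD of trees, and the per-key totals to a `reduceByKey`. This is an assumption about how such a pipeline could be organized, not a description of the paper's implementation.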