Abstract
Several research efforts have shown that a similarity function synthesized from examples may capture an application-specific similarity criterion in a way that fits the application needs more effectively than a generic distance definition. In this work, we propose a similarity learning algorithm tailored to problems of syntax-based entity extraction from unstructured text streams. The algorithm takes as input pairs of strings, each labelled with an indication of whether or not the two strings adhere to the same syntactic pattern. Our approach is based on Grammatical Evolution and systematically explores a space of similarity definitions that includes all functions expressible in a simple, specialized language that we defined for this purpose. We assessed our proposal on patterns representative of practical applications. The results suggest that the proposed approach is feasible and that the learned similarity function is more effective than the Levenshtein distance and the Jaccard similarity index.
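For readers unfamiliar with the two generic baselines named in the abstract, the following is a minimal sketch of how they can be computed on labelled string pairs. It is not the authors' implementation: the choice of character bigrams as the sets compared by the Jaccard index, the helper names, and the example pairs are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bigrams(s: str) -> Set[str]:
    """Set of character bigrams of s (an assumed tokenization)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity index between the bigram sets of a and b."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(x & y) / len(x | y)


if __name__ == "__main__":
    # Hypothetical labelled pairs: (string, string, same syntactic pattern?)
    pairs: List[Tuple[str, str, bool]] = [
        ("2016-09-17", "1998-04-02", True),
        ("2016-09-17", "PPSN XIV", False),
    ]
    for s1, s2, same in pairs:
        print(s1, s2, same, levenshtein(s1, s2), round(jaccard(s1, s2), 3))
```

Both measures compare the characters themselves rather than the syntactic role they play, which is why two strings that follow the same pattern (for example, two different dates) are not guaranteed to score as similar. This is the gap a similarity function learned from labelled pairs is meant to address.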
Acknowledgements
We are grateful to Michele Furlanetto, who contributed to the implementation of our proposed method.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Bartoli, A., De Lorenzo, A., Medvet, E., Tarlao, F. (2016). Syntactical Similarity Learning by Means of Grammatical Evolution. In: Handl, J., Hart, E., Lewis, P., López-Ibáñez, M., Ochoa, G., Paechter, B. (eds) Parallel Problem Solving from Nature – PPSN XIV. PPSN 2016. Lecture Notes in Computer Science, vol 9921. Springer, Cham. https://doi.org/10.1007/978-3-319-45823-6_24
DOI: https://doi.org/10.1007/978-3-319-45823-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45822-9
Online ISBN: 978-3-319-45823-6
eBook Packages: Computer Science, Computer Science (R0)