Abstract
Automatic indexing of broadcast news is a developing research area of great recent interest [1]. This paper describes the development steps for designing an automatic indexing system for broadcast news in both Basque and Spanish. This application requires appropriate language resources to design all the components of the system. Nowadays, large and well-defined resources can be found for the most widely used languages, but much work remains to be done for minority languages. Although Spanish has many more resources than Basque, this work pursues parallel efforts for both languages. These two languages were chosen because they are co-official in the Basque Autonomous Community and are used in many of the Community's mass media, including the Basque Public Radio and Television, EITB [2].
References
Vandecatseye, A., Martens, J.P., Neto, J., Meinedo, H., Garcia-Mateo, C., Dieguez, F.J., Mihelic, F., Zibert, J., Nouza, J., David, P., Pleva, M., Cizmar, A., Papageorgiou, H., Alexandris, C.: The COST278 pan-European Broadcast News Database. In: Proceedings of LREC 2004, Lisbon, Portugal (2004)
EITB Basque Public Radio and Television, http://www.eitb.com/
Euskaltzaindia, http://www.euskaltzaindia.net/
Alegria, I., Artola, X., Sarasola, K., Urkia, M.: Automatic morphological analysis of Basque. In: Literary & Linguistic Computing, vol. 11(4), pp. 193–203. Oxford Univ. Press, Oxford (1996)
Peñagarikano, M., Bordel, G., Varona, A., de Ipina, L.: Using non-word Lexical Units in Automatic Speech Understanding. In: Proceedings of IEEE, ICASSP 1999, Phoenix, Arizona (1999)
Lopez de Ipiña, K., Graña, M., Ezeiza, N., Hernández, M., Zulueta, E., Ezeiza, A., Tovar, C.: Selection of Lexical Units for Continuous Speech Recognition of Basque. Progress in Pattern Recognition, Speech and Image Analysis, 244–250 (2003)
Lopez de Ipina, K., Ezeiza, N., Bordel, N., Graña, M.: Automatic Morphological Segmentation for Speech Processing in Basque. In: IEEE TTS Workshop, Santa Monica, USA (2002)
Egunkaria, Euskaldunon Egunkaria, the only newspaper in Basque, which has been recently replaced by Berria, online, at http://www.berria.info/
GARA, local Basque Country newspaper in Spanish, online, at http://www.gara.net/
Barras, C., Geoffrois, E., Wu, Z., Liberman, M.: Transcriber: a Free Tool for Segmenting, Labeling and Transcribing Speech. In: First International Conference on Language Resources and Evaluation, LREC 1998 (1998)
Linguistic Data Consortium, Design Specifications for the Transcription of Spoken Language, available online, at http://www.ldc.upenn.edu/Projects/Corpus_Cookbook
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bordel, G., Ezeiza, A., de Ipina, K.L., López, J.M., Peñagarikano, M., Zulueta, E. (2005). Language Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish. In: Sanfeliu, A., Cortés, M.L. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2005. Lecture Notes in Computer Science, vol 3773. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11578079_107
DOI: https://doi.org/10.1007/11578079_107
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29850-2
Online ISBN: 978-3-540-32242-9
eBook Packages: Computer Science (R0)