DOI: 10.3115/1118637.1118645

A morphological, syntactic and semantic search engine for Hebrew texts

Published: 11 July 2002

Abstract

This article describes the construction of a morphological, syntactic and semantic analyzer that drives a high-grade search engine for Hebrew texts. A good search engine must be complete and accurate. In Hebrew and Arabic script, most vowels are not written, many particles are attached to the word without a space, a double consonant is written with a single letter, and some letters signify both vowels and consonants. As a result, almost every string of characters may designate several words (in Hebrew, almost three on average), and deciphering a word requires reading the whole sentence. Our model is Fillmore's framework, in which an expression is centered on a verb. The engine eliminates readings of words that do not suit the syntax or the semantic structure of the sentence. Every verbal entry in our conceptual dictionary includes the features of the noun phrases (NPs) that the verb requires. Once the correct readings of all the character strings in the sentence have been identified, the program selects the proper occurrences of the searched word in the text. Approximately 95% of the results returned by our search engine match the query.
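
The elimination step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy lexicon, the transliterated strings, the feature labels, the single verb frame and the function names (`consistent`, `disambiguate`, `find_occurrences`) are all hypothetical, chosen only to show how a verb-centered frame can rule out readings of an ambiguous string before the search is run.

```python
from itertools import product
from typing import NamedTuple


class Reading(NamedTuple):
    lemma: str    # vocalized dictionary form (illustrative transliteration)
    pos: str      # "noun" or "verb"
    feature: str  # one semantic feature for nouns, empty for verbs


# Hypothetical toy lexicon: each unvocalized string maps to all its readings.
LEXICON = {
    "hyld": [Reading("yeled", "noun", "human")],
    "spr": [
        Reading("sefer", "noun", "object"),   # "book"
        Reading("sappar", "noun", "human"),   # "barber"
        Reading("sipper", "verb", ""),        # "told"
    ],
    "sypwr": [Reading("sippur", "noun", "abstract")],
}

# Hypothetical verb frames: features required of the NPs, in sentence order.
FRAMES = {"sipper": ("human", "abstract")}


def consistent(analysis):
    """Accept an analysis only if it has one verb whose frame the nouns fill."""
    verbs = [r for r in analysis if r.pos == "verb"]
    nouns = [r for r in analysis if r.pos == "noun"]
    if len(verbs) != 1:
        return False
    frame = FRAMES.get(verbs[0].lemma)
    if frame is None:
        return False
    return tuple(n.feature for n in nouns) == frame


def disambiguate(tokens):
    """Enumerate every combination of readings and keep the consistent ones."""
    candidates = product(*(LEXICON[t] for t in tokens))
    return [a for a in candidates if consistent(a)]


def find_occurrences(lemma, tokens):
    """Return token positions where the searched lemma survives disambiguation."""
    positions = set()
    for analysis in disambiguate(tokens):
        for i, reading in enumerate(analysis):
            if reading.lemma == lemma:
                positions.add(i)
    return sorted(positions)


sentence = ["hyld", "spr", "sypwr"]
print(find_occurrences("sipper", sentence))  # [1]
print(find_occurrences("sefer", sentence))   # []
```

In this toy sentence the string `spr` has three readings, but only the verbal one leaves a sentence whose noun phrases satisfy a verb frame, so a search for the noun "sefer" returns no hit at that position. A real system would need a full morphological analyzer and a far richer frame dictionary than this exhaustive enumeration suggests.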

References

[1] Bentor, E., A. Angel, D. Ben-Ari-Segev and A. Lavie. 1992. Computerized Analysis of Hebrew Words. In Hebrew Computational Linguistics. Edited by U. Ornan, G. Arieli and E. Doron. Ministry of Science and Technology. Pages 36-38. (Hebrew).
[2] Carmel, David and Yoelle Maarek. 1999. Morphological Disambiguation for Hebrew. In Proceedings of the 4th International Workshop NGIT-99. Lecture Notes in Computer Science 1649. Springer Verlag. Pages 312-325.
[3] Chomsky, N. 1965. Aspects of the Theory of Syntax. MIT Press.
[4] Chomsky, N. Lectures on Government and Binding. Foris Pub.
[5] Choueka, Y. and Serge Lusignan. 1985. Disambiguation by Short Contexts. Computers and the Humanities 19:147-157.
[6] Choueka, Yaacov. 1990. Responsa: An Operational Full-Text Retrieval System. In Computers in Literary and Linguistic Research. Edited by J. Hamesse and A. Zampolli. Champion-Slatkine, Paris-Genève. Pages 94-102.
[7] Even-Shoshan, Avraham. 1987. The New Dictionary. (Hebrew).
[8] Fillmore, C. J. 1968. The Case for Case. In Universals in Linguistic Theory. Edited by E. Bach and R. Harms. Holt, Rinehart and Winston, New York, Academic Press. Pages 59-81.
[9] Herz, Y. and M. Rimon. 1992. Diminishing Ambiguity by Short-Context Automaton. In Hebrew Computational Linguistics. Edited by U. Ornan, G. Arieli and E. Doron. Ministry of Science and Technology. Pages 74-87. (Hebrew).
[10] Ide, Nancy and Jean Véronis. 1998. Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art. Computational Linguistics 24:1-40.
[11] ISO 259-3. 1999. Conversion of Hebrew Characters Into Latin Characters. Part 3: Phonemic Conversion. ISO/TC46/SC2.
[12] Levinger, Moshe. 1992. Morphological Disambiguation in Hebrew. Research Thesis for MSc in Computer Science. Technion, Haifa. (Hebrew).
[13] Levinger, Moshe, Uzzi Ornan and Itai Alon. 1995. Learning Morpho-Lexical Probabilities from an Untagged Corpus with an Application to Hebrew. Computational Linguistics 21:383-404.
[14] Miller, George A. 1993. Nouns in WordNet. Web file.
[15] Nirenburg, Sergei and Y. Ben Asher. 1984. HUHU: Hebrew University Hebrew Understander. Journal of Computational Linguistics 9:161-182.
[16] Ornan, Uzzi. 1987. Hebrew Text Processing Based on Unambiguous Script. Mishpatim 17:15-24. (Hebrew).
[17] Ornan, Uzzi. 1991. Theoretical Gemination in Israeli Hebrew. In Semitic Studies in Honor of Wolf Leslau. Edited by Alan S. Kaye. Otto Harrassowitz, Wiesbaden. Pages 1158-1168.
[18] Ornan, Uzzi and Michael Katz. 1994. A New Program for Hebrew Index Based on the Phonemic Script. TR #LCL 94-7 (revised). Technion - I.I.T.
[19] Ostler, Nicholas. 1995. Perception Vocabulary in Five Languages - Towards an Analysis Using Frame Elements. In Machine Translation and the Lexicon. Edited by Petra Steffens. Springer Verlag. Pages 219-23.
[20] Segal, Erel. 1999. Hebrew Morphological Analyzer for Hebrew undotted Analysis. Thesis for MSc in Computer Science. Technion, Haifa. (Hebrew).
[21] Somers, H. L. 1987. Valency and Case in Computational Linguistics. Edinburgh University Press.
[22] Stern, Naftali. 1994. The Verb Dictionary. Bar-Ilan University. (Hebrew).
[23] Wintner, Shuly and Uzzi Ornan. 1995. Syntactic Analysis of Hebrew Sentence. Natural Language Engineering 1:261-288.
[24] Whorf, Benjamin Lee. 1956. The Relation of Habitual Thought and Behavior to Language. In Language, Culture and Personality: Essays in Memory of Edward Sapir. Edited by Leslie Spier. 1941. Pages 75-93. Reprinted in Language, Thought and Reality. Edited by John B. Carroll. M.I.T. Press. Pages 134-159.

Published In

SEMITIC '02: Proceedings of the ACL-02 Workshop on Computational Approaches to Semitic Languages
July 2002
85 pages

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 12 of 21 submissions, 57%
