research-article

Toward a semantic granularity model for domain-specific information retrieval

Authors:

Raymond Y.K. Lau,

Jian Ma

ACM Transactions on Information Systems (TOIS), Volume 29, Issue 3

Article No.: 15, Pages 1 - 46

https://doi.org/10.1145/1993036.1993039

Published: 22 July 2011

Abstract

Both similarity-based and popularity-based document ranking functions have been successfully applied to information retrieval (IR) in general. However, the dimension of semantic granularity should also be considered for effective retrieval. In this article, we propose a semantic granularity-based IR model that takes into account three dimensions, namely similarity, popularity, and semantic granularity, to improve domain-specific search. Semantic granularity refers to the level of semantic detail carried by an information item. In particular, a concept-based computational model is developed to estimate the semantic granularity of documents with reference to a domain ontology. The results of our benchmark experiments confirm that the proposed semantic granularity-based IR model performs significantly better than the similarity-based baseline in both a biomedical and an agricultural domain. In addition, a series of user-oriented studies reveals that the proposed document ranking functions resemble the implicit ranking functions exercised by humans. The perceived relevance of the documents delivered by the granularity-based IR system is significantly higher than that of the documents returned by a popular search engine for a number of domain-specific search tasks. To the best of our knowledge, this is the first study regarding the application of semantic granularity to enhance domain-specific IR.
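The abstract does not give the ranking formula, so the following is only a minimal sketch of how the three dimensions might be combined, not the authors' model. It assumes that granularity can be approximated by the mean depth of a document's concepts in a tree-shaped ontology, and that the final score is a weighted linear combination. The toy ontology, the `Doc` fields, the `target_granularity` parameter, and the weights are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: concept -> parent (None marks the root).
TOY_ONTOLOGY = {
    "disease": None,
    "cancer": "disease",
    "lung cancer": "cancer",
    "small cell lung cancer": "lung cancer",
}


def concept_depth(concept, ontology):
    """Count the edges from a concept up to the ontology root."""
    depth = 0
    parent = ontology[concept]
    while parent is not None:
        depth += 1
        parent = ontology[parent]
    return depth


def estimate_granularity(doc_concepts, ontology):
    """Assumed proxy: mean depth of known concepts, scaled to [0, 1]."""
    known = [c for c in doc_concepts if c in ontology]
    if not known:
        return 0.0
    max_depth = max(concept_depth(c, ontology) for c in ontology)
    if max_depth == 0:
        return 0.0
    mean_depth = sum(concept_depth(c, ontology) for c in known) / len(known)
    return mean_depth / max_depth


@dataclass
class Doc:
    doc_id: str
    similarity: float  # query-document similarity, assumed to lie in [0, 1]
    popularity: float  # popularity score, assumed to lie in [0, 1]
    concepts: list     # ontology concepts recognized in the document


def rank(docs, ontology, target_granularity, weights=(0.5, 0.2, 0.3)):
    """Sort documents by a hypothetical weighted sum of the three dimensions."""
    w_sim, w_pop, w_gran = weights

    def score(doc):
        granularity = estimate_granularity(doc.concepts, ontology)
        # Reward documents whose granularity is close to the requested level.
        granularity_match = 1.0 - abs(granularity - target_granularity)
        return (w_sim * doc.similarity
                + w_pop * doc.popularity
                + w_gran * granularity_match)

    return sorted(docs, key=score, reverse=True)


general = Doc("general", 0.8, 0.5, ["disease", "cancer"])
specific = Doc("specific", 0.8, 0.5, ["lung cancer", "small cell lung cancer"])
ranked = rank([general, specific], TOY_ONTOLOGY, target_granularity=1.0)
```

In this sketch the two documents tie on similarity and popularity, so the granularity term alone decides their order: a request for highly specific material (`target_granularity=1.0`) favors the document whose concepts sit deeper in the ontology.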

Index Terms

Toward a semantic granularity model for domain-specific information retrieval
1. Information systems
  1. Information retrieval
  2. Information storage systems
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction

Information & Contributors

Information

Published In

ACM Transactions on Information Systems Volume 29, Issue 3

July 2011

134 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/1993036

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 July 2011

Accepted: 01 March 2011

Revised: 01 September 2009

Received: 01 October 2008

Published in TOIS Volume 29, Issue 3


Qualifiers

Research-article
Research
Refereed


Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 51
Total Downloads: 1,789
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 1

Reflects downloads up to 10 Aug 2024

Citations

Cited By

Kejriwal M, Haidarian H, Chiu M, Xiang A, Shrestha D, Javed F, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). A Semantic Search Engine for Helping Patients Find Doctors and Locations in a Large Healthcare Organization. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2945-2949. DOI: 10.1145/3626772.3661349. Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3661349
Tamine L, Goeuriot L (2021). Semantic Information Retrieval on Medical Texts. ACM Computing Surveys 54:7, 1-38. DOI: 10.1145/3462476. Online publication date: 17-Sep-2021
https://dl.acm.org/doi/10.1145/3462476
Fan Z, Li G, Liu Y (2020). Processes and methods of information fusion for ranking products based on online reviews: An overview. Information Fusion. DOI: 10.1016/j.inffus.2020.02.007. Online publication date: Feb-2020
https://doi.org/10.1016/j.inffus.2020.02.007
Olteanu A, Castillo C, Diaz F, Kıcıman E (2019). Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Frontiers in Big Data 2. DOI: 10.3389/fdata.2019.00013. Online publication date: 11-Jul-2019
https://doi.org/10.3389/fdata.2019.00013
Palotti J, Zuccon G, Hanbury A (2019). Consumer Health Search on the Web: Study of Web Page Understandability and Its Integration in Ranking Algorithms. Journal of Medical Internet Research 21:1, e10986. DOI: 10.2196/10986. Online publication date: 30-Jan-2019
https://doi.org/10.2196/10986
Ding T, Yan G, Lei Y, Xu X (2019). A niching behaviour-based algorithm for multi-level manufacturing service composition optimal-selection. Journal of Ambient Intelligence and Humanized Computing. DOI: 10.1007/s12652-019-01250-0. Online publication date: 15-Apr-2019
https://doi.org/10.1007/s12652-019-01250-0
Ibrahim M (2019). An empirical comparison of random forest-based and other learning-to-rank algorithms. Pattern Analysis and Applications 23:3, 1133-1155. DOI: 10.1007/s10044-019-00856-6. Online publication date: 28-Oct-2019
https://doi.org/10.1007/s10044-019-00856-6
Bigley J (2018). Assembling Frameworks for Strategic Innovation Enactment: Enhancing Transformational Agility through Situational Scanning. Administrative Sciences 8:3, 37. DOI: 10.3390/admsci8030037. Online publication date: 25-Jul-2018
https://doi.org/10.3390/admsci8030037
Song Y, Sun Y, Hu Q, He L (2018). An Interaction Embedded Framework for Healthcare Search. 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD), 347-352. DOI: 10.1109/CSCWD.2018.8465267. Online publication date: May-2018
https://doi.org/10.1109/CSCWD.2018.8465267
Liu L, Liu L, Fu X, Huang Q, Zhang X, Zhang Y (2018). A cloud-based framework for large-scale traditional Chinese medical record retrieval. Journal of Biomedical Informatics 77, 21-33. DOI: 10.1016/j.jbi.2017.11.013. Online publication date: Jan-2018
https://doi.org/10.1016/j.jbi.2017.11.013
