Research article. DOI: 10.1145/2467696.2467736

IFME: information filtering by multiple examples with under-sampling in a digital library environment

Published: 22 July 2013

Abstract

As the number of digitized documents grows exponentially, it becomes harder for users to keep up to date with the knowledge in their domain. In this paper, we present IFME (Information Filtering by Multiple Examples), a framework for digital library environments that helps users identify literature related to their interests by leveraging Positive Unlabeled learning (PU learning). Given a few relevant documents provided by a user (the positive set, called P) and treating the documents in an online database as unlabeled data (called U), it ranks the documents in U using a PU learning algorithm. Our experiments showed that the approach performed well when a large set of relevant feedback documents was available but relatively poorly when such documents were few. We improved IFME by combining PU learning with under-sampling to tune the performance. Measured by Mean Average Precision (MAP), performance with under-sampling improved significantly even when the size of P was small. We believe the PU learning based IFME framework offers insights for developing more effective digital library systems.
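
The pipeline outlined in the abstract (rank U given a small P, under-sample U, evaluate with MAP) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, features, or sampling scheme, so a simple nearest-centroid scorer over bag-of-words vectors stands in for the PU learner, and the names `rank_unlabeled`, `ratio`, and `rounds` are hypothetical.

```python
import math
import random
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a lower-cased, whitespace-tokenised text."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Mean term-count vector of a non-empty list of Counters."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_unlabeled(P, U, ratio=2, rounds=10, seed=0):
    """Rank the documents in U given positive examples P (best first).

    ASSUMPTION: U is treated as noisy negatives, and in each round only
    about ratio * |P| documents are sampled from it (under-sampling), so
    the few positives are not swamped. Scores are averaged over rounds.
    Returns indices into U.
    """
    rng = random.Random(seed)
    p_vecs = [bow(doc) for doc in P]
    u_vecs = [bow(doc) for doc in U]
    pos = centroid(p_vecs)
    k = min(len(u_vecs), max(1, ratio * len(p_vecs)))
    scores = [0.0] * len(u_vecs)
    for _ in range(rounds):
        neg = centroid(rng.sample(u_vecs, k))  # under-sampled "negative" set
        for i, vec in enumerate(u_vecs):
            scores[i] += cosine(vec, pos) - cosine(vec, neg)
    return sorted(range(len(U)), key=lambda i: scores[i], reverse=True)


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranking against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    P = ["svm text classification", "text classification unlabeled"]
    U = [
        "positive unlabeled text classification",
        "plasma fusion reactor",
        "mutual funds redemption",
        "fusion plasma stability",
    ]
    ranking = rank_unlabeled(P, U)
    print(ranking, average_precision(ranking, {0}))
```

Averaging over several sampled subsets is one common way to reduce the variance that a single small sample would introduce; whether IFME does this is not stated in the abstract.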

      Published In

      JCDL '13: Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
      July 2013
      480 pages
      ISBN:9781450320771
      DOI:10.1145/2467696

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. information retrieval
      2. positive unlabeled learning
      3. relevance feedback
      4. search by multiple examples
      5. text classification

      Qualifiers

      • Research-article

      Conference

      JCDL '13: 13th ACM/IEEE-CS Joint Conference on Digital Libraries
      July 22 - 26, 2013
      Indianapolis, Indiana, USA

      Acceptance Rates

      JCDL '13 Paper Acceptance Rate 28 of 95 submissions, 29%;
      Overall Acceptance Rate 415 of 1,482 submissions, 28%

      Cited By

      • (2023) Multi-level Correlation Matching for Legal Text Similarity Modeling with Multiple Examples. Web Information Systems Engineering – WISE 2023, pages 621-632. DOI: 10.1007/978-981-99-7254-8_48. Online publication date: 21-Oct-2023
      • (2022) Predicting resistive wall mode stability in NSTX through balanced random forests and counterfactual explanations. Nuclear Fusion 62(3), 036002. DOI: 10.1088/1741-4326/ac44af. Online publication date: 18-Jan-2022
      • (2021) Homogenization Algorithm Based on Incremental L2-Discrepancy Filtering for Data-Driven Modelling. Artificial Intelligence in Industry 4.0, pages 73-83. DOI: 10.1007/978-3-030-61045-6_6. Online publication date: 28-Feb-2021
      • (2018) Data Exploration Using Example-Based Methods. Synthesis Lectures on Data Management 10(4), pages 1-164. DOI: 10.2200/S00881ED1V01Y201810DTM053. Online publication date: 27-Nov-2018
      • (2018) Leveraging Online Word of Mouth for Personalized App Recommendation. IEEE Transactions on Computational Social Systems 5(4), pages 1061-1070. DOI: 10.1109/TCSS.2018.2878866. Online publication date: Dec-2018
      • (2018) Subscription and Redemption Prediction in Mutual Funds Using Machine Learning Techniques. 2018 IEEE International Conference on Big Data (Big Data), pages 4692-4697. DOI: 10.1109/BigData.2018.8622149. Online publication date: Dec-2018
      • (2016) A Novel Classifier - Weighted Features Cost-Sensitive SVM. 2016 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pages 598-603. DOI: 10.1109/iThings-GreenCom-CPSCom-SmartData.2016.133. Online publication date: Dec-2016
      • (2014) Search by multiple examples. Proceedings of the 7th ACM international conference on Web search and data mining, pages 667-672. DOI: 10.1145/2556195.2556206. Online publication date: 24-Feb-2014
      • (2014) Learning to Rank with Only Positive Examples. Proceedings of the 2014 13th International Conference on Machine Learning and Applications, pages 87-92. DOI: 10.1109/ICMLA.2014.19. Online publication date: 3-Dec-2014
