DOI: 10.1145/319950.319964

Task-oriented world wide web retrieval by document type classification

Published: 01 November 1999

Abstract

This paper proposes a novel approach to accurately searching Web pages for information relevant to problem solving, in which the user specifies a Web document category in place of the task itself. Accessing information from World Wide Web pages has become a commonplace approach to problem solving. Such a search is difficult with current search services, however, because they provide only keyword-based search methods, which amount to narrowing down the target references by domain. Problem solving usually involves both a domain and a task. Accordingly, our approach is based on problem solving tasks. To specify a user's problem solving task, we introduce the concept of document types that relate directly to problem solving tasks; with this approach, users can easily designate their tasks. We implemented the PageTypeSearch system based on our approach. The PageTypeSearch classifier assigns Web pages to document types by comparing each page with the typical structural characteristics of each type. In experiments, we compare PageTypeSearch, using the document type indices, with a conventional keyword-based search system. The average precision of the document type-based search is 88.9%, while that of the keyword-based search is 31.2%. Moreover, the number of irrelevant references gathered by our system is about one-thirteenth that of traditional keyword-based search systems. By introducing the viewpoint of tasks, our approach achieves higher performance and has practical advantages for problem solving.
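
The abstract describes classification by comparing a page with typical structural characteristics of each document type, and reports results as precision. The following is a minimal sketch of that general idea, not the PageTypeSearch implementation: the type names (`faq`, `link_collection`, `catalog`), the chosen features, and the minimum counts in `TYPE_PROFILES` are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Collects simple structural statistics from an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.question_marks = 0

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1

    def handle_data(self, data):
        self.question_marks += data.count("?")


def structural_features(html):
    """Return a small dictionary of structural counts for one page."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {
        "links": parser.tags["a"],
        "list_items": parser.tags["li"],
        "table_rows": parser.tags["tr"],
        "questions": parser.question_marks,
    }


# Hypothetical profiles: each maps a feature name to a minimum count.
# These values are illustrative assumptions, not figures from the paper.
TYPE_PROFILES = {
    "faq": {"questions": 3},
    "link_collection": {"links": 5, "list_items": 5},
    "catalog": {"table_rows": 3},
}


def classify(html, profiles=TYPE_PROFILES, threshold=1.0):
    """Return every type whose profile the page satisfies.

    A type matches when the fraction of its feature minimums that the
    page meets is at least `threshold`.
    """
    feats = structural_features(html)
    matched = []
    for name, profile in profiles.items():
        hits = sum(feats[f] >= minimum for f, minimum in profile.items())
        if hits / len(profile) >= threshold:
            matched.append(name)
    return matched


def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)
```

With these assumed profiles, a page containing three question sentences is labelled `faq`, and `precision(["a", "b", "c", "d"], ["a", "b", "c"])` evaluates to 0.75. A threshold-based rule is used here only because it is easy to read; the abstract does not say how the comparison with structural characteristics is scored.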

Published In

CIKM '99: Proceedings of the eighth international conference on Information and knowledge management
November 1999
564 pages
ISBN:1581131461
DOI:10.1145/319950
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. WWW
  2. classification
  3. document type
  4. information retrieval

Qualifiers

  • Article

Conference

CIKM99: Conference on Information and Knowledge Management
November 2 - 6, 1999
Kansas City, Missouri, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 44
  • Downloads (Last 6 weeks): 12
Reflects downloads up to 16 Oct 2024

Cited By

  • (2021) Large Scale Subject Category Classification of Scholarly Papers With Deep Attentive Neural Networks. Frontiers in Research Metrics and Analytics 5. DOI: 10.3389/frma.2020.600382. Online publication date: 10-Feb-2021
  • (2012) A path-based approach for web page retrieval. World Wide Web 15:3 (257-283). DOI: 10.1007/s11280-011-0133-5. Online publication date: 1-May-2012
  • (2009) Looking Ahead. Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part I (378-391). DOI: 10.1007/978-3-642-03655-2_42. Online publication date: 24-Aug-2009
  • (2009) PathRank. Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval (350-361). DOI: 10.1007/978-3-642-00958-7_32. Online publication date: 18-Apr-2009
  • (2009) Monitoring Web Resources Discovery by Reusing Classification Knowledge. Social Computing and Behavioral Modeling (1-8). DOI: 10.1007/978-1-4419-0056-2_17. Online publication date: 23-Feb-2009
  • (2008) Web page genre classification. Proceedings of the 2008 ACM symposium on Applied computing (2353-2357). DOI: 10.1145/1363686.1364247. Online publication date: 16-Mar-2008
  • (2007) Searching documents based on relevance and type. Proceedings of the 29th European conference on IR research (629-636). DOI: 10.5555/1763653.1763729. Online publication date: 2-Apr-2007
  • (2007) Searching Documents Based on Relevance and Type. Advances in Information Retrieval (629-636). DOI: 10.1007/978-3-540-71496-5_60. Online publication date: 2007
  • (2006) Categorizing web search results into meaningful and stable categories using fast-feature techniques. Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries (210-219). DOI: 10.1145/1141753.1141801. Online publication date: 11-Jun-2006
  • (2005) Many a little makes a mickle - Enabling mailing list archive for corporate knowledge sharing. Proceedings of the 2005 International Conference on Active Media Technology, 2005. (AMT 2005) (57-62). DOI: 10.1109/AMT.2005.1505268. Online publication date: 2005