DOI: 10.1145/1247480.1247601
Article

AllInOneNews: development and evaluation of a large-scale news metasearch engine

Published: 11 June 2007

Abstract

AllInOneNews is the largest news metasearch engine in the world, connecting to over 1,000 news sites in over 150 countries. Implementing a large-scale metasearch engine such as AllInOneNews requires overcoming unique challenges that do not arise when building small metasearch engines, such as developing highly scalable search engine selection techniques. In this paper, we discuss these challenges and our solutions to them. We also discuss some novel features of AllInOneNews, such as a highly automated solution and semantic query matching. This paper also reports the results of a comparative evaluation of three commercial news search systems: one search engine (Google News) and two metasearch engines (Mamma News and AllInOneNews). The comparison uses several measures, including effectiveness, diversity, and time-sensitivity. Another contribution of this paper is a novel scheme for comparing multiple news search systems with a combined measure that takes both the relevance and the time-sensitivity of retrieved information into consideration.
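The abstract names, but does not define, a combined measure of relevance and time-sensitivity. As a purely hypothetical illustration of the general idea, and not the scheme proposed in the paper, the Python sketch below scores each system's top-k results with a precision@k in which every relevant result is discounted by its age under an assumed exponential half-life. The names `Result`, `freshness_weight`, `time_aware_precision_at_k`, `compare_systems`, and the 24-hour default half-life are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class Result:
    """One retrieved news item (hypothetical representation)."""
    relevant: bool     # assessor's relevance judgment for the query
    age_hours: float   # hours between publication and query time


def freshness_weight(age_hours: float, half_life_hours: float) -> float:
    """Assumed exponential decay: the weight halves every half_life_hours."""
    if age_hours < 0 or half_life_hours <= 0:
        raise ValueError("age must be >= 0 and half-life must be > 0")
    return 0.5 ** (age_hours / half_life_hours)


def time_aware_precision_at_k(
    results: Sequence[Result], k: int, half_life_hours: float = 24.0
) -> float:
    """Precision@k in which each relevant result is discounted by its age.

    Non-relevant results contribute 0. Dividing by k (not by the number of
    results returned) penalizes systems that return fewer than k items and
    keeps the score in [0, 1].
    """
    if k <= 0:
        raise ValueError("k must be positive")
    gain = sum(
        freshness_weight(r.age_hours, half_life_hours)
        for r in results[:k]
        if r.relevant
    )
    return gain / k


def compare_systems(
    runs: Mapping[str, Sequence[Result]], k: int = 10, half_life_hours: float = 24.0
) -> List[Tuple[str, float]]:
    """Score every system on the same query and sort best-first."""
    scores = {
        name: time_aware_precision_at_k(results, k, half_life_hours)
        for name, results in runs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up judgments for one query, purely to exercise the functions.
    runs = {
        "system_a": [Result(True, 0.0), Result(True, 24.0), Result(False, 1.0)],
        "system_b": [Result(True, 48.0), Result(False, 2.0), Result(True, 72.0)],
    }
    print(compare_systems(runs, k=3))  # [('system_a', 0.5), ('system_b', 0.125)]
```

The multiplicative decay gives a relevant but stale story partial credit, while a non-relevant result earns nothing however fresh it is; the half-life controls how strongly recency is rewarded. A real evaluation would average such per-query scores over a query set, and the abstract does not say whether the paper's scheme uses any decay of this kind.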

Published In

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007
1210 pages
ISBN:9781595936868
DOI:10.1145/1247480
  • General Chairs: Lizhu Zhou, Tok Wang Ling
  • Program Chair: Beng Chin Ooi

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. metasearch engine
  2. news search
  3. search engine
  4. time-sensitive ranking

Conference

SIGMOD/PODS07

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
