
A data-centric approach to feed search in blogs

Published: 01 August 2012

Abstract

The explosive growth of blogs creates a critical demand for information retrieval techniques that can effectively search for relevant and meaningful information. This paper studies the blog distillation and feed search task of the TREC Blog Track, which is designed to find feeds that have a principal and recurring interest in a particular topic or query. A novel data-centric approach is proposed, which achieved good results compared with other approaches in the TREC Blog Track.


Published In

International Journal of Web Engineering and Technology, Volume 7, Issue 3
August 2012
96 pages
ISSN: 1476-1289
EISSN: 1741-9212

Publisher

Inderscience Publishers
Geneva 15, Switzerland
