
The Impacts of Structural Difference and Temporality of Tweets on Retrieval Effectiveness

Published: 01 November 2013

    Abstract

    To explore information-seeking behaviors in the microblogosphere, the microblog track at TREC 2011 introduced a real-time ad-hoc retrieval task that aims at ranking relevant tweets in reverse-chronological order. We study this problem via a two-phase approach: 1) retrieving tweets in an ad-hoc way; 2) using the temporal information of tweets to enhance retrieval effectiveness. Tweets can be categorized into two types. One type consists of short messages not containing any URL of a Web page. The other type has at least one URL of a Web page in addition to a short message. These two types of tweets have different structures. In the first phase, to address the structural difference of tweets, we propose a method to rank tweets using the divide-and-conquer strategy. Specifically, we first rank the two types of tweets separately. This produces two rankings, one for each type. Then we merge these two rankings of tweets into one ranking. In the second phase, we first categorize queries into several types by exploring the temporal distributions of their top-retrieved tweets from the first phase; then we calculate the time-related relevance scores of tweets according to the classified types of queries; finally we combine the time scores with the IR scores from the first phase to produce a ranking of tweets. Experimental results achieved by using the TREC 2011 and TREC 2012 queries over the TREC Tweets2011 collection show that: (i) our way of ranking the two types of tweets separately and then merging them together yields better retrieval effectiveness than ranking them simultaneously; (ii) our way of incorporating temporal information into the retrieval process yields further improvements; and (iii) our method compares favorably with state-of-the-art methods in retrieval effectiveness.
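    The two phases described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how the two rankings are merged or how time scores are computed, so the sketch assumes min-max score normalization for the merge and a single exponential recency decay in place of the query-type-dependent time scores. All names (`Tweet`, `rank_and_merge`, `rerank_with_time`, `weight`, `half_life`) are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: a tweet counts as "URL type" if its text contains an http(s) link.
URL_PATTERN = re.compile(r"https?://\S+")


@dataclass
class Tweet:
    tweet_id: str
    text: str
    timestamp: float       # seconds since some epoch
    ir_score: float = 0.0  # filled in by phase 1


def has_url(tweet):
    """Return True if the tweet text contains at least one URL."""
    return URL_PATTERN.search(tweet.text) is not None


def min_max(scores):
    """Scale scores to [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def rank_and_merge(tweets, score_plain, score_url):
    """Phase 1: score each tweet type with its own scorer, then merge.

    Merging by min-max normalized score is a stand-in chosen for this
    sketch; the abstract does not describe the actual merging method.
    """
    plain = [t for t in tweets if not has_url(t)]
    with_url = [t for t in tweets if has_url(t)]
    merged = []
    for group, scorer in ((plain, score_plain), (with_url, score_url)):
        if not group:
            continue
        raw = [scorer(t) for t in group]
        for tweet, score in zip(group, min_max(raw)):
            tweet.ir_score = score
            merged.append(tweet)
    return sorted(merged, key=lambda t: t.ir_score, reverse=True)


def recency_score(tweet, query_time, half_life):
    """Assumed time score: halves for every `half_life` seconds of age."""
    age = max(0.0, query_time - tweet.timestamp)
    return 0.5 ** (age / half_life)


def rerank_with_time(ranked, query_time, weight=0.3, half_life=86400.0):
    """Phase 2: linearly combine the IR score with the time score."""
    def combined(tweet):
        time_score = recency_score(tweet, query_time, half_life)
        return (1 - weight) * tweet.ir_score + weight * time_score
    return sorted(ranked, key=combined, reverse=True)
```

    In this sketch the two scorers are passed in as functions, so a scorer for URL tweets could also draw on the linked page's content while the plain-text scorer sees only the message. A fuller version would first classify the query by the temporal distribution of its top-retrieved tweets and choose the time score accordingly.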


    Cited By

    • (2019) A Deep Learning-based Ranking Approach for Microblog Retrieval. Procedia Computer Science 159, 352-362. DOI: 10.1016/j.procs.2019.09.190. Online publication date: 2019
    • (2017) Microblog Retrieval Using Ensemble of Feature Sets through Supervised Feature Selection. IEICE Transactions on Information and Systems E100.D:4, 793-806. DOI: 10.1587/transinf.2016DAP0032. Online publication date: 2017
    • (2015) Combining temporal and content aware features for microblog retrieval. 2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 1-6. DOI: 10.1109/ICAICTA.2015.7335353. Online publication date: Aug-2015


        Reviews

        Xiannong Meng

        A novel method of evaluating tweets is introduced in this paper. Most retrieval algorithms do not differentiate the structure of tweets. The authors show convincingly that it does have an impact on retrieval effectiveness. In their study, two types of tweets are evaluated separately, those containing only plain text, and those containing any URLs. The key is that in ranking the tweets with URLs, the ranker considers the content of the page(s) pointed to by the URLs in addition to the tweets. After ranking the two types of tweets, a support vector machine-based classifier with 18 features is used to evaluate the relevance between the tweets and the query. If a tweet is time-sensitive, the temporal information of both the tweet and its parent is taken into consideration. Data from TREC 2011, TREC 2012, and TREC Tweets 2011 are used to evaluate the algorithm. The results indicate that (1) it is more effective to rank the two types of tweets separately and then merge them; (2) incorporating temporal information yields further improvements; and (3) the proposed "method compares favorably with state-of-the-art methods in retrieval effectiveness." The novelty of the proposed method is that it offers the ability to rank the tweets with and without URLs separately and to incorporate the temporal information in ranking the tweets. The paper is well written and self-contained. Many examples illustrate the concepts discussed. The readers can explore the topic further using the abundant references provided.

        Online Computing Reviews Service


        Published In

        ACM Transactions on Information Systems  Volume 31, Issue 4
        November 2013
        192 pages
        ISSN:1046-8188
        EISSN:1558-2868
        DOI:10.1145/2536736
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 01 November 2013
        Accepted: 01 July 2013
        Revised: 01 April 2013
        Received: 01 September 2012
        Published in TOIS Volume 31, Issue 4


        Author Tags

        1. Ad-hoc retrieval of tweets
        2. learning to rank
        3. query temporal categorization

        Qualifiers

        • Research-article
        • Research
        • Refereed


        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months): 4
        • Downloads (Last 6 weeks): 0
        Reflects downloads up to 26 Jul 2024
