The Impacts of Structural Difference and Temporality of Tweets on Retrieval Effectiveness

Authors:

Weiyi Meng

ACM Transactions on Information Systems (TOIS), Volume 31, Issue 4

Article No.: 21, Pages 1 - 38

https://doi.org/10.1145/2500751

Published: 01 November 2013

Abstract

To explore information-seeking behaviors in the microblogosphere, the microblog track at TREC 2011 introduced a real-time ad-hoc retrieval task that aims at ranking relevant tweets in reverse-chronological order. We study this problem via a two-phase approach: (1) retrieving tweets in an ad-hoc way; and (2) using the temporal information of tweets to enhance retrieval effectiveness. Tweets can be categorized into two types. One type consists of short messages that contain no URL of a Web page. The other type has at least one URL of a Web page in addition to a short message. These two types of tweets have different structures. In the first phase, to address this structural difference, we propose a method that ranks tweets using a divide-and-conquer strategy. Specifically, we first rank the two types of tweets separately, which produces two rankings, one for each type. We then merge these two rankings into one. In the second phase, we first categorize queries into several types by exploring the temporal distributions of their top-retrieved tweets from the first phase; we then calculate time-related relevance scores of tweets according to the classified query types; finally, we combine the time scores with the IR scores from the first phase to produce a ranking of tweets. Experimental results obtained using the TREC 2011 and TREC 2012 queries over the TREC Tweets2011 collection show that: (i) ranking the two types of tweets separately and then merging them yields better retrieval effectiveness than ranking them simultaneously; (ii) incorporating temporal information into the retrieval process yields further improvements; and (iii) our method compares favorably with state-of-the-art methods in retrieval effectiveness.
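
The two-phase pipeline outlined in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two rankings are merged or how time scores are computed, so min-max normalization, the linear interpolation, and every name here (`Tweet`, `rank_and_merge`, `combine_with_time`, `weight`, and the scorer callables) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Tweet:
    tweet_id: str
    text: str
    has_url: bool      # True if the tweet contains at least one URL
    timestamp: float   # posting time, e.g. seconds since epoch


Scorer = Callable[[Tweet], float]
Ranking = List[Tuple[Tweet, float]]


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all 1.0."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def rank_and_merge(tweets: List[Tweet],
                   score_plain: Scorer,
                   score_url: Scorer) -> Ranking:
    """Phase 1 (assumed form): score each tweet type with its own
    scorer, normalize within each type, then merge by normalized score."""
    plain = [t for t in tweets if not t.has_url]
    with_url = [t for t in tweets if t.has_url]
    merged: Ranking = []
    for group, scorer in ((plain, score_plain), (with_url, score_url)):
        normed = min_max([scorer(t) for t in group])
        merged.extend(zip(group, normed))
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged


def combine_with_time(ranked: Ranking,
                      time_score: Scorer,
                      weight: float = 0.3) -> Ranking:
    """Phase 2 (assumed form): linearly interpolate the phase-1 IR score
    with a time-related score, then re-rank."""
    rescored = [(t, (1 - weight) * ir + weight * time_score(t))
                for t, ir in ranked]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored
```

In a fuller version, `time_score` would presumably depend on the query type inferred from the temporal distribution of the top-retrieved tweets, as the abstract describes; here it is left as a caller-supplied function.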

Cited By

Ibtihel B., Lobna H., and Lotfi B. 2019. A Deep Learning-based Ranking Approach for Microblog Retrieval. Procedia Computer Science 159, 352--362. https://doi.org/10.1016/j.procs.2019.09.190

Chy A., Ullah M., and Aono M. 2017. Microblog Retrieval Using Ensemble of Feature Sets through Supervised Feature Selection. IEICE Transactions on Information and Systems E100.D, 4, 793--806. https://doi.org/10.1587/transinf.2016DAP0032

Chy A., Ullah M., and Aono M. 2015. Combining temporal and content aware features for microblog retrieval. In 2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA). 1--6. Online publication date: Aug-2015. https://doi.org/10.1109/ICAICTA.2015.7335353

Index Terms

The Impacts of Structural Difference and Temporality of Tweets on Retrieval Effectiveness
1. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications

Reviews

Reviewer: Xiannong Meng

A novel method of evaluating tweets is introduced in this paper. Most retrieval algorithms do not differentiate the structure of tweets. The authors show convincingly that structure does have an impact on retrieval effectiveness. In their study, two types of tweets are evaluated separately: those containing only plain text, and those containing at least one URL. The key is that in ranking the tweets with URLs, the ranker considers the content of the page(s) pointed to by the URLs in addition to the tweets. After ranking the two types of tweets, a support vector machine-based classifier with 18 features is used to evaluate the relevance between the tweets and the query. If a tweet is time-sensitive, the temporal information of both the tweet and its parent is taken into consideration. Queries from TREC 2011 and TREC 2012, run over the TREC Tweets2011 collection, are used to evaluate the algorithm. The results indicate that (1) it is more effective to rank the two types of tweets separately and then merge them; (2) incorporating temporal information yields further improvements; and (3) the proposed "method compares favorably with state-of-the-art methods in retrieval effectiveness." The novelty of the proposed method is that it can rank tweets with and without URLs separately and incorporate temporal information into the ranking. The paper is well written and self-contained. Many examples illustrate the concepts discussed. Readers can explore the topic further using the abundant references provided.

Online Computing Reviews Service

Information & Contributors

Information

Published In

ACM Transactions on Information Systems Volume 31, Issue 4

November 2013

192 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/2536736

Editor:
Jamie Callan
Carnegie Mellon University, USA

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2013

Accepted: 01 July 2013

Revised: 01 April 2013

Received: 01 September 2012

Published in TOIS Volume 31, Issue 4

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 3
Total Downloads: 327
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 26 Jul 2024
