Using Social Conversational Context For Detecting Users Interactions on Microblogging Sites

Rami BELKAROUI∗, Rim FAIZ∗∗, Aymen ELKHLIFI∗∗∗

∗ LARODEC, ISG Tunis, University of Tunis, Tunisia
rami.belkaroui@gmail.com
∗∗ LARODEC, IHEC, University of Carthage, Tunisia
rim.faiz@ihec.rnu.tn
∗∗∗ LALIC, Paris Sorbonne University, France
aymen.elkhlifi@paris4.sorbonne.fr

Abstract. In the current era, microblogging services like Twitter give people the ability to communicate, interact, collaborate with each other, reply to messages from others and create conversations. These services can be seen as very large information repositories containing millions of text messages, usually organized into complex networks of users interacting with each other at specific times. Several works have proposed tweet search tools that focus only on retrieving relevant tweets. As a result, users are unable to explore the results or retrieve more relevant tweets based on the content, and may get lost or become frustrated by information overload. In this paper, we propose a new method to retrieve conversations on microblogging sites, particularly Twitter. It is based on content analysis and content enrichment. The goal of our method is to present more informative results than a conventional search engine. The proposed method has been implemented and evaluated by comparing it to the Google and Twitter search engines, and we obtained very promising results.

1 Introduction

In recent years, people have become more communicative through the expansion of services and multi-platform applications such as blogs, forums and social networks, which establish social and collaborative environments. This behavior leads to the accumulation of an enormous amount of information. Among these platforms are the so-called microblogs. Microblogging services (Boyd et al., 2010) give people the ability to communicate, interact, collaborate with each other, reply to messages from others and create conversations.
While communicating, people share different kinds of information, such as common knowledge, opinions, emotions, information resources and their likes or dislikes. The analysis of these communications can be useful for commercial applications such as trend monitoring, reputation management and news broadcasting. In addition, one of the main characteristics of microblogging services is that users are not limited to producing content; they can get involved indirectly in conversations with other users by liking and sharing users' posts.

Several works have proposed tweet search tools that focus only on retrieving relevant tweets. As a result, users are unable to explore the results or retrieve more relevant tweets based on the content, and may get lost or become frustrated by information overload. In addition, finding good results for a given subject requires considering the entire context, and this context can be derived from user interactions.

This paper proposes a conversation retrieval method that can be used to extract conversations from Twitter. Compared with current methods, the proposed method extracts not only direct reply tweets, but also relevant tweets, which might be retweets, comments or other possible interactions. The method thus extracts extensive posts beyond the conventional conversation.

The rest of the document is organized as follows. We begin by presenting related work on conversation retrieval on microblogging sites. In Section 3, we propose our method for extracting users' content interactions. The experiments and evaluation results are detailed in Section 4. The final section presents a summary of our work and future directions.

2 Related Work

Conversation retrieval is a new search paradigm for microblogging sites. It results from the intersection of Information Retrieval and Social Network Analysis (SNA).
Most microblogging services provide a way to retrieve relevant information (Jabeur et al., 2012; Cherichi and Faiz, 2013), but lack the ability to return all the tweets of a discussion. Few previous studies have dealt specifically with conversation detection. In addition, existing conversation retrieval approaches for microblogging sites (Cogan et al., 2012; Magnani et al., 2010, 2011, 2012) have so far focused on the particular case of a conversation formed by directly replying tweets.

Magnani et al. (2011) proposed a user-based tree model for retrieving conversations from microblogs. They considered only tweets that directly respond to other tweets through the use of the @ sign as a marker of addressivity. The downside is that this method does not consider tweets that do not contain the @ sign. Similarly, Cogan et al. (2012) proposed a method to build conversation graphs formed by users replying to tweets. In this case, a tweet can only directly reply to another tweet. However, users can also get involved indirectly in conversation communities by commenting on, liking and sharing users' posts, and through other possible interactions.

Kumar et al. (2010) concentrated on different aspects of microblogging conversations. They proposed a simple model that produces basic conversation structures, taking into account the identity of each conversation member. Other related work (Huang et al., 2010) focuses on further aspects of microblogging conversations, namely conversation tagging and topic identification.

3 Twitter Conversation Detection Method: TCOND

We propose a method that combines a set of conversational features with the directly exchanged text messages in order to extract extensive posts beyond the conventional conversation. We define a conversation as a set of short text messages on the same topic, each posted by a user at a specific timestamp.
These messages can be direct replies to other users, written with "@username", or indirect ones made by liking, retweeting, commenting and other possible interactions. In the next part, we present the two steps of our approach in more detail.

3.1 Step 1: Constructing Direct Conversation

In this step, we aim to collect all tweets that reply directly to other tweets. A reply to a user always begins with "@username". Our goal in this step is to create the reply tree. The reply tree construction process consists of two algorithms run in parallel: the Recursive Root Finder algorithm and the Iterative Search algorithm.

Algorithm 1 Recursive Root Finder (A: twitter)
  Let T be a tweet collected from Twitter (ID tweet)
  while (Ti != root) do
    Extract Ti−1 by matching field "in reply to status id"
  end while
  A : twitter = A : twitter – 1

Let T0 be the root (the first tweet published) of the conversation C, and let T be a single retrieved tweet of the conversation. Let Ti denote the type of tweet T. A tweet can have one of three types: root, reply or retweet. The goal of the Recursive Root Finder algorithm is to identify the conversation root T0 given T. Note that when the algorithm starts, T is not known. Once the conversation root T0 has been established, the Iterative Search algorithm is used to find the remainder of conversation C by searching for all tweets addressed to Ti, again by matching the field "in reply to status id". It is run repeatedly until conditions indicating that the conversation has ended are met.

3.2 Step 2: Relevant Indirect Tweets

We define new features that may help detect tweets indirectly related to the same conversation. The goal is to extract tweets that may be relevant to the conversation without relying on the @ symbol. We use the following notation in the sequel:
• ti is the set of tweets present in the direct conversation (tweets replying directly to other tweets).
• tj is a tweet that can be linked indirectly to the conversation.
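Looking back at Step 1 (Section 3.1), the root finder and the search for the remainder of the conversation can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: tweets are assumed to be held in a local dictionary keyed by tweet id, the reply link is assumed to be stored under the key `in_reply_to_status_id`, the root finder is written as a loop rather than recursively, and the search stops when no further replies exist in the local collection. The function names `find_root` and `collect_conversation` are hypothetical.

```python
from collections import defaultdict, deque


def find_root(tweet_id, tweets):
    """Walk the reply links upward from tweet_id until a tweet without a
    known parent is reached, and return that tweet's id (the root T0)."""
    current = tweet_id
    seen = {current}
    while True:
        parent = tweets[current].get("in_reply_to_status_id")
        # Stop at a tweet with no parent, a parent missing from the local
        # collection, or a cycle (assumed guard against malformed data).
        if parent is None or parent not in tweets or parent in seen:
            return current
        seen.add(parent)
        current = parent


def collect_conversation(root_id, tweets):
    """Starting from the root, repeatedly collect all tweets that reply to
    tweets already in the conversation (breadth-first)."""
    children = defaultdict(list)
    for tid, tweet in tweets.items():
        parent = tweet.get("in_reply_to_status_id")
        if parent is not None:
            children[parent].append(tid)

    conversation, queue, visited = [], deque([root_id]), {root_id}
    while queue:  # assumed end condition: no unvisited replies remain
        tid = queue.popleft()
        conversation.append(tid)
        for child in children[tid]:
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return conversation
```

Given any tweet of a conversation, `collect_conversation(find_root(tweet_id, tweets), tweets)` would return the ids of the whole reply tree, which plays the role of the set ti in the notation above.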
The features we used are:
• Using the same URL: By sharing a URL, an author enriches the information published in his tweet. This feature is applied to collect tweets that share the same URL.

$$P_1(t_i, t_j) = \begin{cases} 1 & \text{if } t_j \text{ contains the same URL as } t_i \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

• Hashtag Similarity: The # symbol, called a hashtag, is used to mark a topic in a tweet or to follow a conversation. Any user can categorize or follow topics with hashtags. We use this feature to collect tweets that share the same hashtags.

$$P_2(t_i, t_j) = \begin{cases} 1 & \text{if } t_j \text{ contains the same hashtag as } t_i \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

• Tweets Time Difference: The time difference is a highly important feature for detecting tweets linked indirectly to a conversation. We use the time attribute to efficiently remove tweets that are far away in time from the conversation root.
• Tweets Publication Dates: The date attribute is highly important for detecting conversations. Users tend to post tweets about a conversational topic within a short time period. The Euclidean distance is used to measure how similar the publication dates of two posts are.
• Content: We compute the textual similarity between tj and each element of ti, taking the maximum value as the similarity measure. The similarity between two messages is calculated using the well-known tf-idf cosine similarity, sim(ti, tj).
• Similarity Function: Finally, the similarity between tweets indirectly linked to the conversation and tweets present in the reply tree is calculated as a linear combination of their attributes.

4 Experiments and Results

The following experiment was designed to gather some knowledge on the impact of our results on end users. For this experiment we selected three events and queried our dataset using Google¹, the Twitter search engine² and our method (TCOND).
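To make the Step 2 features of Section 3.2 more concrete, the following Python sketch scores one candidate tweet tj against a direct conversation ti. It is a hedged illustration rather than the implementation evaluated in this paper. The paper does not give the weights of the linear combination, the time threshold, or the exact form of the date similarity, so the equal weights, the one-hour cutoff `max_gap`, the normalised closeness term and the regular expressions are all assumptions, as are the function names. Tweets are assumed to be `(text, unix_time)` pairs with the conversation root first.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")   # assumed URL pattern
TAG_RE = re.compile(r"#\w+")           # assumed hashtag pattern


def shares(pattern, conv_texts, cand_text, fold_case=False):
    """Binary feature (P1 or P2): 1 if the candidate contains an item that
    also occurs in some tweet of the conversation, else 0."""
    norm = str.lower if fold_case else (lambda s: s)
    conv_items = {norm(m) for t in conv_texts for m in pattern.findall(t)}
    cand_items = {norm(m) for m in pattern.findall(cand_text)}
    return 1 if conv_items & cand_items else 0


def tokenize(text):
    return re.findall(r"[a-z0-9]+", URL_RE.sub(" ", text.lower()))


def tfidf_vectors(docs):
    """Sparse tf-idf vectors with a smoothed idf (one common variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def indirect_score(conv, cand, weights=(0.25, 0.25, 0.25, 0.25),
                   max_gap=3600.0):
    """Linear combination of the Step 2 features for one candidate tweet."""
    conv_texts = [text for text, _ in conv]
    cand_text, cand_time = cand
    gap = abs(cand_time - conv[0][1])  # 1-D Euclidean distance to the root
    if gap > max_gap:                  # time-difference filter (assumed cutoff)
        return 0.0
    p1 = shares(URL_RE, conv_texts, cand_text)
    p2 = shares(TAG_RE, conv_texts, cand_text, fold_case=True)
    closeness = 1.0 - gap / max_gap    # assumed date similarity in [0, 1]
    vecs = tfidf_vectors(conv_texts + [cand_text])
    content = max(cosine(v, vecs[-1]) for v in vecs[:-1])
    w1, w2, w3, w4 = weights
    return w1 * p1 + w2 * p2 + w3 * closeness + w4 * content
```

With weights that sum to 1, the score stays in [0, 1], so a single threshold (also an assumption) could decide whether tj is attached to the conversation.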
Then, we asked a set of 100 assessors to rate the top-10 results of every search task on three relevance levels: highly relevant (value 2), relevant (value 1) or irrelevant (value 0). To measure the quality of the results, we use the Normalized Discounted Cumulative Gain at 10 (NDCG@10) for all the judged events. In addition, we use a second metric, the Precision at top 10 (P@10).

The dataset was obtained by monitoring Twitter posts over the period of July-August 2013. In particular, we used a sample of about 113 000 posts containing trending topic keywords, collected using Twitter's streaming API. Trending topics were determined directly by Twitter, and we selected the most frequent ones during the monitoring period.

1. www.google.com
2. Search.twitter.com.

4.1 Experimental Outcomes and Interpretation Results

We compare our conversation retrieval method with the results returned by Google and by the Twitter search engine using two metrics, namely P@10 and NDCG@10. The values obtained from this comparison are summarized in Table 1, where we notice that our method outperforms both Google and Twitter. A likely reason for these promising values is that combining a set of conversational features with the direct-reply method to retrieve conversations may have a significant impact on the users' evaluation.

Task1 Google Twitter TCOND Task2 Google Twitter TCOND Task3 Google Twitter TCOND
P@10 (Average%) NDCG (Average%)
59.62 65.73 73.28 56.86 59.71 64.52 57.31 62.78 67.27 56.02 58.45 62.73 63.21 65.88 77.27 66.52 68.46 69.33
TAB. 1 – Table of Values for Computing our Worked Example

Focusing on the three message selections, we observe that all conversations obtained with our method receive higher scores than Google's and Twitter's selections.
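For readers unfamiliar with the two metrics, the Python sketch below shows one common way to compute P@10 and NDCG@10 from a ranked list of graded judgments (0, 1 or 2). It is an illustration under assumptions that the paper does not state: a result counts as relevant for P@10 when its rating is at least 1, the gain is the rating itself with a log2 rank discount, and the ideal ranking is obtained by sorting the same judged list. How the ratings of the 100 assessors were aggregated is not specified, so the input here is simply one rating per ranked result.

```python
import math


def precision_at_k(ratings, k=10):
    """Fraction of the top-k results rated relevant (rating >= 1)."""
    return sum(1 for r in ratings[:k] if r >= 1) / k


def dcg_at_k(ratings, k=10):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(r / math.log2(rank + 1)
               for rank, r in enumerate(ratings[:k], start=1))


def ndcg_at_k(ratings, k=10):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0
```

Under these definitions a ranking that places the highly relevant posts first reaches an NDCG of 1, while P@10 ignores the order within the top 10, which is why the two metrics can disagree on the same result list.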
According to the free comments of some users, and following the qualitative analysis of the posts in the three selections, we can see that Google and Twitter received lower scores not because they contained posts judged as less interesting, but because some posts were considered not relevant to the searched topic. Concentrating on the three message selections, we also observe that all conversation selections obtained with Twitter search have higher scores than Google's selections.

These results lead us toward a more general interpretation of the collected data. It appears that the use of social metrics has a significant impact on the users' degree of interest in the retrieved posts. In addition, the process of retrieving conversations from social networks differs from traditional Web information retrieval; it involves human communication aspects, like the degree of interest in the conversation explicitly or implicitly expressed by the interacting people.

5 Conclusion

This work explored a new method for detecting conversations on microblogging sites: an information retrieval activity that exploits a set of conversational features in addition to the directly exchanged text messages. Our experimental results have highlighted many interesting points. In particular, including social features and the concept of direct conversation in the search function improves the relevance and informativeness of the retrieved tweets, and provides results that are considered more satisfactory than those of a traditional tweet search task. Future work will further investigate the conversational aspects by including human communication aspects, like the degree of interest in the conversation and the participants' influence/popularity, and by gathering data from multiple social network sources in real time.

References

Boyd, D., S. Golder, and G. Lotan (2010). Tweet, tweet, retweet: Conversational aspects of retweeting on twitter. In Proceedings of the 2010 43rd Hawaii International Conference on System Sciences, HICSS ’10, Washington, DC, USA, pp. 1–10. IEEE Computer Society.
Cherichi, S. and R. Faiz (2013). New metric measure for the improvement of search results in microblogs. In Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS ’13, New York, NY, USA, pp. 24:1–24:7. ACM.
Cogan, P., M. Andrews, M. Bradonjic, W. S. Kennedy, A. Sala, and G. Tucci (2012). Reconstruction and analysis of twitter conversation graphs. In Proceedings of the First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research, HotSocial ’12, New York, NY, USA, pp. 25–31. ACM.
Huang, J., K. M. Thornton, and E. N. Efthimiadis (2010). Conversational tagging in twitter. In Proceedings of the 21st ACM conference on Hypertext and hypermedia, HT ’10, New York, NY, USA, pp. 173–178. ACM.
Jabeur, L. B., L. Tamine, and M. Boughanem (2012). Uprising microblogs: A bayesian network retrieval model for tweet search. In Proceedings of the 27th Annual ACM Symposium on Applied Computing, New York, NY, USA, pp. 943–948. ACM.
Kumar, R., M. Mahdian, and M. McGlohon (2010). Dynamics of conversations. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’10, New York, NY, USA, pp. 553–562. ACM.
Magnani, M., D. Montesi, G. Nunziante, and L. Rossi (2011). Conversation retrieval from twitter. In Proceedings of the 33rd European conference on Advances in information retrieval, ECIR’11, Berlin, Heidelberg, pp. 780–783. Springer-Verlag.
Magnani, M., D. Montesi, and L. Rossi (2010). Information propagation analysis in a social network site. In N. Memon and R. Alhajj (Eds.), ASONAM, pp. 296–300. IEEE C.S.
Magnani, M., D. Montesi, and L. Rossi (2012). Conversation retrieval for microblogging sites. In Information Retrieval Journal, Volume 15, pp. 354–372. Springer Netherlands.
Summary

In this work, we propose a new method for detecting conversations on social networking sites. This method is based on content analysis and content enrichment, with the aim of presenting an informative result based on user interactions. We evaluated our method on corpora collected from a social network and related to specific topics, and we obtained good results.