Affiliations: Research into Artifacts, Center for Engineering (RACE), The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan | Services and Solution Development Department, NTT DoCoMo, Inc., NTT DoCoMo R&D Center 3-5 Hikari-no-oka, Yokosuka, Kanagawa 239-8536, Japan
Abstract: A topic model capable of assigning word pairs to associated topics is developed to explore people's activities. Because verb-led word pairs express people's activities more effectively than separate words, we incorporate the word-connection model into smoothed Latent Dirichlet Allocation (LDA) to ensure that words are well paired and assigned to the associated topics. To evaluate the proposed model quantitatively and qualitatively, two datasets were built from Twitter posts: a wish-related dataset and a geographical information-related dataset. Experimental results on the wish-related dataset indicate that the relatedness of words plays a key role in forming reasonable pairs and that the proposed model, word-pair generative Latent Dirichlet Allocation (wpLDA), performs well in clustering. Results on the geographical information-related dataset demonstrate that the proposed model works well for discovering people's activities, which are represented in an intuitive, easily understood form.
Keywords: Intuitive expressions, connection lattice, tweets, LDA model, association rules