DOI: 10.1145/2254129.2254152
research-article

PostRank: a new algorithm for incremental finding of persian blog representative words

Published: 13 June 2012

Abstract

Dimension reduction techniques for text documents can be used in the preprocessing phase of blog mining, but they are more effective when they account for the nature of blogs. In this paper we propose a novel algorithm called PostRank, which uses a shallow approach to identify the theme of a blog, that is, its representative words, in order to reduce the blog's dimensionality. PostRank uses a graph-based syntactic representation of the weblog that takes some of its structural features into account. In the first step it models the blog as a complete graph, treating the theme of the blog as a query submitted to a search engine such as Google and each post as a search result. It then ranks the posts using a Markov chain model similar to Google's PageRank, under the assumption that the top-ranked nodes contain the blog's best representative words. Next, it groups the posts according to their scores. Finally, it analyzes the first group using statistical methods (such as TF-IDF) to identify the blog's representative words. The other groups are candidates for holding the blog's theme after a change of theme occurs. When new posts arrive, we update the blog graph by setting the initial scores of the old nodes in the Markov chain to their final scores from the previous run, and continue the PostRank iterations until convergence. If half of the representative words have changed, we say that the theme of the weblog has changed.
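The ranking step and its incremental restart can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the edge weights, damping factor, or convergence test, so the cosine similarity between bag-of-words vectors, the `damping` and `tol` values, and the function names `cosine` and `post_rank` are all assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (assumed edge weight)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def post_rank(posts, init=None, damping=0.85, tol=1e-8, max_iter=200):
    """Rank posts with a PageRank-style power iteration over a complete graph.

    posts: list of token lists, one per post.
    init:  optional scores from a previous run, aligned with the first
           len(init) posts; used as a warm start when new posts arrive.
    Returns one score per post; the scores sum to 1.
    """
    n = len(posts)
    if n == 0:
        return []
    bags = [Counter(p) for p in posts]
    # Complete graph: every pair of distinct posts is connected by a weighted edge.
    w = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]

    # Start uniform; overwrite old nodes with their previous final scores.
    scores = [1.0 / n] * n
    if init:
        for i, s in enumerate(init[:n]):
            scores[i] = s
        total = sum(scores)
        if total > 0:
            scores = [s / total for s in scores]

    for _ in range(max_iter):
        new = []
        for j in range(n):
            rank = 0.0
            for i in range(n):
                if out[i] > 0:
                    rank += scores[i] * w[i][j] / out[i]
                else:
                    # A post with no similar neighbour spreads its score uniformly.
                    rank += scores[i] / n
            new.append((1 - damping) / n + damping * rank)
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores
```

Under these assumptions, a second call such as `post_rank(old_posts + new_posts, init=previous_scores)` resumes from the earlier result rather than from a uniform distribution, which mirrors the incremental update described above.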
We evaluated our method on the Persianblog dataset and obtained promising results. Human annotators assigned ten representative words to each blog, and the results of PostRank were compared both with these words and with the results of earlier related algorithms in this area.
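Two small set comparisons follow from the description above: the theme-change test (half of the representative words have changed) and a comparison against the human-assigned words. The sketch below is hedged accordingly: the abstract does not name the evaluation metric, so the overlap ratio in `overlap_with_reference` is an assumption, and both function names are hypothetical.

```python
def theme_changed(old_words, new_words, threshold=0.5):
    """Return True when at least `threshold` of the old representative
    words no longer appear among the new ones."""
    old, new = set(old_words), set(new_words)
    if not old:
        return False
    replaced = len(old - new)
    return replaced / len(old) >= threshold


def overlap_with_reference(predicted, reference):
    """Fraction of the human-assigned words that the algorithm also found.

    This metric is an assumption; the abstract only says the two word
    lists were compared.
    """
    ref = set(reference)
    if not ref:
        return 0.0
    return len(set(predicted) & ref) / len(ref)
```

For example, with ten human-assigned words per blog, `overlap_with_reference` returns a value between 0 and 1 in steps of 0.1.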

Cited By

  • (2014) Topic classification in Romanian blogosphere. 12th Symposium on Neural Network Applications in Electrical Engineering (NEUREL), pp. 131-134. DOI: 10.1109/NEUREL.2014.7011480. Online publication date: Nov-2014
  • (2013) Web Discussion Summarization: Study Review. Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013), pp. 649-656. DOI: 10.1007/978-981-4585-18-7_73. Online publication date: 15-Dec-2013

    Published In

    WIMS '12: Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
    June 2012
    571 pages
    ISBN:9781450309158
    DOI:10.1145/2254129
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • UCV: University of Craiova
    • WNRI: Western Norway Research Institute

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tag

    1. dimension reduction-incremental-Markov chain

    Acceptance Rates

    Overall Acceptance Rate 140 of 278 submissions, 50%
