
Phrasal Graph-based Method for Abstractive Vietnamese Paragraph Compression

Published: 07 December 2017

Abstract

    Text compression is the task of identifying the main information in a source text and expressing it as a single short sentence. A common approach is to find a path containing the common vertices of a word graph model. This approach has two problems. First, the path-finding algorithm can separate words that belong to a phrase expressing a single piece of content, which produces new sentences whose meaning differs from the original ones. Second, the same information may be expressed by different words or phrases (co-reference); without a mechanism for handling this situation, the compression loses information. In this paper we propose a method that overcomes both issues. Its core is an improved graph model in which each vertex represents a phrase together with its Part-of-Speech label. The vertices where branches intersect are produced by the co-reference handling mechanism. The compression algorithm reduces the graph and forms the final sentence. We use the ROUGE measure to compare our method with two word graph-based baselines. The experimental results show that our method creates short sentences that are rich in information.
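
The idea of a phrase-level graph with merged co-referent vertices can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how the graph is reduced, so the sketch assumes a simple Dijkstra search with edge cost 1/count, a hand-written co-reference table (`coref`), toy English phrases, and illustrative POS labels. The names `build_phrase_graph` and `compress` are hypothetical.

```python
import heapq
from collections import defaultdict

# Artificial boundary vertices (an assumption of this sketch).
START, END = ("<s>", "START"), ("</s>", "END")


def build_phrase_graph(sentences, coref):
    """Count transitions between (phrase, POS) vertices.

    `coref` maps a mention to its canonical phrase, so that
    co-referent phrases collapse into one shared vertex.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        nodes = [(coref.get(phrase, phrase), pos) for phrase, pos in sent]
        path = [START] + nodes + [END]
        for u, v in zip(path, path[1:]):
            edges[u][v] += 1
    return edges


def compress(edges):
    """Cheapest START-to-END path, where frequent transitions cost less."""
    heap = [(0.0, [START])]
    done = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            # Drop the boundary vertices and join the phrases.
            return " ".join(phrase for phrase, _ in path[1:-1])
        if node in done:
            continue
        done.add(node)
        for nxt, count in edges[node].items():
            if nxt not in done:
                heapq.heappush(heap, (cost + 1.0 / count, path + [nxt]))
    return ""


# Toy input: each sentence is already chunked into (phrase, POS) pairs.
sentences = [
    [("Mr. Nam", "NP"), ("bought", "VP"), ("a house", "NP"), ("in Hanoi", "PP")],
    [("He", "NP"), ("bought", "VP"), ("a house", "NP"), ("last year", "AdvP")],
    [("The man", "NP"), ("bought", "VP"), ("a house", "NP"), ("in Hanoi", "PP")],
]
coref = {"He": "Mr. Nam", "The man": "Mr. Nam"}  # hand-supplied, hypothetical

summary = compress(build_phrase_graph(sentences, coref))
print(summary)  # Mr. Nam bought a house in Hanoi
```

Because whole phrases are vertices, the search cannot split "a house" or "in Hanoi", and because "He" and "The man" are mapped onto "Mr. Nam", all three sentences contribute to the same starting branch.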

        Published In

        SoICT '17: Proceedings of the 8th International Symposium on Information and Communication Technology
        December 2017
        486 pages
        ISBN:9781450353281
        DOI:10.1145/3155133
        In-Cooperation

        • SOICT: School of Information and Communication Technology - HUST
        • NAFOSTED: The National Foundation for Science and Technology Development

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Co-Reference Resolution
        2. Graph Construction
        3. Text Compression
        4. Text Tagging

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        SoICT 2017

        Acceptance Rates

        Overall Acceptance Rate 147 of 318 submissions, 46%
