research-article

clstk: The Cross-Lingual Summarization Tool-Kit

Authors:

Nisarg Jhaveri,

Vasudeva Varma

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining

Pages 766 - 769

https://doi.org/10.1145/3289600.3290614

Published: 30 January 2019

Abstract

Cross-lingual summarization (CLS) aims to create summaries in a target language from a document or document set written in a different, source language. CLS can play a critical role in enabling cross-lingual information access for millions of people across the globe who do not speak or understand the languages that are well represented on the web. It can also make documents originally published in local languages quickly accessible to a large audience that does not understand those languages. Although CLS has attracted some attention over the last decade, there has been no serious effort to publish rigorous software for the task. In this paper, we present the design of clstk, an end-to-end CLS tool-kit. Besides implementing a number of methods proposed by CLS researchers over the years, the software integrates multiple components critical for CLS. We hope that this highly modular tool-kit will help CLS researchers contribute more effectively to the area.
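The abstract describes clstk as an end-to-end, modular system that wires together several CLS components. The following Python sketch illustrates that idea only under explicit assumptions: the names `CLSPipeline`, `dictionary_translator`, `frequency_scorer`, and `split_sentences` are hypothetical and are not clstk's API; the word-by-word dictionary "translator" merely stands in for a real machine translation system; and the frequency scorer with greedy budgeted selection is a generic extractive baseline, not one of the methods the tool-kit implements.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical component signatures; these are NOT clstk's actual interfaces.
Translator = Callable[[str], str]
SentenceScorer = Callable[[List[str]], List[float]]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def frequency_scorer(sentences: List[str]) -> List[float]:
    """Score each sentence by the mean in-document frequency of its words."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    return [sum(counts[w] for w in toks) / len(toks) if toks else 0.0
            for toks in tokenized]


def dictionary_translator(lexicon: Dict[str, str]) -> Translator:
    """Toy word-by-word translator (a placeholder for a real MT system)."""
    def translate(sentence: str) -> str:
        words = re.findall(r"\w+", sentence.lower())
        return " ".join(lexicon.get(w, w) for w in words)
    return translate


@dataclass
class CLSPipeline:
    translator: Translator                    # source -> target language
    scorer: SentenceScorer = frequency_scorer  # importance of source sentences
    word_budget: int = 20                     # max words in the target summary

    def summarize(self, source_text: str) -> List[str]:
        sentences = split_sentences(source_text)
        translations = [self.translator(s) for s in sentences]
        scores = self.scorer(sentences)
        # Greedy budgeted selection: take sentences from highest score down,
        # skipping any whose translation would exceed the word budget.
        chosen, used = [], 0
        for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
            length = len(translations[i].split())
            if used + length <= self.word_budget:
                chosen.append(i)
                used += length
        # Emit the selected target-language sentences in document order.
        return [translations[i] for i in sorted(chosen)]


# Usage with a toy Spanish -> English lexicon.
lexicon = {"el": "the", "gato": "cat", "come": "eats", "pescado": "fish",
           "perro": "dog", "carne": "meat", "duerme": "sleeps"}
pipeline = CLSPipeline(translator=dictionary_translator(lexicon), word_budget=7)
summary = pipeline.summarize("El gato come pescado. El perro come carne. El gato duerme.")
print(summary)  # ['the cat eats fish', 'the cat sleeps']
```

Because the translator and the scorer are passed in as callables, either can be swapped (for example, a real MT system, or a scorer that also accounts for translation quality) without touching the selection logic. That kind of interchangeability is what the abstract attributes to clstk, although the tool-kit's real interfaces may differ.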


Cited By

Cagliero L, Garza P, La Quatra M (2020). Combining Machine Learning and Natural Language Processing for Language-Specific, Multi-Lingual, and Cross-Lingual Text Summarization. In Trends and Applications of Text Summarization Techniques, pp. 1-31. Online publication date: 2020.
https://doi.org/10.4018/978-1-5225-9373-7.ch001

Index Terms

clstk: The Cross-Lingual Summarization Tool-Kit
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Summarization
    2. Specialized information retrieval
      1. Structure and multilingual text search
        1. Multilingual and cross-lingual retrieval


Published In


WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining

January 2019

874 pages

ISBN:9781450359405

DOI:10.1145/3289600

General Chairs: J. Shane Culpepper (RMIT University), Alistair Moffat (The University of Melbourne)

Program Chairs: Paul N. Bennett (Microsoft), Kristina Lerman (University of Southern California)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article

Conference

WSDM '19

Sponsor:

WSDM '19: The Twelfth ACM International Conference on Web Search and Data Mining

February 11 - 15, 2019

Melbourne VIC, Australia

Acceptance Rates

WSDM '19 Paper Acceptance Rate 84 of 511 submissions, 16%;

Overall Acceptance Rate 498 of 2,863 submissions, 17%


Bibliometrics

Total citations: 1
Total downloads: 180
Downloads (last 12 months): 3
Downloads (last 6 weeks): 1

Reflects downloads up to 09 Jan 2025
