DOI: 10.1145/3488560.3498405
research-article

Knowledge Enhanced Sports Game Summarization

Published: 15 February 2022

    Abstract

    Sports game summarization aims to generate sports news from live commentaries. However, existing datasets are all constructed through automated collection and cleaning processes, which leaves considerable noise in the data. In addition, current work neglects the knowledge gap between live commentaries and sports news, which limits summarization performance. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects data from a large number of games, yielding 7,854 commentary-news pairs, and employs a manual cleaning process to improve quality; (2) unlike existing datasets, K-SportsSum narrows the knowledge gap by providing a large-scale knowledge corpus that contains information on 523 sports teams and 14,724 sports players. We also introduce a knowledge-enhanced summarizer that uses both the live commentaries and the knowledge corpus to generate sports news. Extensive experiments on the K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performance. Qualitative analysis and a human study further verify that our model generates more informative sports news.

    Supplementary Material

    MP4 File (WSDM22-fp237.mp4)


      Published In

      WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
      February 2022
      1690 pages
      ISBN: 9781450391320
      DOI: 10.1145/3488560
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. datasets
      2. sports game summarization
      3. text summarization

      Qualifiers

      • Research-article

      Conference

      WSDM '22

      Acceptance Rates

      Overall Acceptance Rate 498 of 2,863 submissions, 17%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 44
      • Downloads (Last 6 weeks): 4
      Reflects downloads up to 12 Aug 2024

      Citations

      Cited By

      • (2024) A Coarse-to-Fine Framework for Entity-Relation Joint Extraction. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1009-1022. DOI: 10.1109/ICDE60146.2024.00082. Online publication date: 13-May-2024
      • (2023) Generating Factually Consistent Sport Highlights Narrations. Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports, 15-22. DOI: 10.1145/3606038.3616157. Online publication date: 29-Oct-2023
      • (2023) Long-Document Cross-Lingual Summarization. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 1084-1092. DOI: 10.1145/3539597.3570479. Online publication date: 27-Feb-2023
      • (2022) A Survey on Cross-Lingual Summarization. Transactions of the Association for Computational Linguistics, Vol. 10, 1304-1323. DOI: 10.1162/tacl_a_00520. Online publication date: 28-Nov-2022
      • (2022) Soccer Game Summarization using Audio Commentary, Metadata, and Captions. Proceedings of the 1st Workshop on User-centric Narrative Summarization of Long Videos, 13-22. DOI: 10.1145/3552463.3557019. Online publication date: 10-Oct-2022
