research-article

A Chinese Information Operation and Maintenance Knowledge Retrieval Corpus

Published: 03 May 2024

Abstract

The lack of a large-scale information operation and maintenance corpus greatly limits the development of information technology operation and maintenance management, especially in non-English languages. To improve this situation, we introduce a large-scale Chinese information operation and maintenance knowledge retrieval corpus and release it publicly. A key challenge in building such a corpus is collecting a large amount of retrieval data in different languages. We first use search engines to collect a large amount of Chinese information operation and maintenance knowledge text related to high-frequency words in various fields, and then generate relevant questions for this text using ChatGPT (https://chat.openai.com/). Finally, we recruit three annotators to manually check the quality of the retrieval corpus. The resulting Chinese information operation and maintenance knowledge corpus contains 2000 retrieval questions. To verify its quality, we divide the corpus into a training set of 1500 retrieval questions and a test set of 500 retrieval questions, and test several well-known retrieval methods on them (https://pan.baidu.com/s/1rLWqHZJhE9nEOYg3OTC1Ag). The experimental results not only demonstrate the high quality of the corpus but also provide a solid baseline for further research on it.
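
The 1500/500 split and baseline evaluation described in the abstract could be sketched roughly as follows. The abstract does not say how the split was made or which metric was reported, so the seeded random shuffle, the recall@k metric, and the record layout (`question_id`, `gold_passage_id`) are all assumptions made for illustration only.

```python
import random


def split_corpus(questions, n_train=1500, n_test=500, seed=0):
    """Shuffle the questions and split them into train and test sets.

    Assumption: a seeded random split; the paper may have split differently.
    """
    if len(questions) != n_train + n_test:
        raise ValueError("corpus size must equal n_train + n_test")
    shuffled = list(questions)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def recall_at_k(ranked_ids, gold_id, k):
    """Return 1.0 if the gold passage is among the top-k ranked ids, else 0.0.

    Assumption: recall@k is a stand-in for whatever metric the paper reports.
    """
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


# Hypothetical placeholder records: (question_id, gold_passage_id).
corpus = [(f"q{i}", f"p{i}") for i in range(2000)]
train, test = split_corpus(corpus)
```

Averaging `recall_at_k` over the test questions, using the ranking produced by each retrieval method, would give one baseline number per method.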


    Published In

    IoTAAI '23: Proceedings of the 2023 5th International Conference on Internet of Things, Automation and Artificial Intelligence
    November 2023
    902 pages
    ISBN:9798400716485
    DOI:10.1145/3653081
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    IoTAAI 2023
