ip2text: A Reasoning-Aware Dataset for Text Generation of Devices on the Internet

Ren, Yimo; Li, Zhi; Li, Hong; Liu, Peipei; Liu, Jie; Zhu, Hongsong; Sun, Limin

doi:10.1007/978-3-031-35415-1_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13922)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Internet of Things (IoT) search engines are increasingly popular for exploring devices on the Internet. Table-to-text generation for devices helps users understand the results these search engines return. However, such generation is not yet available, and good text descriptions of devices are difficult to obtain because quality data for the task is lacking. Moreover, the relevant content is spread across multiple device attributes and is hard to mine directly. This paper therefore introduces ip2text, a challenging dataset for reasoning-aware table-to-text generation of devices on the Internet. The input data in ip2text are tables containing many attributes of devices collected from the Internet, and the output data are the corresponding descriptions. Producing descriptions of devices is costly and time-consuming, and it does not scale to Internet data. To tackle this problem, the paper designs an annotation method based on active learning that follows the characteristics of devices, and it studies the performance of typical state-of-the-art table-to-text generation models on ip2text. The automatic evaluation shows that existing pre-trained baselines struggle to perform satisfactorily on ip2text, with almost all BLEU scores below 1. Further, the human evaluation shows that BART and T5 are prone to hallucination when reasoning, with Hallucination scores above 0.10. It is therefore not easy to achieve satisfactory performance on the reasoning-aware ip2text with existing mainstream seq2seq models, and both the models and the datasets for table-to-text generation of devices on the Internet urgently need continued improvement.
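
As an illustration of the input side of this task, the sketch below flattens a device attribute table into a single string that a seq2seq model such as BART or T5 could consume. It is a minimal sketch under stated assumptions, not the preprocessing used in the paper: the function name `linearize_device_table`, the `<attr>`/`<val>` markers, and every attribute name and value in the sample record are hypothetical.

```python
from typing import Mapping


def linearize_device_table(attributes: Mapping[str, object]) -> str:
    """Flatten an attribute table into one string for a seq2seq encoder.

    Assumption: each table cell is rendered as "<attr> name <val> value".
    The abstract does not specify the actual input format used in ip2text.
    """
    cells = []
    for name, value in attributes.items():
        if value is None or value == "":
            continue  # skip attributes with no observed value
        cells.append(f"<attr> {name} <val> {value}")
    return " ".join(cells)


# Hypothetical device record; names and values are invented for illustration.
device = {
    "ip": "192.0.2.10",
    "port": 554,
    "protocol": "rtsp",
    "banner": "",
    "organization": "Example ISP",
}

source_text = linearize_device_table(device)
print(source_text)
```

The resulting string would be paired with the device's reference description to form one training example. Empty attributes are dropped here so the model is not asked to reason over missing cells, which is one possible design choice among several.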

Supported by National Key Research and Development Projects (No. 2020YFB2103803) and National Natural Science Foundation of China (No. U1766215, No. 61931019).

Author information

Authors and Affiliations

School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Yimo Ren, Zhi Li, Hong Li, Peipei Liu, Jie Liu, Hongsong Zhu & Limin Sun
Institute of Information Engineering, University of Chinese Academy of Sciences, Beijing, China
Yimo Ren, Zhi Li, Hong Li, Peipei Liu, Jie Liu, Hongsong Zhu & Limin Sun

Corresponding author

Correspondence to Zhi Li.

Editor information

Editors and Affiliations

University of California, Santa Barbara, Santa Barbara, CA, USA
Amr El Abbadi
University of Auckland, Auckland, New Zealand
Gillian Dobbie
Tianjin University, Tianjin, China
Zhiyong Feng
Zhejiang University, Hangzhou, China
Lu Chen
The University of Southern Queensland, Queensland, Australia
Xiaohui Tao
Beijing University of Posts and Telecommunications, Beijing, China
Yingxia Shao
The University of Queensland, Brisbane, QLD, Australia
Hongzhi Yin

About this paper

Cite this paper

Ren, Y. et al. (2023). ip2text: A Reasoning-Aware Dataset for Text Generation of Devices on the Internet. In: El Abbadi, A., et al. Database Systems for Advanced Applications. DASFAA 2023 International Workshops. DASFAA 2023. Lecture Notes in Computer Science, vol 13922. Springer, Cham. https://doi.org/10.1007/978-3-031-35415-1_2

DOI: https://doi.org/10.1007/978-3-031-35415-1_2
Published: 28 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35414-4
Online ISBN: 978-3-031-35415-1
eBook Packages: Computer Science, Computer Science (R0)
