Research Article | Open Access
DOI: 10.1145/3635059.3635068

Can Large Language Models Revolutionalize Open Government Data Portals? A Case of Using ChatGPT in statistics.gov.scot

Published: 14 February 2024

Abstract

Large language models possess remarkable natural language understanding and generation abilities. However, they often fail to discern fact from fiction, producing factually incorrect responses. Open Government Data portals are repositories of freely available, often linked, information. By combining these two technologies in a proof-of-concept application that uses OpenAI's GPT-3.5 model together with the Scottish open statistics portal, we show not only that it is possible to improve the factual accuracy of the large language model's responses, but also propose a novel way to effectively access and retrieve statistical information from the data portal through natural language querying alone. We anticipate that this paper will trigger a discussion on the transformation of Open Government Data portals through large language models.
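
To make the approach concrete, below is a minimal sketch of the retrieval-grounded pattern the abstract describes: the model first translates a natural language question into a SPARQL query, the query is executed against the portal, and the model then phrases an answer using only the retrieved results. This is an illustration, not the authors' implementation; the endpoint URL, the gpt-3.5-turbo model name, the prompts, and the helper names llm, run_sparql, and answer are assumptions.

    # Hypothetical sketch of grounding a GPT-3.5 answer in data
    # retrieved from statistics.gov.scot (assumptions noted above).
    import json
    import urllib.parse
    import urllib.request

    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Assumed public SPARQL endpoint of the Scottish open statistics portal.
    SPARQL_ENDPOINT = "https://statistics.gov.scot/sparql"

    def llm(prompt: str) -> str:
        """One-shot call to a GPT-3.5-class chat model."""
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def run_sparql(query: str) -> dict:
        """POST a SPARQL query per the SPARQL protocol; return JSON results."""
        body = urllib.parse.urlencode({"query": query}).encode()
        req = urllib.request.Request(
            SPARQL_ENDPOINT,
            data=body,
            headers={"Accept": "application/sparql-results+json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def answer(question: str) -> str:
        # 1. Have the model translate the question into SPARQL over the
        #    portal's linked statistical datasets (prompt kept simple).
        sparql = llm(
            "Write only a SPARQL query (no prose) for statistics.gov.scot "
            f"that answers: {question}"
        )
        # 2. Execute it, so the figures come from the portal, not the model.
        results = run_sparql(sparql)
        # 3. Have the model phrase an answer using only the retrieved data.
        return llm(
            f"Using only these SPARQL results:\n{json.dumps(results)}\n"
            f"answer the question: {question}"
        )

    print(answer("What was the population of Glasgow in 2021?"))

Because the figures in the final step come from the portal rather than from the model's parameters, factual errors are confined to query construction instead of invented statistics, which is the factuality augmentation the abstract claims.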



        Published In

PCI '23: Proceedings of the 27th Pan-Hellenic Conference on Progress in Computing and Informatics, November 2023, 304 pages

Publisher: Association for Computing Machinery, New York, NY, United States


Author Tags

1. chatgpt
2. large language model
3. linked data
4. natural language processing
5. open government data


Conference

PCI 2023. Overall acceptance rate: 190 of 390 submissions, 49%.

