Abstract
This paper introduces a microservices-based architecture designed for executing complex linguistic tasks using Large Language Models (LLMs) and Knowledge Graphs (KGs). It was conceived with a focus on the legal domain, and it integrates Domain-specific KGs and Constraint KGs to address tasks such as law extraction and reasoning. We outline how the pipeline works through a running example involving the extraction of legislative references from legal documents. Furthermore, we discuss a methodology for building KGs from unstructured documents and employing zero-shot prompt engineering techniques to facilitate information extraction. Finally, we present a validation process leveraging the Constraint KG to ensure the coherence and correctness of generated outputs.
Notes
- 1. We informally define a complex linguistic task as a linguistic task that requires domain-specific information as well as knowledge and reasoning abilities to be fulfilled correctly.
- 2. The classification defined by the taxonomy does not depend on the specific techniques used to apply it; a different approach or method can therefore be adopted in the future.
- 3. Cypher is Neo4j's declarative graph query language.
Acknowledgments
The work of Mattia Macrì has been supported by the PhD fellowship Pubblica Amministrazione DM118 - CUP B83C22003460006. The work of Marco Calamo and Filippo Bianchini has been supported by the Next-Generation EU (Italian PNRR - M4 C2, Invest 1.3 - D.D. 1551.11-10-2022), named PE4 - MICS (Made in Italy - Circular and Sustainable).
A Appendix
Here are two examples of zero-shot prompting. The standard approach breaks the request into a system instruction and a prompt; this helps the model distinguish the tasks it must perform overall from those for a specific instance, leading to improved performance. The first example focuses on extracting Primary and NER information, i.e., pieces of information that require only the domain document as input and are obtained by performing NER to extract named entities. In particular, it analyzes citation acts to extract all cited laws. In the example, the variables are highlighted to distinguish them from the template parts: some variables form the template for extracting Primary and NER information, while the remaining ones hold the data supplied by the end user during the Information Extraction phase.
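The structure described above can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' exact template: the instruction wording, the function name, and the sample Italian citation act are all hypothetical.

```python
# Hypothetical sketch of the first zero-shot prompt: a system
# instruction describing the overall NER task, plus a per-instance
# prompt carrying the domain document (the citation act).

SYSTEM_INSTRUCTION = (
    "You are a legal information-extraction assistant. "
    "Given a citation act, perform named-entity recognition and "
    "return every law cited in the document, one per line."
)

# Template part; {document} is the end-user-supplied variable.
PROMPT_TEMPLATE = "Citation act:\n{document}\n\nCited laws:"

def build_ner_messages(document: str) -> list[dict]:
    """Assemble a chat-style payload for Primary/NER extraction."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": PROMPT_TEMPLATE.format(document=document)},
    ]

messages = build_ner_messages(
    "Visto l'art. 17 della legge 23 agosto 1988, n. 400 ..."
)
```

Keeping the task description in the system role and the instance data in the user role lets the same template be reused across documents unchanged.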
In the second example, we present instructions for extracting Semi-derivative and Question Answering information. These tasks require both the document and the previously extracted data as input, executing the Question Answering task to produce an output response. In this instance, we aim to understand why the laws were mentioned in a citation act.
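A sketch of this second prompt, chaining the output of the first step into the input of the Question Answering step, could look as follows. Again, the wording and names are illustrative assumptions rather than the paper's actual template.

```python
# Hypothetical sketch of the second zero-shot prompt: Semi-derivative /
# Question Answering extraction, taking both the document and the laws
# extracted in the previous step as input.

SYSTEM_INSTRUCTION_QA = (
    "You are a legal question-answering assistant. Given a citation "
    "act and the list of laws it cites, explain for each law why it "
    "is mentioned in the act."
)

QA_PROMPT_TEMPLATE = (
    "Citation act:\n{document}\n\n"
    "Previously extracted laws:\n{laws}\n\n"
    "For each law, state the reason it is cited."
)

def build_qa_messages(document: str, cited_laws: list[str]) -> list[dict]:
    """Assemble the payload, feeding the earlier NER output into the prompt."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION_QA},
        {
            "role": "user",
            "content": QA_PROMPT_TEMPLATE.format(
                document=document, laws="\n".join(cited_laws)
            ),
        },
    ]
```

The dependency on previously extracted data is what makes this information "Semi-derivative": the prompt cannot be built from the document alone.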
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bianchini, F., Calamo, M., De Luzi, F., Macrì, M., Mecella, M. (2025). A Service-Based Pipeline for Complex Linguistic Tasks Adopting LLMs and Knowledge Graphs. In: Aiello, M., Barzen, J., Dustdar, S., Leymann, F. (eds) Service-Oriented Computing. SummerSOC 2024. Communications in Computer and Information Science, vol 2221. Springer, Cham. https://doi.org/10.1007/978-3-031-72578-4_8
Print ISBN: 978-3-031-72577-7
Online ISBN: 978-3-031-72578-4