Abstract
This paper introduces a microservices-based architecture designed for executing complex linguistic tasks using Large Language Models (LLMs) and Knowledge Graphs (KGs). It was conceived with a focus on the legal domain, and it integrates Domain-specific KGs and Constraint KGs to address tasks such as law extraction and reasoning. We outline how the pipeline works through a running example involving the extraction of legislative references from legal documents. Furthermore, we discuss a methodology for building KGs from unstructured documents and employing zero-shot prompt engineering techniques to facilitate information extraction. Finally, we present a validation process leveraging the Constraint KG to ensure the coherence and correctness of generated outputs.
Notes
- 1. We informally define a complex linguistic task as a linguistic task that requires domain-specific information as well as knowledge and reasoning abilities to be fulfilled correctly.
- 2. The classification defined by the taxonomy does not depend on the specific techniques used to apply it; a different approach or method can therefore be adopted in the future.
- 3. Cypher is Neo4j's declarative graph query language.
Acknowledgments
The work of Mattia Macrì has been supported by the PhD fellowship Pubblica Amministrazione DM118 - CUP B83C22003460006. The work of Marco Calamo and Filippo Bianchini has been supported by the Next-Generation EU (Italian PNRR - M4 C2, Invest 1.3 - D.D. 1551.11-10-2022), named PE4 - MICS (Made in Italy - Circular and Sustainable).
A Appendix
Here are two examples of zero-shot prompting. The standard approach breaks the request into a system instruction and a prompt; this helps the model distinguish the tasks it must perform overall from those for a specific instance, leading to improved performance. The first example focuses on extracting Primary and NER information, i.e., pieces of information that require only the domain document as input and are obtained by performing NER to extract named entities. In particular, it analyzes citation acts to extract all cited laws. In the example, the variables are highlighted to distinguish them from the template parts: some variables form the template for extracting Primary and NER information, while the remaining ones hold the data supplied by the end user during the Information Extraction phase.
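The structure described above can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' exact template: the instruction wording, the function name, and the sample Italian citation act are all hypothetical.

```python
# Hypothetical sketch of the first zero-shot prompt: a system
# instruction describing the overall NER task, plus a per-instance
# prompt carrying the domain document (the citation act).

SYSTEM_INSTRUCTION = (
    "You are a legal information-extraction assistant. "
    "Given a citation act, perform named-entity recognition and "
    "return every law cited in the document, one per line."
)

# Template part; {document} is the end-user-supplied variable.
PROMPT_TEMPLATE = "Citation act:\n{document}\n\nCited laws:"

def build_ner_messages(document: str) -> list[dict]:
    """Assemble a chat-style payload for Primary/NER extraction."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": PROMPT_TEMPLATE.format(document=document)},
    ]

messages = build_ner_messages(
    "Visto l'art. 17 della legge 23 agosto 1988, n. 400 ..."
)
```

Keeping the task description in the system role and the instance data in the user role lets the same template be reused across documents unchanged.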
In the second example, we present instructions for extracting Semi-derivative and Question Answering information. These tasks require both the document and the previously extracted data as input, executing the Question Answering task to produce an output response. In this instance, we aim to understand why the laws were mentioned in a citation act.
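A sketch of this second prompt, chaining the output of the first step into the input of the Question Answering step, could look as follows. Again, the wording and names are illustrative assumptions rather than the paper's actual template.

```python
# Hypothetical sketch of the second zero-shot prompt: Semi-derivative /
# Question Answering extraction, taking both the document and the laws
# extracted in the previous step as input.

SYSTEM_INSTRUCTION_QA = (
    "You are a legal question-answering assistant. Given a citation "
    "act and the list of laws it cites, explain for each law why it "
    "is mentioned in the act."
)

QA_PROMPT_TEMPLATE = (
    "Citation act:\n{document}\n\n"
    "Previously extracted laws:\n{laws}\n\n"
    "For each law, state the reason it is cited."
)

def build_qa_messages(document: str, cited_laws: list[str]) -> list[dict]:
    """Assemble the payload, feeding the earlier NER output into the prompt."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION_QA},
        {
            "role": "user",
            "content": QA_PROMPT_TEMPLATE.format(
                document=document, laws="\n".join(cited_laws)
            ),
        },
    ]
```

The dependency on previously extracted data is what makes this information "Semi-derivative": the prompt cannot be built from the document alone.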
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bianchini, F., Calamo, M., De Luzi, F., Macrì, M., Mecella, M. (2025). A Service-Based Pipeline for Complex Linguistic Tasks Adopting LLMs and Knowledge Graphs. In: Aiello, M., Barzen, J., Dustdar, S., Leymann, F. (eds) Service-Oriented Computing. SummerSOC 2024. Communications in Computer and Information Science, vol 2221. Springer, Cham. https://doi.org/10.1007/978-3-031-72578-4_8
Print ISBN: 978-3-031-72577-7
Online ISBN: 978-3-031-72578-4