
A Service-Based Pipeline for Complex Linguistic Tasks Adopting LLMs and Knowledge Graphs

  • Conference paper
  • First Online:
Service-Oriented Computing (SummerSOC 2024)

Abstract

This paper introduces a microservices-based architecture for executing complex linguistic tasks using Large Language Models (LLMs) and Knowledge Graphs (KGs). The architecture was conceived with a focus on the legal domain, and it integrates Domain-specific KGs and Constraint KGs to address tasks such as law extraction and reasoning. We outline how the pipeline works through a running example involving the extraction of legislative references from legal documents. We then discuss a methodology for building KGs from unstructured documents and employing zero-shot prompt engineering techniques to facilitate information extraction. Finally, we present a validation process that leverages the Constraint KG to ensure the coherence and correctness of the generated outputs.



Notes

  1. We informally define a complex linguistic task as a linguistic task that requires domain-specific information, knowledge, and reasoning abilities to be correctly fulfilled.

  2. Note that the taxonomy’s classification does not depend on the specific techniques used to apply it, which leaves open the possibility of adopting another approach or method in the future.

  3. Cypher is Neo4j’s declarative graph query language.
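As a hypothetical illustration of the validation step mentioned in the abstract, the following sketch builds a parameterised Cypher lookup against a Constraint KG to check whether an extracted legislative reference exists. The `Law` node label and `identifier` property are assumptions for illustration, not the paper’s actual schema:

```python
def build_validation_query(law_id: str) -> tuple[str, dict]:
    """Return a parameterised Cypher query (text plus parameters) that
    checks whether a cited law exists in the Constraint KG.
    NOTE: the `Law` label and `identifier` property are hypothetical."""
    query = (
        "MATCH (l:Law {identifier: $law_id}) "
        "RETURN count(l) > 0 AS is_valid"
    )
    return query, {"law_id": law_id}

query, params = build_validation_query("Law No. 241/1990")
# The pair could then be executed with a Neo4j session,
# e.g. session.run(query, **params).
```

Using a parameterised query rather than string interpolation avoids Cypher injection from LLM-generated text.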


Acknowledgments

The work of Mattia Macrì has been supported by the PhD fellowship Pubblica Amministrazione DM118 - CUP B83C22003460006. The work of Marco Calamo and Filippo Bianchini has been supported by the Next-Generation EU (Italian PNRR - M4 C2, Invest 1.3 - D.D. 1551.11-10-2022), named PE4 - MICS (Made in Italy - Circular and Sustainable).

Author information

Corresponding author

Correspondence to Francesca De Luzi.
A Appendix

Here are two examples of zero-shot prompting. The standard approach breaks the request into a system instruction and a prompt; this helps the model distinguish the overall task from the specific instance, leading to improved performance. The first example focuses on extracting Primary and NER information, i.e., information that requires only the domain document as input and is obtained by performing NER to extract named entities. In particular, it analyzes citation acts to extract all cited laws. In the example, the variables are highlighted to distinguish them from the template parts: some variables form the template for extracting Primary and NER information, while the others contain the data provided by the end user during the Information Extraction phase.

[figure k: zero-shot prompt template for extracting Primary and NER information]
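Since the prompt figure is not reproduced here, the system-instruction/prompt split described above can be sketched as follows, assuming an OpenAI-style chat message format; the instruction wording, field names, and sample citation act are all illustrative, not the paper’s exact templates:

```python
# Sketch of the two-part zero-shot request for Primary and NER
# information: a task-level system instruction plus an instance-level
# prompt that carries only the domain document.

SYSTEM_INSTRUCTION = (
    "You are a legal information extraction assistant. "
    "Given a citation act, perform Named Entity Recognition and "
    "return every cited law as a JSON list of strings."
)

def build_ner_messages(domain_document: str) -> list[dict]:
    """Assemble the system instruction and the per-instance prompt."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user",
         "content": f"Citation act:\n{domain_document}\n\nCited laws:"},
    ]

messages = build_ner_messages(
    "...pursuant to Law No. 241/1990 and D.Lgs. 82/2005..."
)
```

Keeping the task description in the system message means the same instruction can be reused across every document processed in the Information Extraction phase.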

In the second example, we present instructions for extracting Semi-derivative and Question Answering information. These tasks require both the document and previously extracted data as input, and execute the Question Answering task to produce a response. In this instance, we aim to understand why the laws were mentioned in a citation act.

[figure l: zero-shot prompt template for extracting Semi-derivative and Question Answering information]
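The second, Question Answering-style request can be sketched the same way; here the prompt carries both the document and the laws extracted in the previous step (function names, wording, and field names are again illustrative):

```python
def build_qa_messages(domain_document: str,
                      extracted_laws: list[str]) -> list[dict]:
    """Semi-derivative extraction: the prompt includes the document and
    the previously extracted laws, and asks why each law was cited."""
    laws = "\n".join(f"- {law}" for law in extracted_laws)
    question = ("For each law listed above, explain why it is "
                "mentioned in the citation act.")
    return [
        {"role": "system",
         "content": "You are a legal reasoning assistant. Answer "
                    "questions about a citation act using only the "
                    "provided text."},
        {"role": "user",
         "content": f"Citation act:\n{domain_document}\n\n"
                    f"Extracted laws:\n{laws}\n\n{question}"},
    ]

messages = build_qa_messages(
    "...as provided by Law No. 241/1990...",
    ["Law No. 241/1990"],
)
```

Feeding the earlier extraction output back into the prompt is what makes this information Semi-derivative: the second task depends on the first task’s result rather than on the raw document alone.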


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bianchini, F., Calamo, M., De Luzi, F., Macrì, M., Mecella, M. (2025). A Service-Based Pipeline for Complex Linguistic Tasks Adopting LLMs and Knowledge Graphs. In: Aiello, M., Barzen, J., Dustdar, S., Leymann, F. (eds) Service-Oriented Computing. SummerSOC 2024. Communications in Computer and Information Science, vol 2221. Springer, Cham. https://doi.org/10.1007/978-3-031-72578-4_8


  • DOI: https://doi.org/10.1007/978-3-031-72578-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72577-7

  • Online ISBN: 978-3-031-72578-4

  • eBook Packages: Computer Science, Computer Science (R0)
