Abstract
In this work we present TULSI (Turin University Legal Semantic Interpreter), a system that automatically annotates normative documents by extracting modificatory provisions. TULSI relies on a deep syntactic analysis and a shallow semantic interpreter, both of which are illustrated in detail. We report and discuss the results of an experimental evaluation of the system, and suggest directions for further improvement.
Notes
E.g., the technical syntagm “digital signature” is at the same time a legal concept and also a definition of a particular technology.
E.g., the term “privacy” is a well-known legal term that represents a specific legal concept in all EU jurisdictions; in the US it has a different legal meaning, and in Italy the term is not used in any legal document on this topic.
In particular, it is sometimes necessary to apply modern principles of legal interpretation that go beyond textualism and that integrate the objective literal interpretation with teleological scope, socio-economic goals and historical-cultural elements.
“We can now define the concept of a normative system as the set of all the propositions that are consequences of the explicitly commanded propositions” (Alchourròn and Bulygin 1971).
Also, while the Italian standard NormeInRete was still being devised, Bolioli et al. (2002) provided the first description of a software system for the automated mark-up of Italian legal texts, funded by the AIPA (the Italian authority for promoting information technologies in the Italian public administration).
The files containing the DTD are available at the URL: http://www.digitpa.gov.it/standard-normeinrete.
Further details about the input format are provided in Sect. 5.1.
Corpo is the Italian word for body.
Rif stands for riferimento, the Italian word for reference.
Vir stands for virgolette, the Italian word for quotes.
Excerpt from the file http://www.di.unito.it/~radicion/AI_LAW_2012/S2603265.xml.
Chunks can be defined as groups of syntactically related and adjacent words (Abney 1991).
The English pseudo-translation aims to keep the ordering of the Italian words. Conjunctions are labelled with subscripts for reference purposes in the following description.
Some inputs contain no verb. In this case, either the analysis consists of a single chunk (and the task is complete), or one of the chunks is chosen as the head and the others are attached to it. This choice, too, is made via heuristics.
The \(\langle alinea \rangle\) tag is an element used in the annotation of legal texts to govern an enumeration of elements.
The dataset is available for download at the URL: http://www.di.unito.it/~radicion/AI_LAW_2012/.
E.g., a sentence like ‘dalla data del RIF32, il RIF33 abrogato’ (from the date of RIF32, RIF33 is repealed) is ambiguous in Italian: the parser wrongly analyzes the dependent ‘dalla data’ (‘from the date’) as the agent of a passive action, because the agent is introduced by the same preposition.
References
Abney SP (1991) Principle-based parsing: computation and psycholinguistics. In: Berwick RC, Abney SP, Tenny C (eds) Parsing by Chunks. Kluwer, Dordrecht
AIPA (2002) Formato per la rappresentazione elettronica dei provvedimenti normativi tramite il linguaggio di marcatura XML. Circolare n. AIPA/CR/40, 22 aprile
Alchourròn CE, Bulygin E (1971) Normativity and norms: critical perspectives on Kelsenian themes. In: Paulson SL, Litschewski-Paulson B (eds) The expressive conception of norms. Clarendon Press, Oxford
Alicante A, Bosco C, Corazza A, Lavelli A (2012) A treebank-based study on the influence of Italian word order on parsing performance. In: Calzolari N (Conference Chair), Choukri K, Declerck T, Uğur Doğan M, Maegaard B, Mariani J, Odijk J, Piperidis S (eds) Proceedings of the eighth international conference on language resources and evaluation (LREC’12), Istanbul, Turkey, May 2012. European Language Resources Association (ELRA)
Appelt DE, Israel D (1999) Introduction to information extraction technology. In: Proceedings of the 16th international joint conference on artificial intelligence IJCAI-99, Tutorial
Arnold-Moore T (1995) Automatically processing amendments to legislation. In: ICAIL, pp 297–306
Arnold-Moore T (1997) Automatic generation of amendment legislation. In: Proceedings of the international conference on artificial intelligence and law (ICAIL), pp 56–62
Bartolini R, Lenci A, Montemagni S, Pirrelli V, Soria C (2004) Semantic mark-up of Italian legal texts through NLP-based techniques. In: Proceedings of LREC 2004, pp 795–798
Biagioli C, Francesconi E, Spinosa P, Taddei M (2003) The NIR project: standards and tools for legislative drafting and legal document web publication. In: Proceedings of ICAIL workshop on e-government: modelling norms and concepts as key issues, pp 69–78
Biagioli C, Francesconi E, Passerini A, Montemagni S, Soria C (2005) Automatic semantics extraction in law documents. In: ICAIL ’05: Proceedings of the 10th international conference on artificial intelligence and law. New York, NY, USA. ACM, pp 133–140
Bolioli A, Dini L, Mercatali P, Romano F (2002) For the automated mark-up of Italian legislative texts in XML. In: Bench-Capon T, Daskalopulu A, Winkels R (eds) Legal knowledge and information systems. Proceedings of Jurix 2002: the fifteenth annual conference. IOS Press
Bosco C, Montemagni S, Mazzei A, Lombardo V, Dell’Orletta F, Lenci A (2009) Evalita’09 parsing task: comparing dependency parsers and treebanks. In: Proceedings of Evalita’09. Reggio Emilia, Italy
Brighi R, Palmirani M (2009) Legal text analysis of the modification provisions: a pattern oriented approach. In: Proceedings of the international conference on artificial intelligence and law (ICAIL)
Cherubini M, Giardiello G, Marchi S, Montemagni S, Spinosa PL, Venturi G (2008) NLP-based metadata annotation of textual amendments. In: Proceedings of workshop on legislative XML 2008, Jurix
Collins M (1997) Three generative, lexicalised models for statistical parsing. In: Proceedings of the 35th annual meeting of the association for computational linguistics, pp 16–23
de Maat E, Winkels R, van Engers TM (2006) Automated detection of reference structures in law. In: van Engers TM (ed) Proceedings of the JURIX 2006 on legal knowledge and information systems: the nineteenth annual conference. IOS Press, Amsterdam, pp 41–50
de Maat E, Krabben K, Winkels R (2010) Machine learning versus knowledge based classification of legal texts. In: Proceedings of the 2010 conference on legal knowledge and information systems: JURIX 2010: the twenty-third annual conference. IOS Press, Amsterdam, pp 87–96
De Salvo Braz R, Girju R, Punyakanok V, Roth D, Sammons M (2005) An inference model for semantic entailment in natural language. In: AAAI’05: Proceedings of the 20th national conference on artificial intelligence. AAAI Press, pp 1043–1049
Domingos P (1999) The role of Occam’s razor in knowledge discovery. Data Min Knowl Discov 3:409–425
Haghighi AD, Ng AY, Manning CD (2005) Robust textual inference via graph matching. In: HLT ’05: Proceedings of the conference on human language technology and empirical methods in NLP, Morristown, NJ, USA, 2005. ACL, pp 387–394
Jackson P, Moulinier I (2002) Natural language processing for online applications. Text retrieval, extraction and categorization, vol 5 of natural language processing. Benjamins, Amsterdam, Philadelphia
Lesmo L (2007) The rule-based parser of the NLP group of the University of Torino. Intell Artif 2(4):46–47
Lesmo L, Lombardo V (2002) Transformed subcategorization frames in chunk parsing. In: Proceedings of the 3rd international conference on language resources and evaluation (LREC 2002), Las Palmas, pp 512–519
Lupo C, Vitali F, Francesconi E, Palmirani M, Winkels R, de Maat E, Boer A, Mascellani P (2007) General XML format(s) for legal Sources—ESTRELLA European project. Deliverable 3.1, Faculty of Law, University of Amsterdam, Amsterdam
McCarty LT (2007) Deep semantic interpretations of legal texts. In: ICAIL ’07: Proceedings of the 11th international conference on artificial intelligence and law. New York, NY, USA. ACM, pp 217–224
Ogawa Y, Inagaki S, Toyama K (2008) Automatic consolidation of Japanese statutes based on formalization of amendment sentences. In: Proceedings of the 2007 conference on new frontiers in artificial intelligence, JSAI’07, Berlin, Heidelberg, 2008. Springer, pp 363–376
Palmirani M (2011) Legislative change management with Akoma-Ntoso. In: Sartor G, Palmirani M, Francesconi E, Biasiotti MA (eds) Legislative XML for the Semantic Web. Springer, Berlin
Palmirani M, Benigni F (2007) Norma-system: a legal information system for managing time. In: Biagioli C, Francesconi E, Sartor G (eds) Proceedings of the V legislative XML workshop. European Press Academic Publishing, Feb 2007, pp 205–223
Palmirani M, Brighi R (2003) An XML editor for legal information management. In: Traunmüller R (ed) Electronic government, vol 2739 of LNCS. Springer, Berlin, pp 421–429
Palmirani M, Brighi R (2006) Time model for managing the dynamic of normative system. Electron Gov. Lecture notes in computer science, vol 4084. Springer, pp 207–218
Palmirani M, Brighi R (2010) Model regularity of legal language in active modifications. In: Biasiotti M et al (eds) AICOL workshops 2009. Springer, Berlin, pp 54–73
Palmirani M, Brighi R, Massini M (2004) Processing normative references on the basis of natural language questions. In: DEXA ’04 Proceedings of the database and expert systems applications, 15th international workshop. IEEE Computer Society, pp 9–12
Rodotà S (1998) La tecnica legislativa per clausole generali in Italia. In: Cabella Pisu L, Nanni L (eds) Clausole e principi generali nell’argomentazione giurisprudenziale degli anni novanta. Cedam, Padova
Sacco R (2000) Lingua e diritto. Ars Interpretandi. Annuario di ermeneutica giuridica. Traduzione e diritto 5:117–134
Sagri MT, Tiscornia D (2009) Le peculiarità del linguaggio giuridico. Problemi e prospettive nel contesto multilingue Europeo. MediAzioni 7. http://mediazioni.sitlec.unibo.it. ISSN 1974-4382
Saias J, Quaresma P (2004) A methodology to create legal ontologies in a logic programming based web information retrieval system. Artif Intell Law 12(4):397–417
Sartor G (1996) Riferimenti normativi e dinamica dei nessi normativi. In: Il procedimento normativo regionale. Cedam, Padova, pp 151–164
Soria C, Bartolini R, Lenci A, Montemagni S, Pirrelli V (2007) Automatic extraction of semantics in law documents. In: Biagioli C, Francesconi E, Sartor G (eds) Proceedings of the V legislative XML workshop. European Press Academic Publishing, pp 253–266
Spinosa PL, Giardiello G, Cherubini M, Marchi S, Venturi G, Montemagni S (2009) NLP-based metadata extraction for legal text consolidation. In: Proceedings of the 12th international conference on artificial intelligence and law, ICAIL ’09. New York, NY, USA, 2009. ACM, pp 40–49
Wyner A (2011) Towards annotating and extracting textual legal case elements. In: Francesconi E (ed) Informatica e Diritto: special issue on legal ontologies and artificial intelligent techniques 19(1–2):9–18 ESI
Zanchetta E, Baroni M (2005) Morph-it! A free corpus-based morphological resource for the Italian language. Corpus Linguistics 2005 1(1). http://www.corpus.bham.ac.uk/PCLC/
Cite this article
Lesmo, L., Mazzei, A., Palmirani, M. et al. TULSI: an NLP system for extracting legal modificatory provisions. Artif Intell Law 21, 139–172 (2013). https://doi.org/10.1007/s10506-012-9127-6
DOI: https://doi.org/10.1007/s10506-012-9127-6