Abstract
This paper presents a refinement of the PrOnto ontology using a validation test based on legal experts’ annotation of privacy policies, combined with an Open Knowledge Extraction (OKE) algorithm. To ensure robust results while preserving an interdisciplinary approach, legal and technical knowledge were integrated as follows. The legal experts first analysed a set of privacy policies to identify legal concepts and map the text onto PrOnto. The mapping was then provided to computer scientists, who performed the OKE analysis. The legal experts validated the results and provided feedback and refinements of the ontology (i.e. new classes and modules) according to the MeLOn methodology. Three iterations were performed on a development set of policies, followed by a final test on a new set of privacy policies. Using the refined version of PrOnto enriched with SKOS-XL lexicon terms and definitions, 75.43% of the ontology's concepts were detected in the test-set policy texts, with an accuracy gain of roughly 33% on the test set relative to the previous version of PrOnto.
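The effect of adding SKOS-XL lexicon terms can be illustrated with a toy sketch: a concept counts as found when any of its labels occurs in the policy text, and SKOS support adds alternative labels to the preferred one. This is a simplified string-matching illustration under stated assumptions, not the OKE algorithm used in the paper; the concept identifiers (`ex:...`), labels, policy sentence and function names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A hypothetical ontology concept with its lexical labels.

    In SKOS-XL each label is a skosxl:Label resource carrying a
    skosxl:literalForm; here the literal forms are flattened to strings.
    """
    iri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)


def labels_to_search(concept: Concept, use_skos: bool) -> list[str]:
    # Without lexicon support only the preferred label is looked up.
    return [concept.pref_label] + (concept.alt_labels if use_skos else [])


def found_concepts(text: str, concepts: list[Concept], use_skos: bool) -> set[str]:
    """Return the identifiers of concepts whose labels occur in the text."""
    lowered = text.lower()
    found: set[str] = set()
    for concept in concepts:
        for label in labels_to_search(concept, use_skos):
            # Case-insensitive, whole-word match of the label.
            if re.search(r"\b" + re.escape(label.lower()) + r"\b", lowered):
                found.add(concept.iri)
                break
    return found


# Hypothetical concepts and policy sentence, for illustration only.
concepts = [
    Concept("ex:Controller", "controller"),
    Concept("ex:DataSubject", "data subject", ["user", "customer"]),
    Concept("ex:Consent", "consent"),
    Concept("ex:RightToErasure", "right to erasure",
            ["deletion of your data", "right to be forgotten"]),
]
policy = ("Acme is the controller of your data. We ask each user for consent "
          "and honour requests for deletion of your data.")

print(len(found_concepts(policy, concepts, use_skos=False)))  # 2
print(len(found_concepts(policy, concepts, use_skos=True)))   # 4
```

With only preferred labels, "user" and "deletion of your data" go unrecognised; the alternative labels let the same text be linked to two more concepts, which is the kind of increase in found concepts that lexicon enrichment aims at.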
Notes
1. https://www.w3.org/TR/sparql11-query/, last accessed 2020/06/19.
2. https://www.w3.org/TR/rdf11-concepts/, last accessed 2020/06/19.
3. https://www.w3.org/TR/skos-reference/skos-xl.html, last accessed 2020/06/19.
4. PrOnto reuses existing ontologies: ALLOT [4], FRBR [19], LKIF [6] (in particular, we use lkif:Agent to model lkif:Organization, lkif:Person and lkif:Role [6]), the Publishing Workflow Ontology (PWO) [13], Time-indexed Value in Context (TVC) and Time Interval [30]. With this work we also include SKOS-XL [5, 8].
5. Rover, Parkclick, Springer, Zalando, Louis Vuitton, Burger King, Microsoft-Skype, Lufthansa, Booking, Zurich Insurance.
6. https://spacy.io, last accessed 2020/06/19.
7. https://gitlab.com/CIRSFID/un-challange-2019, last accessed 2020/06/19.
8. https://www.betterinternetforkids.eu/web/portal/practice/awareness/detail?articleId=3017751, last accessed 2020/06/19.
9. COM (2019) 250 final “data which were initially personal data, but were later made anonymous. The ‘anonymisation’ of personal data is different to pseudonymisation (see above), as properly anonymised data cannot be attributed to a specific person, not even by use of additional data and are therefore non-personal data”.
10. Recital 26 GDPR “5. The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. 6. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes”.
11. https://www.specialprivacy.eu/, last accessed 2020/06/19.
12. https://publications.europa.eu/en/web/eu-vocabularies/th-dataset/-/resource/dataset/eurovoc, last accessed 2020/06/19.
13. https://iate.europa.eu/, last accessed 2020/06/19.
14. https://www.w3.org/TR/skos-reference/skos-xl.html, last accessed 2020/06/19.
References
Angeli, G., Premkumar, M.J.J., Manning, C.D.: Leveraging linguistic structure for open domain information extraction. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1: Long Papers, pp. 344–354 (2015)
Ashley, K.D.: Artificial intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age. Cambridge University Press, Cambridge (2017)
Bandeira, J., Bittencourt, I.I., Espinheira, P., Isotani, S.: FOCA: a methodology for ontology evaluation. Eprint ArXiv (2016)
Barabucci, G., Cervone, L., Di Iorio, A., Palmirani, M., Peroni, S., Vitali, F.: Managing semantics in XML vocabularies: an experience in the legal and legislative domain. In: Proceedings of Balisage: The Markup Conference, vol. 5 (2010)
Bosque-Gil, J., Gracia, J., Montiel-Ponsoda, E.: Towards a module for lexicography in OntoLex. In: Proceedings of the LDK Workshops: OntoLex, TIAD and Challenges for Wordnets at 1st Language Data and Knowledge Conference (LDK 2017), Galway, Ireland, vol. 1899, pp. 74–84. CEUR-WS (2017)
Breuker, J., et al.: OWL Ontology of Basic Legal Concepts (LKIF-Core), Deliverable No. 1.4. IST-2004-027655 ESTRELLA: European project for Standardised Transparent Representations in order to Extend Legal Accessibility (2007)
Cer, D., et al.: Universal sentence encoder. arXiv preprint arXiv:1803.11175 (2018)
Declerck, T., Egorova, K., Schnur, E.: An integrated formal representation for terminological and lexical data included in classification schemes. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) (2018)
Del Corro, L., Gemulla, R.: ClausIE: clause-based open information extraction. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 355–366. ACM (2013)
Fernández-Barrera, M., Sartor, G.: The legal theory perspective: doctrinal conceptual systems vs. computational ontologies. In: Sartor, G., Casanovas, P., Biasiotti, M., Fernández-Barrera, M. (eds.) Approaches to Legal Ontologies. Law, Governance and Technology Series, vol. 1, pp. 15–47. Springer, Dordrecht (2011). https://doi.org/10.1007/978-94-007-0120-5_2
Gangemi, A., Guarino, N., Masolo, C., Oltramari, A., Schneider, L.: Sweetening ontologies with DOLCE. In: Gómez-Pérez, A., Benjamins, V.R. (eds.) EKAW 2002. LNCS (LNAI), vol. 2473, pp. 166–181. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45810-7_18
Gangemi, A., Peroni, S., Shotton, D., Vitali, F.: The publishing workflow ontology (PWO). Semant. Web 8, 703–718 (2017). https://doi.org/10.3233/SW-160230
Gangemi, A., Presutti, V., Reforgiato Recupero, D., Nuzzolese, A.G., Draicchio, F., Mongiovì, M.: Semantic web machine reading with FRED. Semant. Web 8(6), 873–893 (2017)
Guarino, N., Welty, C.A.: An overview of OntoClean. In: Staab, S., Studer, R. (eds.) Handbook on Ontologies. International Handbooks on Information Systems, pp. 151–171. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24750-0_8
Hitzler, P., Gangemi, A., Janowicz, K., Krisnadhi, A. (eds.): Ontology Engineering with Ontology Design Patterns: Foundations and Applications. Studies on the Semantic Web. IOS Press, Amsterdam (2016)
GConsent ontology. http://openscience.adaptcentre.ie/ontologies/GConsent/docs/ontology. Accessed 19 June 2020
OntoLex-Lemon model. http://www.w3.org/2016/05/ontolex. Accessed 19 June 2020
Data Privacy Vocabulary (DPV). https://www.w3.org/ns/dpv#data-controller. Accessed 19 June 2020
IFLA Study Group on the Functional Requirements for Bibliographic Records: Functional Requirements for Bibliographic Records. IFLA Series on Bibliographic Control. De Gruyter Saur (1996)
Liebwald, D.: Law’s capacity for vagueness. International Journal for the Semiotics of Law-Revue internationale de Sémiotique juridique 26(2), 391–423 (2012)
Lockard, C., Shiralkar, P., Dong, X.L.: OpenCeres: when open information extraction meets the semi-structured web. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), pp. 3047–3056 (2019)
McCrae, J., Spohr, D., Cimiano, P.: Linking lexical resources and ontologies on the semantic web with lemon. In: Antoniou, G., et al. (eds.) ESWC 2011. LNCS, vol. 6643, pp. 245–259. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21034-1_17
Oltramari, A., et al.: PrivOnto: a semantic framework for the analysis of privacy policies. Semant. Web, 1–19 (2016)
Palmirani, M., Bincoletto, G., Leone, V., Sapienza, S., Sovrano, F.: PrOnto ontology refinement through open knowledge extraction. In: JURIX 2019 Proceedings, pp. 205–210 (2019)
Palmirani, M., Governatori, G.: Modelling legal knowledge for GDPR compliance checking. In: JURIX 2018 Proceedings, pp. 101–110 (2018)
Palmirani, M., Martoni, M., Rossi, A., Bartolini, C., Robaldo, L.: PrOnto: privacy ontology for legal reasoning. In: Kő, A., Francesconi, E. (eds.) EGOVIS 2018. LNCS, vol. 11032, pp. 139–152. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98349-3_11
Palmirani, M., Martoni, M., Rossi, A., Bartolini, C., Robaldo, L.: Legal ontology for modelling GDPR concepts and norms. In: JURIX 2018 Proceedings, pp. 91–100 (2018)
Palmirani, M., Martoni, M., Rossi, A., Bartolini, C., Robaldo, L.: PrOnto: privacy ontology for legal compliance. In: Proceedings of the 18th European Conference on Digital Government (ECDG 2018), Reading, UK, pp. 142–151. Academic Conferences and Publishing International Limited (2018)
Pandit, H.J., Fatema, K., O’Sullivan, D., Lewis, D.: GDPRtEXT - GDPR as a linked data resource. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 481–495. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_31
Pandit, H.J., Lewis, D.: Modelling provenance for GDPR compliance using linked open data vocabularies. In: Proceedings of the 5th Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2017) co-located with the 16th International Semantic Web Conference (ISWC 2017) (2017)
Peroni, S., Palmirani, M., Vitali, F.: UNDO: the united nations system document ontology. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10588, pp. 175–183. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68204-4_18
Rossi, A., Palmirani, M.: DaPIS: an ontology-based data protection icon set. In: Peruginelli, G., Faro, S. (eds.) Knowledge of the Law in the Big Data Age. Frontiers in Artificial Intelligence and Applications, vol. 317. IOS Press (2019)
Roussey, C., Pinet, F., Kang, M.A., Corcho, O.: An introduction to ontologies and ontology engineering. In: Falquet, G., Métral, C., Teller, J., Tweed, C. (eds.) Ontologies in Urban Development Projects. Advanced Information and Knowledge Processing, vol. 1, pp. 9–38. Springer, London (2011). https://doi.org/10.1007/978-0-85729-724-2_2
Sovrano, F., Palmirani, M., Vitali, F.: Deep learning based multi-label text classification of UNGA resolutions. arXiv preprint arXiv:2004.03455 (2020)
van Opijnen, M., Santos, C.: On the concept of relevance in legal information retrieval. Artif. Intell. Law 25(1), 65–87 (2017). https://doi.org/10.1007/s10506-017-9195-8
Welty, C., Murdock, J.W.: Towards knowledge acquisition from information extraction. In: Cruz, I., et al. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 709–722. Springer, Heidelberg (2006). https://doi.org/10.1007/11926078_51
Wilson, S., et al.: Analyzing privacy policies at scale: from crowdsourcing to automated annotations. ACM Trans. Web 13, 1 (2018)
Acknowledgements
This work was partially supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 690974 “MIREL: MIning and REasoning with Legal texts”.
Appendix
This appendix provides additional data (technical results and measurements) from the experiments described in this paper, namely the statistics obtained on both the first and the second set of privacy policies.
First Set of Privacy Policies (Development Set)
| PrOnto version | SKOS support | Found ontology concepts | Ontology concepts | Presence | Accuracy gain |
|---|---|---|---|---|---|
| 8 | No | 87 | 139 | 62.65% | 0% |
| 9 | No | 111 | 172 | 64.91% | 27.58% |
| 9 | Yes | 123 | 172 | 71.92% | 41.37% |
Second Set of Privacy Policies (Test Set)
| PrOnto version | SKOS support | Found ontology concepts | Ontology concepts | Presence | Accuracy gain |
|---|---|---|---|---|---|
| 8 | No | 97 | 139 | 69.78% | 0% |
| 9 | No | 119 | 172 | 69.59% | 22.68% |
| 9 | Yes | 129 | 172 | 75.43% | 32.98% |
Where:
- “Accuracy Gain” is computed as (x2 - x1)/x1 and expressed as a percentage, where x1 is the number of “Found Ontology Concepts” with PrOnto v8 without SKOS support on the same set of policies, and x2 is the number of “Found Ontology Concepts” for the PrOnto version and SKOS setting being evaluated.
- “Presence” is the ratio of “Found Ontology Concepts” to “Ontology Concepts”.
- “Ontology Concepts” is the total number of concepts in the ontology.
- “Found Ontology Concepts” is the number of ontology concepts identified by the OKE tool in the set of policies; it can never exceed “Ontology Concepts”.
- “SKOS Support” indicates whether SKOS-XL lexicon support was used.
- “PrOnto Version” indicates the version of PrOnto.
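The two derived metrics defined above can be sketched as follows. The function names are illustrative, not taken from the paper's tooling. The reported accuracy gains appear to be truncated rather than rounded to two decimals (for example, 32/97 = 0.32989… is reported as 32.98 rather than 32.99), so the sketch truncates as well; exact fractions avoid floating-point artefacts at the truncation boundary.

```python
import math
from fractions import Fraction


def truncated_pct(value: Fraction) -> str:
    """Format a ratio as a percentage truncated (not rounded) to two decimals."""
    hundredths = math.trunc(value * 10000)  # percentage multiplied by 100
    sign = "-" if hundredths < 0 else ""
    hundredths = abs(hundredths)
    return f"{sign}{hundredths // 100}.{hundredths % 100:02d}%"


def presence(found: int, total: int) -> Fraction:
    """Share of the ontology's concepts identified in the set of policies."""
    if not 0 <= found <= total:
        raise ValueError("found concepts must lie between 0 and the ontology size")
    return Fraction(found, total)


def accuracy_gain(baseline_found: int, found: int) -> Fraction:
    """(x2 - x1) / x1, with x1 the PrOnto v8 (no SKOS) count on the same policy set."""
    return Fraction(found - baseline_found, baseline_found)


# Found-concept counts on the test set, taken from the table above.
baseline_v8 = 97
v9_with_skos = 129
print(truncated_pct(accuracy_gain(baseline_v8, v9_with_skos)))  # 32.98%
print(truncated_pct(presence(baseline_v8, 139)))                # 69.78%
```

Because the baseline x1 is taken per policy set, the development-set gains use 87 as the denominator and the test-set gains use 97.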
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Palmirani, M., Bincoletto, G., Leone, V., Sapienza, S., Sovrano, F. (2020). Hybrid Refining Approach of PrOnto Ontology. In: Kő, A., Francesconi, E., Kotsis, G., Tjoa, A., Khalil, I. (eds) Electronic Government and the Information Systems Perspective. EGOVIS 2020. Lecture Notes in Computer Science, vol. 12394. Springer, Cham. https://doi.org/10.1007/978-3-030-58957-8_1
DOI: https://doi.org/10.1007/978-3-030-58957-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58956-1
Online ISBN: 978-3-030-58957-8
eBook Packages: Computer Science, Computer Science (R0)