research-article

Visionary: a framework for analysis and visualization of provenance data

Published: 01 February 2022

Abstract

Provenance is recognized as a central challenge in establishing reliability and providing security in computational systems. In scientific workflows, provenance is considered essential to support the reproducibility of experiments, the interpretation of results, and problem diagnosis. We consider that these requirements also apply to new application domains, such as software processes and IoT. However, a better understanding and use of provenance data requires efficient and user-friendly mechanisms. Ontologies, complex networks, and software visualization can help in this process by generating new data insights and strategic information for decision-making. This paper presents the Visionary framework, designed to assist in the understanding and use of provenance data through ontologies, complex network analysis, and software visualization techniques. The framework captures provenance data and generates new information using ontologies and structural analysis of the provenance graph. The visualization presents and highlights the inferences and results obtained from the data analysis. Visionary is an application domain-free framework that can be adapted to any system that uses the PROV provenance model. Evaluations were carried out, and some evidence was found that the framework assists in the understanding and analysis of provenance data when decision-making is needed.
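
To make the two analysis steps named in the abstract more concrete, the following is a minimal sketch, not Visionary's actual implementation. It assumes provenance is available as plain (subject, relation, object) triples using PROV relation names; the node names (`raw_data`, `cleaning`, `alice`, and so on) and both helper functions are hypothetical. The first helper derives new information with a single hand-written rule based on the PROV notion of communication (an activity that used an entity generated by another activity was informed by it), standing in for ontology-based inference. The second computes node degree as a very simple structural measure of the provenance graph.

```python
from collections import defaultdict

# Hypothetical toy provenance data as (subject, relation, object) triples.
# Relation names follow PROV; node names are illustrative only.
TRIPLES = [
    ("cleaning", "used", "raw_data"),
    ("clean_data", "wasGeneratedBy", "cleaning"),
    ("analysis", "used", "clean_data"),
    ("report", "wasGeneratedBy", "analysis"),
    ("cleaning", "wasAssociatedWith", "alice"),
    ("analysis", "wasAssociatedWith", "alice"),
]


def infer_was_informed_by(triples):
    """Derive wasInformedBy edges: a2 used e, and e wasGeneratedBy a1."""
    # Map each entity to the activity that generated it.
    generated_by = {s: o for s, r, o in triples if r == "wasGeneratedBy"}
    inferred = []
    for s, r, o in triples:
        if r == "used" and o in generated_by:
            inferred.append((s, "wasInformedBy", generated_by[o]))
    return inferred


def degree(triples):
    """Count how many edges touch each node (in-degree plus out-degree)."""
    counts = defaultdict(int)
    for s, _, o in triples:
        counts[s] += 1
        counts[o] += 1
    return dict(counts)


if __name__ == "__main__":
    print(infer_was_informed_by(TRIPLES))
    # Nodes sorted by degree, most connected first.
    for node, deg in sorted(degree(TRIPLES).items(), key=lambda kv: -kv[1]):
        print(node, deg)
```

In a real setting the inference step would more plausibly be delegated to an OWL reasoner over PROV-O, and the structural step to a graph library with richer centrality measures; the sketch only shows the shape of the data flow.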

Cited By

  • (2024) Prov-Dominoes. Expert Systems with Applications: An International Journal 245:C. doi:10.1016/j.eswa.2023.123030. Online publication date: 2-Jul-2024
  • (2024) Optimizing data regeneration and storage with data dependency for cloud scientific workflow systems. Expert Systems with Applications: An International Journal 238:PD. doi:10.1016/j.eswa.2023.121984. Online publication date: 15-Mar-2024
  • (2024) A Data Model of a Data Lineage Management System for Database Repair and Simulation. Information Integration and Web Intelligence, pp 243-248. doi:10.1007/978-3-031-78093-6_22. Online publication date: 1-Dec-2024

Published In

Knowledge and Information Systems, Volume 64, Issue 2
Feb 2022
294 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 February 2022
Accepted: 10 December 2021
Revision received: 08 December 2021
Received: 16 March 2020

Author Tags

  1. Provenance data
  2. Ontologies
  3. Complex networks
  4. Software visualization
  5. Scientific software
  6. Software process

Qualifiers

  • Research-article

Funding Sources

  • CNPq
  • FAPEMIG

