Abstract
In this paper, we propose a new data model that uses data lineage, which represents the relationships between data and their histories, for data verification. In recent years, many industries have been required to make the processes applied to their data verifiable, yet current data management systems are inadequate: they do not distinguish between data flow and data dependencies, and they offer little support for machine learning models. Our data model for managing data lineage metadata is based on the property graph model. It supports machine learning models, which must be treated both as operations on data and as data derived from training datasets. It also supports database repair and database simulation, which serve to maintain the integrity of data and to leverage past data for future use.
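As a rough illustration of the idea, the following minimal Python sketch stores lineage metadata as a small property graph. It is not the model defined in the paper: the class `LineageGraph`, the node labels `Data` and `Operation`, and the edge kinds `DERIVED_FROM` (data dependency) and `FLOWS_TO` (data flow) are hypothetical names chosen for this example. It shows only how a machine learning model might carry both roles at once and how dependencies can be kept apart from data flow.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set                      # e.g. {"Data"}, {"Operation"}, or both
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    kind: str                        # hypothetical kinds: "DERIVED_FROM", "FLOWS_TO"


class LineageGraph:
    """Tiny in-memory property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = Node(node_id, set(labels), props)

    def add_edge(self, src, dst, kind):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(src, dst, kind))

    def reachable(self, node_id, kind):
        """Return every node reachable from node_id via edges of one kind."""
        seen, stack = set(), [node_id]
        while stack:
            cur = stack.pop()
            for e in self.edges:
                if e.src == cur and e.kind == kind and e.dst not in seen:
                    seen.add(e.dst)
                    stack.append(e.dst)
        return seen


g = LineageGraph()
g.add_node("train_set", {"Data"})
# Assumption: a model is labelled as both data and operation.
g.add_node("model_v1", {"Data", "Operation"})
g.add_node("input_batch", {"Data"})
g.add_node("predictions", {"Data"})

# Dependencies: what each item was derived from.
g.add_edge("model_v1", "train_set", "DERIVED_FROM")
g.add_edge("predictions", "model_v1", "DERIVED_FROM")
g.add_edge("predictions", "input_batch", "DERIVED_FROM")
# Data flow: where records physically moved.
g.add_edge("input_batch", "predictions", "FLOWS_TO")

dependencies = g.reachable("predictions", "DERIVED_FROM")
flow_targets = g.reachable("input_batch", "FLOWS_TO")
```

In this sketch, tracing `DERIVED_FROM` edges from `predictions` reaches the training dataset through the model, whereas tracing `FLOWS_TO` edges from `input_batch` reaches only `predictions`. Keeping the two edge kinds separate is one possible way to express the distinction between data dependencies and data flow that the abstract describes.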
Acknowledgements
This work was supported by JST, CREST Grant Number JPMJCR22M2, and JSPS KAKENHI Grant Numbers JP23K28091, JP23K28383, Japan.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wong, W.J., Yasuda, K., Chang, Q., Miyazaki, J. (2025). A Data Model of a Data Lineage Management System for Database Repair and Simulation. In: Delir Haghighi, P., Greguš, M., Kotsis, G., Khalil, I. (eds) Information Integration and Web Intelligence. iiWAS 2024. Lecture Notes in Computer Science, vol 15343. Springer, Cham. https://doi.org/10.1007/978-3-031-78093-6_22
DOI: https://doi.org/10.1007/978-3-031-78093-6_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78092-9
Online ISBN: 978-3-031-78093-6
eBook Packages: Computer Science, Computer Science (R0)