A Data Model of a Data Lineage Management System for Database Repair and Simulation

  • Conference paper
Information Integration and Web Intelligence (iiWAS 2024)

Abstract

In this paper, we propose a new data model that uses data lineage, which represents the relationships between data and their histories, to verify data. In recent years, many industries have been required to make their data-handling processes verifiable, yet current data management systems are inadequate: they do not distinguish between data flow and data dependencies, and they offer little support for machine learning models. Our data model for managing data lineage metadata is based on the property graph model. It supports machine learning models, which must be treated both as operations on data and as data derived from training datasets. It also supports database repair and database simulation, which serve to maintain the integrity of data and to leverage past data for future use.
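The ideas in the abstract can be illustrated with a minimal sketch. It is not the data model proposed in the paper, whose details are not given here. It is a toy in-memory property graph in Python, and every name in it (`LineageGraph`, `affected_by`, the edge labels `DEPENDS_ON` and `FLOWS_FROM`, and the sample node ids) is an assumption made for illustration. It shows one possible reading of two points: dependency edges and flow edges are stored under different labels so they can be queried separately, and a trained model appears as a data node derived from its training set while also being named as the operation that produced a later result.

```python
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    """Toy property graph: nodes and edges both carry key-value properties."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: list = field(default_factory=list)  # (source, target, label, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, target, label, **props):
        # Hypothetical convention: an edge points from the newer item back to its origin.
        self.edges.append((source, target, label, props))

    def affected_by(self, node_id, label="DEPENDS_ON"):
        """Return all nodes that reach node_id by following `label` edges, transitively."""
        affected, frontier = set(), [node_id]
        while frontier:
            current = frontier.pop()
            for source, target, edge_label, _ in self.edges:
                if edge_label == label and target == current and source not in affected:
                    affected.add(source)
                    frontier.append(source)
        return affected


g = LineageGraph()
for node_id in ("raw_sales", "clean_sales", "forecast", "archive_copy"):
    g.add_node(node_id, kind="data")
# The model is stored as data, and is flagged as something that can also act as an operation.
g.add_node("model_v1", kind="data", usable_as_operation=True)

# Data dependencies: the source was derived from the target.
g.add_edge("clean_sales", "raw_sales", "DEPENDS_ON", operation="deduplicate")
g.add_edge("model_v1", "clean_sales", "DEPENDS_ON", operation="train")
g.add_edge("forecast", "model_v1", "DEPENDS_ON", operation="apply model_v1")

# Data flow: the source received a copy of the target, kept under a separate label.
g.add_edge("archive_copy", "raw_sales", "FLOWS_FROM", operation="copy")

# If raw_sales needs repair, these derived items would have to be re-examined.
print(sorted(g.affected_by("raw_sales")))  # ['clean_sales', 'forecast', 'model_v1']
# The flow relation is queried independently of the dependency relation.
print(sorted(g.affected_by("raw_sales", label="FLOWS_FROM")))  # ['archive_copy']
```

A linear scan over the edge list keeps the sketch short; a real lineage store would index edges by target node or delegate traversal to a graph database.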

Acknowledgements

This work was supported by JST, CREST Grant Number JPMJCR22M2, and JSPS KAKENHI Grant Numbers JP23K28091, JP23K28383, Japan.

Author information

Correspondence to Jun Miyazaki.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wong, W.J., Yasuda, K., Chang, Q., Miyazaki, J. (2025). A Data Model of a Data Lineage Management System for Database Repair and Simulation. In: Delir Haghighi, P., Greguš, M., Kotsis, G., Khalil, I. (eds) Information Integration and Web Intelligence. iiWAS 2024. Lecture Notes in Computer Science, vol 15343. Springer, Cham. https://doi.org/10.1007/978-3-031-78093-6_22

  • DOI: https://doi.org/10.1007/978-3-031-78093-6_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78092-9

  • Online ISBN: 978-3-031-78093-6

  • eBook Packages: Computer Science, Computer Science (R0)
