research-article

Learning and Deducing Temporal Orders

Published: 01 April 2023

Abstract

This paper studies how to determine temporal orders on attribute values in a set of tuples that pertain to the same entity, in the absence of complete timestamps. We propose a creator-critic framework to learn and deduce temporal orders by combining deep learning and rule-based deduction, referred to as GATE (Get the lATEst). The creator of GATE trains a ranking model via deep learning, to learn temporal orders and rank attribute values based on correlations among the attributes. The critic then validates the temporal orders learned and deduces more ranked pairs by chasing the data with currency constraints; it also provides augmented training data as feedback for the creator to improve the ranking in the next round. The process proceeds until the temporal order obtained becomes stable. Using real-life and synthetic datasets, we show that GATE is able to determine temporal orders with F-measure above 80%, improving deep learning by 7.8% and rule-based methods by 34.4%.
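The creator-critic loop described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`creator`, `critic`, `gate_like_loop`) are hypothetical, the "creator" is a simple counting heuristic standing in for the deep ranking model, and currency constraints are reduced to fixed "older value, newer value" pairs that the "critic" closes under transitivity as a stand-in for the chase.

```python
from itertools import product


def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def creator(values, training_pairs):
    """Stand-in ranker (NOT deep learning): score = times newer - times older."""
    score = {v: 0 for v in values}
    for older, newer in training_pairs:
        score[older] -= 1
        score[newer] += 1
    # Propose (a, b), read as "a is older than b", when a scores lower than b.
    return {(a, b) for a in values for b in values if score[a] < score[b]}


def critic(proposed, constraints):
    """Reject proposals that contradict the constraints, then deduce more pairs."""
    accepted = transitive_closure(constraints)  # assumed acyclic
    for a, b in sorted(proposed):
        if (b, a) not in accepted:  # consistent with everything accepted so far
            accepted = transitive_closure(accepted | {(a, b)})
    return accepted


def gate_like_loop(values, seed_pairs, constraints, max_rounds=10):
    """Alternate creator and critic until the temporal order stops changing."""
    training = set(seed_pairs)
    order = set()
    for _ in range(max_rounds):
        new_order = critic(creator(values, training), constraints)
        if new_order == order:  # stable: no new ranked pairs
            break
        order = new_order
        training |= new_order  # critic's output is fed back as training data
    return order


statuses = ["single", "married", "divorced"]
order = gate_like_loop(
    statuses,
    seed_pairs={("single", "married")},      # one labeled pair
    constraints={("married", "divorced")},   # hypothetical currency constraint
)
print(sorted(order))
```

In this toy run the creator's first proposal includes the pair ("divorced", "married"), which the critic rejects because it contradicts the constraint. The validated pairs are then fed back as training data, and the second round reproduces the same order, so the loop stops.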

Published In

Proceedings of the VLDB Endowment  Volume 16, Issue 8
April 2023
257 pages
ISSN:2150-8097

Publisher

VLDB Endowment
