research-article

Enriching Relations with Additional Attributes for ER

Authors:

Min Xie

Proceedings of the VLDB Endowment, Volume 17, Issue 11

Pages 3109 - 3123

https://doi.org/10.14778/3681954.3681987

Published: 30 August 2024

Abstract

This paper studies a new problem, relation enrichment. Given a relation D of schema R and a knowledge graph G with overlapping information, the goal is to identify a small number of relevant features from G and extend schema R with them as additional attributes, so as to maximally improve the accuracy of resolving the entities represented by the tuples of D. We formulate the enrichment problem and show that it is intractable. Nonetheless, we propose a method to extract features from G that are diverse from the existing attributes of R, minimize null values, and reduce the false positives and false negatives of entity resolution (ER) models. The method links tuples and vertices that refer to the same entity, learns a robust policy for extracting attributes via reinforcement learning, and jointly trains the policy and the ER models. Moreover, we develop algorithms for enriching D and for doing so incrementally. Using real-life data, we experimentally verify that relation enrichment improves the accuracy of ER by more than 15.4 percentage points when 5 attributes are added, and by up to 33%.
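To make the selection criteria in the abstract concrete, the following is a minimal sketch of choosing k additional attributes that have few nulls and differ from the attributes already in R. It is not the paper's method: the reinforcement-learning policy and the joint training with ER models are replaced here by a simple greedy heuristic, and the impact on ER accuracy is not modeled. It assumes that tuples of D have already been linked to vertices of G, so that each candidate feature is a column aligned with the rows of D. All names (`null_fraction`, `overlap`, `score`, `enrich_greedy`) are hypothetical.

```python
from typing import Dict, List, Optional

# A column holds one value per tuple of D; None stands for a null.
Column = List[Optional[str]]


def null_fraction(col: Column) -> float:
    """Share of tuples that have no value for this attribute."""
    return sum(v is None for v in col) / len(col) if col else 1.0


def overlap(a: Column, b: Column) -> float:
    """Share of rows where both columns hold the same non-null value.

    A crude redundancy proxy (an assumption of this sketch).
    """
    if not a:
        return 0.0
    same = sum(x is not None and x == y for x, y in zip(a, b))
    return same / len(a)


def score(candidate: Column, existing: List[Column]) -> float:
    """Reward high coverage (few nulls) and low redundancy with existing columns."""
    redundancy = max((overlap(candidate, col) for col in existing), default=0.0)
    return (1.0 - null_fraction(candidate)) * (1.0 - redundancy)


def enrich_greedy(
    relation: Dict[str, Column],
    candidates: Dict[str, Column],
    k: int,
) -> List[str]:
    """Greedily pick up to k candidate attributes to add to the schema."""
    chosen: List[str] = []
    current = list(relation.values())
    remaining = dict(candidates)
    for _ in range(min(k, len(remaining))):
        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(remaining), key=lambda name: score(remaining[name], current))
        chosen.append(best)
        # Later picks must also be diverse from attributes chosen so far.
        current.append(remaining.pop(best))
    return chosen
```

Calling `enrich_greedy(relation, candidates, k=5)` returns the names of the selected attributes in the order they were picked. A candidate that merely copies an existing attribute scores zero, and a sparsely populated candidate ranks below a fully populated one.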

Published In

Proceedings of the VLDB Endowment Volume 17, Issue 11

July 2024

1039 pages

Editors: Meihui Zhang (Beijing Institute of Technology), Cyrus Shahabi (University of Southern California)

Publisher

VLDB Endowment

Badges

Artifacts Available / v1.1
