research-article

CoTel: Ontology-Neural Co-Enhanced Text Labeling

Authors:

Guidong Zheng

WWW '23: Proceedings of the ACM Web Conference 2023

Pages 1897–1906

https://doi.org/10.1145/3543507.3583533

Published: 30 April 2023

Abstract

The success of many web services relies on large-scale, high-quality, domain-specific labeled datasets. The shortage of suitable public datasets motivates us to reduce the cost of data labeling while maintaining high accuracy in support of intelligent web applications. Rule-based and learning-based methods are common labeling techniques. In this work, we study how to combine rule-based and learning-based methods for resource-efficient text labeling. We propose CoTel, the first ontology-neural co-enhanced framework for text labeling, with critical ontology extraction in its rule-based module and ontology-enhanced loss prediction in its learning-based module. CoTel integrates explicit labeling rules and implicit labeling models and lets each enhance the other, improving resource efficiency in text labeling tasks. We evaluate CoTel on both public datasets and real applications across three different tasks. Compared with the baseline, CoTel reduces the time cost by 64.75% (a 2.84× speedup) and the number of labeling operations by 62.07%.
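The two time-cost figures in the abstract are related by simple arithmetic: if labeling time falls by a fraction r, the remaining time is (1 - r) of the baseline, so the speedup is 1 / (1 - r). The short sketch below checks that a 64.75% reduction corresponds to the stated 2.84× speedup. The function name is hypothetical and only illustrates the conversion; it is not part of CoTel.

```python
def speedup_from_time_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in time cost into a speedup factor.

    If the new time is (1 - r) of the baseline time, the speedup is 1 / (1 - r).
    """
    if not 0 <= reduction_pct < 100:
        # A 100% reduction would mean zero remaining time (infinite speedup).
        raise ValueError("reduction must be in [0, 100)")
    remaining_fraction = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining_fraction


# Time-cost reduction reported in the abstract.
print(f"{speedup_from_time_reduction(64.75):.2f}x")  # prints "2.84x"
```

The 62.07% figure refers to the number of labeling operations rather than time, so it is not converted into a speedup here.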


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
    2. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
      1. Neural networks
2. Information systems

Index terms have been assigned to the content through auto-classification.

Information

Published In

WWW '23: Proceedings of the ACM Web Conference 2023

April 2023

4293 pages

ISBN: 9781450394161

DOI: 10.1145/3543507

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Key R&D Program of China
China National Natural Science Foundation
Fundamental Research Funds for the Central Universities

Conference

WWW '23

Sponsor:

SIGWEB

WWW '23: The ACM Web Conference 2023

April 30 – May 4, 2023

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total Citations: 0
Total Downloads: 242
Downloads (last 12 months): 87
Downloads (last 6 weeks): 4

Reflects downloads up to 05 Jan 2025
