DOI: 10.1145/3543507.3583533
research-article

CoTel: Ontology-Neural Co-Enhanced Text Labeling

Published: 30 April 2023

Abstract

The success of many web services relies on large-scale, domain-specific, high-quality labeled datasets. The scarcity of public datasets motivates us to reduce the cost of data labeling while maintaining high accuracy, in support of intelligent web applications. Rule-based and learning-based methods are the two common techniques for labeling. In this work, we study how to combine rule-based and learning-based methods for resource-efficient text labeling. We propose CoTel, the first ontology-neural co-enhanced framework for text labeling. CoTel introduces critical ontology extraction in the rule-based module and ontology-enhanced loss prediction in the learning-based module. It integrates explicit labeling rules with implicit labeling models so that each helps the other, improving resource efficiency in text labeling tasks. We evaluate CoTel on both public datasets and real applications across three different tasks. Compared with the baseline, CoTel reduces the time cost by 64.75% (a 2.84× speedup) and the amount of labeling by 62.07%.
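The two figures quoted for time cost describe the same result in different units. As a quick check, the sketch below converts a percentage reduction in time into a speedup factor, assuming the speedup is defined as baseline time divided by reduced time. The function name `speedup_from_reduction` is a hypothetical helper introduced here for illustration, not something from the paper.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in time cost into a speedup factor.

    Assumes speedup = baseline_time / new_time, where
    new_time = baseline_time * (1 - reduction_pct / 100).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    remaining_fraction = 1 - reduction_pct / 100
    return 1 / remaining_fraction


# A 64.75% reduction leaves 35.25% of the baseline time.
speedup = speedup_from_reduction(64.75)
print(f"{speedup:.2f}x")
```

Under this definition, a 64.75% reduction in time cost corresponds to the 2.84× speedup stated in the abstract.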

Information

Published In

WWW '23: Proceedings of the ACM Web Conference 2023
April 2023
4293 pages
ISBN:9781450394161
DOI:10.1145/3543507
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. active learning
  2. knowledge enhancement
  3. pseudo labeling
  4. text labeling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Key R&D Program of China
  • China National Natural Science Foundation
  • Fundamental Research Funds for the Central Universities

Conference

WWW '23: The ACM Web Conference 2023
April 30 - May 4, 2023
Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 242
  • Downloads (Last 12 months): 87
  • Downloads (Last 6 weeks): 4

Reflects downloads up to 05 Jan 2025
