DOI: 10.1145/2939672.2939727
research-article

CompanyDepot: Employer Name Normalization in the Online Recruitment Industry

Published: 13 August 2016

Abstract

Entity linking maps entity mentions in text to the corresponding entities in a knowledge base (KB) and has many applications in both open-domain and domain-specific settings. For example, in the recruitment domain, linking employer names in job postings or resumes to entities in an employer KB is important to many business applications. In this paper, we focus on this employer name normalization task, which has several unique challenges: handling employer names from both job postings and resumes, leveraging the corresponding location context, and handling name variations, irrelevant input data, and noise in the KB. We present a system called CompanyDepot, which contains a machine-learning-based approach, CompanyDepot-ML, and a heuristic approach, CompanyDepot-H, to address these challenges in three steps: (1) searching for candidate entities based on a customized search engine for the KB; (2) ranking the candidate entities using learning-to-rank methods or heuristics; and (3) validating the top-ranked entity via binary classification or heuristics. While CompanyDepot-ML shows better extensibility and flexibility, CompanyDepot-H serves as a strong baseline and a useful way to collect training data for CompanyDepot-ML. The proposed system achieves 2.5%-21.4% higher coverage at the same precision level compared to an existing system used at CareerBuilder over multiple real-world datasets. Applying the system to a similar task of academic institution name normalization further shows the generalization ability of the method.
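
The three steps named in the abstract (search, rank, validate) can be illustrated with a minimal Python sketch. This is not the CompanyDepot implementation: the toy KB, the `Entity` fields, the token-overlap search, the Jaccard-plus-location score, and the `ACCEPT_THRESHOLD` value are all assumptions made for illustration. A hand-written score stands in for the learning-to-rank model or heuristics, and a fixed threshold stands in for the binary classifier.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Entity:
    entity_id: str
    name: str
    city: str


# Hypothetical toy KB; the real employer KB and its schema are not described here.
KB = [
    Entity("E1", "CareerBuilder", "Chicago"),
    Entity("E2", "Acme Staffing", "Chicago"),
    Entity("E3", "Acme Staffing", "Denver"),
]

ACCEPT_THRESHOLD = 0.5  # assumed cut-off, chosen only for this example


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, kb: List[Entity]) -> List[Entity]:
    """Step 1: keep KB entities that share at least one token with the query."""
    q = tokens(query)
    return [e for e in kb if q & tokens(e.name)]


def score(query: str, city: str, entity: Entity) -> float:
    """Jaccard overlap of name tokens plus a fixed bonus when the city matches."""
    q, n = tokens(query), tokens(entity.name)
    union = q | n
    jaccard = len(q & n) / len(union) if union else 0.0
    bonus = 0.5 if city.lower() == entity.city.lower() else 0.0
    return jaccard + bonus


def rank(query: str, city: str, candidates: List[Entity]) -> List[Tuple[float, Entity]]:
    """Step 2: sort candidates by descending score."""
    scored = [(score(query, city, e), e) for e in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def normalize(query: str, city: str, kb: List[Entity] = KB) -> Optional[str]:
    """Step 3: return the top entity id only if its score passes the threshold."""
    ranked = rank(query, city, search(query, kb))
    if not ranked:
        return None
    top_score, top_entity = ranked[0]
    return top_entity.entity_id if top_score >= ACCEPT_THRESHOLD else None
```

In this sketch, `normalize("Acme Staffing Inc", "Denver")` selects the Denver entity because the location bonus breaks the tie between two identically named entries, while an input such as "Self Employed" yields `None` because no candidate is found. Returning `None` rather than a forced match is how the sketch models the validation step for irrelevant input.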

Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN: 9781450342322
DOI: 10.1145/2939672
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. employer name normalization
  2. entity linking
  3. learning to rank
  4. named entity disambiguation

Qualifiers

  • Research-article

Conference

KDD '16

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2023) Text Classification In The Wild: A Large-Scale Long-Tailed Name Normalization Dataset. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1-5. DOI: 10.1109/ICASSP49357.2023.10096769. Online publication date: 4-Jun-2023
  • (2021) Panacea: Visual exploration system for analyzing trends in annual recruitment using time-varying graphs. PLOS ONE, 16:3, e0247587. DOI: 10.1371/journal.pone.0247587. Online publication date: 1-Mar-2021
  • (2021) TMC 2021: 2021 International Workshop on Talent and Management Computing. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 4169-4170. DOI: 10.1145/3447548.3469465. Online publication date: 14-Aug-2021
  • (2021) KCNet: Kernel-Based Canonicalization Network for Entities in Recruitment Domain. Artificial Neural Networks and Machine Learning – ICANN 2021, pages 157-169. DOI: 10.1007/978-3-030-86340-1_13. Online publication date: 7-Sep-2021
  • (2020) Canonicalizing Organization Names for Recruitment Domain. Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, pages 296-301. DOI: 10.1145/3371158.3371203. Online publication date: 5-Jan-2020
  • (2020) Canonicalizing Knowledge Bases for Recruitment Domain. Advances in Knowledge Discovery and Data Mining, pages 500-513. DOI: 10.1007/978-3-030-47436-2_38. Online publication date: 6-May-2020
  • (2019) The Impact of Person-Organization Fit on Talent Management. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1625-1633. DOI: 10.1145/3292500.3330849. Online publication date: 25-Jul-2019
  • (2018) Lessons Learned from Developing and Deploying a Large-Scale Employer Name Normalization System for Online Recruitment. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 556-565. DOI: 10.1145/3219819.3219842. Online publication date: 19-Jul-2018
  • (2018) Automatically Detecting Errors in Employer Industry Classification Using Job Postings. Data Science and Engineering, 3:3, pages 221-231. DOI: 10.1007/s41019-018-0071-7. Online publication date: 19-Aug-2018
  • (2017) Supporting Employer Name Normalization at both Entity and Cluster Level. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1883-1892. DOI: 10.1145/3097983.3098093. Online publication date: 13-Aug-2017
