DOI: 10.1145/3651781.3651820
research-article
Open access

CapStyleBERT: Incorporating Capitalization and Style Information into BERT for Enhanced resumes parsing

Published: 30 May 2024

Abstract

The growing prevalence of online recruitment has led to a significant accumulation of resumes in recruitment databases. These resumes come in diverse formats, with varied font sizes, colors, and layouts, all intended to capture recruiters' interest. The quality of information extraction from them significantly affects all subsequent tasks, including automated vacancy-profile matching and applicant ranking. Rule-based, grammar-based, and deep-learning-based methods have been introduced to parse and structure resumes accurately; however, these methods tend to overlook the rich information carried by capitalization and style. This paper introduces a new style- and capitalization-aware modification of the BERT model and demonstrates the effectiveness of the proposed method.
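
The abstract does not say how capitalization is encoded, so the following is only a minimal sketch of the kind of signal a lowercasing tokenizer discards, not the authors' method. The label set (`upper`, `lower`, `title`, `mixed`, `none`) and the helper names `cap_feature` and `tag_tokens` are assumptions made for illustration.

```python
from typing import List, Tuple


def cap_feature(token: str) -> str:
    """Classify a token's capitalization pattern (hypothetical label set)."""
    letters = [c for c in token if c.isalpha()]
    if not letters:
        return "none"   # digits or punctuation only
    if all(c.isupper() for c in letters):
        return "upper"  # e.g. section headings such as "EDUCATION"
    if all(c.islower() for c in letters):
        return "lower"
    if letters[0].isupper() and all(c.islower() for c in letters[1:]):
        return "title"  # e.g. names and job titles
    return "mixed"


def tag_tokens(text: str) -> List[Tuple[str, str]]:
    """Pair each lowercased whitespace token with its capitalization label."""
    return [(tok.lower(), cap_feature(tok)) for tok in text.split()]


if __name__ == "__main__":
    for token, label in tag_tokens("WORK EXPERIENCE Senior Developer at iTech 2024"):
        print(token, label)
```

In a model, such labels could be mapped to an extra embedding and combined with the token embeddings, but that design is likewise an assumption here rather than something the abstract states.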

Information & Contributors

Information

Published In

ICSCA '24: Proceedings of the 2024 13th International Conference on Software and Computer Applications
February 2024
395 pages
ISBN:9798400708329
DOI:10.1145/3651781

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICSCA 2024
