Resume Screening Report (1) - Merged
Bachelor of Technology in Computer Science & Engineering
(Assistant Professor)
(DEC 2023)
MAHARAJA AGRASEN INSTITUTE OF TECHNOLOGY
Department of Computer Science and Engineering
CERTIFICATE
This is to certify that this minor project report, “Resume Screening through
Machine Learning and Natural Language Processing”, is submitted by Taniya
Sharma (20214802720), Shivansh Singhal (20514802720), and Anurag Kumar
Thakur (35714802720), who carried out the project work under my supervision.
This study delves into the innovative application of Natural Language Processing (NLP) for
resume screening, revolutionizing the traditional hiring process. With the exponential growth
of digital job applications, organizations face the challenge of efficiently sifting through vast
volumes of resumes to identify the most suitable candidates. Leveraging NLP, this research
introduces a paradigm shift by automating the initial screening phase. The proposed system
employs advanced language models to comprehend and extract valuable information from
resumes, including skills, experience, and qualifications. Machine learning algorithms are then
utilized to evaluate candidates based on predefined criteria, enabling a more objective and
streamlined evaluation process. This approach not only accelerates the screening process but
also enhances the accuracy of candidate shortlisting, reducing human bias. Furthermore, the
study explores the integration of contextual analysis within NLP models to better understand
the nuances of industry-specific language and evolving job market trends. Through an
experimental evaluation, this research validates the effectiveness and efficiency of the NLP-
driven resume screening system, showcasing its potential to significantly optimize recruitment
processes and contribute to more informed and unbiased hiring decisions. In a dynamic and
competitive job market, this NLP-based approach emerges as a game-changer, promising to
reshape how organizations identify and engage top talent.
Chapter - 1: Introduction
1.1 Introduction
In the present system, a candidate must enter every detail of their resume into a manual form,
which takes a large amount of time, and even then candidates are often not satisfied with the
jobs the system recommends for their skills. Consider a ratio of 5:1: of every five people who
get a job, only one is satisfied with it. For example, a good Python developer who is hired by
a company and then assigned to Java work finds their Python skills largely unused. On the other
side, a company with a vacancy wants the best possible candidate for that vacancy. Our system
acts as a handshake between these two entities: the company that wants the best possible
candidate, and the candidate who wants the best possible job for his or her skills and
abilities.
The problem is that present systems are not flexible, efficient, or time-saving. They require
candidates to fill in forms online, and even then the information obtained about a candidate
may not be genuine. Our system instead saves the candidate's time by letting them upload their
resume in whatever format they prefer. Besides the information in the resume, the system also
detects the candidate's activity from their social profiles, which helps identify the best
candidate for a particular job. The candidate is also more likely to be satisfied, because they
are placed with a company that values their skills and abilities. We provide the same kind of
flexibility to the client company.
1.2 Objectives
The primary objectives of this project are multi-faceted, addressing various dimensions of the
hiring process:
In the current job market, many candidates, especially fresh graduates, face challenges
in finding relevant employment. The desperation to secure a job often leads them to
accept positions unrelated to their skills and abilities. This issue not only affects
individuals but also has broader societal implications. Our system aims to address this
challenge by employing machine learning algorithms that optimize the hiring process.
By considering previous results and ranking constraints, our system identifies and
recommends the best-suited candidates for specific job vacancies. This approach
ensures that candidates are matched with positions that align with their skills,
minimizing the likelihood of dissatisfaction and societal pressures.
For client companies, the objective is to build high-performing teams efficiently. Hiring
a candidate with the right skills for a given role is crucial for organizational success.
Our system assists companies in identifying the most suitable candidates based on
predefined constraints and requirements. Whether it's technical skills, experience, or
other criteria, the algorithm optimizes the selection process, ensuring that the chosen
candidate aligns perfectly with the job specifications. This approach enhances the
efficiency of the hiring sector, leading to increased satisfaction for both the client
company and the hired candidate.
1.3 Scope and Considerations
The scope of our resume screening and ranking project is expansive, aiming to revolutionize
the hiring process with a focus on several key elements. Central to our initiative is the
development and implementation of sophisticated machine learning algorithms. These
algorithms leverage historical data and ranking constraints to optimize candidate selection,
ensuring a more precise alignment of skills with job requirements. The project's versatility is
evident in its seamless integration with existing human resources and applicant tracking
systems used by client companies, fostering a smooth transition without disrupting current
workflows. User experience is paramount, with the creation of intuitive interfaces for
candidates and client companies. The system's scalability is a pivotal feature, initially targeting
the Indian I.T sector but designed to extend to various industries, including governmental jobs.
Data security and privacy are prioritized to safeguard sensitive candidate information, adhering
to strict compliance with data protection regulations. Considerations encompass addressing
biases in the hiring process, ensuring fairness and diversity. Continuous improvement
mechanisms, legal and ethical compliance, user training, and a robust feedback system further
underline our commitment to creating an ethical, efficient, and adaptable resume screening and
ranking solution. This comprehensive approach ensures that the project aligns with industry
standards, legal frameworks, and user expectations, ultimately contributing to a more effective
and equitable hiring landscape.
Chapter - 2: Literature Survey
The literature survey undertaken in this research serves as a comprehensive exploration at the
convergence of three pivotal domains: machine learning, natural language processing (NLP),
and resume screening. This investigative journey aims to unravel the intricate relationship
between these domains, shedding light on key language models, innovative transfer learning
strategies, and the latest advancements that collectively shape the landscape of talent
acquisition. By meticulously scrutinizing existing scholarly works, the survey seeks to distill
critical insights, understand foundational theories, and identify the most recent breakthroughs
in the intersection of deep learning and NLP. Specifically, the survey places a keen focus on
language models that are at the forefront of NLP innovation, investigates strategies for
transferring knowledge between these models, and probes into the latest advancements that
hold promise for revolutionizing screening processes. In doing so, the literature survey sets
the stage for informed discussions, providing a solid foundation for the subsequent phases of
this research endeavor.
• KNeighborsClassifier
• OneVsRestClassifier
In the context of our resume screening project, the OneVsRestClassifier plays a crucial
role in enhancing the model's ability to classify candidates across multiple job
categories. This classifier is particularly valuable when candidates possess diverse skill
sets or experiences that may span different roles. The OneVsRestClassifier extends the
project's machine learning capabilities by allowing the model to handle the complexity
of matching candidates to various job requirements simultaneously. For instance, if a
candidate has proficiency in both Java and Python programming languages, the
OneVsRestClassifier enables the model to predict the candidate's suitability for roles
requiring expertise in either language. This flexibility ensures that the screening process
accurately evaluates candidates across multiple dimensions, aligning with the project's
objective of optimizing candidate-job matches and providing a more nuanced and
versatile approach to resume classification within the diverse landscape of the IT sector.
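The Java and Python case described above can be sketched as a multi-label problem. This is a minimal sketch, not the project's actual code: the toy resumes, the role labels, the use of `MultiLabelBinarizer`, and the choice of `n_neighbors=1` are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy resumes and role labels, invented for illustration only.
resumes = [
    "java spring hibernate backend developer",
    "java microservices spring boot rest api",
    "python pandas numpy data analysis",
    "python django flask web developer",
    "java python backend rest api developer",
    "python java spring django full stack",
]
roles = [
    ["java"], ["java"], ["python"], ["python"],
    ["java", "python"], ["java", "python"],
]

# Turn the label lists into one 0/1 column per role.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(roles)

# Represent each resume as a TF-IDF vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(resumes)

# One binary k-nearest-neighbours model is fitted per role column.
model = OneVsRestClassifier(KNeighborsClassifier(n_neighbors=1))
model.fit(X, y)

# A candidate with both skills can be assigned to both roles at once.
new_resume = ["java python backend rest api developer"]
predicted = model.predict(vectorizer.transform(new_resume))
predicted_roles = binarizer.inverse_transform(predicted)
print(predicted_roles)
```

Because each role gets its own binary model, a resume is not forced into a single category, which is the flexibility the paragraph above describes.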
2.2 Architecture
2.2.4 Machine Learning Model
The heart of the architecture involves employing machine learning algorithms, such as a
OneVsRestClassifier, to predict the suitability of candidates for specific job categories. This
model is trained on historical data, considering past hiring outcomes and ranking constraints.
Chapter – 3: Research
3.2.2 This flexibility allows the model to capture nuanced similarities and adapt to the diverse
skill sets prevalent in the IT sector. Its ability to consider the k-nearest neighbors enhances
accuracy in role-specific categorization, making it particularly effective in scenarios where
candidates possess multidimensional skills.
3.2.3 This adaptability and context-awareness set KNeighborsClassifier apart, ensuring a more
nuanced and precise approach to matching candidates with job requirements compared to
models with fixed decision boundaries.
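As a rough illustration of how the k-nearest neighbours drive categorization, the sketch below classifies a resume by a vote among its three closest training resumes. The training texts, the category names, the cosine metric, and `n_neighbors=3` are assumptions chosen for the example, not settings reported by the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy training data, invented for illustration only.
train_texts = [
    "python machine learning pandas scikit-learn",
    "python deep learning tensorflow statistics",
    "machine learning statistics data visualization python",
    "java spring hibernate rest api",
    "java microservices spring boot sql",
    "java j2ee servlet jsp sql",
]
train_labels = ["Data Science"] * 3 + ["Java Developer"] * 3

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# The label of a new resume is decided by a majority vote of its
# 3 nearest training resumes under cosine distance.
knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")
knn.fit(X_train, train_labels)

query = vectorizer.transform(["python pandas statistics machine learning"])
predicted_category = knn.predict(query)[0]

# Inspect which training resumes actually cast the votes.
distances, neighbor_idx = knn.kneighbors(query)
neighbor_labels = [train_labels[i] for i in neighbor_idx[0]]
print(predicted_category, neighbor_labels)
```

Since the decision depends only on nearby examples, adding new labelled resumes changes the local vote without retraining a global boundary.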
3.3 Capabilities and Use Cases:
3.4 Overview of OneVsRestClassifier
The OneVsRestClassifier in our resume screening model is a pivotal component, enabling the
simultaneous classification of candidates across multiple job categories. This approach allows
the model to handle the diverse skill sets and experiences found in resumes, ensuring a nuanced
evaluation. By treating each job category as a separate binary classification task, the
OneVsRestClassifier extends the model's versatility, accurately predicting the relevance of
candidates to various roles. This enhances the overall efficiency of the screening process,
providing a comprehensive and adaptable solution to match candidates with the diverse
requirements of the IT sector.
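The "one binary task per job category" idea can be made concrete by counting the fitted sub-models. The sketch below is illustrative only: the three categories, the sample texts, and the k-nearest-neighbours base estimator with `n_neighbors=1` are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy resumes with a single category each.
texts = [
    "python pandas machine learning",
    "python statistics machine learning",
    "java spring hibernate",
    "java spring microservices",
    "selenium manual testing test cases",
    "automation testing selenium test plans",
]
categories = [
    "Data Science", "Data Science",
    "Java Developer", "Java Developer",
    "Testing", "Testing",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

ovr = OneVsRestClassifier(KNeighborsClassifier(n_neighbors=1))
ovr.fit(X, categories)

# One binary "this category vs. the rest" model exists per category.
n_binary_models = len(ovr.estimators_)
class_names = list(ovr.classes_)

# The final prediction is the category whose binary model scores highest.
query = vectorizer.transform(["java spring hibernate developer"])
predicted = ovr.predict(query)[0]
print(n_binary_models, class_names, predicted)
```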
3.6 Capabilities and Use Cases:
Chapter – 4: Approach
Data Validation: Scrutinize the Kaggle dataset for completeness, relevance, and
consistency. Remove any duplicate or irrelevant entries, ensuring a clean and well-
structured dataset for subsequent analysis.
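A validation pass of this kind might look like the following pandas sketch. The column names `Category` and `Resume` and the sample rows are assumptions; the report does not state the dataset's schema.

```python
import pandas as pd

# Hypothetical rows standing in for the Kaggle dataset. The column names
# "Category" and "Resume" are assumed, not confirmed by the report.
raw = pd.DataFrame({
    "Category": ["Data Science", "Data Science", "Testing", None, "Java Developer"],
    "Resume": [
        "Python, pandas, ML",
        "Python, pandas, ML",        # exact duplicate of the first row
        "Selenium, manual testing",
        "Unlabelled text",           # missing category
        "   ",                       # blank resume text
    ],
})

def validate(df):
    """Drop incomplete, blank, and duplicate rows."""
    cleaned = df.dropna(subset=["Category", "Resume"]).copy()
    cleaned["Resume"] = cleaned["Resume"].str.strip()
    cleaned = cleaned[cleaned["Resume"] != ""]
    cleaned = cleaned.drop_duplicates(subset=["Category", "Resume"])
    return cleaned.reset_index(drop=True)

clean = validate(raw)
print(clean)
```

Relevance checks (for example, filtering out categories outside the IT sector) depend on the dataset and are left out of this sketch.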
4.2 Text Cleaning:
4.2.1 Format Standardization: Standardize the text format across the entire dataset
by removing unnecessary formatting artifacts, headers, footers, and any extraneous
symbols. This ensures uniformity for effective text analysis.
4.2.4 Tokenization and Lemmatization: Tokenize the cleaned text into individual
words or tokens. Apply lemmatization to reduce words to their base form,
promoting consistency and aiding in feature extraction during NLP analysis.
4.2.5 Stopword Removal: Eliminate common stopwords (e.g., "and," "the," "is") to
focus the analysis on meaningful content and improve the efficiency of subsequent
NLP tasks.
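The cleaning steps above can be sketched end to end with the standard library alone. This is a simplified stand-in, not the project's pipeline: the stopword list is a small illustrative subset, and `naive_lemma` is a crude suffix-stripping substitute for true lemmatization, which in practice would come from a tool such as NLTK or SpaCy.

```python
import re

# Small illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"and", "the", "is", "a", "an", "in", "of", "to", "with", "for"}

def standardize(text):
    """Remove URLs and stray symbols, collapse whitespace, and lowercase."""
    text = re.sub(r"https?://\S+", " ", text)
    # Keep letters, digits, '+' and '#' so that "C++" and "C#" survive.
    text = re.sub(r"[^A-Za-z0-9+#\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def naive_lemma(token):
    """Crude suffix stripping used here in place of a real lemmatizer."""
    if token.endswith("ss"):
        return token
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("s", "")):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: len(token) - len(suffix)] + replacement
    return token

def preprocess(text):
    """Standardize, tokenize on whitespace, drop stopwords, normalize tokens."""
    tokens = standardize(text).split()
    return [naive_lemma(t) for t in tokens if t not in STOPWORDS]

tokens = preprocess(
    "Skills: Python, C++ and SQL; building the REST APIs. http://example.com"
)
print(tokens)
```

The order matters: standardizing first means the tokenizer and stopword filter see uniform lowercase text.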
4.3 Named Entity Recognition (NER) and Skill Extraction:
Libraries
1. Pandas
Fig 4. Pandas
Source: https://towardsdatascience.com/pandas-hacks-read-clipboard-94a05c031382
2. NumPy
Fig 5. NumPy
Source: https://kristian-roopnarine.medium.com/creating-a-best-fit-line-with-gradient-descent-2254f31e319
3. Matplotlib and Seaborn
Fig 6. Seaborn
Source: https://innovationyourself.com/master-seaborn-in-python
4. Natural Language Toolkit (NLTK) and SpaCy
Fig 7. Entity Categorization
Chapter – 5: Results
Skill-centric Standardization:
Developed a skill-centric approach to standardize the representation of skills, facilitating a
more cohesive and interpretable output. The unique skill numbers served as a robust
mechanism to streamline communication and understanding across different stakeholders.
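One simple way to give skills unique numbers is a lookup table built in first-seen order, as sketched below. The function names, the case-insensitive normalization, and the sample skills are hypothetical; the report does not describe how its identifiers were assigned.

```python
def build_skill_index(skill_lists):
    """Assign each distinct skill an integer ID in first-seen order."""
    index = {}
    for skills in skill_lists:
        for skill in skills:
            key = skill.strip().lower()
            if key not in index:
                index[key] = len(index) + 1
    return index

def encode(skills, index):
    """Return the sorted IDs of the known skills found in one resume."""
    keys = (s.strip().lower() for s in skills)
    return sorted({index[k] for k in keys if k in index})

# Hypothetical skill lists as they might come out of skill extraction.
extracted = [
    ["Python", "SQL"],
    ["java", "sql "],
    ["PYTHON", "Machine Learning"],
]
skill_index = build_skill_index(extracted)
encoded = [encode(skills, skill_index) for skills in extracted]
print(skill_index)
print(encoded)
```

Normalizing case and whitespace before lookup keeps "Python" and "PYTHON" on the same number, which is what makes the representation consistent across resumes.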
Improved Decision Support:
The combination of unique skill numbers, job category segmentation, and visual insights
empowered recruiters with improved decision support tools. Recruiters could make informed
decisions, backed by data-driven insights, leading to more successful candidate matches and
enhanced hiring outcomes.
Chapter – 6: Summary, Conclusion and Future Scope
Summary
Conclusion
The "Resume Screening through Machine Learning and Natural Language
Processing" project emerges as a strategic response to the identified challenges within the
talent recruitment paradigm. The project set out with a clear vision to revolutionize traditional
hiring practices, and its success in meeting these objectives underscores a paradigm shift in
how organizations approach candidate selection. By leveraging advanced technologies such as
Natural Language Processing (NLP), machine learning, and visualization, the project elevated
the recruitment process to new heights. The introduction of unique numerical identifiers for
skills not only standardized the representation but also enhanced transparency. Recruiters and
stakeholders now have a lucid and interpretable view of the criteria influencing candidate
evaluations.
This transparency fosters better communication, aligning expectations among team members
and establishing a foundation of trust in the decision-making process. The integration of NLP
and machine learning techniques streamlined the entire hiring pipeline. The meticulous data
preparation, skill extraction, and model training processes, supported by Pandas, NumPy, and
Scikit-Learn, ensured that recruiters were equipped with a finely tuned system capable of
handling large volumes of resumes efficiently.
This optimization translated into saved time and resources, enabling recruiters to focus more
on strategic decision-making. One of the project's standout features is its adaptability to the
dynamic nature of the job market. The continuous learning mechanism, facilitated by a
feedback loop, allows the model to evolve with changing trends and user feedback. This
adaptability positions the system as a forward-looking solution, ensuring it remains relevant
amidst shifting industry demands.
Future Scope
The project's future scope involves the exploration and integration of more advanced artificial
intelligence (AI) models, such as deep learning architectures. Leveraging neural networks
could enhance the system's ability to capture intricate patterns within resumes, further refining
the candidate-job matching process.
While the initial focus was on the IT sector, the project has the potential to expand its scope to
diverse industries. Adapting the model to cater to the unique hiring requirements of different
sectors, such as healthcare, finance, or manufacturing, would broaden its applicability and
impact.
The future evolution of the project could involve the incorporation of multimodal data,
including not just textual information but also visual elements from resumes. Integrating image
and document processing techniques could provide a more comprehensive understanding of a
candidate's qualifications and experiences.
The project can evolve by focusing on refining the user interface and user experience.
Implementing an intuitive and user-friendly interface for recruiters, hiring managers, and other
stakeholders would enhance their interaction with the system, promoting better usability and
efficiency.
To cater to a global job market, the project could include features for handling resumes in
multiple languages. Integrating multilingual support would make the system more inclusive
and adaptable to the diverse linguistic landscape of the international workforce.
Bias Mitigation and Fairness
A crucial future enhancement involves the continuous improvement of the model's fairness and
mitigation of biases. Regular audits, ethical considerations, and adjustments to the model's
training data can ensure that the system promotes inclusivity and avoids perpetuating any
inadvertent biases.
Snapshots
Fig 10. Gathering information of Dataset
Fig 12. Categorization of Resume
Fig 14. Prediction
RESUME SCREENING USING MACHINE LEARNING
Maharaja Agrasen Institute of Technology
UG student, Department of Computer Science and Engineering
Guru Gobind Singh Indraprastha University
New Delhi, India
Shivansh Singhal Anurag Kumar Thakur Taniya Sharma
Accuracy Score
Fig 4. Frequency