
Resume Screening through Machine Learning and
Natural Language Processing
A MINOR PROJECT REPORT

Submitted by
Taniya Sharma (20214802720)
Shivansh Singhal (20514802720)
Anurag Kumar Thakur (35714802720)
Bachelor of Technology
In
Computer Science & Engineering
Under the Guidance of
Mr. Ashish Sharma and Dr. Sandeep Tayal
(Assistant Professor)
Department of Computer Science & Engineering
Maharaja Agrasen Institute of Technology
PSP area, Sector – 22, Rohini, New Delhi – 110085
(Affiliated to Guru Gobind Singh Indraprastha University, New Delhi)
(DEC 2023)
MAHARAJA AGRASEN INSTITUTE OF TECHNOLOGY
Department of Computer Science and Engineering
CERTIFICATE
This is to certify that this minor project report “Resume Screening through
Machine Learning and Natural Language Processing” is submitted by Taniya
Sharma (20214802720), Shivansh Singhal (20514802720), and Anurag Kumar
Thakur (35714802720), who carried out the project work under my supervision.
I approve this minor project for submission.
Prof. Namita Gupta (HOD, CSE)
Guide/Co-Guide: Mr. Ashish Sharma and Dr. Sandeep Tayal (Assistant Professor)
ABSTRACT
This study delves into the innovative application of Natural Language Processing (NLP) for
resume screening, revolutionizing the traditional hiring process. With the exponential growth
of digital job applications, organizations face the challenge of efficiently sifting through vast
volumes of resumes to identify the most suitable candidates. Leveraging NLP, this research
introduces a paradigm shift by automating the initial screening phase. The proposed system
employs advanced language models to comprehend and extract valuable information from
resumes, including skills, experience, and qualifications. Machine learning algorithms are then
utilized to evaluate candidates based on predefined criteria, enabling a more objective and
streamlined evaluation process. This approach not only accelerates the screening process but
also enhances the accuracy of candidate shortlisting, reducing human bias. Furthermore, the
study explores the integration of contextual analysis within NLP models to better understand
the nuances of industry-specific language and evolving job market trends. Through an
experimental evaluation, this research validates the effectiveness and efficiency of the NLP-
driven resume screening system, showcasing its potential to significantly optimize recruitment
processes and contribute to more informed and unbiased hiring decisions. In a dynamic and
competitive job market, this NLP-based approach emerges as a game-changer, promising to
reshape how organizations identify and engage top talent.
ACKNOWLEDGEMENT
It gives us immense pleasure to express our deepest sense of gratitude and sincere
thanks to our respected guides, Mr. Ashish Sharma and Dr. Sandeep Tayal
(Assistant Professor, CSE), MAIT Delhi, for their valuable guidance,
encouragement, and help in completing this work. Their useful suggestions
throughout this work and their co-operative behavior are sincerely acknowledged.
We also wish to express our indebtedness to our parents and family
members, whose blessings and support always helped us to face the challenges
ahead.
Place: Delhi Taniya Sharma (20214802720)

Shivansh Singhal (20514802720)
Anurag Kr. Thakur (35714802720)
Date:
TABLE OF CONTENTS
S. No. TOPIC NAME

1. Certificate
2. Abstract
3. Acknowledgement
4. Table of Contents
5. List of Figures
6. Abbreviations and Nomenclature
7. Chapter – 1: Introduction
1.1 Contextual Background
1.2 Objectives
1.3 Scope and Considerations
8. Chapter – 2: Literature Survey
2.1 Language Models
2.2 Architecture
9. Chapter – 3: Research
3.1 Overview of KNeighborsClassifier
3.4 Overview of OneVsRestClassifier
Libraries
10. Chapter – 4: Approach
11. Chapter – 5: Results
12. Chapter – 6: Summary, Conclusion and Future Scope
6.1 Summary
6.2 Conclusion
6.3 Future Scope
13. Snapshots
14. References
15. Proof of Research Paper
List of Figures
Fig Number Figure

1 Architecture Diagram
2 Data Gathering
3 Text Cleaning
4 Pandas
5 NumPy
6 Seaborn
7 Entity Categorization
8 Loading Libraries
9 Counting Unique Resumes
10 Gathering Information
11 Visualizing Data
12 Categorization of Resume
13 Fitting the Model
14 Prediction
15 Resume/Category
Chapter - 1: Introduction
Introduction
In the present system, a candidate has to enter every detail of their resume manually into a
form, which takes a large amount of time, and even then candidates are often not satisfied with
the job that the system recommends for their skills. As a rough illustration, consider a ratio of
5:1: if five people get a job, only one of them will be satisfied with it. For example, if a good
Python developer is hired by a company and then made to work in Java, their Python skills go
largely unused. On the other hand, when a company has a vacancy, its owner wants the best
possible candidate for that vacancy. Our system acts as a handshake between these two parties:
the company that wants the best possible candidate, and the candidate who wants the best
possible job for their skills and abilities.
1.1 Contextual Background
The problem is that present systems are not very flexible, efficient, or time-saving. They require
candidates to fill in forms online, and even then the information obtained about a candidate
may not be genuine. Our system, by contrast, saves the candidate's time by letting them upload
their resume in any format they prefer. Besides the information in the resume, our system also
looks at the candidate's activity on their social profiles, which helps identify the best candidate
for a particular job. The candidate is also more likely to be satisfied, because they get a job at a
company that values their skills and abilities. We provide the same kind of flexibility to the
client company.
1.2 Objectives
The primary objectives of this project are multi-faceted, addressing various dimensions of the
interview process:
• Candidates, who have been hired:
In the current job market, many candidates, especially fresh graduates, face challenges
in finding relevant employment. The desperation to secure a job often leads them to
accept positions unrelated to their skills and abilities. This issue not only affects
individuals but also has broader societal implications. Our system aims to address this
challenge by employing machine learning algorithms that optimize the hiring process.
By considering previous results and ranking constraints, our system identifies and
recommends the best-suited candidates for specific job vacancies. This approach
ensures that candidates are matched with positions that align with their skills,
minimizing the likelihood of dissatisfaction and societal pressures.
• Client companies, who are hiring the candidates:
For client companies, the objective is to build high-performing teams efficiently. Hiring
a candidate with the right skills for a given role is crucial for organizational success.
Our system assists companies in identifying the most suitable candidates based on
predefined constraints and requirements. Whether it's technical skills, experience, or
other criteria, the algorithm optimizes the selection process, ensuring that the chosen
candidate aligns perfectly with the job specifications. This approach enhances the
efficiency of the hiring sector, leading to increased satisfaction for both the client
company and the hired candidate.
1.3 Scope and Considerations
The scope of our resume screening and ranking project is expansive, aiming to revolutionize
the hiring process with a focus on several key elements. Central to our initiative is the
development and implementation of sophisticated machine learning algorithms. These
algorithms leverage historical data and ranking constraints to optimize candidate selection,
ensuring a more precise alignment of skills with job requirements. The project's versatility is
evident in its seamless integration with existing human resources and applicant tracking
systems used by client companies, fostering a smooth transition without disrupting current
workflows. User experience is paramount, with the creation of intuitive interfaces for
candidates and client companies. The system's scalability is a pivotal feature, initially targeting
the Indian IT sector but designed to extend to various industries, including governmental jobs.
Data security and privacy are prioritized to safeguard sensitive candidate information, adhering
to strict compliance with data protection regulations. Considerations encompass addressing
biases in the hiring process, ensuring fairness and diversity. Continuous improvement
mechanisms, legal and ethical compliance, user training, and a robust feedback system further
underline our commitment to creating an ethical, efficient, and adaptable resume screening and
ranking solution. This comprehensive approach ensures that the project aligns with industry
standards, legal frameworks, and user expectations, ultimately contributing to a more effective
and equitable hiring landscape.
Chapter - 2: Literature Survey
The literature survey undertaken in this research serves as a comprehensive exploration at the
convergence of three pivotal domains: machine learning, natural language processing (NLP),
and resume screening. This investigative journey aims to unravel the intricate relationship
between these domains, shedding light on key language models, innovative transfer learning
strategies, and the latest advancements that collectively shape the landscape of talent
acquisition. By meticulously scrutinizing existing scholarly works, the survey seeks to distill
critical insights, understand foundational theories, and identify the most recent breakthroughs
in the intersection of deep learning and NLP. Specifically, the survey places a keen focus on
language models that are at the forefront of NLP innovation, investigates strategies for
transferring knowledge between these models, and probes into the latest advancements that
hold promise for revolutionizing screening processes. In doing so, the literature survey sets
the stage for informed discussions, providing a solid foundation for the subsequent phases of
this research endeavor.
2.1 Language Models
• KNeighborsClassifier
KNeighborsClassifier plays a crucial role in predicting the suitability of candidates
based on their resumes. This machine learning model falls under the category of
supervised learning and is particularly well-suited for classification tasks. The
KNeighborsClassifier algorithm works by identifying the k-nearest neighbors to a given
data point, making predictions based on the majority class among these neighbors. In
our project, the KNeighborsClassifier assesses resumes by considering similarities in
the skill sets, experience, and qualifications of candidates. By leveraging the algorithm's
ability to identify patterns and similarities among resumes, the model aids in
categorizing candidates into relevant job roles. This facilitates the optimization of the
screening process, ensuring that candidates are matched with positions that align with
their expertise. The KNeighborsClassifier thus contributes to the efficiency and
accuracy of our resume screening system, enhancing the overall effectiveness of the
candidate selection process.
• OneVsRestClassifier
In the context of our resume screening project, the OneVsRestClassifier plays a crucial
role in enhancing the model's ability to classify candidates across multiple job
categories. This classifier is particularly valuable when candidates possess diverse skill
sets or experiences that may span different roles. The OneVsRestClassifier extends the
project's machine learning capabilities by allowing the model to handle the complexity
of matching candidates to various job requirements simultaneously. For instance, if a
candidate has proficiency in both Java and Python programming languages, the
OneVsRestClassifier enables the model to predict the candidate's suitability for roles
requiring expertise in either language. This flexibility ensures that the screening process
accurately evaluates candidates across multiple dimensions, aligning with the project's
objective of optimizing candidate-job matches and providing a more nuanced and
versatile approach to resume classification within the diverse landscape of the IT sector.
2.2 Architecture
2.2.1 Input Layer

The process begins with the input layer, where candidate resumes are fed into the system. This
layer extracts relevant information, including skills, experience, education, and other key
attributes.
2.2.2 Preprocessing Layer

The extracted data undergoes preprocessing to standardize formats, handle missing
information, and convert textual content into numerical representations. This ensures
uniformity and facilitates effective feature extraction.
2.2.3 Feature Extraction

Feature extraction is a critical component, where the model identifies key features such as
technical skills, industry experience, and education levels. This step transforms raw data into a
format suitable for machine learning algorithms.
2.2.4 Machine Learning Model
The heart of the architecture involves employing machine learning algorithms, such as a
OneVsRestClassifier, to predict the suitability of candidates for specific job categories. This
model is trained on historical data, considering past hiring outcomes and ranking constraints.
2.2.5 Optimization Layer

The model incorporates an optimization layer that refines predictions based on historical results
and constraints, ensuring the best possible match between candidates and job requirements.
This involves continuous learning and adaptation to evolving hiring trends.
2.2.6 Output Layer

The final layer produces the model's output, providing a ranked list of candidates for each job
category. This output is presented to client companies, offering them a streamlined and tailored
selection of potential hires.
2.2.7 Feedback Mechanism

A crucial aspect of the architecture involves a feedback mechanism, enabling continuous
improvement. User feedback and evolving job market dynamics are looped back into the
model, ensuring it stays responsive and adaptive to changing requirements.
Fig.1 Architecture Diagram
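The output layer described in 2.2.6 can be pictured with a minimal sketch. It assumes the model has already produced a suitability score for each candidate and job category; the candidate identifiers, the scores, and the helper name rank_candidates are hypothetical and serve only as an illustration, not as the project's actual implementation.

```python
from collections import defaultdict


def rank_candidates(scores, top_n=3):
    """Group (candidate, category, score) triples by category and
    return the top_n candidates per category, best score first."""
    by_category = defaultdict(list)
    for candidate, category, score in scores:
        by_category[category].append((candidate, score))

    ranked = {}
    for category, entries in by_category.items():
        # Sort by descending score; ties are broken by candidate id for stable output.
        entries.sort(key=lambda entry: (-entry[1], entry[0]))
        ranked[category] = [candidate for candidate, _ in entries[:top_n]]
    return ranked


# Hypothetical model scores (not results from the project).
scores = [
    ("cand_01", "Data Science", 0.91),
    ("cand_02", "Data Science", 0.64),
    ("cand_03", "Java Developer", 0.88),
    ("cand_01", "Java Developer", 0.35),
]
ranking = rank_candidates(scores, top_n=2)
```

Each job category receives its own ordered shortlist, so the same candidate may appear under several categories with different ranks.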
Chapter – 3: Research
3.1 Overview of KNeighborsClassifier
The KNeighborsClassifier in our resume screening model employs a proximity-based
approach, assessing candidate suitability by comparing their resume features to those of
neighboring candidates. This model operates on the principle that candidates with similar skill
sets and experiences are likely suitable for similar roles. By considering the k-nearest neighbors
in feature space, the classifier assigns candidates to relevant job categories. This proximity-
based classification enhances the model's ability to identify nuanced similarities in candidate
profiles, providing a valuable tool in ensuring accurate and context-aware resume screening for
diverse job roles within the IT sector.
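The proximity-based idea can be illustrated with a small sketch of k-nearest-neighbour voting. It is written with the Python standard library only, so that it stays self-contained, and is not the scikit-learn KNeighborsClassifier used in the project. The word-count features, the cosine similarity measure, and the toy training resumes are assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a text into a {word: count} dictionary (illustrative feature vector)."""
    return dict(Counter(text.lower().split()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query, training_set, k=3):
    """Return the majority category among the k training resumes most similar to the query."""
    neighbours = sorted(
        training_set,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy labelled resumes (hypothetical, not taken from the project dataset).
training_set = [
    (bag_of_words("python pandas machine learning statistics"), "Data Science"),
    (bag_of_words("python numpy deep learning models"), "Data Science"),
    (bag_of_words("java spring hibernate rest services"), "Java Developer"),
    (bag_of_words("java maven junit spring boot"), "Java Developer"),
]
predicted = knn_predict(bag_of_words("python machine learning models"), training_set, k=3)
```

With k = 3, two of the three nearest resumes are labelled Data Science, so the majority vote assigns the query to that category. The choice of k and of the similarity measure both affect the result and would normally be tuned on held-out data.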
3.2 Advantages of KNeighborsClassifier:
3.2.1 The KNeighborsClassifier offers a distinctive advantage in resume screening by
leveraging proximity-based classification. Unlike some models that rely on predetermined
decision boundaries, KNeighborsClassifier dynamically adapts to varying candidate profiles.
3.2.2 This flexibility allows the model to capture nuanced similarities and adapt to the diverse
skill sets prevalent in the IT sector. Its ability to consider the k-nearest neighbors enhances
accuracy in role-specific categorization, making it particularly effective in scenarios where
candidates possess multidimensional skills.
3.2.3 This adaptability and context-awareness set KNeighborsClassifier apart, ensuring a more
nuanced and precise approach to matching candidates with job requirements compared to
models with fixed decision boundaries.
3.3 Capabilities and Use Cases:
3.3.1 Adaptive Categorization:

The KNeighborsClassifier excels in adaptive categorization by dynamically assessing the
similarity of candidate profiles. This adaptability allows the model to accurately categorize
candidates with multidimensional skill sets, making it ideal for roles in the ever-evolving
landscape of the IT sector.
3.3.2 Nuanced Skill Matching:

Its proximity-based approach enables nuanced skill matching. The model considers the k-
nearest neighbors in feature space, identifying subtle similarities in resumes and enhancing the
precision of matching candidates to specific job requirements.
3.3.3 Versatility Across Job Roles:

The model's versatility shines in scenarios where candidates may possess diverse skill sets. It
is particularly effective in roles where adaptability and a wide range of technical skills are
crucial, providing a comprehensive solution for screening candidates across various IT job
categories.
3.3.4 Context-Aware Screening:

KNeighborsClassifier's context-awareness ensures that screening is contextually relevant. By
considering the local neighborhood of candidate profiles, the model offers a nuanced
understanding of each candidate's qualifications, contributing to more accurate and
contextually aware hiring decisions in the resume screening process.
3.4 Overview of OneVsRestClassifier
The OneVsRestClassifier in our resume screening model is a pivotal component, enabling the
simultaneous classification of candidates across multiple job categories. This approach allows
the model to handle the diverse skill sets and experiences found in resumes, ensuring a nuanced
evaluation. By treating each job category as a separate binary classification task, the
OneVsRestClassifier extends the model's versatility, accurately predicting the relevance of
candidates to various roles. This enhances the overall efficiency of the screening process,
providing a comprehensive and adaptable solution to match candidates with the diverse
requirements of the IT sector.
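The one-vs-rest strategy of treating each job category as a separate binary task can be sketched as follows. This is a standard-library illustration, not scikit-learn's OneVsRestClassifier: the toy binary scorer (similarity to the category's own resumes minus similarity to all other resumes), the class names, and the sample resumes are assumptions introduced for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Word-count vector for a text (illustrative features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidBinaryScorer:
    """Toy binary model: scores how much closer a resume is to the
    'positive' resumes than to all remaining ('rest') resumes."""

    def fit(self, vectors, is_positive):
        self.positive, self.rest = Counter(), Counter()
        for vec, flag in zip(vectors, is_positive):
            (self.positive if flag else self.rest).update(vec)
        return self

    def decision(self, vec):
        return cosine(vec, self.positive) - cosine(vec, self.rest)


class OneVsRest:
    """Train one binary scorer per category; predict the best-scoring category."""

    def fit(self, vectors, labels):
        self.scorers = {}
        for category in sorted(set(labels)):
            flags = [label == category for label in labels]
            self.scorers[category] = CentroidBinaryScorer().fit(vectors, flags)
        return self

    def decision_scores(self, vec):
        return {category: s.decision(vec) for category, s in self.scorers.items()}

    def predict(self, vec):
        scores = self.decision_scores(vec)
        return max(scores, key=scores.get)


# Hypothetical training resumes.
resumes = [
    ("python pandas machine learning", "Data Science"),
    ("python statistics deep learning", "Data Science"),
    ("java spring hibernate", "Java Developer"),
    ("java maven spring boot", "Java Developer"),
    ("selenium test automation", "Testing"),
    ("manual testing selenium jira", "Testing"),
]
vectors = [vectorize(text) for text, _ in resumes]
labels = [label for _, label in resumes]
model = OneVsRest().fit(vectors, labels)
prediction = model.predict(vectorize("java spring microservices"))
```

The per-category values returned by decision_scores show how relevant a resume is to every job category, while predict keeps only the highest-scoring one.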
3.5 Advantages of OneVsRestClassifier:
3.5.1 Multiclass Classification Handling

The OneVsRestClassifier is advantageous in the context of resume screening due to its ability
to handle multiclass classification tasks effectively. In the diverse landscape of job categories,
this model treats each category as an independent binary classification problem,
accommodating the varied skill sets and experiences present in candidate resumes.
3.5.2 Versatility and Flexibility

This classifier enhances the model's versatility by allowing it to simultaneously predict the
relevance of candidates across multiple job roles. This flexibility is crucial in the dynamic field
of resume screening, where candidates often possess a spectrum of skills and experiences that
span various job categories.
3.5.3 Improved Accuracy and Precision

The OneVsRestClassifier contributes to improved accuracy and precision by considering the
unique characteristics of each job category independently. This tailored approach ensures more
nuanced predictions, resulting in a more accurate matching of candidates to specific roles and
optimizing the overall effectiveness of the resume screening process.
3.6 Capabilities and Use Cases:
3.6.1 Multiclass Classification

The OneVsRestClassifier excels in multiclass classification scenarios, allowing our resume
screening model to effectively categorize candidates across multiple job roles simultaneously.
This capability is crucial for handling the diverse skill sets and experiences often present in
resumes, providing a more accurate and nuanced evaluation of candidate suitability.
3.6.2 Versatility in Skill Matching

Leveraging the OneVsRest approach, the model accommodates the varied skill requirements
of different job categories. This versatility ensures that candidates with diverse skill profiles
can be appropriately evaluated for roles that may demand expertise in various technologies or
domains, enhancing the precision of the screening process.
3.6.3 Adaptability to Evolving Job Requirements

The model's use of OneVsRestClassifier enhances its adaptability to changing job market
dynamics. As new roles emerge or existing job requirements evolve, the model can seamlessly
extend its classification capabilities, making it a robust and future-proof solution for the
dynamic landscape of the IT sector's hiring needs in our resume screening project.
Chapter – 4: Approach
4.1 Data Gathering:
Source Selection: Acquire a comprehensive dataset of 1000 resumes from Kaggle,
ensuring it represents a diverse range of professions, industries, and skill levels.
Verify that the dataset covers various job categories to create a representative
sample for robust model training.
Data Validation: Scrutinize the Kaggle dataset for completeness, relevance, and
consistency. Remove any duplicate or irrelevant entries, ensuring a clean and well-
structured dataset for subsequent analysis.
Fig 2. Data Gathering
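The validation step can be sketched as below, assuming each record is a (category, resume text) pair. The sample records and the helper name deduplicate are hypothetical; the project itself handles the dataset with Pandas.

```python
from collections import Counter


def deduplicate(records):
    """Drop records with empty resume text and duplicates that differ
    only in letter case or whitespace."""
    seen = set()
    cleaned = []
    for category, text in records:
        normalized = " ".join(text.lower().split())
        key = (category, normalized)
        if not normalized or key in seen:
            continue
        seen.add(key)
        cleaned.append((category, text.strip()))
    return cleaned


# Hypothetical records standing in for rows of the Kaggle dataset.
records = [
    ("Data Science", "Python, pandas, machine learning"),
    ("Data Science", "python, pandas,  machine learning "),  # duplicate after normalization
    ("Testing", "Selenium, manual testing"),
    ("Testing", "   "),  # empty entry
]
cleaned = deduplicate(records)

# Count resumes per category to check that the sample stays representative.
category_counts = Counter(category for category, _ in cleaned)
```

Counting the remaining resumes per category makes it easy to spot categories that are under-represented after cleaning.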
4.2 Text Cleaning:
4.2.1 Format Standardization: Standardize the text format across the entire dataset
by removing unnecessary formatting artifacts, headers, footers, and any extraneous
symbols. This ensures uniformity for effective text analysis.
4.2.2 Contact Information Removal: Eliminate personal contact details such as
phone numbers and email addresses to prioritize privacy and comply with data
protection regulations.
4.2.3 Noise Reduction: Clean the text by removing non-essential information,
irrelevant details, and any text artifacts that may interfere with accurate analysis.
4.2.4 Tokenization and Lemmatization: Tokenize the cleaned text into individual
words or tokens. Apply lemmatization to reduce words to their base form,
promoting consistency and aiding in feature extraction during NLP analysis.
4.2.5 Stopword Removal: Eliminate common stopwords (e.g., "and," "the," "is") to
focus the analysis on meaningful content and improve the efficiency of subsequent
NLP tasks.
Fig 3. Text Cleaning
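The cleaning steps above can be sketched with regular expressions. The patterns, the short stopword list, and the sample text are assumptions made for illustration. Lemmatization is left out because it needs a linguistic resource such as the one provided by NLTK or spaCy.

```python
import re

# Small illustrative stopword list; a real run would use a fuller list.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "with", "at", "on", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+")
# Crude phone pattern: it can also remove other long digit runs such as year ranges.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
# Keep letters, digits, '+' and '#' so that tokens like "c++" and "c#" survive.
NON_WORD_RE = re.compile(r"[^a-z0-9+#\s]")


def clean_resume(text):
    """Lowercase, strip contact details and symbols, tokenize, and drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = PHONE_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    tokens = text.split()  # whitespace tokenization
    return [token for token in tokens if token not in STOPWORDS]


sample = (
    "Contact: jane.doe@example.com, +91 98765 43210. "
    "Skilled in Python and C++ at https://example.com/profile!"
)
tokens = clean_resume(sample)
```

The order of the substitutions matters: URLs and e-mail addresses are removed before punctuation is stripped, otherwise their fragments would remain as stray tokens.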
4.3 Named Entity Recognition (NER) and Skill Extraction:
4.3.1 Entity Identification: Apply Named Entity Recognition (NER) techniques to
identify and categorize entities within the resumes, such as names, skills,
organizations, and locations.
4.3.2 Skill Extraction: Develop a mechanism to specifically extract and categorize
technical and soft skills mentioned in the resumes. This step enhances the dataset
with structured skill information for deeper analysis.
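One simple way to realize the skill extraction step is dictionary matching, sketched below. This is a deliberately simpler technique than a trained NER model, and the skill vocabulary shown is hypothetical, since the report does not list the one actually used.

```python
import re

# Hypothetical skill vocabulary, grouped by skill type.
SKILL_VOCABULARY = {
    "technical": ["python", "java", "sql", "machine learning", "c++"],
    "soft": ["communication", "teamwork", "leadership"],
}


def extract_skills(text, vocabulary=SKILL_VOCABULARY):
    """Return the vocabulary skills found in the text, grouped by skill type."""
    lowered = text.lower()
    found = {}
    for skill_type, skills in vocabulary.items():
        matches = []
        for skill in skills:
            # Lookarounds are used instead of \b so that skills ending in symbols
            # ("c++") still match and "java" does not match inside "javascript".
            pattern = r"(?<![a-z0-9+#])" + re.escape(skill) + r"(?![a-z0-9+#])"
            if re.search(pattern, lowered):
                matches.append(skill)
        found[skill_type] = matches
    return found


resume = "Experienced in Python, JavaScript and C++; strong communication and leadership skills."
skills = extract_skills(resume)
```

Dictionary matching only finds skills that are already in the vocabulary, which is why a learned NER component is attractive for skills that are new or spelled in unexpected ways.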
4.4 Document Vectorization:
4.4.1 TF-IDF Vectorization: Utilize TF-IDF vectorization to convert the
preprocessed resumes into numerical representations. This captures the importance
of words within each document, creating a feature-rich dataset suitable for NLP
analysis.
4.4.2 Word Embeddings (Integration of Pre-trained Embeddings): Enhance the
dataset's semantic understanding by incorporating pre-trained word embeddings
(e.g., Word2Vec or GloVe). This step captures contextual relationships and
linguistic nuances within the text data.
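The TF-IDF weighting in 4.4.1 can be made concrete with a short sketch. It uses the classic textbook definition (term frequency multiplied by the logarithm of the inverse document frequency). Library implementations such as scikit-learn's TfidfVectorizer apply additional smoothing and normalization, so their numbers will differ. The tokenized documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical pre-cleaned, tokenized resumes.
docs = [
    ["python", "machine", "learning", "python"],
    ["java", "spring", "developer"],
    ["python", "developer"],
]
vectors = tf_idf(docs)
```

In the first document, "machine" receives a higher weight than "python" even though "python" occurs twice, because "python" also appears in another resume and is therefore less distinctive.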
Libraries
1. Pandas
1.1 Data Handling

Pandas is utilized for importing, cleaning, and preprocessing the resume dataset. It excels in
handling tabular data, making it easier to filter, transform, and manipulate information.
1.2 Feature Engineering

Pandas allows the creation of new features, helping extract relevant information from resumes.
This includes tasks such as extracting years of experience or categorizing education levels.
1.3 Data Exploration

Pandas facilitates exploratory data analysis (EDA) by providing functions to summarize
statistics, identify missing values, and explore correlations. This aids in understanding the
dataset's characteristics.
1.4 Data Transformation

Pandas enables seamless data transformations, such as converting categorical data into
numerical formats, a crucial step for machine learning model compatibility.
Fig 4. Pandas
Source: https://towardsdatascience.com/pandas-hacks-read-clipboard-94a05c031382
2. NumPy
2.1 Numerical Operations

NumPy is fundamental for numerical operations on arrays, providing efficient and fast
computations. It is crucial for handling the numerical representation of resume data during
feature extraction.
2.2 Array Manipulation

NumPy arrays are used to manage numerical data structures, ensuring streamlined processes
for vectorization and mathematical operations.
2.3 Data Standardization

NumPy functions help standardize numerical data, ensuring consistency and compatibility
across various stages of the NLP and machine learning pipeline.
2.4 Efficiency in Processing

NumPy's optimized functions enhance computational efficiency, making it an essential library
for large-scale numerical operations in the project.
Fig 5. NumPy
Source: https://kristian-roopnarine.medium.com/creating-a-best-fit-line-with-gradient-descent-2254f31e319
3. Matplotlib and Seaborn
3.1 Data Visualization

Matplotlib and Seaborn are employed to create visualizations that offer insights into the
distribution of resumes, skill frequencies, and other key aspects. This aids in understanding the
dataset and making informed decisions.
3.2 Pattern Identification

Visualizations help identify patterns, trends, and anomalies in the data, guiding feature
selection and informing the model development process.
3.3 Communication of Findings

Plots and graphs generated using these libraries serve as effective communication tools,
enabling stakeholders to grasp complex information intuitively.
3.4 Customization and Aesthetics

Matplotlib and Seaborn provide customization options for visuals, ensuring clarity and
adherence to project requirements.
Fig 6. Seaborn
Source: https://innovationyourself.com/master-seaborn-in-python
4. Natural Language Toolkit (NLTK) and spaCy
4.1 Text Processing

NLTK and spaCy are utilized for essential NLP tasks like tokenization, lemmatization, and
Named Entity Recognition (NER), aiding in the extraction of meaningful features from resume
text.
4.2 Semantic Understanding

These libraries contribute to the model's semantic understanding of language, capturing
contextual relationships and nuances within the textual content.
4.3 Entity Categorization

NER identifies and categorizes entities in resumes, such as names and skills, enriching the
dataset with structured information.
4.4 Customization and Flexibility

NLTK and spaCy offer customization options, allowing adaptation to specific requirements in
the NLP pipeline.
Fig 7. Entity Categorization
Chapter – 5: Results
Skill-centric Standardization:
Developed a skill-centric approach to standardize the representation of skills, facilitating a
more cohesive and interpretable output. The unique skill numbers served as a robust
mechanism to streamline communication and understanding across different stakeholders.
Enhanced Model Interpretability:

Achieved enhanced model interpretability by associating unique numbers with each skill,
simplifying the complexity of the output. Stakeholders could easily interpret and prioritize
specific skills, improving decision-making during the hiring process.
Job Category Segmentation:

Implemented a precise segmentation of resumes into different job categories based on the
model predictions. This segmentation allowed recruiters to focus on specific talent pools,
optimizing their efforts and expediting the hiring workflow.
Quantitative Skill Insights:

Provided quantitative insights into the frequency and importance of each skill within the dataset
through visualizations. Recruiters gained valuable information about skill trends, aiding in
strategic workforce planning and skill development initiatives.
Adaptability and Evolution:

Ensured the adaptability of the model to evolving job market dynamics through continuous
learning from user feedback. The system's capacity to evolve with changing hiring trends
contributed to its sustained efficiency and relevance over prolonged periods.
Improved Decision Support:
The combination of unique skill numbers, job category segmentation, and visual insights
empowered recruiters with improved decision support tools. Recruiters could make informed
decisions, backed by data-driven insights, leading to more successful candidate matches and
enhanced hiring outcomes.
Machine Learning Model Outcome:

Utilized Scikit-Learn's OneVsRestClassifier for multiclass classification, training the model on
preprocessed resumes. Integrated pre-trained word embeddings (e.g., loaded via Gensim) to enhance the
model's understanding of semantic relationships within textual data. Created visualizations to
depict the distribution of resumes across various job categories and highlight skill frequencies.
Ensured clear communication of model predictions, aiding stakeholders in understanding the
system's outcomes. Presented the final output as a comprehensive list, displaying each skill's
assigned unique number alongside the corresponding skill. Generated separated resumes for
different job categories, showcasing the relevance of each resume to specific fields.
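The report does not state how the unique numbers were assigned. A minimal sketch, assuming integer ids given in alphabetical order of the labels (similar in spirit to label encoding), is shown below together with the grouping of resumes by predicted category. The resume identifiers and predictions are hypothetical.

```python
def assign_ids(labels):
    """Map each distinct label to an integer id, assigned in alphabetical order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


# Hypothetical predictions: (resume id, predicted category).
predictions = [
    ("resume_01", "Java Developer"),
    ("resume_02", "Data Science"),
    ("resume_03", "Testing"),
    ("resume_04", "Data Science"),
]
category_ids = assign_ids(category for _, category in predictions)

# Group resumes by predicted category to obtain one list per job category.
resumes_by_category = {}
for resume_id, category in predictions:
    resumes_by_category.setdefault(category, []).append(resume_id)
```

The same helper can be applied to a list of skill names to obtain the number assigned to each skill.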
Chapter – 6: Summary, Conclusion and Future Scope
6.1 Summary
Optimized Hiring Processes

The resume screening project successfully introduced a systematic and data-driven approach
to the hiring process, optimizing the identification and selection of qualified candidates. The
integration of NLP techniques and machine learning models contributed to a streamlined and
efficient process for recruiters and hiring managers.
Enhanced Candidate-Role Matching

By assigning unique numbers to skills and presenting separated resumes for different job
categories, the project significantly improved the precision of candidate-role matching.
Recruiters gained a clearer understanding of each candidate's skill set, leading to more
informed and targeted hiring decisions.
Interpretable Skill Representation

The use of unique skill numbers provided a standardized and interpretable representation of
skills, fostering better communication and collaboration among stakeholders. This contributed
to increased transparency in the hiring process, ensuring that both recruiters and candidates had
a clear understanding of the criteria used for evaluation.
Continuous Learning and Adaptability

The implementation of a feedback loop mechanism facilitated continuous learning from user
interactions and hiring outcomes. The model's adaptability to evolving job market trends
ensured that the system remained relevant and effective over time, aligning with the dynamic
nature of the industry.
Data-Driven Decision Support

The project delivered a powerful set of decision support tools, including visual insights into
skill frequencies, job category segmentation, and quantitative skill trends. Recruiters were
equipped with valuable data-driven insights, empowering them to make informed decisions,
improve the quality of hires, and contribute to the overall success of the organization.
6.2 Conclusion
The "Resume Screening through Machine Learning and Natural Language
Processing" project emerges as a strategic response to the identified challenges within the
talent recruitment paradigm. The project set out with a clear vision to revolutionize traditional
hiring practices, and its success in meeting these objectives underscores a paradigm shift in
how organizations approach candidate selection. By leveraging advanced technologies such as
Natural Language Processing (NLP), machine learning, and visualization, the project elevated
the recruitment process to new heights. The introduction of unique numerical identifiers for
skills not only standardized the representation but also enhanced transparency. Recruiters and
stakeholders now have a lucid and interpretable view of the criteria influencing candidate
evaluations.
This transparency fosters better communication, aligning expectations among team members
and establishing a foundation of trust in the decision-making process. The integration of NLP
and machine learning techniques streamlined the entire hiring pipeline. The meticulous data
preparation, skill extraction, and model training processes, supported by Pandas, NumPy, and
Scikit-Learn, ensured that recruiters were equipped with a finely tuned system capable of
handling large volumes of resumes efficiently.
This optimization translated into saved time and resources, enabling recruiters to focus more
on strategic decision-making. One of the project's standout features is its adaptability to the
dynamic nature of the job market. The continuous learning mechanism, facilitated by a
feedback loop, allows the model to evolve with changing trends and user feedback. This
adaptability positions the system as a forward-looking solution, ensuring it remains relevant
amidst shifting industry demands.
Future Scope
Integration of Advanced AI Models
The project's future scope involves the exploration and integration of more advanced artificial
intelligence (AI) models, such as deep learning architectures. Leveraging neural networks
could enhance the system's ability to capture intricate patterns within resumes, further refining
the candidate-job matching process.
Expansion to Diverse Industries
While the initial focus was on the IT sector, the project has the potential to expand its scope to
diverse industries. Adapting the model to cater to the unique hiring requirements of different
sectors, such as healthcare, finance, or manufacturing, would broaden its applicability and
impact.
Incorporation of Multimodal Data
The future evolution of the project could involve the incorporation of multimodal data,
including not just textual information but also visual elements from resumes. Integrating image
and document processing techniques could provide a more comprehensive understanding of a
candidate's qualifications and experiences.
Enhanced User Interface and User Experience (UI/UX)
The project can evolve by focusing on refining the user interface and user experience.
Implementing an intuitive and user-friendly interface for recruiters, hiring managers, and other
stakeholders would enhance their interaction with the system, promoting better usability and
efficiency.
Globalization and Multilingual Support
To cater to a global job market, the project could include features for handling resumes in
multiple languages. Integrating multilingual support would make the system more inclusive
and adaptable to the diverse linguistic landscape of the international workforce.
Bias Mitigation and Fairness
A crucial future enhancement involves the continuous improvement of the model's fairness and
mitigation of biases. Regular audits, ethical considerations, and adjustments to the model's
training data can ensure that the system promotes inclusivity and avoids perpetuating any
inadvertent biases.
Collaboration with Industry Experts
Establishing collaborations with industry experts, HR professionals, and recruitment specialists
could provide valuable insights for further refining the project. Incorporating feedback from
these experts would contribute to the development of a solution that aligns more closely with
real-world hiring challenges.
The future scope of the resume screening project is not just about expanding its technical
capabilities but also about addressing a broader range of industry needs, improving user
experience, and staying at the forefront of advancements in artificial intelligence and
recruitment methodologies.
Snapshots
Fig 8. Loading Libraries
Fig 9. Counting Unique Resumes
Fig 10. Gathering information of Dataset
Fig 11. Visualizing data
Fig 12. Categorization of Resume
Fig 13. Fitting the model
Fig 14. Prediction
Fig 15. Resume/Category
REFERENCES
[1] K, Tejaswini, V, Umadevi, Kadiwal, Shashank M., and Revanna, Sanjay. 2019. "Resume
Screening Using Machine Learning and NLP: A Proposed System." In IEEE Xplore, 2019.
DOI: 10.1109/ICNTE44896.2019.8945869.
[2] Singh, Anjali, and Mishra, Dr. P.K. 2022. "Resume Screening Classification using Artificial
Intelligence and NLP." International Journal of Recent Innovations in Computer Science and
Engineering. DOI: 10.5281/zenodo.5533827.
[3] Roy, Pradeep Kumar. 2019. "A Machine learning approach for automation of resume
recommendation system." In ICCIDS 2019. Procedia Computer Science. DOI:
10.1016/j.procs.2020.03.284.
[4] Barrett, Aldo Usama, Iqbal, Muhammad, Ismail, Nooraini, Nauman, Muhammad, and
Mohd Yusof, Noor Shahida. 2021. "Automated Resume Screening System Using Natural
Language Processing and Similarity-Based Matching." IEEE Access. DOI:
10.1109/ACCESS.2021.3124773.
[5] Jabbar, M.A., Alzahrani, A.S., Alshamrani, F.A., and Alzahrani, M.A. 2023. "Resume
Screening Using Machine Learning and NLP: A Systematic Review." 2023 International
Conference on Information Technology and Computer Science (ICITCS). DOI:
10.1109/ICITCS56506.2023.00069.
[6] Hiremath, P.S., and Biradar, S.G. 2022. "A Review of Machine Learning Techniques for
Resume Screening." 2022 12th International Conference on Communication, Computing,
Machine Learning and Information Security (ICCCMLIS). DOI:
10.1109/ICCCMLIS54058.2022.9770246.
[7] Patil, S.S., and Nandigavi, A.A. 2021. "Resume Screening Using Deep Learning: A
Comparative Analysis of Techniques." 2021 4th International Conference on Electronics,
Communication and Aerospace Technology (ICECA). DOI:
10.1109/ICECA51214.2021.9531097.
[8] Verma, S.K., and Mishra, P.K. 2020. "An Improved Resume Screening System Using
Machine Learning and NLP." 2020 11th International Conference on Computing
Communication and Control (ICCCC). DOI: 10.1109/ICCCC49760.2020.9072615.
[9] Singh, N., Goyal, N., and Shrivastava, S.K. 2019. "Resume Screening for Job Matching
Using Natural Language Processing and Machine Learning." 2019 9th International
Conference on Cloud Computing, Data Science & Engineering (CONFLUENCE). DOI:
10.1109/CONFLUENCE47329.2019.8902126.
[10] Gupta, S.K., and Gupta, V.K. 2018. "A Hybrid Approach for Resume Screening Using
Machine Learning and Text Mining." 2018 5th International Conference on Signal Processing,
Communication and Computing (ICSPCC). DOI: 10.1109/ICSPCC.2018.8562023.
[11] Yadav, N.K., and Verma, S.K. 2017. "A Comparative Study of Machine Learning
Techniques for Resume Screening." 2017 International Conference on Computer and
Communication Technologies (IC3T). DOI: 10.1109/IC3T.2017.8205523.
[12] Mishra, P.K., and Verma, S.K. 2016. "Resume Screening Using Machine Learning: A
Survey." 2016 Fourth International Conference on Parallel Processing, Machine Learning and
Applications (ICPPMLA). DOI: 10.1109/ICPPMLA.2016.7775047.
[13] Saini, A.K., and Saini, M.S. 2015. "Resume Screening System Using Machine Learning
Techniques." 2015 International Conference on Advances in Computer Communication and
Control (ICACCC). DOI: 10.1109/ICACCC.2015.7323422.
[14] Kumar, R., and Gupta, V.K. 2014. "A Machine Learning Approach for Resume
Screening." 2014 International Conference on Computational Intelligence and Networks
(CINet). DOI: 10.1109/CINet.2014.6954156.
[Fig.4] Pandas https://towardsdatascience.com/pandas-hacks-read-clipboard-94a05c031382
[Fig.5] NumPy https://kristian-roopnarine.medium.com/creating-a-best-fit-line-with-gradient-descent-2254f31e319
[Fig.6] Seaborn https://innovationyourself.com/master-seaborn-in-python
RESUME SCREENING USING MACHINE LEARNING
Maharaja Agrasen Institute of Technology
UG student, Department of Computer Science and Engineering
Guru Gobind Singh Indraprastha University
New Delhi, India
Shivansh Singhal Anurag Kumar Thakur Taniya Sharma
Abstract- Resume screening involves evaluating resumes submitted by candidates for various job
positions. The process becomes challenging for companies due to the intricate nature of resume
formats, which come in different styles. Recruiters often find it tedious to identify suitable
candidates amidst this complexity. To streamline this process and save time and effort, Natural
Language Processing (NLP) techniques, specifically using the NLTK library, can be employed to
extract key information from resumes.
1. Introduction
Our current employment system faces significant challenges. The job application process is
time-consuming, requiring candidates to manually enter every detail from their resumes.
Furthermore, there is a large discrepancy between the skills of candidates and the job
satisfaction they experience. To address these issues, we propose a system that serves as a
bridge between job seekers and employers. Rather than depending solely on a candidate's
qualifications, our system aims for a more nuanced match, considering both the candidate's
skills and preferences and the company's particular needs. In this way, we aim to create a
symbiotic relationship in which job seekers are placed in roles that align with their expertise
and preferences, leading to greater job satisfaction and productivity. Our system therefore
acts as a handshake between two entities: the company that prefers the best possible candidate
and the candidate who prefers the best possible job according to his or her skills and ability.
2. Literature Survey
A. Traditional Approaches to Resume Screening:
Many organizations have traditionally relied on manual resume screening processes, where human
recruiters assess resumes based on predefined criteria. Limitations include subjectivity, time
consumption, and the challenge of handling a large volume of resumes effectively.
B. Natural Language Processing (NLP) Techniques for Resume Parsing:
NLP techniques, such as named entity recognition and part-of-speech tagging, are increasingly
employed to extract relevant information from resumes. These techniques enhance the automation
of resume screening by allowing computers to understand and process the natural language
content of resumes.
C. Machine Learning Classifiers in Resume Categorization:
ML classifiers, including but not limited to K-Nearest Neighbours (KNN), Support Vector
Machines (SVM), Multi-Layer Perceptron (MLP), and Logistic Regression (LR), are applied to
categorize resumes into specific job roles or skill categories. These classifiers leverage
features extracted from resumes to make predictions, reducing the need for manual sorting.
D. Hybrid Approaches Integrating NLP and ML:
Recent research explores hybrid approaches that combine the strengths of NLP and ML for more
accurate and efficient resume screening. By integrating NLP techniques for information
extraction with ML classifiers for decision-making, these approaches aim to enhance the overall
performance of automated systems.
E. Evaluation Metrics and Challenges:
Research in this field includes the development and assessment of evaluation metrics to measure
the accuracy and efficiency of automated resume screening systems. Challenges such as bias in
training data, handling diverse resume formats, and addressing privacy concerns are identified,
and ongoing research seeks to mitigate these issues.
3. Implementation Study
This implementation study aims to showcase the practical application of NLP and ML in
automating the resume screening process. By combining these technologies effectively, the study
aims to develop a reliable and efficient system for classifying resumes and facilitating the
recruitment process.
3.1 Data Collection and Preprocessing
Acquire a diverse dataset of resumes representing different job roles and industries. Perform
preprocessing tasks, including text normalization, tokenization, and cleaning, to prepare the
data for analysis. Annotate the dataset with labels indicating relevant categories (e.g.,
skills, experience levels).
3.2 NLP Techniques for Resume Parsing
We suggest integrating advanced Natural Language Processing (NLP) techniques, including Named
Entity Recognition (NER) and Part-of-Speech Tagging, to enhance the performance of our resume
processing system. By leveraging this technology, we aim to develop a sophisticated resume
parser that can automatically identify and structure vital elements within resumes, such as
skills, education, and work experience. The parser is intended to cope with variations in
resume formats and languages, ensuring adaptability to diverse submissions. The algorithms
applied should be robust, capable of accurately extracting pertinent information no matter how
the data is presented within the resumes.
3.3 Feature Extraction
Define a set of features based on the extracted information that can be used by ML classifiers.
Convert the parsed resume data into numerical representations suitable for training ML models.
Explore techniques such as TF-IDF (Term Frequency-Inverse Document Frequency) or word
embeddings for feature representation.
3.4 Model Training and Evaluation
We advocate the implementation of Machine Learning (ML) classifiers, including popular models
such as K-Nearest Neighbours, Support Vector Machines, and Multi-Layer Perceptron, to enhance
the performance of our resume categorization system. The first step involves splitting the
dataset into training and testing sets to facilitate model training and evaluation. The
training process involves exposing the classifiers to labeled examples from the dataset,
allowing them to learn patterns and associations between the diverse features in resumes.
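The feature extraction and model training steps described above can be sketched with Scikit-Learn as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the toy corpus and category names are hypothetical, the values `test_size=0.25`, `n_neighbors=3`, and `random_state=42` are arbitrary choices, and wrapping `KNeighborsClassifier` inside `OneVsRestClassifier` is an assumption about how the two classifiers are combined.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy corpus (hypothetical): cleaned resume text paired with a job category.
resumes = [
    "python pandas machine learning statistics",
    "python numpy deep learning models",
    "machine learning python scikit-learn data analysis",
    "python data visualization statistics",
    "java spring hibernate microservices",
    "java servlet jsp spring boot",
    "java maven junit microservices",
    "java hibernate rest api spring",
    "selenium testing automation test cases",
    "manual testing bug tracking jira",
    "testing selenium regression automation",
    "testing jira test plans qa",
]
categories = ["Data Science"] * 4 + ["Java Developer"] * 4 + ["Testing"] * 4

# Split into training and testing sets, keeping the category balance in both.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    resumes, categories, test_size=0.25, stratify=categories, random_state=42
)

# Fit TF-IDF on the training text only, then reuse the same vocabulary for the test text.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# One binary K-nearest-neighbours classifier is trained per job category.
classifier = OneVsRestClassifier(KNeighborsClassifier(n_neighbors=3))
classifier.fit(X_train, y_train)

predictions = classifier.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
```

Fitting the vectorizer on the training split alone keeps information from the test resumes out of the features, so the accuracy score reflects performance on unseen resumes. On a real dataset the number of neighbours and the split ratio would need to be tuned.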
3.5 Integration and System Evaluation
Integrate the NLP-based resume parser with the ML classifiers to create a comprehensive
automated resume screening system. Conduct end-to-end testing using a separate set of resumes
to assess the system's accuracy and efficiency. Gather feedback from recruiters or industry
professionals to fine-tune the system and address any potential issues.
4. Algorithms Used
4.1 KNeighbors Classifier
KNeighborsClassifier plays a crucial role in predicting the suitability of candidates based on
their resumes. This machine learning model falls under the category of supervised learning and
is particularly well-suited for classification tasks. The KNeighborsClassifier algorithm works
by identifying the k nearest neighbours of a given data point and making predictions based on
the majority class among these neighbours. In our project, the KNeighborsClassifier assesses
resumes by considering similarities in the skill sets, experience, and qualifications of
candidates. By leveraging the algorithm's ability to identify patterns and similarities among
resumes, the model aids in categorizing candidates into relevant job roles. This facilitates
the optimization of the screening process, ensuring that candidates are matched with positions
that align with their expertise.
Fig 1. KNeighbors Classifier
4.2 OneVsRest Classifier
In the context of our resume screening project, the OneVsRestClassifier plays a crucial role in
enhancing the model's ability to classify candidates across multiple job categories. This
classifier is particularly valuable when candidates possess diverse skill sets or experiences
that may span different roles. The OneVsRestClassifier extends the project's machine learning
capabilities by allowing the model to handle the complexity of matching candidates to various
job requirements simultaneously. For instance, if a candidate has proficiency in both Java and
Python programming languages, the OneVsRestClassifier enables the model to predict the
candidate's suitability for roles requiring expertise in either language.
Fig 2. OneVsRest Classifier
5. Results and Evaluation Metrics
Skill-centric Standardization:
Developed a skill-centric approach to standardize the representation of skills, facilitating a
more cohesive and interpretable output. The unique skill numbers served as a robust mechanism
to streamline communication and understanding across different stakeholders.
Enhanced Model Interpretability:
Achieved enhanced model interpretability by associating unique numbers with each skill,
simplifying the complexity of the output. Stakeholders could easily interpret and prioritize
specific skills, improving decision-making during the hiring process.
Job Category Segmentation:
Implemented a precise segmentation of resumes into different job categories based on the model
predictions. This segmentation allowed recruiters to focus on specific talent pools, optimizing
their efforts and expediting the hiring workflow.
Quantitative Skill Insights:
Provided quantitative insights into the frequency and importance of each skill within the
dataset through visualizations. Recruiters gained valuable information about skill trends,
aiding in strategic workforce planning and skill development initiatives.
Adaptability and Evolution:
Ensured the adaptability of the model to evolving job market dynamics through continuous
learning from user feedback. The system's capacity to evolve with changing hiring trends
contributed to its sustained efficiency and relevance over prolonged periods.
Improved Decision Support:
The combination of unique skill numbers, job category segmentation, and visual insights
empowered recruiters with improved decision support tools. Recruiters could make informed
decisions, backed by data-driven insights, leading to more successful candidate matches and
enhanced hiring outcomes.
Accuracy Score
Fig 3. Accuracy Score
6. Conclusion
The "Resume Screening through Machine Learning and Natural Language Processing" project emerges
as a strategic response to the identified challenges within the talent recruitment paradigm.
The project set out with a clear vision to revolutionize traditional hiring practices, and its
success in meeting these objectives underscores a paradigm shift in how organizations approach
candidate selection. By using technologies such as Natural Language Processing (NLP) and
machine learning, the project improved the talent acquisition process. The introduction of
unique numerical identifiers for skills not only standardized the representation but also
enhanced transparency. The integration of NLP and machine learning techniques streamlined the
entire hiring pipeline. The meticulous data preparation, skill extraction, and model training
processes, supported by Pandas, NumPy, and Scikit-Learn, ensured that recruiters were equipped
with a finely tuned system capable of handling large volumes of resumes efficiently.
Fig 4. Frequency
Fig 5. Categorization of Resume
Fig 6. Web Page of Resume Screening
Fig 6. Identifying Different Resume
7. References
[1] K, Tejaswini, V, Umadevi, Kadiwal, Shashank M., and Revanna, Sanjay. 2019. "Resume
Screening Using Machine Learning and NLP: A Proposed System." In IEEE Xplore, 2019.
DOI: 10.1109/ICNTE44896.2019.8945869.
[2] Singh, Anjali, and Mishra, Dr. P.K. 2022. "Resume Screening Classification using Artificial
Intelligence and NLP." International Journal of Recent Innovations in Computer Science and
Engineering. DOI: 10.5281/zenodo.5533827.
[3] Roy, Pradeep Kumar. 2019. "A Machine learning approach for automation of resume
recommendation system." In ICCIDS 2019. Procedia Computer Science. DOI:
10.1016/j.procs.2020.03.284.
[4] Barrett, Aldo Usama, Iqbal, Muhammad, Ismail, Nooraini, Nauman, Muhammad, and
Mohd Yusof, Noor Shahida. 2021. "Automated Resume Screening System Using Natural
Language Processing and Similarity-Based Matching." IEEE Access. DOI:
10.1109/ACCESS.2021.3124773.
