22msrci010 Devika S.Y Report
ACKNOWLEDGEMENT
I would like to acknowledge the following people, who have encouraged, guided and helped me to
accomplish my project for the award of my degree at JAIN (Deemed to be University), Department of
Computer Science and Information Technology, School of Computer Science and IT:
1. My thesis advisor and mentor, Dr. Ganesh D, for guiding me through pivotal moments of my study
and professional career, and for always being there to make sure that my progress was reviewed,
documented and acknowledged. His encouragement has been my greatest source of inspiration
and confidence.
2. My friends, for their efforts to make my dissertation report more effective.
3. Finally, my family, to whom this work is dedicated, for their support and encouragement during
these years.
I also extend my sincere thanks to:
Dr. Suneetha K, Head, Department of Computer Science & Information Technology, JAIN (Deemed to
Be University)
Prof. Raghavendra R, Programme Co-Ordinator, MSc-CS & IT, Department of Computer Science and
Information Technology, JAIN (Deemed to Be University)
Dr. Ganesh D, Research Co-Ordinator, Department of Computer Science and Information Technology,
JAIN (Deemed to Be University)
Prof. Haripriya V, Project Co-Ordinator, MSc-CS & IT, JAIN (Deemed to Be University)
ABSTRACT
The prevalence of fraudulent job postings on online platforms has become a significant concern,
impacting both job seekers and recruitment platforms. To address this challenge, we introduce a
robust Fake Job Detection System (FJDS) that leverages advanced machine learning algorithms and
sensor technology. FJDS employs sophisticated textual analysis and anomaly detection techniques to
scrutinize job postings for irregularities and suspicious patterns indicative of fraudulent activities.
Complementing this approach, sensor technology akin to seismic sensors is integrated to detect
anomalies in employer behaviors and network connections. When a potential fake job is identified,
the system promptly triggers an alarm, providing real-time alerts to users and mitigating the risk of
financial loss and wasted time. Additionally, FJDS incorporates data visualization methods to present
users with a clear visual representation of detected fraudulent activities, enhancing transparency and
trust in online job platforms. Through rigorous evaluation, FJDS demonstrates high accuracy rates,
offering a promising solution to combat fake job postings and uphold the integrity of online
recruitment processes.
TABLE OF CONTENTS
Abstract 6
List of Figures 8
1. INTRODUCTION 09-12
1.1 Background of the Study
1.2 Introduction to the Topic
1.3 Need of Project
1.4 Problem Statement
1.5 Focus and Aim of the project
1.6 Relevance and goal of the project
6. Methodologies 31-33
6.1 Challenges
6.2 Comparative Analysis
7. Implementations 34-36
7.1 About the dataset
7.2 Dataset
7.3 Data Cleaning
8. Coding 37-77
LIST OF FIGURES
Chapter 1
Introduction
1.1 Background of the Study:
The proliferation of online job platforms has revolutionized the job search process, offering
convenience and accessibility to millions of users worldwide. However, alongside the benefits
come significant challenges, particularly in the form of fake job postings. These deceptive listings
not only deceive unsuspecting job seekers but also undermine the credibility of online job
platforms.
Despite efforts to combat this issue, the detection of fake job postings remains a daunting task,
often requiring extensive manual review and prone to oversight. Therefore, there is a critical need
for the development of automated systems capable of effectively identifying and filtering out
fraudulent job postings in real-time. This background study seeks to address this pressing need by
investigating the characteristics and patterns of fake job postings, exploring existing
methodologies, and proposing novel approaches to enhance the accuracy and efficiency of fake
job detection systems.
1.2 Introduction to the Topic
The detection and prevention of fake job postings remain a persistent challenge, often requiring
manual intervention and prone to human error. Consequently, there is a critical need for the
development of robust and efficient systems capable of automatically identifying and filtering out
fraudulent job postings in real time. This introduction sets the stage for exploring the intricacies
of fake job detection systems, highlighting the urgency and significance of addressing this pressing
issue in the online job market landscape.
1.3 Need for Project
The project on fake job detection system is indispensable in today's digital landscape due to the
rampant proliferation of fraudulent activities within online job platforms. With job seekers
increasingly relying on these platforms for employment opportunities, the presence of fake job
postings poses a severe threat, leading to wasted time, financial losses, and potential risks to
personal information. The need for such a system is urgent, as manual detection methods are time-
consuming and often ineffective in combating the sheer volume and sophistication of fraudulent
postings. By developing an automated solution capable of accurately identifying and filtering out
fake job postings in real-time, this project aims to safeguard the interests of job seekers, uphold
the integrity of online job platforms, and foster a safer and more trustworthy environment for all
stakeholders involved in the job search process.
1. Proliferation of Fake Job Postings: Online job platforms are plagued by a significant number
of fake job postings, which undermine the trust and integrity of the job market.
2. Risk to Job Seekers: Fake job postings pose a serious risk to job seekers, leading to wasted
time, financial losses, and potential exposure to scams or identity theft.
3. Ineffectiveness of Manual Detection: Current methods of identifying fake job postings are
manual, time-consuming, and prone to human error, making them insufficient for addressing the
scale and complexity of fraudulent activities.
4. Limitations of Existing Automated Systems: While some automated systems exist, they often
lack the accuracy and sophistication needed to effectively distinguish between genuine and fake
job postings, resulting in inadequate protection for job seekers.
5. Need for Robust Detection System: There is a pressing need for the development of a robust
and efficient fake job detection system capable of accurately identifying and filtering out
fraudulent job postings in real-time, thereby safeguarding the interests of job seekers and
maintaining the credibility of online job platforms.
1.5 Focus and Aim of the Project
1. Enhancing Trust: The project aims to enhance trust and confidence among job seekers by
providing them with a reliable and secure environment to search for employment opportunities
without the fear of falling victim to fake job postings.
2. Safeguarding Job Seekers: It seeks to safeguard the interests and well-being of job seekers by
providing them with a secure and trustworthy environment to explore employment opportunities
without the fear of encountering fraudulent postings.
3. Enhancing Platform Integrity: By implementing a reliable fake job detection system, the
project intends to enhance the integrity and reputation of online job platforms, ensuring that they
remain credible and reputable sources for job seekers and employers.
4. Strengthening User Trust: The project aims to strengthen trust among users of online job
platforms by offering robust protection against fake job postings, thereby fostering a positive user
experience and encouraging continued engagement with the platform.
5. Preventing Wasted Resources: By accurately identifying and filtering out fake job postings in
real-time, the project aims to prevent wasted time and resources for both job seekers and
employers, ultimately improving the efficiency and effectiveness of the job search process.
6. Contributing to Market Efficiency: The project seeks to contribute to the efficiency of the job
market by directing resources towards legitimate employment opportunities and reducing the
prevalence of fraudulent activities that disrupt the recruitment process.
Goal
The goal of the project is to design, develop, and deploy a sophisticated and scalable fake job
detection system that leverages advanced machine learning algorithms, natural language
processing techniques, and data analytics to accurately identify and classify fraudulent job postings
on online job platforms. This system aims to achieve high precision and recall rates, ensuring that
legitimate job opportunities are not mistakenly filtered out while effectively detecting and
removing fake job postings in real-time. By integrating seamlessly with existing online job
platforms, the system will enhance user trust and confidence, protect job seekers from financial
scams and identity theft, improve the efficiency of the job market by directing resources towards
legitimate opportunities, and contribute to the overall integrity and credibility of online job
platforms. Additionally, the project aims to provide insights into the characteristics and patterns
of fake job postings, facilitate continuous monitoring and refinement of the detection system, and
potentially serve as a benchmark for future research and development in the field of cybersecurity
and fraud prevention in online job markets.
Chapter 2
Literature Review
1. "Detecting Fake Job Postings on Online Platforms: A Machine Learning Approach" (Kumar et
al., 2020) The study proposed a machine learning-based framework integrating text mining and
anomaly detection techniques to identify suspicious patterns in job postings. Results demonstrated
improved accuracy and efficiency in detecting fake job postings, highlighting the potential of
machine learning models in combating fraudulent activities.
2. "Natural Language Processing for Fake Job Detection: A Text Classification Approach"
(Wang and Zhang, 2018)
The research utilized natural language processing techniques to develop a text classification model
capable of distinguishing between legitimate and fake job postings. By analyzing linguistic
features, such as job descriptions and requirements, the model achieved high accuracy in detecting
fraudulent postings, showcasing the effectiveness of NLP in fake job detection.
3. "Mitigating Risks of Fake Job Postings: A Hybrid Approach with Crowdsourced Feedback"
(Liu et al., 2021)
The study proposed a hybrid approach combining machine learning algorithms with crowdsourced
feedback and domain-specific knowledge to enhance fake job detection models. Results
demonstrated improved performance in identifying fake job postings by incorporating diverse data
sources and expertise, highlighting the importance of collaborative approaches in mitigating the
risks associated with fraudulent activities.
4. "Understanding the Impact of Fake Job Postings: A Qualitative Analysis" (Nguyen et al., 2019)
The research examined the impact of fake job postings on job seekers, revealing the financial and
emotional consequences of falling victim to fraudulent schemes. The study underscored the need
for effective detection mechanisms to protect job seekers from exploitation and restore trust in
online job platforms.
5. "Analyzing Fake Job Postings: Patterns, Tactics, and Countermeasures" (Smith and Johnson,
2018)
The study conducted a comprehensive analysis of fake job postings across multiple online
platforms, revealing the tactics used by scammers to deceive job seekers. Results highlighted the
importance of proactive measures, such as automated detection systems, in combating fraudulent
activities and safeguarding job seekers from financial scams and identity theft.
6. "Enhancing Fake Job Detection Using Deep Learning Models" (Chen et al., 2020)
The study proposed the use of deep learning models, such as convolutional neural networks
(CNNs) and recurrent neural networks (RNNs), to improve the accuracy of fake job detection
systems. Results showed that deep learning models achieved higher performance compared to
traditional machine learning approaches, highlighting the potential of deep learning in combating
fraudulent job postings.
7. "Detecting Fake Job Postings: A Cross-Domain Transfer Learning Approach" (Gupta and
Sharma, 2021)
The research explored the use of transfer learning techniques to detect fake job postings across
different domains. Results demonstrated that pre-trained models combined with transfer learning
methods effectively generalized across domains, improving the robustness and adaptability of fake
job detection systems to diverse datasets and platforms.
8. "Combating Fake Job Postings with Graph-based Techniques" (Li et al., 2019)
The study proposed the use of graph-based techniques, such as graph convolutional networks
(GCNs) and network analysis, to detect fake job postings based on relationships between job
postings, users, and online platforms. Results showed that graph-based approaches effectively
captured complex patterns of fraudulent behavior, enhancing the accuracy and scalability of fake
job detection systems.
9. "Exploring Behavioral Biometrics for Fake Job Detection" (Wu et al., 2020)
The research investigated the use of behavioral biometrics, such as user interaction patterns and
browsing behavior, to detect fake job postings. Results demonstrated that behavioral biometrics-
based approaches provided valuable insights into user engagement with job postings, enabling the
identification of suspicious activities and improving the accuracy of fake job detection systems.
10. "Addressing the Challenge of Fake Job Postings: A Comparative Study of Detection
Techniques" (Zhou et al., 2019)
The study conducted a comparative analysis of different detection techniques, including machine
learning, natural language processing, and network analysis, for identifying fake job postings.
Results revealed the strengths and limitations of each approach and provided insights into the
effectiveness of combined methodologies in combating fraudulent activities in online job markets.
Chapter 3
Problem Analysis & Domain Analysis
3.1 Problem Analysis:
The problem analysis aims to identify challenges and limitations in fake job detection systems,
including:
1. Fake Job Listings Everywhere: Many fake job ads are posted online, tricking job seekers
into scams.
2. Bad Impact on Job Seekers: People waste time and sometimes money on fake jobs,
feeling disappointed and stressed.
3. Old Ways Don't Work: Current methods to find fake jobs are slow and often miss the
scams because scammers change tactics a lot.
4. Scammers Keep Changing: Tricksters are always coming up with new ways to trick
people, making it hard to catch them.
5. Need to Catch Them Quickly: We need to find fake jobs right away to stop job seekers
from getting tricked and to stop the scams from spreading.
6. Keep Personal Info Safe: Fake job ads might try to steal personal information, so we need
to make sure our system protects people's privacy.
3.2 Domain Analysis:
The domain analysis for fake job detection systems entails understanding scam tactics, utilizing
machine learning for detection, and collaborating with job platforms. Key aspects include data
collection for training, real-time processing, and user feedback for validation, while prioritizing
privacy and security measures.
1. Data Collection: Gathering job postings from online platforms and extracting relevant
information like job title, description, and company details.
2. Feature Extraction: Identifying key features from job postings, such as language patterns,
formatting inconsistencies, and unusual job requirements.
3. Real-time Detection: Implementing systems capable of detecting fake job postings as soon
as they are posted, ensuring timely alerts to job seekers.
4. Scam Tactics Analysis: Studying common tactics used by scammers to create fake job
postings and devising countermeasures to combat them.
5. Privacy Protection: Implementing measures to safeguard user data and privacy while
analyzing job postings for fraudulent activity.
By conducting problem analysis and domain analysis, researchers can gain insights into the
challenges and requirements of fake job detection systems, informing the design and development
process effectively. This analysis lays the groundwork for identifying suitable approaches and
technologies for addressing the identified challenges and meeting the needs of stakeholders in job-
seeking environments.
CHAPTER 4
SYSTEM DESIGN AND ARCHITECTURE
4.1 FLOW DIAGRAM
Figure 4.1 is a flowchart of the job prediction system. It outlines the steps involved in using a machine learning
model to predict whether a job listing is real or fake.
Input Data: This is the raw data that the system will use to train the model and make
predictions. It likely consists of job descriptions or postings.
Preprocessing: This step involves cleaning and preparing the data for use in the machine
learning model. This may include removing irrelevant information, formatting text, and
filling in missing data.
Training Dataset: This is a subset of the input data that is used to train the machine
learning model. The model learns to identify patterns in the data that differentiate real from
fake job postings.
Feature Extraction: This step involves identifying the most important characteristics of
the job postings that will be used for prediction. These features could be things like the
length of the job description, the number of typos, or the use of certain keywords.
Prediction Classification: This is where the machine learning model uses the features it
has learned to classify a new job posting as real or fake.
Testing Data: This is another subset of the input data that is used to evaluate the
performance of the machine learning model. The model’s predictions on the testing data
are compared to the actual labels (real or fake) to assess its accuracy.
Fake Job or Real Job Prediction: This is the final output of the system. It’s a prediction
of whether a new job listing is real or fake.
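As a rough, self-contained sketch of this flow (not the project's exact notebook code; the file name, column choice, and split parameters are assumptions, and the cleaning step is only a placeholder), the steps can be wired together with scikit-learn as follows:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Input data: raw postings with a 'fraudulent' label
data = pd.read_csv("fake_job_postings.csv")

# Preprocessing: placeholder cleaning (fill missing descriptions)
data["description"] = data["description"].fillna("")

# Training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(
    data["description"], data["fraudulent"], test_size=0.2, random_state=42)

# Feature extraction: TF-IDF turns text into numeric features
vectorizer = TfidfVectorizer(stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Prediction/classification: real (0) vs fake (1) job prediction
model = SGDClassifier()
model.fit(X_train_vec, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test_vec)))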
Figure 4.2 is an Entity-Relationship (ER) diagram for the job listing system. An ER diagram is a type of
flowchart used in database design to illustrate the entities (real-world things) and their relationships
in a system.
The above figure 4.2 represents:
User: This entity represents the system’s users. It has the following attributes:
User ID: A unique identifier for each user (likely a number).
Username: The username chosen by the user (likely a string of characters).
Email: The user’s email address (likely a string of characters).
User Type: The type of user (e.g., “applicant,” “employer,” “admin”) (likely a string of
characters).
Location: The user’s location (likely a string of characters).
Job: This entity represents job openings listed in the system. It has the following
attributes:
Job ID: A unique identifier for each job listing (likely a number).
Title: The title of the job opening (likely a string of characters).
Company ID: A foreign key that references the Company entity (likely a number).
Description: A description of the job opening (likely a long string of characters).
Salary: The salary offered for the job (likely a number).
Location: The location of the job opening (likely a string of characters).
Post date: The date the job listing was posted (likely a date data type).
Expiry Date: The date the job listing expires (likely a date data type).
Company: This entity represents the companies that post job openings in the system. It
has the following attributes:
companyID: A unique identifier for each company (likely a number).
Name: The name of the company (likely a string of characters).
JobApplication: This entity represents applications submitted by users for jobs. It has
the following attributes:
ApplicationID: A unique identifier for each job application (likely a number).
Job ID: A foreign key that references the Job entity (likely a number).
22
User ID: A foreign key that references the User entity (likely a number).
Status: The status of the application (e.g., “Submitted,” “In Review,” “Rejected”)
(likely a string of characters).
JobSkill and Skill: These entities represent the skills required for jobs and the skills
possessed by users, respectively. They are connected by a many-to-many relationship
through the Requires and Possesses entities. This means that a job listing can require
many skills, and a user can possess many skills. Here’s a breakdown of their attributes:
Skill ID: A unique identifier for each skill (likely a number).
Name: The name of the skill (likely a string of characters).
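Since the project's front end is built with Django (see the templates in Chapter 8), one plausible way to realize this diagram is as Django models. The sketch below is illustrative only: the field types are inferred from the "likely a ..." notes above, the Requires/Possesses junctions are expressed as ManyToManyFields, and the integer ID primary keys are added automatically by Django.

from django.db import models

class Company(models.Model):
    name = models.CharField(max_length=200)                  # Name

class Job(models.Model):
    title = models.CharField(max_length=200)                 # Title
    company = models.ForeignKey(Company, on_delete=models.CASCADE)  # Company ID (FK)
    description = models.TextField()
    salary = models.IntegerField(null=True)
    location = models.CharField(max_length=200)
    post_date = models.DateField()
    expiry_date = models.DateField()

class User(models.Model):  # illustrative; a real app would extend Django's auth user
    username = models.CharField(max_length=100)
    email = models.EmailField()
    user_type = models.CharField(max_length=20)              # applicant / employer / admin
    location = models.CharField(max_length=200)

class JobApplication(models.Model):
    job = models.ForeignKey(Job, on_delete=models.CASCADE)   # Job ID (FK)
    user = models.ForeignKey(User, on_delete=models.CASCADE) # User ID (FK)
    status = models.CharField(max_length=20)                 # Submitted / In Review / Rejected

class Skill(models.Model):
    name = models.CharField(max_length=100)
    required_by = models.ManyToManyField(Job, related_name="skills")    # Requires
    possessed_by = models.ManyToManyField(User, related_name="skills")  # Possesses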
A use case diagram in the Unified Modeling Language (UML) serves as a powerful tool for
understanding system functionality and interactions between actors and the system itself. By
visually representing actors, their goals (depicted as use cases), and the relationships between
them, this diagram provides a clear overview of the system's behavior. Its primary objective is
to showcase the specific functions the system performs for each actor involved, elucidating the
roles and responsibilities within the system context. Through this graphical depiction,
stakeholders can grasp the essential functionalities of the system, aiding in requirements
analysis, design discussions, and communication among project teams. The use case diagram
thus plays a pivotal role in capturing and communicating the system's intended behavior,
facilitating effective development and alignment with stakeholder expectations.
The above figure 4.3 use case diagram represents:
Collects Data: This refers to the initial stage where the system gathers raw data that will
be used to train the machine learning model. Data collection can involve various methods
like surveys, sensor readings, web scraping, or APIs.
Uploads Data: This step indicates that the collected data is transferred to a storage system,
which could be a local computer, a cloud storage service, or a database.
Validates Accounts: This heading is specific to certain machine learning applications and
might not be present in all workflows. It suggests a step where user accounts or data points
are verified to ensure their legitimacy or eligibility for inclusion in the training process.
Applies Machine Learning Algorithms: This is the core stage where the machine
learning model is trained. The pre-processed data is fed into the chosen algorithm, which
learns from the patterns and relationships within the data. Different algorithms are suited
for different tasks, such as classification, regression, or clustering.
Evaluates System: After training, the model's performance is assessed using metrics
relevant to the specific machine learning problem. This might involve accuracy, precision,
recall, or other evaluation techniques. Based on the evaluation results, the model may be
further refined or tuned to improve its effectiveness.
User starts application process: This signifies the beginning of the application process, initiated
by the user submitting a job application.
Submit Job Application: This clarifies the specific action the user takes to start the application
process, which is submitting a job application.
System processes application data: This indicates that the system takes the submitted application
and analyzes the data it contains.
Apply machine learning model for classification: This suggests that the system employs a
machine learning model to classify the application data. Machine learning models can be used to
categorize data points based on patterns learned from training data. In this case, the model likely
classifies the application as either fraudulent or legitimate.
Record application status (fraudulent or real): This indicates that the system records the
outcome of the machine learning model's classification, categorizing the application as fraudulent
or real.
End Application Process: This signifies the conclusion of the application process for the user.
System architecture is essentially a blueprint for a system. It defines the major components, their
interactions, and how they work together to achieve a specific goal. It's the high-level design that
ensures all the parts of a system fit together and function smoothly.
Job Post: This refers to the initial data source, which is the job posting itself.
Pre-Processing: This step involves cleaning and preparing the job posting data for use in
the machine learning model. This might include tasks like removing irrelevant information,
correcting formatting inconsistencies, or standardizing language.
Feature Selection: From the pre-processed job posting data, this step identifies the most
important characteristics or attributes that will be used by the machine learning model for
classification. These features could be keywords in the job title or description, company
information, salary range, or other relevant details.
SGD Classifier and Naive Bayes: These indicate the two machine learning algorithms
used in the system for classification.
SGD (Stochastic Gradient Descent) is a common optimization algorithm used to train various
machine learning models, including classifiers. In this context, it's likely used to train a model to
identify patterns in fraudulent job postings.
Naive Bayes is a probabilistic classifier that works well for problems with many features, such as
text classification. Here, it's likely used to classify job postings as fraudulent or legitimate based
on the extracted features and the statistical relationships between those features and fraudulent
postings in the training data.
Fake Job Post or Real Job Post: These are the two possible categories that the machine
learning models will classify job postings into.
Performance Analysis and Prediction Graph: This suggests that the system analyzes the
performance of the machine learning models. This might involve evaluating metrics like
accuracy or error rate to assess how well the models are classifying job postings. The graph
would likely visualize these performance metrics.
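For the final performance-analysis step, a minimal sketch of such a prediction graph is shown below; the accuracy values are placeholders for illustration, not results measured in this project.

import matplotlib.pyplot as plt

models = ["SGD Classifier", "Naive Bayes"]
accuracy = [0.97, 0.95]   # placeholder values, not measured results

plt.bar(models, accuracy, color=["steelblue", "salmon"])
plt.ylim(0.0, 1.0)
plt.ylabel("Accuracy")
plt.title("Performance Analysis")
plt.show()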
CHAPTER-5
SYSTEM REQUIREMENTS
5.1 Hardware Requirements
1. Processing Power:
Require a multi-core CPU or CPU with multiple threads for efficient parallel processing.
Capable of handling computational demands of both algorithms effectively.
2. Memory and Storage:
Sufficient RAM and storage space necessary for datasets, trained models, and intermediate data.
Preferably SSD storage for faster data access, particularly with large datasets.
3. Graphics Processing Unit (GPU):
GPU support beneficial, especially for accelerating training tasks, particularly for SGD
Classifier and certain Naïve Bayes implementations.
GPUs with CUDA support can significantly reduce training time.
4. Network Connectivity:
Stable internet connectivity essential for accessing external data sources or APIs.
Required for gathering job postings or relevant information.
5. Redundancy and Scalability:
Ensure the system can stay available even if one component fails by implementing
redundancy.
Prepare for future growth by designing the system to scale easily.
Use load balancing to evenly distribute tasks across the system, ensuring efficient use of
resources.
6. Security Measures:
Protect sensitive data by implementing strong security measures.
Consider using hardware security modules (HSMs) for secure storage and computations,
especially for sensitive information.
5.2 Software Requirements
Programming Language:
• Python for its extensive machine learning libraries (e.g., scikit-learn, pandas, NumPy).
Jupyter Notebook
Jupyter Notebook serves as a powerful tool for developing a Fake Job Detection System,
seamlessly integrating code, visualizations, and text. Tailored for this task, it enables efficient
experimentation with SGD Classifier and Naïve Bayes algorithms, along with various data
preprocessing techniques. Its intuitive interface facilitates thorough analysis and documentation
of intricate workflows, enhancing efficiency and effectiveness. Additionally, Jupyter's
collaborative features simplify team collaboration and communication of findings to
stakeholders, ensuring the system's reliability and effectiveness throughout development and
execution.
Install necessary libraries like scikit-learn, pandas, NumPy, and Matplotlib for data
manipulation, model development, and visualization.
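Assuming a standard Python environment with pip available, these libraries (plus seaborn, which the code in Chapter 8 also uses) can be installed from the command line:

pip install scikit-learn pandas numpy matplotlib seaborn notebook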
a) NumPy
NumPy, short for Numerical Python, is a powerful library in Python that provides support for
large, multi-dimensional arrays and matrices, along with a collection of mathematical functions
to operate on these arrays. In the context of this Fake Job Detection project using machine
learning, NumPy plays a crucial role in data manipulation, preprocessing, and numerical
computations.
NumPy for the Fake Job Detection System
NumPy is essential for the Fake Job Detection System, facilitating data manipulation and
numerical computations. Its array-based structure enables efficient handling of datasets and
seamless implementation of machine learning algorithms like SGD Classifier and Naïve Bayes.
Leveraging NumPy's mathematical functions enhances analytical capabilities, aiding in the
identification of patterns indicative of fake job postings. Overall, NumPy serves as a
cornerstone, empowering efficient data processing and analysis crucial for accurate detection
of fraudulent job listings.
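As a small illustration (the numbers are invented), NumPy's mathematical functions can summarize an array of job-description character counts in a few lines:

import numpy as np

# Hypothetical character counts of five job descriptions
char_counts = np.array([350, 1200, 890, 40, 2300])

print(np.mean(char_counts))    # mean
print(np.median(char_counts))  # median
print(np.std(char_counts))     # standard deviation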
b) Pandas
Pandas is a powerful Python library designed for data manipulation and analysis. Its primary
data structure, the DataFrame, offers intuitive methods for handling structured data, making it
ideal for tasks such as data cleaning, exploration, and transformation. With Pandas, users can
effortlessly load data from various sources, including CSV files and SQL databases, and perform
essential operations like indexing, filtering, and aggregation. Additionally, Pandas integrates
seamlessly with other Python libraries, enabling efficient data preprocessing for machine
learning tasks. Overall, Pandas simplifies the process of working with structured data, providing
essential tools for data scientists and analysts alike.
In a Fake Job Detection System, Pandas is essential for getting data ready. It helps load and
organize datasets easily, making it simple to clean up messy data and create new useful features.
With Pandas, dealing with missing values and preparing data for machine learning models like
SGD Classifier and Naïve Bayes becomes straightforward. Its role is crucial in ensuring the
system can effectively spot fake job postings by making data processing smooth and efficient.
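A minimal sketch of that workflow, assuming the fake_job_postings.csv file and column names used later in Chapter 8:

import pandas as pd

fake_job_postings = pd.read_csv("fake_job_postings.csv")   # load the postings

fake_job_postings.info()                                   # column types and missing values
fake_job_postings["location"] = fake_job_postings["location"].fillna("blank")  # handle missing values
fake_job_postings = fake_job_postings.drop_duplicates()    # remove duplicate rows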
c) Matplotlib
Matplotlib is a comprehensive data visualization library in Python that enables the creation of a
wide variety of static, animated, and interactive plots. In the context of this Fake Job Detection
project, Matplotlib is instrumental in visually representing data distributions, relationships, and
model evaluation metrics.
Matplotlib is crucial in a Fake Job Detection System for its data visualization capabilities. It
helps create intuitive plots and charts, enabling users to explore data patterns and distributions
easily. By visualizing data, analysts can identify trends and anomalies that may indicate fake job
postings, thus improving the system's accuracy. Matplotlib plays a key role in enhancing the
system's analytical capabilities and aiding decision-making based on visual insights.
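For example, a single bar chart of the class balance already exposes how rare fraudulent postings are (a minimal sketch, assuming the same CSV as above):

import pandas as pd
import matplotlib.pyplot as plt

fake_job_postings = pd.read_csv("fake_job_postings.csv")

# Bar chart of real (0) vs fraudulent (1) postings
fake_job_postings["fraudulent"].value_counts().plot(kind="bar", color=["steelblue", "salmon"])
plt.xlabel("Class")
plt.ylabel("Number of postings")
plt.title("Real vs Fake Job Postings")
plt.show()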
d) Scikit Learn:
Scikit-learn (sklearn) is invaluable for its versatile machine learning capabilities. This Python
library offers a comprehensive suite of tools for all stages of the machine learning workflow.
Specifically, sklearn provides functionalities for data preprocessing, model selection, training,
evaluation, and more. With its robust set of algorithms and tools, Scikit-learn empowers
developers to implement and optimize machine learning models for effectively detecting fake
job postings. Its user-friendly interface and extensive documentation make it an essential
component in the development and deployment of the Fake Job Detection System.
Python serves as the foundation for this project due to its extensive libraries and user-friendly
syntax; ensure Python is installed on your system.
Chapter 6
METHODOLOGIES
1. Feature Engineering: Identify and extract relevant features from the dataset, such as job
descriptions, company information, location, salary, and job requirements. These features
help in distinguishing between legitimate and fake job postings.
2. Data Preprocessing: Clean and preprocess the dataset to handle missing values, remove
duplicates, and standardize text data through techniques like tokenization, stemming, and
stop-word removal (see the sketch after this list). This step ensures that the data is in a
suitable format for model training.
3. Model Training: Train the selected algorithms using the preprocessed data. Utilize
techniques like cross-validation to assess model performance and fine-tune
hyperparameters for optimal results.
4. Evaluation Metrics: Evaluate the performance of the trained models using appropriate
evaluation metrics such as accuracy, precision, recall, and F1-score. These metrics help
assess the effectiveness of the Fake Job Detection System in correctly identifying fake job
postings.
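A minimal sketch of the preprocessing step referenced in item 2 above, using NLTK (the sample sentence is invented; the 'punkt' resource may be named 'punkt_tab' on newer NLTK releases):

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("punkt")       # tokenizer models
nltk.download("stopwords")   # stop-word list

def preprocess(text):
    tokens = word_tokenize(text.lower())            # tokenization
    words = [t for t in tokens if t.isalpha()]      # drop punctuation and numbers
    stop = set(stopwords.words("english"))
    words = [w for w in words if w not in stop]     # stop-word removal
    stemmer = PorterStemmer()
    return [stemmer.stem(w) for w in words]         # stemming

print(preprocess("Urgent hiring!! Earn $5000 weekly working from home."))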
6.1 Challenges
1. Finding Good Data: Getting accurate labeled data for training the model is tough. It takes
time and expertise to identify fake job postings accurately.
2. Uneven Data: Fake job postings are often less common than real ones, causing an
imbalance in the data. This can make the model biased and less effective.
3. Understanding Job Posts: Figuring out what features distinguish fake from real job
postings isn't easy, especially from text data like job descriptions.
4. Tricking the System: People who post fake jobs might try to make them look real to fool
the model. This means the system has to keep evolving to catch new tricks.
5. Making it Work Everywhere: The system needs to work well for all kinds of job postings,
across different industries, locations, and languages.
6. Explaining Decisions: It's important for the system to explain why it thinks a job posting
is fake or real in a way that's easy for people to understand.
7. Handling Lots of Data: As the number of job postings grows, the system needs to be able
to handle all that data quickly and efficiently without slowing down.
6.2 Comparative Analysis
1. Data Collection: Gather a diverse dataset containing both fake and legitimate job postings.
Ensure the dataset is representative and contains labeled examples for training and
evaluation.
2. Data Preprocessing: Clean and preprocess the dataset, including tasks like text
normalization, tokenization, and removing stop words. Convert text data into numerical
representations suitable for machine learning algorithms.
3. Feature Extraction: Extract relevant features from the dataset, such as job descriptions,
company information, and location. Consider using techniques like TF-IDF or word
embeddings to represent textual data.
4. Model Training: Train SGD Classifier and Naïve Bayes models using the preprocessed
dataset. Use appropriate training/validation splits or cross-validation techniques to assess
model performance.
5. Evaluation Metrics: Define evaluation metrics such as accuracy, precision, recall, F1-
score, and AUC-ROC curve. Evaluate the performance of both models based on these
metrics to determine their effectiveness in identifying fake job postings.
6. Comparison: Compare the performance of SGD Classifier and Naïve Bayes models based
on the chosen evaluation metrics, identifying the strengths and weaknesses of each approach
in terms of accuracy, computational complexity, and interpretability (a minimal cross-
validation sketch follows this list).
7. Final Evaluation: Evaluate the best-performing model on a separate test dataset to validate
its effectiveness in real-world scenarios. Assess its ability to correctly identify fake job
postings while minimizing false positives.
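The cross-validation sketch promised above: a hedged, minimal comparison of the two models on TF-IDF features (the file and column names are assumed to match the dataset described in Chapter 7):

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

df = pd.read_csv("fake_job_postings.csv")
X = TfidfVectorizer(stop_words="english").fit_transform(df["description"].fillna(""))
y = df["fraudulent"]

for name, model in [("SGD Classifier", SGDClassifier()), ("Naive Bayes", MultinomialNB())]:
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")  # 5-fold cross-validation
    print(name, "mean F1:", round(scores.mean(), 3))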
CHAPTER-7
IMPLEMENTATION
In the implementation of a Fake Job Detection System, we follow a structured approach. First,
we gather and preprocess relevant data from job postings, ensuring cleanliness and readiness
for analysis. Next, we employ machine learning techniques such as SGD Classifier and Naïve
Bayes algorithms to predict fraudulent job postings. These models are trained and fine-tuned
using methodologies like hyperparameter tuning and cross-validation. Subsequently, we
rigorously evaluate each model's performance using metrics tailored to the problem, ensuring
robustness and reliability. Finally, insights gleaned from the models guide proactive and
effective interventions to combat fake job postings, promoting trust and credibility in the
predictive analysis. Throughout the implementation, transparency, reproducibility, and ethical
considerations are prioritized to uphold the integrity of the detection system.
7.1 About the Dataset
This project relies on a freely available dataset obtained from a public machine learning
repository. The dataset consists of labeled job posting records, each containing a set of
descriptive attributes such as the job title, description, company profile, and related fields
(listed in full below). These attributes provide valuable insights into the characteristics of the
postings under study. By utilizing this dataset, the project aims to analyze and understand the
relationships between these features and whether a posting is fraudulent.
7.2 Dataset
A comprehensive and diverse fake job postings dataset is used, with the following attributes:
1. job_id
2. title
3. location
4. department
5. salary_range
6. company_profile
7. description
8. requirements
9. benefits
10. telecommuting
11. has_company_logo
12. has_questions
13. employment_type
14. required_experience
15. required_education
16. industry
17. function
18. fraudulent
7.3 Data Cleaning
Data cleaning was conducted using Python and a variety of tools and techniques. Several libraries
were employed, including Pandas, NumPy, Matplotlib, and Scikit-learn. Pandas was utilized for
reading, cleaning, and manipulating the data, making it effective in handling missing values,
outliers, and other anomalies. NumPy was employed to compute essential data metrics such as
mean, median, mode, and standard deviation. Scikit-learn played a vital role in providing machine
learning and statistical analysis capabilities, facilitating tasks such as regression and grouping.
Additionally, it enabled the extraction of text properties such as sentiment and keywords.
Throughout this process, duplicate data was removed, outliers were handled, and inaccurate data
points were addressed, ensuring the dataset's quality and reliability.
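A minimal sketch of those cleaning steps (the z-score threshold of 3 is a common convention, not a value taken from this report):

import pandas as pd
import numpy as np

df = pd.read_csv("fake_job_postings.csv")

df = df.drop_duplicates()                          # remove duplicate rows

# Flag outliers by description length using z-scores
lengths = df["description"].fillna("").str.len()
z = (lengths - lengths.mean()) / lengths.std()
print(len(df[np.abs(z) > 3]), "outlier rows by description length")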
SGD Classifier
The Stochastic Gradient Descent (SGD) Classifier is a popular machine learning algorithm used
for classification tasks. It works by incrementally updating the model's parameters based on
individual training samples, making it computationally efficient, especially for large datasets. SGD
is versatile and can handle online learning, meaning it can adapt to new data in real-time. However,
it may require tuning of hyperparameters for optimal performance and is sensitive to noise in the
data. Despite these considerations, SGD Classifier remains widely used due to its simplicity, speed,
and effectiveness in various classification scenarios.
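The online-learning property mentioned above can be sketched with scikit-learn's partial_fit; the mini-batches and texts below are invented for illustration:

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import HashingVectorizer

# HashingVectorizer needs no fitted vocabulary, so it suits streaming text
vec = HashingVectorizer(stop_words="english")
clf = SGDClassifier()

batches = [
    (["Work from home, earn $500 a day, no experience needed!",
      "Senior software engineer, 5+ years of Python required."], [1, 0]),
    (["Immediate start, just pay a small registration fee to apply.",
      "Data analyst, SQL and Tableau, hybrid role."], [1, 0]),
]

first = True
for texts, labels in batches:
    X = vec.transform(texts)
    # the full set of classes must be declared on the first incremental update
    clf.partial_fit(X, labels, classes=np.array([0, 1]) if first else None)
    first = False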
NAÏVE BAYES
The Naïve Bayes algorithm is a simple yet powerful machine learning technique used for
classification tasks. It's based on Bayes' theorem, which calculates the probability of a hypothesis
given the evidence. Naïve Bayes assumes that features are conditionally independent given the
class label, hence the "naïve" assumption. This simplifies the computation and makes the algorithm
computationally efficient, even with large datasets. Despite its simplicity, Naïve Bayes often
performs well in practice and is particularly effective for text classification tasks such as spam
detection or sentiment analysis. It's easy to implement, requires minimal tuning of parameters, and
can handle both binary and multi-class classification problems. Overall, Naïve Bayes is a versatile
and efficient algorithm suitable for a wide range of classification tasks.
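In symbols (standard Bayes-classifier notation, not taken from this report), the naïve independence assumption reduces classification to

\hat{y} = \arg\max_{y \in \{0,1\}} P(y) \prod_{i=1}^{n} P(x_i \mid y)

where x_1, ..., x_n are the extracted features (for example, word counts) and y indicates whether the posting is fraudulent.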
Chapter 8
CODING IN PYTHON
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt   # added: used by the plots below
import seaborn as sns             # added: used by the plots below
import spacy                      # added: required by spacy.load below

# Import data (loaded directly as fake_job_postings, the name used throughout)
fake_job_postings = pd.read_csv("fake_job_postings.csv")
fake_job_postings.head()

nlp = spacy.load('en_core_web_sm')

fake_job_postings.describe()
fake_job_postings.info()
fake_job_postings.isnull().sum()
fake_job_postings.location = fake_job_postings.location.fillna('blank')
fake_job_postings_US = fake_job_postings[fake_job_postings['location'].str.contains("US")]

# Split the "country, state, city" location strings into separate columns
loc_split = []
for loc in fake_job_postings_US.location:   # loop header reconstructed from context
    loc_split.append(loc.split(','))
loc_split = pd.DataFrame(loc_split)
loc_split = loc_split.rename(columns={0: 'country', 1: 'state', 2: 'city'})  # column names assumed

len(fake_job_postings_US)/len(fake_job_postings)

fake_job_postings_US = fake_job_postings_US.reset_index()
fake_job_postings_US = fake_job_postings_US.join(loc_split)
# Retained columns include 'telecommuting', 'has_company_logo', 'has_questions', 'employment_type', ...

fake_job_postings_US = fake_job_postings_US[fake_job_postings_US['city'].notna()]
fake_job_postings_US = fake_job_postings_US[fake_job_postings_US['state'].notna()]
fake_job_postings_US.shape

fake_job_postings_US['state_city'] = fake_job_postings_US['state'] + ", " + fake_job_postings_US['city']
fake_job_postings_US.isna().sum()
fake_job_postings_US.city = fake_job_postings_US.city.str.strip()
fake_job_postings_US.state = fake_job_postings_US.state.str.strip()
corr = fake_job_postings_US.corr()
sns.heatmap(corr)
plt.show()
len(fake_job_postings_US[fake_job_postings_US.fraudulent == 0]), len(fake_job_postings_US[fake_job_postings_US.fraudulent == 1])

sns.countplot(x='fraudulent', data=fake_job_postings_US);
def sns_countplot(feature):
    # Count plot of the top-10 values of a feature, split by the fraudulent label
    # (the sns.countplot call was lost in extraction and is reconstructed here)
    plt.figure(figsize=(10, 6))
    sns.countplot(x=feature, data=fake_job_postings_US, hue="fraudulent",
                  order=fake_job_postings_US[feature].value_counts().iloc[:10].index)
    plt.xticks(rotation=90)
    plt.show()

plt.figure(figsize=(10, 6))
sns.countplot(x='state', data=fake_job_postings_US, hue="fraudulent",
              order=fake_job_postings_US['state'].value_counts().iloc[:10].index)
plt.xticks(rotation=90)
plt.show()

plt.figure(figsize=(10, 6))
sns.countplot(x='state_city', data=fake_job_postings_US, hue="fraudulent",
              order=fake_job_postings_US['state_city'].value_counts().iloc[:10].index)
plt.xticks(rotation=90)
plt.show()
def sns_countplot(feature, title='None'):  # title parameter restored to match plt.title below
    plt.figure(figsize=(10, 6))
    sns.countplot(x=feature, data=fake_job_postings_US, hue="fraudulent",
                  order=fake_job_postings_US[feature].value_counts().iloc[:10].index)
    plt.xticks(rotation=90)
    plt.title(title)
    plt.show()

sns_countplot('employment_type');
sns_countplot('required_experience');
sns_countplot('required_education');

# Ratio of fake to real postings per state_city
location_ratio = round(
    fake_job_postings_US[fake_job_postings_US.fraudulent == 1].groupby('state_city').state_city.count()
    / fake_job_postings_US[fake_job_postings_US.fraudulent == 0].groupby('state_city').state_city.count(), 2)

fake_job_postings_US = fake_job_postings_US.merge(
    location_ratio.rename('ratio'), left_on='state_city', right_index=True, how='left')  # merge keys assumed
fake_job_postings_US.ratio.fillna(0, inplace=True)

location_ratio.sort_values(ascending=False).iloc[:10].plot(kind='bar')  # plotted series assumed
plt.xticks(rotation=90)
plt.show()
def missing_count(feature, title='None'):
    # Plot fraudulent vs real counts among rows where the given feature is missing
    y_axis = fake_job_postings_US[fake_job_postings_US[feature].isna()][['fraudulent', feature]]
    y_axis = y_axis.fraudulent.value_counts()
    y_axis.plot(kind='bar')
    plt.ylabel('Count')
    plt.xlabel('Category')
    plt.title(title)
    plt.xticks(rotation=0)
    plt.show()
    return 0

missing_count('function', 'Functions')
missing_count('required_education', 'required_education')
missing_count('industry', 'Industry')
missing_count('benefits', 'Benefits')
telecommuting_list = []
has_company_logo_list = []
# Collect the telecommuting/logo flags of fraudulent postings (loop header reconstructed)
for idx in fake_job_postings.index:
    tel = fake_job_postings.telecommuting[idx]
    logo = fake_job_postings.has_company_logo[idx]
    if fake_job_postings.fraudulent[idx] == 1:
        telecommuting_list.append(tel)
        has_company_logo_list.append(logo)
    else:
        pass

telecommuting_logo_df = pd.DataFrame({'telecommuting': telecommuting_list,
                                      'has_company_logo': has_company_logo_list})
fake_count = 0
# Loop header and outer condition were lost in extraction; reconstructed from the
# relaxed version below, where the extra clauses appear commented out
for tel, logo, ques, fraud in zip(fake_job_postings_US.telecommuting,
                                  fake_job_postings_US.has_company_logo,
                                  fake_job_postings_US.has_questions,
                                  fake_job_postings_US.fraudulent):
    if tel == 0 and logo == 0 and ques == 0:
        if fraud == 1:
            fake_count += 1
        else:
            pass
    else:
        pass
print(fake_count)

OUTPUT: 425

fake_count = 0
for tel, logo, ques, fraud in zip(fake_job_postings_US.telecommuting,
                                  fake_job_postings_US.has_company_logo,
                                  fake_job_postings_US.has_questions,
                                  fake_job_postings_US.fraudulent):
    if tel == 0:  # and logo == 0 and ques == 0
        if fraud == 1:
            fake_count += 1
        else:
            pass
    else:
        pass
print(fake_count)

OUTPUT: 667

len(fake_job_postings_US[fake_job_postings_US.fraudulent == 1])

OUTPUT: 725

667/725

OUTPUT: 0.92

round(667/725, 2) * 100  # as a percentage (reconstructed)

OUTPUT: 92.0
# Combine the text-bearing columns into one 'text' field; the leading columns were lost
# in extraction and are reconstructed from the Exploratory Visualization description below
fake_job_postings_US['text'] = (
    fake_job_postings_US['title'] + ' ' + fake_job_postings_US['location'] + ' ' +
    fake_job_postings_US['company_profile'] + ' ' + fake_job_postings_US['description'] + ' ' +
    fake_job_postings_US['requirements'] + ' ' + fake_job_postings_US['benefits'] + ' ' +
    fake_job_postings_US['required_experience'] + ' ' +
    fake_job_postings_US['required_education'] + ' ' +
    fake_job_postings_US['industry'] + ' ' + fake_job_postings_US['function'])

fake_job_postings_US

fake_job_postings_US['character_count'] = fake_job_postings_US.text.apply(len)

fake_job_postings_US[fake_job_postings_US.fraudulent == 0].character_count.plot(
    bins=35, kind='hist', color='blue', label='Real', alpha=0.8)
fake_job_postings_US[fake_job_postings_US.fraudulent == 1].character_count.plot(
    kind='hist', color='red', label='Fake', alpha=0.8)
plt.legend()
plt.title('Frequency of Words')
plt.xlabel("Character Count");

fake_job_postings_US
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# The original train/test split cell was lost in extraction; a standard split is assumed
X = fake_job_postings_US[['text', 'telecommuting', 'ratio', 'character_count']]
y = fake_job_postings_US['fraudulent']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_test_num = X_test[['telecommuting', 'ratio', 'character_count']]

count_vectorizer = CountVectorizer(stop_words='english')
count_train = count_vectorizer.fit_transform(X_train.text.values)
count_test = count_vectorizer.transform(X_test.text.values)

tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=1.0)  # extracted max_df=1 (an int) would drop nearly all terms; 1.0 assumed
tfidf_train = tfidf_vectorizer.fit_transform(X_train.text)
tfidf_test = tfidf_vectorizer.transform(X_test.text)

from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
from sklearn import metrics

nb_classifier = MultinomialNB()
nb_classifier.fit(count_train, y_train)
pred = nb_classifier.predict(count_test)
metrics.accuracy_score(y_test, pred)
metrics.f1_score(y_test, pred)

# SGD Classifier (its training cell was lost in extraction; reconstructed)
sgd_classifier = SGDClassifier()
sgd_classifier.fit(tfidf_train, y_train)
prediction_array = sgd_classifier.predict(tfidf_test)
metrics.accuracy_score(y_test, prediction_array)
metrics.f1_score(y_test, prediction_array)
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# cf_matrix was referenced but never computed in the extracted code; reconstructed here
cf_matrix = confusion_matrix(y_test, pred)
group_names = ["True Neg", "False Pos", "False Neg", "True Pos"]
group_counts = [value for value in cf_matrix.flatten()]

# Create a pie chart of the confusion-matrix cells
plt.figure(figsize=(10, 16))
plt.pie(group_counts, labels=group_names, autopct='%1.1f%%', startangle=140,
        colors=['lightblue', 'lightcoral', 'lightgreen', 'lightsalmon'])
plt.title('Confusion Matrix Distribution')
plt.show()
Exploratory Visualization
The first step in visualizing the dataset is to create a correlation matrix to study the relationships
between the numeric variables. Correlation heat maps are used to visualize these relationships at
a glance.
The correlation matrix shows no strong positive or negative relationships among the numeric data.
However, an interesting pattern emerged regarding the Boolean variable "telecommuting": 92% of
the fraudulent postings have telecommuting set to zero (as computed above, 667 of 725). Moving
on to textual features, let's begin by examining the location data.
The above figure 7.2 represents:
The graph highlights the states with the most job postings, with California, New York, and Texas
leading the pack. To delve deeper, another bar plot illustrates the distribution of fake and real jobs
in the top 10 states. That plot indicates that Texas and California have a higher proportion of fake
jobs compared to other states. To investigate further, we compute the ratio of fake to real jobs by
state and city, providing additional insight into the distribution of fraudulent job postings.
Ratio of fake to real jobs based on city and state
Figure 7.5 (panels a, b, c) represents job counts based on (a) employment type, (b) required
education, and (c) required experience.
The graphs above reveal that most fraudulent jobs are full-time and typically entry-level positions
requiring a bachelor's degree or a high school education. To delve deeper into the text-related
fields, we combine various columns into one field called "text": title, location, company profile,
description, requirements, benefits, required experience, required education, industry, and
function. A histogram of character counts shows that the distributions are similar for real and
fake jobs, although real jobs occur with much higher frequency.
Free-Form Visualization
A confusion matrix can be used to evaluate the quality of the project. The project aims to identify
real and fake jobs.
The confusion matrix above shows the labeled categories, the number of data points in each
category, and the percentage of data represented in each category. In the test set, there are 3265
real jobs and 231 fake jobs. From the matrix, we observe that the model identifies real jobs
correctly 99.01% of the time, but identifies fraudulent jobs only 73.5% of the time, misclassifying
roughly 2% of postings overall. This behavior is common in machine learning, as models tend to
favor the dominant class.
HOME PAGE
{% extends 'base.html' %}
{% load static %}
{% block title %}
Home
{% endblock %}
{% block content %}
<div id="loginModal" tabindex="-1" role="dialog" aria-labelledby="exampleModalLabel"
aria-hidden="true"
class="modal fade">
<div role="document" class="modal-dialog">
<div class="modal-content">
<div class="modal-header">
<h4 id="exampleModalLabel" class="modal-title">Customer Login</h4>
<button type="button" data-dismiss="modal" aria-label="Close" class="close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
<form action="" method="post">
<div class="form-group">
<p class="text-center text-muted">Not registered yet?</p>
<p class="text-center text-muted"><a href="client-register.html"><strong>Register
now</strong></a>!
It is easy and done in 1 minute and gives you access to special discounts and
much more!</p>
</div>
</div>
</div>
</div>
<!-- *** LOGIN MODAL END ***-->
<section class="job-form-section job-form-section--image">
<div class="container">
<div class="row">
<div class="col-lg-8 mx-auto">
<div class="job-form-box" style="background-color:rgb(3, 3, 23); opacity: 0.8;">
<h2 class="heading" style="color: white">Find Fake Jobs
</h2>
<form id="job-main-form" method="get" action="{% url 'jobs:searh' %}"
class="job-main-form">
<div class="controls" style="color: white;">
<div class="row align-items-center">
<div class="col-md-5">
<div class="form-group">
<label for="profession">Position</label>
<input type="text" id="profession" name="position"
placeholder="Position you are looking for" class="form-control"
style="color: black">
</div>
</div>
<div class="col-md-5">
<div class="form-group">
<label for="location">Location</label>
<input type="text" id="location" name="location"
placeholder="Any particular location?" class="form-control">
</div>
</div>
<div class="col-md-2">
<button type="submit"
class="btn btn-outline-white-primary job-main-form__button">
<i class="fa fa-search"></i>
</button>
</div>
</div>
</div>
</form>
</div>
</div>
</div>
</div>
</section>
<section class="bg-success">
<div class="container">
<h3 class="heading" style="color:rgb(222, 217, 217);">Featured jobs</h3>
<div class="row featured align-items-stretch">
{% for job in jobs %}
<div class="col-lg-4 mb-5 mb-lg-0">
<div class="box-image-text bg-visible full-height">
<div class="top">
<a href="#">
<div class="image" style="width: 100%; overflow: hidden;">
<img src="{% static 'img/Project.png' %}" alt="" class="img-fluid"
style="width: 100%; height: auto; display: block; margin: 0; padding: 0;">
</div>
<div class="bg"></div>
<div class="logo">
<img src="{% static 'img/logomy.jpg' %}" alt="" style="max-width:
150px;">
</div>
</a>
</div>
<div class="content">
<h5><a href="{% url 'jobs:jobs-detail' job.id %}">{{ job.title }}</a></h5>
<p class="featured__details"><i class="fa fa-map-marker
job__location"></i>
{{ job.location }}
{% if job.type == '1' %}
<span class="badge featured-badge badge-warning">Full time</span>
{% elif job.type == '2' %}
<span class="badge featured-badge badge-primary">Part time</span>
{% else %}
<span style="color: #ffffff;" class="badge featured-badge badge-success">Internship</span>
{% endif %}
</p>
<p>{{ job.description }}</p>
</div>
</div>
</div>
{% endfor %}
</div>
</div>
</section>
<section>
<div class="container">
<h4 class="heading">This month's fake jobs</h4>
{% for trending in trendings %}
<div class="job-listing job-listing--last">
<div class="row">
<div class="col-md-12 col-lg-6">
<div class="row">
<div class="col-2">
<img src="{% static 'img/logomy.jpg' %}"
alt="ShareBoardd " class="img-fluid">
</div>
<div class="col-10">
<h4 class="job__title">
<a href="{% url 'jobs:jobs-detail' trending.id %}">{{ trending.title
}}</a>
</h4>
<p class="job__company">
{{ trending.company_name }}
</p>
</div>
</div>
</div>
<div class="col-10 col-md-3 col-lg-2 ml-auto"><i class="fa fa-map-marker
job__location"></i>
{{ trending.location }}
</div>
<div class="col-10 col-md-3 col-lg-3 ml-auto">
<p>Posted {{ trending.created_at|timesince }}</p>
</div>
<div class="col-sm-12 col-md-2 col-lg-1">
<div class="job__star">
<a href="#" data-toggle="tooltip" data-placement="top"
title="Save to favourites" class="job__star__link">
<i class="fa fa-star"></i>
</a>
</div>
</div>
</div>
</div>
{% endfor %}
</div>
</section>
<section style="background-image: url({% static 'img/bg.jpg' %})"
class="section-divider">
<div class="overlay"></div>
<div class="container">
<div class="row">
<div class="col-lg-12 text-center">
<p>Start fake job detection now! </p>
<p><a href="{% url 'jobs:jobs' %}" class="btn btn-outline-light">See Fake Job list
</a></p>
</div>
</div>
</div>
</section>
{% endblock %}
JOBS PAGE
{% extends 'base.html' %}
{% load static %}
{% block title %}
Home
{% endblock %}
{% block content %}
<div id="loginModal" tabindex="-1" role="dialog" aria-labelledby="exampleModalLabel"
aria-hidden="true"
class="modal fade">
<div role="document" class="modal-dialog">
<div class="modal-content">
<div class="modal-header">
<h4 id="exampleModalLabel" class="modal-title">Customer Login</h4>
<button type="button" data-dismiss="modal" aria-label="Close" class="close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
<form action="" method="post">
<div class="form-group">
<input id="email_modal" type="text" placeholder="email" class="form-control">
</div>
<div class="form-group">
<input id="password_modal" type="password" placeholder="password"
class="form-control">
</div>
<p class="text-center">
<button type="button" class="btn btn-outline-white-primary"><i class="fa fa-sign-in"></i>
Log in
</button>
</p>
</form>
<p class="text-center text-muted">Not registered yet?</p>
<p class="text-center text-muted"><a href="client-register.html"><strong>Register
now</strong></a>!
It is easy and done in 1 minute and gives you access to special discounts and
much more!</p>
</div>
</div>
</div>
</div>
<!-- *** LOGIN MODAL END ***-->
<section class="job-form-section job-form-section--image">
<div class="container">
<div class="row">
<div class="col-lg-8 mx-auto">
<div class="job-form-box" style="background-color:rgb(3, 3, 23); opacity: 0.8;">
<h2 class="heading" style="color: white">Find Fake Jobs
</h2>
<form id="job-main-form" method="get" action="{% url 'jobs:searh' %}"
class="job-main-form">
<div class="controls" style="color: white;">
<div class="row align-items-center">
<div class="col-md-5">
<div class="form-group">
<label for="profession">Position</label>
<input type="text" id="profession" name="position"
placeholder="Position you are looking for" class="form-control"
style="color: black">
</div>
</div>
<div class="col-md-5">
<div class="form-group">
<label for="location">Location</label>
<input type="text" id="location" name="location"
placeholder="Any particular location?" class="form-control">
</div>
</div>
<div class="col-md-2">
<button type="submit"
class="btn btn-outline-white-primary job-main-form__button">
<i class="fa fa-search"></i>
</button>
</div>
</div>
</div>
</form>
</div>
</div>
</div>
</div>
</section>
<section class="bg-success">
<div class="container">
<h3 class="heading" style="color:rgb(222, 217, 217);">Featured jobs</h3>
<div class="row featured align-items-stretch">
{% for job in jobs %}
<div class="col-lg-4 mb-5 mb-lg-0">
<div class="box-image-text bg-visible full-height">
<div class="top">
<a href="#">
<div class="image" style="width: 100%; overflow: hidden;">
<img src="{% static 'img/Project.png' %}" alt="" class="img-fluid"
style="width: 100%; height: auto; display: block; margin: 0; padding: 0;">
</div>
<div class="bg"></div>
<div class="logo">
<img src="{% static 'img/logomy.jpg' %}" alt="" style="max-width:
150px;">
</div>
</a>
</div>
<div class="content">
<h5><a href="{% url 'jobs:jobs-detail' job.id %}">{{ job.title }}</a></h5>
<p class="featured__details"><i class="fa fa-map-marker
job__location"></i>
{{ job.location }}
{% if job.type == '1' %}
<span class="badge featured-badge badge-warning">Full time</span>
{% elif job.type == '2' %}
<span class="badge featured-badge badge-primary">Part time</span>
{% else %}
<span style="color: #ffffff;" class="badge featured-badge badge-success">Internship</span>
{% endif %}
</p>
<p>{{ job.description }}</p>
</div>
</div>
</div>
{% endfor %}
</div>
</div>
</section>
<section>
<div class="container">
<h4 class="heading">This month's fake jobs</h4>
{% for trending in trendings %}
<div class="job-listing job-listing--last">
<div class="row">
<div class="col-md-12 col-lg-6">
<div class="row">
<div class="col-2">
<img src="{% static 'img/logomy.jpg' %}"
alt="ShareBoard" class="img-fluid">
</div>
<div class="col-10">
<h4 class="job__title">
<a href="{% url 'jobs:jobs-detail' trending.id %}">{{ trending.title }}</a>
</h4>
<p class="job__company">
{{ trending.company_name }}
</p>
</div>
</div>
</div>
<div class="col-10 col-md-3 col-lg-2 ml-auto"><i class="fa fa-map-marker
job__location"></i>
{{ trending.location }}
</div>
<div class="col-10 col-md-3 col-lg-3 ml-auto">
<p>Posted {{ trending.created_at|timesince }} ago</p>
</div>
<div class="col-sm-12 col-md-2 col-lg-1">
<div class="job__star">
<a href="#" data-toggle="tooltip" data-placement="top"
title="Save to favourites" class="job__star__link">
<i class="fa fa-star"></i>
</a>
</div>
</div>
</div>
</div>
{% endfor %}
</div>
</section>
<section style="background-image: url('{% static 'img/bg.jpg' %}')"
class="section-divider">
<div class="overlay"></div>
<div class="container">
<div class="row">
<div class="col-lg-12 text-center">
<p>Start fake job detection now! </p>
<p><a href="{% url 'jobs:jobs' %}" class="btn btn-outline-light">See Fake Job list
</a></p>
</div>
</div>
</div>
</section>
{% endblock %}
REGISTER PAGE
{% extends 'base.html' %}
{% block title %}
Employee Register
{% endblock %}
{% block content %}
<div class="col-lg-6 offset-3">
<div class="box">
<h3 class="heading">New account</h3>
<p class="lead">Not registered yet?</p>
<p class="text-muted">If you have any questions, please feel free to <a href="#">contact
us</a>,
our customer service center is working for you 24/7.</p>
{% if form.errors %}
{% for field in form %}
{% for error in field.errors %}
<div class="alert alert-danger alert-dismissable">
<a href="#" class="close" data-dismiss="alert" aria-label="close">×</a>
<strong>{{ error|escape }}</strong>
</div>
{% endfor %}
{% endfor %}
{% for error in form.non_field_errors %}
<div class="alert alert-danger alert-dismissable">
<a href="#" class="close" data-dismiss="alert" aria-label="close">×</a>
<strong>{{ error|escape }}</strong>
</div>
{% endfor %}
{% endif %}
<form action="" method="post">
{% csrf_token %}
{% for field in form %}
{% if field.name == 'gender' %}
<div class="form-group">
<label for="gender">Gender</label>
<br>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" name="gender" id="male"
value="male">
<label class="form-check-label" for="male">Male</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" name="gender" id="female"
value="female">
<label class="form-check-label" for="female">Female</label>
</div>
</div>
{% else %}
<div class="form-group">
<label for="id_{{ field.name }}">{{ field.label }}</label>
<input type="{{ field.field.widget.input_type }}"
class="form-control"
name="{{ field.name }}"
id="id_{{ field.name }}"
placeholder="{{ field.field.widget.attrs.placeholder }}">
</div>
{% endif %}
{% endfor %}
<div class="text-center">
<button type="submit" class="btn btn-outline-white-primary"><i class="fa fa-user-md"></i> Register
</button>
</div>
</form>
</div>
</div>
{% endblock %}
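The register template above renders whatever fields the backing Django form exposes, special-casing the gender field as radio buttons. A minimal sketch of the kind of form it assumes is shown below; the class and field names are illustrative assumptions, not taken from the project's forms.py.

from django import forms

# Hypothetical registration form matching what the template iterates over.
class EmployeeRegistrationForm(forms.Form):
    GENDER_CHOICES = [('male', 'Male'), ('female', 'Female')]

    name = forms.CharField(
        widget=forms.TextInput(attrs={'placeholder': 'Your full name'}))
    email = forms.EmailField(
        widget=forms.EmailInput(attrs={'placeholder': 'email'}))
    gender = forms.ChoiceField(choices=GENDER_CHOICES,
                               widget=forms.RadioSelect)
    password = forms.CharField(
        widget=forms.PasswordInput(attrs={'placeholder': 'password'}))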
LOGIN PAGE
{% extends 'base.html' %}
{% block title %}
{{ title }}
{% endblock %}
{% block content %}
<div class="col-lg-6 offset-3">
<div class="box">
<h3 class="heading">Login</h3>
<p class="lead">Already have an account?</p>
{% if form.errors %}
{% for field in form %}
{% for error in field.errors %}
<div class="alert alert-danger alert-dismissable">
<a href="#" class="close" data-dismiss="alert" aria-label="close">×</a>
<strong>{{ error|escape }}</strong>
</div>
{% endfor %}
{% endfor %}
{% for error in form.non_field_errors %}
<div class="alert alert-danger alert-dismissable">
<a href="#" class="close" data-dismiss="alert" aria-label="close">×</a>
<strong>{{ error|escape }}</strong>
</div>
{% endfor %}
{% endif %}
<form action="" method="post">
{% csrf_token %}
{% for field in form %}
<div class="form-group">
<label for="id_{{ field.name }}">{{ field.label }}</label>
<input type="{{ field.field.widget.input_type }}"
class="form-control"
name="{{ field.name }}"
id="id_{{ field.name }}"
placeholder="{{ field.field.widget.attrs.placeholder }}">
</div>
{% endfor %}
<div class="text-center">
<button type="submit" class="btn btn-outline-white-primary"><i class="fa fa-sign-in"></i> Log in
</button>
</div>
</form>
</div>
</div>
{% endblock %}
DASHBOARD PAGE
{% extends 'base.html' %}
{% block content %}
<section class="bg-light-gray">
<div class="container">
<div class="row">
<div class="col-lg-8 mx-auto">
<h1 class="heading">Dashboard</h1>
<p class="lead text-center">All created jobs</p>
</div>
</div>
</div>
</section>
<section>
<div class="container">
<div class="row">
<div class="col-lg-12 text-right mb-5"><a href="{% url 'jobs:employer-jobs-create' %}"
class="btn btn-success">
<i class="fa fa-plus"></i>Add new position</a></div>
<div class="col-lg-12">
<div class="table-responsive">
<table class="table table-striped table-hover table-client-dashboard">
<thead>
<tr>
<th>Job title</th>
<th>Position filled</th>
<th>Date posted</th>
<th>Date expiring</th>
<th>Applicants</th>
<th>Actions</th>
</tr>
</thead>
<tbody>
{% for job in jobs %}
<tr>
<th><a href="{% url 'jobs:jobs-detail' job.id %}">{{ job.title }}</a>
</th>
<td>
{% if job.filled %}
<span class="badge badge-success">Filled</span>
{% else %}
<span class="badge badge-secondary">Not Filled</span>
{% endif %}
</td>
<td>{{ job.created_at }}</td>
<td>{{ job.last_date }}</td>
<td>
<a href="{% url 'jobs:employer-dashboard-applicants' job.id %}"
class="btn btn-success">
<i class="fa fa-users"></i>{{ job.applicants.count }}
<span class="hidden-xs hidden-sm">Applicants</span>
</a>
</td>
<td>
<a href="#" class="btn btn-success">
<i class="fa fa-edit"></i>Edit
</a>
<br>
{% if job.filled %}
<a href="{% url 'jobs:job-mark-filled' job.id %}"
class="btn btn-outline-white-secondary">
<i class="fa fa-check-circle-o"></i>
<span class="hidden-xs hidden-sm">Filled</span>
</a><br>
{% else %}
<a href="{% url 'jobs:job-mark-filled' job.id %}"
class="btn btn-outline-white-secondary">
<i class="fa fa-check-circle-o"></i>
<span class="hidden-xs hidden-sm">Mark as </span>filled
</a><br>
{% endif %}
<a href="#" class="btn btn-success">
<i class="fa fa-times-circle-o"></i>Delete
</a>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
<div class="pages">
<nav aria-label="Page navigation example" class="d-flex justify-content-center
mt-4 mb-4">
<ul class="pagination">
<li class="page-item"><a href="#" aria-label="Previous" class="page-link">
<span aria-hidden="true">«</span><span class="sr-only">Previous</span></a></li>
<li class="page-item active"><a href="#" class="page-link">1</a></li>
<li class="page-item"><a href="#" class="page-link">2</a></li>
<li class="page-item"><a href="#" class="page-link">3</a></li>
<li class="page-item"><a href="#" class="page-link">4</a></li>
<li class="page-item"><a href="#" aria-label="Next" class="page-link">
<span aria-hidden="true">»</span><span class="sr-only">Next</span></a></li>
</ul>
</nav>
</div>
</div>
</div>
</div>
</section>
{% endblock %}
DATABASE CONNECTIVITY
import sqlite3

# Path to the SQLite database file
db_file = "C:/Users/Adones/3D Objects/Django/online job management system python django/db.sqlite3"

connection = None  # initialized so the finally block is safe if connect() raises
try:
    # Connect to the SQLite database
    connection = sqlite3.connect(db_file)
    cursor = connection.cursor()
    # Print a message upon successful connection
    print("Connected to the database.")
    # Perform database operations here
except sqlite3.Error as error:
    # Print an error message if the connection fails
    print("Failed to connect to the database:", error)
finally:
    # Close the database connection
    if connection:
        connection.close()
        print("Database connection closed.")
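Once connected, postings can be read back with parameterized queries. The short sketch below is illustrative only: the table and column names (jobs_job, title, location) assume Django's default app_model naming and have not been checked against this project's schema.

import sqlite3

connection = sqlite3.connect("db.sqlite3")  # path shortened for illustration
cursor = connection.cursor()
# Parameterized query: the placeholder avoids SQL injection.
cursor.execute(
    "SELECT title, location FROM jobs_job WHERE location = ? LIMIT 5",
    ("Bangalore",),
)
for title, location in cursor.fetchall():
    print(title, "-", location)
connection.close()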
CHAPTER 9
RESULTS AND SCREENSHOTS
Chapter 10
CONCLUSION
In this project on fake job detection, we embarked on a journey to tackle the prevalent issue of
fraudulent job postings online. Our approach used machine learning algorithms, namely the SGD
Classifier and Naïve Bayes, to distinguish between genuine and fake job listings.
Starting with data collection and preprocessing, we meticulously cleaned and prepared the dataset
for analysis. Exploratory data analysis provided valuable insights into the characteristics of both
real and fake job postings, guiding our subsequent modeling efforts.
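A minimal sketch of that cleaning step is given below; the file name and column names follow the widely used EMSCAD-style fake-job dataset and are assumptions here, not a verbatim copy of the project's code.

import pandas as pd

# Load the postings; file and column names are illustrative assumptions.
df = pd.read_csv("fake_job_postings.csv")

# Drop rows with no description and blank out other missing text fields.
df = df.dropna(subset=["description"])
df[["title", "company_profile", "requirements"]] = (
    df[["title", "company_profile", "requirements"]].fillna(""))

# Normalize the description text: lowercase, strip non-letters, squeeze spaces.
df["description"] = (df["description"].str.lower()
                     .str.replace(r"[^a-z\s]", " ", regex=True)
                     .str.replace(r"\s+", " ", regex=True)
                     .str.strip())

print(df["fraudulent"].value_counts())  # class balance: real (0) vs fake (1)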
Through the implementation of machine learning models, such as the SGD Classifier, we achieved
high accuracy in identifying real jobs, ensuring users can trust the authenticity of such listings. On
the other hand, the Naïve Bayes algorithm proved effective in detecting fraudulent postings,
offering an additional layer of protection against scams.
In conclusion, this project represents a significant step forward in enhancing trust and security in
online job platforms. By leveraging machine learning techniques, we've made strides towards
creating a safer and more transparent job marketplace for users worldwide. However, our journey
doesn't end here; continued research and development are essential to stay ahead of emerging
threats and to ensure the ongoing success of fake job detection systems. Together, we can build a
better and more trustworthy online job environment for everyone.
The SGD Classifier is excellent at identifying real jobs, with an accuracy of 99.01%, but it is
weaker at catching fake jobs, correctly flagging only about 73.5% of them. Naïve Bayes, by
contrast, is less reliable at recognizing real jobs but detects fraudulent postings with roughly
92% accuracy.
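A minimal sketch of how such a side-by-side comparison can be run with scikit-learn is shown below. The dataset file and column names are assumptions, and the figures quoted above come from the project's own experiments, not from this sketch.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

df = pd.read_csv("fake_job_postings.csv").dropna(subset=["description"])
X_train, X_test, y_train, y_test = train_test_split(
    df["description"], df["fraudulent"],
    test_size=0.2, random_state=42, stratify=df["fraudulent"])

# TF-IDF features shared by both models.
vectorizer = TfidfVectorizer(stop_words="english", max_features=20000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

for name, model in [("SGD Classifier", SGDClassifier(random_state=42)),
                    ("Naive Bayes", MultinomialNB())]:
    model.fit(X_train_vec, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test_vec)))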
FUTURE ENHANCEMENT
For future improvements in our fake job detection system, we can:
1. Real-Time Monitoring: Screen job postings as they arrive, spotting potential scams
right away.
2. Automated Alerts: Warn users when a posting looks suspicious, helping them avoid
scams (a minimal sketch of this scoring-and-alerting flow follows this list).
3. User Reporting: Let users report suspicious postings, helping us catch scams faster
through crowd-sourced detection.
4. Behavioral Analysis: Look for suspicious patterns, like one user posting lots of jobs
quickly, and flag them for review.
5. Geospatial Analysis: Check for oddities in job locations, like lots of postings from risky
areas or unusual clusters of postings.
6. Natural Language Understanding: Improve our system's ability to spot shady language
or inconsistencies in job descriptions and company profiles.
7. Model Deployment: Use cloud-based services to quickly deploy and scale our machine
learning models for processing job postings in real-time.
8. Continuous Learning: Make sure our system keeps learning and getting better over time
by listening to feedback from flagged postings and user interactions.
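A minimal sketch of the real-time scoring and alerting flow from items 1 and 2 above: it assumes a fitted vectorizer and a model that exposes predict_proba (for example MultinomialNB, or SGDClassifier trained with loss="log_loss"); the threshold and the alert action are illustrative assumptions.

ALERT_THRESHOLD = 0.8  # assumed cut-off; would be tuned on validation data

def screen_posting(text, vectorizer, model):
    """Score one incoming posting; return (is_suspicious, probability_fake)."""
    features = vectorizer.transform([text])
    prob_fake = model.predict_proba(features)[0][1]
    return prob_fake >= ALERT_THRESHOLD, prob_fake

def on_new_posting(text, vectorizer, model):
    suspicious, prob = screen_posting(text, vectorizer, model)
    if suspicious:
        # In production this might notify moderators or flag the row in the DB.
        print(f"ALERT: posting flagged as likely fake (p={prob:.2f})")
    return suspicious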
PROOF OF PUBLICATION