ISE 527 IEEE Access LaTeX Template
ABSTRACT Turnover is regarded as a major loss for organizations in terms of cost, time, and effort. Especially when the outflow of talented employees exceeds the organization's replacement capacity, turnover becomes dysfunctional for the organization. Thus, knowledge of the factors that affect employee turnover decisions is invaluable for managers in devising the best workforce planning strategy. This study aims to reveal the most important factors that retain employees in organizations based on feature selection and machine learning; deep learning is then used to validate the results. The results show that the best features obtained from the feature selection phase yield better predictions for the machine learning methods; in particular, when dealing with a relatively small dataset, the XGBT algorithm gives a satisfactory level of prediction.
INDEX TERMS Employee retention, employee turnover prediction, feature selection, machine learning
for decision making
(MCDM) or multiple criteria decision analysis (MCDA) for the analysis of the importance of different attributes in decision making for employee retention using machine learning and deep learning.

Voluntary turnover, marked by the willingness of a knowledgeable and talented employee to quit the organization while he/she is still needed, is a major concern for HR. It is even worse if the employee joins a competitor of the organization, since the ex-employee knows many aspects of his/her former organization and can contribute more in the new place. Retaining talented employees therefore keeps the organization at its best performance, and this can be achieved by recognizing the causes of turnover and implementing an appropriate management approach, which is widely believed to reduce the turnover rate. Accordingly, this study focuses on automated MCDA and machine learning techniques to identify the factors affecting human resource turnover, to rank the different attributes/criteria using machine learning-based feature selection techniques that help managers make wise and strategic decisions, and to validate the results of these methods by applying them to different machine and deep learning techniques.

As such, a number of machine learning techniques for the automated analysis of human resource factor importance, together with cutting-edge machine learning and deep learning techniques, are leveraged to validate the importance of turnover factors in the employee retention decision process.

Random Forest (RF) and its decision-tree variants, Extra Trees (ET) and Gradient Boosting Tree (GBT), are employed for feature importance, while Convolutional Neural Network (CNN), FastONN, XGBT, and LGBM are used for attrition prediction. The experimental results show that the models presented in this paper achieve performance comparable to that reported in the literature. Additionally, our models achieve the most promising results even when using as few as 4 or 5 features, which reduces the complexity of the decision process in HR retention. Furthermore, to the best of our knowledge, there is no prior work on FastONN for employee retention prediction. This paves the way for further research, even though FastONN did not achieve a much better performance score in our experiments.

The remainder of the paper is organized as follows. Section II discusses the related recent literature, and Section III presents the methodology. Section IV describes the datasets used in this work and the preprocessing techniques needed to prepare the data for analysis. Section V covers the experimental setup and the results obtained on every dataset with the different algorithms. In Section VI, we analyze these results and describe the factors that mainly affect the retention decision of employees. Finally, in the last two sections, we report on the problems we faced, suggest recommendations for future improvement, and conclude the work, in Sections VII and VIII, respectively.

II. LITERATURE REVIEW
Human resource management has changed significantly over time, and topics such as artificial intelligence and data mining have gained a lot of attention. Investigations of the elements that drive employee turnover and affect the visible and hidden costs of the organization are scrutinized in [3]; the hiring approach was reported to be improved after evaluating the company's historical data. Another insight from HR analysis comes from [4], which identifies strategies to influence employee performance and decision-making processes in various departments. Since human resource management and related topics form a complex problem containing many constraints of different types, managers usually use multi-criteria decision-making (MCDM) techniques in personnel selection, candidate evaluation, and employee classification. Several MCDM methods have been applied in human resource management, such as the Best-Worst Method (BWM) combined with the technique for ordering priority based on similarity to the ideal solution (TOPSIS) [5], the combination of BWM and the Decision-Making Trial and Evaluation Laboratory (DEMATEL) [6], and the fuzzy variant of the Analytic Hierarchy Process (FAHP) with TOPSIS [7]. [8] enhanced hiring and placement decisions by providing a comprehensive analytics framework that can be used as a decision support tool for HR recruiters in practical contexts. They showed that it is possible to forecast a candidate's performance in a particular job at the pre-hire stage and to use the predictions to create a global optimization model. They used a machine learning (ML) model obtained by applying the Variable-Order Bayesian Network (VOBN) model to the recruitment data. Their findings demonstrated that, despite the inherent trade-off between the two, the proposed framework can produce a balanced recruitment strategy while increasing both diversity and recruitment success rates compared to real hiring decisions. A summary of previous conventional works is provided in Table 1.

The application of machine learning in the field of human resource management has a great impact, helping managers make the right decisions based on the huge volume of data available in the company. This type of data contains many variables and thousands of records, which makes it suitable for analysis by machine learning methods. Several aspects of human resource management benefit from machine learning applications, such as the recruitment process [9], where the application of a contractive autoencoder in a recommendation system helps candidates find a better match between their qualifications and the available job openings. The author overcame the cold-start problem of the recommendation system with the proposed prototype, but the prototype still needs better scalability to analyze more realistic data. Another study, on matching job requirements with applicant resumes, is [35]. The difficulty that companies face today is finding the right candidates with sufficient skills who are willing to work for the company, specifically in manufacturing companies, which need a huge number
approximate, their learning performance differs considerably. Their homogeneous configuration, which is mainly based on the linear neuron model, is the main cause of this. As a result, they may fail entirely when the solution space is highly non-linear and complicated, even though they learn quite well when the solution space is monotonous, straightforward, and linearly separable. It is not surprising that, in many difficult problems, only deep CNNs with massive complexity and depth can achieve the required diversity and learning performance. This is also true for conventional Convolutional Neural Networks (CNNs), which share the same linear neuron model with two additional constraints (local connections and weight sharing). Operational Neural Networks (ONNs), which can be heterogeneous and encapsulate neurons with any set of operators to boost diversity and learn highly complex and multi-modal functions or spaces with minimal network complexity and training data, were proposed by [19] to address this limitation and achieve a more generalized model than convolutional neurons. ONNs, a newly developed type of neural network, offer better performance, efficiency, and scalability compared to Convolutional Neural Networks (CNNs). By including neurons with any combination of operators, ONNs can be heterogeneous, enabling them to learn extremely complex and multimodal functions or spaces with little network complexity and training data.

Operational Neural Networks (ONNs) have only recently been put forth as a potential remedy for the well-known shortcomings and restrictions of conventional CNNs, such as network homogeneity with a single linear neuron model. ONNs are heterogeneous networks with a generalized neuron model. However, because the same set of operators is applied to all neurons in each layer, the operator search approach in ONNs restricts network heterogeneity in addition to being computationally heavy. Furthermore, there is a potential for performance to decrease, since the library of operator sets being used directly influences how effectively ONNs function, particularly when the optimal operator set required for a certain operation is missing from the library [20]. As a result, the newest variation, Self-organized ONNs (Self-ONNs), which adds more non-linearity to the convolutional neuron model, has been shown to perform better than CNNs in a variety of regression tests [21]. The convolution process in CNNs is generalized by an operational neuron following this formula [19]:

\[ x_{ik}^{l} = P_{k}^{l}\Big[\psi\big(w_{ik}^{l}(r,t),\; y_{i}^{l-1}(m+r,\, n+t)\big)\Big]_{(r,t)=(0,0)}^{(K-1,\,K-1)} \]

where ψ(·) and P(·) are called the nodal and pool operators, respectively. Every neuron in a heterogeneous ONN has its own set of ψ and P operators. To identify the best operators (see [19]), Self-ONNs introduced a composite nodal function that is created and updated frequently during backpropagation using a Taylor series-based function approximation [20], [22]. An indefinitely differentiable function
FIGURE 1: High level methodological pipeline. Extra Tree (ET), Extreme Gradient Boosting Tree (XGBT), Light Gradient
Boosting Machine (LGBM), Fast Operational Neural Network (FastONN), Convolutional Neural Network (CNN), Recursive
feature elimination (RFE).
f(x) centered on a point a has the following Taylor series expansion:

\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x-a)^{n} \]

From the above formula, the Q-th order truncated Taylor polynomial approximation has the following closed form:

\[ f(x)^{Q,a} = \sum_{n=0}^{Q} \frac{f^{(n)}(a)}{n!}\,(x-a)^{n} \]

With the aforementioned formulation, any function f(x) may be approximated in the vicinity of the point a. The operational neuron model's main building block extends the Generalized Operational Perceptron (GOP) to convolutional principles. While maintaining the benefits of sparse connectivity and weight sharing, an operational neuron provides the ability to apply nonlinear transformations within local receptive fields without the burden of additional trainable parameters. According to [19], segmentation, denoising, and image-to-image translation are just a few of the challenging learning tasks in which the rich non-linear operators (operator sets) powering ONNs have been shown to outperform CNNs. Given any operator set θ = (φ, ψ, f), an operational neuron with θ alters the operation as shown in Figure 3 and is expressed as:

\[ x(i,j) = \Phi_{u=0}^{m-1}\,\Phi_{v=0}^{n-1}\big(\psi(w(u,v),\, y(i-u,\, j-v))\big) \]

As a new kind of neural network with several advantages over CNNs [19], especially in restoring an image from a noisy one, we tried to implement the 1-dimensional variant of FastONN on the human resource retention problem; this attempt required a detailed process to construct the model. In our architecture, we stacked 2 FastONN layers followed by 2 Linear layers, Dropout of 0.2, and a sigmoid function. A SelfONN layer expects its inputs to be within the range [-1, 1], so a Tanh or Sigmoid activation layer must come before it to achieve that range, especially when working with high values of the q parameter, because the input is raised to the q-th power and, if not bounded within [-1, 1], can explode. In practice, for low q values and shallow networks, a ReLU activation layer can also be used.
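As a concrete illustration, the following is a minimal PyTorch sketch of this construction: a Self-ONN-style 1D layer that applies a shared convolution to the stacked powers x, x^2, ..., x^Q of its pre-squashed input, used in a small attrition network of the shape described above. The class names, channel widths, and the ReLU between the linear layers are our illustrative assumptions, not the official FastONN package API; only the two FastONN layers, the two Linear layers, the Dropout of 0.2, the sigmoid output, the Tanh squashing, and the role of q come from the text.

```python
import torch
import torch.nn as nn

class SelfONN1d(nn.Module):
    """Generalized 1D neuron: sum over q of conv(w_q, x^q), q = 1..Q."""
    def __init__(self, in_ch, out_ch, kernel_size, q=3):
        super().__init__()
        self.q = q
        # One Conv1d over the channel-wise concatenation [x, x^2, ..., x^Q]
        self.conv = nn.Conv1d(in_ch * q, out_ch, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):
        # Inputs are assumed pre-squashed to [-1, 1] (Tanh/Sigmoid upstream);
        # otherwise the higher powers can explode for large q.
        powers = torch.cat([x ** (i + 1) for i in range(self.q)], dim=1)
        return self.conv(powers)

class AttritionFastONN(nn.Module):
    def __init__(self, n_features, q=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Tanh(),                       # bound inputs to [-1, 1]
            SelfONN1d(1, 16, 3, q=q),        # first FastONN layer
            nn.Tanh(),
            SelfONN1d(16, 32, 3, q=q),       # second FastONN layer
            nn.Flatten(),
            nn.Linear(32 * n_features, 64),  # first of the two linear layers
            nn.ReLU(),
            nn.Dropout(0.2),                 # Dropout of 0.2, as in the text
            nn.Linear(64, 1),
            nn.Sigmoid(),                    # attrition probability
        )

    def forward(self, x):                    # x: (batch, 1, n_features)
        return self.body(x)
```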
C. EXTREME GRADIENT BOOSTING TREE (XGBT)
XGBT was initially put forward in [24] and is based on the gradient boosting approach. Due to its quicker training, faster convergence, and performance improvement, this ensemble has grown in favor in many fields of machine learning research. In addition to its performance improvement, its use of L1 and L2 regularization prevents XGBT from overfitting. We utilize the default parameters, which include a total of 100 estimators, a maximum tree depth of 3, a minimum sample requirement of 2 for splitting, and a minimum sample requirement of 1 for a leaf.
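For reference, a sketch of this configuration using the scikit-learn-style wrapper of the xgboost package is shown below. Note that the minimum-samples settings quoted above follow scikit-learn naming and have no exact xgboost counterpart; min_child_weight is only a rough analog, and X_train/y_train are placeholders for a preprocessed dataset.

```python
from xgboost import XGBClassifier

xgbt = XGBClassifier(
    n_estimators=100,     # 100 estimators, as stated above
    max_depth=3,          # maximum tree depth of 3
    min_child_weight=1,   # rough analog of the min-samples-per-leaf setting
    reg_alpha=0.0,        # L1 regularization term
    reg_lambda=1.0,       # L2 regularization term
)
xgbt.fit(X_train, y_train)
y_pred = xgbt.predict(X_test)
```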
D. LIGHT GRADIENT BOOSTING MACHINE (LGBM)
LGBM has several advantages over XGBT and was initially suggested by Microsoft [25].
FIGURE 2: The architecture of all models used in this paper. CNN1D stands for one dimensional convolutional layer, MaxP
represents MaxPooling, BN stands for Batch normalization, DO is an abbreviation of Dropout, and FONN1D represents one
dimensional FastONN.
The main difference between XGBT and LGBM lies in the method used to grow the trees. In XGBT, trees grow across the nodes level-by-level, while in LGBM they grow from one node, leaf-wise. Due to its sampling method, GOSS (Gradient-based One-Side Sampling), and its method for reducing the number of effective features, EFB (Exclusive Feature Bundling), LGBM executes more quickly and achieves higher accuracy [25]. Note that we utilize the same default values and parameter settings as for XGBT. Figures 2(c) and (d) show the high-level architectures of XGBT and LGBM, respectively.
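A companion sketch using the lightgbm package follows. Leaf-wise growth is LightGBM's default behavior, and GOSS can be requested explicitly; the exact option name varies across LightGBM versions, so that line is left commented.

```python
from lightgbm import LGBMClassifier

lgbm = LGBMClassifier(
    n_estimators=100,  # same settings as for XGBT
    max_depth=3,
    # boosting_type="goss",  # Gradient-based One-Side Sampling (version-dependent)
)
lgbm.fit(X_train, y_train)
```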
E. RECURSIVE FEATURE ELIMINATION (RFE)
High-quality features are crucial for machine learning models. In addition to lowering model performance, the inclusion of unnecessary or poorly correlated features wastes computational resources, training time, and money. Because of this, choosing highly significant features helps the machine learning model perform better. The training time of a model intended for real-time deployment is an important factor attracting serious attention from business owners and researchers alike in the context of employee retention, where a large volume of data is available owing to modern data acquisition technology. Therefore, prioritizing features or eliminating less beneficial ones is the best acceptable alternative. As such, we deploy the Recursive Feature Elimination (RFE) method in combination with three powerful ensemble estimators, Random Forest, Extra Trees, and Gradient Boosting, one at a time. RFE works as follows: with this feature selection method, we investigate
how features of retention prediction with heterogeneous nature reveal their importance, in a whole-inclusion fashion, using external classifiers. RFE's selection begins with all features and then, as the name suggests, recursively abandons the less important features, selecting a smaller set of important features in each iteration. During training, each feature is assigned an importance level, and the features with the smallest level are removed in each subsequent iteration. In this way, the process continues on the remaining feature set until a pre-defined number of features is obtained [26]. In this method, the number of features to select and the choice of estimator are the two most sensitive parameters. For the purpose of this work, we exploited three different ensembles, which are briefly discussed below; the detailed parameters are discussed in Section VI-B.
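Before detailing the three estimators, the following is a minimal sketch of one such RFE pass with scikit-learn; the choice of estimator, the target feature count, and the assumption that X is a pandas DataFrame are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

rfe = RFE(
    estimator=RandomForestClassifier(n_estimators=100),
    n_features_to_select=5,  # pre-defined size of the final feature set
    step=1,                  # drop one lowest-ranked feature per iteration
)
rfe.fit(X, y)
selected = X.columns[rfe.support_]  # surviving features
ranking = rfe.ranking_              # 1 = kept; larger = eliminated earlier
```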
1) Random Forest
The bagging approach [27] is used in combination with single-tree predictors in Random Forest. To create a decision tree for each set of samples, RF selects random records from the training set. A majority vote is taken for classification problems, or an average is calculated for regression problems, in order to determine the final output from the individual decision trees' findings.

2) Extra Trees
It is an ensemble approach in which various data subsamples are fitted using randomized decision trees. Accuracy is calculated, and over-learning is avoided, using an averaging technique. In Extra Trees, a split is formed based on a random subset of the features at each node, and the trees are constructed from random subsets of the data drawn without replacement. These two characteristics set it apart from the random forest, which builds its trees from samples drawn with replacement and splits nodes using the optimal (best) split rather than a random split [28].

3) Gradient Boosting Tree
GBT is another ensemble methodology that enhances a decision tree (DT) using a boosting mechanism, with the idea of fusing disparate weak models into a single strong consensus model. Instead of creating a new optimized tree, GBT requires each tree to reduce the error of the preceding tree. The final model combines the results from the previous stages to provide a more powerful learner [29], [28].

IV. DATASETS AND DATA PREPROCESSING
A high-quality dataset and efficient preprocessing are important preliminary steps for almost all machine or deep learning techniques to achieve the best results. In this study, we focus on the IBM Employee Dataset, which has 35 variables and 1470 records [30]. Although this dataset is popular, several different datasets are taken into account in this study; they are summarized in Table 3. As seen in the table, the chosen datasets consist of different numbers of samples and features, making the problem investigation more inclusive. First, we prepare each dataset by removing unnecessary columns and encoding non-numerical columns. Subsequently, if the samples are imbalanced, SMOTE upsampling is applied to make the number of samples for each class almost equal. For example, the IBM Employee dataset comprises 26 numerical variables and 9 categorical variables, which come from 3 different departments of the company: HR, Research & Development, and Sales. From exploratory analysis, an initial observation is that the dataset is imbalanced, with 237 employees with attrition and the other 1233 with no attrition. Therefore, data upsampling is applied; it should be noted
that before applying upsampling, data standardization is applied for data scaling.
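A minimal sketch of this preprocessing order, standardization followed by SMOTE from the imbalanced-learn package, is given below; X and y are placeholders for the encoded features and the attrition label.

```python
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

X_scaled = StandardScaler().fit_transform(X)  # scale before upsampling
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_scaled, y)
```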
Furthermore, the correlation among the features of a particular dataset was also studied for illustration purposes. For dataset [31], the correlation values between variables were calculated, and the variable most strongly correlated with attrition was found to be the employee's satisfaction level, with a value of -0.388. Another interesting fact comes from [32], where the decision to leave a company has a positive correlation of 0.3 with the composite score the employees received in their last evaluation. The correlation table is given in Fig. 4.
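Such correlations can be read directly off the pandas correlation table; a brief sketch follows, where df and the "left" target column are illustrative placeholders for the loaded dataset and its attrition label.

```python
import pandas as pd

corr = df.corr(numeric_only=True)  # full correlation table (cf. Fig. 4)
print(corr["left"].sort_values())  # e.g. satisfaction level vs. attrition
```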
V. EXPERIMENTS AND RESULTS
All experiments are implemented in Python and PyTorch using the Scikit-learn, Keras, and TensorFlow libraries. All experiments run on (i) a MacBook Pro 2015 with a Core i5 CPU and 8 GB RAM and (ii) a free Google Colaboratory account. For the CNN model, the binary cross-entropy loss function and the Adam optimizer are used, with a batch size of 64 and 2000 epochs, for all experiments. For FastONN, a batch size of 32 and 100 epochs are used for all experiments. The corresponding results are tabulated in Table 6 and discussed in the following sections, which provide the training and testing results of four different models (XGBT, LGBM, CNN, and FastONN) on five different datasets, the metric results, and the number of samples in the training and testing datasets. The learning curves of CNN for each dataset, with all and with best features, are shown in Fig. 5. From the figure it can be observed that the learning curves of Datasets #1 and #3 (with all features) show overfitting. This may be due to the inclusion of less correlated or less representative features in training.
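For the Keras-based CNN, the stated training configuration corresponds to a compile/fit call of the following form; model stands for the 1D CNN of Fig. 2 and is not reproduced here.

```python
model.compile(optimizer="adam",            # Adam optimizer
              loss="binary_crossentropy",  # binary cross-entropy loss
              metrics=["accuracy"])
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=64,         # batch size of 64
                    epochs=2000)           # 2000 epochs
```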
A. DATASET #1
The first dataset is a real dataset first shared by Edward Babushkin [33], named the Employee Turnover dataset, which is divided into 903 data points for training and 226 data points for testing.
On this dataset, the XGBT model performs better in all metrics. The learning curves of CNN indicate that selecting the best features gives a better result on the validation dataset. With the full 15 features, the model appears to overfit the training dataset, as seen in Figure 5, where the model performs well on training data but poorly on validation data. This is why XGBT performed better on this dataset: the regularization techniques used in it prevent overfitting and enhance generalization. It includes both L1 and L2 regularization terms in its objective function, which help control the complexity of the model and reduce the impact of individual trees on the final prediction.

B. DATASET #2
The HR Analytics dataset, the second dataset, is composed of 3750 data points for testing and 11249 data points for training.
LGBM gave the better precision on this dataset, while the other three metrics (accuracy, F1-score, and recall), as in Dataset #1, belong to XGBT. This was the case when using all features; however, when the 5 best features were used, XGBT had the better performance in precision as well.

C. DATASET #3
The third dataset, titled the US Firm Employee Turnover dataset (no name given for privacy reasons), is composed of 2385 and 7155 data points for testing and training, respectively.
Here, LGBM worked better when dealing with all 9 features, but CNN performed best on the best 4 features.

D. DATASET #4
The Employee Attrition dataset, the fourth dataset, has 25491 and 4507 data points for training and testing, respectively.
CNN performed best in all metrics except precision, where XGBT performed better. But when the 5 best features were analyzed, XGBT performed best in every metric.

E. DATASET #5
The fifth dataset, the IBM dataset, which is fictional data created by IBM data scientists, contains 1249 and 221 data points for training and testing, respectively.
CNN gave the best recall in the full-feature setting, while LGBM performed better in terms of accuracy, F1-score, and precision. For the 17 best features, XGBT and LGBM each led in two metrics: XGBT did better in accuracy and precision, while LGBM was better in F1-score and recall.

VI. DISCUSSION AND ANALYSIS
A. DECISION SUPPORT FOR ATTRITION CLASSIFICATION MODEL SELECTION
The results of the experiments on various datasets show that XGBT performs well when dealing with HR-related data. From the experiments and datasets used, it is evident that the dimensionality of the dataset has a great effect on the results obtained by the machine learning models. More complex models such as the Convolutional Neural Network and the Self-Operational Neural Network were inferior in terms of accuracy, F1-score, precision, and recall compared to XGBT and LGBM, since the datasets have only 9 to 34 input features (variables).
As an overall dataset-wise comparison, the highest accuracy of 98.87% is achieved by CNN using all features of Dataset #4. The reason is apparent: it has the largest number of samples of all the datasets. It is interesting to mention that even when the features are reduced from 9 to the best 5, the accuracy is 98.27%, which is quite impressive. In terms of F1-score as well, when all features are used, CNN has the highest score of 98.50% on Dataset #4. But when the best features are used, the highest F1-score is achieved by LGBM on the same dataset. This suggests that as few as 5 human resource factors are enough to decide employee attrition in a company. Therefore, the HR manager can take support from such a model in analyzing and planning HR resources for the organization's efficient resource management and growth.
We further analyze the classification and prediction behavior of attrition in the organizations with respect to all machine learning models. For this, F1-score is chosen as the evaluation metric, and the scores are shown in Fig. 7. From the figure, the same performance trend for both scenarios, using all features and using only the best features, can be seen for all machine learning models except FastONN. In more detail, the performance of FastONN is more stable when all features are used than when only the best features are used. For Datasets 1 and 5, the performance of CNN is inferior to LGBM and XGBT due to the lower number of training instances. For the other datasets, the performance of CNN, LGBM, and XGBT is almost the same. Therefore, decision makers and HR managers can take the support of any of the three techniques in their decision making regarding employee attrition classification. Additionally, the performance of CNN and FastONN is more sensitive to irrelevant features in the training process. The highest F1-score, of about 99%, is achieved by CNN on Dataset 4. As a final conclusion in this regard, when there are more than tens of thousands of input samples for training, CNN or FastONN is recommended for employee attrition classification; otherwise, LGBM or XGBT is the suitable choice.
B. DECISION SUPPORT FOR EMPLOYEE RETENTION FACTOR SELECTION
As discussed earlier, manually analyzing which factors deserve the most attention in order to retain highly compatible employees in the organization is laborious and costly. Therefore, we deployed three tree-based ensemble machine learning models for selecting the factors affecting employee retention. We used the RFE approach to experiment with three estimators for human resource feature ranking. Specifically, we applied an enhanced RFE approach with cross-validation (RFECV), in which RF is used as an estimator. It was trained using the stratified K-fold cross-validation method with 10 splits; 100 trees are used for estimation with sample replacement, and Gini impurity is used to determine information value. The second ensemble technique we used is Extra Trees (ET), in which the same parameters as for the Random Forest estimator are used. But in ET, bootstrap is set to false, which means that the subset of samples used to create one tree is not replaced while creating subsequent trees. The third technique we used is Gradient Boosting Tree (GBT). As in the previous methods, this estimator is trained with 100 trees using the stratified 10-fold cross-validation of RFE. The Friedman Mean Square Error (FMSE) is used to calculate the information value. The HR factor importances for each machine learning ensemble are tabulated in Fig. 4. Note that for the implementation of this RFECV we made use of the scikit-learn library.
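A sketch of this RFECV setup is shown below. The estimator settings mirror the text (RF: 100 trees, Gini, bootstrap; ET: bootstrap disabled; GBT: Friedman MSE criterion); the F1 scoring choice is our assumption, as the text does not state the scoring function.

```python
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

estimators = {
    "RF": RandomForestClassifier(n_estimators=100, criterion="gini",
                                 bootstrap=True),
    "ET": ExtraTreesClassifier(n_estimators=100, criterion="gini",
                               bootstrap=False),
    "GBT": GradientBoostingClassifier(n_estimators=100,
                                      criterion="friedman_mse"),
}
for name, est in estimators.items():
    selector = RFECV(est, cv=StratifiedKFold(n_splits=10), scoring="f1")
    selector.fit(X, y)
    print(name, "selects", selector.n_features_, "features")
```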
If the results are analyzed, it is observed that the number of
FIGURE 6: Classification results of all models on different datasets using all features
important factors in human resources is not the same for each of the three different techniques. This fact can be exploited by HR managers to adjust their decision boundaries by excluding or including the most important HR factors. For Dataset #1, the numbers of best HR factors selected are 14, 9, and 3 for RF, ET, and GBT, respectively. It is interesting to note that all three techniques rank the same factors as important, showing that the ranking is consistent. From Fig. 4 and the first row of Fig. 8, we can see that all the ML techniques rank 'experience' and 'age' within the three most influential features. In contrast, RF ranks traffic as the sixth most important feature, ET ranks it fourth, and GBT ranks it third. These rankings are validated by training the deep learning models again with the most common features ranked for each dataset, listed in Fig. 4, by the three ensemble machine learning methods. If the attrition classification with all features versus the best 3 features is analyzed based on Table 6, it is seen that there is a significant decrease in the classification scores for CNN, XGBT, and LGBM. But FastONN has the best
F1-score, precision, and recall when trained using only the three best features. This shows the applicability of FastONN to the prediction of employee attrition.
Unlike Dataset #1, for Dataset #2 the number of best features returned by the three estimators is almost the same. Interestingly, all three methods rank the level of job satisfaction first, the last evaluation report second, and so on. It is noteworthy that the importance returned by the estimators is the same for all 5 common features that we chose for model training in employee attrition classification. If the performance of applying all features versus the best features is analyzed (see Table 6), we observe that the scores of all machine learning models are the same. In more detail, the reduction from 9 features to the best 5 features (Fig. 4 and Fig. 8) does not degrade the performance, showing that the manager can ignore the less important features (factors), such as salary, while formulating HR strategy related to employees. This in turn helps HR managers focus on specific factors for efficient decision making. Even with the best 5 features, our XGBT model is able to achieve about 99% in all metrics considered for evaluation.
For Dataset #3, the same performance trend is observed as for Dataset #2. In some cases, the performance increased when the models were trained using the best four features. For example, the accuracy and F1-score of the CNN model for determining whether an employee will stay or leave the company are 87.46% and 84.46%, respectively (Table 6), when the best 4 features (Fig. 4) are used for model training. But these scores are significantly lower when considering all features for CNN (Table 6). For all the other models except FastONN, the scores are almost the same
for both scenarios: training the model with all features and with the best features. It is also observed that FastONN performs better when only the best features are used. According to the results, decision makers can consider job satisfaction and working hours, among other factors, to be emphasized in HR strategic planning to retain the most highly competent HR resources and gain competitive advantage.

In Dataset #4, the numbers of best features returned by RF, ET, and GBT are 9, 7, and 5, respectively, with the rankings of the common features consistent across all three estimators, as in the previous datasets. The 5 best common features are job satisfaction, period of employment in the company, last evaluation report, number of projects worked on, and working hours (Fig. 4). These findings show that considering factors other than these merely wastes time and resources and complicates the decision-making process in HR resource retention. This is further validated by training the models with the best five features: the results in Table 6 show that about 99% accuracy can be achieved even with only these features. This holds for LGBM, XGBT, and CNN, whereas FastONN has lower classification accuracy when it is trained and tested using only the five best features.

For Dataset #5, from the best features (Fig. 4) returned by the different estimators, the 17 best common features are selected, and the models are trained on them to validate the importance of the HR factors. From the scores tabulated in Table 6, it is seen that for XGBT and LGBM the scores are comparable with those obtained by training on all features. For CNN and FastONN, there are significant differences. The reason may be the limited number of input samples available for training, as both CNN and FastONN are data-hungry models. However, the decision maker can exploit this importance ranking to strategize their HR resources efficiently and make quality decisions. For this dataset, the highest F1-score using the best features is 73.20%, achieved by LGBM. Although we obtained the highest accuracy of 89.59% with XGBT, we are reluctant to use accuracy for this dataset, because the dataset is imbalanced and accuracy can be biased toward the majority class.

We applied ensemble methods to filter out unnecessary features in HR retention analysis. This is an important step for HR managers and decision makers alike to reduce the laborious burden and cost of making efficient, high-quality decisions. Our experimental and validation results show that the work presented in this paper reduces the number of features needed for machine learning model training while at the same time attaining the highest performance scores in automated employee retention classification. The importance of all features can be seen in Fig. 8.

VII. LIMITATIONS AND FUTURE WORK
A number of machine learning and deep learning techniques are explored for the classification of employee attrition using different datasets. Although we tried FastONN, its accuracy is poor due to low data complexity and a low number of input instances. This can be mitigated in future work by considering a larger dataset and a fine-tuned FastONN model. Furthermore, the accuracy of the other models may change if they are applied to other datasets with different input features. Therefore, in future work, we can consider different datasets with different input features. Hyperparameter optimization is another potential future direction for this research.

VIII. CONCLUSION
Managers are often very concerned about employee turnover due to its negative impact on company capital and strategic objectives, typically resulting from inadequate recruitment and management practices. Consequently, companies that rely heavily on data can greatly benefit from identifying the factors that contribute to employee turnover. This article employs different techniques, namely feature selection, automated multi-criteria decision analysis, and deep learning, to accurately identify the key drivers of employee turnover. Although each of these methods is highly valuable and can serve as a useful tool for managers, their implementation in businesses has been largely overlooked. In our automated multi-criteria decision analysis, we employed recursive feature elimination (RFE) based on three different techniques, Random Forest, Extra Trees, and Gradient Boosting, to reveal the main factors influencing employee turnover. The results of this method show, for example on Dataset #1, that experience, gender, age, industry, profession, traffic, coach, head gender, way, extraversion, independence, self-control, anxiety, and novator are the most influential characteristics of an employee's retention decision based on the Random Forest method. The results from Extra Trees show that experience, age, industry, traffic, extraversion, independence, self-control, anxiety, and novator are the most significant factors. Gradient Boosting shows that experience, age, and traffic are the most significant factors. We then took the common features from those three methods and found that experience, age, and traffic are features that gave better testing accuracy than incorporating all the features.

In the subsequent stage, various machine learning algorithms are utilized to predict the departure of required human resources. These algorithms, XGBT (Extreme Gradient Boosting Tree), LGBM (Light Gradient Boosting Machine), CNN (Convolutional Neural Network), and FastONN (Fast Operational Neural Network), are applied and then compared as the prediction algorithms used to evaluate the results. Additionally, prediction models that use all features are implemented and compared against each other. The prediction results demonstrate that the XGBT algorithm, in general, outperforms the other techniques. Furthermore, the five different case studies reveal that, among the three methods employed to identify factors affecting employee turnover, total working years and number of hours per month are common factors across those case studies. Consequently, managers should prioritize these two factors to better manage the rate of employee turnover.