Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 72 (2015) 86–93

The Third Information Systems International Conference

Logistic Regression Ensemble for Predicting Customer Defection with Very Large Sample Size

Heri Kuswanto (a,*), Ayu Asfihani (b), Yogi Sarumaha (b), Hayato Ohwada (c)

(a,b) Department of Statistics, Institut Teknologi Sepuluh Nopember, Kampus ITS Sukolilo, Surabaya, 60111, Indonesia
(c) Department of Industrial Administration, Graduate School of Science and Technology, Tokyo University of Science, Noda-Chiba, Japan

Abstract

Predicting customer defection is an important subject for companies producing cloud-based software. The studied
company sells three products (High, Medium and Low Price), and the consumer can choose to defect from or retain the
product after a certain period of time. Because the company collected a very large dataset, standard statistical
models become difficult to apply. With such a large sample, the tests in parametric statistical models tend to flag
even negligible effects as significant, which may lead to misleading results. This research examines a machine learning
approach developed for high dimensional data, namely the logistic regression ensemble (LORENS). Using a computational
approach, LORENS predicts as well as the standard logistic regression model, i.e. with prediction accuracy between 66%
and 77%. In this case, LORENS is preferable as it is more reliable and free of assumptions.

1877-0509 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of Information Systems International Conference (ISICO2015)
doi:10.1016/j.procs.2015.12.108

Keywords: ensemble; logistic regression; classification; high dimensional data; machine learning

1. Introduction

Cloud software is growing by about 36 percent annually within the software market, and Columbus [1] predicted
that this growth would continue until 2016. Furthermore, the use of the internet to collect fast, real-time
feedback from customers has produced big data, which leads to some complexities in predicting
customer behaviour. This situation occurs in most cloud-based software companies, including
company “X”, which produces three kinds of antivirus products. Customers are considered to have defected
when they stop using any of the products, as shown by termination of the contract.
Parametric statistical approaches involve statistical tests and inference, which usually require strict
assumptions. These approaches commonly fail when applied to high dimensional data (or even very
large sample size data) due to the sensitivity of the P-value. Lin et al. [2] showed that applying a statistical test
to data with a very large sample size tends to reject the null hypothesis because the P-value becomes extremely low.
Meanwhile, computational approaches that are free of assumptions have been rapidly developed to
analyze big data. Lim [3] introduced LORENS for classification problems. LORENS was
developed using the Classification by Ensembles from Random Partitions (CERP) algorithm, which
divides the variables into several subspaces. The method combines the logistic regression (LR)
models from each partition into a single probability value used for classification. Lim et al. [4] as well as
Lee et al. [5] argued that LORENS is able to produce good classification accuracy. Its strength in
classification comes from the informative and representative character of logistic regression, as
well as from the CERP property that the predictor variables in different subspaces are mutually exclusive.
Another challenge posed by the company “X” data is the significantly imbalanced proportion
of defecting and non-defecting (retained) responses. King and Zeng [6] showed that this kind of
imbalanced response may bias the estimated parameters of Binary Logistic Regression, especially
when the estimation is carried out by maximum likelihood. In this case, the Hessian
matrix used in the estimation will be small and the estimated parameters are biased. With an
imbalanced response proportion, using 0.5 as the standard threshold for assigning the predicted
class becomes unfair to the probability of each class. LORENS can overcome this problem by
proposing an optimum threshold that depends on the characteristics of the data.
This paper applies LORENS to predict customer defection for cloud-based software. Prasasti et al.
[7] applied C4.5 and Support Vector Machine (SVM) to the same dataset used in this paper. Moreover,
Prasasti and Ohwada [8] used J48, MLP and SMO and obtained satisfactory classification results.
Those methods are popular machine learning approaches that are not designed specifically for high
dimensional data, while LORENS was originally developed for classification of high dimensional data. The
dataset in this paper does not necessarily fit the definition of high dimensional data, as the number of variables
is not greater than the sample size. However, it is worthwhile to assess the performance and applicability of
LORENS for classifying data with a very large sample size.

2. Literature Review

This section briefly describes the two methods applied in this paper, i.e. Binary Logistic Regression
(BLR) and Logistic Regression Ensemble (LORENS).

2.1. Binary Logistic Regression (BLR)

Binary Logistic Regression is a method of data analysis used to find the relationship between a
response variable (y) that is binary or dichotomous and predictor variables (x) that are polytomous or
continuous (see Hosmer and Lemeshow [9] for details). The parameters of logistic regression are
estimated using maximum likelihood. Suppose that $x_i$ and $y_i$ are the pair of independent and
dependent variables for the i-th observation, and that the observations are independent of one
another. Then the probability function for each pair can be expressed as follows:

\[ f(x_i) = \pi(x_i)^{y_i} \left(1 - \pi(x_i)\right)^{1 - y_i}; \quad y_i = 0 \text{ or } 1 \tag{1} \]

with

\[ \pi(x_i) = \frac{\exp\left(\sum_{j=0}^{p} \beta_j x_{ij}\right)}{1 + \exp\left(\sum_{j=0}^{p} \beta_j x_{ij}\right)} \tag{2} \]

where, for $j = 0$, the value $x_{ij} = x_{i0} = 1$.
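Equations (1) and (2) can be evaluated directly for a single observation. The sketch below is only an illustration: the function names `pi_x` and `f` and the coefficient values are hypothetical and do not come from the paper.

```python
import math

def pi_x(x, beta):
    """Equation (2): P(y = 1 | x) for one observation.

    x    -- predictor values (x_i1, ..., x_ip), without the constant term
    beta -- coefficients (beta_0, beta_1, ..., beta_p)
    """
    # x_i0 = 1, so beta_0 enters the sum as the intercept.
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))  # same value as e^eta / (1 + e^eta)

def f(x, y, beta):
    """Equation (1): probability of the observed outcome y (0 or 1)."""
    p = pi_x(x, beta)
    return p ** y * (1.0 - p) ** (1 - y)

beta = [0.5, -1.0, 2.0]   # hypothetical coefficients
x = [1.0, 0.25]           # hypothetical observation: eta = 0.5 - 1.0 + 0.5 = 0
print(pi_x(x, beta))                  # 0.5
print(f(x, 1, beta) + f(x, 0, beta))  # 1.0
```

The second print confirms that the two possible outcomes of one observation have probabilities that sum to one.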

Each pair of observations is assumed to be independent, thus the likelihood function is the product
of the probability functions of all pairs, as follows:

\[ l(\beta) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \pi(x_i)^{y_i} \left(1 - \pi(x_i)\right)^{1 - y_i} \tag{3} \]

The likelihood function is easier to maximize in the form $L(\beta) = \log l(\beta)$, and the parameter $\beta$ can
be optimized using the Newton-Raphson method, starting from the first derivative of $L(\beta)$. The significance of
each parameter can be tested by the Wald test with the following hypotheses:

\[ H_0: \beta_j = 0 \]
\[ H_1: \beta_j \neq 0; \quad j = 1, 2, \ldots, p \]

and the test statistic is defined as

\[ W = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \]

The test statistic W (Wald statistic) follows the standard normal distribution under $H_0$, and $H_0$ is rejected if $|W| > Z_{\alpha/2}$.
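The estimation and testing steps above can be sketched in a few lines. This is a minimal illustration on a small hypothetical dataset, not the authors' implementation: the helper names (`solve`, `score_and_information`, `fit_newton`) are invented here, and the standard errors are taken from the inverse of the information matrix (the negative Hessian of the log-likelihood), which is the usual choice for the Wald test.

```python
import math

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * d for a, d in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def score_and_information(rows, y, beta):
    """Gradient of log l(beta) and the information matrix (negative Hessian)."""
    k = len(beta)
    probs = [1.0 / (1.0 + math.exp(-sum(c * v for c, v in zip(beta, r)))) for r in rows]
    score = [sum((yi - pr) * r[j] for r, yi, pr in zip(rows, y, probs)) for j in range(k)]
    info = [[sum(pr * (1.0 - pr) * r[a] * r[b] for r, pr in zip(rows, probs))
             for b in range(k)] for a in range(k)]
    return score, info

def fit_newton(X, y, iters=25):
    """Newton-Raphson: beta <- beta + information^{-1} * score."""
    rows = [[1.0] + list(r) for r in X]          # prepend x_i0 = 1
    beta = [0.0] * len(rows[0])
    for _ in range(iters):
        score, info = score_and_information(rows, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    score, info = score_and_information(rows, y, beta)
    k = len(beta)
    # SE(beta_j) is the square root of the j-th diagonal entry of info^{-1}.
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(k)])[j])
          for j in range(k)]
    return beta, se, score

# Hypothetical data: one predictor, classes overlap so the estimate is finite.
X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 0, 1, 0, 1, 1, 0, 1]
beta, se, score = fit_newton(X, y)
wald = [b / s for b, s in zip(beta, se)]
for j, w in enumerate(wald):
    # 1.96 is Z_{alpha/2} for alpha = 0.05
    print(f"beta_{j} = {beta[j]:.4f}, W = {w:.3f}, reject H0: {abs(w) > 1.96}")
```

At convergence the score vector is numerically zero, which is the first-order condition the Newton-Raphson iteration is solving.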
Another important quantity in logistic regression is the odds ratio. Details about the interpretation of the odds
ratio can be found in Agresti [10]. If the response has two classes, the predicted and actual classes can be
cross-tabulated as depicted in Table 1.

Table 1. Cross-tabulated classification of predicted and actual class

                       Actual Class (+)       Actual Class (−)
Predicted Class (+)    True Positive (TP)     False Positive (FP)
Predicted Class (−)    False Negative (FN)    True Negative (TN)

Catal [11] defines the sensitivity as $TP/(TP + FN)$ and the specificity as $TN/(FP + TN)$,
while the accuracy is measured by $(TP + TN)/(TP + FP + TN + FN)$.
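As a quick check of these definitions, the sketch below applies them to the Low Price counts printed in Table 3. The function name `classification_metrics` is hypothetical; the counts are read from Table 3 on the assumption that rows are predicted classes and columns are actual classes, as in Table 1.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity and specificity (in percent) from Table 1 counts."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (fp + tn),
    }

# Low Price counts as printed in Table 3 (Binary Logistic Regression).
low_price = classification_metrics(tp=22878, fp=10299, fn=6433, tn=10391)
for name, value in low_price.items():
    print(f"{name}: {value:.2f}")
# accuracy: 66.54
# sensitivity: 78.05
# specificity: 50.22
```

These values agree with the BLR row for Low Price in Table 9.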

2.2. Logistic Regression Ensemble (LORENS)

Using the Logistic Regression Classification by Ensembles from Random Partitions
(LR_CERP) algorithm, LORENS randomly partitions the predictor space into k subspaces of equal size. Because
the subspaces are randomly selected from the same distribution, it is assumed that there is no bias in the
selection of predictor variables in each subspace. In each subspace, a logistic regression model is
fitted without applying variable selection. For one ensemble, LORENS combines the predicted values
(probability values) of the logistic regression models from all partitions to increase the accuracy.
The probability values from all models are averaged, and the observation is classified as 0 or 1 using a certain threshold.
LORENS generates several ensembles with different random partitions and then selects the highest value
among the ensembles. From these values, the optimum accuracy level is determined by choosing the one
that improves significantly when the number of ensembles is increased or changed. Lim [3] showed that the
accuracy increases significantly when the number of ensembles is more than ten.
The usual threshold for classifying a binary response in logistic regression is 0.5.
However, the classification accuracy will not be reliable if the proportions of class 1 and class 0 are not equal. To
balance the sensitivity and specificity, LORENS finds the optimal threshold from the following formula,
where $\bar{y}$ is the average probability that an observation lies in the positive class.

\[ \text{threshold} = \frac{\bar{y} + 0.5}{2} \]
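A small numerical sketch of this formula follows. The function name `optimal_threshold` is hypothetical, and the share of actual positives among the Low Price cases in Table 3 is used only as a stand-in for $\bar{y}$, since the paper does not report the training-data proportion.

```python
def optimal_threshold(y_bar):
    """LORENS threshold: the midpoint between y_bar and the usual 0.5."""
    return (y_bar + 0.5) / 2.0

# Share of actual positives among the Low Price cases in Table 3
# (22878 + 6433 positives out of 50001 cases). This is an assumption
# standing in for the training-data proportion.
y_bar = (22878 + 6433) / 50001
print(round(y_bar, 4))                     # 0.5862
print(round(optimal_threshold(y_bar), 4))  # 0.5431
print(optimal_threshold(0.5))              # 0.5 (balanced classes)
```

Under that assumption the result matches the threshold of 0.5431 reported for the Low Price data in Section 4.2, and with balanced classes the formula falls back to the usual 0.5.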
To apply LORENS, either the holdout or the cross-validation method can be used. The holdout procedure
takes part of the data for training and uses the rest for testing. In this study, 10% of the data
is used as testing data and the remainder as training data. Meanwhile, cross-validation divides the
sample into k folds of equal size; each fold in turn is used for testing while
the remaining folds are used for training. This procedure is repeated until every fold has been treated as a
testing set (Witten et al. [12]). In summary, the steps of classification with LORENS using the holdout
procedure can be described as follows:
- Randomly partition the predictor variables into k subspaces for one ensemble.
- Fit an LR model for each subspace using the training data.
- Obtain the predicted value from each model for every observation in the testing data.
- Calculate the average of all predicted values for each observation.
- Repeat the steps above to form n ensembles.
- Find the highest predicted value for each observation across all ensembles.
- Calculate the optimal threshold value.
- Classify the observations.
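The steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the data are synthetic, all function names are hypothetical, each LR model is fitted by plain gradient ascent as a simple stand-in for a Newton-Raphson maximum likelihood fit, and the ensembles are combined by taking the highest averaged probability, exactly as the list states.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_lr(X, y, cols, epochs=150, lr=0.5):
    """Fit an LR model on the predictors in `cols` by gradient ascent."""
    beta = [0.0] * (len(cols) + 1)
    n = len(y)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            feats = [1.0] + [row[c] for c in cols]
            err = yi - sigmoid(sum(b * v for b, v in zip(beta, feats)))
            for j, v in enumerate(feats):
                grad[j] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_lr(beta, row, cols):
    feats = [1.0] + [row[c] for c in cols]
    return sigmoid(sum(b * v for b, v in zip(beta, feats)))

def random_partition(n_vars, k, rng):
    """Shuffle the predictor indices and deal them into k disjoint subspaces."""
    idx = list(range(n_vars))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def lorens_holdout(X_train, y_train, X_test, k=2, n_ensembles=10, seed=0):
    rng = random.Random(seed)
    best = [0.0] * len(X_test)
    for _ in range(n_ensembles):
        parts = random_partition(len(X_train[0]), k, rng)
        models = [(cols, fit_lr(X_train, y_train, cols)) for cols in parts]
        for i, row in enumerate(X_test):
            # Average the subspace models, then keep the highest ensemble value.
            avg = sum(predict_lr(b, row, cols) for cols, b in models) / len(models)
            best[i] = max(best[i], avg)
    threshold = (sum(y_train) / len(y_train) + 0.5) / 2.0
    return [1 if p > threshold else 0 for p in best], threshold

# Synthetic data: 4 predictors, only the first two drive the response.
rng = random.Random(1)
X = [[rng.random() for _ in range(4)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(3.0 * r[0] - 3.0 * r[1]) else 0 for r in X]
split = int(0.9 * len(X))  # 10% of the data held out for testing
pred, thr = lorens_holdout(X[:split], y[:split], X[split:])
print("threshold:", round(thr, 4))
print("predicted classes:", pred)
```

Note that Section 4.3 mentions majority voting across ensembles; replacing the `max` step with a vote over per-ensemble classifications would be the corresponding variation.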

3. Data and Variables

The data used in this research is secondary data that was pre-processed by Prasasti et al. [7]. The
data was taken from the e-commerce website of company 'X' from 2007 until 2013 and consists of records
of consumer activities, with the consumer as the unit of observation. The data is divided by product price into
Low Price, Medium Price, and High Price products. The sample
consists of 500000 consumers for the Low Price product, 408810 for the Medium Price
product, and 709899 for the High Price product. The variables below, taken from
Martono et al. [13], are used in this research.
- Accumulation update ($X_1$)
Accumulation update is the number of updates carried out by the customer from
purchase to renewal. Every time the customer makes a purchase or renewal, the accumulation
update increases by 1. The number of updates ranges from 0 to 7.
- Product price ($X_2$)
Product price is the price of the newly purchased product, ranging from 1886 to 39000 Japanese Yen
(JPY).
- Contract answer ($X_3$)
Contract answer is the customer's choice, with value 1 for ‘opt-in’ (continue to use a certain product) or
0 for ‘opt-out’ (stop using the product).
- Consumer type ($X_4$)
Consumer type takes the value 0 for individuals and 1 for organizations.
- Delivery status ($X_5$)
Delivery status is the status of the e-mail delivery, with value 1 when it is sent and 0 if it is not sent.
- Customer defection ($Y$)
Customer defection is the consumer's decision to defect from or retain the product: 1 if they
decide to defect and 0 if they continue to use one or more antivirus products from company ‘X’,
even a different product.

4. Results and Discussion

4.1. Classification of customer defection using Binary Logistic Regression

This section starts by applying Binary Logistic Regression analysis to model the Low Price customer
dataset. When all variables are entered in the model, the product price and consumer type
variables do not influence customer defection significantly, as shown by P-values that are
greater than the 5 percent significance level. A new model was therefore fitted without these two
insignificant variables. The best model obtained for the Low Price customer data is as follows:

\[ y(x) = \frac{e^{1.54 - 0.43 X_1 - 2.92 X_3 - 0.85 X_5}}{1 + e^{1.54 - 0.43 X_1 - 2.92 X_3 - 0.85 X_5}} \]
This model is used as the basis for prediction with new observations. For this case, the result of
hypothesis testing is consistent with the obtained odds ratios, so the details are omitted for the sake
of space. Next, Binary Logistic Regression analysis is applied to the Medium Price consumer data, and the
following best regression model is obtained:

\[ y(x) = \frac{e^{2.9 - 0.705 X_1 - 0.00011 X_2 - 3.326 X_3 - 0.0528 X_4 - 0.201 X_5}}{1 + e^{2.9 - 0.705 X_1 - 0.00011 X_2 - 3.326 X_3 - 0.0528 X_4 - 0.201 X_5}} \]

The hypothesis testing shows that all the variables included in the model above have a significant effect,
with the odds ratios listed in Table 2.

Table 2. The coefficients and odds ratios of Medium Price customer data

Parameter              Coefficient   Odds Ratio
(Intercept)            2.9           18.24
Update Accumulation    -0.7          0.49
Product Price          -0.00011      0.99
Contract Answer        -3.33         0.04
Consumer Type          -0.053        0.95
Delivery Status        -0.2          0.82
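The odds ratio of a predictor is the exponential of its coefficient, so the right-hand column of Table 2 can be approximately recomputed from the fitted Medium Price equation. The sketch below is illustrative only; the dictionary name is hypothetical and the coefficients are copied from that equation.

```python
import math

# Coefficients of the Medium Price model, as printed in its fitted equation.
coefficients = {
    "Update Accumulation": -0.705,
    "Product Price": -0.00011,
    "Contract Answer": -3.326,
    "Consumer Type": -0.0528,
    "Delivery Status": -0.201,
}

# Odds ratio = exp(coefficient): the multiplicative change in the odds
# for a one-unit increase in the predictor.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.4f}")
```

The results agree with Table 2 to two decimals, except that the Product Price ratio is about 0.9999, which the table lists as 0.99. Each ratio refers to a one-unit change in the predictor, which for product price is a change of 1 JPY.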

Table 2 reveals that Binary Logistic Regression yields misleading results: the product price and
consumer type variables are statistically significant in the model, yet their odds ratios are close to one,
indicating almost no effect in magnitude. A similar result is obtained for the High Price product, where all
variables significantly influence the tendency to defect according to the P-values, but the effects implied by
the odds ratios for the consumer type and product price variables are nearly zero. These misleading results
are induced by the very large number of observations used to fit the model, which leads to very small
P-values. Moreover, the datasets suffer from an imbalanced proportion of the response classes.
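The link between sample size and small P-values can be made explicit. As a rough sketch, assume that the standard error of a coefficient shrinks at the usual rate $c/\sqrt{n}$, where $c$ is a constant that does not depend on $n$ (an assumption made here for illustration, not a quantity reported in the paper):

```latex
\[
W = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \approx \frac{\hat{\beta}_j \sqrt{n}}{c},
\qquad
|W| > Z_{\alpha/2} \iff n > \left( \frac{c \, Z_{\alpha/2}}{\hat{\beta}_j} \right)^{2}
\]
```

Under this assumption any nonzero estimate, however small, is declared significant once $n$ is large enough, so significance alone says little about the size of the effect.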
Table 3 shows the classification results generated from the models, while their accuracy, sensitivity
and specificity are presented in Table 4.

Table 3. Classification results with Binary Logistic Regression analysis

                                     Actual Class
                    Low Price          Medium Price       High Price
Prediction class    p(+)     n(−)      p(+)     n(−)      p(+)     n(−)
p(+)                22878    10299     24467    6432      30337    14113
n(−)                6433     10391     2842     7141      8307     18234

Table 4. Classification accuracy (%) of Binary Logistic Regression analysis

Product         Accuracy   Sensitivity   Specificity
Low Price       66.54      78.05         50.22
Medium Price    77.31      89.60         52.61
High Price      68.42      78.50         56.37

We see that the Binary Logistic model classifies the Medium Price customer data best, while the accuracy
levels for the other two cases are moderate. Specificity is low in all three cases (about 50% to 56%).
Again, the model suffers from misleading results for the Medium and High Price data.

4.2. Classification of customer defection using LORENS with holdout

As an illustration, the LORENS analysis is applied to the Low Price data with 3 partitions and 10
ensembles. This means that the 5 predictor variables are allocated into 3 partitions, and the process is
repeated for 10 ensembles, as shown in Table 5.
Table 5. Allocation of predictor variables in 3 partitions and 10 ensembles

                       Ensemble
Predictor Variable     1   2   3   4   5   6   7   8   9   10
Update Accumulation    3   2   3   1   1   1   2   1   1   3
Product Price          1   3   1   1   3   2   3   1   2   3
Contract Answer        3   3   3   3   2   3   1   3   1   1
Consumer Type          1   1   1   3   3   1   1   2   3   1
Delivery Status        2   1   2   2   1   3   3   3   3   2
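Each column of Table 5 assigns every predictor to one of the three models of an ensemble. As a small illustration, the sketch below groups the predictors of ensemble 1 by their partition label; the function name `group_by_partition` is hypothetical and the labels are copied from the first column of the table.

```python
from collections import defaultdict

# Partition labels for ensemble 1, copied from the first column of Table 5.
ensemble_1 = {
    "Update Accumulation": 3,
    "Product Price": 1,
    "Contract Answer": 3,
    "Consumer Type": 1,
    "Delivery Status": 2,
}

def group_by_partition(allocation):
    """Map each partition label to the predictors assigned to it."""
    groups = defaultdict(list)
    for variable, partition in allocation.items():
        groups[partition].append(variable)
    return dict(sorted(groups.items()))

for partition, variables in group_by_partition(ensemble_1).items():
    print(f"model {partition}: {variables}")
# model 1: ['Product Price', 'Consumer Type']
# model 2: ['Delivery Status']
# model 3: ['Update Accumulation', 'Contract Answer']
```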

Table 5 shows that the first ensemble consists of 3 models (subspaces), in which product price and
consumer type are in model 1, and update accumulation and contract answer are in model 3. Meanwhile,
model 2 contains only one predictor variable, i.e. delivery status. Following the specification in Table 5, a
logistic regression model is fitted for each partition using the training data. The threshold value
is then calculated, giving 0.5431. This means that consumers with an average probability greater than
0.5431 are classified into the defection class, and the others into the retained class. The classification
results are compared with the actual classes in order to calculate sensitivity and specificity. Applying the
same procedure, the LORENS analysis is performed on all datasets for partition sizes of 1 to 5, with
both the 0.5 threshold and the optimum threshold. The resulting accuracy,
sensitivity and specificity are presented in Table 6.
Table 6. LORENS classification accuracy (%) analysis with holdout

Optimum Threshold
                          Partition
Product         Measure   1       2       3       4       5
Low Price       Accur.    66.54   66.54   66.25   65.04   65.00
                Sens.     78.05   78.05   78.24   88.43   88.49
                Spec.     50.22   50.22   49.27   31.90   31.73
Medium Price    Accur.    75.45   77.20   74.53   74.06   67.79
                Sens.     78.80   88.09   92.09   92.32   97.96
                Spec.     68.72   55.29   39.20   37.34   7.09
High Price      Accur.    69.04   67.88   67.78   67.79   67.73
                Sens.     76.98   78.85   78.96   78.97   78.98
                Spec.     59.56   54.77   54.43   54.43   54.30

Threshold 0.5
                          Partition
Product         Measure   1       2       3       4       5
Low Price       Accur.    66.54   65.04   64.50   63.41   59.27
                Sens.     78.05   88.09   89.55   95.09   98.66
                Spec.     50.22   32.39   29.01   18.54   3.47
Medium Price    Accur.    77.32   69.60   66.75   66.87   66.83
                Sens.     89.59   96.43   99.69   99.99   99.99
                Spec.     52.61   15.62   0.49    0.24    0.10
High Price      Accur.    68.42   67.78   67.74   65.58   65.95
                Sens.     78.50   78.96   78.98   88.43   89.61
                Spec.     56.37   54.43   54.32   38.29   37.68

In the analysis, the optimum partition is selected when the addition of one predictor into the model
increases the classification accuracy most significantly. For LORENS with holdout, the optimum partition
size is 3 for the Low Price data, 4 for the Medium Price data and 1 for the High Price
data. Comparing the accuracy obtained with the optimum threshold and the standard threshold in Table 6,
the analysis with the optimum threshold outperforms the analysis with a threshold of 0.5.

4.3. Classification of customer defection using LORENS with cross validation

The LORENS analysis can also use the cross-validation method in the classification steps. This
method treats all observations equally, since each serves in both the training set and the testing set. In this case,
LORENS with cross-validation is not directly comparable with Binary Logistic Regression and LORENS with
holdout, because the training and testing datasets involved in the analysis are different.
Suppose that the High Price dataset is analyzed using 2 partitions and 10 ensembles
with 10 folds. The predictor variables are allocated to the partitions, and the training data is used to
construct the models. In the first fold, the predictor values of the testing data are substituted into the model
for each partition. The probability values from the two partitions in the same ensemble are averaged and then
compared with the optimum threshold value to predict the class. With LORENS, a different threshold is
obtained for each fold because the training datasets also differ. Table 7 presents the optimum
threshold values for the High Price data with 2 partitions for each fold.
Table 7. Optimum threshold for each fold

Fold   Optimum Threshold     Fold   Optimum Threshold
1      0.522120              6      0.522233
2      0.522158              7      0.522117
3      0.522007              8      0.522003
4      0.522338              9      0.522365
5      0.522257              10     0.522257
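The small fold-to-fold differences in Table 7 arise because each fold leaves out a different tenth of the data when the threshold is computed. The sketch below illustrates this on a hypothetical response vector (not the paper's data); the function names are invented here and the threshold is assumed to be $(\bar{y} + 0.5)/2$ computed on the training part of each fold.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fold_thresholds(y, k=10, seed=0):
    """Threshold (y_bar + 0.5) / 2 from the training part of each fold."""
    thresholds = []
    for test_fold in k_fold_indices(len(y), k, seed):
        held_out = set(test_fold)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        y_bar = sum(train) / len(train)
        thresholds.append((y_bar + 0.5) / 2.0)
    return thresholds

# Hypothetical response with 54% positives.
y = [1] * 540 + [0] * 460
thresholds = fold_thresholds(y)
for fold, t in enumerate(thresholds, start=1):
    print(fold, round(t, 6))
```

With a much larger sample, as in the paper, the training proportions of the folds differ far less, which is consistent with thresholds that agree to three decimals.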

After applying the complete LORENS procedure with 10 ensembles, including the majority voting
steps, we obtained the accuracy, sensitivity and specificity listed in Table 8.
Table 8. LORENS classification accuracy (%) analysis with cross validation

Optimum Threshold
                              Partition
Product         Measure       1       2       3       4       5
Low Price       Accuracy      66.34   66.34   66.04   65.09   65.04
                Sensitivity   77.69   77.70   77.99   88.22   88.35
                Specificity   50.26   50.24   49.12   32.31   32.01
Medium Price    Accuracy      75.16   76.81   74.50   73.97   67.76
                Sensitivity   78.45   88.04   91.97   92.20   97.80
                Specificity   68.54   54.23   39.35   37.29   7.32
High Price      Accuracy      69.21   68.08   68.03   68.03   72.78
                Sensitivity   76.87   78.69   78.85   78.86   78.73
                Specificity   60.06   55.40   55.10   55.10   67.66

Threshold 0.5
                              Partition
Product         Measure       1       2       3       4       5
Low Price       Accuracy      66.34   65.08   64.70   63.56   59.33
                Sensitivity   77.69   88.29   89.16   95.01   98.63
                Specificity   50.26   32.20   30.04   19.01   3.65
Medium Price    Accuracy      77.19   73.61   66.76   66.88   66.84
                Sensitivity   89.31   92.56   99.66   99.98   99.99
                Specificity   52.82   35.49   0.57    0.27    0.15
High Price      Accuracy      68.70   68.03   68.00   65.26   65.98
                Sensitivity   78.41   78.85   78.88   86.01   89.50
                Specificity   57.09   55.10   55.00   40.48   37.88

Similar to the holdout method, the optimum partition is selected when adding variables to the
model improves the accuracy significantly. For LORENS with cross-validation, the
optimum partition size is 3 for the Low Price data, 4 for the Medium Price data, and
1 for the High Price data. Classification with the optimum threshold still yields better
classification results.

4.4. Best model selection

Table 9 below summarizes the classification accuracy of Binary Logistic Regression
and of LORENS with the optimum partition and optimum threshold.

Table 9. Comparison of classification accuracy (%) using BLR and LORENS

Product         Method            Accur.   Sens.   Spec.
Low Price       BLR               66.54    78.05   50.22
                LORENS-Holdout    66.25    78.24   49.27
Medium Price    BLR               77.32    89.59   52.61
                LORENS-Holdout    74.06    92.32   37.34
High Price      BLR               68.42    78.50   56.37
                LORENS-Holdout    69.04    76.98   59.56

We can see that the classification accuracy of Binary Logistic Regression is slightly greater than that of
LORENS, especially for the Medium Price data. On the other hand, LORENS outperforms Binary Logistic
Regression for the High Price data, and the two methods give similar results for the Low Price data. Given
the misleading results found with Binary Logistic Regression, the accuracy generated by this method is also
questionable, while the accuracy generated by LORENS is valid because the method is free
of assumptions.

5. Conclusion

This paper has successfully applied LORENS to classify cases where the sample size is very large.
Although LORENS was originally developed for high dimensional data, in the sense that the number of
variables exceeds the sample size, this paper shows that LORENS can still be applied to data with a limited
number of variables but a large sample size. The analysis clearly showed that the standard logistic regression
model fails to generate consistent results between the P-value test and the odds ratio. LORENS also
outperforms logistic regression in some cases. To deal with the threshold choice, LORENS offers a
fair way to set the threshold. It has also been shown that using the optimum threshold in LORENS yields
better classification results than using a threshold of 0.5.

References

[1] Columbus, L. Predicting enterprise cloud computing growth. Forbes (September 4, 2013), available at
http://www.forbes.com/sites/louiscolumbus/2013/09/04/predicting-enterprise-cloud-computing-growth/ (accessed on July 10, 2014).
[2] Lin, M., Lucas, H.C. Jr. and Shmueli, G. Too big to fail: large samples and the p-value problem. Information Systems
Research, Articles in Advance, 1-12; 2013.
[3] Lim, N. Classification by ensembles from random partitions using logistic models. PhD thesis, Stony Brook University; 2007.
[4] Lim, N., Ahn, H., Moon, H. and Chen, J.J. Classification of high dimensional data with ensemble of logistic regression
models. Journal of Biopharmaceutical Statistics 20:160-17; 2010.
[5] Lee, K., Ahn, H., Moon, H., Kodell, R.L. and Chen, J.J. Multinomial logistic regression ensembles. Journal of
Biopharmaceutical Statistics, 23(3), 681-94; 2013.
[6] King, G. and Zeng, L. Logistic regression in rare events data. Society for Political Methodology WV006-01; 2001.
[7] Prasasti, N., Okada, M., Kanamori, K. and Ohwada, H. Customer lifetime value and defection possibility prediction model
using machine learning: an application to a cloud-based software company. Lecture Notes in Computer Science, 8399; 2013.
[8] Prasasti, N. and Ohwada, H. Applicability of machine-learning techniques in predicting customer defection. In: International
Symposium on Technology Management and Emerging Technologies (ISTMET); 2014.
[9] Hosmer, D.W. and Lemeshow, S. Applied logistic regression, Second Edition. New York: John Wiley & Sons, Inc.; 2000.
[10] Agresti, A. Categorical data analysis. New York: John Wiley & Sons, Inc.; 2002.
[11] Catal, C. Performance evaluation metrics for software fault prediction studies. Acta Polytechnica Hungarica; 2012, 9(4).
[12] Witten, I.H., Frank, E. and Hall, M.A. Data mining: practical machine learning tools and techniques, 3rd Edition. Burlington:
Morgan Kaufmann; 2011.
[13] Martono, N.P., Kanamori, K. and Ohwada, H. Utilizing customer’s purchase and contract renewal details to predict
defection in the cloud software industry. Springer International Publishing Switzerland: PKAW 2014, LNCS 8863; 2014,
138–149.

6 pages
Predicting Financial Distress of Agriculture Companies in The EU 2017 9p SK
No ratings yet
Predicting Financial Distress of Agriculture Companies in The EU 2017 9p SK
9 pages
Conformal Recursive Feature Elimination
No ratings yet
Conformal Recursive Feature Elimination
35 pages