Credit Balance Analysis: Saee Chaudhari
ANALYSIS
BUSINESS REPORT 2
A. DATA UNDERSTANDING
Data Intro
This dataset combines credit card and cardholder characteristics with the credit card balance, which is the
variable under analysis. Credit card balance is the total amount of money a cardholder currently owes to their
credit card company.
Data dictionary
Descriptive statistics
1. UNDERSTANDING AVERAGE BALANCE
A company manager says that the average balance on their credit cards is $500. Do you think that
this assertion is justified? Use a one-sample t-test to draw your conclusion.
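The test statistic behind this question can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `one_sample_t` and the sample balances are hypothetical and are not taken from the report's dataset. The p-value would then be read from a t distribution with the returned degrees of freedom (from a table or a statistics package).

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)            # sample standard deviation (n - 1 divisor)
    t = (mean - mu0) / (sd / math.sqrt(n))   # distance from mu0 in standard errors
    return t, n - 1

# Hypothetical balances for illustration only, NOT the report's data.
balances = [480.0, 520.0, 610.0, 390.0, 700.0, 455.0, 530.0, 515.0]
t_stat, df = one_sample_t(balances, mu0=500.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

A t statistic close to zero means the sample mean is within a small number of standard errors of $500, so the manager's claim could not be rejected.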
2. UNDERSTANDING GENDER INFLUENCE ON BALANCE
Is there a difference between men and women as far as average balance is concerned? Use a
two-sample t-test to draw your conclusion.
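As a rough sketch of the calculation, the snippet below computes a two-sample t statistic using Welch's variant, which does not assume equal variances in the two groups. This is an assumption on our part: the analysis in this report may have used the pooled-variance version instead. The function name `welch_t` and both samples are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical balances for two groups, NOT the report's data.
group_a = [520.0, 540.0, 500.0, 530.0, 510.0]
group_b = [505.0, 515.0, 495.0, 500.0, 485.0]
t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The same calculation applies to any two-group comparison in this report, such as student flag or marital status.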
Data preparation
3. UNDERSTANDING STUDENT FLAG INFLUENCE ON BALANCE
Data preparation
4. UNDERSTANDING #CREDIT CARDS INFLUENCE ON BALANCE
It is generally assumed that if there are more credit cards then the balance on the cards will be
more. Based on this dataset, do you think this is true? Calculate a correlation coefficient and show
a scatter plot to support your answer.
Ho: Number of credit cards doesn’t affect balance on the cards
Ha: The more credit cards, the higher the balance on the cards
Data preparation
Correlation Analysis
Result: Balance on the cards rises with the number of credit cards. Each additional card is associated with
a balance that is ~$29 higher. The relationship is weak, however: the correlation coefficient between #cards
and balance is only 0.086 (see question 11).
Most people in the sample have up to 5 credit cards.
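The correlation coefficient and the per-card slope measure different things: the first is the strength of the linear relationship, the second is its size in dollars. A minimal sketch of both, using a hypothetical helper `corr_and_slope` and made-up data rather than the report's dataset:

```python
import math

def corr_and_slope(x, y):
    """Return (Pearson r, least-squares slope of y on x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy), sxy / sxx

# Hypothetical (#cards, balance) pairs, NOT the report's data.
cards = [1, 2, 3, 4, 5]
balance = [300.0, 450.0, 400.0, 600.0, 550.0]
r, slope = corr_and_slope(cards, balance)
print(f"r = {r:.3f}, slope = {slope:.1f} dollars per card")
```

A dataset can show a sizeable slope together with a small r when balances vary widely among people with the same number of cards.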
5. UNDERSTANDING AGE, EDUCATION AND MARITAL STATUS INFLUENCE
ON BALANCE
Examine whether the following demographic variables influence balance: (a) age, (b) years of
education, (c) marital status. For age and years of education, use scatter plots to depict their
relationship with balance and calculate the correlation coefficient. For the relationship
between marital status and balance, use a two-sample t-test to draw your conclusion.
Data preparation
Correlation Analysis
Results:
Data preparation
P value = 0.9099
P value > 0.05: Fail to reject null hypothesis
Data preparation
6. UNDERSTANDING ETHNICITY INFLUENCE ON BALANCE
“Ethnicity of the cardholder does not matter as far as balance is concerned.” Carry out an analysis of
variance (ANOVA) and discuss whether this statement is supported by the data or not.
Data preparation
Ethnicity takes three values (Asian, Caucasian, African American), so it is categorical data
P value = 0.957
P value > 0.05: Fail to reject null hypothesis
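The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. A minimal sketch, assuming a hypothetical helper `anova_f` and made-up balances for three groups (not the report's data):

```python
def anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k

# Hypothetical balances per ethnicity group, NOT the report's data.
groups = [
    [500.0, 520.0, 480.0],
    [510.0, 490.0, 500.0],
    [495.0, 505.0, 515.0],
]
f_stat, df_between, df_within = anova_f(groups)
print(f"F = {f_stat:.3f} on ({df_between}, {df_within}) degrees of freedom")
```

An F value well below 1, as in this made-up example, corresponds to a large p-value and a failure to reject the null hypothesis that all group means are equal.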
7. UNDERSTANDING CREDIT RATING INFLUENCE ON CREDIT LIMIT
A general principle that credit card companies often follow is to assign a higher credit limit to
people with a higher credit rating. Does the data show that this principle is being followed?
Data preparation
Correlation Analysis
8. UNDERSTANDING CREDIT LIMIT INFLUENCE ON BALANCE
Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and the
balance is the Y). Report the coefficients and the R-squared. Show a scatter plot. State
inference
Data preparation
P value = 2.531E-119
P value < 0.05 : Reject null hypothesis
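For reference, the coefficients and R-squared of a simple linear regression can be computed directly from sums of squares. The sketch below assumes a hypothetical helper `simple_ols` and made-up limit and balance values; it is not the report's fitted model.

```python
def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical credit limits (X) and balances (Y), NOT the report's data.
limit = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
balance = [0.0, 150.0, 400.0, 500.0, 700.0]
intercept, slope, r2 = simple_ols(limit, balance)
print(f"balance = {intercept:.1f} + {slope:.3f} * limit, R-squared = {r2:.3f}")
```

The same helper applies to question 9 by passing credit rating as X instead of credit limit.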
Equation of balance as function of credit limit:
9. UNDERSTANDING CREDIT RATING INFLUENCE ON BALANCE
Run a simple linear regression of balance on the credit rating (X). Report the coefficients and
R-squared. Show a scatter plot. State inference
Data preparation
P value = 1.9E-120
P value < 0.05: Reject null hypothesis
Equation of balance as function of credit rating:
10. UNDERSTANDING RATING AND LIMIT INFLUENCE ON BALANCE
Consider your findings in questions 8-9. Discuss business mechanisms to increase or decrease
the balance on credit cards. Try to quantify your answers in this context, focus on possible
specific strategies using variables in Q8 and Q9 that the business could adopt to increase the
balance on credit cards
Data preparation
Correlation analysis
Correlation findings:
• Limit and balance are highly correlated (coeff.: 0.862)
• Rating and balance are also highly correlated (coeff.: 0.864)
• Limit and rating are highly correlated (coeff.: 0.997)
• Every unit increase in rating, keeping the limit constant, will increase the balance on cards by
~$2.2
• Every dollar increase in credit card limit, keeping the rating constant, will increase the balance
on cards by a negligible $0.025
Also,
1. When tested for individual influence on balance: credit limit has coefficient of 0.172
2. When tested for individual influence on balance: credit rating has coefficient of 2.566
Here, the balance will increase with every unit increase in limit and rating by $0.172 and $2.566
respectively. But when tested together as influence on balance on cards, these coefficients differ, as
highlighted in Eqn.10. This is because limit and rating are almost perfectly correlated (coeff.: 0.997),
so each variable absorbs part of the other's effect.
Results: The business should focus on improving credit ratings to increase the balance on cards
11. UNDERSTANDING LIMIT AND #CARDS INFLUENCE ON BALANCE
The credit limit is provided as a consolidated amount for all the credit cards the cardholder
has. Run a multiple linear regression of Balance (Y) on Limit and Cards as two X variables.
Report the coefficients. Discuss the effect on the balance of (a) increasing the credit limit on
the same number of cards and (b) increasing the number of cards without altering the total
credit limit.
Balance = f(limit, cards)
Data preparation
Correlation analysis
Correlation findings:
• Limit and balance are highly correlated (coeff.: 0.861)
• Cards and balance are only weakly correlated (coeff.: 0.086)
• Limit and cards are negligibly correlated (coeff.: 0.0102)
• Every dollar increase in credit limit, keeping the #cards constant, will increase the balance on
cards by ~$0.17
• Every unit increase in number of credit cards, keeping the limit constant, will increase the
balance on cards by ~$26
• There is negligible correlation between limit and #cards; limit is highly correlated with balance
on the cards, while #cards is only weakly correlated with it
Also,
1. When tested for individual influence on balance: credit limit has coefficient of 0.172
2. When tested for individual influence on balance: #cards has a coefficient of ~28 (0.0864 is its
correlation with balance, not its regression coefficient)
Here, the balance will increase with every unit increase in limit and #cards by $0.172 and $28
respectively. But when tested together as influence on balance on cards, these coefficients differ, as
highlighted in Eqn.11.
Results:
• Increasing the number of cards people own, keeping the credit limit constant, will increase
balance on cards by ~$26 per additional card
• Increasing the credit limit, keeping the number of cards people own constant, will increase
balance on cards by ~$0.17 per additional dollar of limit
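A regression with two X variables can be solved in closed form, which makes the "keeping the other constant" interpretation concrete. The sketch below is illustrative only: the helper `two_predictor_ols` is hypothetical, and the data are synthetic, generated from a known equation (balance = -100 + 0.2 * limit + 25 * cards) so that the recovered coefficients can be checked. They are not the report's data or coefficients.

```python
def two_predictor_ols(x1, x2, y):
    """Return (b0, b1, b2) for the least-squares fit y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2             # near zero when x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Synthetic data built from balance = -100 + 0.2*limit + 25*cards (illustration only).
limit = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0]
cards = [2.0, 1.0, 3.0, 2.0, 4.0, 3.0]
balance = [-100 + 0.2 * l + 25 * c for l, c in zip(limit, cards)]

b0, b_limit, b_cards = two_predictor_ols(limit, cards, balance)
print(f"(a) +$1000 limit, same cards: balance changes by {b_limit * 1000:.2f}")
print(f"(b) +1 card, same limit:      balance changes by {b_cards:.2f}")
```

The `det` term also shows why highly correlated predictors, such as limit and rating in question 10, give unstable coefficients: the denominator approaches zero.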
12. UNDERSTANDING INCOME INFLUENCE ON BALANCE
Run a simple linear regression equation with Income as X and Balance as Y. Report the
coefficients. Is the coefficient of Income significantly different from zero? What does this say
about the effect of income on balance?
Ho: Income doesn’t influence balance on the cards
Ha: Income influences balance on the cards
Data preparation
Create data as:
Correlation analysis
• Income and balance have a correlation coefficient of 0.463
Results: Income influences balance on the cards. The higher the income, the higher the balance on cards
Based on the equation derived in question 12, what is the estimated balance for a person with
an income of USD 100k per year?
Y = 246.515 + 6.048 *100
Y = 851.315
The point estimate of the balance for a person with an income of $100k per year is ~$851.32 (income is
entered in thousands of dollars, so X = 100).
The lower and upper 95% confidence limits put the estimated balance between $673 and $1030.
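The arithmetic above can be reproduced with a few lines of code. The intercept and slope are the values quoted in this report; the function name `estimated_balance` is hypothetical, and treating income in thousands of dollars is an assumption inferred from the use of 100 for $100k.

```python
# Coefficients quoted in the report's fitted equation for question 12.
INTERCEPT = 246.515
SLOPE_PER_THOUSAND = 6.048   # assumed unit: dollars of balance per $1,000 of income

def estimated_balance(income_thousands):
    """Point estimate of balance for a given income (in thousands of dollars)."""
    return INTERCEPT + SLOPE_PER_THOUSAND * income_thousands

estimate = estimated_balance(100)
print(f"Estimated balance at $100k income: {estimate:.3f}")
```

This returns only the point estimate; the 95% interval quoted above additionally depends on the regression's standard errors, which are not reproduced here.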
14. UNDERSTANDING THE INFLUENCE OF SEVERAL FACTORS ON BALANCE
TOGETHER
Based on the dataset, explore the relationship between credit card balance (Y) and (a) Income
(b) Age (c) Education (d) Limit, and (e) Rating as X variables? Estimate a multiple linear
regression model and report the statistical significance of each of these variables.
Data preparation
Create data as:
Correlation analysis
Equation of balance as function of income, age, education, limit and rating
Y = -473.251 – 7.609 (income) – 0.860 (age) + 1.967 (education) + 0.079 (limit) + 2.774 (rating)
…Eqn12
• Every unit increase in rating, keeping all the other parameters constant, will increase
balance by ~$2.8
• Every dollar increase in limit, keeping all the other parameters constant, will increase
balance by ~$0.08
• Every unit increase in years of education, keeping all the other parameters constant, will
increase balance by ~$2
• Age and income have negative coefficients here: with all the other factors held constant, each
unit increase in income or age lowers the estimated balance by ~$7.6 and ~$0.86 respectively.
These signs apply only when all the mentioned factors are considered together
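To see how the equation behaves, it can be evaluated for a sample cardholder. The coefficients below are those quoted in the equation above; the helper `predict_balance` and the cardholder profile are hypothetical, and the units (income in thousands of dollars, education in years) are assumptions.

```python
# Coefficients quoted in the report's multiple regression equation.
INTERCEPT = -473.251
COEFFICIENTS = {
    "income": -7.609,
    "age": -0.860,
    "education": 1.967,
    "limit": 0.079,
    "rating": 2.774,
}

def predict_balance(profile):
    """Evaluate the fitted equation for a dict of predictor values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * profile[name] for name in COEFFICIENTS)

# Hypothetical cardholder, NOT a row from the report's dataset.
person = {"income": 50, "age": 40, "education": 12, "limit": 5000, "rating": 400}
base = predict_balance(person)

# Marginal effect of one extra rating point, everything else held constant.
bumped = predict_balance({**person, "rating": person["rating"] + 1})
print(f"Predicted balance: {base:.3f}")
print(f"Change from +1 rating point: {bumped - base:.3f}")
```

Changing one input at a time in this way reproduces each bullet above: the difference in predicted balance equals that variable's coefficient.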