Interview Questions - Linear Regression
3. What are the assumptions associated with the linear regression model?
Linear regression models rely on several key assumptions:
1. Linearity: The relationship between the independent and dependent
variable is linear.
2. Independence: The residuals (errors) are independent, meaning that the
error terms are not correlated.
3. Homoscedasticity: The residuals have constant variance at all levels of the
independent variables (i.e., no heteroscedasticity).
4. Normality of residuals: The residuals should follow a normal distribution.
5. No multicollinearity: For multiple regression, the independent variables
should not be highly correlated with each other.
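Several of these assumptions can be checked empirically. As a minimal sketch on synthetic data (the coefficients and noise level below are made up for illustration), one can fit an ordinary least-squares model, then test residual normality with a Shapiro-Wilk test and gauge multicollinearity from the predictor correlation:

```python
import numpy as np
from scipy import stats

# Hypothetical data: two predictors, known true coefficients, Gaussian noise
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Fit OLS via least squares (column of ones for the intercept)
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ coef

# Normality of residuals: Shapiro-Wilk (large p -> no evidence against normality)
_, p_norm = stats.shapiro(residuals)

# Multicollinearity: pairwise correlation between the predictors
corr = np.corrcoef(X, rowvar=False)[0, 1]

print(f"residual normality p-value: {p_norm:.3f}")
print(f"predictor correlation: {corr:.3f}")
```

Plotting residuals against fitted values (for linearity and homoscedasticity) and a Durbin-Watson statistic (for independence) would complete the diagnostic picture.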
7. When would you prefer multiple linear regression to simple linear regression?
You would prefer multiple linear regression when:
- There are multiple independent variables that could influence the
dependent variable.
- The relationship between the dependent variable and each independent
variable is not fully captured by just one variable.
- You want to model complex, real-world relationships that depend on more
than one factor.
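A small sketch of the multiple-regression case, using scikit-learn on hypothetical housing-style data (the feature names and true coefficients are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: price depends on both size and age
rng = np.random.default_rng(42)
n = 100
size = rng.uniform(50, 200, n)
age = rng.uniform(0, 40, n)
price = 2.0 * size - 1.0 * age + 30 + rng.normal(scale=5, size=n)

# Two independent variables -> multiple linear regression
X = np.column_stack([size, age])
model = LinearRegression().fit(X, price)
print(model.coef_, model.intercept_)
```

With enough data and low noise, the fitted coefficients recover the generating values (about 2.0 and -1.0 here), something a single-variable model could not express.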
14. What is the difference between simple linear regression and multiple linear
regression?
Simple linear regression involves modeling the relationship between a single
independent variable and a dependent variable. Multiple linear regression, on
the other hand, involves modeling the relationship between two or more
independent variables and a dependent variable.
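The difference can be seen directly by fitting both on the same synthetic data (coefficients below are made up for illustration): when the outcome truly depends on two variables, the single-variable model leaves part of the variance unexplained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y depends on two predictors
rng = np.random.default_rng(3)
n = 150
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 4.0 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=n)

# Simple regression: one predictor; multiple regression: both predictors
simple = LinearRegression().fit(x1.reshape(-1, 1), y)
X_both = np.column_stack([x1, x2])
multiple = LinearRegression().fit(X_both, y)

print("simple R^2:", simple.score(x1.reshape(-1, 1), y))
print("multiple R^2:", multiple.score(X_both, y))
```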
19. What are the assumptions of logistic regression? How do they differ from
linear regression?
Logistic regression assumes that the log-odds of the outcome are a linear
function of the independent variables, which produces an S-shaped
relationship between the predictors and the predicted probability, and that
the dependent variable is binary or categorical. Unlike linear regression, it
does not assume a linear relationship between the predictors and the outcome
itself, nor homoscedasticity or normally distributed residuals.
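As a minimal sketch, the snippet below generates a binary outcome from a logistic (S-shaped) relationship and fits scikit-learn's LogisticRegression (the coefficients and sample size are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: binary outcome drawn from a logistic model of two predictors
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
prob = 1 / (1 + np.exp(-logits))  # S-shaped link from predictors to probability
y = (rng.random(n) < prob).astype(int)

clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))
```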
20. How do you handle outliers in linear regression?
Outliers in linear regression can be handled by detecting them using methods
such as box plots, scatter plots, or residual analysis and then removing them,
transforming variables, or using robust regression techniques that are less
sensitive to outliers.
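Two of these ideas can be sketched together on synthetic data (the injected outliers and true slope below are made up for illustration): flagging outlying residuals with an IQR rule, and comparing ordinary least squares against scikit-learn's HuberRegressor, a robust alternative that downweights large residuals:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Hypothetical data: y = 3x + 2 plus noise, with a few large outliers injected
rng = np.random.default_rng(7)
n = 100
x = rng.uniform(0, 10, n)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=n)
y[:5] += 50  # five gross outliers

X = x.reshape(-1, 1)
ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)  # robust: less sensitive to the outliers

# IQR-based residual analysis to flag outliers
resid = y - ols.predict(X)
q1, q3 = np.percentile(resid, [25, 75])
iqr = q3 - q1
outlier_mask = (resid < q1 - 1.5 * iqr) | (resid > q3 + 1.5 * iqr)

print("OLS slope:", ols.coef_[0], "Huber slope:", huber.coef_[0])
print("flagged outliers:", outlier_mask.sum())
```

The robust fit stays close to the true slope of 3.0 despite the contamination, while the flagged points could instead be removed or investigated before refitting.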
22. What are the advantages of using Python for linear regression compared to
other languages?
Python offers several advantages for implementing linear regression,
including its simplicity, readability, extensive libraries for data analysis and
machine learning (e.g., NumPy, pandas, scikit-learn), and a vibrant
community that provides support and resources.
24. What are some common metrics used to evaluate the performance of a linear
regression model in Python?
Common metrics for evaluating the performance of a linear regression model
in Python include mean squared error (MSE), R-squared (coefficient of
determination), adjusted R-squared, mean absolute error (MAE), and root
mean squared error (RMSE).
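These metrics are all available in sklearn.metrics (the tiny arrays below are invented for illustration; adjusted R-squared has no dedicated scikit-learn function, so it is computed from its formula):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# Adjusted R^2: penalizes R^2 for the number of predictors p
n, p = len(y_true), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"MSE={mse}, RMSE={rmse:.4f}, MAE={mae}, R2={r2}, adj R2={adj_r2}")
```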
30. Can you perform cross-validation for linear regression models in Python? If so,
how?
Yes. scikit-learn provides functions such as cross_val_score and KFold for
k-fold cross-validation (and train_test_split for the simpler holdout case).
Cross-validation repeatedly fits the model on one part of the data and scores
it on the held-out part, which helps assess the model's generalization
performance and guard against overfitting.
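A minimal sketch with cross_val_score on synthetic data (the coefficients and noise level are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: linear signal in three predictors with small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=100)

# 5-fold cross-validation, scoring each held-out fold by R^2
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("per-fold R^2:", scores)
print("mean R^2:", scores.mean())
```

Each of the five scores comes from a model that never saw its test fold, so the mean is a less optimistic estimate of performance than the training R-squared.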