Cognitive Class - Answers Data Analysis With Python
Clear My Certification September 18, 2020 Cognitive Class
Module 1 – Introduction
Question 1: What does CSV stand for ?
Comma Separated Values
Car Sold values
Column
Each element in the data set
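As a quick illustration of what "comma separated values" means in practice, here is a minimal sketch of loading CSV text with pandas. The CSV rows and column names are hypothetical; this sample has no header row, so `header=None` is passed and names are assigned afterwards.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
# Each comma separates one value (one column) within a row.
csv_text = "3,alfa-romero,gas,13495\n3,alfa-romero,gas,16500\n1,audi,diesel,13950\n"

# The sample has no header row, so tell pandas not to treat row 0 as names.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Assign readable column names after loading (names are illustrative).
df.columns = ["symboling", "make", "fuel-type", "price"]
print(df)
```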
Question 3: What is another name for the variable that we want to predict?
Target
Feature
Dataframe
Question 4: What is the command to display the first five rows of a dataframe df?
df.head()
df.tail()
Question 5: What command do you use to get the data type of each column of the dataframe df?
df.dtypes
df.head()
df.tail()
df.tails()
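A small sketch of `head()`, `tail()` and `dtypes` on a made-up dataframe; the data are hypothetical. `head()` and `tail()` return five rows by default, while `dtypes` reports one type per column.

```python
import pandas as pd

# Hypothetical 8-row dataframe with one text column and two numeric columns.
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi", "saab", "bmw", "audi", "saab", "bmw"],
    "horsepower": [102, 121, 115, 110, 101, 140, 160, 182],
    "price": [13950.0, 16430.0, 17450.0, 15250.0, 16925.0, 20970.0, 18150.0, 41315.0],
})

first_five = df.head()  # first 5 rows by default
last_five = df.tail()   # last 5 rows by default
print(df.dtypes)        # one dtype per column
```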
Question 7: If you use the method describe() without changing any of its arguments, will you get a statistical summary of all the columns of type object?
False
True
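A minimal sketch showing why the statement is false: with default arguments, pandas' `describe()` summarizes only the numeric columns when any are present, so a text column such as the hypothetical `fuel-type` below is left out.

```python
import pandas as pd

# Hypothetical dataframe mixing a text column and a numeric column.
df = pd.DataFrame({
    "fuel-type": ["gas", "diesel", "gas", "gas"],
    "price": [13495.0, 13950.0, 16500.0, 17450.0],
})

summary = df.describe()  # default: numeric columns only
print(summary)
```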
Question 2: Consider the dataframe “df”. What does the command df.rename(columns={'a': 'b'}) change about the dataframe “df”?
rename column “a” of the dataframe to “b”
rename the row “a” to “b”
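A short sketch of `rename` on a hypothetical two-column dataframe. One detail worth knowing: `rename` returns a new dataframe, so `df` itself keeps its old column names unless the result is reassigned (or `inplace=True` is passed).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "c": [3, 4]})

# Rename column "a" to "b"; the original df is left unchanged.
renamed = df.rename(columns={"a": "b"})
print(list(renamed.columns))  # ['b', 'c']
print(list(df.columns))       # ['a', 'c']
```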
Question 4: Consider the column df['a'] of the dataframe. The column has been standardized. What is the standard deviation of its values, i.e., the result of applying the operation df['a'].std()?
1
0
3
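A minimal sketch of standardization with pandas, on hypothetical values: subtracting the mean and dividing by the standard deviation leaves a column with mean 0 and standard deviation 1. Note that pandas' `.std()` uses the sample standard deviation (`ddof=1`); if a column were scaled with the population standard deviation instead (as scikit-learn's `StandardScaler` does), `df['a'].std()` would come out slightly above 1.

```python
import pandas as pd

df = pd.DataFrame({"a": [10.0, 20.0, 30.0, 40.0, 50.0]})

# Standardize: subtract the mean, divide by the (sample) standard deviation.
df["a"] = (df["a"] - df["a"].mean()) / df["a"].std()

print(df["a"].mean())  # approximately 0
print(df["a"].std())   # approximately 1
```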
Question 5: Consider the column df['Fuel'] of the dataframe, with two values, 'gas' and 'diesel'. What will be the names of the new columns produced by pd.get_dummies(df['Fuel'])?
1 and 0
Just diesel
Just gas
df.tail()
df.summary()
-1
Question 5: What is the Pearson correlation between variables X and Y if X = Y?
-1
1
0
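A small NumPy sketch of the Pearson correlation, with made-up values. Identical variables give r = 1. The second pair shows that r = 0 only rules out a *linear* relationship: y = x² on symmetric x is fully determined by x, yet its Pearson correlation with x is 0.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Identical variables: perfect positive linear relationship.
r_same = np.corrcoef(x, x)[0, 1]

# y depends on x exactly, but not linearly: Pearson r is 0 here.
y = x ** 2
r_quadratic = np.corrcoef(x, y)[0, 1]

print(r_same, r_quadratic)
```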
X
Y
500
100
0
Question 2: What value of R^2 (coefficient of determination) indicates that your model performs best?
-100
-1
0
1
Question 3: Which statement is true about polynomial linear regression?
Polynomial linear regression is not linear in any way
Although the predictor variables of polynomial linear regression are not linear, the relationship between the parameters or coefficients is linear.
Polynomial linear regression uses wavelets
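To make "linear in the coefficients" concrete, here is a minimal sketch with hypothetical noise-free data from y = 1 + 2x + 3x². The design matrix holds nonlinear functions of x (1, x, x²), but the model is a linear combination of the unknown coefficients, so ordinary least squares solves it directly.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x + 3x^2 (no noise).
x = np.linspace(-3, 3, 20)
y = 1 + 2 * x + 3 * x ** 2

# Design matrix with columns 1, x, x^2: nonlinear in x ...
X = np.column_stack([np.ones_like(x), x, x ** 2])

# ... but y = b0 + b1*x + b2*x^2 is linear in (b0, b1, b2),
# so a plain linear least-squares solve recovers them.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [1, 2, 3]
```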
Question 4: The larger the mean square error, the better your model has performed
False
True
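A short sketch of the two metrics from these questions, using scikit-learn on made-up numbers. Mean squared error averages the squared residuals, so lower is better; R^2 reaches its maximum of 1 for perfect predictions and can drop below 0 for a model worse than predicting the mean.

```python
from sklearn.metrics import mean_squared_error, r2_score

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# MSE = (0.25 + 0.25 + 0 + 1) / 4; lower is better, 0 is a perfect fit.
mse = mean_squared_error(y_true, y_pred)

# Perfect predictions give the maximum possible R^2 of 1.
r2_perfect = r2_score(y_true, y_true)
print(mse, r2_perfect)  # 0.375 1.0
```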
Question 5: Assume all the libraries are imported, y is the target, and X is the features or independent variables. Consider the following lines of code:
Input = [('scale', StandardScaler()), ('model', LinearRegression())]
pipe = Pipeline(Input)
pipe.fit(X,y)
ypipe = pipe.predict(X)
What have we just done in the above code?
Polynomial transform, Standardize the data, then perform a prediction using a linear regression model
Standardize the data, then perform prediction using a linear regression model
Polynomial transform then Standardize the data
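A self-contained version of the pipeline in the question, with the imports filled in and hypothetical data for X and y. The two steps are a `StandardScaler` followed by a `LinearRegression`; there is no polynomial step, so `predict` standardizes X with the fitted scaler and then applies the fitted linear model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features X (2 columns on very different scales) and target y.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 50), rng.uniform(1000, 5000, 50)])
y = 3 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 50)

# Step 'scale' standardizes X; step 'model' fits a linear regression
# on the standardized features.
Input = [("scale", StandardScaler()), ("model", LinearRegression())]
pipe = Pipeline(Input)
pipe.fit(X, y)
ypipe = pipe.predict(X)

# Fitted steps are reachable by name, e.g. the per-column means of X.
print(pipe.named_steps["scale"].mean_)
```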
Module 5 – Model Evaluation
Question 1: In the following plot, the vertical axis shows the mean square error and the horizontal axis represents the order of the polynomial. The red line represents the training error and the blue line is the test error. What is the best order of the polynomial given the possible choices on the horizontal axis?
2
8
16
Question 2: What is the correct use of the “train_test_split” function such that 40% of the data samples will be used for testing, the parameter “random_state” is set to zero, and the input variables for the features and targets are x_data and y_data, respectively?
train_test_split(x_data, y_data, test_size=0, random_state=0.4)
The average R^2 on the test data for each of the two folds
This function finds the free parameter alpha
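A minimal sketch with hypothetical x_data and y_data. The call matching the wording of Question 2 is `train_test_split(x_data, y_data, test_size=0.4, random_state=0)`; the listed option swaps the two values. The sketch also shows `cross_val_score` with `cv=2`, which returns one score per fold (R^2 by default for a regressor); averaging them is a separate step.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical data: 100 samples, 3 features.
rng = np.random.default_rng(1)
x_data = rng.normal(size=(100, 3))
y_data = x_data @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.3, 100)

# 40% of the samples go to the test set; random_state=0 makes the
# shuffle reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.4, random_state=0
)

# Two folds -> two R^2 scores; their mean is the average test R^2.
scores = cross_val_score(LinearRegression(), x_data, y_data, cv=2)
print(len(x_train), len(x_test), scores, scores.mean())
```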
Question 4: What is the code to create a ridge regression object “RR” with an alpha term equal to 10?
RR=LinearRegression(alpha=10)
RR=Ridge(alpha=10)
RR=Ridge(alpha=1)
Question 5: What dictionary value would we use to perform a grid search for the following values of alpha: 1, 10, and 100? No other parameter values should be tested.
alpha=[1,10,100]
[{'alpha': [1,10,100]}]
[{'alpha': [0.001,0.1,1, 10, 100, 1000,10000,100000,100000],'normalize':[True,False]} ]
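A minimal sketch of both ridge questions, on hypothetical data: creating `Ridge(alpha=10)`, and a grid search whose parameter grid is a list containing one dictionary that maps `alpha` to the three candidate values. As an aside, the `normalize` option in the third choice no longer exists on `Ridge` in recent scikit-learn releases (it was removed in version 1.2), so that grid would fail there.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Hypothetical data: 60 samples, 4 features.
rng = np.random.default_rng(2)
x_data = rng.normal(size=(60, 4))
y_data = x_data @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(0, 0.5, 60)

# A ridge regression object with alpha equal to 10.
RR = Ridge(alpha=10)

# Only alpha is searched, over exactly 1, 10 and 100.
parameters = [{"alpha": [1, 10, 100]}]
grid = GridSearchCV(Ridge(), parameters, cv=4)
grid.fit(x_data, y_data)
print(grid.best_params_)
```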
Question 2: How would you provide many of the summary statistics for all the columns in the dataframe “df”?
df.describe(include = “all”)
df.head()
type(df)
df.shape
df.head()
type(df)
df.shape
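A short sketch contrasting with the default behaviour: `describe(include="all")` summarizes every column, numeric and text alike, filling statistics that do not apply with NaN, while `df.shape` gives the (rows, columns) pair. The dataframe is hypothetical.

```python
import pandas as pd

# Hypothetical dataframe with two text columns and one numeric column.
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "fuel-type": ["gas", "gas", "diesel"],
    "price": [13950.0, 16430.0, 17450.0],
})

# include="all" covers every column; e.g. "unique" is NaN for price.
summary = df.describe(include="all")
print(summary)

print(df.shape)  # (rows, columns)
```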
Question 4: What task does the command df.to_csv("A.csv") perform?
change the name of the column to “A.csv”
load the data from a csv file called “A” into a dataframe
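A minimal sketch showing that `to_csv` saves a dataframe to a CSV file rather than loading or renaming anything. The dataframe is hypothetical, the file goes to a temporary directory, and `index=False` is an addition here (the question's call would also write the row index as a first column).

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"make": ["audi", "bmw"], "price": [13950, 16430]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "A.csv")
    # Write (save) the dataframe to A.csv, without the index column.
    df.to_csv(path, index=False)
    file_existed = os.path.exists(path)
    # Load it back to check the round trip.
    reloaded = pd.read_csv(path)

print(reloaded)
```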
Question 7: How do you “one-hot encode” the column 'fuel-type' in the dataframe df?
pd.get_dummies(df["fuel-type"])
df.mean(["fuel-type"])
df[df["fuel-type"])==1 ]=1
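A short sketch of one-hot encoding with `pd.get_dummies` on a hypothetical fuel column: it creates one indicator column per distinct value, named after the value itself (here `diesel` and `gas`). Joining the indicators back and dropping the original column is one common follow-up, shown as an option rather than a required step.

```python
import pandas as pd

df = pd.DataFrame({"fuel-type": ["gas", "diesel", "gas"]})

# One indicator column per distinct value, named after the value.
dummies = pd.get_dummies(df["fuel-type"])
print(list(dummies.columns))  # ['diesel', 'gas']

# Optional: attach the indicators and drop the original text column.
df = pd.concat([df, dummies], axis=1).drop("fuel-type", axis=1)
```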
dependent variable
Question 9: What does the horizontal axis in a scatter plot represent?
independent variable
dependent variable
Question 10: If we have 10 columns and 100 samples, how large is the output of df.corr()?
10 × 100
10 × 10
100 × 100
Question 11: What is the largest possible element resulting from the operation df.corr()?
100
1000
1
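A minimal sketch for Questions 10 and 11, using random hypothetical data: `df.corr()` correlates columns pairwise, so 100 samples × 10 columns give a 10 × 10 matrix, and every Pearson correlation lies between -1 and 1, with 1 on the diagonal (each column with itself).

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 samples (rows) x 10 columns.
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(100, 10)),
                  columns=[f"col{i}" for i in range(10)])

corr = df.corr()   # pairwise Pearson correlation between columns
print(corr.shape)  # one row and one column per original column

# The largest entry is the diagonal value of 1.
print(corr.values.max())
```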
Question 12: If the Pearson correlation of two variables is zero:
the two variables have zero mean
Standardize the data, then perform a prediction using a linear regression model using the features Z and targets y
Question 17: What is the maximum value of R^2 that can be obtained?
10
1
0
Question 18: We create a polynomial feature as follows: PolynomialFeatures(degree=2). What is the order of the polynomial?
0
1
2
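A small sketch of what `PolynomialFeatures(degree=2)` produces for a single hypothetical feature: the columns 1, x and x², i.e. a second-order polynomial.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[2.0], [3.0]])  # one feature, two samples

# degree=2 -> bias column, x, and x^2.
pr = PolynomialFeatures(degree=2)
x_poly = pr.fit_transform(x)
print(x_poly)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```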
Question 19: You have a linear model whose average R^2 value on your training data is 0.5. You perform a 100th-order polynomial transform on your data, then use these values to train another model. Your average R^2 is now 0.99. Which comment is correct?
the 100th-order polynomial will work better on unseen data
the results on your training data are not the best indicator of how your model performs; you should use your test data to get a better idea
Question 20: You train a ridge regression model and get an R^2 of 1 on your training data and an R^2 of 0 on your validation data. What should you do?
Nothing; your model performs flawlessly on your test data
your model is underfitting; perform a polynomial transform
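A minimal sketch of the idea behind Questions 19 and 20, on hypothetical noisy data with a linear trend: raising the polynomial degree can only increase training R^2 (the models are nested), so a high training score says little about unseen data. Comparing against a held-out split is what reveals overfitting; a large gap between training and validation R^2 is the usual warning sign. The degree 12 is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy data with a simple linear trend.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=(40, 1))
y = 2 * x[:, 0] + rng.normal(0, 0.3, 40)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

results = {}
for degree in (1, 12):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(x_train, y_train)
    results[degree] = (
        r2_score(y_train, model.predict(x_train)),  # training R^2
        r2_score(y_test, model.predict(x_test)),    # held-out R^2
    )

# Judge the models by the held-out column, not the training column.
for degree, (train_r2, test_r2) in results.items():
    print(degree, round(train_r2, 3), round(test_r2, 3))
```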