Excel Analytics Google
#DIV/0!
Reason: Your formula or function is asking the spreadsheet to divide by a value that is either 0 or blank.
Solution 1: Divide by another number, or add a value to the blank cell.
Solution 2: Use the QUOTIENT function inside IF, e.g. =IF(B3, QUOTIENT(B2, B3), 0), to divide B2 by B3 and return 0 when the divisor (B3) is 0.
#N/A
Reason: The value your formula or function is looking up can't be found by the spreadsheet application.
Solution 1: Add the data being looked up by the formula that is causing the error.
#NAME?
Reason: Your spreadsheet can't interpret your formula or function, usually because of a misspelled name.
Solution 1: Check the spelling and make sure you use the full name of any formula.
#NULL!
Reason: You have specified two or more ranges that are supposed to overlap, but they don't actually intersect.
Solution 1: Use a colon (:) to separate the first cell from the last cell when you refer to a continuous range of cells in a formula.
Solution 2: Use a comma (,) between ranges to tell your spreadsheet that they are actually separate, e.g. =COUNT(G1:G10, B1:B10).
#NUM!
Reason: The spreadsheet can't perform the calculation as written (for example, an argument is out of range or the result is too large to display).
Solution: Return to your formula and double-check it.
#REF!
Reason: Your formula or function is referencing a cell that is not valid, often because the cell was deleted or moved.
Solution: Make sure you are giving your formula or function only valid cells by double-checking the references.
#VALUE!
Reason: There are spaces, characters, or text in your formula or function where there should be numbers.
Solution: Make sure those spaces, characters, or text are either removed or replaced with numbers.
=SORT(F19:G82, 2, TRUE) sorts the range F19:G82 by its second column in ascending order.
=VALUE(A2) converts the text in A2 to a number. To convert a column of text values, enter the formula in the first cell and drag the fill handle to copy the formula down the column.
When combining values with a function such as CONCATENATE, use commas to separate the cells you are combining, and use quotation marks to add spaces, commas, or other text, e.g. =CONCATENATE(A2, " ", B2).
VLOOKUP
The VLOOKUP function takes four parameters:
=VLOOKUP(search_key, range, index, is_sorted)
SUMIF
The basic syntax of a SUMIF function is =SUMIF(range, criterion, sum_range). The first range is where the function will search for the condition that you have set, the criterion is the condition you are applying, and the sum_range is the range of cells that will be included in the calculation. For example, =SUMIF(A1:A9, "Fuel", B1:B9) adds the values in B1:B9 whose row in A1:A9 contains Fuel.
SUMIFS
SUMIF and SUMIFS are very similar, but SUMIFS can include
multiple conditions.
The syntax is =SUMIFS(sum_range, criteria_range1, criterion1, [criteria_range2, criterion2, ...]). The square brackets let you know that those parameters are optional, and the ellipsis at the end of the statement lets you know that you can have as many repetitions of these parameters as needed. For example, if you wanted to calculate the sum of the fuel costs for one date in this table, you could create a SUMIFS statement with multiple conditions, like this:
=SUMIFS(B1:B9,A1:A9,"fuel",C1:C9,"12/15/2020")
COUNTIF
=COUNTIF(range, criterion)
Just like SUMIF, you set the range and then the condition that
needs to be met. For example, if you wanted to count the
number of times Food came up in the Expenses column, you
could use a COUNTIF function like this:
=COUNTIF(A1:A9,"Food")
COUNTIFS
=COUNTIFS(criteria_range1, criterion1, [criteria_range2,
criterion2, ...])
The criteria_range and criterion are in the same order, and you
can add more conditions to the end of the function. So, if you
wanted to find the number of times Coffee appeared in the
Expenses column on 12/15/2020, you could use COUNTIFS to
apply those conditions, like this:
=COUNTIFS(A1:A9,"Coffee",C1:C9,"12/15/2020")
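For comparison, the same conditional sums and counts can be sketched in pandas (a hypothetical expenses table with assumed column names, mirroring the examples above):

import pandas as pd

# hypothetical expenses table (columns assumed for illustration)
df = pd.DataFrame({
    'Expense': ['Fuel', 'Food', 'Coffee', 'Fuel', 'Coffee'],
    'Amount': [40.00, 12.50, 4.75, 38.20, 5.25],
    'Date': ['12/15/2020', '12/15/2020', '12/15/2020', '12/16/2020', '12/15/2020'],
})

# SUMIFS equivalent: total fuel cost on 12/15/2020
fuel_total = df.loc[(df['Expense'] == 'Fuel') & (df['Date'] == '12/15/2020'), 'Amount'].sum()

# COUNTIFS equivalent: number of Coffee rows on 12/15/2020
coffee_count = ((df['Expense'] == 'Coffee') & (df['Date'] == '12/15/2020')).sum()

print(fuel_total, coffee_count)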
(Only fragments of the lab's two plotting helpers survive here: the tail of DistributionPlot, i.e. plt.show() and plt.close(), and the start of PollyPlot, which takes training data, testing data, a linear regression object lr, and a polynomial transformation object poly_transform, and computes the plotting range from xmin to xmax.)
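A reconstructed sketch of the two helpers, assuming the usual seaborn/matplotlib approach (exact styling and captions assumed):

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def DistributionPlot(RedFunction, BlueFunction, RedName, BlueName, Title):
    # kernel density plot comparing two distributions, e.g. actual vs predicted prices
    plt.figure(figsize=(12, 10))
    ax = sns.kdeplot(RedFunction, color="r", label=RedName)
    sns.kdeplot(BlueFunction, color="b", label=BlueName, ax=ax)
    plt.title(Title)
    plt.xlabel('Price (in dollars)')
    plt.ylabel('Proportion of Cars')
    plt.legend()
    plt.show()
    plt.close()

def PollyPlot(xtrain, xtest, y_train, y_test, lr, poly_transform):
    # xtrain: training data; xtest: testing data
    # lr: linear regression object; poly_transform: polynomial transformation object
    xmax = max([xtrain.values.max(), xtest.values.max()])
    xmin = min([xtrain.values.min(), xtest.values.min()])
    x = np.arange(xmin, xmax, 0.1)
    plt.plot(xtrain, y_train, 'ro', label='Training Data')
    plt.plot(xtest, y_test, 'go', label='Testing Data')
    plt.plot(x, lr.predict(poly_transform.fit_transform(x.reshape(-1, 1))), label='Predicted Function')
    plt.legend()
    plt.show()
    plt.close()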
Code V1
import pandas as pd
import numpy as np
# df is assumed to have been loaded earlier with pd.read_csv()
df.to_csv('module_5_auto.csv')   # save a local copy
df = df._get_numeric_data()      # keep only the numeric columns
df.head()
Code V2
import pandas as pd
import numpy as np
# df is assumed to have been loaded earlier with pd.read_csv()
df.to_csv('module_5_auto.csv')   # save a local copy
df = df._get_numeric_data()      # keep only the numeric columns
df.head()
y_data = df['price']
#### Drop price data in dataframe **x_data**
x_data=df.drop('price',axis=1)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# assumed split (exact parameters not recorded in these notes):
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.15, random_state=1)
lre = LinearRegression()
lre.fit(x_train[['horsepower']], y_train)   # fit before scoring (missing in the original notes)
lre.score(x_test[['horsepower']], y_test)
#### 0.3635875575078824
#### We can see the R^2 is much smaller using the test data compared to the training data
lre.score(x_train[['horsepower']], y_train)
#### 0.6619724197515103
Code V3
import pandas as pd
import numpy as np
# df is assumed to have been loaded earlier with pd.read_csv()
df.to_csv('module_5_auto.csv')   # save a local copy
df = df._get_numeric_data()      # keep only the numeric columns
df.head()
y_data = df['price']
x_data=df.drop('price',axis=1)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# assumed split (exact parameters not recorded in these notes):
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.15, random_state=1)
lre = LinearRegression()
lre.fit(x_train[['horsepower']], y_train)
lre.score(x_test[['horsepower']], y_test)
#### 0.3635875575078824
#### We can see the R^2 is much smaller using the test data compared to the training data
lre.score(x_train[['horsepower']], y_train)
#### 0.6619724197515103
from sklearn.model_selection import cross_val_score, cross_val_predict
# assumed assignment; the notes only kept the name Rcross
Rcross = cross_val_score(lre, x_data[['horsepower']], y_data, cv=4)
Rcross
# the same folds scored as mean squared error (sign flipped back to positive)
-1 * cross_val_score(lre, x_data[['horsepower']], y_data, cv=4, scoring='neg_mean_squared_error')
yhat = cross_val_predict(lre, x_data[['horsepower']], y_data, cv=4)
yhat[0:5]
Overfitting
Overfitting occurs when the model fits the noise rather than the underlying process. When you then test the model on the test set, it does not perform as well, because it is modelling noise instead of the underlying process that generated the relationship.
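A minimal sketch (not from the lab) illustrating overfitting on synthetic data: a high-degree polynomial scores well on the data it was fit to but poorly on held-out data.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 60).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 1.0, 60)   # quadratic signal plus noise

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (2, 12):
    pf = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(pf.fit_transform(x_tr), y_tr)
    print(degree,
          round(model.score(pf.transform(x_tr), y_tr), 3),   # training R^2 rises with degree
          round(model.score(pf.transform(x_te), y_te), 3))   # test R^2 drops once the model fits noise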
Code V4
# (fragments of the DistributionPlot and PollyPlot helpers; see the reconstruction above)
import pandas as pd
import numpy as np
# df is assumed to have been loaded earlier with pd.read_csv()
df.to_csv('module_5_auto.csv')   # save a local copy
df = df._get_numeric_data()      # keep only the numeric columns
df.head()
y_data = df['price']
x_data=df.drop('price',axis=1)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# assumed split (exact parameters not recorded in these notes):
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.15, random_state=1)
lre = LinearRegression()
lre.fit(x_train[['horsepower']], y_train)
lre.score(x_test[['horsepower']], y_test)
#### 0.3635875575078824
#### We can see the R^2 is much smaller using the test data compared to the training data
lre.score(x_train[['horsepower']], y_train)
#### 0.6619724197515103
from sklearn.model_selection import cross_val_score, cross_val_predict
# assumed assignment; the notes only kept the name Rcross
Rcross = cross_val_score(lre, x_data[['horsepower']], y_data, cv=4)
Rcross
-1 * cross_val_score(lre, x_data[['horsepower']], y_data, cv=4, scoring='neg_mean_squared_error')
yhat = cross_val_predict(lre, x_data[['horsepower']], y_data, cv=4)
yhat[0:5]
lr = LinearRegression()
lr.fit(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y_train)
# predictions on the training and test sets (assignments missing in the original notes)
yhat_train = lr.predict(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])
yhat_train[0:5]
yhat_test = lr.predict(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])
Title = 'Predicted vs actual values (test data)'   # assumed caption
DistributionPlot(y_test, yhat_test, "Actual Values (Test)", "Predicted Values (Test)", Title)
#### Comparing Figure 1 and Figure 2, it is evident that the distribution in Figure 1 fits the data much better. The difference in Figure 2 is apparent in the range of 5,000 to 15,000, where the shape of the distribution is extremely different. Let's see if polynomial regression also exhibits a drop in prediction accuracy when analysing the test dataset.
#### Let's use 55 percent of the data for training and the rest for testing
from sklearn.preprocessing import PolynomialFeatures
# assumed split: holding out 45% for testing leaves 55% for training
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.45, random_state=0)
pr = PolynomialFeatures(degree=5)
x_train_pr = pr.fit_transform(x_train[['horsepower']])
x_test_pr = pr.fit_transform(x_test[['horsepower']])
pr
poly = LinearRegression()
poly.fit(x_train_pr, y_train)
#### We can see the output of our model using the method "predict." We assign the values to "yhat"
yhat = poly.predict(x_test_pr)
yhat[0:5]
PollyPlot(x_train[['horsepower']], x_test[['horsepower']], y_train, y_test, poly, pr)
poly.score(x_train_pr, y_train)
poly.score(x_test_pr, y_test)
#### We see the R^2 for the training data is 0.5567 while the R^2 on the test data was -29.87. The lower the R^2, the worse the model; a negative R^2 is a sign of overfitting.
#### Let's see how the R^2 changes on the test data for different order polynomials, and then plot the results
Rsqu_test = []
order = [1, 2, 3, 4]
for n in order:
    pr = PolynomialFeatures(degree=n)
    x_train_pr = pr.fit_transform(x_train[['horsepower']])
    x_test_pr = pr.fit_transform(x_test[['horsepower']])
    lr.fit(x_train_pr, y_train)
    Rsqu_test.append(lr.score(x_test_pr, y_test))
plt.plot(order, Rsqu_test)
plt.xlabel('order')
plt.ylabel('R^2')
plt.title('R^2 Using Test Data')
plt.text(3, 0.75, 'Maximum R^2 ')
Code V5
# (fragments of the DistributionPlot and PollyPlot helpers; see the reconstruction above)
import pandas as pd
import numpy as np
# df is assumed to have been loaded earlier with pd.read_csv()
df.to_csv('module_5_auto.csv')   # save a local copy
df = df._get_numeric_data()      # keep only the numeric columns
df.head()
y_data = df['price']
x_data=df.drop('price',axis=1)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# assumed split (exact parameters not recorded in these notes):
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.15, random_state=1)
lre = LinearRegression()
lre.fit(x_train[['horsepower']], y_train)
lre.score(x_test[['horsepower']], y_test)   # restored from the other versions of this code
#### 0.3635875575078824
#### We can see the R^2 is much smaller using the test data compared to the training data
lre.score(x_train[['horsepower']], y_train)
#### 0.6619724197515103
from sklearn.model_selection import cross_val_score, cross_val_predict
# assumed assignment; the notes only kept the name Rcross
Rcross = cross_val_score(lre, x_data[['horsepower']], y_data, cv=4)
Rcross
-1 * cross_val_score(lre, x_data[['horsepower']], y_data, cv=4, scoring='neg_mean_squared_error')
yhat = cross_val_predict(lre, x_data[['horsepower']], y_data, cv=4)
yhat[0:5]
lr = LinearRegression()
lr.fit(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y_train)
# predictions on the training and test sets (assignments missing in the original notes)
yhat_train = lr.predict(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])
yhat_train[0:5]
yhat_test = lr.predict(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])
Title = 'Predicted vs actual values (test data)'   # assumed caption
DistributionPlot(y_test, yhat_test, "Actual Values (Test)", "Predicted Values (Test)", Title)
#### Let's use 55 percent of the data for training and the rest for testing
from sklearn.preprocessing import PolynomialFeatures
# assumed split: holding out 45% for testing leaves 55% for training
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.45, random_state=0)
pr = PolynomialFeatures(degree=5)
x_train_pr = pr.fit_transform(x_train[['horsepower']])
x_test_pr = pr.fit_transform(x_test[['horsepower']])
pr
poly = LinearRegression()
poly.fit(x_train_pr, y_train)
#### We can see the output of our model using the method "predict." We assign the values to "yhat"
yhat = poly.predict(x_test_pr)
yhat[0:5]
PollyPlot(x_train[['horsepower']], x_test[['horsepower']], y_train, y_test, poly, pr)
poly.score(x_train_pr, y_train)
poly.score(x_test_pr, y_test)
#### We see the R^2 for the training data is 0.5567 while the R^2 on the test data was -29.87. The lower the R^2, the worse the model; a negative R^2 is a sign of overfitting.
#### Let's see how the R^2 changes on the test data for different order polynomials, and then plot the results
Rsqu_test = []
order = [1, 2, 3, 4]
for n in order:
    pr = PolynomialFeatures(degree=n)
    x_train_pr = pr.fit_transform(x_train[['horsepower']])
    x_test_pr = pr.fit_transform(x_test[['horsepower']])
    lr.fit(x_train_pr, y_train)
    Rsqu_test.append(lr.score(x_test_pr, y_test))
plt.plot(order, Rsqu_test)
plt.xlabel('order')
plt.ylabel('R^2')
plt.title('R^2 Using Test Data')
plt.text(3, 0.75, 'Maximum R^2 ')
pr = PolynomialFeatures(degree=2)
x_train_pr = pr.fit_transform(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg', 'normalized-losses', 'symboling']])
x_test_pr = pr.fit_transform(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg', 'normalized-losses', 'symboling']])
#### Let's import **Ridge** from the module **linear_model**
from sklearn.linear_model import Ridge
RigeModel = Ridge(alpha=1)
#### Like regular regression, you can fit the model using the method **fit**
RigeModel.fit(x_train_pr, y_train)
yhat = RigeModel.predict(x_test_pr)
print('predicted:', yhat[0:4])
print('test set :', y_test[0:4].values)
Rsqu_test = []
Rsqu_train = []
dummy1 = []
Alpha = 10 * np.array(range(0, 1000))
from tqdm import tqdm
pbar = tqdm(Alpha)   # progress bar over the alpha values
# loop body reconstructed; the notes only kept the two append calls
for alpha in pbar:
    RigeModel = Ridge(alpha=alpha)
    RigeModel.fit(x_train_pr, y_train)
    test_score = RigeModel.score(x_test_pr, y_test)
    train_score = RigeModel.score(x_train_pr, y_train)
    Rsqu_test.append(test_score)
    Rsqu_train.append(train_score)
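A possible follow-up (not in the notes) to read off the alpha with the best validation R^2:

# hypothetical: alpha value maximizing the validation R^2
best_alpha = Alpha[int(np.argmax(Rsqu_test))]
best_alpha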
width = 12
height = 10
plt.figure(figsize=(width, height))
# plot of R^2 vs alpha (reconstructed; the notes kept only the figure sizing)
plt.plot(Alpha, Rsqu_test, label='validation data')
plt.plot(Alpha, Rsqu_train, 'r', label='training data')
plt.xlabel('alpha')
plt.ylabel('R^2')
plt.legend()
#### Here the model is built and tested on the same data, so the training and test data are the same.
Code V6
# (fragments of the DistributionPlot and PollyPlot helpers; see the reconstruction above)
import pandas as pd
import numpy as np
# df is assumed to have been loaded earlier with pd.read_csv()
df.to_csv('module_5_auto.csv')   # save a local copy
df = df._get_numeric_data()      # keep only the numeric columns
df.head()
y_data = df['price']
x_data=df.drop('price',axis=1)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# assumed split (exact parameters not recorded in these notes):
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.15, random_state=1)
lre = LinearRegression()
lre.fit(x_train[['horsepower']], y_train)
lre.score(x_test[['horsepower']], y_test)
#### 0.3635875575078824
#### We can see the R^2 is much smaller using the test data compared to the training data
lre.score(x_train[['horsepower']], y_train)
#### 0.6619724197515103
from sklearn.model_selection import cross_val_score
# assumed assignment; the notes only kept the name Rcross
Rcross = cross_val_score(lre, x_data[['horsepower']], y_data, cv=4)
Rcross
-1 * cross_val_score(lre, x_data[['horsepower']], y_data, cv=4, scoring='neg_mean_squared_error')
lr = LinearRegression()
lr.fit(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y_train)
# predictions on the training and test sets (assignments missing in the original notes)
yhat_train = lr.predict(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])
yhat_train[0:5]
yhat_test = lr.predict(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])
Title = 'Predicted vs actual values (test data)'   # assumed caption
DistributionPlot(y_test, yhat_test, "Actual Values (Test)", "Predicted Values (Test)", Title)
#### Let's use 55 percent of the data for training and the rest for testing
from sklearn.preprocessing import PolynomialFeatures
# assumed split: holding out 45% for testing leaves 55% for training
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.45, random_state=0)
pr = PolynomialFeatures(degree=5)
x_train_pr = pr.fit_transform(x_train[['horsepower']])
x_test_pr = pr.fit_transform(x_test[['horsepower']])
pr
poly = LinearRegression()
poly.fit(x_train_pr, y_train)
#### We can see the output of our model using the method "predict." We assign the values to "yhat"
yhat = poly.predict(x_test_pr)
yhat[0:5]
PollyPlot(x_train[['horsepower']], x_test[['horsepower']], y_train, y_test, poly, pr)
poly.score(x_train_pr, y_train)
poly.score(x_test_pr, y_test)
#### We see the R^2 for the training data is 0.5567 while the R^2 on the test data was -29.87. The lower the R^2, the worse the model; a negative R^2 is a sign of overfitting.
#### Let's see how the R^2 changes on the test data for different order polynomials, and then plot the results
Rsqu_test = []
order = [1, 2, 3, 4]
for n in order:
    pr = PolynomialFeatures(degree=n)
    x_train_pr = pr.fit_transform(x_train[['horsepower']])
    x_test_pr = pr.fit_transform(x_test[['horsepower']])
    lr.fit(x_train_pr, y_train)
    Rsqu_test.append(lr.score(x_test_pr, y_test))
plt.plot(order, Rsqu_test)
plt.xlabel('order')
plt.ylabel('R^2')
plt.title('R^2 Using Test Data')
plt.text(3, 0.75, 'Maximum R^2 ')
pr = PolynomialFeatures(degree=2)   # restored from Code V5
x_train_pr = pr.fit_transform(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg', 'normalized-losses', 'symboling']])
x_test_pr = pr.fit_transform(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg', 'normalized-losses', 'symboling']])
from sklearn.linear_model import Ridge
RigeModel = Ridge(alpha=1)
#### Like regular regression, you can fit the model using the method **fit**
RigeModel.fit(x_train_pr, y_train)
yhat = RigeModel.predict(x_test_pr)
print('predicted:', yhat[0:4])
print('test set :', y_test[0:4].values)
Rsqu_test = []
Rsqu_train = []
dummy1 = []
Alpha = 10 * np.array(range(0, 1000))
from tqdm import tqdm
pbar = tqdm(Alpha)   # progress bar over the alpha values
# loop body reconstructed; the notes only kept the two append calls
for alpha in pbar:
    RigeModel = Ridge(alpha=alpha)
    RigeModel.fit(x_train_pr, y_train)
    test_score = RigeModel.score(x_test_pr, y_test)
    train_score = RigeModel.score(x_train_pr, y_train)
    Rsqu_test.append(test_score)
    Rsqu_train.append(train_score)
#### Here the model is built and tested on the same data, so the training and test data are the same.
from sklearn.model_selection import GridSearchCV
parameters = [{'alpha': [0.001, 0.1, 1, 10, 100, 1000, 10000]}]   # candidate alphas (assumed)
RR = Ridge()
RR
Grid1 = GridSearchCV(RR, parameters, cv=4)   # Grid1 reconstructed; it was referenced but not defined in the notes
Grid1.fit(x_data[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y_data)
BestRR = Grid1.best_estimator_
BestRR
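To check the selected model on held-out data, one could score the best estimator (a sketch; the feature list mirrors the fit above):

# R^2 of the grid-searched ridge model on the test split
BestRR.score(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y_test)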