Linear Regression and SVR
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
Data ingestion
In [3]: df
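The ingestion cell itself is cut off in the export; for this dataset (the UCI individual household electric power consumption file) it would typically look like the sketch below, with the file name and read options assumed:

# Assumed ingestion; the raw UCI file is ';'-separated and uses '?' for
# missing values (hence the later type conversions and NaN handling).
df = pd.read_csv('household_power_consumption.txt', sep=';',
                 na_values='?', low_memory=False)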
EDA
In [4]: df.shape
Out[4]: (2075259, 9)
df1= df.sample(30000)
In [6]: df1.info()
<class 'pandas.core.frame.DataFrame'>
Feature Information
Global_active_power: household global minute-averaged active power (in kilowatt)
Global_reactive_power: household global minute-averaged reactive power (in kilowatt)
Voltage: minute-averaged voltage (in volt)
Global_intensity: household global minute-averaged current intensity (in ampere)
Date: date in format dd/mm/yyyy
Time: time in format hh:mm:ss
Sub_metering_1, Sub_metering_2, and Sub_metering_3 are the energy sub-meter readings
In [15]: df1.info()
<class 'pandas.core.frame.DataFrame'>
df1['Global_reactive_power'] = df1['Global_reactive_power'].astype(float)
df1['Voltage'] = df1['Voltage'].astype(float)
df1['Global_intensity'] = df1['Global_intensity'].astype(float)
df1['Sub_metering_1'] = df1['Sub_metering_1'].astype(float)
df1['Sub_metering_2'] = df1['Sub_metering_2'].astype(float)
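The repeated casts can be collapsed into one loop; pd.to_numeric with errors='coerce' also turns stray non-numeric entries (such as the dataset's '?' markers) into NaN. A sketch:

# Equivalent, more compact conversion:
cols = ['Global_reactive_power', 'Voltage', 'Global_intensity',
        'Sub_metering_1', 'Sub_metering_2']
for col in cols:
    df1[col] = pd.to_numeric(df1[col], errors='coerce')  # non-numeric -> NaN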
In [18]: df1.info()
<class 'pandas.core.frame.DataFrame'>
In [20]: dff.info()
<class 'pandas.core.frame.DataFrame'>
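dff appears here without its construction cell. Given the columns used later (date, month, year, hour, Minutes, total_metering), it was presumably derived from df1 roughly as follows; this is a sketch, and every step below is inferred rather than shown in the source:

# Assumed derivation of dff from df1 (the defining cell is cut off):
dff = df1.copy()
dt = pd.to_datetime(dff['Date'] + ' ' + dff['Time'],
                    format='%d/%m/%Y %H:%M:%S')
dff['date'] = dt.dt.day        # day of month
dff['month'] = dt.dt.month
dff['year'] = dt.dt.year
dff['hour'] = dt.dt.hour
dff['Minutes'] = dt.dt.minute
dff['total_metering'] = (dff['Sub_metering_1'] + dff['Sub_metering_2']
                         + dff['Sub_metering_3'])
dff = dff.drop(columns=['Date', 'Time'])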
In [21]: dff.describe()
Primary Observation:
There are null values in Global_active_power, Global_reactive_power, Voltage, Global_intensity and total_metering.
Global_active_power, Global_reactive_power, Global_intensity and total_metering also appear to have outliers, and hence the data is skewed.
Global_active_power: observed mean power of 1.087975 and standard deviation 1.052817, minimum power 0.078000, range of 25th percentile to 75th percentile [0.308000 to 1.530000], but maximum power 10.670000. This clearly shows the data is skewed and contains outliers.
Global_reactive_power: observed mean power of 0.123332 and standard deviation 0.113058, minimum power 0, range of 25th percentile to 75th percentile [0.048000 to 0.194000], but maximum power 1.186000. This clearly shows the data is skewed and contains outliers.
Voltage: observed mean voltage of 240.853321 and standard deviation 3.241289, minimum voltage 225.140000, range of 25th percentile to 75th percentile [238.990000 to 242.910000], and maximum voltage 253.420000.
Global_intensity: observed mean current intensity of 4.612513 and standard deviation 4.426749, minimum 0.200000, range of 25th percentile to 75th percentile [1.400000 to 6.400000], but maximum value 46.400000. This clearly shows the data is skewed and contains outliers.
total_metering: observed mean total metering of 8.815839 and standard deviation 12.710086, minimum reading 0, range of 25th percentile to 75th percentile [0 to 18], but maximum reading 126. This is skewed and has outliers.
Out[22]:
1032508    False
1794736    False
590021     False
1946737    False
691314     False
           ...
340664     False
2063841    False
1437739    False
1027465    False
1532620    False
In [23]: dff.skew()
Out[23]:
Global_active_power      1.805078
Global_reactive_power    1.272971
Voltage                 -0.329061
Global_intensity         1.868954
date                    -0.000066
month                   -0.000640
year                    -0.015933
hour                     0.002056
Minutes                  0.003394
total_metering           2.244892
dtype: float64
Handling null values {replacing with the mean for features without outliers, and with the median for features with outliers}
dff['Global_intensity'] = dff['Global_intensity'].fillna(dff['Global_intensity'].median())
dff['total_metering'] = dff['total_metering'].fillna(dff['total_metering'].median())
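Only two fillna calls survive in the export; by the rule stated above, the remaining affected columns would be imputed the same way. A sketch:

# Median for the skewed features, mean for Voltage (no outliers):
dff['Global_active_power'] = dff['Global_active_power'].fillna(dff['Global_active_power'].median())
dff['Global_reactive_power'] = dff['Global_reactive_power'].fillna(dff['Global_reactive_power'].median())
dff['Voltage'] = dff['Voltage'].fillna(dff['Voltage'].mean())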
In [25]: dff.head()
plt.figure(figsize=(15, 15))   # figure call and loop header reconstructed; cut off in the export
for i in range(len(dff.columns)):
    plt.subplot(5, 3, i + 1)
    sns.kdeplot(x=dff[dff.columns[i]], shade=True, color='b')
    plt.xlabel(dff.columns[i])
plt.tight_layout()
Out[28]: <AxesSubplot:xlabel='hour', ylabel='total_metering'>
Observation:
The total reading increases in the morning around 7 am, dips around 3 or 4 in the afternoon, rises again in the evening, and dips after 8 or 9 at night.
Out[29]: <AxesSubplot:xlabel='month', ylabel='total_metering'>
Observation:
In July there is a dip in power consumption.
Out[30]: <AxesSubplot:xlabel='year', ylabel='total_metering'>
Checking correlation
In [31]: dff.corr()
In [32]: plt.figure(figsize=(15,15))
sns.heatmap(data=dff.corr(), annot=True);
Handling multicollinearity
vif_data
            VIF                feature
0   1317.390296    Global_active_power
1      2.937605  Global_reactive_power
2   7550.255484                Voltage
3   1337.011908       Global_intensity
4      4.179946                   date
5      4.530054                  month
6   7638.125596                   year
7      4.222324                   hour
8      3.889487                Minutes
9      5.372782         total_metering
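The cell that builds vif_data is not visible in the export; a minimal sketch using statsmodels' variance_inflation_factor (all names below assumed):

# Assumed construction of vif_data with statsmodels:
from statsmodels.stats.outliers_influence import variance_inflation_factor

numeric = dff.select_dtypes('number')   # VIF is defined for numeric columns only
vif_data = pd.DataFrame({
    'VIF': [variance_inflation_factor(numeric.values, i)
            for i in range(numeric.shape[1])],
    'feature': numeric.columns,
})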
In [36]: dff.head()
Checking outliers
In [37]: plt.figure(figsize=(15, 15))
for i in range(len(dff.columns)):   # loop header reconstructed; cut off in the export
    plt.subplot(5, 3, i + 1)
    sns.boxplot(x=dff[dff.columns[i]])
plt.tight_layout()
In [40]: from feature_engine.outliers import Winsorizer   # import assumed; not visible in the export

winsorizer = Winsorizer(capping_method='iqr',   # cap using the IQR rule
                        tail='both',            # cap left, right or both tails
                        variables=['Global_reactive_power'])
dff['Global_reactive_power'] = winsorizer.fit_transform(dff[['Global_reactive_power']])
In [41]: winsorizer = Winsorizer(capping_method='iqr',   # cap using the IQR rule
                        tail='both',
                        variables=['Voltage'])
dff['Voltage'] = winsorizer.fit_transform(dff[['Voltage']])

In [42]: winsorizer = Winsorizer(capping_method='iqr',
                        tail='both',
                        variables=['Global_intensity'])
dff['Global_intensity'] = winsorizer.fit_transform(dff[['Global_intensity']])

In [43]: winsorizer = Winsorizer(capping_method='iqr',
                        tail='both',
                        variables=['total_metering'])
dff['total_metering'] = winsorizer.fit_transform(dff[['total_metering']])
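The four cells above repeat the same transformer; since Winsorizer accepts a list of variables, they could equivalently be collapsed into one pass. A sketch:

# Equivalent single-pass form:
skewed_cols = ['Global_reactive_power', 'Voltage', 'Global_intensity', 'total_metering']
winsorizer = Winsorizer(capping_method='iqr', tail='both', variables=skewed_cols)
dff[skewed_cols] = winsorizer.fit_transform(dff[skewed_cols])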
In [44]: plt.figure(figsize=(15, 15))
for i in range(len(dff.columns)):   # loop header reconstructed; cut off in the export
    plt.subplot(5, 3, i + 1)
    sns.boxplot(x=dff[dff.columns[i]])
plt.tight_layout()
In [45]: dff.to_csv("power_consumption_data.csv")
import pymongo   # import assumed; the connection string below is truncated in the export
client = pymongo.MongoClient("mongodb://raje:mongodb@ac-tl0bnvo-shard-00-00.bkfbasy
db = client.test
In [50]: ## first we have to convert this dataframe into dict or json format, as this is the format insert_many() accepts
dff_dict = dff.to_dict('records')   # conversion line reconstructed; cut off in the export
#database = client['power_consumption']
#collection = database["power_consumption_data"]
collection.insert_many(dff_dict)
In [54]: ## fetching data from the collection in mongodb {using find() will return all the occurrences}
data = pd.DataFrame(list(collection.find()))
In [55]: data
data.drop(columns=['_id'], inplace=True)
In [57]: data.head()
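The cell defining X is not shown in the export; given that y is total_metering, X is presumably everything but the target. A sketch:

# Assumed feature matrix (the defining cell is cut off):
X = data.drop(columns=['total_metering'])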
In [59]: X
In [60]: y = data['total_metering']
In [61]: y.head()
Out[61]:
0    18.0
1     1.0
2     0.0
3    17.0
4    18.0
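The train/test split cell is cut off in the export; a standard sketch, with test_size and random_state assumed:

from sklearn.model_selection import train_test_split

# Assumed split; the notebook's actual test_size/random_state are not visible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42)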
In [63]: X_train
In [64]: X_test
In [65]: y_train
Out[65]:
13707     0.0
10403     1.0
6673      0.0
28904    11.0
2987      0.0
         ...
28017     0.0
17728     0.0
29199    19.0
7293     19.0
17673     2.0
Standardising data
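The cell that fits the scaler is not visible in the export; since `scaler` is pickled below and reused later, it was presumably a scikit-learn StandardScaler along these lines (a sketch):

# Assumed scaling step (fit on the training split only, then transform both):
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)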
In [70]: X_train
Out[70]:
array([[ 1.29044859, -0.26217727, -0.26946362, ..., -0.44053182,
         1.08034131, -0.37407043],
       [-1.14867616,  1.48471233, -0.90523815, ..., -1.30850514,
        -1.08413797, -0.72104116],
       ...,
       [-0.63417328,  0.28372573,  1.05506665, ...,  0.4274415 ,
         1.51323716, -0.43189889],
       [-0.15778173, -0.2975004 , -0.79927572, ...,  0.4274415 ,
        ...]])
In [72]: X_test
Out[72]:
array([[-0.59606196, -0.05023846,  0.73717939, ..., -0.44053182,
        -0.65124211,  1.59209705],
       [ 0.14710886,  0.57915559,  1.58487876, ..., -1.59782958,
        -1.08413797, -1.12584035],
       ...,
       [-1.14867616,  0.69154738, -0.90523815, ..., -1.30850514,
         0.50314684,  0.89815559],
       [-0.50078365, -0.2461213 ,  0.41929213, ..., -0.44053182,
         1.22463993, -0.83669807]])
Pickling
import pickle

with open('scaler.pkl', 'wb') as f:   # open() call reconstructed; file name assumed
    pickle.dump(scaler, f)
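Loading it back later would then be symmetric (a sketch, same assumed file name):

with open('scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)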
Linear Regression
In [74]: from sklearn.linear_model import LinearRegression

linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)
Out[76]: LinearRegression()

y_pred = linear_reg.predict(X_test)
y_pred
Out[78]: array([..., -0.32246875, 11.67767303])
Cost functions
In [79]: from sklearn.metrics import mean_squared_error
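The cell that actually computes the error is cut off in the export; standard usage would be:

# Sketch of the elided evaluation step:
mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse, "RMSE:", np.sqrt(mse))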
Test truth data and predicted data should follow a linear relationship {this is an indication of a good model}.
plt.scatter(y_test, y_pred)   # scatter call reconstructed; the top of this cell is cut off
plt.xlabel("Truth data")
plt.ylabel("Predicted data")
Residual distribution
residual_linear = y_test - y_pred   # definition reconstructed; cut off in the export
residual_linear.head()
Out[82]:
20412     1.317430
1296      2.798263
3906    -24.577479
20454     1.194867
5200     -3.652491

sns.displot(residual_linear, kind='kde')   # plotting call reconstructed; displot returns a FacetGrid
Out[83]: <seaborn.axisgrid.FacetGrid at 0x15bb818faf0>
Uniform distribution
Residuals vs predictions should follow a uniform distribution.
plt.scatter(y_pred, residual_linear)   # scatter call reconstructed; the top of this cell is cut off
plt.xlabel('Predictions')
plt.ylabel('Residuals')
Accuracy of the model with train data and with test data
In [85]: linear_reg.score(X_train, y_train)
Out[85]: 0.6849425651586472

In [86]: linear_reg.score(X_test, y_test)
Out[86]: 0.690139882731237
Performance Metrics
R Square and Adjusted R Square values. Adjusted R square = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of test samples and p the number of features.
from sklearn.metrics import r2_score   # import and score line reconstructed; cut off in the export
r2_score_lr = r2_score(y_test, y_pred)
adjr2_score_lr = 1 - ((1 - r2_score_lr) * (len(y_test) - 1) / (len(y_test) - X_test.shape[1] - 1))
Ridge Regression
In [89]: from sklearn.linear_model import Ridge

ridge_reg = Ridge()
ridge_reg.fit(X_train, y_train)
Out[91]: Ridge()

y_pred_r = ridge_reg.predict(X_test)
y_pred_r
Out[93]: array([..., -0.32214097, 11.67760058])
In [95]: ridge_reg.score(X_train, y_train)
Out[95]: 0.6849425631543848

In [96]: ridge_reg.score(X_test, y_test)
Out[96]: 0.6901402931343383
ridge_r2_score = r2_score(y_test, y_pred_r)   # line reconstructed; cut off in the export
ridge_adjr2_score = 1 - ((1 - ridge_r2_score) * (len(y_test) - 1) / (len(y_test) - X_test.shape[1] - 1))
print("Adjusted R square accuracy is {} percent".format(round(ridge_adjr2_score * 100, 2)))
Lasso Regression
In [98]: from sklearn.linear_model import Lasso

lasso_reg = Lasso()
lasso_reg
Out[99]: Lasso()

lasso_reg.fit(X_train, y_train)
Out[100]: Lasso()

lasso_reg.coef_   # coefficient printout (input line cut off); Lasso shrinks some coefficients to zero
array([..., -0. ])

y_pred_lasso = lasso_reg.predict(X_test)
y_pred_lasso
Out[102]: array([..., 0.93010027, 11.86774416])
In [104]: lasso_reg.score(X_train, y_train)
Out[104]: 0.6693582767991895

In [105]: lasso_reg.score(X_test, y_test)
Out[105]: 0.6743540592319746
lasso_r2_score = r2_score(y_test, y_pred_lasso)   # line reconstructed; cut off in the export
lasso_adjr2_score = 1 - ((1 - lasso_r2_score) * (len(y_test) - 1) / (len(y_test) - X_test.shape[1] - 1))
print("Adjusted R square accuracy is {} percent".format(round(lasso_adjr2_score * 100, 2)))
Elastic-Net Regression
In [107]: from sklearn.linear_model import ElasticNet

elastic_reg = ElasticNet()
elastic_reg
Out[108]: ElasticNet()

elastic_reg.fit(X_train, y_train)
Out[109]: ElasticNet()

elastic_reg.coef_   # coefficient printout (input line cut off)
array([..., -0. ])

In [111]: elastic_y_pred = elastic_reg.predict(X_test)
elastic_y_pred
Out[111]: array([..., 2.82884024, 10.9403686 ])
In [113]: elastic_reg.score(X_train, y_train)
Out[113]: 0.5870637265888694

In [114]: elastic_reg.score(X_test, y_test)
Out[114]: 0.5931641182844434
elastic_reg_r2_score = r2_score(y_test, elastic_y_pred)   # line reconstructed; cut off in the export
elastic_reg_adj_r2_score = 1 - ((1 - elastic_reg_r2_score) * (len(y_test) - 1) / (len(y_test) - X_test.shape[1] - 1))
print("Adjusted R square accuracy is {} percent".format(round(elastic_reg_adj_r2_score * 100, 2)))
SVR
In [116]: from sklearn.svm import SVR

In [117]: svr = SVR()
svr
Out[117]: SVR()

In [118]: svr.fit(X_train, y_train)
Out[118]: SVR()

In [119]: svr_y_pred = svr.predict(X_test)
svr_y_pred
Out[119]: array([..., 0.21280761, 12.93475299])

In [121]: svr.score(X_train, y_train)   # input lines reconstructed from the pattern above
Out[121]: 0.7336778695440571

In [122]: svr.score(X_test, y_test)
Out[122]: 0.7310080566997736
svr_r2_score = r2_score(y_test, svr_y_pred)   # line reconstructed; cut off in the export
svr_adj_r2_score = 1 - ((1 - svr_r2_score) * (len(y_test) - 1) / (len(y_test) - X_test.shape[1] - 1))
print("Adjusted R square accuracy is {} percent".format(round(svr_adj_r2_score * 100, 2)))
In [125]: model_params = {
    'Ridge Regression': {
        'model': Ridge(),
        'params': {
            'alpha': [1, 5, 10, 20]
        }
    },
    'Lasso Regression': {
        'model': Lasso(),
        'params': {
            'alpha': [1, 5, 10, 20]
        }
    },
    'Elastic-Net Regression': {
        'model': ElasticNet(),
        'params': {
            'alpha': [1, 5, 10, 20],
            'l1_ratio': [0.1, 0.5, 0.7, 1]   # l1_ratio must lie in [0, 1] for ElasticNet
        }
    },
    'SVR': {
        'model': SVR(),
        'params': {
            'C': [1, 5, 10, 20]
        }
    }
}
In [126]: model_params.items()
In [127]: ## scaling the independent features before fitting them inside the grid object
X1 = scaler.fit_transform(X)
In [128]: from sklearn.model_selection import GridSearchCV   # import assumed

scores = []
for model_name, mp in model_params.items():   # loop header reconstructed; cv value assumed
    clf = GridSearchCV(mp['model'], mp['params'], cv=5, return_train_score=False)
    clf.fit(X1, y)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })

df = pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])
df
Conclusion:
The SVR model with the 'rbf' kernel is the best model for this household power consumption data.