Assignment (EMBA 502)

Uploaded by

TanvirAhmed

Assignment On:

DATA PROCESSING OF A DATASET INCLUDING FREQUENCY DISTRIBUTION, CENTRAL TENDENCY, DISPERSION, CORRELATION & REGRESSION ANALYSIS

Course Code: EMBA 502

Course Name: Business Mathematics and Statistics

Course Instructor: Dr. M. Amir Hossain

Group members

S. Tanvir Ahmed 2019-3-91-025


Mirza Alamin 2019-1-91-016
Injamul Hossain 2019-1-91-007
Farhana Ahmed 2019-2-91-001
Shaikh Rayhan Hossain 2019-1-91-013
Source of Data: BASF Bangladesh Limited
(A German based multinational chemical company)
Plant Location: Tejgaon, Dhaka
Plant Capacity: Maximum 60 MT per day
Type of Data: Daily basis production of admixture chemical
Data Collection: Month of October 2019

SL No. | Production Date | Production Hour | Product No. | Product Name | Batch No. | Quantity (MT)
1 01-10-19 6 50424562 MRheobuild 623 DJ33119 25
2 02-10-19 3 50548519 MRheobuild 623 DJ33319 12
3 03-10-19 9 50191350 MPolyheed 8650 DJ33419 35
4 04-10-19 9 50191350 MPolyheed 8650 DJ33519 37
5 05-10-19 4 50548519 MRheobuild 623 DJ33619 14
6 06-10-19 8 50424561 MPolyheed 8632 DJ33719 33
7 07-10-19 9 50548519 MRheobuild 623 DJ33819 37
8 08-10-19 11 50424562 MRheobuild 623 DJ33919 45
9 09-10-19 8 50411740 MPolyheed 8320 DJ34019 32
10 10-10-19 14 50424562 MRheobuild 623 DJ34119 55
11 11-10-19 6 50191350 MPolyheed 8650 DJ34219 25
12 12-10-19 11 50411740 MPolyheed 8320 DJ34319 42
13 13-10-19 6 50424562 MRheobuild 623 DJ34419 22
14 14-10-19 10 50411740 MPolyheed 8320 DJ34519 38
15 15-10-19 14 50411740 MPolyheed 8320 DJ34619 55
16 16-10-19 10 50411740 MPolyheed 8320 DJ34719 38
17 17-10-19 5 51747280 MRheobuild 1100 DJ34819 18
18 18-10-19 7 50424562 MRheobuild 623 DJ34919 28
19 19-10-19 5 50626218 MPolyheed 8395 DJ35019 18
20 20-10-19 11 50424562 MRheobuild 623 DJ35119 45
21 21-10-19 11 56277085 MGlenium ACE 30JP DJ35219 42
22 22-10-19 7 50548519 MRheobuild 623 DJ35319 28
23 23-10-19 8 50544229 MGlenium SKY 8632 DJ35419 32
24 24-10-19 4 50411740 MPolyheed 8320 DJ35519 15
25 25-10-19 6 50411740 MPolyheed 8320 DJ35619 25
26 26-10-19 12 50548519 MRheobuild 623 DJ35719 48
27 27-10-19 9 50660161 MPolyheed 8396 DJ35819 36
28 28-10-19 14 50411740 MPolyheed 8320 DJ35919 55
29 29-10-19 9 50411740 MPolyheed 8320 DJ36019 35
30 30-10-19 9 50626218 MPolyheed 8395 DJ36119 37
31 31-10-19 11 50548519 MRheobuild 623 DJ35319 45
Variable Types:
A variable is a characteristic that can assume any of a set of prescribed values.
Example: Age, Height, Temperature etc.
There are two types of variables
➢ Qualitative Variable (Attribute)
➢ Quantitative Variable

Qualitative Variable: The characteristic or variable being studied is non-numeric.
Example: Gender, Religion.
Quantitative Variable: The variable can be measured and reported numerically.
Example: Time, age, income, temperature
There are 2 types of Quantitative Variable
1. Discrete Variable: Can only assume certain values and there are usually “gaps”
between values.
Example: Number of batches produced, number of employees
2. Continuous Variable: Can assume any value within a specific range
Example: Temperature, weight, CGPA

In this data set, production hour and product quantity are recorded as whole numbers, so the data are treated as discrete.

Scale of Measurement:
Measurement means assigning numbers or other symbols to characteristics of
objects according to certain prescribed rules.

Ratio level
The interval level with an inherent zero starting point. Differences and ratios are
meaningful for this level of measurement.
Example: Material quantity. The product quantity (MT) in this data set is measured at the ratio level.
Frequency Distribution:

Data Presentation (Frequency Distribution)


a. Class mark (midpoint): A point that divides a class into two equal parts. This is the
average of the upper and lower class limits.
$\text{Class Midpoint} = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$
b. Class interval: For a frequency distribution having classes of the same size, the
class interval is the difference between upper and lower limits of a class.
$\text{Class Interval} = \text{Upper Limit} - \text{Lower Limit}$

Product Quantity Range Tallies Frequency

10 - 20 lllll 5

20 - 30 llllll 6

30 - 40 lllllllllll 11

40 - 50 llllll 6

50 - 60 lll 3
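
The tally above can be reproduced programmatically. The following is a minimal sketch, not the method used in the assignment: the function name `frequency_table` is hypothetical, and it assumes that a value equal to a class boundary is counted in the upper class (no value in this data set actually falls on a boundary).

```python
# Daily production quantities (MT) for October 2019, copied from the data table.
quantities = [25, 12, 35, 37, 14, 33, 37, 45, 32, 55, 25, 42, 22, 38, 55, 38,
              18, 28, 18, 45, 42, 28, 32, 15, 25, 48, 36, 55, 35, 37, 45]


def frequency_table(values, start=10, width=10, n_classes=5):
    """Count the values in each class [lower, upper).

    Assumption: a value equal to a boundary belongs to the upper class.
    """
    counts = [0] * n_classes
    for v in values:
        idx = int((v - start) // width)
        if not 0 <= idx < n_classes:
            raise ValueError(f"{v} falls outside the class limits")
        counts[idx] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


table = frequency_table(quantities)
for lower, upper, freq in table:
    midpoint = (lower + upper) / 2          # class mark
    print(f"{lower} - {upper}  midpoint={midpoint:g}  {'|' * freq}  {freq}")
```

The printed frequencies should match the table: 5, 6, 11, 6 and 3.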
Histogram:
A graph in which the classes are marked on the horizontal axis and the class frequencies
on the vertical axis. The class frequencies are represented by the heights of the bars
(equal class interval) and the bars are drawn adjacent to each other

Product Quantity Range Frequency

10 - 20 5

20 - 30 6

30 - 40 11

40 - 50 6

50 - 60 3

Histogram (figure): Product Quantity Range on the horizontal axis, Frequency on the vertical axis.


Frequency Polygon:
Frequency polygon consists of line segments connecting the points formed by plotting the
midpoint and the class frequency for each class and then joined with X-axis at lower limit
of first class and upper limit of last class.

Product Quantity Range Mid-Point Frequency

10 - 20 15 5

20 - 30 25 6

30 - 40 35 11

40 - 50 45 6

50 - 60 55 3

Frequency Polygon (figure): Mid Level of Production Range on the horizontal axis, Frequency on the vertical axis.


Cumulative Frequency Curve:

A cumulative frequency curve (ogive) is a smooth curve obtained by joining the
points formed by plotting the upper limit (less than type) or lower limit (more than type) of
each class against its cumulative frequency. It is used to determine how many or what
proportion of the data values are below or above a certain value.

Less than type

Product Quantity Range Frequency Cumulative Frequency (CF)

10 - 20 5 5

20 - 30 6 11

30 - 40 11 22

40 - 50 6 28

50 - 60 3 31
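
The cumulative frequencies are running totals of the class frequencies. A short sketch of the less-than type follows, using the frequencies from the table; the helper `proportion_below` is a hypothetical name added for illustration.

```python
from itertools import accumulate

# Class upper limits and frequencies from the frequency distribution.
upper_limits = [20, 30, 40, 50, 60]
frequencies = [5, 6, 11, 6, 3]

# Running total of the frequencies gives the less-than cumulative frequency.
cumulative = list(accumulate(frequencies))

# Points of the less-than ogive: (upper limit, cumulative frequency).
ogive_points = list(zip(upper_limits, cumulative))


def proportion_below(limit):
    """Share of observations below a class upper limit."""
    return dict(ogive_points)[limit] / cumulative[-1]


print(ogive_points)
print(f"{proportion_below(40):.1%} of the days produced less than 40 MT")
```

Reading the curve at an upper limit of 40 gives 22 of 31 days, or about 71%.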

Cumulative Frequency Curve (figure): Upper Limit of CI (production quantity) on the horizontal axis, Cumulative Frequency on the vertical axis.



Central Tendency:

Measure of Central Tendency:


• A measure of central tendency is a single value that summarizes a set of data
• It locates the center of the values
• It is also known as a measure of location or an average

Different Measures of Central Tendency:


• Arithmetic mean
• Median
• Mode

Product Quantity Range | Frequency (f) | Mid Value (x) | fx | Cumulative Frequency (fc)

10 - 20 5 15 75 5

20 - 30 6 25 150 11

30 - 40 11 35 385 22

40 - 50 6 45 270 28

50 - 60 3 55 165 31

Σf = 31 Σfx = 1045

Mean: $\bar{x} = \frac{\Sigma fx}{\Sigma f} = \frac{1045}{31} = 33.71$

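Median: $L + \left[\frac{\frac{N}{2} - f_c}{f_m}\right] \times C$

Here L = 30 is the lower limit of the median class (30 - 40), N = 31 is the total frequency, fc = 11 is the cumulative frequency before the median class, fm = 11 is the frequency of the median class and C = 10 is the class interval.

$= 30 + \left[\frac{\frac{31}{2} - 11}{11}\right] \times 10$

= 30 + [0.409] x 10
= 34.09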

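Mode: $L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times C$

Here L = 30 is the lower limit of the modal class (30 - 40), Δ1 = 11 - 6 = 5 is the difference between the modal class frequency and the frequency of the preceding class, and Δ2 = 11 - 6 = 5 is the difference from the frequency of the following class.

$= 30 + \left(\frac{5}{5 + 5}\right) \times 10$
= 30 + (0.5) x 10

= 35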

The result shows that Mode > Median > Mean

So, the frequency distribution is negatively skewed.
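
The three grouped-data formulas can be checked with a short script. This is a sketch under stated assumptions rather than a definitive implementation: the function names are hypothetical, each class is represented by its midpoint (15, 25, 35, 45 and 55), and the mode formula assumes a single modal class.

```python
# (lower limit, upper limit, frequency) for each class.
classes = [(10, 20, 5), (20, 30, 6), (30, 40, 11), (40, 50, 6), (50, 60, 3)]


def grouped_mean(classes):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    total = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total


def grouped_median(classes):
    """Median = L + ((N/2 - fc) / fm) * C, located in the median class."""
    n = sum(f for _, _, f in classes)
    cum = 0  # cumulative frequency before the current class (fc)
    for lo, hi, f in classes:
        if cum + f >= n / 2:
            return lo + (n / 2 - cum) / f * (hi - lo)
        cum += f


def grouped_mode(classes):
    """Mode = L + d1 / (d1 + d2) * C, located in the modal class."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))
    lo, hi, fm = classes[i]
    prev_f = freqs[i - 1] if i > 0 else 0
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1, d2 = fm - prev_f, fm - next_f
    return lo + d1 / (d1 + d2) * (hi - lo)


mean, median, mode = grouped_mean(classes), grouped_median(classes), grouped_mode(classes)
print(f"mean={mean:.2f} median={median:.2f} mode={mode:.2f}")
print("negatively skewed" if mode > median > mean else "not negatively skewed")
```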


Dispersion:

❖ It deals with the spread of the data
❖ A small value of the measure of dispersion indicates that the data are clustered closely around the center
❖ A large value of dispersion indicates the estimate of central tendency is not reliable
❖ Measure of Scale:

Absolute Measure: Range, Mean Deviation, Variance, Standard Deviation
Relative Measure: Co-efficient of Variation (CV)

Product Quantity Range | Frequency (f) | Mid Value (x) | fx | |x-x̅| | f|x-x̅| | (x-x̅)² | f(x-x̅)²

10 - 20 5 15 75 18.71 93.55 350.06 1750.30

20 - 30 6 25 150 8.71 52.26 75.86 455.16

30 - 40 11 35 385 1.29 14.19 1.66 18.26

40 - 50 6 45 270 11.29 67.74 127.46 764.76

50 - 60 3 55 165 21.29 63.87 453.26 1359.78

Σf = 31 Σfx = 1045 Σf|x-x̅| = 291.61 Σf(x-x̅)² = 4348.26

Here, $\bar{x} = \frac{\Sigma fx}{\Sigma f} = \frac{1045}{31} = 33.71$

1) Range = Upper limit of the highest class - Lower limit of the lowest class
= 60 - 10
= 50

So, the observations are spread over a range of at most 50 MT.

2) Mean Deviation $= \frac{\Sigma f|x - \bar{x}|}{\Sigma f} = \frac{291.61}{31} = 9.41$

So, on average the observations deviate from the mean by 9.41 MT in absolute value.

3) Variance: $S^2 = \frac{\Sigma f(x - \bar{x})^2}{\Sigma f} = \frac{4348.26}{31} = 140.27$

So, the arithmetic mean of the squared deviations of the observations from the mean is
140.27

4) Standard Deviation $= \sqrt{S^2} = \sqrt{140.27} = 11.84$

So, the standard deviation of observations is 11.84

5) Co-Efficient of Variation, $CV = \frac{S}{\bar{x}} \times 100 = \frac{11.84}{33.71} \times 100 = 35.12\%$
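
The dispersion measures can be recomputed from the grouped data as a cross-check. This is a minimal sketch that assumes class midpoints of 15, 25, 35, 45 and 55 and divides by Σf, as the formulas above do. Because it works with unrounded intermediate values, the last digit can differ slightly from the hand calculation.

```python
import math

# (class midpoint, frequency) pairs for the five classes.
grouped = [(15, 5), (25, 6), (35, 11), (45, 6), (55, 3)]

n = sum(f for _, f in grouped)                        # total frequency
mean = sum(f * x for x, f in grouped) / n             # grouped mean

value_range = 60 - 10                                 # highest upper limit - lowest lower limit
mean_deviation = sum(f * abs(x - mean) for x, f in grouped) / n
variance = sum(f * (x - mean) ** 2 for x, f in grouped) / n
std_dev = math.sqrt(variance)
cv = std_dev / mean * 100                             # coefficient of variation, in percent

print(f"range={value_range} MD={mean_deviation:.2f} "
      f"variance={variance:.2f} SD={std_dev:.2f} CV={cv:.1f}%")
```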
x y x-x̅ y-y̅ (x-x̅)² (y-y̅)² (x-x̅)(y-y̅)
6 25 -2.58 -8.94 6.66 79.92 23.07
3 12 -5.58 -21.94 31.14 481.36 122.43
9 35 0.42 1.06 0.18 1.12 0.45
9 37 0.42 3.06 0.18 9.36 1.29
4 14 -4.58 -19.94 20.98 397.60 91.33
8 33 -0.58 -0.94 0.34 0.88 0.55
9 37 0.42 3.06 0.18 9.36 1.29
11 45 2.42 11.06 5.86 122.32 26.77
8 32 -0.58 -1.94 0.34 3.76 1.13
14 55 5.42 21.06 29.38 443.52 114.15
6 25 -2.58 -8.94 6.66 79.92 23.07
11 42 2.42 8.06 5.86 64.96 19.51
6 22 -2.58 -11.94 6.66 142.56 30.81
10 38 1.42 4.06 2.02 16.48 5.77
14 55 5.42 21.06 29.38 443.52 114.15
10 38 1.42 4.06 2.02 16.48 5.77
5 18 -3.58 -15.94 12.82 254.08 57.07
7 28 -1.58 -5.94 2.50 35.28 9.39
5 18 -3.58 -15.94 12.82 254.08 57.07
11 45 2.42 11.06 5.86 122.32 26.77
11 42 2.42 8.06 5.86 64.96 19.51
7 28 -1.58 -5.94 2.50 35.28 9.39
8 32 -0.58 -1.94 0.34 3.76 1.13
4 15 -4.58 -18.94 20.98 358.72 86.75
6 25 -2.58 -8.94 6.66 79.92 23.07
12 48 3.42 14.06 11.70 197.68 48.09
9 36 0.42 2.06 0.18 4.24 0.87
14 55 5.42 21.06 29.38 443.52 114.15
9 35 0.42 1.06 0.18 1.12 0.45
9 37 0.42 3.06 0.18 9.36 1.29
11 45 2.42 11.06 5.86 122.32 26.77
Σx = 266 Σy = 1052 Σ(x-x̅)² = 265.55 Σ(y-y̅)² = 4299.87 Σ(x-x̅)(y-y̅) = 1063.16

x̅ = 8.58 y̅ = 33.94
Co-efficient of Correlation:

The Coefficient of Correlation (r) is a measure of the strength of the linear relationship between
two variables.

It can range from -1.00 to 1.00

Co-efficient of Correlation, $r = \frac{\Sigma(x - \bar{x})(y - \bar{y})}{\sqrt{\Sigma(x - \bar{x})^2 \, \Sigma(y - \bar{y})^2}}$

$= \frac{1063.16}{\sqrt{265.55 \times 4299.87}}$

$= \frac{1063.16}{1068.56}$
= 0.995, or 0.99 to two decimal places

As we know, if 0.5 ≤ r ≤ 1, then variables have strong positive correlation. So, admixture
production time and production quantity have a strong positive correlation.

Coefficient of Determination:

Coefficient of Determination (r²) is the proportion of the total variation in the dependent
variable y that is explained or accounted for by the variation in the independent variable
x.

The coefficient of determination is the square of the coefficient of correlation.

It can range from 0 to 1.00

So, Co-efficient of Determination, r² = (0.995)²

= 0.99

So, about 99% of the variation in admixture production quantity can be explained by the variation in
production time.
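
Both coefficients can be recomputed directly from the raw paired observations. The sketch below uses only the Python standard library; the helper name `deviation_sums` is hypothetical, and the two lists are copied from the x and y columns of the table.

```python
import math

# Production hours (x) and quantities in MT (y), copied from the data table.
hours = [6, 3, 9, 9, 4, 8, 9, 11, 8, 14, 6, 11, 6, 10, 14, 10,
         5, 7, 5, 11, 11, 7, 8, 4, 6, 12, 9, 14, 9, 9, 11]
quantities = [25, 12, 35, 37, 14, 33, 37, 45, 32, 55, 25, 42, 22, 38, 55, 38,
              18, 28, 18, 45, 42, 28, 32, 15, 25, 48, 36, 55, 35, 37, 45]


def deviation_sums(x, y):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy


sxx, syy, sxy = deviation_sums(hours, quantities)
r = sxy / math.sqrt(sxx * syy)      # coefficient of correlation
r_squared = r ** 2                  # coefficient of determination

print(f"Sxx={sxx:.2f} Syy={syy:.2f} Sxy={sxy:.2f}")   # Sxx=265.55 Syy=4299.87 Sxy=1063.16
print(f"r={r:.3f} r^2={r_squared:.3f}")               # r=0.995 r^2=0.990
```

Squaring the unrounded r avoids the rounding error that appears when r is first cut to two decimal places.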
Scatter Diagram:

➢ A plot of the paired observations of X and Y on a graph
➢ Graphically shows the relationship between two variables
➢ Common practice is to place the dependent variable on Y–axis and independent
variable on X–axis

Here, dependent variable is production quantity (Y) and independent variable is production time (X)

Scatter Diagram (figure): Production Time on the horizontal axis, Production Quantity on the vertical axis.

Regression Model:

➢ In regression analysis an equation is developed to express the relationship
between dependent and independent variables
➢ In linear regression, the equation is linear

Purpose: To determine the regression equation; it is used to predict the value of the
dependent variable (Y) based on the independent variable (X).

General form of linear regression model:

Y = a + bX

Where, Y is the dependent variable, X is the independent variable, b is the regression co-efficient (slope) and
a is the intercept term.

$b = \frac{\Sigma(x - \bar{x})(y - \bar{y})}{\Sigma(x - \bar{x})^2}$

$= \frac{1063.16}{265.55}$

= 4.004

$a = \bar{Y} - b\bar{X}$

= 33.935 - 4.004 * 8.581

= 33.935 - 34.358

= -0.42

So, Linear Regression Model is, Y = -0.42 + 4.00 X
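
The least-squares coefficients can be verified, and the model used for prediction, with a few lines of code. This is an illustrative sketch: `fit_line` and `predict` are hypothetical names, and the 10-hour input is an arbitrary example value, not a figure from the assignment.

```python
# Production hours (x) and quantities in MT (y), copied from the data table.
hours = [6, 3, 9, 9, 4, 8, 9, 11, 8, 14, 6, 11, 6, 10, 14, 10,
         5, 7, 5, 11, 11, 7, 8, 4, 6, 12, 9, 14, 9, 9, 11]
quantities = [25, 12, 35, 37, 14, 33, 37, 45, 32, 55, 25, 42, 22, 38, 55, 38,
              18, 28, 18, 45, 42, 28, 32, 15, 25, 48, 36, 55, 35, 37, 45]


def fit_line(x, y):
    """Least-squares fit of Y = a + bX; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                  # regression coefficient (slope)
    a = mean_y - b * mean_x        # intercept
    return a, b


a, b = fit_line(hours, quantities)


def predict(production_hours):
    """Predicted production quantity (MT) for a given number of production hours."""
    return a + b * production_hours


print(f"Y = {a:.2f} + {b:.2f} X")                      # Y = -0.42 + 4.00 X
print(f"Predicted quantity for 10 hours: {predict(10):.1f} MT")   # 39.6 MT
```

The slope of about 4 means each additional production hour is associated with roughly 4 MT of additional output within the observed range of 3 to 14 hours.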
