
Chapter 5

This document contains R code for exploring and visualizing the distribution of sample data. Functions such as hist(), density(), stem(), and qqnorm() are used to examine the distributions and compare them with a theoretical normal distribution. Samples are drawn from the data, random numbers are generated from normal, Poisson, and uniform distributions, and the Shapiro-Wilk and Kolmogorov-Smirnov normality tests are applied.

Uploaded by

Rajat Bansal

5. DATA: DISTRIBUTION

> data2

[1] 3 5 7 5 3 2 6 8 5 6 9 4 5 7 3 4

> table(data2)

data2

2 3 4 5 6 7 8 9

1 3 2 4 2 2 1 1

> stem(data2)

The decimal point is at the |

2 | 0000

4 | 000000

6 | 0000

8 | 00

> stem(data2,scale=2)

The decimal point is at the |

2 | 0

3 | 000

4 | 00

5 | 0000

6 | 00

7 | 00

8 | 0

9 | 0

> data4

[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0 9.0 12.5 14.5 17.0 8.0 21.0

> stem(data4)

The decimal point is 1 digit(s) to the right of the |

0 | 899

1 | 11233

1 | 55777

2 | 13

> stem(data4,scale=2)

The decimal point is at the |

8 | 000

10 | 00

12 | 055

14 | 55

16 | 000

18 |

20 | 0

22 | 0

> data2

[1] 3 5 7 5 3 2 6 8 5 6 9 4 5 7 3 4

> hist(data2)

> hist(data2,breaks = 'sturges')

> hist(data2,breaks='scott')

> hist(data2,breaks='FD')

> hist(data2,breaks=2:9)

> hist(data2,breaks=c(2,3,4,5,6,7,8,9))

> hist(data2,breaks=c(2,4,5,6,9))
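
The breaks= variants above differ only in where the bin edges fall. As a minimal sketch that is not part of the original session, the bins can be inspected without drawing anything by passing plot=FALSE. The object name h is chosen here for illustration only.

```r
# data2 as printed in the session above
data2 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9, 4, 5, 7, 3, 4)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(data2, breaks = 2:9, plot = FALSE)

h$breaks   # bin edges 2, 3, ..., 9
h$counts   # observations per bin; the first bin [2, 3] includes both edges

# With freq = FALSE the bar heights are densities, so the bar areas sum to 1
sum(h$density * diff(h$breaks))

# Number of classes suggested by Sturges' rule, the default for breaks
nclass.Sturges(data2)
```

Because the first bin is closed on both sides, the single 2 and the three 3s land in the same bar, which is why the histogram with breaks=2:9 does not look like the table() output.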

> hist(data2,col='gray75',main=NULL,xlab='size class for data2',ylim=c(0,0.3),freq=FALSE)

> dens=density(data2)

> dens

Call:

density.default(x = data2)

Data: data2 (16 obs.); Bandwidth 'bw' = 0.9644

x y

Min. :-0.8932 Min. :0.0002982

1st Qu.: 2.3034 1st Qu.:0.0134042

Median : 5.5000 Median :0.0694574

Mean : 5.5000 Mean :0.0781187

3rd Qu.: 8.6966 3rd Qu.:0.1396352

Max. :11.8932 Max. :0.1798531

> names(dens)

[1] "x" "y" "bw" "n" "call" "data.name" "has.na"

> str(dens)

List of 7

$ x : num [1:512] -0.893 -0.868 -0.843 -0.818 -0.793 ...

$ y : num [1:512] 0.000313 0.000339 0.000367 0.000397 0.000429 ...

$ bw : num 0.964

$ n : int 16

$ call : language density.default(x = data2)

$ data.name: chr "data2"

$ has.na : logi FALSE

- attr(*, "class")= chr "density"

> plot(dens$x,dens$y)

> plot(density(data2),main="",xlab='size bin classes')

> hist(data2,freq=F,col='gray85')

> lines(density(data2),lty=2)

> lines(density(data2,k='rectangular'))
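
The density object used above can also be checked numerically. The following is a sketch under the assumption that the default settings of density() are in effect; the names step, area, and dens.rect are illustrative and do not appear in the original session.

```r
# data2 as printed in the session above
data2 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9, 4, 5, 7, 3, 4)

dens <- density(data2)

# The default bandwidth comes from the rule of thumb bw.nrd0()
all.equal(dens$bw, bw.nrd0(data2))

# The x grid is equally spaced, so a Riemann sum approximates the area
step <- dens$x[2] - dens$x[1]
area <- sum(dens$y) * step
area   # should be close to 1

# k='rectangular' in the session is a partial match for the kernel argument
dens.rect <- density(data2, kernel = "rectangular")
all.equal(dens.rect$bw, dens$bw)   # same bandwidth, different kernel shape
```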

> rnorm(20,mean=5,sd=1)

[1] 4.079572 5.742426 4.463164 3.954680 4.359709 5.825043 5.649413 3.823335 3.722325 5.145951

[11] 3.862853 4.442077 5.239563 4.148004 5.109675 6.373912 5.599388 5.136036 4.826468 5.324387

> pnorm(5,mean=5,sd=1)

[1] 0.5

> qnorm(0.5,5,1)

[1] 5

> dnorm(c(54,5,6))

[1] 0.000000e+00 1.486720e-06 6.075883e-09

> dnorm(c(54,5,6),mean=5,sd=1)

[1] 0.0000000 0.3989423 0.2419707

> dnorm(c(4,5,6))

[1] 1.338302e-04 1.486720e-06 6.075883e-09

> qnorm(c(0.05,0.95),mean=5,sd=1)

[1] 3.355146 6.644854
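
The calls above use the d, p, and q prefixes for the density, the cumulative probability, and the quantile function of the normal distribution. A short sketch of how they relate, with the variable names m, s, and q chosen here for illustration:

```r
m <- 5   # mean used in the session above
s <- 1   # standard deviation used in the session above

# qnorm() and pnorm() are inverses of each other
q <- qnorm(c(0.05, 0.95), mean = m, sd = s)
pnorm(q, mean = m, sd = s)   # recovers the probabilities 0.05 and 0.95

# The density at the mean equals 1 / (s * sqrt(2 * pi))
dnorm(m, mean = m, sd = s)
1 / (s * sqrt(2 * pi))

# Without mean= and sd= the standard normal is assumed, so 5 lies far
# out in the tail and the density is tiny
dnorm(5)
```

This is why dnorm(c(4,5,6)) and dnorm(c(54,5,6),mean=5,sd=1) give such different values for the same input 5.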

> data2.norm=rnorm(1000,mean(data2),sd(data2))

> hist(data2,freq=FALSE)

> lines(density(data2.norm))

> hist(data2.norm,freq=F)

> lines(density(data2))

> hist(data2.norm,freq=F,border='gray50',main='comparing two distributions',xlab='data2 size classes')

> lines(density(data2),lwd=2)

> rpois(50,lambda=10)

[1] 12 13 11 7 14 5 10 10 6 10 3 9 14 14 6 8 8 8 7 7 9 14 12 11 6 11 15 9 10 10 12 11

[33] 15 15 14 12 6 15 4 6 9 11 13 14 12 8 8 10 6 14

> pbinom(c(3,6,9,12),size=17,prob=0.5)

[1] 0.006362915 0.166152954 0.685470581 0.975479126

> qt(0.975,df=c(5,10,100,Inf))

[1] 2.570582 2.228139 1.983972 1.959964

> (1-pt(c(1.6,1.9,2.2),df=Inf))*2

[1] 0.10959858 0.05743312 0.02780690

> pt(c(1.6,1.9,2.2),Inf)

[1] 0.9452007 0.9712834 0.9860966
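
The expression (1-pt(...))*2 above is a two-sided p-value. As a sketch, the same numbers can be obtained with lower.tail=FALSE, and with df=Inf the t distribution coincides with the standard normal. The names tvals, p1, p2, and p3 are illustrative only.

```r
tvals <- c(1.6, 1.9, 2.2)   # test statistics used in the session above

# Two-sided p-value as written in the session
p1 <- (1 - pt(tvals, df = Inf)) * 2

# Same quantity via the upper tail, which avoids the subtraction from 1
p2 <- 2 * pt(tvals, df = Inf, lower.tail = FALSE)

# With infinite degrees of freedom the t distribution is the standard normal
p3 <- 2 * pnorm(-tvals)

all.equal(p1, p2)
all.equal(p1, p3)

# The same holds for quantiles
all.equal(qt(0.975, df = Inf), qnorm(0.975))
```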

> runif(10)

[1] 0.03654488 0.38184345 0.32327513 0.79359610 0.90218825 0.64389728 0.15583453 0.32216108

[9] 0.90528344 0.72820431

> punif(6,min=0,max=10)

[1] 0.6

> data2

[1] 3 5 7 5 3 2 6 8 5 6 9 4 5 7 3 4

> sample(data2[data2<5])

[1] 2 3 4 3 4 3

> sample(data2[data2>5],size=3)

[1] 7 7 6

> sample(data2[data2>5])

[1] 7 6 8 9 6 7

> sample(data2[data2>8])

[1] 3 8 9 7 5 1 6 4 2

> data2

[1] 3 5 7 5 3 2 6 8 5 6 9 4 5 7 3 4

> set.seed(4)

> sample(data2,size=3)

[1] 8 9 7

> set.seed(4)

> sample(data2[data2>8])

[1] 8 3 9 7 4 6 2 1 5
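
The nine values returned by sample(data2[data2>8]) look odd because only one element of data2 exceeds 8. When sample() receives a single number n of at least 1, it samples from 1:n instead of from that one value. The sketch below shows the effect and one way around it; the helper resample follows the pattern suggested in the sample() help page and is not part of the original session.

```r
# data2 as printed in the session above
data2 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9, 4, 5, 7, 3, 4)

big <- data2[data2 > 8]
length(big)   # only one element, the value 9

# sample(9) is treated as sample(1:9)
set.seed(4)
a <- sample(big)
set.seed(4)
b <- sample(1:9)
all(a == b)

# Index-based helper that always samples from the elements of x itself
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(big)   # returns the single value in big
```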

> data2

[1] 3 5 7 5 3 2 6 8 5 6 9 4 5 7 3 4

> shapiro.test(data2)

Shapiro-Wilk normality test

data: data2

W = 0.96332, p-value = 0.7223

> shapiro.test(rpois(100,lambda=5))

Shapiro-Wilk normality test

data: rpois(100, lambda = 5)

W = 0.9735, p-value = 0.04122

> ks.test(data2,"pnorm",mean=5,sd=2)

One-sample Kolmogorov-Smirnov test

data: data2

D = 0.125, p-value = 0.9639

alternative hypothesis: two-sided

Warning message:

In ks.test(data2, "pnorm", mean = 5, sd = 2) :

ties should not be present for the Kolmogorov-Smirnov test

> qqnorm(data2)

> qqnorm(data2,main = 'QQ plot of example data',xlab = 'theoretical',ylab='quantiles for data2')

> qqline(data2,lwd=2,lty=2)
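
The values behind the QQ plot can be extracted without plotting. The following sketch assumes the default settings of qqnorm() and qqline(); the names qq, probs, slope, and intercept are illustrative.

```r
# data2 as printed in the session above
data2 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9, 4, 5, 7, 3, 4)

# plot.it = FALSE returns the coordinates instead of drawing them
qq <- qqnorm(data2, plot.it = FALSE)

# x holds theoretical normal quantiles, y holds the data in original order
all.equal(sort(qq$x), qnorm(ppoints(length(data2))))
all(qq$y == data2)

# By default qqline() passes through the first and third quartiles
probs <- c(0.25, 0.75)
slope <- diff(quantile(data2, probs)) / diff(qnorm(probs))
intercept <- quantile(data2, probs)[1] - slope * qnorm(probs)[1]
unname(c(intercept, slope))
```

The line drawn by qqline() is therefore a robust reference based on the quartiles, not a least-squares fit such as the abline(lm(...)) call used further down.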

> qqplot(rpois(50,5),rnorm(50,5,1))

> qqplot(data2,data1)

> qqp=qqplot(data2,rnorm(50,5,1))

> abline(lm(qqp$y~qqp$x))
