R Programming Unit 3

The document discusses data visualization in R, highlighting various packages such as plotly, ggplot2, and shiny, which facilitate the creation of interactive and aesthetically pleasing graphics. It covers standard graphics functions, the grammar of graphics, and the advantages and disadvantages of data visualization, including insights into pie charts, bar charts, and boxplots. Additionally, it provides examples of creating these visualizations with R code.

Uploaded by

Chaya Anu
Copyright
© All Rights Reserved

R PROGRAMMING V - SEM BCA

UNIT – 3
STATISTICS & PROBABILITY

R Data Visualization

In R, we can create visually appealing data visualizations with just a few lines of code, using R's
diverse graphics functionality. Data visualization is an efficient technique for gaining insight into
data through a visual medium. With the help of visualization techniques, a human can easily discover
hidden patterns in data that might otherwise be overlooked.

By using data visualization techniques, we can work with large datasets and efficiently obtain key
insights about them.

R Visualization Packages

Many R packages are available for data visualization. Some popular ones are as follows:

1) plotly

The plotly package creates interactive, high-quality graphs that can be viewed in a web browser. It
is built on the JavaScript library plotly.js.

2) ggplot2

R allows us to create graphics declaratively, and the ggplot2 package is used for this purpose. It is
famous for its elegant, high-quality graphs, which set it apart from other visualization packages.

SHREE MEDHA DEGREE COLLEGE MANJESH M



3) tidyquant

tidyquant is a financial package used for carrying out quantitative financial analysis. It brings
financial analysis into the tidyverse and is used for importing, analyzing, and visualizing
financial data.

4) taucharts

taucharts is an R interface to Taucharts, a data-focused JavaScript charting library. It provides a
declarative interface for rapidly mapping data fields to visual properties.

5) ggiraph

ggiraph is a tool that allows us to create dynamic, interactive ggplot2 graphs. It lets us add
tooltips, JavaScript actions, and animations to the graphics.

6) geofacet

This package provides geofaceting functionality for 'ggplot2'. Geofaceting arranges a sequence of
plots for different geographical entities into a grid that preserves some of the geographical
orientation.

7) googleVis

googleVis provides an interface between R and Google Charts. With the help of this package, we
can create web pages with interactive charts based on R data frames.

8) RColorBrewer

This package provides color schemes, designed by Cynthia Brewer, for maps and other graphics.

9) dygraphs

The dygraphs package is an R interface to the dygraphs JavaScript charting library. It provides
rich features for charting time-series data in R.

10) shiny

The shiny package allows us to develop interactive and aesthetically pleasing web apps in R. It can
be extended with HTML widgets, CSS, and JavaScript.


R Graphics
Graphics play an important role in bringing out the important features of data. Graphics are used
to examine marginal distributions, relationships between variables, and summaries of very large
datasets. They are a very important complement to many statistical and computational techniques.

Standard Graphics

R's standard graphics are available through the base package graphics, which includes several
functions for statistical plots, such as:

o Scatterplots
o Pie charts
o Boxplots
o Barplots etc.

Each of these graphs is typically produced with a single function call.
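As a minimal sketch of the "single function call" idea, the code below draws one plot of each kind listed above from the built-in mtcars dataset and writes them to a PNG file. The output file name and the choice of columns are illustrative assumptions, not taken from the original notes.

```r
# Four base-graphics plots, each produced by a single function call.
# The output file name "standard_graphics.png" is an arbitrary choice.
png(file = "standard_graphics.png", width = 800, height = 800)
par(mfrow = c(2, 2))                                  # arrange plots in a 2 x 2 grid

plot(mtcars$wt, mtcars$mpg, main = "Scatterplot")     # weight vs mileage
pie(table(mtcars$cyl), main = "Pie chart")            # share of cars by cylinder count
boxplot(mpg ~ cyl, data = mtcars, main = "Boxplot")   # mileage per cylinder group
barplot(table(mtcars$gear), main = "Barplot")         # number of cars per gear count

dev.off()
```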

The basics of the grammar of graphics

There are some key elements of a statistical graphic. These elements are the basics of the grammar
of graphics. Let's discuss each of the elements one by one to gain the basic knowledge of graphics.


1) Data

Data is the most crucial element: it is the input that the graphic processes and displays.

2) Aesthetic Mappings

Aesthetic mappings are one of the most important elements of a statistical graphic. They control
how data variables are mapped to graphical variables. For example, in a scatter plot an aesthetic
mapping can map the temperature variable of a dataset to the x variable, and the species of a plant
to the color of the dots.

3) Geometric Objects

Geometric objects (geoms) are the shapes used to represent each observation, for example a point
for each row of the data. A point geom uses the aesthetic mappings to place two variables of the
dataset on the x and y axes of the plot.

4) Statistical Transformations

Statistical transformations compute a statistical summary of the data for display in the plot. For
example, a statistical transformation can approximate the x, y data with a regression line, or count
the occurrences of certain values.

5) Scales

Scales map data values to values in the coordinate system of the graphics device.

6) Coordinate system


The coordinate system plays an important role in plotting the data, because it determines how
positions are laid out on the plot. Common coordinate systems are:

o Cartesian
o Polar

7) Faceting

Faceting is used to split the data into subgroups and draw sub-graphs for each group.
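The seven elements above correspond to the pieces of a ggplot2 call. The sketch below, which assumes the ggplot2 package is installed, labels each piece with the element it represents; the dataset (mtcars), the chosen columns, and the output file name are only illustrative.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
library(ggplot2)

p <- ggplot(data = mtcars,                          # 1) data
            aes(x = wt, y = mpg,                    # 2) aesthetic mappings: columns -> x, y
                colour = factor(cyl))) +            #    ... and cylinder count -> colour
  geom_point() +                                    # 3) geometric object: one point per car
  geom_smooth(method = "lm", formula = y ~ x,       # 4) statistical transformation:
              se = FALSE) +                         #    a fitted regression line per group
  scale_y_continuous(name = "Miles per gallon") +   # 5) scale for the y position
  coord_cartesian() +                               # 6) coordinate system (Cartesian)
  facet_wrap(~ am)                                  # 7) faceting: one panel per transmission type

png(file = "grammar_of_graphics.png")
print(p)    # a ggplot object is only drawn when printed
dev.off()
```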

Advantages of Data Visualization in R


1. Understanding

Business information is more attractive to look at, and easier to understand, in graphics and charts
than in a written document full of text and numbers. Visualizations can therefore reach a wider
audience, and they encourage the widespread use of business insights to make better decisions.

2. Efficiency

Visualization lets us display a lot of information in a small space. Although decision-making in
business is inherently complex and multifunctional, displaying evaluation findings in a graph helps
companies organize a lot of interrelated information in useful ways.

3. Location

Applications that use features such as geographic maps and GIS are particularly relevant to business
when location is an important factor. Maps can show business insights from various locations,
along with the seriousness of the issues, the reasons behind them, and the working groups
addressing them.

Disadvantages of Data Visualization in R

1. Cost

Developing visualization applications can cost a good amount of money. Small companies in
particular may not be able to spend many resources on them. To generate reports, many companies
employ professionals to create charts, which increases costs. Small enterprises often operate in
resource-limited settings, yet they still need timely evaluation results, which can be of high
importance.

2. Distraction


At times, data visualization apps create highly complex, graphics-rich reports and charts that may
entice users to focus more on form than on function. If visual appeal comes first, the overall value
of the graphic representation will be minimal. In resource-limited settings, it is necessary to
understand how resources can best be used, and not to get caught up in graphics trends without a
clear purpose.

R Pie Charts

The R programming language has several libraries for creating charts and graphs. A pie chart
represents values as slices of a circle, each with a different color. The slices are labeled with a
description, and the numbers corresponding to each slice can also be shown in the chart.
However, the R documentation does not recommend pie charts, as their characteristics are limited.
It recommends a bar chart or dot chart instead, because people judge length more accurately than
area.

Pie charts are created with the pie() function, which takes a vector of non-negative numbers as
input. Additional parameters control labels, colors, titles, etc.

The pie() function has the following syntax:

pie(x, labels, radius, main, col, clockwise)

Here,

1. x is a vector that contains the numeric values used in the pie chart.
2. labels gives a description to each slice.
3. radius sets the radius of the pie chart.
4. main sets the title of the chart.
5. col defines the color palette.
6. clockwise is a logical value that indicates whether the slices are drawn clockwise (TRUE) or
anti-clockwise (FALSE, the default).
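A minimal sketch using the radius and clockwise parameters described above. The data matches the examples in this section; the radius value and output file name are arbitrary choices.

```r
# Data for the slices (same values as the examples in this section).
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Sri Lanka", "Nepal")

png(file = "pie_basic.png")
# radius = 0.9 draws a slightly smaller pie;
# clockwise = TRUE draws the slices clockwise, starting at 12 o'clock.
pie(x, labels = labels, radius = 0.9,
    main = "Country Pie Chart", clockwise = TRUE)
dev.off()
```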

Title and color


A pie chart has several more features that we can use by adding more parameters to the pie()
function. We can give our pie chart a title by passing the main parameter. We can also draw the
chart with a rainbow color palette by passing the col parameter.


Note: The length of the palette should be the same as the number of values in the chart, so we use
the length() function to pass that number to rainbow().

Let's see an example to understand how these methods work in creating an attractive pie chart with
title and color.

Example

# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Sri Lanka", "Nepal")
# Giving the chart file a name (png() writes PNG data, so use a .png extension).
png(file = "title_color.png")
# Plotting the chart.
pie(x, labels, main = "Country Pie Chart", col = rainbow(length(x)))
# Saving the file.
dev.off()

Output:


Slice Percentage & Chart Legend

A pie chart has two more useful properties: slice percentages and a chart legend. We can show the
data as percentages, and we can add a legend to a plot in R by using the legend() function, which
has the following syntax:

legend(x, y = NULL, legend, fill, col, bg)

Here,

o x and y are the coordinates used to position the legend; x can also be a keyword such as
"topright".
o legend is the text of the legend.
o fill is the color used to fill the boxes beside the legend text.
o col defines the color of the lines and points beside the legend text.
o bg is the background color of the legend box.

Example

# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Sri Lanka", "Nepal")
# Percentage share of each slice, rounded to one decimal place.
pie_percent <- round(100 * x / sum(x), 1)
# Giving the chart file a name.
png(file = "per_pie.png")
# Plotting the chart with the percentages as slice labels.
pie(x, labels = paste0(pie_percent, "%"), main = "Country Pie Chart", col = rainbow(length(x)))
legend("topright", labels, cex = 0.8, fill = rainbow(length(x)))
# Saving the file.
dev.off()

Output:


R Bar Charts

A bar chart is a pictorial representation in which the numerical values of variables are represented
by the length or height of lines or rectangles of equal width. A bar chart is used to summarize a set
of categorical data: the data is shown as rectangular bars whose length is proportional to the value
of the variable.

In R, we can create a bar chart to visualize the data in an efficient manner. For this purpose, R
provides the barplot() function, which has the following syntax:

barplot(H, xlab, ylab, main, names.arg, col)

S.No Parameter Description

1. H A vector or matrix containing the numeric values used in the bar chart (the height argument).

2. xlab A label for the x-axis.

3. ylab A label for the y-axis.

4. main A title of the bar chart.


5. names.arg A vector of names that appear under each bar.

6. col It is used to give colors to the bars in the graph.

Labels, Title & Colors

Like pie charts, we can add more features to a bar chart by passing more arguments to the
barplot() function. We can add a title with the main parameter and colors with the col parameter.
We can also pass names.arg, a vector with the same number of values as the input vector, to
describe the meaning of each bar.

Let's see an example to understand how labels, titles, and colors are added in our bar chart.

Example

# Creating the data for the bar chart
H <- c(12, 35, 54, 3, 41)
M <- c("Feb", "Mar", "Apr", "May", "Jun")
# Giving the chart file a name
png(file = "bar_properties.png")
# Plotting the bar chart
barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue", col = "green",
        main = "Revenue Bar Chart", border = "red")
# Saving the file
dev.off()

Output:


Group Bar Chart & Stacked Bar Chart

We can create grouped and stacked bar charts by passing a matrix as the input. Each column of the
matrix gives one bar (or one group of bars), and each row gives one segment of a stacked bar, or one
bar within a group when beside = TRUE.

Let's see an example to understand how these charts are created.

Example

months <- c("Jan", "Feb", "Mar", "Apr", "May")
regions <- c("West", "North", "South")
# Creating the matrix of the values (one row per region, one column per month).
Values <- matrix(c(21,32,33,14,95,46,67,78,39,11,22,23,94,15,16), nrow = 3, ncol = 5, byrow = TRUE)
# Giving the chart file a name
png(file = "stacked_chart.png")
# Creating the stacked bar chart
barplot(Values, main = "Total Revenue", names.arg = months, xlab = "Month", ylab = "Revenue",
        col = c("cadetblue3", "deeppink2", "goldenrod1"))
# Adding the legend to the chart
legend("topleft", regions, cex = 1.3, fill = c("cadetblue3", "deeppink2", "goldenrod1"))
# Saving the file
dev.off()

Output:
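The example above draws a stacked bar chart, which is what barplot() produces for a matrix by default. A grouped bar chart comes from the same matrix by adding beside = TRUE. The sketch below is a minimal example; the file name, title, and legend position are arbitrary choices. With beside = TRUE, barplot() returns the bar midpoints as a matrix with one row per region and one column per month.

```r
months  <- c("Jan", "Feb", "Mar", "Apr", "May")
regions <- c("West", "North", "South")
# Same revenue matrix as the stacked example: rows = regions, columns = months.
Values <- matrix(c(21, 32, 33, 14, 95, 46, 67, 78, 39, 11, 22, 23, 94, 15, 16),
                 nrow = 3, ncol = 5, byrow = TRUE)
cols <- c("cadetblue3", "deeppink2", "goldenrod1")

png(file = "grouped_chart.png")
# beside = TRUE places the three regional bars side by side within each month.
mid <- barplot(Values, beside = TRUE, names.arg = months, col = cols,
               main = "Revenue by Region", xlab = "Month", ylab = "Revenue")
legend("topleft", regions, fill = cols)
dev.off()
```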

R Boxplot
A boxplot shows how the values in a dataset are distributed. It uses the three quartiles to divide the
data into four parts, and displays the minimum, first quartile, median, third quartile, and maximum
of the data (values far outside the box are drawn separately as outliers). Boxplots are also useful for
comparing the distributions of several groups by drawing one box for each group.
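To see the numbers a boxplot is built from, boxplot() can return its statistics without drawing anything when plot = FALSE. The sketch below uses the mtcars data from the examples in this section. Note that base R draws the whiskers to the most extreme values within 1.5 times the box length (the default range = 1.5), and the box edges are Tukey's hinges, which can differ slightly from quartiles computed by quantile().

```r
# Compute boxplot statistics of mpg for each cylinder group, without plotting.
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker;
# one column per group (4, 6 and 8 cylinders).
b$stats
b$n     # number of cars in each group
b$out   # values drawn separately as outliers (if any)
```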

R provides the boxplot() function to create a boxplot. It has the following syntax:

boxplot(x, data, notch, varwidth, names, main)

Here,


S.No Parameter Description

1. x It is a vector or a formula.

2. data It is the data frame.

3. notch A logical value; set it to TRUE to draw a notch.

4. varwidth A logical value; set it to TRUE to draw boxes with widths proportional to the square root of the sample size of each group.

5. names It is the group of labels that will be printed under each boxplot.

6. main It is used to give a title to the graph.

Let's see an example to understand how we can create a boxplot in R. In the example below, we use
the "mtcars" dataset available in the R environment, and only two of its columns: "mpg" (miles per
gallon) and "cyl" (number of cylinders). The example draws a boxplot of mpg for each value of cyl.

Example

# Giving a name to the chart file.
png(file = "boxplot.png")
# Plotting the chart: one box of mpg values per cylinder count.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Quantity of Cylinders",
        ylab = "Miles Per Gallon", main = "R Boxplot Example")
# Saving the file.
dev.off()

Output:


Boxplot using notch


In R, we can draw a boxplot with notches. Notches help us compare the medians of different groups:
if the notches of two boxes do not overlap, this is strong evidence that their medians differ. Let's
see an example of a boxplot drawn with a notch for each group.

In the example below, we use the same dataset, "mtcars".

Example

# Giving a name to our chart.
png(file = "boxplot_using_notch.png")
# Plotting the chart.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Quantity of Cylinders",
        ylab = "Miles Per Gallon",
        main = "Boxplot Example",
        notch = TRUE,
        varwidth = TRUE,
        col = c("green", "yellow", "red"),
        names = c("High", "Medium", "Low")
)
# Saving the file.
dev.off()

Output:

R Histogram

A histogram is a type of bar chart that shows how many values fall into each of a set of value
ranges. A histogram shows the distribution of a single variable, whereas a bar chart is used for
comparing different entities. In a histogram, the height of each bar represents the number of values
in the corresponding range.

To create a histogram, R provides the hist() function, which takes a vector as input and accepts
more parameters to add functionality. The hist() function has the following syntax:

hist(v, main, xlab, ylab, xlim, ylim, breaks, col, border)

Here,


S.No Parameter Description

1. v It is a vector that contains numeric values.

2. main It indicates the title of the chart.

3. col It is used to set the color of the bars.

4. border It is used to set the border color of each bar.

5. xlab It is used to describe the x-axis.

6. ylab It is used to describe the y-axis.

7. xlim It is used to specify the range of values on the x-axis.

8. ylim It is used to specify the range of values on the y-axis.

9. breaks It controls the bins: either a single number giving the suggested number of bars, or a vector of break points between the bars.
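To see how breaks groups values into bars, hist() can return its bin counts without drawing when plot = FALSE. The sketch below passes explicit break points (an illustrative choice) for the same data used in the examples that follow. By default each bin includes its right end point, so the bins are [0, 20], (20, 40], and (40, 60].

```r
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

# Explicit break points give three bins of width 20.
h <- hist(v, breaks = c(0, 20, 40, 60), plot = FALSE)
h$breaks   # 0 20 40 60
h$counts   # 5 4 2: number of values in each bin
```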

Let's see an example in which we create a simple histogram with the help of parameters like v,
xlab, col, and border.

Example

# Creating data for the graph.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)
# Giving a name to the chart file.
png(file = "histogram_chart.png")
# Creating the histogram.
hist(v, xlab = "Weight", ylab = "Frequency", col = "green", border = "red")
# Saving the file.
dev.off()

Output:


Let's see another example in which we use more parameters of the hist() function to add
functionality and create a more attractive chart.

Example: Use of xlim & ylim parameter

# Creating data for the graph.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)
# Giving a name to the chart file.
png(file = "histogram_chart_lim.png")
# Creating the histogram with fixed axis ranges and about 5 bars.
hist(v, xlab = "Weight", ylab = "Frequency", col = "green", border = "red",
     xlim = c(0, 40), ylim = c(0, 3), breaks = 5)
# Saving the file.
dev.off()

Output:


R Line Graphs

A line graph is a pictorial representation of information that changes continuously over time. A
line graph is also called a line chart. In a line graph, data points are connected by lines to show the
continuous change, and the lines move up and down according to the data. We can use a line graph
to compare different events, information, and situations.

A line chart connects a series of points by drawing line segments between them. Line charts are
used to identify trends in data. For line graph construction, R provides the plot() function, which
has the following syntax:

plot(v, type, col, xlab, ylab, main)

Here,

S.No Parameter Description

1. v It is a vector which contains the numeric values.


2. type This parameter takes the value "l" to draw only the lines, "p" to draw only the points, and
"o" to draw both lines and points.

3. xlab It is the label for the x-axis.

4. ylab It is the label for the y-axis.

5. main It is the title of the chart.

6. col It is used to give the color for both the points and lines
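A small sketch comparing the three type values described in the table, drawn side by side with the data from the examples below. The file name and panel layout are arbitrary choices.

```r
v <- c(13, 22, 28, 7, 31)

png(file = "line_types.png", width = 900, height = 300)
par(mfrow = c(1, 3))                       # three plots in one row
plot(v, type = "l", main = 'type = "l"')   # lines only
plot(v, type = "p", main = 'type = "p"')   # points only
plot(v, type = "o", main = 'type = "o"')   # lines and points overplotted
dev.off()
```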

Let’s see a basic example to understand how plot() function is used to create the line graph:

Example

# Creating the data for the chart.
v <- c(13, 22, 28, 7, 31)
# Giving a name to the chart file.
png(file = "line_graph.png")
# Plotting the line chart with both points and lines.
plot(v, type = "o")
# Saving the file.
dev.off()

Output:


Line Chart Title, Color, and Labels

As with other graphs and charts, we can add more features to a line chart by passing more
parameters. We can color the lines and points, label the axes, and give the chart a title. Let's see an
example of how these parameters are used in the plot() function to create an attractive line graph.

Example

# Creating the data for the chart.
v <- c(13, 22, 28, 7, 31)
# Giving a name to the chart file.
png(file = "line_graph_feature.png")
# Plotting the line chart with color, axis labels, and a title.
plot(v, type = "o", col = "green", xlab = "Month", ylab = "Temperature",
     main = "Temperature Chart")
# Saving the file.
dev.off()

Output:


Line Charts Containing Multiple Lines


In the previous examples, each line graph contained only one line. R also lets us draw several lines
in one graph: the lines() function adds a line to an existing plot.

Each call to lines() takes another input vector and draws it as an additional line. Let's see an
example to understand how this function is used:

Example

# Creating the data for the chart.
v <- c(13, 22, 28, 7, 31)
w <- c(11, 13, 32, 6, 35)
x <- c(12, 22, 15, 34, 35)
# Giving a name to the chart file.
png(file = "multi_line_graph.png")
# Plotting the first line; ylim covers all three series so no line is cut off.
plot(v, type = "o", col = "green", xlab = "Month", ylab = "Temperature",
     ylim = range(v, w, x))
# Adding the second and third lines to the same plot.
lines(w, type = "o", col = "red")
lines(x, type = "o", col = "blue")
# Saving the file.
dev.off()

Output:

Line Graph using ggplot2


In R, another way to create a line graph is with the ggplot2 package. It provides the geom_line(),
geom_step(), and geom_path() functions for creating line graphs. To use these functions, we first
install the ggplot2 package and then load it into the current session with library(ggplot2).

Let's see an example of how ggplot2 is used to create a line graph. In the example below, we build
a small data frame of dose and tooth-length values modeled on the predefined ToothGrowth dataset,
which describes the effect of vitamin C on tooth growth in guinea pigs.

Example

library(ggplot2)
# Creating data for the graph
data_frame <- data.frame(dose = c("D0.5", "D1", "D2"),
                         len = c(4.2, 10, 29.5))
head(data_frame)
# "%d" in the file name saves each of the three plots as a separate numbered file
png(file = "line_graph_ggplot%d.png")
# Basic line plot with points (group = 1 joins all points into one line)
print(ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) +
        geom_line() + geom_point())
# Change the line type
print(ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) +
        geom_line(linetype = "dashed") + geom_point())
# Change the color
print(ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) +
        geom_line(color = "red") + geom_point())
dev.off()

Output:

R Scatterplots

Scatterplots are used to compare two variables, for example when we need to see how much one
variable is affected by another. In a scatterplot, the data is represented as a collection of points.
Each point defines the values of the two variables: one variable is plotted on the vertical axis and
the other on the horizontal axis. In R, there are two ways of creating a scatterplot: using the plot()
function and using the functions of the ggplot2 package.

The plot() function creates a scatterplot with the following syntax:

plot(x, y, main, xlab, ylab, xlim, ylim, axes)

Here,

S.No Parameters Description


1. x It is the dataset whose values are the horizontal coordinates.

2. y It is the dataset whose values are the vertical coordinates.

3. main It is the title of the graph.

4. xlab It is the label on the horizontal axis.

5. ylab It is the label on the vertical axis.

6. xlim It gives the limits of the x values used for plotting.

7. ylim It gives the limits of the y values used for plotting.

8. axes It indicates whether both axes should be drawn on the plot.

Let's see an example to understand how we can construct a scatterplot using the plot function. In
our example, we will use the dataset "mtcars", which is the predefined dataset available in the R
environment.

Example

# Fetching two columns from mtcars
data <- mtcars[, c("wt", "mpg")]
# Giving a name to the chart file.
png(file = "scatterplot.png")
# Plotting the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = data$wt, y = data$mpg, xlab = "Weight", ylab = "Mileage",
     xlim = c(2.5, 5), ylim = c(15, 30), main = "Weight vs Mileage")
# Saving the file.
dev.off()

Output:


Scatterplot using ggplot2


In R, another way to create a scatterplot is with the ggplot2 package.

The ggplot2 package provides the ggplot() and geom_point() functions for creating a scatterplot.
The first argument of ggplot() is the data frame, and the second is the aes() function, in which we
map variables to the x-axis and y-axis.

Let's start understanding how the ggplot2 package is used with the help of an example where we
have used the familiar dataset "mtcars".

Example

# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot_ggplot.png")
# Plotting the chart using ggplot() and geom_point(); print() draws it on the device.
print(ggplot(mtcars, aes(x = drat, y = mpg)) + geom_point())
# Saving the file.
dev.off()

Output:

We can also add more features, such as colors, point sizes, and titles, to make scatter plots more
attractive.
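A minimal sketch of such additions, assuming ggplot2 is installed: points are colored by cylinder count, enlarged, and given a title and axis labels with labs(). The color mapping, label text, and file name are illustrative choices.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = drat, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +                         # larger points
  labs(title = "Rear Axle Ratio vs Mileage",     # chart title
       x = "Rear axle ratio (drat)",
       y = "Miles per gallon",
       colour = "Cylinders")                     # legend title

png(file = "scatterplot_ggplot_features.png")
print(p)
dev.off()
```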


COMMON PROBABILITY DISTRIBUTIONS

Introduction:

R comes with built-in implementations of many probability distributions. Each distribution is
associated with four functions that follow a naming convention: the probability density function
begins with 'd', the cumulative distribution function begins with 'p', the inverse cumulative
distribution (quantile) function begins with 'q', and the function that generates random variates
begins with 'r'. Each function takes as its first argument the value(s) at which to evaluate it (for the
'r' functions, the number of values to generate), followed by the parameters that define the
particular distribution.
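The naming convention can be seen with the binomial distribution, here 10 fair coin tosses (size = 10 and prob = 0.5 are illustrative choices). For this case the exact values are easy to check by hand: P(X = 3) = C(10, 3) / 2^10 = 120/1024, and P(X <= 3) = (1 + 10 + 45 + 120)/1024 = 176/1024.

```r
# d: probability of exactly 3 heads in 10 fair tosses
dbinom(3, size = 10, prob = 0.5)    # 0.1171875 = 120/1024
# p: probability of at most 3 heads
pbinom(3, size = 10, prob = 0.5)    # 0.171875 = 176/1024
# q: smallest number of heads whose cumulative probability reaches 0.5
qbinom(0.5, size = 10, prob = 0.5)  # 5
# r: five simulated experiments of 10 tosses each (seed fixed for reproducibility)
set.seed(1)
rbinom(5, size = 10, prob = 0.5)
```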

Some of the Common Probability Distribution Functions in R

Name       Probability Density      Cumulative Distribution   Quantile                 Random
Normal     dnorm(x, mean, sd)       pnorm(q, mean, sd)        qnorm(p, mean, sd)       rnorm(n, mean, sd)
Poisson    dpois(x, lambda)         ppois(q, lambda)          qpois(p, lambda)         rpois(n, lambda)
Binomial   dbinom(x, size, prob)    pbinom(q, size, prob)     qbinom(p, size, prob)    rbinom(n, size, prob)

R Normal Distribution

In random collections of data from independent sources, the distribution of the data is commonly
found to be normal. This means that if we plot a graph with the values of the variable on the
horizontal axis and the count of each value on the vertical axis, we get a bell-shaped curve. The
center of the curve represents the mean of the dataset: fifty percent of the values lie to the left of
the mean and the other fifty percent to the right. This is referred to as the normal
distribution.
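The fifty-percent statement can be checked with pnorm(), described in the following subsections. The mean 2.0 and standard deviation 0.5 match the examples below; the one-standard-deviation range is an added illustration.

```r
mu <- 2.0; sigma <- 0.5

# Half of the distribution lies below the mean.
below_mean <- pnorm(mu, mean = mu, sd = sigma)             # 0.5

# Probability of falling within one standard deviation of the mean.
within_1sd <- pnorm(mu + sigma, mean = mu, sd = sigma) -
              pnorm(mu - sigma, mean = mu, sd = sigma)     # about 0.683
below_mean
within_1sd
```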

R provides the following functions for the normal distribution:

o dnorm(x, mean, sd)
o pnorm(x, mean, sd)
o qnorm(p, mean, sd)
o rnorm(n, mean, sd)

These functions can take the following parameters:

S.No Parameter Description


1. x It is a vector of numbers.

2. p It is a vector of probabilities.

3. n It is the number of observations (values) to generate.

4. mean It is the mean of the distribution; its default value is 0.

5. sd It is the standard deviation; its default value is 1.

Let's start understanding how these functions are used with the help of the examples.

dnorm(): Density
The dnorm() function calculates the height of the probability density at each point for a given
mean and standard deviation. The probability density of the normal distribution with mean μ and
standard deviation σ is:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
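As a check on the normal density formula, f(x) = exp(-(x - μ)^2 / (2σ^2)) / (σ√(2π)), the sketch below evaluates it directly and compares the result with dnorm(). The mean and standard deviation match the example below; the point x = 2.3 is an arbitrary choice.

```r
x <- 2.3; mu <- 2.0; sigma <- 0.5

# Normal density written out from the formula.
manual <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))

manual
dnorm(x, mean = mu, sd = sigma)
all.equal(manual, dnorm(x, mean = mu, sd = sigma))   # TRUE
```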

Example

# Creating a sequence of numbers between -1 and 20 incrementing by 0.2.
x <- seq(-1, 20, by = 0.2)
# Choosing the mean as 2.0 and standard deviation as 0.5.
y <- dnorm(x, mean = 2.0, sd = 0.5)
# Giving a name to the chart file.
png(file = "dnorm.png")
# Plotting the graph
plot(x, y)
# Saving the file.
dev.off()

Output:


pnorm(): Direct Look-Up

The pnorm() function computes the "cumulative distribution function". It calculates the probability
that a normally distributed random number is less than or equal to a given value. The cumulative
distribution function is:

F(x) = P(X \le x)
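A short sketch of direct look-up with pnorm(), using the same mean and standard deviation as the example below; the value 3 is an arbitrary choice. The lower.tail argument of pnorm() (default TRUE) switches the result to P(X > x), so the two results add up to 1.

```r
# P(X <= 3) for X ~ Normal(mean = 2, sd = 0.5); 3 is two sds above the mean.
p_below <- pnorm(3, mean = 2.0, sd = 0.5)                       # about 0.977
# P(X > 3), the upper tail.
p_above <- pnorm(3, mean = 2.0, sd = 0.5, lower.tail = FALSE)   # about 0.023
p_below + p_above                                                # 1
```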

Example

# Creating a sequence of numbers between -1 and 20 incrementing by 0.1.
x <- seq(-1, 20, by = 0.1)
# Choosing the mean as 2.0 and standard deviation as 0.5.
y <- pnorm(x, mean = 2.0, sd = 0.5)
# Giving a name to the chart file.
png(file = "pnorm.png")
# Plotting the graph
plot(x, y)
# Saving the file.
dev.off()


Output:

qnorm(): Inverse Look-Up
The qnorm() function takes a probability value as input and calculates the number whose
cumulative probability matches that value. The cumulative distribution function F and the inverse
cumulative distribution function F^{-1} are related by

p = F(x)
x = F^{-1}(p)
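The inverse relation can be checked by feeding the output of pnorm() into qnorm(), which should return the original value. The mean and standard deviation match the examples; the value 2.8 is an arbitrary choice.

```r
mu <- 2.0; sigma <- 0.5

p <- pnorm(2.8, mean = mu, sd = sigma)   # direct look-up: P(X <= 2.8)
x <- qnorm(p, mean = mu, sd = sigma)     # inverse look-up recovers 2.8
p
x

qnorm(0.975)   # 97.5th percentile of the standard normal, about 1.96
```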

Example

# Creating a sequence of probabilities between 0 and 1 incrementing by 0.01.
x <- seq(0, 1, by = 0.01)
# Choosing the mean as 2.0 and standard deviation as 0.5.
y <- qnorm(x, mean = 2.0, sd = 0.5)
# Giving a name to the chart file.
png(file = "qnorm.png")
# Plotting the graph (qnorm(0) is -Inf and qnorm(1) is Inf, so those two points are not drawn).
plot(y, x)
# Saving the file.
dev.off()

Output:


rnorm(): Random variates
The rnorm() function is used for generating normally distributed random numbers. This function
generates random numbers by taking the sample size as an input. Let's see an example in which
we draw a histogram for showing the distribution of the generated numbers.

Example

# Generating 1500 normally distributed random numbers with mean 80 and sd 15.
x <- rnorm(1500, mean = 80, sd = 15)
# Giving a name to the chart file.
png(file = "rnorm.png")
# Creating a histogram scaled as a probability density.
hist(x, probability = TRUE, col = "red", border = "black")
# Saving the file.
dev.off()

Output:
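Because rnorm() output is random, the sketch below fixes the seed with set.seed() so the draws are reproducible, checks that the sample mean and standard deviation are close to the requested 80 and 15, and overlays the theoretical density with curve(). The seed value and file name are arbitrary choices.

```r
set.seed(42)                                  # reproducible random numbers
samples <- rnorm(1500, mean = 80, sd = 15)

mean(samples)   # close to 80
sd(samples)     # close to 15

png(file = "rnorm_with_curve.png")
hist(samples, probability = TRUE, col = "red", border = "black")
# Overlay the normal density the samples were drawn from.
curve(dnorm(x, mean = 80, sd = 15), add = TRUE, lwd = 2)
dev.off()
```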

Poisson Distribution:

The Poisson distribution with rate λ is often used to model the number of events occurring in a
fixed interval of time or space. It takes the values x = 0, 1, 2, 3, ….

There are four Poisson functions available in R:

o dpois()
o ppois()
o qpois()
o rpois()


These are dpois() (probability mass function), ppois() (cumulative distribution function), qpois()
(quantile function) and rpois() (random generation). Because the Poisson distribution is discrete,
dpois() and ppois() are defined on the non-negative integers.

The probability mass function (PMF) of the Poisson distribution is given by the formula:

$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$

• P(X = k) is the probability of observing k events
• λ is the average rate of events in the given time interval
• e is the base of the natural logarithm
• k! represents the factorial of k
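To connect the formula with dpois(), this sketch evaluates it by hand for λ = 2 and k = 0 to 5 (illustrative values) and compares the result with the built-in function; the two columns should agree up to floating-point rounding:

```r
lambda <- 2
k <- 0:5
# PMF written out from the formula: e^(-lambda) * lambda^k / k!
manual <- exp(-lambda) * lambda^k / factorial(k)
# Built-in Poisson PMF
builtin <- dpois(k, lambda)
print(round(cbind(k, manual, builtin), 4))
```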

dpois():

The dpois() function computes the Poisson probability mass function, that is, the probability
P(X = x) of observing exactly x events. Its values are often plotted to show the shape of a
Poisson distribution.

Syntax:

dpois(x, lambda, log = FALSE)

• x: the number of events for which the probability is to be calculated
• lambda: the average rate of occurrence (a non-negative numeric value)
• log: a logical value; if TRUE, probabilities are returned as log values

Ex:

lambda <- 2
# Compute the probability mass function (PMF) for a specific value of k
k <- 3
pmf_value <- dpois(k, lambda)
cat("Probability mass function (PMF) for k = ", k, ": ", pmf_value, "\n", sep = "")
Output:
Probability mass function (PMF) for k = 3: 0.180447

ppois():
The ppois() function computes the Poisson cumulative distribution function, that is, the
probability P(X ≤ q) that the number of events is less than or equal to q. Like dpois(), its
values are often plotted.
Syntax:
ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)

• q: number of events in an interval
• lambda: mean number of events per interval
• lower.tail: if TRUE (the default), the left tail P(X ≤ q) is returned; if FALSE, the right tail P(X > q)
• log.p: if TRUE, the function returns probabilities in log form

Ex:
lambda <- 4
# Calculate the CDF for k
k <- 3
cdf_value <- ppois(k, lambda)
cat("Cumulative distribution function (CDF) for k = ", k, " and lambda = ", lambda, ": ", cdf_value, "\n", sep = "")

Output:
Cumulative distribution function (CDF) for k = 3 and lambda = 4: 0.4334701
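Since the Poisson distribution is discrete, the CDF at k is the sum of the PMF values from 0 to k. This sketch checks that for the example above and also shows the upper tail obtained with lower.tail = FALSE:

```r
lambda <- 4
# P(X <= 3) as the sum P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
by_sum <- sum(dpois(0:3, lambda))
by_cdf <- ppois(3, lambda)
# Right tail P(X > 3)
upper <- ppois(3, lambda, lower.tail = FALSE)
print(c(by_sum = by_sum, by_cdf = by_cdf, upper = upper))
```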

rpois():
The rpois() function generates random numbers from a given Poisson distribution.

Syntax:
rpois(n, lambda)
• n: number of random numbers needed
• lambda: mean number of events per interval

Ex:
lam <- 2.5
rv <- rpois(10, lam)
print(rv)

Output (the values are random and change from run to run): [1] 0 5 2 0 3 3 0 0 2 5
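A minimal sketch for making the draws reproducible with set.seed() and checking them against a known property of the Poisson distribution: its mean and variance both equal λ. The seed and the sample size of 10000 are arbitrary choices:

```r
set.seed(1)  # arbitrary seed, for reproducibility
rv <- rpois(10000, lambda = 2.5)
# For a Poisson distribution both the mean and the variance equal lambda
print(mean(rv))
print(var(rv))
```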

qpois():
The qpois() function computes quantiles of a given Poisson distribution.
In probability, quantiles are cut points that divide the range of a probability distribution into
intervals with equal probabilities. Because the Poisson distribution is discrete, qpois(p, lambda)
returns the smallest integer x such that P(X ≤ x) ≥ p.

Syntax:
qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)
• p: vector of probabilities
• lambda: mean number of events per interval
• lower.tail: if TRUE (the default), probabilities are P(X ≤ x); otherwise they are P(X > x)
• log.p: if TRUE, probabilities p are given as log(p)

Ex:
lambda <- 2
probability <- 0.7
quantile_value <- qpois(probability, lambda)
cat("Quantile value for probability", probability, "and lambda", lambda, ":", quantile_value, "\n")

Output: Quantile value for probability 0.7 and lambda 2 : 3
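The quantile rule can be checked by hand: list the cumulative probabilities P(X ≤ x) and take the first x at which they reach p. A small sketch for the example above (λ = 2, p = 0.7); stopping at x = 5 is an assumption that is just large enough for this case:

```r
lambda <- 2
p <- 0.7
x <- 0:5
cdf <- ppois(x, lambda)
print(round(cdf, 4))
# First x whose cumulative probability is at least p
x_manual <- x[which(cdf >= p)[1]]
print(x_manual)
print(qpois(p, lambda))
```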

Binomial Distribution

The binomial distribution is a discrete probability distribution that gives the probability of a
given number of successes in a series of independent trials, where each trial has only two
possible outcomes and the same probability of success. Tossing a coin is the classic example: each
toss gives either a head or a tail, and the probability of getting exactly three heads in ten
tosses is given by the binomial distribution.

R provides four functions for the binomial distribution: dbinom(), pbinom(), qbinom() and rbinom().

These functions can have the following parameters:

S.No  Parameter  Description
1.    x          A vector of numbers (quantiles).
2.    p          A vector of probabilities.
3.    n          The number of observations (random values) to generate.
4.    size       The number of trials.
5.    prob       The probability of success on each trial.

Let's see how these functions are used with the help of some examples.

dbinom(): Direct Look-Up, Points

The dbinom() function of R calculates the probability at each point. In simple words, it
calculates the probability mass function P(X = x) of a particular binomial distribution.
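Returning to the coin example from the introduction, this sketch computes the probability of exactly three heads in ten tosses of a fair coin with dbinom() and with the binomial formula choose(n, k) * p^k * (1 - p)^(n - k) written out by hand:

```r
# P(exactly 3 heads in 10 tosses of a fair coin)
p_builtin <- dbinom(3, size = 10, prob = 0.5)
# Same probability from the formula choose(n, k) * p^k * (1 - p)^(n - k)
p_formula <- choose(10, 3) * 0.5^3 * (1 - 0.5)^(10 - 3)
print(p_builtin)  # 0.1171875, i.e. 120/1024
```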

Example

# Creating a sequence of integers from 0 to 100.
x <- seq(0, 100, by = 1)
# Computing P(X = x) for a binomial distribution with 50 trials and success probability 0.5.
y <- dbinom(x, 50, 0.5)
# Giving a name to the chart file.
png(file = "dbinom.png")
# Plotting the graph.
plot(x, y)
# Saving the file.
dev.off()

Output:


pbinom(): Direct Look-Up, Intervals

The pbinom() function of R calculates the cumulative probability (a single value representing the
probability) of an event. In simple words, it calculates the cumulative distribution function
P(X ≤ x) of a particular binomial distribution.
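For a discrete distribution the CDF is the running sum of the PMF, so pbinom() can be cross-checked against dbinom(). The numbers in this sketch (at most three heads in ten tosses) are illustrative and differ from the example that follows:

```r
# P(X <= 3) for 10 tosses of a fair coin, computed two ways
by_cdf <- pbinom(3, size = 10, prob = 0.5)
by_sum <- sum(dbinom(0:3, size = 10, prob = 0.5))
print(c(by_cdf, by_sum))  # both 176/1024 = 0.171875
```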

Example

# Probability of getting 20 or fewer heads from 48 tosses of a coin.
x <- pbinom(20, 48, 0.5)
# Showing output
print(x)

Output:


qbinom(): Inverse Look-Up

The qbinom() function of R takes a probability value and returns the number whose cumulative
probability matches that value. In simple words, it calculates the inverse cumulative
distribution function of the binomial distribution: qbinom(p, size, prob) is the smallest x such
that P(X ≤ x) ≥ p.

Let's find the number of heads corresponding to a cumulative probability of 0.45 when a coin is
tossed 48 times.
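The defining property of the binomial quantile (the smallest x with P(X ≤ x) ≥ p) can be verified directly. This sketch uses the same values as the example below:

```r
x <- qbinom(0.45, size = 48, prob = 0.5)
# x reaches the target probability ...
print(pbinom(x, 48, 0.5) >= 0.45)
# ... and x - 1 does not, so x is the smallest such value
print(pbinom(x - 1, 48, 0.5) < 0.45)
```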

Example

# Finding number of heads with the help of qbinom() function
x <- qbinom(0.45, 48, 0.5)
# Showing output
print(x)

Output:

rbinom()

The rbinom() function of R generates the required number of random values from a binomial
distribution with a given number of trials and probability of success.

Let's see an example in which we generate nine random values, each the number of successes in 160
trials with a success probability of 0.5.
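Each value returned by rbinom() is a count of successes, so it must be a whole number between 0 and size. A reproducible sketch (the seed and the larger sample of 1000 draws are arbitrary additions) that also checks the sample mean against the theoretical mean size × prob = 80:

```r
set.seed(7)  # arbitrary seed, for reproducibility
x <- rbinom(1000, size = 160, prob = 0.5)
# Every draw is a whole number of successes between 0 and 160
print(range(x))
# The sample mean should be close to size * prob = 80
print(mean(x))
```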

Example

# Finding random values
x <- rbinom(9, 160, 0.5)
# Showing output
print(x)

Output:


The Uniform Distribution in R

A uniform distribution is a probability distribution in which every value in the interval from a
to b is equally likely to be chosen. The probability of obtaining a value between x1 and x2 on an
interval from a to b can be found using the formula:

$P(x_1 < X < x_2) = \frac{x_2 - x_1}{b - a}$

The uniform distribution has the following properties:

• the mean of the distribution is $\mu = \frac{a + b}{2}$
• the variance of the distribution is $\sigma^2 = \frac{(b - a)^2}{12}$
• the standard deviation of the distribution is $\sigma = \sqrt{\sigma^2}$
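The formulas above can be checked numerically. In this sketch the interval [a, b] = [2, 5] and the sub-interval [3, 4.5] are illustrative choices; the probability is computed from the formula and as a difference of punif() values, and the mean and variance formulas are compared with a large simulated sample (seed chosen arbitrarily):

```r
a <- 2; b <- 5
x1 <- 3; x2 <- 4.5
# Probability of landing in [x1, x2]: formula vs. difference of CDF values
p_formula <- (x2 - x1) / (b - a)
p_cdf <- punif(x2, a, b) - punif(x1, a, b)
print(c(p_formula, p_cdf))  # both 0.5
# Mean and variance from the formulas
mu <- (a + b) / 2          # 3.5
sigma2 <- (b - a)^2 / 12   # 0.75
# Compare with a simulated sample
set.seed(123)
s <- runif(100000, a, b)
print(c(mean(s), var(s)))
```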

dunif():

The dunif() function is used to compute the density of the uniform distribution at specified
points, although the uniform distribution is not as commonly used as other distributions. For a
continuous probability distribution, the density at x is the value of the probability density
function at that point, f(x); for the uniform distribution on [a, b], f(x) = 1/(b - a) inside the
interval and 0 outside it.

Syntax:

dunif(x, min = 0, max = 1)

• x: a vector of quantiles
• min: lower limit of the distribution (default value is 0 in R)
• max: upper limit of the distribution (default value is 1 in R)

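Ex:
# Hand-written uniform density; named my_dunif so that it does not mask the built-in dunif()
my_dunif <- function(x, min, max) {
  den <- ifelse(x >= min & x <= max, 1 / (max - min), 0)
  return(den)
}
xval <- seq(0, 1, by = 0.1)
den_val <- my_dunif(xval, 0, 1)
print(den_val)
# The built-in dunif(xval, min = 0, max = 1) returns the same values.

Output: [1] 1 1 1 1 1 1 1 1 1 1 1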

punif():
The punif() function in R is used to compute the cumulative distribution function (CDF) of the
uniform distribution. It calculates the probability that a random observation from a uniform
distribution will be less than or equal to a specific value.
Syntax:
punif(q, min = 0, max = 1, lower.tail = TRUE)
• q: the quantile (a numeric vector of values)
• min: lower limit of the distribution (default value is 0)
• max: upper limit of the distribution (default value is 1)
• lower.tail: if TRUE (the default), returns P(X ≤ q); otherwise P(X > q)

Ex:
q<-0.4
minv<-0.2
maxv<-0.8

cum_prob<-punif(q, min=minv, max=maxv)


print(cum_prob)

Output:[1] 0.3333333
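Within the interval, punif() is simply (q - min) / (max - min); below min it returns 0 and above max it returns 1. A short sketch comparing the built-in with that expression, using three illustrative q values and the interval from the example above:

```r
q <- c(0.1, 0.4, 0.9)  # below, inside and above the interval [0.2, 0.8]
builtin <- punif(q, min = 0.2, max = 0.8)
# Linear formula, clamped to the range [0, 1]
manual <- pmin(pmax((q - 0.2) / (0.8 - 0.2), 0), 1)
print(builtin)
print(manual)
```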

qunif():
The qunif() function in R is used to compute quantiles of the uniform distribution.

Syntax:
qunif(p, min = 0, max = 1)
• p: the probability (or vector of probabilities) at which to compute the quantile
• min: the minimum value of the distribution (lower limit of the interval)
• max: the maximum value of the distribution (upper limit of the interval)
The qunif() function returns the quantiles corresponding to the probabilities supplied in the p
argument.

Ex:
prob<-c(0.2, 0.5, 0.8) #probabilities
minv<-2 #Lower limit of the interval
maxv<-5 #Upper limit of the interval

quant<-qunif(prob, min=minv, max=maxv)


print(quant)


Output: [1] 2.6 3.5 4.4
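For the uniform distribution the quantile has a closed form, min + p × (max - min), and qunif() is the inverse of punif(). This sketch checks both facts with the values from the example above:

```r
prob <- c(0.2, 0.5, 0.8)
minv <- 2
maxv <- 5
quant <- qunif(prob, min = minv, max = maxv)
# Closed form: min + p * (max - min)
manual <- minv + prob * (maxv - minv)
# punif() undoes qunif()
back <- punif(quant, min = minv, max = maxv)
print(rbind(quant, manual, back))
```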

runif():
The runif() function in R is used to generate random numbers from a uniform distribution. It
produces a specified number of random variates within a defined interval.

Syntax:
runif(n, min = 0, max = 1)
• n: the number of random values to generate
• min: the lower limit of the interval (default = 0)
• max: the upper limit of the interval (default = 1)

Ex:
runif(1)

runif(5)

runif(10, min = 5, max = 10)

Output: the values are random and change from run to run. For the last call, values such as
8.552517, 9.490746, 6.250996, 6.765012, 5.633094, 7.700866, 8.516678, 5.650098 and 8.942687 are
printed, all lying between 5 and 10.
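Because runif() is random, the printed values differ on every run. A sketch showing how set.seed() makes the draws repeatable and confirming that every value falls inside the requested interval (the seed value is arbitrary):

```r
set.seed(2024)
first <- runif(10, min = 5, max = 10)
set.seed(2024)  # resetting the same seed repeats the same sequence
second <- runif(10, min = 5, max = 10)
print(identical(first, second))  # TRUE
print(range(first))              # both ends lie between 5 and 10
```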

Bernoulli Distribution:
The Bernoulli distribution is a discrete probability distribution that represents the outcome of
a random experiment with two possible outcomes: success and failure. It is named after Jacob
Bernoulli, a Swiss mathematician, and is a special case of the binomial distribution (a single
trial). The Bernoulli distribution takes the value 1 with probability p and the value 0 with
probability 1 - p, where 0 ≤ p ≤ 1.

In this distribution, the mean (expected value) of a Bernoulli random variable X is $E[X] = p$,
and the variance is $\mathrm{Var}[X] = p(1 - p)$.
Base R has no dedicated Bernoulli functions; the four functions below, dbern(), pbern(), qbern()
and rbern(), come from the extraDistr package, which must be installed and then loaded with
library(extraDistr).
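Because a Bernoulli variable is a binomial variable with a single trial, the base R binomial functions with size = 1 give the same results without installing extraDistr. This sketch (p = 0.3 and the seed are illustrative values) also checks the mean and variance formulas on a simulated sample:

```r
p <- 0.3
# PMF: P(X = 0) = 1 - p, P(X = 1) = p
print(dbinom(c(0, 1), size = 1, prob = p))  # 0.7 0.3
# CDF: P(X <= 0) = 1 - p, P(X <= 1) = 1
print(pbinom(c(0, 1), size = 1, prob = p))
# Simulated sample: mean close to p, variance close to p * (1 - p)
set.seed(99)
x <- rbinom(100000, size = 1, prob = p)
print(c(mean(x), var(x), p * (1 - p)))
```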

dbern():
The dbern() function in R computes the probability mass function of the Bernoulli distribution.

Syntax: dbern(x, prob, log = FALSE)

Parameter:
• x: vector of quantiles
• prob: probability of success on each trial
• log: logical; if TRUE, probabilities p are given as log(p)

Ex:
library(extraDistr)

prob <- 0.3
xval <- c(0, 1)
den <- dbern(xval, prob)
print(den)

Output: [1] 0.7 0.3

pbern():
The pbern() function in R gives the distribution function of the Bernoulli distribution.
The distribution function, also called the cumulative distribution function (CDF) or cumulative
frequency function, describes the probability that a variate X takes on a value less than or equal
to a number x.

Syntax: pbern(q, prob, lower.tail = TRUE, log.p = FALSE)

Parameter:
• q: vector of quantiles
• prob: probability of success on each trial
• lower.tail: logical; if TRUE (the default), probabilities are P(X ≤ x), otherwise P(X > x)
• log.p: logical; if TRUE, probabilities p are given as log(p)

Ex:
library(extraDistr)
prob <- 0.3

cum_prob <- pbern(0:1, prob = prob, lower.tail = TRUE)
print(cum_prob)

Output: [1] 0.7 1.0

qbern():
The qbern() function gives the quantile function of the Bernoulli distribution.
A quantile function specifies the value of the random variable such that the probability of the
variable being less than or equal to that value equals the given probability.

Syntax: qbern(p, prob, lower.tail = TRUE, log.p = FALSE)

Parameter:
• p: vector of probabilities
• prob: probability of success on each trial
• lower.tail: logical value
• log.p: logical; if TRUE, probabilities p are given as log(p)

Ex:
if (!requireNamespace("extraDistr", quietly = TRUE)) {
  install.packages("extraDistr")
}
library(extraDistr)
prob <- 0.3                           # probability of success
quant_val <- qbern(0.5, prob = prob)  # quantile at cumulative probability 0.5
print(quant_val)

Output: [1] 0

rbern():
The rbern() function in R is used to generate a vector of random numbers that follow a Bernoulli
distribution.

Syntax:
rbern(n, prob)

Parameter:
• n: number of observations
• prob: probability of success

Ex:
# install.packages("extraDistr")
# Loading the 'extraDistr' package
library(extraDistr)
prob <- 0.4

samp <- rbern(10, prob)
print(samp)

Output (random, so it changes from run to run): [1] 0 0 1 0 0 1 1 0 0 0

Student’s t-distribution:

The Student’s t-distribution (or simply the t-distribution) is a probability distribution that
arises when estimating the mean of a normally distributed population in situations where the sample
size is small and the population standard deviation is unknown. It is also used for constructing
confidence intervals and hypothesis tests on the population mean.

Key properties of the t-distribution include:

• Shape: The t-distribution is bell-shaped and symmetrical, like the standard normal
distribution, but with heavier tails.
• Centrality: The median and mode of the t-distribution are 0, and so is the mean whenever it
exists (for more than 1 degree of freedom).
• Degrees of freedom (df): The t-distribution is characterized by its degrees of freedom, denoted
ν, which determine its shape. For small sample sizes (few degrees of freedom), the
t-distribution has more probability in the tails than the normal distribution; as the degrees of
freedom increase, it approaches the standard normal distribution.
• Probability density function (PDF): The PDF of the t-distribution involves the gamma function
and is defined for any real number t:
$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$
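As a check on the PDF formula, this sketch writes it out with R's gamma() function and compares it with the built-in dt() for ν = 5 at a few points. It also illustrates the shape property: as the degrees of freedom grow, the density at 0 rises towards that of the standard normal, dnorm(0). The evaluation points and degrees of freedom are illustrative choices, and t_pdf is a hypothetical helper name:

```r
# PDF of the t-distribution written out from the formula (nu = degrees of freedom)
t_pdf <- function(t, nu) {
  gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2)) * (1 + t^2 / nu)^(-(nu + 1) / 2)
}
t_vals <- c(-2, 0, 1.5)
print(rbind(formula = t_pdf(t_vals, 5), builtin = dt(t_vals, 5)))
# Density at 0 for increasing degrees of freedom, compared with the standard normal
nus <- c(1, 5, 30, 100)
print(round(dt(0, nus), 4))
print(round(dnorm(0), 4))
```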
R supports the Student’s t-distribution through the functions dt(), pt(), qt() and rt(), which
compute the probability density function (PDF), the cumulative distribution function (CDF) and
the quantile function, and generate random deviates, respectively. Here is a summary of these
functions:
dt()
dt(x, df) computes the probability density function (PDF) of the Student’s t-distribution at the
specified values.
x: Vector of values at which to evaluate the PDF.
df: Degrees of freedom.
Example
density <- dt(1.5, 10)
print(density)
Output: [1] 0.127444

Example
Program to demonstrate the use of dt() along with a chart to visualize the t-distribution:
# Define the range of values
x <- seq(-3, 3, length.out = 100)
# Set the degrees of freedom
df <- 5
# Compute the probability density function for the given values
densities <- dt(x, df)

# Create a plot of the t-distribution
plot(x, densities, type = "l",
     main = "Density of Student's t-distribution",
     xlab = "x", ylab = "Density")

Output:

pt():
pt(q, df) computes the cumulative distribution function (CDF) of the Student’s t-distribution for
the given values.
q: Vector of quantiles.
df: Degrees of freedom.

Example
cumulative_prob <- pt(1.5, 10)
print(cumulative_prob)

Output: [1] 0.9177463
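A common use of pt() is turning a t statistic into a p-value. Because the distribution is symmetric, pt(-x, df) equals 1 - pt(x, df), and a two-sided p-value is 2 * pt(-|t|, df). A sketch reusing the statistic 1.5 with 10 degrees of freedom from the example above:

```r
t_stat <- 1.5
df <- 10
# Symmetry: lower tail at -1.5 equals upper tail at 1.5
lower <- pt(-t_stat, df)
upper <- 1 - pt(t_stat, df)
print(c(lower, upper))
# Two-sided p-value for the statistic
p_two_sided <- 2 * pt(-abs(t_stat), df)
print(p_two_sided)
```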

Example
Program to demonstrate the use of pt() along with a chart to visualize the t-distribution:
# Set the degrees of freedom
df <- 10
# Generate a sequence of values for the x-axis
x <- seq(-3, 3, length.out = 100)
# Calculate the lower tail probabilities for the t-distribution
p_values <- pt(x, df)
# Plot the cumulative distribution function
plot(x, p_values, type = "l",
     main = "T-Distribution",
     xlab = "X", ylab = "P(T <= x)",
     col = "blue", lwd = 2)
Output:

qt():

qt(p, df) computes the quantiles of the Student’s t-distribution for the given probabilities.
p: Vector of probabilities.
df: Degrees of freedom.

Example
quantiles <- qt(0.95, 10)
print(quantiles)
Output: [1] 1.812461
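qt() is typically used to find the critical value for a confidence interval. This sketch computes the 95% two-sided critical value for 10 degrees of freedom, qt(0.975, 10), confirms that pt() maps it back to 0.975, and shows that it exceeds the corresponding normal value qnorm(0.975), which reflects the heavier tails:

```r
df <- 10
t_crit <- qt(0.975, df)     # leaves 2.5% in each tail
print(t_crit)
print(pt(t_crit, df))       # back to 0.975
print(qnorm(0.975))         # normal critical value, smaller than t_crit
```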

Example
Program to demonstrate the use of qt() along with a chart to visualize the t-distribution:
# Set the degrees of freedom
df <- 10
# Compute quantiles of the Student's t-distribution
quantiles <- qt(seq(0.01, 0.99, by = 0.01), df)
plot(seq(0.01, 0.99, by = 0.01), quantiles, type = "l",
     xlab = "Probability", ylab = "Quantiles",
     main = paste("Quantiles of t-Distribution (df =", df, ")"))

Output

rt()
rt(n, df) generates random deviates from a Student’s t-distribution.
n: Number of observations to generate.
df: Degrees of freedom.

Example
sample <- rt(10, 10)
print(sample)

Output (random, so it changes from run to run):
[1] -1.22383726  0.94088638 -0.07026774 -1.85703872  0.95586137 -1.11909061
[7] -0.02851959  0.72807861  0.46058078 -0.16748534
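A reproducible sketch for rt(): with a fixed seed (an arbitrary value) and a large sample, the sample mean should be close to 0 and the sample variance close to the theoretical variance of the t-distribution, df / (df - 2), which exists for df > 2. With df = 10 that is 1.25, slightly larger than the standard normal's variance of 1:

```r
set.seed(10)
draws <- rt(100000, df = 10)
print(mean(draws))  # close to 0
print(var(draws))   # close to 10 / (10 - 2) = 1.25
```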
Example
Program to demonstrate the use of rt() along with a chart to visualize the t-distribution:
# Generating random numbers from a t-distribution
random_t <- rt(n = 100, df = 5)

# Creating a histogram
hist(random_t, col = "skyblue",
     main = "Histogram of Random Numbers from t-Distribution", xlab = "Value")

Output:
