
RMC Lovish


VIVEKANANDA INSTITUTE OF PROFESSIONAL STUDIES
VIVEKANANDA SCHOOL OF BUSINESS STUDIES

RESEARCH METHODOLOGY OF COMMERCE
B.COM(HONS) 213
R Studio
STUDENT NAME-LOVISH MANGLA
ENROLLMENT NUMBER-13017788822
SECTION-BCOM(H)3-C
SUBMITTED TO: DR KRITIKA NAGDEV RAISETIA
ASSISTANT PROFESSOR, VIPS
INTRODUCTION TO R Studio
Q1. How do you install R Studio? What is the latest version of R? Give details.
Ans- To install R Studio, we first need to install R. The steps to install R are:

• Go to CRAN, click Download R for Windows, click Base, and download the installer for the latest R version.
• Right-click the installer file and select Run as Administrator from the pop-up menu.
• Select the language to be used during installation. This doesn't change the language used by R; all messages and Help files remain in English.
• Follow the instructions of the installer. You can safely use the default settings and just keep clicking Next until R starts installing.

After installing R, we can install R Studio by downloading the RStudio Desktop installer from the Posit website and running it with the default settings. The steps below then add the R Commander package from within R Studio:

1. Install R. Leave all default settings in the installation options.
2. Open RStudio.
3. Go to the "Packages" tab and click on "Install Packages".
4. Start typing "Rcmdr" until you see it appear in a list, then select it.
5. Wait while all the parts of the R Commander package are installed.

At the time of writing, the latest stable version of R is R 4.3.2 (Eye Holes), released on 2023-10-31. Here are some examples of R versions along with their code names:

R 4.3.1 (Beagle Scouts)
R 4.2.3 (Shortstop Beagle)
R 4.1.2 (Bird Hippie)
R 4.1.1 (Kick Things)
R 4.1.0 (Camp Pontanezen)
R 4.0.0 (Arbor Day)
R 3.6.0 (Planting of a Tree)
R 3.5.0 (Joy in Playing)
R 3.4.0 (You Stupid Darkness)
R 3.3.0 (Supposedly Educational)
R 3.2.0 (Full of Ingredients)
R 3.1.0 (Spring Dance)

Q2. RStudio Layout with snapshot. Explain the purpose of all panes.
Ans- There are four panes (also called windows) in R Studio:
1. Source
2. Console
3. Environment/History
4. Files/Plots/Packages/Help
Source: This is the pane where we write and save our code. The code is not evaluated until we run it, which sends it to the console.
Console: This is the pane where code from the source is evaluated by R. We can also type directly in the console to perform quick calculations that we don't need to save.
Environment/History: This is the pane where we can see which objects are in our workspace (Environment) and the commands we have already run (History).
Files/Plots/Packages/Help: This is the pane where we can see file directories, view plots, see our packages and access R help.

Q3. Who designed and developed the R language?
Ans- Ross Ihaka and Robert Gentleman designed the R Language for Statistical Analysis. The R Development Core Team developed the R Language. R is distributed in different forms, such as source code written in C and precompiled binary files for operating systems such as UNIX, Linux, Windows, and Macintosh.
Q4. Are variables 'H' and 'h' the same in R language?
Ans- No. R is a case-sensitive language, so 'H' and 'h' are two different variables.
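
A minimal sketch of the case-sensitivity rule; the values 10 and 20 are arbitrary illustrations, not taken from the assignment:

```r
# R treats upper-case and lower-case names as distinct objects.
H <- 10
h <- 20
print(H)                      # value stored under the upper-case name
print(h)                      # value stored under the lower-case name
same_value <- identical(H, h) # FALSE, because the two objects differ
print(same_value)
```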
Q5. What is a package? What are two major parts of R language?
Ans- A Package is a collection of functions, examples and documentation. R provides many packages that are useful for different applications. A standard R installation includes the "base" package together with other packages such as utils, stats, and graphics.
The base system and add-on packages are the two major parts of the R language software. The base system contains the most important package, "base", and other necessary functions.
Q6. How is a package installed and accessed?
Ans- To install a package, pass its name in quotes as an argument to the install.packages() function, for example install.packages("ggplot2").
Once it has been installed, a package can be loaded for use with the library() command, for example library(ggplot2).
Q7. What is CRAN?
Ans- The Comprehensive R Archive Network (CRAN) is the official repository for R. It is a network of FTP and web servers around the world, coordinated by the R community. For a package to be published on CRAN, it needs to pass several tests to ensure that it follows CRAN policies.
Q8. What do you mean by Object Assignment? Elucidate difference between Left-side and Right-side Assignment with output. What is Assignment operator in Rstudio?
Ans- Object assignment refers to assigning values to a name via the assignment operator, which creates a new named object. Once created, the named object can be used in subsequent calculations without retyping its values.
The assignment operation has three components. From left to right:
1. The first component, for example x, is the name of the new object.
2. The second component is the assignment operator <-, which is a combination of the less than sign < immediately followed by the minus sign -.
3. The final component is the value(s) to be assigned to the name.
<- assigns the value on its right to the name on its left (leftward assignment).
-> assigns the value on its left to the name on its right (rightward assignment).
The assignment operator is used to assign a value. For instance, we can assign the value 3 to the variable x using the <- assignment operator. We can then evaluate the variable by simply typing x at the command line, which will return the value of x.
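
The difference between the two directions can be sketched as follows; the value 3 comes from the answer above, while 5 and the name y are illustrative assumptions:

```r
x <- 3     # leftward assignment: the value on the right is stored in x
5 -> y     # rightward assignment: the value on the left is stored in y
x          # console shows: [1] 3
y          # console shows: [1] 5
total <- x + y
total      # console shows: [1] 8
```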
Q9. Explain The c( ) function. Write a sample command for the same.
Ans- The c() function in R is used to perform three common tasks:
1. Create a vector.
2. Concatenate multiple vectors.
3. Create columns in a data frame.
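
A sample command for each of the three tasks might look like this; all names and marks are hypothetical placeholders:

```r
marks_a <- c(70, 85, 90)            # 1. create a vector
marks_b <- c(60, 75)
all_marks <- c(marks_a, marks_b)    # 2. concatenate two vectors into one
students <- data.frame(             # 3. c() supplies the values of each column
  name  = c("Asha", "Ravi", "Meera"),
  score = marks_a
)
print(all_marks)
print(students)
```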
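Q10. What is paste( ) function used for? Write a sample command for the same
Ans- The paste() function converts its arguments to character strings and concatenates the corresponding elements of multiple vectors, separated by a space by default. With the collapse argument, the results are joined into a single string.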
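
A short sample of paste(), using made-up strings, shows the element-wise behaviour as well as the sep and collapse arguments:

```r
first  <- c("Research", "R")
second <- c("Methodology", "Studio")
pairs      <- paste(first, second)             # joined element by element with a space
underscore <- paste(first, second, sep = "_")  # custom separator
single     <- paste(first, collapse = " and ") # one string from a whole vector
print(pairs)
print(underscore)
print(single)
```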

Q11. What is %>% operator used for? Write a sample command for the same
Ans- %>% is called the forward pipe operator in R and is provided by the magrittr package (it is also available after loading dplyr). It provides a mechanism for chaining commands: the operator forwards a value, or the result of an expression, into the next function call/expression as its first argument.
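
A sample command, assuming the magrittr package is installed; the vector is an arbitrary illustration:

```r
library(magrittr)   # provides the %>% operator

values <- c(4, 9, 16)
# Read left to right: take values, take square roots, then add them up.
piped  <- values %>% sqrt() %>% sum()
nested <- sum(sqrt(values))   # the same computation written inside-out
print(piped)
```

Both forms give the same result; the piped form simply lists the steps in the order they happen.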

Q12. What is meant by ">", "+" and [1] in R console?
Ans- In the R console, ">" is the primary prompt, which shows that R is ready for a new command. "+" is the secondary (continuation) prompt that appears when an instruction or command is not finished. In this case, either complete the code or press "Esc" to get back to the primary prompt, i.e., ">". Output is displayed with [1] because R treats even a single number as a vector; [1] is the index of the first element shown on that line of output.
Q13.Write the code to identify odd or even numbers using IF statement.
Ans-
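
The original answer was a screenshot that is not available here. One possible version, with num as a hypothetical input value:

```r
num <- 7   # hypothetical input; change this to test other numbers

# A number is even when the remainder after dividing by 2 is zero.
if (num %% 2 == 0) {
  result <- paste(num, "is even")
} else {
  result <- paste(num, "is odd")
}
print(result)
```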

Q14.Write the code to identify minimum number among three numbers using Nested IF
statement
Ans-
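
The original answer was a screenshot that is not available here. A sketch with nested if statements, where the three input values are assumptions:

```r
a <- 12; b <- 5; c_val <- 9   # hypothetical inputs

if (a <= b) {
  # a beats b, so only a and c_val remain to be compared
  if (a <= c_val) {
    smallest <- a
  } else {
    smallest <- c_val
  }
} else {
  # b beats a, so only b and c_val remain to be compared
  if (b <= c_val) {
    smallest <- b
  } else {
    smallest <- c_val
  }
}
print(paste("Minimum number is", smallest))
```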

Q15.Display grade of student using nested if command for following criterion (customize
student name). Output example: Kritika has scored “A” Grade

Ans-
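
The grading criterion is not reproduced in this copy, so the cut-offs below (80, 60 and 40) are purely illustrative assumptions, as is the mark of 82:

```r
student <- "Kritika"
marks   <- 82          # hypothetical score

# Assumed cut-offs: 80+ is A, 60+ is B, 40+ is C, anything lower is F.
if (marks >= 80) {
  grade <- "A"
} else {
  if (marks >= 60) {
    grade <- "B"
  } else {
    if (marks >= 40) {
      grade <- "C"
    } else {
      grade <- "F"
    }
  }
}
output <- paste0(student, " has scored \"", grade, "\" Grade")
cat(output, "\n")
```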

Q16. How to import a data sheet from Excel

Ans- Step 1: Select the Import Dataset option in the Environment pane of RStudio.
Step 2: Select "From Excel" under the Import Dataset option, since the file is an Excel file.
Step 3: Click Browse and select the Excel file to be imported into R.
Step 4: Click Import, and the Excel file is imported.

Q17. What are Data Frames, Matrices, Vectors

Ans- Data Frames are data displayed in a table format. A data frame can hold different types of data: the first column can be character while the second and third are numeric or logical. However, all values within a single column must have the same type.
A matrix is a two-dimensional data set with columns and rows, in which every element has the same type. A column is a vertical representation of data, while a row is a horizontal representation of data. A matrix can be created with the matrix() function.
A vector is simply a sequence of items that are of the same type. To combine items into a vector, use the c() function and separate the items with commas.
Q18.Write a command to create Data Frames, Matrices, Vectors
Ans-Command to create a Data Frame:

Command to create a matrix:


Command to create a vector:
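
The original screenshots for Q18 are not available here. The three commands can be sketched together as follows, with hypothetical names and values:

```r
# Data frame: columns may differ in type, each column is one type
df <- data.frame(
  name   = c("Asha", "Ravi", "Meera"),
  marks  = c(78, 85, 92),
  passed = c(TRUE, TRUE, TRUE)
)

# Matrix: 2 rows and 3 columns, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)

# Vector: items of the same type combined with c()
v <- c(10, 20, 30)

print(df)
print(m)
print(v)
```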

Q19.Name some built-in functions with their description.


Ans-
Function      Description

mean(x)       Mean of x
median(x)     Median of x
var(x)        Variance of x
sd(x)         Standard deviation of x
scale(x)      Standard scores (z-scores) of x
quantile(x)   The quartiles of x
summary(x)    Summary of x: mean, min, max etc.

Q20.Create a function for multiplication but no return value


Ans-
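
The original answer was a screenshot that is not available here. One way to read "no return value" is a function that only prints its result; the name multiply is an assumption. Since every R function returns something, invisible(NULL) is used so that nothing useful comes back:

```r
multiply <- function(a, b) {
  product <- a * b
  cat("Product is", product, "\n")  # the result is printed, not returned
  invisible(NULL)                   # make the absence of a return value explicit
}

returned <- multiply(4, 5)   # prints: Product is 20
```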

Q21.Write a command for Accessing Rows and Columns


Ans-
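
The original answer was a screenshot that is not available here. As a sketch, the built-in mtcars dataset is used as a stand-in for the data frame in the assignment:

```r
data(mtcars)

row2     <- mtcars[2, ]                    # second row, all columns
col3     <- mtcars[, 3]                    # third column, returned as a vector
cell     <- mtcars[2, 3]                   # single value at row 2, column 3
mpg_col  <- mtcars$mpg                     # a column accessed by name
top_rows <- mtcars[1:5, c("mpg", "cyl")]   # rows 1 to 5 of two named columns
print(top_rows)
```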

Q22.Create a data frame by your surname of 12 rows and 8 columns.


Ans-
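
The original answer was a screenshot that is not available here. A sketch of a 12 x 8 data frame named after the surname; the column names and the randomly generated values are placeholders, not the data used in the assignment:

```r
set.seed(1)   # makes the random placeholder values reproducible

mangla <- data.frame(
  id        = 1:12,
  name      = paste0("Student", 1:12),
  section   = rep(c("A", "B", "C"), times = 4),
  age       = sample(18:22, 12, replace = TRUE),
  maths     = sample(40:100, 12, replace = TRUE),
  english   = sample(40:100, 12, replace = TRUE),
  economics = sample(40:100, 12, replace = TRUE),
  accounts  = sample(40:100, 12, replace = TRUE)
)
dim(mangla)   # number of rows, then number of columns
```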

Q23.Write a command to access non-consecutive rows or columns, use ‘c() ‘. For example, to
obtain rows 1 to 5, 7 and 11 and columns 3 to 4 and 7
Ans-
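
The original answer was a screenshot that is not available here. Using the built-in mtcars dataset as a stand-in, the selection described in the question can be sketched as:

```r
data(mtcars)

# Rows 1 to 5, 7 and 11; columns 3 to 4 and 7
subset_df <- mtcars[c(1:5, 7, 11), c(3:4, 7)]
print(subset_df)
```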

Q24.Add one new column and drop two existing columns 4 and 5
Ans-
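
The original answer was a screenshot that is not available here. A sketch on a copy of mtcars; the new column kpl (kilometres per litre) is an illustrative choice:

```r
df <- mtcars                      # work on a copy of the built-in dataset

df$kpl <- df$mpg * 0.425144       # add one new column (miles/gallon to km/litre)
df <- df[, -c(4, 5)]              # drop existing columns 4 (hp) and 5 (drat)
names(df)
```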

Q25.Drop rows 1, 3 and 4


Ans-
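
The original answer was a screenshot that is not available here. With mtcars as a stand-in, negative indices drop the listed rows:

```r
df <- mtcars[-c(1, 3, 4), ]   # remove rows 1, 3 and 4, keep all columns
nrow(df)
head(df, 3)
```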

Q26.Write a command to calculate the number of columns and number of rows.


Ans-
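
The original answer was a screenshot that is not available here. Again with mtcars as a stand-in:

```r
n_rows <- nrow(mtcars)   # number of rows
n_cols <- ncol(mtcars)   # number of columns
both   <- dim(mtcars)    # rows and columns together
print(n_rows)
print(n_cols)
print(both)
```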

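Q27. What is the command to access built-in datasets? What is the command to get the description of a built-in dataset?
Ans- We use the data() function without any arguments to get the list of built-in datasets. To get the description of a built-in dataset, we use help() or ? with the dataset name, for example ?mtcars.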
Q28.Access Titanic dataset and Execute commands to evaluate whether the evacuation
strategy was fair or not. If biased, state which gender, age group and class was most
favoured. Analyse using cross tabulations.
Ans-

The cross tabulations show that 52.29% of children survived, compared with 31.26% of adults. So, children were the most favoured age group.
Similarly, 21.20% of males survived, compared with 73.19% of females. So, females were the most favoured gender.
Among the classes, first-class passengers had the highest survival rate (62.46%), so the first class was the most favoured. The evacuation was therefore biased.
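
The original screenshot is not available here. The percentages can be reproduced from the built-in Titanic table with cross tabulations along these lines (the object names are assumptions):

```r
data(Titanic)   # 4-way table with dimensions Class, Sex, Age, Survived

# Collapse the table to one factor versus Survived
by_age   <- margin.table(Titanic, c(3, 4))
by_sex   <- margin.table(Titanic, c(2, 4))
by_class <- margin.table(Titanic, c(1, 4))

# Row percentages: share of each group that did or did not survive
age_pct   <- round(100 * prop.table(by_age, 1), 2)
sex_pct   <- round(100 * prop.table(by_sex, 1), 2)
class_pct <- round(100 * prop.table(by_class, 1), 2)

print(age_pct)
print(sex_pct)
print(class_pct)
```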

Q29. Calculate correlation by importing data from excel. Determine whether there is a positive or a negative correlation in Advertisement in month and Sales in crores.
Ans-

Hence there is a positive correlation between advertisement run and sales.
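
The imported sheet is not reproduced in this copy. In the sketch below a small data frame with invented numbers stands in for it; in practice the data would come from the Excel import (for example with readxl::read_excel() on the assignment file):

```r
# Hypothetical stand-in for the imported Excel sheet
ads <- data.frame(
  advertisement = c(2, 4, 5, 7, 9, 10),     # advertisements run in the month
  sales         = c(12, 18, 21, 30, 35, 41) # sales in crores
)

r <- cor(ads$advertisement, ads$sales)   # Pearson correlation coefficient
direction <- if (r > 0) "positive" else "negative"
cat("Correlation is", direction, "\n")
```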


Packages in R Programming

The tidyr Package

Q30.Apply Important Functions (gather, separate, unite, spread, fill, full_seq, drop_na, and
replace_na) in “tidyr Package” for following dataset.

Ans-gather()

separate()
unite()

spread()
fill()
full_seq

drop_na
replace_na
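
The dataset and output screenshots for Q30 are not available here. The calls can be sketched on a small invented data frame, assuming the tidyr package is installed (gather() and spread() still work, although newer tidyr versions recommend pivot_longer() and pivot_wider()):

```r
library(tidyr)

# Hypothetical dataset: student code and marks in two subjects
wide <- data.frame(
  student = c("Asha_A", "Ravi_B", "Meera_A"),
  maths   = c(80, NA, 65),
  english = c(70, 75, NA)
)

long     <- gather(wide, key = "subject", value = "marks", maths, english) # wide to long
back     <- spread(long, key = "subject", value = "marks")                 # long to wide
sep      <- separate(wide, student, into = c("name", "section"), sep = "_")
uni      <- unite(sep, "student", name, section, sep = "_")                # rejoin columns
filled   <- fill(wide, maths, english)          # carry the last known value downward
complete <- drop_na(wide)                       # keep only rows without NA
zeroed   <- replace_na(wide, list(maths = 0, english = 0))
seq_full <- full_seq(c(1, 3, 7), period = 2)    # fills the gaps in a sequence
print(long)
```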

The dplyr Package

Q31.Apply Important Functions (filter, arrange, select, rename, mutate and transmute,
sample_n and sample_frac) for following column heads with 5 data rows:

Ans-Dataset:
Filter

Arrange:
Select:

Rename:

Mutate:
Transmute:
sample_n:

sample_frac:
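
The column heads and output screenshots for Q31 are not available here. The functions can be sketched on an invented five-row data frame, assuming the dplyr package is installed:

```r
library(dplyr)

# Hypothetical dataset with 5 rows
emp <- data.frame(
  name       = c("Asha", "Ravi", "Meera", "John", "Sara"),
  dept       = c("HR", "Sales", "Sales", "IT", "HR"),
  salary     = c(30000, 45000, 52000, 61000, 38000),
  experience = c(2, 5, 7, 9, 3)
)

f  <- filter(emp, salary > 40000)                  # keep rows meeting a condition
a  <- arrange(emp, desc(salary))                   # sort rows, highest salary first
s  <- select(emp, name, salary)                    # keep chosen columns
r  <- rename(emp, department = dept)               # new_name = old_name
m  <- mutate(emp, bonus = salary * 0.10)           # add a column, keep the rest
tr <- transmute(emp, name, bonus = salary * 0.10)  # keep only the listed columns
set.seed(1)
sn <- sample_n(emp, 3)                             # 3 random rows
sf <- sample_frac(emp, 0.4)                        # 40% of the rows, here 2 rows
print(a)
```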
Data Visualization in R Studio
Quick plot with ggplot2

Q32.Generate BCOM marks data containing the sections and overall percentage (5 sections
ranging from A to E ), with 60 students in each section.
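
One way to generate such data is with random percentages; the range of 35 to 100 and the object name bcom are assumptions:

```r
set.seed(123)   # reproducible random marks

bcom <- data.frame(
  section    = rep(LETTERS[1:5], each = 60),            # sections A to E, 60 students each
  percentage = round(runif(300, min = 35, max = 100), 2) # overall percentage
)
table(bcom$section)
head(bcom)
```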
Q33.Create following Quick plots with customized labels (with your name and DOB) for both
the axis and Main title of the chart
• Histogram fill color by group (Section)

• Basic density plot


• Density plot line color by group (Section) and change line type

• Draw a plot using data from numeric vectors where X contains values ranging from 10 to
20 and Y is square of X
• Add to the dot plot for X & Y
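
The plot screenshots for Q33 are not available here. A sketch with qplot(), assuming ggplot2 is installed (recent ggplot2 versions mark qplot() as deprecated but still provide it). The data are regenerated so the block runs on its own, the date of birth is left as a placeholder, and the last plot reads "add to the dot plot" as adding a line, which is an assumption:

```r
library(ggplot2)

set.seed(123)
bcom <- data.frame(
  section    = rep(LETTERS[1:5], each = 60),
  percentage = round(runif(300, 35, 100), 2)
)
x_lab <- "Percentage (Lovish Mangla)"
y_lab <- "Count or density (DOB: DD-MM-YYYY)"   # replace with the actual date of birth
title <- "BCOM marks by Lovish Mangla"

# Histogram with fill colour by section
p_hist <- qplot(percentage, data = bcom, geom = "histogram", fill = section,
                binwidth = 5, xlab = x_lab, ylab = y_lab, main = title)
# Basic density plot
p_dens <- qplot(percentage, data = bcom, geom = "density",
                xlab = x_lab, ylab = y_lab, main = title)
# Density plot with line colour and line type by section
p_dens_grp <- qplot(percentage, data = bcom, geom = "density",
                    colour = section, linetype = section,
                    xlab = x_lab, ylab = y_lab, main = title)
# Plot from numeric vectors: X from 10 to 20, Y is the square of X
x <- 10:20
y <- x^2
p_xy      <- qplot(x, y, geom = "point", xlab = x_lab, ylab = y_lab, main = title)
p_xy_line <- p_xy + geom_line()   # line added on top of the dot plot
```

Printing any of these objects, for example print(p_hist), draws the plot in the Plots pane.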

Q34.Activate Motor Trend Car Road Tests dataset. Using the given data set prepare following
quick plots:

• Scatter plots with smoothed line for Miles/(US) gallon on y axis and Weight (lb/1000) on x
axis
• Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line by groups (Number of cylinders)

• Scatter plots with colors for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis
• Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with colors by
groups (Number of gears)

• Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line and colors by groups (Number of gears)
• Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line and the point shape by groups (Number of gears)
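
The plot screenshots for Q34 are not available here. A sketch of the six plots with qplot(), assuming ggplot2 is installed; the grouping variables are converted to factors on a copy of mtcars, and the fixed blue colour in the third plot is an illustrative choice:

```r
library(ggplot2)

cars <- mtcars                     # Motor Trend Car Road Tests dataset
cars$cyl  <- factor(cars$cyl)      # number of cylinders as a group
cars$gear <- factor(cars$gear)     # number of gears as a group

# 1. Scatter plot with smoothed line
p1 <- qplot(wt, mpg, data = cars, geom = c("point", "smooth"))
# 2. Smoothed line by groups (number of cylinders)
p2 <- qplot(wt, mpg, data = cars, geom = c("point", "smooth"), colour = cyl)
# 3. Scatter plot with a colour
p3 <- qplot(wt, mpg, data = cars, colour = I("blue"))
# 4. Colours by groups (number of gears)
p4 <- qplot(wt, mpg, data = cars, colour = gear)
# 5. Smoothed line and colours by groups (number of gears)
p5 <- qplot(wt, mpg, data = cars, geom = c("point", "smooth"), colour = gear)
# 6. Smoothed line and point shape by groups (number of gears)
p6 <- qplot(wt, mpg, data = cars, geom = c("point", "smooth"), shape = gear)
```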

Q35.Provide 5 commands for Descriptive statistics


Descriptive statistics are a set of techniques used to summarize and describe the main
features of a dataset. These statistics provide a concise overview of the key characteristics of
the data, helping to simplify and interpret the information. Descriptive statistics are often
the first step in the analysis of a dataset and are used to explore, understand, and
communicate its main properties.
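
Five such commands, applied here to the mpg column of the built-in mtcars dataset as an illustrative choice:

```r
x <- mtcars$mpg

avg <- mean(x)       # arithmetic mean
mid <- median(x)     # middle value
dev <- sd(x)         # standard deviation
rng <- range(x)      # minimum and maximum
qrt <- quantile(x)   # 0%, 25%, 50%, 75% and 100% points

print(avg); print(mid); print(dev); print(rng); print(qrt)
```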
Q36.Provide summary statistics for the MTCARS dataset while displaying a count summary
of categorical variables.
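
One way to get counts for the categorical variables is to convert them to factors before calling summary(); treating cyl, vs, am, gear and carb as categorical is an assumption:

```r
cars <- mtcars
cat_vars <- c("cyl", "vs", "am", "gear", "carb")
cars[cat_vars] <- lapply(cars[cat_vars], factor)   # numeric codes become categories

summary(cars)                    # factors are summarised as counts per level
cyl_counts <- table(cars$cyl)    # count summary for a single variable
print(cyl_counts)
```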
HYPOTHESIS TESTING using R studio

For all tests, import an Excel file with the given data, saved as yourname_test.
T-TEST
A. One Sample t- Test using dummy (One- Tailed)
File name example: kritika_ttest

Problem 1:
To determine whether the population mean of age is equal to 40 at α=0.05.
Decision Rule: If p>α, accept null hypothesis.
Inference: p(0.1571)>α(0.05), accept null hypothesis.
Conclusion: There is no significant evidence that the population mean of age differs from 40 at α=0.05.
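
The imported data and test output are not reproduced in this copy. A sketch of the command, with invented ages standing in for the imported sheet (so the p-value will differ from the one reported above):

```r
# Hypothetical ages; in the assignment these come from the imported Excel file
age <- c(38, 42, 45, 36, 41, 39, 44, 47, 40, 43)

# H0: population mean = 40, H1: population mean is not 40
res <- t.test(age, mu = 40, alternative = "two.sided", conf.level = 0.95)
print(res)

alpha <- 0.05
decision <- if (res$p.value > alpha) "accept null hypothesis" else "reject null hypothesis"
print(decision)
```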

B. Two Sample t- Test


File name example: kritika_ttest2
Problem 1:
To analyze whether the time spent by full time students in studying statistics differs from the time spent by part time students.
Decision Rule: If p<α, reject null hypothesis.
Inference: p(0.8)>α(0.05), accept null hypothesis.
Conclusion: The time spent by full time students in studying is not significantly different from the time spent by part time students at α=0.05.
C. Two Sample t- Test

Problem 1:
Is there sufficient evidence to suggest that the mean time to exhaustion is greater after chocolate milk than after carbohydrate replacement drink? Use a significance level of 0.05.
(Use µCM-µCD in hypothesis statements)
Decision Rule: If p<α, reject null hypothesis.
Inference: p(0.076)>α(0.05), accept null hypothesis.
Conclusion: There is not sufficient evidence that the mean time to exhaustion is greater after chocolate milk than after carbohydrate replacement drink at α=0.05.

D. Paired t- Test
Problem 1:
Coaching was given to students for Statistical software after their result was evaluated in January in order to improve their performance in April exams. Determine if the coaching was successful. (α = 0.05)
Decision Rule: If p<α, reject null hypothesis.
Inference: p(0.85)>α(0.05), accept null hypothesis.
Conclusion: There is no significant evidence that the coaching was successful.
E. Two Sample t Test
Problem 1:

To analyse whether there is a significant difference between the marks scored by class groups A & B in mathematics at α=10%.
Decision Rule: If p<α, reject null hypothesis.
Inference: p(0.85)>α(0.10), accept null hypothesis.
Conclusion: There is no significant difference between the marks scored by A and B.

F. F Test

Problem 1:
Determine whether or not there is a significant difference between the variances of two data sets.
Decision Rule: If p<α, reject null hypothesis.
Inference: p(0.87)>α(0.05), accept null hypothesis.
Conclusion: There is no significant difference between the variances of the two datasets.

G. One Way Anova


Problem 1:

The marks for 3 different groups in Economics, Science, History are given. Determine whether there is a significant difference between the means of the populations.
Decision Rule: If p<α, reject null hypothesis.
Inference: p(0.06)>α(0.05), accept null hypothesis.
Conclusion: There is no significant difference between the means of the populations.
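
The imported marks are not reproduced in this copy. A sketch of a one-way ANOVA with aov(), using invented marks (so the p-value will differ from the one reported above):

```r
# Hypothetical marks for three groups of five students each
marks <- data.frame(
  score   = c(62, 70, 68, 75, 71,
              80, 77, 69, 74, 72,
              65, 73, 78, 70, 76),
  subject = factor(rep(c("Economics", "Science", "History"), each = 5))
)

fit <- aov(score ~ subject, data = marks)   # H0: all group means are equal
anova_table <- summary(fit)[[1]]
print(anova_table)

p_value <- anova_table[["Pr(>F)"]][1]       # p-value for the subject effect
decision <- if (p_value < 0.05) "reject null hypothesis" else "accept null hypothesis"
print(decision)
```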
H. Chi Square Test
Problem 1:
Determine whether brand preference is independent of age group.
Decision Rule: If p<α, reject null hypothesis.
Inference: p(0.35)>α(0.05), accept null hypothesis.
Conclusion: There is no association between brand preference and age group.
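
The imported table is not reproduced in this copy. A sketch of the test of independence with chisq.test(), using an invented contingency table (so the p-value will differ from the one reported above):

```r
# Hypothetical counts: rows are brands, columns are age groups
pref <- matrix(
  c(30, 25, 20,
    35, 25, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(brand     = c("Brand X", "Brand Y"),
                  age_group = c("18-30", "31-45", "46+"))
)

res <- chisq.test(pref)   # H0: brand preference is independent of age group
print(res)

decision <- if (res$p.value < 0.05) "reject null hypothesis" else "accept null hypothesis"
print(decision)
```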
