RMC Lovish
PROFESSIONAL STUDIES
VIVEKANANDA SCHOOL OF BUSINESS STUDIES
RESEARCH METHODOLOGY OF COMMERCE
B.COM(HONS) 213
R Studio
STUDENT NAME-LOVISH MANGLA
ENROLMENT NUMBER-13017788822
SECTION-BCOM(H)3-C
SUBMITTED TO: DR KRITIKA NAGDEV RAISETIA
ASSISTANT PROFESSOR,VIPS
INTRODUCTION TO R Studio
Q1. How to install R Studio? What is the latest version of R? Give details.
Ans- To install R Studio, we first need to install R. The steps to install R are:
1. Go to CRAN, click Download R for Windows, click Base, and download the installer for the latest R version.
2. Right-click the installer file and select Run as Administrator from the pop-up menu.
3. Select the language to be used during installation. This doesn't change the language used by R; all messages and Help files remain in English.
After installing R, we can install R Studio. The steps to install R Studio are:
At the time of writing, the latest stable version of R is R 4.3.2 (Eye Holes), released on 2023-10-31. Here are some
examples of R versions along with their code names:
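The version and code name of the installed copy of R can be checked from the console. A minimal sketch using only base R (the values printed depend on the installation):

```r
# Full version string of the running R installation
print(R.version.string)

# The release code name is stored in the "nickname" field
print(R.version$nickname)
```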
Q2. RStudio Layout with snapshot. Explain the purpose of all panes.
Ans- R Studio has four panes, also known as windows:
1. Source
2. Console
3. Environment/History
4. Files/Plots/Packages/Help
Source: This is the pane where we write our code. The code is not evaluated until we run it
in the console.
Console: This is the pane where code from the source is evaluated by R. We can also use the
console to perform quick calculations that we don't need to save.
Environment/History: This is the pane where we can see which objects are in our workspace
and review previously run commands.
Files/Plots/Packages/Help: This is the pane where we can browse file directories, view plots,
see our packages and access R help.
Q11. What is the %>% operator used for? Write a sample command for the same.
Ans- %>% is the forward pipe operator, provided by the magrittr package and also available
through dplyr. It chains commands by forwarding a value, or the result of an expression,
into the next function call as its first argument.
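A sample command could look like the sketch below. It assumes the magrittr package is installed; the vector `x` is made up for illustration.

```r
library(magrittr)

# Illustrative data (not from the assignment)
x <- c(1, 4, 9, 16)

# Forward x into sqrt(), then forward that result into sum()
result <- x %>% sqrt() %>% sum()

# The piped form is equivalent to the nested call sum(sqrt(x))
print(result)
```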
Q14.Write the code to identify minimum number among three numbers using Nested IF
statement
Ans-
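One possible answer, written as a small function so it can be reused. The function name `min_of_three` and the sample numbers are illustrative choices, not part of the question.

```r
# Return the smallest of three numbers using nested if statements
min_of_three <- function(x, y, z) {
  if (x <= y) {
    # x beats y, so the answer is either x or z
    if (x <= z) {
      smallest <- x
    } else {
      smallest <- z
    }
  } else {
    # y beats x, so the answer is either y or z
    if (y <= z) {
      smallest <- y
    } else {
      smallest <- z
    }
  }
  smallest
}

print(min_of_three(12, 7, 9))
```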
Q15.Display grade of student using nested if command for following criterion (customize
student name). Output example: Kritika has scored “A” Grade
Ans-
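The grading criterion is not reproduced in this file, so the sketch below uses hypothetical cut-offs (80 and above for A, 60 and above for B, 40 and above for C, otherwise F). Replace them with the criterion given in the question.

```r
# Build the grade message with nested if statements.
# The cut-offs are assumptions for illustration only.
grade_message <- function(name, marks) {
  if (marks >= 80) {
    grade <- "A"
  } else {
    if (marks >= 60) {
      grade <- "B"
    } else {
      if (marks >= 40) {
        grade <- "C"
      } else {
        grade <- "F"
      }
    }
  }
  paste0(name, " has scored \"", grade, "\" Grade")
}

# cat() prints the message without escaping the quotes
cat(grade_message("Kritika", 85), "\n")
```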
mean(x)     Mean of x
median(x)   Median of x
var(x)      Variance of x
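R is case-sensitive, so these functions are written in lower case. A short sketch on a made-up vector:

```r
# Illustrative data (not from the assignment)
x <- c(2, 4, 6, 8, 10)

x_mean   <- mean(x)    # arithmetic mean
x_median <- median(x)  # middle value
x_var    <- var(x)     # sample variance (divides by n - 1)

print(c(mean = x_mean, median = x_median, variance = x_var))
```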
Q23. Write a command to access non-consecutive rows or columns using c(). For example,
obtain rows 1 to 5, 7 and 11 and columns 3 to 4 and 7.
Ans-
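A minimal sketch, assuming the built-in mtcars data frame stands in for the dataset used in class:

```r
# Rows 1 to 5, 7 and 11; columns 3 to 4 and 7
subset_df <- mtcars[c(1:5, 7, 11), c(3:4, 7)]

print(subset_df)
```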
Q24.Add one new column and drop two existing columns 4 and 5
Ans-
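A minimal sketch, again assuming mtcars as a stand-in dataset. The new column name `kpl` (an approximate conversion of miles per gallon to kilometres per litre) is a hypothetical choice.

```r
df <- mtcars

# Add one new column (hypothetical: approximate km per litre)
df$kpl <- df$mpg * 0.425

# Drop existing columns 4 and 5 (hp and drat in mtcars) by negative index
df <- df[, -c(4, 5)]

print(names(df))
```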
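Q27. What is the command to access built-in datasets? What is the command to get the
description of a built-in dataset?
Ans- We use the data() function without any arguments to list the built-in datasets. To get
the description of a built-in dataset, we use help() or the ? operator with the dataset name,
for example ?mtcars.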
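The same information can also be retrieved programmatically. A short sketch, using mtcars only as an example name:

```r
# Interactive use:
#   data()          lists all built-in datasets
#   ?mtcars         opens the description of one dataset
#   help("mtcars")  does the same as ?mtcars

# Programmatic listing of the datasets shipped in the "datasets" package
available <- data(package = "datasets")$results[, "Item"]
print(head(available))
```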
Q28.Access Titanic dataset and Execute commands to evaluate whether the evacuation
strategy was fair or not. If biased, state which gender, age group and class was most
favoured. Analyse using cross tabulations.
Ans-
The cross tabulations show that 52.29 % of children survived, compared with 31.26 % of
adults, so children were favoured over adults.
Similarly, 73.19 % of females survived, compared with 21.20 % of males, so females were
favoured over males.
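These percentages can be reproduced from the built-in Titanic table. The sketch below is one way to do it with base R cross tabulations; the helper name `survival_rate` is hypothetical. The same helper gives the rates by class.

```r
data(Titanic)

# Titanic is a 4-way table with dimensions: Class (1), Sex (2), Age (3), Survived (4).
# Collapse it to one variable against Survived, then take row percentages.
survival_rate <- function(dim_index) {
  tab <- margin.table(Titanic, c(dim_index, 4))
  round(100 * prop.table(tab, 1), 2)
}

by_class <- survival_rate(1)
by_sex   <- survival_rate(2)
by_age   <- survival_rate(3)

print(by_age)
print(by_sex)
print(by_class)
```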
Q30.Apply Important Functions (gather, separate, unite, spread, fill, full_seq, drop_na, and
replace_na) in “tidyr Package” for following dataset.
Ans- gather()
separate()
unite()
spread()
fill()
full_seq()
drop_na()
replace_na()
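The dataset for this question is not reproduced here, so the sketch below applies each function to small hypothetical data frames (`marks`, `dates`, `sections`). It assumes the tidyr package is installed. Note that gather() and spread() still work but are superseded by pivot_longer() and pivot_wider().

```r
library(tidyr)

# Hypothetical data for illustration only
marks <- data.frame(
  student = c("Asha", "Ravi", "Meena"),
  sem1 = c(78, 65, NA),
  sem2 = c(82, NA, 90)
)

# gather(): wide to long (one row per student and semester)
long <- gather(marks, key = "semester", value = "score", sem1, sem2)

# spread(): long back to wide
wide <- spread(long, key = semester, value = score)

# separate(): split one column into several
dates <- data.frame(id = 1:2, date = c("2023-10-31", "2024-01-15"))
parts <- separate(dates, date, into = c("year", "month", "day"), sep = "-")

# unite(): paste several columns back into one
joined <- unite(parts, "date", year, month, day, sep = "-")

# fill(): carry the last non-missing value downwards
sections <- data.frame(section = c("A", NA, NA, "B", NA), roll = 1:5)
filled <- fill(sections, section)

# full_seq(): complete a numeric sequence with a fixed step
complete_seq <- full_seq(c(1, 2, 4, 7), period = 1)

# drop_na(): keep only rows without missing values
complete_rows <- drop_na(marks)

# replace_na(): replace missing values column by column
zero_filled <- replace_na(marks, list(sem1 = 0, sem2 = 0))
```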
Q31.Apply Important Functions (filter, arrange, select, rename, mutate and transmute,
sample_n and sample_frac) for following column heads with 5 data rows:
Ans-Dataset:
Filter:
Arrange:
Select:
Rename:
Mutate:
Transmute:
sample_n:
sample_frac:
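The column heads for this question are not reproduced here, so the sketch below uses a hypothetical 5-row data frame `students` with columns name, section and marks. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical dataset with 5 rows (column heads are assumptions)
students <- data.frame(
  name = c("Asha", "Ravi", "Meena", "Kabir", "Neha"),
  section = c("A", "B", "A", "C", "B"),
  marks = c(78, 65, 90, 55, 82)
)

filtered    <- filter(students, marks > 70)          # keep rows meeting a condition
arranged    <- arrange(students, desc(marks))        # sort rows, highest marks first
selected    <- select(students, name, marks)         # keep chosen columns
renamed     <- rename(students, score = marks)       # new_name = old_name
mutated     <- mutate(students, bonus_marks = marks + 5)          # add a column
transmuted  <- transmute(students, name, bonus_marks = marks + 5) # keep only listed columns

set.seed(1)  # make the random samples reproducible
sampled_n    <- sample_n(students, 3)       # 3 random rows
sampled_frac <- sample_frac(students, 0.4)  # 40% of the rows
```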
Data Visualization in R Studio
Quick plot with ggplot2
Q32.Generate BCOM marks data containing the sections and overall percentage (5 sections
ranging from A to E ), with 60 students in each section.
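One way to generate such data with base R. The percentage range of 40 to 100 and the seed are assumptions made for illustration.

```r
set.seed(123)  # reproducible random numbers

bcom <- data.frame(
  # Sections A to E, 60 students each
  section = rep(LETTERS[1:5], each = 60),
  # Overall percentage, drawn uniformly between 40 and 100 (assumed range)
  percentage = round(runif(300, min = 40, max = 100), 2)
)

print(table(bcom$section))
```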
Q33.Create following Quick plots with customized labels (with your name and DOB) for both
the axis and Main title of the chart
• Histogram fill color by group (Section)
• Draw a plot using data from numeric vectors where X contains values ranging from 10 to
20 and Y is square of X
• Add to the dot plot for X & Y
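A sketch of these quick plots using qplot() from ggplot2 (assumed installed; qplot() is deprecated in recent ggplot2 releases but still available). The marks data are regenerated with an assumed 40 to 100 range, the label text is a placeholder for the student's name and date of birth, and the third plot reads the last bullet as adding a line to the dot plot, which is an assumption.

```r
library(ggplot2)

set.seed(123)
bcom <- data.frame(
  section = rep(LETTERS[1:5], each = 60),
  percentage = runif(300, min = 40, max = 100)  # assumed range
)

label <- "Your Name, DD-MM-YYYY"  # placeholder for name and DOB

# Histogram with fill colour by group (section)
hist_plot <- qplot(percentage, data = bcom, geom = "histogram",
                   fill = section, binwidth = 5,
                   main = paste("BCOM marks by section:", label),
                   xlab = paste("Overall percentage:", label),
                   ylab = paste("Number of students:", label))

# Numeric vectors: X from 10 to 20, Y is the square of X
x <- 10:20
y <- x^2
dot_plot <- qplot(x, y,
                  main = paste("Y = X squared:", label),
                  xlab = paste("X:", label),
                  ylab = paste("Y:", label))

# Same plot with a line added to the points (assumed reading of the bullet)
line_plot <- qplot(x, y, geom = c("point", "line"),
                   main = paste("Y = X squared:", label),
                   xlab = paste("X:", label),
                   ylab = paste("Y:", label))

# In an interactive session, print(hist_plot) etc. draws each chart
```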
Q34.Activate Motor Trend Car Road Tests dataset. Using the given data set prepare following
quick plots:
• Scatter plots with smoothed line for Miles/(US) gallon on y axis and Weight (lb/1000) on x
axis
• Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line by groups (Number of cylinders)
• Scatter plots with colors for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis
• Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with colors by
groups (Number of gears)
• Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line and colors by groups (Number of gears)
• Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line and the point shape by groups (Number of gears)
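The six plots can be sketched with qplot() on the built-in mtcars data, where mpg is Miles/(US) gallon and wt is Weight (lb/1000). This assumes ggplot2 is installed. The grouping variables are wrapped in factor() so they are treated as groups rather than continuous values, and the object names are illustrative.

```r
library(ggplot2)
data(mtcars)  # Motor Trend Car Road Tests

# Scatter plot with smoothed line
p_smooth <- qplot(wt, mpg, data = mtcars, geom = c("point", "smooth"))

# Smoothed line by groups (number of cylinders)
p_smooth_cyl <- qplot(wt, mpg, data = mtcars, colour = factor(cyl),
                      geom = c("point", "smooth"))

# Scatter plot with colours (continuous colour scale on cyl)
p_colour <- qplot(wt, mpg, data = mtcars, colour = cyl)

# Colours by groups (number of gears)
p_colour_gear <- qplot(wt, mpg, data = mtcars, colour = factor(gear))

# Smoothed line and colours by groups (number of gears)
p_smooth_gear <- qplot(wt, mpg, data = mtcars, colour = factor(gear),
                       geom = c("point", "smooth"))

# Smoothed line and point shape by groups (number of gears)
p_shape_gear <- qplot(wt, mpg, data = mtcars, shape = factor(gear),
                      geom = c("point", "smooth"))

# In an interactive session, print(p_smooth) etc. draws each chart
```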
For all tests, import the Excel file containing the given data, saved as yourname_test.
T-TEST
A. One Sample t- Test using dummy (One- Tailed)
File name example: kritika_ttest
Problem 1:
To determine whether the population mean of age is equal to 40 at α = 0.05.
Decision Rule: If p > α, do not reject the null hypothesis.
Inference: p (0.1571) > α (0.05), so the null hypothesis is not rejected.
Conclusion: There is no significant evidence that the population mean of age differs from 40 at α = 0.05.
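A sketch of how such a test can be run with t.test(). The ages below are hypothetical stand-ins for the Excel data, so the p-value will not match the one reported above. In practice the data could be imported with readxl::read_excel(), assuming the readxl package is installed.

```r
# Hypothetical ages (the real data come from the Excel file)
age <- c(38, 42, 45, 36, 41, 44, 39, 47, 43, 40)
alpha <- 0.05

# H0: mu = 40. The default alternative is two-sided;
# use alternative = "greater" or "less" for a one-tailed test.
result <- t.test(age, mu = 40)
print(result)

# Apply the decision rule
decision <- if (result$p.value < alpha) "reject H0" else "do not reject H0"
print(decision)
```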
Problem 1:
Is there sufficient evidence to suggest that the mean time to exhaustion is greater after
chocolate milk than after carbohydrate replacement drink? Use a significance level of 0.05.
(Use µCM-µCD in hypothesis statements)
Decision Rule: If p < α, reject the null hypothesis.
Inference: p (0.076) > α (0.05), so the null hypothesis is not rejected.
Conclusion: There is insufficient evidence that the mean time to exhaustion is greater after
chocolate milk than after the carbohydrate replacement drink at α = 0.05.
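A sketch of a one-tailed test of H0: µCM-µCD = 0 against H1: µCM-µCD > 0. The times are hypothetical, and the two samples are treated as independent, which is an assumption. If the same subjects tried both drinks, paired = TRUE would be needed.

```r
# Hypothetical times to exhaustion in minutes (not the assignment data)
chocolate_milk <- c(48, 52, 39, 61, 45, 50, 57, 43, 55)
carb_drink     <- c(41, 47, 40, 52, 38, 49, 44, 42, 46)
alpha <- 0.05

# alternative = "greater" tests whether mean(chocolate_milk) > mean(carb_drink)
result <- t.test(chocolate_milk, carb_drink, alternative = "greater")
print(result)

decision <- if (result$p.value < alpha) "reject H0" else "do not reject H0"
print(decision)
```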
D. Paired t- Test
Problem 1:
Coaching was given to students for Statistical software after their result was evaluated in
January in order to improve their performance in April exams. Determine if the coaching was
successful. (α = 0.05)
Decision Rule: If p < α, reject the null hypothesis.
Inference: p (0.85) > α (0.05), so the null hypothesis is not rejected.
Conclusion: There is insufficient evidence that the coaching improved performance, so it cannot be called successful.
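A sketch of the paired test with hypothetical January and April marks for the same students. Testing whether April marks are higher (alternative = "greater") is an assumption about how the hypothesis was framed.

```r
# Hypothetical marks for the same 8 students (not the assignment data)
january <- c(55, 62, 48, 70, 66, 59, 73, 51)
april   <- c(58, 60, 52, 74, 65, 63, 72, 56)
alpha <- 0.05

# Paired test on the differences april - january
result <- t.test(april, january, paired = TRUE, alternative = "greater")
print(result)

decision <- if (result$p.value < alpha) "reject H0" else "do not reject H0"
print(decision)
```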
E. Two Sample t Test
Problem 1:
To analyse whether there is a significant difference between the marks scored by class groups A
& B in mathematics at α = 10%.
Decision Rule: If p < α, reject the null hypothesis.
Inference: p (0.85) > α (0.10), so the null hypothesis is not rejected.
Conclusion: There is no significant difference between the marks scored by A and B.
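A sketch of the two-sample test with hypothetical marks. By default t.test() runs Welch's version, which does not assume equal variances; var.equal = TRUE gives the classical pooled test.

```r
# Hypothetical mathematics marks (not the assignment data)
group_a <- c(72, 65, 80, 58, 69, 75, 61, 77)
group_b <- c(70, 68, 74, 63, 66, 79, 60, 73)
alpha <- 0.10

# Two-sided test of H0: mean(A) = mean(B)
result <- t.test(group_a, group_b)
print(result)

decision <- if (result$p.value < alpha) "reject H0" else "do not reject H0"
print(decision)
```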
F. F Test
Problem 1:
Determine whether or not there is a significant difference between variances of two data
sets.
Decision Rule: If p < α, reject the null hypothesis.
Inference: p (0.87) > α (0.05), so the null hypothesis is not rejected.
Conclusion: There is no significant difference between the variances of the two datasets.
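A sketch of the F test for equality of variances using var.test(), with two hypothetical samples:

```r
# Hypothetical samples (not the assignment data)
set_1 <- c(21, 25, 19, 30, 27, 24, 22, 28)
set_2 <- c(23, 26, 20, 29, 25, 27, 21, 24, 26)
alpha <- 0.05

# H0: the ratio of the two population variances is 1
result <- var.test(set_1, set_2)
print(result)

decision <- if (result$p.value < alpha) "reject H0" else "do not reject H0"
print(decision)
```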
The marks for 3 different groups in Economics, Science and History are given. Determine
whether there is a significant difference between the population means.
Decision Rule: If p < α, reject the null hypothesis.
Inference: p (0.06) > α (0.05), so the null hypothesis is not rejected.
Conclusion: There is no significant difference between the population means of the three groups.
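Comparing three group means is typically done with a one-way ANOVA. The sketch below assumes that is the intended test and uses hypothetical marks.

```r
# Hypothetical marks, 5 students per group (not the assignment data)
marks_df <- data.frame(
  subject = factor(rep(c("Economics", "Science", "History"), each = 5)),
  marks = c(62, 70, 58, 66, 73,
            68, 75, 71, 64, 79,
            60, 65, 57, 69, 63)
)
alpha <- 0.05

# One-way ANOVA: H0 is that all group means are equal
fit <- aov(marks ~ subject, data = marks_df)
anova_table <- summary(fit)[[1]]
print(anova_table)

# The p-value for the subject effect is in the first row
p_value <- anova_table[["Pr(>F)"]][1]
decision <- if (p_value < alpha) "reject H0" else "do not reject H0"
print(decision)
```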
H. Chi Square Test
Problem 1:
Determine whether brand preference is independent of age group.
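A sketch of the chi-square test of independence using chisq.test(). The counts, age groups and brand names below are hypothetical, and α = 0.05 is assumed.

```r
# Hypothetical contingency table: rows are age groups, columns are brands
preference <- matrix(
  c(30, 20, 10,
    20, 25, 15,
    10, 15, 35),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    age_group = c("18-25", "26-40", "41+"),
    brand = c("Brand A", "Brand B", "Brand C")
  )
)
alpha <- 0.05  # assumed significance level

# H0: brand preference is independent of age group
result <- chisq.test(preference)
print(result)

decision <- if (result$p.value < alpha) "reject H0" else "do not reject H0"
print(decision)
```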