CuRious about R in Power BI? End to end R in Power BI for beginners

“R is a free software environment for statistical computing and graphics.”
• GNU Project: open source
• Based on “S” (programming language developed by John Chambers at Bell Labs)
• R Foundation: non-profit organization for the development of R

• Most widely used data analysis software: used by 2M+ data scientists, statisticians and analysts
• Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity

© 2021 Dynamic Communities 15


Download, save file, double-click

• Mean – the arithmetic average
• Median – the middle value, which splits the ordered data into two halves
• Mode – the most frequent value
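The three measures of central tendency can be sketched in R as follows. The sample vector is hypothetical and not from the original slides. Base R has no built-in function for the statistical mode, so this sketch counts the values with `table()`, which is one possible approach.

```r
# Hypothetical sample data, not taken from the slides
x <- c(2, 3, 3, 5, 7, 10)

mean_x   <- mean(x)    # arithmetic average
median_x <- median(x)  # middle value of the sorted data

# mode() in base R returns the storage type, not the most frequent value,
# so count each value and pick the one with the highest count.
counts <- table(x)
mode_x <- as.numeric(names(counts)[which.max(counts)])

print(c(mean = mean_x, median = median_x, mode = mode_x))
```

If several values share the highest count, `which.max()` returns only the first of them.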

• Variance – average squared difference between the data points and the mean
• Standard Deviation – square root of the variance; more intuitive because it is in the same units as the data
• Percentiles – the ordered dataset is divided into 100 equal parts
• Quartiles – the ordered dataset is divided into four equal parts
• Interquartile range – the spread of the middle 50% of data points
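A minimal sketch of these measures of spread in R, using a hypothetical vector. Note that `var()` and `sd()` compute the sample versions, dividing by n - 1 rather than n, so the result differs slightly from a plain average of squared differences.

```r
# Hypothetical sample data, not taken from the slides
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

var_x <- var(x)  # sample variance (denominator n - 1)
sd_x  <- sd(x)   # sample standard deviation, the square root of var_x

# Quartiles are the 25th, 50th and 75th percentiles
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))

# Interquartile range: distance between the 75th and 25th percentiles
iqr_x <- IQR(x)

print(var_x)
print(sd_x)
print(quartiles)
print(iqr_x)
```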

Advantages
• Free
• “Lingua franca” in methodological research: new statistical
procedures are often developed with R
• Large community: most problems are discussed on the internet
• No “point and click”: scripts make procedures transparent
and reproducible
• Flexible programming allows for automated replication with
new data

Drawbacks
• Not very intuitive
• No “Point and Click”: handling only through command line
and scripts
• Documentation is very technical at times
• Community-based: packages come from different developers, so conventions differ and compatibility is sometimes lacking
• Slow with very large data sets

Run code:
Enter from the command line
Ctrl + Enter from a script
Assign variables:
x <- 2
Comments:
# Comment
Comment out a selection with Ctrl + Shift + C
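As a small illustration of assignment and comments, the lines below can be typed at the console or run from a script. The variable `y` is a hypothetical addition for the example.

```r
# Lines starting with # are comments and are ignored by R
x <- 2       # assign the value 2 to x with the assignment operator <-
y <- x * 10  # variables can be reused in later expressions
print(y)
```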

FunctionName(arguments)

Function      Effect
summary(x)    Summary information on x
str(x)        Structure of x
head(x)       Shows first 6 elements of x
tail(x)       Shows last 6 elements of x
sum(x)        Calculates the sum of a numeric vector
mean(x)       Calculates the arithmetic mean
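The functions listed here can be tried on any numeric vector. The sketch below uses a hypothetical vector of eight values so that `head()` and `tail()` return different elements.

```r
# Hypothetical numeric vector, not taken from the slides
x <- c(4, 8, 15, 16, 23, 42, 7, 1)

print(summary(x))  # minimum, quartiles, mean and maximum
str(x)             # compact description of the structure of x

first_six <- head(x)  # first 6 elements
last_six  <- tail(x)  # last 6 elements
total     <- sum(x)   # sum of all elements
average   <- mean(x)  # arithmetic mean

print(first_six)
print(last_six)
print(total)
print(average)
```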

Class       Example
integer     1, 2, 3
numeric     1.414, 3.14, 1.0
character   “A”, “1”, “rather correct”
logical     TRUE, FALSE
factor      A, B, C
Date        dates
complex     complex numbers
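The class of any value can be checked with `class()`. The sketch below builds one value per class; the specific values are illustrative. Note that a plain `1` is stored as numeric in R, so the `L` suffix is needed to get an integer, and the date class is spelled `Date`.

```r
# One illustrative value for each class
classes <- c(
  class(1L),                        # integer (note the L suffix)
  class(3.14),                      # numeric
  class("rather correct"),          # character
  class(TRUE),                      # logical
  class(factor(c("A", "B", "C"))),  # factor
  class(as.Date("2021-01-31")),     # Date
  class(1 + 2i)                     # complex
)
print(classes)
```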

sum(x)     Sum of all elements of x
mean(x)    Mean of all elements of x
prod(x)    Product of all elements of x
diff(x)    x2 − x1, x3 − x2, x4 − x3, etc.
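A short sketch of these vector functions on a hypothetical four-element vector. `diff()` returns one element fewer than its input because it works on consecutive pairs.

```r
# Hypothetical vector, not taken from the slides
x <- c(1, 4, 9, 16)

s <- sum(x)   # 1 + 4 + 9 + 16
m <- mean(x)  # sum divided by the number of elements
p <- prod(x)  # 1 * 4 * 9 * 16
d <- diff(x)  # differences between consecutive elements

print(s)
print(m)
print(p)
print(d)
```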

x & y     Logical AND
x | y     Logical OR
!x        Logical negation
all(x)    TRUE if all elements of x are TRUE
any(x)    TRUE if at least one element of x is TRUE
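The logical operators work element by element on vectors, while `all()` and `any()` reduce a vector to a single value. A minimal sketch with two hypothetical logical vectors:

```r
# Hypothetical logical vectors
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

and_xy <- x & y    # TRUE only where both are TRUE
or_xy  <- x | y    # TRUE where at least one is TRUE
not_x  <- !x       # flips every element
all_x  <- all(x)   # are all elements TRUE?
any_x  <- any(x)   # is at least one element TRUE?

print(and_xy)
print(or_xy)
print(not_x)
print(all_x)
print(any_x)
```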

==          Is equal to
!=          Is not equal to
<, >        Smaller than, larger than
<=, >=      Smaller or equal, larger or equal
x %in% y    Elements of x in y
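Comparisons also work element by element and return logical vectors. In R the membership test is written with the `%in%` operator. The vectors below are hypothetical.

```r
# Hypothetical vectors
x <- c(1, 5, 10)
y <- c(5, 10, 15)

is_five      <- x == 5     # equal to 5?
not_five     <- x != 5     # not equal to 5?
below_five   <- x < 5      # smaller than 5?
at_least_5   <- x >= 5     # larger than or equal to 5?
found_in_y   <- x %in% y   # does each element of x occur in y?

print(is_five)
print(not_five)
print(below_five)
print(at_least_5)
print(found_in_y)
```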

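# A list can hold elements of any type
ls <- list(1, 2, 3)
print(ls)

# A 4 x 4 matrix, filled column by column with the numbers 1 to 16
mx <- matrix(1:16, nrow = 4, ncol = 4)
print(mx)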

Code               Description
y ~ x              x has an effect on y
y ~ x1 + x2        x1 and x2 have an effect on y
y ~ x1 + x2 + 0    intercept set to zero
y ~ I(x1 + x2)     y is influenced by the sum x1 plus x2
y ~ . - x1         model of all variables except x1
y ~ x1 * x2        x1, x2 and the interaction between x1 and x2
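To see how the formula notation changes a model, the sketch below fits three linear models with `lm()` on the built-in `mtcars` dataset. The choice of `mtcars` and of the columns `mpg`, `wt` and `hp` is an assumption for illustration and does not come from the slides.

```r
# mtcars ships with base R; mpg, wt and hp are columns of that dataset
fit_main         <- lm(mpg ~ wt + hp, data = mtcars)      # two predictors
fit_no_intercept <- lm(mpg ~ wt + hp + 0, data = mtcars)  # intercept set to zero
fit_interaction  <- lm(mpg ~ wt * hp, data = mtcars)      # main effects plus interaction

# The coefficient names show which terms each formula produced
print(names(coef(fit_main)))
print(names(coef(fit_no_intercept)))
print(names(coef(fit_interaction)))
```

The last model contains a `wt:hp` term in addition to `wt` and `hp`, because `*` expands to both main effects and their interaction.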


Let’s see the data!
Remember to press the ‘Run’ button or press Ctrl + Enter to run the command.

This creates a new table called PerfectDiamonds. SELECT statements allow you to choose the columns you want.

This creates a new table called PerfectDiamonds, using a filter to select only perfect diamonds.
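The code behind these two steps is not included in the text, so the following is only a sketch under stated assumptions: it uses the dplyr package, a small hypothetical data frame `diamonds_sample` standing in for the real data, and it treats "perfect" as meaning `cut == "Ideal"`. The original slides may have used different code and a different definition.

```r
library(dplyr)

# Hypothetical stand-in for the diamonds data; values are made up
diamonds_sample <- data.frame(
  carat = c(0.23, 0.31, 0.70, 1.01),
  cut   = c("Ideal", "Good", "Ideal", "Premium"),
  price = c(326, 335, 2757, 4500),
  depth = c(61.5, 63.3, 62.0, 60.1),
  stringsAsFactors = FALSE
)

# Assumption: "perfect" diamonds are those with cut == "Ideal"
PerfectDiamonds <- diamonds_sample %>%
  filter(cut == "Ideal") %>%    # keep only the matching rows
  select(carat, cut, price)     # keep only the columns of interest

print(PerfectDiamonds)
```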

Script component: appears in the Power BI canvas
R script goes here
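A minimal sketch of what could go into the script editor of an R visual. Power BI hands the fields added to the visual to the script as a data frame named `dataset`; the column names `displ` and `hwy` and the stand-in values below are assumptions so that the script also runs outside Power BI.

```r
# Inside Power BI the data frame `dataset` is created automatically.
# Outside Power BI it does not exist, so build a hypothetical stand-in.
if (!exists("dataset")) {
  dataset <- data.frame(
    displ = c(1.8, 2.0, 2.8, 3.1, 4.2, 5.3),
    hwy   = c(29, 31, 26, 27, 23, 20)
  )
}

# The R visual displays whatever the script draws on the default device
plot(
  dataset$displ, dataset$hwy,
  xlab = "displ", ylab = "hwy",
  main = "Highway mileage by engine displacement"
)
```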

[Figure: plot with displ on the x-axis and count on the y-axis]

[Figure: density plot with displ on the x-axis and density on the y-axis, filled by as.factor(year) with the levels 1999 and 2008]
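library(ggplot2)  # provides ggplot(), geom_density() and the mpg dataset
# Density of displ, one semi-transparent curve per model year
ggplot(data = mpg, aes(x = displ)) +
  geom_density(aes(fill = as.factor(year)), alpha = 0.5)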

With ggmap, map graphics can be generated.

https://shiny.rstudio.com/gallery/

Tidy code is easier to write, read and maintain, and is frequently even faster than its base R counterparts.
It is also easier to learn.
So here we are!

● Tidy Data is a standard approach to structure datasets
● Good for Data Analysis and Data Visualization
● Variables make up the columns
● Observations make up the rows
● Values go into cells
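To make the three rules concrete, the sketch below starts from a hypothetical "wide" table, where the years are spread across columns, and rebuilds it in tidy form with plain base R. The names `wide`, `tidy`, `city`, `year` and `sales` are invented for the example.

```r
# Hypothetical wide table: one column per year, so "year" is hidden in the headers
wide <- data.frame(
  city  = c("Leeds", "York"),
  y2020 = c(10, 20),
  y2021 = c(12, 25)
)

# Tidy version: each variable (city, year, sales) is a column,
# each observation (one city in one year) is a row
tidy <- data.frame(
  city  = rep(wide$city, times = 2),
  year  = rep(c(2020, 2021), each = nrow(wide)),
  sales = c(wide$y2020, wide$y2021)
)

print(tidy)
```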

● A Variable is a measurement
● Also known as:
   ● Independent or dependent variables
   ● Features (Microsoft’s terminology)
   ● Predictors (machine learning background)
   ● Outcomes (social sciences background)
   ● The Response (statistics background)
   ● Attributes (dimensional modelling background)

● A Variable can fall into three categories:
   ● Fixed Variables
      ● Known variables prior to the start of the investigation
   ● Measured Variables
      ● Data that’s captured during the investigative process
   ● Derived Variables
      ● Think of a calculated column in DAX or SQL
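A derived variable can be sketched in R as a new column computed from existing ones, much like a calculated column. The data frame `sales` and its columns are hypothetical.

```r
# Hypothetical measured data
sales <- data.frame(
  product    = c("A", "B", "C"),
  unit_price = c(2.5, 4, 10),
  quantity   = c(4, 3, 1)
)

# Derived variable: computed from two measured variables
sales$revenue <- sales$unit_price * sales$quantity

print(sales)
```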

● Ingests data from different sources
● There are lots of options to work with the file
   ● Headers
   ● Delimiters
● See https://cran.r-project.org/web/packages/readr/readr.pdf for more information
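As a minimal sketch of the header and delimiter options, the example below writes a small semicolon-separated file to a temporary location and reads it back with `read_delim()` from the readr package. The file contents and the name `scores` are hypothetical.

```r
library(readr)

# Write a small hypothetical file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("id;name;score", "1;Ann;9.5", "2;Bob;7.0"), path)

# delim sets the field separator; col_names = TRUE treats the first row as headers
scores <- read_delim(path, delim = ";", col_names = TRUE)

print(scores)
```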

● Easy data manipulation
● Built for data frames
● There are equivalents in SQL
● Written in C++ so it’s faster

● 6 verbs for data manipulation
   ● select
   ● filter
   ● mutate
   ● group_by
   ● summarize
   ● tally
● There are equivalents in SQL
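The verbs can be chained into one pipeline. The sketch below assumes they refer to the dplyr package and uses a hypothetical `orders` data frame; the comments name the rough SQL equivalent of each step.

```r
library(dplyr)

# Hypothetical order data
orders <- data.frame(
  region     = c("North", "North", "South", "South", "South"),
  units      = c(10, 5, 8, 2, 7),
  unit_price = c(2, 2, 3, 3, 3)
)

by_region <- orders %>%
  select(region, units, unit_price) %>%     # SELECT region, units, unit_price
  mutate(revenue = units * unit_price) %>%  # computed column in the SELECT list
  filter(units > 2) %>%                     # WHERE units > 2
  group_by(region) %>%                      # GROUP BY region
  summarize(total_revenue = sum(revenue))   # SUM(revenue)

# tally() counts the rows in each group, like COUNT(*)
row_counts <- orders %>%
  group_by(region) %>%
  tally()

print(by_region)
print(row_counts)
```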
