Introduction to Data Science: R Programming
What is Data?
[Figure: examples of data, such as binary digits (0 and 1) and a formula, A = πr²]
Data can be small sized, structured and in a single format, or humongous sized, unstructured and in multiple formats.
The need to understand and analyze data to make better decisions is what gave birth to Data
Science.
Types of data analytics:
Descriptive
Diagnostic
Predictive
Prescriptive
The data science lifecycle has the following stages:
1. Data Acquisition
2. Data Preprocessing
3. Model Building
4. Pattern Evaluation
5. Knowledge Representation
Data Acquisition
Data comes from multiple sources and in multiple formats. It has to be integrated and stored in a single location.
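As a minimal sketch of integrating data in R, assume two small, hypothetical sources: a customer table and a billing table that share a `customer_id` column. Base R's `merge()` joins them into a single data frame. In practice the sources would come from files or databases (for example via `read.csv()`); the table and column names here are illustrative only.

```r
# Hypothetical source 1: customer details
customers <- data.frame(
  customer_id = c(1, 2, 3),
  region      = c("North", "South", "East")
)

# Hypothetical source 2: billing records from a different system
billing <- data.frame(
  customer_id  = c(1, 2, 3),
  monthly_bill = c(45.5, 60.0, 38.2)
)

# Integrate both sources into one data frame by joining on customer_id
combined <- merge(customers, billing, by = "customer_id")
print(combined)
```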
Data Preprocessing
Once data acquisition is done, the raw data has to be processed into the right format. Typical preprocessing steps include summarizing, transforming, normalizing and aggregating the data.
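The preprocessing operations summarize, transform, normalize and aggregate can be sketched in base R on a small, made-up data frame (the `plan` and `minutes` columns are hypothetical). Here `summary()` summarizes, `log()` is one possible transformation, min-max scaling is one possible normalization to the range [0, 1], and `aggregate()` aggregates by group.

```r
# Made-up raw data: monthly usage per customer
usage <- data.frame(
  plan    = c("basic", "basic", "premium", "premium"),
  minutes = c(120, 300, 450, 600)
)

# Summarize: quick statistical overview of each column
summary(usage)

# Transform: log-transform a skewed numeric column
usage$log_minutes <- log(usage$minutes)

# Normalize: min-max scaling to the range [0, 1]
rng <- range(usage$minutes)
usage$minutes_scaled <- (usage$minutes - rng[1]) / (rng[2] - rng[1])

# Aggregate: mean minutes per plan
by_plan <- aggregate(minutes ~ plan, data = usage, FUN = mean)
print(by_plan)
```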
Copyright IntelliPaat. All rights reserved
Model Building
Model building is the process of applying statistical and machine learning algorithms to the data to find interesting insights.
Examples of such algorithms include Linear Regression, Random Forest and K-Means.
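To make the model-building step concrete, here is a minimal sketch that fits a linear regression with base R's `lm()` on the built-in `mtcars` data set (chosen only because it ships with R), predicting fuel efficiency (`mpg`) from car weight (`wt`, in 1000s of lbs).

```r
# Fit a simple linear regression: mpg as a function of weight
model <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
print(coef(model))

# Predict mpg for a hypothetical car weighing 3,000 lbs (wt = 3)
predict(model, newdata = data.frame(wt = 3))
```

The negative slope is the "insight" here: heavier cars tend to have lower fuel efficiency.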
Pattern Evaluation
The model produces patterns or information. These patterns have to be evaluated, that is, we check whether the obtained information is new, correct and useful.
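One common way to check whether a model's findings are correct is to test it on data it has not seen. The sketch below assumes the built-in `mtcars` data and an arbitrary 70/30 split: it trains a linear model on one part and reports the root mean squared error (RMSE) on the held-out part. The split ratio, seed and choice of RMSE are illustrative, not the only option.

```r
set.seed(42)  # make the random split reproducible

# Randomly assign about 70% of rows to training, the rest to testing
n         <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# Fit on training data only
model <- lm(mpg ~ wt, data = train)

# Evaluate on unseen test data using RMSE (lower is better)
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```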
Knowledge Representation
Once the information is validated, it can be represented with simple, visually clear graphs.
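As a minimal example of representing a result graphically with base R (no extra packages assumed), the sketch below counts cars by number of cylinders in the built-in `mtcars` data and draws a bar chart. Writing the chart to a PDF file in `tempdir()` is an assumption made so the code also runs without a screen; in RStudio you can simply call `barplot()` and view it in the Plots pane.

```r
# Count how many cars have 4, 6 or 8 cylinders
counts <- table(mtcars$cyl)

# Draw the bar chart into a temporary PDF file
out_file <- file.path(tempdir(), "cylinders.pdf")
pdf(out_file)
barplot(counts,
        main = "Cars by number of cylinders",
        xlab = "Cylinders",
        ylab = "Number of cars")
invisible(dev.off())  # close the file device so the chart is written
```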
Example applications of data science include detecting and reducing fraud, and optimizing prices.
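R is a programming language and environment for statistical computing and graphics, which also makes it a powerful visualization tool.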
RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, a syntax-highlighting editor that supports direct code execution, and a variety of robust tools for plotting, viewing history, debugging and managing your workspace.
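Change your working directory with the setwd() function, for example:
setwd("~/mydirectory")
Use forward slashes in file paths, even on Windows. A single backslash is an escape character in R strings, so if you use backslashes you must double them (for example, "C:\\Users\\me").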
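A small sketch of checking and changing the working directory. It uses `tempdir()` (a per-session temporary folder) as a stand-in for your own project folder and restores the original directory at the end; the Windows paths in the comment are hypothetical.

```r
# Remember where we started
old_wd <- getwd()

# Switch to the session's temporary directory (stand-in for a project folder)
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# On Windows, either form works (hypothetical paths):
# setwd("C:/Users/me/project")  or  setwd("C:\\Users\\me\\project")

# Return to the original directory
setwd(old_wd)
```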
RStudio options are accessible from the Global Options dialog (Tools > Global Options menu; RStudio > Preferences on a Mac) and include the following categories:
Appearance & Themes: Specify the font size and visual theme for the console and source editor
Pane Layout: Set the locations of the console, source editor and tab panes, and choose which tabs are included in each pane
Packages: Set the default CRAN repository and specify package development options
Spelling: Choose the main dictionary language and specify spell-checking options
Git/SVN: Configure the locations of the Git and SVN binaries and create and/or view SSH RSA keys
Publishing: Enable publishing apps and documents from the IDE and set up publishing accounts
Script window: In the top-left pane of the screen, you can write and edit your R script.
Console window: The results of running the script, together with the script lines that generated them, are displayed in the console in the bottom-left pane.
Environment window: The top-right pane, the so-called "Environment" window, shows the variables and data structures used or generated by the script.
Plots/Help window: The bottom-right pane shows the files and packages used by the project, lets you view plots (visualizations) generated by R, and gives access to help for various elements of R syntax.
Packages are collections of R functions, data and compiled code in a well-defined format.
R comes with a standard set of packages. Others are available for download and installation.
Common package functions in R:
.libPaths()                   # Get the library location
library()                     # See all installed packages
search()                      # See packages currently loaded
detach("package:pkg")         # Detach a loaded package (replace pkg with its name)
install.packages("package")   # Install a package (R is case-sensitive: lowercase "install")
library("package")            # Load a package
library(help = "package")     # List the package contents
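A minimal sketch of the load, inspect and unload cycle, using `splines`, a package that ships with every R installation, so nothing has to be downloaded. `install.packages()` is left out because it needs internet access.

```r
# Where are packages installed?
print(.libPaths())

# Load a package that ships with base R
library(splines)

# It now appears on the search path
print("package:splines" %in% search())   # TRUE

# Detach it and unload its namespace
detach("package:splines", unload = TRUE)
print("package:splines" %in% search())   # FALSE
```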
1. Run RStudio.
2. Click the Packages tab in the bottom-right pane, then click Install. The Install Packages dialog box appears.
3. In the Install Packages dialog, type the name of the package you want to install in the Packages field and click Install. While you type, RStudio lists matching package names to choose from.
The help() function and the ? help operator in R provide access to the documentation pages for R functions, data sets and other objects, both for packages in the standard R distribution and for contributed packages.
The help() function can also show information about a package in your library. For example, help(package = "MASS") displays an index of the help pages available for the package, along with some other information.
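A variable is a named storage location whose value can be changed while the program runs. Variables can hold values of different types, for example:
Character: "hello world", "z", "This is Sparta"
Numeric: 5, 1000, -33
Logical: TRUE, FALSE
Complex: 30-2i, 2+5i, -0.45i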
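The sketch below stores some example values in variables and uses `class()` to check each one's type. Integer values need an `L` suffix (for example `5L`); a plain `5` is stored as numeric (double). The variable names are illustrative.

```r
greeting <- "hello world"   # character
count    <- 1000            # numeric (double)
flag     <- TRUE            # logical
z        <- 2+5i            # complex

class(greeting)  # "character"
class(count)     # "numeric"
class(flag)      # "logical"
class(z)         # "complex"
class(5L)        # "integer"

# A variable can be reassigned, even to a value of a different type
count <- "one thousand"
class(count)     # "character"
```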
Operators perform manipulations on data and variables. The main types of operators in R are:
Assignment Operators
Arithmetic Operators
Relational Operators
Logical Operators
Assignment Operators: =, <- and ->
Example:
x = 10
y <- 20
30 -> z
Arithmetic Operators:
+  Addition
-  Subtraction
*  Multiplication
/  Division
Relational Operators:
>=  Greater than or equal to
==  Is equal to
!=  Not equal to
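A short sketch applying the assignment, arithmetic and relational operators to the example variables `x`, `y` and `z`. Note that R's subtraction operator is the ASCII hyphen `-`, and relational operators return `TRUE` or `FALSE`.

```r
# Three equivalent ways to assign
x = 10
y <- 20
30 -> z

# Arithmetic
x + y    # 30
y - x    # 10
x * y    # 200
y / x    # 2

# Relational
y >= x   # TRUE
x == 10  # TRUE
z != 30  # FALSE
```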
Logical Operators:
& (AND)
a      b      a & b
FALSE  FALSE  FALSE
FALSE  TRUE   FALSE
TRUE   FALSE  FALSE
TRUE   TRUE   TRUE

| (OR)
a      b      a | b
FALSE  FALSE  FALSE
FALSE  TRUE   TRUE
TRUE   FALSE  TRUE
TRUE   TRUE   TRUE
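Because R's `&` and `|` operate element-wise on vectors, the whole AND/OR truth table can be computed in one step from two logical vectors `a` and `b` (names chosen for illustration). For single TRUE/FALSE values in `if` conditions, R also offers the short-circuit forms `&&` and `||`.

```r
# All four combinations of two logical inputs
a <- c(FALSE, FALSE, TRUE, TRUE)
b <- c(FALSE, TRUE, FALSE, TRUE)

# Element-wise AND and OR, collected into a table
truth_table <- data.frame(a, b, a_and_b = a & b, a_or_b = a | b)
print(truth_table)
```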
Case study: You are the data scientist at "Neo", a telecom company whose customers are churning out to its competitors. You have to analyze the company's data completely to find out why customers are churning out. The analysis proceeds in six steps:
1. Data Manipulation
2. Data Visualization: represent the data pictorially with graphs, using the ggplot2 package, to get a better understanding of it
3. Linear Regression
4. Logistic Regression
5. Decision Tree & Random Forest
6. Clustering: divide the customers into different clusters by building k-means clustering on top of the "customer_churn" dataset
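The `customer_churn` dataset is not included here, so this sketch runs base R's `kmeans()` on the numeric columns of the built-in `iris` data instead; with the churn data you would pass its numeric customer columns. Scaling the features, using three clusters, `nstart = 25` and the seed are all illustrative assumptions, not settings taken from the case study.

```r
set.seed(123)  # k-means starts from random centers; fix the seed for reproducibility

# Use only the numeric columns, scaled so each contributes equally to distances
features <- scale(iris[, 1:4])

# Build k-means with 3 clusters, trying 25 random starts and keeping the best fit
km <- kmeans(features, centers = 3, nstart = 25)

# Cluster sizes and the cluster assigned to the first few rows
print(km$size)
head(km$cluster)
```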
a. Descriptive
b. Business Intelligence
c. Predictive
d. Prescriptive
Solution: b. Business Intelligence