
1 Introduction To Data Science R Programming Edited 1 1

Uploaded by

Mamadou Diakite

Data Science

Introduction to Data Science

Copyright IntelliPaat. All rights reserved


Agenda

01 Need of Data Science 02 Life Cycle of Data Science

03 Applications of Data Science 04 Introduction to R

05 Introduction to R-Studio 06 Data Types & Operators in R



Data

What is Data?



Data

Well, it’s just a collection of facts!

-0.879
“My name is Sam”
348
A = πr²
0 1
1 0



Data Back Then

Small Sized

Structured

Single Format



Data Today

Unstructured

Multiple Formats

Humongous Sized



Need of Data Science

Now, what should I do with all of this huge unstructured data?

The need to understand and analyze data to make better decisions is what gave birth to Data
Science.



Need of Data Science

Understand the data

Find interesting insights

Make informed decisions



What is Data Science?

Applying science to data to make the data talk to us

Data
Data Science
Science



What is Data Science?

Data Science is an umbrella term which encompasses multiple domains.

Data Visualization

Data Manipulation

Statistical Analysis

Machine Learning



Types of Data Analytics

Descriptive: Comprehensive, accurate and effective visualization

Diagnostic: Ability to drill down to the root cause

Predictive: Historical patterns being used to predict specific outcomes using algorithms

Prescriptive: Applying advanced analytical algorithms to make specific recommendations and strategies



Life Cycle of Data Science



Life Cycle of Data Science

1 Data Acquisition

2 Data Preprocessing

3 Model Building

4 Pattern Evaluation

5 Knowledge Representation



Data Acquisition

Data comes from multiple sources and is present in multiple formats. This data has to be integrated and
stored in one single location.

Data from multiple sources → Data Warehouse → Target Data
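
As a rough illustration of integrating data from more than one source, the sketch below reads a small CSV snippet and merges it with a second table. The data, the column names (`customer_id`, `monthly_charges`, `support_calls`) and the variable names are made up for this example and do not come from the course material.

```r
# Source 1: a CSV export (inlined here as text so the example is self-contained).
csv_source <- "customer_id,monthly_charges
1,29.85
2,56.95
3,53.85"
billing <- read.csv(text = csv_source)

# Source 2: a table that might come from another system (hypothetical data).
support <- data.frame(
  customer_id   = c(1L, 2L, 4L),
  support_calls = c(0, 3, 1)
)

# Integrate both sources into one table, keeping every customer from either side.
combined <- merge(billing, support, by = "customer_id", all = TRUE)
print(combined)
```

Customers that appear in only one source end up with `NA` in the columns of the other source, which is typically handled in the preprocessing step.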



Data Preprocessing

Once data acquisition is done, the raw data has to be processed to bring it to the right format.

Summarize

Transform

Normalize

Aggregate

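
The four preprocessing operations named on this slide can be sketched in base R as follows. The data frame `raw` and its columns are hypothetical, and min-max scaling and a log transform are only one possible choice for "normalize" and "transform".

```r
# Hypothetical raw data: monthly charges for customers on two contract types.
raw <- data.frame(
  contract = c("monthly", "monthly", "yearly", "yearly"),
  charges  = c(20, 40, 60, 100)
)

# Summarize: basic descriptive statistics of one column.
charge_summary <- summary(raw$charges)
print(charge_summary)

# Transform: derive a new column (a log scale is one common choice).
raw$log_charges <- log(raw$charges)

# Normalize: rescale charges to the range [0, 1] (min-max scaling).
rng <- range(raw$charges)
raw$charges_scaled <- (raw$charges - rng[1]) / (rng[2] - rng[1])

# Aggregate: mean charges per contract type.
per_contract <- aggregate(charges ~ contract, data = raw, FUN = mean)
print(per_contract)
```
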
Model Building

Model building is the process where we apply different scientific algorithms to find interesting insights
from the data.

Linear Regression

K-Means

Random Forest
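
Two of the algorithms named here ship with base R, so a minimal sketch is possible without extra packages. The example uses the built-in `mtcars` and `iris` data sets, which are not part of the course material. Random forests need an add-on package and are not shown.

```r
# Linear regression: model fuel economy (mpg) from car weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# K-means: group the iris flowers into 3 clusters using the numeric columns.
set.seed(42)  # k-means starts from random centers, so fix the seed
clusters <- kmeans(iris[, 1:4], centers = 3)
print(table(clusters$cluster))
```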



Pattern Evaluation

The model gives us some patterns/information. These patterns have to be evaluated, i.e., here we have
to evaluate whether the obtained information is new, correct and useful.

Model → Pattern Evaluation
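
One common way to check whether a model's output is correct and useful is to hold back part of the data and measure the prediction error on it. The sketch below assumes a simple train/test split on the built-in `mtcars` data and uses root mean squared error (RMSE); other evaluation schemes are equally valid.

```r
set.seed(1)
test_rows <- sample(nrow(mtcars), 8)   # hold out 8 of the 32 cars
train <- mtcars[-test_rows, ]
test  <- mtcars[test_rows, ]

# Fit on the training rows only, then predict the held-out rows.
fit <- lm(mpg ~ wt, data = train)
predicted <- predict(fit, newdata = test)

# Root mean squared error on unseen data: lower is better.
rmse <- sqrt(mean((test$mpg - predicted)^2))
print(rmse)
```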



Knowledge Representation

Once the information is validated, it can be represented with simple aesthetic graphs.



Application of Data Science
in Different Industries



Application of Data Science in Telecom

Analytical Customer Relationship Management (ACRM)

Fraud Reduction

Bad Debt Reduction

Price Optimization



Application of Data Science in Banking

Acquire and Retain Customers

Detect Fraud

Improve Risk Control

Optimize Product and Portfolio Model



Application of Data Science in E-commerce

Enhance Customer Engagement

Customize Offers and Promotions

Maintain Effective Supply Chain Management

Improve User Experience



Introduction to R



Introduction to R

R is a language for data analysis and statistical analysis.



Introduction to R

R is a visualization tool.



Introduction to R

R is open-source, cross-platform software.



Introduction to R

R is a Turing complete language.



Installing R



Installing R

You can install R from https://cran.r-project.org/



R-Studio



R-Studio

R-Studio is a set of integrated tools designed to help you be more productive with R. It includes a console,
a syntax-highlighting editor that supports direct code execution, and a variety of robust tools for plotting, viewing
history, debugging and managing your workspace.



Setting Working Directory

Change your working directory with the setwd() function, such as:

setwd("~/mydirectory")

Note that paths should use forward slashes, even on a Windows system. A single backslash starts an escape sequence in an R string, so backslashes would have to be doubled (\\).

For Windows, the command might look something like: setwd("C:/Sham/Documents/RProjects")
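
A minimal sketch of checking and changing the working directory is shown below. It uses `tempdir()` as the target only because that directory is guaranteed to exist; in practice you would pass your own project path.

```r
old_dir <- getwd()    # remember where we started
target  <- tempdir()  # a directory that is guaranteed to exist
setwd(target)
print(getwd())

# file.path() joins path components with forward slashes on every platform.
win_path <- file.path("C:", "Sham", "Documents", "RProjects")
print(win_path)

setwd(old_dir)        # restore the original working directory
```

Restoring the original directory at the end keeps the rest of a script unaffected by the change.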



Customizing R-Studio

R-Studio options are accessible from the Options dialog, Tools > Options menu (R-Studio > Preferences on a Mac)
and include the following categories:

General R Options: Default CRAN mirror, initial working directory, workspace and history behavior

Source Code Editing: Enable/disable line numbers, selected word and line highlighting, soft-wrapping for R files, parenthesis matching, right margin display, console syntax highlighting, configure tab spacing and set default text encoding

Appearance & Themes: Specify the font size and visual theme for the console and source editor

Pane Layout: Locations of console, source editor and tab panes; set which tabs are included in each pane



Customizing R-Studio

Packages: Set default CRAN repository and specify package development options

Sweave: Configure Sweave compiling options and PDF previewing

Spelling: Choose main dictionary language and specify spell checking options

Git/SVN: Configure locations of Git and SVN binaries and create and/or view SSH RSA keys

Publishing: Enable publishing apps and documents from IDE and set account



R-Studio GUI





R-Studio GUI

The script editor window is in the top left corner of the screen. Within this pane, you can edit your R script.

Script Window



R-Studio GUI

Results of the script execution, together with the script lines that generated these results, will be displayed in the Console
window located in the bottom left corner of the screen.

Console Window



R-Studio GUI

The top right pane of the screen provides information about the variables and data structures used or generated by the
script. This is the so-called “Environment” window.

Environment Window



R-Studio GUI

The window on the bottom right corner of the screen shows information about the files and packages used by the
project and allows one to view plots (or visualizations) generated by R and also access help for various elements of R
syntax.

Plots/Help



R Packages



R Packages

Packages are collections of R functions, data and compiled code in a well-defined format.

The directory where packages are stored is called the library.

R comes with a standard set of packages. Others are available for download and installation.

R Package Function
.libPaths() # Get library location
library() # See all packages installed
search() # See packages currently loaded
detach("package:pkg") # Unload the loaded package
install.packages("package") # Install the package
library("package") # Load the package
library(help = "package") # List package contents
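
The commands in the table can be tried together in a short session. This sketch only uses the `stats` package, which ships with R, so nothing is downloaded; the variable names are chosen for illustration.

```r
lib_dirs  <- .libPaths()                     # library location(s)
installed <- rownames(installed.packages())  # names of all installed packages
print(head(installed))

library("stats")     # load a package (stats is part of the standard R distribution)
loaded <- search()   # attached packages and environments
print("package:stats" %in% loaded)
```

Note that R function names are case-sensitive, so the installer is `install.packages()` with a lowercase `i`.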



Steps to Install R Packages

1 Run R-Studio

2 Click on the Packages tab in the bottom-right section and then click on Install. The Install Packages dialog box will appear.

3 In the Install Packages dialog, write the package name you want to install under the Packages field and then click Install. This will install the package you searched for or give you a list of matching packages based on your package text.
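
The same installation can also be done from the console instead of the dialog. The helper below, `ensure_package()`, is a hypothetical convenience function written for this example (it is not part of R or R-Studio); it installs a package only when it is missing.

```r
# Hypothetical helper: install a package only when it is not yet available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection and a CRAN mirror
  }
  requireNamespace(pkg, quietly = TRUE)  # TRUE if the package can now be loaded
}

# "stats" ships with R, so this call does not download anything.
ok <- ensure_package("stats")
print(ok)
```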



Getting Help with R



Getting Help with R

The help() function and ? help operator in R provide access to the documentation pages for R functions, data
sets and other objects, both for packages in the standard R distribution and for contributed packages.

The help() function can be used to access information about a package in your library—for
example, help(package="MASS")—which displays an index of available help pages for the package, along
with some other information.

Help Command Function


help.start() # General help
help(lm) # Help about function lm
example(lm) # Show an example of function lm
help(package = "package") # List help pages for "package"
?lm # Short form of help(lm)
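
When the exact function name is not known, a few other built-in helpers are useful alongside `help()`. This is a small sketch using functions from the standard R distribution; the search pattern is just an example.

```r
# Search function names by pattern when the exact name is unknown.
matches <- apropos("^read\\.")
print(head(matches))

# Inspect a function's arguments without opening its help page.
print(args(setwd))

# Full-text search across installed help pages (can be slow, so left commented out):
# help.search("linear model")   # short form: ??"linear model"
```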



Variables in R



Variables in R

A variable is a temporary storage space where you can keep changing values.
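
A short sketch of what "keep changing values" means in practice; the variable names are arbitrary.

```r
score <- 10          # store a value
score <- score + 5   # overwrite it with a new value
print(score)         # 15

label <- score       # copy the current value into another variable
label <- "fifteen"   # a variable can also be rebound to a value of another type
print(class(label))
```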



Data Types in R



Data Types in R

Every variable is associated with a data type.

Numeric: 5, 1000, -33

Character: "z", "hello world", "This is Sparta"

Logical: TRUE, FALSE

Complex: 30-2i, 2+5i, -0.45i
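
The data type of a value can be checked with `class()`. The sketch below runs it over one value of each of the four types and shows one type conversion; the values are taken from or modelled on the slide.

```r
values <- list(5, 1000, "hello world", TRUE, 2+5i)
for (v in values) {
  # Print each value next to the data type R assigns to it.
  cat(format(v), "->", class(v), "\n")
}

is_numeric <- is.numeric(-33)     # TRUE: -33 is a numeric value
as_number  <- as.numeric("348")   # convert a character string to numeric
print(as_number + 1)
```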



Operators in R



Operators in R

Operators help in performing certain manipulations on top of the data and variables.

Assignment Operators

Arithmetic Operators

Relational Operators

Logical Operators



Assignment Operators

Assignment operators are used to assign a value to an object.

Operators

= <- ->

Example

x = 10
y <- 20
30 -> z
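
The three forms of assignment can be run as follows; `total` is an extra variable added for this sketch.

```r
x = 10     # "=" assigns at the top level
y <- 20    # "<-" is the conventional assignment operator
30 -> z    # "->" assigns the value on the left to the name on the right

total <- x + y + z
print(total)   # 60
```

Inside a function call, `=` names an argument rather than assigning a variable, which is one reason `<-` is usually preferred for assignment.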



Arithmetic Operators

Arithmetic operators are used to perform basic mathematical operations.

+ Addition

- Subtraction

* Multiplication

/ Division
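
A quick sketch of the four operators on two example numbers, plus the same arithmetic applied to a whole vector at once.

```r
a <- 7
b <- 2
print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5

# Arithmetic is vectorized: it is applied element by element.
scaled <- c(1, 2, 3) * 10
print(scaled)
```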



Relational Operators

Relational operators are used to test/define a relationship between two operands.

< Less than

<= Less than or equal to

> Greater than

>= Greater than or equal to

== Is equal to

!= Not equal to
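
Each relational operator returns a logical value. The sketch below uses two example numbers and then a made-up vector of charges to show how a comparison can filter data.

```r
p <- 5
q <- 8
print(p < q)    # TRUE
print(p >= q)   # FALSE
print(p == q)   # FALSE
print(p != q)   # TRUE

# Comparisons are vectorized, which makes them handy for filtering.
charges <- c(20, 75, 50)
high <- charges[charges > 40]
print(high)
```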



Logical Operators - AND

Logical operators are used to make a decision on the basis of a condition.

& AND

FALSE & FALSE = FALSE

FALSE & TRUE = FALSE

TRUE & FALSE = FALSE

TRUE & TRUE = TRUE



Logical Operators - OR

Logical operators are used to make a decision on the basis of a condition.

| OR

FALSE | FALSE = FALSE

FALSE | TRUE = TRUE

TRUE | FALSE = TRUE

TRUE | TRUE = TRUE
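
Both truth tables (AND and OR) can be reproduced in R by building every combination of two logical inputs. The column names `a`, `b`, `and` and `or` are chosen for this sketch.

```r
# All four combinations of two logical inputs.
inputs <- expand.grid(a = c(FALSE, TRUE), b = c(FALSE, TRUE))

inputs$and <- inputs$a & inputs$b   # TRUE only when both inputs are TRUE
inputs$or  <- inputs$a | inputs$b   # TRUE when at least one input is TRUE
print(inputs)
```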



Project-based Data Science Course



Data Science Project

We’ll learn Data Science with this “customer churn” dataset.



Problem Statement

You are the Data Scientist at a telecom company “Neo” whose customers are churning out to its
competitors. You have to analyse the data of your company and find insights.

I’ll analyse my company’s data completely to find why customers are churning out.

Neo



Tasks to be Performed

1 Data Manipulation

Find out hidden patterns in the “customer_churn” dataset by using the apply family of functions and the dplyr package.

I’ll start off by manipulating the data.
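
As a rough preview of the apply family, the sketch below works on a tiny made-up stand-in for the “customer_churn” data. The column names (`tenure`, `MonthlyCharges`, `Churn`) and the values are assumptions for illustration, not the real dataset, and dplyr is not shown because it is an add-on package.

```r
# Made-up stand-in for the "customer_churn" data; columns and values are assumptions.
customer_churn <- data.frame(
  tenure         = c(1, 34, 2, 45),
  MonthlyCharges = c(29.85, 56.95, 53.85, 42.30),
  Churn          = c("No", "No", "Yes", "No")
)

# sapply: apply a function to each of the selected columns.
column_means <- sapply(customer_churn[, c("tenure", "MonthlyCharges")], mean)
print(column_means)

# tapply: average monthly charges within each churn group.
charges_by_churn <- tapply(customer_churn$MonthlyCharges, customer_churn$Churn, mean)
print(charges_by_churn)
```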



Tasks to be Performed

2 Data Visualization

Represent the data with graphs by using the ggplot2 package.

I’ll depict the data pictorially to get a better understanding.



Tasks to be Performed

3 Linear Regression

Understand how the Monthly Charges of the customers vary with respect to other factors.

I’ll build a linear regression algorithm on top of the “customer_churn” data.



Tasks to be Performed

4 Logistic Regression

Get the probability of customers churning out with respect to other factors.

I’ll build a logistic regression algorithm on top of the “customer_churn” data.
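
To show what "getting the probability of churning" looks like in code, the sketch below fits a logistic regression with base R's `glm()`. The data are simulated under an assumed relationship (longer tenure means less churn); they are not the real “customer_churn” dataset, and `tenure` is a hypothetical predictor.

```r
set.seed(7)
n <- 200
tenure <- runif(n, 0, 72)   # months with the company (simulated)
# Simulated assumption: churn becomes less likely as tenure grows.
churn <- rbinom(n, 1, plogis(1 - 0.08 * tenure))
sim <- data.frame(tenure, churn)

# Logistic regression: binomial family with the default logit link.
fit <- glm(churn ~ tenure, data = sim, family = binomial)

# Predicted churn probability for a new and a long-standing customer.
new_customers <- data.frame(tenure = c(2, 60))
probs <- predict(fit, newdata = new_customers, type = "response")
print(probs)
```

Using `type = "response"` returns probabilities between 0 and 1 instead of values on the log-odds scale.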



Tasks to be Performed

5 Decision Tree & Random Forest

Classify whether the customer will churn or not on the basis of other factors.

I’ll build decision tree and random forest algorithms.



Tasks to be Performed

6 Clustering

Divide the customers into different clusters with k-means.

I’ll build k-means clustering on top of the “customer_churn” dataset.



Individual Modules



Individual Modules

Individual modules that are not based on the “customer_churn” dataset:

Market Basket Analysis

Recommendation Engine

Time Series

Deep Learning



Quiz



Quiz

Which of the following is not a type of analytics?

a. Descriptive

b. Business Intelligence

c. Predictive

d. Prescriptive

e. None of the above



Quiz

Which of the following is not a type of analytics?

Solution:

b. Business Intelligence



Quiz

Which of the following are the 4 Vs or dimensions of Big Data?

a. Volume, Velocity, Variable & Vacuum

b. Volume, Velocity, Variety & Veracity

c. Volume, Vaccine, Variety & Variable

d. All of the above



Quiz

Which of the following are the 4 Vs or dimensions of Big Data?

Solution:

b. Volume, Velocity, Variety & Veracity



Thank You



India : +91-7847955955

US : 1-800-216-8930 (TOLL FREE)

sales@intellipaat.com

24/7 Chat with Our Course Advisor

