DATA SCIENCE With DA, ML, DL, AI Using Python & R PDF

This document provides an overview of topics related to data science, machine learning, and artificial intelligence using Python, R, and Weka. It introduces data science concepts and processes, machine learning techniques including supervised and unsupervised learning, and algorithms such as KNN, naive bayes, decision trees, and support vector machines. It also covers programming in Python and R, including data structures, loading and analyzing data, statistics, data visualization, and applying machine learning algorithms like KNN, naive bayes, and decision trees to classification problems.

Uploaded by Saikumar Reddy
© All Rights Reserved

DATA SCIENCE with DATA ANALYTICS, MACHINE LEARNING,
DEEP LEARNING & ARTIFICIAL INTELLIGENCE
using PYTHON, R & Weka

INTRODUCTION TO DATA SCIENCE:


 What is Data Science?
 Who is a Data Scientist, and who can become one?
 Real-time process of Data Science
 Data Science Applications
 Technologies used in Data Science
 Prerequisite knowledge for learning Data Science

INTRODUCTION TO MACHINE LEARNING:


 What is Machine Learning?
 How do machines learn, compared with human learning?
 Traditional Programming vs. machine learning
 Machine Learning engineer responsibilities
 Types of learning
 Supervised learning
 Un-supervised learning
 Machine learning algorithms: KNN, Naïve Bayes, Decision Trees,
Classification Rules, Regression (Linear Regression, Logistic Regression),
K-means Clustering, Association Rules, Support Vector Machine, Random
Forest.

PYTHON PROGRAMMING:
 What is Python? History of Python
 Python Features, Applications of Python
 Downloading and Installing Python
 Python IDE: Jupyter Notebook & Spyder
 What is Anaconda Navigator?
 Downloading and Installing Anaconda, Jupyter Notebook & Spyder
 Python vs. Other Programming Languages
 Interactive Mode Programming & Script Mode Programming
 Python Identifiers, Reserved Words
 Lines and Indentations, Quotations, Comments
 Assigning values to variables

DATAhill Solutions, Near Malabar Gold, KPHB, Hyderabad. Ph: 9292005440


 Operators - Arithmetic Operators, Comparison (Relational) Operators,
Assignment Operators, Logical Operators, Bitwise Operators, Membership
Operators, Identity Operators
 Decision Making and Loops
 Flavors in Python, Python Versions
 Data Types: int, float, complex, bool, str
 List, Tuple, Range, Bytes & Bytearray
 Set, Frozenset, Dict, None
 Inbuilt Functions in Python, Slice operator - Indexing
 Mutable vs. Immutable, Modules and Packages
 Database Connection - PyMySQL, Defining & Manipulating

NumPy with Python:


 NumPy Environment setup in Python, Features of NumPy
 Array Creation, Indexing & Slicing, Array Manipulation
 Mathematical Functions, Statistical Functions

Pandas with Python:


 Pandas Environment setup in Python
 Features of Pandas, Data Structures
 Series - Create Series, Accessing Data from Series with Position
 DataFrame - Features of DataFrame, Create DataFrame, DataFrame from
List, Dict, Row & Column Selecting, Adding & Deleting
 Panel - Create and select data from Panel (Panel was removed in pandas
0.25, so this applies to older pandas versions only)
 Indexing & Selecting Data, Statistical Functions
 Merging / Joining, Categorical Data

R PROGRAMMING:
 R Programming Introduction
 R vs. Other Programming Languages
 Downloading and Installing R, What is CRAN?
 R Programming IDE: RStudio, Downloading and Installing RStudio
 Variable Assignment - Displaying & Deleting Variables
 Comments – Single Line and Multi Line Comments
 Data Types – Logical, Integer, Double, Complex, Character
 Operators - Arithmetic Operators, Relational Operators, Logical Operators,
Assignment Operators, R as Calculator, Performing different Calculations
 Functions – Inbuilt Functions and User Defined Functions
 Data Structures – Vector, List, Matrix, Data Frame, Array, Factor
 Inbuilt Constants & Functions

Setting Environment:
 Search Packages in R Environment
 Search Packages in Machine with inbuilt function and manual searching
 Attach Packages to R Environment
 Install Add-on Packages from CRAN

 Detach Packages from R Environment
 Functions and Packages Help

Vectors:
 Vector Creation, Single Element Vector, Multiple Element Vector
 Vector Manipulation, Subsetting & Accessing the Data in Vectors

Lists:
 Creating a List, Naming List Elements, Accessing List Elements
 Manipulating List Elements, Merging Lists, Converting List to Vector

Matrix:
 Creating a Matrix, Accessing Elements of a Matrix
 Matrix Manipulations, Dimensions of Matrix, Transpose of Matrix

Data Frames:
 Create Data Frame, Vector to Data Frame
 Why are Characters Converted into Factors? – stringsAsFactors
 Convert the columns of a data frame to characters
 Extract Data from Data Frame
 Expand Data Frame, Column Bind and Row Bind
 Merging / Joining Data Frames – Inner Join, Outer Join & Cross Join

Arrays:
 Create Array with Multiple Dimensions, Naming Columns and Rows
 Accessing Array Elements, Manipulating Array Elements
 Calculations across Array Elements

Factors:
 Factors in Data Frame, Changing the Order of Levels
 Generating Factor Levels, Deleting Factor Levels

Loading and Reading Data:


 DATA EXTRACTION FROM CSV
 Getting and Setting the Working Directory
 Input as CSV File, Reading a CSV File
 Analyzing the CSV File, Writing into a CSV File
 DATA EXTRACTION FROM URL
 DATA EXTRACTION FROM CLIPBOARD
 DATA EXTRACTION FROM EXCEL
 Install “xlsx” Package
 Verify and Load the "xlsx" Package, Input as “xlsx” File
 Reading the Excel File, Writing the Excel File
 DATA EXTRACTION FROM DATABASES

 RMySQL Package, Connecting to MySQL
 Querying the Tables, Query with Filter Clause
 Updating Rows in the Tables, Inserting Data into the Tables
 Creating Tables in MySQL, Dropping Tables in MySQL
 Using dplyr and tidyr package

STATISTICS:
 Mean, Median and Mode
 Data Variability: Range, Quartiles, IQR, Calculating Percentiles
 Variance, Standard Deviation, Statistical Summaries
 Types of Distributions – Normal, Binomial, Poisson
 Probability Distributions, Skewness, Outliers
 Data Distribution, 68–95–99.7 rule (Empirical rule)
 Descriptive Statistics and Inferential Statistics
 Statistics Terms and Definitions, Types of Data
 Data Measurement Scales, Normalization
 Measure of Distance, Euclidean Distance
 Probability Calculation – Independent & Dependent
 Hypothesis Testing, Analysis of Variance
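
As a rough illustration of the descriptive statistics listed above, the sketch below uses only Python's standard `statistics` module. The sample values are hypothetical and chosen purely for illustration; they do not come from the course material.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)          # arithmetic mean
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
value_range = max(data) - min(data)   # spread between the extremes
pop_sd = statistics.pstdev(data)      # population standard deviation

# Quartiles (three cut points) and the interquartile range.
quartiles = statistics.quantiles(data, n=4)
iqr = quartiles[2] - quartiles[0]

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / pop_sd for x in data]

print(mean, median, mode, value_range, pop_sd)
```

Note that `statistics.quantiles` defaults to the "exclusive" method, so quartile values can differ slightly from those reported by other tools.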

DATA VISUALIZATION:
 Data Visualization with MatPlotLib and Seaborn
 Data Visualization with graphics and grDevices (R packages)
 High Level Plotting and Low Level Plotting
 Pie Charts - Title, Colors, Slice Percentages, Chart Legend
 3D Pie Charts
 Box Plots - Outliers, Ranges, IQR, Quantiles, Median, Data Distribution
Analysis, 68–95–99.7 rule (Empirical rule)
 Bar Charts - Label, Title, Colors, Group Bar, Stacked Bar Charts
 Histograms - Range of X and Y Values
 Line Graphs - Types: Points, Lines, Both, Overplotted, Steps
 Scatterplots
 Combining Plots - Par and Layout

LAZY LEARNING – CLASSIFICATION USING NEAREST NEIGHBORS:


 Understanding Classification Using Nearest Neighbors
 The KNN algorithm
 Calculating distance
 Choosing an appropriate k
 Preparing data for use with KNN
 Why is the KNN algorithm lazy?
 Diagnosing breast cancer with the KNN algorithm
 Collecting data
 Exploring and preparing the data
o Transformation – normalizing numeric data

o Data preparation – creating training and test datasets
 Training a model on the data
 Evaluating model performance
 Improving model performance
o Transformation – z-score standardization
o Testing alternative values of k
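
The nearest-neighbor topics above (distance calculation, normalization, choosing k) can be sketched in a few lines of plain Python. This is a minimal sketch, not the course's implementation: the feature values, the labels, and the helper names `fit_min_max`, `apply_min_max`, and `knn_predict` are all hypothetical, and it assumes no feature column is constant.

```python
import math
from collections import Counter

def fit_min_max(rows):
    """Return the per-column minimum and maximum of the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(row, lows, highs):
    """Rescale one row using the training minima and maxima."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature measurements on very different scales.
features = [(1.0, 200.0), (2.0, 180.0), (1.5, 220.0),
            (8.0, 900.0), (9.0, 950.0), (8.5, 870.0)]
labels = ["benign", "benign", "benign",
          "malignant", "malignant", "malignant"]

lows, highs = fit_min_max(features)
train_x = [apply_min_max(row, lows, highs) for row in features]
query = apply_min_max((7.5, 800.0), lows, highs)

prediction = knn_predict(train_x, labels, query, k=3)
print(prediction)
```

Without the rescaling step, the second feature would dominate the Euclidean distance simply because its values are larger.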

PROBABILISTIC LEARNING – CLASSIFICATION USING NAÏVE BAYES:
 Understanding Naïve-Bayes
 Basic concepts of Bayesian methods
 Probability
 Joint probability
 Conditional probability with Bayes’ theorem
 The Naïve Bayes Algorithm
 The Naïve Bayes classification
 The Laplace estimator
 Using numeric features with Naïve Bayes
 Filtering Mobile Phone Spam with the Naïve-Bayes Algorithm
 Collecting data
 Exploring and preparing the data
o Data preparation – processing text data for analysis
o Data preparation – creating training and test datasets
o Visualizing text data – word clouds
o Data preparation – creating indicator features for frequent words
 Training a model on the data
 Evaluating model performance
 Improving model performance
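
To make the Laplace estimator topic above concrete, here is a minimal sketch of the smoothed likelihood used in Naïve Bayes. The function name `laplace_likelihood` and the counts are hypothetical; the formula itself is the standard add-alpha estimate.

```python
def laplace_likelihood(count, class_total, n_values, alpha=1):
    """Estimate P(feature value | class) with add-alpha smoothing.

    count       -- times the feature value was seen in this class
    class_total -- number of training examples in this class
    n_values    -- number of distinct values the feature can take
    alpha       -- pseudo-count added to every value (1 = Laplace)
    """
    return (count + alpha) / (class_total + alpha * n_values)

# Hypothetical case: a word never appears in 20 spam messages.
# The feature is binary (word present / absent), so n_values = 2.
unsmoothed = laplace_likelihood(0, 20, 2, alpha=0)  # 0.0 wipes out the product
smoothed = laplace_likelihood(0, 20, 2)             # (0 + 1) / (20 + 2)

print(unsmoothed, smoothed)
```

Because Naïve Bayes multiplies the per-feature likelihoods, a single zero estimate would force the whole class probability to zero; the pseudo-count avoids that.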

DIVIDE AND CONQUER – CLASSIFICATION USING DECISION TREES AND RULES:
 Understanding decision trees
 Divide and conquer
 The C5.0 decision tree algorithm
o Choosing the best split
o Pruning the decision tree
 Identifying risky bank loans using C5.0 decision trees
 Collecting data
 Exploring and preparing the data
o Data preparation – creating random training and test datasets
 Training a model on the data
 Evaluating model performance
 Improving model performance
o Boosting the accuracy of decision trees
o Making some mistakes more costly than others

 Understanding classification rules
 Separate and conquer
 The one rule algorithm
 The RIPPER algorithm
 Rules from decision trees
 Identifying poisonous mushrooms with rule learners
 Collecting data
 Exploring and preparing data
 Training a model on the data
 Evaluating model performance
 Improving model performance
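
The "Choosing the best split" topic in the decision tree part of this section is commonly explained with entropy and information gain. The sketch below shows that calculation in plain Python; it is an illustration of the general idea rather than the exact criterion used inside C5.0, and the loan labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    total = len(parent)
    weighted = sum(len(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - weighted

# Hypothetical loan outcomes: 5 defaults and 5 repaid loans.
parent = ["default"] * 5 + ["repaid"] * 5

# Candidate split A separates the classes only partially.
split_a = [["default"] * 4 + ["repaid"], ["default"] + ["repaid"] * 4]
# Candidate split B separates the classes perfectly.
split_b = [["default"] * 5, ["repaid"] * 5]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(gain_a, gain_b)
```

The split with the larger gain (here split B) leaves purer partitions and would be preferred.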

FORECASTING NUMERIC DATA – REGRESSION METHODS:


 Understanding regression
 Simple linear regression
 Ordinary least squares estimation
 Correlations
 Multiple linear regression
 Predicting medical expenses using linear regression
 Collecting data
 Exploring and preparing data
o Exploring relationships among features – the correlation matrix
o Visualizing relationships among features – the scatter plot matrix
 Training a model on the data
 Evaluating model performance
 Improving model performance
o Model specification – adding non-linear relationships
o Transformation – converting a numeric variable to a binary indicator
o Model specification – adding interaction effects
o Putting it all together – an improved regression model
 Understanding regression trees and model trees
 Adding regression to trees
 Estimating the quality of wines with regression trees and model trees
 Collecting data
 Exploring and preparing the data
 Training a model on the data
o Visualizing decision trees
 Evaluating model performance
o Measuring performance with mean absolute error
 Improving model performance
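
Two of the regression topics above, ordinary least squares estimation and mean absolute error, can be sketched without any libraries. The helper names `ols_fit` and `mean_absolute_error` and the data points are hypothetical, used only to illustrate the formulas.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical data lying close to the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.2, 6.8, 9.1, 11.0]

intercept, slope = ols_fit(xs, ys)
predictions = [intercept + slope * x for x in xs]
mae = mean_absolute_error(ys, predictions)
print(intercept, slope, mae)
```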

FINDING PATTERNS – MARKET BASKET ANALYSIS USING ASSOCIATION RULES:
 Understanding Association Rules
 The Apriori algorithm for association rule learning
o Measuring rule interest – support and confidence
o Building a set of rules with the Apriori principle
 Identifying frequently purchased groceries with association rules
 Collecting data
 Exploring and preparing the data
o Data preparation – creating a sparse matrix for transaction data
o Visualizing item support – item frequency plots
o Visualizing transaction data – plotting the sparse matrix
 Training a model on the data
 Evaluating model performance
 Improving model performance
o Sorting the set of association rules
o Taking subsets of association rules
o Saving association rules to a file or data frame
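
The two rule-interest measures named above, support and confidence, reduce to simple counting. The following sketch uses a hypothetical list of grocery baskets; the helper names `support` and `confidence` are illustrative and not taken from any particular library.

```python
# Hypothetical grocery transactions, one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """How often the rule lhs -> rhs holds among transactions containing lhs."""
    return support(set(lhs) | set(rhs)) / support(lhs)

bread_support = support({"bread"})              # 4 of 5 baskets
rule_support = support({"bread", "milk"})       # 3 of 5 baskets
rule_confidence = confidence({"bread"}, {"milk"})
print(bread_support, rule_support, rule_confidence)
```

The Apriori algorithm uses minimum thresholds on these measures to prune the search: an itemset can only be frequent if all of its subsets are frequent.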

FINDING GROUPS OF DATA - CLUSTERING WITH K-MEANS:


 Understanding Clustering
 Clustering as a machine learning task
 The K-means algorithm for clustering
o Using distance to assign and update clusters
o Choosing the appropriate number of clusters
 Finding teen market segments using K-means clustering
 Collecting data
 Exploring and preparing the data
o Data preparation – dummy coding missing values
o Data preparation – imputing missing values
 Training a model on the data
 Evaluating model performance
 Improving model performance
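
The assign-and-update loop at the heart of K-means, mentioned above, can be sketched as follows. This is a minimal illustration under stated assumptions: the points and starting centroids are hypothetical, the number of iterations is fixed instead of checking for convergence, and the starting centroids are supplied by hand instead of being chosen at random.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and updating centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]

centroids, clusters = k_means(points, [(1.0, 1.0), (9.0, 9.0)])
print(centroids)
```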

EVALUATING MODEL PERFORMANCE:


 Measuring Performance for Classification
 Working with classification prediction data in R
 A closer look at confusion matrices
 Using confusion matrices to measure performance
 Beyond accuracy – other measures of performance
o The kappa statistic
o Sensitivity and specificity
o Precision and recall
o The F-measure

 Visualizing performance tradeoffs
o ROC curves
 Estimating future performance
 The holdout method
 Cross-validation
 Bootstrap sampling
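
The performance measures listed in this section (accuracy, the kappa statistic, sensitivity and specificity, precision and recall, the F-measure) can all be derived from the four cells of a binary confusion matrix. The sketch below assumes hypothetical counts; the function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common measures from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also called sensitivity
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }

# Hypothetical counts: 40 true positives, 10 false positives,
# 5 false negatives, 45 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Kappa adjusts accuracy for the agreement expected by chance, which is why it can be much lower than accuracy on imbalanced data.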

IMPROVING MODEL PERFORMANCE:


 Tuning Stock Models for Better Performance
 Using caret for automated parameter tuning
o Creating a simple tuned model
o Customizing the tuning process
 Improving Model Performance with Meta-Learning
 Understanding ensembles
 Bagging
 Boosting
 Random forests
o Training random forests
o Evaluating random forest performance

DEEP LEARNING:
 Installation of Theano, TensorFlow, Keras, OpenCV
 Relating Deep Learning and Traditional Machine Learning
 Basics of Neural Networks
 Artificial Neural Networks
 Deep Neural Networks
 Convolutional Neural Networks
 Recurrent Neural Networks
 Deep learning with Theano
 Deep Learning with TensorFlow
 Deep Learning with Keras
 Deep Learning with OpenCV
 Implementation of Deep learning

ARTIFICIAL INTELLIGENCE:
 AI Introduction
 AI Intelligent Systems
 AI Popular Search Algorithms
 AI Fuzzy Logic Systems
 AI Natural Language Processing
 AI Robotics
 AI Neural Networks

INTRODUCTION TO WEKA
 EXPLORE WEKA MACHINE LEARNING TOOLKIT

 Installation of WEKA
 Features of WEKA Toolkit
 Explore & Load data sets in Weka
 PERFORM DATA PREPROCESSING TASKS
 Apply Filters on data sets
 PERFORMING CLASSIFICATION ON DATA SETS
 J48 Classification Algorithm
 Decision Trees Algorithm
 K-NN Classification Algorithm
 Naïve Bayes Classification Algorithm
 Comparing Classification Results
 PERFORMING REGRESSION ON DATA SETS
 Simple Linear Regression Model, Multiple Linear Regression Model
 Logistic Regression Model, Cross-Validation and Percentage Split
 PERFORMING CLUSTERING ON DATA SETS
 Clustering Techniques in Weka
 Simple K-means Clustering Algorithm
 Association Rule Mining on Data Sets
 Apriori Association Rule Algorithm
 Discretization in the Rule Generation Process
 GRAPHICAL VISUALIZATION IN WEKA
 Visualization Features in Weka
 Visualize the data in various dimensions
 Plot Histogram, Derive Interesting Insights

 The trainer holds a Master of Technology in Computer Science &
Engineering from JNTU, is a Microsoft Certified Professional, and is
certified by IIT Kanpur & IIT Ropar.
 Has 10+ years of experience in software and training.
 His experience includes managing, processing, cleaning, and analyzing
large volumes of business data, and building predictions from it.
 Expertise in Data Science, Data Analytics, Machine Learning, Deep
Learning, Artificial Intelligence, Python, R, Weka, Data Management &
BI Technologies.
 Has publications and patents in various fields such as machine
learning, data security, and data science technologies.
 Professionally, he is a data science management consultant with 7+
years of experience in finance, retail, transport and other industries.

 Training materials are provided with lab exercises, data sets,
code, quizzes, and case studies on real data.
 Recorded video and live running notes are provided for every online
session.
 Real-time training with live scenarios and applications.
 Support in resume preparation and interview preparation.
 Mock interviews are conducted over Skype and telephone after course
completion.
 You can shift between weekday batches (morning or evening) and
weekend batches.
 Any number of batches can be attended in a year without any extra fee.
 Job support for 1 month after candidates are successfully placed.
 Online help with doubt clearance, career guidance, resume
preparation, and interview preparation.
