Python 06: Machine Learning
http://had.co.nz/stat645/model-vis.pdf
Dimensions and Features
In order to do machine learning you need a data set
containing instances (examples), each composed of
features; the features define the dimensions of the
feature space.
Target
1. Y ≡ Thickness of car tires after some testing period
Variables
1. X1 ≡ distance travelled in test
2. X2 ≡ time duration of test
3. X3 ≡ amount of chemical C in tires
The feature space is R3, or more precisely the positive orthant of R3, since all
of the X variables can only be positive quantities.
http://stats.stackexchange.com/questions/46425/what-is-feature-space
Mappings
Domain knowledge about tires might suggest that the speed the vehicle was
moving at is important, hence we generate another variable, X4 (this is the
feature extraction part):
ϕ(x1, x2, x3) = (x1, x2, x3, x1/x2)
This extends our old feature space into a new one, the positive part of R4.
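As a sketch of this mapping in code, assuming the instances are held in a NumPy array with one column per variable (the array name and values here are purely illustrative):

import numpy as np

# Hypothetical instances: columns are X1 (distance), X2 (duration), X3 (chemical C)
X = np.array([
    [1200.0, 10.0, 0.3],
    [ 900.0, 12.0, 0.4],
])

# Feature extraction: derive X4 = X1 / X2 (average speed) and append it,
# mapping the positive part of R^3 into the positive part of R^4
X4 = X[:, 0] / X[:, 1]
phi_X = np.column_stack([X, X4])
print(phi_X.shape)   # (2, 4)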
Your Task
Given a data set of N instances, build a model
that is fit to the data by extracting features and
dimensions. Then use that model to predict
outcomes … (a code sketch of this workflow
follows the list below)
1. Data Wrangling (normalization, standardization, imputing)
2. Feature Analysis/Extraction
3. Model Selection/Building
4. Model Evaluation
5. Operationalize Model
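A minimal sketch of steps 1–4 with scikit-learn, assuming a numeric feature matrix X and target vector y (the synthetic data and the choice of a Ridge model are illustrative, not part of the original workflow):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data; in practice X, y come from your wrangled data set
X, y = make_regression(n_samples=200, n_features=4, noise=0.5, random_state=42)

# 1. Data wrangling: impute missing values, standardize features
# 3. Model selection: a regularized linear model
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("regress", Ridge(alpha=1.0)),
])

# 4. Model evaluation on held-out data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print("RMSE:", rmse)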
A Tour of Machine Learning Algorithms
Models: Instance Methods
Compare instances in the data set with a
similarity measure to find the best matches.
- Suffers from the curse of dimensionality
- Focus on feature representation and on
similarity metrics between instances
Models: Regularization Methods
● Ridge Regression
● LASSO (Least Absolute Shrinkage & Selection Operator)
● Elastic Net
LASSO
• Penalizes the total absolute weight (L1 norm) of the parameters
• Can be interpreted as a prior distribution on the parameters (see the sketch below)
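A minimal LASSO sketch with scikit-learn (the alpha value and synthetic data are illustrative; a larger alpha drives more coefficients to exactly zero):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Illustrative data in which only a few features are informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=0.1, random_state=0)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# The L1 penalty zeroes out uninformative coefficients (built-in feature selection)
print(np.count_nonzero(lasso.coef_), "of", lasso.coef_.size, "coefficients nonzero")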
Models: Bayesian
Explicitly apply Bayes’ Theorem for
classification and regression tasks, usually by
fitting a probability function constructed via the
chain rule and a naive simplification of Bayes’
Theorem.
● Naive Bayes
● Averaged One-Dependence Estimators (AODE)
● Bayesian Belief Network (BBN)
Naive Bayes
Probability distribution:
P(y | x1, …, xn) ∝ P(y) · P(x1 | y) · P(x2 | y) · … · P(xn | y)
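A minimal Gaussian Naive Bayes sketch with scikit-learn (the bundled iris data set is an illustrative stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fits a per-class Gaussian for each P(x_i | y) and class priors P(y)
clf = GaussianNB().fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))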
Models: Clustering
● k-Means
● Affinity Propagation
● OPTICS (Ordering Points To Identify the Clustering Structure)
● Agglomerative Clustering
K-means clustering
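A minimal k-means sketch with scikit-learn (the blob data and the choice of k=3 are illustrative):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: three synthetic clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=1)

# Alternates between assigning points to the nearest centroid
# and recomputing centroids until assignments stabilize
km = KMeans(n_clusters=3, random_state=1).fit(X)
print(km.cluster_centers_)
print(km.labels_[:10])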
Models: Artificial Neural Networks
Inspired by biological neural networks, ANNs are
nonlinear function approximators that estimate
functions with a large number of inputs.
- A system of interconnected neurons that activate in response to input
- Deep learning extends simple networks by stacking them recursively (see the sketch below)
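A minimal feed-forward network sketch using scikit-learn's MLPClassifier (the layer sizes and the digits data set are illustrative; dedicated deep learning frameworks go well beyond this):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of interconnected neurons with nonlinear activations
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("accuracy:", net.score(X_test, y_test))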
Models: Ensemble Methods
● Boosting
● Bootstrapped Aggregation (Bagging)
● AdaBoost
● Stacked Generalization (Blending)
● Gradient Boosting Machines (GBM)
● Random Forest
AdaBoost
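A minimal AdaBoost sketch with scikit-learn (the synthetic data and ensemble size are illustrative; by default shallow decision trees are used as the weak learners):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Sequentially fits weak learners, upweighting previously misclassified instances
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
print("CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())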
Models: Other
The preceding list is not comprehensive; other
algorithm and model classes include:
● Conditional Random Fields (CRF)
● Markovian Models (HMMs)
● Dimensionality Reduction (PCA, PLS)
● Rule Learning (Apriori, Brill)
● More ...
What is Scikit-Learn?
Extensions to SciPy (Scientific Python) are
called SciKits. SciKit-Learn provides machine
learning algorithms.
● Algorithms for supervised & unsupervised learning
● Built on SciPy and Numpy
● Standard, consistent Python API
● Sits on top of C libraries (LAPACK, LibSVM) via Cython
● Open source: BSD license
- Scikit-Learn Tutorial
class Estimator(object):

    def fit(self, X, y=None):
        """Fits estimator to data."""
        # set state of ``self``
        return self

    def predict(self, X):
        """Predict response of ``X``."""
        # compute predictions ``pred``
        return pred
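Usage follows this same fit/predict pattern for every scikit-learn estimator; for example (the estimator and data set choices here are illustrative):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()      # instantiate with hyperparameters
model.fit(X_train, y_train)     # fit: learn model state from training data
pred = model.predict(X_test)    # predict: apply the fitted model to new data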
Basic methodology
Wrapping fit and predict
We’ve already discussed a broad workflow; the
following is a development workflow:
Raw Data → Load & Transform Data → Feature Extraction → Feature Evaluation → Build Model → Evaluate Model
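One possible rendering of this loop in code, using a bundled data set as a stand-in for raw data (the scaler, feature selector, and model are illustrative choices):

from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load & transform data (bundled data set stands in for raw data)
X, y = load_wine(return_X_y=True)

# Feature extraction & evaluation, then build the model
workflow = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=8)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Evaluate the model with cross-validation
print("CV accuracy:", cross_val_score(workflow, X, y, cv=5).mean())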
Task 6
- Select a data set (wines / student performance)
- Apply various learning algorithms to the problem
- Provide the best possible prediction w.r.t. RMSE (a starting sketch follows below)
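As a starting point, one way to compare several algorithms by RMSE (the synthetic data is a placeholder for the wines or student performance features; cross-validated RMSE guards against an optimistic single split):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Placeholder data; substitute your chosen data set's features and target
X, y = make_regression(n_samples=300, n_features=8, noise=1.0, random_state=7)

for model in (Ridge(), Lasso(), RandomForestRegressor(random_state=7)):
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    rmse = np.sqrt(-scores.mean())
    print(type(model).__name__, "RMSE:", round(rmse, 3))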