Machine Learning With Python

The document provides an overview of Machine Learning with Python, covering topics such as motivation, Python basics, and various machine learning techniques. It discusses Python's history, its libraries, and essential programming concepts including data types, control structures, and sequence types. The document emphasizes the importance of data in modern applications and the role of Python as a versatile programming language in the field of machine learning.

Uploaded by

Manish Sharma
Copyright
© All Rights Reserved

Machine Learning with Python

By
Dr. Manish Sharma
Topics to be covered…..

• Motivation
• Python
• Introduction to Machine Learning
• Supervised Learning
• Unsupervised Learning
• Python libraries for Machine Learning
Motivation
• Data is the new fuel.
• Every application is using the data to get the
actual insight.
• Technologies like big data, Cloud computing,
IoT, smart infrastructures, 5G communication.
• Existing statistical analytics not capable of
exploring the details.
Brief History of Python
• Invented in the Netherlands, early 90s by Guido
van Rossum
• Named after Monty Python
• Open sourced from the beginning
• Considered a scripting language, but is much more
• Scalable, object oriented and functional from the
beginning
• Used by Google from the beginning
• Increasingly popular
Python’s Benevolent Dictator For Life

“Python is an experiment in
how much freedom
programmers need. Too
much freedom and nobody
can read another's code;
too little and expressiveness
is endangered.”
- Guido van Rossum
http://docs.python.org/
The Python tutorial is good!
Python IDEs
• Jupyter Notebook
• Jupyter Lab
• Spyder
• PyCharm
• Pydev
• Thonny
• ANACONDA(Distribution)
Running Python
Installing
• Python is pre-installed on most Unix systems,
including Linux and Mac OS X
• The pre-installed version may not be the most
recent one
• Download from http://python.org/download/
• Python comes with a large library of standard
modules
• There are several options for an IDE
– IDLE – works well with Windows
– Emacs with python-mode or your favorite text editor
– Eclipse with Pydev (http://pydev.sourceforge.net/)
IDLE Development Environment
• IDLE is an Integrated DeveLopment Environment for
Python, typically used on Windows
• Multi-window text editor with syntax highlighting,
auto-completion, smart indent, and more.
• Python shell with syntax highlighting.
• Integrated debugger with stepping, persistent
breakpoints, and call stack visibility
Python Scripts
• When you run a Python program from the command
line, the interpreter executes each statement in the
file
• Familiar mechanisms are used to provide command
line arguments and/or redirect input and output
• Python also has mechanisms to allow a python
program to act both as a script and as a module to
be imported and used by another python program
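A minimal sketch of the script-or-module mechanism described above. The file name greet.py and the functions greet and main are hypothetical names chosen for illustration.

```python
import sys

def greet(name):
    """Return a greeting; other programs can import and call this."""
    return "Hello, " + name

def main(argv):
    # argv[0] is the script name; the remaining items are command-line arguments.
    name = argv[1] if len(argv) > 1 else "World"
    print(greet(name))
    return 0

# This condition is true only when the file is run as a script
# (for example: python greet.py Ada), not when it is imported as a module.
if __name__ == "__main__":
    main(sys.argv)
```

Another program could then write `import greet` and call `greet.greet("Ada")` without triggering the command-line behaviour.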
The Basics
A Code Sample (in IDLE)
x = 34 - 23            # A comment.
y = "Hello"            # Another one.
z = 3.45
if z == 3.45 or y == "Hello":
    x = x + 1
    y = y + " World"   # String concat.
print(x)
print(y)
Enough to Understand the Code
• Indentation matters to code meaning
– Block structure indicated by indentation
• First assignment to a variable creates it
– Variable types don't need to be declared.
– Python figures out the variable types on its own.
• Assignment is = and comparison is ==
• For numbers + - * / % are as expected
– Special use of + for string concatenation and % for
string formatting (as in C's printf)
• Logical operators are words (and, or,
not) not symbols
• The basic printing function is print()
(print was a statement in Python 2)
Basic Datatypes
• Integers (default for whole numbers)
z = 5 // 2 # Answer 2, integer (floor) division; in Python 3, 5 / 2 gives 2.5
• Floats
x = 3.456
• Strings
– Can use "" or '' to specify, with "abc" == 'abc'
– An unmatched quote of the other kind can occur within the string:
"matt's"
– Use triple double-quotes for multi-line strings or
strings that contain both ' and " inside of them:
"""a'b"c"""
Whitespace
Whitespace is meaningful in Python: especially
indentation and placement of newlines
Use a newline to end a line of code
Use \ when a statement must continue on the next line
No braces {} to mark blocks of code, use
consistent indentation instead
• First line with less indentation is outside of the block
• First line with more indentation starts a nested block
A colon starts a new block in many constructs,
e.g. function definitions and if/else clauses
Comments
• Start comments with #, rest of line is ignored
• Can include a "documentation string" as the
first line of a new function or class you define
• Development environments, debuggers, and
other tools use it: it's good style to include one
def fact(n):
    """fact(n) assumes n is a positive
    integer and returns factorial of n."""
    assert n > 0
    return 1 if n == 1 else n * fact(n - 1)
Assignment
• Binding a variable in Python means setting a name to
hold a reference to some object
– Assignment creates references, not copies
• Names in Python do not have an intrinsic type;
objects have types
– Python determines the type of the reference automatically
based on what data is assigned to it
• You create a name the first time it appears on the left
side of an assignment statement:
x = 3
• An object is reclaimed by garbage collection once
no names refer to it any longer
• Python uses reference semantics (more later)
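A small sketch of what "references, not copies" means in practice. The variable names are arbitrary.

```python
a = [1, 2, 3]
b = a            # b refers to the same list object as a; no copy is made
b.append(4)      # the change is visible through both names

c = list(a)      # list() builds a new, independent list
c.append(5)      # only c changes

print(a)         # [1, 2, 3, 4]
print(b is a)    # True: two names, one object
print(c is a)    # False: c is a separate object
```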
Naming Rules
• Names are case sensitive and cannot start
with a number. They can contain letters,
numbers, and underscores.
bob Bob _bob _2_bob_ bob_2 BoB
• There are some reserved words (Python 3 list;
print and exec were reserved only in Python 2):
False, None, True, and, as, assert, async,
await, break, class, continue, def, del,
elif, else, except, finally, for, from,
global, if, import, in, is, lambda,
nonlocal, not, or, pass, raise, return,
try, while, with, yield
Naming conventions
The Python community has these recommended
naming conventions
joined_lower for functions, methods, and
attributes
joined_lower or ALL_CAPS for constants
StudlyCaps for classes
camelCase only to conform to pre-existing
conventions
Attributes: interface, _internal, __private
Assignment
• You can assign to multiple names at the
same time
>>> x, y = 2, 3
>>> x
2
>>> y
3
This makes it easy to swap values
>>> x, y = y, x
• Assignments can be chained
>>> a = b = x = 2
Accessing Non-Existent Name
Accessing a name before it’s been properly
created (by placing it on the left side of an
assignment), raises an error
>>> y
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in -toplevel-
    y
NameError: name 'y' is not defined
>>> y = 3
>>> y
3
Sequence types:
Tuples, Lists, and
Strings
Sequence Types
1. Tuple: ('john', 32, [CMSC])
• A simple immutable ordered sequence of
items
• Items can be of mixed types, including
collection types
2. Strings: "John Smith"
• Immutable
• Conceptually very much like a tuple
3. List: [1, 2, 'john', ('up', 'down')]
• Mutable ordered sequence of items of
mixed types
Similar Syntax
• All three sequence types (tuples,
strings, and lists) share much of the
same syntax and functionality.
• Key difference:
– Tuples and strings are immutable
– Lists are mutable
• The operations shown in this section
can be applied to all sequence types
– most examples will just show the
operation performed on one
Sequence Types 1
• Define tuples using parentheses and commas
>>> tu = (23, 'abc', 4.56, (2,3), 'def')
• Define lists using square brackets and
commas
>>> li = ["abc", 34, 4.34, 23]
• Define strings using quotes (", ', or """).
>>> st = "Hello World"
>>> st = 'Hello World'
>>> st = """This is a multi-line
string that uses triple quotes."""
Sequence Types 2
• Access individual members of a tuple, list, or
string using square bracket "array" notation
• Note that all are 0 based…
>>> tu = (23, 'abc', 4.56, (2,3), 'def')
>>> tu[1] # Second item in the tuple.
'abc'
>>> li = ["abc", 34, 4.34, 23]
>>> li[1] # Second item in the list.
34
>>> st = "Hello World"
>>> st[1] # Second character in string.
'e'
Positive and negative indices
>>> t = (23, 'abc', 4.56, (2,3), 'def')
Positive index: count from the left, starting with 0
>>> t[1]
'abc'
Negative index: count from the right, starting with -1
>>> t[-3]
4.56
The 'in' Operator
• Boolean test whether a value is inside a container:
>>> t = [1, 2, 4, 5]
>>> 3 in t
False
>>> 4 in t
True
>>> 4 not in t
False
• For strings, tests for substrings
>>> a = 'abcde'
>>> 'c' in a
True
>>> 'cd' in a
True
>>> 'ac' in a
False
• Be careful: the in keyword is also used in the syntax
of for loops and list comprehensions
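To make the three uses of in concrete, here is a short sketch. The variable names are illustrative only.

```python
t = [1, 2, 4, 5]

# Membership test: here "in" is an operator that yields True or False.
has_four = 4 in t

# Loop syntax: here "in" names the sequence being iterated over.
doubled = []
for x in t:
    doubled.append(2 * x)

# List comprehension: a loop with an optional filter, written as one expression.
evens = [x for x in t if x % 2 == 0]

print(has_four, doubled, evens)
```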
The + Operator
The + operator produces a new tuple, list, or
string whose value is the concatenation of its
arguments.
>>> (1, 2, 3) + (4, 5, 6)
(1, 2, 3, 4, 5, 6)
>>> [1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]
>>> "Hello" + " " + "World"
'Hello World'
The * Operator
• The * operator produces a new tuple, list, or
string that "repeats" the original content.
>>> (1, 2, 3) * 3
(1, 2, 3, 1, 2, 3, 1, 2, 3)
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> "Hello" * 3
'HelloHelloHello'
Mutability:
Tuples vs. Lists
Lists are mutable
>>> li = ['abc', 23, 4.34, 23]
>>> li[1] = 45
>>> li
['abc', 45, 4.34, 23]
• We can change lists in place.
• Name li still points to the same memory
reference when we're done.
Tuples are immutable
>>> t = (23, 'abc', 4.56, (2,3), 'def')
>>> t[2] = 3.14
Traceback (most recent call last):
  File "<pyshell#75>", line 1, in -toplevel-
    t[2] = 3.14
TypeError: 'tuple' object does not support item assignment
• You can't change a tuple.
• You can make a fresh tuple and assign its
reference to a previously used name.
>>> t = (23, 'abc', 3.14, (2,3), 'def')
• Because tuples are immutable, they can be slightly
cheaper to create and store than lists.
Operations on Lists Only
>>> li = [1, 11, 3, 4, 5]
>>> li.append('a') # Note the method syntax
>>> li
[1, 11, 3, 4, 5, 'a']
>>> li.insert(2, 'i')
>>> li
[1, 11, 'i', 3, 4, 5, 'a']
The extend method vs +
• + creates a fresh list with a new memory ref
• extend operates on list li in place.
>>> li.extend([9, 8, 7])
>>> li
[1, 11, 'i', 3, 4, 5, 'a', 9, 8, 7]
• Potentially confusing:
– extend takes a list as an argument.
– append takes a single element as an argument.
>>> li.append([10, 11, 12])
>>> li
[1, 11, 'i', 3, 4, 5, 'a', 9, 8, 7, [10, 11, 12]]
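The difference between in-place modification and building a fresh list can be checked with the is operator. A minimal sketch, with arbitrary names:

```python
li = [1, 2]
same = li                # a second name for the same list object

li.extend([3, 4])        # in place: the object 'same' refers to changes too
assert same is li

new = li + [5]           # + builds a fresh list; li itself is unchanged
li.append([6, 7])        # append adds ONE element, which here is itself a list

print(li)    # [1, 2, 3, 4, [6, 7]]
print(new)   # [1, 2, 3, 4, 5]
```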
Operations on Lists Only
Lists have many methods, including index, count,
remove, reverse, sort
>>> li = ['a', 'b', 'c', 'b']
>>> li.index('b') # index of 1st occurrence
1
>>> li.count('b') # number of occurrences
2
>>> li.remove('b') # remove 1st occurrence
>>> li
['a', 'c', 'b']
Operations on Lists Only
>>> li = [5, 2, 6, 8]
>>> li.reverse() # reverse the list *in place*
>>> li
[8, 6, 2, 5]
>>> li.sort() # sort the list *in place*
>>> li
[2, 5, 6, 8]
>>> li.sort(key=some_function)
# sort in place using a user-defined key function
# (Python 2 also accepted a comparison function: li.sort(some_function))
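In Python 3, sort() takes its ordering rule through the key= argument. A two-argument comparison function in the Python 2 style can still be used by wrapping it with functools.cmp_to_key. The sketch below shows both; the word list and the function reverse_alpha are made up for illustration.

```python
from functools import cmp_to_key

words = ["pear", "fig", "banana"]

# key= : sort by a value computed from each item (here, its length).
words.sort(key=len)
by_length = list(words)          # ['fig', 'pear', 'banana']

# Old-style comparison function: a negative result puts a first,
# a positive result puts b first, and zero means "equal".
def reverse_alpha(a, b):
    return (a < b) - (a > b)

words.sort(key=cmp_to_key(reverse_alpha))
print(by_length, words)          # words is now in reverse alphabetical order
```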
Tuple details
• The comma is the tuple creation operator, not parens
>>> 1,
(1,)
• Python shows parens for clarity (best practice)
>>> (1,)
(1,)
• Don't forget the comma!
>>> (1)
1
• Trailing comma is required only for singleton tuples, not for others
• Empty tuples have a special syntactic form
>>> ()
()
>>> tuple()
()
Summary: Tuples vs. Lists
• Lists are slower but more powerful than tuples
– Lists can be modified, and they have lots of
handy operations and methods
– Tuples are immutable and have fewer
features
• To convert between tuples and lists use the
list() and tuple() functions:
li = list(tu)
tu = tuple(li)
Dictionaries
• Hash tables, "associative arrays"
– d = {"duck": "eend", "water": "water"}
• Lookup:
– d["duck"] -> "eend"
– d["back"] # raises KeyError exception
• Delete, insert, overwrite:
– del d["water"] # {"duck": "eend"}
– d["back"] = "rug" # {"duck": "eend", "back": "rug"}
– d["duck"] = "duik" # {"duck": "duik", "back": "rug"}
Conditional Branching
• if and else
if variable == condition:
    # do something based on v == c
else:
    # do something based on v != c
• elif allows for additional branching
if condition:
    # ...
elif another_condition:
    # ...
else:
    # none of the above
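A runnable version of the if / elif / else pattern. The function name describe and the temperature thresholds are arbitrary choices for illustration.

```python
def describe(temperature_c):
    """Classify a temperature in degrees Celsius."""
    if temperature_c < 0:
        return "freezing"
    elif temperature_c < 20:      # checked only if the first test failed
        return "cool"
    else:                         # none of the above
        return "warm"

for reading in (-5, 10, 25):
    print(reading, describe(reading))
```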
Looping with For
• For allows you to loop over a block of
code a set number of times
• For is great for manipulating lists:
a = ['cat', 'window', 'defenestrate']
for x in a:
    print(x, len(x))
Results:
cat 3
window 6
defenestrate 12
Looping with For
• We could use a for loop to perform
geoprocessing tasks on each layer in a list
• We could get a list of features in a feature class
and loop over each, checking attributes
• Any sequence, such as a list, can be used in a
for loop
• Just be sure not to modify the list while looping
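One common way to respect the "do not modify the list while looping" rule is to loop over a copy, or to build a new list instead. A sketch, reusing the word list from the earlier slide:

```python
words = ['cat', 'window', 'defenestrate']

# Inserting into 'words' while iterating over it directly could loop forever.
# Iterating over a copy made with list() is safe.
for w in list(words):
    if len(w) > 6:
        words.insert(0, w)

# Often simpler: leave the original alone and build a new list.
short = [w for w in words if len(w) <= 6]

print(words)   # ['defenestrate', 'cat', 'window', 'defenestrate']
print(short)   # ['cat', 'window']
```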
URLs
 http://www.python.org
• official site
 http://starship.python.net
• Community
 http://www.python.org/psa/bookstore/
• (alias for http://www.amk.ca/bookstore/)
• Python Bookstore
What is Machine Learning?
• The capability of Artificial Intelligence systems
to learn by extracting patterns from data is
known as Machine Learning.
• Machine Learning is an idea to learn from
examples and experience, without being
explicitly programmed. Instead of writing code,
you feed data to the generic algorithm, and it
builds logic based on the data given.
Introduction to Machine Learning
• Python is a popular platform for research
and for developing production systems. It is
a large language with a number of modules,
packages, and libraries that provide multiple
ways of achieving a task.
• Python and its libraries such as NumPy, Pandas,
SciPy, Scikit-Learn, and Matplotlib are used in data
science and data analysis. They are also
extensively used for creating scalable machine
learning algorithms.
• Python libraries implement popular machine learning
techniques such as Classification, Regression,
Recommendation, and Clustering.
• Python offers ready-made frameworks for
performing data mining tasks on large
volumes of data effectively in less time.
What is Machine Learning?

• Data science, machine learning and artificial
intelligence are some of the top trending
topics in the tech world today. Data mining
and Bayesian analysis are also trending, which
adds to the demand for machine learning.
• Machine Learning
– Study of algorithms that improve their
performance at some task with experience
• Machine learning is a discipline that deals with programming
the systems so as to make them automatically learn and
improve with experience.
• Here, learning implies recognizing and understanding the input
data and taking informed decisions based on the supplied data.
• It is very difficult to consider all the decisions based on all
possible inputs. To solve this problem, algorithms are
developed that build knowledge from a specific data and past
experience by applying the principles of statistical science,
probability, logic, mathematical optimization, reinforcement
learning, and control theory
ML
Machine Learning (ML) is automated learning
with little or no human intervention. It involves
programming computers so that they learn from
the available inputs. The main purpose of
machine learning is to explore and construct
algorithms that can learn from the previous data
and make predictions on new input data.
Growth of Machine Learning
• Machine learning is the preferred approach to
– Speech recognition, Natural language processing
– Computer vision
– Medical outcomes analysis
– Robot control
– Computational biology
• This trend is accelerating
– Improved machine learning algorithms
– Improved data capture, networking, faster computers
– Software too complex to write by hand
– New sensors / IO devices
– Demand for self-customization to user, environment
– It turns out to be difficult to extract knowledge from human experts
(the failure of expert systems in the 1980s).
Applications of Machine Learning Algorithms
• The developed machine learning algorithms are used in various
applications such as:
– Web search
– Computational biology
– Finance
– E-commerce
– Space exploration
– Robotics
– Information extraction
– Social networks
– Debugging
– Data mining
– Expert systems
– Vision processing
– Language processing
– Forecasting things like stock market trends, weather
– Pattern recognition
– Games
– [Your favorite area]
Benefits of Machine Learning

• Powerful Processing
• Better Decision Making & Prediction
• Quicker Processing
• Accurate
• Affordable Data Management
• Inexpensive
• Analyzing Complex Big Data
Steps Involved in Machine Learning

• A machine learning project involves the
following steps:
– Defining a Problem
– Preparing Data
– Evaluating Algorithms
– Improving Results
– Presenting Results
Magic?
No, more like gardening

• Seeds = Algorithms
• Nutrients = Data
• Gardener = You
• Plants = Programs
So what is machine learning…

• Automating automation
• Getting computers to program themselves
• Writing software is the bottleneck
• Let the data do the work instead!
Machine Learning Techniques
Given below are some techniques in this Machine
Learning tutorial.
•Classification
•Categorization
•Clustering
•Trend analysis
•Anomaly detection
•Visualization
•Decision making
ML in a Nutshell
• Machine Learning is a sub-set of Artificial
Intelligence where computer algorithms are used to
autonomously learn from data and
information. Machine learning computers can
change and improve their algorithms all by
themselves.
• Tens of thousands of machine learning algorithms
• Every machine learning algorithm has three
components:
– Representation
– Evaluation
– Optimization
Representation
• Decision trees
• Sets of rules / Logic programs
• Instances
• Graphical models
• Neural networks
• Support vector machines (SVM)
• Model ensembles
etc………
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• Etc.
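As an illustration of the first two measures in this list, the sketch below computes accuracy, precision, and recall for a binary classifier in plain Python. The function name evaluate and the label lists are made up for the example.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)                  # share of all predictions that are right
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are real
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real positives, how many were found
    return accuracy, precision, recall

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up predictions
print(evaluate(y_true, y_pred))     # (0.75, 0.75, 0.75)
```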
Optimization
• Combinatorial optimization
– E.g.: Greedy search
• Convex optimization
– E.g.: Gradient descent
• Constrained optimization
– E.g.: Linear programming
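The three components can be seen together in a very small example: a one-parameter linear model (representation), mean squared error (evaluation), and gradient descent (optimization). This is only a sketch; the data, learning rate, and step count are arbitrary assumptions.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]        # made-up data generated by y = 2x

def predict(w, x):
    # Representation: a linear model with a single weight w.
    return w * x

def mse(w):
    # Evaluation: mean squared error over the training data.
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w):
    # Derivative of mse(w) with respect to w.
    return sum(2 * (predict(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)

# Optimization: gradient descent, stepping against the gradient.
w = 0.0
learning_rate = 0.05
for step in range(100):
    w -= learning_rate * gradient(w)

print(w, mse(w))                  # w ends up very close to 2
```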
Features of Machine Learning
Let us look at some of the features of Machine
Learning.
•Machine Learning is computing-intensive and
generally requires a large amount of training data.
•It involves repetitive training to improve the
learning and decision making of algorithms.
•As more data gets added, Machine Learning
training can be automated for learning new data
patterns and adapting its algorithm.
Machine Learning Algorithms

• Machine Learning can learn from labeled data
(known as supervised learning) or unlabelled
data (known as unsupervised learning).
• Machine Learning algorithms involving
unlabelled data, or unsupervised learning, are
often harder to evaluate than those with labeled
data, because there are no known correct answers
to compare against.
• Machine Learning algorithms can be used to
make decisions in subjective areas as well.
Examples
• Logistic Regression can be used to predict which
party will win at the ballots.
• Naïve Bayes algorithm can separate valid emails
from spam.
• Face detection: Identify faces in images (or indicate
if a face is present).
• Email filtering: Classify emails into spam and not-
spam.
• Medical diagnosis: Diagnose a patient as a sufferer
or non-sufferer of some disease.
• Weather prediction: Predict, for instance, whether
or not it will rain tomorrow.
Concepts of Learning
• Learning is the process of converting
experience into expertise or knowledge.
• Learning can be broadly classified into three
categories, as mentioned below, based on the
nature of the learning data and interaction
between the learner and the environment.
• Supervised Learning
• Unsupervised Learning
• Semi-supervised learning
Types of Learning
• Supervised (inductive) learning
– Training data includes desired outputs
• Unsupervised learning
– Training data does not include desired outputs
• Semi-supervised learning
– Training data includes a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
• Similarly, there are four categories of machine
learning algorithms as shown below:
• Supervised learning algorithm
• Unsupervised learning algorithm
• Semi-supervised learning algorithm
• Reinforcement learning algorithm
Supervised Learning
• A majority of practical machine learning uses
supervised learning.
• In supervised learning, the system tries to learn
from the previous examples that are given. (On the
other hand, in unsupervised learning, the system
attempts to find the patterns directly from the
example given.)
• Speaking mathematically, supervised learning is
where you have both input variables (X) and output
variables (Y) and can use an algorithm to derive the
mapping function from the input to the output.
• The mapping function is expressed as Y = f(X).
• When an algorithm learns from example data
and associated target responses, which can consist
of numeric values or string labels such as classes
or tags, in order to later predict the correct
response when posed with new examples, it falls
under the category of supervised learning.
• This approach is indeed similar to human learning
under the supervision of a teacher. The teacher
provides good examples for the student to
memorize, and the student then derives general
rules from these specific examples.
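One of the simplest ways to learn a mapping Y = f(X) from labeled examples is a nearest-neighbour rule: predict the label of the closest training example. The sketch below uses that technique as an illustration only; the function name and the (height, weight) data are invented.

```python
def nearest_neighbour(train_X, train_y, x):
    """Return the label of the training example closest to x."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    return train_y[best]

# Made-up training data: (height in cm, weight in kg) -> label
train_X = [(20, 4), (25, 5), (60, 30), (70, 35)]
train_y = ["cat", "cat", "dog", "dog"]

print(nearest_neighbour(train_X, train_y, (22, 4)))    # cat
print(nearest_neighbour(train_X, train_y, (65, 32)))   # dog
```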
K. Anvesh, Dept. of IT
Categories of Supervised learning
• Supervised learning problems can be further
divided into two parts, namely classification, and
regression.
• Classification: A classification problem is when
the output variable is a category or a group, such
as “black” or “white”, or “spam” or “not spam”.
• Regression: A regression problem is when the
output variable is a real value, such as an amount
in rupees or a height.
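For the regression case, a straight line can be fitted to numeric data with the closed-form least-squares formulas. This is a sketch with invented height and weight values; fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

heights = [150.0, 160.0, 170.0, 180.0]   # cm, made-up
weights = [50.0, 58.0, 66.0, 74.0]       # kg, made-up (lies exactly on a line)

slope, intercept = fit_line(heights, weights)
predicted = slope * 175.0 + intercept    # real-valued output for a new input
print(slope, intercept, predicted)
```

The output is a number on a continuous scale, in contrast to the category labels a classifier returns.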
Unsupervised Learning
• In unsupervised learning, the algorithms are left
to themselves to discover interesting structures
in the data.
• Mathematically, unsupervised learning is when
you only have input data (X) and no
corresponding output variables.
• This is called unsupervised learning because
unlike supervised learning above, there are no
given correct answers and the machine itself
finds the answers.
• Unsupervised learning is used to detect anomalies,
outliers, such as fraud or defective equipment, or
to group customers with similar behaviours for a
sales campaign. It is the opposite of supervised
learning. There is no labelled data here.
• When learning data contains only some indications
without any description or labels, it is up to the
coder or to the algorithm to find the structure of
the underlying data, to discover hidden patterns,
or to determine how to describe the data. This
kind of learning data is called unlabeled data.
Categories of Unsupervised learning
• Unsupervised learning problems can be further
divided into association and clustering
problems.
• Association: An association rule learning problem
is where you want to discover rules that describe
large portions of your data, such as “people that
buy X also tend to buy Y”.
• Clustering: A clustering problem is where you
want to discover the inherent groupings in the
data, such as grouping customers by purchasing
behaviour.
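Clustering can be illustrated with a tiny one-dimensional k-means loop: assign each value to its nearest centre, then move each centre to the mean of its group. The spending figures and starting centres below are made up, and kmeans_1d is a hypothetical helper, not a library function.

```python
def kmeans_1d(values, centres, iterations=10):
    """Group numbers around the given starting centres (1-D k-means)."""
    groups = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest centre.
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its old centre).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

spend = [10, 12, 11, 95, 100, 105]       # made-up customer spending
centres, groups = kmeans_1d(spend, centres=[0.0, 50.0])
print(centres)   # [11.0, 100.0]
print(groups)    # [[10, 12, 11], [95, 100, 105]]
```

No labels were supplied: the two groups of customers emerge from the data alone.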
Reinforcement Learning
• A computer program will interact with a dynamic
environment in which it must perform a
particular goal (such as playing a game with an
opponent or driving a car). The program is
provided feedback in terms of rewards and
punishments as it navigates its problem space.
• Using this algorithm, the machine is trained to
make specific decisions. It works this way: the
machine is exposed to an environment where it
continuously trains itself using trial and error
method.
• Here learning data gives feedback so that the
system adjusts to dynamic conditions in order
to achieve a certain objective. The system
evaluates its performance based on the
feedback responses and reacts accordingly.
The best known instances include self-driving
cars and the Go-playing program AlphaGo.
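The trial-and-error idea can be sketched with a two-action "bandit" problem: the program repeatedly picks an action, receives a reward of 1 or 0, and gradually favours the action that pays off more often. The epsilon-greedy rule used here is one simple strategy among many; the reward probabilities, step count, and function name are assumptions for illustration.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning of which action gives the most reward."""
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    counts = [0] * len(reward_probs)       # how often each action was tried
    values = [0.0] * len(reward_probs)     # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that looks best so far.
            action = max(range(len(values)), key=lambda i: values[i])
        # Feedback from the environment: reward 1 (success) or 0 (failure).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return counts, values

counts, values = run_bandit([0.2, 0.8])
print(counts, values)   # the second action should be tried far more often
```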
Semi-supervised learning
• If some learning samples are labeled, but some others
are not labelled, then it is semi-supervised learning. It
makes use of a large amount of unlabeled data together
with a small amount of labelled data for
training. Semi-supervised learning is applied in cases
where it is expensive to acquire a fully labelled dataset
while more practical to label a small subset.
• For example, it often requires skilled experts to
label certain remote sensing images, and lots of field
experiments to locate oil at a particular location, while
acquiring unlabeled data is relatively easy.
• Here an incomplete training signal is given: a
training set with some (often many) of the
target outputs missing. There is a special case
of this principle known as Transduction where
the entire set of problem instances is known
at learning time, except that part of the
targets are missing.
Categorizing on the basis of required Output
Another categorization of machine learning tasks arises
when one considers the desired output of a machine-
learned system:
•Classification : When inputs are divided into two or more
classes, and the learner must produce a model that assigns
unseen inputs to one or more (multi-label classification) of
these classes. This is typically tackled in a supervised way.
Spam filtering is an example of classification, where the
inputs are email (or other) messages and the classes are
“spam” and “not spam”.
•Regression : Also a supervised problem; the case
where the outputs are continuous rather than discrete.
•Clustering : When a set of inputs is to be divided into
groups. Unlike in classification, the groups are not known
beforehand, making this typically an unsupervised task.
Libraries and Packages
• To understand machine learning, you need to have basic
knowledge of Python programming. In addition, there are a
number of libraries and packages generally used in
performing various machine learning tasks, as listed below:
– numpy – provides N-dimensional array objects
– pandas – a data analysis library that includes dataframes
– matplotlib – a 2D plotting library for creating graphs and plots
– scikit-learn – provides algorithms for data analysis and data mining
tasks
– seaborn – a data visualization library based on matplotlib
