Python Presentation 3

The document discusses various third-party packages for numerical and scientific computing in Python, focusing on the SciPy stack, which includes NumPy, SciPy, Matplotlib, IPython, Pandas, and SymPy. It highlights the features of NumPy, such as its powerful n-dimensional array object, various data types, and array creation methods, as well as its efficient mathematical operations. The document also touches on linear algebra capabilities and the differences between NumPy arrays and standard Python sequences.

NUMERIC AND SCIENTIFIC APPLICATIONS
As you might expect, there are a number of third-party packages available for
numerical and scientific computing that extend Python’s basic math module.
These include:
• NumPy/SciPy – numerical and scientific function libraries.
• Numba – Python compiler that supports JIT compilation.
• ALGLIB – numerical analysis library.
• Pandas – high-performance data structures and data analysis tools.
• PyGSL – Python interface for GNU Scientific Library.
• ScientificPython – collection of scientific computing modules.
SCIPY AND FRIENDS
By far, the most commonly used packages are those in the SciPy stack.
We will focus on these in this class. These packages include:
• NumPy
• SciPy
• Matplotlib – plotting library.
• IPython – interactive computing.
• Pandas – data analysis library.
• SymPy – symbolic computation library.
NUMPY
Let’s start with NumPy. Among other things, NumPy contains:
• A powerful N-dimensional array object.
• Sophisticated (broadcasting/universal) functions.
• Tools for integrating C/C++ and Fortran code.
• Useful linear algebra, Fourier transform, and random number
capabilities.
Besides its obvious scientific uses, NumPy can also be used as an
efficient multi-dimensional container of generic data.
NUMPY
The key to NumPy is the ndarray object, an n-dimensional array of
homogeneous data types, with many operations being performed in
compiled code for performance. There are several important differences
between NumPy arrays and the standard Python sequences:
• NumPy arrays have a fixed size. Modifying the size means creating a
new array.
• All elements of a NumPy array must have the same data type, but this
type can be the generic Python object type.
• More efficient mathematical operations than built-in sequence types.
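The differences listed above can be seen in a short script. The following is a minimal sketch; the variable names are illustrative only.

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# Lists grow in place; np.append returns a new array and leaves arr untouched.
py_list.append(4)
bigger = np.append(arr, 4)

# Mixing ints and floats gives one common dtype for all elements.
mixed = np.array([1, 2.5, 3])

# Arithmetic on arrays is element-wise; on lists, * repeats the sequence.
doubled_arr = arr * 2          # array([2, 4, 6])
doubled_list = [1, 2, 3] * 2   # [1, 2, 3, 1, 2, 3]
```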
NUMPY DATATYPES
To begin, NumPy supports a wider variety of data types than are built-in to the
Python language by default. They are defined by the numpy.dtype class and
include:
• intc (same as a C integer) and intp (used for indexing)
• int8, int16, int32, int64
• uint8, uint16, uint32, uint64
• float16, float32, float64
• complex64, complex128
• bool_, int_, float_, complex_ are shorthand for defaults.
These can be used as functions to cast literals or sequence types, as well as
arguments to numpy functions that accept the dtype keyword argument.
NUMPY DATATYPES
Some examples:
>>> import numpy as np
>>> x = np.float32(1.0)
>>> x
1.0
>>> y = np.int_([1,2,4])
>>> y
array([1, 2, 4])
>>> z = np.arange(3, dtype=np.uint8)
>>> z
array([0, 1, 2], dtype=uint8)
>>> z.dtype
dtype('uint8')
NUMPY ARRAYS
There are a couple of mechanisms for creating arrays in NumPy:
• Conversion from other Python structures (e.g., lists, tuples).
• Built-in NumPy array creation (e.g., arange, ones, zeros, etc.).
• Reading arrays from disk, either from standard or custom formats (e.g.
reading in from a CSV file).
• and others …
NUMPY ARRAYS
In general, any numerical data that is stored in an array-like container can
be converted to an ndarray through use of the array() function. The most
obvious examples are sequence types like lists and tuples.

>>> x = np.array([2,3,1,0])
>>> x
array([2, 3, 1, 0])
>>> x = np.array([[1,2.0],[0,0],(1+1j,3.)])
>>> x
array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]])
NUMPY ARRAYS
There are a couple of built-in NumPy functions which will create arrays
from scratch.
• zeros(shape) -- creates an array filled with 0 values with the specified
shape. The default dtype is float64.
>>> np.zeros((2, 3))
array([[ 0., 0., 0.], [ 0., 0., 0.]])
• ones(shape) -- creates an array filled with 1 values.
• arange() -- creates arrays with regularly incrementing values.
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 10, dtype=float)
array([ 2., 3., 4., 5., 6., 7., 8., 9.])
>>> np.arange(2, 3, 0.1)
array([ 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
NUMPY ARRAYS
• linspace() -- creates arrays with a specified number of elements, and
spaced equally between the specified beginning and end values.
>>> np.linspace(1., 4., 6)
array([ 1. , 1.6, 2.2, 2.8, 3.4, 4. ])
• random.random(shape) -- creates arrays with random floats over the
interval [0,1).
>>> np.random.random((2,3))
array([[ 0.75688597, 0.41759916, 0.35007419],
[ 0.77164187, 0.05869089, 0.98792864]])
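Because the values above change on every run, it can help to seed the generator when a repeatable result is needed. A minimal sketch, assuming NumPy 1.17 or later, where np.random.default_rng is available; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)         # seeded generator (seed is arbitrary)
sample = rng.random((2, 3))             # floats in the half-open interval [0, 1)

rng_again = np.random.default_rng(42)
same_sample = rng_again.random((2, 3))  # same seed, so the same values
```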
NUMPY ARRAYS
Printing an array can be done with the print function.
>>> import numpy as np
>>> a = np.arange(3)
>>> print(a)
[0 1 2]
>>> a
array([0, 1, 2])
>>> b = np.arange(9).reshape(3,3)
>>> print(b)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
>>> c = np.arange(8).reshape(2,2,2)
>>> print(c)
[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]
INDEXING
Single-dimension indexing is accomplished as usual.
>>> x = np.arange(10)
>>> x[2]
2
>>> x[-2]
8
Multi-dimensional arrays support multi-dimensional indexing.
>>> x.shape = (2,5) # now x is 2-dimensional
>>> x[1,3]
8
>>> x[1,-1]
9
INDEXING
Using fewer dimensions to index will result in a subarray.
>>> x[0]
array([0, 1, 2, 3, 4])

This means that x[i, j] == x[i][j] but the second method is less
efficient.
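A quick check of the equivalence above, as a minimal sketch. x[i][j] first builds the intermediate row x[i] and then indexes it, which is where the extra cost comes from.

```python
import numpy as np

x = np.arange(10).reshape(2, 5)

direct = x[1, 3]     # one indexing operation
chained = x[1][3]    # creates the temporary row x[1], then indexes it

# Indexing with fewer dimensions returns a view, not a copy:
row = x[0]
row[0] = 99          # also changes x[0, 0]
```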
INDEXING
Slicing is possible just as it is for typical Python sequences.
>>> x = np.arange(10)
>>> x[2:5]
array([2, 3, 4])
>>> x[:-7]
array([0, 1, 2])
>>> x[1:7:2]
array([1, 3, 5])
>>> y = np.arange(35).reshape(5,7)
>>> y[1:5:2,::3]
array([[ 7, 10, 13], [21, 24, 27]])
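The two-dimensional slice above picks rows 1 and 3 (1:5:2) and columns 0, 3 and 6 (::3). A minimal sketch that spells the same selection out with explicit index lists:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

sliced = y[1:5:2, ::3]

# The same selection written with explicit row and column lists.
rows = [1, 3]
cols = [0, 3, 6]
explicit = y[np.ix_(rows, cols)]
```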
ARRAY OPERATIONS
Basic operations apply element-wise. The result is a new array with the
resultant elements.
Operations like *= and += will modify the existing array.
>>> a = np.arange(5)
>>> b = np.arange(5)
>>> a+b
array([0, 2, 4, 6, 8])
>>> a-b
array([0, 0, 0, 0, 0])
>>> a**2
array([ 0, 1, 4, 9, 16])
>>> a>3
array([False, False, False, False, True], dtype=bool)
>>> 10*np.sin(a)
array([ 0., 8.41470985, 9.09297427, 1.41120008, -7.56802495])
>>> a*b
array([ 0, 1, 4, 9, 16])
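The difference between creating a new array and modifying one in place, as a minimal sketch. The name alias is only there to show that both names refer to the same array.

```python
import numpy as np

a = np.arange(5)
alias = a            # second name for the same array object

c = a + 1            # new array; a is unchanged at this point
a += 1               # in place; alias sees the change

# In-place operations keep the array's dtype, so adding a float to an
# integer array in place is rejected rather than silently truncated.
try:
    a += 0.5
    inplace_float_ok = True
except TypeError:
    inplace_float_ok = False
```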
ARRAY OPERATIONS
Since multiplication is done element-wise, you need to specifically
perform a dot product to perform matrix multiplication.
>>> a = np.zeros(4).reshape(2,2)
>>> a
array([[ 0., 0.],
       [ 0., 0.]])
>>> a[0,0] = 1
>>> a[1,1] = 1
>>> b = np.arange(4).reshape(2,2)
>>> b
array([[0, 1],
       [2, 3]])
>>> a*b
array([[ 0., 0.],
       [ 0., 3.]])
>>> np.dot(a,b)
array([[ 0., 1.],
       [ 2., 3.]])
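In Python 3.5 and later, with NumPy 1.10 or later, the @ operator is an alternative spelling of the matrix product. A minimal sketch using the same values for a and b as above:

```python
import numpy as np

a = np.eye(2)                    # same values as the a built above
b = np.arange(4).reshape(2, 2)

elementwise = a * b              # [[0., 0.], [0., 3.]]
matrix_product = a @ b           # same result as np.dot(a, b)
```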
ARRAY OPERATIONS
There are also some built-in methods of ndarray objects.
Universal functions which may also be applied include exp, sqrt, add, sin,
cos, etc…
>>> a = np.random.random((2,3))
>>> a
array([[ 0.68166391, 0.98943098, 0.69361582],
       [ 0.78888081, 0.62197125, 0.40517936]])
>>> a.sum()
4.1807421388722164
>>> a.min()
0.4051793610379143
>>> a.max(axis=0)
array([ 0.78888081, 0.98943098, 0.69361582])
>>> a.min(axis=1)
array([ 0.68166391, 0.40517936])
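The axis argument is easier to follow with fixed values. A minimal sketch: axis=0 reduces down each column and axis=1 reduces across each row.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

total = a.sum()                  # 15
col_max = a.max(axis=0)          # one value per column: [3, 4, 5]
row_min = a.min(axis=1)          # one value per row:    [0, 3]

# Universal functions apply element-wise and return a new array.
roots = np.sqrt(a)
```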
ARRAY OPERATIONS
An array shape can be manipulated by a number of methods.
resize(size) will modify an array in place.
reshape(size) will return a new array with the new shape and leave the
original array's shape unchanged; where possible the result is a view on
the same data rather than a copy.
>>> a = np.floor(10*np.random.random((3,4)))
>>> print(a)
[[ 9. 8. 7. 9.]
 [ 7. 5. 9. 7.]
 [ 8. 2. 7. 5.]]
>>> a.shape
(3, 4)
>>> a.ravel()
array([ 9., 8., 7., 9., 7., 5., 9., 7., 8., 2., 7., 5.])
>>> a.shape = (6,2)
>>> print(a)
[[ 9. 8.]
 [ 7. 9.]
 [ 7. 5.]
 [ 9. 7.]
 [ 8. 2.]
 [ 7. 5.]]
>>> a.transpose()
array([[ 9., 7., 7., 9., 8., 7.],
       [ 8., 9., 5., 7., 2., 5.]])
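A minimal sketch of the difference between reshape and resize. Note that reshape returns a view on the same data whenever it can, so writing through the reshaped array also changes the original.

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)        # new array object; a keeps its shape (12,)

b[0, 0] = 100              # b shares its data with a here
shares = np.shares_memory(a, b)

c = np.arange(12)
c.resize((2, 6))           # changes c itself and returns None

flat = b.ravel()           # flattened, one-dimensional version of b
```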
LINEAR ALGEBRA
One of the most common reasons for using the NumPy package is its linear
algebra module.
>>> from numpy import *
>>> from numpy.linalg import *
>>> a = array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[ 1. 2.]
 [ 3. 4.]]
>>> a.transpose()
array([[ 1., 3.],
       [ 2., 4.]])
>>> inv(a) # inverse
array([[-2. , 1. ],
       [ 1.5, -0.5]])
LINEAR ALGEBRA
>>> u = eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[ 1., 0.],
[ 0., 1.]])
>>> j = array([[0.0, -1.0], [1.0, 0.0]])
>>> dot(j, j) # matrix product
array([[-1., 0.],
[ 0., -1.]])
>>> trace(u) # trace
2.0
>>> y = array([[5.], [7.]])
>>> solve(a, y) # solve linear matrix equation
array([[-3.],
[ 4.]])
>>> eig(j) # get eigenvalues/eigenvectors of matrix
(array([ 0.+1.j, 0.-1.j]),
array([[ 0.70710678+0.j, 0.70710678+0.j],
[ 0.00000000-0.70710678j, 0.00000000+0.70710678j]]))
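The same operations can be written without the star imports, which keeps it clear where each function comes from. A minimal sketch that also checks the results by substituting them back:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])

a_inv = np.linalg.inv(a)
x = np.linalg.solve(a, y)      # solves a @ x = y

identity_ok = np.allclose(a @ a_inv, np.eye(2))
solution_ok = np.allclose(a @ x, y)
```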
MATRICES
There is also a matrix class which inherits from the ndarray class.
There are some slight differences but matrices are very similar to general
arrays.
In NumPy’s own words, the question of whether to use arrays or matrices
comes down to the short answer of “use arrays”.
>>> A = matrix('1.0 2.0; 3.0 4.0')
>>> A
[[ 1. 2.]
 [ 3. 4.]]
>>> type(A)
<class 'numpy.matrixlib.defmatrix.matrix'>
>>> A.T # transpose
[[ 1. 3.]
 [ 2. 4.]]
>>> X = matrix('5.0 7.0')
>>> Y = X.T
>>> print(A*Y) # matrix multiplication
[[19.]
 [43.]]
>>> print(A.I) # inverse
[[-2. 1. ]
 [ 1.5 -0.5]]
>>> solve(A, Y) # solving linear
MATLAB STYLE FOR DEFINING MATRIX
>>> B = np.matrix("1,2; 3,4; 5,6")
>>> B
matrix([[ 1, 2],
        [ 3, 4],
        [ 5, 6]])
DIFFERENCE BETWEEN NUMPY DOT() AND INNER()
>>> a=np.array([[1,2],[3,4]])
>>> b=np.array([[11,12],[13,14]])
>>> np.dot(a,b)
array([[37, 40],
[85, 92]])
>>> np.inner(a,b)
array([[35, 41],
[81, 95]])
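With dot(): each entry combines a row of a with a column of b, e.g. the
top-left entry is 1*11 + 2*13 = 37.
With inner(): each entry combines a row of a with a row of b, e.g. the
top-left entry is 1*11 + 2*12 = 35.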
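For two-dimensional arrays, inner(a, b) gives the same result as dot(a, b.T). A minimal sketch that checks this for the arrays used above:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

dot_ab = np.dot(a, b)        # rows of a with columns of b
inner_ab = np.inner(a, b)    # rows of a with rows of b

inner_matches_dot_transpose = np.array_equal(inner_ab, np.dot(a, b.T))
```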
SOLVING EQUATIONS
>>> import numpy as np
>>> from numpy.linalg import solve
>>> A = np.array([[4,5],[6,-3]])
>>> A
array([[4, 5],
[6, -3]])
>>> b = np.array([23, 3])
>>> x = solve(A,b)
>>> x
array([ 2., 3.])
EIGENVALUES & EIGENVECTORS
eig returns a tuple of two arrays: the first holds the eigenvalues
and the second is a matrix whose columns are the corresponding
eigenvectors.

>>> import numpy as np
>>> from numpy.linalg import eig
>>> A = np.array([[1,2],[3,4]])
>>> eig(A)
(array([-0.37228132, 5.37228132]), array([[-0.82456484, -0.41597356],
[ 0.56576746, -0.90937671]]))
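Each eigenpair can be checked against the definition A v = λ v. A minimal sketch, using the same A as above:

```python
import numpy as np
from numpy.linalg import eig

A = np.array([[1, 2], [3, 4]])
values, vectors = eig(A)

# Column i of vectors is the eigenvector that belongs to values[i].
checks = [
    np.allclose(A @ vectors[:, i], values[i] * vectors[:, i])
    for i in range(len(values))
]
```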
NUMPY DOCS
There is a very nice table of NumPy equivalent operations for MATLAB
users. However, even if you do not know MATLAB, this is a pretty handy
overview of NumPy functionality.

There is also a pretty comprehensive list of example usage of all the
NumPy functions here.
SCIPY
Now we move on to SciPy. In its own words:
SciPy is a collection of mathematical algorithms and
convenience functions built on the Numpy extension of
Python. It adds significant power to the interactive
Python session by providing the user with high-level
commands and classes for manipulating and
visualizing data. With SciPy an interactive Python
session becomes a data-processing and
system-prototyping environment rivaling systems such as
MATLAB, IDL, Octave, R-Lab, and SciLab.
Basically, SciPy contains various tools and functions for solving common
problems in scientific computing.
SCIPY
SciPy’s functionality is implemented in a number of specific sub-modules. These
include:
Special mathematical functions (scipy.special) -- airy, elliptic, bessel, etc.
Integration (scipy.integrate)
Optimization (scipy.optimize)
Interpolation (scipy.interpolate)
Fourier Transforms (scipy.fftpack)
Signal Processing (scipy.signal)
Linear Algebra (scipy.linalg)
Compressed Sparse Graph Routines (scipy.sparse.csgraph)
Spatial data structures and algorithms (scipy.spatial)
Statistics (scipy.stats)
Multidimensional image processing (scipy.ndimage)
Data IO (scipy.io)
Weave (scipy.weave)
and more!
SCIPY
We can’t possibly tour all of the SciPy library and, even if we did, it might
be a little boring. So let’s just look at some example modules with SciPy
to see how it can be used in a Python program.

Let’s start with a simple little integration example.
Say we wanted to compute a definite integral, such as the integral of
sin(x) from x = 0 to x = π.
Obviously, the first place we should look is scipy.integrate!
SCIPY.INTEGRATE
Methods for Integrating Functions given a function object:
quad -- General purpose integration.
dblquad -- General purpose double integration.
tplquad -- General purpose triple integration.
fixed_quad -- Integrate func(x) using Gaussian quadrature of order n.
quadrature -- Integrate with given tolerance using Gaussian quadrature.
romberg -- Integrate func using Romberg integration.
Methods for Integrating Functions given a fixed set of samples:
trapz -- Use trapezoidal rule to compute integral from samples.
cumtrapz -- Use trapezoidal rule to cumulatively compute integral.
simps -- Use Simpson's rule to compute integral from samples.
romb -- Use Romberg Integration to compute integral from (2**k + 1) evenly-spaced
samples.
SCIPY.INTEGRATE

>>> import numpy as np
>>> import scipy.integrate
>>> result = scipy.integrate.quad(np.sin, 0, np.pi)
>>> print(result)
(2.0, 2.220446049250313e-14) # 2 with a very small error margin!
>>> result = scipy.integrate.quad(np.sin, -np.inf, +np.inf)
>>> print(result)
(0.0, 0.0) # Integral does not converge
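quad also accepts any Python callable, not only NumPy functions, and it can handle infinite limits when the integral converges. A minimal sketch, assuming SciPy is installed; the integrand name gaussian is made up for this example. The exact value of the integral of exp(-x²) over the whole real line is √π.

```python
import math

import numpy as np
import scipy.integrate


def gaussian(x):
    """Integrand exp(-x^2); hypothetical helper for this example."""
    return math.exp(-x ** 2)


# quad returns the estimate and an estimate of the absolute error.
value, error_estimate = scipy.integrate.quad(gaussian, -np.inf, np.inf)
exact = math.sqrt(math.pi)
```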
SCIPY.INTEGRATE

Let’s say that we don’t have a function object, we only have some (x,y) samples that “define” our function.
We can estimate the integral using the trapezoidal rule. (In recent SciPy releases the trapz function is named trapezoid.)

>>> sample_x = np.linspace(0, np.pi, 1000)
>>> sample_y = np.sin(sample_x) # Creating 1,000 samples
>>> result = scipy.integrate.trapz(sample_y, sample_x)
>>> print(result)
1.99999835177 # rounded to 12 significant digits
>>> sample_x = np.linspace(0, np.pi, 1000000)
>>> sample_y = np.sin(sample_x) # Creating 1,000,000 samples
>>> result = scipy.integrate.trapz(sample_y, sample_x)
>>> print(result)
2.0 # rounded to 12 significant digits
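The trapezoidal rule itself is short enough to write by hand, which shows what is being computed: each pair of neighbouring samples contributes the area of one trapezoid. A minimal sketch using only NumPy; the helper name trapezoid_rule is made up for this example.

```python
import numpy as np


def trapezoid_rule(y, x):
    """Sum the trapezoid areas between consecutive (x, y) samples."""
    widths = np.diff(x)
    mean_heights = (y[1:] + y[:-1]) / 2.0
    return float(np.sum(widths * mean_heights))


sample_x = np.linspace(0, np.pi, 1000)
sample_y = np.sin(sample_x)
estimate = trapezoid_rule(sample_y, sample_x)
```

With 1,000 samples the estimate falls slightly below the exact value of 2, because every trapezoid lies under the concave sine curve on this interval.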
PLOTTING
Before we can look at some more sophisticated examples, we need to get
some plotting under our belt.

We’ll start the next lecture by introducing the matplotlib plotting package
and see how we can build more complex scientific applications.
PLOT
import matplotlib.pyplot as plt

xs = [1,2,3,4,5]
ys = [x**2 for x in xs]

plt.plot(xs, ys)

no return value?

• We are operating on a “hidden” variable representing the figure.
• This is a terrible, terrible trick.
• Its only purpose is to pander to MATLAB users.
• I’ll show you how this works in the next lecture.
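The “hidden” figure can be made explicit with matplotlib's object-oriented interface, in which the figure and the axes are ordinary variables. A minimal sketch, assuming matplotlib is installed; the Agg backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; no window needed
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

fig, ax = plt.subplots()         # explicit Figure and Axes objects
lines = ax.plot(xs, ys)          # returns a list of Line2D objects
ax.set_xlabel("x")
ax.set_ylabel("x squared")
plt.close(fig)                   # release the figure when done
```

Calling fig.savefig with a file name before closing the figure would write the plot to disk, in the same way as plt.savefig.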
import matplotlib.pyplot as plt

xs = range(-100,100,10)
x2 = [x**2 for x in xs]
negx2 = [-x**2 for x in xs]

plt.plot(xs, x2)
plt.plot(xs, negx2)
plt.xlabel("x")          # incrementally modify the figure
plt.ylabel("y")
plt.ylim(-2000, 2000)
plt.axhline(0) # horiz line
plt.axvline(0) # vert line
plt.savefig("quad.png")  # save your figure to a file
plt.show()               # show it on the screen
from pylab import *
labels = ["Baseline", "System"]
data = [3.75, 4.75]
yerror = [0.3497, 0.3108]
xerror = [0.2, 0.2]
xlocations = array(range(len(data)))+0.5
width = 0.5
csize = 10
ec = 'r'
bar(xlocations, data, yerr=yerror, width=width,
xerr=xerror, capsize=csize, ecolor=ec)
yticks(range(0, 8))
xticks(xlocations+ width/2, labels)
xlim(0, xlocations[-1]+width*2)
title("Average Ratings on the Training Set")
savefig('bar')
HISTOGRAMS
hist(x, bins=n)
Computes and draws a histogram
x: a sequence of numbers (usually with many repetitions)
If keyword argument bins is an integer, it’s the number of (equally
spaced) bins
   • Default is 10

from pylab import *
import numpy
x = numpy.random.normal(2, 0.5, 1000)
hist(x, bins=50)
savefig('my_hist')
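The same kind of binning can be inspected without drawing anything by calling numpy.histogram, which returns the counts and the bin edges. A minimal sketch, assuming NumPy 1.17 or later for default_rng; the seed is arbitrary and only there to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2, 0.5, 1000)     # mean 2, standard deviation 0.5

# 50 equally spaced bins need 51 edges.
counts, edges = np.histogram(x, bins=50)
bin_widths = np.diff(edges)
```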
SCATTER PLOTS
scatter(x, y )
x and y are arrays of numbers of the same length, N
Makes a scatter plot of x vs. y
from pylab import *
N = 20
x = 0.9*rand(N)
y = 0.9*rand(N)
scatter(x,y)
savefig('scatter_dem')
Keyword Arguments
s
If an integer, size of marks in points^2, i.e., area occupied (default 20)
If an array of length N, gives the sizes of the corresponding elements of
x, y
marker
Symbol marking the points (default = 'o')
Same options as for lines that don’t connect the dots
   • And a few more
c
A single color or a length-N sequence of colors
from pylab import *
N = 30
x = 0.9*rand(N)
y = 0.9*rand(N)
area = pi * (10 * rand(N))**2 # 0 to 10 point radius
scatter(x,y,s=area, marker='^', c='r')
savefig('scatter_demo')
MULTIPLE SUBPLOTS
Functions axes() and subplot() are both used to create axes
   • subplot() is used more often
subplot(numRows, numCols, plotNum)
   • Creates axes in a regular grid of axes numRows by numCols
   • plotNum becomes the current subplot
   • Subplots are numbered left-to-right along each row, rows top-down
   • 1 is the 1st number
Commas can be omitted if all numbers are single digit
Subsequent subplot() calls must be consistent on the numbers of
rows and columns
subplot() returns a Subplot instance
   • Subplot is derived from Axes
Can call subplot() and axes() in a subplot
   • E.g., have multiple axes in 1 subplot
See the Tutorial for subplots sharing axis tick labels


from pylab import *
subplot(221)
plot([1,2,3])
subplot(222)
plot([2,3,4])
subplot(223)
plot([1,2,3], 'r--')
subplot(224)
plot([2,3,4], 'r--')
subplot(221)
plot([3,2,1], 'r--')
savefig('subplot')
