Python Presentation 3
APPLICATIONS
As you might expect, there are a number of third-party packages available for
numerical and scientific computing that extend Python’s basic math module.
These include:
• NumPy/SciPy – numerical and scientific function libraries.
• Numba – Python compiler that supports JIT compilation.
• ALGLIB – numerical analysis library.
• Pandas – high-performance data structures and data analysis tools.
• PyGSL – Python interface for GNU Scientific Library.
• ScientificPython – collection of scientific computing modules.
SCIPY AND FRIENDS
By far, the most commonly used packages are those in the SciPy stack.
We will focus on these in this class. These packages include:
• NumPy
• SciPy
• Matplotlib – plotting library.
• IPython – interactive computing.
• Pandas – data analysis library.
• SymPy – symbolic computation library.
NUMPY
Let’s start with NumPy. Among other things, NumPy contains:
• A powerful N-dimensional array object.
• Sophisticated (broadcasting/universal) functions.
• Tools for integrating C/C++ and Fortran code.
• Useful linear algebra, Fourier transform, and random number
capabilities.
Besides its obvious scientific uses, NumPy can also be used as an
efficient multi-dimensional container of generic data.
NUMPY
The key to NumPy is the ndarray object, an n-dimensional array of
homogeneous data types, with many operations being performed in
compiled code for performance. There are several important differences
between NumPy arrays and the standard Python sequences:
• NumPy arrays have a fixed size. Modifying the size means creating a
new array.
• All elements of a NumPy array must be of the same data type, but this
can include Python objects.
• More efficient mathematical operations than built-in sequence types.
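The differences listed above can be seen in a few lines. This is a minimal sketch; the variable names are illustrative only and not part of the original slides.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# Lists concatenate with +, while arrays add element-wise.
list_sum = values + values      # [1, 2, 3, 1, 2, 3]
array_sum = arr + arr           # array([2, 4, 6])

# Mixed numeric input is converted to one common dtype (float64 here).
mixed = np.array([1, 2.5, 3])

# "Growing" an array builds a new one; the original keeps its size.
longer = np.append(arr, 4)
print(arr.size, longer.size)    # 3 4
```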
NUMPY DATATYPES
To begin, NumPy supports a wider variety of data types than are built-in to the
Python language by default. They are defined by the numpy.dtype class and
include:
• intc (same as a C integer) and intp (used for indexing)
• int8, int16, int32, int64
• uint8, uint16, uint32, uint64
• float16, float32, float64
• complex64, complex128
• bool_, int_, float_, complex_ are shorthand for defaults (float_ and
complex_ were removed in NumPy 2.0; use float64 and complex128 there).
These can be used as functions to cast literals or sequence types, as well as
arguments to numpy functions that accept the dtype keyword argument.
NUMPY DATATYPES
Some examples:
>>> import numpy as np
>>> x = np.float32(1.0)
>>> x
1.0
>>> y = np.int_([1,2,4])
>>> y
array([1, 2, 4])
>>> z = np.arange(3, dtype=np.uint8)
>>> z
array([0, 1, 2], dtype=uint8)
>>> z.dtype
dtype('uint8')
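As a further sketch of the same idea, the dtype can also be changed after creation with astype, and itemsize shows how many bytes each element occupies. The variable names here are illustrative assumptions.

```python
import numpy as np

# Cast an existing array to another dtype with astype (returns a new array).
ints = np.array([1, 2, 4])
floats = ints.astype(np.float32)

# Pass dtype= to a creation function to choose the element type up front.
small = np.zeros(3, dtype=np.uint8)

# itemsize reports the number of bytes used per element.
print(floats.dtype, floats.itemsize)   # float32 4
print(small.dtype, small.itemsize)     # uint8 1
```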
NUMPY ARRAYS
There are several mechanisms for creating arrays in NumPy:
• Conversion from other Python structures (e.g., lists, tuples).
• Built-in NumPy array creation (e.g., arange, ones, zeros, etc.).
• Reading arrays from disk, either from standard or custom formats (e.g.
reading in from a CSV file).
• and others …
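A short sketch of the first three mechanisms follows. The CSV text is a hypothetical stand-in for a file on disk; np.loadtxt also accepts a file path.

```python
import io
import numpy as np

from_list = np.array([[1, 2], [3, 4]])      # conversion from nested lists
from_tuple = np.array((1.5, 2.5))           # conversion from a tuple
ones = np.ones((2, 3))                      # built-in creation function
steps = np.arange(0, 10, 2)                 # like range(), but returns an array

# Hypothetical CSV content held in memory instead of a real file.
csv_text = io.StringIO("1,2,3\n4,5,6")
from_csv = np.loadtxt(csv_text, delimiter=",")
print(from_csv.shape)                       # (2, 3)
```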
NUMPY ARRAYS
In general, any numerical data that is stored in an array-like container can
be converted to an ndarray through use of the array() function. The most
obvious examples are sequence types like lists and tuples.
>>> x = np.array([2,3,1,0])
>>> x
array([2, 3, 1, 0])
>>> x = np.array([[1,2.0],[0,0],(1+1j,3.)])
>>> x
array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]])
NUMPY ARRAYS
There are a couple of built-in NumPy functions which will create arrays
from scratch.
• zeros(shape) -- creates an array filled with 0 values with the specified
shape. The default dtype is float64.
>>> np.zeros((2, 3))
array([[ 0., 0., 0.], [ 0., 0., 0.]])
INDEXING
Single-dimension indexing is accomplished as usual.
>>> x = np.arange(10)
>>> x[2]
2
>>> x[-2]
8
Multi-dimensional arrays support multi-dimensional indexing.
>>> x.shape = (2,5) # now x is 2-dimensional
>>> x[1,3]
8
>>> x[1,-1]
9
INDEXING
Using fewer dimensions to index will result in a subarray.
>>> x[0]
array([0, 1, 2, 3, 4])
This means that x[i, j] == x[i][j] but the second method is less
efficient.
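The sketch below illustrates why the two forms agree. It relies on a standard NumPy behaviour not stated on the slide: x[i] creates an intermediate array object that is a view of row i, and that extra step is what makes x[i][j] slower.

```python
import numpy as np

x = np.arange(10).reshape(2, 5)

# Both forms select the same element.
assert x[1, 3] == x[1][3] == 8

# x[0] is a view of the first row, not a copy: writing through it changes x.
row = x[0]
row[0] = 99
print(x[0, 0])   # 99
```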
INDEXING
Slicing is possible just as it is for typical Python sequences.
>>> x = np.arange(10)
>>> x[2:5]
array([2, 3, 4])
>>> x[:-7]
array([0, 1, 2])
>>> x[1:7:2]
array([1, 3, 5])
>>> y = np.arange(35).reshape(5,7)
>>> y[1:5:2,::3]
array([[ 7, 10, 13], [21, 24, 27]])
ARRAY OPERATIONS
Basic operations apply element-wise. The result is a new array with the
resultant elements. Operations like *= and += will modify the existing array.
>>> a = np.arange(5)
>>> b = np.arange(5)
>>> a+b
array([0, 2, 4, 6, 8])
>>> a-b
array([0, 0, 0, 0, 0])
>>> a**2
array([ 0, 1, 4, 9, 16])
>>> a>3
array([False, False, False, False, True], dtype=bool)
>>> 10*np.sin(a)
array([ 0., 8.41470985, 9.09297427, 1.41120008, -7.56802495])
>>> a*b
array([ 0, 1, 4, 9, 16])
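To make the difference between a new result and an in-place update concrete, here is a small sketch. The name same_object is illustrative only.

```python
import numpy as np

a = np.arange(5)
b = np.arange(5)

c = a + b          # new array; a and b are unchanged
same_object = a    # second name for the same array
a += b             # in place: the existing array is modified

print(c)             # [0 2 4 6 8]
print(same_object)   # [0 2 4 6 8]  (sees the in-place update)
```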
ARRAY OPERATIONS
Since multiplication is done element-wise, you need to specifically perform
a dot product to perform matrix multiplication.
>>> a = np.zeros(4).reshape(2,2)
>>> a
array([[ 0., 0.],
[ 0., 0.]])
>>> a[0,0] = 1
>>> a[1,1] = 1
>>> b = np.arange(4).reshape(2,2)
>>> b
array([[0, 1],
[2, 3]])
>>> a*b
array([[ 0., 0.],
[ 0., 3.]])
>>> np.dot(a,b)
array([[ 0., 1.],
[ 2., 3.]])
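In Python 3.5 and later the @ operator is an alternative spelling of the matrix product. The sketch below rebuilds the same two arrays (using np.eye as a shortcut for the identity matrix) and compares the three results.

```python
import numpy as np

a = np.eye(2)                       # 2x2 identity, same values as the array a built step by step
b = np.arange(4).reshape(2, 2)

elementwise = a * b                 # multiplies matching entries
matrix_product = np.dot(a, b)       # rows of a times columns of b
also_product = a @ b                # operator form of the matrix product

print(np.array_equal(matrix_product, also_product))   # True
```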
ARRAY OPERATIONS
There are also some built-in methods of ndarray objects. Universal functions
which may also be applied include exp, sqrt, add, sin, cos, etc.
>>> a = np.random.random((2,3))
>>> a
array([[ 0.68166391, 0.98943098, 0.69361582],
[ 0.78888081, 0.62197125, 0.40517936]])
>>> a.sum()
4.1807421388722164
>>> a.min()
0.4051793610379143
>>> a.max(axis=0)
array([ 0.78888081, 0.98943098, 0.69361582])
>>> a.min(axis=1)
array([ 0.68166391, 0.40517936])
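Because the random values above differ on every run, the following sketch repeats the same reductions on a fixed, made-up array so the results can be checked by hand.

```python
import numpy as np

a = np.array([[1.0, 4.0, 9.0],
              [16.0, 25.0, 36.0]])

total = a.sum()              # all elements: 91.0
col_max = a.max(axis=0)      # one value per column: [16. 25. 36.]
row_min = a.min(axis=1)      # one value per row: [ 1. 16.]
roots = np.sqrt(a)           # universal function, applied element-wise
```

axis=0 collapses the rows (leaving one result per column), while axis=1 collapses the columns (leaving one result per row).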
ARRAY OPERATIONS
An array shape can be manipulated by a number of methods.
resize(size) will modify an array in place.
reshape(size) will return a new array object with the new shape, leaving the
original unchanged (where possible the result is a view of the same data
rather than a copy).
>>> a = np.floor(10*np.random.random((3,4)))
>>> print(a)
[[ 9. 8. 7. 9.]
[ 7. 5. 9. 7.]
[ 8. 2. 7. 5.]]
>>> a.shape
(3, 4)
>>> a.ravel()
array([ 9., 8., 7., 9., 7., 5., 9., 7., 8., 2., 7., 5.])
>>> a.shape = (6,2)
>>> print(a)
[[ 9. 8.]
[ 7. 9.]
[ 7. 5.]
[ 9. 7.]
[ 8. 2.]
[ 7. 5.]]
>>> a.transpose()
array([[ 9., 7., 7., 9., 8., 7.],
[ 8., 9., 5., 7., 2., 5.]])
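A minimal sketch contrasting reshape and resize on made-up data. np.shares_memory is used only to show whether two arrays refer to the same underlying buffer.

```python
import numpy as np

a = np.arange(12)

b = a.reshape(3, 4)                 # new array object; a keeps shape (12,)
shares = np.shares_memory(a, b)     # True here: b views a's data

c = np.arange(12)
c.resize((2, 6))                    # in place: c itself now has shape (2, 6)
flat = c.ravel()                    # flattened back to one dimension

print(a.shape, b.shape, c.shape, flat.shape)   # (12,) (3, 4) (2, 6) (12,)
```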
LINEAR ALGEBRA
One of the most common reasons for using the NumPy package is its linear
algebra module.
>>> from numpy import *
>>> from numpy.linalg import *
>>> a = array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[ 1. 2.]
[ 3. 4.]]
>>> a.transpose()
array([[ 1., 3.],
[ 2., 4.]])
>>> inv(a) # inverse
array([[-2. , 1. ],
[ 1.5, -0.5]])
LINEAR ALGEBRA
>>> u = eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[ 1., 0.],
[ 0., 1.]])
>>> j = array([[0.0, -1.0], [1.0, 0.0]])
>>> dot(j, j) # matrix product
array([[-1., 0.],
[ 0., -1.]])
>>> trace(u) # trace
2.0
>>> y = array([[5.], [7.]])
>>> solve(a, y) # solve linear matrix equation
array([[-3.],
[ 4.]])
>>> eig(j) # get eigenvalues/eigenvectors of matrix
(array([ 0.+1.j, 0.-1.j]),
array([[ 0.70710678+0.j, 0.70710678+0.j],
[ 0.00000000-0.70710678j, 0.00000000+0.70710678j]]))
MATRICES
There is also a matrix class which inherits from the ndarray class.
There are some slight differences but matrices are very similar to general
arrays.
In NumPy’s own words, the question of whether to use arrays or matrices comes
down to the short answer of “use arrays”.
>>> A = matrix('1.0 2.0; 3.0 4.0')
>>> A
[[ 1. 2.]
[ 3. 4.]]
>>> type(A)
<class 'numpy.matrixlib.defmatrix.matrix'>
>>> A.T # transpose
[[ 1. 3.]
[ 2. 4.]]
>>> X = matrix('5.0 7.0')
>>> Y = X.T
>>> print(A*Y) # matrix multiplication
[[19.]
[43.]]
>>> print(A.I) # inverse
[[-2. 1. ]
[ 1.5 -0.5]]
>>> solve(A, Y) # solving linear equation
MATLAB STYLE FOR DEFINING MATRIX
>>> B = np.matrix("1,2; 3,4; 5,6")
>>> B
matrix([[1, 2],
[3, 4],
[5, 6]])
DIFFERENCE BETWEEN NUMPY DOT()
AND INNER()
>>> a=np.array([[1,2],[3,4]])
>>> b=np.array([[11,12],[13,14]])
>>> np.dot(a,b)
array([[37, 40],
[85, 92]])
>>> np.inner(a,b)
array([[35, 41],
[81, 95]])
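With dot(): result[i,j] is the sum over k of a[i,k]*b[k,j] (rows of a times
columns of b).
With inner(): result[i,j] is the sum over k of a[i,k]*b[j,k] (rows of a times
rows of b).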
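One way to check the difference is to note that, for 2-D inputs, inner(a, b) gives the same result as dot(a, b.T). A short sketch using the same arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

# dot pairs rows of a with columns of b.
dot_result = np.dot(a, b)
# inner pairs rows of a with rows of b, so it matches dot(a, b.T) for 2-D input.
inner_result = np.inner(a, b)

print(np.array_equal(inner_result, np.dot(a, b.T)))   # True
```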
SOLVING EQUATIONS
>>> import numpy as np
>>> from numpy.linalg import solve
>>> A = np.array([[4,5],[6,-3]])
>>> A
array([[4, 5],
[6, -3]])
>>> b = np.array([23, 3])
>>> x = solve(A,b)
>>> x
array([ 2., 3.])
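A computed solution can be checked by substituting it back into the system. The sketch below does this for the same A and b; np.allclose is used because solve returns floating-point values that may carry rounding error.

```python
import numpy as np

A = np.array([[4, 5], [6, -3]])
b = np.array([23, 3])
x = np.linalg.solve(A, b)

# Substitute the solution back in; allclose tolerates floating-point rounding.
residual_ok = np.allclose(A @ x, b)
print(residual_ok)   # True
```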
EIGENVALUES & EIGENVECTORS
The eig function returns two arrays: the first holds the eigenvalues, and
the second is a matrix whose columns are the corresponding eigenvectors
(two of them for a 2x2 matrix).
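The pairing between eigenvalues and eigenvector columns can be checked directly against the definition. This sketch reuses the matrix j from the linear algebra example and verifies that j times each column equals the matching eigenvalue times that column.

```python
import numpy as np

j = np.array([[0.0, -1.0], [1.0, 0.0]])
eigenvalues, eigenvectors = np.linalg.eig(j)

# Column k of eigenvectors pairs with eigenvalues[k]: j @ v == lambda * v.
for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]
    assert np.allclose(j @ v, eigenvalues[k] * v)
```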
Let’s say that we don’t have a function object; we only have some (x,y) samples that “define” our function.
We can estimate the integral using the trapezoidal rule.
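A minimal sketch of the trapezoidal rule written with plain NumPy operations. The samples are hypothetical (taken from f(x) = x**2 on [0, 1] so the answer can be compared with the exact value 1/3); in practice they would be measured data.

```python
import numpy as np

# Hypothetical samples of f(x) = x**2 on [0, 1].
x = np.linspace(0.0, 1.0, 101)
y = x ** 2

# Trapezoidal rule: sum the areas of the trapezoids between neighbouring samples.
widths = x[1:] - x[:-1]
mean_heights = (y[1:] + y[:-1]) / 2.0
estimate = np.sum(widths * mean_heights)

print(estimate)   # close to the exact value 1/3
```

The same formula works for unevenly spaced samples, because each trapezoid uses its own width.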
We’ll start the next lecture by introducing the matplotlib plotting package
and see how we can build more complex scientific applications.
PLOT
import matplotlib.pyplot as plt
xs = [1,2,3,4,5]
ys = [x**2 for x in xs]
plt.plot(xs, ys)
# plt.plot does return a value (a list of line objects), but it is usually ignored
xs = range(-100,100,10)
x2 = [x**2 for x in xs]
negx2 = [-x**2 for x in xs]
plt.plot(xs, x2)
plt.plot(xs, negx2)
# Incrementally modify the figure.
plt.xlabel("x")
plt.ylabel("y")
plt.ylim(-2000, 2000)
plt.axhline(0) # horiz line
plt.axvline(0) # vert line
plt.savefig("quad.png") # save your figure to a file
plt.show() # show it on the screen
from pylab import *
labels = ["Baseline", "System"]
data = [3.75, 4.75]
yerror = [0.3497, 0.3108]
xerror = [0.2, 0.2]
xlocations = array(range(len(data)))+0.5
width = 0.5
csize = 10
ec = 'r'
bar(xlocations, data, yerr=yerror, width=width,
xerr=xerror, capsize=csize, ecolor=ec)
yticks(range(0, 8))
xticks(xlocations+ width/2, labels)
xlim(0, xlocations[-1]+width*2)
title("Average Ratings on the Training Set")
savefig('bar')
HISTOGRAMS
hist(x, bins=n)
Computes and draws a histogram.
• x: a sequence of numbers (usually with many repetitions).
• bins: if this keyword argument is an integer, it is the number of
(equally spaced) bins. The default is 10.
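To see what the bins argument does without opening a plot window, the sketch below uses np.histogram, which computes bin counts and edges but does not draw anything. Using it here as a stand-in for hist is an assumption made for illustration, and the data are made up.

```python
import numpy as np

# Hypothetical data with repeated values.
x = [1, 1, 2, 2, 2, 3, 5, 5, 8, 10]

# Three equal-width bins over [1, 10]: edges at 1, 4, 7, 10.
counts, edges = np.histogram(x, bins=3)
print(counts)   # [6 2 2]

# With no bins argument, 10 bins are used.
default_counts, default_edges = np.histogram(x)
print(len(default_counts))   # 10
```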