Effective Numerical Computation in NumPy and SciPy

Kimikazu Kato
PyCon JP 2014
September 13, 2014
1 / 35

About Myself
Kimikazu Kato
Chief Scientist at Silver Egg Technology Co., Ltd.
Ph.D. in Computer Science
Background in Mathematics, Numerical Computation, Algorithms, etc.
Less than 2 years of experience in Python
More than 10 years of experience in numerical computation
Now designing algorithms for recommendation systems, and doing research
on machine learning and data analysis.
2 / 35

This talk...
is about effective usage of NumPy/SciPy
is NOT an exhaustive introduction to their capabilities, but shows some case
studies based on my experience and interest
3 / 35

Table of Contents
Introduction
Basics about NumPy
    Broadcasting
    Indexing
Sparse matrix
    Usage of scipy.sparse
    Internal structure
Case studies
Conclusion
4 / 35

Numerical Computation
Differential equations
Simulations
Signal processing
Machine Learning
etc...
Why Numerical Computation in Python?
Productivity
    Easy to write
    Easy to debug
Connectivity with visualization tools
    Matplotlib
    IPython
Connectivity with web systems
    Many frameworks (Django, Pyramid, Flask, Bottle, etc.)
5 / 35

But Python is Very Slow!
Code in C
#include <stdio.h>
int main() {
    int i; double s=0;
    for (i=1; i<=100000000; i++) s+=i;
    printf("%.0f\n",s);
    return 0;
}
Code in Python
s=0.
for i in xrange(1,100000001):
    s+=i
print s
Both programs compute the sum of the integers from 1 to 100,000,000.
Result of a benchmark in a certain environment:
C: 0.109 sec (compiled with the -O3 option)
Python: 8.657 sec
(about 80 times slower!!)
6 / 35

Better code
import numpy as np
a=np.arange(1,100000001)
print a.sum()
Now it takes 0.188 sec. (Measured by the "time" command in Linux, loading time
included)
Still slower than C, but sufficiently fast for a scripting language.
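The comparison can also be reproduced inside one process with the standard timeit module. The following is a minimal sketch, not the measurement used in the talk: the helper names loop_sum and numpy_sum and the reduced size n = 1,000,000 are assumptions chosen for illustration, the syntax is chosen so that it also runs on Python 3, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 1000000  # assumption: smaller than the slide's 100,000,000 so the loop ends quickly

def loop_sum(n):
    # element-by-element accumulation in the interpreter (the slow style)
    s = 0.
    for i in range(1, n + 1):
        s += i
    return s

def numpy_sum(n):
    # the whole summation is delegated to NumPy's compiled code
    return np.arange(1, n + 1, dtype=np.int64).sum()

t_loop = timeit.timeit(lambda: loop_sum(n), number=3)
t_numpy = timeit.timeit(lambda: numpy_sum(n), number=3)
print("loop : %.4f sec" % t_loop)
print("numpy: %.4f sec" % t_numpy)
```

The dtype is fixed to int64 here so that the sum does not overflow on platforms where the default integer is 32 bits wide.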
7 / 35

Lessons
Python is very slow when written badly
Translating C (or Java, C# etc.) code into Python is often a bad idea.
Python-friendly rewriting sometimes results in a drastic performance
improvement
8 / 35

Basic rules for better performance
Avoid for-loops as far as possible
    Utilize libraries' capabilities instead
Forget about the cost of copying memory
    A typical C programmer might care about it, but ...
9 / 35

Basic techniques for NumPy
Broadcasting
Indexing
10 / 35

Broadcasting
>>> import numpy as np
>>> a=np.array([0,1,2])
>>> a*3
array([0, 3, 6])
>>> b=np.array([1,4,9])
>>> np.sqrt(b)
array([ 1., 2., 3.])
A function that is applied to each element when it is given an array is called
a universal function.
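To make the definition concrete, the sketch below compares np.sqrt with an explicit Python loop over the same data. The variable names looped and vectorized are illustrative only and do not come from the talk.

```python
import math
import numpy as np

b = np.array([1, 4, 9])

# explicit Python loop: one math.sqrt call per element
looped = np.array([math.sqrt(x) for x in b])

# universal function: a single call, applied element-wise in compiled code
vectorized = np.sqrt(b)

print(np.allclose(looped, vectorized))
# universal functions are instances of np.ufunc
print(isinstance(np.sqrt, np.ufunc))
```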
11 / 35

Broadcasting (2D)
>>> a=np.arange(9).reshape((3,3))
>>> b=np.array([1,2,3])
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> b
array([1, 2, 3])
>>> a*b
array([[ 0,  2,  6],
       [ 3,  8, 15],
       [ 6, 14, 24]])
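In the example above the 1-D array b is aligned with the last axis of a, so column j of a is multiplied by b[j]. The following sketch spells this out with explicit loops and also shows the row-wise variant obtained by giving b the shape (3,1). The names by_columns, by_rows, loop_columns and loop_rows are illustrative assumptions.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
b = np.array([1, 2, 3])

# shape (3,3) * shape (3,): b is stretched along the rows,
# so column j of a is multiplied by b[j]
by_columns = a * b

# shape (3,3) * shape (3,1): b is stretched along the columns,
# so row i of a is multiplied by b[i]
by_rows = a * b[:, np.newaxis]

# the same results written with explicit loops, for comparison
loop_columns = np.array([[a[i, j] * b[j] for j in range(3)] for i in range(3)])
loop_rows = np.array([[a[i, j] * b[i] for j in range(3)] for i in range(3)])

print(np.array_equal(by_columns, loop_columns))
print(np.array_equal(by_rows, loop_rows))
```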
12 / 35

Indexing
>>> a=np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> indices=np.arange(0,10,2)
>>> indices
array([0, 2, 4, 6, 8])
>>> a[indices]=0
>>> a
array([0, 1, 0, 3, 0, 5, 0, 7, 0, 9])
>>> b=np.arange(100,600,100)
>>> b
array([100, 200, 300, 400, 500])
>>> a[indices]=b
>>> a
array([100, 1, 200, 3, 300, 5, 400, 7, 500, 9])
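Assignment through an index array replaces a loop over the indices. The sketch below shows the loop it stands for and the vectorized form side by side; the names looped, indexed and picked are illustrative assumptions.

```python
import numpy as np

a = np.arange(10)
indices = np.arange(0, 10, 2)
b = np.arange(100, 600, 100)

# loop version: one assignment per index
looped = a.copy()
for k, idx in enumerate(indices):
    looped[idx] = b[k]

# indexing version: a single vectorized assignment
indexed = a.copy()
indexed[indices] = b

# an index array can also be used to read elements
picked = indexed[indices]

print(np.array_equal(looped, indexed))
```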
13 / 35

References
Gabriele Lanaro, "Python High Performance Programming," Packt Publishing, 2013.
Stéfan van der Walt, Numpy Medkit
14 / 35

Sparse matrix
Defined as a matrix in which most elements are zero
A compressed data structure is used to represent it, so that it will be...
    Space efficient
    Time efficient
15 / 35

scipy.sparse
The module scipy.sparse provides mainly three types for representing a sparse
matrix. (There are other types, but they are not mentioned here.)
lil_matrix : convenient to set data; setting a[i,j] is fast
csr_matrix : convenient for computation, fast to retrieve a row
csc_matrix : convenient for computation, fast to retrieve a column
Usually, set the data into a lil_matrix, and then convert it to a csc_matrix or
csr_matrix.
For csr_matrix and csc_matrix, calculation between matrices of the same type is fast,
but you should avoid calculation between different types.
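The workflow described above (fill a lil_matrix, convert, then compute) can be sketched as follows. The 4 x 4 size, the values and the names m, csr, csc, row2, col0 and prod are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix

# 1. set the data in a lil_matrix
m = lil_matrix((4, 4))
m[0, 1] = 1.
m[2, 3] = 2.
m[3, 0] = 3.

# 2. convert once the data is in place
csr = m.tocsr()
csc = m.tocsc()

# 3. retrieve a row from the CSR form and a column from the CSC form
row2 = csr[2, :].toarray()
col0 = csc[:, 0].toarray()

# a product of two matrices of the same type keeps that type
prod = csr.dot(csr)
print("%s %s %s" % (csr.format, csc.format, prod.format))
```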
16 / 35

Use case
>>> from scipy.sparse import lil_matrix, csr_matrix
>>> a=lil_matrix((3,3))
>>> a[0,0]=1.; a[0,2]=2.
>>> a=a.tocsr()
>>> print a
  (0, 0)	1.0
  (0, 2)	2.0
>>> a.todense()
matrix([[ 1.,  0.,  2.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])
>>> b=lil_matrix((3,3))
>>> b[1,1]=3.; b[2,0]=4.; b[2,2]=5.
>>> b=b.tocsr()
>>> b.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  3.,  0.],
        [ 4.,  0.,  5.]])
>>> c=a.dot(b)
>>> c.todense()
matrix([[  8.,   0.,  10.],
        [  0.,   0.,   0.],
        [  0.,   0.,   0.]])
>>> d=a+b
>>> d.todense()
matrix([[ 1.,  0.,  2.],
        [ 0.,  3.,  0.],
        [ 4.,  0.,  5.]])
17 / 35

Internal structure: csr_matrix
>>> a=lil_matrix((3,3))
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5.
>>> b=a.tocsr()
>>> b.todense()
matrix([[ 0.,  1.,  2.],
        [ 0.,  0.,  3.],
        [ 4.,  5.,  0.]])
>>> b.indices
array([1, 2, 2, 0, 1], dtype=int32)
>>> b.data
array([ 1., 2., 3., 4., 5.])
>>> b.indptr
array([0, 2, 3, 5], dtype=int32)
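The three arrays can be read as follows: the entries of row i are stored at positions indptr[i] to indptr[i+1]-1 of data, and indices gives the column of each of them. The sketch below rebuilds the dense matrix by hand from the arrays shown above; the loop is only for explanation and is not how SciPy itself does the conversion.

```python
import numpy as np

data = np.array([1., 2., 3., 4., 5.])
indices = np.array([1, 2, 2, 0, 1])
indptr = np.array([0, 2, 3, 5])

n_rows = len(indptr) - 1
dense = np.zeros((n_rows, 3))
for i in range(n_rows):
    # entries of row i occupy positions indptr[i] .. indptr[i+1]-1
    for k in range(indptr[i], indptr[i + 1]):
        dense[i, indices[k]] = data[k]
print(dense)
```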
18 / 35

Internal structure: csc_matrix
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5.
>>> b=a.tocsc()
>>> b.todense()
matrix([[ 0.,  1.,  2.],
        [ 0.,  0.,  3.],
        [ 4.,  5.,  0.]])
>>> b.indices
array([2, 0, 2, 0, 1], dtype=int32)
>>> b.data
array([ 4., 1., 5., 2., 3.])
>>> b.indptr
array([0, 1, 3, 5], dtype=int32)
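One way to read the CSC arrays is that they are the CSR arrays of the transposed matrix: indices holds row numbers and indptr delimits columns. The sketch below checks this on the same 3 x 3 matrix; the names as_csc, transposed_csr and same_layout are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

dense = np.array([[0., 1., 2.],
                  [0., 0., 3.],
                  [4., 5., 0.]])

as_csc = csc_matrix(dense)
transposed_csr = csr_matrix(dense.T)

# the three internal arrays coincide
same_layout = (np.array_equal(as_csc.data, transposed_csr.data)
               and np.array_equal(as_csc.indices, transposed_csr.indices)
               and np.array_equal(as_csc.indptr, transposed_csr.indptr))
print(same_layout)
```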
19 / 35

Merit of knowing the internal structure
Setting csr_matrix or csc_matrix with its internal structure is much faster than
setting lil_matrix with indices.
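See the benchmark of setting the following n × n matrix
(2 on the diagonal, 1 just above the diagonal):

\[
\begin{pmatrix}
2 & 1 &        &        \\
  & 2 & 1      &        \\
  &   & \ddots & \ddots \\
  &   &        & 2
\end{pmatrix}
\]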

20 / 35

from scipy.sparse import lil_matrix, csr_matrix
import numpy as np
from timeit import timeit

def set_lil(n):
    a=lil_matrix((n,n))
    for i in xrange(n):
        a[i,i]=2.
        if i+1<n:
            a[i,i+1]=1.
    return a

def set_csr(n):
    data=np.empty(2*n-1)
    indices=np.empty(2*n-1,dtype=np.int32)
    indptr=np.empty(n+1,dtype=np.int32)
    # to be fair, a for-loop is intentionally used
    # (using the indexing technique is faster)
    for i in xrange(n):
        indices[2*i]=i
        data[2*i]=2.
        if i<n-1:
            indices[2*i+1]=i+1
            data[2*i+1]=1.
        indptr[i]=2*i
    indptr[n]=2*n-1
    a=csr_matrix((data,indices,indptr),shape=(n,n))
    return a

print "lil:",timeit("set_lil(10000)",
    number=10,setup="from __main__ import set_lil")
print "csr:",timeit("set_csr(10000)",
    number=10,setup="from __main__ import set_csr")
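The comment in set_csr notes that the indexing technique would be faster than the loop. A possible loop-free version is sketched below; the function name set_csr_vectorized is hypothetical and the code is not part of the benchmark above.

```python
import numpy as np
from scipy.sparse import csr_matrix

def set_csr_vectorized(n):
    data = np.empty(2 * n - 1)
    indices = np.empty(2 * n - 1, dtype=np.int32)
    data[0::2] = 2.                   # diagonal entries a[i,i]
    data[1::2] = 1.                   # entries a[i,i+1] just above the diagonal
    indices[0::2] = np.arange(n)      # column i for a[i,i]
    indices[1::2] = np.arange(1, n)   # column i+1 for a[i,i+1]
    indptr = 2 * np.arange(n + 1, dtype=np.int32)
    indptr[n] = 2 * n - 1             # the last row has only the diagonal entry
    return csr_matrix((data, indices, indptr), shape=(n, n))

small = set_csr_vectorized(4).toarray()
print(small)
```

The slices with step 2 fill the interleaved positions of the diagonal and off-diagonal entries in one assignment each, which is the same idea as assigning through an index array.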
21 / 35

Result:
lil: 11.6730761528
csr: 0.0562081336975
Remark
When you deal with already sorted data, setting csr_matrix or csc_matrix
with data, indices, indptr is much faster than setting lil_matrix
But the code tends to be more complicated if you use the internal structure
of csr_matrix or csc_matrix
22 / 35

Case 1: Norms
If v is dense:
norm=np.dot(v,v)
\[ \|v\|^2 = v^\top v \]
This is expressed as a product of matrices. (dot means matrix product, but you don't
have to take the transpose explicitly.)
When v is sparse, suppose that v is expressed as a 1 × n (or n × 1) matrix:
norm=v.multiply(v).sum()
(multiply() is the element-wise product)
multiply() is used instead of dot() because taking the transpose of a sparse
matrix changes its type.
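A small check of both variants, as a sketch: the vector values and the names v_dense, v_sparse, norm_dense and norm_sparse are assumptions for illustration. It also shows the type change caused by the transpose (CSR becomes CSC).

```python
import numpy as np
from scipy.sparse import csr_matrix

v_dense = np.array([0., 3., 0., 4.])
v_sparse = csr_matrix(v_dense)   # stored as a 1 x 4 sparse matrix

# dense case: dot product of the vector with itself
norm_dense = np.dot(v_dense, v_dense)

# sparse case: element-wise product, then the sum of all entries
norm_sparse = v_sparse.multiply(v_sparse).sum()

# transposing a CSR matrix yields a CSC matrix
print(v_sparse.T.format)
```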
24 / 35

Frobenius norm:
norm=a.multiply(a).sum()
\[ \|a\|_{\mathrm{Fro}}^2 = \sum_{i,j} a_{ij}^2 \]
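The expression computes the squared Frobenius norm. The sketch below compares it with NumPy's dense implementation on a small matrix; the matrix values are made up, and the variant from_data, which sums over the stored entries only, is an addition that is not on the slide.

```python
import numpy as np
from scipy.sparse import csr_matrix

a = csr_matrix(np.array([[1., 0., 2.],
                         [0., 0., 0.],
                         [0., 3., 0.]]))

fro_squared = a.multiply(a).sum()                     # 1 + 4 + 9
reference = np.linalg.norm(a.toarray(), 'fro') ** 2   # dense computation

# only the stored entries contribute, so the data array alone suffices
from_data = (a.data ** 2).sum()
```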
25 / 35

Case 2: Applying a function to all of the elements of a
sparse matrix
A universal function can be applied to a dense matrix:
>>> import numpy as np
>>> a=np.arange(9).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.tanh(a)
array([[ 0.        ,  0.76159416,  0.96402758],
       [ 0.99505475,  0.9993293 ,  0.9999092 ],
       [ 0.99998771,  0.99999834,  0.99999977]])
This is convenient and fast.
However, we cannot do the same thing for a sparse matrix.
26 / 35

>>> from scipy.sparse import lil_matrix
>>> a=lil_matrix((3,3))
>>> a[0,0]=1.
>>> a[1,0]=2.
>>> b=a.tocsr()
>>> np.tanh(b)
<3x3 sparse matrix of type '<type 'numpy.float64'>'
        with 2 stored elements in Compressed Sparse Row format>
This is because, for an arbitrary function, the result of applying it to a sparse
matrix is not necessarily sparse.
However, if a universal function f satisfies
\[ f(0) = 0 \]
then the sparsity is preserved.
Then, how can we compute it?
27 / 35

Use the internal structure!!
The positions of the non-zero elements are not changed by applying
the function.
Keep indices and indptr, and just change data.
Solution (a is a csr_matrix):
b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)
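As a sanity check, the sketch below applies the same construction to a small matrix and compares the result with np.tanh applied to the dense form. It assumes that a has already been converted to CSR, since indices and indptr are attributes of csr_matrix; the variable expected is illustrative.

```python
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix

a = lil_matrix((3, 3))
a[0, 0] = 1.
a[1, 0] = 2.
a = a.tocsr()   # indices and indptr are available on the CSR form

# apply tanh to the stored values only and reuse the sparsity pattern
b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)

# tanh(0) = 0, so the dense computation gives the same matrix
expected = np.tanh(a.toarray())
print(np.allclose(b.toarray(), expected))
```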
28 / 35

Case 3: Formula which appears in a paper
In the algorithm for recommendation system [1], the following formula
appears:
\[ A^\top D A \]
where A is an n × f dense matrix, and D is a diagonal matrix defined from a
given array d = (d_1, ..., d_n) as:
\[
D = \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{pmatrix}
\]
Here, n (which corresponds to the number of users or items) is big and
f (which means the number of latent factors) is small.
[1] Hu et al. Collaborative Filtering for Implicit Feedback Datasets, ICDM,
2008.
29 / 35

Solution 1:
There is a special class dia_matrix to deal with a diagonal sparse matrix.
import scipy.sparse as sparse
import numpy as np
def f(a,d):
    """a: 2d array of shape (n,f), d: 1d array of length n"""
    dd=sparse.diags([d],[0])
    return np.dot(a.T,dd.dot(a))
30 / 35

Solution 2:
Pack csr_matrix with data,indices,indptr
data=d
indices=[0,1,...,n-1]
indptr=[0,1,...,n]
def g(a,d):
    n,f=a.shape
    data=d
    indices=np.arange(n)
    indptr=np.arange(n+1)
    dd=sparse.csr_matrix((data,indices,indptr),shape=(n,n))
    return np.dot(a.T,dd.dot(a))
31 / 35

Solution 3:
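\[
A^\top D =
\begin{pmatrix}
a_{11} & \cdots & a_{n1} \\
\vdots &        & \vdots \\
a_{1f} & \cdots & a_{nf}
\end{pmatrix}
\begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{pmatrix}
=
\begin{pmatrix}
d_1 a_{11} & \cdots & d_n a_{n1} \\
\vdots     &        & \vdots     \\
d_1 a_{1f} & \cdots & d_n a_{nf}
\end{pmatrix}
\]
This is equivalent to the broadcasting!
def h(a,d):
    return np.dot(a.T*d,a)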

32 / 35

Benchmark
def datagen(n,f):
    np.random.seed(0)
    a=np.random.random((n,f))
    d=np.random.random(n)
    return a,d

from timeit import timeit
print "dia_matrix :",timeit("f(a,d)",number=10,
    setup="from __main__ import f,datagen; a,d=datagen(1000000,10)")
print "csr_matrix :",timeit("g(a,d)",number=10,
    setup="from __main__ import g,datagen; a,d=datagen(1000000,10)")
print "broadcasting :",timeit("h(a,d)",number=10,
    setup="from __main__ import h,datagen; a,d=datagen(1000000,10)")
Result:
dia_matrix : 1.60458707809
csr_matrix : 1.32580018044
broadcasting : 1.30032682419
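Before comparing timings it is worth confirming that the three solutions return the same matrix. The sketch below restates f, g and h in compact form and checks them against a plain dense computation; the reduced size (n = 1000, f = 10), the RandomState seed and the name reference are assumptions for illustration.

```python
import numpy as np
import scipy.sparse as sparse

def f(a, d):
    # Solution 1: diagonal sparse matrix built with sparse.diags
    dd = sparse.diags([d], [0])
    return np.dot(a.T, dd.dot(a))

def g(a, d):
    # Solution 2: csr_matrix packed from data, indices, indptr
    n = a.shape[0]
    dd = sparse.csr_matrix((d, np.arange(n), np.arange(n + 1)), shape=(n, n))
    return np.dot(a.T, dd.dot(a))

def h(a, d):
    # Solution 3: broadcasting
    return np.dot(a.T * d, a)

rng = np.random.RandomState(0)
a = rng.random_sample((1000, 10))
d = rng.random_sample(1000)

# dense D is affordable only because n is small in this check
reference = np.dot(a.T, np.dot(np.diag(d), a))
for func in (f, g, h):
    print("%s: %s" % (func.__name__, np.allclose(func(a, d), reference)))
```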
33 / 35

Conclusion
Try not to use for-loops, but use libraries' capabilities instead.
Knowledge about the internal structure of the sparse matrix is useful to
extract further performance.
Mathematical derivation is important. The key is to find a mathematically
equivalent and Python-friendly formula.
Computational speed does not necessarily matter. Finding better code in
a short time is valuable. Otherwise, you shouldn't pursue it too far.
34 / 35

Acknowledgment
I would like to thank
(@shima__shima)
who gave me useful advice on Twitter.
35 / 35
