Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
85 views

Python Faster For Loop

Python is one of the most popular programming languages today. It’s an interpreted and high-level language with elegant and readable syntax. However, Python is generally significantly slower than Java, C#, and especially C, C++, or Fortran. Sometimes performance issues and bottlenecks might seriously affect the usability of applications. Fortunately, in most cases, there are solutions to improve the performance of Python programs. There are choices developers can take to improve the speed of th

Uploaded by

duomly
Copyright
© © All Rights Reserved
0% found this document useful (0 votes)
85 views

Python Faster For Loop

Python is one of the most popular programming languages today. It’s an interpreted and high-level language with elegant and readable syntax. However, Python is generally significantly slower than Java, C#, and especially C, C++, or Fortran. Sometimes performance issues and bottlenecks might seriously affect the usability of applications. Fortunately, in most cases, there are solutions to improve the performance of Python programs. There are choices developers can take to improve the speed of th

Uploaded by

duomly
Copyright
© © All Rights Reserved
You are on page 1/ 17

Loops in Python –

Comparison and
Performance
This content was originally published at:
https://www.blog.duomly.com/loops-in-python-comparison-and-performance/

***

Python is one of the most popular


programming languages today. It’s an
interpreted and high-level language with
elegant and readable syntax. However,
Python is generally significantly slower
than Java, C#, and especially C, C++, or
Fortran. Sometimes performance issues and
bottlenecks might seriously affect the
usability of applications.

Fortunately, in most cases, there are


solutions to improve the performance of
Python programs. There are choices
developers can take to improve the speed
of their code. For example, the general
advice is to use optimized Python built-
in or third-party routines, usually
written in C or Cython. Besides, it’s
faster to work with local variables than
with globals, so it’s a good practice to
copy a global variable to a local before
the loop. And so on.

Finally, there’s always the possibility


to write own Python functions in C, C++,
or Cython, call them from the application
and replace Python bottleneck routines.
But this is usually an extreme solution
and is rarely necessary for practice.

Often performance issues arise when using


Python loops, especially with a large
number of iterations. There is a number
of useful tricks to improve your code and
make it run faster, but that’s beyond the
scope here.

This article compares the performance of


several approaches when summing two
sequences element-wise:
• Using the while loop
• Using the for loop
• Using the for loop with list
comprehensions
• Using the third-party library numpy

However, performance isn’t the only


concern when developing software.

Moreover, according to Donald Knuth in


The Art of Computer Programming,
“premature optimization is the root of
all evil (or at least most of it) in
programming”. After all, “readability
counts”, as stated in the Zen of Python
by Tim Peters.
Problem Statement

We’ll try to sum two sequences element-


wise. In other words, we’ll take two
sequences (list or arrays) of the same
size and create the third one with the
elements obtained by adding the
corresponding elements from the inputs.
Preparation

We’ll import Python built-in package


random and generate a list r that
contains 100.000 pseudo-random numbers
from 0 to 99 (inclusive):

import random
r = [random.randrange(100) for _ in
range(100_000)]

We’ll also use the third-party


package numpy, so let’s import it:

import numpy as np

And we’re ready to go!


Simple Loops

Let’s first see some simple Python loops


in action.

Using Pure Python

We’ll start with two lists with 1.000


elements each. The integer variable n
represents the length of each list. The
lists x and y are obtained by randomly
choosing n elements from r:

n = 1_000
x, y = random.sample(r, n), random.sample(r, n)

Let’s see what is the time needed to get


a new list z with n elements, each of
which is the sum of the corresponding
elements from x and y.

We’ll test the performance of the while


loop first:

%%timeit
i, z = 0, []
while i < n:
z.append(x[i] + y[i])
i += 1

The output is:


160 µs ± 1.44 µs per loop (mean ± std.
dev. of 7 runs, 10000 loops each)
Please, note that the output
of timeit depends on many factors and
might be different each time.

The for loop in Python is better


optimized for the cases like this, that
is to iterate over collections,
iterators, generators, and so on. Let’s
see how it works:

%%timeit
z = []
for i in range(n):
z.append(x[i] + y[i])

The output is:


122 µs ± 188 ns per loop (mean ± std.
dev. of 7 runs, 10000 loops each)
In this case, the for loop is faster, but
also more elegant compared to while.

List comprehensions are very similar to


the ordinary for loops. They are suitable
for simple cases (like this one).
In addition to being more compact, they
are usually slightly faster since some
overhead is removed:

%%timeit
z = [x[i] + y[i] for i in range(n)

The output is:


87.2 µs ± 490 ns per loop (mean ± std.
dev. of 7 runs, 10000 loops each)
Please, have in mind that you can’t apply
list comprehensions in all cases when you
need loops. Some more complex situations
require the ordinary for or even while
loops.

Using Python with NumPy

numpy is a third-party Python library


often used for numerical computations.
It’s especially suitable to manipulate
arrays. It offers a number of useful
routines to work with arrays, but also
allows writing compact and elegant code
without loops.

Actually, the loops, as well as other


performance-critical operations, are
implemented in numpy on the lower level.

That allows numpy routines to be much


faster compared to pure Python code. One
more advantage is the way numpy handles
variables and types.

Let’s first use the lists of Python


integers x and y to create
corresponding numpy arrays of 64-bit
integers:
x_, y_ = np.array(x, dtype=np.int64), np.array(y,
dtype=np.int64)

Summing two numpy arrays x_ and y_


element-wise is as easy as x_ + y_. But
let’s check the performance:

%%timeit
z = x_ + y_

The output is:


1.03 µs ± 5.09 ns per loop (mean ± std.
dev. of 7 runs, 1000000 loops each)
That’s almost 85 times faster than when
we used list comprehensions.
And the code is extremely simple and
elegant. numpy arrays can be a much better
choice for working with large arrays.
Performance benefits are generally
greater when data is bigger.

It might be even better. If we’re OK with


using 32-bit integers instead of 64-bit,
we can save both memory and time in some
cases:

x_, y_ = np.array(x, dtype=np.int32), np.array(y,


dtype=np.int32)

We can add these two arrays as before:

%%timeit
z = x_ + y_
The output is:
814 ns ± 5.8 ns per loop (mean ± std.
dev. of 7 runs, 1000000 loops each)
The results obtained when n is larger,
that is 10_000 and 100_000 are shown in
the table below. They show the same
relationships, as in this case, with even
higher performance boost when using numpy.
Nested Loops

Let’s now compare the nested Python


loops.

Using Pure Python

We’ll work with two lists called x and y


again. Each of them will contain 100
inner lists with 1.000 pseudo-random
integer elements. Thus, x and y will
actually represent the matrices with 100
rows and 1.000 columns:

m, n = 100, 1_000
x = [random.sample(r, n) for _ in range(m)]
y = [random.sample(r, n) for _ in range(m)]

Let’s see the performance of adding them


using two nested while loops:

%%timeit
i, z = 0, []
while i < m:
j, z_ = 0, []
while j < n:
z_.append(x[i][j] + y[i][j])
j += 1
z.append(z_)
i += 1

The output is:


19.7 ms ± 271 µs per loop (mean ± std.
dev. of 7 runs, 100 loops each Again, we
can get some performance improvement with
the nested for loops:

%%timeit
z = []
for i in range(m):
z_ = []
for j in range(n):
z_.append(x[i][j] + y[i][j])
z.append(z_)

The output is:


16.4 ms ± 303 µs per loop (mean ± std.
dev. of 7 runs, 100 loops each) In some
cases, the nested for loops can be used
with lists comprehensions, enabling an
additional benefit:

%%timeit
z = [[x[i][j] + y[i][j] for j in range(n)] for i
in range(m)]

The output is:


12.1 ms ± 99.4 µs per loop (mean ± std.
dev. of 7 runs, 100 loops each)
We can see that in the case of nested
loops, list comprehensions are faster
than the ordinary for loops, which are
faster than while.
In this case, we have 100.000 (100×1.000)
integer elements in each list. This
example is slightly slower than the one
with 100.000 elements and a single loop.
That’s the conclusion for all three
approaches (list comprehensions, ordinary
for, and while loops).

Using Python with NumPy

numpy is great to work with multi-


dimensional arrays. Let’s use x and y to
create corresponding numpy arrays of 64-
bit integers:

x_, y_ = np.array(x, dtype=np.int64), np.array(y,


dtype=np.int64)

And let’s check the performance:

%%timeit
z = x_ + y_

The output is:


69.9 µs ± 909 ns per loop (mean ± std.
dev. of 7 runs, 10000 loops each) That’s
about 173 times faster than list
comprehensions. But it might be even
faster if we use 32-bit integers:

x_, y_ = np.array(x, dtype=np.int32), np.array(y,


dtype=np.int32)
The performance check is done as before:

%%timeit
z = x_ + y_

The output is:


34.3 µs ± 44.6 ns per loop (mean ± std.
dev. of 7 runs, 10000 loops each) which
is twice faster than with 64-bit
integers.

Results Summary
This table summarizes the obtained
results:

Number of 1 1 1 100
elements, × × × ×
m × n 1K 10K 100K 1K

160
Python while µs
1.66 ms 17.0 ms 19.7 ms

122
Python for 1.24 ms 13.4 ms 16.4 ms
µs

Python for with


87.2
list 894 µs 8.92 ms 12.1 ms
µs
comprehension

numpy with 64- 1.03


6.36 µs 71.9 µs 69.9 µs
bit integers µs
numpy with 32- 814
2.87 µs 34.1 µs 34.3 µs
bit integers ns
Conclusions of python faster for loop

This article compares the performance of


Python loops when adding two lists or
arrays element-wise.
The results show that list comprehensions
were faster than the ordinary for loop,
which was faster than the while loop. The
simple loops were slightly faster than
the nested loops in all three cases.

numpy offers the routines and operators


that can substantially reduce the amount
of code and increase the speed of
execution. It’s especially useful when
working with single- and multi-
dimensional arrays.

Please, have in mind that the conclusions


or relations among the results obtained
here are not applicable, valid, or useful
in all cases! They are presented for
illustration. The proper way to handle
inefficiencies is to discover the
bottlenecks and perform own tests.
Table of contents:
• Problem Statement
• Preparation
• Simple Loops
• Nested Loops
• Results Summary
• Conclusions

You might also like