Python Threads
Aahz
aahz@pobox.com
http://starship.python.net/crew/aahz/
Meta Tutorial
I'm hearing-impaired
Please write questions if at all possible
Pop Quiz
Slides and scripts on web
Contents
title: Part 1
Generic Threads
Similar to processes
Shared memory
Light-weight
Difficult to set up
Especially cross-platform
Why Use Threads?
Efficiency/speed
multiple CPUs, parallelize blocking I/O
Responsiveness
e.g. background thread for GUI
Algorithmic simplicity
simulations, data passing
(mostly skipped in this tutorial)
Python Threads
Class-based
Use threading, not thread
Cross-platform, OS-level
Thread Library
Python 1.5.2
configure --with-thread
Except on MS Windows and some Linux distributions
Multi-CPU bug
Creating/destroying large numbers of threads
Upgrade to 2.x
GIL (Global Interpreter Lock)
www.python.org/doc/current/api/threads.html
GIL in Action
Which is faster?
One Thread
total = 1
for i in range(10000):
    total += 1
total = 1
for i in range(10000):
    total += 1
Two Threads
# Thread 1
total = 1
for i in range(10000):
    total += 1
# Thread 2
total = 1
for i in range(10000):
    total += 1
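The "which is faster?" question can be tried directly. The following is a minimal sketch, assuming Python 3; the helper names count_up, run_in_one_thread, run_in_two_threads and timed are made up for this illustration and do not come from the tutorial's scripts. Timings vary by machine and interpreter, so none are quoted here.

```python
import threading
import time

def count_up(n=10000):
    """The loop from the slide: start at 1 and add 1, n times."""
    total = 1
    for i in range(n):
        total += 1
    return total

def run_in_one_thread(n=10000):
    """Run the loop twice, back to back, in the calling thread."""
    return [count_up(n), count_up(n)]

def run_in_two_threads(n=10000):
    """Run the loop once in each of two threads."""
    results = [None, None]

    def worker(slot):
        results[slot] = count_up(n)

    threads = [threading.Thread(target=worker, args=(slot,)) for slot in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def timed(func):
    """Return (result, elapsed seconds) for one call of func (hypothetical helper)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start

one_result, one_seconds = timed(run_in_one_thread)
two_result, two_seconds = timed(run_in_two_threads)
print("one thread:  %.6f s" % one_seconds)
print("two threads: %.6f s" % two_seconds)
```

On a CPython build with the GIL, only one thread executes bytecode at a time, so the two-thread version of a pure-Python loop like this one is usually no faster.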
Dealing with GIL
sys.setcheckinterval()
(default 10)
C extensions can release GIL
Blocking I/O releases GIL
So does time.sleep() with a nonzero argument
Multiple Processes
CORBA, XML-RPC, sockets, etc.
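The point that a blocking call releases the GIL can be seen with time.sleep(). This is a small sketch, assuming Python 3; sleep_in_threads is a hypothetical helper name. Several threads sleep at the same time, so the total wall time stays close to one sleep rather than the sum of all of them.

```python
import threading
import time

def sleep_in_threads(count=4, seconds=0.2):
    """Start `count` threads that each sleep `seconds`; return total wall time."""
    threads = [threading.Thread(target=time.sleep, args=(seconds,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = sleep_in_threads()
print("4 threads x 0.2 s sleep took %.2f s" % elapsed)
```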
Share External Objects
Don't
Subclass threading.Thread
Override __init__() and run()
Do not override start()
In __init__(), call Thread.__init__()
Non-threaded Example
class Retriever:
    def __init__(self, URL):
        self.URL = URL
    def run(self):
        self.page = self.getPage()
retriever = Retriever('http://www.foo.com/')
retriever.run()
URLs = retriever.getLinks()
Threaded Example
import time
from threading import Thread
class Retriever(Thread):
    def __init__(self, URL):
        Thread.__init__(self)
        self.URL = URL
    def run(self):
        self.page = self.getPage()
retriever = Retriever('http://www.foo.com/')
retriever.start()
# Poll until the thread has finished
while retriever.isAlive():
    time.sleep(1)
URLs = retriever.getLinks()
Multiple Threads
seeds = ['http://www.foo.com/',
         'http://www.bar.com/',
         'http://www.baz.com/']
threadList = []
URLs = []
for seed in seeds:
    retriever = Retriever(seed)
    retriever.start()
    threadList.append(retriever)
for retriever in threadList:
    # join() is more efficient than sleep()
    retriever.join()
    URLs += retriever.getLinks()
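The slides leave getPage() and getLinks() undefined. To try the pattern without a network, here is a runnable sketch, assuming Python 3; FAKE_WEB, the stub bodies of getPage() and getLinks(), and the crawl() helper are all invented stand-ins, not the tutorial's spider code.

```python
from threading import Thread

# Hypothetical canned "web": maps a URL to the links found on that page.
FAKE_WEB = {
    'http://www.foo.com/': ['http://www.foo.com/a', 'http://www.foo.com/b'],
    'http://www.bar.com/': ['http://www.bar.com/a'],
    'http://www.baz.com/': [],
}

class Retriever(Thread):
    def __init__(self, URL):
        Thread.__init__(self)
        self.URL = URL
        self.page = None

    def getPage(self):
        # Stand-in for a real download: look the URL up in the canned data.
        return FAKE_WEB.get(self.URL, [])

    def getLinks(self):
        # Stand-in for link extraction: the "page" is already a list of links.
        return list(self.page)

    def run(self):
        self.page = self.getPage()

def crawl(seeds):
    """Start one Retriever per seed, then join each and collect its links."""
    threadList = []
    for seed in seeds:
        retriever = Retriever(seed)
        retriever.start()
        threadList.append(retriever)
    URLs = []
    for retriever in threadList:
        retriever.join()
        URLs += retriever.getLinks()
    return URLs

URLs = crawl(sorted(FAKE_WEB))
print(URLs)
```

Because the threads are joined in the order they were started, the collected list keeps seed order even though the threads themselves may finish in any order.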
Thread Methods
Module functions:
activeCount() (not useful)
enumerate() (not useful)
Unthreaded Spider
SingleThreadSpider.py
Compare Tools/webchecker/
BruteThreadSpider.py
Recap Part 1
GIL
Creating threads
Brute force threads
Part 2
Thread Theory
Python Thread Library
Thread Order
Nondeterministic
Thread 1
print "a,",
print "b,",
print "c,",
Thread 2
print "1,",
print "2,",
print "3,",
Sample output
1, a, b, 2, c, 3,
a, b, c, 1, 2, 3,
1, 2, 3, a, b, c,
a, b, 1, 2, 3, c,
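The same experiment can be run as a script. This is a sketch, assuming Python 3; interleave() is a hypothetical helper that collects the items in a list instead of printing them, so the result can be inspected afterwards. Each thread's own items always stay in order; how the two sequences interleave is up to the scheduler and may differ from run to run.

```python
import threading

def interleave():
    """Let two threads append their items to one list; return the list."""
    output = []

    def emit(items):
        for item in items:
            output.append(item)

    thread1 = threading.Thread(target=emit, args=(['a', 'b', 'c'],))
    thread2 = threading.Thread(target=emit, args=(['1', '2', '3'],))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    return output

print(", ".join(interleave()))
```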
Thread Communication
Data protection
Synchronization
Data Protection
Synchronization
Thread Library
Lock()
RLock()
Semaphore()
Condition()
Event()
Queue.Queue()
Lock()
Methods
acquire(blocking)
release()
Thread 1
mutex.acquire()
if myList:
    work = myList.pop()
mutex.release()
...
...
...
...
Thread 2
...
...
...
...
mutex.acquire()
if len(myList) < 10:
    myList.append(work)
mutex.release()
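The check-then-append step above is only safe because the length test and the append happen under the same lock. A minimal sketch, assuming Python 3; fill_with_cap() and its parameters are hypothetical names for this illustration. Several threads compete to append, and the lock guarantees the list never grows past the cap.

```python
import threading

def fill_with_cap(thread_count=4, items_per_thread=20, cap=10):
    """Have several threads append to one list, never exceeding `cap` items."""
    mutex = threading.Lock()
    myList = []

    def worker(name):
        for i in range(items_per_thread):
            mutex.acquire()
            try:
                # Test and append are one unit while the lock is held.
                if len(myList) < cap:
                    myList.append((name, i))
            finally:
                mutex.release()

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return myList

print(len(fill_with_cap()))
```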
Misusing Lock()
# OOPS!
Synching threads
Thread 1
self.synch.wait()
...
...
self.synch.notify()
Thread 2
...
self.synch.notify()
self.synch.wait()
...
RLock()
Mutex only
Other threads cannot release RLock()
Recursive
Methods
acquire(blocking)
release()
Using RLock()
mutex = RLock()
mutex.acquire()
...
mutex.acquire()
...
mutex.release()
mutex.release()
Thread 1
mutex.acquire()
self.update()
mutex.release()
...
...
...
# Safe
Thread 2
...
...
...
mutex.acquire()
self.update()
mutex.release()
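The "recursive" property matters when one locked method calls another locked method on the same object. A sketch, assuming Python 3; the Counter class and run_counter() are hypothetical examples, not part of the tutorial's scripts.

```python
import threading

class Counter:
    def __init__(self):
        self.mutex = threading.RLock()
        self.value = 0

    def update(self):
        self.mutex.acquire()
        try:
            self.value += 1
        finally:
            self.mutex.release()

    def update_twice(self):
        self.mutex.acquire()          # outer acquire
        try:
            self.update()             # same thread re-acquires the RLock
            self.update()
        finally:
            self.mutex.release()      # one release per acquire

def run_counter(thread_count=4, calls=100):
    """Call update_twice() from several threads; return the final count."""
    counter = Counter()

    def worker():
        for _ in range(calls):
            counter.update_twice()

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

print(run_counter())
```

With a plain Lock() in place of RLock(), the nested acquire in update() would block forever, because the calling thread already holds the lock.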
Semaphore()
Methods
Semaphore(value)
acquire(blocking)
release()
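A common use of Semaphore(value) is to cap how many threads may be inside a section at once. A sketch, assuming Python 3; run_limited(), the state dictionary and the short sleep that stands in for real work are all hypothetical.

```python
import threading
import time

def run_limited(worker_count=6, limit=2):
    """Run workers that share `limit` slots; return the peak number inside."""
    slots = threading.Semaphore(limit)
    guard = threading.Lock()              # protects the bookkeeping below
    state = {'active': 0, 'peak': 0}

    def worker():
        slots.acquire()                   # blocks while `limit` workers are inside
        try:
            with guard:
                state['active'] += 1
                state['peak'] = max(state['peak'], state['active'])
            time.sleep(0.01)              # stand-in for real work
            with guard:
                state['active'] -= 1
        finally:
            slots.release()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state['peak']

print("peak concurrency:", run_limited())
```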
Condition()
Methods
Condition(lock)
acquire(blocking)
release()
wait(timeout)
notify()
notifyAll()
Using Condition()
Avoid timeout
Creates polling loop, so inefficient
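A typical Condition() hand-off waits without a timeout and re-checks its predicate after every wakeup. A sketch, assuming Python 3; hand_off() and its pending and received lists are hypothetical names for this illustration.

```python
import threading

def hand_off(items):
    """Pass items from a producer thread to a consumer thread, one at a time."""
    cond = threading.Condition()
    pending = []
    received = []

    def consumer():
        for _ in range(len(items)):
            cond.acquire()
            try:
                while not pending:        # re-check after every wakeup
                    cond.wait()           # no timeout: sleep until notified
                received.append(pending.pop(0))
            finally:
                cond.release()

    def producer():
        for item in items:
            cond.acquire()
            try:
                pending.append(item)
                cond.notify()             # wake one waiting thread
            finally:
                cond.release()

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

print(hand_off(['body', 'wheels', 'paint']))
```

The while loop around wait() is what makes this safe if the producer runs first: the consumer only waits when there is really nothing pending.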
Event()
Methods
set()
clear()
isSet()
wait(timeout)
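An Event() is the simplest way for one thread to tell others "go ahead". A sketch, assuming Python 3, where the slide's isSet() is spelled is_set(); wait_for_signal() is a hypothetical helper.

```python
import threading

def wait_for_signal():
    """One thread waits on an Event; the caller sets it. Returns (log, flag)."""
    ready = threading.Event()
    log = []

    def waiter():
        ready.wait()                      # blocks until set() is called
        log.append('waiter saw event')

    t = threading.Thread(target=waiter)
    t.start()
    log.append('main sets event')
    ready.set()
    t.join()
    return log, ready.is_set()

print(wait_for_signal())
```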
TMTOWTDI
Perl:
There's More Than One Way To Do It
Python:
There should be one - and preferably only one - obvious way to do it
Producer/Consumer
Example: factory
One part of the factory produces part of a widget; another part of the factory consumes widget parts to make complete widgets. Trick is to keep it all in balance.
Factory 1
Body factory
Wheel factory
Assembly
Factory Objects 1
Body: body.list, body.rlock, body.event, assembly.event
Wheels: wheels.list, wheels.rlock, wheels.event, assembly.event
Assembly: body.list, body.rlock, body.event, wheels.list, wheels.rlock, wheels.event, assembly.rlock, assembly.event
Queue()
Simple!
Handles both data protection and synchronization
Queue() Objects
Methods
Queue(maxsize)
put(item,block)
get(block)
qsize()
empty()
full()
Using Queue()
Thread 1
out = self.doWork()
queue2.put(out)
...
...
...
self.input = queue1.get()
...
Thread 2
...
self.input = queue2.get()
out = self.doWork()
queue1.put(out)
...
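The two-queue exchange above can be run end to end. A sketch, assuming Python 3, where the Queue module is spelled queue; ping_pong() is a hypothetical helper, and doubling the value stands in for doWork().

```python
import queue
import threading

def ping_pong(values):
    """Send each value to a second thread and collect its replies."""
    queue1 = queue.Queue()                # replies: thread 2 -> thread 1
    queue2 = queue.Queue()                # requests: thread 1 -> thread 2

    def thread2():
        for _ in values:
            item = queue2.get()           # blocks until work arrives
            queue1.put(item * 2)          # stand-in for doWork()

    t = threading.Thread(target=thread2)
    t.start()
    results = []
    for value in values:
        queue2.put(value)
        results.append(queue1.get())      # blocks until the reply arrives
    t.join()
    return results

print(ping_pong([1, 2, 3]))
```

No explicit lock or event appears anywhere: get() and put() do both the data protection and the waiting.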
Factory 2
Body factory
Wheel factory
Assembly
Factory Objects 2
Body: body.queue
Wheels: wheels.queue
Assembly: body.queue, wheels.queue, assembly.rlock
Factory 3
Body factory
Wheel factory
Packager
Assembly
Factory Objects 3
Body: body.queue
Wheels: wheels.queue
Packager
while 1:
    body = self.body.queue.get()
    wheels = self.wheels.queue.get()
    self.assembly.queue.put( (body,wheels) )
Assembly: assembly.queue
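The Factory 3 layout can be sketched as three threads and three queues. This assumes Python 3 (module name queue); run_factory() and the 'body-N' / 'wheels-N' strings are hypothetical. Unlike the slide's endless while 1: loop, the packager here stops after a fixed count so the example terminates.

```python
import queue
import threading

def run_factory(count=3):
    """Build `count` (body, wheels) pairs through a packager thread."""
    body_queue = queue.Queue()
    wheels_queue = queue.Queue()
    assembly_queue = queue.Queue()

    def body_factory():
        for i in range(count):
            body_queue.put('body-%d' % i)

    def wheel_factory():
        for i in range(count):
            wheels_queue.put('wheels-%d' % i)

    def packager():
        for _ in range(count):
            body = body_queue.get()
            wheels = wheels_queue.get()
            assembly_queue.put((body, wheels))

    threads = [threading.Thread(target=f)
               for f in (body_factory, wheel_factory, packager)]
    for t in threads:
        t.start()
    # Assembly: consume the packaged pairs.
    cars = [assembly_queue.get() for _ in range(count)]
    for t in threads:
        t.join()
    return cars

print(run_factory())
```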
Recap Part 2
Part 3
Using Queues
spider (thread pool)
GUI (Tkinter) (background thread)
Spider w/Queue
ThreadPoolSpider.py
Two queues
Pass work to thread pool
Get links back from thread pool
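The two-queue thread pool idea can be sketched without any spider logic. This is not ThreadPoolSpider.py; it is a minimal illustration, assuming Python 3, with hypothetical names (run_pool, work_queue, result_queue), a None sentinel to shut workers down, and len(job) standing in for fetching a page and extracting its links.

```python
import queue
import threading

def run_pool(jobs, worker_count=3):
    """Push jobs through a fixed pool of workers; return (job, result) pairs."""
    work_queue = queue.Queue()            # main thread -> pool
    result_queue = queue.Queue()          # pool -> main thread

    def worker():
        while True:
            job = work_queue.get()
            if job is None:               # sentinel: no more work
                break
            result_queue.put((job, len(job)))   # stand-in for real work

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for w in workers:
        w.start()
    for job in jobs:
        work_queue.put(job)
    for _ in workers:
        work_queue.put(None)              # one sentinel per worker
    results = [result_queue.get() for _ in jobs]
    for w in workers:
        w.join()
    return results

print(sorted(run_pool(['a', 'bb', 'ccc'])))
```

Results come back in completion order, not submission order, which is why the usage line sorts them.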
Tkinter Intro
This space intentionally left blank
Widgets
Windows, buttons, checkboxes, text entry, listboxes
Events
Widget activation, keypress, mouse movement, mouse click, timers
Widgets
Geometry manager
Register callbacks
Events
Event loop
Trigger callbacks
Tkinter resources
Web
www.python.org/topics/tkinter/doc.html
Books
Python and Tkinter Programming, John E. Grayson
Fibonacci
Fibonacci.py
UI freezes during calc
Frequent screen updates slow calc
Threaded Fibonacci
FibThreaded.py
Tkinter needs to poll
Use after event
Single-element queue
Use in non-blocking mode to minimize updates
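The single-element queue idea can be shown without a GUI. This is not FibThreaded.py; it is one possible reading of the slide, assuming Python 3, with hypothetical helpers publish(), poll() and fib_worker(). The worker never blocks and keeps only the newest value in the queue; poll() is what a callback scheduled with a widget's after() method would call on each tick.

```python
import queue
import threading

def publish(latest, value):
    """Keep only the newest value in a one-slot queue, never blocking."""
    while True:
        try:
            latest.put(value, block=False)
            return
        except queue.Full:
            try:
                latest.get(block=False)   # discard the stale value
            except queue.Empty:
                pass                      # the reader took it first

def poll(latest):
    """Fetch a value if one is waiting; return None instead of blocking."""
    try:
        return latest.get(block=False)
    except queue.Empty:
        return None

def fib_worker(latest, count):
    """Compute Fibonacci numbers, publishing each one as it is produced."""
    a, b = 0, 1
    for _ in range(count):
        a, b = b, a + b
        publish(latest, a)

latest = queue.Queue(maxsize=1)
worker = threading.Thread(target=fib_worker, args=(latest, 20))
worker.start()
worker.join()
first = poll(latest)    # newest value the worker published
second = poll(latest)   # nothing new since the last poll
print(first, second)
```

The design choice here is to drop intermediate values rather than queue them up, so a slow display never falls behind the calculation.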
Compare Spider/Fib
Recap Part 3
Part 4: Miscellaneous
GIL and Shared Vars
Unsafe
Multiple operations against Python variables (e.g. checking the length of a list before appending) or any operation that involves a callback to a class (e.g. the __getattr__ hook)
Locks vs GIL
GIL example
Threads 2,3,5
mutex.acquire()
if myList:
    work = myList.pop()
mutex.release()
dis this
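"dis this" can be done with the standard dis module. A sketch, assuming Python 3; take_work() is a hypothetical wrapper around the check-and-pop from the slide, without the lock. The exact opcode names differ between Python versions, so none are quoted here. What matters is that the test and the pop are separate bytecodes, and another thread can run between them.

```python
import dis

myList = []

def take_work():
    # The unprotected version of the slide's snippet.
    if myList:
        return myList.pop()

# Collect the opcode names, then print the full disassembly.
opnames = [ins.opname for ins in dis.get_instructions(take_work)]
dis.dis(take_work)
print(len(opnames), "bytecodes")
```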
Performance Tip
python -O
Also set PYTHONOPTIMIZE
15% performance boost
Removes bytecodes (SET_LINENO)
Fewer context switches!
Also removes assert
import Editorial
How to import
from threading import Thread, Semaphore
or
import threading
Don't use
from threading import *
Stackless/Microthreads
More info:
http://www.tismer.com/research/stackless/
http://world.std.com/~wware/uthread.html
Killing Threads
Debugging Threads
gdb
Thread Scheduling
Handling Exceptions
try/finally
Use to make sure locks get released
try/except
Close down all threads in outer block
Be careful to pass SystemExit and KeyboardInterrupt
try/finally
try/except
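Both patterns fit in one short sketch, assuming Python 3; guarded(), Worker and fail() are hypothetical names. try/finally guarantees the lock is released even when the work raises, and catching Exception rather than everything lets SystemExit and KeyboardInterrupt pass through, since neither is a subclass of Exception.

```python
import threading

mutex = threading.Lock()

def guarded(action):
    """Run action() while holding the lock; always release it."""
    mutex.acquire()
    try:
        return action()
    finally:
        mutex.release()                   # runs even if action() raises

class Worker(threading.Thread):
    def __init__(self, action):
        threading.Thread.__init__(self)
        self.action = action
        self.error = None

    def run(self):
        try:
            guarded(self.action)
        except Exception as exc:          # SystemExit/KeyboardInterrupt pass through
            self.error = exc              # record it for the main thread

def fail():
    raise ValueError('boom')

worker = Worker(fail)
worker.start()
worker.join()
print("worker error:", repr(worker.error))
print("lock still held?", mutex.locked())
```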
Pop Quiz 1
How are threads and processes similar and different?
What is the GIL?
In what ways does the GIL make thread programming easier and harder?
How do you create a thread in Python?
What should not be shared between threads?
Pop Quiz 2
What are "brute force" threads?
Explain what each of the following is used for:
Lock()
RLock()
Semaphore()
Condition()
Event()
Queue.Queue()
Why are queues great?
Pop Quiz 3
How do you handle exceptions?