Rolex Pearlmaster Replica
  Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
This article is part of in the series
Published: Thursday 24th January 2013
Last Updated: Wednesday 29th December 2021

At a glance, the yield statement is used to define generators, replacing the return of a function to provide a result to its caller without destroying local variables. Unlike a function, where on each call it starts with new set of variables, a generator will resume the execution where it was left off.

About Python Generators

Since the yield keyword is only used with generators, it makes sense to recall the concept of generators first.

The idea of generators is to calculate a series of results one-by-one on demand (on the fly). In the simplest case, a generator can be used as a list, where each element is calculated lazily. Lets compare a list and a generator that do the same thing - return powers of two:



[python]
>>> # First, we define a list
>>> the_list = [2**x for x in range(5)]
>>>
>>> # Type check: yes, it's a list
>>> type(the_list)
<class 'list'>
>>>
>>> # Iterate over items and print them
>>> for element in the_list:
... print(element)
...
1
2
4
8
16
>>>
>>> # How about the length?
>>> len(the_list)
5
>>>
>>> # Ok, now a generator.
>>> # As easy as list comprehensions, but with '()' instead of '[]':
>>> the_generator = (x+x for x in range(3))
>>>
>>> # Type check: yes, it's a generator
>>> type(the_generator)
<class 'generator'>
>>>
>>> # Iterate over items and print them
>>> for element in the_generator:
... print(element)
...
0
2
4
>>>
>>> # Everything looks the same, but the length...
>>> len(the_generator)
Traceback (most recent call last):
File "", line 1, in
TypeError: object of type 'generator' has no len()
[/python]


[python]
>>> # First, we define a list
>>> the_list = [2**x for x in range(5)]
>>>
>>> # Type check: yes, it's a list
>>> type(the_list)
<type 'list'>
>>>
>>> # Iterate over items and print them
>>> for element in the_list:
... print(element)
...
1
2
4
8
16
>>>
>>> # How about the length?
>>> len(the_list)
5
>>>
>>> # Ok, now a generator.
>>> # As easy as list comprehensions, but with '()' instead of '[]':
>>> the_generator = (x+x for x in range(3))
>>>
>>> # Type check: yes, it's a generator
>>> type(the_generator)
<type 'generator'>
>>>
>>> # Iterate over items and print them
>>> for element in the_generator:
... print(element)
...
0
2
4
>>>
>>> # Everything looks the same, but the length...
>>> len(the_generator)
Traceback (most recent call last):
File "", line 1, in
TypeError: object of type 'generator' has no len()
[/python]

Iterating over the list and the generator looks completely the same. However, although the generator is iterable, it is not a collection, and thus has no length. Collections (lists, tuples, sets, etc) keep all values in memory and we can access them whenever needed. A generator calculates the values on the fly and forgets them, so it does not have any overview about the own result set.

Generators are especially useful for memory-intensive tasks, where there is no need to keep all of the elements of a memory-heavy list accessible at the same time. Calculating a series of values one-by-one can also be useful in situations where the complete result is never needed, yielding intermediate results to the caller until some requirement is satisfied and further processing stops.

Using the Python "yield" keyword

A good example is a search task, where typically there is no need to wait for all results to be found. Performing a file-system search, a user would be happier to receive results on-the-fly, rather the wait for a search engine to go through every single file and only afterwards return results. Are there any people who really navigate through all Google search results until the last page?

Since a search functionality cannot be created using list-comprehensions, we are going to define a generator using a function with the yield statement/keyword. The yield instruction should be put into a place where the generator returns an intermediate result to the caller and sleeps until the next invocation occurs. Let's define a generator that would search for some keyword in a huge text file line-by-line.

[python]
def search(keyword, filename):
print('generator started')
f = open(filename, 'r')
# Looping through the file line by line
for line in f:
if keyword in line:
# If keyword found, return it
yield line
f.close()
[/python]

Now, assuming that my "directory.txt" file contains a huge list of names and phone numbers, lets find someone with "Python" in the name:

[python]
>>> the_generator = search('Python', 'directory.txt')
>>> # Nothing happened
[/python]

When we call the search function, its body code does not run. The generator function will only return the generator object, acting as a constructor:



[python]
>>> type(search)
<class 'function'>
>>> type(the_generator)
<class 'generator'>
[/python]


[python]
>>> type(search)
<type 'function'>
>>> type(the_generator)
<type 'generator'>
[/python]

This is a bit tricky, since everything below def search(keyword, filename): is normally meant to execute after calling it, but not in the case of generators. In fact, there was even a long discussion, suggesting to use "gen", or other keywords to define a generator. However, Guido decided to stick with "def", and that's it. You can read the motivation on PEP-255.

To make the newly-created generator calculate something, we need to access it via the iterator protocol, i.e. call it's next method:



[python]
>>> print(next(the_generator))
generator started
Anton Pythonio 111-222-333
[/python]


[python]
>>> print(the_generator.next())
generator started
Anton Pythonio 111-222-333
[/python]

The debug string was printed, and we got the first search result without looking through the whole file. Now let's request the next match:



[python]
>>> print(next(the_generator))
generator started
Fritz Pythonmann 128-256-512
[/python]


[python]
>>> print(the_generator.next())
generator started
Fritz Pythonmann 128-256-512
[/python]

The generator resumed on the last yield keyword/statement and went through the loop until it hit the yield keyword/statement again. However, Fritz is still not the right guy. Next, please:



[python]
>>> print(next(the_generator))
generator started
Guido Pythonista 123-456-789
[/python]


[python]
>>> print(the_generator.next())
generator started
Guido Pythonista 123-456-789
[/python]

Finally, we found him. Now you could call him and say "thank you" for great generators in Python!

More generator details and examples

As you may noticed, the first time the function runs, it will go from the beginning until it reaches the yield keyword/statement, returning the first result to the caller. Then, each other call will resume the generator code where is was left of. If the generator function does not hit the yield keyword/statement anymore, it will raise a StopIteration exception (just like all iterable objects do when they are exhausted/finished).

To run the yield on subsequent calls, the generator can contain a loop or multiple yield statements:

[python]
def hold_client(name):
yield 'Hello, %s! You will be connected soon' % name
yield 'Dear %s, could you please wait a bit.' % name
yield 'Sorry %s, we will play a nice music for you!' % name
yield '%s, your call is extremely important to us!' % name
[/python]

It usually makes more sense to use a generator as a conveyor, chaining functions to work on some sequence efficiently. A good example is buffering: fetching data in large chunks and processing in small chunks:

[python]
def buffered_read():
while True:
buffer = fetch_big_chunk()
for small_chunk in buffer:
yield small_chunk
[/python]

This approach allows the processing function to abstract away from any buffering issues. It can just get the values one by one using the generator that will take care of buffering.

Even simple tasks can be more efficient using the idea of generators. In Python 2.X, a common range() function in Python is often substituted by xrange(), which yields values instead of creating the whole list at once:



[python]
>>> # "range" returns a list
>>> type(range(0, 3))
<class 'list'>
>>> # xrange does not exist in Python 3.x
[/python]


[python]
>>> # "range" returns a list
>>> type(range(0, 3))
<type 'list'>
>>> # xrange returns a generator-like object "xrange"
>>> type(xrange(0, 3))
<type 'xrange'>
>>>
>>> # It can be used in loops just like range
>>> for i in xrange(0, 3):
... print(i)
...
0
1
2
[/python]

And finally, a "classical" example of generators: calculate the first N given number of Fibonacci numbers:

[python]
def fibonacci(n):
curr = 1
prev = 0
counter = 0
while counter < n:
yield curr
prev, curr = curr, prev + curr
counter += 1
[/python]

Numbers are calculated until the counter reaches 'n'. This example is so popular because the Fibonacci sequence is infinite, making it problematic to fit in memory.

So far the most practical aspects of Python generators have been described. For more detailed info and an interesting discussion take a look at the Python Enhancement Proposal 255, which discusses the feature of the language in detail.

Happy Pythoning!

About The Author

Anton Caceres

Latest Articles


Tags

  • Parsing
  • tail
  • merge sort
  • Programming language
  • remove python
  • concatenate string
  • Code Editors
  • unittest
  • reset_index()
  • Train Test Split
  • Local Testing Server
  • Python Input
  • Studio
  • excel
  • sgd
  • deeplearning
  • pandas
  • class python
  • intersection
  • logic
  • pydub
  • git
  • Scrapping
  • priority queue
  • quick sort
  • web development
  • uninstall python
  • python string
  • code interface
  • PyUnit
  • round numbers
  • train_test_split()
  • Flask module
  • Software
  • FL
  • llm
  • data science
  • testing
  • pathlib
  • oop
  • gui
  • visualization
  • audio edit
  • requests
  • stack
  • min heap
  • Linked List
  • machine learning
  • scripts
  • compare string
  • time delay
  • PythonZip
  • pandas dataframes
  • arange() method
  • SQLAlchemy
  • Activator
  • Music
  • AI
  • ML
  • import
  • file
  • jinja
  • pysimplegui
  • notebook
  • decouple
  • queue
  • heapify
  • Singly Linked List
  • intro
  • python scripts
  • learning python
  • python bugs
  • ZipFunction
  • plus equals
  • np.linspace
  • SQLAlchemy advance
  • Download
  • No
  • nlp
  • machiine learning
  • dask
  • file management
  • jinja2
  • ui
  • tdqm
  • configuration
  • deque
  • heap
  • Data Structure
  • howto
  • dict
  • csv in python
  • logging in python
  • Python Counter
  • python subprocess
  • numpy module
  • Python code generators
  • KMS
  • Office
  • modules
  • web scraping
  • scalable
  • pipx
  • templates
  • python not
  • pytesseract
  • env
  • push
  • search
  • Node
  • python tutorial
  • dictionary
  • csv file python
  • python logging
  • Counter class
  • Python assert
  • linspace
  • numbers_list
  • Tool
  • Key
  • automation
  • website data
  • autoscale
  • packages
  • snusbase
  • boolean
  • ocr
  • pyside6
  • pop
  • binary search
  • Insert Node
  • Python tips
  • python dictionary
  • Python's Built-in CSV Library
  • logging APIs
  • Constructing Counters
  • Assertions
  • Matplotlib Plotting
  • any() Function
  • Activation
  • Patch
  • threading
  • scrapy
  • game analysis
  • dependencies
  • security
  • not operation
  • pdf
  • build gui
  • dequeue
  • linear search
  • Add Node
  • Python tools
  • function
  • python update
  • logging module
  • Concatenate Data Frames
  • python comments
  • matplotlib
  • Recursion Limit
  • License
  • Pirated
  • square root
  • website extract python
  • steamspy
  • processing
  • cybersecurity
  • variable
  • image processing
  • incrementing
  • Data structures
  • algorithm
  • Print Node
  • installation
  • python function
  • pandas installation
  • Zen of Python
  • concatenation
  • Echo Client
  • Pygame
  • NumPy Pad()
  • Unlock
  • Bypass
  • pytorch
  • zipp
  • steam
  • multiprocessing
  • type hinting
  • global
  • argh
  • c vs python
  • Python
  • stacks
  • Sort
  • algorithms
  • install python
  • Scopes
  • how to install pandas
  • Philosophy of Programming
  • concat() function
  • Socket State
  • % Operator
  • Python YAML
  • Crack
  • Reddit
  • lightning
  • zip files
  • python reduce
  • library
  • dynamic
  • local
  • command line
  • define function
  • Pickle
  • enqueue
  • ascending
  • remove a node
  • Django
  • function scope
  • Tuple in Python
  • pandas groupby
  • pyenv
  • socket programming
  • Python Modulo
  • Dictionary Update()
  • Hack
  • sdk
  • python automation
  • main
  • reduce
  • typing
  • ord
  • print
  • network
  • matplotlib inline
  • Pickling
  • datastructure
  • bubble sort
  • find a node
  • Flask
  • calling function
  • tuple
  • GroupBy method
  • Pythonbrew
  • Np.Arange()
  • Modulo Operator
  • Python Or Operator
  • Keygen
  • cloud
  • pyautogui
  • python main
  • reduce function
  • type hints
  • python ord
  • format
  • python socket
  • jupyter
  • Unpickling
  • array
  • sorting
  • reversal
  • Python salaries
  • list sort
  • Pip
  • .groupby()
  • pyenv global
  • NumPy arrays
  • Modulo
  • OpenCV
  • Torrent
  • data
  • int function
  • file conversion
  • calculus
  • python typing
  • encryption
  • strings
  • big o calculator
  • gamin
  • HTML
  • list
  • insertion sort
  • in place reversal
  • learn python
  • String
  • python packages
  • FastAPI
  • argparse
  • zeros() function
  • AWS Lambda
  • Scikit Learn
  • Free
  • classes
  • turtle
  • convert file
  • abs()
  • python do while
  • set operations
  • data visualization
  • efficient coding
  • data analysis
  • HTML Parser
  • circular queue
  • effiiciency
  • Learning
  • windows
  • reverse
  • Python IDE
  • python maps
  • dataframes
  • Num Py Zeros
  • Python Lists
  • Fprintf
  • Version
  • immutable
  • python turtle
  • pandoc
  • semantic kernel
  • do while
  • set
  • tabulate
  • optimize code
  • object oriented
  • HTML Extraction
  • head
  • selection sort
  • Programming
  • install python on windows
  • reverse string
  • python Code Editors
  • Pytest
  • pandas.reset_index
  • NumPy
  • Infinite Numbers in Python
  • Python Readlines()
  • Trial
  • youtube
  • interactive
  • deep
  • kernel
  • while loop
  • union
  • tutorials
  • audio
  • github
  • Python is a beautiful language.