Python Yield Keyword

May 3, 2021

Holy moly coroutines.

In my journey through Luciano Ramalho's Fluent Python, I ran into this incredible feature of Python which I've never experienced in any other programming language. To my (possibly naive) mind, I'm imagining them as Python's magic bullet for the issues that plague concurrency and make it really, really hard to get right.

No mutexes? No needing to worry about an operating system starving a thread? Control over context switching? Sweet! I can't wait to try it!

The first step in the journey to understanding coroutines is the yield keyword. This is another concept that I haven't run into in any other programming language that I've used. It gives a function (in a general sense) the ability to pause execution and send a value to the caller, with the understanding that it can be resumed.

I'd imagine the most frequent usage of yield is in iterators and generators. Here's a really simple example:

def counter(max_num):
    ''' Yields numbers from 0 up to (but not including) max_num '''
    for ii in range(max_num):
        yield ii

# Main function
if __name__ == '__main__':
    # Creates a list from the numbers yielded by counter
    count = [str(ii) for ii in counter(3)]
    print(f'Count is {count}')

# Output    
Count is ['0', '1', '2']

Let's walk line by line through what is happening here. Our walkthrough might not be the exact steps Python takes to execute this program, but it will help to illustrate a point. First, the program executes the right side of this line:

count = [str(ii) for ii in counter(3)]

It executes the first iteration of the for loop in the counter function by setting ii to 0 and continuing on to the yield keyword. Once it reaches it, it sends the expression after yield to the caller (the main function). The value of ii is then converted into a string and appended to the count list in the main function. Execution in counter is paused, but its internal state remains.

Since the for loop in counter(3) has not finished yet, the context is switched back to counter. The ii variable is now 1, which is then sent back to the main function to be appended to the count list. This same process is repeated for ii = 2.

After 2 is yielded to the main function, the for loop in counter finishes and counter has completed its execution. Execution is permanently switched back to the main function context, and the generator is exhausted: it cannot be resumed again. The main function continues on and runs to completion.

In reality, calling counter(3) creates a generator object, which maintains the internal state between iterations. This generator object can then be iterated through, converted to a list, etc. But this is just a Python implementation detail.
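To see this concretely, here's a quick sketch of my own (not from the book) that calls counter(3) and steps through the resulting generator by hand with the built-in next():

```python
def counter(max_num):
    ''' Yields numbers from 0 up to (but not including) max_num '''
    for ii in range(max_num):
        yield ii

gen = counter(3)           # no code in counter runs yet -- we just get a generator
print(type(gen).__name__)  # generator
print(next(gen))           # 0 -- runs until the first yield, then pauses
print(next(gen))           # 1 -- resumes right where it left off
print(next(gen))           # 2 -- a fourth next() would raise StopIteration
```

This makes the pause-and-resume behavior visible: each next() call switches context into counter until the next yield is reached.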

The same thing can be implemented with a user-defined class that defines the __iter__ method. The __iter__ method is very similar to the function defined above; in fact, its body can be exactly the same. Here's an example of a Counter class:

>>> class Counter:
...     def __init__(self, max_num):
...         self._max_num = max_num
...
...     def __iter__(self):
...         for ii in range(self._max_num):
...             yield ii
... 
>>> count = [ii for ii in Counter(3)]
>>> print(f'Count is {count}')
Count is [0, 1, 2]

As you can see, Counter can be used as a generator, just like the previous counter generator function.
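One difference worth sketching (my own observation, not from the example above): a generator object can only be consumed once, while a Counter instance can be iterated repeatedly, because each pass invokes __iter__ again and builds a fresh generator:

```python
def counter(max_num):
    ''' Generator function version '''
    for ii in range(max_num):
        yield ii

class Counter:
    ''' Class version with __iter__ '''
    def __init__(self, max_num):
        self._max_num = max_num

    def __iter__(self):
        for ii in range(self._max_num):
            yield ii

gen = counter(3)
print(list(gen))  # [0, 1, 2]
print(list(gen))  # [] -- the generator is exhausted after one pass

c = Counter(3)
print(list(c))    # [0, 1, 2]
print(list(c))    # [0, 1, 2] -- each pass builds a fresh generator
```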

Once a class or a function is made iterable with the yield keyword, the built-in next() function can be used by the caller to temporarily switch context to the function/method containing yield. Once yield is encountered in the iterable, the iterable passes a value back to the caller and the context switches again. Here's another really simple example reading a haiku from 'haiku.txt' on my local file system.

>>> class Reader:
...     def __init__(self, file_name):
...         self._file_name = file_name
...         self._file_handle = None
...     def __enter__(self):
...         self._file_handle = open(self._file_name, 'r')
...         return self
...     def __exit__(self, type, value, tb):
...         self._file_handle.close()
...     def __iter__(self):
...         for line in self._file_handle:
...             yield line
... 
>>> with Reader('haiku.txt') as reader:
...     counter = 0
...     while True:
...         counter += 1
...         contents = next(iter(reader))
...         print(f'Line {counter}: {contents}', end='')
... 
Line 1: An old silent pond
Line 2: A frog jumps into the pond--
Line 3: Splash! Silence again.

For those wondering, this poem is called The Old Pond, by Matsuo Bashō. I'm certainly not clever enough to come up with that.

For now, we can skip over the __enter__ and __exit__ methods in the Reader class. Those methods implement the with statement, which automatically opens and closes the 'haiku.txt' file. I'd like to cover context managers like with in another post. The important parts are the __iter__ method and the built-in next() function.

As you can see, the Reader class implements the __iter__ method, which loops through each line in self._file_handle (the haiku.txt file). Each time yield is encountered, context is switched back to the calling code (the REPL session above). The magic happens in this line:

contents = next(iter(reader))

Here, an iterator is retrieved from the Reader instance (by calling its __iter__ method). Each call to next() retrieves the next value yielded from the iterator (that is, the next value yielded from __iter__). Once the value is retrieved and control returns to the calling code, the retrieved line is printed out to the user.
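A subtlety in that line (my own note): iter(reader) is called on every pass through the loop, so each pass actually builds a brand-new generator from __iter__. The example still works because every generator reads from the same shared file handle, which remembers its position. The more conventional pattern is to build the iterator once and reuse it. Here's a self-contained sketch of that pattern, using a throwaway temporary file in place of 'haiku.txt' so it runs anywhere:

```python
import os
import tempfile

class Reader:
    def __init__(self, file_name):
        self._file_name = file_name
        self._file_handle = None

    def __enter__(self):
        self._file_handle = open(self._file_name, 'r')
        return self

    def __exit__(self, type, value, tb):
        self._file_handle.close()

    def __iter__(self):
        for line in self._file_handle:
            yield line

# Write a small stand-in for 'haiku.txt' so the sketch is self-contained
path = os.path.join(tempfile.gettempdir(), 'haiku_sketch.txt')
with open(path, 'w') as fh:
    fh.write('An old silent pond\n'
             'A frog jumps into the pond--\n'
             'Splash! Silence again.\n')

lines = []
with Reader(path) as reader:
    it = iter(reader)  # build the generator once, reuse it below
    for counter, contents in enumerate(it, start=1):
        lines.append(f'Line {counter}: {contents}')
        print(lines[-1], end='')
```

Using the for loop (or enumerate) also handles the end of iteration automatically, which sidesteps the exhaustion issue discussed further down.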

This context switching implemented by next() is the basis for how coroutines and asynchronous programming work in Python!

There's no requirement that next() sit in as tight a loop as it does in this example. In fact, it can be one step in a very complex calculation. That's what makes this so powerful. One example that comes to mind, because I see it everywhere (including Fluent Python), is working with very large files.
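As a tiny illustration (a toy of my own, not a real scheduler), next() can step two generators by hand, interleaving their work. This manual context switching is the seed of cooperative multitasking:

```python
def letters():
    for ch in 'abc':
        yield ch

def numbers():
    for n in range(3):
        yield n

# Advance each generator one step at a time, alternating between them
a, b = letters(), numbers()
pairs = [(next(a), next(b)) for _ in range(3)]
print(pairs)  # [('a', 0), ('b', 1), ('c', 2)]
```

Neither generator runs to completion on its own; the caller decides when each one gets to make progress.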

If 'haiku.txt' were not three lines, but millions of lines long, it would not be efficient (or even possible) to load the entire thing into memory and churn through it as in the example above. It may be necessary to load the file in bite-sized pieces. A good approach would be a class that loads the file piece by piece and can be iterated over by an algorithm that operates on each individual piece. In fact, the algorithm could look exactly the same as one that works on a three-line haiku. Only the class that loads the file (and provides the iterator) would have to know that it was working with a giant file.
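Here's a sketch of that idea (the name read_chunks and the chunk size are my own inventions): a generator that yields a file in fixed-size pieces, so the consuming algorithm never holds more than one piece in memory at a time:

```python
import os
import tempfile

def read_chunks(file_name, chunk_size=1024):
    ''' Yield the file's contents in fixed-size pieces '''
    with open(file_name, 'r') as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:  # empty string means end of file
                break
            yield chunk

# Demo on a throwaway 5000-character file
path = os.path.join(tempfile.gettempdir(), 'chunk_demo.txt')
with open(path, 'w') as fh:
    fh.write('x' * 5000)

# The consumer only ever sees one chunk at a time
total = sum(len(chunk) for chunk in read_chunks(path))
print(total)  # 5000
```

The consuming line at the bottom would look identical whether the file is 5 KB or 5 GB; only read_chunks knows the file is being streamed.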

That's some good object oriented programming.

There are still some shortcomings and implementation details that I left out here. I will cover them in future blog posts. In the meantime, I'd like to cover one more piece that anyone who was trying out these examples may have uncovered.

When an iterator reaches the end of its iterable range, it notifies the caller with a StopIteration exception. This is a valid, expected condition and should be handled by the calling code. In simple iterators, this can be as simple as catching the exception and breaking out of the loop. More complex examples will be handled in future posts.

Expanding the previous example to handle the StopIteration exception:

>>> with Reader('haiku.txt') as reader:
...     counter = 0
...     while True:
...         counter += 1
...         try:
...             contents = next(iter(reader))
...         except StopIteration:
...             break
...         print(f'Line {counter}: {contents}', end='')
... 
 
Line 1: An old silent pond
Line 2: A frog jumps into the pond--
Line 3: Splash! Silence again.

This exception will play an interesting role in the upcoming discussion on coroutines.