Iterators & Generators in Python
Suppose I’m in a technical interview and I’m asked to write a Python program that prints the squares of the first ten natural numbers, without using a while loop or the range() function.
One simple way to do this is to store the first ten numbers in a list, loop over them with a for loop, and print the square of each number to the console. See the following code snippet.
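A minimal sketch of this list-based approach; the variable names are my own choice for illustration:

```python
# Store the first ten numbers explicitly (no range(), no while loop).
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Visit each stored number and print its square.
for number in numbers:
    print(number ** 2)
```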
The problem with this approach is that all 10 numbers are held in memory at once: to loop through them, I first have to store them in a data structure, which in this case is a list.
That’s not very efficient. With 10 numbers it’s fine, but imagine looping through a hundred thousand, a million, or a billion numbers. It makes no sense to keep the entire sequence in memory if all I’m going to do is process the numbers one at a time and I never need the other values while handling the current one. Doing so only wastes memory.
That is why we use something like the range() function. Let’s modify our code and look at the output. See the following code snippet.
I get exactly the same results by looping over range(1, 11). The difference is that with range() I don’t need to store all of the numbers 1 through 10 in a data structure.
If I compare the size of the list of numbers from 1 to 10 with the size of the range object, the list should be larger, because the range object doesn’t need to store all of these numbers.
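The same task with range() might look like this (a sketch; the loop variable name is illustrative):

```python
# range(1, 11) produces 1 through 10; the stop value 11 is excluded.
for number in range(1, 11):
    print(number ** 2)
```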
See the following code snippet.
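One way to make this comparison is with sys.getsizeof(). The sketch below is illustrative; the exact byte counts depend on the Python version and platform, so none are shown. The million-element comparison is my addition to show how the gap grows.

```python
import sys

numbers_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
numbers_range = range(1, 11)

# getsizeof() reports the size of the container object itself, in bytes.
# For the list this does not even include the integer objects it points to.
list_size = sys.getsizeof(numbers_list)
range_size = sys.getsizeof(numbers_range)
print("list :", list_size, "bytes")
print("range:", range_size, "bytes")

# With more numbers the list keeps growing, while the range object stays small.
big_list_size = sys.getsizeof(list(range(1, 1_000_001)))
big_range_size = sys.getsizeof(range(1, 1_000_001))
print("big list :", big_list_size, "bytes")
print("big range:", big_range_size, "bytes")
```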
The reason for this difference, which grows with the length of the sequence, is that a range object never stores its numbers. It keeps only the start, stop, and step values and produces each number on demand through an iterator.
So now we can talk about what an iterator is. In simple terms, an iterator is an object that lets us loop through a sequence of numbers or other data one value at a time, without having to store all of the values. Strictly speaking, a range object is an iterable rather than an iterator: it hands out an iterator whenever we loop over it. It is still a great example of the idea, because it lets us loop through all of these numbers without storing them in a data structure.
Now let’s have a look at how the range() function works under the hood.
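A minimal sketch of such an inspection, assuming the range object is bound to the name x:

```python
x = range(1, 11)

print(x)        # range(1, 11)
print(type(x))  # <class 'range'>
print(dir(x))   # attribute and method names, including '__iter__'
```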
Notice that when we print x, we just get a string representation of the range object.
The second print statement shows that x is an object of class range. The final print statement lists all the attributes and methods that belong to x. We can see that there is an ‘__iter__()’ method in this list.
To get an iterator from the range object x, we call the __iter__() method on x.
Here we can see that x.__iter__() returns an iterator object; printing it shows the object’s type and its memory address.
To fetch a value from this iterator, we need to call another method named __next__().
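A short sketch of that call, binding the result to the name iter_obj:

```python
x = range(1, 11)

# Calling __iter__() directly is equivalent to the built-in iter(x).
iter_obj = x.__iter__()
print(iter_obj)  # e.g. <range_iterator object at 0x...>; the address varies
```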
Let’s see how we can do that. Refer to the following code snippet.
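A minimal sketch, rebuilding the iterator so the example stands on its own (first_value is an illustrative name):

```python
x = range(1, 11)
iter_obj = x.__iter__()

# __next__() hands back exactly one value per call.
first_value = iter_obj.__next__()
print(first_value)  # 1
```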
We can see that the value returned by the iterator is 1. Notice that iter_obj.__next__() did not return the entire list of numbers from 1 to 10. It returned only the value 1, because __next__() returns one value at a time.
Let us see what happens if we call the same statement multiple times.
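For example, calling it several times on a fresh iterator (a sketch; the last line uses the built-in next(), which calls __next__() for us):

```python
iter_obj = range(1, 11).__iter__()

print(iter_obj.__next__())  # 1
print(iter_obj.__next__())  # 2
print(iter_obj.__next__())  # 3
print(next(iter_obj))       # 4
```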
Here we can see that every call to __next__() returns the next single value in the range from 1 to 10.
In order to avoid typing the single statement again and again to get the next value, let’s put it in a loop. Refer to the code snippet below.
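One way to write that loop is sketched below. It is roughly what a for loop does for us behind the scenes.

```python
iter_obj = range(1, 11).__iter__()

while True:
    try:
        value = iter_obj.__next__()
    except StopIteration:
        # The iterator has no values left, so leave the loop.
        break
    print(value)
```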
We’ve used an infinite loop to go through all the values of the range. Notice that we catch the StopIteration exception: once the iterator runs out of values, __next__() raises StopIteration, and that is how we know when to break out of the infinite loop and stop iterating.
Now we know how iterators work in the background, and that two special methods make up an iterator: __iter__() and __next__(). The built-in functions iter() and next() call these methods for us.
Instead of writing and calling these two special methods ourselves, Python provides generators, a much more elegant way of creating an iterator. The generator syntax is quite simple: you write a function and use the ‘yield’ keyword instead of the ‘return’ keyword to hand values back to the caller.
Let’s see the difference between a function that uses the ‘return’ keyword and one that uses the ‘yield’ keyword.
We’ll create a function whose purpose is to give us all the natural numbers less than the number passed as a parameter.
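To make the two methods concrete, here is a sketch of a hand-written iterator. The class name CountUpTo and its attributes are hypothetical, chosen only for illustration.

```python
class CountUpTo:
    """Iterator over 1, 2, ..., limit - 1 (hypothetical example class)."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 1

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # Signal the end of the sequence once the limit is reached.
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


# A for loop calls __iter__() once, then __next__() until StopIteration.
for number in CountUpTo(11):
    print(number ** 2)
```

Only the current position is kept in memory, no matter how large the limit is.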
Case 1: Function that uses the ‘return’ keyword
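A sketch of what such a function might look like. The function name natural_numbers is an assumption; the variables i and n follow the walkthrough further down.

```python
def natural_numbers(n):
    i = 1
    while i < n:
        return i  # the function exits here on the very first pass
        i += 1    # never reached


x = natural_numbers(11)
print(x)  # 1
```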
In this case, because we used the return keyword, the value of x is 1: the function exited on the very first pass through the loop, so we got only one value from it.
Case 2: Function that uses the ‘yield’ keyword
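The same function with yield in place of return, again as a sketch with the assumed name natural_numbers:

```python
def natural_numbers(n):
    i = 1
    while i < n:
        yield i  # hand i to the caller and pause here
        i += 1


x = natural_numbers(11)
print(x)  # e.g. <generator object natural_numbers at 0x...>

# Looping over the generator fetches the values 1 through 10, one at a time.
for value in x:
    print(value)
```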
In this case, because we used the yield keyword, x is a generator object, which means we can loop through it and fetch the values one by one.
Calling the function with 11 as a parameter does not run its body yet; it only creates the generator object. When the loop asks for the first value, the body starts: it initializes a local variable i to 1 and checks whether i is less than the parameter n. If so, it reaches the yield keyword. At that point the interpreter PAUSES the function, saves its state, and hands the value of i to the caller, which prints it to the console. When the caller asks for the next value, the function resumes right after the yield, increments i by 1, and checks the condition i < n again. As long as i is less than n, the function keeps yielding the value of i to the caller; once the condition fails, the function ends and the iteration stops.
Here we can clearly see how beneficial generators are when we only care about a single value at a time.
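The pause-and-resume behaviour can be made visible by adding print statements around the yield and driving the generator by hand with next(). This is an illustrative sketch; the function name and messages are my own.

```python
def natural_numbers(n):
    i = 1
    while i < n:
        print(f"about to yield {i}")
        yield i
        print(f"resumed after yielding {i}")
        i += 1


gen = natural_numbers(3)  # nothing is printed yet: the body has not started

first = next(gen)   # prints "about to yield 1", then pauses at the yield
second = next(gen)  # prints "resumed after yielding 1", "about to yield 2"

try:
    next(gen)       # prints "resumed after yielding 2"; the loop then ends
    finished = False
except StopIteration:
    finished = True  # the generator has no more values

print(first, second, finished)  # 1 2 True
```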
Now let’s talk about a practical example of the use of generators.
Let’s say we have a file and we are tasked with searching for a particular word in it. Instead of reading the entire file into main memory at once and then searching for the word, we can read one line at a time and check whether the word is present in that line. If the word is found, we can stop reading and report it; if we reach the end of the file without a match, we print to the console that the word was not found.
Let’s have a look at the code and the text file.
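A hedged sketch of how this could be written. The word_finder() name and the two messages come from the description below; the sample file contents, the helper contains_word(), and the use of a temporary file are assumptions made so the example runs on its own.

```python
import os
import tempfile


def word_finder(file_path):
    """Yield the rows of the file one at a time instead of loading it whole."""
    with open(file_path, encoding="utf-8") as file:
        for row in file:
            yield row


def contains_word(file_path, word):
    """Hypothetical helper: stop reading as soon as a row contains the word."""
    for row in word_finder(file_path):
        if word in row:  # substring test on the current row only
            return True
    return False


# Build a small sample file (assumed contents, for illustration only).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as sample:
    sample.write("hello world\nmy name is python\ngoodbye\n")
    sample_path = sample.name

found = contains_word(sample_path, "name")
print("Word Found In The File!" if found else "Word Not Found In File!")

os.remove(sample_path)  # clean up the sample file
```

Because the rows are yielded lazily, the search stops at the second row here and the third is never read.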
The above code snippet illustrates a function named ‘word_finder()’ that uses the ‘yield’ keyword to read a single row of the file and hand it to the caller, which checks whether the word ‘name’ is present in the row. If the word is found, the program prints “Word Found In The File!”; otherwise it prints “Word Not Found In File!”
The source code for this article can be found here.