Generators: Infinite Streams
Breaking the limits of RAM with lazy evaluation. Understanding the Iterator Protocol vs yield.
1. The Big Idea (ELI5)
👶 Explain Like I'm 10: The Buffet vs The Sushi Chef
Imagine you want to eat sushi.

The List (The Buffet): The restaurant cooks *all* 1,000 rolls at once and puts them on a giant table. This takes up a lot of space (memory). If you only eat 3 rolls, the rest is wasted. It also takes a long time to prepare before you can start eating.

The Generator (The Sushi Chef): The chef waits for you to ask. You say "One roll please!" (`next()`). He prepares just one. You eat it. He waits, perfectly still. You ask again. He makes the next one.
Key Benefit: You can theoretically eat infinite sushi, one by one, without ever needing a table the size of the moon.
2. Iterator Protocol vs Generators
To understand Generators, we must first understand the Iterator Protocol. In Python, any object that implements `__iter__` and `__next__` is an Iterator.
# The Hard Way (Class-based Iterator)
class CountIterator:
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.limit:
            raise StopIteration
        self.count += 1
        return self.count
# The Easy Way (Generator)
def count_generator(limit):
    count = 0
    while count < limit:
        count += 1
        yield count

Both do the same thing, but the generator version is magically concise. When you call a function that contains `yield`, Python automatically creates an object that implements `__iter__` and `__next__` for you. It also saves the stack frame (local variables, instruction pointer) so it can resume exactly where it left off.
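A quick sanity check that the two really do behave identically:

# Both produce the sequence 1..limit
print(list(CountIterator(3)))    # [1, 2, 3]
print(list(count_generator(3)))  # [1, 2, 3]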
3. Memory Profiling: List vs Generator
The difference in memory usage can be astronomical. Let's prove it with `sys.getsizeof`.
import sys
# 1. List Comprehension (Eager)
# Creates a list of 1 million integers immediately
my_list = [i for i in range(1_000_000)]
print(f"List Size: {sys.getsizeof(my_list) / 1024 / 1024:.2f} MB")
# Output: ~8.30 MB
# 2. Generator Expression (Lazy)
# Creates a generator object, calculating nothing yet
my_gen = (i for i in range(1_000_000))
print(f"Gen Size: {sys.getsizeof(my_gen)} Bytes")
# Output: 104 Bytes (!)
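The generator still produces all one million values; it just never holds them at once. One thing worth knowing: a generator is single-use, so consuming it exhausts it.

# Consuming the generator pulls values one at a time, never storing them all
print(sum(my_gen))  # 499999500000
# my_gen is now exhausted; a second pass yields nothing
print(sum(my_gen))  # 0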
Note: 8 MB is manageable, but imagine processing 100 GB of log files. A list would crash your RAM; a generator still uses about 104 bytes.

4. Advanced Control: Send, Throw, Close
Generators are not just one-way data pumps. You can interact with them while they are running! This turns them into Coroutines.
def chatty_generator():
    while True:
        # yield acts as both Output and Input!
        received = yield "Ready"
        if received == "Stop":
            print("🛑 Stopping...")
            break
        print(f"Processing: {received}")
gen = chatty_generator()
print(next(gen))  # Start it. Output: "Ready"

# We can send values INTO the generator
gen.send("Hello")  # Output: Processing: Hello
gen.send("World")  # Output: Processing: World

# We can tell it to stop
try:
    gen.send("Stop")
except StopIteration:
    pass  # Expected when the generator ends

Caveat: You must always call `next(gen)` (or `gen.send(None)`) once before sending values, to "prime" the generator up to the first `yield` statement.
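The section title also promises `throw` and `close`. Here is a minimal sketch of both (the generator name is illustrative): `throw()` raises an exception inside the generator at the paused `yield`, and `close()` raises `GeneratorExit` there.

def resilient_generator():
    while True:
        try:
            yield "tick"
        except ValueError:
            print("Recovered from a ValueError")
        except GeneratorExit:
            print("Cleaning up...")
            raise  # Re-raise so the generator actually closes

g = resilient_generator()
next(g)              # Prime it up to the first yield
g.throw(ValueError)  # Raised at the paused yield; caught, loop continues
g.close()            # Raises GeneratorExit inside; prints "Cleaning up..."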
5. Composing Generators: `yield from`
What if a generator wants to delegate part of its work to another generator? Before Python 3.3, you had to write a loop. Now, we use `yield from`.
def sub_generator():
    yield 1
    yield 2

def main_generator():
    yield "Start"
    yield from sub_generator()  # Delegates seamlessly
    yield "End"

for item in main_generator():
    print(item)

# Output:
# Start
# 1
# 2
# End

`yield from` is more than just a loop shortcut. It creates a transparent bidirectional channel: if you `send()` a value to `main_generator`, it is passed directly to `sub_generator`, and exceptions bubble up correctly. This mechanism is key to how AsyncIO frameworks are written.
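A small sketch of that channel (the `accumulator` and `wrapper` names are made up for illustration): values sent to the outer generator land directly at the inner generator's `yield`.

def accumulator():
    total = 0
    while True:
        # Values sent from outside arrive here
        value = yield total
        total += value

def wrapper():
    # send(), throw(), and close() all pass straight through
    yield from accumulator()

w = wrapper()
next(w)            # Prime; the inner generator yields 0
print(w.send(10))  # 10 — travels through wrapper into accumulator
print(w.send(5))   # 15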
6. Real-World Architecture: Pipelines
The "Unix Pipe" philosophy works beautifully with generators. You can build complex data processing pipelines that process gigabytes of data line-by-line.
def read_file(filename):
    with open(filename) as f:
        for line in f:
            yield line

def filter_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

def extract_ip(lines):
    for line in lines:
        yield line.split(" ")[0]  # Assume the IP is the first word

# Assemble the pipeline
# No data flows yet! We are just linking pipes.
logs = read_file("server.log")
errors = filter_errors(logs)
ips = extract_ip(errors)

# Pull the trigger
for ip in ips:
    print(f"Alert: {ip}")

7. Infinite Sequences
Since generators generate on demand, they can model infinite series (like Fibonacci, Prime numbers, or Sensor streams) that would be impossible to store in a list.
def fibonacci():
    a, b = 0, 1
    while True:  # An infinite loop is safe here!
        yield a
        a, b = b, a + b

f = fibonacci()

# We can take just what we need
top_5 = [next(f) for _ in range(5)]
print(top_5)  # [0, 1, 1, 2, 3]
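For slicing infinite streams, the standard library's `itertools.islice` does the same job without manual `next()` calls:

import itertools

# Take the first five values of an infinite generator
print(list(itertools.islice(fibonacci(), 5)))  # [0, 1, 1, 2, 3]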