Sets: Hashing & Fast Lookups
A Set is not just a "list without duplicates". It's a high-performance mathematical tool backed by Hash Tables. It trades memory for speed, offering O(1) lookups that can be 1000x faster than Lists.
If you have a badge ID 12345 and you want to check if it's in a list of 1 million IDs, Python has to scan them one by one. That's O(N) (Linear Time).
If you use a Set, Python hashes 12345 → 0xAF32... and jumps directly to that memory slot. That's O(1) (Constant Time). Size doesn't matter.
What You'll Learn
- The Magic: How Hash Tables allow instantaneous lookups.
- Venn Diagrams: Using
|,&,-, and^for logic. - Hashability: Why you can't put a list inside a set.
- Frozensets: The immutable set (usable as a dictionary key).
The Power of Mathematics (Set Ops)
Sets aren't just for storage; they are for logic. You can perform complex filtering using mathematical operators.
Imagine strict permissions: "Users who are Admins BUT NOT Editors". In loops, this is 10 lines of code. With sets, it's 1 symbol based on Venn Diagrams.
engineers = {"Alice", "Bob", "Charlie", "Dave"}
managers = {"Charlie", "Dave", "Eve"}
# 1. Union (|): Everyone involved
print(engineers | managers)
# {'Alice', 'Bob', 'Charlie', 'Dave', 'Eve'}
# 2. Intersection (&): The Engineers who are ALSO Managers
print(engineers & managers)
# {'Charlie', 'Dave'}
# 3. Difference (-): Engineers who are NOT Managers
print(engineers - managers)
# {'Alice', 'Bob'}
# 4. Symmetric Difference (^): People who are ONE role, but NOT both
print(engineers ^ managers)
# {'Alice', 'Bob', 'Eve'}The Cost of Speed: Hashing
Sets are fast because they rely on hash(). This introduces a strict rule: Set elements must be immutable (hashable).
You can put an integer, string, or tuple into a set. You cannot put a list or dictionary into a set, because if the list changed, its hash would break the lookup logic.
# ✅ Valid: Tuple is immutable
s = {(1, 2), (3, 4)}
# ⌠Invalid: List is mutable
# TypeError: unhashable type: 'list'
s = {[1, 2], [3, 4]}dict keys.Performance Case Study
Why bother converting a list to a set? Let's check the speed difference.
# Looking for a needle in a haystack of size 10,000,000
# LIST Lookup (O(N))
# Python must check 10 million items one by one.
# Time: ~1.5 seconds
# SET Lookup (O(1))
# Python calculates hash("needle") and jumps there.
# Time: ~0.00001 seconds
# Conclusion: If you are doing frequent lookups (ex: "if x in collection"), ALWAYS uses a Set.Summary & Best Practices
✅ Do
- Use sets for large "allow-lists" or "block-lists".
- Use
set(list)to quickly remove duplicates (deduplication). - Use
frozenset()if you need a set as a dict key.
⌠Don't
- Don't assume sets preserve insertion order.
- Don't try to modify a set while iterating over it (RuntimeError).