Python Mastery: Complete Beginner to Professional
HomeInsightsCoursesPythonSets: Hashing & Fast Lookups
Hash Tables

Sets: Hashing & Fast Lookups

A Set is not just a "list without duplicates". It's a high-performance mathematical tool backed by Hash Tables. It trades memory for speed, offering O(1) lookups that can be 1000x faster than Lists.

If you have a badge ID 12345 and you want to check if it's in a list of 1 million IDs, Python has to scan them one by one. That's O(N) (Linear Time).

If you use a Set, Python hashes 12345 → 0xAF32... and jumps directly to that memory slot. That's O(1) (Constant Time). Size doesn't matter.

What You'll Learn

  • The Magic: How Hash Tables allow instantaneous lookups.
  • Venn Diagrams: Using |, &, -, and ^ for logic.
  • Hashability: Why you can't put a list inside a set.
  • Frozensets: The immutable set (usable as a dictionary key).

The Power of Mathematics (Set Ops)

Sets aren't just for storage; they are for logic. You can perform complex filtering using mathematical operators.

Imagine strict permissions: "Users who are Admins BUT NOT Editors". In loops, this is 10 lines of code. With sets, it's 1 symbol based on Venn Diagrams.

Binary Logic Refactored
PYTHON
engineers = {"Alice", "Bob", "Charlie", "Dave"}
managers = {"Charlie", "Dave", "Eve"}

# 1. Union (|): Everyone involved
print(engineers | managers)
# {'Alice', 'Bob', 'Charlie', 'Dave', 'Eve'}

# 2. Intersection (&): The Engineers who are ALSO Managers
print(engineers & managers)
# {'Charlie', 'Dave'}

# 3. Difference (-): Engineers who are NOT Managers
print(engineers - managers)
# {'Alice', 'Bob'}

# 4. Symmetric Difference (^): People who are ONE role, but NOT both
print(engineers ^ managers)
# {'Alice', 'Bob', 'Eve'}

The Cost of Speed: Hashing

Sets are fast because they rely on hash(). This introduces a strict rule: Set elements must be immutable (hashable).

You can put an integer, string, or tuple into a set. You cannot put a list or dictionary into a set, because if the list changed, its hash would break the lookup logic.

PYTHON
# ✅ Valid: Tuple is immutable
s = {(1, 2), (3, 4)}

# ❌ Invalid: List is mutable
# TypeError: unhashable type: 'list'
s = {[1, 2], [3, 4]}
⚠️
No Order Guarantee: Before Python 3.7, sets were completely random. Now they are... "implementation dependent". Never rely on the order of a set. If you need order + uniqueness, use a dict keys.

Performance Case Study

Why bother converting a list to a set? Let's check the speed difference.

PYTHON
# Looking for a needle in a haystack of size 10,000,000

# LIST Lookup (O(N))
# Python must check 10 million items one by one.
# Time: ~1.5 seconds

# SET Lookup (O(1))
# Python calculates hash("needle") and jumps there.
# Time: ~0.00001 seconds

# Conclusion: If you are doing frequent lookups (ex: "if x in collection"), ALWAYS uses a Set.

Summary & Best Practices

✅ Do

  • Use sets for large "allow-lists" or "block-lists".
  • Use set(list) to quickly remove duplicates (deduplication).
  • Use frozenset() if you need a set as a dict key.

❌ Don't

  • Don't assume sets preserve insertion order.
  • Don't try to modify a set while iterating over it (RuntimeError).

Next Steps

We've seen Lists (Arrays) and Sets (Hash Tables). Now let's meet the King of Python Data Structures: the Key-Value store that powers the entire language.