Data Transformation

Type Casting & Conversions

Master the art of transforming data. From simple numeric conversions to parsing complex strings, binary encodings, and restructuring collections, understand how Python's "Universal Adaptor" mechanism works.

In a perfect world, data would always arrive in the format we need. Integers would be integers, and dates would be date objects. In the real world, however, we get numbers as strings from text files, lists that contain duplicates, binary streams from networks, and floating-point noise when we need precise currency.

Casting is the process of converting a value from one data type to another. Think of it as a Universal Adaptor for your data. Just as a travel adaptor allows a US plug to fit into a UK socket, casting allows a String to plug into a Math operation, or a List to plug into a Set operation.

Because Python is Strongly Typed, it refuses to guess what you mean. It won't automatically add text to a number (like JavaScript might, resulting in "10" + 5 = "105"). It forces you to be explicit, which prevents a massive category of silent bugs.

What You'll Learn

The Big Three: Master int(), float(), and str().
Advanced Integers: Casting binary, hex, and octal strings.
Text vs Bytes: The crucial encode() and decode() dance.
Developer Vision: The difference between str() and repr().
Custom Casting: Making your own classes castable using Dunder methods.
Collection Molding: Using casting to remove duplicates or freeze lists.

Implicit vs. Explicit Conversion

Python handles conversions in two ways. Understanding the difference is key to writing clean code.

1. Implicit Conversion (Coercion)

Python performs this automatically only when the conversion is unambiguous and, for ordinary values, loses no information. The most common example is mixing integers and floats.

PYTHON

# Safe Automatic Upgrade
x_int = 10      # Integer
y_float = 2.5   # Float

result = x_int + y_float
print(result)          # 12.5
print(type(result))    # <class 'float'>

# Python "promotes" the integer 10 to 10.0 to match the float.
# No precision is lost, so Python allows it.
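One caveat may be worth a quick sketch: promotion to float is only lossless while the integer fits within a float's 53 bits of precision. The snippet below is an illustration rather than part of the original example, and the variable names are made up for the demonstration.

```python
# Illustration only: int -> float promotion can round very large integers.
small = 10
big = 2**53 + 1            # 9007199254740993, just past float's exact-integer range

promoted_small = small + 0.0
promoted_big = big + 0.0   # implicit promotion, same rule as 10 + 2.5

print(promoted_small == small)   # True: 10 is represented exactly
print(promoted_big == big)       # False: the float was rounded
print(int(promoted_big))         # 9007199254740992

# bool is a subclass of int, so it is promoted implicitly too.
print(True + 1)                  # 2
```

For everyday values this never shows up, but it matters for large identifiers or counters.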

2. Explicit Conversion (Casting)

Python never converts implicitly when the intent is ambiguous or the conversion could fail. This is why "5" + 5 raises a TypeError: do you want "55" (string concatenation) or 10 (integer addition)? Python refuses to guess. You must call a constructor function to tell it exactly what to do.
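A minimal sketch of that choice, using made-up variable names, might look like this:

```python
# "5" + 5 is ambiguous, so Python raises TypeError instead of guessing.
text_five = "5"
number_five = 5

try:
    text_five + number_five
except TypeError as error:
    print(f"TypeError: {error}")

# State the intent explicitly with a constructor.
as_sum = int(text_five) + number_five      # arithmetic
as_text = text_five + str(number_five)     # concatenation

print(as_sum)    # 10
print(as_text)   # 55
```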

The Big Three Constructors

Three constructor functions cover most everyday data transformation needs.

Constructor	Role	Can Handle	Fails On
`int(x)`	Makes Integers	Floats (truncates), Strings (whole numbers)	Strings with decimals ("4.5"), Text ("Hi")
`float(x)`	Makes Floats	Integers, Strings (decimals & scientific)	Text ("Hi")
`str(x)`	Makes Strings	Literally Anything (Everything has a string rep)	Nothing

1. The Integer Constructor: int()

int() is strict. It truncates floats (dropping the fractional part, which rounds toward zero) and demands whole-number strings. However, it is smarter than it looks.

Parsing Bases (Binary/Hex)

Did you know int() can act as a base converter? By default it assumes Base-10, but you can decode Binary, Octal, or Hexadecimal strings.

PYTHON

# Standard Base-10
print(int("100"))      # 100

# Truncation behavior
print(int(3.99))       # 3 (Truncates toward zero, no rounding)

# Underscores in strings (Python 3.6+)
print(int("1_000_000")) # 1000000 (Readable!)

# Base Conversion
print(int("101", 2))   # 5  (Binary '101')
print(int("FF", 16))   # 255 (Hex 'FF')
print(int("77", 8))    # 63 (Octal '77')

# ❌ The Common Trap
# print(int("3.14"))   # ValueError! It doesn't know how to handle the dot.

# ✅ The Fix
print(int(float("3.14"))) # 3 (String -> Float -> Int)
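A few more edge cases round out the picture. The sketch below contrasts truncation with flooring for negative numbers and shows prefix handling. It is an illustrative addition, not part of the original listing.

```python
import math

# int() truncates toward zero; math.floor() always rounds down.
print(int(-3.99))          # -3
print(math.floor(-3.99))   # -4

# Leading and trailing whitespace is ignored when parsing strings.
print(int("  42\n"))       # 42

# Prefixes such as 0x are accepted when they match the base...
print(int("0xFF", 16))     # 255

# ...and base 0 tells int() to infer the base from the prefix.
print(int("0b101", 0))     # 5
print(int("0o77", 0))      # 63

# The reverse direction uses bin(), oct(), and hex().
print(bin(5), oct(63), hex(255))   # 0b101 0o77 0xff
```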

2. The Float Constructor: float()

float() is more accommodating. It can handle scientific notation and special values like Infinity.

PYTHON

print(float(5))       # 5.0
print(float("3.14"))  # 3.14
print(float("1e3"))   # 1000.0 (scientific notation for 10^3)

# Special Mathematical Values
infinity = float("inf")
nan = float("nan")    # Not a Number
print(infinity > 10**100) # True
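The special values behave in ways that can surprise, so a short sketch may help. The variable names here are illustrative.

```python
import math

nan = float("nan")
neg_inf = float("-inf")

# NaN never compares equal to anything, including itself.
print(nan == nan)            # False
print(math.isnan(nan))       # True

# Infinity comes in both signs and is detected with math.isinf().
print(math.isinf(neg_inf))   # True
print(neg_inf < -10**100)    # True

# Converting these values to int fails, so check before casting.
for value in (nan, neg_inf):
    try:
        int(value)
    except (ValueError, OverflowError) as error:
        print(type(error).__name__)   # ValueError, then OverflowError
```

Because nan == nan is False, math.isnan() is the reliable way to test for it.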

3. The String Constructor: str() vs repr()

str() is the friendliest function. It can turn any object into a string representation, usually for printing or logging. However, Python has a secret second way to convert to string: repr().

str(obj): Readable, user-friendly text. (e.g., used by print()).
repr(obj): Unambiguous, developer-focused text. Often code that could recreate the object.

PYTHON

import datetime

now = datetime.datetime.now()

# User Friendly
print(str(now))   # 2023-10-25 14:30:00.123456

# Developer Friendly (Debugging)
print(repr(now))  # datetime.datetime(2023, 10, 25, 14, 30, 0, 123456)

s = "Hello\nWorld"
print(str(s))     
# Hello
# World

print(repr(s))    # 'Hello\nWorld' (Shows the escape character)
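To see both views on a class of your own, here is a minimal sketch. Point is a hypothetical class invented for this illustration.

```python
class Point:
    """Hypothetical class used only to illustrate __str__ versus __repr__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # Friendly form, used by str() and print()
        return f"({self.x}, {self.y})"

    def __repr__(self):
        # Unambiguous form, used by repr() and the interactive prompt
        return f"Point(x={self.x}, y={self.y})"


p = Point(1, 2)
print(str(p))     # (1, 2)
print(repr(p))    # Point(x=1, y=2)

# Containers show the repr() of their items, even inside str().
print(str([p]))   # [Point(x=1, y=2)]

# In f-strings, !r asks for repr() instead of str().
print(f"{p} vs {p!r}")   # (1, 2) vs Point(x=1, y=2)
```

If a class defines only __repr__, str() falls back to it, which is why __repr__ is usually the first of the two to implement.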

Deep Dive: Bytes and Encodings

In modern web development, the most critical casting you will do is often not between numbers, but between Text (Unicode) and Data (Bytes).

In Python 3, str is abstract Unicode text. Networks and files store bytes, so text must be encoded into bytes before it is sent or saved (opening a file in text mode does this for you behind the scenes). Conversely, reading from a socket or a binary file gives you bytes that must be decoded back into text.

The Encode/Decode Dance
PYTHON
# 1. Text (Abstract Idea)
text = "Café ☕"
print(type(text)) # <class 'str'>

# ❌ You cannot write 'text' to a binary file directly.
# You must CAST it to bytes via encoding.

# 2. Encoding (Serialization)
# We cast string -> bytes using the UTF-8 standard
data = text.encode('utf-8')
print(type(data)) # <class 'bytes'>
print(data)       # b'Caf\xc3\xa9 \xe2\x98\x95' (The raw bytes)

# 3. Decoding (Deserialization)
# We cast bytes -> string
decoded_text = data.decode('utf-8')
print(decoded_text) # Café ☕
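What happens when the codec cannot represent a character, or when the wrong codec is used to decode? The following sketch explores both cases. It spells the text with escape sequences so the example does not depend on the source file's own encoding.

```python
text = "Caf\u00e9 \u2615"   # "Café ☕" written with escapes
data = text.encode("utf-8")

# Characters and bytes are counted differently: é takes 2 bytes, ☕ takes 3.
print(len(text))   # 6
print(len(data))   # 9

# Encoding to a codec that lacks a character raises UnicodeEncodeError.
try:
    text.encode("ascii")
except UnicodeEncodeError as error:
    print(f"Cannot encode: {error.reason}")

# errors="replace" swaps unencodable characters for "?" instead of raising.
print(text.encode("ascii", errors="replace"))   # b'Caf? ?'

# Decoding with the wrong codec does not always fail; it can silently
# produce garbage, so the codec must match the one used to encode.
garbled = data.decode("latin-1")
print(garbled == text)   # False
```

The silent failure in the last step is the usual source of strings with stray accented characters in them.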

The Boolean Constructor: Truthiness

In Python, everything can be cast to a Boolean. This is extremely useful for writing concise control flow logic.

The rule is simple: Empty/Zero/None is False. Everything else is True.

The Truthiness Chart
PYTHON
# Falsy Values (evaluate to False)
print(bool(0))          # False (Zero)
print(bool(0.0))        # False (Zero float)
print(bool(""))         # False (Empty string)
print(bool([]))         # False (Empty list)
print(bool({}))         # False (Empty dict)
print(bool(None))       # False (The void)

# Truthy Values (evaluate to True)
print(bool(1))          # True
print(bool(-5))         # True (Any non-zero number)
print(bool("False"))    # True (Non-empty string!)
print(bool([0]))        # True (Non-empty container)
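The bool("False") result above is a common trap when reading configuration or form values. One way to handle it is a small parser. parse_bool below is a hypothetical helper, and the accepted spellings are an assumption chosen for the example.

```python
def parse_bool(text):
    """Hypothetical helper: map common yes/no spellings to a bool."""
    normalized = text.strip().lower()
    if normalized in {"true", "yes", "1", "on"}:
        return True
    if normalized in {"false", "no", "0", "off"}:
        return False
    raise ValueError(f"Not a recognizable boolean: {text!r}")


print(bool("False"))         # True  (any non-empty string is truthy)
print(parse_bool("False"))   # False (the value the text actually means)
print(parse_bool(" YES "))   # True

# Truthiness also keeps emptiness checks short.
pending_jobs = []
if not pending_jobs:
    print("Nothing to do")
```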

Custom Casting: The Dunder Methods

Want your own classes to support casting? Python's object-oriented model allows you to define how your objects behave when passed to int(), str(), or bool(). This is done using "Dunder" (Double Underscore) methods.

PYTHON

class BitcoinWallet:
    def __init__(self, balance):
        self.balance = balance

    def __int__(self):
        """Called when int(wallet) is run"""
        return int(self.balance)

    def __str__(self):
        """Called when str(wallet) or print(wallet) is run"""
        return f"{self.balance} BTC"

    def __bool__(self):
        """Called when bool(wallet) or if wallet: is run"""
        return self.balance > 0

my_wallet = BitcoinWallet(0.5)

print(int(my_wallet))   # 0 (Casts to int)
print(str(my_wallet))   # "0.5 BTC" (Casts to string)

if my_wallet:
    print("You have money!") # Uses __bool__
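The same idea extends to other constructors. The sketch below uses two hypothetical classes to show __float__, and the fallback Python uses for bool() when a class defines __len__ but not __bool__.

```python
class Temperature:
    """Hypothetical class showing __float__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        # Called by float(temperature)
        return float(self.degrees)


class Playlist:
    """Hypothetical class with no __bool__; truthiness falls back to __len__."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # Called by len(playlist), and by bool(playlist) as a fallback
        return len(self.songs)


print(float(Temperature(21)))      # 21.0

print(bool(Playlist([])))          # False (length 0)
print(bool(Playlist(["intro"])))   # True
print(len(Playlist(["intro"])))    # 1
```

An object that defines neither __bool__ nor __len__ is always truthy.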

Collection Molding: Reshaping Data

Casting isn't just for scalars. You can cast between collection types to gain their superpowers.

1. Removing Duplicates (List -> Set -> List)

The quickest way to deduplicate a list of hashable items is to cast it to a set (which cannot hold duplicates) and back. The original order is not preserved.

PYTHON

raw_data = ["apple", "banana", "apple", "cherry", "banana"]

# transform
unique_data = list(set(raw_data))

print(unique_data)
# e.g. ['banana', 'cherry', 'apple'] (order is arbitrary and may differ between runs)
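When the original order matters, casting through a dict instead of a set is one common alternative. This is a sketch of that idea, not something the original example covers.

```python
raw_data = ["apple", "banana", "apple", "cherry", "banana"]

# dict keys are unique and, since Python 3.7, keep insertion order.
unique_in_order = list(dict.fromkeys(raw_data))
print(unique_in_order)   # ['apple', 'banana', 'cherry']

# When any stable order is acceptable, sorting the set also works.
unique_sorted = sorted(set(raw_data))
print(unique_sorted)     # ['apple', 'banana', 'cherry']
```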

2. Freezing Data (List -> Tuple)

Need to use a list as a dictionary key? You can't, because lists are mutable and therefore unhashable. Cast it to a tuple first!

PYTHON

coords = [10, 20]

# location_map = {coords: "Home"} # ❌ TypeError: unhashable type: 'list'

frozen_coords = tuple(coords)
location_map = {frozen_coords: "Home"} # ✅ Works!
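One limitation is worth sketching: tuple() freezes only the outer container. The example below, with illustrative names, shows a nested list failing as a key and one way around it. It also shows frozenset() doing the same job for sets.

```python
# tuple() freezes only the outer list; inner lists stay mutable and unhashable.
path = [[0, 0], [10, 20]]
shallow = tuple(path)
try:
    {shallow: "route"}
except TypeError as error:
    print(f"TypeError: {error}")   # unhashable type: 'list'

# Freezing each inner list as well makes the whole value hashable.
deep = tuple(tuple(point) for point in path)
routes = {deep: "route"}
print(routes[((0, 0), (10, 20))])   # route

# frozenset() plays the same role for sets.
tags = frozenset(["red", "blue"])
groups = {tags: "primary"}
print(groups[frozenset(["blue", "red"])])   # primary
```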

3. Zipping to Dictionary (List of Tuples -> Dict)

If you have paired data, dict() can snap it together intelligently.

PYTHON

pairs = [
    ("name", "Alice"),
    ("age", 30),
    ("role", "Engineer")
]

user_profile = dict(pairs)
print(user_profile["name"]) # "Alice"
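When the keys and values arrive as two separate lists, zip() can build the pairs first. A short sketch with illustrative data:

```python
keys = ["name", "age", "role"]
values = ["Alice", 30, "Engineer"]

# zip() pairs the two lists element by element; dict() consumes the pairs.
user_profile = dict(zip(keys, values))
print(user_profile)   # {'name': 'Alice', 'age': 30, 'role': 'Engineer'}

# The conversion is reversible: items() yields the (key, value) pairs again.
pairs_again = list(user_profile.items())
print(pairs_again)    # [('name', 'Alice'), ('age', 30), ('role', 'Engineer')]

# If a key repeats, the last value wins.
print(dict([("a", 1), ("a", 2)]))   # {'a': 2}
```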

Real-World Application: Safe Casting

User input is messy. You naturally assume a user will enter a number for "age", but they might enter "twenty" or "20 years". If you just call int() on that input, it raises a ValueError and your program crashes.

The fix is to attempt the conversion and catch the failure. This pattern, known as EAFP (Easier to Ask Forgiveness than Permission), is the idiomatic way to handle casting in production code.

❌ Fragile Code

PYTHON

def get_age():
    age_str = input("Enter age: ")
    # Raises ValueError if the user types "25ish"
    age = int(age_str)
    return age

✅ Robust Code

PYTHON

def get_age():
    age_str = input("Enter age: ")
    try:
        # Try to cast
        return int(age_str)
    except ValueError:
        # Handle the failure gracefully
        print("Please enter a valid number (e.g., 25)")
        return None
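The same try/except shape can be pulled out into a reusable function that does not depend on input(). safe_int below is a hypothetical helper, and returning a caller-supplied default is just one possible design.

```python
def safe_int(value, default=None):
    """Hypothetical helper: return int(value), or default if it cannot be cast."""
    try:
        return int(value)
    except (ValueError, TypeError):
        # ValueError: bad text such as "25ish"; TypeError: None, lists, etc.
        return default


print(safe_int("25"))         # 25
print(safe_int("25ish"))      # None
print(safe_int(None, 0))      # 0
print(safe_int(" 42 ", -1))   # 42
```

Catching only the specific exceptions keeps unrelated bugs visible instead of hiding them behind the default.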

Best Practices & Takeaways

✅ Do

Use try/except ValueError when converting user input.
Use if my_list: instead of if len(my_list) > 0: (Implicit Bool).
Use set() casting for quick deduplication.
Use repr() when debugging to see exact characters.
Remember to decode() bytes when reading from networks or binary files.

❌ Don't

Don't cast unnecessarily (e.g., str("hello")).
Don't assume set() preserves order (its iteration order is arbitrary).
Don't use eval() to cast data (security risk!).
Don't rely on int() for rounding (it always truncates toward zero). Use round() first when you need the nearest integer.
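Two of these points can be sketched briefly. For parsing literal data from text, the standard library's ast.literal_eval() is a commonly suggested substitute for eval(). The second half contrasts int() with round().

```python
import ast

# ast.literal_eval() parses Python literals only; it will not run code.
print(ast.literal_eval("[1, 2, 3]"))        # [1, 2, 3]
print(ast.literal_eval("{'debug': True}"))  # {'debug': True}

try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("Rejected: not a literal")

# int() truncates, round() goes to the nearest integer.
print(int(2.9))      # 2
print(round(2.9))    # 3

# round() sends exact halves to the nearest even integer.
print(round(2.5))    # 2
print(round(3.5))    # 4
```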

Next Steps

You've mastered transforming data. Now let's look at the data types themselves in more detail, specifically the complex world of Numbers and floating-point precision.

