Python Tidbits: Small Python tips, tricks, and packages you wish you knew about yesterday

by Nick Hodgskin

This talk is mostly code examples, so that we can learn about these Python features by doing. I am using Python 3.12; most of these features work in Python 3.6 and above, though a few (such as zoneinfo, which needs Python 3.9+) require a newer version.

Let's get started! We have many examples to go through.

Native Python Tricks

f-strings

In [58]:

# String concatenation
name = "John"
age = 25
print("Hello, " + name + "! You are " + str(age) + " years old.")


# Python 2: % syntax
name = "Alice"
age = 30
greeting = "Hello, %s! You are %d years old." % (name, age)
print(greeting)

# Python 3: .format() syntax
name = "Bob"
age = 25
greeting = "Hello, {}! You are {} years old.".format(name, age)
print(greeting)

# Python 3.6+: f-strings (the best!)
name = "Charlie"
age = 28
greeting = f"Hello, {name}! You are {age} years old."
print(greeting)

Hello, John! You are 25 years old.
Hello, Alice! You are 30 years old.
Hello, Bob! You are 25 years old.
Hello, Charlie! You are 28 years old.

In [59]:

# Bonus: f-strings can evaluate expressions inline
a = 5
b = 10
result = f"The sum of {a} and {b} is {a + b}."
print(result)


def multiply(x, y):
    return x * y

a = 5
b = 10
result = f"The product of {a} and {b} is {multiply(a, b)}."
print(result)

The sum of 5 and 10 is 15.
The product of 5 and 10 is 50.

In [60]:

# Bonus: f-strings support formatting options
pi = 3.14159265
formatted_pi = f"Pi rounded to 2 decimal places: {pi:.2f}"
print(formatted_pi)

radius = 6_371_000  # 6,371 km in meters
circumference = 2 * pi * radius
print(f"Earth's circumference (4 decimal places): {circumference:.4e} meters")
print(f"Earth's circumference (4 significant digits): {circumference:.4g} meters")

Pi rounded to 2 decimal places: 3.14
Earth's circumference (4 decimal places): 4.0030e+07 meters
Earth's circumference (4 significant digits): 4.003e+07 meters

You can find out more about formatting options at W3Schools: Python String Formatting.

Quick reference of format specifiers (mentioned in the article):

:<		Left aligns the result (within the available space)
:>		Right aligns the result (within the available space)
:^		Center aligns the result (within the available space)
:=		Places the sign at the leftmost position
:+		Use a plus sign to indicate if the result is positive or negative
:-		Use a minus sign for negative values only
: 		Use a space to insert an extra space before positive numbers (and a minus sign before negative numbers)
:,		Use a comma as a thousands separator
:_		Use an underscore as a thousands separator
:b		Binary format
:c		Converts the value into the corresponding Unicode character
:d		Decimal format
:e		Scientific format, with a lower case e
:E		Scientific format, with an upper case E
:f		Fixed point number format
:F		Fixed point number format, in uppercase format (show inf and nan as INF and NAN)
:g		General format
:G		General format (using an upper case E for scientific notation)
:o		Octal format
:x		Hex format, lower case
:X		Hex format, upper case
:n		Number format
:%		Percentage format
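
To make the table above a bit more concrete, here is a small sketch that combines a few of the specifiers. The values and variable names (value, label, and so on) are made up for illustration and are not from the talk.

```python
# Illustrative values (hypothetical, chosen only to show the specifiers)
value = 1234567.891
label = "mid"

left = f"[{label:<9}]"      # left-align within 9 characters
right = f"[{label:>9}]"     # right-align within 9 characters
center = f"[{label:^9}]"    # center within 9 characters
print(left, right, center)

thousands = f"{value:,.2f}"     # comma separator + 2 decimal places
underscored = f"{1000000:_}"    # underscore separator
signed = f"{42:+d} {-42:+d}"    # always show the sign
bases = f"{255:b} {255:o} {255:x} {255:X}"  # binary, octal, hex
percent = f"{0.256:.1%}"        # multiply by 100 and append %
print(thousands, underscored, signed, bases, percent)
```

Specifiers can be combined, as in `,.2f` above: the separator comes first, then the precision, then the type.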

enumerate and zip

In [61]:

# Use enumerate to loop over an iterable while keeping track of the index.

# Without enumerate
fruits = ['apple', 'banana', 'cherry']
for i in range(len(fruits)):
    print(i, fruits[i])

# With enumerate
for i, fruit in enumerate(fruits):
    print(i, fruit)

# Bonus: Start indexing at a custom number
for i, fruit in enumerate(fruits, start=1):
    print(i, fruit)

0 apple
1 banana
2 cherry
0 apple
1 banana
2 cherry
1 apple
2 banana
3 cherry

In [62]:

# under the hood
print(enumerate(fruits))
print(list(enumerate(fruits)))

<enumerate object at 0x1076f9350>
[(0, 'apple'), (1, 'banana'), (2, 'cherry')]

In [63]:

# Use zip to loop over multiple iterables in parallel.

# Without zip
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90, 95]
for i in range(len(names)):
    print(names[i], scores[i])

# With zip
for name, score in zip(names, scores):
    print(name, score)

Alice 85
Bob 90
Charlie 95
Alice 85
Bob 90
Charlie 95
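
One thing to watch out for: zip stops as soon as the shortest input runs out. The sketch below shows this with a deliberately shortened scores list (hypothetical data, not from the talk), along with itertools.zip_longest as one way to keep the unmatched items.

```python
from itertools import zip_longest

names = ['Alice', 'Bob', 'Charlie']
scores = [85, 90]  # one score is missing (hypothetical data)

# zip stops at the shortest input, so 'Charlie' is silently dropped
truncated = list(zip(names, scores))
print(truncated)

# zip_longest pads the shorter input with a fill value instead
padded = list(zip_longest(names, scores, fillvalue=None))
print(padded)
```

On Python 3.10 and above, zip(names, scores, strict=True) raises a ValueError when the lengths differ, which is often the safer choice.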

In [64]:

# Bonus: Unzipping
pairs = list(zip(names, scores))
print('pairs:', pairs)
names_unzipped, scores_unzipped = zip(*pairs)
print("names_unzipped:", names_unzipped)
print("scores_unzipped:", scores_unzipped)

pairs: [('Alice', 85), ('Bob', 90), ('Charlie', 95)]
names_unzipped: ('Alice', 'Bob', 'Charlie')
scores_unzipped: (85, 90, 95)

list comprehensions

In [65]:

numbers = [1, 2, 3, 4, 5]

# Example 1: Basic list comprehension
# Squaring numbers in a list

# using a for loop
squares = []
for x in numbers:
    squares.append(x**2)
print(squares)

# using a list comprehension
squares = [x**2 for x in numbers]
print(squares)

[1, 4, 9, 16, 25]
[1, 4, 9, 16, 25]

In [66]:

# Example 2: Using `if` to filter elements
# Keeping only even numbers

# Using a for loop
evens = []
for x in numbers:
    if x % 2 == 0:
        evens.append(x)
print(evens)

# list comprehension
evens = [x for x in numbers if x % 2 == 0]
print(evens)

[2, 4]
[2, 4]

In [67]:

# Example 3: Using `if` and `else` in a list comprehension
# Replacing odd numbers with -1

# Using a for loop
processed = []
for x in numbers:
    if x % 2 == 0:
        processed.append(x)
    else:
        processed.append(-1)
print(processed)

# list comprehension
processed = [x if x % 2 == 0 else -1 for x in numbers]
print(processed)

[-1, 2, -1, 4, -1]
[-1, 2, -1, 4, -1]

In [68]:

# Bonus: Filtering out negative values from data
data = [3.2, -1.5, 0.0, 4.7, -2.3, 5.6]
cleaned_data = [x for x in data if x >= 0]
print(cleaned_data)

[3.2, 0.0, 4.7, 5.6]

sets

In [115]:

# Creating a set
unique_numbers = {1, 2, 3, 4, 5}
print("unique_numbers:", unique_numbers)

# Adding elements to a set
unique_numbers.add(6)
print("unique_numbers (added 6):", unique_numbers)

# Sets automatically handle duplicates
unique_numbers.add(3)
print("unique_numbers (added 3)", unique_numbers)

# Using sets to remove duplicates from a list
data_with_duplicates = [5, 1, 2, 2, 3, 4, 4]
print("data_with_duplicates:", data_with_duplicates)
unique_data = list(set(data_with_duplicates))
print("unique_data:", unique_data) # Order not preserved

unique_numbers: {1, 2, 3, 4, 5}
unique_numbers (added 6): {1, 2, 3, 4, 5, 6}
unique_numbers (added 3) {1, 2, 3, 4, 5, 6}
data_with_duplicates: [5, 1, 2, 2, 3, 4, 4]
unique_data: [1, 2, 3, 4, 5]
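
If the original order matters, a set is the wrong tool, since sets are unordered. One common alternative (a sketch, not something covered in the talk itself) is dict.fromkeys, which relies on dictionaries keeping insertion order (guaranteed from Python 3.7).

```python
data_with_duplicates = [5, 1, 2, 2, 3, 4, 4]

# Dictionary keys are unique and keep their insertion order,
# so this removes duplicates while keeping first occurrences in place.
unique_in_order = list(dict.fromkeys(data_with_duplicates))
print("unique_in_order:", unique_in_order)
```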

In [70]:

# Set operations

# Define two sets
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}
print("A:", set_a)
print("B:", set_b)

# Union: Combine elements from both sets (no duplicates)
union_set = set_a | set_b  # or set_a.union(set_b)
print("Union (set_a | set_b):", union_set)

# Difference: Elements in set_a but not in set_b
difference_set = set_a - set_b  # or set_a.difference(set_b)
print("Difference (set_a - set_b):", difference_set)

# Intersection: Elements common to both sets
intersection_set = set_a & set_b  # or set_a.intersection(set_b)
print("Intersection (set_a & set_b):", intersection_set)

# Symmetric Difference: Elements in either set but not in both
symmetric_diff_set = set_a ^ set_b  # or set_a.symmetric_difference(set_b)
print("Symmetric Difference (set_a ^ set_b):", symmetric_diff_set)

A: {1, 2, 3, 4, 5}
B: {4, 5, 6, 7, 8}
Union (set_a | set_b): {1, 2, 3, 4, 5, 6, 7, 8}
Difference (set_a - set_b): {1, 2, 3}
Intersection (set_a & set_b): {4, 5}
Symmetric Difference (set_a ^ set_b): {1, 2, 3, 6, 7, 8}

In [71]:

# Practical example: Finding elements that appear in only one of two datasets
data_1 = {10, 20, 30, 40, 50}
data_2 = {30, 40, 50, 60, 70}

# Symmetric difference: elements in exactly one of the datasets
unique_elements = data_1 ^ data_2
print("Unique elements in either dataset:", unique_elements)

Unique elements in either dataset: {20, 70, 10, 60}

getting help straight from Python (dir(), help(), locals())

In [72]:

# 1. Listing available methods and attributes with `dir()`
my_list = [1, 2, 3]
dir(my_list)  # Shows all methods and attributes of the list object

Out[72]:

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [73]:

# 2. Getting detailed help with `help()`
help(my_list.pop)  # Displays documentation for the `pop` method

Help on built-in function pop:

pop(index=-1, /) method of builtins.list instance
    Remove and return item at index (default last).

    Raises IndexError if list is empty or index is out of range.

In [74]:

print("List before:", my_list)
popped = my_list.pop(0)
print("List after:", my_list)
print("Popped item:", popped)

List before: [1, 2, 3]
List after: [2, 3]
Popped item: 1

In [75]:

# 3. Inspecting local variables with `locals()`
def example_function():
    x = 10
    y = 20
    print(locals())  # Shows all local variables in the current scope
    # globals() would do the same but for global variables

example_function()

{'x': 10, 'y': 20}

advanced sorting using keys

In [76]:

# normal sorting
lst = [2, 1, 3, 6, 5, 4]
print("list (unsorted):", lst)
lst.sort()
print("list (sorted):", lst)

list (unsorted): [2, 1, 3, 6, 5, 4]
list (sorted): [1, 2, 3, 4, 5, 6]

In [77]:

# Example: Sorting a list of tuples by the second element
def return_second_element(x):
    return x[1]
data = [(1, 20), (3, 15), (2, 25), (4, 10)]
print("data (unsorted):", data)
sorted_data = sorted(data, key=return_second_element)
print("data (sorted by the second element):", sorted_data)


# ...using an inline lambda function
data = [(1, 20), (3, 15), (2, 25), (4, 10)]
print("data (unsorted):", data)
sorted_data = sorted(data, key=lambda x: x[1])
print("data (sorted by the second element):", sorted_data)

data (unsorted): [(1, 20), (3, 15), (2, 25), (4, 10)]
data (sorted by the second element): [(4, 10), (3, 15), (1, 20), (2, 25)]
data (unsorted): [(1, 20), (3, 15), (2, 25), (4, 10)]
data (sorted by the second element): [(4, 10), (3, 15), (1, 20), (2, 25)]

In [78]:

# Example: Sorting a list of dictionaries by a specific key
data = [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 20}, {'name': 'Charlie', 'age': 30}]
print("data (unsorted):", data)
sorted_data = sorted(data, key=lambda x: x['age'])
print("data (sorted by age):", sorted_data)

# Example: Sorting strings by their length
words = ['apple', 'banana', 'kiwi', 'cherry']
print("words (unsorted):", words)
sorted_words = sorted(words, key=len)
print("words (sorted by length):", sorted_words)
sorted_words = sorted(words, key=len, reverse=True)
print("words (sorted reverse by length):", sorted_words)

data (unsorted): [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 20}, {'name': 'Charlie', 'age': 30}]
data (sorted by age): [{'name': 'Bob', 'age': 20}, {'name': 'Alice', 'age': 25}, {'name': 'Charlie', 'age': 30}]
words (unsorted): ['apple', 'banana', 'kiwi', 'cherry']
words (sorted by length): ['kiwi', 'apple', 'banana', 'cherry']
words (sorted reverse by length): ['banana', 'cherry', 'apple', 'kiwi']
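
A key function can also return a tuple, which sorts by the first element and uses the later ones as tie-breakers. The sketch below uses made-up data (the people and words lists are hypothetical) and operator.itemgetter from the standard library as an alternative to a lambda.

```python
from operator import itemgetter

# Hypothetical records: two people share the same age
people = [
    {'name': 'Dana', 'age': 25},
    {'name': 'Bob', 'age': 20},
    {'name': 'Alice', 'age': 25},
]

# itemgetter('age', 'name') returns the tuple (age, name) for each record,
# so ties on age are broken alphabetically by name
by_age_then_name = sorted(people, key=itemgetter('age', 'name'))
print([p['name'] for p in by_age_then_name])

# The same idea with a lambda: sort by length, then alphabetically
words = ['apple', 'banana', 'kiwi', 'cherry', 'fig']
by_len_then_alpha = sorted(words, key=lambda w: (len(w), w))
print(by_len_then_alpha)
```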

filter and map

You don't strictly need to know filter and map (you can get away with for loops), but they offer an alternative style that may be more readable or faster for your use case.

In [79]:

# Example: Filter even numbers from a list
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Using filter with a lambda function
evens = filter(lambda x: x % 2 == 0, numbers)
# Note the argument order: `filter(function, iterable)`.
# `filter` returns a lazy iterator (not a list), hence the `list()` call.
print(list(evens))

[2, 4, 6, 8, 10]

In [80]:

# Example: Square all numbers in a list
numbers = [1, 2, 3, 4, 5]

# Using map with a lambda function
squared = map(lambda x: x**2, numbers)
print(list(squared))

[1, 4, 9, 16, 25]

In [81]:

# Example: Square only even numbers
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Filter even numbers, then square them
result = map(lambda x: x**2, filter(lambda x: x % 2 == 0, numbers))
print(list(result))

[4, 16, 36, 64, 100]
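
For comparison, the same filter-then-map step can be written as a list comprehension, or as a generator expression if you want to keep the lazy behaviour. This is only a side-by-side sketch; which form reads better is a matter of taste.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# filter + map (as above)
result_functional = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, numbers)))

# Equivalent list comprehension
result_comprehension = [x**2 for x in numbers if x % 2 == 0]

# Equivalent generator expression (lazy, like filter and map)
result_lazy = (x**2 for x in numbers if x % 2 == 0)

print(result_functional)
print(result_comprehension)
print(list(result_lazy))
```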

In [82]:

# A more complex example: Converting Celsius to Fahrenheit

# Raw data: Some values are invalid (None or outliers)
data = [22.5, None, 18.3, 1000, 25.0, None, 19.8, 30.2, -999]

# Step 1: Filter out invalid values (None and outliers)
valid_data = filter(lambda x: x is not None and -50 <= x <= 50, data)

# Step 2: Convert Celsius to Fahrenheit
def c_to_f(celsius):
    return celsius * 9/5 + 32

fahrenheit_data = map(c_to_f, valid_data)

# Step 3: Round to 2 decimal places
rounded_data = map(lambda x: round(x, 2), fahrenheit_data)

# Final result
print(list(rounded_data))

[72.5, 64.94, 77.0, 67.64, 86.36]

Why is this powerful?

Readability: Each step is clearly separated and easy to understand.
Lazy Evaluation: filter and map process data on-demand, which is memory-efficient for large datasets.
Functional Style: Avoids mutable state and side effects, making the code more predictable.
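
To see the lazy evaluation point in practice, here is a small sketch. The helper function recording_square and the calls list are hypothetical and exist only to record when work actually happens (which, ironically, is a side effect).

```python
numbers = [1, 2, 3, 4, 5]
calls = []  # records which inputs have actually been processed

def recording_square(x):
    calls.append(x)
    return x ** 2

squared = map(recording_square, numbers)
calls_before = list(calls)   # nothing has been computed yet

first = next(squared)        # computes only the first value
calls_after_first = list(calls)

rest = list(squared)         # computes the remaining values
again = list(squared)        # the iterator is now exhausted

print(calls_before, first, calls_after_first, rest, again)
```

The flip side of laziness is visible in the last line: a map or filter object can only be consumed once, so convert it to a list if you need the results more than once.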

Python packages: Standard Library

pprint

In [83]:

# Example: A messy nested data structure
data = [[{
    "experiment": {
        "name": "North Atlantic",
        "samples": [
            {"id": 1, "temperature": 298.15, "results": [0.1, 0.2, 0.3]},
            {"id": 2, "temperature": 310.15, "results": [0.15, 0.25, 0.35]},
        ],
        "metadata": {
            "author": "Dr. Smith",
            "date": "2023-10-01",
            "tags": ["biophysics", "simulation"],
        },
    }
}]]

# Standard print output (hard to read)
print(data)

[[{'experiment': {'name': 'North Atlantic', 'samples': [{'id': 1, 'temperature': 298.15, 'results': [0.1, 0.2, 0.3]}, {'id': 2, 'temperature': 310.15, 'results': [0.15, 0.25, 0.35]}], 'metadata': {'author': 'Dr. Smith', 'date': '2023-10-01', 'tags': ['biophysics', 'simulation']}}}]]

In [84]:

from pprint import pprint

# Pretty-printed output (clean and readable)
pprint(data)

[[{'experiment': {'metadata': {'author': 'Dr. Smith',
                               'date': '2023-10-01',
                               'tags': ['biophysics', 'simulation']},
                  'name': 'North Atlantic',
                  'samples': [{'id': 1,
                               'results': [0.1, 0.2, 0.3],
                               'temperature': 298.15},
                              {'id': 2,
                               'results': [0.15, 0.25, 0.35],
                               'temperature': 310.15}]}}]]
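
Notice that pprint sorted the dictionary keys alphabetically in the output above. The sketch below shows a few options that change this; the record dictionary is a cut-down, hypothetical version of the data above. Note that sort_dicts needs Python 3.8 or newer.

```python
from pprint import pformat, pprint

# Hypothetical, smaller version of the nested data above
record = {
    "name": "North Atlantic",
    "metadata": {"author": "Dr. Smith", "tags": ["biophysics", "simulation"]},
}

# Keep the original key order and wrap at 40 characters
pprint(record, sort_dicts=False, width=40)

# pformat returns the formatted string instead of printing it
unsorted_text = pformat(record, sort_dicts=False)

# depth limits how many nesting levels are shown
shallow = pformat(record, depth=1)
print(shallow)
```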

pathlib

See pathlib docs for more info.

In [85]:

from pathlib import Path

# Create a Path object
data_dir = Path("data")  # Represents a directory named "data"

# Create the directory if it doesn't exist yet
data_dir.mkdir(exist_ok=True)

# Create a file path
data_file = data_dir / "experiment_results.csv"  # Use / to join paths

# Write to the file
data_file.write_text("Sample data\n")  # Write text to the file

# Read from the file
print(data_file.read_text())  # Read text from the file

# Iterate over files in a directory
for file in data_dir.glob("*.csv"):  # Find all CSV files
    print(f"Found file: {file.name}")

print("if you want the full path:", data_file.resolve())
print("if you want the stem:", data_file.stem)
print("if you want the extension:", data_file.suffix)

Sample data

Found file: experiment_results.csv
if you want the full path: /Users/Hodgs004/coding/repos/python-for-lunch/docs/talks/data/experiment_results.csv
if you want the stem: experiment_results
if you want the extension: .csv
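
A few more Path methods that tend to be useful, sketched in a temporary directory so that nothing is left behind on disk. The folder names ("raw", "2023") are made up for this example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # parents=True creates intermediate directories as needed
    nested = root / "raw" / "2023"
    nested.mkdir(parents=True, exist_ok=True)

    csv_file = nested / "experiment_results.csv"
    csv_file.write_text("Sample data\n")

    # Build related paths without any string manipulation
    backup = csv_file.with_suffix(".bak")
    parent_name = csv_file.parent.name

    # rglob searches recursively, unlike glob("*.csv")
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))

print(backup.name, parent_name, found)
```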

In [86]:

# Path objects can be passed to many functions from external libraries.
# If they *need* a string, you can do
print(str(data_file))

data/experiment_results.csv

In [87]:

# let's look at what methods are available
print(dir(Path)) # hmm, a bit difficult to read...

['__bytes__', '__class__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__fspath__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rtruediv__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__truediv__', '_drv', '_flavour', '_format_parsed_parts', '_from_parsed_parts', '_hash', '_lines', '_lines_cached', '_load_parts', '_make_child_relpath', '_parse_path', '_parts_normcase', '_parts_normcase_cached', '_raw_paths', '_root', '_scandir', '_str', '_str_normcase', '_str_normcase_cached', '_tail', '_tail_cached', 'absolute', 'anchor', 'as_posix', 'as_uri', 'chmod', 'cwd', 'drive', 'exists', 'expanduser', 'glob', 'group', 'hardlink_to', 'home', 'is_absolute', 'is_block_device', 'is_char_device', 'is_dir', 'is_fifo', 'is_file', 'is_junction', 'is_mount', 'is_relative_to', 'is_reserved', 'is_socket', 'is_symlink', 'iterdir', 'joinpath', 'lchmod', 'lstat', 'match', 'mkdir', 'name', 'open', 'owner', 'parent', 'parents', 'parts', 'read_bytes', 'read_text', 'readlink', 'relative_to', 'rename', 'replace', 'resolve', 'rglob', 'rmdir', 'root', 'samefile', 'stat', 'stem', 'suffix', 'suffixes', 'symlink_to', 'touch', 'unlink', 'walk', 'with_name', 'with_segments', 'with_stem', 'with_suffix', 'write_bytes', 'write_text']

In [88]:

def is_public(name):
    is_private = name.startswith("_")
    return not is_private

list(filter(is_public, dir(Path)))

# or
[name for name in dir(Path) if is_public(name)]

Out[88]:

['absolute',
 'anchor',
 'as_posix',
 'as_uri',
 'chmod',
 'cwd',
 'drive',
 'exists',
 'expanduser',
 'glob',
 'group',
 'hardlink_to',
 'home',
 'is_absolute',
 'is_block_device',
 'is_char_device',
 'is_dir',
 'is_fifo',
 'is_file',
 'is_junction',
 'is_mount',
 'is_relative_to',
 'is_reserved',
 'is_socket',
 'is_symlink',
 'iterdir',
 'joinpath',
 'lchmod',
 'lstat',
 'match',
 'mkdir',
 'name',
 'open',
 'owner',
 'parent',
 'parents',
 'parts',
 'read_bytes',
 'read_text',
 'readlink',
 'relative_to',
 'rename',
 'replace',
 'resolve',
 'rglob',
 'rmdir',
 'root',
 'samefile',
 'stat',
 'stem',
 'suffix',
 'suffixes',
 'symlink_to',
 'touch',
 'unlink',
 'walk',
 'with_name',
 'with_segments',
 'with_stem',
 'with_suffix',
 'write_bytes',
 'write_text']

datetime

In [89]:

from datetime import datetime, timedelta

# 1. Parsing a string into a datetime object
date_str = "2023-10-15 14:30:00"
parsed_date = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
print(f"Parsed Date: {parsed_date} (object of type {type(parsed_date)})")

# 2. Formatting a datetime object into a string
formatted_date = parsed_date.strftime("%A, %B %d, %Y at %I:%M %p")
print(f"Formatted Date: {formatted_date} (object of type {type(formatted_date)})")

# 3. Calculating time differences
future_date = parsed_date + timedelta(days=7, hours=3)
time_diff = future_date - parsed_date
print(f"Time Difference: {time_diff} (object of type {type(time_diff)})")

# 4. Getting the current time
now = datetime.now()  # naive datetime in the system's local timezone (not UTC)
print(f"Current Time: {now}")

Parsed Date: 2023-10-15 14:30:00 (object of type <class 'datetime.datetime'>)
Formatted Date: Sunday, October 15, 2023 at 02:30 PM (object of type <class 'str'>)
Time Difference: 7 days, 3:00:00 (object of type <class 'datetime.timedelta'>)
Current Time: 2025-03-17 16:25:10.364997

In [90]:

# Bonus: Working with timezones (uses `zoneinfo`, available in Python 3.9+; older versions need `pytz`)
from zoneinfo import ZoneInfo  # Python 3.9+
# `now` is naive, so astimezone() assumes it is in the system's local timezone
ny_time = now.astimezone(ZoneInfo("America/New_York"))
print(f"New York Time: {ny_time}")

New York Time: 2025-03-17 11:25:10.364997-04:00
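
If you want a timestamp that really is in UTC, one option is to ask for a timezone-aware datetime explicitly. This is a minimal sketch, assuming Python 3.9+ and that timezone data is available on the system (on some platforms, zoneinfo needs the separate tzdata package). The fixed date is chosen only so the result is reproducible.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Timezone-aware "now" in UTC (no guessing about the local timezone)
utc_now = datetime.now(timezone.utc)
ny_now = utc_now.astimezone(ZoneInfo("America/New_York"))
print(f"UTC: {utc_now}")
print(f"New York: {ny_now}")

# A fixed, aware datetime converts the same way on every machine
fixed_utc = datetime(2025, 3, 17, 15, 25, tzinfo=timezone.utc)
fixed_ny = fixed_utc.astimezone(ZoneInfo("America/New_York"))
print(f"Fixed example: {fixed_utc} -> {fixed_ny}")
```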

itertools - tools to work with iterators

See the itertools docs for more.

What is an iterator?

An iterator is an object that hands out its values one at a time, on request.

In Python, an iterator is an object which implements the iterator protocol: calling next() on it returns the next value, and it raises StopIteration once no values are left. Because values are produced on demand, iterators allow for efficient looping and processing of large datasets.
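
The protocol can be driven by hand with the built-in iter() and next() functions, which is roughly what a for loop does for you. A small sketch, reusing a fruits list like the one from earlier:

```python
fruits = ['apple', 'banana', 'cherry']

it = iter(fruits)      # get an iterator from an iterable
first = next(it)       # 'apple'
second = next(it)      # 'banana'
third = next(it)       # 'cherry'

# A plain next(it) would now raise StopIteration;
# passing a default value avoids the exception.
exhausted = next(it, None)

print(first, second, third, exhausted)
```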

In [91]:

import itertools

itertools.chain

Use chain to seamlessly combine multiple iterables into a single iterator.

In [92]:

list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']
combined = itertools.chain(list1, list2)

print(list(combined))

[1, 2, 3, 'a', 'b', 'c']

itertools.product – Cartesian Product

Generate all possible combinations (Cartesian product) of input iterables.

In [93]:

colors = ['red', 'green']
sizes = ['S', 'M', 'L']

combinations = itertools.product(colors, sizes)
print(list(combinations))

[('red', 'S'), ('red', 'M'), ('red', 'L'), ('green', 'S'), ('green', 'M'), ('green', 'L')]

itertools.combinations – Generate Combinations¶

Generate all possible combinations of a specific length from an iterable.

In [94]:

data = ['a', 'b', 'c']
combinations = itertools.combinations(data, 2)

print(list(combinations))

[('a', 'b'), ('a', 'c'), ('b', 'c')]

itertools.permutations - Generate Permutations¶

Generate all possible permutations of an iterable.

In [95]:

data = ['a', 'b', 'c']
perms = itertools.permutations(data)

print(list(perms))

[('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'), ('b', 'c', 'a'), ('c', 'a', 'b'), ('c', 'b', 'a')]
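
Both combinations and permutations accept a length r. As a sketch on the same three letters, the difference is whether order matters. math.comb and math.perm (Python 3.8+) give the expected counts without generating anything:

```python
import itertools
import math

data = ['a', 'b', 'c']

ordered_pairs = list(itertools.permutations(data, 2))    # ('a', 'b') and ('b', 'a') both appear
unordered_pairs = list(itertools.combinations(data, 2))  # only ('a', 'b') appears

print(len(ordered_pairs), math.perm(3, 2))
print(len(unordered_pairs), math.comb(3, 2))
```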

itertools.islice – Slice Iterators¶

Slice an iterator without converting it to a list first.

In [96]:

data = range(10)
sliced = itertools.islice(data, 2, 6)  # Start at index 2, stop before index 6

print(list(sliced))

[2, 3, 4, 5]
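
islice becomes more useful when the source cannot be indexed at all, such as a generator, or when it never ends. A minimal sketch, assuming we want the first five even numbers from an infinite counter:

```python
import itertools

# itertools.count() counts 0, 1, 2, ... forever; the generator filters it lazily
evens = (n for n in itertools.count() if n % 2 == 0)

# Take only the first five items; the rest of the infinite stream is never computed
first_five = list(itertools.islice(evens, 5))

print(first_five)
```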

itertools.groupby – Group Data¶

Group consecutive items of an iterable that share the same key.

In [97]:

data = [('a', 1), ('a', 2), ('b', 3), ('b', 4), ('c', 5)]
grouped = itertools.groupby(data, key=lambda x: x[0])

for key, group in grouped:
    print(key, list(group))

a [('a', 1), ('a', 2)]
b [('b', 3), ('b', 4)]
c [('c', 5)]
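
One thing to watch out for: groupby only groups items that sit next to each other, so unsorted input produces repeated keys. The sketch below uses made-up data and a hypothetical first_letter key function to show the difference that sorting by the same key makes:

```python
import itertools

def first_letter(word):
    return word[0]

words = ['apple', 'banana', 'avocado', 'blueberry', 'cherry']

# Unsorted input: 'a' and 'b' each show up as two separate groups
unsorted_groups = [(key, list(group)) for key, group in itertools.groupby(words, key=first_letter)]

# Sorting by the same key first gives one group per key
sorted_words = sorted(words, key=first_letter)
sorted_groups = [(key, list(group)) for key, group in itertools.groupby(sorted_words, key=first_letter)]

print(unsorted_groups)
print(sorted_groups)
```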

itertools.cycle – Infinite Cycling¶

Cycle through an iterable indefinitely.

In [98]:

import itertools

colors = ['red', 'green', 'blue']
cycled = itertools.cycle(colors)

for _ in range(5):
    print(next(cycled))

red
green
blue
red
green

itertools.tee – Duplicate an Iterator¶

Split an iterator into multiple independent iterators.

In [99]:

import itertools

data = iter(range(5))
iter1, iter2 = itertools.tee(data, 2)

print(list(iter1))
print(list(iter2))

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
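
A typical use of tee is to look at neighbouring items of an iterator. The sketch below follows the well-known pairwise recipe; adjacent_pairs is a hypothetical helper name (Python 3.10+ ships itertools.pairwise for the same job). Note that the original iterator should not be used again once it has been passed to tee.

```python
import itertools

def adjacent_pairs(iterable):
    """Yield (item, next_item) for each neighbouring pair in the iterable."""
    first, second = itertools.tee(iterable)
    next(second, None)  # advance the second copy by one item
    return zip(first, second)

pairs = list(adjacent_pairs(iter([1, 2, 3, 4])))
print(pairs)
```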

more itertools¶

In [100]:

[name for name in dir(itertools) if is_public(name)]

Out[100]:

['accumulate',
 'batched',
 'chain',
 'combinations',
 'combinations_with_replacement',
 'compress',
 'count',
 'cycle',
 'dropwhile',
 'filterfalse',
 'groupby',
 'islice',
 'pairwise',
 'permutations',
 'product',
 'repeat',
 'starmap',
 'takewhile',
 'tee',
 'zip_longest']
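
To give a flavour of a few names from this list that were not covered above, here is a short sketch of accumulate, takewhile and zip_longest on made-up data:

```python
import itertools

# accumulate: running totals (addition is the default operation)
running_totals = list(itertools.accumulate([1, 2, 3, 4]))

# takewhile: keep items until the condition first fails, then stop
leading_small = list(itertools.takewhile(lambda x: x < 3, [1, 2, 3, 1]))

# zip_longest: like zip, but pads the shorter iterable instead of stopping early
padded = list(itertools.zip_longest('ab', [1, 2, 3], fillvalue='-'))

print(running_totals)
print(leading_small)
print(padded)
```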

functools - tools to work with functions¶

Here we cover partial, lru_cache, and reduce. See docs for more.

In [101]:

import functools

functools.partial¶

Simplifies repetitive function calls with fixed parameters (e.g., fitting curves, transformations).
Makes code cleaner and more reusable.

In [102]:

help(functools.partial)

Help on class partial in module functools:

class partial(builtins.object)
 |  partial(func, *args, **keywords) - new function with partial application
 |  of the given arguments and keywords.
 |
 |  Methods defined here:
 |
 |  __call__(self, /, *args, **kwargs)
 |      Call self as a function.
 |
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __reduce__(...)
 |      Helper for pickle.
 |
 |  __repr__(self, /)
 |      Return repr(self).
 |
 |  __setattr__(self, name, value, /)
 |      Implement setattr(self, name, value).
 |
 |  __setstate__(...)
 |
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |
 |  __class_getitem__(...)
 |      See PEP 585
 |
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |
 |  __new__(*args, **kwargs)
 |      Create and return a new object.  See help(type) for accurate signature.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |
 |  __vectorcalloffset__
 |
 |  args
 |      tuple of arguments to future partial calls
 |
 |  func
 |      function object to use in future partial calls
 |
 |  keywords
 |      dictionary of keyword arguments to future partial calls

In [103]:

# Original function
def power(base, exponent):
    return base ** exponent

# Create new functions with `exponent` fixed to 2 and 3
square = functools.partial(power, exponent=2)
cube = functools.partial(power, exponent=3)

print(square(5))  # 25
print(cube(3))    # 27

25
27
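
partial can also fix arguments by position. As a sketch using the same power function, fixing the first positional argument pins base rather than exponent (two_to_the is an illustrative name). The args and keywords attributes listed in the help output show what has been frozen:

```python
import functools

def power(base, exponent):
    return base ** exponent

# Positional arguments are filled in from the left, so this fixes base=2
two_to_the = functools.partial(power, 2)
square = functools.partial(power, exponent=2)

print(two_to_the(10))
print(two_to_the.args, two_to_the.keywords)
print(square.args, square.keywords)
```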

functools.lru_cache¶

Speeds up recursive or repetitive computations (e.g., dynamic programming, simulations).
Reduces redundant calculations in expensive functions.
Should only be used on functions that are deterministic and have no side effects (i.e., pure functions); their arguments must also be hashable.

In [104]:

help(functools.lru_cache)

Help on function lru_cache in module functools:

lru_cache(maxsize=128, typed=False)
    Least-recently-used cache decorator.

    If *maxsize* is set to None, the LRU features are disabled and the cache
    can grow without bound.

    If *typed* is True, arguments of different types will be cached separately.
    For example, f(3.0) and f(3) will be treated as distinct calls with
    distinct results.

    Arguments to the cached function must be hashable.

    View the cache statistics named tuple (hits, misses, maxsize, currsize)
    with f.cache_info().  Clear the cache and statistics with f.cache_clear().
    Access the underlying function with f.__wrapped__.

    See:  https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)

In [105]:

from time import time, sleep

@functools.lru_cache(maxsize=None)
def some_long_running_function(a, b):
    sleep(2)  # Simulate a long computation
    return a + b

print("first call with 1, 2:", some_long_running_function(1, 2))  # Takes 2 seconds

first call with 1, 2: 3

In [106]:

print("second call with 1, 2:", some_long_running_function(1, 2))  # Returns immediately

second call with 1, 2: 3

In [107]:

print("first call with 2, 4:", some_long_running_function(2, 4))  # Takes 2 seconds (new arguments, so nothing is cached yet)

first call with 2, 4: 6
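
The help text above mentions cache_info() and cache_clear(). As a minimal sketch of how to check whether calls are being served from the cache, the example below repeats the same call pattern on a fast, hypothetical add function:

```python
import functools

@functools.lru_cache(maxsize=None)
def add(a, b):
    return a + b

add(1, 2)  # miss: computed and stored
add(1, 2)  # hit: returned from the cache
add(2, 4)  # miss: new arguments

info = add.cache_info()
print(info)

# cache_clear() empties the cache and resets the statistics
add.cache_clear()
print(add.cache_info())
```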

In [108]:

# A more real-world example
def fibonacci(n):
    """Inefficient recursive function to compute the n-th Fibonacci number.

    fibonacci(5) calls fibonacci(4) and fibonacci(3), but fibonacci(4) also calls fibonacci(3).
    This repeated work leads to an exponential number of function calls:
    exactly 2 * fibonacci(n + 1) - 1, which is at most 2^n.
    """
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

n = 40
t = time()
fib = fibonacci(n)
print(f"Time taken: {time() - t:.2f} seconds")
print(f"Fibonacci({n}): {fib}")
print(f"Upper bound on number of function calls: {2**n}")

Time taken: 7.99 seconds
Fibonacci(40): 102334155
Upper bound on number of function calls: 1099511627776

In [109]:

@functools.lru_cache(maxsize=None)  # Cache all results (maxsize default is 128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

n = 40
t = time()
fib = fibonacci(n)
print(f"Time taken: {time() - t:.2f} seconds")
print(f"Fibonacci({n}): {fib}")

Time taken: 0.00 seconds
Fibonacci(40): 102334155
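
On Python 3.9 and later, functools.cache is a shorthand for lru_cache(maxsize=None). The sketch below assumes Python 3.9+ and uses cache_info() to show why the cached version is so fast: each value from 0 to n is computed exactly once.

```python
import functools

@functools.cache  # Python 3.9+: same as functools.lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

result = fibonacci(40)
info = fibonacci.cache_info()

print(result)
print(info.misses)  # number of times the function body actually ran
```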

functools.reduce¶

In [110]:

help(functools.reduce)

Help on built-in function reduce in module _functools:

reduce(...)
    reduce(function, iterable[, initial]) -> value

    Apply a function of two arguments cumulatively to the items of a sequence
    or iterable, from left to right, so as to reduce the iterable to a single
    value.  For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
    ((((1+2)+3)+4)+5).  If initial is present, it is placed before the items
    of the iterable in the calculation, and serves as a default when the
    iterable is empty.

In [111]:

# Multiply all numbers in a list
numbers = [1, 2, 3, 4, 5]
product = functools.reduce(lambda x, y: x * y, numbers)

print(product)

120
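
The help text also mentions the optional initial argument. As a sketch, passing an initial value makes reduce safe on empty input, and operator.mul can stand in for the lambda. For this particular job, math.prod (Python 3.8+) does the same thing directly.

```python
import functools
import math
import operator

numbers = [1, 2, 3, 4, 5]

# operator.mul(x, y) is the function form of x * y
product = functools.reduce(operator.mul, numbers, 1)

# Without the initial value, reducing an empty list would raise TypeError
empty_product = functools.reduce(operator.mul, [], 1)

print(product, empty_product, math.prod(numbers))
```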

In [112]:

# interested in other functools stuff? You can Google the public API for usecases...
[name for name in dir(functools) if is_public(name)]

Out[112]:

['GenericAlias',
 'RLock',
 'WRAPPER_ASSIGNMENTS',
 'WRAPPER_UPDATES',
 'cache',
 'cached_property',
 'cmp_to_key',
 'get_cache_token',
 'lru_cache',
 'namedtuple',
 'partial',
 'partialmethod',
 'recursive_repr',
 'reduce',
 'singledispatch',
 'singledispatchmethod',
 'total_ordering',
 'update_wrapper',
 'wraps']

Python packages: 3rd Party¶

tqdm¶

After installing it using conda install tqdm or pip install tqdm...

In [113]:

from tqdm import tqdm

def run_calculations():
    sleep(0.1)  # Simulate a long computation

for _ in tqdm(range(100)):
    run_calculations()

100%|██████████| 100/100 [00:10<00:00,  9.47it/s]

In [114]:

# Bonus tip: use `_` for values you don't care about. Handy in for loops (as above) and when unpacking.
# Example: unpacking values
data = (1, 2, 3)
_, y, _ = data
print(y)

2

Topics not discussed, and further reading¶

Things not mentioned in this talk:

Testing (using pytest)
- this is quite a large topic and could be a talk in itself
Jupyter Notebook tips and tricks (plus using Markdown)
- this is quite a large topic and could be a talk in itself
logging
- this is a topic that could form part of a talk in itself

Check out the rest of the Python standard library for more interesting packages!