Data Structures#

The Problem: Storing More Than One Value#

So far, we’ve stored single values in variables: one temperature, one pressure, one mass. But what if you’re running Euler’s method and need to track the parachutist’s velocity at every timestep?

You could create separate variables:

v0 = 0.0
v1 = 19.6
v2 = 32.0
v3 = 39.85
# ... and so on for 1000 timesteps?

This is clearly impractical. What you need is a way to store multiple values in a single variable. That’s exactly what data structures are for.

Python has four main built-in data structures: lists, tuples, dictionaries, and sets. Each is designed for different situations. Let’s start with the most common one.

Lists: Ordered Collections You Can Change#

A list holds multiple items in order. You create one with square brackets:

velocities = [0.0, 19.6, 32.0, 39.85, 44.82]
compounds = ["methane", "ethane", "propane"]

Now velocities is a single variable containing all five values. Much better than five separate variables!

Accessing Items by Position#

Items in a list are numbered starting from 0, not 1. This trips up many beginners:

temps = [300, 350, 400, 450, 500]

print(temps[0])   # First item
print(temps[2])   # Third item
print(temps[-1])  # Last item
print(temps[-2])  # Second to last

Output:

300
400
500
450

The negative indices are a nice Python feature: -1 always means the last item, regardless of how long the list is.
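
As a minimal sketch of how negative indices relate to positive ones, the check below compares temps[-1] with the index computed from len(). It also shows what happens when you ask for a position past the end of the list. The names last_index and message are chosen only for illustration.

```python
temps = [300, 350, 400, 450, 500]

# A negative index counts back from the end: temps[-k] is temps[len(temps) - k]
last_index = len(temps) - 1
print(temps[-1] == temps[last_index])   # True

# Asking for a position that does not exist raises an IndexError
try:
    temps[5]
except IndexError as error:
    message = str(error)
    print("IndexError:", message)
```

Valid positions for this five-item list run from 0 to 4 (or from -5 to -1), so temps[5] is one step too far.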

Getting Multiple Items: Slicing#

What if you want several neighboring items at once, say the second and third? Use a slice:

temps = [300, 350, 400, 450, 500]

print(temps[1:3])   # Items at index 1 and 2
print(temps[:3])    # From start to index 2
print(temps[2:])    # From index 2 to end
print(temps[::2])   # Every 2nd item

Output:

[350, 400]
[300, 350, 400]
[400, 450, 500]
[300, 400, 500]

Notice that [1:3] gives you indices 1 and 2, but not 3. The end index is always excluded. This might seem odd, but it has a nice property: temps[:3] and temps[3:] together give you the whole list with no overlap.
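
The no-overlap property can be checked directly. This is a small sketch; first_part and second_part are names introduced here for illustration.

```python
temps = [300, 350, 400, 450, 500]

first_part = temps[:3]    # indices 0, 1, 2
second_part = temps[3:]   # indices 3, 4

print(first_part)                         # [300, 350, 400]
print(second_part)                        # [450, 500]
print(first_part + second_part == temps)  # True

# The length of temps[start:end] is end - start (when both are within range)
print(len(temps[1:3]))                    # 2
```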

Modifying Lists#

Unlike some other data structures we’ll see, lists can be changed after creation:

temps = [300, 350, 400]

temps[1] = 375           # Change the second item
temps.append(450)        # Add to the end
temps.insert(0, 250)     # Insert at the beginning
temps.remove(375)        # Remove a specific value
last = temps.pop()       # Remove and return the last item

print(temps)

Output:

[250, 300, 400]

Each of these operations modifies the list in place. You don’t need to create a new list.
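
Coming back to the parachutist from the start of this section, here is one way append() could collect the velocity at every timestep. Treat it as a sketch under assumptions: the linear-drag model and the values of g, m, c, and dt are not given in this section and were picked for illustration. With these choices the results come out close to the velocities listed earlier.

```python
# Assumed parameters (not given in this section), chosen for illustration
g = 9.8      # gravitational acceleration, m/s^2
m = 68.1     # mass of the parachutist, kg
c = 12.5     # drag coefficient, kg/s
dt = 2.0     # timestep, s

velocities = [0.0]            # start from rest
for step in range(4):
    v = velocities[-1]        # most recent velocity
    dvdt = g - (c / m) * v    # assumed linear-drag model: dv/dt = g - (c/m) v
    velocities.append(v + dvdt * dt)   # one Euler step, stored at the end

print(len(velocities))        # 5
print([round(v, 1) for v in velocities])
```

Changing range(4) to range(1000) would store 1001 values in the same single variable, with no extra variable names needed.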

Common List Operations#

For data analysis, you’ll often need the length, sum, minimum, or maximum:

temps = [400, 300, 500, 350]

print(len(temps))      # 4 (number of items)
print(min(temps))      # 300 (smallest)
print(max(temps))      # 500 (largest)
print(sum(temps))      # 1550 (total)
print(sorted(temps))   # [300, 350, 400, 500] (new sorted list)

Note that sorted() returns a new sorted list. The original stays unchanged. If you want to sort in place, use temps.sort().
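
The difference between the two is easy to miss, so here is a short side-by-side sketch. The names ordered and result are illustrative.

```python
temps = [400, 300, 500, 350]

ordered = sorted(temps)   # builds a new list
print(ordered)            # [300, 350, 400, 500]
print(temps)              # [400, 300, 500, 350] (unchanged)

result = temps.sort()     # sorts temps itself and returns None
print(result)             # None
print(temps)              # [300, 350, 400, 500]

# Both accept reverse=True for descending order
print(sorted(temps, reverse=True))   # [500, 400, 350, 300]
```

A common mistake is writing temps = temps.sort(), which replaces the list with None.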

Tip

For more built-in functions that work with lists (like sum(), min(), max()), see Common Functions.

Tuples: When Data Shouldn’t Change#

Lists are great, but sometimes you want to guarantee that data can’t be modified. That’s what tuples are for.

A tuple looks like a list, but uses parentheses instead of brackets:

point = (3.5, 2.1)
constants = (8.314, 6.022e23, 1.38e-23)  # R, Avogadro, Boltzmann

You can access items the same way as lists:

print(point[0])  # 3.5
print(point[1])  # 2.1

But if you try to change an item, Python raises an error:

point[0] = 4.0  # TypeError: 'tuple' object does not support item assignment

Why Would I Want That?#

You might wonder: why would I want something I can’t change? A few reasons:

Safety: You can’t accidentally modify data that shouldn’t change (like physical constants)
Dictionary keys: Tuples can be used as dictionary keys (as long as the items inside them can’t change either); lists cannot
Multiple return values: Functions often return tuples
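
The dictionary-key point can be sketched as follows. Dictionaries are covered later in this section, so read this as a preview. The density table and its rounded values are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical lookup table: density of water in kg/m^3, keyed by
# (temperature in K, pressure in bar). Values are rounded, for illustration only.
density = {
    (293.15, 1.0): 998,
    (373.15, 1.0): 958,
}
print(density[(293.15, 1.0)])   # 998

# A list cannot be a key because it can change after creation
try:
    density[[293.15, 1.0]] = 998
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```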

Unpacking Tuples#

A convenient feature: you can “unpack” a tuple into separate variables:

point = (3.5, 2.1)
x, y = point

print(x)  # 3.5
print(y)  # 2.1

This is why functions can effectively return multiple values; they return a tuple, and you unpack it.
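
A small sketch of that pattern follows. The function temperature_range is a hypothetical helper written for this example, not something built into Python.

```python
def temperature_range(temps):
    """Return the lowest and highest value as a tuple (hypothetical helper)."""
    return min(temps), max(temps)   # the comma builds a tuple

result = temperature_range([400, 300, 500, 350])
print(result)          # (300, 500)
print(type(result))    # <class 'tuple'>

# Unpack the returned tuple straight into two variables
low, high = temperature_range([400, 300, 500, 350])
print(low, high)       # 300 500
```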

Now, lists and tuples are great for ordered data. But what if you want to look up values by name rather than by position?

Dictionaries: Looking Up Values by Name#

Imagine you want to store properties of water: formula, molar mass, boiling point. With a list, you’d have to remember that index 0 is formula, index 1 is molar mass, etc. That’s error-prone.

A dictionary lets you use meaningful names (called “keys”) instead of numbers:

water = {
    "formula": "H2O",
    "molar_mass": 18.015,
    "boiling_point": 373.15,
    "density": 1000
}

Now you can access values by name:

print(water["formula"])
print(water["molar_mass"])
print(water.get("viscosity", "not found"))

Output:

H2O
18.015
not found

Using .get() with a default value prevents errors when a key doesn’t exist. Compare this to water["viscosity"], which would stop the program with a KeyError.
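
For comparison, here are three ways of handling a key that may be missing, shown as a sketch. Which one reads best depends on the situation. The name missing_key is illustrative.

```python
water = {"formula": "H2O", "molar_mass": 18.015}

# Option 1: check for the key first
if "viscosity" in water:
    print(water["viscosity"])
else:
    print("no viscosity stored")

# Option 2: .get() returns None when no default is given
print(water.get("viscosity"))            # None

# Option 3: catch the KeyError raised by square-bracket access
try:
    water["viscosity"]
except KeyError as error:
    missing_key = error.args[0]
    print("Missing key:", missing_key)   # Missing key: viscosity
```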

Modifying Dictionaries#

Unlike tuples, dictionaries can be modified:

water["viscosity"] = 0.001    # Add new key-value pair
water["density"] = 997        # Update existing value
del water["boiling_point"]    # Remove a key
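
To see the combined effect, the sketch below applies the same three changes to the water dictionary defined earlier and prints the result.

```python
water = {
    "formula": "H2O",
    "molar_mass": 18.015,
    "boiling_point": 373.15,
    "density": 1000,
}

water["viscosity"] = 0.001    # new key goes at the end
water["density"] = 997        # existing key keeps its position
del water["boiling_point"]    # key and value are both removed

print(water)
# {'formula': 'H2O', 'molar_mass': 18.015, 'density': 997, 'viscosity': 0.001}
```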

Looping Through Dictionaries#

You’ll often need to iterate through all the items in a dictionary:

water = {"formula": "H2O", "molar_mass": 18.015, "density": 1000}

# Loop through keys and values together
for key, value in water.items():
    print(f"{key}: {value}")

Output:

formula: H2O
molar_mass: 18.015
density: 1000
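
If you only need the keys or only the values, a sketch like this one works. Looping over the dictionary itself is equivalent to looping over .keys().

```python
water = {"formula": "H2O", "molar_mass": 18.015, "density": 1000}

# Looping over the dictionary itself (or water.keys()) gives the keys
for key in water:
    print(key)

# water.values() gives only the values
for value in water.values():
    print(value)

print(list(water.keys()))     # ['formula', 'molar_mass', 'density']
print(list(water.values()))   # ['H2O', 18.015, 1000]
```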

Nested Dictionaries: A Practical Example#

Dictionaries can contain other dictionaries. This is perfect for storing properties of multiple compounds:

compounds = {
    "methane": {"formula": "CH4", "MW": 16.04, "Tc": 190.6},
    "ethane": {"formula": "C2H6", "MW": 30.07, "Tc": 305.3},
    "propane": {"formula": "C3H8", "MW": 44.10, "Tc": 369.8}
}

# Access nested data
print(compounds["ethane"]["MW"])  # 30.07

# Loop through all compounds
for name, props in compounds.items():
    print(f"{name}: MW = {props['MW']} g/mol")

Output:

30.07
methane: MW = 16.04 g/mol
ethane: MW = 30.07 g/mol
propane: MW = 44.1 g/mol
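
Nested dictionaries can be extended and filtered with the same tools. In the sketch below, the butane entry uses approximate values added only for illustration, and above_300 is a name introduced here.

```python
compounds = {
    "methane": {"formula": "CH4", "MW": 16.04, "Tc": 190.6},
    "ethane": {"formula": "C2H6", "MW": 30.07, "Tc": 305.3},
    "propane": {"formula": "C3H8", "MW": 44.10, "Tc": 369.8},
}

# Add a whole new compound (approximate n-butane values, for illustration)
compounds["butane"] = {"formula": "C4H10", "MW": 58.12, "Tc": 425.1}

# Collect the names of compounds whose Tc is above 300
above_300 = []
for name, props in compounds.items():
    if props["Tc"] > 300:
        above_300.append(name)

print(above_300)   # ['ethane', 'propane', 'butane']
```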

There’s one more data structure worth knowing about, especially when you need to find unique values or check membership quickly.

Sets: Unique Items Only#

A set is an unordered collection where duplicates are automatically removed:

elements = {"C", "H", "O", "N"}

# Try adding duplicates
numbers = {1, 2, 2, 3, 3, 3}
print(numbers)

Output:

{1, 2, 3}

Python kept only one copy of each value.

When Are Sets Useful?#

Sets are perfect for finding unique values. Say you have a list of all species in a reaction mechanism and want to know how many unique species there are:

all_species = ["H2", "O2", "H2O", "H2", "OH", "O2", "H2O"]
unique_species = set(all_species)
print(unique_species)
print(f"Found {len(unique_species)} unique species")

Output (the order of the items may differ on your machine, because sets are unordered):

{'H2', 'O2', 'H2O', 'OH'}
Found 4 unique species
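
Because a set has no fixed order, printing it can look different from run to run. If you need a predictable order, one option is to pass the set to sorted(), as in this sketch. The name species_list is illustrative.

```python
all_species = ["H2", "O2", "H2O", "H2", "OH", "O2", "H2O"]
unique_species = set(all_species)

# sorted() accepts a set and returns a list in a predictable order
species_list = sorted(unique_species)
print(species_list)        # ['H2', 'H2O', 'O2', 'OH']
print(len(species_list))   # 4
```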

Set Operations#

Sets support mathematical set operations:

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(a | b)   # Union: all items from both
print(a & b)   # Intersection: items in both
print(a - b)   # Difference: items in a but not in b

Output:

{1, 2, 3, 4, 5, 6}
{3, 4}
{1, 2}

The in operator checks membership. For a set this stays fast even when the set is large, because Python doesn’t have to scan every item:

print("H2O" in unique_species)  # True

So Which One Should I Use?#

Now you know four data structures. One last example with sets comes first, followed by a table summarizing how to choose.

Practical example: finding unique species#

reactions = ["A + B -> C", "C + D -> E", "A + E -> F"]

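# Sets written out by hand from the reactions above
all_species = {"A", "B", "C", "D", "E", "F"}
products = {"C", "E", "F"}                 # everything right of an arrow
reactants_only = all_species - products    # species that are never produced
print(reactants_only)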

Output (item order may vary):

{'A', 'B', 'D'}
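
The sets in this example were typed in by hand. As a sketch, they could also be built from the reaction strings themselves. This assumes every reaction contains exactly one " -> " and that species are separated by " + ", which holds for the three strings here but not for reaction notation in general.

```python
reactions = ["A + B -> C", "C + D -> E", "A + E -> F"]

reactants = set()
products = set()
for reaction in reactions:
    left, right = reaction.split(" -> ")    # assumes exactly one " -> "
    reactants.update(left.split(" + "))     # assumes " + " between species
    products.update(right.split(" + "))

all_species = reactants | products
reactants_only = all_species - products

print(sorted(all_species))      # ['A', 'B', 'C', 'D', 'E', 'F']
print(sorted(reactants_only))   # ['A', 'B', 'D']
```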

When to Use What?#

Structure	Use When…	Example
List	Order matters, items may change	Time series data, experiment results
Tuple	Data shouldn’t change, returning multiple values	Coordinates, RGB colors, constants
Dictionary	You need to look up values by name	Compound properties, configuration settings
Set	You need unique items or set operations	Finding unique species, checking membership

Quick Reference#

Lists#

Operation	Syntax	Example
Create	`[item1, item2, ...]`	`temps = [300, 350, 400]`
Access	`list[i]`	`temps[0]` → `300`
Slice	`list[start:end]`	`temps[1:3]` → `[350, 400]`
Append	`list.append(item)`	`temps.append(450)`
Insert	`list.insert(i, item)`	`temps.insert(0, 250)`
Remove	`list.remove(item)`	`temps.remove(350)`
Pop	`list.pop()`	`last = temps.pop()`
Length	`len(list)`	`len(temps)` → `3`
Sort	`sorted(list)` or `list.sort()`	`sorted(temps)`

Tuples#

Operation	Syntax	Example
Create	`(item1, item2, ...)`	`point = (3.5, 2.1)`
Access	`tuple[i]`	`point[0]` → `3.5`
Unpack	`a, b = tuple`	`x, y = point`

Dictionaries#

Operation	Syntax	Example
Create	`{key: value, ...}`	`water = {"MW": 18.0}`
Access	`dict[key]`	`water["MW"]` → `18.0`
Safe access	`dict.get(key, default)`	`water.get("Tc", 0)`
Add/update	`dict[key] = value`	`water["Tb"] = 373`
Delete	`del dict[key]`	`del water["Tb"]`
Keys	`dict.keys()`	`water.keys()`
Values	`dict.values()`	`water.values()`
Items	`dict.items()`	`water.items()`

Sets#

Operation	Syntax	Example
Create	`{item1, item2, ...}`	`elements = {"C", "H", "O"}`
Add	`set.add(item)`	`elements.add("N")`
Union	`a | b`	`{1,2} | {2,3}` → `{1,2,3}`
Intersection	`a & b`	`{1,2} & {2,3}` → `{2}`
Difference	`a - b`	`{1,2} - {2,3}` → `{1}`
Membership	`item in set`	`"C" in elements` → `True`

Next Steps#

Continue to Iterators to learn how to loop through data structures efficiently.
