Data Structures#

The Problem: Storing More Than One Value#

So far, we’ve stored single values in variables: one temperature, one pressure, one mass. But what if you’re running Euler’s method and need to track the parachutist’s velocity at every timestep?

You could create separate variables:

v0 = 0.0
v1 = 19.6
v2 = 32.0
v3 = 39.85
# ... and so on for 1000 timesteps?

This is clearly impractical. What you need is a way to store multiple values in a single variable. That’s exactly what data structures are for.

Python has four built-in data structures, each designed for different situations. Let’s start with the most common one.


Lists: Ordered Collections You Can Change#

A list holds multiple items in order. You create one with square brackets:

velocities = [0.0, 19.6, 32.0, 39.85, 44.82]
compounds = ["methane", "ethane", "propane"]

Now velocities is a single variable containing all five values. Much better than five separate variables!

Accessing Items by Position#

Items in a list are numbered starting from 0, not 1. This trips up many beginners:

temps = [300, 350, 400, 450, 500]

print(temps[0])   # First item
print(temps[2])   # Third item
print(temps[-1])  # Last item
print(temps[-2])  # Second to last

Output:

300
400
500
450

The negative indices are a nice Python feature: -1 always means the last item, regardless of how long the list is.

Getting Multiple Items: Slicing#

What if you want items 1 through 3? Use a slice:

temps = [300, 350, 400, 450, 500]

print(temps[1:3])   # Items at index 1 and 2
print(temps[:3])    # From start to index 2
print(temps[2:])    # From index 2 to end
print(temps[::2])   # Every 2nd item

Output:

[350, 400]
[300, 350, 400]
[400, 450, 500]
[300, 400, 500]

Notice that [1:3] gives you indices 1 and 2, but not 3. The end index is always excluded. This might seem odd, but it has a nice property: temps[:3] and temps[3:] together give you the whole list with no overlap.

Modifying Lists#

Unlike some other data structures we’ll see, lists can be changed after creation:

temps = [300, 350, 400]

temps[1] = 375           # Change the second item
temps.append(450)        # Add to the end
temps.insert(0, 250)     # Insert at the beginning
temps.remove(375)        # Remove a specific value
last = temps.pop()       # Remove and return the last item

print(temps)

Output:

[250, 300, 400]

Each of these operations modifies the list in place. You don’t need to create a new list.

Common List Operations#

For data analysis, you’ll often need the length, sum, minimum, or maximum:

temps = [400, 300, 500, 350]

print(len(temps))      # 4 (number of items)
print(min(temps))      # 300 (smallest)
print(max(temps))      # 500 (largest)
print(sum(temps))      # 1550 (total)
print(sorted(temps))   # [300, 350, 400, 500] (new sorted list)

Note that sorted() returns a new sorted list. The original stays unchanged. If you want to sort in place, use temps.sort().

Tip

For more built-in functions that work with lists (like sum(), min(), max()), see Common Functions.


Tuples: When Data Shouldn’t Change#

Lists are great, but sometimes you want to guarantee that data can’t be modified. That’s what tuples are for.

A tuple looks like a list, but uses parentheses instead of brackets:

point = (3.5, 2.1)
constants = (8.314, 6.022e23, 1.38e-23)  # R, Avogadro, Boltzmann

You can access items the same way as lists:

print(point[0])  # 3.5
print(point[1])  # 2.1

But if you try to change an item, Python raises an error:

point[0] = 4.0  # TypeError: 'tuple' object does not support item assignment

Why Would I Want That?#

You might wonder: why would I want something I can’t change? A few reasons:

  1. Safety: You can’t accidentally modify data that shouldn’t change (like physical constants)

  2. Dictionary keys: Tuples can be used as dictionary keys, lists cannot

  3. Multiple return values: Functions often return tuples

Unpacking Tuples#

A convenient feature: you can “unpack” a tuple into separate variables:

point = (3.5, 2.1)
x, y = point

print(x)  # 3.5
print(y)  # 2.1

This is why functions can effectively return multiple values; they return a tuple, and you unpack it.

Now, lists and tuples are great for ordered data. But what if you want to look up values by name rather than by position?


Dictionaries: Looking Up Values by Name#

Imagine you want to store properties of water: formula, molar mass, boiling point. With a list, you’d have to remember that index 0 is formula, index 1 is molar mass, etc. That’s error-prone.

A dictionary lets you use meaningful names (called “keys”) instead of numbers:

water = {
    "formula": "H2O",
    "molar_mass": 18.015,
    "boiling_point": 373.15,
    "density": 1000
}

Now you can access values by name:

print(water["formula"])
print(water["molar_mass"])
print(water.get("viscosity", "not found"))

Output:

H2O
18.015
not found

Using .get() with a default value prevents errors when a key doesn’t exist. Compare this to water["viscosity"], which would crash with a KeyError.

Modifying Dictionaries#

Unlike tuples, dictionaries can be modified:

water["viscosity"] = 0.001    # Add new key-value pair
water["density"] = 997        # Update existing value
del water["boiling_point"]    # Remove a key

Looping Through Dictionaries#

You’ll often need to iterate through all the items in a dictionary:

water = {"formula": "H2O", "molar_mass": 18.015, "density": 1000}

# Loop through keys and values together
for key, value in water.items():
    print(f"{key}: {value}")

Output:

formula: H2O
molar_mass: 18.015
density: 1000

Nested Dictionaries: A Practical Example#

Dictionaries can contain other dictionaries. This is perfect for storing properties of multiple compounds:

compounds = {
    "methane": {"formula": "CH4", "MW": 16.04, "Tc": 190.6},
    "ethane": {"formula": "C2H6", "MW": 30.07, "Tc": 305.3},
    "propane": {"formula": "C3H8", "MW": 44.10, "Tc": 369.8}
}

# Access nested data
print(compounds["ethane"]["MW"])  # 30.07

# Loop through all compounds
for name, props in compounds.items():
    print(f"{name}: MW = {props['MW']} g/mol")

Output:

30.07
methane: MW = 16.04 g/mol
ethane: MW = 30.07 g/mol
propane: MW = 44.1 g/mol

There’s one more data structure worth knowing about, especially when you need to find unique values or check membership quickly.


Sets: Unique Items Only#

A set is an unordered collection where duplicates are automatically removed:

elements = {"C", "H", "O", "N"}

# Try adding duplicates
numbers = {1, 2, 2, 3, 3, 3}
print(numbers)

Output:

{1, 2, 3}

Python kept only one copy of each value.

When Are Sets Useful?#

Sets are perfect for finding unique values. Say you have a list of all species in a reaction mechanism and want to know how many unique species there are:

all_species = ["H2", "O2", "H2O", "H2", "OH", "O2", "H2O"]
unique_species = set(all_species)
print(unique_species)
print(f"Found {len(unique_species)} unique species")

Output:

{'H2', 'O2', 'H2O', 'OH'}
Found 4 unique species

Set Operations#

Sets support mathematical set operations:

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(a | b)   # Union: all items from both
print(a & b)   # Intersection: items in both
print(a - b)   # Difference: items in a but not in b

Output:

{1, 2, 3, 4, 5, 6}
{3, 4}
{1, 2}

The in operator checks membership very quickly:

print("H2O" in unique_species)  # True

So Which One Should I Use?#

Now you know four data structures. Here’s how to choose:

Practical example: finding unique species#

reactions = ["A + B -> C", "C + D -> E", "A + E -> F"]

all_species = {"A", "B", "C", "D", "E", "F"}
products = {"C", "E", "F"}
reactants_only = all_species - products
print(reactants_only)

Output:

{'A', 'B', 'D'}

When to Use What?#

Structure

Use When…

Example

List

Order matters, items may change

Time series data, experiment results

Tuple

Data shouldn’t change, returning multiple values

Coordinates, RGB colors, constants

Dictionary

You need to look up values by name

Compound properties, configuration settings

Set

You need unique items or set operations

Finding unique species, checking membership


Quick Reference#

Lists#

Operation

Syntax

Example

Create

[item1, item2, ...]

temps = [300, 350, 400]

Access

list[i]

temps[0]300

Slice

list[start:end]

temps[1:3][350, 400]

Append

list.append(item)

temps.append(450)

Insert

list.insert(i, item)

temps.insert(0, 250)

Remove

list.remove(item)

temps.remove(350)

Pop

list.pop()

last = temps.pop()

Length

len(list)

len(temps)3

Sort

sorted(list) or list.sort()

sorted(temps)

Tuples#

Operation

Syntax

Example

Create

(item1, item2, ...)

point = (3.5, 2.1)

Access

tuple[i]

point[0]3.5

Unpack

a, b = tuple

x, y = point

Dictionaries#

Operation

Syntax

Example

Create

{key: value, ...}

water = {"MW": 18.0}

Access

dict[key]

water["MW"]18.0

Safe access

dict.get(key, default)

water.get("Tc", 0)

Add/update

dict[key] = value

water["Tb"] = 373

Delete

del dict[key]

del water["Tb"]

Keys

dict.keys()

water.keys()

Values

dict.values()

water.values()

Items

dict.items()

water.items()

Sets#

Operation

Syntax

Example

Create

{item1, item2, ...}

elements = {"C", "H", "O"}

Add

set.add(item)

elements.add("N")

Union

a | b

{1,2} | {2,3}{1,2,3}

Intersection

a & b

{1,2} & {2,3}{2}

Difference

a - b

{1,2} - {2,3}{1}

Membership

item in set

"C" in elementsTrue

Next Steps#

Continue to Iterators to learn how to loop through data structures efficiently.