Data Structures#
The Problem: Storing More Than One Value#
So far, we’ve stored single values in variables: one temperature, one pressure, one mass. But what if you’re running Euler’s method and need to track the parachutist’s velocity at every timestep?
You could create separate variables:
v0 = 0.0
v1 = 19.6
v2 = 32.0
v3 = 39.85
# ... and so on for 1000 timesteps?
This is clearly impractical. What you need is a way to store multiple values in a single variable. That’s exactly what data structures are for.
Python has four built-in data structures, each designed for different situations. Let’s start with the most common one.
Lists: Ordered Collections You Can Change#
A list holds multiple items in order. You create one with square brackets:
velocities = [0.0, 19.6, 32.0, 39.85, 44.82]
compounds = ["methane", "ethane", "propane"]
Now velocities is a single variable containing all five values. Much better than five separate variables!
Accessing Items by Position#
Items in a list are numbered starting from 0, not 1. This trips up many beginners:
temps = [300, 350, 400, 450, 500]
print(temps[0]) # First item
print(temps[2]) # Third item
print(temps[-1]) # Last item
print(temps[-2]) # Second to last
Output:
300
400
500
450
The negative indices are a nice Python feature: -1 always means the last item, regardless of how long the list is.
Getting Multiple Items: Slicing#
What if you want items 1 through 3? Use a slice:
temps = [300, 350, 400, 450, 500]
print(temps[1:3]) # Items at index 1 and 2
print(temps[:3]) # From start to index 2
print(temps[2:]) # From index 2 to end
print(temps[::2]) # Every 2nd item
Output:
[350, 400]
[300, 350, 400]
[400, 450, 500]
[300, 400, 500]
Notice that [1:3] gives you indices 1 and 2, but not 3. The end index is always excluded. This might seem odd, but it has a nice property: temps[:3] and temps[3:] together give you the whole list with no overlap.
Modifying Lists#
Unlike some other data structures we’ll see, lists can be changed after creation:
temps = [300, 350, 400]
temps[1] = 375 # Change the second item
temps.append(450) # Add to the end
temps.insert(0, 250) # Insert at the beginning
temps.remove(375) # Remove a specific value
last = temps.pop() # Remove and return the last item
print(temps)
Output:
[250, 300, 400]
Each of these operations modifies the list in place. You don’t need to create a new list.
Common List Operations#
For data analysis, you’ll often need the length, sum, minimum, or maximum:
temps = [400, 300, 500, 350]
print(len(temps)) # 4 (number of items)
print(min(temps)) # 300 (smallest)
print(max(temps)) # 500 (largest)
print(sum(temps)) # 1550 (total)
print(sorted(temps)) # [300, 350, 400, 500] (new sorted list)
Note that sorted() returns a new sorted list. The original stays unchanged. If you want to sort in place, use temps.sort().
Tip
For more built-in functions that work with lists (like sum(), min(), max()), see Common Functions.
Tuples: When Data Shouldn’t Change#
Lists are great, but sometimes you want to guarantee that data can’t be modified. That’s what tuples are for.
A tuple looks like a list, but uses parentheses instead of brackets:
point = (3.5, 2.1)
constants = (8.314, 6.022e23, 1.38e-23) # R, Avogadro, Boltzmann
You can access items the same way as lists:
print(point[0]) # 3.5
print(point[1]) # 2.1
But if you try to change an item, Python raises an error:
point[0] = 4.0 # TypeError: 'tuple' object does not support item assignment
Why Would I Want That?#
You might wonder: why would I want something I can’t change? A few reasons:
Safety: You can’t accidentally modify data that shouldn’t change (like physical constants)
Dictionary keys: Tuples can be used as dictionary keys, lists cannot
Multiple return values: Functions often return tuples
Unpacking Tuples#
A convenient feature: you can “unpack” a tuple into separate variables:
point = (3.5, 2.1)
x, y = point
print(x) # 3.5
print(y) # 2.1
This is why functions can effectively return multiple values; they return a tuple, and you unpack it.
Now, lists and tuples are great for ordered data. But what if you want to look up values by name rather than by position?
Dictionaries: Looking Up Values by Name#
Imagine you want to store properties of water: formula, molar mass, boiling point. With a list, you’d have to remember that index 0 is formula, index 1 is molar mass, etc. That’s error-prone.
A dictionary lets you use meaningful names (called “keys”) instead of numbers:
water = {
"formula": "H2O",
"molar_mass": 18.015,
"boiling_point": 373.15,
"density": 1000
}
Now you can access values by name:
print(water["formula"])
print(water["molar_mass"])
print(water.get("viscosity", "not found"))
Output:
H2O
18.015
not found
Using .get() with a default value prevents errors when a key doesn’t exist. Compare this to water["viscosity"], which would crash with a KeyError.
Modifying Dictionaries#
Unlike tuples, dictionaries can be modified:
water["viscosity"] = 0.001 # Add new key-value pair
water["density"] = 997 # Update existing value
del water["boiling_point"] # Remove a key
Looping Through Dictionaries#
You’ll often need to iterate through all the items in a dictionary:
water = {"formula": "H2O", "molar_mass": 18.015, "density": 1000}
# Loop through keys and values together
for key, value in water.items():
print(f"{key}: {value}")
Output:
formula: H2O
molar_mass: 18.015
density: 1000
Nested Dictionaries: A Practical Example#
Dictionaries can contain other dictionaries. This is perfect for storing properties of multiple compounds:
compounds = {
"methane": {"formula": "CH4", "MW": 16.04, "Tc": 190.6},
"ethane": {"formula": "C2H6", "MW": 30.07, "Tc": 305.3},
"propane": {"formula": "C3H8", "MW": 44.10, "Tc": 369.8}
}
# Access nested data
print(compounds["ethane"]["MW"]) # 30.07
# Loop through all compounds
for name, props in compounds.items():
print(f"{name}: MW = {props['MW']} g/mol")
Output:
30.07
methane: MW = 16.04 g/mol
ethane: MW = 30.07 g/mol
propane: MW = 44.1 g/mol
There’s one more data structure worth knowing about, especially when you need to find unique values or check membership quickly.
Sets: Unique Items Only#
A set is an unordered collection where duplicates are automatically removed:
elements = {"C", "H", "O", "N"}
# Try adding duplicates
numbers = {1, 2, 2, 3, 3, 3}
print(numbers)
Output:
{1, 2, 3}
Python kept only one copy of each value.
When Are Sets Useful?#
Sets are perfect for finding unique values. Say you have a list of all species in a reaction mechanism and want to know how many unique species there are:
all_species = ["H2", "O2", "H2O", "H2", "OH", "O2", "H2O"]
unique_species = set(all_species)
print(unique_species)
print(f"Found {len(unique_species)} unique species")
Output:
{'H2', 'O2', 'H2O', 'OH'}
Found 4 unique species
Set Operations#
Sets support mathematical set operations:
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(a | b) # Union: all items from both
print(a & b) # Intersection: items in both
print(a - b) # Difference: items in a but not in b
Output:
{1, 2, 3, 4, 5, 6}
{3, 4}
{1, 2}
The in operator checks membership very quickly:
print("H2O" in unique_species) # True
So Which One Should I Use?#
Now you know four data structures. Here’s how to choose:
Practical example: finding unique species#
reactions = ["A + B -> C", "C + D -> E", "A + E -> F"]
all_species = {"A", "B", "C", "D", "E", "F"}
products = {"C", "E", "F"}
reactants_only = all_species - products
print(reactants_only)
Output:
{'A', 'B', 'D'}
When to Use What?#
Structure |
Use When… |
Example |
|---|---|---|
List |
Order matters, items may change |
Time series data, experiment results |
Tuple |
Data shouldn’t change, returning multiple values |
Coordinates, RGB colors, constants |
Dictionary |
You need to look up values by name |
Compound properties, configuration settings |
Set |
You need unique items or set operations |
Finding unique species, checking membership |
Quick Reference#
Lists#
Operation |
Syntax |
Example |
|---|---|---|
Create |
|
|
Access |
|
|
Slice |
|
|
Append |
|
|
Insert |
|
|
Remove |
|
|
Pop |
|
|
Length |
|
|
Sort |
|
|
Tuples#
Operation |
Syntax |
Example |
|---|---|---|
Create |
|
|
Access |
|
|
Unpack |
|
|
Dictionaries#
Operation |
Syntax |
Example |
|---|---|---|
Create |
|
|
Access |
|
|
Safe access |
|
|
Add/update |
|
|
Delete |
|
|
Keys |
|
|
Values |
|
|
Items |
|
|
Sets#
Operation |
Syntax |
Example |
|---|---|---|
Create |
|
|
Add |
|
|
Union |
|
|
Intersection |
|
|
Difference |
|
|
Membership |
|
|
Next Steps#
Continue to Iterators to learn how to loop through data structures efficiently.