4.3 The Method of the Most Probable Distribution

Huang, Statistical Mechanics, 2nd ed., Section 4.3

Where does the Boltzmann factor actually come from?

In Section 4.2 we derived the Maxwell-Boltzmann distribution from collisions. Clever, but it felt a bit... specific. Like, we used the Boltzmann transport equation, we talked about scattering cross sections, we assumed a dilute gas. It worked. But it left me wondering: is the Boltzmann factor \(e^{-\epsilon/k_BT}\) some deep truth about nature, or just a quirk of dilute gases?

Spoiler: it's a deep truth. And this section proves it using nothing but counting.

No collisions. No transport equations. No gas-specific assumptions. Just: "how many ways can you arrange \(N\) molecules across energy levels?" Maximize that number. Out pops the Boltzmann factor. Every time.

That's crazy. Let's see why.

Why should you care?

Because every time your simulation samples a configuration, it's doing this calculation implicitly.

Your NVT thermostat doesn't know about collisions or transport equations. It knows about states and weights. The Boltzmann factor \(e^{-\epsilon/k_BT}\) is the weight. This section shows you where that weight comes from: it's the arrangement that nature overwhelmingly prefers, simply because there are astronomically more ways to realize it than any alternative.

That's not a metaphor. It's literally counting.

The setup: cells and occupation numbers

Forget about collisions for a moment. Think about it differently.

You have \(N\) molecules in a box. Each molecule has some energy. Divide the space of possible single-molecule states (position + momentum) into little cells. Number them \(1, 2, \ldots, K\). Each cell \(i\) has energy \(\epsilon_i\).

Now, an "arrangement" is just a list of occupation numbers: \(n_1\) molecules in cell 1, \(n_2\) in cell 2, and so on. Two constraints:

\[\sum_i n_i = N \qquad \text{(right number of molecules)}\]
\[\sum_i \epsilon_i \, n_i = E \qquad \text{(right total energy)}\]

Different sets of occupation numbers \(\{n_i\}\) describe different distributions. The MB distribution is one specific choice. But how many microscopic states correspond to each choice? That's the key question.

Counting: how many ways?

If I hand you a specific set of occupation numbers \(\{n_i\}\), how many distinct ways can \(N\) distinguishable molecules be arranged to produce that distribution?

It's the multinomial:

\[\Omega\{n_i\} = \frac{N!}{n_1! \, n_2! \, \cdots \, n_K!}\]

This is just "N choose \(n_1\), then \(n_2\), then \(n_3\)..." from combinatorics. Nothing exotic.
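The multinomial is easy to sanity-check in code. A quick sketch (the `multiplicity` helper is just the formula above, nothing from Huang):

```python
from math import factorial

def multiplicity(ns):
    """Number of ways to arrange sum(ns) distinguishable molecules
    into cells with occupation numbers ns: the multinomial coefficient."""
    omega = factorial(sum(ns))
    for n in ns:
        omega //= factorial(n)
    return omega

# 4 molecules spread evenly over 2 cells vs. piled into one cell:
print(multiplicity([2, 2]))  # 4!/(2!2!) = 6 ways
print(multiplicity([4, 0]))  # 4!/(4!0!) = 1 way
```

For fixed \(N\), \(\Omega\) is largest when the molecules are spread out, and collapses to 1 when they all pile into a single cell.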

\(\Omega\) is the volume in phase space occupied by this distribution. Bigger \(\Omega\) = more microstates = more probable. The postulate of equal a priori probability (from Section 6.1) says every microstate is equally likely. So the distribution with the biggest \(\Omega\) wins.

And "wins" is an understatement. We'll see in a moment that the winner crushes everything else by a factor that makes \(10^{23}\) look small.
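You can watch the landslide start even at toy scale. A brute-force sketch (the setup is mine, not Huang's: 3 cells with made-up energies 0, 1, 2, and \(N = 6\), \(E = 6\)) that enumerates every allowed \(\{n_i\}\) and compares \(\Omega\):

```python
from math import factorial
from itertools import product

def multiplicity(ns):
    """Multinomial coefficient: microstates realizing occupation numbers ns."""
    omega = factorial(sum(ns))
    for n in ns:
        omega //= factorial(n)
    return omega

# Toy model: 3 cells with energies 0, 1, 2; N = 6 molecules, total E = 6.
eps, N, E = [0, 1, 2], 6, 6

# Keep only occupation-number sets satisfying both constraints.
dists = [ns for ns in product(range(N + 1), repeat=3)
         if sum(ns) == N and sum(e * n for e, n in zip(eps, ns)) == E]

for ns in sorted(dists, key=multiplicity, reverse=True):
    print(ns, multiplicity(ns))
```

Only four distributions satisfy the constraints, and the flattest one, (2, 2, 2), already owns 90 of the 141 total microstates (about 64%). The gap grows explosively with \(N\).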

Maximizing: Lagrange multipliers to the rescue

We want to maximize \(\log\Omega\) (easier than \(\Omega\) itself) subject to the two constraints. Take the log, hit it with Stirling's approximation (\(\log n! \approx n\log n - n\) for large \(n\)):

\[\log\Omega \approx N\log N - \sum_i n_i \log n_i\]

(the \(-N\) from Stirling's formula cancels against the \(+\sum_i n_i = N\) terms, so nothing is left over).
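Stirling's approximation is doing real work here, so it's worth a quick numerical check. A throwaway sketch using `lgamma` for \(\log n!\):

```python
from math import lgamma, log

# Compare log(n!) (computed via lgamma) with Stirling's n*log(n) - n.
for n in [10, 100, 10**4, 10**6]:
    exact = lgamma(n + 1)             # log(n!) without overflow
    stirling = n * log(n) - n
    print(n, (exact - stirling) / exact)  # relative error shrinks with n
```

At \(n = 10\) the relative error is a noticeable ~14%, but by \(n = 10^6\) it is below \(10^{-6}\), and the occupation numbers of a macroscopic system are far larger still.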

Now maximize. We have two constraints, so we need two Lagrange multipliers, \(\alpha\) and \(\beta\):

\[\frac{\partial}{\partial n_i}\left[\log\Omega - \alpha\sum_j n_j - \beta\sum_j \epsilon_j n_j\right] = 0\]

Work it out. Each \(n_i\) is independent, so set the derivative with respect to each one to zero:

\[-(\log n_i + 1) - \alpha - \beta\epsilon_i = 0\]

Solve for \(n_i\): exponentiate, and absorb the constant \(e^{-(1+\alpha)}\) into a single prefactor \(C\):

\[\bar{n}_i = C \, e^{-\beta\epsilon_i}\]

Stop. Look at that.

The most probable distribution is an exponential in the energy. That's the Boltzmann factor. It just fell out of counting arrangements and maximizing. No dynamics. No collisions. No Boltzmann transport equation. Pure combinatorics.

The constants \(C\) and \(\beta\) are fixed by the two constraints (\(\sum \bar{n}_i = N\) and \(\sum \epsilon_i \bar{n}_i = E\)), and when you work them out, \(\beta = 1/k_BT\). Same answer as 4.2. Same MB distribution. Completely different derivation.
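In practice you can pin down \(\beta\) numerically: the mean energy \(\sum_i \epsilon_i \bar{n}_i / N\) decreases monotonically with \(\beta\), so bisection works. A sketch with made-up energy levels (the helper names and the target \(E/N = 0.8\) are illustrative, not from Huang):

```python
import math

def mean_energy(beta, eps):
    """Average energy per molecule when n_i is proportional to exp(-beta * eps_i)."""
    w = [math.exp(-beta * e) for e in eps]
    return sum(e * wi for e, wi in zip(eps, w)) / sum(w)

def solve_beta(eps, e_target, lo=-50.0, hi=50.0):
    """Bisect for the beta that pins the mean energy per molecule at e_target.
    mean_energy decreases monotonically with beta, so bisection converges."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(mid, eps) > e_target:
            lo = mid  # energy still too high -> need a larger beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy levels; target E/N = 0.8 (below the flat-distribution mean 1.5, so beta > 0).
eps = [0.0, 1.0, 2.0, 3.0]
beta = solve_beta(eps, e_target=0.8)

# With beta fixed, C is set by the other constraint, sum(n_i) = N:
N = 10_000
Z = sum(math.exp(-beta * e) for e in eps)
n = [N * math.exp(-beta * e) / Z for e in eps]
print(f"beta = {beta:.4f}, occupations = {[round(x) for x in n]}")
```

The two constraints do exactly the jobs the multipliers promise: \(\beta\) dials in the total energy, \(C\) dials in the total number.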

Done. Beautiful.

Key Insight

The Boltzmann factor \(e^{-\epsilon/k_BT}\) is not a property of any particular kind of gas or interaction. It's a property of counting. Any system where you distribute particles across energy levels, subject to fixed total number and total energy, produces this exponential. Ideal gas, real gas, lattice model, doesn't matter. The Boltzmann factor is the most probable arrangement, and it wins by a landslide.

But how probable is "most probable"?

And I'm sure you're thinking: "OK, it's the most probable distribution. But what if the second-most-probable is almost as likely? Then 'most probable' doesn't mean much."

Fair question. Let's check.

The fluctuation in the occupation number \(n_k\) around its most probable value turns out to be:

\[\langle n_k^2 \rangle - \langle n_k \rangle^2 = \langle n_k \rangle\]

So the fractional fluctuation is:

\[\frac{\langle n_k^2 \rangle - \langle n_k \rangle^2}{\langle n_k \rangle^2} = \frac{1}{\langle n_k \rangle} \sim \frac{1}{N}\]

since a typical occupation \(\langle n_k \rangle\) scales with \(N\).

For \(N = 10^{23}\)? The fluctuation is one part in \(10^{23}\). The most probable distribution isn't just the winner. It's the only contestant that matters. Every other distribution has measure essentially zero.
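You can see the \(1/N\) scaling directly by sampling occupation numbers. A sketch (the cell probability \(p = 0.3\) is arbitrary; the count in any one cell of a multinomial is binomial, which is all we need here):

```python
import random
random.seed(0)

def occupation_stats(N, p, trials=500):
    """Drop N molecules into cells repeatedly; return the mean and variance
    of the occupation of one cell that catches each molecule w.p. p."""
    counts = []
    for _ in range(trials):
        counts.append(sum(1 for _ in range(N) if random.random() < p))
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / trials
    return mean, var

p = 0.3  # arbitrary weight for the cell we watch
for N in [100, 1000, 5000]:
    mean, var = occupation_stats(N, p)
    print(N, var / mean**2)  # fractional fluctuation shrinks like 1/N
```

Extrapolate that trend to \(N = 10^{23}\) and the distribution is, for all practical purposes, a delta function.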

Huang puts it best: imagine all possible states of the gas with given \(N\) and \(E\) placed in a jar. Wearing a blindfold, you reach in and pull one out. You will get the Maxwell-Boltzmann distribution. Not "probably." Not "almost certainly." You will. The probability of pulling anything else is so small it makes winning the lottery look like a sure thing.

MD Connection

This is why equilibrium simulations work. Your NVT trajectory wanders through phase space, visiting one microstate after another. It doesn't need to visit every microstate. It doesn't even need to visit most of them. It just needs to visit a representative sample. And since essentially all microstates look like the Boltzmann distribution (the fluctuations are \(10^{-23}\)), any reasonable sample will give you the right averages. You'd have to be astronomically unlucky to sample a non-Boltzmann distribution. That's not a hope. It's a theorem.
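As an illustration (not how an MD thermostat actually works — this is a toy Metropolis walker over three made-up levels): a sampler that only ever compares Boltzmann weights locally still lands on the full distribution, because that's where essentially all the states are.

```python
import math
import random

random.seed(1)

# Metropolis sampling of one particle over discrete energy levels: visit
# frequencies converge to the Boltzmann weights exp(-beta * eps_i) even
# though the walker never sees the whole distribution at once.
eps = [0.0, 1.0, 2.0]
beta = 1.0
state = 0
visits = [0, 0, 0]
for _ in range(200_000):
    trial = random.randrange(len(eps))
    # Accept with probability min(1, exp(-beta * dE)).
    if random.random() < math.exp(-beta * (eps[trial] - eps[state])):
        state = trial
    visits[state] += 1

Z = sum(math.exp(-beta * e) for e in eps)
for i, e in enumerate(eps):
    print(i, visits[i] / sum(visits), math.exp(-beta * e) / Z)
```

The sampled frequencies match \(e^{-\beta\epsilon_i}/Z\) to within sampling noise, which is the discrete-level version of what your NVT trajectory is doing in phase space.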

Two roads, same destination

Let's recap what just happened. We derived the MB distribution two completely different ways:

|  | Section 4.2 | Section 4.3 |
|---|---|---|
| Starting point | Boltzmann transport equation | Microcanonical ensemble (equal a priori probability) |
| Key input | Collisions conserve energy and momentum | Count arrangements, maximize \(\Omega\) |
| Method | Solve functional equation | Lagrange multipliers + Stirling |
| Result | \(f_0 \propto e^{-p^2/2mk_BT}\) | \(\bar{n}_i \propto e^{-\epsilon_i/k_BT}\) |
| Needs interactions? | Yes (collisions must happen) | No (just counting) |

Same answer. That's not a coincidence. The Boltzmann factor is the answer to "what does equilibrium look like for a classical system?" It doesn't matter how you ask the question.

Takeaway

The Boltzmann factor \(e^{-\epsilon/k_BT}\) isn't a model. It isn't an approximation. It's what you get when you count the number of ways to distribute particles across energy levels and pick the arrangement with the most ways. And that arrangement doesn't just win. It wins so completely that for \(10^{23}\) particles, literally nothing else exists. That's why your simulations sample Boltzmann-weighted states. Not because the thermostat is clever. Because there's nothing else to sample.

Check Your Understanding
  1. Stirling's approximation needs large \(n\). But some cells will definitely have \(n_i = 0\) or \(n_i = 1\). So why doesn't the whole derivation fall apart for a real system?
  2. You have a tiny system: 100 particles. The fluctuations around the most probable distribution are now \(\sim 1\%\), not \(10^{-23}\). Does the "most probable distribution wins by a landslide" argument still hold, or could other distributions actually matter here?
  3. We got the Boltzmann factor two completely different ways: collision dynamics in 4.2, pure counting in 4.3. Two roads, same \(e^{-\epsilon/k_BT}\). What does that tell you about the Boltzmann factor? Is it a property of gases, or something deeper?
  4. The Lagrange multiplier \(\beta\) exists because we forced total energy to be fixed at \(E\). What if you just... didn't? Drop the energy constraint entirely. What happens to the distribution?