Explaining the Principle of Least Action

The principle of least action is one of the most profound and far-reaching ideas in physics. It's a different way of looking at things that underlies a huge amount of what we humans have learned about the world in the last few hundred years—from Newtonian mechanics, to relativity, to quantum mechanics, and quantum field theory.

The basic idea goes like this: say you have a particle that travels from point 1 to point 2. What trajectory is it going to follow to get there? Newton gave us one approach to answering this question, but beginning in the 1700s and 1800s Lagrange, Hamilton, and others developed a different strategy. They assigned to each possible path a number called the action, and showed that the path the particle actually takes is the one that minimizes the action. Actually, in quantum mechanics, Feynman showed that the particle in a sense traverses all the possible paths, and the classical path that minimizes the action is the one that dominates.

But we're getting ahead of ourselves. What I want to do here is explain the principle of least action and show how it reproduces the equations you already know and love from Newton's laws. This is the first in a series I'm working on in which I hope to show you the action principle not only in Newtonian mechanics, but in special relativity and general relativity, and even in string theory too, in as accessible a way as possible. For right now I'll just assume that you're familiar with Newton's law $F = ma$, potential energy, and some basic calculus.

So let's say we have a particle of mass $m$ that travels from the point $x_1$ at time $t_1$ to $x_2$ at time $t_2$. What trajectory $x(t)$ is it going to follow to get there? I'm going to work with one spatial dimension $x$ here to keep things from getting unnecessarily complicated, but of course all of this discussion will generalize to three dimensions.

Newton told us that to answer this question we should write down all the forces on the particle, add them up, and then set that equal to the mass times the acceleration:

$$F = m \ddot x.$$

The dots here stand for rates of change—so $x(t)$ is the trajectory that we're looking for, $\dot x(t)$ denotes the velocity function, and $\ddot x(t)$ is the acceleration.

This equation is called the equation of motion. It's a second order differential equation that we would then need to solve to figure out $x(t)$. But that's a math problem—the physics is about how we write down this differential equation in the first place, and Newton gave us one way to do it. The principle of least action is going to give us another way.

You've hopefully learned about how the force $F(x)$ is related to the potential energy function $U(x)$; if not you can check out this earlier video I made. The relation is that the force is minus the slope of the potential energy:

$$F(x) = -\frac{\mathrm{d}U}{\mathrm{d}x}.$$

For example, for gravity acting on a projectile the potential is $U = mg x$ (or $mg y$, if you prefer to use the name $y$ to label the height of the particle), which is a straight line with slope $mg$. Then the force is minus that, $F = - mg$. Or for a mass on a spring, the potential energy is $U = \frac{1}{2} k(x - l)^2$, whose slope is $k(x-l)$, from which we get the spring force $F = - k(x-l)$.
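As a quick sanity check on the relation $F = -\mathrm{d}U/\mathrm{d}x$, here is a minimal Python sketch that estimates the slope of each potential numerically and compares it with the force formulas above. The helper name `force_from_potential` and all the numerical values for $m$, $g$, $k$, and $l$ are illustrative assumptions, not anything fixed by the discussion.

```python
def force_from_potential(U, x, h=1e-6):
    """Estimate F(x) = -dU/dx with a central finite difference of step h."""
    return -(U(x + h) - U(x - h)) / (2 * h)

# Illustrative values only (assumed for this sketch).
m, g = 2.0, 9.8   # mass and gravitational acceleration
k, l = 5.0, 0.3   # spring constant and rest length

def U_gravity(x):
    return m * g * x

def U_spring(x):
    return 0.5 * k * (x - l) ** 2

x = 1.0
F_gravity = force_from_potential(U_gravity, x)
F_spring = force_from_potential(U_spring, x)

# Each line prints the numerical estimate next to the formula from the text.
print(F_gravity, -m * g)
print(F_spring, -k * (x - l))
```

The two numbers on each printed line should agree up to small rounding error.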

In terms of the potential energy then, we can write the equation of motion as

$$m \ddot x = -\frac{\mathrm{d} U }{\mathrm{d} x }.$$

Now I want to show you the new route to this equation. $U$ is the potential energy of the particle, and of course we can also write down the kinetic energy $K = \frac{1}{2} m \dot x^2$. Their sum $E = K + U$ is the total energy. But actually, right now we want to write down their difference, $K - U$:

$$L = \frac{1}{2} m \dot x^2 - U(x).$$

This combination is called the Lagrangian function, and right now it might look like it's coming out of left field, but let's see where it leads us. For any path $x(t)$ between the particle's starting position $x(t_1) = x_1$ and its ending position $x(t_2) = x_2$, define a number $S$ by integrating the Lagrangian along the curve:

$$S = \int_{t_1}^{t_2} \mathrm{d}t \left( \frac{1}{2} m \dot x(t)^2 - U(x(t)) \right).$$

$S$ is called the action of the path. So far $x(t)$ can be any curve connecting the two given endpoints. We want to use the action to figure out the actual trajectory that the particle will follow.
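To make the definition concrete, here is a rough Python sketch that approximates $S$ for a path sampled at equally spaced times. The function name `action`, the midpoint-style discretization, and the two trial paths are all assumptions made for illustration; the potential is the gravity example $U = mgx$.

```python
def action(xs, t1, t2, m, U):
    """Approximate S for a path given by samples xs at equally spaced times.

    xs[0] and xs[-1] are the fixed endpoints x1 and x2. On each small time
    step the velocity is the slope of the segment, and U is evaluated at the
    average position of the segment.
    """
    n = len(xs) - 1
    dt = (t2 - t1) / n
    S = 0.0
    for xa, xb in zip(xs, xs[1:]):
        v = (xb - xa) / dt
        S += dt * (0.5 * m * v**2 - U(0.5 * (xa + xb)))
    return S

# Illustrative values only (assumed for this sketch).
m, g = 1.0, 9.8
t1, t2, n = 0.0, 1.0, 1000

def U(x):
    return m * g * x

ts = [t1 + (t2 - t1) * k / n for k in range(n + 1)]

# Two trial paths that both start and end at x = 0 (this form assumes t1 = 0).
flat = [0.0 for t in ts]                     # the particle just sits still
arc = [0.5 * g * t * (t2 - t) for t in ts]   # the particle rises and falls back

S_flat = action(flat, t1, t2, m, U)
S_arc = action(arc, t1, t2, m, U)

# For the arc the integral can be done by hand: S = -m g^2 (t2 - t1)^3 / 24.
print(S_flat, S_arc, -m * g**2 * (t2 - t1) ** 3 / 24)
```

Every path with the same endpoints gets its own number $S$ this way; the two paths here are just examples.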

And here's the claim: the trajectory that the particle follows is the one for which the action $S$ is minimized. It's therefore called the principle of least action. Actually, minimized is too strong a word here—there can be situations where the solution is a saddle instead of a minimum. But let's focus on the typical case here.

So how do we see that this claim is true? Think back to your calculus classes, where you have some function $f(x)$ and you want to find its minima, maxima, or saddle points. These are called extremal or critical points. They're the points where the slope of the function vanishes: $f'(x) = 0$. In other words, say we look at the function at a tiny distance $\varepsilon$ away from a minimum. We can Taylor expand it like so:

$$f(x+\varepsilon) = f(x) + f'(x)\varepsilon + \frac{1}{2} f''(x)\varepsilon^2 + \cdots$$

where the $\cdots$ stand for higher powers of $\varepsilon$. If $\varepsilon$ is tiny, then these higher-order corrections get tinier and tinier and are unimportant. And if $x$ is a minimum, then $f'(x) = 0$, so the term linear in the displacement $\varepsilon$ vanishes. That means that when you take a little step $\varepsilon$ away from an extremum, to leading order the value of the function doesn't change at all! This can in fact be taken as the defining property of an extremal point.
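Here is a small numerical illustration of that statement, using a made-up function $f(x) = (x-1)^2 + 3$ whose minimum sits at $x = 1$. The function and the helper name `change` are hypothetical choices for this sketch only.

```python
def f(x):
    # Hypothetical example function with its minimum at x = 1.
    return (x - 1.0) ** 2 + 3.0

def change(x, eps):
    """How much f changes when we step a distance eps away from x."""
    return f(x + eps) - f(x)

for eps in (1e-1, 1e-2, 1e-3):
    # At the minimum the change scales like eps**2;
    # away from it (here at x = 2) it scales like eps.
    print(eps, change(1.0, eps), change(2.0, eps))
```

Shrinking $\varepsilon$ by a factor of ten shrinks the change at the minimum by roughly a hundred, but the change at $x = 2$ only by roughly ten.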

The same idea goes for our action $S$ and the critical path $x(t)$. If we successfully find the trajectory $x(t)$ that minimizes the action, then for any nearby path $x(t) + \varepsilon(t)$, the value of $S$ should be unchanged to leading order in $\varepsilon$, where $\varepsilon(t)$ is a deformation that can add little "wiggles" to the original curve $x(t)$. So, let's expand our Lagrangian in powers of $\varepsilon(t)$:

$$L(x+\varepsilon) = \frac{1}{2} m (\dot x + \dot \varepsilon)^2 - U(x + \varepsilon).$$

In the first term, we get $(\dot x + \dot \varepsilon)^2 = \dot x^2 + 2 \dot x\dot \varepsilon + \dot \varepsilon^2$, but remember we don't care about things with more than one power of $\varepsilon$ here, so we'll forget about the $\dot\varepsilon^2$. And in the second term we'll apply our usual Taylor series, $U(x+\varepsilon) = U(x) + U'(x) \varepsilon + \cdots$. So we get

$$L(x+\varepsilon) = \frac{1}{2} m (\dot x^2 + 2 \dot x\dot \varepsilon ) - U(x) - U'(x) \varepsilon$$

to leading order in $\varepsilon$. Remember $\dot x(t)$ stands for the derivative of $x(t)$ with respect to $t$, and I'm using $U'(x)$ to denote the derivative of $U(x)$ with respect to $x$.

We can rearrange this a bit like so:

$$L(x+\varepsilon) = \left(\frac{1}{2} m \dot x^2 - U(x)\right) + \left( m \dot x\dot \varepsilon - U'(x) \varepsilon \right).$$

The reason I did that is that now the first term in parentheses is the original Lagrangian $L(x)$, and then the second term is the first correction when you take a little step $\varepsilon$ away. At a minimum of the action, the value is supposed to be unchanged to leading order, and so the contribution from the second piece should vanish:

$$\int_{t_1}^{t_2}\mathrm{d}t \left( m \dot x(t) \dot \varepsilon(t) - U'(x(t))\varepsilon(t) \right) = 0,$$

for any little deformation $\varepsilon(t)$. How can we ensure that this happens? Let's use integration by parts to rewrite the first term as

$$ m \dot x(t) \frac{\mathrm{d} }{\mathrm{d} t } \varepsilon(t) = - m \ddot x(t) \varepsilon + \frac{\mathrm{d} }{\mathrm{d} t } \left(m \dot x(t) \varepsilon(t) \right).$$

When we take the integral from $t_1$ to $t_2$, the second term is just going to give

$$\int_{t_1}^{t_2} \mathrm{d}t ~\frac{\mathrm{d} }{\mathrm{d} t }(m \dot x(t) \varepsilon(t)) = m \dot x(t_2) \varepsilon(t_2) - m \dot x(t_1) \varepsilon(t_1).$$

Remember that we're considering all the paths here that go from $x(t_1) = x_1$ to $x(t_2) = x_2.$ Our deformation $x(t) \to x(t) + \varepsilon(t)$ is a tiny variation of such a path, but we don't want it to change the boundary conditions. So we're only going to allow $\varepsilon$'s that vanish at the boundaries, $\varepsilon(t_1) = \varepsilon(t_2) = 0$. Then the contribution from this second term vanishes!

The leading change in the action is then

$$\int_{t_1}^{t_2}\mathrm{d}t \left( -m \ddot x - U'(x) \right) \varepsilon = 0,$$

where I've pulled out the factor of $\varepsilon$ that's now common to both terms after we've integrated by parts. Since this is supposed to vanish for any deformation $\varepsilon$, we conclude that a trajectory $x(t)$ that minimizes the action must satisfy

$$-m \ddot x - U'(x) = 0.$$

But that's just our equation of motion from before:

$$m \ddot x = - \frac{\mathrm{d} U }{\mathrm{d} x }.$$

So that proves our claim: of all the paths $x(t)$ that the particle could follow to get from point 1 to point 2, the one it actually takes is the path for which the action $S$ is minimized!
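We can also watch this happen numerically. The sketch below takes the gravity example $U = mgx$, for which the solution of $m\ddot x = -mg$ that starts and ends at $x = 0$ is a parabola, and then adds a wiggle $a \sin(\pi t/T)$ that vanishes at both endpoints. The function names, the discretization, the particular wiggle, and all the numerical values are assumptions chosen for illustration.

```python
import math

def action(x, t1, t2, m, U, n=2000):
    """Approximate S for a path given as a function x(t), using n time steps."""
    dt = (t2 - t1) / n
    S = 0.0
    for k in range(n):
        xa = x(t1 + k * dt)
        xb = x(t1 + (k + 1) * dt)
        v = (xb - xa) / dt
        S += dt * (0.5 * m * v**2 - U(0.5 * (xa + xb)))
    return S

# Illustrative values only (assumed for this sketch).
m, g, T = 1.0, 9.8, 1.0

def U(x):
    return m * g * x

def x_classical(t):
    # Solves m x'' = -m g with x(0) = x(T) = 0.
    return 0.5 * g * t * (T - t)

def deformed(a):
    # Classical path plus a wiggle of amplitude a that vanishes at t = 0 and t = T.
    return lambda t: x_classical(t) + a * math.sin(math.pi * t / T)

amplitudes = [-0.2, -0.1, 0.0, 0.1, 0.2]
actions = {a: action(deformed(a), 0.0, T, m, U) for a in amplitudes}

for a in amplitudes:
    print(f"a = {a:+.1f}   S = {actions[a]:.5f}   S - S(0) = {actions[a] - actions[0.0]:.5f}")
```

The undeformed path $a = 0$ should come out with the smallest action, and the increase $S - S(0)$ should quadruple when $a$ doubles. That is the signature of a vanishing first-order term: the change in $S$ starts at order $a^2$.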

The principle of least action is an extremely powerful way of looking at physics, though if this is your first time encountering it I imagine you might think it looks more complicated than $F = ma$. But actually it's very often the most straightforward way to write down the equations of motion for a system. We computed the leading order change in the action pretty systematically just now, but there's a faster way to get to it once you know the deal: we're essentially taking a derivative of $L = \frac{1}{2} m \dot x^2 - U(x)$. Under the variation $x(t) \to x(t) + \varepsilon(t)$, the change in $L$ from the first term is $2 \cdot \frac{1}{2} m \dot x \dot \varepsilon$, just like taking a derivative, and from the second term it's $-U'(x) \varepsilon$. Then the change in the Lagrangian is $m \dot x \dot \varepsilon - U'(x) \varepsilon$, like we found before. Now we just integrate the $\dot \varepsilon$ term by parts, pull out the common factor of $\varepsilon$, and then we arrive at the equation of motion $-m \ddot x - U'(x) = 0.$ With a little practice you'll be able to do all that in your head and go straight from the Lagrangian to writing down the equation of motion.

Another awesome feature of the action approach is that we don't have to deal with any of the annoying vectors that show up in Newton's law. We just pick whatever coordinates we want to describe our system, write the Lagrangian $L = K -U$, and then take its derivative and set it equal to zero. For an explicit example, take a look at the mini-lesson I posted showing how to derive the equation of motion for a pendulum using the Lagrangian.
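To give a flavor of how that works, here is a brief sketch of the pendulum case; it isn't necessarily how the mini-lesson presents it. I'm assuming a mass $m$ on a massless rigid rod, and the symbols $\ell$ for the rod's length and $\theta$ for the angle from the downward vertical are my own labels. The speed of the mass is $\ell \dot\theta$ and its height above the lowest point is $\ell(1 - \cos\theta)$, so up to a constant that doesn't affect the equations of motion,

$$L = \frac{1}{2} m \ell^2 \dot\theta^2 + m g \ell \cos\theta.$$

Under $\theta(t) \to \theta(t) + \varepsilon(t)$ the Lagrangian changes by

$$m \ell^2 \dot\theta \dot\varepsilon - m g \ell \sin\theta \, \varepsilon,$$

and after integrating the $\dot\varepsilon$ term by parts and pulling out the common factor of $\varepsilon$ we get

$$-m \ell^2 \ddot\theta - m g \ell \sin\theta = 0, \qquad \text{or} \qquad \ddot\theta = -\frac{g}{\ell} \sin\theta.$$

No force vectors or tension in the rod ever had to be written down.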

Of course, all this generalizes to systems with multiple particles in multiple dimensions. You can find the general minimization condition once and for all by taking the variation of the action for a general Lagrangian $L(x_i, \dot x_i)$ with coordinates $x_i$ for any number of particles. We want to compute the change in $L$ when we vary $x_i(t) \to x_i(t) + \varepsilon_i(t)$. First we take the derivative of $L$ with respect to $x_i$, and then multiply by the change in $x_i$; that gives us

$$\frac{\partial L }{\partial x_i } \varepsilon_i.$$

The $\partial$'s here denote partial derivatives, which just means that we pretend everything in $L$ is a constant except for the variable that we want to take the derivative with respect to. Then we also have to account for the change in $\dot x_i.$ So we take the derivative of $L$ with respect to $\dot x_i$, and then multiply by its change $\dot \varepsilon_i.$ So we add on

$$\frac{\partial L }{\partial \dot x_i } \dot\varepsilon_i.$$

Lastly, we want to integrate this by parts again so that we can pull the $\varepsilon_i$ out. That gives us the change in the action:

$$\sum_i\int_{t_1}^{t_2}\mathrm{d}t \left( \frac{\partial L}{\partial x_i} - \frac{\mathrm{d} }{\mathrm{d} t } \frac{\partial L}{\partial \dot x_i} \right) \varepsilon_i = 0.$$

Requiring that this vanishes implies

$$\frac{\mathrm{d} }{\mathrm{d} t } \frac{\partial L}{\partial \dot x_i} = \frac{\partial L}{\partial x_i}$$

for each coordinate. This is called the Euler-Lagrange equation. In our example with $L = \frac{1}{2} m \dot x^2 - U(x)$, we have

$$\frac{\partial L}{\partial x} = - U'(x)$$

and

$$\frac{\mathrm{d} }{\mathrm{d} t } \frac{\partial L}{\partial \dot x} = \frac{\mathrm{d} }{\mathrm{d} t }(m \dot x) = m \ddot x.$$

Then the Euler-Lagrange equation gives $m\ddot x = - U'(x)$, just like we found by explicitly taking the variation of the action and setting it equal to zero. As a matter of fact, most of the time physicists compute the equations of motion like I showed by taking the variation of the action rather than having to remember the form of the Euler-Lagrange equation.
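As a short illustration with more than one coordinate, consider a projectile moving in two dimensions. Here I'm relabeling things for this example only: $x$ is the horizontal position and $y$ is the height. The Lagrangian is

$$L = \frac{1}{2} m \left( \dot x^2 + \dot y^2 \right) - m g y.$$

There is one Euler-Lagrange equation per coordinate. For $x$ we have $\partial L / \partial x = 0$ and $\partial L / \partial \dot x = m \dot x$, while for $y$ we have $\partial L / \partial y = -mg$ and $\partial L / \partial \dot y = m \dot y$, so

$$m \ddot x = 0, \qquad m \ddot y = - m g.$$

These are the familiar statements that the horizontal velocity is constant and the vertical acceleration is $-g$.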

All this might seem a little mysterious or even miraculous if this is your first time learning about the principle of least action. Where is this coming from? The last thing I want to do is briefly describe how the principle of least action arises from the classical limit of a more precise quantum mechanical treatment of the motion. Although, if you haven't learned quantum mechanics before I suppose this will be even more mysterious! But hopefully it'll make you curious to go off and keep learning more.

In quantum mechanics, all we can say is that the particle has a certain probability of traveling from its initial point $x(t_1) = x_1$ to its final point $x(t_2) = x_2.$ The rules of quantum mechanics tell us how to compute the transition amplitude, denoted $\langle x_2, t_2 | x_1, t_1 \rangle$, which is a complex number, and then the probability is given by the square of its absolute value. The famous physicist Richard Feynman showed (in his PhD thesis!) that this amplitude is related to the action as follows.

Consider the possible paths from the starting point to the end point. Assign a complex number to each path given by $e^{i S/\hbar}$, where $S$ is the action of the path and $\hbar$ is the (reduced) Planck constant, which characterizes the scale of quantum mechanical effects. Then the particle takes all the possible paths from point 1 to point 2, and if we add up these weights $e^{i S/\hbar}$ for each path we get the transition amplitude:

$$\langle x_2,t_2|x_1,t_1\rangle \propto \int \mathrm{D}x~ e^{i S/\hbar}.$$

The integral here stands for the sum over all the paths from $(x_1,t_1)$ to $(x_2,t_2)$. It's therefore called the path integral. It's more complicated than the ordinary integrals you're familiar with, because we're not just summing over a regular variable, we're summing over functions $x(t)$.

Now, what does this have to do with the principle of least action? These weights $e^{i S/\hbar}$ are phases, meaning that they're complex numbers of absolute value one. In other words, they're like arrows pointing on a circle of radius one. When you add up all the phases $e^{i S/\hbar}$ for all the paths the particle can take, these arrows mostly point in random directions all around the circle, and they add up to zero. The exception is for the paths near the one that minimizes the action: for those paths the action is nearly constant, since by definition the action doesn't change to leading order around the minimum. Then these contributions near the classical path have approximately the same weight $e^{i S/\hbar}$, and those arrows add constructively instead of cancelling out!
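Here is a toy numerical version of that cancellation, as a rough sketch rather than a real path integral. I'm assuming a single one-parameter family of paths labeled by a deformation amplitude $a$, with $S(a)/\hbar = S_\mathrm{classical}/\hbar + \lambda a^2$, and dropping the overall phase $e^{i S_\mathrm{classical}/\hbar}$, which is common to every path. The value of $\lambda$, the window sizes, and the function name `mean_phase` are all made up for illustration.

```python
import cmath

def mean_phase(a_min, a_max, lam, n=2001):
    """Average of exp(i * lam * a**2) over n equally spaced values of a."""
    total = 0j
    for k in range(n):
        a = a_min + (a_max - a_min) * k / (n - 1)
        total += cmath.exp(1j * lam * a * a)
    return total / n

lam = 50.0  # assumed toy value: how fast S / hbar grows away from the classical path

# Two windows of equal width in the deformation amplitude a.
near = abs(mean_phase(-0.1, 0.1, lam))  # paths close to the classical one
far = abs(mean_phase(2.0, 2.2, lam))    # paths far from the classical one

# A length near 1 means the arrows line up; a length near 0 means they cancel.
print(near, far)
```

In the window around $a = 0$ the phase barely changes, so the average arrow has length close to one. In the far window the phase winds around the circle several times, and the average arrow is much shorter.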

The result is that the path integral is dominated by the trajectory that minimizes the action, which as we've seen yields the classical solution, $\langle x_2, t_2|x_1,t_1\rangle \sim e^{i S_\mathrm{classical}/\hbar}$! So this is how the principle of least action emerges from quantum mechanics. Of course, now you should ask why the heck this path integral computes the transition amplitude like I claimed. But that will have to wait for another day.