The Dirac Delta Function, Explained
Introduction
The Dirac delta function is one of the most useful—but also most commonly misunderstood—objects that appears all across math, physics, and engineering. In this lesson, I'll show you both how to think about it theoretically, as well as five key applications where it shows up in practice.
To begin with, here's a sketch of the basic idea. Picture a simple rectangular function $f(x)$ of width $w$ and height $1/w$. The area underneath it—in other words, the integral of $f$—is then always equal to one, regardless of what we choose for $w$:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, f(x) = 1. $$
When $w$ is large, the function is wide and squat. And when $w$ is small, it's tall and narrow. In fact, in the limit that $w$ becomes vanishingly small, the graph becomes an infinitely tall spike that's equal to zero everywhere except at the origin, but still with that same area of one underneath it.
These two properties, roughly speaking, define the Dirac delta function, denoted $\delta(x)$: $$ \int\limits_{-\infty}^{\infty} \mathrm d x\, \delta(x) = 1, \qquad \delta(x)= \begin{cases} 0 & x \neq 0 \\ \infty & x = 0. \end{cases} $$
Or at least, this is the way that physicists usually like to define it.
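In fact, this squeezing behavior is easy to check numerically. Here's a minimal Python sketch (the grid resolution and the sample widths are just illustrative choices):

```python
import numpy as np

def rect(x, w):
    """Rectangular bump of width w and height 1/w, centered at the origin."""
    return np.where(np.abs(x) < w / 2, 1.0 / w, 0.0)

x, dx = np.linspace(-2, 2, 400001, retstep=True)  # fine uniform grid
for w in [1.0, 0.1, 0.01]:
    area = rect(x, w).sum() * dx                  # simple Riemann sum
    print(f"w = {w:5.2f}   height = {1/w:6.1f}   area = {area:.3f}")
# The height blows up as w -> 0, but the area stays pinned near 1.
```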
It's named for Paul Dirac—one of the founders of quantum mechanics—who wrote it down almost exactly one hundred years ago. (Although it had also been explored years earlier by Oliver Heaviside.) And, intuitively, it's meant to represent the density of a point-like source, as we'll discuss shortly.
Of course, a function that's equal to zero everywhere except at one point, where it's suddenly equal to infinity, is an extremely singular thing to ask for. In fact, the delta "function" doesn't technically even qualify as a function at all. And in trying to make rigorous sense of it, mathematicians had to generalize the very notion of a function itself. But again, we'll return to that topic shortly.
However you define it, though, it's hard to overstate just how prevalent the delta function is in applications throughout physics, math, and engineering. And by the end of this lesson, you'll understand why.
Motivation: The Density of a Point Particle
Before we get too deep into the math, let me try to motivate where the idea of the delta function comes from with the first of our five physical examples: the density of a point particle.
Suppose we have a solid ball with some total mass $m$ that's distributed uniformly throughout its volume, $V$. Then what's the density function $\rho(\vec r)$ that tells us the amount of mass per unit volume at each point in this space?
The density at any point inside the ball is a constant: the ball's mass $m$ divided by its volume $V$. And of course, at any point outside the ball, the density is equal to zero because there's no mass out there to be found. And so we obtain a simple function for the mass density at any point in this space:
$$ \rho(\vec r) = \begin{cases} \frac{m}{V} & \text{inside} \\[1mm] 0 & \text{outside}. \end{cases} $$
Going the other way, if somebody handed us some density function and asked us to compute the total mass, the way we would work it out is by slicing the space up into lots of tiny chunks. The mass of each little piece is then given by its tiny volume $\mathrm dv$, times the mass per volume $\rho(\vec r)$ at that location:
$$ \mathrm dm = \mathrm dv \, \rho(\vec r). $$
If we add up those contributions by integrating over all the little pieces making up the space, we'll obtain the total mass contained within the region:
$$ m = \int \mathrm dv\, \rho(\vec r). $$
Indeed, plugging in the density function for our solid ball, the only contribution to the integral comes from inside the ball, where the density is non-zero:
$$ \int_\mathrm{Ball} \mathrm dv\, \frac{m}{V} = \frac{m}{V} \cdot V. $$
The integral simply becomes the constant density $m/V$ times the ball's total volume $\int_\mathrm{Ball} \mathrm dv = V$, which, as expected, gives us back its mass, $m$.
So far so good. But very often in physics, we deal not with finite-size objects, but with point-like particles. So what happens to our density function in the limit that the ball shrinks down to become infinitesimally small?
It's easier to visualize things if we drop down to one dimension of space. In that case, our 3D ball turns into a 1D line segment, whose density is its mass $m$ divided by its width $w$.
If we make a plot of the density $\rho(x)$ as a function of $x$, it's equal to zero everywhere outside that tiny segment, and jumps up to the constant value $m/w$ inside. And, when we integrate the density function over the whole line, it once again computes the total mass, $m$:
$$ m = \int\limits_{-\infty}^{\infty} \mathrm dx\, \rho(x). $$
In other words, the area under the plot of the density function is always equal to the mass.
And now we can see what happens when we let $w$ shrink down to zero size, representing a point particle sitting at the origin of our 1D space. The density function becomes an infinitesimally narrow, infinitely tall spike—but still with that same fixed area of $m$ lying beneath it.
Therefore, the density function representing a point particle of mass $m$ located at $x = 0$ is a delta function, multiplied by a factor of $m$:
$$ \rho(x) = m \delta(x). $$
It's equal to zero everywhere apart from the location of the particle, but when we integrate it we get back the mass, $m$:
$$ \rho(x \neq 0) = 0, \qquad \int\limits_{-\infty}^\infty \mathrm dx\, \rho(x) = m. $$
I hope this motivating example makes it clear why the delta function emerges so naturally in physics, whenever we're dealing with the behavior of point particles.
And now that we have some physical motivation for where this strange beast comes from, let's return to try to define it more precisely and extract some of its key properties.
Key Properties
As we've now seen, we can think of the Dirac delta function as a sequence of ordinary functions that become more and more sharply peaked around the origin, but always with an area of one underneath.
There's nothing unique about the rectangular function we've been using so far, however. For example, we could just as well consider a Gaussian function $G(x)$ with standard deviation $\sigma$—in other words, a bell curve:
$$ G(x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{x^2}{2\sigma^2}}. $$
The area under the bell curve is always equal to one,
$$ \int\limits_{-\infty}^{\infty} \mathrm dx \, G(x) = 1, $$
and, if we let the standard deviation shrink down to zero size, it once again approaches the delta function.
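Here's the analogous numerical check for the Gaussian sequence, a quick sketch showing that the area stays fixed at one while the curve concentrates around the origin (the window $|x| < 0.05$ is an arbitrary choice):

```python
import numpy as np

def G(x, sigma):
    """Gaussian bump with standard deviation sigma; total area is 1."""
    return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

x, dx = np.linspace(-5, 5, 2_000_001, retstep=True)
for sigma in [1.0, 0.1, 0.01]:
    y = G(x, sigma)
    area = y.sum() * dx
    near = y[np.abs(x) < 0.05].sum() * dx   # how much area sits near x = 0
    print(f"sigma = {sigma:5.2f}   area = {area:.4f}   near origin: {near:.4f}")
# The total area is always 1, but as sigma -> 0 it all piles up at the origin.
```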
Whichever sequence you choose, the delta function always satisfies those same two properties: its integral is equal to one, and it vanishes at all but one point.
Clearly that leads to a very singular sort of object, and we'll come to the more precise way that mathematicians formalize all this in just a minute. But for now, let's keep following our nose and work out some of the other key properties satisfied by this object.
First, although we've been integrating $\delta(x)$ over the whole real line so far, we don't actually have to. After all, the integrand is equal to zero everywhere except the origin, and we can therefore restrict the range of the integral to any segment from point $a$ to point $b$, as long as it encloses the spike at the origin:
$$ \int\limits_a^b \mathrm dx\, \delta(x) = 1, \qquad a < 0 < b. $$
In fact, we can even choose an infinitesimally small neighborhood of the origin, say, from $-\varepsilon$ to $+\varepsilon$, and the integral would still be equal to one:
$$ \int\limits_{-\varepsilon}^\varepsilon \mathrm dx \, \delta(x) = 1. $$
Next, rather than considering just the integral of $\delta$ alone, let's see what happens when we integrate it against some other ordinary function, $f$:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(x) f(x)= ? $$
When we multiply $\delta$ by $f$, we still get zero everywhere apart from the origin. So it doesn't actually matter what $f$ looks like away from that point.
And since we're free to shrink the domain of the integral down to a small window surrounding the origin, we can just pull out the value of $f$ at that location:
$$ \int\limits_{-\varepsilon}^\varepsilon \mathrm dx\, \delta(x) f(x)= f(0)\int\limits_{-\varepsilon}^\varepsilon \mathrm dx\, \delta(x). $$
The remaining integral is equal to one, and therefore the effect of integrating a function against $\delta(x)$ is to pick out its value at the location of the spike:
$$ \int\limits_{-\infty}^\infty \mathrm dx \, \delta(x) f(x) = f(0). $$
This effect is often called the "sifting property" of the delta function, and it generalizes the fact that the integral of $\delta$ alone is equal to one, which corresponds to the special case where we choose $f$ to be a constant function, $f(x) = 1$.
Moreover, there's nothing special about the point $x = 0$ in the above discussion. We could just as well shift the location of our spike to some other point—call it $x = a$—simply by shifting the argument of the delta function to $\delta(x-a).$
In that case, the sifting property picks out the value of $f$ at point $a$, where the shifted delta function has its spike:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(x-a) f(x) = f(a). $$
The sifting property means that whenever you see an integral with a delta function in it, you should jump for joy! Because all you need to do to evaluate it is identify the point where the argument of the delta function is equal to zero, plug that value into whatever function is being integrated along with it, and you're done!
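To see the sifting property converge in practice, here's a small numerical sketch that smears the spike into a narrow Gaussian $\delta_\sigma(x-a)$ and watches the integral home in on $f(a)$ (the test function and spike location are arbitrary):

```python
import numpy as np

def delta_sigma(x, sigma):
    """Narrow Gaussian standing in for the delta function (area 1)."""
    return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

f, a = np.cos, 0.7                 # arbitrary test function and spike location
x, dx = np.linspace(-5, 5, 2_000_001, retstep=True)
for sigma in [0.5, 0.05, 0.005]:
    val = (delta_sigma(x - a, sigma) * f(x)).sum() * dx
    print(f"sigma = {sigma:6.3f}   integral = {val:.6f}")
print(f"f(a)  = {f(a):.6f}")       # the integral homes in on cos(0.7)
```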
To see an application of the sifting property in action, suppose we have a distribution of multiple particles on a line, and we want to determine the location of their center-of-mass. To start with, we can write down the total density function by adding up a series of delta functions—one at the location of each particle:
$$ \rho(x) = m_1 \delta(x - x_1) + \cdots + m_N \delta(x -x_N). $$
From there, the COM is defined by integrating the density against the coordinate $x$, divided by the total mass, $M$:
$$ x_\mathrm{CM} = \frac{1}{M}\int\limits_{-\infty}^\infty \mathrm dx\, x\rho(x). $$
Then, when we plug in our series of delta functions, we can apply the sifting property to each term separately; each one collapses the integral to the location of the corresponding particle:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, x \left( m_1 \delta(x - x_1) + \cdots + m_N \delta(x -x_N) \right) =m_1 x_1 + \cdots +m_N x_N. $$
We therefore obtain the usual weighted-average formula for the COM of a distribution of particles:
$$ x_\mathrm{CM} = \frac{m_1 x_1 + \cdots +m_N x_N}{m_1 + \cdots + m_N}. $$
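As a sanity check, we can also compare this closed-form answer against a brute-force integration in which each delta function is smeared into a narrow Gaussian (the masses and positions below are made up for illustration):

```python
import numpy as np

def delta_sigma(x, sigma):
    """Narrow Gaussian standing in for the delta function."""
    return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

m  = np.array([1.0, 2.0, 3.0])     # made-up masses
xi = np.array([-1.0, 0.5, 2.0])    # made-up positions

# Closed form from the sifting property: an ordinary weighted average
x_cm = np.sum(m * xi) / np.sum(m)

# Brute force: integrate x * rho(x) with each spike slightly smeared out
x, dx = np.linspace(-10, 10, 4_000_001, retstep=True)
rho = sum(mi * delta_sigma(x - a, 1e-3) for mi, a in zip(m, xi))
x_cm_numeric = (x * rho).sum() * dx / (rho.sum() * dx)

print(x_cm, x_cm_numeric)          # both come out to 1.0
```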
Additional Properties
The delta function satisfies several more useful properties, all related to what happens when we insert different things inside of its argument.
First, as is apparent from the picture of $\delta(x)$ as a spike at the origin, the delta function is even:
$$ \delta(-x) = \delta(x). $$
We can see as much by verifying that the two sides give the same result when we integrate them over the real line. By performing the change of variables $u = -x$, we can rewrite the integral of $\delta(-x)$ as
$$ \int\limits_{x=-\infty}^{x=\infty} \mathrm dx\, \delta(-x) = \int\limits_{u=\infty}^{u=-\infty} (-\mathrm du) \,\delta(u). $$
Swapping the limits of the integral cancels out the minus sign we got from replacing $\mathrm dx = -\mathrm du$, and therefore we find that
$$ \int\limits_{-\infty}^\infty \mathrm dx \, \delta(-x) = 1. $$
Thus, $\delta(-x)$ satisfies the same defining properties as $\delta(x)$, and we conclude that the two are identical.
Next, consider the delta function $\delta(kx)$ with its argument rescaled by some real number $k$. Once again performing a change of variables $u = kx$, we find that
$$ \int\limits_{x=-\infty}^{x=\infty} \mathrm dx \, \delta(kx) = \int\limits_{u=-k \cdot \infty}^{u= k \cdot \infty} \frac{\mathrm du}{k} \delta(u). $$
$k\cdot \infty$ is still infinite, and so we get back the integral of $\delta(u)$ over the whole real line, times a factor of $1/k$. The only catch is that, if $k$ is negative, then the limits of the integral come out in the opposite order, contributing a minus sign.
We therefore find that
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(kx) = \begin{cases} \frac{1}{k} & k > 0 \\ -\frac{1}{k} & k < 0, \end{cases} $$
or, more simply,
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(kx) = \frac{1}{|k|}. $$
Thus, $\delta(kx)$ integrates to $1/|k|,$ despite vanishing everywhere except at the origin. We therefore conclude that
$$ \delta(kx) = \frac{1}{|k|} \delta(x). $$
Note that the previous result, $\delta(-x) = \delta(x)$, corresponds to the special case where $k = -1$. It also shows the importance of the absolute value $|k|,$ since, if we had written $\delta(kx) = \frac{1}{k} \delta(x),$ we would have gotten the wrong answer when $k = -1$!
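Here's the scaling rule checked numerically, once again with a narrow Gaussian standing in for the delta function (the sample values of $k$ are arbitrary):

```python
import numpy as np

def delta_sigma(x, sigma):
    """Narrow Gaussian standing in for the delta function."""
    return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

x, dx = np.linspace(-5, 5, 2_000_001, retstep=True)
for k in [2.0, -3.0, 0.5]:
    val = delta_sigma(k * x, 0.01).sum() * dx
    print(f"k = {k:5.1f}   integral = {val:.4f}   1/|k| = {1/abs(k):.4f}")
# delta(kx) integrates to 1/|k| -- note the absolute value for k = -3.
```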
Even more generally, we can imagine inserting a general function $g(x)$ into the argument of the delta function, $\delta(g(x)).$ Since the delta function is only non-zero when its argument vanishes, the only contributions to the result will come from the points where $g(x)$ is itself equal to zero.
Thus, if we write $\{x_i\}$ for the zeros of $g(x)$, we expect the answer to take the form of a sum of delta functions, with a spike at the location of each zero of $g$:
$$ \delta(g(x)) \sim \sum_i \delta(x-x_i). $$
This equation can't be correct as written, though. If we choose $g(x) = k x$, which has a single zero at $x_i = 0$, we know from above that we should get $\delta(g(x)) = \frac{1}{|k|} \delta(x)$. That example does suggest the correct answer, though. Since $k = g'(x_i)$, we conjecture that
$$ \delta(g(x)) = \sum_i \frac{1}{|g'(x_i)|} \delta(x - x_i). $$
To verify that this is indeed the correct result, we again need to check that integrating both sides gives the same answer. When we go to evaluate the integral
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(g(x)), $$
since the integrand vanishes except at the zeros of $g$, the integral collapses to the neighborhoods of those points:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(g(x)) = \sum_i\int\limits_{x_i-\varepsilon}^{x_i + \varepsilon} \mathrm dx\, \delta(g(x)). $$
At each zero, we can expand $g(x)$ in a Taylor series:
$$ g(x) = g'(x_i) (x-x_i) + \cdots, $$
where the $\cdots$ represent higher-order corrections proportional to higher powers of $(x-x_i)$. But since we're only concerned here with the infinitesimal neighborhood surrounding each point, we can ignore those corrections. We therefore obtain
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(g(x)) = \sum_i\int\limits_{x_i-\varepsilon}^{x_i + \varepsilon} \mathrm dx\, \delta(g'(x_i)(x-x_i)). $$
Finally, by performing the change of variables $u = x - x_i$ at each zero, we get
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(g(x)) = \sum_i\int\limits_{-\varepsilon}^{\varepsilon} \mathrm du\, \delta(g'(x_i)u), $$
which, according to our previous result, gives us back $1/|g'(x_i)|$ for each term:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(g(x)) = \sum_i \frac{1}{|g'(x_i)|}. $$
Therefore, as claimed above, we can indeed identify $\delta(g(x))$ with a sum of delta functions, one at the location of each (simple) zero of $g(x)$, divided by the slope of $g$ at that point:
$$ \delta(g(x)) = \sum_i \frac{1}{|g'(x_i)|} \delta(x-x_i). $$
This formula generalizes our earlier two results for $\delta(-x)$ and $\delta(kx)$, which correspond to the special cases where $g(x) = -x$ and $g(x) = kx$.
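We can put the composition rule to a numerical test as well. Here's a sketch using $g(x) = x^2 - 1$, whose zeros sit at $x = \pm 1$ with $|g'(\pm 1)| = 2$, so the expected answer is $(f(1) + f(-1))/2$:

```python
import numpy as np

def delta_sigma(x, sigma):
    """Narrow Gaussian standing in for the delta function."""
    return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

g = lambda x: x**2 - 1             # zeros at +1 and -1, slope +/-2 there
f = lambda x: np.exp(x)            # arbitrary smooth test function

x, dx = np.linspace(-5, 5, 4_000_001, retstep=True)
for sigma in [0.1, 0.01, 0.001]:
    val = (delta_sigma(g(x), sigma) * f(x)).sum() * dx
    print(f"sigma = {sigma:6.3f}   integral = {val:.5f}")

print(f"expected: {(np.exp(1) + np.exp(-1)) / 2:.5f}")  # f(1)/2 + f(-1)/2
```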
The 3-Dimensional Delta Function
Before we move on, I should mention that the 1-dimensional delta function we've been discussing here can be generalized to higher dimensions in a straightforward way. As before, you can think of it as an infinitely tall, infinitesimally narrow spike at a given point in space.
In two dimensions, with Cartesian coordinates $x$ and $y$, we can define the delta function simply by taking the product of $\delta(x)$ and $\delta(y)$:
$$ \delta^2(\vec r) = \delta(x)\delta(y). $$
And likewise, in three dimensions, we'd add on a factor of $\delta(z)$ as well:
$$ \delta^3(\vec r) = \delta(x) \delta(y) \delta(z). $$
$\delta^3(\vec r)$ is equal to zero everywhere except the origin, but its volume integral over 3D space is equal to one:
$$ \int_{\mathbb R^3} \mathrm dv\, \delta^3(\vec r) = 1. $$
Thus, to complete the answer to our earlier question, the mass density function for a particle sitting at the origin of 3D space is given by $m$ times a 3-dimensional delta function:
$$ \rho(\vec r) = m \delta^3(\vec r). $$
And if we have a distribution of multiple particles at various points, we simply add up a delta function for each of them:
$$ \rho(\vec r) = \sum m_i \delta^3(\vec r - \vec r_i). $$
What's more, the exact same discussion applies if we're dealing with electric charges, and we're trying to construct not the mass density, but the charge density. All we need to do is replace all the $m$'s with $q$'s representing the charges of the individual particles:
$$ \rho(\vec r) = \sum q_i \delta^3(\vec r - \vec r_i). $$
There are more applications of the delta function to physics that I want to tell you about. But first, we should back up a step and spend at least a little time discussing how mathematicians define the delta function more formally.
The Delta Distribution
We've managed to learn a lot about the Dirac delta function just by following our nose. There's one glaring problem with all of this discussion, though. $\delta(x)$ doesn't actually exist!
That is, a function that's equal to zero everywhere except at one point, where it suddenly shoots off to infinity, doesn't make sense by any reasonable definition of a function.
For example, how would we distinguish between, say, $\delta(x)$ and $2\delta(x)$? They both take the same values, because $2$ times $\infty$ is still $\infty$:
$$ \delta(x) = \begin{cases} 0 & x \neq 0\\ \infty & x = 0 \end{cases} \qquad 2\delta(x) = \begin{cases} 0 & x \neq 0\\ \infty & x = 0. \end{cases} $$
And yet, when we integrate them, we're supposed to get 1 for the first and 2 for the second:
$$ \int\limits_{-\infty}^\infty \mathrm dx \, \delta(x) = 1, \qquad \int\limits_{-\infty}^\infty \mathrm dx\, 2\delta(x) = 2. $$
You can therefore see why mathematicians were, reasonably enough, highly skeptical of the delta "function" after Dirac popularized it in physics in the 1920s.
It wasn't until some 20 years later that a French mathematician named Laurent Schwartz put the delta function on firm mathematical footing by generalizing the very notion of a function, for which he won the Fields Medal in 1950.
The key to Schwartz's approach is the sifting property of the delta function, which, remember, picks out the value of any function $f$ at the location of the spike:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(x) f(x) = f(0). $$
The only problem is that the integral itself doesn't actually make sense. So, a mathematician would say: forget about the integral! Just define an abstract machine called $\delta$ that takes in a function $f$ and returns its value at the origin:
$$ \delta: f \to f(0). $$
This, in a nutshell, is the more precise way of thinking about the Dirac delta.
What's more, this machine satisfies one of the most beautiful properties that any map can satisfy: it's linear. Meaning that if we pass in a sum of two functions $f$ and $g$—perhaps multiplied by some constants $\alpha$ and $\beta$—then we can distribute the map to each of the two terms separately, and moreover pull the constant factors out front:
$$ \delta[\alpha f + \beta g]= \alpha \delta[f] + \beta \delta[g]. $$
The same was true of our original integral, after all, since if we insert a linear combination of functions, we can break it up into a sum of two integrals, and just add up their results:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta(x) \left( \alpha f(x) + \beta g(x) \right)\ = \alpha \int\limits_{-\infty}^\infty\mathrm dx\, \delta(x) f(x) + \beta \int\limits_{-\infty}^\infty \mathrm dx\, \delta(x) g(x). $$
Thus, we can think of the Dirac delta as defining a particular linear map from the space of functions to numbers. In fact, there are many maps of this kind, and Schwartz named them distributions, after the mass and charge distributions from physics that inspired them.
$$ \text{Distribution: test function} \to \text{number}. $$
More precisely, we can't just insert any old function into a distribution willy-nilly. Instead, we have to choose from a class of particularly nice functions known as "test functions." For example, we can choose our test functions to be smooth curves that are non-zero only in a finite segment of the real line.
A general distribution $T$ is then defined as a continuous, linear map that takes in a test function $f$ and returns a number $T[f]$:
$$ T: f \to T[f]. $$
Note that we're using square brackets here in order to emphasize the distinction between a distribution $T[f]$ and an ordinary function $f(x).$ A function assigns a number $f(x)$ to a point $x,$ whereas a distribution assigns a number $T[f]$ to an entire function $f$.
And there are a whole lot of distributions out there! For one thing, most any ordinary function $\phi$—not necessarily a test function—can be converted into a distribution $T_\phi$, simply by sticking the function inside of an integral, provided the integral converges for every test function:
$$ T_\phi[f] := \int\limits_{-\infty}^\infty \mathrm dx\, \phi(x)f(x). $$
An integral certainly qualifies as a continuous, linear map, and so an ordinary function can also be viewed as a distribution. But the reverse is not true—there are many distributions which don't originate in this way from ordinary functions. And in that sense, a distribution is a generalization of the concept of a function.
The idea here is that with an ordinary function $\phi$, we could of course read off its values $\phi(x)$ at specific points in space. But another thing that we can do is take a kind of average of the function near a given point by integrating it against a test function supported in that region—a sort of "smeared out" version of $\phi(x)$. And that's what $T_\phi[f]$ computes.
A general distribution, on the other hand, is too singular to be evaluated at individual points. But we can still compute its smeared values by evaluating $T[f]$ on a test function.
Of course, the most famous example of a distribution that isn't a function is the Dirac delta, which, as we've already seen, is defined by the property that it takes in any test function and returns its value at the origin:
$$ \delta[f] = f(0). $$
This beautifully simple equation is the true definition of the Dirac delta.
But... old habits die hard. And so, by analogy with the case of an ordinary function, as physicists we abuse the notation and once again write the distribution as if it were an integral:
$$ \delta [f] = \int\limits_{-\infty}^\infty \mathrm dx\, \delta(x) f(x), $$
and we write $\delta(x)$ for the corresponding Dirac delta "function". But really, it's just a somewhat sloppy—but very useful—shorthand for the actual distribution.
We can mostly get away with it, though, because, as we've seen, we can think of the result as a limit of integrals that are actually well-defined—whether we use the rectangular functions, Gaussians, or whatever else:
$$ \delta[f] = \lim_{w\to 0} \int\limits_{-\infty}^\infty \mathrm dx \, \delta_w(x) f(x). $$
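If it helps, Schwartz's picture can be phrased directly in code: a distribution is nothing but a machine that eats functions and spits out numbers. Here's a minimal sketch (real test functions would also be smooth with compact support, which we don't enforce here):

```python
import numpy as np

def delta(f):
    """The Dirac delta as Schwartz defines it: delta[f] = f(0)."""
    return f(0.0)

# Linearity: delta[a*f + b*g] == a*delta[f] + b*delta[g]
f, g = np.cos, np.sin
a, b = 2.0, -3.0
lhs = delta(lambda x: a * f(x) + b * g(x))
rhs = a * delta(f) + b * delta(g)
print(lhs, rhs)    # both 2.0, since cos(0) = 1 and sin(0) = 0

# And the limit of smeared integrals gives the same answer:
x, dx = np.linspace(-5, 5, 2_000_001, retstep=True)
sig = 1e-3
smear = np.exp(-x**2 / (2 * sig**2)) / np.sqrt(2 * np.pi * sig**2)
print((smear * f(x)).sum() * dx)   # ~ 1.0 = f(0), matching delta[f]
```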
The Derivative of a Distribution
Schwartz's theory of distributions has enabled us to give precise mathematical meaning to the Dirac delta. In practice, however, whether we write $\delta[f]$ as a mathematician would or $\int \mathrm dx\, \delta(x)f(x)$ as a physicist would mostly amounts to a difference in notational conventions.
There's much more to Schwartz's theory, however. Its true power lies in the fact that it enables us to do calculus with distributions; namely, it lets us take derivatives, even of an extraordinarily singular object like the delta function.
Our Gaussian approximation to $\delta(x)$ gives us a clue as to what to expect here. If we plot the slope of the bell curve, it starts out close to zero, turns positive, then negative, and returns to zero again. $G'(x)$ therefore has two spikes—one shooting up at $x=-\sigma$ and the other shooting down at $x=+\sigma.$
As a result, when we integrate a test function $f$ against that curve, it picks out the values of $f$ at the locations of those two spikes, with a relative minus sign between them, as well as an overall normalization factor which you can show comes out to $1/(2\sigma)$:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, G'(x)f(x) \approx \frac{-f(\sigma) +f(-\sigma)}{2\sigma}. $$
In the limit that $\sigma$ shrinks down to zero size, we recognize that the result is nothing but the derivative of $f$ at $x = 0,$ up to a sign:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, G'(x)f(x) \overset{\sigma \to 0}{\to} -f'(0). $$
Therefore, whereas integrating $f(x)$ against $\delta(x)$ returned its value $f(0)$ at the origin, we expect to find that integrating against $\delta'(x)$ should pick out the value of its derivative, times a minus sign:
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta'(x)f(x) =-f'(0). $$
To verify this property—at a physicist's level of rigor—all we need to do is integrate by parts. The product rule lets us rewrite the integrand as
$$ \delta'(x) f(x) = \frac{\mathrm{d} }{\mathrm{d} x } (\delta(x) f(x)) - \delta(x)f'(x). $$
When we integrate, the first term disappears because $\delta(x)$ vanishes at the boundaries $x = \pm \infty$, and so we're left with
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta'(x)f(x) = -\int\limits_{-\infty}^\infty \mathrm dx\,\delta(x)f'(x). $$
In other words, we can pull the derivative off of the $\delta$ and onto the $f$ at the cost of a minus sign.
Applying the defining property of the delta function, we indeed obtain
$$ \int\limits_{-\infty}^\infty \mathrm dx\, \delta'(x)f(x) = -f'(0), $$
as anticipated.
Once again, these manipulations can be rigorously justified using the language of distributions, where the above equation would be written in mathematicians' notation as
$$ \delta'[f] = -f'(0). $$
To see how this works, let's first return to the formula that told us how to turn an ordinary function $\phi$ into a corresponding distribution:
$$ T_\phi[f] = \int\limits_{-\infty}^\infty \mathrm dx\, \phi(x)f(x). $$
If we were to instead plug in the derivative of $\phi$ here, we would get
$$ T_{\phi'}[f] = \int\limits_{-\infty}^\infty \mathrm dx\, \phi'(x)f(x), $$
and, integrating by parts, we can once again move the derivative off of $\phi$ and onto $f$, at the cost of a minus sign:
$$ T_{\phi'}[f] = -\int\limits_{-\infty}^\infty \mathrm dx\, \phi(x)f'(x). $$
(There's no boundary term to worry about since, by definition, the test function $f(x)$ vanishes at $x \to \pm \infty$).
The resulting formula is then nothing but $T_\phi$ again, except that now it's acting on (minus) $f'$:
$$ T_{\phi'}[f] = -T_\phi[f']. $$
With that motivation in mind, we see that it's natural to define the derivative of a general distribution $T'[f]$ by pulling the derivative off of the $T$ and onto the test function $f$, along with an overall minus sign:
$$ T'[f] = -T[f']. $$
And since the test function $f$ is, by construction, infinitely differentiable, this rule tells us how to take the derivative of any distribution that we want!
For example, the derivative of the Dirac delta is defined by
$$ \delta'[f] = -\delta[f']. $$
Remembering that when we act $\delta$ on any test function we just get back its value at the origin, we therefore obtain
$$ \delta'[f] = -f'(0), $$
just as we anticipated above on physical grounds.
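In code, the rule $T'[f] = -T[f']$ becomes a one-liner that turns any distribution into its derivative. Here's a sketch, with the caveat that we approximate $f'$ by a finite difference rather than demanding symbolic test functions:

```python
import numpy as np

def delta(f):
    """The Dirac delta as a functional: delta[f] = f(0)."""
    return f(0.0)

def derivative(T, h=1e-6):
    """Distributional derivative: T'[f] = -T[f'],
    with f' approximated here by a central difference."""
    return lambda f: -T(lambda x: (f(x + h) - f(x - h)) / (2 * h))

delta_prime = derivative(delta)
print(delta_prime(np.sin))   # ~ -1.0 = -f'(0) for f = sin
print(delta_prime(np.cos))   # ~  0.0, since cos'(0) = 0
```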
For another example of the power of these tools, consider the step function, $\Theta(x)$. It's defined to be equal to zero when $x$ is negative and equal to one when $x$ is positive. In between—at $x = 0$—the function is discontinuous; it jumps from zero to one.
According to what you learned in intro calculus, this function is clearly non-differentiable at $x = 0$. And yet, if we think about the slope of the above graph, it's equal to zero everywhere except at $x = 0$, where the slope is suddenly infinite. And that sounds suspiciously like the Dirac delta function!
Indeed, when viewed as a distribution, the step function is perfectly differentiable.
To rigorously take the derivative, we first convert $\Theta(x)$ into a distribution:
$$ \Theta[f] = \int\limits_{-\infty}^\infty \mathrm dx\, \Theta(x) f(x). $$
Its derivative, according to our new definition, is then given by
$$ \Theta'[f] = -\Theta[f'], $$
and, when we expand out the right-hand side, we get
$$ \Theta'[f] = -\int\limits_{-\infty}^\infty \mathrm dx\, \Theta(x) f'(x). $$
Since $\Theta(x)$ vanishes for $x < 0$, its effect inside of the integral is just to throw out the portion of the range where $x$ is negative:
$$ \Theta'[f] = -\int\limits_0^\infty \mathrm dx\, f'(x). $$
Finally, applying the fundamental theorem of calculus, we get
$$ \Theta'[f] = -\left ( f(\infty) - f(0) \right). $$
The test function has to vanish at infinity, and so we're left with
$$ \Theta'[f] = f(0). $$
Thus, taking the derivative of $\Theta$, viewed as a distribution, has given us back a new distribution $\Theta'$ that takes in any test function $f$ and returns its value at the origin.
But we already know the distribution that satisfies that defining property! It's the Dirac delta. And we therefore conclude that
$$ \Theta' = \delta, $$
just as we anticipated from the graph of $\Theta(x)$.
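Here's the same computation done numerically: build $\Theta'[f] = -\int \Theta(x)f'(x)\,\mathrm dx$ directly and watch it return $f(0)$ (the Gaussian test function is an arbitrary choice that conveniently vanishes at infinity):

```python
import numpy as np

f      = lambda x: np.exp(-x**2)              # test function, f(0) = 1
fprime = lambda x: -2 * x * np.exp(-x**2)     # its derivative
theta  = lambda x: np.heaviside(x, 0.5)       # the step function

x, dx = np.linspace(-50, 50, 2_000_001, retstep=True)
theta_prime_of_f = -(theta(x) * fprime(x)).sum() * dx
print(theta_prime_of_f)        # ~ 1.0 = f(0): Theta' acts like delta
```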
In this way, the theory of distributions enables us to take derivatives of objects like the step function that have no business being differentiated when viewed as ordinary functions. And that's especially important in physics applications, because the solutions to the differential equations we write down in physics often turn out to be distributions rather than ordinary functions.
But that's enough about the abstract mathematics of distributions for now. Let's get back to some of the key applications where the delta function shows up in practice. There are four more that I want to tell you about, starting off with a classic example from classical mechanics.
Modeling an Impulsive Force
We've seen that the delta function represents the mass density of a point particle located at some point in space, $\rho(x) = m \delta(x).$ But it can also represent a point source in time.
To explain what I mean, think about a hammer striking a nail. The nail is initially at rest when, at time $t = 0$, the hammer hits it for a brief moment, rapidly accelerating it to some final speed $v$.
The question is, what force did the hammer exert on the nail during that time?
Well, when a force acts on an object, it accelerates it according to Newton's 2nd law,
$$ F = m \frac{\mathrm{d} v}{\mathrm{d} t }, $$
or, in other words, the force is the rate of change of the momentum $p = mv$:
$$ F = \frac{\mathrm{d} p}{\mathrm{d} t }. $$
The change in the particle's momentum is then given by integrating the force over time, and it's called the impulse that's been delivered to the particle:
$$ \Delta p = \int\limits_{t_i}^{t_f} \mathrm dt\, F. $$
That is, if we plot the force as a function of time, then the area underneath the curve is the change in the particle's momentum, $\Delta p$.
For our nail, it started out at rest and wound up at speed $v$, so the change in its momentum is simply $\Delta p = m v$. The force from the hammer, on the other hand, was only non-zero for the brief moment $\Delta t$ while the two objects were in contact. Suppose it had some approximately constant magnitude $F_0$ during that short time interval.
Then the graph of the force resembles the simple rectangular function from the very beginning of this lesson: its height is $F_0 = mv/\Delta t$, so that the area $F_0 \Delta t$ underneath it equals $mv$. And, for a short-acting force like this that's exerted over a vanishingly small time interval, it becomes an infinitesimally narrow spike, but with that same area of $mv$ underneath it.
Therefore, we can model the force as a delta function, times that overall factor of $mv$:
$$ F(t) = mv \delta(t). $$
$F(t)$ is non-zero only at the moment of impact $t= 0$, but when we integrate it over time to get the total impulse,
$$ \underbrace{\int\limits_{-\infty}^\infty \mathrm dt\, F(t)}_{\Delta p} = mv \underbrace{\int\limits_{-\infty}^\infty \mathrm dt\, \delta(t)}_1, $$
we correctly reproduce the final momentum of the nail:
$$\Delta p = m v.$$
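Numerically, you can watch the peak force blow up while the impulse stays fixed as the contact time shrinks. A quick sketch, with made-up numbers for the nail:

```python
import numpy as np

m, v = 0.05, 10.0                  # hypothetical nail: 50 g, final speed 10 m/s

t, dt_grid = np.linspace(-1, 1, 1_000_001, retstep=True)
for dt in [0.1, 0.001, 1e-5]:      # ever-shorter contact times
    F = np.where(np.abs(t) < dt / 2, m * v / dt, 0.0)
    impulse = F.sum() * dt_grid
    print(f"dt = {dt:7.0e} s   peak force = {m*v/dt:10.1f} N   "
          f"impulse = {impulse:.3f} kg m/s")
# The peak force diverges as dt -> 0, but the impulse stays ~ mv = 0.5.
```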
Gauss's Law for a Point Charge
The next application of the delta function that I want to show you comes from electromagnetism.
The physics of electric and magnetic fields is described by a set of differential equations known as Maxwell's equations:
$$ \begin{alignat*}{2} &\vec \nabla \cdot \vec E = \frac{\rho}{\epsilon_0}&\quad\quad\quad &\vec\nabla \cdot \vec B = 0\\ &\vec\nabla \times \vec E = -\frac{\partial \vec B}{\partial t}& &\vec\nabla \times \vec B = \mu_0 \vec J + \mu_0\epsilon_0 \frac{\partial \vec E}{\partial t}. \end{alignat*} $$
The first of these is what I want to focus on right now. It's called Gauss's law, and it says that the divergence $\vec\nabla \cdot \vec E$ of the electric field at any point in space must equal the charge density $\rho$ at that same point, divided by the constant $\epsilon_0$.
If you studied my previous lesson, then you're familiar with at least one example of an electric field: the Coulomb field produced by a particle of charge $Q$ sitting at rest at the origin,
$$ \vec E = \frac{Q}{4\pi \epsilon_0} \frac{\hat r}{r^2}. $$
It's a vector field that assigns an arrow to each point in space: the arrows all point radially away from the charge along the $\hat r$ direction, and their magnitudes fall off as $1/r^2$—which is represented by the opacity of the arrows in the above picture.
The divergence $\vec\nabla \cdot \vec E$ is a kind of derivative that measures how much the vector field spreads out—or diverges—from a given point. And so we certainly should get something non-zero for the divergence of this electric field, since the arrows are all spreading out away from the origin.
The corresponding charge density function, as we learned earlier, is given by a delta function at the location of the particle,
$$ \rho = Q \delta^3(\vec r). $$
And so we can check whether Gauss's law is indeed satisfied for this simple configuration:
$$ \vec\nabla \cdot \left( \frac{Q}{4\pi\epsilon_0} \frac{\hat r}{r^2} \right) \overset{?}{=} \frac{Q}{\epsilon_0} \delta^3(\vec r). $$
Canceling out the constants, we want to verify that the divergence of the vector field $\vec V = \hat r/4\pi r^2$ is equal to the delta function:
$$ \vec\nabla \cdot \left( \frac{\hat r}{4\pi r^2} \right) \overset{?}{=} \delta^3(\vec r). $$
To do so, we need to check two things:
- That the divergence of $\vec V$ vanishes at any point away from the origin,
$$ \vec\nabla \cdot \left( \frac{\hat r}{4\pi r^2} \right) = 0 \quad \text{for } r \neq 0. $$
- That the integral of $\vec\nabla \cdot \vec V$ over a solid ball containing the origin is equal to one:
$$ \int_\mathrm B \mathrm dv\, \vec\nabla \cdot \left( \frac{\hat r}{4\pi r^2} \right) = 1. $$
To prove those two properties, I'll need to use a couple of facts about the divergence operator from vector calculus.
The first fact is that the divergence of a vector $\vec V$ like this is given by
$$ \vec\nabla \cdot \vec V = \frac{1}{r^2} \frac{\partial }{\partial r } (r^2 V_r) + \xcancel{\text{more terms}}. $$
For a general vector field there would be more terms here, but we don't have to worry about them in this case since our vector only has an $r$ component.
Then plugging in $V_r = \frac{1}{4\pi r^2}$, we obtain
$$ \vec\nabla \cdot \vec V = \frac{1}{r^2} \frac{\partial }{\partial r } \left( r^2\frac{1}{4\pi r^2} \right). $$
But the factors of $r^2$ inside the parentheses cancel each other out, leaving us with the derivative of a constant. So it looks like we simply get zero for the divergence of this vector field:
$$ \vec\nabla \cdot \left(\frac{\hat r}{4\pi r^2} \right) = 0 \quad \text{for } r \neq 0. $$
And indeed we do—but only if we're sitting at a point away from the origin. Because our vector field goes like $1/r^2$, it blows up when $r$ is equal to zero, and so we need to be more careful about what happens there.
That's what brings us to the second condition: that when we integrate $\vec\nabla \cdot \vec V$ over a ball surrounding the charge, we should get one:
$$ \int_\mathrm B \mathrm dv \, \vec\nabla \cdot \left( \frac{\hat r}{4\pi r^2} \right) \overset{?}{=} 1. $$
To verify that we do, we'll need a second fact from vector calculus: the divergence theorem.
It says that any time we integrate a divergence over a solid volume, it's the same as integrating the vector field itself over the boundary surface:
$$ \int_\mathrm{B}\mathrm dv\, \vec\nabla \cdot \vec V = \int_{ \mathrm S} \vec V \cdot \mathrm d\vec a. $$
On the right-hand side, we're taking the boundary surface $\mathrm S$ and dividing it up into lots of tiny patches. For each patch, we assign a vector $\mathrm d \vec a = \hat n \,\mathrm da$ whose magnitude is the area of the patch and whose direction points perpendicularly away from the surface. Then we take the dot product $\vec V \cdot \mathrm d \vec a$ between that area vector and the value of our vector field $\vec V$, and we add up the contributions from all the little patches by integrating over the entire surface.
The divergence theorem then states that the two integrals—one over the 2D surface $\mathrm S$ and the other over the 3D volume $\mathrm B$ inside of it—are identical.
In this case, our volume is a solid ball, and so its boundary is a sphere of some radius $R$. Applying the divergence theorem, we obtain the integral of our vector field $\vec V$ over that sphere:
$$ \int_\mathrm B \mathrm dv\, \vec\nabla \cdot \left( \frac{\hat r}{4\pi r^2} \right) = \int_\mathrm S \left(\frac{\hat r}{4\pi r^2}\right) \cdot \mathrm d \vec a. $$
Recalling that $\mathrm d \vec a$ is defined to point perpendicularly away from the surface—which in this case means the $\hat r$ direction—we can write $\mathrm d \vec a = \hat r\, \mathrm da$. Pulling the constants outside of the integral, including the constant radius $r=R$ of the sphere, we obtain
$$ \int_\mathrm B \mathrm dv\, \vec\nabla \cdot \left( \frac{\hat r}{4\pi r^2} \right) = \frac{1}{4\pi R^2} \int_\mathrm{S} \hat r \cdot \hat r\, \mathrm da. $$
The two unit vectors dot together to give 1, leaving us with
$$ \int_\mathrm B \mathrm dv\, \vec\nabla \cdot \left( \frac{\hat r}{4\pi r^2} \right) = \frac{1}{4\pi R^2} \int_\mathrm{S}\mathrm da. $$
All that remains of the integral is then the area $\mathrm da$ of each little patch, summed up over all the patches making up the sphere. And that sum is simply the total surface area, $4\pi R^2.$ Thus, we've finally shown that
$$ \int_\mathrm B \mathrm dv\, \vec\nabla \cdot \left( \frac{\hat r}{4\pi r^2} \right) = 1, $$
just as we'd hoped.
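If you'd like to see that surface integral come out to one without doing it by hand, here's a short sketch that evaluates the flux through the sphere in spherical coordinates, where $\mathrm d \vec a$ contributes $R^2 \sin\theta\, \mathrm d\theta\, \mathrm d\phi$:

```python
import numpy as np
from scipy import integrate

R = 2.0   # sphere radius; the answer shouldn't depend on it

# V . dA = (1 / (4 pi R^2)) * R^2 sin(theta) dtheta dphi -- the R^2's cancel
flux, err = integrate.dblquad(
    lambda theta, phi: np.sin(theta) / (4 * np.pi),
    0, 2 * np.pi,   # phi range
    0, np.pi,       # theta range
)
print(flux)   # 1.0: all of the field's divergence sits at r = 0
```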
With those two properties established, we've confirmed that the divergence of the vector field $\vec V = \hat r/4\pi r^2$ satisfies the two properties required of the delta function:
$$ \vec\nabla \cdot \left( \frac{\hat r}{4\pi r^2} \right) = \delta^3(\vec r). $$
And therefore, Gauss's law is indeed satisfied for the familiar electric field of a point charge:
$$\vec\nabla \cdot \vec E = \frac{\rho}{\epsilon_0}.$$ By the way, in the last lesson, you learned that Gauss's law can also be stated in a different way: that the integral of the electric field over any closed surface is equal to the amount of charge contained inside of it, divided by $\epsilon_0$:
$$ \int_\mathrm S \vec E \cdot \mathrm d \vec a = \frac{Q_\mathrm{in}}{\epsilon_0}. $$
We call the first equation the differential form of Gauss's law and the second the integral form. And we can see that they're equivalent by applying the divergence theorem once more.
Starting from the differential form, we can integrate both sides over any volume $\mathrm V$:
$$ \int_\mathrm V \mathrm dv \, \vec\nabla \cdot \vec E = \int_\mathrm V \mathrm dv\, \frac{\rho}{\epsilon_0}. $$
On the left, the divergence theorem lets us turn the volume integral over $\mathrm V$ into the surface integral of $\vec E$ over the boundary $\mathrm S$ of the region. And on the right, the integral of the charge density simply gives us back the amount of charge contained inside:
$$ \int_\mathrm S \vec E \cdot \mathrm d \vec a = \frac{Q_\mathrm{in}}{\epsilon_0}, $$
thereby establishing the integral form of Gauss's law.
That's a bit by the by, though. Now let's get back to another application of the delta function, this time in relation to the Fourier transform.
The Fourier Transform of the Dirac Delta
It's hard to think of subjects in science that don't rely on the Fourier transform in some way.
Given a function $f(x)$, its Fourier transform $\hat f(k)$ is a new function defined by integrating the original against a complex wave, $e^{-ikx}$:
$$ \hat f(k) = \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^\infty \mathrm dx\, e^{-ikx} f(x). $$
I made an earlier lesson all about it if you want to get some intuition for where this formula comes from.
Going the other way, we can recover $f(x)$ from $\hat f(k)$ by performing the inverse Fourier transform,
$$ f(x) = \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^\infty \mathrm dk\, e^{ikx} \hat f(k), $$
the only difference being the sign that appears in the exponent.
These formulas are a lot to take in if you haven't seen them before. As a concrete example, consider our function $G(x)$ once again, describing a Gaussian whose width is set by its standard deviation $\sigma$:
$$ G(x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{1}{2\sigma^2}x^2}. $$
By plugging that function into the Fourier transform formula and evaluating the integral, you'll find that $\hat G(k)$ is actually another Gaussian:
$$ \hat G(k) = \frac{1}{\sqrt{2\pi}} e^{-\frac{\sigma^2}{2}k^2}. $$
There's a key difference between the two, though. Whereas our original function had standard deviation $\sigma$, its Fourier transform has standard deviation $1/\sigma$. Meaning that if our original curve was very narrow in $x$-space, its transform will be very broad in $k$-space, and vice versa.
In fact, if we once again take the limit where $\sigma$ shrinks down to zero size, $G$ gets squished to an infinitesimally narrow spike at the origin as it approaches the delta function, whereas $\hat G$ flattens out and simply approaches a constant: $1/\sqrt{2\pi}.$
And this is indeed telling us something deep about the Fourier transform: the transform of a constant is a delta function, and vice versa:
$$ \delta(x) \longleftrightarrow \hat \delta(k) = \frac{1}{\sqrt{2\pi}}, $$
again reflecting the fact that a narrow spike in one space transforms into a broad plateau in the other.
We can verify this relationship by directly plugging the delta function into the Fourier integral:
$$ \hat \delta(k) = \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^\infty \mathrm dx\, e^{-ikx} \delta(x). $$
As usual, the effect of $\delta(x)$ is simply to collapse the integral to the point $x = 0,$ where we get $e^0 = 1.$ The result of Fourier transforming the delta function is therefore indeed equal to a constant:
$$ \hat \delta(k) = \frac{1}{\sqrt{2\pi}}. $$
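You can watch this flattening happen by Fourier transforming a narrow Gaussian by brute force. A small sketch ($\sigma = 0.01$ and the sample $k$ values are arbitrary):

```python
import numpy as np

sigma = 0.01
x, dx = np.linspace(-1, 1, 400001, retstep=True)
G = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

for k in [0.0, 5.0, 20.0]:
    G_hat = (np.exp(-1j * k * x) * G).sum() * dx / np.sqrt(2 * np.pi)
    print(f"k = {k:5.1f}   G_hat(k) = {G_hat.real:.5f}")
print(f"1 / sqrt(2 pi) = {1 / np.sqrt(2 * np.pi):.5f}")
# For sigma -> 0, G_hat(k) approaches this same constant at every k.
```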
Going the other way, if we plug that constant value into the inverse Fourier transform,
$$ \delta(x) = \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^\infty \mathrm dk\, e^{ikx} \hat \delta(k), $$
we obtain one of the key representations of the delta function that's used all the time in physics: as an integral over all complex waves,
$$ \delta(x) = \frac{1}{2\pi} \int\limits_{-\infty}^\infty \mathrm dk\, e^{ikx}. $$
The integral doesn't converge, and so as before this equation needs to be interpreted in the language of distributions to make rigorous sense of it.
But to understand what it means intuitively, remember that $e^{ikx}$ represents a complex wave, whose real part is $\cos(kx)$ and whose imaginary part is $\sin(kx)$:
$$ e^{ikx} = \cos(kx) + i \sin(kx). $$
Focusing on just the real part, what the above integral is telling us to do is add up these waves for every possible wavelength—from arbitrarily narrow to arbitrarily broad, and everything in between.
At a generic point $x$, those waves assign essentially random values ranging from $-1$ to $1$. When we add them all up, they then interfere destructively and leave us with nothing—with one key exception.
At the origin where $x = 0$, all the waves take the same value—$e^{ik\cdot 0} = 1$—regardless of their wavelength. And so at that point, they interfere constructively and they add up to infinity. Exactly as we would expect for the delta function!
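One way to make this interference picture concrete is to cut the integral off at a finite $|k| < K$, which makes it an honest integral (the exact answer is $\sin(Kx)/\pi x$), and watch the spike sharpen as $K$ grows:

```python
import numpy as np

def truncated_delta(x, K, n=2_000_001):
    """(1/2pi) * integral of e^{ikx} over |k| < K; equals sin(Kx)/(pi x)."""
    k, dk = np.linspace(-K, K, n, retstep=True)
    return np.exp(1j * k * x).sum().real * dk / (2 * np.pi)

for K in [10.0, 100.0, 1000.0]:
    print(f"K = {K:7.1f}   at x=0: {truncated_delta(0.0, K):8.2f}   "
          f"at x=2: {truncated_delta(2.0, K):7.3f}")
# The spike at x = 0 grows like K/pi, while away from the origin the
# oscillating waves largely cancel and the value stays bounded.
```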
The Wavefunction of a Quantum Particle
To finish off this lesson, I want to briefly return to the place where Dirac made the delta function famous in the first place: quantum mechanics.
The state of a quantum system is described by its wavefunction, $\psi(x)$. And the physical quantities that we're accustomed to measuring—like the position and momentum—are represented by operators that act on the wavefunction:
$$ \hat O : \psi \to \hat O \psi. $$
The position operator $\hat X$ is particularly simple: all it does is multiply the wavefunction by the coordinate $x$:
$$ \hat X \psi(x) = x \psi(x). $$
For a particle sitting at some definite position $a$, by definition we get back $a$ times the wavefunction when we act on it with the position operator:
$$ x\psi_a(x) = a\psi_a(x). $$
So, what is the wavefunction that satisfies this defining equation?
The answer, of course, is a delta function!
$$ \psi_a(x) = \delta(x-a). $$
The reason is that when we multiply the delta function by the coordinate $x$, it simply picks out the value $x = a$ where the delta has its spike:
$$ x \delta(x-a) = a\delta(x-a). $$
And so, the delta function is the wavefunction for a quantum mechanical particle located at a definite position.
There are many more applications where the delta function arises in physics—especially in the realm of quantum mechanics. But this is where we'll stop for now.
You now have the tools to go off and understand the many uses of the Dirac delta function in your own studies!
See also:
Origins of the Quantum Wavefunction (Quantum Mechanics Part I)
The Quantum Path Integral, Explained (Quantum Mechanics Part II)
The Quantum Path Integral, Computed (Quantum Mechanics Part III)
If you encounter any errors on this page, please let me know at feedback@PhysicsWithElliot.com.