But what if I told you it didn't have to be that way? The universality of the speed of light is not some quirky foible we all have to live with. Rather, it's a fundamental statement about the nature of space and time. There's even a simple proof of this fact, and it has been well documented in the literature for many decades [2,3]. Yet, for some strange reason, it never quite made its way into our textbooks. Most popular science educators are therefore unaware of it, and they end up spreading confusing misinformation as a result. It's a real shame, too, because the proof itself is elegant. Anyone with a basic grasp of calculus can replicate it for themselves, which means you don't need an advanced degree to understand the argument.
Here's how it works.
I'd like to introduce you to the intrepid space explorer, Annie the astronaut. At this particular moment, Annie just so happens to be soaring through the furthest reaches of outer space, far away from any massive objects like stars or planets. As part of her mission mandate, Annie must catalogue her encounters with any passing objects, and she naturally accomplishes her task through the use of a very large, and very sophisticated, meter stick. Furthermore, any time she measures the location of a passing object, she uses her on-board clock to record the time at which it occurred. The result is thus an ordered pair of numbers, (x,t), which physicists refer to as an event.
Now suppose a second astronaut, Jim, drifts past Annie at some constant velocity, carrying a meter stick and a clock of his own, and that both of them observe an event during their encounter. Say, a piece of nearby space debris happens to collide with another. Naturally, Annie performs her measurement, which she records as the numbers (x,t). Likewise, Jim does the same thing from his perspective, resulting in (x',t').
So far so good, right? But this raises an interesting question:
Question: Given Annie's measurement of some event, (x,t), plus Jim's velocity v relative to her, is it possible to calculate Jim's measurement, (x',t'), of the same event?
This kind of calculation has a name in physics, and it is called a transformation. At first glance, it almost comes across as a boringly trivial thing to ask, doesn't it? Yet that's exactly what makes this thought experiment so beautiful. It just feels like the result ought to be intuitively dull. But let's suppose, just for the sake of argument, that we know nothing whatsoever about the underlying nature of space, time, or motion. What minimal set of assumptions do we require in order to generate a unique solution that also jibes with our everyday experience?
Not many people realize this, but this simple question is the foundation of Einstein's theory of special relativity. It's such a deceptively boring question, too, that most people would never even think to ask it. Yet the moment we take the time to walk through it, we are immediately forced to abandon every naïve intuition we ever had about the nature of space and time. So by all means, get out a pen and a piece of paper and follow along. All you need is a basic understanding of high school mathematics, and you too can see for yourself why the speed of light must be a universal constant that no one can exceed.
Assumption #1: The transformation is a function of space and time.
That’s easy enough, right? After all, the whole point of this thought experiment is to transform from Annie’s coordinates into Jim’s. It also tells us that we can ignore such factors as the temperature of the meter stick and the metallic composition of the clocks. Presumably, these sorts of things can be controlled for, so they should not have any effect on the transformation anyway. It is also conceivable that space itself exhibits some kind of weird hysteresis effect, such that there are multiple possible outcomes for any given transformation. In practice, however, none of this stuff has ever been observed, and so we may reasonably assume that it won’t be an issue here.
Assumption #2: The laws of physics are identical in all inertial frames of reference.
This is known as the Principle of Relativity, and it is a cornerstone of our entire understanding of motion through space and time. Basically, it tells us that there is no reason to think Annie's frame of reference is fundamentally different from Jim's. Presumably, the universe does not care that Annie is four feet to the left of Jim or one hundred miles to the right. It does not care if an event occurs at 3:32 pm vs 11:54 am, nor does it distinguish between Jim's motion to the right vs Annie's motion to the left. Both observers feel as if they are the one at rest while the other is the one moving.
One important consequence of this assumption is that any experiment performed in one frame of reference must yield the same result when performed in the other. For example, suppose Annie holds up two magnets and measures the force between them. Then, for whatever reason, Jim happens to replicate that exact same experiment in his own cockpit an hour later. All other things being equal, it seems natural to expect that Jim's result should perfectly replicate Annie's. It simply shouldn't matter that Jim is 9,000 meters away, or that he's moving to Annie's left at 50 meters/second, or that he's doing his experiment on a Tuesday morning. The universe does not care where you are, what time it is, or what direction you happen to be moving. The magnets should behave exactly the same in all instances.
Assumption #3: The Law of Inertia---An object in motion shall remain in motion, and it shall travel along a straight-line trajectory unless acted upon by an external force.
Easy, right? Now let's see what we can deduce from these basic assumptions.
To begin, imagine a piece of floating space rock that happens to move past Annie's spaceship. From her point of view, she observes the rock casually floating on by with some constant velocity in accordance with the law of inertia. For convenience, let's call this velocity u0, just to avoid any confusion with Jim's relative velocity v. Thus, using the language of calculus, we would say that the derivative of the rock's position, x, with respect to time, t, is equal to u0:
dx/dt = u0
Now let’s consider the same encounter from Jim’s perspective. According to the law of inertia, Jim also observes that the same rock is moving along with some constant velocity. Jim, however, is also moving with some constant velocity v relative to Annie, which means his measurement of the rock's velocity yields a different value. Thus, we have a slightly different expression for Jim, which is written as
dx’/dt’ = u0'
Now watch what happens when we take the derivative with respect to Annie's position:
d/dx ( dx’/dt’ ) = d/dx (u0') = 0
According to the law of inertia, the rock's velocity, u0', is a constant throughout space and time, and so the derivative naturally evaluates to zero. If we then swap the denominators and calculate the antiderivative, we quickly find that
d/dx ( dx'/dt' ) = d/dt' (dx'/dx ) = 0
dx'/dx = A .
This result should feel perfectly intuitive. All it says is that, given some tiny change in the rock's position from Jim's perspective (dx'), Annie observes some proportional change in position from her perspective (dx). If we then take the ratio of these two changes, we get a constant value that does not depend on either space or time. If we then repeat this argument three more times, we get three more ratios for a total of four unknown constants:
dx'/dx = A ,   dx'/dt = B ,   dt'/dx = C ,   dt'/dt = D .
Now let's take the antiderivative of the first two expressions. The first case tells us that
x' = Ax + T(t) ,
where T(t) is some arbitrary function of Annie's time coordinate. The second case then tells us that
x' = X(x) + Bt ,
where X(x) is some other arbitrary function of Annie's spatial coordinate. If we then compare these expressions side-by-side, it is easy to see that there is only one possible solution (taking the two origins to coincide at t = 0, so that no additive constant appears):
x' = Ax + Bt.
Finally we repeat the same argument with Jim's time coordinate, t', and we find that:
t’ = Cx + Dt .
Looking at these two expressions side-by-side, we should immediately see that the transformation from Annie's frame of reference into Jim's takes on a linear structure---that is to say, it is a linear transformation expressed by a system of linear equations. And that just feels intuitive, right? Lots of things in nature are linear, so it should come as no surprise that our result should take on this simple structure. And since the structure is linear, we should immediately feel inclined to express this relationship using standard matrix-vector notation. That is, after all, what matrix-vector algebra was specifically designed to handle:
$$\begin{pmatrix} x' \\ t' \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}$$
To that end, consider what would happen if Annie measures Jim’s location. By the law of inertia, she has to observe a straight-line trajectory satisfying x = vt. By definition, however, this same location also corresponds to Jim’s origin at x’ = 0. Let us therefore substitute this information directly into the transformation for x’, such that
x' = 0 = Avt + Bt .
A little bit of algebra later, and we now find that the constant B must satisfy:
B = -Av.
Let us further imagine the exact same situation in reverse. That is to say, Jim measures Annie's position from his perspective, which he naturally observes as x' = -vt'. In other words, he sees the exact same thing that Annie did, but in reverse. Again, by definition, this corresponds to Annie's origin at x = 0. Inverting the linear system gives x = (1/Δ)(Dx' - Bt'), where Δ = AD - BC is the determinant of the transformation matrix, and so the transformation from Jim's frame of reference follows a similar argument:
0 = (1/Δ)(Dx' - Bt') = (1/Δ)(-Dvt' - Bt').
A little bit more algebra, and we soon find that
D = -B/v = A.
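In other words, the two elements along the main diagonal must be the same---a fact that will be very important later on. The transformation from Annie's frame of reference into Jim's can now be written as follows:
$$\begin{pmatrix} x' \\ t' \end{pmatrix} = \begin{pmatrix} A & -Av \\ C & A \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}$$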
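If you'd rather check the two constraints B = -Av and D = A numerically than trust the algebra, here is a minimal Python sketch. The helper names (`to_jim`, `to_annie`) and the sample values of A, C, and v are illustrative assumptions, since the argument has not pinned those constants down yet. Exact fractions are used so that "equals zero" really means zero.

```python
from fractions import Fraction

# Illustrative values only; the argument leaves A and C undetermined so far.
A = Fraction(5, 4)
C = Fraction(-3, 4)
v = Fraction(3, 5)

# Constraints derived in the text.
B = -A * v
D = A

def to_jim(x, t):
    """Annie's (x, t) -> Jim's (x', t') using x' = Ax + Bt, t' = Cx + Dt."""
    return A * x + B * t, C * x + D * t

def to_annie(xp, tp):
    """Inverse map, from the inverse of the 2x2 matrix [[A, B], [C, D]]."""
    det = A * D - B * C
    return (D * xp - B * tp) / det, (-C * xp + A * tp) / det

# Jim's path in Annie's frame, x = v t, should land on Jim's origin x' = 0.
jim_positions = [to_jim(v * t, t)[0] for t in range(1, 5)]

# Annie's path in Jim's frame, x' = -v t', should land on Annie's origin x = 0.
annie_positions = [to_annie(-v * tp, tp)[0] for tp in range(1, 5)]

print(jim_positions, annie_positions)
```

Both lists come out as all zeros for any choice of A, C, and v with a nonzero determinant, which is exactly what the two constraints are meant to guarantee.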
Next, suppose that Annie decides to measure some arbitrary time interval with her clock; say, the time, T, in seconds, that it takes for her heart to beat ten times. Both the start and the end of this interval occur at Annie's own location, x = 0, and so from Jim's perspective the interval lasts t' = C(0) + AT = AT seconds.
But what exactly should we expect to happen if we perform the same experiment in reverse? That is to say, Jim measures the time it takes for his heart to beat 10 times, and he also finds that it takes exactly T seconds. Meanwhile, Annie cruises on by in her spaceship, and she observes a time interval of t = AT/Δ. According to the principle of relativity, there is no reason for Annie's frame of reference to behave any differently from Jim's. That means Annie must observe the same time interval as Jim did when the situation was reversed. In other words, Annie's measurement t is necessarily the same as Jim's measurement t'. This forces us to conclude
AT = AT/Δ,
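from which it immediately follows that the determinant, Δ, equals 1. The inverse transformation from Jim's frame of reference back to Annie's can therefore be written like this:
$$\begin{pmatrix} x \\ t \end{pmatrix} = \begin{pmatrix} A & Av \\ -C & A \end{pmatrix} \begin{pmatrix} x' \\ t' \end{pmatrix}$$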
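The heartbeat argument can also be tried out with numbers. The sketch below is only an illustration under assumed values: `dilation_pair` is a hypothetical helper, and A = 5/4, v = 3/5 are arbitrary picks. It compares Jim's reading of Annie's interval with Annie's reading of Jim's, first for a determinant that is not 1 and then for one that is.

```python
from fractions import Fraction

def dilation_pair(A, C, v, T=1):
    """Return (t', t): Jim's reading of Annie's interval T, and
    Annie's reading of Jim's interval T, for the matrix [[A, -Av], [C, A]]."""
    B, D = -A * v, A
    det = A * D - B * C
    t_jim = C * 0 + D * T              # Annie's clock sits at x = 0
    t_annie = (-C * 0 + A * T) / det   # Jim's clock sits at x' = 0
    return t_jim, t_annie

A, v = Fraction(5, 4), Fraction(3, 5)

# With an arbitrary C the determinant is not 1 and the two readings differ.
mismatch = dilation_pair(A, Fraction(1, 2), v)

# Choosing C so that A*A + A*v*C == 1 restores the symmetry.
C_unit = (1 - A * A) / (A * v)
match = dilation_pair(A, C_unit, v)

print(mismatch, match)
```

Only the unit-determinant case gives both observers the same stretched interval, which is what the principle of relativity demands.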
At this point, it helps to do a little bit of rearrangement on the transformation. Note that this is not imposing any assumptions, but simply cleaning up the expression so that it better matches our modern conventions.
To begin, let's factor out the coefficient A and then rename it to the Greek letter gamma (γ). This little guy is called the Lorentz factor, and it is easily the most important parameter in all of relativity. In doing so, however, we also encounter another little factor of C/A. Since this is just some constant number divided by yet another constant number, it is likewise nothing more than some arbitrary constant number. Thus, for lack of anything better to call it, let us simply replace it with the letter F. Our updated transformation now looks like this:
$$\begin{pmatrix} x' \\ t' \end{pmatrix} = \gamma \begin{pmatrix} 1 & -v \\ F & 1 \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}$$
Assumption #4: All coordinate transformations must form a mathematical group.
For those of you unfamiliar with group theory, this may sound a little strange, but I promise it's actually perfectly sensible. In this case, all it says is that a transformation acting upon another transformation must also be a transformation.
Here's how it works. Suppose a third observer, whom we’ll call Carl, happens to pass on by. Annie therefore observes Jim moving to the right with velocity v1 and Carl moving along with velocity v2. If, however, we were to transform to Jim's perspective, then he naturally observes things slightly differently. As expected, he sees Annie move along with some velocity v1', and Carl moving along with some different velocity, which we'll call v2'. Finally, if we transform yet again into Carl's perspective, then he naturally sees things differently yet again, which we can indicate using the double-primed coordinates. Carl therefore sees Annie moving along with some velocity v1'', and he sees Jim moving along with some other velocity, which we'll call v2''.
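Now let us impose the condition of a mathematical group: a transformation from Annie (x,t) to Jim (x',t'), followed by yet another transformation from Jim (x',t') to Carl (x'',t''), should naturally be the same as a direct transformation from Annie (x,t) to Carl (x'',t''). So let's write that out mathematically. Starting with the transformation from Annie to Jim:
$$\begin{pmatrix} x' \\ t' \end{pmatrix} = \gamma_1 \begin{pmatrix} 1 & -v_1 \\ F_1 & 1 \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}$$
followed by the transformation from Jim to Carl:
$$\begin{pmatrix} x'' \\ t'' \end{pmatrix} = \gamma_2' \begin{pmatrix} 1 & -v_2' \\ F_2' & 1 \end{pmatrix} \begin{pmatrix} x' \\ t' \end{pmatrix}$$
We now multiply the two matrices together, yielding a new transformation which looks like this:
$$\begin{pmatrix} x'' \\ t'' \end{pmatrix} = \gamma_1 \gamma_2' \begin{pmatrix} 1 - F_1 v_2' & -(v_1 + v_2') \\ F_1 + F_2' & 1 - F_2' v_1 \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}$$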
This thing may look complicated at first glance, but it really isn't that bad. Remember that all we care about is the simple fact that this new expression needs to obey the rules of a transformation. That includes every property we have discovered so far, and in particular the condition that the two main diagonal elements must be the same (D = A). In other words,
1 - F1v2' = 1 - F2'v1 .
Again, we do a little bit of algebra, and we quickly find that:
v1/F1 = v2'/F2'.
This may not look like much, but it's actually a very peculiar result. Remember that the relative velocities between Annie, Jim, and Carl are completely arbitrary. We're the ones in charge of this thought experiment, which means we can easily set them to any values we like. For example, I could hypothetically double the value of v1 while simultaneously leaving v2' untouched, and this equality still has to hold. Yet if you look closely, the right side of this equation does not depend on v1 at all, which means it cannot possibly change, no matter what values I pick. How can this be true?
The only way to reconcile this contradiction is for both ratios to evaluate to some universal constant that does not depend on either velocity. And since we have no idea what that constant is, I'm just going to assign it another random letter like... I dunno... a. We therefore have,
v1/F1 = v2'/F2' = a .
Remember, this relationship must be true for any arbitrary coordinate transformation. Therefore, our original transformation from Annie to Jim must likewise obey the same rule. That little constant F must therefore evaluate to v/a for all coordinate transformations. That leaves us with a new transformation which looks like this:
$$\begin{pmatrix} x' \\ t' \end{pmatrix} = \gamma \begin{pmatrix} 1 & -v \\ v/a & 1 \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}$$
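To see the group condition doing its job, here is a small Python sketch that multiplies two transformation matrices and inspects the diagonal. Everything specific in it is an assumption made for illustration: the helper names `transform` and `matmul`, the velocities 1/2 and 1/3, and the placeholder value a = -1. The overall factor γ is left out because it cannot change whether the two diagonal elements agree.

```python
from fractions import Fraction

def transform(v, F):
    """Transformation matrix [[1, -v], [F, 1]] with the factor gamma omitted."""
    return [[1, -v], [F, 1]]

def matmul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a = Fraction(-1)        # hypothetical value of the constant a
v1 = Fraction(1, 2)     # Jim relative to Annie
v2p = Fraction(1, 3)    # Carl relative to Jim (v2' in the text)

# Annie -> Jim first, then Jim -> Carl, with F = v/a in both steps.
M = matmul(transform(v2p, v2p / a), transform(v1, v1 / a))

# Factor out the shared diagonal element to read off the combined v and F.
diag = M[0][0]
v_combined = -M[0][1] / diag
F_combined = M[1][0] / diag

# Counterexample: give both steps the same F even though their velocities differ.
bad = matmul(transform(v2p, Fraction(1, 4)), transform(v1, Fraction(1, 4)))

print(M, v_combined, F_combined, bad)
```

With F = v/a the product has equal diagonal elements and its own F again equals its own v divided by a, so it is a legitimate transformation. In the counterexample the diagonal elements disagree, so that choice of F is ruled out.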
Now let's take a moment to remind ourselves of another useful fact that was already derived earlier. Namely, that the determinant of any transformation matrix must equal 1, no matter whose frame of reference it happens to represent. Here that condition reads γ²(1 + v²/a) = 1. A little bit more algebra later, and we find that our mysterious Lorentz factor must evaluate to the following:
γ = 1/√(1 + v²/a) .
Since a carries units of velocity squared, it is convenient to write its magnitude as c², where c is some constant speed. That leaves exactly three possibilities:
- a = 0 (thus implying c = 0),
- a is positive, implying that a = +c², or
- a is negative, implying that a = -c².
Now let's consider each case one-by-one.
Clearly, we can immediately reject the first option outright because all it would do is introduce a bunch of divisions by zero.
The second option is likewise not physically viable, but the reason is not quite as obvious. It begins by asking what would happen if Annie observes Jim traveling to the right with some moderate velocity like v = c/10. Meanwhile, Jim observes Carl, who is likewise traveling to the right with the same relative velocity of c/10. Carl, in turn, observes yet another traveler with velocity c/10, who observes another, and so on.
Now suppose that Annie measures her heartbeat and observes a delay of exactly t = 1.0 second. From Jim's perspective, he likewise observes Annie's heartbeat, and he naturally measures some duration t' from his frame of reference. Carl then measures a value of t'', and so on for every other observer down the train. But what exactly should we expect to happen if we simply repeated this transformation a hundred times over? The answer, it turns out, is a little strange. Given enough transformations, Annie's heartbeat will actually appear to flow in the negative direction. As in, literally, the observers will eventually perceive Annie's flow of time as moving backwards!
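You don't have to take my word for the backwards heartbeat. The Python sketch below applies the same transformation over and over to the event (x, t) = (0, 1) and records the time coordinate each observer assigns to it. The function names (`step`, `heartbeat_history`) and the choice of units with c = 1 are assumptions made purely for illustration.

```python
import math

def step(x, t, v, a):
    """One transformation with F = v/a and gamma = 1/sqrt(1 + v*v/a)."""
    gamma = 1.0 / math.sqrt(1.0 + v * v / a)
    return gamma * (x - v * t), gamma * (v * x / a + t)

def heartbeat_history(v, a, n):
    """Time coordinate of the event (x=0, t=1) after 1..n transformations."""
    x, t = 0.0, 1.0
    history = []
    for _ in range(n):
        x, t = step(x, t, v, a)
        history.append(t)
    return history

c = 1.0          # units chosen so that c = 1 (an assumption)
v = c / 10

positive_a = heartbeat_history(v, +c * c, 100)   # the second option
negative_a = heartbeat_history(v, -c * c, 100)   # the third option

# 1-based index of the first observer who sees the interval go negative.
first_negative = next(i for i, t in enumerate(positive_a, start=1) if t < 0)
print(first_negative, min(negative_a))
```

With a = +c² each step acts like a small rotation, so the time coordinate swings through zero and turns negative at the 16th observer. With a = -c² it stays positive for all one hundred.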
Obviously, that cannot possibly be the case. So let's introduce a 5th assumption to our universe:
Assumption #5: For any two events, (x1,t1) and (x2,t2), if x2 = x1 and t2 > t1, then all transformations must likewise result in t2' > t1'.
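In other words, all observers must agree on the direction of time: if one event follows another at the same location, then every observer sees it happen afterward. This condition immediately removes the second option from our list, leaving behind only one logical possibility. Our final transformation therefore must take on the following structure:
$$\begin{pmatrix} x' \\ t' \end{pmatrix} = \gamma \begin{pmatrix} 1 & -v \\ -v/c^2 & 1 \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}, \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}$$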
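Finally, let us remove the matrix-vector notation and write the transformation in its standard form, as shown here:
x' = γ(x - vt)
t' = γ(t - vx/c²)
This is the Lorentz transformation. To see what it implies, let's return to the piece of space rock from earlier, which Annie observes drifting by with constant velocity u0:
dx/dt = u0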
Now ask yourself what velocity, u0', Jim should expect to see from his perspective. The answer is actually perfectly straightforward. First, we use implicit differentiation to derive the following expressions:
dx' = γ(dx - vdt)
dt' = γ(-vdx/c² + dt)
Next, we simply divide the one by the other to find this:
dx'/dt' = (dx - vdt)/(dt - v dx/c²) .
Remember that, by definition, this quantity has to represent the transformed velocity, u0', from Jim's perspective. Likewise, the quantity dx/dt represents the relative velocity u0 between Annie and the rock, which means we finally arrive at this expression here:
dx'/dt' = (u0 - v)/(1 - u0v/c²) = u0'
This is the famous velocity addition formula of special relativity, and you can clearly see for yourself that velocities do NOT add linearly. It seems strange and counterintuitive, but it is also the only way to preserve a well-behaved universe that obeys our earlier assumptions.
Isn't that weird? No matter how many times we repeat the transformation, the velocity of that little piece of space rock will never exceed the mystery constant c. In fact, if we let the velocity equal c itself, then all observers in all inertial frames of reference will likewise agree that it is moving exactly that fast. That is to say, if any observer measures an object moving with speed c, then ALL observers will likewise measure a speed of c. This has to be the case, because it is the only logical outcome that is consistent with the assumptions we've made.
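Here is a minimal Python sketch of that claim, using the velocity formula derived above. The function name `transform_velocity`, the chain of ten observers, and the sample speeds of 0.9c are all illustrative assumptions. Exact fractions are used so that "less than c" is not blurred by floating-point rounding.

```python
from fractions import Fraction

def transform_velocity(u, v, c=1):
    """Velocity u' seen by an observer moving at v, given velocity u."""
    return (u - v) / (1 - u * v / (c * c))

c = Fraction(1)
v = Fraction(-9, 10)   # each new observer moves at 0.9c the opposite way

# Chain ten observers; naive addition would pile up to 9.9c.
u = Fraction(9, 10)
speeds = []
for _ in range(10):
    u = transform_velocity(u, v, c)
    speeds.append(u)

# An object moving at exactly c keeps that speed for every observer.
light = transform_velocity(c, v, c)

print(float(speeds[0]), float(speeds[-1]), light)
```

Every entry in `speeds` creeps closer to c without ever reaching it, while the object that starts at c stays at exactly c.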
[1] Einstein, A., "On the electrodynamics of moving bodies," Annalen der Physik 17, 891-921 (1905)
[2] Pelissetto, A., and Testa, M., "Getting the Lorentz transformations without requiring an invariant speed," American Journal of Physics 83, 338-340 (2015)
[3] von Ignatowsky, W., "Das Relativitätsprinzip," Archiv der Mathematik und Physik 17, 1-24 (1911)