The Speed of Light Has Nothing to Do With Light

How many times has this happened to you? You're watching one of those popular science programs when suddenly you're bombarded with the outlandish claim that nothing travels faster than the speed of light. It’s a common expression to be sure, and physicists love to remind us of this fact at every opportunity. But how many times have those same science educators ever stopped to explain why this must be the case? It’s an insanely counterintuitive thing to say, yet most educational resources seem to treat it as a brute fact of life---as if the universe itself is just really quirky that way, so quit your whining and get used to it. Even Einstein himself seems to postulate it outright in his original work [1], and all the laws of special relativity just happen to fall out of the mathematics once you do.

But what if I told you it didn’t have to be that way? The universality of the speed of light is not some quirky foible we all have to live with. Rather, it's a fundamental statement about the nature of space and time. There’s even a surprisingly simple proof of this fact, and it’s been well-documented in the literature for many decades [2,3]. Yet, for some very strange reason, it never quite made its way into our textbooks. Most popular science educators are therefore blissfully unaware of it, and they can't help but spread confusing misinformation as a result. It’s a real shame, too, because the proof itself is short and elegant. Anyone with a basic comprehension of calculus can replicate it for themselves, which means you don't need an advanced degree to understand the argument.

Here's how it works.

I’d like to introduce you to the intrepid space explorer, Annie the astronaut. At this particular moment, Annie just so happens to be soaring through the furthest reaches of outer space, far away from any massive objects like stars or planets. As part of her mission mandate, Annie must catalogue her encounters with any passing objects, and she naturally accomplishes her task through the use of a very large, and very sophisticated, meter stick. Furthermore, any time she measures the location of a passing object, she uses her on-board clock to record the time at which it occurred. The result is thus an ordered pair of numbers, (x,t), which physicists refer to as an event.

Now let us imagine what would happen if a second space explorer, whom we’ll call Jim, just so happens to cruise on by. Just like Annie, Jim is also charged with taking his own measurements, which means he also has his own space-age meter stick and on-board clock. Jim, however, is moving with some velocity v relative to Annie. Using the language of physics, we would therefore say that Jim travels in his own inertial frame of reference. Unsurprisingly, Jim therefore measures his events a little bit differently, and we can represent his perspective through the coordinates x-prime and t-prime (x’,t’). Finally, for the sake of personal convenience, we may further assume that Jim and Annie both agree to synchronize their instruments at the moment their origins intersect, which means they both agree to label that event as (0,0).

Now suppose that Annie and Jim both observe an event during their encounter. Say, a piece of nearby space debris happens to collide with another. Naturally, Annie performs her measurement, which she records as the numbers (x,t). Likewise, Jim does the same thing from his perspective, resulting in (x',t'). 

So far so good, right? But this raises an interesting question:

Question: Given Annie's measurement of some event, (x,t), plus Jim's velocity v relative to her, is it possible to calculate Jim's measurement, (x',t'), of the same event?

This kind of calculation has a name in physics, and it is called a transformation. At first glance, it almost comes across as a boringly trivial thing to ask, doesn't it? Yet that’s exactly what makes this thought experiment so beautiful. It just feels like the result ought to be intuitively dull. But let's suppose, just for the sake of argument, that we know nothing whatsoever about the underlying nature of space, time, or motion. What minimal set of assumptions do we require in order to generate a unique solution that also jibes with our everyday experience?

Not many people realize this, but this simple question is the foundation of Einstein’s theory of special relativity. It’s such a deceptively boring question, too, that most people would never even think to ask it. Yet the moment we take the time to walk through it, we are immediately forced to abandon every naïve intuition we ever had about the nature of space and time. So by all means, get out a pen and piece of paper and follow along. All you need is a basic understanding of high school mathematics, and you too can see for yourself why the speed of light must be a universal constant that no one can exceed.

Assumption #1: The transformation is a function of space and time.

That’s easy enough, right? After all, the whole point of this thought experiment is to transform from Annie’s coordinates into Jim’s. It also tells us that we can ignore such factors as the temperature of the meter stick and the metallic composition of the clocks. Presumably, these sorts of things can be controlled for, so they should not have any effect on the transformation anyway. It is also conceivable that space itself exhibits some kind of weird hysteresis effect, such that there are multiple possible outcomes for any given transformation. In practice, however, none of this stuff has ever been observed, and so we may reasonably assume that it won’t be an issue here.

Assumption #2: The laws of physics are identical in all inertial frames of reference.

This is known as The Principle of Relativity, and it is a cornerstone of our entire understanding of motion through space and time. Basically, it tells us that there is no reason to think Annie's frame of reference is fundamentally different from Jim's. Presumably, the universe does not care that Annie is four feet to the left of Jim or one hundred miles to the right. It does not care if an event occurs at 3:32 pm vs 11:54 am, nor does it distinguish between Jim's motion to the right vs Annie's motion to the left. Both observers feel as if they are the one at rest while the other is the one moving.

One important consequence of this assumption is that any experiment performed in one frame of reference must yield the same result when performed in the other. For example, suppose Annie holds up two magnets and measures the force between them. Then, for whatever reason, Jim happens to replicate that exact same experiment in his own cockpit an hour later. All other things being equal, it seems natural to expect that Jim's result should perfectly replicate Annie's. It simply shouldn't matter that Jim is 9,000 meters away, or that he's moving to Annie's left at 50 meters/second, or that he's doing his experiment on a Tuesday morning. The universe does not care where you are, what time it is, or what direction you happen to be moving. The magnets should behave exactly the same in all instances.

Assumption #3: The Law of Inertia---An object in motion shall remain in motion, and it shall travel along a straight-line trajectory unless acted upon by an external force.

Easy, right? Now let's see what we can deduce from these basic assumptions.

To begin, imagine a piece of floating space rock that happens to move past Annie's spaceship. From her point of view, she observes the rock casually floating on by with some constant velocity in accordance with the law of inertia. For convenience, let's call this velocity u0, just to avoid any confusion with Jim's relative velocity v. Thus, using the language of calculus, we would say that the derivative of the rock's position, x, with respect to time, t, is equal to u0:

dx/dt = u0

Now let’s consider the same encounter from Jim’s perspective. According to the law of inertia, Jim also observes that the same rock is moving along with some constant velocity. Jim, however, is also moving with some constant velocity v relative to Annie, which means his measurement of the rock's velocity yields a different value. Thus, we have a slightly different expression for Jim, which is written as

dx’/dt’ = u0'

Now watch what happens when we take the derivative with respect to Annie's position:

d/dx ( dx’/dt’ ) = d/dx (u0') = 0

According to the law of inertia, the rock's velocity, u0', is a constant throughout space and time, and so the derivative naturally evaluates to zero. If we then swap the denominators and calculate the antiderivative, we quickly find that

d/dx ( dx'/dt' ) = d/dt' (dx'/dx ) = 0 

dx'/dx = A .

This result should feel perfectly intuitive. All it says is that, given some tiny change in the rock's position from Jim's perspective (dx'), Annie observes some proportional change in position from her perspective (dx). If we then take the ratio of these two changes, we get a constant value that does not depend on either space or time. If we then repeat this argument three more times, we get three more ratios for a total of four unknown constants:

dx'/dx = A ,   dx'/dt = B ,   dt'/dx = C ,   dt'/dt = D .

Now let's take the antiderivative of the first two expressions. The first case tells us that

x' = Ax + T(t) ,

where T(t) is some arbitrary function of Annie's time coordinate. The second case then tells us that

x' = X(x) + Bt ,

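where X(x) is some other arbitrary function of Annie's spatial coordinate. If we then compare these expressions side-by-side, it is easy to see that there is only one possible solution (any leftover additive constant must vanish, because both observers agreed to label the event where their origins meet as (0,0)):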

x' = Ax + Bt.

Finally we repeat the same argument with Jim's time coordinate, t', and we find that:

t’ = Cx + Dt .

Looking at these two expressions side-by-side, we should immediately see that the transformation from Annie’s frame of reference into Jim’s takes on a linear structure---that is to say, it is a linear transformation expressed by a system of linear equations. And that just feels intuitive, right? Lots of things in nature are linear, so it should come as no surprise that our result should take on this simple structure. And since the structure is linear, we should immediately feel inclined to express this relationship using standard matrix-vector notation. That is, after all, what matrix-vector algebra was specifically designed to handle:

[ x' ]   [ A  B ] [ x ]
[ t' ] = [ C  D ] [ t ]

 
By the way, if you're not familiar with matrix-vector notation, then feel free to pause now and look it up. It is important to be comfortable with this stuff as we move forward.
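For readers who would rather see the notation in action, here is a minimal Python sketch of what it means for a 2x2 matrix to act on the pair (x, t). The function name apply_transform and the sample values of A, B, C, and D are placeholders chosen purely for illustration; nothing in the derivation so far tells us what the real coefficients are.

```python
def apply_transform(matrix, event):
    """Apply a 2x2 matrix ((A, B), (C, D)) to an event (x, t)."""
    (A, B), (C, D) = matrix
    x, t = event
    # Row one gives x' = A*x + B*t; row two gives t' = C*x + D*t.
    return (A * x + B * t, C * x + D * t)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
sample_matrix = ((2.0, -1.0), (0.5, 2.0))
annie_event = (3.0, 4.0)

jim_event = apply_transform(sample_matrix, annie_event)
print(jim_event)  # (2.0, 9.5)
```

Each row of the matrix is just one of the two linear equations written more compactly.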
 
One major advantage to this formulation is that it allows us to easily calculate the inverse transformation from Jim's frame of reference back into Annie's. This is accomplished by calculating the matrix inverse, which leads to the following:

[ x ]         [  D  -B ] [ x' ]
[ t ] = (1/Δ) [ -C   A ] [ t' ]

 
Note that the delta (Δ) here simply represents the determinant of the matrix, which is AD-BC for a 2x2 matrix. All that remains now is to derive the four mystery coefficients.
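If you want to convince yourself that the determinant formula really does undo a transformation, a quick numerical check along these lines may help. This is only a sketch: the helper names invert_2x2 and multiply_2x2 and the sample coefficients are made up for illustration, and exact fractions are used so that rounding never muddies the result.

```python
from fractions import Fraction


def invert_2x2(matrix):
    """Invert ((A, B), (C, D)) using the determinant formula."""
    (A, B), (C, D) = matrix
    det = A * D - B * C
    if det == 0:
        raise ValueError("matrix is singular, so no inverse exists")
    return ((D / det, -B / det), (-C / det, A / det))


def multiply_2x2(left, right):
    """Standard row-by-column product of two 2x2 matrices."""
    (a, b), (c, d) = left
    (e, f), (g, h) = right
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


# Hypothetical coefficients; Fractions keep the arithmetic exact.
M = ((Fraction(2), Fraction(-1)), (Fraction(1, 2), Fraction(2)))
M_inv = invert_2x2(M)

# Undoing the transformation should give back the identity matrix.
identity = multiply_2x2(M_inv, M)
print(identity)
```

Transforming into Jim's frame and then back again leaves every event exactly where it started, which is all the inverse is supposed to do.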

To that end, consider what would happen if Annie measures Jim’s location. By the law of inertia, she has to observe a straight-line trajectory satisfying x = vt. By definition, however, this same location also corresponds to Jim’s origin at x’ = 0. Let us therefore substitute this information directly into the transformation for x’, such that

x' = 0 = Avt + Bt .

A little bit of algebra later, and we now find that the constant B must satisfy:

B = -Av.

Let us further imagine the exact same situation in reverse. That is to say, Jim measures Annie's position from his perspective, which he naturally observes as x’ = -vt’. In other words, he sees the exact same thing that Annie did, but in reverse. Again, by definition, this corresponds to Annie’s origin at x = 0, and so the transformation from Jim’s frame of reference follows a similar argument:

0 =  (1/Δ)(Dx' - Bt') = (1/Δ)(-Dvt' - Bt').

A little bit more algebra, and we soon find that

D = -B/v = A.

In other words, the two elements along the main diagonal must be the same---a fact that will be very important later on. The transformation from Annie's frame of reference into Jim's can now be written as follows:

[ x' ]   [ A  -Av ] [ x ]
[ t' ] = [ C   A  ] [ t ]

Next, suppose that Annie decides to measure some arbitrary time interval with her clock; say, the time, T, in seconds, that it takes for her heart to beat ten times. Annie's heart sits at her own origin, x = 0, and so the transformation tells us that, from Jim’s perspective, this interval appears to last t' = Cx + Dt = AT seconds.

But what exactly should we expect to happen if we perform the same experiment in reverse? That is to say, Jim measures the time it takes for his heart to beat 10 times, and he also finds that it takes exactly T seconds. Meanwhile, Annie cruises on by in her spaceship, and she observes a time interval of t = AT/Δ. According to the principle of relativity, there is no reason for Annie's frame of reference to behave any differently from Jim's. That means Annie must observe the same time interval as Jim did when the situation was reversed. In other words, Annie's measurement t is necessarily the same as Jim's measurement t'. This forces us to conclude

AT = AT/Δ, 

from which it immediately follows that the determinant, Δ, equals 1. The inverse transformation from Jim's frame of reference back to Annie's can therefore be written like this:

[ x ]   [  A  Av ] [ x' ]
[ t ] = [ -C  A  ] [ t' ]

At this point, it helps to do a little bit of rearrangement on the transformation. Note that this is not imposing any assumptions, but simply cleaning up the expression so that it better matches our modern conventions.

To begin, let's factor out the coefficient A and then rename it to the Greek letter gamma (γ). This little guy is called the Lorentz factor, and it is easily the most important parameter in all of relativity. In doing so, however, we also encounter another little factor of C/A. Since this is just some constant number divided by yet another constant number, it is likewise nothing more than some arbitrary constant number. Thus, for lack of anything better to call it, let us simply replace it with the letter F. Our updated transformation now looks like this:

[ x' ]     [ 1  -v ] [ x ]
[ t' ] = γ [ F   1 ] [ t ]

 
We are now ready to introduce a fourth assumption into our hypothetical universe.

Assumption #4: All coordinate transformations must form a mathematical group.

For those of you unfamiliar with group theory, this may sound a little strange, but I promise it's actually perfectly sensible. In this case, all it says is that a transformation acting upon another transformation must also be a transformation. 

Here's how it works. Suppose a third observer, whom we’ll call Carl, happens to pass on by. Annie therefore observes Jim moving to the right with velocity v1 and Carl moving along with velocity v2. If, however, we were to transform to Jim's perspective, then he naturally observes things slightly differently. As expected, he sees Annie move along with some velocity v1', and Carl moving along with some different velocity, which we'll call v2'. Finally, if we transform yet again into Carl's perspective, then he naturally sees things differently yet again, which we can indicate using the double-primed coordinates. Carl therefore sees Annie moving along with some velocity v1'', and he sees Jim moving along with some other velocity, which we'll call v2''.

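Now let us impose the condition of a mathematical group: a transformation from Annie (x,t) to Jim (x',t'), followed by yet another transformation from Jim (x',t') to Carl (x'',t''), should naturally be the same as a direct transformation from Annie (x,t) to Carl (x'',t''). So let’s write that out mathematically, using γ1 and F1 for the constants that go with the velocity v1, and γ2' and F2' for the constants that go with v2'. Starting with the transformation from Annie to Jim:

[ x' ]      [ 1   -v1 ] [ x ]
[ t' ] = γ1 [ F1   1  ] [ t ]

followed by the transformation from Jim to Carl:

[ x'' ]       [ 1    -v2' ] [ x' ]
[ t'' ] = γ2' [ F2'   1   ] [ t' ]

we naturally have the transformation from Annie to Carl:

[ x'' ]          [ 1    -v2' ] [ 1   -v1 ] [ x ]
[ t'' ] = γ2' γ1 [ F2'   1   ] [ F1   1  ] [ t ]

We now multiply the two matrices together, yielding a new transformation which looks like this:

[ x'' ]          [ 1 - F1v2'    -(v1 + v2') ] [ x ]
[ t'' ] = γ2' γ1 [ F1 + F2'     1 - F2'v1   ] [ t ]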

This thing may look complicated at first glance, but it really isn't that bad. Remember that all we care about is the simple fact that this new expression needs to obey the rules of a transformation. That includes every property of transformations we have discovered thus far, in particular the condition that the two main diagonal elements must be the same (D = A). In other words,

1 - F1v2' = 1 - F2'v1 .

Again, we do a little bit of algebra, and we quickly find that:

v1/F1 = v2'/F2'.
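For anyone who wants the intermediate steps, the algebra between those two lines can be written out as follows. This is a sketch in LaTeX notation using the same symbols as the text, and it assumes that neither F1 nor F2' is zero so that the final division is allowed.

```latex
\begin{align*}
1 - F_1 v_2' &= 1 - F_2' v_1
  && \text{(equal diagonal elements)} \\
F_1 v_2' &= F_2' v_1
  && \text{(subtract 1, multiply by $-1$)} \\
\frac{v_1}{F_1} &= \frac{v_2'}{F_2'}
  && \text{(divide both sides by $F_1 F_2'$)}
\end{align*}
```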

This may not look like much, but it's actually a very peculiar result. Remember that the relative velocities between Annie, Jim, and Carl are completely arbitrary. We're the ones in charge of this thought experiment, which means we can easily set them to any values we like. For example, I could hypothetically double the value of v1 while simultaneously leaving v2' untouched, and this equality still has to hold. Yet if you look closely, the right side of this equation does not depend on v1 at all, which means it cannot possibly change, no matter what values I pick. How can this be true?

The only way to reconcile this contradiction is for both ratios to evaluate to some universal constant that does not depend on either velocity. And since we have no idea what that constant is, I'm just going to assign it another random letter like... I dunno... a. We therefore have,

v1/F1 = v2'/F2' = a .
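Rearranged, this says that F = v/a for every transformation. If you'd like to see why that particular rule is special, here is a small Python sketch that multiplies two transformation matrices (with the overall factor γ left out, since it does not affect whether the diagonal elements match) and compares the diagonal. The value of a, the two velocities, and the alternative rule F = v^2 are all made up for the sake of the comparison.

```python
from fractions import Fraction


def core_matrix(v, F):
    """Transformation matrix with the overall factor gamma left out."""
    return ((1, -v), (F, 1))


def multiply_2x2(left, right):
    """Standard row-by-column product of two 2x2 matrices."""
    (a, b), (c, d) = left
    (e, f), (g, h) = right
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


a = Fraction(-9)       # hypothetical value for the universal constant
v1 = Fraction(1, 2)    # Jim's velocity relative to Annie (arbitrary)
v2p = Fraction(2)      # Carl's velocity relative to Jim (arbitrary)

# Case 1: F = v/a, as the group condition demands.
good = multiply_2x2(core_matrix(v2p, v2p / a), core_matrix(v1, v1 / a))

# Case 2: a made-up alternative rule, F = v**2, for comparison.
bad = multiply_2x2(core_matrix(v2p, v2p**2), core_matrix(v1, v1**2))

print(good[0][0] == good[1][1])  # True: diagonal entries agree
print(bad[0][0] == bad[1][1])    # False: diagonal entries disagree
```

Feel free to change the velocities; with F = v/a the diagonal entries match every time, while the made-up rule only matches by coincidence.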

Remember, this relationship must be true for any arbitrary coordinate transformation. Therefore, our original transformation from Annie to Jim must likewise obey the same rule. That little constant F must therefore evaluate to v/a for all coordinate transformations. That leaves us with a new transformation which looks like this:

[ x' ]     [  1   -v ] [ x ]
[ t' ] = γ [ v/a   1 ] [ t ]

Now let's take a moment to remind ourselves of another useful fact that was already derived earlier. Namely, that the determinant of any transformation matrix must equal 1, no matter whose frame of reference it happens to represent. For a transformation with F = v/a, that determinant works out to γ^2(1 + v^2/a). Setting this equal to 1 and doing a little bit more algebra, we find that our mysterious Lorentz factor must evaluate to the following:

γ = 1 / sqrt(1 + v^2/a) .

 
We are therefore left with a coordinate transformation that finally looks like this:

[ x' ]                          [  1   -v ] [ x ]
[ t' ] = (1 / sqrt(1 + v^2/a))  [ v/a   1 ] [ t ]
 
 
Clearly, the only parameter we have left to make sense out of is that mysterious little constant a. So let's take a moment to think about what exactly that thing represents.
 
For starters, the astute observer might notice that a just so happens to have units of velocity squared [m^2/s^2]. That's a little odd, but it does motivate us to rewrite a in terms of a new velocity constant, which we'll call c, such that a = ±c^2. What exactly this magical velocity represents, we cannot yet say. But clearly, there are only three possible scenarios to consider:

  1. a = 0 (thus implying c = 0)
  2. a is positive, implying that a = +c^2, or
  3. a is negative, implying that a = -c^2.

Now let's consider each case one-by-one.

Clearly, we can immediately reject the first option outright because all it would do is introduce a bunch of divisions by zero. 

The second option is likewise not physically viable, but the reason is not quite as obvious. It begins by asking what would happen if Annie observes Jim traveling to the right with some moderate velocity like v = c/10. Meanwhile, Jim observes Carl, who is likewise traveling to the right with the same relative velocity of c/10. Carl, in turn, observes yet another traveler with velocity c/10, who observes another, and so on. 

Now suppose that Annie measures her heartbeat and observes a delay of exactly t = 1.0 second. From Jim's perspective, he likewise observes Annie's heartbeat, and he naturally measures some duration t' from his frame of reference. Carl then measures a value of t'', and so on for every other observer down the train. But what exactly should we expect to happen if we simply repeated this transformation a hundred times over? The answer, it turns out, is a little strange. Given enough transformations, Annie's heartbeat will actually appear to flow in the negative direction. As in, literally, the observers will eventually perceive Annie's flow of time as moving backwards! 
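You don't have to take my word for it. Here is a minimal Python sketch of that chain of observers under the second option, a = +c^2, which means F = v/c^2 and γ = 1/sqrt(1 + v^2/c^2). The function name boost_positive_a and the choice of units where c = 1 are my own conveniences, not part of the derivation.

```python
import math

C = 1.0          # work in units where c = 1
V = C / 10       # each observer moves at c/10 relative to the previous one


def boost_positive_a(event, v, c=C):
    """One transformation step under the (rejected) option a = +c**2."""
    x, t = event
    gamma = 1.0 / math.sqrt(1.0 + v**2 / c**2)
    return (gamma * (x - v * t), gamma * (v * x / c**2 + t))


event = (0.0, 1.0)   # Annie's heartbeat: at her origin, lasting 1 second
first_negative = None
for n in range(1, 101):
    event = boost_positive_a(event, V)
    # Record the first observer in the chain who measures a negative duration.
    if first_negative is None and event[1] < 0:
        first_negative = n

print(first_negative)  # 16
```

With these numbers, it only takes sixteen observers before Annie's heartbeat appears to run backwards.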

Obviously, that cannot possibly be the case. So let's introduce a 5th assumption to our universe:

Assumption #5: For any two events, (x1,t1) and (x2,t2), if x2 = x1 and t2 > t1, then all transformations must likewise result in t2' > t1'.

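In other words, if two events happen at the same place in Annie's frame of reference, then all observers must agree on which one came first; nobody gets to watch her clock run backwards. This condition immediately removes the second option from our list, leaving behind only one logical possibility, a = -c^2. Our final transformation therefore must take on the following structure:

[ x' ]                            [    1      -v ] [ x ]
[ t' ] = (1 / sqrt(1 - v^2/c^2))  [ -v/c^2     1 ] [ t ]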

Finally, let us remove the matrix-vector notation and write the transformation in its standard form, as shown here:

x' = γ(x - vt)

t' = γ(t - vx/c^2)

where γ = 1 / sqrt(1 - v^2/c^2) .


This expression has a name in physics, and it is called the Lorentz Transformation. It is also an incredible result when you think about it. Remember that we started this thought experiment by asking a seemingly trivial question and expecting a trivial result. Yet of all the different mathematical possibilities we could have imagined for transforming between inertial frames of reference, this is the only one that is consistent with our assumptions. Plus, it’s not like we demanded anything extraordinary, either. We’re talking about stupidly basic assumptions like the law of inertia and the tendency for time to move forward. It simply cannot be any other way without the universe suddenly exploding into a bunch of really bizarre manifestations.
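To make the result a little more concrete, here is a short Python sketch of the Lorentz transformation in its standard form. The function name lorentz, the sample event, and the choice of v = 0.6c are all illustrative. The round trip relies on the inverse derived earlier, which amounts to the same transformation with v replaced by -v.

```python
import math

C = 299_792_458.0  # meters per second


def lorentz(event, v, c=C):
    """Transform an event (x, t) into a frame moving at velocity v."""
    x, t = event
    gamma = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    return (gamma * (x - v * t), gamma * (t - v * x / c**2))


v = 0.6 * C
annie_event = (1.0e9, 5.0)            # hypothetical event: meters, seconds
jim_event = lorentz(annie_event, v)   # what Jim records for the same event
round_trip = lorentz(jim_event, -v)   # transform back into Annie's frame

print(jim_event)
print(round_trip)
```

Up to floating-point rounding, the round trip lands right back on Annie's original measurement.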
 
Notice also how the word "light" was never mentioned a single time throughout this entire derivation. If anything, the very phrase itself---the speed of light---is nothing but a blundering misnomer. That little mystery constant c has absolutely nothing whatsoever to do with photons or electromagnetic radiation. It's a fundamental statement about the interplay between space, time, and motion. That's why many physicists prefer not to use the phrase "speed of light." Rather, a far more appropriate name would be something like "the speed of causality," in that it represents the fastest possible speed at which any single event can ever appear to causally influence another.

"But wait!" I hear you saying. "Who says that I can't travel faster than light? Surely, all I have to do is hop into a space ship and gun the engines. Sooner or later, I have to exceed the speed of light, don't I?"
 
But will you? Consider again that little piece of space rock floating by from Annie's perspective. According to the law of inertia, Annie necessarily observes a velocity of

dx/dt = u0

Now ask yourself what velocity, u0', Jim should expect to see from his perspective. The answer is actually perfectly straightforward. First, we take the differential of each equation in the Lorentz transformation, treating v and γ as constants, to derive the following expressions:

dx' = γ(dx - vdt)

dt' = γ(-vdx/c^2 + dt)

Next, we simply divide the one by the other to find this:

dx'/dt' = (dx - vdt)/(dt - v dx/c^2) .

Remember that, by definition, this quantity has to represent the transformed velocity, u0', from Jim's perspective. Likewise, the quantity dx/dt represents the relative velocity u0 between Annie and the rock. Dividing the numerator and denominator by dt, we finally arrive at this expression here:

dx'/dt' = (u0 - v)/(1 - u0v/c^2) = u0'

This is the famous velocity addition formula of special relativity, and you can clearly see for yourself that velocities do NOT add linearly. It seems strange and counterintuitive, but it is also the only way to preserve a well-behaved universe that obeys our earlier assumptions.
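Here is the same formula as a few lines of Python, in case you want to plug in numbers of your own. The function name transform_velocity and the sample speeds are purely illustrative.

```python
C = 299_792_458.0  # meters per second


def transform_velocity(u, v, c=C):
    """Velocity of an object in Jim's frame, given its velocity u in
    Annie's frame and Jim's velocity v relative to Annie."""
    return (u - v) / (1.0 - u * v / c**2)


# Everyday speeds: a rock at 30 m/s, with Jim moving the other way at 20 m/s.
slow = transform_velocity(30.0, -20.0)

# Relativistic speeds: a rock at 0.8c, with Jim moving the other way at 0.8c.
fast = transform_velocity(0.8 * C, -0.8 * C)

print(slow)        # practically indistinguishable from 50 m/s
print(fast / C)    # well short of 1.6
```

At everyday speeds the correction is far too small to notice, which is why simple addition works so well in daily life. At 0.8c it is impossible to miss.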

Now consider again that long train of observers, all moving past each other with some relative velocity of c/10. You would think that eventually some observer will see Annie zooming off faster than the speed of light, but do they really? Just try it for yourself and graph the result. This is what you'll see:


Isn't that weird? No matter how many times we repeat the transformation, the velocity of that little piece of space rock will never exceed the mystery constant c. In fact, if we let the velocity equal c itself, then all observers in all inertial frames of reference will likewise agree that it is moving exactly that fast. That is to say, if any observer measures an object moving with speed c, then ALL observers will likewise measure a speed of c. This has to be the case, because it is the only logical outcome that is consistent with the assumptions we've made.
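If you'd like to reproduce the experiment yourself, the following Python sketch walks down the train of observers one hop at a time. It works in units where c = 1 and stops after fifty hops, both of which are arbitrary choices of mine, and the function name transform_velocity is just a label for the velocity addition formula.

```python
C = 1.0  # units where the speed limit c equals 1


def transform_velocity(u, v, c=C):
    """Velocity addition formula: velocity u as seen from a frame moving at v."""
    return (u - v) / (1.0 - u * v / c**2)


step = C / 10
annie_velocity = 0.0   # Annie is at rest in her own frame
speeds = []
for _ in range(50):
    # Hop to the next observer, who moves at c/10 relative to the current one.
    annie_velocity = transform_velocity(annie_velocity, step)
    speeds.append(abs(annie_velocity))

print(speeds[0], speeds[9], speeds[49])
print(max(speeds) < C)   # True: the speed creeps toward c but never reaches it

# An object moving at exactly c keeps that speed in the next frame.
print(transform_velocity(C, step))
```

Plot the list speeds against the hop number and you get a curve that climbs quickly at first and then flattens out just below c.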
 
This is exactly why physicists are so confident in the idea of a universal speed limit. It's not just a matter of "well, the evidence seems to indicate." It logically cannot be any other way! Any universe that allows for objects to move beyond some universal speed limit would completely violate our most basic presumptions about space, time, and motion. It is therefore not a matter of hopping into a spaceship and firing the engines. The very geometry of the universe itself simply doesn't allow it.
 
So what exactly is the value of that universal speed limit, anyway? To answer that question, you just have to start doing some basic experiments. In our particular universe, it just so happens to be 299,792,458 meters per second---a very large number. So large, in fact, that we can usually approximate it as infinite and still get reliably good results. Letting c approach infinity thus leads us to the following transformation:

x' = x - vt

t' = t


These equations are known as the Galilean transformation, and they finally represent the simple, intuitive answer that we were originally expecting to find when we first began this exercise. However, it's interesting to point out that we practically had to trip over the correct, Lorentzian transformation just to get here, not to mention impose the highly dubious assumption of an infinite value for that mystery constant. It all just goes to show that you never know what you might discover if only you take the time to enumerate a few basic assumptions and then follow the argument wherever it leads.
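To get a feel for just how good the approximation is at everyday speeds, here is one last Python sketch comparing the two transformations side by side. The function names lorentz and galilean are my own labels, and the sample numbers (9,000 meters, one hour, 50 meters per second) are borrowed from the magnet example earlier purely for illustration.

```python
import math

C = 299_792_458.0  # meters per second


def lorentz(event, v, c=C):
    """Lorentz transformation of an event (x, t)."""
    x, t = event
    gamma = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    return (gamma * (x - v * t), gamma * (t - v * x / c**2))


def galilean(event, v):
    """Galilean transformation: the limit of infinite c."""
    x, t = event
    return (x - v * t, t)


event = (9000.0, 3600.0)   # 9,000 meters away, one hour later
v = 50.0                   # meters per second

exact = lorentz(event, v)
approx = galilean(event, v)
print(exact[0] - approx[0])   # difference in position, in meters
print(exact[1] - approx[1])   # difference in time, in seconds
```

Both differences come out many orders of magnitude smaller than anything a meter stick or a wristwatch could ever detect.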

Thanks for reading.
 
References
  1. Einstein, A., "On the electrodynamics of moving bodies," Annalen der Physik 17, 891-921 (1905)
  2. Pelissetto, A., and Testa, M., "Getting the Lorentz transformations without requiring an invariant speed," American Journal of Physics 83, 338-340 (2015)
  3. von Ignatowsky, W., "Das Relativitätsprinzip," Archiv der Mathematik und Physik 17, 1-24 (1911)