Occam's Razor is not some fuzzy rule of thumb

Not everyone is raised to revere Occam's Razor. To someone who wasn't, the statement "it's the simplest explanation" isn't a knockdown argument for anything. Why couldn't a complex Non-Occam explanation be correct?

So it bears explaining.

Occam's Razor is not just some fuzzy rule of thumb; it has a formalism: minimum message length (MML)!

"The woman down the street is a witch; she did it"

The above sentence looks like a short and simple theory for whatever happened, but it's far from simple. Several of the words used, such as "witch", require a lot of explanation for an AI or alien who knows nothing about any of them. The completed message contains the sentence itself plus all the data needed to interpret it, and the length of that completed message is the true MML of your theory.

If you then represent the message as binary code, you can describe its complexity in bits (a base-2 logarithmic measure: each extra bit doubles the number of possible messages).

(To be fair, though it doesn't make a difference for this page, here we handwave away an important detail: the "message" would actually be a computer program (Turing machine code, I think), as that is the shortest possible way—and a language-neutral way—to express a theory.)
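As a rough illustration of what "measuring the message in bits" means, here is a minimal Python sketch. It assumes plain UTF-8 text as a stand-in for the binary code, which gives only a crude upper bound and not the real MML. The function name `message_bits` and the glossary text are made up for the example.

```python
def message_bits(message: str) -> int:
    """Length of the message in bits under plain UTF-8 encoding (a crude upper bound)."""
    return 8 * len(message.encode("utf-8"))


sentence = "The woman down the street is a witch; she did it"

# Hypothetical and very incomplete glossary for a reader who knows none of the words.
glossary = (
    "witch: a person who can change the world by magic. "
    "magic: a way of causing events that bypasses ordinary physics. "
)

bare = message_bits(sentence)                  # the sentence alone
completed = message_bits(glossary + sentence)  # the sentence plus what is needed to read it
print(bare, completed)
```

Even this toy glossary makes the completed message several times longer than the bare sentence, and a real one would have to go on to explain every word the glossary itself uses.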

An ideal reasoner must treat each extra bit of message length as halving the probability that the message matches reality, so even slightly longer messages are exponentially less likely.

Even if this feels hard to justify on any given occasion, it's simply math: if you're in the habit of believing messages just one bit longer than the shortest message available, you'll be wrong twice as often as otherwise. To say nothing of a message ten bits longer, where on average you must expect your first thousand theories of that length (because 2¹⁰ = 1024) to be proven false.
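The arithmetic here can be sketched in a few lines of Python. This assumes the usual 2^(-length) weighting, where every extra bit halves the prior weight of a message. The function names are made up for illustration.

```python
from fractions import Fraction


def prior_weight(length_bits: int) -> Fraction:
    """Unnormalized prior weight 2^(-length) of a message with the given length in bits."""
    return Fraction(1, 2 ** length_bits)


def penalty_factor(extra_bits: int) -> int:
    """How many times less likely a message becomes when it is extra_bits longer."""
    return 2 ** extra_bits


print(penalty_factor(1))   # one extra bit
print(penalty_factor(10))  # ten extra bits
# The ratio depends only on the difference in length, not on the lengths themselves.
print(prior_weight(20) / prior_weight(30))
```

Note that only the difference in length matters: a 30-bit message compared with a 20-bit one takes the same penalty as a 1010-bit message compared with a 1000-bit one.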

And though there's technically a way out here to save your pet theory, if you were motivated to argue it into a defensible position… it's not valid to hope, along the lines of "there's still a chance, right?", that the longer message happens by luck to describe reality more closely. No one can feel a probability that small, so it's more human-psychologically correct (in the sense of that famous essay by Asimov, "…wronger than both of them put together") to say that it's simply zero, i.e. to say that we actually know that the simpler explanation is correct (technically, just the most accurate by far, until someone thinks of a theory with an even shorter MML).

This is why physicists strive so hard to find simple theories – the simplicity is as good as proof it's correct!

(Why do physicists run any experiments at all, then, when they could just sit in an armchair crafting ever simpler theories? Excellent question! There's one constraint on your theory-making: you need the simplest theory that still fits all the facts at hand. Otherwise you could just propose a zero-length message as explanation for everything, right? If a theory fails to explain just one fact, it's already disproven and the answer has to be in a different theory, even if that one must be longer. Physicists just discount anything that's longer than necessary, and run experiments to differentiate between theories of equal length.)
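The rule "simplest theory that still fits all the facts" can be sketched as a filter followed by a minimum. This is only a toy in Python: the theories, their bit lengths, and the facts are all invented for illustration, and `Theory` and `simplest_fitting` are hypothetical names.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Theory(NamedTuple):
    name: str
    length_bits: int                # assumed message length; made-up numbers
    predicts: Callable[[int], int]  # what the theory says we should observe for input x


def simplest_fitting(theories: List[Theory],
                     facts: List[Tuple[int, int]]) -> Optional[Theory]:
    """Discard every theory that misses even one fact, then take the shortest survivor."""
    fitting = [t for t in theories if all(t.predicts(x) == y for x, y in facts)]
    return min(fitting, key=lambda t: t.length_bits, default=None)


facts = [(0, 0), (1, 1), (2, 4), (3, 9)]  # toy observations (input, outcome)
theories = [
    Theory("always zero", 5, lambda x: 0),
    Theory("identity", 8, lambda x: x),
    Theory("square", 12, lambda x: x * x),
    Theory("square, except zero past 100", 40, lambda x: x * x if x < 100 else 0),
]

best = simplest_fitting(theories, facts)
print(best.name if best else "no theory fits")
```

The two shortest toy theories are thrown out because they miss a fact, and of the two that fit, the shorter one wins. An experiment at an input of 100 or more would be what finally separates the two survivors on evidence rather than on length alone.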

Once a simpler theory is found that fits, everyone acts like we know this theory is true, because… we essentially do know it.

The word "know", if it's to mean anything useful, is shorthand for a sufficiently high probability: large percentages like 99.9976%, with the number of decimals passing beyond the realm where it's psychologically realistic to keep track of the probability as a mental entity at all. We throw it away, and that's the point where we say we "know" the attached proposition. For agents with unbounded computing power, though, the number would always remain.

As Dennis Lindley (1923–2013) said, our theories must always allow for the possibility, however tiny, that the moon is made of green cheese (Cromwell's Rule). Most people alive today would assign such a proposition about the moon too tiny a probability to bother keeping track of. In other words, they know perfectly well it's not made of green cheese! If this bothers you, the issue is that the word "know" is a bit of an abomination, a shorthand for a probability hugging up against 0% or 100% with many decimals. The word "know" serves a pragmatic purpose as such a shorthand, but the vast majority of people don't think of it that way; they just hear it as absolute, so be wary.

Anyway, you won't bother to do an experiment to check if the moon is made of green cheese, as it's so improbable as to be not worth your time. For the same reason, you don't bother to test or even consider any other hypotheses with long MML: they're so improbable as to be not worth your time.

To nevertheless privilege a long-MML hypothesis and insist it be tested, you must likewise argue for checking whether the moon is cheese, and decillions of other improbable hypotheses, and then humanity has no time to do anything else.

But… is it so bad to privilege a hypothesis "just this once"? From www.greaterwrong.com/posts/X2AD2LgtKgkRNPj2a/privileging-the-hypothesis:

In the minds of human beings, if you can get them to think about this particular hypothesis rather than the trillion other possibilities that are no more complicated or unlikely, you really have done a huge chunk of the work of persuasion. Anything thought about is treated as “in the running,” and if other runners seem to fall behind in the race a little, it’s assumed that this runner is edging forward or even entering the lead.

What if you have special knowledge that implies it's worth testing? Well, that's allowed and totally OK! Science doesn't pick sides. But your knowledge has to have a large evidential weight to offset the long MML. Without such weight, we're back to the previous reasoning – it's overwhelmingly likely to just waste our time.
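How much evidential weight is "large"? A hedged sketch in Python, assuming the 2^(-length) prior and ordinary Bayesian updating in odds form: the evidence has to favour the longer hypothesis by a likelihood ratio of about 2^(extra bits) just to break even. The function names are made up for illustration.

```python
import math


def posterior_odds(extra_bits: float, likelihood_ratio: float) -> float:
    """Odds of the longer hypothesis against the shorter one after seeing the evidence.

    Prior odds are 2^(-extra_bits); the likelihood ratio says how many times
    more probable the evidence is under the longer hypothesis.
    """
    return 2.0 ** (-extra_bits) * likelihood_ratio


def bits_of_evidence(likelihood_ratio: float) -> float:
    """Evidence expressed in bits, directly comparable to the length penalty."""
    return math.log2(likelihood_ratio)


# Evidence that is 1024 times likelier under the longer hypothesis
# exactly cancels a 10-bit length penalty (even odds).
print(posterior_odds(10, 1024))
# Evidence that is merely twice as likely leaves the longer hypothesis far behind.
print(posterior_odds(10, 2))
```

Put in bits, the special knowledge has to carry at least as many bits of evidence as the hypothesis carries extra bits of length.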

If the explicit probability argument doesn't persuade you, how about track record?

Contrary to how it's often presented, the Copernican revolution, where we transitioned from a geocentric to a heliocentric model, wasn't straightforward! Read "The Copernican Revolution From the Inside". In the beginning, the heliocentric theory fit the data worse than the geocentric one!

Yet people insisted on trying to make heliocentrism true.

Why? They liked its philosophical simplicity. And in the end, that bore fruit. That's why we're now so confident in Occam's Razor: when you find a simple theory, it tends to be worth insisting on it for a while, more than any other butterfly idea. If you don't have that policy, you may get stuck on theories that fit the facts better right now and miss out on the truth.

Science would have discovered almost nothing by now if the scientists weren't thinking about hypotheses according to Occam's Razor.

There are infinitely many possible explanations for any phenomenon, and every time you test one and it fails, you can rule out a large segment of the space of possible explanations similar to the one you just tested. Thus you quickly narrow down the most correct explanations, which results in technology that works. That phone in your hand was crafted by the invisible hand of Occam.

Created 2023-Jan-04 (17 months ago)