
Occam's Razor not some fuzzy rule of thumb

Not everyone is raised to revere Occam's Razor. To someone who wasn't, the statement "it's the simplest explanation" isn't a knockdown argument for anything. Why couldn't a complex Non-Occam explanation be correct?

So it bears explaining.


Occam's Razor is not just some fuzzy rule of thumb – it has a formalism: minimum message length (MML)!

"The woman down the street is a witch; she did it"

The above sentence looks like a short and simple theory for whatever happened, but it's far from simple. Several of the words used, such as "witch", require a lot of explanation for an AI or alien who knows nothing about any of the words. The resulting completed message – the one that contains all the data needed to interpret the sentence, in addition to the sentence itself – is the true MML of your theory.

If you then represent the message as binary code, you can describe its complexity in terms of bits (a log2 number).
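As a toy illustration of that bookkeeping (assuming a naive 8-bits-per-character encoding, not the optimal code real MML would use, so the numbers are only suggestive):

```python
def message_length_bits(message: str) -> int:
    """Length of a message in bits under a naive 8-bits-per-byte encoding.
    (Real MML would use an optimal, theory-neutral code; this is a toy proxy.)"""
    return len(message.encode("utf-8")) * 8

short_theory = "she did it"
witch_theory = "the woman down the street is a witch; she did it"
# ...and the witch theory would additionally need everything
# required to define "witch", "woman", "street"...

print(message_length_bits(short_theory))  # 80
print(message_length_bits(witch_theory))  # 384
```

The point isn't the exact byte counts, just that message length is a number you can compute and compare.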

(To be fair, though it doesn't make a difference for this page, here we handwave away an important detail: the "message" would actually be a computer program for a universal Turing machine – as in Kolmogorov complexity – since that is the shortest possible, and language-neutral, way to express a theory.)


Slightly longer messages must be taken by an ideal reasoner as exponentially less likely to match reality.

Even if on any given occasion this feels hard to justify, it's simply math: if you make a habit of believing messages just one bit longer than the shortest message available, you'll be wrong twice as often as otherwise. To say nothing of when the message is ten bits longer, where on average you must expect your first thousand (because 2^10 = 1024) theories of that length to be proven false.
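The arithmetic can be sketched in a few lines (assuming a Solomonoff-style prior, where a hypothesis whose shortest description is L bits gets prior weight proportional to 2^-L):

```python
# Under a Solomonoff-style prior, an L-bit hypothesis has prior weight
# proportional to 2**-L, so each extra bit of description length halves
# the prior, and ten extra bits cost a factor of 1024.

def prior_odds_ratio(extra_bits: int) -> float:
    """How many times less likely a hypothesis becomes, per extra bit of MML."""
    return 2.0 ** extra_bits

print(prior_odds_ratio(1))   # 2.0    -> wrong twice as often
print(prior_odds_ratio(10))  # 1024.0 -> the "first thousand theories" figure
```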

And though there's technically a way out here to save your pet theory, if you were motivated to argue it into a defensible position… it's not valid to hope along the lines of "there's still a chance, right?" that the longer message happens by luck to describe reality more closely. No one can feel a probability that small, so it's more human-psychologically correct (in the sense of Asimov's famous essay "The Relativity of Wrong": "…wronger than both of them put together") to call it simply zero, i.e. to say that we actually know the simpler explanation is correct (technically, just the most accurate by far – until someone thinks of a theory with an even shorter MML).

This is why physicists strive so hard to find simple theories – the simplicity is as good as proof it's correct!

(Why do physicists run any experiments at all, then, when they could just sit in an armchair crafting ever simpler theories? Excellent question! There's one constraint on your theory-making: you need the simplest theory that still fits all the facts at hand – otherwise you could just propose a zero-length message as the explanation for everything. If a theory fails to explain even one fact, it's already disproven, and the answer has to lie in a different theory, even if that one must be longer. So physicists discount anything longer than necessary – and run experiments to differentiate between theories of equal length.)

Once a simpler theory is found that fits, everyone acts like we know this theory is true, because… we essentially do know it.

The word "know", if it's to mean anything useful, is shorthand for a sufficiently high probability – large percentages like 99.9976%, the amount of decimals passing beyond the realm where it's psychologically realistic to keep track of the probability as a mental entity at all. We throw it away, and that's the point where we say we "know" the attached proposition. Although for agents with unbounded computing power, the number would always remain.

As Dennis Lindley (1923–2013) said, our theories must always allow for the possibility that the moon is made of green cheese, however tiny (Cromwell's Rule). Most people alive today would assign such a proposition about the moon too tiny a probability to bother keeping track of – in other words, they know perfectly well it's not made of green cheese! If this bothers you, the issue is that the word "know" is a bit of an abomination: a shorthand for a probability hugging up against 0% or 100% with many decimals. The word serves a pragmatic purpose as such a shorthand, but the vast majority of people don't think of it that way – they just hear it as absolute – so be wary.

Anyway, just as you won't bother to run an experiment to check whether the moon is made of green cheese – it's so improbable as to not be worth your time – you don't bother, for the same reason, to test or even consider any other hypothesis with a long MML.

To nevertheless privilege a long-MML hypothesis and insist it be tested, you must likewise argue for checking whether the moon is cheese, and decillions of other improbable hypotheses, and then humanity has no time to do anything else.

But… is it so bad to privilege a hypothesis "just this once"? From www.greaterwrong.com/posts/X2AD2LgtKgkRNPj2a/privileging-the-hypothesis:

In the minds of human beings, if you can get them to think about this particular hypothesis rather than the trillion other possibilities that are no more complicated or unlikely, you really have done a huge chunk of the work of persuasion. Anything thought about is treated as “in the running,” and if other runners seem to fall behind in the race a little, it’s assumed that this runner is edging forward or even entering the lead.

What if you have special knowledge that implies it's worth testing? Well, that's allowed and totally OK! Science doesn't pick sides. But your knowledge has to have a large evidential weight to offset the long MML. Without such weight, we're back to the previous reasoning – it's overwhelmingly likely to just waste our time.


If the explicit probability argument doesn't persuade you, how about track record?

Contrary to how it's often presented, the Copernican revolution, where we transitioned from a geocentric to a heliocentric model, wasn't straightforward! Read The Copernican Revolution From the Inside. In the beginning, the data fit the theory worse!

Yet people insisted on trying to make heliocentrism work.

Why? They liked its philosophical simplicity. And in the end, that bore fruit. That's why we're now so confident in Occam's Razor: when you find a simple theory, it tends to be worth insisting on it for a while, more than any other butterfly idea. If you don't have that policy, you may get stuck on theories that fit the facts better right now and miss out on the truth.

Science would have discovered almost nothing by now if the scientists weren't thinking about hypotheses according to Occam's Razor.

There are infinite possible explanations for any phenomenon, and every time you test one and it fails, you can rule out a large segment of the space of possible explanations similar to the one you just tested. Thus you quickly narrow in on the most correct explanations, which results in technology that works. That phone in your hand was crafted by the invisible hand of Occam.


Created (2 years ago)

Nullius in verba

The Royal Society in 1660 had the slogan nullius in verba – "Take nobody's word for it". We can see it as representing a fundamental shift in mindset that we call the Enlightenment. It used to be near-universal among human cultures to believe in some sort of Fall From Grace: everything was better before, and the most solid knowledge comes from authorities like the church or someone who lived earlier who wrote something, the older the better.

Mapmakers everywhere used to fill in the regions they didn't know well, or didn't know anything about. (Perhaps they just tried to hide their knowledge-holes in order to sell, but I read in Sapiens: A Brief History of Humankind that it also reflected a different mindset – they acted as if knowledge couldn't progress, so what we had was as good as we were ever going to get.) Starting around this time, we instead see maps with blank areas clearly marked as unexplored, which invited curiosity.

Admitting what we didn't know led to the desire to find out.

But why was truth from established authority no longer satisfactory?

  • Sapiens: A Brief History of Humankind has an explanation for the post-1600 Europeans' odd (enterprising & exploratory) outlook.
    • Counterexample of the Chinese admiral Zheng He, whose fleets once sailed as far as Madagascar but whose expeditions ended due to lack of interest from the throne. This is the norm for most societies: China didn't believe there was anything of interest far away. Thus they never discovered Polynesia or Australia.
    • Science was supported by empires with ample funds. Every Royal Navy ship brought a scientist or two just because, to document what they found. From the empire's perspective, it was also a way to buy legitimacy for colonialism, "white man's burden".
    • Dutch East India Company. Early stock market.
    • The tiny Netherlands defeated Spain because investors trusted Dutch finances: the Spanish king was unable to get loans, while the Netherlands could get all the loans they wanted. An early example of the fact that credit-rating wins wars.


Created (2 years ago)

Say not "truth"

  1. "Different societies have different truths" – no, they have different beliefs
  2. A belief is not "true" when it matches reality – it is "pretty accurate". See also Examples of type error: a belief is a cons cell binding a proposition to a number between zero and one, never exactly zero and never exactly one, and certainly not a boolean. If your belief put 70% on something later shown real, the belief's accuracy was log2(70%) ≈ -0.51 bits. If your belief put 30%, you did worse: log2(30%) ≈ -1.74 bits. Accuracy is measured in negative numbers up to zero, never hitting exactly zero.
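The scoring arithmetic in point 2 can be checked directly (this is just the log score in bits of a belief that turned out true):

```python
from math import log2

def accuracy_bits(prob_assigned: float) -> float:
    """Log score, in bits, of a belief whose proposition turned out true.
    Always negative; reaching 0 would require assigning probability 1,
    which no belief should do."""
    return log2(prob_assigned)

print(round(accuracy_bits(0.70), 2))  # -0.51
print(round(accuracy_bits(0.30), 2))  # -1.74
```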
Created (2 years ago)