Elliott Sober, ‘Are scientific theories really better when they are simpler?’, Aeon Essays, 3 May 2016

Why is simpler better?
Ockham’s Razor says that simplicity is a scientific virtue,
but justifying this philosophically is strangely elusive
by Elliott Sober
Elliott Sober is the Hans Reichenbach Professor and William F Vilas Research Professor in the department of philosophy at the University of Wisconsin, Madison. His latest book is Ockham’s Razors: A User’s Manual (2015).
Two of Barcelona’s architectural masterpieces are as
different as different could be. The Sagrada Família,
designed by Antoni Gaudí, is only a few miles from the
German Pavilion, built by Mies van der Rohe. Gaudí’s
church is flamboyant and complex. Mies’s pavilion is
tranquil and simple. Mies, the apostle of minimalist
architecture, used the slogan ‘less is more’ to express what
he was after. Gaudí never said ‘more is more’, but his
buildings suggest that this is what he had in mind.
One reaction to the contrast between Mies and Gaudí is to
choose sides based on a conviction concerning what all art
should be like. If all art should be simple or if all art should
be complex, the choice is clear. However, both of these
norms seem absurd. Isn’t it obvious that some estimable
art is simple and some is complex? True, there might be
extremes that are beyond the pale; we are alienated by art
that is far too complex and bored by art that is far too
simple. However, between these two extremes there is a
vast space of possibilities. Different artists have had
different goals. Artists are not in the business of trying to
discover the uniquely correct degree of complexity that all
artworks should have. There is no such timeless ideal.
Science is different, at least according to many scientists.
Albert Einstein spoke for many when he said that ‘it can
scarcely be denied that the supreme goal of all theory is to
make the irreducible basic elements as simple and as few
as possible without having to surrender the adequate
representation of a single datum of experience’. The search
for simple theories, then, is a requirement of the scientific
enterprise. When theories get too complex, scientists
reach for Ockham’s Razor, the principle of parsimony, to
do the trimming. This principle says that a theory that
postulates fewer entities, processes or causes is better
than a theory that postulates more, so long as the simpler
theory is compatible with what we observe. But what does
‘better’ mean? It is obvious that simple theories can be
beautiful and easy to understand, remember and test. The
hard problem is to explain why the fact that one theory is
simpler than another tells you anything about the way the
world is.
One of the most famous scientific endorsements of
Ockham’s Razor can be found in Isaac Newton’s
Mathematical Principles of Natural Philosophy (1687), where
he states four ‘Rules of Reasoning’. Here are the first two:
Rule I. No more causes of natural things should be
admitted than are both true and sufficient to explain
their phenomena. As the philosophers say: nature
does nothing in vain, and more causes are in vain
when fewer suffice. For nature is simple and does
not indulge in the luxury of superfluous causes.
Rule II. Therefore, the causes assigned to natural
effects of the same kind must be, so far as possible, the
same. Examples are the cause of respiration in man
and beast, or of the falling of stones in Europe and
America, or of the light of a kitchen fire and the
Sun, or of the reflection of light on our Earth and
the planets.
Newton doesn’t do much to justify these rules, but in an
unpublished commentary on the book of Revelation, he
says more. Here is one of his ‘Rules for
methodising/construing the Apocalypse’:
To choose those constructions which without straining
reduce things to the greatest simplicity. The reason of
this is… [that] truth is ever to be found in
simplicity, and not in the multiplicity and
confusion of things. It is the perfection of God’s
works that they are all done with the greatest
simplicity. He is the God of order and not of
confusion. And therefore as they that would
understand the frame of the world must endeavour
to reduce their knowledge to all possible simplicity,
so it must be in seeking to understand these
visions…
Newton thinks that preferring simpler theories makes
sense, whether the task is to interpret the Bible or to
discover the laws of physics. Ockham’s Razor is right on
both counts because the Universe was created by God.
In the 20th century, philosophers, statisticians and scientists have made progress on understanding why the simplicity of a theory is relevant to assessing what the world is like. Their justifications of Ockham’s Razor do not depend on theology, nor do they invoke the grandiose thesis that nature is simple. There are at least three ‘parsimony paradigms’ within which the razor can be justified.
The first is exemplified by the advice given to medical
students that they should ‘avoid chasing zebras’. If a
patient’s symptoms can be explained by the hypothesis
that she has common disease C, and also can be explained
by the hypothesis that she has rare disease R, you should
prefer the C diagnosis over the R. C is said to be more
parsimonious. In this case, the more parsimonious
hypothesis has the higher probability of being true.
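In probabilistic terms, the zebra advice is an application of Bayes’ theorem: when two diagnoses fit the symptoms equally well, the prior prevalence decides. A minimal sketch, with hypothetical prevalence and likelihood numbers:

```python
# A Bayesian sketch of 'avoid chasing zebras'. All numbers below are
# hypothetical illustrative values, not real disease statistics.

def posterior(prior_c, prior_r, lik_c, lik_r):
    """Posterior probability of the common disease C given the symptoms,
    by Bayes' theorem, assuming C and R are the only candidates."""
    joint_c = prior_c * lik_c
    joint_r = prior_r * lik_r
    return joint_c / (joint_c + joint_r)

# Both diseases explain the symptoms equally well (same likelihood),
# but C is far more prevalent than R.
p = posterior(prior_c=0.05, prior_r=0.0001, lik_c=0.9, lik_r=0.9)
print(round(p, 4))  # the common diagnosis ends up overwhelmingly more probable
```

Because the likelihoods cancel, the posterior ratio just mirrors the ratio of prevalences, which is what makes the more parsimonious diagnosis the more probable one here.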
There is another situation in which simpler theories have higher probabilities. It involves the version of Ockham’s Razor that I call ‘the razor of silence’. If you have evidence that C1 is a cause of E, and no evidence that C2 is a cause of E, then C1 is a better explanation of E than C1&C2 is. The
19th-century philosopher John Stuart Mill was thinking of
such cases when he said that the principle of parsimony is
a case of the broad practical principle, not to believe
anything of which there is no evidence … The
assumption of a superfluous cause is a belief
without evidence; as if we were to suppose that a
man who was killed by falling over a precipice must
have taken poison as well.
Mill is talking about the razor of silence. The better explanation of E is silent about C2; it does not deny that C2 was a cause. The problem changes if you consider two conjunctive hypotheses. Which is the better explanation of E: C1&not-C2 or C1&C2? The razor of silence provides no guidance, but another razor, the razor of denial, does. It tells you to prefer the former. Unfortunately, it is unclear what justification there could be for this claim if you have no evidence, one way or the other, as to whether C2 is true. The razor of silence is easy to justify; justifying the razor of denial is more difficult.
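Why the razor of silence is so easy to justify can be put probabilistically: the probability calculus guarantees that a conjunction is never more probable than either of its conjuncts. A minimal sketch, using a hypothetical joint distribution over two candidate causes:

```python
# Hypothetical joint distribution P(C1=x, C2=y) over two candidate causes.
joint = {
    (True, True): 0.10,
    (True, False): 0.25,
    (False, True): 0.30,
    (False, False): 0.35,
}

p_c1 = sum(p for (c1, _), p in joint.items() if c1)  # P(C1) = 0.10 + 0.25
p_c1_and_c2 = joint[(True, True)]                    # P(C1 & C2) = 0.10

# Whatever the numbers in the table, P(C1 & C2) can never exceed P(C1),
# so the hypothesis that stays silent about C2 is never less probable.
assert p_c1_and_c2 <= p_c1
```

The razor of denial gets no such free guarantee: C1&not-C2 and C1&C2 are competing conjunctions, and nothing in the axioms ranks them without further evidence about C2.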
In the example of the rare and common diseases, the two
hypotheses confer the same probability on the
observations. The second parsimony paradigm focuses on
situations in which a simpler hypothesis and a more
complex hypothesis confer different probabilities on the
observations. In many such cases, the evidence favours the
simpler theory over its more complex competitor. For
example, suppose that all the lights in your neighbourhood
go out at the same time. You then consider two
hypotheses:
(H1) something happened to the power plant at 8pm on Tuesday that influenced all the lights; or

(H2) something happened to each of the light bulbs at 8pm on Tuesday that influenced whether the light would go on.

Postulating a single common cause is more parsimonious than postulating a large number of independent, separate causes. The simultaneous darkening of all those lights is more probable if H1 is true than it would be if H2 were true. Building on ideas developed by the philosopher Hans Reichenbach, you can prove mathematically (from assumptions that flesh out what H1 and H2 are saying) that the observations favour H1 over H2. The mathematically curious could have a look at my book Ockham’s Razors: A User’s Manual (2015).
An important biological example in which common causes
are preferred to separate causes can be found in Charles
Darwin’s hypothesis that all present-day life traces back to
one or a few original progenitors. Modern biologists are on
the same page when they point to the near universality of
the genetic code as strongly favouring the hypothesis of
universal common ancestry over the hypothesis of
multiple ancestors. The shared code would be a surprising
coincidence if different groups of organisms stemmed
from different start-ups. It would be much more probable
if all current life traced back to a single origination.
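The favouring relation in this paradigm is a matter of likelihoods: the observation is far more probable under one common cause than under many independent ones. A minimal sketch of the blackout example, with a hypothetical bulb count and hypothetical failure probabilities:

```python
# Likelihood comparison for the neighbourhood blackout. The number of
# lights and the per-event failure probabilities are hypothetical.

n_lights = 20

# H1: a single plant failure darkens every light at once.
p_plant_fails = 0.01
likelihood_h1 = p_plant_fails  # probability of the observed blackout under H1

# H2: each bulb independently happens to fail at exactly the same moment.
p_bulb_fails = 0.01
likelihood_h2 = p_bulb_fails ** n_lights  # coincidence of 20 independent failures

# The common-cause hypothesis makes the observation astronomically more probable.
print(likelihood_h1 / likelihood_h2)
```

The same likelihood structure is what makes the near-universal genetic code favour a single origin of life over many independent ones: a shared code is a cheap consequence of one start-up, a staggering coincidence under several.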
According to the third parsimony paradigm, parsimony is relevant to estimating how accurately a model will predict new observations. A central result in the part of statistics called ‘model selection theory’ is due to Hirotugu Akaike, who proved a surprising theorem that demonstrated this relevance. This theorem is the basis of a model evaluation criterion that came to be called AIC (the Akaike Information Criterion). AIC says that a model’s ability to predict new data can be estimated by seeing how well it fits old data and by seeing how simple it is.
Here’s an example. You are driving down a country road
late in the summer and notice that there are two huge
fields of corn, one on each side of the road. You stop your
car and sample 100 corn plants from each field. You find
that the average height in the first sample is 52 inches and
the average height in the second sample is 56 inches. Since
it is late in the growing season, you assume that the
average heights in the two huge fields will not change over
the next few days. You plan to return to the two fields
tomorrow and sample 100 corn plants from each. Which of
the following two predictions do you think will be more
accurate?
Prediction A: the 100 plants you sample tomorrow
from the first population will average 52 inches and
the 100 plants you sample tomorrow from the
second will average 56 inches.
Prediction B: each of the two samples will average
54 inches.
Model selection theory says that this problem can be
solved by considering the following two models of the
average heights in the two populations:
DIFF: the average height in the first population = h1, and the average height in the second population = h2.

NULL: the average height in the first population = the average height in the second population = h.
Neither model says what the values are of h1, h2, and h; these are called ‘adjustable parameters.’ The NULL model
has that name because it says that the two populations do
not differ in their average heights. The name I give to the
DIFF model is a little misleading, since the model doesn’t
say that the two populations differ in their average
heights. DIFF allows for that possibility, but it also allows
that the two populations might have the same average
height.
What do DIFF and NULL predict about the data you will
draw from the two fields tomorrow? The models on their
own don’t provide numbers. However, you can fit each
model to your old data by estimating the values of the
adjustable parameters (h1, h2, and h) in the two models.
The result is the following two fitted models:
f(DIFF): h1 = 52 inches, and h2 = 56 inches.
f(NULL): h = 54 inches.
The question of which model will more accurately predict
new data is interpreted to mean: which model, when fitted
to the old data you have, will more accurately predict the
new data that you do not yet have?
DIFF, you might be thinking, has got to be true. And NULL,
you might also be thinking, must be false. What are the
odds that two huge populations of corn plants should have
exactly the same average heights? If your goal were to say
which of the two models is true and which is false, you’d be
done. But that is not the problem at hand. Rather, you
want to evaluate the two models for their predictive
accuracies. One of the surprising facts about models such
as NULL and DIFF is that a model known to be false will
sometimes make more accurate predictions than a model
known to be true. NULL, though false, might be close to
the truth. If it is, you might be better off using NULL to
predict new data, rather than using DIFF to make your
prediction. After all, the old data might be
unrepresentative! NULL keeps you to the straight and
narrow; DIFF invites you to stray.
The Akaike Information Criterion evaluates NULL and
DIFF by taking account of two facts: f(DIFF) fits the old
data better than f(NULL) does, and DIFF is more complex
than NULL. Here the complexity of a model is the number
of adjustable parameters the model contains. As I
mentioned, AIC is based on Akaike’s theorem, which can be
described informally as follows:
An unbiased estimate of the predictive accuracy of
model M = [how well f(M) fits the old data] minus
[the number of adjustable parameters M contains].
A mathematical result, therefore, can establish that
parsimony is relevant to estimating predictive accuracy.
Akaike’s theorem is a theorem, which means that it is
derived from assumptions. There are three. The first is
that old and new data sets are generated from the same
underlying reality; this assumption is satisfied in our
example if each population’s average height remains
unchanged as the old and new data sets are drawn. The
second assumption is that repeated estimates of each of
the parameters in a model will form a bell-shaped
distribution. The third assumption is that one of the
competing models is true, or is close to the truth. That
assumption is satisfied in the corn example, since either
NULL or DIFF must be true.
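The corn comparison can be run numerically. A minimal sketch, using simulated data in place of the essay’s samples; the within-field spread (sigma) is treated as known and hypothetical, so DIFF has two adjustable parameters (h1, h2) and NULL has one (h):

```python
# AIC for the corn example, in its standard form AIC = 2k - 2 ln L-hat
# (lower is better). Data are simulated; sigma and sample sizes are
# hypothetical illustrative choices, not from the essay.
import math
import random

random.seed(0)
sigma = 4.0
field1 = [random.gauss(52, sigma) for _ in range(100)]
field2 = [random.gauss(56, sigma) for _ in range(100)]

def log_likelihood(data, mean):
    """Gaussian log-likelihood of the data at a fitted mean, known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mean) ** 2 / (2 * sigma**2) for x in data)

def aic(loglik, n_params):
    """Penalise fit by the number of adjustable parameters."""
    return 2 * n_params - 2 * loglik

# Fit each model by maximum likelihood (the sample means).
h1 = sum(field1) / 100
h2 = sum(field2) / 100
h = sum(field1 + field2) / 200

ll_diff = log_likelihood(field1, h1) + log_likelihood(field2, h2)
ll_null = log_likelihood(field1, h) + log_likelihood(field2, h)

print('AIC(DIFF) =', round(aic(ll_diff, 2), 1))
print('AIC(NULL) =', round(aic(ll_null, 1), 1))
```

With a real four-inch gap between the fields, DIFF’s better fit outweighs its extra parameter; shrink the gap or the sample sizes and the penalty can tip the estimate toward NULL, which is the razor at work.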
Gaudí and Mies remind us that there is no disputing
matters of taste when it comes to assessing the value of
simplicity and complexity in works of art. Einstein and
Newton say that science is different – simplicity, in
science, is not a matter of taste. Reichenbach and Akaike
provided some reasons for why this is so. The upshot is
that there are three parsimony paradigms that explain
how the simplicity of a theory can be relevant to saying
what the world is like:
Paradigm 1: sometimes simpler theories have
higher probabilities.
Paradigm 2: sometimes simpler theories are better
supported by the observations.
Paradigm 3: sometimes the simplicity of a model is
relevant to estimating its predictive accuracy.
These three paradigms have something important in common. Whether a given problem fits into any of them depends on empirical assumptions about the problem.
Those assumptions might be true of some problems, but
false of others. Although parsimony is demonstrably
relevant to forming judgments about what the world is
like, there is in the end no unconditional and
presuppositionless justification for Ockham’s Razor.