Musonius Rufus (c. 30–c. 100 C.E.)

Gaius Musonius Rufus was one of the four great Stoic philosophers of the Roman empire, along with Seneca, Epictetus, and Marcus Aurelius. Renowned as a great Stoic teacher, Musonius conceived of philosophy as nothing but the practice of noble behavior. He advocated a commitment to live for virtue, not pleasure, since virtue saves us from the mistakes that ruin life. Philosophy, he held, is more difficult to learn than other subjects, but it is more important, because it corrects the errors in thinking that lead to errors in acting. He also called for austere personal habits, including the simplest vegetarian diet and minimal, inexpensive garments and footwear, in order to achieve a good, sturdy life in accord with the principles of Stoicism. He believed that philosophy must be studied not to cultivate brilliance in arguments or an excessive cleverness, but to develop good character, a sound mind, and a tough, healthy body. Musonius condemned all luxuries and disapproved of sexual activity outside of marriage. He argued that women should receive the same education in philosophy as men, since the virtues are the same for both sexes. He praised married life with many children. He affirmed Stoic orthodoxy in teaching that neither death, injury, insult, pain, poverty, nor exile is to be feared, since none of these is an evil.

Table of Contents

  1. Life
  2. Teachings
  3. Philosophy, Philosophers, and Virtue
  4. Food and Frugality
  5. Women and Equal Education
  6. Sex, Marriage, Family, and Old Age
  7. Impact
  8. References and Further Reading

1. Life

Gaius Musonius Rufus was born before 30 C.E. in Volsinii, an Etruscan city of Italy, as a Roman eques (knight), the class of aristocracy ranked second only to senators. He was a friend of Rubellius Plautus, whom the emperor Nero saw as a threat. When Nero banished Rubellius around 60 C.E., Musonius accompanied him into exile in Asia Minor. After Rubellius died in 62 C.E., Musonius returned to Rome, where he taught and practiced Stoicism, which roused the suspicion of Nero. When the great conspiracy against Nero led by Calpurnius Piso was discovered in 65 C.E., Nero banished Musonius to the arid, desolate island of Gyaros in the Aegean Sea. He returned to Rome under the reign of Galba in 68 C.E. and tried to advocate peace to the Flavian army approaching Rome. In 70 C.E. Musonius secured the conviction of the philosopher Publius Egnatius Celer, who had betrayed Barea Soranus, a friend of Rubellius Plautus. Musonius was exiled a second time, by Vespasian, but returned to Rome in the reign of Titus. Musonius was highly respected and had a considerable following during his life. He died before 101-2 C.E.

2. Teachings

None of Musonius’ own writings survives: either he wrote nothing himself, or what he did write has been lost. His philosophical teachings survive as thirty-two apophthegms (pithy sayings) and twenty-one longer discourses, all apparently preserved by others and all in Greek, except for Aulus Gellius’ testimonia in Latin. For this reason, it is likely that he lectured in Greek. Musonius favored a direct and concise style of instruction. He taught that the teacher of philosophy should not present many arguments, but rather should offer a few clear, practical arguments oriented to his listener and couched in terms known to be persuasive to that listener.

3. Philosophy, Philosophers, and Virtue

Musonius believed that (Stoic) philosophy was the most useful of all pursuits. Philosophy persuades us, according to Stoic teaching, that neither life, nor wealth, nor pleasure is a good, and that neither death, nor poverty, nor pain is an evil; thus the latter are not to be feared. Virtue is the only good because it alone keeps us from making errors in living. Moreover, it is only the philosopher who seems to make a study of virtue. The person who claims to be studying philosophy must practice it more diligently than the person studying medicine or some other skill, because philosophy is more important, and more difficult to understand, than any other pursuit. This is because, unlike students of other skills, those who study philosophy have already been corrupted in their souls by vices and thoughtless habits, having learned things contrary to what they will learn in philosophy. But the philosopher does not study virtue merely as theoretical knowledge. Rather, Musonius insists that practice is more important than theory, as practice more effectively leads us to action than theory does. He held that though everyone is naturally disposed to live without error and has the capacity to be virtuous, someone who has not actually learned the skill of virtuous living cannot be expected to live without error, any more than someone who is not a trained doctor, musician, scholar, helmsman, or athlete could be expected to practice those skills without error.

In one of his lectures Musonius recounts the advice he offered to a visiting Syrian king. A king must protect and help his subjects, so a king must know what is good or bad, helpful or harmful, useful or useless for people. But to diagnose these things is precisely the philosopher’s job. Since a king must also know what justice is and make just decisions, a king must study philosophy. A king must possess self-control, frugality, modesty, courage, wisdom, magnanimity, the ability to prevail in speech over others, the ability to endure pain, and must be free of error. Philosophy, Musonius argued, is the only art that provides all such virtues. To show his gratitude, the king offered Musonius anything he wanted; Musonius asked only that the king adhere to the principles he had set forth.

Musonius held that since a human being is made of body and soul, we should train both, but the latter demands greater attention. This dual method requires becoming accustomed to cold, heat, thirst, hunger, scarcity of food, a hard bed, abstaining from pleasures, and enduring pains. This method strengthens the body, inures it to suffering, and makes it fit for every task. He believed that the soul is similarly strengthened by developing courage through enduring hardships, and by making it self-controlled through abstaining from pleasures. Musonius insisted that exile, poverty, physical injury, and death are not evils and a philosopher must scorn all such things. A philosopher regards being beaten, jeered at, or spat upon as neither injurious nor shameful and so would never litigate against anyone for any such acts, according to Musonius. He argued that since we acquire all good things by pain, the person who refuses to endure pain all but condemns himself to not being worthy of anything good.

Musonius criticized cooks and chefs while defending farming as a suitable occupation for a philosopher and no obstacle to learning or teaching essential lessons.

4. Food and Frugality

Musonius’ extant teachings emphasize the importance of daily practices. For example, he emphasized that what one eats has significant consequences. He believed that mastering one’s appetites for food and drink is the basis for self-control, a vital virtue. He argued that the purpose of food is to nourish and strengthen the body and to sustain life, not to provide pleasure. Digesting our food gives us no pleasure, he reasoned, and the time spent digesting food far exceeds the time spent consuming it. It is digestion which nourishes the body, not consumption. Therefore, he concluded, the food we eat serves its purpose when we’re digesting it, not when we’re tasting it.

The proper diet, according to Musonius, was lacto-vegetarian. These foods are least expensive and most readily available: raw fruits in season, certain raw vegetables, milk, cheese, and honeycombs. Cooked grains and some cooked vegetables are also suitable for humans, whereas a meat-based diet is too crude for human beings and is more suitable for wild beasts. Those who eat relatively large amounts of meat seemed slow-witted to Musonius.

We are worse than brute animals when it comes to food, he thought, because we are obsessed with embellishing how our food is presented and fuss about what we eat and how we prepare it merely to amuse our palates. Moreover, too much rich food harms the body. For these reasons, Musonius thought that gastronomic pleasure is undoubtedly the most difficult pleasure to combat. He consequently rejected the pursuit of gourmet cuisine and delicacies as a dangerous habit. He judged gluttony and craving gourmet food to be most shameful and to show a lack of moderation. Indeed, Musonius was of the opinion that those who eat the least expensive food can work harder, are the least fatigued by working, become sick less often, tolerate cold, heat, and lack of sleep better, and are stronger than those who eat expensive food. He concluded that responsible people favor what is easy to obtain over what is difficult, what involves no trouble over what does, and what is available over what isn’t. These preferences promote self-control and goodness.

Musonius advocated a similarly austere philosophy of clothing. The purpose of our clothes and footwear is strictly protection from the elements. So clothes and shoes should be modest and inexpensive, not designed to attract the attention of the foolish. One should dress to strengthen and toughen the body, not bundle up in many layers so as never to experience cold and heat, which makes the body soft and sensitive. Musonius recommended dressing so as to feel somewhat cold in the winter and avoiding shade in the summer. If possible, he advised, go shoeless.

The purpose of houses, he believed, was to protect us from the elements, to keep out cold, excessive heat, and the wind. Our dwelling should protect us and our food the way a cave would. Money should be spent both publicly and privately on people, not on elaborate buildings or fancy décor. Beds or tables of ivory, silver, or gold, hard-to-get textiles, cups of gold, silver, or marble—all such furnishings are entirely unnecessary and shamefully extravagant. Items that are expensive to acquire, hard to use, troublesome to clean, difficult to guard, or impractical, are inferior when compared with inexpensive, useful, and practical items made of cast iron, plain ceramic, wood, and the like. Thoughtless people covet expensive furnishings they wrongly believe are good and noble. He said he would rather be sick than live in luxury, because illness harms only the body, whereas living in luxury harms both the body and the soul. Luxury makes the body weak and soft and the soul undisciplined and cowardly. Musonius judged that luxurious living fosters unvarnished injustice and greed, so it must be completely avoided.

For the ancient Roman philosophers, following their Greek predecessors, the beard was the badge of a philosopher. Musonius said that a man should cut the hair on his scalp the way he prunes vines, by removing only what is useless and bothersome. The beard, on the other hand, should not be shaved, he insisted, because (1) nature provides it to protect a man’s face, and (2) the beard is the emblem of manhood, the human equivalent of the cock’s comb and the lion’s mane. Hair should never be trimmed to beautify or to please women or boys. Hair is no more trouble for men than feathers are for birds, Musonius said. Consequently, he regarded shaving or fastidiously trimming one’s beard as an act of emasculation.

5. Women and Equal Education

Musonius supported his belief that women ought to receive the same education in philosophy as men with the following arguments. First, the gods have given women the same power of reason as men. Reason considers whether an action is good or bad, honorable or shameful. Second, women have the same senses as men: sight, hearing, smell, and the rest. Third, the sexes share the same parts of the body: head, torso, arms, and legs. Fourth, women have an equal desire for virtue and a natural affinity for it. Women, no less than men, are by nature pleased by noble, just deeds and censure their opposites. Therefore, Musonius concluded, it is just as appropriate for women to study philosophy, and thereby to consider how to live honorably, as it is for men.

Moreover, he reasoned, a woman must be able to manage an estate, to account for things beneficial to it, and to supervise the household staff. A woman must also have self-control. She must be free from sexual improprieties and must exercise self-control over other pleasures. She must neither be a slave to desires, nor quarrelsome, nor extravagant, nor vain. A self-controlled woman, Musonius believed, controls her anger, is not overcome by grief, and is stronger than every emotion. But these are the character traits of a beautiful person, whether male or female.

Musonius argued that a woman who studies philosophy would be just, a blameless partner in life, a good and like-minded co-worker, a careful guardian of husband and children, and completely free from the love of gain and greed. She would regard it worse to do wrong than to be wronged, and worse to take more than one’s share than to suffer loss. No one, Musonius insisted, would be more just than she. Moreover, a philosophic woman would love her children more than her own life. She would not hesitate to fight to protect her children, any more than a hen hesitates to fight predators much larger than she is in order to protect her chicks.

He considered it appropriate for an educated woman to be more courageous than an uneducated woman and for a woman trained in philosophy to be more courageous than one untrained in philosophy. Musonius noted that the tribe of Amazons conquered many people with weapons, thus demonstrating that women are fully capable of courageously participating in armed conflict. He thought it appropriate for the philosophic woman not to submit to anything shameful out of fear of death or pain. Nor is it appropriate for her to bow down to anyone, whether well-born, powerful, wealthy, or even a tyrant. Musonius saw the philosophic woman as holding the same beliefs as the Stoic man: she thinks nobly, does not judge death to be an evil, nor life to be a good, nor shrinks from pain, nor pursues lack of pain above all else. Musonius thought it likely that this kind of woman would be self-motivated and persevering, would breast-feed her children, serve her husband with her own hands, and do without hesitation tasks which some regard as appropriate for slaves. Thus, he judged that a woman like this would be a great advantage for her husband, a source of honor for her kinfolk, and a good example for the women who know her.

Musonius believed that sons and daughters should receive the same education, since those who train horses and dogs train both the males and the females the same way. He rejected the view that there is one type of virtue for men and another for women. Women and men have the same virtues and so must receive the same education in philosophy.

When it comes to certain tasks, on the other hand, Musonius was less egalitarian. He believed that the nature of males is stronger and that of females is weaker, and so the most suitable tasks ought to be assigned to each nature accordingly. In general, men should be assigned the heavier tasks (e.g. gymnastics and being outdoors), women the lighter tasks (e.g. spinning yarn and being indoors). Sometimes, however, special circumstances—such as a health condition—would warrant men undertaking some of the lighter tasks which seem to be suited for women, and women in turn undertaking some of the harder ones which seem more suited for men. Thus, Musonius concluded that no chores have been exclusively reserved for either sex. Both boys and girls must receive the same education about what is good and what is bad, what is helpful and what is harmful. Shame towards everything base must be instilled in both sexes from infancy on. Musonius’ philosophy of education dictated that both males and females must be accustomed to endure toil, and neither to fear death nor to become dejected in the face of any misfortune.

6. Sex, Marriage, Family, and Old Age

Musonius’ opposition to luxurious living extended to his views about sex. He thought that men who live luxuriously desire a wide variety of sexual experiences, both legitimate and illegitimate, with both women and men. He remarked that sometimes licentious men pursue a series of male sex-partners. Sometimes they grow dissatisfied with available male sex-partners and choose to pursue those who are hard to get. Musonius condemned all such recreational sex acts. He insisted that only those sex acts aimed at procreation within marriage are right. He decried adultery as unlawful and illegitimate. He judged homosexual relationships to be an outrage contrary to nature. He argued that anyone overcome by shameful pleasure is base in his lack of self-control, and so blamed the man (married or unmarried) who has sex with his own female slave as much as the woman (married or unmarried) who has sex with her male slave.

Musonius argued that there must be companionship and mutual care of husband and wife in marriage since its chief end is to live together and have children. Spouses should consider all things as common possessions and nothing as private, not even the body itself. But since procreation can result from sexual relations outside of marriage, childbirth cannot be the only motive for marriage. Musonius thought that when each spouse competes to surpass the other in giving complete care, the partnership is beautiful and admirable. But when a spouse considers only his or her own interests and neglects the other's needs and concerns, their union is destroyed and their marriage cannot help but go poorly. Such a couple either splits up or their relationship becomes worse than solitude.

Musonius advised those planning to marry not to worry about finding partners from noble or wealthy families or those with beautiful bodies. Neither wealth, nor beauty, nor noble birth promotes a sense of partnership or harmony, nor do they facilitate procreation. He judged bodies that are healthy, normal in form, and able to function on their own, and souls that are naturally disposed towards self-control, justice, and virtue, as most fit for marriage.

Since he held that marriage is obviously in accordance with nature, Musonius rejected the objection that being married gets in the way of studying philosophy. He cited Pythagoras, Socrates, and Crates as examples of married men who were fine philosophers. To think that each should look only to his own affairs is to think that a human being is no different from a wolf or any other wild beast whose nature is to live by force and greed. Wild animals spare nothing they can devour, they have no companions, they never work together, and they have no share in anything just, according to Musonius. He supposed that human nature is very much like that of bees, which are unable to live alone and die when isolated. Bees work and act together. Wicked people are unjust and savage and have no concern for a neighbor in trouble. Virtuous people are good, just, and kind, and show love for their fellow human beings and concern for their neighbors. Marriage is the way for a person to create a family to provide for the well-being of the city. Therefore, Musonius judged that anyone who deprives people of marriage destroys family, city, and the entire human race. He reasoned that this is because humans would cease to exist if there were no procreation, and there would be no just and lawful procreation without marriage. He concluded that marriage is important and serious because great gods (Hera, Eros, Aphrodite) govern it.

Musonius opined that the bigger a family, the better. He thought that having many children is beneficial and profitable for cities, while having few or none is harmful. Consequently, he praised the wise lawgivers who forbade women from agreeing to be childless and from preventing conception, who enacted punishment for women who induced miscarriages, who punished married couples who were childless, and who honored married couples with many children. He reasoned that just as the man with many friends is mightier than the man without friends, the father with many children is much mightier than the man who has few or none. Musonius was very impressed by the spectacle of a husband or wife with their many children crowded around them. He believed that no procession for the gods and no sacred ritual is as beautiful to behold, or as worthy of being witnessed, as a chorus of many children leading their parents through the city. Musonius scolded those who offered poverty as an excuse for not raising many children. He was outraged that the affluent refused to raise later-born children so that the earlier-born might be better off. Musonius believed that it was an unholy act to deprive the earlier-born of siblings in order to increase their inheritance. He thought it much better to have many siblings than many possessions. Possessions invite plots from neighbors. Possessions need protection. Siblings, he taught, are one’s greatest protectors. Musonius opined that the man who enjoys the blessing of many loyal brothers is most worthy of emulation and most beloved by the gods.

When asked whether one’s parents should be obeyed in all matters, Musonius answered as follows. If a father, who was neither a doctor nor knowledgeable about health and sickness, ordered his sick son to take something he believed would help, but which would in fact be useless, or even harmful, and the sick son, knowing this, did not comply with his father’s order, the son would not be acting disobediently. Similarly, if the father was ill and asked for wine or inappropriate food that would worsen his illness if consumed, and the son, knowing better, refused to give them to his ill father, the son would not be acting disobediently. Even less disobedient, Musonius argued, is the son who refuses to steal or embezzle money entrusted to him when his greedy father commands it. The lesson is that refusing to do what one ought not to do merits praise, not blame. The disobedient person disobeys orders that are right, honorable, and beneficial, and acts shamefully in doing so. But to refuse to obey a shameful, blameworthy command of a parent is just and blameless. The obedient person obeys only his parent’s good, appropriate advice, and obeys all of such advice. The law of Zeus orders us to be good, and being good is the same thing as being a philosopher, Musonius taught.

He argued that the best thing to have on hand during old age is living in accord with nature, the definitive goal of life according to the Stoics. Human nature, he thought, can be better understood by comparing it to the nature of other animals. Horses, dogs, and cows are all inferior to human beings. We do not consider a horse to reach its potential by merely eating, drinking, mating without restraint, and doing none of the things suitable for a horse. Nor do we regard a dog as reaching its potential if it merely indulges in all sorts of pleasures while doing none of the things for which dogs are thought to be good. Nor would any other animal reach its potential by being glutted with pleasures but failing to function in a manner appropriate to its species. Hence, no animal comes into existence for pleasure. The nature of each animal determines the virtue characteristic of it. Nothing lives in accord with nature except what demonstrates its virtue through the actions which it performs in accord with its own nature. Therefore, Musonius concluded, the nature of human beings is to live for virtue; we did not come into existence for the sake of pleasure. Those who live for virtue deserve praise, can rightly think well of themselves, and can be hopeful, courageous, cheerful, and joyful. The human being, Musonius taught, is the only creature on earth that is the image of the divine. Since we have the same virtues as the gods, he reasoned, we cannot imagine better virtues than intelligence, justice, courage, and self-control. Therefore, a god, since he has these virtues, is stronger than pleasure, greed, desire, envy, and jealousy. A god is magnanimous and both a benefactor to, and a lover of, human beings. Consequently, Musonius reasoned, inasmuch as a human being is a copy of a god, a human being must be considered to be like a god when he acts in accord with nature. The life of a good man is the best life, and death is its end. 
Musonius thought that wealth is no defense against old age: wealth lets people enjoy food, drink, sex, and other pleasures, but it never supplies contentment to a wealthy person nor banishes his grief. The good man, therefore, lives without regret and in accord with nature, accepting death fearlessly and boldly in his old age and living happily and honorably until the end.

7. Impact

Musonius seems to have acted as an advisor to his friends the Stoic martyrs Rubellius Plautus, Barea Soranus, and Thrasea. Musonius’ son-in-law Artemidorus was judged by Pliny to be the greatest of the philosophers of his day, and Pliny himself professed admiration for Musonius. Musonius’ pupils and followers include Fundanus, the eloquent philosopher Euphrates of Tyre, Timocrates of Heracleia, Athenodotus (the teacher of Marcus Cornelius Fronto), and the golden-tongued orator Dio of Prusa. His greatest student was Epictetus, who mentions him several times in The Discourses of Epictetus written by Epictetus’ student Arrian. Epictetus’ philosophy was deeply influenced by Musonius.

After his death Musonius was admired by philosophers and theologians alike. The Stoic Hierocles and the Christian apologist Clement of Alexandria were strongly influenced by Musonius. Roman emperor Julian the Apostate said that Musonius became famous because he endured his sufferings with courage and sustained with firmness the cruelty of tyrants. Dio of Prusa judged that Musonius enjoyed a reputation greater than any one man had attained for generations and that he was the man who, since the time of the ancients, had lived most nearly in conformity with reason. The Greek sophist Philostratus declared that Musonius was unsurpassed in philosophic ability.

8. References and Further Reading

  • Engel, David M. 'The Gender Egalitarianism of Musonius Rufus.' Ancient Philosophy 20 (Fall 2000): 377-391.
  • Geytenbeek, A. C. van. Musonius Rufus and Greek Diatribe. Wijsgerige Texten en Studies 8. Assen: Van Gorcum, 1963.
  • King, Cynthia (trans.). Musonius Rufus. Lectures and Sayings. CreateSpace, 2011.
  • Klassen, William. 'Musonius Rufus, Jesus, and Paul: Three First-Century Feminists.' Pages 185-206 in From Jesus to Paul: Studies in Honour of Francis Wright Beare. Ed. P. Richardson and J. C. Hurd. Waterloo, ON: Wilfrid Laurier University Press, 1984.
  • Lutz, Cora E. Musonius Rufus: 'The Roman Socrates'. Yale Classical Studies 10: 3-147. New Haven: Yale University Press, 1947.
  • Nussbaum, Martha C. 'The Incomplete Feminism of Musonius Rufus, Platonist, Stoic, and Roman.' Pages 283-326 in The Sleep of Reason. Erotic Experience and Sexual Ethics in Ancient Greece and Rome. Ed. M. C. Nussbaum and J. Sihvola. Chicago: The University of Chicago Press, 2002.
  • Stephens, William O. 'The Roman Stoics on Habit.' Pages 37-65 in A History of Habit: From Aristotle to Bourdieu. Ed. T. Sparrow and A. Hutchinson. Lanham, MD: Lexington Books, 2013.

Author Information

William O. Stephens
Creighton University
U.S.A.

The Philosophy of Anthropology

The Philosophy of Anthropology refers to the central philosophical perspectives which underpin, or have underpinned, the dominant schools in anthropological thinking. It is distinct from Philosophical Anthropology which attempts to define and understand what it means to be human.

This article provides an overview of the most salient anthropological schools, the philosophies which underpin them, and the philosophical debates surrounding these schools within anthropology. It operates specifically within these limits because the broader discussions surrounding the Philosophy of Science and the Philosophy of Social Science have been dealt with at length elsewhere in this encyclopedia. Moreover, the specific philosophical perspectives have also been discussed in great depth in other contributions, so they will be elucidated only to the extent that this is useful for comprehending their relationship with anthropology. In examining the Philosophy of Anthropology, it is necessary to draw some borders, even if cautious ones, between anthropology and other disciplines. Accordingly, in drawing upon anthropological discussions, we will define, as anthropologists, scholars who identify as such and who publish in anthropological journals and the like. In addition, early anthropologists will be selected by virtue of their interest in peasant culture and in non-Western, non-capitalist, and stateless forms of human organization.

The article specifically aims to summarize the philosophies underpinning anthropology, focusing on the way in which anthropology has drawn upon them. The philosophies themselves have been dealt with in depth elsewhere in this encyclopedia. Philosophers of social science have suggested that anthropology tends to reflect, at any one time, the dominant intellectual philosophy because, unlike the physical sciences, it relies on qualitative methods and so can more easily be shaped by ideology (for example Kuznar 1997 or Andreski 1974). This article begins by examining what is commonly termed ‘physical anthropology.’ This is the science-oriented form of anthropology which came to prominence in the nineteenth century. As part of this section, the article also examines early positivist social anthropology, the historical relationship between anthropology and eugenics, and the philosophy underpinning this relationship.

The next section examines naturalistic anthropology. ‘Naturalism,’ in this usage, is drawn from the biological ‘naturalists’ who collected specimens in nature and described them in depth, in contrast to ‘experimentalists.’ Anthropological ‘naturalists’ thus conduct fieldwork with groups of people rather than engage in more experimental methods. The naturalism section looks at the philosophy underpinning the development of ethnography-focused anthropology, including cultural determinism, cultural relativism, fieldwork ethics and the many criticisms which this kind of anthropology has provoked. Differences in its development in Western and Eastern Europe also are analyzed. As part of this, the article discusses the most influential schools within naturalistic anthropology and their philosophical foundations.

The article then examines Post-Modern or ‘Contemporary’ anthropology. This school grew out of the ‘Crisis of Representation’ in anthropology beginning in the 1970s. The article looks at how the Post-Modern critique has been applied to anthropology, and it examines the philosophical assumptions behind developments such as auto-ethnography. Finally, it examines the view that there is a growing philosophical split within the discipline.

Table of Contents

  1. Positivist Anthropology
    a. Physical Anthropology
    b. Race and Eugenics in Nineteenth Century Anthropology
    c. Early Evolutionary Social Anthropology
  2. Naturalist Anthropology
    a. The Eastern European School
    b. The Ethnographic School
    c. Ethics and Participant Observation Fieldwork
  3. Anthropology since World War I
    a. Cultural Determinism and Cultural Relativism
    b. Functionalism and Structuralism
    c. Post-Modern or Contemporary Anthropology
  4. Philosophical Dividing Lines
    a. Contemporary Evolutionary Anthropology
    b. Anthropology: A Philosophical Split?
  5. References and Further Reading

1. Positivist Anthropology

a. Physical Anthropology

Anthropology itself began to develop as a separate discipline in the mid-nineteenth century, as Charles Darwin’s (1809-1882) Theory of Evolution by Natural Selection (Darwin 1859) became widely accepted among scientists. Early anthropologists attempted to apply evolutionary theory within the human species, focusing on physical differences among human sub-species or racial groups (see Eriksen 2001) and the perceived intellectual differences that followed.

The philosophical assumptions of these anthropologists were, to a great extent, the same assumptions which have been argued to underpin science itself: a positivism, rooted in Empiricism, which argued that knowledge could only be reached through the empirical method and that statements were meaningful only if they could be empirically justified (though it should be noted that Darwin should not necessarily be termed a positivist). On this view, science needed to be solely empirical, systematic, exploratory, logical and theoretical (and thus focused on answering questions). It needed to attempt to make predictions which are open to testing and falsification, and it needed to be epistemologically optimistic (assuming that the world can be understood). Equally, positivism argues that truth-statements are value-neutral, something disputed by the postmodern school. Philosophers of Science, such as Karl Popper (1902-1994) (for example Popper 1963), have also stressed that science must be self-critical, prepared to abandon long-held models as new information arises, and thus characterized by falsification rather than verification, though this point was suggested earlier by Herbert Spencer (1820-1903) (for example Spencer 1873). Nevertheless, the philosophy of early physical anthropologists included a belief in empiricism, the fundamentals of logic and epistemological optimism. This philosophy has been criticized by anthropologists such as Risjord (2007), who has argued that it is not self-aware – because values, he claims, are always involved in science – and that non-neutral scholarship can be useful in science because it forces scientists to better contemplate their ideas.

b. Race and Eugenics in Nineteenth Century Anthropology

During the mid-nineteenth and early twentieth centuries, anthropologists began to systematically examine the issue of racial differences, a subject which attracted even more research after the acceptance of evolutionary theory (see Darwin 1871). That said, it should be noted that Darwin himself did not specifically advocate eugenics or theories of progress. However, even prior to Darwin’s presentation of evolution (Darwin 1859), scholars were already attempting to understand ‘races’ and the evolution of societies from ‘primitive’ to complex (for example Tylor 1865).

Early anthropologists such as the Englishman John Beddoe (1826-1911) (Beddoe 1862) or the Frenchman Arthur de Gobineau (1816-1882) (Gobineau 1915) developed and systematized racial taxonomies which divided humanity into, for example, ‘black,’ ‘yellow’ and ‘white’ races. For these anthropologists, societies were reflections of their racial inheritance, a viewpoint termed biological determinism. The concept of ‘race’ has been criticized within anthropology variously as being simplistic and as not being a predictive (and thus not a scientific) category (for example Montagu 1945), and there was already some criticism of the scope of its predictive validity in the mid-nineteenth century (for example Pike 1869). The concept has also been criticized on ethical grounds, because racial analysis is seen to promote racial violence and discrimination and to uphold a certain hierarchy, and some have suggested its rejection because of its connotations with such regimes as National Socialism or Apartheid, meaning that it is not a neutral category (for example Wilson 2002, 229).

Those anthropologists who continue to employ the category have argued that ‘race’ is predictive in terms of life history, only involves the same inherent problems as any cautiously essentialist taxonomy and that moral arguments are irrelevant to the scientific usefulness of a category of apprehension (for example Pearson 1991) but, to a great extent, current anthropologists reject racial categorization. The American Anthropological Association’s (1998) ‘Statement on Race’ began by asserting that: ‘"Race" thus evolved as a worldview, a body of prejudgments that distorts our ideas about human differences and group behavior. Racial beliefs constitute myths about the diversity in the human species and about the abilities and behavior of people homogenized into "racial" categories.’ In addition, a 1985 survey by the American Anthropological Association found that only a third of cultural anthropologists (but 59 percent of physical anthropologists) regarded ‘race’ as a meaningful category (Lynn 2006, 15). Accordingly, there is general agreement amongst anthropologists that the idea, promoted by anthropologists such as Beddoe, that there is a racial hierarchy, with the white race as superior to others, involves importing the old ‘Great Chain of Being’ (see Lovejoy 1936) into scientific analysis and should be rejected as unscientific, as should ‘race’ itself. In terms of philosophy, some aspects of nineteenth century racial anthropology might be seen to reflect the theories of progress that developed in the nineteenth century, such as those of G. W. F. Hegel (1770-1831) (see below). In addition, though we will argue that Herderian nationalism is more influential in Eastern Europe, we should not regard it as having no influence at all in British anthropology. 
Native peasant culture, the staple of the Eastern European, Romantic nationalism-influenced school (as we will see), was studied in nineteenth century Britain, especially in Scotland and Wales, though it was specifically classified as ‘folklore’ and as outside anthropology (see Rogan 2012). However, as we will discuss, the influence is stronger in Eastern Europe.

The interest in race in anthropology developed alongside a broader interest in heredity and eugenics. Influenced by positivism, scholars such as Herbert Spencer (1873) applied evolutionary theory as a means of understanding differences between different societies. Spencer was also seemingly influenced, on some level, by theories of progress of the kind advocated by Hegel and even found in Christian theology. For him, evolution logically led to eugenics. Spencer argued that evolution involved a progression through stages of ever-increasing complexity, from lower forms to higher forms, to an end-point at which humanity was highly advanced and in a state of equilibrium with nature. For this perfected humanity to be reached, humans needed to engage in self-improvement through selective breeding.

American anthropologist Madison Grant (1865-1937) (Grant 1916), for example, reflected a significant anthropological view in 1916 when he argued that humans, and therefore human societies, were essentially reflections of their biological inheritance and that environmental differences had almost no impact on societal differences. Grant, like other influential anthropologists of the time, advocated a program of eugenics in order to improve the human stock. According to this program, efforts would be made to encourage breeding among the supposedly superior races and social classes and to discourage it amongst the inferior races and classes (see also Galton 1909). This form of anthropology has been criticized for having a motivation other than the pursuit of truth, which has been argued to be the only appropriate motivation for any scientist. It has also been criticized for basing its arguments on a disputed system of categories – race – and for uncritically holding certain assumptions about what is good for humanity (for example Kuznar 1997, 101-109). It should be emphasized that though eugenics was widely accepted among anthropologists in the nineteenth century, there were also those who criticized it and its assumptions (for example Boas 1907; see Stocking 1991 for a detailed discussion). Proponents have countered that a scientist’s motivations are irrelevant as long as his or her research is scientific, that race should not be a controversial category from a philosophical perspective, and that it is for the good of science itself that the more scientifically-minded are encouraged to breed (for example Cattell 1972). As noted, some scholars stress the utility of ideologically-based scholarship.

A further criticism of eugenics is that it fails to recognize the supposed inherent worth of all individual humans (for example Pichot 2009). Advocates of eugenics, such as Grant (1916), dismiss this as a ‘sentimental’ dogma which fails to accept that humans are animals (something that acceptance of evolutionary theory, it is argued, obliges people to do) and which would lead to the decline of civilization and science itself. We will note possible problems with this perspective in our discussion of ethics. It might also be noted that the form of anthropology that is sympathetic to eugenics is today centered around an academic journal called The Mankind Quarterly, which critics regard as ‘racist’ (for example Tucker 2002, 2) and even academically biased (for example Ehrenfels 1962). Although ostensibly an anthropology journal, it also publishes psychological research. A prominent example of such an anthropologist is Roger Pearson (b. 1927), the journal’s current editor. But such a perspective is highly marginal in current anthropology.

c. Early Evolutionary Social Anthropology

Also from the middle of the nineteenth century, there developed a school in Western European and North American anthropology which focused less on race and eugenics and more on answering questions relating to human institutions and how they evolved, such as ‘How did religion develop?’ or ‘How did marriage develop?’ This school was known as ‘cultural evolutionism.’ Members of this school, such as Sir James Frazer (1854-1941) (Frazer 1922), were influenced by the positivist view that science was the best model for answering questions about social life. They also shared with other evolutionists an acceptance of a modal human nature which reflected evolution to a specific environment. However, some, such as E. B. Tylor (1832-1917) (Tylor 1871), argued that human nature was the same everywhere, moving away from the focus on human intellectual differences according to race. The early evolutionists believed that, since surviving ‘primitive’ social organizations (within European Empires, for example) were examples of ‘primitive Man,’ the nature of humanity and the origins of its institutions could best be understood through analysis of these various social groups and their relationship with more ‘civilized’ societies (see Gellner 1995, Ch. 2).

As with the biological naturalists, scholars such as Frazer and Tylor collected specimens on these groups – in the form of missionary descriptions of ‘tribal life’ or descriptions of ‘tribal life’ by Westernized tribal members – and compared them to accounts of more advanced cultures in order to answer discrete questions. Using this method of accruing sources, now termed ‘armchair anthropology’ by its critics, the early evolutionists attempted to trace the origins and evolution of societal institutions. As the early sociologist Emile Durkheim (1858-1917) (Durkheim 1965) summarized it, such scholars aimed to discover ‘social facts.’ For example, Frazer concluded, based on his sources, that societies evolved from being dominated by a belief in Magic, to a belief in Spirits, and then to a belief in gods and ultimately one God. For Tylor, religion began with ‘animism’ and evolved into more complex forms, but tribal animism was the essence of religion and had developed in order to aid human survival.

This school of anthropology has been criticized for its perceived inclination towards reductionism (such as defining ‘religion’ purely as ‘survival’), its speculative nature and its failure to appreciate the problems inherent in relying on sources, such as ‘gate keepers’ who will present their group in the light in which they want it to be seen. Defenders have countered that without attempting to understand the evolution of societies, social anthropology has no scientific aim and can turn into a political project or simply a description of perceived oddities (for example Hallpike 1986, 13). Moreover, the kind of stage theories advocated by Tylor have been criticized for conflating evolution with historicist theories of progress, by arguing that societies always pass through certain phases of belief and that Western civilization is the pinnacle of development, a belief known as unilinealism. This latter point has been criticized as ethnocentric (for example Eriksen 2001) and reflects some of the thinking of Herbert Spencer, who was influential in early British anthropology.

2. Naturalist Anthropology

a. The Eastern European School

Whereas Western European and North American anthropology was oriented towards studying the peoples within the Empires run by the Western powers and was influenced by Darwinian science, Eastern European anthropology developed among nascent Eastern European nations. This form of anthropology was strongly influenced by Herderian nationalism, and ultimately by Hegelian political philosophy and the Romantic Movement associated with the eighteenth-century philosopher Jean-Jacques Rousseau (1712-1778). Eastern European anthropologists believed, following the Romantic Movement, that industrial or bourgeois society was corrupt and sterile. The truly noble life was found in the simplicity and naturalness of communities close to nature. The most natural form of community was a nation of people bonded together by shared history, blood and customs, and the most authentic form of such a nation’s lifestyle was to be found amongst its peasants. Accordingly, Eastern European anthropology elevated peasant life as the most natural form of life, a form of life that should, on some level, be strived towards in developing the new ‘nation’ (see Gellner 1995).

Eastern European anthropologists, many of them motivated by Romantic nationalism, focused on studying their own nations’ peasant culture and folklore in order to preserve it, and because the nation was regarded as unique, studying its most authentic manifestation was seen as a good in itself. As such, Eastern European anthropologists engaged in fieldwork amongst the peasants, observing and documenting their lives. At the time of writing, this kind of anthropology – or ‘ethnology’ – remains, to a degree, more popular in Eastern than in Western Europe (see, for example, Ciubrinskas 2007 or Sarkany ND).

Siikala (2006) observes that Finnish anthropology is now moving towards the Western model of fieldwork abroad but as recently as the 1970s was still predominantly the study of folklore and peasant culture. Baranski (2009) notes that in Poland, anthropologists who wish to study international topics still tend to go to the international centers, while those who remain in Poland tend to focus on Polish folk culture, though the situation is slowly changing. Lithuanian anthropologist Vytis Ciubrinskas (2007) notes that throughout Eastern Europe there is very little separate ‘anthropology,’ the focus being ‘national ethnology’ and ‘folklore studies,’ almost always published in the vernacular. But, again, he observes that the kind of anthropology popular in Western Europe is making inroads into Eastern Europe. In Russia, national ethnology and peasant culture also tend to predominate (for example Baiburin 2005). Indeed, even beyond Eastern Europe, it was noted in the year 2000 that ‘the emphasis of Indian social anthropologists remains largely on Indian tribes and peasants. But the irony is that barring the detailed tribal monographs prepared by the British colonial officers and others (. . .) before Independence, we do not have any recent good ethnographies of a comparable type’ (Srivastava 2000). By contrast, Japanese social anthropology, at least in the nineteenth century, traditionally followed the Western model, studying cultures more ‘primitive’ than its own (such as Chinese communities). Only later did it start to focus more on Japanese folk culture, and it is now moving back towards a Western model (see Sedgwick 2006, 67).

The Eastern school has been criticized for uncritically placing a set of dogmas – specifically nationalism – above the pursuit of truth, accepting a form of historicism with regard to the unfolding of the nation’s history and drawing a sharp, essentialist line around the nationalist period of history (for example Popper 1957). Its anthropological method has been criticized because, it is suggested, Eastern European anthropologists suffer from home blindness. By virtue of having been raised in the culture which they are studying, they cannot see it objectively and penetrate to its ontological presuppositions (for example Kapferer 2001).

b. The Ethnographic School

The Ethnographic school, which has since come to characterize social and cultural anthropology, was developed by Polish anthropologist Bronislaw Malinowski (1884-1942) (for example Malinowski 1922). Originally trained in Poland, Malinowski’s anthropological philosophy brought together key aspects of the Eastern and Western schools. He argued that, as with the Western European school, anthropologists should study foreign societies. This avoided home blindness and allowed them to better perceive these societies objectively. However, as with the Eastern European School, he argued that anthropologists should observe these societies in person, something termed ‘participant observation’ or ‘ethnography.’ This method, he argued, solved many of the problems inherent in armchair anthropology.

It is this method which anthropologists generally summarize as ‘naturalism,’ in contrast to the ‘positivism,’ usually followed alongside a quantitative method, of evolutionary anthropologists. Naturalist anthropologists argue that their method is ‘scientific’ in the sense that it is based on empirical observation, but they argue that some kinds of information cannot be obtained in laboratory conditions or through questionnaires, both of which lend themselves to quantitative, strictly scientific analysis. Human culturally-influenced actions differ from the subjects of physical science because they involve meaning within a system, and meaning can only be discerned after long-term immersion in the culture in question. Naturalists therefore argue that a useful way to find out information about and understand a people – such as a tribe – is to live with them, observe their lives, gain their trust and eventually live, and even think, as they do. This latter aim, specifically highlighted by Malinowski, has been termed the empathetic perspective and is considered, by many naturalist anthropologists, to be a crucial sign of research that is anthropological. In addition to these ideas, the naturalist perspective draws upon aspects of the Romantic Movement in that it stresses, and elevates, the importance of ‘gaining empathy’ and of respecting the group being studied. Some naturalists argue that there are ‘ways of knowing’ other than science (for example Rees 2010) and that respect for the group can be more important than gaining new knowledge. They also argue that human societies are so complex that they cannot simply be reduced to biological explanations.

In many ways, the successor to Malinowski as the most influential cultural anthropologist was the American Clifford Geertz (1926-2006). Where Malinowski emphasized ‘participant observation’ – and thus, to a greater degree, an outsider perspective – it was Geertz who argued that the successful anthropologist reaches a point where he sees things from the perspective of the native. The anthropologist should bring alive the native point of view, which Roth (1989) notes ‘privileges’ the native, thus challenging a hierarchical relationship between the observed and the observer. He thus strongly rejected a distinction of which Malinowski was merely critical: the distinction between a ‘primitive’ and a ‘civilized’ culture. In many respects, this distinction was also criticized by the Structuralists – whose central figure, Claude Levi-Strauss (1908-2009), was of an earlier generation than Geertz – as they argued that all human minds involved similar binary structures (see below).

However, there was a degree to which both Malinowski and Geertz did not divorce ‘culture’ from ‘biology.’ Malinowski (1922) argued that anthropological interpretations should ultimately be reducible to human instincts while Geertz (1973, 46-48) argued that culture can be reduced to biology and that culture also influences biology, though he felt that the main aim of the ethnographer was to interpret. Accordingly, it is not for the anthropologist to comment on the culture in terms of its success or the validity of its beliefs. The anthropologist’s purpose is merely to record and interpret.

The majority of those who practice this form of anthropology are interpretivists. They argue that the aim of anthropology is to understand the norms, values, symbols and processes of a society and, in particular, their ‘meaning’ – how they fit together. This lends itself to the more subjective methods of participant observation. Applying a positivist methodology to studying social groups is regarded as dangerous because scientific understanding is argued to lead to better controlling the world and, in this case, controlling people. Interpretivist anthropology has been criticized, variously, as being indebted to imperialism (see below) and as too subjective and unscientific, because, unless there is a common set of analytical standards (such as an acceptance of the scientific method, at least to some extent), there is no reason to accept one subjective interpretation over another. This criticism has, in particular, been leveled against naturalists who accept cultural relativism (see below).

Also, many naturalist anthropologists emphasize the separateness of ‘culture’ from ‘biology,’ arguing that culture cannot simply be traced back to biology but rather is, to a great extent, independent of it; a separate category. For example, Risjord (2000) argues that anthropology ‘will never reach the social reality at which it aims’ precisely because ‘culture’ cannot simply be reduced to a series of scientific explanations. But it has been argued that if the findings of naturalist anthropology are not ultimately consilient with science then they are not useful to people outside of naturalist anthropology and that naturalist anthropology draws too stark a line between apes and humans when it claims that human societies are too complex to be reduced to biology or that culture is not closely reflective of biology (Wilson 1998, Ch. 1). In this regard, Bidney (1953, 65) argues that, ‘Theories of culture must explain the origins of culture and its intrinsic relations to the psychobiological nature of man’ as to fail to do so simply leaves the origin of culture as a ‘mystery or an accident of time.’

c. Ethics and Participant Observation Fieldwork

From the 1970s, the various leading anthropological associations began to develop codes of ethics. This was, at least in part, inspired by the perceived collaboration of anthropologists with the US-led counterinsurgency groups in South American states. For example, in the 1960s, Project Camelot commissioned anthropologists to look into the causes of insurgency and revolution in South American States, with a view to confronting these perceived problems. It was also inspired by the way that increasing numbers of anthropologists were employed outside of universities, in the private sector (see Sluka 2007).

The leading anthropological bodies – such as the Royal Anthropological Institute – hold to a system of research ethics to which anthropologists conducting fieldwork are expected, though not obliged, to adhere. For example, the most recent American Anthropological Association Code of Ethics (1998) emphasizes that certain ethical obligations can supersede the goal of seeking new knowledge. Anthropologists, for example, may not publish research which may harm the ‘safety,’ ‘privacy’ or ‘dignity’ of those whom they study; they must explain their fieldwork to their subjects and emphasize that attempts at anonymity may sometimes fail; they should find ways of reciprocating to those whom they study; and they should preserve opportunities for future fieldworkers.

Though the American Anthropological Association does not make its philosophy explicit, much of it appears to be underpinned by the golden rule: one should treat others as one would wish to be treated oneself. In this regard, one would not wish to be exploited, misled or have one’s safety or privacy compromised. For some scientists, the problem with such a philosophy is that, from their perspective, humans should be an objective object of study like any other. The assertion that the ‘dignity’ of the individual should be preserved may be seen to reflect a humanist belief in the inherent worth of each human being. Humanism has been accused of being sentimental and of failing to appreciate the substantial intellectual differences between human beings, with some anthropologists even questioning the usefulness of the broad category ‘human’ (for example Grant 1916). It has also been accused of failing to appreciate that, from a scientific perspective, humans are a highly evolved form of ape, and that scholars who study them should attempt to think, as Wilson (1975, 575) argues, as if they were alien zoologists. Equally, it has been asked why primary ethical responsibility should be to those studied; why should it not be to the public or the funding body (see Sluka 2007)? In this regard, it might be suggested that the code reflects the lauding of members of (often non-Western) cultures which might ultimately be traced back to the Romantic Movement. Their rights are more important than those of the funders, the public or other anthropologists.

Equally, the code has been criticized in terms of power dynamics, with critics arguing that the anthropologist is usually in a dominant position over those being studied which renders questionable the whole idea of ‘informed consent’ (Bourgois 2007). Indeed, it has been argued that the most recent American Anthropological Association Code of Ethics (1998) is a movement to the right, in political terms, because it accepts, explicitly, that responsibility should also be to the public and to funding bodies and is less censorious than previous codes with regard to covert research (Pels 1999). This seems to be a movement towards a situation where a commitment to the group being studied is less important than the pursuit of truth, though the commitment to the subject of study is still clear.

Likewise, the most recent set of ethical guidelines from the Association of Anthropologists of the UK and the Commonwealth implicitly accepts that there is a difference of opinion among anthropologists regarding to whom they are obliged. It asserts, ‘Most anthropologists would maintain that their paramount obligation is to their research participants . . .’ This document specifically warns against giving subjects ‘self-knowledge which they did not seek or want.’ This may be seen to reflect a belief in a form of cultural relativism: permitting people to preserve their way of thinking is more important than their knowing what a scientist would regard as the truth. Their way of thinking – a part of their culture – should be respected, because it is theirs, even if it is inaccurate. This could conceivably prevent anthropologists from publishing dissections of particular cultures if they might be read by members of that culture (see Dutton 2009, Ch. 2). Thus, philosophically, the debate in fieldwork ethics ranges from a form of consequentialism to, in the form of humanism, a deontological form of ethics. However, it should be emphasized that the standard fieldwork ethics noted are very widely accepted amongst anthropologists, particularly with regard to informed consent. Thus, the idea of experimenting on unwilling or unknowing humans is strongly rejected, which might be interpreted to imply some belief in human separateness.

3. Anthropology since World War I

a. Cultural Determinism and Cultural Relativism

As already discussed, Western European anthropology around the time of World War I was influenced by eugenics and biological determinism. But as early as the 1880s, this was beginning to be questioned by the German-American anthropologist Franz Boas (1858-1942) (for example Boas 1907), based at Columbia University in New York. He was critical of biological determinism and argued for the importance of environmental influence on individual personality, and thus on modal national personality, a way of thinking called ‘historical particularism.’

Boas emphasized the importance of environment and history in shaping different cultures, arguing that all humans were biologically relatively similar and rejecting distinctions of ‘primitive’ and ‘civilized.’ Boas also presented critiques of the work of early evolutionists, such as Tylor, demonstrating that not all societies passed through the phases he suggested, or did not do so in the order he suggested. Boas used these findings to stress the importance of understanding societies individually, in terms of their own history and culture (for example Freeman 1983).

Boas sent his student Margaret Mead (1901-1978) to American Samoa to study the people there, with the aim of proving that they were a ‘negative instance’ in terms of violence and teenage angst. If this could be proven, it would undermine biological determinism and demonstrate that people were in fact culturally determined and that biology had very little influence on personality, a position anticipated by John Locke (1632-1704) and his concept of the tabula rasa. This would in turn mean that Western people’s supposed teenage angst could be changed by changing the culture. After six months in American Samoa, Mead returned to the USA and published, in 1928, her influential book Coming of Age in Samoa: A Psychological Study of Primitive Youth for Western Civilization (Mead 1928). It portrayed Samoa as a society of sexual liberty in which there were none of the problems associated with puberty in Western civilization. Accordingly, Mead argued that she had found a negative instance and that humans were overwhelmingly culturally determined. At around the same time, Ruth Benedict (1887-1948), also a student of Boas’s, published research in which she argued that individuals simply reflected the ‘culture’ in which they were raised (Benedict 1934).

The cultural determinism advocated by Boas, Benedict and especially Mead became very popular and developed into a school which has been termed ‘Multiculturalism’ (Gottfried 2004). This school can be compared to Romantic nationalism in the sense that it regards all cultures as unique developments which should be preserved, and it thus advocates a form of ‘cultural relativism’ in which cultures cannot be judged by the standards of other cultures and can only be comprehended in their own terms. However, it should be noted that ‘cultural relativism’ is sometimes used to refer to the way in which the parts of a whole form a kind of separate organism, though this is usually referred to as ‘Functionalism.’ In addition, Harris (see Headland, Pike, and Harris 1990) distinguishes between ‘emic’ (insider) and ‘etic’ (outsider) understandings of a social group, arguing that both perspectives seem to make sense from their different viewpoints. This might also be understood as cultural relativism, and it perhaps raises the question of whether the two worlds can so easily be separated. Cultural relativism also argues, as with Romantic Nationalism, that so-called developed cultures can learn a great deal from what they might regard as ‘primitive’ cultures. Moreover, humans are regarded as, in essence, products of culture and as extremely similar in terms of biology.

Cultural relativism led to so-called ‘cultural anthropologists’ focusing on the symbols within a culture rather than comparing the different structures and functions of different social groups, as occurred in ‘social anthropology’ (see below). Because comparison was frowned upon, each culture being regarded as unique, anthropology in the tradition of Mead tended to focus on descriptions of a group’s way of life. Thick description is a trait of ethnography more broadly, but it is especially salient amongst anthropologists who believe that cultures can only be understood in their own terms. Such a philosophy has been criticized for turning anthropology into little more than academic-sounding travel writing, because it renders it highly personal and lacking in comparative analysis (see Sandall 2001, Ch. 1).

Cultural relativism has also been criticized as philosophically impractical and, ultimately, epistemologically pessimistic (Scruton 2000), because it means that nothing can be compared to anything else or even assessed through the medium of a foreign language’s categories. In implicitly defending cultural relativism, anthropologists have cautioned against assuming that some cultures are more ‘rational’ than others. Hollis (1967), for example, argues that anthropology demonstrates that superficially irrational actions may become ‘rational’ once the ethnographer understands the ‘culture.’ Risjord (2000) makes a similar point. This implies that the cultures are separate worlds, ‘rational’ in themselves. Others have suggested that entering the field assuming that the Western, ‘rational’ way of thinking is correct can lead to biased fieldwork interpretation (for example Rees 2010).

Critics have argued that certain forms of behaviour can be regarded as undesirable in all cultures, yet are only prevalent in some. It has also been argued that Multiculturalism is a form of Neo-Marxism, both because it assumes imperialism and Western civilization to be inherently problematic and because it lauds the materially unsuccessful. Whereas Marxism extols the values and lifestyle of the worker and critiques those of the wealthy, Multiculturalism promotes “materially unsuccessful” cultures and critiques the more materially successful, Western ones (for example Ellis 2004 or Gottfried 2004).

Cultural determinism has been criticized both from within and from outside anthropology. From within anthropology, the New Zealand anthropologist Derek Freeman (1916-2001), having been heavily influenced by Margaret Mead, conducted his own fieldwork in Samoa around twenty years after she did, and again in subsequent fieldwork visits. As he stayed there far longer than Mead, Freeman was accepted to a greater extent and was given an honorary chiefly title, which allowed him considerable access to Samoan life. Eventually, in 1983 (after Mead’s death), he published his refutation: Margaret Mead and Samoa: The Making and Unmaking of an Anthropological Myth (Freeman 1983). In it, he argued that Mead was completely mistaken: Samoa was sexually puritanical and violent, and its teenagers experienced just as much angst as teenagers everywhere else. In addition, he highlighted serious faults in her fieldwork: her sample was very small; she chose to live at the American naval base rather than with a Samoan family; she did not speak Samoan well; and she focused mainly on teenage girls. Freeman even tracked one of them down who, as an elderly lady, admitted that she and her friends had deliberately lied to Mead about their sex lives for their own amusement (Freeman 1999). It should be emphasized that Freeman’s critique of Mead related to her failure to conduct participant observation fieldwork properly (in line with Malinowski’s recommendations). In that Freeman rejects distinctions between primitive and advanced cultures, and stresses the importance of culture in understanding human differences, his work is also in the tradition of Boas. However, it should be noted that Freeman’s (1983) critique of Mead has itself been criticized as unnecessarily cutting, prosecuting a case against Mead to the point of bias and ignoring points which Mead got right (Shankman 2009, 17).

There remains an ongoing debate about the extent to which culture reflects biology or is on a biological leash. However, a growing body of research in genetics indicates that human personality is heavily influenced by genetic factors (for example Alarcon, Foulks, and Vakkur 1998 or Wilson 1998), though some research also indicates that the environment, especially during fetal development, can alter the expression of genes (see Nettle 2007). This has become part of the critique of cultural determinism advanced by evolutionary anthropologists.

b. Functionalism and Structuralism

Between the 1930s and 1970s, various forms of functionalism were influential in British social anthropology. These schools accepted, to varying degrees, the cultural determinist belief that ‘culture’ was a separate sphere from biology and operated according to its own rules but they also argued that social institutions could be compared in order to better discern the rules of such institutions. They attempted to discern and describe how cultures operated and how the different parts of a culture functioned within the whole. Perceiving societies as organisms has been traced back to Herbert Spencer. Indeed, there is a degree to which Durkheim (1965) attempted to understand, for example, the function of religion in society. But functionalism seemingly reflected aspects of positivism: the search for, in this case, social facts (cross-culturally true), based on empirical evidence.

E. E. Evans-Pritchard (1902-1973) was a leading British functionalist from the 1930s onwards. Rejecting grand theories of religion, he argued that a tribe’s religion could only make sense in terms of its function within society, and that a detailed understanding of the tribe’s history and context was therefore necessary; he accordingly engaged in lengthy fieldwork. British functionalism, in this respect, was influenced by the linguistic theories of the Swiss thinker Ferdinand de Saussure (1857-1913), who suggested that signs only make sense within a system of signs. This school developed into ‘structural functionalism.’ A. R. Radcliffe-Brown (1881-1955) is often argued to be a structural functionalist, though he denied this. Radcliffe-Brown rejected Malinowski’s functionalism – which argued that social practices were grounded in human instincts. Instead, influenced by the process philosophy of Alfred North Whitehead (1861-1947), he claimed that the units of anthropology were processes of human life and interaction; because these are in constant flux, anthropology must explain social stability. He argued that practices, in order to survive, must adapt to other practices, something called ‘co-adaptation’ (Radcliffe-Brown 1957). It might be argued that this leaves us asking where any of the practices came from in the first place.

A leading member of the structural functionalist school was the Scottish anthropologist Victor Turner (1920-1983). Structural functionalists attempted to understand society as a structure with inter-related parts. In attempting to understand Rites of Passage, Turner argued that everyday structured society could be contrasted with the Rite of Passage (Turner 1969): a liminal (transitional) phase which involved communitas (a relative breakdown of structure). Another prominent anthropologist in this field was Mary Douglas (1921-2007). She examined the contrast between the ‘sacred’ and the ‘profane’ in terms of categories of ‘purity’ and ‘impurity’ (Douglas 1966). She also suggested a model – the Grid/Group Model – through which the structures of different cultures could be categorized (Douglas 1970). Philosophically, this school accepted many of the assumptions of naturalism, but it held to aspects of positivism in that it aimed to answer discrete questions using the ethnographic method. It has been criticized by postmodern anthropologists, as we will see below, and also for its failure to attempt consilience with science.

Turner, Douglas and other anthropologists in this school followed Malinowski in using categories drawn from the study of ‘tribal’ cultures – such as Rites of Passage, Shaman and Totem – to better comprehend advanced societies such as that of Britain. For example, Turner was highly influential in pursuing the Anthropology of Religion, in which he used tribal categories as a means of comprehending aspects of the Catholic Church, such as modern-day pilgrimage (Turner and Turner 1978). This research also involved using the participant observation method. Critics, such as the Romanian historian of religion Mircea Eliade (1907-1986) (for example Eliade 2004), have insisted that categories such as ‘shaman’ only make sense within their specific cultural context. Other critics have argued that such scholarship attempts to reduce all societies to the level of the local community, despite there being many important differences, and fails to take into account considerable differences in societal complexity (for example Sandall 2001, Ch. 1). Nevertheless, there is a growing movement within anthropology towards examining various aspects of human life through the so-called tribal prism and, more broadly, through the cultural one. Mary Douglas, for example, has looked at business life anthropologically, while others have focused on politics, medicine or education. This has been termed ‘traditional empiricism’ by critics in contemporary anthropology (for example Davies 2010).

In France, in particular, the most prominent school during this period was known as Structuralism. Unlike British functionalism, structuralism was influenced by Hegelian idealism. Most associated with Claude Levi-Strauss, structuralism argued that all cultures follow the Hegelian dialectic: the human mind has a universal structure and a kind of a priori category system of opposites, a point which Hollis argues can be used as a starting point for any comparative cultural analysis. Cultures can be broken up into components – such as ‘Mythology’ or ‘Ritual’ – which evolve according to the dialectical process, leading to cultural differences. As such, the deep structures, or grammar, of each culture can be traced back to a shared starting point (and, in a sense, to the shared human mind), just as one can with a language. Because each culture has a grammar, cultures can be compared and insights can be made about them (see, for example, Levi-Strauss 1978). It might be suggested that the same criticisms leveled against the Hegelian dialectic – such as the charge that it is based around a dogma – might be leveled against structuralism. It has also been argued that category systems vary considerably between cultures (see Diamond 1974). Even supporters of Levi-Strauss have conceded that his works are opaque and verbose (for example Leach 1974).

c. Post-Modern or Contemporary Anthropology

The ‘postmodern’ thinking of scholars such as Jacques Derrida (1930-2004) and Michel Foucault (1926-1984) began to influence anthropology in the 1970s, precipitating what has been termed anthropology’s ‘Crisis of Representation.’ During this crisis, which many anthropologists regard as ongoing, every aspect of ‘traditional empirical anthropology’ came to be questioned.

Hymes (1974) criticized anthropologists for imposing ‘Western categories’ – such as Western measurement – on those they study, arguing that this is a form of domination and is immoral, and insisting that truth statements are always subjective and carry cultural values. Talal Asad (1971) criticized fieldwork-based anthropology for being ultimately indebted to colonialism, suggesting that anthropology has essentially been a project to enforce colonialism. Geertzian anthropology was criticized because it involved representing a culture, something which inherently involved imposing Western categories upon it through the production of texts. Marcus argued that anthropology ultimately consists of ‘texts’ – ethnographies – which can be deconstructed to reveal power dynamics, typically the dominant-culture anthropologist making sense of the oppressed object of study through his or her subjective cultural categories and presenting it to his or her own culture (for example Marcus and Cushman 1982). By extension, since all texts – including scientific texts – could be deconstructed, such critics argued that no text can make objective assertions. Roth (1989) specifically criticizes the view of anthropology as ‘texts,’ arguing that this view neither undermines the empirical validity of the observations involved nor helps to uncover power structures.

Various anthropologists, such as Roy Wagner (b. 1938) (Wagner 1981), argued that anthropologists were simply products of Western culture and could only ever hope to understand another culture through their own. There was no objective truth beyond culture, simply different cultures, with some – the scientific ones – happening to be dominant for various historical reasons. Thus, this school strongly advocated cultural relativism. Critics have countered that, after Malinowski, anthropologists, with their participant observation breaking down the color bar, were in fact an irritation to colonial authorities (for example Kuper 1973), and have criticized cultural relativism, as discussed above.

This situation led to what has been called the ‘reflexive turn’ in cultural anthropology. As Western anthropologists were products of their culture, just as those whom they studied were, and as the anthropologist was himself fallible, there developed an increasing movement towards ‘auto-ethnography,’ in which the anthropologist analyzes his or her own emotions and feelings towards the fieldwork. The essential case for such detailed self-analysis is Charlotte Davies’ (1999, 6) argument that the ‘purpose of research is to mediate between different constructions of reality, and doing research means increasing understanding of these varying constructs, among which is included the anthropologist’s own constructions’ (see Curran 2010, 109). Implicit in Davies’ argument, however, is that there is no such thing as objective reality or objective truth; there are simply different constructions of reality, as Wagner (1981) also argues. It has also been argued that auto-ethnography is ‘emancipatory’ because it turns anthropology into a dialogue rather than a traditional hierarchical analysis (Heaton-Shreshta 2010, 49). Auto-ethnography has been criticized as self-indulgent and as based on problematic assumptions, such as cultural relativism and the belief that morality is the most important dimension of scholarship (for example Gellner 1992). In addition, the same criticisms that have been leveled against postmodernism more broadly have been leveled against postmodern anthropology, including criticism of a sometimes verbose and emotive style and the charge that it is epistemologically pessimistic and therefore leads to a void (for example Scruton 2000). However, cautious defenders insist on the importance of being at least ‘psychologically aware’ (for example Emmet 1976) before conducting fieldwork, a point also argued by Popper (1963) with regard to conducting any scientific research. And Berger (2010) argues that auto-ethnography can be useful to the extent that it elucidates how a ‘social fact’ was uncovered by the anthropologist.

One of the significant results of the ‘Crisis of Representation’ has been a cooling towards the concept of ‘culture’ (and indeed ‘culture shock’), which was previously central to ‘cultural anthropology’ (see Oberg 1960 or Dutton 2012). ‘Culture’ has been criticized as old-fashioned and boring; as problematic because it possesses a history (Rees 2010); as associated with racism, because it has come to replace ‘race’ in far-right politics (Wilson 2002, 229); as imperialistically imposing a Western category on other cultures; as vague and difficult to define precisely (Rees 2010); as helping to maintain a hierarchy of cultures (Abu-Lughod 1991); and as increasingly questionable given globalization and the breakdown of discrete cultures (for example Eriksen 2002 or Rees 2010). Defenders of culture have countered that many of these criticisms can be leveled against any category of apprehension and that the term is not synonymous with ‘nation,’ so it can be employed even if nations become less relevant (for example Fox and King 2002). Equally, ‘culture shock,’ formerly used to describe a rite of passage amongst anthropologists engaging in fieldwork, has been criticized because of its association with culture and also as old-fashioned (Crapanzano 2010).

In addition, a number of further movements have been provoked by the postmodern movement in anthropology. One of these is ‘Sensory Ethnography’ (for example Pink 2009). It has been argued that anthropology traditionally privileges the Western emphasis on sight and the word, and that ethnographies, in order to avoid this kind of cultural imposition, need to attend to other senses such as smell, taste and touch. Another movement, specifically in the Anthropology of Religion, has argued that anthropologists should not go into the field as agnostics but should accept the possibility that the religious perspective of the group which they are studying may actually be correct, and even work on the assumption that it is and engage in analysis accordingly (a point discussed in Engelke 2002).

During the same period, schools developed within anthropology based around a number of other fashionable philosophical ideologies. Feminist anthropology, like postmodern anthropology, began to come to prominence in the early 1970s. Philosophers such as Sandra Harding (1991) argued that anthropology had been dominated by men, leading to androcentric anthropological interpretations, a failure to appreciate the importance of women in social organizations, androcentric metaphors in anthropological writing, and a focus on research questions that mainly concern men. Strathern (1988) uses what she calls a Marxist-Feminist approach, employing the categories of Melanesia itself in order to understand Melanesian gender relations and so produce an ‘endogenous’ analysis of the situation. In doing so, she argues that actions in Melanesia are gender-neutral and that the asymmetry between males and females is ‘action-specific.’ Thus, Melanesian women are not in any permanent state of social inferiority to men: if there is a sexual hierarchy, it is de facto rather than de jure.

Critics have countered that prominent feminist interpretations have simply turned out to be empirically inaccurate. For example, feminist anthropologists such as Weiner (1992), as well as Frances Dahlberg (1981), argued that foraging societies prized females and were peaceful and sexually egalitarian. It has been countered that this is a projection of feminist ideals which does not match the facts (Kuznar 1997, Ch. 3), and that it does not follow that anthropology is biased simply because it is male-dominated (Kuznar 1997, Ch. 3). However, the feminist philosopher of science Alison Wylie (see Risjord 1997) has argued that ‘politically motivated critiques,’ including feminist ones, can improve science: feminist critique, she argues, demonstrates the influence of ‘androcentric values’ on theory, which forces scientists to hone their theories.

Another school, composed of anthropologists from less developed countries or their descendants, has proffered a similar critique, extending the feminist view that anthropology is androcentric by arguing that it is also Euro-centric. It has been argued that anthropology is dominated by Europeans – specifically Western Europeans and those of Western European descent – and therefore reflects European thinking and bias. For example, anthropologists from developing countries, such as the Greenlandic scholar Karla Jessen Williamson, have argued that anthropology would benefit from the more holistic, intuitive thinking of non-Western cultures and that this should be integrated into the discipline (for example Jessen Williamson 2006). The American anthropologist Lee Baker (1991) describes himself as ‘Afro-Centric’ and argues that anthropology must be critiqued as resting on a ‘Western’ and ‘positivistic’ tradition which is thus biased in favour of Europe. Afrocentric anthropology aims to shift this to an African (or African American) perspective: Baker argues that metaphors in anthropology, for example, are Euro-centric and justify the suppression of Africans, and Afrocentric anthropologists accordingly wish to construct an ‘epistemology’ whose foundations are African. The criticisms leveled against cultural relativism have also been leveled against such perspectives (see Levin 2005).

4. Philosophical Dividing Lines

a. Contemporary Evolutionary Anthropology

The positivist, empirical philosophy already discussed broadly underpins current evolutionary anthropology, and there is an extent to which it therefore crosses over with biology. This is in line with the consilience model advocated by the Harvard biologist Edward O. Wilson (b. 1929) (Wilson 1998), who has argued that the social sciences must attempt to be scientific in order to share in the success of science, and therefore must be reducible to the science which underpins them. Contemporary evolutionary anthropologists therefore follow the scientific method, and often a quantitative methodology, to answer discrete questions, and they attempt to orient anthropological research within biology and the latest discoveries in that field. Some scholars, such as Derek Freeman (1983), have also defended a more qualitative methodology while nevertheless arguing that their findings must ultimately be underpinned by scientific research.

For example, anthropologist Pascal Boyer (2001) has attempted to understand the origins of ‘religion’ by drawing upon the latest research in genetics and in particular research into the functioning of the human mind. He has examined this alongside evidence from participant observation in an attempt to ‘explain’ religion. This subsection of evolutionary anthropology has been termed ‘Neuro-anthropology’ and attempts to better understand ‘culture’ through the latest discoveries in brain science. There are many other schools which apply different aspects of evolutionary theory – such as behavioral ecology, evolutionary genetics, paleontology and evolutionary psychology – to understanding cultural differences and different aspects of culture or subsections of culture such as ‘religion.’ Some scholars, such as Richard Dawkins (b. 1941) (Dawkins 1976), have attempted to render the study of culture more systematic by introducing the concept of cultural units – memes – and attempting to chart how and why certain memes are more successful than others, in light of research into the nature of the human brain.

Critics, in naturalist anthropology, have suggested that evolutionary anthropologists are insufficiently critical and go into the field thinking they already know the answers (for example Davies 2010). They have also argued that evolutionary anthropologists fail to appreciate that there are ways of knowing other than science. Some critics have also argued that evolutionary anthropology, with its acceptance of personality differences based on genetics, may lead to the maintenance of class and race hierarchies and to racism and discrimination (see Segerstråle 2000).

b. Anthropology: A Philosophical Split?

It has been argued, both by scholars and by journalists, that anthropology, more so than other social scientific disciplines, is rent by a fundamental philosophical divide, though some anthropologists have disputed this, suggesting that qualitative research can help to answer scientific research questions as long as naturalistic anthropologists accept the significance of biology.

The divide is trenchantly summarized by Lawson and McCauley (1993), who divide anthropologists into ‘interpretivists’ and ‘scientists’ – or, as noted above, ‘naturalists’ and ‘positivists.’ For the scientists, the views of the ‘cultural anthropologists’ (as they call themselves) are too speculative, especially because pure ethnographic research is subjective, and are meaningless where they cannot be reduced to science. For the interpretivists, the ‘evolutionary anthropologists’ are too ‘reductionistic’ and ‘mechanistic’; they do not appreciate the benefits of a subjective approach (such as garnering information that could not otherwise be garnered); and they ignore questions of ‘meaning,’ suffering as they do from ‘physics envy.’

Some anthropologists, such as Risjord (2000, 8), have criticized this divide, arguing that the two perspectives can be united and that only through ‘explanatory coherence’ (combining objective analysis of a group with the face-value beliefs of the group members) can a fully coherent explanation be reached; otherwise, anthropology will ‘never reach the social reality at which it aims.’ But this seems to raise the question of what it means to ‘reach the social reality.’

In practical terms, the split has already been happening, as discussed in Segal and Yanagisako (2005, Ch. 1). They note that some American anthropological departments demand that their lecturers be committed to holist ‘four field anthropology’ (archaeological, cultural, biological and linguistic) precisely because of this ongoing split, and in particular because of the divergence between biological and cultural anthropology. They observe that, already by the end of the 1980s, most biological anthropologists had left the American Anthropological Association. Though they argue that ‘holism’ was less necessary in Europe – because of the way that US anthropology, in focusing on Native Americans, ‘bundled’ the four fields – Fearn (2008) notes that there is a growing divide in British anthropology departments as well, along the same dividing lines of positivism and naturalism.

Evolutionary anthropologists and, in particular, postmodern anthropologists do seem to follow philosophies with essentially different presuppositions. In November 2010, this divide became particularly contentious when the American Anthropological Association voted to remove the word ‘science’ from its Mission Statement (Berrett 2010).

5. References and Further Reading

  • Abu-Lughod, Lila. 1991. “Writing Against Culture.” In Richard Fox (ed.), Recapturing Anthropology: Working in the Present (pp. 466-479). Santa Fe: School of American Research Press.
  • Alarcon, Renato, Foulks, Edward and Vakkur, Mark. 1998. Personality Disorders and Culture: Clinical and Conceptual Interactions. New York: John Wiley and Sons.
  • American Anthropological Association. 1998. “American Anthropological Association Statement on ‘Race.’” 17 May.
  • Andreski, Stanislav. 1974. Social Sciences as Sorcery. London: Penguin.
  • Asad, Talal. 1971. “Introduction.” In Talal Asad (ed.), Anthropology and the Colonial Encounter. Atlantic Highlands: Humanities Press.
  • Baiburin, Albert. 2005. “The Current State of Ethnography and Anthropology in Russia.” Forum for Anthropology and Culture 2, 448-489.
  • Baker, Lee. 1991. “Afro-Centric Racism.” University of Pennsylvania: African Studies Center.
  • Barenski, Janusz. 2008. “The New Polish Anthropology.” Studia Ethnologica Croatica 20, 211-222.
  • Beddoe, John. 1862. The Races of Britain: A Contribution to the Anthropology of Western Europe. London.
  • Benedict, Ruth. 1934. Patterns of Culture. New York: Mifflin.
  • Berger, Peter. 2010. “Assessing the Relevance and Effects of “Key Emotional Episodes” for the Fieldwork Process.” In Dimitrina Spencer and James Davies (eds.), Anthropological Fieldwork: A Relational Process (pp. 119-143). Newcastle: Cambridge Scholars Press.
  • Berrett, Daniel. 2010. “Anthropology Without Science.” Inside Higher Ed, 30 November.
  • Bidney, David. 1953. Theoretical Anthropology. New York: Columbia University Press.
  • Boas, Franz. 1907. The Mind of Primitive Man. New York: MacMillan.
  • Bourgois, Philippe. 2007. “Confronting the Ethics of Ethnography: Lessons from Fieldwork in Central America.” In Antonius Robben and Jeffrey Slukka (eds.), Ethnographic Fieldwork: An Anthropological Reader (pp. 288-297). Oxford: Blackwell.
  • Boyer, Pascal. 2001. Religion Explained: The Human Instincts That Fashion Gods, Spirits and Ancestors. London: William Heinemann.
  • Cattell, Raymond. 1972. Beyondism: A New Morality from Science. New York: Pergamon.
  • Ciubrinskas, Vytis. 2007. “Interview: “Anthropology is Badly Needed in Eastern Europe.””
  • Curran, John. 2010. “Emotional Interaction and the Acting Ethnographer: An Ethical Dilemma?” In Dimitrina Spencer and James Davies (eds.), Anthropological Fieldwork: A Relational Process (pp. 100-118). Newcastle: Cambridge Scholars Press.
  • Crapanzano, Vincent. 2010. ““At the Heart of the Discipline”: Critical Reflections on Fieldwork.” In James Davies and Dimitrina Spencer (eds.), Emotions in the Field: The Psychology and Anthropology of Fieldwork Experience (pp. 55-78). Stanford: Stanford University Press.
  • Dahlberg, Frances. 1981. “Introduction.” In Frances Dahlberg (ed.), Woman the Gatherer (pp. 1-33). New Haven: Yale University Press.
  • Darwin, Charles. 1871. The Descent of Man. London: John Murray.
  • Darwin, Charles. 1859. The Origin of Species. London: John Murray.
  • Davies, James. 2010. “Conclusion: Subjectivity in the Field: A History of Neglect.” In Dimitrina Spencer and James Davies (eds.), Anthropological Fieldwork: A Relational Process (pp. 229-243). Newcastle: Cambridge Scholars Publishing.
  • Davies, Charlotte. 1999. Reflexive Ethnography: A Guide to Researching Selves and Others. London: Routledge.
  • Dawkins, Richard. 1976. The Selfish Gene. Oxford: Oxford University Press.
  • Diamond, Stanley. 1974. In Search of the Primitive. New Brunswick: Transaction Books.
  • Douglas, Mary. 1970. Natural Symbols: Explorations in Cosmology. London: Routledge.
  • Douglas, Mary. 1966. Purity and Danger: An Analysis of the Concepts of Pollution and Taboo. London: Routledge.
  • Durkheim, Emile. 1995. The Elementary Forms of Religious Life. New York: Free Press.
  • Dutton, Edward. 2012. Culture Shock and Multiculturalism. Newcastle: Cambridge Scholars Publishing.
  • Ehrenfels, Umar Rolf, Madan, Triloki Nath, and Comas, Juan. 1962. “Mankind Quarterly Under Heavy Criticism: 3 Comments on Editorial Practices.” Current Anthropology 3, 154-158.
  • Eliade, Mircea. 2004. Shamanism: Archaic Techniques of Ecstasy. Princeton: Princeton University Press.
  • Ellis, Frank. 2004. Political Correctness and the Theoretical Struggle: From Lenin and Mao to Marcuse and Foucault. Auckland: Maxim Institute.
  • Emmet, Dorothy. 1976. “Motivation in Sociology and Social Anthropology.” Journal for the Theory of Social Behaviour 6, 85-104.
  • Engelke, Matthew. 2002. “The Problem of Belief: Evans-Pritchard and Victor Turner on the Inner Life.” Anthropology Today 18, 3-8.
  • Eriksen, Thomas Hylland. 2003. “Introduction.” In Thomas Hylland Eriksen (ed.), Globalisation: Studies in Anthropology (pp. 1-17). London: Pluto Press.
  • Eriksen, Thomas Hylland. 2001. A History of Anthropology. London: Pluto Press.
  • Fearn, Hannah. 2008. “The Great Divide.” Times Higher Education, 28 November.
  • Fox, Richard, and King, Barbara. 2002. “Introduction: Beyond Culture Worry.” In Richard Fox and Barbara King (eds.), Anthropology Beyond Culture (pp. 1-19). Oxford: Berg.
  • Frazer, James. 1922. The Golden Bough: A Study in Magic and Religion. London: MacMillan.
  • Freeman, Derek. 1999. The Fateful Hoaxing of Margaret Mead: A Historical Analysis of Her Samoan Research. London: Basic Books.
  • Freeman, Derek. 1983. Margaret Mead and Samoa: The Making and Unmaking of an Anthropological Myth. Cambridge: Harvard University Press.
  • Galton, Francis. 1909. Essays in Eugenics. London: Eugenics Education Society.
  • Geertz, Clifford. 1973. The Interpretation of Cultures. New York: Basic Books.
  • Geertz, Clifford. 1999. “‘From the Native’s Point of View’: On the Nature of Anthropological Understanding.” In Russell T. McCutcheon (ed.), The Insider/Outsider Problem in the Study of Religion: A Reader (pp. 50-63). New York: Cassell.
  • Gellner, Ernest. 1995. Anthropology and Politics: Revolutions in the Sacred Grove. Oxford: Blackwell.
  • Gellner, Ernest. 1992. Post-Modernism, Reason and Religion. London: Routledge.
  • Gobineau, Arthur de. 1915. The Inequality of Human Races. New York: G. P. Putnam and Sons.
  • Gorton, William. 2010. “The Philosophy of Social Science.” Internet Encyclopedia of Philosophy.
  • Gottfried, Paul. 2004. Multiculturalism and the Politics of Guilt: Towards a Secular Theocracy. Columbia: University of Missouri Press.
  • Grant, Madison. 1916. The Passing of the Great Race: Or the Racial Basis of European History. New York: Charles Scribner’s Sons.
  • Hallpike, Christopher Robert. 1986. The Principles of Social Evolution. Oxford: Clarendon Press.
  • Harding, Sandra. 1991. Whose Science? Whose Knowledge? Thinking from Women’s Lives. Ithaca: Cornell University Press.
  • Headland, Thomas, Pike, Kenneth, and Harris, Marvin. 1990. Emics and Etics: The Insider/ Outsider Debate. New York: Sage Publications.
  • Heaton-Shreshta, Celayne. 2010. “Emotional Apprenticeships: Reflections on the Role of Academic Practice in the Construction of “the Field.”” In Dimitrina Spencer and James Davies (eds.), Anthropological Fieldwork: A Relational Process (pp. 48-74). Newcastle: Cambridge Scholars Publishing.
  • Hollis, Martin. 1967. “The Limits of Irrationality.” European Journal of Sociology 8, 265-271.
  • Hymes, Dell. 1974. “The Use of Anthropology: Critical, Political, Personal.” In Dell Hymes (ed.), Reinventing Anthropology (pp. 3-82). New York: Vintage Books.
  • Jessen Williamson, Karla. 2006. Inuit Post-Colonial Gender Relations in Greenland. Aberdeen University: PhD Thesis.
  • Kapferer, Bruce. 2001. “Star Wars: About Anthropology, Culture and Globalization.” Suomen Antropologi: Journal of the Finnish Anthropological Society 26, 2-29.
  • Kuper, Adam. 1973. Anthropologists and Anthropology: The British School 1922-1972. New York: Pica Press.
  • Kuznar, Lawrence. 1997. Reclaiming a Scientific Anthropology. Walnut Creek: AltaMira Press.
  • Lawson, Thomas, and McCauley, Robert. 1993. Rethinking Religion: Connecting Cognition and Culture. Cambridge: Cambridge University Press.
  • Leach, Edmund. 1974. Claude Levi-Strauss. New York: Viking Press.
  • Levi-Strauss, Claude. 1978. Myth and Meaning. London: Routledge.
  • Levin, Michael. 2005. Why Race Matters. Oakton: New Century Foundation.
  • Lovejoy, Arthur. 1936. The Great Chain of Being: A Study of the History of an Idea. Cambridge: Harvard University Press.
  • Lynn, Richard. 2006. Race Differences in Intelligence: An Evolutionary Analysis. Augusta: Washington Summit Publishers.
  • Malinowski, Bronislaw. 1922. Argonauts of the Western Pacific. London: Routledge.
  • Marcus, George, and Cushman, Dick. 1974. “Ethnographies as Texts.” Annual Review of Anthropology 11, 25-69.
  • Mead, Margaret. 1928. Coming of Age in Samoa: A Psychological Study of Primitive Youth for Western Civilization. London: Penguin.
  • Montagu, Ashley. 1945. Man’s Most Dangerous Myth: The Fallacy of Race. New York: Columbia University Press.
  • Nettle, Daniel. 2007. Personality: What Makes Us the Way We Are. Oxford: Oxford University Press.
  • Oberg, Kalervo. 1960. “Culture Shock: Adjustment to New Cultural Environments.” Practical Anthropology 7, 177-182.
  • Pearson, Roger. 1991. Race, Intelligence and Bias in Academe. Washington DC: Scott-Townsend Publishers.
  • Pels, Peter. 1999. “Professions of Duplexity: A Prehistory of Ethical Codes.” Current Anthropology 40, 101-136.
  • Pichot, Andre. 2009. The Pure Society: From Darwin to Hitler. London: Verso.
  • Pike, Luke. 1869. “On the Alleged Influence of Race Upon Religion.” Journal of the Anthropological Society of London 7, CXXXV-CLIII.
  • Pink, Sarah. 2009. Doing Sensory Ethnography. London: Sage Publications.
  • Popper, Karl. 1963. Conjectures and Refutations: The Growth of Scientific Knowledge. London: Routledge.
  • Popper, Karl. 1957. The Poverty of Historicism. London: Routledge.
  • Radcliffe-Brown, Alfred Reginald. 1957. A Natural Science of Society. Chicago: University of Chicago Press.
  • Rees, Tobias. 2010. “On the Challenge – and the Beauty – of (Contemporary) Anthropological Inquiry: A Response to Edward Dutton.” Journal of the Royal Anthropological Institute 16, 895-900.
  • Risjord, Mark. 2007. “Scientific Change as Political Action: Franz Boas and the Anthropology of Race.” Philosophy of Social Science 37, 24-45.
  • Risjord, Mark. 2000. Woodcutters and Witchcraft: Rationality and the Interpretation of Change in the Social Sciences. New York: University of New York Press.
  • Rogan, Bjarn. 2012. “The Institutionalization of Folklore.” In Regina Bendix and Galit Hasam-Rokem (eds.), A Companion to Folklore (pp. 598-630). Oxford: Blackwell.
  • Roth, Paul. 1989. “Anthropology Without Tears.” Current Anthropology 30, 555-569.
  • Sandall, Roger. 2001. The Culture Cult: On Designer Tribalism and Other Essays. Oxford: Westview Press.
  • Sarkany, Mihaly. ND. “Cultural and Social Anthropology in Central and Eastern Europe.” Liebnitz Institute for the Social Sciences.
  • Shankman, Paul. 2009. The Trashing of Margaret Mead: Anatomy of an Anthropological Controversy. Madison: University of Wisconsin Press.
  • Scruton, Roger. 2000. Modern Culture. London: Continuum.
  • Sedgwick, Mitchell. 2006. “The Discipline of Context: On Ethnography Amongst the Japanese.” In Joy Hendry and Heung Wah Wong (eds.), Dismantling the East-West Dichotomy: Essays in Honour of Jan van Bremen (pp. 64-68). New York: Routledge.
  • Segal, Daniel, and Yanagisako, Sylvia. 2005. “Introduction.” In Daniel Segal and Sylvia Yanagisako (eds.), Unwrapping the Sacred Bundle: Reflections on the Disciplining of Anthropology (pp. 1-23). Durham: Duke University Press.
  • Segerstråle, Ullica. 2000. Defenders of the Truth: The Sociobiology Debate. Oxford: Oxford University Press.
  • Siikala, Jukka. 2006. “The Ethnography of Finland.” Annual Review of Anthropology 35, 153-170.
  • Sluka, Jeffrey. 2007. “Fieldwork Ethics: Introduction.” In Antonius Robben and Jeffrey Slukka (eds.), Ethnographic Fieldwork: An Anthropological Reader (pp. 271-276). Oxford: Blackwell.
  • Spencer, Herbert. 1873. The Study of Sociology. New York: D. Appleton and Co.
  • Srivastava, Vinay Kumar. 2000. “Teaching Anthropology.” Seminar 495.
  • Strathern, Marylin. 1988. The Gender of the Gift: Problems with Women and Problems with Society in Melanesia. Berkley: University of California Press.
  • Stocking, George. 1991. Victorian Anthropology. New York: Free Press.
  • Tucker, William. 2002. The Funding of Scientific Racism: Wycliffe Draper and the Pioneer Fund. Illinois: University of Illinois Press.
  • Turner, Victor. 1969. The Ritual Process: Structure and Anti-Structure. New York: Aldine Publishers.
  • Turner, Victor, and Turner, Edith. 1978. Image and Pilgrimage in Christian Culture: Anthropological Perspectives. New York: Columbia University Press.
  • Tylor, Edward Burnett. 1871. Primitive Culture: Researches into the Development of Mythology, Religion, Art and Custom. London: John Murray.
  • Tylor, Edward Burnett. 1865. Researchers into the Early History of Mankind. London: John Murray.
  • Wagner, Roy. 1981. The Invention of Culture. Chicago: University of Chicago Press.
  • Weiner, Annette. 1992. Inalienable Possession. Los Angeles: University of California Press.
  • Wilson, Edward Osborne. 1998. Consilience: Towards the Unity of Knowledge. New York: Alfred A. Knopf.  
  • Wilson, Edward Osborne. 1975. Sociobiology: A New Synthesis. Cambridge: Harvard University Press.
  • Wilson, Richard. 2002. “The Politics of Culture in Post-apartheid South Africa.” In Richard Fox and Barbara King (eds.), Anthropology Beyond Culture (pp. 209-234). Oxford: Berg.


Author Information

Edward Dutton
University of Oulu

Modern Morality and Ancient Ethics

It is commonly supposed that there is a vital difference between ancient ethics and modern morality. For example, there appears to be a vital difference between virtue ethics and the modern moralities of deontological ethics (Kantianism) and consequentialism (utilitarianism). At second glance, however, one acknowledges that the two ethical approaches have more in common than their stereotypes may suggest. Oversimplification, fallacious interpretations, as well as broad variation within a particular ethical theory generally make it harder to determine the real differences and similarities between ancient ethics and modern morality. But why should we bother with ancient ethics at all? What is the utility of comparing the strengths and weaknesses of the particular approaches? The general answer is that a proper understanding of the strengths and weaknesses of virtue ethics and modern moral theories can be used to overcome current ethical problems and to initiate fruitful developments in ethical reasoning and decision-making.

This article examines the differences and similarities between ancient ethics and modern morality by analysing and comparing their main defining features in order to show that the two ethical approaches are less distinct than one might suppose. The first part of the article outlines the main ethical approaches in Ancient Greek ethics by focusing on the Cynics, the Cyrenaics, Aristotle’s virtue ethics, the Epicureans, and the Stoics. This part also briefly outlines the two leading modern ethical approaches, that is, Kantianism and utilitarianism, in more general terms in order to provide a sufficient background. The second part provides a detailed table with the main defining features of the conflicting stereotypes of ancient ethics and modern morality. Three main issues – the good life versus the good action, the use of the term “moral ought,” and whether a virtuous person can act in a non-virtuous way – are described in more detail in the third part of the article in order to show that the differences have more in common than the stereotypes may initially suggest. The fourth part deals with the idea of the moral duty in ancient ethics.

Table of Contents

  1. Ancient Ethics and Modern Morality
    1. Ethics and Morality
    2. Ancient Ethics
      1. The Cynics and the Cyrenaics – The Extremes
      2. The Peripatetic School – Aristotle’s Virtue Ethics
      3. Epicureanism and Stoicism
    3. Modern Morality
      1. Kantianism
      2. Utilitarianism
    4. The Up-shot
  2. The Table of Ancient Ethics and Modern Morality – A Comparison
  3. Ancient Ethics and Modern Morality – The Main Differences
    1. The Good Life versus the Good Action
    2. The Moral Ought
    3. Can a Virtuous Person Act in a Non-Virtuous Way?
  4. Special Problem: Kant and Aristotle – Moral Duty and For the Sake of the Noble
  5. Conclusion
  6. References and Further Reading

1. Ancient Ethics and Modern Morality

There are at least two main criteria that each moral theory must fulfil: first, the criterion of justification (that is, the particular moral theory should not contain any contradictions) and, second, the criterion of applicability (that is, the particular moral theory should solve concrete problems and offer ethical orientation). However, many (traditional) moral theories are unable to meet the second criterion and simply fall short of the high demands of applied ethics to solve the complex moral problems of our times. Why is this the case? The main point is that traditional moral theories are not sufficiently well equipped to deal with completely new problems such as issues concerning nuclear power, gene technology, cloning, and so forth. Therefore, there is constant interest in updating and enhancing a particular moral theory in order to make it compatible with the latest demands. Examples are neo-Aristotelians such as Hursthouse on abortion (1991) and on nature (2007), as well as neo-Kantians such as Regan on animals (1985), Korsgaard in general and in particular on animals and nature (1996), and Altman’s edited volume on the use and limits of Kant’s practical philosophy in applied ethics (2011). This is a difficult and often very complex process.

a. Ethics and Morality

When people talk about ethical approaches in Antiquity, they refer to these approaches by using the words “ancient ethics” rather than “ancient morality”. They talk about “virtue ethics” and not about “virtue morality”. But why is this the case? The challenging question is, according to Annas (1992: 119-120), whether ancient scholars such as Plato and Aristotle as well as the Stoics and Epicureans are really talking about morality at all, since their main focus is limited to the agent’s happiness, which obviously “doesn’t sound much like morality” (119). Even if one acknowledges that happiness means a satisfactory and well-lived life according to the ethical virtues, and not merely a happy moment, it still does not sound like morality. Furthermore, the general idea in virtue ethics that the good of other people enters the scene by being a part of one’s own good, and that, for example, the notion of justice is introduced as a character trait and not as the idea of the rights of others (see Dworkin’s phrase, “rights as trumps”), makes it obvious that there is a systematic difference between the notions of ethics and morality. Ancient ethics is about living a good and virtuous life according to the ethical virtues, that is, about becoming a virtuous person, while the modern notion of morality is primarily focused on the interests of other people and the idea of deontological constraints. That is, one acts morally because one has to meet certain standards and not because doing so supports one’s own good life. But even this simple picture might be premature, depending on how one conceives of the idea of “moral motivation” in ancient ethics (see below).

Historically speaking, there is no evidence as to which term is more legitimate. In Ancient Greek, the term for ethics is êthos, which means something like character. When Aristotle analyses the good life in the Nicomachean Ethics and the Eudemian Ethics, he therefore focuses on the central topic of good and bad character traits, that is, virtues and vices. In this original sense, ethics means an analysis of character or character traits. In Ancient Roman thought, which was essentially influenced by Cicero, the Greek term ethikos (the adjective of êthos) was translated with the Latin term moralis (the adjective of mores), where the Latin term mores means habits and customs. It is possible to translate the Greek term êthos as habits and customs, but it is more likely that the translation of ethikos with moralis was a mistranslation: the term moralis rather corresponds to the Greek ethos, whose primary meaning is habits and customs. If the term morality refers to mores, then morality means the totality of all habits and customs of a given community. The term moralis became a terminus technicus in Latin-shaped philosophy, covering the present meaning of the term. In modern times, the habits and customs of a given community are termed ‘conventions’, which are authoritative for social life in society. Morality, however, is not simply a matter of convention; indeed, convention often conflicts with morality (for example, an immoral convention). Hence, it seems inappropriate to narrow the term in this way (Steinfath 2000). At present, there are at least four ways to distinguish between ethics and morality:

  1. Ethics and morality as distinct spheres: Ethics has to do with the pursuit of one’s own happiness or well-being and private lifestyle, that is, how we should live to make good lives for ourselves. Morality has to do with other people’s interests and deontological constraints (for example Jürgen Habermas).
  2. The equation of ethics and morality (for example Peter Singer).
  3. Morality as a special field in the ethical realm: Ethics is the generic term for ethical and moral issues in the above-mentioned sense. Morality is a special part of ethics (for example, Bernard Williams).
  4. Morality as the object of ethics: Ethics is the philosophical theory of morality which is the systematic analysis of moral norms and values (standard reading).

The upshot is that it is always important to ask how the terms ethics and morality are used in a given context and how one uses them oneself. What is certain is that, in claiming that the terms differ, one makes a terminological and not only a conceptual differentiation.

b. Ancient Ethics

It is impossible to give a complete depiction of the rich history of ethical reasoning and decision-making in Antiquity here; therefore the focus of this section is on the main lines of ethical reasoning of the most important philosophical schools of the classic and Hellenistic periods. This rather simplified overview is nonetheless sufficient for our purposes. One can roughly divide the classic and Hellenistic periods into four different but closely connected parts. The first part concerns Socrates and his arguments with the Sophists (second half of the fifth century BC); the second part covers the post-Socratic formation of important philosophical schools deeply influenced by Socratic thought, for example Antisthenes’ school of the Cynics, Aristippus’ school of the Cyrenaics, and Plato’s Academy, the most influential ancient school (second half of the fifth and fourth centuries BC). The third part is characterized, on the one hand, by the formation of one new major philosophical school, namely Aristotle’s peripatetic school, which developed from Plato’s Academy, and, on the other hand, by the exchange of arguments among the existing schools on various issues (fourth century BC). The fourth part concerns the formation of two new important philosophical schools, which became highly influential in Antiquity: first, Epicurus’ school of epicureanism, standing in the tradition of the Cyrenaics, and, secondly, Zeno’s school of the Stoics, which partly developed from the Cynics (second half of the fourth and third century BC). All the philosophical schools – though at odds with each other – are united by the fact that they are deeply concerned with the most important ethical questions of how to live a good life and how to achieve happiness. Their responses to these vital questions are, of course, diverse.


Figure 1. The Most Prominent Philosophical Schools in Ancient Greece

The following brief depiction focuses on the basic ethical assumptions of the philosophical schools of the Cynics and Cyrenaics, the peripatetic school, the Epicureans, and the Stoics. Socrates and Plato’s Academy are left out because Socrates did not provide any (written) systematic ethics. His unsystematic ethical position is mainly depicted in Plato’s early dialogues, for example Laches, Charmides, and Protagoras, and in some of Xenophon’s works, such as the Apology, Symposium, and Memorabilia. Plato himself did not provide any systematic ethics comparable to that of the other main ancient schools either, even though one can certainly reconstruct – at least to some extent – his ethical viewpoint from the dialogue Politeia. In addition, most (ethical) works of the classic and Hellenistic periods are lost in the dark of history; what remains is a collection of fragments, phrases, and (parts of) letters of various important philosophers (and commentators) standing in the tradition of particular schools of that time. Many rival views on ethics are mediated through the works of Plato and Aristotle, in which they criticize their opponents. In addition, some of these rudiments and testimonials were also mediated by famous writers and politicians such as Xenophon (fifth and fourth century BC) and the important historian of philosophy Diogenes Laertios (third century AD). Aristotle, however, is the only ancient philosopher whose two substantial and complete ethical contributions – the Nicomachean Ethics and the Eudemian Ethics, leaving aside the Magna Moralia, whose authorship is unclear – have survived, even though all of his dialogues, including those concerned with ethics and ethical issues, are lost.

i. The Cynics and the Cyrenaics – The Extremes

The founder of the school of the Cynics, Antisthenes of Athens, taught that virtue in terms of practical wisdom is a good and is also sufficient for eudaimonia, that is, happiness. Badness is an evil and everything else is indifferent. In accord with Socrates, Antisthenes claimed that virtue is teachable, and he also accepted the doctrine of the unity of the virtues, that is, the general idea that if a person possesses one ethical virtue, then he or she thereby possesses all the other ethical virtues as well (for a recent contribution to this controversial doctrine, see Russell, 2009). The only good of human beings is what is peculiar to them, that is, their ability to reason. Against the Cyrenaics he argued that pleasure is never a good. Things such as death, illness, servitude, poverty, disgrace, and hard labour are only supposed to be bad but are not real evils. One should be indifferent towards one’s honour, property, liberty, health, and life (committing suicide was allowed). The Cynics, in general, lived a beggar’s life and were probably the first real cosmopolitans in human history – a feature that the Stoics wholeheartedly adopted later. They were also against the common cultural and religious rites and practices, a main feature which they shared with the Sophists. They took Socratic frugality to extremes and tried to be as independent of material goods as possible, like Diogenes of Sinope, who lived in a barrel. Furthermore, one should abstain from bad things and seek apathy and tranquillity, important features the Stoics adopted from the Cynics as well. According to the Cynics, there are two groups of people: first, the wise people living a perfect and happy life – they cannot lose their virtues once they have achieved this condition (similar to Aristotle) – and, secondly, the fools, who are unhappy and make mistakes (Diogenes Laertios VI, 1 and 2; Zeller 1883: 116-121; Long 2007: 623-629).

Aristippus of Cyrene was well known and highly regarded among philosophers in Antiquity and was the first Socratic disciple who took money in exchange for lessons. He was the founder of the Cyrenaics – a famous philosophical school whose members were devoted to (sensualistic) hedonism (which certainly influenced Jeremy Bentham’s version of hedonistic utilitarianism). Thereby, the school of the Cyrenaics stands in striking contrast to the Cynics. Aristippus claims that knowledge is valuable only insofar as it is useful in practical matters (a feature that the Cyrenaics share with the Cynics); all actions should strive for the utmost pleasure, since pleasure is the highest good. Goods differ only gradually, not qualitatively. Unlike Aristotle, the hedonists believed that happiness understood as a long-term state is not the overall purpose of life; rather, the bodily pleasure of the very moment is the goal of life. The past has gone by and the future is uncertain; therefore only the here and now is decisive, since the immediate feelings are the only guide to what is really, genuinely valuable. Practical wisdom is the precondition of happiness in being instrumentally useful for achieving pleasure. Aristippus and the Cyrenaics sought maximum pleasure in each moment without being swamped by it. Aristippus – known for his cheerful nature and praiseworthy character as well as his distinguished restraint – famously claimed that one should be the master of each moment: “I possess, but I am not possessed”. A. A. Long rightly claims: “Aristippus Senior had served as the paradigm of a life that was both autonomous and effortlessly successful in turning circumstances into sources of bodily enjoyment” (2007: 636).
Aristippus was a true master in making the best of each situation; he also taught that one should be able to limit one’s wishes if they are likely to cause severe problems for oneself, to preserve self-control (a general feature he shares with Socrates), to secure one’s happiness, to seek inner freedom, and to be cheerful. Obviously, his teaching of a life solely devoted to bodily pleasure – that is, his pursuit of lust and his view concerning the unimportance of knowledge – stands in striking contrast to the teachings of Socrates (as well as of Plato and Aristotle). His disciples – most notably Aristippus the Younger, Theodoros, Anniceris (who bought the release of Plato), and Hegesias – established new Cyrenaic schools offering sophisticated versions of hedonism, developed through fruitful disputes with Epicurus and the Cynics (for a brief overview of Aristippus’ disciples, see A. A. Long 2007: 632-639, and for the teachings, for example, Diogenes Laertios II, 8; Zeller 1883: 121-125; Döring 1988. For the view that Aristippus’ hedonism is not limited to “bodily pleasures”, see Urstad 2009).

ii. The Peripatetic School – Aristotle’s Virtue Ethics      

Aristotle proposed the most prominent and sophisticated version of virtue ethics in Antiquity; his teachings became authoritative for many scholars and remain alive in the vital contributions of neo-Aristotelians in contemporary philosophy. His main ethical work is the Nicomachean Ethics; less prominent but still valuable and authentic is the Eudemian Ethics, while Aristotle’s authorship of the Magna Moralia is highly questionable. Aristotle claims that happiness (eudaimonia) is the highest good – that is, the final, perfect, and self-contained goal – at which all people aim. In particular, happiness is the goal of life, that is, of a life devoted to “doing” philosophy (EN X, 6–9). Whether a person can be called “happy” can only be determined at the very end of a person’s life, retrospectively. For a good general overview of Aristotle’s ethics, see Broadie (1991) and Wolf (2007).

However, the idea that life should be devoted to reasoning follows from Aristotle’s important human function argument (EN I, 5, 6), in which he attempts to show, by analogy, that human beings as such must also have a proper function, just as other things do, such as a pair of scissors (whose proper function is cutting) or a flute player (whose proper function is flute playing), and so forth. If the proper function is performed in a good way, then Aristotle claims that the particular thing has goodness (aretê). For example, if the proper function of a pair of scissors is cutting, then the proper function of a good pair of scissors is cutting well (and likewise in all other cases). Since the proper function of human beings, according to Aristotle, is to reason, the goodness of human beings depends on the good performance of the proper human function, that is, reasoning well. In fact, Aristotle claims that the goodness of human beings consists not in the mere performance of the proper function but rather in their disposition. This claim is substantiated by his example of the good person and the bad person, who cannot be distinguished from each other while asleep if one refers only to their (active) performance. The only way to distinguish them is to refer to their different dispositions. It is a matter of debate whether there is a particular human function as proposed by Aristotle.

All in all, one can distinguish four different lines of reasoning in Aristotle’s ethics: the virtue of the good person (standard interpretation), the idea of an action-oriented virtue ethics, the application of practical wisdom, and the idea of the intrinsic value of virtues. The different approaches are dealt with in order.

The virtue of the good person (EN II, 3, 4): according to Aristotle, an action is good (or right) if a virtuous person would perform that action in a similar situation; an action is bad or wrong (and hence prohibited) if the virtuous person would never perform such an action. Three criteria must be met, according to Aristotle, in order to ensure that an action is virtuous, given that the agent is in a certain condition when he performs it: (i.) the agent must have knowledge of the circumstances of the action (the action must not happen by accident); (ii.) the action must be undertaken out of deliberative choice and done for its own sake; and (iii.) the action must be performed without hesitation, that is, by a person with a firm and stable virtuous character.

The action-oriented virtue ethics (EN II, 6, 1107a10–15): Aristotle’s virtue ethics contains some hints that he not only adheres to the standard interpretation, but also claims that there are some actions that are always morally blameworthy under any circumstances, that is, some actions are intrinsically bad. The fine or the noble and the just require the virtuous person to do or refrain from doing certain things, for example, not to murder (in particular, not to kill one’s parents), not to commit adultery, and not to commit theft. This line of reasoning contains deontological limitations insofar as the virtuous person is no longer the overall standard of evaluation, but the virtuous person herself must meet some ethical criteria in order to fulfil the external demands of, for example, “the noble” and “the just” to act virtuously.

Practical wisdom (EN VI): in some passages in book VI of the Nicomachean Ethics, Aristotle argues that it is our practical wisdom that makes our practical considerations good, both with regard to the good or virtuous life and with regard to our particular goals. He claims that a practically wise person has a special sensitivity or special perceptual skill with which to evaluate a situation in a morally correct or appropriate way. Here, the emphasis lies on the practical wisdom - as the capacity of ethical reasoning and decision-making - rather than on adhering to single ethical virtues, even though Aristotle claims that it is impossible to be practically wise without having ethical virtues and vice versa.

The intrinsic value of the virtues: following the standard interpretation of the role of the ethical virtues with regard to living a good life, Aristotle argues in the Nicomachean Ethics (EN X, 6–9) that these virtues are somewhat less important when it comes to the overall goal, that is, the happiness of living a good life. The primary goal is to live a life devoted to “doing” philosophy and thereby to live a good life; the secondary goal is to live a life among other people, which makes it necessary to adopt the ethical virtues as well.

iii. Epicureanism and Stoicism

Epicurus – educated by the Platonist Pamphilus and highly influenced by the important teachings of Democritus – developed his philosophical school of the Epicureans in controversies with the Cyrenaics and the Stoics, meeting their objections and challenges. The lively exchange of arguments concerning the vital issue of how to live a good life put Epicurus in the position to articulate a refined and sophisticated version of hedonism, which was regarded as superior to that of the rival philosophical school of the Cyrenaics. He claims that sensation is the only standard for measuring good and evil. Epicurus shares with the Cyrenaics the view that all living beings strive for pleasure and try to avoid pain. But, unlike the Cyrenaic school, he argues that happiness consists not only in the bodily pleasure of the very moment but extends over a whole life, and also includes mental pleasure, which is – according to him – preferable to bodily pleasure. In his Letter to Menoeceus, Epicurus comments on flawed views of his ethical position and claims: “For what produces the pleasant life is not continuous drinking and parties or pederasty or womanizing or the enjoyment of fish and the other dishes of an expensive table, but sober reasoning […]” (Epic. Ep. Men. 132, in: Long and Sedley 2011: 114). The ultimate goal in life is not to strive for positive pleasure but to seek the absence of pain. Unlike Aristippus, Epicurus claims, in support of the importance of mental states, that bodily pleasure and pain are limited to the here and now, while the soul is also concerned with the pleasurable and painful states of the past and with prospective pleasure and pain. Thus, sensations based on recollection, hope, and fear with regard to the past and future are much stronger than the bodily pleasure of the moment. Being virtuous is a precondition of tranquillity, that is, peace and freedom from fear, which is closely connected to happiness.
In addition, Epicurus taught that one should free oneself from prejudices, master and restrict one’s desires, live a modest life (for example, a life not devoted to achieving glory and honour), which does not exclude bodily pleasure, and cultivate close friendships, for which the Epicureans were well known (see Diogenes Laertios X, 1; Zeller 1883: 263-267; Erler and Schofield 2007: 642-674; Long and Sedley 2000: §20-§25).

Shortly after the rise of epicureanism, Zeno of Citium – the founder of stoicism – established a new school in Athens. Its members were well known for their cosmopolitanism, that is, the idea that all human beings belong to a single community that should be cultivated (a view quite similar to that of Aristippus); for their self-contained lifestyle and deep concern for friendship; and for their strong adherence to ataraxia, that is, freedom from passions such as pleasure, desire, sorrow, and fear, which jeopardize inner independence. The Stoics were influenced by the teachings of the Cynics. Human beings, according to stoicism, are able to perceive the laws of nature through reason and to act accordingly. The best life is a life according to nature (Zeller 1883: 243). Zeno believed that the most general instinct is the instinct of self-preservation; for each living being, the only thing that is valuable is what conduces to the being’s self-preservation and thereby contributes to the being’s happiness. In the case of rational beings, for example, only what is in accord with reason is valuable; only virtue, which is necessary and sufficient for happiness, is a good. Following the Cynics, the Stoics argue that honour, property, health, and life are not goods and that poverty, disgrace, illness, and death are not evils. Against the Cyrenaics and Epicureans, they hold the view that pleasure is not a good and certainly not the highest good; they agree with Aristotle that pleasure is the consequence of our actions – if they are of the right kind – but not the goal itself. Two main doctrines are of utmost importance in the teachings of stoicism: first, the significance of ataraxia and, second, the idea of doing what nature demands. First, happiness is ataraxia – the freedom from passions – and a self-contained lifestyle.
Second, the idea that one must act in accordance with one’s own nature, in terms of acting virtuously, stands in striking contrast to the other philosophical schools of that time. In addition, the right motive transforms the performance of one’s duty into a virtuous action, completely independently of the outcome of the particular action (an important feature that we find again in Kant’s ethics). Following Socrates and Plato, the Stoics believed that virtue is ethical knowledge and that non-virtuous people simply lack ethical knowledge, since virtue consists in the reasonable condition of the soul, which leads to correct views. The Cynic idea of a sharp distinction between the very few wise people and the many fools, that is, all non-wise people, softened in the course of time. In addition, the Roman philosopher and politician Cicero (106–43 BC) is the first author whose work on the notion of duty survives: De Officiis (44 BC), in which he examines the notion in great detail. It should be noted, however, that the Stoic philosopher Panaitios of Rhodes (180–110 BC) had already published an important book on the notion of duty prior to Cicero. Panaitios’ work is lost, but we know some of its essential ideas through Cicero, who often refers to Panaitios in his De Officiis. Stoicism outlived the other philosophical schools with regard to its ethics, remaining an attractive position for many people and for leading philosophers and politicians such as Seneca (first century AD) and Marcus Aurelius (second century AD) in ancient Rome (see Diogenes Laertios VII, 1; Zeller 1883: 243-253; Inwood and Donini 2007: 675-738; Long and Sedley 2000: §56-§67).

c. Modern Morality

The two main moral theories of modern morality are Kant’s deontological ethics and utilitarianism. Both theories have been adopted and modified by many scholars in recent history in order to make them (more) compatible with the latest demands of ethical reasoning and decision-making, in particular by meeting the objections raised by modern virtue ethics (or neo-Aristotelianism). The following briefly depicts Kantianism in its original form and the main features of utilitarianism.

i. Kantianism

The German philosopher Immanuel Kant is the founder of deontological ethics. His ethics, which he mainly put forth in the Groundwork of the Metaphysics of Morals (1785), the Critique of Practical Reason (1788), and the Metaphysics of Morals (1797), is one of the most prominent and highly respected theories of modernity. Kant’s ethics is deontological in the sense that one has to obey the duties and obligations which derive from his supreme principle of morality, the Categorical Imperative: “Act only according to that maxim whereby you can at the same time will that it should become a universal law” (Kant 1785). The Categorical Imperative is a test for maxims which, in turn, determines whether certain acts have moral worth or not. A maxim is an individual’s subjective principle or rule of the will (in German, das subjektive Prinzip des Wollens), which tells the individual what to do in a given particular situation. If the maxim can be universalized, then it is valid and one must act upon it. A maxim fails to be universalizable in two cases: (i.) the case of logical inconsistency (the example of suicide, which violates a “perfect duty”); and (ii.) the case in which one cannot will the maxim to be universalized (failing to cultivate one’s talents, which violates an “imperfect duty”). Perfect duties are those whose violation is blameworthy (for example, the prohibition of suicide); imperfect duties allow for human desires and hence are not as strong as perfect duties, but they are still morally binding, and people do not attract blame if they fail to fulfil them (for example, by failing to cultivate one’s talents). Kant’s ethics is universal in the sense that the system of moral duties and obligations applies to all rational beings (not only human beings). Morality is based not on interests (as in social contract theories), emotions, intuitions, or conscience, but on reason alone.
This is the reason why Kant’s ethics is not heteronomous – like a divine ethical theory in which God commands what human beings should do (for example, the Bible, the Ten Commandments), or a natural law conception in which nature itself commands what human beings should do by providing them with the faculty of reason, which, in turn, detects what should be done in moral matters – but truly autonomous with regard to rational beings, who make their moral decisions in the light of pure practical reason. Pure practical reason, in determining the moral law or Categorical Imperative, determines what ought to be done without reference to empirical, contingent factors (that is, anthropology in the broad sense of the term, including the empirical sciences; see the preface to the Groundwork) such as one’s own desires or any personal inclinations (in German, Neigungen). Pure practical reason is not limited to the particular nature of human reasoning but is the source and field of universal norms, which stem from a general notion of a rational being as such (see Eisler 2008: 577; Paton 1967; Timmermann 2010; Altman 2011).

ii. Utilitarianism

Historically speaking, Jeremy Bentham, in his Introduction to the Principles of Morals and Legislation (1789), and John Stuart Mill, in Utilitarianism (1863), are the founders of utilitarianism, while Francis Hutcheson (1755) and William Paley (1785) can be regarded as their legitimate predecessors insofar as they pointed out that utility is an important standard of evaluation in ethical reasoning and decision-making. Bentham claims that the duration and intensity of pleasure and pain are of utmost importance and that it is even possible – according to Bentham – to determine the right action by applying a hedonistic calculus which measures the exact utility of the actions. The action with the best hedonistic outcome should be put into practice. His position is called radical quantitative hedonism. Mill, instead, questions the very idea of a hedonistic calculus and argues that one must distinguish between mental and bodily pleasures, giving more weight to mental pleasures. His position is called qualitative hedonism. Mill’s basic formula of utilitarianism is as follows:

The creed which accepts as the foundation of morals, Utility, or the Greatest Happiness Principle, holds that actions are right in proportion as they tend to promote happiness, wrong as they tend to produce the reverse of happiness. By happiness is intended pleasure, and the absence of pain; by unhappiness, pain and the privation of pleasure. (Mill’s Utilitarianism, chapter 2)

There is widespread agreement that there exist numerous different utilitarian theories in modern ethics; hence it would be impossible to provide an adequate depiction of all major strands in this brief subsection. However, the following four aspects are typical of every utilitarian theory. (1.) The consequence principle: Utilitarianism is not about actions but about the consequences of actions. It is a form of consequentialism, which means that the moral worth of a particular action is determined by its outcome. (2.) Happiness: Utilitarianism is a teleological theory insofar as happiness (though not in the ancient sense of the term) is the main goal that should be achieved. This goal can be identified with (i.) the promotion of pleasure, (ii.) the avoidance of pain or harm, (iii.) the fulfilment of desires or considered preferences, or (iv.) the meeting of some objective criteria of well-being. (3.) The Greatest Happiness Principle: Utilitarianism is not about mere happiness but about “the greatest happiness” attainable. Utilitarianism is a single-principle theory that judges the consequences of a given action regarding its utility, which is the general aim of actions. The moral rightness or wrongness of actions depends on the goal of achieving the greatest happiness for the greatest number of sentient beings; in short, “the greatest happiness for the greatest number”. (4.) Maximising: The collective amount of utility for the sentient beings affected by the action should be maximised. This line of reasoning contains strong altruistic claims because, roughly speaking, one should choose only those actions which improve other sentient beings’ happiness.

Furthermore, one major methodological distinction should be mentioned briefly, since it divides all utilitarian theories into two groups according to whether the principle of utility is applied to actions or to rules. In act utilitarianism (or direct utilitarianism) the principle of utility is applied to the particular action; in this case, one asks whether the action in question is morally right or wrong in this particular situation. In rule utilitarianism (or indirect utilitarianism), instead, the principle of utility is applied only to rules which, in turn, are applied to particular actions and serve as guidelines for human behaviour in order to guarantee the greatest happiness for the greatest number. Here, the vital question is whether a specific rule maximises the general utility or not. Sometimes rule utilitarianism maximises the general utility to a lesser degree than act utilitarianism would. For example, one should act according to the general rule that one should keep one’s promises, which – in the long run – maximises the general utility (rule utilitarianism). In some cases, however, it would be better to adhere to act utilitarianism, since it maximises the general utility to a higher degree, depending on the particular situation and circumstances of the case in question (act utilitarianism).

d. The Upshot

The depiction of the ethical views of some important philosophical schools in Antiquity, together with their interrelations, and the outline of the two leading moral theories of modern morality show that there is – despite the systematic difference concerning the importance of the question of the good life – a significant overlap in important lines of reasoning. In addition, the supposed distinction between ancient ethics and modern morality contains many misleading claims. Socrates can be seen as the initial spark for a broad variety of virtue ethical approaches such as cynicism, the teachings of the Cyrenaics, Aristotelianism, epicureanism, and stoicism. All of these philosophical schools were concerned with the vital questions of how to live a good life and how to achieve happiness by pointing out what the appropriate actions were. The brief outline of the different philosophical schools in Antiquity supports this view. Modern morality is different in that its focus is on the basic question of how one should act; the ancient question of how one should live is secondary. However, modern morality – in particular Kantianism and utilitarianism – did not start from scratch but had some important and highly influential ancient predecessors. For example, the Kantian idea of doing the right thing because reason dictates it has its roots in stoicism (see Cooper 1998; Schneewind 1998), and the utilitarian idea of living a happy life according to pleasure has its roots in the teachings of the Cyrenaics (for example, Bentham 1789) and the Epicureans (for example, Mill 1863). The history of ideas conveyed important ethical insights from Antiquity to modernity. The idea that there is a clear and easy distinction between ancient (virtue) ethics and modern moral theories is premature and misleading.
Indeed, there are some important differences, but one must acknowledge the simple fact that there is no unity or broad consensus among ancient virtue ethicists concerning the question of how to live a good life and which actions should count as virtuous. Hence, there is no “ancient ethics” as such but many important and diverse virtue ethical approaches, which have more or less in common with “modern morality”.

In addition, modern morality, in particular contemporary morality, is characterized by the fact that quite a few important scholars elaborated modern versions of Aristotle’s classical virtue ethics in the twentieth century. These scholars argue that virtue ethics was quite successful in solving ethical problems in Antiquity, and they believe that adhering to a refined version of virtue ethics is not only useful but superior in solving our modern moral problems. Among the most important neo-Aristotelian scholars are Anscombe (1958), Foot (1978, 2001), Hursthouse (1999), MacIntyre (1981), Nussbaum (1992, 1993, 1995), Slote (2001), Swanton (2003), and Williams (1985), who claim that the traditional ethical theories, such as deontological ethics (Kantianism) and consequentialism (utilitarianism), are doomed to failure. In general, they adhere to at least two main hypotheses: (i.) people in Antiquity already employed a very efficient way of ethical reasoning and decision-making; and (ii.) this particular way was lost in modernity without having been properly replaced. Hence, one should overcome the deficient modern ethical theories and again adhere to virtue ethics as a viable alternative, without, of course, abandoning the existing ethical developments (see Bayertz 2005: 115).

The following section depicts the old but still persisting stereotypical differences between ancient ethics and modern morality in order to further deepen our understanding about the supposed and real differences and similarities of both ethical approaches.

2. The Table of Ancient Ethics and Modern Morality – A Comparison

This self-explanatory table presents a simple but instructive comparison of the defining features of the stereotypes of ancient ethics and modern morality (for a similar table see Bayertz 2005: 117).

No. | Criteria | Ancient Ethics | Modern Morality
1. | Basic question | What is the good life? What is happiness and human flourishing? | What should one/I do? The question of the good life plays, at best, a subordinate role.
2. | Object of concern | Self-centred: the person’s own interests dominate. | Other-related: the interests of other people are most central.
3. | What is most important? | Pursuit of goals: personal perfection, personal projects, and personal relationships. | Universal moral obligations and rules: individuals should seek impartiality (and hence alienate themselves from their own personal projects).
4. | What is examined? | The agent: most important are the acting person and his or her character (agent-centred ethics). | Actions and consequences: most important is the correctness of the action and its consequences (action- and consequence-centred ethics).
5. | Central notions | Virtues: aretaic notions, for example good, excellence, virtue (aretaic language). | Norms: prescriptive notions concerning rules, duties, and obligations, for example must, should (deontic language).
6. | How is rationality seen? | As a capacity for context-sensitive insight and decision-making. | “Mainly” as the capacity to (rationally) deduce inferences from abstract propositions.
7. | The goals of human actions | The goals of human actions are objective (notions of happiness: for example thinking, pleasure). | The goals of human actions are individually defined by people (subjectivism). No God, no nature.
8. | Scope of morality | Adult male citizens with full citizenship. | Men, women, children, animals, the environment.
9. | Individual and community | The individual is in unity with the community (harmony). | The individual and the community are rather disconnected from each other.

Table 1: Ancient Ethics and Modern Morality

3. Ancient Ethics and Modern Morality – The Main Differences

a. The Good Life versus the Good Action

The most common stereotype with regard to ancient ethics and modern morality is that ancient ethics is only about the question “What is the good life?” and that modern moral theories deal only with the question “What should one do?” or “How should one act?”. Many stereotypes certainly contain some truth, but there is almost always room for a better understanding of the differences and similarities at issue. To be more precise, it is true that ancient ethics concerns the vital question of how to live a good life and become a virtuous person by acting in accordance with the ethical virtues. However, the idea that virtue ethics does not deal with actions, and hence is unable to provide concrete answers to ethical problems, is premature; it is not only modern moral theories that deal with actions (see Hursthouse 1999, chapters 1-3; Slote 2001, chapter 1; Swanton 2003, chapter 11). An ethical virtue, according to Aristotle, must be completely internalized by its agent through many actions of the same type, so that the person acquires a firm disposition. In other words, a brave person who has the virtue of courage has to perform many brave actions in the area of fear and confidence in order to acquire a brave disposition. Performing the appropriate actions is the only way one can do this. Indeed, modern moral theories are focused rather on the question of what one should do in a particular situation, and usually ethicists do not pay much attention to the question of living a good life. Ancient ethicists, instead, believe that the two issues cannot be separated.

A related issue that seems to support the initial idea strongly concerns the claim that, on the one hand, ancient ethics is self-centred because it focuses only on the agent’s interest in living a good life and becoming a virtuous person and, on the other hand, that modern morality is other-regarding because it focuses only on the interests of other people. Broadly speaking, ancient ethics is egoistic and modern morality is altruistic. In virtue ethics, the interests of other people enter the stage by being incorporated into the person’s own interest in becoming virtuous and living a good life. In her article Ancient Ethics and Modern Morality, Annas examines this point in more detail and claims that “the confusion comes from the thought that if the good of others is introduced into the agent’s own final good, it cannot really be the good of others, but must in some way be reduced to what matters to the agent”. She points out that the confusion might be that “the good of others must matter to me because it is the good of others, not because it is part of my own good” (Annas 1992: 131). Annas thinks that this is compatible with the overall final good of the virtuous person, since the good of others matters to the virtuous person not because it is part of the agent’s own good but because it is the good of others.

Other people, however, might claim that the difference is between “morality” and “legality”, to use a Kantian distinction. In this context, legality means simply fulfilling the moral claims that other people have; morality means fulfilling the moral claims that other people have and, in addition, having the right motive in doing so, that is, acting out of “the good will” – out of a sense of moral obligation or duty. Translated into “ancient” language, the virtuous person should consider other people’s interests not because she feels indifferent to them or because their interests are only instrumentally useful to her as an agent, but because the virtuous person wholeheartedly believes, feels, and acknowledges that other people’s interests are important in their own right. Another example is Aristotle, who believes that the good person is living a good life if and only if she devotes her life to “philosophy” and, secondarily, lives a social life among other people. The latter requires the use of the ethical virtues, which are by nature other-regarding; the former does not (see Aristotle EN X, 6–9), even though, according to Aristotle, one cannot be a practically wise person without being virtuous, and vice versa. Both concepts are mutually dependent (EN VI).

One might claim that self-interest and the interests of other people do not stand in contrast to each other in ancient ethics but converge through adherence to an objective idea of the good (see Bayertz 2005). The line between moral questions that concern the interests of other people and ethical questions that concern the well-being of the particular agent is thus blurred beyond recognition. In modern morality, however, there is a clear difference, because the question of the good life is secondary and systematically unimportant to the question of how one should act in a particular situation. Modern moral theories are rather subjective in character and hence lack the strong commitments of virtue ethical theories concerning an objective basis, as well as their claims regarding elitism and the devaluation of moral common sense. The upshot is that there is a systematic difference between ancient ethics and modern morality concerning the way in which moral problems are solved, but the idea that ancient ethics is egoistic and does not appeal to actions is premature and simply wrong.

b. The Moral Ought

Anscombe points out in her classic paper Modern Moral Philosophy (1958) that modern morality is doomed to failure because it focuses only on the analysis of language and notions and, in particular, adheres to the fallacious idea of moral duty. She argues that the ideas of moral duty and the moral ought used in deontological ethics originally come from religious reasoning and theological ethics, where God was the ultimate source of morality and people had to obey God’s commands. There, the ideas of a moral duty and a moral ought were appropriate. In secular ethics, however, there is no general consent to the idea of a moral duty that is universally binding on all rational people. The idea of a moral duty, according to Anscombe, should be replaced by the notion of virtue. Furthermore, Schopenhauer convincingly claims in his book On the Basis of Morality that even in the case of religious ethics there is no categorical moral duty, since people obey God’s moral rules simply because they do not want to be punished should they fail to act accordingly. But this means that the moral duty is hypothetical rather than categorical. It is commonly said that in ancient ethics there is no moral duty and no moral ought, simply because the Greeks and Romans lacked those particular notions. However, from the bare fact that they lacked the notions of moral duty and moral ought, one cannot conclude that they also lacked the corresponding phenomena (Bayertz 2005: 122). In addition, one might claim that this point still misses the general significance of using such notions as main ethical key terms, which reflects a particular way of ethical reasoning and decision-making. Whether there is something like a ‘moral ought’ in ancient virtue ethics that is comparable to deontological ethics will be briefly examined below by focusing on Aristotle’s ethics.

c. Can a Virtuous Person Act in a Non-Virtuous Way?

According to ancient ethics, a completely virtuous person, who is the bearer of all the ethical virtues, is unable to act in a non-virtuous way. If a person bears one virtue, he thereby bears all the other virtues as well (the thesis of the unity of the virtues). The practically wise person – according to ancient ethicists – will always act in accordance with the ethical virtues. In other words, the virtuous person is always master of her emotions and will never be overwhelmed by them in a way that might lead her to act non-virtuously. Generally speaking, this is a quite demanding line of argumentation, since it can be the case, at least according to our modern way of thinking, that a brave person who has the virtue of courage might not be able to show the virtue of liberality. Moreover, even if one acknowledges that person A is a virtuous person, one might not be convinced that this person will never act in a non-virtuous way. This particular problem has to do with the famous hypothesis of ‘the unity of the virtues’ (for a recent contribution to this problem, see Russell 2009). In modern morality, utilitarianism, for example, convincingly distinguishes between the evaluation of a person’s character and the evaluation of his or her actions. It can easily be the case, according to utilitarianism, that a morally bad person performs a morally right action or that a morally good person performs a morally wrong action. This distinction is impossible to draw for proponents of (classic) virtue ethics, because an ethically right action always presupposes that the person has an ethically good character.

4. Special Problem: Kant and Aristotle – Moral Duty and For the Sake of the Noble

It is widely agreed among philosophers that Kant’s deontological ethics and Aristotle’s virtue ethics can easily be distinguished by acknowledging the simple fact that Kant is concerned with acting from duty, or on the moral principle, or because one thinks it is morally right, while Aristotle’s approach completely lacks this particular idea of moral motivation; hence, it would be unsound to claim that the virtuous person is morally obligated to act in a way similar to the Kantian agent. In other words, there is no such thing as acting from a sense of duty in virtue ethics. This common view has been challenged by neo-Aristotelians (for example, Hursthouse 2010) who claim not only that there is a strong notion of moral motivation in Aristotle’s approach, but also that the virtuous person is better equipped to meet the demands of acting from a sense of duty than the Kantian moral agent. The following sketches the main line of reasoning (see also Engstrom and Whiting 1998; Jost and Wuerth 2011).

Hursthouse claims in her book On Virtue Ethics that “there is a growing enthusiasm for the idea that the ideal Kantian agent, the person with a good will, who acts ‘from a sense of duty’, and the ideal neo-Aristotelian agent, who acts from virtue – from a settled state of character – are not as different as they were once supposed to be” (2010: 140). Her view is supported by important works of Hudson (1990), Audi (1995), and Baron (1995), and the point has also been acknowledged by neo-Kantian philosophers such as Korsgaard (1998) and Herman (1998). In this respect, it reflects a lack of awareness of current developments in virtue ethics and neo-Kantianism if one still upholds the claim, proposed for hundreds of years, of a clear distinction between ancient ethics and modern morality, in particular concerning Aristotle and Kant. A related issue, the question of whether there is a fundamental distinction between aretaic and deontic terms, has been critically discussed by Gryz (2011), who argues against Stocker’s (1973) claim that “good” and “right” mean the same thing. Gryz is convinced that even if the two groups of terms converge (as closely as possible), either an unbridgeable gap will remain or, if one attempts to define one group of terms by the other, something will be left behind that cannot be explained by the second group. This contemporary debate shows that there is still no common view on the relationship between ancient ethics and modern morality.

Kant claims in the Groundwork that the morally motivated agent acts from good will. In more detail, to act from duty, or to act because one thinks that it is morally right, is to perform an action because one thinks that its maxim has the form of a law (Korsgaard 1998: 218). For example, if a person is in need, the Kantian agent does the right action not because – as Korsgaard claims – it is her purpose simply to do her duty, but because she chooses the action for its own sake; that is, her purpose is to help (Korsgaard 1998: 207).

Even if the ancient Greeks lacked particular notions that can be translated as moral ought, duty, right, and principle (for example, Gryz 2011; Hursthouse 2010), it nonetheless seems correct to claim that the idea of doing the right thing because it is right, or because one is required to do it, is also a well-known phenomenon in classic virtue ethics in general and in Aristotle and stoicism in particular. There are quite a few passages in the Nicomachean Ethics in which Aristotle clearly claims that morally good actions are done for their own sake or because they are the morally right thing to do:

Now excellent actions are noble and done for the sake of the noble. (EN IV, 2, 1120a23–24)

Now the brave man is as dauntless as man may be. Therefore, while he will fear even the things that are not beyond human strength, he will fear them as he ought and as reason directs, and he will face them for the sake of what is noble; for this is the end of excellence. (EN III, 10, 1115b10–13)

The standard of all things is the good and the good man; he is striving for the good with all his soul and does the good for the sake of the intellectual element in him. (EN IX, 4, 1166a10–20)

The good man acts for the sake of the noble. (EN IX, 8, 1168a33-35)

For the wicked man, what he does clashes with what he ought to do, but what the good man ought to do he does; for the intellect always chooses what is best for itself, and the good man obeys his intellect. (EN IX, 8, 1169a15–18)

If the virtuous person acts because she thinks that it is the right thing to do, because she acts for the sake of the noble without any inclination other than to do good for the sake of the noble, then she is comparable with the Kantian moral agent. For example, according to Aristotle the noble is “that which is both desirable for its own sake and also worthy of praise” (Rhetoric I, 9, 1366a33); and in 1366b38–67a5 he holds the view that nobility is exhibited in actions “that benefit others rather than the agent, and actions whose advantages will only appear after the agent’s death, since in these cases we can be sure the agent himself gets nothing out of it” (Korsgaard 1998: 217). Hence it follows that the virtuous person will not be able to act in a non-virtuous way, because she acts from a strong inner moral obligation to do the morally right thing; it is the very nature of the virtuous person to act virtuously. The Kantian agent, by contrast, sometimes acts according to the universal law and hence performs a morally right action, and on other occasions fails to do so, because he or she has no stable and firm disposition always to act in accordance with the universal law. That is the very reason why the Aristotelian virtuous person can be seen as an agent who not only acts from duty in the sense of doing the right thing because it is right, but also constantly perceives and adheres to the moral duty to act virtuously.

5. Conclusion

The upshot is, however, that the vital question of how to live a good life cannot be separated from the essential question of how one should act. Conceptually and phenomenologically, both questions are intimately interwoven and a complete ethical theory will always be concerned with both issues, independently of whether the theory is of ancient or modern origin.

6. References and Further Reading

  • Altman, M. 2011. Kant and Applied Ethics: The Uses and Limits of Kant's Practical Philosophy. Oxford: Wiley-Blackwell.
  • Annas, J. 1992. “Ancient Ethics and Modern Morality.” Philosophical Perspectives 6: 119-136.
  • Anscombe, G.E.M. 1958. “Modern Moral Philosophy.” Philosophy 33 (124): 1-19.
  • Apelt, O, trans. 1998. Diogenes Laertius: Leben und Meinungen berühmter Philosophen. Hamburg: Meiner.
  • Aristotle. 1985. “Nicomachean Ethics.” Translated by W.D. Ross. In Complete Works of Aristotle, edited by J. Barnes, 1729-1867. Princeton: Princeton University Press.
  • Aristotle. 2007. Rhetoric. Translated by George A. Kennedy. New York: Oxford University Press.
  • Audi, R. 1995. “Acting from Virtue.” Mind 104 (415): 449-471.
  • Baron, M. 1995. Kantian Ethics Almost Without Apology. New York: Cornell University Press.
  • Bayertz, K. 2005. “Antike und moderne Ethik.” Zeitschrift für philosophische Forschung 59 (1): 114-132.
  • Bentham, J. 1962. “The Principles of Morality and Legislation” [1798]. In The Works of Jeremy Bentham, edited by J. Bowring, 1-154. New York: Russell and Russell.
  • Broadie, S. 1991. Ethics with Aristotle. Oxford: Oxford University Press.
  • Cicero, M.T. 2001. On Obligations. Translated by P.G. Walsh. Oxford, New York: Oxford University Press.
  • Cooper, J.M. 1998. “Eudaimonism, the Appeal to Nature, and ‘Moral Duty’ in Stoicism.” In Aristotle, Kant, and the Stoics: Rethinking Happiness and Duty, edited by S. Engstrom and J. Whiting, 261-284. Cambridge, New York: Cambridge University Press.
  • Döring, K. 1988. Der Sokratesschüler Aristipp und die Kyrenaiker. Stuttgart, Wiesbaden: Franz Steiner.
  • Dworkin, R. 1984. Rights as Trumps. In Theories of Rights, edited by J. Waldron, 152-167. Oxford: Oxford University Press.
  • Eisler, R. 2008. Kant Lexikon: Nachschlagewerk Zu Kants Sämtlichen Schriften, Briefen Und Handschriftlichem Nachlaß. Hildesheim: Weidmannsche Verlagsbuchhandlung.
  • Engstrom, S. and J. Whiting, eds. 1998. Aristotle, Kant, and the Stoics: Rethinking Happiness and Duty. Cambridge: Cambridge University Press.
  • Erler, M. and M. Schofield. 2007. “Epicurean Ethics.” In The Cambridge History of Hellenistic Philosophy, edited by K. Algra, J. Barnes, J. Mansfield and M. Schofield, 642-669. Cambridge: Cambridge University Press.
  • Foot, P. 1978. Virtues and Vices and other Essays in Moral Philosophy. Berkeley: University of California Press.
  • Foot, P. 2001. Natural Goodness. Oxford: Clarendon Press.
  • Gryz, J. 2011. “On the Relationship Between the Aretaic and the Deontic.” Ethical Theory and Moral Practice 14 (5): 493–501.
  • Herman, B. 1998. “Making Room for Character.” In Aristotle, Kant, and the Stoics: Rethinking Happiness and Duty, edited by S. Engstrom and J. Whiting, 36-60. Cambridge: Cambridge University Press.
  • Hudson, S. 1990. “What is Morality All About?” Philosophia: Philosophical Quarterly of Israel 20 (1-2): 3-13.
  • Hursthouse, R. 1991. “Virtue Theory and Abortion.” Philosophy and Public Affairs 20 (3): 223-246.
  • Hursthouse, R. 2010. On Virtue Ethics. Oxford: Oxford University Press.
  • Hursthouse, R. 2007. “Environmental Virtue Ethics.” In Working Virtue: Virtue Ethics and Contemporary Moral Problems, edited by R.L. Walker and P.J. Ivanhoe, 155-171. New York: Oxford University Press.
  • Hutcheson, F. 2005. A System of Moral Philosophy, in Two Books [1755]. London: Continuum International Publishing Group.
  • Inwood, B. and P. Donini. 2007. “Stoic Ethics.” In The Cambridge History of Hellenistic Philosophy, edited by K. Algra, J. Barnes, J. Mansfield and M. Schofield, 675-736. Cambridge:  Cambridge University Press.
  • Jost, L. and J. Wuerth. 2011. Perfecting Virtue: New Essays on Kantian Ethics and Virtue Ethics. Cambridge: Cambridge University Press.
  • Kant, I. 1991. Metaphysics of Morals [1797]. Translated by M. Gregor. Cambridge: Cambridge University Press.
  • Kant, I. 1997. Critique of Practical Reason [1788]. Translated by M. Gregor. Cambridge: Cambridge University Press.
  • Kant, I. 2005. Groundwork for the Metaphysics of Morals [1785]. Translated by A.W. Wood. New Haven CT:  Yale University Press.
  • Korsgaard, C.M. 1996. Sources of Normativity. Cambridge: Cambridge University Press.
  • Korsgaard, C.M. 1998. “From Duty and for the Sake of the Noble: Kant and Aristotle on Morally Good Action.” In Aristotle, Kant, and the Stoics: Rethinking Happiness and Duty, edited by S. Engstrom and J. Whiting, 203-236. Cambridge: Cambridge University Press.
  • Long, A.A. 2007. “The Socratic Legacy.” In The Cambridge History of Hellenistic Philosophy, edited by K. Algra, J. Barnes, J. Mansfield and M. Schofield, 617-639. Cambridge: Cambridge University Press.
  • Long, A.A and D.N. Sedley. 2000. The Hellenistic Philosophers, Volume 2: Greek and Latin Texts with Notes and Bibliography. Cambridge: Cambridge University Press.
  • Long, A.A. and D.N. Sedley. 2011. The Hellenistic Philosophers, Volume 1: Translations of the Principal Sources, with Philosophical Commentary. Cambridge: Cambridge University Press.
  • MacIntyre, A. 1981.  After Virtue. London: Duckworth.
  • Mill, J.S. 1998. Utilitarianism [1863]. Edited by R. Crisp. Oxford: Oxford University Press.
  • Nussbaum, M.C. 1992. “Human Functioning and Social Justice. In Defense of Aristotelian Essentialism.” Political Theory 20 (2): 202-246.
  • Nussbaum, M. 1993. “Non-Relative Virtues: An Aristotelian Approach.” In The Quality of Life, edited by M.C. Nussbaum and A. Sen, (1-6). New York:  Oxford University Press.
  • Nussbaum, M. 1995. “Aristotle on Human Nature and the Foundations of Ethics.” In World, Mind, and Ethics: Essays on the Ethical Philosophy of Bernard Williams, edited by J.E.J. Altham and R. Harrison, 86-131. Cambridge, New York: Cambridge University Press.
  • Paley, W. 2002. The Principles of Moral and Political Philosophy [1785]. Indianapolis: Liberty Fund.
  • Paton, H. J. 1967. The Categorical Imperative: A Study in Kant’s Moral Philosophy. London: Hutchinson & co.
  • Regan, T. 1985.  The Case for Animal Rights. Berkeley: University of California Press.
  • Russell, D.C. 2009. Practical Intelligence and the Virtues. Oxford: Clarendon Press.
  • Schneewind, J.B. 1998. “Kant and Stoic Ethics.” In Aristotle, Kant, and the Stoics: Rethinking Happiness and Duty, edited by S. Engstrom and J. Whiting, 285-302. Cambridge, New York: Cambridge University Press.
  • Schopenhauer, A. 1995. On the Basis of Morality [1841]. Translated by E.F.J. Payne. Providence: Berghahn Books.
  • Slote, M. 2001. Morals from Motives. Oxford: Oxford University Press.
  • Steinfath, H. 2000. “Ethik und Moral.” In Vorlesung: Typen ethischer Theorien, 1-21 (unpublished).
  • Stocker, M. 1973. “Rightness and Goodness: Is There a Difference?” American Philosophical Quarterly 10 (2): 87–98.
  • Swanton, C. 2003. Virtue Ethics: A Pluralistic View. Oxford: Oxford University Press.
  • Timmermann, Jens. 2010. Kant’s Groundwork of the Metaphysics of Morals. Cambridge: Cambridge University Press.
  • Urstad, K. 2009. “Pathos, Pleasure and the Ethical Life in Aristippus.” Journal of Ancient Philosophy III (1): 1-22.
  • Williams, B. 1985. Ethics and the Limits of Philosophy. Cambridge MA: Harvard University Press.
  • Wolf, Ursula. 2002. Aristoteles’ “Nikomachische Ethik”. Darmstadt: Wissenschaftliche Buchgesellschaft.
  • Zeller, E. 1883. Geschichte der griechischen Philosophie. Stuttgart: Magnus.


Author Information

John-Stewart Gordon
University of Cologne, Germany
Vytautas Magnus University Kaunas, Lithuania

Time Supplement

This supplement answers a series of questions designed to reveal more about what science requires of physical time, and to provide background information about other topics discussed in the Time article.

Table of Contents

  1. What are Instants and Durations?
  2. What is an Event?
  3. What is a Reference Frame?
  4. What is an Inertial Frame?
  5. What is Spacetime?
  6. What is a Minkowski Spacetime Diagram?
  7. What is Time's Metric and Spacetime's Interval?
  8. Does the Theory of Relativity Imply Time is Partly Space?
  9. Is Time the Fourth Dimension?
  10. Is There More Than One Kind of Physical Time?
  11. How is Time Relative to the Observer?
  12. What is the Relativity of Simultaneity?
  13. What is the Conventionality of Simultaneity?
  14. What is the Difference between the Past and the Absolute Past?
  15. What is Time Dilation?
  16. How does Gravity Affect Time?
  17. What Happens to Time at a Black Hole?
  18. What is the Solution to the Twin Paradox?
  19. What is the Solution to Zeno's Paradoxes?
  20. How do Time Coordinates Get Assigned to Points of Spacetime?
  21. How do Dates Get Assigned to Actual Events?
  22. What is Essential to Being a Clock?
  23. What does It Mean for a Clock to be Accurate?
  24. What is Our Standard Clock?
  25. Why are Some Standard Clocks Better than Others?
  26. What is a Field?

1. What Are Instants and Durations?

A duration is a measure of elapsed time. It is a number with temporal units such as years or seconds. The duration of Earth's existence is about five billion years; the duration of a flash of lightning is about 0.0002 seconds. The second is the agreed upon standard unit for the measurement of duration in the S.I. system (the International System of Units, that is, Le Système International d'Unités). In informal conversation, an instant is a very short duration. In physics, however, an instant is even shorter. It is instantaneous; it has zero duration.

There is another sense of the word "instant" which means, not duration, but a time, as when we say it happened at that instant. Midnight could be such an instant.

It is assumed in physics that a real event is always a linear continuum of the instants or times that compose the event, but it is an interesting philosophical question to ask how physicists know it is a continuum. Nobody could ever measure time that finely.

A brief comment on the terms: "segment," "interval," and "period." We correctly speak of a segment of a line, but of an interval of numbers, and not a segment of numbers. Regarding time, there is no standard terminology about whether to say interval of time or period of time, although the latter is more popular. The measure of a period of time is called a "duration." The term "interval" in the phrase "spacetime interval" is a different kind of interval.

2. What Is an Event?

In ordinary discourse, an event is a happening lasting a finite duration during which some object changes its properties. For example, this morning’s event of buttering the toast is the toast’s changing from having the property of being unbuttered to having the property of being buttered.

The philosopher Jaegwon Kim suggested that an event should be defined as an object’s having a property at a time. So, two events are the same if they are both events of the same object having the same property at the same time. This suggestion makes it difficult to make sense of the remark, “The vacation could have started an hour earlier.” On Kim’s analysis, the vacation event could not have started earlier because, if it did, it would be a different event. A possible-worlds analysis of events might be the way to solve this problem of change, but that solution will not be explored here.
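Kim's criterion lends itself to a small data-structure sketch: treat an event as an (object, property, time) triple and let identity be component-wise equality. A minimal illustration in Python (the class name and sample values are ours, not Kim's):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KimEvent:
    """An event in Jaegwon Kim's sense: an object's having a property at a time."""
    obj: str
    prop: str
    time: str

# The vacation's starting at 9:00 ...
vacation = KimEvent("the vacation", "starting", "9:00")
# ... and the "same" vacation starting an hour earlier.
earlier = KimEvent("the vacation", "starting", "8:00")

# On Kim's criterion these are two different events, which is exactly
# the difficulty about "the vacation could have started earlier".
print(vacation == earlier)  # False
```

The frozen dataclass makes the identity condition explicit: two events are the same just in case all three components match.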

Physicists adopt the idealization that a basic event is a point event (or point-event): a single property (value of a variable such as the strength of a magnetic field) at a point in time and at a point in space. The point in space is specified relative to a reference frame. The significance of the physicists' idealization is that in ordinary discourse and in most philosophical discussions, an event must involve a change in some property; the physicist’s event does not have this requirement about change. A physicist's event might be that an electron is there at that point in space at that point in time. Multiple basic events might occur at a single location, such as the values not only of the electromagnetic field but also the quark field. Your trip to the supermarket to buy carrots is, in principle, analyzable as a collection of a great many point events.

A mathematical space is a collection of points, and the points might represent anything, for example, a sales price in dollars, the name of a salesperson, and the serial number of the item sold; the point would then be an ordered triple of the three. But the points of a real space, that is, a physical space, can only be locations, that is, places. For example, the large place called “New York City” at one time is composed of the actual point locations which occur within the city’s boundary at that time. The method for dealing with vague boundaries will not be discussed here.

The physicists’ notion of point event is metaphysically unacceptable to many philosophers, in part because it deviates so much from the way “event” is used in ordinary language. In 1936, in order to avoid point events, Bertrand Russell and A. N. Whitehead developed a theory of time based on the assumption that all events in spacetime have a finite, non-zero duration. Unfortunately, they had to assume that any finite part of an event is an event, and this assumption is no closer to common sense than the physicist’s assumption that all events are composed of point events. This encyclopedia article on Zeno’s Paradoxes mentions that Michael Dummett and Frank Arntzenius have continued in the 21st century to develop Russell’s and Whitehead’s idea that any event must have a non-zero duration.

McTaggart argued early in the twentieth century that events change. For example, he said the event of Queen Anne’s death is changing because it is receding ever farther into the past as time goes on. Many other philosophers (those of the so-called B-camp) believe it is improper to consider an event to be something that can change, and that the error is in not using the word "change" properly. This is still an open question in philosophy, but physicists generally use the term "event" as something that does not change.

For a more detailed discussion of what an event is, see the article on Events.

3. What Is a Reference Frame?

A reference frame for a space [whether it is a physical space, a space-time, or a twelve-dimensional abstract mathematical space] is a standard point of view or a perspective that is usually intended to be usable for making quantitative measurements and judgments about places in the space and the phenomena that take place there. To be suited for these purposes, a reference frame needs to be specified by (or augmented by--the experts are ambiguous on this point) choosing a coordinate system and specifying its origin and orientation in the space. Depending on the space, the objects might be physical objects or spacetime events or something else.

You are not in all reference frames. You are not in the reference frame of the two-dimensional abstract space that maps the price of rice in China against the years in the 19th century. Let's focus now on reference frames for space and spacetime. The frame for the physical space in which you have zero velocity is called your "proper frame." There are an infinite number of legitimate choices for the reference frame, but choosing a good reference frame can make a situation much easier to describe. For example, if you are trying to describe the motion of a car down a fairly straight highway, you would not want to choose a reference frame that is fixed to a spinning carousel that is beside the highway. Instead, choose a reference frame fixed to the highway or else fixed to the car. The reference frame attached to the carousel would not be incorrect, just very inconvenient.

A coordinate system is a continuous labeling of the point-parts of objects with numbers, usually with real numbers. For our three-dimensional space, often the simplest coordinate system is to specify that there are three mutually perpendicular space axes x, y, and z intersecting at a certain place, called the origin, and that all the axes have a certain orientation relative to some direction or physical object. For example, we might orient the z axis by saying it points up, while the x axis points forward along the highway, and the y axis is perpendicular to the other two axes and points across the highway.

If we are dealing with spacetime rather than merely space, then we would add a t axis, and say t=0 when a certain famous event occurs such as the firing of a starter pistol, and we would orient the t axis by saying the value of t along the axis is measured by our civilization's standard clock. There are two important constraints on the choice of coordinate system for a reference frame involving time. We want to ensure that simultaneous events are assigned the same time coordinate, and we want to ensure that nearby events are assigned nearby time coordinates. Treating time as a special space-like coordinate is called "spatializing time," and doing this is what makes time precisely describable in a way that treating time as "becoming" does not.

A reference frame for space is often specified by selecting a solid object that does not normally change its size and by saying that the reference frame is fixed to the object—which is equivalent to saying the object is stationary. We might select a reference frame fixed to the Rock of Gibraltar, and say the rock is at the origin where the axes cross each other at <0,0,0>. Another object is said to be at rest in this reference frame if it remains at a constant distance in a fixed direction from the Rock of Gibraltar. When we say the Sun rose this morning, we are implicitly choosing a reference frame fixed to the Earth’s surface. The Sun is not at rest in this reference frame, but the Earth is.

The reference frame will specify locations, and this is normally done by choosing a coordinate system that spans the space (equivalently, is global because it assigns coordinate numbers to all points of the space). If we are applying the reference frame only to a region of space, then we can be happy with assigning coordinate numbers only to all points in the region.

In a three-dimensional space, the analyst will specify four distinct points, one for the origin, and the other three for defining three independent, perpendicular axes, which in the familiar Cartesian coordinate system will be the x, y and z axes. Two point-objects in this system are at the same place if they have the same x-value, the same y-value and the same z-value. To keep track of four-dimensional events rather than simply three-dimensional objects, the analyst will need to add a time axis, expanding the three-dimensional mathematical space to a four-dimensional one. There, two point events are identical if they occur at the same place and also at the same time. We won't discuss polar or hyperspherical or other coordinate systems.

The coordinates could be finite strings of letters of the alphabet instead of real numbers, but real numbers are usually the best choice if we want to use coordinates for measurement, not just for naming points. An arbitrary name such as "bmaaarirc" for a specific point would contain no information about where one point is located in relation to another, nor how far away one point is from another. A better naming operation can provide this information. Regardless of the number of dimensions, if we want to do measurement, then we should require of our reference frame that nearby points be named with nearby tuples of numbers, one number for each of the dimensions, and that any continuous change along a path between two points be reflected in a continuous change of the coordinates of those points. That is why we require that the names of points be real numbers—more specifically an ordered n-tuple of real numbers for a space of n dimensions. Often we prefer rectilinear coordinates to curvilinear coordinates, and we prefer that the coordinate lines along one dimension be straight and at right angles to coordinate lines along all other dimensions so that we can make use of the Pythagorean Theorem in computations of distances; thus we prefer the familiar Euclidean coordinate system.
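The point about measurement-friendly names can be made concrete: once points are named by n-tuples of real numbers in a rectilinear coordinate system, the Pythagorean Theorem turns the coordinate names themselves into distances. A small sketch (the function name is ours):

```python
import math

def euclidean_distance(p, q):
    """Distance between two points named by n-tuples of real-number
    coordinates, via the Pythagorean Theorem generalized to n dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Nearby points have nearby coordinate names, and the names alone
# suffice to compute how far apart the points are.
print(euclidean_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))  # 5.0
```

An arbitrary string name like "bmaaarirc" supports no such computation; the real-number naming carries the metric information with it.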

Both classical theory, such as Newton's mechanics, and relativity theory assume the set of all events forms a four-dimensional manifold, which means that events can be specified with four independent, real numbers (rather than, say, integers) and that infinitesimally small regions of events can be covered by a four-dimensional Cartesian coordinate system. A real space obeying general relativity will be curved, and so cannot have a Cartesian coordinate system except infinitesimally. If you detect somewhere that your triangles are odd because the sum of their interior angles is not exactly 180 degrees, then you know your space curves there.

But we can use a curved coordinate system on curved spacetime. The sphere's surface is a two-dimensional space that requires us to select a curvilinear coordinate system in which the axes curve [as viewed from a higher dimensional space]. To cover all of curved spacetime, we must make do with covering different regions of spacetime with different coordinate patches (called charts) that are “knitted together” where one patch meets another. In general, a curved spacetime cannot be covered with a single, global coordinate system in which every point is uniquely identified with a set of four numbers in a continuous way; the sphere's surface is already an example of a space that requires more than one chart.

Informally, when you want to find out how many dimensions your space has, you choose the maximum number of sticks that you can make mutually perpendicular. As noted above, a dimension of a space is a kind of path in a certain direction in that space, and a coordinate for that dimension is a number that serves as a location along that path. In creating reference frames for spaces, the usual assumption is that in order to specify a location we should supply n independent numbers in an n-dimensional space, where n is a positive integer. This is usual but not required; instead we could exploit the idea that there are space-filling curves which permit a single continuous curve to completely fill, and thus coordinatize, a region of dimension higher than one. For this reason (namely, that each point in n-dimensional space does not always need n numbers to uniquely name the point), the contemporary definition of “dimension” is rather exotic and will not be introduced here. Using a space-filling curve as a coordinate axis is a bad choice because nearby points won't usually have nearby coordinates.

Given an event, it may have one time coordinate in one coordinate system, but a different time coordinate in a different coordinate system. If there exist equations telling us how points change their coordinates between the two different frames, then we say there exist "transformation equations" for the two frames. So, if you were to die, you would die in all reference frames, but at different times.
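For inertial frames in special relativity, the transformation equations are the Lorentz transformations. The sketch below, in units where c = 1 and with one space dimension (the function name is ours), shows how one event keeps its identity while its time coordinate changes from frame to frame, and how two events simultaneous in one frame fail to be simultaneous in another:

```python
import math

def lorentz(t, x, v):
    """Coordinates of the same event in a frame moving at velocity v
    along the x axis, in units where the speed of light c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

# Two events that are simultaneous in the original frame (both at t = 0) ...
t1, x1 = lorentz(0.0, 0.0, 0.6)
t2, x2 = lorentz(0.0, 4.0, 0.6)
# ... are assigned different time coordinates in the moving frame.
print(t1, t2)  # approximately 0.0 and -3.0
```

At v = 0.6 the Lorentz factor gamma is 1.25, so the second event's time coordinate shifts to 1.25 × (0 − 0.6 × 4) = −3.0 while the first stays at 0.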

Physicists distinguish the past from the absolute past. Being in the absolute past is a frame-independent notion, but merely being in the past is not. Event x is in the past of event y if and only if (abbreviated "iff") x happens before y. Event x is in the absolute past of event y iff, in all frames of reference, x happens before y. Event x is in the absolute past of y iff x could have causally influenced y iff an unhindered light beam from x could have reached y. If in some frame two different events x and y are simultaneous, then there is another frame in which they are not simultaneous.

Section 20 offers more discussion of reference frames.

Inertial frames are very special reference frames; see Section 4.

4. What Is an Inertial Frame?

An inertial frame can be characterized in different ways. One is that it is the intended frame of Einstein's special theory of relativity. Another is that it is a reference frame without fictitious forces. All reference frames agree on the real forces, but they disagree on the so-called fictitious forces, such as the centrifugal force and Coriolis force. These fictitious forces will be different in different reference frames, but will go to zero in an inertial frame. Only non-inertial reference frames need fictitious forces. Inertial frames can account for gravitational forces so long as there is no curvature of spacetime involved.

Another way to characterize an inertial frame is by saying it is any reference frame in which Newton's first law of motion holds. Newton’s first law requires that a moving object which is unaffected by any unbalanced, outside forces will coast forever. That is, it will move unimpeded with a constant speed and constant direction forever. An object will not behave this way in actual physical space, which is a sign that reference frames for physical space will not be inertial frames, except as an approximation. An object that "moves inertially" is one that "coasts." In flat space, it moves uniformly in a straight line, and so is represented by a straight line in a Minkowski diagram, which is a diagram that does not allow space to curve. However, in real physical space with matter and energy and gravitation and other forces, the unimpeded or freely falling objects move on geodesics that would be considered to be curved lines if they were seen from a higher dimensional Euclidean space. Geodesics are often said to be the "straightest" paths, given that space itself is not flat.

Suppose you have pre-selected your frame. How do you tell whether it is an inertial frame? There are various ways. (i) Check that there are no fictitious forces. (ii) Check that Newton's laws of motion hold. (iii) Check that physical objects travel equal distances in equal amounts of time, that moving objects without unbalanced external forces always coast in straight lines, and so forth. (iv) Check that two objects coasting in parallel to each other will continue to do so and will continue to be the same distance apart.
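Check (iii) can be sketched as a small computation on logged positions: in a frame that is at least approximately inertial, a force-free object covers equal distances in equal times. A hedged illustration (the function name, tolerance, and sample data are ours):

```python
def coasts(times, positions, tol=1e-9):
    """Check test (iii): the object covers equal distances in equal
    times, i.e. its recorded velocity in this frame is constant."""
    v0 = (positions[1] - positions[0]) / (times[1] - times[0])
    return all(
        abs((positions[i + 1] - positions[i]) / (times[i + 1] - times[i]) - v0) < tol
        for i in range(len(times) - 1)
    )

# A force-free object logged in an (approximately) inertial frame:
print(coasts([0.0, 1.0, 2.0, 3.0], [5.0, 7.5, 10.0, 12.5]))  # True
# The same object logged in a non-inertial frame would fail the check:
print(coasts([0.0, 1.0, 2.0, 3.0], [5.0, 7.5, 11.0, 12.5]))  # False
```

A failed check does not tell you whether a real force or a fictitious force is at work; it only tells you this frame is not inertial for this object.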

Frames can be better or worse approximations to inertial frames. A frame fixed on the distant stars and describing phenomena far from any planets and stars is very nearly an inertial frame in this region. Any spacetime obeying the general theory of relativity will be locally Minkowskian in the sense that any infinitesimal region of spacetime has an inertial frame obeying the principles of special relativity.

5. What Is Spacetime?

Spacetime is where events are located. The dimensions of real spacetime include the time dimension of happens-after and three space dimensions.

Hermann Minkowski, in 1908, was the first person to say that spacetime is fundamental and that space and time are just aspects of spacetime because different reference frames will divide spacetime into different times and spaces.

Spacetime is believed to be a continuum in which we can define points and straight lines. However, these points and lines do not satisfy the principles of Euclidean geometry when gravity is present. Einstein showed that the presence of gravity affects geometry by warping both space and time. The gravitational field is actually manifested as the curvature of spacetime. This curvature implies that Euclidean geometry cannot be the correct geometry of spacetime. Black holes are a sign of especially radical curvature. The Earth's presence causes only a very slight curvature. Nevertheless, the curvature is significant enough that it must be accounted for in the clocks of the Global Positioning System (GPS) satellites, along with the other time dilation effect that is caused by speed. The GPS satellites are launched with their clocks adjusted so that when they reach orbit they mark time the same as Earth-based clocks do. Every GPS satellite carries four atomic clocks that need to be re-synchronized twice a day with a more trustworthy clock on Earth.
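The size of that clock adjustment can be estimated from the standard weak-field formulas for gravitational and velocity time dilation. The sketch below uses rounded textbook constants and ignores the Earth's rotation and the orbit's eccentricity, so it is an order-of-magnitude estimate, not the engineering calculation actually used for GPS:

```python
GM  = 3.986e14   # Earth's gravitational parameter, m^3/s^2 (rounded)
c   = 2.998e8    # speed of light, m/s (rounded)
r_e = 6.371e6    # Earth's mean radius, m
r_s = 2.657e7    # GPS orbital radius (about 20,200 km altitude), m
day = 86400.0    # seconds per day

# General-relativistic effect: clocks higher in the gravitational
# potential run faster, by GM/c^2 * (1/r_e - 1/r_s) per second.
grav = GM / c**2 * (1.0 / r_e - 1.0 / r_s) * day

# Special-relativistic effect: the moving satellite clock runs slower,
# by v^2 / (2 c^2) per second, with v^2 = GM/r for a circular orbit.
vel = (GM / r_s) / (2.0 * c**2) * day

net_microseconds = (grav - vel) * 1e6
print(round(net_microseconds, 1))  # about 38.5 microseconds per day
```

The two effects pull in opposite directions, and the gravitational one wins: left uncorrected, a satellite clock would gain tens of microseconds per day relative to ground clocks, which is enormous for a system that turns nanoseconds into meters of position error.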

There have been serious attempts to construct theories of physics in which spacetime is a product of more basic entities. The primary aim of these new theories is to unify relativity with quantum theory. So far these theories have not stood up to any empirical observations or experiments that could show them to be superior to the presently accepted theories. So, for the present, the concept of spacetime remains fundamental.

The metaphysical question of whether spacetime is a substantial object or a relationship among events, or neither, is considered in the discussion of the relational theory of time. For other philosophical questions about what spacetime is, see What is a Field?

6. What Is a Minkowski Spacetime Diagram?

A spacetime diagram is a graphical representation of the point-events in spacetime. A Minkowski spacetime diagram is a representation of a spacetime obeying the laws of special relativity, but not necessarily general relativity. In a Minkowski spacetime diagram, normally a rectangular coordinate system is used, the time axis is shown vertically, and one or two of the spatial axes are suppressed (that is, not included). Here is an example with only one space dimension:

This Minkowski diagram shows a point-sized Einstein standing still midway between the two places at which there is a flash of light. The directed arrows represent the path of light rays from the flash. In a Minkowski diagram, a physical (point) object is not represented as occupying a point but as occupying a line containing all the spacetime points at which it exists. That line, which usually is not straight, is called the worldline of the object.

In the above diagram, Einstein's worldline is a vertical straight line because no total external force is acting on him. If an object's worldline intersects or meets another object's worldline, then the two objects collide at the point of intersection.

The units along the vertical time axis are customarily chosen to be the product of time and the speed of light so that worldlines of light rays make a forty-five degree angle with each axis. This way, if a centimeter in the up or time direction is one second, then a centimeter to the right or space direction is one light-second, a very long distance.

In order to distinguish the time axis from an ordinary space axis in the Minkowski diagram, the times are also treated as multiples of the square root of minus one. This makes time imaginary, but only in the sense of being a complex number, not in the sense of being like Santa Claus. When the general theory of relativity was developed, Einstein adopted a non-Euclidean Riemannian geometry that does not require using imaginary numbers for the time dimension.

The set of all possible photon histories or light-speed worldlines going through an event defines the two light cones of that event: the past light cone and the future light cone. The future cone is called a "cone" because, if we were to add another space dimension to our diagram, so it has two space dimensions and a single time dimension, then light emitted from the flash spreads out in the two dimensions of space in a circle of growing diameter, producing a cone shape. In a diagram for three-dimensional space, the light wavefront is an expanding sphere and not an expanding cone, but sometimes physicists will informally still speak of the "cone." It is customary in space-time diagrams to use a length scale so that c = 1, and light travels at a 45 degree angle.

The future light cone of the single flash event is defined to be all the space-time events reached by the light emitted from the flash. A pair of events inside the cone are events such that in principle the earlier one could have affected the later one; the events are causally connectible, and the relation between the two events is said to be time-like. This means that a body could travel between the two events without ever exceeding the speed of light. If you were once located in spacetime at, say <x,y,z,t>, then for the rest of your life you cannot affect or participate in any event that occurs outside of the light cone whose apex is at <x,y,z,t>. Light cones are a helpful tool because different observers or different frames of reference will agree on the light cone of any event, even if the event does not actually radiate any light.

Inertial motion produces a straight worldline, and accelerated motion produces a curved worldline in Minkowski diagrams. If at some time Einstein were to jump on a train moving by at constant speed, then his worldline would, from that time onward, tilt away from the vertical and form some angle less than 45 degrees with the time axis. In order to force a 45 degree angle to be the path of a light ray, the units on the time axis are not seconds but seconds times the speed of light. Any line tilted more than 45 degrees from the vertical is the worldline of an object moving faster than the speed of light in a vacuum. Events on the same horizontal line of the Minkowski diagram are simultaneous in that reference frame. Special relativity does not allow a worldline to be circular, or a closed curve, since the traveler would have to approach infinite speed at the top of the circle and at the bottom. A moving observer is added to the above diagram to produce the diagram below in section 12 in the discussion about the relativity of simultaneity.

If an observer's worldline tips over and goes back to the past, say because the worldline has the shape of an inverted cup, then there's a point where the line becomes horizontal. But a horizontal line is an indication that the observer is moving at infinite speed. This speed violates the special theory of relativity, and so does not exist, provided we accept special relativity. Or, if time travel to the past can occur, then it can do so only in a spacetime that is not Minkowskian and so does not satisfy special relativity.

Does an observer move along their worldline? According to J.J.C. Smart, "Within the Minkowski representation we must not talk of our four-dimensional entities changing or not changing." ("Spatialising Time," Mind, 64: 239-241.)

Not all spacetimes can be given Minkowski diagrams, but any spacetime satisfying Einstein's Special Theory of Relativity can. Einstein's Special Theory falsely presupposes that physical processes, such as gravitational processes, have no effect on the structure of spacetime. When attention needs to be given to the real effect of these processes on the structure of spacetime, that is, when general relativity needs to be used, then Minkowski diagrams become inappropriate for spacetime. General relativity assumes that the geometry of spacetime is locally Minkowskian but not globally. That is, spacetime is locally flat in the sense that in any very small region one always finds spacetime to be 4-D Minkowskian (but not 4-D Euclidean). Special relativity holds in any infinitesimally small region of spacetime that satisfies general relativity, and so any such region can be fitted with an inertial reference frame. When we say spacetime is "really curved" and not flat, we mean it really deviates from 4-D Minkowskian geometry.

To repeat a point made earlier, when we speak of a point in these diagrams being a spacetime event, that is a non-standard use of the word "event." A point event in a Minkowski diagram is merely a location in spacetime where an event might or might not happen.

A Minkowski spacetime with three spatial dimensions and one time dimension does its best to treat time like a dimension of space, but Minkowski spacetime is radically different from a Euclidean space with four dimensions. This difference shows up in the fact that its metric (and the topology based on the metric) is very unlike the metric of a Euclidean four-space because its timelike dimension is in many ways very unspacelike, as we see in the next section.

7. What is Time's Metric and Spacetime's Interval?

A metric is a measure of separation. The metric for time is a measure of the temporal duration between time coordinates. The following four laws precisely define what we require of any metric for time.

Any metric m for time is a two-place function from a pair of time coordinates to a real number, the so-called duration between the pair, such that, for any three point-instants with time coordinates a, b, and c, m obeys the following principles:

1. m(a,b) ≥ 0

2. m(a,a) = 0.

3. If a ≠ b, then m(a,b) > 0.

4. m(a,b) + m(b,c) ≥ m(a,c)

Here is a metric m that is useful for measuring time and that satisfies the above four conditions:

m(a,b) = |b - a|

|b - a| is the absolute value of the difference between the time coordinates b and a.

This equation shows how we should use a clock to tell the duration between any two instants or instantaneous events. For example, the duration between the event of the clock displaying 2 and the event of the clock displaying 5 is |5 - 2|, namely 3. There are no units. 3 what? The units of time can be specified later.
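The four laws and this metric can be spot-checked with a short sketch. The sample clock readings below are made up for illustration; the point is only that m(a,b) = |b - a| satisfies each law.

```python
# A minimal sketch of the time metric m(a,b) = |b - a| and a check
# that it satisfies the four laws listed above. The sample
# time coordinates are made up for illustration.

def m(a, b):
    """Duration between time coordinates a and b."""
    return abs(b - a)

# Sample time coordinates for three point-instants.
a, b, c = 2.0, 5.0, 11.0

assert m(a, b) >= 0                      # law 1: non-negativity
assert m(a, a) == 0                      # law 2: zero self-separation
assert a == b or m(a, b) > 0             # law 3: distinct instants are separated
assert m(a, b) + m(b, c) >= m(a, c)      # law 4: triangle inequality

print(m(2, 5))  # the duration between clock readings 2 and 5, namely 3
```

As the text notes, the result is a bare number; the units (seconds, years) are specified separately.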

If the instants form a linear continuum, as we believe they do for physical time, then between any two distinct instants, there are many others. So, if we have a digital clock that ticks every second, we know that between those ticks there are many other times not being displayed by the clock.

The metric m(a,b) = |b - a| is the standardly accepted metric for time if a and b are time coordinates, but could we just as well have used half that absolute value, or the square root of the absolute value? Is one metric more natural than another? Philosophers are interested in the underlying issue of whether the choice of a metric for a space is natural in the sense of being objective or whether its choice is always a matter of convention.

We will now add some more detail to the above treatment of the metric for time and include a discussion of the interval for spacetime. It is very important in the following discussion to be sensitive to the difference between a physical space and a mathematical space. A mathematical 3-dimensional space about dollar sales of cell phones, names of salespersons doing the selling, and dates of sales also has a metric specifying 'distances' or 'separations' or 'intervals' between points in that space.

A mathematical space is simply a collection of points, and a metric tells us the interval between those points. Not every mathematical space can have a metric, but for those that can, we have a clear idea of the conditions that must be satisfied by the metric. To have a metric space, we need there to be a function specifying the interval or "measure of separation" between pairs of points. The interval must obey certain precisely specified conditions. Suppose we want a metric for spatial distance on the surface of Earth. We usually will not know in advance the distances between every two points on the globe, but we do know, in advance of measuring, that the distance between San Francisco and New York City will be less than or equal to the sum of the distance from San Francisco to Chicago plus the distance from Chicago to New York City.

In our everyday Euclidean space, the interval between two points is the length of the straight line connecting them. This interval is the spatial distance. However, the term "interval" is a technical term that does not always mean spatial distance.

In spacetime, how is the interval different from spatial distance and temporal duration? The brief answer is that for a pair of point events happening in the same place, the interval is just the temporal duration between them, but for two point events happening at the same time, the interval is the spatial distance between them. Let's investigate what that interval is.

In real spacetime, that is, in our real physical space with physical time, the theory of special relativity implies that two observers using reference frames that move relative to each other will correctly calculate different distances and different durations for pairs of point events, but they will calculate the same interval. If we agree with physicists that what is objective about spacetime is what does not change with a change in the reference frame we use, then the interval is objective, but velocities, distances, and durations are not. The metric of a space determines its geometry, and this geometry is intrinsic in the sense that it does not change as we change the reference frame on the space.

If we select a standard clock and the standard metric for time, then we assume the duration between any two successive clock ticks is the same, provided the clock is stationary in the coordinate system where the clock readings are taken.

A point of physical space is located by being assigned a coordinate. For doing quantitative science rather than merely qualitative science, we want the coordinate to be a number and not, say, a letter of the alphabet. A coordinate for a point in two-dimensional space requires two numbers rather than just one; a coordinate for a point in n-dimensional space requires n independently assigned numbers, where n is a positive integer. You should prefer a real number rather than a rational number, even though no measuring tool could detect the difference, because, for example, a square one unit on a side has no rational diagonal, but does have a real diagonal, with a real measure, namely the square root of two. Don't we want the diagonals of our squares to have a length?

Let's now consider metrics in different dimensions. In a one-dimensional Euclidean space, namely for an ordinary straight line, the metric d for two points x and y is customarily given by

d(x,y) = |x - y|.

We have intuitions about a one-dimensional space for time coordinates. For example, if event p happens before event q, and if q happens before r, then the location numbers for those events, namely, l(p), l(q) and l(r), must satisfy the inequality l(p) < l(q) < l(r).

In a two dimensional (or 2D) Euclidean space, the metric for the distance between the point (x,y) with Cartesian coordinates x and y and the point (x',y') with coordinates x' and y' is customarily defined to be the square root of (x' - x)² + (y' - y)². Note the application of the Pythagorean Theorem. If the space curves and so is not Euclidean, then a more sophisticated definition of the metric is required because we can no longer apply the Pythagorean Theorem, except perhaps in infinitesimal regions.

More generally, our intuitive idea of distance requires that, no matter how strange the space is, we want its metric function d to have the following distance-like properties. d is a function with two arguments. For any points p, q and r, the following five conditions must be satisfied:

  1. d(p,p) = 0
  2. d(p,q) is greater than or equal to 0
  3. If d(p,q) = 0, then p = q
  4. d(p,q) = d(q,p)
  5. d(p,q) + d(q,r) is greater than or equal to d(p,r)

We generalize these intuitions about physical space to our mathematical spaces. Notice that there is no mention of the path the distance d is taken across; all the attention is on the point pairs themselves. Notice also that the distance from p to q is specified without mentioning how many points exist between p and q. Does your idea of distance imply that those conditions on d should be true? If you were to check, then you'd find that the usual 2D metric defined above, namely the square root of (x' - x)² + (y' - y)², does satisfy these five conditions. So does the 1D metric. In 3D Euclidean space, the customary metric is the square root of (x' - x)² + (y' - y)² + (z' - z)². We might want a scale factor, say a, on the metric so that d² = a[(x' - x)² + (y' - y)² + (z' - z)²]. This also satisfies our five conditions above. If space were to expand uniformly with time, then a cannot be a constant but must be a function of time, namely a(t).
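The checking just described can be carried out mechanically. Here is a sketch, with made-up sample points, that computes the customary 2D and 3D Euclidean metrics and spot-checks several of the five distance conditions:

```python
import math

# A sketch of the customary Euclidean metrics discussed above, with a
# spot-check of the distance conditions on sample (made-up) points.

def d2(p, q):
    """2D Euclidean distance, via the Pythagorean Theorem."""
    return math.sqrt((q[0] - p[0])**2 + (q[1] - p[1])**2)

def d3(p, q, a=1.0):
    """3D Euclidean distance with an optional scale factor a."""
    return math.sqrt(a * sum((qi - pi)**2 for pi, qi in zip(p, q)))

p, q, r = (0.0, 0.0), (3.0, 4.0), (6.0, 8.0)

assert d2(p, p) == 0                     # condition 1
assert d2(p, q) >= 0                     # condition 2
assert d2(p, q) == d2(q, p)              # condition 4: symmetry
assert d2(p, q) + d2(q, r) >= d2(p, r)   # condition 5: triangle inequality

print(d2(p, q))               # 5.0, by the 3-4-5 right triangle
print(d3((0, 0, 0), (1, 2, 2)))  # 3.0
```

Note that the functions never mention any path between the points; as the text says, all the attention is on the point pairs themselves.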

After a metric is defined for a spacetime, the metric is commonly connected to empirical observations by saying that the readings taken with ideal meter sticks and ideal clocks (and lasers and radar and so forth) yield the values of the metric.

Now let's return to our discussion of the interval for spacetime. To have a metric for a 4-dimensional spacetime, we desire a definition of the interval between any two infinitesimally neighboring points in that spacetime. Less generally, consider an appropriate metric for the 4-D mathematical space that is used to represent the spacetime obeying the laws of special relativity theory. It uses a Minkowski spacetime. What is an appropriate metric for this spacetime? Well, if we were just interested in the space portion of this spacetime, then the above 3D Euclidean metric is fine. But we've asked a delicate question because the fourth dimension of Minkowski's mathematical space is special, and it represents a time dimension and not a space dimension. Because of time, our metric for spacetime needs to give what we have called the "interval" between any two point events, and not merely the spatial distance between the events.

Using Cartesian coordinates, the spacetime has the following customary Lorentzian metric or interval Δs (also called a Minkowski metric or a Minkowskian metric): For any pair of point events at (x',y',z',t') and (x,y,z,t),

Δs² = (x' - x)² + (y' - y)² + (z' - z)² - c²(t' - t)²

The square of the spacetime interval is Δs². If this is positive we have a spacelike interval; when it is negative we have a timelike interval. Because true metrics are always positive, this Minkowskian or Lorentzian metric is not a true metric, nor even a pseudometric; but it is customary for physicists to refer to it loosely as a "metric" because Δs retains enough other features of the metric. Here is another equally good candidate for the Lorentzian metric:

Δs² = - (x' - x)² - (y' - y)² - (z' - z)² + c²(t' - t)²

Δs is called the interval of Minkowski spacetime. Notice the plus and minus signs on the four terms.

The interval is sensitive to both space and time. This is reflected in our comment earlier that, for a pair of point events happening in the same place, the interval is just the temporal duration between them, but for two point events happening at the same time, the interval is the spatial distance between them.

The interval of spacetime between two point events is complicated because its square can be negative. Notice that if Δs² is zero, the two events might be identical, or they might have occurred millions of miles apart. In ordinary space, if the space interval between two events is zero, then the two events happened at the same time and place, but in spacetime, if the spacetime interval between two events is zero, this means only that there could be a light ray connecting them. It is because the spacetime interval between two events can be zero even when the events are far apart in distance that the term "interval" is very unlike what we normally mean by the term "distance."
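These sign distinctions can be made concrete in a short sketch. It uses the first sign convention above (space terms positive, time term negative) and units in which c = 1; the sample events are made up for illustration.

```python
# A sketch of the squared interval Δs² with the sign convention used
# above (space terms positive, time term negative), classifying pairs
# of point events. Units are chosen so that c = 1, and the sample
# events are made up.

C = 1.0  # speed of light in our chosen units

def interval_squared(e1, e2):
    """Δs² between events e1 = (x,y,z,t) and e2 = (x',y',z',t')."""
    dx, dy, dz, dt = (b - a for a, b in zip(e1, e2))
    return dx**2 + dy**2 + dz**2 - (C * dt)**2

def classify(e1, e2):
    s2 = interval_squared(e1, e2)
    if s2 > 0:
        return "spacelike"   # no causal connection possible
    if s2 < 0:
        return "timelike"    # causally connectible below light speed
    return "lightlike"       # connectible only by a light ray

origin = (0, 0, 0, 0)
print(classify(origin, (5, 0, 0, 1)))  # spacelike
print(classify(origin, (1, 0, 0, 5)))  # timelike
print(classify(origin, (3, 0, 0, 3)))  # lightlike: Δs² = 0, yet the events are far apart
```

The last line illustrates the point in the text: a zero interval does not mean the events coincide, only that a light ray could connect them.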

All the events that have a zero spacetime interval from some event e constitute e's two light cones. This set of events is given that name because it has the shape of cones when represented in a three dimensional (or 2+1) Minkowski diagram, one cone for events in e's future and one cone for events in e's past. If event 2 is outside the light cones of event 1, then event 2 is said to occur in the "absolute elsewhere" of event 1. In that case, neither event could have affected the other by a causal influence traveling less than the speed of light. And, you as the analyst are free to choose a coordinate system in which event 1 happens first, or another coordinate system in which event 2 happens first, or even a coordinate system in which the two are simultaneous. But once the coordinate system is chosen, then this choice fixes the happens-before relation for all point-events.

Strictly speaking, a clock ticks off the amount

(1/c)√[- (x' - x)² - (y' - y)² - (z' - z)² + c²(t' - t)²]

between position (x,y,z,t) and position (x',y',z',t') along its worldline. The ticking marks off congruent, invariant intervals. If the clock is stationary in its own inertial reference frame, then x' - x is zero, and so are y' - y and z' - z; so, the clock measures the quantity t' - t.

What if we turn now from special relativity to general relativity? Adding space and time dependence (particularly the values of mass-energy and momentum at points) to each term of the Lorentzian metric produces the metric for general relativity. That metric requires more complex tensor equations; these put multiplication factors g in front of each of the products of the differential displacements such as (x' - x)² and (x' - x)(y' - y), and the mathematical difficulty of the description escalates.

8. Does the Theory of Relativity Imply Time Is Partly Space?

In 1908, the mathematician Hermann Minkowski remarked that "Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality." Many people took this to mean that time is partly space, and vice versa. The philosopher C. D. Broad countered that the discovery of spacetime did not break down the distinction between time and space but only their independence or isolation. He argued that their lack of independence does not imply a lack of reality.

Nevertheless, there is a deep sense in which time and space are "mixed up" or linked. This is evident from the Lorentz transformations of special relativity that connect the time t in one inertial frame with the time t' in another frame that is moving in the x direction at a constant speed v, namely t' = (t - vx/c²)/√(1 - v²/c²). In this Lorentz equation, t' is dependent upon the space coordinate x and the speed v. In this way, time is not independent of either space or speed. It follows that the time between two events could be zero in one frame but not zero in another. Each frame has its own way of splitting up spacetime into its space part and its time part.
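The mixing can be seen numerically with the standard Lorentz transformation, here sketched in units where c = 1. The sample speed and coordinates are made up; the point is that two events with the same t but different x receive different times t' in the moving frame.

```python
import math

# A sketch of the Lorentz transformation for a frame moving at speed v
# along the x axis, in units where c = 1. Two events simultaneous in
# the original frame (equal t) get different times t' in the moving
# frame because t' depends on x. The sample numbers are made up.

def lorentz(t, x, v):
    gamma = 1.0 / math.sqrt(1.0 - v**2)
    t_prime = gamma * (t - v * x)
    x_prime = gamma * (x - v * t)
    return t_prime, x_prime

v = 0.6  # 60% of the speed of light

# Two events simultaneous in the original frame, at different places:
t1, x1 = lorentz(0.0, 0.0, v)
t2, x2 = lorentz(0.0, 10.0, v)

print(t1, t2)    # 0.0 and -7.5: no longer simultaneous
print(t1 == t2)  # False
```

This is exactly the dependence of t' on x mentioned above: change the place of an event, and its time in the moving frame changes too.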

The reason why time is not partly space is that, within a single frame, time is always distinct from space. Time is a distinguished dimension of spacetime, not an arbitrary dimension. What being distinguished amounts to, speaking informally, is that when you set up a rectangular coordinate system on spacetime with an origin at, say, some important event, you may point the x-axis east or north or up, but you may not point it forward in time—you may do that only with the t-axis, the time axis.

9. Is Time the Fourth Dimension?

Yes and no; it depends on what you are talking about. Time is the fourth dimension of 4-d spacetime, but time is not the fourth dimension of physical space because that space has only three dimensions. In 4-d spacetime, the time dimension is special and unlike any of the other three dimensions.

Mathematicians have a broader notion of the term "space" than the average person; and in their sense a space need not consist of places, that is, geographical locations. Not paying attention to the two meanings of the term "space" is the source of all the confusion about whether time is the fourth dimension.

Newton treated space as three dimensional, and treated time as a separate one-dimensional space. He could have used Minkowski's 1908 idea, if he had thought of it, namely the idea of treating spacetime as four dimensional.

The mathematical space used by mathematical physicists to represent physical spacetime that obeys the laws of relativity is four dimensional; and in that mathematical space, the space of places is a 3D sub-space and time is another 1D sub-space. Minkowski was the first person to construct such a mathematical space, although in 1895 H. G. Wells treated time informally as a fourth dimension in his novel The Time Machine.

In any coordinate system on spacetime, mathematicians of the early twentieth century believed it was necessary to specify a point event with at least four independent numbers in order to account for the four dimensionality of spacetime. Actually this appeal to the 19th century definition of dimensionality, which is due to Bernhard Riemann, is not quite adequate because mathematicians have subsequently discovered how to assign each point on the plane to a point on the line without any two points on the plane being assigned to the same point on the line. The idea comes from the work of Georg Cantor. Because of this one-to-one correspondence, the points on a plane could be specified with just one number. If so, then the line and plane must have the same dimensions according to the Riemann definition of dimension. To avoid this problem and to keep the plane a 2D object, the notion of dimensionality of a space has been given a new, but rather complex, definition.

10. Is There More Than One Kind of Physical Time?

Every reference frame has its own physical time, but the question is intended in another sense. At present, physicists measure time electromagnetically. They define a standard atomic clock using periodic electromagnetic processes in atoms, then use electromagnetic signals (light) to synchronize clocks that are far from the standard clock. In doing this, are physicists measuring "electromagnetic time" but not other kinds of physical time?

In the 1930s, the physicists Arthur Milne and Paul Dirac worried about this question. Independently, they suggested there may be very many time scales. For example, there could be the time of atomic processes and perhaps also a time of gravitation and large-scale physical processes. Clocks for the two processes might drift out of synchrony after being initially synchronized, yet there would be no reasonable explanation for why they don't stay in synchrony. Ditto for clocks based on the pendulum, on superconducting resonators, on the spread of electromagnetic radiation through space, and on other physical principles. Just imagine the difficulty for physicists if they had to work with electromagnetic time, gravitational time, nuclear time, neutrino time, and so forth. Current physics, however, has found no reason to assume there is more than one kind of time for physical processes.

In 1967, physicists did reject the astronomical standard for the atomic standard because the deviation between known atomic and gravitational periodic processes could be explained better by assuming that the atomic processes were the more regular of the two. But this is not a cause for worry about two times drifting apart. Physicists still have no reason to believe that a gravitational periodic process that is just as regular initially as the atomic process, and that is not affected by friction or impacts or other forces, would ever drift out of synchrony with the atomic process, yet this is the possibility that worried Milne and Dirac.

11. How is Time Relative to the Observer?

Physical time is not relative to any observer's state of mind. Wishing time will pass does not affect the rate at which the observed clock ticks. On the other hand, physical time is relative to the observer's reference system—in trivial ways and in a deep way discovered by Albert Einstein.

In a trivial way, time is relative to the chosen coordinate system on the reference frame. For example, it depends on the units chosen: the duration of some event is 34 seconds if seconds are defined to be a certain number of ticks of the standard clock, but 24 seconds if seconds are defined to be a different number of ticks of that standard clock. Similarly, the difference between the Christian calendar and the Jewish calendar for the date of some event is due to a different unit and origin. Also trivially, time depends on the coordinate system when a change is made from Eastern Standard Time to Pacific Standard Time. These dependencies are taken into account by scientists but usually not mentioned. For example, if a pendulum's approximately one-second swing is measured in a physics laboratory during the autumn night when the society changes from Daylight Savings Time back to Standard Time, the scientists do not note that one unusual swing of the pendulum that evening took negative fifty-nine minutes and fifty-nine seconds instead of the usual one second.

Isn't time relative to the observer's coordinate system in the sense that in some reference frames there could be fifty-nine seconds in a minute? No, due to scientific convention, it is absolutely certain that there are sixty seconds in any minute in any reference frame. How long an event lasts is relative to the reference frame used to measure the time elapsed, but in any reference frame there are exactly sixty seconds in a minute because this is true by definition. Similarly, you do not need to worry that in some reference frame there might be two gallons in a quart.

In a deeper sense, time is relative, not just to the coordinate system, but to the reference frame itself. That is Einstein's principal original idea about time. Einstein's special theory of relativity requires that physical laws not change if we change from one inertial reference frame to another. In technical-speak, Einstein is requiring that the statements of physical laws must be Lorentz-invariant. The equations of light and electricity and magnetism (Maxwell electrodynamics) are Lorentz-invariant, but those of Newton's mechanics are not, and Einstein eventually figured out that what needs changing in the laws of mechanics is that temporal durations and spatial intervals between two events must be allowed to be relative to which reference frame is being used. There is no frame-independent duration for an event extended in time. To put the point another way, Einstein's idea is that without reference to the frame, there is no fixed time interval between two events, no 'actual' duration between them. This idea was philosophically shocking as well as scientifically revolutionary.

Einstein illustrated his idea using two observers, one on a moving train in the middle of the train, and a second observer standing on the embankment next to the train tracks. If the observer sitting in the middle of the rapidly moving train receives signals simultaneously from lightning flashes at the front and back of the train, then in his reference frame the two lightning strikes were simultaneous. But the strikes were not simultaneous in a frame fixed to an observer on the ground. This outside observer will say that the flash from the back had farther to travel because the observer on the train was moving away from the flash. If one flash had farther to travel, then it must have left before the other one, assuming that both flashes moved at the same speed. Therefore, the lightning struck the back of the train before the lightning struck the front of the train in the reference frame fixed to the tracks.

Let's assume that a number of observers are moving with various constant speeds in various directions, each at rest in his or her own inertial frame of reference. Which of these observers will agree on their time measurements? Only observers with zero relative speed will agree. Observers with different relative speeds will not, even if they agree on how to define the second and agree on some event occurring at time zero (the origin of the time axis). If two observers are moving relative to each other, but each makes judgments from a reference frame fixed to themselves, then the times they assign to an event will disagree more, the faster their relative speed. All observers will be observing the same objective reality, the same event in the same spacetime, but their different frames of reference will require disagreement about how spacetime divides up into its space part and its time part.

This relativity of time to reference frame implies that there is no such thing as The Past in the sense of a past independent of reference frame. This is because a past event in one reference frame might not be past in another reference frame. However, this frame relativity usually isn't very important except when high speeds or high gravitational field strengths are involved.

In some reference frame, was Adolf Hitler born before George Washington? No, because the two events are causally connectible. That is, one event could in principle have affected the other since light would have had time to travel from one to the other. We can select a reference frame to reverse the usual Earth-based order of two events only if they are not causally connectible, that is, only if one event is in the absolute elsewhere of the other. Despite the relativity of time to a reference frame, any two observers in any two reference frames should agree about which of any two causally connectible events happened first.
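The claim that a change of frame can reverse the order only of causally unconnectible (spacelike-separated) events can be illustrated with a sketch. It uses the standard Lorentz time transformation in units where c = 1; the events and speeds are made up for illustration.

```python
import math

# A sketch, in units where c = 1, of the claim above: a change of
# inertial frame can reverse the time order of two events only if
# they are spacelike separated (not causally connectible). Events
# are (t, x) pairs; the sample values are made up.

def boosted_time(t, x, v):
    """Time coordinate of event (t, x) in a frame moving at speed v."""
    gamma = 1.0 / math.sqrt(1.0 - v**2)
    return gamma * (t - v * x)

# Timelike pair: separated more in time than in space.
a, b = (0.0, 0.0), (5.0, 1.0)
# Spacelike pair: separated more in space than in time.
p, q = (0.0, 0.0), (1.0, 5.0)

for v in (0.0, 0.5, 0.9, -0.9):
    # The timelike pair keeps its order in every frame tested:
    assert boosted_time(*a, v) < boosted_time(*b, v)

# But a fast enough frame reverses the spacelike pair's order:
print(boosted_time(*p, 0.0) < boosted_time(*q, 0.0))  # True in the rest frame
print(boosted_time(*p, 0.9) < boosted_time(*q, 0.9))  # False: order reversed
```

Since Washington's and Hitler's births are timelike separated, they behave like the pair (a, b): every frame agrees on which came first.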

12. What Is the Relativity of Simultaneity?

The relativity of simultaneity is the feature of spacetime in which two different reference frames moving relative to each other will disagree on which events are simultaneous.

How do we tell the time of occurrence of an event that is very far away from us? We assign the time when something is occurring far away from us by subtracting, from the time we noticed it, the time it took the signal to travel all that way to us.

For example, we see a flash of light at time t arriving from a distant place P. When did the flash occur back at P? Let's call that time tp. Here is how to compute tp. Suppose we know the distance x from us to P. Then the flash occurred at t minus the travel time for the light. That travel time is x/c. So,

tp = t - x/c.

For example, if we see an explosion on the Sun at t, then we know to say it really occurred about eight minutes before, because x/c is approximately eight minutes, where x is the distance from Earth to the Sun. In this way, we know what events on the distant Sun are simultaneous with what clicks on our Earth clock.
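The Sun example can be computed directly from the rule tp = t - x/c. The distance figure below is an approximate round number for the mean Earth-Sun distance, used only for illustration.

```python
# A sketch of the rule tp = t - x/c using the Earth-Sun distance.
# The distance is an approximate round number, not a precise value.

c = 299_792_458   # speed of light in meters per second
x = 1.496e11      # mean Earth-Sun distance in meters (approximate)

travel_time = x / c        # seconds the light took to reach us
print(travel_time / 60)    # roughly 8.3 minutes

t = 0.0                    # our clock reading when we see the flash
tp = t - travel_time       # when the flash actually occurred at the Sun
print(tp)                  # about -499 seconds, i.e. 8.3 minutes earlier
```

So an explosion we see now on the Sun is simultaneous with the click our clock made roughly eight minutes ago.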

The deeper problem is that other observers will not agree with us that the event on the Sun occurred when we say it did. The diagram below illustrates the problem. Let's assume that our spacetime obeys the special theory of relativity.

There are two light flashes that occur simultaneously, with Einstein at rest midway between them in this diagram.


The Minkowski diagram represents Einstein sitting still in the reference frame (marked by the coordinate system with the thick black axes) while Lorentz is not sitting still but is traveling rapidly away from him and toward the source of flash 2. Because Lorentz's worldline is a straight line, we can tell that he is moving at a constant speed. The two flashes of light arrive at Einstein's location simultaneously, creating spacetime event B. However, Lorentz sees flash 2 before flash 1. That is, the event A of Lorentz seeing flash 2 occurs before event C of Lorentz seeing flash 1. So, Einstein will readily say the flashes are simultaneous, but Lorentz will have to do some computing to figure out that the flashes are simultaneous in the Einstein frame because they won't "look" simultaneous to Lorentz. However, if we'd chosen a different reference frame from the one above, one in which Lorentz is not moving but Einstein is, then Lorentz would be correct to say flash 2 occurs before flash 1 in that new frame. So, whether the flashes are or are not simultaneous depends on which reference frame is used in making the judgment. It's all relative.

13. What Is the Conventionality of Simultaneity?

The relativity of simultaneity is philosophically less controversial than the conventionality of simultaneity. To appreciate the difference, consider what is involved in making a determination regarding simultaneity. Given two events that happen essentially at the same place, physicists assume they can tell by direct observation whether the events happened simultaneously, assuming they are in a space obeying special relativity. If we don't see one of them happening first, then we say they happened simultaneously, and we assign them the same time coordinate. The determination of simultaneity is more difficult if the two happen at separate places, especially if they are very far apart. One way to measure (operationally define) simultaneity at a distance is to say that two events are simultaneous in a reference frame if unobstructed light signals from the two events would reach us simultaneously when we are midway between the two places where they occur, as judged in that frame. This is the operational definition of simultaneity used by Einstein in his theory of special relativity.

The "midway" method described above has a significant presumption: that the light beams travel at the same speed regardless of direction. Einstein, Reichenbach and Grünbaum have called this a reasonable "convention" because any attempt to experimentally confirm it presupposes that we already know how to determine simultaneity at a distance. This presupposition is about the conventionality, rather than relativity, of simultaneity. To pursue the issue of which event here is simultaneous with which event there, suppose the two original events are in each other's absolute elsewhere; that is, they could not have affected each other. Einstein noticed that there is no physical basis for judging the simultaneity or lack of simultaneity between these two events, and for that reason he said we rely on a convention when we define distant simultaneity as we do. Hilary Putnam, Michael Friedman, and Graham Nerlich object to calling it a convention—on the grounds that to make any other assumption about light's speed would unnecessarily complicate our description of nature, and we often make choices about how nature is on the basis of simplification of our description. They would say there is less conventionality in the choice than Einstein supposed.

The "midway" method is not the only way to define simultaneity. Consider a second method, the "mirror reflection" method. Select an Earth-based frame of reference, and send a flash of light from Earth to Mars where it hits a mirror and is reflected back to its source. The flash occurred at 12:00 according to an Earth clock, let's say, and its reflection arrived back on Earth 20 minutes later. The light traveled the same empty, undisturbed path coming and going. At what time did the light flash hit the mirror? The answer involves the so-called conventionality of simultaneity. All physicists agree one should say the reflection event occurred at 12:10. The controversial philosophical question is whether this is really a convention. Einstein pointed out that there would be no inconsistency in our saying that it hit the mirror at 12:17, provided we live with the awkward consequence that light was relatively slow getting to the mirror, but then traveled back to Earth at a faster speed.

Let's explore the reflection method that is used to synchronize a distant, stationary clock so that it reads the same time as our clock. Let's draw a Minkowski diagram of the situation and consider just one spatial dimension in which we are at location A with the standard clock for the reference frame. The distant clock we want to synchronize is at location B. See the following diagram.

conventionality of simultaneity graph

The fact that the timeline of the B-clock is parallel to the time axis shows that the clock there is stationary. We will send light signals in order to synchronize the two clocks. Send a light signal from A at time t1 to B, where it is reflected back to us, arriving at time t3. Then the reading tr on the distant clock at the time of the reflection event should be t2, where

t2 = (1/2)(t3 + t1).

If tr = t2, then the two clocks are synchronized.

Einstein noticed that the use of "(1/2)" in the equation t2 = (1/2)(t3 + t1), rather than some other fraction, implicitly assumes that the light speed to and from B is the same. He said this assumption is a convention, the so-called conventionality of simultaneity, and is not something we could check for correctness. If t2 were instead t1 + (1/3)(t3 - t1), then the light would travel to B faster and return more slowly. If t2 were t1 + (2/3)(t3 - t1), then the light would travel to B relatively slowly and return faster. Either way, the average travel speed to and from would be c. Only with the fraction (1/2) are the travel speeds the same going and coming back.
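To see how the choice of fraction fixes the implied one-way speeds while leaving the measured round-trip speed untouched, here is a small illustrative sketch. It uses the equivalent form t2 = t1 + eps*(t3 - t1), where eps = 1/2 recovers Einstein's convention; the distance chosen is one light-second, purely for readability:

```python
# The round-trip time t3 - t1 is measurable; the split between the outbound
# and return legs is fixed only by the choice of eps. Values are illustrative.

C = 299_792_458.0   # speed of light, m/s
d = C               # a distance of one light-second

t1 = 0.0
t3 = 2 * d / C      # measured round-trip time: 2 seconds

for eps in (1/3, 1/2, 2/3):
    t2 = t1 + eps * (t3 - t1)          # assigned reflection time at B
    out_speed = d / (t2 - t1)          # implied one-way speed A -> B
    back_speed = d / (t3 - t2)         # implied one-way speed B -> A
    round_trip = 2 * d / (t3 - t1)     # always c, whatever eps is chosen
    print(eps, out_speed / C, back_speed / C, round_trip / C)
```

Only eps = 1/2 makes the two one-way speeds equal; every choice leaves the round-trip average at c, which is why no round-trip measurement can decide between them.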

Notice how we would check whether the two light speeds really are the same. We would send a light signal from A to B, and see if the travel time was the same as when we sent it from B to A. But to trust these times we would already need to have synchronized the clocks at A and B. But that synchronization process will use the equation t2 = (1/2)(t3 + t1), with the (1/2) again, so we are arguing in a circle here.

Not all philosophers of science agree with Einstein that the choice of (1/2) is a convention, nor with those philosophers such as Putnam who say the messiness of any other choice shows that the choice must be correct. Everyone agrees, though, that any other choice than (1/2) would make for messy physics.

However, some researchers suggest that there is a way to check on the light speeds and not simply presume they are the same. Transport one of the clocks to B at an infinitesimal speed. Going this slow, the clock will arrive at B without having its proper time deviate from that of the A-clock. That is, the two clocks will be synchronized even though they are distant from each other. Now the two clocks can be used to find the time when a light signal left A and the time when it arrived at B. The time difference can be used to compute the light speed. This speed can be compared with the speed computed for a signal that left B and then arrived at A. The experiment has never been performed, but the recommenders are sure that the speeds to and from will turn out to be identical, so they are sure that the (1/2) in the equation t2 = (1/2)(t3 + t1) is correct and not a convention. For more discussion of this controversial issue of conventionality in relativity, see pp. 179-184 of The Blackwell Guide to the Philosophy of Science, edited by Peter Machamer and Michael Silberstein, Blackwell Publishers, Inc., 2002.

14. What Is the Difference between the Past and the Absolute Past?

What does it mean to say the human condition is one in which you never will be able to affect an event outside your forward light cone? With any action you take, the speed of transmission of your action to its effect cannot move faster than c. This c is the c in E = mc2. It is the maximum speed in any reference frame. It is the speed of light and the speed of anything else with zero rest mass; it is also the speed of any electron or quark at the big bang before the Higgs field appeared and slowed them down.

Here is a visual representation of the human condition according to the special theory of relativity, whose spacetime can always be represented by a Minkowski diagram of the following sort:


The absolutely past events (the green events in the diagram above) are the events in or on the backward light cone of your present event, your here-and-now. The backward light cone of event Q is the imaginary cone-shaped surface of spacetime points formed by the paths of all light rays reaching Q from the past.

The events in your absolute past are those that could have directly or indirectly affected you, the observer, at the present moment. The events in your absolute future are those that you could directly or indirectly affect.

An event's being in another event's absolute past is a feature of spacetime itself, because the first event is in the second event's past in all possible reference frames. The feature is frame-independent. For any event in your absolute past, every observer in the universe (who isn't making an error) will agree the event happened in your past. Not so for events that are in your past but not in your absolute past. Past events not in your absolute past will be in what Eddington called your "absolute elsewhere." The absolute elsewhere is the region of spacetime containing events that are not causally connectible to your here-and-now. Your absolute elsewhere is the region of spacetime that is neither in nor on either your forward or backward light cones. No event here and now can affect any event in your absolute elsewhere, and no event in your absolute elsewhere can affect you here and now.

A single point's absolute elsewhere, absolute future, and absolute past partition all of spacetime beyond the point into three disjoint regions. If point A is in point B's absolute elsewhere, the two events are said to be "spacelike related." If the two are in each other's forward or backward light cones, they are said to be "timelike related" or "causally connectible." The order of occurrence of two spacelike-related events depends on the chosen frame of reference, but the order of timelike-related events does not.

The past light cone looks like a cone when the region we are interested in is relatively small. However, the past light cone is not cone-shaped at the cosmological level but has a pear-shape because all very ancient light lines must have originated from the infinitesimal volume at the big bang.

15. What is Time Dilation?

Time dilation is about two synchronized clocks getting out of synchrony due either to their relative motion or due to their being in regions of different gravitational field strengths. Time dilation due to difference in speeds is described by Einstein's special theory of relativity. Time dilation due to difference in acceleration or difference in travel through a varying gravitational field is described by Einstein's general theory of relativity.  This section focuses on just the time dilation described by special relativity, namely time dilation due to speed.

According to special relativity, two properly functioning, stationary clocks, once properly synchronized, will stay in synchrony no matter how far away from each other they are. But if one clock moves and the other does not, then the moving clock will tick slower than the stationary clock, as measured in the inertial reference frame of the stationary clock. This slowing due to motion is called "time dilation." If you move at 99% of the speed of light, then your time slows by a factor of about 7 relative to stationary clocks. Your mass also increases by this factor, and you are about 7 times thinner along your direction of motion than you were when stationary. If you move at 99.9% of the speed of light, then your aging process slows by a factor of about 22 compared to your twin who remains stationary.
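The factors of roughly 7 and 22 quoted above come from the Lorentz factor, gamma = 1/sqrt(1 - v^2/c^2). A quick sketch (the function name is my own):

```python
import math

# Lorentz factor gamma = 1/sqrt(1 - v^2/c^2), which sets the time-dilation,
# length-contraction, and relativistic-mass factors quoted in the text.

def gamma(beta):
    """beta = v/c, the speed expressed as a fraction of the speed of light."""
    return 1.0 / math.sqrt(1.0 - beta**2)

print(round(gamma(0.99), 1))    # 7.1: clocks at 0.99c run about 7x slow
print(round(gamma(0.999), 1))   # 22.4: at 0.999c the factor is about 22
```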

Suppose your twin sister's spaceship travels to and from a star one light year away while you remain on Earth. It takes light from your Earth-based flashlight two years to go there, reflect in a mirror and arrive back on Earth, so your clock will say your sister's trip took longer than two years or longer than the travel time for light. But if the spaceship is fast, she can make the trip in less than two years, according to her own clock.

We sometimes speak of time dilation by saying time itself is "slower." Time is not going slower in any absolute sense, only relative to some other frame of reference.

According to special relativity, time dilation due to motion is relative in the sense that if your spaceship moves at constant speed past mine so fast that I measure your clock to be running at half speed, then you will measure my clock to be running at half speed also. This is not contradictory because we are making our measurements in different inertial frames. If one of us is affected by different gravitational field strengths or undergoes acceleration, then that person is not in an inertial frame and the results are slightly different, and general relativity is needed to describe what happens.

Both these types of time dilation play a significant role in time-sensitive satellite navigation systems such as the Global Positioning System. The atomic clocks on the satellites must be programmed to compensate for the relativistic dilation effects of both gravity and motion.

According to special relativity, if you are at the center of the field of a large sports stadium filled with spectators, and you suddenly race to the exit door at constant, high speed, everyone else in the stadium will get thinner (in the frame fixed to you now) than they were originally (in the frame fixed to you before you left for the exit).

Here is a picture of the visual distortion of moving objects described by special relativity:

rolling wheel
Image: Corvin Zahn, Institute of Physics, Universität Hildesheim,
Space Time Travel

The picture describes the same wheel in different colors: rotating in place just below the speed of light (green), moving left to right just below the speed of light (blue), and sitting still (red).

16. How Does Gravity Affect Time?

Spacetime in the presence of masses is curved, so time is warped, too. The light cone structure of spacetime is affected by gravity, and its description requires the general theory of relativity rather than the special theory. According to the general theory, the warping of spacetime just is gravity, and vice versa. So, when spacetime curves, the light cones tilt, and that is why time is warped, too.

According to general relativity, gravitational differences affect time by dilating it. Observers in a less intense gravitational potential find that clocks in a more intense gravitational potential run slow relative to their own clocks. People in ground floor apartments outlive their twins in penthouses, all other things being equal. Basement flashlights will be shifted toward the red end of the visible spectrum compared to the flashlights in attics. This effect is known as the gravitational red shift.
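The basement-versus-penthouse effect can be estimated with the standard weak-field approximation, in which a clock raised by height h runs fast by the fraction g*h/c^2. The 100-meter height difference below is an illustrative choice, not a figure from the text:

```python
# A rough sketch of gravitational time dilation near Earth's surface, using
# the weak-field approximation: fractional rate difference ~ g*h/c^2.

G_SURFACE = 9.81            # Earth's surface gravity, m/s^2
C = 299_792_458.0           # speed of light, m/s
SECONDS_PER_YEAR = 3.156e7

def rate_difference(height_m):
    """Fractional amount by which the higher clock runs fast."""
    return G_SURFACE * height_m / C**2

h = 100.0  # a penthouse 100 m above the ground floor (illustrative)
frac = rate_difference(h)
print(frac)                         # ~1.1e-14
print(frac * SECONDS_PER_YEAR)      # ~0.3 microseconds gained per year upstairs
```

The effect is tiny but real, and it is the same physics behind the gravitational red shift mentioned above.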

Informally, one speaks of gravity bending light rays around massive objects, but more accurately it is spacetime that bends, and as a consequence the light's path is bent, too. The light simply follows the straightest possible path through spacetime (called a geodesic), and when spacetime curves, these geodesics are no longer Euclidean straight lines.

17. What Happens to Time at a Black Hole?

A black hole is a compressed body of matter-energy with a very high gravitational field strength that constitutes such a severe warp in spacetime that particles inside can never escape the hole, regardless of how energetic they are. Black holes form when large stars exhaust their fuel and collapse into an infinitesimal volume. The surface of no return around the black hole where gravity is so strong that nothing from within can escape outward is called the black hole's "event horizon." Even a light wave cannot escape.

That is the majority opinion regarding black holes. A less common position is that they have no inside. It is not that there is a vacuum inside, but that the inside is non-existent. However, let's continue our discussion assuming black holes do have an inside.

General relativity implies that black holes are black, but when quantum theory is taken into account, the holes cannot be black because the event horizon continually glows with Hawking radiation. This Hawking radiation is continually being created out of the vacuum via pair production of particles and their antiparticles. The gravitational force of the black hole tugs on one of the two particles pulling it into the black hole, while the other particle of the pair radiates away, making the black hole bright, not black. For large black holes, this outward Hawking radiation is not especially dangerous, but the smaller the black hole the higher the frequency of the Hawking radiation. Each particle entering the black hole enters with negative gravitational potential energy, and this is equivalent to a negative mass that reduces the size of the black hole. Eventually it will evaporate and disappear.
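The claim that smaller black holes radiate at higher frequencies follows from Hawking's temperature formula, T = hbar*c^3 / (8*pi*G*M*k_B), in which temperature is inversely proportional to mass. This sketch (not from the text) evaluates it for a solar-mass hole:

```python
import math

# Hawking temperature T = hbar * c^3 / (8 * pi * G * M * k_B).
# CODATA-style constants; the solar mass is an illustrative test value.

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
C = 299_792_458.0        # speed of light, m/s
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.380649e-23       # Boltzmann constant, J/K
SOLAR_MASS = 1.989e30    # kg

def hawking_temperature(mass_kg):
    """Temperature of a black hole's Hawking radiation, in kelvin."""
    return HBAR * C**3 / (8 * math.pi * G * mass_kg * K_B)

print(hawking_temperature(SOLAR_MASS))        # ~6e-8 K: utterly cold
print(hawking_temperature(SOLAR_MASS / 1e9))  # a billion times lighter, a billion times hotter
```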

If you on Earth were to judge the clocks of some astronauts falling toward the event horizon, the clocks would suffer significant time dilation. Suppose you are an astronaut who orbits very near a black hole but does not enter the horizon. Observers far away will see you live in slow motion compared to them—if they can see you. Any light from you that is sent back to Earth will undergo severe gravitational red shift.

If you were to return home to Earth after being near the black hole, you would discover that you were younger than your Earth-bound twin. Your initially synchronized clocks would show that yours had fallen behind. The reason for the loss of synchrony is that, when you approached the vicinity of the black hole, you experienced gravitational time dilation. This dilation or time warp would be more severe the longer you stayed in the vicinity and also the closer you got to the event horizon.

The center of our galaxy contains a black hole about three to four million times as massive as our sun. Particles falling toward its horizon become highly accelerated, causing them to radiate x-rays. So, any person falling toward the black hole might be destroyed by the x-rays.

The first black hole was directly detected in 2015 from clear evidence that two of them had collided. Black holes were predicted to exist in 1915 by considering the implications of Einstein's new, general theory of relativity. In the classical black hole implied by general relativity, there must be a central singularity where the mass density and spacetime curvature are infinite. However, most twenty-first century astrophysicists believe that a better description of black holes will eventually include a theory of quantum gravity that will imply the center is dense but not infinitely dense, so there is not a real singularity.

The kind of singularity we are talking about that may or may not exist within a black hole is a physically real singularity, unlike the coordinate singularity we notice by asking for the longitude of the North Pole. The singularity of meridians can be removed by changing to a different coordinate system. The singularity in a black hole, if it exists, cannot be removed this way.

One odd feature of a black hole is that, if it is turning or twisting, then inside the event horizon there inevitably will be objects whose worldlines are closed time-like curves, and so these objects undergo past time travel.

Perhaps an even odder feature is that it is better to think of a person or object, once it arrives inside the black hole's event horizon, as aging toward the singularity, or its quantum mechanical equivalent, rather than as falling toward the singularity, because inside the horizon the roles of time and space are switched. The singularity is not a place in space; it is a moment when time ends. Trying to avoid the singularity by switching on your rockets will be as pointless as getting in a car here on Earth and driving fast in order to avoid tomorrow afternoon.

There are equally startling visual effects. A light ray can circle a black hole once or many times depending upon its angle of incidence. A light ray grazing a black hole can leave at any angle, so a person viewing a black hole can see multiple copies of the rest of the universe at various angles.

18. What is the Solution to the Twin Paradox?

This paradox is also called the clock paradox and the twins paradox. It is an argument about time dilation that uses the special theory of relativity to produce a contradiction. Consider two twins at rest on Earth with their clocks synchronized. One twin climbs into a spaceship and flies far away at a high, constant speed, then reverses course and flies back at the same speed. An application of the equations of special relativity theory implies that because of time dilation the twin on the spaceship will return and be younger than the Earth-based twin. The argument for the twin paradox is that it is all relative. That is, either twin could regard the other as the traveler. Let's consider the spaceship to be stationary. Wouldn’t relativity theory then imply that the Earth-based twin could race off (while attached to the Earth) and return to be the younger of the two twins? If so, then when the twins reunite, each will be younger than the other. That is paradoxical.

Herbert Dingle famously argued in the 1960s that this paradox reveals an inconsistency in special relativity. Almost all philosophers and scientists now agree that it is not a true paradox, in the sense of revealing an inconsistency within relativity theory, but is merely a complex puzzle that can be adequately solved within relativity theory, although there has been serious dispute about whether the solution can occur in special relativity or only in general relativity. Those who say the resolution of the twin paradox requires only special relativity are a small minority. Einstein said the solution to the paradox requires general relativity. Max Born said, "the clock paradox is due to a false application of the special theory of relativity, namely, to a case in which the methods of the general theory should be applied." In 1921, Wolfgang Pauli said, “Of course, a complete explanation of the problem can only be given within the framework of the general theory of relativity.”

There have been a variety of suggestions in the relativity textbooks on how to understand the paradox. Here is one, diagrammed below.

twin paradox

The principal suggestion for solving the paradox is to apply general relativity and then note that there must be a difference in the proper time taken by the twins because their behavior is different, as shown in their two world lines above. The coordinate time, that is, the time shown by clocks fixed in space in the coordinate system, is the same for both travelers. Their proper times are not the same. The longer line in the graph represents a longer path in space but a shorter duration of proper time. The length of the line representing the traveler's path in spacetime in the above diagram is not a correct measure of the traveler's proper time. Instead, the number of dots in the line is a measure of the proper time for the traveler. The spacing of the dots represents a tick of a clock in that world line and thus represents the proper time elapsed along the world line. The diagram shows how sitting still on Earth is a way of maximizing the proper time during the trip, and it shows how flying near light speed in a spaceship away from Earth and then back again is a way of minimizing the proper time, even though if you paid attention only to the shape of the world lines and not to the dot spacing within them you might think just the reverse. This odd feature of the geometry is why Minkowski geometry is not Euclidean. So, the conclusion of the analysis of the paradox is that its reasoning makes the mistake of supposing that the situation of the two twins is the same as far as elapsed proper time is concerned.
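The dot-counting idea amounts to computing proper time along each world line; for a leg traveled at constant speed, tau = t * sqrt(1 - v^2/c^2). This sketch uses illustrative numbers (a 0.6c trip to a star 3 light-years away), not figures read off the diagram:

```python
import math

# Proper time along each twin's world line for a constant-speed
# out-and-back trip. Speeds and distances are illustrative.

def proper_time(coordinate_time, beta):
    """Proper time elapsed on a clock moving at beta = v/c."""
    return coordinate_time * math.sqrt(1.0 - beta**2)

beta = 0.6
distance_ly = 3.0
earth_time = 2 * distance_ly / beta      # round trip in Earth-frame years

print(earth_time)                         # 10 years on the Earth twin's clock
print(proper_time(earth_time, beta))      # 8 years on the traveling twin's clock
print(proper_time(earth_time, 0.0))       # the stay-at-home twin maximizes proper time
```

The Earthbound twin's world line accumulates the full 10 years; the traveler's world line, despite looking longer on the diagram, accumulates only 8.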

A second way to understand the twin paradox—this is not a different solution, just a different description of the solution—is to note that each twin can consider the other twin to be the one who moves, but their two experiences will still be very different because their situations are not symmetric. Regardless of which twin is considered to be stationary, only one twin feels the acceleration at the turnaround point, so it should not be surprising that the two situations have different implications about time. And when the gravitational fields are taken into consideration, the equations of general relativity do imply that the younger twin is the one who feels the acceleration. However, the force felt by the spaceship twin is not what "forces" that twin to be younger. Nothing is forcing the twin to be younger just as nothing is forcing there to be the same speed limit in every reference frame.

A third suggestion for how to understand the paradox is to say that only the Earthbound twin can move at a constant velocity in a single inertial frame. If the spaceship twin is to be considered in an inertial frame and moving at a constant velocity, as required by special relativity, then there must be a different frame for the Earthbound twin's return trip than the frame for the outgoing trip. But changing frames in the middle of the presentation is an improper equivocation and shows that the argument of the paradox breaks down. In short, both twins' motions cannot always be inertial.

These three resolutions of the paradox—which are really variants of the same solution—tend to leave many people unsatisfied, probably because they think of the following situation. Consider a universe that is empty except for two clocks that move away from each other and then back again with constant speed. Isn't that like the situation in the clock paradox? The time shown on each clock can’t be less than what is shown on the other, can it? If we remove the stars and planets and other material from the universe and simply have two twins, isn't it clear that it would be inappropriate to say "there is an observable difference" due to one twin feeling an acceleration while the other does not? Won't both twins feel the same forces, and wouldn't relativity theory be incorrect if it implied that one twin returned to be younger than the other? Therefore, why does attaching the Earth to one of the twins force that twin to be the older one upon reunion?

OK, that is the argument of the paradox. The resolution requires appealing to general relativity. Notice that it is not just the Earth that is attached to the one twin. It is the Earth in tandem with all the planets and stars. Because of the movement of all this mass, the turnaround isn't felt by the Earthbound twin who moves in tandem with this great mass, but is felt very clearly by the spaceship twin. So, regardless of which twin is considered to be at rest, it is only the spaceship twin who feels any acceleration. Explaining this failure of the Earthbound twin to feel the force at the turnaround when the spaceship twin is at rest shows that a solution to the paradox ultimately requires a theory of the origin of inertia. But the point remains that the asymmetry in the experience of the two twins accounts for the aging difference and for the error in the argument of the twin paradox.

19. What Is the Solution to Zeno's Paradoxes?

See the article "Zeno's Paradoxes" in this encyclopedia.

20. How Do Time Coordinates Get Assigned to Points of Spacetime?

This question is asking how we coordinatize the four-dimensional manifold. The manifold is a collection of points (technically, a topological space) which behaves as a Euclidean space in neighborhoods around any point. Coordinates applied to the space are not physically real; they are tools used by the analyst, the physicist. In other words, they are invented, not discovered.

In the late 16th century, the Italian mathematician Rafael Bombelli interpreted real numbers as lengths on a line and interpreted addition, subtraction, multiplication, and division as movements along the line. This work eventually led to our assigning real numbers to both instants and durations.

To justify the assignment of time numbers (called time coordinates or dates or clock readings) to instants, we cannot literally paste a number to an instant. What we do instead is show that the structure of the set of instantaneous events is the same as, or embeddable within, the structure of our time numbers. The structure of our time numbers is the structure of real numbers. Showing that this is so is called "solving the representation problem" for our theory of time measurement. We won't go into detail on how to solve this problem, but the main idea is that to measure any space, including a one-dimensional space of time, we need a metrification for the space. The metrification assigns location coordinates to all points and assigns distances between all pairs of points. The method of assigning these distances is called the “metric” for the space. A metrification for time assigns dates to the points we call instants of time; these assignments are called time coordinates. Normally we use a clock to do this. Point instants get assigned a unique real number coordinate, and the metric or duration between any two of those point instants is found by subtracting their time coordinates from each other. The duration is the absolute value of the numerical difference of their coordinates, that is |t(B) - t(A)| where t(B) is the time coordinate of event B and t(A) is the time coordinate of A.
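The duration rule just stated, |t(B) - t(A)|, can be written directly. The event names and coordinate values here are illustrative:

```python
# Minimal sketch of the duration metric described in the text: once instants
# carry real-number time coordinates, the duration between two events is the
# absolute difference of their coordinates.

def duration(t_a, t_b):
    """Duration between events with time coordinates t_a and t_b."""
    return abs(t_b - t_a)

coords = {"A": 17.0, "B": 101.3}
print(duration(coords["A"], coords["B"]))   # ~84.3
print(duration(coords["B"], coords["A"]))   # the same: order does not matter
```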

Let's reconsider the question of metrification in more detail, starting with the assignment of locations to points. Any space is a collection of points. In a space that is supposed to be time, these points are the instants, and the space for time is presumably linear (since presumably time is one-dimensional). Before discussing time coordinates specifically, let's consider what is meant by assigning coordinates to a mathematical space, one that might represent either physical space, or physical time, or spacetime, or the two-dimensional mathematical space in which we graph the relationship between the price of rice in China and the wholesale price of tulips in Holland. In a one-dimensional space, such as a curving line, we assign unique coordinate numbers to points along the line, and we make sure that no point fails to have a coordinate. For a two-dimensional space, we assign ordered pairs of numbers to points. For a 3D space, we assign ordered triplets of numbers. Why numbers and not letters? If we assign letters instead of numbers, we cannot use the tools of mathematics to describe the space. But even if we do assign numbers, we cannot assign any coordinate numbers we please. There are restrictions. If the space has a certain geometry, then we have to assign numbers that reflect this geometry. If event A occurs before event B, then the time coordinate of event A, namely t(A), must be less than t(B). If event B occurs after event A but before event C, then we should assign coordinates so that t(A) < t(B) < t(C). Here is the fundamental method of this analytic geometry:

Consider a space as a class of fundamental entities: points. The class of points has "structure" imposed upon it, constituting it as a geometry—say the full structure of space as described by Euclidean geometry. [By assigning coordinates] we associate another class of entities with the class of points, for example a class of ordered n-tuples of real numbers [for a n-dimensional space], and by means of this "mapping" associate structural features of the space described by the geometry with structural features generated by the relations that may hold among the new class of entities—say functional relations among the reals. We can then study the geometry by studying, instead, the structure of the new associated system [of coordinates]. (Sklar, 1976, p. 28)

The goal in assigning coordinates to a space is to create a reference system for the space. A reference system is a reference frame plus (or including [the literature is ambiguous on this point]) a coordinate system. For 4-d spacetime obeying special relativity with its Lorentzian geometry, a coordinate system is a grid of smooth timelike and spacelike curves on the spacetime that assigns to each point three space coordinate numbers and one time coordinate number. No two distinct points can have the same set of four coordinate numbers.

As we get more global, we have to make adjustments. Consider two coordinate systems on adjacent regions. For the adjacent regions, we make sure that the "edges" of the two coordinate systems match up in the sense that each point near the intersection of the two coordinate systems gets a unique set of four coordinates and that nearby points get nearby coordinate numbers. The result is an "atlas" on spacetime. Inertial frames can have global coordinate systems, but in general we have to make do with atlases. If we are working with general relativity, where spacetime can curve and we cannot assume inertial frames, then the best we can do is to assign a coordinate system to a small region of spacetime where the laws of special relativity hold to a good approximation. General relativity requires special relativity to hold locally, and thus for spacetime to be flat locally. That means that locally the 4-d spacetime is correctly described by the Minkowskian geometry of special relativity.

For small regions of curved spacetime, we create a coordinate system for a small region, a chart of the atlas, by choosing a style of grid, say rectangular coordinates, fixing a point as being the origin, selecting one timelike and three spacelike lines to be the axes, and defining a unit of distance for each dimension. We cannot use letters for coordinates. The alphabet's structure is too simple. Integers won't do either; but real numbers are adequate to the task. The definition of "coordinate system" requires us to assign our real numbers in such a way that numerical betweenness among the coordinate numbers reflects the betweenness relation among points. For example, if we assign numbers 17, pi, and 101.3 to instants, then every interval of time that contains the pi instant and the 101.3 instant had better contain the 17 instant. When this feature holds everywhere, the coordinate assignment is said to be "monotonic" or to "obey the continuity requirement."
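The monotonicity requirement can be checked mechanically. Here is a minimal Python sketch (the function name and the toy instants are my own, invented for illustration) that tests whether a coordinate assignment preserves the betweenness ordering of points along a line:

```python
import math

def preserves_betweenness(points_in_order, coord):
    """Check that a coordinate assignment is monotonic: the coordinates
    must be strictly increasing (or strictly decreasing) along the given
    ordering of points, so that numerical betweenness among coordinates
    mirrors betweenness among the points themselves."""
    cs = [coord[p] for p in points_in_order]
    increasing = all(a < b for a, b in zip(cs, cs[1:]))
    decreasing = all(a > b for a, b in zip(cs, cs[1:]))
    return increasing or decreasing

# Instants ordered earlier-to-later.  Since pi < 17 < 101.3, the middle
# instant gets the middle coordinate, so the assignment is monotonic:
ok = preserves_betweenness(["A", "B", "C"], {"A": math.pi, "B": 17, "C": 101.3})
# Swapping two coordinates breaks betweenness:
bad = preserves_betweenness(["A", "B", "C"], {"A": 17, "B": math.pi, "C": 101.3})
print(ok, bad)   # True False
```

A strictly decreasing assignment also counts as monotonic, since running the coordinate numbers "backward" still preserves betweenness.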

Because mathematical spaces, unlike physical spaces, are used for so many different purposes, the unit of distance measurement might be a meter or a second or a price in yen, depending on the space we are working with. The metric for a space is what specifies what is meant by distance in that space.

The natural metric between any two points in a one-dimensional space, such as the time sub-space of our spacetime, is the absolute value of the numerical difference between the coordinates of the two points. Using this metric for time, the duration between an event with the time coordinate 11 and the event with coordinate 7 is 4. The metric for spacetime defines the spacetime interval between two spacetime locations, and it is more complicated than the metric for time alone, as we have discussed elsewhere in this Supplement and the main time article. The spacetime interval between any two events is invariant or unchanged by a change to any other reference frame. More accurately, in the general theory, only the infinitesimal spacetime interval between two neighboring points is invariant.
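These two metrics can be put side by side in code. The following Python sketch (the pair of events and the boost velocity are illustrative values of my own) computes the one-dimensional time metric and then checks that the 1+1-dimensional spacetime interval is unchanged by a Lorentz boost:

```python
import math

C = 299_792_458.0  # speed of light in a vacuum, m/s

def duration(t1, t2):
    """One-dimensional time metric: absolute coordinate difference."""
    return abs(t2 - t1)

def interval_squared(dt, dx):
    """Squared spacetime interval in 1+1 dimensions, (+,-) signature."""
    return (C * dt) ** 2 - dx ** 2

def boost(dt, dx, v):
    """Lorentz-transform coordinate differences into a frame moving at v."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (dt - v * dx / C ** 2), gamma * (dx - v * dt)

print(duration(2, 13))        # 11 (absolute difference of the coordinates)

dt, dx = 2.0, 1.0e8           # two events: 2 s and 100,000 km apart
dt2, dx2 = boost(dt, dx, 0.6 * C)
print(math.isclose(interval_squared(dt, dx),
                   interval_squared(dt2, dx2)))   # True
```

The sign convention for the interval varies by author; the invariance under the boost holds either way.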

There are still other restrictions on the assignments of coordinate numbers. The restriction that we called the "conventionality of simultaneity" fixes what time-slices of spacetime can be counted as collections of simultaneous events. An even more complicated restriction is that coordinate assignments satisfy the demands of general relativity. The metric of spacetime in general relativity is not global but varies from place to place due to the presence of matter and gravitation. Spacetime cannot be given its coordinate numbers without our knowing the distribution of matter and energy.

The features that a space has without its points being assigned any coordinates whatsoever are its topological features, its differential structure, and its affine structure. The topological features include its dimensionality, whether it goes on forever or has a boundary, and how many points there are. The affine structure is about lines, the geodesics. The differential structure is what allows us to use the differential calculus on the manifold.

21. How Do Dates Get Assigned to Actual Events?

Ideally for any reference frame we would like to partition the set of all actual events into simultaneity equivalence classes by some reliable method. All events in the same equivalence class are said to happen at the same time in the frame, and every event is in some class or other. Consider what event near the supergiant star Betelgeuse is happening now, that is, simultaneously with our present moment. To answer this difficult question, let's begin with some easier questions.

What is happening at time zero in our coordinate system? There is no way to select one point of spacetime and call it the origin of the coordinate system except by reference to actual events that can be measured. In practice, we make the origin be the location of a special event. For some frames we use the big bang. For another frame we might use the birth of Jesus or Muhammad's migration from Mecca to Medina. For a spherical coordinate system on the surface of Earth that will serve as a spatial coordinate system and not a spacetime coordinate system, we very often use the location of an observatory in Greenwich, England.

Our purpose in choosing a coordinate system or atlas is to express relationships among actual and possible events. The time relationships we are interested in are time-order relationships (Did this event occur between those two?) and magnitude-duration relationships (How long after A did B occur?) and date-time relationships (When did event A itself occur?). The date of a (point) event is the time coordinate number of the spacetime location where the event occurs. We expect all these assignments of dates to events to satisfy the requirement that event A happens before event B iff t(A) < t(B), where t(A) is the time coordinate of A, namely its date. The assignments of dates to events also must satisfy the demands of our physical theories, and here we face serious problems of inconsistency, as when a geologist gives one date for the birth of Earth, an astronomer gives a different date, and a theologian gives yet another date.

It is a big step from assigning numbers to points of spacetime to assigning them to real events. Here are some of the questions that need answers. How do we determine whether a nearby event and a very distant event occurred simultaneously? Assuming we want a second to be the standard unit for measuring the time interval between two events, how do we operationally define the second so we can measure whether one event occurred exactly one second later than another event? A related question is: How do we know whether the clock we have is accurate? Less fundamentally, attention must also be paid to the dependence of dates on shifting from Standard Time to Daylight Saving Time, on crossing the International Date Line, on switching from the Julian to the Gregorian Calendar, and on comparing regular years with leap years.

Let's design a coordinate system for time. Suppose we have already assigned a date of zero to the event that we choose to be at the origin of our coordinate system. To assign dates (that is, time coordinates) to other events, we first must define a standard clock and declare that the time intervals between any two consecutive ticks of that clock are the same. The second is our conventional unit of time measurement, and it will be defined to be so many ticks of the standard clock. We then synchronize other clocks with the standard clock so the clocks show equal readings at the same time, when they are all relatively stationary and also not affected differently by gravitational fields. The time or date at which a point event occurs is the number reading on the clock at rest there. If there is no clock there, the assignment process is more complicated. One could transport a synchronized clock to that place, but any effects of the clock's speed and of gravitational fields during the transport will need to be compensated for. If the place is across the galaxy, then any transport is out of the question, and other means must be used.

Because we want to use clocks to assign a time (coordinate) even to very distant events, not just to events in the immediate vicinity of the clock, and because we want to do this correctly, some appreciation of Einstein's theory of relativity is required. A major difficulty is that two nearby synchronized clocks, namely clocks that have been calibrated and set to show the same time when they are next to each other, will not in general stay synchronized if one is transported somewhere else. If they undergo the same motions and gravitational influences, they will stay synchronized; otherwise, they won't. There is no privileged transportation process that we can appeal to. For more on how to assign dates to distant events, see the discussion of the relativity and conventionality of simultaneity.

As a practical matter, dates are assigned to events in a wide variety of ways. The date of the birth of the Sun is assigned very differently from dates assigned to two successive crests of a light wave in a laboratory laser. For example, there are lasers whose successive crests of visible light waves pass by a given location in the laboratory every 10⁻¹⁵ seconds. This short time isn't measured with a stopwatch. It is computed from measurements of the light's wavelength. We rely on electromagnetic theory for the equation connecting the periodic time of the wave to its wavelength and speed. Dates for other kinds of events, such as the birth of Muhammad or the origin of the Sun, are inferred from records and physical theory rather than directly measured with a clock.
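The computation mentioned here is just the wave relation T = λ/c. A Python sketch (the 500 nm wavelength is an illustrative choice for visible green light, not a value from the text):

```python
C = 299_792_458.0  # speed of light in a vacuum, m/s

def period_from_wavelength(wavelength_m, speed=C):
    """Period T of a wave, from T = wavelength / speed."""
    return wavelength_m / speed

# Visible light with a wavelength of about 500 nanometers:
T = period_from_wavelength(500e-9)
print(f"{T:.3e} s")   # 1.668e-15 s, i.e. on the order of 10**-15 seconds
```

So a measured wavelength, plus the known speed of light, dates successive wave crests without any stopwatch.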

22. What Is Essential to Being a Clock?

Clocks are physically observable objects that can be used to assign times to instants and to assign durations to longer events. A clock must tick, count those ticks, convert the count into a measure of time, and display the result. Not just any ticking will do. Ideally, the ticks need to be regular in the sense that the duration between any tick and the next tick is the same duration. The technical term is that the durations between ticks are congruent.

We need predictable, regular, cyclic behavior in order to measure time. In a pendulum clock, the cyclic behavior is the swings of the pendulum. In a digital clock, the cycles are oscillations in an electronic circuit. In a sundial, the cycles are daily movements of a shadow. The rotating Earth can be used as a clock that ticks once a day (in a reference frame in which the stars are approximately stationary), or it would be a clock if we were to add the feature that there is some counting of those ticks and displaying of the times. Similarly, the revolving Earth can be used as a clock that ticks once a year. A calendar is not a clock.

The count of a clock's ticks is normally converted and displayed in seconds—or in some other standard unit of time such as days or years. This counting can be difficult if there are very many ticks per second. Our standard atomic clock ticks 9,192,631,770 times per second.
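Converting a raw tick count into seconds is a single division by the definitional tick rate. A toy Python illustration:

```python
CESIUM_TICKS_PER_SECOND = 9_192_631_770  # the S.I. definition of the second

def ticks_to_seconds(ticks):
    """Convert a raw count of cesium-clock ticks into S.I. seconds."""
    return ticks / CESIUM_TICKS_PER_SECOND

print(ticks_to_seconds(9_192_631_770))     # 1.0
print(ticks_to_seconds(551_557_906_200))   # 60.0 (one minute's worth of ticks)
```

The counting hardware has to keep up with more than nine billion ticks per second, which is why the division is done electronically rather than by any display that shows individual ticks.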

It is an arbitrary convention that we design clocks to count up to higher numbers rather than down to lower numbers. It is also a convention that we re-set our clock by one hour as we move across a time-zone on the Earth's surface, or that we add leap days and leap seconds. However, it is no convention in a flat spacetime that in a reference frame the duration from instantaneous event A to instant B plus the duration from B to instant C is equal to the duration from A to C. It is one of the objective characteristics of time.

We like to use clocks that are in synchrony. That is, we want them to tick at the same rate when they are not moving relative to each other and are not differently affected by gravitation. This implies that, for any event, the two clocks agree on the duration of the event. However, we want more than synchrony. We also want our clocks to be accurate.

23. What Does It Mean for a Clock to be Accurate?

An accurate clock is a clock that is in synchrony with the standard clock. When the time measurements of the clock agree with the measurements made using the standard clock, we say the clock is accurate or properly calibrated or synchronized with the standard clock or simply correct. A perfectly accurate clock shows that it is time t just when the standard clock shows that it is time t, for all t. Accuracy is different from precision. If four clocks read exactly thirteen minutes slow compared to the standard clock, then the four are very precise, but they all are inaccurate by thirteen minutes.
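The distinction can be captured numerically: precision is the spread among the readings, accuracy is their offset from the standard. A small Python sketch (the clock readings, in seconds, are invented for illustration):

```python
from statistics import mean, pstdev

def precision_and_accuracy(readings, standard):
    """Spread of the readings (precision) and their mean offset from
    the standard clock's reading (inaccuracy), in the same units."""
    return pstdev(readings), mean(r - standard for r in readings)

# Four clocks each reading exactly thirteen minutes (780 s) behind
# the standard clock's reading of 1000 s:
spread, offset = precision_and_accuracy([220, 220, 220, 220], standard=1000)
print(spread, offset)   # 0.0 -780 : perfectly precise, badly inaccurate
```

Zero spread with a large offset is exactly the article's case: very precise, yet thirteen minutes inaccurate.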

If other clocks are synchronized with the standard clock when both are stationary, then those other clocks need to be used in a reference frame in which the standard clock is stationary, or else time dilation will make the clocks inaccurate.

One philosophical issue is whether the standard clock itself is accurate. Realists will say that the standard clock is our best guess as to what time it really is, and we can make incorrect choices for our standard clock. Anti-realists will say that the standard clock cannot, by definition, be inaccurate, so any choice of a standard clock, even the choice of the president's heartbeat as our standard clock, will yield a standard clock that is accurate. Leibniz would qualify as an anti-realist because he said the best we can do in setting our clocks is to place them in synchrony with each other. Newton would disagree and say that for the standard clock to be accurate it must tick in synchrony with time itself.

Let's reconsider the answer to the question: Can the time shown on a standard clock be inaccurate? There are a variety of answers to this question.

  1. Yes, because what we mean by accurate is the average of the many standard clocks, about 200 of them, and any single one could fail to be in sync with the average. That is why standard clocks get re-set every month.

So, let’s rephrase the question: Can the average time shown on our standard clocks be inaccurate?

  2. No, because the goal for accuracy is the best known clock, and that is just the average of our current standard clocks.
  3. Yes, because the goal for accuracy is absolute time.
  4. Yes, because the goal for accuracy is the best possible clock, and we don't know what that is yet; we would need to see into the future.

Of the four answers, most physicists prefer answer (2).

A clock measures its own proper time, namely the time along its own worldline. If the clock is in an inertial frame and not moving relative to the standard clock, then it measures the "coordinate time," the time we agree to use in the coordinate system. If the spacetime has no inertial frame, then that spacetime cannot have a single, global coordinate time.

Because clocks are intended to be used to measure events external to themselves, another goal in clock building is to ensure there is no difficulty in telling which clock tick is simultaneous with which events to be measured that are occurring away from the clock. For some situations and clocks, the sound made by the ticking helps us make this determination. We hear the tick just as we see the event occur that we desire to measure. Note that doing this presupposes that we can ignore the difference between the speed of sound and the speed of light.

In our discussion so far, we have assumed that the clock is very small, that it can count any part of a second, and that it can count high enough to provide information for our calendars and longer-term records. These aren't always good assumptions. Beyond those practical problems, there is a theoretical one: there is a physical limit to the shortest duration measurable by a given clock, because no clock can measure events whose duration is shorter than the time it takes light to travel between the components of that clock, the components in the part that generates the sequence of regular ticks. This theoretical limit places a lower bound on the error margin of any measurement made with that clock.

Every physical motion of every clock is subject to disturbances. So, to be an accurate clock, one that is in synchrony with the standard clock, we want our clock to be adjustable in case it drifts out of synchrony a bit. It helps to keep it isolated from environmental influences such as heat, dust, unusual electromagnetic fields, physical blows (such as dropping the clock), and immersion in the ocean. And it helps to be able to predict how much a specific influence affects the drift out of synchrony so that there can be an adjustment for this influence.

24. What Is Our Standard Clock?

In our earth-based reference frame, our standard clock is the clock that other motionless clocks are synchronized with. We want to select as our standard clock a clock that we can be reasonably confident will tick regularly in the sense that all periods between adjacent ticks are congruent (that is, the same duration). Choosing a standard clock that is based on the beats of the president's heart would be a poor choice because stationary clocks everywhere would go out of synchrony with the standard clock when the president goes jogging.

Most nations agree on what clock is the standard clock. The international time standard used by most nations is called Coordinated Universal Time [U.T.C., an abbreviation chosen as a compromise between the English and French word orders]. It is not based on only a single standard clock but rather on a large group of them. Here are more details about U.T.C. time.

Atomic Time or A.T. time is what is produced by a cesium-based atomic fountain clock that counts in seconds, where those seconds are the S.I. seconds or Système International seconds (in the International Systems of Units, that is, Le Système International d'Unités). The S.I. second is defined to be the time it takes for a motionless standard cesium atomic clock to emit exactly 9,192,631,770 cycles of radiation of a certain color of light that is emitted from the clock’s cloud of cesium 133 atoms. This frequency is very stable. (More details about this process are offered below. As physics research continues to improve time measurement, the standard use of the cesium clock is likely to be changed by convention to clocks with even more stable frequencies.)

Actually, for the more precise timekeeping, the T.A.I. time scale is used rather than the A.T. scale. The T.A.I. scale does not use merely a single standard cesium clock but rather a calculated average of the readings of about 200 official cesium atomic clocks that are distributed around the world in about fifty selected laboratories. One of those laboratories is the National Institute of Standards and Technology in Boulder, Colorado, U.S.A. This calculated average time is called T.A.I. time, the abbreviation of the French phrase for International Atomic Time. The International Bureau of Weights and Measures near Paris performs the averaging about once a month. If your laboratory in the T.A.I. system has sent in your clock's readings for certain events that occurred in the previous month, then the Bureau calculates the average of all the submitted readings and sends you a report of how far your clock deviated from that average, so you can make adjustments to your clock.
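The averaging-and-feedback loop can be sketched in a few lines of Python. The laboratory names and readings below are invented for illustration, and the real T.A.I. computation also weights clocks by their measured stability, which this sketch omits:

```python
from statistics import mean

def ensemble_average_and_offsets(readings):
    """Average a set of laboratory clock readings of one event (a toy
    stand-in for the T.A.I. averaging) and report each lab's offset
    from that average, so each lab can steer its clock back toward
    the ensemble."""
    avg = mean(readings.values())
    return avg, {lab: r - avg for lab, r in readings.items()}

# Hypothetical readings, in seconds, of one event from three labs:
avg, offsets = ensemble_average_and_offsets(
    {"boulder": 100.000002, "paris": 99.999999, "tokyo": 99.999999})
print(round(avg, 6))                  # 100.0
print(round(offsets["boulder"], 6))   # 2e-06 (boulder reads 2 microseconds fast)
```

The report each lab receives corresponds to the `offsets` entry for its own clock.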

Coordinated Universal Time or U.T.C. time is T.A.I. time plus or minus some integral number of leap seconds. U.T.C. is, by agreement, the time at the Prime Meridian, the longitude that runs through Greenwich, England. The official government time is different in the time zones of different countries. In the U.S.A., for example, the government time is U.T.C. time minus the hourly offsets for the appropriate time zones of the U.S.A., including whether daylight saving time is observed. U.T.C. time is informally called Zulu Time, and it is the time used by the Internet and the aviation industry throughout the world.
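The arithmetic relating these scales is simple subtraction and addition. A Python sketch; the leap-second count of 37 is the value in force since the end of 2016 (it is revised as needed), and the zone function deliberately ignores the date rollover:

```python
def tai_to_utc(tai_seconds, leap_seconds=37):
    """U.T.C. lags T.A.I. by the accumulated leap seconds
    (37 as of the mid-2020s; the count changes over time)."""
    return tai_seconds - leap_seconds

def utc_to_zone_hour(utc_hour, zone_offset_hours):
    """Civil clock-hour in a zone: the U.T.C. hour plus the zone's
    offset, wrapped around the 24-hour dial (date change ignored)."""
    return (utc_hour + zone_offset_hours) % 24

print(tai_to_utc(1000))           # 963
print(utc_to_zone_hour(3, -7))    # 20, e.g. U.S. Mountain Daylight Time
```

Real timekeeping libraries also track exactly when each leap second was inserted; this sketch only shows the relationship among the scales.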

A.T. time, T.A.I. time, and U.T.C. time are not kinds of physical time but rather kinds of measurements of physical time. So, this is another reason why the word "time" is ambiguous; sometimes it means unmeasured time, and sometimes it means the measure of that time. Speakers rarely take care to say explicitly how they are using the term, so readers need to stay alert, even in the present Supplement and in the main Time article.

At the 13th General Conference on Weights and Measures in 1967, the definition of a second was changed from a certain fraction of a solar day to a specific number of periods of radiation produced by an atomic clock. This new second is the so-called "standard second" or S.I. second. It is now defined to be the duration of 9,192,631,770 periods (cycles, oscillations, vibrations) of a certain kind of microwave radiation emitted in the standard cesium atomic clock. More specifically, the second is defined to be the duration of 9,192,631,770 periods of the microwave radiation required to produce the maximum fluorescence of a small cloud of cesium 133 atoms (that is, their radiating a specific color of light) as an electron in the atom makes a transition between two specific hyperfine energy levels of the ground state of the atom. This is the internationally agreed upon unit for atomic time [the T.A.I. system]. The old astronomical system [Universal Time 1 or UT1] defined a second to be 1/86,400 of an average solar day.

For this "atomic time," or time measured atomically, the atoms of cesium are cooled to near absolute zero and given a uniform energy while trapped in an atomic fountain or optical lattice and irradiated with microwaves. The frequency of the microwave radiation is tuned until maximum fluorescence is achieved. That is, it is adjusted until the maximum number of cesium atoms flip from one energy level to another, showing that the microwave radiation frequency is precisely tuned to be 9,192,631,770 vibrations per second. Because this frequency for maximum fluorescence is so stable from one experiment to the next, the vibration number is accurate to this many significant digits.

The meter depends on the second, so time measurement is more basic than space measurement. It does not follow, though, that time is more basic than space. After 1983, scientists agreed that the best way to define and to measure the length between any two points A and B is via the number of periods of a light beam reaching from A to B. There are three reasons for this: (i) light propagation is very stable or regular: its speed is either constant, or, when it is not constant, as when light moves through water or moves at 38 miles per hour through a Bose-Einstein condensate, we know how to compensate for the influence of the medium; (ii) a light wave's frequency can also be made very stable; and (iii) time can be measured more accurately than distance, so defining distance in terms of time makes length measurement more accurate.

In 1983, the meter was redefined in terms of the (pre-defined) second as being the distance light travels in a vacuum in an inertial reference frame in exactly 1/299,792,458 seconds, that is, 0.000000003335640952 seconds. That number is picked by convention so that the new meter is very nearly the same distance as the old meter, which was once defined to be the distance between two specific marks on a platinum bar kept at the International Bureau of Weights and Measures near Paris.
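The definition can be checked by arithmetic: multiplying the defined speed by the defined time interval returns exactly one meter, by construction. A Python check (round-off aside):

```python
LIGHT_SPEED = 299_792_458  # meters per second, exact by definition

def light_distance(seconds):
    """Distance in meters that light travels in a vacuum in the given time."""
    return LIGHT_SPEED * seconds

# The meter: the distance light travels in 1/299,792,458 seconds.
one_meter = light_distance(1 / 299_792_458)
print(round(one_meter, 12))        # 1.0
print(f"{1 / 299_792_458:.9e}")    # 3.335640952e-09 (seconds per meter)
```

Because both numbers are fixed by convention, any experiment "measuring" the speed of light in a vacuum is really testing the rest of its own apparatus and assumptions.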

Time can be measured not only more accurately than distance but also more accurately than voltage, temperature, mass, or anything else.

One subtle implication of the standard definition of the second and the meter is that they fix the speed of light in a vacuum in all inertial frames. The speed is exactly one meter per 0.000000003335640952 seconds or 299,792,458 meters per second. There can no longer be any direct measurement to check whether that is how fast light really moves; it is defined to be moving that fast. Any measurement that produced a different value for the speed of light would be presumed initially to have an error. The error would be in, say, its measurements of lengths and durations, or in its assumptions about being in an inertial frame, or in its adjustments for the influence of gravitation and acceleration, or in its assumption that the light was moving in a vacuum. This initial presumption of where the error lies comes from a deep reliance by scientists on Einstein's theory of relativity. However, if it were eventually decided by the community of scientists that the speed of light shouldn't have been fixed as it was, then the scientists would call for a new world convention to re-define the second.

Leap years (with their leap days) are needed as adjustments to our calendar because the number of the Earth's rotations per revolution is not a whole number: a year lasts about 365.24 days. Without the adjustment, our calendar dates would slowly drift through the seasons. Leap seconds are needed for a different reason. The Earth's rotation is gradually slowing, and its period also changes irregularly due to events such as earthquakes and hurricanes. Without an adjustment, our midnights would drift into the daylight. These irregular effects are not practically predictable, so a leap second is added (or, in principle, subtracted) at a six-month boundary as needed.
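The leap-day bookkeeping follows the familiar Gregorian rule, which approximates the roughly 365.24-day year. A Python sketch:

```python
def is_leap_year(year):
    """Gregorian rule: every 4th year is a leap year, except century
    years not divisible by 400.  This yields an average year of
    365.2425 days, close to the ~365.2422-day tropical year."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print([is_leap_year(y) for y in (1900, 2000, 2023, 2024)])
# [False, True, False, True]
```

Leap seconds, by contrast, cannot be generated by any such formula; they are announced from observations of the Earth's actual rotation.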

25. Why are Some Standard Clocks Better than Others?

Other clocks ideally are calibrated by being synchronized to "the" standard clock, but some choices of standard clock are better than others. The philosophical question is whether the better choice is objectively better because it gives us an objectively more accurate clock, or whether the choice is a matter merely of convenience and makes our concept of time a more useful tool for doing physics. The issue is one of realism vs. instrumentalism. Let's consider the various goals we want to achieve in choosing one standard clock rather than another.

One goal is to choose a clock that doesn't drift very much. That is, we want a clock that has a very regular period—so the durations between ticks are congruent. Throughout history, scientists have detected that their currently-chosen standard clock seemed to be drifting. In about 1700, scientists discovered that the time from one day to the next, as determined by sunrises, varied throughout the year. Therefore, they decided to define durations in terms of the mean day throughout the year. Before the 1950s, the standard clock was defined astronomically in terms of the mean rotation of the Earth upon its axis [solar time]. For a short period in the 1950s and 1960s, it was defined in terms of the revolution of the Earth about the Sun [ephemeris time]. The second was defined to be 1/86,400 of the mean solar day, the average throughout the year of the rotational period of the Earth with respect to the Sun.

Now we've found a better standard clock, a certain kind of atomic clock [which displays "atomic time"] that was discussed in the previous section of this Supplement. All atomic clocks measure time in terms of the natural resonant frequencies of certain atoms or molecules. (The dates of adoption of these standard clocks are omitted in this paragraph because different international organizations adopted different standards in different years.) The U.S.A.'s National Institute of Standards and Technology's F-1 atomic fountain clock, which is used for reporting standard time in the U.S.A. (after adjustment so it reports the average from the other laboratories in the T.A.I. network), is so accurate that it drifts by less than one second every 300 million years. We know there is this drift because it is implied by the laws of physics, not because we have a better clock that measures this drift. With engineering improvements, the "300 million" number may improve.
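The quoted drift rate translates into a fractional frequency error, which is arithmetic one can check directly. A Python back-of-the-envelope (using a 365.25-day year; the conversion, not the clock specification, is mine):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # about 3.156e7 seconds

# A drift of one second per 300 million years, expressed as a fraction:
drift_fraction = 1 / (300e6 * SECONDS_PER_YEAR)
print(f"{drift_fraction:.1e}")   # 1.1e-16
```

That is, the clock's tick rate is stable to about one part in 10¹⁶.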

In 2014, several physicists writing in the journal Nature Physics suggested someday replacing our current standard clock with a network of atomic clocks connected via quantum entanglement. They claim that this new clock would not lose a second in 13.8 billion years, which is the approximate age of the universe since the big bang.

To achieve the goal of restricting drift, we isolate the clock from outside effects. That is, a practical goal in selecting a standard clock is to find a clock that can be well insulated from environmental influences such as comet strikes on the Earth, earthquakes, stray electric fields, or the presence of dust. Where insulation fails, we pursue the goal of compensation: if there is some theoretically predictable effect of an influence upon the standard clock, then the clock can be regularly adjusted to compensate for this effect.

Consider the insulation problem we would have if we were to use as our standard clock the mean yearly motion of the Earth around the Sun. Can we compensate for all the relevant disturbing effects on the motion of the Earth around the Sun? Not easily. The problem is that the Earth's rate of spin varies in a practically unpredictable manner. Meanwhile, we believe that the relevant factors affecting the spin (such as shifts in winds, comet bombardment, earthquakes, the ocean's tides and currents, convection in Earth's molten core) are affecting the rotational speed and period of revolution of the Earth, but not affecting the behavior of the atomic clock. We don't want to be required to say that an earthquake on Earth or the melting of Greenland ice caused a change in the frequency of cesium emissions throughout the galaxies.

We add leap days and leap seconds in order to keep our atomic-based calendar in synchrony with the rotations and revolutions of the Earth. We want to keep atomic-noons occurring at astronomical-noons and ultimately to prevent Northern hemisphere winters from occurring in some future July, so we systematically add leap days and leap seconds in the counting process. These changes do not affect the duration of a second, but they do affect the duration of a year because, with leap years, not all years last the same number of seconds. In this way, we compensate for the Earth-Sun clocks falling out of synchrony with our standard clock.

Another desirable feature of a standard clock is that reproductions of it stay in synchrony with each other when environmental conditions are the same. Otherwise we may be limited to relying on a specifically-located standard clock that can't be trusted elsewhere and that can be stolen. Cesium clocks in a suburb of Istanbul work just like cesium clocks in an airplane over New York City.

The principal goal in selecting a standard clock is to reduce mystery in physics by finding a periodic process that, if adopted as our standard, makes the resulting system of physical laws simpler and more useful, and allows us to explain phenomena that otherwise would be mysterious. Choosing an atomic clock as standard is much better for this purpose than choosing the periodic dripping of water from our goat skin bag or even the periodic revolution of the Earth about the Sun. Suppose scientists had retained the Earth-Sun clock as the standard clock and had said that, by definition, the Earth does not slow down in any rotation or revolution. Then, when a comet collides with Earth, tempting the scientists to say the Earth's periods of rotation and revolution changed, they would instead be forced to alter, among many other things, their atomic theory and to say that the frequency of light emitted from cesium atoms mysteriously increases all over the universe when comets collide with Earth. By switching to the cesium atomic standard, these alterations are unnecessary, and the mystery vanishes.

To achieve the goal of choosing a standard clock that maximally reduces mystery, we want the clock's readings to be consistent with the accepted laws of motion, in the following sense. Newton's first law of motion says that a body in motion continues to cover the same distance during the same time interval unless acted upon by an external force. Suppose we used our standard clock to run a series of tests of the time intervals as a body coasted along a carefully measured path, and we found that the law was violated, that we could not account for this mysterious violation by finding external forces to blame, and that we were sure there was no other problem with Newton's law or with the measurement of the length of the path. Then the problem would lie with the clock. Leonhard Euler [1707-1783] was the first person to suggest this consistency requirement on our choice of a standard clock. A similar argument holds today, but using the laws of motion from Einstein's theory of relativity.

What it means for the standard clock to be accurate depends on your philosophy of time. If you are a conventionalist, then once you select the standard clock it cannot fail to be accurate in the sense of being correct. On the other hand, if you are an objectivist, you will say the standard clock can be inaccurate. There are different sorts of objectivists. Suppose we ask the question, "Can the time shown on a properly functioning standard clock ever be inaccurate?" The answer is "no" if the target is synchrony with the current standard clock, as the conventionalists believe, but "yes" if there is another target. Objectivists can propose at least three other distinct targets: (1) absolute time (perhaps in the sense Isaac Newton proposed in the 17th century), (2) the best possible clock, and (3) the best known clock. We do not have a way of knowing whether our current standard clock is close to target 1 or target 2. But if we know that the best known clock has not yet been chosen as the standard clock, then the current standard clock can be inaccurate in sense 3.

When we want to know how long a basketball game lasts, why do we subtract the start time from the end time? The answer is that we accept a metric for duration in which we subtract two time numbers to determine the duration between the two. Why don't we choose another metric and, let's say, subtract the square root of the start time from the square root of the end time? This question is implicitly asking whether our choice of metric can be incorrect or merely inconvenient.

Let's say more about this. When we choose a standard clock, we are choosing a metric. By agreeing to read the clock so that a duration from 3:00 to 5:00 is 5-3 hours, or 2 hours, we are making a choice about how to compare any two durations in order to decide whether they are equal, that is, congruent. We suppose the duration from 3:00 to 5:00 as shown by yesterday's reading of the standard clock was the same as the duration from 3:00 to 5:00 on the readings from two days ago, and will be the same for today's readings and tomorrow's readings. Philosophers of time continue to dispute the extent to which the choice of metric is conventional rather than objective in the sense of being forced on us by nature. The conventionalist says the choice is not forced, and that the success of the standard atomic clock over the standard solar clock shows only that we made a more useful choice of standard clock, not a more accurate one. The objectivist disagrees and believes that whether two intervals of time are really equivalent is an intrinsic feature of nature, so choosing the standard clock is no more conventional than choosing to say the Earth is round rather than flat. Taking the conventionalist side on this issue, Adolf Grünbaum argues that time is "metrically amorphous": it has no intrinsic metric. Instead, we choose the metric we do only in order to achieve the goals of reducing mystery in science, but satisfying those goals is no sign of being correct.
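The contrast between the standard subtraction metric and the hypothetical square-root metric can be made concrete. In the sketch below (the numbers are invented for illustration), two clock intervals that the standard metric counts as congruent come out unequal under the square-root metric, because that metric is not invariant under shifting both endpoints by the same amount:

```python
import math

def standard_duration(start, end):
    # the accepted metric: subtract the two time coordinates
    return end - start

def sqrt_duration(start, end):
    # a rival (hypothetical) metric: subtract the square roots
    return math.sqrt(end) - math.sqrt(start)

# 3:00-5:00 on two successive days, in hours from a common epoch
yesterday = (3, 5)
today = (27, 29)

print(standard_duration(*yesterday), standard_duration(*today))
# both 2: the intervals are congruent under the standard metric

print(sqrt_duration(*yesterday), sqrt_duration(*today))
# unequal: the square-root metric refuses to call them congruent
```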

The conventionalist, as opposed to the objectivist, would say that if we were to require by convention that the instant at which Jesus was born and the instant at which Abraham Lincoln was assassinated are to be only 24 seconds apart, whereas the duration between Lincoln's assassination and his burial is to be 24 billion seconds, then we could not be mistaken. It is up to us as a civilization to say what is correct when we first create our conventions about measuring duration. We can consistently assign any numerical time coordinates we wish, subject only to the condition that the assignment properly reflect the betweenness relations of the events that occur at those instants. That is, if event J (birth of Jesus) occurs before event L (Lincoln's assassination) and this in turn occurs before event B (burial of Lincoln), then the time assigned to J must be numerically less than the time assigned to L, and both must be less than the time assigned to B, so that t(J) < t(L) < t(B). A simple requirement. Yes, but the implication is that this relationship among J, L, and B must also hold for all events simultaneous with J, for all events simultaneous with L, and so forth.
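The betweenness condition is weak: it rules out order-reversing assignments but permits any order-preserving one, however eccentric. A minimal sketch (the event times are invented placeholders):

```python
# True temporal order of the three events
events = {"J": 1, "L": 2, "B": 3}  # birth of Jesus, Lincoln's assassination, burial

def respects_betweenness(t):
    # an admissible coordinate assignment must preserve the event order
    return t["J"] < t["L"] < t["B"]

# Any strictly increasing re-coordinatization is admissible...
stretched = {name: 10 ** v for name, v in events.items()}
# ...including wildly non-uniform ones, like the 24 seconds / 24 billion
# seconds convention described above...
eccentric = {"J": 0, "L": 24, "B": 24_000_000_024}
# ...but an order-reversing assignment is not.
reversed_ = {name: -v for name, v in events.items()}

print(respects_betweenness(stretched))  # True
print(respects_betweenness(eccentric))  # True
print(respects_betweenness(reversed_))  # False
```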

It is other features of nature that lead us to reject the above convention about 24 seconds and 24 billion seconds. What features? There are many periodic processes in nature that have a special relationship to each other; their periods are very nearly constant multiples of each other; and this constant stays the same over a long time. For example, the period of the rotation of the Earth is a fairly constant multiple of the period of the revolution of the Earth around the Sun, and both these periods are a constant multiple of the periods of a swinging pendulum and of vibrations of quartz crystals. The class of these periodic processes is very large, so the world will be easier to describe if we choose our standard clock from one of these periodic processes. A good convention for what is regular will make it easier for scientists to find simple laws of nature and to explain what causes other events to be irregular. It is the search for regularity and simplicity and removal of mystery that leads us to adopt the conventions we do for numerical time coordinate assignments and thus leads us to choose the standard clock we do choose. Objectivists disagree and say this search for regularity and simplicity and removal of mystery is all fine, but it is directing us toward the intrinsic metric, not simply the useful metric.

26. What is a Field?

A field is something filling the universe that takes on a value everywhere. Field theory has the advantage that, if you want to know what will happen next, you don't have to consider the influence of everything in the universe, but only nearby field values. For example, to figure out what happens to a thrown ball that is influenced only by gravity, Newton's theory of gravity requires consideration of gravitational forces on the ball from all the other matter in the universe at the instant the ball is thrown. In the corresponding field theory, one need only consider the gravitational field near the ball's location. However, Newton's theory of gravity is often practical to use because gravitational forces get weaker with distance, so in most calculations that don't require extreme accuracy one can ignore the very distant objects and consider only the large nearby objects.
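Why ignoring distant objects usually works can be seen from the inverse-square law itself. The sketch below compares the gravitational acceleration a ball at the Earth's surface receives from the Earth with what it receives from the far more massive but far more distant Sun, using standard approximate values:

```python
G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def accel(mass_kg, distance_m):
    # gravitational acceleration toward a body: a = G * M / r^2
    return G * mass_kg / distance_m ** 2

a_earth = accel(5.97e24, 6.37e6)    # Earth's pull, from its surface
a_sun = accel(1.99e30, 1.496e11)    # Sun's pull, from Earth's orbit

print(a_earth)  # roughly 9.8 m/s^2
print(a_sun)    # roughly 0.006 m/s^2 -- over a thousand times smaller
```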

Quantum field theory of the Standard Model of Particle Physics is a very well-confirmed theory of non-classical physics that treats a material particle as a localized vibration in a field filling all of space. The vibration (the technical term is "oscillation") is a fuzzy bundle of quantized energy occupying a region of space bigger than a single point. The propagation of basic particles from one place to another is due to the fact that any change in the field values induces nearby changes a little later. The particle of the electromagnetic field is called the "photon," and the particle of the gravitational field is called the "graviton."

The particles of the field move, but the field itself does not.

Physicists are interested in so-called "covariant" fields, meaning fields whose physical behavior is independent of the coordinate systems used to describe the fields. What is real is covariant, so to speak.

There are many basic fields that co-exist and interact. There are twelve matter fields (such as the electron field and the up quark field), four basic force fields (such as the electromagnetic field and the gravitational field), the Higgs field (whose interactions [or not] with a particle cause the particle to have mass [or no mass]), and perhaps the inflaton field. Every type of elementary particle has its own field. A field's value at a point might be a simple number (as in the Higgs field), or a vector (as in the classical electromagnetic field), or a tensor (as in Einstein's gravitational field).

According to quantum field theory, once one of these fields comes into existence it cannot be removed from existence; the field exists everywhere even if its values are everywhere zero or whatever is the lowest possible value. Magnets create magnetic fields, but if you were to remove all the magnets there would still be a magnetic field. Sources of fields are not essential for the existence of fields. And even when a field's value is the lowest possible (called the "vacuum state") in a region, because of the Heisenberg Uncertainty Principle there is always a non-zero probability that its value will spontaneously deviate from that value in the region. The most common way this happens is via a particle and its anti-particle spontaneously coming into existence in the region for a short time, then annihilating each other in a small burst of energy. You can think of space, at its basis, as a sea of pairs of these particles and their anti-particles coming into existence and then being annihilated in an extremely short time. So, even if all the universe's fields were at their lowest state, they would still have some energy. Empty space always has some activity and energy, but this energy is inaccessible to us; we can never use it. Clearly, the empty space of physics is not the metaphysician's nothingness. And there is no region of it where there could be empty time (in the sense meant by a Leibnizian relationist).

But quantum field theory is not a theory of cosmology. There are a variety of theories of cosmology, and these disagree about whether the past is infinite. So, physicists do not agree on which fields, if any, have existed forever.

What is the relationship between spacetime and these fields? The most common position among physicists is that spacetime is a geometric entity. However, many physicists believe instead that spacetime is really a physical field, perhaps the gravitational field, but a field that can be described by geometrical means. Then there is the question of whether this spacetime field grounds all other physical fields, or is just one of the many fields.

It is not at all clear that a distinction can be maintained between spacetime and the other fields, because the energy contained in the matter fields of spacetime cannot be clearly separated from the gravitational energy of spacetime itself. Gravitational energy can be transformed into the energy of the matter fields, and vice versa. There are significant metaphysical implications of this breakdown of the common distinction. Many physicists believe that the universe is not composed of various fields; it is composed of a single entity, the quantum field, which has such a character that it appears as if it is composed of various different fields.

For an elementary introduction to quantum fields, see the video



Author Information

Bradley Dowden
California State University Sacramento
U. S. A.

Simplicity in the Philosophy of Science

The view that simplicity is a virtue in scientific theories and that, other things being equal, simpler theories should be preferred to more complex ones has been widely advocated in the history of science and philosophy, and it remains widely held by modern scientists and philosophers of science. It often goes by the name of “Ockham’s Razor.” The claim is that simplicity ought to be one of the key criteria for evaluating and choosing between rival theories, alongside criteria such as consistency with the data and coherence with accepted background theories. Simplicity, in this sense, is often understood ontologically, in terms of how simple a theory represents nature as being—for example, a theory might be said to be simpler than another if it posits the existence of fewer entities, causes, or processes in nature in order to account for the empirical data. However, simplicity can also be understood in terms of various features of how theories go about explaining nature—for example, a theory might be said to be simpler than another if it contains fewer adjustable parameters, if it invokes fewer extraneous assumptions, or if it provides a more unified explanation of the data.

Preferences for simpler theories are widely thought to have played a central role in many important episodes in the history of science. Simplicity considerations are also regarded as integral to many of the standard methods that scientists use for inferring hypotheses from empirical data, the most common illustration of this being the practice of curve-fitting. Indeed, some philosophers have argued that a systematic bias towards simpler theories and hypotheses is a fundamental component of inductive reasoning quite generally.

However, though the legitimacy of choosing between rival scientific theories on grounds of simplicity is frequently taken for granted, or viewed as self-evident, this practice raises a number of very difficult philosophical problems. A common concern is that notions of simplicity appear vague, and judgments about the relative simplicity of particular theories appear irredeemably subjective. Thus, one problem is to explain more precisely what it is for one theory to be simpler than another and how, if at all, the relative simplicity of theories can be objectively measured. In addition, even if we can get clearer about what simplicity is and how it is to be measured, there remains the problem of explaining what justification, if any, can be provided for choosing between rival scientific theories on grounds of simplicity. For instance, do we have any reason for thinking that simpler theories are more likely to be true?

This article provides an overview of the debate over simplicity in the philosophy of science. Section 1 illustrates the putative role of simplicity considerations in scientific methodology, outlining some common views of scientists on this issue, different formulations of Ockham’s Razor, and some commonly cited examples of simplicity at work in the history and current practice of science. Section 2 highlights the wider significance of the philosophical issues surrounding simplicity for central controversies in the philosophy of science and epistemology. Section 3 outlines the challenges facing the project of trying to precisely define and measure theoretical simplicity, and it surveys the leading measures of simplicity and complexity currently on the market. Finally, Section 4 surveys the wide variety of attempts that have been made to justify the practice of choosing between rival theories on grounds of simplicity.

Table of Contents

  1. The Role of Simplicity in Science
    1. Ockham’s Razor
    2. Examples of Simplicity Preferences at Work in the History of Science
      1. Newton’s Argument for Universal Gravitation
      2. Other Examples
    3. Simplicity and Inductive Inference
    4. Simplicity in Statistics and Data Analysis
  2. Wider Philosophical Significance of Issues Surrounding Simplicity
  3. Defining and Measuring Simplicity
    1. Syntactic Measures
    2. Goodman’s Measure
    3. Simplicity as Testability
    4. Sober’s Measure
    5. Thagard’s Measure
    6. Information-Theoretic Measures
    7. Is Simplicity a Unified Concept?
  4. Justifying Preferences for Simpler Theories
    1. Simplicity as an Indicator of Truth
      1. Nature is Simple
      2. Meta-Inductive Proposals
      3. Bayesian Proposals
      4. Simplicity as a Fundamental A Priori Principle
    2. Alternative Justifications
      1. Falsifiability
      2. Simplicity as an Explanatory Virtue
      3. Predictive Accuracy
      4. Truth-Finding Efficiency
    3. Deflationary Approaches
  5. Conclusion
  6. References and Further Reading

1. The Role of Simplicity in Science

There are many ways in which simplicity might be regarded as a desirable feature of scientific theories. Simpler theories are frequently said to be more “beautiful” or more “elegant” than their rivals; they might also be easier to understand and to work with. However, according to many scientists and philosophers, simplicity is not something that is merely to be hoped for in theories; nor is it something that we should only strive for after we have already selected a theory that we believe to be on the right track (for example, by trying to find a simpler formulation of an accepted theory). Rather, the claim is that simplicity should actually be one of the key criteria that we use to evaluate which of a set of rival theories is, in fact, the best theory, given the available evidence: other things being equal, the simplest theory consistent with the data is the best one.

This view has a long and illustrious history. It is now most commonly associated with the 14th-century philosopher William of Ockham (also spelt “Occam”), whose name is attached to the famous methodological maxim known as “Ockham’s razor”, often interpreted as enjoining us to prefer the simplest theory consistent with the available evidence; however, the view can be traced at least as far back as Aristotle. In his Posterior Analytics, Aristotle argued that nothing in nature was done in vain and nothing was superfluous, so our theories of nature should be as simple as possible. Several centuries later, at the beginning of the modern scientific revolution, Galileo espoused a similar view, holding that, “[n]ature does not multiply things unnecessarily; that she makes use of the easiest and simplest means for producing her effects” (Galilei, 1962, p396). Similarly, at the beginning of the third book of the Principia, Isaac Newton included the following principle among his “rules for the study of natural philosophy”:

  • No more causes of natural things should be admitted than are both true and sufficient to explain their phenomena.
    As the philosophers say: Nature does nothing in vain, and more causes are in vain when fewer will suffice. For Nature is simple and does not indulge in the luxury of superfluous causes. (Newton, 1999, p794 [emphasis in original]).

In the 20th century, Albert Einstein asserted that “our experience hitherto justifies us in believing that nature is the realisation of the simplest conceivable mathematical ideas” (Einstein, 1954, p274). More recently, the eminent physicist Steven Weinberg has claimed that he and his fellow physicists “demand simplicity and rigidity in our principles before we are willing to take them seriously” (Weinberg, 1993, p148-9), while the Nobel prize winning economist John Harsanyi has stated that “[o]ther things being equal, a simpler theory will be preferable to a less simple theory” (quoted in McAlleer, 2001, p296).

It should be noted, however, that not all scientists agree that simplicity should be regarded as a legitimate criterion for theory choice. The eminent biologist Francis Crick once complained, “[w]hile Occam’s razor is a useful tool in physics, it can be a very dangerous implement in biology. It is thus very rash to use simplicity and elegance as a guide in biological research” (Crick, 1988, p138). Similarly, here are a group of earth scientists writing in Science:

  • Many scientists accept and apply [Ockham’s Razor] in their work, even though it is an entirely metaphysical assumption. There is scant empirical evidence that the world is actually simple or that simple accounts are more likely than complex ones to be true. Our commitment to simplicity is largely an inheritance of 17th-century theology. (Oreskes et al, 1994, endnote 25)

Hence, while very many scientists assert that rival theories should be evaluated on grounds of simplicity, others are much more skeptical about this idea. Much of this skepticism stems from the suspicion that the cogency of a simplicity criterion depends on assuming that nature is simple (hardly surprising given the way that many scientists have defended such a criterion) and that we have no good reason to make such an assumption. Crick, for instance, seemed to think that such an assumption could make no sense in biology, given the patent complexity of the biological world. In contrast, some advocates of simplicity have argued that a preference for simple theories need not necessarily assume a simple world—for instance, even if nature is demonstrably complex in an ontological sense, we should still prefer comparatively simple explanations for nature’s complexity. Oreskes and others also emphasize that the simplicity principles of scientists such as Galileo and Newton were explicitly rooted in a particular kind of natural theology, which held that a simple and elegant universe was a necessary consequence of God’s benevolence. Today, there is much less enthusiasm for grounding scientific methods in theology (the putative connection between God’s benevolence and the simplicity of creation is theologically controversial in any case). Another common source of skepticism is the apparent vagueness of the notion of simplicity and the suspicion that scientists’ judgments about the relative simplicity of theories lack a principled and objective basis.

Even so, there is no doubting the popularity of the idea that simplicity should be used as a criterion for theory choice and evaluation. It seems to be explicitly ingrained into many scientific methods—for instance, standard statistical methods of data analysis (Section 1d). It has also spread far beyond philosophy and the natural sciences. A recent issue of the FBI Law Enforcement Bulletin, for instance, contained the advice that “[u]nfortunately, many people perceive criminal acts as more complex than they really are… the least complicated explanation of an event is usually the correct one” (Rothwell, 2006, p24).

a. Ockham’s Razor

Many scientists and philosophers endorse a methodological principle known as “Ockham’s Razor”. This principle has been formulated in a variety of different ways. In the early 21st century, it is typically just equated with the general maxim that simpler theories are “better” than more complex ones, other things being equal. Historically, however, it has been more common to formulate Ockham’s Razor as a more specific type of simplicity principle, often referred to as “the principle of parsimony”. Whether William of Ockham himself would have endorsed any of the wide variety of methodological maxims that have been attributed to him is a matter of some controversy (see Thorburn, 1918; entry on William of Ockham), since Ockham never explicitly referred to a methodological principle that he called his “razor”. However, a standard formulation of the principle of parsimony—one that seems to be reasonably close to the sort of principle that Ockham himself probably would have endorsed—is as the maxim “entities are not to be multiplied beyond necessity”. So stated, the principle is ontological, since it is concerned with parsimony with respect to the entities that theories posit the existence of in attempting to account for the empirical data. “Entity”, in this context, is typically understood broadly, referring not just to objects (for example, atoms and particles), but also to other kinds of natural phenomena that a theory may include in its ontology, such as causes, processes, properties, and so forth. Other, more general formulations of Ockham’s Razor are not exclusively ontological, and may also make reference to various structural features of how theories go about explaining nature, such as the unity of their explanations. The remainder of this section will focus on the more traditional ontological interpretation.

It is important to recognize that the principle, “entities are not to be multiplied beyond necessity” can be read in at least two different ways. One way of reading it is as what we can call an anti-superfluity principle (Barnes, 2000). This principle calls for the elimination of ontological posits from theories that are explanatorily redundant. Suppose, for instance, that there are two theories, T1 and T2, which both seek to explain the same set of empirical data, D. Suppose also that T1 and T2 are identical in terms of the entities that are posited, except for the fact that T2 entails an additional posit, b, that is not part of T1. So let us say that T1 posits a, while T2 posits a + b. Intuitively, T2 is a more complex theory than T1 because it posits more things. Now let us assume that both theories provide an equally complete explanation of D, in the sense that there are no features of D that the two theories cannot account for. In this situation, the anti-superfluity principle would instruct us to prefer the simpler theory, T1, to the more complex theory, T2. The reason for this is that T2 contains an explanatorily redundant posit, b, which does no explanatory work in the theory with respect to D. We know this because T1, which posits a alone, provides an account of D just as adequate as T2’s. Hence, we can infer that positing a alone is sufficient to acquire all the explanatory ability offered by T2, with respect to D; adding b does nothing to improve the ability of T2 to account for the data.

This sort of anti-superfluity principle underlies one important interpretation of “entities are not to be multiplied beyond necessity”: as a principle that invites us to get rid of superfluous components of theories. Here, an ontological posit is superfluous with respect to a given theory, T, in so far as it does nothing to improve T’s ability to account for the phenomena to be explained. This is how John Stuart Mill understood Ockham’s razor (Mill, 1867, p526). Mill also pointed to a plausible justification for the anti-superfluity principle: explanatorily redundant posits—those that have no effect on the ability of the theory to explain the data—are also posits that do not obtain evidential support from the data. This is because it is plausible that theoretical entities are evidentially supported by empirical data only to the extent that they can help us to account for why the data take the form that they do. If a theoretical entity fails to contribute to this end, then the data fails to confirm the existence of this entity. If we have no other independent reason to postulate the existence of this entity, then we have no justification for including this entity in our theoretical ontology.

Another justification that has been offered for the anti-superfluity principle is a probabilistic one. Note that T2 is a logically stronger theory than T1: T2 says that a and b exist, while T1 says only that a exists. It is a consequence of the axioms of probability that a logically stronger theory is never more probable than a logically weaker theory. Thus, so long as the probability of a existing and the probability of b existing are independent of each other, the probability of a existing is greater than zero, and the probability of b existing is less than 1, we can assert that Pr (a exists) > Pr (a exists & b exists), where Pr (a exists & b exists) = Pr (a exists) * Pr (b exists). According to this reasoning, we should therefore regard the claims of T1 as more a priori probable than the claims of T2, and this is a reason to prefer it. However, one objection to this probabilistic justification for the anti-superfluity principle is that it doesn’t fully explain why we dislike theories that posit explanatorily redundant entities: it can’t really be because they are logically stronger theories; rather, it is because they postulate entities that are unsupported by evidence.
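The probabilistic point is elementary arithmetic. With illustrative (invented) prior probabilities for a and b existing, and assuming independence:

```python
p_a = 0.5   # hypothetical prior probability that a exists (greater than 0)
p_b = 0.5   # hypothetical prior probability that b exists (less than 1)

p_t1 = p_a          # T1 posits a alone
p_t2 = p_a * p_b    # T2 posits a and b; independence gives the product rule

print(p_t1)  # 0.5
print(p_t2)  # 0.25 -- the logically stronger theory is less probable
```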

When the principle of parsimony is read as an anti-superfluity principle, it seems relatively uncontroversial. However, it is important to recognize that the vast majority of instances where the principle of parsimony is applied (or has been seen as applying) in science cannot be given an interpretation merely in terms of the anti-superfluity principle. This is because the phrase “entities are not to be multiplied beyond necessity” is normally read as what we can call an anti-quantity principle: theories that posit fewer things are (other things being equal) to be preferred to theories that posit more things, whether or not the relevant posits play any genuine explanatory role in the theories concerned (Barnes, 2000). This is a much stronger claim than the claim that we should razor off explanatorily redundant entities. The evidential justification for the anti-superfluity principle just described cannot be used to motivate the anti-quantity principle, since the reasoning behind this justification allows that we can posit as many things as we like, so long as all of the individual posits do some explanatory work within the theory. It merely tells us to get rid of theoretical ontology that, from the perspective of a given theory, is explanatorily redundant. It does not tell us that theories that posit fewer things when accounting for the data are better than theories that posit more things—that is, that sparser ontologies are better than richer ones.

Another important point about the anti-superfluity principle is that it does not give us a reason to assert the non-existence of the superfluous posit. Absence of evidence is not (by itself) evidence for absence. Hence, this version of Ockham’s razor is sometimes also referred to as an “agnostic” razor rather than an “atheistic” razor, since it only motivates us to be agnostic about the razored-off ontology (Sober, 1981). It seems that in most cases where Ockham’s razor is appealed to in science it is intended to support atheistic conclusions—the entities concerned are not merely cut out of our theoretical ontology, their existence is also denied. Hence, if we are to explain why such a preference is justified we will need to look for a different justification. With respect to the probabilistic justification for the anti-superfluity principle described above, it is important to note that it is not an axiom of probability that Pr (a exists & b doesn’t exist) > Pr (a exists & b exists).

b. Examples of Simplicity Preferences at Work in the History of Science

It is widely believed that there have been numerous episodes in the history of science where particular scientific theories were defended by particular scientists and/or came to be preferred by the wider scientific community less for directly empirical reasons (for example, some telling experimental finding) than as a result of their relative simplicity compared to rival theories. Hence, the history of science is taken to demonstrate the importance of simplicity considerations in how scientists defend, evaluate, and choose between theories. One striking example is Isaac Newton’s argument for universal gravitation.

i. Newton’s Argument for Universal Gravitation

At the beginning of the third book of the Principia, subtitled “The system of the world”, Isaac Newton described four “rules for the study of natural philosophy”:

  • Rule 1 No more causes of natural things should be admitted than are both true and sufficient to explain their phenomena.
  • As the philosophers say: Nature does nothing in vain, and more causes are in vain when fewer will suffice. For Nature is simple and does not indulge in the luxury of superfluous causes.
  • Rule 2 Therefore, the causes assigned to natural effects of the same kind must be, so far as possible, the same.
  • Rule 3 Those qualities of bodies that cannot be intended and remitted [i.e., qualities that cannot be increased and diminished] and that belong to all bodies on which experiments can be made should be taken as qualities of all bodies universally.
  • For the qualities of bodies can be known only through experiments; and therefore qualities that square with experiments universally are to be regarded as universal qualities… Certainly ideal fancies ought not to be fabricated recklessly against the evidence of experiments, nor should we depart from the analogy of nature, since nature is always simple and ever consonant with itself…
  • Rule 4 In experimental philosophy, propositions gathered from phenomena by induction should be considered either exactly or very nearly true notwithstanding any contrary hypotheses, until yet other phenomena make such propositions either more exact or liable to exceptions.
  • This rule should be followed so that arguments based on induction may not be nullified by hypotheses. (Newton, 1999, p794-796).

Here we see Newton explicitly placing simplicity at the heart of his conception of the scientific method. Rule 1 is a version of Ockham’s Razor, which, despite the use of the word “superfluous”, has typically been read as an anti-quantity principle rather than an anti-superfluity principle (see Section 1a). It is taken to follow directly from the assumption that nature is simple, which in turn gives rise to rules 2 and 3, both principles of inductive generalization (infer similar causes for similar effects, and assume to be universal in all bodies those properties found in all observed bodies). These rules play a crucial role in what follows, the centrepiece being the argument for universal gravitation.

After laying out these rules of method, Newton described several “phenomena”—what are in fact empirical generalizations, derived from astronomical observations, about the motions of the planets and their satellites, including the moon. From these phenomena and the rules of method, he then “deduced” several general theoretical propositions. Propositions 1, 2, and 3 state that the satellites of Jupiter, the primary planets, and the moon are attracted towards the centers of Jupiter, the sun, and the earth respectively by forces that keep them in their orbits (stopping them from following a linear path in the direction of their motion at any one time). These forces are also claimed to vary inversely with the square of the distance of the orbiting body (for example, Mars) from the center of the body about which it orbits (for example, the sun). These propositions are taken to follow from the phenomena, including the fact that the respective orbits can be shown to (approximately) obey Kepler’s law of areas and the harmonic law, and the laws of motion developed in book 1 of the Principia. Newton then asserted proposition 4: “The moon gravitates toward the earth and by the force of gravity is always drawn back from rectilinear motion and kept in its orbit” (p802). In other words, it is the force of gravity that keeps the moon in its orbit around the earth. Newton explicitly invoked rules 1 and 2 in the argument for this proposition (what has become known as the “moon-test”). First, astronomical observations told us how fast the moon accelerates towards the earth. Newton was then able to calculate what the acceleration of the moon would be at the earth’s surface, if it were to fall down to the earth. This turned out to be equal to the acceleration of bodies observed to fall in experiments conducted on earth. 
Since it is the force of gravity that causes bodies on earth to fall (Newton assumed his readers’ familiarity with “gravity” in this sense), and since both gravity and the force acting on the moon “are directed towards the center of the earth and are similar to each other and equal”, Newton asserted that “they will (by rules 1 and 2) have the same cause” (p805). Therefore, the force that acts on falling bodies on earth and the force that keeps the moon in its orbit are one and the same: gravity. Given this, the force of gravity acting on terrestrial bodies could now be claimed to obey an inverse-square law. Through similar deployment of rules 1, 2, and 4, Newton was led to the claim that it is also gravity that keeps the planets in their orbits around the sun and the satellites of Jupiter and Saturn in their orbits, since these forces are also directed toward the centers of the sun, Jupiter, and Saturn, and display similar properties to the force of gravity on earth, such as the fact that they obey an inverse-square law. Therefore, the force of gravity was held to act on all planets universally. Through several more steps, Newton was eventually able to arrive at the principle of universal gravitation: that gravity is a mutually attractive force that acts on any two bodies whatsoever and is described by an inverse-square law, according to which each body attracts the other with a force of equal magnitude that is proportional to the product of the masses of the two bodies and inversely proportional to the squared distance between them. From there, Newton was able to determine the masses and densities of the sun, Jupiter, Saturn, and the earth, and offer a new explanation for the tides of the seas, thus showing the remarkable explanatory power of this new physics.
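Newton’s moon-test lends itself to a quick back-of-the-envelope check. The sketch below (using modern approximate values, not Newton’s own figures, and assumed here purely for illustration) compares the moon’s centripetal acceleration with the acceleration an inverse-square gravity would produce at the moon’s distance:

```python
import math

# Modern approximate values (assumptions for illustration; Newton worked
# with different figures and units).
g = 9.81                  # acceleration of falling bodies at earth's surface, m/s^2
r_moon = 3.844e8          # mean earth-moon distance, m
r_earth = 6.371e6         # earth's radius, m
T = 27.32 * 24 * 3600     # sidereal period of the moon, s

# Centripetal acceleration needed to keep the moon in a roughly circular orbit.
a_moon = 4 * math.pi**2 * r_moon / T**2

# If gravity weakens with the inverse square of distance, its strength at the
# moon's distance (about 60 earth radii) should be g divided by that ratio squared.
ratio = r_moon / r_earth
a_predicted = g / ratio**2

# Both come out near 0.0027 m/s^2, agreeing to within a few percent.
print(a_moon, a_predicted)
```

That the two numbers agree is, in effect, the modern form of Newton’s claim that the force keeping the moon in orbit and the gravity felt at the earth’s surface “have the same cause”.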

Newton’s argument has been the subject of much debate amongst historians and philosophers of science (for further discussion of the various controversies surrounding its structure and the accuracy of its premises, see Glymour, 1980; Cohen, 1999; Harper, 2002). However, one thing that seems to be clear is that his conclusions are by no means forced on us through simple deductions from the phenomena, even when combined with the mathematical theorems and general theory of motion outlined in book 1 of the Principia. No experiment or mathematical derivation from the phenomena demonstrated that it must be gravity that is the common cause of the falling of bodies on earth, the orbits of the moon, the planets and their satellites, much less that gravity is a mutually attractive force acting on all bodies whatsoever. Rather, Newton’s argument appears to boil down to the claim that if gravity did have the properties accorded to it by the principle of universal gravitation, it could provide a common causal explanation for all the phenomena, and his rules of method tell us to infer common causes wherever we can. Hence, the rules, which are in turn grounded in a preference for simplicity, play a crucial role in taking us from the phenomena to universal gravitation (for further discussion of the apparent link between simplicity and common cause reasoning, see Sober, 1988). Newton’s argument for universal gravitation can thus be seen as an argument to the putatively simplest explanation for the empirical observations.

ii. Other Examples

Numerous other putative examples of simplicity considerations at work in the history of science have been cited in the literature:

  • One of the most commonly cited concerns Copernicus’ arguments for the heliocentric theory of planetary motion. Copernicus placed particular emphasis on the comparative “simplicity” and “harmony” of the account that his theory gave of the motions of the planets compared with the rival geocentric theory derived from the work of Ptolemy. This argument appears to have carried significant weight for Copernicus’ successors, including Rheticus, Galileo, and Kepler, who all emphasized simplicity as a major motivation for heliocentrism. Philosophers have suggested various reconstructions of the Copernican argument (see for example, Glymour, 1980; Rosencrantz, 1983; Forster and Sober, 1994; Myrvold, 2003; Martens, 2009). However, historians of science have questioned the extent to which simplicity could have played a genuine rather than purely rhetorical role in this episode. For example, it has been argued that there is no clear sense in which the Copernican system was in fact simpler than Ptolemy’s, and that geocentric systems such as the Tychonic system could be constructed that were at least as simple as the Copernican one (for discussion, see Kuhn, 1957; Palter, 1970; Cohen, 1985; Gingerich, 1993; Martens, 2009).
  • It has been widely claimed that simplicity played a key role in the development of Einstein’s theories of special and general relativity, and in the early acceptance of Einstein’s theories by the scientific community (see for example, Hesse, 1974; Holton, 1974; Schaffner, 1974; Sober, 1981; Pais, 1982; Norton, 2000).
  • Thagard (1988) argues that simplicity considerations played an important role in Lavoisier’s case against the existence of phlogiston and in favour of the oxygen theory of combustion.
  • Carlson (1966) describes several episodes in the history of genetics in which simplicity considerations seemed to have held sway.
  • Nolan (1997) argues that a preference for ontological parsimony played an important role in the discovery of the neutrino and in the proposal of Avogadro’s hypothesis.
  • Baker (2007) argues that ontological parsimony was a key issue in discussions over rival dispersalist and extensionist bio-geographical theories in the late 19th and early 20th century.

Though it is commonplace for scientists and philosophers to claim that simplicity considerations have played a significant role in the history of science, it is important to note that some skeptics have argued that the actual historical importance of simplicity considerations has been over-sold (for example, Bunge, 1961; Lakatos and Zahar, 1978). Such skeptics dispute the claim that we can only explain the basis for these and other episodes of theory change by according a role to simplicity, claiming other considerations actually carried more weight. In addition, it has been argued that, in many cases, what appear on the surface to have been appeals to the relative simplicity of theories were in fact covert appeals to some other theoretical virtue (for example, Boyd, 1990; Sober, 1994; Norton, 2003; Fitzpatrick, 2009). Hence, for any putative example of simplicity at work in the history of science, it is important to consider whether the relevant arguments are not best reconstructed in other terms (such a “deflationary” view of simplicity will be discussed further in Section 4c).

c. Simplicity and Inductive Inference

Many philosophers have come to see simplicity considerations figuring not only in how scientists go about evaluating and choosing between developed scientific theories, but also in the mechanics of making much more basic inductive inferences from empirical data. The standard illustration of this in the modern literature is the practice of curve-fitting. Suppose that we have a series of observations of the values of a variable, y, given values of another variable, x. This gives us a series of data points, as represented in Figure 1.

Figure 1

Given this data, what underlying relationship should we posit between x and y so that we can predict future pairs of x-y values? Standard practice is not to select a bumpy curve that neatly passes through all the data points, but rather to select a smooth curve—preferably a straight line, such as H1—that passes close to the data. But why do we do this? Part of an answer comes from the fact that if the data is to some degree contaminated with measurement error (for example, through mistakes in data collection) or “noise” produced by the effects of uncontrolled factors, then any curve that fits the data perfectly will most likely be false. However, this does not explain our preference for a curve like H1 over an infinite number of other curves—H2, for instance—that also pass close to the data. It is here that simplicity has been seen as playing a vital, though often implicit role in how we go about inferring hypotheses from empirical data: H1 posits a “simpler” relationship between x and y than H2—hence, it is for reasons of simplicity that we tend to infer hypotheses like H1.
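The contrast between H1 and H2 can be made concrete. The sketch below fits a least-squares straight line (an H1-style hypothesis) and a degree-4 interpolating polynomial (an H2-style “bumpy” curve) to a small data set; the data are invented for illustration and are not the values plotted in Figure 1:

```python
# Hypothetical data: a roughly linear trend with small "noise" offsets.
xs = [0, 1, 2, 3, 4]
ys = [0.1, 1.1, 1.9, 3.2, 3.9]

# H1-style hypothesis: the least-squares straight line y = a + b*x (closed form).
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# H2-style hypothesis: a curve that passes exactly through every data point
# (the degree-4 Lagrange interpolating polynomial).
def h2(x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# H2 fits the sample perfectly; H1 only passes close to the points,
# but posits a far simpler x-y relationship.
print(f"H1: y = {a:.2f} + {b:.2f}x")
print(max(abs(h2(x) - y) for x, y in zip(xs, ys)))
```

Both hypotheses are consistent with the data to within the plausible measurement error, yet standard practice selects the straight line.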

The practice of curve-fitting has been taken to show that—whether we are aware of it or not—human beings have a fundamental cognitive bias towards simple hypotheses. Whether we are deciding between rival scientific theories, or performing more basic generalizations from our experience, we ubiquitously tend to infer the simplest hypothesis consistent with our observations. Moreover, this bias is held to be necessary in order for us to be able to select a unique hypothesis from the potentially limitless number of hypotheses consistent with any finite amount of experience.

The view that simplicity may often play an implicit role in empirical reasoning can arguably be traced back to David Hume’s description of enumerative induction in the context of his formulation of the famous problem of induction. Hume suggested that a tacit assumption of the uniformity of nature is ingrained into our psychology. Thus, we are naturally drawn to the conclusion that all ravens have black feathers from the fact that all previously observed ravens have black feathers because we tacitly assume that the world is broadly uniform in its properties. This has been seen as a kind of simplicity assumption: it is simpler to assume more of the same.

A fundamental link between simplicity and inductive reasoning has been retained in many more recent descriptive accounts of inductive inference. For instance, Hans Reichenbach (1949) described induction as an application of what he called the “Straight Rule”, modelling all inductive inference on curve-fitting. In addition, proponents of the model of “Inference to the Best Explanation”, who hold that many inductive inferences are best understood as inferences to the hypothesis that would, if true, provide the best explanation for our observations, normally claim that simplicity is one of the criteria that we use to determine which hypothesis constitutes the “best” explanation.

In recent years, the putative role of simplicity in our inferential psychology has been attracting increasing attention from cognitive scientists. For instance, Lombrozo (2007) describes experiments that she claims show that participants use the relative simplicity of rival explanations (for instance, whether a particular medical diagnosis for a set of symptoms involves assuming the presence of one or multiple independent conditions) as a guide to assessing their probability, such that a disproportionate amount of contrary probabilistic evidence is required for participants to choose a more complex explanation over a simpler one. Simplicity considerations have also been seen as central to learning processes in many different cognitive domains, including language acquisition and category learning (for example, Chater, 1999; Lu and others, 2006).

d. Simplicity in Statistics and Data Analysis

Philosophers have long used the example of curve-fitting to illustrate the (often implicit) role played by considerations of simplicity in inductive reasoning from empirical data. However, partly due to the advent of low-cost computing power and the fact that scientists in many disciplines find themselves having to deal with ever larger and more intricate bodies of data, recent decades have seen a remarkable revolution in the methods available to scientists for analyzing and interpreting empirical data (Gauch, 2006). Importantly, there are now numerous formalized procedures for data analysis that can be implemented in computer software—and which are widely used in disciplines from engineering to crop science to sociology—that contain an explicit role for some notion of simplicity. The literature on such methods abounds with talk of “Ockham’s Razor”, “Occam factors”, “Ockham’s hill” (MacKay, 1992; Gauch, 2006), “Occam’s window” (Raftery and others, 1997), and so forth. This literature not only provides important illustrations of the role that simplicity plays in scientific practice, but may also offer insights for philosophers seeking to understand the basis for this role.

As an illustration, consider standard procedures for model selection, such as the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Minimum Message Length (MML) and Minimum Description Length (MDL) procedures, and numerous others (for discussion see Forster and Sober, 1994; Forster, 2001; Gauch, 2003; Dowe and others, 2007). Model selection is a matter of selecting the kind of relationship that is to be posited between a set of variables, given a sample of data, in an effort to generate hypotheses about the true underlying relationship holding in the population of inference and/or to make predictions about future data. This question arises in the simple curve-fitting example discussed above—for instance, whether the true underlying relationship between x and y is linear, parabolic, quadratic, and so on. It also arises in lots of other contexts, such as the problem of inferring the causal relationship that exists between an empirical effect and a set of variables. “Models” in this sense are families of functions, such as the family of linear functions, LIN: y = a + bx, or the family of parabolic functions, PAR: y = a + bx + cx². The simplicity of a model is normally explicated in terms of the number of adjustable parameters it contains (MML and MDL measure the simplicity of models in terms of the extent to which they provide compact descriptions of the data, but produce similar results to the counting of adjustable parameters). On this measure, the model LIN is simpler than PAR, since LIN contains two adjustable parameters, whereas PAR has three. A consequence of this is that a more complex model will always be able to fit a given sample of data better than a simpler model (“fitting” a model to the data involves using the data to determine what the values of the parameters in the model should be, given that data—that is, identifying the best-fitting member of the family).
For instance, returning to the curve-fitting scenario represented in Figure 1, the best-fitting curve in PAR is guaranteed to fit this data set at least as well as the best-fitting member of the simpler model, LIN, and this is true no matter what the data are, since linear functions are special cases of parabolas, where c = 0, so any curve that is a member of LIN is also a member of PAR.

Model selection procedures produce a ranking of all the models under consideration in light of the data, thus allowing scientists to choose between them. Though they do it in different ways, AIC, BIC, MML, and MDL all implement procedures for model selection that impose a penalty on the complexity of a model, so that a more complex model will have to fit the data sample at hand significantly better than a simpler one for it to be rated higher than the simpler model. Often, this penalty is greater the smaller the sample of data. Interestingly—and contrary to the assumptions of some philosophers—this seems to suggest that simplicity considerations do not only come into play as a tiebreaker between theories that fit the data equally well: according to the model selection literature, simplicity sometimes trumps better fit to the data. Hence, simplicity need not only come into play when all other things are equal.
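The penalty mechanism can be illustrated with the common least-squares form of the AIC score (a sketch only; the residual sums of squares below are hypothetical numbers chosen for illustration). A model with one more adjustable parameter fits slightly better, yet is ranked below the simpler model:

```python
import math

def aic(rss, n, k):
    """AIC-style score for a least-squares fit: n*ln(RSS/n) + 2k.

    rss: residual sum of squares of the model's best-fitting member,
    n: sample size, k: number of adjustable parameters.
    Lower scores are better. (This is the familiar least-squares form,
    not the full likelihood-based definition.)
    """
    return n * math.log(rss / n) + 2 * k

n = 20  # hypothetical sample size

# Hypothetical fits: the 3-parameter model fits slightly better than the
# 2-parameter one, as it must (the simpler model is nested in it)...
rss_lin, k_lin = 4.1, 2
rss_par, k_par = 4.0, 3

# ...but the penalty on the extra parameter outweighs the small gain in fit,
# so the simpler model is ranked higher.
print(aic(rss_lin, n, k_lin) < aic(rss_par, n, k_par))  # True
```

With a much larger fit improvement, or a much larger sample, the ranking can reverse, which is exactly the trade-off the text describes.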

Both statisticians and philosophers of statistics have vigorously debated the underlying justification for these sorts of model selection procedures (see, for example, the papers in Zellner and others, 2001). However, one motivation for taking into account the simplicity of models derives from a piece of practical wisdom: when there is error or “noise” in the data sample, a relatively simple model that fits the sample less well will often be more accurate when it comes to predicting extra-sample (for example, future) data than a more complex model that fits the sample more closely. The logic here is that since more complex models are more flexible in their ability to fit the data (since they have more adjustable parameters), they also have a greater propensity to be misled by errors and noise, in which case they may recover less of the true underlying “signal” in the sample. Thus, constraining model complexity may facilitate greater predictive accuracy. This idea is captured in what Gauch (2003, 2006) (following MacKay, 1992) calls “Ockham’s hill”. To the left of the peak of the hill, increasing the complexity of a model improves its accuracy with respect to extra-sample data because this recovers more of the signal in the sample. However, after the peak, increasing complexity actually diminishes predictive accuracy because this leads to over-fitting to spurious noise in the sample. There is therefore an optimal trade-off (at the peak of Ockham’s hill) between simplicity and fit to the sample data when it comes to facilitating accurate prediction of extra-sample data. Indeed, this trade-off is essentially the core idea behind AIC, the development of which initiated the now enormous literature on model selection, and the philosophers Malcolm Forster and Elliott Sober have sought to use such reasoning to make sense of the role of simplicity in many areas of science (see Section 4biii).

One important implication of this apparent link between model simplicity and predictive accuracy is that interpreting sample data using relatively simple models may improve the efficiency of experiments by allowing scientists to do more with less data—for example, scientists may be able to run a costly experiment fewer times before they can be in a position to make relatively accurate predictions about the future. Gauch (2003, 2006) describes several real world cases from crop science and elsewhere where this gain in accuracy and efficiency from the use of relatively simple models has been documented.

2. Wider Philosophical Significance of Issues Surrounding Simplicity

The putative role of simplicity, both in the evaluation of rival scientific theories and in the mechanics of how we go about inferring hypotheses from empirical data, clearly raises a number of difficult philosophical issues. These include, but are by no means limited to: (1) the question of what precisely it means to say that one theory or hypothesis is simpler than another and how the relative simplicity of theories is to be measured; (2) the question of what rational justification (if any) can be provided for choosing between rival theories on grounds of simplicity; and (3) the closely related question of what weight simplicity considerations ought to carry in theory choice relative to other theoretical virtues, particularly if these sometimes have to be traded off against each other. (For general surveys of the philosophical literature on these issues, see Hesse, 1967; Sober, 2001a, 2001b). Before we delve more deeply into how philosophers have sought to answer these questions, it is worth noting the close connections between philosophical issues surrounding simplicity and many of the most important controversies in the philosophy of science and epistemology.

First, the problem of simplicity has close connections with long-standing issues surrounding the nature and justification of inductive inference. Some philosophers have actually offered up the idea that simpler theories are preferable to less simple ones as a purported solution to the problem of induction: it is the relative simplicity of the hypotheses that we tend to infer from empirical observations that supposedly provides the justification for these inferences—thus, it is simplicity that provides the warrant for our inductive practices. This approach is not as popular as it once was, since it is taken to merely substitute the problem of induction for the equally substantive problem of justifying preferences for simpler theories. A more common view in the recent literature is that the problem of induction and the problem of justifying preferences for simpler theories are closely connected, or may even amount to the same problem. Hence, a solution to the latter problem will provide substantial help towards solving the former.

More generally, the ability to make sense of the putative role of simplicity in scientific reasoning has been seen by many to be a central desideratum for any adequate philosophical theory of the scientific method. For example, Thomas Kuhn’s (1962) influential discussion of the importance of scientists’ aesthetic preferences—including but not limited to judgments of simplicity—in scientific revolutions was a central part of his case for adopting a richer conception of the scientific method and of theory change in science than he found in the dominant logical empiricist views of the time. More recently, critics of the Bayesian approach to scientific reasoning and theory confirmation, which holds that sound inductive reasoning is reasoning according to the formal principles of probability, have claimed that simplicity is an important feature of scientific reasoning that escapes a Bayesian analysis. For instance, Forster and Sober (1994) argue that Bayesian approaches to curve-fitting and model selection (such as the Bayesian Information Criterion) cannot themselves be given a Bayesian rationale, nor can any other approach that builds in a bias towards simpler models. The ability of the Bayesian approach to make sense of simplicity in model selection and other aspects of scientific practice has thus been seen as central to evaluating its promise (see for example, Glymour, 1980; Forster and Sober, 1994; Forster, 1995; Kelly and Glymour, 2004; Howson and Urbach, 2006; Dowe and others, 2007).

Discussions over the legitimacy of simplicity as a criterion for theory choice have also been closely bound up with debates over scientific realism. Scientific realists assert that scientific theories aim to offer a literally true description of the world and that we have good reason to believe that the claims of our current best scientific theories are at least approximately true, including those claims that purport to be about “unobservable” natural phenomena that are beyond our direct perceptual access. Some anti-realists object that it is possible to formulate incompatible alternatives to our current best theories that are just as consistent with any current data that we have, perhaps even any future data that we could ever collect. They claim that we can therefore never be justified in asserting that the claims of our current best theories, especially those concerning unobservables, are true, or approximately true. A standard realist response is to emphasize the role of the so-called “theoretical virtues” in theory choice, among which simplicity is normally listed. The claim is thus that we rule out these alternative theories because they are unnecessarily complex. Importantly, for this defense to work, realists have to defend the idea that not only are we justified in choosing between rival theories on grounds of simplicity, but also that simplicity can be used as a guide to the truth. Naturally, anti-realists, particularly those of an empiricist persuasion (for example, van Fraassen, 1989), have expressed deep skepticism about the alleged truth-conduciveness of a simplicity criterion.

3. Defining and Measuring Simplicity

The first major philosophical problem that seems to arise from the notion that simplicity plays a role in theory choice and evaluation concerns specifying in more detail what it means to say that one theory is simpler than another and how the relative simplicity of theories is to be precisely and objectively measured. Numerous attempts have been made to formulate definitions and measures of theoretical simplicity, all of which face very significant challenges. Philosophers have not been the only ones to contribute to this endeavour. For instance, over the last few decades, a number of formal measures of simplicity and complexity have been developed in mathematical information theory. This section provides an overview of some of the main simplicity measures that have been proposed and the problems that they face. The proposals described here have also normally been tied to particular proposals about what justifies preferences for simpler theories. However, discussion of these justifications will be left until Section 4.

To begin with, it is worth considering why providing a precise definition and measure of theoretical simplicity ought to be regarded as a substantial philosophical problem. After all, it often seems that when one is confronted with a set of rival theories designed to explain a particular empirical phenomenon, it is just obvious which is the simplest. One does not always need a precise definition or measure of a particular property to be able to tell whether or not something exhibits it to a greater degree than something else. Hence, it could be suggested that if there is a philosophical problem here, it is only of very minor interest and certainly of little relevance to scientific practice. There are, however, some reasons to regard this as a substantial philosophical problem, which also has some practical relevance.

First, it is not always easy to tell whether one theory really ought to be regarded as simpler than another, and it is not uncommon for practicing scientists to disagree about the relative simplicity of rival theories. A well-known historical example is the disagreement between Galileo and Kepler concerning the relative simplicity of Copernicus’ theory of planetary motion, according to which the planets move only in perfect circular orbits with epicycles, and Kepler’s theory, according to which the planets move in elliptical orbits (see Holton, 1974; McAllister, 1996). Galileo held to the idea that perfect circular motion is simpler than elliptical motion. In contrast, Kepler emphasized that an elliptical model of planetary motion required many fewer orbits than a circular model and enabled a reduction of all the planetary motions to three fundamental laws of planetary motion. The problem here is that scientists seem to evaluate the simplicity of theories along a number of different dimensions that may conflict with each other. Hence, we have to deal with the fact that a theory may be regarded as simpler than a rival in one respect and more complex in another. To illustrate this further, consider the following list of commonly cited ways in which theories may be held to be simpler than others:

  • Quantitative ontological parsimony (or economy): postulating a smaller number of independent entities, processes, causes, or events.
  • Qualitative ontological parsimony (or economy): postulating a smaller number of independent kinds or classes of entities, processes, causes, or events.
  • Common cause explanation: accounting for phenomena in terms of common rather than separate causal processes.
  • Symmetry: postulating that equalities hold between interacting systems and that the laws describing the phenomena look the same from different perspectives.
  • Uniformity (or homogeneity): postulating a smaller number of changes in a given phenomenon and holding that the relations between phenomena are invariant.
  • Unification: explaining a wider and more diverse range of phenomena that might otherwise be thought to require separate explanations in a single theory (theoretical reduction is generally held to be a species of unification).
  • Lower level processes: when the kinds of processes that can be posited to explain a phenomenon come in a hierarchy, positing processes that come lower rather than higher in this hierarchy.
  • Familiarity (or conservativeness): explaining new phenomena with minimal new theoretical machinery, reusing existing patterns of explanation.
  • Paucity of auxiliary assumptions: invoking fewer extraneous assumptions about the world.
  • Paucity of adjustable parameters: containing fewer independent parameters that the theory leaves to be determined by the data.

As can be seen from this list, there is considerable diversity here. We can see that theoretical simplicity is frequently thought of in ontological terms (for example, quantitative and qualitative parsimony), but also sometimes as a structural feature of theories (for example, unification, paucity of adjustable parameters), and while some of these intuitive types of simplicity may often cluster together in theories—for instance, qualitative parsimony would seem to often go together with invoking common cause explanations, which would in turn often seem to go together with explanatory unification—there is also considerable scope for them pointing in different directions in particular cases. For example, a theory that is qualitatively parsimonious as a result of positing fewer different kinds of entities might be quantitatively unparsimonious as a result of positing more of a particular kind of entity; while the demand to explain in terms of lower-level processes rather than higher-level processes may conflict with the demand to explain in terms of common causes behind similar phenomena, and so on. There are also different possible ways of evaluating the simplicity of a theory with regard to any one of these intuitive types of simplicity. A theory may, for instance, come out as more quantitatively parsimonious than another if one focuses on the number of independent entities that it posits, but less parsimonious if one focuses on the number of independent causes it invokes. Consequently, it seems that if a simplicity criterion is actually to be applicable in practice, we need some way of resolving the disagreements that may arise between scientists about the relative simplicity of rival theories, and this requires a more precise measure of simplicity.

Second, as has already been mentioned, a considerable amount of the skepticism expressed both by philosophers and by scientists about the practice of choosing one theory over another on grounds of relative simplicity has stemmed from the suspicion that our simplicity judgments lack a principled basis (for example, Ackerman, 1961; Bunge, 1961; Priest, 1976). Disagreements between scientists, along with the multiplicity of, and scope for conflict between, intuitive types of simplicity, have been important contributors to this suspicion, leading to the view that for any two theories, T1 and T2, there is some way of evaluating their simplicity such that T1 comes out as simpler than T2, and vice versa. It seems, then, that an adequate defense of the legitimacy of a simplicity criterion needs to show that there are in fact principled ways of determining when one theory is indeed simpler than another. Moreover, in so far as there is also a justificatory issue to be dealt with, we also need to be clear about exactly what it is that we need to justify a preference for.

a. Syntactic Measures

One proposal is that the simplicity of theories can be precisely and objectively measured in terms of how briefly they can be expressed. For example, a natural way of measuring the simplicity of an equation is just to count the number of terms, or parameters, that it contains. Similarly, we could measure the simplicity of a theory in terms of the size of the vocabulary—for example, the number of extra-logical terms—required to write down its claims. Such measures of simplicity are often referred to as syntactic measures, since they involve counting the linguistic elements required to state, or to describe, the theory.

A major problem facing any such syntactic measure of simplicity is the problem of language variance. A measure of simplicity is language variant if it delivers different results depending on the language that is used to represent the theories being compared. Suppose, for example, that we measure the simplicity of an equation by counting the number of non-logical terms that it contains. This will produce the result that r = a will come out as simpler than x^2 + y^2 = a^2. However, this second equation is simply a transformation of the first into Cartesian co-ordinates, where r^2 = x^2 + y^2, and is hence logically equivalent. The intuitive proposal for measuring simplicity in curve-fitting contexts, according to which hypotheses are said to be simpler if they contain fewer parameters, is also language variant in this sense. How many parameters a hypothesis contains depends on the co-ordinate scales that one uses. For any two non-identical functions, F and G, there is some way of transforming the co-ordinate scales such that we can turn F into a linear curve and G into a non-linear curve, and vice versa.
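The language variance worry can be made concrete with a toy syntactic measure. The sketch below simply counts the variable and constant symbols in an equation's string form; the tokenizer and the two renderings of the circle are illustrative choices, not part of any proposed measure in the literature. The two logically equivalent formulations of the same circle receive different scores:

```python
import re

def symbol_count(equation: str) -> int:
    """A toy 'syntactic' simplicity score: count the non-logical symbols
    (lowercase letters standing for variables and constants), ignoring
    operators and exponent digits. Purely illustrative."""
    return len(re.findall(r"[a-z]", equation))

polar = "r = a"                # a circle of radius a, in polar co-ordinates
cartesian = "x^2 + y^2 = a^2"  # the same circle, in Cartesian co-ordinates

print(symbol_count(polar))      # 2
print(symbol_count(cartesian))  # 3 -- same content, different score
```

The point is not that this particular counting scheme is flawed, but that any such scheme inherits the arbitrariness of the representation language.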

Nelson Goodman’s (1983) famous “new riddle of induction” allows us to formulate another example of the problem of language variance. Suppose all previously observed emeralds have been green. Now consider the following hypotheses about the color properties of the entire population of emeralds:

  • H1: all emeralds are green
  • H2: all emeralds first observed prior to time t are green and all emeralds first observed after time t are blue (where t is some future time)

Intuitively, H1 seems to be a simpler hypothesis than H2. To begin with, it can be stated with a smaller vocabulary. H1 also seems to postulate uniformity in the properties of emeralds, while H2 posits non-uniformity. For instance, H2 seems to assume that there is some link between the time at which an emerald is first observed and its properties. Thus it can be viewed as including an additional time parameter. But now consider Goodman’s invented predicates, “grue” and “bleen”. These have been defined in a variety of different ways, but let us define them here as follows: an object is grue if it is first observed before time t and the object is green, or first observed after time t and the object is blue; an object is bleen if it is first observed before time t and the object is blue, or first observed after time t and the object is green. With these predicates, we can define a further property, “grolor”. Grue and bleen are grolors just as green and blue are colors. Now, because of the way that grolors are defined, color predicates like “green” and “blue” can also be defined in terms of grolor predicates: an object is green if first observed before time t and the object is grue, or first observed after time t and the object is bleen; an object is blue if first observed before time t and the object is bleen, or first observed after time t and the object is grue. This means that statements that are expressed in terms of green and blue can also be expressed in terms of grue and bleen. So, we can rewrite H1 and H2 as follows:

  • H1: all emeralds first observed prior to time t are grue and all emeralds first observed after time t are bleen (where t is some future time)
  • H2: all emeralds are grue

Recall that earlier we judged H1 to be simpler than H2. However, if we are to retain that simplicity judgment, we cannot say that H1 is simpler than H2 because it can be stated with a smaller vocabulary; nor can we say that H1 posits greater uniformity, and is hence simpler, because it does not contain a time parameter. This is because simplicity judgments based on such syntactic features can be reversed merely by switching the language used to represent the hypotheses from a color language to a grolor language.
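The interdefinability of the color and grolor vocabularies can be rendered as a pair of simple predicate functions. This is only an illustrative sketch: the cutoff time t is set to an arbitrary value, and objects are represented as (color, time-first-observed) pairs of my own devising.

```python
T = 100  # the arbitrary future cutoff time t (illustrative value)

def is_grue(color: str, first_seen: int) -> bool:
    # Grue: green if first observed before t, blue if first observed after t.
    return (first_seen < T and color == "green") or (first_seen >= T and color == "blue")

def is_bleen(color: str, first_seen: int) -> bool:
    # Bleen: blue if first observed before t, green if first observed after t.
    return (first_seen < T and color == "blue") or (first_seen >= T and color == "green")

def is_green(grolor: str, first_seen: int) -> bool:
    # "Green" defined purely in the grolor vocabulary, mirroring the text.
    return (first_seen < T and grolor == "grue") or (first_seen >= T and grolor == "bleen")

# A green emerald first observed before t is grue;
# a green emerald first observed after t is bleen.
print(is_grue("green", 50))    # True
print(is_bleen("green", 150))  # True
```

Because each vocabulary is exactly definable from the other, any description given in color terms has a grolor translation of the same logical strength, which is why counting vocabulary items cannot settle which hypothesis is objectively simpler.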

Examples such as these have been taken to show two things. First, no syntactic measure of simplicity can suffice to produce a principled simplicity ordering, since all such measures will produce different results depending on the language of representation that is used. It is not enough just to stipulate that we should evaluate simplicity in one language rather than another, since that would not explain why simplicity should be measured in that way. In particular, we want to know that our chosen language is accurately tracking the objective, language-independent simplicity of the theories being compared. Hence, if a syntactic measure of simplicity is to be used, say for practical purposes, it must be underwritten by a more fundamental theory of simplicity. Second, a plausible measure of simplicity cannot be entirely neutral with respect to all of the different claims about the world that the theory makes or can be interpreted as making. Because of the respective definitions of colors and grolors, any hypothesis that posits uniformity in color properties must posit non-uniformity in grolor properties. As Goodman emphasized, one can find uniformity anywhere if no restriction is placed on what kinds of properties should be taken into account. Similarly, it will not do to say that theories are simpler because they posit the existence of fewer entities, causes, and processes, since, using Goodman-like manipulations, it is trivial to show that a theory can be regarded as positing any number of different entities, causes, and processes. Hence, some principled restriction needs to be placed on which aspects of the content of a theory are to be taken into account and which are to be disregarded when measuring their relative simplicity.

b. Goodman’s Measure

According to Nelson Goodman, an important component of the problem of measuring the simplicity of scientific theories is the problem of measuring the degree of systematization that a theory imposes on the world, since, for Goodman, to seek simplicity is to seek a system. In a series of papers in the 1940s and 50s, Goodman (1943, 1955, 1958, 1959) attempted to explicate a precise measure of theoretical systematization in terms of the logical properties of the set of concepts, or extra-logical terms, that make up the statements of the theory.

According to Goodman, scientific theories can be regarded as sets of statements. These statements contain various extra-logical terms, including property terms, relation terms, and so on. These terms can all be assigned predicate symbols. Hence, all the statements of a theory can be expressed in a first order language, using standard symbolic notation. For instance, “… is acid” may become “A(x)”, “… is smaller than ____” may become “S(x, y)”, and so on. Goodman then claims that we can measure the simplicity of the system of predicates employed by the theory in terms of their logical properties, such as their arity, reflexivity, transitivity, symmetry, and so on. The details are highly technical but, very roughly, Goodman’s proposal is that a system of predicates that can be used to express more is more complex than a system of predicates that can be used to express less. For instance, one of the axioms of Goodman’s proposal is that if every set of predicates of a relevant kind, K, is always replaceable by a set of predicates of another kind, L, then K is not more complex than L.

Part of Goodman’s project was to avoid the problem of language variance. Goodman’s measure is a linguistic measure, since it concerns measuring the simplicity of a theory’s predicate basis in a first order language. However, it is not a purely syntactic measure, since it does not involve merely counting linguistic elements, such as the number of extra-logical predicates. Rather, it can be regarded as an attempt to measure the richness of a conceptual scheme: conceptual schemes that can be used to say more are more complex than conceptual schemes that can be used to say less. Hence, a theory can be regarded as simpler if it requires a less expressive system of concepts.

Goodman developed his axiomatic measure of simplicity in considerable detail. However, Goodman himself only ever regarded it as a measure of one particular type of simplicity, since it only concerns the logical properties of the predicates employed by the theory. It does not, for example, take account of the number of entities that a theory postulates. Moreover, Goodman never showed how the measure could be applied to real scientific theories. It has been objected that even if Goodman’s measure could be applied, it would not discriminate between many theories that intuitively differ in simplicity—indeed, in the kind of simplicity as systematization that Goodman wants to measure. For instance, it is plausible that the system of concepts used to express the Copernican theory of planetary motion is just as expressively rich as the system of concepts used to express the Ptolemaic theory, yet the former is widely regarded as considerably simpler than the latter, partly in virtue of it providing an intuitively more systematic account of the data (for discussion of the details of Goodman’s proposal and the objections it faces, see Kemeny, 1955; Suppes, 1956; Kyburg, 1961; Hesse, 1967).

c. Simplicity as Testability

It has often been argued that simpler theories say more about the world and hence are easier to test than more complex ones. C. S. Peirce (1931), for example, claimed that the simplest theories are those whose empirical consequences are most readily deduced and compared with observation, so that they can be eliminated more easily if they are wrong. Complex theories, on the other hand, tend to be less precise and allow for more wriggle room in accommodating the data. This apparent connection between simplicity and testability has led some philosophers to attempt to formulate measures of simplicity in terms of the relative testability of theories.

Karl Popper (1959) famously proposed one such testability measure of simplicity. Popper associated simplicity with empirical content: simpler theories say more about the world than more complex theories and, in so doing, place more restriction on the ways that the world can be. According to Popper, the empirical content of theories, and hence their simplicity, can be measured in terms of their falsifiability. The falsifiability of a theory concerns the ease with which the theory can be proven false, if the theory is indeed false. Popper argued that this could be measured in terms of the amount of data that one would need to falsify the theory. For example, on Popper’s measure, the hypothesis that x and y are linearly related, according to an equation of the form y = a + bx, comes out as having greater empirical content and hence greater simplicity than the hypothesis that they are related according to a parabola of the form y = a + bx + cx^2. This is because one only needs three data points to falsify the linear hypothesis, but one needs at least four data points to falsify the parabolic hypothesis. Thus Popper argued that empirical content, falsifiability, and hence simplicity, could be seen as equivalent to the paucity of adjustable parameters. John Kemeny (1955) proposed a similar testability measure, according to which theories are more complex if they can come out as true in more ways in an n-member universe, where n is the number of individuals that the universe contains.
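Popper's point about data requirements can be illustrated with elementary arithmetic: a line y = a + bx can be made to pass exactly through any two data points, so two observations can never refute "the relationship is linear", while a third observation can. The data values below are invented purely for illustration.

```python
def line_through(p1, p2):
    """Return the (a, b) of the unique line y = a + b*x through two points.
    Any two points (with distinct x values) can be fitted exactly."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b

pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 6.0)]  # three made-up observations

a, b = line_through(pts[0], pts[1])  # fit the first two points exactly
print(a + b * pts[2][0])  # 5.0 -- the fitted line predicts y = 5 at x = 2
print(pts[2][1])          # 6.0 -- but we observed y = 6: linearity is falsified
```

A parabola y = a + bx + cx^2, with one more adjustable parameter, can be threaded exactly through any three such points, which is why a fourth point is needed to put it at risk.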

Popper’s equation of simplicity with falsifiability suffers from some serious objections. First, it cannot be applied to comparisons between theories that make equally precise claims, such as a comparison between a specific parabolic hypothesis and a specific linear hypothesis, both of which specify precise values for their parameters and can be falsified by only one data point. It also cannot be applied when we compare theories that make probabilistic claims about the world, since probabilistic statements are not strictly falsifiable. This is particularly troublesome when it comes to accounting for the role of simplicity in the practice of curve-fitting, since one normally has to deal with the possibility of error in the data. As a result, an error distribution is normally added to the hypotheses under consideration, so that they are understood as conferring certain probabilities on the data, rather than as having deductive observational consequences. In addition, most philosophers of science now tend to think that falsifiability is not really an intrinsic property of theories themselves, but rather a feature of how scientists are disposed to behave towards their theories. Even deterministic theories normally do not entail particular observational consequences unless they are conjoined with particular auxiliary assumptions, usually leaving the scientist the option of saving the theory from refutation by tinkering with the auxiliary assumptions—a point famously emphasized by Pierre Duhem (1954). This makes it extremely difficult to maintain that simpler theories are intrinsically more falsifiable than less simple ones. Goodman (1961, pp. 150-151) also argued that equating simplicity with falsifiability leads to counter-intuitive consequences.
The hypothesis, “All maple trees are deciduous”, is intuitively simpler than the hypothesis, “All maple trees whatsoever, and all sassafras trees in Eagleville, are deciduous”; yet, according to Goodman, the latter hypothesis is clearly the easier of the two to falsify. Kemeny’s measure inherits many of the same objections.

Both Popper and Kemeny essentially tried to link the simplicity of a theory with the degree to which it can accommodate potential future data: simpler theories are less accommodating than more complex ones. One interesting recent attempt to make sense of this notion of accommodation is due to Harman and Kulkarni (2007). Harman and Kulkarni analyze accommodation in terms of a concept drawn from statistical learning theory known as the Vapnik-Chervonenkis (VC) dimension. The VC dimension of a hypothesis can be roughly understood as a measure of the “richness” of the class of hypotheses from which it is drawn, where a class is richer if it is harder to find data that is inconsistent with some member of the class. Thus, a hypothesis drawn from a class that can fit any possible set of data will have infinite VC dimension. Though VC dimension shares some important similarities with Popper’s measure, there are important differences. Unlike Popper’s measure, it implies that accommodation is not always equivalent to the number of adjustable parameters. If we count adjustable parameters, sine curves of the form y = a sin bx come out as relatively unaccommodating; however, such curves have an infinite VC dimension. While Harman and Kulkarni do not propose that VC dimension be taken as a general measure of simplicity (in fact, they regard it as an alternative to simplicity in some scientific contexts), ideas along these lines might perhaps hold some future promise for testability/accommodation measures of simplicity. Similar notions of accommodation in terms of “dimension” have been used to explicate the notion of the simplicity of a statistical model in the face of the fact that the number of adjustable parameters a model contains is language variant (for discussion, see Forster, 1999; Sober, 2007).

d. Sober’s Measure

In his early work on simplicity, Elliott Sober (1975) proposed that the simplicity of theories be measured in terms of their question-relative informativeness. According to Sober, a theory is more informative if it requires less supplementary information from us in order for us to be able to use it to determine the answers to the particular questions that we are interested in. For instance, the hypothesis y = 4x is more informative and hence simpler than y = 2z + 2x with respect to the question, “what is the value of y?” This is because in order to find out the value of y one only needs to determine a value for x on the first hypothesis, whereas on the second hypothesis one also needs to determine a value for z. Similarly, Sober’s proposal can be used to capture the intuition that theories that say that a given class of things is uniform in its properties are simpler than theories that say that the class is non-uniform, because they are more informative relative to particular questions about the properties of the class. For instance, the hypothesis that “all ravens are black” is more informative and hence simpler than “70% of ravens are black” with respect to the question, “what will be the color of the next observed raven?” This is because on the former hypothesis one needs no additional information in order to answer this question, whereas one will have to supplement the latter hypothesis with considerable extra information in order to generate a determinate answer.
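Sober's idea can be given a toy rendering in code: relative to the question "what is y?", a hypothesis is more informative the fewer supplementary values it needs from us before it yields an answer. The two hypotheses come from the article; the counting scheme and function names below are illustrative, not Sober's own formalism.

```python
# Inputs each hypothesis needs before it can answer "what is the value of y?"
h1_inputs = {"x"}        # H1: y = 4x       -- needs only x
h2_inputs = {"x", "z"}   # H2: y = 2z + 2x  -- needs x and z

def answer_y(hypothesis: str, known: dict):
    """Return y if 'known' supplies every input the hypothesis needs, else None."""
    if hypothesis == "h1" and h1_inputs <= known.keys():
        return 4 * known["x"]
    if hypothesis == "h2" and h2_inputs <= known.keys():
        return 2 * known["z"] + 2 * known["x"]
    return None  # the question cannot yet be answered

print(answer_y("h1", {"x": 3}))          # 12   -- H1 answers from x alone
print(answer_y("h2", {"x": 3}))          # None -- H2 still needs a value for z
print(answer_y("h2", {"x": 3, "z": 5}))  # 16
```

On this rendering, H1 is simpler than H2 relative to "what is y?" precisely because its unanswered-input set is smaller.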

By relativizing the notion of the content-fullness of theories to the question that one is interested in, Sober’s measure avoids the problem that Popper’s and Kemeny’s proposals faced, of the most arbitrarily specific theories, or theories made up of strings of irrelevant conjunctions of claims, turning out to be the simplest. Moreover, according to Sober’s proposal, the content of the theory must be relevant to answering the question for it to count towards the theory’s simplicity. This gives rise to the most distinctive element of Sober’s proposal: different simplicity orderings of theories will be produced depending on the question one asks. For instance, if we want to know the value of z, given values of y and x, then y = 2z + 2x will be more informative, and hence simpler, than y = 4x. Thus, a theory can be simple relative to some questions and complex relative to others.

Critics have argued that Sober’s measure produces a number of counter-intuitive results. Firstly, the measure cannot explain why people tend to judge an equation such as y = 3x + 4x^2 – 50 as more complex than an equation like y = 2x, relative to the question, “what is the value of y?” In both cases, one only needs a value of x to work out a value for y. Similarly, Sober’s measure fails to deal with Goodman’s above-cited counter-example to the idea that simplicity equates to testability, since it produces the counter-intuitive outcome that there is no difference in simplicity between “all maple trees whatsoever, and all sassafras trees in Eagleville, are deciduous” and “all maple trees are deciduous” relative to questions about whether maple trees are deciduous. The interest-relativity of Sober’s measure has also generated criticism from those who prefer to see simplicity as a property that varies only with what a given theory is being compared with, not with the question that one happens to be asking.

e. Thagard’s Measure

Paul Thagard (1988) proposed that simplicity ought to be understood as a ratio of the number of facts explained by a theory to the number of auxiliary assumptions that the theory requires. Thagard defines an auxiliary assumption as a statement, not part of the original theory, which is assumed in order for the theory to be able to explain one or more of the facts to be explained. Simplicity is then measured as follows:

  • Simplicity of T = (Facts explained by T – Auxiliary assumptions of T) / Facts explained by T

A value of 0 is given to a maximally complex theory, which requires as many auxiliary assumptions as facts that it explains, and a value of 1 to a maximally simple theory, which requires no auxiliary assumptions at all in order to explain the facts. Thus, the higher the ratio of facts explained to auxiliary assumptions, the simpler the theory. The essence of Thagard’s proposal is that we want to explain as much as we can, while making the fewest assumptions about the way the world is. By balancing the paucity of auxiliary assumptions against explanatory power, this measure prevents the unfortunate consequence of the simplest theories turning out to be those that are most anaemic.
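Thagard's ratio is straightforward to compute once the counts are fixed. A minimal sketch follows; the fact and assumption counts in the examples are invented for illustration, since (as discussed below) producing such counts for real theories is itself contentious.

```python
def thagard_simplicity(facts_explained: int, auxiliary_assumptions: int) -> float:
    """Thagard's measure: (facts explained - auxiliary assumptions) / facts explained."""
    return (facts_explained - auxiliary_assumptions) / facts_explained

print(thagard_simplicity(10, 0))   # 1.0 -- maximally simple: no auxiliary assumptions
print(thagard_simplicity(10, 10))  # 0.0 -- maximally complex: one assumption per fact
print(thagard_simplicity(10, 4))   # 0.6
```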

A significant difficulty facing Thagard’s proposal lies in determining what the auxiliary assumptions of theories actually are and how to count them. It could be argued that the problem of counting auxiliary assumptions threatens to become as difficult as the original problem of measuring simplicity. What a theory must assume about the world for it to explain the evidence is frequently extremely unclear and even harder to quantify. In addition, some auxiliary assumptions are bigger and more onerous than others, and it is not clear that they should be given equal weighting, as they are in Thagard’s measure. Another objection is that Thagard’s proposal struggles to make sense of things like ontological parsimony—the idea that theories are simpler because they posit fewer things—since it is not clear that parsimony per se would make any particular difference to the number of auxiliary assumptions required. In defense of his proposal, Thagard has argued that ontological parsimony is actually less important to practicing scientists than has often been thought.

f. Information-Theoretic Measures

Over the last few decades, a number of formal measures of simplicity and complexity have been developed in mathematical information theory. Though many of these measures have been designed for addressing specific practical problems, the central ideas behind them have been claimed to have significance for addressing the philosophical problem of measuring the simplicity of scientific theories.

One of the prominent information-theoretic measures of simplicity in the current literature is Kolmogorov complexity, which is a formal measure of quantitative information content (see Li and Vitányi, 1997). The Kolmogorov complexity K(x) of an object x is the length in bits of the shortest binary program that can output a completely faithful description of x in some universal programming language, such as Java, C++, or LISP. This measure was originally formulated to measure randomness in data strings (such as sequences of numbers), and is based on the insight that non-random data strings can be “compressed” by finding the patterns that exist in them. If there are patterns in a data string, it is possible to provide a completely accurate description of it that is shorter than the string itself, in terms of the number of “bits” of information used in the description, by using the pattern as a mnemonic that eliminates redundant information that need not be encoded in the description. For instance, if the data string is an ordered sequence of 1s and 0s, where every 1 is followed by a 0, and every 0 by a 1, then it can be given a very short description that specifies the pattern, the value of the first data point and the number of data points. Any further information is redundant. Completely random data sets, however, contain no patterns, no redundancy, and hence are not compressible.
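Although K(x) itself cannot be computed, the compression idea behind it is easy to demonstrate by letting an off-the-shelf compressor stand in (very imperfectly) for the shortest program. The particular strings below are illustrative: a patterned sequence of alternating 1s and 0s compresses dramatically, while pseudo-random bytes barely compress at all.

```python
import random
import zlib

random.seed(0)  # reproducible "random" data for the illustration

patterned = ("10" * 5000).encode()  # 10000 bytes generated by one simple rule
noise = bytes(random.getrandbits(8) for _ in range(10000))  # 10000 patternless bytes

# A pattern lets the compressor eliminate redundancy; noise offers none to exploit.
print(len(patterned), "->", len(zlib.compress(patterned)))  # shrinks to a few dozen bytes
print(len(noise), "->", len(zlib.compress(noise)))          # stays close to 10000 bytes
```

In the analogy suggested in the text, a theory plays the role of the compressing pattern: the more regularity it captures in the data, the shorter the overall description, and (on this proposal) the simpler the theory.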

It has been argued that Kolmogorov complexity can be applied as a general measure of the simplicity of scientific theories. Theories can be thought of as specifying the patterns that exist in the data sets they are meant to explain. As a result, we can also think of theories as compressing the data. Accordingly, the more a theory T compresses the data, the lower the value of K for the data using T, and the greater is its simplicity. An important feature of Kolmogorov complexity is that it is measured in a universal programming language and it can be shown that the difference in code length between the shortest code length for x in one universal programming language and the shortest code length for x in another programming language is no more than a constant c, which depends on the languages chosen, rather than x. This has been thought to provide some handle on the problem of language variance: Kolmogorov complexity can be seen as a measure of “objective” or “inherent” information content up to an additive constant. Due to this, some enthusiasts have gone so far as to claim that Kolmogorov complexity solves the problem of defining and measuring simplicity.

A number of objections have been raised against this application of Kolmogorov complexity. First, finding K(x) is a non-computable problem: no algorithm exists to compute it. This is claimed to be a serious practical limitation of the measure. Another objection is that Kolmogorov complexity produces some counter-intuitive results. For instance, theories that make probabilistic rather than deterministic predictions about the data must have maximum Kolmogorov complexity. For example, a theory that says that a sequence of coin flips conforms to the probabilistic law, Pr(Heads) = ½, cannot be said to compress the data, since one cannot use this law to reconstruct the exact sequence of heads and tails, even though it offers an intuitively simple explanation of what we observe.

Other information-theoretic measures of simplicity, such as the Minimum Message Length (MML) and Minimum Description Length (MDL) measures, avoid some of the practical problems facing Kolmogorov complexity. Though there are important differences in the details of these measures (see Wallace and Dowe, 1999), they all adopt the same basic idea that the simplicity of an empirical hypothesis can be measured in terms of the extent to which it provides a compact encoding of the data.

A general objection to all such measures of simplicity is that scientific theories generally aim to do more than specify patterns in the data. They also aim to explain why these patterns are there and it is in relation to how theories go about explaining the patterns in our observations that theories have often been thought to be simple or complex. Hence, it can be argued that mere data compression cannot, by itself, suffice as an explication of simplicity in relation to scientific theories. A further objection to the data compression approach is that theories can be viewed as compressing data sets in a very large number of different ways, many of which we do not consider appropriate contributions to simplicity. The problem raised by Goodman’s new riddle of induction can be seen as the problem of deciding which regularities to measure: for example, color regularities or grolor regularities? Formal information-theoretic measures do not discriminate between different kinds of pattern finding. Hence, any such measure can only be applied once we specify the sorts of patterns and regularities that should be taken into account.

g. Is Simplicity a Unified Concept?

There is a general consensus in the philosophical literature that the project of articulating a precise general measure of theoretical simplicity faces very significant challenges. Of course, this has not stopped practicing scientists from utilizing notions of simplicity in their work, and particular concepts of simplicity—such as the simplicity of a statistical model, understood in terms of paucity of adjustable parameters or model dimension—are firmly entrenched in several areas of science. Given this, one potential way of responding to the difficulties that philosophers and others have encountered in this area—particularly in light of the apparent multiplicity and scope for conflict between intuitive explications of simplicity—is to raise the question of whether theoretical simplicity is in fact a unified concept at all. Perhaps there is no single notion of simplicity that is (or should be) employed by scientists, but rather a cluster of different, sometimes related, but also sometimes conflicting notions of simplicity that scientists find useful to varying degrees in particular contexts. This might be evidenced by the observation that scientists’ simplicity judgments often involve making trade-offs between different notions of simplicity. Kepler’s preference for an astronomical theory that abandoned perfectly circular motions for the planets, but which could offer a unified explanation of the astronomical observations in terms of three basic laws, over a theory that retained perfect circular motion, but could not offer a similarly unified explanation, seems to be a clear example of this.

As a result of considerations of this sort, some philosophers have argued that there is actually no single theoretical value here at all, but rather a cluster of them (for example, Bunge, 1961). It is also worth considering the possibility that which of the cluster is accorded greater weight than the others, and how each of them is understood in practice, may vary greatly across different disciplines and fields of inquiry. Thus, what really matters when it comes to evaluating the comparative “simplicity” of theories might be quite different for biologists than for physicists, for instance, and perhaps what matters to a particle physicist is different from what matters to an astrophysicist. If there is in fact no unified concept of simplicity at work in science, that might also indicate that there is no unitary justification for choosing between rival theories on grounds of simplicity. One important suggestion to which this possibility has led is that the role of simplicity in science cannot be understood from a global perspective, but can only be understood locally. How simplicity ought to be measured and why it matters may have a peculiarly domain-specific explanation.

4. Justifying Preferences for Simpler Theories

Due to the apparent centrality of simplicity considerations to scientific methods, and the link between simplicity and numerous other important philosophical issues, the problem of justifying preferences for simpler theories is regarded as a major problem in the philosophy of science. It is also regarded as one of the most intractable. Though an extremely wide variety of justifications have been proposed—as with the debate over how to correctly define and measure simplicity, some important recent contributions have their origins in scientific literature in statistics, information theory, and other cognate fields—all of them have met with significant objections. There is currently no agreement amongst philosophers on which is the most promising path to take. There is also skepticism in some circles about whether an adequate justification is even possible.

Broadly speaking, justificatory proposals can be categorized into three types: 1) accounts that seek to show that simplicity is an indicator of truth (that is, that simpler theories are, in general, more likely to be true, or are somehow better confirmed by the empirical data than their more complex rivals); 2) accounts that do not regard simplicity as a direct indicator of truth, but which seek to highlight some alternative methodological justification for preferring simpler theories; 3) deflationary approaches, which actually reject the idea that there is a general justification for preferring simpler theories per se, but which seek to analyze particular appeals to simplicity in science in terms of other, less problematic, theoretical virtues.

a. Simplicity as an Indicator of Truth

i. Nature is Simple

Historically, the dominant view about why we should prefer simpler theories to more complex ones has been based on a general metaphysical thesis of the simplicity of nature. Since nature itself is simple, the relative simplicity of theories can thus be regarded as direct evidence for their truth. Such a view was explicitly endorsed by many of the great scientists of the past, including Aristotle, Copernicus, Galileo, Kepler, Newton, Maxwell, and Einstein. Naturally, however, the question arises as to what justifies the thesis that nature is simple. Broadly speaking, there have been two different sorts of argument given for this thesis: i) that a benevolent God must have created a simple and elegant universe; ii) that the past record of success of relatively simple theories entitles us to infer that nature is simple. The theological justification was most common amongst scientists and philosophers during the early modern period. Einstein, on the other hand, invoked a meta-inductive justification, claiming that the history of physics justifies us in believing that nature is the realization of the simplest conceivable mathematical ideas.

Despite the historical popularity and influence of this view, more recent philosophers and scientists have been extremely resistant to the idea that we are justified in believing that nature is simple. For a start, it seems difficult to formulate the thesis that nature is simple so that it is not either obviously false, or too vague to be of any use. There would seem to be many counter-examples to the claim that we live in a simple universe. Consider, for instance, the picture of the atomic nucleus that physicists were working with in the early part of the twentieth century: it was assumed that matter was made only of protons and electrons; there were no such things as neutrons or neutrinos and no weak or strong nuclear forces to be explained, only electromagnetism. Subsequent discoveries have arguably led to a much more complex picture of nature and much more complex theories have had to be developed to account for this. In response, it could be claimed that though nature seems to be complex in some superficial respects, there is in fact a deep underlying simplicity in the fundamental structure of nature. It might also be claimed that the respects in which nature appears to be complex are necessary consequences of its underlying simplicity. But this just serves to highlight the vagueness of the claim that nature is simple—what exactly does this thesis amount to, and what kind of evidence could we have for it?

However the thesis is formulated, it would seem to be an extremely difficult one to adequately defend, whether this be on theological or meta-inductive grounds. An attempt to give a theological justification for the claim that nature is simple suffers from an inherent unattractiveness to modern philosophers and scientists who do not want to ground the legitimacy of scientific methods in theology. In any case, many theologians reject the supposed link between God’s benevolence and the simplicity of creation. With respect to a meta-inductive justification, even if it were the case that the history of science demonstrates the better than average success of simpler theories, we may still raise significant worries about the extent to which this could give sufficient credence to the claim that nature is simple. First, it assumes that empirical success can be taken to be a reliable indicator of truth (or at least approximate truth), and hence of what nature is really like. Though this is a standard assumption for many scientific realists—the claim being that success would be “miraculous” if the theory concerned was radically false—it is a highly contentious one, since many anti-realists hold that the history of science shows that all theories, even eminently successful theories, typically turn out to be radically false. Even if one does accept a link between success and truth, our successes to date may still not provide a representative sample of nature: maybe we have only looked at the problems that are most amenable to simple solutions and the real underlying complexity of nature has escaped our notice. We can also question the degree to which we can extrapolate any putative connection between simplicity and truth in one area of nature to nature as a whole. Moreover, in so far as simplicity considerations are held to be fundamental to inductive inference quite generally, such an attempted justification risks a charge of circularity.

ii. Meta-Inductive Proposals

There is another way of appealing to past success in order to try to justify a link between simplicity and truth. Instead of trying to justify a completely general claim about the simplicity of nature, this proposal merely suggests that we can infer a correlation between success and very particular simplicity characteristics in particular fields of inquiry—for instance, a particular kind of symmetry in certain areas of theoretical physics. If success can be regarded as an indicator of at least approximate truth, we can then infer that theories that are simpler in the relevant sense are more likely to be true in fields where the correlation with success holds.

Recent examples of this sort of proposal include McAllister (1996) and Kuipers (2002). In an effort to account for the truth-conduciveness of aesthetic considerations in science, including simplicity, Theo Kuipers (2002) claims that scientists tend to become attracted to theories that share particular aesthetic features with successful theories to which they have been previously exposed. In other words, we can explain the particular aesthetic preferences that scientists have in terms similar to a well-documented psychological effect known as the “mere-exposure effect”, which occurs when individuals take a liking to something after repeated exposure to it. If, in a given field of inquiry, theories that have been especially successful exhibit a particular type of simplicity (however this is understood), and such theories have thus been repeatedly presented to scientists working in the field during their training, the mere-exposure effect will then lead these scientists to be attracted to other theories that also exhibit that same type of simplicity. This process can then be used to support an aesthetic induction to a correlation between simplicity in the relevant sense and success. One can then make a case that this type of simplicity can legitimately be taken as an indicator of at least approximate truth.

Even though this sort of meta-inductive proposal does not attempt to show that nature in general is simple, many of the same objections can be raised against it as are raised against the attempt to justify that metaphysical thesis by appeal to the past success of simple theories. Once again, there is the problem of justifying the claim that empirical success is a reliable guide to (approximate) truth. Kuipers’ own arguments for this claim rest on a somewhat idiosyncratic account of truth approximation. In addition, in order to legitimately infer that there is a genuine correlation between simplicity and success, one cannot just look at successful theories; one must look at unsuccessful theories too. Even if all the successful theories in a domain have the relevant simplicity characteristic, it might still be the case that the majority of theories with the characteristic have been (or would have been) highly unsuccessful. Indeed, if one can potentially modify a successful theory in an infinite number of ways while keeping the relevant simplicity characteristic, one might actually be able to guarantee that the majority of possible theories with the characteristic would be unsuccessful theories, thus breaking the correlation between simplicity and success. This could be taken as suggesting that in order to carry any weight, arguments from success also need to offer an explanation for why simplicity contributes to success. Moreover, though the mere-exposure effect is well documented, Kuipers provides no direct empirical evidence that scientists actually acquire their aesthetic preferences via the kind of process that he proposes.

iii. Bayesian Proposals

According to standard varieties of Bayesianism, we should evaluate scientific theories according to their probability conditional upon the evidence (posterior probability). This probability, Pr(T | E), is a function of three quantities:

  • Pr(T | E) = Pr(E | T) Pr(T) / Pr(E)

Pr(E | T) is the probability that the theory, T, confers on the evidence, E, and is referred to as the likelihood of T. Pr(T) is the prior probability of T, and Pr(E) is the probability of E. T is then held to have higher posterior probability than a rival theory, T*, if and only if:

  • Pr(E | T) Pr(T) > Pr(E | T*) Pr(T*)

A standard Bayesian proposal for understanding the role of simplicity in theory choice is that simplicity is one of the key determinants of Pr(T): other things being equal, simpler theories and hypotheses are held to have higher prior probability of being true than more complex ones. Thus, if two rival theories confer equal or near equal probability on the data, but differ in relative simplicity, other things being equal, the simpler theory will tend to have a higher posterior probability. This idea, which Harold Jeffreys called “the simplicity postulate”, has been elaborated in a number of different ways by philosophers, statisticians, and information theorists, utilizing various measures of simplicity (for example, Carnap, 1950; Jeffreys, 1957, 1961; Solomonoff, 1964; Li and Vitányi, 1997).
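The effect of the simplicity postulate on posterior probabilities can be sketched with a toy calculation. The prior and likelihood values below are invented purely for illustration; they are assumptions, not figures drawn from the literature cited above:

```python
# A toy posterior comparison illustrating the "simplicity postulate":
# the simpler theory is assigned a higher prior probability, so when two
# rival theories confer equal probability on the evidence, the simpler
# one ends up with the higher posterior probability.

def posterior_odds(prior_t, likelihood_t, prior_rival, likelihood_rival):
    """Ratio Pr(T|E) / Pr(T*|E); the common factor Pr(E) cancels out."""
    return (likelihood_t * prior_t) / (likelihood_rival * prior_rival)

# Assumed, purely illustrative values: both theories fit E equally well,
# but the simpler theory is given three times the prior probability.
prior_simple, prior_complex = 0.3, 0.1
pr_E_given_either = 0.8

odds = posterior_odds(prior_simple, pr_E_given_either,
                      prior_complex, pr_E_given_either)
print(odds > 1)  # True: the simpler theory has the higher posterior
```

Since Pr(E) cancels in the comparison, the whole dispute reduces to how the priors are set, which is exactly where the objections discussed below take hold.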

In response to this proposal, Karl Popper (1959) argued that, in some cases, assigning a simpler theory a higher prior probability actually violates the axioms of probability. For instance, Jeffreys proposed that simplicity be measured by counting adjustable parameters. On this measure, the claim that the planets move in circular orbits is simpler than the claim that the planets move in elliptical orbits, since the equation for an ellipse contains an additional adjustable parameter. However, circles can also be viewed as special cases of ellipses, where the additional parameter is set to zero. Hence, the claim that planets move in circular orbits can also be seen as a special case of the claim that the planets move in elliptical orbits. If that is right, then the former claim cannot be more probable than the latter claim, because the truth of the former entails the truth of the latter and probability respects entailment. In reply to Popper, it has been argued that this prior probabilistic bias towards simpler theories should only be seen as applying to comparisons between inconsistent theories where no relation of entailment holds between them—for instance, between the claim that the planets move in circular orbits and the claim that they move in elliptical but non-circular orbits.

The main objection to the Bayesian proposal that simplicity is a determinant of prior probability is that the theory of probability seems to offer no resources for explaining why simpler theories should be accorded higher prior probability. Rudolf Carnap (1950) thought that prior probabilities could be assigned a priori to any hypothesis stated in a formal language, on the basis of a logical analysis of the structure of the language and assumptions about the equi-probability of all possible states of affairs. However, Carnap’s approach has generally been recognized to be unworkable. If higher prior probabilities cannot be assigned to simpler theories on the basis of purely logical or mathematical considerations, then it seems that Bayesians must look outside of the Bayesian framework itself to justify the simplicity postulate.

Some Bayesians have taken an alternative route, claiming that a direct mathematical connection can be established between the simplicity of theories and their likelihood—that is, the value of Pr(E | T) (see Rosencrantz, 1983; Myrvold, 2003; White, 2005). This proposal depends on the assumption that simpler theories have fewer adjustable parameters, and hence are consistent with a narrower range of potential data. Suppose that we collect a set of empirical data, E, that can be explained by two theories that differ with respect to this kind of simplicity: a simple theory, S, and a complex theory, C. S has no adjustable parameters and only ever entails E, while C has an adjustable parameter, θ, which can take a range of values, n. When θ is set to some specific value, i, it entails E, but on other values of θ, C entails different and incompatible observations. It is then argued that S confers a higher probability on E. This is because C allows that lots of other possible observations could have been made instead of E (on different possible settings for θ). Hence, the truth of C would make our recording those particular observations less probable than would the truth of S. Here, the likelihood of C is calculated as the average of the likelihoods of each of the n versions of C, defined by a unique setting of θ. Thus, as the complexity of a theory increases—measured in terms of the number of adjustable parameters it contains—the number of versions of the theory that will give a low probability to E will increase and the overall value of Pr(E | T) will go down.
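The averaging argument can be made concrete with a toy calculation. The setup below (n equiprobable settings of θ, exactly one of which entails E) is an assumed idealization for illustration, not a model taken from the papers cited above:

```python
# Toy version of the likelihood argument. The simple theory S entails E
# outright, so Pr(E|S) = 1. The complex theory C has n equiprobable
# settings of its adjustable parameter θ; only one setting entails E,
# while the others entail incompatible observations, so the likelihood
# of C is the average of the likelihoods of its n versions.

def average_likelihood(n_settings, settings_entailing_E=1):
    """Pr(E|C) as the mean over the n versions of C (uniform weights assumed)."""
    return settings_entailing_E / n_settings

pr_E_given_S = 1.0                      # S only ever entails E
pr_E_given_C = average_likelihood(10)   # C has 10 possible settings of θ

print(pr_E_given_S > pr_E_given_C)  # True: the simpler theory has the higher likelihood
```

The calculation also makes the next objection easy to see: giving each version of C the uniform weight 1/n is itself a prior probabilistic assumption.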

An objection to this proposal (Kelly, 2004, 2010) is that for us to be able to show that S has a higher posterior probability than C as a result of its having a higher likelihood, it must be assumed that the prior probability of C is not significantly greater than the prior probability of S. This is a substantive assumption to make because of the way that simplicity is defined in this argument. We can view C as coming in a variety of different versions, each of which is picked out by a different value given to θ. If we then assume that S and C have roughly equal prior probability we must, by implication, assume that each version of C has a very low prior probability compared to S, since the prior probability of each version of C would be Pr(C) / n (assuming that the theory does not say that any particular parameter setting is more probable than any of the others). This would effectively build in a very strong prior bias in favour of S over each version of C. Given that each version of C could be considered independently—that is, the complex theory could be given a simpler, more restricted formulation—this would require an additional supporting argument. The objection is thus that the proposal simply begs the question by resting on a prior probabilistic bias towards simpler theories. Another objection is that the proposal suffers from the limitation that it can only be applied to comparisons between theories where the simpler theory can be derived from the more complex one by fixing certain of its parameters. At best, this represents a small fraction of cases in which simplicity has been thought to play a role.

iv. Simplicity as a Fundamental A Priori Principle

In the light of the perceived failure of philosophers to justify the claim that simpler theories are more likely to be true, Richard Swinburne (2001) has argued that this claim has to be regarded as a fundamental a priori principle. Swinburne argues that it is just obvious that the criteria for theory evaluation that scientists use reliably lead them to make correct judgments about which theories are more likely to be true. Since, Swinburne argues, one of these criteria is that simpler theories are, other things being equal, more likely to be true, we just have to accept that simplicity is indeed an indicator of probable truth. However, Swinburne does not think that this connection between simplicity and truth can be established empirically, nor does he think that it can be shown to follow from some more obvious a priori principle. Hence, we have no choice but to regard it as a fundamental a priori principle—a principle that cannot be justified by anything more fundamental.

In response to Swinburne, it can be argued that this is hardly going to convince those scientists and philosophers for whom it is not at all obvious that simpler theories are more likely to be true.

b. Alternative Justifications

i. Falsifiability

Famously, Karl Popper (1959) rejected the idea that theories are ever confirmed by evidence and that we are ever entitled to regard a theory as true, or probably true. Hence, Popper did not think simplicity could be legitimately regarded as an indicator of truth. Rather, he argued that simpler theories are to be valued because they are more falsifiable. Indeed, Popper thought that the simplicity of theories could be measured in terms of their falsifiability, since intuitively simpler theories have greater empirical content, placing more restriction on the ways the world can be, and thus leading to a reduced ability to accommodate any future data that we might discover. According to Popper, scientific progress consists not in the attainment of true theories, but in the elimination of false ones. Thus, the reason we should prefer more falsifiable theories is that such theories will be more quickly eliminated if they are in fact false. Hence, the practice of first considering the simplest theory consistent with the data provides a faster route to scientific progress. Importantly, for Popper, this meant that we should prefer simpler theories because they have a lower probability of being true, since, for any set of data, it is more likely that some complex theory (in Popper’s sense) will be able to accommodate it than a simpler theory.

Popper’s equation of simplicity with falsifiability suffers from some well-known objections and counter-examples, and these pose significant problems for his justificatory proposal (Section 3c). Another significant problem is that taking degree of falsifiability as a criterion for theory choice seems to lead to absurd consequences, since it encourages us to prefer absurdly specific scientific theories to those that have more general content. For instance, the hypothesis “all emeralds are green until 11pm today, when they will turn blue” should be judged as preferable to “all emeralds are green” because it is easier to falsify. It thus seems deeply implausible to say that selecting and testing such hypotheses first provides the fastest route to scientific progress.

ii. Simplicity as an Explanatory Virtue

A number of philosophers have sought to elucidate the rationale for preferring simpler theories to more complex ones in explanatory terms (for example, Friedman, 1974; Sober, 1975; Walsh, 1979; Thagard, 1988; Kitcher, 1989; Baker, 2003). These proposals have typically been made on the back of accounts of scientific explanation that explicate notions of explanatoriness and explanatory power in terms of unification, which is taken to be intimately bound up with notions of simplicity. According to unification accounts of explanation, a theory is explanatory if it shows how different phenomena are related to each other under certain systematizing theoretical principles, and a theory is held to have greater explanatory power than its rivals if it systematizes more phenomena. For Michael Friedman (1974), for instance, explanatory power is a function of the number of independent phenomena that we need to accept as ultimate: the smaller the number of independent phenomena that are regarded as ultimate by the theory, the more explanatory is the theory. Similarly, for Philip Kitcher (1989), explanatory power is increased the smaller the number of patterns of argument, or “problem-solving schemas”, that are needed to deliver the facts about the world that we accept. Thus, on such accounts, explanatory power is seen as a structural relationship between the sparseness of an explanation—the fewness of hypotheses or argument patterns—and the plenitude of facts that are explained. There have been various attempts to explicate notions of simplicity in terms of these sorts of features. A standard type of argument that is then used is that we want our theories not only to be true, but also explanatory. If truth were our only goal, there would be no reason to prefer a genuine scientific theory to a collection of random factual statements that all happen to be true. Hence, explanation is an ultimate, rather than a purely instrumental, goal of scientific inquiry. Thus, we can justify our preferences for simpler theories once we recognize that there is a fundamental link between simplicity and explanatoriness and that explanation is a key goal of scientific inquiry, alongside truth.

There are some well-known objections to unification theories of explanation, though most of them concern the claim that unification is all there is to explanation—a claim on which the current proposal does not depend. However, even if we accept a unification theory of explanation and accept that explanation is an ultimate goal of scientific inquiry, it can be objected that the choice between a simple theory and a more complex rival is not normally a choice between a theory that is genuinely explanatory, in this sense, and a mere factual report. The complex theory can normally be seen as unifying different phenomena under systematizing principles, at least to some degree. Hence, the justificatory question here is not about why we should prefer theories that explain the data to theories that do not, but why we should prefer theories that have greater explanatory power in the senses just described to theories that are comparatively less explanatory. It is certainly a coherent possibility that the truth may turn out to be relatively disunified and unsystematic. Given this, it seems appropriate to ask why we are justified in choosing theories because they are more unifying. Just saying that explanation is an ultimate goal of scientific inquiry does not seem to be enough.

iii. Predictive Accuracy

In the last few decades, the treatment of simplicity as an explicit part of statistical methodology has become increasingly sophisticated. A consequence of this is that some philosophers of science have started looking to the statistics literature for illumination on how to think about the philosophical problems surrounding simplicity. According to Malcolm Forster and Elliott Sober (Forster and Sober, 1994; Forster, 2001; Sober, 2007), the work of the statistician, Hirotugu Akaike (1973), provides a precise theoretical framework for understanding the justification for the role of simplicity in curve-fitting and model selection.

Standard approaches to curve-fitting effect a trade-off between fit to a sample of data and the simplicity of the kind of mathematical relationship that is posited to hold between the variables—that is, the simplicity of the postulated model for the underlying relationship, typically measured in terms of the number of adjustable parameters it contains. This often means, for instance, that a linear hypothesis that fits a sample of data less well may be chosen over a parabolic hypothesis that fits the data better. According to Forster and Sober, Akaike developed an explanation for why it is rational to favor simpler models, under specific circumstances. The proposal builds on the practical wisdom that when there is a particular amount of error or noise in the data sample, more complex models have a greater propensity to “over-fit” to this spurious data in the sample and thus lead to less accurate predictions of extra-sample (for instance, future) data, particularly when dealing with small sample sizes. (Gauch [2003, 2006] calls this “Ockham’s hill”: to the left of the peak of the hill, increasing the complexity of a model improves its accuracy with respect to extra-sample data; after the peak, increasing complexity actually diminishes predictive accuracy. There is therefore an optimal trade-off at the peak of Ockham’s hill between simplicity and fit to the data sample when it comes to facilitating accurate prediction.) According to Forster and Sober, what Akaike did was prove a theorem, which shows that, given standard statistical assumptions, we can estimate the degree to which constraining model complexity when fitting a curve to a sample of data will lead to more accurate predictions of extra-sample data. Following Forster and Sober’s presentation (1994, pp. 9–10), Akaike’s theorem can be stated as follows:

  • Estimated[A(M)] = (1/N)[log-likelihood(L(M)) – k],

where A(M) is the predictive accuracy of the model, M, with respect to extra-sample data, N is the number of data points in the sample, log-likelihood is a measure of goodness of fit to the sample (the higher the log-likelihood score the closer the fit to the data), L(M) is the best fitting member of M, and k is the number of adjustable parameters that M contains. Akaike’s theorem is claimed to specify an unbiased estimator of predictive accuracy, which means that the distribution of estimates of A is centered around the true value of A (for proofs and further details on the assumptions behind Akaike’s theorem, see Sakamoto and others, 1986). This gives rise to a model selection procedure, Akaike’s Information Criterion (AIC), which says that we should choose the model that has the highest estimated predictive accuracy, given the data at hand. In practice, AIC implies that when the best-fitting parabola fits the data sample better than the best-fitting straight line, but not so much better that this outweighs its greater complexity (k), the straight line should be used for making predictions. Importantly, the penalty imposed on complexity has less influence on model selection the larger the sample of data, meaning that simplicity matters more for predictive accuracy when dealing with smaller samples.
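As a sketch, the estimator can be wrapped in a one-line scoring function and applied to the straight-line-versus-parabola case just described. The sample size and log-likelihood figures below are hypothetical numbers chosen for illustration, not taken from any real data set:

```python
# Sketch of model selection using Forster and Sober's formulation of
# Akaike's estimator: Estimated[A(M)] = (1/N) * (log-likelihood(L(M)) - k).
# The log-likelihoods are invented so that the parabola fits the sample
# slightly better, but not by enough to offset its extra parameter.

def estimated_predictive_accuracy(log_likelihood, k, n_data):
    """Akaike's unbiased estimate of the predictive accuracy of a model M."""
    return (log_likelihood - k) / n_data

N = 20  # sample size (assumed)

# Best-fitting straight line has 2 adjustable parameters; best-fitting
# parabola has 3 and fits the sample marginally better.
line_score = estimated_predictive_accuracy(log_likelihood=-30.0, k=2, n_data=N)
parabola_score = estimated_predictive_accuracy(log_likelihood=-29.5, k=3, n_data=N)

# The line's small loss in fit is outweighed by its lower complexity,
# so the procedure favors the line for prediction.
print(line_score > parabola_score)  # True
```

Note that because the whole score is divided by N, the complexity penalty k shrinks in relative importance as the sample grows, which matches the point made above that simplicity matters most for small samples.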

Forster and Sober argue that Akaike’s theorem explains why simplicity has a quantifiable positive effect on predictive accuracy by combating the risk of over-fitting to noisy data. Hence, if one is interested in generating accurate predictions—for instance, of future data—one has a clear rationale for preferring simpler models. Forster and Sober are explicit that this proposal is only meant to apply to scientific contexts that can be understood from within a model selection framework, where predictive accuracy is the central goal of inquiry and there is a certain amount of error or noise in the data. Hence, they do not view Akaike’s work as offering a complete solution to the problem of justifying preferences for simpler theories. However, they have argued that a very significant number of scientific inference problems can be understood from an Akaikian perspective.

Several objections have been raised against Forster and Sober’s philosophical use of Akaike’s work. One objection is that the measure of simplicity employed by AIC is not language invariant, since the number of adjustable parameters a model contains depends on how the model is described. However, Forster and Sober argue that though, for practical purposes, the quantity, k, is normally spelt out in terms of number of adjustable parameters, it is in fact more accurately explicated in terms of the notion of the dimension of a family of functions, which is language invariant. Another objection is that AIC is not statistically consistent. Forster and Sober reply that this charge rests on a confusion over what AIC is meant to estimate: for example, erroneously assuming that AIC is meant to be an estimator of the true value of k (the size of the simplest model that contains the true hypothesis), rather than an estimator of the predictive accuracy of a particular model at hand. Another worry is that over-fitting considerations imply that an idealized false model will often make more accurate predictions than a more realistic model, so the justification is merely instrumentalist and cannot warrant the use of simplicity as a criterion for hypothesis acceptance where hypotheses are construed realistically, rather than just as predictive tools. For their part, Forster and Sober are quite happy to accept this instrumentalist construal of the role of simplicity in curve-fitting and model selection: in this context, simplicity is not a guide to the truth, but to predictive accuracy. Finally, there are a variety of objections concerning the nature and validity of the assumptions behind Akaike’s theorem and whether AIC is applicable to some important classes of model selection problems (for discussion, see Kieseppä, 1997; Forster, 1999, 2001; Howson and Urbach, 2006; Dowe and others, 2007; Sober, 2007; Kelly, 2010).

iv. Truth-Finding Efficiency

An important recent proposal about how to justify preferences for simpler theories has come from work in the interdisciplinary field known as formal learning theory (Schulte, 1999; Kelly, 2004, 2007, 2010). It has been proposed that even if we do not know whether the world is simple or complex, inferential rules that are biased towards simple hypotheses can be shown to converge to the truth more efficiently than alternative inferential rules. According to this proposal, an inferential rule is said to converge to the truth efficiently, if, relative to other possible convergent inferential rules, it minimizes the maximum number of U-turns or “retractions” of opinion that might be required of the inquirer while using the rule to guide her decisions on what to believe given the data. Such procedures are said to converge to the truth more directly and in a more stable fashion, since they require fewer changes of mind along the way. The proposal is that even if we do not know whether the truth is simple or complex, scientific inference procedures that are biased towards simplicity can be shown a priori to be optimally efficient in this sense, converging to the truth in the most direct and stable way possible.

To illustrate the basic logic behind this proposal, consider the following example from Oliver Schulte (1999). Suppose that we are investigating the existence of a hypothetical particle, Ω. If Ω does exist, we will be able to detect it with an appropriate measurement device. However, as yet, it has not been detected. What attitude should we take towards the existence of Ω? Let us say that Ockham’s Razor suggests that we deny that Ω exists until it is detected (if ever). Alternatively, we could assert that Ω does exist until a finite number of attempts to detect Ω have proved to be unsuccessful, say ten thousand, in which case we assert that Ω does not exist; or, we could withhold judgment until Ω is either detected, or there have been ten thousand unsuccessful attempts to detect it. Since we are assuming that existent particles do not go undetected forever, abiding by any of these three inferential rules will enable us to converge to the truth in the limit, whether Ω exists or not. However, Schulte argues that Ockham’s Razor provides the most efficient route to the truth. This is because following Ockham’s Razor incurs a maximum of only one retraction of opinion: retracting an assertion of non-existence to an assertion of existence, if Ω is detected. In contrast, the alternative inferential rules both incur a maximum of two retractions, since Ω could go undetected ten thousand times, but is then detected on the ten-thousand-and-first attempt. Hence, truth-finding efficiency requires that one adopt Ockham’s Razor and presume that Ω does not exist until it is detected.
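The worst-case retraction counts in Schulte's example can be checked by simulating the three rules on a possible detection history. The encoding of the rules as functions, and the convention of counting any change of announced verdict as a retraction, are simplifications of my own for illustration, not Schulte's formal definitions:

```python
# A sketch of Schulte's particle example: three inferential rules, each
# mapping the evidence so far (number of failed detection attempts, and
# whether Ω has been detected) to a verdict. As a simplification, a
# "retraction" is counted whenever the announced verdict changes.

THRESHOLD = 10_000  # failed attempts after which the non-Ockham rules switch

def ockham(failures, detected):
    # Deny existence until Ω is actually detected.
    return "exists" if detected else "not"

def bold(failures, detected):
    # Assert existence until THRESHOLD failures, then deny.
    if detected:
        return "exists"
    return "exists" if failures < THRESHOLD else "not"

def cautious(failures, detected):
    # Withhold judgment until detection or THRESHOLD failures.
    if detected:
        return "exists"
    return "suspend" if failures < THRESHOLD else "not"

def retractions(rule, history):
    """Count verdict changes along a sequence of observations (True = detection)."""
    failures, detected, outputs = 0, False, []
    for success in history:
        detected = detected or success
        failures += 0 if success else 1
        outputs.append(rule(failures, detected))
    return sum(1 for a, b in zip(outputs, outputs[1:]) if a != b)

# Worst case: Ω goes undetected ten thousand times, then is detected.
worst = [False] * THRESHOLD + [True]
print(retractions(ockham, worst),    # 1
      retractions(bold, worst),      # 2
      retractions(cautious, worst))  # 2
```

On this history the Ockham rule changes its mind once, while both alternatives change twice, matching Schulte's worst-case analysis.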

Kevin Kelly has further developed this U-turn argument in considerable detail. Kelly argues that, with suitable refinements, it can be extended to an extremely wide variety of real world scientific inference problems. Importantly, Kelly has argued that, on this proposal, simplicity should not be seen as purely a pragmatic consideration in theory choice. While simplicity cannot be regarded as a direct indicator of truth, we do nonetheless have a reason to think that the practice of favoring simpler theories is a truth-conducive strategy, since it promotes speedy and stable attainment of true beliefs. Hence, simplicity should be regarded as a genuinely epistemic consideration in theory choice.

One worry about the truth-finding efficiency proposal concerns the general applicability of these results to scientific contexts in which simplicity may play a role. The U-turn argument for Ockham’s razor described above seems to depend on the evidential asymmetry between establishing that Ω exists and establishing that Ω does not exist: a detection of Ω is sufficient to establish the existence of Ω, whereas repeated failures of detection are not sufficient to establish non-existence. The argument may work where detection procedures are relatively clear-cut—for instance where there are relatively unambiguous instrument readings that count as “detections”—but what about entities that are very difficult to detect directly and where mistakes can easily be made about existence as well as non-existence? Similarly, a current stumbling block is that the U-turn argument cannot be used as a justification for the employment of simplicity biases in statistical inference, where the hypotheses under consideration do not have deductive observational consequences. Kelly is, however, optimistic about extending the U-turn argument to statistical inference. Another objection concerns the nature of the justification that is being provided here. What the U-turn argument seems to show is that the strategy of favoring the simplest theory consistent with the data may help one to find the truth with fewer reversals along the way. It does not establish that simpler theories themselves should be regarded as in any way “better” than their more complex rivals. Hence, there are doubts about the extent to which this proposal can actually make sense of standard examples of simplicity preferences at work in the history and current practice of science, where the guiding assumption seems to be that simpler theories are not to be preferred merely for strategic reasons, but because they are better theories.

c. Deflationary Approaches

Various philosophers have sought to defend broadly deflationary accounts of simplicity. Such accounts depart from all of the justificatory accounts discussed so far by rejecting the idea that simplicity should in fact be regarded as a theoretical virtue and criterion for theory choice in its own right. Rather, according to deflationary accounts, when simplicity appears to be a driving factor in theory evaluation, something else is doing the real work.

Richard Boyd (1990), for instance, has argued that scientists’ simplicity judgments are typically best understood as just covert judgments of theoretical plausibility. When a scientist claims that one theory is “simpler” than another, this is often just another way of saying that the theory provides a more plausible account of the data. For Boyd, such covert judgments of theoretical plausibility are driven by the scientist’s background theories. Hence, it is the relevant background theories that do the real work in motivating the preference for the “simpler” theory, not the simplicity of the theory per se. John Norton (2003) has advocated a similar view in the context of his “material theory” of induction, according to which inductive inferences are licensed not by universal inductive rules or inference schemas, but rather by local factual assumptions about the domain of inquiry. Norton argues that the apparent use of simplicity in induction merely reflects material assumptions about the nature of the domain being investigated. For instance, when we try to fit curves to data, we choose the variables and functions that we believe to be appropriate to the physical reality we are trying to get at. Hence, it is because of the facts that we believe to prevail in this domain that we prefer a “simple” linear function to a quadratic one, if such a curve fits the data sufficiently well. In a different domain, where we believe that different facts prevail, our decisions about which hypotheses are “simple” or “complex” are likely to be very different.
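Norton’s curve-fitting illustration can be made concrete with a small sketch. The data points, the noise tolerance, and the helper functions below are all invented for illustration: the idea is that, given what we believe about the noise prevailing in the domain, we prefer the lowest-degree polynomial that already fits the data sufficiently well.

```python
# Illustrative curve-fitting sketch (pure Python; data and tolerance invented).

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    n = degree + 1
    # Normal-equation matrix A and right-hand side b.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= f * A[col][k]
            b[row] -= f * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(A[i][k] * coeffs[k]
                                for k in range(i + 1, n))) / A[i][i]
    return coeffs  # coeffs[i] multiplies x**i

def rms_residual(xs, ys, degree):
    c = polyfit(xs, ys, degree)
    fitted = [sum(ci * x ** i for i, ci in enumerate(c)) for x in xs]
    return (sum((y - f) ** 2 for y, f in zip(ys, fitted)) / len(ys)) ** 0.5

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 2.1, 3.9, 6.2, 7.9]   # roughly y = 2x plus small noise
tolerance = 0.5                   # assumed noise level for this domain

# Prefer the lowest-degree hypothesis that fits "sufficiently well".
chosen = next(d for d in (1, 2, 3) if rms_residual(xs, ys, d) <= tolerance)
print("chosen degree:", chosen)
```

On Norton’s view, the choice of the linear hypothesis here is licensed not by a universal simplicity rule but by the factual assumption, encoded in the tolerance, about how noisy measurements in this domain are believed to be.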

Elliott Sober (1988, 1994) has defended this sort of deflationary analysis of various appeals to simplicity and parsimony in evolutionary biology. For example, Sober argues that the common claim that group selection hypotheses are “less parsimonious” and hence to be taken less seriously as explanations for biological adaptations than individual selection hypotheses, rests on substantive assumptions about the comparative rarity of the conditions required for group selection to occur. Hence, the appeal to Ockham’s Razor in this context is just a covert appeal to local background knowledge. Other attempts to offer deflationary analyses of particular appeals to simplicity in science include Plutynski (2005), who focuses on the Fisher-Wright debate in evolutionary biology, and Fitzpatrick (2009), who focuses on appeals to simplicity in debates over the cognitive capacities of non-human primates.

If such deflationary analyses of the putative role of simplicity in particular scientific contexts turn out to be plausible, then problems concerning how to measure simplicity and how to offer a general justification for preferring simpler theories can be avoided, since simplicity per se can be shown to do no substantive work in the relevant inferences. However, many philosophers are skeptical that such deflationary analyses are possible for many of the contexts where simplicity considerations have been thought to play an important role. Kelly (2010), for example, has argued that simplicity typically comes into play when our background knowledge underdetermines theory choice. Sober himself seems to advocate a mixed view: some appeals to simplicity in science are best understood in deflationary terms, others are better understood in terms of Akaikian model selection theory.

5. Conclusion

The putative role of considerations of simplicity in the history and current practice of science gives rise to a number of philosophical problems, including the problem of precisely defining and measuring theoretical simplicity, and the problem of justifying preferences for simpler theories. As this survey of the literature on simplicity in the philosophy of science demonstrates, these problems have turned out to be surprisingly resistant to resolution, and there remains a live debate amongst philosophers of science about how to deal with them. On the other hand, there is no disputing the fact that practicing scientists continue to find it useful to appeal to various notions of simplicity in their work. Thus, in many ways, the debate over simplicity resembles other long-running debates in the philosophy of science, such as that over the justification for induction (which, it turns out, is closely related to the problem of justifying preferences for simpler theories). There is arguably more skepticism within the scientific community about the legitimacy of choosing between rival theories on grounds of simplicity than there is about the legitimacy of inductive inference, the latter being a complete non-issue for practicing scientists. Yet, as with induction, very many scientists continue to employ practices and methods that utilize notions of simplicity to great scientific effect, assuming that appropriate solutions to the philosophical problems these practices give rise to do in fact exist, even though philosophers have so far failed to articulate them. However, as this survey has also shown, statisticians, information and learning theorists, and other scientists have been making increasingly important contributions to the debate over the philosophical underpinnings of these practices.

6. References and Further Reading

  • Ackerman, R. 1961. Inductive simplicity. Philosophy of Science, 28, 162-171.
    • Argues against the claim that simplicity considerations play a significant role in inductive inference. Critiques measures of simplicity proposed by Jeffreys, Kemeny, and Popper.
  • Akaike, H. 1973. Information theory and the extension of the maximum likelihood principle. In B. Petrov and F. Csaki (eds.), Second International Symposium on Information Theory. Budapest: Akademiai Kiado.
    • Laid the foundations for model selection theory. Proves a theorem suggesting that the simplicity of a model is relevant to estimating its future predictive accuracy. Highly technical.
  • Baker, A. 2003. Quantitative parsimony and explanatory power. British Journal for the Philosophy of Science, 54, 245-259.
    • Builds on Nolan (1997), argues that quantitative parsimony is linked with explanatory power.
  • Baker, A. 2007. Occam’s Razor in science: a case study from biogeography. Biology and Philosophy, 22, 193-215.
    • Argues for a “naturalistic” justification of Ockham’s Razor and that preferences for ontological parsimony played a significant role in the late 19th century debate in biogeography between dispersalist and extensionist theories.
  • Barnes, E.C. 2000. Ockham’s razor and the anti-superfluity principle. Erkenntnis, 53, 353-374.
    • Draws a useful distinction between two different interpretations of Ockham’s Razor: the anti-superfluity principle and the anti-quantity principle. Explicates an evidential justification for anti-superfluity principle.
  • Boyd, R. 1990. Observations, explanatory power, and simplicity: towards a non-Humean account. In R. Boyd, P. Gasper and J.D. Trout (eds.), The Philosophy of Science. Cambridge, MA: MIT Press.
    • Argues that appeals to simplicity in theory evaluation are typically best understood as covert judgments of theoretical plausibility.
  • Bunge, M. 1961. The weight of simplicity in the construction and assaying of scientific theories. Philosophy of Science, 28, 162-171.
    • Takes a skeptical view about the importance and justifiability of a simplicity criterion in theory evaluation.
  • Carlson, E. 1966. The Gene: A Critical History. Philadelphia: Saunders.
    • Argues that simplicity considerations played a significant role in several important debates in the history of genetics.
  • Carnap, R. 1950. Logical Foundations of Probability. Chicago: University of Chicago Press.
  • Chater, N. 1999. The search for simplicity: a fundamental cognitive principle. The Quarterly Journal of Experimental Psychology, 52A, 273-302.
    • Argues that simplicity plays a fundamental role in human reasoning, with simplicity to be defined in terms of Kolmogorov complexity.
  • Cohen, I.B. 1985. Revolutions in Science. Cambridge, MA: Harvard University Press.
  • Cohen, I.B. 1999. A guide to Newton’s Principia. In I. Newton, The Principia: Mathematical Principles of Natural Philosophy; A New Translation by I. Bernard Cohen and Anne Whitman. Berkeley: University of California Press.
  • Crick, F. 1988. What Mad Pursuit: a Personal View of Scientific Discovery. New York: Basic Books.
    • Argues that the application of Ockham’s Razor to biology is inadvisable.
  • Dowe, D, Gardner, S., and Oppy, G. 2007. Bayes not bust! Why simplicity is no problem for Bayesians. British Journal for the Philosophy of Science, 58, 709-754.
    • Contra Forster and Sober (1994), argues that Bayesians can make sense of the role of simplicity in curve-fitting.
  • Duhem, P. 1954. The Aim and Structure of Physical Theory. Princeton: Princeton University Press.
  • Einstein, A. 1954. Ideas and Opinions. New York: Crown.
    • Einstein’s views about the role of simplicity in physics.
  • Fitzpatrick, S. 2009. The primate mindreading controversy: a case study in simplicity and methodology in animal psychology. In R. Lurz (ed.), The Philosophy of Animal Minds. New York: Cambridge University Press.
    • Advocates a deflationary analysis of appeals to simplicity in debates over the cognitive capacities of non-human primates.
  • Forster, M. 1995. Bayes and bust: simplicity as a problem for a probabilist’s approach to confirmation. British Journal for the Philosophy of Science, 46, 399-424.
    • Argues that the Bayesian approach to scientific reasoning is inadequate because it cannot make sense of the role of simplicity in theory evaluation.
  • Forster, M. 1999. Model selection in science: the problem of language variance. British Journal for the Philosophy of Science, 50, 83-102.
    • Responds to criticisms of Forster and Sober (1994). Argues that AIC relies on a language invariant measure of simplicity.
  • Forster, M. 2001. The new science of simplicity. In A. Zellner, H. Keuzenkamp and M. McAleer (eds.), Simplicity, Inference and Modelling. Cambridge: Cambridge University Press.
    • Accessible introduction to model selection theory. Describes how different procedures, including AIC, BIC, and MDL, trade-off simplicity and fit to the data.
  • Forster, M. and Sober, E. 1994. How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions. British Journal for the Philosophy of Science, 45, 1-35.
    • Explication of AIC statistics and its relevance to the philosophical problem of justifying preferences for simpler theories. Argues against Bayesian approaches to simplicity. Technical in places.
  • Foster, M. and Martin, M. 1966. Probability, Confirmation, and Simplicity: Readings in the Philosophy of Inductive Logic. New York: The Odyssey Press.
    • Anthology of papers discussing the role of simplicity in induction. Contains important papers by Ackermann, Barker, Bunge, Goodman, Kemeny, and Quine.
  • Friedman, M. 1974. Explanation and scientific understanding. Journal of Philosophy, LXXI, 1-19.
    • Defends a unification account of explanation, connects simplicity with explanatoriness.
  • Galilei, G. 1962. Dialogues concerning the Two Chief World Systems. Berkeley: University of California Press.
    • Classic defense of Copernicanism with significant emphasis placed on the greater simplicity and harmony of the Copernican system. Asserts that nature does nothing in vain.
  • Gauch, H. 2003. Scientific Method in Practice. Cambridge: Cambridge University Press.
    • Wide-ranging discussion of the scientific method written by a scientist for scientists. Contains a chapter on the importance of parsimony in science.
  • Gauch, H. 2006. Winning the accuracy game. American Scientist, 94, March-April 2006, 134-141.
    • Useful informal presentation of the concept of Ockham’s hill and its importance to scientific research in a number of fields.
  • Gingerich, O. 1993. The Eye of Heaven: Ptolemy, Copernicus, Kepler. New York: American Institute of Physics.
  • Glymour, C. 1980. Theory and Evidence. Princeton: Princeton University Press.
    • An important critique of Bayesian attempts to make sense of the role of simplicity in science. Defends a “boot-strapping” analysis of the simplicity arguments for Copernicanism and Newton’s argument for universal gravitation.
  • Goodman, N. 1943. On the simplicity of ideas. Journal of Symbolic Logic, 8, 107-1.
  • Goodman, N. 1955. Axiomatic measurement of simplicity. Journal of Philosophy, 52, 709-722.
  • Goodman, N. 1958. The test of simplicity. Science, 128, October 31st 1958, 1064-1069.
    • Reasonably accessible introduction to Goodman’s attempts to formulate a measure of logical simplicity.
  • Goodman, N. 1959. Recent developments in the theory of simplicity. Philosophy and Phenomenological Research, 19, 429-446.
    • Response to criticisms of Goodman (1955).
  • Goodman, N. 1961. Safety, strength, simplicity. Philosophy of Science, 28, 150-151.
    • Argues that simplicity cannot be equated with testability, empirical content, or paucity of assumption.
  • Goodman, N. 1983. Fact, Fiction and Forecast (4th edition). Cambridge, MA: Harvard University Press.
  • Harman, G. 1999. Simplicity as a pragmatic criterion for deciding what hypotheses to take seriously. In G. Harman, Reasoning, Meaning and Mind. Oxford: Oxford University Press.
    • Defends the claim that simplicity is a fundamental component of inductive inference and that this role has a pragmatic justification.
  • Harman, G. and Kulkarni, S. 2007. Reliable Reasoning: Induction and Statistical Learning Theory. Cambridge, MA: MIT Press.
    • Accessible introduction to statistical learning theory and VC dimension.
  • Harper, W. 2002. Newton’s argument for universal gravitation. In I.B. Cohen and G.E. Smith (eds.), The Cambridge Companion to Newton. Cambridge: Cambridge University Press.
  • Hesse, M. 1967. Simplicity. In P. Edwards (ed.), The Encyclopaedia of Philosophy, vol. 7. New York: Macmillan.
    • Focuses on attempts by Jeffreys, Popper, Kemeny, and Goodman to formulate measures of simplicity.
  • Hesse, M. 1974. The Structure of Scientific Inference. London: Macmillan.
    • Defends the view that simplicity is a determinant of prior probability. Useful discussion of the role of simplicity in Einstein’s work.
  • Holton, G. 1974. Thematic Origins of Modern Science: Kepler to Einstein. Cambridge, MA: Harvard University Press.
    • Discusses the role of aesthetic considerations, including simplicity, in the history of science.
  • Hoffman, R., Minkin, V., and Carpenter, B. 1997. Ockham’s Razor and chemistry. Hyle, 3, 3-28.
    • Discussion by three chemists of the benefits and pitfalls of applying Ockham’s Razor in chemical research.
  • Howson, C. and Urbach, P. 2006. Scientific Reasoning: The Bayesian Approach (Third Edition). Chicago: Open Court.
    • Contains a useful survey of Bayesian attempts to make sense of the role of simplicity in theory evaluation. Technical in places.
  • Jeffreys, H. 1957. Scientific Inference (2nd edition). Cambridge: Cambridge University Press.
    • Defends the “simplicity postulate” that simpler theories have higher prior probability.
  • Jeffreys, H. 1961. Theory of Probability. Oxford: Clarendon Press.
    • Outline and defense of the Bayesian approach to scientific inference. Discusses the role of simplicity in the determination of priors and likelihoods.
  • Kelly, K. 2004. Justification as truth-finding efficiency: how Ockham’s Razor works. Minds and Machines, 14, 485-505.
    • Argues that Ockham’s Razor is justified by considerations of truth-finding efficiency. Critiques Bayesian, Akaikian, and other traditional attempts to justify simplicity preferences. Technical in places.
  • Kelly, K. 2007. How simplicity helps you find the truth without pointing at it. In M. Friend, N. Goethe, and V.Harizanov (eds.), Induction, Algorithmic Learning Theory, and Philosophy. Dordrecht: Springer.
    • Refinement and development of the argument found in Kelly (2004) and Schulte (1999). Technical.
  • Kelly, K. 2010. Simplicity, truth and probability. In P. Bandyopadhyay and M. Forster (eds.), Handbook of the Philosophy of Statistics. Dordrecht: Elsevier.
    • Expands and develops the argument found in Kelly (2007). Detailed critique of Bayesian accounts of simplicity. Technical.
  • Kelly, K. and Glymour, C. 2004. Why probability does not capture the logic of scientific justification. In C. Hitchcock (ed.), Contemporary Debates in the Philosophy of Science. Oxford: Blackwell.
    • Argues that Bayesians can’t make sense of Ockham’s Razor.
  • Kemeny, J. 1955. Two measures of complexity. Journal of Philosophy, 52, 722-733.
    • Develops some of Goodman’s ideas about how to measure the logical simplicity of predicates and systems of predicates. Proposes a measure of simplicity similar to Popper’s (1959) falsifiability measure.
  • Kieseppä, I. A. 1997. Akaike Information Criterion, curve-fitting, and the philosophical problem of simplicity. British Journal for the Philosophy of Science, 48, 21-48.
    • Critique of Forster and Sober (1994). Argues that Akaike’s theorem has little relevance to traditional philosophical problems surrounding simplicity. Highly technical.
  • Kitcher, P. 1989. Explanatory unification and the causal structure of the world. In P. Kitcher and W. Salmon, Minnesota Studies in the Philosophy of Science, vol 13: Scientific Explanation, Minneapolis: University of Minnesota Press.
    • Defends a unification theory of explanation. Argues that simplicity contributes to explanatory power.
  • Kuhn, T. 1957. The Copernican Revolution. Cambridge, MA: Harvard University Press.
    • Influential discussion of the role of simplicity in the arguments for Copernicanism.
  • Kuhn, T. 1962. The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
  • Kuipers, T. 2002. Beauty: a road to truth. Synthese, 131, 291-328.
    • Attempts to show how aesthetic considerations might be indicative of truth.
  • Kyburg, H. 1961. A modest proposal concerning simplicity. Philosophical Review, 70, 390-395.
    • Important critique of Goodman (1955). Proposes that simplicity be identified with the number of quantifiers in a theory.
  • Lakatos, I. and Zahar, E. 1978. Why did Copernicus’s research programme supersede Ptolemy’s? In J. Worrall and G. Curie (eds.), The Methodology of Scientific Research Programmes: Philosophical Papers of Imre Lakatos, Volume 1. Cambridge: Cambridge University Press.
    • Argues that simplicity did not really play a significant role in the Copernican Revolution.
  • Lewis, D. 1973. Counterfactuals. Oxford: Basil Blackwell.
    • Argues that quantitative parsimony is less important than qualitative parsimony in scientific and philosophical theorizing.
  • Li, M. and Vitányi, P. 1997. An Introduction to Kolmogorov Complexity and its Applications (2nd edition). New York: Springer.
    • Detailed elaboration of Kolmogorov complexity as a measure of simplicity. Highly technical.
  • Lipton, P. 2004. Inference to the Best Explanation (2nd edition). Oxford: Basil Blackwell.
    • Account of inference to the best explanation as inference to the “loveliest” explanation. Defends the claim that simplicity contributes to explanatory loveliness.
  • Lombrozo, T. 2007. Simplicity and probability in causal explanation. Cognitive Psychology, 55, 232–257.
    • Argues that simplicity is used as a guide to assessing the probability of causal explanations.
  • Lu, H., Yuille, A., Liljeholm, M., Cheng, P. W., and Holyoak, K. J. 2006. Modeling causal learning using Bayesian generic priors on generative and preventive powers. In R. Sun and N. Miyake (eds.), Proceedings of the 28th annual conference of the cognitive science society, 519–524. Mahwah, NJ: Erlbaum.
    • Argues that simplicity plays a significant role in causal learning.
  • MacKay, D. 1992. Bayesian interpolation. Neural Computation, 4, 415-447.
    • First presentation of the concept of Ockham’s Hill.
  • Martens, R. 2009. Harmony and simplicity: aesthetic virtues and the rise of testability. Studies in History and Philosophy of Science, 40, 258-266.
    • Discussion of the Copernican simplicity arguments and recent attempts to reconstruct the justification for them.
  • McAleer, M. 2001. Simplicity: views of some Nobel laureates in economic science. In A. Zellner, H. Keuzenkamp and M. McAleer (eds.), Simplicity, Inference and Modelling. Cambridge: Cambridge University Press.
    • Interesting survey of the views of famous economists on the place of simplicity considerations in their work.
  • McAllister, J. W. 1996. Beauty and Revolution in Science. Ithaca: Cornell University Press.
    • Proposes that scientists’ simplicity preferences are the product of an aesthetic induction.
  • Mill, J.S. 1867. An Examination of Sir William Hamilton’s Philosophy. London: Walter Scott.
  • Myrvold, W. 2003. A Bayesian account of the virtue of unification. Philosophy of Science, 70, 399-423.
  • Newton, I. 1999. The Principia: Mathematical Principles of Natural Philosophy; A New Translation by I. Bernard Cohen and Anne Whitman. Berkeley: University of California Press.
    • Contains Newton’s “rules for the study of natural philosophy”, which include a version of Ockham’s Razor, defended in terms of the simplicity of nature. These rules play an explicit role in Newton’s argument for universal gravitation.
  • Nolan, D. 1997. Quantitative Parsimony. British Journal for the Philosophy of Science, 48, 329-343.
    • Contra Lewis (1973), argues that quantitative parsimony has been important in the history of science.
  • Norton, J. 2000. ‘Nature is the realization of the simplest conceivable mathematical ideas’: Einstein and canon of mathematical simplicity. Studies in the History and Philosophy of Modern Physics, 31, 135-170.
    • Discusses the evolution of Einstein’s thinking about the role of mathematical simplicity in physical theorizing.
  • Norton, J. 2003. A material theory of induction. Philosophy of Science, 70, 647-670.
    • Defends a “material” theory of induction. Argues that appeals to simplicity in induction reflect factual assumptions about the domain of inquiry.
  • Oreskes, N., Shrader-Frechette, K., Belitz, K. 1994. Verification, validation, and confirmation of numerical models in the earth sciences. Science, 263, 641-646.
  • Palter, R. 1970. An approach to the history of early astronomy. Studies in History and Philosophy of Science, 1, 93-133.
  • Pais, A. 1982. Subtle Is the Lord: The science and life of Albert Einstein. Oxford: Oxford University Press.
  • Peirce, C.S. 1931. Collected Papers of Charles Sanders Peirce, vol 6. C. Hartshorne, P. Weiss, and A. Burks (eds.). Cambridge, MA: Harvard University Press.
  • Plutynski, A. 2005. Parsimony and the Fisher-Wright debate. Biology and Philosophy, 20, 697-713.
    • Advocates a deflationary analysis of appeals to parsimony in debates between Wrightian and neo-Fisherian models of natural selection.
  • Popper, K. 1959. The Logic of Scientific Discovery. London: Hutchinson.
    • Argues that simplicity = empirical content = falsifiability.
  • Priest, G. 1976. Gruesome simplicity. Philosophy of Science, 43, 432-437.
    • Shows that standard measures of simplicity in curve-fitting are language variant.
  • Raftery, A., Madigan, D., and Hoeting, J. 1997. Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92, 179-191.
  • Reichenbach, H. 1949. On the justification of induction. In H. Feigl and W. Sellars (eds.), Readings in Philosophical Analysis. New York: Appleton-Century-Crofts.
  • Rosencrantz, R. 1983. Why Glymour is a Bayesian. In J. Earman (ed.), Testing Scientific Theories. Minneapolis: University of Minnesota Press.
    • Responds to Glymour (1980). Argues that simpler theories have higher likelihoods, using Copernican vs. Ptolemaic astronomy as an example.
  • Rothwell, G. 2006. Notes for the occasional major case manager. FBI Law Enforcement Bulletin, 75, 20-24.
    • Emphasizes the importance of Ockham’s Razor in criminal investigation.
  • Sakamoto, Y., Ishiguro, M., and Kitagawa, G. 1986. Akaike Information Criterion Statistics. New York: Springer.
  • Schaffner, K. 1974. Einstein versus Lorentz: research programmes and the logic of comparative theory evaluation. British Journal for the Philosophy of Science, 25, 45-78.
    • Argues that simplicity played a significant role in the development and early acceptance of special relativity.
  • Schulte, O. 1999. Means-end epistemology. British Journal for the Philosophy of Science, 50, 1-31.
    • First statement of the claim that Ockham’s Razor can be justified in terms of truth-finding efficiency.
  • Simon, H. 1962. The architecture of complexity. Proceedings of the American Philosophical Society, 106, 467-482.
    • Important discussion by a Nobel laureate of features common to complex systems in nature.
  • Sober, E. 1975. Simplicity. Oxford: Oxford University Press.
    • Argues that simplicity can be defined in terms of question-relative informativeness. Technical in places.
  • Sober, E. 1981. The principle of parsimony. British Journal for the Philosophy of Science, 32, 145-156.
    • Distinguishes between “agnostic” and “atheistic” versions of Ockham’s Razor. Argues that the atheistic razor has an inductive justification.
  • Sober, E. 1988. Reconstructing the Past: Parsimony, Evolution and Inference. Cambridge, MA: MIT Press.
    • Defends a deflationary account of simplicity in the context of the use of parsimony methods in evolutionary biology.
  • Sober, E. 1994. Let’s razor Ockham’s Razor. In E. Sober, From a Biological Point of View, Cambridge: Cambridge University Press.
    • Argues that the use of Ockham’s Razor is grounded in local background assumptions.
  • Sober, E. 2001a. What is the problem of simplicity? In H. Keuzenkamp, M. McAleer, and A. Zellner (eds.), Simplicity, Inference and Modelling. Cambridge: Cambridge University Press.
  • Sober, E. 2001b. Simplicity. In W.H. Newton-Smith (ed.), A Companion to the Philosophy of Science, Oxford: Blackwell.
  • Sober, E. 2007. Evidence and Evolution. New York: Cambridge University Press.
  • Solomonoff, R.J. 1964. A formal theory of inductive inference, part 1 and part 2. Information and Control, 7, 1-22, 224-254.
  • Suppes, P. 1956. Nelson Goodman on the concept of logical simplicity. Philosophy of Science, 23, 153-159.
  • Swinburne, R. 2001. Epistemic Justification. Oxford: Oxford University Press.
    • Argues that the principle that simpler theories are more probably true is a fundamental a priori principle.
  • Thagard, P. 1988. Computational Philosophy of Science. Cambridge, MA: MIT Press.
    • Simplicity is a determinant of the goodness of an explanation and can be measured in terms of the paucity of auxiliary assumptions relative to the number of facts explained.
  • Thorburn, W. 1918. The myth of Occam’s Razor. Mind, 27, 345-353.
    • Argues that William of Ockham would not have advocated many of the principles that have been attributed to him.
  • van Fraassen, B. 1989. Laws and Symmetry. Oxford: Oxford University Press.
  • Wallace, C. S. and Dowe, D. L. 1999. Minimum Message Length and Kolmogorov Complexity. Computer Journal, 42(4), 270–83.
  • Walsh, D. 1979. Occam’s Razor: A Principle of Intellectual Elegance. American Philosophical Quarterly, 16, 241-244.
  • Weinberg, S. 1993. Dreams of a Final Theory. New York: Vintage.
    • Argues that physicists demand simplicity in physical principles before they can be taken seriously.
  • White, R. 2005. Why favour simplicity? Analysis, 65, 205-210.
    • Attempts to justify preferences for simpler theories in virtue of such theories having higher likelihoods.
  • Zellner, A, Keuzenkamp, H., and McAleer, M. 2001. Simplicity, Inference and Modelling. Cambridge: Cambridge University Press.
    • Collection of papers by statisticians, philosophers, and economists on the role of simplicity in scientific inference and modelling.

Author Information

Simon Fitzpatrick
John Carroll University
U. S. A.

Zeno’s Paradoxes

In the fifth century B.C.E., Zeno of Elea offered arguments that led to conclusions contradicting what we all know from our physical experience—that runners run, that arrows fly, and that there are many different things in the world. The arguments were paradoxes for the ancient Greek philosophers. Because many of the arguments turn crucially on the notion that space and time are infinitely divisible, Zeno was the first person to show that the concept of infinity is problematical.

In the Achilles Paradox, Achilles races to catch a slower runner—for example, a tortoise that is crawling in a line away from him. The tortoise has a head start, so if Achilles hopes to overtake it, he must run at least as far as the place where the tortoise presently is, but by the time he arrives there, it will have crawled to a new place, so then Achilles must run at least to this new place, but the tortoise meanwhile will have crawled on, and so forth. Achilles will never catch the tortoise, says Zeno. Therefore, good reasoning shows that fast runners never can catch slow ones. So much the worse for the claim that any kind of motion really occurs, Zeno says in defense of his mentor Parmenides who had argued that motion is an illusion.

Although practically no scholars today would agree with Zeno’s conclusion, we cannot escape the paradox by jumping up from our seat and chasing down a tortoise, nor by saying Zeno should have constructed a new argument in which Achilles takes better aim and runs to some other target place ahead of where the tortoise is. Because Zeno was correct in saying Achilles needs to run at least to all those places where the tortoise once was, what is required is an analysis of Zeno’s own argument.

This article explains his ten known paradoxes and considers the treatments that have been offered. In the Achilles Paradox, Zeno assumed distances and durations can be endlessly divided into parts (what modern mathematicians call a transfinite infinity of indivisible parts), and he assumed there are too many of these parts for the runner to complete. Aristotle's treatment said Zeno should have assumed instead that there are only potential infinities, so that at any time the hypothetical division into parts produces only a finite number of parts, and the runner has time to complete all these parts. Aristotle's treatment became the generally accepted solution until the late 19th century. The current standard treatment or so-called "Standard Solution" implies Zeno was correct to conclude that a runner's path contains an actual infinity of parts at any time during the motion, but he was mistaken to assume this is too many parts. This treatment employs the mathematical apparatus of calculus, which has proved its indispensability for the development of modern science. The article ends by exploring newer treatments of the paradoxes—and related paradoxes such as Thomson's Lamp Paradox—that have been developed since the 1950s.

Table of Contents

  1. Zeno of Elea
    1. His Life
    2. His Book
    3. His Goals
    4. His Method
  2. The Standard Solution to the Paradoxes
  3. The Ten Paradoxes
    1. Paradoxes of Motion
      1. The Achilles
      2. The Dichotomy (The Racetrack)
      3. The Arrow
      4. The Moving Rows (The Stadium)
    2. Paradoxes of Plurality
      1. Alike and Unlike
      2. Limited and Unlimited
      3. Large and Small
      4. Infinite Divisibility
    3. Other Paradoxes
      1. The Grain of Millet
      2. Against Place
  4. Aristotle’s Treatment of the Paradoxes
  5. Other Issues Involving the Paradoxes
    1. Consequences of Accepting the Standard Solution
    2. Criticisms of the Standard Solution
    3. Supertasks and Infinity Machines
    4. Constructivism
    5. Nonstandard Analysis
    6. Smooth Infinitesimal Analysis
  6. The Legacy and Current Significance of the Paradoxes
  7. References and Further Reading

1. Zeno of Elea

a. His Life

Zeno was born in about 490 B.C.E. in Elea, now Velia, in southern Italy; and he died in about 430 B.C.E. He was a friend and student of Parmenides, who was twenty-five years older and also from Elea. He was not a mathematician.

There is little additional, reliable information about Zeno’s life. Plato remarked (in Parmenides 127b) that Parmenides took Zeno to Athens with him where he encountered Socrates, who was about twenty years younger than Zeno, but today’s scholars consider this encounter to have been invented by Plato to improve the story line. Zeno is reported to have been arrested for taking weapons to rebels opposed to the tyrant who ruled Elea. When asked about his accomplices, Zeno said he wished to whisper something privately to the tyrant. But when the tyrant came near, Zeno bit him, and would not let go until he was stabbed. Diogenes Laërtius reported this apocryphal story seven hundred years after Zeno’s death.

b. His Book

According to Plato’s commentary in his Parmenides (127a to 128e), Zeno brought a treatise with him when he visited Athens. It was said to be a book of paradoxes defending the philosophy of Parmenides. Plato and Aristotle may have had access to the book, but Plato did not state any of the arguments, and Aristotle’s presentations of the arguments are very compressed. A thousand years after Zeno, the Greek philosophers Proclus and Simplicius commented on the book and its arguments. They had access to some of the book, perhaps to all of it, but it has not survived. Proclus is the first person to tell us that the book contained forty arguments. This number is confirmed by the sixth century commentator Elias, who is regarded as an independent source because he does not mention Proclus. Unfortunately, we know of no specific dates for when Zeno composed any of his paradoxes, and we know very little of how Zeno stated his own paradoxes. We do have a direct quotation via Simplicius of the Paradox of Denseness and a partial quotation via Simplicius of the Large and Small Paradox. In total we know of less than two hundred words that can be attributed to Zeno. Our knowledge of these two paradoxes and the other seven comes to us indirectly through paraphrases of them, and comments on them, primarily by his opponents Aristotle (384-322 B.C.E.), Plato (427-347 B.C.E.), Proclus (410-485 C.E.), and Simplicius (490-560 C.E.). The names of the paradoxes were created by later commentators, not by Zeno.

c. His Goals

In the early fifth century B.C.E., Parmenides emphasized the distinction between appearance and reality. Reality, he said, is a seamless unity that is unchanging and cannot be destroyed, so appearances of reality are deceptive. Our ordinary observation reports are false; they do not report what is real. This metaphysical theory is the opposite of Heraclitus’ theory, but evidently it was supported by Zeno. Although we do not know from Zeno himself whether he accepted his own paradoxical arguments or exactly what point he was making with them, according to Plato the paradoxes were designed to provide detailed, supporting arguments for Parmenides by demonstrating that our common sense confidence in the reality of motion, change, and ontological plurality (that is, that there exist many things) involves absurdities. Plato’s classical interpretation of Zeno was accepted by Aristotle and by most other commentators throughout the intervening centuries. On Plato's interpretation, it could reasonably be said that Zeno reasoned this way: His Dichotomy and Achilles paradoxes presumably demonstrate that any continuous process takes an infinite amount of time, which is paradoxical. Zeno's Arrow and Stadium paradoxes demonstrate that the concept of discontinuous change is paradoxical. Because both continuous and discontinuous change are paradoxical, so is any change.

Eudemus, a student of Aristotle, offered another interpretation. He suggested that Zeno was challenging both pluralism and Parmenides’ idea of monism, which would imply that Zeno was a nihilist. Paul Tannery in 1885 and Wallace Matson in 2001 offer a third interpretation of Zeno’s goals regarding the paradoxes of motion. They say that Plato and Aristotle understood neither Zeno’s arguments nor his purpose. Zeno was actually challenging the Pythagoreans and their particular brand of pluralism, not Greek common sense. Zeno was not trying to directly support Parmenides. Instead, he intended to show that Parmenides’ opponents are committed to denying the very motion, change, and plurality they believe in, and Zeno’s arguments were completely successful. This controversial issue about interpreting Zeno’s purposes will not be pursued further in this article, and Plato’s classical interpretation will be assumed.

Aristotle believed Zeno's Paradoxes were trivial and easily resolved, but later philosophers have not agreed on the triviality.

d. His Method

Before Zeno, Greek thinkers favored presenting their philosophical views by writing poetry. Zeno began the grand shift away from poetry toward a prose that contained explicit premises and conclusions. And he employed the method of indirect proof in his paradoxes by temporarily assuming some thesis that he opposed and then attempting to deduce an absurd conclusion or a contradiction, thereby undermining the temporary assumption. This method of indirect proof or reductio ad absurdum probably originated with his teacher Parmenides [although this is disputed in the scholarly literature], but Zeno used it more systematically.

2. The Standard Solution to the Paradoxes

Any paradox can be treated by abandoning enough of its crucial assumptions. For Zeno's paradoxes, it is very interesting to consider which assumptions to abandon, and why those. A paradox is an argument that reaches a contradiction by apparently legitimate steps from apparently reasonable assumptions, while the experts at the time cannot agree on the way out of the paradox, that is, agree on its resolution. It is this latter point about disagreement among the experts that distinguishes a paradox from a mere puzzle in the ordinary sense of that term. Zeno’s paradoxes are now generally considered to be puzzles because of the wide agreement among today’s experts that there is at least one acceptable resolution of the paradoxes.

This resolution is called the Standard Solution. It points out that, although Zeno was correct in saying that at any point or instant before reaching the goal there is always some as yet uncompleted path to cover, this does not imply that the goal is never reached. More specifically, the Standard Solution says that for the runners in the Achilles Paradox and the Dichotomy Paradox, the runner's path is a physical continuum that is completed by using a positive, finite speed. The details presuppose differential calculus and classical mechanics (as opposed to quantum mechanics). The Standard Solution treats speed as the derivative of distance with respect to time. It assumes that physical processes are sets of point-events. It implies that durations, distances, and line segments are all linear continua composed of indivisible points; it then uses these ideas to challenge various assumptions made, and inference steps taken, by Zeno. To be very brief and anachronistic, Zeno's mistake (and Aristotle's mistake) was to fail to use calculus. More specifically, in the case of the paradoxes of motion such as the Achilles and the Dichotomy, Zeno's mistake was not his assuming there is a completed infinity of places for the runner to go, which was what Aristotle said was Zeno's mistake. Instead, Zeno's and Aristotle's mistake was in assuming that this is too many places (for the runner to go to in a finite time).

A key background assumption of the Standard Solution is that this resolution is not simply employing some concepts that will undermine Zeno’s reasoning—Aristotle's reasoning does that, too, at least for most of the paradoxes—but that it is employing concepts which have been shown to be appropriate for the development of a coherent and fruitful system of mathematics and physical science. Aristotle's treatment of the paradoxes does not employ these fruitful concepts of mathematical physics. Aristotle did not believe that the use of mathematics was needed to understand the world. So, the Standard Solution is much more complicated than Aristotle's treatment. No single person can be credited with creating it.

The Standard Solution allows us to speak of one event happening pi seconds after another, and of one event happening the square root of three seconds after another. In ordinary discourse outside of science we would never need this kind of precision, but it is needed in mathematical physics and its calculus. The need for this precision has led to requiring time to be a linear continuum, very much like a segment of the real number line. By "real numbers" we mean the numbers expressible by decimal expansions, including the infinite, non-repeating expansions of irrational numbers such as pi, not merely the rational numbers.

Calculus was invented in the late 1600s by Newton and Leibniz. Their calculus is a technique for treating continuous motion as being composed of an infinite number of infinitesimal steps. After the acceptance of calculus, almost all mathematicians and physicists believed that continuous motion should be modeled by a function which takes real numbers representing time as its argument and which gives real numbers representing spatial position as its value. This position function should be continuous or gap-free. In addition, the position function should be differentiable in order to make sense of speed, which is treated as the rate of change of position. By the early 20th century most mathematicians had come to believe that, to make rigorous sense of motion, mathematics needs a fully developed set theory that rigorously defines the key concepts of real number, continuity and differentiability. Doing this requires a well-defined concept of the continuum. Unfortunately Newton and Leibniz did not have a good definition of the continuum, and finding a good one required over two hundred years of work.
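The picture just described—position as a continuous, differentiable function of time, with speed as its derivative—can be sketched in a few lines. This is only an illustration; the speed of 10 meters per second is an assumed figure, not one from the sources.

```python
# A minimal sketch of the Standard Solution's model of motion: position
# is a function from real-valued times to real-valued places, and speed
# is the derivative of that function. The speed is an assumed example value.

def position(t, speed=10.0):
    """Position (meters) at time t (seconds) for motion at constant speed."""
    return speed * t

def numerical_speed(pos, t, h=1e-8):
    """Estimate the derivative of a position function at time t."""
    return (pos(t + h) - pos(t)) / h

print(numerical_speed(position, 3.0))  # close to 10.0, the constant speed
```

Because the position function here is linear, its derivative is the same constant at every instant, which is what "moving at a constant velocity" means in this model.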

The continuum is a very special set; it is the standard model of the real numbers. Intuitively, a continuum is a continuous entity; it is a whole thing that has no gaps. Some examples of a continuum are the path of a runner’s center of mass, the time elapsed during this motion, ocean salinity, and the temperature along a metal rod. Distances and durations are normally considered to be real physical continua whereas treating the ocean salinity and the rod's temperature as continua is a very useful approximation for many calculations in physics even though we know that at the atomic level the approximation breaks down.

The distinction between “a” continuum and “the” continuum is that “the” continuum is the paradigm of “a” continuum. The continuum is the mathematical line, the line of geometry, which is standardly understood to have the same structure as the real numbers in their natural order. Real numbers and points on the continuum can be put into a one-to-one order-preserving correspondence. There are not enough rational numbers for this correspondence even though the rational numbers are dense, too (in the sense that between any two rational numbers there is another rational number).
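The density of the rationals mentioned above is easy to verify concretely: the average of any two rationals is itself a rational lying strictly between them. The point of the passage, though, is that density alone is not enough to make a continuum; the specific fractions below are arbitrary examples.

```python
from fractions import Fraction

# Density of the rationals: between any two distinct rationals there is
# another rational, e.g. their midpoint. The choice of 1/3 and 1/2 is arbitrary.
a, b = Fraction(1, 3), Fraction(1, 2)
mid = (a + b) / 2
assert a < mid < b
print(mid)  # 5/12
```

Yet despite this density, the rationals are countable, so they cannot be put into a one-to-one order-preserving correspondence with the points of the continuum; that is why the irrationals are needed.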

For Zeno’s paradoxes, standard analysis assumes that length should be defined in terms of measure, and motion should be defined in terms of the derivative. These definitions are given in terms of the linear continuum. The most important features of any linear continuum are that (a) it is composed of indivisible points, (b) it is an actually infinite set, that is, a transfinite set, and not merely a potentially infinite set that gets bigger over time, (c) it is undivided yet infinitely divisible (that is, it is gap-free), (d) the points are so close together that no point can have a point immediately next to it, (e) between any two points there are other points, (f) the measure (such as length) of a continuum is not a matter of adding up the measures of its points nor adding up the number of its points, (g) any connected part of a continuum is also a continuum, and (h) the number of points between any two points is the same uncountable, transfinite number as the number of points in the whole continuum (aleph-one, if Cantor's continuum hypothesis is assumed).

Physical space is not a linear continuum because it is three-dimensional and not linear; but it has one-dimensional subspaces such as paths of runners and orbits of planets; and these are linear continua if we use the path created by only one point on the runner and the orbit created by only one point on the planet. Regarding time, each (point) instant is assigned a real number as its time, and each instant is assigned a duration of zero. The time taken by Achilles to catch the tortoise is a temporal interval, a linear continuum of instants, according to the Standard Solution (but not according to Zeno or Aristotle). The Standard Solution says that the sequence of Achilles' goals (the goals of reaching the point where the tortoise is) should be abstracted from a pre-existing transfinite set, namely a linear continuum of point places along the tortoise's path. Aristotle's treatment does not do this. The next section of this article presents the details of how the concepts of the Standard Solution are used to resolve each of Zeno's Paradoxes.

Of the ten known paradoxes, The Achilles attracted the most attention over the centuries. Aristotle’s treatment of the paradox involved accusing Zeno of using the concept of an actual or completed infinity instead of the concept of a potential infinity, and accusing Zeno of failing to appreciate that a line cannot be composed of indivisible points. Aristotle’s treatment is described in detail below. It was generally accepted until the 19th century, but slowly lost ground to the Standard Solution. Some historians say Aristotle had no solution but only a verbal quibble. This article takes no side on this dispute and speaks of Aristotle’s “treatment.”

The development of calculus was the most important step in the Standard Solution of Zeno's paradoxes, so why did it take so long for the Standard Solution to be accepted after Newton and Leibniz developed their calculus? The period lasted about two hundred years. There are four reasons. (1) It took time for calculus and the rest of real analysis to prove its applicability and fruitfulness in physics. (2) It took time for the relative shallowness of Aristotle’s treatment to be recognized because Aristotle was so influential that everyone was very cautious about disagreeing with him; for so many centuries the European intellectuals believed their role was to understand Aristotle, not challenge him. (3) It took time for philosophers of science to appreciate that each theoretical concept used in a physical theory need not have its own correlate in our experience.  (4) It took time for certain problems in the foundations of mathematics to be resolved, such as finding a better definition of the continuum and avoiding the paradoxes of Cantor's naive set theory.

Point (2) is discussed in section 4 below.

Point (3) is about the time it took for philosophers of science to reject the demand, favored by Ernst Mach and many Logical Positivists, that meaningful terms in science must have “empirical meaning.” This was the demand that each physical concept be separately definable with observation terms. It was thought that, because our experience is finite, the term “actual infinite” or "completed infinity" could not have empirical meaning, but “potential infinity” could. Today, most philosophers would not restrict meaning to empirical meaning. They believe in indivisible points even though they are not even indirectly observable. However, for an interesting exception see Dummett (2000) which contains a theory in which time is composed of overlapping intervals rather than durationless instants, and in which the endpoints of those intervals are the initiation and termination of actual physical processes. This idea of treating time without instants develops a 1936 proposal of Russell and Whitehead. The central philosophical issue about Dummett's treatment of motion is whether its adoption would negatively affect other areas of mathematics and science.

Point (1) is about the time it took for classical mechanics to develop to the point where it was accepted as giving correct solutions to problems involving motion. Point (1) was challenged in the metaphysical literature on the grounds that the abstract account of continuity in real analysis does not truly describe either time, space or concrete physical reality. This challenge is discussed in later sections.

Point (4) arises because the standard of rigorous proof and rigorous definition of concepts has increased over the years. As a consequence, the difficulties in the foundations of real analysis, which began with George Berkeley’s criticism of inconsistencies in the use of infinitesimals in the calculus of Leibniz (and fluxions in the calculus of Newton), were not satisfactorily resolved until the early 20th century with the development of Zermelo-Fraenkel set theory. The key idea was to work out the necessary and sufficient conditions for being a continuum. To achieve the goal, the conditions for being a mathematical continuum had to be strictly arithmetical and not dependent on our intuitions about space, time and motion. The idea was to revise or “tweak” the definition until it would not create new paradoxes and would still give useful theorems. When this revision was completed, it could be declared that the set of real numbers is an actual infinity, not a potential infinity, and that not only is any interval of real numbers a linear continuum, but so are the spatial paths, the temporal durations, and the motions that are mentioned in Zeno’s paradoxes. In addition, it was important to clarify how to compute the sum of an infinite series (such as 1/2 + 1/4 + 1/8 + ...) and how to define motion in terms of the derivative. This new mathematical system required new or better-defined mathematical concepts of compact set, connected set, continuity, continuous function, convergence-to-a-limit of an infinite sequence (such as 1/2, 1/4, 1/8, ...), curvature at a point, cut, derivative, dimension, function, integral, limit, measure, reference frame, set, and size of a set. Similarly, rigor was added to the definitions of the physical concepts of place, instant, duration, distance, and instantaneous speed. 
The relevant revisions were made by Euler in the 18th century and by Bolzano, Cantor, Cauchy, Dedekind, Frege, Hilbert, Lebesgue, Peano, Russell, Weierstrass, and Whitehead, among others, during the 19th and early 20th centuries.

What about Leibniz's infinitesimals or Newton's fluxions? Let's stick with infinitesimals, since fluxions have the same problems and the same resolution. In 1734, Berkeley had properly criticized the use of infinitesimals as being "ghosts of departed quantities" that are used inconsistently in calculus. Earlier Newton had defined instantaneous speed as the ratio of an infinitesimally small distance and an infinitesimally small duration, and he and Leibniz produced a system of calculating variable speeds that was very fruitful. But nobody in that century or the next could adequately explain what an infinitesimal was. Newton had called them “evanescent divisible quantities,” whatever that meant. Leibniz called them “vanishingly small,” but that was just as vague. The practical use of infinitesimals was unsystematic. For example, the infinitesimal dx is treated as being equal to zero when it is declared that x + dx = x, but is treated as not being zero when used in the denominator of the fraction [f(x + dx) - f(x)]/dx, which is the derivative of the function f. In addition, consider the seemingly obvious Archimedean property of pairs of positive numbers: given any two positive numbers A and B, if you add enough copies of A, then you can produce a sum greater than B. This property fails if A is an infinitesimal. Finally, mathematicians gave up on answering Berkeley’s charges (and thus re-defined what we mean by standard analysis) because, in 1821, Cauchy showed how to achieve the same useful theorems of calculus by using the idea of a limit instead of an infinitesimal. Later in the 19th century, Weierstrass resolved some of the inconsistencies in Cauchy’s account and satisfactorily showed how to define continuity in terms of limits (his epsilon-delta method). As J. O. Wisdom points out (1953, p. 23), “At the same time it became clear that [Leibniz's and] Newton’s theory, with suitable amendments and additions, could be soundly based.” In an effort to provide this sound basis according to the latest, heightened standard of what counts as “sound,” Peano, Frege, Hilbert, and Russell attempted to properly axiomatize real analysis. This led in 1901 to Russell’s paradox and the fruitful controversy about how to provide a foundation to all of mathematics. That controversy still exists, but the majority view is that axiomatic Zermelo-Fraenkel set theory with the axiom of choice blocks all the paradoxes, legitimizes Cantor’s theory of transfinite sets, and provides the proper foundation for real analysis and other areas of mathematics. This standard real analysis lacks infinitesimals, thanks to Cauchy and Weierstrass. Standard real analysis is the mathematics that the Standard Solution applies to Zeno’s Paradoxes.
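Cauchy's limit idea can be seen numerically: instead of dividing by a mysterious infinitesimal dx, one watches the difference quotient as an ordinary finite h shrinks toward zero. The function x² below is a standard textbook example, chosen here for illustration.

```python
# The limit idea that replaced infinitesimals: the derivative of f at x is
# the value the difference quotient [f(x + h) - f(x)] / h approaches as the
# ordinary (finite, nonzero) number h shrinks -- no infinitesimal dx is needed.

def f(x):
    return x * x

x = 3.0
for h in [0.1, 0.01, 0.001, 1e-6]:
    print(h, (f(x + h) - f(x)) / h)
# The quotients approach 6.0, the derivative of x^2 at x = 3.
```

No single quotient in the list is the derivative; the derivative is the limit the quotients converge to, which is exactly the reconception that let mathematicians drop infinitesimals while keeping all the useful theorems.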

The rational numbers are not continuous although they are infinitely numerous and infinitely dense. To come up with a foundation for calculus there had to be a good definition of the continuity of the real numbers. But this required having a good definition of irrational numbers. There wasn’t one before 1872. Dedekind’s definition in 1872 defines the mysterious irrationals in terms of the familiar rationals. The result was a clear and useful definition of real numbers. The usefulness of Dedekind's definition of real numbers, and the lack of any better definition, convinced many mathematicians to be more open to accepting both the real numbers and actually-infinite sets.

We won't explore the definitions of continuity here, but what Dedekind discovered about the reals and their relationship to the rationals was how to define a real number to be a cut of the rational numbers, where a cut is a certain ordered pair of actually-infinite sets of rational numbers.

A Dedekind cut (A,B) is defined to be a partition or cutting of the set of all the rational numbers into a left part A and a right part B. A and B are non-empty subsets, such that all rational numbers in A are less than all rational numbers in B, and also A contains no greatest number. Every real number is a unique Dedekind cut. The cut can be made at a rational number or at an irrational number. Here are examples of each:

Dedekind's real number 1/2 is ({x : x < 1/2} , {x: x ≥ 1/2}).

Dedekind's positive real number √2 is ({x : x < 0 or x² < 2} , {x : x² ≥ 2}).

In these definitions, x ranges over the rational numbers only. For any cut (A,B), if B has a smallest number, then the real number for that cut corresponds to this smallest number, as in the definition of 1/2 above. Otherwise, the cut defines an irrational number which, loosely speaking, fills the gap between A and B, as in the definition of the square root of 2 above.

By defining reals in terms of rationals this way, Dedekind gave a foundation to the reals, and legitimized them by showing they are as acceptable as actually-infinite sets of rationals.
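Dedekind's construction can be sketched concretely. Representing a cut by a membership test for its left part A is an illustrative programming choice, not Dedekind's own notation; the test below encodes the cut for the positive square root of 2 given earlier.

```python
from fractions import Fraction

# A Dedekind cut (A,B) can be represented by a membership test for the left
# part A; the right part B is simply the complement among the rationals.
# This predicate encodes the cut for the positive square root of 2.

def in_A(x: Fraction) -> bool:
    """True if the rational x belongs to the left part of the cut for sqrt(2)."""
    return x < 0 or x * x < 2

assert in_A(Fraction(7, 5))       # 7/5 = 1.4 and (7/5)^2 = 49/25 < 2
assert not in_A(Fraction(3, 2))   # (3/2)^2 = 9/4 >= 2, so 3/2 is in B
```

Every test uses only exact rational arithmetic, which mirrors Dedekind's point: the irrational √2 is fully determined by facts about rationals alone, since A has no greatest element and B has no least element.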

But what exactly is an actually-infinite (or transfinite) set, and does this idea lead to contradictions? This question needs an answer if there is to be a good theory of continuity and of real numbers. In the 1870s, Cantor clarified what an actually-infinite set is and made a convincing case that the concept does not lead to inconsistencies. These accomplishments by Cantor are why he (along with Dedekind and Weierstrass) is said by Russell to have “solved Zeno’s Paradoxes.”

That solution recommends using very different concepts and theories than those used by Zeno. The argument that this is the correct solution was presented by many people, but it was especially influenced by the work of Bertrand Russell (1914, lecture 6) and the more detailed work of Adolf Grünbaum (1967). In brief, the argument for the Standard Solution is that we have solid grounds for believing our best scientific theories, but the theories of mathematics such as calculus and Zermelo-Fraenkel set theory are indispensable to these theories, so we have solid grounds for believing in them, too. The scientific theories require a resolution of Zeno’s paradoxes and the other paradoxes; and the Standard Solution to Zeno's Paradoxes that uses standard calculus and Zermelo-Fraenkel set theory is indispensable to this resolution or at least is the best resolution, or, if not, then we can be fairly sure there is no better solution, or, if not that either, then we can be confident that the solution is good enough (for our purposes). Aristotle's treatment, on the other hand, uses concepts that hamper the growth of mathematics and science. Therefore, we should accept the Standard Solution.

In the next section, this solution will be applied to each of Zeno’s ten paradoxes.

To be optimistic, the Standard Solution represents a counterexample to the claim that philosophical problems never get solved. To be less optimistic, the Standard Solution has its drawbacks and its alternatives, and these have generated new and interesting philosophical controversies beginning in the last half of the 20th century, as will be seen in later sections. The primary alternatives contain different treatments of calculus from that developed at the end of the 19th century. Whether this implies that Zeno’s paradoxes have multiple solutions or only one is still an open question.

Did Zeno make mistakes? And was he superficial or profound? These questions are a matter of dispute in the philosophical literature. The majority position is as follows. If we give his paradoxes a sympathetic reconstruction, he correctly demonstrated that some important, classical Greek concepts are logically inconsistent, and he did not make a mistake in doing this, except in the Moving Rows Paradox, the Paradox of Alike and Unlike and the Grain of Millet Paradox, his weakest paradoxes. Zeno did assume that the classical Greek concepts were the correct concepts to use in reasoning about his paradoxes, and now we prefer revised concepts, though it would be unfair to say he blundered for not foreseeing later developments in mathematics and physics.

3. The Ten Paradoxes

Zeno probably created forty paradoxes, of which only the following ten are known. Only the first four have standard names, and the first two have received the most attention. The ten are of uneven quality. Zeno and his ancient interpreters usually stated his paradoxes badly, so it has taken some clever reconstruction over the years to reveal their full force. Below, the paradoxes are reconstructed sympathetically, and then the Standard Solution is applied to them. These reconstructions use just one of several reasonable schemes for presenting the paradoxes, but the present article does not explore the historical research about the variety of interpretive schemes and their relative plausibility.

a. Paradoxes of Motion

i. The Achilles

Achilles, whom we can assume is the fastest runner of antiquity, is racing to catch the tortoise that is slowly crawling away from him. Both are moving along a linear path at constant speeds. In order to catch the tortoise, Achilles will have to reach the place where the tortoise presently is. However, by the time Achilles gets there, the tortoise will have crawled to a new location. Achilles will then have to reach this new location. By the time Achilles reaches that location, the tortoise will have moved on to yet another location, and so on forever. Zeno claims Achilles will never catch the tortoise. This argument shows, he believes, that anyone who believes Achilles will succeed in catching the tortoise and who believes more generally that motion is physically possible is the victim of illusion, as Parmenides had proclaimed.

The source for all of Zeno's arguments is the writings of his opponents. The Achilles Paradox is reconstructed from Aristotle (Physics Book VI, Chapter 8, 239b14-16) and some passages from Simplicius, writing in the sixth century C.E. There is no evidence that Zeno used a tortoise rather than a slow human. The tortoise is a later commentator’s addition. Aristotle spoke simply of “the runner” who competes with Achilles.

It won’t do to react and say the solution to the paradox is that there are biological limitations on how small a step Achilles can take. Achilles’ feet are not obligated to stop and start again at each of the locations described above, so there is no limit to how close one of those locations can be to another. A stronger version of his paradox would ask us to consider the movement of Achilles' center of mass. It is best to think of Achilles' change from one location to another as a continuous movement rather than as incremental steps requiring halting and starting again. Zeno is assuming that space and time are infinitely divisible; they are not discrete or atomistic. If they were, this Paradox's argument would not work.


One common complaint about Zeno’s reasoning is that he is setting up a straw man, because it is obvious that Achilles cannot catch the tortoise if he continually takes a bad aim toward the place where the tortoise is; he should aim farther ahead. The mistake in this complaint is that even if Achilles took some sort of better aim, it is still true that he is required to go to every one of those locations that are the goals of the so-called “bad aims,” so remarking about a bad aim is not a way to successfully treat Zeno's argument.

The treatment called the "Standard Solution" to the Achilles Paradox uses calculus and other parts of real analysis to describe the situation. It implies that Zeno is assuming Achilles cannot achieve his goal because

(1) there are too many places, or

(2) there is not enough time, or

(3) there is too far to run, or

(4) there is no final step, or

(5) there are too many tasks.

The historical record does not tell us which of these was Zeno's real assumption, but they are all false assumptions, according to the Standard Solution.

Let's consider assumption (3). Presumably Zeno would defend the assumption by remarking that the sum of the distances along so many of the runs toward the tortoise is infinite, which is too far to run even for Achilles. However, the advocate of the Standard Solution will remark, "How does Zeno know what the sum of this infinite series is?" According to the Standard Solution, the sum is finite.

Here is a graph using the methods of the Standard Solution to show the activity of Achilles as he chases the tortoise and overtakes it.

Graph of Achilles and the tortoise

For ease of understanding, Achilles and the tortoise are assumed to be point masses or infinitesimal particles, each moving at a constant velocity (that is, a constant speed in one direction). The graph, called a Minkowski diagram, displays the fact that Achilles' path in spacetime is a linear continuum and so is composed of an infinity of points. Zeno's failure to assume that Achilles' path is a linear continuum is a fatal step in his argument, according to the Standard Solution, which requires that the reasoner use the concepts of contemporary mathematical physics. Saying the spacetime path is a linear continuum implies that the points on the line are isomorphic to the points of time under the happens-before relation, which in turn are isomorphic to the real numbers under the less-than relation.

Achilles travels a distance d1 in reaching the point x1 where the tortoise starts, but by the time Achilles reaches x1, the tortoise has moved on to a new point x2. When Achilles reaches x2, having gone an additional distance d2, the tortoise has moved on to point x3, requiring Achilles to cover an additional distance d3, and so forth. This sequence of non-overlapping distances (or intervals or sub-paths) is an actual infinity, but happily the geometric series converges. The sum of its terms d1 + d2 + d3 +… is a finite distance that Achilles can readily complete while moving at a constant speed.
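The convergence claim can be checked numerically. The sketch below uses illustrative figures that appear in no ancient source: Achilles is assumed to run at 10 m/s, the tortoise at 1 m/s, with a 100-meter head start.

```python
# Partial sums of the catch-up distances d1 + d2 + d3 + ... in the
# Achilles Paradox. All of the numbers here are illustrative assumptions.
achilles_speed = 10.0   # meters per second
tortoise_speed = 1.0    # meters per second
head_start = 100.0      # tortoise's initial lead, in meters

# Each catch-up distance is the previous one scaled by the speed ratio,
# so the distances form a geometric series.
ratio = tortoise_speed / achilles_speed
total = 0.0
d = head_start
for _ in range(60):     # finitely many terms already reveal the limit
    total += d
    d *= ratio

# Closed form for the sum of the geometric series.
closed_form = head_start / (1 - ratio)
print(total, closed_form)   # both about 111.11 meters
```

Under these assumed speeds, Achilles draws level after roughly 111 meters, even though doing so means completing infinitely many of Zeno's sub-runs.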

Similar reasoning would apply if Zeno were to have made assumptions (1) or (2) above. Regarding assumption (4), Zeno's requirement that there be a final step or final sub-path is simply mistaken according to the Standard Solution which implies the fast runner can take a step that reaches the slow runner, that is, can reach the same point as that of the slow runner, but there is no last point just before that point, just as there is no last real number just before one. More will be said about assumption (5) in Section 5c when we discuss supertasks.

The Achilles Argument presumes that space and time are continuous or infinitely divisible. So, Zeno's conclusion may not simply have been that Achilles cannot catch the tortoise but instead that he cannot catch the tortoise if space and time are infinitely divisible. Perhaps, as some commentators have speculated, Zeno used the Achilles Paradox only to attack continuous space, and he intended his other paradoxes such as the "Arrow" and the "Moving Rows" to attack discrete space. The historical record is not clear. Notice that, although space and time are infinitely divisible for Zeno, he did not have the concepts to properly describe the limit of the repeated division. Neither Zeno nor any other ancient Greek even had the concept of zero.

ii. The Dichotomy (The Racetrack)

As Aristotle realized, the Dichotomy Paradox is just the Achilles Paradox in which Achilles stands still ahead of the tortoise. In his Progressive Dichotomy Paradox, Zeno argued that a runner will never reach the stationary goal line on a straight racetrack. The reason is that the runner must first reach half the distance to the goal, but when there he must still cross half the remaining distance to the goal, but having done that the runner must cover half of the new remainder, and so on. If the goal is one meter away, the runner must cover a distance of 1/2 meter, then 1/4 meter, then 1/8 meter, and so on ad infinitum. The runner cannot reach the final goal, says Zeno. Why not? There are few traces of Zeno's reasoning here, but for reconstructions that give the strongest reasoning, we may say that the runner will not reach the final goal because there is too far to run: the sum is actually infinite. The Standard Solution argues instead that the sum of this infinite geometric series is one, not infinity.
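The Standard Solution's claim that the Dichotomy series sums to one can be illustrated with exact rational arithmetic. This is only a sketch of the modern mathematics, not anything drawn from Zeno or Aristotle.

```python
from fractions import Fraction

# Exact partial sums of the Dichotomy series 1/2 + 1/4 + 1/8 + ...
partial = Fraction(0)
term = Fraction(1, 2)
for n in range(1, 11):
    partial += term
    term /= 2

# After n terms the partial sum is exactly 1 - (1/2)**n, so the gap
# remaining to the goal halves at every step and the limit is 1.
print(partial)       # 1023/1024 after ten terms
print(1 - partial)   # 1/1024
```

The remaining gap never vanishes at any finite stage, yet the limit of the partial sums is exactly one meter, which is the sense in which the runner covers the whole distance.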

The problem of the runner getting to the goal can be viewed from a different perspective. According to the Regressive version of the Dichotomy Paradox, the runner cannot even take a first step. Here is why. Any step may be divided conceptually into a first half and a second half. Before taking a full step, the runner must take a 1/2 step, but before that he must take a 1/4 step, but before that a 1/8 step, and so forth ad infinitum, so the runner will never get going. Like the Achilles Paradox, this paradox also concludes that any motion is impossible.

The Dichotomy paradox, in either its Progressive version or its Regressive version, assumes here for the sake of simplicity and strength of argumentation that the runner’s positions are point places. Actual runners take up some larger volume, but assuming point places is not a controversial assumption because Zeno could have reconstructed his paradox by speaking of the point places occupied by, say, the tip of the runner’s nose, and this assumption makes for a clearer and stronger paradox than assuming the runner's position is larger.

In the Dichotomy Paradox, the runner reaches the points 1/2 and 3/4 and 7/8 and so forth on the way to his goal, but under the influence of Bolzano and Dedekind and Cantor, who developed the first theory of sets, the set of those points is no longer considered to be potentially infinite. It is an actually infinite set of points abstracted from a continuum of points, in the contemporary sense of “continuum” at the heart of calculus. And the ancient idea that the actually infinite series of path lengths or segments 1/2 + 1/4 + 1/8 + … is infinite had to be rejected in favor of the new theory that it converges to 1. This is key to solving the Dichotomy Paradox, according to the Standard Solution. It is basically the same treatment as that given to the Achilles. The Dichotomy Paradox has been called “The Stadium” by some commentators, but that name is also commonly used for the Paradox of the Moving Rows.

Aristotle, in Physics Z9, said of the Dichotomy that it is possible for a runner to come in contact with a potentially infinite number of things in a finite time provided the time intervals become shorter and shorter. Aristotle said Zeno assumed this is impossible, and that is one of his errors in the Dichotomy. However, Aristotle merely asserted this and could give no detailed theory that enables the computation of the finite amount of time. So, Aristotle could not really defend his diagnosis of Zeno's error. Today the calculus is used to provide the Standard Solution with that detailed theory.

There is another detail of the Dichotomy that needs resolution. How does Zeno's runner complete the trip if there is no final step or last member of the infinite sequence of steps (intervals and goals)? Don't trips need last steps? The Standard Solution answers "no" and says the intuitive answer "yes" is one of many intuitions held by Zeno and Aristotle and the average person today that must be rejected when embracing the Standard Solution.

iii. The Arrow

Zeno’s Arrow Paradox takes a different approach to challenging the coherence of our common sense concepts of time and motion. Think of how you would distinguish an arrow that is stationary in space from one that is flying through space, given that you look only at a snapshot (an instantaneous photo) of them. As Aristotle explains, from Zeno’s “assumption that time is composed of moments,” a moving arrow must occupy a space equal to itself during any moment. That is, during any indivisible moment or instant it is at the place where it is. But places do not move. So, if in each moment, the arrow is occupying a space equal to itself, then the arrow is not moving in that moment. The reason it is not moving is that it has no time in which to move; it is simply there at the place. It cannot move during the moment because that motion would require an even smaller unit of time, but the moment is indivisible. The same reasoning holds for any other moment during the so-called “flight” of the arrow. So, the arrow is never moving. By a similar argument, Zeno can establish that nothing else moves. The source for Zeno’s argument is Aristotle (Physics, Book VI, chapter 5, 239b5-32).

The Standard Solution to the Arrow Paradox requires the reasoning to use our contemporary theory of speed from calculus. This theory defines instantaneous motion, that is, motion at an instant, without defining motion during an instant. This new treatment of motion originated with Newton and Leibniz in the seventeenth century, and it employs what is called the “at-at” theory of motion, which says motion is being at different places at different times. Motion isn't some feature that reveals itself only within a moment. The modern difference between rest and motion, as opposed to the difference in antiquity, has to do with what is happening at nearby moments and—contra Zeno—has nothing to do with what is happening during a moment.

Some researchers have speculated that the Arrow Paradox was designed by Zeno to attack discrete time and space rather than continuous time and space. This is not clear, and the Standard Solution works for both. That is, regardless of whether time is continuous and Zeno's instant has no finite duration, or time is discrete and Zeno's instant lasts for, say, 10⁻⁴⁴ seconds, there is insufficient time for the arrow to move during the instant. Yet regardless of how long the instant lasts, there still can be instantaneous motion, namely motion at that instant provided the object is in a different place at some other instant.

To re-emphasize this crucial point, note that both Zeno and 21st century mathematical physicists agree that the arrow cannot be in motion within or during an instant, but the physicists will point out that the arrow can be in motion at an instant in the sense of having a positive speed at that instant (its so-called instantaneous speed), provided the arrow occupies different positions at times before or after that instant so that the instant is part of a period in which the arrow is continuously in motion. If we do not pay attention to what happens at nearby instants, it is impossible to distinguish instantaneous motion from instantaneous rest, but distinguishing the two is the way out of the Arrow Paradox. Zeno would have balked at the idea of motion at an instant, and Aristotle explicitly denied it.

The Arrow Paradox is refuted by the Standard Solution with its new at-at theory of motion, but the paradox seems especially strong to someone who would prefer instead to say that motion is an intrinsic property of an instant, being some propensity or disposition to be elsewhere.

Let's reconsider the details of the Standard Solution assuming continuous motion rather than discrete motion. In calculus, the speed of an object at an instant (its instantaneous speed) is the time derivative of the object's position; this means the object's speed is the limit of its series of average speeds during smaller and smaller intervals of time containing the instant. We make essentially the same point when we say the object's speed is the limit of its average speed over an interval as the length of the interval tends to zero. The derivative of the arrow's position x with respect to time t, namely dx/dt, is the arrow’s instantaneous speed, and it has non-zero values at specific places at specific instants during the arrow's flight, contra Zeno and Aristotle. The speed during an instant or in an instant, which is what Zeno is calling for, would be 0/0 and is undefined. But the speed at an instant is well defined. If we require the use of these modern concepts, then Zeno cannot successfully produce a contradiction as he tries to do by his assuming that in each moment the speed of the arrow is zero—because it is not zero. Therefore, advocates of the Standard Solution conclude that Zeno’s Arrow Paradox has a false, but crucial, assumption and so is unsound.
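The difference between the undefined speed "during" an instant and the well-defined speed "at" an instant can be made concrete numerically. The position function below is an assumed example (free fall, x(t) = 4.9t², so dx/dt at t = 1 is 9.8), not anything drawn from the arrow discussion itself.

```python
# Instantaneous speed as the limit of average speeds over shrinking
# intervals containing the instant. The position function is an assumed
# illustrative example: x(t) = 4.9 * t**2, whose derivative at t = 1 is 9.8.

def x(t):
    return 4.9 * t * t

t0 = 1.0
for h in [0.1, 0.01, 0.001, 0.0001]:
    # Average speed over the interval [t0 - h, t0 + h].
    average_speed = (x(t0 + h) - x(t0 - h)) / (2 * h)
    print(h, average_speed)   # tends to 9.8 as the interval shrinks
```

No single snapshot at t = 1 reveals this speed; the value 9.8 is fixed by the arrow's positions at nearby instants, which is the at-at theory's point.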

Independently of Zeno, the Arrow Paradox was discovered by the Chinese dialectician Kung-sun Lung (Gongsun Long, ca. 325–250 B.C.E.). A lingering philosophical question about the arrow paradox is whether there is a way to properly refute Zeno's argument that motion is impossible without using the apparatus of calculus.

iv. The Moving Rows (The Stadium)

According to Aristotle (Physics, Book VI, chapter 9, 239b33-240a18), Zeno tried to create a paradox by considering bodies (that is, physical objects) of equal length aligned along three parallel rows within a stadium. One track contains A bodies (three A bodies are shown below); another contains B bodies; and a third contains C bodies. Each body is the same distance from its neighbors along its track. The A bodies are stationary. The Bs are moving to the right, and the Cs are moving with the same speed to the left. Here are two snapshots of the situation, before and after. They are taken one instant apart.

Diagram of Zeno's Moving Rows

Zeno points out that, in the time between the before-snapshot and the after-snapshot, the leftmost C passes two Bs but only one A, contradicting his (very controversial) assumption that the C should take longer to pass two Bs than one A. The usual way out of this paradox is to reject that controversial assumption.

Aristotle argues that how long it takes to pass a body depends on the speed of the body; for example, if the body is coming towards you, then you can pass it in less time than if it is stationary. Today’s analysts agree with Aristotle’s diagnosis, and historically this paradox of motion has seemed weaker than the previous three. This paradox has been called “The Stadium,” but occasionally so has the Dichotomy Paradox.

Some analysts, for example Tannery (1887), believe Zeno may have intended the paradox to assume that both space and time are discrete (quantized, atomized) as opposed to continuous, and to challenge the coherence of this idea of discrete space and time.

Well, the paradox could be interpreted this way. If so, assume the three objects A, B, and C are adjacent to each other in their tracks, and each A, B, and C body occupies a space that is one atom long. Then, if all motion is occurring at the rate of one atom of space in one atom of time, the leftmost C would pass two atoms of B-space in the time it passed one atom of A-space, which is a contradiction to our assumption about rates. There is another paradoxical consequence. Look at the space occupied by the leftmost C object. During the instant of movement, it passes the middle B object, yet there is no time at which they are adjacent, which is odd.
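Under this discrete interpretation the arithmetic can be sketched with integer positions standing for atoms of space; the particular coordinates below are illustrative assumptions, not from Aristotle's report.

```python
# A minimal sketch of the Moving Rows under the discrete interpretation.
# Positions are integers (atoms of space); one tick is one atom of time.
# The A bodies are fixed; the Bs move right and the Cs move left,
# each by one atom of space per atom of time.

a = [4, 5, 6]            # stationary A bodies
b = [3, 4, 5]            # B bodies, moving right
c = [5, 6, 7]            # C bodies, moving left

b_after = [p + 1 for p in b]
c_after = [p - 1 for p in c]

# Displacement of the leftmost C measured against each row, in one tick:
rel_a = (c[0] - a[0]) - (c_after[0] - a[0])            # 1 atom past the As
rel_b = (c[0] - b[0]) - (c_after[0] - b_after[0])      # 2 atoms past the Bs
print(rel_a, rel_b)
```

In a single indivisible tick the leftmost C advances one atom relative to the A row but two atoms relative to the B row, and there is no intermediate instant at which it sits adjacent to the middle B, which is the oddity the text describes.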

So, Zeno’s argument can be interpreted as producing a challenge to the idea that space and time are discrete. However, most commentators suspect Zeno himself did not interpret his paradox this way.

b. Paradoxes of Plurality

Zeno's paradoxes of motion are attacks on the commonly held belief that motion is real, but because motion is a kind of plurality, namely a process along a plurality of places in a plurality of times, they are also attacks on this kind of plurality. Zeno offered more direct attacks on all kinds of plurality. The first is his Paradox of Alike and Unlike.

i. Alike and Unlike

According to Plato in Parmenides 127-9, Zeno argued that the assumption of plurality–the assumption that there are many things–leads to a contradiction. He quotes Zeno as saying: "If things are many, . . . they must be both like and unlike. But that is impossible; unlike things cannot be like, nor like things unlike" (Hamilton and Cairns (1961), 922).

Zeno's point is this. Consider a plurality of things, such as some people and some mountains. These things have in common the property of being heavy. But if they all have this property in common, then they really are all the same kind of thing, and so are not a plurality. They are a one. By this reasoning, Zeno believes it has been shown that the plurality is one (or the many is not many), which is a contradiction. Therefore, by reductio ad absurdum, there is no plurality, as Parmenides has always claimed.

Plato immediately accuses Zeno of equivocating. A thing can be like some other thing in one respect while being unlike it in a different respect. Your having a property in common with some other thing does not make you identical with that other thing. Consider again our plurality of people and mountains. People and mountains are all alike in being heavy, but are unlike in intelligence. And they are unlike in being mountains; the mountains are mountains, but the people are not. As Plato says, when Zeno tries to conclude "that the same thing is many and one, we shall [instead] say that what he is proving is that something is many and one [in different respects], not that unity is many or that plurality is one...." [129d] So, there is no contradiction, and the paradox is solved by Plato. This paradox is generally considered to be one of Zeno's weakest paradoxes, and it is now rarely discussed. [See Rescher (2001), pp. 94-6 for some discussion.]

ii. Limited and Unlimited

This paradox is also called the Paradox of Denseness. Suppose there exist many things rather than, as Parmenides would say, just one thing. Then there will be a definite or fixed number of those many things, and so they will be “limited.” But if there are many things, say two things, then they must be distinct, and to keep them distinct there must be a third thing separating them. So, there are three things. But between these, …. In other words, things are dense and there is no definite or fixed number of them, so they will be “unlimited.” This is a contradiction, because the plurality would be both limited and unlimited. Therefore, there are no pluralities; there exists only one thing, not many things. This argument is reconstructed from Zeno’s own words, as quoted by Simplicius in his commentary on Book 1 of Aristotle’s Physics.
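The "unlimited" horn of the argument trades on denseness: between any two distinct points there is always a third. A small sketch with exact rationals, offered purely as an illustration of that mathematical fact:

```python
from fractions import Fraction

# Denseness: between any two distinct rationals lies a third, their
# midpoint, and the halving never terminates. This models Zeno's
# "unlimited" horn for points of space, though not, as the Standard
# Solution notes, for physical objects.
a, b = Fraction(0), Fraction(1)
between = []
for _ in range(5):
    mid = (a + b) / 2    # a new point strictly between a and b
    between.append(mid)
    b = mid              # repeat on the ever-smaller gap

print(between)   # midpoints 1/2, 1/4, 1/8, 1/16, 1/32
```

The process yields a fresh separating point at every stage, which is why the points of a dense space have no fixed finite number, even though two physical objects need no third object between them.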

According to the Standard Solution to this paradox, the weakness of Zeno’s argument can be said to lie in the assumption that “to keep them distinct, there must be a third thing separating them.” Zeno would have been correct to say that between any two physical objects that are separated in space, there is a place between them, because space is dense, but he is mistaken to claim that there must be a third physical object there between them. Two objects can be distinct at a time simply by one having a property the other does not have.

iii. Large and Small

Suppose there exist many things rather than, as Parmenides says, just one thing. Then every part of any plurality is both so small as to have no size but also so large as to be infinite, says Zeno. His reasoning for why they have no size has been lost, but many commentators suggest that he’d reason as follows. If there is a plurality, then it must be composed of parts which are not themselves pluralities. Yet things that are not pluralities cannot have a size or else they’d be divisible into parts and thus be pluralities themselves.

Now, why are the parts of pluralities so large as to be infinite? Well, the parts cannot be so small as to have no size since adding such things together would never contribute anything to the whole so far as size is concerned. So, the parts have some non-zero size. If so, then each of these parts will have two spatially distinct sub-parts, one in front of the other. Each of these sub-parts also will have a size. The front part, being a thing, will have its own two spatially distinct sub-parts, one in front of the other; and these two sub-parts will have sizes. Ditto for the back part. And so on without end. A sum of all these sub-parts would be infinite. Therefore, each part of a plurality will be so large as to be infinite.

This sympathetic reconstruction of the argument is based on Simplicius’ On Aristotle’s Physics, where Simplicius quotes Zeno’s own words for part of the paradox, although he does not say what he is quoting from.

There are many errors here in Zeno’s reasoning, according to the Standard Solution. He is mistaken at the beginning when he says, “If there is a plurality, then it must be composed of parts which are not themselves pluralities.” A university is an illustrative counterexample. A university is a plurality of students, but we need not rule out the possibility that a student is a plurality. What’s a whole and what’s a plurality depends on our purposes. When we consider a university to be a plurality of students, we consider the students to be wholes without parts. But for another purpose we might want to say that a student is a plurality of biological cells. Zeno is confused about this notion of relativity, and about part-whole reasoning; and as commentators began to appreciate this they lost interest in Zeno as a player in the great metaphysical debate between pluralism and monism.

A second error occurs in arguing that each part of a plurality must have a non-zero size. The contemporary notion of measure (developed in the 20th century by Brouwer, Lebesgue, and others) showed how to properly define the measure function so that a line segment has nonzero measure even though (the singleton set of) any point has a zero measure. The measure of the line segment [a, b] is b - a; the measure of a cube with side a is a³. This theory of measure is now properly used by our civilization for length, volume, duration, mass, voltage, brightness, and other continuous magnitudes.

Thanks to Aristotle’s support, Zeno’s Paradoxes of Large and Small and of Infinite Divisibility (to be discussed below) were generally considered to have shown that a continuous magnitude cannot be composed of points. Interest was rekindled in this topic in the 18th century. The physical objects in Newton’s classical mechanics of 1726 were interpreted by R. J. Boscovich in 1763 as being collections of point masses. Each point mass is a movable point carrying a fixed mass. This idealization of continuous bodies as if they were compositions of point particles was very fruitful; it could be used to easily solve otherwise very difficult problems in physics. This success led scientists, mathematicians, and philosophers to recognize that the strength of Zeno’s Paradoxes of Large and Small and of Infinite Divisibility had been overestimated; they did not prevent a continuous magnitude from being composed of points.

iv. Infinite Divisibility

This is the most challenging of all the paradoxes of plurality. Consider the difficulties that arise if we assume that an object theoretically can be divided into a plurality of parts. According to Zeno, there is a reassembly problem. Imagine cutting the object into two non-overlapping parts, then similarly cutting these parts into parts, and so on until the process of repeated division is complete. Assuming the hypothetical division is “exhaustive” or does come to an end, then at the end we reach what Zeno calls “the elements.” Here there is a problem about reassembly. There are three possibilities. (1) The elements are nothing. In that case the original object will be a composite of nothing, and so the whole object will be a mere appearance, which is absurd. (2) The elements are something, but they have zero size. So, the original object is composed of elements of zero size. Adding an infinity of zeros yields a zero sum, so the original object had no size, which is absurd. (3) The elements are something, but they do not have zero size. If so, these can be further divided, and the process of division was not complete after all, which contradicts our assumption that the process was already complete. In summary, there were three possibilities, but all three possibilities lead to absurdity. So, objects are not divisible into a plurality of parts.

Simplicius says this argument is due to Zeno even though it is in Aristotle (On Generation and Corruption, 316a15-34, 316b34 and 325a8-12) and is not attributed there to Zeno, which is odd. Aristotle says the argument convinced the atomists to reject infinite divisibility. The argument has been called the Paradox of Parts and Wholes, but it has no traditional name.

The Standard Solution says we first should ask Zeno to be clearer about what he is dividing. Is it concrete or abstract? When dividing a concrete, material stick into its components, we reach ultimate constituents of matter such as quarks and electrons that cannot be further divided. These have a size, a zero size (according to quantum electrodynamics), but it is incorrect to conclude that the whole stick has no size if its constituents have zero size. [Due to the forces involved, point particles have finite “cross sections,” and configurations of those particles, such as atoms, do have finite size.] So, Zeno is wrong here. On the other hand, is Zeno dividing an abstract path or trajectory? Let's assume he is, since this produces a more challenging paradox. If so, then choice (2) above is the one to think about. It's the one that talks about addition of zeroes. Let's assume the object is one-dimensional, like a path. According to the Standard Solution, this "object" that gets divided should be considered to be a continuum with its elements arranged into the order type of the linear continuum, and we should use the contemporary notion of measure to find the size of the object. The size (length, measure) of a point-element is zero, but Zeno is mistaken in saying the total size (length, measure) of all the zero-size elements is zero. The size of the object is determined instead by the difference in coordinate numbers assigned to the end points of the object. An object extending along a straight line that has one of its end points at one meter from the origin and the other end point at three meters from the origin has a size of two meters and not zero meters. So, there is no reassembly problem, and a crucial step in Zeno's argument breaks down.
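The measure-theoretic reply can be sketched in a few lines, using the same illustrative one-meter and three-meter endpoint coordinates as the example in the text.

```python
# A sketch of the measure-theoretic reply to the reassembly problem.
# The "object" is an abstract segment; its size is fixed by its endpoint
# coordinates, not by summing the zero sizes of its point-elements.

def measure(left, right):
    """Length of the segment [left, right], in meters."""
    return right - left

print(measure(1.0, 3.0))   # 2.0, matching the two-meter example in the text

# Adding up zero-size contributions from any batch of sampled points
# contributes nothing, and that is consistent, not contradictory:
points = [1.0 + k / 1000 for k in range(2001)]   # sample points in [1, 3]
print(sum(0 for _ in points))                    # 0
```

The design point is that length is not defined by summation over points at all; it is assigned to intervals directly by their endpoints, which is why the zero measure of each point never forces the whole to have zero size.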

c. Other Paradoxes

i. The Grain of Millet

There are two common interpretations of this paradox. According to the first, which is the standard interpretation, when a bushel of millet (or wheat) grains falls out of its container and crashes to the floor, it makes a sound. Since the bushel is composed of individual grains, each individual grain also makes a sound, as should each thousandth part of the grain, and so on to its ultimate parts. But this result contradicts the fact that we actually hear no sound for portions like a thousandth part of a grain, and so we surely would hear no sound for an ultimate part of a grain. Yet, how can the bushel make a sound if none of its ultimate parts make a sound? The original source of this argument is Aristotle (Physics, Book VII, chapter 4, 250a19-21). There seems to be appeal to the iterative rule that if a grain of millet or a part of a grain makes a sound, then so should the next smaller part.

We do not have Zeno’s words on what conclusion we are supposed to draw from this. Perhaps he would conclude it is a mistake to suppose that whole bushels of millet have millet parts. This is an attack on plurality.

The Standard Solution to this interpretation of the paradox accuses Zeno of mistakenly assuming that there is no lower bound on the size of something that can make a sound. There is no problem, we now say, with parts having very different properties from the wholes that they constitute. The iterative rule is initially plausible but ultimately not trustworthy, and Zeno is committing both the fallacy of division and the fallacy of composition.

Some analysts interpret Zeno’s paradox a second way, as challenging our trust in our sense of hearing, as follows. When a bushel of millet grains crashes to the floor, it makes a sound. The bushel is composed of individual grains, so they, too, make an audible sound. But if you drop an individual millet grain or a small part of one or an even smaller part, then eventually your hearing detects no sound, even though there is one. Therefore, you cannot trust your sense of hearing.

This reasoning about our not detecting low amplitude sounds is similar to making the mistake of arguing that you cannot trust your thermometer because there are some ranges of temperature that it is not sensitive to. So, on this second interpretation, the paradox is also easy to solve. One reason given in the literature for believing that this second interpretation is not the one that Zeno had in mind is that Aristotle’s criticism given below applies to the first interpretation and not the second, and it is unlikely that Aristotle would have misinterpreted the paradox.

ii. Against Place

Given an object, we may assume that there is a single, correct answer to the question, “What is its place?” Everything that exists has a place; since place itself exists, it too must have a place, and so on forever. That’s too many places, so there is a contradiction. The original source is Aristotle’s Physics (209a23-25 and 210b22-24).

The standard response to Zeno’s Paradox Against Place is to deny that places have places, and to point out that the notion of place should be relative to reference frame. But Zeno’s assumption that places have places was common in ancient Greece at the time, and Zeno is to be praised for showing that it is a faulty assumption.

4. Aristotle’s Treatment of the Paradoxes

Aristotle’s views about Zeno’s paradoxes can be found in Physics, book 4, chapter 2, and book 6, chapters 2 and 9. Regarding the Dichotomy Paradox, Aristotle is to be applauded for his insight that Achilles has time to reach his goal because during the run ever shorter paths take correspondingly ever shorter times.

Aristotle had several criticisms of Zeno. Regarding the paradoxes of motion, he complained that Zeno should not suppose the runner's path is dependent on its parts; instead, the path is there first, and the parts are constructed by the analyst. His second complaint was that Zeno should not suppose that lines contain indivisible points. Aristotle's third and most influential critical idea involves a complaint about potential infinity. On this point, in remarking about the Achilles Paradox, Aristotle said, “Zeno’s argument makes a false assumption in asserting that it is impossible for a thing to pass over…infinite things in a finite time.” Aristotle believes it is impossible for a thing to pass over an actually infinite number of things in a finite time, but that it is possible for a thing to pass over a potentially infinite number of things in a finite time. Here is how Aristotle expressed the point:

For motion…, although what is continuous contains an infinite number of halves, they are not actual but potential halves. (Physics 263a25-27). …Therefore to the question whether it is possible to pass through an infinite number of units either of time or of distance we must reply that in a sense it is and in a sense it is not. If the units are actual, it is not possible: if they are potential, it is possible. (Physics 263b2-5).

Aristotle denied the existence of the actual infinite both in the physical world and in mathematics, but he accepted potential infinities there. By calling them potential infinities he did not mean they have the potential to become actually infinite; “potential infinity” is a technical term that suggests a process that has not been completed. The term “actual infinite” does not imply being actual or real. It implies being complete, with no dependency on some process in time.

A potential infinity is an unlimited iteration of some operation—unlimited in time. Aristotle claimed, correctly, that had Zeno not used the concepts of actual infinity and of indivisible points, the paradoxes of motion such as the Achilles Paradox (and the Dichotomy Paradox) could not have been created.

Here is why doing so is a way out of these paradoxes. Zeno said that to go from the start to the finish line, the runner Achilles must reach the place that is halfway-there, then after arriving at this place he still must reach the place that is half of that remaining distance, and after arriving there he must again reach the new place that is now halfway to the goal, and so on. These are too many places to reach. Zeno made the mistake, according to Aristotle, of supposing that this infinite process needs completing when it really does not; the finitely long path from start to finish exists undivided for the runner, and it is the mathematician who is demanding the completion of such a process. Without using that concept of a completed infinity there is no paradox. Aristotle is correct about this being a treatment that avoids paradox.

Today’s standard treatment of the Achilles paradox disagrees with Aristotle's way out of the paradox and says Zeno was correct to use the concept of a completed infinity and correct to imply the runner must go to an actual infinity of places in a finite time.

Reading between the lines of what Aristotle says, one can infer that he believes there is another reason to reject actual infinities: doing so is the only way out of these paradoxes of motion. Today we know better. There is another way out, namely the Standard Solution, which uses actual infinities: Cantor's transfinite sets.

Aristotle’s treatment, which disallows actual infinity while allowing potential infinity, was clever, and it satisfied nearly all scholars for 1,500 years, being buttressed during that time by the Church's doctrine that only God is actually infinite. George Berkeley, Immanuel Kant, Carl Friedrich Gauss, and Henri Poincaré were influential defenders of potential infinity. Leibniz accepted actual infinitesimals, but other mathematicians and physicists in European universities during these centuries were careful to distinguish between actual and potential infinities and to avoid using actual infinities.

Given 1,500 years of opposition to actual infinities, the burden of proof was on anyone advocating them. Bernard Bolzano and Georg Cantor accepted this burden in the 19th century. The key idea is to see a potentially infinite set as a variable quantity that is dependent on being abstracted from a pre-existing actually infinite set. Bolzano argued that the natural numbers should be conceived of as a set, a determinate set, not one with a variable number of elements. Cantor argued that any potential infinity must be interpreted as varying over a predefined fixed set of possible values, a set that is actually infinite. He put it this way:

In order for there to be a variable quantity in some mathematical study, the “domain” of its variability must strictly speaking be known beforehand through a definition. However, this domain cannot itself be something variable…. Thus this “domain” is a definite, actually infinite set of values. Thus each potential infinite…presupposes an actual infinite. (Cantor 1887)

From this standpoint, Dedekind’s 1872 axiom of continuity and his definition of real numbers as certain infinite subsets of rational numbers suggested to Cantor and then to many other mathematicians that arbitrarily large sets of rational numbers are most naturally seen to be subsets of an actually infinite set of rational numbers. The same can be said for sets of real numbers. An actually infinite set is what we today call a "transfinite set." Cantor's idea is then to treat a potentially infinite set as being a sequence of definite subsets of a transfinite set. Aristotle had said mathematicians need only the concept of a finite straight line that may be produced as far as they wish, or divided as finely as they wish, but Cantor would say that this way of thinking presupposes a completed infinite continuum from which that finite line is abstracted at any particular time.
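Dedekind's idea that a real number can be identified with a certain infinite subset of the rationals can be illustrated computationally. The following sketch (in Python, with hypothetical names, and only illustrative of the idea) represents the lower cut of the square root of two by a membership test on rationals, and then locates the cut by bisection:

```python
from fractions import Fraction

# A Dedekind cut can be modeled by a membership test for its lower set:
# the rationals q with q < 0 or q*q < 2 form the lower cut of sqrt(2).
def in_sqrt2_cut(q: Fraction) -> bool:
    return q < 0 or q * q < 2

# Approximate the cut point by bisection, keeping the largest known
# member of the lower set and the smallest known non-member.
lo, hi = Fraction(1), Fraction(2)
for _ in range(30):
    mid = (lo + hi) / 2
    if in_sqrt2_cut(mid):
        lo = mid
    else:
        hi = mid

print(float(lo))  # close to 1.41421356...
```

No single rational is the square root of two; the real number is determined only by the infinite set of rationals below it, which is why Dedekind's definition presupposes actually infinite sets of rationals.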

[When Cantor says the mathematical concept of potential infinity presupposes the mathematical concept of actual infinity, this does not imply that, if future time were to be potentially infinite, then future time also would be actually infinite.]

Dedekind's primary contribution to our topic was to give the first rigorous definition of infinite set—an actual infinity—showing that the notion is useful and not self-contradictory. Cantor provided the missing ingredient—that the mathematical line can fruitfully be treated as a dense linear ordering of uncountably many points, and he went on to develop set theory and to give the continuum a set-theoretic basis which convinced mathematicians that the concept was rigorously defined.

These ideas now form the basis of modern real analysis. The implication for the Achilles and Dichotomy paradoxes is that, once the rigorous definition of a linear continuum is in place, and once we have Cauchy’s rigorous theory of how to assess the value of an infinite series, then we can point to the successful use of calculus in physical science, especially in the treatment of time and of motion through space, and say that the sequence of intervals or paths described by Zeno is most properly treated as a sequence of subsets of an actually infinite set [that is, Aristotle's potential infinity of places that Achilles reaches are really a variable subset of an already existing actually infinite set of point places], and we can be confident that Aristotle’s treatment of the paradoxes is inferior to the Standard Solution’s.

Zeno said Achilles cannot achieve his goal in a finite time, but there is no record of the details of how he defended this conclusion. He might have said the reason is (i) that there is no last goal in the sequence of sub-goals, or, perhaps (ii) that it would take too long to achieve all the sub-goals, or perhaps (iii) that covering all the sub-paths is too great a distance to run. Zeno might have offered all these defenses. In attacking justification (ii), Aristotle objects that, if Zeno were to confine his notion of infinity to a potential infinity and were to reject the idea of zero-length sub-paths, then Achilles achieves his goal in a finite time, so this is a way out of the paradox. However, an advocate of the Standard Solution says Achilles achieves his goal by covering an actual infinity of paths in a finite time, and this is the way out of the paradox. (The discussion of whether Achilles can properly be described as completing an actual infinity of tasks rather than goals will be considered in Section 5c.) Aristotle's treatment of the paradoxes is basically criticized for being inconsistent with current standard real analysis, which is based upon Zermelo-Fraenkel set theory and its actually infinite sets. To summarize the errors of Zeno and Aristotle in the Achilles Paradox and in the Dichotomy Paradox: they both made the mistake of thinking that if a runner has to cover an actually infinite number of sub-paths to reach his goal, then he will never reach it. Calculus shows how Achilles can do this and reach his goal in a finite time, and the fruitfulness of the tools of calculus implies that the Standard Solution is a better treatment than Aristotle's.
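The convergence fact at the heart of this resolution can be checked numerically. A minimal sketch (Python; illustrative only) computes the partial sums of Zeno's sub-path lengths 1/2 + 1/4 + 1/8 + … and shows the gap to the full unit path shrinking toward zero, the series that calculus assigns the finite sum 1:

```python
from fractions import Fraction

# Partial sums of the sub-path lengths 1/2 + 1/4 + 1/8 + ...
# Calculus assigns the infinite series the finite sum 1, matching the
# fact that Achilles traverses the whole unit path in a finite time.
total = Fraction(0)
partial_sums = []
for n in range(1, 21):
    total += Fraction(1, 2 ** n)
    partial_sums.append(total)

print(partial_sums[-1])      # 1048575/1048576
print(1 - partial_sums[-1])  # the remaining gap: 1/1048576
```

After n terms the gap is exactly 1/2^n, so the partial sums get arbitrarily close to 1, which is what it means for the infinite series to sum to 1.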

Let’s turn to the other paradoxes. In proposing his treatment of the Paradox of the Large and Small and of the Paradox of Infinite Divisibility, Aristotle said that

…a line cannot be composed of points, the line being continuous and the point indivisible. (Physics, 231a25)

In modern real analysis, a continuum is composed of points, but Aristotle, ever the advocate of common sense reasoning, claimed that a continuum cannot be composed of points. Aristotle believed a line can be composed only of smaller, indefinitely divisible lines and not of points without magnitude. Similarly a distance cannot be composed of point places and a duration cannot be composed of instants. This is one of Aristotle’s key errors, according to advocates of the Standard Solution, because by maintaining this common sense view he created an obstacle to the fruitful development of real analysis. In addition to complaining about points, Aristotelians object to the idea of an actual infinite number of them.

In his analysis of the Arrow Paradox, Aristotle said Zeno mistakenly assumes time is composed of indivisible moments, but “This is false, for time is not composed of indivisible moments any more than any other magnitude is composed of indivisibles.” (Physics, 239b8-9) Zeno needs those instantaneous moments so that he can say the arrow does not move during the moment. Aristotle recommends not allowing Zeno to appeal to instantaneous moments and restricting him to saying that motion can be divided only into a potential infinity of intervals. That restriction implies the arrow’s path can be divided only into finitely many intervals at any time. So, at any time, there is a finite interval during which the arrow can exhibit motion by changing location. So the arrow flies, after all. That is, Aristotle declares Zeno’s argument is based on false assumptions without which there is no problem with the arrow’s motion. However, the Standard Solution agrees with Zeno that time can be composed of indivisible moments or instants, and it implies that Aristotle has mis-diagnosed where the error lies in the Arrow Paradox. Advocates of the Standard Solution would add that allowing a duration to be composed of indivisible moments is what is needed for having a fruitful calculus, and Aristotle's recommendation is an obstacle to the development of calculus.

Aristotle’s treatment of The Paradox of the Moving Rows is basically in agreement with the Standard Solution to that paradox–that Zeno did not appreciate the difference between speed and relative speed.

Regarding the Paradox of the Grain of Millet, Aristotle said that parts need not have all the properties of the whole, and so grains need not make sounds just because bushels of grains do. (Physics, 250a, 22) And if the parts make no sounds, we should not conclude that the whole can make no sound. It would have been helpful for Aristotle to have said more about what are today called the Fallacies of Division and Composition that Zeno is committing. However, Aristotle’s response to the Grain of Millet is brief but accurate by today’s standards.

In conclusion, are there two adequate but different solutions to Zeno’s paradoxes, Aristotle’s Solution and the Standard Solution? No. Aristotle’s treatment does not stand up to criticism in a manner that most scholars deem adequate. The Standard Solution uses contemporary concepts that have proved to be more valuable for solving and resolving so many other problems in mathematics and physics. Replacing Aristotle’s common sense concepts with the new concepts from real analysis and classical mechanics has been a key ingredient in the successful development of mathematics and science, and for this reason the vast majority of scientists, mathematicians, and philosophers reject Aristotle's treatment. Nevertheless, there is a significant minority in the philosophical community who do not agree, as we shall see in the sections that follow.

5. Other Issues Involving the Paradoxes

a. Consequences of Accepting the Standard Solution

There is a price to pay for accepting the Standard Solution to Zeno’s Paradoxes. The following–once presumably safe–intuitions or assumptions must be rejected:

  1. A continuum is too smooth to be composed of indivisible points.
  2. Runners do not have time to go to an actual infinity of places in a finite time.
  3. The sum of an infinite series of positive terms is always infinite.
  4. For each instant there is a next instant and for each place along a line there is a next place.
  5. A finite distance along a line cannot contain an actually infinite number of points.
  6. The more points there are on a line, the longer the line is.
  7. It is absurd for there to be numbers that are bigger than every integer.
  8. A one-dimensional curve cannot fill a two-dimensional area, nor can an infinitely long curve enclose a finite area.
  9. A whole is always greater than any of its parts.

Item (8) was undermined when it was discovered that the continuum implies the existence of fractal curves. However, the loss of intuition (1) has caused the greatest stir because so many philosophers object to a continuum being constructed from points. Aristotle had said, "Nothing continuous can be composed of things having no parts," (Physics VI.3 234a 7-8). The Austrian philosopher Franz Brentano believed with Aristotle that scientific theories should be literal descriptions of reality, as opposed to today’s more popular view that theories are idealizations or approximations of reality. Continuity is something given in perception, said Brentano, and not in a mathematical construction; therefore, mathematics misrepresents. In a 1905 letter to Husserl, he said, “I regard it as absurd to interpret a continuum as a set of points.”

But the Standard Solution needs to be thought of as a package to be evaluated in terms of all of its costs and benefits. From this perspective the Standard Solution’s point-set analysis of continua has withstood the criticism and demonstrated its value in mathematics and mathematical physics. As a consequence, advocates of the Standard Solution say we must live with rejecting the nine intuitions listed above, and accept the counterintuitive implications such as there being divisible continua, infinite sets of different sizes, and space-filling curves. They agree with the philosopher W. V. O. Quine who demands that we be conservative when revising the system of claims that we believe and who recommends “minimum mutilation.” Advocates of the Standard Solution say no less mutilation will work satisfactorily.

b. Criticisms of the Standard Solution

Balking at having to reject so many of our intuitions, Henri-Louis Bergson, Max Black, Franz Brentano, L. E. J. Brouwer, Solomon Feferman, William James, Charles S. Peirce, James Thomson, Alfred North Whitehead, and Hermann Weyl argued in different ways that the standard mathematical account of continuity does not apply to physical processes, or is improper for describing those processes. Here are their main reasons: (1) the actual infinite cannot be encountered in experience and thus is unreal, (2) human intelligence is not capable of understanding motion, (3) the sequence of tasks that Achilles performs is finite and the illusion that it is infinite is due to mathematicians who confuse their mathematical representations with what is represented, (4) motion is unitary even though its spatial trajectory is infinitely divisible, (5) treating time as being made of instants is to treat time as static rather than as the dynamic aspect of consciousness that it truly is, (6) actual infinities and the contemporary continuum are not indispensable to solving the paradoxes, and (7) the Standard Solution’s implicit assumption of the primacy of the coherence of the sciences is unjustified because coherence with a priori knowledge and common sense is primary.

See Salmon (1970, Introduction) and Feferman (1998) for a discussion of the controversy about the quality of Zeno’s arguments, and an introduction to its vast literature. This controversy is much less actively pursued in today’s mathematical literature, and hardly at all in today’s scientific literature. A minority of philosophers are actively involved in an attempt to retain one or more of the nine intuitions listed in section 5a above. An important philosophical issue is whether the paradoxes should be solved by the Standard Solution or instead by assuming that a line is not composed of points but of intervals, and whether use of infinitesimals is essential to a proper understanding of the paradoxes. For an example of how to solve Zeno's Paradoxes without using the continuum and with returning to Democritus' intuition that there is a lower limit to the divisibility of space, see "Atoms of Space" in Rovelli's theory of loop quantum gravity (Rovelli 2017, pp. 169-171).

c. Supertasks and Infinity Machines

In Zeno’s Achilles Paradox, Achilles does not cover an infinite distance, but he does cover an infinite number of distances. In doing so, does he need to complete an infinite sequence of tasks or actions? In other words, assuming Achilles does complete the task of reaching the tortoise, does he thereby complete a supertask, a transfinite number of tasks in a finite time?

Bertrand Russell said “yes.” He argued that it is possible to perform a task in one-half minute, then perform another task in the next quarter-minute, and so on, for a full minute. At the end of the minute, an infinite number of tasks would have been performed. In fact, Achilles does this in catching the tortoise, Russell said. In the mid-twentieth century, Hermann Weyl, Max Black, James Thomson, and others objected, and thus began an ongoing controversy about the number of tasks that can be completed in a finite time.

That controversy has sparked a related discussion about whether there could be a machine that can perform an infinite number of tasks in a finite time. A machine that can is called an infinity machine. In 1954, in an effort to undermine Russell’s argument, the philosopher James Thomson described a lamp that is intended to be a typical infinity machine. Let the machine switch the lamp on for a half-minute; then switch it off for a quarter-minute; then on for an eighth-minute; off for a sixteenth-minute; and so on. Would the lamp be lit or dark at the end of the minute? Thomson argued that it must be one or the other, but it cannot be either because every period in which it is off is followed by a period in which it is on, and vice versa, so there can be no such lamp, and the specific mistake in the reasoning was to suppose that it is logically possible to perform a supertask. The implication for Zeno’s paradoxes is that Thomson is denying Russell’s description of Achilles’ task as a supertask, as being the completion of an infinite number of sub-tasks in a finite time.

Paul Benacerraf (1962) complains that Thomson’s reasoning is faulty because it fails to notice that the initial description of the lamp determines the state of the lamp at each period in the sequence of switching, but it determines nothing about the state of the lamp at the limit of the sequence, namely at the end of the minute. The lamp could be either on or off at the limit. The limit of the infinite converging sequence is not in the sequence. So, Thomson has not established the logical impossibility of completing this supertask.
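Benacerraf's point can be made vivid with a small sketch (Python, with hypothetical names; only an illustration of the structure of the argument): the toggle times converge to the end of the minute, but the sequence of lamp states keeps alternating, so the description fixes the state at every stage yet fixes no limiting state at the end of the minute:

```python
from fractions import Fraction

# Thomson's lamp is toggled at t = 1/2, 3/4, 7/8, ... of the minute.
def state_after(n_toggles: int) -> int:
    # The lamp starts on (state 1); each toggle flips it.
    return 1 if n_toggles % 2 == 0 else 0

# The toggle times converge to 1 (the end of the minute)...
times = [1 - Fraction(1, 2 ** n) for n in range(1, 6)]
# ...but the state sequence keeps alternating, so it has no limit.
states = [state_after(n) for n in range(1, 6)]

print(times)   # toggle times 1/2, 3/4, 7/8, 15/16, 31/32
print(states)  # states 0, 1, 0, 1, 0 -- no limiting value
```

Since the limit of the time sequence (the end of the minute) is not itself one of the toggle times, nothing in the description determines a value there, which is exactly Benacerraf's objection to Thomson.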

Could some other argument establish this impossibility? Benacerraf suggests that an answer depends on what we ordinarily mean by the term “completing a task.” If the meaning does not require that tasks have minimum times for their completion, then maybe Russell is right that some supertasks can be completed, he says; but if a minimum time is always required, then Russell is mistaken because an infinite time would be required. What is needed is a better account of the meaning of the term “task.” Grünbaum objects to Benacerraf’s reliance on ordinary meaning. “We need to heed the commitments of ordinary language,” says Grünbaum, “only to the extent of guarding against being victimized or stultified by them.”

The Thomson Lamp Argument has generated a great literature in philosophy. Here are some of the issues. What is the proper definition of “task”? For example, does it require a minimum amount of time in the physicists’ technical sense of that term? Even if it is physically impossible to flip the switch in Thomson’s lamp, suppose physics were different and there were no limit on speed; what then? Is the lamp logically impossible? Is the lamp metaphysically impossible, even if it is logically possible? Was it proper of Thomson to suppose that the question of whether the lamp is lit or dark at the end of the minute must have a determinate answer? Does Thomson’s question have no answer, given the initial description of the situation, or does it have an answer which we are unable to compute? Should we conclude that it makes no sense to divide a finite task into an infinite number of ever shorter sub-tasks? Is there an important difference between completing a countable infinity of tasks and completing an uncountable infinity of tasks? Interesting issues arise when we bring in Einstein’s theory of relativity and consider a bifurcated supertask. This is an infinite sequence of tasks in a finite interval of an external observer’s proper time, but not in the machine’s own proper time. See Earman and Norton (1996) for an introduction to the extensive literature on these topics. Unfortunately, there is no agreement in the philosophical community on most of the questions we’ve just entertained.

d. Constructivism

The spirit of Aristotle’s opposition to actual infinities persists today in the philosophy of mathematics called constructivism. Constructivism is not a precisely defined position, but it implies that acceptable mathematical objects and procedures have to be founded on constructions and not, say, on assuming the object does not exist, then deducing a contradiction from that assumption. Most constructivists believe acceptable constructions must be performable ideally by humans independently of practical limitations of time or money. So they would say potential infinities, recursive functions, mathematical induction, and Cantor’s diagonal argument are constructive, but the following are not: the axiom of choice, the law of excluded middle, the law of double negation, completed infinities, and the classical continuum of the Standard Solution. The implication is that Zeno’s Paradoxes were not solved correctly by using the methods of the Standard Solution. More conservative constructivists, the finitists, would go even further and reject potential infinities because of the human being's finite computational resources, but this conservative sub-group of constructivists is very much out of favor.
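The diagonal argument counts as constructive because it explicitly builds its witness rather than merely deriving a contradiction. A minimal sketch (Python, illustrative names): given any enumeration of binary sequences, the diagonal sequence differs from the n-th listed sequence at position n, so it cannot appear anywhere in the list:

```python
# Cantor's diagonal construction: given an enumeration f, where f(n) is
# the n-th listed binary sequence (itself a function from positions to
# {0, 1}), build the sequence d with d(n) = 1 - f(n)(n).
def diagonal(enumeration):
    return lambda n: 1 - enumeration(n)(n)

# Toy enumeration: the k-th listed sequence is constantly (k mod 2).
# (Any enumeration would do; this one is just for demonstration.)
listed = lambda k: (lambda n: k % 2)
d = diagonal(listed)

# d disagrees with the k-th listed sequence at position k:
print([d(k) for k in range(6)])          # [1, 0, 1, 0, 1, 0]
print([listed(k)(k) for k in range(6)])  # [0, 1, 0, 1, 0, 1]
```

The construction is fully explicit at every position, which is why constructivists accept it while rejecting, say, nonconstructive existence proofs via excluded middle.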

L. E. J. Brouwer’s intuitionism was the leading constructivist theory of the early 20th century. In response to suspicions raised by the discovery of Russell’s Paradox and the introduction into set theory of the controversial non-constructive axiom of choice, Brouwer attempted to place mathematics on what he believed to be a firmer epistemological foundation by arguing that mathematical concepts are admissible only if they can be constructed from, and thus grounded in, an ideal mathematician’s vivid temporal intuitions, their a priori intuitions of time.

Brouwer’s intuitionistic continuum has the Aristotelian property of unsplitability. What this means is the following. The Standard Solution’s set-theoretic composition of the continuum allows, say, the closed interval of real numbers from zero to one to be split or cut into (that is, to be the union of) the set of numbers in the interval that are less than one-half and the set of numbers in the interval that are greater than or equal to one-half. The corresponding closed interval of the intuitionistic continuum cannot be split this way into two disjoint sets. This unsplitability or inseparability agrees in spirit with Aristotle’s idea of the continuity of a real continuum, but disagrees in spirit with Aristotle's idea of not allowing the continuum to be composed of points. [For more on this topic, see Posy (2005) pp. 346-7.]

Although everyone agrees that any legitimate mathematical proof must use only a finite number of steps and be constructive in that sense, the majority of mathematicians in the first half of the twentieth century claimed that constructive mathematics could not produce an adequate theory of the continuum because essential theorems would no longer be theorems, and constructivist principles and procedures are too awkward to use successfully. In 1927, David Hilbert exemplified this attitude when he objected that Brouwer’s restrictions on allowable mathematics—such as rejecting proof by contradiction—were like taking the telescope away from the astronomer.

But thanks in large part to the later development of constructive mathematics by Errett Bishop and Douglas Bridges in the second half of the 20th century, most contemporary philosophers of mathematics believe the question of whether constructivism could be successful in the sense of producing an adequate theory of the continuum is still open [see Wolf (2005) p. 346, and McCarty (2005) p. 382], and to that extent so is the question of whether the Standard Solution to Zeno’s Paradoxes needs to be rejected or perhaps revised to embrace constructivism. Frank Arntzenius (2000), Michael Dummett (2000), and Solomon Feferman (1998) have done important philosophical work to promote the constructivist tradition. Nevertheless, the vast majority of today’s practicing mathematicians routinely use nonconstructive mathematics.

e. Nonstandard Analysis

Although Zeno and Aristotle had the concept of small, they did not have the concept of infinitesimally small, which is the informal concept that was used by Leibniz (and Newton) in the development of calculus. In the 19th century, infinitesimals were eliminated from the standard development of calculus due to the work of Cauchy and Weierstrass on defining a derivative in terms of limits using the epsilon-delta method. But in 1881, C. S. Peirce advocated restoring infinitesimals because of their intuitive appeal. Unfortunately, he was unable to work out the details, as were all mathematicians—until 1960 when Abraham Robinson produced his nonstandard analysis. At this point in time it was no longer reasonable to say that banishing infinitesimals from analysis was an intellectual advance. What Robinson did was to extend the standard real numbers to include infinitesimals, using this definition: h is infinitesimal if and only if its absolute value is less than 1/n, for every positive standard number n. Robinson went on to create a nonstandard model of analysis using hyperreal numbers. The class of hyperreal numbers contains counterparts of the reals, but in addition it contains any number that is the sum, or difference, of both a standard real number and an infinitesimal number, such as 3 + h and 3 – 4h². The reciprocal of an infinitesimal is an infinite hyperreal number. These hyperreals obey the usual rules of real numbers except for the Archimedean axiom. Infinitesimal distances between distinct points are allowed, unlike with standard real analysis. The derivative is defined in terms of the ratio of infinitesimals, in the style of Leibniz, rather than in terms of a limit as in standard real analysis in the style of Weierstrass.

Nonstandard analysis is called “nonstandard” because it was inspired by Thoralf Skolem’s demonstration in 1933 of the existence of models of first-order arithmetic that are not isomorphic to the standard model of arithmetic. What makes them nonstandard is especially that they contain infinitely large (hyper)integers. For nonstandard calculus one needs nonstandard models of real analysis rather than just of arithmetic. An important feature demonstrating the usefulness of nonstandard analysis is that it achieves essentially the same theorems as those in classical calculus. The treatment of Zeno’s paradoxes is interesting from this perspective. See McLaughlin (1994) for how Zeno’s paradoxes may be treated using infinitesimals. McLaughlin believes this approach to the paradoxes is the only successful one, but commentators generally do not agree with that conclusion, and consider it merely to be an alternative solution. See Dainton (2010) pp. 306-9 for some discussion of this.

f. Smooth Infinitesimal Analysis

Abraham Robinson in the 1960s resurrected the infinitesimal as an infinitesimal number, but F. W. Lawvere in the 1970s resurrected the infinitesimal as an infinitesimal magnitude. His work is called “smooth infinitesimal analysis” and is part of “synthetic differential geometry.” In smooth infinitesimal analysis, a curved line is composed of infinitesimal tangent vectors. One significant difference from a nonstandard analysis, such as Robinson’s above, is that all smooth curves are straight over infinitesimal distances, whereas Robinson’s can curve over infinitesimal distances. In smooth infinitesimal analysis, Zeno’s arrow does not have time to change its speed during an infinitesimal interval. Smooth infinitesimal analysis retains the intuition that a continuum should be smoother than the continuum of the Standard Solution. Unlike both standard analysis and nonstandard analysis whose real number systems are set-theoretical entities and are based on classical logic, the real number system of smooth infinitesimal analysis is not a set-theoretic entity but rather an object in a topos of category theory, and its logic is intuitionist. (Harrison, 1996, p. 283) Like Robinson’s nonstandard analysis, Lawvere’s smooth infinitesimal analysis may also be a promising approach to a foundation for real analysis and thus to solving Zeno’s paradoxes, but there is no consensus that Zeno’s Paradoxes need to be solved this way. For more discussion see note 11 in Dainton (2010) pp. 420-1.
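The nilsquare behavior can be imitated with dual numbers, a toy model that is emphatically not Lawvere's topos-theoretic construction but captures one of its signature calculations. Assuming an element eps with eps² = 0, every polynomial satisfies f(x + eps) = f(x) + f′(x)·eps, so the curve is "straight" over the infinitesimal interval and the slope can be read off directly, Leibniz-style, with no limit taken:

```python
# A toy model of nilsquare infinitesimals (eps * eps == 0), in the
# spirit of smooth infinitesimal analysis.  A Dual(a, b) represents
# a + b*eps; multiplication drops the eps**2 term.
class Dual:
    def __init__(self, a, b=0):
        self.a, self.b = a, b
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps**2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

eps = Dual(0, 1)

def f(x):                 # f(x) = x**3 + 2x, a sample smooth function
    return x * x * x + 2 * x

y = f(Dual(3) + eps)      # evaluate f over the infinitesimal interval at 3
print(y.a)  # f(3)  = 33
print(y.b)  # f'(3) = 29, read off as the coefficient of eps
```

This illustrates why, over an infinitesimal interval, Zeno's arrow exhibits a well-defined slope (speed) even though no change of speed can occur within the interval itself.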

6. The Legacy and Current Significance of the Paradoxes

What influence has Zeno had? He had none in the East, but in the West there has been continued influence and interest up to today.

Let’s begin with his influence on the ancient Greeks. Before Zeno, philosophers expressed their philosophy in poetry, and he was the first philosopher to use prose arguments. This new method of presentation was destined to shape almost all later philosophy, mathematics, and science. Zeno drew new attention to the idea that the way the world appears to us is not how it is in reality. Zeno probably also influenced the Greek atomists to accept atoms. Aristotle was influenced by Zeno to use the distinction between actual and potential infinity as a way out of the paradoxes, and careful attention to this distinction has influenced mathematicians ever since. The proofs in Euclid’s Elements, for example, used only potentially infinite procedures. Awareness of Zeno’s paradoxes made Greek and all later Western intellectuals more aware that mistakes can be made when thinking about infinity, continuity, and the structure of space and time, and it made them wary of any claim that a continuous magnitude could be made of discrete parts. “Zeno’s arguments, in some form, have afforded grounds for almost all theories of space and time and infinity which have been constructed from his time to our own,” said Bertrand Russell in the twentieth century.

There is controversy in 20th and 21st century literature about whether Zeno developed any specific, new mathematical techniques. Some scholars claim Zeno influenced the mathematicians to use the indirect method of proof (reductio ad absurdum), but others disagree and say it may have been the other way around. Other scholars take the internalist position that the conscious use of the method of indirect argumentation arose in both mathematics and philosophy independently of each other. See Hintikka (1978) for a discussion of this controversy about origins. Everyone agrees the method was Greek and not Babylonian, as was the method of proving something by deducing it from explicitly stated assumptions. G. E. L. Owen (Owen 1958, p. 222) argued that Zeno influenced Aristotle’s concept of motion not existing at an instant, which implies there is no instant when a body begins to move, nor an instant when a body changes its speed. Consequently, says Owen, Aristotle’s conception is an obstacle to a Newton-style concept of acceleration, and this hindrance is “Zeno’s major influence on the mathematics of science.” Other commentators consider Owen’s remark to be slightly harsh regarding Zeno because, they ask, if Zeno had not been born, would Aristotle have been likely to develop any other concept of motion?

Zeno’s paradoxes have received some explicit attention from scholars throughout later centuries. Pierre Gassendi in the early 17th century mentioned Zeno’s paradoxes as the reason to claim that the world’s atoms must not be infinitely divisible. Pierre Bayle’s 1696 article on Zeno drew the skeptical conclusion that, for the reasons given by Zeno, the concept of space is contradictory. In the early 19th century, Hegel suggested that Zeno’s paradoxes supported his view that reality is inherently contradictory.

Zeno’s paradoxes caused mistrust of infinities, and this mistrust has influenced the contemporary movements of constructivism, finitism, and nonstandard analysis, all of which affect the treatment of Zeno’s paradoxes. Dialetheism, the acceptance of true contradictions via a paraconsistent formal logic, provides a newer, although unpopular, response to Zeno’s paradoxes, but dialetheism was not created specifically in response to worries about Zeno’s paradoxes. With the introduction in the 20th century of thought experiments about supertasks, interesting philosophical research has been directed towards understanding what it means to complete a task.
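
The supertask puzzle can be made concrete with Thomson's lamp, the best-known thought experiment in this literature. The sketch below is only an illustration, assuming the standard presentation in which a lamp is toggled at times whose gaps halve (1 minute, then 1/2, then 1/4, and so on): infinitely many toggles occur before the 2-minute mark, yet the on/off sequence never settles, which is exactly what makes the question "what is the lamp's state at 2 minutes?" puzzling.

```python
from fractions import Fraction

# Thomson's lamp (illustrative assumption: toggles at partial sums of
# 1 + 1/2 + 1/4 + ... minutes). The lamp starts off; each toggle flips it.

def toggle_times(n):
    """Exact times of the first n toggles: partial sums of 1 + 1/2 + 1/4 + ..."""
    t = Fraction(0)
    times = []
    for k in range(n):
        t += Fraction(1, 2**k)
        times.append(t)
    return times

times = toggle_times(20)
# State after toggle k: on for even k (0-indexed), off for odd k.
states = [k % 2 == 0 for k in range(20)]

# Every toggle happens strictly before the 2-minute mark,
# so infinitely many acts fit into a finite interval.
assert all(t < 2 for t in times)

# The state sequence alternates forever and has no limit --
# the source of the puzzle about the state "at" 2 minutes.
assert all(states[i] != states[i + 1] for i in range(len(states) - 1))
```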

Zeno's paradoxes are often pointed to as a case study in how a philosophical problem can be solved, even though the solution took over two thousand years to materialize.

So, Zeno’s paradoxes have had a wide variety of impacts upon subsequent research. Little research today addresses directly how to solve the paradoxes themselves, especially in the fields of mathematics and science, although discussion continues in philosophy, primarily on whether a continuous magnitude should be composed of discrete magnitudes, such as whether a line should be composed of points. The existence of alternative treatments of Zeno's paradoxes raises the issue of whether there is a single solution, several solutions, or one best solution. The answer to whether the Standard Solution is the correct solution to Zeno’s paradoxes may also depend on whether the best physics of the future that reconciles the theories of quantum mechanics and general relativity will require us to assume spacetime is composed at its most basic level of points, or, instead, of regions or loops or something else.

From the perspective of the Standard Solution, the most significant lesson learned by researchers who have tried to solve Zeno’s paradoxes is that the way out requires revising many of our old theories and their concepts. We have to be willing to rank the virtues of preserving logical consistency and promoting scientific fruitfulness above the virtue of preserving our intuitions.
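
The core revision can be illustrated numerically. On the Standard Solution, Achilles' run is decomposed into infinitely many ever-shorter sub-runs whose durations form a convergent geometric series; contrary to the old intuition, infinitely many positive durations can have a finite sum. The halving durations below are an illustrative assumption, not part of Zeno's own text:

```python
# Standard Solution illustration: infinitely many ever-shorter sub-runs
# can have a finite total duration. With sub-run durations
# 1/2, 1/4, 1/8, ... the partial sums approach 1 without ever exceeding it.

def partial_sum(n):
    """Sum of the first n terms of the geometric series 1/2 + 1/4 + ... + 1/2^n."""
    return sum(1 / 2**k for k in range(1, n + 1))

# Each partial sum stays below the limit 1 ...
assert partial_sum(10) < 1
# ... while the gap to 1 shrinks toward zero (here it equals 2**-50).
assert 1 - partial_sum(50) < 1e-12
```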

7. References and Further Reading

  • Arntzenius, Frank. (2000) “Are there Really Instantaneous Velocities?”, The Monist 83, pp. 187-208.
    • Examines the possibility that a duration does not consist of points, that every part of time has a non-zero size, that real numbers cannot be used as coordinates of times, and that there are no instantaneous velocities at a point.
  • Barnes, J. (1982). The Presocratic Philosophers, Routledge & Kegan Paul: Boston.
    • A well respected survey of the philosophical contributions of the Pre-Socratics.
  • Barrow, John D. (2005). The Infinite Book: A Short Guide to the Boundless, Timeless and Endless, Pantheon Books, New York.
    • A popular book in science and mathematics introducing Zeno’s Paradoxes and other paradoxes regarding infinity.
  • Benacerraf, Paul (1962). “Tasks, Super-Tasks, and the Modern Eleatics,” The Journal of Philosophy, 59, pp. 765-784.
    • An original analysis of Thomson’s Lamp and supertasks.
  • Bergson, Henri (1946). Creative Mind, translated by M. L. Andison. Philosophical Library: New York.
    • Bergson demands the primacy of intuition in place of the objects of mathematical physics.
  • Black, Max (1950-1951). “Achilles and the Tortoise,” Analysis 11, pp. 91-101.
    • A challenge to the Standard Solution to Zeno’s paradoxes. Black argues that Achilles did not need to complete an infinite number of sub-tasks in order to catch the tortoise.
  • Cajori, Florian (1920). “The Purpose of Zeno’s Arguments on Motion,” Isis, vol. 3, no. 1, pp. 7-20.
    • An analysis of the debate regarding the point Zeno is making with his paradoxes of motion.
  • Cantor, Georg (1887). "Über die verschiedenen Ansichten in Bezug auf die actualunendlichen Zahlen." Bihang till Kongl. Svenska Vetenskaps-Akademien Handlingar, Bd. 11 (1886-7), article 19. P. A. Norstedt & Söner: Stockholm.
    • A very early description of set theory and its relationship to old ideas about infinity.
  • Chihara, Charles S. (1965). “On the Possibility of Completing an Infinite Process,” Philosophical Review 74, no. 1, pp. 74-87.
    • An analysis of what we mean by “task.”
  • Copleston, Frederick, S.J. (1962). “The Dialectic of Zeno,” chapter 7 of A History of Philosophy, Volume I, Greece and Rome, Part I, Image Books: Garden City.
    • Copleston says Zeno’s goal is to challenge the Pythagoreans who denied empty space and accepted pluralism.
  • Dainton, Barry (2010). Time and Space, Second Edition, McGill-Queen's University Press: Ithaca.
    • Chapters 16 and 17 discuss Zeno's Paradoxes.
  • Dauben, J. (1990). Georg Cantor, Princeton University Press: Princeton.
    • Contains Kronecker’s threat to write an article showing that Cantor’s set theory has “no real significance.” Ludwig Wittgenstein was another vocal opponent of set theory.
  • De Boer, Jesse (1953). “A Critique of Continuity, Infinity, and Allied Concepts in the Natural Philosophy of Bergson and Russell,” in Return to Reason: Essays in Realistic Philosophy, John Wild, ed., Henry Regnery Company: Chicago, pp. 92-124.
    • A philosophical defense of Aristotle’s treatment of Zeno’s paradoxes.
  • Diels, Hermann and W. Kranz (1951). Die Fragmente der Vorsokratiker, sixth ed., Weidmannsche Buchhandlung: Berlin.
    • A standard edition of the pre-Socratic texts.
  • Dummett, Michael (2000). “Is Time a Continuum of Instants?,” Philosophy, 2000, Cambridge University Press: Cambridge, pp. 497-515.
    • Promoting a constructive foundation for mathematics, Dummett’s formalism implies there are no durationless instants, so times must have rational values rather than real values. Times have only the values that they can in principle be measured to have; and all measurements produce rational numbers within a margin of error.
  • Earman J. and J. D. Norton (1996). “Infinite Pains: The Trouble with Supertasks,” in Paul Benacerraf: the Philosopher and His Critics, A. Morton and S. Stich (eds.), Blackwell: Cambridge, MA, pp. 231-261.
    • A criticism of Thomson’s interpretation of his infinity machines and the supertasks involved, plus an introduction to the literature on the topic.
  • Feferman, Solomon (1998). In the Light of Logic, Oxford University Press, New York.
    • A discussion of the foundations of mathematics and an argument for semi-constructivism in the tradition of Kronecker and Weyl, that the mathematics used in physical science needs only the lowest level of infinity, the infinity that characterizes the whole numbers. Presupposes considerable knowledge of mathematical logic.
  • Freeman, Kathleen (1948). Ancilla to the Pre-Socratic Philosophers, Harvard University Press: Cambridge, MA. Reprinted in paperback in 1983.
    • One of the best sources in English of primary material on the Pre-Socratics.
  • Grünbaum, Adolf (1967). Modern Science and Zeno’s Paradoxes, Wesleyan University Press: Middletown, Connecticut.
    • A detailed defense of the Standard Solution to the paradoxes.
  • Grünbaum, Adolf (1970). “Modern Science and Zeno’s Paradoxes of Motion,” in (Salmon, 1970), pp. 200-250.
    • An analysis of arguments by Thomson, Chihara, Benacerraf and others regarding the Thomson Lamp and other infinity machines.
  • Hamilton, Edith and Huntington Cairns (1961). The Collected Dialogues of Plato Including the Letters, Princeton University Press: Princeton.
  • Harrison, Craig (1996). “The Three Arrows of Zeno: Cantorian and Non-Cantorian Concepts of the Continuum and of Motion,” Synthese, Volume 107, Number 2, pp. 271-292.
    • Considers smooth infinitesimal analysis as an alternative to the classical Cantorian real analysis of the Standard Solution.
  • Heath, T. L. (1921). A History of Greek Mathematics, Vol. I, Clarendon Press: Oxford. Reprinted 1981.
    • Promotes the minority viewpoint that Zeno had a direct influence on Greek mathematics, for example by eliminating the use of infinitesimals.
  • Hintikka, Jaakko, David Gruender and Evandro Agazzi, eds. (1978). Theory Change, Ancient Axiomatics, and Galileo’s Methodology, D. Reidel Publishing Company: Dordrecht.
    • A collection of articles that discuss, among other issues, whether Zeno’s methods influenced the mathematicians of the time or whether the influence went in the other direction. See especially the articles by Karel Berka and Wilbur Knorr.
  • Kirk, G. S., J. E. Raven, and M. Schofield, eds. (1983). The Presocratic Philosophers: A Critical History with a Selection of Texts, Second Edition, Cambridge University Press: Cambridge.
    • A good source in English of primary material on the Pre-Socratics with detailed commentary on the controversies about how to interpret various passages.
  • Maddy, Penelope (1992). “Indispensability and Practice,” Journal of Philosophy 89, pp. 275-289.
    • Explores the implication of arguing that theories of mathematics are indispensable to good science, and that we are justified in believing in the mathematical entities used in those theories.
  • Matson, Wallace I. (2001). “Zeno Moves!” pp. 87-108 in Essays in Ancient Greek Philosophy VI: Before Plato, ed. by Anthony Preus, State University of New York Press: Albany.
    • Matson supports Tannery’s non-classical interpretation that Zeno’s purpose was to show only that the opponents of Parmenides are committed to denying motion, and that Zeno himself never denied motion, nor did Parmenides.
  • McCarty, D.C. (2005). “Intuitionism in Mathematics,” in The Oxford Handbook of Philosophy of Mathematics and Logic, edited by Stewart Shapiro, Oxford University Press, Oxford, pp. 356-86.
    • Argues that a declaration of death of the program of founding mathematics on an intuitionistic basis is premature.
  • McLaughlin, William I. (1994). “Resolving Zeno’s Paradoxes,” Scientific American, vol. 271, no. 5, Nov., pp. 84-90.
    • How Zeno’s paradoxes may be explained using a contemporary theory of Leibniz’s infinitesimals.
  • Owen, G.E.L. (1958). “Zeno and the Mathematicians,” Proceedings of the Aristotelian Society, New Series, vol. LVIII, pp. 199-222.
    • Argues that Zeno and Aristotle negatively influenced the development of the Renaissance concept of acceleration that was used so fruitfully in calculus.
  • Posy, Carl. (2005). “Intuitionism and Philosophy,” in The Oxford Handbook of Philosophy of Mathematics and Logic, edited by Stewart Shapiro, Oxford University Press, Oxford, pp. 318-54.
    • Contains a discussion of how the unsplittability of Brouwer’s intuitionistic continuum makes precise Aristotle’s notion that “you can’t cut a continuous medium without some of it clinging to the knife,” on pages 345-7.
  • Proclus (1987). Proclus’ Commentary on Plato’s Parmenides, translated by Glenn R. Morrow and John M. Dillon, Princeton University Press: Princeton.
    • A detailed list of every comment made by Proclus about Zeno is available with discussion starting on p. xxxix of the Introduction by John M. Dillon. Dillon focuses on Proclus’ comments which are not clearly derivable from Plato’s Parmenides, and concludes that Proclus had access to other sources for Zeno’s comments, most probably Zeno’s original book or some derivative of it. William Moerbeke’s overly literal translation in 1285 from Greek to Latin of Proclus’ earlier, but now lost, translation of Plato’s Parmenides is the key to figuring out the original Greek. (see p. xliv)
  • Rescher, Nicholas (2001). Paradoxes: Their Roots, Range, and Resolution, Carus Publishing Company: Chicago.
    • Pages 94-102 apply the Standard Solution to all of Zeno's paradoxes. Rescher calls the Paradox of Alike and Unlike the "Paradox of Differentiation."
  • Rovelli, Carlo (2017). Reality is Not What It Seems: The Journey to Quantum Gravity, Riverhead Books: New York.
    • Rovelli's chapter 6 explains how the theory of loop quantum gravity provides a new solution to Zeno's Paradoxes that is more in tune with the intuitions of Democritus because it rejects the assumption that a bit of space can always be subdivided.
  • Russell, Bertrand (1914). Our Knowledge of the External World as a Field for Scientific Method in Philosophy, Open Court Publishing Co.: Chicago.
    • Russell champions the use of contemporary real analysis and physics in resolving Zeno’s paradoxes.
  • Salmon, Wesley C., ed. (1970). Zeno’s Paradoxes, The Bobbs-Merrill Company, Inc.: Indianapolis and New York. Reprinted in paperback in 2001.
    • A collection of the most influential articles about Zeno’s Paradoxes from 1911 to 1965. Salmon provides an excellent annotated bibliography of further readings.
  • Szabo, Arpad (1978). The Beginnings of Greek Mathematics, D. Reidel Publishing Co.: Dordrecht.
    • Contains the argument that Parmenides discovered the method of indirect proof by using it against Anaximenes’ cosmogony, although it was better developed in prose by Zeno. Also argues that Greek mathematicians did not originate the idea but learned of it from Parmenides and Zeno. (pp. 244-250). These arguments are challenged in Hintikka (1978).
  • Tannery, Paul (1885). “Le Concept Scientifique du continu: Zenon d’Elee et Georg Cantor,” pp. 385-410 of Revue Philosophique de la France et de l’Etranger, vol. 20, Les Presses Universitaires de France: Paris.
    • This mathematician gives the first argument that Zeno’s purpose was not to deny motion but rather to show only that the opponents of Parmenides are committed to denying motion.
  • Tannery, Paul (1887). Pour l’Histoire de la Science Hellène: de Thalès à Empédocle, Alcan: Paris. 2nd ed. 1930.
    • More development of the challenge to the classical interpretation of what Zeno’s purposes were in creating his paradoxes.
  • Thomson, James (1954-1955). “Tasks and Super-Tasks,” Analysis, XV, pp. 1-13.
    • A criticism of supertasks. The Thomson Lamp thought-experiment is used to challenge Russell’s characterization of Achilles as being able to complete an infinite number of tasks in a finite time.
  • Tiles, Mary (1989). The Philosophy of Set Theory: An Introduction to Cantor’s Paradise, Basil Blackwell: Oxford.
    • A philosophically oriented introduction to the foundations of real analysis and its impact on Zeno’s paradoxes.
  • Vlastos, Gregory (1967). “Zeno of Elea,” in The Encyclopedia of Philosophy, Paul Edwards (ed.), The Macmillan Company and The Free Press: New York.
    • A clear, detailed presentation of the paradoxes. Vlastos comments that Aristotle considers no treatment of Zeno’s paradoxes other than replacing Zeno’s actual infinities with potential infinities, so we are entitled to assert that Aristotle probably believed denying actual infinities is the only route to a coherent treatment of infinity. Vlastos also comments that “there is nothing in our sources that states or implies that any development in Greek mathematics (as distinct from philosophical opinions about mathematics) was due to Zeno’s influence.”
  • White, M. J. (1992). The Continuous and the Discrete: Ancient Physical Theories from a Contemporary Perspective, Clarendon Press: Oxford.
    • A presentation of various attempts to defend finitism, neo-Aristotelian potential infinities, and the replacement of the infinite real number field with a finite field.
  • Wisdom, J. O. (1953). “Berkeley’s Criticism of the Infinitesimal,” The British Journal for the Philosophy of Science, Vol. 4, No. 13, pp. 22-25.
    • Wisdom clarifies the issue behind George Berkeley’s criticism (in 1734 in The Analyst) of the use of the infinitesimal (fluxion) by Newton and Leibniz. See also the references there to Wisdom’s other three articles on this topic in the journal Hermathena in 1939, 1941 and 1942.
  • Wolf, Robert S. (2005). A Tour Through Mathematical Logic, The Mathematical Association of America: Washington, DC.
    • Chapter 7 surveys nonstandard analysis, and Chapter 8 surveys constructive mathematics, including the contributions by Errett Bishop and Douglas Bridges.

Author Information

Bradley Dowden
California State University, Sacramento
U. S. A.