
Confirmation and Induction

The term "confirmation" is used in epistemology and the philosophy of science whenever observational data and evidence "speak in favor of" or support scientific theories and everyday hypotheses. Historically, confirmation has been closely related to the problem of induction, the question of what to believe regarding the future in the face of knowledge that is restricted to the past and present. One relation between confirmation and inductive logic is that the conclusion H of an inductively strong argument with premise E is confirmed by E. If inductive strength comes in degrees and the inductive strength of the argument with premise E and conclusion H is equal to r, then the degree of confirmation of H by E is likewise said to be equal to r.

This article begins by briefly reviewing Hume's formulation of the problem of the justification of induction. Then we jump to the middle of the twentieth century and Hempel's pioneering work on confirmation. After looking at Popper's falsificationism and the hypothetico-deductive method of hypothesis testing, the notion of probability, as it was defined by Kolmogorov, is introduced. Probability theory is the main mathematical tool for Carnap's inductive logic as well as for Bayesian confirmation theory. Carnap's inductive logic is based on a logical interpretation of probability, which will be discussed at some length. However, his heroic efforts to construct a logical probability measure in purely syntactical terms can be considered to have failed. Goodman's new riddle of induction will serve to illustrate the shortcomings of such a purely syntactical approach to confirmation. Carnap's work is nevertheless important because today's most popular theory of confirmation – Bayesian confirmation theory – is to a great extent the result of replacing Carnap's logical interpretation of probability with a subjective interpretation as degree of belief qua fair betting ratio. The rest of the article will be concerned mainly with Bayesian confirmation theory, although the final section will mention some alternative views on confirmation and induction.


Table of Contents

  1. Introduction: Confirmation and Induction
  2. Hempel and the Logic of Confirmation
    1. The Ravens Paradox
    2. The Logic of Confirmation
  3. Popper's Falsificationism and Hypothetico-Deductive Confirmation
    1. Popper's Falsificationism
    2. Hypothetico-Deductive Confirmation
  4. Inductive Logic
    1. Kolmogorov's Axiomatization
    2. Logical Probability and Degree of Confirmation
    3. Absolute and Incremental Confirmation
    4. Carnap's Analysis of Hempel's Conditions
  5. The New Riddle of Induction and the Demise of the Syntactic Approach
  6. Bayesian Confirmation Theory
    1. Subjective Probability and the Dutch Book Argument
    2. Confirmation Measures
    3. Some Success Stories
  7. Taking Stock
  8. References and Further Reading

1. Introduction: Confirmation and Induction

Whenever observational data and evidence speak in favor of, or support, scientific theories or everyday hypotheses, the latter are said to be confirmed by the former. The positive result of a pregnancy test speaks in favor of, or confirms, the hypothesis that the tested woman is pregnant. The dark clouds in the sky support, or confirm, the hypothesis that it will rain.

Confirmation takes a qualitative and a quantitative form. Qualitative confirmation is usually construed as a relation, among other things, between three sentences or propositions: evidence E confirms hypothesis H relative to background information B. Quantitative confirmation is, among other things, a relation between evidence E, hypothesis H, background information B, and a number r: E confirms H relative to B to degree r. (Comparative confirmation – H1 is more confirmed by E1 relative to B1 than H2 by E2 relative to B2 – is usually derived from a quantitative notion of confirmation, and is not discussed in this entry.)

Historically, confirmation has been closely related to the problem of induction, the question of what to believe regarding the future in the face of knowledge that is restricted to the past and present. David Hume gives the classic formulation of the problem of the justification of induction in A Treatise of Human Nature:

Let men be once fully persuaded of these two principles, that there is nothing in any object, consider'd in itself, which can afford us a reason for drawing a conclusion beyond it; and, that even after the observation of the frequent or constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience; (Hume 1739/2000, book 1, part 3, section 12)

The reason is that any such inference beyond those objects of which we have had experience needs to be justified – and, according to Hume, this is not possible.

In order to justify induction one has to provide a deductively valid or an inductively strong argument to the effect that our inductively strong arguments will continue to lead us to true conclusions (most of the time) in the future. (An argument consists of a set of premises P1, …, Pn and a conclusion C. Such an argument is deductively valid just in case the truth of the premises guarantees the truth of the conclusion. There is no standard definition of an inductively strong argument, but the idea is that the premises speak in favor of or support the conclusion.) But there is no deductively valid argument whose premises are restricted to the past and present and whose conclusion is about the future – and all our knowledge is about the past and present. On the other hand, an inductively strong argument presumably has to be inductively strong in the very sense of our inductive practices – and thus begs the question. For more see the introductory Skyrms (2000), the intermediate Hacking (2001), and the advanced Howson (2000a).

Neglecting the background information B, as we will mostly do in the following, we can state the link between induction and confirmation as follows. The conclusion H of an inductively strong argument with premise E is confirmed by E. If r quantifies the strength of the inductive argument in question, the degree of confirmation of H by E is equal to r. Let us then start the discussion of confirmation by the first serious attempts to define the notion, and to develop a corresponding logic of confirmation.

2. Hempel and the Logic of Confirmation

a. The Ravens Paradox

According to the Nicod criterion of confirmation (Hempel 1945), universal generalizations of the form "All Fs are Gs," in symbols ∀x(Fx → Gx), are confirmed by their "instances" "This particular object a is both F and G," Fa∧Ga. (It would be more appropriate to call Fa → Ga rather than Fa∧Ga an instance of ∀x(Fx → Gx).) The universal generalization "All ravens are black" is thus said to be confirmed by its instance "a is a black raven." As "a is a non-black non-raven" is an instance of "All non-black things are non-ravens," the Nicod criterion says that "a is a non-black non-raven" confirms "All non-black things are non-ravens." (It is sometimes said that a black raven confirms the ravens hypothesis "All ravens are black." In this case, confirmation is a relation between a non-linguistic entity – namely, a black raven – and a hypothesis. I decided to construe confirmation as a relation between, among other things, evidential propositions and hypotheses, and so we have to state the above in a clumsier way.)

One of Hempel's conditions of adequacy for any relation of confirmation is the equivalence condition. It says that logically equivalent hypotheses are confirmed by the same evidential propositions. "All ravens are black" is logically equivalent to "All non-black things are non-ravens." Therefore a non-black non-raven like a white shoe or a red herring can be used to confirm the ravens hypothesis "All ravens are black." Surely, this is absurd – and this is known as the ravens paradox.

Even worse, "All ravens are black," ∀x(Rx → Bx), is logically equivalent to "All things that are green or not green are not ravens or black," ∀x(Gx∨¬Gx → ¬Rx∨Bx). "a is green or not green, and a is not a raven or black" is an instance of this hypothesis. Furthermore, it is logically equivalent to "a is not a raven or a is black." As everything is green or not green, we get the similarly paradoxical result that an object which is not a raven or which is black – and anything but a non-black raven, the only kind of object that could be used to falsify the ravens hypothesis, is such an object – can be used to confirm the ravens hypothesis that all ravens are black.

Hempel (1945), who discussed these cases of the ravens, concluded that non-black non-ravens (as well as any other object that is not a raven or black) can indeed be used to confirm the ravens hypothesis. He attributed the paradoxical character of this alleged paradox to the psychological fact that we assume there to be far more non-black objects than ravens. However, the notion of confirmation he was explicating was supposed to presuppose no background knowledge whatsoever. An example by Good (1967) shows that such an unrelativized notion of confirmation is not useful (see Hempel 1967, Good 1968).

Others have been led to the rejection of the Nicod criterion. Howson (2000b, 113) considers the hypothesis "Everybody in the room leaves with somebody else's hat," which he attributes to Rosenkrantz (1981). If the background contains the information that there are only three individuals a, b, c in the room, then the evidence consisting of the two instances "a leaves with b's hat" and "b leaves with a's hat" falsifies rather than confirms the hypothesis. Besides pointing to the role played by the background information in this example, Hempel would presumably have stressed that the Nicod criterion has to be restricted to universal generalizations in one variable only. Already in his (1945, 13: fn. 1) he notes that R(a, b)∧¬R(b, a) falsifies ∀x∀y(¬[R(x, y)∧R(y, x)] → [R(x, y)∧¬R(y, x)]), which is equivalent to ∀x∀y R(x, y), although it satisfies both the antecedent and the consequent of the universal generalization (cf. also Carnap 1950/1962, 469f).

b. The Logic of Confirmation

After discussing the ravens, Hempel (1945) considers the following conditions of adequacy for any relation of confirmation:

  1. Entailment Condition: If an evidential proposition E logically implies some hypothesis H, then E confirms H.
  2. Special Consequence Condition: If an evidential proposition E confirms some hypothesis H, and if H logically implies some hypothesis H', then E also confirms H'.
  3. Special Consistency Condition: If an evidential proposition E confirms some hypothesis H, and if H is not compatible with some hypothesis H', then E does not confirm H'.
  4. Converse Consequence Condition: If an evidential proposition E confirms some hypothesis H, and if H is logically implied by some hypothesis H', then E also confirms H'.

(The equivalence condition mentioned above follows from 2 as well as from 4.) Hempel then shows that any relation of confirmation satisfying 1, 2, and 4 is trivial in the sense that every evidential proposition E confirms every hypothesis H. This is easily seen as follows. As E logically implies itself, E confirms E according to the entailment condition. The conjunction of E and H, E∧H, logically implies E, and so the converse consequence condition entails that E confirms E∧H. But E∧H logically implies H; thus E confirms H by the special consequence condition. In fact, it suffices that confirmation satisfies 1 and 4 in order to be trivial: E logically implies and, by 1, confirms the disjunction of E and H, E∨H. As H logically implies E∨H, E confirms H by 4.
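
The two entailments driving the second triviality argument can be checked mechanically. Here is a minimal sketch in Python – an illustration added for concreteness, not part of Hempel's discussion – that verifies by brute force over truth assignments that E logically implies E∨H and that H logically implies E∨H.

    from itertools import product

    # Check logical implication between truth-functions of the atoms E and H
    # by enumerating all four truth assignments (a brute-force truth table).
    def implies(premise, conclusion):
        return all(conclusion(e, h)
                   for e, h in product([True, False], repeat=2)
                   if premise(e, h))

    E = lambda e, h: e
    H = lambda e, h: h
    E_or_H = lambda e, h: e or h

    print(implies(E, E_or_H))   # True: E confirms E∨H by condition 1
    print(implies(H, E_or_H))   # True: hence E confirms H by condition 4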

Hempel (1945) rejects the converse consequence condition as the culprit rendering trivial any relation of confirmation satisfying 1-4. The latter condition has nevertheless gained popularity in the philosophy of science – partly because it seems to be at the core of the account of confirmation we will discuss next.

3. Popper's Falsificationism and Hypothetico-Deductive Confirmation

a. Popper's Falsificationism

Although Popper was an opponent of any kind of induction, his falsificationism gave rise to a qualitative account of confirmation. Popper started by observing that many scientific hypotheses have the form of universal generalizations, say "All metals conduct electricity." Now there can be no amount of observational data that would verify a universal generalization. After all, the next piece of metal could be such that it does not conduct electricity. In order to verify this hypothesis we would have to investigate all pieces of metal there are – and even if there were only finitely many such pieces, we would never know this (unless there were only finitely many space-time regions we would have to search). However, Popper's basic insight is that these universal generalizations can easily be falsified. We only need to find a piece of metal that does not conduct electricity in order to know that our hypothesis is false (supposing we can check this). Popper then generalized this. He suggested that all science should put forth bold hypotheses, which are then severely tested (where bold means having a high degree of falsifiability, in other words, having many observational consequences). As long as these hypotheses survive their tests, scientists should stick to them. However, once they are falsified, they should be put aside if there are competing hypotheses that remain unfalsified.

This is not the place to list the numerous problems of Popper's falsificationism. Suffice it to say that there are many scientific hypotheses that are neither verifiable nor falsifiable (for example, "Each planet has a moon"), and that falsifying instances are often taken to be indicators of errors that lie elsewhere, say errors of measurement or errors in auxiliary hypotheses. As Duhem and Quine noted, confirmation is holistic in the sense that it is always a whole battery of hypotheses that is put to test, and the arrow of error usually does not point to a single hypothesis (Duhem 1906/1974, Quine 1953).

According to Popper's falsificationism (see Popper 1935/1994) the hallmark of scientific (rather than meaningful, as in the early days of logical positivism) hypotheses is that they are falsifiable: scientific hypotheses must have consequences whose truth or falsity can in principle (and with a grain of salt) be ascertained by observation (with a grain of salt, because for Popper there is always an element of conventionalism in stipulating the basis of science). If there are no conditions under which a given hypothesis is false, this hypothesis is not scientific (though it may very well be meaningful).

b. Hypothetico-Deductive Confirmation

The hypothetico-deductive notion of confirmation says that an evidential proposition E confirms a hypothesis H relative to background information B if and only if the conjunction of H and B, H∧B, logically implies E in some suitable way (which depends on the particular version of hypothetico-deductivism under consideration). The intuition here is that scientific hypotheses are tested; and if a hypothesis H survives a severe test, then, intuitively, this is evidence in favor of H. Furthermore, scientific hypotheses are often used for predictions. If a hypothesis H correctly predicts some experimental outcome E by logically implying it, then, intuitively, this is again evidence for the truth of H. Both of these related aspects are covered by the above definition, if surviving a test is tantamount to entailing the correct outcome.

Note that hypothetico-deductive confirmation – henceforth HD-confirmation – satisfies Hempel's converse consequence condition. Suppose an evidential proposition E HD-confirms some hypothesis H. This means that H (together with B) logically implies E in some suitable way. Now any hypothesis H' which logically implies H also logically implies E. But this means – at least under most conditions fixing the "suitable way" of entailment – that E HD-confirms H'.

Hypothetico-deductivism has run into serious difficulties. To mention just two, there is the problem of irrelevant conjunctions and the problem of irrelevant disjunctions. Suppose an evidential proposition E HD-confirms some hypothesis H. Then, by the converse consequence condition, E also HD-confirms H∧H', for any hypothesis H' whatsoever. Assuming that the anomalous perihelion of Mercury confirms the general theory of relativity GTR (Earman 1992), it also confirms the conjunction of GTR and, say, that there is life on Mars – which seems to be wrong. Similarly, if E HD-confirms H, then E∨E' HD-confirms H, for any evidential proposition E' whatsoever. For instance, the disjunctive proposition of the anomalous perihelion of Mercury or Luca's living on the second floor HD-confirms GTR (Grimes 1990, Moretti 2004).

Another worry with HD-confirmation is that it is not clear how it should be applied to statistical hypotheses that do not strictly entail anything (see, however, Albert 1992). The treatment of statistical hypotheses is no problem for probabilistic theories of confirmation, which we will turn to now.

4. Inductive Logic

For overview articles see Fitelson (2005) and Hawthorne (2005).

a. Kolmogorov's Axiomatization

Before we turn to inductive logic, let us define the notion of probability as it was axiomatized by Kolmogorov (1933; 1956).

Let W be a non-empty set (of outcomes or possibilities), and let A be a field over W, that is, a set of subsets of W that contains the whole set W and is closed under complementation (with respect to W) and finite unions. That is, A is a field over W if and only if A is a set of subsets of W such that

(i) W ∈ A

(ii) if A ∈ A, then (W\A) = -A ∈ A

(iii) if A ∈ A and B ∈ A, then (A∪B) ∈ A

where "W\A" is the complement of A with respect to W. If (iii) is strengthened to

(iv) if A1 ∈ A, …, An ∈ A, …, then (A1∪…∪An∪…) ∈ A,

so that A is closed under countable (and not only finite) unions, A is called a σ-field over W.

A function Pr: A → ℜ from the field A over W into the real numbers ℜ is a (finitely additive) probability measure on A if and only if it is a non-negative, normalized, and (finitely) additive measure; that is, if and only if for all A, B ∈ A

(K1) Pr(A) ≥ 0

(K2) Pr(W) = 1

(K3) if A∩B = ∅, then Pr(A∪B) = Pr(A) + Pr(B)

The triple <W, A, Pr> with W a non-empty set, A a field over W, and Pr a probability measure on A is called a (finitely additive) probability space. If A is a σ-field over W and Pr: A → ℜ additionally satisfies

(K4) if A1 ⊇ A2 ⊇ … ⊇ An ⊇ … is a decreasing sequence of elements of A, i.e. A1 ∈ A, …, An ∈ A, …, such that A1∩A2∩…∩An∩… = ∅, then limn→∞ Pr(An) = 0,

Pr is a σ-additive probability measure on A and <W, A, Pr> is a σ-additive probability space (Kolmogorov 1933; 1956, ch. 2). (K4) asserts that

limn→∞ Pr(An) = Pr(A1∩A2∩…∩An∩…) = Pr(∅) = 0

for a decreasing sequence of elements of A. Given (K1-3), (K4) is equivalent to

(K5) if A1 ∈ A, …, An ∈ A, …, and if Ai∩Aj = ∅ for all natural numbers i, j with i ≠ j, then Pr(A1∪…∪An∪…) = Pr(A1) + … + Pr(An) + …

A probability measure Pr: A → ℜ on A is regular just in case Pr(A) > 0 for every non-empty A ∈ A. Let <W, A, Pr> be a probability space, and define A* to be the set of all A ∈ A that have positive probability according to Pr, that is, A* = {A ∈ A: Pr(A) > 0}. The conditional probability measure Pr(•|-): A × A* → ℜ on A (based on the unconditional probability measure Pr) is defined for all A ∈ A and B ∈ A* by the fraction

(K6) Pr(A|B) = Pr(A∩B)/Pr(B)

(Kolmogorov 1933; 1956, ch. 1, §4). The domain of the second argument place of Pr(•|-) has to be restricted to A*, since the fraction Pr(A∩B)/Pr(B) is not defined when Pr(B) = 0. Note that Pr(•|B): A → ℜ is a probability measure on A, for every B ∈ A*.

Here are some immediate consequences of the Kolmogorov axioms and the definition of conditional probability. For every probability space <W, A, Pr> and all A, B ∈ A,

  • Law of Negation: Pr(-A) = 1 – Pr(A)
  • Law of Conjunction: Pr(A∩B) = Pr(B)•Pr(A|B) whenever Pr(B) > 0
  • Law of Disjunction: Pr(A∪B) = Pr(A) + Pr(B) – Pr(A∩B)
  • Law of Total Probability: Pr(B) = ΣiPr(B|Ai)•Pr(Ai),

where the Ai form a countable partition of W, i.e. A1, …, An, … is a sequence of mutually exclusive (Ai∩Aj = ∅ for all i, j with i ≠ j) and jointly exhaustive (A1∪…∪An∪… = W) elements of A. A special case of the Law of Total Probability is

Pr(B) = Pr(B|A)•Pr(A) + Pr(B|-A)•Pr(-A).

Finally the definition of conditional probability is easily turned into

Bayes's Theorem: Pr(A|B) = Pr(B|A)•Pr(A)/Pr(B)

= Pr(B|A)•Pr(A)/[Pr(B|A)•Pr(A) + Pr(B|-A)•Pr(-A)]

= Pr(B|A)•Pr(A)/ΣiPr(B|Ai)•Pr(Ai),

where the Ai form a countable partition of W. The important role played by Bayes's Theorem (in combination with some principle linking objective chances and subjective probabilities) for confirmation will be discussed below. For more on Bayes's Theorem see Joyce (2003).
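
To make these definitions concrete, here is a minimal sketch in Python of a finite probability space – the outcome set, the weights, and the events A and B are assumed toy values, not anything from the literature – that checks Bayes's Theorem and the Law of Total Probability numerically.

    # A finite probability space: W with assumed outcome weights summing to 1.
    W = frozenset({1, 2, 3, 4})
    weights = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

    def Pr(A):
        """Probability of an event A, a subset of W."""
        return sum(weights[w] for w in A)

    def cond(A, B):
        """Conditional probability Pr(A|B) = Pr(A∩B)/Pr(B), for Pr(B) > 0."""
        return Pr(A & B) / Pr(B)

    A, B = {1, 2}, {2, 3}

    # Bayes's Theorem: Pr(A|B) = Pr(B|A)•Pr(A)/Pr(B)
    print(abs(cond(A, B) - cond(B, A) * Pr(A) / Pr(B)) < 1e-12)   # True

    # Law of Total Probability with the partition {A, W\A}:
    total = cond(B, A) * Pr(A) + cond(B, W - A) * Pr(W - A)
    print(abs(Pr(B) - total) < 1e-12)                             # True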

The names of the first three laws above already indicate that probability measures can also be defined on formal languages. Instead of defining probability on a field A over some non-empty set W, we can take its domain to be a formal language L, that is, a set of (possibly open) well-formed formulas that contains the tautological sentence τ (corresponding to the whole set W) and is closed under negation ¬ (corresponding to complementation) and disjunction ∨ (corresponding to finite union). That is, L is a language if and only if L is a set of well-formed formulas such that

(i) τ ∈ L

(ii) if α ∈ L, then ¬α ∈ L

(iii) if α ∈ L and β ∈ L, then (α∨β) ∈ L

If L additionally satisfies

(iv) if α ∈ L, then ∃xα ∈ L,

L is called a quantificational language.

A function Pr: L → ℜ from the language L into the reals ℜ is a probability on L if and only if for all α, β ∈ L,

(L0) Pr(α) = Pr(β) if α is logically equivalent (in the sense of classical logic CL) to β

(L1) Pr(α) ≥ 0,

(L2) Pr(τ) = 1,

(L3) Pr(α∨β) = Pr(α) + Pr(β), if α∧β is logically inconsistent (in the sense of CL).

(L0) is not necessary, if (L2) is strengthened to: (L2+) Pr(α) = 1, if α is logically valid. If L is a quantificational language with an individual constant "ai" for each individual ai in the envisioned countable domain, i = 1, 2, …, n, …, and Pr: L → ℜ additionally satisfies

(L4) limn→∞Pr(α[a1/x]∧…∧α[an/x]) = Pr(∀xα[x]),

Pr is called a Gaifman-Snir probability. Here "α[ai/x]" results from "α[x]" by substituting the individual constant "ai" for all occurrences of the individual variable "x" in "α[x]." "x" in "α[x]" indicates that "x" occurs free in "α," that is to say, "x" is not bound in "α" by a quantifier like it is in "∃xα."

Given (L0-3) and the restriction to countable domains, (L4) is equivalent to

(L5) limn→∞Pr(α[a1/x]∨…∨α[an/x]) = sup{Pr(α[a1/x]∨…∨α[an/x]): n ∈ N} = Pr(∃xα[x]),

where the equation on the right-hand side is the slightly more general definition adopted by Gaifman & Snir (1982, 501). A probability Pr: L → ℜ on L is regular just in case Pr(α) > 0 for every consistent α ∈ L. For L* = {α ∈ L: Pr(α) > 0} the conditional probability Pr(•|-): L × L* → ℜ on L (based on Pr) is defined for all α ∈ L and all β ∈ L* by the fraction

(L6) Pr(α|β) = Pr(α∧β)/Pr(β).

As before, Pr(•|β): L → ℜ is a probability on L, for every β ∈ L*.

Each probability Pr on a language L induces a probability space <W, A, Pr*> with W the set Mod of all models for L, A the smallest σ-field containing the field {Mod(α) ⊆ Mod: α ∈ L}, and Pr* the unique σ-additive probability measure on A such that Pr*(Mod(α)) = Pr(α) for all α ∈ L. (A model for a language L with an individual constant for each individual in the envisioned domain can be represented by a function w: L → {0,1} from L into the set {0,1} such that for all α, β ∈ L: w(¬α) = 1 – w(α), w(α∨β) = max{w(α), w(β)}, and w(∃xα) = max{w(α[a/x]): "a" is an individual constant of L}.)

In conclusion, it is to be noted that some authors take conditional probability Pr(• given -) as primitive and define probability as Pr(• given W) or Pr(• given τ) (see Hájek 2003b). For more on probability and its interpretations see Hájek (2003a), Hájek & Hall (2000), Fitelson & Hájek & Hall (2005).

b. Logical Probability and Degree of Confirmation

There has always been a close connection between probability and induction. Probability was thought to provide the basis for an inductive logic. Early proponents of a logical conception of probability include Keynes (1921/1973) and Jeffreys (1939/1967). However, by far the biggest effort to construct an inductive logic was undertaken by Carnap in his Logical Foundations of Probability (1950/1962). Carnap starts from a simple formal language with countably many individual constants (such as "Carl Gustav Hempel") denoting individuals (namely, Carl Gustav Hempel) and finitely many monadic predicates (such as "is a great philosopher of science") denoting properties (namely, being a great philosopher of science), but not relations (such as being a better philosopher of science than). Then he defines a state description to be a complete description of each individual with respect to all the predicates. For instance, if the language contains three individual constants "a," "b," and "c" (denoting the individuals a, b, and c, respectively), and four monadic predicates "P," "Q," "R," and "S" (denoting the properties P, Q, R, and S, respectively), then there are 2^(3•4) = 4096 state descriptions of the form:

±Pa ∧ ±Qa ∧ ±Ra ∧ ±Sa ∧ ±Pb ∧ ±Qb ∧ ±Rb ∧ ±Sb ∧ ±Pc ∧ ±Qc ∧ ±Rc ∧ ±Sc,

where "±" indicates that the predicate in question is either unnegated as in "Pa" or negated as in "¬Pa." That is, a state description determines for each individual constant "a" and each predicate "P" whether or not Pa. Based on the notion of a state description, Carnap then introduces the notion of a structure description, a maximal disjunction of state descriptions which can be obtained from each other by uniformly substituting individual constants for each other. In the above example there are, among others, the following structure descriptions:

(Pa ∧ Qa ∧ Ra ∧ Sa) ∧ (Pb ∧ Qb ∧ Rb ∧ Sb) ∧ (Pc ∧ Qc ∧ Rc ∧ Sc)

((Pa ∧ Qa ∧ Ra ∧ Sa) ∧ (Pb ∧ Qb ∧ Rb ∧ ¬Sb) ∧ (Pc ∧ Qc ∧ ¬Rc ∧ Sc)) ∨ ((Pb ∧ Qb ∧ Rb ∧ Sb) ∧ (Pa ∧ Qa ∧ Ra ∧ ¬Sa) ∧ (Pc ∧ Qc ∧ ¬Rc ∧ Sc)) ∨ ((Pc ∧ Qc ∧ Rc ∧ Sc) ∧ (Pb ∧ Qb ∧ Rb ∧ ¬Sb) ∧ (Pa ∧ Qa ∧ ¬Ra ∧ Sa)) ∨ ((Pa ∧ Qa ∧ Ra ∧ Sa) ∧ (Pc ∧ Qc ∧ Rc ∧ ¬Sc) ∧ (Pb ∧ Qb ∧ ¬Rb ∧ Sb)) ∨ ((Pb ∧ Qb ∧ Rb ∧ Sb) ∧ (Pc ∧ Qc ∧ Rc ∧ ¬Sc) ∧ (Pa ∧ Qa ∧ ¬Ra ∧ Sa)) ∨ ((Pc ∧ Qc ∧ Rc ∧ Sc) ∧ (Pa ∧ Qa ∧ Ra ∧ ¬Sa) ∧ (Pb ∧ Qb ∧ ¬Rb ∧ Sb))

So a structure description is a disjunction of one or more state descriptions. It says how many individuals satisfy the maximally consistent predicates (Carnap calls them Q-predicates) that can be formulated in the language. It may but need not say which individuals. The first structure description above says that all three individuals a, b, and c have the maximally consistent property Px ∧ Qx ∧ Rx ∧ Sx. The second structure description says that exactly one individual has the maximally consistent property Px ∧ Qx ∧ Rx ∧ Sx, exactly one individual has the maximally consistent property Px ∧ Qx ∧ Rx ∧ ¬Sx, and exactly one individual has the maximally consistent property Px ∧ Qx ∧ ¬Rx ∧ Sx. It does not say which of a, b, and c has the property in question.

Each function that assigns non-negative weights wi to the state descriptions zi whose sum Σiwi equals 1 induces a probability on the language in question. Carnap then argues – by postulating various principles of symmetry and invariance – that each of the finitely many structure (not state) descriptions sj should be assigned the same weight vj such that their sum Σjvj is equal to 1. This weight vj should then be divided equally among the state descriptions whose disjunction constitutes the structure description sj. The probability so obtained is Carnap's favorite m*, which, like any other probability, induces what Carnap calls a confirmation function (and we have called a conditional probability): c*(H, E) = m*(H∧E)/m*(E).

(In case the language contains countably infinitely many individual constants, some structure descriptions are disjunctions of infinitely many state descriptions. These state descriptions cannot all get the same positive weight. Therefore Carnap considers the limit of the measures m*n for the languages Ln containing the first n individual constants in some enumeration of the individual constants, provided this limit exists.)

c* allows learning from experience in the sense that

c*(the n + 1st individual is P, k of the first n individuals are P) > c*(the n + 1st individual is P, τ)

= m*(the n + 1st individual is P),

where τ is the tautological sentence. If we assigned equal weights to the state descriptions instead of the structure descriptions, no such learning would be possible. Let us check that c* allows learning from experience for n = 2 in a language with three individual constants "a," "b," and "c" and one predicate "P." There are eight state descriptions and four structure descriptions:

z1 = Pa ∧ Pb ∧ Pc
z2 = Pa ∧ Pb ∧ ¬Pc
z3 = Pa ∧ ¬Pb ∧ Pc
z4 = Pa ∧ ¬Pb ∧ ¬Pc
z5 = ¬Pa ∧ Pb ∧ Pc
z6 = ¬Pa ∧ Pb ∧ ¬Pc
z7 = ¬Pa ∧ ¬Pb ∧ Pc
z8 = ¬Pa ∧ ¬Pb ∧ ¬Pc

s1 = Pa ∧ Pb ∧ Pc: All three individuals are P.
s2 = (Pa ∧ Pb ∧ ¬Pc) ∨ (Pa ∧ ¬Pb ∧ Pc) ∨ (¬Pa ∧ Pb ∧ Pc): Exactly two individuals are P.
s3 = (Pa ∧ ¬Pb ∧ ¬Pc) ∨ (¬Pa ∧ Pb ∧ ¬Pc) ∨ (¬Pa ∧ ¬Pb ∧ Pc): Exactly one individual is P.
s4 = ¬Pa ∧ ¬Pb ∧ ¬Pc: None of the three individuals is P.

Each structure description s1-s4 gets weight vj = 1/4 (j = 1, …, 4).

s1 = z1: v1 = m*(Pa ∧ Pb ∧ Pc) = 1/4

s2 = z2∨z3∨z5: v2 = m*((Pa ∧ Pb ∧ ¬Pc)∨(Pa ∧ ¬Pb ∧ Pc)∨(¬Pa ∧ Pb ∧ Pc)) = 1/4

s3 = z4∨z6∨z7: v3 = m*((Pa ∧ ¬Pb ∧ ¬Pc)∨(¬Pa ∧ Pb ∧ ¬Pc)∨(¬Pa ∧ ¬Pb ∧ Pc)) = 1/4

s4 = z8: v4 = m*(¬Pa ∧ ¬Pb ∧ ¬Pc) = 1/4

These weights are equally divided among the state descriptions z1-z8.

z1: w1 = m*(Pa ∧ Pb ∧ Pc) = 1/4

z2: w2 = m*(Pa ∧ Pb ∧ ¬Pc) = 1/12

z3: w3 = m*(Pa ∧ ¬Pb ∧ Pc) = 1/12

z4: w4 = m*(Pa ∧ ¬Pb ∧ ¬Pc) = 1/12

z5: w5 = m*(¬Pa ∧ Pb ∧ Pc) = 1/12

z6: w6 = m*(¬Pa ∧ Pb ∧ ¬Pc) = 1/12

z7: w7 = m*(¬Pa ∧ ¬Pb ∧ Pc) = 1/12

z8: w8 = m*(¬Pa ∧ ¬Pb ∧ ¬Pc) = 1/4

Let us now compute the values of the confirmation function c*.

c*(the 3rd individual is P, 2 of the first 2 individuals are P) =

= m*(the 3rd individual is P ∧ the first 2 individuals are P)/m*(the first 2 individuals are P)

= m*(the first 3 individuals are P)/m*(the first 2 individuals are P)

= m*(Pa ∧ Pb ∧ Pc)/m*(Pa ∧ Pb)

= (1/4)/(1/4 + 1/12)

= 3/4

> 1/2 = m*(Pc) = c*(the 3rd individual is P, τ)

The general formula is (Carnap 1950/1962, 568)

c*(the n + 1st individual is P, k of the first n individuals are P)

= (k + ϖ)/(n + κ)

= (k + (ϖ/κ)•κ)/(n + κ),

where ϖ is the "logical width" of the predicate "P" (Carnap 1950/1962, 127), that is, the number of maximally consistent properties or Q-predicates whose disjunction is logically equivalent to "P" (ϖ = 1 in our example: "P"). κ = 2^π is the total number of Q-predicates (κ = 2^1 = 2 in our example: "P" and "¬P") with π being the number of primitive predicates (π = 1 in our example: "P"). This formula depends on the logical factor ϖ/κ of the "relative width" of the predicate "P," and the empirical factor k/n of the relative frequency of Ps.
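
The computation above can be replicated mechanically. The following sketch in Python is a toy illustration (the encoding of state descriptions as triples of truth values is an assumption of the sketch, not Carnap's notation): it constructs m* for the language with three individual constants and one predicate, and recovers c*(Pc, Pa ∧ Pb) = 3/4 and m*(Pc) = 1/2.

    from fractions import Fraction
    from itertools import product

    # State descriptions for "P" over the individuals a, b, c, encoded as
    # triples of truth values (True = P, False = ¬P).
    states = list(product([True, False], repeat=3))

    # State descriptions belong to the same structure description iff they
    # agree on *how many* individuals are P (permutations of one another).
    structures = {}
    for z in states:
        structures.setdefault(sum(z), []).append(z)

    # m*: equal weight per structure description, divided equally within it.
    v = Fraction(1, len(structures))
    m_star = {z: v / len(zs) for zs in structures.values() for z in zs}

    def m(prop):
        """m* of a proposition, given as a set of state descriptions."""
        return sum(m_star[z] for z in prop)

    Pa_Pb    = {z for z in states if z[0] and z[1]}   # a and b are P
    Pa_Pb_Pc = {z for z in states if all(z)}          # a, b, and c are P
    Pc       = {z for z in states if z[2]}            # c is P

    print(m(Pa_Pb_Pc) / m(Pa_Pb))   # c*(Pc, Pa ∧ Pb) = 3/4
    print(m(Pc))                    # m*(Pc) = c*(Pc, τ) = 1/2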

Later on, Carnap (1952) generalizes this to a whole continuum of confirmation functions Cλ where the parameter λ is inversely proportional to the impact of evidence. λ specifies how the confirmation function Cλ weighs the logical factor ϖ/κ against the empirical factor k/n. For λ = ∞, Cλ is independent of the empirical factor k/n: Cλ(the n + 1st individual is P, k of the first n individuals are P) = ϖ/κ (Carnap 1952, §13). For λ = 0, Cλ is independent of the logical factor ϖ/κ: Cλ(the n + 1st individual is P, k of the first n individuals are P) = k/n and thus coincides with what is known as the straight rule (Carnap 1952, §14). c* is the special case with λ = κ (Carnap 1952, §15). The general formula is (Carnap 1952, §9)

Cλ(the n + 1st individual is P, k of the first n individuals are P) = (k + λ/κ)/(n + λ).

In his (1963) Carnap slightly modifies the setup and considers families of monadic predicates {"P1," …, "Pp"} like the family of color predicates {"red," "green," …, "blue"}. For a given family {"P1," …, "Pp"} and each individual constant "a" there is exactly one predicate "Pj" such that Pja. Families thus generalize {"P," "¬P"} and correspond to random variables. Given his axioms (including A15) Carnap (1963, 976) shows that for each family {"P1," …, "Pp"}, p ≥ 2,

Cλ(the n + 1st individual is Pj, k of the first n individuals are Pj) = (k + λ/p)/(n + λ).
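
In code, the λ-continuum is a one-liner. The sketch below is illustrative only; the parameter names are ad hoc, and the numerical inputs simply replay the example from the previous paragraphs.

    def c_lambda(k, n, lam, kappa=2):
        """Carnap's λ-continuum: (k + λ/κ)/(n + λ), with κ Q-predicates."""
        return (k + lam / kappa) / (n + lam)

    print(c_lambda(2, 2, lam=2))      # λ = κ gives c*: 0.75, as computed above
    print(c_lambda(2, 2, lam=1e9))    # λ → ∞: approaches the logical factor 1/2
    print(c_lambda(2, 2, lam=1e-9))   # λ → 0: approaches the straight rule k/n = 1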

One of the peculiar features of Carnap's systems is that universal generalizations get degrees of confirmation (alias conditional probability) 0. Hintikka (1966) further elaborates Carnap's project in this respect. For a neo-Carnapian approach see Maher (2004a).

Of more interest to us is Carnap's discussion of "the controversial problem of the justification of induction" (1963, 978, emphasis in the original). For Carnap, the justification of induction boils down to justifying the axioms specifying a set of confirmation functions. The "reasons are based upon our intuitive judgments concerning inductive validity". Therefore "[i]t is impossible to give a purely deductive justification of induction," and these "reasons are a priori" (Carnap 1963, 978). So according to Carnap, induction is justified by appeals to intuition about inductive validity. We will see below that Goodman, who is otherwise very skeptical about the prospects of Carnap's project, shares this view of the justification of induction. In fact, the view also seems to be widely accepted among current Bayesian confirmation theorists, as witnessed by their desideratum/explicatum approach (see Fitelson 2001 for an example). [According to Carnap (1962), an explication is "the transformation of an inexact, prescientific concept, the explicandum, into a new exact concept, the explicatum." (Carnap 1962, 3) The desideratum/explicatum approach consists in stating various "intuitively plausible desiderata" the explicatum is supposed to satisfy. Proposals for explicata that do not satisfy these desiderata are rejected. This appeal to intuitions is fine as long as we are doing conceptual analysis. However, contemporary confirmation theorists also sell their accounts as normative theories. Normative theories are not justified by appeal to intuitions, though. They are justified relative to a goal by showing that the norms in question further the goal at issue. See section 7.]

First, however, we will have a look at what Carnap has to say about Hempel's conditions of adequacy.

c. Absolute and Incremental Confirmation

As we saw in the preceding section, one of Carnap's goals was to define a quantitative notion of confirmation, explicated by a confirmation function in the manner indicated above. It is important to note that this quantitative concept of confirmation is a relation between two propositions H and E (three, if we include the background information B), a number r, and a confirmation function c. In chapters VI and VII of his (1950/1962) Carnap discusses comparative and qualitative concepts of confirmation. The explicans for qualitative confirmation he offers is that of positive probabilistic relevance in the sense of some logical probability m. That is, E qualitatively confirms H in the sense of some logical measure m just in case E is positively relevant to H in the sense of m, that is,

m(H∧E) > m(H)•m(E).

If both m(H) and m(E) are positive – which is the case whenever both H and E are not logically false, because Carnap assumes m to be regular – this is equivalently expressed by the following inequality (divide both sides by m(E) and recall that c(H, E) = m(H∧E)/m(E)):

c(H, E) > c(H, τ) = m(H)

So provided both H and E have positive probability, E confirms H if and only if E raises the conditional probability (degree of confirmation in the sense of c) of H. Let us call this concept incremental confirmation. Again, note that qualitative confirmation is a relation between two propositions H and E, and a conditional probability or confirmation function c. Incremental confirmation, or positive probabilistic relevance, is a qualitative notion, which says whether E raises the conditional probability (degree of confirmation in the sense of c) of H. Its natural quantitative counterpart measures how much E raises the conditional probability of H. This measure may take several forms which will be discussed below.

Incremental confirmation is different from the concept of absolute confirmation on which it is based. The quantitative explication of absolute confirmation is given by one of Carnap's confirmation functions c. The qualitative counterpart is to say that E absolutely confirms H in the sense of c if and only if the degree of absolute confirmation of H by E is sufficiently high, c(H, E) > r. So Carnap, who offers degree of absolute confirmation c(H, E) as explication for the quantitative notion of confirmation of H by E, and who offers incremental confirmation or positive probabilistic relevance between E and H as explication of the qualitative notion of confirmation, is, to say the least, not fully consistent in his terminology. He switches between absolute confirmation (for the quantitative notion) and incremental confirmation (for the qualitative notion). This is particularly peculiar, because Carnap (1950/1962, §87) is the locus classicus for the discussion of Hempel's conditions of adequacy mentioned in section 2b.

d. Carnap's Analysis of Hempel's Conditions

In analyzing the special consequence condition, Carnap argues that

Hempel has in mind as explicandum the following relation: "the degree of confirmation of H by E is greater than r, where r is a fixed value, perhaps 0 or 1/2" (Carnap 1962, 475; notation adapted);

that is, the qualitative concept of absolute confirmation. Similarly when discussing the special consistency condition:

Hempel regards it as a great advantage of any explicatum satisfying [a more general form of the special consistency condition 3] "that it sets a limit, so to speak, to the strength of the hypotheses which can be confirmed by given evidence" … This argument does not seem to have any plausibility for our explicandum, (Carnap 1962, 477; emphasis in original)

which is the qualitative concept of incremental confirmation,

[b]ut it is plausible for the second explicandum mentioned earlier: the degree of [absolute] confirmation exceeding a fixed value r. Therefore we may perhaps assume that Hempel's acceptance of [a more general form of 3] is due again to an inadvertent shift to the second explicandum. (Carnap 1962, 477-478)

Carnap's analysis can be summarized as follows. In presenting his first three conditions of adequacy, Hempel was mixing up two distinct concepts of confirmation, two distinct explicanda in Carnap's terminology, namely,

(i) the qualitative concept of incremental confirmation (positive probabilistic relevance) according to which E confirms H if and only if E (has non-zero probability and) increases the degree of absolute confirmation (conditional probability) of H, and

(ii) the qualitative concept of absolute confirmation according to which E confirms H if and only if the degree of absolute confirmation (conditional probability) of H by E is greater than some value r.

Hempel's second and third condition, 2 and 3, respectively, hold true for the second explicandum (for r ≥ 1/2), but they do not hold true for the first explicandum. On the other hand, Hempel's first condition holds true for the first explicandum, but it does so only in a qualified form (Carnap 1950/1962, 473) – namely only if E is not assigned probability 0, and H is not already assigned probability 1.

This, however, means that, according to Carnap's analysis, Hempel first had in mind the explicandum of incremental confirmation for the entailment condition. Then he had in mind the explicandum of absolute confirmation for the special consequence and the special consistency conditions 2 and 3, respectively. And then, when Hempel presented the converse consequence condition, he got completely confused and had in mind still another explicandum or concept of confirmation (neither the first nor the second explicandum satisfies the converse consequence condition). This is not a very charitable analysis. It is not a good one either, because the qualitative concept of absolute confirmation, which Hempel is said to have had in mind for 2 and 3, also satisfies 1 – and it does so without the second qualification that H be assigned a probability smaller than 1. So there is no need to accuse Hempel of mixing up two concepts of confirmation. Indeed, the analysis is bad, because Carnap's reading of Hempel also leaves open the question of what the third explicandum for the converse consequence condition might have been. For a different analysis of Hempel's conditions and a corresponding logic of confirmation see Huber (2007a).

5. The New Riddle of Induction and the Demise of the Syntactic Approach

According to Goodman (1983, ch. III), the problem of justifying induction boils down to defining valid inductive rules, and thus to a definition of confirmation. The reason is that an inductive inference is justified by conformity to an inductive rule, and inductive rules are justified by their conformity to accepted inductive practices. One does not have to follow Goodman in this respect, however, in order to appreciate his insight that whether a hypothesis is confirmed by a piece of evidence depends on features other than their syntactical form.

In his (1946) he asks us to suppose that a marble has been drawn from a certain bowl on each of the ninety-nine days up to and including VE day, and that each marble drawn was red. Our evidence can be described by the conjunction "Marble 1 is red and … and marble 99 is red," in symbols: Ra1∧…∧Ra99. Whatever the details of our theory of confirmation, this evidence will confirm the hypothesis "Marble 100 is red," Ra100. Now consider the predicate S = "is drawn by VE day and is red, or is drawn after VE day and is not red." In terms of S rather than R our evidence is described by the conjunction "Marble 1 is drawn by VE day and is red or it is drawn after VE day and is not red, and …, and marble 99 is drawn by VE day and is red or it is drawn after VE day and is not red," Sa1∧…∧Sa99. If our theory of confirmation relies solely on syntactical features of the evidence and the hypothesis, our evidence will confirm the conclusion "Marble 100 is drawn by VE day and is red, or it is drawn after VE day and is not red," Sa100. But we know that the next marble will be drawn after VE day. Given this, Sa100 is logically equivalent to the negation of Ra100. So one and the same piece of evidence can be used to confirm a hypothesis and its negation, which is certainly absurd.

One might object to this example that the two formulations do not describe one and the same piece of evidence after all. The first formulation in terms of R should be the conjunction "Marble 1 is drawn by VE day and is red, and …, and marble 99 is drawn by VE day and is red," (Da1∧Ra1)∧…∧(Da99∧Ra99). The second formulation in terms of S should be "Marble 1 is drawn by VE day and it is drawn by VE day and red or drawn after VE day and not red, and …, and marble 99 is drawn by VE day and it is drawn by VE day and red or drawn after VE day and not red," (Da1∧Sa1)∧…∧(Da99∧Sa99). Now the two formulations really describe one and the same piece of evidence in the sense of being logically equivalent. But then the problem is whether any interesting statement can ever be confirmed. The syntactical form of the evidence now seems to confirm Da100∧Ra100, equivalently Da100∧Sa100. But we know that the next marble is drawn after VE day; that is, we know ¬Da100. That the future resembles the past in all respects is thus false. That it resembles the past in some respects is trivial. The new riddle of induction is the question in which respects the future resembles the past, and in which it does not.

It has been suggested that the puzzling character of Goodman's example is due to its mentioning a particular point of time, namely, VE day. A related reaction has been that gerrymandered predicates, whether or not they involve a particular point of time, cannot be used in inductive inferences. But there are plenty of similar examples (Stalker 1994), and it is commonly agreed that Goodman has succeeded in showing that a purely syntactical definition of (degree of) confirmation won't do. Goodman himself sought to solve his new riddle of induction by distinguishing between "projectible" predicates such as "red" and unprojectible predicates such as "is drawn by VE day and is red, or is drawn after VE day and is not red." The projectibility of a predicate is in turn determined by its entrenchment in natural language. This comes very close to saying that the projectible predicates are the ones that we do in fact project (that is, use in inductive inferences). (Quine's 1969 "natural kinds" are special cases of what can be described by projectible predicates.)

6. Bayesian Confirmation Theory

Bayesian confirmation theory is by far the most popular and elaborated theory of confirmation. It has its origins in Rudolf Carnap's work on inductive logic (Carnap 1950/1962), but it frees itself from the project of defining confirmation in terms of logical probability. More or less any subjective degree of belief function satisfying the Kolmogorov axioms is considered to be an admissible probability measure.

a. Subjective Probability and the Dutch Book Argument

In Bayesian confirmation theory, a probability measure on a field of propositions is usually interpreted as an agent's degree of belief function. There is disagreement as to how broad the class of admissible probability measures is to be construed. Some objective Bayesians such as the early Carnap insist that the class consist of a single logical probability measure, whereas subjective Bayesians admit any probability measure. Most Bayesians will be somewhere in the middle of this spectrum when it comes to the question which particular degree of belief functions it is reasonable to adopt in a particular situation. But they will agree that from a purely logical point of view any (regular) probability measure is acceptable. The standard argument for this position is the Dutch Book Argument.

The Dutch Book Argument starts with the assumption that there is a link between subjective degrees of belief and betting ratios. It is further assumed that it is pragmatically defective to accept a series of bets which guarantees a sure loss, that is, a Dutch Book. By appealing to the Dutch Book Theorem that an agent's betting ratios satisfy the probability axioms just in case they do not make the agent vulnerable to such a Dutch Book, it is inferred that it is epistemically defective to have degrees of belief that violate the probability axioms. The strength of this inference is, of course, dependent on the link between degrees of belief and betting ratios. If this link is identity – as it is when one defines degrees of belief as betting ratios – the distinction between pragmatic and epistemic defectiveness disappears, and the Dutch Book Argument is a deductively valid argument. But this comes at the cost of rendering the link between degrees of belief and betting ratios implausible. If the link is weaker than identity – as it is when degrees of belief are only measured by betting ratios – the Dutch Book Argument is not deductively valid anymore, but it has more plausible assumptions.

The pragmatic nature of the Dutch Book Argument has led to so-called depragmatized versions. A depragmatized Dutch Book Argument starts with a link between degrees of belief and fair betting ratios, and it assumes that it is epistemically defective to consider a series of bets that guarantees a sure loss as fair. Using the depragmatized Dutch Book Theorem that an agent's fair betting ratios obey the probability calculus if and only if the agent never considers a Dutch Book as fair, it is then inferred that it is epistemically defective to have degrees of belief that do not obey the probability calculus. The thesis that an agent's degree of belief function should obey the probability calculus is called probabilism. For more on the Dutch Book Argument see Hájek (2005). For a different justification of probabilism in terms of the accuracy of degrees of belief see Joyce (1998).
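
A minimal numeric illustration of one direction of the Dutch Book Theorem (the betting ratios and the stake are assumed toy values): an agent whose betting ratios on A and on ¬A sum to less than 1 regards as fair selling a bet on each, and doing both guarantees her a loss however A turns out.

    # Assumed betting ratios that violate additivity: they sum to 0.8, not 1.
    p_A, p_not_A = 0.4, 0.4
    stake = 100.0

    # The agent regards as fair: selling a bet on A for p_A * stake (she
    # receives the price now and pays the stake if A), and likewise for ¬A.
    for A_is_true in (True, False):
        received = p_A * stake + p_not_A * stake   # prices of the two bets sold
        paid = stake                               # exactly one of A, ¬A wins
        print(f"A = {A_is_true}: net payoff {received - paid}")   # -20.0 both times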

b. Confirmation Measures

Let A be a field of propositions over some set of possibilities W, let H, E, B be propositions from A, and let Pr be a probability measure on A. We already know that H is incrementally confirmed by E relative to B in the sense of Pr if and only if Pr(H∩E|B) > Pr(H|B)•Pr(E|B), and that this is a relation between three propositions and a probability space whose field contains the propositions. The central notion in Bayesian confirmation theory is that of a confirmation measure. A real-valued function c: P → ℜ from the set P of all probability spaces <W, A, Pr> into the reals ℜ is a confirmation measure if and only if for every probability space <W, A, Pr> and all H, E, B ∈ A:

c(H, E, B) > 0 ↔ Pr(H∩E|B) > Pr(H|B)•Pr(E|B)

c(H, E, B) = 0 ↔ Pr(H∩E|B) = Pr(H|B)•Pr(E|B)

c(H, E, B) < 0 ↔ Pr(H∩E|B) < Pr(H|B)•Pr(E|B)

The six most popular confirmation measures are (what I now call) the Carnap measure c (Carnap 1962), the distance measure d (Earman 1992), the log-likelihood or Good-Fitelson measure l (Fitelson 1999 and Good 1983), the log-ratio or Milne measure r (Milne 1996), the Joyce-Christensen measure s (Christensen 1999, Joyce 1999, ch. 6), and the relative distance measure z (Crupi & Tentori & Gonzalez 2007).

c(H, E, B) = Pr(H∩E|B) – Pr(H|B)•Pr(E|B)

d(H, E, B) = Pr(H|E∩B) – Pr(H|B)

l(H, E, B) = log [Pr(E|H∩B)/Pr(E|-H∩B)]

r(H, E, B) = log [Pr(H|E∩B)/Pr(H|B)]

s(H, E, B) = Pr(H|E∩B) – Pr(H|-E∩B)

z(H, E, B) = [Pr(H|E∩B) – Pr(H|B)]/Pr(-H|B) if Pr(H|E∩B) ≥ Pr(H|B)

= [Pr(H|E∩B) – Pr(H|B)]/Pr(H|B) if Pr(H|E∩B) < Pr(H|B)

(Mathematically speaking, there are uncountably many confirmation measures.) For an overview article, see Eells (2005). Book length expositions are Earman (1992) and Howson & Urbach (1989/2005).
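
For concreteness, the sketch below implements the six measures in Python over a finite probability space. The outcome weights and the propositions H and E are assumed toy values, and the background B is taken to be the tautological proposition W. With the chosen numbers E lowers the probability of H, so all six measures come out negative.

    from math import log

    # A finite probability space with assumed weights; B defaults to W (τ).
    W = frozenset({1, 2, 3, 4})
    weights = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

    def Pr(A):
        return sum(weights[w] for w in A)

    def cond(A, B):
        return Pr(A & B) / Pr(B)

    def c(H, E, B=W):   # Carnap measure
        return cond(H & E, B) - cond(H, B) * cond(E, B)

    def d(H, E, B=W):   # distance measure
        return cond(H, E & B) - cond(H, B)

    def l(H, E, B=W):   # log-likelihood (Good-Fitelson) measure
        return log(cond(E, H & B) / cond(E, (W - H) & B))

    def r(H, E, B=W):   # log-ratio (Milne) measure
        return log(cond(H, E & B) / cond(H, B))

    def s(H, E, B=W):   # Joyce-Christensen measure
        return cond(H, E & B) - cond(H, (W - E) & B)

    def z(H, E, B=W):   # relative distance measure
        num = cond(H, E & B) - cond(H, B)
        return num / (cond(W - H, B) if num >= 0 else cond(H, B))

    H, E = frozenset({1, 2}), frozenset({1, 3})
    for measure in (c, d, l, r, s, z):
        print(measure.__name__, measure(H, E))   # all negative here

One can also read off the old-evidence phenomenon discussed in the next subsection: if Pr(E|B) = 1, then cond(H, E & B) equals cond(H, B), so d, r, and l all return 0 (and s is undefined).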

c. Some Success Stories

Bayesian confirmation theory captures the insights of Popper's falsificationism and hypothetico-deductive confirmation. Suppose evidence E falsifies hypothesis H relative to background information B in the sense that B∩H∩E = ∅. Then Pr(E∩H|B) = 0, and so Pr(E∩H|B) = 0 < Pr(H|B)•Pr(E|B), provided both Pr(H|B) and Pr(E|B) are positive. So as long as H is not already known to be false (in the sense of having probability 0 conditional on B) and E is a possible outcome (one with positive probability conditional on B), falsifying E incrementally disconfirms H relative to B in the sense of Pr.

Remember, E HD-confirms H relative to B if and only if the conjunction of H and B logically implies E (in some suitable way). In this case Pr(E∩H|B) = Pr(H|B), provided Pr(B) > 0. Hence as long as Pr(E|B) < 1, we have

Pr(E∩H|B) > Pr(H|B)•Pr(E|B),

which means that E incrementally confirms H relative to B in the sense of Pr (Kuipers 2000).

If the conjunction of H and B logically implies E, but E is already known to be true in the sense of having probability 1 conditional on B, E does not incrementally confirm H relative to B in the sense of Pr. In fact, no E which receives probability 1 conditional on B can incrementally confirm any H whatsoever. This is the so-called problem of old evidence (Glymour 1980). It is a special case of a more general phenomenon. The following is true for many confirmation measures (d, l, and r, but not s). If H is positively relevant to E given B, the degree to which E incrementally confirms H relative to B is greater, the smaller the probability of E given B. Similarly, if H is negatively relevant to E given B, the degree to which E disconfirms H relative to B is greater, the smaller the probability of E given B (Huber 2005a). If Pr(E|B) = 1 we have the problem of old evidence. If Pr(E|B) = 0 we have the above-mentioned problem that E cannot disconfirm hypotheses it falsifies.

Some people simply deny that the problem of old evidence is a problem. Bayesian confirmation theory, it is said, does not explicate whether and how much E confirms H relative to B. It explicates whether E is additional evidence for H relative to B, and how much additional confirmation E provides for H relative to B. If E already has probability 1 conditional on B, it is part of the background knowledge, and so does not provide any additional evidence for H. More generally, the more we already believe in E, the less additional (dis)confirmation this provides for positively (negatively) relevant H. This reply does not work in case E is a falsifier of H with probability 0 conditional on B, for in this case Pr(H|E∩B) is not defined. It also does not agree with the fact that the problem of old evidence is taken seriously in the literature on Bayesian confirmation theory (Earman 1992, ch. 5). An alternative view (Joyce 1999, ch. 6) sees several different but equally legitimate concepts of confirmation at work. The intuition behind one concept is the reason for the implausibility of the explication of another.

In contrast to hypothetico-deductivism, Bayesian confirmation theory has no problem with assigning degrees of incremental confirmation to statistical hypotheses. Such alternative statistical hypotheses H1, …, Hn, … are taken to specify the probability of an outcome E. The probabilities Pr(E|H1), …, Pr(E|Hn), … are called the likelihoods of the hypotheses Hi. Together with their prior probabilities Pr(Hi) the likelihoods determine the posterior probabilities of the Hi via Bayes's Theorem:

Pr(Hi|E) = Pr(E|Hi)•Pr(Hi)/[ΣjPr(E|Hj)•Pr(Hj) + Pr(E|H)•Pr(H)]

The so-called "catchall" hypothesis H is the negation of the disjunction or union of all the alternative hypotheses Hi, and so it is equivalent to -(H1∪…∪Hn∪…). It is important to note the implicit use of something like the principal principle (Lewis 1980) in such an application of Bayes's Theorem. The probability measure Pr figuring in the above equation is an agent's degree of belief function. The statistical hypotheses Hi specify the objective chance of the outcome E as Chi(E). Without a principle linking objective chances to subjective degrees of belief, nothing guarantees that the agent's conditional degree of belief in E given Hi, Pr(E|Hi), is equal to the chance of E as specified by Hi, Chi(E). The principal principle says that an agent's conditional degree of belief in a proposition A given the information that the chance of A is equal to r (and no further inadmissible information) should be r: Pr(A|Ch(A) = r) = r. For more on the principal principle see Hall (1994), Lewis (1994), Thau (1994), as well as Vranas (2004a). Spohn shows that the principal principle is a special case of the reflection principle (van Fraassen 1984; 1995). The latter principle says that an agent's current conditional degree of belief in A given that her future degree of belief in A equals r should be r,

Prnow(A|Prlater(A) = r) = r provided Prnow(Prlater(A)=r) > 0.

Bayesian confirmation theory can also handle the ravens paradox. As we have seen, Hempel thought that "a is neither black nor a raven" confirms "All ravens are black" relative to no or tautological background information. He attributed the unintuitive character of this claim to a conflation of it and the claim that "a is neither black nor a raven" confirms "All ravens are black" relative to our actual background knowledge A – and the fact that A contains the information that there are more non-black objects than ravens. The latter information is reflected in our degree of belief function Pr by the inequality

Pr(¬Ba|A) > Pr(Ra|A).

If we further assume that the probabilities of finding a non-black object as well as finding a raven are independent of whether or not all ravens are black,

Pr(¬Ba|∀x(Rx → Bx)∧A) = Pr(¬Ba|A),

Pr(Ra|∀x(Rx → Bx)∧A) = Pr(Ra|A),

we can infer (when we assume all probabilities to be defined) that

Pr(∀x(Rx → Bx)|Ra∧Ba∧A) > Pr(∀x(Rx → Bx)|¬Ra∧¬Ba∧A) > Pr(∀x(Rx → Bx)|A).

So Hempel's intuitions are vindicated by Bayesian confirmation theory to the extent that the above independence assumptions are plausible (or there are weaker assumptions entailing a similar result), and to the extent that he also took non-black non-ravens to confirm the ravens hypothesis relative to our actual background knowledge. For more, see Vranas (2004b).
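
Here is a small numeric model in Python of the situation just described (all numbers are assumed for illustration, and the simple hypothesis h merely plays the role of the ravens hypothesis). The likelihoods are chosen so that the two independence assumptions hold and Pr(¬Ba|A) > Pr(Ra|A); a black raven then raises the probability of h considerably, a non-black non-raven only slightly, matching the inequality chain above.

    # Assumed likelihoods for the four object types (raven?, black?), chosen
    # so that Pr(Ra) = 0.10 and Pr(¬Ba) = 0.60 both under h ("all ravens are
    # black") and under ¬h.
    prior_h = 0.5
    like_h     = {("R", "B"): 0.10, ("R", "¬B"): 0.00,
                  ("¬R", "B"): 0.30, ("¬R", "¬B"): 0.60}
    like_not_h = {("R", "B"): 0.05, ("R", "¬B"): 0.05,
                  ("¬R", "B"): 0.35, ("¬R", "¬B"): 0.55}

    def posterior(cell):
        """Pr(h | observed cell), by Bayes's Theorem."""
        num = prior_h * like_h[cell]
        return num / (num + (1 - prior_h) * like_not_h[cell])

    print(posterior(("R", "B")))     # ≈ 0.667: a black raven confirms h strongly
    print(posterior(("¬R", "¬B")))   # ≈ 0.522: a non-black non-raven, slightly
    # Both exceed the prior 0.5, and the first exceeds the second.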

Let us finally consider the problem of irrelevant conjunction in Bayesian confirmation theory. HD-confirmation satisfies the converse consequence condition, and so has the undesirable feature that E confirms HH' relative to B whenever E confirms H relative to B, for any H' whatsoever. This is not true for incremental confirmation. Even if Pr(EH|B) > Pr(E|B)•Pr(H|B), it need not be the case that Pr(EHH'|B) > Pr(E|B)•Pr(HH'|B). However, the following special case is also true for incremental confirmation.

If H∧B logically implies E, then E incrementally confirms H∧H' relative to B, for any H' whatsoever (whenever the relevant probabilities are defined).

In the spirit of the last paragraph, one can, however, show that H∧H' is less confirmed by E relative to B than H alone (in the sense of the distance measure d and the Good-Fitelson measure l) if H' is an irrelevant conjunct to H given B with respect to E in the sense that

Pr(E|H∧H'∧B) = Pr(E|H∧B)

(Hawthorne & Fitelson 2004). If H∧B logically implies E, then every H' such that Pr(H∧H'∧B) > 0 is irrelevant in this sense. For more see Fitelson (2002), Hawthorne & Fitelson (2004), and Maher (2004b).
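The following sketch illustrates this with the distance measure d(H, E) = Pr(H|E) – Pr(H), suppressing the background B; the numbers are invented, H is assumed to logically imply E (so that Pr(E|H) = 1), and H' is an irrelevant conjunct in the above sense.

# Invented numbers: H implies E, hence so does H & H'.
pr_e  = 0.5
pr_h  = 0.3   # Pr(H)
pr_hh = 0.1   # Pr(H & H')

def d(prior):
    # Pr(hypothesis|E) - Pr(hypothesis); since the hypothesis implies E,
    # Pr(hypothesis|E) = prior / Pr(E).
    return prior / pr_e - prior

print(d(pr_h))   # 0.3: E confirms H
print(d(pr_hh))  # 0.1: E still confirms H & H', but to a lesser degree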

7. Taking Stock

Let us grant that Bayesian confirmation theory adequately explicates the concept of confirmation. If so, then this is the concept scientists use when they say that the anomalous perihelion of Mercury confirms the general theory of relativity. It is also the concept more ordinary epistemic agents use when they say that, relative to what they have experienced so far, the dark clouds on the sky are evidence for rain. The question remains what happened to Hume's problem of the justification of induction. We know – by definition – that the conclusion of an inductively strong argument is well-confirmed by its premises. But does that also justify our acceptance of that conclusion? Don't we first have to justify our definition of confirmation before we can use it to justify our inductive inferences?

It seems we would have to, but, as Hume argued, such a justification of induction is not possible. All we could hope for is an adequate description of our inductive practices. As we have seen, Goodman took the task of adequately describing induction as being tantamount to its justification (Goodman 1983, ch. III, ascribes a similar view to Hume, which is somewhat peculiar, because Hume argued that a justification of induction is impossible). In doing so he appealed to deductive logic, which he claimed to be justified by its conformity to accepted practices of deductive reasoning. But that is not so. Deductive logic is not justified because it adequately describes our practices of deductive reasoning – it doesn't. The rules of deductive logic are justified relative to the goal of truth preservation in all possible worlds. The reasons are that (i) in going from the premises of a deductively valid argument to its conclusion, truth is preserved in all possible worlds (this is known as soundness); and that (ii) any argument with that property is a deductively valid argument (this is known as completeness). Similarly for the rules of nonmonotonic logic, which are justified relative to the goal of truth preservation in all "normal" worlds (for normality see e.g. Koons 2005). The reason is that all and only nonmonotonically valid inferences are such that truth is preserved in all normal worlds when one jumps from the premises to the conclusion (Kraus & Lehmann & Magidor 1990; for a survey see Makinson 1994). More generally, a canon of normative principles – such as the rules of deductive logic, the rules of nonmonotonic logic, or the rules of inductive logic – is only justified relative to a certain goal when one can show that adhering to these principles in some sense furthers the goal in question.

Similarly to Goodman, Carnap sought to justify the principles of his inductive logic by appeals to intuition (cf. the quote in section 4b). Contemporary Bayesian confirmation theorists with their desideratum/explicatum approach follow Carnap and Goodman at least insofar as they apparently do not see the need for justifying their accounts of confirmation by more than appeals to intuition. These are supposed to show that their definitions of confirmation are adequate. But the alleged impossibility of justifying induction does not entail that its adequate description or explication in the form of a particular theory of confirmation is sufficient to justify inductive inferences based on that theory. Moreover, as noted by Reichenbach (1938; 1940), a justification of induction is not impossible after all. Hume was right in claiming that there is no deductively valid argument with knowable premises and the conclusion that inductively strong arguments will always lead us to true conclusions. But that is not the only conclusion that would justify induction. Reichenbach was mainly interested in the limiting relative frequencies of particular outcomes in various sequences of events. He was able to show that a particular inductive rule – the straight rule, which conjectures that the limiting relative frequency is equal to the observed relative frequency – will lead us to the true limiting relative frequency, if any inductive rule does. However, the straight rule is not the only rule with this property. Therefore its justification relative to the goal of discovering limiting relative frequencies is at least incomplete. If we want to keep the analogy to deductive logic, we can put things as follows: Reichenbach was able to establish the soundness, but not the completeness, of his inductive logic (that is, the straight rule) with respect to the goal of eventually arriving at the true limiting relative frequency. (Reichenbach himself provides an example that proves the incompleteness of the straight rule with respect to this goal.)
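Here is a minimal Python sketch of the straight rule; the data stream is an invented Bernoulli sequence, and the only point is that the rule's conjectures track the observed relative frequency and hence converge to the limiting relative frequency whenever that limit exists.

import random

def straight_rule(outcomes):
    # After each observation, conjecture that the limiting relative
    # frequency equals the relative frequency observed so far.
    successes = 0
    for n, outcome in enumerate(outcomes, start=1):
        successes += outcome
        yield successes / n

random.seed(0)  # invented data with limiting relative frequency 0.7
data = [1 if random.random() < 0.7 else 0 for _ in range(10000)]
conjectures = list(straight_rule(data))
print(conjectures[9], conjectures[99], conjectures[9999])  # tends to 0.7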

While soundness in this sense is not sufficient for a justification of the straight rule, such results provide more reasons than appeals to intuition. They are necessary conditions for the justification of a normative rule of inference relative to a particular goal of inquiry. A similar view about the justification of induction is held by formal learning theory. Here one considers the objective reliability with which a particular method (such as the straight rule or a particular confirmation measure) finds out the correct answer to a given question. The use of a method to answer a question is only justified when the method reliably answers the question, if any method does. As different questions differ in their complexity, there are different senses of reliability. A method may correctly answer a question after finitely many steps and with a sign that the question is answered correctly – as when we answer the question whether the first observed raven is black by saying "yes" if it is, and "no" otherwise. Or it may answer the question after finitely many steps and with a sign that it has done so when the answer is "yes," but not when the answer is "no" – as when we answer the question whether there exists a black raven by saying "yes" when we first observe a black raven, and by saying "no" otherwise. Or it may stabilize to the correct answer in the sense that the method conjectures the right answer after finitely many steps and continues to do so forever without necessarily giving a sign that it has arrived at the correct answer – as when we answer the question whether the limiting relative frequency of black ravens among all ravens is greater than .5 by saying "yes" as long as the observed relative frequency is greater than .5, and by saying "no" otherwise (under the assumption that this limit exists). And so on. This provides a classification of all problems in terms of their complexity. The use of a particular method for answering a question of a certain complexity is only justified if the method reliably answers the question in the sense of reliability determined by the complexity of the question. A discussion of Bayesian confirmation theory from the point of view of formal learning theory can be found in Kelly & Glymour (2004). Schulte (2002) gives an introduction to the main philosophical ideas of formal learning theory. A technically advanced book length exposition is Kelly (1996). The general idea is the same as before. A rule is justified relative to a certain goal to the extent that the rule furthers that goal.
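The third kind of method can be sketched as follows; the data stream is invented, and the point is only that the method's conjectures may flip early on but eventually stabilize to the correct answer without ever signaling that they have done so.

def limit_verifier(observed):
    # Conjecture "yes" to "Is the limiting relative frequency of black
    # ravens greater than .5?" whenever the observed relative frequency
    # exceeds .5, and "no" otherwise.
    blacks = 0
    for n, is_black in enumerate(observed, start=1):
        blacks += is_black
        yield "yes" if blacks / n > 0.5 else "no"

# Invented stream with limiting relative frequency 2/3: the early
# conjectures flip between "no" and "yes", after which the method
# stabilizes to "yes" forever.
stream = [0, 0] + [1, 1, 0] * 50
print(list(limit_verifier(stream))[:10])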

So can we justify particular inductive rules in the form of confirmation measures along these lines? We had better, for otherwise there might be inductive rules that would reliably lead us to the correct answer about a question where our inductive rules won't (cf. Putnam 1963a; see also his 1963b). Before answering this question, let us first be clear which goal confirmation is supposed to further. In other words, why should we accept well-confirmed hypotheses rather than any other hypotheses? A natural answer is that science and our more ordinary epistemic enterprises aim at true hypotheses. The justification for confirmation would then be that we should accept well-confirmed hypotheses, because we are in some sense guaranteed to arrive at true hypotheses if (and only if) we stick to well-confirmed hypotheses. Something along these lines is true for absolute confirmation, according to which degree of confirmation is equal to probability conditional on the data. More precisely, the Gaifman and Snir convergence theorem (Gaifman & Snir 1982) says that for almost every world or model w for the underlying language – that is, all worlds w except, possibly, those in a set of measure 0 (in the sense of the measure Pr* on the σ-field A from section 4a) – the probability of a hypothesis conditional on the first n data sentences from w converges to its truth value in w (1 for true, 0 for false). It is assumed here that the set of all data sentences separates the set of all worlds (in the sense that for any two distinct worlds there is a data sentence which is true in the one and false in the other world). If we accept a hypothesis as true as soon as its probability is greater than .5 (or any other positive threshold value < 1), and reject it as false otherwise, we are guaranteed to almost surely arrive at true hypotheses after finitely many steps. That does not mean that no other method can do equally well. But it is more than a mere appeal to our intuitions, and it is a necessary condition for the justification of absolute confirmation relative to the goal of truth. See also Earman (1992, ch. 9) and Juhl (1997).
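The flavor of this convergence can be illustrated in a simple statistical setting – this is only an illustration, not the Gaifman and Snir theorem itself. With a uniform prior over the chance of an outcome, the posterior probability of the hypothesis that the chance exceeds .5 (a hypothesis that is true in this invented setup) tends to 1, its truth value, as the data accumulate.

import random
from scipy.stats import beta

random.seed(1)
true_chance = 0.6   # invented; so "the chance exceeds .5" is true
heads = 0
for n in range(1, 5001):
    heads += random.random() < true_chance
    if n in (10, 100, 1000, 5000):
        # Posterior Pr(chance > .5 | data) under a uniform Beta(1,1) prior.
        print(n, beta.sf(0.5, heads + 1, n - heads + 1))
# The printed probabilities climb toward 1, the truth value of the hypothesis.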

A more limited result is true for incremental confirmation. Based on the Gaifman and Snir convergence theorem one can show for every confirmation measure c and almost all worlds w that there is an n such that for all later m: the conjunction of the first m data sentences confirms hypotheses that are true in w to a non-negative degree, and it confirms hypotheses that are false in w to a non-positive degree (the set of all data sentences is again assumed to separate the set of all worlds). Even if this more limited result were a satisfying justification for the claim that incremental confirmation furthers the goal of truth, the question remains why one has to go to incremental confirmation in order to arrive at true theories. It also remains unclear what degrees of incremental confirmation are supposed to indicate, for it is completely irrelevant for the above result whether a positive degree of confirmation is high or low – all that matters is that it is positive. This is in contrast to absolute confirmation. There a high number represents a high probability – that is, a high probability of being true – which almost surely converges to the truth value itself. To make these vague remarks more vivid, let us consider an example.

Suppose my 35-year-old friend is pregnant and I am curious as to who the father is. I know that it is either the 35-year-old Alberto or the 55-year-old Ben or the 55-year-old Cesar. My initial degree of belief function Pr is such that

Pr(A) = .9, Pr(B) = Pr(C) = .05, Pr(A∧B) = Pr(A∧C) = Pr(B∧C) = 0,

Pr(A∨B) = Pr(A∨C) = .95, Pr(B∨C) = .1, Pr(A∨B∨C) = 1,

Pr(A∧G) = .4, Pr(B∧G) = .03, Pr(C∧G) = .03, Pr(G) = .46,

where A is the proposition that Alberto is the father, and similarly for B and C. G is the proposition that the father has grey hair. [More precisely, the probability space is <L, Pr> with L the propositional language over the set of propositional variables {A, B, C, G} and Pr such that Pr(A∧G) = .4, Pr(B∧G) = .03, Pr(C∧G) = .03, Pr(A∧¬G) = .5, Pr(B∧¬G) = .02, Pr(C∧¬G) = .02, Pr(A∧B) = Pr(A∧C) = Pr(B∧C) = Pr(¬A∧¬B∧¬C) = 0.] This is a fairly reasonable degree of belief function. Most of the 55-year-old men I know have grey hair. Fewer than 50% of the 35-year-old men I know have grey hair. And I tend to use the principal principle whenever I can (assuming a close connection between objective chances and relative frequencies). Now suppose I learn that the father has grey hair. My new degrees of belief are

Pr(A|G) = 40/46, Pr(B|G) = 3/46, Pr(C|G) = 3/46,

Pr(A∨B|G) = Pr(A∨C|G) = 43/46, Pr(B∨C|G) = 6/46, Pr(A∨B∨C|G) = 1.

G incrementally confirms B, C, and B∨C; it neither incrementally confirms nor incrementally disconfirms A∨B∨C; and it incrementally disconfirms A, A∨B, and A∨C.

However, my degree of belief in A is still more than thirteen times my degree of belief in B and my degree of belief in C. And whether I have to bet on these propositions or whether I am just curious as to who the father of my friend's baby is, all I care about after having received evidence G will be my new degrees of belief in the various answers – and my utilities, including my desire to answer the question. I will be willing to bet on A at less favorable odds than on either B or C or even their disjunction; and should my friend tell me she is going to marry the father of her baby – assuming, as she does, that I know who it is – I would buy my wedding present on the assumption that she is going to marry Alberto (unless, of course, I can ask her first). In this situation, incremental confirmation and degrees of incremental confirmation are at best misleading.
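The bracketed specification above determines all of these numbers. The following Python sketch recomputes them by brute force over the six states with positive probability:

# Joint probabilities from the bracketed specification above.
joint = {("A", "G"): 0.40, ("B", "G"): 0.03, ("C", "G"): 0.03,
         ("A", "notG"): 0.50, ("B", "notG"): 0.02, ("C", "notG"): 0.02}

pr_g = sum(p for (father, g), p in joint.items() if g == "G")  # 0.46

for father in ("A", "B", "C"):
    prior = sum(p for (f, g), p in joint.items() if f == father)
    post = joint[(father, "G")] / pr_g   # Pr(father|G)
    verdict = "confirmed" if post > prior else "disconfirmed"
    print(father, prior, round(post, 4), verdict)
# A: 0.9  -> 40/46 ~ 0.8696, disconfirmed
# B: 0.05 ->  3/46 ~ 0.0652, confirmed
# C: 0.05 ->  3/46 ~ 0.0652, confirmed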

[What is important is a way of updating my old degree of belief function by the incoming evidence. The above example assumes evidence to come in the form of a proposition that I become certain of. In this case, probabilism says I should update my degree of belief function by Strict Conditionalization:

If Pr is your subjective probability at time t, and between t and t' you learn E and no logically stronger proposition in the sense that your new degree of belief in E is 1, then your new subjective probability at time t' should be Pr(•|E).

As Jeffrey (1983) observes, we usually do not learn by becoming certain of a proposition. Evidence often merely changes our degrees of belief in various propositions. Jeffrey Conditionalization is a more general update rule than Strict Conditionalization:

If Pr is your subjective probability at time t, and between t and t' your degrees of belief in the countable partition {E1, …, En, …} change from Pr(Ei) to pi ∈ [0,1] (with Pr(Ei) = pi for Pr(Ei) ∈ {0,1}), and your positive degrees of belief do not change on any superset thereof, then your new subjective probability at time t' should be Pr*, where for all A, Pr*(A) = ΣiPr(A|Ei)•pi.

For evidential input of the above form, Jeffrey Conditionalization turns regular probability measures into regular probability measures, provided no contingent evidential proposition receives an extreme value pi ∈ {0,1}. Radical probabilism (Jeffrey 2004) urges you not to assign such extreme values, and to have a regular initial degree of belief function – whenever you can (but you can't always). Field (1978) proposes an update rule for evidence of a different format.
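A minimal Python sketch of Jeffrey Conditionalization on a finite space may be useful; the four worlds, the old measure, and the new degrees of belief on the two-cell partition are all invented.

# An invented prior over four worlds.
old = {"w1": 0.2, "w2": 0.3, "w3": 0.4, "w4": 0.1}

# A partition {E1, E2} and the new degrees of belief p_i on its cells.
partition = {"E1": {"w1", "w2"}, "E2": {"w3", "w4"}}
new_p = {"E1": 0.8, "E2": 0.2}

def jeffrey(old, partition, new_p):
    # Pr*(A) = sum_i Pr(A|Ei) * p_i; world by world, this rescales the
    # old probabilities within each cell Ei so that the cell sums to p_i.
    new = {}
    for cell, worlds in partition.items():
        cell_mass = sum(old[w] for w in worlds)  # Pr(Ei)
        for w in worlds:
            new[w] = old[w] / cell_mass * new_p[cell]
    return new

print(jeffrey(old, partition, new_p))
# w1: 0.32, w2: 0.48, w3: 0.16, w4: 0.04; Strict Conditionalization on E1
# is the special case new_p = {"E1": 1, "E2": 0}.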

This is also the place to mention different formal frameworks besides probability theory. For an overview, see Huber (2008a).]

More generally, degrees of belief are important to us, because together with our desires they determine which acts it is rational for us to take. The usual recommendation according to rational choice theory for choosing one's acts is to maximize one's expected utility (the mathematical representation of one's desires), that is, the quantity

EU(a) = Σs∈S u(a(s))•Pr(s).

Here S is a set of mutually exclusive and jointly exhaustive states, u is the agent's utility function over the set of outcomes a(s), which are the results of an act a in a state s (acts are identified with functions from states s to outcomes), and Pr is the agent's probability measure on a field over S (Savage 1972). From this decision-theoretic point of view all we need – besides our utilities – are our degrees of belief encoded in Pr. Degrees of confirmation, encoding how much one proposition increases the probability of another, are of no use here.
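A minimal decision-theoretic sketch (states, acts, outcomes, utilities, and probabilities are all invented):

# Two states with their probabilities Pr(s).
pr = {"rain": 0.3, "shine": 0.7}

# Acts as functions from states to outcomes, plus a utility function u
# over outcomes.
acts = {"umbrella":    {"rain": "dry, burdened", "shine": "dry, burdened"},
        "no umbrella": {"rain": "wet",           "shine": "dry, unburdened"}}
u = {"dry, burdened": 5, "wet": -10, "dry, unburdened": 10}

def eu(act):
    # EU(a) = sum over states s of u(a(s)) * Pr(s)
    return sum(u[acts[act][s]] * pr[s] for s in pr)

for act in acts:
    print(act, eu(act))  # umbrella: 5.0, no umbrella: 4.0
# Maximizing expected utility recommends taking the umbrella.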

In the above example, I only consider the propositions A, B, C, because they are sufficiently informative to answer my question. If truth were the only thing I was interested in, I would be happy with the tautological answer that somebody is the father of my friend's baby, A∨B∨C. But I am not. The reason is that I want to know what is going on out there – not only in the sense of having true beliefs, but also in the sense of having informative beliefs. In terms of decision theory, my decisions do not only depend on my degrees of belief – they also depend on my utilities. This is the idea behind the plausibility-informativeness theory (Huber 2008b), according to which epistemic utilities reduce to informativeness values. If we take as our epistemic utilities in the above example the informativeness values of the various answers (with positive probability) to our question, we get

I(A) = I(B) = I(C) = 1, I(A∨B) = I(A∨C) ≈ 40/83, I(B∨C) = 60/83, I(A∨B∨C) = 0,

where the question "Who is the father of my friend's baby?" is represented by the partition Q = {A, B, C} and the informativeness values of the various answers are calculated according to

I(A) = 1 – [1 – ΣiPr*(Xi|A)²]/[1 – ΣiPr*(Xi)²],

a measure proposed by Hilpinen (1970). Contrary to what Hilpinen (1970, 112) claims, I(A) does not increase with the logical strength of A. The probability Pr* is the posterior degree of belief function from our example, Pr(•|G). If we insert these values into the expected utility formula,

EU(a) = Σs∈S u(a(s))•Pr*(s) = ΣX∈Q u(a(X))•Pr*(X) = ΣX∈Q I(X)•Pr*(X),

we get the result that the act of accepting A as answer to our question maximizes our expected epistemic utility.

Not all is lost, however. The distance measure d turns out to measure the expected utility of accepting H when utility is identified with informativeness measured according to a measure proposed by Carnap & Bar-Hillel (1953) (one can think of this measure as measuring how much an answer informs about the most difficult question, namely: which world is the actual one?). Similarly, the Joyce-Christensen measure s turns out to measure the expected utility of accepting H when utility is identified with informativeness about the data measured according to a proposal by Hempel & Oppenheim (1948). So far, this is merely interesting. It becomes important once one notes that d and s can also be justified relative to the goal of informative truth – and not just by appealing to our intuitions about maximizing expected utility (a numerical sketch of the first of these results is given after the table below).

When based on a regular probability, there almost surely is an n such that for all later m: relative to the conjunction of the first m data sentences, contingently true hypotheses get a positive value and contingently false hypotheses get a negative value. Moreover, within the true hypotheses, logically stronger hypotheses get a higher value than logically weaker hypotheses. The logically strongest true hypothesis (the complete true theory about the world w) gets the highest value, followed by all logically weaker true hypotheses all the way down to the logically weakest true hypothesis, the tautology, which is sent to 0. Similarly within the false hypotheses: the logically strongest false hypothesis, the contradiction, is sent to 0, followed by all logically weaker false hypotheses all the way down to the logically weakest false hypothesis (the negation of the complete theory about w).

As informativeness increases with logical strength, we can put this as follows (assuming that the underlying probability measure is regular): d and s do not only distinguish between true and false theories, as do all confirmation measures (as well as all conditional probabilities). They additionally distinguish between informative and uninformative true theories, as well as between informative and uninformative false theories. In this sense, they reveal the following structure of almost every world w [w(p) = w(q) = 1 in the toy example]:

> 0:
  informative and contingently true in w: p∧q
  contingently true in w: p, q, p↔q
  uninformative and contingently true in w: p∨q, ¬p∨q, p∨¬q

= 0:
  logically determined: p∨¬p, p∧¬p

< 0:
  informative and contingently false in w: ¬p∧¬q, p∧¬q, ¬p∧q
  contingently false in w: ¬p, ¬q, p↔¬q
  uninformative and contingently false in w: ¬p∨¬q

This result is also true for the Carnap measure c, but it does not extend to all confirmation measures. It is false for the Milne measure r, which does not distinguish between informative and uninformative false theories. And it is false for the Good-Fitelson measure l, which distinguishes neither between informative and uninformative true theories nor between informative and uninformative false theories. For more see Huber (2005b).
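To make the earlier claim concrete that d measures the expected utility of accepting H when utility is identified with Carnap & Bar-Hillel informativeness, here is a sketch; the particular utility assignment – gain cont(H) = Pr(¬H) for accepting a truth, loss cont(¬H) = Pr(H) for accepting a falsehood – is a reconstruction for illustration, not a quotation from the cited work, and the probabilities are invented.

# Invented probabilities for a hypothesis H and evidence E.
pr_h, pr_h_given_e = 0.4, 0.7

# Reconstructed informativeness utilities (an assumption for illustration):
# accepting H yields Pr(not-H) if H is true and -Pr(H) if H is false.
gain, loss = 1 - pr_h, pr_h

expected_utility = pr_h_given_e * gain - (1 - pr_h_given_e) * loss
distance_measure = pr_h_given_e - pr_h  # d = Pr(H|E) - Pr(H)
print(expected_utility, distance_measure)  # both 0.3 (up to floating point);
# algebraically, Pr(H|E)(1-Pr(H)) - (1-Pr(H|E))Pr(H) = Pr(H|E) - Pr(H).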

The reason c, d, and s have this property of distinguishing between informative and uninformative truth and falsehood is that they are probabilistic assessment functions in the sense of the plausibility-informativeness theory (Huber 2008b) – and the above result is true for all probabilistic assessment functions (not only those that can be expressed as expected utilities). The plausibility-informativeness theory agrees with traditional philosophy that truth is an epistemic goal. Its distinguishing thesis is that there is a second epistemic goal besides truth, namely informativeness, which has to be taken into account when we evaluate hypotheses. Like confirmation theory, the plausibility-informativeness theory assigns numbers to hypotheses in the light of evidence. But unlike confirmation theory, it does not appeal to intuitions when it comes to the question why one is justified in accepting hypotheses with high assessment values. The plausibility-informativeness theory answers this question by showing that accepting hypotheses according to the recommendation of an assessment function almost surely leads one to (the most) informative (among all) true hypotheses (again, this can be seen as a soundness result). (The corresponding completeness result that only acceptances according to the recommendations of assessment functions almost surely lead to informative true hypotheses does not hold. For a discussion of this, see Huber 2008b, sec. 6.2.)

It is idle to speculate what Hume would have said to all this ado. Suffice it to note that his problem would not have got off the ground without our desire for informativeness.

8. References and Further Reading

  • Albert, Max (1992), "Die Falsifikation Statistischer Hypothesen." Journal for General Philosophy of Science 23, 1-32.
  • Alchourrón, Carlos E. & Gärdenfors, Peter & Makinson, David (1985), "On the Logic of Theory Change: Partial Meet Contraction and Revision Functions." Journal of Symbolic Logic 50, 510-530.
  • Carnap, Rudolf (1950/1962), Logical Foundations of Probability. 2nd ed. Chicago: University of Chicago Press.
  • Carnap, Rudolf (1952), The Continuum of Inductive Methods. Chicago: University of Chicago Press.
  • Carnap, Rudolf (1963), "Replies and Systematic Expositions. Probability and Induction." In P.A. Schilpp (ed.), The Philosophy of Rudolf Carnap. La Salle, IL: Open Court, 966-998.
  • Carnap, Rudolf & Bar-Hillel, Yehoshua (1953), An Outline of a Theory of Semantic Information. Technical Report 247. Research Laboratory of Electronics, MIT. Reprinted in Y. Bar-Hillel (1964), Language and Information. Selected Essays on Their Theory and Application. Reading, MA: Addison-Wesley, 221-274.
  • Christensen, David (1999), "Measuring Confirmation." Journal of Philosophy 96, 437-461.
  • Crupi, Vincenzo & Tentori, Katya & Gonzalez, Michel (2007), "On Bayesian Measures of Evidential Support: Theoretical and Empirical Issues." Philosophy of Science 74, 229-252.
  • Duhem, Pierre (1906/1974), The Aim and Structure of Physical Theory. New York: Atheneum.
  • Earman, John (1992), Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. Cambridge, MA: MIT Press.
  • Eells, Ellery (2005), "Confirmation Theory." In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Field, Hartry (1978), "A Note on Jeffrey Conditionalization." Philosophy of Science 45, 361-367.
  • Fitelson, Branden (1999), "The Plurality of Bayesian Measures of Confirmation and the Problem of Measure Sensitivity." Philosophy of Science 66 (Proceedings), S362-S378.
  • Fitelson, Branden (2001), Studies in Bayesian Confirmation Theory. PhD Dissertation. Madison, WI: University of Wisconsin-Madison.
  • Fitelson, Branden (2002), "Putting the Irrelevance Back Into the Problem of Irrelevant Conjunction." Philosophy of Science 69, 611-622.
  • Fitelson, Branden (2005), "Inductive Logic." In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Fitelson, Branden & Hájek, Alan & Hall, Ned (2005), "Probability." In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Gaifman, Haim & Snir, Marc (1982), "Probabilities over Rich Languages, Testing, and Randomness." Journal of Symbolic Logic 47, 495-548.
  • Gärdenfors, Peter (1988), Knowledge in Flux. Modeling the Dynamics of Epistemic States. Cambridge, MA: MIT Press.
  • Gärdenfors, Peter & Rott, Hans (1995), "Belief Revision." In D.M. Gabbay & C.J. Hogger & J.A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming. Vol. 4. Epistemic and Temporal Reasoning. Oxford: Clarendon Press, 35-132.
  • Glymour, Clark (1980), Theory and Evidence. Princeton: Princeton University Press.
  • Good, Irving John (1967), "The White Shoe is a Red Herring." British Journal for the Philosophy of Science 17, 322.
  • Good, Irving John (1968), "The White Shoe qua Herring is Pink." British Journal for the Philosophy of Science 19, 156-157.
  • Good, Irving John (1983), Good Thinking: The Foundations of Probability and Its Applications. Minneapolis: University of Minnesota Press.
  • Goodman, Nelson (1946), "A Query on Confirmation." Journal of Philosophy 43, 383-385.
  • Goodman, Nelson (1983), Fact, Fiction, and Forecast. 4th ed. Cambridge, MA: Harvard University Press.
  • Grimes, Thomas R. (1990), "Truth, Content, and the Hypothetico-Deductive Method." Philosophy of Science 57, 514-522.
  • Hacking, Ian (2001), An Introduction to Probability and Inductive Logic. Cambridge: Cambridge University Press.
  • Hájek, Alan (2003a), "Interpretations of Probability." In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Hájek, Alan (2003b), "What Conditional Probability Could Not Be." Synthese 137, 273-323.
  • Hájek, Alan (2005), "Scotching Dutch Books?" Philosophical Perspectives 19 (Epistemology), 139-151.
  • Hájek, Alan & Hall, Ned (2000), "Induction and Probability." In P. Machamer & M. Silberstein (eds.), The Blackwell Guide to the Philosophy of Science. Oxford: Blackwell, 149-172.
  • Hall, Ned (1994), "Correcting the Guide to Objective Chance." Mind 103, 505-518.
  • Hawthorne, James (2005), "Inductive Logic." In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Hawthorne, James & Fitelson, Branden (2004), "Re-solving Irrelevant Conjunction with Probabilistic Independence." Philosophy of Science 71, 505-514.
  • Hempel, Carl Gustav (1945), "Studies in the Logic of Confirmation." Mind 54, 1-26, 97-121.
  • Hempel, Carl Gustav (1962), "Deductive-Nomological vs. Statistical Explanation." In H. Feigl & G. Maxwell (eds.), Scientific Explanation, Space and Time. Minnesota Studies in the Philosophy of Science 3. Minneapolis: University of Minnesota Press, 98-169.
  • Hempel, Carl Gustav (1967), "The White Shoe: No Red Herring." British Journal for the Philosophy of Science 18, 239-240.
  • Hempel, Carl Gustav & Oppenheim, Paul (1948), "Studies in the Logic of Explanation." Philosophy of Science 15, 135-175.
  • Hilpinen, Risto (1970), "On the Information Provided by Observations." In J. Hintikka & P. Suppes (eds.), Information and Inference. Dordrecht: D. Reidel, 97-122.
  • Hintikka, Jaakko (1966), "A Two-Dimensional Continuum of Inductive Methods." In J. Hintikka & P. Suppes (eds.), Aspects of Inductive Logic. Amsterdam: North-Holland, 113-132.
  • Hitchcock, Christopher R. (2001), "The Intransitivity of Causation Revealed in Graphs and Equations." Journal of Philosophy 98, 273-299.
  • Howson, Colin (2000a), Hume's Problem: Induction and the Justification of Belief. Oxford: Oxford University Press.
  • Howson, Colin (2000b), "Evidence and Confirmation." In W.H. Newton-Smith (ed.), A Companion to the Philosophy of Science. Oxford: Blackwell, 108-116.
  • Howson, Colin & Urbach, Peter (1989/2005), Scientific Reasoning: The Bayesian Approach. 3rd ed. La Salle, IL: Open Court.
  • Huber, Franz (2005a), "Subjective Probabilities as Basis for Scientific Reasoning?" British Journal for the Philosophy of Science 56, 101-116.
  • Huber, Franz (2005b), "What Is the Point of Confirmation?" Philosophy of Science 72, 1146-1159.
  • Huber, Franz (2008a) "Formal Epistemology." In E. N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Huber, Franz (2008b), "Assessing Theories, Bayes Style." Synthese 161, 89-118.
  • Hume, David (1739/2000), A Treatise of Human Nature. Ed. by D.F. Norton & M.J. Norton. Oxford: Oxford University Press.
  • Jeffrey, Richard C. (1965/1983), The Logic of Decision. 2nd ed. Chicago: University of Chicago Press.
  • Jeffrey, Richard C. (2004), Subjective Probability: The Real Thing. Cambridge: Cambridge University Press.
  • Jeffreys, Harold (1939/1967), Theory of Probability. 3rd ed. Oxford: Clarendon Press.
  • Joyce, James M. (1998), "A Non-Pragmatic Vindication of Probabilism." Philosophy of Science 65, 575-603.
  • Joyce, James M. (1999), The Foundations of Causal Decision Theory. Cambridge: Cambridge University Press.
  • Joyce, James M. (2003), "Bayes's Theorem." In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Juhl, Cory (1997), "Objectively Reliable Subjective Probabilities." Synthese 109, 293-309.
  • Kelly, Kevin T. (1996), The Logic of Reliable Inquiry. Oxford: Oxford University Press.
  • Kelly, Kevin T. & Glymour, Clark (2004), "Why Probability does not Capture the Logic of Scientific Justification." In C. Hitchcock (ed.), Contemporary Debates in the Philosophy of Science. Oxford: Blackwell, 94-114.
  • Keynes, John Maynard (1921/1973), A Treatise on Probability. The Collected Writings of John Maynard Keynes. Vol. III. New York: St. Martin's Press.
  • Kolmogoroff, Andrej N. (1933), Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer.
  • Kolmogorov, Andrej N. (1956), Foundations of the Theory of Probability, 2nd ed. New York: Chelsea Publishing Company.
  • Koons, Robert (2005), "Defeasible Reasoning." In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Kraus, Sarit & Lehmann, Daniel & Magidor, Menachem (1990), "Nonmonotonic Reasoning, Preferential Models, and Cumulative Logics." Artificial Intelligence 40, 167-207.
  • Kuipers, Theo A.F. (2000), From Instrumentalism to Constructive Realism. On Some Relations between Confirmation, Empirical Progress, and Truth Approximation. Dordrecht: Kluwer.
  • Kyburg, Henry E. Jr. (1961), Probability and the Logic of Rational Belief. Middletown, CT: Wesleyan University Press.
  • Lewis, David (1980), "A Subjectivist's Guide to Objective Chance." In R.C. Jeffrey (ed.), Studies in Inductive Logic and Probability. Vol. II. Berkeley: University of California Press, 263-293. Reprinted in D. Lewis (1986), Philosophical Papers. Vol. II. Oxford: Oxford University Press, 83-113.
  • Lewis, David (1994), "Humean Supervenience Debugged." Mind 103, 473-490.
  • Maher, Patrick (1999), "Inductive Logic and the Ravens Paradox." Philosophy of Science 66, 50-70.
  • Maher, Patrick (2004a), "Probability Captures the Logic of Scientific Confirmation." In C. Hitchcock (ed.), Contemporary Debates in Philosophy of Science. Oxford: Blackwell, 69-93.
  • Maher, Patrick (2004b), "Bayesianism and Irrelevant Conjunction." Philosophy of Science 71, 515-520.
  • Makinson, David (1994), "General Patterns in Nonmonotonic Logic." In D.M. Gabbay & C.J. Hogger & J.A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming. Vol. 3. Nonmonotonic Reasoning and Uncertain Reasoning. Oxford: Clarendon Press, 35-110.
  • Milne, Peter (1996), "log[P(h|eb)/P(h|b)] is the One True Measure of Confirmation." Philosophy of Science 63, 21-26.
  • Moretti, Luca (2004), "Grimes on the Tacking by Disjunction Problem." Disputatio 17, 16-20.
  • Pearl, Judea (2000), Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press.
  • Popper, Karl R. (1935/1994), Logik der Forschung. Tübingen: J.C.B. Mohr.
  • Putnam, Hilary (1963a), "Degree of Confirmation and Inductive Logic." P.A. Schilpp (ed.), The Philosophy of Rudolf Carnap. La Salle, IL: Open Court, 761-784. Reprinted in H. Putnam (1975/1979), Mathematics, Matter and Method. 2nd ed. Cambridge: Cambridge University Press, 270-292.
  • Putnam, Hilary (1963b), "Probability and Confirmation." The Voice of America, Forum Philosophy of Science 10, U.S. Information Agency. Reprinted in H. Putnam (1975/1979), Mathematics, Matter and Method. 2nd ed. Cambridge: Cambridge University Press, 293-304.
  • Quine, Willard Van Orman (1951), "Two Dogmas of Empiricism." The Philosophical Review 60, 20-43.
  • Quine, Willard van Orman (1969), "Natural Kinds." In N. Rescher et al. (eds.), Essays in Honor of Carl G. Hempel. Dordrecht: Reidel, 5-23.
  • Reichenbach, Hans (1938), Experience and Prediction. An Analysis of the Foundations and the Structure of Knowledge. Chicago: University of Chicago Press.
  • Reichenbach, Hans (1940), "On the Justification of Induction." Journal of Philosophy 37, 97-103.
  • Rosenkrantz, Roger (1981), Foundations and Applications of Inductive Probability. New York: Ridgeview.
  • Roush, Sherrilyn (2005), "Problem of Induction." In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Savage, Leonard J. (1954/1972), The Foundations of Statistics. 2nd ed. New York: Dover.
  • Schulte, Oliver (2002), "Formal Learning Theory." In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Skyrms, Brian (2000), Choice and Chance. An Introduction to Inductive Logic. 4th ed. Belmont, CA: Wadsworth Thomson Learning.
  • Spohn, Wolfgang (1988), "Ordinal Conditional Functions: A Dynamic Theory of Epistemic States." In W.L. Harper & B. Skyrms (eds.), Causation in Decision, Belief Change, and Statistics II. Dordrecht: Kluwer, 105-134.
  • Stalker, Douglas F. (ed.) (1994), Grue! The New Riddle of Induction. Chicago: Open Court.
  • Thau, Michael (1994), "Undermining and Admissibility." Mind 103, 491-504.
  • van Fraassen, Bas C. (1984), "Belief and the Will." Journal of Philosophy 81, 235-256.
  • van Fraassen, Bas C. (1995), "Belief and the Problem of Ulysses and the Sirens." Philosophical Studies 77, 7-37.
  • Vineberg, Susan (2005), "Dutch Book Argument." In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Vranas, Peter B.M. (2004a), "Have Your Cake and Eat It Too: The Old Principal Principle Reconciled with the New." Philosophy and Phenomenological Research 69, 368-382.
  • Vranas, Peter B.M. (2004b), "Hempel's Raven Paradox: A Lacuna in the Standard Bayesian Solution." British Journal for the Philosophy of Science 55, 545-560.
  • Woodward, James F. (2003), Making Things Happen. A Theory of Causal Explanation. Oxford: Oxford University Press.

Author Information

Franz Huber
Email: franz@caltech.edu
California Institute of Technology
U. S. A.

Evolutionary Epistemology

Evolutionary Epistemology (EE) is a naturalistic approach to epistemology and so is part of philosophy of science. Other naturalistic approaches include sociological, historical and anthropological explanations of knowledge. What makes EE specific is that it subscribes to the idea that cognition is to be understood primarily as a product of biological evolution. What does this mean exactly? Biological evolution is regarded as the precondition of the variety of cognitive, cultural, and social behavior that an organism, group or species can display. In other words, biological evolution precedes (socio-)cultural (co-)evolution; put the other way around, (socio-)cultural (co-)evolution originates as a result of biological evolution. Therefore:

  1. EE studies the origin, evolution and current mechanisms of all cognitive capacities of all biological organisms from within biological (evolutionary) theory. Here cognition is broadly conceived, ranging from the echolocation of bats, to human-specific symbolic thinking;
  2. Besides studying the cognitive capacities themselves, EE investigates the ways in which biological evolutionary models can be used to study the products of these cognitive capacities. The cognitive products studied include, for example, the typical spatiotemporal perception of objects of all mammals, or more human-specific cognitive products such as science, culture and language. These evolutionary models are at minimum applied at a descriptive level, but can also be used as explanations for the behavior under study. In other words, the cognitive mechanisms and their products are understood to be either comparable with, or the result of, biological evolution.
  3. Within EE it is sometimes assumed that biological evolution itself is a cognitive process.

Table of Contents

  1. Overview
  2. Context of Use
    1. EE and Selection Theory
    2. The EEM and EET Program
  3. EE and Naturalized Epistemology
  4. Different EEs: The Units and Levels of Selection Debate
  5. The Environment, the Adaptationist Program and Traditional EE
    1. The Adaptationist Program
    2. Traditional EE
      1. Karl Popper
      2. Konrad Lorenz
      3. Donald Campbell
      4. Stephen Toulmin
      5. Peter Munz
  6. Evolution from the Point of View of the Organism
    1. The Constructivist Approach
    2. The Non-Adaptationist Approach within EE
  7. Evolution from the Point of View of Genes
  8. Universal Selection Mechanisms Repeated and Extended
    1. Lewontin’s "Logical Skeleton" of Natural Selection
    2. Universal Darwinism
    3. Blind Variation and Selective Retention
    4. Universal Selectionism
    5. Replication, Variation and Environmental Interaction
    6. Generate-Test-Regenerate / Replicator-Interactor-Lineage
    7. Universal Symbiogenesis
  9. References and Further Reading

1. Overview

A general account of the meaning and history of the term “evolutionary epistemology” is given in sections 1 and 2 below. It is important to understand in advance that different kinds of evolutionary epistemology (EE) can be distinguished, but all forms share the following assumption: that cognition – to a greater or lesser extent – needs to be studied from within evolutionary theory. Disagreements arise about:

  1. where to draw the line between the cognitive and the non-cognitive,
  2. which aspects of cognition should be studied from within evolutionary theory, and
  3. which aspects of evolutionary theory should apply to the study of cognition.

Evolutionary theory itself is far from synonymous with the theory of evolution by natural selection. Rather, heterogeneous views on evolution arise when one takes the units and levels of selection debate (section 4) as a point of departure. Different perspectives on evolution emerge when one looks at evolution from the point of view of the environment (section 5), the organism (section 6), and genes (section 7). The development of different EEs parallels this perspectivism. That is, based on these different viewpoints, different EEs have been put forward. The adaptationist approach to evolution is the basis of traditional EE. Non-adaptationist approaches to EE have been based on the constructivist approach to evolution. The “gene’s eye view” of evolution has resulted in a quest for universal evolutionary epistemological mechanisms.

2. Context of Use

The concept “evolutionary epistemology” was first introduced by Donald T. Campbell (1974). However, he repeatedly refused to be called the founding father of EE since he saw himself as denoting “… something that has sprung up all over for a hundred years or more” (Campbell in Callebaut, 1993: 289).

If EE were to have a motto, it might come from Michael Ruse’s (1988) famous book title Taking Darwin Seriously. This means that when one adheres to an evolutionary view of life, one needs to understand all biological processes not only as the outcome of evolution, but also as something that can only be investigated adequately by making use of evolutionary theory.

Evolutionary epistemology understands epistemology to be a product of biological evolution. Therefore, epistemology is studied from within evolutionary biology. Cognition is no longer understood to be linguistic (propositional) or a human-bounded characteristic. Rather, all organisms can show behavior that is cognitively based.

Hence, the first major quest of evolutionary epistemologists is distinguishing between the different cognitive processes that biological organisms from all major kingdoms of life can display.

Second, they investigate how these cognitive capacities evolved from unicellular organisms onwards.

Third, the products of cognition (on the one hand, the perception of light, or color, on the other hand, science, culture and language) are understood from within an evolutionary approach.

The use of biological theories and mechanisms to comprehend cognition is either meant to be descriptive or explanatory. In this context, Ruse (1988: 32) differentiates between an “analogy-as-heuristic” and an “analogy-as-justification.” The former term refers to using metaphors and analogies from evolutionary theory to describe, for example, the evolution of science loosely and to discover new approaches to research. The latter research strategy involves applying evolutionary analogies to justify and thus to validate such things as the evolution of science.

In sum, the underlying view of EE is thus that there is a universal evolutionary mechanism that led, first, to the evolution of life in general, and that, second, is also at work within the evolution of cognition and within the products of cognition such as language, science and culture.

Some evolutionary epistemologists, such as Campbell (1974), therefore also assume that biological evolution itself, in its own workings, portrays a knowledge process. This idea will be discussed later.

Today the concept “EE” is commonly used, on the one hand, as a synonym for selection theory and, on the other hand, in connection with the EEM and EET programs.

a. EE and Selection Theory

EE has strong affinities with selection theory (Campbell, 1997). The latter is a theory that adheres to the view that all and only selectionist – as opposed to instructionist (behaviorist) – explanations of an organism’s traits (including cognitive ones) are valid. Behaviorist explanations state that it suffices to describe the visible, external behavior that an organism displays in order to develop adequate explanations of that behavior. Selectionist accounts, by contrast, also examine internal elements that underlie a certain trait (such as genes, for example) and the evolutionary emergence of that trait. The term selection theory was first introduced by Simmel and Baldwin in the 19th century (Campbell, 1997). Today, however, a wide range of biologists, neurologists, and evolutionary epistemologists are selectionists (for an example, see Cziko 1995), but these scholars do not recognize or accept any direct influence of Simmel and Baldwin’s selection theory.

Throughout this article, the more general term EE is maintained. The reason is threefold. First, not all topics that are investigated by selectionists are relevant for the study of cognition. Second, not every Evolutionary Epistemologist defends a solely selectionist account of cognition. Rather, other evolutionary principles, such as self-organization, are also included to comprehend (the products of) cognition (as will be discussed in section 6). Finally, analogies are not only drawn between evolutionary theory and the evolution of science and knowledge: culture, language, economics, and so forth can also be interpreted from within these evolutionary epistemological frameworks.

b. The EEM and EET Program

A useful distinction within EE is made by Bradie (1986). Two different programs are identified, the EEM and the EET program. Within the Evolution of Epistemological Mechanisms, the evolution of cognition and cognitive knowledge mechanisms is investigated from within the Modern Synthesis. The Modern Synthesis is the standard paradigm within evolutionary biology on how evolution occurs. This is based on the principle of evolution by natural selection as first introduced by Charles Darwin.

Furthermore, the products of cognitive evolution, such as language, science, or culture, are also understood to be the result of biological evolution, and it is assumed that in their emergence or structure an evolutionary pattern can also be found. The following example can illustrate this: the evolution of language or culture is at least partly the result of biological evolution. Hence, the same evolutionary mechanisms that are used to describe the evolution of cognition are also applicable to the products of cognition, such as language or culture. The EET program (Bradie, 1986) was introduced specifically for epistemological or scientific theories. The ways in which analogies are drawn between the evolution of science on the one hand and natural selection on the other are investigated within Evolution of Epistemological Theories.

Different evolutionary epistemologists are active within the various fields mentioned above and within extra-philosophical scientific fields, which makes it difficult to pinpoint the common assertions made by all evolutionary epistemologists. Adherents of an EEM position, for example, can object to the widely held idea, asserted by adherents of the EET program, that science also needs to be explained from within evolutionary epistemology. What binds evolutionary epistemologists is the idea that evolutionary theory, to some extent, can explain aspects of cognition.

3. EE and Naturalized Epistemology

What is so different about EE that it can be distinguished from all other epistemological endeavors? To answer this question, we need first to situate, and second to evaluate, EE in relation to other philosophical frameworks.

EE is part of the naturalistic turn. The naturalistic turn itself is a larger movement that emphasizes the importance of a sociology of knowledge, an anthropology of knowledge, and the historical study of knowledge. Evolutionary Epistemology in turn emphasizes the importance of the biology of knowledge. More specifically, the study of biological evolution is regarded as the precondition of all investigations into cognition (Wuketits, 1984: 2-19). Ultimately, EE even explains evolution itself as a cognitive process.

Furthermore, within EE, knowledge and cognition are no longer conceived of as necessarily proposition-like or language-like or human-bounded. As such, EE stands opposed to traditional philosophical approaches to cognition (such as empiricist and rationalist ones that understand knowledge to be language-like), and it also goes beyond Quine’s Naturalized Epistemology. In order to understand this, naturalized epistemology is first briefly discussed and then its difference from EE is explained.

Naturalized epistemology was first introduced by Quine (1969), who stressed that the study of science and scientific thinking should revolve around how knowledge is processed, rather than what knowledge is in itself. Therefore, he emphasized that we should reject the idea of a first philosophy. Within a first philosophy, it is assumed that philosophy can make claims about science without using the sciences; if philosophy were to make use of the sciences, this would be regarded as circular. Quine, however, stressed that we should investigate epistemology from within the natural sciences, more specifically, psychology:

The stimulation of his sensory receptors is all the evidence anybody has had to go on, ultimately, in arriving at his picture of the world. Why not see how this construction really proceeds? Why not settle for psychology? (Quine, 1969: 269-70) […] [A]t this point it may be more useful to say that epistemology still goes on, though in a new setting and a clarified status. Epistemology, or something like it, simply falls into place as a chapter of psychology and hence of natural science. (Quine 1969: 273-4)

Epistemology is defined as that discipline which studies exactly how our sense organs construct a picture of the world. The study of knowledge involves (1) an investigation into the relation between neural input and observational sentences, and (2) an investigation into the relation between theoretical and observational sentences. Hence, according to Quine, knowledge, or more specifically, cognition, is still understood to be language-like: it is assumed that somehow our neural input is transformed into verbal output. Quine thereby takes a rather behaviorist position, because he does not assess how our neurological abilities relate to language; the relation between sensory input and language is simply assumed to be direct.

Neurology today, however, has shown us multiple times that at the neurological or cognitive level there is no direct, and certainly no necessary, relation between our categorizations and our language (Changeux 1985; Gazzaniga 1994, 2000; Damasio 1996 and 1999; LeDoux 1998).

Furthermore, because of the rise of ethology and ecology (the study of the external behavior of animals in relation to their natural settings), cognition as a scientific concept has been broadened to include non-linguistic behavior as well.

It is here that evolutionary epistemology makes its entrée. Konrad Lorenz (1958), for example, was one of the founding fathers (together with Nikolaas Tinbergen) of ethology. Lorenz stressed the importance of a cognitivist approach to behavior, thereby also including internal behavior.

In contrast to Naturalized Epistemology, EE does not only examine the relation between human, language-like knowledge and the world. Any type of relation that an organism engages in with its environment is understood as a knowledge relation, irrespective of whether or not these organisms have language.

Munz (2001: 9) points out that what makes EE unique is that knowledge is comprehended as a cognitive relation between an organism and its environment. Empiricists, for example, understood knowledge to be a relation between a knower and something knowable by induction, while rationalists defined knowledge as a relation between a knower and something known by deduction. Even within the sociology of knowledge movement, knowledge is not understood in terms of the relation between an organism and its environment; rather, it is comprehended as a relation between different knowers.

What makes EE different from all other naturalistic approaches within philosophy is that it does not regard epistemology as a mere study of how a human knower comes to know what is knowable. Rather, EE studies how knowledge about the environment is gained across different species, and what knowledge-gaining mechanisms arise in biological organisms through time, enabling these organisms to cope with their environment. This means that within EE not only human cognition but all sorts of behavior that organisms at all levels of biological evolution display (ranging from instinctive behavior to cultural behavior or even chemotaxis – that is to say, communication between cells) are regarded as devices that are put to use to gain knowledge. Equally important, these mechanisms are also comprehended as knowledge in and of themselves.

Within EE, contrary to behaviorism, internal factors that determine behavior and cognition are also included. Because biological evolution led to the rise and acquisition of different cognitive/knowledge processes, this evolution itself is explained as a knowledge process.

4. Different EEs: The Units and Levels of Selection Debate

The units and levels of selection debate is taken as the point of departure to distinguish between different types of EE. EEs draw on evolutionary theory to explain epistemology or cognition. However, there are disagreements on what evolution in general is. Therefore different, sometimes complementary evolutionary theories are put forward by evolutionary biologists. It is only logical that this results in different evolutionary epistemologies. Three different perspectives are described to understand evolution and the different EEs that arise when using these perspectives:

  1. Evolution from the point of view of the environment, which led to traditional, adaptationist approaches to EE;
  2. Evolution from the point of view of the organism, which led to non-adaptationist, constructivist approaches; and
  3. Evolution from the point of view of genes, which opens the quest for universal selection formulas.

How did the units and levels of selection debate get started?

The Modern Synthesis (Ayala, 1978; Maynard Smith, 1993; Mayr, 1978), which is the standard paradigm on how biological evolution occurs, states very strictly that the phenotype (the visible organism) is the unit of selection. This phenotype is either selected at the level of the environment, if the organism is adapted to that environment, or the organism dies out, if it is maladaptive.

With the rise of Postneodarwinian theory on the one hand, and Systems Theory on the other, the debate over the units and levels of selection was introduced first in biology, and later within evolutionary epistemology. The primary question asked in this discussion is whether there are units and levels of selection other than the phenotype and the environment. The concept “units of selection” was coined by the biologist Richard Lewontin in his famous article of the same name from 1970. The concept “levels of selection” was introduced by Robert Brandon in his 1982 article of the same name. However, the discussion dates back to scientific debates in the 1960s between William Hamilton (1964) and George C. Williams (1966, chapter 4) concerning the possibility of group selection, and still further back in time to the 19th century, when Herbert Spencer introduced the “survival of the fittest” idea and applied it to human populations and society.

5. The Environment, the Adaptationist Program and Traditional EE

a. The Adaptationist Program

The concept “adaptationist program” was first introduced by Gould and Lewontin (1979) – but it is not subscribed to by these authors themselves. The adaptationist program regards “[…] natural selection as so powerful and the constraints upon it so few that direct production of adaptation through its operation becomes the primary cause of nearly all organic form, function, and behavior” (Gould and Lewontin, 1979: 584-5).

To understand this, the distinction between ontogeny (the development of an organism from conception until death) and phylogeny (the evolution of species) is in order. Within Lamarckian theory, no strict separation between ontogenetic and phylogenetic processes is adhered to. Within this paradigm, also known as the inheritance of acquired characteristics, traits acquired during the lifetime of an individual can be passed on immediately to the next generation.

With the introduction of Darwin’s principle of natural selection, for the first time in history it was possible to distinguish between ontogenetic and phylogenetic processes, because of the distinction that is made between the inner and the outer world of the organism (Lewontin, 2000: 42-3). The inner milieu of the organism is, according to Darwin, subjected to, amongst other things, developmental growth processes that are not themselves subjected to evolution by natural selection. The outer environment, by contrast, is the sole scene where evolution by natural selection occurs. Here the environment either does or does not select an organism. Regarding the inner milieu of the organism, Darwin himself quite often made use of Lamarck’s theory. He used it as an explanation for how novel individual variation arises. Natural selection was never interpreted by Darwin as being the cause of the variation; in fact, he did not know how variation occurred. Therefore, he invoked Lamarck’s principle of the inheritance of acquired characteristics. Natural selection only selected amongst the given variation.

These ideas were later incorporated into the Modern Synthesis. Organisms vary. This variety is the result of, on the one hand, the specific combinations of genetic material that an organism carries, and on the other hand, possible random mutations that occur within these genes. One acquires the genetic material that one carries at birth; thus no child can choose its specific genetic code. And the genetic mutations that sometimes occur do so randomly; they are blind. That is to say, mutations are random errors that occur during the copying of this genetic material. The genetic material that one carries can be neutral, adaptive or maladaptive for the carrier in the “struggle for existence.” The point, however, is that from this perspective, the organism itself cannot by any means whatsoever influence the genetic material that it carries. Eventually, it is the environment that indirectly selects adaptive organisms through the elimination of the unfit. Thus, the Modern Synthesis views this selection process as taking place between the phenotype and the environment. And the selection process itself is said to occur only externally: the “level of selection” is the external environment, and the selection of the “unit of selection,” the organism, occurs independently of internal processes such as developmental growth.


Figure 1. The adaptationist approach focuses on the external relation
between the environment and the organism.

Thus, within the adaptationist approach the organism and the environment are conceived as two separate entities that only interact during the selection process but develop independently from one another (fig. 1).

Adaptation is literally the process of fitting an object to a pre-existing demand… Organisms adapt to the environment because the external world had acquired its properties independently of the organism, which must adapt or die. (Lewontin, 2000: 43)

In other words, Neodarwinian theory adheres to a strictly dualistic viewpoint (Gontier, 2006) regarding organism and environment: the organism is passively selected, or not, by an active environment. The organism cannot influence its chances of survival or fitness. For this reason, according to Lewontin (1978), one can defend the position that, because of the emphasis they lay on adaptation, Neodarwinians explain evolution from the point of view of the environment. Hence, they actually give a description of the environment through the organism, rather than describing the organism itself.

b. Traditional EE

It is the latter position that has been one of the basic tenets of traditional EE, namely, that one is able to gain knowledge about the environment by studying the organisms that live in it, because organisms literally “re-present” the outer world.

What does this mean? Logical empiricism failed in providing a non-arbitrary relation between the world and human language. However, the search for such a non-arbitrary relation between the outer world and the organisms that inhabit that world was continued from within the adaptationist approach. In this position it is assumed that there is an unchangeable outer world to which organisms adapt. If it is true that organisms are adapted to the outer world, and that all and only the fit survive and reproduce in the long run, then these adaptive organisms can tell us something about that environment. An ant, for example, can tell us something about the soil.

This section provides an overview of the major traditional evolutionary epistemologies and how they developed out of the adaptationist view of evolution.

i. Karl Popper

Beginning with Sir Karl Popper’s (1963) ideas concerning conjectures and refutations (also called trial and error), the following position is defended within traditional EE: there is a growth of (scientific) knowledge which is comparable to the succession of adaptations in evolution. The task of EE thus becomes explaining this growth.

Adhering to the strict distinction made between ontogeny and phylogeny, it is argued that at no stage during evolution does an organism receive knowledge from the outer world. Bold conjectures are made about the outer world and if these hypotheses are not falsified by experiments performed by the scientific community, they survive. In the long run, unfit theories are eliminated by the process of falsification, and there is a growth in knowledge. Theories that survive longer than others are understood to be tentatively corroborated. The analogy with biological evolution is clear: a selectionist account is preferred over an instructionist one. This means that at no point does an organism choose its genetic endowment. However, if this organism, with the genetic endowment that it is born with, stands the test of the environment, that is, if it survives long enough so that it can reproduce, then the organism’s genetic traits survive, and it is said to be adapted to its environment. In the long run, only the fit survive; maladaptive organisms are not able to survive long enough to reproduce and spread their genes in the gene pool, and therefore die out.

Thus, just as the Modern Synthesis stresses that an organism can by no means directly receive instructions from the environment, Popper (1963: 46) emphasizes that we force our interpretations upon the world prior to our observations: “Without waiting, passively, for repetitions to impress regularities upon us, we actively try to impose regularities upon the world.” These are the conjectures that are put forward for trial, to be selected or eliminated according to the test results. Scientific theories are thus not the result of observations, but of bold hypotheses. Although Popper himself is not part of the field of EE, his work on conjectures and refutations is often regarded as an early account of EE.

ii. Konrad Lorenz

Konrad Lorenz is also a representative of traditional EE, since he too worked within the adaptationist program. Lorenz (1941, 1985) is famous for reinterpreting Kant’s synthetic a priori claims. No longer are the inborn categories regarded as evidently true; rather, they are understood to be “ontogenetically a priori and phylogenetically a posteriori.” This means that an individual organism is born with innate dispositions. These innate dispositions are acquired phylogenetically, through the evolution of the species, by means of the mechanism of natural selection. Most importantly, these dispositions are fallible, because they are the result of selection, not instruction. That is, these dispositions are adaptations, and natural selection only weeds out maladaptive organisms, which results in the survival of the adaptive ones. According to the Modern Synthesis, at no time in evolution does natural selection actually cause or create the adaptive traits that are presented to the environment (again because of the strict distinction made between ontogeny, where natural selection does not work, and phylogeny, where it does).

According to Lorenz, and contrary to Kant, the thing in itself (das Ding an sich) is knowable, though through the categories of the knower rather than through the characteristics of the thing in itself; through adaptation, selection results in a partial isomorphism between the two. Lorenz states that:

The central nervous apparatus does not prescribe the laws of nature any more than the hoof of the horse prescribes the form of the ground. Just as the hoof of the horse, this central nervous apparatus stumbles over unforeseen changes in its task. But just as the hoof of the horse is adapted to the ground of the steppe which it copes with, so our central nervous apparatus for organizing the image of the world is adapted to the real world with which man has to cope. (In Campbell, 1974: 447)

Thus, through adaptation, there is a correspondence between our images of the world and the world in itself, or between organism and environment, or between theories and the world. This is of course not a one-to-one correspondence; our image of a tree is not like a real tree, but because our cognitive apparatus is adapted to the world, there is a partial isomorphism between the two. Adaptations thus become a description of the world in a biological language (Lorenz, 1977).

The reinterpretation of Kant’s synthetic a priori claims is not solely the work of Lorenz; rather it dates as far back as Herbert Spencer. For the most complete overview of authors who have reinterpreted Kant’s ideas in this way, see Campbell (1974).

iii. Donald Campbell

Donald T. Campbell (1974) goes one step further than Lorenz because he rethought the distinction between ontogeny and phylogeny. No longer is natural selection something that works solely at the level of the environment; natural selection is internalized as well. Furthermore, the mechanism of natural selection is said to operate selectively in its own workings, too.

Campbell’s (1959: 153-5) main goal was to develop an empirical science of induction (not to be confused with behaviorist instruction; see section 1). This empirical science consisted of a comparative study of the psychology of knowledge, a biological science of cognition, a sociology of knowledge, and a science of history. In other words, he wanted to build a science of science, which Campbell (1974) termed EE. This discipline had to be compatible with evolutionary biology and social evolution (Campbell, 1974: 413). In his 1959 paper he characterized biology as the study of “progressive adaptation.” Therefore, he made an abstraction of the mechanism of natural selection by introducing the blind-variation-and-selective-survival mechanism (Campbell, 1959). Later he would call it the blind-variation-and-selective-retention scheme (Campbell, 1960).

Campbell’s (1959: 156-8) EE is based upon six philosophical assumptions:

  1. Hypothetical realism: EE acknowledges as a hypothesis the existence of an external world where entities exist and processes occur. This differs from Popper’s critical realism in that the existence of the world in itself also needs to be proven through observation.
  2. No first philosophy: EE rejects the idea of a first philosophy, subscribing rather to the view that knowledge needs to be explained using scientific knowledge.
  3. No distinction between human beings and animals is adhered to. On the contrary, it is fully acknowledged that human beings are animals.
  4. EE is an “epistemology of the other one” as Campbell (1974: 448) calls it. This means that EE raises the question of how organisms come to know, not how a knower acquires knowledge. That is to say, it studies the relationship between an organism’s cognitive capacities and the environment that it is selected to cognize.
  5. Epistemological dualism: there is a difference between what is knowable and what is known. Knowledge always constitutes indirect and fallible constructions that never completely correspond with the thing in itself.
  6. Perspectivism: each of the different hypotheses that are formed provides another perspective. These can partially overlap, but also differ from one another. In the latter case, different positions can be regarded as equal.

According to Campbell, science was only one aspect of a general knowledge process, and this process was hierarchical in nature. Knowledge is no longer merely language-like and bound to humans. On the contrary, different biological and social layers can be distinguished, each of which encompasses a different aspect of knowledge. And here too, the focus lies on the acquisition and growth of knowledge.

In his 1959 article, Campbell distinguishes between 12 knowledge processes. These range from machines on the one hand to bisexuality, heterozygosis, and meiotic cell division on the other. In his 1960 article Campbell discusses creative thinking as a separate learning process.

Finally, in his 1974 article he distinguishes ten different levels that are applicable to biological and social evolution. This is the last and most canonical hierarchy that Campbell (1974: 422-435) introduced, and it is these ten levels that are now discussed.

(i) Non-mnemonic problem solving

Organisms that engage in non-mnemonic problem solving do not have a memory. Bacteria, for example, are such organisms. They blindly search for food until they find it: they cannot remember previous food sources, and they cannot voluntarily go to one. They are just swept away by the wind.

(ii) Vicarious locomotor devices

Examples are the echolocation of bats, or a blind man’s cane. Such devices substitute for the blind, trial-and-error exploration of the surrounding space.

(iii) Habit and (iv) Instinct

Habit, instinct, and visual diagnosis are all closely related to each other, according to Campbell. Both instincts and habits are mostly founded upon visual stimuli that trigger a learned or innate response. Innate knowledge does not represent innate ideas; rather it corresponds to expectations or hypotheses that have no prior validity. Therefore, the distinction between “primitive instincts” and “learned habits” is false: all instincts are fine-tuned by learning processes and all learning makes use of inborn knowledge mechanisms. And both are hypotheses that need to be tested. Furthermore, Campbell invokes the habit-to-instinct view popular in his time, namely that by means of natural selection, habits will become instincts (without explaining how this takes place).

(v) Visually supported thought

This can be thought of as insightful problem-solving. Organisms endowed with this knowledge process are able to perform insightful behavior when they can visually perceive their surrounding environment. Campbell offers as an example the Köhler experiments, where primates are capable of showing some kind of “aha” experience.

(vi) Mnemonically supported thought

Organisms with memory capacities can re-present the environment, thereby replacing the need for a constant visually perceivable environment. Because one can imagine the environment, one can also have creative and intelligent thoughts, of unseen or unexperienced things (such as a mermaid).

(vii) Socially vicarious exploration: observational learning and imitation

Trial and error exploration by one member of the community can replace the trial and error exploration by all the other members of society. This is because certain organisms are able to learn by observing others. Imitating others’ behavior reduces the need for each individual to invent a certain behavior on its own. This implies that we live in a shared world; a solipsistic view is impossible. Campbell also stresses that learned behavior cannot jump from brain to brain; rather, it needs to be learned in turn by trial and error. So a memetic position is not feasible in Campbell’s view.

(viii) Language

Language overlaps with (vi) and (vii) and is broadly conceived as including human language but also other communication systems such as bee language and pheromones. With language, the environment is represented by words that are contingently chosen (they do not necessarily correspond with the world; the relation is indirect). Language acquisition, too, does not merely encompass the direct passing on of words to children. Children, through trial and error, learn to correctly use the words they hear to describe certain objects and/or events, which again implies a strictly behavioristic model.

(ix) Cultural transmission

Changes in technology and culture also exhibit a blind variation and selective retention scheme. Complete social organizations are either selected or not, and the behavior of their respective leaders replaces the behavior of the members of the community.

(x) Science

Science is part of cultural evolution, and science too reveals a trial and error pattern.

Many of the above-mentioned knowledge mechanisms that Campbell introduced have today been further divided or redefined. Nevertheless, it was Campbell who first so clearly distinguished between different knowledge processes, thereby showing that knowledge is not to be understood in a uniform manner.

Campbell’s more general blind-variation-and-selective-retention scheme, which is supposed to run through all levels of the hierarchy, is still applied today.

All increases in knowledge or adaptivity constitute an inductive process, and adaptivity is also comprehended as knowledge (Campbell, 1960). This differs from an instructionist process, because at no time is the organism a blank slate that is written upon by the environment. While natural selection does not cause blind variation, in a way it does cause indirect selective retention, through the elimination of the unfit. “At no stage has there been any transfusion of knowledge from the outside, not of mechanisms of knowing, nor of fundamental certainties” (Campbell, 1974: 413). Therefore, according to Campbell (1960: 380-381):

  1. All knowledge-gaining-processes or inductive achievements are the result of a blind-variation-and-selective-retention scheme. The latter is thus a universal schema or heuristic that can account for the evolution of these different processes.
  2. Furthermore, within the course of evolution, one can distinguish between many later-evolved processes that shortcut full blind-variation-and-selective-retention processes. Vision, for example, shortcuts blind trial and error locomotion. Such new mechanisms are also inductively achieved (by natural selection). The process by which these inductively achieved mechanisms shortcut and accelerate earlier mechanisms is called vicarious selection. This concept is derived from the Christian vicar, because such shortcuts substitute for earlier mechanisms in the way that a vicar substitutes for God. What is important is that knowledge mechanisms that are acquired later are (again because they are inductively achieved) not necessarily more accurate; they are only more efficient (Campbell, 1959: 162). These shortcuts themselves evolved through a process of blind-variation-and-selective-retention. And later stages partly determine earlier stages of knowledge processes, which Campbell (1974) termed downward causation.
  3. Finally, these shortcuts have not only evolved by blind-variation-and-selective-retention; in their very operation, a blind-variation-and-selective-retention process can also be detected. Thus it is Campbell who first states clearly that not only does a selection process lie at the basis of evolution, but also that this selection process itself operates according to such a scheme.

In his 1995 article (published posthumously in 1997 by Heyes and Frankel), Campbell rejected his earlier ideas about treating adaptations as knowledge and restricted knowledge to the vicarious selectors themselves. In fact, the whole adaptationist approach became more and more problematic to Campbell (1987: 140) in his later writings, and he began to emphasize that Panglossian adaptationism needs to be avoided at all times within EE. Retention is just as important as variation and selection, especially where science is concerned.

iv. Stephen Toulmin

With specific regard to scientific thinking, a strong analogy with natural selection is drawn in the works of Stephen Toulmin (1972). Ideas and concepts are the results of scientific thinking and these are, by analogy with the gene pool, introduced into the pool of scientists through science journals, conferences, books, and so forth, leading to competition between different ideas. Only the fittest ideas survive while the less fit die out. However, this “fitness” is not solely the result of the scientific value of the idea; other factors enter into the equation. For example, sociological reasons are included as causal factors for why an idea is or is not rejected.

v. Peter Munz

Peter Munz, another author working within the adaptationist program, calls his version of EE “Philosophical Darwinism” (2001). In contrast to the authors previously discussed, Munz states that even variation, which is normally conceived of as being blind (the result of random mutations and genetic recombinations), is the result of a selective process. Inspired by the works of Popper, he goes so far as to state that organisms are “embodied theories,” and theories are “disembodied organisms.”

According to Munz (2001: 151-160), every organism is a theory about its environment. That is, an organism primarily gives knowledge about the environment. Moreover, an organism can be regarded as a definition of that environment. An organism mirrors its environment because of selective adaptation. Therefore, an organism literally becomes a not yet falsified theory of a certain aspect of the environment, its Umwelt/niche, and thus it becomes a provisionally true hypothesis. A theory/organism — the two are synonymous in Munz’s view — has certain expectancies about its environment, and if these are met, then the organism/theory survives; if not, the organism/theory is falsified. The longer an organism/theory survives, the more truth is approximated.

The behavior of a fish and the functioning of a theory of water are exactly identical. The fish represents water by its structure and its functioning. Both features define an initial condition (for example, the degree of viscosity of water) which, when spotted or sensed, trigger off a prognosis or behavioral response which, in case of a fish, fails to be falsified. By contrast, a bird does not represent water. (Munz, 2001: 155)

Thus, an organism is an embodied theory about its environment. An organism re-presents that part of the world that it is adapted to and this representation is thus no longer verbal or conscious. Embodied theories, according to Munz, are also no longer expressed in language, but in anatomical structures or reflex responses, etc.

Besides regarding organisms as embodied theories, theories become disembodied organisms in Munz’s view. A human being is both because it possesses linguistic knowledge. Linguistically expressed theories, according to Munz (2001: 160-8), are also the result of a process of variation and selective retention. Here too, linguistically expressed theories are literally organisms. In the wake of Popper, Munz stresses that theories should be reified. Linguistic theories are built up from language, and there exists no causal link between this language and the causal impact that the world has upon the non-linguistic body. Therefore, language and consciousness create uncertainty: expressions can only be hypothetical. In addition, at first language appears to be maladaptive, since it delays non-linguistic, embodied responses. Nevertheless, such expressions are adaptive as well, because they enable variation. Selection can only work when there is variation which it can select from, and therefore, for Munz, the growth of scientific linguistic knowledge is possible.

In contrast to previous adaptationist EEs, according to Munz, this variation is also the result of selectionist processes. Ultimately, Munz (2001: 184) stresses that his theory results in an anthropic principle. With the origin and evolution of life, the world represents itself, onto itself, through disembodied organisms and embodied theories. Contrary to physics, it is biology that can give us a valid picture of how the world is.

In summary, within traditional evolutionary epistemological accounts, the strict distinction between phenotype and environment, as put forward by adherents of the Modern Synthesis, is adhered to. This leads to the possibility that one can gain knowledge about the environment by studying the organisms that are adapted to that environment. Thus, within this tradition it is assumed that organisms can provide a non-arbitrary relation, not between language and the outer world, but between whole organisms (their bodies) and the outer world. This position, however, encounters problems when one takes an organismic point of view, a position that will be discussed in the next section.

6. Evolution from the Point of View of the Organism

When evolution is regarded from an organismic point of view, a constructivist account emerges, which in turn leads to the non-adaptationist approach within EE. Therefore, the constructivist approach is examined first. Second, the elements drawn from this approach for the development of the non-adaptationist approach to EE are outlined.

a. The Constructivist Approach

Following Lewontin and Gould’s critical review of the adaptationist program, evolutionary theory was interrogated from less adaptationist perspectives as well. In opposition to the strict adaptationist account, the systems theoretical approach defends the following constructivist position.

…[T]he claim that the environment of an organism is causally independent of the organism, and the changes in the environment are autonomous and independent of changes in the species itself, is clearly wrong. It is bad biology, and every ecologist and evolutionary biologist knows that it is bad biology. The metaphor of adaptation, while once an important heuristic for building evolutionary theory, is now an impediment to a real understanding of the evolutionary process that needs to be replaced by another. Although all metaphors are dangerous, the actual process of evolution seems best captured by the process of construction. (Lewontin, 2000: 48)

Instead of portraying organisms as passive elements that are subjected to selection, Lewontin (2000: 51-64) introduces a more constructivist approach to evolution in which five different aspects of the organism-environment relation are distinguishable.

  1. Organisms partly determine by themselves which elements from the external environment belong to their environment or niche, and they determine to a large extent how these different elements relate to one another. A shrub, for example, can be part of the habitat of a butterfly, while a tree is not.
  2. Organisms not only largely choose what is part of their environment; they also literally construct the environment that surrounds them. This process is called niche construction. Beavers, for example, build their own dams.
  3. Furthermore, organisms constantly change their environment in an active manner; every act of consumption is an act of production. The first photosynthetic organisms, for example, changed earth dramatically from an oxygen-low to an oxygen-rich planet.
  4. Through time, organisms learn to anticipate the external conditions that the environment provides. For instance, depending on certain environmental conditions, certain chordates are able to switch from a sexual to an asexual form. Other organisms hoard food for the winter.
  5. Finally, according to Lewontin, organisms modify signals that come from their surroundings through their biological build-up. That is to say, they modify external signals into internal signals to which their bodies are able to react. For example, if the external temperature rises, the molecules that form the organism do not simply start to tremble. Rather, an internal signal in the brain will lead to the release of certain hormones that cool the body down so that it does not overheat.

Hence, from within the systems theoretical approach, the relation between an organism and its environment is understood from a dialectical point of view (Callebaut & Pinxten, 1987: 41; Gontier, 2006).


Figure 2. Within systems theory, the focus lies not only on the mutual relation between the organism and
its environment; internal processes specific to the organism and/or the environment are also taken into account.

An organism is not only determined by the external environment; the organism can also, to a certain extent, determine its environment by constructing and reconstructing it in an active manner (fig. 2). Therefore, the concept “environment” is also broadened to include the inner environment, where inner homeostatic, self-regulating processes are responsible for an organism’s survival (points 4 and 5 above). Because of this, it is said that the constructivist approach explains evolution from the organismic point of view (Gutmann and Weingarten, 1990; Wuketits, 2006).

b. The Non-Adaptationist Approach within EE

The non-adaptationist approach to EE was first introduced by Franz Wuketits (1989). All adaptationist approaches to EE adhere to the view that it is possible, to an extent, to develop a correspondence theory. A correspondence theory states that there is a one-to-one correspondence between the environment and the organisms that live in it, or between theories and the world. For instance, the ant can tell us something about the soil. To make this claim feasible, natural selection needs to be reduced to, or at a minimum the emphasis must rest heavily on, the mechanism of adaptation. It is only through the mechanism of adaptation that such correspondence can be obtained.

In the wake of Ludwig von Bertalanffy, one of the founders of systems theory, the importance of the study of the whole organism is stressed, next to the study of the (adaptive) relation between the organism and the environment. Within systems theory, organisms are conceived of as partly open, partly closed systems. That is to say, organisms constantly take matter and energy from, and give matter and energy to, their environment, while they themselves maintain a “steady state” (Wuketits, 2002: 193). Later on, Prigogine (1996) would introduce the concept of “dissipative structures.” A whirlpool, for example, maintains its form while the water of which it is composed constantly changes. But once the water flow stops, the whirlpool no longer exists. Organisms are more than such dissipative structures. They are homeostatic systems, because not only can they self-regulate and self-organize, they can also maintain themselves to a certain extent. That is why it is said that organisms are partly open, partly closed systems: they receive matter and energy from, and donate it to, their environment, yet they also distinguish themselves from that environment and are able to construct their environment as well.

Developmental systems theory (DST) (Maturana and Varela 1980; Oyama 2000a and b; Dupré 2001) grew out of systems theory and, as the name suggests, it focuses on developmental processes. It understands organisms to be autocatalytic systems: systems which are able to self-organize and self-maintain, sometimes even despite the environment, due to the inner mechanisms they develop in order to survive, rather than because they are adapted to the environment they live in. Therefore, these inner mechanisms of self-organization and self-regulation are comprehended as causal factors that need to be part of the explanation of why organisms behave in a certain manner.

Within the non-adaptationist tradition of EE, being adapted does not mean that there is a one-to-one correspondence with the environment. Instead, being adapted implies having the ability to change the environment so as to make it livable for the organism, and thus to enhance survival. Adaptation thus becomes only one aspect that needs to be studied, together with non-adaptationist approaches. Wuketits (2006: 38-9) writes:

… a nonadaptationist view of cognition and knowledge and a nonadaptationist version of evolutionary epistemology (…) is mainly based on the following assumptions: (1) Cognition is the function of active bio-systems and not of blind machines that just respond to the outer world. (2) Cognition is not a reaction to the outer world but results from complex interactions between the organism and its surroundings. (3) Cognition is not a linear process of step-by-step accumulation of information but a complex process of continuous error elimination.

In sum, an EE based upon systems theoretical evolutionary theory is not anti-adaptationist (Wuketits 1995: 359-60). It is non-adaptationist because the world constantly changes due to the organisms that inhabit it, which makes it difficult to approximate a one-to-one correspondence.

Instead of adhering to such a correspondence theory, the non-adaptationist approach puts forward a coherence theory. Because of these processes of inner self-organization and self-regulation, and the possibility for an organism to partially (re)construct its environment, an organism is partly capable of creating its own habitat. Different organisms develop different habitats because they have evolved differently and have different inner mechanisms which enable them to cope with, and interact with, the outer world. Here, according to Wuketits (2006), it is not useful to ask which habitat is more real or more in correspondence with the world in itself (an sich), because every organism capable of surviving has proven that it is adequate. Therefore a coherence theory adheres to a functional notion of reality. What an organism, according to its own inner mechanisms of perception, perceives as real is real for that organism in its struggle for existence. If that organism is able to survive because of the way it perceives things, it is able to reproduce and reintroduce its genes into the gene pool. Wuketits (2006: 43) writes:

First, organisms do not simply get a picture of (parts of) reality, but develop, as was already hinted at, a particular scheme of reaction. … Second, the notion of a world-in-itself becomes obsolete or at least redundant. What counts for any organism is that it copes with its own world properly.

7. Evolution from the Point of View of Genes

Thus far we have examined the “organismic point of view” towards evolution defended by the systems theoretical approach, and the description of evolution from the “point of view of the environment” as is the case with the Modern Synthesis. A third and final alternative for describing evolution is the “gene’s eye view.” The gene’s eye view was introduced by Richard Dawkins (1976), following Williams (1966).

This approach opened the discussion concerning universal Darwinism (section 8) and introduced the important concept of a “replicator,” a concept that is often used within universal selectionism.

According to Dawkins (1982: 162), the unit of selection is not the phenotype but the replicator: “… any entity in the universe of which copies are made.” This replicator, contrary to the vehicles that temporarily house it, “…is potentially immortal… the rationale is that an entity must have a low rate of spontaneous, endogenous change, if the selective advantage of its phenotypic effects over those of rival (‘allelic’) entities is to have any significant effect” (Dawkins, 1982: 164).

A replicator carries information that can be copied. The example par excellence is genetic material, which, according to the specific sequence of its nucleotides (the building blocks of genes), encodes certain characteristics. Organisms, according to Dawkins, are mere vehicles that temporarily accommodate such information-carrying replicators. In the long run, because of their longevity, fecundity and copying-fidelity, these “selfish genes” outlive their temporary housing. Therefore, for Dawkins, the emphasis should lie on the replicator, not the individual organism. That is not to say that the environmental approach so characteristic of the Modern Synthesis is wrong, according to Dawkins; rather, it should be complemented with the gene’s point of view of evolution.

…[t]here are two ways in which we can characterize natural selection. Both are correct: they simply focus on different aspects of the same process. Evolution results from the differential survival of replicators. Genes are replicators; organisms and groups of organisms are not replicators, they are vehicles in which replicators travel about. Vehicle selection is the process by which some vehicles are more successful than other vehicles in ensuring the survival of their replicators. (Dawkins, 1982: 162)

It is the organism’s job to deliver its genes as quickly and faithfully as possible into the gene pool. “Vehicle selection is the differential success of vehicles in propagating the replicators that ride inside them” (Dawkins, 1982: 166). Every behavior an organism displays that is not reducible to the benefit of its genetic material is, from the point of view of the gene, futile and even unnecessarily costly. Organisms are only important insofar as they are able to propagate their genes. Therefore, although this view can be complemented with the Modern Synthesis, it stands opposed to the “organismic point of view.”

8. Universal Selection Mechanisms Repeated and Extended

Thus far we have seen that the units and levels of selection debate that started within biology also set off an evolutionary epistemological debate concerning the different units and levels of selection in science.

One of the main goals put forward by many evolutionary epistemologists is the development of a normative and explanatory framework that is based upon, or at least analogous to, evolutionary thinking. The quest for universal selection formulas that was launched as early as the nineteenth century was spurred again by this units and levels of selection debate. The goal of such a uniform universal formula is to explain not only biological evolution, but also the evolution of science, culture, the brain, economics, and so forth.

Scientists and philosophers alike have introduced different formulas that generalize and universalize natural selection and other evolutionary theories. Discussions in the field revolve around the question of whether there exists one universal selection formula which can be utilized to interpret all other kinds of evolutionary processes (including the evolution of culture, psychology, immunology, language, etc.), or whether such formulas can only help at a descriptive, and therefore, merely analogical, level. In what follows, different evolutionary frameworks are briefly touched upon so that the interested reader has an idea of where to look for different applications of these schemas.

a. Lewontin’s "Logical Skeleton" of Natural Selection

Lewontin (1970: 1) was the first to make an abstraction of natural selection. He argued that “the logical skeleton” of Darwin’s theory is “a powerful predictive system for changes at all levels of biological organization.” Lewontin distinguishes between three principles: phenotypic variation, differential fitness (because of different environments), and the heritability of that fitness. Lewontin (1970: 1) introduced this logical skeleton to pinpoint “different units of Mendelian, cytoplasmic, or cultural inheritance.” He distinguished between the selection of molecules (regarding the origin of life), cell organelles (regarding cytoplasmic evolution), cellular selection (different cell types divide at different rates, comparable to what today is called epigenetics), gametic selection, individual selection, kin selection, and population selection.
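
Lewontin’s skeleton is abstract enough to be rendered as a short simulation. The following sketch (in Python) is merely illustrative and is not Lewontin’s own formalism: the population, the fitness function, and the mutation operator are hypothetical stand-ins, but the loop instantiates exactly his three principles of variation, differential fitness, and heritability.

    import random

    def evolve(population, fitness, mutate, generations=100):
        """A toy rendering of Lewontin's three principles."""
        for _ in range(generations):
            # Principle 2, differential fitness: reproduction is weighted
            # by how well each phenotype fares in its environment.
            weights = [fitness(ind) for ind in population]
            parents = random.choices(population, weights=weights, k=len(population))
            # Principles 3 and 1, heritability with variation: offspring
            # resemble their parents, but copying is slightly imperfect.
            population = [mutate(p) for p in parents]
        return population

    # Illustrative run: phenotypes are numbers and fitness peaks at 10,
    # so the population should drift toward that value over the generations.
    final = evolve(
        population=[random.uniform(0, 20) for _ in range(50)],
        fitness=lambda x: 1.0 / (1.0 + abs(x - 10.0)),
        mutate=lambda x: x + random.gauss(0, 0.1),
    )

Nothing in this loop is specific to genes: any substrate exhibiting variation, differential fitness, and heritability will evolve under it, which is precisely what qualifies the skeleton as a candidate universal formula.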

b. Universal Darwinism

Dawkins (1983: 15) states that wherever life originates, that life can only be explained by using Darwin’s theory of natural selection. According to Dawkins, the most important property of life is that it is adapted to its environment, and adaptation requires a Darwinist explanation. Dawkins (1983: 16) states: “I agree with Maynard Smith […] that ‘The main task of evolution is to explain complexity, that is, to explain the same set of facts which Paley used as evidence of a Creator.’”

Organisms are “adaptively complex” (Dawkins, 1983: 17). This means that a complex structure like the eye, for example, evolved by natural selection for vision. Organisms or organismal traits are adapted to the environment and also evolved to enable adaptation towards that environment. Thus, through adaptation, an organism possesses information about that environment (Dawkins, 1983: 21). Selection refers to “…the non-random selection of randomly varying replicating entities by reason of their ‘phenotypic’ effects” (Dawkins, 1983: 32). It can be further divided into “one-off selection” and “cumulative selection.” The former relates to the selection of a stable configuration, a universally occurring process. The latter enables complex adaptation, because the next generation builds upon earlier generations through such things as the passing on of genes, but not solely by this mechanism.

Most importantly, for Dawkins, it is replicators that are selected. The reason that he introduces the concept “replicator” is twofold. First, he wants to extend the Modern Synthesis by introducing the gene’s eye view. Second, he introduces the term replicator, instead of gene, because he wants to universalize the principle of natural selection. The unit of selection, according to Dawkins, is the replicator, but replicator is a generic term: not only genes (individual genes or whole chunks of the chromosome) but also memes, which he defines as “… brain structures whose ‘phenotypic’ manifestation as behavior or artifact is the basis of their [cultural] selection,” are replicators (Dawkins, 1982: 164). The idea of memetics was later expanded by Blackmore (1999).

c. Blind Variation and Selective Retention

Campbell’s scheme is a formula that can be universalized. Every relationship that an organism engages in with its environment is a knowledge relation. Variation is blind, either because of random mutations and genetic recombinations, or, in the case of the development of scientific theories, because blind trials result in blind variation.

Selection does not only occur at the level of the interaction between phenotype and environment, for selection is also internalized by the process of vicarious selection (see above). And trial and error learning has always been somewhat synonymous with blind-variation-and-selective-retention, according to Campbell.

In his earlier writings, Campbell (1959, 1960) emphasizes the notion of variation, because only when there is sufficient variation will there be competition and selection. Later, he emphasized the selective-retention part of his theory: those traits that are already adaptive also need to be retained by the current generation in order to remain adaptive. In science as well, existing theories must be retained and passed on to the next generation through learning, or this information dies out. Hence tradition within culture or science, for example, also became a more important element in Campbell’s later writings (1987).
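
Because the scheme is substrate-neutral, it can be stated as a bare search loop. The Python sketch below is only one possible reading, not Campbell’s own notation; vary and fit_enough are hypothetical placeholders for whatever generates variants (mutation, guessing, conjecturing) and whatever eliminates them (the environment, an experiment, a community of critics).

    import random

    def bvsr(retained, vary, fit_enough, rounds=1000):
        """Blind-variation-and-selective-retention as a generic loop."""
        for _ in range(rounds):
            # Blind variation: a candidate is generated without any
            # foresight of whether it will pass the test.
            candidate = vary(random.choice(retained))
            # Selection: the environment or the test eliminates unfit variants.
            if fit_enough(candidate):
                # Retention: survivors are kept and become the basis
                # for the next round of blind variation.
                retained.append(candidate)
        return retained

On this reading, a vicarious selector would be a cheaper, inductively acquired stand-in for fit_enough, itself the product of an earlier run of the same loop.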

d. Universal Selectionism

The concept “universal selectionism” was first introduced by Gary Cziko (1995) and roughly corresponds with Campbell’s blind-variation-and-selective-retention scheme, although Cziko prefers the term selectionism. In his 1995 book, Cziko explains this scheme as being applicable not only to biological evolution, but also to the evolution and growth of knowledge, immunology, and the development of the brain, thinking and culture. Selectionism is the only theory that, according to Cziko (1995: 13-26), can explain the fit of an organism with its environment. Throughout history, providentialism and instructionism have also been assumed to explain this fit, but only selectionism can explain the mechanism of adaptation.

e. Replication, Variation and Environmental Interaction

The replication, variation, and environmental interaction scheme was first introduced by David Hull (1980) as a critique of Dawkins’s notion of replicators and vehicles. In Dawkins’s view, organisms are mere vehicles that temporarily accommodate the selfish genes that ride inside them, and an organism can actually be equated with the workings of its genes. Hull’s theory differs from Dawkins’s, because the former states that organisms can display behavior that is not reducible to their genes. On a more general level, Hull (1980) introduced the notion of an interactor to complement Dawkins’s view. Thus, he basically re-introduced the common assumption held by the Modern Synthesis that what interacts with the environment are organisms, not genes. But the notion of interaction can also be universalized. The most recent account of this formula is given in Hull, Langman and Glenn (2001).

For selection to occur, three conditions need to be met: replication, variation, and environmental interaction. Replication is dependent on the interaction between the organism and its environment (Hull, Langman and Glenn, 2001: 511). The formula they propose should be equally applicable to biology, immunology and operant behavior, although it should not be identical to biological selection theory. All three sorts of evolution share certain properties but also have their own peculiarities. Changes in operant behavior, for example, are not transmitted immediately to the next generation.

In contrast to Campbell and Plotkin, Hull, Langman, and Glenn (2001: 513) define selection as “[The] repeated cycles of replication, variation, and environmental interaction so structured that environmental interaction causes replication to be differential. The net effect is the evolution of the lineages produced by this process.”

Within postneodarwinian theory, variation is either perceived as part of the selection process, or as a precondition for selection to occur. If variation occurs, this results either from mutations that occur in the sex cells at the biological level, or from different behavioral patterns that in their own right are the result of environmental interaction. Replication, according to these authors (Hull, Langman and Glenn, 2001: 514-6), concerns the repetition/copying of “information.”

Finally, environmental interaction is characterized as what causes replication to be differential: certain replicators are more frequently selected than others, which in itself has nothing to do with the introduction of new variation. Only at the level of interaction between the organism and the environment does selection occur.
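
Read as an algorithm, Hull’s definition separates the entity that is copied (the replicator) from the entity that faces the environment (the interactor). The following Python sketch is a schematic reading of the 2001 definition, not the authors’ own formalism; build_interactor, survives, and copy are hypothetical placeholders.

    def hull_cycle(replicators, build_interactor, survives, copy, rounds=100):
        """Repeated cycles of replication, variation, and environmental
        interaction, so structured that environmental interaction causes
        replication to be differential."""
        for _ in range(rounds):
            survivors = []
            for rep in replicators:
                # Environmental interaction occurs at the level of the
                # interactor (the organism), not the replicator itself.
                interactor = build_interactor(rep)
                if survives(interactor):
                    survivors.append(rep)
            # Replication is differential: only replicators whose interactors
            # passed the environmental test are copied; imperfect copying is
            # where new variation enters.
            replicators = [copy(rep) for rep in survivors] or replicators
        return replicators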

Hull’s scheme is one of the few schemes that have already been implemented in extra-philosophical and extra-biological fields. William Croft (2000, 2002), for instance, uses it for the study of language change.

f. Generate-Test-Regenerate / Replicator-Interactor-Lineage

Plotkin prefers the notion of “universal Darwinism” over universal selectionism (1995, chapter 3). He distinguishes between two universal formulas. The first, the generate-test/selection-regenerate formula, is more general. It does not a priori say anything about the mechanisms or units that cause this generating and testing. This formula is again very close to Campbell’s scheme, as well as Lewontin’s (Plotkin, 1995: 84). A second formula does specify the units and mechanisms: replication, interaction and lineages. The reason Plotkin distinguishes between the two is that he wants to avoid having to pinpoint a priori a replicator in cultural evolution.

Selection processes, according to Plotkin, always take place in three steps. First, there is the generation of variation; the nature of the variation does not in itself need to be specified (genes, phenotypes, theories, and so forth can all vary). This phase is always followed by a test phase, in which natural selection is of course the prototypical way that selection based upon the test results occurs. Finally, there is regeneration of old and newly evolved varieties (Plotkin, 1995: 84). While it is obvious that Plotkin mainly has the selection of genetic material in mind here, he also sees his formula as appropriate for explaining learning and intelligence. How information is transmitted is not determined a priori; rather, what is important is that old variations are regenerated throughout time.
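
The generate-test-regenerate formula differs from a bare selection loop mainly in its explicit regeneration step, which carries old and newly surviving varieties forward together. A minimal Python sketch under that reading; generate and passes_test are illustrative placeholders, not Plotkin’s terms.

    def generate_test_regenerate(variants, generate, passes_test, rounds=100):
        """Plotkin's generate-test-regenerate heuristic as a bare loop."""
        for _ in range(rounds):
            # Generate: produce new variants; what varies (genes, behaviors,
            # theories) is deliberately left unspecified by the formula.
            pool = variants + [generate(v) for v in variants]
            # Test: natural selection is the prototypical test phase, but
            # any filter on the pool will do.
            survivors = [v for v in pool if passes_test(v)]
            # Regenerate: old and newly evolved varieties are carried
            # forward into the next round.
            variants = survivors or variants
        return variants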

The replicator-interactor-lineage formula is first an elaboration and specialization of Plotkin’s first formula since it combines Dawkins's notion of a replicator with Hull’s notions of an interactor and lineage, the latter term referring to “… entities that can change indefinitely through time as a result of replication and interaction.” (Plotkin, 1995: 97). Hull himself defines lineages as “… spatiotemporal sequences of entities that causally produce one another. Entities in the sequence are in some sense ‘descended’ from those earlier in the sequence” (1981: 146).

According to Plotkin (1995: xv), adaptation and knowledge are related in two ways: first, the capacity to acquire knowledge is in itself an adaptation; and second, adaptations are also a form of knowledge. Adaptations are “in-formed” by the environment. Therefore, adaptation is knowledge (Plotkin, 1995: 116) and there can be a tentative growth of knowledge.

g. Universal Symbiogenesis

SET, the Serial Endosymbiosis Theory of Lynn Margulis and Dorion Sagan (2002), is a theory that describes the origin of the five kingdoms. In brief, different bacteria merged and evolved into multi-cellular life. What is interesting here is that different bacteria literally merged, and thus that evolution does not exclusively occur according to speciation models. The physicist Freeman Dyson (1998) therefore introduces the principle of universal symbiogenesis, where symbiotic mergings and speciation models intertwine. Throughout the evolution of life, as throughout the evolution of the universe, there is an increase in diversification on the one hand and symbiogenesis on the other. Different structures originate and then later merge to form new structures. Within the evolution of life, there was the origin of the first microbial organisms, which then merged again and evolved into multi-cellular organisms.

Dyson defines universal symbiogenesis as “the reattachment of two structures, after they have been detached from each other and have evolved along separate paths for a long time, so as to form a combined structure with behavior not seen in the separate components” (Dyson, 1998: 121).

In conclusion, it can be said that the specific theory of evolution that one adheres to also partly determines what kind of evolutionary epistemology can be adhered to. Since evolutionary epistemology bases itself first on the sciences, no attempt is made by different evolutionary epistemologists to put forward one all-encompassing theory or program that all evolutionary epistemologists should adhere to. On the contrary, the diversity of evolutionary epistemologies is championed by scholars working in the field.

9. References and Further Reading

  • Ayala, Francisco J. 1978. “The Mechanisms of Evolution.” Scientific American 239 (3): 48-61.
  • Blackmore, Susan. 1999. The Meme Machine – with a foreword by Richard Dawkins. Oxford: Oxford University Press.
  • Bradie, Michael. 1986. “Assessing Evolutionary Epistemology.” Biology & Philosophy 1: 401-459.
  • Brandon, Robert N. 1982. “The Levels of Selection.” In: Brandon, Robert N.; and Burian, Richard M. (eds). 1984. Genes, Organisms, Populations: Controversies over the Units of Selection 133-9. Cambridge: Massachusetts Institute of Technology.
  • Brandon, Robert N.; and Burian, Richard M. (eds). 1984. Genes, Organisms, Populations: Controversies over the Units of Selection. Cambridge: Massachusetts Institute of Technology.
  • Callebaut, Werner; and Pinxten, Rik. 1987. “Evolutionary Epistemology Today: Converging Views from Philosophy, the Natural and Social Sciences.” In: Callebaut, Werner; and Pinxten, Rik, (eds.). 1987. Evolutionary Epistemology: A Multiparadigm Program With a Complete Evolutionary Epistemology Bibliography 3-55. Dordrecht: Reidel.
  • Callebaut, Werner. 1993. Taking The Naturalistic Turn or How Real Philosophy of Science Is Done. Chicago IL: The University of Chicago Press.
  • Campbell, Donald T. 1959. “Methodological Suggestions from a Comparative Psychology of Knowledge Processes.” Inquiry 2 (3): 152-83.
  • Campbell, Donald T. 1960. “Blind Variation and Selective Retention in Creative Thought as in Other Knowledge Processes.” Psychological Review 67(6): 380-400.
  • Campbell, Donald T. 1974. “Evolutionary Epistemology.” In: Schilpp, Paul A. (ed.), The Philosophy of Karl Popper Vol. I, 413-459. La Salle, IL: Open Court.
  • Campbell, Donald T. 1987. “Selection Theory and the Sociology of Scientific Validity.” In: Callebaut, Werner; and Pinxten, Rik (eds), Evolutionary Epistemology 139-58. Dordrecht: D. Reidel Publishing Company.
  • Campbell, Donald T. 1997. “From Evolutionary Epistemology Via Selection Theory to a Sociology of Scientific Validity: Edited by Cecilia Heyes and Barbara Frankel” Evolution and Cognition, 3 (1), 5-38.
  • Changeux, Jean-Pierre. 1985. Neuronal Man: The Biology of Mind. New York: Oxford University Press.
  • Croft, William. 2000. Explaining Language Change: An Evolutionary Approach. Essex: Pearson.
  • Croft, William. 2002. “The Darwinization of Linguistics.” Selection 3(1): 75-91.
  • Cziko, Gary. 1995. Without Miracles: Universal Selection Theory and the Second Darwinian Revolution. Cambridge: Massachusetts Institute of Technology.
  • Damasio, Antonio R. 1996 (1994). Descartes’ Error: Emotion, Reason and the Human Brain. London: Papermac [First published by New York: Grosset/Putnam].
  • Damasio, Antonio R. 1999. The Feeling of What Happens: Body and Emotion in the Making of Consciousness. New York: Harcourt Brace & Company.
  • Dawkins, Richard. 1976. The Selfish Gene. New York: Oxford University Press.
  • Dawkins, Richard. 1983. “Universal Darwinism.” In: Hull, David L.; and Ruse, Michael (eds) 1998, The Philosophy of Biology 15-35. Oxford: Oxford University Press [First published in: Bendall, D. S. (ed.), 1983. Evolution from Molecules to Men 403-25. Cambridge: Cambridge University Press].
  • Dawkins, Richard. 1982. “Replicators and Vehicles.” In: Brandon, N. R.; and Burian, R. M. (eds) 1984, Genes, Organisms, Populations 161-79. Cambridge: Massachusetts Institute of Technology Press.
  • Dupré, John. 2001. Human Nature and the Limits of Science. Oxford: Clarendon Press.
  • Dyson, Freeman. 1998. “The Evolution of Science.” In: Fabian, Andrew C. (ed.), Evolution: Society, Science and the Universe 118-35. Cambridge: Cambridge University Press.
  • Gazzaniga, Michael S. 1994. Nature’s Mind: The Biological Roots of Thinking, Emotions, Sexuality, Language, and Intelligence. New York: Basic Books.
  • Gazzaniga, Michael S. 2000. The Mind’s Past. California: University of California Press.
  • Gontier, Nathalie. 2006. “Introduction to Evolutionary Epistemology, Language and Culture.” In: Gontier, Nathalie, Van Bendegem, Jean Paul and Aerts, Diederik (eds), Evolutionary Epistemology, Language and Culture – A non-adaptationist systems theoretical approach1-29. Dordrecht: Springer.
  • Gould, Stephen J.; and Lewontin, Richard C. 1979. “The Spandrels of San Marco and the Panglossian Paradigm: A Critique of the Adaptationist Programme.” Proceedings of the Royal Society of London B 205: 581-598.
  • Gutmann, Wolfgang F.; and Weingarten, Michael. 1990. “Die biotheoretischen Mängel der evolutionären Erkenntnistheorie.” Journal for General Philosophy of Science 21: 309-328.
  • Hamilton, William D. 1964. “The Genetical Evolution of Social Behavior, I and II.” Journal of Theoretical Biology 7: 1-52.
  • Heyes, Cecilia; and Hull, David (eds). 2001. Selection Theory and Social Construction – The Evolutionary Naturalistic Epistemology of Donald T. Campbell. New York: State University of New York Press.
  • Hull, David L. 1980. “Individuality and Selection.” Annual Review of Ecology and Systematics, II: 311-32.
  • Hull, David L. 1981. “Units of Evolution.” In: Brandon, N. R.; and Burian, R. M. (eds) 1984, Genes, Organisms, populations 142-159. Cambridge: Massachusetts Institute of Technology Press.
  • Hull, David L. 1988. Science as a Process: An Evolutionary Account of the Social and Conceptual Development of Science. Chicago: The University of Chicago Press.
  • Hull, David L.; Langman, Rodney E.; and Glenn, Sigrid S. 2001. “A General Account of Selection: Biology, Immunology, and Behavior.” Behavioral and Brain Sciences 24: 511-573.
  • Ledoux, Joseph. 1998 (1996). The Emotional Brain: The Mysterious Underpinnings of Emotional Life. New York: Touchstone [First published by New York: Simon & Schuster].
  • Lewontin, Richard. 1970. “The Units of Selection.” Annual Review of Ecology and Systematics 1: 1-18.
  • Lewontin, Richard. 1978. “Adaptation.” Scientific American 239 (3): 157-69.
  • Lewontin, Richard. 2000. The Triple Helix: Gene, Organisms and Environment. Cambridge: Harvard University Press.
  • Lorenz, Konrad. 1941. “Kant’s Lehre vom Apriorischen im Lichte gegenwärtiger Biologie.” Blätter für Deutsche Philosophie 15: 94-125. (English translation in Plotkin, Henry C., op. cit. 121-143.)
  • Lorenz, Konrad. 1958. “The Evolution of Behavior.” Scientific American 199 (6), 67-78.
  • Lorenz, Konrad. 1977. Behind the Mirror. London: Methuen.
  • Lorenz, Konrad. 1985. “Wege zur Evolutionären Erkenntnistheorie.” In: Ott, Jörg A., Wagner, G. P. and Wuketits, F. (eds), Evolution, Ordnung und Erkenntnis. Berlin: Verlag, 13-20.
  • Margulis, Lynn; and Sagan, Dorion. 2002. Acquiring Genomes: A Theory of the Origin of Species. New York: Basic Books.
  • Maturana, Humberto; and Varela, Francisco. 1980. Autopoiesis and Cognition. Dordrecht: D. Reidel Publishing Company.
  • Maynard Smith, John. 1993 (1958). The Theory of Evolution. Cambridge: Canto [First published by Cambridge: Cambridge University Press].
  • Mayr, Ernst. 1978. “Evolution.” Scientific American 239 (3): 39-47.
  • Munz, Peter. 2001 (1993). Philosophical Darwinism: On the Origin of Knowledge by Means of Natural Selection. London: Routledge.
  • Oyama, Susan. 2000a. The Ontogeny of Information: Developmental Systems and Evolution. Durham: Duke University Press.
  • Oyama, Susan. 2000b. Evolution’s Eye: A Systems View of the Biology-Culture Divide. Durham: Duke University Press.
  • Plotkin, Henry. 1995 (1994). Darwin Machines and the Nature of Knowledge: Concerning Adaptations, Instinct and the Evolution of Intelligence. London: Penguin Books.
  • Popper, Karl. 1963. Conjectures and Refutations. London: Routledge & Kegan Paul.
  • Prigogine, Ilya. 1996. La fin des certitudes: temps, chaos et les lois de la nature. Paris: Odile Jacob.
  • Quine, William V. 1969. “Epistemology Naturalized.” In: Bernecker, Sven; and Dretske, Fred (eds), 2000. Knowledge: Readings in Contemporary Epistemology 266-78. Oxford: Oxford University Press [First published in Quine, William V., 1969, Ontological Relativity and Other Essays 69-90. New York: Columbia University Press. Original title: “Naturalized Epistemology.”].
  • Ruse, Michael. 1988. Taking Darwin Seriously. Oxford: Blackwell Publishers.
  • Toulmin, Stephen. 1972. Human Understanding: The Collective Use and Evolution of Concepts. Princeton: Princeton University Press.
  • Williams, George C. 1966. Adaptation and Natural Selection. Princeton: Princeton University Press.
  • Wuketits, Franz (ed.) 1984. Concepts and Approaches in Evolutionary Epistemology. Dordrecht: D. Reidel Publishing Company.
  • Wuketits, Franz M. 1985. “Die Systemtheoretische Innovation der Evolutionslehre.” In: Ott, Jörg A., Wagner, G. P. and Wuketits, F. (eds), Evolution, Ordnung und Erkenntnis. Berlin: Verlag, 68-81.
  • Wuketits, Franz M. 1989. “Cognition: A Non-adaptationist View.” La Nuova Critica 9-10, 5-15.
  • Wuketits, Franz M. 1990. Evolutionary Epistemology and its Implications for Humankind. New York: State University of New York Press.
  • Wuketits, Franz M. 1992. “Adaptation, Representation, Construction: An Issue in Evolutionary Epistemology.” Evolution and Cognition 2: 151-162.
  • Wuketits, Franz M. 1995. “A Comment on Some Recent Arguments in Evolutionary Epistemology: and Some Counterarguments.” Biology and Philosophy 10: 357-363.
  • Wuketits, Franz M. 2001. “The Philosophy of Donald T. Campbell: A Short Review and Critical Appraisal.” Biology and Philosophy 16: 171-88.
  • Wuketits, Franz M. 2002. “Ludwig von Bertalanffy (1901-1972) und die theoretische Biologie heute.” Naturwissenschaftliche Rundschau 55 (4): 190-194.
  • Wuketits, Franz M. 2006. “Evolutionary Epistemology – The Non-Adaptationist Approach.” In: Gontier, Nathalie, Van Bendegem, Jean Paul and Aerts, Diederik (eds), Evolutionary Epistemology, Language and Culture – A Non-Adaptationist Systems Theoretical Approach 33-46. Dordrecht: Springer.

Research for this article was supported by the Fund for Scientific Research - Flanders (F.W.O.-Vlaanderen) and the Centre for Logic and Philosophy of Science, where the author is a Research Assistant.

Author Information

Nathalie Gontier
Email: Nathalie.Gontier@vub.ac.be
Vrije Universiteit Brussel
Belgium

Models

The word “model” is highly ambiguous, and there is no uniform terminology used by either scientists or philosophers. Here, a model is considered to be a representation of some object, behavior, or system that one wants to understand. This article presents the most common types of models found in science as well as the different relations—traditionally called “analogies”—between models and between a given model and its subject. Although models were once considered merely heuristic devices, they are now seen as indispensable to modern science. Many different types of models are used across the scientific disciplines. The most familiar are physical models, such as scale replicas of bridges or airplanes. These, like all models, are used because of their “analogies” to their subjects. A scale model airplane has a structural similarity, or “material analogy,” to the full-scale version. This correspondence allows engineers to infer dynamic properties of the airplane from wind tunnel experiments on the replica. Physical models also include abstract representations, which often involve idealizations such as frictionless planes and point masses. Another, quite different, type of model consists of sets of equations. Such mathematical models were not always deemed legitimate models by philosophers. Model-to-subject and model-to-model relations are described using several different types of analogies: positive, negative, neutral, material, and formal.

Like unobservable entities, models have been the subject of debate between scientific realists and antirealists. One’s position often depends on what one considers the truth-bearers in science to be. Those who take fundamental laws and/or theories to be true believe that models are true in inverse proportion to the degree of idealization used. Highly idealized models would therefore be (in some sense) less true. Others take models to be true only insofar as they describe the behavior of empirically observable systems. This empiricism leads some to believe that models built from the bottom-up are realistic, while those derived in a top-down manner from abstract laws are not.

Models also play a key role in the semantic view of theories. What counts as a model on this approach, however, is more closely related to the sense of models in mathematical logic than in science itself.

Table of Contents

  1. Models in Science
  2. Physical Models
  3. Mathematical Models
  4. State Spaces
  5. Models and Realism
  6. Models and the Semantic View of Theories
  7. References and Further Reading

1. Models in Science

The word “model” is highly ambiguous, and there is no uniform terminology used by either scientists or philosophers. This article presents the most common types of models found in science as well as the different relations—traditionally called “analogies”—between models and between a given model and its subject. For most of the 20th century, the use of models in science was a neglected topic in philosophy. Far more attention was given to the nature of scientific theories and laws. Except for a few philosophers in the 1960s, Mary Hesse in particular, most did not think the topic was particularly important. The philosophically interesting parts of science were thought to lie elsewhere. As a result, few articles on models were published in the twenty-five years following Hesse (1966). [These include (Redhead, 1980) and (Wimsatt, 1987), and parts of (Bunge, 1973) and (Cartwright, 1983).] The situation is now quite different. As philosophers of science have come to pay greater attention to actual scientific practice, the use of models has become an important area of philosophical analysis.

2. Physical Models

One familiar type of model is the physical model: a material, pictorial, or analogical representation of (at least some part of) an actual system. “Physical” here is not meant to convey an ontological claim. As we shall see, some physical models are material objects; others are not. Hesse classifies many of these as either replicas or analogue models. Examples of the former are scale models used in wind tunnel experiments. There is what she calls a “material analogy” between the model and its subject, that is, a pretheoretic similarity in how their observable properties are related. Replicas are often used when the laws governing the subject of the model are either unknown or too computationally complex to derive predictions. When a material analogy is present, one assumes that a “formal analogy” also exists between the subject and the model. In a formal analogy, the same laws govern the relevant parts of both the subject and model.

Analogue models, in contrast, have a formal analogy with the subject of the model but no material analogy. In other words, the same laws govern both the subject and the model, although the two are physically quite different. For example, ping-pong balls blowing around in a box (like those used in some state lotteries) constitute an analogue model for an ideal gas. Some analogue models were important before the age of digital computers when simple electric circuits were used as analogues of mechanical systems. Consider a mass M on a frictionless plane that is subject to a time varying force f(t) (Figure 1). This system can be simulated by a circuit with a capacitor C and a time varying voltage source v(t). The voltage across C at time t corresponds to the velocity of M.

Figure 1: Analogue Machine
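
The formal analogy can be made concrete with a minimal sketch, assuming the force-current reading of the circuit: the mechanical law M dv/dt = f(t) and the electrical law C du/dt = i(t) have the same form, with the drive i(t) corresponding to f(t). The function name, the drive, and all parameter values below are invented for illustration; nothing here is from the original article.

    # A minimal numerical sketch of the formal analogy: the mass obeys
    # M * dv/dt = f(t) and the capacitor obeys C * du/dt = i(t); with the
    # assumed correspondences M <-> C and f(t) <-> i(t), the velocity v(t)
    # and the capacitor voltage u(t) trace out the same trajectory.

    def integrate(coeff, drive, t_end=1.0, dt=1e-4):
        """Euler-integrate coeff * dx/dt = drive(t) from x(0) = 0."""
        x, t = 0.0, 0.0
        while t < t_end:
            x += drive(t) / coeff * dt
            t += dt
        return x

    force = lambda t: 2.0 * t   # hypothetical time-varying drive f(t) = i(t)
    M = C = 0.5                 # matched mass and capacitance (illustrative values)

    v = integrate(M, force)     # velocity of the mass at t_end
    u = integrate(C, force)     # voltage across the capacitor at t_end
    print(abs(v - u) < 1e-9)    # the two models agree: prints True

Because the same equation governs both systems, measuring the cheap, safe circuit stands in for measuring the mechanical system, which is exactly what the engineers of the pre-digital era exploited.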

Today engineers and physicists are more familiar with simplifying models. These are constructed by abstracting away properties and relations that exist in the subject. Here we find the usual zoo of physical idealizations: frictionless planes, perfectly elastic bodies, point masses, etc. Consider a textbook mass-spring system with only one degree of freedom (that is, the spring oscillates perfectly along one dimension) shown in Figure 2. This particular system is physically possible, but nonactual. Real springs always wobble just a bit. If by chance a spring did oscillate in one dimension for some time, the event would be unlikely but would not violate any physical laws. Frictionless planes, on the other hand, are nonphysical rather than merely nonactual.

Figure 2: Physical Water Drop Model

Simplifying models provide a context for Hesse’s other relations known as positive, negative, and neutral analogies. Positive analogies are the ways in which the subject and model are alike—the properties and relations they share. Negative analogies occur when there is a mismatch between the two. The idealizations mentioned in the previous paragraph are negatively analogous to their real-world subjects. In a scale-model airplane (a replica), the length of the wing relative to the length of the tail is positively analogous, since the ratio is the same in the subject and the model. The wood used to make the model is negatively analogous, since the real airplane would use different materials. Neutral analogies are relations that are in fact either positive or negative, but it is not yet known which. The number of neutral analogies is inversely related to our knowledge of the model and its subject. One uses a physical model with strong, positive analogies in order to probe its neutral analogies for more information. Ideally, all neutral analogies will be sorted into either positive or negative. The early success of the Bohr model of the atom showed that it had positive analogies to real hydrogen atoms. In Hesse’s terms, the neutral analogies proved to be negative when the model was applied to atoms with more than one electron.

The use of “analogy” in this regard has declined somewhat in recent years. “Idealization” has replaced “negative analogy” when these simplifications are built into physical models from the start. The degree to which a model has positive analogies is more typically described by how “realistic” the model is. One might also use the notion of “approximate truth”—a term long recognized as more suggestive than precise. The rough idea is that more realistic models—those with stronger positive analogies—contain more truth than others. “Negative analogy” contains an ambiguity. Some negative analogies are used at the beginning of the model-building process. The modeler recognizes the false properties for what they are and uses them for a specific purpose—usually to simplify the mathematics. Other negative analogies, known as “artifacts,” are unintended consequences of idealizations, data collection, research methods, and limitations of the medium used to construct the model. Some artifacts are benign and obvious. Consider the wooden models of molecules used in high school chemistry classes. Three balls held together by sticks can represent a water molecule, but the color of the balls is an artifact. (As the early moderns were fond of pointing out, atoms are colorless.) Other artifacts are produced by measuring devices. It is impossible, for example, to fully shield an oscilloscope from the periodic signal produced by its AC power source. This produces a periodic component in the output signal not present in the source itself.

The heavy emphasis here on models in the physical sciences has more to do with the interests of philosophers than scientific practice. Physical models are used throughout the sciences, from immunoglobulin models of allergic reactions to macroeconomic models of the business cycle.

3. Mathematical Models

Philosophers have generally taken physical models as paradigm cases of scientific models. In many branches of science, however, mathematical models play a far more important role. There are many examples, especially in dynamics. Equation (1) below is an ordinary differential equation representing the motion of a frictionless pendulum. [θ is the angle of the string from vertical, l is the length of the string, and g is the acceleration due to gravity. The two dots in the first term stand for the second derivative with respect to time.] Even when sets of equations have clearly been used “to model” some behavior of a system, philosophers were often unwilling to take these as legitimate models. The difference is driven in part by greater familiarity with models in mathematical logic. In the logician’s realm, a model satisfies a set of axioms; the axioms themselves are not models. To philosophers, equations look like axioms. Referring to a set of equations as “a model” then sounds like a category mistake.

\ddot{\theta} + \frac{g}{l}\sin\theta = 0 \qquad (1)
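
To see how such an equation is put to work, here is a minimal sketch that integrates (1) numerically; the step scheme and all parameter values are my own choices for illustration, not part of the original text.

    import math

    # Integrate the frictionless-pendulum model (1), theta'' = -(g/l)*sin(theta),
    # with a simple fixed-step semi-implicit Euler scheme.
    def pendulum(theta0, omega0, g=9.81, l=1.0, dt=1e-3, steps=5000):
        theta, omega = theta0, omega0
        for _ in range(steps):
            omega += -(g / l) * math.sin(theta) * dt  # update angular velocity
            theta += omega * dt                       # update angle
        return theta, omega

    print(pendulum(theta0=0.3, omega0=0.0))  # small swings stay near +/-0.3 rad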

This attitude was eroded in part by the central role mathematical models played in the development of chaos theory. The 1980s saw a deluge of scientific articles with equations governing nonlinear systems as well as the state spaces that represented their evolution over time (see section 4). Physical models, on the other hand, were often bypassed altogether. This made it far more difficult to dismiss “mathematical model” as a scientist’s misnomer. It soon became apparent that all of the issues regarding idealizations, confirmation, and construction of physical models had mathematical counterparts.

Consider the physical model of the electric circuit in Figure 1. A common idealization is to stipulate that the circuit has no resistance. When we look to the associated differential equations—a mathematical model—there is a corresponding simplification, in this case the elimination of an algebraic term that represented the resistance of the wire. Unlike this example, simplification is often more than a mere convenience. The governing equations for many types of phenomena are intractable as they stand. Simplifications are needed to bridge the computational gap between the laws and phenomena they describe. In the old (pre-1926) quantum theory, for example, it was common to run across a Hamiltonian (an important type of function in physics that expresses the total energy of the system) that blocked the usual mathematical techniques—for example, separation of variables. Instead, a perturbation parameter λ was used to convert the problematic Hamiltonian into a power series such as in equation (2) below. [I, θ are classical action-angle variables. See any text on classical mechanics for more on this method.] Once in this form, one may generate an approximate solution to an arbitrary degree of precision by keeping a finite number of terms and discarding the rest. This is sometimes called a “mediating mathematical model” (Morton 1993) since it operates, in a sense, between the intractable Hamiltonian and the phenomenon it is thought to describe.

H(I, \theta) = H_0(I) + \lambda H_1(I, \theta) + \lambda^2 H_2(I, \theta) + \cdots \qquad (2)
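
The truncation step can be pictured with a small sketch; the coefficients below are arbitrary stand-ins for the H_k of (2) evaluated at some state, not values from the text.

    # Keep the first n terms of a power series in the perturbation parameter
    # lam and discard the rest; terms[k] stands in for H_k at a given state.
    def truncated_series(terms, lam, n):
        return sum(h * lam**k for k, h in enumerate(terms[:n]))

    H = [1.0, 0.5, 0.25, 0.125]  # hypothetical H_0..H_3
    for n in range(1, 5):
        print(n, truncated_series(H, lam=0.1, n=n))  # successive approximations

Because λ is small, each added term matters less than the last, which is why discarding the tail of the series still yields an approximation of any desired precision.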

4. State Spaces

Until recently, state spaces received scant attention in the philosophical literature. They are often used in tandem with a mathematical model as a means for representing the possible states of a system and its evolution. The “system” is often a physical model, but might also be a real-world phenomenon essentially free of idealizations. Figure 3 is the state space associated with equation (1), the mathematical model for an ideal (frictionless) pendulum. Since θ represents the angle of the string, a and b correspond to the two highest points of deflection; \dot{\theta} represents the angular velocity. Hence c and d are the points at which the pendulum is moving the fastest.

Figure 3: State Space for Ideal Pendulum

State spaces take a variety of forms. Quantum mechanics uses a Hilbert space to represent the state governed by Schrödinger’s equation. The space itself might have an infinite number of dimensions with a vector representing an individual state. The ordinary differential equations used in dynamics require many-dimensional phase spaces. Points represent the system states in these (usually Euclidean) spaces. As the state evolves over time, it carves a trajectory through the space. Every point belongs to some possible trajectory that represents the system’s actual or possible evolution. A phase space together with a set of trajectories forms a phase portrait (Figure 4). Since the full phase portrait cannot be captured in a diagram, only a handful of possible trajectories are shown in textbook illustrations. If the system allows for dissipation (for example friction), attractors can develop in the associated phase portrait. As the name implies, an attractor is a set of points toward which neighboring trajectories flow, though the points themselves possess no actual attractive force. The center of Figure 4a, known as a point attractor, might represent a marble coming to rest at the bottom of a bowl. Simple periodic motion, like a clock pendulum, produces limit cycles, attracting sets forming closed curves in phase space (Figure 4b).

Figure 4: Sample Phase Portraits
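
Here is a short sketch of a single phase-space trajectory: adding a damping coefficient b to the pendulum of equation (1) makes the trajectory spiral toward the point attractor at the origin, like the marble settling in a bowl (Figure 4a). The scheme and all numbers are illustrative assumptions, not taken from the article.

    import math

    # Trace one trajectory of a damped pendulum through its phase space:
    # theta'' = -(g/l)*sin(theta) - b*theta'.
    def trajectory(theta=1.0, omega=0.0, g=9.81, l=1.0, b=0.5,
                   dt=1e-3, steps=20000):
        points = []
        for _ in range(steps):
            omega += (-(g / l) * math.sin(theta) - b * omega) * dt
            theta += omega * dt
            points.append((theta, omega))
        return points

    path = trajectory()
    print(path[0], path[-1])  # the final state lies near the attractor (0, 0)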

Let us consider a very simple system—a leaky faucet—that illustrates the use of each type of model mentioned. Researchers at the University of California, Santa Cruz, believed that the time between drops does not change randomly over time, but instead has an underlying dynamical structure (Martien 1985). In other words, one drip interval causally influences the next. In order to explore this hypothesis, a simplified physical model for a drop of water was developed (the one shown above in Figure 2). They believed that a water drop is roughly like a one-dimensional, oscillating mass on a spring. Part of the mass detaches when the spring extends to a critical point. The amount of mass that detaches depends on the velocity of the block when it reaches this point.

The mathematical model (3) for this system is relatively simple. y is the vertical position of the drop, v is its velocity, m is its mass prior to detachment, and Δm is the amount of mass that detaches (k, b, and c are constants). When this model is simulated on a computer, the resulting phase portrait is very similar to the one that was reconstructed from the data in the lab. Although this qualitative agreement is too weak to completely vindicate these models of the dripping faucet, it does provide a small degree of confirmation.

\dot{y} = v, \qquad \frac{d(mv)}{dt} = mg - ky - bv, \qquad \dot{m} = c \qquad (3)
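
For concreteness, here is a minimal sketch of one way to simulate such a model: a drop that fills at rate c, oscillates on a damped spring, and sheds a velocity-dependent mass when it crosses a critical position. The detachment rule, the reset, and every numerical value are assumptions for illustration, not the researchers' code.

    # Simulate a mass-on-a-spring drop model and record inter-drip intervals.
    def drip_intervals(k=1.0, b=0.1, c=0.05, y_crit=1.0, dt=1e-3, t_end=200.0):
        y, v, m, t, last = 0.0, 0.0, 0.1, 0.0, 0.0
        intervals = []
        while t < t_end:
            v += (m * 9.81 - k * y - b * v) / m * dt  # gravity, spring, damping
            y += v * dt
            m += c * dt                               # the faucet feeds the drop
            if y > y_crit:                            # a drop detaches
                m = max(0.1, m - 0.3 * abs(v))        # detached mass grows with speed
                y, v = 0.0, 0.0                       # hypothetical reset of the residue
                intervals.append(t - last)
                last = t
            t += dt
        return intervals

    print(drip_intervals()[:5])  # successive drip intervals need not be equal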

Going back to the physical model, there are two clear idealizations/negative analogies. First, of course, is that water drops are not shaped like rigid blocks. Second, the mass-spring model only oscillates along one axis. Real liquids are not constrained in this way. However, these idealizations allow for a far simpler mathematical model to be used than one would need for a realistic fluid. (Without these idealizations, (3) would have to be replaced by a difficult partial differential equation.) In addition, Peter Smith has argued that this mathematical tractability came with a steep price, namely, an unrecognized artifact (Smith 1998). The problem is that the state space for this particular system contains a “strange attractor” with a fractal structure, a geometrical structure far more complex than the attractors in Figure 4. Smith argues that the infinitely intricate structure of this attractor is an artifact of the mathematics used to describe the evolution of the system. If more realistic physical and mathematical models were used, this negative analogy would likewise disappear.

5. Models and Realism

One of the perennial debates in the philosophy of science has to do with realism. What aspects of science—if any—truly represent the real world? Which devices, on the other hand, are merely heuristic? Antirealists hold that some parts of the scientific enterprise—laws, unobservable entities, etc.—do not correspond to anything in reality. (Some, like van Fraassen (1980), would say that if by chance the abstract terms used by scientists did denote something real, we have no way of knowing it.) Scientific realists argue that the successful use of these devices shows that they are, at least in part, truly describing the real world. Let’s now consider what role models have played in this debate.

Whether models should be taken realistically depends on what one takes the truth-bearers in science to be. Some hold that foundational, scientific truths are contained either in mature theories or their fundamental laws. If so, then idealized models are simply false. The argument for this is straightforward (Achinstein 1965). Let’s say that theory T describes a system S in terms of properties p1, p2, and p3. As we have seen, simplified models either modify or ignore some of the properties found in more fundamental theories. Say that a physical model M describes S in terms of p1 and p4. If so, then T describes S in one way; M describes S in a logically incompatible way. The simplifying assumptions needed to build a useful model contradict the claims of the governing theory. Hence, if T is true, M is false.

In contrast, Nancy Cartwright has long argued that abstract laws, no matter how “fundamental” to our understanding of nature, are not literally true. In her earlier work (1983), she argued that it is not models that are highly idealized, but rather the laws themselves. Abstract laws are useful for organizing scientific knowledge, but are not literally true when applied to concrete systems. They are “true,” she argues, only insofar as they correctly describe simplified physical models (or “simulacra”). Fundamental laws are true-of-the-model, not true simpliciter. The idea is something like being true-in-a-novel. The claim “The beast that terrorized the island of Amity in 1975 was a squid” is false-in-the-novel Jaws. Similarly, Newton’s second law of motion plus universal gravitation are only true-in-Newtonian-particle-models.

For most scientific realists, whether physical models are “true” or “real” is not a simple yes-or-no question. Most would point out that even idealizations like the frictionless plane are not simply false. For two blocks of iron sliding past each other, neglecting friction is a poor approximation. For skis sliding over an icy slope, it is much better. In other words, negative analogies come in degrees. If the idealizations are negligible, we may properly say that a physical model is realistic.

Scientific realists have not always held similar views about mathematical models. Textbook model building in the physical sciences often follows a “top-down” approach: start with general laws and first principles and then work toward the specifics of the phenomenon of interest. Dynamics texts are filled with models that can serve as the foundation for a more detailed mathematical treatment (for example, an ideal damped pendulum or a point particle moving in a central field). Philosophers have paid much less attention to models constructed from the bottom-up, that is, models that begin with the data rather than theory. What little attention bottom-up modeling did receive in the older modeling literature was almost entirely negative. Conventional wisdom seemed to be that phenomenological laws and curve-fitting methods were devices researchers sometimes had to stoop to in order to get a project off the ground. They were not considered models, but rather “mathematical hypotheses designed to fit experimental data” (Hesse 1967, 38). According to Ernan McMullin, sometimes physicists—and other scientists presumably—simply want a function that summarizes their observations (1967, 390-391). Curve-fitting and phenomenological laws do just that. The question of realism is avoided by denying the legitimacy of bottom-up mathematical models.
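
A bottom-up model in this sense can be as simple as a fitted curve. The following sketch, an invented example requiring numpy with data fabricated purely for illustration, reads a linear “phenomenological law” off measurements rather than deriving it from theory.

    import numpy as np

    # Fit a straight line y = a*x + b to (hypothetical) measurements; the
    # fitted coefficients are the bottom-up, phenomenological "law."
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # invented data points
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    a, b = np.polyfit(x, y, 1)                # least-squares linear fit
    print(f"fitted law: y = {a:.2f}x + {b:.2f}")

Nothing in the fit appeals to covering laws; the summary of the data is the model, which is just what the older literature dismissed and Cartwright defends.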

In her broad attack on “theory-driven” philosophy of science, Cartwright has recently defended a nearly opposite view (1999). She argues that top-down mathematical models are not realistic, but bottom-up models are. Once again, this verdict follows from a more general thesis about the truth-bearers in science. Cartwright is an antirealist about fundamental laws and abstract theories which, she claims, serve only to systematize scientific knowledge. Since top-down mathematical models use these laws as first principles from which to begin, they cannot possibly represent real systems. Bottom-up models, on the other hand, are not derived from covering laws. They are instead tied to experimental knowledge of particular systems. Unlike fundamental theories and their associated top-down models, bottom-up models are designed to represent actual objects and their behavior. It is this grounding in empirical knowledge that allows these kinds of mathematical models to be the primary device in science for representing real-world systems.

6. Models and the Semantic View of Theories

This typology of models and their properties has been developed with an eye toward scientific practice. Within the philosophy of science itself, models have also played a central role in understanding the nature of scientific theories. For most of the 20th century, philosophers considered theories to be special sets of sentences. Theories on this so-called “syntactic view” are linguistic entities. The meaning of the theory is contained in the sentences that constitute it, roughly the same way the meaning of this article is contained in these sentences. The semantic view, in contrast, uses the model-theoretic language of mathematical logic. In broad terms, a theory just is a family of models. The theory/model distinction collapses. Using the terminology we have already defined, a model in this sense might be an idealized physical model, an existing system in nature, or even a state space. The semantic content of a theory, on this view, is found in a family of models rather than in the sentences that describe them. If a given theory were axiomatized—a rare occurrence—one could think of these models as those entities for which the axioms are true. To take a toy example, say T1 is a theory whose sole axiom is “for any two lines, at most one point lies on both.” Figure 5 depicts one of the models that constitute T1:

Figure 5: A Model of Theory T1
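
A minimal sketch of the logician's usage, assuming we represent lines as sets of points: a structure is a model of T1 just in case it satisfies the axiom. The three-line structure below is an invented stand-in for Figure 5.

    from itertools import combinations

    # T1's sole axiom: for any two lines, at most one point lies on both.
    def satisfies_T1(lines):
        return all(len(a & b) <= 1 for a, b in combinations(lines, 2))

    # Three lines meeting pairwise in single points (a stand-in for Figure 5).
    structure = [{"p1", "p2"}, {"p2", "p3"}, {"p1", "p3"}]
    print(satisfies_T1(structure))  # True: the structure is a model of T1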

A model for ideal gases would be a physical model of dilute, perfectly elastic atoms in a closed container together with an ordered set of parameters ⟨P, V, m, M, T⟩ that satisfies the equation PV = (m/M)RT. (Respectively: pressure, volume, mass of the gas, molecular weight of the molecules, and temperature; R is a constant.) In fact, two different sets of parameters ⟨P1, V1, m1, M1, T1⟩ and ⟨P2, V2, m1, M1, T2⟩ constitute two separate models in the same family.
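
The satisfaction relation here can also be made concrete with a small sketch; the helium figures are invented for illustration only.

    # Check whether a parameter tuple <P, V, m, M, T> satisfies PV = (m/M)RT.
    R = 8.314  # gas constant in J/(mol*K)

    def satisfies_ideal_gas(P, V, m, M, T, tol=1e-6):
        return abs(P * V - (m / M) * R * T) < tol

    m1, M1 = 0.004, 0.004                 # 4 g of helium; molar mass 4 g/mol
    T1, V1 = 300.0, 0.01                  # one state of the gas
    P1 = (m1 / M1) * R * T1 / V1          # choose P so the law holds
    T2, V2 = 350.0, 0.02                  # a second state of the same gas
    P2 = (m1 / M1) * R * T2 / V2

    print(satisfies_ideal_gas(P1, V1, m1, M1, T1),
          satisfies_ideal_gas(P2, V2, m1, M1, T2))  # True True: two models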

Some advocates of the semantic view claim that the use of the term “model” is similar in science and in logic (van Fraassen, 1980). This similarity has been one of the motivating forces behind this particular understanding of scientific theories. Given the distinctions made in previous sections of this article, this similarity seems to be questionable.

First, many things that would count as a model on the semantic view, for example the geometric diagram in Figure 5, are not physical models, mathematical models, or state spaces. In what sense, one wonders, are they scientific models? Moreover, a model on the semantic view might be an existing physical system. For example, Jupiter and its moons would constitute another model of Newton’s laws of motion plus universal gravitation. This blurs the distinction between the model and its subject. One may use a physical and/or mathematical model to study celestial bodies, but such entities are not themselves models. The scientist’s use of the term is not this broad.

Second, as we have already seen, sets of equations often constitute mathematical models. In contrast, laws and equations on the semantic approach are said to describe and classify models, but are never themselves taken to be models. Their relation is satisfaction, not identity.

Some time before the semantic view became popular, Hesse issued what still seems to be the correct verdict: “[M]ost uses of ‘model’ in science do carry over from logic the idea of interpretation of a deductive system,” however, “most writers on models in the sciences agree that there is little else in common between the scientist’s and the logician’s use of the term, either in the nature of the entities referred to or in the purpose for which they are used” (1967, 354).

7. References and Further Reading

  • Achinstein, P. “Theoretical Models.” The British Journal for the Philosophy of Science 16 (1965): 102-120.
  • Bunge, M. Method, Model and Matter. Dordrecht: Reidel, 1973.
  • Cartwright, N. How the Laws of Physics Lie. Oxford: Clarendon Press, 1983.
  • Cartwright, N. The Dappled World. Cambridge: Cambridge University Press, 1999.
  • Hesse, M. Models and Analogies in Science. Notre Dame: University of Notre Dame Press, 1966.
  • Hesse, M. “Models and Analogy in Science.” The Encyclopedia of Philosophy. New York: Macmillan Publishing, 1967.
  • McMullin, E. “What do Physical Models Tell Us?” Logic, Methodology, and Philosophy of Science III. Eds. B. van Rootselaar and J. F. Staal. Amsterdam: North-Holland Publishing, 1967: 385-396.
  • Morrison, M. and M. Morgan, eds. Models as Mediators. Cambridge: Cambridge University Press, 1999.
  • Morton, A. “Mathematical Models: Questions of Trustworthiness.” The British Journal for the Philosophy of Science 44 (1993): 659-674.
  • Morton, A. and M. Suárez. “Kinds of Models.” Model Validation in Hydrological Science. Eds. P. Bates and M. Anderson. New York: John Wiley Press, 2001.
  • Redhead, M. “Models in Physics.” The British Journal for the Philosophy of Science 31 (1980): 154-163.
  • Smith, P. Explaining Chaos. Cambridge: Cambridge University Press, 1998.
  • Van Fraassen, B. The Scientific Image. Oxford: Clarendon Press, 1980.
  • Wimsatt, W. “False Models as Means to Truer Theories.” Neutral Models in Biology. Eds. M. Nitecki and A. Hoffmann. New York: Oxford University Press, 1987.

Author Information

Jeffrey Koperski
Email: koperski@svsu.edu
Saginaw Valley State University
U. S. A.

Aristotle: Motion and its Place in Nature


Aristotle’s account of motion can be found in the Physics. By motion, Aristotle (384-322 BCE) understands any kind of change. He defines motion as the actuality of a potentiality. Initially, Aristotle's definition seems to involve a contradiction. However, commentators on the works of Aristotle, such as St. Thomas Aquinas, maintain that this is the only way to define motion.

In order to understand Aristotle's definition of motion adequately, it is necessary to understand what he means by actuality and potentiality. Aristotle uses the words energeia and entelecheia interchangeably to describe a kind of action. A linguistic analysis shows that, by actuality, Aristotle means both energeia, which means being-at-work, and entelecheia, which means being-at-an-end. These two words, although they have different meanings, function as synonyms in Aristotle's scheme. For Aristotle, to be a thing in the world is to be at work, to belong to a particular species, to act for an end and to form material into enduring organized wholes. Actuality, for Aristotle, is therefore close in meaning to what it is to be alive, except it does not carry the implication of mortality.

From the Middle Ages to modern times, commentators disagreed on the interpretation of Aristotle’s account of motion. An accurate rendering of Aristotle's definition must include apparently inconsistent propositions: (a) that motion is rest, and (b) that a potentiality, which must be, if anything, a privation of actuality, is at the same time that actuality of which it is the lack. St. Thomas Aquinas was prepared to take these propositions seriously. St. Thomas observes that to say that something is in motion is just to say that it is both what it is already and something else that it is not yet. Accordingly, motion is the mode in which the future belongs to the present; it is the present absence of just those particular absent things which are about to be. St. Thomas thus resolves the apparent contradiction between potentiality and actuality in Aristotle's definition of motion by arguing that in every motion actuality and potentiality are mixed or blended.

St. Thomas’ interpretation of Aristotle's definition of motion, however, is not free of difficulties. His interpretation seems to trivialize the meaning of entelecheia. One implication of this interpretation is that whatever happens to be the case right now is an entelecheia, as though something as intrinsically unstable as the instantaneous position of an arrow in flight deserved to be described by the word which Aristotle everywhere else reserves for complex organized states which persist, which hold out in being against internal and external causes tending to destroy them.

In the Metaphysics, however, Aristotle draws a distinction between two kinds of potentiality. On the one hand, there are latent or inactive potentialities. On the other hand, there are active or at-work potentialities. Accordingly, every motion is a complex whole, an enduring unity which organizes distinct parts. Things have being to the extent that they are or are part of determinate wholes, so that to be means to be something, and change has being because it always is or is part of some determinate potentiality, at work and manifest in the world as change.

Table of Contents

  1. Introduction
  2. Energeia and Entelechia
  3. The Standard Account of Aristotle's View of Motion
  4. Thomas' Account of Aristotle's View of Motion
  5. The Limits of Thomas' Account
  6. Facing the Contradictions of Aristotle's Account of Motion
  7. What Motion Is
  8. Zeno's Paradoxes and Aristotle's Definition of Motion
  9. References and Further Reading

1. Introduction

Aristotle defines motion, by which he means change of any kind, as the actuality of a potentiality as such (or as movable, or as a potentiality -- Physics 201a 10-11, 27-29, b 4-5). The definition is a conjunction of two terms which normally contradict each other, along with, in Greek, a qualifying clause which seems to make the contradiction inescapable. Yet St. Thomas Aquinas called it the only possible way to define motion by what is prior to and better known than motion. At the opposite extreme is the young Descartes, who in the first book he wrote announced that while everyone knows what motion is, no one understands Aristotle's definition of it. According to Descartes, "motion . . . is nothing more than the action by which any body passes from one place to another" (Principles II, 24). The use of the word "passes" makes this definition an obvious circle; Descartes might just as well have called motion the action by which a thing moves. But the important part of Descartes' definition is the words "nothing more than," by which he asserts that motion is susceptible of no definition which is not circular, as one might say "the color red is just the color red," to mean that the term is not reducible to some modification of a wave, or analyzable in any other way. There must be ultimate terms of discourse, or there would be no definitions, and indeed no thought. The point is not that one cannot construct a non-circular definition of such a term, one claimed to be properly irreducible, but that one ought not to do so. The true atoms of discourse are those things which can be explained only by means of things less known than themselves. If motion is such an ultimate term, then to define it by means of anything but synonyms is willfully to choose to dwell in a realm of darkness, at the sacrifice of the understanding which is naturally ours in the form of "good sense" or ordinary common sense.

Descartes' treatment of motion is explicitly anti-Aristotelian and his definition of motion is deliberately circular. The Cartesian physics is rooted in a disagreement with Aristotle about what the best-known things are, and about where thought should take its beginnings. There is, however, a long tradition of interpretation and translation of Aristotle's definition of motion, beginning at least five hundred years before Descartes and dominating discussions of Aristotle today, which seeks to have things both ways. An unusually clear instance of this attitude is found in the following sentence from a medieval Arabic commentary: "Motion is a first entelechy of that which is in potentiality, insofar as it is in potentiality, and if you prefer you may say that it is a transition from potentiality to actuality." You will recognize the first of these two statements presented as equivalent as a translation of Aristotle's definition, and the second as a circular definition of the same type as that of Descartes. Motion is an entelechy; motion is a transition. The strangeness of the word "entelechy" masks the contradiction between these two claims. We must achieve an understanding of Aristotle's word entelechia, the heart of his definition of motion, in order to see that what it says cannot be said just as well by such a word as "transition."

2. Energeia and Entelechia

The word entelecheia was invented by Aristotle, but never defined by him. It is at the heart not only of his definition of motion, but of all his thought. Its meaning is the most knowable in itself of all possible objects of the intellect. There is no starting point from which we can descend to put together the elements of its meaning. We can come to an understanding of entelecheia only by an ascent from what is intrinsically less knowable than it, indeed knowable only through it, but more known because more familiar to us. We have a number of resources by which to begin such an ascent, drawing upon the linguistic elements out of which Aristotle constructed the word, and upon the fact that he uses the word energeia as a synonym, or all but a synonym, for entelecheia.

The root of energeia is ergon (deed, work, or act), from which comes the adjective energon, used in ordinary speech to mean active, busy, or at work. Energeia is formed by the addition of a noun ending to the adjective energon; we might construct the word is-at-work-ness from Anglo-Saxon roots to translate energeia into English, or use the more euphonious periphrastic expression, being-at-work. If we are careful to remember how we got there, we could alternatively use Latin roots to make the word "actuality" to translate energeia. The problem with this alternative is that the word "actuality" already belongs to the English language, and has a life of its own which seems to be at variance with the simple sense of being active. By the actuality of a thing, we mean not its being-in-action but its being what it is. For example, there is a fish with an effective means of camouflage: it looks like a rock but it is actually a fish. When an actuality is attributed to that fish, completely at rest at the bottom of the ocean, we don't seem to be talking about any activity. But according to Aristotle, to be something always means to be at work in a certain way. In the case of the fish at rest, its actuality is the activity of metabolism, the work by which it is constantly transforming material from its environment into parts of itself and losing material from itself into its environment, the activity by which the fish maintains itself as a fish and as just the fish it is, and which ceases only when the fish ceases to be. Any static state which has any determinate character can only exist as the outcome of a continuous expenditure of effort, maintaining the state as it is. Thus even the rock, at rest next to the fish, is in activity: to be a rock is to strain to be at the center of the universe, and thus to be in motion unless constrained otherwise, as the rock in our example is constrained by the large quantity of earth already gathered around the center of the universe. A rock at rest at the center is at work maintaining its place, against the counter-tendency of all the earth to displace it. The center of the universe is determined only by the common innate activity of rocks and other kinds of earth. Nothing is which is not somehow in action, maintaining itself either as the whole it is, or as a part of some whole. A rock is inorganic only when regarded in isolation from the universe as a whole, which is an organized whole, just as blood considered by itself could not be called alive, yet is only blood insofar as it contributes to the maintenance of some organized body. No existing rock can fail to contribute to the hierarchical organization of the universe; we can therefore call any existing rock an actual rock.

Energeia, then, always means the being-at-work of some definite, specific something; the rock cannot undergo metabolism, and once the fish does no more than fall to earth and remain there it is no longer a fish. The material and organization of a thing determine a specific capacity or potentiality for activity with respect to which the corresponding activity has the character of an end (telos). Aristotle says "the act is an end and the being-at-work is the act and since energeia is named from the ergon it also extends to the being-at-an-end (entelecheia)" (Metaphysics 1050a 21-23). The word entelecheia has a structure parallel to that of energeia. From the root word telos, meaning end, comes the adjective enteles, used in ordinary speech to mean complete, perfect, or full-grown. But while energeia, being-at-work, is made from the adjective meaning at work and a noun ending, entelecheia is made from the adjective meaning complete and the verb exein. Thus if we translate entelecheia as "completeness" or "perfection," the contribution the meaning of exein makes to the term is not evident. Aristotle probably uses exein for two reasons which lead to the same conclusion: First, one of the common meanings of exein is "to be" in the sense of to remain, to stay, or to keep in some condition specified by a preceding adverb, as in the idioms kalos exei, "things are going well," or kakos exei, "things are going badly." It means "to be" in the sense of to continue to be. This is only one of several possible meanings of exein, but there is a second fact which makes it likely that it is the meaning which would strike the ear of a Greek-speaking person of Aristotle's time. There was then in ordinary use the word endelecheia, differing from Aristotle's word entelecheia only by a delta in place of the tau. Endelecheia means continuity or persistence. As one would expect, there was a good deal of confusion in ancient times between the invented and undefined term entelecheia and the familiar word endelecheia. The use of the pun for the serious philosophic purpose of saying at once two things for whose union the language has no word was a frequent literary device of Aristotle's teacher Plato. In this striking instance, Aristotle seems to have imitated the playful style of his teacher in constructing the most important term in his technical vocabulary. The addition of exein to enteles, through the joint action of the meaning of the suffix and the sound of the whole, superimposes upon the sense of "completeness" that of continuity. Entelecheia means continuing in a state of completeness, or being at an end which is of such a nature that it is only possible to be there by means of the continual expenditure of the effort required to stay there. Just as energeia extends to entelecheia because it is the activity which makes a thing what it is, entelecheia extends to energeia because it is the end or perfection which has being only in, through, and during activity. For the remainder of this entry, the word "actuality" translates both energeia and entelecheia, and "actuality" means just that area of overlap between being-at-work and being-at-an-end which expresses what it means to be something determinate. The words energeia and entelecheia have very different meanings, but function as synonyms because the world is such that things have identities, belong to species, act for ends, and form material into enduring organized wholes. The word actuality as thus used is very close in meaning to the word life, with the exception that it is broader in meaning, carrying no necessary implication of mortality.

Kosman (1969) interprets the definition in substantially the same way as it is interpreted above, utilizing examples of kinds of entelecheia given by Aristotle in On the Soul, and thus he succeeds in bypassing the inadequate translations of the word. The Sachs (1995) translation of Aristotle's Physics translates entelecheia as being-at-work-staying-itself.

3. The Standard Account of Aristotle's View of Motion

We embarked on this quest for the meaning of entelecheia in order to decide whether the phrase "transition to actuality" could ever properly render it. The answer is now obviously "no." An actuality is something ongoing, but only the ongoing activity of maintaining a state of completeness or perfection already reached; the transition into such a state always lacks and progressively approaches the perfected character which an actuality always has. A dog is not a puppy: the one is, among other things, capable of generating puppies and giving protection, while the other is incapable of generation and in need of protection. We might have trouble deciding exactly when the puppy has ceased to be a puppy and become a dog (at the age of one year, for example, it will probably be fully grown and capable of reproducing, but still awkward in its movements and puppyish in its attitudes), but in any respect in which it has become a dog it has ceased to be a puppy.

But our concern was to understand what motion is, and it is obviously the puppy which is in motion, since it is growing toward maturity, while the dog is not in motion in that respect, since its activity has ceased to produce change and become wholly directed toward self-maintenance. If the same thing cannot be in the same respect both an actuality and a transition to actuality, it is clearly the transition that motion is, and the actuality that it isn't. It seems that Descartes is right and Aristotle is wrong. Of course it is possible that Aristotle meant what Descartes said, but simply used the wrong word, that he called motion an entelecheia three times, at the beginning, middle, and end of his explanation of what motion is, when he really meant not entelecheia but the transition or passage to entelecheia. Now, this suggestion would be laughable if it were not what almost everyone who addresses the question today believes. Sir David Ross, certainly the most massively qualified authority on Aristotle of those who have lived in our century and written in our language, the man who supervised the Oxford University Press's forty-five year project of translating all the works of Aristotle into English, in a commentary on Aristotle's definition of motion, writes: "entelecheia must here mean 'actualization,' not 'actuality'; it is the passage to actuality that is kinesis" (Physics, text with commentary, London, 1936, p. 359). In another book, his commentary on the Metaphysics, Ross makes it clear that he regards the meaning entelecheia has in every use Aristotle makes of it everywhere but in the definition of motion as being not only other than but incompatible with the meaning "actualization." In view of that fact, Ross' decision that "entelecheia must here mean 'actualization'" is a desperate one, indicating a despair of understanding Aristotle out of his own mouth. It is not translation or interpretation but plastic surgery.

Ross' full account of motion as actualization (Aristotle, New York, 1966, pp. 81-82) cites no passages from Aristotle, and no authorities, but patiently explains that motion is motion and cannot, therefore, be an actuality. There are authorities he could have cited, including Moses Maimonides, the twelfth century Jewish philosopher who sought to reconcile Aristotle's philosophy with the Old Testament and Talmud, and who defined motion as "the transition from potentiality to actuality," and the most famous Aristotelian commentator of all time, Averroes, the twelfth century Spanish Muslim thinker, who called motion a passage from non-being to actuality and complete reality. In each case the circular definition is chosen in preference to the one which seems laden with contradictions. A circular statement, to the extent that it is circular, is at least not false, and can as a whole have some content: Descartes' definition amounts to saying "whatever motion is, it is possible only with respect to place," and that of Averroes, Maimonides, and Ross amounts to saying "whatever motion is, it results always in an actuality." An accurate rendering of Aristotle's definition would amount to saying (a) that motion is rest, and (b) that a potentiality, which must be, at a minimum, a privation of actuality, is at the same time that actuality of which it is the lack. There has been one major commentator on Aristotle who was prepared to take seriously and to make sense of both these claims.

4. Thomas' Account of Aristotle's View of Motion

St. Thomas Aquinas, in his interpretation of Aristotle's definition of motion (Commentary on Aristotle's Physics, London, 1963, pp. 136-137), observes two principles: (1) that Aristotle meant what he wrote, and (2) that what Aristotle wrote is worth the effort of understanding. Writing a century after Maimonides and Averroes, Thomas disposes of their approach to defining motion with few words: it is not Aristotle's definition and it is an error. A passage, a transition, an actualization, an actualizing, or any of the more complex substantives to which translators have resorted which incorporate in some more or less disguised form some progressive sense united to the meaning of actuality, all have in common that they denote a kind of motion. If motion can be defined, then to rest content with explaining motion as a kind of motion is certainly to err; even if one is to reject Aristotle's definition on fundamental philosophical grounds, as Descartes was to do, the first step must be to see what it means. And Thomas explains clearly and simply a sense in which Aristotle's definition is both free of contradiction and genuinely a definition of motion. One must simply see that the growing puppy is a dog, that the half-formed lump of bronze on which the sculptor is working is a statue of Hermes, that the tepid water on the fire is hot; what it means to say that the puppy is growing, the bronze is being worked, or the water is being heated, is that each is not just the complex of characteristics it possesses right now; in each case, something that the thing is not yet, already belongs to it as that toward which it is, right now, ordered. To say that something is in motion is just to say that it is both what it is already and something else that it isn't yet. What else do we mean by saying that the puppy is growing, rather than remaining what it is, that the bronze under the sculptor's hand is in a different condition from the identically shaped lump of bronze he has discarded, or that the water is not just tepid but being heated? Motion is the mode in which the future belongs to the present; it is the present absence of just those particular absent things which are about to be.

Thomas discusses in detail the example of the water being heated. Assume it to have started cold, and to have been heated so far to room temperature. The heat it now has, which has replaced the potentiality it previously had to be just that hot, belongs to it in actuality. The capacity it has to be still hotter belongs to it in potentiality. To the extent that it is actually hot it has been moved; to the extent that it is not yet as hot as it is going to be, it is not yet moved. The motion is just the joint presence of potentiality and actuality with respect to the same thing, in this case heat.

In Thomas' version of Aristotle's definition one can see the alternative to Descartes' approach to physics. Since Descartes regards motion as ultimate and given, his physics will give no account of motion itself, but describe the transient static configurations through which the moving things pass. By Thomas' account, motion is not ultimate but is a consequence of the way in which present states of things are ordered toward other actualities which do not belong to them. One could build on such an account a physics of forces, that is, of those directed potentialities which cause a thing to move, to pass over from the actuality it possesses to another which it lacks but to which it is ordered. Motion will thus not have to be understood as the mysterious departure of things from rest, which alone can be described, but as the outcome of the action upon one another of divergent and conflicting innate tendencies of things. Rest will be the anomaly, since things will be understood as so constituted by nature as to pass over of themselves into certain states of activity, but states of rest will be explainable as dynamic states of balance among things with opposed tendencies. Leibniz, who criticized Descartes' physics and invented a science of dynamics, explicitly acknowledged his debt to Aristotle (see, e.g., Specimen Dynamicum), whose doctrine of entelecheia he regarded himself as restoring in a modified form. From Leibniz we derive our current notions of potential and kinetic energy, whose very names, pointing to the actuality which is potential and the actuality which is motion, preserve the Thomistic resolutions of the two paradoxes in Aristotle's definition of motion.

5. The Limits of Thomas' Account

But though the modern science of dynamics can be seen in germ in St. Thomas' discussion of motion, it can be seen also to reveal difficulties in Thomas' conclusions. According to Thomas, actuality and potentiality do not exclude one another but co-exist as motion. To the extent that an actuality is also a potentiality it is a motion, and to the extent that an actuality is a motion it is a potentiality. The two seeming contradictions cancel each other in the dynamic actuality of the present state which is determined by its own future. But are not potential and kinetic energy two different things? A rock held six feet above the ground has been actually moved identically to the rock thrown six feet above the ground, and at that distance each strains identically to fall to earth; but the one is falling and the other isn't. How can the description which is common to both, when one is moving and the other is at rest, be an account of what motion is? It seems that everything which Thomas says about the tepid water which is being heated can be said also of the tepid water which has been removed from the fire. Each is a coincidence of a certain actuality of heat with a further potentiality to the same heat. What does it mean to say that the water on the fire has, right now, an order to further heat which the water off the fire lacks? If we say that the fire is acting on the one and not on the other in such a way as to disturb its present state, we have begged the question and returned to the position of presupposing motion to explain motion. Thomas' account of Aristotle's definition of motion, though immeasurably superior to that of Sir David Ross as interpretation, and far more sophisticated as an approach to and specification of the conditions an account of motion would have to meet, seems ultimately subject to the same circularity. Maimonides, Averroes, and Ross fail to say how motion differs from rest. Thomas fails to say how any given motion differs from a corresponding state of balanced tension, or of strain and constraint.

The strength of Thomas' interpretation of the definition of motion comes from his taking every word seriously. When Ross discusses Aristotle's definition, he gives no indication of why the he toiouton, or "insofar as it is such," clause should have been included. By Thomas' account, motion is the actuality of any potentiality which is nevertheless still a potentiality. It is the actuality which has not canceled its corresponding potentiality but exists along with it. Motion then is the actuality of any potentiality insofar as it is still a potentiality. This is the formula which applies equally well to the dynamic state of rest and the dynamic state of motion. We shall try to advance our understanding by being still more careful about the meaning of the pronoun he.

Thomas' account of the meaning of Aristotle's definition forces him to construe the grammar of the definition in such a way that the clause introduced by the dative singular feminine relative pronoun he has as its antecedent, in two cases, the neuter participle tou ontos, and in the third, the neuter substantive adjective tou dunatou. It is true that this particular feminine relative pronoun often had an adverbial sense to which its gender was irrelevant, but in the three statements of the definition of motion there is no verb but estin. If the clause is understood adverbially, then, the sentence must mean something like: if motion is a potentiality, it is the actuality of a potentiality. Whatever that might mean, it could at any rate not be a definition of motion. Thus the clause must be understood adjectivally, and Thomas must make the relative pronoun dependent upon a word with which it does not agree in gender. He makes the sentence say that motion is the actuality of the potentiality in which there is yet potentiality. Reading the pronoun as dependent upon the feminine noun entelecheia with which it does agree, we find the sentence saying that motion is the actuality as which it is a potentiality of the potentiality, or the actuality as a potentiality of the potentiality.

6. Facing the Contradictions of Aristotle's Account of Motion

This reading of the definition implies that potentialities exist in two ways, that it is possible to be a potentiality, yet not be an actual potentiality. The beginning of this entry says that Aristotle's definition of motion was made by putting together two terms, actuality and potentiality, which normally contradict each other. Thomas resolved the contradiction by arguing that in every motion actuality and potentiality are mixed or blended, that the condition of becoming-hot of the water is just the simultaneous presence in the same water of some actuality of heat and some remaining potentiality of heat. Earlier it was stated that there was a qualifying clause in Aristotle's definition which seemed to intensify, rather than relieve, the contradiction. This refers to the he toiouton, or he kineton, or he dunaton, which appears in each version of the definition, and which, being grammatically dependent on entelecheia, signifies something the very actuality of which is potentiality. The Thomistic blend of actuality and potentiality has the characteristic that, to the extent that it is actual it is not potential and to the extent that it is potential it is not actual; the hotter the water is, the less is it potentially hot, and the cooler it is, the less is it actually, the more potentially, hot.

The most serious defect in Saint Thomas' interpretation of Aristotle's definition is that, like Ross' interpretation, it broadens, dilutes, cheapens, and trivializes the meaning of the word entelecheia. An immediate implication of the interpretations of both Thomas and Ross is that whatever happens to be the case right now is an entelecheia, as though being at 70 degrees Fahrenheit were an end determined by the nature of water, or as though something which is intrinsically so unstable as the instantaneous position of an arrow in flight deserved to be described by the word which Aristotle everywhere else reserves for complex organized states which persist, which hold out in being against internal and external causes tending to destroy them.

Aristotle's definition of motion applies to any and every motion: the pencil falling to the floor, the white pages in the book turning yellow, the glue in the binding of the book being eaten by insects. Maimonides, Averroes, and Ross, who say that motion is always a transition or passage from potentiality to actuality, must call the being-on-the-floor of the pencil, the being-yellow of the pages, and the crumbled condition of the binding of the book actualities. Thomas, who says that motion is constituted at any moment by the joint presence of actuality and potentiality, is in a still worse position: he must call every position of the pencil on the way to the floor, every color of the pages on the way to being yellow, and every loss of a crumb from the binding an actuality. If these are actualities, then it is no wonder that philosophers such as Descartes rejected Aristotle's account of motion as a useless redundancy, saying no more than that whatever changes, changes into that into which it changes.

We know however that the things Aristotle called actualities are limited in number, and constitute the world in its ordered finitude rather than in its random particularity. The actuality of the adult horse is one, although horses are many and all different from each other. Books and pencils are not actualities at all, even though they are organized wholes, since their organizations are products of human art, and they maintain themselves not as books and pencils but only as earth. Even the organized content of a book, such as that of the first three chapters of Book Three of Aristotle's Physics, does not exist as an actuality, since it is only the new labor of each new reader that gives being to that content, in this case a very difficult labor. By this strict test, the only actualities in the world, that is, the only things which, by their own innate tendencies, maintain themselves in being as organized wholes, seem to be the animals and plants, the ever-the-same orbits of the ever-moving planets, and the universe as a whole. But Aristotle has said that every motion is an entelecheia; if we choose not to trivialize the meaning of entelecheia to make it applicable to motion, we must deepen our understanding of motion to make it applicable to the meaning of entelecheia.

7. What Motion Is

In the Metaphysics, Aristotle argues that if there is a distinction between potentiality and actuality at all, there must be a distinction between two kinds of potentiality. The man with sight, but with his eyes closed, differs from the blind man, although neither is seeing. The first man has the capacity to see, which the second man lacks. There are then potentialities as well as actualities in the world. But when the first man opens his eyes, has he lost the capacity to see? Obviously not; while he is seeing, his capacity to see is no longer merely a potentiality, but is a potentiality which has been put to work. The potentiality to see exists sometimes as active or at-work, and sometimes as inactive or latent. But this example seems to get us no closer to understanding motion, since seeing is just one of those activities which is not a motion. Let us consider, then, a man's capacity to walk across the room. When he is sitting or standing or lying still, his capacity to walk is latent, like the sight of the man with his eyes closed; that capacity nevertheless has real being, distinguishing the man in question from a man who is crippled to the extent of having lost all potentiality to walk. When the man is walking across the room, his capacity to walk has been put to work. But while he is walking, what has happened to his capacity to be at the other side of the room, which was also latent before he began to walk? It too is a potentiality which has been put to work by the act of walking. Once he has reached the other side of the room, his potentiality to be there has been actualized in Ross' sense of the term, but while he is walking, his potentiality to be on the other side of the room is not merely latent, and is not yet canceled by an actuality in the weak sense, the so-called actuality of being on that other side of the room; while he is walking his potentiality to be on the other side of the room is actual just as a potentiality. The actuality of the potentiality to be on the other side of the room, as just that potentiality, is neither more nor less than the walking across the room.

A similar analysis will apply to any motion whatever. The growth of the puppy is not the actualization of its potentiality to be a dog, but the actuality of that potentiality as a potentiality. The falling of the pencil is the actuality of its potentiality to be on the floor, in actuality as just that: as a potentiality to be on the floor. In each case the motion is just the potentiality qua actual and the actuality qua potential. And the sense we thus give to the word entelecheia is not at odds with its other uses: a motion is like an animal in that it remains completely and exactly what it is through time. My walking across the room is no more a motion as the last step is being taken than at any earlier point. Every motion is a complex whole, an enduring unity which organizes distinct parts, such as the various positions through which the falling pencil passes. As parts of the motion of the pencil, these positions, though distinct, function identically in the ordered continuity determined by the potentiality of the pencil to be on the floor. Things have being to the extent that they are or are part of determinate wholes, so that to be means to be something, and change has being because it always is or is part of some determinate potentiality, at work and manifest in the world as change.

8. Zeno's Paradoxes and Aristotle's Definition of Motion

Consider the application of Aristotle's account of motion to two paradoxes famous in antiquity. Zeno argued in various ways that there is no motion. According to one of his arguments, the arrow in flight is always in some one place, therefore always at rest, and therefore never in motion. We can deduce from Aristotle's definition that Zeno has made the same error, technically called the fallacy of composition, as one who would argue that no animal is alive since its head, when cut off, is not alive, its blood, when drawn out, is not alive, its bones, when removed, are not alive, and so on with each part in turn. The second paradox is one attributed to Heraclitus, and taken as proving that there is nothing but motion, that is, no identity, in the world. The saying goes that one cannot step into the same river twice. If the river flows, how can it continue to be itself? But the flux of the river, like the flight of the arrow, is an actuality of just the kind Aristotle formulates in his definition of motion. The river is always the same, as a river, precisely because it is never the same as water. To be a river is to be the always identical actuality of the potentiality of water to be in the sea.

For more discussion of Aristotle's solution to Zeno's paradoxes, see "Zeno: Aristotle's Treatment of Zeno's Paradoxes."

9. References and Further Reading

  • Aristotle, Metaphysics, Joe Sachs (trans.), Green Lion Press, 1999.
  • Aristotle, Nicomachean Ethics, Joe Sachs (trans.), Focus Philosophical Library, Pullins Press, 2002.
  • Aristotle, On the Soul, Joe Sachs (trans.), Green Lion Press, 2001.
  • Aristotle, Poetics, Joe Sachs (trans.), Focus Philosophical Library, Pullins Press, 2006.
  • Aristotle, Physics, Joe Sachs (trans.), Rutgers University Press, 1995.
  • Kosman, L. A. "Aristotle's Definition of Motion," Phronesis, 1969.

Author Information

Joe Sachs
Email: joe.sachs@sjc.edu
St. John's College
U. S. A.

Aristotle: Biology

Aristotle (384-322 BCE) may be said to be the first biologist in the Western tradition. Though there are physicians and other natural philosophers who remark on various flora and fauna before Aristotle, none of them brings to his study a systematic critical empiricism. Aristotle’s biological science is important to understand, not only because it gives us a view into the history and philosophy of science, but also because it allows us more deeply to understand his non-biological works, since certain key concepts from Aristotle’s biology repeat themselves in his other writings. Since a significant portion of the corpus of Aristotle’s work is on biology, it is natural to expect his work in biology to resonate in his other writings. One may, for example, use concepts from the biological works to better understand the ethics or metaphysics of Aristotle.

This article will begin with a brief explanation of his biological views and move toward several key explanatory concepts that Aristotle employs. These concepts are essential because they stand as candidates for a philosophy of biology. If Aristotle’s principles are insightful, then he has gone a long way towards creating the first systematic and critical system of biological thought. It is for this reason (rather than the particular observations themselves) that moderns are interested in Aristotle’s biological writings.

Table of Contents

  1. His Life
  2. The Scope of Aristotle’s Biological Works
  3. The Specialist and the Generalist
  4. The Two Modes of Causal Explanation
  5. Aristotle’s Theory of Soul
  6. The Biological Practice: Outlines of a Systematics
  7. “The more and the less” and “Epi to polu”
  8. Significant Achievements and Mistakes
  9. Conclusion
  10. References and Further Reading
    1. Primary Text
    2. Key Texts in Translation
    3. Selected Secondary Sources

1. His Life

Aristotle was born in the year 384 BCE in the town of Stagira (the modern town Stavros), a coastal Macedonian town to the north of Greece. He was raised at the court of Amyntas where he probably met and became friends with Philip (later to become king and father of Alexander the Great). When Aristotle was around 18, he was sent to Athens to study in Plato’s Academy. Aristotle spent twenty years at the Academy, staying until Plato’s death (although Diogenes says Aristotle left earlier). When Plato was succeeded by his nephew, Speusippus, as head of the Academy, Aristotle accepted an invitation to join a former student, Hermeias, who was gathering a Platonic circle about him in Assos in Mysia (near Troy). Aristotle spent three years in this environment. During this time, he may have done some of the natural investigations that later became The History of Animals.

At the end of Aristotle’s stay in Mysia, he moved to Lesbos (an adjacent island). This move may have been prompted by Theophrastus, a fellow of the Academy who was much influenced by Aristotle. It is probable (according to D’Arcy Thompson) that Aristotle performed some important biological investigations during this period.

Aristotle returned to Athens (circa 335-334 BCE). This began a period of great productivity. He rented some grounds in woods sacred to Apollo. It was here that Aristotle set up his school (Diog. Laert. V, 51).

At his school Aristotle also accumulated a large number of manuscripts and created a library that was a model for later libraries in Alexandria and Pergamon. According to one tradition, Alexander (his former pupil) paid him a handsome sum of money each year as a form of gratitude, and also sent him exotic animals, encountered during his conquests, for Aristotle to study.

At the death of Alexander in 323, Athens once again was full of anti-Macedonian sentiment. A charge of impiety was brought against Aristotle due to a poem he had written for Hermeias. One martyr for philosophy (Socrates) was enough for Aristotle and so he left his school to his colleague, Theophrastus, and fled to the Macedonian Chalcis. Here in 322 he died of a disease that is still the subject of speculation.

2. The Scope of Aristotle’s Biological Works

There is some dispute as to which works should be classified as the biological works of Aristotle. This is indeed a contentious question that is especially difficult for a systematic philosopher such as Aristotle. Generally speaking, a systematic philosopher is one who constructs various philosophical distinctions that, in turn, can be applied to a number of different contexts. Thus, a distinction such as “the more and the less,” which has its roots in biology (certain animal parts are greater in some individuals and smaller in others), can also be used in the ethics as a cornerstone of the doctrine of the mean as a criterion for virtue. That is, one varies from the mean by the principle of the more and the less. For example, if courage is the mean, then the excess would be “foolhardiness” while the deficiency would be “cowardice.” The boundary between what we’d consider “biology” proper and what we’d think of as psychology, philosophy of mind, and metaphysics is often hard to draw in Aristotle. That’s because Aristotle’s understanding of biology informs his metaphysics and philosophy of mind, but likewise, he often uses the distinctions drawn in his metaphysics in order to deal with biological issues.

In this article, the biological works are: (a) works that deal specifically with biological topics such as: The Parts of Animals (PA), The Generation of Animals (GA), The History of Animals (HA), The Movement of Animals, The Progression of Animals, On Sense and Sensible Objects, On Memory and Recollection, On Sleep and Waking, On Dreams, On Prophecy in Sleep, On Length and Shortness of Life, On Youth and Old Age, On Life and Death, On Respiration, On Breath, and On Plants, and (b) the work that deals with psuche (soul), On the Soul—though this work deals with metaphysical issues very explicitly, as well. This list does not include works such as the Metaphysics, Physics, Posterior Analytics, Categories, Nicomachean Ethics, or The Politics even though they contain many arguments that are augmented by an understanding of Aristotle’s biological science. Nor does this article examine any of the reputedly lost works (listed by ancient authors but not existing today) such as Dissections, On Composite Animals, On Sterility, On Physiognomy, and On Medicine. Some of these titles may have sections that have survived in part within the present corpus, but this is doubtful.

3. The Specialist and the Generalist

The distinction between the specialist and the generalist is a good starting point for understanding Aristotle’s philosophy of biology. The specialist is one who has a considerable body of experience in practical fieldwork while the generalist is one who knows many different areas of study. This distinction is brought out in Book One of the Parts of Animals (PA). At PA 639a 1-7 Aristotle says,

In all study and investigation, be it exalted or mundane, there appear to be two types of proficiency: one is that of exact, scientific knowledge while the other is a generalist’s understanding. (my tr.)

Aristotle does not mean to denigrate or to exalt either. Both are necessary for natural investigations. The generalist’s understanding is holistic and puts some area of study into a proper genus, while scientific knowledge deals with causes and definitions at the level of the species. These two skills are demonstrated by the following example:

An example of what I mean is the question of whether one should take a single species and state its differentia independently, for example, homo sapiens nature or the nature of Lions or Oxen, etc., or should we first set down common attributes or a common character (PA 639a 15-19, my tr.).

In other words, the methodology of the specialist would be to observe and catalogue each separate species by itself. The generalist, on the other hand, is drawn to making more global connections through an understanding of the common character of many species. Both skills are needed. Here and elsewhere Aristotle demonstrates the limitations of a single mode of discovery. We cannot simply set out a single path toward scientific investigation—whether it be demonstrative (logical) exactness (the specialist’s understanding) or holistic understanding (the generalist’s knowledge). Neither direction (specialist or generalist) is the one and only way to truth. Really, it is a little of both working in tandem. Sometimes one half takes the lead and sometimes the other. The adoption of several methods is a cornerstone of Aristotelian pluralism, a methodological principle that characterizes much of his work.

When discussing biological science, Aristotle presents the reader two directions: (a) the modes of discovery (genetic order) and (b) the presentation of a completed science (logical order). In the mode of discovery, the specialist sets out all the phenomena in as much detail as possible while the generalist must use her inter-generic knowledge to sort out what may or may not be significant in the event taking place before her. This is because in the mode of discovery, the investigator is in the genetic order. Some possible errors that could be made in this order (for example) might be mistaking certain animal behaviors for an end for which they were not intended. For example, it is very easy to mistake mating behavior for aggressive territorial behavior. Since the generalist has seen many different types of animals, she may be in the best position (on the basis of generic analogy) to classify the sort of behavior in question.

In the mode of discovery one begins with the phenomenon and then seeks to create a causal explanation (PA 646a 25). But how does one go about doing this? In the Posterior Analytics II.19, Aristotle suggests a process of induction that begins with the particular and then moves to the universal. Arriving at the universal entails a comprehensive understanding of some phenomenon. For example, if one wanted to know whether fish sleep, one would first observe fish in their environment. If one of the behaviors of the fish meets the common understanding of sleep (such as being deadened to outside stimulus, showing little to no movement, and so forth), then one may move to the generalization that fish sleep (On Sleep and Waking 455b 8, cf. On Dreams 458b 9). But one cannot stop there. Once one has determined that fish sleep (via the inductive mode of discovery), it is now up to the researcher to ferret out the causes and reasons why, in a systematic fashion. This second step is the mode of presentation. In this mode the practitioner of biological science seeks to understand why the universal is as it is. Going back to the example of sleeping fish, the scientist would ask why fish need to sleep. Is it by analogy to humans and other animals that seem to gather strength through sleep? In what ways might sleep be dangerous (say, by opening the individual fish to being eaten)? What do fish do to avoid this?

These and other questions require the practitioner to work back and forth with what has been set down in the mode of discovery for the purpose of providing an explanation. The most important tools for this exercise are the two modes of causal explanation.

4. The Two Modes of Causal Explanation

For Aristotle there are four causes: material, efficient, formal, and final. The material cause is characterized as “That out of which something existing becomes” (Phys. 194b 24). The material has the potential for the range of final products. Within the material is, in a potential sense, that which is to be formed. Obviously, one piece of wood or metal has the potential to be many artifacts; yet the possibilities are not infinite. The material itself puts constraint upon what can be produced from it. One can execute designs in glass, for example, which could never be brought forth from brass.

The efficient cause is depicted as “that from whence comes the first principle of kinetic change or rest” (Phys. 194b 30). Aristotle gives the example of a male fathering a child as showing an efficient cause. The efficient cause is the trigger that starts a process moving.

The formal cause constitutes the essence of something while the final cause is the purpose of something. For example, Aristotle considered whether the tongue is for the purpose of talking. If the tongue was for the purpose of talking (final cause), then it had to be shaped in a certain way, wide and supple, so that it might form subtle differences in sound (formal cause). In this way the purpose of the tongue for speaking dovetails with the structural way it might be brought about (PA 660a 27-32).

It is generally the case that Aristotle in his biological science interrelates the final and formal causes. For example, Aristotle says that the efficient cause may be inadequate to explain change. In On Generation and Corruption 336a, Aristotle states that all natural efficient causes are regulated by formal causes. “It is clear then that fire itself acts and is acted upon.” What this means is that while the fire does act as efficient cause, the manner of this action is regulated by a formal/final cause. The formal cause (via the doctrine of natural place—which arranges an ascending hierarchy among the elements: earth, water, air, and fire) dictates that fire is the highest level of the sub-lunar phenomena. Thus, its essence defines its purpose, namely, to travel upward toward its own natural place. In this way the formal and final cause act together to guide the actions of fire (efficient cause) to point upward toward its natural place.

Aristotle (at least in the biological works) invokes a strategy of redundant explanation. At its simplest level, this strategy gives four accounts of everything. In actual practice, however, he really offers only two accounts. In the first account he presents a case for understanding an event via material/kinetic means. For the sake of simplicity, let us call this the ME (materially-based causal explanation) account.

In the second case he presents aspects of essence (formal cause) and purpose (final cause). These are presented together. For the sake of simplicity, let us call this the TE (teleologically-based causal explanation) account. For an example of how these work together, consider respiration.

Aristotle believes that material and efficient causes can give one account of the motions of the air in and out of the lungs for respiration. But this is only part of the story. One must also consider the purpose of respiration and how this essence affects the entire organism (PA 642a 31-642b 4). Thus the efficient and material causes are lumped together as one sort of explanation, ME, which focuses upon how the natures of hot and cold air form a sort of current that brings in new air and exhales the old. The final and formal causes are linked together as another sort of explanation, TE, which is tied to why we have respiration in the first place.

In Aristotle’s account of respiration we are presented with a partner to TE and ME: necessity. When necessity attaches itself to ME it is called simple or absolute necessity. When necessity attaches itself to TE it is called conditional necessity. Let us return to our example of respiration and examine these concepts in more detail.

First, then, there is the formal/final cause of respiration. Respiration exists so that air might be brought into the body for the creation of pneuma (a vital force essential for life). If there were no respiration, there would be no intake of air and no way for it to be heated in the region of the heart and turned into pneuma—an element necessary for life among the blooded animals who live out of water. Thus the TE for respiration is for the sake of producing an essential raw material for the creation of pneuma.

The second mode of explanation, ME, concerns the material and efficient causes related to respiration. These operate in the manner of a quasi-gas-law theory. The hot air in the lungs will tend to stay there unless it is pushed out by the cold incoming air that hurries its exit (cf. On Breath 481b 11). (This is because ‘hot’ and ‘cold’ are two of the essential contraries, hot/cold and wet/dry.) It is the material natures of the elements that dictate their motions. This is the realm of the ME.

ME is an important mode of explanation because it grounds the practitioner in the empirical facts so that he may not incline himself to offer mere a priori causal accounts. When one is forced to give material and kinetic accounts of some event, then one is grounded in the tangible dynamics of what is happening. This is one important requirement for knowledge.

Now to necessity. Necessity can be represented as a modal operator that can attach itself to either TE or to ME. When it attaches itself to TE, the result is conditional necessity. In conditional necessity one must always begin with the end to be achieved. For example, if one adopts the teleological assumption of natural efficiency, that Nature does nothing in vain (GA 741b 5, cf. 739b 20, et al.), then the functions of various animal parts must be viewed within that frame. If we know that respiration is necessary for life, then what animal parts are necessary to allow respiration within different species? The acceptance of the end of respiration causes the investigator to account for how it can occur within a species. The same could be said for other given ends such as “gaining nutrition,” “defending one’s self from attack,” and “reproduction,” among others. When the biologist begins his investigation with some end (whether in the mode of discovery or the mode of scientific presentation), he is creating an account of conditional necessity.

The other sort of necessity is absolute necessity, which is the result of matter following its nature (such as fire moving to its natural place). The very nature of the material itself creates the dynamics—such as the quasi-gas-law interactions between the hot and cold air in the lungs. These dynamics may be described without proximate reference to the purpose of the event. In this way ME can function by itself, along with simple necessity, to give one complete account of an event.

In biological science Aristotle believes that conditional necessity is the more useful of the two necessities in discovery and explanation (PA 639b 25). This is because, in biology, there is a sense that the entire explanation always requires the purpose to set out the boundaries of what is and what is not significant. However, in his practice it is most often the case that Aristotle employs two complete accounts, ME and TE, in order to reveal different modes of explanation according to his doctrine of pluralism.

5. Aristotle’s Theory of Soul

The word for ‘soul’ in Aristotle is psuche. In Latin it is translated as anima. For many readers, it is the use of the Latin term (particularly as it was used by Christian, Muslim, and Jewish theologians) that forms the basis of our modern understanding of the word. Under the theological tradition, the soul meant an immaterial, detached ruling power within a human. It was immortal and went to God after death. This tradition gave rise to Descartes’ metaphysical dualism: the doctrine that there are two sorts of things that exist (soul and matter), and that soul ruled matter.

Aristotle does not think of soul as the aforementioned theologians do. This is because matter (hyle) and shape (morphe) combine to create a unity not a duality. The philosopher can intellectually abstract out the separate constituents, but in reality they are always united. This unity is often termed hylomorphism (after its root words). Using the terminology of the last section we can identify hyle with ME and morphe with TE. Thus, Aristotle’s doctrine of the soul (understood as hylomorphism) represents a unity of form and function within matter.

From the biological perspective, soul demarcates three sorts of living things: plants, animals, and human beings. In this way soul acts as the cause of a body’s being alive (De An 415b 8). This amalgamation (soul and body) exhibits itself through the presentation of a particular power that characterizes what it means to be alive for that sort of living thing.

The soul is the form of a living body, thus constituting its first actuality. Together the body and soul form an amalgamation. This is because when we analyze the whole into its component parts the particular power of the amalgamation is lost. Matter without TE, as we have seen, acts through the nature of its elements (earth, air, fire, and water) and not for its organic purpose. An example that illustrates the relationship between form and matter is the human eye. When an eye is situated in a living body, the matter (and the motions of that matter) of the eye works with the other parts of the body to present the actualization of a particular power: sight. When governed by the actuality (or fulfillment) of its purpose, an eyeball can see (De An 412b 17). Both the matter of the eyeball and its various neural connections (hyle, understood as ME) along with the formal and final causes (morphe, understood as TE) are necessary for sight. Each part has its particular purpose, and that purpose is given through its contribution to the basic tasks associated with the essence of the sort of thing in question: plant, animal, human.

It is important not to slip into the theological cum Cartesian sense of anima here. To say that plants and animals have souls is not to assert that there is a Divine rose garden or hound Heaven. We must remember that soul for Aristotle is a hylomorphic unity representing a monism and not a dualism. (The rational soul’s status is less clear, since it is situated in no particular organ; Aristotle rejected the brain as the organ of thinking, relegating it to a cooling mechanism, PA 652b 21-25.) It is the dynamic, vital organizing principle of life—nothing more, nothing less.

Plants exhibit the most basic powers that living organisms possess: nutrition and reproduction (De An 414a 31). The purpose of a plant is to take in and process materials in such a way that the plant grows. Several consequences follow (for the most part) from an individual plant having a well-operating nutritive soul. Let’s examine one sort of plant, a tree. If a plant exhibits excellence in taking in and processing nutrition, it will exhibit various positive effects. First, the tree will have tallness and girth that will see it through different weather conditions. Second, it will live longer. Third, it will drop lots of seeds, giving rise to other trees. Thus, if we were to compare two individual trees (of the same species), and one was tall and robust while the other was small and thin, then we would be able to render a judgment about the two individual trees on the basis of their fulfillment of their purpose as plants within that species. The tall and robust tree of that species would be a better tree (functionally). The small and thin tree would be condemned as failing to fulfill its purpose as a plant within that species.

Animals contain the nutritive soul plus some of the following powers: appetite, sensation, and locomotion (De An 414a 30, 414b 1-415a 13). Now, not all animals have all the same powers. For example, some (like dogs) have a developed sense of smell, while others (like cats) have a developed ability to run quickly with balance. This makes simple comparisons between species more difficult, but within one species the same sort of analysis used with plants also holds. That is, between two individual dogs, one dog can (for example) smell his prey up to 200 meters away while the other dog can only detect his prey up to 50 meters away. (This assumes that being able to detect prey from a distance allows the individual to eat more often.) The first dog is better because he has fulfilled his soul’s function better than the second. The first dog is thus a good dog while the second is a bad example of one. What is important here is that animals judged as animals must fulfill the power (soul) particular to their species in order to be functionally excellent. This means that dogs (for example) are proximately judged on their olfactory sense and remotely upon their ability to take in nutrition and to reproduce.

Humans contain the nutritive soul and the appetitive-sensory-locomotive souls along with the rational soul. This power is given in a passive, active, and imaginative sense (De An III 3-5). What this means is that first there is a power in the rational soul to perceive sensation and to process it in such a way that it is intelligible. Next, one is able to use the data received in the first step as material for analysis and reflection. This involves the active agency of the mind. Finally, the result (having both a sensory and ratiocinative element) can be arranged in a novel fashion so that the universal mixes with the perceived particular. This is imagination (De An III.3). For example, in step one you might perceive that your door is hanging at a slant. In step two you examine the hinges and ponder why the door is hanging in just this way. Finally, in step three you consider types of solutions that might solve the problem—such as taking a plane to the top of the door, or inserting a “shim” behind one of the hinges. You make your decision about this door in front of you based upon your assessment of the various generic solutions.

The rational soul, thus understood as a multi-step imaginative process, gives rise to theoretical and practical knowledge that, in turn, have other sub-divisions (EN VI). Just as the single nutritive soul of plants was greatly complicated by the addition of souls for the animals, so also is the situation even more complicated with the addition of the rational soul for humans. This is because it has so many different applications. For example, one person may know right and wrong and act on this knowledge, creating habits of the same, while another may have the productive knowledge of an artist who is able to master the functional requirements of his craft in order to produce well-wrought artifacts. Just as it is hard to compare cats and dogs among animal souls, so it is difficult to judge various instantiations of excellence among human rational souls. However, it is clear that between two persons compared on their ethical virtues and two artists compared on their productive wisdom, we may make intra-category judgments about each. These sorts of judgments begin with a biological understanding of what it means to be a human being and how one may fulfill her biological function based on her possession of the human rational soul (understood in one of the sub-categories of reason). Again, a biological understanding of the soul has implications beyond the field of biology/psychology.
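
The nested structure of these powers, each higher soul presupposing the lower, can be made concrete with a small sketch. The following Python fragment is purely an illustrative modern rendering of that nesting; the class names and the contents of the powers lists are hypothetical choices, not Aristotle’s terms.

    # Illustrative sketch only: the nested powers of soul rendered as a
    # class hierarchy. All names are modern, hypothetical choices.
    class Plant:
        # Nutritive soul: nutrition and reproduction (De An 414a 31).
        powers = ["nutrition", "reproduction"]

    class Animal(Plant):
        # Adds appetite, sensation, and locomotion (De An 414a 30 ff.).
        powers = Plant.powers + ["appetite", "sensation", "locomotion"]

    class Human(Animal):
        # Adds the rational soul (De An III 3-5).
        powers = Animal.powers + ["reason"]

    print(Human.powers)
    # ['nutrition', 'reproduction', 'appetite', 'sensation', 'locomotion', 'reason']

The inheritance expresses only that each higher kind possesses every power of the lower kinds plus its own; within-kind evaluation then asks how well an individual exercises the powers proper to its kind.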

6. The Biological Practice: Outlines of a Systematics

Systematics is the study of how one ought to create a system of biological classification and thus perform taxonomy. (“Systematics” is not to be confused with being a “systematic philosopher.” The former term has a technical meaning related to the theoretical foundations of animal classification and taxonomy. The latter phrase has to do with a tightly structured interlocking philosophical account.) In Aristotle’s logical works, he creates a theory of definition. According to Aristotle, the best way to create a definition is to find the proximate group in which the type of thing resides. For example, humans are a type of thing (species) and their proximate group is animal (or blooded animal). The proximate group is called the genus. Thus the genus is a larger group of which the species is merely one proper subset. What marks off that particular species as unique? This is the differentia or the essential defining trait. In our example with humans the differentia is “rationality.” Thus the definition of “human” is a rational animal. “Human” is the species, “animal” is the genus and “rationality” is the differentia.

In a similar way, Aristotle adapts his logical theory of genus and species to biology. By thinking in terms of species and their proximate genus, Aristotle makes a statement about the connections between various types of animals. Aristotle does not create a full-blown classification system that can describe all animals, but he does lay the theoretical foundations for such.

The first overarching categories are the blooded and the non-blooded animals. The animals covered by this distinction roughly correspond to the modern distinction between vertebrates and invertebrates. There are also three classes of dualizers, animals that fit somewhat between categories. Here is a sketch of the categorization (an illustrative code rendering follows the outline):

I. Blooded Animals

A. Live-bearing animals

1. Homo sapiens
2. Other mammals (without a distinction for primates)

B. Egg-laying animals

1. Birds
2. Fish

II. Non-Blooded Animals

A. Shell-skinned sea animals: Testacea
B. Soft-shelled sea animals: Crustacea
C. Non-shelled soft-skinned sea animals: Cephalopods
D. Insects
E. Bees

III. Dualizers (animals that share properties of more than one group)

A. Whales, seals, and porpoises: they give live birth yet they live in the sea
B. Bats: they have four appendages yet they fly
C. Sponges: they act both like plants and like animals
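
To make the shape of this proto-classification concrete, here is a minimal sketch in Python. It is an illustrative modern rendering only, not anything found in Aristotle’s text: the outline above becomes a nested mapping, with a small helper that locates an animal’s group. The identifiers and the sample animals are hypothetical choices.

    # Illustrative sketch only: Aristotle's proto-taxonomy (outlined above)
    # rendered as a nested dict. All names are modern, hypothetical choices.
    TAXONOMY = {
        "Blooded Animals": {
            "Live-bearing": ["human", "horse"],   # Homo sapiens, other mammals
            "Egg-laying": ["sparrow", "tuna"],    # birds, fish
        },
        "Non-Blooded Animals": {
            "Testacea (shell-skinned)": ["oyster"],
            "Crustacea (soft-shelled)": ["crab"],
            "Cephalopods (non-shelled, soft-skinned)": ["octopus"],
            "Insects": ["ant"],
            "Bees": ["bee"],
        },
        "Dualizers": {
            "Live-bearing sea animals": ["whale", "seal", "porpoise"],
            "Flying quadrupeds": ["bat"],
            "Plant-like animals": ["sponge"],
        },
    }

    def classify(animal):
        # Return the (group, subgroup) pair containing the animal, or None.
        for group, subgroups in TAXONOMY.items():
            for subgroup, members in subgroups.items():
                if animal in members:
                    return group, subgroup
        return None

    print(classify("bat"))      # -> ('Dualizers', 'Flying quadrupeds')
    print(classify("sparrow"))  # -> ('Blooded Animals', 'Egg-laying')

Note that the dualizers sit as a third top-level group rather than being forced under either blooded or non-blooded animals, which mirrors Aristotle’s willingness to let functional criteria, rather than a single rigid dichotomy, drive the grouping.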

Aristotle’s proto-system of classification differs from that of his predecessors who used habitat and other non-functional criteria to classify animals. For example, one theory commonly set out three large groups: air, land, and sea creatures. Because of the functional orientation of Aristotle’s TE, Aristotle repudiates any classification system based upon non-functional accidents. What is important is that the primary activities of life are carried out efficiently through specially designated body parts.

Though Aristotle’s work on classification is by no means comprehensive (but is rather a series of reflections on how to create one), it is appropriate to describe it as meta-systematics. Such reflections are consistent with his other key explanatory concepts of functionalism (TE and ME) as well as his work on logic in the Organon with respect to the utilization of genus and species. Though incomplete, this again is a blueprint of how to construct a systematics. The general structure of meta-systematics also acts as an independent principle that permits Aristotle to examine animals together that are functionally similar. Such a move enhances the reliability of analogy as a tool of explanation.

7. “The more and the less” and “Epi to polu”

“The more and the less” is an explanatory concept that is allied to the ME account. Principally, it is a way that individuation occurs in the non-uniform parts. Aristotle distinguishes two sorts of parts in animals: the uniform and the non-uniform. The uniform parts are those that if you dumped them into a bucket and cut the bucket in half, they would still remain the same. For example, blood is a uniform part. Dump blood into a bucket and cut it in half and it’s still the same blood (just half the quantity). The same is true of tissue, cartilage, tendons, skin, and so forth. Non-uniform parts change when the bucket test is applied. If you dump a lung into a bucket and cut it in half, you no longer have a proper organ. The same holds true of other organs: heart, liver, pancreas, and so forth, as well as the skeleton (Uniform Parts—PA 646b 20, 648b, 650a 20, 650b, 651b 20, 652a 23; Non-Uniform Parts—PA 656b 25, 622a 17, 665b 20, 683a 20, 684a 25.)

When an individual has excess nutrition (trophe), the excess (perittoma) often is distributed all around (GA 734b 25). An external observer does not perceive the changes to the uniform parts—except, perhaps, stomach fat. But such an observer would perceive the difference between a child who has been well fed (whose non-uniform parts are bigger) and one who has not. The difference is accounted for by the principle of the more and the less.

How does an external observer differentiate between any two people? The answer is that the non-uniform parts (particularly the skeletal structure) differ. Thus, one person’s nose is longer, another stands taller, a third is broader in the shoulders, etc. We all have noses, stand within a range of height and broadness of shoulders, etc. The particular mix that we each possess makes us individuals.

Sometimes, this mix goes beyond the range of the species (eidos). In these instances a part becomes non-functional because it has too much material or too little. Such situations are beyond the natural range one might expect within the species. Because of this, the instance involved is characterized as being unnatural (para phusin).

The possibility of unnatural events occurring in nature affects the status of explanatory principles in biology. We remember from above that there are two sorts of necessity: conditional and absolute. The absolute necessity never fails. It is the sort of necessity that one can apply to the stars that exist in the super-lunar realm. One can create star charts of the heavens that will be accurate for a thousand years forward or backward. This is because of the mode of absolute necessity.

However, because conditional necessity depends upon its telos, and because of the principle of the more and the less that is non-teleologically (ME) driven, there can arise a sort of spontaneity (cf. automaton, Phys. II.6) that can alter the normal, expected execution of a task because spontaneity is purposeless. In these cases the input from the material cause is greater or lesser than is usually the case. The result is an unnatural outcome based upon the principle of the more and the less. An example of this might be obesity. Nourishment is delivered to the body in a hierarchical fashion beginning with the primary needs. When all biological needs are met, then the excess goes into hair, nails and body fat. Excess body fat can impair proper function, but not out of design.

Because of the possibility of spontaneity and its unintended consequences, the necessity operative in biological events (conditional necessity) is only “for the most part” (hôs epi to polu). We cannot expect biological explanatory principles to be of the same order as those of the stars. Ceteris paribus principles are the best the biological realm can give. This brute fact gives rise to a different set of epistemic expectations than are often raised in the Prior Analytics and the Posterior Analytics. Our expectations for biology are for general rules that are true in most cases but have many exceptions. This means that biology cannot be an exact science, unlike astronomy. If there are always going to be exceptions that are contrary to nature, then the biologist must do his biology with toleration for these sorts of peripheral anomalies. This disposition is characterized by the doctrine of epi to polu.

8. Significant Achievements and Mistakes

This section will highlight a few of Aristotle’s biological achievements from the perspective of over 2,300 years of hindsight. For simplicity’s sake let us break these up into “bad calls” (observations and conclusions that have proven to be wrong) and “good calls” (observations and conclusions that have proven to be very accurate).

We begin with a few of Aristotle’s bad calls. First, Aristotle believed that thinking occurred in the region around the heart and not in the brain (a cooling organ, PA 652b 21-25, cf. HA 514a 16-22). Second, Aristotle thought that men were hotter than women (the opposite is the case). Third, Aristotle overweighed the male contribution in reproduction. Fourth, little details are often amiss, such as the number of teeth in women. Fifth, Aristotle believed that spontaneous generation could occur. For example, Aristotle observed that from animal dung certain flies could appear (even though careful observation did not reveal any flies mating and laying their eggs in the dung; the possibility of the eggs already existing in the abdomen of the animal did not occur to Aristotle). However, these sorts of mistakes are more often than not the result of an a priori principle such as “women being colder and less perfectly formed than men” or the application of his method to (in principle) unobservables—such as human conception, in which it is posited that the male provides the efficient, formal, and final causes while the woman provides merely the material cause.

Good Calls: Aristotle examined over 500 different species of animals. Some species came from fishermen, hunters, farmers, and perhaps Alexander. Many other species were viewed in nature by Aristotle. There are some very exact observations made by Aristotle during his stay at Lesbos. It is virtually certain that his early dissection skills were utilized solely upon animals (due to the social prohibition on dissecting humans). One example of this comes from the Generation of Animals in which Aristotle breaks open fertilized chicken eggs at carefully controlled intervals to observe when visible organs were generated. The first organ Aristotle saw was the heart. (In fact it is the spinal cord and the beginnings of the nervous system, but this is not visible without employing modern staining techniques.) On eggs opened later, Aristotle saw other organs. This led Aristotle to come out against a popular theory of conception and development entitled, “the pre-formation theory.” In the pre-formation theory, whose advocates extended until the eighteenth century, all the parts appear all at once and development is merely the growth of these essential parts. The contrary theory that Aristotle espouses is the epigenetic theory. According to epigenesis, the parts are created in a nested hierarchical order. Thus, through his observation, Aristotle saw that the heart was formed first, then he postulated that other parts were formed (also backed-up by observation). Aristotle concludes,

I mean, for instance, not that the heart once formed, fashions the liver, and then the liver fashions something else; but that the one is formed after the other (just as man is formed in time after a child), not by it. The reason of this is that so far as the things formed by nature or by human art are concerned, the formation of that which is potentially brought about by that which is in actuality; so that the form of B would have to be contained in A, e.g., the form of liver would have to be in the heart—which is absurd. (GA 734a 28-35, Peck trans.)

In epigenesis the controlling process of development operates according to the TE plan of creating the most important parts first. Since the heart is the principle (arche) of the body, being the center of blood production and sensation/intelligence, it is appropriate that it should be created first. Other parts, such as the liver, are then created in their appropriate order. The epigenesis-preformation debate lasted two thousand years and Aristotle got it right.

Another interesting observation by Aristotle is the discovery of the reproductive mode of the dog shark, Mustelus laevis (HA 6.10, 565b 1 ff.). This species is externally viviparous (live bearing) yet internally oviparous (egg bearing). Such an observation could only have come from dissections and careful observations.

Another observation concerns the reproductive habits of cuttlefish. In this process of hectocotylization, the sperm of the Argonauta, among other allied species, comes in large spermatophores that the male transfers to the mantle cavity of the female. This complicated maneuver, described in HA 524a 4-5, 541b 9-15, cf. 544a 12, GA 720b 33, was not fully verified by moderns until 1959!

Though Aristotle’s observations on bees in HA seem to be entirely from the beekeeper’s point of view (HA 625b 7-22), he does note that there are three classes of bees and that sexual reproduction requires that one class give way. He begins his discussion in the Generation of Animals with the following remark, “The generation of bees is beset with many problems” (GA 759a 9). If there are three classes and two genders, then something is amiss. Aristotle goes through what he feels to be all the possibilities. Though the observations are probably second-hand, Aristotle is still able to evaluate the data. He employs his systematic theory using the over-riding meta-principle that Nature always acts in an orderly way (GA 760a 32) to form his explanation of the function of each type of bee. This means that there must be a purposeful process (TE) that guides generation. However, since neither Aristotle nor the beekeepers had ever seen bee copulation, and since Aristotle allows for asexual generation in some fish, he believes that the case of bees offers him another case in which one class is sterile (this complies with modern theory on worker bees), another class creates both its own kind and another class (this is meant to correspond to the Queen bee—which Aristotle calls a King Bee because it has a stinger, and females in nature never have defensive weapons), while the third class creates not its own class but another (this is the drone).

Aristotle has got some of this right and some of it wrong. What he has right is, first, that bees are unusual in having three classes. Second, one class is infertile and works for the good of the whole. Third, one class (the Queen) is a super-reproducer. However, in the case of bees it is Aristotle’s method rather than his results that stirs admiration. Three meta-principles deserve particular note:

  1. Reproduction works with two groups not three. The quickest “solution” would have been to make one group sterile and then make the other two male and female. [This would have been the correct response.] However, since none of the beekeepers reported anything like reproductive behavior among bees and because Aristotle’s own limited observations also do not note this, he is reluctant to make such a reply. It is on the basis of the phainomena that Aristotle rejects bee copulation (GA 759a 10).
  2. Aristotle holds that a priori argument alone is not enough. One must square the most likely explanation with the observed facts.
  3. Via analogy, Aristotle notes that some fish seem not to reproduce and that even some flies are generated spontaneously. Thus, in assigning the roles to the various classes that he does, Aristotle does not create a sui generis instance. By analogy to other suppositions of his biological theory, Aristotle is able to “solve” a troublesome case via reference to analogy. (Aristotle is also admirably cautious about his own theory, saying that more work is needed.)

What is most important in Aristotle’s accomplishments is his combination of keen observations with a critical scientific method that employs his systematic categories to solve problems in biology and then link these to other issues in human life.

9. Conclusion

Since Aristotle’s biological works comprise almost a third of his writings that have come down to us, and since these writings may have occurred early in his career, it is very possible that the influence of the biological works upon Aristotle’s other writings is considerable. Aristotle’s biological works (so often neglected) should be brought to the fore, not only in the history of biology, but also as a way of understanding some of Aristotle’s non-biological writings.

10. References and Further Reading

a. Primary Text

  • Bekker, Immanuel (ed.), updated by Olof Gigon. Aristotelis Opera. Berlin: Deutsche Akademie der Wissenschaften, 1831-1870; rpt. W. de Gruyter, 1960-1987.

b. Key Texts in Translation

  • Barnes, Jonathan (ed). The Complete Works of Aristotle: the Revised Oxford Translation. Princeton, NJ: Princeton University Press, 1984.
  • The Clarendon Series of Aristotle:
  • Balme, David (tr. and ed.), updated by Allan Gotthelf. De Partibus Animalium I with De Generatione Animalium I (with passages from II 1-3). Oxford: Clarendon Press, 1993.
  • Lennox, James G. (tr. and ed.). Aristotle on the Parts of Animals I-IV. Oxford: Clarendon Press, 2002.
  • The Loeb Series of Aristotle (opposite pages of Greek and English).

c. Selected Secondary Sources

  • Balme, David. “Aristotle’s Use of Differentiae in Zoology.” Aristote et les Problèmes de Méthode. Louvain: Publications Universitaires, 1961.
  • Balme, David. “GENOS and EIDOS in Aristotle’s Biology.” The Classical Quarterly 12 (1962): 81-88.
  • Balme, David. “Aristotle’s Biology was not Essentialist.” Archiv für Geschichte der Philosophie 62.1 (1980): 1-12.
  • Bourgey, Louis. Observation et Experiénce chez Aristote. Paris: J. Vrin, 1955.
  • Boylan, Michael. "Mechanism and Teleology in Aristotle's Biology" Apeiron 15.2 (1981): 96-102.
  • Boylan, Michael. "The Digestive and 'Circulatory' Systems in Aristotle's Biology" Journal of the History of Biology 15.1 (1982): 89-118.
  • Boylan, Michael. Method and Practice in Aristotle’s Biology. Lanham, MD and London: University Press of America, 1983.
  • Boylan, Michael. "The Hippocratic and Galenic Challenges to Aristotle's Conception Theory" Journal of the History of Biology 15.1 (1984): 83-112.
  • Boylan, Michael. "The Place of Nature in Aristotle's Biology" Apeiron 19.1 (1985).
  • Boylan, Michael. "Galen's Conception Theory" Journal of the History of Biology 19.1 (1986): 44-77.
  • Boylan, Michael. "Monadic and Systemic Teleology" in Modern Problems in Teleology ed. Nicholas Rescher (Washington, D.C.: University Press of America, 1986).
  • Charles, David. Aristotle on Meaning and Essence. Oxford: Oxford University Press, 2000.
  • Devereux, Daniel and Pierre Pellegrin, eds. Biologie, Logique et Métaphysique chez Aristote. Paris: Éditions du Centre National de la Recherche Scientifique, 1990.
  • Düring, Ingemar. Aristotle’s De Partibus Animalium, Critical and Literary Commentary. Göteborg, 1943; rpt. NY: Garland, 1980.
  • Ferejohn, M. The Origins of Aristotelian Science. New Haven, CT: Yale University Press, 1990.
  • Gotthelf, Allan and James G. Lennox, eds. Philosophical Issues in Aristotle’s Biology. NY: Cambridge University Press, 1987.
  • Grene, Marjorie. A Portrait of Aristotle. Chicago: University of Chicago Press, 1963.
  • Joly, Robert. “La Charactérologie Antique Jusqu’à Aristote.” Revue Belge de Philologie et d’Histoire 40 (1962): 5-28.
  • Kullmann, Wolfgang. Wissenschaft und Methode: Interpretationen zur Aristotelischen Theorie der Naturwissenschaft. Berlin: de Gruyter, 1974.
  • Kullmann, Wolfgang. Aristoteles und die moderne Wissenschaft Stuttgart: F. Steiner, 1998.
  • Kullmann, Wolfgang. “Aristoteles’ wissenschaftliche Methode in seinen zoologischen Schriften” in Wöhrle, G., ed. Geschichte der Mathematik und der Naturwissenschaften. Band 1. Stuttgart: F. Steiner, 1999, pp. 103-123.
  • Kullmann, Wolfgang. “Zoologische Sammelwerke in der Antike” in Wöhrle, G., ed. Geschichte der Mathematik und der Naturwissenschaften. Band 1. Stuttgart: F. Steiner, 1999, pp. 181-198.
  • Kung, Joan. “Some Aspects of Form in Aristotle’s Biology” Nature and System 2 (1980): 67-90.
  • Kung, Joan. “Aristotle on Thises, Suches and the Third Man Argument” Phronesis 26 (1981): 207-247.
  • Le Blond, Jean-Marie. Aristote, Philosophe de la Vie. Paris: Éditions Montaigne, 1945.
  • Lesher, James. “NOUS in the Parts of Animals.” Phronesis 18 (1973): 44-68.
  • Lennox, James. “Teleology, Chance, and Aristotle’s Theory of Spontaneous Generation” Journal of the History of Philosophy 20 (1982): 219-232.
  • Lennox, James. “The Place of Mankind in Aristotle’s Zoology” Philosophical Topics 25.1 (1999): 1-16.
  • Lennox, James. Aristotle’s Philosophy of Biology: Studies in the Origins of Life Sciences. NY: Cambridge University Press, 2001.
  • Lloyd, G.E.R. “Right and Left in Greek Philosophy” Journal of Hellenic Studies. 82 (1962): 67-90.
  • Lloyd, G.E.R. Polarity and Analogy. Cambridge: Cambridge University Press, 1966.
  • Lloyd, G.E.R. Aristotle: The Growth and Structure of his Thought. Cambridge: Cambridge University Press, 1969.
  • Lloyd, G.E.R. “Saving the Appearances” Classical Quarterly. n.s. 28 (1978): 202-222.
  • Lloyd, G.E.R. Magic, Reason, and Experience. Cambridge: Cambridge University Press, 1979.
  • Lloyd, G.E.R. The Revolutions of Wisdom. Berkeley, CA: University of California Press, 1987
  • Lloyd, G.E.R. Methods and Problems in Greek Science. Cambridge: Cambridge University Press, 1991.
  • Lloyd, G.E.R. Aristotelian Explorations. Cambridge: Cambridge University Press, 1996.
  • Louis, Pierre. “La Génération Spontanée chez Aristote” Congrès International d’Histoire des Sciences (1968): 291-305.
  • Louis, Pierre. La Découverte de la Vie. Paris: Hermann, 1975.
  • Owen, G.E.L. “TITHENAI TA PHAINOMENA” Aristote et les Problèmes de Méthode. Louvain, 1975.
  • Owen, G.E.L. The Platonism of Aristotle. London: British Academy: Dawes Hicks Lecture on Philosophy, 1965.
  • Pellegrin, Pierre. La Classification des Animaux chez Aristote: Statut de la Biologie et Unité de l’Aristotélisme. Paris: Société d’édition “Les Belles Lettres,” 1982.
  • Pellegrin, Pierre. “Logical Difference and Biological Difference: The Unity of Aristotle’s Thought” in Gotthelf, Allan and James G. Lennox, eds. Philosophical Issues in Aristotle’s Biology. NY: Cambridge University Press, 1987, pp. 313-338.
  • Pellegrin, Pierre. “Taxonomie, moriologie, division” in Devereux, Daniel and Pierre Pellegrin, eds. Biologie, Logique et Métaphysique chez Aristote. Paris, 1990, 37-48.
  • Preus, Anthony. “Aristotle’s Parts of Animals 2.16 659b 13-19: Is it Authentic?” Classical Quarterly 18.2 (1968): 170-178.
  • Preus, Anthony. “Nature Uses. . . .” Apeiron 3.2 (1969): 20-33.
  • Preus, Anthony. Science and Philosophy in Aristotle’s Biological Works. NY: Olms, 1975.
  • Preus, Anthony. “Eidos as Norm” Nature and System 1 (1979): 79-103.
  • Solmsen, Friedrich. Aristotle’s System of the Physical World: A Comparison with his Predecessors. Ithaca, NY: Cornell University Press, 1960.
  • Sorabji, Richard. Necessity, Cause, and Blame. Ithaca, NY: Cornell University Press, 1980.
  • Thompson, D’Arcy. Aristotle as Biologist. Oxford: Oxford University Press, 1913.
  • Thompson, D’Arcy. On Growth and Form. Cambridge: Cambridge University Press, 1917.
  • Ulmer, K. Wahrheit, Kunst und Natur bei Aristoteles. Tübingen: M. Niemeyer, 1953.
  • Witt, Charlotte. Substance and Essence in Aristotle: An Interpretation of Metaphysics VII-IX. Ithaca, NY: Cornell University Press, 1989.
  • Wöhrle, Georg and Jochen Althoff, eds. Biologie in Geschichte der Mathematik und der Naturwissenschaften (series). Band 1. Stuttgart: F. Steiner, 1999.

Author Information

Michael Boylan
Email: michael.boylan@marymount.edu
Marymount University
U. S. A.

Propositional Logic

Propositional logic, also known as sentential logic and statement logic, is the branch of logic that studies ways of joining and/or modifying entire propositions, statements or sentences to form more complicated propositions, statements or sentences, as well as the logical relationships and properties that are derived from these methods of combining or altering statements. In propositional logic, the simplest statements are considered as indivisible units, and hence, propositional logic does not study those logical properties and relations that depend upon parts of statements that are not themselves statements on their own, such as the subject and predicate of a statement. The most thoroughly researched branch of propositional logic is classical truth-functional propositional logic, which studies logical operators and connectives that are used to produce complex statements whose truth-value depends entirely on the truth-values of the simpler statements making them up, and in which it is assumed that every statement is either true or false and not both. However, there are other forms of propositional logic in which other truth-values are considered, or in which there is consideration of connectives that are used to produce statements whose truth-values depend not simply on the truth-values of the parts, but additional things such as their necessity, possibility or relatedness to one another.

Table of Contents

  1. Introduction
  2. History
  3. The Language of Propositional Logic
    1. Syntax and Formation Rules of PL
    2. Truth Functions and Truth Tables
    3. Definability of the Operators and the Languages PL' and PL''
  4. Tautologies, Logical Equivalence and Validity
  5. Deduction: Rules of Inference and Replacement
    1. Natural Deduction
    2. Rules of Inference
    3. Rules of Replacement
    4. Direct Deductions
    5. Conditional and Indirect Proofs
  6. Axiomatic Systems and the Propositional Calculus
  7. Meta-Theoretic Results for the Propositional Calculus
  8. Other Forms of Propositional Logic
  9. References and Further Reading

1. Introduction

A statement can be defined as a declarative sentence, or part of a sentence, that is capable of having a truth-value, such as being true or false. So, for example, the following are statements:

  • George W. Bush is the 43rd President of the United States.
  • Paris is the capital of France.
  • Everyone born on Monday has purple hair.

Sometimes, a statement can contain one or more other statements as parts. Consider for example, the following statement:

  • Either Ganymede is a moon of Jupiter or Ganymede is a moon of Saturn.

While the above compound sentence is itself a statement, because it is true, the two parts, "Ganymede is a moon of Jupiter" and "Ganymede is a moon of Saturn", are themselves statements, because the first is true and the second is false.

The term proposition is sometimes used synonymously with statement. However, it is sometimes used to name something abstract that two different statements with the same meaning are both said to "express". In this usage, the English sentence, "It is raining", and the French sentence "Il pleut", would be considered to express the same proposition; similarly, the two English sentences, "Callisto orbits Jupiter" and "Jupiter is orbited by Callisto" would also be considered to express the same proposition. However, the nature or existence of propositions as abstract meanings is still a matter of philosophical controversy, and for the purposes of this article, the terms "statement" and "proposition" are used interchangeably.

Propositional logic, also known as sentential logic, is that branch of logic that studies ways of combining or altering statements or propositions to form more complicated statements or propositions. Joining two simpler propositions with the word "and" is one common way of combining statements. When two statements are joined together with "and", the complex statement formed by them is true if and only if both the component statements are true. Because of this, an argument of the following form is logically valid:

Paris is the capital of France and Paris has a population of over two million.
Therefore, Paris has a population of over two million.

Propositional logic largely involves studying logical connectives such as the words "and" and "or" and the rules determining the truth-values of the propositions they are used to join, as well as what these rules mean for the validity of arguments, and such logical relationships between statements as being consistent or inconsistent with one another, as well as logical properties of propositions, such as being tautologically true, being contingent, and being self-contradictory. (These notions are defined below.)

Propositional logic also studies ways of modifying statements, such as the addition of the word "not" that is used to change an affirmative statement into a negative statement. Here, the fundamental logical principle involved is that if a given affirmative statement is true, the negation of that statement is false, and if a given affirmative statement is false, the negation of that statement is true.

What is distinctive about propositional logic as opposed to other (typically more complicated) branches of logic is that propositional logic does not deal with logical relationships and properties that involve the parts of a statement smaller than the simple statements making it up. Therefore, propositional logic does not study those logical characteristics of the propositions below in virtue of which they constitute a valid argument:

  1. George W. Bush is a president of the United States.
  2. George W. Bush is a son of a president of the United States.
  3. Therefore, there is someone who is both a president of the United States and a son of a president of the United States.

The recognition that the above argument is valid requires one to recognize that the subject in the first premise is the same as the subject in the second premise. However, in propositional logic, simple statements are considered as indivisible wholes, and those logical relationships and properties that involve parts of statements such as their subjects and predicates are not taken into consideration.

Propositional logic can be thought of as primarily the study of logical operators. A logical operator is any word or phrase used either to modify one statement to make a different statement, or join multiple statements together to form a more complicated statement. In English, words such as "and", "or", "not", "if ... then...", "because", and "necessarily", are all operators.

A logical operator is said to be truth-functional if the truth-values (the truth or falsity, etc.) of the statements it is used to construct always depend entirely on the truth or falsity of the statements from which they are constructed. The English words "and", "or" and "not" are (at least arguably) truth-functional, because a compound statement joined together with the word "and" is true if both the statements so joined are true, and false if either or both are false, a compound statement joined together with the word "or" is true if at least one of the joined statements is true, and false if both joined statements are false, and the negation of a statement is true if and only if the statement negated is false.
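
To make the idea concrete, here is a minimal sketch in Python (an illustration added for this discussion, not part of the logical apparatus): each operator is modeled as a function whose output truth-value is fixed entirely by its input truth-values.

    # Truth-functional operators modeled as functions from truth-values
    # to truth-values; the output depends only on the inputs.
    def and_(a, b):
        return a and b       # true just in case both parts are true

    def or_(a, b):
        return a or b        # true just in case at least one part is true (inclusive)

    def not_(a):
        return not a         # reverses the truth-value

    print(and_(True, False))   # False
    print(or_(True, False))    # True
    print(not_(False))         # True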

Some logical operators are not truth-functional. One example of an operator in English that is not truth-functional is the word "necessarily". Whether a statement formed using this operator is true or false does not depend entirely on the truth or falsity of the statement to which the operator is applied. For example, both of the following statements are true:

  • 2 + 2 = 4.
  • Someone is reading an article in a philosophy encyclopedia.

However, let us now consider the corresponding statements modified with the operator "necessarily":

  • Necessarily, 2 + 2 = 4.
  • Necessarily, someone is reading an article in a philosophy encyclopedia.

Here, the first example is true but the second example is false. Hence, the truth or falsity of a statement using the operator "necessarily" does not depend entirely on the truth or falsity of the statement modified.

Truth-functional propositional logic is that branch of propositional logic that limits itself to the study of truth-functional operators. Classical (or "bivalent") truth-functional propositional logic is that branch of truth-functional propositional logic that assumes that there are only two possible truth-values a statement (whether simple or complex) can have: (1) truth, and (2) falsity, and that every statement is either true or false but not both.

Classical truth-functional propositional logic is by far the most widely studied branch of propositional logic, and for this reason, most of the remainder of this article focuses exclusively on this area of logic. In addition to classical truth-functional propositional logic, there are other branches of propositional logic that study logical operators, such as "necessarily", that are not truth-functional. There are also "non-classical" propositional logics in which such possibilities as (i) a proposition's having a truth-value other than truth or falsity, (ii) a proposition's having an indeterminate truth-value or lacking a truth-value altogether, and sometimes even (iii) a proposition's being both true and false, are considered. (For more information on these alternative forms of propositional logic, consult Section VIII below.)

2. History

The serious study of logic as an independent discipline began with the work of Aristotle (384-322 BCE). Generally, however, Aristotle's sophisticated writings on logic dealt with the logic of categories and quantifiers such as "all", and "some", which are not treated in propositional logic. However, in his metaphysical writings, Aristotle espoused two principles of great importance in propositional logic, which have since come to be called the Law of Excluded Middle and the Law of Contradiction. Interpreted in propositional logic, the first is the principle that every statement is either true or false, the second is the principle that no statement is both true and false. These are, of course, cornerstones of classical propositional logic. There is some evidence that Aristotle, or at least his successor at the Lyceum, Theophrastus (d. 287 BCE), did recognize a need for the development of a doctrine of "complex" or "hypothetical" propositions, i.e., those involving conjunctions (statements joined by "and"), disjunctions (statements joined by "or") and conditionals (statements joined by "if... then..."), but their investigations into this branch of logic seem to have been very minor.

More serious attempts to study statement operators such as "and", "or" and "if... then..." were conducted by the Stoic philosophers in the late 3rd century BCE. Since most of their original works -- if indeed many writings were even produced -- are lost, we cannot make many definite claims about exactly who first made investigations into what areas of propositional logic, but we do know from the writings of Sextus Empiricus that Diodorus Cronus and his pupil Philo had engaged in a protracted debate about whether the truth of a conditional statement depends entirely on it not being the case that its antecedent (if-clause) is true while its consequent (then-clause) is false, or whether it requires some sort of stronger connection between the antecedent and consequent -- a debate that continues to have relevance for modern discussion of conditionals. The Stoic philosopher Chrysippus (roughly 280-205 BCE) perhaps did the most in advancing Stoic propositional logic, by marking out a number of different ways of forming complex premises for arguments, and for each, listing valid inference schemata. Chrysippus suggested that the following inference schemata are to be considered the most basic:

  1. If the first, then the second; but the first; therefore the second.
  2. If the first, then the second; but not the second; therefore, not the first.
  3. Not both the first and the second; but the first; therefore, not the second.
  4. Either the first or the second [and not both]; but the first; therefore, not the second.
  5. Either the first or the second; but not the second; therefore the first.

Inference rules such as the above correspond very closely to the basic principles in a contemporary system of natural deduction for propositional logic. For example, the first two rules correspond to the rules of modus ponens and modus tollens, respectively. These basic inference schemata were supplemented with less basic inference schemata by Chrysippus himself and other Stoics, and are preserved in the work of Diogenes Laertius, Sextus Empiricus and later, in the work of Cicero.

Advances on the work of the Stoics were undertaken in small steps in the centuries that followed. This work was done by, for example, the second century logician Galen (roughly 129-210 CE), the sixth century philosopher Boethius (roughly 480-525 CE) and later by medieval thinkers such as Peter Abelard (1079-1142) and William of Ockham (1288-1347), and others. Much of their work involved producing better formalizations of the principles of Aristotle or Chrysippus, introducing improved terminology and furthering the discussion of the relationships between operators. Abelard, for example, seems to have been the first to differentiate clearly exclusive from inclusive disjunction (discussed below), and to suggest that inclusive disjunction is the more important notion for the development of a relatively simple logic of disjunctions.

The next major step forward in the development of propositional logic came only much later with the advent of symbolic logic in the work of logicians such as Augustus De Morgan (1806-1871) and, especially, George Boole (1815-1864) in the mid-19th century. Boole was primarily interested in developing a mathematical-style "algebra" to replace Aristotelian syllogistic logic, primarily by employing the numeral "1" for the universal class, the numeral "0" for the empty class, the multiplication notation "xy" for the intersection of classes x and y, the addition notation "x + y" for the union of classes x and y, etc., so that statements of syllogistic logic could be treated in quasi-mathematical fashion as equations; e.g., "No x is y" could be written as "xy = 0". However, Boole noticed that if an equation such as "x = 1" is read as "x is true", and "x = 0" is read as "x is false", the rules given for his logic of classes can be transformed into a logic for propositions, with "x + y = 1" reinterpreted as saying that either x or y is true, and "xy = 1" reinterpreted as meaning that x and y are both true. Boole's work sparked rapid interest in logic among mathematicians and later, "Boolean algebras" were used to form the basis of the truth-functional propositional logics utilized in computer design and programming.

In the late 19th century, Gottlob Frege (1848-1925) presented logic as a branch of systematic inquiry more fundamental than mathematics or algebra, and presented the first modern axiomatic calculus for logic in his 1879 work Begriffsschrift. While it covered more than propositional logic, from Frege's axiomatization it is possible to distill the first complete axiomatization of classical truth-functional propositional logic. Frege was also the first to systematically argue that all truth-functional connectives could be defined in terms of negation and the material conditional.

In the early 20th century, Bertrand Russell gave a different complete axiomatization of propositional logic, considered on its own, in his 1906 paper "The Theory of Implication", and later, along with A. N. Whitehead, produced another axiomatization using disjunction and negation as primitives in the 1910 work Principia Mathematica. Proof of the possibility of defining all truth-functional operators in virtue of a single binary operator was first published by American logician H. M. Sheffer in 1913, though C. S. Peirce (1839-1914) seems to have discovered this decades earlier. In 1917, French logician Jean Nicod discovered that an axiomatization for propositional logic using the Sheffer stroke involving only a single axiom schema and a single inference rule was possible.

While the notion of a "truth table" often utilized in the discussion of truth-functional connectives, discussed below, seems to have been at least implicit in the work of Peirce, W. S. Jevons (1835-1882), Lewis Carroll (1832-1898), John Venn (1834-1923), and Allan Marquand (1853-1924), and truth tables appear explicitly in writings by Eugen Müller as early as 1909, their use gained rapid popularity in the early 1920s, perhaps due to the combined influence of the work of Emil Post, whose 1921 article makes liberal use of them, and Ludwig Wittgenstein's 1921 Tractatus Logico-Philosophicus, in which truth tables and truth-functionality are prominently featured.

Systematic inquiry into axiomatic systems for propositional logic and related metatheory was conducted in the 1920s, 1930s and 1940s by such thinkers as David Hilbert, Paul Bernays, Alfred Tarski, Jan Łukasiewicz, Kurt Gödel, Alonzo Church and others. It is during this period that most of the important metatheoretic results such as those discussed in Section VII were discovered.

Complete natural deduction systems for classical truth-functional propositional logic were developed and popularized in the work of Gerhard Gentzen in the mid-1930s, and subsequently introduced into influential textbooks such as that of F. B. Fitch (1952) and Irving Copi (1953).

Modal propositional logics are the most widely studied form of non-truth-functional propositional logic. While interest in modal logic dates back to Aristotle, by contemporary standards the first systematic inquiry into modal propositional logic can be found in the work of C. I. Lewis in 1912 and 1913. Among other well-known forms of non-truth-functional propositional logic, deontic logic began with the work of Ernst Mally in 1926, and epistemic logic was first treated systematically by Jaakko Hintikka in the early 1960s. The modern study of three-valued propositional logic began in the work of Jan Łukasiewicz in 1917, and other forms of non-classical propositional logic soon followed suit. Relevance propositional logic is more recent, dating from the mid-1970s in the work of A. R. Anderson and N. D. Belnap. Paraconsistent logic, while having its roots in the work of Łukasiewicz and others, has blossomed into an independent area of research only recently, mainly due to work undertaken by N. C. A. da Costa, Graham Priest and others in the 1970s and 1980s.

3. The Language of Propositional Logic

The basic rules and principles of classical truth-functional propositional logic are, among contemporary logicians, almost entirely agreed upon, and capable of being stated in a definitive way. This is most easily done if we utilize a simplified logical language that deals only with simple statements considered as indivisible units as well as complex statements joined together by means of truth-functional connectives. We first consider a language called PL for "Propositional Logic". Later we shall consider two even simpler languages, PL' and PL''.

a. Syntax and Formation Rules of PL

In any ordinary language, a statement would never consist of a single word, but would always at the very least consist of a noun or pronoun along with a verb. However, because propositional logic does not consider smaller parts of statements, and treats simple statements as indivisible wholes, the language PL uses uppercase letters 'A', 'B', 'C', etc., in place of complete statements. The logical signs '&', 'v', '→', '↔', and '¬' are used in place of the truth-functional operators, "and", "or", "if... then...", "if and only if", and "not", respectively. So, consider again the following example argument, mentioned in Section I.

Paris is the capital of France and Paris has a population of over two million.
Therefore, Paris has a population of over two million.

If we use the letter 'C' as our translation of the statement "Paris is the capital of France" in PL, and the letter 'P' as our translation of the statement "Paris has a population of over two million", and use a horizontal line to separate the premise(s) of an argument from the conclusion, the above argument could be symbolized in language PL as follows:

C & P
P

In addition to statement letters like 'C' and 'P' and the operators, the only other signs that sometimes appear in the language PL are parentheses which are used in forming even more complex statements. Consider the English compound sentence, "Paris is the most important city in France if and only if Paris is the capital of France and Paris has a population of over two million." If we use the letter 'I' in language PL to mean that Paris is the most important city in France, this sentence would be translated into PL as follows:

I ↔ (C & P)

The parentheses are used to group together the statements 'C' and 'P' and differentiate the above statement from the one that would be written as follows:

(I ↔ C) & P

This latter statement asserts that Paris is the most important city in France if and only if it is the capital of France, and (separate from this), Paris has a population of over two million. The difference between the two is subtle, but important logically.

It is important to describe the syntax and make-up of statements in the language PL in a precise manner, and give some definitions that will be used later on. Before doing this, it is worthwhile to distinguish the language in which we will be discussing PL, namely English, from PL itself. Whenever one language is used to discuss another, the language in which the discussion takes place is called the metalanguage, and the language under discussion is called the object language. In this context, the object language is the language PL, and the metalanguage is English, or to be more precise, English supplemented with certain special devices that are used to talk about language PL. It is possible in English to talk about words and sentences in other languages, and when we do, we place the words or sentences we wish to talk about in quotation marks. Therefore, using ordinary English, I can say that "parler" is a French verb, and "I & C" is a statement of PL. The following expression is part of PL, not English:

(I ↔ C) & P

However, the following expression is a part of English; in particular, it is the English name of a PL sentence:

"(I ↔ C) & P"

This point may seem rather trivial, but it is easy to become confused if one is not careful.

In our metalanguage, we shall also be using certain variables that are used to stand for arbitrary expressions built from the basic symbols of PL. In what follows, the Greek letters 'α', 'β', and so on, are used for any object language (PL) expression of a certain designated form. For example, later on, we shall say that, if α is a statement of PL, then so is '¬α'. Notice that 'α' itself is not a symbol that appears in PL; it is a symbol used in English to speak about symbols of PL. We will also be making use of so-called "Quine corners", written '⌜' and '⌝', which are a special metalinguistic device used to speak about object language expressions constructed in a certain way. Suppose α is the statement "(I ↔ C)" and β is the statement "(P & C)"; then ⌜α v β⌝ is the complex statement "(I ↔ C) v (P & C)".

Let us now proceed to giving certain definitions used in the metalanguage when speaking of the language PL.

Definition: A statement letter of PL is defined as any uppercase letter written with or without a numerical subscript.

Note: According to this definition, 'A', 'B', 'B2', 'C3', and 'P14' are examples of statement letters. The numerical subscripts are used just in case we need to deal with more than 26 simple statements: in that case, we can use 'P1' to mean something different than 'P2', and so forth.

Definition: A connective or operator of PL is any of the signs '¬', '&', 'v', '→', and '↔'.

Definition: A well-formed formula (hereafter abbreviated as wff) of PL is defined recursively as follows:

  1. Any statement letter is a well-formed formula.
  2. If α is a well-formed formula, then so is '¬α'.
  3. If α and β are well-formed formulas, then so is '(α & β)'.
  4. If α and β are well-formed formulas, then so is '(α v β)'.
  5. If α and β are well-formed formulas, then so is '(α → β)'.
  6. If α and β are well-formed formulas, then so is '(α ↔ β)'.
  7. Nothing that cannot be constructed by successive steps of (1)-(6) is a well-formed formula.

Note: According to part (1) of this definition, the statement letters 'C', 'P' and 'M' are wffs. Because 'C' and 'P' are wffs, by part (3), "(C & P)" is a wff. Because it is a wff, and 'M' is also a wff, by part (6), "(M ↔ (C & P))" is a wff. It is conventional to regard the outermost parentheses on a wff as optional, so that "M ↔ (C & P)" is treated as an abbreviated form of "(M ↔ (C & P))". However, whenever a shorter wff is used in constructing a more complicated wff, the parentheses on the shorter wff are necessary.

The notion of a well-formed formula should be understood as corresponding to the notion of a grammatically correct or properly constructed statement of language PL. This definition tells us, for example, that "¬(Q v ¬R)" is grammatical for PL because it is a well-formed formula, whereas the string of symbols, ")¬Q¬v(↔P&", while consisting entirely of symbols used in PL, is not grammatical because it is not well-formed.
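
Readers who find a computational rendering helpful may picture the recursive definition along the following lines. This Python sketch is only illustrative; the nested-tuple representation and the helper names ('is_letter', 'is_wff', 'render') are assumptions of the sketch, not part of PL itself.

    # Illustrative representation: a statement letter is a string such as
    # 'P' or 'P14'; a complex wff is a tuple ('¬', α) or (op, α, β).
    BINARY = {'&', 'v', '→', '↔'}

    def is_letter(s):
        # An uppercase letter, with or without a numerical subscript.
        return (isinstance(s, str) and len(s) >= 1 and s[0].isupper()
                and (s[1:] == '' or s[1:].isdigit()))

    def is_wff(e):
        if is_letter(e):                                    # clause (1)
            return True
        if isinstance(e, tuple) and len(e) == 2 and e[0] == '¬':
            return is_wff(e[1])                             # clause (2)
        if isinstance(e, tuple) and len(e) == 3 and e[0] in BINARY:
            return is_wff(e[1]) and is_wff(e[2])            # clauses (3)-(6)
        return False                                        # clause (7)

    def render(e):
        # Write the wff out with the parentheses the definition requires.
        if is_letter(e):
            return e
        if e[0] == '¬':
            return '¬' + render(e[1])
        op, a, b = e
        return '(' + render(a) + ' ' + op + ' ' + render(b) + ')'

    example = ('↔', 'M', ('&', 'C', 'P'))
    print(is_wff(example))     # True
    print(render(example))     # (M ↔ (C & P))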

b. Truth Functions and Truth Tables

So far we have in effect described the grammar of language PL. When setting up a language fully, however, it is necessary not only to establish rules of grammar, but also describe the meanings of the symbols used in the language. We have already suggested that uppercase letters are used as complete simple statements. Because truth-functional propositional logic does not analyze the parts of simple statements, and only considers those ways of combining them to form more complicated statements that make the truth or falsity of the whole dependent entirely on the truth or falsity of the parts, in effect, it does not matter what meaning we assign to the individual statement letters like 'P', 'Q' and 'R', etc., provided that each is taken as either true or false (and not both).

However, more must be said about the meaning or semantics of the logical operators '&', 'v', '→', '↔', and '¬'. As mentioned above, these are used in place of the English words, 'and', 'or', 'if... then...', 'if and only if', and 'not', respectively. However, the correspondence is really only rough, because the operators of PL are considered to be entirely truth-functional, whereas their English counterparts are not always used truth-functionally. Consider, for example, the following statements:

  1. If Bob Dole is president of the United States in 2004, then the president of the United States in 2004 is a member of the Republican party.
  2. If Al Gore is president of the United States in 2004, then the president of the United States in 2004 is a member of the Republican party.

For those familiar with American politics, it is tempting to regard the English sentence (1) as true, but to regard (2) as false, since Dole is a Republican but Gore is not. But notice that in both cases, the simple statement in the "if" part of the "if... then..." statement is false, and the simple statement in the "then" part of the statement is true. This shows that the English operator "if... then..." is not fully truth-functional. However, all the operators of language PL are entirely truth-functional, so the sign '→', though similar in many ways to the English "if... then..." is not in all ways the same. More is said about this operator below.

Since our study is limited to the ways in which the truth-values of complex statements depend on the truth-values of the parts, for each operator, the only aspect of its meaning relevant in this context is its associated truth-function. The truth-function for an operator can be represented as a table, each line of which expresses a possible combination of truth-values for the simpler statements to which the operator applies, along with the resulting truth-value for the complex statement formed using the operator.

The signs '&', 'v', '→', '↔', and '¬', correspond, respectively, to the truth-functions of conjunction, disjunction, material implication, material equivalence, and negation. We shall consider these individually.

Conjunction: The conjunction of two statements α and β, written in PL as '(α & β)', is true if both α and β are true, and is false if either α is false or β is false or both are false. In effect, the meaning of the operator '&' can be displayed according to the following chart, which shows the truth-value of the conjunction depending on the four possibilities of the truth-values of the parts:

α    β    (α & β)
T    T       T
T    F       F
F    T       F
F    F       F

Conjunction using the operator '&' is language PL's rough equivalent of joining statements together with 'and' in English. In a statement of the form '(α & β)', the two statements joined together, α and β, are called the conjuncts, and the whole statement is called a conjunction.

Instead of the sign '&', some other logical works use the signs '∧' or '•' for conjunction.

Disjunction: The disjunction of two statements α and β, written in PL as '(α v β)', is true if either α is true or β is true, or both α and β are true, and is false only if both α and β are false. A chart similar to that given above for conjunction, modified to show the meaning of the disjunction sign 'v' instead, would be drawn as follows:

α    β    (α v β)
T    T       T
T    F       T
F    T       T
F    F       F

This is language PL's rough equivalent of joining statements together with the word 'or' in English. However, it should be noted that the sign 'v' is used for disjunction in the inclusive sense. Sometimes when the word 'or' is used to join together two English statements, we only regard the whole as true if one side or the other is true, but not both, as when the statement "Either we can buy the toy robot, or we can buy the toy truck; you must choose!" is spoken by a parent to a child who wants both toys. This is called the exclusive sense of 'or'. However, in PL, the sign 'v' is used inclusively, and is more analogous to the English word 'or' as it appears in a statement such as (e.g., said about someone who has just received a perfect score on the SAT), "either she studied hard, or she is extremely bright", which does not mean to rule out the possibility that she both studied hard and is bright. In a statement of the form 'v β)', the two statements joined together, α and β, are called the disjuncts, and the whole statement is called a disjunction.

Material Implication: This truth-function is represented in language PL with the sign '→'. A statement of the form '(α → β)', is false if α is true and β is false, and is true if either α is false or β is true (or both). This truth-function generates the following chart:

α    β    (α → β)
T    T       T
T    F       F
F    T       T
F    F       T

Because the truth of a statement of the form '(α → β)' rules out the possibility of α being true and β being false, there is some similarity between the operator '→' and the English phrase, "if... then...", which is also used to rule out the possibility of one statement being true and another false; however, '→' is used entirely truth-functionally, and so, for reasons discussed earlier, it is not entirely analogous with "if... then..." in English. If α is false, then '(α → β)' is regarded as true, whether or not there is any connection between the falsity of α and the truth-value of β. In a statement of the form, '(α → β)', we call α the antecedent, and we call β the consequent, and the whole statement '(α → β)' is sometimes also called a (material) conditional.

The sign '⊃' is sometimes used instead of '→' for material implication.

Material Equivalence: This truth-function is represented in language PL with the sign '↔'. A statement of the form '(α ↔ β)' is regarded as true if α and β are either both true or both false, and is regarded as false if they have different truth-values. Hence, we have the following chart:

α    β    (α ↔ β)
T    T       T
T    F       F
F    T       F
F    F       T

Since the truth of a statement of the form '(α ↔ β)' requires α and β to have the same truth-value, this operator is often likened to the English phrase "...if and only if...". Again, however, they are not in all ways alike, because '↔' is used entirely truth-functionally. Regardless of what α and β are, and what relation (if any) they have to one another, if both are false, '(α ↔ β)' is considered to be true. However, we would not normally regard the statement "Al Gore is the President of the United States in 2004 if and only if Bob Dole is the President of the United States in 2004" as true simply because both simpler statements happen to be false. A statement of the form '(α ↔ β)' is also sometimes referred to as a (material) biconditional.

The sign '≡' is sometimes used instead of '↔' for material equivalence.

Negation: The negation of statement α, simply written '¬α' in language PL, is regarded as true if α is false, and false if α is true. Unlike the other operators we have considered, negation is applied to a single statement. The corresponding chart can therefore be drawn more simply as follows:

α    ¬α
T     F
F     T

The negation sign '¬' bears obvious similarities to the word 'not' used in English, as well as similar phrases used to change a statement from affirmative to negative or vice-versa. In logical languages, the signs '~' or '–' are sometimes used in place of '¬'.
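
Since each of the five charts just records a truth-function, the charts can be reproduced mechanically. A small illustrative Python sketch (the dictionary of operators and the helper 'tv' are assumptions of the sketch, not standard notation):

    # Reproduce the charts for '&', 'v', '→' and '↔' by brute force.
    from itertools import product

    def tv(x):                 # render a truth-value in the T/F style used above
        return 'T' if x else 'F'

    charts = {
        '(α & β)': lambda a, b: a and b,
        '(α v β)': lambda a, b: a or b,
        '(α → β)': lambda a, b: (not a) or b,
        '(α ↔ β)': lambda a, b: a == b,
    }
    for name, fn in charts.items():
        print(name)
        for a, b in product([True, False], repeat=2):
            print(tv(a), tv(b), ' ', tv(fn(a, b)))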

The five charts together provide the rules needed to determine the truth-value of a given wff in language PL when given the truth-values of the independent statement letters making it up. These rules are very easy to apply in the case of a very simple wff such as "(P & Q)". Suppose that 'P' is true, and 'Q' is false; according to the second row of the chart given for the operator, '&', we can see that this statement is false.

However, the charts also provide the rules necessary for determining the truth-value of more complicated statements. We have just seen that "(P & Q)" is false if 'P' is true and 'Q' is false. Consider a more complicated statement that contains this statement as a part, e.g., "((P & Q) → ¬R)", and suppose once again that 'P' is true, and 'Q' is false, and further suppose that 'R' is also false. To determine the truth-value of this complicated statement, we begin by determining the truth-value of the internal parts. The statement "(P & Q)", as we have seen, is false. The other substatement, "¬R", is true, because 'R' is false, and '¬' reverses the truth-value of that to which it is applied. Now we can determine the truth-value of the whole wff, "((P & Q) → ¬R)", by consulting the chart given above for '→'. Here, the wff "(P & Q)" is our α, and "¬R" is our β, and since their truth-values are F and T, respectively, we consult the third row of the chart, and we see that the complex statement "((P & Q) → ¬R)" is true.
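
This inside-out procedure is itself mechanical, and can be rendered as a short recursive evaluator. The sketch below reuses the illustrative nested-tuple representation introduced earlier:

    # Evaluate a wff under a truth-value assignment, working from the
    # inside out; here "((P & Q) → ¬R)" with 'P' true, 'Q' and 'R' false.
    def value(e, assignment):
        if isinstance(e, str):                   # a statement letter
            return assignment[e]
        op = e[0]
        if op == '¬':
            return not value(e[1], assignment)   # negation reverses the value
        a, b = value(e[1], assignment), value(e[2], assignment)
        if op == '&': return a and b
        if op == 'v': return a or b
        if op == '→': return (not a) or b        # false only when a is T and b is F
        if op == '↔': return a == b

    wff = ('→', ('&', 'P', 'Q'), ('¬', 'R'))
    print(value(wff, {'P': True, 'Q': False, 'R': False}))   # True, as above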

We have so far been considering the case in which 'P' is true and 'Q' and 'R' are both false. There are, however, a number of other possibilities with regard to the possible truth-values of the statement letters, 'P', 'Q' and 'R'. There are eight possibilities altogether, as shown by the following list:

P    Q    R
T    T    T
T    T    F
T    F    T
T    F    F
F    T    T
F    T    F
F    F    T
F    F    F

Strictly speaking, each of the eight possibilities above represents a different truth-value assignment, which can be defined as a possible assignment of truth-values T or F to the different statement letters making up a wff or series of wffs. If a wff has n distinct statement letters making it up, the number of possible truth-value assignments is 2ⁿ. With the wff, "((P & Q) → ¬R)", there are three statement letters, 'P', 'Q' and 'R', and so there are 2³ = 8 truth-value assignments.
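
Enumerating the 2ⁿ truth-value assignments is routine; for instance, a brief Python sketch:

    # All 2**n truth-value assignments for the statement letters of a wff.
    from itertools import product

    letters = ['P', 'Q', 'R']
    for values in product([True, False], repeat=len(letters)):
        print(dict(zip(letters, values)))
    # prints 2**3 = 8 assignments, matching the eight rows listed above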

It then becomes possible to draw a chart showing how the truth-value of a given wff would be resolved for each possible truth-value assignment. We begin with a chart showing all the possible truth-value assignments for the wff, such as the one given above. Next, we write out the wff itself on the top right of our chart. Then, for each truth-value assignment, we repeat the appropriate truth-value, 'T', or 'F', underneath the statement letters as they appear in the wff. Then, as the truth-values of those wffs that are parts of the complete wff are determined, we write their truth-values underneath the logical sign that is used to form them. The final column filled in shows the truth-value of the entire statement for each truth-value assignment; given its importance, it is the column under the wff's main operator (here, '→') that should be consulted when reading the chart.

P    Q    R  |  (P & Q)    ¬R    ((P & Q) → ¬R)
T    T    T  |     T        F          F
T    T    F  |     T        T          T
T    F    T  |     F        F          T
T    F    F  |     F        T          T
F    T    T  |     F        F          T
F    T    F  |     F        T          T
F    F    T  |     F        F          T
F    F    F  |     F        T          T

Charts such as the one given above are called truth tables. In classical truth-functional propositional logic, a truth table constructed for a given wff in effect reveals everything logically important about that wff. The above chart tells us that the wff "((P & Q) → ¬R)" can only be false if 'P', 'Q' and 'R' are all true, and is true otherwise.

c. Definability of the Operators and the Languages PL' and PL''

The language PL, as we have seen, contains operators that are roughly analogous to the English operators 'and', 'or', 'if... then...', 'if and only if', and 'not'. Each of these, as we have also seen, can be thought of as representing a certain truth-function. It might be objected, however, that there are other methods of combining statements together in which the truth-value of the statement depends wholly on the truth-values of the parts, or in other words, that there are truth-functions besides conjunction, (inclusive) disjunction, material implication, material equivalence and negation. For example, we noted earlier that the sign 'v' is used analogously to 'or' in the inclusive sense, which means that language PL has no simple sign for 'or' in the exclusive sense. It might be thought, however, that the language PL is incomplete without the addition of an additional symbol, say '⊻', such that '(α ⊻ β)' would be regarded as true if α is true and β is false, or α is false and β is true, but would be regarded as false if either both α and β are true or both α and β are false.

However, a possible response to this objection would be to note that while language PL does not include a simple sign for this exclusive sense of disjunction, it is possible, using the symbols that are included in PL, to construct a statement that is true in exactly the same circumstances. Consider, e.g., a statement of the form, '¬(α ↔ β)'. It is easily shown, using a truth table, that any wff of this form would have the same truth-value as a would-be statement using the operator '⊻'. See the following chart:

α    β  |  (α ↔ β)    ¬(α ↔ β)
T    T  |     T           F
T    F  |     F           T
F    T  |     F           T
F    F  |     T           F

Here we see that a wff of the form '¬(α ↔ β)' is true if either α or β is true but not both. This shows that PL is not lacking in any way by not containing a sign '⊻'. All the work that one would wish to do with this sign can be done using the signs '↔' and '¬'. Indeed, one might claim that the sign '⊻' can be defined in terms of the signs '↔' and '¬', and then use the form '(α ⊻ β)' as an abbreviation of a wff of the form '¬(α ↔ β)', without actually expanding the primitive vocabulary of language PL.
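
The same point can be checked by brute force rather than by inspecting the chart. The following illustrative snippet verifies, for all four truth-value assignments, that '¬(α ↔ β)' computes exclusive disjunction:

    # Check that ¬(α ↔ β) agrees with exclusive disjunction on every row.
    from itertools import product

    for a, b in product([True, False], repeat=2):
        exclusive = (a and not b) or (b and not a)   # 'or' in the exclusive sense
        defined = not (a == b)                       # ¬(α ↔ β)
        assert exclusive == defined
    print("'¬(α ↔ β)' matches exclusive disjunction on all four rows.")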

The signs '&', 'v', '→', '↔' and '¬' were chosen as the operators to include in PL because they correspond (roughly) to the sorts of truth-functional operators that are most often used in ordinary discourse and reasoning. However, given the preceding discussion, it is natural to ask whether or not some operators on this list can be defined in terms of the others. It turns out that they can. In fact, if for some reason we wished our logical language to have a more limited vocabulary, it is possible to get by using only the signs '¬' and '→', and define all other possible truth-functions in virtue of them. Consider, e.g., the following truth table for statements of the form '¬(α → ¬β)':

α    β  |  ¬β    (α → ¬β)    ¬(α → ¬β)
T    T  |  F        F            T
T    F  |  T        T            F
F    T  |  F        T            F
F    F  |  T        T            F

We can see from the above that a wff of the form '¬(α → ¬β)' always has the same truth-value as the corresponding statement of the form '(α & β)'. This shows that the sign '&' can in effect be defined using the signs '¬' and '→'.

Next, consider the truth table for statements of the form '(¬α → β)':

α    β  |  ¬α    (¬α → β)
T    T  |  F        T
T    F  |  F        T
F    T  |  T        T
F    F  |  T        F

Here we can see that a statement of the form '(¬α → β)' always has the same truth-value as the corresponding statement of the form '(α v β)'. Again, this shows that the sign 'v' could in effect be defined using the signs '→' and '¬'.

Lastly, consider the truth table for a statement of the form '¬((α → β) → ¬(β → α))':

α    β  |  (α → β)    (β → α)    ¬(β → α)    ((α → β) → ¬(β → α))    ¬((α → β) → ¬(β → α))
T    T  |     T          T           F                 F                        T
T    F  |     F          T           F                 T                        F
F    T  |     T          F           T                 T                        F
F    F  |     T          T           F                 F                        T

From the above, we see that a statement of the form '¬((α → β) → ¬(β → α))' always has the same truth-value as the corresponding statement of the form '(α ↔ β)'. In effect, therefore, we have shown that the remaining operators of PL can all be defined in virtue of '→', and '¬', and that, if we wished, we could do away with the operators, '&', 'v' and '↔', and simply make do with those equivalent expressions built up entirely from '→' and '¬'.

Let us call the language that results from this simplification PL'. While the definition of a statement letter remains the same for PL' as for PL, the definition of a well-formed formula (wff) for PL' can be greatly simplified. In effect, it can be stated as follows:

Definition: A well-formed formula (or wff) of PL' is defined recursively as follows:

  1. Any statement letter is a well-formed formula.
  2. If α is a well-formed formula, then so is '¬α'.
  3. If α and β are well-formed formulas, then so is '(α → β)'.
  4. Nothing that cannot be constructed by successive steps of (1)-(3) is a well-formed formula.

Strictly speaking, then, the language PL' does not contain any statements using the operators 'v', '&', or '↔'. One could, however, utilize conventions such that, in language PL', an expression of the form '(α & β)' is to be regarded as a mere abbreviation or shorthand for the corresponding statement of the form '¬(α → ¬β)', and similarly that expressions of the forms '(α v β)' and '(α ↔ β)' are to be regarded as abbreviations of expressions of the forms '(¬α → β)' or '¬((α → β) → ¬(β → α))', respectively. In effect, this means that it is possible to translate any wff of language PL into an equivalent wff of language PL'.

In Section VII, it is proven that not only are the operators '¬' and '→' sufficient for defining every truth-functional operator included in language PL, but also that they are sufficient for defining any imaginable truth-functional operator in classical propositional logic.

Nevertheless, the choice of '¬' and '→' for the primitive signs used in language PL' is to some extent arbitrary. It would also have been possible to define all other operators of PL (including '→') using the signs '¬' and 'v'. On this approach, '(α & β)' would be defined as '¬(¬α v ¬β)', '(α → β)' would be defined as '(¬α v β)', and '(α ↔ β)' would be defined as '¬(¬(¬α v β) v ¬(¬β v α))'. Similarly, we could instead have begun with '¬' and '&' as our starting operators. On this way of proceeding, '(α v β)' would be defined as '¬(¬α & ¬β)', '(α → β)' would be defined as '¬(α & ¬β)', and '(α ↔ β)' would be defined as '(¬(α & ¬β) & ¬(β & ¬α))'.
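
Each of these definitions can be verified mechanically in the same brute-force fashion. The sketch below (with 'impl' as an illustrative helper name) checks the '¬'/'→' definitions used for PL' on every assignment:

    # Verify the PL' definitions of '&', 'v' and '↔' in terms of '¬' and '→'.
    from itertools import product

    impl = lambda a, b: (not a) or b                 # the truth-function of '→'
    for a, b in product([True, False], repeat=2):
        assert (a and b) == (not impl(a, not b))     # (α & β) as ¬(α → ¬β)
        assert (a or b) == impl(not a, b)            # (α v β) as (¬α → β)
        assert (a == b) == (not impl(impl(a, b), not impl(b, a)))   # (α ↔ β)
    print("'¬' and '→' suffice to define '&', 'v' and '↔'.")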

There are, as we have seen, multiple different ways of reducing all truth-functional operators down to two primitives. There are also two ways of reducing all truth-functional operators down to a single primitive operator, but they require using an operator that is not included in language PL as primitive. On one approach, we utilize an operator written '|', and explain the truth-function corresponding to this sign by means of the following chart:

α    β    (α | β)
T    T       F
T    F       T
F    T       T
F    F       T

Here we can see that a statement of the form '(α | β)' is false if both α and β are true, and true otherwise. For this reason one might read '|' as akin to the English expression, "Not both ... and ...". Indeed, it is possible to represent this truth-function in language PL using an expression of the form, '¬(α & β)'. However, since it is our intention to show that all other truth-functional operators, including '¬' and '&' can be derived from '|', it is better not to regard the meanings of '¬' and '&' as playing a part of the meaning of '|', and instead attempt (however counterintuitive it may seem) to regard '|' as conceptually prior to '¬' and '&'.

The sign '|' is called the Sheffer stroke, and is named after H. M. Sheffer, who first publicized the result that all truth-functional connectives could be defined in virtue of a single operator in 1913.

We can then see that the connective '&' can be defined in virtue of '|', because an expression of the form '((α | β) | (α | β))' generates the following truth table, and hence is equivalent to the corresponding expression of the form '(α & β)':

α    β    (α | β)    ((α | β) | (α | β))
T    T       F               T
T    F       T               F
F    T       T               F
F    F       T               F

Similarly, we can define the operator 'v' using '|' by noting that an expression of the form '((α | α) | (β | β))' always has the same truth-value as the corresponding statement of the form '(α v β)':

α    β    (α | α)    (β | β)    ((α | α) | (β | β))
T    T       F          F               T
T    F       F          T               T
F    T       T          F               T
F    F       T          T               F

The following truth table shows that a statement of the form '(α | (β | β))' always has the same truth table as a statement of the form '(α → β)':

α    β    (β | β)    (α | (β | β))
T    T       F             T
T    F       T             F
F    T       F             T
F    F       T             T

Although far from intuitively obvious, the following table shows that an expression of the form '(((α | α) | (β | β)) | (α | β))' always has the same truth-value as the corresponding wff of the form '(α ↔ β)':

α    β    ((α | α) | (β | β))    (α | β)    (((α | α) | (β | β)) | (α | β))
T    T             T                F                     T
T    F             T                T                     F
F    T             T                T                     F
F    F             F                T                     T

This leaves only the sign '¬', which is perhaps the easiest to define using '|', as clearly '(α | α)', or, roughly, "not both α and α", has the opposite truth-value from α itself:

α    (α | α)
T       F
F       T
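
All four definitions in terms of the stroke can be confirmed at once by brute force; a sketch, with 'nand' as an illustrative name for the stroke's truth-function:

    # Verify that every operator of PL is definable from the stroke alone.
    from itertools import product

    nand = lambda a, b: not (a and b)                # the truth-function of '|'
    for a, b in product([True, False], repeat=2):
        assert (not a) == nand(a, a)                                   # ¬α
        assert (a and b) == nand(nand(a, b), nand(a, b))               # (α & β)
        assert (a or b) == nand(nand(a, a), nand(b, b))                # (α v β)
        assert ((not a) or b) == nand(a, nand(b, b))                   # (α → β)
        assert (a == b) == nand(nand(nand(a, a), nand(b, b)), nand(a, b))  # (α ↔ β)
    print("'|' alone suffices to define '¬', '&', 'v', '→' and '↔'.")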

If, therefore, we desire a language for use in studying propositional logic that has as small a vocabulary as possible, we might suggest using a language that employs the sign '|' as its sole primitive operator, and defines all other truth-functional operators in virtue of it. Let us call such a language PL''. PL'' differs from PL and PL' only in that its definition of a well-formed formula can be simplified even further:

Definition: A well-formed formula (or wff) of PL'' is defined recursively as follows:

  1. Any statement letter is a well-formed formula.
  2. If α and β are well-formed formulas, then so is '(α | β)'.
  3. Nothing that cannot be constructed by successive steps of (1)-(2) is a well-formed formula.

In language PL'', strictly speaking, '|' is the only operator. However, for reasons that should be clear from the above, any expression from language PL that involves any of the operators '¬', '&', 'v', '→', or '↔' could be translated into language PL'' without the loss of any of its important logical properties. In effect, statements using these signs could be regarded as abbreviations or shorthand expressions for wffs of PL'' that only use the operator '|'.

Even here, the choice of '|' as the sole primitive is to some extent arbitrary. It would also be possible to reduce all truth-functional operators down to a single primitive by making use of a sign '↓', treating it as roughly equivalent to the English expression, "neither ... nor ...", so that the corresponding chart would be drawn as follows:

α    β    (α ↓ β)
T    T       F
T    F       F
F    T       F
F    F       T

If we were to use '↓' as our sole operator, we could again define all the others. '¬α' would be defined as '(α ↓ α)'; '(α v β)' would be defined as '((α ↓ β) ↓ (α ↓ β))'; '(α & β)' would be defined as '((α ↓ α) ↓ (β ↓ β))'; and similarly for the other operators. The sign '↓' is sometimes also referred to as the Sheffer stroke, and is also called the Peirce/Sheffer dagger.
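
A parallel brute-force check works for '↓' (here modeled by an illustrative helper 'nor'):

    # Verify the dagger definitions given above.
    from itertools import product

    nor = lambda a, b: not (a or b)                  # the truth-function of '↓'
    for a, b in product([True, False], repeat=2):
        assert (not a) == nor(a, a)                          # ¬α
        assert (a or b) == nor(nor(a, b), nor(a, b))         # (α v β)
        assert (a and b) == nor(nor(a, a), nor(b, b))        # (α & β)
    print("'↓' also suffices as a sole primitive operator.")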

Depending on one's purposes in studying propositional logic, sometimes it makes sense to use a rich language like PL with more primitive operators, and sometimes it makes sense to use a relatively sparse language such as PL' or PL'' with fewer primitive operators. The advantage of the former approach is that it conforms better with our ordinary reasoning and thinking habits; the advantage of the latter is that it simplifies the logical language, which makes certain interesting results regarding the deductive systems making use of the language easier to prove.

For the remainder of this article, we shall primarily be concerned with the logical properties of statements formed in the richer language PL. However, we shall consider a system making use of language PL' in some detail in Section VI, and shall also make brief mention of a system making use of language PL''.

4. Tautologies, Logical Equivalence and Validity

Truth-functional propositional logic concerns itself only with those ways of combining statements to form more complicated statements in which the truth-values of the complicated statements depend entirely on the truth-values of the parts. Owing to this, all those features of a complex statement that are studied in propositional logic derive from the way in which their truth-values are determined by those of their parts. These features are therefore always represented in the truth table for a given statement.

Some complex statements have the interesting feature that they would be true regardless of the truth-values of the simple statements making them up. A simple example would be the wff "P v ¬P"; i.e., "P or not P". It is fairly easy to see that this statement is true regardless of whether 'P' is true or 'P' is false. This is also shown by its truth table:

P  |  ¬P    (P v ¬P)
T  |   F        T
F  |   T        T

There are, however, statements for which this is true but it is not so obvious. Consider the wff, "R → ((P → Q) v ¬(R → Q))". This wff also comes out as true regardless of the truth-values of 'P', 'Q' and 'R'.

P    Q    R  |  (P → Q)    (R → Q)    ¬(R → Q)    ((P → Q) v ¬(R → Q))    R → ((P → Q) v ¬(R → Q))
T    T    T  |     T          T           F                 T                        T
T    T    F  |     T          T           F                 T                        T
T    F    T  |     F          F           T                 T                        T
T    F    F  |     F          T           F                 F                        T
F    T    T  |     T          T           F                 T                        T
F    T    F  |     T          T           F                 T                        T
F    F    T  |     T          F           T                 T                        T
F    F    F  |     T          T           F                 T                        T

Statements that have this interesting feature are called tautologies. Let us define this notion precisely.

Definition: a wff is a tautology if and only if it is true for all possible truth-value assignments to the statement letters making it up.

Tautologies are also sometimes called logical truths or truths of logic because tautologies can be recognized as true solely in virtue of the principles of propositional logic, and without recourse to any additional information.

On the other side of the spectrum from tautologies are statements that come out as false regardless of the truth-values of the simple statements making them up. A simple example of such a statement would be the wff "P & ¬P"; clearly such a statement cannot be true, as it contradicts itself. This is revealed by its truth table:

P   |   ¬P   P & ¬P
T   |    F      F
F   |    T      F

To state this precisely:

Definition: a wff is a self-contradiction if and only if it is false for all possible truth-value assignments to the statement letters making it up.

Another, more interesting, example of a self-contradiction is the statement "¬(P → Q) & ¬(Q → P)"; this is not as obviously self-contradictory. However, we can see that it is when we consider its truth table:

P   Q   |   P → Q   ¬(P → Q)   Q → P   ¬(Q → P)   ¬(P → Q) & ¬(Q → P)
T   T   |     T         F        T         F                F
T   F   |     F         T        T         F                F
F   T   |     T         F        F         T                F
F   F   |     T         F        T         F                F

A statement that is neither self-contradictory nor tautological is called a contingent statement. A contingent statement is true for some truth-value assignments to its statement letters and false for others. The truth table for a contingent statement reveals which truth-value assignments make it come out as true, and which make it come out as false. Consider the truth table for the statement "(P → Q) & (P → ¬Q)":

P   Q   |   P → Q   ¬Q   P → ¬Q   (P → Q) & (P → ¬Q)
T   T   |     T      F      F              F
T   F   |     F      T      T              F
F   T   |     T      F      T              T
F   F   |     T      T      T              T

We can see that of the four possible truth-value assignments for this statement, two make it come out as true, and two make it come out as false. Specifically, the statement is true when 'P' is false and 'Q' is true, and when 'P' is false and 'Q' is false, and the statement is false when 'P' is true and 'Q' is true, and when 'P' is true and 'Q' is false.
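
This threefold classification is mechanically decidable. The following minimal sketch in Python represents each wff by the truth-function it expresses; the helper names 'imp' and 'classify' are our own illustrative choices:

    from itertools import product

    def imp(a, b):
        # The material conditional 'α → β'.
        return (not a) or b

    def classify(wff, n):
        # Evaluate the wff under all 2^n truth-value assignments to its n letters.
        column = [wff(*row) for row in product([True, False], repeat=n)]
        if all(column):
            return "tautology"
        if not any(column):
            return "self-contradiction"
        return "contingent"

    print(classify(lambda p: p or not p, 1))                                # tautology
    print(classify(lambda p, q, r: imp(r, imp(p, q) or not imp(r, q)), 3))  # tautology
    print(classify(lambda p: p and not p, 1))                               # self-contradiction
    print(classify(lambda p, q: imp(p, q) and imp(p, not q), 2))            # contingent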

Truth tables are also useful in studying logical relationships that hold between two or more statements. For example, two statements are said to be consistent when it is possible for both to be true, and are said to be inconsistent when it is not possible for both to be true. In propositional logic, we can make this more precise as follows.

Definition: two wffs are consistent if and only if there is at least one possible truth-value assignment to the statement letters making them up that makes both wffs true.

Definition: two wffs are inconsistent if and only if there is no truth-value assignment to the statement letters making them up that makes them both true.

Whether or not two statements are consistent can be determined by means of a combined truth table for the two statements. For example, the two statements, "P v Q" and "¬(P ↔ ¬Q)" are consistent:

P   Q   |   P v Q   |   ¬Q   P ↔ ¬Q   ¬(P ↔ ¬Q)
T   T   |     T     |    F      F          T
T   F   |     T     |    T      T          F
F   T   |     T     |    F      T          F
F   F   |     F     |    T      F          T

Here, we see that there is one truth-value assignment, that in which both 'P' and 'Q' are true, that makes both "P v Q" and "¬(P ↔ ¬Q)" true. However, the statements "(P → Q) & P" and "¬(Q v ¬P)" are inconsistent, because there is no truth-value assignment in which both come out as true.

P   Q   |   P → Q   (P → Q) & P   |   ¬P   Q v ¬P   ¬(Q v ¬P)
T   T   |     T          T        |    F      T          F
T   F   |     F          F        |    F      F          T
F   T   |     T          F        |    T      T          F
F   F   |     T          F        |    T      T          F
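
Consistency likewise reduces to a mechanical search of the combined truth table; a minimal sketch in Python (the helper names are again our own):

    from itertools import product

    def imp(a, b): return (not a) or b     # 'α → β'
    def iff(a, b): return a == b           # 'α ↔ β'

    def consistent(wff1, wff2, n):
        # Consistent iff some assignment makes both wffs true at once.
        return any(wff1(*row) and wff2(*row)
                   for row in product([True, False], repeat=n))

    # "P v Q" and "¬(P ↔ ¬Q)" are consistent:
    print(consistent(lambda p, q: p or q,
                     lambda p, q: not iff(p, not q), 2))    # True
    # "(P → Q) & P" and "¬(Q v ¬P)" are inconsistent:
    print(consistent(lambda p, q: imp(p, q) and p,
                     lambda p, q: not (q or not p), 2))     # False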

Another relationship that can hold between two statements is that of having the same truth-value regardless of the truth-values of the simple statements making them up. Consider a combined truth table for the wffs "¬P → ¬Q" and "¬(Q & ¬P)":

P   Q   |   ¬P   ¬Q   ¬P → ¬Q   |   Q & ¬P   ¬(Q & ¬P)
T   T   |    F    F       T     |      F          T
T   F   |    F    T       T     |      F          T
F   T   |    T    F       F     |      T          F
F   F   |    T    T       T     |      F          T

Here we see that these two statements necessarily have the same truth-value.

Definition: two statements are said to be logically equivalent if and only if every possible truth-value assignment to the statement letters making them up results in the same truth-value for the two statements as wholes.

The above statements are logically equivalent. However, the truth table given above for the statements "P v Q" and "¬(P ↔ ¬Q)" shows that they, on the other hand, are not logically equivalent, because they differ in truth-value for three of the four possible truth-value assignments.
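
Logical equivalence can be tested the same way, by comparing final columns; a minimal sketch in Python:

    from itertools import product

    def imp(a, b): return (not a) or b     # 'α → β'
    def iff(a, b): return a == b           # 'α ↔ β'

    def equivalent(wff1, wff2, n):
        # Equivalent iff every assignment gives the two wffs the same truth-value.
        return all(wff1(*row) == wff2(*row)
                   for row in product([True, False], repeat=n))

    # "¬P → ¬Q" and "¬(Q & ¬P)" are logically equivalent:
    print(equivalent(lambda p, q: imp(not p, not q),
                     lambda p, q: not (q and not p), 2))    # True
    # "P v Q" and "¬(P ↔ ¬Q)" are not:
    print(equivalent(lambda p, q: p or q,
                     lambda p, q: not iff(p, not q), 2))    # False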

Finally, and perhaps most importantly, truth tables can be utilized to determine whether or not an argument is logically valid. In general, an argument is said to be logically valid whenever it has a form that makes it impossible for the conclusion to be false if the premises are true. (See the encyclopedia entry on "Validity and Soundness".) In classical propositional logic, we can give this a more precise characterization.

Definition: a wff β is said to be a logical consequence of a set of wffs α1, α2, ..., αn, if and only if there is no truth-value assignment to the statement letters making up these wffs that makes all of α1, α2, ..., αn true but does not make β true.

An argument is logically valid if and only if its conclusion is a logical consequence of its premises. If an argument whose conclusion is β and whose only premise is α is logically valid, then α is said to logically imply β.

For example, consider the following argument:

P → Q
¬Q → P
Q

We can test the validity of this argument by constructing a combined truth table for all three statements.

P   Q   |   P → Q   |   ¬Q   ¬Q → P   |   Q
T   T   |     T     |    F      T     |   T
T   F   |     F     |    T      T     |   F
F   T   |     T     |    F      T     |   T
F   F   |     T     |    T      F     |   F

Here we see that both premises come out as true in the case in which both 'P' and 'Q' are true, and in which 'P' is false but 'Q' is true. However, in those cases, the conclusion is also true. It is possible for the conclusion to be false, but only if one of the premises is false as well. Hence, we can see that the inference represented by this argument is truth-preserving. Contrast this with the following example:

P → Q
¬Q v ¬P

Consider the truth-value assignment making both 'P' and 'Q' true. If we were to fill in that row of the truth-value for these statements, we would see that "P → Q" comes out as true, but "¬Q v ¬P" comes out as false. Even if 'P' and 'Q' are not actually both true, it is possible for them to both be true, and so this form of reasoning is not truth-preserving. In other words, the argument is not logically valid, and its premise does not logically imply its conclusion.

One of the most striking features of truth tables is that they provide an effective procedure for determining the logical truth, or tautologyhood, of any single wff, and for determining the logical validity of any argument written in the language PL. The procedure for constructing such tables is purely rote, and while the size of the tables grows exponentially with the number of statement letters involved in the wff(s) under consideration, the number of rows is always finite and so it is in principle possible to finish the table and determine a definite answer. In sum, classical propositional logic is decidable.
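
This effective procedure is straightforward to implement. A minimal sketch in Python, checking logical consequence by brute force over all truth-value assignments (the function name 'valid' is our own):

    from itertools import product

    def imp(a, b): return (not a) or b     # 'α → β'

    def valid(premises, conclusion, n):
        # Valid iff no assignment makes every premise true but the conclusion false.
        return all(conclusion(*row)
                   for row in product([True, False], repeat=n)
                   if all(p(*row) for p in premises))

    # "P → Q" and "¬Q → P" together logically imply "Q":
    print(valid([lambda p, q: imp(p, q), lambda p, q: imp(not q, p)],
                lambda p, q: q, 2))                         # True
    # ... but "P → Q" alone does not logically imply "¬Q v ¬P":
    print(valid([lambda p, q: imp(p, q)],
                lambda p, q: (not q) or (not p), 2))        # False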

5. Deduction: Rules of Inference and Replacement

a. Natural Deduction

Truth tables, as we have seen, can theoretically be used to solve any question in classical truth-functional propositional logic. However, this method has its drawbacks. The size of the tables grows exponentially with the number of distinct statement letters making up the statements involved. Moreover, truth tables are alien to our normal reasoning patterns. Another method for establishing the validity of an argument exists that does not have these drawbacks: the method of natural deduction. In natural deduction an attempt is made to reduce the reasoning behind a valid argument to a series of steps each of which is intuitively justified by the premises of the argument or previous steps in the series.

Consider the following argument stated in natural language:

Either cat fur or dog fur was found at the scene of the crime. If dog fur was found at the scene of the crime, officer Thompson had an allergy attack. If cat fur was found at the scene of the crime, then Macavity is responsible for the crime. But officer Thompson didn't have an allergy attack, and so therefore Macavity must be responsible for the crime.

The validity of this argument can be made more obvious by representing the chain of reasoning leading from the premises to the conclusion:

  1. Either cat fur was found at the scene of the crime, or dog fur was found at the scene of the crime. (Premise)
  2. If dog fur was found at the scene of the crime, then officer Thompson had an allergy attack. (Premise)
  3. If cat fur was found at the scene of the crime, then Macavity is responsible for the crime. (Premise)
  4. Officer Thompson did not have an allergy attack. (Premise)
  5. Dog fur was not found at the scene of the crime. (Follows from 2 and 4.)
  6. Cat fur was found at the scene of the crime. (Follows from 1 and 5.)
  7. Macavity is responsible for the crime. (Conclusion. Follows from 3 and 6.)

Above, we do not jump directly from the premises to the conclusion, but show how intermediate inferences are used to ultimately justify the conclusion by a step-by-step chain. Each step in the chain represents a simple, obviously valid form of reasoning. In this example, the form of reasoning exemplified in line 5 is called modus tollens, which involves deducing the negation of the antecedent of a conditional from the conditional and the negation of its consequent. The form of reasoning exemplified in line 6 is called disjunctive syllogism, and involves deducing one disjunct of a disjunction on the basis of the disjunction and the negation of the other disjunct. Lastly, the form of reasoning found at line 7 is called modus ponens, which involves deducing the truth of the consequent of a conditional given the truth of both the conditional and its antecedent. "Modus ponens" is Latin for affirming mode, and "modus tollens" is Latin for denying mode.

A system of natural deduction consists in the specification of a list of intuitively valid rules of inference for the construction of derivations or step-by-step deductions. Many equivalent systems of deduction have been given for classical truth-functional propositional logic. In what follows, we sketch one system, which is derived from the popular textbook by Irving Copi (1953). The system makes use of the language PL.

b. Rules of Inference

Here we give a list of intuitively valid rules of inference. The rules are stated in schematic form. Any inference in which any wff of language PL is substituted uniformly for the schematic letters in the forms below constitutes an instance of the rule.

Modus ponens (MP):

α → β
α
β

(Modus ponens is sometimes also called "modus ponendo ponens", "detachment" or a form of "→-elimination".)

Modus tollens (MT):

α → β
¬β
¬α

(Modus tollens is sometimes also called "modus tollendo tollens" or a form of "→-elimination".)

Disjunctive syllogism (DS): (two forms)

α v β
¬α
β

α v β
¬β
α

(Disjunctive syllogism is sometimes also called "modus tollendo ponens" or "v-elimination".)

Addition (Add): (two forms)

α
α v β

β
α v β

(Addition is sometimes also called "disjunction introduction" or "v-introduction".)

Simplification (Simp): (two forms)

α & β
α

α & β
β

(Simplification is sometimes also called "conjunction elimination" or "&-elimination".)

Conjunction (Conj):

α
β
α & β

(Conjunction is sometimes also called "conjunction introduction", "&-introduction" or "logical multiplication".)

Hypothetical syllogism (HS):

α → β
β → γ
α → γ

(Hypothetical syllogism is sometimes also called "chain reasoning" or "chain deduction".)

Constructive dilemma (CD):

(α → γ) & (β → δ)
α v β
γ v δ

Absorption (Abs):

α → β
α → (α & β)

c. Rules of Replacement

The nine rules of inference listed above represent ways of inferring something new from previous steps in a deduction. Many systems of natural deduction, including those initially designed by Gentzen, consist entirely of rules similar to the above. If the language of a system involves signs introduced by definition, it must also allow the substitution of a defined sign for the expression used to define it, or vice versa. Still other systems, while not making use of defined signs, allow one to make certain substitutions of expressions of one form for expressions of another form in certain cases in which the expressions in question are logically equivalent. These are called rules of replacement, and Copi's natural deduction system invokes such rules. Strictly speaking, rules of replacement differ from inference rules, because, in a sense, when a rule of replacement is used, one is not inferring something new but merely stating what amounts to the same thing using a different combination of symbols. In some systems, rules for replacement can be derived from the inference rules, but in Copi's system, they are taken as primitive.

Rules of replacement also differ from inference rules in other ways. Inference rules only apply when the main operators match the patterns given and only apply to entire statements. Inference rules are also strictly unidirectional: one must infer what is below the horizontal line from what is above and not vice-versa. However, replacement rules can be applied to portions of statements and not only to entire statements; moreover, they can be implemented in either direction.

The rules of replacement used by Copi are the following:

Double negation (DN):

'¬¬α' is interreplaceable with α

(Double negation is also called "¬-elimination".)

Commutativity (Com): (two forms)

'α & β' is interreplaceable with 'β & α'
'α v β' is interreplaceable with 'β v α'

Associativity (Assoc): (two forms)

'(α & β) & γ' is interreplaceable with 'α & (β & γ)'
'(α v β) v γ' is interreplaceable with 'α v (β v γ)'

Tautology (Taut): (two forms)

α is interreplaceable with 'α & α'
α is interreplaceable with 'α v α'

DeMorgan's Laws (DM): (two forms)

'¬(α & β)' is interreplaceable with '¬α v ¬β'
'¬(α v β)' is interreplaceable with '¬α & ¬β'

Transposition (Trans):

'α → β' is interreplaceable with '¬β → ¬α'

(Transposition is also sometimes called "contraposition".)

Material Implication (Impl):

'α → β' is interreplaceable with '¬α v β'

Exportation (Exp):

'α → (β → γ)' is interreplaceable with '(α & β) → γ'

Distribution (Dist): (two forms)

'α & (β v γ)' is interreplaceable with '(α & β) v (α & γ)'
'α v (β & γ)' is interreplaceable with '(α v β) & (α v γ)'

Material Equivalence (Equiv): (two forms)

'α ↔ β' is interreplaceable with '(α → β) & (β → α)'
'α ↔ β' is interreplaceable with '(α & β) v (¬α & ¬β)'

(Material equivalence is sometimes also called "biconditional introduction/elimination" or "↔-introduction/elimination".)

d. Direct Deductions

A direct deduction of a conclusion from a set of premises consists of an ordered sequence of wffs such that each member of the sequence is either (1) a premise, (2) derived from previous members of the sequence by one of the inference rules, or (3) derived from a previous member of the sequence by the replacement of a logically equivalent part according to the rules of replacement, and such that the conclusion is the final step of the sequence.

To be even more precise, a direct deduction is defined as an ordered sequence of wffs, β1, β2, ..., βn, such that for each step βi where i is between 1 and n inclusive, either (1) βi is a premise, (2) βi matches the form given below the horizontal line for one of the 9 inference rules, and there are wffs in the sequence prior to βi matching the forms given above the horizontal line, or (3) there is a previous step in the sequence βj where j < i and βj differs from βi at most by matching or containing a part that matches one of the forms given for one of the 10 replacement rules in the same place in which βi contains the wff of the corresponding form, and such that the conclusion of the argument is βn.

Using line numbers and the abbreviations for the rules of the system to annotate, the chain of reasoning given above in English, when transcribed into language PL and organized as a direct deduction, would appear as follows:

1. C v D Premise
2. D → O Premise
3. C → M Premise
4. ¬O Premise
5. ¬D 2,4 MT
6. C 1,5 DS
7. M 3,6 MP

There is no unique derivation for a given conclusion from a given set of premises. Here is a distinct derivation for the same conclusion from the same premises:

1. C v D Premise
2. D → O Premise
3. C → M Premise
4. ¬O Premise
5. (C → M) & (D → O) 3,2 Conj
6. M v O 1,5 CD
7. M 6,4 DS

Consider next the argument:

P ↔ Q
(S v T) → Q
¬P v (¬T & R)
T → U

This argument has six distinct statement letters, and hence constructing a truth table for it would require 64 rows. The table would have 22 columns, thereby requiring 1,408 distinct T/F calculations. Happily, the derivation of the conclusion from the premises using our inference and replacement rules, while far from simple, is relatively less exhausting:

1. P ↔ Q Premise
2. (S v T) → Q Premise
3. ¬P v (¬T & R) Premise
4. (P → Q) & (Q → P) 1 Equiv
5. Q → P 4 Simp
6. (S v T) → P 2,5 HS
7. P → (¬T & R) 3 Impl
8. (S v T) → (¬T & R) 6,7 HS
9. ¬(S v T) v (¬T & R) 8 Impl
10. (¬S & ¬T) v (¬T & R) 9 DM
11. [(¬S & ¬T) v ¬T] & [(¬S & ¬T) v R] 10 Dist
12. (¬S & ¬T) v ¬T 11 Simp
13. ¬T v (¬S & ¬T) 12 Com
14. (¬T v ¬S) & (¬T v ¬T) 13 Dist
15. ¬T v ¬T 14 Simp
16. ¬T 15 Taut
17. ¬T v U 16 Add
18. T → U 17 Impl

e. Conditional and Indirect Proofs

Together the nine inference rules and ten rules of replacement are sufficient for creating a deduction for any logically valid argument, provided that the argument has at least one premise. However, to cover the limiting case of arguments with no premises, and simply to facilitate certain deductions that would be recondite otherwise, it is also customary to allow for certain methods of deduction other than direct derivation. Specifically, it is customary to allow the proof techniques known as conditional proof and indirect proof.

A conditional proof is a derivation technique used to establish a conditional wff, i.e., a wff whose main operator is the sign '→'. This is done by constructing a sub-derivation within a derivation in which the antecedent of the conditional is assumed as a hypothesis. If, by using the inference rules and rules of replacement (and possibly additional sub-derivations), it is possible to arrive at the consequent, it is permissible to end the sub-derivation and conclude the truth of the conditional statement within the main derivation, citing the sub-derivation as a conditional proof, or 'CP' for short. This is made much clearer by considering the following example argument:

P → (Q v R)
P → ¬S
S ↔ Q
P → R

While a direct derivation establishing the validity of this argument is possible, it is easier to establish the validity of this argument using a conditional derivation.

1. P → (Q v R) Premise
2. P → ¬S Premise
3. S ↔ Q Premise
4. P Assumption
5. Q v R 1,4 MP
6. ¬S 2,4 MP
7. (S → Q) & (Q → S) 3 Equiv
8. Q → S 7 Simp
9. ¬Q 6,8 MT
10. R 5,9 DS
11. P → R 4-10 CP

Here in order to establish the conditional statement "P → R", we constructed a sub-derivation, which is the indented portion found at lines 4-10. First, we assumed the truth of 'P', and found that with it, we could derive 'R'. Given the premises, we therefore had shown that if 'P' were also true, so would be 'R'. Therefore, on the basis of the sub-derivation we were justified in concluding "P → R". This is the usual methodology used in logic and mathematics for establishing the truth of a conditional statement.

Another common method is that of indirect proof, also known as proof by reductio ad absurdum. (For a fuller discussion, see the entry on reductio ad absurdum in the encyclopedia.) In an indirect proof ('IP' for short), our goal is to demonstrate that a certain wff is false on the basis of the premises. Again, we make use of a sub-derivation; here, we begin by assuming the opposite of that which we're trying to prove, i.e., we assume that the wff is true. If on the basis of this assumption, we can demonstrate an obvious contradiction, i.e., a statement of the form 'α & ¬α', we can conclude that the assumed statement must be false, because anything that leads to a contradiction must be false.

For example, consider the following argument:

P → Q
P → (Q → ¬P)
¬P

While, again, a direct derivation of the conclusion for this argument from the premises is possible, it is somewhat easier to prove that "¬P" is true by showing that, given the premises, it would be impossible for 'P' to be true by assuming that it is and showing this to be absurd.

1. P → Q Premise
2. P → (Q → ¬P) Premise
3. P Assumption
4. Q 1,3 MP
5. Q → ¬P 2,3 MP
6. ¬P 4,5 MP
7. P & ¬P 3,6 Conj
8. ¬P 3-7 IP

Here we were attempting to show that "¬P" was true given the premises. To do this we assumed instead that 'P' was true. Since this assumption was impossible, we were justified in concluding that 'P' is false, i.e., that "¬P" is true.

When making use of either conditional proof or indirect proof, once a sub-derivation is finished, the lines making it up cannot be used later on in the main derivation or any additional sub-derivations that may be constructed later on.

This completes our characterization of a system of natural deduction for the language PL.

The system of natural deduction just described is formally adequate in the following sense. Earlier, we defined a valid argument as one in which there is no possible truth-value assignment to the statement letters making up its premises and conclusion that makes the premises all true but the conclusion untrue. It is provable that an argument in the language of PL is formally valid in that sense if and only if it is possible to construct a derivation of the conclusion of that argument from the premises using the above rules of inference, rules of replacement and techniques of conditional and indirect proof. Space limitations preclude a full proof of this in the metalanguage, although the reasoning is very similar to that given for the axiomatic Propositional Calculus discussed in Sections VI and VII below.

Informally, it is fairly easy to see that no argument for which a deduction is possible in this system could be invalid according to truth tables. Firstly, the rules of inference are all truth-preserving. For example, in the case of modus ponens, it is fairly easy to see from the truth table for any set of statements of the appropriate form that no truth-value assignment could make both 'α → β' and α true while making β false. A similar consideration applies for the others. Moreover, truth tables can easily be used to verify that statements of one of the forms mentioned in the rules of replacement are all logically equivalent with those the rule allows one to swap for them. Hence, the statements could never differ in truth-value for any truth-value assignment. In the case of conditional proof, note that any truth-value assignment must make either the conditional true, or it must make the antecedent true and consequent false. The antecedent is what is assumed in a conditional proof. So if the truth-value assignment makes both it and the premises of the argument true, because the other rules are all truth-preserving, it would be impossible to derive the consequent unless it were also true. A similar consideration justifies the use of indirect proof.

This system represents a useful method for establishing the validity of an argument that has the advantage of coinciding more closely with the way we normally reason. (As noted earlier, however, there are many equivalent systems of natural deduction, all coinciding relatively closely to ordinary reasoning patterns.) One disadvantage this method has, however, is that, unlike truth tables, it does not provide a means for recognizing that an argument is invalid. If an argument is invalid, there is no deduction for it in the system. However, the system itself does not provide a means for recognizing when a deduction is impossible.

Another objection that might be made to the system of deduction sketched above is that it contains more rules and more techniques than it needs to. This leads us directly into our next topic.

6. Axiomatic Systems and the Propositional Calculus

The system of deduction discussed in the previous section is an example of a natural deduction system, i.e., a system of deduction for a formal language that attempts to coincide as closely as possible to the forms of reasoning most people actually employ. Natural systems of deduction are typically contrasted with axiomatic systems. Axiomatic systems are minimalist systems; rather than including rules corresponding to natural modes of reasoning, they utilize as few basic principles or rules as possible. Since so few kinds of steps are available in a deduction, relatively speaking, an axiomatic system usually requires more steps for the deduction of a conclusion from a given set of premises as compared to a natural deduction system.

Typically, an axiomatic system consists in the specification of certain wffs that are specified as "axioms". An axiom is something that is taken as a fundamental truth of the system that does not itself require proof. To allow for the deduction of results from the axioms or the premises of an argument, the system typically also includes at least one (and often only one) rule of inference. Usually, an attempt is made to limit the number of axioms to as few as possible, or at least, limit the number of forms axioms can take.

Because axiomatic systems aim to be minimal, typically they employ languages with simplified vocabularies whenever possible. For classical truth-functional propositional logic, this might involve using a simpler language such as PL' or PL'' instead of the full language PL.

For most of the remainder of this section, we shall sketch an axiomatic system for classical truth-functional propositional logic, which we shall dub the Propositional Calculus (or PC for short). The Propositional Calculus makes use of language PL', described above. That is, the only connectives it uses are '→' and '¬', and the other operators, if used at all, would be understood as shorthand abbreviations making use of the definitions discussed in Section III(c).

System PC consists of three axiom schemata, which are forms a wff fits if it is an axiom, along with a single inference rule: modus ponens. We make this more precise by specifying certain definitions.

Definition: a wff of language PL' is an axiom of PC if and only if it is an instance of one of the following three forms:

α → (β → α) (Axiom Schema 1, or AS1)
(α → (β → γ)) → ((α → β) → (α → γ)) (Axiom Schema 2, or AS2)
(¬α → ¬β) → ((¬α → β) → α) (Axiom Schema 3, or AS3)

Note that according to this definition, every wff of the form 'α → (β → α)' is an axiom. This includes an infinite number of different wffs, from simple cases such as "P → (Q → P)", to much more complicated cases such as "(¬R → ¬¬S) → [¬(¬M → N) → (¬R → ¬¬S)]".

An ordered step-by-step deduction constitutes a derivation in system PC if and only if each step in the deduction is either (1) a premise of the argument, (2) an axiom, or (3) derived from previous steps by modus ponens. Once again we can make this more precise with the following (more recondite) definition:

Definition: an ordered sequence of wffs β1, β2, ..., βn is a derivation in system PC of the wff βn from the premises α1, α2, ..., αm if and only if, for each wff βi in the sequence β1, β2, ..., βn, either (1) βi is one of the premises α1, α2, ..., αm, (2) βi is an axiom of PC, or (3) βi follows from previous members of the series by the inference rule modus ponens (that is, there are previous members of the sequence, βj and βk, such that βj takes the form 'βk → βi').

For example, consider the following argument written in the language PL':

P
(R → P) → (R → (P → S))
R → S

The following constitutes a derivation in system PC of the conclusion from the premises:

1. P Premise
2. (R → P) → (R → (P → S)) Premise
3. P → (R → P) Instance of AS1
4. R → P 1,3 MP
5. R → (P → S) 2,4 MP
6. (R → (P → S)) → ((R → P) → (R → S)) Instance of AS2
7. (R → P) → (R → S) 5,6 MP
8. R → S 4,7 MP
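
Because this definition is purely syntactic, checking a purported derivation is a rote matter. The following minimal sketch in Python verifies the modus ponens structure of the derivation just given; it takes formulas as fully parenthesized strings and simply trusts the 'Premise' and 'Axiom' annotations (schema matching is omitted for brevity), so it is an illustration rather than a full proof checker:

    def check_mp_structure(steps):
        # Each step is (formula, justification); a justification is "Premise",
        # "Axiom", or a pair (j, k) meaning: step j is step k → this step.
        for i, (wff, just) in enumerate(steps, start=1):
            if just in ("Premise", "Axiom"):
                continue
            j, k = just
            major, minor = steps[j - 1][0], steps[k - 1][0]
            assert major == "(" + minor + " → " + wff + ")", "bad MP at step %d" % i
        return steps[-1][0]

    derivation = [
        ("P",                                       "Premise"),
        ("((R → P) → (R → (P → S)))",               "Premise"),
        ("(P → (R → P))",                           "Axiom"),    # AS1
        ("(R → P)",                                 (3, 1)),     # 1,3 MP
        ("(R → (P → S))",                           (2, 4)),     # 2,4 MP
        ("((R → (P → S)) → ((R → P) → (R → S)))",   "Axiom"),    # AS2
        ("((R → P) → (R → S))",                     (6, 5)),     # 5,6 MP
        ("(R → S)",                                 (7, 4)),     # 4,7 MP
    ]
    print(check_mp_structure(derivation))   # prints the conclusion: (R → S)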

Historically, the original axiomatic systems for logic were designed to be akin to other axiomatic systems found in mathematics, such as Euclid's axiomatization of geometry. The goal of developing an axiomatic system for logic was to create a system in which to derive truths of logic making use only of the axioms of the system and the inference rule(s). Those wffs that can be derived from the axioms and inference rule alone, i.e., without making use of any additional premises, are called theorems or theses of the system. To make this more precise:

Definition: a wff α is said to be a theorem of PC if and only if there is an ordered sequence of wffs, specifically, a derivation, β1, β2, ..., βn such that, α is βn and each wff βi in the sequence β1, β2, ..., βn, is such that either (1) βi is an axiom of PC, or (2) βi follows from previous members of the series by modus ponens.

One very simple theorem of system PC is the wff "P → P". We can show that it is a theorem by constructing a derivation of "P → P" that makes use only of axioms and MP and no additional premises.

1. P → (P → P) Instance of AS1
2. P → ((P → P) → P) Instance of AS1
3. [P → ((P → P) → P)] → [(P → (P → P)) → (P → P)] Instance of AS2
4. (P → (P → P)) → (P → P) 2,3 MP
5. P → P 1,4 MP

It is fairly easy to see that not only is "P → P" a theorem of PC, but so is any wff of the form 'α → α'. Whatever α happens to be, there will be a derivation in PC of the same form:

1. α → (α → α) Instance of AS1
2. α → ((α → α) → α) Instance of AS1
3. [α → ((α → α) → α)] → [(α → (α → α)) → (α → α)] Instance of AS2
4. (α → (α → α)) → (α → α) 2,3 MP
5. α → α 1,4 MP

So even if we make α in the above the more complicated wff, e.g., "¬(¬M → N)", a derivation with the same form shows that "¬(¬M → N) → ¬(¬M → N)" is also a theorem of PC. Hence, we call 'α → α' a theorem schema of PC, because all of its instances are theorems of PC. From now on, let's call it "Theorem Schema 1", or "TS1" for short.

The following are also theorem schemata of PC:

α → ¬¬α (Theorem Schema 2, or TS2)
¬α → (α → β) (TS3)
α → (¬β → ¬(α → β)) (TS4)
(α → β) → ((¬α → β) → β) (TS5)

You may wish to verify this for yourself by attempting to construct the appropriate proofs for each. Be warned that some require quite lengthy derivations!

It is common to use the notation:

⊢ β

to mean that β is a theorem. Similarly, it is common to use the notation:

α1, α2, ..., αm ⊢ β

to mean that it is possible to construct a derivation of β making use of α1, α2, ..., αm as premises.

Considered in terms of number of rules it employs, the axiomatic system PC is far less complex than the system of natural deduction sketched in the previous section. The natural deduction system made use of nine inference rules, ten rules of replacement and two additional proof techniques. The axiomatic system instead, makes use of three axiom schemata and a single inference rule and no additional proof techniques. Yet, the axiomatic system is not lacking in any way.

Indeed, for any argument using language PL' that is logically valid according to truth tables it is possible to construct a derivation in system PC for that argument. Moreover, every wff of language PL' that is a logical truth, i.e., a tautology according to truth tables, is a theorem of PC. The reverse of these results is true as well; every theorem of PC is a tautology, and every argument for which a derivation in system PC exists is logically valid according to truth tables. These and other features of the Propositional Calculus are discussed, and some are even proven in the next section below.

While the Propositional Calculus is simpler in one way than the natural deduction system sketched in the previous section, in many ways it is actually more complicated to use. For any given argument, a deduction of the conclusion from the premises conducted in PC is likely to be far longer and less psychologically natural than one carried out in a natural deduction system. Such deductions are only simpler in the sense that fewer distinct rules are employed.

System PC is only one of many possible ways of axiomatizing propositional logic. Some systems differ from PC in only very minor ways. For example, we could alter our definition of "axiom" so that a wff is an axiom iff it is an instance of (AS1), an instance of (AS2), or an instance of the following:

(AS3') (¬α → ¬β) → (β → α)

Replacing axiom schema (AS3) with (AS3'), while altering the way certain deductions must be constructed (making the proofs of many important results longer), has little effect otherwise; the resulting system would have all the same theorems and every argument for which a deduction is possible in the system above would also have a deduction in the revised system, and vice versa.

We also noted above that, strictly speaking, there are an infinite number of axioms of system PC. Instead of utilizing an infinite number of axioms, we might alternatively have utilized only three axioms, namely, the specific wffs:

(A1*) P → (Q → P)
(A2*) (P → (Q → R)) → ((P → Q) → (P → R))
(A3*) (¬P → ¬Q) → ((¬P → Q) → P)

Note that (A1*) is just a single wff; on this approach, the wff "(¬R → ¬¬S) → [¬(¬M → N) → (¬R → ¬¬S)]" would not count as an axiom, even though it shares a common form with (A1*). To such a system it would be necessary to add an additional inference rule, a rule of substitution or uniform replacement. This would allow one to infer, from a theorem of the system, the result of uniformly replacing any given statement letter (e.g., 'P' or 'Q') that occurs within the theorem, with any wff, simple or complex, provided that the same wff replaces all occurrences of the same statement letter in the theorem. On this approach, "(¬R → ¬¬S) → [¬(¬M → N) → (¬R → ¬¬S)]", while not an axiom, would still be a theorem because it could be derived from (A1*) by the rule of uniform replacement applied twice, i.e., by first replacing 'P' in (A1*) with "(¬R → ¬¬S)", and then replacing 'Q' with "¬(¬M → N)". The resulting system differs in only subtle ways from our earlier system PC. System PC, strictly speaking, uses only one inference rule, but countenances an infinite number of axioms. This system uses only three axioms, but makes use of an additional rule. System PC, however, avoids this additional inference rule by allowing everything that one could get by substitution in (A1*) to be an axiom. For every theorem α, therefore, if β is a wff obtained from α by uniformly substituting wffs for statement letters in α, then β is also a theorem of PC, because there would always be a proof of β analogous to the proof of α only beginning from different axioms.

It is also possible to construct even more austere systems. Indeed, it is possible to utilize only a single axiom schema (or a single axiom plus a rule of replacement). One possibility, suggested by C. A. Meredith (1953), would be to define an axiom as any wff matching the following form:

((((α → β) → (¬γ → ¬δ)) → γ) → ε) → ((ε → α) → (δ → α))

The resulting system is equally powerful as system PC and has exactly the same set of theorems. However, it is far less psychologically intuitive and straightforward, and deductions even for relatively simple results are often very long.

Historically, the first single axiom schema system made use, instead of language PL', of the even simpler language PL'' in which the only connective is the Sheffer stroke, '|', as discussed above. In that case, it is possible to make use only of the following axiom schema:

(α | (β | γ)) | ((δ | (δ | δ)) | ((ε | β) | ((α | ε) | (α | ε))))

The inference rule of MP is replaced with the rule that from wffs of the form 'α | (β | γ)' and α, one can deduce the wff γ. This system was discovered by Jean Nicod (1917). More recently, a number of possible single axiom systems have been found, some faring better than others in terms of the complexity of the single axiom and in terms of how long deductions for the same results are required to be. (For recent research in this area, consult McCune et al. 2002.) Generally, however, the more the system allows, the shorter the deductions.

Besides axiomatic and natural deduction forms, deduction systems for propositional logic can also take the form of a sequent calculus; here, rather than specifying definitions of axioms and inference rules, the rules are stated directly in terms of derivability or entailment conditions; e.g., one rule might state that if α ⊢ β v γ, then if γ, α ⊢ β then α ⊢ β. Sequent calculi, like modern natural deduction systems, were first developed by Gerhard Gentzen. Gentzen's work also suggests the use of tree-like deduction systems rather than linear step-by-step deduction systems, and such tree systems have proven more useful in automated theorem-proving, i.e., in the creation of algorithms for the mechanical construction of deductions (e.g., by a computer). However, rather than exploring the details of these and other rival systems, in the next section, we focus on proving things about the system PC, the axiomatic system treated at length above.

7. Important Meta-Theoretic Results for the Propositional Calculus

Note: this section is relatively more technical, and is designed for audiences with some prior background in logic or mathematics. Beginners may wish to skip to the next section.

In this section, we sketch informally the proofs given for certain important features of the Propositional Calculus. Our first topic, however, concerns the language PL' generally.

Metatheoretic result 1: Language PL' is expressively adequate, i.e., within the context of classical bivalent logic, there are no truth-functions that cannot be represented in it.

We noted in Section III(c) that the connectives '&', '↔' and 'v' can be defined using the connectives of PL' ('→' and '¬'). More generally, metatheoretic result 1 holds that any statement built using truth-functional connectives, regardless of what those connectives are, has an equivalent statement formed using only '→' and '¬'. Here's the proof.

1. Assume that α is some wff built in some language containing any set of truth-functional connectives, including those not found in PL, PL' or PL''. For example, α might make use of some three- or four-place truth-functional connectives, or connectives such as the exclusive or, or the sign '↓', or any others you might imagine.

2. We need to show that there is a wff β formed only with the connectives '→' and '¬' that is logically equivalent with α. Because we have already shown that forms equivalent to those built from '&', '↔', and 'v' can be constructed from '→' and '¬', we are entitled to use them as well.

3. In order for it to be logically equivalent to α, the wff β that we construct must have the same final truth-value for every possible truth-value assignment to the statement letters making up α, or in other words, it must have the same final column in a truth table.

4. Let p1, p2, ..., pn be the distinct statement letters making up α. For some possible truth-value assignments to these letters, α may be true, and for others α may be false. The only hard case would be the one in which α is contingent. If α were not contingent, it must either be a tautology, or a self-contradiction. Since clearly tautologies and self-contradictions can be constructed in PL', and all tautologies are logically equivalent to one another, and all self-contradictions are equivalent to one another, in those cases, our job is easy. Let us suppose instead that α is contingent.

5. Let us construct a wff β in the following way.

(a) Consider in turn each truth-value assignment to the letters p1, p2, ..., pn. For each truth-value assignment, construct a conjunction made up of those letters the truth-value assignment makes true, along with the negations of those letters the truth-value assignment makes false. For instance, if the letters involved are 'A', 'B' and 'C', and the truth-value assignment makes 'A' and 'C' true but 'B' false, consider the conjunction '((A & ¬B) & C)'.

(b) From the conjunctions formed in step (a), form a complex disjunction of those for which the corresponding truth-value assignment makes α true. For example, if the truth-value assignment making 'A' and 'C' true but 'B' false makes α true, include its conjunction in the disjunction. Suppose, e.g., that this truth-value assignment does make α true, as does the assignment in which 'A' and 'B' and 'C' are all made false, but no other truth-value assignment makes α true. In that case, the resulting disjunction would be '((A & ¬B) & C) v ((¬A & ¬B) & ¬C)'.

6. The wff β constructed in step 5 is logically equivalent to α. Consider that for those truth-value assignments making α true, one of the conjunctions making up the disjunction β is true, and hence the whole disjunction is true as well. For those truth-value assignments making α false, none of the conjunctions making up β is true, because each conjunction will contain at least one conjunct that is false on that truth-value assignment.

7. Because β is constructed using only '&', 'v' and '¬', and these can in turn be defined using only '¬' and '→', and because β is equivalent to α, there is a wff built up only from '¬' and '→' that is equivalent to α, regardless of the connectives making up α.

8. Therefore, PL' is expressively adequate.
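
Step 5 of this proof is itself an algorithm and can be carried out mechanically. A minimal sketch in Python (the statement-letter names 'P1', 'P2', ... and the fallback for the non-contingent case are our own conventions):

    from itertools import product

    def dnf(wff, n):
        # Build the disjunction of conjunctions described in step 5.
        letters = ["P%d" % (i + 1) for i in range(n)]
        disjuncts = []
        for row in product([True, False], repeat=n):
            if wff(*row):   # keep only the assignments making the wff true
                conj = " & ".join(l if v else "¬" + l
                                  for l, v in zip(letters, row))
                disjuncts.append("(" + conj + ")")
        # If no assignment makes the wff true, any self-contradiction will do.
        return " v ".join(disjuncts) if disjuncts else "(P1 & ¬P1)"

    # The exclusive 'or', a connective not primitive in PL:
    print(dnf(lambda a, b: a != b, 2))   # (P1 & ¬P2) v (¬P1 & P2)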

Corollary 1.1: Language PL'' is also expressively adequate.

The corollary follows at once from metatheoretic result 1, along with the fact, noted in Section III(c), that '→' and '¬' can be defined using only '|'.

Metatheoretic result 2 (a.k.a. "The Deduction Theorem"): In the Propositional Calculus, PC, whenever it holds that α1, ..., αn ⊢ β, it also holds that α1, ..., αn-1 ⊢ αn → β.

What this means is that whenever we can prove a given result in PC using a certain number of premises, then it is possible, using all the same premises save one, αn, to prove the conditional statement made up of the removed premise, αn, as antecedent and the conclusion of the original derivation, β, as consequent. The importance of this result is that, in effect, it shows that the technique of conditional proof, typically found in natural deduction (see Section V), is unnecessary in PC, because whenever it is possible to prove the consequent of a conditional by taking the antecedent as an additional premise, a derivation directly for the conditional can be found without taking the antecedent as a premise.

Here's the proof:

1. Assume that α1, ..., αn ⊢ β. This means that there is a derivation of β in the Propositional Calculus from the premises α1, ..., αn. This derivation takes the form of an ordered sequence γ1, γ2, ..., γm, where the last member of the sequence, γm, is β, and each member of the sequence is either (1) a premise, i.e., it is one of α1, ..., αn, (2) an axiom of PC, or (3) derived from previous members of the sequence by modus ponens.

2. We need to show that there is a derivation of 'αn → β', which, while possibly making use of the other premises of the argument, does not make use of αn. We'll do this by showing that for each member, γi, of the sequence of the original derivation: γ1, γ2, ..., γm, one can derive 'αn → γi' without making use of αn as a premise.

3. Each step γi in the sequence of the original derivation was gotten at in one of three ways, as mentioned in (1) above. Regardless of which case we are dealing with, we can get the result that α1, ..., αn-1⊢ αn → γi. There are three cases to consider:

Case (a): Suppose γi is a premise of the original argument. Then γi is either one of α1, ..., αn-1 or it is αn itself. In the latter subcase, what we desire to get is that 'αn → αn' can be gotten at without using αn as a premise. Because 'αn → αn' is an instance of TS1, we can get it without using any premises. In the former case, notice that γi is one of the premises we're allowed to use in the new derivation. We're also allowed to introduce the instance of AS1, 'γi → (αn → γi)'. From these, we can get 'αn → γi' by modus ponens.

Case (b): Suppose γi is an axiom. We need to show that we can get 'αn → γi' without using αn as a premise. In fact, we can get it without using any premises. Because γi is an axiom, we can use it in the new derivation as well. As in the last case, we have 'γi → (αn → γi)' as another axiom (an instance of AS1). From these two axioms, we arrive at 'αn → γi' by modus ponens.

Case (c): Suppose that γi was derived from previous members of the sequence by modus ponens. Specifically, there is some γj and γk such that both j and k are less than i, and γj takes the form 'γk → γi'. We can assume that we have already been able to derive both 'αn → γj' -- i.e., 'αn → (γk → γi)' -- and 'αn → γk' in the new derivation without making use of αn. (This may seem questionable in the case that either γj or γk was itself gotten at by modus ponens. But notice that this just pushes the assumption back, and eventually one will reach the beginning of the original derivation. The first two steps of the sequence, namely, γ1 and γ2, cannot have been derived by modus ponens, since this would require there to have been two previous members of the sequence, which is impossible.) So, in our new derivation, we already have both 'αn → (γk → γi)' and 'αn → γk'. Notice that '[αn → (γk → γi)] → [(αn → γk) → (αn → γi)]' is an instance of AS2, and so it can be introduced in the new derivation. By two steps of modus ponens, we arrive at 'αn → γi', again without using αn as a premise.

4. If we continue through each step of the original derivation, showing for each such step γi, we can get 'αn → γi' without using αn as a premise, eventually, we come to the last step of the original derivation, γm, which is β itself. Applying the procedure from step (3), we get that 'αn → β' without making use of αn as a premise. Therefore, the new derivation formed in this way shows that α1, ..., αn-1⊢ αn → β, which is what we were attempting to show.

What's interesting about this proof for metatheoretic result 2 is that it provides a recipe, given a derivation for a certain result that makes use of one or more premises, for transforming that derivation into one of a conditional statement in which one of the premises of the original argument has become the antecedent. This may be much clearer with an example.

Consider the following derivation for the result that: Q → R ⊢ (P → Q) → (P → R):

1. Q → R Premise
2. (Q → R) → (P → (Q → R)) AS1
3. P → (Q → R) 1,2 MP
4. [P → (Q → R)] → [(P → Q) → (P → R)] AS2
5. (P → Q) → (P → R) 3,4 MP

It is possible to transform the above derivation into one that uses no premises and shows that '(Q → R) → ((P → Q) → (P → R))' is a theorem of PC. The procedure for such a transformation involves looking at each step of the original derivation, and for each one, attempting to derive the same statement, only beginning with "(Q → R) → ...", without making use of "(Q → R)" as a premise. How this is done depends on whether the step is a premise, an axiom, or a result of modus ponens; depending on which it is, one applies one of the three procedures sketched in the proof above. The result is the following:

1. (Q → R) → (Q → R) TS1
2. (Q → R) → (P → (Q → R)) AS1
3. [(Q → R) → (P → (Q → R))] → {(Q → R) → [(Q → R) → (P → (Q → R))]} AS1
4. (Q → R) → [(Q → R) → (P → (Q → R))] 2,3 MP
5. {(Q → R) → [(Q → R) → (P → (Q → R))]} → {[(Q → R) → (Q → R)] → [(Q → R) → (P → (Q → R))]} AS2
6. [(Q → R) → (Q → R)] → [(Q → R) → (P → (Q → R))] 4,5 MP
7. (Q → R) → (P → (Q → R)) 1,6 MP
8. [P → (Q → R)] → [(P → Q) → (P → R)] AS2
9. {[P → (Q → R)] → [(P → Q) → (P → R)]} → [(Q → R) → {[P → (Q → R)] → [(P → Q) → (P → R)]}] AS1
10. (Q → R) → {[P → (Q → R)] → [(P → Q) → (P → R)]} 8,9 MP
11. [(Q → R) → {[P → (Q → R)] → [(P → Q) → (P → R)]}] → {[(Q → R) → (P → (Q → R))] → [(Q → R) → ((P → Q) → (P → R))]} AS2
12. [(Q → R) → (P → (Q → R))] → [(Q → R) → ((P → Q) → (P → R))] 10,11 MP
13. (Q → R) → ((P → Q) → (P → R)) 7,12 MP

The procedure for transforming one sort of derivation into another is purely rote. Moreover, the result is quite often not the most elegant or easy way to show that which you were trying to show. Notice, e.g., in the above that lines (2) and (7) are redundant, and more steps were taken than necessary. However, the purely rote procedure is effective.
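
Indeed, the recipe is rote enough to be written out as a short program. The following minimal sketch in Python follows cases (a)-(c) of the proof, reusing the string representation of formulas from the earlier sketch; line-number references are omitted from the output for brevity:

    def cond(a, b):
        return "(" + a + " → " + b + ")"

    def deduction_theorem(steps, alpha):
        # steps: list of (formula, justification), where a justification is
        # "Premise", "Axiom", or (j, k) meaning: step j is step k → this step.
        new = []
        for wff, just in steps:
            target = cond(alpha, wff)
            if wff == alpha:
                new.append((target, "TS1"))                 # case (a), αn itself
            elif just in ("Premise", "Axiom"):
                new.append((wff, just))                     # still usable as-is
                new.append((cond(wff, target), "AS1"))
                new.append((target, "MP"))                  # cases (a) and (b)
            else:                                           # case (c): modus ponens
                j, k = just
                major, minor = steps[j - 1][0], steps[k - 1][0]
                new.append((cond(cond(alpha, major),
                                 cond(cond(alpha, minor), target)), "AS2"))
                new.append((cond(cond(alpha, minor), target), "MP"))
                new.append((target, "MP"))
        return new

    original = [
        ("(Q → R)",                                   "Premise"),
        ("((Q → R) → (P → (Q → R)))",                 "Axiom"),   # AS1
        ("(P → (Q → R))",                             (2, 1)),
        ("((P → (Q → R)) → ((P → Q) → (P → R)))",     "Axiom"),   # AS2
        ("((P → Q) → (P → R))",                       (4, 3)),
    ]
    for formula, rule in deduction_theorem(original, "(Q → R)"):
        print(formula, " ", rule)
    # The final line printed is ((Q → R) → ((P → Q) → (P → R))), now premise-free.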

This metatheoretic result is due to Jacques Herbrand (1930).

It is interesting on its own, especially when one reflects on it as a substitution or replacement for the conditional proof technique. However, it is also very useful for proving other metatheoretic results, as we shall see below.

Metatheoretic result 3: If α is a wff of language PL', and the statement letters making it up are p1, p2, ..., pn, then consider any possible truth-value assignment to these letters, and consider the set of premises, Δ, that contains p1 if the truth-value assignment makes p1 true, but contains '¬p1' if the truth-value assignment makes p1 false, and similarly for p2, ..., pn. If the truth-value assignment makes α true, then it holds in PC that Δ ⊢ α, and if it makes α false, then Δ ⊢ ¬α.

Here's the proof.

1. By the definition of a wff, α is either itself a statement letter, or ultimately built up from statement letters by the connectives '¬' and '→'.

2. If α is itself a statement letter, then obviously either it or its negation is a member of Δ. It is a member of Δ if the truth-value assignment makes it true. In that case, obviously, there is a derivation of α from Δ, since a premise may be introduced at any time. If the truth-value assignment makes it false instead, then '¬α' is a member of Δ, and so we have a derivation of '¬α' from Δ, since again a premise may be introduced at any time. This covers the case in which our wff is simply a statement letter.

3. Suppose that α is built up from some other wff β with the sign '¬', i.e., suppose that α is '¬β'. We can assume that we have already gotten the desired result for β. (Either β is a statement letter, in which case the result holds by step (2), or is itself ultimately built up from statement letters, so even if verifying this assumption requires making a similar assumption, ultimately we will get back to statement letters.) That is, if the truth-value assignment makes β true, then we have a derivation of β from Δ. If it makes it false, then we have a derivation of '¬β' from Δ. Suppose that it makes β true. Since α is the negation of β, the truth-value assignment must make α false. Hence, we need to show that there is a derivation of '¬α' from Δ. Since α is '¬β', '¬α' is '¬¬β'. If we append to our derivation of β from Δ the derivation of 'β → ¬¬β', an instance of TS2, we can reach a derivation of '¬¬β' by modus ponens, which is what was required. If we assume instead that the truth-value assignment makes β false, then by our assumption, there is a derivation of '¬β' from Δ. Since α is the negation of β, this truth-value assignment must make α true. Now, α simply is '¬β', so we already have a derivation of it from Δ.

4. Suppose instead that α is built up from other wffs β and γ with the sign '→', i.e., suppose that α is 'β → γ'. Again we can assume that we have already gotten the desired result for β and γ. (Again, either they themselves are statement letters or built up in like fashion from statement letters.) Suppose that the truth-value assignment we are considering makes α true. Because α is 'β → γ', by the semantics for the sign '→', the truth-value assignment must make either β false or γ true. Take the first subcase. If it makes β false, then by our assumption, there is a derivation of '¬β' from Δ. If we append to this the derivation of the instance of TS3, '¬β → (β → γ)', by modus ponens we arrive at derivation of 'β → γ', i.e., α, from Δ. If instead, the truth-value assignment makes γ true, then by our assumption there is a derivation of γ from Δ. If we add to this derivation the instance of AS1, 'γ → (β → γ)', by modus ponens, we then again arrive at a derivation of 'β → γ', i.e., α, from Δ. If instead, the truth-value assignment makes α false, then since α is 'β → γ', the truth-value assignment in question must make β true and γ false. By our assumption, then it is possible to prove both β and '¬γ' from Δ. If we concatenate these two derivations, and add to them the derivation of the instance of TS4, 'β → [¬γ → ¬(β → γ)]', then by two applications of modus ponens, we can derive '¬(β → γ)', which is simply '¬α', which is what was desired.

From the above we see that the Propositional Calculus PC can be used to demonstrate the appropriate result for a complex wff if given as premises, for each of its simple parts, either that part or its negation. This is of course the foundation of truth-functional logic: that the truth or falsity of the complex statements one can make in it be determined entirely by the truth or falsity of the simple statements entering into them. Metatheoretic result 3 is again interesting on its own, but it plays a crucial role in the proof of completeness, which we turn to next.

Metatheoretic result 4 (Completeness): If α is a wff of language PL' and a tautology, then α is a theorem of the Propositional Calculus.

This feature of the Propositional Calculus is called completeness because it shows that the Propositional Calculus, as a deductive system aiming to capture all the truths of logic, is a success. Every wff true solely in virtue of the truth-functional nature of the connectives making it up is something one can prove using only the axioms of PC along with modus ponens. Here's the proof:

1. Suppose that α is a tautology. This means that every possible truth-value assignment to its statement letters makes it true.

2. Let the statement letters making up α be p1, p2, ..., pn, arranged in some order (say alphabetically and by the number of their subscripts). It follows from (1) and metatheoretic result 3, that there is a derivation in PC of α using any possible set of premises that consists, for each statement letter, of either it or its negation.

3. By metatheoretic result 2, we can remove from each of these sets of premises either pn or '¬pn', depending on which it contains, and make it an antecedent of a conditional in which α is consequent, and the result will be provable without using pn or '¬pn' as a premise. This means that for every possible set of premises consisting of either p1 or '¬p1' and so on, up until pn-1, we can derive both 'pn → α' and '¬pn → α'.

4. The wff '(pn → α) → ((¬pn → α) → α)' is an instance of TS5. Therefore, for any set of premises from which one can derive both 'pn → α' and '¬pn → α', by two applications of modus ponens, one can also derive α itself.

5. Putting (3) and (4) together, we have the result that α can be derived from every possible set of premises consisting of either p1 or '¬p1' and so on, up until pn-1.

6. We can apply the same reasoning given in steps (3)-(5) to remove pn-1 or its negation from the premise sets by the deduction theorem, arriving at the result that for every set of premises consisting of either p1 or '¬p1' and so on, up until pn-2, it is possible to derive α. If we continue to apply this reasoning, eventually, we'll get the result that we can derive α with either p1 or its negation as our sole premise. Again, applying the deduction theorem, this means that both 'p1 → α' and '¬p1 → α' can be proven in PC without using any premises, i.e., they are theorems. Concatenating the derivations of these theorems along with that for the instance of TS5, '(p1 → α) → ((¬p1 → α) → α)', by two applications of modus ponens, it follows that α itself is a theorem, which is what we sought to demonstrate.

The above proof of the completeness of system PC is easier to appreciate when visualized. Suppose, just for the sake of illustration, that the tautology we wish to demonstrate in system PC has three statement letters, 'P', 'Q' and 'R'. There are eight possible truth-value assignments to these letters, and since α is a tautology, all of them make α true. We can sketch in at least this much of α's truth table:

P   Q   R   |   α
T   T   T   |   T
T   T   F   |   T
T   F   T   |   T
T   F   F   |   T
F   T   T   |   T
F   T   F   |   T
F   F   T   |   T
F   F   F   |   T

Now, given this feature of α, it follows from metatheoretic result 3 that for every possible combination of premises consisting of either 'P' or '¬P' (but not both), either 'Q' or '¬Q', and either 'R' or '¬R', it is possible from those premises to construct a derivation of α. This can be visualized as follows:

P , Q , R ⊢ α
P , Q , ¬R ⊢ α
P , ¬Q , R ⊢ α
P , ¬Q , ¬R ⊢ α
¬P , Q , R ⊢ α
¬P , Q , ¬R ⊢ α
¬P , ¬Q , R ⊢ α
¬P , ¬Q , ¬R ⊢ α

By the deduction theorem, we can pull out the last premise from each list of premises and make it an antecedent. However, because from the same remaining list of premises we get both 'R → α' and '¬R → α', we can get α by itself from those premises according to TS5. Again, to visualize this:

P , Q ⊢ R → α ... and so P , Q ⊢ α
P , Q ⊢ ¬R → α
P , ¬Q ⊢ R → α ... and so P , ¬Q ⊢ α
P , ¬Q ⊢ ¬R → α
¬P , Q ⊢ R → α ... and so ¬P , Q ⊢ α
¬P , Q ⊢ ¬R → α
¬P , ¬Q ⊢ R → α ... and so ¬P , ¬Q ⊢ α
¬P , ¬Q ⊢ ¬R → α

We can continue this line of reasoning until all the premises are removed.

P , Q ⊢ α      P ⊢ Q → α      ... and so P ⊢ α, and so ⊢ P → α      ... and so ⊢ α
P , ¬Q ⊢ α     P ⊢ ¬Q → α
¬P , Q ⊢ α     ¬P ⊢ Q → α     ... and so ¬P ⊢ α, and so ⊢ ¬P → α
¬P , ¬Q ⊢ α    ¬P ⊢ ¬Q → α

At the end of this process, we see that α is a theorem. Despite having only three axiom schemata and a single inference rule, the simple Propositional Calculus PC suffices to prove any tautology. It is complete in the requisite sense.

This method of proving the completeness of the Propositional Calculus is due to Kalmár (1935).
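
The semantic facts underlying this construction can be checked mechanically. The following Python sketch is our illustration, not part of Kalmár's proof (which is purely syntactic): for a sample tautology α, it verifies that each of the eight sets of premises consisting of a statement letter or its negation determines a truth-value assignment on which α comes out true, which is the semantic counterpart of the derivability claims of metatheoretic result 3.

from itertools import product

def impl(a, b):
    # The material conditional: false only when a is true and b is false.
    return (not a) or b

# A sample tautology alpha in 'P', 'Q' and 'R':
# (P → (Q → R)) → ((P → Q) → (P → R))
def alpha(p, q, r):
    return impl(impl(p, impl(q, r)), impl(impl(p, q), impl(p, r)))

# For each of the eight ways of taking a letter or its negation as premises,
# alpha is true on the one assignment making those premises true.
for p, q, r in product([True, False], repeat=3):
    premises = [('P' if p else '¬P'), ('Q' if q else '¬Q'), ('R' if r else '¬R')]
    assert alpha(p, q, r), premises
    print(', '.join(premises), '⊢ α   (verified semantically)')

Of course, the proof itself does more than this: it shows that each of these semantic facts is matched by an actual derivation in system PC.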

Corollary 4.1: If a given wff β of language PL' is a logical consequence of a set of wffs α1, α2, ..., αn, according to their combined truth table, then there is a derivation of β with α1, ..., αn as premises in the Propositional Calculus.

Without going into the details of the proof of this corollary, it follows from the fact that if β is a logical consequence of α1, α2, ..., αn, then the wff '(α1 → (α2 → ... (αn → β)...))' is a tautology. As a tautology, it is a theorem of PC, and so if one begins with its derivation in PC and appends a number of steps of modus ponens using α1, α2, ..., αn as premises, one can derive β.

Metatheoretic result 5 (Soundness): If a wff α is a theorem of the Propositional Calculus (PC), then α is a tautology.

Above, we saw that all tautologies are theorems of PC. The reverse is also true: all theorems of PC are tautologies. Here's the proof:

1. Suppose that α is a theorem of PC. This means that there is an ordered sequence of steps, each of which is either (1) an axiom of PC, or (2) derived from previous members of the sequence by modus ponens, and such that α is the last member of the sequence.

2. We can show that not only is α a tautology, but so are all the members of the sequence leading to it. The first thing to note is that every axiom of PC is a tautology. To be an axiom of PC, a wff must match one of the axiom schemata AS1, AS2 or AS3. All such wffs must be tautologous; this can easily be verified by constructing truth tables for AS1, AS2 and AS3. (This is left to the reader.)

3. The rule of modus ponens preserves tautologyhood. If α is a tautology and 'α → β' is also a tautology, β must be a tautology as well. This is because if β were not a tautology, it would be false on some truth-value assignments. However, α, as a tautology, is true for all truth-value assignments. Because a statement of the form 'α → β' is false for any truth-value assignment making α true and β false, it would then follow that some truth-value assignment makes 'α → β' false, which is impossible if it too is a tautology.

4. Hence, we see that the axioms with which we begin the sequence, and every step derived from them using modus ponens, must all be tautologies, and consequently, the last step of the sequence, α, must also be a tautology.

This result is called the soundness of the Propositional Calculus; it shows that in it, one cannot demonstrate something that is not logically true.

Corollary 5.1: A wff α of language PL' is a tautology if and only if α is a theorem of system PC.

This follows immediately from metatheoretic results 4 and 5.

Corollary 5.2 (Consistency): There is no wff α of language PL' such that both α and '¬α' are theorems of the Propositional Calculus (PC).

Due to metatheoretic result 5, all theorems of PC are tautologies. It is therefore impossible for both α and '¬α' to be theorems, as this would require both to be tautologies. That would mean that both are true for all truth-value assignments, but obviously, they must have different truth-values for any given truth-value assignment, and cannot both be true for any, much less all, such assignments.

This result is called consistency because it guarantees that no theorem of system PC can be inconsistent with any other theorem.

Corollary 5.3: If there is a derivation of the wff β with α1, α2, ..., αn as premises in the Propositional Calculus, then β is a logical consequence of the set of wffs α1, α2, ..., αn, according to their combined truth table.

This is the converse of Corollary 4.1. It follows by the reverse reasoning involved in that corollary. If there is a derivation of β taking α1, ..., αn as premises, then by multiple applications of the deduction theorem (Metatheoretic result 2), it follows that '(α1 → (α2 → ... (αn → β)...))' is a theorem of PC. By metatheoretic result 5, '(α1 → (α2 → ... (αn → β)...))' must be a tautology. If so, then there cannot be a truth-value assignment making all of α1, ..., αn true while making β false, and so β is a logical consequence of α1, ..., αn.

Corollary 5.4: There is a derivation of the wff β with α1, ..., αn as premises in the Propositional Calculus if and only if β is a logical consequence of α1, ..., αn, according to their combined truth table.

This follows at once from corollaries 4.1 and 5.3. In sum, then, the Propositional Calculus method of demonstrating something to follow from the axioms of logic is extensionally equivalent to the truth table method of determining whether or not something is a logical truth. Similarly, the truth-table method for testing the validity of an argument is equivalent to the test of being able to construct a derivation for it in the Propositional Calculus. In short, the Propositional Calculus is exactly what we wanted it to be.

Corollary 5.5 (Decidability): The Propositional Calculus (PC) is decidable, i.e., there is a finite, effective, rote procedure for determining whether or not a given wff α is a theorem of PC.

By Corollary 5.1, a wff α is a theorem of PC if and only if it is a tautology. Truth tables provide a rote, effective, and finite procedure for determining whether or not a given wff is a tautology. They therefore also provide such a procedure for determining whether or not a given wff is a theorem of PC.
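
This procedure is easily made concrete. Below is a minimal Python sketch, our own illustration rather than anything belonging to system PC itself, in which wffs built from '¬' and '→' are represented as nested tuples and theoremhood is decided by checking every row of the wff's truth table, exactly as Corollary 5.5 describes.

from itertools import product

def letters(wff):
    # A wff is a statement letter (a string), a negation ('neg', w),
    # or a conditional ('imp', w1, w2).
    if isinstance(wff, str):
        return {wff}
    if wff[0] == 'neg':
        return letters(wff[1])
    return letters(wff[1]) | letters(wff[2])

def value(wff, assignment):
    # Compute the truth-value of a wff under a truth-value assignment.
    if isinstance(wff, str):
        return assignment[wff]
    if wff[0] == 'neg':
        return not value(wff[1], assignment)
    return (not value(wff[1], assignment)) or value(wff[2], assignment)

def is_theorem_of_pc(wff):
    # By Corollary 5.1, a wff is a theorem of PC iff it is a tautology,
    # so checking all 2^n rows of its truth table decides theoremhood.
    ls = sorted(letters(wff))
    return all(value(wff, dict(zip(ls, row)))
               for row in product([True, False], repeat=len(ls)))

print(is_theorem_of_pc(('imp', 'P', ('imp', 'Q', 'P'))))  # True: 'P → (Q → P)' is a tautology
print(is_theorem_of_pc(('imp', 'P', 'Q')))                # False: 'P → Q' is not

The procedure is finite and rote, as decidability requires, though the number of rows to be checked grows exponentially with the number of statement letters.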

8. Forms of Propositional Logic

So far we have focused only on classical, truth-functional propositional logic. Its distinguishing features are (1) that all connectives it uses are truth-functional, i.e., the truth-values of complex statements formed with those connectives depend entirely on the truth-values of the parts, and (2) that it assumes bivalence: all statements are taken to have exactly one of two truth-values -- truth or falsity -- with no statement assigned both truth-values or neither. Classical truth-functional propositional logic is the most widely studied and discussed form, but there are other forms of propositional logic.

Perhaps the best known form of non-truth-functional propositional logic is modal propositional logic. Modal propositional logic involves introducing operators expressing necessity and possibility, usually along with truth-functional operators such as '→', '&', '¬', etc. Typically, the sign '☐' is used in place of the English operator, "it is necessary that...", and the sign '◇' is used in place of the English operator "it is possible that...". Sometimes both these operators are taken as primitive, but quite often one is defined in terms of the other, since '¬☐¬α' would appear to be logically equivalent to '◇α'. (Roughly, it means the same to say that something is not necessarily not true as it does to say that it is possibly true.)

To see that modal propositional logic is not truth-functional, just consider the following pair of statements:

☐P
☐(P v ¬P)

The first states that it is necessary that P. Let us suppose in fact that 'P' is true, but might have been false. Since P is not necessarily true, the statement "☐P" is false. However, the statement "P v ¬P" is a tautology and so it could not be false. Hence, the statement "☐(P v ¬P)" is true. Notice that both 'P' and "P v ¬P" are true, but different truth-values result when the operator '☐' is added. So, in modal propositional logic, the truth-value of a statement does not depend entirely on the truth-values of the parts.
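
The point can be made concrete with a miniature model. The Python sketch below assumes the standard possible-worlds reading of '☐', on which '☐φ' counts as true just in case φ is true in every world; that semantics is not developed in this article, and the two-world model is purely illustrative.

# Two worlds; 'P' is true in the actual world w1 but false in w2,
# capturing the supposition that 'P' is true but might have been false.
worlds = ['w1', 'w2']
P = {'w1': True, 'w2': False}

def box(truth_at):
    # '☐φ' is true iff φ is true in every world (the assumed semantics).
    return all(truth_at[w] for w in worlds)

print(P['w1'], P['w1'] or not P['w1'])               # True True: 'P' and 'P v ¬P' agree at w1
print(box(P))                                        # False: '☐P' is false
print(box({w: P[w] or not P[w] for w in worlds}))    # True:  '☐(P v ¬P)' is true

Since 'P' and 'P v ¬P' have the same truth-value at the actual world, yet prefixing '☐' yields different truth-values, '☐' is not a truth-functional operator.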

The study of modal propositional logic involves identifying under what conditions statements involving the operators '☐' and '◇' should be regarded as true. Different notions or conceptions of necessity lead to different answers to that question. It also involves discovering what inference rules or systems of deduction would be appropriate given the addition of these operators. Here, there is more controversy than with classical truth-functional logic. For example, in the context of discussions of axiomatic systems for modal propositional logic, very different systems result depending on whether instances of the following schemata are regarded as axiomatic truths, or even truths at all:

☐α → ☐☐α
◇α → ☐◇α

If a statement is necessary, is it necessarily necessary? If a statement is possible, is it necessarily possible? A positive answer to the first question is a key assumption in a logical system known as S4 modal logic. Positive answers to both these questions are key assumptions in a logical system known as S5 modal logic. Other systems of modal logic that avoid such assumptions have also been developed. (For an excellent introductory survey, see Hughes and Cresswell 1996.)

Deontic propositional logic and epistemic propositional logic are two other forms of non-truth-functional propositional logic. The former involves the introduction of operators similar to the English operators "it is morally obligatory that..." and "it is morally permissible that...". Obviously, some things that are in fact true are not morally obligatory, whereas other things that are equally true are morally obligatory. Again, the truth-value of a statement in deontic logic does not depend wholly on the truth-values of its parts. Epistemic logic involves the addition of operators similar to the English operators "it is known that..." and "it is believed that...". While everything that is known to be the case is in fact the case, not everything that is the case is known to be the case, so the truth-value of a statement built up with "it is known that..." will not depend entirely on the truth of the proposition it modifies, even if it depends on it to some degree.

Yet another widely studied form of non-truth-functional propositional logic is relevance propositional logic, which involves the addition of an operator 'Rel' used to connect two statements α and β to form a statement 'Rel(α, β)', which is interpreted to mean that α is related to β in theme or subject matter. For example, if 'P' means that Ben loves Jennifer and 'Q' means that Jennifer is a pop star, then the statement 'Rel(P, Q)' is regarded as true; whereas if 'S' means that the sun is shining in Tokyo, then 'Rel(P, S)' is false, and hence '¬Rel(P, S)' is true. Obviously, whether or not a statement formed using the connective 'Rel' is true does not depend solely on the truth-values of the propositions involved.

One of the motivations for introducing non-truth-functional propositional logics is to make up for certain oddities of truth-functional logic. Consider the truth table for the sign '→' used in Language PL. A statement of the form 'α → β' is regarded as true whenever its antecedent is false or consequent is true. So if we were to translate the English sentence, "if the author of this article lives in France, then the moon is made of cheese" as "E → M", then strangely, it comes out as true given the semantics of the sign '→' because the antecedent, 'E', is false. In modal propositional logic it is possible to define a much stronger sort of operator to use to translate English conditionals as follows:

'α entails β' is defined as '☐(α → β)'

If we transcribe the English "if the author of this article lives in France, then the moon is made of cheese" instead as "E entails M", then it does not come out as true, because presumably, it is possible for the author of this article to live in France without the moon being made of cheese. Similarly, in relevance logic, one could also define a stronger sort of connective as follows:

'α ⇒ β' is defined as 'Rel(α, β) & (α → β)'

Here too, if we were to transcribe the English "if the author of this article lives in France, then the moon is made of cheese" as "E ⇒ M" instead of simply "E → M", it comes out as false, because the author of this article living in France is not related to the composition of the moon.
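
A minimal sketch of this strengthened conditional makes the contrast vivid. The relatedness of the two subject matters is simply stipulated by hand here, which is our illustrative assumption rather than part of any worked-out relevance semantics.

E, M = False, False   # 'E': the author lives in France; 'M': the moon is made of cheese
rel_E_M = False       # stipulated: the two statements are unrelated in subject matter

def arrow(a, b):
    # The material conditional '→' of Language PL.
    return (not a) or b

def double_arrow(a, b, rel):
    # 'α ⇒ β' is defined as 'Rel(α, β) & (α → β)'.
    return rel and arrow(a, b)

print(arrow(E, M))                  # True: 'E → M' comes out (oddly) true
print(double_arrow(E, M, rel_E_M))  # False: 'E ⇒ M' comes out false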

Besides non-truth-functional logics, other logical systems differ from classical truth-functional logic by allowing statements to be assigned truth-values other than truth or falsity, or to be assigned neither truth nor falsity, or both truth and falsity. These sorts of logical systems may still be truth-functional in the sense that the truth-value of a complex statement may depend entirely on the truth-values of the parts, but the rules governing such truth-functionality would be more complicated than for classical logic, because they must consider possibilities that classical logic rejects.

Many-valued or multivalent logics are those that consider more than two truth-values. They may admit anything from three to an infinite number of possible truth-values. The simplest sort of many-valued logic is one that admits three truth-values, e.g., truth, falsity and indeterminacy. It might seem, for example, that certain statements, such as statements about the future, or paradoxical statements such as "this sentence is not true," cannot easily be assigned either truth or falsity, and so, it might be concluded, must have an indeterminate truth-value. The admission of this third truth-value requires one to expand the truth tables given in Section III(a). There, we gave a truth table for statements formed using the operator '→'; in three-valued logic, we have to decide what the truth-value of a statement of the form 'α → β' is when either or both of α and β has an indeterminate truth-value. Arguably, if any component of a statement is indeterminate in truth-value, then the whole statement is indeterminate as well. This would lead to the following expanded truth table:

α   β   (α → β)
T   T   T
T   I   I
T   F   F
I   T   I
I   I   I
I   F   I
F   T   T
F   I   I
F   F   T

However, we might wish to retain the feature of classical logic that a statement of the form 'α → β' is always true when its antecedent is false or its consequent is true, and hold that it is indeterminate only when its antecedent is indeterminate and its consequent false, or when its antecedent is true and its consequent indeterminate (counting the remaining case, in which antecedent and consequent are both indeterminate, as true), so that its truth table appears:

α   β   (α → β)
T   T   T
T   I   I
T   F   F
I   T   T
I   I   T
I   F   I
F   T   T
F   I   T
F   F   T

Such details will have an effect on the rest of the logical system. Suppose, for example, that an axiomatic or natural deduction system is created, and that a desirable feature is that something be provable from no premises if and only if it is a tautology in the sense of being true (and not just not false) for all possible truth-value assignments. If we make use of the first truth table for '→', then 'P → P' should not be provable, because it is indeterminate when 'P' is; whereas if we use the second truth table, 'P → P' should be provable, since it is a tautology according to that truth table, i.e., it is true regardless of which of the three truth-values is assigned to 'P'.
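
The difference between the two proposals is easy to verify mechanically. The following Python sketch, our own illustration, implements both candidate tables for '→' over the three values 'T', 'I' and 'F' and evaluates 'P → P' under each. (The second table matches Łukasiewicz's three-valued implication, though the article does not name it as such.)

def imp_first(a, b):
    # First table: indeterminate whenever either component is indeterminate.
    if a == 'I' or b == 'I':
        return 'I'
    return 'T' if (a == 'F' or b == 'T') else 'F'

def imp_second(a, b):
    # Second table: true when the antecedent is false or the consequent true,
    # and also when both are indeterminate; indeterminate for (T, I) and (I, F).
    if a == 'F' or b == 'T':
        return 'T'
    if a == 'I' and b == 'I':
        return 'T'
    if a == 'T' and b == 'F':
        return 'F'
    return 'I'

for v in ['T', 'I', 'F']:
    print('P =', v, '  first table:', imp_first(v, v), '  second table:', imp_second(v, v))
# Under the first table 'P → P' takes the value 'I' when 'P' does, so it is
# not a tautology; under the second it is 'T' on all three assignments.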

Here we get just a glimpse at the complications created by admitting more than two truth-values. If more than three are admitted, and possibly infinitely many, then the issues become even more complicated.

Intuitionist propositional logic results from rejecting the assumption that every statement is true or false, and countenances statements that are neither. The result is a sort of logic very much akin to a three-valued logic, since "neither true nor false," while strictly speaking the rejection of a truth-value, can be thought of as though it were a third truth-value. In intuitionist logic, the so-called "law of excluded middle," i.e., the law that all statements of the form 'α v ¬α' are true, is rejected. This is because intuitionist logic takes truth to coincide with direct provability, and it may be that certain statements, such as Goldbach's conjecture in mathematics, are neither provably the case nor provably not the case.

Paraconsistent propositional logic is even more radical, in countenancing statements that are both true and false. Again, depending on the nature of the system, semantic rules have to be given that determine what the truth-value or truth-values a complex statement has when its component parts are both true and false. Such decisions determine what sorts of new or restricted rules of inference would apply to the logical system. For example, paraconsistent logics, if not trivial, must restrict the rules of inference allowable in classical truth-functional logic, because in systems such as those sketched in Sections V and VI above, from a contradiction, i.e., a statement of the form 'α & ¬α', it is possible to deduce any other statement. Consider, e.g., the following deduction in the natural deduction system sketched in Section V.

1. P & ¬P      Premise
2. P           1 Simp
3. ¬P          1 Simp
4. P v Q       2 Add
5. Q           3,4 DS

In order to avoid this result, paraconsistent logics must restrict the notion of a valid inference. For an inference to be considered valid, not only must it be truth-preserving, i.e., it must be impossible to arrive at something untrue when starting with true premises; it must also be falsity-avoiding, i.e., it must be impossible, starting with true premises, to arrive at something that is false. In paraconsistent logic, where a statement can be both true and false, these two requirements do not coincide. The inference rule of disjunctive syllogism, while truth-preserving, is not falsity-avoiding. In cases in which its premises are true, its conclusion can still be false; more specifically, provided that at least one of its premises is both true and false, its conclusion can be false.
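
This failure can be verified concretely. The sketch below uses the three values of Priest's "Logic of Paradox," on which a truth-value is a non-empty set of classical values; this is just one paraconsistent semantics, chosen here for illustration, since the article does not fix any particular system.

# Truth-values as sets of classical values: being both true and false is {t, f}.
T, F, B = frozenset({'t'}), frozenset({'f'}), frozenset({'t', 'f'})

def neg(a):
    # Negation swaps truth and falsity (and so leaves 'both' fixed).
    out = set()
    if 't' in a:
        out.add('f')
    if 'f' in a:
        out.add('t')
    return frozenset(out)

def disj(a, b):
    # A disjunction is true if either disjunct is true, false if both are false.
    out = set()
    if 't' in a or 't' in b:
        out.add('t')
    if 'f' in a and 'f' in b:
        out.add('f')
    return frozenset(out)

# Disjunctive syllogism: from 'P v Q' and '¬P', infer 'Q'.
P, Q = B, F    # 'P' is both true and false; 'Q' is simply false
print('t' in disj(P, Q), 't' in neg(P))   # True True: both premises are true
print('f' in Q)                           # True: the conclusion is false

Both premises come out true (one of them is also false), while the conclusion is false: disjunctive syllogism is not falsity-avoiding in this semantics.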

Other forms of non-classical propositional logic, and non-truth-functional propositional logic, continue to be discovered. Obviously any deviance from classical bivalent propositional logic raises complicated logical and philosophical issues that cannot be fully explored here. For more details both on non-classical logic, and on non-truth-functional logic, see the recommended reading section.

9. Suggestions for Further Reading

  • Anderson, A. R. and N. D. Belnap [and J. M. Dunn]. 1975 and 1992. Entailment. 2 vols. Princeton, NJ: Princeton University Press.
  • Bocheński, I. M. 1961. A History of Formal Logic. Notre Dame, Ind.: University of Notre Dame Press.
  • Boole, George. 1847. The Mathematical Analysis of Logic. Cambridge: Macmillan.
  • Boole, George. 1854. An Investigation of the Laws of Thought. Cambridge: Macmillan.
  • Carroll, Lewis. 1958. Symbolic Logic and the Game of Logic. London: Dover.
  • Church, Alonzo. 1956. Introduction to Mathematical Logic. Princeton, NJ: Princeton University Press.
  • Copi, Irving. 1953. Introduction to Logic. New York: Macmillan.
  • Copi, Irving. 1974. Symbolic Logic. 4th ed. New York: Macmillan.
  • da Costa, N. C. A. 1974. "On the Theory of Inconsistent Formal Systems," Notre Dame Journal of Formal Logic 15: 497-510.
  • De Morgan, Augustus. 1847. Formal Logic. London: Walton and Maberly.
  • Fitch, F. B. 1952. Symbolic Logic: An Introduction. New York: Ronald Press.
  • Frege, Gottlob. 1879. Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens. Halle: L. Nebert. Published in English as Conceptual Notation, ed. and trans. by Terrell Bynum. Oxford: Clarendon Press, 1972.
  • Frege, Gottlob. 1923. "Gedankengefüge," Beiträge zur Philosophie des deutschen Idealismus 3: 36-51. Published in English as "Compound Thoughts," in The Frege Reader, edited by Michael Beaney. Oxford: Blackwell, 1997.
  • Gentzen, Gerhard. 1934. "Untersuchungen über das logische Schließen" Mathematische Zeitschrift 39: 176-210, 405-31. Published in English as "Investigations into Logical Deduction," in Gentzen 1969.
  • Gentzen, Gerhard. 1969. Collected Papers. Edited by M. E. Szabo. Amsterdam: North-Holland Publishing.
  • Haack, Susan. 1996. Deviant Logic, Fuzzy Logic. Chicago: University of Chicago Press.
  • Herbrand, Jacques. 1930. "Recherches sur la théorie de la démonstration," Travaux de la Société des Sciences et des Lettres de Varsovie 33: 133-160.
  • Hilbert, David and William Ackermann. 1950. Principles of Mathematical Logic. New York: Chelsea.
  • Hintikka, Jaakko. 1962. Knowledge and Belief: An Introduction to the Logic of the Two Notions. Ithaca: Cornell University Press.
  • Hughes, G. E. and M. J. Cresswell. 1996. A New Introduction to Modal Logic. London: Routledge.
  • Jevons, W. S. 1880. Studies in Deductive Logic. London: Macmillan.
  • Kalmár, L. 1935. "Über die Axiomatisierbarkeit des Aussagenkalküls," Acta Scientiarum Mathematicarum 7: 222-43.
  • Kleene, Stephen C. 1952. Introduction to Metamathematics. Princeton, NJ: Van Nostrand.
  • Kneale, William and Martha Kneale. 1962. The Development of Logic. Oxford: Clarendon Press.
  • Lewis, C. I. and C. H. Langford. 1932. Symbolic Logic. New York: Dover.
  • Łukasiewicz, Jan. 1920. "O logice trójwartościowej," Ruch Filozoficzny 5: 170-171. Published in English as "On Three-Valued Logic," in Łukasiewicz 1970.
  • Łukasiewicz, Jan. 1970. Selected Works. Amsterdam: North-Holland.
  • Łukasiewicz, Jan and Alfred Tarski. 1930. "Untersuchungen über den Aussagenkalkül," Comptes Rendus des séances de la Société des Sciences et des Lettres de Varsovie 32: 30-50. Published in English as "Investigations into the Sentential Calculus," in Tarski 1956.
  • Mally, Ernst. 1926. Grundgesetze des Sollens: Elemente der Logik des Willens. Graz: Leuschner und Lubensky.
  • McCune, William, Robert Veroff, Branden Fitelson, Kenneth Harris, Andrew Feist and Larry Wos. 2002. "Short Single Axioms for Boolean Algebra," Journal of Automated Reasoning 29: 1-16.
  • Mendelson, Elliott. 1997. Introduction to Mathematical Logic. 4th ed. London: Chapman and Hall.
  • Meredith, C. A. 1953. "Single Axioms for the Systems (C, N), (C, O) and (A, N) of the Two-valued Propositional Calculus," Journal of Computing Systems 3: 155-62.
  • Müller, Eugen, ed. 1909. Abriss der Algebra der Logik, by E. Schröder. Leipzig: Teubner.
  • Nicod, Jean. 1917. "A Reduction in the Number of the Primitive Propositions of Logic," Proceedings of the Cambridge Philosophical Society 19: 32-41.
  • Peirce, C. S. 1885. "On the Algebra of Logic," American Journal of Mathematics 7: 180-202.
  • Post, Emil. 1921. "Introduction to a General Theory of Elementary Propositions," American Journal of Mathematics 43: 163-185.
  • Priest, Graham, Richard Routley and Jean Norman, eds. 1990. Paraconsistent Logic. Munich: Philosophia Verlag.
  • Prior, Arthur. 1990. Formal Logic. 2nd. ed. Oxford: Oxford University Press.
  • Read, Stephen, 1988. Relevant Logic. New York: Blackwell.
  • Rescher, Nicholas. 1966. The Logic of Commands. London: Routledge and Kegan Paul.
  • Rescher, Nicholas. 1969. Many-Valued Logic. New York: McGraw Hill.
  • Rosser, J. B. 1953. Logic for Mathematicians. New York: McGraw Hill.
  • Russell, Bertrand. 1906. "The Theory of Implication," American Journal of Mathematics 28: 159-202.
  • Schlesinger, G. N. 1985. The Range of Epistemic Logic. Aberdeen: Aberdeen University Press.
  • Sheffer, H. M. 1913. "A Set of Five Independent Postulates for Boolean Algebras, with Application to Logical Constants," Transactions of the American Mathematical Society 14: 481-88.
  • Smullyan, Raymond. 1961. Theory of Formal Systems. Princeton: Princeton University Press.
  • Tarski, Alfred. 1956. Logic, Semantics and Meta-Mathematics. Oxford: Oxford University Press.
  • Urquhart, Alasdair. 1986. "Many-valued Logic," In Handbook of Philosophical Logic, vol. 3, edited by D. Gabbay and F. Guenthner. Dordrecht: Reidel.
  • Venn, John. 1881. Symbolic Logic. London: Macmillan.
  • Whitehead, Alfred North and Bertrand Russell. 1910-1913. Principia Mathematica. 3 vols. Cambridge: Cambridge University Press.
  • Wittgenstein, Ludwig. 1922. Tractatus Logico-Philosophicus. London: Routledge and Kegan Paul.

Author Information

Kevin C. Klement
Email: klement@philos.umass.edu
University of Massachusetts, Amherst
U. S. A.

Evolutionary Ethics

Evolutionary ethics tries to bridge the gap between philosophy and the natural sciences by arguing that natural selection has instilled in human beings a moral sense, a disposition to be good. If this were true, morality could be understood as a phenomenon that arises automatically during the evolution of sociable, intelligent beings and not, as theologians or philosophers might argue, as the result of divine revelation or the application of our rational faculties. Morality would be interpreted as a useful adaptation that increases the fitness of its holders by providing a selective advantage. This is certainly the view of Edward O. Wilson, the "father" of sociobiology, who believes that "scientists and humanists should consider together the possibility that the time has come for ethics to be removed temporarily from the hands of the philosophers and biologicized" (Wilson, 1975: 27). The challenge for evolutionary biologists such as Wilson is to define goodness with reference to evolutionary theory and then explain why human beings ought to be good.

Table of Contents

  1. Key Figures and Key Concepts
    1. Charles Darwin
    2. Herbert Spencer
    3. The Is-Ought Problem
    4. The Naturalistic Fallacy
    5. Sociobiology
  2. Placement in Contemporary Ethical Theory
  3. Challenges for Evolutionary Ethics
  4. References and Further Reading

1. Key Figures and Key Concepts

a. Charles Darwin

The biologization of ethics started with the publication of The Descent of Man by Charles Darwin (1809-1882) in 1871. In this follow-up to On the Origin of Species, Darwin applied his ideas about evolutionary development to human beings. He argued that humans must have descended from a less highly organized form--in fact, from a "hairy, tailed quadruped ... inhabitant of the Old World" (Darwin, 1930: 231). The main difficulty Darwin saw with this explanation is the high standard of moral qualities apparent in humans. Faced with this puzzle, Darwin devoted a large chapter of the book to evolutionary explanations of the moral sense, which he argued must have evolved in two main steps.

First, the root for human morality lies in the social instincts (ibid. 232). Building on this claim by Darwin, today's biologists would explain this as follows. Sociability is a trait whose phylogenetic origins can be traced back to the time when birds "invented" brooding, hatching, and caring for young offspring. Rendering beings able to fulfill parental responsibilities required social mechanisms that were unnecessary at earlier stages of evolutionary history. For example, neither amoebae (which reproduce by division) nor frogs (which leave their tadpole-offspring to fend for themselves) need the social instincts present in birds. At the same time as facilitating the raising of offspring, social instincts counterbalanced innate aggression. It became possible to distinguish between "them" and "us" and aim aggression towards individuals that did not belong to one's group. This behavior is clearly adaptive in the sense of ensuring the survival of one's family.

Second, with the development of intellectual faculties, human beings were able to reflect on past actions and their motives and thus approve or disapprove of others as well as themselves. This led to the development of a conscience which became "the supreme judge and monitor" of all actions (ibid. 235). Being influenced by utilitarianism, Darwin believed that the greatest-happiness principle will inevitably come to be regarded as a standard for right and wrong (ibid. 134) by social beings with highly evolved intellectual capacities and a conscience.

Based on these claims, can Darwin answer the two essential questions in ethics? First, how can we distinguish between good and evil? And second, why should we be good? If all his claims were true, they would indeed support answers to the above questions. Darwin's distinction between good and evil is identical with the distinction made by hedonistic utilitarians. Darwin accepts the greatest-happiness principle as a standard of right and wrong. Hence, an action can be judged as good if it improves the greatest happiness of the greatest number, by either increasing pleasure or decreasing pain. And the second question--why we should be good--does not pose itself for Darwin with the same urgency as it did, for instance, for Plato (Thrasymachus famously asked Socrates in the Republic why the strong, who are not in need of aid, should accept the Golden Rule as a directive for action). Darwin would say that humans are biologically inclined to be sympathetic, altruistic, and moral as this proved to be an advantage in the struggle for existence (ibid. 141).

b. Herbert Spencer

The next important contribution to evolutionary ethics was by Herbert Spencer (1820-1903), the most fervent defender of that theory and the creator of the theory of Social Darwinism. Spencer's theory can be summarized in three steps. As did Darwin, Spencer believed in the theory of hedonistic utilitarianism as proposed by Jeremy Bentham and John Stuart Mill. In his view, gaining pleasure and avoiding pain directs all human action. Hence, moral good can be equated with facilitating human pleasure. Second, pleasure can be achieved in two ways, first by satisfying self-regarding impulses and second by satisfying other-regarding impulses. This means that eating one's favorite food and giving food to others are both pleasurable experiences for humans. Third, mutual cooperation between humans is required to coordinate self- and other-regarding impulses, which is why humans develop principles of equity to bring altruistic and egoistic traits into balance (Fieser, 2001, 214).

However, Spencer did not become known for his theory of mutual cooperation. On the contrary, his account of Social Darwinism remains contentious to this day because it is mostly understood as "an apology for some of the most vile social systems that humankind has ever known," for instance German Nazism (Ruse, 1995: 228). In short, Spencer elevated alleged biological facts (struggle for existence, natural selection, survival of the fittest) to prescriptions for moral conduct (ibid. 225). For instance, he suggested that life is a struggle for human beings and that, in order for the best to survive, it is necessary to pursue a policy of non-aid for the weak: "to aid the bad in multiplying, is, in effect, the same as maliciously providing for our descendants a multitude of enemies" (Spencer, 1874: 346). Spencer's philosophy was widely popular, particularly in North America in the 19th century, but declined significantly in the 20th century.

What answers could he give to the two essential questions in ethics? How can we distinguish between good and evil and why should we be good? Spencer's answer to question one is identical to Darwin's (see above) as they both supported hedonistic utilitarianism. However, his answer to question two is interesting, if untenable. Spencer alleged that evolution equaled progress for the better (in the moral sense of the word) and that anything which supported evolutionary forces would therefore be good (Maxwell, 1984: 231). The reasoning behind this was that nature shows us what is good by moving towards it; and hence, "evolution is a process which, in itself, generates value" (Ruse, 1995: 231). If evolution advances the moral good, we ought to support it out of self-interest. Moral good was previously identified with universal human pleasure and happiness by Spencer. If the evolutionary process directs us towards this universal pleasure, we have an egoistic reason for being moral, namely that we want universal happiness. However, to equate development with moral progress for the better was a major value judgement which cannot be held without further evidence, and most evolutionary theorists have given up on the claim (Ruse, 1995: 233; Woolcock, 1999: 299). It is also subject to more conceptual objections, namely deriving "ought" from "is," and committing the naturalistic fallacy.

c. The Is-Ought Problem

The first philosopher who persistently argued that normative rules cannot be derived from empirical facts was David Hume (1711-1776) (1978: 469):

In every system of morality, which I have hitherto met with, I have always remark'd, that the author proceeds for some time in the ordinary way of reasoning, and establishes the being of a God or makes observations concerning human affairs; when of a sudden I am surpriz'd to find, that instead of the usual copulations of propositions, is, and is not, I meet with no proposition that is not connected with an ought, or an ought not. This change is imperceptible; but is, however, of the last consequence.

It is this unexplained, imperceptible change from "is" to "ought" which Hume deplores in moral systems. To say what is the case and to say what ought to be the case are two unrelated matters, according to him. On the one hand, empirical facts do not contain normative statements, otherwise they would not be purely empirical. On the other hand, if there are no normative elements in the facts, they cannot suddenly surface in the conclusions because a conclusion is only deductively valid if all necessary information is present in the premises.

How do Darwin and Spencer derive "ought" from "is"? Let us look at Darwin first, using an example which he could have supported.

  1. Child A is dying from starvation.
  2. The parents of child A are not in a position to feed their child.
  3. The parents of child A are very unhappy that their child is dying from starvation.
  4. Therefore, fellow humans ought morally to provide food for child A.

Darwin (1930: 234) writes that "happiness is an essential part of the general good." Therefore, those who want to be moral ought to promote happiness, and hence, in the above case, provide food. However, the imperceptible move from "is" to "ought" which Hume found in moral systems, is also present in this example. Thus, Darwin derives ought from is when he moves from the empirical fact of unhappiness to the normative claim of a duty to relieve unhappiness.

The same can be said for Spencer whose above argument about the survival of the fittest could be represented as follows:

  1. Natural selection will ensure the survival of the fittest.
  2. Person B is dying from starvation because he is ill, old, and poor.
  3. Therefore, fellow humans ought morally to avoid helping person B so that the survival of the fittest is guaranteed.

Even if both premises were shown to be true, it does not follow that we ought to morally support the survival of the fittest. An additional normative claim equating survival skills with moral goodness would be required to make the argument tenable. Again, this normative part of the argument is not included in the premises. Hence, Spencer also derives "ought" from "is." Thomas Huxley (1906: 80) objects to evolutionary ethics on these grounds when he writes:

The thief and the murderer follow nature just as much as the philanthropist. Cosmic evolution may teach us how the good and the evil tendencies of man may have come about; but, in itself, it is incompetent to furnish any better reason why what we call good is preferable to what we call evil than we had before.

d. The Naturalistic Fallacy

But evolutionary ethics was not only attacked by those who supported Hume's claim that normative statements cannot be derived from empirical facts. A related argument against evolutionary ethics was voiced by British philosopher G.E. Moore (1873-1958). In 1903, he published a ground-breaking book, Principia Ethica, which created one of the most challenging problems for evolutionary ethics: the "naturalistic fallacy." According to Michael Ruse (1995), when dealing with evolutionary ethics, "it has been enough for the student to murmur the magical phrase 'naturalistic fallacy,' and then he or she can move on to the next question, confident of having gained full marks thus far on the exam" (p. 223). So, what is the naturalistic fallacy and why does it pose a problem for evolutionary ethics?

Moore was interested in the definition of "good" and particularly in whether the property good is simple or complex. Simple properties, according to Moore, are indefinable as they cannot be described further using more basic properties. Complex properties, on the other hand, can be defined by outlining their basic properties. Hence, "yellow" cannot be defined in terms of its constituent parts, whereas "colored" can be explained further as it consists of several individual colors.

"Good," according to Moore, is a simple property which cannot be described using more basic properties. Committing the naturalistic fallacy is attempting to define "good" with reference to other natural, i.e. empirically verifiable, properties. This understanding of "good" creates serious problems for both Darwin and Spencer. Following Bentham and Mill, both identify moral goodness with "pleasure." This means they commit the naturalistic fallacy as good and pleasant are not identical. In addition, Spencer identifies goodness with "highly evolved," committing the naturalistic fallacy again. (Both Moore's claim in itself as well as his criticism of evolutionary ethics can be attacked, but this would fall outside the scope of this entry.)

e. Sociobiology

Despite the continuing challenge of the naturalistic fallacy, evolutionary ethics has moved on with the advent of sociobiology. In 1948, at a conference in New York, scientists decided to initiate new interdisciplinary research between zoologists and sociologists. "Sociobiology" was the name given to the new discipline aiming to find universally valid regularities in the social behavior of animals and humans. Emphasis was put on the study of biological, i.e. non-cultural, behavior. The field did not, however, get off the ground until Edward Wilson published his Sociobiology: The New Synthesis in 1975. According to Wilson (1975: 4), "sociobiology is defined as the systematic study of the biological basis of all social behavior."

In Wilson's view, sociobiology makes philosophers, at least temporarily, redundant, when it comes to questions of ethics (see quote in introduction). He believes that ethics can be explained biologically when he writes (ibid. 3, emphasis added):

The hypothalamus and limbic system ... flood our consciousness with all the emotions - hate, love, guilt, fear, and others – that are consulted by ethical philosophers who wish to intuit the standards of good and evil. What, we are then compelled to ask, made the hypothalamus and the limbic system? They evolved by natural selection. That simple biological statement must be pursued to explain ethics.

Ethics, following this understanding, evolved under the pressure of natural selection. Sociability, altruism, cooperation, mutual aid, etc. are all explicable in terms of the biological roots of human social behavior. Moral conduct aided the long-term survival of the morally inclined species of humans. According to Wilson (ibid. 175), the prevalence of egoistic individuals will make a community vulnerable and ultimately lead to the extinction of the whole group. Mary Midgley agrees. In her view, egoism pays very badly in genetic terms, and a "consistently egoistic species would be either solitary or extinct" (Midgley, 1980: 94).

Wilson avoids the naturalistic fallacy in Sociobiology by not equating goodness with another natural property such as pleasantness, as Darwin did. This means that he does not give an answer to our first essential question in ethics: What is good? However, like Darwin, he gives an answer to question two: Why should we be moral? Because we are genetically inclined to be moral. It is a heritage of earlier times when less morally inclined and more morally inclined species came under pressure from natural selection. Hence, we do not need divine revelation or strong will to be good; we are simply genetically wired to be good. The emphasis in this answer is not on the should, as it is not our free will which makes us decide to be good but our genetic heritage.

One of the main problems evolutionary ethics faces is that ethics is not a single field with a single quest. Instead, it can be separated into various areas, and evolutionary ethics might not be able to contribute to all of them. Let us therefore look at a possible classification for evolutionary ethics, which maps it on the field of traditional ethics, before concluding with possible criticisms.

2. Placement in Contemporary Ethical Theory

For philosophy students, ethics is usually divided into three areas: metaethics, normative ethical theory, and applied ethics. Metaethics looks for possible foundations of ethics. Are there any moral facts out there from which we can deduce our moral theories? Normative ethical theories suggest principles or sets of principles to distinguish morally good from morally bad actions. Applied ethics looks at particular moral issues, such as euthanasia or bribery.

However, this classification is not adequate to accommodate evolutionary ethics in its entirety. Instead, a different three-fold distinction of ethics seems appropriate: descriptive ethics, normative ethics, and metaethics. Descriptive ethics outlines ethical beliefs as held by various people and tries to explain why they are held. For instance, almost all human cultures believe that incest is morally wrong. This belief developed, it could be argued, because it provides a survival advantage to the group that entertains it. Normative ethical theories develop standards to judge which actions are good and which actions are bad. The standard as defended by evolutionary ethics would be something like "Actions that increase the long-term capacity of survival in evolutionary terms are good and actions that decrease this capacity are bad." However, the field has not yet established itself credibly in normative ethics. Consequentialism, deontology, virtue ethics, and social contract theory still dominate debates. This is partly due to the excesses of Social Darwinism but also due to the unintuitive nature of the above or similar standards. Evolutionary ethics has been more successful in providing interesting answers in metaethics. Michael Ruse (1995: 250), for instance, argues that morality is a "collective illusion of the genes, bringing us all in.... We need to believe in morality, and so, thanks to our biology, we do believe in morality. There is no foundation 'out there' beyond human nature."

Descriptive ethics seems, as yet, the most interesting area for evolutionary ethics, a topic particularly suitable for anthropological and sociological research. Which ethical beliefs do people hold and why? But in all three areas, challenges are to be faced.

3. Challenges for Evolutionary Ethics

The following are some lingering challenges for evolutionary ethics:

  • How can a trait that was developed under the pressure of natural selection explain moral actions that go far beyond reciprocal altruism or enlightened self-interest? How can, for instance, the action of Maximilian Kolbe be explained from a biological point of view? (Kolbe was a Polish priest who starved himself to death in a concentration camp to rescue a fellow prisoner.)
  • Could not human beings have moved beyond their biological roots and transcended their evolutionary origins, in which case they would be able to formulate goals in the pursuit of goodness, beauty, and truth that "have nothing to do directly with survival, and which may at times militate against survival?" (O'Hear, 1997: 203).
  • Morality is universal, whereas biologically useful altruism is particular, favoring the family or the group over others. "Do not kill" does not only refer to one's own son, but also to the son of strangers. How can evolutionary ethics cope with universality?
  • Normative ethics aims to be action-guiding. How could humans ever judge an action to be ensuring long-term survival? (This is a practical rather than conceptual problem for evolutionary ethics.)
  • Hume's "is-ought" problem still remains a challenge for evolutionary ethics. How can one move from "is" (findings from the natural sciences, including biology and sociobiology) to "ought"?
  • Similarly, despite the length of time that has passed since the publication of Principia Ethica, the challenge of the "naturalistic fallacy" remains.

Evolutionary ethics is, on a philosopher's time-scale, a very new approach to ethics. Though interdisciplinary approaches between scientists and philosophers have the potential to generate important new ideas, evolutionary ethics still has a long way to go.

4. References and Further Reading

  • Darwin, Charles (1871, 1930) The Descent of Man, Watts & Co., London.
  • Fieser, James (2001) Moral Philosophy through the Ages, Mayfield Publishing Company, Mountain View, California, Chapter 12 "Evolutionary Ethics."
  • Hume, David (1740, 1978) A Treatise of Human Nature, Clarendon Press, Oxford.
  • Maxwell, Mary (1984) Human Evolution: A Philosophical Anthropology, Croom Helm, London.
  • Midgley, Mary (1980) Beast and Man: The Roots of Human Nature, Methuen, London.
  • O'Hear, Anthony (1997) Beyond Evolution: Human Nature and the Limits of Evolutionary Explanation, Clarendon Press, Oxford.
  • Ruse, Michael (1995) Evolutionary Naturalism, Routledge, London.
  • Spencer, Herbert (1874) The Study of Sociology, Williams & Norgate, London.
  • Wilson, Edward O. (1975) Sociobiology: The New Synthesis, Harvard University Press, Cambridge, Massachusetts.
  • Woolcock, Peter G. (1999) "The Case Against Evolutionary Ethics Today," in: Maienschein, Jane and Ruse, Michael (eds) Biology and the Foundation of Ethics, Cambridge University Press, Cambridge, pp. 276-306.

Author Information

Doris Schroeder
Lancaster University, United Kingdom

Theories of Explanation

Within the philosophy of science there have been competing ideas about what an explanation is. Historically, explanation has been associated with causation: to explain an event or phenomenon is to identify its cause. But with the growth and development of philosophy of science in the 20th century, the concept of explanation began to receive more rigorous and specific analysis. Of particular concern were theories that posited the existence of unobservable entities and processes (atoms, fields, genes, and so forth). These posed a dilemma: on the one hand, the staunch empiricist had to reject unobservable entities as a matter of principle; on the other, theories that appealed to unobservable entities were clearly producing revolutionary results. Thus philosophers of science sought some way to characterize the obvious value of these theories without abandoning the empiricist principles deemed central to scientific rationality.

A theory of explanation might treat explanations in either a realist or an epistemic (that is, anti-realist) sense. A realist interpretation of explanation holds that the entities or processes an explanation posits actually exist--the explanation is a literal description of external reality. An epistemic interpretation, on the contrary, holds that such entities or processes do not necessarily exist in any literal sense but are simply useful for organizing human experience and the results of scientific experiments--the point of an explanation is only to facilitate the construction of a consistent empirical model, not to furnish a literal description of reality. Thus Hempel's epistemic theory of explanation deals only in logical form, making no mention of any actual physical connection between the phenomenon to be explained and the facts purported to explain it, whereas Salmon's realist account emphasizes that real processes and entities are conceptually necessary for understanding exactly why an explanation works.

In contrast to these theoretical and primarily scientific approaches, some philosophers have favored a theory of explanation grounded in the way people actually perform explanation. Ordinary Language Philosophy stresses the communicative or linguistic aspect of an explanation, its utility in answering questions and furthering understanding between two individuals, while an approach based in cognitive science maintains that explaining is a purely cognitive activity and that an explanation is a certain kind of mental representation that results from or aids in this activity. It is a matter of contention within cognitive science whether explanation is properly conceived as the process and results of belief revision or as the activation of patterns within a neural network.

This article focuses on the way thinking about explanation within the philosophy of science has changed since 1950. It begins by discussing the philosophical concerns that gave rise to the first theory of explanation, the deductive-nomological model. Discussions of this theory and standard criticisms of it are followed by an examination of attempts to amend, extend or replace this first model. There is particular emphasis on the most general aspects of explanation and on the extent to which later developments reflect the priorities and presuppositions of different philosophical traditions. There are many important aspects of explanation not covered, most notably the relations between the different types of explanation -- teleological, functional, reductive, psychological, and historical -- that are employed in various branches of human inquiry.

Table of Contents

  1. Introduction
  2. Hempel's Theory of Explanation
  3. Standard Criticisms of Hempel's Theory of Explanation
  4. Contemporary Developments in the Theory of Explanation
    1. Explanation and Causal Realism
    2. Explanation and Constructive Empiricism
    3. Explanation and Ordinary Language Philosophy
    4. Explanation and Cognitive Science
    5. Explanation, Naturalism and Scientific Realism
  5. The Current State of the Theory of Explanation
  6. References and Further Reading

1. Introduction

Most people, philosophers included, think of explanation in terms of causation. Very roughly, to explain an event or phenomenon is to identify its cause. The nature of causation is one of the perennial problems of philosophy, so on the basis of this connection one might reasonably attempt to trace thinking about the nature of explanation to antiquity. (Among the ancients, for example, Aristotle's theory of causation is plausibly regarded as a theory of explanation.) But the idea that the concept of explanation warrants independent analysis really did not begin to take hold until the 20th century. Generally, this change occurred as the result of the linguistic turn in philosophy. More specifically, it was the result of philosophers of science attempting to understand the nature of modern theoretical science.

Of particular concern were theories that posited the existence of unobservable entities and processes (for example, atoms, fields, genes, etc.). These posed a dilemma. On the one hand, the staunch empiricist had to reject unobservable entities as a matter of principle; on the other hand, theories that appealed to unobservables were clearly producing revolutionary results. A way was needed to characterize the obvious value of these theories without abandoning the empiricist principles deemed central to scientific rationality.

In this context it became common to distinguish between the literal truth of a theory and its power to explain observable phenomena. Although the distinction between truth and explanatory power is important, it is susceptible to multiple interpretations, and this remains a source of confusion even today. The problem is this: In philosophy the terms "truth" and "explanation" have both realist and epistemic interpretations. On a realist interpretation the truth and explanatory power of a theory are matters of the correspondence of language with an external reality. A theory that is both true and explanatory gives us insight into the causal structure of the world. On an epistemic interpretation, however, these terms express only the power of a theory to order our experience. A true and explanatory theory orders our experience to a greater degree than a false non-explanatory one. Hence, someone who denies that scientific theories are explanatory in the realist sense of the term may or may not be denying that they are explanatory in the epistemic sense. Conversely, someone who asserts that scientific theories are explanatory in the epistemic sense may or may not be claiming that they are explanatory in the realist sense. The failure to distinguish these senses of "explanation" can and does foster disagreements that are purely semantic in nature.

One common way of employing the distinction between truth and explanation is to say that theories that refer to unobservable entities may explain the phenomena, but they are not literally true. A second way is to say that these theories are true, but they do not really explain the phenomena. Although these statements are superficially contradictory, they can both be made in support of the same basic view of the nature of scientific theories. This, it is now easy to see, is because the terms 'truth' and 'explanation' are being used differently in each statement. In the first, 'explanation' is being used epistemically and 'truth' realistically; in the second, 'explanation' is being used realistically and 'truth' epistemically. But both statements are saying roughly the same thing, namely, that a scientific theory may be accepted as having a certain epistemic value without necessarily accepting that the unobservable entities it refers to actually exist. (This view is known as anti-realism.) One early 20th century philosopher-scientist, Pierre Duhem, expressed himself according to the latter interpretation when he claimed:

A physical theory is not an explanation. It is a system of mathematical propositions, deduced from a small number of principles, which aim to represent as simply, as completely, and as exactly as possible a set of experimental laws. ([1906] 1962: p7)

Duhem claimed that:

To explain is to strip the reality of the appearances covering it like a veil, in order to see the bare reality itself. (op.cit.: p19)

Explanation was the task of metaphysics, not science. Science, according to Duhem, does not comprehend reality, but only gives order to appearance. However, the subsequent rise of analytic philosophy and, in particular, logical positivism made Duhem's acceptance of classical metaphysics unpopular. The conviction grew that, far from being explanatory, metaphysics was meaningless insofar as it issued claims that had no implications for experience. By the time Carl Hempel (who, as a logical positivist, was still fundamentally an anti-realist about unobservable entities) articulated the first real theory of explanation (1948), the explanatory power of science could be stipulated.

To explain the phenomena in the world of our experience, to answer the question "Why?" rather than only the question "What?", is one of the foremost objectives of all rational inquiry; and especially scientific research, in its various branches strives to go beyond a mere description of its subject matter by providing an explanation of the phenomena it investigates. (Hempel and Oppenheim 1948: p8)

For Hempel, answering the question "Why?" did not, as for Duhem, involve an appeal to a reality beyond all experience. Hempel employs the epistemic sense of explanation. For him the question "Why?" was an expression of the need to gain predictive control over our future experiences, and the value of a scientific theory was to be measured in terms of its capacity to produce this result.

2. Hempel's Theory of Explanation

According to Hempel, an explanation is:

...an argument to the effect that the phenomenon to be explained ... was to be expected in virtue of certain explanatory facts. (1965, p. 336)

Hempel claimed that there are two types of explanation, what he called 'deductive-nomological' (DN) and 'inductive-statistical' (IS) respectively. Both IS and DN arguments have the same structure. Their premises each contain statements of two types: (1) initial conditions C, and (2) law-like generalizations L. In each, the conclusion is a statement E describing the event to be explained:

C1, C2, C3,...Cn

L1, L2, L3,...Ln

------------------------

E

The only difference between the two is that the laws in a DN explanation are universal generalizations, whereas the laws in IS explanations have the form of statistical generalizations. An example of a DN explanation containing one initial condition and one law-like generalization is:

C. The infant's cells have three copies of chromosome 21.

L. Any infant whose cells have three copies of chromosome 21 has Down's Syndrome.

--------------------------------------------------------------------------------------------------

E. The infant has Down's Syndrome.

An example of an IS explanation is:

C. The man's brain was deprived of oxygen for five continuous minutes.

L. Almost anyone whose brain is deprived of oxygen for five continuous minutes will sustain brain damage.

---------------------------------------------------------------------------------------------------

E. The man has brain damage.

For Hempel, DN explanations were always to be preferred to IS explanations. There were two reasons for this.

First, the deductive relationship between premises and conclusion maximized the predictive value of the explanation. Hempel accepted IS arguments as explanatory just to the extent that they approximated DN explanations by conferring a high probability on the event to be explained.

Second, Hempel held that the concept of explanation should be understood fundamentally in terms of logical form. True premises are, of course, essential to something being a good DN explanation, but to qualify as a DN explanation (what he sometimes called a potential DN explanation) an argument need only exhibit the deductive-nomological structure. (This requirement placed Hempel squarely within the logical positivist tradition, which was committed to analyzing all of the epistemically significant concepts of science in logical terms.) There is, however, no corresponding concept of a potential IS explanation. Unlike DN explanations, the inductive character of IS explanations means that the relation between premises and conclusion can always be undermined by the addition of new information. (For example, the probability of brain damage, given that a man is deprived of oxygen for seven minutes, is lowered somewhat by the information that the man spent this time at the bottom of a very cold lake.) Consequently, it is always possible that a proposed IS explanation, even one with true premises, would fail to predict the fact in question, and thus have no explanatory significance for the case at hand.
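The nonmonotonicity at issue here can be put in simple probabilistic terms. With purely illustrative numbers (they are not Hempel's), an IS argument might confer

P(brain damage | oxygen deprivation) = 0.95,

while adding the premise about the cold lake yields

P(brain damage | oxygen deprivation & cold lake) = 0.5.

No analogous defeat is possible for a DN argument: a valid deduction remains valid no matter what further premises are added.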

3. Standard Criticisms of Hempel's Theory of Explanation

Hempel's dissatisfaction with statistical explanation was at odds with modern science, for which the explanatory use of statistics had become indispensable. Moreover, Hempel's requirement that IS explanations approximate the predictive power of DN explanations has the counterintuitive implication that for inherently low probability events no explanations are possible. For example, since smoking two packs of cigarettes a day for 40 years does not actually make it probable that a person will contract lung cancer, it follows from Hempel's theory that a statistical law about smoking will not be involved in an IS explanation of the occurrence of lung cancer. Hempel's view might be defended here by claiming that when our theories do not allow us to predict a phenomenon with a high degree of accuracy, it is because we have incomplete knowledge of the initial conditions. However, this seems to require us to base a theory of explanation on the now dubious metaphysical position that all events have determinate causes.

Another important criticism of Hempel's theory is that many DN arguments with true premises do not appear to be explanatory. Wesley Salmon raised the problem of relevance with the following example:

C1. Butch takes birth control pills.

C2. Butch is a man.

L. No man who takes birth control pills becomes pregnant.

----------------------------------------------------------------------------------

E. Butch has not become pregnant.

Unfortunately, this reasoning qualifies as explanatory on Hempel's theory despite the fact that the premises seem to be explanatorily irrelevant to the conclusion.

Sylvain Bromberger raised the problem of asymmetry by pointing out that, while on Hempel's model one can explain the period of a pendulum in terms of the length of the pendulum together with the law of simple periodic motion, one can just as easily explain the length of a pendulum in terms of its period in accord with the same law. Our intuitions tell us that the first is explanatory, but the second is not. The same point is made by the following example:

C. The barometer is falling rapidly.

L. Whenever the barometer falls rapidly, a storm is approaching.

-----------------------------------------------------------------

E. A storm is approaching.

While the falling barometer is a trustworthy indicator of an approaching storm, it is counterintuitive to say that the barometer explains the occurrence of the storm. Rather, it is the approaching storm that explains the falling barometer.

These two problems, relevance and asymmetry, expose the difficulty of developing a theory of explanation that makes no reference to causal relations. Reference to causal relations is not an option for Hempel, however, since causation heads the anti-realist's list of metaphysically suspect concepts. It would also undermine his view that explanation should be understood as an epistemic rather than a metaphysical relationship. Hempel's response to these problems was that they raise purely pragmatic issues. His model countenances many explanations that prove to be useless, but whether an explanation has any practical value is not, in Hempel's view, something that can be determined by philosophical analysis. This is a perfectly cogent reply, but it has not generally been regarded as an adequate one. Virtually all subsequent attempts to improve upon Hempel's theory accept the above criticisms as legitimate.

As noted above, Hempel's model requires that an explanation make use of at least one law-like generalization. This presents another sort of problem for the DN model. Hempel was careful to distinguish law-like generalizations from accidental generalizations. The latter are generalizations that may be true, but not in virtue of any law of nature. (For example, "All of my shirts are stained with coffee" may be true, but it is, I hope, just an accidental fact, not a law of nature.) Although the idea that explanation consists in subsuming events under natural laws has wide appeal in the philosophy of science, it is doubtful whether this requirement can be made consistent with Hempel's epistemic view of explanation. The reason is simply that no one has ever articulated an epistemically sound criterion for distinguishing between law-like generalizations and accidental generalizations. This is essentially just Hume's problem of induction, namely, that no finite number of observations can justify the claim that a regularity in nature is due to a natural necessity. In the absence of such a criterion, Hempel's model seems to violate the spirit of the epistemic view of explanation, as well as the idea that explanation can be understood in purely logical terms.

4. Contemporary Developments in the Theory of Explanation

Contemporary developments in the theory of explanation in many ways reflect the fragmented state of analytic philosophy since the decline of logical positivism. In this article we will look briefly at examples of how explanation has been conceived within the following five traditions: (1) Causal Realism, (2) Constructive Empiricism, (3) Ordinary Language Philosophy, (4) Cognitive Science, and (5) Naturalism and Scientific Realism.

a. Explanation and Causal Realism

With the decline of logical positivism and the gathering success of modern theoretical science, philosophers began to regard continued skepticism about the reality of unobservable entities and processes as pointless. Different varieties of realism were articulated and against this background several different causal theories of explanation were developed. The idea behind them is the ordinary intuition noted at the beginning of this essay: to explain is to attribute a cause. Michael Scriven argued this point with notable force:

Let us take a case where we can be sure beyond any reasonable doubt that we have a correct explanation. As you reach for the dictionary, your knee catches the edge of the table and thus turns over the ink bottle, the contents of which proceed to run over the table's edge and ruin the carpet. If you are subsequently asked to explain how the carpet was damaged you have a complete explanation. You did it by knocking over the ink. The certainty of this explanation is primeval...This capacity for identifying causes is learnt, is better developed in some people than in others, can be tested, and is the basis for what we call judgments. (1959: p. 456)

Wesley Salmon's causal theory of explanation is perhaps the most influential developed within the realist tradition. Salmon had earlier developed a fundamentally epistemic view according to which an explanation is a list of statistically relevant factors. However, he later rejected this, and any epistemic theory, as inadequate. His reason was that all epistemic theories are incapable of showing how explanations produce scientific understanding. This is because scientific understanding is not only a matter of having justified beliefs about the future. Salmon now insists that even a Laplacean Demon whose knowledge of the laws and initial conditions of the universe was so precise and complete as to issue in perfect predictive knowledge would lack scientific understanding. Specifically, he would lack the concepts of causal relevance and causal asymmetry, and he could not distinguish between true causal processes and pseudo-processes. (As an example of the latter, consider the beam of a searchlight as it describes an arc through the sky. The movement of the beam is a pseudo-process, since earlier stages of the beam do not cause later stages. By contrast, the electrical generation of the light itself, and the movement of the lamp housing, are true causal processes.)

Salmon defends his causal realism by rejecting the Humean conception of causation as linked chains of events, and by attempting to articulate an epistemologically sound theory of continuous causal processes and causal interactions to replace it. The theory itself is detailed and does not lend itself to compression. It reads not so much as an analysis of the term 'explanation' as a set of instructions for producing an explanation of a particular phenomenon or event. One begins by compiling a list of statistically relevant factors and analyzing the list by a variety of methods. The procedure terminates in the creation of causal models of these statistical relationships and empirical testing to determine which of these models is best supported by the evidence.

Insofar as Salmon's theory insists that an adequate explanation has not been achieved until the fundamental causal mechanisms of a phenomenon have been articulated, it is deeply reductionistic. It is not clear, for example, how Salmon's model of explanation could ever generate meaningful explanations of mental events, which supervene on, but do not seem to be reducible to, a unique set of causal relationships. Salmon's theory is also similar to Hempel's in at least one respect: both champion ideal forms of explanation, rather than anything that scientists or ordinary people are likely to achieve in the workaday world. This type of theorizing clearly has its place, but it has also been criticized by those who see explanation primarily as a form of communication between individuals. On this view, simplicity and ease of communication are not merely pragmatic, but essential to the creation of human understanding.

b. Explanation and Constructive Empiricism

In his book The Scientific Image (1980) Bas van Fraassen produced an influential defense of anti-realism. Terming his view "constructive empiricism," van Fraassen claimed that theoretical science was properly construed as a creative process of model construction rather than one of discovering truths about the unobservable world. While avoiding the fatal excesses of logical positivism, he argued strongly against the realistic interpretation of theoretical terms, claiming that contemporary scientific realism is predicated on a dire misunderstanding of the nature of explanation. (See "Naturalism and Scientific Realism" below.) In support of his constructive empiricism van Fraassen produced an epistemic theory of explanation that draws on the logic of why-questions and on a Bayesian interpretation of probability.

Like Hempel, van Fraassen seeks to explicate explanation as a purely logical concept. However, the logical relation is not that of premises to conclusion, but one of question to answer. Following Bromberger, van Fraassen characterizes explanation as an answer to a why-question. Why-questions, for him, are essentially contrastive. That is, they always, implicitly or explicitly, ask: why Pk, rather than one of the other members of a contrast class X = {P1, ..., Pk, ...}? Why-questions also implicitly stipulate a relevance relation R, which is the explanatory relation (for example, causation) that any answer must bear to the ordered pair <Pk, X>.

Van Fraassen follows Hempel in addressing explanatory asymmetry and explanatory relevance as pragmatic issues. However, van Fraassen's question-answering model makes this view a bit more intuitive. The relevance relation is defined by the interests of the person posing the question. For example, an individual who asks for an explanation of an airline accident in terms of the human decisions that led to it cannot be forced to accept an explanation solely in terms of the weather. Van Fraassen deals with the problem of explanatory asymmetry by showing that this, too, is a function of context. For example, most people would say that bad weather explains plane crashes, but plane crashes don't explain bad weather. However, there are conditions (for example, unstable atmospheric conditions, an airplane carrying highly explosive cargo) that could combine to supply the latter explanation with an appropriate context.

Van Fraassen's model also avoids Hempel's problematic requirement of high probability for IS explanation. For van Fraassen, an answer will be potentially explanatory if it "favors" Pk over all the other members of the contrast class. This means roughly that the answer must confer greater probability on Pk than on any other Pi. It does not require that Pk actually be probable, or even that the probability of Pk be raised as a result of the answer, since favoring can actually result from an answer that lowers the probability of all other Pi relative to Pk. For van Fraassen, the essential tool for calculating the explanatory value of a theory is Bayes' Rule, which allows one to calculate the probability of a particular event relative to a set of background assumptions and some new information. From a Bayesian point of view, the rationality of a belief is relative to a set of background assumptions which are not themselves the subject of evaluation. Van Fraassen's theory of explanation is therefore deeply subjectivist: what counts as a good explanation for one person may not count as a good explanation for another, since their background assumptions may differ.
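Van Fraassen's appeal to Bayes' Rule can be made concrete. In its standard form (the notation here is the usual Bayesian one, not van Fraassen's own),

P(Pk | A & B) = P(A | Pk & B) × P(Pk | B) / P(A | B),

where B is the set of background assumptions, A is the new information supplied by the answer, and Pk is the topic of the why-question. Favoring is then a comparative matter: after conditionalizing on A, Pk must come out more probable than each rival Pi in the contrast class, and this can happen even when P(Pk | A & B) is no higher than P(Pk | B).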

Van Fraassen's pragmatic account of explanation buttresses his anti-realist position by showing that, when properly analyzed, there is nothing about the concept of explanation that demands a realistic interpretation of causal processes or unobservables. Van Fraassen does not make the positivist mistake of claiming that talk of such things is metaphysical nonsense. He claims only that a full appreciation of science does not depend on a realistic interpretation. His pragmatism also offers an alternative account of Salmon's Laplacean Demon. Van Fraassen agrees with Salmon that an individual with perfect knowledge of the laws and initial conditions of the universe lacks something, but what he lacks is not objective knowledge of the difference between causal processes and pseudo-processes. Rather, he simply lacks the human interests that make causation a useful concept.

c. Explanation and Ordinary Language Philosophy

Although van Fraassen's theory of explanation is based on the view that explanation is a process of communication, he still chooses to explicate the concept of explanation as a logical relationship between question and answer, rather than as a communicative relationship between two individuals. Ordinary Language Philosophy tends to emphasize this latter quality, rejecting traditional epistemology and metaphysics and focusing on the requirements of effective communication. For this school, philosophical problems do not arise because ordinary language is defective, but because we are in some way ignoring the communicative function of language. Consequently, the point of ordinary language analysis is not to improve upon ordinary usage by clarifying the meanings of terms for use in some ideal vocabulary, but rather to bring the full ordinary meanings of the terms to light.

Within this tradition Peter Achinstein (1983) developed an illocutionary theory of explanation. Like Salmon, Achinstein characterizes explanation as the pursuit of understanding. He defines the act of explanation as the attempt by one person to produce understanding in another by answering a certain kind of question in a certain kind of way. Achinstein rejects Salmon's narrow association of understanding with causation, as well as van Fraassen's analysis in terms of why-questions. For Achinstein there are many different kinds of questions that we ordinarily regard as attempts to gain understanding (for example, who-, what-, when-, and where-questions) and it follows that the act of answering any of these is properly regarded as an act of explanation.

According to Achinstein's theory S (a person) explains q (an interrogative expressing some question Q) by uttering u only if:

S utters u with the intention that his utterance of u render q understandable by producing the knowledge of the proposition expressed by u that it is a correct answer to Q. (1983: p.13)

Achinstein's approach is an interesting departure from the types of theory discussed above in that it draws freely both on the concept of intention and on the irreducibly causal notion of "producing knowledge." This move clearly cannot be countenanced by someone who sees explanation as a fundamentally logical concept. Even the causal realist who believes that explanations make essential reference to causes does not construe explanation itself in causal terms. Indeed, Achinstein's approach is so different from theories that we have discussed so far that it might be best construed as addressing a very different question. Whereas traditional theories have attempted to explicate the logic of explanation, Achinstein's theory may be best understood as an attempt to describe the process of explanation itself.

Like van Fraassen's theory, Achinstein's theory is deeply pragmatic. He stipulates that all explanations are given relative to a set of instructions (cf. van Fraassen's relevance relations) and indicates that these instructions are ultimately determined by the individual asking the question. So, for example, a person who asks for an explanation of why the electrical power in the house has gone out implicitly instructs that the question be answered in a way that would be relevant to the goal of turning the electricity back on. An answer that explained the absence of an electrical current in scientific terms, say by reference to Maxwell's equations, would be inappropriate in this case.

Achinstein attempts to avoid van Fraassen's subjectivism by identifying understanding with knowledge that a certain kind of proposition is true. These he calls "content-giving propositions," which are to be contrasted with propositions that have no real cognitive significance. For example, Achinstein would want to rule out as non-explanatory answers to questions that are purely tautological, such as: Mr. Pheeper died because Mr. Pheeper ceased to live. Achinstein also counts as non-explanatory the scientifically correct answer to a question like: What is the speed of light in a vacuum? For him, 186,000 miles per second is not explanatory because, as it stands, it is just an incomprehensibly large number offering no basis of comparison with velocities that are cognitively significant. This does not mean that the speed of light in a vacuum cannot be explained. For example, a more cognitively significant answer to the above question might be that light can travel 7 1/2 times around the earth in one second. (Thanks to Professor Norman Swartz for this example.)

One of the main difficulties with Achinstein's theory is that the idea of a content-giving proposition remains too vague. His refusal to narrow the list of questions that qualify as requests for explanation makes it very difficult to identify any interesting property that an act of explanation must have in order to produce understanding. Moreover, Achinstein's theory suffers from epistemological problems of its own. His theory of explanation makes essential reference to the intention to produce a certain kind of knowledge-state, but it is unclear from what Achinstein says how a knowledge state can be the result of an illocutionary act simpliciter. Certainly, such acts can produce beliefs, but not all beliefs so produced will count as knowledge, and Achinstein's theory does not distinguish between the kinds of explanatory acts that are likely to result in such knowledge, and the kinds that will not.

d. Explanation and Cognitive Science

While explanation may be fruitfully regarded as an act of communication, still another departure from the standard relational analysis is to think of explaining as a purely cognitive activity, and an explanation as a certain kind of mental representation that results from or aids in this activity. Considered in this way, explaining (sometimes called 'abduction') is a universal phenomenon. It may be conscious, deliberative, and explicitly propositional in nature, but it may also be unconscious, instinctive, and involve no explicit propositional knowledge whatsoever. For example: a father, hearing a high-pitched wail coming from the next room, rushes to his daughter's aid. Whether he reacted instinctively, or on the basis of an explicit inference, we can say that the father's behavior was the result of his having explained the wailing sound as the cry of his daughter.

From this perspective, 'explanation' names neither a meta-logical nor a metaphysical relation. Rather, the term has been given a theoretical status and an explanatory function of its own; that is, we explain a person's behavior by reference to the fact that he is in possession of an explanation. Put differently, 'explanation' has been subsumed into the theoretical vocabulary of science (with explanation itself being one of the problematic unobservables), an understanding of which was the very purpose of the theory of explanation in the first place.

Cognitive science is a diverse discipline and there are many different ways of approaching the concept of explanation within it. One major rift within the discipline concerns the question whether "folk psychology" with its reference to mental entities like intentions, beliefs and desires is fundamentally sound. Cognitive scientists in the artificial intelligence (AI) tradition argue that it is sound, and that the task of cognitive science is to develop a theory that preserves the basic integrity of belief-desire explanation. On this view, explaining is a process of belief revision, and explanatory understanding is understood by reference to the set of beliefs that result from that process. Cognitive scientists in the neuroscience tradition, in contrast, argue that folk psychology is not explanatory at all: in its completed state all reference to beliefs and desires will be eliminated from the vocabulary of cognitive science in favor of a vocabulary that allows us to explain behavior by reference to models of neural activity. On this view explaining is a fundamentally neurological process, and explanatory understanding is understood by reference to activation patterns within a neural network.

One popular approach that incorporates aspects of both traditional AI and neuroscience makes use of the idea of a mental model (cf. Holland et al. [1986]). Mental models are internal representations that occur as a result of the activation of some part of a network of condition-action (or if-then) type rules. These rules are clustered in such a way that when all of a rule's conditions become active, its action results. For example, here is a small cluster of rules that a simple cognitive system might use to distinguish different types of small furry mammals in a backyard environment.

(i) If [large, scurries, meows] then [cat].

(ii) If [small, scurries, squeaks] then [rat].

(iii) If [small, hops, chirps] then [squirrel].

(iv) If [squirrel or rat] then [flees].

(v) If [cat] then [approaches].

A mental model of a squirrel, then, can be described as an activation of rule (iii).

A key concept within the mental models framework is that of a default hierarchy. A set of rules such as those above states a standard set of default conditions. When these are met, a set of expectations is generated. For example, the activation of rule (iii) generates expectations of type (iv). However, a viable representational system must be able to revise prior rule activations when expectations are contradicted by future experience. In the mental models framework, this is achieved by incorporating, below the default rules, a hierarchy of more specific rules whose actions will defeat the default expectations. For example, default rule (iii) might be defeated by another rule as follows:

Level 1: If [small, hops, chirps] then [squirrel].

Level 2: If [flies] then [bird].

In other words, a system that identifies a small, hopping, chirping animal as a squirrel generates a set of expectations about its future behavior. If these expectations are contradicted by, for example, the putative squirrel flying, then the system will descend to a lower level of the hierarchy, allowing it to reclassify the object as a bird.

Although this is just a cursory characterization of the mental models framework it is enough to show how explanation can be handled within it. In this context it is natural to think of explanation as a process that is triggered by a predictive failure. Essentially, when the expectations activated at Level 1 of the default hierarchy fail, the system searches lower levels of the hierarchy to find out why. If the above example were formulated in explicitly propositional terms, we would say that the failure of Level 1 expectations generated the question: Why did the animal, which I previously identified as a squirrel, fly? The answer supplied at level 2 is: Because the animal is not a squirrel, but a bird. Of course, Level 2 rules produce their own set of expectations, which must themselves be corroborated with future experience or defeated by future explanations. Clearly, the above example is a rudimentary form of explanation. Any viable system must incorporate learning algorithms which allow it to modify both the content and structure of the default hierarchy when its expectations are repeatedly undermined by experience. This will necessarily involve the ability to generalize over past experiences and activate entirely new rules at every level of the default hierarchy.
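Since the mental models framework is explicitly computational, a toy implementation may help fix ideas. The following Python sketch is our illustration only: the rule contents mirror rules (i)-(v) above, but the data structures and function names are invented, not taken from Holland et al. It encodes a two-level default hierarchy and shows how a predictive failure triggers a search of the lower level.

# Two-level default hierarchy of condition-action rules.
# A rule fires when all of its conditions are among the active features.

DEFAULT_RULES = [            # Level 1: default classifications
    ({"large", "scurries", "meows"}, "cat"),
    ({"small", "scurries", "squeaks"}, "rat"),
    ({"small", "hops", "chirps"}, "squirrel"),
]

EXCEPTION_RULES = [          # Level 2: more specific rules that defeat defaults
    ({"flies"}, "bird"),
]

def classify(features):
    """Return the Level 1 classification whose conditions are all active."""
    for conditions, label in DEFAULT_RULES:
        if conditions <= features:   # every condition is an active feature
            return label
    return None

def explain(features):
    """On a predictive failure, search Level 2 for a reclassification."""
    for conditions, label in EXCEPTION_RULES:
        if conditions <= features:
            return label
    return None

animal = {"small", "hops", "chirps"}
print(classify(animal))      # 'squirrel': the default expectation
animal.add("flies")          # the expectation is contradicted...
print(explain(animal))       # ...so the system reclassifies: 'bird'

On this sketch, the call to explain is exactly the descent to a lower level described above: the system answers the question "Why did the putative squirrel fly?" by finding the more specific rule that defeats the default.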

One can reasonably doubt whether philosophical questions about the nature of explanation are addressed by defining and ultimately engineering systems capable of explanatory cognition. To the extent that these questions are understood in purely normative terms, they obviously arise in regard to systems built by humans with at least as much force as they arise for humans themselves. In defense of the cognitive science approach, however, one might assert that the simple philosophical question "What is explanation?" is not well-formed. If we accept some form of epistemic relativity, the proper form of such a question is always "What is explanation in cognitive system S?" Hence, doubts about the significance of explanatory cognition in some system S are best expressed as doubts about whether system S-type explanation models human cognition accurately enough to have any real significance for human beings.

e. Explanation, Naturalism and Scientific Realism

Historically, naturalism is associated with the inclination to reject any kind of explanation of natural phenomena that makes essential reference to non-natural phenomena. Insofar as this view is understood simply as the rejection of supernatural phenomena (for example, the actions of gods, irreducibly spiritual substances, etc.) it is uncontroversial within the philosophy of science. However, when it is understood to entail the rejection of irreducibly non-natural properties (that is, the normative properties of 'rightness' and 'wrongness' that we appeal to in making evaluative judgments about human thought and behavior), it is deeply problematic. The problem is just that the aim of the philosophy of science has always been to establish an a priori basis for making precisely these evaluative judgments about scientific inquiry itself. If they cannot be made, then it follows that the goals of philosophical inquiry have been badly misconceived.

Most contemporary naturalists do not regard this as an insurmountable problem. Rather, they just reject the idea that philosophical inquiry can occur from a vantage point outside of science, and they deny that evaluative judgments we make about scientific reasoning and scientific concepts have any a priori status. Put differently, they think philosophical inquiry should be seen as a very abstract form of scientific inquiry, and they see the normative aspirations of philosophers as something that must be achieved by using the very tools and methods that philosophers have traditionally sought to justify.

The relevance of naturalism to the theory of explanation can be understood briefly as follows. Naturalism undermines the idea that knowledge is prior to understanding. If it is true that there will never be an inductive logic that can provide an a priori basis for calling an observed regularity a natural law, then there is, in fact, no independent way of establishing what is the case prior to understanding why it is the case. Because of this, some naturalists (for example, Sellars) have suggested a different way of thinking about the epistemic significance of explanation. The idea, basically, is that explanation is not something that occurs on the basis of pre-confirmed truths. Rather, successful explanation is actually part of the process of confirmation itself:

Our aim [is] to manipulate the three basic components of a world picture: (a) observed objects and events, (b) unobserved objects and events, (c) nomological connections, so as to achieve a maximum of "explanatory coherence." In this reshuffle no item is sacred. (Sellars, 1962: p356)

Many naturalists have since embraced this idea of "inference to the best explanation" (IBE) as a fundamental principle of scientific reasoning. Moreover, they have put this principle to work as an argument for realism. Briefly, the idea is that if we treat the claim that unobservable entities exist as a scientific hypothesis, then it can be seen as providing an explanation of the success of theories that employ them: namely, the theories are successful because they are (approximately) true. Anti-realism, by contrast, can provide no such explanation; on this view theories that make reference to unobservables are not literally true and so the success of scientific theories remains mysterious. It should be noted here that scientific realism has a very different flavor from the more foundational form of realism discussed above. Traditional realists do not think of realism as a scientific hypothesis, but as an independent metaphysical thesis.

Although IBE has won many converts in recent years, it is deeply problematic precisely because of the way it employs the concept of explanation. While most people find IBE to be intuitively plausible, the fact remains that no theory of explanation discussed above can make sense of the idea that we accept a claim on the basis of its explanatory power. Rather, every such view stipulates as a condition of having explanatory power at all that a statement must be true or well-confirmed. Moreover, van Fraassen has argued that even if we can make sense of IBE, it remains a highly dubious principle of inductive inference. The reason is that "inference to the best explanation" really can only mean "inference to the best explanation given to date." We are unable to compare proposed explanations to others that no one has yet thought of, and for this reason the property of being the best explanation cannot be an objective measure of the likelihood that the explanation is true.

One way of responding to these criticisms is to observe that Sellars' concept of explanatory coherence is based on a view about the nature of understanding that simply eludes the standard models of explanation. According to this view an explanation increases our understanding, not simply by being the correct answer to a particular question, but by increasing the coherence of our entire belief system. This view has been developed in the context of traditional epistemology (Harman, Lehrer) as well as the philosophy of science (Thagard, Kitcher). In the latter context, the terms "explanatory unification" and "consilience" have been introduced to promote the idea that good explanations necessarily tend to produce a more unified body of knowledge. Although traditionalists will insist that there is no a priori basis for thinking that a unified or coherent set of beliefs is more likely to be true (counterexamples are, in fact, easy to produce), this misses the point that most naturalists reject the possibility of establishing IBE, or any other inductive principle, on purely a priori grounds.

For critiques of naturalism, see the Social Science article.

5. The Current State of the Theory of Explanation

This brief summary may leave the reader with the impression that philosophers are hopelessly divided on the nature of explanation, but this is not really the case. Most philosophers of science would agree that our understanding of explanation is far better now than it was in 1948 when Hempel and Oppenheim published "Studies in the Logic of Explanation." While it serves expository purposes to represent the DN model and each of its successors as fatally flawed, this should not obscure the fact that these theories have brought real advances in understanding which succeeding models are required to preserve. At this point, fundamental disagreements on the nature of explanation fall into one of two categories. First, there are metaphysical disagreements. Realists and anti-realists continue to differ over what sort of ontological commitments one makes in accepting an explanation. Second, there are meta-philosophical disagreements. Naturalists and non-naturalists remain at odds concerning the relevance of scientific inquiry (namely, inquiry into the way scientists, ordinary people, and computers actually think) to a philosophical theory of explanation. These disputes are unlikely to be resolved anytime soon. Fortunately, however, the significance of further research into the logical and cognitive structure of explanation does not depend on their outcome.

6. References and Further Reading

  • Achinstein, Peter (1983) The Nature of Explanation. New York: Oxford University Press.
  • Belnap, Nuel D. and Steel, Thomas B. (1976) The Logic of Questions and Answers. New Haven: Yale University Press.
  • Bromberger, Sylvain (1966) "Why-Questions." In Baruch A. Brody, ed., Readings in the Philosophy of Science, 66-84. Englewood Cliffs: Prentice Hall.
  • Brody, Baruch A. (1970) Readings in the Philosophy of Science. Englewood Cliffs, N.J.: Prentice Hall.
  • Duhem, Pierre (1962) The Aim and Structure of Physical Theory. New York: Atheneum.
  • Friedman, Michael (1974) "Explanation and Scientific Understanding." Journal of Philosophy 71: 5-19.
  • Harman, Gilbert (1965) "The Inference to the Best Explanation." Philosophical Review 74: 88-95.
  • Hempel, Carl G. and Oppenheim, Paul (1948) "Studies in the Logic of Explanation." In Brody, pp. 8-38.
  • Hempel, Carl G. (1965) Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. New York: Free Press.
  • Holland, John; Holyoak, Keith; Nisbett, Richard; Thagard, Paul (1986) Induction: Processes of Inference, Learning, and Discovery. Cambridge: MIT Press.
  • Hume, David (1977) An Enquiry Concerning Human Understanding. Indianapolis: Hackett.
  • Kitcher, Philip (1981) "Explanatory Unification." Philosophy of Science 48: 507-531.
  • Lehrer, Keith (1990) Theory of Knowledge. Boulder: Westview Press.
  • Pitt, Joseph C. (1988) Theories of Explanation. Oxford: Oxford University Press.
  • Quine, W. V. (1969) "Epistemology Naturalized." In Ontological Relativity and Other Essays. New York: Columbia University Press, pp. 69-90.
  • Salmon, Wesley (1984) Scientific Explanation and the Causal Structure of the World. Princeton: Princeton University Press.
  • Salmon, Wesley (1990) Four Decades of Scientific Explanation. Minneapolis: University of Minnesota Press.
  • Scriven, Michael (1959) "Truisms as the Grounds for Historical Explanations." In P. Gardiner (ed.), Theories of History: Readings from Classical and Contemporary Sources. New York: Free Press, pp. 443-475.
  • Sellars, Wilfrid (1962) Science, Perception, and Reality. New York: Humanities Press.
  • Stich, Stephen (1983) From Folk Psychology to Cognitive Science. Cambridge: MIT Press.
  • Thagard, Paul (1988) Computational Philosophy of Science. Cambridge: MIT Press.
  • van Fraassen, Bas C. (1980) The Scientific Image. Oxford: Clarendon Press.
  • van Fraassen, Bas C. (1989) Laws and Symmetry. Oxford: Clarendon Press.

Author Information

G. Randolph Mayes
Email: mayesgr@csus.edu
California State University Sacramento
U. S. A.

Epsilon Calculi

Epsilon Calculi are extended forms of the predicate calculus that incorporate epsilon terms. Epsilon terms are individual terms of the form 'εxFx', being defined for all predicates in the language. The epsilon term 'εxFx' denotes a chosen F, if there are any F's, and has an arbitrary reference otherwise. Epsilon calculi were originally developed to study certain forms of arithmetic and set theory, and also to prove some important meta-theorems about the predicate calculus. Later formal developments have included a variety of intensional epsilon calculi, of use in the study of necessity and of more general intensional notions, like belief. An epsilon term such as 'εxFx' was originally read as 'the first F', and in arithmetical contexts as 'the least F'. More generally it can be read as the demonstrative description 'that F', when arising either deictically, that is, in a pragmatic context where some F is being pointed at, or in linguistic cross-reference situations, as with, for example, 'There is a red-haired man in the room. That red-haired man is Caucasian'. The application of epsilon terms to natural language shares some features with the use of iota terms within the theory of descriptions given by Bertrand Russell, but differs in formalising aspects of a slightly different theory of reference, first given by Keith Donnellan. More recently, epsilon terms have been used by a number of writers to formalise cross-sentential anaphora, which would arise if 'that red-haired man' in the linguistic case above were replaced with a pronoun such as 'he'. There is then also the similar application in intensional cases, like 'There is a red-haired man in the room. Celia believed he was a woman.'

Table of Contents

  1. Introduction
  2. Descriptions and Identity
  3. Rigid Epsilon Terms
  4. The Epsilon Calculus' Problematic
  5. The Formal Semantics of Epsilon Terms
  6. Some Metatheory
  7. References and Further Reading

1. Introduction

Epsilon terms were introduced by the German mathematician David Hilbert, in Hilbert 1923, 1925, to provide explicit definitions of the existential and universal quantifiers, and to resolve some problems in infinitistic mathematics. But it is not just the related formal results and structures that are of interest. In Hilbert's major book Grundlagen der Mathematik, which he wrote with his collaborator Paul Bernays, epsilon terms were presented as formalising certain natural language constructions, like definite descriptions. And they in fact have a considerably larger range of such applications, for instance in the symbolisation of certain cross-sentential anaphora. Hilbert and Bernays also used their epsilon calculus to prove two important meta-theorems about the predicate calculus. One theorem subsequently led, for instance, to the development of semantic tableaux: it is called the First Epsilon Theorem, and its content and proof will be given later, in section 6 below. A second theorem that Hilbert and Bernays proved, which we shall also look at then, establishes that epsilon calculi are conservative extensions of the predicate calculus, that is, that no more theorems expressible just in the quantificational language of the predicate calculus can be proved in epsilon calculi than can be proved in the predicate calculus itself. But while epsilon calculi do have these further important formal functions, we shall not be concerned only to explore them, for we shall also first discuss the natural language structures upon which epsilon calculi have a considerable bearing.

The growing awareness of the larger meaning and significance of epsilon calculi has only come in stages. Hilbert and Bernays introduced epsilon terms for several meta-mathematical purposes, as above, but the extended presentation of an epsilon calculus, as a formal logic of interest in its own right, in fact first appeared in Bourbaki's Éléments de Mathématique (although see also Ackermann 1937-8). Bourbaki's epsilon calculus with identity (Bourbaki, 1954, Book 1) is axiomatic, with Modus Ponens as the only primitive inference or derivation rule. Thus, in effect, we get:

(X ∨ X) → X,
X → (X ∨ Y),
(X ∨ Y) → (Y ∨ X),
(X ∨ Y) → ((Z ∨ X) → (Z ∨ Y)),
Fy → FεxFx,
x = y → (Fx ↔ Fy),
(x)(Fx ↔ Gx) → εxFx = εxGx.

This adds to a basis for the propositional calculus an epsilon axiom schema, then Leibniz' Law, and a second epsilon axiom schema, which is a further law of identity. Bourbaki, though, used the Greek letter tau rather than epsilon to form what are now called 'epsilon terms'; nevertheless, he defined the quantifiers in terms of his tau symbol in the manner of Hilbert and Bernays, namely:

(∃x)Fx ↔ FεxFx,
(x)Fx ↔ Fεx¬Fx;

and note that, in his system, the other usual law of identity, 'x = x', is derivable.
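To see why the second of these definitions is apt, here is a sketch of the standard reasoning (our gloss, not Bourbaki's own derivation). If (x)Fx, then in particular Fεx¬Fx. Conversely, if ¬(x)Fx, then (∃x)¬Fx, so the term 'εx¬Fx' denotes one of the non-F's, whence ¬Fεx¬Fx. Everything is F, in other words, just in case even the item chosen as the best candidate for a counterexample is F. The definition of the existential quantifier is justified in the same way: FεxFx trivially entails (∃x)Fx, while if (∃x)Fx then the chosen F is F.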

The principal purpose Bourbaki found for his system of logic was in his theory of sets, although through that, in the modern manner, it thereby came to be the foundation for the rest of mathematics. Bourbaki's theory of sets discriminates amongst predicates those which determine sets: thus some, but only some, predicates determine sets, i.e. are 'collectivisantes'. All the main axioms of classical Set Theory are incorporated in his theory, but he does not have an Axiom of Choice as a separate axiom, since its functions are taken over by his tau symbol. The same point holds in Bernays' epsilon version of his set theory (Bernays 1958, Ch VIII).

Epsilon calculi, during this period, were developed without any semantics, but a semantic interpretation was produced by Gunter Asser in 1957, and subsequently published in a book by A.C. Leisenring in 1969. Even then, readings of epsilon terms in ordinary language were still uncommon. A natural language reading of epsilon terms, however, was present in Hilbert and Bernays' work. In fact the last chapter of book 1 of the Grundlagen is a presentation of a theory of definite descriptions, and epsilon terms relate closely to this. In the better-known theory of definite descriptions by Bertrand Russell (Russell 1905) there are three clauses: with

The king of France is bald

we get, on Russell's theory, first

there is a king of France,

second

there is only one king of France,

and third

anyone who is king of France is bald.

Russell uses the Greek letter iota to formalise the definite description, writing the whole

BιxKx,

but he recognises the iota term is not a proper individual symbol. He calls it an 'incomplete symbol', since, because of the three parts, the whole proposition is taken to have the quantificational analysis,

(∃x)(Kx & (y)(Ky → y = x) & (y)(Ky → By)),

which is equivalent to

(∃x)(Kx & (y)(Ky → y = x) & Bx).

And that means that it does not have the form 'Bx'. Russell believed that, in addition to his iota terms, there was another class of individual terms, which he called 'logically proper names'. These would simply fit into the 'x' place in 'Bx'. He believed that 'this' and 'that' were in this class, but gave no symbolic characterisation of them.

Hilbert and Bernays, by contrast, produced what is called a 'pre-suppositional theory' of definite descriptions. The first two clauses of Russell's definition were not taken to be part of the meaning of 'The King of France is bald': they were merely conditions under which it was taken to be permissible to introduce a complete individual term for 'the King of France', which then satisfies

Kx & (y)(Ky → y = x).

Hilbert and Bernays continued to use the Greek letter iota in their individual term, although it has a quite different grammar from Russell's iota term, since, when Hilbert and Bernays' term can be introduced, it is provably equivalent to the corresponding epsilon term (Kneebone 1963, p102). In fact it was later suggested by many that epsilon terms are not only complete symbols, but can be seen as playing the same role as the 'logically proper names' Russell discussed.

It is at the start of book 2 of the Grundlagen that we find the definition of epsilon terms. There, Hilbert and Bernays first construct a theory of indefinite descriptions in a similar manner to their theory of definite descriptions. They allow, now, an eta term to be introduced as long as just the first of Russell's conditions is met. That is to say, given

(∃x)Fx,

one can introduce the term 'ηxFx', and say

FηxFx.

But the condition for the introduction of the eta term can be established logically, for certain predicates, since

(∃x)((∃y)Fy → Fx),

is a predicate calculus theorem (Copi 1973, p110). It is the eta term this theorem allows us to introduce which is otherwise called an epsilon term, and its logical basis enables entirely formal theories to be constructed, since such individual terms are invariably defined. Thus we may invariably introduce 'ηx((∃y)Fy → Fx)', and this is commonly written 'εxFx', about which we can therefore say

(∃y)Fy → FεxFx.

Since it is that F which exists if anything is F, Hilbert read the epsilon term in this case as 'the first F'. For instance, in arithmetic, 'the first' may be taken to be the least number operator. However, while if there are F's then the first F is clearly some chosen one of them, if there are no F's then 'the first F' must be a misnomer. And that form of speech only came to be fully understood in the theories of reference which appeared much later, when reference and denotation came to be more clearly separated from description and attribution. Donnellan (Donnellan 1966) used the example 'the man with martini in his glass', and pointed out that, in certain uses, this can refer to someone without martini in his glass. In the terminology Donnellan made popular, 'the first F', in the second case above, works similarly: it cannot be attributive, and so, while it refers to something, it must refer arbitrarily, from a semantic point of view.

With reference in this way separated from attribution it becomes possible to symbolise the anaphoric cross-reference between, for instance, 'There is one and only one king of France' and 'He is bald'. For, independently of whether the former is true, the 'he' in the latter is a pronoun for the epsilon term in the former -- by a simple extension of the epsilon definition of the existential quantifier. Thus the pair of remarks may be symbolised

(∃x)(Kx & (y)(Ky → y = x)) & Bεx(Kx & (y)(Ky → y = x)).

Furthermore such cross-reference may occur in connection with intensional constructions of a kind Russell also considered, such as

George IV wondered whether the author of Waverley was Scott.

Thus we can say 'There is an author of Waverley, and George IV wondered whether he was Scott'. But the epsilon analysis of these cases puts intensional epsilon calculi at odds with Russellian views of such constructions, as we shall see later. The Russellian approach, by not having complete symbols for individuals, tends to confuse cases in which assertions are made about individuals and cases in which assertions are made about identifying properties. As we shall see, epsilon terms enable us to make the discrimination between, for instance,

s = εx(y)(Ay ↔ y = x),

(i.e. 'Scott is the author of Waverley'), and

(y)(Ay ↔ y = s),

(that is, 'there is one and only one author of Waverley and he is Scott'), and so it enables us to locate more exactly the object of George IV's thought.

2. Descriptions and Identity

When one starts to ask about the natural language meaning of epsilon terms, it is interesting that Leisenring just mentions the 'formal superiority' of the epsilon calculus (Leisenring 1969, p63, see also Routley 1969, Hazen 1987). Leisenring took the epsilon calculus to be a better logic than the predicate calculus, but merely because of the Second Epsilon Theorem. Its main virtue, to Leisenring, was that it could prove all that seemingly needed to be proved, but in a more elegant way. Epsilon terms were just neater at calculating which were the valid theorems of the predicate calculus.

Remembering Hilbert and Bernays' discussion of definite and indefinite descriptions, clearly there is more to the epsilon calculus than this. And there are, in fact, two specific theorems provable within the epsilon calculus, though not the predicate calculus, which will start to indicate the epsilon calculus' more general range of application. They concern individuals, since the epsilon calculus is distinctive in providing an appropriate and systematic means of reference to them.

The need to have complete symbols for individuals became evident some years after Russell's promotion of incomplete symbols for them. The first major book to allow for this was Rosser's Logic for Mathematicians, in 1953, although there were precursors. For the classical difficulty with providing complete terms for individuals concerns what to do with 'non-denoting' terms, and Quine, for instance, following Frege, often gave them an arbitrary, though specific, referent (Marciszewski 1981, p113). This idea is also present in Kalish and Montague (Kalish and Montague 1964, pp242-243), who gave the two rules:

(∃x)(y)(Fy ↔ y = x) ├ FιxFx,
¬(∃x)(y)(Fy ↔ y = x) ├ ιxFx = ιx¬(x = x),

where 'ιxFx' is what otherwise might be written 'εx(y)(Fy ↔ y = x)'. Kalish and Montague believed, however, that the second rule 'has no intuitive counterpart, simply because ordinary language shuns improper definite descriptions' (Kalish and Montague 1964, p244). And, at that time, what Donnellan was to publish in Donnellan 1966, about improper definite descriptions, was certainly not well known. In fact ordinary speech does not shun improper definite descriptions, although their referents are not as fixed as the above second rule requires. Indeed the very fact that the descriptions are improper means that their referents are not determined semantically: instead they are just a practical, pragmatic choice.

Stalnaker and Thomason recognised the need to be more liberal when they defined their referential terms, which also had to refer, in the contexts they were concerned with, in more than one possible world (Thomason and Stalnaker 1968, p363):

In contrast with the Russellian analysis, definite descriptions are treated as genuine singular terms; but in general they will not be substance terms [rigid designators]. An expression like ιxPx is assigned a referent which may vary from world to world. If in a given world there is a unique existing individual which has the property corresponding to P, this individual is the referent of ιxPx; otherwise, ιxPx refers to an arbitrarily chosen individual which does not exist in that world.

Stalnaker and Thomason appreciated that 'A substance term is much like what Russell called a logically proper name', but they said that an individual constant might or might not be a substance term, depending on whether it was more like 'Socrates' or 'Miss America' (Thomason and Stalnaker 1968, p362). A more complete investigation of identity and descriptions, in modal and general intensional contexts, was provided in Routley, Meyer and Goddard 1974, and Routley 1977; see also Hughes and Cresswell 1968, Ch 11. And with these writers we get the explicit rendering of definite descriptions in epsilon terms, as in Goddard and Routley 1973, p558, and Routley 1980, p277; c.f. Hughes and Cresswell 1968, p203.

Certain specific theorems in the epsilon calculus, as was said before, support these kinds of identification. One theorem demonstrates directly the relation between Russell's attributive, and some of Donnellan's referential ideas. For

(∃x)(Fx & (y)(Fy → y = x) & Gx)

is logically equivalent to

(∃x)(Fx & (y)(Fy → y = x)) & Ga,

where a = εx(Fx & (y)(Fy → y = x)). This arises because the latter is equivalent to

Fa & (y)(Fy → y = a) & Ga,

which entails the former. But the former is

Fb & (y)(Fy → y = b) & Gb,

with b = εx(Fx & (y)(Fy → y = x) & Gx), and so entails

(∃x)(Fx & (y)(Fy → y = x)),

and

Fa & (y)(Fy → y = a).

But that means that, from the uniqueness clause,

a = b,

and so

Ga,

meaning the former entails the latter, and therefore the former is equivalent to the latter.

The former, of course, gives Russell's Theory of Descriptions, in the case of 'The F is G'; it explicitly asserts the first two clauses, to do with the existence and uniqueness of an F. A presuppositional theory, such as we saw in Hilbert and Bernays, would not explicitly assert these two clauses: on such an account they are a precondition before the term 'the F' can be introduced. But neither of these theories accommodates improper definite descriptions. Since Donnellan it is more common to allow that we can always use 'the F': if the description is improper then the referent of this term is simply found in the term's practical use.

One detail of Donnellan's historical account, however, must be treated with some care at this point. Donnellan was himself concerned with definite descriptions which were improper in the sense that they did not uniquely describe what the speaker took to be their referent. So the description might still be 'proper' in the above sense -- if there still was something to which it uniquely applied, on account of its semantic content. Thus Donnellan allowed 'the man with martini in his glass' to identify someone without martini in his glass irrespective of whether there was some sole man with martini in his glass. But if one talks about 'the man with martini in his glass' one can be correctly taken to be talking about who this describes, if it does in fact correctly describe someone -- as Devitt and Bertolet pointed out in criticism of Donnellan (Devitt 1974, Bertolet 1980). It is this aspect of our language which the epsilon account matches, for an epsilon account allows definite descriptions to refer without attribution of their semantic character, but only if nothing uniquely has that semantic character. Thus it is not the whole of the first statement above, but only the third part of the second statement, which makes the remark 'The F is G'.

The difficulty with Russell's account becomes plainer if we read the two equivalent statements using relative and personal pronouns. They then become

There is one and only one F, which is G,
There is one and only one F; it is G.

But using just the logic derived from Frege, Russell could formalise the 'which', but could not separate out the last clause, 'it is G'. In that clause 'it' is an anaphor for 'the (one and only) F', and it still has this linguistic meaning if there is no such thing, since that is just a matter of grammar. But the uniqueness clause is needed for the two statements to be equivalent -- without uniqueness there is no equivalence, as we shall see -- so 'which' is not itself equivalent to 'it'. Russell, however, because he could not separate out the 'it', had to take the whole of the first expression as the analysis of 'The F is G' -- he could not formulate the needed 'logically proper name'.

But how can something be the one and only F 'if there is no such thing'? That is where another important theorem provable in the epsilon calculus is illuminating, namely:

(Fa & (y)(Fy → y = a)) → a = εx(Fx & (y)(Fy → y = x)).

The important thing is that there is a difference between the left hand side and the right hand side, i.e. between something being alone F, and that thing being the one and only F. For the left-right implication cannot be reversed. We get from the left to the right when we see that the left as a whole entails

(∃x)(Fx & (y)(Fy → y = x)),

and so also its epsilon equivalent

Fεx(Fx & (y)(Fy → y = x)) & (z)(Fz → z = εx(Fx & (y)(Fy → y = x))).

Given Fa, then from the second clause here we get the right hand side of our original implication. But if we substitute 'εx(Fx & (y)(Fy → y = x))' for 'a' in that implication then on the right we have something which is necessarily true. But the left hand side is then the same as

(∃x)(Fx & (y)(Fy → y = x)),

and that is in general contingent. Hence the implication cannot generally be reversed. Having the property of being alone F is here contingent, but possessing the identity of the one and only F is necessary.
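
Spelt out, the substitution turns the original implication into

(Fεx(Fx & (y)(Fy → y = x)) & (y)(Fy → y = εx(Fx & (y)(Fy → y = x)))) → εx(Fx & (y)(Fy → y = x)) = εx(Fx & (y)(Fy → y = x)),

whose consequent is an instance of the necessary 'a = a', while its antecedent, by the epsilon definition of the existential quantifier, is equivalent to the contingent '(∃x)(Fx & (y)(Fy → y = x))'. A reversed implication would therefore have to take us from a necessary truth to a contingent one.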

The distinction is not made in Russell's logic, since possession of the relevant property is the only thing which can be formally expressed there. In Russell's theory of descriptions, a's possession of the property of being alone a king of France is expressed as a quasi identity

a = ιxKx,

and that has the consequence that such identities are contingent. Indeed, in counterpart theories of objects in other possible worlds the idea is pervasive that an entity may be defined in terms of its contingent properties in a given world. Hughes and Cresswell, however, differentiated between contingent identities and necessary identities in the following way (Hughes and Cresswell 1968, p191):

Now it is contingent that the man who is in fact the man who lives next door is the man who lives next door, for he might have lived somewhere else; that is, living next door is a property which belongs contingently, not necessarily, to the man to whom it does belong. And similarly, it is contingent that the man who is in fact the mayor is the mayor; for someone else might have been elected instead. But if we understand [The man who lives next door is the mayor] to mean that the object which (as a matter of contingent fact) possesses the property of being the man who lives next door is identical with the object which (as a matter of contingent fact) possesses the property of being the mayor, then we are understanding it to assert that a certain object (variously described) is identical with itself, and this we need have no qualms about regarding as a necessary truth. This would give us a way of construing identity statements which makes [(x = y) → L(x = y)] perfectly acceptable: for whenever x = y is true we can take it as expressing the necessary truth that a certain object is identical with itself.

There are more consequences of this matter, however, than Hughes and Cresswell drew out. For now that we have proper referring terms for individuals to go into such expressions as 'x = y', we first see better where the contingency of the properties of such individuals comes from -- simply the linguistic facility of using improper definite descriptions. But we also see, because identities between such terms are necessary, that proper referring terms must be rigid, i.e. have the same reference in all possible worlds.

This is not how Stalnaker and Thomason saw the matter. Stalnaker and Thomason, it will be remembered, said that there were two kinds of individual constants: ones like 'Socrates' which can take the place of individual variables, and others like 'Miss America' which cannot. The latter, as a result, they took to be non-rigid. But it is strictly 'Miss America in year t' which is meant in the second case, and that is not a constant expression, even though such functions can take the place of individual variables. It was Routley, Meyer and Goddard who most seriously considered the resultant possibility that all properly individual terms are rigid. At least, they worked out many of the implications of this position, even though Routley was not entirely content with it.

Routley described several rigid intensional semantics (Routley 1977, pp185-186). One of these, for instance, just took the first epsilon axiom to hold in any interpretation, and made the value of an epsilon term, in every world, the term itself. On such a basis Routley, Meyer and Goddard derived what may be called 'Routley's Formula', i.e.

L(∃x)Fx → (∃x)LFx.

In fact, on their understanding, this formula holds for any operator and any predicate, but they had in mind principally the case of necessity illustrated here, with 'Fx' taken as 'x numbers the planets', making 'εxFx' 'the number of the planets'. The formula is derived quite simply, in the following way: from

L(∃x)Fx,

we can get

LFεxFx,

by the epsilon definition of the existential quantifier, and so

(∃x)LFx,

by existential generalisation over the rigid term (Routley, Meyer and Goddard 1974, p308, see also Hughes and Cresswell 1968, pp197, 204). Routley, however, was still inclined to think that a rigid semantics was philosophically objectionable (Routley 1977, p186):

Rigid semantics tend to clutter up the semantics for enriched systems with ad hoc modelling conditions. More important, rigid semantics, whether substitutional or objectual, are philosophically objectionable. For one thing, they make Vulcan and Hephaestus everywhere indistinguishable though there are intensional claims that hold of one but not of the other. The standard escape from this sort of problem, that of taking proper names like 'Vulcan' as disguised descriptions we have already found wanting... Flexible semantics, which satisfactorily avoid these objections, impose a more objectual interpretation, since, even if [the domain] is construed as the domain of terms, [the value of a term in a world] has to be permitted, in some cases at least, to vary from world to world.

As a result, while Routley, Meyer and Goddard were still prepared to defend the formula, and say, for instance, that there was a number which necessarily numbers the planets, namely the number of the planets (np), they thought that this number was in fact only the same as 9, so that one still could not argue correctly that as L(np numbers the planets), so L(9 numbers the planets). 'For extensional identity does not warrant intersubstitutivity in intensional frames' (Routley, Meyer and Goddard 1974, p309). They held, in other words, that the number of the planets was only contingently 9.

This means that they denied '(x = y) → L(x = y)', but, as we shall see in more detail later, there are ways to hold onto this principle, i.e. maintain the invariable necessity of identity.

3. Rigid Epsilon Terms

There is some further work which has helped us to understand how reference in modal and general intensional contexts must be rigid. But it involves some different ideas in semantics, and starts, even, outside our main area of interest, namely predicate logic, in the semantics of propositional logic.

When one thinks of 'semantics' one perhaps thinks of the valuation of formulas. From the 1920s a meta-study of this kind was certainly added to the previous logical interest in proof theory. Traditional proof theory is commonly associated with axiomatic procedures, but, from a modern perspective, its distinguishing feature is that it deals with 'object languages'. Tarski's theory of truth relies crucially on the distinction between object languages and meta-languages, and so semantics generally seems to be necessarily a meta-discipline. In fact Tarski believed that such an elevation of our interest was forced upon us by the threat of semantic paradoxes like The Liar. If there was, by contrast, 'semantic closure', i.e. if truth and other semantic notions were definable at the object level, then there would be contradictions galore (c.f. Priest 1984). In this way truth may seem to be necessarily a predicate of (object-level) sentences.

But there is another way of looking at the matter which is explicitly non-Tarskian, and which others have followed (see Prior 1971, Ch 7, Sayward 1987). This involves seeing 'it is true that' as not a predicate, but an object-level operator, with the truth tabulations in Truth Tables, for instance, being just another form of proof procedure. Operators indeed include 'it is provable that', and this is distinct from Gödel's provability predicate, as Gödel himself pointed out (Gödel 1969). Operators are intensional expressions, as in the often discussed 'it is necessary that' and 'it is believed that', and trying to see such forms of indirect discourse as metalinguistic predicates was very common in the middle of the last century. It was pervasive, for instance, in Quine's many discussions of modality and intensionality. Wouldn't someone be believing that the Morning Star is in the sky, but the Evening Star is not, if, respectively, they assented to the sentence 'the Morning Star is in the sky', and dissented from 'the Evening Star is in the sky'? Anyone saying 'yes' is still following the Quinean tradition, but after Montague's and Thomason's work on operators (e.g. Montague 1963, Thomason 1977, 1980) many logicians are more persuaded that indirect discourse is not quotational. It is open to doubt, that is to say, whether we should see the mind in terms of the direct words which the subject would use.

The alternative involves seeing the words 'the Morning Star is in the sky' in such an indirect speech locution as 'Quine believes that the Morning Star is in the sky' as words merely used by the reporter, which need not directly reflect what the subject actually says. That is indeed central to reported speech -- putting something into the reporter's own words rather than just parroting them from another source. Thus a reporter may say

Celia believed that the man in the room was a woman,

but clearly that does not mean that Celia would use 'the man in the room' for who she was thinking about. So referential terms in the subordinate proposition are only certainly in the mouth of the reporter, and as a result only certainly refer to what the reporter means by them. It is a short step from this thought to seeing

There was a man in the room, but Celia believed that he was a woman,

as involving a transparent intensional locution, with the same object, as one might say, 'inside' the belief as 'outside' in the room. So it is here that rigid constant epsilon terms are needed, to symbolise the cross-sentential anaphor 'he', as in:

(∃x)(Mx & Rx) & BcWεx(Mx & Rx).

To understand the matter fully, however, we must make the shift from meta- to object language we saw at the propositional level above with truth. Routley, Meyer and Goddard realised that a rigid semantics required treating such expressions as 'BcWx' as simple predicates, and we must now see what this implies. They derived, as we saw before, 'Routley's Formula'

L(∃x)Fx → (∃x)LFx,

but we can now start to spell out how this is to be understood, if we hold to the necessity of identities, i.e. if we use '=' so that

x = y → L(x = y).

Again a clear illustration of the validity of Routley's Formula is provided by the number of the planets, but now we may respect the fact that some things may lack a number, and also the fact that referential and attributive senses of terms may be distinguished. Thus if we write '(nx)Px' for 'there are n P's', then εn(ny)Py will be the number of P's, and it is what numbers them (i.e. ([εn(ny)Py]x)Px) if they have a number (i.e. if (∃n)(nx)Px) -- by the epsilon definition of the existential quantifier. Then, with 'Fx' as the proper (necessary) identity 'x = εn(ny)Py', Routley's Formula holds because the number in question exists eternally, making both sides of the formula true. But if 'Fn' is simply the attributive '(ny)Py' then this is not necessary, since it is contingent even, in the first place, that there is a number of P's, instead of just some P, making both sides of the formula false.
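
In symbols, the two cases are: with the necessary identity, 'L(∃x)(x = εn(ny)Py) → (∃x)L(x = εn(ny)Py)', where antecedent and consequent are both true; and with the attributive reading, 'L(∃n)(ny)Py → (∃n)L(ny)Py', where antecedent and consequent are both false.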

Hughes and Cresswell argue against the principle, saying (Hughes and Cresswell 1968, p144):

...let [Fx] be 'x is the number of the planets'. Then the antecedent is true, for there must be some number which is the number of the planets (even if there were no planets at all there would still be such a number, namely 0): but the consequent is false, for since it is a contingent matter how many planets there are, there is no number which must be the number of the planets.

But this forgets continuous quantities, where there are no discrete items before the nomination of a unit. The number associated with some planetary material, for instance, numbers only arbitrary units of that material, and not the material itself. So the antecedent of Routley's Formula is not necessarily true.

Quine also used the number of the planets in his central argument against quantification into modal contexts. He said (Quine 1960, pp195-197):

If for the sake of argument we accept the term 'analytic' as predicable of sentences (hence as attachable predicatively to quotations or other singular terms designating sentences), then 'necessarily' amounts to 'is analytic' plus an antecedent pair of quotation marks. For example, the sentence:

(1) Necessarily 9 > 4

is explained thus:

(2) '9 > 4' is analytic...

So suppose (1) explained as in (2). Why, one may ask, should we preserve the operatorial form as of (1), and therewith modal logic, instead of just leaving matters as in (2)? An apparent advantage is the possibility of quantifying into modal positions; for we know we cannot quantify into quotation, and (2) uses quotation...

But is it more legitimate to quantify into modal positions than into quotation? For consider (1) even without regard to (2); surely, on any plausible interpretation, (1) is true and this is false:

(3) Necessarily the number of major planets > 4.

Since 9 = the number of major planets, we can conclude that the position of '9' in (1) is not purely referential and hence that the necessity operator is opaque.

But here Quine does not separate out the referential 'the number of the major planets is greater than 4', i.e. 'εn(ny)Py > 4', from the attributive 'There are more than 4 major planets', i.e. '(∃n)((ny)Py & n > 4)'. If 9 = εn(ny)Py, then it follows that εn(ny)Py > 4, but it does not follow that (∃n)((ny)Py & n > 4). Substitution of identicals in (1), therefore, does yield (3), even though it is not necessary that there are more than 4 major planets.

We can now go into some details of how one gets the 'x' in such a form as 'LFx' to be open for quantification. For what one finds in traditional modal semantics (see Hughes and Cresswell 1968, passim) are formulas in the meta-linguistic style, like

V(Fx, i) = 1,

which say that the valuation put on 'Fx' is 1, in world i. There should be quotation marks around the 'Fx' in such a formula, to make it meta-linguistic, but by convention they are generally omitted. To effect the change to the non-meta-linguistic point of view, we must simply read this formula as it literally is, so that the 'Fx' is in indirect speech rather than direct speech, and the whole becomes the operator form 'it would be true in world i that Fx'. In this way, the term 'x' gets into the language of the reporter, and the meta/object distinction is not relevant. Any variable inside the subordinate proposition can now be quantified over, just like a variable outside it, which means there is 'quantifying in', and indeed all the normal predicate logic operations apply, since all individual terms are rigid.

An example illustrating this rigidity involves the actual top card in a pack, and the cards which might have been top card in other circumstances (see Slater 1988a). If the actual top card is the Ace of Spades, and it is supposed that the top card is the Queen of Hearts, then clearly what would have to be true for those circumstances to obtain would be for the Ace of Spades to be the Queen of Hearts. The Ace of Spades is not in fact the Queen of Hearts, but that does not mean they cannot be identical in other worlds (c.f. Hughes and Cresswell, 1968, p190). Certainly if there were several cards people variously thought were on top, those cards in the various supposed circumstances would not provide a constant c such that Fc is true in all worlds. But that is because those cards are functions of the imagined worlds -- the card a believes is top (εxBaFx) need not be the card b believes is top (εxBbFx), etc. It still remains that there is a constant, c, such that Fc is true in all worlds. Moreover, that c is not an 'intensional object', for the given Ace of Spades is a plain and solid extensional object, the actual top card (εxFx).

Routley, Meyer and Goddard did not accept the latter point, wanting a rigid semantics in terms of 'intensional objects' (Goddard and Routley, 1973, p561, Routley, Meyer and Goddard, 1974, p309, see also Hughes and Cresswell 1968, p197). Stalnaker and Thomason accepted that certain referential terms could be functional, when discriminating 'Socrates' from 'Miss America' -- although the functionality of 'Miss America in year t' is significantly different from that of 'the top card in y's belief'. For if this year's Miss America is last year's Miss America, still it is only one thing which is identical with itself, unlike with the two cards. Also, there is nothing which can force this year's Miss America to be last year's different Miss America, in the way that the counterfactuality of the situation with the playing cards forces two non-identical things in the actual world to be the same thing in the other possible world. Other possible worlds are thus significantly different from other times, and so, arguably, other possible worlds should not be seen from the Realist perspective appropriate for other times -- or other spaces.

4. The Epsilon Calculus' Problematic

It might be said that Realism has delayed a proper logical understanding of many of these things. If you look 'realistically' at picturesque remarks like that made before, namely 'the same object is 'inside' the belief as 'outside' in the room', then it is easy for inappropriate views about the mind to start to interfere, and make it seem that the same object cannot be in these two places at once. But if the mind were something like another space or time, then counterfactuality could get no proper purchase -- no one could be 'wrong', since they would only be talking about elements in their 'world', not any objective, common world. But really, all that is going on when one says, for instance,

There was a man in the room, but Celia believed he was a woman,

is that the same term -- or one term and a pronominal surrogate for it -- appears at two linguistic places in some discourse, with the same reference. Hence there is no grammatical difference between the cross reference in such an intensional case and the cross reference in a non-intensional case, such as

There was a man in the room. He was hungry.

i.e.

(∃x)Mx & HεxMx.

What has been difficult has merely been getting a symbolisation of the cross-reference in this more elementary kind of case. But it just involves extending the epsilon definition of existential statements, using a reiteration of the substituted epsilon term, as we can see.

It is now widely recognised how the epsilon calculus allows us to do this (Purdy 1994, Egli and von Heusinger 1995, Meyer Viol 1995, Ch 6). The theoretical starting point is the theorem about the Russellian theory of definite descriptions proved before, which breaks up what otherwise would be a single sentence into a sequential piece of discourse, enabling the existence and uniqueness clauses to be put in one sentence while the characterising remark is in another. The relationship starts to matter when there is no obvious way to formulate a combination of anaphoric remarks in the predicate calculus, as in, for instance,

There is a king of France. He is bald,

where there is no uniqueness clause. This difficulty became a major problem when logicians started to consider anaphoric reference in the 1960s.
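
In epsilon terms the pair is straightforward. Writing, merely for illustration, 'Kx' for 'x is a king of France' and 'Bx' for 'x is bald', it becomes '(∃x)Kx & BεxKx', with no uniqueness clause anywhere; it is the predicate calculus rendering which is the problem.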

Geach, for instance, even believed there could not be a syllogism of the following kind (Geach 1962, p126):

A man has just drunk a pint of sulphuric acid.
Nobody who drinks a pint of sulphuric acid lives through the day.
So, he won't live through the day.

He said, one could only draw the conclusion:

Some man who has just drunk a pint of sulphuric acid won't live through the day.

Certainly one can only derive

(∃x)(Mx & Dx & ¬Lx)

from

(∃x)(Mx & Dx),

and

(x)(Dx → ¬Lx),

within predicate logic. But one can still derive

¬Lεx(Mx & Dx),

within the epsilon calculus.
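
The steps are worth displaying: from '(∃x)(Mx & Dx)', by the epsilon definition of the existential quantifier, we get 'Mεx(Mx & Dx) & Dεx(Mx & Dx)'; instantiating '(x)(Dx → ¬Lx)' with the term 'εx(Mx & Dx)' gives 'Dεx(Mx & Dx) → ¬Lεx(Mx & Dx)'; and these together yield the conclusion -- now a statement about him, the man who has just drunk the acid, vindicating the ordinary syllogism.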

Geach likewise was foxed later when he produced his famous case (numbered 3 in Geach 1967):

Hob thinks a witch has blighted Bob's mare, and Nob wonders whether she (the same witch) killed Cob's sow,

which is, in epsilon terms

Th(∃x)(Wx & Bxb) & OnKεx(Wx & Bxb)c.

For Geach saw that this could not be (4)

(∃x)(Wx & ThBxb & OnKxc),

or (5)

(∃x)(Th(Wx & Bxb)& OnKxc).

But also a reading of the second clause as (c.f. 18)

Nob wonders whether the witch who blighted Bob's mare killed Cob's sow,

in which 'the witch who blighted Bob's mare killed Cob's sow' is analysed in the Russellian manner, i.e. as (20)

just one witch blighted Bob's mare and she killed Cob's sow,

Geach realised, does not catch the specific cross-reference -- amongst other things because of the uniqueness condition which is then introduced.

This difficulty with the uniqueness clause in Russellian analyses has been widely commented on, although a recent theorist, Neale, has said that Russell's theory only needs to be modestly modified: Neale's main idea is that, in general, definite descriptions should just be localised to the context. His resolution of Geach's troubling cases thus involves suggesting that 'she', in the above, might simply be 'the witch we have been hearing about' (Neale 1990, p221). Neale might here have said 'that witch who blighted Bob's mare', showing that an Hilbertian account of demonstrative descriptions would have a parallel effect.

A good deal of the groundbreaking work on these matters, however, was done by someone again much influenced by Russell: Evans. But Evans significantly broke with Russell over uniqueness (Evans 1977, pp516-517):

One does not want to be committed, by this way of telling the story, to the existence of a day on which just one man and boy walked along a road. It was with this possibility in mind that I stated the requirement for the appropriate use of an E-type pronoun in terms of having answered, or being prepared to answer upon demand, the question 'He? Who?' or 'It? Which?' In order to effect this liberalisation we should allow the reference of the E-type pronoun to be fixed not only by predicative material explicitly in the antecedent clause, but also by material which the speaker supplies upon demand. This ruling has the effect of making the truth conditions of such remarks somewhat indeterminate; a determinate proposition will have been put forward only when the demand has been made and the material supplied.

It was Evans who gave us the title 'E-type pronoun' for the 'he' in such expressions as

A Cambridge philosopher smoked a pipe, and he drank a lot of whisky,

i.e., in epsilon terms,

(∃x)(Cx & Px) & Dεx(Cx & Px).

He also insisted (Evans 1977, p516) that what was unique about such pronouns was that this conjunction of statements was not equivalent to

A Cambridge philosopher, who smoked a pipe, drank a lot of whisky,

i.e.

(∃x)(Cx & Px & Dx).

Clearly the epsilon account is entirely in line with this, since it illustrates the point made before about cases without a uniqueness clause. Only the second expression, which contains a relative pronoun, is formalisable in the predicate calculus. To formalise the first expression, which contains a personal pronoun, one at least needs something with the expressive capabilities of the epsilon calculus.

5. The Formal Semantics of Epsilon Terms

The semantics of epsilon terms is nowadays more general, but the first interpretations of epsilon terms were restricted to arithmetical cases, and specifically took epsilon to be the least number operator. Hilbert and Bernays developed Arithmetic using the epsilon calculus, using the further epsilon axiom schema (Hilbert and Bernays 1970, Book 2, p85f, c.f. Leisenring 1969, p92):

(εxAx = st) → ¬At,

where 's' is intended to be the successor function, and 't' is any numeral. This constrains the interpretation of the epsilon symbol, but the least number interpretation is not strictly forced, since the axiom only ensures that no number having the property A immediately precedes εxAx.
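
For illustration (the example is ours, not Hilbert and Bernays'): if 'A' is true of just the numbers 1 and 3, the first epsilon axiom requires εxAx to be 1 or 3, and the new schema excludes neither, since neither 0 nor 2 has the property; so εxAx = 3 is as admissible as the least value, 1. Only if 'A' were true of, say, both 1 and 2 would the schema bite, ruling out εxAx = 2, because A1 holds.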

The new axiom, however, is sufficient to prove mathematical induction, in the form:

(A0 & (x)(Ax → Asx)) → (x)Ax.

For assume the reverse, namely

A0 & (x)(Ax → Asx) & ¬(x)Ax,

and consider what happens when the term 'εx¬Ax' is substituted in

t = 0 ∨ t = sn,

which is derivable from the other axioms of number theory which Hilbert and Bernays are using. If we had

εx¬Ax = 0,

then, since it is given that A0, then we would have Aεx¬Ax. But since, by the definition of the universal quantifier,

Aεx¬Ax ↔ (x)Ax,

we know, because ¬(x)Ax is also given, that ¬Aεx¬Ax, which means we cannot have εx¬Ax = 0. Hence we must have the other alternative, i.e.

εx¬Ax = sn,

for some n. But from the new axiom

(εx¬Ax = sn) → An,

hence we must have An, although we must also have

An → Asn,

because (x)(Ax → Asx). Together these give Asn, that is, Aεx¬Ax again, which is impossible. Hence the further epsilon axiom is sufficient to establish the given principle of induction.

The more general link between epsilon terms and choice functions was first set out by Asser, although Asser's semantics for an elementary epsilon calculus without the second epsilon axiom makes epsilon terms denote rather complex choice functions. Wilfrid Meyer Viol, calling an epsilon calculus without the second axiom an 'intensional' epsilon calculus, makes the epsilon terms in such a calculus instead name Skolem functions. Skolem functions are also called Herbrand functions, although Herbrand functions arise in a different way; Skolem functions arise in Skolem's Theorem. Skolem's Theorem states that, if a formula in prenex normal form is provable in the predicate calculus, then a certain corresponding formula, with the existential quantifiers removed, is provable in a predicate calculus enriched with function symbols. The functions symbolised are called Skolem functions, although, in another context, they would be Herbrand functions.

Skolem's Theorem is a meta-logical theorem, about the relation between two logical calculi, but a non-metalogical version is in fact provable in the epsilon calculus, from which Skolem's actual theorem follows, since, for example, we can get, by the epsilon definition of the existential quantifier,

(x)(∃y)Fxy ↔ (x)FxεyFxy.

As a result, if the left hand side of such an equivalence is provable in an epsilon calculus the right hand side is provable there. But the left hand side is provable in an epsilon calculus if it is provable in the predicate calculus, by the Second Epsilon Theorem; and if the right hand side is provable in an epsilon calculus it is provable in a predicate calculus enriched with certain function symbols -- epsilon terms, like 'εyFxy'. So, by generalisation, we get Skolem's original result.
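
A simple illustration (again ours): the prenex formula '(x)(∃y)(Fx → Fy)' is provable in the predicate calculus, and its epsilon transform '(x)(Fx → Fεy(Fx → Fy))' is provable in the epsilon calculus; reading 'εy(Fx → Fy)', which contains 'x' free, as a function of x, the latter is just the Skolemised form '(x)(Fx → Ff(x))', with 'f' a new function symbol.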

When we add to an intensional epsilon calculus the second epsilon axiom

(x)(Fx ↔ Gx) → εxFx = εxGx,

the interpretation of epsilon terms is commonly extensional, i.e. in terms of sets, since two predicates 'F' and 'G' satisfying the antecedent of this second axiom will determine the same set -- if they determine sets at all, that is. For that requires the predicates to be collectivisantes, in Bourbaki's terms, as with explicit set membership statements, like 'x ∈ y'. In such a case the epsilon term 'εx(x ∈ y)' designates a choice function, i.e. a function which selects one member from a given set (c.f. Leisenring 1969, p19, Meyer Viol 1995, p42). In the case where there are no members of the set the selection is arbitrary, although for all empty sets it is invariably the same. Thus the second axiom validates, for example, Kalish and Montague's rule for this case, which they put in the form

εxFx = εx¬(x = x).

Kalish and Montague in fact prove a version of the second epsilon axiom in their system (Kalish and Montague 1964, see T407, p256). The second axiom also holds in Hermes' system (Hermes 1965), although there one in addition finds a third epsilon axiom,

εx¬(x = x) = εx(x = x),

for which there would seem to be no real justification.

But the second epsilon axiom itself is curious. One questionable thing about it is that neither Leisenring nor Meyer Viol states that the predicates in question must determine sets before their choice function semantics can apply. That the predicates are collectivisantes is merely presumed in their theories, since 'εxBx' is invariably modelled by means of a choice from the presumed set of things which in the model are B. Certainly there is a special clause dealing with the empty set; but there is no consideration of the case where some things are B although those things are not discrete, as with the things which are red, for instance. If the predicate in question is not a count noun then there is no set of things involved, since with mass terms, and continuous quantities there are no given elements to be counted (c.f. Bunt 1985, pp262-263 in particular). Of course numbers can still be associated with them, but only given an arbitrary unit. With the cows in a field, for instance, we can associate a determinate number, but with the beef there we cannot, unless we consider, say, the number of pounds of it.

The point, as we saw before, has a formalisation in epsilon terms. Thus if we write '(nx)Fx', for 'there are n F's', then εn(ny)Fy will be the number of F's, and it is what numbers them if they have a number. But in the reverse case the previously mentioned arbitrariness of the epsilon term comes in. For if ¬(∃n)(nx)Fx, then ¬([εn(ny)Fy]x)Fx, and so, although an arbitrary number exists, it does not number the F's. In that case, in other words, we do not have a number of F's, merely some F.

In fact, even when there is a set of things, the second epsilon axiom, as stated above, does not apply in general, since there are intensional differences between properties to consider, as in, for instance, 'There is a red-haired man, and a Caucasian in the room, and they are different'. Here, if there were only red-haired Caucasians in the room, then with the above second axiom, we could not find epsilon substitutions to differentiate the two individuals involved. This may remind us that it is necessary co-extensionality, and not just contingent co-extensionality, which is the normal criterion for the identity of properties (c.f. Hughes and Cresswell 1968, pp209-210). So it leads us to see the appropriateness of a modalised second axiom, which uses just an intensional version of the antecedent of the previous second epsilon axiom, in which 'L' means 'it is necessary that', namely:

L(x)(Fx ↔ Gx) → εxFx = εxGx.

For with this axiom only the co-extensionalities which are necessary will produce identities between the associated epsilon terms. We can only get, for instance,

εxPx = εx(Px ∨ Px),

and

εxFx = εyFy,

and all other identities derivable in a similar way.
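
The red-haired man case shows why the unmodalised axiom is too strong. Writing, merely for illustration, 'Hx' for 'x is a red-haired man in the room' and 'Cx' for 'x is a Caucasian in the room', the remark is naturally '(∃x)Hx & (∃x)Cx & ¬(εxHx = εxCx)'. If it merely happens that (x)(Hx ↔ Cx), the unmodalised second axiom forces εxHx = εxCx, contradicting the remark; the modalised axiom forces no such identity, since the co-extensionality is contingent.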

However, the original second epsilon axiom is then provable, in the special case where the predicates express set membership. For if necessarily

(x)(x ∈ y ↔ x ∈ z) ↔ y = z,

while necessarily

y = z ↔ L(y = z),

(see Hughes and Cresswell, 1968, p190) then

L(x)(x ∈ y ↔ x ∈ z) ↔ (x)(x ∈ y ↔ x ∈ z),

and so, from the modalised second axiom we can get

(x)(x ∈ y ↔ x ∈ z) → εx(x ∈ y) = εx(x ∈ z).

Note, however, that if one only has contingently

(x)(Fx ↔ x ∈ z),

then one cannot get, on this basis,

εxFx = εx(x ∈ z).

But this is something which is desirable, as well. For we have seen that it is contingent that the number of the planets does number the planets -- because it is not necessary that ([εn(ny)Py]x)Px. This makes '(9x)Px' contingent, even though the identity '9 = εn(nx)Px' remains necessary. But also it is contingent that there is the set of planets, p, which there is, since while, say,

(x)(x ∈ p ↔ Px),

where

εn(nx)(x ∈ p) = εn(nx)Px = 9,

it is still possible that, in some other possible world,

(x)(x ∈ p' ↔ Px),

with p' the set of planets there, and

¬(εn(nx)(x ∈ p') = 9).

We could not have this further contingency, however, if the original second epsilon axiom held universally.

It is on this fuller basis that we can continue to hold 'x = y → L(x = y)', i.e. the invariable necessity of identity -- one merely distinguishes '(9x)Px' from '9 = εn(nx)Px', and from '9 = εn(nx)(x ∈ p)', as above.

Adding the original second epsilon axiom to an intensional epsilon calculus is therefore acceptable only if all the predicates are about set membership. This is not an uncommon assumption; indeed, it is pervasive in the usually given semantics for predicate logic, for instance. But if, by contrast, we want to allow for the fact that not all predicates are collectivisantes then we should take just the first epsilon axiom with merely a modalised version of the second epsilon axiom. The interpretation of epsilon terms is then always in terms of Skolem functions, although if we are dealing with the membership of sets, those Skolem functions naturally are choice functions.

6. Some Metatheory

To finish we shall briefly look, as promised, at some meta-theory.

The epsilon calculi that were first described were not very convenient to use, and Hilbert and Bernays' proofs of the First and Second Epsilon Theorems were very complex. This was largely because the presentation was axiomatic, however; with the development of other means of presenting the same logics we get more readily available meta-logical results. I will indicate some of the early difficulties before showing how these theorems can be proved, nowadays, much more simply.

The problem with proving the Second Epsilon Theorem, on an axiomatic basis, is that complex and non-constant epsilon terms may enter a proof in the epsilon calculus by means of substitutions into the axioms. What has to be proved is that an epsilon calculus proof of an epsilon-free theorem (i.e. one which can be expressed just in predicate calculus language) can be replaced by a predicate calculus proof. So some analysis of complex epsilon terms is required, to show that they can be eliminated in the relevant cases, leaving only constant epsilon terms, which are sufficiently similar to the individual symbols in standard predicate logic. Hilbert and Bernays (Hilbert and Bernays 1970, Book 2, p23f) say that one epsilon term 'εxFx' is subordinate to another 'εyGy' if and only if 'G' contains 'εxFx', and a free occurrence of the variable 'y' lies within 'εxFx'. For instance 'εxRyx' is a complex and non-constant epsilon term, which is subordinate to 'εySyεxRyx'. Hilbert and Bernays then define the rank of an epsilon term to be 1 if there are no epsilon terms subordinate to it, and otherwise to be one greater than the maximal rank of the epsilon terms which are subordinate to it. Using the same general ideas, Leisenring proves two theorems (Leisenring 1969, p72f). First he proves a rank reduction theorem, which shows that epsilon proofs of epsilon-free formulas in which the second epsilon axiom is not used, but in which every term is of rank less than or equal to r, may be replaced by epsilon proofs in which every term is of rank less than or equal to r - 1. Then he proves the eliminability of the second epsilon axiom in proofs of epsilon-free formulas. Together, these two theorems show that if there is an epsilon proof of an epsilon-free formula, then there is such a proof not using the second epsilon axiom, and in which all epsilon terms have rank just 1. Even though such epsilon terms might still contain free variables, if one replaces those that do with a fixed symbol 'a' (starting with those of maximal length) that reduces the proof to one in what is called the 'epsilon star' system, in which there are only constant epsilon terms (Leisenring 1969, p66f). Leisenring shows that proofs in the epsilon star system can be turned into proofs in the predicate calculus, by replacing the epsilon terms by individual symbols.

But, as was said before, there is now available a much shorter proof of the Second Epsilon Theorem. In fact there are several, but I shall just indicate one, which arises simply by modifying the predicate calculus truth trees, as found in, for instance, Jeffrey (see Jeffrey 1967). Jeffrey uses the standard propositional truth tree rules, together with the rules of quantifier interchange, which remain unaffected, and which are not material to the present purpose. He also has, however, a rule of existential quantifier elimination,

(∃x)Fx ├ Fa,

in which 'a' must be new, and a rule of universal quantifier elimination

(x)Fx ├ Fb,

in which 'b' must be old -- unless no other individual terms are available. By reducing closed formulas of the form 'P & ¬C' to absurdity Jeffrey can then prove 'P → C', and validate 'P ├ C' in his calculus. But clearly, upon adding epsilon terms to the language, the first of these rules must be changed to

(∃x)Fx ├ FεxFx,

while also the second rule can be replaced by the pair

(x)Fx ├ Fεx¬Fx,
Fεx¬Fx ├ Fa,

(where 'a' is old) to produce an appropriate proof procedure. Steen reads 'εx¬Fx' as 'the most un-F-like thing' (Steen 1972, p162), which explains why Fεx¬Fx entails Fa, since if the most un-F-like thing is in fact F, then the most plausible counter-example to the generalisation is in fact not so, making the generalisation exceptionless. But there is a more important reason why the rule of universal quantifier elimination is best broken up into two parts.

For Jeffrey's rules only allow him 'limited upward correctness' (Jeffrey 1967, p167), since Jeffrey has to say, with respect to his universal quantifier elimination rule, that the range of the quantification there be limited merely to the universe of discourse of the path below. This is because, if an initial sentence is false in a valuation so also must be one of its conclusions. But the first epsilon rule which replaces Jeffrey's rule ensures, instead, that there is 'total upwards correctness'. For if it is false that everything is F then, without any special interpretation of the quantifier, one of the given consequences of the universal statement is false, namely the immediate one -- since Fεx¬Fx is in fact equivalent to (x)Fx. A similar improvement also arises with the existential quantifier elimination rule. For Jeffrey can only get 'limited downwards correctness', with his existential quantifier elimination rule (Jeffrey 1967, p165), since it is not an entailment. In fact, in order to show that if an initial sentence is true in a valuation so is one of its conclusions, in this case, Jeffrey has to stretch his notion of 'truth' to being true either in the given valuation, or some nominal variant of it.

The epsilon rule which replaces Jeffrey's overcomes this difficulty by not employing names, only demonstrative descriptions, and by being, as a result, totally downward correct. For if there is an F then that F is F, whatever name is used to refer to it. The epsilon calculus terminology thus precedes any naming: it gets hold of the more primitive, demonstrative way we have of referring to objects, using phrases like 'that F'. Thus in explication of the predicate calculus rule we might well have said

suppose there is an F, well, call that F 'a', then Fa,

but that requires we understand 'that F' before we come to use 'a'.
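
A small tree may illustrate the amended rules (the example is ours, with 'F' and 'G' schematic). To validate '(∃x)(Fx & Gx) ├ (∃x)Fx' we reduce '(∃x)(Fx & Gx) & ¬(∃x)Fx' to absurdity. Quantifier interchange turns the second conjunct into '(x)¬Fx'; the amended existential rule applied to the first conjunct gives 'Fεx(Fx & Gx) & Gεx(Fx & Gx)'; the first of the new universal rules applied to '(x)¬Fx' gives '¬Fεx¬¬Fx'; and the second then gives '¬Fa' for the old term a = εx(Fx & Gx), closing the path against 'Fεx(Fx & Gx)'.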

So how does the Second Epsilon Theorem follow? This theorem, as before, states that an epsilon calculus proof of an epsilon-free theorem may be replaced by a predicate calculus proof of the same formula. But the transformation required in the present setting is now evident: simply change to new names all epsilon terms introduced in the epsilon calculus quantifier elimination rules. This covers both the new names in Jeffrey's first rule and the odd case where there are no old names in Jeffrey's second rule. The epsilon calculus proofs invariably use constant epsilon terms, and are thus effectively in Leisenring's epsilon star system.

Epsilon terms which are non-constant, however, crucially enter the proof of the First Epsilon Theorem. The First Epsilon Theorem states that if C is a provable predicate calculus formula, in prenex normal form, i.e. with all quantifiers at the front, then a finite disjunction of instances of C's matrix is provable in the epsilon calculus. The crucial fact is that the epsilon calculus gives us access to Herbrand functions, which arise when universal quantifiers are eliminated from formulas using their epsilon definition. Thus

(∃y)(x)¬Fyx,

for instance, is equivalent to

(∃y)¬Fyεx¬¬Fyx,

and so

(∃y)¬FyεxFyx,

and the resulting epsilon term 'εxFyx' is a Herbrand function.

Using such reductions, all universal quantifiers can evidently be removed from formulas in prenex normal form, and the additional fact that, in a certain specific way, the remaining existential quantifiers are disjunctions makes all predicate calculus formulas equivalent to disjunctions. Remember that a formula is provable if its negation is reducible to absurdity, which means that its truth tree must close. But, by König's Lemma, if there is no open path through a truth tree then there is some finite stage at which there is no open path, so, in the case above, for instance, if no valuation makes the last formula's negation true, then the tree of the instances of that negative statement must close at some finite stage. But the negative statement is the universal formula

(y)FyεxFyx,

by the rules of quantifier interchange, so a finite conjunction of instances of the matrix of this universal formula, namely Fyx, must reduce to absurdity. For the rules of universal quantifier elimination only produce consequences with the form of this matrix. By de Morgan's Laws, a finite disjunction of instances of ¬Fyx is thereby provable. By generalisation we thus get the First Epsilon Theorem.

The epsilon calculus, however, can take us further than the First Epsilon Theorem. Indeed, one has to take care with the impression this theorem may give that existential statements are just equivalent to disjunctions. If that were the case, then existential statements would be unlike individual statements, saying not that one specified thing has a certain property, but merely that one of a certain group of things has a certain property. The group in question is normally called the 'domain' of the quantification, and this, it seems, has to be specified when setting out the semantics of quantifiers. But study of the epsilon calculus shows that there is no need for such 'domains', or indeed for such semantics. This is because the example above, for instance, is also equivalent to

¬FaεzFaz,

where a = εy¬FyεxFyx. So the previous disjunction of instances of ¬Fyx is in fact only true because this specific disjunct is true. The First Epsilon Theorem, it must be remembered, does not prove that an existential statement is equivalent to a certain disjunction; it shows merely that an existential statement is provable if and only if a certain disjunction is provable. And what is also provable, in such a case, is a statement merely about one object. Indeed the existential statement is provably equivalent to it. It is this fact which supports the epsilon definition of the quantifiers; and it is what permits anaphoric reference to the same object by means of the same epsilon term. An existential statement is thus just another statement about an individual -- merely a nameless one.

The reverse point goes for the universal quantifier: a universal statement is not the conjunction of its instances, even though it implies them. A generalisation is simply equivalent to one of its instances -- to the one involving the prime putative exception to it, as we have seen. Not being able to specify that prime putative exception leaves Jeffrey saying that if a generalisation is false then one of its instances is false, without any way of ensuring that that instance has been drawn as a conclusion below it in the truth tree, except by limiting the interpretation of the generalisation just to the universe of discourse of the path. It thus seems necessary, within the predicate calculus, that there be a 'model' for the quantifiers which restricts them to a certain 'domain', which means that they do not necessarily range over everything. But in the epsilon calculus the quantifiers do, invariably, range over everything, and so there is no need to specify their range.

7. References and Further Reading

  • Ackermann, W. 1937-8, 'Mengentheoretische Begründung der Logik', Mathematische Annalen, 115, 1-22.
  • Asser, G. 1957, 'Theorie der Logischen Auswahlfunktionen', Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 3, 30-68.
  • Bernays, P. 1958, Axiomatic Set Theory, North Holland, Dordrecht.
  • Bertolet, R. 1980, 'The Semantic Significance of Donnellan's Distinction', Philosophical Studies, 37, 281-288.
  • Bourbaki, N. 1954, Éléments de Mathématique, Hermann, Paris.
  • Bunt, H.C. 1985, Mass Terms and Model-Theoretic Semantics, C.U.P., Cambridge.
  • Church, A. 1940, 'A Formulation of the Simple Theory of Types', Journal of Symbolic Logic, 5, 56-68.
  • Copi, I. 1973, Symbolic Logic, 4th ed. Macmillan, New York.
  • Devitt, M. 1974, 'Singular Terms', The Journal of Philosophy, 71, 183-205.
  • Donnellan, K. 1966, 'Reference and Definite Descriptions', Philosophical Review, 75, 281-304.
  • Egli, U. and von Heusinger, K. 1995, 'The Epsilon Operator and E-Type Pronouns' in U. Egli et al. (eds.), Lexical Knowledge in the Organisation of Language, Benjamins, Amsterdam.
  • Evans, G. 1977, 'Pronouns, Quantifiers and Relative Clauses', Canadian Journal of Philosophy, 7, 467-536.
  • Geach, P.T. 1962, Reference and Generality, Cornell University Press, Ithaca.
  • Geach, P.T. 1967, 'Intentional Identity', The Journal of Philosophy, 64, 627-632.
  • Goddard, L. and Routley, R. 1973, The Logic of Significance and Context, Scottish Academic Press, Aberdeen.
  • Gödel, K. 1969, 'An Interpretation of the Intuitionistic Sentential Calculus', in J. Hintikka (ed.), The Philosophy of Mathematics, O.U.P. Oxford.
  • Hazen, A. 1987, 'Natural Deduction and Hilbert's ε-operator', Journal of Philosophical Logic, 16, 411-421.
  • Hermes, H. 1965, Eine Termlogik mit Auswahloperator, Springer Verlag, Berlin.
  • Hilbert, D. 1923, 'Die Logischen Grundlagen der Mathematik', Mathematische Annalen, 88, 151-165.
  • Hilbert, D. 1925, 'On the Infinite' in J. van Heijenoort (ed.), From Frege to Gödel, Harvard University Press, Cambridge MA.
  • Hilbert, D. and Bernays, P. 1970, Grundlagen der Mathematik, 2nd ed., Springer, Berlin.
  • Hughes, G.E. and Cresswell, M.J. 1968, An Introduction to Modal Logic, Methuen, London.
  • Jeffrey, R. 1967, Formal Logic: Its Scope and Limits, 1st Ed. McGraw-Hill, New York.
  • Kalish, D. and Montague, R. 1964, Logic: Techniques of Formal Reasoning, Harcourt, Brace and World, Inc, New York.
  • Kneebone, G.T. 1963, Mathematical Logic and the Foundations of Mathematics, Van Nostrand, Dordrecht.
  • Leisenring, A.C. 1969, Mathematical Logic and Hilbert's ε-symbol, Macdonald, London.
  • Marciszewski, W. 1981, Dictionary of Logic, Martinus Nijhoff, The Hague.
  • Meyer Viol, W.P.M. 1995, Instantial Logic, ILLC Dissertation Series 1995-11, Amsterdam.
  • Montague, R. 1963, 'Syntactical Treatments of Modality, with Corollaries on Reflection Principles and Finite Axiomatisability', Acta Philosophica Fennica, 16, 155-167.
  • Neale, S. 1990, Descriptions, MIT Press, Cambridge MA.
  • Priest, G.G. 1984, 'Semantic Closure', Studia Logica, XLIII 1/2, 117-129.
  • Prior, A.N., 1971, Objects of Thought, O.U.P. Oxford.
  • Purdy, W.C. 1994, 'A Variable-Free Logic for Anaphora' in P. Humphreys (ed.) Patrick Suppes: Scientific Philosopher, Vol 3, Kluwer, Dordrecht, 41-70.
  • Quine, W.V.O. 1960, Word and Object, Wiley, New York.
  • Rasiowa, H. 1956, 'On the ε-theorems', Fundamenta Mathematicae, 43, 156-165.
  • Rosser, J. B. 1953, Logic for Mathematicians, McGraw-Hill, New York.
  • Routley, R. 1969, 'A Simple Natural Deduction System', Logique et Analyse, 12, 129-152.
  • Routley, R. 1977, 'Choice and Descriptions in Enriched Intensional Languages II, and III', in E. Morscher, J. Czermak, and P. Weingartner (eds), Problems in Logic and Ontology, Akademische Druck und Verlagsanstalt, Graz.
  • Routley, R. 1980, Exploring Meinong's Jungle, Departmental Monograph #3, Philosophy Department, R.S.S.S., A.N.U. Canberra.
  • Routley, R., Meyer, R. and Goddard, L. 1974, 'Choice and Descriptions in Enriched Intensional Languages I', Journal of Philosophical Logic, 3, 291-316.
  • Russell, B. 1905, 'On Denoting' Mind, 14, 479-493.
  • Sayward, C. 1987, 'Prior's Theory of Truth' Analysis, 47, 83-87.
  • Slater, B.H. 1986(a), 'E-type Pronouns and ε-terms', Canadian Journal of Philosophy, 16, 27-38.
  • Slater, B.H. 1986(b), 'Prior's Analytic', Analysis, 46, 76-81.
  • Slater, B.H. 1988(a), 'Intensional Identities', Logique et Analyse, 121-2, 93-107.
  • Slater, B.H. 1988(b), 'Hilbertian Reference', Noûs, 22, 283-97.
  • Slater, B.H. 1989(a), 'Modal Semantics', Logique et Analyse, 127-8, 195-209.
  • Slater, B.H. 1990, 'Using Hilbert's Calculus', Logique et Analyse, 129-130, 45-67.
  • Slater, B.H. 1992(a), 'Routley's Formulation of Transparency', History and Philosophy of Logic, 13, 215-24.
  • Slater, B.H. 1994(a), 'The Epsilon Calculus' Problematic', Philosophical Papers, XXIII, 217-42.
  • Steen, S.W.P. 1972, Mathematical Logic, C.U.P. Cambridge.
  • Thomason, R. 1977, 'Indirect Discourse is not Quotational', Monist, 60, 340-354.
  • Thomason, R. 1980, 'A Note on Syntactical Treatments of Modality', Synthese, 44, 391-395.
  • Thomason, R.H. and Stalnaker, R.C. 1968, 'Modality and Reference', Noûs, 2, 359-372.

Author Information

Barry Hartley Slater
Email: slaterbh@cyllene.uwa.edu.au
University of Western Australia
Australia

Gottlob Frege (1848—1925)

Gottlob Frege was a German logician, mathematician and philosopher who played a crucial role in the emergence of modern logic and analytic philosophy. Frege's logical works were revolutionary, and are often taken to represent the fundamental break between contemporary approaches and the older, Aristotelian tradition. He invented modern quantificational logic, and created the first fully axiomatic system for logic, which was complete in its treatment of propositional and first-order logic, and also represented the first treatment of higher-order logic. In the philosophy of mathematics, he was one of the most ardent proponents of logicism, the thesis that mathematical truths are logical truths, and presented influential criticisms of rival views such as psychologism and formalism. His theory of meaning, especially his distinction between the sense and reference of linguistic expressions, was groundbreaking in semantics and the philosophy of language. He had a profound and direct influence on such thinkers as Russell, Carnap and Wittgenstein. Frege is often called the founder of modern logic, and he is sometimes even heralded as the founder of analytic philosophy.

Table of Contents

  1. Life and Works
  2. Contributions to Logic
  3. Contributions to the Philosophy of Mathematics
  4. The Theory of Sense and Reference
  5. References and Further Reading
    1. Frege's Own Works
    2. Important Secondary Works

1. Life and Works

Frege was born on November 8, 1848 in the coastal city of Wismar in Northern Germany. His full christened name was Friedrich Ludwig Gottlob Frege. Little is known about his youth. His father, Karl Alexander Frege, and his mother, Auguste (Bialloblotzsky) Frege, both worked at a girls' private school founded in part by Karl. Both were also principals of the school at various points: Karl held the position until his death in 1866, when Auguste took over until her death in 1878. The German writer Arnold Frege, born in Wismar in 1852, may have been Frege's younger brother, but this has not been confirmed. Frege probably lived in Wismar until 1869; from 1864 to 1869 he is known to have studied at the Gymnasium in Wismar.

In Spring 1869, Frege began studies at the University of Jena. There, he studied chemistry, philosophy and mathematics, and must have solidly impressed Ernst Abbe in mathematics, who later became one of Frege's benefactors. After four semesters, Frege transferred to the University of Göttingen, where he studied mathematics and physics, as well as philosophy of religion under Hermann Lotze. (Lotze is sometimes thought to have had a profound impact on Frege's philosophical views.) In late 1873, Frege finished his doctoral dissertation, under the guidance of Ernst Schering, entitled Über eine geometrische Darstellung der imaginären Gebilde in der Ebene ("On a Geometrical Representation of Imaginary Figures in a Plane"), and received his Ph.D.

In 1874, with the recommendation of Ernst Abbe, Frege received a lectureship at the University of Jena, where he stayed the rest of his intellectual life. His position was unsalaried during his first five years, and he was supported by his mother. Frege's Habilitationsschrift, entitled Rechnungsmethoden, die auf eine Erweiterung des Grössenbegriffes gründen ("Methods of Calculation Based upon an Amplification of the Concept of Magnitude"), was included with the material submitted to obtain the position. It involves the theory of complex mathematical functions, and contains seeds of Frege's advances in logic and the philosophy of mathematics.

Frege had a heavy teaching load during his first few years at Jena. However, he still had time to work on his first major work in logic, which was published in 1879 under the title Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens ("Concept-Script: A Formula Language for Pure Thought Modeled on That of Arithmetic"). Therein, Frege presented for the first time his invention of a new method for the construction of a logical language. Upon the publication of the Begriffsschrift, he was promoted to ausserordentlicher Professor, his first salaried position. However, the book was not well-reviewed by Frege's contemporaries, who apparently found its two-dimensional logical notation difficult to comprehend, and failed to see its advantages over previous approaches, such as that of Boole.

Sometime after the publication of the Begriffsschrift, Frege married Margarete Lieseberg (1856-1905). They had at least two children, who unfortunately died young. Years later they adopted a son, Alfred. However, little else is known about Frege's family life.

Frege had aimed to use the logical language of the Begriffsschrift to carry out his logicist program of attempting to show that all of the basic truths of arithmetic could be derived from purely logical axioms. However, on the advice of Carl Stumpf, and given the poor reception of the Begriffsschrift, Frege decided to write a work in which he would describe his logicist views informally in ordinary language, and argue against rival views. The result was his Die Grundlagen der Arithmetik ("The Foundations of Arithmetic"), published in 1884. However, this work seems to have been virtually ignored by most of Frege's contemporaries.

Soon thereafter, Frege began working on his attempt to derive the basic laws of arithmetic within his logical language. However, his work was interrupted by changes to his views. In the late 1880s and early 1890s Frege developed new and interesting theories regarding the nature of language, functions and concepts, and philosophical logic, including a novel theory of meaning based on the distinction between sense and reference. These views were published in influential articles such as "Funktion und Begriff" ("Function and Concept", 1891), "Über Sinn und Bedeutung" ("On Sense and Reference", 1892) and "Über Begriff und Gegenstand" ("On Concept and Object", 1892). This maturation of Frege's semantic and philosophical views led to changes in his logical language, forcing him to abandon an almost completed draft of his work in logic and the foundations of mathematics. However, in 1893, Frege finally finished a revised volume, employing a slightly revised logical system. This was his magnum opus, Grundgesetze der Arithmetik ("Basic Laws of Arithmetic"), volume I. In the first volume, Frege presented his new logical language, and proceeded to use it to define the natural numbers and their properties. His aim was to make this the first of a three-volume work; in the second and third, he would move on to the definition of real numbers, and the demonstration of their properties.

Again, however, Frege's work was unfavorably reviewed by his contemporaries. Nevertheless, he was promoted once again in 1894, now to the position of Honorary Ordinary Professor. It is likely that Frege was offered a position as full Professor, but turned it down to avoid taking on additional administrative duties. His new position was unsalaried, but he was able to support himself and his family with a stipend from the Carl Zeiss Stiftung, a foundation that gave money to the University of Jena, and with which Ernst Abbe was intimately involved.

Because of the unfavorable reception of his earlier works, Frege was forced to arrange to have volume II of the Grundgesetze published at his own expense. It was not until 1902 that Frege was able to make such arrangements. However, while the volume was already in the publication process, Frege received a letter from Bertrand Russell, informing him that it was possible to prove a contradiction in the logical system of the first volume of the Grundgesetze, which included a naive calculus for classes. For more information, see the article on "Russell's Paradox". Frege was, in his own words, "thunderstruck". He was forced to quickly prepare an appendix in response. For the next couple of years, he continued to do important work. A series of articles entitled "Über die Grundlagen der Geometrie" ("On the Foundations of Geometry") was published between 1903 and 1906, representing Frege's side of a debate with David Hilbert over the nature of geometry and the proper construction and understanding of axiomatic systems within mathematics.

However, around 1906, probably due to some combination of poor health, the early loss of his wife in 1905, frustration with his failure to find an adequate solution to Russell's paradox, and disappointment over the continued poor reception of his work, Frege seems to have lost his intellectual steam. He produced very little work between 1906 and his retirement in 1918. However, he continued to influence others during this period. Russell had included an appendix on Frege in his 1903 Principles of Mathematics. It is from this that Frege came to be more widely known, including to an Austrian student studying engineering in Manchester, England, named Ludwig Wittgenstein. Wittgenstein studied the work of Frege and Russell closely, and in 1911, he wrote to both of them concerning his own solution to Russell's paradox. Frege invited him to Jena to discuss his views. Wittgenstein did so in late 1911. The two engaged in a philosophical debate, and while Wittgenstein reported that Frege "wiped the floor" with him, Frege was sufficiently impressed with Wittgenstein that he suggested that he go to Cambridge to study with Russell--a suggestion that had profound importance for the history of philosophy. Moreover, Rudolf Carnap was one of Frege's students from 1910 to 1913, and Frege doubtless had a significant influence on Carnap's interest in logic and semantics and his subsequent intellectual development and successes.

After his retirement in 1918, Frege moved to Bad Kleinen, near Wismar, and managed to publish a number of important articles, "Der Gedanke" ("The Thought", 1918), "Die Verneinung" ("Negation", 1918), and "Gedankengefüge" ("Compound Thoughts", 1923). However, these were not wholly new works, but later drafts of works he had initiated in the 1890s. In 1924, a year before his death, Frege finally returned to the attempt to understand the foundations of arithmetic. However, by this time, he had completely given up on his logicism, concluding that the paradoxes of class or set theory made it impossible. He instead attempted to develop a new theory of the nature of arithmetic based on Kantian pure intuitions of space. However, he was not able to write much or publish anything about his new theory. Frege died on July 26, 1925 at the age of 76.

At the time of his death, Frege's own works were still not very widely known. He did not live to see the profound impact he would have on the emergence of analytic philosophy, nor to see his brand of logic--thanks in large part to Russell's championing of it--virtually wholly supersede earlier forms of logic. However, in bequeathing his unpublished work to his adopted son, Alfred, he wrote prophetically, "I believe there are things here which will one day be prized much more highly than they are now. Take care that nothing gets lost." Alfred later gave Frege's papers to Heinrich Scholz of the University of Münster for safekeeping. Unfortunately, however, they were destroyed in an Allied bombing raid on March 25, 1945. Although Scholz had made copies of some of the more important pieces, a good portion of Frege's unpublished works were lost.

Although he was a fierce, sometimes even satirical, polemicist, Frege himself was a quiet, reserved man. He was right-wing in his political views, and like many conservatives of his generation in Germany, he is known to have been distrustful of foreigners and rather anti-Semitic. Himself a Lutheran, Frege seems to have wanted to see all Jews expelled from Germany, or at least deprived of certain political rights. This distasteful feature of Frege's personality has gravely disappointed some of Frege's intellectual progeny.

2. Contributions to Logic

Trained as a mathematician, Frege developed his interest in logic out of his interest in the foundations of arithmetic. Early in his career, Frege became convinced that the truths of arithmetic are logical, analytic truths, agreeing with Leibniz, and disagreeing with Kant, who thought that arithmetical knowledge was grounded in "pure intuition", as well as with more empiricist thinkers such as J. S. Mill, who thought that arithmetic was grounded in observation. In other words, Frege subscribed to logicism. His logicism was modest in one sense, but very ambitious in others. Frege's logicism was limited to arithmetic; unlike other important historical logicists, such as Russell, Frege did not think that geometry was a branch of logic. However, Frege's logicism was very ambitious in another regard, as he believed that one could prove all of the truths of arithmetic deductively from a limited number of logical axioms. Indeed, Frege himself set out to demonstrate all of the basic laws of arithmetic within his own system of logic.

Frege concurred with Leibniz that natural language was unsuited to such a task. Thus, Frege sought to create a language that would combine the tasks of what Leibniz called a "calculus ratiocinator" and "lingua characterica", that is, a logically perspicuous language in which logical relations and possible inferences would be clear and unambiguous. Frege's own term for such a language, "Begriffsschrift", was likely borrowed from a paper on Leibniz's ideas written by Adolf Trendelenburg. Although Boole and others working in the Leibnizian tradition had attempted to fashion at least the core of such a language, Frege found their work unsuitable for a number of reasons. Boole's logic used some of the same signs used in mathematics, except with different logical meanings. Frege found this unacceptable for a language which was to be used to demonstrate mathematical truths, because the signs would be ambiguous. Boole's logic, though innovative in some respects, was weak in others. It was divided into a "primary logic" and "secondary logic", bifurcating its propositional and categorical elements, and could not deal adequately with multiple generalities. It analyzed propositions in terms of subject and predicate concepts, which Frege found to be imprecise and antiquated.

Frege saw the formulae of mathematics as the paradigm of clear, unambiguous writing. Frege's brand of logical language was modeled upon the international language of arithmetic, and it replaced the subject/predicate style of logical analysis with the notions of function and argument. In mathematics, an equation such as "f(x) = x² + 1" states that f is a function that takes x as argument and yields as value the result of multiplying x by itself and adding one. In order to make his logical language suitable for purposes other than arithmetic, Frege expanded the notion of function to allow arguments and values other than numbers. He defined a concept (Begriff) as a function that has a truth-value, one of the two abstract objects the True and the False, as its value for any object as argument. See below for more on Frege's understanding of concepts, functions and objects. The concept being human is understood as a function that has the True as value for any argument that is human, and the False as value for anything else. Suppose that "H( )" stands for this concept, "a" is a constant for Aristotle, and "b" is a constant for the city of Boston. Then "H(a)" stands for the True, while "H(b)" stands for the False. In Frege's terminology, an object for which a concept has the True as value is said to "fall under" the concept.

The values of such concepts could then be used as arguments to other functions. In his own logical systems, Frege introduced signs standing for the negation and conditional functions. His own logical notation was two-dimensional. However, let us instead replace Frege's own notation with more contemporary notation. For Frege, the conditional function, "→", is understood as a function the value of which is the False if its first argument is the True and the second argument is anything other than the True, and is the True otherwise. Therefore, "H(b) → H(a)" stands for the True, while "H(a) → H(b)" stands for the False. The negation sign "~" stands for a function whose value is the True for every argument except the True, for which its value is the False. Conjunction and disjunction signs could then be defined from the negation and conditional signs. Frege also introduced an identity sign, standing for a function whose value is the True if the two arguments are the same object, and the False otherwise, and a sign, which he called "the horizontal," namely "—", that stands for a function that has the True as value for the True as argument, and has the False as value for any other argument.
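
To make this function-based picture concrete, here is a minimal sketch in Python (a modern stand-in for Frege's notation, not a reconstruction of it), modeling the concept H and the conditional and negation signs as functions whose values are the two truth-values; the toy domain and the use of Python's booleans for the True and the False are illustrative assumptions only.

    # A concept is modeled as a function from objects to truth-values.
    def H(x):
        # The concept 'being human': the True for humans, the False otherwise.
        return x == "Aristotle"  # toy assumption: Aristotle is our only human

    a = "Aristotle"  # a constant for Aristotle
    b = "Boston"     # a constant for the city of Boston

    def conditional(p, q):
        # The False iff the first argument is the True and the second is
        # anything other than the True; the True otherwise.
        return not (p and not q)

    def negation(p):
        # The True for every argument except the True.
        return not p

    print(H(a))                     # True:  Aristotle falls under H
    print(H(b))                     # False: Boston does not
    print(conditional(H(b), H(a)))  # True:  "H(b) -> H(a)" names the True
    print(conditional(H(a), H(b)))  # False: "H(a) -> H(b)" names the False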

Variables and quantifiers are used to express generalities. Frege understands quantifiers as "second-level concepts". The distinction between levels of functions involves what kind of arguments the functions take. In Frege's view, unlike objects, all functions are "unsaturated" insofar as they require arguments to yield values. But different sorts of functions require different sorts of arguments. Functions that take objects as argument, such as those referred to by "( ) + ( )" or "H( )", are called first-level functions. Functions that take first-level functions as argument are called second-level functions. The quantifier, "∀x(...x...)", is understood as standing for a function that takes a first-level function as argument, and yields the True as value if the argument-function has the True as value for all values of x, and has the False as value otherwise. Thus, "∀xH(x)" stands for the False, since the concept H( ) does not have the True as value for all arguments. However, "∀x[H(x) → H(x)]" stands for the True, since the complex concept H( ) → H( ) does have the True as value for all arguments. The existential quantifier, now written "∃x(...x...)", is defined as "~∀x~(...x...)".
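
Over a small finite domain, the quantifier's character as a second-level function, one that takes a first-level concept as its argument, can likewise be mimicked. A hedged Python sketch under that finite-domain assumption (Frege's own quantifier ranges over all objects whatsoever):

    DOMAIN = ["Aristotle", "Boston", 3]  # an illustrative toy domain

    def H(x):
        return x == "Aristotle"  # the concept 'being human', as before

    def forall(F):
        # A second-level function: it takes a first-level concept F and
        # yields the True iff F yields the True for every object.
        return all(F(x) for x in DOMAIN)

    def exists(F):
        # Defined as Frege defines it: ~forall x ~F(x).
        return not forall(lambda x: not F(x))

    print(forall(H))                           # False: not everything is human
    print(forall(lambda x: not H(x) or H(x)))  # True:  H(x) -> H(x) for all x
    print(exists(H))                           # True:  something falls under H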

Those familiar with modern predicate logic will recognize the parallels between it and Frege's logic. Frege is often credited with having founded predicate logic. However, Frege's logic is in some ways different from modern predicate logic. As we have seen, a sign such as "H( )" is a sign for a function in the strictest sense, as are the conditional and negation connectives. Frege's conditional is not, like the modern connective, something that flanks statements to form a statement. Rather, it flanks terms for truth-values to form a term for a truth-value. Frege's "H(b) → H(a)" is simply a name for the True; by itself it does not assert anything. Therefore, Frege introduces a sign he called the "judgment stroke", ⊢, used to assert that what follows it stands for the True. Thus, while "H(b) → H(a)" is simply a term for a truth-value, "⊢ H(b) → H(a)" asserts that this truth-value is the True, or in this case, that if Boston is human, then Aristotle is human. Moreover, Frege's logical system was second-order. In addition to quantifiers ranging over objects, it also contained quantifiers ranging over first-level functions. Thus, "⊢ ∀x∃F[F(x)]" asserts that every object falls under at least one concept.

Frege's logic took the form of an axiomatic system. In fact, Frege was the first to take a fully axiomatic approach to logic, and the first even to suggest that inference rules ought to be explicitly formulated and distinguished from axioms. He began with a limited number of fixed axioms, introduced explicit inference rules, and aimed to derive all other logical truths (including, for him, the truths of arithmetic) from them. Frege's first logical system, that of the 1879 Begriffsschrift, had nine axioms (one of which was not independent), one explicit inference rule, and also employed a second and third inference rule implicitly. It represented the first axiomatization of logic, and was complete in its treatment of both propositional logic and first-order quantified logic. Unlike Frege's later system, the system of the Begriffsschrift was fully consistent. (It has since been proven impossible to devise a system for higher-order logic with a finite number of axioms that is both complete and consistent.)

In order to make deduction easier, in the 1893 logical system of the Grundgesetze, Frege used fewer axioms and more inference rules: seven and twelve, respectively, this time leaving nothing implicit. The Grundgesetze also expanded upon the system of the Begriffsschrift by adding axioms governing what Frege called the "value-ranges" (Werthverläufe) of functions, understood as objects corresponding to the complete argument-value mappings generated by functions. In the case of concepts, their value-ranges were identified with their extensions. While Frege did sometimes also refer to the extensions of concepts as "classes", he did not conceive of such classes as aggregates or collections. They were simply understood as objects corresponding to the complete argument-value mappings generated by concepts considered as functions. Frege then introduced two axioms dealing with these value-ranges. Most infamous was his Basic Law V, which asserts that the truth-value of the value-range of function F being identical to the value-range of function G is the same as the truth-value of F and G having the same value for every argument. If one conceives of value-ranges as argument-value mappings, then this certainly seems to be a plausible hypothesis. However, from it, it is possible to prove a strong theorem of class membership: that for any object x, that object is in the extension of concept F if and only if the value of F for x as argument is the True. Given that value-ranges themselves are taken to be objects, if the concept in question is that of being an extension that is not included in itself, one can conclude that the extension of this concept is in itself just in case it is not. Therefore, the logical system of the Grundgesetze was inconsistent due to Russell's Paradox. See the entry on Russell's Paradox for more details. However, the core of the system of the Grundgesetze, that is, the system minus the axioms governing value-ranges, is consistent and, like the system of the Begriffsschrift, is complete in its treatment of propositional logic and first-order predicate logic.
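
The contradiction can be made vivid in a toy model. In the Python sketch below (an illustration of the reasoning only, not of Frege's system), the extension of a concept is played by the concept-function itself and membership by application, echoing the class-membership theorem just described; asking whether the Russell "extension" is in itself then demands its own answer before it can be given, and the evaluation never terminates:

    # Let a concept's extension be played by the concept-function itself,
    # and let "x is in the extension of F" be the application F(x).
    def russell(c):
        # The concept: being an extension that is not in itself.
        return not c(c)

    # russell(russell) would be the True just in case it is the False,
    # so the self-application regresses without end.
    try:
        russell(russell)
    except RecursionError:
        print("russell(russell) is true iff it is false: a contradiction")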

Given the extent to which it is taken for granted today, it can be difficult to fully appreciate the truly innovative and radical approach Frege took to logic. Frege was the first to attempt to transcribe the old statements of categorical logic in a language employing variables, quantifiers and truth-functions. Frege was the first to understand a statement such as "all students are hardworking" as saying roughly the same as, "for all values of x, if x is a student, then x is hardworking". This made it possible to capture the logical connection between statements such as "either all students are hardworking or all students are intelligent" and "all students are either hardworking or intelligent" (for example, that the first implies the second). In earlier logical systems such as that of Boole, in which the propositional and quantificational elements were bifurcated, the connection was wholly lost. Moreover, Frege's logical system was the first to be able to capture statements of multiple generality, such as "every person loves some city", by using multiple quantifiers in the same logical formula. This too was impossible in all earlier logical systems. Indeed, Frege's "firsts" in logic are almost too numerous to list. We have seen here that he invented modern quantification theory, presented the first complete axiomatization of propositional and first-order "predicate" logic (the latter of which he invented outright), attempted the first formulation of higher-order logic, presented the first coherent and full analysis of variables and functions, first showed it possible to reduce all truth-functions to negation and the conditional, and made the first clear distinction between axioms and inference rules in a formal system. As we shall see, he also made advances in the logic of mathematics. It is small wonder that he is often heralded as the founder of modern logic.

On Frege's "philosophy of logic", logic is made true by a realm of logical entities. Logical functions, value-ranges, and the truth-values the True and the False, are thought to be objectively real entities, existing apart from the material and mental worlds. (As we shall see below, Frege was also committed to other logical entities such as senses and thoughts.) Logical axioms are true because they express true thoughts about these entities. Thus, Frege denied the popular view that logic is without content and without metaphysical commitment. Frege was also a harsh critic of psychologism in logic: the view that logical truths are truths about psychology. While Frege believed that logic might prescribe laws about how people should think, logic is not the science of how people do think. Logical truths would remain true even if no one believed them nor used them in their reasoning. If humans were genetically designed regularly to use the so-called "inference rule" of affirming the consequent, this would not make it logically valid. What is true or false, valid or invalid, does not depend on anyone's psychology or anyone's beliefs. To think otherwise is to confuse something's being true with something's being-taken-to-be-true.

3. Contributions to the Philosophy of Mathematics

Frege was an ardent proponent of logicism, the view that the truths of arithmetic are logical truths. Perhaps his most important contributions to the philosophy of mathematics were his arguments for this view. He also presented significant criticisms against rival views. We have seen that Frege was a harsh critic of psychologism in logic. He thought similarly about psychologism in mathematics. Numbers cannot be equated with anyone's mental images, nor truths of mathematics with psychological truths. Mathematical truths are objective, not subjective. Frege was also a critic of Mill's view that arithmetical truths are empirical truths, based on observation. Frege pointed out that it is not just observable things that can be counted, and that mathematical truths seem to apply also to unobservable things. On Mill's view, numbers must be taken to be conglomerations of objects. Frege rejects this view for a number of reasons. Firstly, is one conglomeration of two things the same as a different conglomeration of two things, and if not, in what sense are they equal? Secondly, a conglomeration can be seen as made up of a different number of things, depending on how the parts are counted. One deck of cards contains fifty-two cards, but each card consists of a multitude of atoms. There is no one uniquely determined "number" of the whole conglomeration. He also reiterated the arguments of others: that mathematical truths seem apodictic and knowable a priori. He also argued against the Kantian view that arithmetical truths are based on the pure intuition of the succession of time. His main argument against this view, however, was simply his own work in which he showed that truths about the nature of succession and sequence can be proven purely from the axioms of logic.

Frege was also an opponent of formalism, the view that arithmetic can be understood as the study of uninterpreted formal systems. While Frege's logical language represented a kind of formal system, he insisted that his formal system was important only because of what its signs represent and its propositions mean. The signs themselves, independently of what they mean, are unimportant. To suggest that mathematics is the study simply of the formal system is, in Frege's eyes, to confuse the sign and the thing signified. To suggest that arithmetic is the study of formal systems also suggests, absurdly, that the formula "5 + 7 = 12", written in Arabic numerals, is not the same truth as the formula "V + VII = XII", written in Roman numerals. Frege suggests also that this confusion would have the absurd result that numbers simply are the numerals, the signs on the page, and that we should be able to study their properties with a microscope.

Frege suggests that rival views are often the result of attempting to understand the meaning of number terms in the wrong way, for example, in attempting to understand their meaning independently of the contexts in which they appear in sentences. If we are simply asked to consider what "two" means independently of the context of a sentence, we are likely to simply imagine the numeral "2", or perhaps some conglomeration of two things. Thus, in the Grundlagen, Frege espouses his famous context principle, to "never ask for the meaning of a word in isolation, but only in the context of a proposition." The Grundlagen is an earlier work, written before Frege had made the distinction between sense and reference (see below). It is an active matter of debate and discussion to what extent and how this principle coheres with Frege's later theory of meaning, but what is clear is that it plays an important role in his own philosophy of mathematics as described in the Grundlagen.

According to Frege, if we look at the contexts in which number words usually occur in a proposition, they appear as part of a sentence about a concept, specifically, as part of an expression that tells us how many times a certain concept is instantiated. Consider, for example, "I have six cards in my hand" or "There are 11 members of Congress from Wisconsin." These propositions seem to tell us how many times the concepts of being a card in my hand and being a member of Congress from Wisconsin are instantiated. Thus, Frege concludes that statements about numbers are statements about concepts. This insight was very important for Frege's case for logicism, as Frege was able to show that it is possible to define what it means for a concept to be instantiated a certain number of times purely logically by making use of quantifiers and identity. To say that the concept F is instantiated zero times is to say that there are no objects that instantiate F, or, equivalently, that everything does not instantiate F. To say that F is instantiated one time is to say there is an object x that instantiates F, and that for all objects y, either y does not instantiate F or y is x. To say that F is instantiated twice is to say that there are two objects, x and y, each of which instantiates F, but which are not the same as each other, and for all z, either z does not instantiate F, or z is x or z is y. One could then consider numbers as "second-level concepts", or concepts of concepts, which can be defined in purely logical terms. (For more on the distinction of levels of concepts, see above.)
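
These definitions use nothing beyond quantifiers, truth-functions, and identity, so they can be transcribed almost word for word. A minimal Python sketch over an assumed finite domain (the domain and the sample concepts are illustrative only):

    DOMAIN = ["Alice", "Bob", "Carol"]  # an illustrative toy domain

    def zero_times(F):
        # Everything fails to instantiate F.
        return all(not F(x) for x in DOMAIN)

    def exactly_once(F):
        # Some x instantiates F, and every y either fails to instantiate
        # F or just is x.
        return any(F(x) and all(not F(y) or y == x for y in DOMAIN)
                   for x in DOMAIN)

    def exactly_twice(F):
        # Distinct x and y instantiate F, and every z instantiating F is
        # x or is y.
        return any(F(x) and F(y) and x != y and
                   all(not F(z) or z == x or z == y for z in DOMAIN)
                   for x in DOMAIN for y in DOMAIN)

    print(zero_times(lambda x: False))            # True
    print(exactly_once(lambda x: x == "Alice"))   # True
    print(exactly_twice(lambda x: x != "Carol"))  # True: Alice and Bob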

Frege, however, does not leave his analysis of numbers there. Understanding number-claims as involving second-level concepts does give us some insight into the nature of numbers, but it cannot be left at this. Mathematics requires that numbers be treated as objects, and that we be able to provide a definition of the number "two" simpliciter, without having to speak of two Fs. For this purpose, Frege appeals to his theory of the value-ranges of concepts. On the notion of a value-range, see above. We saw above that we can gain some understanding of number claims as involving second-level concepts, or concepts of concepts. In order to find a definition of numbers as objects, Frege treats them instead as value-ranges of value-ranges. Exactly how, however, are they to be understood?

Frege notes that we have an understanding of what it means to say that there are the same number of Fs as there are Gs. It is to say that there is a one-one mapping between the objects that instantiate F and the objects instantiating G, i.e. that there is some function f from entities that instantiate F onto entities that instantiate G such that there is a different F for every G, and a different G for every F, with none left over. (In this, Frege's views on the nature of cardinality were in part anticipated by Georg Cantor.) However, we must bear in mind that the propositions:

(1) There are equally many Fs as there are Gs.
(2) The number of Fs = the number of Gs.

must obviously have the same truth-value, as they seem to express the same fact. We must, therefore, look for a way of understanding the phrase "the number of Fs" that occurs in (2) that makes clear how and why the whole proposition will be true or false for the same reason as (1) is true or false. Frege's suggestion is that "the number of Fs" means the same as "the value-range of the concept being a value-range of a concept instantiated equally many times as F." This means that the number of Fs is a certain value-range, containing value-ranges, and in particular, all those value-ranges that have as many members as there are Fs. Then (2) is understood as saying the same as "the value-range of the concept being a value-range of a concept instantiated equally many times as F = the value-range of the concept being a value-range of a concept instantiated equally many times as G", which will be true if and only if there are equally many Fs as Gs, i.e. if every value-range of a concept instantiated equally many times as F is also a value-range of a concept instantiated equally many times as G.
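
Restricted to a finite domain, this criterion can be checked mechanically by exhibiting a pairing of the Fs with the Gs with none left over. A hedged Python sketch (the domain and the two concepts are illustrative assumptions, and Frege's criterion is of course not confined to finite cases):

    DOMAIN = ["Alice", "Bob", "Carol", "Dave"]

    def pairing(F, G):
        # A one-one correspondence between the Fs and the Gs, if one
        # exists: a different G for every F, with none left over.
        fs = [x for x in DOMAIN if F(x)]
        gs = [x for x in DOMAIN if G(x)]
        return list(zip(fs, gs)) if len(fs) == len(gs) else None

    def equinumerous(F, G):
        # "There are equally many Fs as Gs": some one-one mapping exists.
        return pairing(F, G) is not None

    F = lambda x: x in {"Alice", "Bob"}
    G = lambda x: x in {"Carol", "Dave"}
    print(equinumerous(F, G))  # True: (1) holds, and with it (2)
    print(pairing(F, G))       # [('Alice', 'Carol'), ('Bob', 'Dave')]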

To give some examples, if there are zero Fs, then the number of Fs, i.e. zero, is the value-range consisting of all value-ranges with no members. Recall that for Frege, classes are identified with value-ranges of concepts. (See above.) To rephrase the same point in terms of classes, zero is the class of all classes with no members. Since there is only one such class, zero is the class containing only the empty class. If there is one F, then the number of Fs, i.e. one, is the class consisting of all classes with one member (the extensions of concepts instantiated once). Here we can see the connection with the understanding of number expressions as being statements about concepts. Rather than understanding zero as the concept a concept has just in case it is not instantiated, zero is understood as the value-range consisting of value-ranges of concepts that are not instantiated. Rather than understanding one as the concept a concept has just in case it is instantiated by a unique object, it is understood as the value-range consisting of value-ranges of concepts instantiated by unique objects. This allows us to understand numbers as abstract objects, and provide a clear definition of the meaning of number signs in arithmetic such as "1", "2", "3", etc.

Some of Frege's most brilliant work came in providing definitions of the natural numbers in his logical language, and in proving some of their properties therein. After laying out the basic laws of logic, and defining axioms governing the truth-functions and value-ranges, etc., Frege begins by defining a relation that holds between two value-ranges just in case they are the value-ranges of concepts instantiated equally many times. This relation holds between value-ranges just in case they are the same size, i.e. just in case there is a one-one correspondence between the entities that fall under their concepts. Using this, he then defines a function that takes a value-range as argument and yields as value the value-range consisting of all value-ranges the same size as it. The number zero is then defined as the value-range consisting of all value-ranges the same size as the value-range of the concept being non-self-identical. Since this concept is not instantiated, zero is defined as the value-range of all value-ranges with no members, as described above. There is only one such number zero; hence the concept of being identical to zero is instantiated exactly once. Frege then uses this to define one. One is defined as the value-range of all value-ranges equal in size to the value-range of the concept being identical to zero. Having defined one in this way, Frege is able to define two. He has already defined one and zero; they are each unique, but different from each other. Therefore, two can be defined as the value-range of all value-ranges equal in size to the value-range of the concept being identical to zero or identical to one. Frege is able to define all natural numbers in this way, and indeed, prove that there are infinitely many of them. Each natural number can be defined in terms of the previous one: for each natural number n, its successor (n + 1) can be defined as the value-range of all value-ranges equal in size to the value-range of the concept being identical to one of the numbers from zero to n inclusive.
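
Over a toy finite domain these definitions can be imitated step by step, with frozensets playing the role of extensions and each number identified with the class of all extensions of the same size. The Python sketch below is an analogy under that finite-domain assumption, not Frege's construction: on a domain of n objects the toy successor gives out after n, whereas Frege's definitions yield infinitely many numbers.

    from itertools import chain, combinations

    DOMAIN = frozenset({"a", "b", "c"})  # an illustrative toy domain

    # Every subset of the domain stands in for the extension of a concept.
    EXTENSIONS = [frozenset(s) for s in chain.from_iterable(
        combinations(sorted(DOMAIN), k) for k in range(len(DOMAIN) + 1))]

    def number_of(ext):
        # The class of all extensions equinumerous with ext.
        return frozenset(e for e in EXTENSIONS if len(e) == len(ext))

    zero = number_of(frozenset())      # the class of all memberless extensions
    one = number_of(frozenset({"a"}))  # the class of one-membered extensions

    def successor(n):
        # The class of extensions with one more member than those in n
        # (partial: it gives out once the toy domain is exhausted).
        size = len(next(iter(n)))
        return number_of(frozenset(sorted(DOMAIN)[: size + 1]))

    two = successor(one)
    print(two == number_of(frozenset({"a", "b"})))  # True
    print(len(zero), len(one), len(two))            # 1 3 3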

In the Begriffsschrift, Frege had already been able to prove certain results regarding series and sequences, and was able to define the ancestral of a relation. To understand the ancestral of a relation, consider the example of the relation of being the child of. A person x bears this relation to y just in case x is y's child. However, x falls in the ancestral of this relation with respect to y just in case x is the child of y, or is the child of y's child, or is the child of y's child's child, etc. Frege was able to define the ancestral of relations logically even in his early work. He put this to use in the Grundgesetze to define the natural numbers. We have seen how the notion of successorship can be defined for Frege, i.e. the relation n + 1 bears to n. The natural numbers can be defined as the value-range of all value-ranges that fall under the ancestral of the successor relation with respect to zero. The natural numbers then consist of zero, the successor of zero (one), the successor of the successor of zero (two), and so on ad infinitum. Frege was then able to use this definition of the natural numbers to provide a logical analysis of mathematical induction, and prove that mathematical induction can be used validly to demonstrate the properties of the natural numbers, an extremely important result for making good on his logicist ambitions. Frege could then use mathematical induction to prove some of the basic laws of the natural numbers. Frege next turned his logicist method to an analysis of integers (including negative numbers) and then to the real numbers, defining them using the natural numbers and certain relations holding between them. We need not dwell on the details of this work here.
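
In modern terms, the ancestral of a relation is its transitive closure, and over a finite domain it can be computed by iterating the relation until nothing new is added. A hedged Python sketch under that finite-domain assumption (Frege's definition is purely logical and needs no such restriction):

    def ancestral(R, domain):
        # x bears the (strong) ancestral of R to y iff x bears R to y, or
        # x bears R to something that bears the ancestral of R to y.
        pairs = {(x, y) for x in domain for y in domain if R(x, y)}
        while True:
            new = {(x, z) for (x, y) in pairs for (w, z) in pairs if y == w}
            if new <= pairs:  # closure reached: nothing new to add
                return lambda x, y: (x, y) in pairs
            pairs |= new

    numbers = range(6)              # an illustrative initial segment
    succ = lambda m, n: m == n + 1  # m is the successor of n
    anc = ancestral(succ, numbers)
    print(anc(3, 0))  # True: 3 is reached from 0 by repeated succession
    print(anc(0, 3))  # False
    # Zero together with everything bearing the ancestral of succession
    # to zero: the natural numbers, within this toy domain.
    print([n for n in numbers if n == 0 or anc(n, 0)])  # [0, 1, 2, 3, 4, 5]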

Frege's approach to providing a logical analysis of cardinality, the natural numbers, infinity and mathematical induction was groundbreaking, and has had a lasting importance within mathematical logic. Indeed, prior to 1902, it must have seemed to him that he had been completely successful in showing that the basic laws of arithmetic could be understood purely as logical truths. However, as we have seen, Frege's definition of numbers heavily involves the notion of classes or value-ranges, and his logical treatment of them was shown to be inconsistent by Russell's paradox. This presented a serious problem for Frege's logicist approach. Another heavy blow came after Frege's death. In 1931, Kurt Gödel discovered his famous incompleteness proof to the effect that there can be no consistent formal system with a finite number of axioms in which it is possible to derive all of the truths of arithmetic. This dealt a serious blow to more ambitious forms of logicism, such as Frege's, which aimed to provide precisely the sort of system Gödel showed impossible. Nevertheless, it cannot be denied that Frege's work in the philosophy of mathematics was important and insightful.

4. The Theory of Sense and Reference

Frege's influential theory of meaning, the theory of sense (Sinn) and reference (Bedeutung), was first outlined, albeit briefly, in his article "Funktion und Begriff" of 1891, and was expanded and explained in greater detail in perhaps his most famous work, "Über Sinn und Bedeutung" of 1892. In "Funktion und Begriff", the distinction between the sense and reference of signs in language is first made in regard to mathematical equations. During Frege's time, there was a widespread dispute among mathematicians as to how the sign "=" should be understood. If we consider an equation such as "4 x 2 = 11 - 3", a number of Frege's contemporaries, for a variety of reasons, were wary of viewing this as an expression of an identity, or, in this case, as the claim that 4 x 2 and 11 - 3 are one and the same thing. Instead, they posited some weaker form of "equality" such that the numbers 4 x 2 and 11 - 3 would be said to be equal in number or equal in magnitude without thereby constituting one and the same thing. In opposition to the view that "=" signifies identity, such thinkers would point out that 4 x 2 and 11 - 3 cannot in all ways be thought to be the same. The former is a product, the latter a difference, etc.

In his mature period, however, Frege was an ardent opponent of this view, and argued in favor of understanding "=" as identity proper, accusing rival views of confusing form and content. He argues instead that expressions such as "4 x 2" and "11 - 3" can be understood as standing for one and the same thing, the number eight, but that this single entity is determined or presented differently by the two expressions. Thus, he makes a distinction between the actual number a mathematical expression such as "4 x 2" stands for, and the way in which that number is determined or picked out. The former he called the reference (Bedeutung) of the expression, and the latter was called the sense (Sinn) of the expression. In Fregean terminology, an expression is said to express its sense, and denote or refer to its reference.

The distinction between reference and sense was expanded, primarily in "Über Sinn und Bedeutung" as holding not only for mathematical expressions, but for all linguistic expressions (whether the language in question is natural language or a formal language). One of his primary examples therein involves the expressions "the morning star" and "the evening star". Both of these expressions refer to the planet Venus, yet they obviously denote Venus in virtue of different properties that it has. Thus, Frege claims that these two expressions have the same reference but different senses. The reference of an expression is the actual thing corresponding to it, in the case of "the morning star", the reference is the planet Venus itself. The sense of an expression, however, is the "mode of presentation" or cognitive content associated with the expression in virtue of which the reference is picked out.

Frege puts the distinction to work in solving a puzzle concerning identity claims. If we consider the two claims:

(1) the morning star = the morning star

(2) the morning star = the evening star

The first appears to be a trivial case of the law of self-identity, knowable a priori, while the second seems to be something that was discovered a posteriori by astronomers. However, if "the morning star" means the same thing as "the evening star", then the two statements themselves would also seem to have the same meaning, both involving a thing's relation of identity to itself. However, it then becomes difficult to explain why (2) seems informative while (1) does not. Frege's response to this puzzle, given the distinction between sense and reference, should be apparent. Because the reference of "the evening star" and "the morning star" is the same, both statements are true in virtue of the same object's relation of identity to itself. However, because the senses of these expressions are different--in (1) the object is presented the same way twice, and in (2) it is presented in two different ways--it is informative to learn of (2). While the truth of an identity statement involves only the references of the component expressions, the informativity of such statements involves additionally the way in which those references are determined, i.e. the senses of the component expressions.

So far we have only considered the distinction as it applies to expressions that name some object (including abstract objects, such as numbers). For Frege, the distinction applies also to other sorts of expressions and even whole sentences or propositions. If the sense/reference distinction can be applied to whole propositions, it stands to reason that the reference of the whole proposition depends on the references of the parts and the sense of the proposition depends on the senses of the parts. (At some points, Frege even suggests that the sense of a whole proposition is composed of the senses of the component expressions.) In the example considered in the previous paragraph, it was seen that the truth-value of the identity claim depends on the references of the component expressions, while the informativity of what was understood by the identity claim depends on the senses. For this and other reasons, Frege concluded that the reference of an entire proposition is its truth-value, either the True or the False. The sense of a complete proposition is what it is we understand when we understand a proposition, which Frege calls "a thought" (Gedanke). Just as the sense of a name of an object determines how that object is presented, the sense of a proposition determines a method of determination for a truth-value. The propositions "2 + 4 = 6" and "the Earth rotates" both have the True as their references, though this is in virtue of very different conditions holding in the two cases, just as "the morning star" and "the evening star" refer to Venus in virtue of different properties.

In "Über Sinn und Bedeutung", Frege limits his discussion of the sense/reference distinction to "complete expressions" such as names purporting to pick out some object and whole propositions. However, in other works, Frege makes it quite clear that the distinction can also be applied to "incomplete expressions", which include functional expressions and grammatical predicates. These expressions are incomplete in the sense that they contain an "empty space", which, when filled, yields either a complex name referring to an object, or a complete proposition. Thus, the incomplete expression "the square root of ( )" contains a blank spot, which, when completed by an expression referring to a number, yields a complex expression also referring to a number, e.g., "the square root of sixteen". The incomplete expression, "( ) is a planet" contains an empty place, which, when filled with a name, yields a complete proposition. According to Frege, the references of these incomplete expressions are not objects but functions. Objects (Gegenstände), in Frege's terminology, are self-standing, complete entities, while functions are essentially incomplete, or as Frege says, "unsaturated" (ungesättigt) in that they must take something else as argument in order to yield a value. The reference of the expression "square root of ( )" is thus a function, which takes numbers as arguments and yields numbers as values. The situation may appear somewhat different in the case of grammatical predicates. However, because Frege holds that complete propositions, like names, have objects as their references, and in particular, the truth-values the True or the False, he is able to treat predicates also as having functions as their references. In particular, they are functions mapping objects onto truth-values. The expression, "( ) is a planet" has as its reference a function that yields as value the True when saturated by an object such as Saturn or Venus, but the False when saturated by a person or the number three. Frege calls such a function of one argument place that yields the True or False for every possible argument a "concept" (Begriff), and calls similar functions of more than one argument place (such as that denoted by "( ) > ( )", which is doubly in need of saturation), "relations".

It is clear that functions are to be understood as the references of incomplete expressions, but what of the senses of such expressions? Here, Frege tells us relatively little save that they exist. There is some amount of controversy among interpreters of Frege as to how they should be understood. It suffices here to note that just as the same object (e.g. the planet Venus) can be presented in different ways, so also can a function be presented in different ways. While "identity", as Frege uses the term, is a relation holding only between objects, Frege believes that there is a relation similar to identity that holds between functions just in case they always share the same value for every argument. Since all and only those things that have hearts have kidneys, strictly speaking, the concepts denoted by the expressions "( ) has a heart" and "( ) has a kidney" are one and the same. Clearly, however, these expressions do not present that concept in the same way. For Frege, these expressions would have different senses but the same reference. Frege also tells us that it is the incomplete nature of these senses that provides the "glue" holding together the thoughts of which they form a part.

Frege also uses the distinction to solve what appears to be a difficulty with Leibniz's law with regard to identity. This law was stated by Leibniz as, "those things are the same of which one can be substituted for another without loss of truth," a sentiment with which Frege was in full agreement. As Frege understands this, it means that if two expressions have the same reference, they should be able to replace each other within any proposition without changing the truth-value of that proposition. Normally, this poses no problem. The inference from:

(3) The morning star is a planet.

to the conclusion:

(4) The evening star is a planet.

in virtue of (2) above and Leibniz's law is unproblematically valid. However, there seem to be some serious counterexamples to this principle. We know for example that "the morning star" and "the evening star" have the same customary reference. However, it is not always true that they can replace one another without changing the truth of a sentence. For example, if we consider the propositions:

(5) Gottlob believes that the morning star is a planet.

(6) Gottlob believes that the evening star is a planet.

If we assume that Gottlob does not know that the morning star is the same heavenly body as the evening star, (5) may be true while (6) is false, or vice versa.

Frege meets this challenge to Leibniz's law by making a distinction between what he calls the primary and secondary references of expressions. Frege suggests that when expressions appear in certain unusual contexts, they have as their references what is customarily their senses. In such cases, the expressions are said to have their secondary references. Typically, such cases involve what Frege calls "indirect speech" or "oratio obliqua", as in the case of statements of beliefs, thoughts, desires and other so-called "propositional attitudes", such as the examples of (5) and (6). However, expressions also have their secondary references (for reasons which should already be apparent) in contexts such as "it is informative that..." or "... is analytically true".

Let us consider the examples of (5) and (6) more closely. To Frege's mind, these statements do not deal directly with the morning star and the evening star itself. Rather, they involve a relation between a believer and a thought believed. Thoughts, as we have seen, are the senses of complete propositions. Beliefs depend for their make-up on how certain objects and concepts are presented, not only on the objects and concepts themselves. The truth of belief claims, therefore, will depend not on the customary references of the component expressions of the stated belief, but on their senses. Since the truth-value of the whole belief claim is the reference of that belief claim, and the reference of any proposition, for Frege, depends on the references of its component expressions, we are led to the conclusion that the typical senses of expressions that appear in oratio obliqua are in fact the references of those expressions when they appear in that context. Such contexts can be referred to as "oblique contexts", contexts in which the reference of an expression is shifted from its customary reference to its customary sense.

In this way, Frege is able to retain his commitment to Leibniz's law. The expressions "the morning star" and "the evening star" have the same primary reference, and in any non-oblique context, they can replace each other without changing the truth-value of the proposition. However, since the senses of these expressions are not the same, they cannot replace each other in oblique contexts, because in such contexts, their references are non-identical.
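
The bookkeeping of this reference shift can be summarized in a small sketch: model an expression as a pair of its customary sense and customary reference, and let oblique contexts shift the reference to the sense. In the Python sketch below the senses are crude string placeholders for Frege's abstract modes of presentation; the whole is an illustration of the mechanism, not a theory of it:

    # An expression, modeled by its customary sense and customary reference.
    morning_star = {"sense": "brightest body in the morning sky",
                    "reference": "Venus"}
    evening_star = {"sense": "brightest body in the evening sky",
                    "reference": "Venus"}

    def referent(expr, oblique=False):
        # In an oblique context (e.g. inside a belief report), an
        # expression's reference shifts to its customary sense.
        return expr["sense"] if oblique else expr["reference"]

    # Ordinary contexts: same reference, so substitution preserves truth.
    print(referent(morning_star) == referent(evening_star))  # True
    # Oblique contexts: the shifted references differ, so substitution
    # of one expression for the other is blocked.
    print(referent(morning_star, oblique=True) ==
          referent(evening_star, oblique=True))              # False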

Frege ascribes to senses and thoughts objective existence. In his mind, they are objects every bit as real as tables and chairs. Their existence is not dependent on language or the mind. Instead, they are said to exist in a timeless "third realm" of sense, existing apart from both the mental and the physical. Frege concludes this because, although senses are obviously not physical entities, their existence likewise does not depend on any one person's psychology. A thought, for example, has a truth-value regardless of whether or not anyone believes it and even whether or not anyone has grasped it at all. Moreover, senses are interpersonal. Different people are able to grasp the same senses and same thoughts and communicate them, and it is even possible for expressions in different languages to express the same sense or thought. Frege concludes that they are abstract objects, incapable of full causal interaction with the physical world. They are actual only in the very limited sense that they can have an effect on those who grasp them, but are themselves incapable of being changed or acted upon. They are neither created by our uses of language or acts of thinking, nor destroyed by their cessation.

Unfortunately, Frege does not tell us very much about exactly how these abstract objects pick out or present their references. Exactly what is it that makes a sense a "way of determining" or "mode of presenting" a reference? In the wake of Russell's theory of descriptions, a Fregean sense is often interpreted as a set of descriptive information or criteria that picks out its reference in virtue of the reference alone satisfying or fitting that descriptive information. In giving examples, Frege implies that a person might attach to the name "Aristotle" the sense the pupil of Plato and teacher of Alexander the Great. This sense picks out Aristotle the person because he alone matches this description. Here, care must be taken to avoid misunderstanding. The sense of the name "Aristotle" is not the words "the pupil of Plato and teacher of Alexander the Great"; to repeat, senses are not linguistic items. It is rather that the sense consists in some set of descriptive information, and this information is best described by a descriptive phrase of this form. The property of being the pupil of Plato and teacher of Alexander is unique to Aristotle, and thus it may be in virtue of associating this information with the name "Aristotle" that the name can be used to refer to Aristotle. As certain commentators have noted, it is not even necessary that the sense of the name be expressible by some descriptive phrase, because the descriptive information or properties in virtue of which the reference is determined may not be directly nameable in any natural language.

From this standpoint, it is easy to understand how there might be senses that do not pick out any reference. Names such as "Romulus" or "Odysseus", and phrases such as "the least rapidly converging series" or "the present King of France", express senses, insofar as they lay out criteria that things would have to satisfy if they were to be the references of these expressions. However, there are no things which do in fact satisfy these criteria. Therefore, these expressions are meaningful, but do not have references. Because the sense of a whole proposition is determined by the senses of the parts, and the reference of a whole proposition is determined by the references of the parts, Frege claims that propositions in which such expressions appear are able to express thoughts, but are neither true nor false, because no references are determined for them.

This interpretation of the nature of senses makes Frege a forerunner to what has since come to be known as the "descriptivist" theory of meaning and reference in the philosophy of language. The view that the sense of a proper name such as "Aristotle" could be descriptive information as simple as the pupil of Plato and teacher of Alexander the Great, however, has been harshly criticized by many philosophers, perhaps most notably by Saul Kripke. Kripke points out that this would make a claim such as "Aristotle taught Alexander" seem to be a necessary and analytic truth, which it does not appear to be. Moreover, he claims that many of us seem to be able to use a name to refer to an individual even if we are unaware of any properties uniquely held by that individual. For example, many of us don't know enough about the physicist Richard Feynman to be able to identify a property differentiating him from other prominent physicists such as Murray Gell-Mann, but we still seem to be able to refer to Feynman with the name "Feynman". John Searle, Michael Dummett and others, however, have proposed ways of expanding or altering Frege's notion of a sense to circumvent Kripke's worries. This has led to a very important debate in the philosophy of language, which, unfortunately, we cannot fully discuss here.

5. References and Further Reading

a. Frege's Own Works

  • "Antwort auf die Ferienplauderei des Herrn Thomae." Jahresbericht der Deutschen Mathematiker-Vereinigung 15 (1906): 586-90. Translated as "Reply to Thomae's Holiday Causerie." In Collected Papers on Mathematics, Logic and Philosophy [CP], 341-5. Translated by M. Black, V. Dudman, P. Geach, H. Kaal, E.-H. W. Kluge, B. McGuinness and R. H. Stoothoff. New York: Basil Blackwell, 1984.
  • "Über Begriff und Gegenstand." Vierteljahrsschrift für wissenschaftliche Philosophie 16 (1892): 192-205. Translated as "On Concept and Object." In >CP 182-94. Also in The Frege Reader [FR], 181-93. Edited by Michael Beaney. Oxford: Blackwell, 1997. And In Translations from the Philosophical Writings of Gottlob Frege [TPW], 42-55. 3d ed. Edited by Peter Geach and Max Black. Oxford: Blackwell, 1980.
  • Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens. Halle: L. Nebert, 1879. Translated as Begriffsschrift, a Formula Language, Modeled upon that of Arithmetic, for Pure Thought. In From Frege to Gödel, edited by Jean van Heijenoort. Cambridge, MA: Harvard University Press, 1967. Also as Conceptual Notation and Related Articles. Edited and translated by Terrell W. Bynum. London: Oxford University Press, 1972.
  • "Über die Begriffsschrift des Herrn Peano und meine eigene." Verhandlungen der Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig 48 (1897): 362-8. Translated as "On Mr. Peano's Conceptual Notation and My Own." In CP 234-48.
  • "Über formale Theorien der Arithmetik." Sitzungsberichte der Jenaischen Gesellschaft für Medizin und Naturwissenschaft 19 (1885): 94-104. Translated as "On Formal Theories of Arithmetic." In CP 112-21.
  • Funktion und Begriff. Jena: Hermann Pohle, 1891. Translated as "Function and Concept." In CP 137-56, TPW 21-41 and FR 130-48.
  • "Der Gedanke." Beträge zur Philosophie des deutschen Idealismus 1 (1918-9): 58-77. Translated as "Thoughts." In CP 351-72. Also as part I of Logical Investigations [LI], edited by P. T. Geach. Oxford: Blackwell, 1977. And as "Thought." In FR 325-45.
  • "Gedankengefüge." Beträge zur Philosophie des deutschen Idealismus 3 (1923): 36-51. Translated as "Compound Thoughts." In CP 390-406, and as part III of LI.
  • Über eine geometrische Darstellung der imaginären Gebilde in der Ebene. Ph. D. Dissertation: University of Göttingen, 1873. Translated as "On a Geometrical Representation of Imaginary Forms in the Plane." In CP 1-55.
  • Grundgesetze der Arithmetik. 2 vols. Jena: Hermann Pohle, 1893-1903. Translated in part as The Basic Laws of Arithmetic: Exposition of the System. Edited and translated by Montgomery Furth. Berkeley: University of California Press, 1964.
  • "Über die Grundlagen der Geometrie." Jahresbericht der Deutschen Mathematiker-Vereinigung 12 (1903): 319-24, 368-75, 15 (1906): 293-309, 377-403, 423-30. Translated as "On the Foundations of Geometry." In CP 273-340. Also as On the Foundations of Geometry and Formal Theories of Arithmetic. Translated by Eike-Henner W. Kluge. New York: Yale University Press, 1971.
  • Die Grundlagen der Arithmetik, eine logisch mathematische Untersuchung über den Begriff der Zahl. Breslau: W. Koebner, 1884. Translated as The Foundations of Arithmetic: A Logico-Mathematical Enquiry into the Concept of Number. 2d ed. Translated by J. L. Austin. Oxford: Blackwell, 1953.
  • "Kritische Beleuchtung einiger Punkte in E. Schröders Vorlesungen über die Algebra der Logik." Archiv für systematsche Philosophie 1 (1895): 433-56. Translated as "A Critical Elucidation of Some Points in E. Schröder, Vorlesungen über die Algebra der Logik." In CP 210-28, and TPW 86-106.
  • Nachgelassene Schriften. Hamburg: Felix Meiner, 1969. Translated as Posthumous Writings. Translated by Peter Long and Roger White. Chicago: University of Chicago Press, 1979.
  • "Le nombre entier." Revue de Métaphysique et de Morale 3 (1895): 73-8. Translated as "Whole Numbers." In CP 229-33.
  • Rechnungsmethoden, die auf eine Erweiterung des Grössenbegriffes gründen. Habilitationsschrift: University of Jena, 1874. Translated as "Methods of Calculation based on an Extension of the Concept of Quantity." In CP 56-92.
  • Review of Zur Lehre vom Transfiniten, by Georg Cantor. Zeitschrift für Philosophie und philosophische Kritik 100 (1892): 269-72. Translated in CP 178-181.
  • Review of Philosophie der Arithmetik, by Edmund Husserl. Zeitschrift für Philosophie und philosophische Kritik 103 (1894): 313-32. Translated in CP 195-209.
  • "Über Sinn und Bedeutung." Zeitschrift für Philosophie und philosophische Kritik 100 (1892): 25-50. Translated as "On Sense and Meaning." In CP 157-77. As "On Sinn and Bedeutung." In FR 151-71. And as "On Sense and Reference." In TPW 56-78.
  • "Über das Trägheitsgesetz." Zeitschrift für Philosophie und philosophische Kritik 98 (1891): 145-61. Translated as "On the Law of Inertia." In CP 123-36.
  • "Die Unmöglichkeit der Thomaeschen formalen Arithmetik aus Neue nachgewiesen." Jahresbericht der Deutschen Mathematiker-Vereinigung 17 (1908): 52-5. Translated as "Renewed Proof of the Impossibility of Mr. Thomae's Formal Arithmetic." In CP 346-50.
  • "Der Verneinung." Beträge zur Philosophie des deutschen Idealismus 1 (1918-9): 143-57. Translated as "Negation." In CP 373-89, part II of LI, and FR 346-61.
  • "Was ist ein Funktion?" In Festschrift Ludwig Boltzmann gewidmet zum sechzigsten Geburtstage, 656-66. Leipzig: Amrosius Barth, 1904. Translated as "What is a Function?" In CP 285-92, and TPW 285-92.
  • Wissenschaftlicher Briefwechsel. Hamburg: Felix Meiner, 1976. Translated as Philosophical and Mathematical Correspondence. Translated by Hans Kaal. Chicago: University of Chicago Press, 1980.
  • Über die Zahlen des Herrn H. Schubert. Jena: Hermann Pohle, 1899. Translated as "On Mr. H. Schubert's Numbers." In CP 249-72.

b. Important Secondary Works

  • Angelelli, Ignacio. Studies on Gottlob Frege and Traditional Philosophy. Dordrecht: D. Reidel, 1967.
  • Baker, G. P. and P. M. S. Hacker. Frege: Logical Excavations. New York: Oxford University Press, 1984.
  • Beaney, Michael. Frege: Making Sense. London: Duckworth, 1996.
  • Beaney, Michael. Introduction to The Frege Reader, by Gottlob Frege. Oxford: Blackwell, 1997.
  • Bell, David. Frege's Theory of Judgment. New York: Oxford University Press, 1979.
  • Bynum, Terrell W. "On the Life and Work of Gottlob Frege. " Introduction to Conceptual Notation and Related Articles, by Gottlob Frege. London: Oxford University Press, 1972.
  • Carl, Wolfgang. Frege's Theory of Sense and Reference. Cambridge: Cambridge University Press, 1994.
  • Carnap, Rudolph. Meaning and Necessity. 2d ed. Chicago: University of Chicago Press, 1956.
  • Church, Alonzo. "A Formulation of the Logic of Sense and Denotation." In Structure, Method and Meaning: Essays in Honor of Henry M. Sheffer, edited by P. Henle, H. Kallen and S. Langer, 3- 24. New York: Liberal Arts Press, 1951.
  • Currie, Gregory. Frege: An Introduction to His Philosophy. Totowa, NJ: Barnes and Noble, 1982.
  • Dummett, Michael. Frege: Philosophy of Language. 2d ed. Cambridge, MA: Harvard University Press, 1981.
  • Dummett, Michael. Frege: Philosophy of Mathematics. Cambridge, MA: Harvard University Press, 1991.
  • Dummett, Michael. Frege and Other Philosophers. Oxford: Oxford University Press, 1991.
  • Dummett, Michael. The Interpretation of Frege's Philosophy. Cambridge, MA: Harvard University Press, 1981.
  • Geach, Peter T. "Frege." In Three Philosophers, edited by G. E. M. Anscombe and P. T. Geach, 127-62. Oxford: Oxford University Press, 1961.
  • Gödel, Kurt. "On Formally Undecidable Propositions of Principia Mathematica and Related Systems I." In From Frege to Gödel, edited by Jan van Heijenoort, 596-616. Cambridge, MA: Harvard University Press, 1967. Originally published as "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I." Monatshefte für Mathematik und Physik 38 (1931): 173-98.
  • Grossmann, Reinhardt. Reflections on Frege's Philosophy. Evanston: Northwestern University Press, 1969.
  • Haaparanta, Leila and Jaakko Hintikka, eds. Frege Synthesized. Boston: D. Reidel, 1986.
  • Kaplan, David. "Quantifying In." Synthese 19 (1968): 178-214.
  • Klemke, E. D., ed. Essays on Frege. Urbana: University of Illinois Press, 1968.
  • Kluge, Eike-Henner W. The Metaphysics of Gottlob Frege. Boston: Martinus Nijhoff, Boston, 1980.
  • Kneale, William and Martha Kneale. The Development of Logic. London: Oxford University Press, 1962.
  • Kripke, Saul. Naming and Necessity. Cambridge, MA: Harvard University Press, 1980. First published in Semantics of Natural Languages. Edited by Donald Davidson and Gilbert Harman. Dordrecht: D. Reidel, 1972.
  • Linsky, Leonard. Oblique Contexts. Chicago: University of Chicago Press, 1983.
  • Resnik, Michael D. Frege and the Philosophy of Mathematics. Ithaca: Cornell University Press, 1980.
  • Ricketts, Thomas G., ed. The Cambridge Companion to Frege. Cambridge: Cambridge University Press, forthcoming.
  • Russell, Bertrand. "The Logical and Arithmetical Doctrines of Frege." In The Principles of Mathematics, Appendix A. 1903. 2d. ed. Reprint, New York: W. W. Norton & Company, 1996.
  • Russell, Bertrand. "On Denoting." Mind 14 (1905): 479-93.
  • Salmon, Nathan. Frege's Puzzle. Cambridge: MIT Press, 1986.
  • Schirn. Matthias, ed. Logik und Mathematik: Frege Kolloquium 1993. Hawthorne: de Gruyter, 1995.
  • Schirn. Matthias, ed. Studien zu Frege. 3 vols. Stuttgart-Bad Cannstatt: Verlag-Holzboog, 1976.
  • Searle, John R. Intentionality: An Essay in the Philosophy of Mind. Cambridge: Cambridge University Press, 1983.
  • Sluga, Hans. "Frege and the Rise of Analytic Philosophy." Inquiry 18 (1975): 471-87.
  • Sluga, Hans. Gottlob Frege. Boston: Routledge & Kegan Paul, 1980.
  • Sluga, Hans. The Philosophy of Frege. 4 vols. New York: Garland Publishing, 1993.
  • Sternfeld, Robert. Frege's Logical Theory. Carbondale: Southern Illinois University Press, 1966.
  • Thiel, Christian. Sense and Reference in Frege's Logic. Translated by T. J. Blakeley. Dordrecht: D. Reidel, 1968.
  • Tichý, Pavel. The Foundations of Frege's Logic. New York: Walter de Gruyter, 1988.
  • Walker, Jeremy D. B. A Study of Frege. London: Oxford University Press, 1965.
  • Weiner, Joan. Frege in Perspective. Ithaca: Cornell University Press, 1990.
  • Wright, Crispin. Frege's Conception of Numbers as Objects. Aberdeen: Aberdeen University Press, 1983.
  • Wright, Crispin. Frege: Tradition and Influence. Oxford: Blackwell, 1984.

Author Information

Kevin C. Klement
Email: klement@philos.umass.edu
University of Massachusetts, Amherst
U. S. A.

Deductive-Theoretic Conceptions of Logical Consequence

According to the deductive-theoretic conception of logical consequence, a sentence X is a logical consequence of a set K of sentences if and only if X is a deductive consequence of K, that is, X is deducible or provable from K. Deductive consequence is clarified in terms of the notion of proof in a correct deductive system. Since, arguably, logical consequence conceived deductive-theoretically is not a compact relation while deducibility in a deductive system is, there are languages for which deductive consequence cannot be defined in terms of deducibility in a correct deductive system. However, it is true that if a sentence is deducible in a correct deductive system from other sentences, then the sentence is a deductive consequence of them. A deductive system is correct only if its rules of inference correspond to intuitively valid principles of inference. So whether or not a natural deductive system is correct brings into play rival theories of valid principles of inference such as classical, relevance, intuitionistic, and free logics.

Table of Contents

  1. Introduction
  2. Linguistic Preliminaries: the Language M
    1. Syntax of M
    2. Semantics for M
  3. What is a Logic?
  4. Deductive System N
  5. The Status of the Deductive Characterization of Logical Consequence in Terms of N
    1. Tarski's argument that the model-theoretic characterization of logical consequence is more basic than its characterization in terms of a deductive system
    2. Is deductive system N correct?
      1. Relevance logic
      2. Intuitionistic logic
      3. Free logic
  6. Conclusion
  7. References and Further Reading

1. Introduction

According to the deductive-theoretic conception of logical consequence, a sentence X is a logical consequence of a set K of sentences if and only if X is a deductive consequence of K, that is, X is deducible from K. X is deducible from K just in case there is an actual or possible deduction of X from K. In such a case, we say that X may be correctly inferred from K or that it would be correct to conclude X from K. A deduction is associated with a pair <K, X>; the set K of sentences is the basis of the deduction, and X is the conclusion. A deduction from K to X is a finite sequence S of sentences ending with X such that each sentence in S (that is, each intermediate conclusion) is derived from a sentence (or more) in K or from previous sentences in S in accordance with a correct principle of inference. The notion of a deduction is clarified by appealing to a deductive system. A deductive system D is a collection of rules that govern which sequences of sentences, associated with a given <K, X>, are allowed and which are not. Such a sequence is called a proof in D (or, equivalently, a deduction in D) of X from K. The rules must be such that whether or not a given sequence associated with <K, X> qualifies as a proof in D of X from K is decidable purely by inspection and calculation. That is, the rules provide a purely mechanical procedure for deciding whether a given object is a proof in D of X from K. We write

K ⊢D X

to mean

X is deducible in deductive system D from K.

See the entry Logical Consequence, Philosophical Considerations for discussion of the interplay between the concepts of logical consequence and deductive consequence, and deductive systems. We say that a deductive system D is correct when for any K and X, proofs in D of X from K correspond to intuitively valid deductions. For a given language the deductive consequence relation is defined in terms of a correct deductive system D only if it is true that

X is a deductive consequence of K if and only if X is deducible in D from K.

Sundholm (1983) offers a thorough survey of three main types of deductive systems. In this article, a natural deductive system is presented that originates in the work of the mathematician Gerhard Gentzen (1934) and the logician Frederick Fitch (1952). We will refer to the deductive system as N (for 'natural deduction'). For an in-depth introductory presentation of a natural deductive system very similar to N see Barwise and Etchemendy (2001). N is a collection of inference rules. A proof of X from K that appeals exclusively to the inference rules of N is a formal deduction or formal proof. We shall take a formal proof to be associated with a pair <K, X>, where K is a set of sentences from a first-order language M, which will be introduced below, and X is an M-sentence. The set K of sentences is the basis of the deduction, and X is the conclusion. We say that a formal deduction from K to X is a finite sequence S of sentences ending with X such that each sentence in S is either an assumption, deduced from a sentence (or more) in K, or deduced from previous sentences in S in accordance with one of N's inference rules.

Formal proofs are not only epistemologically significant for securing knowledge; the derivations making up formal proofs may also serve as models of the informal deductive reasoning performed using sentences from language M. Indeed, a primary value of a formal proof is that it can serve as a model of ordinary deductive reasoning that explains the force of such reasoning by representing the principles of inference required to get to X from K.

Gentzen, one of the first logicians to present a natural deductive system, makes clear that a primary motive for the construction of his system is to reflect as accurately as possible the actual logical reasoning involved in mathematical proofs. He writes,

My starting point was this: The formalization of logical deduction especially as it has been developed by Frege, Russell, and Hilbert, is rather far removed from the forms of deduction used in practice in mathematical proofs...In contrast, I intended first to set up a formal system which comes as close as possible to actual reasoning. The result was a 'calculus of natural deduction'. (Gentzen 1934, p. 68)

Natural deductive systems are distinguished from other deductive systems by their usefulness in modeling ordinary, informal deductive inferential practices. Paraphrasing Gentzen, we may say that if one is interested in seeing logical connections between sentences in the most natural way possible, then a natural deductive system is a good choice for defining the deductive consequence relation.

The remainder of the article proceeds as follows. First, an interpreted language M is given. Next, we present the deductive system N and represent the deductive consequence relation in M. After discussing the philosophical significance of the deductive consequence relation defined in terms of N, we consider some standard criticisms of the correctness of deductive system N.

2. Linguistic Preliminaries: the Language M

Here we define a simple language M, a language about the McKeon family, by first sketching what strings qualify as well-formed formulas (wffs) in M. Next we define sentences from formulas, and then give an account of truth in M; that is, we describe the conditions in which M-sentences are true.

a. Syntax of M

Building blocks of formulas

Terms

Individual names—'beth', 'kelly', 'matt', 'paige', 'shannon', 'evan', and 'w1', 'w2', 'w3', etc.

Variables—'x', 'y', 'z', 'x1', 'y1', 'z1', 'x2', 'y2', 'z2', etc.

Predicates

1-place predicates—'Female', 'Male'

2-place predicates—'Parent', 'Brother', 'Sister', 'Married', 'OlderThan', 'Admires', '='.

Blueprints of well-formed formulas (wffs)

Atomic formulas: An atomic wff is any of the above n-place predicates followed by n terms which are enclosed in parentheses and separated by commas.

Formulas: The general notion of a well-formed formula (wff) is defined recursively as follows:

(1) All atomic wffs are wffs.
(2) If α is a wff, so is '~α'.
(3) If α and β are wffs, so is '(α & β)'.
(4) If α and β are wffs, so is '(α v β)'.
(5) If α and β are wffs, so is '(α → β)'.
(6) If Ψ is a wff and v is a variable, then '∃vΨ' is a wff.
(7) If Ψ is a wff and v is a variable, then '∀vΨ' is a wff.
Finally, no string of symbols is a well-formed formula of M unless the string can be derived from (1)-(7).

The signs '~', '&', 'v', and '→' are called sentential connectives. The signs '∀' and '∃' are called quantifiers.

It will prove convenient to have available in M an infinite number of individual names as well as variables. The strings 'Parent(beth, paige)' and 'Male(x)' are examples of atomic wffs. We allow the identity symbol in an atomic formula to occur in between two terms, e.g., instead of '=(evan, evan)' we allow '(evan = evan)'. The symbols '~', '&', 'v', and '→' correspond to the English words 'not', 'and', 'or' and 'if...then', respectively. '∃' is our symbol for an existential quantifier and '∀' represents the universal quantifier. '∃vΨ' and '∀vΨ' correspond to for some v, Ψ, and for all v, Ψ, respectively. For every quantifier, its scope is the smallest part of the wff in which it is contained that is itself a wff. An occurrence of a variable v is a bound occurrence iff it is in the scope of some quantifier of the form '∀v' or the form '∃v', and is free otherwise. For example, the occurrence of 'x' is free in 'Male(x)' and in '∃y Married(y, x)'. The occurrences of 'y' in the second formula are bound because they are in the scope of the existential quantifier. A wff with at least one free variable is an open wff, and a closed formula is one with no free variables. A sentence is a closed wff. For example, 'Female(kelly)' and '∃y∃x Married(y, x)' are sentences but 'OlderThan(kelly, y)' and '(∃x Male(x) & Female(z))' are not. So, not all of the wffs of M are sentences. As noted below, this will affect our definition of truth for M.
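The recursive character of clauses (1)-(7) makes testing for wffhood and sentencehood a purely mechanical matter, and readers may find it useful to see this in code. The following sketch is our own illustration and no part of M: it assumes a home-made encoding of wffs as nested Python tuples and computes the variables occurring free in a wff, so that the test for sentencehood (no free variables) becomes a simple calculation.

# An illustrative sketch only; the tuple encoding of wffs is our own.
# Atomic wff:      ('atom', 'Male', ('x',))
# Negation:        ('not', f)
# Binary wffs:     ('and', f, g), ('or', f, g), ('imp', f, g)
# Quantified wffs: ('exists', 'x', f), ('forall', 'x', f)

VARIABLES = {'x', 'y', 'z', 'x1', 'y1', 'z1', 'x2', 'y2', 'z2'}

def free_variables(wff, bound=frozenset()):
    """Return the set of variables occurring free in a wff."""
    kind = wff[0]
    if kind == 'atom':
        _, _pred, terms = wff
        return {t for t in terms if t in VARIABLES and t not in bound}
    if kind == 'not':
        return free_variables(wff[1], bound)
    if kind in ('and', 'or', 'imp'):
        return free_variables(wff[1], bound) | free_variables(wff[2], bound)
    if kind in ('exists', 'forall'):
        _, v, body = wff
        return free_variables(body, bound | {v})
    raise ValueError('not a wff')

def is_sentence(wff):
    """A sentence is a closed wff, that is, one with no free variables."""
    return not free_variables(wff)

# 'OlderThan(kelly, y)' is open; '∃y∃x Married(y, x)' is a sentence.
assert not is_sentence(('atom', 'OlderThan', ('kelly', 'y')))
assert is_sentence(('exists', 'y', ('exists', 'x',
                    ('atom', 'Married', ('y', 'x')))))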

b. Semantics for M

We now provide a semantics for M. This is done in two steps. First, we specify a domain of discourse, that is, the chunk of the world that our language M is about, and interpret M's predicates and names in terms of the elements composing the domain. Then we state the conditions under which each type of M-sentence is true. To each of the above syntactic rules (1-7) there corresponds a semantic rule that stipulates the conditions in which the sentence constructed using the syntactic rule is true. The principle of bivalence is assumed and so 'not true' and 'false' are used interchangeably. In effect, the interpretation of M determines a truth-value (true, false) for each and every sentence of M.

Domain D—The McKeons: Matt, Beth, Shannon, Kelly, Paige, and Evan.

Here are the referents and extensions of the names and predicates of M.

Terms: 'matt' refers to Matt, 'beth' refers to Beth, 'shannon' refers to Shannon, etc.

Predicates. The meaning of a predicate is identified with its extension, that is, the set (possibly empty) of elements from the domain D the predicate is true of. The extension of a one-place predicate is a set of elements from D; the extension of a two-place predicate is a set of ordered pairs of elements from D.

The extension of 'Male' is {Matt, Evan}.

The extension of 'Female' is {Beth, Shannon, Kelly, Paige}.

The extension of 'Parent' is {<Matt, Shannon>, <Matt, Kelly>, <Matt, Paige>, <Matt, Evan>, <Beth, Shannon>, <Beth, Kelly>, <Beth, Paige>, <Beth, Evan>}.

The extension of 'Married' is {<Matt, Beth>, <Beth, Matt>}.

The extension of 'Sister' is {<Shannon, Kelly>, <Kelly, Shannon>, <Shannon, Paige>, <Paige, Shannon>, <Kelly, Paige>, <Paige, Kelly>, <Kelly, Evan>, <Paige, Evan>, <Shannon, Evan>}.

The extension of 'Brother' is {<Evan, Shannon>, <Evan, Kelly>, <Evan, Paige>}.

The extension of 'OlderThan' is {<Beth, Matt>, <Beth, Shannon>, <Beth, Kelly>, <Beth, Paige>, <Beth, Evan>, <Matt, Shannon>, <Matt, Kelly>, <Matt, Paige>, <Matt, Evan>, <Shannon, Kelly>, <Shannon, Paige>, <Shannon, Evan>, <Kelly, Paige>, <Kelly, Evan>, <Paige, Evan>}.

The extension of 'Admires' is {<Matt, Beth>, <Shannon, Matt>, <Shannon, Beth>, <Kelly, Beth>, <Kelly, Matt>, <Kelly, Shannon>, <Paige, Beth>, <Paige, Matt>, <Paige, Shannon>, <Paige, Kelly>, <Evan, Beth>, <Evan, Matt>, <Evan, Shannon>, <Evan, Kelly>, <Evan, Paige>}.

The extension of '=' is {<Matt, Matt>, <Beth, Beth>, <Shannon, Shannon>, <Kelly, Kelly>, <Paige, Paige>, <Evan, Evan>}.

The atomic sentence 'Female(kelly)' is true because, as indicated above, the referent of 'kelly' is in the extension of the property designated by 'Female'. The atomic sentence 'Married(shannon, kelly)' is false because the ordered pair <Shannon, Kelly> is not in the extension of the relation designated by 'Married'.

(I) An atomic sentence with a one-place predicate is true iff the referent of the term is a member of the extension of the predicate, and an atomic sentence with a two-place predicate is true iff the ordered pair formed from the referents of the terms in order is a member of the extension of the predicate.
(II) '~α' is true iff α is false.
(III) '(α & β)' is true when both α and β are true; otherwise '(α & β)' is false.
(IV) '(α v β)' is true when at least one of α and β is true; otherwise '(α v β)' is false.
(V) '(α → β)' is true if and only if (iff) α is false or β is true. So, '(α → β)' is false just in case α is true and β is false.

The meanings for '~' and '&' roughly correspond to the meanings of 'not' and 'and' as ordinarily used. We call '~α' and '(α & β)' negation and conjunction formulas, respectively. The formula '(α v β)' is called a disjunction and the meaning of 'v' corresponds to inclusive or. There are a variety of conditionals in English (e.g., causal, counterfactual, logical), with each type having a distinct meaning. The conditional defined by (V) above is called the material conditional. One way of following (V) is to see that the truth conditions for '(α → β)' are the same as for '~(α & ~β)'.

By (II) '~Married(shannon, kelly)' is true because, as noted above, 'Married(shannon, kelly)' is false. (II) also tells us that '~Female(kelly)' is false since 'Female(kelly)' is true. According to (III), '(~Married(shannon, kelly) & Female(kelly))' is true because '~Married(shannon, kelly)' is true and 'Female(kelly)' is true. And '(Male(shannon) & Female(shannon))' is false because 'Male(shannon)' is false. (IV) confirms that '(Female(kelly) v Married(evan, evan))' is true because, even though 'Married(evan, evan)' is false, 'Female(kelly)' is true. From (V) we know that the sentence '(~(beth = beth) → Male(shannon))' is true because '~(beth = beth)' is false. If α is false then '(α → β)' is true regardless of whether or not β is true. The sentence '(Female(beth) → Male(shannon))' is false because 'Female(beth)' is true and 'Male(shannon)' is false.

Before describing the truth conditions for quantified sentences we need to say something about the notion of satisfaction. We've defined truth only for the formulas of M that are sentences. So, the notions of truth and falsity are not applicable to non-sentences such as 'Male(x)' and '((x = x) → Female(x))' in which 'x' occurs free. However, objects may satisfy wffs that are non-sentences. We introduce the notion of satisfaction with some examples. An object satisfies 'Male(x)' just in case that object is male. Matt satisfies 'Male(x)', Beth does not. This is the case because replacing 'x' in 'Male(x)' with 'Matt' yields a truth while replacing the variable with 'beth' yields a falsehood. An object satisfies '((x = x) → Female(x))' if and only if it is either not identical with itself or is a female. Beth satisfies this wff (we get a truth when 'beth' is substituted for the variable in all of its occurrences), Matt does not (putting 'matt' in for 'x' wherever it occurs results in a falsehood). As a first approximation, we say that an object with a name, say 'a', satisfies a wff 'Ψv' in which at most v occurs free if and only if the sentence that results by replacing v in all of its occurrences with 'a' is true. 'Male(x)' is neither true nor false because it is not a sentence, but it is either satisfiable or not by a given object. Now we define the truth conditions for quantifications, utilizing the notion of satisfaction. For a more detailed discussion of the notion of satisfaction, see the article, "Logical Consequence, Model-Theoretic Conceptions."

Let Ψ be any formula of M in which at most v occurs free.

(VI) '∃vΨ' is true just in case there is at least one individual in the domain of quantification (e.g. at least one McKeon) that satisfies Ψ.
(VII) '∀vΨ' is true just in case every individual in the domain of quantification (e.g. every McKeon) satisfies Ψ.

Here are some examples. '∃x(Male(x) & Married(x, beth))' is true because Matt satisfies '(Male(x) & Married(x, beth))'; replacing 'x' wherever it appears in the wff with 'matt' results in a true sentence. The sentence '∃xOlderThan(x, x)' is false because no McKeon satisfies 'OlderThan(x, x)', that is replacing 'x' in 'OlderThan(x, x)' with the name of a McKeon always yields a falsehood.

The universal quantification '∀x(OlderThan(x, paige) → Male(x))' is false, for there is a McKeon who doesn't satisfy '(OlderThan(x, paige) → Male(x))'. For example, Shannon does not satisfy '(OlderThan(x, paige) → Male(x))' because Shannon satisfies 'OlderThan(x, paige)' but not 'Male(x)'. The sentence '∀x(x = x)' is true because all McKeons satisfy 'x = x'; replacing 'x' with the name of any McKeon results in a true sentence.
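Because the domain of M is finite, the truth clauses (I)-(VII) can themselves be implemented directly, and doing so may help make the satisfaction clauses concrete. The sketch below is again only an illustration: it reuses the tuple encoding of wffs from the earlier sketch, writes out only a few of the extensions listed above, and evaluates quantifiers by assigning domain elements to variables rather than by substituting names, a design choice that sidesteps the naming issue discussed next.

# An illustrative evaluator for M, not part of the official semantics.
DOMAIN = {'Matt', 'Beth', 'Shannon', 'Kelly', 'Paige', 'Evan'}
REFERENTS = {'matt': 'Matt', 'beth': 'Beth', 'shannon': 'Shannon',
             'kelly': 'Kelly', 'paige': 'Paige', 'evan': 'Evan'}
EXTENSIONS = {
    'Male': {('Matt',), ('Evan',)},
    'Female': {('Beth',), ('Shannon',), ('Kelly',), ('Paige',)},
    'Married': {('Matt', 'Beth'), ('Beth', 'Matt')},
    # ... the remaining extensions exactly as listed above ...
}

def true_in_M(wff, g=None):
    """Evaluate a wff relative to an assignment g of objects to variables."""
    g = g or {}
    kind = wff[0]
    if kind == 'atom':                                          # clause (I)
        _, pred, terms = wff
        objs = tuple(g[t] if t in g else REFERENTS[t] for t in terms)
        if pred == '=':  # identity checked directly, matching its extension
            return objs[0] == objs[1]
        return objs in EXTENSIONS[pred]
    if kind == 'not':                                           # clause (II)
        return not true_in_M(wff[1], g)
    if kind == 'and':                                           # clause (III)
        return true_in_M(wff[1], g) and true_in_M(wff[2], g)
    if kind == 'or':                                            # clause (IV)
        return true_in_M(wff[1], g) or true_in_M(wff[2], g)
    if kind == 'imp':                                           # clause (V)
        return (not true_in_M(wff[1], g)) or true_in_M(wff[2], g)
    if kind in ('exists', 'forall'):                # clauses (VI) and (VII)
        _, v, body = wff
        values = (true_in_M(body, {**g, v: obj}) for obj in DOMAIN)
        return any(values) if kind == 'exists' else all(values)
    raise ValueError('not a wff')

# '∃x(Male(x) & Married(x, beth))' comes out true, as computed in the text.
assert true_in_M(('exists', 'x', ('and', ('atom', 'Male', ('x',)),
                                  ('atom', 'Married', ('x', 'beth')))))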

Note that in the explanation of satisfaction we suppose that an object satisfies a wff only if the object is named. But we don't want to presuppose that all objects in the domain of discourse are named. For the purposes of an example, suppose that the McKeons adopt a baby boy, but haven't named him yet. Then, '∃x Brother(x, evan)' is true because the adopted child satisfies 'Brother(x, evan)', even though we can't replace 'x' with the child's name to get a truth. To get around this is easy enough. We have added a list of names, 'w1', 'w2', 'w3', etc., to M, and we may say that any unnamed object satisfies 'Ψv' iff the replacement of v with a previously unused wi assigned as a name of this object results in a true sentence. In the above scenario, '∃x Brother(x, evan)' is true because, ultimately, treating 'w1' as a temporary name of the child, 'Brother(w1, evan)' is true. Of course, the meanings of the predicates would have to be amended in order to reflect the addition of a new person to the domain of McKeons.

3. What is a Logic?

We have characterized an interpreted formal language M by defining what qualifies as a sentence of M and by specifying the conditions under which any M-sentence is true. The received view of logical consequence entails that the logical consequence relation in M turns on the nature of the logical constants in the relevant M-sentences. We shall regard just the sentential connectives, the quantifiers of M, and the identity predicate as logical constants (the language M is a first-order language). For discussion of the notion of a logical constant see Logical Consequence, Philosophical Considerations and Logical Consequence, Model-Theoretic Conceptions. Intuitively, one M-sentence is a logical consequence of a set of M-sentences if and only if it is impossible for all the sentences in the set to be true without the former sentence being true as well. A model-theoretic conception of logical consequence in M clarifies this intuitive characterization of logical consequence by appealing to the semantic properties of the logical constants, represented in the above truth clauses (I)-(VII). The entry Logical Consequence, Model-Theoretic Conceptions formalizes the account of truth in language M and gives a model-theoretic characterization of logical consequence in M. In contrast to the model-theoretic conception, the deductive-theoretic conception clarifies logical consequence, conceived of in terms of deducibility, by appealing to the inferential properties of logical constants portrayed as intuitively valid principles of inference, that is, principles justifying steps in deductions. See Logical Consequence, Philosophical Considerations for discussion of the relationship between the logical consequence relation and the model-theoretic and deductive-theoretic conceptions of it.

Deductive system N's inference rules, introduced below, are introduction and elimination rules, defined for each logical constant of our language M. An introduction rule introduces a logical constant into a proof and is useful for deriving a sentence that contains the constant. An elimination rule for the constant makes it possible to derive a sentence that has at least one less occurrence of the logical constant. Elimination rules are useful for deriving a sentence from another in which the constant appears.

Following Shapiro (1991, p. 3), we define a logic to be a language L plus either a model-theoretic or a deductive-theoretic account of logical consequence. A language with both characterizations is a full logic just in case both characterizations coincide. For discussion on the relationship between the model-theoretic and deductive-theoretic accounts of logical consequence, see Logical Consequence, Philosophical Considerations. The logic for M developed below may be viewed as a classical logic or a first-order theory.

4. Deductive System N

In stating N's rules, we begin with the simpler inference rules and give a sample formal deduction of them in action. Then we turn to the inference rules that employ what we shall call sub-proofs. In the statement of the rules, we let P and Q be any sentences from our language M. We shall number each line of a formal deduction with a positive integer. We let k, l, m, n, o, p and q be any positive integers such that k < m, and l < m, and m < n < o < p < q.

&-Intro

k. P
l. Q
m. (P & Q) &-Intro: k, l

&-Elim

k. (P & Q)
m. P &-Elim: k

k. (P & Q)
m. Q &-Elim: k

&-Intro allows us to derive a conjunction from its two parts (called conjuncts). According to the &-Elim rule we may derive a conjunct from a conjunction. To the right of the sentence derived using an inference rule is the justification. Steps in a proof are justified by identifying both the lines in the proof used and by citing the appropriate rule. The vertical lines serve as proof margins, which, as you will shortly see, help in portraying the structure of a proof when it contains embedded sub-proofs.

~-Elim

k. ~~P
m. P ~-Elim: k

The ~-Elim rule allows us to drop double negations and infer what was subject to the two negations.

v-Intro

k. P
m. (P v Q) v-Intro: k

k. P
m. (Q v P) v-Intro: k

By v-Intro we may derive a disjunction from one of its parts (called disjuncts).

→-Elim

k. (P → Q)
l. P
m. Q →-Elim: k, l

The →-Elim rule corresponds to the principle of inference called modus ponens: from a conditional and its antecedent one may infer the consequent.

Here's a sample deduction using the above inference rules. The formal deduction (the sequence of sentences 4-11) is associated with the pair

<{(Female(paige) & Female(kelly)), (Female(paige) → ~~Sister(paige, kelly)), (Female(kelly) → ~~Sister(paige, shannon))}, ((Sister(paige, kelly) & Sister(paige, shannon)) v Male(evan))>.

The first element is the set of basis sentences and the second element is the conclusion. We number the basis sentences and list them (beginning with 1) ahead of the deduction. The deduction ends with the conclusion.

1. (Female(paige) & Female(kelly)) Basis
2. (Female(paige) → ~~Sister(paige, kelly)) Basis
3. (Female(kelly) → ~~Sister(paige, shannon)) Basis
4. Female(paige) &-Elim: 1
5. Female(kelly) &-Elim: 1
6. ~~Sister(paige, kelly) →-Elim: 2, 4
7. Sister(paige, kelly) ~-Elim: 6
8. ~~Sister(paige, shannon) →-Elim: 3, 5
9. Sister(paige, shannon) ~-Elim: 8
10. (Sister(paige, kelly) & Sister(paige, shannon)) &-Intro: 7, 9
11. ((Sister(paige, kelly) & Sister(paige, shannon)) v Male(evan)) v-Intro: 10

Again, the column all the way to the right gives the explanations for each line of the proof. Assuming the adequacy of N, the formal deduction establishes that the following inference is correct.

(Female(paige) & Female(kelly))
(Female(paige) → ~~Sister(paige, kelly))
(Female(kelly) → ~~Sister(paige, shannon))


(therefore) ((Sister(paige, kelly) & Sister(paige, shannon)) v Male(evan))
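The requirement that proof-checking be decidable 'purely by inspection and calculation' can also be made vivid in code. The checker sketched below is our own illustration rather than an implementation of N: it covers only the rules used so far (Basis, &-Intro, &-Elim, v-Intro, ~-Elim, →-Elim) and does not model the sub-proofs that the rules introduced later require; formulas reuse the tuple encoding from the earlier sketches.

# An illustrative checker for a sub-proof-free fragment of N.
# A proof is a dict: line number -> (formula, rule, cited line numbers).

def check(proof, basis):
    for n in sorted(proof):
        f, rule, cites = proof[n]
        assert all(c < n for c in cites), 'cite only earlier lines'
        got = [proof[c][0] for c in cites]
        if rule == 'Basis':
            ok = f in basis
        elif rule == '&-Intro':
            ok = f == ('and', got[0], got[1])
        elif rule == '&-Elim':
            ok = got[0][0] == 'and' and f in (got[0][1], got[0][2])
        elif rule == 'v-Intro':
            ok = f[0] == 'or' and got[0] in (f[1], f[2])
        elif rule == '~-Elim':
            ok = got[0] == ('not', ('not', f))
        elif rule == '→-Elim':          # cited lines in the order k, l
            ok = got[0] == ('imp', got[1], f)
        else:
            ok = False
        if not ok:
            return 'line %d does not check' % n
    return 'proof checks'

# The first steps of the sample deduction above, verified mechanically.
Fp = ('atom', 'Female', ('paige',))
Fk = ('atom', 'Female', ('kelly',))
proof = {1: (('and', Fp, Fk), 'Basis', []),
         4: (Fp, '&-Elim', [1]),
         5: (Fk, '&-Elim', [1])}
assert check(proof, {('and', Fp, Fk)}) == 'proof checks'

Since each rule check is a finite comparison of tuples, the checker halts on every input, which is just the mechanical decidability that the definition of a deductive system demands.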

For convenience in building proofs, we expand M to include '⊥', which we use as a symbol for a contradiction (e.g., '(Female(beth) & ~Female(beth))').

⊥-Intro

k. P
l. ~P
m. ⊥ ⊥-Intro: k, l

⊥-Elim

k. ⊥
m. P ⊥-Elim: k

If we have derived a sentence and its negation we may derive ⊥ using ⊥-Intro. The ⊥-Elim rule represents the idea that any sentence P is deducible from a contradiction. So, from ⊥ we may derive any sentence P using ⊥-Elim.

Here's a deduction using the two rules.

1. (Parent(beth, evan) & ~Parent(beth, evan)) Basis
2. Parent(beth, evan) &-Elim: 1
3. ~Parent(beth, evan) &-Elim: 1
4. ⊥ ⊥-Intro: 2, 3
5. Parent(beth, shannon) ⊥-Elim: 4

For convenience, we introduce a reiteration rule that allows us to repeat steps in a proof as needed.

Reit

k. P
.
.
.
m. P Reit: k

We now turn to the rules for the sentential connectives that employ what we shall call sub-proofs. Consider the following inference.

1. ~(Married(shannon, kelly) & OlderThan(shannon, kelly))
2. Married(shannon, kelly)


(therefore) ~OlderThan(shannon, kelly)

Here is an informal deduction of the conclusion from the basis sentences.

Proof: Suppose that 'OlderThan(shannon, kelly)' is true. Then, from this assumption and basis sentence 2, it follows that '((Shannon is married to Kelly) & (Shannon is older than Kelly))' is true. But this contradicts the first basis sentence '~((Shannon is married to Kelly) & (Shannon is older than Kelly))', which is true by hypothesis. Hence our initial supposition is false. We have derived that '~(Shannon is older than Kelly)' is true.

Such a proof is called a reductio ad absurdum proof (or reductio for short). Reductio ad absurdum is Latin for 'reduction to the absurd'. (For more information, see the article "Reductio ad absurdum".) In order to model this proof in N we introduce the ~-Intro rule.

~-Intro

k. P Assumption
.
.
.
m. ⊥
n. ~P ~-Intro: k-m

The ~-Intro rule allows us to infer the negation of an assumption if we have derived a contradiction, symbolized by '⊥', from the assumption. The indented proof margin (k-m) signifies a sub-proof. In a sub-proof the first line is always an assumption (and so requires no justification), which is cancelled when the sub-proof is ended and we are back out on a line that sits on a wider proof margin. The effect of this is that we can no longer appeal to any of the lines in the sub-proof to generate later lines on wider proof margins. No deduction ends in the middle of a sub-proof.

Here is a formal analogue of the above informal reductio.

1. ~(Married(shannon, kelly) & OlderThan(shannon, kelly)) Basis
2. Married(shannon, kelly) Basis
3. OlderThan(shannon, kelly) Assumption
4. (Married(shannon, kelly) & OlderThan(shannon, kelly)) &-Intro: 2, 3
5. ⊥ ⊥-Intro: 1, 4
6. ~OlderThan(shannon, kelly) ~-Intro: 3-5

We signify a sub-proof with the indented proof margin line; the start and finish of a sub-proof are indicated by the start and break of the indented proof margin. An assumption, like a basis sentence, is a sentence we suppose true for the purposes of the deduction. The difference is that whereas a basis sentence may be used at any step in a proof, an assumption may only be used to make a step within the sub-proof it heads. At the end of the sub-proof, the assumption is discharged. We now look at more sub-proofs in action and introduce another of N's inference rules. Consider the following inference.

1. (Male(kelly) v Female(kelly))
2. (Male(kelly) → ~Sister(kelly, paige))
3. (Female(kelly) → ~Brother(kelly, evan))


(therefore) (~Sister(kelly, paige) v ~Brother(kelly, evan))

Informal Proof:

By assumption '(Male(kelly) v Female(kelly))' is true, that is, by assumption at least one of the disjuncts is true.

Suppose that 'Male(kelly)' is true. Then by modus ponens we may derive that '~Sister(kelly, paige)' is true from this assumption and the basis sentence 2. Then '(~Sister(kelly, paige) v ~Brother(kelly, evan))' is true.

Suppose that 'Female(kelly)' is true. Then by modus ponens we may derive that '~Brother(kelly, evan)' is true from this assumption and the basis sentence 3. Then '(~Sister(kelly, paige) v ~Brother(kelly, evan))' is true.

So in either case we have derived that '(~Sister(kelly, paige) v ~Brother(kelly, evan))' is true. Thus we have shown that this sentence is a deductive consequence of the basis sentences.

We model this proof in N using the v-Elim rule.

v-Elim

k. (P v Q)
m. P Assumption
.
.
.
n. R
o. Q Assumption
.
.
.
p. R
q. R v-Elim: k, m-n, o-p

The v-Elim rule allows us to derive a sentence from a disjunction by deriving it from each disjunct, possibly using sentences on earlier lines that sit on wider proof margins.

The following formal proof models the above informal one.

1. (Male(kelly) v Female(kelly)) Basis
2. (Male(kelly) → ~Sister(kelly, paige)) Basis
3. (Female(kelly) → ~Brother(kelly, evan)) Basis
4. Male(kelly) Assumption
5. ~Sister(kelly, paige) →-Elim: 2, 4
6. (~Sister(kelly, paige) v ~Brother(kelly, evan)) v-Intro: 5
7. Female(kelly) Assumption
8. ~Brother(kelly, evan) →-Elim: 3, 7
9. (~Sister(kelly, paige) v ~Brother(kelly, evan)) v-Intro: 8
10. (~Sister(kelly, paige) v ~Brother(kelly, evan)) v-Elim: 1, 4-6, 7-9

Here is a second example of v-Elim at work: a proof in N of the principle of the disjunctive syllogism, deriving Q from '(P v Q)' and '~P'.

1. (P v Q) Basis
2. ~P Basis
3. P Assumption
4. ⊥ ⊥-Intro: 2, 3
5. Q ⊥-Elim: 4
6. Q Assumption
7. Q Reit: 6
8. Q v-Elim: 1, 3-5, 6-7

Now we introduce the →-Intro rule by considering the following inference.

1. (OlderThan(shannon, kelly) → OlderThan(shannon, paige))
2. (OlderThan(shannon, paige) → OlderThan(shannon, evan))


(therefore) (OlderThan(shannon, kelly) → OlderThan(shannon, evan))

Informal proof:

Suppose that OlderThan(shannon, kelly). From this assumption and basis sentence 1 we may derive, by modus ponens, that OlderThan(shannon, paige). From this and basis sentence 2 we get, again by modus ponens, that OlderThan(shannon, evan). Hence, if OlderThan(shannon, kelly), then OlderThan(shannon, evan).

The structure of this proof is that of a conditional proof: a deduction of a conditional from a set of basis sentences which starts with the assumption of the antecedent, continues with a derivation of the consequent, and concludes with the conditional. To build conditional proofs in N, we rely on the →-Intro rule.

→-Intro

k. P Assumption
.
.
.
m. Q
n. (P → Q) →-Intro: k-m

According to the →-Intro rule we may derive a conditional if we derive the consequent Q from the assumption of the antecedent P, and, perhaps, other sentences occurring earlier in the proof on wider proof margins. Again, such a proof is called a conditional proof.

We model the above informal conditional proof in N as follows.

1. (OlderThan(shannon, kelly) → OlderThan(shannon, paige)) Basis
2. (OlderThan(shannon, paige) → OlderThan(shannon, evan)) Basis
3. OlderThan(shannon, kelly) Assumption
4. OlderThan(shannon, paige) →-Elim: 1, 3
5. OlderThan(shannon, evan) →-Elim: 2, 4
6. (OlderThan(shannon, kelly) → OlderThan(shannon, evan)) →-Intro: 3-5

Mastery of a deductive system facilitates the discovery of proof pathways in hard cases and increases one's efficiency in communicating proofs to others and explaining why a sentence is a logical consequence of others. For example, suppose that (1) if Beth is not Paige's parent, then it is false that if Beth is a parent of Shannon, Shannon and Paige are sisters. Further suppose (2) that Beth is not Shannon's parent. Then we may conclude that Beth is Paige's parent. Of course, knowing the type of sentences involved is helpful for then we have a clearer idea of the inference principles that may be involved in deducing that Beth is a parent of Paige. Accordingly, we represent the two basis sentences and the conclusion in M, and then give a formal proof of the latter from the former.

1. (~Parent(beth, paige) → ~(Parent(beth, shannon) → Sister(shannon, paige))) Basis
2. ~Parent(beth, shannon) Basis
3. ~Parent(beth, paige) Assumption
4. ~(Parent(beth, shannon) → Sister(shannon, paige)) →-Elim: 1, 3
5. Parent(beth, shannon) Assumption
6. ⊥-Intro: 2, 5
7. Sister(shannon, paige) ⊥-Elim: 6
8. (Parent(beth, shannon) → Sister(shannon, paige)) →-Intro: 5-7
9. ⊥-Intro: 4, 8
10. ~~Parent(beth, paige) ~-Intro: 3-9
11. Parent(beth, paige) ~-Elim: 10

Because we derived a contradiction at line 9, we got '~~Parent(beth, paige)' at line 10, using ~-Intro, and then we derived 'Parent(beth, paige)' by ~-Elim. Look at the conditional proof (lines 5-7) from which we derived line 8. Pretty neat, huh? Lines 2 and 5 generated the contradiction from which we derived 'Sister(shannon, paige)' at line 7 in order to get the conditional at line 8. This is our first example of a sub-proof (5-7) embedded in another sub-proof (3-9). It is unlikely that, independent of the resources of a deductive system, a reasoner would be able to readily build the informal analogue of this pathway from the basis sentences to the sentence at line 11. Again, mastery of a deductive system such as N can increase the efficiency of our performances of rigorous reasoning and cultivate skill at producing elegant proofs (proofs that take the least number of steps to get from the basis to the conclusion).

We now introduce the Intro and Elim rules for the identity symbol and the quantifiers. Let n and n′ be any names, and 'Ωn' and 'Ωn′' be any well-formed formulas in which n and n′ appear and that have no free variables.

=-Intro

k. (n = n) =-Intro

=-Elim

k. Ωn
l. (n = n′)
m. Ωn′ =-Elim: k, l

The =-Intro rule allows us to introduce '(n = n)' at any step in a proof. Since '(n = n)' is deducible from any sentence, there is no need to identify the lines from which line k is derived. In effect, the =-Intro rule confirms that '(paige = paige)', '(shannon = shannon)', '(kelly = kelly)', and so on, may be inferred from any sentence(s). The =-Elim rule tells us that if we have proven 'Ωn' and '(n = n′)', then we may derive 'Ωn′', which is gotten from 'Ωn' by replacing n with n′ in some but possibly not all occurrences. The =-Elim rule represents the principle known as the indiscernibility of identicals, which says that if '(n = n′)' is true, then whatever is true of the referent of n is true of the referent of n′. This principle grounds the following inference.

1. ~Sister(beth, kelly)
2. (beth = shannon)


(therefore) ~Sister(shannon, kelly)

The indiscernibility of identicals is fairly obvious. If I know that Beth isn't Kelly's sister and that Beth is Shannon (perhaps 'Shannon' is an alias) then this establishes, with the help of the indiscernibility of identicals, that Shannon isn't Kelly's sister. Now we turn to the quantifier rules.

Let 'Ωv' be a formula in which v is the only free variable, and let n be any name.

∃-Intro

k. Ωn
m. ∃vΩv ∃-Intro: k

∃-Elim

k. ∃vΩv
[n] m. Ωn Assumption
.
.
.
n. P
o. P ∃-Elim: k, m-n

Here, the name n must be unique to the subproof; that is, n doesn't occur on any of the lines above m or below n.

The ∃-Intro rule, which represents the principle of inference known as existential generalization, tells us that if we have proven 'Ωn', then we may derive '∃vΩv', which results from 'Ωn' by replacing n with a variable v in some but possibly not all of its occurrences and prefixing the existential quantifier. According to this rule, we may infer, say, '∃x Married(x, matt)' from the sentence 'Married(beth, matt)'. By the ∃-Elim rule, we may reason from a sentence that is produced from an existential quantification by stripping the quantifier and replacing the resulting free variable in all of its occurrences by a name which is new to the proof. Recall that the language M has an infinite number of constants, and the name introduced by the ∃-Elim rule may be one of the wi. We regard the assumption at line m, which starts the embedded sub-proof, as saying "Suppose n names an arbitrary individual from the domain of discourse such that 'Ωn' is true." To illustrate the basic idea behind the ∃-Elim rule, if I tell you that Shannon admires some McKeon, you can't infer that Shannon admires any particular McKeon such as Matt, Beth, Shannon, Kelly, Paige, or Evan. Nevertheless we have it that she admires somebody. The principle of inference corresponding to the ∃-Elim rule, called existential instantiation, allows us to assign this 'somebody' an arbitrary name new to the proof, say, 'w1' and reason within the relevant sub-proof from 'Shannon admires w1'. Then we cancel the assumption and infer a sentence that doesn't make any claims about w1. For example, suppose that (1) Shannon admires some McKeon. Let's call this McKeon 'w1', that is, assume (2) that Shannon admires a McKeon named 'w1'. By the principle of inference corresponding to v-Intro we may derive (3) that Shannon admires w1 or w1 admires Kelly. From (3), we may infer by existential generalization (4) that for some McKeon x, Shannon admires x or x admires Kelly. We now cancel the assumption (that is, cancel (2)) by concluding (5) that for some McKeon x, Shannon admires x or x admires Kelly from (1) and the subproof (2)-(4), by existential instantiation. Here is the above reasoning set out formally.

1. ∃x Admires(shannon, x) Basis
[w1] 2. Admires(shannon, w1) Assumption
3. (Admires(shannon, w1) v Admires(w1, kelly)) v-Intro: 2
4. ∃x(Admires(shannon, x) v Admires(x, kelly)) ∃-Intro: 3
5. ∃x(Admires(shannon, x) v Admires(x, kelly)) ∃-Elim: 1, 2-4

The string at the assumption of the sub-proof (line 2) says "Suppose that 'w1' names an arbitrary McKeon such that 'Admires(shannon, w1)' is true." This is not a sentence of M, but of the meta-language for M, that is, the language used to talk about M. Hence, the ∃-Elim rule (as well as the ∀-Intro rule introduced below) has a meta-linguistic character.

∀-Intro

[n] k. Assumption
.
.
.
m. Ωn
n. ∀vΩv ∀-Intro: k-m
n must be unique to the subproof

∀-Elim

k. ∀vΩv
m. Ωn ∀-Elim: k

The ∀-Elim rule corresponds to the principle of inference known as universal instantiation: to infer that something holds for an individual of the domain if it holds for the entire domain. The ∀-Intro rule allows us to derive a claim that holds for the entire domain of discourse from a proof that the claim holds for an arbitrary selected individual from the domain. The assumption at line k reads in English "Suppose n names an arbitrarily selected individual from the domain of discourse." As with the ∃-Elim rule, the name introduced by the ∀-Intro rule may be one of the wi. The ∀-Intro rule corresponds to the principle of inference often called universal generalization.

For example, suppose that we are told that (1) if a McKeon admires Paige, then that McKeon admires himself/herself, and that (2) every McKeon admires Paige. To show that we may correctly infer that every McKeon admires himself/herself we appeal to the principle of universal generalization, which (again) is represented in N by the ∀-Intro rule. We begin by assuming that (3) a McKeon is named 'w1'. All we assume about w1 is that w1 is one of the McKeons. From (2), we infer that (4) w1 admires Paige. We know from (1), using the principle of universal instantiation (the ∀-Elim rule in N), that (5) if w1 admires Paige then w1 admires w1. From (4) and (5) we may infer (6) that w1 admires w1 by modus ponens. Since w1 is an arbitrarily selected individual (and so what holds for w1 holds for all McKeons), we may conclude from (3)-(6), by universal generalization, that (7) every McKeon admires himself/herself; (7) thus follows from (1) and (2). This reasoning is represented by the following formal proof.

1. ∀x(Admires(x, paige) → Admires(x, x)) Basis
2. ∀x Admires(x, paige) Basis
[w1] 3. Assumption
4. Admires(w1, paige) ∀-Elim: 2
5. (Admires(w1, paige) → Admires(w1, w1)) ∀-Elim: 1
6. Admires(w1, w1) →-Elim: 4, 5
7. ∀x Admires(x, x) ∀-Intro: 3-6

Line 3, the assumption of the sub-proof, corresponds to the English sentence "Let 'w1' refer to an arbitrary McKeon." The notion of a name referring to an arbitrary individual from the domain of discourse, utilized by both the ∀-Intro and ∃-Elim rules in the assumptions that start the respective sub-proofs, incorporates two distinct ideas. One, relevant to the ∃-Elim rule, means "some specific object, but I don't know which", while the other, relevant to the ∀-Intro rule means "any object, it doesn't matter which" (See Pelletier 1999, pp. 118-120 for discussion.)

Consider:

K = {All McKeons admire those who admire somebody, Some McKeon admires a McKeon}
X = Paige admires Paige

Here's a proof that X is deducible from K.

1. ∀x(∃y Admires(x, y) → ∀z Admires(z, x)) Basis
2. ∃x∃y Admires(x, y) Basis
[w1] 3. ∃y Admires(w1, y) Assumption
4. (∃y Admires(w1, y) → ∀z Admires(z, w1)) ∀-Elim: 1
5. ∀z Admires(z, w1) →-Elim: 3, 4
6. Admires(paige, w1) ∀-Elim: 5
7. ∃y Admires(paige, y) ∃-Intro: 6
8. (∃y Admires(paige, y) → ∀z Admires(z, paige)) ∀-Elim: 1
9. ∀z Admires(z, paige) →-Elim: 7, 8
10. Admires(paige, paige) ∀-Elim: 9
11. Admires(paige, paige) ∃-Elim: 2, 3-10

An informal correlate, put somewhat succinctly, runs as follows.

Let's call the unnamed admirer, mentioned in (2), w1. From this and (1), every McKeon admires w1 and so Paige admires w1. Hence, Paige admires somebody. From this and (1) it follows that everybody admires Paige. So, Paige admires Paige. This is our desired conclusion.

Even though the informal proof skips steps and doesn't mention by name the principles of inference used, the formal proof guides its construction.

5. The Status of the Deductive Characterization of Logical Consequence in Terms of N

We began the article by presenting the deductive-theoretic characterization of logical consequence: X is a logical consequence of a set K of sentences if and only if X is deducible from K, that is, there is a deduction of X from K. To make it official, we now characterize the deductive consequence relation in M in terms of deducibility in N.

X is a deductive consequence of K if and only if K ⊢N X, that is, X is deducible in N from K

We now inquire into the status of this characterization of deductive consequence.

The first thing to note is that deductive system N is complete and sound with respect to the model-theoretic consequence relation defined in Logical Consequence, Model-Theoretic Conceptions: Section 4.4. Let

K ⊢N X

abbreviate

X is deducible in N from K

Similarly, let

K ⊨ X

abbreviate

X is a model-theoretic consequence of K, that is, every M-structure that is a model of K is also a model of X. (For more information on structures and models, see Logical Consequence, Model-Theoretic Conceptions.)

The completeness and soundness of N mean that for any set K of M-sentences and M-sentence X, K ⊢N X if and only if K ⊨ X. A soundness proof establishes K ⊢N X only if K ⊨ X, and a completeness proof establishes K ⊢N X if K ⊨ X. So, the ⊢N and ⊨ relations, defined on sentences of M, are extensionally equivalent. The question arises: which characterization of the logical consequence relation is more basic or fundamental?

a. Tarski's argument that the model-theoretic characterization of logical consequence is more basic than its characterization in terms of a deductive system

The first thing to note is that the ⊢N-consequence relation is compact. For any deductive system D and pair <K, X> there is a K' such that K ⊢D X if and only if K' ⊢D X, where K' is a finite subset of sentences from K. As pointed out by Tarski (1936), among others, there are intuitively correct principles of inference reflected in certain languages according to which one may infer a sentence X from a set K of sentences, even though it is incorrect to infer X from any finite subset of K. Here's a rendition of his reasoning, focusing on the ⊢N-consequence relation defined on a language for arithmetic, which allows us to talk about the natural numbers 0, 1, 2, 3, and so on. Let 'P' be a predicate defined over the domain of natural numbers and let 'NatNum(x)' abbreviate 'x is a natural number'. According to Tarski, intuitively,

∀x(NatNum(x) → P(x))

is a logical consequence of the infinite set S of sentences

P(0)
P(1)
P(2)
.
.
.

However, the universal quantification is not a ⊢N-consequence of the set S. The reason why is that the ⊢N-consequence relation is compact: for any sentence X and set K of sentences, X is a ⊢N-consequence of K, if and only if X is a ⊢N-consequence of some finite subset of K. Proofs in N are objects of finite length; a deduction is a finite sequence of sentences. Since the universal quantification is not a ⊢N-consequence of any finite subset of S, it is not a ⊢N-consequence of S. By the completeness of system N, it follows that

∀x(NatNum(x) → P(x))

is not a ⊨-consequence of S either. Consider the structure U* whose domain is the set of McKeons. Let all numerals name Beth. Let the extension of 'NatNum' be the entire domain, and the extension of 'P' be just Beth. Then each element of S is true in U*, but '∀x(NatNum(x) → P(x))' is not true in U*. (See Logical Consequence, Model-Theoretic Conceptions for further discussion of structures.) Note that the sentences in S only say that P holds for 0, 1, 2, and so on, and not also that 0, 1, 2, etc., are all the elements of the domain of discourse. The above interpretation takes advantage of this fact by reinterpreting all numerals as names for Beth.
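Tarski's countermodel lends itself to a mechanical check as well. The sketch below is an illustration only; since S is infinite, only finitely many of its members can be sampled.

# Illustrative check of the structure U*.
DOMAIN = {'Matt', 'Beth', 'Shannon', 'Kelly', 'Paige', 'Evan'}
NATNUM = set(DOMAIN)     # extension of 'NatNum' in U*: the entire domain
P = {'Beth'}             # extension of 'P' in U*: just Beth

def referent(numeral):
    """In U*, every numeral names Beth."""
    return 'Beth'

# Each sampled member 'P(n)' of S is true in U* ...
assert all(referent(str(n)) in P for n in range(10000))

# ... yet '∀x(NatNum(x) → P(x))' is false in U*; Matt is a witness.
assert not all(obj not in NATNUM or obj in P for obj in DOMAIN)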

However, we can reflect model-theoretically the intuition that '∀x(NatNum(x) → P(x))' is a logical consequence of set S by doing one of two things. We can add to S the functional equivalent of the claim that 0, 1, 2, 3, etc., are all the natural numbers there are, on the basis that this is an implicit assumption of the view that the universal quantification follows from S. Or we could add 'NatNum' and all numerals to our list of logical terms. On either option it still won't be the case that '∀x(NatNum(x) → P(x))' is a ⊢N-consequence of the set S. There is no way to accommodate the intuition that '∀x(NatNum(x) → P(x))' is a logical consequence of S in terms of a compact consequence relation. Tarski takes this to be a reason to think that the model-theoretic account of logical consequence is definitive as opposed to an account of logical consequence in terms of a compact consequence relation such as ⊢N.

Tarski's illustration shows that what is called the ω-rule is a correct inference rule.

The ω-rule is that from:

{P(0), P(1), P(2), ...}

one may infer

∀x(NatNum(x) → P(x))

with respect to any predicate P. Any inference guided by this rule is correct even though it can't be represented in a deductive system, as this notion has been used here and discussed in Logical Consequence, Philosophical Considerations.

Compactness is not a salient feature of logical consequence conceived deductive-theoretically. This suggests, by the third criterion of a successful theoretical definition of logical consequence mentioned in Logical Consequence, Philosophical Considerations, that no compact consequence relation is definitive of the intuitive notion of deducibility. So, assuming that deductive system N is correct (that is, deducibility is co-extensive in M with the ⊢N-relation), we can't treat

X is intuitively deducible from K if and only if K ⊢N X.

as a definition of deducibility in M since

X is a deductive consequence of K if and only if X is deducible in a correct deductive system from K.

is not true with respect to languages for which deducibility is not captured by any compact consequence relation (that is, not captured by any deduction-system account of it). Some (e.g., Quine) demur, on epistemological grounds, from using for the purposes of science a language in which deducibility is not completely represented by a deduction-system account. Nevertheless, as Tarski (1936) argues, the fact that there cannot be deduction-system accounts of some intuitively correct principles of inference is reason for taking a model-theoretic characterization of logical consequence to be more fundamental than any characterization in terms of a deductive system sound and complete with respect to the model-theoretic characterization.

b. Is deductive system N correct?

In discussing the status of the characterization of logical consequence in terms of deductive system N, we assumed that N is correct. The question arises whether N is, indeed, correct. That is, is it the case that X is intuitively deducible from K if and only if K ⊢N X? The biconditional holds only if both (1) and (2) are true.

(1) If sentence X is intuitively deducible from set K of sentences, then K ⊢N X.
(2) If K ⊢N X, then sentence X is intuitively deducible from set K of sentences.

So N is incorrect if either (1) or (2) is false. The truth of (1) and (2) is relevant to the correctness of the characterization of logical consequence in terms of system N, because any adequate deductive-theoretic characterization of logical consequence must identify the logical terms of the relevant language and account for their inferential properties (for discussion, see Logical Consequence, Philosophical Considerations: Section 4). (1) is false if the list of logical terms in M is incomplete. In such a case, there will be a sentence X and set K of sentences such that X is intuitively deducible from set K because of at least one inferential property of logical terminology unaccounted for by N, and so it is false that K ⊢N X (for discussion of some of the issues surrounding what qualifies as a logical term see Logical Consequence, Model-Theoretic Conceptions: Section 5.3). In this case, N would be incorrect because it wouldn't completely account for the inferential machinery of language M. (2) is false if there are deductions in N that are intuitively incorrect. Are there such deductions? In order to fine-tune the question, note that the sentential connectives, the identity symbol, and the quantifiers of M are intended to correspond to or, and, not, if...then (the indicative conditional), is identical with, some, and all. Hence, N is a correct deductive system only if the Intro and Elim rules of N reflect the inferential properties of the ordinary language expressions. In what follows, we sketch three views that are critical of the correctness of system N because they reject (2).

i. Relevance logic

Not everybody accepts it as a fact that any sentence is deducible from a contradiction, and so some question the correctness of the ⊥-Elim rule. Consider the following informal proof of Q from 'P & ~P', for sentences P and Q, as a rationale for the ⊥-Elim rule.

From (1) P and not-P, we may correctly infer (2) P, from which it is correct to infer (3) P or Q. We derive (4) not-P from (1). (5) Q follows from (3) and (4).

The proof seems to be composed of valid modes of inference. Critics of the ⊥-Elim rule are obliged to tell us where it goes wrong. Here we follow the relevance logicians Anderson and Belnap (1962, pp. 105-108; for discussion, see Read 1995, pp. 54-60). In a nutshell, Anderson and Belnap claim that the proof is defective because it commits a fallacy of equivocation. The move from (2) to (3) is correct only if or has the sense of at least one. For example, from Kelly is female it is legitimate to infer that at least one of the two sentences Kelly is female and Kelly is older than Paige is true. On this sense of or, given that Kelly is female, one may infer that Kelly is female or whatever you like. However, in order for the passage from (3) and (4) to (5) to be legitimate, the sense of or in (3) must be if not-...then. For example, from if Kelly is not female, then Kelly is not Paige's sister and Kelly is not female it is correct to infer Kelly is not Paige's sister. Hence, the above "support" for the ⊥-Elim rule is defective, for it equivocates on the meaning of or.

Two things deserve highlighting. First, Anderson and Belnap think that the inference from (2) to (3) on the if not-...then reading of or is incorrect. Given that Kelly is female, it is problematic to deduce that if she is not, then Kelly is older than Paige (or whatever you like). Such an inference commits a fallacy of relevance, for Kelly not being female is not relevant to her being older than Paige. The representation of this inference in system N appeals to the ⊥-Elim rule, which is rejected by Anderson and Belnap. Second, the principle of inference underlying the move from (3) and (4) to (5), that is, from P or Q and not-P to infer Q, is called the principle of the disjunctive syllogism. Anderson and Belnap claim that this principle is not generally valid when or has the sense of at least one, which it has when it is rendered by 'v' (e.g., see above). If Q is relevant to P, then the principle holds on this reading of or.

It is worthwhile to note the essentially informal nature of the debate. It calls upon our pre-theoretic intuitions about correct inference. It would be quite useless to cite the proof in N of the validity of disjunctive syllogism (given above) against Anderson and Belnap, for it relies on the ⊥-Elim rule whose legitimacy is in question. No doubt, pre-theoretical notions and original intuitions must be refined and shaped somewhat by theory. Our pre-theoretic notion of correct deductive reasoning in ordinary language is not completely determinate and precise independently of the resources of a full or partial logic. (See Shapiro 1991, chaps. 1 and 2 for discussion of the interplay between theory and pre-theoretic notions and intuitions.) Nevertheless, hardcore intuitions regarding correct deductive reasoning do seem to drive the debate over the legitimacy of deductive systems such as N and over the legitimacy of the ⊥-Elim rule in particular. Anderson and Belnap (1962, p. 108) write that denying the principle of the disjunctive syllogism, regarded as a valid mode of inference since Aristotle, "... will seem hopelessly naïve to those logicians whose logical intuitions have been numbed through hearing and repeating the logicians' fairy tales of the past half century, and hence stand in need of further support". The possibility that intuitions in support of the general validity of the principle of the disjunctive syllogism have been shaped by a bad theory of inference is motive enough to consider argumentative support for the principle and to investigate deductive systems for relevance logic.

A natural deductive system for relevance logic has the means for tracking the relevance quotient of the steps used in a proof and allows the application of an introduction rule in the step from A to B "only when A is relevant to B in the sense that A is used in arriving at B" (Anderson and Belnap 1962, p. 90). Consider the following proof in system N.

1. Admires(evan, paige) Basis
2. ~Married(beth, matt) Assumption
3. Admires(evan, paige) Reit: 1
4. (~Married(beth, matt) → Admires(evan, paige)) →-Intro: 2-3

Recall that the rationale behind the →-Intro rule is that we may derive a conditional if we derive the consequent Q from the assumption of the antecedent P, and, perhaps, other sentences occurring earlier in the proof on wider proof margins. The defect of this rule, according to Anderson and Belnap, is that "from" in "from the assumption of the antecedent P" is not taken seriously. They seem to have a point. By the lights of the →-Intro rule, we have derived line 4, but it is hard to see how we have derived the sentence at line 3 from the assumption at step 2 when we have simply reiterated the basis at line 3. Clearly, '~Married(beth, matt)' was not used in inferring 'Admires(evan, paige)' at line 3. The relevance logician claims that the →-Intro rule in a correct natural deductive system should not make it possible to prove a conditional when the consequent was arrived at independently of the antecedent. A typical strategy is to use classes of numerals to mark the relevance conditions of basis sentences and assumptions, and to formulate the Intro and Elim rules so that they tell us how an application of a rule transfers the numerical subscript(s) from the sentences used to the sentence derived with the rule's help. Label the basis sentences, if any, with distinct numerical subscripts. Let a, b, c, etc., range over classes of numerals. The →-rules for a relevance natural deductive system may be represented as follows.

→-Elim

k. (P → Q)a
l. Pb
m. Qab →-Elim: k, l

→-Intro

k. P{k} Assumption
.
.
.
m. Qb
n. (P → Q)b − {k} →-Intro: k-m, provided k ∈ b
(The numerical subscript of the assumption at line k must be new to the proof; this is insured by using the line number for the subscript.)

In the directions for the →-Intro rule, the proviso that k ∈ b insures that the antecedent P is used in deriving the consequent Q. Anderson and Belnap require that if the line that results from the application of either rule is the conclusion of the proof, the relevance markers be discharged. Here is a sample proof of the above two rules in action.

1. Admires(evan, paige)1 Assumption
2. (Admires(evan, paige) → ~Married(beth, matt))2 Assumption
3. ~Married(beth, matt)1, 2 →-Elim: 1,2
4. ((Admires(evan, paige) → ~Married(beth, matt)) → ~Married(beth, matt))1 →-Intro: 2-3
5. (Admires(evan, paige) → ((Admires(evan, paige) → ~Married(beth, matt)) → ~Married(beth, matt))) →-Intro: 1-4
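
The subscript bookkeeping in such proofs is entirely mechanical, so it may help to see it sketched in code. The following Python fragment is only an illustrative sketch under simplifying assumptions (formulas are strings or pairs, subproof scoping is ignored, and the function names are invented): each line carries its class of relevance numerals as a set, →-Elim takes the union of the two classes, and →-Intro discharges an assumption only when its numeral actually occurs in the consequent's class.

# A minimal sketch of relevance-numeral bookkeeping for the ->-rules.
# Representation and names are illustrative assumptions, not a standard API.

def arrow_elim(conditional, antecedent):
    """From (P -> Q) with class a and P with class b, derive Q with a U b."""
    (p, q), a = conditional
    p2, b = antecedent
    if p != p2:
        raise ValueError("antecedent does not match the conditional")
    return (q, a | b)

def arrow_intro(assumption, consequent):
    """Discharge assumption P (class {k}) to get (P -> Q) with class b - {k},
    provided k is in b, that is, P was actually used in reaching Q."""
    p, k_set = assumption
    q, b = consequent
    if not k_set <= b:
        raise ValueError("fallacy of relevance: assumption unused in derivation")
    return ((p, q), b - k_set)

# Replaying the sample proof above ('A' abbreviates Admires(evan, paige),
# 'notM' abbreviates ~Married(beth, matt)):
line1 = ("A", {1})                    # assumption, subscript 1
line2 = (("A", "notM"), {2})          # assumption, subscript 2
line3 = arrow_elim(line2, line1)      # notM with class {1, 2}
line4 = arrow_intro(line2, line3)     # ((A -> notM) -> notM) with class {1}
line5 = arrow_intro(line1, line4)     # conclusion; all markers discharged
assert line5[1] == set()

Note how the reiteration proof criticized above would be blocked: the assumption's numeral never enters the consequent's class, so arrow_intro raises an error instead of licensing the conditional.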

For further discussion see Anderson and Belnap (1962). For a comprehensive discussion of relevance deductive systems see their (1975). For a more up-to-date review of the relevance logic literature see Dunn (1986).

ii. Intuitionistic logic

We now consider the correctness of the ~-Elim rule, viewed in the context of its use alongside the ~-Intro rule.

~-Intro

k. P Assumption
.
.
.
m. ⊥
n. ~P ~-Intro: k-m

~-Elim

k. ~~P
m. P ~-Elim: k

Here is a typical use in classical logic of the ~-Intro and ~-Elim rules. Suppose that we derive a contradiction from the assumption that a sentence P is true. So, if P were true, then a contradiction would be true, which is impossible. So P cannot be true and we may infer that not-P. Similarly, suppose that we derive a contradiction from the assumption that not-P. Since a contradiction cannot be true, not-P is not true. Then we may infer that P is true by ~-Elim.

The intuitionist logician rejects the final step of this reasoning, the inference by ~-Elim from not-not-P to P. If a contradiction is derived from not-P we may infer that not-P is not true, that is, that not-not-P is true, but it is incorrect to infer that P is true. Why? Because the intuitionist rejects the presupposition behind the ~-Elim rule, which is that for any proposition P there are two alternatives: P and not-P. The grounds for this are the intuitionistic conceptions of truth and meaning.

According to intuitionistic logic, truth is an epistemic notion: the truth of a sentence P consists in our ability to verify it. To assert P is to have a proof of P, and to assert not-P is to have a refutation of P. This leads to an epistemic conception of the meaning of logical constants. The meaning of a logical constant is characterized in terms of its contribution to the criteria of proof for the sentences in which it occurs. Compare with classical logic: the meaning of a logical constant is semantically characterized in terms of its contribution to the determination of the truth conditions of the sentences in which it occurs. For example, the classical logician accepts a sentence of the form 'P v Q' only when she accepts that at least one of the disjuncts is true. On the other hand, the intuitionistic logician accepts 'P v Q' only when she has a method for proving P or a method for proving Q. But then the Law of Excluded Middle no longer holds, because a sentence of the form P or not-P is true, that is, assertible, only when we are in a position to prove or refute P, and we lack the means for verifying or refuting all sentences. The alleged problem with the ~-Elim rule is that it illegitimately extends the grounds for asserting P on the basis of not-not-P, since a refutation of not-P is not ipso facto a proof of P.

Since there are finitely many McKeons and the predicates of language M seem well defined, we can work through the domain of the McKeons to verify or refute any M-sentence, and so there doesn't seem to be an M-sentence that is neither verifiable nor refutable. However, consider a language about the natural numbers. Any sentence that results by substituting numerals for the variables in 'x = y + z' is decidable. This is to say that for any natural numbers x, y, and z, we have an effective procedure for determining whether or not x is the sum of y and z. Hence, for all x, y, and z, either we may assert that x = y + z or we may assert the contrary. Let 'A(x)' abbreviate 'if x is even and greater than 2, then there exist primes y and z such that x = y + z'. Since there are algorithms for determining of any number whether or not it is even, greater than 2, or prime, the hypothesis that the open formula 'A(x)' is satisfied by a given natural number is decidable, for we can effectively determine for all smaller numbers whether or not they are prime. However, there is no known method for verifying or refuting Goldbach's conjecture: for all x, A(x). Even though, for each numeral n standing for a natural number, the sentence 'A(n)' is decidable (that is, we can determine which of 'A(n)' or 'not-A(n)' is true), the sentence 'for all x, A(x)' is not. That is, we are not in a position to hold that either Goldbach's conjecture is true or that it is not. Clearly, verification of the conjecture via an exhaustive search of the domain of natural numbers is not possible since the domain is non-finite. Lacking a counterexample or a proof of Goldbach's conjecture, the intuitionist demurs from asserting that either Goldbach's conjecture is true or it is not. This is just one of many examples where the intuitionist thinks that the law of excluded middle fails.
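
The contrast between deciding each instance and deciding the quantified conjecture can be made concrete in code. Here is a minimal Python sketch (the helper names are invented for illustration) that decides any particular instance 'A(n)' by finite search; no run of such a procedure settles 'for all x, A(x)', since the quantifier ranges over a non-finite domain.

# Deciding individual instances A(n): "if n is even and greater than 2,
# then there exist primes y and z with n = y + z". Names are illustrative.

def is_prime(m):
    """Effectively decide primality by trial division."""
    if m < 2:
        return False
    return all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

def A(n):
    """Decide the instance A(n); this search always terminates."""
    if n % 2 != 0 or n <= 2:
        return True               # the conditional is vacuously true
    return any(is_prime(y) and is_prime(n - y) for y in range(2, n // 2 + 1))

# Each instance is decidable by a terminating computation...
assert all(A(n) for n in range(4, 1000, 2))
# ...but no finite run of this procedure verifies or refutes the
# universal sentence 'for all x, A(x)'.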

In sum, the legitimacy of the ~-Elim rule requires a realist conception of truth as verification-transcendent. On this conception, sentences have truth-values independently of the possibility of a method for verifying them. Intuitionistic logic abandons this conception of truth in favor of an epistemic conception according to which the truth of a sentence turns on our ability to verify it. Hence, the inference rules of an intuitionistic natural deductive system must be coded in such a way as to reflect this notion of truth. For example, consider an intuitionistic language in which a, b, ... range over proofs, 'a: P' stands for 'a is a proof of P', and '(a, b)' stands for some suitable pairing of the proofs a and b. The &-rules of an intuitionistic natural deductive system may look like the following:

&-Intro

k. a: P
l. b: Q
m. (a, b): (P & Q) &-Intro: k, l

&-Elim

k. (a, b): (P & Q)
m. a: P &-Elim: k

k. (a, b): (P & Q)
m. b: Q &-Elim: k
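
Because these rules manipulate proof objects directly, they translate almost word for word into code. The following Python lines are a sketch of the proofs-as-pairs reading only, not an official formalism: &-Intro is pairing, and the two &-Elim rules are the projections.

# Proofs-as-data sketch of the intuitionistic &-rules: a proof of (P & Q)
# just is a pair (a, b) of a proof a of P and a proof b of Q.

def and_intro(a, b):
    """&-Intro: from a proof of P and a proof of Q, form a proof of (P & Q)."""
    return (a, b)

def and_elim_left(pair):
    """&-Elim: from a proof (a, b) of (P & Q), recover the proof a of P."""
    return pair[0]

def and_elim_right(pair):
    """&-Elim: from a proof (a, b) of (P & Q), recover the proof b of Q."""
    return pair[1]

# Usage, with placeholder proof objects:
proof_p, proof_q = "a: P", "b: Q"
proof_pq = and_intro(proof_p, proof_q)     # (a, b): (P & Q)
assert and_elim_left(proof_pq) == proof_p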

Apart from the negation rules, it is fairly straightforward to dress the Intro and Elim rules of N with a proof interpretation as is illustrated above with the &-rules. For the details see Van Dalen (1999). For further introductory discussion of the philosophical theses underlying intuitionistic logic see Read (1995) and Shapiro (2000). Tennant (1997) offers a more comprehensive discussion and defense of the philosophy of language underlying intuitionistic logic.

iii. Free Logic

We now turn to the ∃-Intro and ∀-Elim rules. Consider the following two inferences.

(1) Male(evan)
(therefore) (2) ∃x Male(x)

(3) ∀x Male(x)
(therefore) (4) Male(evan)

Both are correct by the lights of our system N. Specifically, (2) is derivable from (1) by the ∃-Intro rule and we get (4) from (3) by the ∀-Elim rule. Note an implicit assumption required for the legitimacy of these inferences: every individual constant refers to an element of the quantifier domain. If this existence assumption, which is built into the semantics for M and reflected in the two quantifier rules, is rejected, then the inferences are unacceptable. What motivates rejecting the existence assumption and denying the correctness of the above inferences?

There are contexts in which singular terms are used without assuming that they refer to existing objects. For example, it is perfectly reasonable to regard the individual constants of a language used to talk about myths and fairy tales as not denoting existing objects. It seems inappropriate to infer that some actually existing individual is jolly on the basis that the sentence Santa Claus is jolly is true. Also, the logic of a language used to debate the existence of God should not presuppose that God refers to something in the world. The atheist doesn't seem to be contradicting herself in asserting that God does not exist. Furthermore, there are contexts in science where introducing an individual constant for an allegedly existing object such as a planet or particle should not require the scientist to know that the purported object to which the term allegedly refers actually exists. A logic that allows non-denoting individual constants (terms that do not refer to existing things) while maintaining the existential import of the quantifiers ('∀x' and '∃x' mean something like 'for all existing individuals x' and 'for some existing individuals x', respectively) is called a free logic. In order for the above two inferences to be correct by the lights of free logic, the sentence Evan exists must be added to the basis. Correspondingly, the ∃-Intro and ∀-Elim rules in a natural deductive system for free logic may be portrayed as follows. Again, let 'Ωv' be a formula in which v is the only free variable, and let n be any name.

∀-Elim

k. ∀vΩv
l. E!n
m. Ωn ∀-Elim: k, l

∃-Intro

k. Ωn
l. E!n
m. ∃vΩv ∃-Intro: k, l

'E!n' abbreviates n exists and so we suppose that 'E!' is an item of the relevant language. The ∀-Intro and ∃-Elim rules in a free logic deductive system also make explicit the required existential presuppositions with respect to individual constants (for details see Bencivenga 1986, p. 387). Free logic seems to be a useful tool for representing and evaluating reasoning in contexts such as the above. Different types of free logic arise depending on whether we treat terms that do not denote existing individuals as denoting objects that do not actually exist or as simply not denoting at all.
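
One way to see the force of these rules is to model a tiny language in which some names fail to denote existing things. The Python sketch below assumes a dual-domain ('outer domain') semantics, one standard option for positive free logic; the names and extensions are invented for illustration.

# Dual-domain sketch for free logic: the quantifiers range over the inner
# domain of existing individuals, while names may denote outer-domain
# (non-existent) objects. All names and extensions here are invented.

inner = {"evan"}                          # existing individuals
outer = {"santa"}                         # nameable but non-existent
denotes = {"Evan": "evan", "Santa": "santa"}
jolly = {"santa"}                         # extension of 'is jolly'

def E(name):
    """'E!n': the name n denotes an existing individual."""
    return denotes[name] in inner

def exists_x_jolly():
    """'∃x Jolly(x)': quantification over the inner domain only."""
    return any(d in jolly for d in inner)

assert denotes["Santa"] in jolly          # 'Jolly(Santa)' is true
assert not E("Santa")                     # but 'E!Santa' is false...
assert not exists_x_jolly()               # ...and '∃x Jolly(x)' fails

The classical ∃-Intro rule would carry us from 'Jolly(Santa)' to '∃x Jolly(x)'; the free-logic rule blocks the step by demanding the extra premise 'E!Santa'.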

In sum, there are contexts in which it is appropriate to use languages whose vocabulary and syntactic formation rules are independent of our knowledge of the actual existence of the entities the language is about. In such languages, the quantifier rules of deductive system N sanction incorrect inferences, and so at best N represents correct deductive reasoning in languages for which the existential presupposition with respect to singular terms makes sense. The proponent of system N may argue that only those expressions guaranteed a referent (e.g., demonstratives) are truly singular terms. On this view, advocated by Bertrand Russell at one time, expressions that may lack a referent, such as Santa Claus, God, Evan, Bill Clinton, and the present king of France, are not genuinely singular expressions. For example, in the sentence Evan is male, Evan abbreviates a unique description such as the son of Matt and Beth. Then Evan is male comes to

There exists a unique x such that x is a son of Matt and Beth and x is male.

From this we may correctly infer that some are male. The representation of this inference in N appeals to both the ∃-Intro and ∃-Elim rules, as well as the &-Elim rule. However, treating most singular expressions as disguised definite descriptions at worst generates counter-intuitive truth-value assignments (Santa Claus is jolly turns out false since there is no Santa Claus) and seems at best an unnatural response to the criticism posed from the vantage point of free logic.

For a short discussion of the motives behind free logic and a review of the family of free logics see Read (1995, chap. 5). For a more comprehensive discussion and a survey of the relevant literature see Bencivenga (1986). Morscher and Hieke (2001) is a collection of recent essays devoted to taking stock of the past fifty years of research in free logic and outlining new directions.

6. Conclusion

This completes our discussion of the deductive-theoretic conception of logical consequence. Since, arguably, logical consequence conceived deductive-theoretically is not compact, it cannot be defined in terms of deducibility in a correct deductive system. Nevertheless, correct deductive systems are useful for modeling deductive reasoning, and they have applications in areas such as computer science and mathematics. Is deductive system N correct? In other words: do the Intro and Elim rules of N represent correct principles of inference? We sketched three motives for answering in the negative, each leading to a logic that differs from the classical one developed here and that requires altering the Intro and Elim rules of N. It is clear from the discussion that any full coverage of the topic would have to engage philosophical issues, still a matter of debate, such as the nature of truth, meaning, and inference. For a comprehensive and very readable survey of proposed revisions to classical logic (those discussed here and others) see Haack (1996). For discussion of related issues, see also the entries "Logical Consequence, Philosophical Considerations" and "Logical Consequence, Model-Theoretic Conceptions" in this encyclopedia.

7. References and Further Reading

  • Anderson, A.R. and N. Belnap (1962): "Entailment", pp. 76-110 in Logic and Philosophy, ed. G. Iseminger. New York: Appleton-Century-Crofts, 1968.
  • Anderson, A.R., and N. Belnap (1975): Entailment: The Logic of Relevance and Necessity. Princeton: Princeton University Press.
  • Barwise, J. and J. Etchemendy (2001): Language, Proof and Logic. Chicago: University of Chicago Press and CSLI Publications.
  • Bencivenga, E. (1986): "Free Logics", pp. 373-426 in Gabbay and Guenthner (1986).
  • Dunn, M. (1986): "Relevance Logic and Entailment", pp. 117-224 in Gabbay and Guenthner (1986).
  • Fitch, F.B. (1952): Symbolic Logic: An Introduction. New York: The Ronald Press.
  • Gabbay, D. and F. Guenthner, eds. (1983): Handbook of Philosophical Logic, Vol 1. Dordrecht: D. Reidel.
  • Gabbay, D. and F. Guenthner, eds. (1986): Handbook of Philosophical Logic, Vol. 3. Dordrecht: D. Reidel.
  • Gentzen, G. (1934): "Investigations Into Logical Deduction", pp. 68-128 in Collected Papers, ed. M.E. Szabo. Amsterdam: North-Holland, 1969.
  • Haack, S. (1978): Philosophy of Logics. Cambridge: Cambridge University Press.
  • Haack, S. (1996): Deviant Logic, Fuzzy Logic. Chicago: The University of Chicago Press.
  • Morscher, E. and A. Hieke, eds. (2001): New Essays in Free Logic: In Honour of Karel Lambert. Dordrecht: Kluwer.
  • Pelletier, F.J. (1999): "A History of Natural Deduction and Elementary Logic Textbooks", pp. 105-138 in Logical Consequence: Rival Approaches, ed. J. Woods and B. Brown. Oxford: Hermes Science Publishing, 2001.
  • Read, S. (1995): Thinking About Logic. Oxford: Oxford University Press.
  • Shapiro, S. (1991): Foundations without Foundationalism: A Case For Second-Order Logic. Oxford: Clarendon Press.
  • Shapiro, S. (2000): Thinking About Mathematics. Oxford: Oxford University Press.
  • Sundholm, G. (1983): "Systems of Deduction", in Gabbay and Guenthner (1983).
  • Tarski, A. (1936): "On the Concept of Logical Consequence", pp. 409-420 in Tarski (1983).
  • Tarski, A. (1983): Logic, Semantics, Metamathematics, 2nd ed. Indianapolis: Hackett Publishing.
  • Tennant, N. (1997): The Taming of the True. Oxford: Clarendon Press.
  • Van Dalen, D. (1999): "The Intuitionistic Conception of Logic", pp. 45-73 in Varzi (1999).
  • Varzi, A., ed. (1999): European Review of Philosophy, Vol. 4, The Nature of Logic, Stanford: CSLI Publications.

Author Information

Matthew McKeon
Email: mckeonm@msu.edu
Michigan State University
U. S. A.

Logical Consequence

For a given language, a sentence is said to be a logical consequence of a set of sentences if and only if, in virtue of logic alone, the sentence must be true if every sentence in the set is true. This corresponds to the ordinary notion of a sentence "logically following" from others. Logicians have attempted to make the ordinary concept more precise relative to a given language L by sketching a deductive system for L, or by formalizing the intended semantics for L. Any adequate precise characterization of logical consequence must reflect its salient features, such as those highlighted by Alfred Tarski: (1) that the logical consequence relation is formal, that is, depends on the forms of the sentences involved; (2) that the relation is a priori, that is, it is possible to determine whether or not it holds without appeal to sense-experience; and (3) that the relation has a modal element.

Table of Contents

  1. Introduction
  2. The Concept of Logical Consequence
    1. Tarski's characterization of the common concept of logical consequence
      1. The logical consequence relation has a modal element
      2. The logical consequence relation is formal
      3. The logical consequence relation is a priori
    2. Logical and non-logical terminology
      1. The nature of logical constants explained in terms of their semantic properties
      2. The nature of logical constants explained in terms of their inferential properties
  3. Model-Theoretic and Deductive-Theoretic Conceptions of Logic
  4. Conclusion
  5. References and Further Reading

1. Introduction

Logical consequence is arguably the central concept of logic. The primary aim of logic is to tell us what follows logically from what. In order to simplify matters we take the logical consequence relation to hold for sentences rather than for abstract propositions, facts, states of affairs, etc. Correspondingly, logical consequence is a relation between a given class of sentences and the sentences that logically follow. One sentence is said to be a logical consequence of a set of sentences if and only if, in virtue of logic alone, it is impossible for the sentences in the set to be all true without the other sentence being true as well. If sentence X is a logical consequence of a set of sentences K, then we may say that K implies or entails X, or that one may correctly infer the truth of X from the truth of the sentences in K. For example, Kelly is not at work is a logical consequence of Kelly is not both at home and at work and Kelly is at home. However, the sentence Kelly is not a football fan does not follow from All West High School students are football fans and Kelly is not a West High School student. The central question to be investigated here is: What conditions must be met in order for a sentence to be a logical consequence of others?

One popular answer derives from the work of Alfred Tarski, one of the preeminent logicians of the twentieth century, in his famous 1936 paper, "On the Concept of Logical Consequence." Here Tarski uses his observations of the salient features of what he calls the common concept of logical consequence to guide his theoretical development of it. Accordingly, we begin by examining the common concept, focusing on Tarski's observations of the criteria by which we intuitively judge what follows from what and which Tarski thinks must be reflected in any theory of logical consequence. Then two theoretical definitions of logical consequence are introduced: the model-theoretic and the deductive-theoretic definitions. They represent two major approaches to making the common concept of logical consequence more precise. The article concludes by highlighting considerations relevant to evaluating model-theoretic and deductive-theoretic characterizations of logical consequence. For more comprehensive presentations of the two definitions of logical consequence, as well as further critical discussion, see the entries Logical Consequence, Model-Theoretic Conceptions and Logical Consequence, Deductive-Theoretic Conceptions.

2. The Concept of Logical Consequence

a. Tarski's characterization of the common concept of logical consequence

Tarski begins his article, "On the Concept of Logical Consequence," by noting a challenge confronting the project of making precise the common concept of logical consequence.

The concept of logical consequence is one of those whose introduction into a field of strict formal investigation was not a matter of arbitrary decision on the part of this or that investigator; in defining this concept efforts were made to adhere to the common usage of the language of everyday life. But these efforts have been confronted with the difficulties which usually present themselves in such cases. With respect to the clarity of its content the common concept of consequence is in no way superior to other concepts of everyday language. Its extension is not sharply bounded and its usage fluctuates. Any attempt to bring into harmony all possible vague, sometimes contradictory, tendencies which are connected with the use of this concept, is certainly doomed to failure. We must reconcile ourselves from the start to the fact that every precise definition of this concept will show arbitrary features to a greater or less degree. (Tarski 1936, p. 409)

Not every feature of the technical account will be reflected in the ordinary concept, and we should not expect any clarification of the concept to reflect each and every deployment of it in everyday language and life. Nevertheless, despite its vagueness, Tarski believes that there are identifiable, essential features of the common concept of logical consequence.

...consider any class K of sentences and a sentence X which follows from this class. From an intuitive standpoint, it can never happen that both the class K consists of only true sentences and the sentence X is false. Moreover, since we are concerned here with the concept of logical, that is, formal consequence, and thus with a relation which is to be uniquely determined by the form of the sentences between which it holds, this relation cannot be influenced in any way by empirical knowledge, and in particular by knowledge of the objects to which the sentence X or the sentences of class K refer. The consequence relation cannot be affected by replacing designations of the objects referred to in these sentences by the designations of any other objects. (Tarski 1936, pp. 414-415)

According to Tarski, the logical consequence relation as it is employed by typical reasoners is (1) necessary, (2) formal, and (3) not influenced by empirical knowledge. I now elaborate on (1)-(3) in order to shape two preliminary characterizations of logical consequence.

i. The logical consequence relation has a modal element

Tarski countenances an implicit modal notion in the common concept of logical consequence. If X is a logical consequence of K, then not only do we not have all of the elements of K true and X false; this is necessarily not the case. That is, X follows from K only if it is not possible for all of the sentences in K to be true with X false. For example, the supposition that All West High School students are football fans and that Kelly is not a West High School student does not rule out the possibility that Kelly is a football fan. Hence, the sentences All West High School students are football fans and Kelly is not a West High School student do not entail Kelly is not a football fan, even if she, in fact, isn't a football fan. Also, Most of Kelly's male classmates are football fans does not entail Most of Kelly's classmates are football fans. What if the majority of Kelly's class is composed of females who are not fond of football?

We said above that Kelly is not both at home and at work and Kelly is at home jointly imply Kelly is not at work. Note that it doesn't seem possible for the first two sentences to be true and Kelly is not at work false. But it is hard to see what this comes to without further clarification of the relevant notion of possibility. For example, consider the following pairs of sentences.

Kelly kissed her sister at 2:00pm.
2:00pm is not a time during which Kelly and her sister were 100 miles apart.

Kelly is a female.
Kelly is not the US President.

There is a chimp in Paige's house.
There is a primate in Paige's house.

Ten is a prime number.
Ten is greater than nine.

For each pair of sentences, there is a sense in which it is not possible for the first to be true and the second false. At the very least an account of logical consequence must distinguish logical possibility from other types of possibility. Should truths about physical laws, US political history, zoology, and mathematics constrain what we take to be possible in determining whether or not the first sentence of each pair could logically be true with the second sentence false? If not, then this seems to mystify logical possibility (e.g., how could ten be a prime number?). To paraphrase questions asked by G.E. Moore (1959, pp. 231-238), given that I know that George W. Bush is US President and that he is not a female named Kelly, isn't it inconsistent for me to grant the logical possibility of the truth of Kelly is a female and the falsity of Kelly is not the US President? Or should I ignore my present state of knowledge in considering what is logically possible? Tarski does not derive a clear notion of logical possibility from the common concept of logical consequence. Perhaps there is none to be had, and we should seek the help of a proper theoretical development in clarifying the notion of logical possibility. Towards this end, let's turn to the other features of logical consequence highlighted by Tarski, starting with the formality criterion of logical consequence.

ii. The logical consequence relation is formal

Tarski observes that logical consequence is a formal consequence relation. And he tells us that a formal consequence relation is a consequence relation that is uniquely determined by the form of the sentences between which it holds. Consider the following pair of sentences.

(1) Some children are both lawyers and peacemakers.
(2) Some children are peacemakers.

Intuitively, (2) is a logical consequence of (1). It appears that this fact does not turn on the subject matter of the sentences. Replace 'children', 'lawyers', and 'peacemakers' in (1) and (2) with the variables S, M, and P to get the following.

(1') Some S are both M and P
(2') Some S are P

(1') and (2') are forms of (1) and (2), respectively. Note that there is no interpretation of S, M, and P according to which the sentence that results from (1') is true and the resulting instance of (2') is false. Hence, (2) is a formal consequence of (1), and on each interpretation of S, M, and P the resulting instance of (2') is a formal consequence of the resulting instance of (1') (e.g., Some clowns are sad is a formal consequence of Some clowns are both lonely and sad). Tarski's observation is that for any sentence X and set K of sentences, X is a logical consequence of K only if X is a formal consequence of K. The formality criterion of logical consequence can work in explaining why one sentence doesn't entail another in cases where it seems impossible for the first to be true and the second false. For example, (3) is false and (4) is true.

(3) Ten is a prime number
(4) Ten is greater than nine

Does (4) follow from (3)? One might think that (4) does not follow from (3) because being a prime number does not necessitate being greater than nine. However, this does not require one to think that ten could be a prime number and less than or equal to nine, which is probably a good thing since it is hard to see how this is possible. Rather, we take

(3') a is a P
(4') a is R than b

to be the forms of (3) and (4) and note that there are interpretations of 'a', 'b', 'P', and 'R' according to which the first is true and the second false (e.g., let 'a' and 'b' name the numbers two and ten, respectively, and let 'P' mean prime number, and 'R' greater). Note that the claim here is not that formality is sufficient for a consequence relation to qualify as logical but only that it is a necessary condition. I now elaborate on this last point by saying a little more about forms of sentences (that is, sentential forms) and formal consequence.

A form of a sentence is determined by distinguishing the terms replaced with variables from those held constant. In Some children are both lawyers and peacemakers we may replace 'Some' with a variable and treat all the other terms as constant. Then

(1'') D children are both lawyers and peacemakers

is a form of (1), and each sentence generated by assigning a meaning to D shares this form with (1). For example, the following three sentences are instances of (1''), produced by interpreting D as 'No', 'Many', and 'Few'.

No children are both lawyers and peacemakers
Many children are both lawyers and peacemakers
Few children are both lawyers and peacemakers

Whether X is a formal consequence of K then turns on a prior selection of some terms as constant and others as replaced with variables. Relative to such a determination, X is a formal consequence of K if and only if (iff) there is no interpretation of the variables according to which each of the K are true and X is false. So, taking all the terms, except for 'Some', in (1) Some children are both lawyers and peacemakers and in (2) Some children are peacemakers as constants makes the following forms of (1) and (2).

(1'') D children are both lawyers and peacemakers
(2'') D children are peacemakers

Relative to this selection, (2) is not a formal consequence of (1) because replacing 'D' with 'No' yields a true instance of (1'') and a false instance of (2'').
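
Since formal consequence, so defined, quantifies over interpretations of the variable terms, it can be checked by brute force when interpretations are drawn from a small finite domain. The following Python sketch is illustrative only: the two-element domain, the rendering of determiners as relations between sets, and the toy extensions standing in for the constant terms are all assumptions made for the example. It verifies that no interpretation of S, M, and P makes (1') true and (2') false, and it exhibits an interpretation of D making (1'') true and (2'') false.

# Brute-force check of formal consequence over a two-element domain.
# Domain, encodings, and extensions are illustrative assumptions.

from itertools import combinations

domain = {"a", "b"}

def subsets(s):
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def some(A, B):                    # 'Some A are B'
    return len(A & B) > 0

# (2') is a formal consequence of (1'): no interpretation of S, M, and P
# makes 'Some S are both M and P' true and 'Some S are P' false.
assert all(not (some(S, M & P) and not some(S, P))
           for S in subsets(domain)
           for M in subsets(domain)
           for P in subsets(domain))

def no(A, B):                      # interpreting the variable D as 'No'
    return len(A & B) == 0

# By contrast, (2'') is not a formal consequence of (1''): with toy
# extensions for the constant terms, reading D as 'No' makes (1'') true
# and (2'') false.
children, lawyers, peacemakers = {"a"}, set(), {"a"}
assert no(children, lawyers & peacemakers)      # (1'') true
assert not no(children, peacemakers)            # (2'') false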

Consider the following pair.

(5) Kelly is female
(6) Kelly is not US President

(6) is a formal consequence of (5) relative to replacing 'Kelly' with a variable. Given current U.S. political history, there is no individual whose name yields a true (5) and a false (6) when it replaces 'Kelly'. This is not, however, sufficient reason for seeing (6) as a logical consequence of (5). There are two ways of thinking about why, a metaphysical consideration and an epistemological one. First the metaphysical consideration. It seems possible for (5) to be true and (6) false. The course of U.S. political history could have turned out differently. One might think that the current US President could--logically--have been a female named, say, 'Sally'. Using 'Sally' as a replacement for 'Kelly' would yield in that situation a true (5) and a false (6). Also, it seems possible that in the future there will be a female US President. In order for a formal consequence relation from K to X to qualify as logical it has to be the case that it is necessary that there is no interpretation of the variables in K and X according to which the K-sentences are true and X is false.

The epistemological consideration is that one might think that knowledge that X follows logically from K should not essentially depend on being justified by experience of extra-linguistic states of affairs. Clearly, the determination that (6) follows formally from (5) essentially turns on empirical knowledge, specifically knowledge about the current political situation in the US. This leads to the final highlight of Tarski's rendition of the intuitive concept of logical consequence: that logical consequence cannot be influenced by empirical knowledge.

iii. The logical consequence relation is a priori

Tarski says that by virtue of being formal, knowledge that X follows logically from K cannot be affected by knowledge of the objects that X and the sentences of K are about. Hence, our knowledge that X is a logical consequence of K cannot be influenced by empirical knowledge. However, as noted above, formality by itself does not insure that the extension of a consequence relation is not influenced by empirical knowledge. So, let's view this alleged feature of logical consequence as independent of formality. We characterize empirical knowledge in two steps as follows. First, a priori knowledge is knowledge "whose truth, given an understanding of the terms involved, is ascertainable by a procedure which makes no reference to experience" (Hamlyn 1967, p. 141). Empirical, or a posteriori, knowledge is knowledge that is not a priori, that is, knowledge whose validation necessitates a procedure that does make reference to experience. We can safely read Tarski as saying that a consequence relation is logical only if knowledge that something falls in its extension is a priori, that is, only if the relation is a priori. Knowledge of physical laws, a determinant of people's observed sizes, is not a priori, and such knowledge is required to know that there is no interpretation of k, h, and t according to which (7) is true and (8) false.

(7) k kissed h at time t
(8) t is not a time during which k and h were 100 miles apart

So (8) cannot be a logical consequence of (7). However, my knowledge that Kelly is not Paige's only friend follows from Kelly is taller than Paige's only friend is a priori since I know a priori that nobody is taller than herself.

Let's summarize and tie things together. We began by asking, for a given language L, what conditions must be met in order for a sentence X of L to be a logical consequence of a class K of L-sentences? Tarski thinks that an adequate response must reflect the common concept of logical consequence, that is, the concept as it is ordinarily employed. By the lights of this concept, an adequate account of logical consequence must reflect the formality and necessity of logical consequence, and must also reflect the fact that knowledge of what follows logically from what is a priori. Tying the criteria together, in order to fix what follows logically from what in a given language L, we must select a class of constants that determines a formal consequence relation that is both necessary and known, if at all, a priori. Such constants are called logical constants, and we say that the logical form of a sentence is a function of the logical constants that occur in the sentence and the pattern of the remaining expressions. As was illustrated above, the notion of formality does not presuppose a criterion of logical constancy. A consequence relation based on any division between constants and terms replaced with variables will automatically be formal with respect to the latter.

b. Logical and non-logical terminology

Tarski's basic move from his rendition of the common concept of logical consequence is to distinguish between logical terms and non-logical terms and then say that X is a logical consequence of K only if there is no possible interpretation of the non-logical terms of the language L that makes all of the sentences in K true and X false. The choice of the right terms as logical will reflect the modal element in the concept of logical consequence, that is, will insure that there is no 'possible' interpretation of the variable, non-logical terms of the language L that makes all of the K true and X false, and will insure that this is known a priori. Of course, we have yet to spell out the modal notion in the concept of logical consequence. Tarski pretty much left this underdeveloped in his (1936). Lacking such an explanation hampers our ability to clarify the rationale for a selection of terms to serve as the logical ones.

Traditionally, logicians have regarded sentential connectives such as and, not, or, if...then, the quantifiers all and some, and the identity predicate '=' as logical terms. Remarking on the boundary between logical and non-logical terms, Tarski (1936, p. 419) writes the following.

Underlying this characterization of logical consequence is the division of all terms of the language discussed into logical and extra-logical. This division is not quite arbitrary. If, for example, we were to include among the extra-logical signs the implication sign, or the universal quantifier, then our definition of the concept of consequence would lead to results which obviously contradict ordinary usage. On the other hand, no objective grounds are known to me which permit us to draw a sharp boundary between the two groups of terms. It seems to be possible to include among logical terms some which are usually regarded by logicians as extra-logical without running into consequences which stand in sharp contrast to ordinary usage.

Tarski seems right to think that the logical consequence relation turns on the work that the logical terminology does in the relevant sentences. It seems odd to say that Kelly is happy does not logically follow from All are happy because the second is true and the first false when All is replaced with Few. However, by Tarski's version of the ordinary concept of logical consequence there is no reason not to treat, say, taller than as a logical term along with not and, therefore, no reason not to take Kelly is not taller than Paige as following logically from Paige is taller than Kelly. Also, it seems plausible to say that I know a priori that there is no possible interpretation of Kelly and is mortal according to which it is necessary that Kelly is mortal is true and Kelly is mortal is false. This makes Kelly is mortal a logical consequence of it is necessary that Kelly is mortal. Given that taller than and it is necessary that, along with other terms, were not generally regarded as logical terms by logicians of Tarski's day, the fact that they seem to be logical terms by the common concept of logical consequence, as observed by Tarski, highlights the question of what it takes to be a logical term. Tarski says that future research will either justify the traditional boundary between the logical and the non-logical or conclude that there is no such boundary and the concept of logical consequence is a relative concept whose extension is always relative to some selection of terms as logical (p. 420). For further discussion of Tarski's views on logical terminology and contemporary views see Logical Consequence, Model-Theoretic Conceptions: Section 5.3.

How, exactly, does the terminology usually regarded by logicians as logical work in making it the case that one sentence follows from others? In the next two sections two distinct approaches to understanding the nature of logical terms are sketched. Each approach leads to a unique way of characterizing logical consequence and thus yields a unique response to the above question.

i. The nature of logical constants explained in terms of their semantic properties

Consider the following metaphor, borrowed from Bencivenga (1999).

The locked room metaphor

Suppose that you are locked in a dark windowless room and you know everything about your language but nothing about the world outside. A sentence X and a class K of sentences are presented to you. If you can determine that X is true if all the sentences in K are, X is a logical consequence of K.

Ignorant of US politics, I couldn't determine the truth of Kelly is not US President solely on the basis of Kelly is a female. However, behind such a veil of ignorance I would be able to tell that Kelly is not US President is true if Kelly is female and Kelly is not US President is true. How? Short answer: based on my linguistic competence; longer answer: based on my understanding of the semantic contribution of and to the determination of the truth conditions of a sentence of the form P and Q. For any sentences P and Q, I know that P and Q is true just in case P is true and Q is true. So, I know, a priori, if P and Q is true, then Q is true. As noted by one philosopher, "This really is remarkable since, after all, it's what they mean, together with the facts about the non-linguistic world, that decide whether P or Q are true" (Fodor 2000, p.12).

Taking not and and to be the only logical constants in (9) Kelly is not both at home and at work, (10) Kelly is at home, and (11) Kelly is not at work, we formalize the sentences as follows, letting k mean Kelly, H mean is at home, and W mean is at work.

(9') not-(Hk and Wk)
(10') Hk
(11') not-Wk

There is no interpretation of k, H, and W according to which (9') and (10') are true and (11') is false. The reason why turns on the semantic properties of and and not, which are knowable a priori. Suppose (9') and (10') are true on some interpretation of the variable terms. Then the meaning of not in (9') makes it the case that Hk and Wk is false, which, by the meaning of and, requires that Hk is false or Wk is false. Given (10'), it must be that Wk is false, that is, not-Wk is true. So, there can't be an interpretation of the variable terms according to which (9') and (10') are true and (11') is false, and, as the above reasoning illustrates, this is due exclusively to the semantic properties of not and and. So the reason that it is impossible for an interpretation of k, H, and W to make (9') and (10') true and (11') false is that the supposition otherwise is inconsistent with the semantic functioning of not and and. Compare: the supposition that there is an interpretation of k according to which k is a female is true and k is not US President is false does not seem to violate the semantic properties of the constant terms. If we identify the meanings of the predicates with their extensions in all possible worlds, then the supposition that there is a female U.S. President does not violate the meanings of female and US President, for surely it is possible that there be a female US President. But supposing that (9') and (10') could be true with (11') false on some interpretation of k, H, and W does violate the semantic properties of either and or not.
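
At the propositional level this reasoning can be checked exhaustively, since there are only four ways of assigning truth-values to 'Hk' and 'Wk'. The following Python sketch (the encoding is illustrative) enumerates them.

# Exhaustive check that (11') not-Wk is true on every assignment of
# truth-values that makes (9') not-(Hk and Wk) and (10') Hk true.

from itertools import product

for Hk, Wk in product([True, False], repeat=2):
    premise_9 = not (Hk and Wk)     # (9') not-(Hk and Wk)
    premise_10 = Hk                 # (10') Hk
    conclusion_11 = not Wk          # (11') not-Wk
    if premise_9 and premise_10:
        assert conclusion_11, "counterexample found"

# No assertion fires: the semantic properties of 'not' and 'and' alone
# rule out making the premises true and the conclusion false.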

In sum, our first-step characterization of logical consequence is the following. For a given language L,

X is a logical consequence of K if and only if there is no possible interpretation of the non-logical terminology of L according to which all the sentences in K are true and X is false.

A possible interpretation of the non-logical terminology of the language L according to which sentences are true or false is a reading of the non-logical terms according to which the sentences receive a truth-value (that is, each is either true or false) in a situation that is not ruled out by the semantic properties of the logical constants. The philosophical locus of the technical development of 'possible interpretation' in terms of models is Tarski (1936). A model for a language L is the theoretical development of a possible interpretation of the non-logical terminology of L according to which the sentences of L receive a truth-value. Models have become standard tools for characterizing the logical consequence relation, and the characterization of logical consequence in terms of models is called the Tarskian or model-theoretic characterization of logical consequence. We say that X is a model-theoretic consequence of K if and only if all models of K are models of X. This relation may be represented as K ⊨ X. If model-theoretic consequence is adequate as a representation of logical consequence, then it must reflect the salient features of the common concept, which, according to Tarski, means that it must be necessary, formal, and a priori.

For further discussion of this conception of logical consequence, see the article, Logical Consequence, Model-Theoretic Conceptions.

ii. The nature of logical constants explained in terms of their inferential properties

We now turn to a second approach to understanding logical constants. Instead of understanding the nature of logical constants in terms of their semantic properties as is done on the model-theoretic approach, on the second approach we appeal to their inferential properties conceived of in terms of principles of inference, that is, principles justifying steps in deductions. We begin with a remark made by Aristotle. In his study of logical consequence, Aristotle comments that

A syllogism is discourse in which, certain things being stated, something other than what is stated follows of necessity from their being so. I mean by the last phrase that they produce the consequence, and by this, that no further term is required from without in order to make the consequence necessary. (Prior Analytics, 24b)

Adapting this to our X and K, we may say that X is a logical consequence of K when the sentences of K are sufficient to produce X. How are we to think of a sentence being produced by others? One way of developing this is to appeal to a notion of an actual or possible deduction. X is a deductive consequence of K if and only if there is a deduction of X from K. In such a case, we say that X may be correctly inferred from K or that it would be correct to conclude X from K. A deduction is associated with a pair <K, X>: the set K of sentences is the basis of the deduction, and X is the conclusion. A deduction from K to X is a finite sequence S of sentences ending with X such that each sentence in S (that is, each intermediate conclusion) is derived from one or more sentences in K or from previous sentences in S in accordance with a correct principle of inference.

For example, intuitively, the following inference seems correct.

  Kelly is not both at home and at work
  Kelly is at home
(therefore) Kelly is not at work

The set K of sentences above the line is the basis of the inference and the sentence X below is the conclusion. We represent their logical forms, again, as follows.

  (9') not-(Hk and Wk)
  (10') Hk
(therefore) (11') not-Wk

Consider the following deduction of (11') from (10') and (9').

Deduction: Assume that (12') Wk. Then from (10') and (12') we may deduce that (13') Hk and Wk. (13') contradicts (9') and so (12'), our initial assumption, must be false. We have deduced not-Wk from not-(Hk and Wk) and Hk.

Since the deduction of not-Wk from not-(Hk and Wk) and Hk did not depend on the interpretation of k, W, and H, the deductive relation is formal. Furthermore, my knowledge of this is a priori because my knowledge of the underlying principles of inference in the above deduction is not empirical. For example, letting P and Q be any sentences, we know a priori that P and Q may be inferred from the set K={P, Q} of basis sentences. This principle grounds the move from (10') and (12') to (13'). Also, the deduction appeals to the principle that if we deduce a contradiction from an assumption, then we may infer that the assumption is false. The correctness of this principle seems to be an a priori matter. Let's look at another example of a deduction.

  (1) Some children are both lawyers and peacemakers
(therefore) (2) Some children are peacemakers

The logical forms are, again, the following.

  (1') Some S are both M and P
(therefore) (2') Some S are P

Again, intuitively, (2') is deducible from (1').

Deduction: The basis tells us that at least one S--let's call this S 'a'--is both an M and a P. Clearly, a is a P may be deduced from a is both an M and a P. Since we've assumed that a is an S, what we derive with respect to a we derive with respect to some S. So our derivation of a is a P is a derivation of Some S is a P, which is our desired conclusion.

Since the deduction is formal, we have shown not merely that (2) can be correctly inferred from (1), but we have shown that for any interpretation of S, M, and P it is correct to infer (2') from (1').

Typically, deductions leave out steps (perhaps because they are too obvious), and they usually do not justify each and every step made in moving towards the conclusion (again, obviousness begets brevity). The notion of a deduction is made precise by describing a mechanism for constructing deductions that are both transparent and rigorous (each step is explicitly justified and no steps are omitted). This mechanism is a deductive system (also known as a formal system or as a formal proof calculus). A deductive system D is a collection of rules that govern which sequences of sentences, associated with a given pair <K, X>, are allowed and which are not. Such a sequence is called a proof in D (or, equivalently, a deduction in D) of X from K. The rules must be such that whether or not a given sequence associated with <K, X> qualifies as a proof in D of X from K is decidable purely by inspection and calculation. That is, the rules provide a purely mechanical procedure for deciding whether a given object is a proof in D of X from K.
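
The decidability claim concerns checking proofs, not finding them. As a toy illustration, here is a Python sketch of such a mechanical check; its one-rule system and data format are invented for the example and are far simpler than any deductive system discussed above.

# Toy proof checker: decides purely by inspection and calculation whether
# a sequence of annotated lines is a proof of X from K. The format is
# illustrative: a conjunction is ("and", P, Q), and each line pairs a
# formula with a justification, either "basis" or ("and-intro", i, j).

def is_proof(K, X, lines):
    derived = []
    for formula, justification in lines:
        if justification == "basis":
            if formula not in K:
                return False
        elif isinstance(justification, tuple) and justification[0] == "and-intro":
            i, j = justification[1], justification[2]
            if i >= len(derived) or j >= len(derived):
                return False
            if formula != ("and", derived[i], derived[j]):
                return False
        else:
            return False              # unknown rule
        derived.append(formula)
    return bool(derived) and derived[-1] == X

# From K = {P, Q} we may prove (P and Q):
K = {"P", "Q"}
proof = [("P", "basis"),
         ("Q", "basis"),
         (("and", "P", "Q"), ("and-intro", 0, 1))]
assert is_proof(K, ("and", "P", "Q"), proof)

Whether the sequence qualifies is settled line by line in finitely many steps; no insight into what the sentences mean is required.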

We say that a deductive system D is correct when for any K and X, proofs in D of X from K correspond to intuitively valid deductions. For example, intuitively, there are no correct principles of inference according to which it is correct to conclude

Some animals are both mammals and reptiles

on the basis of the following two sentences.

Some animals are mammals
Some animals are reptiles

Hence, a proof in a deductive system of the former sentence from the latter two is evidence that the deductive system is incorrect. The point here is that a proof in D may fail to represent a deduction if D is incorrect.

A rich variety of deductive systems have been developed for registering deductions. Each system has its advantages and disadvantages, which are assessed in the context of the more specific tasks the deductive system is designed to accomplish. Historically, the general purpose of the construction of deductive systems was to reduce reasoning to precise mechanical rules (Hodges 1983, p. 26). Some view a deductive system defined for a language L as a mathematical model of actual or possible chains of correct reasoning in L. Sundholm (1983) offers a thorough survey of three main types of deductive systems. For a shorter, excellent introduction to the concept of a deductive system see Henkin (1967). A deductive system is developed in detail in the accompanying article, Logical Consequence, Deductive-Theoretic Conceptions.

If there is a proof of X from K in deductive system D, then we may say that X is a deductive consequence in D of K, which is sometimes expressed as K ⊢D X. Relative to a correct deductive system D, we characterize logical consequence in terms of deductive consequence as follows.

X is a logical consequence of K if and only if X is a deductive consequence in D of K, that is, there is an actual or possible proof in D of X from K.

This is called the deductive-theoretic (or proof-theoretic) characterization of logical consequence.

3. Model-Theoretic and Deductive-Theoretic Conceptions of Logic

We began with Tarski's observations of the common or ordinary concept of logical consequence that we employ in daily life. According to Tarski, if X is a logical consequence of a set of sentences, K, then, in virtue of the logical forms of the sentences involved, if all of the members of K are true, then X must be true, and furthermore, we know this a priori. The formality criterion makes the logical constants the essential determinant of the logical consequence relation. The logical consequence relation is fixed exclusively in terms of the nature of the logical terminology. We have highlighted two different approaches to the nature of a logical constant: (1) in terms of its semantic contribution to sentences in which it occurs and (2) in terms of its inferential properties. The two approaches yield distinct conceptions of the notion of necessity inherent in the common concept of logical consequence, and lead to the following characterizations of logical consequence.

(1) X is a logical consequence of K if and only if there is no possible interpretation of the non-logical terminology of the language according to which all the sentences in K are true and X is false.

(2) X is a logical consequence of K if and only if X is deducible from K.

We make the notions of possible interpretation in (1) and deducibility in (2) precise by appealing to the technical notions of model and deductive system. This leads to the following theoretical characterizations of logical consequence.

(1) The model-theoretic characterization of logical consequence: X is a logical consequence of K iff all models of K are models of X.

(2) The deductive-theoretic characterization of logical consequence: X is a logical consequence of K iff there is a deduction in a correct deductive system of X from K.

Following Shapiro (1991, p. 3), we define a logic to be a language L plus either a model-theoretic or a deductive-theoretic account of logical consequence. A language with both characterizations is a full logic just in case both characterizations coincide. A soundness proof establishes K ⊢D X only if K ⊨ X, and a completeness proof establishes K ⊢D X if K ⊨ X. These proofs together establish that the two characterizations coincide, and in such a case the deductive system D is said to be complete and sound with respect to the model-theoretic consequence relation defined for the relevant language L.

We said that the primary aim of logic is to tell us what follows logically from what. These two characterizations of logical consequence lead to two different orientations or conceptions of logic (see Tharp 1975, p. 5).

Model-theoretic approach: Logic is a theory of possible interpretations. For a given language, it delimits the class of situations that can--logically--be described by that language.

Deductive-theoretic approach: Logic is a theory of formal deductive inference.

The article now concludes by highlighting three considerations relevant to evaluating a particular deployment of the model-theoretic or deductive-theoretic definition in defining logical consequence. These considerations emerge from the above development of the two theoretic definitions from the common concept of logical consequence.

4. Conclusion

The two theoretical characterizations of logical consequence do not provide the means for drawing a boundary in a language L between logical and non-logical terms. Indeed, their use presupposes that a list of logical terms is in hand. Hence, in evaluating a model-theoretic or deductive-theoretic definition of logical consequence for a language L, the issue arises whether or not the boundary in L between logical and non-logical terms has been correctly drawn. This requires a response to a central question in the philosophy of logic: what qualifies as a logical constant? Tarski gives a well-reasoned response in his (1986). (For more recent discussion see McCarthy 1981 and 1998, Hanson 1997, and Warmbrod 1999.)

A second thing to consider in evaluating a theoretical account of logical consequence is whether or not its characterization of the logical terminology is accurate. For example, model-theoretic and deductive accounts of logical consequence are inadequate unless they reflect the semantic and inferential properties of the logical terms, respectively. So a model-theoretic account is inadequate unless it gets right the semantic contributions of the logical terms to the truth conditions of the sentences formed using them. For a particular deductive system D, the question arises whether or not D's rules of inference reflect the inferential properties of the logical terms. (For further discussion of the semantic and inferential properties of logical terms see Haack 1978 and 1996, Read 1995, and Quine 1986.)

A third consideration in assessing the success of a theoretical definition of logical consequence is whether or not the definition, relative to a selection of terms as logical, reflects the salient features of the common concept of logical consequence. There are criticisms of the theoretical definitions that claim that they are incapable of reflecting the common concept of logical consequence. Typically, such criticisms are used to question the status of the model-theoretic and deductive-theoretic approaches to logic.

For example, there are critics who question the model-theoretic approach to logic by arguing that any model-theoretic account lacks the conceptual resources to reflect the notion of necessity inherent in the common concept of logical consequence, because such an account does not rule out the possibility of there being logically possible situations in which the sentences in K are true and X is false even though every model of K is a model of X. Kneale (1961) is an early critic; Etchemendy (1988, 1999) offers a sustained and multi-faceted attack. Also, it is argued that the model-theoretic approach to logic makes knowledge of what follows from what depend on knowledge of the existence of models, which is knowledge of worldly matters of fact. But logical knowledge should not depend on knowledge about the extra-linguistic world (recall the locked room metaphor in 2.2.1). This standard logical positivist line has recently been challenged by those who see logic penetrated and permeated by metaphysics (e.g., Putnam 1971, Almog 1989, Sher 1991, Williamson 1999).

The status of the deductive-theoretic approach to logic is not clear for, as Tarski argues in his (1936), deductive-theoretic accounts are unable to reflect the fact that, according to the common concept, logical consequence is not compact. Relative to any deductive system D, the ⊢D-consequence relation is compact if and only if for any sentence X and set K of sentences, if K ⊢D X, then K' ⊢D X for some finite subset K' of K. But there are intuitively correct principles of inference according to which one may infer a sentence X from a set K of sentences even though it is incorrect to infer X from any finite subset of K. (Tarski's example: from the infinitely many premises "P(0)", "P(1)", "P(2)", and so on, one may correctly infer "Every natural number has property P", yet this conclusion follows from no finite subset of those premises.) This suggests that the intuitive notion of deducibility is not completely captured by any compact consequence relation. We need to weaken

X is a logical consequence of K if and only if there is a proof in a correct deductive system of X from K,

given above, to

X is a logical consequence of K if there is a proof in a correct deductive system of X from K.

In sum, the issue of the nature of logical consequence, which intersects with other areas of philosophy, is still a matter of debate. Tarski's analysis of the concept is not universally accepted; philosophers and logicians differ over what the features of the common concept are. For example, some offer accounts of the logical consequence relation according to which it is not a priori (e.g., see Koslow 1999, Sher 1991 and see Hanson 1997 for criticism of Sher) or deny that it even need be strongly necessary (Smiley 1995, 2000, section 6). The entry Logical Consequence, Model-Theoretic Conceptions gives a model-theoretic definition of logical consequence. For a detailed development of a deductive system see the entry Logical Consequence, Deductive-Theoretic Conceptions. The critical discussion in both articles deepens and extends points made in the conclusion of this article.

5. References and Further Reading

  • Almog, J. (1989): "Logic and the World", pp. 43-65 in Themes From Kaplan, ed. J. Almog, J. Perry, and H. Wettstein. New York: Oxford UP.
  • Aristotle. (1941): Basic Works, ed. R. McKeon. New York: Random House.
  • Bencivenga, E. (1999): "What is Logic About?", pp. 5-19 in Varzi (1999).
  • Etchemendy, J. (1983): "The Doctrine of Logic as Form", Linguistics and Philosophy 6, pp. 319-334.
  • Etchemendy, J. (1988): "Tarski on truth and logical consequence", Journal of Symbolic Logic 53, pp. 51-79.
  • Etchemendy, J. (1999): The Concept of Logical Consequence. Stanford: CSLI Publications.
  • Fodor, J. (2000): The Mind Doesn't Work That Way. Cambridge: The MIT Press.
  • Gabbay, D. and F. Guenthner, eds. (1983): Handbook of Philosophical Logic, Vol 1. Dordrecht: D. Reidel Publishing Company.
  • Haack, S. (1978): Philosophy of Logics. Cambridge: Cambridge University Press.
  • Haack, S. (1996): Deviant Logic, Fuzzy Logic. Chicago: The University of Chicago Press.
  • Hodges, W. (1983): "Elementary Predicate Logic", in Gabbay, D. and F. Guenthner (1983).
  • Hamlyn, D.W. (1967): "A Priori and A Posteriori", pp.105-109 in The Encyclopedia of Philosophy, Vol. 1, ed. P. Edwards. New York: Macmillan & The Free Press.
  • Hanson, W. (1997): "The Concept of Logical Consequence", The Philosophical Review 106, pp. 365-409.
  • Henkin, L. (1967): "Formal Systems and Models of Formal Systems", pp. 61-74 in The Encyclopedia of Philosophy, Vol. 8, ed. P. Edwards. New York: Macmillan & The Free Press.
  • Kneale, W. (1961): "Universality and Necessity", British Journal for the Philosophy of Science 12, pp. 89-102.
  • Koslow, A. (1999): "The Implicational Nature of Logic: A Structuralist Account", pp. 111-155 in Varzi (1999).
  • McCarthy, T. (1981): "The Idea of a Logical Constant", Journal of Philosophy 78, pp. 499-523.
  • McCarthy, T. (1998): "Logical Constants", pp. 599-603 in Routledge Encyclopedia of Philosophy, Vol. 5, ed. E. Craig. London: Routledge.
  • McGee, V. (1999): "Two Problems with Tarski's Theory of Consequence", Proceedings of the Aristotelian Society 92, pp. 273-292.
  • Moore, G.E., (1959): "Certainty", pp. 227-251 in Philosophical Papers. London: George Allen & Unwin.
  • Priest, G. (1995): "Etchemendy and Logical Consequence", Canadian Journal of Philosophy 25, pp. 283-292.
  • Putnam, H. (1971): Philosophy of Logic. New York: Harper & Row.
  • Quine, W.V. (1986): Philosophy of Logic, 2nd ed.. Cambridge: Harvard UP.
  • Read, S. (1995): Thinking About Logic. Oxford: Oxford UP.
  • Shapiro, S. (1991): Foundations without Foundationalism: A Case For Second-Order Logic. Oxford: Clarendon Press.
  • Shapiro, S. (1993): "Modality and Ontology", Mind 102, pp. 455-481.
  • Shapiro, S. (1998): "Logical Consequence: Models and Modality", pp. 131-156 in The Philosophy of Mathematics Today, ed. Matthias Schirn. Oxford, Clarendon Press.
  • Shapiro, S. (2000): Thinking About Mathematics. Oxford: Oxford University Press.
  • Sher, G. (1989): "A Conception of Tarskian Logic", Pacific Philosophical Quarterly 70, pp. 341-368.
  • Sher, G. (1991): The Bounds of Logic: A Generalized Viewpoint, Cambridge, MA: The MIT Press.
  • Sher, G. (1996): "Did Tarski commit 'Tarski's fallacy'?" Journal of Symbolic Logic 61, pp. 653-686.
  • Sher, G. (1999): "Is Logic a Theory of the Obvious?", pp. 207-238 in Varzi (1999).
  • Smiley, T. (1995): "A Tale of Two Tortoises", Mind 104, pp. 725-36.
  • Smiley, T. (1998): "Consequence, Conceptions of", pp. 599-603 in Routledge Encyclopedia of Philosophy, vol. 2, ed. E. Craig. London: Routledge.
  • Sundholm, G. (1983): "Systems of Deduction", in Gabbay and Guenthner (1983).
  • Tarski, A. (1933): "Pojecie prawdy w jezykach nauk dedukcyjnych", translated as "On the Concept of Truth in Formalized Languages", pp. 152-278 in Tarski (1983).
  • Tarski, A. (1936): "On the Concept of Logical Consequence", pp. 409-420 in Tarski (1983).
  • Tarski, A. (1983): Logic, Semantics, Metamathematics, 2nd ed. Indianapolis: Hackett Publishing.
  • Tarski, A. (1986): "What are logical notions?" History and Philosophy of Logic 7, pp. 143-154.
  • Tharp, L. (1975): "Which Logic is the Right Logic?" Synthese 31, pp. 1-21.
  • Warmbrod, K. (1999): "Logical Constants", Mind 108, pp. 503-538.
  • Williamson, T. (1999): "Existence and Contingency", Proceedings of the Aristotelian Society Supplementary Vol. 73, pp. 181-203.
  • Varzi, A., ed. (1999): European Review of Philosophy, Vol. 4: The Nature of Logic, Stanford: CSLI Publications.

Author Information

Matthew McKeon
Email: mckeonm@msu.edu
Michigan State University
U. S. A.

Emile Meyerson (1859—1933)

Emile Meyerson, a chemist and philosopher of science, proposed that the explanations of science are governed by two fundamental principles of reason, namely, the principle of lawfulness and the principle of causality. While the contents of explanations change through history as the explanatory theories of science move from early atomism and qualitative theories to relativity physics and quantum mechanics, the form of thought stays the same, Meyerson said. The following article provides an overview of his life, influence, philosophy of science, and writings.

Meyerson studies the theories of science from the point of view of psychology. His work spans a 2500-year period of developments in science, and he claims that the goal of reason to explain and control nature is the same now as always because of the action of two innate psychological principles. Meyerson then extends their range to the realm of common sense. His study generates two main questions. The first concerns the accuracy of what he says about the mind, while the second applies his discovery to the course of future developments in science. Can the proper use of these psychological principles help us avoid bad science?

Meyerson calls his two innate psychological principles "lawfulness and causality." The first principle of reason leads us to expect the regularity of natural events. We expect to find that the relationship between conditions and property behavior in nature remains constant. In his words, “our acts are performed in view of an end which we foresee; but this foresight would be entirely impossible if we did not have the absolute conviction that nature is well ordered, that certain antecedents determine and will always determine certain consequences” (IR 19). The second innate principle, causality, leads us to expect identities between the antecedent and consequent of a change. This principle underlies the success of scientific laws.

Table of Contents

  1. Life
  2. Influence
  3. Philosophy of Science
  4. References and Further Reading
    1. Books
    2. Articles

1. Life

Emile Meyerson was born in Lublin, Poland, on February 12, 1859. In 1870, he traveled to Heidelberg, Germany, to study chemistry with Robert Wilhelm Bunsen and Hermann Kopp, and to Berlin to study chemistry with Liebermann. He came to France at age 22 and spent two years (1882-1884) at the Schützenberger laboratory of the Collège de France to complete his studies in chemistry. In 1884 he served as Director of a dye factory in Argenteuil, but after a bitter disappointment with applied chemistry (see Frédéric Lefevre, 'Une heure avec M. Emile Meyerson', in Les Nouvelles Littéraires, Saturday, Nov. 6, 1926) he left in 1889 to read philosophy at the Bibliothèque Nationale. He read Renouvier (who taught him how to apply a scientific background to philosophy), Kant (who taught him that the thing in itself was unknowable), and Descartes (who taught him about the mathematical nature of science). He read in the history of science for 19 years before publishing his first book in 1908. During this period he supported himself by working as a foreign news correspondent with l'agence Havas (Meyerson was fluent in the major European languages). He became a naturalized French citizen after the war. The greatest influences on his thought were Auguste Comte, Boutroux and Bergson, Poincaré and Duhem, Descartes and Kant. Meyerson labeled himself an 'antipositivist'. He spent afternoons at the library reading the history of science, and evenings at home in conversation with the leading thinkers of the day, notably Lévy-Bruhl, Brunschvicg, Lalande, and Langevin (plagued by insomnia, Meyerson rarely slept more than four hours a day). Whenever Einstein was in Paris, he would make it a point to visit Meyerson. In 1897, Meyerson was appointed Director General of the Jewish Colonization Association (JCA). He viewed the appointment as an opportunity to encourage the establishment of a Jewish settlement in Palestine. Meyerson shared Spencer's belief that the rules of natural selection that govern the animal world should apply equally to human societies. On Saturday, December 2, 1933, in Paris, France, Meyerson died in his sleep of a heart attack; he had been unwell for some time. An article by André George commemorating Meyerson's contribution to the philosophy of science appeared in Les Nouvelles Littéraires on December 9, 1933.

The Central Zionist Archives (CZA) in Jerusalem contains 5.6 metres (35 boxes) of material and many thousands of documents on Meyerson. See ‘Personal Papers’ A 408 Emile Meyerson. (Rochelle Rubinstein, 2004).

2. Influence

The work of Emile Meyerson is an investigation into the psychological principles that accompany scientific theories. His work forms an important chapter in the history of science. From the first appearance of Identité et réalité in 1908, Emile Meyerson was acclaimed as one of the most stimulating thinkers of our time. The title 'Profound Philosopher,' which Bergson conferred upon him in 1909, never left him. Einstein published an article in 1928 in which he expressed approval and admiration for what Meyerson said about the psychology of relativity physics. George Boas and André Metz are two of a long list of philosophers who wrote major books on his philosophy. Boas spent time with Meyerson getting to know him personally, while Metz was a life-long disciple. J. Lowenberg hailed him as a new Kant and thought that Meyerson had provided an important refutation of positivism. L. Lichtenstein at the University of Leipzig and C. De Koninck at Laval University developed courses on his philosophy. Scholars such as Blumberg, Bachelard, Brunschvicg, Lalande, Maritain, Schlick, and Sée have been impressed by his work. Many doctoral dissertations have been written on Meyerson's work. André Bonnard, Charles De Koninck, T. R. Kelly, Joseph La Lumia, George Mourélos, Henri Sée, C. G. Sterling, O. Stumper, and W. A. Wallace have each written a book on his philosophy. Meyerson's study of the history of scientific developments influenced modern French historiography of science (Alexandre Koyré, Hélène Metzger, and others).

3. Philosophy of Science

References to Meyerson’s work are abbreviated IR (trans.) for Identity and Reality; ES for De l’explication dans les sciences; DR for La déduction relativiste; CP for Du cheminement de la pensée; RD for Réel et déterminisme dans la physique quantique. These along with Essais, a posthumous publication of his major articles, make up the whole of his work.

Meyerson’s work is a study of scientific inductions, past and present. He examined the works of science to determine the psychological nature of scientific thought. Whereas Auguste Comte had argued that the ‘principle of lawfulness’ (the description of phenomena) governs the whole of thought, Meyerson’s evidence suggested to him that this was not the whole of thought. Science, he says, attempts equally to explain phenomena. This explanation consists in the identification of antecedent and consequent. His empirical study of scientific theories, old and new, proposes that two innate principles of reason regulate how the scientist views reality. The first rational principle predisposes a scientist to expect that nature shall attend herself with some degree of regularity. The second principle, leads a scientist to expect that the identification of antecedent and consequent shall explain the phenomena of observation. The name he reserves for these two psychological principles is lawfulness and causality, respectively. Meyerson claimed that the principles of reason were factual rather than normative.

Meyerson said that Comte did not pursue explanations in science because he limited the psychology of thought to the first of these principles. Comte did this because he was convinced that a too detailed investigation of nature would be counter-productive and lead to incoherent or sterile results. For instance, he protested strongly against the “abuse of microscopic research and the exaggerated merit still too often accorded to a means of investigation so dubious.” (IR 21). Comte expressed horror of all explanatory theory. Meyerson expressed the fundamental distinction between the principles of reason (and between Comte and himself) as follows:

The law states simply that, conditions happening to be modified in a determined manner, the actual properties of the substance must undergo an equally determined modification; whereas according to the causal principle there must be equality between causes and effects—that is, the original properties plus the change of conditions must equal the transformed properties. (ibid., 41).

According to Meyerson, the ways of reason provide evidence that both principles are in use whenever we think. In other words, science expresses a belief that its proportionality relationships (the principle of lawfulness) are grounded in an underlying structure (the principle of causality) or what Meyerson calls ‘ontology’. Thus, he says, description or lawfulness is not the only business of science. The concern for structure cannot remain foreign to science. Meyerson’s argument was based on a detailed study of the psychological principles that accompany all scientific inductions, past and present.

Meyerson’s research proposes that his work (which is essentially philosophy of mind; see Essais 59-105) shows that the psychological need to identify phenomena (the effect of the causal postulate) explains the developments of science. For instance, he says it generates the atomic theories of science. The focus of explanation is on positing the persistence of identities (to think is to identify), not on the nature of the persistent residuum. While science no longer thinks of the atom as being an irreducible unit, the causal postulate pushes the search for identities to an investigation for smaller constituents within the atom. Meyerson suggested that the same rational tendency to identify matter created the principles of conservation and ultimately lead to the elimination of time. The identification of antecedent and consequent of a change eliminated the difference between them, and therefore time. He claimed (following Spencer) that matter as eternal is just as it has to be to satisfy the ways of reason. Meyerson writes that the causal postulate creates the concept of the unity of matter and leads to ‘the assimilation of this latter with space’ (IR Ch. 7). The causal postulate ultimately leads to the annihilation of the external world. Meyerson explained this feat as a two-step movement of the causal postulate. The first movement of explanation identifies antecedent and consequent and thereby explains differences away. This step halts the movement of time because when nothing happens (a consequence of the identification of antecedent and consequent) time does not exist. Eternal matter is reduced to space. However, the march of the causal postulate is ongoing as the explanations of reason and the search for identities enter a second phase. In this case, Meyerson claims that the sufficient reason of matter is traced to the space that envelops it. The causal postulate establishes identity between matter and space. At this point nothing is left because space now empty of contents vanishes in turn.

The causal postulate and the tendency to reduce the whole of reality to an all-inclusive identity proposition failed. Science reacted, says Meyerson, and this reaction was expressed by Carnot’s principle (Meyerson calls Carnot the ‘hero of science’). The ‘irrationals’ of science such as transitive action and impact arise because reality does not lend itself to the (Eleatic) goal of total identification. We do not have the identities of antecedent and consequent supposed by the causal postulate. Carnot’s principle saves science. He reminds us that it costs energy to do work and the fully reversible reaction of rational mechanics is an illusion. Meyerson described the ‘irrationals’ of science as places of recalcitrance in reality, places that refuse to lend themselves to the formula of identification.

At this point, Meyerson introduced the distinction between identification and identities. We hope for full explanations (identification) of reality but achieve only partial explanations (identities). Meyerson fuses the convergence and divergence of reason and reality into what he terms the ‘plausible propositions’ of science (ibid., 148). He says that all scientific theories are generated this way as they reveal a mix of an a priori tendency to identify and the a posteriori elements of experience that resist total identification. The ‘plausible’ propositions of science are best expressed through mathematics since it provides a mechanism to preserve diversity while expressing identity. For instance, the proposition 7 plus 5 equals 12 expresses identity while accounting for the differences between antecedent and consequent. Meyerson attributes the discovery of this application (the mathematical method) to René Descartes.

CP extends the causal postulate to the world of common sense. The world we see upon awakening each morning is the result of the activity of the causal postulate. Reason must have its identities and cannot tolerate the fleetingness of sensations. We create the world as a place to house sensations in their absence. The world of common sense arises out of the hypostasis of sensations. This action provides an ontological foundation for science. Science purifies the world of common sense by subjecting it to additional layers of identification. Meyerson said that the constructs of science—electrons, atoms—are more real than the objects of common sense because they arise out of several coatings of identification.

The formula of identification recognizes that diversity is itself an irrational. Reason cannot know the real without reducing it to something other than itself. Meyerson is in full agreement with the Kantian view that reality is essentially unknowable or noumenal. The thing in itself cannot be known, since the ways of reason spontaneously transform diversity into identity (RD 21). The explanatory structure of science depends on the discovery of identities in diversity. But that discovery leads to the (Kantian) conclusion that reality in itself is unknowable. Does this mean that description (lawfulness) remains the only business of science? Not at all! Meyerson does not change his mind about the insufficiencies of positivistic epistemology. He reminds us that the causal postulate is factual rather than normative. The point about causality is that something must persist. The irrational nature of diversity means that some aspect of reality will always remain unknown. Error comes out of hastily constructed theories, theories with few instances of identification, not out of the causal postulate. The principles of lawfulness and causality are the core structure of reason. To explain is to identify. Meyerson says that to identify is to discover sufficient reasons, as was clear to Leibniz: "Things are thus because they were already previously thus" (IR 43).

Meyerson said there is no evidence to suggest that the way we think will ever change; in the past, the human mind has never modified its essence. Thus, this form of thought will shape the future of scientific developments. However, he explained the evolution of science as a two-pronged movement of reason. First, science is an attempt to generate a theory of everything through the discovery of increasingly comprehensive identity propositions. Second, we experience changes in the relationship between reason and reality. For instance, the shift from the Newtonian view of homogeneous space to the heterogeneous space of relativity physics (see DR) arose because the concept of space has been shown to obtain a posteriori. Experience (now) teaches us that space is not the same everywhere, and therefore the concept cannot come from reason (is not a priori). Meyerson's criticism of positivistic epistemology (and of the 'Copenhagen' view of quantum theory) earned Einstein's approval because it explained how the forms of reason lead to the reducibility of matter and time to heterogeneous space (see 'the success of relativism', DR, ch. 16, p. 133: 'La réussite du relativisme').

4. References and Further Reading

a. Books

  • (1908) Identité et réalité. Paris: F. Alcan. xix and 571 pages.
    • The second edition appears in 1912, and the third edition in 1926. The third edition is translated into English by Kate Lowenberg. (1930) Identity and Reality. George Allen & Unwin Limited. (1960). New York, N.Y.: Dover Publications, Inc. This book is an inductive study of the theories generated by scientific thought—from their first beginnings in the works of the early Atomists to their latest developments in quantum physics—to uncover the psychological principles that accompany all scientific inductions.
  • (1921) De l’explication dans les sciences. 2 volumes. Paris:Payot. 784 pages. The Second Edition appears in 1927. The book is translated into English by Mary-Alice and David A. Sipfle. (1991) Explanation in the Sciences. Boston Studies in the Philosophy of Science. No. 128. Hingham, Mass: Kluwer Academic Publishers. 648 pages.
    • Meyerson says that IR is inductively based whereas this book is more philosophical because it moves deductively from the application of principles uncovered in that first book to their application in scientific developments.
  • (1924) La déduction relativiste. Paris: Payot. 396 pages.
    • Meyerson had been accused of dealing only with pre-20th-century science, so in this work he applies the principles uncovered in IR to current scientific thought. His work did not go unnoticed. In 1928, Einstein expressed admiration for Meyerson's epistemological perspective, citing DR as a penetrating and exacting study of relativity physics. Einstein notes the presence of 'ce démon de l'explication' (this demon of explanation) in his own work: "Eh bien j'ai lu votre livre, et je vous l'avoue, je suis convaincue" (well, I have read your book, and, I admit it, I am convinced). See Albert Einstein, 1928, 'A propos de la déduction relativiste de M. Emile Meyerson', in Revue Philosophique, 105, mars-avril, 161-166.
  • (1931) Du cheminement de la pensée. Three volumes. Paris: F. Alcan. xxvii and 1036 pages, (vol 3 is reserved for notes).
    • The work moves beyond science to focus on the application of principles of reason to the realm of common sense.
  • (1933) Réel et déterminisme dans la physique quantique. Paris: Hermann. 49 pages.
    • This small special study moves ahead to apply the psychological structure of thought (lawfulness and causality) to quantum mechanics. The book’s Preface is by Louis de Broglie.
  • (1936) Essais. Paris: J. Vrin. xvi and 272 pages.
    • A posthumous publication of Meyerson’s major articles. Meyerson prepared the list of articles to be included in the book. The Preface is by Louis de Broglie, and the Foreword is by L. Lévy-Bruhl.

b. Articles

i. Articles included in Essais

  • (1884) Jean Rey et la loi de la conservation de la matière. Revue scientifique. 33, jan-juillet, pp 299-303.
  • (1888) Théodore Turquet de Mayerne et la découverte de l’hydrogène. Revue scientifique. 42, nov. pp. 665-670.
  • (1891) La coupellation chez les anciens Juifs. Revue scientifique. 47, juin, pp. 756-758.
  • (1914) Y-a-t-il un rythme dans le progrès intellectuel? Bulletin de la société française de philosophie. 14. Séance des 29 janvier et 5 février, pp. 61-140.
  • (1923) Le sens commun vise-t-il la connaissance? Revue de métaphysique et de morale. 30, 15 mars, pp. 13-21.
  • (1923) Le sens commun et la quantité. Journal de psychologie. 30, 15 mars, pp. 206-217.
  • (1923) Hegel, Hamilton, Hamelin et le concept de cause. Revue philosophique. 96, juillet-aout, pp. 33-55.
  • (1933) La notion de l’identique. Recherches philosophiques. 3, pp. 1-17.
  • (1934) Le savoir et l’univers de la perception immédiate. Journal de psychologie, pp. 3-4.
  • (1934) Philosophie de la nature et philosophie de l’intellect. Revue de métaphysique et de morale. 41, avril, pp. 59-105.
  • (1934) Les mathématiques et le divers. Revue philosophique. 117, mai-juin, pp. 321-334.
  • (1934) De l’analyse des produits de la pensée. Revue philosophique. 118, sept.-oct. , pp. 135-170.

ii. Other Articles

  • (1890) Les travaux de M. Charles Henry sur une théorie mathématique de l'expression. Bulletin scientifique. 16, pp. 3-5.
  • (1891) Paracelsus et la découverte de l'hydrogène. Revue scientifique. 47, juin, p. 796.
  • (1911) L’histoire du problème de la connaissance de M. E. Cassirer. Revue de métaphysique et de morale. 19, janvier, pp. 100-129.
  • (1916) La science et les systèmes philosophiques. Revue de métaphysique et de morale. 23 janvier, pp. 203-242.
  • (1924) La tendance apriorique et l’expérience. Revue philosophique. 97, jan-juin, pp.161-179.
  • (1930) Le physicien et le primitif. Revue philosophique. 109, jan.-juin, pp. 321-358.

Author Information

Kenneth A. Bryson
Email: ken_bryson@cbu.ca
Cape Breton University
Canada

Reductio ad Absurdum

Reductio ad absurdum is a mode of argumentation that seeks to establish a contention by deriving an absurdity from its denial, thus arguing that a thesis must be accepted because its rejection would be untenable. It is a style of reasoning that has been employed throughout the history of mathematics and philosophy from classical antiquity onwards.

Table of Contents

  1. Basic Ideas
  2. The Logic of Strict Propositional Reductio: Indirect Proof
  3. A Classical Example of Reductio Argumentation
  4. Self-Annihilation: Processes that Engender Contradiction
  5. Doctrinal Annihilation: Sets of Statements that Are Collectively Inconsistent
  6. Absurd Definitions and Specifications
  7. Per Impossible Reasoning
  8. References and Further Reading

1. Basic Ideas

Use of this Latin terminology traces back to the Greek expression hê eis to adunaton apagôgê, reduction to the impossible, found repeatedly in Aristotle's Prior Analytics. In its most general construal, reductio ad absurdum (reductio for short) is a process of refutation on the grounds that absurd and patently untenable consequences would ensue from accepting the item at issue. This refutation takes three principal forms, according as the untenable consequence is:

  1. a self-contradiction (ad absurdum)
  2. a falsehood (ad falsum or even ad impossibile)
  3. an implausibility or anomaly (ad ridiculum or ad incommodum)

The first of these is reductio ad absurdum in its strictest construction and the other two cases involve a rather wider and looser sense of the term. Some conditionals that instantiate this latter sort of situation are:

  • If that's so, then I'm a monkey's uncle.
  • If that is true, then pigs can fly.
  • If he did that, then I'm the Shah of Persia.

What we have here are consequences that are absurd in the sense of being obviously false and indeed even a bit ridiculous. Despite the departure from reductio in its strict construction (conditionals whose conclusions are self-contradictions), this sort of thing is also characterized as an attenuated mode of reductio. But while all three cases fall within the range of the term as it is commonly used, logicians and mathematicians generally have the first and strongest of them in view.

The usual explanations of reductio fail to acknowledge the full extent of its range of application. For at the very minimum such a refutation is a process that can be applied to

  • individual propositions or theses
  • groups of propositions or theses (that is, doctrines or positions or teachings)
  • modes of reasoning or argumentation
  • definitions
  • instructions and rules of procedure
  • practices, policies and processes

The task of the present discussion is to explain the modes of reasoning at issue with reductio and to illustrate the wide range of its applications.

2. The Logic of Strict Propositional Reductio: Indirect Proof

Whitehead and Russell in Principia Mathematica characterize the principle of "reductio ad absurdum" as tantamount to the formula (~p → p) → p of propositional logic. But this view is idiosyncratic. Elsewhere the principle is almost universally viewed as a mode of argumentation rather than a specific thesis of propositional logic.

Propositional reductio is based on the following line of reasoning:

If p ⊢ ~p, then ⊢ ~p

Here ⊢ represents assertability, be it absolute or conditional (that is, derivability). Since p ⊢ q yields ⊢ p → q, this principle can be established as follows:

Suppose (1) p ⊢ ~p

(2) ⊢ p → ~p from (1)

(3) ⊢ p → (p & ~p) from (2), since ⊢ p → p

(4) ⊢ ~(p & ~p) → ~p from (3) by contraposition

(5) ⊢ ~(p & ~p) by the Law of Contradiction

(6) ⊢ ~p from (4), (5) by modus ponens

Accordingly, the above-indicated line of reasoning does not represent a postulated principle but a theorem that issues from subscription to various axioms and proof rules, as instanced in the just-presented derivation.
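
The truth-functional core of this principle can also be checked mechanically. The following minimal sketch, added here for illustration and not part of the original discussion, verifies by exhaustive valuation that (p → ~p) → ~p holds under every assignment of truth values:

    # Check by truth table that ((p -> ~p) -> ~p) is a tautology.
    def implies(a, b):
        # material conditional: false only when a is true and b is false
        return (not a) or b

    print(all(implies(implies(p, not p), not p) for p in (True, False)))  # True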

The reasoning involved here provides the basis for what is called an indirect proof. This is a process of justificatory argumentation that proceeds as follows when the object is to establish a certain conclusion p:

(1) Assume not-p

(2) Provide argumentation that derives p from this assumption.

(3) Maintain p on this basis.

Such argumentation is in effect simply an implementation of the above-stated principle with ~p standing in place of p.

As this line of thought indicates, reductio argumentation is a special case of demonstrative reasoning. What we deal with here is an argument of the pattern: From the situation

(to-be-refuted assumption + a conjunction of preestablished facts) ⊢ contradiction

one proceeds to conclude the denial of that to-be-refuted assumption via modus tollens argumentation.

An example may help to clarify matters. Consider division by zero. If division by zero were possible when x is not 0, and we took x ÷ 0 to constitute some well-defined quantity Q, then we would have x ÷ 0 = Q, so that x = 0 × Q; and since 0 × (anything) = 0, we would have x = 0, contrary to assumption. The supposition that x ÷ 0 qualifies as a well-defined quantity is thereby refuted.
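
The same refutation is mirrored in computing practice: since no well-defined quotient exists, programming languages typically refuse to evaluate division by zero at all. A minimal illustration, assuming Python's standard behavior:

    # No quantity Q with 1 = 0 * Q can exist, so the operation is undefined.
    try:
        q = 1 / 0
    except ZeroDivisionError as error:
        print("undefined:", error)  # prints: undefined: division by zero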

3. A Classical Example of Reductio Argumentation

A classic instance of reductio reasoning in Greek mathematics relates to the discovery by Pythagoras - disclosed to the chagrin of his associates by Hippasus of Metapontum in the fifth century BC - of the incommensurability of the diagonal of a square with its sides. The reasoning at issue runs as follows:

Let d be the length of the diagonal of a square and s the length of its sides. Then by the Pythagorean theorem we have it that d² = 2s². Now suppose (by way of a reductio assumption) that d and s were commensurable in terms of a common unit u, so that d = n × u and s = m × u, where m and n are whole numbers (integers) that have no common divisor. (If there were a common divisor, we could simply shift it into u.) Now we know that

(n × u)² = 2(m × u)²

We then have it that n² = 2m². This means that n must be even, since only even integers have even squares. So n = 2k for some integer k. But now n² = (2k)² = 4k² = 2m², so that 2k² = m². But this means that m must be even (by the same reasoning as before). And this means that m and n, both being even, have a common divisor (namely 2), contrary to the hypothesis that they have none. Accordingly, since that initial commensurability assumption engendered a contradiction, we have no alternative but to reject it. The incommensurability thesis is accordingly established.
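
The arithmetical core of the argument, that n² = 2m² has no solution in positive whole numbers, can also be checked empirically for small cases. The following brute-force sketch is offered only as an illustration of the claim, not as a substitute for the proof:

    # Search for positive integers with n*n == 2*m*m; the reductio above
    # shows none can exist, and the search finds none in this range.
    hits = [(n, m) for n in range(1, 500) for m in range(1, 500)
            if n * n == 2 * m * m]
    print(hits)  # [] -- no counterexample among the first 499 values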

As indicated above, this sort of proof of a thesis by reductio argumentation that derives a contradiction from its negation is characterized as an indirect proof in mathematics. (On the historical background see T. L. Heath, A History of Greek Mathematics [Oxford, Clarendon Press, 1921].)

The use of such reductio argumentation was common in Greek mathematics and was also used by philosophers in antiquity and beyond. Aristotle employed it in the Prior Analytics to demonstrate the so-called imperfect syllogisms, after it had already been used in dialectical contexts by Plato (see Republic I, 338C-343A; Parmenides 128d). Immanuel Kant's entire discussion of the antinomies in his Critique of Pure Reason was based on reductio argumentation.

The mathematical school of so-called intuitionism has taken a definite line regarding the limitation of reductio argumentation for the purposes of existence proofs. The only valid way to establish existence, so they maintain, is by providing a concrete instance or example: general-principle argumentation is not acceptable here. This means, specifically, that one cannot establish (∃x)Fx by deducing an absurdity from (∀x)~Fx. Accordingly, intuitionists would not let us infer the existence of invertebrate ancestors of homo sapiens from the patent absurdity of the supposition that humans are vertebrates all the way back. They would maintain that in such cases, where we are totally in the dark as to the individuals involved, we are not in a position to maintain their existence.

4. Self-Annihilation: Processes that Engender Contradiction

Not only can a self-inconsistent statement be "reduced to absurdity" (and thereby shown to be self-refuting and self-annihilating), but so can a self-inconsistent process or practice or principle of procedure. For any such modus operandi answers to some instruction (or combination of instructions), and such instructions can also prove to be self-contradictory. Examples of this would be:

  • Never say never.
  • Keep the old warehouse intact until the new one is constructed. And build the new warehouse from the materials salvaged by demolishing the old.

More loosely, there are also instructions that do not automatically result in logically absurd (self-contradictory) conclusions, but which open the door to such absurdity in certain conditions and circumstances. Along these lines, a practical rule of procedure or modus operandi would be reduced to absurdity when it can be shown that its actual adoption and implementation would result in an anomaly.

Consider an illustration of this sort of situation. A man dies leaving an estate consisting of his town house, his bank account of $30,000, his share in the family business, and several pieces of costume jewelry he inherited from his mother. His will specifies that his sister is to have any three of the valuables in his estate and that his daughter is to inherit the rest. The sister selects the house, a bracelet, and a necklace. The executor refuses to make this distribution and the sister takes him to court. No doubt the judge will rule something like: "Finding for the plaintiff would lead ad absurdum. She could just as well have opted not only for the house but also for the bank account and the business, thereby effectively disinheriting the daughter, which was clearly not the testator's wish." Here we have a juridical reductio ad absurdum of sorts. Actually implementing this rule in all eligible cases (its generalized utilization across the board) would yield an unacceptable and untoward result, so that the rule would self-destruct in its actual unrestricted implementation. (This sort of reasoning is common in legal contexts. Many such cases are discussed in David Daube, Roman Law [Edinburgh: Edinburgh University Press, 1969], pp. 176-94.)

Immanuel Kant taught that interpersonal practices cannot represent morally appropriate modes of procedure if they do not correspond to universally generalizable rules in this way. Such practices as stealing (that is, taking someone else's possessions without due authorization) or lying (that is, telling falsehoods where it suits your convenience) are, so Kant maintains, morally inappropriate exactly because the corresponding maxims, if generalized across the board, would be utterly anomalous (leading to the annihilation of property ownership and of verbal communication, respectively). Since the rule-conforming practices thus reduce to absurdity upon their general implementation, such practices are adjudged morally unacceptable. For Kant, generalizability is the acid test of the acceptability of practices in the realm of interpersonal dealings.

5. Doctrinal Annihilation: Sets of Statements that Are Collectively Inconsistent

Even as individual statements can prove to be self-contradictions, so a plurality of statements (a "doctrine" let us call it) can prove to be collectively inconsistent. And so in this context reductio reasoning can also come into operation. For example, consider the following schematic theses:

  • A → B
  • B → C
  • C → D
  • Not-D

In this context, the supposition that A can be refuted by a reductio ad absurdum. For if A were conjoined to these premisses, we would arrive at both D and not-D, which is patently absurd. Hence A is untenable (false) in the context of this family of givens.

When someone is "caught out in a contradiction" in this way, their position self-destructs in a reduction to absurdity. An example is provided by the exchange between Socrates and his accusers, who had charged him with godlessness. In elaborating this accusation, these opponents also accused Socrates of believing in inspired beings (daimonia). But the inspiration at issue here is divine inspiration: such a daimonion is supposed to be a being inspired by a god. And at this point Socrates has a ready-made defense: how can someone disbelieve in gods when he is acknowledged to believe in god-inspired beings? His accusers have become enmeshed in self-contradiction, and their position accordingly runs out into absurdity. (Compare Aristotle, Rhetorica 1398a12 [II xxiii 8].)
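
The mechanics of this little refutation can be made explicit. The sketch below is an illustration added here (the encoding of the theses as Python strings is an assumption); it chains modus ponens forward from the reductio assumption A and arrives at both D and not-D:

    # The conditionals A -> B, B -> C, C -> D as a rule table, plus the
    # given fact not-D and the to-be-refuted assumption A.
    rules = {'A': 'B', 'B': 'C', 'C': 'D'}
    facts = {'not-D', 'A'}

    # Forward chaining: apply modus ponens until no new fact appears.
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules.items():
            if antecedent in facts and consequent not in facts:
                facts.add(consequent)
                changed = True

    print('D' in facts and 'not-D' in facts)  # True: so A must be rejected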

6. Absurd Definitions and Specifications

Even as instructions can issue in absurdity, so can definitions and explanations. As for example:

  • A zor is a round square that is colored green.

Again consider the following pair:

  • A bird is a vertebrate animal that flies.
  • An ostrich is a species of flightless bird.

Definitions or specifications that are in principle unsatisfiable are for this very reason absurd.

7. Per Impossible Reasoning

Per impossible reasoning also proceeds from a patently impossible premiss. It is closely related to, albeit distinctly different from, reductio ad absurdum argumentation. Here we have to deal with literally impossible suppositions that are not merely false but necessarily false, thanks to their logical conflict with some clearly necessary truths, be the necessity at issue logical or conceptual or mathematical or physical. In particular, such an utterly impossible supposition may negate:

  • a matter of (logico-conceptual) necessity ("There are infinitely many prime numbers").
  • a law of nature ("Water freezes at low temperatures").

Suppositions of this sort commonly give rise to per impossible counterfactuals such as:

  • If (per impossible) water did not freeze, then ice would not exist.
  • If, per impossible, pigs could fly, then the sky would sometimes be full of porkers.
  • If you were transported through space faster than the speed of light, then you would return from a journey younger than at the outset.
  • Even if there were no primes less than 1,000,000,000, the number of primes would be infinite.
  • If (per impossible) there were only finitely many prime numbers, then there would be a largest prime number.

A somewhat more interesting mathematical example is the following: if, per impossible, there were a counterexample to Fermat's Last Theorem, there would be infinitely many counterexamples. For if x^k + y^k = z^k, then (nx)^k + (ny)^k = (nz)^k for any positive integer n, since (nx)^k + (ny)^k = n^k(x^k + y^k) = n^k z^k = (nz)^k.

With such per impossible counterfactuals we envision what is acknowledged as an impossible and thus necessarily false antecedent, doing so not in order to refute it as absurd (as in reductio ad absurdum reasoning), but in order to do the best one can to indicate its "natural" consequences.

Again, consider such counterfactuals as:

  • If (per impossible) 9 were divisible by 4 without a remainder, then it would be an even number.
  • If (per impossible) Napoleon were still alive today, he would be amazed at the state of international politics in Europe.

A virtually equivalent formulation of the very point at issue with these two contentions is:

  • Any number divisible by 4 without remainders is even.
  • By the standards of Napoleonic France the present state of international politics in Europe is amazing.

However, the designation per impossible indicates that it is the conditional itself that concerns us. Our concern is with the character of that consequence relationship rather than with the antecedent or consequent per se. In this regard the situation is quite different from reductio argumentation, by which we seek to establish the untenability of the antecedent. To all intents and purposes, then, counterfactuals can serve a distinctly factual purpose.

And so, often what looks to be a per impossible conditional actually is not. Thus consider:

  • If I were you, I would accept his offer.

Clearly the antecedent/premiss "I = you" is absurd. But even the slightest heed of what is communicatively occurring here shows that what is at issue is not this just-stated impossibility but a counterfactual of the format:

  • If I were in your place (that is, if I were circumstanced in the condition in which you now find yourself), then I would accept his offer.

Only by being perversely literalistic could the absurdity of that antecedent be of any concern to us.

One final point. The contrast between reductio and per impossible reasoning conveys an interesting lesson. In both cases alike we begin with a situation of exactly the same basic format, namely a conflict or contradiction between an assumption or supposition and various facts that we already know. The difference lies entirely in pragmatic considerations, in what we are trying to accomplish. In the one (reductio) case we seek to refute and rebut that assumption so as to establish its negation, and in the other (per impossible) case we are trying to establish an implication, that is, to validate a conditional. The difference at bottom thus lies not in the nature of the inference at issue, but only in what we are trying to achieve by its means. The difference accordingly is not so much theoretical as functional: it is a pragmatic difference in objectives.

8. References and Further Reading

  • David Daube, Roman Law (Edinburgh: Edinburgh University Press, 1969), pp. 176-94.
  • M. Dorolle, "La valeur des conclusions par l'absurde," Revue philosophique, vol. 86 (1918), pp. 309-13.
  • T. L. Heath, A History of Greek Mathematics, vol. 2 (Oxford: Clarendon Press, 1921), pp. 488-96.
  • A. Heyting, Intuitionism: An Introduction (Amsterdam, North-Holland Pub. Co., 1956).
  • William and Martha Kneale, The Development of Logic (Oxford: Clarendon Press, 1962), pp. 7-10.
  • J. M. Lee, "The Form of a reductio ad absurdum," Notre Dame Journal of Formal Logic, vol. 14 (1973), pp. 381-86.
  • Gilbert Ryle, "Philosophical Arguments," Colloquium Papers, vol. 2 (Bristol: University of Bristol, 1992), pp. 194-211.

Author Information

Nicholas Rescher
Email: rescher+@pitt.edu
University of Pittsburgh
U. S. A.

Hans Reichenbach (1891—1953)

Hans Reichenbach was a leading philosopher of science, a founder of the Berlin circle, and a proponent of logical positivism (also known as neopositivism or logical empiricism). He is known for his philosophical investigations of Einstein's theory of relativity, quantum mechanics, the theory of probability, the nature of space and time, the character of physical laws, and conventionalism in physical science.

He was a critic of the Kantian theory of the synthetic a priori. Reichenbach complained that Kant and Poincaré should have more carefully distinguished mathematical geometry [that is, pure, a priori geometry] from physical geometry [that is, applied, synthetic geometry]. Mathematical geometry is about abstract objects in mathematical space; physical geometry is about physical objects in physical space. Kant and Poincaré failed to appreciate that it is the complete package of physics, coordinating definitions, and mathematical geometry that is compared to observation in order to select the appropriate physical geometry, said Reichenbach. This geometry cannot be selected a priori, as Kant wanted, nor by convention, as Poincaré wanted. When the package is in fact compared to observation, the proper geometry is non-Euclidean geometry, as Einstein was the first to discover. In addition, developing an idea of Leibniz's, Reichenbach created a detailed theory whose goal is to explain the direction of time in terms of the direction from causes to their effects.

His methods of teaching philosophy were something of a novelty; students found him easy to approach, something uncommon in German universities, and his courses were open to discussion and debate. In 1930, he and Carnap became the editors of the influential philosophical journal Erkenntnis.

Table of Contents

  1. Life
  2. The Philosophy of Space and Time and the Philosophical Meaning of the Theory of Relativity
    1. Space
    2. Time
    3. The Special Theory of Relativity
    4. The General Theory of Relativity
    5. The Reality of Space and Time
  3. Quantum Mechanics
    1. Interpretation of Quantum Physics: Part I
    2. Mathematical Formulation of Quantum Mechanics
    3. Examples of Quantum Operators
    4. Classical and Quantum Physical Quantities; Schrodinger Equations
    5. Heisenberg Indeterminacy Principle
    6. The Interpretation of Quantum Physics: Part II
  4. Reichenbach's Epistemology
    1. The Structure of Science and the Verifiability Principle
    2. Conventionalism vs. Empiricism
    3. Causality
    4. Science and Philosophy
  5. References and Further Reading

1. Life

Hans Reichenbach was born on September 26th, 1891, in Hamburg, Germany. He studied physics, mathematics and philosophy at Berlin, Erlangen, Göttingen and Munich in the 1910s. Among his teachers were the neo-Kantian philosopher Ernst Cassirer, the mathematician David Hilbert, and the physicists Max Planck, Max Born and Albert Einstein. Reichenbach received his degree in philosophy from the University at Erlangen in 1915; his dissertation on the theory of probability was published in 1916. He attended Einstein's lectures on the theory of relativity at Berlin in 1917-20; at that time Reichenbach chose the theory of relativity as the first subject for his own philosophical research. He became a professor at the Polytechnic at Stuttgart in 1920. In the same year he published his first book on the philosophical implications of the theory of relativity, The Theory of Relativity and A Priori Knowledge, in which Reichenbach criticized the Kantian theory of the synthetic a priori. In the following years he published three books on the philosophical meaning of the theory of relativity: Axiomatization of the Theory of Relativity (1924), From Copernicus to Einstein (1927) and The Philosophy of Space and Time (1928); the last in a sense states logical positivism's view on the theory of relativity. In 1926 Reichenbach became a professor of philosophy of physics at the University at Berlin. In 1928 he founded the Berlin circle (named Die Gesellschaft für empirische Philosophie, "Society for Empirical Philosophy"). Among the members of the Berlin circle were Carl Gustav Hempel, Richard von Mises, David Hilbert and Kurt Grelling. In 1930 Reichenbach and Carnap undertook the editorship of the journal Erkenntnis ("Knowledge").

In 1933 Adolf Hitler became Chancellor of Germany. In the same year Reichenbach emigrated to Turkey, where he became chief of the Department of Philosophy at the University at Istanbul. In Turkey, Reichenbach promoted a shift in philosophy courses; he introduced interdisciplinary seminars and courses on scientific subjects. Then in 1935 he published The Theory of Probability.

In 1938 he moved to the United States, where he became a professor at the University of California at Los Angeles; in the same year he published Experience and Prediction. Reichenbach's work on quantum mechanics, Philosophic Foundations of Quantum Mechanics, was published in 1944. Afterwards he wrote two popular books: Elements of Symbolic Logic (1947) and The Rise of Scientific Philosophy (1951). In 1949 he contributed an essay on "The Philosophical Significance of the Theory of Relativity" to Albert Einstein: Philosopher-Scientist, edited by Paul Arthur Schilpp. Reichenbach died on April 9th, 1953, in Los Angeles, California, while he was working on the philosophy of time. Two books, Nomological Statements and Admissible Operations (1954) and The Direction of Time (1956), were published posthumously.

2. The Philosophy of Space and Time and the Philosophical Meaning of the Theory of Relativity

a. Space

Euclidean geometry is based on the set of axioms stated by the Greek mathematician Euclid, who developed geometry into an axiomatic system in which every theorem is derivable from the axioms. Euclid's work revealed that the truth of geometry depends on the truth of the axioms, and so the question arose whether the axioms were true. Most of the Euclidean axioms were regarded as self-evident, but the axiom of parallels, which states that there is one and only one parallel to a given line through a given point, was not, and many mathematicians tried to derive it from the other axioms. Eventually it was proved that the axiom of parallels is not a logical consequence of the remaining axioms. As a result of this research non-Euclidean geometries were discovered, and mathematicians became aware of the existence of a plurality of geometries, namely:

  • Euclidean geometry, in which the axiom of parallels is true;
  • the geometry of Bolyai and Lobachevsky, also known as hyperbolic geometry, in which there is an infinite number of parallels to the given line through the given point (János Bolyai (1802-1860), a Hungarian mathematician, published the first account of a non-Euclidean geometry in 1832; Nikolay Lobachevsky (1793-1856), a Russian mathematician, independently discovered hyperbolic geometry);
  • elliptical geometry, in which there exist no parallels.

In Reichenbach's opinion, it must be realized that there are two different kinds of geometry: mathematical geometry and physical geometry. Mathematical geometry, a branch of mathematics, is a purely formal system; it does not deal with the truth of the axioms but with the proof of theorems, that is, it only derives the consequences of the axioms. Physical geometry is concerned with the real geometry, the geometry that is true of our physical world: it searches for the truth (or falsity) of the axioms using the methods of empirical science (experiments, measurements, and so forth); it is a branch of physics.

How can physicists discover the geometry of the real world? Consider the following example, which Reichenbach analyses in The Philosophy of Space and Time. Two-dimensional intelligent beings live in a two-dimensional world, on the surface of a sphere, but they do not know where they live; for all they know, they might live on a plane, on a sphere, or on some other surface. How can they find out? They could use mathematical properties that characterize a geometry: for example, in Euclidean geometry the ratio of the circumference of a circle to its diameter equals pi (3.14...), while in elliptical geometry the ratio is variable and less than pi, and in hyperbolic geometry the ratio is variable and greater than pi. They could therefore measure the circumference and the diameter of a circle: if the ratio equals pi, the surface is a plane; if the ratio is less than pi, the surface is a sphere. Thus they could discover where they live with the help of such measurements. This method, invented by Gauss (Carl Friedrich Gauss (1777-1855), the German mathematician who was the first to discover a non-Euclidean geometry, although he did not publish his work), is suitable for a two-dimensional world. Riemann (Bernhard Riemann (1826-1866), the German mathematician who developed both elliptical geometry and the generalized theory of metric spaces of any number of dimensions, which Einstein used in his general theory of relativity) invented a method suitable for a three-dimensional world. There is no reason in principle why physicists could not use Riemann's method to discover the geometry of our world.
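To make the measuring procedure concrete, here is the standard calculation behind the ratios just mentioned (a routine result of spherical and hyperbolic geometry, not a quotation from Reichenbach). On a sphere of radius R, a circle whose radius r is measured along the surface has circumference

    C = 2\pi R \sin(r/R), \qquad \frac{C}{2r} = \pi \, \frac{\sin(r/R)}{r/R} < \pi ,

while on a hyperbolic surface with curvature scale R,

    C = 2\pi R \sinh(r/R), \qquad \frac{C}{2r} = \pi \, \frac{\sinh(r/R)}{r/R} > \pi .

As R grows without bound, both ratios approach pi, recovering the Euclidean plane; so the two-dimensional beings need only measure C and r for a sufficiently large circle.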

Riemann's method is based on physical measurements. Reichenbach carefully examines the epistemological implications of measuring geometrical entities. The empirical measurement of geometrical entities depends on physical objects or physical processes corresponding to geometrical concepts. The process of establishing such a correlation is called a co-ordinative definition. Usually a definition is a statement that gives the exact meaning of a concept; this kind of definition is called an explicit definition. A co-ordinative definition is of a different kind: it is not a statement but an ostensive definition, a correlation between a real object or a physical process and the concept itself. Some geometrical entities cannot be defined by an explicit definition; they require a co-ordinative definition. For example, the unit of length, the metre, is defined by a co-ordinative definition: the physical object corresponding to the metre is the standard rod kept near Paris (the International Bureau of Weights and Measures houses the standards for the International System of Units). Another example is the definition of a straight line, which is co-ordinated with a physical process, namely the path of a light ray.

What is the philosophical meaning of a co-ordinative definition? Reichenbach proposes the following problem, discussed in The Philosophy of Space and Time. A measuring rod is moved from one point of space (say A) to another point (say B). When the measuring rod is in B, is its length altered? Many physical circumstances can alter the length, for example if the temperature in A differs from the temperature in B. In this example we can discover whether the temperature is the same by means of a metallic rod and a wooden rod which are of equal length when they are in A. Move the two rods to B: if their lengths become different, then the temperature is also different; otherwise the temperature is the same. This method works because temperature is a differential force, that is, a force that produces different effects on different substances. But there are also universal forces, which produce the same effect on all types of matter. The best known universal force is gravity: its effect is the same on all bodies, and therefore all bodies fall with the same acceleration. Now suppose a universal force alters the length of the measuring rods when they are moved from A to B; in this case we observe no difference between the rods, and we cannot know whether their length has been altered. Consequently, if one rod stays in A and the other is moved to B, where a universal force alters its length, we cannot know that their lengths are different. So we must acknowledge that there is no way of knowing whether the lengths of two measuring rods, which are equal when they are at the same point of space, remain the same when the rods are at two different points of space. We can define the two rods to be equal in length when all differential forces are eliminated and universal forces are disregarded; but we could, of course, adopt a different definition. Thus we must accept, Reichenbach says, that the geometrical form of a body is not an absolute fact but depends on a co-ordinative definition. There is an astonishing consequence of this fact. If a geometry G were proved to be the real geometry by a set of measurements, we could arbitrarily choose a different geometry G' and adopt a different set of co-ordinative definitions so that G' would become the real geometry. This is the principle of relativity of geometry, which Reichenbach examines from a mathematical point of view in Axiomatization of the Theory of Relativity and from a philosophical point of view in The Philosophy of Space and Time. The principle states that all geometrical systems are equivalent; it falsifies the alleged a priori character of Euclidean geometry, and thus it falsifies the Kantian philosophy of space as well.

At first glance, the principle of relativity of geometry seems to prove that it is not possible to discover the real geometry of our world. This is true if we limit ourselves to metric relationships. Metric relationships are geometric properties of bodies that depend on distances, angles, areas, and so forth; examples are "the ratio of circumference to diameter equals pi" and "the volume of A is greater than the volume of B." But we can study not only distances, angles and areas, but also the order of space, its topology, that is, the way in which the points of space are placed in relation to one another; an example of a topological relationship is "point A is between points B and C." A consequence of the principle of relativity of geometry is, for instance, that a plane and a sphere are equivalent with respect to metric. From a topological point of view, however, a sphere and a plane are not equivalent (in topology, two geometrical objects are equivalent if and only if there is a continuous transformation that assigns to every point of the first object a unique point of the second and vice versa; there is no transformation of this kind between a sphere and a plane). What is the philosophical significance of topology?
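Before taking up that question, the parenthetical notion of a continuous transformation can be made precise in modern terms (the formulation below is the standard one, not Reichenbach's own wording). Two spaces X and Y are topologically equivalent when there is a homeomorphism between them:

    f : X \to Y \text{ is a homeomorphism iff } f \text{ is bijective, } f \text{ is continuous, and } f^{-1} \text{ is continuous.}

No such map exists between a sphere and a plane: the sphere is compact and the plane is not, and compactness is preserved under homeomorphisms. This is why metric equivalence, which a clever choice of co-ordinative definitions can always secure, does not carry over to topological equivalence.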

Reichenbach examines the following example (The Philosophy of Space and Time). Measurements of space performed by a two-dimensional being suggest that he lives on a sphere, but, in spite of such measurements, he believes he lives on a plane. There is no difficulty as long as he limits himself to metric relationships: he can adopt appropriate co-ordinative definitions under which those measurements become compatible with a plane. But the surface of a sphere is finite, and he might make a round-the-world tour: he could walk along a straight line from a point A and eventually arrive back at the point A itself. This is impossible on a plane, so he would have to assert that this last point is not the point A but a different point B which, in all other respects, is identical to A. Now there are two possibilities: (i) he changes his theory and acknowledges that he lives on a sphere, or (ii) he maintains his position, but then he needs to explain why point B is identical to A although A and B are different and distant points of space; he can accomplish this only by fabricating a fictitious theory of pre-established harmony: everything that occurs in A immediately occurs in B as well.

Reichenbach says the second possibility entails an anomaly in the law of causality. If we assume normal causality, topology becomes an empirical theory and we can discover the geometry of the real world. This example is another falsification of the Kantian theory of the synthetic a priori. Kant believed that both Euclidean geometry and the law of causality were a priori. But if Euclidean geometry were an a priori truth, normal causality might be false; if normal causality were an a priori truth, Euclidean geometry might be false. We can arbitrarily choose the geometry, or we can arbitrarily choose the causality; but we cannot choose both. Thus the most important implication of the philosophical analysis of topology is that the theory of space depends on normal causality.

b. Time

Normal causality is the main principle that underlies not only the theory of space but also the theory of time. The solution to the problem of an empirical theory of space was found when we acknowledged the priority of topological relationships over metric relationships. In the philosophy of time, too, we must recognize the priority of topology. We must distinguish between two different concepts which are fundamental to the theory of time, namely the order of time and the direction of time. Time order is definable by means of causality (see The Philosophy of Space and Time). The definition is: event A occurs before event B (and, of course, event B occurs after event A) if event A can produce a physical effect on event B. When can event A affect event B? The theory of relativity states that a finite time is required for an effect to go from event A to event B. The required time is finite because the velocity of light is a speed limit for all material particles, messages and effects, and because this velocity is finite. Suppose A and B are two events occurring at points PA and PB. Event A can affect event B if a light pulse emitted from PA when event A occurs reaches the point PB before event B occurs. If the light pulse reaches PB only after event B has occurred, event A cannot affect event B. If event A cannot affect event B and event B cannot affect event A, the order of the two events is indefinite: we could arbitrarily choose which event occurs first, or we might define the two events to be simultaneous; therefore simultaneity depends on a definition.
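This definition can be stated compactly with the light-cone condition of the special theory; the following rendering is standard textbook notation, not a quotation from Reichenbach. Writing c for the velocity of light, event A at point P_A and time t_A can affect event B at point P_B and time t_B only if a light signal could travel between them:

    |P_B - P_A| \le c\,(t_B - t_A), \quad t_B > t_A \qquad \text{(A can affect B)} ,

whereas if

    |P_B - P_A| > c\,|t_B - t_A| \qquad \text{(neither event can affect the other)} ,

the separation is spacelike, and the temporal order of A and B, and hence their simultaneity, is a matter of co-ordinative definition.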

Reichenbach examines the consistency of this definition. Suppose an event A occurs before an event B while, from another point of view (reference frame), event A occurs after event B. In this circumstance there would be a closed causal chain: event A produces an effect on event B and event B produces an effect on event A. The definition is consistent only if we assume that there are no closed causal chains: the order of time depends on normal causality.

Reichenbach asserts that the relativity of simultaneity is independent of the relativity of motion. The relativity of simultaneity is due to the finite velocity of causal propagation. So it is a mistake, Reichenbach asserts in The Philosophy of Space and Time and From Copernicus to Einstein, to derive the relativity of simultaneity from the relative motion of observers. Reichenbach also cautions against a possible misunderstanding of the multiplicity of observers in some expositions of the theory of relativity: observers are used only for convenience; the relativity of simultaneity has nothing to do with the relativity of observers. We must recognize, Reichenbach asserts, that the theory of absolute simultaneity is a consistent theory, although a wrong one. Absolute simultaneity and absolute time do not exist, but they are coherent concepts.

Reichenbach also faces the problem of the direction of time. All mechanical processes are reversible: if f(t) is a solution of the equations of classical mechanics, then f(-t) is also an admissible solution; in the theory of relativity, too, f(-t) is an admissible solution. Thus neither theory provides a definition of the direction of time. In fact the direction of time is definable only by means of irreversible processes, that is, processes characterized by an increase of entropy. But the definition is not straightforward. The second law of thermodynamics, which states the principle of the increase of entropy, is a statistical law, not a deterministic one. The elementary processes of statistical thermodynamics are reversible, because they are governed by the laws of classical mechanics. Indeed, all macroscopic processes are theoretically reversible, because statistical thermodynamics asserts that, in an isolated system, after an extremely large amount of time, the entropy will return to a value infinitesimally close to its initial value. In an isolated system, over an infinite time, there are as many decreases as increases of entropy. Thus if we observe two states A and B, and the entropy of B is greater than the entropy of A, we cannot assert that B is later than A. But if we consider not a single isolated system but many isolated systems, over durations that are short compared with the time required for a return to the same value of entropy, then the probability that we observe a decrease of entropy is less than the probability that we observe an increase. We can therefore use such "many-system probabilities" to define a direction of time. Reichenbach asserts that it is possible to define an entropy for the whole universe, and statistical theory entails that the entropy of the universe first increases and then decreases; thus we can define a direction of time, but only for sections of time, not for time as a whole. Reichenbach notes that this theory of time was stated in the nineteenth century by Boltzmann (Ludwig Boltzmann (1844-1906), the Austrian physicist who formulated the statistical theory of entropy).
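The statistical reading of the second law that underlies this argument can be summarized by Boltzmann's formula (the formula is Boltzmann's; the gloss that follows is offered only as an illustration):

    S = k \log W ,

where W is the number of microstates compatible with the observed macrostate and k is Boltzmann's constant. Since W grows enormously as entropy increases, a system observed over a short interval is overwhelmingly likely to move toward macrostates of larger W; over an infinite time, however, every value of W recurs, which is why a direction of time can be defined only for sections of time.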

c. The Special Theory of Relativity

The special theory of relativity gives a unified theory of space and time in the absence of a gravitational field. One example of the necessity of a unified theory of space and time is the length contraction, an effect predicted by the theory; this effect shows that the length of a moving rod depends on simultaneity. The special theory of relativity states that the length of a rod measured with a metre stick at rest with respect to the rod differs from the length measured with a metre stick moving with respect to the rod. In the first case we measure the length of the rod by the well-known method of classical mechanics. We use a different method when the rod is not at rest with respect to the metre stick: we measure the length of the moving rod by the distance between the two points occupied at a given time by its two ends, that is, we mark the simultaneous positions of the two ends and measure the distance between those positions; thus this method depends on simultaneity, which in turn depends on a definition. It must be acknowledged that the length of a moving rod is a matter of definition, but the length contraction is a genuine physical hypothesis confirmed by experiments. We must also recognize the priority of time over space: the ability to measure time is a prerequisite for the theory of space. Therefore only a unified theory of space and time is suitable. In spite of the necessity for such a unified theory, Reichenbach states (in The Philosophy of Space and Time) that space and time are different concepts which remain distinct in the theory of relativity. The real space is three-dimensional and the real time is one-dimensional; the four-dimensional space-time used in the theory of relativity is a mathematical artefact. The mathematical formulation of the special theory of relativity also acknowledges the difference between space and time: the equation that defines the metric (in units where the velocity of light equals 1) is ds^2 = dx^2 + dy^2 + dz^2 - dt^2, and the time coordinate is distinguishable from the space coordinates by its negative sign. How can we know that space is three-dimensional? And how can we recognize the difference between a real space and a mathematical space?
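Before turning to these questions, the length contraction mentioned above can be made quantitative (a standard result of the special theory, stated here for illustration rather than drawn from Reichenbach's text):

    L = L_0 \sqrt{1 - v^2/c^2} ,

where L_0 is the length measured with instruments at rest relative to the rod and v is the relative velocity; at v = 0.6c, for example, the measured length is L = 0.8 L_0. Which events at the two ends of the moving rod count as simultaneous is exactly what the co-ordinative definition of simultaneity settles.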

A physical effect is not immediately transmitted from one point to another distant point; it passes through every point between the source and the destination. This principle is known as the principle of local action, and it denies the existence of action at a distance. In three-dimensional space the principle of local action is true, while in a four-dimensional space it is false; so we can recognize that the real space is three-dimensional. We can also distinguish between a mathematical space and the real space, because in a mathematical space the principle of local action is false. Reichenbach says that the truth of the principle of local action is an empirical fact, not an a priori truth: it could be false. But if this principle is true, then there is only one n-dimensional space in which it is true, and this n-dimensional space is the real space.