In 1935, Einstein, Podolsky, and Rosen (EPR) published an important paper in which they claimed that the whole formalism of quantum mechanics together with what they called a “Reality Criterion” imply that quantum mechanics cannot be complete. That is, there must exist some elements of reality that are not described by quantum mechanics. They concluded that there must be a more complete description of physical reality involving some hidden variables that can characterize the state of affairs in the world in more detail than the quantum mechanical state. This conclusion leads to paradoxical results.
As Bell proved in 1964, under some further but quite plausible assumptions, this conclusion that there are hidden variables implies that, in some spin-correlation experiments, the measured quantum mechanical probabilities should satisfy particular inequalities (Bell-type inequalities). The paradox consists in the fact that quantum probabilities do not satisfy these inequalities. And this paradoxical fact has been confirmed by several laboratory experiments since the 1970s.
Some researchers have interpreted this result as showing that quantum mechanics is telling us nature is non-local, that is, that particles can affect each other across great distances in a time too brief for the effect to have been due to ordinary causal interaction. Others object to this interpretation, and the problem is still open and hotly debated among both physicists and philosophers. It has motivated a wide range of research from the most fundamental quantum mechanical experiments through foundations of probability theory to the theory of stochastic causality as well as the metaphysics of free will.
Instead of the thought experiment described in the original EPR paper we will formulate the problem for a more realistic spin-correlation experiment suggested by Bohm and Aharonov in 1957.
Consider a source emitting two spin-1/2 particles (Fig. 1). The (spin) state space of the emitted two-particle system is H2⊗H2, where H2 is a 2-dimensional Hilbert space. (For a brief introduction to quantum mechanics, see Redhead 1987, Chapter 1.) Let the quantum state of the system be the so-called singlet state: ^W=PΨs, where Ψs=(1/√2)(ψ+v⊗ψ−v − ψ−v⊗ψ+v). Here ψ+v and ψ−v denote the up and down eigenvectors of the spin-component operator along an arbitrary direction v. In the two wings, we measure the spin-components along directions a and b, which we set up by turning the Stern–Gerlach magnets into the corresponding positions. Let us restrict our considerations to the spin-up events, and introduce the following notations:
A= | The < spin of the left particle is up > detector fires |
B= | The < spin of the right particle is up > detector fires |
a= | The left Stern–Gerlach magnet is turned into position a |
b= | The right Stern–Gerlach magnet is turned into position b |
In the quantum mechanical description of the experiment, events A and B are represented by the following subspaces of H2 ⊗ H2:
A= | span {ψ+a⊗ψ+a, ψ+a⊗ψ−a} |
B= | span {ψ+b⊗ψ+b, ψ−b⊗ψ+b} |
(The same capital letter A, B, etc., is used for the event, for the corresponding subspace, and for the corresponding projector, but the context is always clear.) Quantum mechanics provides the following probabilistic predictions:
$$p(A|a)=\mathrm{tr}(P_{\Psi_s}A)=p(B|b)=\mathrm{tr}(P_{\Psi_s}B)=\frac{1}{2} \tag{1}$$
$$p(A\wedge B|a\wedge b)=\mathrm{tr}(P_{\Psi_s}AB)=\frac{1}{2}\sin^2\frac{\sphericalangle(a,b)}{2} \tag{2}$$
where ∢(a, b) denotes the angle between directions a and b. Inasmuch as we are going to deal with sophisticated interpretational issues, the following must be explicitly stated:
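Assumption 1 identifies each quantum probability with the observed conditional probability of the corresponding measurement outcome, given that the measurement is performed. This assumption is used in (1)–(2).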
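Predictions (1)–(2) can be checked numerically. The following is a minimal sketch, assuming that both measurement directions lie in the x–z plane, so that the spin-up eigenvector along a direction at angle θ from the z-axis has real components (cos(θ/2), sin(θ/2)). The function names are illustrative and not part of the original text.

```python
import math

def spin_up(theta):
    """Spin-up eigenvector along a direction at angle theta (radians) from z, in the x-z plane."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def singlet():
    """Singlet state in the product basis (up-up, up-down, down-up, down-down) along z."""
    s = 1 / math.sqrt(2)
    return (0.0, s, -s, 0.0)

def joint_up_up(alpha, beta):
    """tr(P_singlet A B): both up-detectors fire for left setting alpha, right setting beta."""
    a, b = spin_up(alpha), spin_up(beta)
    # Components of the product vector a (left) tensor b (right).
    prod = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    amplitude = sum(x * y for x, y in zip(prod, singlet()))
    return amplitude ** 2

def single_up_left(alpha):
    """tr(P_singlet A): left up-detector fires, whatever happens on the right."""
    # Sum over an orthonormal basis (up and down along z) of the right particle.
    return joint_up_up(alpha, 0.0) + joint_up_up(alpha, math.pi)

# Compare the computed joint probability with (1/2) sin^2(angle/2).
for deg in (0, 60, 120, 180):
    theta = math.radians(deg)
    print(deg, joint_up_up(0.0, theta), 0.5 * math.sin(theta / 2) ** 2)
```

For equal settings the joint spin-up probability vanishes, which is the perfect anticorrelation used in the argument that follows.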
The two measurements happen approximately at the same time and at two places far distant from each other. It is a generally accepted principle in contemporary physics that there is no super-luminal propagation of causal effects. According to this principle we have the following assumption:
Assumption 2 The events in the left wing (the setup of the Stern–Gerlach magnet and the firing of the detector, etc.) cannot have causal effect on the events in the right wing, and vice versa. |
One must recognize that, in spite of this causal separation, (2) generally means that there are correlations between the outcomes of the measurements performed in the left and in the right wings. In particular, if ∢(a, b)=0, the correlation is maximal: the outcome of the left measurement “determines”, with probability 1, the outcome of the right measurement. That is, if we observe “spin-up” in the left wing then we know in advance that the result must be “spin-down” in the right wing, and vice versa. The actual correlations depend on the particular measurement setups. The very possibility of perfect correlation is, however, of paramount importance:
Assumption 3 For any direction b in the right wing one can choose a direction a in the left wing, and vice versa, such that the outcome events are perfectly correlated. |
From this fact, that the measurement outcome in the left wing “determines” the outcome in the right wing, in conjunction with the causal separation of the measurements, one has to conclude that there must exist, locally in the right wing, some elements of reality which pre-determine the measurement outcome in the right wing. Einstein, Podolsky, and Rosen formulated this idea in their famous Reality Criterion:
If, without in any way disturbing a system, we can predict with certainty (i.e., with probability equal to unity) the value of a physical quantity, then there exists an element of reality corresponding to that quantity. (Einstein, Podolsky, Rosen 1935, p. 777)
It is probably true that no physicist would find this thesis implausible. In our example, the value of the spin of the right particle in direction b can be predicted with 100% certainty by performing a far distant spin measurement on the left particle in direction b, that is without in any way disturbing the right particle. Consequently, there must exist some element of reality in the right wing, that corresponds to the value of the spin of the right particle in direction b, in other words, there must exist something in the right wing that determines the outcome of the spin measurement on the right particle.
One might think that if this is true for a given direction b then—by the same token—it must be true for all possible directions. However, this is not necessarily the case. This is true only if the following condition is satisfied:
Otherwise the following conspiracy is possible: something in the world pre-determines which measurement will be performed and what will be the outcome. We assume, however, that there is no such conspiracy in our world.
Thus, taking into account Assumptions 2, 3 and 4, we arrive at the conclusion that there are elements of reality corresponding to the values of the spin of the particles in all directions. (Of course, it does not mean that we are able to predict the spin of the right particle in all directions simultaneously. The reason is that we are not able to measure the spin of the left particle in all directions simultaneously.)
The answer is no. However, the meaning of this “no” is more complex and depends on the interpretation of the wave function (pure state).
The Copenhagen interpretation asserts that a pure state ψ provides a complete and exhaustive description of an individual system, and a dynamical variable represented by the operator ^A has value a if and only if ^Aψ=aψ. Consequently, spin has a given value only if the state of the system is the corresponding eigenvector of the spin-operator. But spin-operators in different directions do not commute, therefore there is no state in which spin would have values in all directions. Thus, in fact, the EPR argument must be considered as a strong argument against the Copenhagen interpretation of the wave function.
According to the statistical interpretation, a wave function does not provide a complete description of an individual system but only characterizes the system in a statistical/probabilistic sense. The wave function is not tracing the complete ontology of the system. Therefore, from the point of view of the statistical interpretation, the novelty of the EPR argument consists in not proving that quantum mechanics is incomplete but pointing out concrete elements of reality that are outside of the scope of a quantum mechanical description.
It does not mean, however, that statistical interpretation remains entirely untouched by the EPR argument. In fact the statistical interpretation of quantum mechanics, as a probabilistic model in general, admits different ontological pictures. And the EPR argument provides restrictions for the possible ontologies. Consider the following simple example. Imagine that we pull a die from a hat and throw it (event D). There are six possible outcomes: < 1 >, < 2 >, … < 6 >. By repeating this experiment many times, we observe the following relative frequencies:
p(< 1 >|D) | =0.05 | |
p(< 2 >|D) | =0.1 | |
p(< 3 >|D) | =0.1 | (4) |
p(< 4 >|D) | =0.1 | |
p(< 5 >|D) | =0.1 | |
p(< 6 >|D) | =0.55 |
p(D)=1, therefore p(< 1 >)=0.05, … p(< 6 >)=0.55. Our probabilistic model will be based on these probabilities, and it works well. It correctly describes the behavior of the system: it correctly reflects the relative frequencies, correctly predicts that the mean value of the thrown numbers is 4.75, etc. In other words, our probabilistic model provides everything expected from a probabilistic model. However, there can be two different ontological pictures behind this probabilistic description:
(A) | The dice in the hat are biased differently. Moreover, each of them is biased by so much, the mass distribution is asymmetric by so much, that practically (with probability 1) only one outcome is possible when we throw it. The distribution of the differently biased dice in the hat is the following: 5% of them are predestinated for < 1 >, 10% for < 2 >, … and 55% for < 6 >. That is to say, each die in the hat has a pre-established property (characterizing its mass distribution). The dice throw—as a measurement—reveals these properties. When we obtain result < 2 >, it reveals that the die has property “2”. In other words, there exists a real event in the world, namely
< #2 >= the die we have just pulled from the hat has property “2” such that
That is, in our example, event < #2 > occurs with probability 0.1 independently of whether we perform the dice throw or not. |
(B) | All dice in the hat are uniformly prepared. Each of them has the same slightly asymmetric mass distribution such that the outcome of the throw can be anything with probabilities (4). In this case, if the result of the throw is < 2 >, say, it is meaningless to say that the measurement revealed that the die has property “2”. For the outcome of an individual throw tells nothing about the properties of an individual die. In this case, there does not exist a real event < #2 > for which (5) and (6) hold. By repeating the experiment many times, we obtain the conditional probabilities (4). These conditional probabilities collectively, that is, the conditional probability distribution over all possible outcomes, do reflect an objective property common to all individual dice in the hat, namely their mass distribution. (One might think that (A) is a hidden variable interpretation of the probabilistic model in question, while the situation described in (B) does not admit a hidden variable explanation. It is entirely possible, however, that events < #1 >,< #2 >, … are objectively indeterministic. On the other hand, in case (B), the physical process during the dice throw can be completely deterministic and the probabilities in question can be epistemic.) |
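The point that pictures (A) and (B) cannot be told apart by the observed frequencies can be illustrated with a small simulation. This is only a sketch: the two functions below are hypothetical stand-ins for the two ontological pictures, and they differ only in when the outcome is taken to be fixed (when the die is pulled from the hat, or when it is thrown).

```python
import random

# Distribution (4): observed conditional probabilities of the six outcomes.
PROBS = {1: 0.05, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.55}

def throw_picture_a(rng):
    """(A): the hat holds differently biased dice; the throw reveals a pre-set property."""
    faces, weights = zip(*PROBS.items())
    hidden_property = rng.choices(faces, weights=weights)[0]  # fixed when the die is pulled
    return hidden_property  # the throw itself adds no randomness

def throw_picture_b(rng):
    """(B): all dice are alike; the outcome is settled only by the throw."""
    faces, weights = zip(*PROBS.items())
    return rng.choices(faces, weights=weights)[0]

def frequencies(throw, runs, seed):
    """Relative frequencies of the six outcomes over a number of runs."""
    rng = random.Random(seed)
    counts = {face: 0 for face in PROBS}
    for _ in range(runs):
        counts[throw(rng)] += 1
    return {face: counts[face] / runs for face in PROBS}

# Mean value of the thrown numbers predicted by the probabilistic model.
mean_value = sum(face * p for face, p in PROBS.items())

print(frequencies(throw_picture_a, 20000, seed=1))
print(frequencies(throw_picture_b, 20000, seed=2))
print(mean_value)
```

Both pictures reproduce the same relative frequencies and the same mean value, so the probabilistic model alone does not decide between them.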
We have a completely similar situation in quantum mechanics. Consider an observable with a spectral decomposition ^A=∑i ai Pi. It is not entirely clear what we mean by saying that “tr(^WPi) is the probability that the physical quantity A has value ai, if the state of the system is ^W.” To clarify the precise meaning of this statement, let us start with what seems to be certain. We assumed (Assumption 1) that the quantity tr(^WPi) is identified with the observed conditional probability p(< ai >|a), where a denotes the event of performing the measurement itself and < ai > denotes the outcome event corresponding to pointer position “ai”:
tr(^WPi)=p(< ai >|a) | (7) |
If nothing more is assumed, then a measurement outcome becomes fixed during the measurement itself, and we obtain a type (B) interpretation of quantum probabilities. Let us call this the minimal interpretation. In this case, a measurement outcome < ai > does not reveal a property of the individual object. Of course, the state of the system, ^W, no matter whether it is a pure state or not, may reflect a property of the individual objects, just like the conditional probabilities (4) reflect the mass distribution of the individual dice.
One can also imagine a type (A) interpretation of tr(^WPi), which we call the property interpretation. According to this view, every individual measurement outcome < ai > corresponds to an objective property < #ai > intrinsic to the individual object, which is revealed by the measurement. This property exists and is established independently of whether the measurement is performed or not. Just as in the example above, equation (7) can be continued in the following way:
tr(^WPi)=p(< ai >|a)=p(< #ai >) | (8) |
where p(< #ai >) is the probability that the individual object in question has the property < #ai >.
Now, from the EPR argument we conclude that the ontological picture provided by the type (B) interpretation is not satisfactory. For according to the EPR argument there must exist previously established elements of reality that determine the outcomes of the individual measurements. This claim is nothing but a type (A) interpretation.
One has to emphasize that the conclusion of the EPR argument is not a no-go theorem for hidden variable models of quantum mechanics. On the contrary, it asserts that there must be a more complete description of physical reality behind quantum mechanics. There must be a state, a hidden variable, characterizing the state of affairs in the world in more detail than the quantum mechanical state operator, something that also reflects the missing elements of reality. In other words, the pre-established value of the hidden variable has to determine the spin of both particles in all possible directions. Perhaps it is not fair to quote Einstein himself in this context, who was not completely satisfied with the published version of the joint paper (see Fine 1986), but in this final conclusion there seems to be an agreement:
I am, in fact, firmly convinced that the essentially statistical character of contemporary quantum theory is solely to be ascribed to the fact that this theory operates with an incomplete description of physical systems. (Quoted by Bell 1987, p. 90.)
Also, the EPR paper ended with:
While we have thus shown that the wave function does not provide a complete description of the physical reality, we left open the question of whether or not such a description exists. We believe, however, that such a theory is possible.
The question is: do these missing elements of reality really exist? We will answer this question in section 3 after some technical preparations.
Under what conditions can a system of empirically ascertained probabilities be described by Kolmogorov’s probability theory? The following mathematical preparations will provide some probability theoretic inequalities which are not identical with but deeply related to the Bell-type inequalities; they play an important role in distinguishing classical Kolmogorovian probabilities from quantum probabilities.
Imagine that somehow we assign numbers between 0 and 1 to particular events, and we regard them as “probabilities” in some intuitive sense. Under what conditions can these “probabilities” be represented in a Kolmogorovian probabilistic theory? As we will see, such a representation is always possible. Restrictive conditions will be obtained only if we also want to represent some of the correlations among the events in question.
Consider the following events: A1, A2, … An. Let
S⊆{(i, j)|i<j; i, j=1, 2, … n}
be a set of pairs of indexes corresponding to those pairs of events the correlations of which we want to be represented. The following “probabilities” are given:
pi= | p(Ai) | i=1, 2, … n | (9) |
pij= | p(Ai∧Aj) | (i, j)∈S |
We say that “probabilities” (9) have Kolmogorovian representation if there is a Kolmogorovian probability model (Σ, μ) with some X1, X2, … Xn∈Σ elements of the event algebra, such that
pi= | μ(Xi) | i=1, 2, … n | (10) |
pij= | μ(Xi∧Xj) | (i, j)∈S |
The question is, under what conditions does there exist such a representation? It is interesting that this evident problem was not investigated until the pioneering works of Accardi (1984; 1988) and Pitowsky (1989).
For the discussion of the problem, Pitowsky introduced an expressive geometric language. From the probabilities (9) we compose an (n+|S|)-dimensional, so-called correlation vector (|S| denotes the cardinality of S):
p=(p1, p2, … pn, … pij, …)
Denote by R(n, S)≅ℝ^(n+|S|) the linear space consisting of real vectors of this type. Let ε∈{0, 1}^n be an arbitrary n-dimensional vector consisting of 0’s and 1’s. For each ε we construct the following vector uε∈R(n, S):
uεi | =εi | i=1, 2, … n | (11) |
uεij | =εiεj | (i, j)∈ S |
The set of convex linear combinations of uε’s is called a classical correlation polytope:
c(n, S)={ f∈ R(n, S)|f=∑ελεuε ; λε≥0; ∑ελε=1}
In 1989, Pitowsky proved (1989, pp. 22–24) the following theorem:
Theorem The correlation vector p admits a Kolmogorovian representation if and only if p∈ c(n, S).
Beyond the fact that the theorem plays an important technical role in the discussions of the EPR–Bell problem and other foundational questions of quantum theory, it sheds light on an interesting relationship between classical propositional logic and Kolmogorovian probability theory. We must recognize that the vertices of c(n, S) defined in (11) are nothing but the classical two-valued truth-value functions over a minimal propositional algebra naturally related to events A1, A2, … An. Therefore, what the theorem says is that probability distributions are nothing but weighted averages of the classical truth-value functions.
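The construction of the vertices in (11) and of a point of the classical correlation polytope as a weighted average of them can be sketched as follows. The helper names `vertices` and `mix` are illustrative and not taken from Pitowsky’s text; exact rational arithmetic is used so that the weights sum to exactly 1.

```python
from fractions import Fraction
from itertools import product

def vertices(n, S):
    """All vectors u^eps of (11): eps_i for each event, eps_i * eps_j for each pair in S."""
    result = []
    for eps in product((0, 1), repeat=n):
        single = list(eps)
        # Pairs in S use 1-based indices, as in the text.
        pairs = [eps[i - 1] * eps[j - 1] for (i, j) in S]
        result.append(tuple(single + pairs))
    return result

def mix(weights, verts):
    """Convex combination sum_eps lambda_eps * u^eps of the given vertices."""
    assert all(w >= 0 for w in weights) and sum(weights) == 1
    dim = len(verts[0])
    return tuple(sum(w * v[k] for w, v in zip(weights, verts)) for k in range(dim))

S = [(1, 2)]
V = vertices(2, S)
# Equal weights on the four truth-value functions: two independent events of probability 1/2.
p = mix([Fraction(1, 4)] * 4, V)
print(V)
print(p)
```

Each vertex is one assignment of truth values to the events, and the weights play the role of the measure μ on those assignments.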
It is a well known mathematical fact that the conditions for a vector to fall into a convex polytope can be expressed by a set of linear inequalities. What kind of inequalities express the condition p∈ c(n, S)?
The answer is trivial in the case of n=2 and S={(1,2)}. The set {0,1}² has four elements: (0, 0), (1, 0), (0, 1), and (1, 1). Consequently the classical correlation polytope (Fig. 2) has four vertices: (0, 0, 0), (1, 0, 0), (0, 1, 0), and (1, 1, 1).
The condition p∈ c(2, S) is equivalent to the following inequalities:
0≤p12≤p1≤1 | |
0≤p12≤p2≤1 | (12) |
p1+p2−p12≤1 |
Indeed, from (12) we have:
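$$p=(1-p_1-p_2+p_{12})\begin{pmatrix}0\\0\\0\end{pmatrix}+(p_1-p_{12})\begin{pmatrix}1\\0\\0\end{pmatrix}+(p_2-p_{12})\begin{pmatrix}0\\1\\0\end{pmatrix}+p_{12}\begin{pmatrix}1\\1\\1\end{pmatrix}$$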
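The decomposition above can be turned into a direct membership test for the case n=2. The sketch below uses hypothetical function names; it computes the four weights and checks that they are non-negative, which is equivalent to inequalities (12) because the weights always sum to 1.

```python
from fractions import Fraction

# Vertices of c(2, S) in the order used in the decomposition above.
VERTICES = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]

def decompose(p1, p2, p12):
    """Weights of the four vertices that reproduce the vector (p1, p2, p12)."""
    return (1 - p1 - p2 + p12, p1 - p12, p2 - p12, p12)

def in_polytope(p1, p2, p12):
    """True exactly when all weights are non-negative, i.e. when (12) holds."""
    return all(w >= 0 for w in decompose(p1, p2, p12))

def recombine(weights):
    """Weighted sum of the vertices, to confirm that the decomposition is correct."""
    return tuple(sum(w * v[k] for w, v in zip(weights, VERTICES)) for k in range(3))

half, quarter = Fraction(1, 2), Fraction(1, 4)
print(decompose(half, half, quarter))
print(in_polytope(half, half, 0))                      # two exclusive events of probability 1/2
print(in_polytope(Fraction(3, 4), Fraction(3, 4), quarter))  # overlap too small for these marginals
```

The last vector fails because two events of probability 3/4 must overlap with probability at least 1/2.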
Another important case is when n=3 and S={(1, 2), (1, 3), (2, 3)}. The corresponding set of inequalities is the following (Pitowsky 1989, pp. 25–26):
0≤pij≤pi≤1 | |
0≤pij≤pj≤1 | |
pi+pj−pij≤1 | |
p1+p2+p3−p12−p13−p23≤1 | (13) |
p1−p12−p13+p23≥0 | |
p2−p12−p23+p13≥0 | |
p3−p13−p23+p12≥0 |
These are the Bell–Pitowsky inequalities.
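A short check of inequalities (13) may help to see what they exclude. The following is a sketch with an illustrative function name; the example vectors are our own and are not taken from the text.

```python
from fractions import Fraction
from itertools import product

def bell_pitowsky_ok(p1, p2, p3, p12, p13, p23):
    """True when (p1, p2, p3, p12, p13, p23) satisfies every inequality in (13)."""
    single = {1: p1, 2: p2, 3: p3}
    pair = {(1, 2): p12, (1, 3): p13, (2, 3): p23}
    for (i, j), pij in pair.items():
        # 0 <= pij <= pi <= 1 and 0 <= pij <= pj <= 1
        if not (0 <= pij <= single[i] <= 1 and pij <= single[j] <= 1):
            return False
        # pi + pj - pij <= 1
        if single[i] + single[j] - pij > 1:
            return False
    return (p1 + p2 + p3 - p12 - p13 - p23 <= 1
            and p1 - p12 - p13 + p23 >= 0
            and p2 - p12 - p23 + p13 >= 0
            and p3 - p13 - p23 + p12 >= 0)

# Every vertex u^eps of c(3, S) satisfies the inequalities.
all_vertices_ok = all(
    bell_pitowsky_ok(e1, e2, e3, e1 * e2, e1 * e3, e2 * e3)
    for e1, e2, e3 in product((0, 1), repeat=3)
)

half, quarter = Fraction(1, 2), Fraction(1, 4)
independent = bell_pitowsky_ok(half, half, half, quarter, quarter, quarter)
# Three events of probability 1/2 that are pairwise exclusive: every pairwise
# condition holds, yet p1+p2+p3-p12-p13-p23 = 3/2 exceeds 1.
exclusive = bell_pitowsky_ok(half, half, half, 0, 0, 0)
print(all_vertices_ok, independent, exclusive)
```

The second example shows that the genuinely three-event inequalities carry information beyond the pairwise ones.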
Finally we mention the case of n=4 and
S={(1, 3), (1, 4), (2, 3), (2, 4)}
One can prove (Pitowsky 1989, pp. 27–30) that the following inequalities are equivalent to the condition p∈ c(4, S):
0≤pij≤pi≤1 | ||
0≤pij≤pj≤1 | i=1, 2 j=3, 4 | |
pi+pj−pij≤1 | ||
-1≤p13+p14+p24−p23−p1−p4≤0 | (14) | |
-1≤p23+p24+p14−p13−p2−p4≤0 | ||
-1≤p14+p13+p23−p24−p1−p3≤0 | ||
-1≤p24+p23+p13−p14−p2−p3≤0 |
Let us call them the Clauser–Horne–Pitowsky inequalities.
The elements of reality the EPR paper is talking about are nothing but what the property interpretation calls properties existing independently of the measurements. In each run of the experiment, there exist some elements of reality, the system has particular properties < #ai > which unambiguously determine the measurement outcome < ai >, given that the corresponding measurement a is performed. That is to say,
p(< ai >|< #ai >∧ a)=1 | (15) |
This condition—coming from Assumptions 2 and 3 and the Reality Criterion—is sometimes called “Counterfactual Definiteness” (Redhead 1987). According to the “no conspiracy” assumption we stipulated in Assumption 4,
p(< #ai >∧ a)=p(< #ai >)p(a) | (16) |
p(< #ai >)=p(< ai >|a)=tr(^WPi) | (17) |
That is, the relative frequency of the element of reality < #ai > corresponding to the measurement outcome < ai > must be equal to the corresponding quantum probability tr(^WPi). However, this is generally impossible. According to the Laboratory Record Argument (Szabó 2001) below, there are no things (elements of reality, properties, “quantum events”, etc.) the relative frequencies of which could be equal to quantum probabilities.
Imagine the consecutive time slices of a given region of the world (say, the laboratory) corresponding to the consecutive runs of an experiment (Fig. 3). We do not know what “elements of reality”, “properties”, “quantum events”, etc., are, but we can imagine that in every such time slice some of them occur, and we can imagine a laboratory record like the one in Table 1.
Table 1: An imaginary laboratory record about the occurrences of the hidden elements of reality
“1” indicates that the corresponding element of reality occurs and “0” that it does not. We put “1” into the column corresponding to a conjunction if both elements of reality occur. In order to avoid objections like “the two measurements cannot be performed simultaneously”, or “the conjunction is meaningless”, etc., let us assume that the pairs (X1, X3), (X1, X4), (X2, X3), and (X2, X4) belong to commuting projectors.
Now, the relative frequencies can be computed from this table:
$$n_1=\frac{N_1}{N},\quad n_2=\frac{N_2}{N},\quad \ldots\quad n_{24}=\frac{N_{24}}{N} \tag{18}$$
Notice that each row of the table corresponds to one of the 2⁴=16 possible classical truth-value functions over the corresponding propositions. In other words, it is one of the vertices uε (ε∈{0, 1}⁴) we introduced in (11). Let Nε denote the number of type-uε rows in the table. The relative frequencies (18) can also be expressed as follows:
ni= | ∑ελεuεi |
nij= | ∑ελεuεij |
where λε=Nε/N. Clearly, λε≥0 and ∑ελε=1. That is to say, the correlation vector consisting of the relative frequencies in question satisfies the condition n=(n1, n2, … n24)∈c(4, S) in section 2a. (Consequently, by Pitowsky’s theorem, it admits a Kolmogorovian representation.)
One can generalize the above observation in the following stipulation: The elements of a correlation vector p admit a relative frequency interpretation if and only if p satisfies the condition p∈ c(n, S).
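The Laboratory Record Argument can be illustrated with a randomly generated record. In the sketch below, the record generator and its arbitrary weights are assumptions made purely for illustration; whatever record is produced, the relative frequencies read off from it satisfy the Clauser–Horne–Pitowsky inequalities (14), because every single row does.

```python
import random
from fractions import Fraction
from itertools import product

S = [(1, 3), (1, 4), (2, 3), (2, 4)]

def random_record(runs, seed):
    """An imaginary laboratory record: each row says which of X1..X4 occurred (1) or not (0)."""
    rng = random.Random(seed)
    rows = list(product((0, 1), repeat=4))
    weights = [rng.random() for _ in rows]  # arbitrary, unknown 'hidden' statistics
    return rng.choices(rows, weights=weights, k=runs)

def relative_frequencies(record):
    """Relative frequencies (18): n_i for single columns, n_ij for conjunction columns."""
    N = len(record)
    n = {i: Fraction(sum(row[i - 1] for row in record), N) for i in (1, 2, 3, 4)}
    for (i, j) in S:
        n[(i, j)] = Fraction(sum(row[i - 1] * row[j - 1] for row in record), N)
    return n

def chp_expressions(n):
    """The four middle expressions of (14); each must lie between -1 and 0."""
    return [
        n[(1, 3)] + n[(1, 4)] + n[(2, 4)] - n[(2, 3)] - n[1] - n[4],
        n[(2, 3)] + n[(2, 4)] + n[(1, 4)] - n[(1, 3)] - n[2] - n[4],
        n[(1, 4)] + n[(1, 3)] + n[(2, 3)] - n[(2, 4)] - n[1] - n[3],
        n[(2, 4)] + n[(2, 3)] + n[(1, 3)] - n[(1, 4)] - n[2] - n[3],
    ]

def record_satisfies_chp(runs, seed):
    values = chp_expressions(relative_frequencies(random_record(runs, seed)))
    return all(-1 <= v <= 0 for v in values)

print(all(record_satisfies_chp(500, seed) for seed in range(20)))
```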
So in the above example, n∈ c(4, S) if and only if n satisfies the Clauser–Horne–Pitowsky inequalities (14). But, in general, quantum probabilities do not satisfy these inequalities. Consider the EPR experiment in section 1a. Assume that the possible directions are a1 and a2 in the left wing, and b1 and b2 in the right wing. We will consider the following particular case: ∢(a1, b1)=∢(a1, b2)=∢(a2, b2)=120° and ∢(a2, b1)=0. According to (1)–(2), the quantum probabilities are the following:
$$p(A_1|a_1)=p(A_2|a_2)=p(B_1|b_1)=p(B_2|b_2)=\frac{1}{2} \tag{19}$$
$$p(A_1\wedge B_1|a_1\wedge b_1)=p(A_1\wedge B_2|a_1\wedge b_2)=p(A_2\wedge B_2|a_2\wedge b_2)=\frac{3}{8} \tag{20}$$
$$p(A_2\wedge B_1|a_2\wedge b_1)=0 \tag{21}$$
Let X1=A1, X2=A2, X3=B1, and X4=B2. The question is whether the corresponding correlation vector n=(1/2, 1/2, 1/2, 1/2, 3/8, 3/8, 0, 3/8) satisfies the condition of Kolmogorovity or not. Substituting the elements of n into (14), we find that the system of inequalities is violated. Quantum probabilities measured in the EPR experiment violate the Clauser–Horne–Pitowsky inequalities, therefore they cannot be interpreted as relative frequencies. Consequently, there cannot exist quantum events, elements of reality, properties, or any other things which occur with relative frequencies equal to quantum probabilities. (To avoid any misunderstanding, the restriction of a quantum probability measure to the Boolean sublattice of projectors belonging to the spectral decomposition of one single maximal observable does, of course, admit a relative frequency interpretation. It must be also mentioned that quantum probabilities, in general, can be interpreted in terms of relative frequencies as conditional probabilities. See Szabó 2001.)
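The substitution into (14) is a matter of simple arithmetic, which can be spelled out as follows. This is a sketch using exact fractions; the dictionary layout is our own choice, with components ordered as (p1, p2, p3, p4, p13, p14, p23, p24).

```python
from fractions import Fraction

half, q = Fraction(1, 2), Fraction(3, 8)

# Quantum probabilities (19)-(21) with X1=A1, X2=A2, X3=B1, X4=B2.
p = {1: half, 2: half, 3: half, 4: half,
     (1, 3): q, (1, 4): q, (2, 3): Fraction(0), (2, 4): q}

# The four middle expressions of (14); each should lie between -1 and 0.
expressions = [
    p[(1, 3)] + p[(1, 4)] + p[(2, 4)] - p[(2, 3)] - p[1] - p[4],
    p[(2, 3)] + p[(2, 4)] + p[(1, 4)] - p[(1, 3)] - p[2] - p[4],
    p[(1, 4)] + p[(1, 3)] + p[(2, 3)] - p[(2, 4)] - p[1] - p[3],
    p[(2, 4)] + p[(2, 3)] + p[(1, 3)] - p[(1, 4)] - p[2] - p[3],
]
violated = [not (-1 <= e <= 0) for e in expressions]

print(expressions)
print(violated)
```

The first expression equals 3/8 + 3/8 + 3/8 − 0 − 1/2 − 1/2 = 1/8, which exceeds the upper bound 0; the other three equal −5/8 and are within bounds. One violated inequality is enough to place n outside c(4, S).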
In brief, given the existence of the perfect correlations predicted by quantum mechanics (Assumption 3), according to the EPR argument, there ought to exist particular elements of reality, which, according to the Laboratory Record Argument, cannot exist. To resolve this contradiction, we have to conclude that at least one of Assumptions 1, 2 and 4 fails.
In the next section we will arrive at similar conclusions in a different context.
When the EPR paper was published, there already existed a hidden variable theory of quantum mechanics, which achieved its complete form in 1952 (Bohm 1952a; b). This is the de Broglie–Bohm theory, which is also called Bohmian mechanics. (For a historical review of the de Broglie–Bohm theory, see Cushing 1994. For the Bohmian mechanics version of the standard text-book quantum mechanics, see Bohm and Hiley 1993 and Holland 1993.) This theory is explicitly non-local in the following sense: One of its central objects, the so-called quantum potential, which locally governs the behavior of a particle, explicitly depends on the simultaneous coordinates of other, far distant, particles. This kind of non-locality is, however, a natural feature of all theories containing potentials (like electrostatics or the Newtonian theory of gravitation). Such a theory is expected to describe physical reality only in non-relativistic approximation, when the finiteness of the speed of propagation of causal effects is negligible, but, according to our expectations, it fails on a more detailed spatiotemporal scale. What is unusual in the EPR situation is that the real laboratory experiments do reach this relativistic spatiotemporal scale, but the observed results are still describable by simple (non-local) quantum/Bohm mechanics.
In his 1964 paper (reprinted in Bell 1987), John Stewart Bell proved that
In a theory in which parameters are added to quantum mechanics to determine the results of individual measurements, without changing the statistical predictions, there must be a mechanism whereby the setting of one measuring device can influence the reading of another instrument, however remote. (Bell 1987, p. 20.)
The argument was based on the violation of an inequality derivable from a few plausible assumptions. Instead of Bell’s original inequality, it is better to formulate the argument by means of the Clauser–Horne inequalities, which are more applicable to the spin-correlation experiment described in section 1a. This difference is, however, not significant.
Bell was concerned with the following problem: Can the whole EPR experiment be accommodated in a classical world, that is, in a world which is compatible with the world-view of pre-quantum-mechanical physics? This pre-quantum-mechanical world is local, deterministic and Markovian (LDM), that is, it satisfies the following assumption:
Assumption 2′ Our world is local, deterministic, and Markovian (LDM). |
Electrodynamics is the paradigmatic LDM theory of this pre-quantum-mechanical world view.
It should be clear that Assumption 2′ prescribes determinism only on the level of the final ontology, but it does not exclude stochasticity of an epistemic kind. At first sight Assumption 2′ seems to be much stronger than Assumption 2. This is because the three metaphysical ideas, locality, determinism, and Markovity, seem to be clearly distinguishable features of a possible world. However, further reflection reveals that these concepts are inextricably intertwined. In all pre-quantum-mechanical examples the laws of physics are such that locality, determinism, and Markovity are provided together. If, however, our world is objectively indeterministic (this, of course, hinges on the very issue we are discussing here), then it is far from obvious how the phrase “no direct causal connection between …” is to be understood (see also section 6).
Anyhow, the question we are concerned with is this: Can all physical events observed in the EPR experiment be accommodated in an LDM world, including the emissions, the measurement setups, the measurement outcomes, etc., with relative frequencies observed in the laboratory and predicted by quantum mechanics?
We have eight different types of event: the measurement outcomes, that is, the detections of the particles in the corresponding up-detector, A1, A2, B1, B2, and the measurement setups a1, a2, b1, b2. Let us imagine the space-time diagram of one single run of the experiment (Fig. 5).
The positive dependence domain of the Cauchy surface S, D+(S), contains all events we observe in a single run of the experiment. According to the classical views, the Cauchy data on S unambiguously determine what is going on in domain D+(S), including whether or not events A1, A2, B1, B2, a1, a2, b1, and b2 occur. The occurrence of a type-X event means that the state of affairs in the dependence domain D+(S) falls into the category X. Which events occur and which do not, can be expressed with the following functions:
$$u_X(\mu,\lambda,\nu)=\begin{cases}1 & \text{if } D^+(S) \text{ falls into category } X\\ 0 & \text{if not}\end{cases} \tag{22}$$
Taking into account that an event cannot depend on data outside of the backward light-cone,
uAi(μ, λ, ν)= | uAi(μ, λ) | i=1, 2 | (23) |
uBi(μ, λ, ν)= | uBi(λ, ν) | ||
uai(μ, λ, ν)= | uai(μ, λ) | ||
ubi(μ, λ, ν)= | ubi(λ, ν) |
The whole experiment, that is, the statistical ensemble, consists of a long sequence of similar space-time patterns like the one depicted in Fig. 5. In the consecutive situations, the existing values of parameters (μ, λ, ν) determine what happens in the given run of the experiment (Fig. 6). One can count the relative frequencies of the various (μ, λ, ν) combinations. Therefore, probabilities p(μ), p(λ), p(ν), p(μ∧λ), … p(μ∧λ∧ν) can be considered as given. Applying (23), the probabilities (relative frequencies) of the eight events can be expressed as follows:
Due to the common causal past, there can be correlations between the Cauchy data belonging to the three spatially separated regions (Fig. 7). Henceforth, however, we assume that
p(μ∧λ∧ν)=p(μ)p(λ)p(ν) | (30) |
This assumption can be justified by the following intuitive arguments:
Applying Bayes’ rule and taking into account assumption (30), the conditional probability p(Ai∧Bj|ai∧bj∧λ) can be expressed as follows:
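$$\begin{aligned}
p(A_i\wedge B_j|a_i\wedge b_j\wedge\lambda) &= \frac{p(A_i\wedge B_j\wedge a_i\wedge b_j\wedge\lambda)}{p(a_i\wedge b_j\wedge\lambda)}\\
&= \frac{\sum_{\mu,\nu}u_{A_i}(\mu,\lambda)\,u_{a_i}(\mu,\lambda)\,u_{B_j}(\lambda,\nu)\,u_{b_j}(\lambda,\nu)\,p(\mu)p(\nu)p(\lambda)}{\sum_{\mu,\nu}u_{a_i}(\mu,\lambda)\,u_{b_j}(\lambda,\nu)\,p(\mu)p(\nu)p(\lambda)}\\
&= \frac{\sum_{\mu}u_{A_i}(\mu,\lambda)\,p(\mu)p(\lambda)}{\sum_{\mu}u_{a_i}(\mu,\lambda)\,p(\mu)p(\lambda)}\cdot\frac{\sum_{\nu}u_{B_j}(\lambda,\nu)\,p(\nu)p(\lambda)}{\sum_{\nu}u_{b_j}(\lambda,\nu)\,p(\nu)p(\lambda)}\\
&= \frac{\sum_{\mu}u_{A_i}(\mu,\lambda)\,u_{a_i}(\mu,\lambda)\,p(\mu)p(\lambda)}{\sum_{\mu}u_{a_i}(\mu,\lambda)\,p(\mu)p(\lambda)}\times\frac{\sum_{\nu}u_{B_j}(\lambda,\nu)\,u_{b_j}(\lambda,\nu)\,p(\nu)p(\lambda)}{\sum_{\nu}u_{b_j}(\lambda,\nu)\,p(\nu)p(\lambda)}\\
&= \frac{p(A_i\wedge a_i\wedge\lambda)}{p(a_i\wedge\lambda)}\cdot\frac{p(B_j\wedge b_j\wedge\lambda)}{p(b_j\wedge\lambda)}
\end{aligned}$$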
So, parameter λ, standing for the Cauchy data carrying the information shared by the left and right wings, must satisfy the following so-called “screening off” condition:
p(Ai∧ Bj|ai∧bj∧λ)=p(Ai|ai∧λ)p(Bj|bj∧λ) | (31) |
Bell restricted the concept of LDM embedding with a further requirement which is nothing but Assumption 4. In this context it says the following: The choice between the possible measurement setups must be independent of the parameter λ carrying the shared information. In other words,
uai(μ, λ)= | uai(μ) | i=1, 2 | (32) |
ubi(λ, ν)= | ubi(ν) |
In this case, it immediately follows from (24)–(29) that
p(Ai|ai)= | ∑λp(Ai|ai∧λ)p(λ) | |
p(Bi|bi)= | ∑λp(Bi|bi∧λ)p(λ) i, j=1, 2 | (33) |
p(Ai∧ Bj|ai∧bj)= | ∑λp(Ai∧Bj|ai∧bj∧λ)p(λ) |
For example:
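$$
\begin{aligned}
p(A_i\mid a_i) &= \frac{p(A_i\wedge a_i)}{p(a_i)} = \frac{\sum_{\mu,\lambda}u_{A_i}(\mu,\lambda)\,u_{a_i}(\mu,\lambda)\,p(\mu)p(\lambda)}{\sum_{\mu,\lambda}u_{a_i}(\mu,\lambda)\,p(\mu)p(\lambda)} \\
&= \frac{\sum_{\mu,\lambda}u_{A_i}(\mu,\lambda)\,p(\mu)p(\lambda)}{\sum_{\mu,\lambda}u_{a_i}(\mu,\lambda)\,p(\mu)p(\lambda)} = \frac{\sum_{\lambda}\left(\sum_{\mu}u_{A_i}(\mu,\lambda)\,p(\mu)\right)p(\lambda)}{\sum_{\mu}u_{a_i}(\mu)\,p(\mu)} \\
&\overset{(\bigstar)}{=} \sum_{\lambda}\left(\frac{\sum_{\mu}u_{A_i}(\mu,\lambda)\,p(\mu)}{\sum_{\mu}u_{a_i}(\mu)\,p(\mu)}\right)p(\lambda) = \sum_{\lambda}p(A_i\mid a_i\wedge\lambda)\,p(\lambda)
\end{aligned}
$$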
Equality (★) would not hold without condition (32).
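The role of condition (32) can be illustrated with a two-valued toy model. This is a minimal sketch, not part of the original argument: μ and λ each take the values 0 and 1 with probability 1/2, and the indicator functions below are hypothetical choices. In the first model the setup indicator depends on μ only, as (32) requires; in the second it also depends on λ.

```python
from fractions import Fraction

MU = (0, 1)
LAM = (0, 1)
p_mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_lam = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def both_sides(u_a, u_A):
    """Return (p(A|a), sum over lambda of p(A|a and lambda) * p(lambda))."""
    direct = (sum(u_A(m, l) * u_a(m, l) * p_mu[m] * p_lam[l] for m in MU for l in LAM)
              / sum(u_a(m, l) * p_mu[m] * p_lam[l] for m in MU for l in LAM))
    averaged = sum(
        (sum(u_A(m, l) * u_a(m, l) * p_mu[m] for m in MU)
         / sum(u_a(m, l) * p_mu[m] for m in MU)) * p_lam[l]
        for l in LAM)
    return direct, averaged

# Model 1: the setup choice depends on mu only, so (32) holds.
independent = both_sides(
    u_a=lambda m, l: int(m == 0),
    u_A=lambda m, l: int(m == 0 and l == 0),
)

# Model 2: the setup choice also depends on lambda, so (32) fails.
conspiratorial = both_sides(
    u_a=lambda m, l: int(l == 0 or m == 0),
    u_A=lambda m, l: int(l == 0),
)

print(independent)     # the two values agree
print(conspiratorial)  # the two values differ
```

In the first model both numbers equal 1/2. In the second, the directly computed p(A|a) is 2/3 while the λ-average is 1/2, so the first line of (33) no longer holds.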
It is an elementary fact that for any real numbers 0≤x1, x2, y1, y2≤1
−1≤x1y1+x1y2+x2y2−x2y1−x1−y2≤0
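Since the expression is linear in each of the four variables separately, its extreme values on the unit cube are attained at the corners, so it suffices to check the 16 combinations of 0 and 1. A short sketch of this check follows (the function name `ch_expression` and the random sampling are illustrative additions):

```python
import itertools
import random

def ch_expression(x1, x2, y1, y2):
    # The expression bounded between -1 and 0 in the text.
    return x1 * y1 + x1 * y2 + x2 * y2 - x2 * y1 - x1 - y2

# The expression is linear in each variable, so its extremes on the
# unit cube [0, 1]^4 are attained at the 16 corners.
vertex_values = [ch_expression(*v) for v in itertools.product((0, 1), repeat=4)]
print(min(vertex_values), max(vertex_values))

# Spot check on random interior points.
random.seed(1)
sample_values = [ch_expression(*(random.random() for _ in range(4)))
                 for _ in range(10000)]
print(min(sample_values), max(sample_values))
```

Every corner value is either −1 or 0, which gives the stated bounds.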
Applying this inequality, for all λ we have
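$$
\begin{aligned}
-1\le{}& p(A_1\mid a_1\wedge\lambda)\,p(B_1\mid b_1\wedge\lambda)+p(A_1\mid a_1\wedge\lambda)\,p(B_2\mid b_2\wedge\lambda) \\
&+p(A_2\mid a_2\wedge\lambda)\,p(B_2\mid b_2\wedge\lambda)-p(A_2\mid a_2\wedge\lambda)\,p(B_1\mid b_1\wedge\lambda) \\
&-p(A_1\mid a_1\wedge\lambda)-p(B_2\mid b_2\wedge\lambda)\le 0
\end{aligned}
$$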
Taking into account (31), we obtain:
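$$
\begin{aligned}
-1\le{}& p(A_1\wedge B_1\mid a_1\wedge b_1\wedge\lambda)+p(A_1\wedge B_2\mid a_1\wedge b_2\wedge\lambda) \\
&+p(A_2\wedge B_2\mid a_2\wedge b_2\wedge\lambda)-p(A_2\wedge B_1\mid a_2\wedge b_1\wedge\lambda) \\
&-p(A_1\mid a_1\wedge\lambda)-p(B_2\mid b_2\wedge\lambda)\le 0
\end{aligned}
\tag{34}
$$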
Multiplying this with probability p(λ) and summing up over λ, we obtain the following inequality:
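$$
\begin{aligned}
-1\le{}& p(A_1\wedge B_1\mid a_1\wedge b_1)+p(A_1\wedge B_2\mid a_1\wedge b_2) \\
&+p(A_2\wedge B_2\mid a_2\wedge b_2)-p(A_2\wedge B_1\mid a_2\wedge b_1) \\
&-p(A_1\mid a_1)-p(B_2\mid b_2)\le 0
\end{aligned}
\tag{35}
$$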
Similarly, changing the roles of A1, A2, B1, and B2, we have:
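$$
\begin{aligned}
-1\le{}& p(A_2\wedge B_1\mid a_2\wedge b_1)+p(A_2\wedge B_2\mid a_2\wedge b_2) \\
&+p(A_1\wedge B_2\mid a_1\wedge b_2)-p(A_1\wedge B_1\mid a_1\wedge b_1) \\
&-p(A_2\mid a_2)-p(B_2\mid b_2)\le 0
\end{aligned}
\tag{36}
$$
$$
\begin{aligned}
-1\le{}& p(A_1\wedge B_2\mid a_1\wedge b_2)+p(A_1\wedge B_1\mid a_1\wedge b_1) \\
&+p(A_2\wedge B_1\mid a_2\wedge b_1)-p(A_2\wedge B_2\mid a_2\wedge b_2) \\
&-p(A_1\mid a_1)-p(B_1\mid b_1)\le 0
\end{aligned}
\tag{37}
$$
$$
\begin{aligned}
-1\le{}& p(A_2\wedge B_2\mid a_2\wedge b_2)+p(A_2\wedge B_1\mid a_2\wedge b_1) \\
&+p(A_1\wedge B_1\mid a_1\wedge b_1)-p(A_1\wedge B_2\mid a_1\wedge b_2) \\
&-p(A_2\mid a_2)-p(B_1\mid b_1)\le 0
\end{aligned}
\tag{38}
$$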
Inequalities (35)–(38) are due to Clauser and Horne (1974), but they essentially play the same role as Bell’s original inequalities of 1964. Therefore they are called Bell–Clauser–Horne inequalities.
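To see the averaging step at work, one can build an arbitrary toy model that satisfies (31) and (33) and evaluate the left-hand expression of (35). The sketch below is only an illustration: the number of λ values, the distribution `p_lam`, and the response probabilities `pA` and `pB` are random, hypothetical inputs standing for p(Ai|ai∧λ) and p(Bj|bj∧λ).

```python
import random

random.seed(2)
n_lam = 5

# Hypothetical distribution over the hidden variable lambda.
weights = [random.random() for _ in range(n_lam)]
p_lam = [w / sum(weights) for w in weights]

# Hypothetical response probabilities p(A_i | a_i and lambda), p(B_j | b_j and lambda).
pA = {i: [random.random() for _ in range(n_lam)] for i in (1, 2)}
pB = {j: [random.random() for _ in range(n_lam)] for j in (1, 2)}

def single(table, k):
    # First two lines of (33): average the response probability over lambda.
    return sum(table[k][l] * p_lam[l] for l in range(n_lam))

def joint(i, j):
    # Third line of (33) combined with the screening-off condition (31).
    return sum(pA[i][l] * pB[j][l] * p_lam[l] for l in range(n_lam))

ch_35 = (joint(1, 1) + joint(1, 2) + joint(2, 2) - joint(2, 1)
         - single(pA, 1) - single(pB, 2))
print(ch_35)
```

Whatever random tables are drawn, the value stays between −1 and 0, because it is a p(λ)-weighted average of expressions that each lie in that interval.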
According to Assumption 1, the conditional probabilities in the Bell–Clauser–Horne inequalities are nothing but the corresponding quantum probabilities, the values of which are given in (19)–(21). These values violate the Bell–Clauser–Horne inequalities.
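The violation can be made concrete with a short calculation. The sketch below assumes the singlet-state values p(A|a)=p(B|b)=1/2 and p(A∧B|a∧b)=(1/2)sin²(∢(a, b)/2), as quoted in (42)–(43), and one illustrative choice of coplanar directions (the specific angles are an assumption of this example, not taken from the text).

```python
import math

P_SINGLE = 0.5  # p(A|a) = p(B|b) = 1/2

def p_joint(angle_deg):
    # p(A and B | a and b) = (1/2) * sin^2(angle(a, b) / 2)
    return 0.5 * math.sin(math.radians(angle_deg) / 2) ** 2

def angle_between(x_deg, y_deg):
    # Angle between two coplanar directions given by their azimuths in degrees.
    d = abs(x_deg - y_deg) % 360.0
    return min(d, 360.0 - d)

# Illustrative directions: three of the four pairs are 120 degrees apart,
# while a2 and b1 coincide.
a1, a2, b1, b2 = 0.0, 120.0, 120.0, 240.0

ch_value = (p_joint(angle_between(a1, b1)) + p_joint(angle_between(a1, b2))
            + p_joint(angle_between(a2, b2)) - p_joint(angle_between(a2, b1))
            - P_SINGLE - P_SINGLE)
print(ch_value)
```

With these directions the middle expression of (35) evaluates to 3·(3/8) − 0 − 1 = 1/8, which exceeds the upper bound 0.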
So, in a different context, we arrived at conclusions similar to section 1d. That is to say, one of Assumption 1, Assumption 2′ and Assumption 4 must fail.
Notice that the Clauser–Horne–Pitowsky inequalities (14) and the Bell–Clauser–Horne inequalities (35)–(38) are not identical, in spite of the obvious similarity. The former apply to numbers that are meant to be the (absolute) probabilities of particular events, and express a necessary condition for these “probabilities” to admit a Kolmogorovian representation and, in the Laboratory Record Argument, a relative frequency interpretation. In contrast, the Bell–Clauser–Horne inequalities apply to conditional probabilities, and we derived them as necessary conditions of LDM embeddability.
Finally, it is worth mentioning that the spin-correlation experiment described in section 1a has been performed in reality, partly with spin-1/2 particles, partly with photons (Clauser and Shimony 1981). (The experimental scenario for spin-1/2 particles can easily be translated into the terms of polarization measurements with entangled photon pairs.) In the experiments with photons, the spatial separation of the left and right wing measurements has also been realized. (The first experiment in which the spatial separation was realized is Aspect, Grangier and Roger 1981. The best conditions have been achieved in Weihs et al. 1998.) So far, the experimental results have been in wonderful agreement with quantum mechanical predictions. Therefore, the violation of the Bell-type inequalities is an experimental fact.
In the particular case when the values of p(Ai|ai∧λ), p(Bi|bi∧λ), and p(Ai∧Bj|ai∧bj∧λ) on the right hand side of (33) are only 0 or 1, λ is called a deterministic hidden variable. The above derivation of the Bell–Clauser–Horne inequalities simultaneously holds for both stochastic and deterministic hidden variable theories. Notice that the screening off condition (31) is not automatically satisfied by any deterministic hidden variable. What we automatically have in the deterministic case is the following:
p(Ai∧ Bj|ai∧bj∧λ)=p(Ai|ai∧bj∧λ)p(Bj|ai∧bj∧λ)
This is different from condition (31), except if the following are also satisfied:
$$p(A_i\mid a_i\wedge b_j\wedge\lambda)=p(A_i\mid a_i\wedge\lambda) \tag{39}$$
$$p(B_j\mid a_i\wedge b_j\wedge\lambda)=p(B_j\mid b_j\wedge\lambda) \tag{40}$$
that is to say, the outcome in the left wing is independent of the choice of the measurement setup in the right wing, and vice versa. Conditions (39)–(40), sometimes called “parameter independence” (Van Fraassen 1989), are, however, automatically satisfied by LDM embeddability.
Thus, the distinction between deterministic and stochastic hidden variable theories is not so significant. As we have seen, the necessary condition of their existence is common to both of them.
When we say that the hidden variable model is “stochastic”, it means epistemic stochasticity. Parameter λ does not fully determine the measurement outcomes: the value of uAi(μ, λ) also depends on μ, and the value of uBj(λ, ν) also depends on ν. But the LDM world, as a whole, is deterministic: whether events ai and Bj occur is fully determined by μ, λ, and ν.
There is an easy resolution of the EPR/Bell paradox, if we allow the conspiracy that was prohibited by Assumption 4 (Brans 1988; Szabó 1995). It is hard to believe, however, that the “free” decisions of the laboratory assistants in the left and right wings depend on the value of the hidden variable which also determines the spins of the two particles.
Assumption 1 seems to be the most robust one. One might think that (7) is a simple empirical fact. There is, however, a resolution of the problem which is entirely compatible with Assumptions 2′ and 4, but violates Assumption 1 in a very sophisticated way. This is Arthur Fine’s interpretation of quantum statistics (1982). The basic idea is this. To determine “What does quantum probability actually describe in the real world?” we have to analyze the actual empirical counterpart of tr(ŴPi) in the experimental confirmations of quantum theory. Consider the schema of a typical quantum measurement (Fig. 8).
Contrary to classical physics where getting information about the existence of a physical entity and measuring one of its characteristics are two different actions, in a typical quantum measurement these two actions coincide. Therefore we have no independent information about the content of the original ensemble of objects emitted by the source. In fact, the theoretical “probability” predicted by quantum mechanics is identified with the ratio of the number of detections in one channel relative to the total number of detections, that is,
$$\mathrm{tr}(\hat{W}P_i)=\frac{N_i}{\sum_{j}N_j} \tag{41}$$
Now, if, as is usually assumed, a non-detection were an independent random mistake of an inefficient detector or something like that, then the right hand side of (41) would still be equal to p(< ai >|a). This is, however, a completely implausible assumption within the context of a hidden variable theory. (This is the most essential point of Fine’s approach.) For if there are (hidden) elements of reality, for instance the particle has some hidden properties, that pre-determine the outcome of the measurement and in general pre-determine the behavior of the system during the whole measurement process, then it is quite plausible that they also pre-determine whether the entity in question can pass through the analyzer and can be detected, or not. If so, then the right hand side of (41) is a relative frequency on a “biased” ensemble, therefore
p(< #ai >)=p(< ai >|a)≠tr(ŴPi)
and the Clauser–Horne–Pitowsky inequalities as well as the Bell–Clauser–Horne inequalities can be—and, in fact, are—satisfied. This is, of course, not the whole story. The concrete hidden variable theory has to describe how the hidden properties determine the whole process and how the relative frequencies of the hidden elements of reality are related to quantum probabilities. There exist such hidden variable models for several spin-correlation experiments and they are entirely compatible with the real experiments performed so far (2008). For further reading see Fine 1986; 1991; Larsson 1999; Szabó 2000; Szabó and Fine 2002.
In spite of the above mentioned developments and in spite of the fact that the no-action-at-a-distance principle seems to hold in all other branches of physics, the painful conclusion that Assumption 2 is violated is more widely accepted in contemporary philosophy of physics.
Many argue that the violation of locality observed in the EPR experiment is not a serious one, because the spin-correlations are not capable of transmitting information between spatially separated space-time regions. The argument is based on the fact that, although the outcome in the right wing is (maximally) correlated with the outcome in the left wing, the outcome in the left wing itself is a random event (with probability 1/2 it is “up” or “down”) which cannot be influenced by our free action. We cannot send Morse code signals from the left station to the right one with EPR equipment.
Others argue that this is a misinterpretation of the original no-action-at-a-distance principle which completely prohibits spatially separated physical events having any causal influence on each other, no matter whether or not the whole process is suitable for transmission of information. Consider the example depicted in Fig. 9.
[Fig. 9: panels (A), (B), and (C)]
In case (A) the telegraph works normally. By pressing the key we can send information from one station to the other. It is no wonder that the pressing of the key at the sender station and the behavior of the register at the receiver station are maximally correlated. We have a clear causal explanation of how the signal is propagating along the cable connecting the two stations. Next, imagine that something goes wrong and the key randomly presses itself (case (B)). The random sequence of signals generated in this way is properly transmitted to the receiver station, but the system is not suitable to send telegrams. Still we have a clear causal explanation of the correlation between the behaviors of the key and the register. Finally, case (C), imagine the same situation as (B) except that the cable connecting the two stations is broken. In this situation, it would be astonishing if there really were correlations between the random behavior of the key and the behavior of the register, and it would cry out for causal explanation, no matter whether or not we are able to send information from one station to the other.
As this simple example illustrates, no matter whether or not we are able to communicate with EPR equipment, the very fact that we observe correlations which cannot be accommodated in the causal order of the world is still an embarrassing metaphysical problem.
In order to resolve the paradox, there have been various suggestions to modify the underlying physical/mathematical/logical theories by which we describe the phenomena in question. Some of these endeavors are based on the observation that the violation of the Bell-type inequalities is deeply related to the non-classical feature of quantum probability theory (Santos 1986; Pitowsky 1989; Pykacz 1989; Pykacz and Santos 1991). More exactly, it is rooted in the (non-distributive lattice) structure of the underlying event algebra which essentially differs from the classical Boolean algebra. According to some of these approaches, the fact in itself that the Bell-type inequalities are violated has nothing to do with such physical questions as locality, causality or the ontology of quantum phenomena. It is just a simple mathematical consequence of quantum probability theory and/or quantum logic (Pitowsky 1989, pp. 49–51; 182–183).
According to another approach, it is quantum mechanics itself that has to be modified. So called relational quantum mechanics (Bene 1992; Rovelli 1996; Bene and Dieks 2002) introduces a new concept: the relative quantum state. It turns out that the relative quantum state of the right particle changes if the left particle is measured and vice versa. Therefore, it is argued, the two particles are not causally separated at a quantum level.
Some papers, motivated by the problem of quantum gravity, suggest space-time structures that are intrinsically based on quantum theory. These results have remarkable interrelations with the EPR–Bell problem (Szabó 1986; 1989; Svetlichny 2000). The EPR events, which are spatially separated in classical space-time, turn out not to be spatially separated in some other space-time structures based on quantum mechanics.
Another branch of research attempts to develop, within the framework of algebraic quantum field theory, an exact concept of “separation” of subsystems (Rédei 1989; Redhead 1995; Rédei and Summers 2002; 2005).
What is common to all these efforts is that they aim to improve the conceptual/theoretical means by which we describe and analyze the EPR–Bell problem. All these approaches, however, encounter the following difficulty: The violation of the Bell-type inequalities is an experimental fact. It means that the EPR–Bell problem exists independently of quantum mechanics, and independently of any other theories: what is important from (1)–(2) is that
$$p(A\mid a)=p(B\mid b)=\frac{1}{2} \tag{42}$$
$$p(A\wedge B\mid a\wedge b)=\frac{1}{2}\sin^{2}\frac{\sphericalangle(a,b)}{2} \tag{43}$$
We observe correlations in the macroscopic world, which have no satisfactory explanation. It is hard to see how we could resolve the EPR–Bell paradox by changing something in our theories, by introducing new concepts, by changing, for example, the notion of a quantum state, by applying “quantum logic”, “quantum space-time”, etc. For, as long as the modified theory reproduces the experimentally observed relative frequencies (42)–(43), it will contradict at least one of Assumptions 1, 2/2′, and 4. (Note that Fine’s approach differs from the other proposals in claiming that (42)–(43) are not what we actually observe in the real experiments.)
How correlations between event types are related to causality between particular events is an old problem in the history of philosophy. Although the underlying causality on the level of particular events does not necessarily yield correlations on the level of event types, it is a deeply rooted metaphysical conviction, on the other hand, that there is no correlation without causal explanation. If there is correlation between two event types then there must exist something in the common causal past of the corresponding particular events that explains the correlation. This something is called a “common cause”. “Particular event” means an event of a definite space-time locus, a definite piece of the history of the universe, that is, the totally detailed state of affairs in a given space-time region.
The interesting situation is, of course, when the correlated events are not in direct causal relationship; for example, they are simultaneous or, at least, spatially separated. (In order to distinguish direct causal relations from common-cause-type causal schemas, in other words real causal processes from pseudo-processes, Reichenbach (1956) and Salmon (1984) introduced the so called mark-transmission criterion: a direct causal process is capable of transmitting a local modification in structure (a “mark”); a pseudo-process is not. Consider Salmon’s simple example: as the spotlight rotates, the spot of light moves around the wall. We can place a red filter at the wall with the result that the spot of light becomes red at that point. But if we make such a modification in the travelling spot, it will not be transmitted beyond the point of interaction. The “motion” of the spot of light on the wall is not a real causal process. On the contrary, the propagation of light from the spotlight to the wall is a real causal process. If we place a red filter in front of the spotlight, the change of color propagates with the light signal to the wall, and the spot of light on the wall becomes red. It is not entirely clear, however, how the mark-transmission criterion is applicable for objectively random uncontrollable phenomena, like the EPR experiment. It also must be mentioned that the criterion is based on some prior metaphysical assumptions about free will and free action.)
The idea that a correlation between events having no direct causal relation must always have a common-cause explanation is due to Hans Reichenbach (1956). It is hotly disputed whether the principle holds at all. Many philosophers claim that there are “regularities” in our world that have no causal explanations. The most famous such example was given by Elliott Sober (1988): The bread prices in Britain have been going up steadily over the last few centuries. The water levels in Venice have been going up steadily over the last few centuries. There is therefore a “regularity” between simultaneous bread prices in Britain and sea levels in Venice. However, there is presumably no direct causation involved, nor a common cause. Of course, “regularity” here does not mean correlation in the probability-theoretic sense (p(A∧B)−p(A)p(B)=1−1⋅1=0). So, it is still an open question whether the principle holds, in its original Reichenbachian sense, for events having non-zero correlation. Various examples from classical physics have been suggested which violate Reichenbach’s common cause principle. There is no consensus on whether these examples are valid. There is, however, a consensus that the EPR–Bell problem is a serious challenge to Reichenbach’s principle.
Another much-discussed problem is how to define the concept of common cause. As we have seen, in Bell’s understanding, the common cause is the hidden state of the universe in the intersection of the backward light cones of the correlated events. This view is based on the LDM world view of pre-quantum-mechanical physics. According to Reichenbach’s definition (1956, Chapter 19), a common cause explaining the correlation p(A∧B)−p(A)p(B)≠0 is an event C satisfying the following conditions:
$$p(A\wedge B\mid C)=p(A\mid C)\,p(B\mid C) \tag{44}$$
$$p(A\wedge B\mid\neg C)=p(A\mid\neg C)\,p(B\mid\neg C) \tag{45}$$
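A small numerical illustration shows how an event C that satisfies (44)–(45) can account for a non-zero correlation between A and B. The numbers below are hypothetical and chosen only for the example; the joint distribution is built so that A and B are independent conditional on C and conditional on ¬C.

```python
from fractions import Fraction

# Hypothetical probabilities, chosen only for illustration.
p_C = Fraction(1, 2)
p_A_given = {True: Fraction(9, 10), False: Fraction(1, 10)}  # p(A|C), p(A|not C)
p_B_given = {True: Fraction(8, 10), False: Fraction(2, 10)}  # p(B|C), p(B|not C)

def weight(c):
    # p(C) if c is True, p(not C) otherwise.
    return p_C if c else 1 - p_C

# Marginals by the law of total probability.
p_A = sum(p_A_given[c] * weight(c) for c in (True, False))
p_B = sum(p_B_given[c] * weight(c) for c in (True, False))

# Joint probability, using the screening-off conditions (44) and (45).
p_AB = sum(p_A_given[c] * p_B_given[c] * weight(c) for c in (True, False))

correlation = p_AB - p_A * p_B
print(p_A, p_B, p_AB, correlation)
```

Here p(A)=p(B)=1/2 and p(A∧B)=37/100, so the correlation is 3/25 even though A and B are uncorrelated within C and within ¬C. In general the correlation produced this way equals p(C)p(¬C)(p(A|C)−p(A|¬C))(p(B|C)−p(B|¬C)).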
Reichenbach based his common-cause concept on intuitive examples from the classical world with epistemic probabilities. However, as Nancy Cartwright (1987) points out, we are in trouble if the world is objectively indeterministic. We have no suitable metaphysical language to tell when a world is local, to tell the difference between direct and common-cause-type correlations, to tell what a common cause is, and so on. These concepts of the theory of stochastic causality are either unjustified or originate from observations of epistemically stochastic phenomena of a deterministic world.
László E. Szabó
Email: leszabo@philosophy.elte.hu
Eötvös University
Hungary
Last updated: January 7, 2008 | Originally published: