"Inspect every piece of pseudo-science and you will find a security
blanket, a thumb to suck, a skirt to hold. What have we to offer in
exchange? Uncertainty! Insecurity!"
Witztum, Rips and Rosenberg (WRR) describe in a paper in "Statistical Science"
(Vol. 9 (1994), p. 429)
the outcomes of two experiments which purport to statistically
prove the existence of a hidden code in the Book of Genesis.
In later preprints they report on further successful experiments
and yet another successful experiment was reported by Gans.
The WRR paper showed a surprising proximity (according to some notion of distance)
between equidistant
letter sequences (ELS for short)
of names of famous rabbis and their known dates of birth
and death. WRR ran their experiments on two lists of rabbis and the
experiments are referred to as the "famous rabbis experiments". Gans
(and later WRR themselves) showed a similar phenomenon when the names of
the rabbis were matched to the places where they were born or died.
(This experiment is known as the "cities of the famous rabbis experiment".)
WRR's fantastic claims raise the question
whether the outcomes they describe express their own
expectations rather than any real phenomenon.
Indeed, many experiments similar to those reported by WRR,
performed by skeptics (McKay, Simon and others),
showed no trace of the alleged phenomenon.
Moreover, running the "famous rabbis experiments" and the
experiment reported by Gans
based on an appropriate process for data selection
left no trace of WRR's phenomenon.
See Brendan McKay's site and Barry Simon's site.
A detailed study by Bar-Hillel, Bar-Natan and McKay (BBM) of WRR's experiments
shows the existence of a large "wiggle room" in WRR's experiment
and gives plenty of evidence for biased data-selection.
A comprehensive paper "Solving the Bible Code Puzzle"
about these findings as well as
some of those presented below appeared in
"Statistical Science" Vol. 14 (1999), pp. 150-173. See
Brendan McKay's site.
A case which is very similar in various respects to the famous rabbis
experiment (pointed out to me by B. McKay) is the "Mars effect": the
claim by the French psychologist Michel Gauquelin that Mars occupies
certain positions in the sky more often at the
birth of sports champions than at the birth of ordinary people.
The British psychologist Hans Jurgen Eysenck thought that Gauquelin's
results are the only reason not to reject astrology completely.
Astrologer Robert Hand has stated that the Gauquelin
findings are 'one of the strongest threats to mechanist-materialism
in existence'. See here
for the lovely paper of Jan Willem Nienhuys.
On this site I describe my statistical work
with Brendan McKay and Maya Bar-Hillel on the subject.
The idea was to study only the statistical outcomes reported by WRR
without getting into the historical or grammatical choices and
without carrying out experiments on the Book of Genesis or any other book.
Our conclusion is:
"The results of Witztum, Rips and Rosenberg stretch credibility, even without
challenging the validity of their hidden code hypothesis.
Our analysis of the results of their replication and control
experiments show them to express naive expectations rather than
statistical reality."
Our paper
Isaac Asimov in the tenth anniversary issue of The
Skeptical Inquirer.
The Two Famous Rabbis Experiments: How Similar is Too Similar?
The Center of Rationality and Interactive Decisions
Feldman Building, Givat-Ram,
The Hebrew University of Jerusalem
91904 Jerusalem Israel.
P. Diaconis, Theories of Data Analysis: from magical thinking through classical statistics, in: Exploring Data Tables, Trends, and Shapes, D. Hoaglin et al. (eds.), Wiley and Sons, New York, 1985.
The basic observation is: The two (original) p-values reported by WRR are too close.
The hypothesis suggested in the paper is: The significance in the second test of Witztum, Rips and Rosenberg (WRR) was achieved via a data selection process, which was stopped when the significance level of the first test was met.
The results further suggest that the data-selection process was carried out (at least in its final stages) by adding or deleting favorable appellations for the Rabbis.
It is also argued that the distributions of pair-distances in the two experiments do not support reasonable interpretations of the original research hypothesis of a hidden text.
Here I carried out some of the suggestions from the first version. I discovered a dependence between the data of the two experiments that I could not explain: The two distributions of distances are closer together than expected even from two samples of the same distribution.
I point out that WRR's pair-distances distributions are close to distributions obtained by simple simulations of an optimization procedure based on the P2 statistics.
1. The initial observation is a-posteriori
This is the most serious criticism against my argument. One can argue
that it is always possible to find something which looks unlikely
and make a story around it.
2. The two p-values being close is quite an arbitrary event.
One could make a similar claim if one p-value was precisely twice the other
or if the ratio between them was close to 3.14159, etc.
It turns out that expecting the outcome of a replication
to be similar to the
outcomes of the original experiment is a familiar phenomenon
which is discussed in the psychological literature.
3. A p-value of 1/100 is not enough to accuse somebody
of tailoring the experiments
But is it enough to raise suspicion?
4. The alternative hypothesis does not quite fit: the inspection paradox
This is a correct and quite an interesting point.
The expected waiting time for a bus when you arrive at the
station at a random time is usually
larger than 1/2 the expected gap between two buses. This was overlooked
in the first paper.
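The inspection paradox mentioned here can be checked numerically. For gaps G between buses, the expected wait of a passenger arriving at a uniformly random time is E[G^2] / (2 E[G]), which equals half the mean gap only when all gaps are equal. The following is a minimal sketch of that fact; the function name and the gap distributions in the usage lines are illustrative assumptions and are not taken from the papers.

```python
import bisect
import random


def mean_wait_and_gap(gaps, arrivals=20000, seed=0):
    """Return (mean wait of a random arrival, mean gap between buses)."""
    rng = random.Random(seed)
    # Cumulative bus times: bus k arrives at times[k].
    times = []
    total = 0.0
    for g in gaps:
        total += g
        times.append(total)
    waits = []
    for _ in range(arrivals):
        u = rng.uniform(0.0, total)       # passenger arrives at a random time
        k = bisect.bisect_left(times, u)  # first bus at or after time u
        waits.append(times[k] - u)
    return sum(waits) / len(waits), total / len(gaps)


if __name__ == "__main__":
    gap_rng = random.Random(1)
    equal_gaps = [10.0] * 1000
    random_gaps = [gap_rng.expovariate(0.1) for _ in range(5000)]
    print("equal gaps (wait, gap):", mean_wait_and_gap(equal_gaps))
    print("exponential gaps (wait, gap):", mean_wait_and_gap(random_gaps))
```

With equal gaps the mean wait comes out near half the gap; with exponentially distributed gaps it comes out near a full mean gap, because a random arrival is more likely to land in a long gap.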
5. "Your hypothesis suggests that WRR acted stupidly.
One thing you cannot blame them is being stupid."
Empirical experiments by Tversky and Kahneman showed that
people's (including statistically savvy scientists')
statistical expectations are quite different from what can normatively
be expected. In this case, it was difficult to know in advance what to expect.
Moreover, experienced, statistically savvy, famous scientists made similar
mistakes when they fabricated experiments.
See the paper of Dorfman - Science, Vol 201 (1978) p. 1177
on the case of Sir Cyril Burt.
6. The excessive similarity between the two experiments may have some
explanation according to WRR's research hypothesis.
Of course, everything can be explained as expressing divine intervention.
However, note that the striking similarity of WRR's two experiments
relates to the false statistical measures WRR initially used and their defunct
computer programs. The striking similarity for the two lists of Rabbis
in the cities experiment occurs for the initial lists of cities that were
later withdrawn as imperfect.
Finally, a paper by J.B.S. Haldane entitled 'The faking of genetical results'
that appeared in 1942 in "Eureka",
the Cambridge undergraduate mathematics journal, seems quite relevant.
(I found, to my surprise, this reference with a discussion and further
references in the book:
"Fourier Analysis" by T.W. Korner Ch. 82 p. 425.)
Haldane
quotes his father (an experimental physiologist) as saying
"Unless the blood is very thoroughly faked, it will be
found that duplicate determinations rarely agree". He continues to say:
"In genetical work also, duplicates rarely agree unless they are faked."
The new (1998) paper with McKay and Bar-Hillel.
This paper gives much more evidence that WRR's outcomes express
WRR's naive expectations. We offer a solution to the mystery of
why the two distance distributions are so close together. We also discuss
another aspect of WRR's experiment: the control experiments.
The most important control experiment is the one
suggested by Diaconis. WRR presented (as they expected) a flat histogram
for this control experiment, but in the context of their experiment such a
flat histogram is unlikely.
They also presented an utterly flat histogram for their
experiment when they ran it on the Samaritan version of the book of Genesis.
Again this is "too good to be true".
The paper contains statistical analysis of the following observations:
The significance level of WRR's experiment 2 was inordinately similar to that of
experiment 1. (p=0.01)
The distribution of the pairwise distances in
experiment 2 was inordinately close to that in experiment 1. (p=0.035)
The particular visual display of the pairwise distances
as described by histograms
was optimal, namely, of all possible
histograms like this one (same number of bins, same breadth of bin) none
would have yielded a second histogram as close to the first as the one
actually used. This supports the explanation that the dependence
between the distributions is due to intentional intervention aimed at
presenting similar histograms.
The histograms of the control experiment suggested by Diaconis were
inordinately flat. (p=0.003)
The histograms of the 3 other control texts reported in WRR's 1987 preprint were
inordinately flat (p=0.003, p=0.017 and p=0.86).
The p-values of Gans' experiment (based on WRR's method at the time)
were also inordinately close. (p=0.002)
We also point out that WRR changed their measurement tools during the review
process of their paper. These changes were apparently unknown to the
referees.
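One generic way to put a number on "too flat" is a left-tail goodness-of-fit test: compute the chi-square statistic of the histogram against equal bin counts and ask how often values placed independently and uniformly into the bins would give a statistic at least as small. The sketch below only illustrates that idea and is not necessarily the computation used in the paper. The function names and the bin counts in the usage lines are made up for illustration, and the independence assumption is a simplification.

```python
import random


def chi_square_flat(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def too_flat_p_value(counts, trials=5000, seed=0):
    """Monte Carlo estimate of P(statistic <= observed) when each of the
    sum(counts) values falls independently and uniformly into a bin."""
    rng = random.Random(seed)
    bins, total = len(counts), sum(counts)
    observed = chi_square_flat(counts)
    at_most = 0
    for _trial in range(trials):
        sim = [0] * bins
        for _item in range(total):
            sim[rng.randrange(bins)] += 1
        if chi_square_flat(sim) <= observed:
            at_most += 1
    return at_most / trials


if __name__ == "__main__":
    # Hypothetical bin counts, not data from WRR's experiments.
    suspiciously_flat = [10] * 10
    ordinary_looking = [13, 7, 9, 12, 8, 11, 14, 6, 10, 10]
    print(too_flat_p_value(suspiciously_flat))
    print(too_flat_p_value(ordinary_looking))
```

A small value means the histogram is flatter than independent sampling would usually produce, which is the sense of "too good to be true" used above.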
1. Did the Maharishi meditation program influence middle-east peace
and car accidents in Jerusalem?
The following paper was published in a peer-reviewed scientific journal:
ORME-JOHNSON, D. W.; ALEXANDER, C. N.; DAVIES, J. L.; CHANDLER, H.
M.; and LARIMORE, W. E. International peace project in the Middle East:
The effect of the Maharishi Technology of the Unified Field.
Journal of Conflict Resolution, 32(4): 776-812, 1988.
A rather small group of meditators seemed to have achieved:
"Improved Quality of National Life as Measured by Composite Indices
Comprising Data on War Intensity in Lebanon,
Newspaper Content Analysis of Israeli
National Mood, Tel Aviv Stock Index,
Automobile Accident Rate in Jerusalem, Number of Fires
in Jerusalem, and Maximum Temperature in Jerusalem;
Significant Improvement in Each
Variable in the Index (Israel, 1983). Decreased War Deaths (Lebanon,
1983)."
The strong correspondence between the number of Transcendental Meditation-Sidhi
program participants in the group in Jerusalem and a composite index of
all the variables above can only be described as amazing.
The graph can be found here.
Challenge: Find out what is going on.
2. Study statistically the changes between the two versions of distances
WRR described in their 1987 preprint all the distances (152 for the first list
and 163 for the second) between Rabbis and appellations. The histograms
of the Statistical Science paper are based on these distances.
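WRR claimed that the changes represent a blind debugging process.
The findings of our paper suggest that the defunct distances represent
a deliberate effort towards similarity.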
Our paper is based only on studying the original list of distances.
3. A conjecture on the distribution of distances.
WRR give no clue as to what the distances in their samples will look like except that
their distributions will be skewed towards small distances.
The best way to support a theory which is based on an a-posteriori observation
is, of course, via a replication.
The subsequent study concerning the similarity of the two experiments of WRR
and more than that, the fact that
the same phenomenon (similar p-values)
occurred in the "cities experiment" give much additional strength to my
hypothesis.
Tversky and Kahneman who studied people as intuitive statisticians
showed that people have inflated intuitive expectations
of achieving the same significance in a replication
as the significance of the original experiment.
In any case, our further studies showed further statistical
"fingerprints" indicating that WRR's results were tailored.
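The point about inflated expectations in replications can be illustrated with a small simulation. Suppose, as a deliberately favourable assumption, that the true effect is exactly the one observed in the original experiment, so that the replication's z-statistic is normally distributed around the original value with unit variance. Even then the replication matches or beats the original significance only about half the time. The function name and the example value 2.5 are illustrative assumptions, not taken from the cited studies.

```python
import random


def replication_rate(z_original, trials=100000, seed=0):
    """Fraction of replications whose z-statistic is at least z_original,
    assuming the true effect equals the originally observed one."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(z_original, 1.0) >= z_original for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(replication_rate(2.5))
```

If the true effect is smaller than the observed one, which is the usual situation after selecting a striking result, the rate is lower still.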
However, after checking closely the situation at hand it turned out that this
mistake is not very damaging.
A file of distances
Challenges for the interested reader
They also supplied computer programs els1.c (and later els2.c) which give
somewhat different distances.
the changes represent a blind debugging process.
the defunct distances represent
deliberate effort towards similarity.
Challenge: study our hypothesis based on the two
versions of distances.
In the second version of my paper I proposed a conjecture
for the distributions of distances which is based on the assumption of
biased data-selection towards success in a permutation test:
If you have a sample of size n and the product of the numbers in the
sample is A then the distribution will be given by:
The probability that an entry x of the sample is smaller than t is
(1 - log t/log A)**(n-1).
The rationale is that apart from the sample size and product
of the entries, the distances will be "random".
Challenge: Check this conjecture for the distribution of distances
for the various "successful" samples described by Witztum, Rips, Gans etc.
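As a starting point for this challenge, here is a minimal sketch that compares the conjectured formula with a simulation. It rests on one particular reading of "random", which is an assumption: the entries are independent and uniform on (0, 1), conditioned on their product being A. Under that reading the values -log x are a uniform point on a simplex whose coordinates sum to -log A, and the formula applies for t between A and 1. All function names are hypothetical, and the numbers in the usage lines are arbitrary.

```python
import math
import random


def conjectured_cdf(t, n, A):
    """P(x < t) = (1 - log t / log A) ** (n - 1), for A <= t <= 1."""
    if t <= A:
        return 0.0
    if t >= 1.0:
        return 1.0
    return (1.0 - math.log(t) / math.log(A)) ** (n - 1)


def sample_with_product(n, A, rng):
    """Draw n numbers in (0, 1) whose product is A and which are otherwise
    'random' in the sense assumed above."""
    # Normalised exponentials give a uniform point on the simplex;
    # scale it so that the values of -log x sum to -log A.
    e = [rng.expovariate(1.0) for _ in range(n)]
    scale = -math.log(A) / sum(e)
    return [math.exp(-scale * v) for v in e]


def empirical_cdf(t, n, A, trials=20000, seed=0):
    """Fraction of simulated samples whose first entry is below t."""
    rng = random.Random(seed)
    hits = sum(sample_with_product(n, A, rng)[0] < t for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    n, A = 10, 0.001  # arbitrary illustrative values
    for t in (0.01, 0.1, 0.5, 0.9):
        print(t, conjectured_cdf(t, n, A), empirical_cdf(t, n, A))
```

To test the conjecture against a real sample, one would replace the simulated entries by the published distances of that sample, set n and A from the data, and compare the empirical distribution of the distances with conjectured_cdf.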