The pair-distance distribution for the two experiments separately

Consider now the distances for all pairs of appellations and dates, over all the Rabbis, in each of the two tests performed by WRR. The distance is defined for 152 pairs in the first experiment and for 163 pairs in the second. Now consider the pair-distance histogram, namely the histogram of the distances occurring in each experiment. (See [2, p. 437] and [3, pp. 4-5].)

One striking fact about the pair-distance distributions is that, at least on the face of it, they do not fit at all WRR's suggestion that there is a hidden text in the book of Genesis in which pairs of ``related'' words can be expected to appear close together. The decreasing shape of the histogram, even for ``bad'' distances, does not seem to be supported by any reasonable hidden-text hypothesis. In particular, what could be the reason for the rarity of very large distances (e.g., distances greater than 0.9)?

One could expect, for example, that the pair-distance histogram would be a mixture of the distances for pairs that appear in the hidden text and the distances for pairs that do not. Since three forms of writing the dates were used, we can expect a substantial portion of the pairs not to be in the hidden text. The histogram we see does not fit this description.
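
The mixture picture can be made concrete with a small calculation. As a working assumption (ours, not WRR's), suppose that the distances of pairs not in the hidden text behave like independent uniform numbers on [0,1], and that a hypothetical fraction `p_hidden` of the pairs lies in the hidden text, with distances spread uniformly over [0, `hidden_cutoff`]. The Python sketch below computes the expected histogram counts under these assumptions; the function name and parameter values are illustrative only.

```python
def expected_mixture_histogram(n_pairs, p_hidden, hidden_cutoff=0.1, bins=10):
    """Expected counts per bin of [0, 1] for a two-component mixture:
    a fraction p_hidden of pairs uniform on [0, hidden_cutoff] ("in the
    hidden text") and the rest uniform on [0, 1] ("not in the hidden text")."""
    width = 1.0 / bins
    counts = []
    for i in range(bins):
        lo, hi = i * width, (i + 1) * width
        # Fraction of the hidden-text component that falls in [lo, hi).
        hidden_share = max(0.0, min(hi, hidden_cutoff) - lo) / hidden_cutoff
        counts.append(n_pairs * (p_hidden * hidden_share + (1 - p_hidden) * width))
    return counts


# Illustrative parameters: 152 pairs (first experiment), 30% hypothetically "hidden".
for i, c in enumerate(expected_mixture_histogram(152, 0.3)):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}): {c:.2f}")
```

Under these assumptions every bin above `hidden_cutoff` has the same expected count, (1 - p_hidden) * n_pairs / bins, which is about 10.6 pairs per bin for the illustrative values, so the bin above 0.9 is no rarer than any other bin beyond the cutoff. A flat tail of this kind is what the mixture description predicts, in contrast with the decreasing shape of the observed histograms.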

It seems that the pair-distance histogram exhibits phenomena which are unfavorable to a hidden-text theory but favorable to the main statistics chosen to verify the research hypothesis. This is not a good sign. It is as if somebody tried to test the hypothesis that a certain university lowers its academic standards for basketball players, and claimed to prove it by showing a negative correlation between height and academic achievement, only for it to turn out that this negative correlation is supported on small heights, where there are no basketball players anyway.

In our view, the obvious explanation is that this is another sign of an optimization process aimed at improving the P2 statistic.

It would be interesting to examine the following process: take a random ordering of all the appellations of the two tests, go through them one by one, and add to the list only those that improve the P2 score (or perhaps choose those with higher probability than the others), stopping when the P2 score reaches (say) 10^-9. What will the typical shape of the histogram of pair distances be? This test can be conducted with the pair distances of the control tests given in [3], or with pair distances arising from distances in War and Peace. A simpler model would be to apply such a P2-optimization process to data consisting of independent random numbers uniformly distributed on [0,1], where at each stage 1-3 new such numbers are considered.
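
The simpler model can be sketched in Python. This is a minimal illustration under stated assumptions, not a reproduction of WRR's procedure: we assume P2 is the probability that the product of N independent Uniform[0,1] distances is at most the observed product (computed as the upper tail of a Gamma(N, 1) variable), we read "1-3 new numbers" as one hypothetical appellation contributing one distance per date form, and we accept such a group only if it strictly lowers P2. The names `p2_score`, `greedy_p2_run` and `histogram` are hypothetical.

```python
import math
import random


def p2_score(log_product, n):
    """Probability that the product of n i.i.d. Uniform[0,1] numbers is at
    most exp(log_product).  With x = -log_product this is the Gamma(n, 1)
    upper tail exp(-x) * sum_{k=0}^{n-1} x**k / k!  (assumed form of P2)."""
    x = -log_product
    if n == 0 or x <= 0.0:
        return 1.0
    # Log of each term x**k / k!, combined with log-sum-exp for stability.
    logs = [k * math.log(x) - math.lgamma(k + 1) for k in range(n)]
    top = max(logs)
    total = sum(math.exp(t - top) for t in logs)
    return math.exp(-x + top + math.log(total))


def greedy_p2_run(target=1e-9, max_stages=100_000, seed=0):
    """Greedily accept groups of 1-3 uniform 'distances' that lower P2."""
    rng = random.Random(seed)
    accepted = []
    log_prod, p2 = 0.0, 1.0
    for _ in range(max_stages):
        if p2 <= target:
            break
        # Hypothetical appellation: one distance per date form (1 to 3 forms).
        group = [1.0 - rng.random() for _ in range(rng.randint(1, 3))]  # in (0, 1]
        new_log = log_prod + sum(math.log(d) for d in group)
        new_p2 = p2_score(new_log, len(accepted) + len(group))
        if new_p2 < p2:  # keep the group only if P2 becomes more significant
            accepted.extend(group)
            log_prod, p2 = new_log, new_p2
    return accepted, p2


def histogram(values, bins=10):
    """Counts of values in the bins [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    values, final_p2 = greedy_p2_run(seed=1)
    print(len(values), final_p2)
    print(histogram(values))
```

Comparing `histogram(values)` over many seeds with the observed pair-distance histograms is one direct way to run the suggested experiment. Reading "improving" as a strict decrease of P2 is one choice; the probabilistic variant mentioned above would replace the `new_p2 < p2` test with a random acceptance rule.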

Gil Kalai
9/2/1997