The case at hand is complicated and emotionally loaded. Moreover, the (hidden) assumption of divine intervention complicates things even further. Let us start, therefore, with simple examples which demonstrate the basic approach.
Suppose someone claims to be able to hit a globe hanging 200 meters away with a bow and arrows while blindfolded. You blindfold him, he shoots once, then after a while he shoots again, and then he sends his son to bring the globe, and... lo and behold! Both arrows are stuck in the globe on the equator, very close to each other. What the boy did not know, however, is that while his father was shooting, the globe was spinning very quickly around the axis through its poles, so that if an arrow hits the globe, the longitude on which it lands is essentially random. Taking a closer look, you calculate that the probability of two random points on the equator being as close as the two arrows is 1/100. Furthermore, the distance between the two arrows is about the closest the boy could have stuck the arrows without hurting his fingers, if he was the one who stuck them in.
We observe here a phenomenon which supports a simple cheating strategy and is very unlikely to be found otherwise.
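The 1/100 figure can be reproduced under an assumption the text does not spell out: if the two longitudes are independent and uniform on the circle, they fall within a fraction f of the circumference of each other with probability 2f (for f at most 1/2). A gap of 1/200 of the equator, that is 1.8 degrees, then gives 1/100. The gap size and the function name below are illustrative choices, not values taken from the example; the simulation is only a sanity check of the formula.

```python
import random

def proximity_probability(gap_fraction, trials=200_000, seed=0):
    """Estimate the probability that two independent uniform points on a
    circle lie within gap_fraction of the circumference of each other,
    measured along the shorter arc."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = rng.random(), rng.random()
        d = abs(a - b)
        # The shorter of the two arcs joining the points.
        if min(d, 1.0 - d) <= gap_fraction:
            hits += 1
    return hits / trials

GAP = 1 / 200                 # assumed gap: 1.8 degrees of longitude
exact = 2 * GAP               # closed form for uniform, independent points
estimate = proximity_probability(GAP)
print(f"closed form {exact:.4f}, simulated {estimate:.4f}")
```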
We now move to an example which is quite close to the discussion in the next sections.
A researcher conducts a statistical test to check the hypothesis that there is a significant positive correlation between height and salary among people with academic education.
She interviews all people from one neighborhood in the city and sorts out the 90 with academic education among them.
Then she records each individual's height and salary. She indeed finds a positive correlation, and the significance level of her finding is 0.000132.
When the referees of the paper ask her to repeat her experiment, she assigns one of her assistants to the job. He chooses another neighborhood, this time one with many immigrants, so finding the individuals with academic education is harder. He repeats the process (finding 80 persons this time) and again reports a positive correlation between height and salary, with a significance level of 0.000120.
At this point the professor suspects that the assistant, who wanted to ingratiate himself with her, faked the results using the degrees of freedom he had. In particular, she could not understand his precise criteria for qualifying a person as having academic education.
The similarity of the two significance levels suggests the hypothesis that, however the assistant tampered with the data, his strategy was to do it gradually until the significance level of the second test passed that of the first test. The ratio between them is 0.000132/0.000120 = 1.1.
How can we test such a hypothesis? Assume that we have all the relevant data on the people considered as having academic education but not on those rejected.
Suppose we split the 170 chosen individuals in another way into two groups consisting of 90 and 80 people respectively. We can assume that the significant positive correlation between height and salary will also hold for both these groups. However, there is no reason to assume that the proximity between the significance levels we observe in the two parts will typically be higher than that we observe in the original research. In fact, there are reasons to assume it will typically be smaller.
We can bound the significance of the proximity of the two significance levels by comparing it, in a Monte Carlo experiment, to the proximity observed when the 170 chosen individuals are split at random into groups of 90 and 80.
Suppose we find out that the following event occurs with probability 1/100: for a random splitting of these 170 people into two parts of 90 and 80, the ratio of the significance levels for the two parts is 1.1 or smaller.
This would be incriminating evidence, since we have here a phenomenon that we could expect to occur if the assistant was tampering with the data in a certain way but that is very unlikely to be found otherwise.
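The random-splitting experiment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any actual study: the data are synthetic (the real heights and salaries are not given), the significance level is computed with the Fisher z approximation for a one-sided test of positive correlation, and all function names are hypothetical.

```python
import math
import random

def pearson_r(pairs):
    """Sample correlation of a list of (height, salary) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def significance_level(pairs):
    """One-sided level for positive correlation (Fisher z approximation,
    an assumption; the text does not say which test was used)."""
    z = math.atanh(pearson_r(pairs)) * math.sqrt(len(pairs) - 3)
    return 0.5 * math.erfc(z / math.sqrt(2))

def level_ratio(p1, p2):
    """Ratio of the larger significance level to the smaller one."""
    return max(p1, p2) / min(p1, p2)

def split_proximity_probability(pairs, n_first, observed_ratio, trials, rng):
    """Fraction of random splits whose two significance levels are at
    least as close (in ratio) as the observed pair."""
    hits = 0
    for _ in range(trials):
        shuffled = rng.sample(pairs, len(pairs))
        p1 = significance_level(shuffled[:n_first])
        p2 = significance_level(shuffled[n_first:])
        if level_ratio(p1, p2) <= observed_ratio:
            hits += 1
    return hits / trials

# Synthetic stand-in for the 170 chosen individuals (purely illustrative).
rng = random.Random(1)
people = []
for _ in range(170):
    height = rng.gauss(170, 10)
    salary = 50 * height + rng.gauss(0, 1500)
    people.append((height, salary))

prob = split_proximity_probability(people, 90, 1.1, 2000, rng)
print(f"fraction of random splits with ratio <= 1.1: {prob:.3f}")
```

The returned fraction plays the role of the probability 1/100 in the text; with real data it would be computed from the 170 recorded individuals instead of the synthetic list.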
But we can go one step further. Suppose we suspect that the assistant's main degree of freedom was in including or rejecting persons in his list, and that he gradually added to the second list people with academic education who were favorable to the research hypothesis until he reached the significance level of the first test. In this case the proximity of the significance levels of the two tests should be related to the effect of adding the last person.
If the typical effect of adding a single person to the list is compatible with the number 1.1, this will give additional support to the cheating hypothesis. (As we will see later, being compatible means that typically adding a person with academic education to the list changes the significance level by a factor in the neighborhood of 1.12.)
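One possible way to estimate the typical effect of a single person is a leave-one-out computation: for each person on the second list, compare the significance level with and without that person. The sketch below is only one reading of the idea, under the same assumptions as before (synthetic data, Fisher z approximation, hypothetical names); it keeps only persons whose inclusion improves the significance level, since those are the ones a cheater would add.

```python
import math
import random
import statistics

def significance_level(pairs):
    """One-sided level for positive correlation (Fisher z approximation)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    r = sxy / math.sqrt(sxx * syy)
    return 0.5 * math.erfc(math.atanh(r) * math.sqrt(n - 3) / math.sqrt(2))

def single_person_factors(pairs):
    """For each person, the factor by which the significance level improves
    when that person is added to the rest of the list. Only favorable
    additions (factor greater than 1) are kept."""
    p_full = significance_level(pairs)
    factors = []
    for i in range(len(pairs)):
        p_without = significance_level(pairs[:i] + pairs[i + 1:])
        factor = p_without / p_full
        if factor > 1.0:
            factors.append(factor)
    return factors

# Synthetic stand-in for the assistant's list of 80 persons.
rng = random.Random(2)
second_list = []
for _ in range(80):
    height = rng.gauss(170, 10)
    second_list.append((height, 50 * height + rng.gauss(0, 1500)))

factors = single_person_factors(second_list)
typical = statistics.median(factors)
print(f"median single-person factor: {typical:.3f}")
```

The median is used here as the "typical" factor; whether the median, the mean, or another summary is the right one to compare with 1.1 is a modeling choice the text leaves open.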
Several people were reminded by the Bible code case of the following story (possibly a tale), which I first heard from Maya Bar Hillel. The mathematician Poincaré bought loaves of bread from a certain bakery, and after a while he complained to the police that the average weight he observed was 0.9 kilograms rather than the required 1 kilogram. The police intervened, and from then on all of Poincaré's loaves of bread were heavier than 1 kilogram. Six months later Poincaré was asked whether the baker had stopped cheating, and his answer was that he had not. He had found statistical evidence that the baker kept cheating, but that every day he was putting aside for Poincaré a loaf of bread which was heavier than 1 kilogram. Although all the loaves of bread Poincaré got were over 1 kilogram, their distribution was significantly close to a normal distribution with average 0.9 kilograms cut off at 1 kilogram.
Poincaré identified a simple cheating strategy of the baker: the distribution of weights of his loaves of bread agrees with the distribution you expect if you assume cheating, and is very unlikely to have happened if there was no cheating.
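The shape Poincaré is said to have recognized can be illustrated with a small simulation. It assumes, hypothetically, that loaf weights are normal with mean 0.9 kilograms and standard deviation 0.05 kilograms (the story gives no standard deviation) and that the baker hands over the first loaf of the day that exceeds 1 kilogram. Under these assumptions the selected weights follow a normal distribution cut off at 1 kilogram: they pile up just above the cutoff and are strongly skewed to the right, unlike the roughly symmetric weights of an honest baker aiming at 1 kilogram or more.

```python
import random
from statistics import NormalDist, mean

MU, CUTOFF = 0.9, 1.0   # kilograms, from the story
SIGMA = 0.05            # kilograms, assumed for illustration only

def pick_loaf(rng):
    """Draw loaves until one exceeds the cutoff, as the cheating baker
    is assumed to do for his special customer."""
    while True:
        weight = rng.gauss(MU, SIGMA)
        if weight > CUTOFF:
            return weight

def truncated_mean(mu, sigma, cutoff):
    """Mean of a normal distribution conditioned on exceeding cutoff."""
    std = NormalDist()
    alpha = (cutoff - mu) / sigma
    return mu + sigma * std.pdf(alpha) / (1.0 - std.cdf(alpha))

def skewness(xs):
    """Sample skewness: third central moment over variance to the 3/2."""
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

rng = random.Random(3)
loaves = [pick_loaf(rng) for _ in range(1000)]
print(f"sample mean {mean(loaves):.4f}, "
      f"truncated-normal mean {truncated_mean(MU, SIGMA, CUTOFF):.4f}, "
      f"skewness {skewness(loaves):.2f}")
```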
All the examples of this section have the weakness that the statistical analysis was not made to give a priori predictions on new data but rather to study given data. With a good lawyer the baker may get off the hook. But the researcher, as soon as she got the picture, disqualified the experiment made by her assistant.