Torsten's post ended with:
Quote: Sorry, but we are bound to the facts even if we don't like the outcome.
This is quite misleading: the counts are the facts, but speculation about causes and effects is not.
All of the statistics presented in the post are based on the text of the entire MS. Then a reason for the result is offered:
Quote: There is a reason for this result. The chance for a 'sh'-word to occur on a page increases as more often the corresponding 'ch'-word appears on that page:
This is the first time that per-page statistics enter the argument, so this claim is not supported by anything presented before it.
The only evidence offered seems to be the two links to voynichese.com, which show the per-page distribution of the most frequent words containing 'ch' and 'sh'.
Going back to the facts: words containing 'ch' are roughly twice as numerous as words containing 'sh'. Moreover, this holds for individual word patterns: for a typical word type containing 'ch', replacing 'ch' with 'sh' gives a word type that occurs roughly half as often.
This holds 'across the board'. It is an observation, regardless of what process lies behind this behaviour. Now, statistics are always more reliable when based on larger numbers, and when looking at individual word types we are, most of the time, working with relatively small samples. For any word pattern containing 'ch' that appears N times, the expected frequency of the corresponding sh-word seems to be roughly N/2, but in reality the observed counts are spread around that value. In some cases the sh count will be less than N/2, and can even be zero. In other cases it will be more than N/2, and can even be greater than N. That such cases occur is precisely what Torsten's statistics show: there is a small number of words where the sh variant is more frequent than the ch variant. This says nothing specific about the process behind the appearance of these words.
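To make the small-sample point concrete, here is a minimal Python sketch under an assumed model of my own (not Torsten's, and not fitted to the MS): given that a ch-word occurs N times, the count of the matching sh-word is taken to be Poisson-distributed with mean N/2. The function names `poisson_pmf` and `sh_extremes` are hypothetical helpers for illustration. The sketch shows how likely it is, under this model alone, to see no sh-word at all, or more sh-words than ch-words.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean), computed in log space to avoid overflow."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def sh_extremes(n_ch: int, ratio: float = 0.5) -> tuple[float, float]:
    """Assumed (hypothetical) model: given n_ch occurrences of a ch-word,
    the count of the matching sh-word is Poisson with mean ratio * n_ch.
    Returns (P(sh count is 0), P(sh count exceeds n_ch))."""
    mean = ratio * n_ch
    p_zero = poisson_pmf(0, mean)
    p_at_most_n = sum(poisson_pmf(k, mean) for k in range(n_ch + 1))
    # Clamp tiny negative values caused by floating-point rounding.
    return p_zero, max(0.0, 1.0 - p_at_most_n)


for n_ch in (2, 5, 10, 50, 200):
    p_zero, p_exceeds = sh_extremes(n_ch)
    print(f"ch-word seen {n_ch:>3} times: P(sh = 0) = {p_zero:.3f}, "
          f"P(sh > ch) = {p_exceeds:.4f}")
```

With only a couple of ch occurrences, a missing or "too frequent" sh variant is quite ordinary under this model; both probabilities shrink as N grows. Across many rare word types, a handful of sh > ch cases is therefore expected even with a uniform 1:2 ratio behind them.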
Also, what is shown is that this happens across the text of the whole MS. Nothing is said about the behaviour per page. Of course, since a single page is a much smaller sample of text, one should expect a much greater dispersion of the statistics.
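The dispersion point can be illustrated with a small simulation, again under assumptions of my own: every ch/sh token is independently 'sh' with probability 1/3 (i.e. sh:ch = 1:2 everywhere), and the sample sizes of 3000 tokens for "the whole text" and 30 for "a single page" are hypothetical round numbers, not counts taken from the MS. `simulate_sh_share` is an illustrative helper, not an existing tool.

```python
import random
import statistics


def simulate_sh_share(n_tokens: int, p_sh: float, trials: int,
                      seed: int = 1) -> list[float]:
    """Draw `trials` samples of n_tokens ch/sh tokens, each independently 'sh'
    with probability p_sh, and return the observed sh share of each sample."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    shares = []
    for _ in range(trials):
        n_sh = sum(1 for _ in range(n_tokens) if rng.random() < p_sh)
        shares.append(n_sh / n_tokens)
    return shares


P_SH = 1 / 3  # assumed: sh is one third of all ch+sh tokens (sh:ch = 1:2)
for label, n in (("whole text (hypothetical)", 3000),
                 ("single page (hypothetical)", 30)):
    shares = simulate_sh_share(n, P_SH, trials=500)
    sd = statistics.pstdev(shares)
    sh_majority = sum(s > 0.5 for s in shares) / len(shares)
    print(f"{label:<28} n={n:>4}: sd of sh share = {sd:.3f}, "
          f"samples with sh > ch = {sh_majority:.1%}")
```

The underlying ratio is identical in both cases; only the sample size differs. The spread of the observed sh share grows roughly as 1/sqrt(n), so page-sized samples scatter about ten times more widely than the whole text, and occasional pages with more sh than ch appear without any page-level mechanism.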
This is where the facts end.
Again, the only suggestion for the per-page behaviour comes from the voynichese.com outputs.
It is probably worth looking at them in detail to see if there is any evidence of the suggested cause. I find that there are plenty of pages where it does not hold even remotely, even for these very frequent words.