(22-12-2025, 10:27 PM)nablator Wrote: AFAIK, no one has found a natural language that explains any of these statistical properties:
In some of those examples there has been no serious attempt at checking whether those statistical properties occur or not among natural languages. The proponents just assumed that they "obviously" do not.
Quote:Major inconsistencies in basic glyph and glyph-bigram statistics between pages (some pages full of "or", "ol", "in" etc., some pages totally missing "e", "n", etc.)
Can you give examples of these "major inconsistencies" in running text of pages within the same section?
Quote:Currier "language" drifts and dialects
Natural languages do have dialects. And spellings may vary. And word frequencies (hence bigram frequencies) will depend strongly on topic.
Quote:Word pairs statistics: [link] [link]
The first paper considers only Indo-European languages. Not even Basque, Hungarian, Finnish, Estonian. Not Semitic or other Afroasiatic languages (Arabic, Hebrew, Aramaic, Ge'ez, Coptic, Berber), not Kartvelian languages (Georgian, Mingrelian, Laz), not Turkish ...
... and not any East Asian languages, of course.
Quote:Frequent local similarities (reduplication and almost-reduplication) including insanely high levels of clustering of k/t gallows especially in Currier B, [link]
This is one case where the proponents did not even bother to check whether such "insanely high levels of clustering" occur in English, much less in other languages -- they just assumed they would not. But there are a number of reasons why such clustering could occur in a natural language:
- Again, statistics of characters and bigrams are determined by their occurrences in the most frequent words -- or, in this case, consecutive word pairs. If the most frequent word pairs happen to have a certain letter in common, then that letter will have anomalously high duplication frequency. For example, if you extract the sequence of vowels of English (after mapping "oo" and "ee" to single letters), every occurrence of "it is" or "if it" or "if I" or "in it" or "if in" or "I did" etc. will generate an "ii" pair. When all word pairs are considered, "ii" is unlikely to come out exactly as common as predicted from the frequency of "i" alone. Indeed, if Voynichese did not have anomalous duplication, that would be evidence that it was not natural language, and would suggest that each word was generated independently by a random process.
- The same argument applies if certain topic-specific word pairs occur with significant frequency in a given section, like "hot tea" or "this star".
- The definite article in Arabic is "al-"; but when the next word starts with a "sun letter" such as "r", "s", or "z", the "l" assimilates and the article is pronounced "ar-", "as-", or "az-". Each occurrence of this rule enhances the frequency of "rr", "ss", and "zz".
- Hungarian and Turkish have this thing called "vowel harmony". Roughly speaking, the vowels are divided into two sets, and all syllables of a word must use vowels from the same set. Thus the Turkish plural suffix is "-ler" or "-lar"; the plural of house "ev" is "evler", while that of car "araba" is "arabalar". These rules enhance the frequency of "ee" and "aa" (and other pairs) in the vowel sequence. If these languages were written with each morpheme as a separate "word", this enhancement would stand out even across multiple successive "words".
- And you surely do not want to know about "tone sandhi"...
Maybe the "anomalous" duplication frequencies of natural languages are not as extreme as those of Voynichese. Maybe they are even more extreme. Either way, the proponents should have verified that...
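For anyone who wants to run that check, here is a minimal Python sketch of one way to do it. It compares the observed count of each doubled symbol "xx" in a sequence with the count expected if symbols were independent. The function names and the toy sample sentence are my own placeholders, not taken from any of the cited studies; a real test would need a large corpus in each language.

```python
from collections import Counter

def vowel_sequence(text, vowels="aeiou"):
    """Extract the vowel sequence, after mapping "oo" and "ee" to single letters."""
    text = text.lower().replace("oo", "o").replace("ee", "e")
    return [c for c in text if c in vowels]

def duplication_ratios(seq):
    """Return {symbol: observed / expected} for doubled symbols in seq.

    'expected' is the number of "xx" pairs predicted if successive
    symbols were independent, i.e. fr(x)^2 times the number of pairs.
    """
    n_pairs = len(seq) - 1
    freq = Counter(seq)
    pairs = Counter(zip(seq, seq[1:]))
    ratios = {}
    for sym, count in freq.items():
        expected = (count / len(seq)) ** 2 * n_pairs
        ratios[sym] = pairs[(sym, sym)] / expected
    return ratios

# Toy sample only; hypothetical text chosen to contain "it is" twice.
sample = "it is a cat, it is a dog"
print(duplication_ratios(vowel_sequence(sample)))
```

A ratio well above 1 for some symbol means that symbol is doubled more often than independence predicts. The same function can be applied to any symbol sequence, such as a transliterated gallows sequence.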
Quote:Patterns across word breaks, by Emma M.S. and Marco P. [link]
See the above answer, especially the first point. Namely, the frequency of a character pair in this statistic is determined by its occurrence in the most common consecutive word pairs. Every occurrence of "it is" in English increases the frequency of "t-i", and so on. Again, if Voynichese did not have anomalous frequencies of bigrams across word breaks, it would be evidence that it was not a natural language.
And the other points above also apply, mutatis mutandis. As people have immediately pointed out, even in English there is the rule for "a" or "an" depending on the first phoneme of the next word.
Quote:"Vertical pairs" by Tavie [link]
One "anomaly" discussed in that thread is that the first word (only) of a line is longer on average, while the last 1-3 words are shorter. As I explained in the previous post, this sort of anomaly is a guaranteed result of the trivial line-breaking algorithm. Does it explain precisely the length anomaly of the VMS? I don't know; but until this explanation is tested, the anomaly cannot be used as evidence of LAAFU and/or that Voynichese is not natural language.
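A quick way to see the first-word effect is to simulate it. The sketch below assumes the "trivial line-breaking algorithm" means greedy wrapping (put each word on the current line if it fits, otherwise start a new line), and it uses independent uniformly random word lengths and a fixed line width, all of which are my assumptions for illustration. It compares only the line-initial words with the remaining words; it does not try to reproduce the VMS numbers or the pattern for the last 1-3 words.

```python
import random
from statistics import mean

def wrap(word_lengths, width):
    """Greedy line breaking: a word goes on the current line if it fits."""
    lines, current, used = [], [], 0
    for w in word_lengths:
        needed = w if not current else used + 1 + w  # +1 for the word space
        if current and needed > width:
            lines.append(current)      # word does not fit: start a new line
            current, used = [w], w
        else:
            current.append(w)
            used = needed
    if current:
        lines.append(current)
    return lines

def position_means(lines):
    """Mean word length: line-initial words, all other words, all words."""
    first = mean(ln[0] for ln in lines)
    rest = mean(w for ln in lines for w in ln[1:])
    overall = mean(w for ln in lines for w in ln)
    return first, rest, overall

rng = random.Random(0)
lengths = [rng.randint(1, 10) for _ in range(20000)]  # hypothetical word lengths
first, rest, overall = position_means(wrap(lengths, width=40))
print(f"first={first:.2f} rest={rest:.2f} overall={overall:.2f}")
```

The word that starts a line is the one that failed to fit at the end of the previous line, and long words fail to fit more often than short ones. That selection alone pushes the line-initial mean above the overall mean, even though every word was drawn from the same distribution.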
The other anomaly discussed in that thread is the distribution of bigrams in the sequence that one gets by taking the first character of every line. In that 301-line table there are many pairs whose frequencies do not match the numbers expected by the formula fr(XY) = fr(X) fr(Y). However, in most cases the numbers are small, so it is hard to tell whether the discrepancies are significant. If you throw 4000 balls into an array of 20x20=400 bins, perfectly at random, there will be some bins with highly anomalous counts that deviate a lot from that formula.
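The balls-and-bins point is easy to try out. This is a minimal Python sketch with equally likely bins, so every bin has the same expected count of 10; in the actual table the expected counts differ from pair to pair, so treat this only as an illustration of how much scatter pure chance produces. The function name and the seed are arbitrary choices of mine.

```python
import random
from collections import Counter

def throw_balls(n_balls, n_bins, seed=0):
    """Throw n_balls uniformly at random into n_bins; return the count per bin."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n_bins) for _ in range(n_balls))
    return [counts.get(b, 0) for b in range(n_bins)]

counts = throw_balls(4000, 400)
expected = 4000 / 400
print("expected per bin:", expected)
print("smallest bin:", min(counts), "largest bin:", max(counts))
```

Even with perfectly random throws, the fullest and emptiest bins end up far from the expected value, so a few large-looking discrepancies in a table of small counts prove nothing by themselves.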
Eyeballing that table, the anomalies that seem statistically significant are the q-o, q-q, o-q, and o-o pairs. That by itself says nothing about the language, but only about the formatting of the text.
Here is one of many possible explanations for those anomalous pairs. Check [link] (Swiss, paper, 410 pages, ~1430). Note that, in this particular section, the scribes did not separate paragraphs as we do today. Instead they seem to have marked the start of each sentence with a vertical slash through the first letter, and the start of each paragraph by underlining the first 1-3 words and placing a paragraph marker on the left margin, all in red ink. Now suppose that the VMS scribe used a similar scheme: each sentence or clause in a paragraph may start anywhere in a line, and is marked there somehow; but then a q is also placed at the start of that line, to make it easier for the reader to find it. Then, if each clause is two or more lines long, there will be no q-q pairs in the line-initial sequence.
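The "no q-q pairs" consequence can be checked with a toy model. The Python sketch below is purely hypothetical: it measures clauses in characters, lays them end to end on lines of fixed width, and flags every line on which some clause starts. It reads "two or more lines long" as "at least two full line widths", which is my assumption; a clause only slightly longer than one line can still start on the line right after the previous clause's start.

```python
import random

def line_markers(clause_lengths, width):
    """Return one flag per line: True if some clause starts on that line."""
    total = sum(clause_lengths)
    n_lines = -(-total // width)          # ceiling division
    marked = [False] * n_lines
    pos = 0
    for length in clause_lengths:
        marked[pos // width] = True       # this line would get the initial q
        pos += length
    return marked

def has_adjacent_marks(marked):
    """True if two consecutive lines are both marked (a q-q pair)."""
    return any(a and b for a, b in zip(marked, marked[1:]))

rng = random.Random(1)
width = 50                                 # hypothetical line width in characters
long_clauses = [rng.randint(2 * width, 5 * width) for _ in range(200)]
print(has_adjacent_marks(line_markers(long_clauses, width)))
```

With every clause at least two line widths long, consecutive clause starts are always at least two lines apart, so no two adjacent lines are both marked. Feeding in shorter clauses makes adjacent marks appear.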
I can think of other explanations based on interactions between the grammar of the language, the average length of a sentence, and the initial-word-length anomaly discussed above. But it is not worth detailing them here. The point, overall, is that those q-o anomalies do not really imply LAAFU, much less "it is not a natural language".
Quote:Patrick Feaster's several statistical discoveries are yet to be explained by a quantitative study of any language
These anomalies all seem to be largely consequences of the anomalous distribution of words in line-initial and line-final position, discussed above. There may be additional perturbations due to uncertain word spaces, which are expected to vary with position along a line.
And let me repeat some advice for those doing statistical studies of the text: (1) don't try to analyze the whole book at the same time: if possible, limit the analysis to one or two sections of the same "topic" and to a single specific type of text, such as multiline paragraphs excluding the head lines; (2) formulate a hypothesis and then collect the statistics that would best prove or disprove it; and (3) check whether the "anomaly" occurs in natural languages, including non-"European" ones.
All the best, --stolfi