The Voynich Ninja

Full Version: Distribution of Q-Q gaps in paragraphs
It is not just q words that have a dislike for coming at the start and end of a line. Words starting c+h don't seem to want to do so either. Again, under a meaningful text hypothesis distributions mid-line should be level, but no, the manuscript doesn't show it.

[attachment=13608]
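
A minimal sketch of how a line-position tally of this kind could be computed, assuming the transcription is available as a list of lines, each a list of word strings. The function name `position_counts` and the toy lines are made up for illustration; they are not taken from the manuscript or from the attached plot.

```python
from collections import Counter

def position_counts(lines, prefix):
    """Tally where words starting with `prefix` fall within their lines."""
    counts = Counter()
    for words in lines:
        for i, word in enumerate(words):
            if not word.startswith(prefix):
                continue
            if i == 0:
                counts["first"] += 1      # line-first word
            elif i == len(words) - 1:
                counts["last"] += 1       # line-last word
            else:
                counts["middle"] += 1     # anywhere in between
    return counts

# Toy lines, invented for illustration only (not real transcription data).
toy_lines = [
    ["daiin", "qokeedy", "chedy", "qokaiin", "oky"],
    ["shedy", "qokedy", "okaiin", "chey", "dy"],
    ["qokeey", "okedy", "qokain", "oky"],
]
q_counts = position_counts(toy_lines, "q")
print(dict(q_counts))
```

Raw tallies are not directly comparable, because a line has only one first and one last slot but many middle slots. A fair comparison would divide each tally by the number of words available in that position.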

But also, 321 paragraphs don't seem to me to be enough to say with statistical confidence that there is anything odd about paragraph last words. In fact the top paragraph last words seem very much to be the top words for the whole of the quire as well.

[attachment=13609]
(22-01-2026, 01:43 PM)Bernd Wrote: checked if other quires show a similar pattern

One big difference between quires 13 and 20 is that q words in quire 13 are happy to appear as line first words. (Compare the plots given earlier [link].)
(22-01-2026, 09:25 PM)dashstofsk Wrote: It is not just q words that have a dislike for coming at the start and end of a line. Words starting c+h don't seem to want to do so either.

q and sh are really common start-line glyphs(?)
(22-01-2026, 09:50 PM)Bluetoes101 Wrote: q and sh are really common start-line glyphs

Common, yes. But in quires 13 and 20, which are the sections of the manuscript with the most and longest paragraph text, they prefer to come later in the lines.
(22-01-2026, 09:25 PM)dashstofsk Wrote: Again, under a meaningful text hypothesis distributions mid-line should be level

You mean under a random gibberish hypothesis.  

In meaningful text one expects an uneven distribution of words along sentences. Depending on the content of the parags, one expects the distribution of words along parags to be uneven too. For example, in a herbal the word "herb" may be more common at the beginning, while "grows" is more common close to the end.

There are various hypotheses that could explain how an uneven distribution of words in sentences and parags can produce an uneven distribution along a line.

Some manuscripts have peculiar devices to help the reader or other reasons.  Like placing a ¶ at the beginning of a line that contains the start of a new sentence or topic.  Others may put such a mark inside the line, unless the sentence starts at the beginning of the line...

And please folks, this is important: when text is formatted into paragraphs, longer words will be more common at the start of a line, and less common at the end of a line. In any language, any text, any spelling or encoding. That will affect the distribution of prefixes, glyphs, digraphs etc. at those places.
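
The line-start half of this claim can be checked with a small simulation. The sketch below wraps a stream of random word lengths greedily into fixed-width lines and compares the mean length of line-first words with the mean over all words. The uniform 1..10 length distribution, the width of 40 and the function names are assumptions chosen for illustration, not a model of the VMS. The mean for line-last words is returned too, for comparison.

```python
import random

def wrap(lengths, width):
    """Greedy wrap: return a list of lines, each a list of word lengths."""
    lines, current, used = [], [], 0
    for w in lengths:
        # Width needed if this word joins the current line (one space between words).
        need = w if not current else used + 1 + w
        if current and need > width:
            lines.append(current)      # word does not fit: start a new line with it
            current, used = [w], w
        else:
            current.append(w)
            used = need
    if current:
        lines.append(current)
    return lines

def mean(xs):
    return sum(xs) / len(xs)

def line_position_means(n_words=50000, width=40, seed=1):
    """Mean word length at line start, at line end, and overall."""
    rng = random.Random(seed)
    lengths = [rng.randint(1, 10) for _ in range(n_words)]  # assumed length model
    lines = wrap(lengths, width)
    first = [ln[0] for ln in lines]
    last = [ln[-1] for ln in lines]
    return mean(first), mean(last), mean(lengths)

first_mean, last_mean, all_mean = line_position_means()
print(f"first={first_mean:.2f} last={last_mean:.2f} all={all_mean:.2f}")
```

The mechanism is that a word is pushed to the next line only when it fails to fit in the remaining space, and a long word fails to fit more often than a short one.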

Indeed 320 parags are too few to estimate the frequency of a word or prefix at the start or end of parags. But if you take 320 words at random out of the VMS lexicon, the probability of them all being distinct is very low. Several words occur with frequency 1% or more. You would expect quite a few duplicates...
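
A rough bound makes the point concrete. For all 320 draws to be distinct, every word must occur at most once, so the chance is no larger than the chance that one given word of frequency p occurs at most once. The sketch assumes independent draws; the 1% frequency is the figure mentioned above, and the function name is hypothetical.

```python
def prob_at_most_once(p, n):
    """Probability that a word of frequency p occurs 0 or 1 times in n independent draws."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)

# Upper bound on P(all 320 draws distinct), using a single word with frequency 1%.
bound = prob_at_most_once(0.01, 320)
print(f"{bound:.3f}")
```

A single word at 1% already caps the chance of no duplicates at roughly 0.17. Each further frequent word must also occur at most once, which pushes the chance lower still.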

All the best, --stolfi
(23-01-2026, 02:51 AM)Jorge_Stolfi Wrote: You mean under a random gibberish hypothesis

No, no. I do not mean that. If you are digging at my belief that the text in the manuscript is meaningless and artificially constructed, then I need to make it clear that I do not believe it to be random or gibberish. The writer had a way of generating text to give it a semblance of genuineness, while at the same time giving himself a free hand to add variability and to write in a personal style. I have done my very best to say this in many of my previous offerings. [link] for instance shows that gallow words are constructed to a pattern. [link] and [link] is something about words being formed from sequences of e series and i series strings.
(23-01-2026, 08:40 AM)dashstofsk Wrote: [link] for instance shows that gallow words are constructed to a pattern.

But on the first two rows of that table we already see that {qok,ok} x {eedy,aiin} grossly deviate from the random combination claim. Like what happens in many natural languages.

Quote: [link] and [link] is something about words being formed from sequences of e series and i series strings.

Agreed, the Author clearly designed his alphabet with writing speed and ease as a primary goal.  (It probably was still far from ideal in this respect, which may have motivated him to change the glyph values and spelling rules at some point.  That is still a possible explanation for the A/B split.)  

But "efficient glyph design by combining simple pen strokes" does not affect the probability that it is natural language.

You saw my own OKOKO/CMC model for Voynichese words, that is a bit more detailed than just a division into prefixes and suffixes.  It implies many constraints on digraphs. And it could still be made stricter.  And yet it is not incompatible with natural language...

All the best, --stolfi
(23-01-2026, 09:06 AM)Jorge_Stolfi Wrote: {qok,ok} x {eedy,aiin}

They are close enough to parity for my liking. As I said before I do not believe in blind randomness. I do not believe that the writer was rigidly following the method. So long as he was careful to give the text a semblance of genuineness, there was no need for strict adherence. To make the task of writing easier he was free to deviate as he wished, and he had his own style and preferences. He formed gallow words by keeping to the prefix / suffix formation, but was not too strict about it. He did not need to give exact parity.
(23-01-2026, 09:28 AM)dashstofsk Wrote:
(23-01-2026, 09:06 AM)Jorge_Stolfi Wrote: {qok,ok} x {eedy,aiin}
They are close enough to parity for my liking. As I said before I do not believe in blind randomness. I do not believe that the writer was rigidly following the method.

But how would he have done it then? If he picked prefixes and suffixes independently -- by spinning two wheels, by pulling cards from two bags, etc. -- we should see Pr(XY) ≈ Pr(X)Pr(Y). Even if each wheel/bag is heavily biased. Even if he failed to spin or mix properly before each draw. Even if he got lazy now and then, and chose a prefix or suffix from his head, or repeated a recent one, instead of using the device.
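
The independence check can be sketched as follows: for each prefix/suffix pair, compare the observed count with the count expected if Pr(XY) = Pr(X)Pr(Y). A ratio near 1 is consistent with independent choice; ratios far from 1 are not. The function name is hypothetical and the counts are invented for illustration; they are NOT the figures from the table under discussion.

```python
from collections import Counter

def independence_ratios(pair_counts):
    """Return observed / expected count for each (prefix, suffix) pair,
    where expected assumes prefix and suffix are chosen independently."""
    total = sum(pair_counts.values())
    prefix_totals, suffix_totals = Counter(), Counter()
    for (prefix, suffix), n in pair_counts.items():
        prefix_totals[prefix] += n
        suffix_totals[suffix] += n
    ratios = {}
    for (prefix, suffix), n in pair_counts.items():
        # Expected count under independence: total * Pr(prefix) * Pr(suffix).
        expected = prefix_totals[prefix] * suffix_totals[suffix] / total
        ratios[(prefix, suffix)] = n / expected
    return ratios

# Invented counts for illustration only; not real VMS data.
toy = {
    ("qok", "eedy"): 30, ("qok", "aiin"): 10,
    ("ok", "eedy"): 10, ("ok", "aiin"): 30,
}
ratios = independence_ratios(toy)
for pair, r in ratios.items():
    print(pair, round(r, 2))
```

Note that biased wheels or bags change the marginal totals but leave the ratios near 1. Only a dependence between the two choices moves them away from 1.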

Those deviations from independence that we see mean that he did not choose prefixes and suffixes independently while writing the text, but chose the words as a whole -- even if he initially made up his lexicon by combining prefixes and suffixes.

Guess what, that is how natural languages work...

Quote:So long as he was careful to give the text a semblance of genuineness, there was no need for strict adherence.

If the goal was to "give the text a semblance of genuineness", he should not have chosen to generate it by combining prefixes and suffixes. That would have been twice the work of choosing whole words from a bigger bag, only to produce something that would have looked totally unnatural to a European at the time.

Not to mention that the prefixes and suffixes are not arbitrary.  As per the CMC+OKOKO model, each word consists in fact of 7 + 8 slots, in a specific order, each of which can be filled or not with glyphs or glyph combinations from a small set of elements, specific to each slot.  The prefix/suffix decomposition that you use seems to be the result of splitting those 15 slots in two parts, around the gallows. Why would he use such a complicated method to create the prefixes and suffixes? 

And then he would have to make sure that the entropy per word, the distribution of word pairs, the distribution of words along a paragraph, etc etc all had the "semblance of genuineness"...

That is the big problem with all the "gibberish" proposals.  There are many methods of generating mysterious-looking text with much higher "semblance of genuineness" that would be much easier to execute and devise, that would be perfectly good for the time.  If the method generated 20 bits per word, or a  Zipf plot with three flat steps, or the token pair distribution was factorable -- no one at the time would have noticed.  

And one could also simply scan the Bible back to front, copying every third word with a simple encoding. Or anything like that. Without even being too careful about it. That "cipher text" would be mysterious and impossible to decipher, but would look as "genuine" as the Bible, and would even have the right Zipf, entropy, etc.

Kelley made up an "Enochian language" complete with an "Enochian alphabet", and filled a whole book with it, without any device like wheels or bags of cards.  Yet the result was good enough to fool mathematician John Dee, all the way to his grave.

I still haven't seen any solid evidence that Voynichese is not natural language in the plain.  But I have seen a lot of hints that it is...

All the best, --stolfi
(23-01-2026, 04:58 PM)Jorge_Stolfi Wrote: But how would he have done it then?

We can only guess how it was done.

But because the writing seems to flow fluently I do not believe that he stopped after every word to consult tables or throw dice in order to decide the next word. He seems to have had the ability just to write. I had an idea some time ago for a possible scenario [link].

Also, the writer was not an automaton. Human randomness is not the same as computational randomness. We all know humans are not good at mimicking randomness. Psychological tricks sometimes mislead. For instance the writer might have been tempted to follow a short prefix with a long suffix, or a long prefix with a short suffix. Short-short seems to occur less than expected. This seems to be so with oky. But this word also occurs more often than expected at the end of a line. Obviously this is one word the writer liked to use when he was approaching the edge of the page and had to choose something small with which to terminate.


(23-01-2026, 04:58 PM)Jorge_Stolfi Wrote: he should not have chosen to generate it by combining prefixes and suffixes

But he did. And this is where he made a mistake. And gave us a clue to the deception. He was not sufficiently clever in hiding the fabrication. It might have fooled people in the 15th century. But today with the assistance of programmable computers we can see something of the way it was done.