The Voynich Ninja
An Artificial Construction - Printable Version



An Artificial Construction - dashstofsk - 25-06-2025

I think the time has come to give some more evidence for my belief that the text of the manuscript is an artificial construction.

Consider the k words from the language B pages, listed in matrix form.


[Image: matrix of word counts for k words in language B, initials against finals]


To generate this I took all k words and split each one into its initial and final parts. I then counted the frequency of the words formed by merging every initial-final combination. The cut-off count for the parts was set at 20. You will see that for the most frequent initials and finals every combination yields valid words that are frequent.

Now compare the word counts against estimates of what would be expected if the initials and finals were applied randomly. For instance, out of 6546 k words, 1343 start with ok and 537 end with kedy. The expected count of okedy should be 1343 x 537 / 6546 = 110. The actual number is 108, which gives a ratio of 0.98. This is very encouraging.
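The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the calculation as described in the post; the function and variable names are made up for illustration.

```python
def expected_count(n_initial, n_final, n_total):
    """Expected count of a word if initial and final were chosen independently."""
    return n_initial * n_final / n_total

# Figures quoted in the post for k words in language B
n_total = 6546    # all k words
n_ok = 1343       # k words starting "ok"
n_kedy = 537      # k words ending "kedy"
observed = 108    # occurrences of "okedy"

expected = expected_count(n_ok, n_kedy, n_total)
ratio = observed / expected

print(round(expected), round(ratio, 2))  # 110 0.98
```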


[Image: matrix of observed/expected ratios for k words in language B]


This matrix gives all the ratios. The word counts for the most frequent words are broadly in the range of what would be expected. There are no big swings from parity. The general parity between observed and expected tells us that the choice of initial and final is largely independent, that the final part of a word does not depend on the initial part. This suggests that some arbitrary construction has been used for these words, that they have an artificial structure.
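For anyone wanting to reproduce a ratio matrix of this kind, here is a minimal sketch. It assumes each word is split at its first gallows letter, with the gallows kept in both parts (as in ok + kedy giving okedy); the post does not spell out the exact splitting rule. The word list is a toy example, not manuscript data.

```python
from collections import Counter

def split_at(word, gallows="k"):
    """Split at the first gallows letter; both parts keep the gallows (assumed rule)."""
    i = word.index(gallows)
    return word[: i + 1], word[i:]

def ratio_matrix(words, gallows="k", min_count=1):
    """Map (initial, final) to observed / expected count under independence."""
    pairs = [split_at(w, gallows) for w in words if gallows in w]
    total = len(pairs)
    initials = Counter(p[0] for p in pairs)
    finals = Counter(p[1] for p in pairs)
    observed = Counter(pairs)
    ratios = {}
    for ini, n_i in initials.items():
        for fin, n_f in finals.items():
            # Mimic the cut-off on how often each part must occur
            if n_i >= min_count and n_f >= min_count:
                expected = n_i * n_f / total
                ratios[(ini, fin)] = observed[(ini, fin)] / expected
    return ratios

# Toy word list, invented for illustration only
toy = ["okedy", "okedy", "qokedy", "qokeey", "okeey", "qokedy"]
r = ratio_matrix(toy)
print(r[("ok", "kedy")])
```

With real data, min_count would be set to 20 to match the cut-off described above.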

That was for k words in the language B pages. The same is true for t words in B.


[Images: the corresponding matrices for t words in language B]


The frequencies for the initials and finals are different. There is more variability in the initials and less in the finals. We can only guess what method the authors used to write k and t words. But I like to imagine that the authors had their ‘favourites’ for the start of the word. Then they wrote a k or t (chosen possibly at random, or possibly depending on the chosen initial). Then came a final, also drawn from a set of ‘favourites’, but chosen largely without regard for what came before.

The authors seemed to understand that, to give the constructed text a semblance of genuineness, they had to add variability. Their fraudulence would be suspected if word frequencies did not fall away in the descending order that is usual in languages.

But the significant fact is that the initial and final word parts are independent. And this is where the authors made a mistake. Such behaviour is not present in European languages, and it gives us evidence of a method of construction.

In language B there are 6546 k words and 3397 t words, 9943 in all. The words given in the matrices sum to 7406. That makes 74%. This is a sufficient value to affirm that some method has been used to construct these gallows words, and therefore that this might also be true for the whole of the manuscript.


RE: An Artificial Construction - oshfdk - 25-06-2025

(25-06-2025, 10:33 AM) dashstofsk Wrote: This matrix gives all the ratios. The word counts for the most frequent words are broadly in the range of what would be expected. No significant big swings from parity.

How do you define significant? I see numbers like 0.11 and 2.47 and 4.24, etc. That is a difference of up to 8 times the expected number.

There are empty slots. Which means the count is zero, I suppose. Like for Shekeedy, which could be made of Shek (count 53) and eedy (count 612). But somehow it's not there. How likely is that?
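One rough way to put a number on "how likely" is a Poisson approximation. This is an assumption brought in for illustration, not something used in the thread: it treats the count of each word as Poisson-distributed around the expected value from the independence model, using the 6546 total for k words quoted in the first post.

```python
import math

n_total = 6546   # all k words in language B (from the first post)
n_shek = 53      # count for the initial "Shek"
n_eedy = 612     # count for the final "eedy"

# Expected count of "Shekeedy" if initial and final were independent
expected = n_shek * n_eedy / n_total

# Poisson assumption: probability of seeing the word zero times
p_zero = math.exp(-expected)

print(round(expected, 2), round(p_zero, 4))
```

Under those assumptions the expected count is close to 5 and the chance of an empty slot is well under 1%. With a couple of hundred cells in the matrix, an occasional outcome like that would not be surprising on its own, so a single empty slot settles little either way.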

Overall, thank you for the numbers, but I do not agree with the conclusions. While Voynichese seems to allow for much more freedom in combining chunks, the prefixes and suffixes are not selected randomly and independently. There appears to be some system in place. Which for me points more towards the cipher hypothesis.


RE: An Artificial Construction - RobGea - 25-06-2025

Nice work,
  and like oshfdk mentioned, a statistic for the significance of the ratios would be a welcome extra.

A rough and ready count of the ratios in the k-words gives 52 / 235 within a range of 0.85 to 1.15.
That range is probably a bit narrow, idk what a significant one would be.
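A count like this is easy to script once the ratios are in a list. The sketch below uses a made-up helper name, and the sample list holds only the four ratios actually quoted in this thread, not the full matrix.

```python
def share_within(ratios, low=0.85, high=1.15):
    """Return (number of ratios inside [low, high], total number of ratios)."""
    inside = sum(1 for r in ratios if low <= r <= high)
    return inside, len(ratios)

# Only the ratios quoted in the thread; the real matrix has 235 cells
sample = [0.98, 0.11, 2.47, 4.24]
inside, total = share_within(sample)
print(inside, total)

# The reported 52 out of 235 as a fraction
print(round(52 / 235, 2))  # 0.22
```

Widening the band, for example share_within(ratios, 0.5, 2.0), shows how sensitive the count is to the chosen range.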


RE: An Artificial Construction - dashstofsk - 25-06-2025

(25-06-2025, 11:21 AM) oshfdk Wrote: How do you define significant? I see numbers like 0.11 and 2.47 and 4.24, etc. That is a difference of up to 8 times the expected number.

There are empty slots. Which means the count is zero, I suppose. Like for Shekeedy, which could be made of Shek (count 53) and eedy (count 612). But somehow it's not there. How likely is that?

The fact that the ratios hover around the parity mark for the most frequent parts is good enough. You have to allow for the fact that it is the author's choice and that at each sitting they may have had different ideas about how to write. One great thing about the construction hypothesis is that there are no binding rules about the writing, and this can explain the various oddities in the text.

0.11, 2.47 and 4.24 are for chekeey, qolkeedy and ykeeody, which occur 1, 6 and 6 times. Their expected counts are 9, 3 and 2. With such low counts you can't really expect to get unity between observed and expected.

Also, chekeey occurs once, where 9 were expected. But I suspect that words that include eke are going to be underrepresented, since the authors would probably have preferred to write this as the one character ckh.

The expected count for Shekeedy is 5. Because of the small number, and because of my suspicions about ckh, most probably it was never destined that the author should write this word.


RE: An Artificial Construction - dashstofsk - 25-06-2025

(25-06-2025, 12:21 PM) RobGea Wrote: a statistic for the significance of the ratios would be a welcome extra.

A rough and ready count of the ratios in the k-words gives 52 / 235 within a range of 0.85 to 1.15.
That range is probably a bit narrow, idk what a significant one would be.

By 'significance' I take it that you would like to see a measure of statistical significance? It would be difficult to apply statistical testing to observations that are under the control of human choice. Any such attempt would lead to dubious results. I mentioned in an earlier post how I felt that the authors had a choice of similar looking words.

It would be difficult to compute statistical significance on such behaviour.


RE: An Artificial Construction - RobGea - 25-06-2025

Thanks, I see what you mean. I was kinda thinking of something like this. (Rough and ready version; green and orange were done consecutively, that's why they're like that.)

[Image: color-coded version of the ratio matrix]


RE: An Artificial Construction - RobGea - 25-06-2025

Actually, on review, I realize that I have been a bit cheeky by asking for something like color-coding the data.

It's like you have gone to the trouble of making and baking a tasty cake
and I've said "sure this cake looks good and tastes scrummy but you could have added some gold leaf decorations".
My bad.
Awaiting your tables of all the other words.


RE: An Artificial Construction - oshfdk - 25-06-2025

(25-06-2025, 12:37 PM) dashstofsk Wrote: The fact that the ratios hover around the parity mark for the most frequent parts is good enough. You have to allow for the fact that it is the author's choice and that at each sitting they may have different ideas how to write. One great thing about the construction hypothesis is that there are no binding rules about the writing and this can explain the various oddities in the text.

These are all good observations, and indeed when the expected counts are low, large percentage deviations will happen all the time. All this is certainly compatible with a cipher. Take a very simple cipher, where you write the alphabet sequentially in a 26x26 table, rotating each row by one, and use two letters to pick the column and row that encode a single plaintext letter. So for each plaintext letter you will have 26 ways of encoding it using combinations of two characters. When encoding with this cipher you can select all kinds of character sequences almost independently. As a result it's likely that you will have a fairly random distribution of prefixes and suffixes and yet a more or less practical cipher. (Usual disclaimer: I don't think this was the specific way the VMS was encoded, even though it's possible to use the glyphset of Voynichese to mimic this cipher just for fun.)
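A minimal sketch of the kind of cipher described above, assuming a 26-letter lowercase alphabet and that the first letter of each pair picks the row and the second the column (the description leaves that order open). The function names are invented for illustration.

```python
import random
import string

ALPHABET = string.ascii_lowercase

# In a table where each row is the alphabet rotated by one,
# cell (row, col) holds ALPHABET[(row + col) % 26].

def encode(plaintext, rng):
    """Encode each letter as a (row, column) pair; the row is a free choice."""
    out = []
    for ch in plaintext:
        p = ALPHABET.index(ch)
        row = rng.randrange(26)      # any of the 26 rows will do
        col = (p - row) % 26         # the column in that row holding the letter
        out.append(ALPHABET[row] + ALPHABET[col])
    return "".join(out)

def decode(ciphertext):
    """Look up each (row, column) pair in the rotated table."""
    pairs = [ciphertext[i:i + 2] for i in range(0, len(ciphertext), 2)]
    return "".join(
        ALPHABET[(ALPHABET.index(a) + ALPHABET.index(b)) % 26] for a, b in pairs
    )

rng = random.Random(0)
cipher = encode("voynich", rng)
print(cipher, decode(cipher))
```

Because the row is free, the encoder could just as well pick it from a list of 'favourites' instead of at random, and the text would still decode.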

I think that whatever algorithm one proposes for randomly generating a lot of pseudo text, it's probably quite trivial to adapt it for actual encoding, just by adding a few constraints on the randomness, and that would produce a result quite similar to what your tables show. I don't think there is any statistical test that could differentiate between random gibberish and a cipher, even in the context of medieval ciphers.

Edit: Basically, as far as your main argument goes, we are in perfect agreement: Voynichese looks like an artificial construction that doesn't behave like a natural European language in several ways, and your tables show this nicely. However, I think it makes sense to test how this would work for non-European languages too.


RE: An Artificial Construction - dashstofsk - 25-06-2025

(25-06-2025, 02:53 PM) RobGea Wrote: color-coding the data

When generating the matrices I had to draw a line somewhere on which words to include. I included only those whose initials and finals occurred at least 20 times, and calculated the counts and ratios for every initial-final merger in the matrix.

As you move away from the top corner, words start to become rare. As words start to get very rare, the ratios start to get meaningless. A single occurrence more or less will have a big effect on the ratio. This is where many of the 'reds' are.

If you were to weight each ratio by the word frequency then I believe the global ratio would be good.
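One possible reading of that weighting is a mean of the ratios weighted by each word's observed count. That reading is an assumption; the post does not define the global ratio. The sketch uses only the four words whose counts and ratios are quoted in this thread, so the printed number says nothing about the full matrix.

```python
def weighted_global_ratio(cells):
    """Mean of observed/expected ratios, weighted by observed count (assumed definition)."""
    total_weight = sum(count for count, _ in cells)
    return sum(count * ratio for count, ratio in cells) / total_weight

# (observed count, ratio) for okedy, chekeey, qolkeedy, ykeeody as quoted in the thread
quoted = [(108, 0.98), (1, 0.11), (6, 2.47), (6, 4.24)]

global_ratio = weighted_global_ratio(quoted)
print(round(global_ratio, 2))  # 1.21
```

Note that weighting by observed count gives more weight to over-represented words. Weighting by expected count instead reduces to total observed divided by total expected.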

Because we can only guess what method the authors used I think it is okay for some of the words to fall outside of the ratio range [0.5:2]. In fact it would be unbelievable if the ratios were all very close to parity. It would hint at something too regular, improbable. If the ratio values were widely scattered then it would hint at inconsistency. The middle ground feels just right. Just enough variability to account for mistakes, choices and individual style of the author.


RE: An Artificial Construction - dashstofsk - 25-06-2025

(25-06-2025, 02:53 PM) RobGea Wrote: Awaiting your tables of all the other words

I have no plans yet to try anything similar on other words.

I wrote elsewhere recently: "if one section of the manuscript is an artificial fabrication then all of it is going to be so. If it can be shown that at least some part of it is fabricated then the main objective has been achieved. There will be no great need then to try to explain every word, every oddity, nor to attempt to give a precise method of construction."

Hopefully I have demonstrated that at least some part of it has been fabricated. The objective is now nearer to having been achieved.