The Voynich Ninja

Full Version: Let there be meaning
The Voynich marathon marches on, and a new text is out: how to look for and find meaning in the Voynich MS.


No good pictures this time, but there are bits of simple math and some basic Python. I hope you'll enjoy it!

Also, I heard there is a strict no-spamming policy on voynich.ninja. I'm not sure how close I am to the banhammer :)
My feeling is that the texts I write differ substantially from one another and that each contains enough information to deserve a separate thread, but if you feel otherwise, please let me know.
I like your definition of stateful one-to-many additive ciphers.

Quote:My present primary area of investigation in the Voynich manuscript is stateful one-to-many additive ciphers that operate on character by character basis. These ciphers encode the plaintext characters one by one or in small groups, appending the result to the ciphertext and possibly keeping track of some internal state, and any character or a combination of characters in the plaintext can map to several possible ciphertexts (by choice or based on the internal state).

But this definition needs an introduction. I have the nagging feeling that to the general population of Voynicheros we, who are interested in this sort of thing, sound like total weirdos, given the fact that such stateful ciphers are not known to have existed in the 15th century.

The interesting property is that the same chunk of ciphertext does not necessarily represent the same plaintext, which addresses the difficulty of converting a babble-like stream of similar words to something that looks less strange: daiin daiin may not be a repetition after all.

Conversely, different ciphertexts can be generated for the same plaintext. Homophonic substitution ciphers share this property, but they need more cipher symbols than there are plaintext letters, and this does not seem to be the case in the VMS: the set of common glyphs is as small as the Latin alphabet.
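A toy sketch of the two properties above. It is not a proposal for the actual VMS cipher: the 26-letter alphabet, the key 3 and the state-update rule are arbitrary assumptions. Each letter is shifted by the current state, and the state then absorbs the letter just encoded. So 26 symbols go in and 26 come out, yet one plaintext letter has many possible encodings, and one ciphertext chunk can stand for different plaintexts.

```python
import string

ALPHABET = string.ascii_lowercase  # toy alphabet: 26 symbols in, 26 symbols out


def encrypt(plaintext: str, key: int = 3) -> str:
    """Toy stateful additive cipher (illustrative only, not a VMS proposal)."""
    state = key
    out = []
    for ch in plaintext:
        p = ALPHABET.index(ch)
        out.append(ALPHABET[(p + state) % 26])  # add the current state to the letter
        state = (state + p + 1) % 26            # hypothetical rule: state absorbs the letter
    return "".join(out)


def decrypt(ciphertext: str, key: int = 3) -> str:
    """Inverse of encrypt: subtract the state, then update it the same way."""
    state = key
    out = []
    for ch in ciphertext:
        p = (ALPHABET.index(ch) - state) % 26
        out.append(ALPHABET[p])
        state = (state + p + 1) % 26
    return "".join(out)


if __name__ == "__main__":
    print(encrypt("aaaa"))  # defg: one plaintext letter, four different ciphertexts
    print(decrypt("abab"))  # xaya: the repeated chunk "ab" means "xa", then "ya"
```

The second call is a small analogue of "daiin daiin" that is not a true repetition.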

As a result, these ciphers destroy most of the evidence of word co-occurrence that is expected in a meaningful text. And this is exactly what is found in the word pair statistics of the VMS: see the Word Pair Permutation Analysis by Marke Fincher, or the published article from last year's Voynich conference, An Analysis of the Relationship between Words within the Voynich Manuscript, by Andrew Caruana, Colin Layfield and John Abela.
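One simple way to look at word co-occurrence is to count repeated adjacent word pairs and compare that count with the same words in shuffled order. The sketch below does this. It is a generic permutation baseline assumed for illustration, not a reimplementation of Fincher's Word Pair Permutation Analysis or of the Caruana, Layfield and Abela method, and the function names are hypothetical.

```python
import random
from collections import Counter


def repeated_pair_count(words):
    """Number of adjacent-pair tokens whose (left, right) pair occurs more than once."""
    pairs = Counter(zip(words, words[1:]))
    return sum(n for n in pairs.values() if n > 1)


def shuffled_baseline(words, trials=1000, seed=0):
    """Average repeated_pair_count over random reorderings of the same words."""
    rng = random.Random(seed)
    shuffled = list(words)
    total = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        total += repeated_pair_count(shuffled)
    return total / trials


if __name__ == "__main__":
    text = "the cat sat on the cat mat".split()
    print(repeated_pair_count(text), shuffled_baseline(text))
```

A meaningful plaintext would be expected to score above its shuffled baseline. The claim above is that a stateful one-to-many cipher pushes the ciphertext score down toward that baseline.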
You can share any research you publish; that's not spamming, so don't worry about it :)
Just a very minor comment: could you indicate which transliteration file you used? The IT, ZL and RF files are quite different, and this will show up especially when looking at individual words and comparing words.

Note that the version you extracted (1.7) is the version of the format, not the version of the file. All most recent files are format version 2.0, though the differences between these two can be ignored for most practical purposes.

On a more fundamental point, I am concerned that a "stateful one to many cipher" may be helpful when trying to decode text, but will not be able to explain how any plain text, when encoded using this, will have the properties that Marco Ponzi already pointed out.

This post is too long for me to read in full. I read the conclusions at the end and skimmed through the rest.

I don't know much about statistics, but I'd like to comment on this statement:

Quote:Using the same logic as above, we take two locations for ytoa and compute the probability of yfa to appear in one of remaining 6 slots in either location. 2 * 6 * 2 / (1033 - 1 - 7) = 2.3%. Which is still very low.

I don't know why 2.3% should be regarded as "very low". When one cherry-picks complex patterns from anything that is not totally trivial, it's easy to find patterns that do not occur frequently. By making the patterns more and more specific, probabilities can be made arbitrarily low.

For instance, we can consider this list of random numbers from daily-random:

daily-random Wrote:• 10 NUMBERS FROM 1 TO 500:
133   499   165   144   207   215   491   361   336   400

We can split the 10 numbers into 2 sequences of 5 each:

133   499   165   144   207
215   491   361   336   400

The probability that the last number in both sequences has 0 as its second digit is 1%.

In both sequences, the second number starts with "49". The probability that this happens is (1/100)^2 = 0.01%.

For positions 2, 3 and 5, the numbers in the two lists share the same second digit (9, 6, 0). The probability that this happens 3 or more times is 0.86%.

Plotting the two sequences, one sees that they go up and down together, following the pattern up-down-down-up. The probability that the two follow the same pattern is 1/(2^4)=6.25%. The probability that both show the specific pattern up-down-down-up is the square of that figure, i.e. 0.39%.
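The patterns and probabilities above can be checked in a few lines of Python. As in the post, the digit probabilities assume that every digit is equally likely, and the "49" figure uses the post's own approximation of 1/100 per number.

```python
from math import comb

seq_a = [133, 499, 165, 144, 207]
seq_b = [215, 491, 361, 336, 400]


def second_digit(n):
    return str(n)[1]  # every number here has three digits


def directions(seq):
    # "up" if the next number is larger, otherwise "down" (no ties occur here)
    return ["up" if y > x else "down" for x, y in zip(seq, seq[1:])]


# Last number of each sequence has 0 as its second digit: (1/10)^2.
last_zero = second_digit(seq_a[-1]) == "0" and second_digit(seq_b[-1]) == "0"
p_last_zero = (1 / 10) ** 2

# Second number of each sequence starts with "49": the post's (1/100)^2.
starts_49 = str(seq_a[1]).startswith("49") and str(seq_b[1]).startswith("49")
p_starts_49 = (1 / 100) ** 2

# Positions (1-based) where the two sequences share the second digit,
# and the binomial chance of 3 or more such positions out of 5.
shared = [i for i, (x, y) in enumerate(zip(seq_a, seq_b), start=1)
          if second_digit(x) == second_digit(y)]
p_shared = sum(comb(5, k) * 0.1**k * 0.9**(5 - k) for k in range(3, 6))

# Both sequences rise and fall together, and in the specific up-down-down-up shape.
same_shape = directions(seq_a) == directions(seq_b)
p_same_shape = 1 / 2**4
p_specific_shape = p_same_shape**2

if __name__ == "__main__":
    print(last_zero, f"{p_last_zero:.2%}")   # True 1.00%
    print(starts_49, f"{p_starts_49:.2%}")   # True 0.01%
    print(shared, f"{p_shared:.2%}")         # [2, 3, 5] 0.86%
    print(same_shape, f"{p_same_shape:.2%}", f"{p_specific_shape:.2%}")  # True 6.25% 0.39%
```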

Of course, given any two sets of numbers or words, one can make up numberless patterns of this kind (the longer the sequences and the numbers, the more patterns can be found).
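To illustrate the point about numberless patterns, here is a rough Monte Carlo sketch. It defines a family of 20 patterns, each with about a 1% chance: two given positions both share the second digit, or both share the third digit, across the two sequences. It then counts how often random pairs of 5-number sequences show at least one of them. The pattern family is an assumption chosen for illustration, and the helper names are hypothetical.

```python
import random
from itertools import combinations


def digits(n):
    return f"{n:03d}"  # zero-padded, so 7 becomes "007"


def rare_patterns_found(a, b):
    """Return (i, k, place) for every pair of positions i < k (0-based) where the
    two sequences agree on the digit at `place` at both positions (~1% each)."""
    found = []
    for place in (1, 2):  # second and third digit: uniform for numbers 1..500
        same = [digits(x)[place] == digits(y)[place] for x, y in zip(a, b)]
        for i, k in combinations(range(len(a)), 2):
            if same[i] and same[k]:
                found.append((i, k, place))
    return found


def fraction_with_rare_pattern(trials=20000, seed=1):
    """Share of random sequence pairs showing at least one of the 20 patterns."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.randint(1, 500) for _ in range(5)]
        b = [rng.randint(1, 500) for _ in range(5)]
        if rare_patterns_found(a, b):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(rare_patterns_found([133, 499, 165, 144, 207], [215, 491, 361, 336, 400]))
    print(f"{fraction_with_rare_pattern():.1%}")
```

On the daily-random numbers it finds the three second-digit patterns at positions 2, 3 and 5 (printed 0-based). For numbers 1 to 500 the second and third digits are uniform and independent, so the random-trial fraction should land near 15-16%. Roughly one random pair in six shows some "1%" pattern.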


The post discusses these two sequences of Voynichese words (this is the order in which they appear in the post, which is different from the order in the manuscript):

f67r2: ytoaiin, yfain, opcholdy, ofar.oeoldan, okain.am, okal, dolchsody
f72r3: ytoar.shar, yfary, opalal, oraiiral, oletal, aral, octho
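The shared beginnings of the aligned labels can be listed with `os.path.commonprefix`, which compares any list of strings character by character. This sketch only looks at the two lists as given above, in the post's order.

```python
import os

f67r2 = ["ytoaiin", "yfain", "opcholdy", "ofar.oeoldan", "okain.am", "okal", "dolchsody"]
f72r3 = ["ytoar.shar", "yfary", "opalal", "oraiiral", "oletal", "aral", "octho"]


def shared_prefixes(left, right):
    """Longest common prefix of each aligned pair of labels (character by character)."""
    return [os.path.commonprefix([a, b]) for a, b in zip(left, right)]


if __name__ == "__main__":
    for a, b, prefix in zip(f67r2, f72r3, shared_prefixes(f67r2, f72r3)):
        print(f"{a:14} {b:12} {prefix!r}")
```

This gives ytoa, yfa, op, o, o and two empty prefixes.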

The conclusion is:

Quote:There is 99.42% probability that two sets of 7 labels on f67r2 and f73r3 are textually related to one another (from the fact that their probability to co-occur by chance is only 0.58%).

Some observations:
  • Stating that "textually related" follows from "unlikely to occur by chance" is a non-sequitur. Many non-random patterns have been spotted in the Voynich manuscript, e.g. word structure (Stolfi), line patterns, word repetition, different "dialects" for the different sections (Currier). We know very well that Voynichese is not random and has structure at some level: this does not imply that the text is meaningful.
  • It's not clear why the similarities between the two lists are discussed, but not the differences (e.g. in most cases word lengths are markedly different, in 3 out of seven cases the number of words in the labels is different, suffixes only match in one case).
  • The seven labels from f72r3 are part of a set of 30 labelled stars for the Cancer page. The page is part of a sequence of 10 zodiac signs (one page is missing), each containing 30 labelled stars. Therefore (contrary to f67r2) those seven labels do not look likely to have anything to do with the seven planets but seem more likely to be related with the thirty degrees that make up each zodiac sign (see Panofsky's reference to ms BAV Reg.lat.1283a).
  • The presence of several circle-gallows prefixes is typical of Voynichese labels (where circles are o a y and gallows t k p f). While only about 16% of paragraph words start with such sequences, the rate is 48% for labels. It seems that this wider phenomenon could help in the analysis of the prefixes in the two sets of seven labels.
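Some of the observations above can be checked mechanically on the 14 labels. This sketch assumes that a dot separates words within a label and that "suffix" means the last two characters. The 16% and 48% baselines come from the full transliteration and are not recomputed here.

```python
import re

f67r2 = ["ytoaiin", "yfain", "opcholdy", "ofar.oeoldan", "okain.am", "okal", "dolchsody"]
f72r3 = ["ytoar.shar", "yfary", "opalal", "oraiiral", "oletal", "aral", "octho"]

# Circle (o, a, y) followed by gallows (t, k, p, f) at the start of a label.
CIRCLE_GALLOWS = re.compile(r"[oay][tkpf]")


def word_count(label):
    return len(label.split("."))  # assumes "." separates words within a label


pairs = list(enumerate(zip(f67r2, f72r3), start=1))
different_word_counts = [i for i, (a, b) in pairs if word_count(a) != word_count(b)]
matching_suffixes = [i for i, (a, b) in pairs if a[-2:] == b[-2:]]  # last two characters
circle_gallows = {
    "f67r2": sum(1 for label in f67r2 if CIRCLE_GALLOWS.match(label)),
    "f72r3": sum(1 for label in f72r3 if CIRCLE_GALLOWS.match(label)),
}

if __name__ == "__main__":
    print(different_word_counts, matching_suffixes, circle_gallows)
```

It reports different word counts at positions 1, 4 and 5, a matching final pair of characters only for okal/aral, and circle-gallows starts in 6 of the f67r2 labels against 3 of the f72r3 labels.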
I have basically the same objections, started writing them this morning, gave up because I had other things to do.

MarcoP did it better than I could. Well done!
(22-10-2023, 02:25 AM) ReneZ Wrote: Just a very minor comment: could you indicate which transliteration file you used? The IT, ZL and RF files are quite different, and this will show up especially when looking at individual words and comparing words.

Note that the version you extracted (1.7) is the version of the format, not the version of the file. All most recent files are format version 2.0, though the differences between these two can be ignored for most practical purposes.

There is a problem: I use the file that I downloaded, I think, about two years ago, when I first decided to look at the Voynich manuscript statistics. I assumed that it would be uniquely identifiable from its header.

The header in full reads:

#=IVTFF Eva- 1.7
# ZL transliteration file, updated from EVMT project
# Version 1r of 11/04/2020

Could you help me identify the proper name by which I should refer to this file?
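As ReneZ notes, the "1.7" in the first header line is the IVTFF format version, while the file states its own version and date further down. Here is a minimal sketch that separates the two, assuming the header lines look exactly like the ones quoted above. It is not a full IVTFF parser, and the dictionary keys are hypothetical.

```python
import re

HEADER = [
    "#=IVTFF Eva- 1.7",
    "# ZL transliteration file, updated from EVMT project",
    "# Version 1r of 11/04/2020",
]


def describe_header(lines):
    """Split an IVTFF-style header into format version and file version."""
    info = {}
    for line in lines:
        fmt = re.match(r"#=IVTFF\s+(\S+)\s+(\d+\.\d+)", line)
        if fmt:
            info["alphabet"] = fmt.group(1)        # token after IVTFF, e.g. "Eva-"
            info["format_version"] = fmt.group(2)  # version of the file format
        ver = re.match(r"#\s*Version\s+(\S+)\s+of\s+(\S+)", line)
        if ver:
            info["file_version"] = ver.group(1)    # version of this particular file
            info["file_date"] = ver.group(2)       # kept as a string, not parsed
    return info


if __name__ == "__main__":
    print(describe_header(HEADER))
```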
(22-10-2023, 03:43 PM) MarcoP Wrote:

This post is too long for me to read in full. I read the conclusions at the end and skimmed through the rest.

Thank you for checking it out; I actually didn't expect anyone to read it in full. I wouldn't follow this research closely myself, at least not until some important results appear :)
When answering the questions below, I provide some relevant quotes from the text, just to double-check myself.

Quote:I don't know much about statistics, but I'd like to comment on this statement:

Quote:Using the same logic as above, we take two locations for ytoa and compute the probability of yfa to appear in one of remaining 6 slots in either location. 2 * 6 * 2 / (1033 - 1 - 7) = 2.3%. Which is still very low.

I don't know why 2.3% should be regarded as "very low". When one cherry-picks complex patterns from anything that is not totally trivial, it's easy to find patterns that do not occur frequently. By making the patterns more and more specific, probabilities can be made arbitrarily low.

Relevant quotes from the text: "When assessing how likely certain arrangement is, you always need to consider it against all other arrangements or all other attempts." and "Before we proceed with the calculations, it's very important to note that the choice of f67r2 was not random or a result of any brute-force enumeration. It was a result of tentative identification of planet correspondences that we made earlier. Should our analysis involve comparison of all sequences in the manuscript to all other sequences in the manuscript, we'd have to correct for the number of attempts that we make, but we don't have to do this if we start with a particular sequence in the manuscript."

Cherry-picking refers to a process where you take a large sample of relationships and pick only those that interest you. This cannot apply here: we started from 7 labels in the text that we had tentatively identified before, and then looked for statistically relevant patterns related to these 7 labels specifically in the rest of the manuscript. The methodological reason for picking these 7 labels was that we had tentatively attributed some meaning to them in my past article (we couldn't check for further meaningful connections without attributing some meaning in the first place). Then we computed the probabilities considering all other labels in the text.

That's why 2.3% is the expected probability of this structural feature occurring by random chance, and it definitely looks very low. If we had started by examining the whole manuscript for patterns, as you probably interpreted it, we would have to adjust for the total number of relationships we examined. Instead of 2.3%, we would then have something closer to 1 - (1 - 0.023)^(1033 / 7) = 96.77% probability of finding a relationship like this somewhere by chance, given that there could be ~1033 / 7 blocks of 7 labels in the manuscript. This is definitely not the precise way to compute it, but it is a good upper bound.
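The correction described above can be written out as follows. The formula 1 - (1 - p)^n assumes that the n tests are independent (the Šidák form). The sketch also computes the per-test level that would keep the overall chance at 5% with about 147.6 blocks. The function names are hypothetical.

```python
def family_wise_probability(p_single, n_tests):
    """Chance that at least one of n independent tests fires, each with chance p_single."""
    return 1 - (1 - p_single) ** n_tests


def sidak_threshold(alpha, n_tests):
    """Per-test level that keeps the family-wise chance at alpha (independence assumed)."""
    return 1 - (1 - alpha) ** (1 / n_tests)


p_single = 2 * 6 * 2 / (1033 - 1 - 7)  # the quoted formula, about 0.0234
n_blocks = 1033 / 7                     # about 147.6 blocks of 7 labels

if __name__ == "__main__":
    print(f"{p_single:.1%}")                                  # 2.3%
    print(f"{family_wise_probability(0.023, n_blocks):.2%}")  # 96.77%
    print(f"{sidak_threshold(0.05, n_blocks):.3%}")
```

With 5% as the family-wise target, the per-test threshold comes out around 0.035%.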

Quote:The post discusses these two sequences of Voynichese words (this is the order in which they appear in the post, which is different from the order in the manuscript):

f67r2: ytoaiin, yfain, opcholdy, ofar.oeoldan, okain.am, okal, dolchsody
f72r3: ytoar.shar, yfary, opalal, oraiiral, oletal, aral, octho

The conclusion is:

Quote:There is 99.42% probability that two sets of 7 labels on f67r2 and f73r3 are textually related to one another (from the fact that their probability to co-occur by chance is only 0.58%).

Some observations:
  • Stating that "textually related" follows from "unlikely to occur by chance" is a non-sequitur. Many non-random patterns have been spotted in the Voynich manuscript, e.g. word structure (Stolfi), line patterns, word repetition, different "dialects" for the different sections (Currier). We know very well that Voynichese is not random and has structure at some level: this does not imply that the text is meaningful. 

You probably interpret "textually related" as "meaningfully related". In the article, textual structures are statistically significant regularities in the text, with the text treated as a sequence of characters as defined by the transliteration. No connection to meaning is implied, quoting: "The structure lets you make predictions about the text using the information in the text itself, but when it doesn't relate to any external objects, properties or behaviors, the structure doesn't convey meaning."

Quote:
  • It's not clear why the similarities between the two lists are discussed, but not the differences (e.g. in most cases word lengths are markedly different, in 3 out of seven cases the number of words in the labels is different, suffixes only match in one case).

The differences are discussed, e.g., "If we assume there is a meaningful link between two charts, the imperfect match of labels on f67r2 and f72r3 could imply one of two things: either a one-to-many cipher is used or two sets of labels refer to parallel sets of objects with similar names or in similar but different languages."
My argument rests on the statistical significance of the matching parts; the features of the parts that don't match do not affect it in any way.

Quote:
  • The seven labels from f72r3 are part of a set of 30 labelled stars for the Cancer page. The page is part of a sequence of 10 zodiac signs (one page is missing), each containing 30 labelled stars. Therefore (contrary to f67r2) those seven labels do not look likely to have anything to do with the seven planets but seem more likely to be related with the thirty degrees that make up each zodiac sign (see Panofsky's reference to ms BAV Reg.lat.1283a).

I'm not sure I understand the specific counterpoint in this comment, sorry. Could you perhaps rephrase it? If this is not a critique of any aspect of my text, but just a reference to an alternative interpretation, thank you for the reference!

On a general note, arguing about what things do and do not look like in the Voynich manuscript appears to be of little use, given that very few things in the manuscript have been identified with high certainty in a universally accepted way. As I joked on Twitter, the plant images could very well be schematics for alien machinery and the bathing ladies could represent organic chemistry. I don't think we should give strong weight to any interpretation based just on what things look like, as, in my opinion, such interpretations have been largely fruitless so far when it comes to explaining what the manuscript is and how, why and by whom it was created.

Quote:
  • The presence of several circle-gallows prefixes is typical of Voynichese labels (where circles are o a y and gallows t k p f). While only about 16% of paragraph words start with such sequences, the rate is 48% for labels. It seems that this wider phenomenon could help in the analysis of the prefixes in the two sets of seven labels.

Again, if this is a specific counterpoint, I'm not sure I understand it. The prefix statistics that I compute use the set of all labels, including the features that you mention. The result I get is computed against the full distribution of the actual label prefixes in the text.
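One plausible way to compute a prefix statistic against an empirical distribution is sketched below. This is an assumption, since the exact statistic is not given in this thread. If a fraction q_x of labels begins with prefix x, two labels drawn independently share a prefix with probability Σ q_x². The example uses only the 14 labels quoted earlier, which form a small and biased sample; a real run would use every label in the transliteration.

```python
from collections import Counter

# The 14 labels quoted earlier in the thread (a small, biased sample).
LABELS = ["ytoaiin", "yfain", "opcholdy", "ofar.oeoldan", "okain.am", "okal", "dolchsody",
          "ytoar.shar", "yfary", "opalal", "oraiiral", "oletal", "aral", "octho"]


def chance_prefix_match(labels, n=2):
    """Probability that two labels drawn independently (with replacement) from
    `labels` start with the same n characters: sum over prefixes of q_x squared."""
    counts = Counter(label[:n] for label in labels)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


if __name__ == "__main__":
    print(f"{chance_prefix_match(LABELS):.1%}")  # 11.2%, i.e. 22/196
```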

Thanks again for your comments; I look forward to more, if you have the time and desire.
(22-10-2023, 02:25 AM) ReneZ Wrote: On a more fundamental point, I am concerned that a "stateful one to many cipher" may be helpful when trying to decode text, but will not be able to explain how any plain text, when encoded using this, will have the properties that Marco Ponzi already pointed out.

Sorry, I'm a bit lost here. Which properties are these? I'm not sure whether you refer to any particular post or text, or some general set of ideas.
Let there be no connection to meaning!