The Voynich Ninja
An explanation of the Voynich Manuscript text - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+--- Thread: An explanation of the Voynich Manuscript text (/thread-1812.html)


RE: An explanation of the Voynich Manuscript text - Torsten - 25-04-2017

(25-04-2017, 12:34 AM)DonaldFisk Wrote: These look like coincidences to me. You have about 240 pages of text. Suppose for the sake of argument the words are random. Then you'll find places where you get the same word repeated several times, or in the same line position, or similar words close together. But (since it's random) claiming significance in this is like seeing faces in clouds, or hearing voices in static. People are predisposed to spotting patterns.

It is not just some places. It happens everywhere.

There are two different patterns. One is the vertical pattern described by Schinner, who presented statistical evidence for it in his paper. I have calculated the proportion of identical words appearing near each other for the whole VMS (see [link unavailable]). You have already implemented such a feature for your sample text: nearly all of the lines in your sample text start with [p].

The second pattern is that similar words co-occur throughout the text. This pattern is described by [link unavailable]. I have also calculated the frequencies of repeated words within 20 lines for the whole VMS (see [link unavailable]).
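
As an illustration only (this is not the calculation behind the linked figures), a repeated-word frequency of this kind could be sketched in Python as follows. The function name `repeat_fraction`, the default window of 20 lines, and the tiny sample are assumptions made for the example; a real run would need a full transliteration split into lines.

```python
from typing import List


def repeat_fraction(lines: List[List[str]], window: int = 20) -> float:
    """Fraction of word tokens that also occur in the preceding `window` lines."""
    repeated = 0
    total = 0
    for i, line in enumerate(lines):
        # Collect the words seen in up to `window` lines before line i
        # (the current line itself is excluded).
        recent = set()
        for prev in lines[max(0, i - window):i]:
            recent.update(prev)
        for word in line:
            total += 1
            if word in recent:
                repeated += 1
    return repeated / total if total else 0.0


# Hypothetical three-line sample, not taken from the manuscript.
sample = [
    ["chol", "chor", "daiin"],
    ["chol", "shol", "dy"],
    ["qokeedy", "chor", "chedy"],
]
print(repeat_fraction(sample, window=20))
```

Comparing this fraction against the same text with its word order shuffled would give a rough baseline for how much repetition chance alone produces.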

To some extent the difference between Currier A and Currier B is also based on this pattern (see [link unavailable]). For instance, words using [ed] are very rare in Currier A but common in Currier B. See, for example, the frequencies of the words [cheody], [chedy] and [qokeedy] in the VMS (see [link unavailable]):

Herbal in Currier A [cheody] x 8   [chedy] x 1    [qokeedy] x 0    (word count: 8087)
Pharmaceutical (A)  [cheody] x 18  [chedy] x 1    [qokeedy] x 0    (count: 2529)
Astronomical        [cheody] x 8   [chedy] x 4    [qokeedy] x 0    (count: 2136)
Cosmological        [cheody] x 7   [chedy] x 24   [qokeedy] x 4    (count: 2691)
Herbal in Currier B [cheody] x 7   [chedy] x 62   [qokeedy] x 9    (count: 3233)
Stars (B)           [cheody] x 33  [chedy] x 190  [qokeedy] x 137  (count: 10673)
Biological (B)      [cheody] x 0   [chedy] x 210  [qokeedy] x 153  (count: 6911)

On the other hand, words like [chol] and [chor] are common in Currier A but rare in Currier B:

Herbal in Currier A [chol] x 228   [chor] x 155  (count: 8087)
Pharmaceutical (A)  [chol] x 45    [chor] x 24   (count: 2529)
Astronomical        [chol] x 8     [chor] x 2    (count: 2136)
Cosmological        [chol] x 19    [chor] x 8    (count: 2691)
Herbal in Currier B [chol] x 13    [chor] x 6    (count: 3233)
Stars (B)           [chol] x 62    [chor] x 19   (count: 10673)
Biological (B)      [chol] x 14    [chor] x 1    (count: 6911)

This is certainly not a coincidence. That rare glyph sequences occur together is only one aspect of this pattern.
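
The raw counts above are easier to compare once they are normalized by section length. A minimal Python sketch, using only the counts quoted in this post; the helper name `per_thousand` and the data layout are assumptions for illustration:

```python
# Counts copied from the two tables in this post:
# (section, word count, {word: occurrences}).
SECTIONS = [
    ("Herbal in Currier A", 8087, {"cheody": 8, "chedy": 1, "qokeedy": 0, "chol": 228, "chor": 155}),
    ("Pharmaceutical (A)", 2529, {"cheody": 18, "chedy": 1, "qokeedy": 0, "chol": 45, "chor": 24}),
    ("Astronomical", 2136, {"cheody": 8, "chedy": 4, "qokeedy": 0, "chol": 8, "chor": 2}),
    ("Cosmological", 2691, {"cheody": 7, "chedy": 24, "qokeedy": 4, "chol": 19, "chor": 8}),
    ("Herbal in Currier B", 3233, {"cheody": 7, "chedy": 62, "qokeedy": 9, "chol": 13, "chor": 6}),
    ("Stars (B)", 10673, {"cheody": 33, "chedy": 190, "qokeedy": 137, "chol": 62, "chor": 19}),
    ("Biological (B)", 6911, {"cheody": 0, "chedy": 210, "qokeedy": 153, "chol": 14, "chor": 1}),
]


def per_thousand(count: int, total: int) -> float:
    """Occurrences per 1000 words of the section."""
    return 1000.0 * count / total


for name, total, counts in SECTIONS:
    rates = {word: round(per_thousand(c, total), 1) for word, c in counts.items()}
    print(f"{name:20s} {rates}")
```

For example, [chedy] comes out at roughly 0.1 per 1000 words in Herbal A against roughly 30 per 1000 in Biological (B), while [chol] drops from roughly 28 per 1000 in Herbal A to roughly 2 per 1000 in Biological (B).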


RE: An explanation of the Voynich Manuscript text - DonaldFisk - 25-04-2017

(25-04-2017, 10:41 AM)ReneZ Wrote:
(23-04-2017, 10:08 PM)DonaldFisk Wrote: I carried out this test, and confirmed that words in the manuscript are independent of the previous word (see [link unavailable]). The mean frequency of word pairs was very close to the expected probability (the product of the probabilities of the two words considered separately), but the frequency's variance was very close to the square of the mean (i.e. a Gamma distribution; in my generated text it was a Poisson distribution), suggesting that the mechanism I initially suggested for deciding on transition paths, and only that, was wrong. As far as I know, no one had spotted this before.

I'm sorry but this is completely inconclusive.

Your figure 1 shows a strong deviation from randomness.
Still, also that is not conclusive in the other direction.

What is completely lacking is evidence (metrics) for:
- what should be the statistical behaviour of a meaningful text
- how much variation there would be in this

The analysis is based on the assumption that every word type in the Voynich MS should consistently represent the same word in some plain text. This is not at all certain.
If the Voynich MS text includes null characters, all word combination statistics are completely thrown off.

There was a thread here on voynich.ninja in which there was a strong indication that the repeating sequence statistics in the MS are not that different from those of a known plain text.
(Of course also that is not conclusive for the same reason: "not that different" is not sufficiently defined).

My summary is:
- there are indications that the text could be meaningful
- there are indications that the text could be meaningless
Both are still inconclusive.
Some papers concentrate more on one or the other, but other papers do state both.

There are different kinds of random: Gaussian, Poisson, etc. Figure 2 and Figure 4 show that the distribution mean is the same as you'd get by chance, and that the distribution variance is the square of the mean, i.e. it behaves like a Gamma distribution (one with shape parameter 1, since that is the case where variance equals mean squared).

However, Figure 6 shows that there's at least one glyph pair (final y, initial qo) which occurs roughly twice as often as chance would predict. There must actually be more than one, to prevent the distribution becoming skewed. To avoid complexity, I'd now work on the assumption that the initial glyph always depends on the final glyph of the previous word. That came as a surprise to me, but it doesn't prove meaning, and it might (I haven't checked) explain the Gamma distribution. It can easily be accommodated by my state transition model.
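
A rough way to look for such word-boundary dependencies is to compare how often each (final glyph, initial glyph) pair occurs with what independence would predict. The Python sketch below is an illustration under stated assumptions, not the code behind Figure 6: the name `boundary_ratios` is made up for the example, and it treats every character as one glyph, so a two-character initial such as "qo" is approximated by its first character.

```python
from collections import Counter
from typing import Dict, List, Tuple


def boundary_ratios(words: List[str]) -> Dict[Tuple[str, str], float]:
    """Observed / expected frequency of (last glyph, first glyph) across word gaps.

    A ratio near 1.0 is what independence predicts; a ratio near 2.0 means the
    pair occurs about twice as often as expected. Words must be non-empty.
    """
    pairs = Counter((a[-1], b[0]) for a, b in zip(words, words[1:]))
    n = sum(pairs.values())
    finals: Counter = Counter()
    initials: Counter = Counter()
    for (final, initial), count in pairs.items():
        finals[final] += count
        initials[initial] += count
    # Expected count under independence is finals * initials / n.
    return {
        (final, initial): count * n / (finals[final] * initials[initial])
        for (final, initial), count in pairs.items()
    }


# Hypothetical toy sequence, not manuscript text.
toy = ["ay", "qa", "by", "qb"]
print(boundary_ratios(toy))
```

On a real transliteration the ratios for rare pairs would be noisy, so only pairs with reasonably large counts should be read as evidence of dependence.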

Yes, it's worth checking texts in other languages, to see how their distributions differ from chance, though someone will no doubt point out that I haven't checked Tocharian B.

I think what's needed, as people are divided into three camps (unencrypted, encrypted, meaningless; and even within those camps there's disagreement), is a set of tests agreed in advance and validated by statisticians, which everyone accepts as evidence.

And there has to come a point where you say, "This is enough evidence, I think X is true," or even, "Even though this clearly isn't enough evidence, I think X is probably true," but still be prepared to change your position if new evidence arrives.   That point has, for me, been reached.   I think the text is probably, bordering on almost certainly, meaningless, generated by a stochastic process.   I haven't worked out all the details.    But if someone is ever able to provide a translation which isn't nonsense, I'll change my mind.   So far, they haven't.


RE: An explanation of the Voynich Manuscript text - DonaldFisk - 25-04-2017

(25-04-2017, 02:07 PM)Torsten Wrote:
(25-04-2017, 12:34 AM)DonaldFisk Wrote: These look like coincidences to me. You have about 240 pages of text. Suppose for the sake of argument the words are random. Then you'll find places where you get the same word repeated several times, or in the same line position, or similar words close together. But (since it's random) claiming significance in this is like seeing faces in clouds, or hearing voices in static. People are predisposed to spotting patterns.

It is not just some places. It happens everywhere.

There are two different patterns. One is the vertical pattern described by Schinner, who presented statistical evidence for it in his paper. I have calculated the proportion of identical words appearing near each other for the whole VMS (see [link unavailable]). You have already implemented such a feature for your sample text: nearly all of the lines in your sample text start with [p].

The second pattern is that similar words co-occur throughout the text. This pattern is described by [link unavailable]. I have also calculated the frequencies of repeated words within 20 lines for the whole VMS (see [link unavailable]).

To some extent the difference between Currier A and Currier B is also based on this pattern (see [link unavailable]). For instance, words using [ed] are very rare in Currier A but common in Currier B. See, for example, the frequencies of the words [cheody], [chedy] and [qokeedy] in the VMS (see [link unavailable]):

Herbal in Currier A [cheody] x 8   [chedy] x 1    [qokeedy] x 0    (word count: 8087)
Pharmaceutical (A)  [cheody] x 18  [chedy] x 1    [qokeedy] x 0    (count: 2529)
Astronomical        [cheody] x 8   [chedy] x 4    [qokeedy] x 0    (count: 2136)
Cosmological        [cheody] x 7   [chedy] x 24   [qokeedy] x 4    (count: 2691)
Herbal in Currier B [cheody] x 7   [chedy] x 62   [qokeedy] x 9    (count: 3233)
Stars (B)           [cheody] x 33  [chedy] x 190  [qokeedy] x 137  (count: 10673)
Biological (B)      [cheody] x 0   [chedy] x 210  [qokeedy] x 153  (count: 6911)

On the other hand, words like [chol] and [chor] are common in Currier A but rare in Currier B:

Herbal in Currier A [chol] x 228   [chor] x 155  (count: 8087)
Pharmaceutical (A)  [chol] x 45    [chor] x 24   (count: 2529)
Astronomical        [chol] x 8     [chor] x 2    (count: 2136)
Cosmological        [chol] x 19    [chor] x 8    (count: 2691)
Herbal in Currier B [chol] x 13    [chor] x 6    (count: 3233)
Stars (B)           [chol] x 62    [chor] x 19   (count: 10673)
Biological (B)      [chol] x 14    [chor] x 1    (count: 6911)

This is certainly not a coincidence. That rare glyph sequences occur together is only one aspect of this pattern.

I've explicitly accounted for the observations of Currier, and of Montemurro and Zanette. If you checked, you'd get similar statistics for my generated manuscript.

Your other point, while possibly correct, only notes that I haven't modelled some aspects of the manuscript, specifically paragraph and line breaks. I included a few lines of code to begin new paragraphs on word-initial p or f. I know it's inaccurate, but it isn't really part of my model. I don't have access to the Schinner paper.


RE: An explanation of the Voynich Manuscript text - Diane - 25-04-2017

Thanks to everyone for a lively discussion. Voynich.ninja has allowed real development and exchange of views after years of drought - since the early 2000s, really.

It's fantastic to see this happening.

Given that I'm not a linguist, and not even terribly interested in the written part of this text - there's something which always bothers me about the 'grille/random' argument versus linguistic and cryptological arguments.  It's simply the narrow range in which the issue of 'language' is addressed.

But if something isn't prose, and isn't poetry, it can still be meaningful.  The shorthand idea is one option.

But there are others. 
Whether or not we concede the interpretation which Don Hoffmann attached to each of the glyphs, he did show that they could be interpreted as a mixture of initial letters and numerical values.  I don't know if anyone had even considered something as simple as that: initial letters and numerical values.  Yet we use that sort of language all the time.

Some time ago, just for the heck of it, I asked Julian Bunn if he would print off a page of Voynichese showing each glyph with a different colour value - it isn't impossible that the pictures are talking about one thing and the text another - and among the 'others' I had considered the sort of patterns used in fabrics: carpets, tapestries, woven silks and so on.

Julian said he'd do that, but it hasn't happened yet. Perhaps he talked himself out of doing so much work for a non-language test. :)

I guess my point is that the 'random nonsense' versus 'intelligible prose' never gets far - so why not consider other types of text, including technical ones?


RE: An explanation of the Voynich Manuscript text - Torsten - 25-04-2017

(25-04-2017, 02:42 PM)DonaldFisk Wrote:
(25-04-2017, 02:07 PM)Torsten Wrote:
(25-04-2017, 12:34 AM)DonaldFisk Wrote: These look like coincidences to me.

It is not just some places. It happens everywhere.

I've explicitly accounted for the observations of Currier, and of Montemurro and Zanette.

On the one hand you say that within a section the words in the VMS are used randomly. On the other hand you say that you accept that the VMS changes over time and that you accept the observation that similar words co-occur throughout the script. If both statements are true, it must be possible to demonstrate clear breaks between different sections of the manuscript. Your graphs for the page clusters show that no clear breaks between different sections exist (see [link unavailable]). Therefore both statements can't be true at the same time.


RE: An explanation of the Voynich Manuscript text - DonaldFisk - 25-04-2017

(25-04-2017, 03:05 PM)Torsten Wrote: On the one hand you say that within a section the words in the VMS are used randomly. On the other hand you say that you accept that the VMS changes over time and that you accept the observation that similar words co-occur throughout the script. If both statements are true, it must be possible to demonstrate clear breaks between different sections of the manuscript. Your graphs for the page clusters show that no clear breaks between different sections exist (see [link unavailable]). Therefore both statements can't be true at the same time.

You can't apply propositional logic.   There's a huge amount of modality, probability, and fuzziness involved here.   Things can be possibly true, probably true, or quite true.

Clustering is very much a black art. There are no clear breaks between contiguous clusters, and their differences are small, e.g. for blue and black herbal. I could have drawn the clusters differently, e.g. subdivided them or moved their boundaries slightly. Also, if there's randomness, you're going to get some overlap. What it might help elucidate is the order in which the text was written.

There are places in the text where word frequencies do change abruptly between non-contiguous clusters, e.g. f26r, from green to red.

Similar words co-occur because they're frequent within certain page clusters, e.g. the red herbal (Currier B) pages have a high frequency of words ending in edy.


RE: An explanation of the Voynich Manuscript text - voynichbombe - 26-04-2017

Going back to the deck of cards, what measure for "slowness" of the suggested process did you have in mind, and how exactly did you imagine the execution? A real world (manual) applicability seems rather crucial in our case.

The order of a deck of cards is accepted as sufficiently random for a professional game of 17+4 (twenty-one) after it has been shuffled seven times using a shuffling machine. Humans have many, many different ways of shuffling cards by hand.

People don't know "random" (what: apples, pears?) because they are never looking at anything manmade featuring true randomness. Anyone who even slightly touches on the topic of RNGs must get a glimpse of how hard it is to get one's hands on "good enough" random numbers for computational purposes. The generation process will leave a trace in the output, however faint, which allows for reverse engineering.

Any method involving tangible things is by orders of magnitude more prone to produce telling patterns in the pseudo-random output.

I believe your method(s) could be deduced, given a few good enough hints (and GPUs). This hasn't been the case for any VMs text so far.

While I oppose the argument that overseeing statistical text features would be out of bounds for a (roughly) 15th-century creation (Al-Kindi, complications in diplomatic Renaissance ciphers, etc.), the concept of probability seems a more modern development to me. Generally, I'd like to ask for more linguistic clarity (denoting realm, or scope), because "Randomness is the lack of pattern, coincidence is pattern itself."

Now you have raised the term "writing order" and I guess I'm not the only one looking forward to what you have to offer in this regard.


RE: An explanation of the Voynich Manuscript text - DonaldFisk - 26-04-2017

(26-04-2017, 05:27 AM)voynichbombe Wrote: Going back to the deck of cards, what measure for "slowness" of the suggested process did you have in mind, and how exactly did you imagine the execution? A real world (manual) applicability seems rather crucial in our case.

The order of a deck of cards is accepted as sufficiently random for a professional game of 17+4 (twenty-one) after it has been shuffled seven times using a shuffling machine. Humans have many, many different ways of shuffling cards by hand.

People don't know "random" (what: apples, pears?) because they are never looking at anything manmade featuring true randomness. Anyone who even slightly touches on the topic of RNGs must get a glimpse of how hard it is to get one's hands on "good enough" random numbers for computational purposes. The generation process will leave a trace in the output, however faint, which allows for reverse engineering.

Any method involving tangible things is by orders of magnitude more prone to produce telling patterns in the pseudo-random output.

I believe your method(s) could be deduced, given a few good enough hints (and GPUs). This hasn't been the case for any VMs text so far.

While I oppose the argument that overseeing statistical text features would be out of bounds for a (roughly) 15th-century creation (Al-Kindi, complications in diplomatic Renaissance ciphers, etc.), the concept of probability seems a more modern development to me. Generally, I'd like to ask for more linguistic clarity (denoting realm, or scope), because "Randomness is the lack of pattern, coincidence is pattern itself."

Now you have raised the term "writing order" and I guess I'm not the only one looking forward to what you have to offer in this regard.

I'm not particularly attached to any specific implementation, and it's unclear which random distribution I'm seeing here, now that it seems the first glyph of words depends on the last glyph of the previous word, something I didn't know about and so didn't take into account.   If the input's Poisson, it rules out cards.

Regarding determining the writing order, the inputs are the PCA of the pages, the illustrations, the handwriting, and what's inferable about the original binding.   I'm not particularly attached to the page clusters I derived, but I think they're good enough to test the general method.   Incidentally, I'm surprised only one other person (Sarah Goslee) seems to have done Principal Component Analysis, which I only recently discovered because she didn't call it that.   I get the impression that very few people are using computing power to its full potential.

I'm not sure how much more effort it's worth my putting in. I can justify the effort to date: I now have a half-decent statistical library for my Lisp dialect, and have made various other improvements and bug fixes. It probably depends now on my writing code for another purpose which can then be applied to the Voynich Manuscript. Most people don't care or are at best agnostic, and the few who do care almost all disagree, understandably if they've spent a lot of time and got nowhere, or are attached to a particular theory of their own.


RE: An explanation of the Voynich Manuscript text - Torsten - 26-04-2017

(25-04-2017, 03:44 PM)DonaldFisk Wrote: You can't apply propositional logic. There's a huge amount of modality, probability, and fuzziness involved here. Things can be possibly true, probably true, or quite true.

Clustering is very much a black art. There are no clear breaks between contiguous clusters, and their differences are small, e.g. for blue and black herbal. I could have drawn the clusters differently, e.g. subdivided them or moved their boundaries slightly. Also, if there's randomness, you're going to get some overlap. What it might help elucidate is the order in which the text was written.

There are places in the text where word frequencies do change abruptly between non-contiguous clusters, e.g. f26r, from green to red.

Similar words co-occur because they're frequent within certain page clusters, e.g. the red herbal (Currier B) pages have a high frequency of words ending in edy.

This is exactly my point. The parameters used for clustering are subjective. Other clusterings are also possible. There are notable differences between two clusters. This means another interesting clustering would be to use a cluster for each quire or for each page of the VM. This way you would get smaller differences between two contiguous clusters. If I understand you correctly, you use one state transition table to simulate each cluster. This means the more transition tables you use, the more detailed your simulation is.

This would be no problem if you argued that you use transition tables to simulate your results for the VMS. But since you claim that the text of the VMS was generated using your transition tables, I see two problems.

The first problem is that you explain the differences within a cluster away as random. This is problematic since the details you see depend only on the number of clusters you use.

The second problem is that you use different transition tables for each cluster segment. As far as I can see, you didn't give an explanation for the changes between two transition tables. One explanation I see is that an unknown text generation method was used for the VMS and that the output of this unknown method changes over time. But this would mean that your state transition tables are only a way to simulate some of the output of this unknown text generation method. Your state transition tables could then be very useful for learning something about this text generation method, but they would be only a simulation, not the unknown method itself.


RE: An explanation of the Voynich Manuscript text - DonaldFisk - 26-04-2017

(26-04-2017, 10:53 AM)Torsten Wrote:
(25-04-2017, 03:44 PM)DonaldFisk Wrote: Clustering is very much a black art. There are no clear breaks between contiguous clusters, and their differences are small, e.g. for blue and black herbal. I could have drawn the clusters differently, e.g. subdivided them or moved their boundaries slightly. Also, if there's randomness, you're going to get some overlap. What it might help elucidate is the order in which the text was written.

There are places in the text where word frequencies do change abruptly between non-contiguous clusters, e.g. f26r, from green to red.

Similar words co-occur because they're frequent within certain page clusters, e.g. the red herbal (Currier B) pages have a high frequency of words ending in edy.

This is exactly my point. The parameters used for clustering are subjective. Other clusterings are also possible. There are notable differences between two clusters. This means another interesting clustering would be to use a cluster for each quire or for each page of the VM. This way you would get smaller differences between two contiguous clusters. If I understand you correctly, you use one state transition table to simulate each cluster. This means the more transition tables you use, the more detailed your simulation is.

This would be no problem if you argued that you use transition tables to simulate your results for the VMS. But since you claim that the text of the VMS was generated using your transition tables, I see two problems.

The first problem is that you explain the differences within a cluster away as random. This is problematic since the details you see depend only on the number of clusters you use.

The second problem is that you use different transition tables for each cluster segment. As far as I can see, you didn't give an explanation for the changes between two transition tables. One explanation I see is that an unknown text generation method was used for the VMS and that the output of this unknown method changes over time. But this would mean that your state transition tables are only a way to simulate some of the output of this unknown text generation method. Your state transition tables could then be very useful for learning something about this text generation method, but they would be only a simulation, not the unknown method itself.

I now actually think we're close to agreeing on this. The clusters are to some extent arbitrary. The PCA plots for the pages on [link unavailable] are not arbitrary (which is why Sarah Goslee's plot is nearly identical to mine except for its orientation, which doesn't matter), and indicate how similar any two pages are. This can then be compared with the bindings and the handwriting.

Also, if there's a method which is mathematically equivalent to the state machine I propose, but doesn't explicitly use it, I'm happy with that. The important thing is that the state transition table is a grammar for the words, and the ones on [link unavailable] model almost 90% of the words, and I'm sure that can be improved upon.
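
To make the idea of a transition table acting as a word grammar concrete, here is a minimal Python sketch. The table below is a hypothetical toy invented for this example and is not one of the tables on the linked pages; the state names and the functions `generate`, `accepts` and `coverage` are likewise assumptions. It shows the two uses discussed in this thread: generating words by a random walk, and measuring what fraction of a word list the table can produce.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy table, invented for this example; it is NOT the table
# from the linked pages. Each state maps to (glyphs emitted, next state).
TABLE: Dict[str, List[Tuple[str, str]]] = {
    "START": [("qok", "E"), ("ch", "E"), ("sh", "E")],
    "E": [("e", "E"), ("o", "O"), ("dy", "END")],
    "O": [("l", "END"), ("r", "END"), ("dy", "END")],
}


def generate(rng: random.Random) -> str:
    """Random walk from START to END, concatenating the emitted glyphs."""
    state, word = "START", ""
    while state != "END":
        glyphs, state = rng.choice(TABLE[state])
        word += glyphs
    return word


def accepts(word: str, state: str = "START") -> bool:
    """True if some path from `state` to END emits exactly `word`."""
    if state == "END":
        return word == ""
    # Every transition consumes at least one character, so this terminates.
    return any(
        word.startswith(glyphs) and accepts(word[len(glyphs):], nxt)
        for glyphs, nxt in TABLE[state]
    )


def coverage(words: List[str]) -> float:
    """Fraction of the given words that the table can produce."""
    return sum(accepts(w) for w in words) / len(words) if words else 0.0


print(generate(random.Random(0)))
print(coverage(["chedy", "chol", "daiin", "chor"]))
```

With one such table per page cluster, the coverage figure could be computed cluster by cluster, which would also show how much the result depends on where the cluster boundaries are drawn.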

In summary, I think my analysis stands up, and adds to what was already known, even if you disagree with my conclusions.