Torsten > 25-04-2017, 02:07 PM
(25-04-2017, 12:34 AM)DonaldFisk Wrote: These look like coincidences to me. You have about 240 pages of text. Suppose for the sake of argument the words are random. Then you'll find places where you get the same word repeated several times, or in the same line position, or similar words close together. But (since it's random) claiming significance in this is like seeing faces in clouds, or hearing voices in static. People are predisposed to spotting patterns.
DonaldFisk > 25-04-2017, 02:26 PM
(25-04-2017, 10:41 AM)ReneZ Wrote: (23-04-2017, 10:08 PM)DonaldFisk Wrote: I carried out this test, and confirmed that words in the manuscript are independent of the previous word (see [link not preserved]). The mean frequency of word pairs was very close to the expected probability (the product of the probabilities of the two words considered separately), but the frequency's variance was very close to the square of the mean (i.e. a Gamma distribution; in my generated text it was a Poisson distribution), suggesting that the mechanism I initially suggested for deciding on transition paths, and only that, was wrong. As far as I know, no one had spotted this before.
I'm sorry but this is completely inconclusive.
Your figure 1 shows a strong deviation from randomness.
Still, also that is not conclusive in the other direction.
What is completely lacking is evidence (metrics) for:
- what should be the statistical behaviour of a meaningful text
- how much variation there would be in this
The analysis is based on the assumption that every word type in the Voynich MS should consistently represent the same word in some plain text. This is not at all certain.
If the Voynich MS text includes null characters, all word combination statistics are completely thrown off.
There was a thread here on voynich.ninja with a strong indication that the repeating sequence statistics in the MS are not that different from those of a known plain text.
(Of course also that is not conclusive for the same reason: "not that different" is not sufficiently defined).
My summary is:
- there are indications that the text could be meaningful
- there are indications that the text could be meaningless
Both are still inconclusive.
Some papers concentrate more on one or the other, but other papers do state both.
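For readers who want to try the word-pair test quoted at the top of this post, here is a minimal sketch of the comparison it describes: observed counts of adjacent word pairs against the counts expected if each word were independent of the previous one. The function name and the toy word list are assumptions for illustration, not the code or data actually used in this thread. For reference, a Poisson count has a variance equal to its mean, which is the benchmark the quoted post is comparing against.

```python
from collections import Counter

def pair_independence(words):
    """Map each adjacent word pair to (observed count, expected count under independence)."""
    total = len(words)
    n_pairs = total - 1
    unigram = Counter(words)
    bigram = Counter(zip(words, words[1:]))
    rows = {}
    for (a, b), observed in bigram.items():
        # Under independence, P(a, b) = P(a) * P(b); scale by the number of pairs.
        expected = (unigram[a] / total) * (unigram[b] / total) * n_pairs
        rows[(a, b)] = (observed, expected)
    return rows

# Toy input (made up); a real run would use a full transcription.
sample = "daiin chol daiin chor chol daiin chol chor".split()
table = pair_independence(sample)
for pair, (observed, expected) in sorted(table.items()):
    print(pair, observed, round(expected, 3))
```

This only produces the raw comparison. Deciding how much deviation a meaningful text would show is exactly the missing metric discussed above.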
DonaldFisk > 25-04-2017, 02:42 PM
(25-04-2017, 02:07 PM)Torsten Wrote: I've explicitly accounted for the observations of Currier, and of Montemurro and Zanette. If you checked, you'd get similar statistics for my generated manuscript. (25-04-2017, 12:34 AM)DonaldFisk Wrote: These look like coincidences to me. You have about 240 pages of text. Suppose for the sake of argument the words are random. Then you'll find places where you get the same word repeated several times, or in the same line position, or similar words close together. But (since it's random) claiming significance in this is like seeing faces in clouds, or hearing voices in static. People are predisposed to spotting patterns.
There are not some places. It happens everywhere.
There are two different patterns. One pattern is the vertical pattern described by Schinner. In his paper Schinner has presented statistical evidence. I have calculated the proportion of identical words appearing near each other for the whole VMS (see [link not preserved]). You have already implemented such a feature for your sample text. Nearly all of the lines in your sample text start with [p].
The second pattern is that similar words co-occur throughout the text. This pattern is described by [link not preserved]. I have also calculated the frequencies for repeated words within 20 lines for the whole VMS (see [link not preserved]).
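A count of this kind can be sketched as follows. This is only an illustration of the idea: the function name, the toy lines, and the choice to look only at preceding lines (not at the same line) and only at exact matches are assumptions, and the original calculation may have been defined differently.

```python
def repeat_rate(lines, window=20):
    """Fraction of word tokens that also occur in the preceding `window` lines."""
    repeated = 0
    total = 0
    for i, line in enumerate(lines):
        # Collect every word from the previous `window` lines.
        seen = set()
        for prev in lines[max(0, i - window):i]:
            seen.update(prev)
        for word in line:
            total += 1
            if word in seen:
                repeated += 1
    return repeated / total if total else 0.0

# Toy input (made up): each inner list is one line of text.
lines = [["chol", "daiin"], ["chor", "chol"], ["shedy", "daiin"]]
print(repeat_rate(lines, window=1))
print(repeat_rate(lines, window=20))
```

Running the same function on a shuffled copy of the text gives a baseline for how many repeats chance alone would produce.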
In a way, the difference between Currier A and Currier B is also based on this pattern (see [link not preserved]). For instance, words using [ed] are very rare in Currier A but common in Currier B. Compare the frequencies for the words [cheody], [chedy] and [qokeedy] in the VMS (see [link not preserved]):
Section               [cheody]  [chedy]  [qokeedy]  Word count
Herbal in Currier A          8        1          0        8087
Pharmaceutical (A)          18        1          0        2529
Astronomical                 8        4          0        2136
Cosmological                 7       24          4        2691
Herbal in Currier B          7       62          9        3233
Stars (B)                   33      190        137       10673
Biological (B)               0      210        153        6911
On the other hand, words like [chol] and [chor] are common in Currier A but rare in Currier B:
Section               [chol]  [chor]  Word count
Herbal in Currier A      228     155        8087
Pharmaceutical (A)        45      24        2529
Astronomical               8       2        2136
Cosmological              19       8        2691
Herbal in Currier B       13       6        3233
Stars (B)                 62      19       10673
Biological (B)            14       1        6911
This is certainly not a coincidence. That rare glyph sequences occur together is only one aspect of this pattern.
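Because the sections differ a lot in length, the raw counts are easier to compare as rates. The sketch below normalises the [chedy] counts listed above to occurrences per 1000 words. The counts and section word counts are taken from this post; the shortened section labels and variable names are just for illustration.

```python
# (count of [chedy], section word count), copied from the figures in this post.
sections = {
    "Herbal A": (1, 8087),
    "Pharmaceutical A": (1, 2529),
    "Astronomical": (4, 2136),
    "Cosmological": (24, 2691),
    "Herbal B": (62, 3233),
    "Stars B": (190, 10673),
    "Biological B": (210, 6911),
}

# Occurrences per 1000 words, so sections of different length are comparable.
per_thousand = {
    name: 1000 * count / words for name, (count, words) in sections.items()
}

for name, rate in per_thousand.items():
    print(f"{name:18s} {rate:6.2f}")
```

On these figures [chedy] goes from about 0.12 per thousand words in Herbal A to about 30 per thousand in Biological B.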
Diane > 25-04-2017, 03:01 PM
Torsten > 25-04-2017, 03:05 PM
(25-04-2017, 02:42 PM)DonaldFisk Wrote: (25-04-2017, 02:07 PM)Torsten Wrote: (25-04-2017, 12:34 AM)DonaldFisk Wrote: These look like coincidences to me.
There are not some places. It happens everywhere.
I've explicitly accounted for the observations of Currier, and of Montemurro and Zanette.
DonaldFisk > 25-04-2017, 03:44 PM
(25-04-2017, 03:05 PM)Torsten Wrote: On the one hand you say that within a section the words in the VMS are used randomly. On the other hand you say that you accept that the VMS changes over time and that you accept the observation that similar words co-occur throughout the script. If both statements are true, it must be possible to demonstrate clear breaks between different sections of the manuscript. Your graphs for page clusters show that no clear breaks between different sections exist (see [link not preserved]). Therefore both statements can't be true at the same time.
voynichbombe > 26-04-2017, 05:27 AM
DonaldFisk > 26-04-2017, 10:41 AM
(26-04-2017, 05:27 AM)voynichbombe Wrote: Going back to the deck of cards, what measure for "slowness" of the suggested process did you have in mind, and how exactly did you imagine the execution? A real world (manual) applicability seems rather crucial in our case.
The order of a deck of cards is accepted as sufficiently random for professional play of 17+4 after it has been shuffled seven times using a shuffling machine. Humans have many, many different ways of shuffling cards by hand.
People don't know "random" (what: apples, pears?) because they are never looking at anything man-made featuring true randomness. Anyone who even slightly touches on the topic of RNGs must get a glimpse of how hard it is to get hold of "good enough" random numbers for computational purposes. The generation process will leave a trace in the output, however faint, which allows for reverse engineering.
Any method involving tangible things is orders of magnitude more prone to producing telling patterns in the pseudo-random output.
I believe your method(s) could be deduced, given a few good enough hints (and GPUs). This hasn't been the case for any VMs text so far.
While I oppose the argument that overseeing statistical text features would be out of bounds for a XV.(±) creation (Al-Kindi, complications in diplomatic renaissance ciphers etc.), the concept of probability seems a more modern development to me. Generally, I'd like to ask for more linguistic clarity (denoting realm, or scope), because "Randomness is the lack of pattern, coincidence is pattern itself."
Now you have raised the term "writing order" and I guess I'm not the only one looking forward to what you have to offer in this regard.
Torsten > 26-04-2017, 10:53 AM
(25-04-2017, 03:44 PM)DonaldFisk Wrote: You can't apply propositional logic. There's a huge amount of modality, probability, and fuzziness involved here. Things can be possibly true, probably true, or quite true.
Clustering is very much a black art. There are no clear breaks between contiguous clusters, and their differences are small, e.g. for blue and black herbal. I could have drawn the clusters differently, e.g. subdivided them or moved their boundaries slightly. Also, if there's randomness, you're going to get some overlap. What it might help elucidate is the order in which the text was written.
There are places in the text where word frequencies do change abruptly between non-contiguous clusters, e.g. f26r, from green to red.
Similar words co-occur because they're frequent within certain page clusters, e.g. the red herbal (Currier B) pages have a high frequency of words ending in edy.
DonaldFisk > 26-04-2017, 11:18 AM
(26-04-2017, 10:53 AM)Torsten Wrote: (25-04-2017, 03:44 PM)DonaldFisk Wrote: Clustering is very much a black art. There are no clear breaks between contiguous clusters, and their differences are small, e.g. for blue and black herbal. I could have drawn the clusters differently, e.g. subdivided them or moved their boundaries slightly. Also, if there's randomness, you're going to get some overlap. What it might help elucidate is the order in which the text was written.
There are places in the text where word frequencies do change abruptly between non-contiguous clusters, e.g. f26r, from green to red.
Similar words co-occur because they're frequent within certain page clusters, e.g. the red herbal (Currier B) pages have a high frequency of words ending in edy.
This is exactly my point. The parameters used for clustering are subjective. Other clusterings are also possible. There are notable differences between two clusters. This means another interesting clustering would be to use a cluster for each quire or for each page of the VMS. This way you would get smaller differences between two contiguous clusters. If I understand you correctly, you use one state transition table to simulate each cluster. This means that the more transition tables you use, the more detailed your simulation becomes.
This would be no problem if you argued that you use transition tables to simulate your results for the VMS. But since you claim that the text of the VMS was generated using your transition tables, I see two problems.
The first problem is that you explain the differences within a cluster away as random. This is problematic since the details you see only depend on the number of clusters you use.
The second problem is that you use different transition tables for each cluster segment. As far as I can see, you didn't give an explanation for the changes between two transition tables. One explanation I see is that an unknown text generation method was used for the VMS and that the output of this unknown method changes over time. But this would mean that your state transition tables are only a way to simulate some of the output of this unknown text generation method. Your state transition tables could then be very useful for learning something about this text generation method, but they would be only a simulation, not the unknown method itself.
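To make the idea of one transition table per cluster concrete, here is a minimal sketch of how such tables could drive word generation. The tables, state names, glyph groups, and weights below are all invented for illustration. They are not the actual tables discussed in this thread, which are not reproduced here.

```python
import random

# Hypothetical tables: state -> list of (glyph group, next state, weight).
# A next state of None ends the word.
TABLE_A = {
    "start": [("ch", "mid", 3), ("d", "mid", 1)],
    "mid": [("o", "end", 3), ("ai", "end", 1)],
    "end": [("l", None, 2), ("r", None, 1), ("in", None, 1)],
}
TABLE_B = {
    "start": [("ch", "end", 2), ("qoke", "end", 1)],
    "end": [("edy", None, 3), ("eody", None, 1)],
}

def generate_word(table, rng):
    """Walk the table from 'start', appending glyph groups until a path ends."""
    state, word = "start", ""
    while state is not None:
        options = table[state]
        weights = [weight for _, _, weight in options]
        glyphs, state, _ = rng.choices(options, weights=weights, k=1)[0]
        word += glyphs
    return word

rng = random.Random(0)  # fixed seed so the run is repeatable
words_a = [generate_word(TABLE_A, rng) for _ in range(5)]
words_b = [generate_word(TABLE_B, rng) for _ in range(5)]
print(words_a)
print(words_b)
```

Switching from TABLE_A to TABLE_B changes the word statistics abruptly. A gradual drift between clusters would need either many tables or some rule for how a table changes, which is the explanation being asked for here.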