doranchak > 31-03-2018, 08:04 PM
MarcoP > 05-04-2018, 08:23 PM
(31-03-2018, 08:04 PM)doranchak Wrote: I am very curious about the general case of sequences of n "vords".
For example, instead of considering only patterns of the type XYXY, consider any sequence ABCD. In other words, consider every combination of 4 vords.
For each combination, compute the expected number of occurrences based on your probability calculations. Then compare to the actual count, and sort the list of combinations in descending order of the difference between actual and expected. Which combinations are unusually repetitive (or unusually non-repetitive, i.e. phobic of specific combinations) and are thus statistically significant compared to random distributions of vords?
I suspect this test might have already been performed - perhaps someone can point me to existing research on this.
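For anyone who wants to try the comparison described in the quote above, here is a minimal sketch in Python. It assumes the transcription has already been split into a flat list of vords, and it uses the simplest possible null model: each vord is drawn independently with probability equal to its relative frequency. The function names (`ngrams`, `rank_ngrams`) are made up for this illustration and do not come from any existing Voynich tool.

```python
from collections import Counter


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return list(zip(*(words[i:] for i in range(n))))


def rank_ngrams(words, n=4):
    """Rank observed n-word sequences by (actual - expected) count.

    Assumption: the expected count comes from an independence model,
    i.e. the product of the individual word frequencies multiplied by
    the number of positions where an n-word sequence can start.
    """
    total = len(words)
    freq = Counter(words)
    observed = Counter(ngrams(words, n))
    positions = total - n + 1

    rows = []
    for gram, actual in observed.items():
        p = 1.0
        for w in gram:
            p *= freq[w] / total
        expected = p * positions
        rows.append((gram, actual, expected, actual - expected))

    # Largest surplus over expectation first.
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows


# Toy input, invented for illustration only (not real manuscript text).
toy = "a b a b a b c".split()
for gram, actual, expected, diff in rank_ngrams(toy, n=2):
    print(gram, actual, round(expected, 3), round(diff, 3))
```

Note that this only ranks combinations that actually occur at least once. To find "phobic" combinations (high expected count, zero occurrences) you would also have to enumerate candidate combinations of the most frequent vords and compute their expected counts separately.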
Koen G > 05-04-2018, 08:51 PM
MarcoP > 05-04-2018, 10:22 PM
(05-04-2018, 08:51 PM)Koen Gh. Wrote: Interesting that there is so little repetition of three and four word sequences. Is it easy to run a test against a corpus of similar size? To give it a fair shot it shouldn't be something with fixed formulas like Homer or the Bible. And the language shouldn't be too analytic like modern English. For example "in the name of the" is a sequence of five, compared to Latin "in nomine", two. So my guess is that Latin prose may be a fair shot, though I'd still expect more repetition of sequences.
Koen G > 06-04-2018, 04:48 AM
MarcoP > 06-04-2018, 11:23 AM
doranchak > 06-04-2018, 11:32 AM
(05-04-2018, 08:23 PM)MarcoP Wrote: Hello Doranchak,
That is the kind of work I enjoy doing. My main focus of research is the Zodiac ciphers and I spent a lot of time generating statistics, often by randomizing a ciphertext and using the randomizations to estimate significance of certain observations. My curiosities around Voynich follow a similar pattern of thinking about how the various qualities of Voynichese compare to randomizations of the text. The "phobia" of 3- and 4-word sequence repetitions seems very significant because I would expect shuffles of the text to produce many more than are observed. It would also be very interesting to me to see if the 3- or 4-word repetitions are actually happening over certain distances (i.e., a pattern of words such as ABC might not repeat, but a pattern such as A??BC might, where "?" is a wildcard representing any other Voynich word).
it is surprising how little statistical research has been done on the manuscript. There is so much to discover!
You can find some information about repeating word sequences in a post by Julian Bunn.
Apparently, there is a single repeating 4-word sequence:
ol shedy qokedy qokeedy
which occurs at f75v.P2.21 and f84r.P.10
There are a few 3-word sequences, but the most promising area for an extensive analysis is two-word sequences: they seem to be numerous enough to produce meaningful statistics.
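Counts like these are easy to reproduce once you have the text as a list of words. The sketch below (Python) collects every n-word sequence that occurs more than once, together with the word positions where it starts. The function name `repeated_sequences` and the toy word list are assumptions for illustration; mapping a word index back to a locus such as f75v.P2.21 would need the transcription's own line labels, which are not handled here.

```python
from collections import defaultdict


def repeated_sequences(words, n):
    """Map each n-word sequence that occurs more than once to the list
    of word indices at which it starts."""
    where = defaultdict(list)
    for i in range(len(words) - n + 1):
        where[tuple(words[i:i + n])].append(i)
    return {gram: pos for gram, pos in where.items() if len(pos) > 1}


# Toy word list built around the sequence quoted above; the surrounding
# arrangement is invented and is not an excerpt from the manuscript.
toy = "ol shedy qokedy qokeedy daiin ol shedy qokedy qokeedy".split()

for n in (4, 3, 2):
    hits = repeated_sequences(toy, n)
    print(n, "word sequences repeated:", len(hits))
```

Running the same function with n = 2, 3 and 4 on a full transcription gives the raw numbers that any expected-versus-actual comparison would start from.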
MarcoP > 06-04-2018, 12:16 PM
(06-04-2018, 11:32 AM)doranchak Wrote: That is the kind of work I enjoy doing. My main focus of research is the Zodiac ciphers and I spent a lot of time generating statistics, often by randomizing a ciphertext and using the randomizations to estimate significance of certain observations. My curiosities around Voynich follow a similar pattern of thinking about how the various qualities of Voynichese compare to randomizations of the text. The "phobia" of 3- and 4-word sequence repetitions seems very significant because I would expect shuffles of the text to produce many more than are observed. It would also be very interesting to me to see if the 3- or 4-word repetitions are actually happening over certain distances (i.e., a pattern of words such as ABC might not repeat, but a pattern such as A??BC might, where "?" is a wildcard representing any other Voynich word).
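The shuffle comparison and the wildcard idea in the quote above can be sketched together. This is only one possible reading of the approach, not doranchak's actual code. A pattern is described by word offsets, so ABC is `(0, 1, 2)` and A??BC is `(0, 3, 4)`. The names `count_repeats` and `shuffle_test`, and the choice of "number of distinct patterns that occur more than once" as the statistic, are assumptions made for this example.

```python
import random
from collections import Counter


def count_repeats(words, offsets):
    """Count distinct patterns that occur more than once.

    offsets picks which positions of a window are compared, e.g.
    (0, 1, 2) for ABC or (0, 3, 4) for A??BC (two wildcard words).
    """
    span = offsets[-1] + 1
    counts = Counter(
        tuple(words[i + o] for o in offsets)
        for i in range(len(words) - span + 1)
    )
    return sum(1 for c in counts.values() if c > 1)


def shuffle_test(words, offsets, trials=1000, seed=0):
    """Compare the actual repeat count with counts from shuffled copies.

    Returns (actual, mean of shuffles, fraction of shuffles with a count
    <= actual, fraction of shuffles with a count >= actual).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    actual = count_repeats(words, offsets)
    pool = list(words)
    shuffled = []
    for _ in range(trials):
        rng.shuffle(pool)
        shuffled.append(count_repeats(pool, offsets))
    mean = sum(shuffled) / trials
    at_most = sum(1 for c in shuffled if c <= actual) / trials
    at_least = sum(1 for c in shuffled if c >= actual) / trials
    return actual, mean, at_most, at_least


# Toy input, invented for illustration only.
toy = "a x y b c a p q b c".split()
print(shuffle_test(toy, (0, 1, 2), trials=200))  # contiguous ABC
print(shuffle_test(toy, (0, 3, 4), trials=200))  # gapped A??BC
```

A small "at most" fraction would suggest the text has fewer repeats than its shuffles, and a small "at least" fraction would suggest more. A full shuffle of all words destroys line, paragraph and page structure, so shuffling within pages or sections might be a fairer baseline.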