The Voynich Ninja
[Article] "The Strange Quest to Crack the Voynich Code"




RE: "The Strange Quest to Crack the Voynich Code" - Aga Tentakulus - 17-02-2020

(17-02-2020, 09:55 PM)nickpelling Wrote: And there's a whole lot of Voynich toilet paper out there. :-(
But there's also a lot of Voynich shit :-D


RE: "The Strange Quest to Crack the Voynich Code" - MarcoP - 18-02-2020

(17-02-2020, 07:45 PM)Alin_J Wrote: Did they assume that the variances (sigmas) for the other languages were the same as for the two languages of which they had measured many texts, English and Portuguese? And, furthermore, that they chose the smaller of the two variances (English vs. Portuguese texts)? Am I understanding this correctly?

Then, as I understand it, compatibility was assessed by integrating the upper or lower tail, beyond the measured value, of the interpolated distribution obtained from these sigmas? If that integral was below 0.05, the value was judged incompatible, because it was too improbable.

I don't know about you, but to me this seems a bit bold: assuming the same sigma, taken from the lower of only two measurements (languages). Not that I think this would much affect the overall conclusions of the study, though.
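Alin_J's reading of the procedure can be written down in a few lines. This is a minimal sketch under assumptions the thread does not confirm: the measure is treated as normally distributed around an expected (interpolated) value, one shared sigma (the smaller of the English and Portuguese values) is reused for every language, and a value counts as compatible when the tail probability is at least 0.05. Whether the paper used one tail or both is not settled here, so `two_sided` is a switch. The function names and numbers are hypothetical.

```python
import math


def tail_probability(measured: float, mean: float, sigma: float,
                     two_sided: bool = False) -> float:
    """Mass of a normal N(mean, sigma**2) lying beyond `measured`.

    One-sided: the tail on the side where `measured` falls (upper tail if
    measured >= mean, lower tail otherwise). Two-sided: both tails.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = abs(measured - mean) / sigma
    # For a standard normal Z: P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    one_tail = 0.5 * math.erfc(z / math.sqrt(2))
    return 2 * one_tail if two_sided else one_tail


def shared_sigma(reference_sigmas: dict) -> float:
    """Assumption under discussion: the smaller reference-language sigma is
    reused for every other language."""
    return min(reference_sigmas.values())


def is_compatible(measured, mean, sigma, alpha=0.05, two_sided=False):
    """'Compatible' unless the tail probability drops below alpha."""
    return tail_probability(measured, mean, sigma, two_sided) >= alpha


# Hypothetical numbers, for illustration only (not taken from the paper).
sigma = shared_sigma({"English": 0.12, "Portuguese": 0.10})
print(sigma)                           # 0.1
print(is_compatible(1.1, 1.0, sigma))  # z = 1, tail ~0.159 -> True
print(is_compatible(1.3, 1.0, sigma))  # z = 3, tail ~0.0013 -> False
```

Taking the minimum makes the shared sigma as narrow as possible, which makes "incompatible" verdicts easier to reach; that is the boldness Alin_J is pointing at.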

Hi Jonas,
Sadly, I don't understand enough of all this to contribute a meaningful opinion.

Another thing that strikes me is that one of the measures discussed by Amancio et al. is the Clustering Coefficient "C", something that was recently discussed in another thread. If I understand correctly, they claim that C is not significant for telling an original text from its shuffled version (while C*, computed only over the 50 most frequent words, is).

Quote:Distinguishing Books from Shuffled Sequences
[...]
[measures that] do not fully satisfy  ζ1 [...] cannot be used to distinguishing a manuscript [sic] from its shuffled version.
[...]
When we performed the informativeness analysis over the most frequent words, we found that ζ1 is satisfied for the clustering coefficient and for the shortest paths (note that C* and L* are informative while C and L are not).

The paper discussed in the other thread (Cardenas et al. 2015) uses an almost identical approach in its analysis of the VMS: it compares the Clustering Coefficient of the text with that of a randomly scrambled ordering of the text (which the authors call a "cyphered" text). Interestingly, Amancio et al. is not referenced: it appears that Cardenas et al. were not aware of this earlier work. Contrary to what Amancio et al. found, the graph Cardenas et al. provide (Fig. 5, top) suggests that C is quite effective in separating a text from its shuffled versions.
Another puzzling thing: Amancio et al. measured a lower C (and C*) for the original text than for its scrambled versions: their normalized values are < 1. Cardenas' plot shows that C is higher for the original than for the scrambled versions. For what it's worth, when I tried replicating Cardenas' experiments I found something similar to what Amancio et al. reported.

(31-01-2020, 07:02 PM)MarcoP Wrote: My preliminary attempts (using the networkx Python library) are not being very successful: in particular, scrambled texts appear to result in higher mean clustering C than original texts, while in the paper the opposite happens.

Of course, the easiest explanation is that I am misunderstanding something basic in one of the two papers (or both).
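The kind of experiment MarcoP describes can be sketched without networkx; networkx's `average_clustering` computes the same mean local clustering on a simple undirected graph, and this plain-Python version just keeps the example dependency-free. The choices below are assumptions, not either paper's method: words are linked when adjacent in the text, self-loops are dropped, the shuffled baseline is a random permutation of the word tokens, and averaging only over the `top_n` most frequent words is merely a guess at how C* might be computed. Whether the ratio ends up above or below 1 can depend on exactly these choices.

```python
import random
from collections import Counter
from itertools import combinations


def adjacency_network(words):
    """Undirected word network: one node per word type, an edge between two
    different words whenever they appear next to each other in the text."""
    graph = {w: set() for w in words}
    for a, b in zip(words, words[1:]):
        if a != b:  # drop self-loops such as "daiin daiin"
            graph[a].add(b)
            graph[b].add(a)
    return graph


def local_clustering(graph, node):
    """Links present among the node's neighbours / links possible among them."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # same convention as networkx.average_clustering
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


def mean_clustering(graph, nodes=None):
    """Average local clustering over `nodes` (default: every node)."""
    nodes = list(graph) if nodes is None else list(nodes)
    return sum(local_clustering(graph, n) for n in nodes) / len(nodes)


def normalized_clustering(words, top_n=None, trials=20, seed=0):
    """C(original) / mean C(shuffled). With top_n set, only the top_n most
    frequent words are averaged (a guess at C*, not the paper's definition)."""
    nodes = None
    if top_n is not None:
        nodes = [w for w, _ in Counter(words).most_common(top_n)]
    original = mean_clustering(adjacency_network(words), nodes)
    rng = random.Random(seed)
    shuffled_values = []
    for _ in range(trials):
        shuffled = list(words)
        rng.shuffle(shuffled)  # same word frequencies, random order
        shuffled_values.append(mean_clustering(adjacency_network(shuffled), nodes))
    baseline = sum(shuffled_values) / trials
    return original / baseline if baseline else float("nan")
```

To try it on a real text, read the words from a file (the file name is hypothetical), e.g. `words = open("transliteration.txt", encoding="utf-8").read().split()`, then compare `normalized_clustering(words)` with `normalized_clustering(words, top_n=50)`.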


RE: "The Strange Quest to Crack the Voynich Code" - Alin_J - 18-02-2020

(18-02-2020, 05:56 AM)MarcoP Wrote: Another puzzling thing: Amancio et al. measured a lower C (and C*) for the original text than for its scrambled versions: their normalized values are < 1. Cardenas' plot shows that C is higher for the original than for the scrambled versions. For what it's worth, when I tried replicating Cardenas' experiments I found something similar to what Amancio et al. reported.

Marco,

The Amancio study seems to use the "global" clustering coefficient, while Cardenas et al. use the "mean" (average) clustering coefficient. I have no in-depth knowledge of this, but the global clustering coefficient is the number of closed triplets divided by the total number of triplets in the network (or text), while the average clustering coefficient is computed for each node as the number of links actually present between its neighbors divided by the number of possible links between them, and then averaged over all nodes (words). So they are different measures.
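The difference Alin_J describes shows up clearly on two tiny graphs. This is a minimal plain-Python sketch, assuming a simple undirected graph stored as a dict of neighbour sets; networkx's `transitivity` (global) and `average_clustering` (mean) compute the same two quantities. The toy graphs are illustrative and not taken from either paper: going from the first to the second, the average coefficient falls while the global one rises, so the two measures can indeed move in opposite directions.

```python
from itertools import combinations


def make_graph(edges):
    """Dict of neighbour sets from a list of undirected edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def linked_pairs(graph, node):
    """Pairs of the node's neighbours that are themselves linked (closed triplets at node)."""
    return sum(1 for u, v in combinations(graph[node], 2) if v in graph[u])


def neighbour_pairs(graph, node):
    """All pairs of the node's neighbours (connected triplets centred on node)."""
    k = len(graph[node])
    return k * (k - 1) // 2


def average_clustering(graph):
    """Mean of the per-node coefficients; nodes of degree < 2 count as 0."""
    total = 0.0
    for node in graph:
        pairs = neighbour_pairs(graph, node)
        total += linked_pairs(graph, node) / pairs if pairs else 0.0
    return total / len(graph)


def transitivity(graph):
    """Global coefficient: closed triplets / all connected triplets."""
    closed = sum(linked_pairs(graph, n) for n in graph)
    triplets = sum(neighbour_pairs(graph, n) for n in graph)
    return closed / triplets if triplets else 0.0


windmill_edges = []
for i in range(3):  # three triangles sharing the node "hub"
    windmill_edges += [("hub", f"a{i}"), ("hub", f"b{i}"), (f"a{i}", f"b{i}")]
windmill = make_graph(windmill_edges)

clique_edges = list(combinations(range(5), 2))    # complete graph on 5 nodes
clique_edges += [(i, f"p{i}") for i in range(5)]  # one pendant per clique node
clique = make_graph(clique_edges)

for name, g in (("windmill", windmill), ("K5 + pendants", clique)):
    print(name, round(average_clustering(g), 3), round(transitivity(g), 3))
# windmill 0.886 0.429
# K5 + pendants 0.3 0.6
```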


RE: "The Strange Quest to Crack the Voynich Code" - MarcoP - 18-02-2020

Thank you, Jonas!
This is beyond me. I must say that calling the "global clustering coefficient" "transitivity" (as some authors apparently do) could help. The two measures seem closely related, and I find it strange that one increases and the other decreases when a text is shuffled. But of course the only way to be sure is to replicate the results of these papers, and I don't feel very confident I will ever manage to do that...


RE: "The Strange Quest to Crack the Voynich Code" - nickpelling - 18-02-2020

Voynich papers that use raw EVA as if all the issues of parsing it into glyphs were non-existent are, of course, the worst culprits of all. In the pantheon of idiocy, these are right up there with channeled solutions and people reading EVA off the page on YouTube.

But using GC's 101 transcription is, in many ways, no less problematic. His take on the problems of transcribing Voynichese may have solved some problems with EVA, but it certainly introduced other problems in their place.

At least with Linear B pretty much all the experts agreed that it was a syllabary. With Voynichese, we still have not even the outline of a proof as to what constitutes a single token.

For example, I think I can prove that qo- isn't an extension of o-, which would imply that qo- is an indivisible token. But this doesn't mean that o- is a token. And what of or/ol etc?

It seems like a huge step to me to just assume that one particular parsing schema is the correct one, but it seems like it's no big deal to all these paper writers, so what do I know?
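Nick's point, that some parsing schema always has to be assumed, can be made concrete: the same words yield different token counts, glyph inventories and entropies depending on which letter groups are treated as single glyphs. In this sketch the multi-glyph set is hypothetical, picked only because qo, ch, sh and aiin are often debated as possible units; it is not an endorsement of that parsing, and the word list is a toy sample rather than corpus data.

```python
import math
from collections import Counter


def tokenize(word, multi_glyphs):
    """Greedy longest-match split of an EVA word into glyph tokens.
    Any single character is always a valid token."""
    longest = max((len(g) for g in multi_glyphs), default=1)
    tokens, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if size == 1 or piece in multi_glyphs:
                tokens.append(piece)
                i += size
                break
    return tokens


def glyph_summary(words, multi_glyphs):
    """(token count, distinct glyph count, unigram entropy in bits)."""
    counts = Counter(t for w in words for t in tokenize(w, multi_glyphs))
    total = sum(counts.values())
    entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
    return total, len(counts), entropy


# Toy sample of common EVA words (not corpus statistics).
words = ["qokeedy", "qokaiin", "daiin", "chedy", "shedy", "okaiin", "ol", "or"]
raw_eva = set()                            # every EVA letter is its own glyph
hypothetical = {"qo", "ch", "sh", "aiin"}  # one possible alternative parsing
print(glyph_summary(words, raw_eva))
print(glyph_summary(words, hypothetical))
```

Under raw EVA the sample has 39 tokens drawn from 14 glyphs; under the hypothetical parsing it has 26 tokens drawn from 11 glyphs, so any glyph-level statistic inherits the choice.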


RE: "The Strange Quest to Crack the Voynich Code" - -JKP- - 19-02-2020

(17-02-2020, 10:57 PM)Torsten Wrote:
"The analysis was made on a widely available transliteration of the Voynich manuscript text, the '101' format by Glen Claston" (Alin 2019, p. 5).

"I strongly believe that the biggest problem we face precedes cryptanalysis – in short, we can’t yet parse what we’re seeing well enough to run genuinely useful statistical tests".


That's the first time I've seen this statement by Nick and I have to say it is BANG on the money. It deserves to be a sig file.

Even now, years later, we have NOT yet parsed the glyphs in any way that is empirically substantiated and until at least SOME of that is accomplished, a large proportion of statistical tests will be glyph-assumption-based and highly questionable.

Are the sizes of the loops meaningful? Are the directions of the tails meaningful? Are the lengths of the tails meaningful? Are the spaces real in the linguistic sense? Are the gallows chars and benched gallows ligatures? Are EVA-x, EVA-g, and EVA-m ligatures? Is EVA-y an abbreviation? and so on...


RE: "The Strange Quest to Crack the Voynich Code" - nablator - 19-02-2020

(19-02-2020, 03:05 PM)-JKP- Wrote:
Even now, years later, we have NOT yet parsed the glyphs in any way that is empirically substantiated and until at least SOME of that is accomplished, a large proportion of statistical tests will be glyph-assumption-based and highly questionable.

Only space detection and the choice of glyph variants (how many are there?) influence the word-network statistics. 8am is as good as daiin for that purpose.
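nablator's point can be checked directly. Renaming word types one-to-one produces the same network up to relabelling, so any statistic computed from the network structure alone is unchanged, whereas collapsing variants is not a renaming. This is a minimal plain-Python sketch on a toy sequence: `eva_to_alt` is a hypothetical, simplified stand-in for a real transliteration table (it only turns daiin into 8am), and `merge_variants` mimics a transcription that does not distinguish dain and daiiin from daiin.

```python
def adjacency_network(words):
    """Undirected word network: word types as nodes, an edge between two
    different words that occur next to each other in the text."""
    graph = {w: set() for w in words}
    for a, b in zip(words, words[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def network_stats(words):
    """Structure-only summary: node count, edge count, sorted degree sequence.
    These do not depend on how the nodes are named."""
    graph = adjacency_network(words)
    degrees = sorted(len(nbrs) for nbrs in graph.values())
    return len(graph), sum(degrees) // 2, degrees


def eva_to_alt(word):
    """Hypothetical re-transliteration: 'iin' -> 'm', 'd' -> '8' (daiin -> 8am).
    It only preserves the network if distinct words stay distinct."""
    return word.replace("iin", "m").replace("d", "8")


def merge_variants(word, variants=("dain", "daiiin")):
    """Mimic a transcription that does not distinguish these variants from daiin."""
    return "daiin" if word in variants else word


text = ["daiin", "chedy", "dain", "qokeedy", "daiin", "shedy", "dain", "chedy"]
print(network_stats(text) == network_stats([eva_to_alt(w) for w in text]))     # True
print(network_stats(text) == network_stats([merge_variants(w) for w in text]))  # False
```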


RE: "The Strange Quest to Crack the Voynich Code" - -JKP- - 19-02-2020

8am/daiin is not one of the bigger problems. If one is simply substituting symbol for symbol, the statistics come out somewhat the same. But what if it's not daiin? What if it is "claw", in which the first letter is a ligature and the last three minims are all part of the same letter (which is quite possible)? Now we are talking about both structures on either side of the "a" having a different interpretation. And even the "a" itself may be a ligature. Maybe it is a combination of c + i. Or maybe it is only a combination of c + i when preceded by certain characters.

Another problem is that some transcripts don't even distinguish between dain, daiin, daiiin, daid, daiid, and daiiid.

As an even better example: if the EVA-y glyph, which BEHAVES like a Latin abbreviation in terms of position, is in fact an abbreviation, then it can stand for 2, 3, or more letters. AND, in Latin, what it stands for at the beginning of a word is not the same as what it stands for at the end of a word. What if Voynichese functions in this manner, which was basically the norm in the 15th century?

Most statistical tests assume that EVA-y is a single glyph AND that it has the same value whether at the beginning or the end of a token (which is not at all how it worked in the many languages that used Latin scribal conventions). At the beginning of a word it was typically expanded to a common 3-letter prefix; at the end, to a common 2-letter suffix (quite different from the prefixes). What if Voynichese uses some of these very prevalent methods of abbreviating text?
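JKP's scenario can be simulated to see what it would and would not affect. In this sketch the expansions are placeholders loosely modelled on the Latin 9-shaped sign, read con- at the start of a word and -us at the end; they are not a proposed reading of EVA-y, and the word list is a toy sample. Glyph-level counts change, but as long as distinct words stay distinct (as they do here), word-level statistics do not.

```python
from collections import Counter

# Hypothetical expansions, loosely modelled on the Latin sign read "con-" at the
# start of a word and "-us" at the end. Placeholders only, NOT a reading of EVA-y.
INITIAL_EXPANSION = "con"
FINAL_EXPANSION = "us"


def expand_y(word, initial=INITIAL_EXPANSION, final=FINAL_EXPANSION):
    """Expand word-initial and word-final EVA-y differently; leave medial y alone."""
    if word.startswith("y"):
        word = initial + word[1:]
    if word.endswith("y"):
        word = word[:-1] + final
    return word


def glyph_counts(words):
    """Character frequencies over all words."""
    return Counter("".join(words))


words = ["chedy", "shedy", "ykeedy", "qokeedy", "ytaiin", "daiin"]  # toy sample
expanded = [expand_y(w) for w in words]
print(expanded)
# ['chedus', 'shedus', 'conkeedus', 'qokeedus', 'contaiin', 'daiin']
print(glyph_counts(words)["y"], glyph_counts(expanded)["y"])  # 6 0
# Word types stay distinct here, so word-level statistics are untouched.
print(len(set(words)) == len(set(expanded)))  # True
```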


RE: "The Strange Quest to Crack the Voynich Code" - MarcoP - 19-02-2020

This seems to me a chicken-and-egg problem. Statistics are a powerful tool when approaching an unknown script. If one cannot compute statistics without first knowing how the script works, there is no way out.


RE: "The Strange Quest to Crack the Voynich Code" - nablator - 19-02-2020

(19-02-2020, 05:05 PM)-JKP- Wrote: 8am/daiin is not one of the bigger problems. If one is simply substituting symbol for symbol, the statistics come out somewhat the same. But what if it's not daiin? What if it is claw?
...
What if Voynichese uses some of these very prevalent methods of abbreviating text?
They don't matter for word statistics.