The Voynich-LAT seems to be in no-man's land, or do I see that incorrectly?
I was rather expecting something like this, because reasonably high-frequency single characters are converted to tri-grams, and they will dominate the tri-gram statistics, on which the analysis is based...
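For example, a toy count in Python (the glyph 'X' and the fixed expansion "con" are invented for illustration, not an actual mapping):

```python
from collections import Counter

def trigrams(s):
    """Count all overlapping letter tri-grams in a string."""
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

# 'X' stands for one frequent single character; expanding it to the
# fixed string "con" is an invented example, not an actual mapping.
original = "aXbXcXdXeX" * 50
expanded = original.replace("X", "con")

print(trigrams(expanded).most_common(3))
# The tri-gram "con" now occurs once per expanded character (250 times),
# several times more often than any tri-gram of the surrounding text.
```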
(15-08-2020, 02:02 PM)ReneZ Wrote: The Voynich-LAT seems to be in no-man's land, or do I see that incorrectly?
I was rather expecting something like this, because reasonably high-frequency single characters are converted to tri-grams, and they will dominate the tri-gram statistics, on which the analysis is based...
I would have expected it to be somewhere between the other two Voynichese transliterations and Latin, but it is not surprising that distances are not preserved when the dimensions are reduced to only two by PCA.
A question for Darrin:
Does "Languages which are similar will have hypervectors close together" mean that a high enough proportion of exactly matching letter tri-grams is needed? I'm just trying to understand what is being compared. After a rotation (like rot13) or any permutation of the alphabet of a single language sample text, will this text be placed differently on the PCA plot?
Rene, I'm not sure what you mean by "no man's land".
Would you consider Syriac, Russian, Finnish, Aramaic, and German to be out in no man's land? They are also near the outer edges of language groups.
Wanted to share this reference with the group. It is cited in the Kanerva lecture slide deck linked to Darrin’s fuller discussion.
(15-08-2020, 03:15 PM)MichelleL11 Wrote: Wanted to share this reference with the group. It is cited in the Kanerva lecture slide deck linked to Darrin’s fuller discussion.
If I am reading this correctly, a hypervector built from trigram statistics will only match well (small cosine distance) with other vectors built from similar statistics for the same trigrams. This will not work at all with an arbitrary choice of letters to represent a language (i.e. a permutation of the alphabet, or a simple substitution of another alphabet). For Voynichese, no one is arguing that EVA (or CUVA, etc.) matches the actual letters that the writer had in mind (if any). Every natural-language theorist uses a different mapping.
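For anyone who wants to poke at this, here is a minimal sketch of the idea, assuming a Kanerva-style encoding (one random ±1 hypervector per letter, rotate-and-multiply binding for tri-grams); the names and parameters are illustrative, and Darrin's actual pipeline may differ:

```python
import codecs
import numpy as np

D = 10_000  # hypervector dimensionality
rng = np.random.default_rng(0)

# One random bipolar (+1/-1) hypervector per letter.
alphabet = "abcdefghijklmnopqrstuvwxyz "
letter_vecs = {c: rng.choice([-1, 1], size=D) for c in alphabet}

def profile(text, vecs):
    """Sum of bound tri-gram vectors (rotate-and-multiply binding)."""
    acc = np.zeros(D)
    for a, b, c in zip(text, text[1:], text[2:]):
        acc += np.roll(vecs[a], 2) * np.roll(vecs[b], 1) * vecs[c]
    return acc

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

text_a = "the quick brown fox jumps over the lazy dog " * 20
text_b = "the rain in spain stays mainly in the plain " * 20

# Texts sharing tri-grams under the same letter vectors: cosine clearly above 0.
print(cosine(profile(text_a, letter_vecs), profile(text_b, letter_vecs)))

# The same text after rot13: entirely different tri-grams, cosine near 0.
text_a_rot = codecs.encode(text_a, "rot13")
print(cosine(profile(text_a, letter_vecs), profile(text_a_rot, letter_vecs)))
```

So two samples only land close together if they share actual tri-grams under the same letter-to-vector assignment; any re-mapping of the alphabet moves the point.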
(15-08-2020, 02:35 PM)nablator Wrote: expected it to be somewhere between the other two Voynichese transliterations and Latin
However, we have already seen that almost all examples of 'Voynichese expanded to Latin' do not look like Latin at all.
(15-08-2020, 02:35 PM)nablator Wrote: ... distances are not preserved when the dimensions are reduced to only two by PCA.
Absolutely. And the actual hyper-vector measure is far from easy to grasp.
(15-08-2020, 05:23 PM)ReneZ Wrote: Quote: ... distances are not preserved when the dimensions are reduced to only two by PCA.
Absolutely. And the actual hyper-vector measure is far from easy to grasp.
There's even more. The addition of Voynich-LAT has actually changed the directions of the two principal axes. A closer comparison of the last figure with the previous one shows that there has been a rotation, also moving the Voynich-Eva and Voynich-Cuva dots 'outside' the branch that they were in.
Compare the last figure with the previous one, and the locations of "Adyghe" and "Agul".
This suggests that there was also a separation in a higher dimension.
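A toy demonstration of that effect (synthetic data and scikit-learn, purely to show the mechanism; nothing to do with the actual hypervectors):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# 40 points whose variance is dominated by the first coordinate.
base = rng.normal(size=(40, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])

print(PCA(n_components=2).fit(base).components_)

# Add one far-out point along a different coordinate (the analogue of
# adding Voynich-LAT) and refit: the principal axes rotate, so the
# projected positions of *all* the other points shift too.
outlier = np.array([[0.0, 0.0, 40.0, 0.0, 0.0]])
print(PCA(n_components=2).fit(np.vstack([base, outlier])).components_)
```

This is also consistent with a separation showing up only in a higher dimension: the new point can sit far away along an axis that the two plotted components barely sample.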
(15-08-2020, 04:55 PM)nablator Wrote: (15-08-2020, 03:15 PM)MichelleL11 Wrote: Wanted to share this reference with the group. It is cited in the Kanerva lecture slide deck linked to Darrin’s fuller discussion.
If I am reading this correctly, a hypervector built from trigram statistics will only match well (small cosine distance) with other vectors built from similar statistics for the same trigrams. This will not work at all with an arbitrary choice of letters to represent a language (i.e. a permutation of the alphabet, or a simple substitution of another alphabet). For Voynichese, no one is arguing that EVA (or CUVA, etc.) matches the actual letters that the writer had in mind (if any). Every natural-language theorist uses a different mapping.
Yes, this is exactly the point I (and Marco) earlier tried to make clear in this thread.
Darrin,
Another thing that worries me for the moment in the description of methodology is the assumption that these particular vectors fulfil the requirement of an angle close to 90 degrees (cos(angle) ≈ 0) for two different random vectors. For this to be true, the dot product of the two vectors must be near 0, and the only way I can see to accomplish that is if the mean component value of at least one of the random vectors is 0. Therefore, two random vectors composed of random 0s and 1s will not work, because their mean component value will be approx. 0.5 if they are truly random. By my calculations the dot product of the two vectors will be 0.25 times the number of components n, while each norm is about sqrt(n/2), giving cos(angle) ≈ 0.5, i.e. a lower angle of around 60 degrees.
However, random vectors whose components are randomly -1 or 1 will work (each component is then centered on 0, at equal distance either side). Perhaps this is what is meant?
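A quick numerical check of both cases (a throwaway sketch; the dimensionality and names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000  # number of components (dimensions)

def mean_angle_deg(sampler, trials=100):
    """Average angle between pairs of independently drawn random vectors."""
    angles = []
    for _ in range(trials):
        a, b = sampler(), sampler()
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        angles.append(np.degrees(np.arccos(cos)))
    return np.mean(angles)

# Random 0/1 components: dot product ~ n/4 and norms ~ sqrt(n/2),
# so cos ~ 0.5 and the angle settles near 60 degrees.
print(mean_angle_deg(lambda: rng.integers(0, 2, n).astype(float)))

# Random -1/+1 (bipolar) components: dot product ~ 0, angle near 90 degrees.
print(mean_angle_deg(lambda: rng.choice([-1.0, 1.0], n)))
```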
(15-08-2020, 05:23 PM)ReneZ Wrote: (15-08-2020, 02:35 PM)nablator Wrote: expected it to be somewhere between the other two Voynichese transliterations and Latin
However, we have already seen that almost all examples of 'Voynichese expanded to Latin' do not look like Latin at all.
Yes, agreed, it doesn't, but most western languages (English, French, German, Italian, Spanish, etc.) used many of the same Latin scribal abbreviations, so I wanted to see how these expansions might influence position on the graph, and the orientation of the graph itself.
(15-08-2020, 07:14 PM)Alin_J Wrote: Another thing that worries me for the moment in the description of methodology is the assumption that these particular vectors fulfil the requirement of an angle close to 90 degrees (cos(angle) ≈ 0) for two different random vectors. [...] By my calculations the dot product of the two vectors will be 0.25 times the number of components n, while each norm is about sqrt(n/2), giving cos(angle) ≈ 0.5, i.e. a lower angle of around 60 degrees.
However, random vectors whose components are randomly -1 or 1 will work (each component is then centered on 0, at equal distance either side). Perhaps this is what is meant?
In other words (I believe), it's this part of your discussion that it would be interesting to understand more about:
“The lecture left out an important step. After summing each trigram into the language hypervector, each vector element must have a threshold function applied. If the count of ones in any element is greater than 50% of the vector dimension it is set to 1, else it is zero.
Eng = (0, 1, 1, 0, 0, …, BIT(n-1), BIT(n))”
Of course, I could be misinterpreting this, but more input would be great! Please understand these questions are not meant to downplay the obvious amount of effort that went into your work.
Bottom line, thank you for attempting this and being so open to tweaking and explaining what you did.
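For what it's worth, here is one way to read the thresholding step quoted above, as a minimal sketch: it assumes binary 0/1 tri-gram hypervectors and a majority rule over the number of summed vectors (the "count of ones in any element"), which is one interpretation of the excerpt rather than necessarily Darrin's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 10_000           # hypervector dimensionality
n_trigrams = 999     # number of tri-gram vectors summed (odd to avoid ties)

# Stand-ins for the binary tri-gram hypervectors of one language sample.
trigram_vecs = rng.integers(0, 2, size=(n_trigrams, D))

# Sum per element, then threshold: an element of the language hypervector
# becomes 1 when more than half of the summed tri-gram vectors had a 1 there.
counts = trigram_vecs.sum(axis=0)
language_vec = (counts > n_trigrams / 2).astype(np.uint8)

print(language_vec[:10])  # e.g. Eng = (0, 1, 1, 0, 0, ...)
```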