It is interesting to see how language groups flowed out fairly naturally from the data, in groupings that one might expect. There are almost always ways to tweak an algorithm to improve it, but this is a promising start.
(15-08-2020, 09:53 PM)-JKP- Wrote: It is interesting to see how language groups flowed out fairly naturally from the data, in groupings that one might expect. There are almost always ways to tweak an algorithm to improve it, but this is a promising start.
Oh, absolutely - I agree. But here's the thing - the reason this technology was used wasn't to get groupings of languages; it was to develop vectors describing each language and then feed back sentence length data to find a match. That use worked well: with known languages and trigrams (or more - the data went up to 5-grams), they got ~97% matches.
But if it is to be used to find meaningful groupings, as was done here, a better understanding of what is actually being compared is needed. There are acknowledged "outliers." Also, each vector set is built from the ground up, and (understandably) Darrin didn't show that this set "works" in the sense of being able to match new data to the known languages used. That would be helpful for judging the functionality of the particular language vector set that was built this time.
This is entirely aside from all the transliteration issues (e.g. the apparent need for a meaningful alphabet to truly be able to use this technology for comparisons).
None of this is meant to discourage - just to understand better what was done, how it can be interpreted, and what experiments might make sense if Darrin wants to undertake them.
(15-08-2020, 10:24 PM)MichelleL11 Wrote: None of this is meant to discourage - just to understand better what was done, how it can be interpreted, and what experiments might make sense if Darrin wants to undertake them.
Thanks for writing that! This is also exactly how my comments are intended.
(15-08-2020, 07:14 PM)Alin_J Wrote: Darrin,
Another thing that worries me for the moment in the description of the methodology is the assumption that these particular vectors fulfil the requirement of an angle close to 90 deg (cos(angle) ≈ 0) for two different random vectors. For this to be true, the dot product of the two vectors must be near 0. The only way I can see to accomplish that is if the mean component value of at least one of the random vectors is 0. Therefore, two random vectors composed of random 0s and 1s will not work, because their mean component value will be approx. 0.5 if they are truly random. By my calculations, the dot product of the two vectors will then be 0.25 times the number of components (dimensions), resulting in a lower angle of around 60 degrees.
However, random vectors whose components are -1 or 1 at random will work (each random component is then centered around 0, at equal distance on either side). Perhaps this is what is meant?
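A quick numerical check of that 60-degree figure (just an illustrative numpy sketch of the reasoning above, not anyone's actual code):

Code:
import numpy as np

rng = np.random.default_rng(0)
d = 10000  # dimensionality of the hypervectors

def angle_deg(a, b):
    # angle between two vectors, in degrees
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(cos))

a01 = rng.integers(0, 2, d)   # random bits 0/1
b01 = rng.integers(0, 2, d)

print(angle_deg(a01, b01))              # ~60 degrees, not orthogonal
print(angle_deg(2*a01 - 1, 2*b01 - 1))  # the same data recoded as -1/+1: ~90 degrees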
Keep in mind that the components of the hypervectors are bits. They can only take two values.
Prof. Kanerva uses the values -1 and +1 while Darrin uses 0 and 1.
Addition and multiplication are defined in the same space.
I wonder, therefore, if there is a bug in Darrin's approach (which may turn out to have no impact).
The bit-wise multiplication based on "-1" and "+1" is straightforward:
-1 x -1 = 1
-1 x 1 = -1
1 x -1 = -1
1 x 1 = 1
Translated to the use of 0 and 1, this gives:
0 x 0 = 1
0 x 1 = 0
1 x 0 = 0
1 x 1 = 1
This means that multiplication is the same as the logical 'equivalent' function.
However, in Darrin's example he uses the opposite (which is the logical 'exclusive or' function).
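To make the correspondence concrete, here is a small illustrative check (my own sketch, assuming the natural coding -1 -> 0, +1 -> 1; it is not taken from either implementation):

Code:
import numpy as np

rng = np.random.default_rng(1)
a_pm = rng.choice([-1, 1], size=16)   # components in {-1, +1}
b_pm = rng.choice([-1, 1], size=16)

a_bit = (a_pm + 1) // 2               # the same vectors coded as bits: -1 -> 0, +1 -> 1
b_bit = (b_pm + 1) // 2

prod_as_bits = (a_pm * b_pm + 1) // 2 # the -1/+1 product, re-coded as bits
xnor = 1 - (a_bit ^ b_bit)            # logical 'equivalent'
xor = a_bit ^ b_bit                   # logical 'exclusive or'

print(np.array_equal(prod_as_bits, xnor))  # True: multiplication = 'equivalent'
print(np.array_equal(prod_as_bits, xor))   # False: 'exclusive or' is its complement

So under that coding, the -1/+1 product corresponds to the 'equivalent' function, and 'exclusive or' is exactly the opposite.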
Then, at the end, when all tri-grams are added, Darrin does this in the 'natural numbers' space, and then converts the result to 0 or 1 depending on whether it is more or less than 'half'.
However, it may well be that prof. Kanerva intended a bitwise addition, which is the same as 'exclusive or'. This result can be obtained from the 'natural numbers' sum by checking whether the number is even or odd.
On this last point I am not certain what prof. Kanerva meant.
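The two readings of 'addition' give different bundling rules. A sketch of both, on hypothetical trigram vectors (not the actual pipeline):

Code:
import numpy as np

rng = np.random.default_rng(2)
d = 10000
trigrams = rng.integers(0, 2, size=(7, d))  # 7 hypothetical trigram bit vectors

counts = trigrams.sum(axis=0)

# Reading 1: sum in the natural numbers, then threshold at 'half' (majority rule)
bundle_majority = (counts > len(trigrams) / 2).astype(int)

# Reading 2: bit-wise addition = 'exclusive or', i.e. keep only the parity of the sum
bundle_parity = counts % 2

print(bundle_majority[:10])
print(bundle_parity[:10])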
(18-08-2020, 08:34 AM)ReneZ Wrote: Then, at the end, when all tri-grams are added, Darrin does this in the 'natural numbers' space, and then converts the result to 0 or 1 depending on whether it is more or less than 'half'.
Yes, and the steps described up to this point are perfectly logical and fine; it's just after this last step that the result will not be correct if orthogonal vectors are expected from comparing two fully uncorrelated vectors. As it is now, two vectors that are meant to be uncorrelated will give an angle of 60 deg., which (erroneously) shows some correlation.
But if, just after applying the threshold function, every '0' is changed to '-1' (and each '1' is left as it is), it should work as intended. (The correlation will then be based on the number of exactly matching trigrams according to our known alphabet, and the VMS text is only in a transliteration alphabet, not in our alphabet - but that's another issue.)
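A small sketch of that extra remapping step (again a hypothetical illustration, assuming majority-rule bundling of random bit vectors):

Code:
import numpy as np

rng = np.random.default_rng(3)
d = 10000

def bundle(vectors):
    # majority-rule bundle of 0/1 hypervectors
    return (vectors.sum(axis=0) > len(vectors) / 2).astype(int)

x = bundle(rng.integers(0, 2, size=(7, d)))
y = bundle(rng.integers(0, 2, size=(7, d)))   # a completely unrelated bundle

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(x, y))              # ~0.5: looks (falsely) correlated
print(cosine(2*x - 1, 2*y - 1))  # ~0.0 after changing every 0 to -1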
No, the theory is correct, really. One should not think of the 0's and 1's as real numbers, and the correlation is not computed in the way that we would normally expect.
(18-08-2020, 07:40 PM)ReneZ Wrote: No, the theory is correct, really. One should not think of the 0's and 1's as real numbers, and the correlation is not computed in the way that we would normally expect.
Yeah, I forgot one thing: the PCA procedure should automatically center the data around 0. That is, the mean component value is subtracted from each vector, which leaves the vectors with components of either -0.5 or 0.5. So, yeah, in that case it should all make sense mathematically.

(but the description in the beginning didn't cover that part).
I think we meant the same thing...
If one uses -1 and +1, then the result +1 means fully correlated, -1 means fully anti-correlated, and 0 means uncorrelated. This is correlation as we know it.
If one does the same calculation with 0 and 1, then fully correlated still comes out as 1, but fully anti-correlated comes out as 0. One would have to apply the linear mapping.
In the (hypothetical) case that Darrin omitted that mapping, then the plots should come out looking the same, but the scales for the two axes would be different.
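One way to see that linear mapping numerically (an illustrative sketch using the fraction of matching bits as the 0/1 similarity; not Darrin's code):

Code:
import numpy as np

rng = np.random.default_rng(4)
d = 10000
a = rng.integers(0, 2, d)

for p in (1.0, 0.75, 0.5, 0.25, 0.0):
    flip = rng.random(d) > p          # flip roughly (1 - p) of the bits of 'a'
    b = np.where(flip, 1 - a, a)
    match = np.mean(a == b)           # similarity on the 0/1 scale: 1 .. 0.5 .. 0
    corr = 2 * match - 1              # the linear mapping to the usual -1 .. +1 scale
    print(f"match fraction {match:.2f} -> correlation {corr:+.2f}")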
(19-08-2020, 07:00 AM)ReneZ Wrote: In the (hypothetical) case that Darrin omitted that mapping, then the plots should come out looking the same, but the scales for the two axes would be different.
Any linear scaling of the original data will only change the scaling of the principal components in the end result of PCA, true. But still, during the calculation of the PCA components, all the data must be translated/centered around 0 (mean-subtracted) for it to work, because PCA is based on the variance and co-variance in the data, and these are used to calculate the directions of correlation. This is why PCA routines in software do this automatically, so you don't have to think about it.
So it doesn't matter how much you scale or translate the data beforehand, as long as you do the same for all of the data; but internally this step is still critical to ensure that PCA gives the correct result.
Variances and co-variances are products and squares of the mean-centered deviations, so they do not increase proportionally/linearly with the deviation. This means that if mean-centering is not performed, either by the software during PCA or by you, the result will show false correlations, with e.g. the principal components rotated in the wrong directions relative to the original vectors.
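To illustrate why the centering matters, here is a toy sketch (my own example, not related to the hypervector data) comparing the leading PCA direction computed via SVD with and without mean-subtraction:

Code:
import numpy as np

rng = np.random.default_rng(5)

# Toy data: 200 points in 2-D, elongated along the x-axis,
# but sitting far from the origin at (10, 5).
n = 200
data = np.column_stack([rng.normal(size=n), 0.2 * rng.normal(size=n)]) + np.array([10.0, 5.0])

def first_direction(X):
    # leading right-singular vector = first 'principal' direction of X
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

print(first_direction(data))                       # pulled toward the offset (10, 5)
print(first_direction(data - data.mean(axis=0)))   # ~(1, 0): the true direction of variance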