Hey guys. Fully open to more experiments. The toolchain and analysis system are set up so tweaks are easy.
While developing this I got a feel for what's happening with PCA. The language vectors really are 10,000-dimensional, so it's impossible to visualize them in 3D before PCA. You've just got to remember that the letters a-z and space are all orthogonal. That means any given trigram will be unique in the 10K-dim space. Summing the trigrams for a language produces a unique fingerprint. If enough trigrams are similar the language vectors will be -mathematically- similar.
The PCA plot projects a 10K-dim point cloud (vector endpoints) onto a 2D surface. It's like shining a flashlight through a snow globe and looking at the shadows on the wall. As long as you always use the same PCA axes, the plot gives meaningful data from run to run.
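The orthogonality claim is easy to sanity-check. Here is a minimal sketch (illustrative only, not the actual toolchain; it assumes NumPy and randomly drawn dense binary letter vectors): independently generated 10,000-bit vectors for a-z and space sit at a normalized Hamming distance of about 0.5 from each other, which is as close to orthogonal as random binary vectors get.
Code:
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, purely for reproducibility of the sketch
DIM = 10_000
alphabet = "abcdefghijklmnopqrstuvwxyz "

# One randomly drawn dense binary hypervector per symbol.
letter_vecs = {ch: rng.integers(0, 2, DIM, dtype=np.uint8) for ch in alphabet}

# Normalized Hamming distance between two different letters is ~0.5,
# i.e. the vectors are quasi-orthogonal (unrelated).
a, b = letter_vecs["a"], letter_vecs["b"]
print(np.mean(a != b))   # ~0.5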
To an earlier point, summing binary hypervectors always requires a threshold function
(1 1 0)
(1 0 1)
(1 1 0)
(1 1 0)
----------
(4 3 1) pre threshold
thr=nvec/2=2
If (bit-n >= thr) then (bit-n=1) else (bit-n=0)
(1 1 0) post threshold
Vector multiplication is bitwise XOR
0 0 = 0
0 1 = 1
1 0 = 1
1 1 = 0
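Putting the two operations together, here is a minimal sketch (assuming NumPy 0/1 arrays; an illustration of the operations described above, not the toolchain itself) that reproduces the (4 3 1) -> (1 1 0) threshold example and the XOR table:
Code:
import numpy as np

def bundle(vectors):
    # Sum binary hypervectors bitwise and threshold at nvec/2 (majority rule).
    vectors = np.asarray(vectors)
    counts = vectors.sum(axis=0)             # (4, 3, 1) for the rows above
    thr = len(vectors) / 2                   # thr = nvec/2 = 2
    return (counts >= thr).astype(np.uint8)  # (1, 1, 0)

def bind(a, b):
    # Vector multiplication as bitwise XOR.
    return np.bitwise_xor(a, b)

print(bundle([(1, 1, 0), (1, 0, 1), (1, 1, 0), (1, 1, 0)]))   # [1 1 0]
print(bind(np.array([0, 0, 1, 1]), np.array([0, 1, 0, 1])))   # [0 1 1 0]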
D
Hello Darrin,
thanks for coming back to this.
I have two comments, which I already mentioned in earlier posts. The first is very important, the second probably less so.
Quote:The language vectors really are 10,000-dimensional, so it's impossible to visualize them in 3D before PCA. You've just got to remember that the letters a-z and space are all orthogonal. That means any given trigram will be unique in the 10K-dim space.
That is right, but PCA only gives you the first two of 10,000 dimensions, and it should then be clear that one has to be very careful about drawing conclusions from only these two dimensions.
The 3rd, 4th and 5th (and a few more) may show additional separations of similar magnitude to the first and second, which are the ones visualised.
As I already mentioned, there are clear indications that the Voynich text-related points seem to be separated in one or more of the other components.
Quote:Vector multiplication is bitwise XOR
0 0 = 0
0 1 = 1
1 0 = 1
1 1 = 0
No, that's not correct. This table is for vector addition.
Multiplication is bitwise equivalence (XNOR):
0 0 = 1
0 1 = 0
1 0 = 0
1 1 = 1
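For what it's worth, a small sketch (assuming NumPy 0/1 arrays) of this equivalence table: counting the 1s in the bitwise XNOR of two vectors counts the positions where they agree, which is what the correlation measure discussed below is based on.
Code:
import numpy as np

def xnor(a, b):
    # Bitwise equivalence: 1 where the bits agree, 0 where they differ.
    return (a == b).astype(np.uint8)

a = np.array([0, 0, 1, 1])
b = np.array([0, 1, 0, 1])
print(xnor(a, b))          # [1 0 0 1] -- the table above
print(xnor(a, b).mean())   # fraction of agreeing bits; ~0.5 for unrelated vectors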
(22-08-2020, 05:54 PM)dvallis Wrote: If enough trigrams are similar the language vectors will be -mathematically- similar.
Darrin, I hope you don't think I'm being too picky, but what you really want to (or should) say is 'if enough trigrams are the same', not 'similar'. I.e. they are only matched and contribute to the total correlation when all three characters in the trigram are exactly matched. I think it's important to make that distinction. It's the number of these exactly matching trigrams that are counted and which will determine the outcome. And if they are not matched, the vectors of the two trigrams will be as orthogonal (fully uncorrelated) as the rest.
This is a really interesting topic, but unfortunately I won't be able to find the time to try this out myself.
Some further thoughts:
- all the shifted characters: A in position 1, A in position 2, A in position 3, etc. are also orthogonal / uncorrelated to all the other ones.
- I would have expected that combining the trigrams would be done by addition of the hypervectors, but the original paper proposes multiplication. The implementation by Darrin seems to use addition, but I remain uncertain whether it actually makes any difference.
The correlation is certainly based on multiplication (XNOR, as Darrin said).
The main thing that remains open for me in this study is to know more about the higher PCA components.
Kanerva's follow-on lecture is easier to follow than the original paper. He clarifies that a profile vector is created by addition of trigrams.
ACCUMULATE PROFILE VECTOR
Add all trigram vectors of a text into a 10,000-D Profile Vector ...
Yes.
What I meant was the creation of the trigram vectors, i.e. the combination of:
T__ with _H_ and __E.
This seems to be done by multiplication in his paper.
It is. And that's what I use. rrT XOR rH XOR E
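In code form, a minimal sketch (again just an illustration, assuming NumPy, np.roll as the rotation 'r', and the random letter vectors from the earlier sketch; the function names are mine, not from the toolchain) of building a trigram as rrT XOR rH XOR E and accumulating trigrams into a profile vector:
Code:
import numpy as np

rng = np.random.default_rng(1)
DIM = 10_000
alphabet = "abcdefghijklmnopqrstuvwxyz "
letter_vecs = {ch: rng.integers(0, 2, DIM, dtype=np.uint8) for ch in alphabet}

def rotate(v, n=1):
    # Permute a hypervector by cyclic shift -- the 'r' in rrT XOR rH XOR E.
    return np.roll(v, n)

def trigram(c1, c2, c3):
    # Bind the three shifted letters with XOR: rr(c1) XOR r(c2) XOR c3.
    return rotate(letter_vecs[c1], 2) ^ rotate(letter_vecs[c2], 1) ^ letter_vecs[c3]

def profile(text):
    # Add all trigram vectors of the text into one 10,000-D profile vector,
    # then threshold back to binary (majority rule, as in the earlier post).
    # An integer (unthresholded) profile is another option.
    tris = [trigram(*text[i:i + 3]) for i in range(len(text) - 2)]
    counts = np.sum(tris, axis=0)
    return (counts >= len(tris) / 2).astype(np.uint8)

print(profile("the quick brown fox jumps over the lazy dog").shape)   # (10000,)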
This is all very confusing, and I found the point where the confusion arises.
Referring to the linked document, the following is a clip from one page:
[attachment=4715]
This is very counter-intuitive, and in other uses of bit strings this is quite different.
Using [0;1] ("binary") for the bits implies the use of 'false' and 'true'.
Using [-1;1] ("bipolar") is very different. This is just a matter of polarity. There is no more sense of true and false.
The clip above clearly indicates that the proposed implementation is confusing. This is due to the illogical mapping between binary and bipolar.
The algorithm should work independent from this mapping, but this is not the case.
Addition is independent of the mapping, but multiplication depends completely on this.
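To make that mapping dependence concrete, here is a small sketch (assuming NumPy) of the two possible binary-to-bipolar conventions: with 0 -> +1 and 1 -> -1, bipolar multiplication comes back as binary XOR, while with 0 -> -1 and 1 -> +1 it comes back as XNOR. Which bit operation deserves the name 'multiplication' therefore depends entirely on the chosen mapping.
Code:
import numpy as np

a = np.array([0, 0, 1, 1])
b = np.array([0, 1, 0, 1])

# Mapping 1: binary 0 -> bipolar +1, binary 1 -> bipolar -1
prod1 = (1 - 2 * a) * (1 - 2 * b)      # bipolar multiplication
print((1 - prod1) // 2)                # back to bits: [0 1 1 0] == XOR

# Mapping 2: binary 0 -> bipolar -1, binary 1 -> bipolar +1
prod2 = (2 * a - 1) * (2 * b - 1)      # bipolar multiplication
print((prod2 + 1) // 2)                # back to bits: [1 0 0 1] == XNOR

# Note: the bipolar products themselves are identical; only the way
# the result is mapped back to bits differs between the two conventions.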
Multiplication is needed to compute correlation. In this respect, multiplication is done using the XNOR operation.
When the paper applies XOR, it is equivalent to addition.
This is also what you, Darrin, have done, so there is no issue.
However, the main issue I have remains, namely the higher PCA components.
(22-08-2020, 07:20 PM)ReneZ Wrote: That is right, but PCA only gives you the first two of 10,000 dimensions, and it should then be clear that one has to be very careful about drawing conclusions from only these two dimensions.
René, I have to correct you on this one. It is not true that PCA only gives you the first two of the 10,000 dimensions. The first two principal components of any n-dimensional data set give two directions, as a 2-dimensional projection: the directions in the n-dimensional set along which the two largest variances are found (these are the directions of highest correlation in the data). You have to look at the percentage of variance explained by the components in each individual case to determine if this is high enough. But it's completely possible to have 10,000-dimensional data with the first two components together explaining 99% or so of the variance/co-variance in PCA.
In this case, where the data are constructed so that the correlation is essentially random for most of the possible pairs of dimensions being compared, it is even probable that the vast majority of dimensions will not make any difference (i.e. of all the co-variances being evaluated, most will be very close to zero).
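A hedged sketch of how to check this in practice, assuming scikit-learn and a matrix X with one profile vector per text (the random X here is only a stand-in to make the snippet runnable): explained_variance_ratio_ gives the fraction of variance carried by each component, and the higher components mentioned above are simply the later columns of the transformed data.
Code:
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: one 10,000-D profile vector per text, stacked row-wise.
rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(30, 10_000)).astype(float)

pca = PCA(n_components=10)
scores = pca.fit_transform(X)               # shape (30, 10): PC1..PC10 per text

# Fraction of total variance explained by the first two components.
print(pca.explained_variance_ratio_[:2].sum())

# The higher components are just the later columns; plotting scores[:, 2]
# against scores[:, 3] shows any separation hidden beyond PC1/PC2.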