27-07-2020, 07:33 AM
Hi JKP,
If I understand correctly, light blue corresponds to Scribe1. Herbal pages (big plants) do not have a specific colour; they are marked only by scribe colours. So pages 26r, 95v2, 95r1, 45r at the bottom left border are all Herbal pages by three different scribes.
Dots corresponding to all other sections have two colours: a smaller dot for the scribe and a larger circle or connected cluster for the section.
@RobGea: what does "cosine matched" mean exactly? Are you using cosine-similarity on the N-dimensional vectors of token counts (where N is the total number of word-types)?
If so, this seems to me a significant improvement with respect to what Julian described on voynichattacks (his system only uses word-types, ignoring token counts).
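To make the distinction concrete, here is a minimal sketch of the two options as I understand them. It is an illustration only, not RobGea's or Julian's actual code: the toy pages and the function names are made up, and modelling the "word-types only" approach as cosine similarity on 0/1 vectors is my assumption.

```python
import math
from collections import Counter


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors given as dicts word -> weight."""
    dot = sum(weight * vec_b.get(word, 0) for word, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page has no direction to compare
    return dot / (norm_a * norm_b)


def token_count_vector(tokens):
    """One dimension per word-type, valued by how often it occurs on the page."""
    return Counter(tokens)


def word_type_vector(tokens):
    """One dimension per word-type, valued 1 if present (counts ignored)."""
    return {word: 1 for word in set(tokens)}


# Hypothetical toy pages: same vocabulary, very different frequencies.
page_x = "daiin daiin daiin chol".split()
page_y = "daiin chol chol chol".split()

sim_counts = cosine_similarity(token_count_vector(page_x), token_count_vector(page_y))
sim_types = cosine_similarity(word_type_vector(page_x), word_type_vector(page_y))

print(f"token counts: {sim_counts:.2f}")
print(f"word-types only: {sim_types:.2f}")
```

With these toy pages the types-only comparison sees the two pages as identical, while the count-based one does not, which is why using token counts looks like an improvement to me.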
I don't know how easy it is to do, but I would be curious to see the same plot based on PCA instead of whatever system LinLog uses: I suspect that the closeness among page couples across the A/B boundary (e.g. 65v / 88v) is an artefact of this particular plotting system. But of course a PCA plot will be much less easy to read.
I think that what you are doing is of the greatest interest. Further exploring Julian's experiments with LinLog seems very promising: I am looking forward to reading more of what you find!
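For what it's worth, this is roughly what I have in mind with a PCA plot: a rough sketch in plain Python that projects per-page count vectors onto their first two principal components, using power iteration on the covariance matrix. The four "pages" and three word-types are invented toy numbers, and all function names are hypothetical; nothing here reflects how LinLog works.

```python
import math


def center_columns(rows):
    """Subtract the per-column mean so every word-type dimension has mean zero."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    return [[row[j] - means[j] for j in range(dims)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def mat_vec(matrix, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]


def top_component(matrix, iterations=500):
    """Dominant eigenvalue/eigenvector of a symmetric matrix by power iteration."""
    v = [1.0 + 0.1 * i for i in range(len(matrix))]  # deterministic start vector
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


def pca_2d(rows):
    """Coordinates of each row on the first two principal components."""
    centered = center_columns(rows)
    cov = covariance(centered)
    lam1, v1 = top_component(cov)
    # Deflation: remove the first component so the next iteration finds the second.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(len(cov))]
                for i in range(len(cov))]
    _, v2 = top_component(deflated)
    return [(sum(a * b for a, b in zip(row, v1)),
             sum(a * b for a, b in zip(row, v2))) for row in centered]


# Invented counts: rows are pages, columns are three word-types.
pages = [[9, 1, 0], [8, 2, 1], [1, 9, 0], [2, 8, 1]]
coords = pca_2d(pages)
for (pc1, pc2) in coords:
    print(f"{pc1:+.2f} {pc2:+.2f}")
```

In this toy case the first component separates the first two pages from the last two. On real data, with thousands of word-types, the dense covariance matrix above would be far too slow, so a numerical library would be the sensible choice; the point is only that PCA is a linear projection, so any closeness it shows comes from the count vectors themselves rather than from a layout algorithm.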
You make me want to play with this software myself, but as a first step I should probably read some of the papers you linked...