Hi all,
MarcoP has succinctly described what's going on in the graphic.
A dot that is a single color on the white background is a Herbal (big plants) page, and that color refers to the Scribe who wrote it.
My apologies for not making it clearer.
MarcoP asked "Are you using cosine-similarity on the N-dimensional vectors of token counts (where N is the total number of word-types)?"
If I have understood what I'm doing correctly, then yes.
I used this: [link unavailable]
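For anyone wondering what that comparison looks like in practice, here is a minimal Python sketch of cosine similarity on token counts. It is only an illustration and not necessarily how the linked resource does it; the function name and the toy tokens are made up for the example.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two pages, each given as a list of tokens."""
    # Token counts act as sparse N-dimensional vectors (one dimension per word-type).
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only word-types present on both pages contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical toy pages, not real transcription data.
page_a = ["aaa", "aaa", "bbb"]
page_b = ["aaa", "ccc"]
print(cosine_similarity(page_a, page_b))
```

A value near 1 means the two pages use the same words in similar proportions; 0 means they share no words at all.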
Using term frequency–inverse document frequency (tf-idf) in addition to cosine-similarity would be an improvement.
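To show why tf-idf might help, here is a rough Python sketch. It assumes the plain idf = log(N / df) variant (there are several others) and uses made-up helper names and toy tokens, so treat it as an illustration rather than a recipe.

```python
from collections import Counter
from math import log, sqrt

def tfidf_vectors(pages):
    """Turn a list of token lists into a list of {word: tf-idf weight} dicts."""
    n_pages = len(pages)
    # Document frequency: on how many pages does each word-type occur?
    doc_freq = Counter()
    for tokens in pages:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in pages:
        counts = Counter(tokens)
        # Assumed weighting: raw count times log(N / df).
        vectors.append({w: c * log(n_pages / doc_freq[w]) for w, c in counts.items()})
    return vectors

def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy pages: "x" occurs everywhere, "y" and "z" are rarer.
pages = [["x", "y"], ["x", "z"], ["x", "y"]]
vectors = tfidf_vectors(pages)
print(cosine(vectors[0], vectors[1]))  # pages sharing only the common word "x"
print(cosine(vectors[0], vectors[2]))  # pages sharing the rarer word "y"
```

With raw counts, pages 0 and 1 would look half-similar just because both contain "x". With this weighting, a word that appears on every page gets weight zero, so only the rarer shared words pull pages together.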
Regarding [link unavailable]:
It is now an old (in internet terms) tool from 2009. It is very easy to use, but the layout it creates starts from a random seed, so it is not deterministic.
If you run it several times with the same data, you will get slightly different layouts.
Overall LinLogLayout will produce similar big groupings on multiple runs but the specific neighbors of a folio may well differ.
(added note to my previous post)
If you have the Java JDK, you can edit the source code and recompile to modify this behaviour (I have not done this): [link unavailable]
From my very limited understanding, the algorithms for the Linear Logarithmic layout are now widespread and can be found in many data science tools.
One of which is [link unavailable]