Since a picture is worth 1,000 words, here is a summary pic of what part of my Concordance looks like for one specific glyph-combination (it does not include the info in the 1,100+ page document, and there are other views that are not shown in the pic):
Yes, I know, small and blurry. My apologies... I guess I'm not quite ready to give up data that took me y e a r s to assemble (and from which I'm still trying to salvage what I can).
The concordance maps every VMS token (yep, every one). It produces statistics and graphs on the length of the token, where it occurs, how frequently it occurs on a specific folio, and which glyph-groups precede and follow it if it also occurs as part of a longer token. There are breakdowns of specific sections, such as individual pool pages, cosmo pages, individual large-plants, and each individual rotum on the "map" folio.
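For readers who want a feel for this kind of bookkeeping, here is a minimal Python sketch of a token concordance. It is not the actual Concordance described above, only an illustration under stated assumptions: the transcription is assumed to be available as a dict mapping a folio label to its list of tokens, token length is counted in characters (a real transcription alphabet may use more than one character per glyph), and the function name `build_concordance`, the folio labels, and the sample tokens are all made up for the example.

```python
from collections import Counter, defaultdict


def build_concordance(folios):
    """Build a simple concordance (hypothetical helper, not the author's tool).

    folios: dict mapping a folio label to its tokens in reading order.
    Returns a dict mapping each distinct token to its statistics.
    """
    entries = defaultdict(lambda: {
        "length": 0,             # token length in characters (assumption)
        "positions": [],         # (folio, index of the token on that folio)
        "per_folio": Counter(),  # folio -> number of occurrences
        "preceding": Counter(),  # groups found before it inside longer tokens
        "following": Counter(),  # groups found after it inside longer tokens
    })

    # First pass: length, position, and per-folio frequency of every token.
    for folio, tokens in folios.items():
        for index, token in enumerate(tokens):
            entry = entries[token]
            entry["length"] = len(token)
            entry["positions"].append((folio, index))
            entry["per_folio"][folio] += 1

    # Second pass: where a token also occurs inside a longer token, record
    # what precedes and follows it. Counts are per distinct longer token,
    # not weighted by how often that longer token occurs.
    vocabulary = list(entries)
    for token in vocabulary:
        for longer in vocabulary:
            if len(longer) <= len(token):
                continue
            start = longer.find(token)
            while start != -1:
                before = longer[:start]
                after = longer[start + len(token):]
                if before:
                    entries[token]["preceding"][before] += 1
                if after:
                    entries[token]["following"][after] += 1
                start = longer.find(token, start + 1)

    return dict(entries)


# Placeholder data: invented strings and folio labels, not real readings.
sample = {
    "fA": ["ab", "cab", "ab"],
    "fB": ["abd", "ab", "xcabz"],
}
concordance = build_concordance(sample)
print(concordance["ab"]["per_folio"])
print(concordance["ab"]["preceding"], concordance["ab"]["following"])
```

The per-section breakdowns (individual pages, plants, and so on) would follow the same pattern by keying the counters on a section label instead of, or in addition to, the folio label. The second pass compares every pair of distinct tokens, which is fine for a sketch but would want an index for a full transcription.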
The patterns of relationship or similarity that I expected to see simply aren't there, not in any linguistic sense or even in any token-group sense that I can see.
I would never discourage someone from going down the same path (maybe they'll see something I'm overlooking), but I mention all of this mainly because I suspect the answer lies down a different path.