Hi all,
I've been looking at different vords, trying to uncover an underlying structure that all or most VMS vords contain.
I've been wrong in the past and could be wrong now, but I wonder if characters that look alike (all "circle" characters, all gallows characters, all "ch/e" things) have the same function because they seem to appear in the same places. Take a look at the following:
[attachment=1414]
There seem to be four groups of characters, and they repeat in a predictable and uniform way:
"Circle" group: o a y ai
"Hook" group: r l n m
"C" group: c h ch sh e ee eee che ech etc.
"Tall" group: k t p f d s
Can all (or most) vords be reduced to this 4-unit paradigm? Is that the underlying structure of the Voynichese cipher?
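One way to test the question above is to map each vord to its sequence of groups. Here is a minimal sketch in Python. The group membership comes from the list above. The tokenizer rules are assumptions of this sketch: "ch" and "sh" are read as single glyphs, "a" followed by any run of "i" is treated like "ai", and glyphs not in the list (such as "q") are marked "?".

```python
import re

# Group membership as listed in the post.
GROUP_OF = {}
for name, glyphs in {
    "circle": "o a y",
    "hook": "r l n m",
    "c": "c h ch sh e",
    "tall": "k t p f d s",
}.items():
    for glyph in glyphs.split():
        GROUP_OF[glyph] = name

# Assumption: "ai", "aii", ... are one circle-group token; "ch"/"sh" are single tokens.
TOKEN_RE = re.compile(r"ai+|ch|sh|.")

def group_sequence(vord, collapse=True):
    """Return the list of group names for an EVA-transcribed vord.

    With collapse=True, consecutive glyphs of the same group count once,
    so "ee" or "che" fill a single "c" slot.
    """
    seq = []
    for tok in TOKEN_RE.findall(vord):
        group = "circle" if tok.startswith("ai") else GROUP_OF.get(tok, "?")
        if collapse and seq and seq[-1] == group:
            continue
        seq.append(group)
    return seq

print(group_sequence("daiin"))  # ['tall', 'circle', 'hook']
print(group_sequence("chedy"))  # ['c', 'tall', 'circle']
```

Running this over a full transcription and counting how many vords fit a fixed group order would give a rough measure of how far the 4-unit paradigm holds.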
One thing to consider is the "unique" vords as compared to those that repeat at least a few times.
Unique vords don't necessarily follow the [exact] same structure.
Note that unique vords can often be broken into two vords that do follow patterns similar to the vords around them, just as removing EVA-P from the beginning of paragraphs (treating them as a special case) often yields vords that follow regular patterns.
Also, something I've noticed is that EVA-ch -sh (which you've classified in the "c" group) are more positionally flexible than many of the others and MIGHT belong in a class of their own (or differ in some other way).
Did I overlook something, or could you have done with one less "group of four"?
I think this could be done by making the first purple one the start.
Any word starting left of it could be shifted right by one group.
(01-06-2017, 03:27 PM)ReneZ Wrote: Did I overlook something, or could you have done with one less "group of four"?
I think this could be done by making the first purple one the start.
Any word starting left of it could be shifted right by one group.
You're right Rene. That would make the table more concise. That is a good observation.
The main idea of the image is that this 4-unit structure may be the lowest-common-denominator that repeats throughout the manuscript (sometimes all 4 components are present, sometimes 3 or 2, sometimes only 1).
I could be completely wrong. But if I'm right, I can see a cipher application. For example:
to encode "a" = Group 1 sign + Group 2 sign + Group 3 sign + NO Group 4 sign
to encode "b" = Group 1 sign + Group 2 sign + NO Group 3 sign + Group 4 sign
to encode "c" = Group 1 sign + NO Group 2 sign + Group 3 sign + Group 4 sign
etc.
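The presence/absence idea can be sketched as a lookup from a 4-slot pattern to a letter. Only the three assignments given above ("a", "b", "c") are taken from the post. The function names and the use of slot numbers 1 to 4 are assumptions of this sketch, and the rest of the table is left open, as in the "etc." above.

```python
from itertools import product

SLOTS = (1, 2, 3, 4)

# Only these three assignments appear in the post; everything else is undefined.
PATTERN_TO_LETTER = {
    (1, 1, 1, 0): "a",  # Group 1 + Group 2 + Group 3, no Group 4
    (1, 1, 0, 1): "b",  # Group 1 + Group 2, no Group 3, Group 4
    (1, 0, 1, 1): "c",  # Group 1, no Group 2, Group 3 + Group 4
}

def presence_pattern(groups_present):
    """Turn the set of group numbers present in a unit into a 4-tuple of 0/1."""
    present = set(groups_present)
    return tuple(1 if slot in present else 0 for slot in SLOTS)

def decode(groups_present):
    """Look up the letter for a unit, or None if the pattern is not in the table."""
    return PATTERN_TO_LETTER.get(presence_pattern(groups_present))

def usable_patterns():
    """All patterns with at least one group present (an empty unit writes nothing)."""
    return [p for p in product((0, 1), repeat=len(SLOTS)) if any(p)]

print(decode([1, 2, 3]))       # a
print(len(usable_patterns()))  # 15
```

Four optional slots give 15 non-empty patterns, so if only presence or absence mattered, a scheme of this kind could distinguish at most 15 plaintext symbols per unit.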
-JKP- Wrote:One thing to consider is the "unique" vords as compared to those that repeat at least a few times.
Unique vords don't necessarily follow the [exact] same structure.
Note that unique vords can often be broken into two vords that do follow patterns similar to the vords around them, just as removing EVA-P from the beginning of paragraphs (treating them as a special case) often yields vords that follow regular patterns.
Also, something I've noticed is that EVA-ch -sh (which you've classified in the "c" group) are more positionally flexible than many of the others and MIGHT belong in a class of their own (or differ in some other way).
That's a very good point JKP - I wonder if ch belongs to its own class, or if "ch" and "e" clusters are different. I only grouped them together on the basis that they look alike (which may be a bad reason)
The simplest [link] can be done by an [a] & [o] check:
25% of the words contain neither an [a] nor an [o].
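A check like this is easy to reproduce on any EVA word list. The sketch below is only an illustration: the function name is made up, and the four sample vords are placeholders, not a transcription, so the result does not reproduce the 25% figure.

```python
def share_without_a_or_o(vords):
    """Fraction of vords that contain neither EVA 'a' nor EVA 'o'."""
    vords = list(vords)
    if not vords:
        return 0.0
    missing = sum(1 for v in vords if "a" not in v and "o" not in v)
    return missing / len(vords)

# Placeholder sample; swap in a real transcription word list to test the claim.
sample = ["daiin", "chedy", "shey", "qokeey"]
print(share_without_a_or_o(sample))  # 0.5
```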
While you are wholly right that characters group together, I don't think that your groups are necessarily the best that can be.
For example, lk is common, rk is rare, and nk or mk are basically non-existent. Similarly, s is common at the end of words, d a bit less common but clearly normal, and all the gallows are uncommon in that position.
Instead of groups I've preferred to think like this: for every character, which is the most similar? It lets me see more about similarities and differences, and informs the bigger groups.
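The lk / rk / nk / mk comparison and the word-final counts can be checked with plain bigram and last-glyph tallies. This is a minimal sketch under stated assumptions: the function names are made up, the sample strings are placeholders rather than transcription data, and each EVA letter is counted separately (so "ch" is counted as "c" followed by "h").

```python
from collections import Counter

def bigram_counts(vords):
    """Count adjacent glyph pairs inside each vord (never across a space)."""
    counts = Counter()
    for v in vords:
        counts.update(a + b for a, b in zip(v, v[1:]))
    return counts

def final_glyph_counts(vords):
    """Count which glyph each vord ends with."""
    return Counter(v[-1] for v in vords if v)

# Placeholder sample, for illustration only.
sample = ["olkeey", "lkaiin", "orkar", "dals"]
pairs = bigram_counts(sample)
print(pairs["lk"], pairs["rk"], pairs["nk"])  # 2 1 0
print(final_glyph_counts(sample)["s"])        # 1
```

Comparing such counts for glyphs inside one proposed group (l, r, n, m before k, for instance) shows how uniformly the group members actually behave.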
It's an interesting way of looking at it, Thomas. There surely seems to be a preference for certain shape types to follow others.
What confuses me a bit as a non-statistics minded person is that there can be several open spaces in the grid. The reasoning seems to be that filling in a spot is optional. But doesn't that kind of undermine the whole idea? How do you know that a grid like this underlies the system if it's not always filled in?
Just as a test I looked at the first word of a random page, which was on f88r:
So that's 4-1-2-3-1-1, right?
Doesn't that turn your proposed solution into something like binary code, where the exact glyph doesn't matter but rather whether it is expressed (1) or omitted (0)?
(01-06-2017, 06:49 PM)Koen Gh. Wrote: ![[Image: image.jpg?q=f88r-286-446-161-63]](https://voynich.ninja/extractor/image.jpg?q=f88r-286-446-161-63)
So that's 4-1-2-3-1-1, right?
Doesn't that turn your proposed solution into something like binary code, where the exact glyph doesn't matter but rather whether it is expressed (1) or omitted (0)?
Bingo, Koen - that's what may emerge. Something like this:
to encode "a" = Group 1 sign + Group 2 sign + Group 3 sign + NO Group 4 sign
to encode "b" = Group 1 sign + Group 2 sign + NO Group 3 sign + Group 4 sign
to encode "c" = Group 1 sign + NO Group 2 sign + Group 3 sign + Group 4 sign
etc.
Emma May Smith Wrote:While you are wholly right that characters group together, I don't think that your groups are necessarily the best that can be.
For example, lk is common, rk is rare, and nk or mk are basically non-existent. Similarly, s is common at the end of words, d a bit less common but clearly normal, and all the gallows are uncommon in that position.
Instead of groups I've preferred to think like this: for every character, which is the most similar? It lets me see more about similarities and differences, and informs the bigger groups.
Those are great points, Emma - I don't know how to account for the different distributions, such as rk vs. lk. That is a shortcoming with my theory.
I've been wrong before and could be wrong again

I like any approach that tries to look for connections between glyphs and word structure and such. But I personally feel that the next step you propose is implausible.
One problem I see is that there would be ambiguities if any glyph can be dropped, to produce a 0 in the implicit binary code. Consider a word 1234. How do we know that those four glyphs belong to the same set? Might it be 123 without 4, followed by a 4 without 123? Or may 3412 have been dropped between 2 and 3?
So if I understand it correctly, a Voynich word 1234 may represent one, two or three plain text letters?
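The ambiguity can be made concrete by listing every way a sequence of group numbers could be cut into units. This is a rough sketch, and it rests on an assumption not stated in the thread: within one unit the groups appear in strictly increasing order (1 before 2 before 3 before 4), with any group allowed to be missing. Spaces are ignored here.

```python
from itertools import combinations

def readings(slots):
    """All ways to cut a slot sequence into units with strictly increasing slot numbers."""
    n = len(slots)
    out = []
    for k in range(n):  # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            units = [tuple(slots[i:j]) for i, j in zip(bounds, bounds[1:])]
            # Keep the reading only if every unit respects the 1 < 2 < 3 < 4 order.
            if all(all(x < y for x, y in zip(u, u[1:])) for u in units):
                out.append(units)
    return out

for r in readings((1, 2, 3, 4)):
    print(r)
print(len(readings((1, 2, 3, 4))))  # 8
```

Under that assumption, the sequence 1234 has eight possible readings, ranging from one unit (1234) to four units (1, 2, 3, 4), so without some extra rule such as spacing the same string could stand for one to four plaintext letters.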
(01-06-2017, 11:11 PM)Koen Gh. Wrote: I like any approach that tries to look for connections between glyphs and word structure and such. But I personally feel that the next step you propose is implausible.
One problem I see is that there would be ambiguities if any glyph can be dropped, to produce a 0 in the implicit binary code. Consider a word 1234. How do we know that those four glyphs belong to the same set? Might it be 123 without 4, followed by a 4 without 123? Or may 3412 have been dropped between 2 and 3?
So if I understand it correctly, a Voynich word 1234 may represent one, two or three plain text letters?
I think this problem can be solved by spacing: if the encoder wants to separate "1" and "2" from "12", he can put a space between them.
Again I could be completely wrong about everything
