A possible way to break down Voynich text

RE: A possible way to break down Voynich text

ThomasCoon > 23-12-2016, 04:10 AM

(22-12-2016, 06:42 AM) stellar wrote: @ Coon

I hope not to be ignorant here.

I have a few questions. Have you stopped this theory or are you still working on it? Since you found 26 combinations and you explain them to be letters, would this indicate English instead of Latin? If you use combinations does this lower the word entropy? You say the spaces are fake and do you still believe that?

You must have worked on this quite some time so Great Job if that means anything?

Why are some glyphs treated as the same glyph when they are different, and how do you figure that?

Hi Stellar - thank you for the questions. I haven't stopped working on this (and should do more work on it), but the school year is in full swing and time is in short supply!

Regarding English or Latin: When I started working on the VMS, I made a conscious decision NOT to guess what the underlying language was. To assume a language didn't seem the right place to begin, because that assumption could color all my observations. Instead it's better to work on the code, and the underlying language would eventually become clear. Earlier I said there were 26 "units", now I'm not sure. My views have changed a little - the patterns I talk about here may only be one component in a larger encryption scheme.

The word entropy would be lower using these combinations, yes.

I still suspect that some spaces are contrived / at least not meaningful.

Thank you for the "Great Job" - I admire your diligence and hard work also.

Regarding why some glyphs equaled others: some 2-letter combinations often appeared in the same places as each other, so I wondered if they might have the same meaning. I'm not so sure anymore - that is still something to investigate. As JKP and I have discussed before, when you try to break up a string of Voynichese into units, the whole thing quickly becomes a bowl of alphabet soup. The code is very hard to grasp; the writer must have been very intelligent.
RE: A possible way to break down Voynich text

A.Wilmarth > 03-08-2024, 04:31 PM

I understand this topic hasn't received any attention in nearly 10 years, which likely indicates the author has moved on from this line of thinking. However, I am assuming it's still somewhat relevant, as it's included in [link] pinned to the top of Analysis of the text.

After reading this thread I noticed two unanswered question themes: first, how accurate is this on a large scale; and second, which bi-grams are the same, and whether the author of the VMS intended for these to be reversed.

Process

Using ThomasCoon's unit chart, TC's explanations, and posted examples of the Units in action, I created a list of all Units and their permutations in EVA. I removed spaces, which TC believes do not carry meaning, and experimented with different orders of operations for breaking the text up line by line.
It wasn't possible for me to directly match TC's process. Part of the difference is due to working in EVA rather than with the VMS itself; a transliteration is fixed, whereas working directly with the VMS allows a certain liberty in how glyphs are read. I attempted to account for this by being overly cautious with my unit permutations.

The other reason is, and I could be wrong, that it seems this was originally done with pen on paper, and if one way didn't work by the end, TC would go back and try another. This isn't a criticism, but it is difficult to replicate programmatically, particularly where the same VMS characters are broken up in different ways in the same line. This is to say, while my results were not a 1-for-1 match for TC's, I do believe I got close.
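To make the "order of operations" point concrete, here is a minimal Python sketch of the kind of line-by-line removal described above. It is not the actual program used for these results: the function name, the placeholder trick, and the example unit lists are all assumptions for illustration, and the bi-grams shown are stand-ins rather than TC's real Units.

```python
# Hypothetical sketch: strip a list of units from one transliterated line.
# Assumption: each matched unit is replaced by a placeholder so that the
# leftover characters on either side cannot fuse into a new, spurious match.
GAP = "\x00"

def strip_units(line: str, units: list[str]) -> str:
    """Return the characters of `line` not covered by any unit.

    Units are removed in the order given, so the order matters.
    """
    text = "".join(line.split())          # drop spaces, per TC's view
    for unit in units:
        text = text.replace(unit, GAP)    # mark covered characters
    return text.replace(GAP, "")          # what is left over

# Same line, two different orders of operations (illustrative units only).
print(strip_units("qokeedy chol", ["qo", "ke", "ed", "ch", "ol"]))
print(strip_units("qokeedy chol", ["ee", "qo", "ke", "ed", "ch", "ol"]))
```

With these stand-in units, the first order leaves a single character and the second leaves three, which is the sort of difference that makes a pen-and-paper process hard to reproduce exactly.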

Results

Q1. How accurate is this on a large scale?
I used an Archimedes-in-the-bathtub way of measuring success: I measured the original number of EVA characters in the VMS, ran the program replicating TC's method, then measured what was left.

Original characters (transliteration): 191545
Remaining characters: 8956
95.32% accuracy
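For anyone checking the arithmetic, the percentage is simply the share of characters that were removed. A small Python sketch, where the function name `coverage` is only an illustrative choice:

```python
def coverage(original: int, remaining: int) -> float:
    """Percentage of the original characters accounted for by the units."""
    return 100.0 * (original - remaining) / original

# Figures quoted above: 191545 characters before, 8956 left over.
print(f"{coverage(191545, 8956):.2f}%")  # prints 95.32%
```

Note that "accuracy" here means coverage of the character stream, not a check that each split is the intended one.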

For comparison, I selected the 27 highest-frequency bi-grams and used them in place of TC's Units, with no permutations; the only order of operations was to remove them in order of frequency:
ch, he, dy, ai, ok, in, ol, qo, ee, ed, ii, sh, da, ho, ey, ke, ot, yq, eo, ar, yo, al, ka, or, od, yc, hy.

Original characters (transliteration): 191545
Remaining characters: 52711
72.48% accuracy
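A baseline list like the one above could be built roughly as follows. This is a sketch under assumptions, not the code behind the posted numbers: it counts overlapping bi-grams per line after removing spaces (one plausible reading of the method), and the name `top_bigrams` is hypothetical.

```python
from collections import Counter

def top_bigrams(lines: list[str], n: int = 27) -> list[str]:
    """Return the n most frequent overlapping bi-grams, most frequent first.

    Assumption: spaces are removed first, so bi-grams may span word breaks.
    """
    counts: Counter[str] = Counter()
    for line in lines:
        text = "".join(line.split())
        counts.update(text[i:i + 2] for i in range(len(text) - 1))
    return [bigram for bigram, _ in counts.most_common(n)]

# Toy input only; a real run would read every line of an EVA transliteration.
print(top_bigrams(["qokeedy qokedy", "chedy"], n=2))
```

The resulting list would then be removed from each line in frequency order and the leftover characters counted, as in the comparison figures above.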

Q2. Are bi-grams interchangeable?

Using the methodology from my [link] post, I calculated the relative frequency of each Unit and its permutations across all topics, normalizing the results. I am working off the theory that content relates to the illustrations, and that a Unit's relative frequency alongside those illustrations shows its relatedness to that broad topic. So if Unit permutations have similar relative topic frequencies, it might provide evidence that the VMS author intended for them to be reversed, or that characters are interchangeable.

Here is a snippet of the results; the full results are here:
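Since the linked methodology is not reproduced here, the following Python sketch shows only one plausible way to compute a normalized per-topic frequency. The per-character rate, the normalization to shares summing to 1, the function name `topic_profile`, and the toy section texts are all assumptions.

```python
def topic_profile(unit: str, sections: dict[str, str]) -> dict[str, float]:
    """Share of a unit's relative frequency falling in each topic section.

    Step 1: rate = occurrences of the unit per character of section text.
    Step 2: divide each rate by the sum of rates so the shares total 1.
    Note: str.count counts non-overlapping occurrences.
    """
    rates = {}
    for topic, raw in sections.items():
        text = "".join(raw.split())
        rates[topic] = text.count(unit) / len(text) if text else 0.0
    total = sum(rates.values())
    return {t: (r / total if total else 0.0) for t, r in rates.items()}

# Toy data: compare a bi-gram with its reverse across two made-up sections.
sections = {"herbal": "okok dy", "bio": "ok dydy"}
print(topic_profile("ok", sections))
print(topic_profile("ko", sections))
```

If a unit and its reverse produced similar profiles across the real topic sections, that would be the kind of evidence for interchangeability described above.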

ThomasCoon_units_results.xlsx (Size: 56.88 KB / Downloads: 10)

Discussion

I'm going to add my personal thoughts here. Feel free to skip this as I assume, given the length of time that has passed, most everyone has already made up their mind about this one way or another, so the information above is likely only relevant to newer folks like myself coming across this for the first time.

Assuming content is related to illustrations, I'm not seeing much evidence that TC's Units or even bi-grams can be flipped. There are very few examples where permutations have the same major leans, let alone minor leans. Even within this thread the author seemed to be walking away from some of this interchangeability. If the illustrations are not related to the content, or perhaps only exist to tell the reader which way to break up the text, then my results for interchangeability are likely next to worthless. However, I might still point to the raw counts themselves: I would expect that if they could be interchanged freely, we would see a more even split, but there are often vast differences between bi-grams and their reverses.

As for TC's Units themselves, the results are very good, and this may suggest there is indeed a way to break the VMS down into smaller non-character units, though TC's original Units are likely not that way despite the impressive accuracy. TC would very likely have trimmed my cautious list of permutations down quite a bit, but even after that, differentiating c, h, and z alone would balloon the number of units. Also, my not being able to figure out a systematic approach that split characters the same way every time bothered me, but this could very well just be a failing on my part to understand it.

While my control/comparison test yielded significantly lower results, this close-to-zero-effort approach still yielded what I believe to be non-negligible results. A better order of operations, potentially based on the first or last letter, potentially removing it when the line has an odd number of characters, and a handpicked list of bi-grams may wind up with an accuracy rivaling TC's with far fewer units.

Overall, while I am not sure it's the right way to look at the VMS, I do think ThomasCoon's method is interesting and impressive even if ultimately I lean away from accepting it. The fact he worked this all out by hand makes it even more so. I believe that this, or a similar method, could be leveraged to improve transliteration, as it tends to highlight places where odd bi-grams stand out and deserve a second look.