The Voynich Ninja

Full Version: A possible way to break down Voynich text
Hello Thomas;

Saw your analysis briefly and it reminded me of some work I did a few years back, which I shared on the Voynich Mailing List site.

The issue of constant repetitions of "words" or "word parts" or "letter sequences" in the VMs was in my opinion consistent with two main elements in what I thought of as a "cryptocartographic" interpretation.

(1) In this theory, each full page, including the "text" blocks, is primarily an image of a map. Parts of the map depicted (and there may be many different maps in the volume) can be rendered as shapes in plants, people, designs, etc.; similarly, other parts of the map (perhaps important parts) may be rendered as what seems to be text. The "text" might, in an exaggerated hypothetical example, be a block which appears as:

Sand Sandy  River smalltree  Hill
Sandy Grass River Hill
Sand Grass  River Grass Sand
Sandy Grass   River House Sand
Bigtree  Grass River Garden Sand
Sand Sandy  River  Sandy Sand 

This "text" area demonstrates a possible reason for the large redundancy we observe in the VMs; the point being that "reading" the text involves viewing the entire field exactly as positioned on the page, just as we view a poem, a calendar or a list. In that mode, the Voynichese is "written" a bit like the way a surveyor might record a plot of land or, perhaps, a text in "boustrophedon" but made only in one direction, the way a raster might scan an image electronically.


(2) In the above example, it is fairly clear that what is depicted is a kind of "map" of a dwelling near a couple of trees and hills in the middle of a sandy area; the dominant feature is a river which passes from "north" to "south". This feature is potentially important as pertains to the VMs because in many VMs folios, there are clear patterns of similar "words" which run vertically or diagonally through the "text". If the VMs has any consistent verbal meaning, these features must be explicable... somehow.
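To make the "river" visible without reading anything, one could simply count which word dominates each word position across the lines. The following is only a rough sketch on the made-up block above: it works on word order within a line rather than true position on the page, and the function name column_repeats is invented for this illustration.

```python
from collections import Counter

# The hypothetical "text" block from the example above, one string per line.
block = """Sand Sandy River smalltree Hill
Sandy Grass River Hill
Sand Grass River Grass Sand
Sandy Grass River House Sand
Bigtree Grass River Garden Sand
Sand Sandy River Sandy Sand"""

def column_repeats(text):
    """For each word position, return the most common word and its count."""
    rows = [line.split() for line in text.splitlines()]
    width = max(len(row) for row in rows)
    result = []
    for col in range(width):
        # Shorter lines simply contribute nothing to the later columns.
        words = [row[col] for row in rows if col < len(row)]
        result.append(Counter(words).most_common(1)[0])
    return result

for col, (word, count) in enumerate(column_repeats(block)):
    print(col, word, count)
```

In this toy block the third word position holds "River" on all six lines, which is the kind of vertical run described in point (2). A diagonal run would need a comparison that shifts by one position per line.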

Or so I thought. Haven't been doing much work on this recently.

David Suter
Thomas,


Would you consider using the outer text band of [link] as one of your texts for analysis? Not only does it have a high number of words starting with EVA 'o', it also has far more internal word repetition than any other band of text in the VMs Zodiac. Not to mention one of Stolfi's "'start here' markers".

I trust you are aware of the 4 x 17 symbol sequences of VMs f57v. They also conventionally start with the symbol that is EVA 'o'. Couldn't that be more than just coincidence?
To Anton and ReneZ:

ReneZ Wrote:That last point I think is important in order to advance, namely to clearly separate the analytical part (how to separate the text into basic units) and the speculative part: what it means (cipher or language?, which language?).

The most important questions to answer at the start are, in my opinion:
- How much of the text (e.g. in percent) does this really explain? It is certainly not 100%, and it is also correct that one may assume that the text has some errors. If it can't (yet) be checked by a piece of software, it should at least be checked against some relevant pages, one herbal-A, one herbal-B, one bio or one recipes page.
- How can we know that this is the only method? Really, it is not likely to be, but how to find the 'most likely' method?

(04-09-2016, 04:54 PM)Anton Wrote: Hi Thomas,

Next, about the orthography decomposition. As I believe I already noted in my review of Brian's paper, if we speak of some rules or mapping, then the rules are there to be adhered to. So if one proposes a "rule" that is followed in only 50% of the cases, that is simply not a rule. Accordingly, Brian is right about the testing of the pattern validity. Figures are needed to assess that. I'd say, 85% conformity is a must to ever begin any discussion and 95% would be good conformity to speak of a "rule" (allowing 5% for scribal errors).

Regarding how often my rules are adhered to:
Here are the first 3 pages of the Herbal (pictures attached). With my method, here are the results:

[link] Correct combinations: 228
Exceptions: 2
Rate of success: over 99%

[link] Correct combinations: 249
Exceptions: 5
Rate of success: over 97%

[link] Correct combinations: 148
Exceptions: 1
Rate of success: over 99%
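For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the rate of success is simply correct / (correct + exceptions); the function name success_rate is just for this illustration.

```python
# (correct combinations, exceptions) for the three Herbal pages listed above.
pages = [(228, 2), (249, 5), (148, 1)]

def success_rate(correct, exceptions):
    """Share of combinations that follow the rules, as a percentage."""
    return 100.0 * correct / (correct + exceptions)

for correct, exceptions in pages:
    rate = success_rate(correct, exceptions)
    print(f"{correct} correct, {exceptions} exceptions: {rate:.1f}%")

# Pooled figure over all three pages together.
total_correct = sum(c for c, _ in pages)
total_exceptions = sum(e for _, e in pages)
print(f"combined: {success_rate(total_correct, total_exceptions):.1f}%")
```

Under that assumption all three pages, and the three pages pooled together, come out above the 95% threshold Anton mentions.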

I do admit that I am making the (reasonable) assumption that cTh may be broken into ch t and likewise for cKh

Quote:ce and ee are definitely not the same thing, since that crossbar is met not only with e, but, e.g. with y. Wladimir provided examples of that.

The fact that c9 also appears is not a problem to me. Actually, that may reinforce my idea that ch is just ee:
ch = ee
c9 = ey
co = eo
(And all three are accounted for in my chart.)

Sorry, pictures of the herbal breakdowns (as mentioned in the last post):
Thomas, just so you know, when I was commenting on unique word-tokens, I wasn't talking specifically about your line of thought but in general.

Taking the approach of deconstructing the text and looking across spaces to find the patterns naturally leads to going in the other direction as well... of putting in spaces where they might be intentionally missing.

A number of people have proposed that "unique" vords might be names, for example, and I thought that was a reasonable idea to consider, but soon noticed that many of these "unique" words aren't unique at all but oftentimes a combination of two common vords, which led me to suspect that spaces might be either artificial or carefully contrived. It's a challenging problem algorithm-wise even if the components are correctly identified since the insertion or removal of spaces doesn't have to follow a regular pattern. But... given the regularity of the individual parts and the frequent patterns in word beginnings and ends, I'm leaning toward them being contrived, which makes it a slightly less intractable problem.
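As a rough illustration of that observation, the sketch below looks for tokens that occur only once but split cleanly into two frequent tokens. The token list is invented for the example (it is not a real transcription), and the threshold common_min is an arbitrary assumption.

```python
from collections import Counter

def compound_candidates(words, common_min=3):
    """Return words seen once that split into two words seen at least common_min times."""
    counts = Counter(words)
    common = {w for w, n in counts.items() if n >= common_min}
    found = {}
    for word, n in counts.items():
        if n != 1:
            continue
        # Try every cut point and keep the cuts where both halves are common.
        splits = [(word[:i], word[i:]) for i in range(1, len(word))
                  if word[:i] in common and word[i:] in common]
        if splits:
            found[word] = splits
    return found

# Toy token list, invented for illustration only.
sample = ["daiin"] * 4 + ["chol"] * 3 + ["chedy"] * 3 + ["choldaiin", "qokal"]
print(compound_candidates(sample))
```

This only covers the easy direction (a missing space between two known vords); it says nothing about spaces inserted inside a vord.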
(04-09-2016, 04:54 PM)Anton Wrote: Hi Thomas,

In my opinion, the key point in approaching the Voynichese text is that we don't know the real Voynich alphabet. E.g., we don't know whether iin stands for three characters or one character, whether l stands for one character or two characters (i plus a tail modifier), etc.

Hey Anton. That is true - we may not know which characters are original and which are modifications - but we can also see where each character appears in relation to each other - and this is usable data. Statistically some characters cannot be the same because they appear in very different circumstances. l for example appears in completely different environments than y so I really doubt they are the same. But m appears in many of the same places as y and also looks similar to it in the text - and this is something I only realized after trying to figure out what combinations y could join in.

Here are three examples where m looks like y with an extra loop at the top:

[Image: attachment.php?aid=537]

Also, I claimed yesterday that <Sh> = <se>.

<s> characters are clearly made of two strokes many times: one that looks like a "c" and then a swirling descender. When <Sh> is written, there is also a "c"-like shape and a swirling descender:

[Image: attachment.php?aid=538]

I know that this is not conclusive evidence, but in my "units" theory, <S> works exactly where <s> does, including in combinations like <Sy>.
In the topmost scan it is not m but g.

Quote:The fact that c9 also appears is not a problem to me. Actually, that may reinforce my idea that ch is just ee:
ch = ee
c9 = ey
co = eo

I believe there are many more examples with the "unusual" crossbar. If Wladimir reads this thread maybe he will refresh the example (I can't find it...)

Quote:[link] Correct combinations: 228
Exceptions: 2
Rate of success: over 99%

[link] Correct combinations: 249
Exceptions: 5
Rate of success: over 97%

[link] Correct combinations: 148
Exceptions: 1
Rate of success: over 99%

That's good, but three folios are insufficient to draw conclusions. Ideally, the whole corpus should be analyzed. But the fact that the three folios picked at once gave high compatibility is indeed suggestive.

Quote:I do admit that I am making the (reasonable) assumption that cTh may be broken into ch t and likewise for cKh

Consider e.g. this post, screenshots 2 - 6: [link]

***

...What I forgot to mention is one very important consideration. At least some EVA characters appear to have standalone significance. Consider the columns in [link] or [link], or the circles in f57v. These might represent a negligible percentage of the overall corpus, but it's clear that those cannot be attributed to scribal errors.
The recipe section seems to contain more incompatibilities with your system.

I just looked at [link] (a randomly picked recipe folio) and just in the last two paragraphs see some stuff like:
  • polarar lshedy
  • okeey lr
  • ssheey l shey
  • qokeey lchedy loty
etc.

By the way (as a far-reaching, unconfirmed conclusion) this might mean that botanical folios more or less follow a pre-defined template, while the recipe folios are more "loose".
suter Wrote:The issue of constant repetitions of "words" or "word parts" or "letter sequences" in the VMs was in my opinion consistent with two main elements in what I thought of as a "cryptocartographic" interpretation.

Dear David - your idea is a very interesting one! I can definitely see what you're talking about. I would be very interested in reading more research on the idea, if you decide to continue working with it!

(05-09-2016, 12:25 AM)Anton Wrote: The recipe section seems to contain more incompatibilities with your system.

I just looked at [link] (a randomly picked recipe folio) and just in the last two paragraphs see some stuff like:
  • polarar lshedy
  • okeey lr
  • ssheey l shey
  • qokeey lchedy loty
etc.

By the way (as a far-reaching, unconfirmed conclusion) this might mean that botanical folios more or less follow a pre-defined template, while the recipe folios are more "loose".

You're right - here are the breakdowns: 28 good combinations and 4 bad ones, all involving <l>:
  • p/ol/ar/ar l/s/he/d/y
  • ok/ee/y lr
  • s/sh/ee/y l sh/ey
  • qo/k/ee/y l/ch/e/d/y lo/t/y
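A breakdown like this can also be attempted mechanically. The sketch below uses a greedy longest-match split; the unit list is only a subset read off the hand breakdowns above (the full chart is not reproduced in this post), so both the list and the function name segment are assumptions made for illustration.

```python
# Hypothetical unit list, read off the hand breakdowns above; the full
# 26-unit chart is not reproduced here, so this subset is an assumption.
UNITS = ["qo", "ol", "ar", "ok", "sh", "ch", "ee", "he", "ey", "lo",
         "p", "s", "d", "y", "k", "e", "t"]

def segment(word, units=UNITS):
    """Greedy longest-match split; glyphs that fit no unit are wrapped in '?'."""
    ordered = sorted(units, key=len, reverse=True)
    pieces, i = [], 0
    while i < len(word):
        # Take the first (longest) unit that matches at position i, if any.
        match = next((u for u in ordered if word.startswith(u, i)), None)
        if match is None:
            pieces.append(f"?{word[i]}?")
            i += 1
        else:
            pieces.append(match)
            i += len(match)
    return pieces

for w in ["polarar", "okeey", "ssheey", "qokeey", "lchedy", "loty", "lr"]:
    print(w, "->", "/".join(segment(w)))
```

Greedy matching does not always cut where a hand breakdown does: it would split "shedy" as sh/e/d/y rather than s/he/d/y, so any exception count depends on the matching strategy as well as on the unit list.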

-JKP- Wrote:Taking the approach of deconstructing the text and looking across spaces to find the patterns naturally leads to going in the other direction as well... of putting in spaces where they might be intentionally missing.

That is true, -JKP-. I think, though, that if we get a string of text which clearly spells out German (or Latin, Italian etc.) phrases, we will know where spaces are supposed to be. For example if you saw this string of text:

Weholdthesetruthstobeselfevidentthatallmenarecreatedequal

You would clearly recognize it as English and know immediately where to put spaces.
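The same thing can be done by machine once a word list for the candidate language is available. This is a minimal dynamic-programming sketch with a tiny hand-picked vocabulary (an assumption for the example); a real attempt would need a full dictionary and some way of ranking competing splits.

```python
def split_words(text, vocab):
    """Return one segmentation of text into vocab words, or None if there is none."""
    # best[i] holds a segmentation of text[:i], or None if text[:i] cannot be split.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            if best[start] is not None and text[start:end] in vocab:
                best[end] = best[start] + [text[start:end]]
                break
    return best[len(text)]

vocab = {"we", "hold", "these", "truths", "to", "be", "self", "evident",
         "that", "all", "men", "are", "created", "equal"}
line = "Weholdthesetruthstobeselfevidentthatallmenarecreatedequal".lower()
print(" ".join(split_words(line, vocab)))
```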



-JKP- Wrote:It's a challenging problem algorithm-wise even if the components are correctly identified since the insertion or removal of spaces doesn't have to follow a regular pattern

E xact ly. I do n't thin k th ere is a pat te rn.
I thin k it 's intent ional ly rand om.


Anton Wrote:In the topmost scan it is not m but g.

Thank you - I was not even aware that <g> was a different letter than <m>. Alright, I'll have to look more at that situation. I'm not sure I agree with the EVA creators that they are different.

Anton Wrote:...What I forgot to mention is one very important consideration. At least some EVA characters appear to have standalone significance. Consider the columns in [link] or [link], or the circles in f57v. These might represent a negligible percentage of the overall corpus, but it's clear that those cannot be attributed to scribal errors.

That is also fair - I will look into those pages and see what I can find.



Anton Wrote:Consider e.g. this post, screenshots 2 - 6: [link]

I have seen these <chh> combinations and they may not affect my units - for example:

2. ySKcy = y/s/k/ey

5. ch cKhh y = ch / ek / ch / y

6. ch cKhhy = ch / ek / ch / y (same as 5)



Koen Gh Wrote:One interesting test would be to somehow try to encrypt a known text into Voynichese using your system, and see to what extent you manage to approach its appearance.

Here is a random phrase that I pulled from Augustine's De Vita Beata:
"Nullo modo dixerunt. Etiam hoc tertium respondete: spiritus..."

Assigning random Latin letters to my 26 units, I can produce:
yolar ar Shaiin seeo Shocsal chetoly qoch qoskeaiin arSheo qochot qosolaiin otchod keesy oeee qochod ...

Now, just for an attempt to recreate Voynich, this looks pretty Voynich-ish :) Actually, some of these words are vords.
(Key: <y> = N, <ol> = U, <ar> = L, <Sh> or <se> = o, <aiin> = m, <eo> = d, <s> = i, <al> = x, <ch> or <ee> = e, <ot> = r, <qo> = t, <ke> = a, <or> = H, <od> = s)
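For anyone who wants to play with this, the substitution step can be sketched as below. It uses only the part of the key listed above, takes the first alternative wherever two are given, keeps the plaintext word breaks and never reverses a unit, so its output will not match my string exactly. The name encode and the "?" marker for letters missing from the key are just for this illustration.

```python
# Key as listed in the post above (plaintext letter -> unit). Where the post
# gives two alternatives (<Sh> or <se>, <ch> or <ee>), only the first is used.
KEY = {"n": "y", "u": "ol", "l": "ar", "o": "Sh", "m": "aiin", "d": "eo",
       "i": "s", "x": "al", "e": "ch", "r": "ot", "t": "qo", "a": "ke",
       "h": "or", "s": "od"}

def encode(text, key=KEY):
    """Replace each letter by its unit; letters missing from the key become '?'."""
    words = []
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        if letters:  # skip tokens that are punctuation only
            words.append("".join(key.get(c, "?") for c in letters))
    return " ".join(words)

print(encode("Nullo modo dixerunt"))
```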
@ThomasCoon,

Thanks for the response.

(04-09-2016, 03:23 AM)ThomasCoon Wrote: 4. As you can see, <l> is preceded by <o> so often that it is not random chance. It seemed to me that <ol> must be a combination, and there are 25 similar combinations that repeatedly appear.

(04-09-2016, 03:23 AM)ThomasCoon Wrote: That is a good question. I think statistics would be the answer. If l is not supposed to function with <o> or <a>, why does it almost always appear with <o> or <a>?

I agree with the particular assessment about the ol and al combinations. I came to a similar conclusion in my own theory. However, I think it's a stretch to apply this principle to every glyph in the text.

If statistics is the answer, then I have an answer. For what it's worth, I once made an algorithm to find the most significant glyph combinations in the entire corpus. I don't remember how I defined "significant" here, but it was something more interesting than just identifying the most common ones. I do know that it considered the principle of "we can also see where each character appears in relation to each other - and this is usable data". This was not part of an approach that tried to delineate all text into combinations like yours, so not everything can be covered. Briefly, the resulting combinations are in the list below, sorted by cluster.
  • qo
  • cTh, cKh, cPh, te, tch, kch, kee, pch
  • ee
  • al, ar, ol, or, dy, ey, eey
  • ai, aii
  • am

This list is rather different from yours.
Just for the record, I held this finding to describe the text properties but not explain them.
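Since I can't recall my own definition of "significant", here is one common stand-in rather than what I actually used: pointwise mutual information, which scores a pair of adjacent glyphs by how much more often it occurs than its parts would predict. This sketch treats every EVA letter as one glyph (so cTh and the like are not handled), and the sample words are invented.

```python
import math
from collections import Counter

def bigram_pmi(words, min_count=2):
    """Pointwise mutual information (in bits) of adjacent glyph pairs within words."""
    singles, pairs = Counter(), Counter()
    for w in words:
        singles.update(w)
        pairs.update(zip(w, w[1:]))
    n_single = sum(singles.values())
    n_pair = sum(pairs.values())
    scores = {}
    for (a, b), n in pairs.items():
        if n < min_count:
            continue  # rare pairs give unstable scores
        p_ab = n / n_pair
        p_a, p_b = singles[a] / n_single, singles[b] / n_single
        scores[a + b] = math.log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented toy words, for illustration only.
for pair, score in bigram_pmi(["qokal", "qokol", "daiin", "okal", "chol", "qodaiin"]):
    print(pair, round(score, 2))
```

A high score only says that two glyphs attract each other; like my list, it describes the text and does not explain it.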

(04-09-2016, 03:23 AM)ThomasCoon Wrote: the majority seem completely unrestrained to certain word positions (except for the ones like <qo> you mentioned - but for that see point 7).

That's not true. The strong relationship between glyphs and word position has been noted and analysed many times.
For example, your unit 18 aiin appears almost always at the end of words. Why? Why do we not frequently see other ordered combinations of unit 18 like aiinol, aiinkk or chaiinaiin? If all word spaces are fake as you suggest, there is still the question for the system: Why do certain units like aiin, but not others, (almost) always precede the fake word space?
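The positional claim is easy to quantify for any unit. A minimal sketch, with invented sample words and a helper name (final_share) made up for the example:

```python
def final_share(words, unit):
    """Fraction of occurrences of unit that sit at the very end of a word."""
    total = at_end = 0
    for w in words:
        start = w.find(unit)
        while start != -1:
            total += 1
            if start + len(unit) == len(w):
                at_end += 1
            start = w.find(unit, start + 1)
    return at_end / total if total else 0.0

# Invented toy words, for illustration only (not a real transcription).
sample = ["daiin", "qokaiin", "aiinol", "chedy", "okaiin"]
print(final_share(sample, "aiin"))
```

Run over a full transcription, a share close to 1 for some units and not for others is exactly the pattern that needs explaining if the word spaces are fake.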

(04-09-2016, 03:23 AM)ThomasCoon Wrote: the <m> character appears in the exact same places that we see <y>: most importantly after <d>. Anyone who has spent time with the text knows that <dy> is everywhere, but we also have <dm>. I believe they might be the same character.

That's not true. m appears mostly after a.

There are lots of characters/combinations that appear in the same places that other characters/combinations appear. That applies to the Voynich Manuscript and natural languages. In English, consider vowels, or the word endings "-ing" and "-ed". Why would that mean that they're the same thing?
In my list shortly above, the combinations in each cluster replace each other in the same context quite often (that's why they're in the same cluster). However, from that I would not conclude that they're the same. It's just not convincing enough for such a definite conclusion.
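Claims like "m appears after a" or "m appears where y does" can be checked by tabulating what precedes each glyph and comparing the two tables. The sketch below does that on invented words; the overlap measure (shared probability mass) is one simple choice among many, and both function names are made up for the example.

```python
from collections import Counter

def preceding(words, glyph):
    """Count what comes immediately before each occurrence of glyph ('^' = word start)."""
    counts = Counter()
    for w in words:
        for i, c in enumerate(w):
            if c == glyph:
                counts[w[i - 1] if i > 0 else "^"] += 1
    return counts

def overlap(c1, c2):
    """Shared probability mass of two count tables (0 = disjoint, 1 = identical)."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    if not n1 or not n2:
        return 0.0
    return sum(min(c1[k] / n1, c2[k] / n2) for k in set(c1) | set(c2))

sample = ["dam", "otam", "chedy", "qokedy", "dal", "dm"]  # invented toy words
m_ctx, y_ctx = preceding(sample, "m"), preceding(sample, "y")
print(m_ctx, y_ctx, overlap(m_ctx, y_ctx))
```

Even a high overlap would only show that two glyphs share contexts, which, as with "-ing" and "-ed", is not the same as showing they are one character.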

(04-09-2016, 03:23 AM)ThomasCoon Wrote: Regarding #9 and #10, those are just ways to incorporate the gallows, which only appear inconsistently in the text (always in the first line of a paragraph) which means they are likely a variant of another character. No natural language puts certain sounds only at the beginning of a string of speech, so they must be another letter in the text just written differently.

The gallows don't always appear in the first position in the line. I would link to Job's query engine to demonstrate this, but I can't find it (you can tell this is a common problem for me...)

There has already been a paper* analysing the hypothesis that the gallows characters might be capital letter equivalents of other characters. If the gallows characters are indeed capitals of other characters, which ones are they? Which ones can they replace without affecting the text statistics? The result was that the gallows characters just replace each other. So either they are capitals/alternate forms for each other, or that type of relationship is simply not applicable here. The former fits your idea that these are really the same "unit", but note that just about every glyph exhibits the same behaviour (some examples are the combinations in my list above), and it would be reductive to say that those must all be the same unit too.

*Morningstar, J. B. (2001). Gallows Variants as Null Characters in the Voynich Manuscript. Chapel Hill, NC, USA: University of North Carolina.

(04-09-2016, 03:23 AM)ThomasCoon Wrote: In the text, <m> looks like <y> with an extra loop at the top - I actually never realized this before trying to figure out what <m>'s function was.

(04-09-2016, 07:45 PM)ThomasCoon Wrote: But m appears in many of the same places as y and also looks similar to it in the text - and this is something I only realized after trying to figure out what combinations y could join in.

I hate to self-advertise, but my (admittedly controversial) [link] offers an alternative explanation for this visual construction. Section 1.2 is the bit relevant to your comment here.

(04-09-2016, 07:53 AM)ReneZ Wrote: That last point I think is important in order to advance, namely to clearly separate the analytical part (how to separate the text into basic units) and the speculative part: what it means (cipher or language?, which language?).

I agree with Rene's statement. In your response, you mix these two things up a lot. For example in the below quote:

(04-09-2016, 03:23 AM)ThomasCoon Wrote: I admit that the possibility of reversing some combinations will bring up the number of possible units, but regardless whether <ar> or <ra>, for example, both should still only correlate to one plaintext Latin letter. So it's not as if I'm devising a really loose system which will spell anything I want to read into the text - I've seen the pitfalls of that in other "decryptions"! :)

I was not talking about decryption there. By "fitting" I was simply referring to the first step of the analysis, which is proving that the manuscript's text can be decomposed into your list of units with a high success rate. If the word formation system is so loose and has such a large list of units (>60, not just 26) then it is easily possible for them to fit by chance.
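One way to put a number on "fitting by chance" is to compare the fit on real words with the fit on the same words after scrambling the glyphs inside each one. The sketch below shows the idea only; the unit list and the words are invented and far too small to prove anything, and the function names are made up for the example.

```python
import random

def decomposes(word, units):
    """True if word can be written as a concatenation of units."""
    # ok[i] is True when the first i glyphs can be covered exactly by units.
    ok = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        ok[end] = any(ok[end - len(u)] and word.endswith(u, 0, end)
                      for u in units if len(u) <= end)
    return ok[len(word)]

def fit_rate(words, units):
    """Fraction of words that decompose completely into units."""
    return sum(decomposes(w, units) for w in words) / len(words)

def shuffled(words, rng):
    """Same words with the glyph order inside each word scrambled."""
    out = []
    for w in words:
        glyphs = list(w)
        rng.shuffle(glyphs)
        out.append("".join(glyphs))
    return out

units = ["qo", "ol", "ar", "ee", "ch", "k", "y", "d"]   # hypothetical short list
words = ["qokeey", "cholar", "qoky", "cheey", "olar"]   # invented toy words
rng = random.Random(0)
print("real:", fit_rate(words, units))
print("shuffled:", fit_rate(shuffled(words, rng), units))
```

If a unit list fits scrambled words nearly as well as real ones, a high success rate on the real text tells us little; the larger and looser the list, the smaller that gap will tend to be.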

I found [link] that I was referring to. I think the concern about tightness/coverage near the end applies especially to this point.

(04-09-2016, 01:48 AM)ThomasCoon Wrote: Well, I actually think that the spaces in text are completely fake. I think that he wrote large words (ex. "vocabulary") into two words: ("vocab-ulary").

(04-09-2016, 05:17 PM)ThomasCoon Wrote: I believe that many ch (ee) and other 2-letter combinations are hidden: for example, one "c" is at the end of a word while the next "c" is at the beginning of the following word.

(04-09-2016, 07:44 PM)-JKP- Wrote: Thomas, just so you know, when I was commenting on unique word-tokens, I wasn't talking specifically about your line of thought but in general.

Taking the approach of deconstructing the text and looking across spaces to find the patterns naturally leads to going in the other direction as well... of putting in spaces where they might be intentionally missing.

(04-09-2016, 08:44 AM)Koen Gh. Wrote: There is more to the structure of Voynichese words, and I'm not sure if it can be explained just by word breaks.

I also agree with this sentiment. If you can ignore arbitrary word breaks whenever it is convenient to fit your system, this further increases the looseness of the system, and hence makes it more likely that they fit by chance.

(04-09-2016, 04:54 PM)Anton Wrote: Brian, nice to see you on the forum again.

No problem, Anton. :)
(04-09-2016, 07:45 PM)ThomasCoon Wrote: Statistically some characters cannot be the same because they appear in very different circumstances. l for example appears in completely different environments than y so I really doubt they are the same. But m appears in many of the same places as y and also looks similar to it in the text - and this is something I only realized after trying to figure out what combinations y could join in.

One has to be careful with such considerations, because it can equally be argued that two shapes that appear in completely different environments can represent the same character, and the two forms are to be used in these different environments as a result of some rule.
This happens with 's' for example (special form at end of word), or with 'r' (r rotunda following a round shape).

The two forms of s don't look at all alike, while the two forms of r have some similarity.