The Voynich Ninja

Pages: 1 2 3 4

If we assume that the Voynich text is linguistic in nature, what alternatives are there to concluding that the script is an alphabet?

It is often considered that you can split most scripts up by the way they work into one of three groups:
1. alphabets where characters represent individual sounds;
2. syllabaries where characters represent whole syllables; and
3. logographic systems where characters represent whole words.

The size of the character set is often taken to be diagnostic: alphabets have from 20 to 50 characters; syllabaries have from 50 to 100 characters; and logographic systems have many hundreds or thousands of characters. The number of characters thus reflects the number of underlying items which the characters represent: languages have more words than they have syllables, and more syllables than they have sounds. Knowing that the Voynich script has 20-25 characters, we can rule out the language having only 25 words or 25 syllables, whereas 25 sounds is realistic.
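The rule of thumb above can be written down as a tiny classifier. This is only a sketch: the cut-offs are the rough ranges quoted in the post, not hard boundaries, and the function name `guess_script_type` is made up for illustration.

```python
def guess_script_type(n_chars: int) -> str:
    """Rough script-type guess from the size of a character inventory.

    The cut-offs follow the rule of thumb quoted above (alphabets 20 to 50,
    syllabaries 50 to 100, logographic systems hundreds or more). The ranges
    touch at 50, so 50 is counted as alphabetic here; that choice is arbitrary.
    """
    if n_chars < 20:
        return "below the usual alphabetic range"
    if n_chars <= 50:
        return "alphabet"
    if n_chars <= 100:
        return "syllabary"
    # The rule of thumb leaves a gap between 100 and "many hundreds";
    # everything above 100 is lumped together here.
    return "logographic or mixed"


# Languages have fewer sounds than syllables, and fewer syllables than words,
# so a 20-25 glyph inventory only fits the first of these.
print(guess_script_type(25))  # alphabet
```

The point of the sketch is that the Voynich count lands well inside the first range, so the count alone would have to be badly wrong before a syllabary came into view.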

Of course, not all characters must represent sounds. Some may be punctuation or ideographic. Also, some characters may represent more than one sound depending on the context. But the basic principle that any given instance of a character stands for a single sound should mostly hold. The question of whether the script represents every kind of sound (for example, whether it is an abjad without vowels) would remain open, but abjads are still a kind of alphabet.

We might also have the count of characters wrong, however, with there being more or fewer distinctions than we currently make. But there is no way a decrease in the number of characters would make an alphabet less likely, and any increase would have to be substantial before we could begin to consider syllabaries.

What alternatives are there, if we believe the characters are a script to represent language, other than the script being alphabetic in nature? I'm genuinely looking for evidence to shake an assumption I've had for a long time.

(Note: I understand that the linguistic nature of the text is an assumption, but that's not the topic of this thread.)

Numbers (which usually further map to an alphabet).

I think it's an alphabet that's been made a bit harder to read by using two forms for several letters. But actually that's what we do as well with lower and upper case - you could say our alphabet is a double set of characters as well. So that would still make it an alphabet, I guess, even though some might call it a cipher.

It could also be an alphabet mixed with a large number of abbreviations, but that would still make it an alphabet.

I think to really get out of the realm of alphabets, you'd have to assume a fully coded text, where for example each character stands for a step in a process (cf. Don's theory).

(10-03-2016, 08:12 PM) Koen Gh. Wrote: I think it's an alphabet that's been made a bit harder to read by using two forms for several letters. But actually that's what we do as well with lower and upper case - you could say our alphabet is a double set of characters as well.

This would make the number of characters in the VMS alphabet very small.

JKP: agreed, and that is problematic. I'm still working on it, so everything is subject to change, but these are some ways I would compensate for that:

1) Allow several glyphs to represent two or more closely related sounds, like /l/ and /r/ (that's one I'm fairly certain about).
2) Some glyphs are "single", so the total is not simply halved.
3) I see the bench glyphs as ligatures. Various "hats" on the benches represent various sounds. There are two or three different "hats" and one standard, which turns this into three or four ligatures.
4) Not every language needs a large number of vowels; just look at Spanish, which has only five.
5) This one is not easy to explain; I hope I can make clear what I mean. I think at least part of the manuscript contains a transcription of Indic languages for Greek speakers (or other "outsiders" who were familiar with Greek). When transcribing a language by sound alone ("sounds like..."), I think you'll often lose some sounds. It's like the English dental fricative "th", which is simply not heard by people who don't have it in their own language. Here, people hear "th" as /t/ or /d/, so if they transcribed English (and if "th" were a single glyph), that glyph would be lost.
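Points 1 and 5 both shrink the number of distinct glyphs a writer needs. A minimal sketch of that arithmetic, assuming a small made-up inventory (ASCII stand-ins, not a claim about any real source language): each merge writes one sound with another sound's glyph, and the glyph count drops by one per merge.

```python
# Hypothetical source-language sounds, using ASCII stand-ins for phonemes.
SOURCE_SOUNDS = {"a", "e", "i", "o", "u",
                 "p", "t", "d", "k", "s", "l", "r", "th"}

# Hypothetical merges made by the transcriber:
#   point 1: /l/ and /r/ share one glyph
#   point 5: "th" is heard as /t/, so it gets no glyph of its own
MERGES = {"r": "l", "th": "t"}


def glyphs_needed(sounds, merges):
    """Distinct glyphs required once each merged sound is written with its target's glyph."""
    return {merges.get(s, s) for s in sounds}


needed = glyphs_needed(SOURCE_SOUNDS, MERGES)
print(len(SOURCE_SOUNDS), "sounds ->", len(needed), "glyphs")  # 13 sounds -> 11 glyphs
```

A handful of such merges, plus ligatures counted as single glyphs, is one way a small Voynich inventory could still sit on top of a larger sound system.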

(10-03-2016, 08:46 PM) Koen Gh. Wrote: JKP: agreed, and that is problematic. I'm still working on it, so everything is subject to change, but these are some ways I would compensate for that:

3) I see the bench glyphs as ligatures. Various "hats" on the benches represent various sounds. There are two or three different "hats" and one standard, which turns this into three or four ligatures.

I think it's possible that the bench glyphs are ligatures, either that or there are two extra characters, rarely used, that look like the beginning and end of the bench (usually straddling a gallows character).

The problem with that pesky bench char is that the ones with the cap and the ones without appear to behave in the same way, as do the chars with variously shaped caps, which makes it difficult to ascertain whether they are intended to be read differently.

Glad to see I'm not the only one utterly annoyed by the bench.

So this has been specifically tested? They all behave exactly the same?

Now, on the other hand, gallows are a bit like capitals: they always go near the front. Another set of double glyphs I see are vowels with a flourish (EVA a - y is the most common and certain one), which strongly favour word endings. If these characters mostly claim those specific positions, the result is that the "core" glyphs, the single ones and the bench ligature, get a bit less freedom of movement and appear to behave more similarly to each other. Does that make sense?
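One concrete way to start on "do they all behave the same?", and to see the positional effect described above, is to compare where each glyph falls within words. A minimal sketch, assuming words are already available as EVA strings; the multigraph list is deliberately incomplete and the sample is just a handful of common EVA words, used only to show the mechanics.

```python
from collections import Counter, defaultdict

# EVA writes some single glyphs with several letters; the bench is "ch",
# and "sh" is the bench with its plume. This list is an assumption and
# deliberately incomplete.
MULTIGRAPHS = ["cth", "ckh", "cph", "cfh", "ch", "sh"]


def tokenize(word):
    """Split an EVA word into glyphs, matching the multigraphs above first."""
    glyphs, i = [], 0
    while i < len(word):
        for m in MULTIGRAPHS:
            if word.startswith(m, i):
                glyphs.append(m)
                i += len(m)
                break
        else:  # no multigraph matched: take one letter
            glyphs.append(word[i])
            i += 1
    return glyphs


def position_profile(words):
    """Map each glyph to the fraction of its tokens that are word-initial, medial or final.

    A one-glyph word counts as initial.
    """
    counts = defaultdict(Counter)
    for word in words:
        glyphs = tokenize(word)
        last = len(glyphs) - 1
        for idx, g in enumerate(glyphs):
            pos = "initial" if idx == 0 else "final" if idx == last else "medial"
            counts[g][pos] += 1
    return {
        g: {p: c[p] / sum(c.values()) for p in ("initial", "medial", "final")}
        for g, c in counts.items()
    }


SAMPLE = ["daiin", "chedy", "shedy", "qokeedy", "chol", "shol"]
profile = position_profile(SAMPLE)
print(profile["ch"], profile["sh"])
```

Run on a full transcription, similar profiles for `ch` and `sh` would be weak evidence that the capped and uncapped benches are one glyph. Position alone is a blunt test, though: as argued above, other glyphs crowding certain slots can make unrelated glyphs look alike, so neighbouring glyphs would need comparing too.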

(10-03-2016, 07:27 PM) Emma May Smith Wrote: If we assume that the Voynich text is linguistic in nature, what alternatives are there to concluding that the script is an alphabet?

Off the top of my head, here are some possible ways to encode information using a character set. I am not endorsing any of them, just offering them as examples.
1. Symbolic (maybe mixed logographic?): a key part of the string is the word and everything else is modifiers. For example, 1+Gbos could encode the word cow used as an adjective (the 1 for part of speech, the + to indicate that it has a positive connotation as used in this sentence, the G to indicate feminine and bos as the base word being modified); bull would be 1+Mbos. That would make something like 1-Mbos 2+Mpoo guessable. In theory 2+Gbos could be fresh young coriander if 2 indicates nouns, + means fresh or new for nouns, G indicates a plant, b indicates a subset of plant types like medicinal herbs and os indicates the one that corresponds to coriander. The number of usable symbols is very large because character order and a rule set mean that one character may play many roles. Systems like this are used in knowledge representation; they have the advantage that you can be very specific about a piece of information in a small amount of space. The bad thing is that while the encoding system is always logical, the logic isn't always obvious.

2. Mixed phonological representation where some characters always represent syllables and other characters always represent a single sound.

3. Mixed symbolic where you have a phonological representation but special characters or rules can change the meaning without changing pronunciation or symbol order. God and god in English is a trivial example of this.

There are probably a lot more systems and combinations of systems possible. Whatever is going on in VMS it's not trivial.
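The first scheme is essentially a positional record format. A minimal sketch of a parser for tokens like 1+Gbos, assuming the field layout described in point 1 (digit, sign, class letter, base); the lookup tables are hypothetical and hold only the example values from the post. The tables are keyed on the part-of-speech digit, so G reads as "feminine" after 1 but "plant" after 2, which is how one character can play several roles. For simplicity the sketch does not split a noun's base (b + os) any further.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup tables built only from the example in the post.
# Keys pair the part-of-speech digit with the symbol, because the same
# symbol means different things for adjectives ("1") and nouns ("2").
PART_OF_SPEECH = {"1": "adjective", "2": "noun"}
SIGN = {("1", "+"): "positive", ("1", "-"): "negative", ("2", "+"): "fresh or new"}
CLASS = {("1", "G"): "feminine", ("1", "M"): "masculine", ("2", "G"): "plant"}

# digit, sign, one capital class letter, then a lower-case base
TOKEN = re.compile(r"(?P<pos>\d)(?P<sign>[+-])(?P<cls>[A-Z])(?P<base>[a-z]+)")


@dataclass
class Parsed:
    part_of_speech: str
    sign: str
    cls: str
    base: str


def parse(token):
    """Split a token such as '1+Gbos' into its fields; unknown codes become 'unknown'."""
    m = TOKEN.fullmatch(token)
    if m is None:
        raise ValueError(f"malformed token: {token!r}")
    pos = m["pos"]
    return Parsed(
        part_of_speech=PART_OF_SPEECH.get(pos, "unknown"),
        sign=SIGN.get((pos, m["sign"]), "unknown"),
        cls=CLASS.get((pos, m["cls"]), "unknown"),
        base=m["base"],
    )


for t in ["1+Gbos", "1+Mbos", "2+Gbos"]:
    print(t, parse(t))
```

The catch mentioned in the post shows up immediately: without the tables, the token stream alone gives no direct hint of what each position means.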

Crezac, about your point 3: I've read the idea before that the gallows are like determinatives in hieroglyphs; they determine the category the following symbols will belong to. These are mostly used in hieroglyphic scripts because they help to massively reduce the number of required glyphs.

They could be used in mixed systems as well, though, as in earlier forms of Coptic.

(10-03-2016, 09:47 PM) crezac Wrote: Off the top of my head here some possible ways to encode information using a character set. I am not endorsing any of them, just offering them as examples.

I think 1) would be more a question of the underlying message. What you're suggesting is more like a code or an artificial language. I'm not wholly against the idea, though I fear that it would prove almost impossible to crack. I wonder how we could show that your suggestion is true, or at least as plausible as the alternatives?

For 3) linguistic analysis should still be possible, at least on the phonological level.

Post authors, in order: Emma May Smith, -JKP-, Koen G, -JKP-, Koen G, -JKP-, Koen G, crezac, Koen G, Emma May Smith