The Voynich Ninja
Syllabification - Printable Version




RE: Syllabification - Emma May Smith - 07-03-2016

(07-03-2016, 12:04 AM)Anton Wrote: The "pronounceability" of the EVA transliteration is mere phantom,...

As I understand it, EVA was designed to be pronounceable. Of course not all speakers of all languages can pronounce it, but obviously unpronounceable clusters were avoided if they were common and there is a good variation of vowels and consonants. I don't know how much insight the devisers of EVA were working with, but by seeking to make it pronounceable they may well have stumbled upon a fundamental principle of language: that sounds are ordered in a particular way which is common across all languages. Some languages allow for a more complex ordering, some less, but the broad principles are the same. The assignment of vowel and consonant characters to actual vowels and consonants could be a result of this.


RE: Syllabification - ReneZ - 07-03-2016

The purpose of every transcription alphabet that has ever been defined for the Voynich MS is to convert the handwritten text into something that can be parsed and analysed by computer software. It is not meant in any way as a 'translation'.

The 'best' alphabet available at the time Eva was defined was Frogguy, but its very frequent use of the quote character made it quite suboptimal for computer parsing. To improve on this, the Eva character set was selected as a set of 26 contiguous ASCII codes.

One of the big problems with the Voynich script is that one cannot be sure which combinations of strokes are intended to form a single character. The gallows embedded in Eva-ch is a specific case. Furthermore, people tended to treat consecutive strokes of 'i' as a single character, but consecutive strokes of 'c' as multiple characters. This may well be correct, but we don't know. To avoid all this, Eva (like Frogguy) does not make any assumptions, and the 'building of characters' is left as a next step to anyone interpreting the text.

An accurate transcription requires an alphabet of much larger size. Extended Eva and GC's v101 both go in this direction, but primarily to represent the rare characters. To 'do it right' one would need some combined superset of Eva and v101, such that one could translate all existing transcriptions into this new alphabet without loss, and eventually make a new 'super-transcription'. This would leave an additional very big problem to the interpreter: deciding which slightly different shapes represent the same or different characters.

One advantage of Eva is that it is very easy to learn and allows easy communication in fora like this. The big risk is that people may take it as an interpretation of the text which, as Anton correctly says, should not be done. It was never its purpose.


RE: Syllabification - Sam G - 08-03-2016

(07-03-2016, 12:04 AM)Anton Wrote: Even if one sets aside the cipher theory, I am afraid that no syllabification is possible without an understanding of the alphabet. There is no confirmation that any of the transcription alphabets, EVA included, accurately represent the real alphabet adopted by the author.

We can't know it perfectly of course, but the distinction between consonant and vowel does seem quite clear.

Quote:This fact suggests nothing. A person wishing to conceal his message could well use these letters to represent consonants, for an additional layer of obscurity.

Everything about the script suggests that it was intended to emphasize the structure of the text, not obscure it.

Quote:Besides, EVA e is not like Roman "e". It is like Roman "c".

And "c" is an "e" without the crossbar.  And there's a pretty clear reason why the scribe omitted the crossbar - the whole system of straight-stroke and curved-stroke letters which follow <a> and <e> respectively.

Quote:If the text is abbreviated, then single characters would represent character blocks, like 9 (EVA y) represented "us" at the end of the word and "con" at the beginning of the word in medieval Latin documents.

Among other problems, there aren't enough different glyphs for the text to be abbreviated Latin or anything else, although the shapes of the letters do derive from symbols used in Latin abbreviation (and the Roman alphabet).

Quote:
Quote:Really, the fact that EVA transliteration makes the text basically "pronounceable", as would likely any other transliteration scheme that mapped <a>, <e>, <o>, and <y> to vowels and the other letters to consonants (and considered <i> as a modifier), is by itself strong evidence that its implicit assignment of consonant and vowel status is basically correct.

The "pronounceability" of the EVA transliteration is a mere phantom, partly because the transcription is not fully matched to the Latin alphabet (e.g. substitute "c" for EVA e, as indicated above, and you will lose this pronounceability at once),

The fact that there's a mapping that makes the text pronounceable at all is significant.  If you don't think so, try finding a simple letter-for-letter mapping to make, say, the Beale Ciphers pronounceable.

The fact that the pronounceable mapping preserves the obvious consonant/vowel distinction in the Latin-derived VMS script is also significant.

Quote:and partly because EVA is Latin-alphabet centric - while there is no confirmation that the Latin alphabet was the basis for the Voynichese script. For example, characters like a, c, i, o are found in the Cyrillic alphabet, characters l, d, r, y, q are like Arabic digits, and the rest of the characters are not found in the Latin alphabet at all.

I'd say that being Latin-centric and focusing excessively on EVA is the worst approach for those who wish to explore the plain-text language path. EVA has quite little to do with the real Voynichese alphabet, and absolutely nothing to do with the Voynichese language (if any).

You are now contradicting what you wrote above, about EVA <y> deriving from Latin abbreviations.  I think it's been well-established for a long time, and is obvious to begin with, that the VMS script derives from medieval Latin abbreviations and from the Roman alphabet, and that there is really no need to look further afield for the origins of the shapes of the letters.  The tables in D'Imperio show this well enough.  

The way that these letter shapes are used is different and the shapes have been tailored somewhat to fit the structure of the text and to produce an internally consistent system.  I think this aspect of the VMS is pretty well-established at this point.

Quote:
Quote:Second, the entropy is too low.

As I noted in another thread, it is technically not reliable to speak of (character) entropy in respect to a written language when we don't know what that language's alphabet is.

We know the alphabet well enough to show that the entropy is going to be low no matter how you combine or split the glyphs.  The low entropy is really just telling you how rigid the phonotactic structure (i.e. rules governing how the glyphs may be combined to form words) is, and this can be understood without using math at all.


RE: Syllabification - Anton - 08-03-2016

Quote:We can't know it perfectly of course, but the distinction between consonant and vowel does seem quite clear.

Could you please explain how this can be clear if we don't understand a single word in Voynichese? I'm aware that there are scientific algorithms that allow one to "detect" vowels and consonants in an unknown text, and I vaguely remember that some of these methods were applied to Voynichese, although I don't remember with what result (maybe you could remind me).

The problem, however, is that when you apply these methods to detect vowels and consonants, it is assumed that the alphabet adopted in the text is known and determinate. In the VMS case, we are not sure of the alphabet. (The fact that different transcription alphabets have been adopted by different researchers over time is the best illustration of this.) For example, we are not certain whether EVA a is really a single character or a succession of c and i, whether ir or in are bigrams or single characters, and so forth. (The post by lelle above is really a striking point, in my opinion.) Different alphabets would yield different conclusions about vowels and consonants, hence I doubt that we can be certain of them at present.

Quote:Everything about the script suggests that it was intended to emphasize the structure of the text, not obscure it.

The structure (morphology, regularity and patterns) is not the meaning (the message). While the structure may be more or less evident (and my opinion rather shifts for "less" here), the message still stands obscured.

Quote:You are now contradicting what you wrote above, about EVA <y> deriving from Latin abbreviations. I think it's been well-established for a long time, and is obvious to begin with, that the VMS script derives from medieval Latin abbreviations and from the Roman alphabet, and that there is really no need to look further afield for the origins of the shapes of the letters.  The tables in D'Imperio show this well enough

My comments do not represent a self-consistent system (since the natural language hypothesis is not the working one for me); I rather tried to emphasize the points that I consider weak in your considerations, with ready counter-arguments.

As for the derivation of some of the VMS symbols from Latin abbreviations - this is likely, but it is by no means "established". There are a good many researchers who would prefer to refrain from this derivation. Actually, if we consider EVA y, this has the shape of the Arabic digit "nine". I don't know if the Latin abbreviation symbol was derived from the digit or is a standalone invention, but it is certainly not "established" whether the VMS script inherits the abbreviation or the digit. In my opinion, this is the former case, where the y shape is actually comprised of the c with the tail modifier (as suggested by Currier and recently revived by Cham), so as to mask the real glyph composition behind the "well-known" abbreviation symbol. But this is nothing more than a working hypothesis and it is simply not scientifically correct to dub it "established". In science, "established" means proven, consistently reproducible and independently verifiable. In fact, there are too few really established things about the Voynich Manuscript.

Quote:We know the alphabet well enough to show that the entropy is going to be low no matter how you combine or split the glyphs.  The low entropy is really just telling you how rigid the phonotactic structure (i.e. rules governing how the glyphs may be combined to form words) is, and this can be understood without using math at all.

Entropy in information theory is the mean information per message. This is just the definition of entropy. If we speak of character entropy (which is constantly said to be "low" for Voynichese, in contrast to its vord entropy, which is comparable to that of natural languages), that is the mean information per character. I'm sorry that I am not acquainted with the linguistic implications of entropy (linguistics is not my field at all), but what is certain is that character entropy is alphabet-dependent, just by definition. Hence, it cannot be low no matter how we combine the glyphs. I agree that exercises in glyph combination that would raise the character entropy of Voynichese to the linguistically "normal" (Hawaiian apart) level would make Voynichese look even stranger, yet I would be interested in seeing calculations of entropy with different "glyph combination levels".
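
A minimal sketch of that alphabet dependence, in Python. It computes first-order character entropy (mean information per glyph, as defined above) for the same text under two tokenizations. Everything specific here is an assumption made for illustration: the helper names `tokenize` and `char_entropy`, the handful of EVA-style words used as a sample, and the choice to treat `ch`, `sh` and `iin` as single glyphs. It says nothing about the real values for the manuscript.

```python
from collections import Counter
from math import log2


def tokenize(word, multi_glyphs=()):
    """Split a word into glyphs, greedily matching the longest multi-letter glyph first."""
    ordered = sorted(multi_glyphs, key=len, reverse=True)
    glyphs, i = [], 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                glyphs.append(g)
                i += len(g)
                break
        else:
            # No multi-letter glyph starts here: take a single letter.
            glyphs.append(word[i])
            i += 1
    return glyphs


def char_entropy(glyphs):
    """First-order Shannon entropy in bits per glyph: H = -sum(p * log2(p))."""
    counts = Counter(glyphs)
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())


# Illustrative EVA-style sample (an assumption, not a transcription of any folio).
sample = "daiin chedy qokeedy shedy daiin okaiin chol".split()

# Alphabet A: every EVA letter is its own glyph.
single = [g for w in sample for g in tokenize(w)]
# Alphabet B: hypothetical merge of some letter groups into single glyphs.
merged = [g for w in sample for g in tokenize(w, ("ch", "sh", "iin"))]

h_single = char_entropy(single)
h_merged = char_entropy(merged)
print(f"one letter per glyph: {h_single:.3f} bits")
print(f"ch/sh/iin merged:     {h_merged:.3f} bits")
```

The two figures differ for the same text, which is the only point being illustrated; a real comparison would need a full transcription and probably conditional (second-order) entropy as well.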


RE: Syllabification - Sam G - 08-03-2016

(08-03-2016, 07:32 PM)Anton Wrote:
Quote:We can't know it perfectly of course, but the distinction between consonant and vowel does seem quite clear.

Could you please explain how this can be clear if we don't understand a single word in Voynichese?

Okay... let's start with this: do you honestly believe that every ciphertext in existence has a glyph-for-glyph mapping onto the Roman alphabet that makes it pronounceable?  Because this is what you and others seem to be arguing here, and if you seriously believe this to be true then I don't really know what I'm supposed to say.

Quote:
Quote:Everything about the script suggests that it was intended to emphasize the structure of the text, not obscure it.

The structure (morphology, regularity and patterns) is not the meaning (the message). While the structure may be more or less evident (and my opinion rather shifts for "less" here), the message still stands obscured.

That you don't understand it doesn't mean it was the author's intention to make it obscure.  All texts in languages that you can't read are obscure to you - that says nothing about the intentions of the people who wrote them.

Quote:As for the derivation of some of the VMS symbols from Latin abbreviations - this is likely, but it is by no means "established". There are a good many researchers who would prefer to refrain from this derivation. Actually, if we consider EVA y, this has the shape of the Arabic digit "nine". I don't know if the Latin abbreviation symbol was derived from the digit or is a standalone invention, but it is certainly not "established" whether the VMS script inherits the abbreviation or the digit. In my opinion, this is the former case, where the y shape is actually comprised of the c with the tail modifier (as suggested by Currier and recently revived by Cham), so as to mask the real glyph composition behind the "well-known" abbreviation symbol. But this is nothing more than a working hypothesis and it is simply not scientifically correct to dub it "established". In science, "established" means proven, consistently reproducible and independently verifiable. In fact, there are too few really established things about the Voynich Manuscript.

Well, you're free to believe what you want of course.  I've seen enough evidence to show that the VMS script is, broadly speaking, Latin-based, and we can throw Arabic digits in with "Latin" if you like, which makes the argument about EVA <y> irrelevant.  The point is that the script was clearly devised by someone who knew how to write Latin, a point which I'm not sure if you're actually disputing, or if you're just trying to nitpick.


RE: Syllabification - Anton - 08-03-2016

Quote:Okay... let's start with this: do you honestly believe that every ciphertext in existence has a glyph-for-glyph mapping onto the Roman alphabet that makes it pronounceable?  Because this is what you and others seem to be arguing here, and if you seriously believe this to be true then I don't really know what I'm supposed to say.

Actually I had not thought about it, but this is an interesting question. To answer it, we first need to introduce some language-independent measure of pronounceability. I don't know if one exists (as I said, I'm zero in linguistics), but offhand it seems to me that pronounceability depends largely on the pronunciation rules of the language of choice (or perhaps it would be more appropriate to speak of language families here?) - what is pronounceable in one language may not be pronounceable in another language using the same alphabet.

As for ciphers, at least all substitution ciphers would have a glyph-for-glyph pronounceable mapping onto the Roman alphabet. That's just trivial. Beyond substitution, I would not venture to assert anything, but offhand I don't see any reason why at least a subset of ciphers may not have at least one "pronounceable" mapping (however vague the notion of "pronounceability" may be).

Quote:That you don't understand it doesn't mean it was the author's intention to make it obscure.  All texts in languages that you can't read are obscure to you - that says nothing about the intentions of the people who wrote them.

That's true, but that does not waive the possibility of that intention. While I suggest taking the possibility of deliberate obfuscation of the message by means of these "pseudo-vowels" as one of the options for consideration, you prefer to waive it altogether. Actually you take them (the pseudo-vowels) at face value just because of their shape. What's the proof that they are vowels indeed? Little as I know of linguistics, I can give you at least one example offhand where the shape is deceptive. The "hard sign" letter of the Russian alphabet is currently used as a divider between a consonant and the subsequent vowel, but roughly until the XII century it represented a vowel.

Quote:The point is that the script was clearly devised by someone who knew how to write Latin, a point which I'm not sure if you're actually disputing, or if you're just trying to nitpick.

No, I am not disputing this (mainly because this is supported by the [link] marginalia), but there are researchers who are. I won't speak for Bax or O'Donovan or others (let them defend their points of view themselves). What I mean is that scientific discourse should be based on criteria of scientific truth, not on assertions like "this is clear" or "this is evident" or "thus spake D'Imperio". What is "clear" for one researcher may not be "clear" for another - and the lack of consensus upon essential aspects of the VMS amongst different researchers shows that there are too few things that are really clear - that means, commonly acknowledged.


RE: Syllabification - -JKP- - 09-03-2016

(08-03-2016, 09:26 PM)Sam G Wrote: ... The point is that the script was clearly devised by someone who knew how to write Latin, a point which I'm not sure if you're actually disputing, or if you're just trying to nitpick.

The script was clearly devised by someone who was familiar with Latin scribal conventions.
  • As in Latin, the "9" char is usually at the end, sometimes at the beginning, and only occasionally elsewhere.
  • As in Latin, the Eva-s and Eva-r are sometimes in a longer word and sometimes alone, between words. Even though single chars are otherwise uncommon in the VMS, these two shapes behave more-or-less as they do in Latin.
  • As in Latin, the shape that looks like a bench is sometimes broken into two shapes that stand alone.
  • As in Latin, the terminal shape in a word-token sometimes has a tail stretching up and back over the last glyph.
But the script does not otherwise behave like Latin. While the variability of the letters may match or be similar to a number of natural languages in a purely numeric sense, the problem is that the VMS imposes strict rules on the position of letters in a word that are different from most natural languages, including Latin and most Germanic languages. Which makes me wonder whether the Latin conventions are used to represent similar conventions in a language of a less variable structure, or whether they were added to make it superficially look like Latin (as a ruse or as a comfortable way to write something less familiar in a familiar system).


RE: Syllabification - Sam G - 09-03-2016

(08-03-2016, 10:59 PM)Anton Wrote:
Quote:Okay... let's start with this: do you honestly believe that every ciphertext in existence has a glyph-for-glyph mapping onto the Roman alphabet that makes it pronounceable?  Because this is what you and others seem to be arguing here, and if you seriously believe this to be true then I don't really know what I'm supposed to say.

Actually I had not thought about it, but this is an interesting question. To answer it, we first need to introduce some language-independent measure of pronounceability. I don't know if one exists (as I said, I'm zero in linguistics), but offhand it seems to me that pronounceability depends largely on the pronunciation rules of the language of choice (or perhaps it would be more appropriate to speak of language families here?) - what is pronounceable in one language may not be pronounceable in another language using the same alphabet.

As for ciphers, at least all substitution ciphers would have a glyph-for-glyph pronounceable mapping onto the Roman alphabet. That's just trivial. Beyond substitution, I would not venture to assert anything, but offhand I don't see any reason why at least a subset of ciphers may not have at least one "pronounceable" mapping (however vague the notion of "pronounceability" may be).

Well, maybe you should look into some cipher mechanisms and see which ones have pronounceable output comparable to that which we see with the VMS. Because beyond simple substitution ciphers (for which finding a way of making them pronounceable is generally to go over halfway toward solving them), and a few other contrived types of ciphers (e.g. verbose cipher) that can be excluded for other reasons, outputs of cipher mechanisms aren't pronounceable. If you want to dispute this, then please provide examples. And if you can't provide examples of such ciphers then I don't see how you can assert that the pronounceability of the VMS text is meaningless. Of course the pronounceability is, all by itself, already a very strong indicator that we are simply looking at a natural language text written in the plain, not at a ciphertext.

Quote:
Quote:That you don't understand it doesn't mean it was the author's intention to make it obscure.  All texts in languages that you can't read are obscure to you - that says nothing about the intentions of the people who wrote them.

That's true, but that does not waive the possibility of that intention.

Which isn't my argument.  I wrote above: Everything about the script suggests that it was intended to emphasize the structure of the text, not obscure it. You responded to that with the non-sequitur about the meaning remaining obscure, and I was just addressing that.

Quote:While I suggest taking the possibility of deliberate obfuscation of the message by means of these "pseudo-vowels" as one of the options for consideration, you prefer to waive it altogether. Actually you take them (the pseudo-vowels) at face value just because of their shape. What's the proof that they are vowels indeed?

The proof that they are indeed vowels is that assuming them to be vowels makes the text pronounceable.  If it were a cipher that randomly assigned consonants to vowel-like shapes then we shouldn't see this property, which can be easily demonstrated by looking at the unpronounceable output of simple substitution ciphers of English or other European languages.

Quote:
Quote:The point is that the script was clearly devised by someone who knew how to write Latin, a point which I'm not sure if you're actually disputing, or if you're just trying to nitpick.

No, I am not disputing this (mainly because this is supported by the [link] marginalia),

Which is probably not the best way to go about understanding the origins of the script when you consider the possibility that the VMS is a copy made by someone who did not create the original.  But if you at least agree that the scribe who created the VMS script knew Latin, then it seems impossible to deny that the apparent vowels in the VMS script are at least intended to resemble those of the Roman alphabet, a point which you seemed to not fully accept above, by disputing that the shapes of these letters are of any relevance.

Quote:but there are researchers who are. I won't speak for Bax or O'Donovan or others (let them defend their points of view themselves). What I mean is that scientific discourse should be based on criteria of scientific truth, not on assertions like "this is clear" or "this is evident" or "thus spake D'Imperio". What is "clear" for one researcher may be not "clear" for another - and the lack of consensus upon essential aspects of the VMS amongst different researchers shows that there are too few things that are really clear - that means, commonly acknowledged.

Well, I wonder what could possibly constitute 100% proof of the origins of the VMS script in your mind.  Maybe you could elaborate on these "criteria for scientific truth".  Obviously we can't redo the creation of the VMS in a laboratory experiment.  I see no problem using terms like "clear" and "obvious" because in my view that's as good as it can realistically get with historical questions like this.  We can amass all the evidence in the world, but at the end of the day everyone is still free to believe whatever he or she wants.

(09-03-2016, 02:26 AM)-JKP- Wrote:
(08-03-2016, 09:26 PM)Sam G Wrote: ... The point is that the script was clearly devised by someone who knew how to write Latin, a point which I'm not sure if you're actually disputing, or if you're just trying to nitpick.

The script was clearly devised by someone who was familiar with Latin scribal conventions.
  • As in Latin, the "9" char is usually at the end, sometimes at the beginning, and only occasionally elsewhere.
  • As in Latin, the Eva-s and Eva-r are sometimes in a longer word and sometimes alone, between words. Even though single chars are otherwise uncommon in the VMS, these two shapes behave more-or-less as they do in Latin.
  • As in Latin, the shape that looks like a bench is sometimes broken into two shapes that stand alone.
  • As in Latin, the terminal shape in a word-token sometimes has a tail stretching up and back over the last glyph.

Well, I basically agree with this, but if someone wants to say that all these points and many others that can be raised are just a meaningless coincidence, then what are you supposed to do?  I feel like that's where I'm at with pointing out the obvious similarity in the vowel shapes.

Quote:But the script does not otherwise behave like Latin. While the variability of the letters may match or be similar to a number of natural languages in a purely numeric sense, the problem is that the VMS imposes strict rules on the position of letters in a word that are different from most natural languages, including Latin and most Germanic languages. Which makes me wonder whether the Latin conventions are used to represent similar conventions in a language of a less variable structure, or whether they were added to make it superficially look like Latin (as a ruse or as a comfortable way to write something less familiar in a familiar system).

My view, which is somewhat similar to the second of the options that you present, is that the VMS script was created by a Latin scribe for some non-European language that he encountered and wanted to be able to write down.  He adapted the writing system(s) he was familiar with to writing this new language and to make a script that reflected its peculiar structure.  That just seems to be what the evidence of the VMS itself indicates is correct.


RE: Syllabification - Anton - 09-03-2016

Quote:If you want to dispute this, then please provide examples.

The most basic example that comes to mind is this. Let's take a simple plain text:

Code:
this is a cipher message

Represent it in binary form, e.g. using ASCII (only the first 16 characters, "this is a cipher", are encoded below):

Code:
01110100 01101000 01101001 01110011 00100000 01101001 01110011 00100000 01100001 00100000 01100011 01101001 01110000 01101000 01100101 01110010

Substitute Latin "o" for zeroes and Latin "l" for units:

Code:
ollloloo ollolooo ollolool olllooll oolooooo ollolool olllooll oolooooo ollooool oolooooo olloooll ollolool ollloooo ollolooo olloolol ollloolo

Since each chunk is 8-bit fixed, let's adopt IPv6-notation-like technique (if you know what I mean) and substitute two or more consecutive "o"'s with an "a" (but this can be done only once in a chunk, to avoid ambiguity!):

Code:
olllola ollola ollolal olllall oola ollolal olllall oola ollal oola ollall ollolal ollla ollola ollalol olllalo

(Hope I did not make a typo anywhere, all this is a bit repetitive.)

We now have a "language" that is a bit strange (only three letters), yet quite pronounceable nonetheless (I leave aside that we did not discuss any objective measures of pronounceability). But in effect it is no language, but a decryptable cipher. Anyone is welcome to apply considerations about pronounceability and Latin character shapes thereto. By the way, character entropy here would be quite low. And I think some curious positional restrictions can be observed as well: all words begin with "o", and "a" is mostly toward the end of the words.
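
The steps above can be sketched in Python. The function names `encode` and `decode` are hypothetical, and one detail is an assumption: the post does not say which run of "o"s to replace when a chunk has more than one, so the sketch picks the longest run (as IPv6 notation does), which reproduces the sample output above.

```python
import re

CHUNK = 8  # each plaintext character becomes one fixed 8-bit chunk


def encode(plaintext):
    """ASCII -> binary -> o/l letters -> one run of 2+ 'o' per chunk replaced by 'a'."""
    words = []
    for ch in plaintext:
        bits = format(ord(ch), "08b")
        chunk = bits.replace("0", "o").replace("1", "l")
        runs = list(re.finditer(r"o{2,}", chunk))
        if runs:
            # Assumption: compress the longest run (first one on ties).
            m = max(runs, key=lambda r: r.end() - r.start())
            chunk = chunk[:m.start()] + "a" + chunk[m.end():]
        words.append(chunk)
    return " ".join(words)


def decode(ciphertext):
    """Invert encode(): expand 'a' back to the run that restores 8 letters."""
    chars = []
    for word in ciphertext.split(" "):
        if "a" in word:
            run = CHUNK - (len(word) - 1)  # letters hidden behind the single 'a'
            word = word.replace("a", "o" * run)
        bits = word.replace("o", "0").replace("l", "1")
        chars.append(chr(int(bits, 2)))
    return "".join(chars)


cipher = encode("this is a cipher")
print(cipher)
print(decode(cipher))
```

Decoding works because each chunk is known to be 8 letters long, so the length of the run hidden behind "a" can be recovered from the length of the word.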

For additional obfuscation, we could even introduce some more spaces according to some rule that keeps the original spaces recoverable.

Quote:Which isn't my argument.  I wrote above: Everything about the script suggests that it was intended to emphasize the structure of the text, not obscure it. You responded to that with the non-sequitur about the meaning remaining obscure, and I was just addressing that.

Please note that I did not speak of obscuring structure, but of obscuring the message. You seem to be confusing the two.

Quote:Maybe you could elaborate on these "criteria for scientific truth".  Obviously we can't redo the creation of the VMS in a laboratory experiment.  I see no problem using terms like "clear" and "obvious" because in my view that's as good as it can realistically get with historical questions like this.  We can amass all the evidence in the world, but at the end of the day everyone is still free to believe whatever he or she wants.

I briefly touched on those criteria above: consistency, proof, consistent reproducibility and independent verification. Beliefs are beliefs and knowledge is knowledge. If science were about beliefs, the Earth would have been flat until this very day.


RE: Syllabification - Sam G - 09-03-2016

This is a verbose cipher.  Like I said above, it's basically the only possibility for generating language-like (and low entropy) ciphertext.  The problem with it is that if such a cipher were used, then we should see many repeated sequences of several words in the ciphertext (as every instance of the same plaintext word would yield the same sequence of ciphertext words), and we don't see this in the VMS.  So it is not possible that the VMS is written with a verbose cipher.
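
A rough way to check for that kind of repetition, sketched in Python. The helper `repeated_ngrams` is a hypothetical name; the sample is the three-letter ciphertext posted earlier in the thread, in which each ciphertext "word" stands for one plaintext character.

```python
from collections import Counter


def repeated_ngrams(words, n):
    """Return every sequence of n consecutive words that occurs more than once."""
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram: count for gram, count in grams.items() if count > 1}


# Sample ciphertext copied from the earlier post in this thread.
ciphertext = (
    "olllola ollola ollolal olllall oola ollolal olllall oola "
    "ollal oola ollall ollolal ollla ollola ollalol olllalo"
)
words = ciphertext.split()

repeated = repeated_ngrams(words, 3)
for gram, count in repeated.items():
    print(" ".join(gram), "x", count)
```

Even in this 16-word sample one three-word sequence repeats; it corresponds to the plaintext "is " occurring twice. On a longer text enciphered this way, such repeats would be expected to accumulate, which is the signature being argued about.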