The Voynich Ninja
Syllabification - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Syllabification (/thread-201.html)

Pages: 1 2 3 4 5 6 7


RE: Syllabification - davidjackson - 09-03-2016

Quote: (Hope I did not make a typo anywhere, all this is a bit repetitive :) )

Can you imagine writing that all out with a quill and parchment, by candlelight? Big Grin


RE: Syllabification - Anton - 09-03-2016

(09-03-2016, 02:03 PM)Sam G Wrote: This is a verbose cipher... every instance of the same plaintext word would yield the same sequence of ciphertext words

Not exactly - note that in my example "oolooooo" can be reduced to "oola" (as I did), but also to "alooooo". Which would be two different ciphertext words conveying the same underlay character (i.e., space). If we use this trick not only for "o" sequences, but also for "l" sequences, introducing a fourth letter into the alphabet, the degree of variability will rise further.

***

Basically, what makes the ciphertext "unpronounceable" is the comparatively low number of letters signifying vowels in the alphabet relative to consonants. I think that if one uses an alphabet with roughly the same number of vowels and consonants, any ciphertext will be more or less "pronounceable".

Another cipher technique for making the unpronounceable ciphertext pronounceable would be just filling in "filler-vowels" according to a pre-defined pattern.

Yet another technique would be separately enciphering consonants and vowels of the plain text. I think that's the trick that would make any ciphertext pronounceable. Consider an example which is neither substitution, nor verbose. Don't know if it is a known cipher, but I just borrow it from the differential encoding used in the telecom world. Here, each subsequent character is not enciphered per se; rather, what is enciphered is the difference between it and the preceding character.

E.g., the English alphabet runs: A, B, C, D etc. Let's assume we encode each line from scratch and the first word of the line is the word "odd". "O" is the first character of the line. It is enciphered just by its number in the alphabet - i.e., "15". The second letter is "D", which is 16 letters forward from "O" ("O" being counted itself). Thus "D" is enciphered with number "16". The third letter "D" is one letter distant from the second letter "D" (because we count the letter itself), hence it is coded with "1". We could well measure the distance beginning with 0, not with 1, but beginning with 1 is convenient for subsequent mapping of numbers to ciphertext letters. In our example, 15 is "O" in the English alphabet, 16 is "P", and 1 is "A". So we get "OPA" as the ciphertext for "ODD".

Applied to the whole alphabet, such ciphertext would be unpronounceable for long phrases. What would make it pronounceable is considering two numbered rows - one for consonants (B, C, D, F...), another for vowels (A, E, I, O...) and enciphering consonants of the plain text with the first row and vowels of the plain text with the second row, using the same differential encoding as explained above. This way the phrase "this is a cipher", if I made no mistake, will be represented as "TNIM AB U KINSYL", which is quite pronounceable.
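The two-row scheme can be sketched in a few lines of Python. This is a reconstruction, not Anton's own worksheet: taking the vowel row as A, E, I, O, U, Y and the consonant row as the remaining twenty letters is an assumption on my part, but it is the only split I found that reproduces "TNIM AB U KINSYL" exactly.

```python
# Two-row differential cipher (reconstruction of the scheme described above).
# Assumed rows: vowels A,E,I,O,U,Y; consonants are the other twenty letters.
VOWELS = "AEIOUY"
CONSONANTS = "BCDFGHJKLMNPQRSTVWXZ"

def encipher(text):
    prev = {VOWELS: None, CONSONANTS: None}  # last plaintext letter seen per row
    out = []
    for ch in text.upper():
        if not ch.isalpha():
            out.append(ch)  # spaces pass through unchanged
            continue
        row = VOWELS if ch in VOWELS else CONSONANTS
        i = row.index(ch)
        if prev[row] is None:
            n = i + 1  # first letter of the line: just its 1-based row number
        else:
            # distance forward from the previous letter, counting it itself
            n = (i - row.index(prev[row])) % len(row) + 1
        prev[row] = ch
        out.append(row[n - 1])  # map the number back to a letter of the same row
    return "".join(out)

print(encipher("this is a cipher"))  # -> TNIM AB U KINSYL
```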


RE: Syllabification - Emma May Smith - 09-03-2016

In very broad terms, the more vowels you assign to characters the more pronounceable a text becomes. This is because vowels form the nucleus of syllables* and in most languages** vowel-only syllables are valid***. So long as you're happy with every vowel representing a syllable, even a string like eiuoaueioa can be pronounced.

The problems stack up as the ratio of consonants to vowels increases. As consonants are not typically the nuclei of syllables, any string of sounds must be split up into syllables according to the vowels available. Syllable parsing is often contestable, and different people may assign consonants to the end or beginning of neighbouring syllables depending on their viewpoint. But as the ratio of consonants increases, so too does the number of sequential consonants occurring on either side of a vowel. These are consonant clusters.

Some languages--such as English--deal pretty well with consonant clusters, allowing three or even four consonants in a row: strengths is a canonical example, with three beginning consonants and four**** ending consonants around a lone vowel. Most languages don't allow such complex syllables to exist, with a word like twin being more typical of the most complex syllables allowed.

But the important thing is that no matter how complex syllables can be, they must adhere to a rule known as the sonority sequencing principle. Broadly: all sounds have a characteristic known as sonority; vowels have the highest sonority of all sounds; syllables should have a single peak of sonority; and sonority within a syllable should rise to and fall from that peak regularly. In short, sonority is highest nearest the vowel.

So, if I tell you that /p/ has a lower sonority than /l/, which naturally has a lower sonority than the vowel /a/, we can compose valid syllables from these three sounds: /pal/, /lap/, /alp/, and /pla/; and the invalid /lpa/ and /apl/. For the first two sonority goes up and then down, with the peak at the vowel; for the next two sonority starts high and falls or starts low and climbs, but again with the vowel being the high point; but for the last two sonority drops then rises, with both /a/ and /l/ being peaks either side of /p/.
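The valid/invalid split above can be checked mechanically. A toy Python sketch with illustrative sonority values (1 for /p/, 2 for /l/, 3 for /a/ - the numbers are made up; only their ordering matters):

```python
# Toy check of the sonority sequencing principle for the three sounds above.
# Sonority values are illustrative, not measured.
SONORITY = {"p": 1, "l": 2, "a": 3}

def obeys_ssp(syllable):
    """True if sonority rises to a single peak and then falls."""
    vals = [SONORITY[sound] for sound in syllable]
    peak = vals.index(max(vals))
    rising = all(vals[i] < vals[i + 1] for i in range(peak))
    falling = all(vals[i] > vals[i + 1] for i in range(peak, len(vals) - 1))
    return rising and falling

print([s for s in ("pal", "lap", "alp", "pla", "lpa", "apl") if obeys_ssp(s)])
# -> ['pal', 'lap', 'alp', 'pla']
```

As expected, /pal/, /lap/, /alp/ and /pla/ pass, while /lpa/ and /apl/ fail because they have two sonority peaks.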

The principle is not hard and fast, but it is a good guide. It is also a universal rule, governed by how humans speak rather than by any given language. We internalize it as we learn to speak and can replicate it even without knowing of its existence. Thus when Rene and Landini made EVA pronounceable they were implicitly seeking out the sonority curve and applying sounds to it that they knew would fit. The assignment of EVA characters is neither random nor, importantly, meaningless. Moreover, as the number of different characters within a text increases, the only two options available to make that text pronounceable are either to make more of the characters vowels or to assign characters values that increasingly approximate the sonority hierarchy.


* Consonants can form the nucleus of syllables, but this is less common and typically marginal in any given language.
** A few languages don't permit vowel-only syllables. The Voynich candidate-darling Hawaiian is supposedly one, but I think this is wrong.
*** That is, phonologically valid. Such words can be pronounced even if meaningless.
**** Even though written as three sounds: /ng th s/; it is often pronounced with an inserted /k/: /ng k th s/.


RE: Syllabification - Sam G - 09-03-2016

(09-03-2016, 03:50 PM)Anton Wrote:
(09-03-2016, 02:03 PM)Sam G Wrote: This is a verbose cipher... every instance of the same plaintext word would yield the same sequence of ciphertext words

Not exactly - note that in my example "oolooooo" can be reduced to "oola" (as I did), but also to "alooooo". Which would be two different ciphertext words conveying the same underlay character (i.e., space). If we use this trick not only for "o" sequences, but also for "l" sequences, introducing a fourth letter into the alphabet, the degree of variability will rise further.

It would be apparent pretty quickly that "a" and sequences of "o" are equivalent and that would allow the repeated sequences of words to be uncovered.

Quote:Basically, what makes the ciphertext "unpronounceable" is the comparatively low number of letters signifying vowels in the alphabet relative to consonants. I think that if one uses an alphabet with roughly the same number of vowels and consonants, any ciphertext will be more or less "pronounceable".

I basically agree (though I think you also need at least some regular structure to prevent occasional long runs of consonants), but the VMS text is actually pronounceable with far more consonants than vowels, which is going to be harder to achieve.

Quote:Another cipher technique for making the unpronounceable ciphertext pronounceable would be just filling in "filler-vowels" according to a pre-defined pattern.

The problem here is that the VMS vowels aren't thrown in at random.  They have certain places within words where they can go, so you would need to explain the existence of those "vowel slots", which is going to be basically the same as accounting for the vowels themselves.

Quote:Yet another technique would be separately enciphering consonants and vowels of the plain text. I think that's the trick that would make any ciphertext pronounceable. Consider an example which is neither substitution, nor verbose. Don't know if it is a known cipher, but I just borrow it from the differential encoding used in the telecom world. Here, each subsequent character is not enciphered per se, but what is enciphered is rather the difference between it and the preceding character. E.g., the English alphabet runs: A, B, C, D etc. Let's assume we encode each line from scratch and the first word of the line is the word "odd". "O" is the first character of the line. It is enciphered just by its number in the alphabet - i.e., "15". The second letter is "D", which is 16 letters forward from "O" ("O" being counted itself). Thus "D" is enciphered with number "16". The third letter "D" is one letter distant from the second letter "D" (because we count the letter itself), hence it is coded with "1". We could well measure the distance beginning with 0, not with 1, but beginning with 1 is convenient for subsequent mapping of numbers to ciphertext letters. In our example, 15 is "O" in the English alphabet, 16 is "P", and 1 is "A". So we get "OPA" as the ciphertext for "ODD".

Applied to the whole alphabet, such ciphertext would be unpronounceable for long phrases. What would make it pronounceable is considering two numbered rows - one for consonants (B, C, D, F...), another for vowels (A, E, I, O...) and enciphering consonants of the plain text with the first row and vowels of the plain text with the second row, using the same differential encoding as explained above. This way the phrase "this is a cipher", if I made no mistake, will be represented as "TNIM AB U KINSYL", which is quite pronounceable.

This is clever, although without thinking about it too much it's not clear that it would be fully invertible once you split the consonants and vowels into separate rows, since you would then have more numbers than letters within each row.  In any case though, you're basically preserving the consonant/vowel distinction found in the plaintext here, so this example actually sort of reinforces my larger point that it's hard to see the apparent consonant/vowel distinction made in the VMS text as meaningless.  (And further considerations of word structure and things like that would of course show that the VMS text was not actually produced by this method.)


RE: Syllabification - -JKP- - 09-03-2016

Abjads (scripts without vowels) by their very nature need to be more orderly and regimented than scripts that have included vowels from their inception.

The system of adding in the vowels (something that is done in the head, not on the stone) is based on grammatical rules, so that words that look identical can be distinguished by context (e.g., "book" and "writer").


RE: Syllabification - Anton - 09-03-2016

Quote:This is clever, although without thinking about it too much it's not clear that it would be fully invertible once you split the consonants and vowels into separate rows, since you would then have more numbers than letters within each row.

No, each row is numbered separately:

B,C,D,F... -> 1,2,3,4... -> ciphertext consonants

A,E,I,O... -> 1,2,3,4... -> ciphertext vowels

Since the decrypter knows that consonants are encoded with consonants only, and vowels with vowels only, he uses the consonant row whenever he encounters a consonant in the ciphertext, and the vowel row whenever he encounters a vowel. This way the cipher is unambiguously reversible.
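The reversibility can be made concrete with a short Python decryption sketch. As before, this is a reconstruction: the vowel row A, E, I, O, U, Y and the remaining twenty letters as the consonant row are my assumptions, chosen because they reproduce the "this is a cipher" / "TNIM AB U KINSYL" pair.

```python
# Decryption side of the two-row differential cipher (reconstruction).
# Assumed rows: vowels A,E,I,O,U,Y; consonants are the other twenty letters.
VOWELS = "AEIOUY"
CONSONANTS = "BCDFGHJKLMNPQRSTVWXZ"

def decipher(cipher):
    prev = {VOWELS: None, CONSONANTS: None}  # last recovered plaintext index per row
    out = []
    for ch in cipher:
        if not ch.isalpha():
            out.append(ch)  # spaces pass through unchanged
            continue
        row = VOWELS if ch in VOWELS else CONSONANTS
        n = row.index(ch) + 1  # the enciphered number, 1-based
        if prev[row] is None:
            i = n - 1  # first letter of the line: its own row number
        else:
            # invert "distance forward, counting the previous letter itself"
            i = (prev[row] + n - 1) % len(row)
        prev[row] = i
        out.append(row[i])
    return "".join(out)

print(decipher("TNIM AB U KINSYL"))  # -> THIS IS A CIPHER
```

Because each ciphertext letter unambiguously selects its row, the two chains never interfere, which is exactly the point being made above.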

***

Of course the examples that I provided are not claimed to be anything close to Voynichese; they are just two schemes worked out offhand as suitable examples of a "pronounceable ciphertext".

Personally I expect that the most productive approach to decrypting Voynich is the theory-independent one - since it elegantly circumvents the possible pitfalls of "cipher" and "language" theories by just postponing their application until some semantic relations are established or at least guessed between some Voynichese vords and certain objects or notions. Contextual analysis may help to trace those relations.


RE: Syllabification - ReneZ - 10-03-2016

Pronounceability is quite subjective, of course.
čtvrt is a perfectly pronounceable word for Czech people, while many Asians cannot pronounce the English word shrimp.


RE: Syllabification - crezac - 12-03-2016

(08-03-2016, 02:39 PM)Sam G Wrote:
(07-03-2016, 12:04 AM)Anton Wrote: Even if one sets aside the cipher theory, I am afraid that no syllabification is possible without our understanding of the alphabet. There is no confirmation that any of the transcription alphabets, EVA included, accurately represents the real alphabet adopted by the author.

We can't know it perfectly of course, but the distinction between consonant and vowel does seem quite clear.

CR: We can't even know it imperfectly without an understanding of how the alphabet encodes semantic or phonemic content.

Quote:This fact suggests nothing. A person wishing to conceal his message could well use these letters to represent consonants, for an additional layer of obscurity.

Everything about the script suggests that it was intended to emphasize the structure of the text, not obscure it.

CR: Unless that's what the scribe wanted you to think.  What you see isn't necessarily what is.

Quote:Besides, EVA e is not like Roman "e". It is like Roman "c".

And "c" is an "e" without the crossbar.  And there's a pretty clear reason why the scribe omitted the crossbar - the whole system of straight-stroke and curved-stroke letters which follow <a> and <e> respectively.

Quote:If the text is abbreviated, then single characters would represent character blocks, like 9 (EVA y) represented "us" in the end of the word and "con" in the beginning of the word in medieval Latin documents.

Among other problems, there aren't enough different glyphs for the text to be abbreviated Latin or anything else, although the shapes of the letters do derive from symbols used in Latin abbreviation (and the Roman alphabet).

CR: make that "appear to derive" and I can agree with it.

Quote:
Quote:Really, the fact that EVA transliteration makes the text basically "pronounceable", as would likely any other transliteration scheme that mapped <a>, <e>, <o>, and <y> to vowels and the other letters to consonants (and considered <i> as a modifier), is by itself strong evidence that its implicit assignment of consonant and vowel status is basically correct.

The "pronounceability" of the EVA transliteration is a mere phantom, partly because the transcription is not fully matched to the Latin alphabet (e.g. substitute "c" for EVA e, as indicated above, and you will lose this pronounceability at once),

The fact that there's a mapping that makes the text pronounceable at all is significant.  If you don't think so, try finding a simple letter-for-letter mapping to make, say, the Beale Ciphers pronounceable.

The fact that the pronounceable mapping preserves the obvious consonant/vowel distinction in the Latin-derived VMS script is also significant.

CR: meh. If it's significant, tell me what it means. That if you make the transcription as Latin-like as possible, it becomes more or less possible to pronounce the transcribed words? If some of the consonants are vowels or diphthongs, or if a couple of your vowels were W or L sounds, the distribution could still be adequate to give you pronounceability, whatever that means in this context. And the Beale Ciphers are a sucker-bet example, since they are extremely likely to be a total fraud. The logic for having three separate ciphered documents doesn't even make sense.

Quote:and partly because EVA is Latin-alphabet-centric - while there is no confirmation that the Latin alphabet was the basis for the Voynichese script. For example, characters like a, c, i, o are found in the Cyrillic alphabet, characters l, d, r, y, q are like Arabic digits, and the rest of the characters are not found in the Latin alphabet at all.

I'd say that being Latin-centric and focusing excessively on EVA is the worst approach for those who wish to explore the plain-text language path. EVA has very little to do with the real Voynichese alphabet, and absolutely nothing to do with the Voynichese language (if any).

You are now contradicting what you wrote above, about EVA <y> deriving from Latin abbreviations.  I think it's been well-established for a long time, and is obvious to begin with, that the VMS script derives from medieval Latin abbreviations and from the Roman alphabet, and that there is really no need to look further afield for the origins of the shapes of the letters.  The tables in D'Imperio show this well enough.  

CR: so sad there aren't more things obvious about VMS. There are characters in VMS that could be from the Maltese alphabet or early Etruscan, which aren't like Latin characters and are still in VMS. And others that seem entirely original to the manuscript. Obviously someone messed up when writing VMS, since not all the characters are derived "from medieval Latin abbreviations and from the Roman alphabet". The tables cover the ones that are, which leaves a few unaccounted for - hardly "well enough".

The way that these letter shapes are used is different and the shapes have been tailored somewhat to fit the structure of the text and to produce an internally consistent system.  I think this aspect of the VMS is pretty well-established at this point.

CR:  Frankly, I'm beginning to have some serious doubts that anything about the text in VMS is "pretty well established" at this point.  Either that or my standards for evidence are way too high.

Quote:
Quote:Second, the entropy is too low.

As I noted in another thread, it is technically not reliable to speak of (character) entropy in respect of a written language when we don't know what that language's alphabet is.

We know the alphabet well enough to show that the entropy is going to be low no matter how you combine or split the glyphs.  The low entropy is really just telling you how rigid the phonotactic structure (i.e. rules governing how the glyphs may be combined to form words) is, and this can be understood without using math at all.

CR: Or the low entropy is telling you that EVA has some mistakes in it, and that iii, ii and i may be three distinct characters, or three distinct combinations of anywhere from 1 to 4 characters; that the phonotactic structure isn't that rigid, because you have a larger character set than you think you do. Maybe it's telling you that you don't know the alphabet as well as you think you do. Inconvenient if true. But believing that anything about VMS "has been well-established for a long time, and is obvious to begin with" is hubris and limiting.


RE: Syllabification - crezac - 13-04-2016

I was doing some reading on the Cherokee syllabary recently. I found it interesting that the A, E and I sounds are represented by the symbols D, R and T. So if VMS is written in an artificial alphabet used to represent one or more languages that had no written form, there is at least one contemporary example of a similar approach. And in that example, while characters are borrowed from other writing systems, they do not represent the same phonemes.


RE: Syllabification - Diane - 14-04-2016

This is a very interesting discussion, but I'm puzzled about why my name should have been brought into it. Since Anton couples my name with that of a Professor of Linguistics, I wonder if it isn't a slip, and whether the person meant was another linguist, such as Emma May Smith?

Anton said:

Quote:No I am not disputing this ... but there are researchers who are. I won't speak for Bax or O'Donovan or others (let them defend their points of view themselves). What I mean is that scientific discourse should be based on criteria of scientific truth, not on assertions like "this is clear" or "this is evident" or "thus spake D'Imperio".


My research has had nothing to do with linguistics, nor with the written part of the text at all.  I work within my own field which is the provenancing of imagery in problematic artefacts.

Since my conclusions are usually presented online together with some of the historical and comparative iconographic evidence which led me to them, and no other qualified person has ever disputed either the evidence or the conclusions, I do not see that I have any need to defend my conclusions at present. Vague sneers and determined avoidance have been pretty much the only reactions I've seen from Voynicheros between 2008 and the present, so for that reason too I have had no cause to defend any of my conclusions, comparative evidence or reasoning. They have never been challenged or argued against.

I wonder if Anton has misunderstood my use of the term "Latin Europe".

It is used of that region whose dominant culture was Christian and whose official common language was Latin. I might have said "western Christendom" but the term is out of favour.

Saying that the imagery is not a product of that region and its dominant culture says nothing about the written part of this text. As an example: suppose that an early copy of Aratus had been discovered, and that while its imagery was copied exactly, the text was translated into a form more congenial to its present time and owners.

So the imagery would continue to speak of a non-Latin and non-Christian culture, but the text might well be in Latin. Correctly identifying the region(s) and time(s) of first enunciation of this imagery, noting signs of later alterations and additions *to the original matter*, and finally positing a time and cultural environment for the exemplars informing our present copy was not an easy task. But one of these days it may prove useful to those interested in what the manuscript actually contains, where the material came from, and so forth.

Perhaps Professor Bax - like Emma May Smith, or Rene Zandbergen and everyone else - feels his work is also likely to be of use to others in the longer term?