The Voynich Ninja
[split] Verbose cipher? - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: [split] Verbose cipher? (/thread-3356.html)

Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13


RE: [split] Verbose cipher? - ReneZ - 16-09-2020

There's a not so little step before that, which is selecting a transliteration as input.
For the time being, people seem to be happy to use the Takahashi file in Eva, but that is significantly sub-optimal.

However, I completely subscribe to the idea of having standard tools based on standard formats.
With that, each step in the process can be solved by different people.

If there is anyone here who is interested in writing web applications of the type described by Emma, it would be good to hear from him/her.

There are many things that could be done.


RE: [split] Verbose cipher? - farmerjohn - 16-09-2020

Here [link] you can download a very simple tool for calculating letter frequencies and entropy.
[link]
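For readers without the tool, the same basic numbers can be computed in a few lines. Below is a minimal Python sketch (my own illustration, not farmerjohn's program), assuming the transliteration is available as a plain string of glyph characters:

```python
from collections import Counter
from math import log2

def entropies(text):
    """Return (h1, h2): single-character entropy and conditional entropy."""
    uni = Counter(text)
    bi = Counter(zip(text, text[1:]))
    n1, n2 = len(text), len(text) - 1
    h = lambda counts, n: -sum(c / n * log2(c / n) for c in counts)
    h1 = h(uni.values(), n1)
    # conditional entropy: H(character pair) - H(first character of pair)
    h2 = h(bi.values(), n2) - h(Counter(text[:-1]).values(), n2)
    return h1, h2

# toy EVA sample (dots are word separators and are dropped here)
sample = "daiin.ol.chedy.qokeedy".replace(".", "")
print(entropies(sample))
```

Real figures of course require a full transliteration file as input, not a toy sample.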



RE: [split] Verbose cipher? - Koen G - 16-09-2020

(16-09-2020, 03:13 PM)ReneZ Wrote: There's a not so little step before that, which is selecting a transliteration as input.
For the time being, people seem to be happy to use the Takahashi file in Eva, but that is significantly sub-optimal.

I agree that it would be ideal if anyone could use a standardized format. I am still using nablator's Java code, but I understand that not everybody can manage this. Either way, a web tool is more accessible.

When it comes to manipulating the input file, I prefer to use find and replace in Notepad. This allows me to keep an eye on the results and check whether everything went as desired. You would lose this control if the program did both the glyph replacements and the calculations, as Emma suggested.

As for the transliteration file used, I wouldn't mind agreeing on some standard. But this should probably go in a separate thread.


RE: [split] Verbose cipher? - geoffreycaveney - 16-09-2020

The most significant new concept to me in Koen's Voynich character inventory is the idea of [o]+[glyph] as representing a single unit. 

Looking into this further, I now see that in addition to Koen's bigrams [ol], [or], [od], [ok], [ot], and the well-known series [oin], [oiin], [oiiin], [oir], we must also take into account the combinations [oe], [oa], [os], [osh], [och], [ockh], and [octh], all of which occur a non-negligible number of times in the Voynich ms text.

It is curious, for example, that while [sh] is more frequent than [s] without [h], in combination with [o] the opposite is true: [os] without [h] is much more frequent than [osh].

Also notable is the relative rarity of [och] (181 occurrences), a very small portion of the total number of occurrences of [ch].

So far we might explain these observations in a non-linguistic or potentially purely algorithmic way: To keep vord lengths relatively more uniform, the author preferred to add [o] before a single glyph rather than before glyph combinations such as [ch] or [sh].

However, what I find most striking is the relatively greater frequency of [ockh] and [octh] as a portion of all occurrences of [ckh] and [cth]:

all [ckh]: 907
all [ockh]: 201
[qockh]: 70

all [cth]: 945
all [octh]: 154
[qocth]: 26

Despite the relative rarity of [ckh] and [cth] in comparison with [ch] itself, the combinations [ockh] and [octh] occur just as often as [och]!
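Counts like these are straightforward to check against any transliteration. A small sketch follows; the sample string is a toy illustration of my own, not the real corpus, so the printed counts will not match the figures above:

```python
import re

def count_seq(text, seq):
    """Count occurrences of an EVA glyph string, allowing overlaps."""
    return len(re.findall(f"(?={re.escape(seq)})", text))

# toy sample only; the figures quoted above come from a full transliteration file
eva = "qokeedy.qockhedy.octhey.ockhol.chedy"
for s in ["ch", "och", "ockh", "octh", "qockh"]:
    print(s, count_seq(eva, s))
```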

This cannot be easily explained by a non-linguistic reason or as part of a purely algorithmic method, since here it is the longer glyph strings [ockh] and [octh] that are given preference, as a portion of all [ckh] and [cth], over the apparently simpler and more natural shorter glyph string [och] as a portion of all [ch]. I cannot see a good reason why such an algorithm would tend to reject [och], but tend to favor [ockh] and [octh].

It can be simply and logically explained, however, by linguistic reasons if the ms text represents actual meaningful linguistic content: 

For example, let us suppose for sake of argument that [o] represents voicing of a consonant, such that [o+glyph] is the voiced counterpart of voiceless [glyph] without preceding [o].

And let us further suppose that [ch] represents the palatal glide /j/ (the "y" sound in English). Then all of the combinations [glyph+ch] and ligatures [c+glyph+h] could very well and naturally represent palatalized or soft forms of the consonants represented by the plain [glyph] itself. 

Languages with series of such types of consonants have soft/palatalized voiced as well as voiceless consonants, so the combination "voiced + soft/palatalized", represented according to this hypothesis with [o+[c+glyph+h]], such as [ockh] and [octh], would indeed be expected to occur with a reasonable frequency.

But in such a language represented in such a fashion, the combination [o+ch] by itself would likely be rather superfluous, as the voiced/voiceless distinction for the palatal glide /j/ (English "y") itself is an exceedingly rare distinction found only in a very tiny handful of known natural languages. (It is a non-phonemic allophonic variation in Scottish Gaelic, and a phonemic distinction in Kildin Sámi and in North America in Washo and one Mazatec language.) In all likelihood there would be simply no need to distinguish between [ch] and [o+ch]. Hence it would indeed be expected that [och] as a variant of [ch] would be unnecessary and relatively rare. 

Of course this cannot be the only explanation of the phenomenon I have observed. But it is logical, linguistic, based on the characteristics of the ms text and script itself, and it seems to be a simpler and more natural explanation of a phenomenon that seems otherwise hard to explain. 

==========

I may as well add that as long as I am supposing that [o] represents voicing and [ch] represents palatal(ization), it would come close to neatly completing a possible consonant phoneme inventory if we were to further suppose that the [qo], [qok], [qot] series may possibly represent the nasal consonants. Indeed nasal phonemes are usually and typically voiced, so in this case the combination [q+o] would almost always be natural, and [q] without [o] would indeed be expected to be extremely rare. (In the same vein, if the EVA labeling got lucky and happened to choose [l] and [r] for the actual liquid phonemes, they are also usually and typically voiced, and so it would also be natural if they were most often represented in such combinations as [o+l] and [o+r], as they are in the ms text.)

One problematic issue that may arise with this line of analysis is that the apparent vowels may seem to disappear completely from many very common vords/syllables with [o], such as [ol], [chol], [or], [chor], [shol], [qol], [sho], etc. In this case one possible logical linguistic explanation would be the presence of syllabic liquid phonemes: /r/ and /l/ serving as the nucleus of a syllable in place of a normal vowel. 

Up to this point I have developed this line of analysis based exclusively on the characteristics observed in the ms text itself, above all the otherwise difficult to explain relative frequency of [ockh] and [octh] as a portion of all [ckh] and [cth] vs. the relatively extreme rarity of [och] as a portion of all [ch]. The analysis to this point is not dependent on the identification of any particular language as that of the Voynich ms text.

But it makes sense at this point to consider whether any plausible candidate languages may possess the features described in the analysis above. The most significant feature necessary to make sense of this line of analysis is the presence of a relatively robust and complete series of soft / palatalized forms of all or most consonant phonemes in the language. The Celtic languages could fit this bill, but a Celtic language is a rather implausible candidate to be the language of the Voynich ms, unless a medieval Irish speaker happened to find himself or herself somehow located in a Central European cultural context. But this is unlikely. 

Much more natural would be an East Slavic or South Slavic language. Slavic languages have the extensive series of palatalized forms of consonants that fit so well in the line of analysis presented here. Moreover, East Slavic and South Slavic languages are well-known for their syllabic consonants, and syllabic liquids are the most common among those. 

Among East Slavic and South Slavic languages, the obviously most likely candidate to be the language of a Central European ms in the early-mid 15th century would be Czech. We even have the later Czech / Bohemian history of the possession of the Voynich ms, although that is a secondary and very circumstantial and limited form of evidence. Of course, the identification of the language is pure speculation at this stage; the critical issue is the deeper analysis of the characteristics of the ms text, a process I have attempted to delve into above, basing myself on Koen's Voynich character inventory that produced the most natural language-like entropy and conditional entropy statistics that anyone has yet succeeded in finding in an analysis of the glyphs of the Voynich ms text. 

Geoffrey


RE: [split] Verbose cipher? - nickpelling - 16-09-2020

I think that it should (in principle, at least) be possible to write a programme along the following lines:
  • Start with a corpus (say, Q20)
  • Filter out all unreliable words (e.g. words containing unreadable glyphs, weirdos, or glyphs that occur fewer than five times, say)
  • Create a starting list of tokens based on all the glyphs that remain
  • Calculate a metric based on the entropy of the text (more on this below)
  • Find the pair of tokens that, when combined as a new composite token, yields the best metric
  • Halt if the best metric found is worse than the previous best metric
  • Else add that new composite token to the list, reparse and repeat

The reason this is possible is that adding tokens to the list of tokens shortens the total amount of text but flattens out the stats. So there is a ceiling to the process.
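The loop above can be sketched in a few dozen lines. The version below is my own toy illustration, using conditional entropy h2 as a stand-in metric (choosing the actual metric is the hard part, as noted):

```python
from collections import Counter
from math import log2

def score(tokens):
    """Toy metric: conditional entropy h2 of the token stream.
    A real attempt would use a weighted combination of measures."""
    if len(tokens) < 2:
        return 0.0
    n = len(tokens) - 1
    bi = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])
    return -sum(c / n * log2(c / first[a]) for (a, _), c in bi.items())

def merge(tokens, pair):
    """Reparse: combine every adjacent occurrence of `pair` into one composite token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def greedy_tokenise(text, rounds=20):
    """Start from single glyphs, repeatedly add the composite token that most
    improves the metric, and halt when no candidate improves it."""
    tokens = list(text)
    best = score(tokens)
    for _ in range(rounds):
        if len(tokens) < 2:
            break
        candidates = set(zip(tokens, tokens[1:]))
        s, pair = max((score(merge(tokens, p)), p) for p in candidates)
        if s <= best:
            break  # ceiling reached: further merging only flattens the stats
        tokens, best = merge(tokens, pair), s
    return tokens, best
```

Filtering unreliable words and weighting several entropy measures together would slot in around this skeleton.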

But the trick would be finding a weighted combination of different entropy measures that yields worthwhile results.

Just thought I'd say. :-)


RE: [split] Verbose cipher? - Koen G - 16-09-2020

(16-09-2020, 07:58 PM)nickpelling Wrote: But the trick would be finding a weighted combination of different entropy measures that yields worthwhile results.

This is what I've been trying manually. It's a lot of work though. And I have a feeling that I've reached a ceiling - I don't think I can get further without either messing with spaces or equating glyphs (like a = y). Though it would be awesome if the results could be improved. Anyone up for the challenge? :D


RE: [split] Verbose cipher? - geoffreycaveney - 18-09-2020

For those who are interested, on a provisional basis I have developed a new transcription of the Voynich ms character inventory, primarily based on Koen's treatment of selected n-grams (multi-glyph sequences) as single units, by means of which he managed to raise the conditional character entropy (h2) to a European natural-language-like level while maintaining a normal basic character entropy (h1). I have incorporated most of the suppositions I proposed in my previous post, both in order to incorporate Koen's treatment of [o+glyph] as a single unit and to explain a certain remarkable anomaly in the character distribution. I have adjusted a few things, added a few things, and the aim is to present a transcription based on this new analysis that is as complete as possible at this provisional stage. I have tentatively titled this system the Voynich Verbose Cipher Character Inventory, which I abbreviate as VCI.

I hasten to underline that I have striven to avoid making this transcription look like any particular candidate language that the Voynich ms text could possibly be written in. The series of soft / palatalized consonants in this transcription, represented by ligatures and glyph sequences with EVA [ch], is based exclusively on the language-neutral analysis presented in my prior post. The concept of [o+glyph] as representing the voiced counterparts of the plain voiceless [glyph] consonants was inspired by Koen's insights, although of course as always any mistakes or errors in my incorporation of Koen's analysis into this provisional transcription system are my responsibility alone. The idea of treating [qo+glyph] sequences as nasal consonants was presented in my prior post. 

One major adjustment since my prior post is the provisional identification of EVA [l] as VCI <s>. The motivation for this proposal is the observation of the relatively frequent sequence EVA [lk] (1079 occurrences) in contrast with the relative infrequency of the sequence EVA [lt] (107 occurrences). It is already natural to propose <t> as a plausible value of EVA [k], due to its frequency and relation to other glyphs. The frequency of the EVA [lk] = VCI <st> sequence is further quite natural in many European languages, hence the proposal EVA [l] = VCI <s>. 

This further leads to the natural identification of EVA [s] and [r] as VCI <r> and <l> respectively. It is natural to suppose that these glyphs with similar shapes represent similar liquid phonemes. (I have to admit that this analysis, although genuinely motivated as a consequence of the language-neutral EVA [lk] = VCI <st> identification above, does create the proposed transcription EVA [sh] = VCI <ř>, which may look like Slavic or specifically East Slavic or Czech. Those who prefer may represent it rather as simply <r'>.)

Regarding the tentatively proposed "vowel" glyphs and glyph sequences, first of all I note that I have left EVA [y] as VCI <y> because this glyph has a very complicated pattern of conditional occurrence in relation to other glyphs, and I prefer not to speculate about its phonemic quality at this stage, even to the extent of its label in the VCI transcription. As for [ch], it is only natural that <j> and <i> could be represented by the same glyph/sequence. Naturally researchers should choose one or the other VCI representation of it for use in statistical analysis.

I regard the simplification of EVA [aiin] to VCI <o> as a kind of cutting of the Gordian knot. I very much like Koen's treatment of EVA [aiin] as a single unit. It is high time to develop a transcription that makes it look like a single unit as well. Let's look at "[daiin]" as a two-phoneme CV syllable rather than a 5-letter word! For good measure my system treats EVA [an] / [ain] as VCI <u> / <ú>, and in general the number of EVA [i]'s is consistently treated as a vowel length marker.

The representation of EVA [al] was the most difficult decision I had to make. In the spirit of Koen's analysis of this sequence as a single unit, the VCI system here lets go of the identification of the glyph EVA [l] with "<s>", and simply treats EVA [al] as VCI <a>. We shall see how this decision works out, both in statistical analysis and in linguistic analysis of the ms text. EVA [am] is treated as a "long" version of [al], and hence represented as VCI <á>.
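Taken together, the replacement rules spelled out in this post can be applied mechanically by longest-match-first substitution. The sketch below (my illustration, not Geoffrey's tool) covers only the subset of VCI rules explicitly stated here: [l]→<s>, [s]→<r>, [r]→<l>, [k]→<t>, [aiin]→<o>, [ain]→<ú>, [an]→<u>, [al]→<a>, [am]→<á>; every other glyph passes through unchanged:

```python
import re

# Subset of the VCI rules stated in the post; glyphs not listed pass through.
# [sh], [ch], the gallows ligatures, etc. are deliberately omitted here.
VCI = {"aiin": "o", "ain": "ú", "an": "u", "al": "a", "am": "á",
       "l": "s", "s": "r", "r": "l", "k": "t"}

# Sorting alternatives longest-first makes re's left-to-right alternation
# behave as longest-match-first, so [al] wins over [l], [aiin] over [ain].
pattern = re.compile("|".join(sorted(map(re.escape, VCI), key=len, reverse=True)))

def to_vci(eva):
    """Longest-match-first substitution of EVA sequences into VCI units."""
    return pattern.sub(lambda m: VCI[m.group(0)], eva)

print(to_vci("daiin"))  # -> "do": a two-unit CV syllable, as proposed above
print(to_vci("lkal"))   # -> "sta": note EVA [lk] = VCI <st>
```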

Geoffrey Caveney


RE: [split] Verbose cipher? - Aga Tentakulus - 18-09-2020

From my course " We learn Voynich "
Unfortunately, nobody was interested in the theory at that time.
It explains metaphorically what Nick wrote and suggests a possible system.
It also explains that you can make more out of less.

You first have to take something apart if you want to know how something might work.

As JKP wrote, a second "8" would be possible if, for example, the "es" combination is written inaccurately.

PS: Maybe I should write my essay here as well.

Translated with [link] (free version)


RE: [split] Verbose cipher? - -JKP- - 18-09-2020

Aga, since you are proposing very specific interpretations for the shapes, this belongs in an Aga Theory thread, according to forum rules.

It will be easier to discuss it that way.


RE: [split] Verbose cipher? - Anton - 18-09-2020

[link] will be fine, it's exactly on this subject.