The Voynich Ninja
HMM automatic vowel detection - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: HMM automatic vowel detection (/thread-2121.html)



RE: HMM automatic vowel detection - -JKP- - 01-10-2017

You know you've been staring at the VMS for too long when you can glance through a text file like this and know immediately where it came from in the manuscript.


RE: HMM automatic vowel detection - MarcoP - 01-10-2017

(30-09-2017, 10:23 PM)Koen Gh. Wrote: Marco, I took text from some pages and transcribed them taking into account some possible digraphs on the one hand and developing the benches on the other - a scenario I deem possible. It's obviously not a proposed solution, just a test to see what happens when the text is written this way. If possible, could you check which results your program gives for it?

(I kept EVA q as q, though it's clear that in this transcription it would take on a vowel value, likely one already represented differently elsewhere)

Hi Koen, I attach the output of the Python script and a few lines in which I highlighted in red the characters more likely to be generated by state 1.
Which pages did you transcribe? I would be curious to understand more of your transcription!
In particular, what have you done with qo? 

___________________

I followed Davidsch's suggestion and wrote to Prof. Knight. His suggestion is to read the tutorials of the linked toolkit and experiment with it.


RE: HMM automatic vowel detection - Koen G - 01-10-2017

Thanks, Marco!
I attach the folios I started from (taken from Takahashi). I selected these folios randomly, but from the same "language" to have some consistency.

It's basically EVA, but I made some changes to test digraphs and trigraphs. I came up with a way to represent "o+gallow" as a single glyph. It's always the voiced version of that gallow. So oT = D, oP = B, oK = G. Once again, this is not a proposed translation, just a way to test digraphs.

If you represent o+gallow as a consonant, EVA q looks like it has to be a vowel.

I also replaced: 
EVA in = n 
EVA iin = m 
EVA n = l
EVA l = s
EVA ee = u

The most interpretative part is that I developed the benches depending on their surroundings (though as consistently as possible). This can surely be done better, or at least differently. It would be interesting if we had a way to test how "optimal" a certain transcription is (with entropy?).
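
For concreteness, here is a minimal Python sketch of this kind of re-transcription (assuming lowercase EVA input; the rule set follows the replacements above, the sample words are illustrative, and the bench development is left out). Longer patterns must be matched first, so that e.g. "iin" is not consumed as "i" + "in", and each source character is rewritten only once:

Code:
import re

# Illustrative replacement rules from the post: o+gallow becomes the
# "voiced" gallow, and the i-sequences and some single glyphs are remapped.
RULES = {
    'iin': 'm',   # EVA iin -> m
    'in':  'n',   # EVA in  -> n
    'ee':  'u',   # EVA ee  -> u
    'ot':  'D',   # EVA o+t -> D
    'op':  'B',   # EVA o+p -> B
    'ok':  'G',   # EVA o+k -> G
    'n':   'l',   # EVA n   -> l
    'l':   's',   # EVA l   -> s
}

# One alternation, longest patterns first, so each source character is
# replaced exactly once in a single left-to-right pass.
pattern = re.compile('|'.join(sorted(RULES, key=len, reverse=True)))

def retranscribe(eva_word):
    return pattern.sub(lambda m: RULES[m.group(0)], eva_word)

print(retranscribe('okaiin'))   # -> 'Gam'
print(retranscribe('qokeedy'))  # -> 'qGudy'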


RE: HMM automatic vowel detection - MarcoP - 01-10-2017

(01-10-2017, 10:52 AM)Koen Gh. Wrote: Thanks, Marco!
I attach the folios I started from (taken from Takahashi). [...]


Thank you, Koen! It's nice to see the kind of experiments one can easily run with these systems. One must always be careful not to expect too much from automatic methods, but they at least provide an objective point of view.

I think that the observation about the frequent dropping of initial o- in some phonetic contexts is quite convincing. According to your proposal, this would mean that the initial consonant switches from voiced to unvoiced in some contexts. I wonder if this is known to happen in some languages.

In particular, after -in, -iin, -r an initial o- seems to be required (or at least very strongly preferred). If those endings are consonants, I find it easy to imagine that a vowel must follow. I don't know if a voiced / unvoiced alternation would make the sequence easier to pronounce. I am certainly biased by the frequent dropping of the final vowel before an initial vowel in Italian: della ora becomes dell'ora, questo anno becomes quest'anno, etc.


RE: HMM automatic vowel detection - Koen G - 01-10-2017

Marco: it's not about alternation. Say a word starts with t. If the preceding consonant is voiced and speech is fluent, this t could become d.

I am just using voicing as an example of what this initial o could indicate. It could also be aspiration, for example. In some languages the difference between aspirated and unaspirated is phonemic.


RE: HMM automatic vowel detection - ReneZ - 01-10-2017

Marco,

from the figures you have shown, with the white and red characters against a black background, I conclude that the algorithm really goes ahead by assigning each character in the text to one of the possible states.
When doing this, it tries to maximise the consistency of each character always being generated by the same state, but also to alternate between the states as much as possible.

In the end, the A and B matrices can be computed by simply doing the statistics on the state transitions and the mapping of characters to states.
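
For illustration, a minimal Python sketch of that counting, with hypothetical state labels (not the actual output of Marco's script):

Code:
from collections import Counter

# Toy illustration: given a character sequence and a state label for
# each character, recover A and B by simple counting.
chars  = list('oteydoteey')
states = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]   # hypothetical labels

trans = Counter(zip(states, states[1:]))   # state -> state transition counts
emit  = Counter(zip(states, chars))        # state -> character counts

A = [[trans[(s, t)] / (trans[(s, 0)] + trans[(s, 1)])
      for t in (0, 1)] for s in (0, 1)]
B = {s: {c: emit[(s, c)] / states.count(s)
         for c in sorted(set(chars))} for s in (0, 1)}

print(A)
print(B)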

What is not output, it seems, is the fraction of characters represented by each state, i.e. the ratio of red and white characters in these plots, but this can be computed from the components of the A matrix.

Not sure how this will come out. If we call the states 1 and 2 (I never figured out why the vast majority of software people prefer to start counting at 0...), and the A matrix looks like this:


Quote:
/                \
|  A(11)  A(12)  |
|  A(21)  A(22)  |
\                /

Then:
    p(state=1) = A(21) / { A(12) + A(21) }
and
    p(state=2) = A(12) / { A(12) + A(21) }
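
A quick numerical check of these formulas in Python (the off-diagonal entries are illustrative values), verifying that the resulting probabilities are indeed left unchanged by A:

Code:
# Stationary distribution of a two-state chain: p = p A, so
# p(1) = A(21) / (A(12) + A(21)) and p(2) = A(12) / (A(12) + A(21)).
A12, A21 = 0.7, 0.8                 # illustrative transition probabilities
A = [[1 - A12, A12],
     [A21, 1 - A21]]

p1 = A21 / (A12 + A21)
p2 = A12 / (A12 + A21)

# verify stationarity: applying A leaves (p1, p2) unchanged
assert abs(p1 * A[0][0] + p2 * A[1][0] - p1) < 1e-12
assert abs(p1 * A[0][1] + p2 * A[1][1] - p2) < 1e-12
print(p1, p2)   # 0.5333... 0.4666...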

FWIW....


RE: HMM automatic vowel detection - Koen G - 01-10-2017

Wouldn't this also mean that the algorithm will perform poorly in languages which prefer large consonant clusters?


RE: HMM automatic vowel detection - MarcoP - 01-10-2017

(01-10-2017, 01:18 PM)ReneZ Wrote: Marco,

from the figures you have shown, with the white and red characters against a black background, I conclude that the algorithm really goes ahead by assigning each character in the text to one of the possible states.
When doing this, it tries to maximise the consistency of having each character always being generated by the same state, but also to alternate as much as possible.

Hi Rene,
the red/white plots are something I produce on the basis of the B matrix.
I assign the state:0/1 label on the basis of whether the probability for each character is higher in state 0 or in state 1.
For instance:
 a state:1  0.03163 0.19249
Since 0.03163 < 0.19249, I count 'a' as state:1.

 b state:0  0.00685 0.00000
Since 0.00685 > 0.00000, I count 'b' as state:0.


I highlighted in red all the characters for which P0(char)<P1(char).
From the graphical representation of B, it should be clear that in most cases the model has a clear preference for generating each character in one of the two states. But in this example, 'k' is ambiguous:
 k state:1  0.01406 0.01693
(the probabilities for state:0 and state:1 are very close).

[Image: attachment.php?aid=1739]
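
In other words, the red/white assignment is a simple comparison of the two columns of B. A minimal Python sketch using the emission probabilities quoted above:

Code:
# Assign each character to the state more likely to emit it,
# using the B-matrix entries quoted in the post.
B = {
    'a': (0.03163, 0.19249),
    'b': (0.00685, 0.00000),
    'k': (0.01406, 0.01693),   # near-ambiguous case
}

for char, (p0, p1) in B.items():
    state = 0 if p0 > p1 else 1
    print(f"{char} -> state:{state}  (P0={p0:.5f}, P1={p1:.5f})")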


RE: HMM automatic vowel detection - MarcoP - 01-10-2017

(01-10-2017, 01:46 PM)Koen Gh. Wrote: Wouldn't this also mean that the algorithm will perform poorly in languages which prefer large consonant clusters?

The algorithm tries to predict the next character on the basis of the single preceding character.
If the input is made of clusters of letters that tend to occur consecutively (e.g. clusters of consonants followed by clusters of vowels), the model will favor sequences of characters from the same state, and the A matrix will have high-probability loops
state:0->state:0, state:1->state:1
and lower probabilities for
state:0->state:1, state:1->state:0.

If there is alternation (as is typically the case with consonants and vowels), it will favor state alternation, with the probability of state:0 being followed by state:1 higher than that of state:0 being followed by state:0 (low-probability loops).

I guess the algorithm will perform poorly in languages having frequent trigraphs, since these cannot be modelled by only considering one character and the immediately following one. Anyway, at the moment, I don't know how to compare the results of runs based on different inputs.
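
A toy demonstration of the two regimes, counting transitions in made-up sequences ('C' standing for state:0 characters, 'V' for state:1):

Code:
# Transition probabilities estimated by counting, for an alternating
# pattern versus a clustered one ('C' = state:0, 'V' = state:1).
def a_matrix(seq):
    idx = {'C': 0, 'V': 1}
    counts = [[0, 0], [0, 0]]
    for a, b in zip(seq, seq[1:]):
        counts[idx[a]][idx[b]] += 1
    return [[c / sum(row) for c in row] for row in counts]

print(a_matrix('CVCVCVCVCVCV'))  # off-diagonal near 1: alternation
print(a_matrix('CCVVCCVVCCVV'))  # diagonal around 0.5: same-state loops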


RE: HMM automatic vowel detection - ReneZ - 01-10-2017

Thanks for the explanation Marco.

The complication is that, according to the B matrix, the majority of the occurrences of each character is typically generated by the preferred state, while some may be generated by the other state.
One also cannot really compare the absolute values in the B matrix since, overall, one state may be much more prominent than the other (see my previous post).

This is really only a problem for cases like the 'k' you quoted. The fraction 0.01406 is relative to all state-0 characters and 0.01693 to all state-1 characters, so if there are many more state-0 characters altogether, 'k' is more likely to belong to state 0.

Not sure if I explained that clearly.
Anyway, it is not really too important.
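
For illustration, the weighting described here amounts to comparing p(state) * B(state, char) instead of the raw B entries. A minimal Python sketch (the overall state fractions are made up; the 'k' values are those quoted earlier):

Code:
# Resolve an ambiguous character by weighting each B entry with the
# overall fraction of characters generated by that state.
p0, p1 = 0.6, 0.4                  # hypothetical overall state fractions
b0_k, b1_k = 0.01406, 0.01693      # B entries for 'k' quoted above

score0, score1 = p0 * b0_k, p1 * b1_k
winner = 0 if score0 > score1 else 1
print(f"k -> state:{winner}  ({score0:.5f} vs {score1:.5f})")
# with these numbers 'k' flips to state:0 despite B0(k) < B1(k)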