MarcoP > 28-09-2017, 08:32 PM
Quote: Another method is to use a two-state bigram HMM (Knight et al., 2006; Goldsmith and Xanthos, 2009) over letters, and induce two clusters of letters with EM. In alphabetic languages like English, the clusters correspond almost perfectly to vowels and consonants. We find that a curious phenomenon occurs with the VMS – the last character of every word is generated by one of the HMM states, and all other characters by another; i.e., the word grammar is a*b.
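For anyone who wants to try this at home, here is a minimal pure-Python sketch of a two-state discrete HMM trained with Baum-Welch (EM), followed by Viterbi decoding to label each character with a state. It is not the paper's code. Assumptions: each word is treated as a separate sequence (spaces are not modelled as symbols), the start is random with a fixed seed, and `train_hmm`/`viterbi` are hypothetical names. With a real transcription you would then check whether only word-final characters land in one state; the dozen words below are just a toy input.

```python
import math
import random

def _normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]

def train_hmm(words, n_states=2, n_iter=50, seed=0):
    """Fit a discrete HMM over characters with Baum-Welch (EM).
    Each non-empty word is an independent sequence. Returns (start, trans, emit, alphabet)."""
    words = [w for w in words if w]
    alphabet = sorted({c for w in words for c in w})
    idx = {c: i for i, c in enumerate(alphabet)}
    S, V = n_states, len(alphabet)
    rng = random.Random(seed)  # random, non-uniform start breaks state symmetry

    def rand(n):
        return _normalize([rng.random() + 0.1 for _ in range(n)])

    start, trans, emit = rand(S), [rand(S) for _ in range(S)], [rand(V) for _ in range(S)]
    for _ in range(n_iter):
        # Expected counts; a tiny floor keeps every probability non-zero.
        start_c = [1e-9] * S
        trans_c = [[1e-9] * S for _ in range(S)]
        emit_c = [[1e-9] * V for _ in range(S)]
        for w in words:
            obs = [idx[c] for c in w]
            T = len(obs)
            # E-step: forward pass, rescaled at every step to avoid underflow.
            alpha, scale = [], []
            for t in range(T):
                if t == 0:
                    a = [start[s] * emit[s][obs[0]] for s in range(S)]
                else:
                    a = [sum(alpha[t - 1][r] * trans[r][s] for r in range(S))
                         * emit[s][obs[t]] for s in range(S)]
                z = sum(a)
                alpha.append([x / z for x in a])
                scale.append(z)
            # Backward pass reusing the forward scale factors.
            beta = [[1.0] * S for _ in range(T)]
            for t in range(T - 2, -1, -1):
                beta[t] = [sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r]
                               for r in range(S)) / scale[t + 1] for s in range(S)]
            # Accumulate state posteriors (gamma) and transition posteriors (xi).
            for t in range(T):
                gamma = _normalize([alpha[t][s] * beta[t][s] for s in range(S)])
                for s in range(S):
                    emit_c[s][obs[t]] += gamma[s]
                    if t == 0:
                        start_c[s] += gamma[s]
                if t < T - 1:
                    for r in range(S):
                        for s in range(S):
                            trans_c[r][s] += (alpha[t][r] * trans[r][s] * emit[s][obs[t + 1]]
                                              * beta[t + 1][s] / scale[t + 1])
        # M-step: re-estimate parameters from the expected counts.
        start = _normalize(start_c)
        trans = [_normalize(row) for row in trans_c]
        emit = [_normalize(row) for row in emit_c]
    return start, trans, emit, alphabet

def viterbi(word, start, trans, emit, alphabet):
    """Most likely state sequence for one word (log-space Viterbi)."""
    idx = {c: i for i, c in enumerate(alphabet)}
    obs = [idx[c] for c in word]
    S, log = len(start), math.log
    score = [log(start[s]) + log(emit[s][obs[0]]) for s in range(S)]
    back = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for s in range(S):
            best = max(range(S), key=lambda r: score[r] + log(trans[r][s]))
            ptr.append(best)
            new_score.append(score[best] + log(trans[best][s]) + log(emit[s][o]))
        score = new_score
        back.append(ptr)
    state = max(range(S), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last character
        state = ptr[state]
        path.append(state)
    return path[::-1]

if __name__ == "__main__":
    words = "BAR ZC9 FCC89 ZCFAE 8AE 8AR OE BSC89 ZCF 8AN OVAE ZCF9".split()
    model = train_hmm(words)
    for w in words:
        print(w, viterbi(w, *model))
```

EM only finds a local optimum, so it is worth rerunning with a few different seeds before reading anything into the state split.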
Emma May Smith > 28-09-2017, 08:56 PM
davidjackson > 28-09-2017, 09:01 PM
MarcoP > 28-09-2017, 09:35 PM
(28-09-2017, 09:01 PM) davidjackson Wrote: K-R say they are using characters A-Z, *, 1-9. You seem to be missing some of those in your explanation above; are they included?
The website hosting the source transcription cited in the paper is still live; you might want to use that transcription instead of creating a new transcription file.
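On the alphabet question, a quick check is to list every symbol in a transcription file that falls outside the set K-R mention (A-Z, *, 1-9). A small sketch; `unexpected_chars` is a hypothetical helper, and it assumes whitespace only separates words:

```python
import string

# The alphabet as quoted from K-R: A-Z, '*', and the digits 1-9 (no 0).
ALLOWED = set(string.ascii_uppercase) | {"*"} | set("123456789")

def unexpected_chars(text: str) -> set[str]:
    """Return the characters in text outside the K-R alphabet, ignoring whitespace."""
    return {c for c in text if not c.isspace() and c not in ALLOWED}

print(unexpected_chars("BAR ZC9 FCC89 ZCFAE 8AE 8AR OE BSC89 ZCF 8AN OVAE ZCF9"))
```

Anything it reports (e.g. '.' or '-' from an interlinear file) would need stripping or mapping before training.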
MarcoP > 28-09-2017, 09:43 PM
(28-09-2017, 08:56 PM) Emma May Smith Wrote: Marco, would it be possible to alter the input text to test some hypotheses?
Specifically, I'm interested in what difference it would make if [a] and [y] were transcribed with the same character, and further if a [y] were inserted between [e] and [d] wherever they appear as the string [ed].
If they lead to a stronger split between the two states, with less looping, would that be a positive sign?
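Emma's two changes can be applied as plain string substitutions before feeding the text to the HMM. A sketch assuming an EVA-style lowercase transcription; the input in this thread uses a different alphabet, so the mapping of [a], [y], [e], [d] onto those symbols is left open. `transform` and its `merged` parameter are hypothetical. The [ed] insertion runs first so the inserted [y] is also merged:

```python
def transform(text: str, merged: str = "a", insert_y: bool = True) -> str:
    """Apply the two hypothesised changes to an EVA-style string."""
    if insert_y:
        # Put a [y] between [e] and [d] in every [ed] ("ed" cannot overlap itself).
        text = text.replace("ed", "eyd")
    # Write [a] and [y] with one shared symbol.
    return text.replace("a", merged).replace("y", merged)

print(transform("qokeedy.chedy.daiin"))  # qokeeada.cheada.daiin
```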
Emma May Smith > 28-09-2017, 10:00 PM
ReneZ > 29-09-2017, 05:45 AM
MarcoP > 29-09-2017, 04:24 PM
(29-09-2017, 05:45 AM) ReneZ Wrote: Marco, when you mention a limit of 1000 characters, is this the length of the text?
From what I understand of the algorithm, the text should not have to be stored in memory, and should only have to be read once, so such a low limit would not make much sense.
The overall VMS has about 160,000 characters (not counting spaces).
BAR ZC9 FCC89 ZCFAE 8AE 8AR OE BSC89 ZCF 8AN OVAE ZCF9
<f81v.1> BAR.ZC9.PCC89.ZCFAE.8AE.8AR.OE.BSC89.ZCF.8AN.OVAEZCF9-
Davidsch > 29-09-2017, 06:03 PM
Koen G > 30-09-2017, 10:23 PM