Finding patterns in Voynich words via a Hidden Markov Model
The Voynich Ninja (https://www.voynich.ninja) > Forum: Voynich Research (https://www.voynich.ninja/forum-27.html) > Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html) > Thread: Finding patterns in Voynich words via a Hidden Markov Model (/thread-4896.html)
Finding patterns in Voynich words via a Hidden Markov Model - quimqu - 27-08-2025

Dear all,

During these holidays I've been exploring how to analyze the word structure of the Voynich manuscript with models from the text-analysis branch of data science. While looking into Markov models, I came across Hidden Markov Models (HMMs) and worked on some code that runs on the EVA transliteration. The goal is not decipherment, but to quantify recurring patterns in how word pieces combine and to score how “typical” each word looks under those rules.

A Hidden Markov Model is a simple probabilistic model with hidden states (unseen “roles”), transition probabilities between states, and emission probabilities for the observable symbols. From data, an HMM learns those transitions and emissions; for a new sequence it can decode the most likely path of roles and compute a log-likelihood that tells how well the model explains the sequence.

Voynich "words" show strong positional regularities (common openings and endings), and an HMM gives a compact way to (i) discover recurring roles behind word pieces, (ii) quantify which pieces go where, and (iii) measure how typical a word is under the learned rules.

What I did:
How to read the state graph:
What the linked HTML page shows when you click a word:
How this could help with the manuscript:
RE: Finding patterns in Voynich words via a Hidden Markov Model - RobGea - 28-08-2025

This may be relevant: Application of Hidden Markov Modelling.

RE: Finding patterns in Voynich words via a Hidden Markov Model - Jorge_Stolfi - 28-08-2025

(28-08-2025, 12:42 AM) RobGea wrote: This may be relevant: Application of Hidden Markov Modelling.

Thanks! I had not seen that analysis. So the HMM optimization consistently identifies o and a as "vowels", but adds other glyphs that depend on the transcription alphabet used. That seems consistent with my own mostly-by-hand analysis of the first level of the VMS word structure:

In that formula, the O with superscript '?' means "zero or one glyph from the set O = {a,o,y}". The HMM optimization probably failed to recognize y as a "vowel" because (IIRC) it occurs mostly at the beginning or end of the word (a further constraint that is not captured by the formula above), and therefore does not alternate with "consonants" often enough.

Were the Sukhotin and HMM optimization algorithms instructed to keep at least four glyphs in the "vowel" set? That could explain why various other glyphs besides a and o were included in the "vowels", and why they vary according to the transliteration alphabet.

I wonder what HMM optimization would do with Arabic or Hebrew? Arabic has only three "long" or "strong" vowels (I, A, U), which are always written in the standard script, and their "short" or "weak" versions, which are usually not written. One file is a sample of Arabic with both short and long vowels ("bīsmī allāhī alrrāµmānī alrrāµymī") and another is the same text with long vowels only ("bsm allh alrµmn alrµym").
The files are in the iso-latin-1 character set, using an encoding of Arabic letters as single iso-latin-1 characters.

All the best,
--jorge

RE: Finding patterns in Voynich words via a Hidden Markov Model - quimqu - 28-08-2025

(28-08-2025, 12:42 AM) RobGea wrote: This may be relevant: Application of Hidden Markov Modelling.

This is great and very interesting! I will try to see how to use René's approach. Thank you.

RE: Finding patterns in Voynich words via a Hidden Markov Model - Mauro - 28-08-2025

Very interesting, and I like how you consider the sequence of the words in your Markov model. I need some clarifications.

Quote: Each label lists the state’s top pieces: P = top prefix fragments, T = top stem fragments, F = top final/suffix fragments.

By 'top', do you mean 'most frequent'?

I'm not sure how to read the state graph. I.e., if I start at S3, which strings does it actually contain? All six strings [o che she d o daiin] ('o' being duplicated because it's been classified both as a prefix and a stem)? How can the outputs of S3 add up to 1.06? They should be < 1.

RE: Finding patterns in Voynich words via a Hidden Markov Model - quimqu - 28-08-2025

Hello Mauro, I'll try to explain:

(28-08-2025, 09:14 AM) Mauro wrote: By 'top', do you mean 'most frequent'?

Yes. Each state has a set of prefixes, stems and suffixes. The ones shown are the "top" ones, i.e. the sub-morphemes (prefixes, stems, or suffixes) that are most likely to appear when the HMM is in that state (there are more; only the top ones are shown).

(28-08-2025, 09:14 AM) Mauro wrote: I'm not sure how to read the state graph. I.e., if I start at S3, which strings does it actually contain?
All six strings [o che she d o daiin] ('o' being duplicated because it's been classified both as a prefix and a stem)?

I suggest you click on the link to the HTML page and see how the morphemes are distributed in each "word". You can start in S3 with a part of the word (which may contain up to three sub-parts, let's say a sub-prefix, a sub-stem and/or a sub-suffix); from there the most likely transition is to S7 (the arrow is thicker). In the HTML page you can see, word by word, how the HMM thinks each word is constructed.

(28-08-2025, 09:14 AM) Mauro wrote: How can the outputs of S3 add up to 1.06? They should be < 1.

I see the arrow labels are not correct: S3->S7 is labelled as S1->S6, which I think is confusing. I'll try to fix it and update the post.

RE: Finding patterns in Voynich words via a Hidden Markov Model - ReneZ - 28-08-2025

Mary D'Imperio does something similar. It starts off with individual characters, and then moves to 'small groups'.

RE: Finding patterns in Voynich words via a Hidden Markov Model - quimqu - 28-08-2025

I updated the code a bit and found stronger relationships: the transition from S2 always leads to S5 (100%), and the transition from S4 always leads to S0 (100%). I updated the graph plot and the HTML page at the link.

Thank you
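The first post describes two things an HMM computes for a word: the most likely path of hidden states (decoding) and a log-likelihood that scores how typical the word is. quimqu's code is not posted in the thread, so what follows is only a minimal Python sketch of those two computations (the Viterbi and forward algorithms) on a hand-made toy model. The states S0 to S2, the segmentation of words into pieces, every probability, and the helper names `check_rows`, `log_likelihood` and `viterbi` are hypothetical and were not learned from the EVA transliteration. `check_rows` illustrates the constraint behind Mauro's question: each state's outgoing transition probabilities (and its emission probabilities) must sum to 1, so a set of labels adding up to 1.06 points to a labelling problem.

```python
import math

# Toy HMM over pre-segmented word pieces. The states (S0-S2), the pieces and
# every probability are invented for illustration; none are learned from EVA.
STATES = ["S0", "S1", "S2"]
START = {"S0": 0.8, "S1": 0.1, "S2": 0.1}
TRANS = {  # TRANS[r][s] = P(next state s | current state r)
    "S0": {"S0": 0.1, "S1": 0.7, "S2": 0.2},
    "S1": {"S0": 0.0, "S1": 0.2, "S2": 0.8},
    "S2": {"S0": 0.0, "S1": 0.0, "S2": 1.0},
}
EMIT = {  # EMIT[s][piece] = P(piece | state s); unlisted pieces have probability 0
    "S0": {"qo": 0.5, "o": 0.3, "ch": 0.2},
    "S1": {"k": 0.4, "ch": 0.3, "ed": 0.3},
    "S2": {"y": 0.5, "aiin": 0.3, "dy": 0.2},
}


def check_rows(table, tol=1e-9):
    """Raise ValueError if any row of a probability table does not sum to 1."""
    for state, row in table.items():
        total = sum(row.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"{state}: probabilities sum to {total:.2f}, not 1")


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(pieces):
    """Forward algorithm: log P(pieces | model), summed over all state paths."""
    alpha = {s: safe_log(START[s]) + safe_log(EMIT[s].get(pieces[0], 0.0)) for s in STATES}
    for piece in pieces[1:]:
        alpha = {
            s: logsumexp([alpha[r] + safe_log(TRANS[r][s]) for r in STATES])
            + safe_log(EMIT[s].get(piece, 0.0))
            for s in STATES
        }
    return logsumexp(list(alpha.values()))


def viterbi(pieces):
    """Viterbi decoding: the single most likely state path and its log-probability."""
    score = {s: safe_log(START[s]) + safe_log(EMIT[s].get(pieces[0], 0.0)) for s in STATES}
    backpointers = []
    for piece in pieces[1:]:
        best_prev, new_score = {}, {}
        for s in STATES:
            # Best predecessor state q for landing in state s at this step.
            r = max(STATES, key=lambda q: score[q] + safe_log(TRANS[q][s]))
            best_prev[s] = r
            new_score[s] = score[r] + safe_log(TRANS[r][s]) + safe_log(EMIT[s].get(piece, 0.0))
        backpointers.append(best_prev)
        score = new_score
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return path[::-1], score[last]


if __name__ == "__main__":
    check_rows({"start": START})
    check_rows(TRANS)
    check_rows(EMIT)
    for word in (["qo", "k", "y"], ["o", "ch", "y"]):
        path, best = viterbi(word)
        total = log_likelihood(word)
        print("+".join(word), path, f"best path p={math.exp(best):.5f}", f"log-likelihood={total:.3f}")
```

With these toy numbers, qo+k+y has only one possible path (S0, S1, S2), so the best-path probability and the forward total coincide (0.0448). For o+ch+y they differ (0.02016 for the best path vs 0.02064 in total) because ch can be emitted by both S0 and S1: the log-likelihood sums over both paths, while Viterbi keeps only the best one.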