04-11-2022, 04:21 PM
The slot-based word morphology approach does a good job of exposing consistent patterns that underlie a majority of word forms, while the "network" approach does a good job of predicting word frequencies based on edit distance from the [ol], [chedy], and [daiin] prototypes.
I'm not sure either of those approaches has a clear advantage over a model based on a transitional probability matrix that predicts which glyph will come next after a given glyph or sequence of glyphs, without regard for its position within a word, and in which spaces are inserted according to largely consistent patterns into a continuous stream of text.
Torsten's [ol], [chedy], and [daiin] prototypes correspond rather closely to the closed loops [cholcholchol...], [qokeedyqokeedyqokeedy...], and [daiindaiindaiin...], which are made up of the highest-probability sequences of individual graphemes in particular parts of the manuscript. So if transitional probability matrices have high predictive power in themselves, we should also expect word frequencies to correlate with "edit distance," as Torsten has found, though not exactly, since the words [ol] and [chedy] will often represent partial cycles in contexts such as [~r.ol~] and [~l.chedy~]. I therefore think it might be possible to predict the "network" patterns on the basis of transitional probability matrices alone, with no need to factor in "edit distance" as such. Perhaps we could devise a statistical test that would yield a different result depending on which model is more effective.
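To make that last idea a bit more concrete, here is a minimal sketch (in Python) of one way such a comparison could look. Everything in it is my own scaffolding rather than anything from Torsten's work or from any existing tool: it assumes a pre-tokenised {word: frequency} list in which each glyph is a single character, builds a frequency-weighted glyph bigram model, and then checks which quantity tracks word frequency better, edit distance to the nearest of the three prototypes or average log transition probability under the bigram model.

from collections import defaultdict
from math import log
from scipy.stats import spearmanr  # any rank-correlation routine would do here

PROTOTYPES = ["ol", "chedy", "daiin"]

def edit_distance(a, b):
    # Plain Levenshtein distance between two glyph strings.
    prev = list(range(len(b) + 1))
    for i, ga in enumerate(a, 1):
        cur = [i]
        for j, gb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ga != gb)))
        prev = cur
    return prev[-1]

def bigram_counts(word_freq):
    # Frequency-weighted glyph-to-glyph transition counts, with "^"/"$"
    # marking word boundaries.
    counts = defaultdict(lambda: defaultdict(int))
    for word, freq in word_freq.items():
        padded = "^" + word + "$"
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += freq
    return counts

def avg_logprob(word, counts):
    # Mean log transition probability per glyph, with crude add-one smoothing.
    padded = "^" + word + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        row = counts.get(a, {})
        total += log((row.get(b, 0) + 1) / (sum(row.values()) + len(row) + 1))
    return total / (len(padded) - 1)

def compare(word_freq):
    # word_freq: a {word: corpus frequency} dict (hypothetical input).
    words = list(word_freq)
    freqs = [word_freq[w] for w in words]
    counts = bigram_counts(word_freq)
    dists = [min(edit_distance(w, p) for p in PROTOTYPES) for w in words]
    logps = [avg_logprob(w, counts) for w in words]
    print("frequency vs. edit distance to prototypes:", spearmanr(freqs, dists))
    print("frequency vs. bigram log-probability:     ", spearmanr(freqs, logps))

If the "network" really is just a side effect of transitional probabilities, I'd expect the bigram log-probability to correlate with frequency at least as strongly as edit distance does; if edit distance keeps an edge even after controlling for that, the "network" model would be doing real work of its own.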
I'm less sure about transitional probability matrices being able to account for the patterns exposed by slot-based word morphologies. But I wouldn't rule out that they could.
A while ago I calculated matrices for each bigram in Currier B (e.g., what the next single glyph would be after each pair of glyphs, such as [k] after [qo]) and generated some text randomly based on them. Of course I had to make some working assumptions about what a "glyph" is, all of which are open to challenge, and I don't think conditioning on just the preceding bigram goes deep enough. But the results, which I may have quoted here before, came out looking like this (with spaces inserted between any two glyphs that are more often separated by a space than not):
[ol.qokeodar.ar.okaiin.Shkchedy.Shdal.qotam.ytol.dal.cheokeedy.chkal.Shedy.qokair.odain.al.ol.daiin.cheal.qokeeey.lkain.chcPhedy.kchdy.cheey.otar.cheor.aiin.Shedy.dal.dochey.opchol.okchy.Sheoar.ol.oeey.otcheol.dy.chShy.lkar.ain.okchedy.l.chkedy.oteedar.ShecKhey.okaiin.chor.olteodar.okal.qokeShedy.ol.ol.Sheey.kain.cheky.chey.chol.chedy]
I *think* most of these words would conform to most proposed Voynichese word morphologies (including this new one), but they were generated without reference to any slot system. Any word structure apparent here is a byproduct of the transitional probability matrices operating freely.
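For anyone who wants to play with this, here is a minimal sketch in Python of the kind of generator described above. The input conventions are my own assumptions, not part of any standard tool: each transcription line is a string in which every glyph is a single character and [.] marks a word break, which glosses over the tokenisation problem for glyphs like [Sh] and [cKh]. It builds order-2 transition counts (next glyph given the preceding pair), records how often each adjacent glyph pair is written with a space between them, generates a continuous stream of glyphs, and then inserts a space wherever the pair is more often separated than not.

import random
from collections import defaultdict

def build_tables(lines):
    # trans: (g1, g2) -> counts of the glyph that follows that pair
    # split: (g1, g2) -> [times written together, times separated by a space]
    trans = defaultdict(lambda: defaultdict(int))
    split = defaultdict(lambda: [0, 0])
    for line in lines:
        stream = line.replace(".", "")            # the line as a continuous stream
        for a, b, c in zip(stream, stream[1:], stream[2:]):
            trans[(a, b)][c] += 1
        for i, a in enumerate(line[:-1]):
            b = line[i + 1]
            if a == ".":
                continue
            if b == ".":
                if i + 2 < len(line) and line[i + 2] != ".":
                    split[(a, line[i + 2])][1] += 1   # pair separated by a space
            else:
                split[(a, b)][0] += 1                 # pair written together
    return trans, split

def generate(trans, split, n_glyphs=200, seed=None):
    rng = random.Random(seed)
    state = rng.choice(list(trans))               # start from some attested pair
    stream = list(state)
    while len(stream) < n_glyphs:
        followers = trans.get(state)
        if not followers:                          # dead end: restart from a new pair
            state = rng.choice(list(trans))
            stream.extend(state)
            continue
        glyphs, weights = zip(*followers.items())
        nxt = rng.choices(glyphs, weights=weights)[0]
        stream.append(nxt)
        state = (state[1], nxt)
    # Insert "." between any two glyphs more often written apart than together.
    text = stream[0]
    for a, b in zip(stream, stream[1:]):
        together, apart = split.get((a, b), (1, 0))
        text += ("." if apart > together else "") + b
    return text

Feeding build_tables a transcription of the Currier B pages (under whatever glyph tokenisation one prefers) and then calling generate should yield streams broadly like the sample above, though the exact output will of course depend on the transcription and on how glyphs are defined.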