Jorge_Stolfi > 13-09-2025, 11:27 AM
(13-09-2025, 07:53 AM)Torsten Wrote: You are assuming the existence of a fixed seed text from which words are copied and modified. That is not the case. .... In the self-citation model, the Voynich text functions simultaneously as both the source and the outcome of the copying process.
Quote: The algorithm requires only a minimal seed (e.g., a single line of text) to initialize. ... In our implementation, we used line f103v.P.9 of the VMS as seed: pchal shal shorchdy okeor okain shedy pchedy qotchedy qotar ol lkar
Quote: ... to generate a corpus of more than 10,000 words. The resulting text contained 7,678 Voynich words (70%) and 3,156 non-Voynich words (30%).
dexdex > 13-09-2025, 01:38 PM
(13-09-2025, 11:27 AM)Jorge_Stolfi Wrote: But surely the percentage of Voynich words was higher than 70% at the beginning (when it was mostly copies of fragments of the seed line) and less than 70% near the end (where most words were the result of multiple mutation steps). And the percentage must have been decreasing; unless the mutation procedure was complicated and finely tuned as per above.
This argument seems fallacious: the space of results grows like a tree, so divergence from any specific outcome is expected. What should be compared is the size of the result space, which can be estimated by running the algorithm with different seeds and comparing the pairwise differences between the outputs. That way, you get an estimate of how much the results vary across the parameter space.
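The "run with different seeds and compare pairwise" suggestion could be sketched as follows. Using the Jaccard distance between the sets of word types is an assumption on my part (any other text distance could be passed in), and the toy generator in the `__main__` block is only a stand-in for the real algorithm.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the sets of word types of two texts."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def spread_of_outputs(generate, seeds, distance=jaccard_distance):
    """Run `generate(seed)` for each seed and summarize all pairwise distances.

    Returns (mean, population standard deviation) of the distance over every
    pair of runs: a rough estimate of how widely the output space spreads.
    """
    outputs = [generate(s) for s in seeds]
    if len(outputs) < 2:
        raise ValueError("need at least two runs to compare")
    dists = [distance(x, y) for x, y in combinations(outputs, 2)]
    return mean(dists), pstdev(dists)


if __name__ == "__main__":
    vocab = ["daiin", "chedy", "qokain", "shedy", "ol", "ar", "chol", "okal"]

    def toy_generate(s):
        # Hypothetical stand-in: 200 words drawn at random from a tiny vocabulary.
        return random.Random(s).choices(vocab, k=200)

    print(spread_of_outputs(toy_generate, range(10)))
```

A small mean distance across seeds would mean the algorithm lands in a narrow region of text space regardless of where it starts; a large one would support the "tree of results" view.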
Jorge_Stolfi > 13-09-2025, 03:36 PM
(13-09-2025, 01:38 PM)dexdex Wrote: voynich-length output that looks similar enough
The question is what "similar" means. To someone who has never seen English text, this sentence may look "similar enough" to English:
Jorge_Stolfi > 13-09-2025, 03:57 PM
(13-09-2025, 01:38 PM)dexdex Wrote: I'm not sure why you keep asking how 'tuned' it is
dexdex > 13-09-2025, 03:59 PM
(13-09-2025, 03:36 PM)Jorge_Stolfi Wrote: The previous message pointed out that, in the output of their test run, "70% were Voynich words"; implying that the similarity criterion was just that, namely the percentage of words (word instances or word forms, not clear) that were in the VMS lexicon. If that was the criterion, then the "divergence" (a drop in the similarity as the algorithm progresses) is a problem, because it means that a significant part of the similarity was due to the fact that the seed text had been taken from the VMS.
That was not the only criterion: various Zipfian laws as well as frequency distributions were also compared in the article.
All the best, --jorge
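Jorge's two open questions (instances vs. forms, and whether the match rate drops over the run) can both be checked mechanically. A minimal sketch, assuming `lexicon` is the set of word forms from some VMS transcription; the window size is an arbitrary choice.

```python
def lexicon_coverage(words, lexicon):
    """Return (token coverage, type coverage) of `words` against `lexicon`.

    Token coverage counts every word instance; type coverage counts each
    distinct word form once. The two can differ a lot, so it matters which
    one a "70% Voynich words" figure refers to.
    """
    lex = set(lexicon)
    if not words:
        return 0.0, 0.0
    tokens = sum(w in lex for w in words) / len(words)
    types = set(words)
    return tokens, len(types & lex) / len(types)


def coverage_by_window(words, lexicon, window=1000):
    """Token coverage in consecutive, non-overlapping windows of the output.

    A steady decline from the first window to the last would be the kind of
    drift away from the seed vocabulary that Jorge describes.
    """
    return [lexicon_coverage(words[i:i + window], lexicon)[0]
            for i in range(0, len(words), window)]
```

For example, `lexicon_coverage(["a", "a", "a", "b"], {"a"})` gives 75% by instances but only 50% by forms.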
Torsten > 13-09-2025, 05:52 PM
(13-09-2025, 11:27 AM)Jorge_Stolfi Wrote: Here are some tests of your algorithm with a 14-word seed text in English (a bit longer than the one you used above). The mutation algorithm randomly deletes a letter, with increasing prob if the word is long; or inserts a letter chosen with the approximate English letter frequency, with increased prob if the word is short; or replaces a random letter by a loosely similar letter (vowel by vowel, stop by stop, sibilant by sibilant).
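The English mutation step Jorge describes might look roughly like this in Python. The operation weights, the rounded letter-frequency table, and the exact letter classes are assumptions made for illustration; they are not taken from Jorge's code.

```python
import random

# Approximate English letter frequencies (percent, rounded).
FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7, "s": 6.3,
    "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8, "u": 2.8, "m": 2.4,
    "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0, "p": 1.9, "b": 1.5, "v": 1.0,
    "k": 0.8, "j": 0.15, "x": 0.15, "q": 0.1, "z": 0.07,
}
LETTERS, WEIGHTS = list(FREQ), list(FREQ.values())

# Assumed "loosely similar" classes: vowels, stops, sibilants.
CLASSES = ["aeiou", "bdgkpt", "sz"]


def mutate_english(word, rng):
    """Apply one random deletion, insertion, or similar-letter replacement."""
    n = len(word)
    ops = ["delete", "insert", "replace"]
    # Hypothetical weights: deletion grows with length (never below 2 letters),
    # insertion grows as the word gets shorter, replacement is constant.
    weights = [max(n - 2, 0), max(8 - n, 1), 3]
    op = rng.choices(ops, weights=weights)[0]
    if op == "delete":
        i = rng.randrange(n)
        return word[:i] + word[i + 1:]
    if op == "insert":
        i = rng.randrange(n + 1)
        return word[:i] + rng.choices(LETTERS, weights=WEIGHTS)[0] + word[i:]
    # Replace: only letters that belong to some class can be swapped.
    positions = [i for i, c in enumerate(word) if any(c in cl for cl in CLASSES)]
    if not positions:
        return word
    i = rng.choice(positions)
    cls = next(cl for cl in CLASSES if word[i] in cl)
    new = rng.choice([c for c in cls if c != word[i]])
    return word[:i] + new + word[i + 1:]
```

Even this small function already encodes several English-specific choices (the frequency table, the phonetic classes, the length thresholds), which is the sense in which it is "somewhat tuned".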
(13-09-2025, 11:27 AM)Jorge_Stolfi Wrote: (This algorithm is not trivial and somewhat "tuned" to English, but I suppose that this is still considerably simpler and less "tuned" than the mutation procedure you used for Voynichese, correct?)
Our algorithm is designed to approximate how a human scribe might have carried out the self-citation method. This is necessarily more complex than a purely mechanical procedure, since it must approximate the human ability to recognize, compare, and adapt patterns. While a human scribe can intuitively judge whether two glyphs or words appear similar, a computer program requires explicit rules to determine which glyphs are considered visually similar.
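One common way to turn "these glyphs look alike" into an explicit rule is a weighted edit distance in which substituting a visually similar glyph costs less than an arbitrary substitution. The sketch below is only an illustration of that technique: the list of similar EVA pairs and the 0.5 cost are hypothetical, not Torsten's rules.

```python
# Hypothetical pairs of EVA glyphs treated as visually similar (illustration only).
SIMILAR_PAIRS = {frozenset(p) for p in [("a", "o"), ("k", "t"), ("p", "f"),
                                        ("r", "s"), ("e", "i")]}


def sub_cost(a, b):
    """Cost of substituting glyph a by glyph b."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in SIMILAR_PAIRS else 1.0


def glyph_distance(u, v):
    """Levenshtein distance in which a 'similar' substitution costs 0.5."""
    prev = [float(j) for j in range(len(v) + 1)]
    for i, a in enumerate(u, 1):
        cur = [float(i)]
        for j, b in enumerate(v, 1):
            cur.append(min(prev[j] + 1.0,                  # delete a
                           cur[j - 1] + 1.0,               # insert b
                           prev[j - 1] + sub_cost(a, b)))  # substitute a -> b
        prev = cur
    return prev[-1]


def most_similar(word, candidates):
    """Candidate with the smallest glyph distance (first one wins on ties)."""
    return min(candidates, key=lambda c: glyph_distance(word, c))
```

With these assumed pairs, `qokedy` is judged closer to `qotedy` (distance 0.5) than to `qodedy` (distance 1.0).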
Torsten > 13-09-2025, 07:04 PM
(13-09-2025, 03:36 PM)Jorge_Stolfi Wrote: The previous message pointed out that, in the output of their test run, "70% were Voynich words"; implying that the similarity criterion was just that, namely the percentage of words (word instances or word forms, not clear) that were in the VMS lexicon.
(13-09-2025, 03:36 PM)Jorge_Stolfi Wrote: If that was the criterion, then the "divergence" (a drop in the similarity as the algorithm progresses) is a problem, because it means that a significant part of the similarity was due to the fact that the seed text had been taken from the VMS.
Jorge_Stolfi > 15-09-2025, 03:50 AM
(13-09-2025, 03:59 PM)dexdex Wrote: various Zipfian laws as well as frequency distributions were also compared in the article.
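As an example of the kind of distributional check mentioned here, a Zipf rank-frequency slope can be estimated with an ordinary least-squares fit of log frequency against log rank. This is a generic sketch, not the procedure used in the article; `max_rank` is an optional, assumed cutoff to ignore the noisy tail.

```python
import math
from collections import Counter


def zipf_slope(words, max_rank=None):
    """Least-squares slope of log(frequency) against log(rank).

    Text that follows Zipf's law closely gives a slope near -1.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    if max_rank is not None:
        freqs = freqs[:max_rank]
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

A single slope is a coarse summary; two texts can share it while having quite different vocabularies, so it complements rather than replaces a lexicon comparison.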
Eiríkur > Yesterday, 03:56 PM
(25-08-2025, 07:55 PM)Mauro Wrote: One thing which could be interesting to do is to compare the copy&modify algorithm with a version of it without the copy&modify part, but with the same generation rules. The idea is to generate one word at a time, always starting from a null string and applying the rules a random number of times (a few times) for each generated word (one also needs a mechanism for creating 'separable' words: at each step, generate two words and decide with a certain probability whether to add the 'join two words' rule, else keep only the first word). This could help in separating the effect of the copy&modify mechanism from the effect of the rules.
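Mauro's no-copy baseline can be sketched like this: every word starts from the empty string, gets a few rule applications, and with probability `p_join` two fresh words are concatenated. The two toy rules and the chunk inventory are hypothetical placeholders; in a real comparison they would be replaced by the same rules the copy&modify algorithm uses.

```python
import random

# Hypothetical toy rules; each maps (word, rng) -> new word.
CHUNKS = ["qo", "ch", "sh", "k", "t", "e", "o", "a", "dy", "in", "l", "r"]


def insert_chunk(word, rng):
    """Insert a random chunk at a random position."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(CHUNKS) + word[i:]


def drop_char(word, rng):
    """Delete one character, but never shrink a word below one character."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


RULES = [insert_chunk, insert_chunk, drop_char]  # insertion twice as likely


def word_from_null(rules, rng, min_steps=1, max_steps=4):
    """Start from "" and apply a random number of randomly chosen rules.

    `rules` must contain at least one rule that can lengthen the empty string.
    """
    word = ""
    while not word:  # retry in the rare case that no rule produced a glyph
        for _ in range(rng.randint(min_steps, max_steps)):
            word = rng.choice(rules)(word, rng)
    return word


def baseline_text(n_words, rules=RULES, p_join=0.1, rng_seed=0):
    """Generate words independently of each other (no copying at all)."""
    rng = random.Random(rng_seed)
    text = []
    for _ in range(n_words):
        first = word_from_null(rules, rng)
        second = word_from_null(rules, rng)
        # 'Join two words' rule with probability p_join, else keep the first.
        text.append(first + second if rng.random() < p_join else first)
    return text
```

Scoring this output with the same statistics as the copy&modify output, under identical rules, is what would isolate the contribution of the copying itself.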
Jorge_Stolfi > Yesterday, 06:11 PM
(Yesterday, 03:56 PM)Eiríkur Wrote: One thing which could be interesting to do is to compare the copy&modify algorithm with a version of it without the copy&modify part, but with the same generation rules. The idea is to generate one word at time, always starting from a null string and applying the rules a random number of times (a few times) for each generated word.