(23-08-2025, 08:23 PM)Torsten Wrote: There was no need to invent an artificial “gibberish generation” mechanism. As D’Imperio already observed [...]
She was not stating a fact. She was proposing her version of the "hoax" theory, which, in general terms, is apparently the same as yours and has the same problems.
Torsten Wrote:It is simply more efficient for a scribe to copy and modify existing words than to continually invent entirely new ones.
This is correct. Continually inventing new words is hard. Copying previous text is much easier.
Torsten Wrote:Consequently, a scribe attempting to generate language-like gibberish would, sooner or later, abandon the laborious task of perpetual invention in favor of the far easier strategy of reduplicating and adapting previously written material — and would ultimately adhere to this approach consistently.
Note my emphasis. The problem is that the "adapting" is far from a simple step. Voynichese words have a very restricted structure, so the "adapting" must be random, yet in a way that preserves that structure. At that point the gibberish generation method is not much easier than generating each word from scratch (as Rugg had proposed), and is not at all "natural".
Much easier and more natural would have been to take any text in Latin or some other readable language, even if an "alchemical herbal", and encode it with a cipher that was easy to apply on the fly but hard or even impossible to decipher. The resulting text would look like language and have language-like properties, much more so than the VMS.
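To make that concrete, here is a toy example of my own (far weaker than the "hard or impossible to decipher" cipher I have in mind, and only to show how little effort is needed): even a fixed letter substitution, trivially applied on the fly, produces ciphertext that inherits every word-level statistic of the plaintext.
Code:
PLAIN  = "abcdefghijklmnopqrstuvwxyz"
CIPHER = "qwertyuiopasdfghjklzxcvbnm"   # an arbitrary permutation

TABLE = str.maketrans(PLAIN, CIPHER)

def encipher(text):
    # Letter-for-letter substitution: word lengths, word frequencies,
    # and word-pair statistics of the plaintext are all preserved.
    return text.lower().translate(TABLE)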
In fact (if I read you correctly), your justification for your proposed method is that it creates the repetitiousness that you claim to see in the VMS, which you take as a clue that the text is gibberish. Wouldn't the Author have worried about leaving such a clue?
Torsten Wrote:I begin by identifying fundamental, corpus-wide patterns — for example, the clustering of similar words across pages. These clusters suggest a mechanism of repetition and gradual variation. The central argument, therefore, is that an iterative process of copying and modification is sufficient to account for the statistical features. [...]
Paraphrasing your argument: "The VMS text has statistical properties X, Y, and Z, where Z is 'repetitiousness'. Here is an algorithm that generates gibberish with properties X, Y, and Z. Therefore the VMS must be gibberish."
Do you see the logical fallacy there?
Please confirm if this Python code is a sufficiently close approximation of your method:
Code:
from random import random, randint

def TnT(SeedText, Mutate, Prob_Restart, Prob_Mutate):
    # Generates a pseudo-VMS text as a list of word tokens.
    # SeedText(n) returns an initial list of n words;
    # Mutate(word) returns a randomly modified copy of a word.
    T = SeedText(1000)          # Create a 1000-word seed text.
    k = None                    # Source text index.
    for i in range(35000):
        if i == 0 or random() < Prob_Restart:
            k = randint(0, len(T) - 1)   # Restart at a random position.
        word = T[k]; k += 1              # Copy the next source word.
        if random() < Prob_Mutate:
            word = Mutate(word)          # Occasionally modify the copy.
        T.append(word)   # Generated words become sources for later copying.
    return T
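If I have understood the method correctly, it could be exercised with placeholder arguments like these (the seed vocabulary and the mutation rule below are mine, purely illustrative):
Code:
from random import choice, randint

def SeedText(n):
    # Placeholder seed: n words drawn from a tiny Voynichese-like list.
    base = ["daiin", "Chedy", "qokaiin", "Shedy", "ol", "Chol"]
    return [choice(base) for _ in range(n)]

def Mutate(word):
    # Placeholder mutation: replace one letter by another letter
    # already present in the same word.
    i = randint(0, len(word) - 1)
    return word[:i] + choice(word) + word[i+1:]

text = TnT(SeedText, Mutate, Prob_Restart=0.05, Prob_Mutate=0.3)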
Torsten Wrote:Therefore there is no need to determine or fine-tune external parameters
(23-08-2025, 09:03 PM)dexdex Wrote: the author(s) didn't need to 'figure out' the parameters. Had they accidentally landed somewhere else in parameter space, we would be very similarly stumped. All that's required for the hypothetical process to be plausible is that the fraction of the parameter space that explains the Voynich is significant enough that they could have landed there. Which it sounds like this simple process is.
There was no need for the Author to tune the parameters to produce the Voynichese "language" specifically, but the parameters could not have been random. The seed text and the algorithm of the Mutate function had to be compatible, and both had to generate the non-trivial word structure that we see in the VMS lexicon.
The Mutate function could not have been just "choose randomly between deleting a random letter, inserting a random letter in a random place, or replacing a randomly chosen letter with some other random letter". After a short while those mutations would produce a lexicon that is just random strings of letters, with no discernible structure.
For the same reason, the seed text could not have been just a list of random strings of letters. Its words must have had the non-trivial structure we observe in Voynichese, which must have been preserved by Mutate.
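For concreteness, here is a minimal sketch (mine, not yours) of the structure-blind Mutate just described; iterating it quickly erodes whatever word structure the seed text had:
Code:
from random import choice, randint

ALPHABET = "abcdefghiklmnopqrstuy"   # placeholder glyph inventory

def NaiveMutate(word):
    # Structure-blind mutation: delete, insert, or substitute
    # one letter at a random position, chosen uniformly.
    op = choice(["delete", "insert", "substitute"])
    i = randint(0, len(word) - 1)
    if op == "delete" and len(word) > 1:
        return word[:i] + word[i+1:]
    if op == "insert":
        return word[:i] + choice(ALPHABET) + word[i:]
    return word[:i] + choice(ALPHABET) + word[i+1:]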
Moreover, the Author would have had to choose the seed text and the Mutate function so as to produce the pronounced idiosyncrasies we see in the VMS word frequency distribution. Consider the following word counts from my version of the VMS transcription file:
105.250 Chdy
35.125 Shdy
301.250 Chedy
236.500 Shedy
34.000 Cheedy
50.500 Sheedy
26.000 okChdy
1.000 okShdy
18.500 okChedy
3.000 okShedy
0.000 okCheedy
0.000 okSheedy
38.000 qokChdy
4.000 qokShdy
33.000 qokChedy
5.000 qokShedy
2.000 qokCheedy
1.000 qokSheedy
(The fractional numbers result from regarding the ',' separator as a word break with 50% probability).
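In case the bookkeeping is not obvious, here is one way such expected counts can be computed (a sketch, treating each ',' as an independent 50% break):
Code:
from collections import Counter
from itertools import product

def expected_counts(tokens):
    # tokens: transcription words that may contain ',' marking an
    # uncertain word break. Each ',' is taken to be a real break
    # with probability 1/2, independently; every possible reading
    # contributes its probability as a fractional count.
    counts = Counter()
    for tok in tokens:
        parts = tok.split(",")
        m = len(parts) - 1
        for mask in product([0, 1], repeat=m):
            weight = 0.5 ** m
            word = parts[0]
            for bit, part in zip(mask, parts[1:]):
                if bit:                 # break here: emit the word so far
                    counts[word] += weight
                    word = part
                else:                   # no break: glue the pieces together
                    word += part
            counts[word] += weight
    return counts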
How could a "parameterless" Mutate function produce these asymmetric word frequencies?
By the way, such asymmetries are normal in natural languages. Here are some counts from Wells's War of the Worlds:
91 brother
0 brothers
13 brother's
63 another
0 anothers
1 another's
58 other
11 others
0 other's
3 mother
1 mothers
0 mother's
1 bother
0 bothers
0 bother's
It so happens that the novel's main character had just one brother, and their mother must have passed away before the invasion...
But here is a way you could further confirm your theory. If the seed text has N tokens, it has only N-1 consecutive token pairs. Therefore, if the seed text is only a thousand words or less, the distribution of word types that could follow a given word type would initially be very limited, often singular (only one choice, zero entropy). But each time the source index k is reset, or a word is mutated, new consecutive word pairs are created, so that distribution becomes broader as more and more words are generated.
For instance, suppose that Chody occurs only once in the seed text, preceded by dol and followed by daiin. For a while, the algorithm will generate a Chody only after a dol, and will then always generate a daiin next. But if the pointer k happens to be reset right after a Chody is output, to point say at the word ChChy, the generated text acquires a new pair starting with Chody, namely Chody ChChy. Then future Chodys may be followed by either daiin or ChChy.
A similar situation occurs if, right after a Chody is output, the Mutate function is called on the copy of T[k] and turns that daiin into, say, kaiin.
Thus, as the algorithm progresses, the next-word distribution becomes "blurred" by the inclusion of random previously generated words, and mutations thereof.
In fact, if the algorithm is run for long enough, the next-word distribution will tend to be the same for every word type. The algorithm would then reduce to a Markov chain of order zero, with a word frequency distribution that is invariant under the Mutate function. The next-word entropy should grow from near zero at the beginning to the entropy of that limiting distribution.
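One way to measure this (a sketch of mine, not part of your method): compute the conditional next-word entropy over successive windows of the generated text, and check whether it climbs from near zero toward a plateau:
Code:
from collections import Counter
from math import log2

def next_word_entropy(words):
    # H(next | current), in bits, estimated from all consecutive
    # token pairs in the given list of words.
    pairs = Counter(zip(words, words[1:]))
    firsts = Counter(words[:-1])
    n_pairs = len(words) - 1
    H = 0.0
    for (w1, _), n in pairs.items():
        H -= (n / n_pairs) * log2(n / firsts[w1])
    return H

def entropy_profile(words, window=2000, step=1000):
    # Entropy of successive windows along the text; under the
    # copy-and-mutate hypothesis this should rise from near zero
    # early on toward the entropy of the limiting distribution.
    return [next_word_entropy(words[i:i + window])
            for i in range(0, len(words) - window + 1, step)]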
Did you observe such an effect in the output of your algorithm? And/or in the VMS?
All the best, --jorge