The Voynich Ninja

Full Version: List of "weird" vords
Pages: 1 2 3 4 5
(22-06-2025, 08:03 PM)Mark Knowles Wrote: I am not suggesting that the filler text is random gibberish. It makes sense that there is structure to it, if you like "structured" gibberish.

That's exactly the problem: if it were structured, it would be easily detectable. The only way for it to blend in with the text would be to be about the same kind of random as the text, which is very hard to do with a simple algorithm (short of actually using two texts, one for the message and one for the filler, but then it's basically all cipher).

Especially given that there was no way in the 1400s to test the statistical properties of any gibberish generation procedure to ensure it would blend in well enough to fool computers six centuries later.

I cannot rule out the possibility of the text being mostly filler, but personally I find it highly improbable.

Also, I'm not sure it's correct to call Voynichese repetitive. It appears repetitive because it's in fact quite random. We are accustomed to a lack of repetition in natural languages, because repeating words many times is an unreliable way of conveying information, so natural languages are anti-repetitive. If we scramble a large English text randomly, it would probably contain a lot of "the the the" and "will will will", etc., and maybe on a scale larger than in Voynichese (it's possible to compute this, I think).
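The computation mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not something posted in the thread: the function names are made up here, the text is assumed to be already split into word tokens, and only adjacent pairs are counted (a run like "the the the" counts as two pairs). For a uniformly random shuffle of N tokens, the expected number of adjacent identical pairs is the sum over word types of n_w(n_w - 1) / N, where n_w is the count of word w; the simulation is just a cross-check of that formula.

```python
import random
from collections import Counter


def observed_adjacent_repeats(tokens):
    """Count positions where a token is immediately followed by itself."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)


def expected_adjacent_repeats_shuffled(tokens):
    """Exact expectation of adjacent repeats under a uniform random shuffle.

    Each of the N-1 adjacent slots holds the same word w twice with
    probability n_w(n_w-1) / (N(N-1)); summing over slots and words
    gives sum_w n_w(n_w-1) / N.
    """
    n = len(tokens)
    if n < 2:
        return 0.0
    counts = Counter(tokens)
    return sum(c * (c - 1) for c in counts.values()) / n


def simulated_adjacent_repeats(tokens, trials=200, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    work = list(tokens)
    total = 0
    for _ in range(trials):
        rng.shuffle(work)
        total += observed_adjacent_repeats(work)
    return total / trials
```

Comparing `observed_adjacent_repeats` on the real word sequence with `expected_adjacent_repeats_shuffled` on the same tokens would show whether a text repeats more or less than its own shuffled version.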

And the very randomness of Voynichese makes me believe that it's intentional and works according to a precise scheme, all of it. These "chol chol chol" sequences do not look to me like a result of someone making up words, but like a result of a scheme that produces these "chol chol chol" sequences, maybe to a certain dismay of the encoder.
(22-06-2025, 08:35 PM)oshfdk Wrote:
(22-06-2025, 08:03 PM)Mark Knowles Wrote: I am not suggesting that the filler text is random gibberish. It makes sense that there is structure to it, if you like "structured" gibberish.

That's exactly the problem: if it were structured, it would be easily detectable.

I am not at all sure you can say that.

(22-06-2025, 08:35 PM)oshfdk Wrote: The only way for it to blend in with the text would be to be about the same kind of random as the text, which is very hard to do with a simple algorithm.

Again, this seems more like an assumption than a deduction.

(22-06-2025, 08:35 PM)oshfdk Wrote: I cannot rule out the possibility of the text being mostly filler, but personally I find it highly improbable.

Also, I'm not sure it's correct to call Voynichese repetitive. It appears repetitive because it's in fact quite random. We are accustomed to a lack of repetition in natural languages, because repeating words many times is an unreliable way of conveying information, so natural languages are anti-repetitive. If we scramble a large English text randomly, it would probably contain a lot of "the the the" and "will will will", etc., and maybe on a scale larger than in Voynichese (it's possible to compute this, I think).

And the very randomness of Voynichese makes me believe that it's intentional and works according to a precise scheme, all of it. These "chol chol chol" sequences do not look to me like a result of someone making up words, but like a result of a scheme that produces these "chol chol chol" sequences, maybe to a certain dismay of the encoder.

That is certainly not how they appear to me.
(22-06-2025, 09:21 PM)Mark Knowles Wrote:
(22-06-2025, 08:35 PM)oshfdk Wrote: The only way for it to blend in with the text would be to be about the same kind of random as the text, which is very hard to do with a simple algorithm.

Again, this seems more like an assumption than a deduction.

Yes, as I said before, I cannot prove that the scheme you propose won't work. But I can think of no practical way to implement it using 15th-century math in a way that would produce the statistical properties of the Voynich MS. If you have in mind any specific method of encoding with fillers, it's possible to test it and see whether it looks similar to Voynichese.
It's possible to designate specific segments of text using Stolfi's "start here" markers from the various circular diagrams, starting with VMs White Aries. It's built in.
(20-06-2025, 02:29 PM)Mark Knowles Wrote: Has someone mapped out the distribution of abnormal words throughout the manuscript?

Here are some literal maps of the distribution of hapax legomena.

I used the Cuva representation of paragraph text in RF1a-n.txt, as generated by bitrans.  Words with uncertain characters were removed, as well as paragraphs shorter than five lines.  Of 33 954 remaining word tokens, 5 015 were unique.
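The filtering and counting described above could look roughly like this. It is a sketch under assumptions, not the script actually used: it assumes the transliteration has already been parsed into paragraphs (each a list of lines, each line a list of words), and that uncertain readings are marked with a `?` character, which is a hypothetical marker chosen for illustration. The function names are likewise made up.

```python
from collections import Counter

UNCERTAIN = "?"  # assumption: hypothetical marker for an uncertain character


def clean_paragraphs(paragraphs, min_lines=5):
    """Drop short paragraphs and words containing uncertain characters."""
    kept = []
    for para in paragraphs:
        if len(para) < min_lines:
            continue  # paragraphs shorter than five lines are removed
        kept.append([[w for w in line if UNCERTAIN not in w] for line in para])
    return kept


def all_words(paragraphs):
    """Flatten paragraphs -> lines -> words into one token list."""
    return [w for para in paragraphs for line in para for w in line]


def hapax_set(paragraphs):
    """Words that occur exactly once in the cleaned text."""
    counts = Counter(all_words(paragraphs))
    return {w for w, c in counts.items() if c == 1}
```

With real data, `len(all_words(...))` and `len(hapax_set(...))` would be the two figures to compare against the token and unique-word counts quoted above.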

In the following plots, the coordinates are labeled from the upper left corner by line number (rows) and ordinal position of the first character of a word (columns).  Word counts are binned into 25-line x 1-character cells.
[Plot: hapax count per cell (left) and fractional density (right), 25-line x 1-character cells]
The left panel shows the absolute number of hapax per cell;  the right shows their fractional density relative to all words in the cell.  The bunching and antibunching of density in the first few columns is a consequence of indexing the words by their first character.  The only large-scale pattern seems to be a slightly lower density in the Bio section.
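A rough sketch of this kind of binning, for anyone who wants to try it. The details are assumptions rather than the method actually used: columns are counted from 1, one transliteration character is taken to be one glyph, a single separator character is assumed between words, and the function names are invented for the example.

```python
from collections import Counter


def word_columns(line_words, sep_width=1):
    """Pair each word with the 1-based position of its first character."""
    col = 1
    out = []
    for w in line_words:
        out.append((col, w))
        col += len(w) + sep_width  # assumption: one separator per word gap
    return out


def hapax_density(lines, hapax, line_bin=25, col_bin=1):
    """Count hapax and all words per cell, and their ratio.

    `lines` is a list of lines (each a list of words), numbered from 1.
    Returns (hapax counts, total counts, fractional density) keyed by
    (row cell, column cell).
    """
    total = Counter()
    hap = Counter()
    for line_no, words in enumerate(lines, start=1):
        for col, w in word_columns(words):
            cell = ((line_no - 1) // line_bin, (col - 1) // col_bin)
            total[cell] += 1
            if w in hapax:
                hap[cell] += 1
    frac = {cell: hap[cell] / n for cell, n in total.items()}
    return hap, total, frac
```

Because every word is indexed by its first character only, the first word of each line always lands in column 1 and the second can never start before column 3, which is one way to see where bunching in the first few columns could come from.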

For a collective pagewise distribution, all pages were stacked together to obtain the total counts in each 1-line x 1-character cell.  The fractional plot usefully corrects for the lower number of words generally at large coordinate values:
[Plot: hapax density with all pages stacked, 1-line x 1-character cells]
(Blue values are greater than the scale shown.)  A feature appearing here is greater density in the first line, and perhaps toward the ends of lines.  The same can be seen when all paragraphs are stacked and sampled in 1-line x 5-character cells.  On this grid, greater density at the beginning of lines is also visible:
[Plot: hapax density with all paragraphs stacked, 1-line x 5-character cells]
When the paragraph coordinates are rescaled as rightwardness/downwardness, it becomes clear that the line ends also have a greater density of unique words:
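One possible way to do such a rescaling is sketched below. The exact definitions are not given in the post, so these are guesses: rightwardness is taken as the word's starting offset divided by the line length in characters, and downwardness as the line index divided by the number of lines minus one. The function name and the single-character word separator are also assumptions.

```python
def rescaled_positions(paragraph):
    """Return (word, rightwardness, downwardness) for each word.

    `paragraph` is a list of lines, each a list of words. Both
    coordinates fall in [0, 1]; rightwardness is measured at the start
    of a word, so it never quite reaches 1.
    """
    n_lines = len(paragraph)
    out = []
    for i, words in enumerate(paragraph):
        # line length in characters, assuming one separator between words
        line_len = sum(len(w) for w in words) + max(len(words) - 1, 0)
        down = i / (n_lines - 1) if n_lines > 1 else 0.0
        col = 0
        for w in words:
            right = col / line_len if line_len else 0.0
            out.append((w, right, down))
            col += len(w) + 1
    return out
```

Binning these two fractions instead of raw line and character numbers puts the ends of short and long lines in the same cells, which is what would make a line-end effect easier to see.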
[Plot: hapax density with paragraph coordinates rescaled as rightwardness/downwardness]
The few deviations from uniformity in these plots may simply correspond to known quirks of line and paragraph extremities (but quirks of the calculation are always possible).
(24-06-2025, 12:18 AM)obelus Wrote: clear that the line ends also have a greater density of unique words

This is consistent with the observation that when the writing was approaching an illustration or coming to the page edge, and the writer could see that he was running out of space, he would sometimes squeeze the last words together and eliminate the gap between them to make the line fit into the available space. Is cholalaiin really one word? Or ototaykal? Or else he would terminate a word early and leave it with an unusual ending. Either would result in a unique word being formed.