The Voynich Ninja

Full Version: List of "weird" vords
(21-06-2025, 11:23 AM)Rafal Wrote: Yes, precise defining of "weird" words may be difficult.
But personally I am not very fond of looking for perfect definitions because it may put you into total block.

I agree with you! But my point was that, without giving a 'definition' of what is normal and what is weird, it's impossible to create a list of weird words (the 'definition' can be more or less fuzzy, but of course the less fuzzy it is, the better: the list should be, at least mostly, reproducible).

I made a test with another possible definition: a word is 'weird' if it contains one of the least frequent bigrams. There are 42 bigrams which appear just once in the RF1a-n transcription (full list below): examples of words containing them are 'oqd', 'aqol', 'tameiin', 'docheesm', 'tolkaiinr' and 'qrgs'. The last one is doubly weird because it contains two of the rarest bigrams ('qr' and 'gs'; 'rg' is rare too, but appears 7 times). So, by this criterion, there are at most 41 weird words (I did not check them all).


List of the bigrams appearing only once in RF1a-n:

gs
qr
ix
qd
aq
jk
cd
oh
sm
qf
ms
rf
SPACE-z
me
eu
gy
hq
hm
SPACE-u
ml
yg
mm
qp
dx
nr
mr
yi
rn
ce
pk
tk
ax
tu
nk
vo
ze
kr
vr
rt
vs
pr
xd
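
For anyone who wants to try the bigram criterion on their own copy of a transcription, here is a minimal sketch in Python. It is not the script used for the list above. It assumes the transcription has already been split into a plain list of words, and it assumes that the 'SPACE-z' style entries mean a word-initial character, so each word is prefixed with a hypothetical START marker. The names `word_bigrams`, `bigram_counts` and `weird_words` are made up for this illustration.

```python
from collections import Counter

START = "SPACE"  # assumed word-boundary marker, mirroring entries like 'SPACE-z'

def word_bigrams(word):
    """Return the bigrams of one word, including the word-initial one."""
    symbols = [START] + list(word)
    return list(zip(symbols, symbols[1:]))

def bigram_counts(words):
    """Count every bigram over all word tokens."""
    counts = Counter()
    for word in words:
        counts.update(word_bigrams(word))
    return counts

def weird_words(words, max_count=1):
    """Map each word to the rare bigrams (count <= max_count) it contains."""
    counts = bigram_counts(words)
    result = {}
    for word in set(words):
        rare = {bg for bg in word_bigrams(word) if counts[bg] <= max_count}
        if rare:
            result[word] = rare
    return result

# Toy input only, not real transcription data.
toy = ["ab", "ab", "abq"]
print(weird_words(toy))
```

A word that contains two rare bigrams, as 'qrgs' does, shows up with a set of size two, which is why the number of weird words can be smaller than the number of rare bigrams.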
You might find it interesting to see a list of all words of at least 7 characters from the GC transliteration for the Stars B3 pages.
(21-06-2025, 12:45 PM)Mauro Wrote: So, by this criterion, there are at most 41 weird words (I did not check them all).

The best weird words are those that don't exist. :)

"qrgs" Dodgy

or "yr?" or "ys?"

[attachment=10861]

It's certainly weird.
As I have stated elsewhere, I am inclined to the view that the "weird" or abnormal words most likely constitute the true text of the manuscript, whilst the ordinary or normal words are most likely filler or null text with no real meaning. The real text of the Voynich manuscript is then what is left once the repetitive filler words are removed. Once the real text is isolated, it can be studied to see what kind of cipher it uses, whether simple substitution or something more complex. The difficulty is in clearly defining what is filler and what is real text. As I have said before, I think the majority of the text is most likely filler. In fact, the explanation for how the filler text was generated may be along the lines suggested by Torsten Timm and others. The only difference is that whilst they argue that all the text is "hoax" text, I would argue that only some of it is, whilst other parts of the text are quite genuine. I would expect the proportion of real text to be greater amongst the labels than in sentence text. I think it most likely that each page contains a small amount of real text and a lot of filler text, so that the real text is distributed at a roughly uniform rate throughout the manuscript.

Assuming this hypothesis is true, I would be interested in ideas as to how the two types of text can be identified.
The problem is one of partitioning the text into two types, "filler" text and "real" text. The most repetitive text seems to fit within the "filler" category. The difficulty is in drawing the line between the two types. Real language can at times be a little repetitive, so one has to be careful not to be too strict in distinguishing between them. Obviously, if every fifth word were a real word and all the other words were filler, it would be easy to tell them apart, but it seems unlikely that such a simple distinction exists.
(21-06-2025, 08:23 PM)Mark Knowles Wrote: As I have stated elsewhere I am inclined to the view that the "weird" or abnormal words most likely constitute the true text of the manuscript.

This may be, but considering weird words introduces a practical problem: weird words are rare by definition ('weird' being synonymous with 'improbable'), so any list of weird words is sure to include scribal and transcription mistakes as well. For instance, none of the six (out of ~41) weird words of post #21 ('oqd', 'aqol', 'tameiin', 'docheesm', 'tolkaiinr' and 'qrgs'), from RF1a-n, can be found in the transcription of [link] (it's obviously not my aim here to discuss the merits/faults of different transcriptions, I'm just taking them 'as they are').
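
The cross-check between two transcriptions can be sketched roughly as follows. This is only an illustration: `split_by_presence` is a hypothetical helper, the reference word list is a made-up placeholder rather than any real transcription, and it assumes both transcriptions use the same alphabet so that words can be compared as plain strings.

```python
def split_by_presence(candidates, reference_words):
    """Split candidate words into those found in a reference word list and those missing."""
    reference = set(reference_words)
    found = [w for w in candidates if w in reference]
    missing = [w for w in candidates if w not in reference]
    return found, missing

# The six candidate words quoted in the post.
candidates = ["oqd", "aqol", "tameiin", "docheesm", "tolkaiinr", "qrgs"]

# Placeholder only: replace with the word list of a second transcription.
reference_words = ["daiin", "chedy", "qokeedy"]

found, missing = split_by_presence(candidates, reference_words)
print("confirmed in both:", found)
print("only in the first:", missing)
```

Words that land in the 'missing' list are the ones where a scribal or transcription mistake cannot be ruled out by this comparison alone.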

Considering only the 'normal' (=~ more frequent) words, instead, largely avoids this problem, but there is no obvious break in the frequency data to suggest where the cutoff should be set, resulting at a minimum in a high degree of arbitrariness (and of course, if @Mark Knowles's hypothesis above is the right one, studying the normal words would be a waste of time).
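
One rough way to look for a break in the frequency data is to rank the word counts and find the largest relative drop between neighbours. The sketch below is just one possible heuristic, not an established method for this text; `largest_frequency_drop` is a hypothetical name, and the input is assumed to be a plain list of word tokens.

```python
from collections import Counter

def largest_frequency_drop(words):
    """Return (rank, ratio, freqs): the number of top-ranked words above the
    largest relative drop, the size of that drop, and the ranked counts."""
    freqs = sorted(Counter(words).values(), reverse=True)
    best_rank, best_ratio = None, 1.0
    for i in range(len(freqs) - 1):
        ratio = freqs[i] / freqs[i + 1]
        if ratio > best_ratio:
            best_rank, best_ratio = i + 1, ratio
    return best_rank, best_ratio, freqs

# Toy input only, not real transcription data.
toy = ["a"] * 8 + ["b"] * 7 + ["c"] * 2 + ["d"]
rank, ratio, freqs = largest_frequency_drop(toy)
print(rank, ratio, freqs)
```

If the ranked counts fall off smoothly, the largest ratio stays close to 1 and no rank stands out, which is the situation described above: any cutoff is then a matter of choice.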
(21-06-2025, 10:07 PM)Mauro Wrote: Considering only the 'normal' (=~ more frequent) words, instead, largely avoids this problem, but there is no obvious break in the frequency data to suggest where the cutoff should be set, resulting at a minimum in a high degree of arbitrariness (and of course, if @Mark Knowles's hypothesis above is the right one, studying the normal words would be a waste of time).
Studying the normal words would only be useful insofar as it helps to better identify which words are normal "filler" words and which are real words. I would guess that around 20% of words are real words whilst the remaining 80% are just filler. This is just a guess; however, I would be surprised if fewer than 10% or more than 30% are real words.
It could be that the real text is encoded with a very simple substitution cipher, in which case frequency analysis etc. should make the cipher easy to break. It could also be that the real text is encoded with a complex cipher, in which case it will be very hard to break, especially given that it first has to be distinguished from the filler text.
These are all nice theories, but first you should look at it in practice.
When I look at books in Latin, German words suddenly appear. Mostly they are nouns. I don't know Latin, but these words immediately catch my eye. Not because I can read them, but because they simply don't fit into the picture.
Even if I don't understand either language, I can still see the difference. The look of the text changes, and that happens whenever the language changes.
Hence the rule: if I can't translate something, I write it as I know it.
Sometimes I hear foreigners speaking. They can't translate our expressions, so they use our terms. Everything is normal.
(22-06-2025, 05:34 AM)Mark Knowles Wrote: around 20% of words are real words

You are piling up the improbabilities and suggesting that the manuscript is some horrendous mixture of valid text, fabrication, encypherment. This is too complicated. Consider it from the perspective of the authors. In addition to the task of having to write a thing of ~225 pages and ~36000 words using strange letters, would they really have wanted to make things even more difficult by jumping between fabrication and encypherment? No, they would have wanted simplicity. Also, afterwards they themselves would have found the manuscript difficult to read.

But did I read correctly that you think that 80% of the text might be bogus filler words? Why would they have wanted to waste so much parchment on meaningless words? But also look at the HerbalA1 pages. 95 pages of 8086 words gives an average of 85 words to a page. And then only 20% of those are real words, making 17. That is hardly enough to say anything of value about the plants the text is supposed to chronicle. All this pother just to hide the meaning of those 17 words?

Somehow it just doesn’t seem very clever of the authors.
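
The per-page arithmetic above is easy to re-run for other guesses of the real-word share. A minimal sketch, using only the figures quoted in the thread (8086 words on 95 HerbalA1 pages, and shares of 10%, 20% and 30%); `real_words_per_page` is a hypothetical helper name.

```python
def real_words_per_page(total_words, pages, real_share):
    """Average number of 'real' words per page for a given share of real words."""
    return total_words / pages * real_share

# Figures quoted in the thread for the HerbalA1 pages.
TOTAL_WORDS = 8086
PAGES = 95

for share in (0.10, 0.20, 0.30):
    per_page = real_words_per_page(TOTAL_WORDS, PAGES, share)
    print(f"{share:.0%} real words: about {per_page:.1f} per page")
```

With these inputs the 20% guess gives about 17 real words per page, and the 10% to 30% range spans roughly 8.5 to 25.5.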
(22-06-2025, 11:27 AM)dashstofsk Wrote: You are piling up the improbabilities and suggesting that the manuscript is some horrendous mixture of valid text, fabrication, encypherment. [...]

I am not distinguishing between "valid text" and "encipherment". I am distinguishing between "null text"/"filler text" and "valid text". Without identifying the "valid text" it seems impossible to know whether it is enciphered or not. Simplicity is nice, but by its very nature encipherment cuts against simplicity.

It is not uncommon for enciphered text to contain filler text. They would have "wasted" so much parchment on meaningless words precisely because it would make it hard for someone to identify the real text. Null characters were very commonly used in ciphers of the time. You can say something of importance in 17 words.

It was very clever of the authors if it made it very difficult for us to read.

I have thought this the most likely explanation for Voynichese for a number of years.