Can LAAFU effects be modeled?
Posted by: pfeaster - 03-09-2025, 12:42 PM - Forum: Analysis of the text - Replies (36)
I always enjoy reading about people's efforts to model systems that can generate text mimicking the vord structure and frequency ratios of Voynichese, and I think we stand to learn a lot from them. Sure, there's no guarantee that a system that can produce output superficially like Voynichese resembles the system actually used to produce Voynichese. But much of the time we seem to be at a loss to come up with any plausible explanation for the weird patterns we find, and in those cases the models -- if successful -- can at least help show how those patterns could maybe have come about (which seems like an improvement on having no leads to follow at all).
On the other hand, we hardly ever see comparable efforts to model LAAFU ("Line As A Functional Unit") behavior. To summarize what's at issue for anyone who might need it: Voynichese running text displays clear patterning at the line level. The first vords of lines have distinctive statistical properties, as do the last vords of lines. But so, sometimes, do the second vords of lines (see [link] at Agnostic Voynich). And there is evidence ([link]) that many vord features have subtler "preferences" for earlier or later positions within the line.
My feeling is that most proposed explanations don't hold up particularly well under scrutiny.
1. Do line-start and line-end features correspond to parts of words split across line breaks? Likely not, since line-start and line-end words aren't shorter on average than mid-line words (I don't recall offhand who studied this, but someone did).
2. Are line-end features abbreviations employed when the writer was running out of space? Maybe -- but my sense is that, in practice, abbreviations didn't typically cluster at line-end in manuscripts of the period, so this would be a stranger explanation than it might seem at first glance.
3. Do line-start and line-end patterns reflect a linguistic phenomenon, or some other patterning of underlying content (such as poetry)? That would be hard to square with line breaks seemingly inserted as necessary to fill available space around illustrations.
4. Do line-start and line-end patterns reflect contextual scribal variations -- i.e., different ways of writing the "same" glyphs at the beginnings or ends of lines? To be sure, there was plenty of contextual scribal variation in other European writing systems of the period (though not a lot specific to line ends and line starts). But that variation was conventionalized and had emerged over many generations. Unless Voynichese had a long undocumented tradition behind it, when -- and under what pressures -- would such conventions have evolved?
I don't claim that any of those explanations is weak enough that we can completely dismiss it, but at the same time, none of them strikes me as very persuasive -- certainly not enough so that we could say, "Oh, that's probably just X, so it's most likely safe to ignore."
On the other hand, I can imagine a system that would predictably produce line effects as a natural byproduct of its use, and that also falls well within the range of hypotheses people already entertain about how Voynichese might have worked (along the lines of Rene's [link]). Consider this set of specifications:
(1) Lines always break at word boundaries.
(2) Within lines, words are run together indiscriminately.
(3) Text is chunked for encoding into units consisting of one or more consonants followed by one or more vowels, with each "chunk" being encoded as a vord.
(4) It's possible to encode an isolated consonant or vowel (or isolated clusters of either), but this is done only as needed to satisfy rule (1).
(5) Vords that are similarly structured represent similarly structured "chunks," but not in a straightforward letter-by-letter way (imagine something like Naibbe encoding tables not being randomly interchangeable, but each encoding a different category of "chunk").
I've brought this idea up here before, but only as a thought exercise. Now, to try it out in practice, I've just cobbled together a little over a million characters' worth of miscellaneous transcribed medieval Latin and run a few experiments on it to see what would happen to the plaintext (prior to any further encoding) if it were "chunked" as I've described. Note: it isn't actually necessary to break the "chunked" text into lines to gather data about what characteristics different line positions would have -- presuming that line breaks are inserted arbitrarily, we just have to work out how each word would be "chunked" in each of several positions and compare the results.
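To make the chunking concrete, here is a minimal Python sketch of rules (2) to (4) and of the position shortcut just described. It is an illustration under stated assumptions, not necessarily how the figures that follow were computed: only a, e, i, o, u are treated as vowels (so j, v and y count as consonants), input is reduced to the letters a-z, and line-start and line-end forms are taken from each word chunked on its own. All function names are made up for this example.

```python
import re
from collections import Counter

# Assumption: only a, e, i, o, u are vowels; every other letter is a consonant.
VOWELS = "aeiou"
# consonants* + vowels+ makes a chunk; a trailing run of consonants with no
# vowel after it (possible only at the very end) becomes a chunk of its own.
CHUNK_RE = re.compile(rf"[^{VOWELS}]*[{VOWELS}]+|[^{VOWELS}]+")


def words_of(text):
    """Lowercase the text and return its words as runs of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def chunk(s):
    """Split a string of letters into chunks; a leading vowel run stays isolated."""
    return CHUNK_RE.findall(s)


def positional_counts(text):
    """Chunk frequencies for mid-line, line-start and line-end positions."""
    words = words_of(text)
    mid = Counter(chunk("".join(words)))         # rule (2): all words run together
    start = Counter(chunk(w)[0] for w in words)  # first chunk of each word
    end = Counter(chunk(w)[-1] for w in words)   # last chunk of each word
    return mid, start, end


def top_shares(counter, n=12):
    """The n most common chunks with their share (%) of all chunks counted."""
    total = sum(counter.values())
    return [(c, round(100 * k / total, 2)) for c, k in counter.most_common(n)]


print(chunk("exurbe"))  # ['e', 'xu', 'rbe']
mid, start, end = positional_counts("Ex urbe dominus")
print(top_shares(start))
```

Run over a large plaintext, `positional_counts` plus `top_shares` yields tables of the same kind as the ones below; the actual percentages depend on the corpus and on the vowel assumption.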
Based on my sample, the top twelve most common "chunks" in the middle of the line (i.e., the units we get if we run all text together) would be:
[re] 2.53%
[te] 2.02%
[ta] 2.02%
[tu] 1.89%
[mi] 1.63%
[ne], [ra] 1.62%
[ri] 1.59%
[ti] 1.54%
[si] 1.45%
[ni] 1.43%
[se] 1.38%
At line-start (considering only the first "chunks" of individual words), the top twelve most common values would instead be:
[i] 8.18% †
[e] 8.16% †
[a] 7.04% †
[co] 3.12%
[re] 2.51% *
[o] 2.31% †
[se] 2.20% *
[no] 2.16%
[si] 2.10% *
[de] 2.07%
[u] 1.95% †
[pe] 1.53%
The asterisks mark cases that overlap the mid-line "top twelve," while daggers mark cases that could only occur line-initially. Meanwhile, at line-end (considering only the last "chunks" in individual words), the top twelve most common "chunks" would be:
[s] 18.10%
[m] 16.06%
[t] 11.90%
[r] 4.16%
[n] 3.94%
[d] 2.74%
[re] 2.45% *
[nt] 1.85%
[c] 1.79%
[ns] 1.38%
[ne] 1.26% *
[st] 1.22%
The two cases marked with asterisks overlap the most common mid-line "chunks," but the others would be exclusive to the end of the line.
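The daggers follow from the chunk shapes themselves, which a short Python sketch can make explicit. `annotate` and its arguments are hypothetical, not from the post, and a, e, i, o, u are assumed to be the only vowels. One design difference: the post leaves the line-end exclusives unmarked, while this sketch gives them a † as well.

```python
# Assumption: a, e, i, o, u are the only vowels.
VOWELS = set("aeiou")


def annotate(top, mid_top, position):
    """Mark each chunk in a positional top list: '*' if it is also in the
    mid-line top list, '†' if its shape rules it out anywhere mid-line."""
    marked = []
    for c in top:
        if c in mid_top:
            mark = "*"
        elif position == "start" and c[0] in VOWELS:
            # Mid-line, each chunk follows a chunk that ended in a vowel run,
            # so it must open with a consonant.
            mark = "†"
        elif position == "end" and c[-1] not in VOWELS:
            # Mid-line, each chunk closes with a vowel; a bare consonant ending
            # only arises where rule (4) applies.
            mark = "†"
        else:
            mark = ""
        marked.append(c + mark)
    return marked


print(annotate(["i", "e", "a", "co", "re"], {"re", "te", "ta"}, "start"))
# ['i†', 'e†', 'a†', 'co', 're*']
```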
The second "chunk" in the line -- analyzed so as to permit crossover to a new word, e.g., the second "chunk" in [ex urbe] would be [xu] -- also seems likely to have distinctive characteristics because it will tend disproportionately to represent the second syllable of a word. And indeed it does. For example, [re] is significantly less common as the second "chunk" in a line (1.03%) than as the first "chunk" in a line (2.51%) or in the mid-line as a whole (2.53%). Meanwhile, [mi] is somewhat more common as the second "chunk" (2.11%) than in the mid-line as a whole (1.63%).
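The crossover rule for the second "chunk" can be sketched in Python. This is an assumed implementation, not the author's, and it assumes a, e, i, o, u are the only vowels. `second_chunk` joins only as many following words as needed, which keeps a pass over a large corpus linear: once a third chunk has appeared, appending more text cannot move the boundaries of the first two, because the second chunk's vowel run is already closed by a consonant.

```python
import re
from collections import Counter

# Assumption: a, e, i, o, u are the only vowels.
CHUNK_RE = re.compile(r"[^aeiou]*[aeiou]+|[^aeiou]+")


def second_chunk(words, i):
    """Second chunk of a line starting at words[i], allowed to cross into the
    following word(s). Words are added until a third chunk appears (so the
    first two are final) or the text runs out."""
    j = i + 1
    while True:
        chunks = CHUNK_RE.findall("".join(words[i:j]))
        if len(chunks) >= 3 or j >= len(words):
            return chunks[1] if len(chunks) > 1 else None
        j += 1


def second_chunk_counts(words):
    """Tally the second chunk for every possible line start."""
    counts = Counter()
    for i in range(len(words)):
        c = second_chunk(words, i)
        if c is not None:
            counts[c] += 1
    return counts


print(second_chunk(["ex", "urbe"], 0))  # xu
```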
As this illustrates, a syllabic encoding scheme along the lines I've described should predictably generate LAAFU effects considerably stronger than the ones we see in the Voynich Manuscript -- and they would affect not just first and last vords, but second vords as well (compare [link] at Agnostic Voynich). I'm less sure about it producing subtler mid-line patterns, but I wouldn't rule out that it might, in practice.
If these effects seem too strong to be comparable to Voynichese, one way to weaken them would be to substitute this for rule #2:
(2) Within lines, the words that make up phrases are run together indiscriminately, but the phrases themselves are not run together.
The beginnings and ends of lines would still have heavily skewed statistical characteristics, but there would be fewer forms that could only be found there -- now limited to "chunks" that occur at beginnings and ends of individual words, but not at beginnings and ends of whole phrases.
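The modified rule (2) can be sketched in Python. It is an illustration only: phrases are supplied by hand because the post doesn't say how phrase boundaries would be determined, `line_only_forms` is a hypothetical name, and a, e, i, o, u are assumed to be the only vowels. Treating a whole line as one phrase reproduces the original rule (2).

```python
import re

# Assumption: a, e, i, o, u are the only vowels.
CHUNK_RE = re.compile(r"[^aeiou]*[aeiou]+|[^aeiou]+")


def line_only_forms(phrases):
    """Chunks that could appear only at a line start or end.

    `phrases` is a list of phrases, each a list of lowercase words; how phrases
    are delimited is left open in the post, so they are given by hand here.
    """
    inside, edges = set(), set()
    for phrase in phrases:
        # Words within a phrase run together; phrases themselves do not.
        inside.update(CHUNK_RE.findall("".join(phrase)))
        for word in phrase:
            parts = CHUNK_RE.findall(word)
            edges.update((parts[0], parts[-1]))  # possible line-start / line-end chunks
    return edges - inside


print(sorted(line_only_forms([["in", "urbe", "et", "ex", "urbe"]])))  # ['e', 'n', 't', 'u', 'x']
print(sorted(line_only_forms([["in", "urbe"], ["et"], ["ex", "urbe"]])))  # ['n', 'u', 'x']
```

In the one-phrase run, [e] and [t] can only appear at line edges; once [et] stands as its own phrase, both can occur mid-line, which is the weakening described above.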
Magnesium writes as follows about Voynichese LAAFU patterns:
(12-08-2025, 10:01 PM) magnesium wrote: One of the things I want to explore is the extent to which the structure of the plaintext can create these biases within Naibbe ciphertext. For example, if the Naibbe cipher were used to encrypt a poem such as Dante's Divina Commedia, the poem's line-by-line structure would have rhyming, repeated phrases, etc. that would theoretically impose greater line-by-line positional biases in the frequencies of plaintext unigrams and bigrams relative to prose such as Pliny's Natural History. Is that sufficient to explain the full extent of the VMS's "line as a functional unit" properties? Maybe, maybe not. But maybe it becomes much easier to achieve "line as a functional unit" properties within a Naibbe-like ciphertext if the plaintext is a poem or poem-like in its structure.
There's certainly no harm in exploring that. But since one of his goals is to "(b) consistently replicate these properties [ = 'well-known VMS statistical properties' ] when encrypting a wide range of plaintexts in a well-characterized natural language," I assume he'd prefer to model a system that would reliably produce LAAFU effects when applied to any source text.
Just wondering: how difficult would it be to adapt the Naibbe approach from a unigram/bigram system to a syllabic "chunk" system? Might the frequencies of different "chunk" types result naturally in something like the frequency distributions simulated through playing cards?
Looking at f65r in regards to the line as a functional unit.
Posted by: anyasophira - 02-09-2025, 03:22 AM - Forum: Analysis of the text - Replies (17)
Note: I am -struggling- with the formatting on the forum. I have attempted quite a few times to make this right without strange extra spaces and randomly inserted formatting tags. I have temporarily given up and will attempt to figure this out later, but that does mean the text is presented very strangely.
Hello Everyone.
Has there been much discussion about f65r and the implications of its very existence? Three words, one line, a single plant picture. And while we can’t identify the plant itself, it’s not a nebulous star lady or an uncertain bathhouse that could also be interpreted as human organs. It’s a plant. Let’s assume that vords are not nonsense. Let’s also assume that they somehow give information if you know the rules behind them. If the Voynich gives any information, then it might be useful to look at the smallest example of vords in a line because, as we know, the line seems to have its own functions, especially the first line of a paragraph. I looked to see if this had been discussed yet, but I could not find it on this forum. I suppose you will all let me know if this is a tired, well-discussed line of thinking.
My question: could f65r not be helpful in studying the line as a functional unit?
Let’s begin with what is on the page.
f65r has a single plant, with 3 words written in a line next to a leaf on the bottom half. It might be called a label, but it certainly has the format of a line. Either this plant needed only one line, or it was never finished, and so we see how a line begins before it is finished.
Has anyone discussed or looked at the fact that, if the manuscript gives information, then whatever mechanism or language was being used was able to give information about a plant using only the glyphs on this page, with the frequency and order given? Has anyone ever considered which glyphs are used on f65r, and their frequency and order on the page?
Here are a few things I have noticed by looking at this folio:
My method: I use the "Glen Claston" (GC) transliteration file, which uses Keys and IVTT, for my calculations. I extract numbers from that transliteration file with Python code, then use an AI Voynich bot working in EVA to double-check that my numbers are at least in the right ballpark. (Which is a good way to use AI, by the way: just to help double-check data you already have. I promise I never give AI real responsibility here; I know it lies.) I also cross-reference Voynichese online, which uses EVA, and use my eyeballs as well. I did not include the Rosette. If I do not specify that I am using herbal section numbers, then I am looking at all folios. When looking at ratios, I add up all keys on a folio and then see what the ratio of a specific key is. When it’s a pair, I do the same. I treat bench gallows as separate; however, I do specify where they occur when relevant, in case you do not feel the same way. And luckily, the glyphs I am looking at are pretty straightforward, so I don’t have to get into whether c is ch, etc.
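The ratio method can be written out as a short Python sketch. Two assumptions are worth flagging: it works on EVA-style lowercase text rather than the GC/IVTT file described above, and it treats ch, sh and the bench gallows (cth, ckh, cph, cfh) as single glyphs. The names are hypothetical.

```python
import re
from collections import Counter

# Assumption: EVA-style text; benches and bench gallows count as one glyph each,
# every other letter is one glyph, and spaces/punctuation are ignored.
GLYPH_RE = re.compile(r"c[ktpf]h|[cs]h|[a-z]")


def glyphs(text):
    """Split a folio's text into glyphs under the assumption above."""
    return GLYPH_RE.findall(text.lower())


def glyph_ratios(text):
    """Percentage share of each glyph among all glyphs on a folio."""
    g = glyphs(text)
    return {glyph: round(100 * n / len(g), 2) for glyph, n in Counter(g).items()}


print(glyph_ratios("otaim dam alam"))  # a: 33.33, m: 25.0, every other glyph 8.33
```

On the f65r line this gives a at 33.33%, m at 25% and each of o, t, i, d and l at 8.33%, the figures discussed below.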
The folio in question: f65r, the first folio after a large gap between [link] and several missing manuscript pages. It is part of the herbal section, and part of an elusive quire that has three hands and two languages: a quire with two stars text pages, two herbal pages and one cosmos page.
On f65r is a large plant, and near the bottom on the left side the text reads:
Otaim, dam alam
Otaim is found 3 times (including f111v and f65r)
dam is found 97 times
alam is found 9 times (f111r and f107r, to name a few)
EVA and GC agree on these transliterations, so these are reasonably well-agreed-upon vords.
What was chosen, encoded, needed or used: when writing a single Voynich sentence, for whatever rules or reasons, there is one example where O, T, A, I, M, D and L would be needed but the rest of the glyphs would not.
A single line of text that gives further information about a single plant needed these glyphs but not others, and needed only 3 words and 12 glyphs.
What is there and what is not there: I have seen it argued that before we crack the code, we must define what a glyph/letter actually is and what a word actually is. However, the glyphs on this page are not debated as much, at least from what I have found in my research. There is no N, so we don’t have to argue about whether i and n are the same glyph or separate glyphs.
A can sometimes be debated as O, but if you look at these A’s, they are nice examples where the strokes connect pretty well. There are no bench gallows, and no benches like sh or ch. Nor is there an s, which sometimes gets debated as r-like. There is no y, which is sometimes debated as unnecessary, or perhaps a type of A, or maybe filler. There is, however, M, which stands at the end, and a single i, which is very unusual though not unheard of; what is nice about M is that it is very distinct from I, with a closure. There is one single gallows, t, so we don’t have to worry about the debate over whether P might be K. We also don’t have to debate whether ee is ch or e is c, so that is also nice. Also, does this page give us some inherent information about the nature of M? This might be evidence that it’s an abbreviation or a placeholder for other glyphs. I would argue that it eliminates its use as random padding, since there is clearly enough room on the page not to need it (unless the text is meaningless). However… maybe this is evidence that it -is- a null, which is why it is necessary even in a three-word line, to disguise even smaller chunks. I don’t know ciphers well enough to decide which way this points.
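A present/absent check over a glyph checklist makes this inventory explicit. This is a hypothetical helper on EVA-style text; the checklist is illustrative, not a standard inventory, and benches and bench gallows are assumed to count as single glyphs.

```python
import re

# Assumption: EVA-style text; benches (ch, sh) and bench gallows count as one glyph.
GLYPH_RE = re.compile(r"c[ktpf]h|[cs]h|[a-z]")

# Illustrative checklist of frequent glyphs mentioned in the post; extend as needed.
CHECKLIST = ["q", "o", "y", "ch", "sh", "e", "a", "i", "n", "m", "d", "l", "r", "s",
             "k", "t", "p", "f"]


def inventory(text):
    """Return (present, absent) glyphs for a line, relative to CHECKLIST."""
    present = set(GLYPH_RE.findall(text.lower()))
    absent = [g for g in CHECKLIST if g not in present]
    return sorted(present), absent


print(inventory("otaim dam alam"))
```

On otaim dam alam it reports a, d, i, l, m, o and t as present (the l comes from alam) and everything else on the checklist as absent.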
Before we look at individual glyphs, let’s look at a few things of note about clusters.
im: according to my scan, there are 52 instances of im in the entire manuscript, and ALL of them are preceded by a, to make aim. Aim can exist embedded and also on its own. It’s very fond of daim, and more often than not has that d there as well. I have not yet checked every single one of these to make sure the “a” is a proper one.
iim: Just to be diligent, I explored iim as well. There are 17. All but one have A, to make aiim. My strange outlier is on f113r: the word is oriim. What a weirdo. So A is not always required for this cluster. I checked this one visually, and this is a very separate fat little o that is definitely not an “a”. Also, aiim exists as its own word on f87v.
iiim: Somehow does exist, just once ([link]), in the word loiiim. No A needed here, just an o. I checked the actual page, and it does -seem- to be an o instead of an a, but this one is less clear.
Single i in the im family is more frequent than double ii, making it the reverse of the pattern for in and iin. The numbers are very small here, though.
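The cluster counts above can be reproduced with a regular-expression pass over a list of vords, tallying each i-run before m together with the glyph that precedes it. A sketch in Python, assuming EVA-style words and a hypothetical `im_clusters` helper; a character check cannot tell a poorly formed a from an o, so outliers still need the visual check described above.

```python
import re
from collections import Counter


def im_clusters(words):
    """Tally runs of i followed by m (im, iim, iiim, ...) together with the
    glyph just before the run ('^' if the run starts the vord)."""
    tally = Counter()
    for w in words:
        for match in re.finditer(r"(i+)m", w):
            before = w[match.start() - 1] if match.start() > 0 else "^"
            tally[(match.group(0), before)] += 1
    return tally


print(im_clusters(["otaim", "daim", "aiim", "loiiim", "dam"]))
```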
The glyphs themselves.
Their frequency/ratios:
When looking at these three words, I find it useful to look at their ratio of presence to the total number of glyphs given on a folio page. I like to compare these against other folio pages and how they behave, rather than if everything was normalized to be equal chance. Sometimes I look at the whole manuscript, and other times I focus just on herbal pages.
When doing this, A and M are the most noteworthy.
Eva M-
Why M and not N and why im and not in or iin?
M favors rightward positions, though more at the vord level than the line level. M favors ending vords, and here M ended the line, so we see M following those “line as a functional unit” trends. But we see M here, and not N or Y. And we see M dominating the page at a 25% ratio.
M occurs at least a few times a page, but its ratios are very, very small. Most of the time it sits around 0-2% of any section. Perhaps it’s inflated because of how little text there is. So let’s look at the next-highest ratio of M. The highest ratio of M after f65r is on f3r, an herbal page with 456 keys, so a goodly amount, and with 24 M’s, making its ratio 5.26%. That’s quite a difference. The third-highest folio by ratio of M, f3v, has 295 total glyphs and 31 M’s. Not bad! It has a lot fewer glyphs overall, so perhaps M stands out when there are fewer glyphs. Nope: it’s a bit lower, at 4.07%. Let’s look at the 4th and 5th highest: f24r, with 40 M’s out of 350 glyphs (11.43%), and [link], with 24 out of 266 (9%). These are the folios with the highest ratios of M in the entire manuscript (including f65r). Notice that with four of these folios, fewer glyphs on a folio usually means a slightly lower ratio of M. This trend does not hold on f65r, with its scant glyphs and its M ratio at extreme heights. I would love to eventually check the n pattern and see how it relates to M here.
Eva A-
The EVA a glyph goes up and down in ratio across different folios. For example, on [link] it is at 11.36%, while on many of the bath pages it tends to be around 5.8-7.8%, and on some pages it is even lower, at 4.28%. Its highest ratios are on f67v2 (52 of 263 glyphs, 19.77%), f72r2 (81 of 452, 17.92%) and [link] (274 of 1577, 17.37%). These are folios with a lot of glyphs. I looked at a few that have a smaller sample size, and I noticed that, as with M, with -more- glyphs the percentage of A goes up and with fewer it tends to go down, although this is from a quick glance at the rates and would need to be checked. So A being so numerous even with smaller numbers is counter to what it normally likes to do. For example, [link], with 177 glyphs, has 22 a’s, or 12.43%.
And f65r, of course, also has a higher ratio of A than any other folio.
f65r has 4 A’s, a rate of 33.3%. The folio with the next-highest ratio of A has only 19.77%, with 263 glyphs.
Again, we do need to ask: if we have so little data, are A and M acting strangely because of how small the sample of glyphs is? But then they are acting opposite to their usual tendency. They are also behaving differently while other glyphs are not (see below). And wouldn’t their inflation -be- the curious thing here? Again, given only a few letters to use, a and m were pulled in greater numbers rather than o or y or n.
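One way to frame the small-sample question is a binomial tail: how likely would 4 or more a’s (or 3 or more m’s) be among 12 glyphs if each glyph were drawn independently at some baseline rate? The Python sketch below computes that. The independence assumption is strong and certainly false for Voynichese, and the baseline rates in the usage lines are picked from the ranges quoted above rather than measured, so read the result as a rough yardstick only.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that n glyphs contain k or
    more copies of a glyph, *if* each glyph were drawn independently at rate p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# f65r: 4 a's and 3 m's among 12 glyphs. The rates are illustrative assumptions
# taken from the ranges quoted in the post (a ~5.8-7.8%, m ~0-2%).
print(prob_at_least(4, 12, 0.078))  # a at ~7.8%
print(prob_at_least(3, 12, 0.02))   # m at ~2%
```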
Eva-i
In this case, i is not part of a compound. But it’s tricky, so let’s just measure it as it is found: a single i without another i or n following it. This is rare, with its rate often 0% on most folios, but it is found. Obviously f65r is the highest, at 8.33%. Looking at the entire manuscript, not just the herbal section, we have folios with 1.52%, 1.22%, 0.99% and 0.76%, and it just goes smaller from there. Unlike with M and A, there is no trend at all of higher or lower numbers of glyphs affecting the ratio of single i; it’s all over the place.
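The single-i definition translates into a regular expression. The lookbehind, which excludes the final i of runs like iin or iim, is an assumption about the intended reading, since the definition above only mentions what follows the i; counting characters as glyphs is a further simplification, and `single_i_ratio` is a hypothetical name.

```python
import re

# An i with no i directly before it and no i or n directly after it.
SINGLE_I = re.compile(r"(?<!i)i(?![in])")


def single_i_ratio(text):
    """Single-i glyphs as a percentage of all letters (spaces excluded).
    Counting characters as glyphs is a simplifying assumption."""
    letters = [c for c in text if c.isalpha()]
    return 100 * len(SINGLE_I.findall(text)) / len(letters)


print(SINGLE_I.findall("daiin aiim otaim oir"))  # ['i', 'i']: only otaim and oir qualify
```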
f65r- The glyphs that are not acting out:
Eva-o
The ratio of o is 8.33%, a little low compared to other herbal pages, but it matches the ratio on [link], at 8.37%, so on typical herbal pages it can be that low. On herbal pages, o is more likely to be in the 10% range, and 41r has one of the lowest o ratios in the herbal, but these ratios can and do occur elsewhere. So we do NOT see a giant inflation of o because we only have three words; we just see it sitting at its usual ratios.
Eva-d
The ratio of EVA d is 8.33%. I looked at a few random herbal pages, and d goes up and down depending on the folio.
For example, f33v has 8.9%, [link] has 11.90% and [link] only 5.24%. So d numbers move up and down a lot more than o does. So d isn’t low or high or inflated! It’s just acting like all the other ratios of d on the herbal pages.
Gallows Eva T
Let’s look at EVA t, with ratios regarding its place on the top line (the first line in a paragraph). I honestly forgot to go back and look at t ratios per folio, so that could be a further research idea.
If we count the single line on f65r as a top line, then its ratio of EVA t, at 8.33%, does not really stand out.
On the top rows in the herbal folios, EVA t can be found at a range of ratios, from 0% up to 11% (f13v, the highest top-line ratio in the herbal), with a midrange value of 4.87% (f5v).
So the single t on f65r falls in the middle of the range of ratios at which t is found in top lines.
The ratio is an ordinary one for how many t’s we expect to find in a top line.
However, having a vord with T or OT start the first line is a bit more unusual.
Here are all the examples found in the entire manuscript:
8r, tydio
15r, tshor
18v, told
44v, tsho
65r, otaim
80r, toroly
82v, otechdy
84v, tody
88f, otorchety
96r, tor
I did not include the t bench gallows; there are a few stretched-out, strange ones that do begin a top line. If they are included, they still align with all of this.
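The search for top lines opening with T or OT amounts to a prefix test on the first vord of each paragraph’s first line. A Python sketch, assuming a hypothetical mapping from folio to first-line text in EVA; because bench gallows start with c, the pattern leaves cth-initial vords out, matching the choice described above.

```python
import re


def t_initial(first_lines):
    """Keep folios whose first top-line vord starts with t or ot.

    `first_lines` maps folio -> text of a paragraph's first line (a
    hypothetical input format). Bench gallows start with c, so cth is skipped.
    """
    hits = {}
    for folio, line in first_lines.items():
        words = line.split()
        if words and re.match(r"o?t", words[0]):
            hits[folio] = words[0]
    return hits


# "cthor" is a made-up string, included only to show a bench gallows being skipped.
print(t_initial({"f65r": "otaim dam alam", "example": "cthor"}))  # {'f65r': 'otaim'}
```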
Is it strange that P, F and K are not around in f65r? Is it strange that only one type of gallows is there?
As I look at those three words, it does strike me that, when given only a few glyphs to use, the scribe or creator or their encoding method resulted in T instead of P. Remember, P dominates the top line, and especially dominates the first vord of the top line. OT and T do not usually start the top line.
But f65r is also unusual in that it is a top line with only one type of gallows.
So...Is that a different type of top line?
Looking at the top lines of all herbal pages, we have 121. Then I looked for any that had only one type of gallows (P, K, T or F) and considered their ratio to total top lines in the herbal section.
(Not considering bench gallows.)
Results
The following folios have a top line with only one type of gallows. The trend stands regardless of whether you consider bench gallows, but I decided to include bench gallows data just to be transparent.
EVA t: 5 folios, 4.96% (including f65r)
EVA p: 3-4 folios, 2.4% to 3.3% (see below)
EVA k: 1 folio, 0.8%
EVA f: 0 folios, 0% (may need to double-check this)
All other folios are mixed in some way: 89.3%
So while we think of P as dominating the top row, when only a single type of gallows appears, T is suddenly the top contender to be that gallows, at least in the herbal section. Also, does this somehow relate to T appearing in labels a lot? Labels -can- have more than one gallows, but maybe they are much likelier than paragraph lines to have only a single type of gallows, and so they favor T over P?
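The classification of top lines by gallows type can be sketched in Python. The names are hypothetical throughout, and the sketch assumes EVA text in which t, k, p, f are the plain gallows and cth, ckh, cph, cfh the bench gallows. The `count_benches` switch mirrors the two ways of treating bench gallows discussed here; the stricter variant, which drops any folio whose top line has a bench gallows at all, would be a one-line filter on top.

```python
import re

BENCH = re.compile(r"c([ktpf])h")  # bench gallows cth, ckh, cph, cfh
PLAIN = re.compile(r"[ktpf]")      # plain gallows t, k, p, f


def gallows_types(line, count_benches=False):
    """Set of gallows types in a line. Bench gallows are removed before plain
    gallows are counted; with count_benches=True each bench is folded into
    its plain type (cth -> t) instead of being ignored."""
    types = set(PLAIN.findall(BENCH.sub("", line)))
    if count_benches:
        types |= set(BENCH.findall(line))
    return types


def group_top_lines(top_lines, count_benches=False):
    """Group folios by the single gallows type on their top line, else 'mixed'/'none'."""
    groups = {}
    for folio, line in top_lines.items():
        types = gallows_types(line, count_benches)
        key = types.pop() if len(types) == 1 else ("mixed" if types else "none")
        groups.setdefault(key, []).append(folio)
    return groups


# Apart from f65r, these lines are made-up strings, not transcriptions.
print(group_top_lines({"f65r": "otaim dam alam", "x": "qokedy cthor", "y": "pchor tshey"}))
```

With the default setting the made-up line "qokedy cthor" counts as k-only; with `count_benches=True` it becomes mixed, which is the kind of case that makes the counts above range between two values.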
Herbal pages with a single type of gallows on the top line
Only t (no p, f, k) – 7 folios
Note: all of these lines begin with either T or OT, and if we include t bench gallows, one begins with CTH.
- f9: starts with a T vord. There are 2 bench gallows as well; if you consider them the same as T, they are both t bench gallows. If bench gallows are not considered (and they were not for any other gallows), then it’s still a single type of gallows, t. If they are considered, it’s interesting that they match the t completely.
- f15r: starts with a T vord. 3 t gallows here.
- f18v: starts with a T vord. 4 t gallows here.
- f35r: a tricky one. It has the loooonng t bench gallows to start. However, there is also a single t later in the line.
- f44v: starts with a T vord. 6 t’s here, and only t bench gallows.
- f65r: starts with an OT vord. 1 t.
So yeah, take your pick: either bench gallows don’t count, and so don’t mess up this classification, or they do count and are just more t’s in these T lines.
Top line with only p (no t, f, k) – 4 folios
Note: all of these lines begin with P. We have pch twice, po and psh. All but [link] have more than one P, with the last two having OP as well.
- f14r: 2 P’s
- f17v: in question; EVA says it’s 3 P’s, and IVTT says it’s 2 P’s and 1 F.
- f24r: 2 P’s
- f96v: either 3 P’s and 1 f bench gallows (which we might consider another disqualifying situation, since it doesn’t match, and it only furthers T’s dominance), or 2 P’s, 1 P with what looks like the beginning of a bench on the left that never materialized, and 1 f bench gallows.
Take your pick: either bench gallows do not count, and so there are only 4 folios, or they count only if they match the single type of gallows, in which case we still eliminate 1-2 folios, once again boosting T as the most likely gallows for a top row with only one type of gallows.
Top line with only k – 1 folio
Uh oh, we might need to rethink this one. The very first line is a single word. But perhaps this is the folio most like f65r? The leaves seem similar, and the roots seem lumpy too. Probably not really that special, since all those plants are lumpy and strange. Let’s pay more attention to the glyphs. We could consider this a legit line, as we considered f65r a legit line, and in that case here is the single and sole example of an herbal top line with only K. It reads: Keer (e:o)dal. Or a single word is just not good enough to stand as a line, and we need at least two, which eliminates this and leaves 0 top-row lines with only K in the herbal section. K on its own is rare either way.
Only f – 0 folios
None.
f65r, if considered a top line, is in a small subclass: one with only one type of gallows.
So what does this say?
So we have classified a somewhat rare type of top line: one that has only one type of gallows. In that class we have only a few samples, at -most- 10. So that makes f65r part of a very small group of top rows that go against the norm.
Then we need to look at whether we allow bench gallows, or only those that match.
But the trend remains no matter what.
Let’s just eliminate those folios whose singular gallows type has any type of bench gallows and be really conservative.
We then have top lines with only one type of gallows in them:
Gallows T- 3
Gallows P- 2
Gallows K- 1
Gallows F- still not here.
T still leads, and by a greater margin than P. There are at least 3 examples of T on a top line without any other gallows, including bench gallows, while P has only 2. So T can be used in a top line if other gallows are not used, needed or wanted, or whatever is happening behind the reasons the Voynich script acts the way it does. So while P is line-initial and dominates top-line ratios, it does so when other types of gallows are also present.
What does this all mean? I have no idea lol. But I do know that at least now there is some reason that if you had to pick only one type of Gallows in those three vords, the trend was to choose T over any other gallows.
So f65r, with its single line and three vords, is doing the following:
- Increasing its M and its A at an incredible rate, against the usual trend of decreasing with fewer glyphs
- It is using single i but not using N.
- It is using O, T, and D at normal ratios.
- It is not using Q or S or R or Y or Ch or Sh or any other common glyphs. This is against the ratios usually found.
Does this indicate something? Is this evidence that these are not necessary to encode, encrypt, or relay -some- information, no matter how minimum?
- Also, f65r is one of only 2 folios without a Q ([link] also doesn’t have a Q).
- Lack of hapax. It goes against the increased use of hapax words at greater rates in the top line. Compared with the trends of other folios, it is very unusual that not one of these vords is a hapax. The top row of herbals has a huge ratio of those, and labels do too. Otaim is found 3 times (including f111v and f65r), dam is found 97 times and alam is found 9 times (f111r and f107r, to name a few).
- It goes against the bigram structure: it does not favor vords built from whole bigrams, as two of the words have an odd number of glyphs. But it has 12 glyphs total, and therefore could have been made of even-numbered vords. Some possible explanations:
- The encryption process with fewer vords trended toward odd vords for some reason.
- The information being given lends itself to more A and M and odd-numbered vords.
- The mechanism that makes the first vord longer than the second vord in a line had more weight than the mechanism that creates even-numbered vords.
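The length, parity and hapax checks in this list can be bundled into one helper. A sketch in Python, assuming EVA-style vords and counting characters as glyphs; `vord_profile` is a hypothetical name, `corpus_counts` should be a Counter of every vord in the manuscript, and the three counts used here are the ones quoted in the post.

```python
from collections import Counter


def vord_profile(line, corpus_counts):
    """Length, parity, first-vs-second length and hapax status for one line.
    Length counts EVA characters, an assumption (ch would count as two)."""
    words = line.split()
    lengths = [len(w) for w in words]
    return {
        "lengths": lengths,
        "total": sum(lengths),
        "odd_length": [w for w, n in zip(words, lengths) if n % 2 == 1],
        "first_longer": len(words) > 1 and lengths[0] > lengths[1],
        "hapax": [w for w in words if corpus_counts[w] == 1],
    }


# Whole-manuscript counts as quoted in the post.
counts = Counter({"otaim": 3, "dam": 97, "alam": 9})
print(vord_profile("otaim dam alam", counts))
```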
F65r also follows a lot of trends and features of Voynich top lines, and glyph ratios
Below might indicate how important or integral these behaviors are to the encoding of the manuscript. Because despite having only 3 vords, these features were either important enough, or dominant enough to still exist as if there were paragraphs of text.
- There is a gallows in the first vord, and the choice of T is not unusual, whether we consider that it is the only gallows on the page or how often T is used in top lines. Having a gallows in the first vord of the top line is pretty integral to how Voynichese seems to function.
- General vord length: it follows the trend of gallows vords being longer (otaim has the most glyphs on the page).
- First vord versus second vord length: as stated above, the first vord is the longest vord on the folio and it is followed by the shortest.
- Ratios of O, D and T: these are not unusual for a folio in the herbal pages. So how hardwired are these 3 glyphs into the workings of this script, that even in this tiny sample of Voynichese they trend as if they were in a paragraph of text?
- General placement of glyphs within vords and lines: the gallows is in the first vord and toward the left. The a, i, and m are in their usual spots when they do appear, as a cluster at the end. Initial O is normal. Initial A is less likely but breaks no major trends or patterns.
- Repetitive nature: we have three vords that end in -aim, then -am, and finally -am. Even at this minimum, and even with an unusual amount of A and M, Voynichese is being repetitive relative to the vords around it. What does that say about its nature? Are there lots of A and M because it must repeat, and so it uses what is already on the page? Or does the information being conveyed rely heavily on A and M, so that the vords end up looking similar? If so, does the first vord on the folio, line, or paragraph set the stage for the line? Or are A and M always doing this, and we can’t tell because other glyphs on the page hide their statistics? How can we test for this? (The last one is less likely, I feel.) Does this page help eliminate the idea of a natural unknown language, considering that the pattern of repetition holds even with three vords, and even when using uncommon bigrams and/or glyphs? What language does this? This seems very deliberate, whether meaningless or not.
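One hedged way to approach "How can we test for this?" is a permutation test: measure how often adjacent vords share their final glyph(s), then see how often random lines of the same length, built from vords drawn out of the whole corpus, are at least as repetitive. In this sketch `corpus_vords` is a hypothetical list you would load from a transliteration file, and drawing vords with replacement ignores line-position effects, so a small p-value would only be a starting point.

```python
import random

def shared_ending_rate(vords, tail=1):
    """Fraction of adjacent vord pairs whose last `tail` glyphs are identical."""
    pairs = list(zip(vords, vords[1:]))
    if not pairs:
        return 0.0
    return sum(a[-tail:] == b[-tail:] for a, b in pairs) / len(pairs)

def permutation_p_value(line_vords, corpus_vords, tail=1, trials=10_000, seed=0):
    """Observed rate, plus the share of random same-length lines that match or beat it."""
    rng = random.Random(seed)
    observed = shared_ending_rate(line_vords, tail)
    n = len(line_vords)
    hits = sum(
        shared_ending_rate(rng.choices(corpus_vords, k=n), tail) >= observed
        for _ in range(trials)
    )
    return observed, hits / trials

line = ["otaim", "dam", "alam"]
print(shared_ending_rate(line, tail=1))  # 1.0: every vord ends in m
print(shared_ending_rate(line, tail=2))  # 0.5: only dam/alam share -am
```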
A few more questions I have regarding research:
- Could this page look this way under a cipher like the Naibbe cipher? Is this like the labels, where that approach falls apart, or does it act more like a line in that regard?
- Does this line speak to Patrick Feaster’s work with loops, or rightness/leftness?
- How does this work with any of the slot theories, or core and mantle theories?
- Does this line have other implications that I have not seen?
When looking at f65r, it appears to me that regardless of language or encryption process, and whether this page is making a note about the plant or about something completely off topic, we have a single line of 3 vords that the creators apparently considered sufficient for that page, and we still end up with an initial vord that is longer, a second vord that is shorter, T at the front, and m at the end after a or i. D also starts a vord, which is interesting given that one of the most common vords in the text begins with DA. Unlike most pages, this one needs little: no more vords or lines, and no q, y, r, or s. Nor does it need P, F, or K. The folio also doesn’t use any daiin or aiin variant. None of the 3 vords is a hapax; however, the first vord -is- uncommon and found only 3 other times, so perhaps the chance that a vord is a hapax increases with glyph count. Something to explore later, perhaps.
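The idea that longer vords are more likely to be hapax can be checked directly. A minimal sketch, assuming `tokens` is a list of vords read from a transliteration file and that one EVA character counts as one glyph; for each glyph count it reports the share of distinct vords that occur exactly once.

```python
from collections import Counter, defaultdict

def hapax_rate_by_length(tokens):
    """Map glyph count -> share of distinct vords of that length seen exactly once."""
    freq = Counter(tokens)
    by_len = defaultdict(lambda: [0, 0])  # length -> [hapax types, all types]
    for vord, n in freq.items():
        stats = by_len[len(vord)]
        stats[1] += 1
        if n == 1:
            stats[0] += 1
    return {length: hapax / types for length, (hapax, types) in sorted(by_len.items())}
```

Counting per distinct vord (type) rather than per token is a design choice; a curve that rises with length would support the hunch, though it would not by itself explain f65r.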
There is a really good argument against the idea that the folio was meant to be longer, and that is how well the M’s are ending everything. It would be MORE unusual for a multi-line paragraph to begin with three words that end with M. In fact, that would go against the trends MUCH more than this three-word line, which for the most part follows line trends.
And finally, I have indeed asked myself: "What makes this plant so obvious that it only needs those 3 vords?"
Some crackpot theories:
Given the gap of lost pages, perhaps a few other herbal folios would have matched this style of page, if we had them.
Or maybe this is just that obvious a plant. Like "duh, this one is outside your window, you use it daily, not much really needs mentioning".
Or, even wilder: here is a plant that isn't a composite, unlike all those -other- plants, so it's pretty straightforward and doesn't need much plant info. OK, I know that's a reach.
I know this isn't Hand 1 or Hand 2 (I can't recall which Hand it is), but I know it's a rarer one for herbal pages, so maybe this is more a function of scribal style, by which I mean the choice to use only 3 vords. "My art speaks for itself," perhaps?
And this is where I leave this thread with these final thoughts and observations.
Also, I suppose I should make a quick intro, since this is my first real post and the first time I've had the nerve to speak up in my own thread. I am a 40-something elementary school teacher who caught the Voynich fever around February 2025. I knew nothing. So I leapt into it and, of course, tried to see how AI could be used. It could not. No matter, I continued forth. Slowly I gained enough knowledge to read EVA and IVTT keys, which further allowed me to understand the research better. I got better at knowing the layout of the manuscript and what the real problems and limitations of this script are. I slowly began to understand, at least a bit, what the heck entropy is. And swallow-tailed merlons. I read discussions and listened to the videos. I am terrible at math and I can’t code, but I did figure out what a transliteration file was and slowly read Rene’s site, though it was dense and hard for a newbie, and also somehow slowly got through Stolfi and Emma, and Tavi and Koen and Mark, and Peter, and Lisa. Torsten was harder, along with the heavy slot scientists. I read theories that were not correct to get a sense of what makes a bad theory. Those are useful. I still don’t understand how historical ciphers work, but I do know to stay away from anything relying on simple substitution. I wish I could read Latin or understand how topic modeling or Markov models work. And to be honest and humble, I don’t know what I could possibly bring to the table beyond these small observations. I just can’t quit, because I see those glyphs on the page, and they are so pleasing to the eye, and they feel like they -should- say something, and it drives me crazy that they don’t.
Anyway, I'm not sure how to end this.
I would love to hear more data, thoughts, or additional questions about this folio.
Thanks,
A.
| Detecting Vowels in the Voynich Text |
Posted by: quimqu - 01-09-2025, 01:16 PM - Forum: Analysis of the text
- Replies (1)
One of the big challenges when analyzing the Voynich manuscript is that we don’t even know which symbols are “vowels” and which are “consonants”. To explore this, I built an unsupervised pipeline that tries to infer vowel-like characters directly from the distribution of glyphs in the text, without assuming a known language.
To do this, I treat each glyph as a separate character. But in the MS, common pairs like “ch”, “sh”, “ai”, “in” might actually behave as single units. So, to detect these automatically, I compute PMI (Pointwise Mutual Information): if two symbols co-occur much more often than expected by chance, I fuse them into a digraph token. This avoids false positives like treating “c” as a vowel just because it almost always appears in “ch.”
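The PMI step described above can be sketched as follows. PMI here is log2 of the observed adjacent-pair probability divided by the product of the two single-glyph probabilities; the `threshold` and `min_count` values are placeholders rather than the settings used in this post, and the greedy tokenizer is just one simple way to apply the fused pairs.

```python
import math
from collections import Counter

def pmi_digraphs(words, threshold=1.0, min_count=5):
    """Adjacent glyph pairs whose PMI (in bits) exceeds `threshold`."""
    unigrams, bigrams = Counter(), Counter()
    for w in words:
        unigrams.update(w)              # single glyphs
        bigrams.update(zip(w, w[1:]))   # adjacent pairs inside a word
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    fused = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_ab = count / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi > threshold:
            fused[a + b] = pmi
    return fused

def tokenize(word, digraphs):
    """Greedy left-to-right split that prefers fused digraphs."""
    tokens, i = [], 0
    while i < len(word):
        if word[i:i + 2] in digraphs:
            tokens.append(word[i:i + 2])
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens
```

With "ch" fused, `tokenize("chol", {"ch"})` yields `["ch", "o", "l"]`, so a lone "c" never gets scored as a separate symbol.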
Then I train a simple HMM with two hidden states: one meant to represent “vowel-like” positions, the other “consonant-like.” The model is biased to prefer alternation (V↔C), because in most languages vowels and consonants interleave rather than forming long runs.
But the HMM alone isn’t enough. So I add metrics that capture linguistic tendencies:
- Coverage: vowels tend to appear in many different words.
- Neighbor entropy: vowels have diverse neighbors on both sides (they can be surrounded by many different consonants).
- Repetition rate: I applied a threshold based on when they appear doubled (consonants are more often repeated).
- Position: vowels often appear inside words, not only at the edges.
Each of these gets a weight, and I combine them with the HMM score into a calibrated probability of being a vowel.
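A sketch of how the four per-symbol metrics might be computed from tokenized words (lists of glyphs or fused digraphs) and then combined. The logistic combination is an assumption for illustration: the post does not say how its weights are set or calibrated, and an HMM-based score could be added as a fifth feature. Repetition would normally get a negative weight, since doubling is treated as consonant-like.

```python
import math
from collections import Counter, defaultdict

def entropy(counter):
    """Shannon entropy (bits) of a frequency table; 0.0 for an empty table."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values()) if total else 0.0

def symbol_features(words):
    """Coverage, neighbor entropy, repetition rate and interior-position rate per symbol."""
    coverage, occurrences, doubled, interior = Counter(), Counter(), Counter(), Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for w in words:
        coverage.update(set(w))                  # count each word once per symbol
        for i, s in enumerate(w):
            occurrences[s] += 1
            if 0 < i < len(w) - 1:
                interior[s] += 1                 # not at either word edge
            if i > 0:
                left[s][w[i - 1]] += 1
                doubled[s] += w[i - 1] == s      # same symbol repeated
            if i < len(w) - 1:
                right[s][w[i + 1]] += 1
    return {
        s: {
            "coverage": coverage[s] / len(words),
            "neighbor_entropy": (entropy(left[s]) + entropy(right[s])) / 2,
            "repetition": doubled[s] / occurrences[s],
            "interior": interior[s] / occurrences[s],
        }
        for s in occurrences
    }

def vowel_probability(features, weights, bias=0.0):
    """Logistic combination of weighted features into a 0..1 score."""
    z = bias + sum(weights[name] * features[name] for name in weights)
    return 1 / (1 + math.exp(-z))
```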
Languages typically have only a handful of vowels (say 4–8). So instead of accepting every symbol above 0.5, I impose a parsimonious prior:
- I target ~6 vowels in total.
- Candidates must pass a minimum probability threshold.
- I choose the set that minimizes structural loss, i.e., reduces long VV or CC runs and increases alternation.
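The parsimonious selection can be sketched as an exhaustive search over candidate sets near the target size. Here "structural loss" is assumed to be the share of adjacent symbol pairs that fall in the same class (VV or CC), plus a small hypothetical `size_penalty` for drifting from the target; the post's exact loss may differ. Exhaustive search is only practical for a short candidate list.

```python
from itertools import combinations

def structural_loss(words, vowels):
    """Share of adjacent symbol pairs that are VV or CC (i.e., do not alternate)."""
    same = total = 0
    for w in words:
        for a, b in zip(w, w[1:]):
            total += 1
            same += (a in vowels) == (b in vowels)
    return same / total if total else 0.0

def choose_vowels(words, probs, target=6, min_prob=0.5, size_penalty=0.05):
    """Pick the candidate set (sizes target-2..target+2) with the lowest loss."""
    candidates = [s for s, p in probs.items() if p >= min_prob]
    best, best_loss = set(), float("inf")
    for k in range(max(1, target - 2), min(len(candidates), target + 2) + 1):
        for combo in combinations(candidates, k):
            loss = structural_loss(words, set(combo)) + size_penalty * abs(k - target)
            if loss < best_loss:
                best, best_loss = set(combo), loss
    return best, best_loss
```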
I can configure the script to work on Latin-like languages, Indo-European languages, Semitic languages, syllabic languages, and so on, by setting the vowel patterns for each.
The result is a short list of the most “vowel-like” glyphs or digraphs.
I have tested:
Indo-European languages often have 5–8 vowels, noticeable V↔C alternation, and frequent diphthongs; this config favors ~6 vowels, rewards alternation and neighbor diversity, and lets strong data-driven digraphs emerge without hardwired seeds. This is the result:
Indo-European
| Voynichese | EVA | prob_vowel |
| o | o | 0.983284 |
| y | y | 0.970378 |
| a | a | 0.886179 |
| k | k | 0.865411 |
| t | t | 0.824919 |
| l | l | 0.709329 |
| ch | ch | 0.684790 |
| c | c | 0.660998 |
| f | f | 0.658652 |
| p | p | 0.640008 |
| s | s | 0.624870 |
| ai | ai | 0.602096 |
Semitic scripts often omit short vowels and allow heavy consonant clustering; this setup targets ~3 vowel-like symbols, relaxes alternation and repetition penalties, and keeps digraph selection fully data-driven. This is the result:
Semitic
| Voynichese | EVA | prob_vowel |
| o | o | 0.978236 |
| y | y | 0.904206 |
| k | k | 0.888190 |
| t | t | 0.809498 |
| d | d | 0.792952 |
| a | a | 0.763163 |
| l | l | 0.750061 |
| h | h | 0.661877 |
| s | s | 0.654497 |
| r | r | 0.632587 |
| p | p | 0.625348 |
| f | f | 0.619525 |
Syllabic scripts pack consonant+vowel into single signs, so we weaken alternation, fuse more bigrams to approximate CV units, use gentler repetition/edge penalties, and select a flexible 3–7 set of vowel-like nuclei. This is the result:
Syllabic
| Voynichese | EVA | prob_vowel |
| o | o | 0.975626 |
| y | y | 0.945105 |
| t | t | 0.796126 |
| a | a | 0.756872 |
| k | k | 0.748639 |
| r | r | 0.702806 |
| ch | ch | 0.685732 |
| l | l | 0.643413 |
| ol | ol | 0.634971 |
| s | s | 0.608752 |
| sh | sh | 0.604313 |
| d | d | 0.581746 |
Arabic is an abjad where short vowels are often omitted and consonant clusters are common, so this config targets ~3 vowel-like symbols, weakens alternation and repetition penalties, and keeps digraph discovery fully data-driven. This is the result:
Arabic
| Voynichese | EVA | prob_vowel |
| o | o | 0.960930 |
| y | y | 0.866163 |
| k | k | 0.852621 |
| t | t | 0.762965 |
| d | d | 0.744871 |
| s | s | 0.711418 |
| l | l | 0.705106 |
| a | a | 0.693979 |
| h | h | 0.689840 |
| e | e | 0.660995 |
| r | r | 0.618145 |
| ch | ch | 0.598030 |
In every configuration tested, the glyph o comes out as the clearest vowel, with y and a also very strong. What surprises me most is the appearance of k and t: the common gallows seem to behave as “vowel-like.” They might act as semi-vowels or special cases, but their patterns fit a “vowel-like” profile well. As you can see, some configurations put digraphs in the top “vowel-like” list, which shows that the configuration for each language type produces different outputs.
| A new hypothesis: The Voynich Manuscript as a Slavic-based macaronic mixed language |
Posted by: BadBigX - 30-08-2025, 09:40 PM - Forum: The Slop Bucket
- Replies (8)
Hello everyone,
I would like to share a new hypothesis developed with AI-assisted analysis (GPT-5). This is not presented as a final solution, but as a feasibility study suggesting that the Voynich Manuscript is neither random nor meaningless, but a systematic mixed idiom.
Core idea
The Herbal section shows consistent correlations between plant illustrations and recurring word families (morphemes). These morphemes appear to align with specific plant parts (leaves, roots, flowers). On mixed folios, the corresponding word families appear together, reflecting the illustrated combination.
Corpus examined
12 representative Herbal folios:
- Leaf-dominant: f1r, f8r, f26r, …
- Root-dominant: f2r, f6r, f16r, …
- Flower-dominant: f9v, f16v, …
- Mixed: f33v, …
Findings
- Leaves → morphemes like saral, araral, sharal
- Roots → morphemes like otal, otol, otaly
- Flowers → morphemes like okar, okaly, okal
- Mixed folios → combinations (otal + okaly)
These patterns repeat consistently across folios and correspond to the dominant botanical features in the illustrations.
Linguistic parallels
- Slavic roots (14th–15th century):
  - otal ↔ kořen / korzeń (root)
  - okar/okaly ↔ květ / kwiat (flower)
- Romance/Latin endings: -al, -ol, -aly resemble Latinized case endings.
- Possible Germanic influence: e.g. saral ↔ kraut.
Interpretation
The VM text may represent a Slavic-based macaronic language, with Slavic lexical stems, Romance flexional endings, and some Germanic influence. This aligns with the cultural context of 15th-century Bohemia/Northern Italy, where such multilingual blends were common.
Conclusion
This is a testable hypothesis: the Herbal section follows a coded recipe logic, with plant parts directly linked to recurring textual morphemes. It suggests the VM is not nonsense, but an encrypted or deliberately obscured transmission of botanical–medical knowledge.
I would be very interested in feedback, criticism, and further testing of this approach.
The next step would be a full statistical evaluation of all Herbal folios.
(Posted under pseudonym for privacy – I’m simply interested in constructive feedback.)
| Bad news: f116v was substantially mangled by Retracer |
Posted by: Jorge_Stolfi - 30-08-2025, 04:05 PM - Forum: Marginalia
- Replies (20)
I just noticed that f116v was extensively damaged by insects, and significant parts of the writing were "restored" by someone who may have grossly misinterpreted what remained of it.
The damage is clearly visible on the Beinecke 2014 scans at 2x magnification, as sharply delimited patches where the parchment has a very rough texture. Most of those patches connect to the wormholes, and many extend along creases of the parchment which would have created a space between f116v and the back cover, which the insects could crawl into. These must surely be areas where the surface of the parchment -- and any writing on it -- was completely scraped away.
- One patch extends diagonally across the text, along a crease of the parchment, through the 3rd cross on line 2 and the "f" of "gafmich" on line 4.
- Another patch extends over the whole garbled glyph on line 2, below the largest of the wormholes just below the top edge.
- Another patch includes the first letter of the "portad" on line 2.
There may be many more scraped-out areas but they are not as obvious as those above. For instance, patch 3 above may also include the "r" of the "portad", if not the whole word. Maybe they are more visible in the multi-spectral images with oblique illumination (sequence numbers 032-030 and 031-039).
It is also clear that an attempt was made to restore some of the lost text. The evidence includes the fact that the ink of the garbled glyph in patch 2 is solid brown, whereas glyphs in other worm-scraped areas have the expected appearance -- pitted, or even reduced to scattered dots. Ditto for the mangled "to" in "multos".
The restoration obviously happened after the worms did their damage, and therefore many years (centuries?) after the original text was written. In fact, as on page f1r, the insects may have been attracted by glue from the cover that offset onto the adjacent page, in which case the damage must have occurred after the book was bound.
In that case, it seems that the Restorer failed to restore some of the damaged glyphs, and made many wrong guesses about others. Which of course has a huge impact on the "decipherment" of those lines.
In fact, I suspect that the original writing was in Voynichese, and it was the Restorer who turned it into that Latin-maybe-sort-of script. Note that the word on line 2, just above the end of "maria" on line 3, starts with a bona-fide Voynichese bench Ch.
Perhaps line 2 was not "michton oladabas" but qotain CThey okad akad qoaiin kChd qoChCKh Cho ...
All the best, if possible... --jorge
| the comparison of sh and ch |
Posted by: Petrasti - 29-08-2025, 02:52 PM - Forum: Analysis of the text
- Replies (15)
I used … to compare the words.
In total, I compared 4357 words across all folios that contain “c+h” (at the beginning, in the middle, at the end, or standing alone) against the same words with “ch” instead. There are 3815 words (including repeated occurrences) that are identical except for c+h versus ch. That corresponds to 88%. In other words, around 88% (give or take a few percent) of the words that appear in the manuscript with c+h also appear with ch and are otherwise identical.
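The comparison described above can be reproduced in outline. This sketch treats "c+h" as a literal substring of the transliterated words, as written in the post, and counts every occurrence rather than only distinct words; `tokens` is a hypothetical word list loaded from a transliteration. The last line just checks the quoted ratio.

```python
def ch_match_rate(tokens):
    """Count c+h tokens whose 'ch' counterpart also occurs somewhere in the corpus."""
    vocab = set(tokens)
    with_plus = [t for t in tokens if "c+h" in t]
    matched = sum(t.replace("c+h", "ch") in vocab for t in with_plus)
    rate = matched / len(with_plus) if with_plus else 0.0
    return matched, len(with_plus), rate

print(f"{3815 / 4357:.1%}")  # 87.6%, i.e. the ~88% quoted above
```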
Is the c+h to ch change the same word or two different words? I think we are dealing here with a sound change.
The following further anomalies exist:
The words that do not have a c+h to ch counterpart are somewhat longer than we are used to in the manuscript. Often these are compound words whose parts also exist individually in the manuscript.
Here is a brief example:
c+heolkchy = c+heol kchy
otalc+hedy = otal c+hedy
okeolc+hey = okeol c+hey
dalteoc+hy = dal teo c+hy
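Checking whether such a long word splits into two words that also occur on their own is straightforward. A minimal sketch, where `vocab` is a hypothetical set of all words in the transliteration and `min_part` (an assumption) keeps single-glyph fragments from counting; a three-part case such as dalteoc+hy would need a recursive version.

```python
def two_word_splits(word, vocab, min_part=2):
    """All ways to cut `word` into two parts that both occur as standalone words."""
    return [
        (word[:i], word[i:])
        for i in range(min_part, len(word) - min_part + 1)
        if word[:i] in vocab and word[i:] in vocab
    ]

vocab = {"c+heol", "kchy", "otal", "c+hedy"}
print(two_word_splits("c+heolkchy", vocab))  # [('c+heol', 'kchy')]
```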
There are a few more peculiarities that require closer examination.
For example, the root word stem chor also exists as:
c+hor, cThor, cPhor, cFhor, ckhor, kchor, pchor, fchor, tchor, qotchor, qopchor, qokchor, qofchor, cheor, c+heor, dchor, ochor, ychor
Attachment: the comparison of sh and ch.pdf (1,014.85 KB)
| Finding patterns in Voynich words via a Hidden Markov Model |
Posted by: quimqu - 27-08-2025, 11:09 PM - Forum: Analysis of the text
- Replies (7)
Dear all,
These holidays I’ve been exploring how to analyze the Voynich manuscript’s word structure with models from data science (the text-analysis branch). While looking into Markov models, I came across Hidden Markov Models (HMMs) and built some code that runs on the EVA transliteration. The goal is not decipherment, but to quantify recurring patterns in how word pieces combine and to score how “typical” each word looks under those rules.
A Hidden Markov Model is a simple probabilistic model with hidden states (unseen “roles”), transition probabilities between states, and emission probabilities for the observable symbols. From data, an HMM learns those transitions and emissions; for a new sequence it can decode the most likely path of roles and compute a log-likelihood that tells how well the model explains the sequence. Voynich "words" show strong positional regularities (common openings and endings), and an HMM gives a compact way to (i) discover recurring roles behind word pieces, (ii) quantify which pieces go where, and (iii) measure how typical a word is under the learned rules.
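The two operations mentioned here, decoding the most likely path (Viterbi) and computing a log-likelihood (forward algorithm), can be illustrated on a toy three-state model. The states, tokens, and probabilities below are made up for illustration; the post's model learns its own states (S1, S2, ...) from data, which this sketch does not do. Probabilities are floored to avoid log(0).

```python
import math

def lg(p):
    """Log probability with a small floor so unseen tokens do not give log(0)."""
    return math.log(max(p, 1e-12))

def viterbi(obs, states, start, trans, emit):
    """Most likely hidden-state path for `obs`, with its log-probability."""
    score = {s: lg(start[s]) + lg(emit[s].get(obs[0], 0)) for s in states}
    back = []
    for o in obs[1:]:
        prev = {s: max(states, key=lambda r: score[r] + lg(trans[r][s])) for s in states}
        score = {s: score[prev[s]] + lg(trans[prev[s]][s]) + lg(emit[s].get(o, 0)) for s in states}
        back.append(prev)
    last = max(states, key=score.get)
    path = [last]
    for prev in reversed(back):          # follow back-pointers to the start
        path.append(prev[path[-1]])
    return path[::-1], score[last]

def log_likelihood(obs, states, start, trans, emit):
    """Forward algorithm in log space: log P(obs), summed over all state paths."""
    def logsumexp(xs):
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))
    alpha = {s: lg(start[s]) + lg(emit[s].get(obs[0], 0)) for s in states}
    for o in obs[1:]:
        alpha = {
            s: logsumexp([alpha[r] + lg(trans[r][s]) for r in states]) + lg(emit[s].get(o, 0))
            for s in states
        }
    return logsumexp(list(alpha.values()))

# Hypothetical toy model with morph tokens in the post's style (pre:, st:, suf:).
states = ["START", "MID", "END"]
start = {"START": 0.8, "MID": 0.1, "END": 0.1}
trans = {
    "START": {"START": 0.1, "MID": 0.8, "END": 0.1},
    "MID": {"START": 0.1, "MID": 0.2, "END": 0.7},
    "END": {"START": 0.6, "MID": 0.3, "END": 0.1},
}
emit = {
    "START": {"pre:qo": 0.9, "st:ke": 0.05, "suf:dy": 0.05},
    "MID": {"pre:qo": 0.05, "st:ke": 0.9, "suf:dy": 0.05},
    "END": {"pre:qo": 0.05, "st:ke": 0.05, "suf:dy": 0.9},
}
word = ["pre:qo", "st:ke", "suf:dy"]
print(viterbi(word, states, start, trans, emit)[0])  # ['START', 'MID', 'END']
```

The forward log-likelihood is always at least the Viterbi score, because it sums over every path instead of keeping only the best one.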
What I did:
- Tokenized and cleaned paragraphs.
- Discovered affix candidates (short frequent strings with high branching entropy).
- Segmented each word into prefix | stem | suffix with a simple scorer that prefers productive affixes and reasonable stem lengths.
- Built sequences of morphological tokens (such as pre:qo, st:dai, suf:n).
- Trained an HMM on those sequences (learning states, transitions, and emissions).
- Decoded each word with Viterbi to get its state path and log-likelihood.
- Exported a state-transition graph (below) and an interactive page for exploring words (…; I recommend checking it out!)
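The affix-discovery step can be sketched with branching entropy: a string is a good prefix candidate if it is frequent and the glyph that follows it is hard to predict. Suffixes use the same test on reversed words. The thresholds are placeholders rather than the values used in the post, and the later prefix | stem | suffix scorer is not reproduced here.

```python
import math
from collections import Counter, defaultdict

def prefix_candidates(words, max_len=3, min_count=20, min_entropy=1.5):
    """Frequent word-initial strings followed by many different glyphs."""
    followers = defaultdict(Counter)
    for w in words:
        for n in range(1, min(max_len, len(w) - 1) + 1):
            followers[w[:n]][w[n]] += 1   # glyph that follows this prefix
    found = {}
    for prefix, nxt in followers.items():
        total = sum(nxt.values())
        if total < min_count:
            continue
        h = -sum(c / total * math.log2(c / total) for c in nxt.values())
        if h >= min_entropy:
            found[prefix] = round(h, 3)
    return found

def suffix_candidates(words, **kwargs):
    """Same test read right to left: reverse the words, then reverse the results."""
    reversed_hits = prefix_candidates([w[::-1] for w in words], **kwargs)
    return {s[::-1]: h for s, h in reversed_hits.items()}
```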
How to read the state graph:
- Boxes are states (latent “roles”). Each label lists the state’s top pieces: P = top prefix fragments, T = top stem fragments, F = top final/suffix fragments.
- Edges are transitions, and their width is proportional to probability. Styles mean different things:
  - solid black = within a word (typical flows such as START → MID and MID → END)
  - dashed gray = from the end of one word to the beginning of the next (e.g., END → START or END → MID; also MID → START when a word has no explicit suffix)
  - dotted light = unusual directions
- Some states behave like START, MID, or END by how they are used after training.
- Read the solid paths left to right to see typical inside-word sequences of roles. Dashed paths show what tends to follow in the next word. For example, a dashed S5 (END) → S4 (MID) means words that end in S5 are often followed by a word that begins in an S4-like role; it does not mean “go back to MID within the same word.”
What the interactive page shows when you click a word:
- Basics: the (prefix, stem, suffix) segmentation; the morph tokens used (e.g., pre:qo, st:dai, suf:n); the token IDs and whether any mapped to UNK; the word’s log-likelihood under the HMM.
- Decoded path: the state sequence (e.g., S3 → S2 → S5) and mapped role names; for each morph token, its emission probability in that state and its rank among that state’s emissions (how typical it is).
- State context: for each state visited by the word, the state’s top prefix/stem/suffix pieces, plus the pieces observed in the current word with their probabilities.
- Across the page, each word is colored by log-likelihood bins from red to green (least to most typical), using the 1–99% range to avoid outliers.
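The red-to-green coloring described above amounts to clipping log-likelihoods to their 1st–99th percentile range and cutting that range into equal bins. A minimal sketch; the number of bins and the linear-interpolation percentile are assumptions, since the post does not specify them.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def likelihood_bins(scores, n_bins=5):
    """Bin 0 = least typical (red) .. n_bins-1 = most typical (green), after clipping."""
    if not scores:
        return []
    s = sorted(scores)
    lo, hi = percentile(s, 1), percentile(s, 99)
    width = (hi - lo) / n_bins or 1.0     # avoid division by zero for constant scores
    bins = []
    for x in scores:
        clipped = min(max(x, lo), hi)     # outliers are pulled to the 1%/99% edges
        bins.append(min(int((clipped - lo) / width), n_bins - 1))
    return bins
```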
How this could help with the manuscript:
- A compact, testable “grammar of pieces” for Voynich words.
- A way to compare sections or folios: train on one part, score another.
- A tool to spot anomalies or outliers (very low-likelihood words) and dominant regularities (roles and paths).
- Exported emissions and transitions for further statistics and plots.
| I solved the Voynich manuscript by pictures! |
Posted by: Worldmaster777 - 27-08-2025, 08:51 PM - Forum: Theories & Solutions
- Replies (1)
Hello!
I’m Vladimir Aristippus Robespierre from Moscow, Russia, and I solved the Voynich manuscript through its pictures.
I didn’t spend years of my life solving the manuscript; it happened by accident. It so happened that I already had all the necessary knowledge in my head, thanks to my passion for personality typologies. I accidentally saw the Voynich manuscript and decided to flip through it out of curiosity, not expecting anything special, but immediately understood the meaning of some pictures and became interested. Then, as I studied the pictures further and reflected on them, I understood the meaning of many more pictures from the Voynich manuscript.
So, what is this entire manuscript about? The manuscript contains information about the structure of the universe and humans, and how humans interact with the universe.
The 6-page scheme is the most important scheme in the manuscript, which sets the themes for the other sections of the manuscript. This scheme shows how a person perceives the world around them through their senses. The central circle in the scheme represents the brain, the 6 towers represent the 6 types of sensations, and the cloud above the towers represents the soul (mind).
The biological section describes in more detail the work of the body’s organs, in particular the sensory organs and reproductive organs. On the pages describing the work of the sensory organs, bathing women represent nerve impulses, and if a woman wears headgear, the nerve impulse carries information.
The reproductive theme is presented in the manuscript in the context that when a person dies, their soul goes to eternity, and then comes back to this world through a fertilized egg. There are no recipes in the manuscript against unwanted pregnancy or anything like that.
Also, how people perceive the world around them depends on their personality types. Personality types are innate and do not change throughout life. The manuscript describes the process of the circulation of souls in the universe and their distribution by personality types. The stars in the manuscript represent souls.
Plants are not the subject of the Voynich manuscript at all. These are schemes on completely different topics, stylized as images of plants. The pages of the botanical section, in particular, contain schemes about the structure of the universe, the senses and on the reproductive topic. This is why botanists cannot identify the plants in the Voynich manuscript. It makes sense that if you are encrypting the text, you should also encrypt the schemes.
You can read about all of this in my report in English: …
I also have a video in Russian where I explain everything in more detail: …