The Voynich Ninja

(25-06-2025, 06:49 PM)oshfdk Wrote:
(25-06-2025, 06:14 PM)Mauro Wrote:
(25-06-2025, 04:33 PM)oshfdk Wrote: If I encode a character using the combination of the word length and the second character in the word (say, qOkedy is O-6 or S, cHedy is H-5 or A, oTey is T-4 or Y, pOchey is O-6 or S again), this is non-deterministic, but perfectly reversible.

Hmm sorry but I don't understand. It seems to me your encoding method is deterministic (no random numbers are involved) but non-reversible (is O-6 'qokedy' or 'qokain'?)

It's non-deterministic because you can freely choose any word as long as the second letter is the one you need and the whole word is of the required length. It's reversible because the plaintext letter is uniquely determined by the word length and the second letter of the word.
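A minimal Python sketch of this scheme may help to make the distinction concrete. The code table and the word pool below are assumptions for illustration only (just the three pairs named in the post are filled in), and the names `CODE_TABLE`, `WORD_POOL`, `encode` and `decode` are hypothetical.

```python
import random

# Assumed code table: (second letter, word length) -> plaintext letter.
# Only the three pairs mentioned in the post are filled in.
CODE_TABLE = {("o", 6): "S", ("h", 5): "A", ("t", 4): "Y"}

# Assumed pool of cipher words the writer may pick from.
WORD_POOL = ["qokedy", "pochey", "qokain", "chedy", "shedy", "otey", "otal"]


def key_of(word):
    """Return the (second letter, length) pair that carries the information."""
    return (word[1], len(word))


def encode(plaintext, rng=random):
    """Pick, for every plaintext letter, any pool word with a matching key."""
    candidates = {}
    for word in WORD_POOL:
        letter = CODE_TABLE.get(key_of(word))
        if letter is not None:
            candidates.setdefault(letter, []).append(word)
    # The free choice among candidates is what makes encoding non-deterministic.
    return [rng.choice(candidates[letter]) for letter in plaintext]


def decode(words):
    """Decoding ignores everything except the key, so it is unambiguous."""
    return "".join(CODE_TABLE[key_of(word)] for word in words)
```

Under these assumptions both 'qokedy' and 'qokain' decode to S, so the same plaintext can produce many different ciphertexts while `decode(encode(text))` always returns `text`.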

Ah thank you, I had completely misunderstood, I thought you were encoding a word, not a character.

(24-06-2025, 11:07 AM)Jorge_Stolfi Wrote:
Binomial word-length distribution: Since you cited my 2002 webpage, you should know that East Asian monosyllabic languages do have a binomial word-length distribution.

A single network of similar word forms: Monosyllabic languages have such "networks of similar forms" because of the fixed structure of the syllables

Absence of semantic categories (e.g., nouns vs. verbs): Like most East Asian languages.

No identifiable function words: Like most East Asian languages.

For an answer see Timm & Schinner 2019: "Typical for this language family are monosyllabic words and tones to differentiate between various meanings. Omitting the diacritical markers (that redefine the pronunciation), a text written in Vietnamese script will indeed contain many (similar) short words, also resulting in a single network. But even then this network is substantially different from that one representing the VMS: multiple word clusters in contrast to the homogeneous VMS-network.
Additionally, in Vietnamese (like all languages) there will be frequent words distributed equally over the entire text, the so-called function words (like conjunctions, articles etc.). They do not appear contextual, but rather serve to implement grammatical structures, and they normally do not have co-occurring similar words of comparable frequency. In the VMS frequently used tokens differ from page to page. With the exception of repetitive prayers or poems, words in natural languages are chosen because of their meaning, and not of their similarity with previously written words." [Timm & Schinner 2019, p. 4f]

(24-06-2025, 11:07 AM)Jorge_Stolfi Wrote:
Lack of clear word order or repeated phrases: Again, this is a property of the text, not of the language. But East Asian languages do give the impression of having no fixed word order.

Language, by definition, is a system for conveying information, and word order is one of the primary mechanisms for signaling relationships between words — such as who is doing what to whom.

Even languages with flexible word order still rely on:

Preferred or default orders (e.g., Subject-Verb-Object in English, Subject-Object-Verb in Japanese).
Morphological markers (like case endings) to compensate for word order flexibility.
Discourse patterns, where word order reflects emphasis or focus rather than grammar alone.

This also holds true for East Asian languages. See for instance these articles about [link not shown] or [link not shown].

(24-06-2025, 11:07 AM)Jorge_Stolfi Wrote:
Deep correlation between frequency, similarity, and spatial proximity: What do you mean, precisely? Those terms can describe normal features of texts in natural languages.

For an answer see Timm & Schinner 2019: "The global VMS graph, as well as the corresponding sub-networks for individual folios, give more evidence for a fundamental connection between token frequency, number of similar tokens, and position within the text. ... The respective frequency counts confirm the general principle: high-frequency tokens also tend to have high numbers of similar words. This is illustrated in greater detail in Figure 3: "isolated" words (i.e. unconnected nodes in the graph) usually appear just once in the entire VMS, while the most frequent token <daiin> (836 occurrences) has 36 counterparts with edit distance 1." [Timm & Schinner 2019, p. 6]

(24-06-2025, 11:07 AM)Jorge_Stolfi Wrote:
Random walk-like statistical behavior and long-range correlations: What do you mean, precisely? Texts in natural languages generally have "random-walk like behavior" and "long-range correlations".

See for instance [link not shown] or [link not shown]. Schinner demonstrated that the probability of the occurrence of a similar word decreases with distance:
"Interpreting normal texts as bit sequences yields deviations of little significance from a true (uncorrelated) random walk. For the VMS, this only holds on a small scale of approximately the average line length; beyond positive correlation build up: the presence/absence of a symbol appears to increase/decrease the tendency towards another occurrence." [Schinner 2007, p. 105].

(24-06-2025, 11:07 AM)Jorge_Stolfi Wrote:
Systematic shifts from Currier A to Currier B: Those "language shifts" are perfectly compatible with each section having been copied or summarized from a different book on a different topic -- even if the books were all in the same natural language.

For an answer see Timm & Schinner 2019: "It is possible to distinguish Currier A and B based on frequency counts of tokens containing the sequence <ed>. The summary in Table 2 shows, e.g., that if <chedy> is used more frequently, this also increases the frequency of similar words, like <shedy> or <qokeedy> -- in conformity with our previous analysis concerning frequency versus similarity. At the same time, also words using the prefix <qok-> are becoming more and more frequent, whereas words typical for Currier A like <chol> and <chor> vanish gradually. Now, reordering the sections with respect to the frequency of token <chedy> replaces the seemingly irregular mixture of two separate languages by the gradual evolution of a single system from 'state A' to 'state B'."
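The reordering step described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the section labels X, Y, Z and their token lists are made-up placeholders, not counts from the manuscript, and `relative_frequency` and `order_sections` are hypothetical helper names.

```python
from collections import Counter


def relative_frequency(tokens, target):
    """Share of `target` among all tokens of one section (0.0 if empty)."""
    return Counter(tokens)[target] / len(tokens) if tokens else 0.0


def order_sections(sections, target="chedy"):
    """Sort section names by increasing relative frequency of `target`."""
    return sorted(sections, key=lambda name: relative_frequency(sections[name], target))


# Placeholder data for illustration only, NOT real manuscript counts.
sections = {
    "X": "chol chor daiin chol chedy".split(),
    "Y": "chedy shedy qokeedy chedy qokedy".split(),
    "Z": "chol chor daiin chor dain".split(),
}

ordering = order_sections(sections)
```

With real transliteration data in place of the placeholders, the resulting order is what the quoted passage reads as a gradual drift from 'state A' to 'state B'.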

[attachment=10908]

(24-06-2025, 11:07 AM)Jorge_Stolfi Wrote:
Lines function as structural units: There are simple explanations for why the Scribe could have created such patterns. Like stretching, squeezing, or abbreviating the text to get line breaks falling on sentence boundaries when possible.

No corrections or deleted sequences: Surely it was not written directly on vellum, even if it was a hoax. The Author wrote a final draft on paper and recruited a Scribe to copy it onto vellum. But in fact there are many instances of corrections and uncorrected errors made by the Scribe.

Line-endings fit precisely into available space: Again, a normal product of a minimally competent Scribe.

I disagree. The Voynich text clearly responds to its physical container — the manuscript page. This suggests the text was generated during the act of writing, not copied mechanically from a pre-existing source. The scribe would need to actively choose, at minimum, the words at the end of each line to ensure that the text consistently fits neatly within the available space. That level of precise layout control isn't possible if the text was fully predetermined elsewhere.

(24-06-2025, 11:07 AM)Jorge_Stolfi Wrote:
Context-dependent self-similarity: Again, it is quite possible that a meaningful text in any language would have this feature.

See Timm & Schinner 2019: "all pages containing at least some lines of text do have in common that pairs of frequently used words with high mutual similarity appear. The exact cooccurrences may vary: there are pages where <daiin> is paired with <dain>, but also pages where it is frequently used together with <aiin> (f41v, f46r, f55v, f89v2, v105v and f114r) or <saiin> (f2r, f16r, and f90r2)." [Timm & Schinner 2019, p. 3]. This is a consistent feature observed across all pages of the Voynich Manuscript.
See also Timm 2016: "In natural languages a word normally (cf. poems) is used because of its meaning and not because it is similar to a previously written one. The result that the words are arranged such that they co-occur with similar ones is therefore not compatible with a linguistic system. An English text with similar features would mainly consist of words similar to the words 'the', 'and' and 'to'. Additionally, a word "the" would co-occur with words like 'khe', 'phe', 'fhe', 'tha', 'tho', 'thy', 'thee' and 'theee'." [Timm 2016]

(23-06-2025, 06:47 PM)dashstofsk Wrote: This takes time, work and effort. It wasn't 'garbage'.

Just watched it again. It's better than I thought the first time. They did credit their sources; there is a Google Docs [link not shown] in the description.

(25-06-2025, 04:42 PM)bi3mw Wrote: The binomial word length distribution of a text generated with Torsten's SelfCitationTextgenerator (34,980 words, 198,497 characters, comparable to the VMS). The result is better than expected; at least the basic structure is recognizable.
[link not shown]

Sorry for the confusion. The term "binomial distribution" indeed means the distribution
Prob(k) = choose(n,k) p^k (1-p)^(n-k)
which is the probability of getting k 'successes' in n 'tries', if the probability of each try succeeding is p. The plot indeed looks like the red dotted line above. Note the long tail and the asymmetry: if the range of values is n, the peak is at about k = p n.

But what I meant in that webpage and that post is that the distribution of lengths of lexemes ("word types"?) looks like the binomial coefficient function choose(n,k). This is, up to the constant factor 2^(-n), the special case of the binomial distribution when p = 0.5. A key feature of this function is its symmetry about the middle of the range, n/2. The VMS and East Asian monosyllabic languages have this property. "European" languages do not.
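The difference between the two shapes can be checked numerically. The short Python sketch below uses only the formula given above; the value n = 10 and the comparison value p = 0.3 are arbitrary choices for illustration, not numbers taken from the webpage or from the manuscript.

```python
from math import comb


def binomial_pmf(n, p):
    """Prob(k) = choose(n,k) p^k (1-p)^(n-k) for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


n = 10  # arbitrary illustrative range, not a measured value

symmetric = binomial_pmf(n, 0.5)  # proportional to choose(n,k)
skewed = binomial_pmf(n, 0.3)     # general case with p != 0.5

# Position of the peak in each case.
peak_symmetric = max(range(n + 1), key=lambda k: symmetric[k])
peak_skewed = max(range(n + 1), key=lambda k: skewed[k])
```

For p = 0.5 the values are choose(n,k) / 2^n and mirror each other around n/2; for p = 0.3 the peak moves to about p n and the right-hand tail becomes longer.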

(25-06-2025, 11:21 PM)Torsten Wrote: Language, by definition, is a system for conveying information, and word order is one of the primary mechanisms for signaling relationships between words, such as who is doing what to whom.

It seems that you are conflating two senses of "word order" here.

In a telephone number, the order of the digits is terribly important: swap two digits and the number will not work. However, in the "language" of telephone numbers, there is no "digit order" -- in the sense that any digit may occur at any position in the number.

(On the other hand, for telephones from a specific geographical context, some digits will be more common than others in certain positions of the numbers. And these anomalies will be different in different contexts...)

Quote:[natural languages have] Preferred or default orders (e.g., Subject-Verb-Object in English, Subject-Object-Verb in Japanese). [...] This also holds true for East Asian languages. See for instance these articles about [link not shown]

That does not mean that "in every language, certain words can only appear at the beginning/middle/end of sentences". I don't speak Chinese (or any of my "Chinese" languages, unfortunately), but Google Translate gives me these:

"The selling room is here." ▶ 售卖室在这里 ≡ Shòu mài shì zài zhè lǐ.
"He sells good stuff." ▶ 他卖的东西不错 ≡ Tā mài de dōng xī bù cuò.
"This car is sold." ▶ 这辆车卖了 ≡ Zhè liàng chē mài le.
"I have a book to sell." ▶ 我有一本书要卖 ≡ Wǒ yǒu yī běn shū yào mài.
"The seller wants to come." ▶ 卖家想来 ≡ Mài jiā xiǎng lái.

Apologies to Chinese speakers if the translations are broken or stilted, but I am sure that the point is correct: the same Mandarin syllable can appear, alone or as a compound, in many different grammatical roles - and hence in any part of a sentence.

The division of the lexicon into four largely disjoint sets -- nouns, verbs, adjectives, and adverbs -- with well-distinguished grammatical roles is a feature of Indo-European languages, not a "linguistic universal". (I seem to recall Jacques Guy saying that linguists are still desperately trying to find one of these.) Even English, a creole language that lost most of the characteristic IE features, will often use the same word as any of those categories: "this is a stone", "this is a stone chisel", "stone him", "it is stone hard".

Quote:[or natural languages will have] Morphological markers (like case endings) to compensate for word order flexibility.

But a language can have neither word-place segregation nor inflections. See Mandarin.

Quote:[natural languages also have] Discourse patterns, where word order reflects emphasis or focus rather than grammar alone.

Yes, but this goes against your point: the same word may appear in different places of the sentence, depending on the desired emphasis.

Quote:The respective frequency counts confirm the general principle: [in the VMS] high-frequency tokens also tend to have high numbers of similar words.

This is true in any natural language, no? Even in English? "cat" "bat" "fat" "hat" "mat" "pat" "rat" "sat" "vat" "kit" "cot" "cut" "cab" "cam" "can" "cap" "car" ... but not so much for "however" or "equinox" ...

Quote:I disagree [about the Scribe clean-copying from a draft]. The Voynich text clearly responds to its physical container — the manuscript page. This suggests the text was generated during the act of writing, not copied mechanically from a pre-existing source. The scribe would need to actively choose, at minimum, the words at the end of each line to ensure that the text consistently fits neatly within the available space. That level of precise layout control isn't possible if the text was fully predetermined elsewhere.

When copying running text, Medieval European Scribes (like today's word processors) routinely disregarded line breaks in the draft and inserted line breaks, abbreviations, capitals, flourishes on their own. Fitting the lines neatly between margins was part of their basic skill set, just as preparing ink and shaping the pen.

Quote:[in the VMS] pairs of frequently used words with high mutual similarity appear. The exact cooccurrences may vary: there are pages where <daiin> is paired with <dain>, but also pages where it is frequently used together with <aiin> (f41v, f46r, f55v, f89v2, v105v and f114r) or <saiin> (f2r, f16r, and f90r2)."

Such occurrences are expected if the words are single syllables. Even in the five random sentences above there is a "mài shì zài".

Also, a feature of Mandarin is that when two syllables of certain tones are pronounced in succession, the tone of the second one may be changed to a "simpler" tone (say from "dipping" to "flat") or to a tone similar to that of the first, just for euphonics (like vowel harmony in other languages). Thus perhaps daiin and aiin and saiin are actually the same word, with different tones because of the different surroundings.

Also, perhaps the system that the Author used to encode tones was to write a letter code for the "current" pitch level and insert such codes when the pitch changed. E.g. he might have written "Wǒ yǒu yī běn shū yào mài" as "3w1o 53 yo1u3 yi b1en sh3u ya2o 5 ma2i" or in any of dozens of equivalent ways. Then, again, the same word would be written differently depending on the preceding context. And, by the way, that could also explain why isolated y, or words beginning with y, are more common at the start of new sentences...

Also, if the VMS was dictated and the Author did not recognize all those plant names and medical terms, he surely must have made lots of mistakes, inconsistently -- writing the same word in different ways each time. The diaries of the Lewis and Clark expedition were not dictated, but they were similar to that scenario in that the writers were not exactly spelling-bee champions. I counted five different spellings of the word "buffalo", some even in the same paragraph.

Also, perhaps the spelling system that the Author devised at first was too detailed, and recorded features of the spoken words that were just noise without significance, like the length or loudness of the vowels. Then that "phonetic noise" would be different each time the Reader spoke the same word. By the way, that could also explain the difference between "languages": between one section and the other, the Author's new literate friend explained to him that Mandarin had only four tones -- not seven, as his grocer had categorically affirmed before he started the project.

There is more, but I must end here for now...

All the best, --jorge

(26-06-2025, 01:07 PM)Jorge_Stolfi Wrote: That does not mean that "in every language, certain words can only appear at the beginning/middle/end of sentences"

In linguistics, word order refers to the structured arrangement of words in a sentence that signals grammatical relationships. It does not imply that specific words are rigidly restricted to the beginning, middle, or end of sentences.

Chinese provides a great example of how word order functions in a language that has minimal inflection (i.e., very few endings or case markers) and relies heavily on strict word order to convey meaning.

Just like English, Mandarin Chinese typically follows a Subject-Verb-Object structure.
Example:
我吃苹果。
Wǒ chī píngguǒ.
I eat apples.

我 (wǒ) — Subject (I)
吃 (chī) — Verb (eat)
苹果 (píngguǒ) — Object (apples)

Changing the word order can either make the sentence ungrammatical or completely change its meaning.
苹果吃我。
Píngguǒ chī wǒ.
The apples eat me.
Same words, completely different meaning due to different word order.

You can add time expressions or location phrases, but they typically appear in specific positions — and the core SVO structure stays the same.
Example with location expression:
我在学校吃苹果。
Wǒ zài xuéxiào chī píngguǒ.
I eat apples at school.

在学校 (zài xuéxiào) — Location phrase (at school) placed after the subject, before the verb.

With this in mind, D’Imperio argued in 1978: "The short words, the many sequential repetitions, the rarity of one- or two-letter words, the rarity of doublets, all militate against simple substitution. So also the strange lack of parallel context surrounding different occurrences of the same word." [D’Imperio 1978].

(26-06-2025, 01:07 PM)Jorge_Stolfi Wrote: The division of the lexicon into four largely disjoint sets -- nouns, verbs, adjectives, and adverbs -- with well-distinguished grammatical roles is a feature of Indo-European languages, not a "linguistic universal". (I seem to recall Jacques Guy saying that linguists are still desperately trying to find one of these.) Even English, a creole language that lost most of the characteristic IE features, will often use the same word as any of those categories: "this is a stone", "this is a stone chisel", "stone him", "it is stone hard".

No. All known natural languages use, for instance, some form of function words or grammatical markers, though the way they appear can vary dramatically.

Function words are words that don't carry lexical meaning themselves, but instead serve a grammatical role — structuring the sentence and clarifying relationships between words.

Examples in English:

Articles: the, a, an
Pronouns: he, she, it, they
Prepositions: in, on, at, with
Conjunctions: and, but, or
Auxiliary verbs: is, have, do

How do other languages handle this?
Analytic Languages (like Mandarin Chinese):

Use many function words because they lack inflection.
Example:
我在学校。 (Wǒ zài xuéxiào.) — I am at school.
Here, 在 (zài) functions as a preposition.

Synthetic Languages (like Latin or Finnish):

Use fewer function words because they encode grammatical roles in word endings (inflection).
But even here, function words still appear, e.g., conjunctions, particles.

Polysynthetic Languages (like Inuktitut):

Often bundle many grammatical markers into a single word.
Function-like roles are expressed through morphology, but the functions themselves still exist, even if not as separate words.

There is no known natural language without function words or functional equivalents. They may appear as standalone words, prefixes, suffixes, or particles — but the grammatical roles they serve are universal and necessary for structured, meaningful communication. If the Voynich text represents plain natural language it should be easy to identify function words or some common markers used for indicating the relationships between words.

(26-06-2025, 01:07 PM)Jorge_Stolfi Wrote:
Quote:The respective frequency counts confirm the general principle: [in the VMS] high-frequency tokens also tend to have high numbers of similar words.

This is true in any natural language, no? Even in English? "cat" "bat" "fat" "hat" "mat" "pat" "rat" "sat" "vat" "kit" "cot" "cut" "cab" "cam" "can" "cap" "car" ... but not so much for "however" or "equinox" ...

Quote:[in the VMS] pairs of frequently used words with high mutual similarity appear. The exact cooccurrences may vary: there are pages where <daiin> is paired with <dain>, but also pages where it is frequently used together with <aiin> (f41v, f46r, f55v, f89v2, v105v and f114r) or <saiin> (f2r, f16r, and f90r2)."

Such occurrences are expected if the words are single syllables. Even in the five random sentences above there is a "mài shì zài".

My point was that high-frequency tokens also tend to have high numbers of similar words whereas "isolated" words (i.e. unconnected nodes in the graph) usually appear just once in the entire VMS.

See "for example, folio f108r: the most frequent tokens on that page are <qokeedy>, <qokedy>, and <okedy>, each one appearing sixteen times.
A useful method to analyze the similarity relations between words of a VMS (sub-)section is their representation as nodes in a graph. Starting with the most frequent token one can recursively search for other words differing by just a single glyph, and connect these new nodes with an edge. The resulting network, built around the three most frequent tokens of folio [link not shown] (restricted to their 33 most similar tokens), gives a first impression of an existing deep correlation between frequency, similarity, and spatial vicinity of tokens within the VMS text (cf. Figure 1). Note that besides the aforementioned top-frequency tokens also words like <otedy>, <qokeey>, <okeey>, <qokey>, <qotedy>, and also <okeedy> enter the [link not shown] network.
How does this situation change when we look at the entire VMS? Figure 2 shows the resulting network, connecting 6796 out of 8026 words (=84.67 %). Again, an edge indicates that two words differ by just one glyph. The longest path within this network has a length of 21 steps, substantiating its surprisingly high connectivity. ..." [Timm & Schinner 2019, p. 4]
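The graph construction described in the quote can be sketched in Python as follows. This is not the authors' code: it assumes that one transliteration character equals one glyph, it links every pair of words at edit distance 1 instead of searching recursively from the most frequent token, and the function names are hypothetical. The word list is just the nine tokens named in the quote.

```python
from collections import deque


def is_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make sure a is the shorter (or equally long) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # single substitution at position i
    return a[i:] == b[i + 1:]          # single extra character in b


def build_network(words):
    """Map every word to the set of words at edit distance 1."""
    neighbours = {w: set() for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if is_one_edit(a, b):
                neighbours[a].add(b)
                neighbours[b].add(a)
    return neighbours


def components(neighbours):
    """Return the connected components of the network as sets of words."""
    seen, result = set(), []
    for start in neighbours:
        if start in seen:
            continue
        group, queue = {start}, deque([start])
        while queue:
            for nxt in neighbours[queue.popleft()] - group:
                group.add(nxt)
                queue.append(nxt)
        seen |= group
        result.append(group)
    return result


# The tokens named in the quoted passage.
words = ["qokeedy", "qokedy", "okedy", "otedy", "qokeey",
         "okeey", "qokey", "qotedy", "okeedy"]
network = build_network(words)
```

On this small list all nine tokens end up in a single component. Applied to a full word list, the size of the largest component and the number of unconnected nodes are the quantities the quoted passage refers to.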

[attachment=10909]

This isn't a random or isolated phenomenon; it's a consistent feature that holds true across all pages and for essentially all word types within the manuscript (see [link not shown]).

(26-06-2025, 01:07 PM)Jorge_Stolfi Wrote: When copying running text, Medieval European Scribes (like today's word processors) routinely disregarded line breaks in the draft and inserted line breaks, abbreviations, capitals, flourishes on their own. Fitting the lines neatly between margins was part of their basic skill set, just as preparing ink and shaping the pen.

Sorry, but I’m not sure how your statement explains the specific situation in the Voynich Manuscript, where the line itself behaves as a functional unit, with both line-start and line-end patterns being consistently observable. If medieval scribes routinely disregarded original line breaks or inserted their own purely for layout purposes, how do you account for these systematic patterns in the VMS? It’s not just neat margins — the structure of the text responds to the line boundaries in a way that suggests the line is an intentional, meaningful unit, not just a visual convenience.

Regards,
Torsten

(26-06-2025, 03:20 PM)Torsten Wrote: In linguistics, word order refers to the structured arrangement of words in a sentence that signals grammatical relationships. It does not imply that specific words are rigidly restricted to the beginning, middle, or end of sentences.

Sorry if I was not clear, but that is indeed what I meant.

The second sentence above implies that attempts to identify word classes based on frequencies at begin/middle/end of sentences (or paragraphs, or lines) would only work for some natural languages.

Quote:Chinese provides a great example of how word order functions in a language that has minimal inflection (i.e., very few endings or case markers) and relies heavily on strict word order to convey meaning. Changing the word order can either make the sentence ungrammatical or completely change its meaning.

I will not be surprised if there is a three-character sentence in Mandarin such that all six permutations of its characters can be read as grammatically correct sentences. I would bet a beer and pizza slice that there are many such examples...

(26-06-2025, 01:07 PM)Jorge_Stolfi Wrote: The division of the lexicon into four largely disjoint sets -- nouns, verbs, adjectives, and adverbs -- with well-distinguished grammatical roles is a feature of Indo-European languages, not a "linguistic universal". (I seem to recall Jacques Guy saying that linguists are still desperately trying to find one of these.) Even English, a creole language that lost most of the characteristic IE features, will often use the same word as any of those categories: "this is a stone", "this is a stone chisel", "stone him", "it is stone hard".

Quote:No, all known natural languages use for instance some form of function words or grammatical markers, though the way they appear can vary dramatically.

Sorry, "No" to what? This statement does not negate what I wrote above.

Quote:Function words are words that don't carry lexical meaning themselves, but instead serve a grammatical role — structuring the sentence and clarifying relationships between words.

Sure. But here is what the dictionary says about the first function word of Mandarin that I picked at random with Google,
[link not shown], yán:

along [preposition]
to follow (a line, tradition etc) [verb]
to carry on [verb]
to trim (a border with braid, tape etc) [verb]
border [noun? adjective?]
edge [noun]

Quote:There is no known natural language without function words or functional equivalents.

With the caveat "or functional equivalents", this is a trivial tautology.

Quote:If the Voynich text represents plain natural language it should be easy to identify function words or some common markers used for indicating the relationships between words.

It is not easy at all, if the function words of the language can also be used in roles that in IE languages would be served by nouns, verbs, and adjectives. Especially when the function word can be part of hundreds of two-syllable compounds with special meanings, each compound acting in multiple of those roles...

And it is not easy at all if the language actually is East Asian or Aboriginal or Mesoamerican, but the would-be crackers refuse to consider those possibilities because "the dresses and hats are obviously European, so it must be a European language".

Quote:Sorry, but I’m not sure how your statement explains the specific situation in the Voynich Manuscript, where the line itself behaves as a functional unit, with both line-start and line-end patterns being consistently observable.

"Line-start and line-end patterns are clearly observable" does not imply "lines are functional units". This is a hypothesis that needs good additional evidence to become likely.

If the line is a "functional unit", how do you explain that most of those "functional units" happen to have the right number of glyphs to precisely span the width of the text? And why is the last "functional unit" of a page almost always shorter, ending at some random point between the text rails (and, in some sections, it is the only "functional unit" that is like that)?

On page f112r, after writing the first "functional unit" at full width, the Scribe apparently noticed that there was something wrong with the vellum in that area (perhaps oily, or too absorbent, or too rough, or whatever) and thus reduced the text width by ~30% for "functional units" 2 to 20, and returned to normal width after that. Was that layout planned by the Author, too?

On page f112v, the same vellum problem apparently caused the Scribe to indent "functional units" 1-10 by ~3 cm, then exdent "functional unit" 11 by ~1 cm, and continue to the end of the page with a left rail that was quite a bit slanted away from vertical; while the right rail was all straight and vertical. Semantically significant too?

All the best, --jorge

(26-06-2025, 05:59 PM)Jorge_Stolfi Wrote:
Quote:Sorry, but I’m not sure how your statement explains the specific situation in the Voynich Manuscript, where the line itself behaves as a functional unit, with both line-start and line-end patterns being consistently observable.

"Line-start and line-end patterns are clearly observable" does not imply "lines are functional units". This is a hypothesis that needs good additional evidence to become likely.

If the line is a "functional unit", how do you explain that most of those "functional units" happen to have the right number of glyphs to precisely span the width of the text? And why is the last "functional unit" of a page almost always shorter, ending at some random point between the text rails (and, in some sections, it is the only "functional unit" that is like that)?

The term "Functional Entity" just describes a set of observations. It does not imply that "lines are functional units". See Currier 1976: "In addition to my findings about languages and hands, there are two other points that I'd like to touch on very briefly. Neither of these has, I think, been discussed by anyone else before. The first point is that the line is a functional entity in the manuscript on all those pages where the text is presented linearly. There are three things about the lines that make me believe the line itself is a functional unit. The frequency counts of the beginnings and endings of lines are markedly different from the counts of the same characters internally. There are, for instance, some characters that may not occur initially in a line. ... The ends of the lines contain what seem to be, in many cases, meaningless symbols: little groups of letters which don't occur anywhere else, and just look as if they were added to fill out the line to the margin. Although this isn't always true, it frequently happens." [Currier 1976].
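The kind of count Currier describes can be sketched in Python as below. This is only an illustration under assumptions: the three lines are made-up placeholders built from words mentioned in this thread, not a transliteration of any folio, and `initial_glyph_counts` is a hypothetical helper name. It compares the first character of line-initial words with the first character of all other words.

```python
from collections import Counter


def initial_glyph_counts(lines):
    """Count word-initial characters separately for line-initial and other words."""
    line_initial, elsewhere = Counter(), Counter()
    for line in lines:
        for position, word in enumerate(line.split()):
            target = line_initial if position == 0 else elsewhere
            target[word[0]] += 1
    return line_initial, elsewhere


# Placeholder lines for illustration only, NOT taken from the manuscript.
lines = [
    "daiin chedy qokedy",
    "saiin chol daiin",
    "dain qokeedy chor",
]

line_initial, elsewhere = initial_glyph_counts(lines)
```

With a real transliteration, a marked difference between the two counters would be the pattern Currier reports; the same function applied to reversed words and reversed lines would give the corresponding line-end counts.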

Anyway, the discussion is about whether the text was generated during writing — and that’s exactly my point. If, as you argue, medieval scribes routinely disregarded original line breaks or inserted their own purely for layout reasons, how do you explain the systematic, consistent patterns we see at both the beginnings and ends of lines in the Voynich Manuscript?

(26-06-2025, 05:59 PM)Jorge_Stolfi Wrote: On page f112r, after writing the first "functional unit" at full width, the Scribe apparently noticed that there was something wrong with the vellum in that area (perhaps oily, or too absorbent, or too rough, or whatever) and thus reduced the text width by ~30% for "functional units" 2 to 20, and returned to normal width after that. Was that layout planned by the Author, too?

On page f112v, the same vellum problem apparently caused the Scribe to indent "functional units" 1-10 by ~3 cm, then exdent "functional unit" 11 by ~1 cm, and continue to the end of the page with a left rail that was quite a bit slanted away from vertical; while the right rail was all straight and vertical. Semantically significant too?

At the top of folio f112, there’s a visible wrinkle in the parchment. The most practical way to avoid that wrinkle while writing the first two lines on the back (f112v) was to begin exactly where lines f112v.P.1 and f112v.P.2 actually start. The shorter line length at the beginning of You are not allowed to view links. Register or Login to view. isn’t a sign that the scribe was running out of space — it simply reflects an adjustment to the irregular surface. We see a similar pattern on folios You are not allowed to view links. Register or Login to view. and f115v, where the starting point for writing shifts to the right, again likely in response to the parchment's condition. Even the hole in folio f107 caused no noticeable space issues.

One plausible explanation for all of this is that the scribe had the flexibility to choose glyph groups that neatly fit into the available space as they wrote. To me, examples like f112 also strongly suggest that the text was being generated during the writing process, not copied mechanically from a fixed draft.

(26-06-2025, 05:59 PM)Jorge_Stolfi Wrote:
Quote:Function words are words that don't carry lexical meaning themselves, but instead serve a grammatical role — structuring the sentence and clarifying relationships between words.

Sure. But here is what the dictionary says about the first function word of Mandarin that I picked at random with Google,
You are not allowed to view links. Register or Login to view. , yán:

along [preposition]

to follow (a line, tradition etc) [verb]

to carry on [verb]

to trim (a border with braid, tape etc) [verb]

border [noun? adjective?]

edge [noun]

Indeed, many present function words were once content words in ancient Chinese, that is, they could individually convey certain meanings. However, that historical origin doesn’t change the fact that even in ancient Chinese, it’s possible to trace grammatical rules and structural patterns. The evolution from content word to function word is a common linguistic process, but the presence of grammar — and the ability to analyze word order and syntactic structure — has always been essential for meaningful communication.

Regards,
Torsten

(26-06-2025, 05:59 PM)Jorge_Stolfi Wrote: Sure. But here is what the dictionary says about the first function word of Mandarin that I picked at random with Google,
You are not allowed to view links. Register or Login to view. , yán:

along [preposition]

to follow (a line, tradition etc) [verb]

to carry on [verb]

to trim (a border with braid, tape etc) [verb]

border [noun? adjective?]

edge [noun]

I know from the Thai language that many prepositions are actually verbs.
Many words can be used either as a noun or as a verb, with related meanings.
Classifier words are very often nouns themselves.
Adjectives can sometimes be nouns or verbs.

Compared with European languages, the grammatical rules appear far more 'fuzzy'.
Undoubtedly, this is also true for other East-Asian languages.

Edit: some of these things also happen in English.

Example: the word 'bargain' can be a noun or a verb, but if it's a noun there would usually be an article.
The Asian languages I know of do not use articles.

Past or present participles (verbs) can usually be used as adjectives, but this is clear in their forms.
Many East-Asian languages do not inflect verbs.

Again: the main objection to the Chinese theory in my opinion is the historical context. From statistical considerations I see no problem.
