(26-06-2025, 01:07 PM)Jorge_Stolfi Wrote: That does not mean that "in every language, certain words can only appear at the beginning/middle/end of sentences"
In linguistics, word order refers to the structured arrangement of words in a sentence that signals grammatical relationships. It does not imply that specific words are rigidly restricted to the beginning, middle, or end of sentences.
Chinese provides a great example of how word order functions in a language that has minimal inflection (i.e., very few endings or case markers) and relies heavily on strict word order to convey meaning.
Just like English, Mandarin Chinese typically follows a Subject-Verb-Object (SVO) structure.
Example:
我吃苹果。
Wǒ chī píngguǒ.
I eat apples.
- 我 (wǒ): Subject (I)
- 吃 (chī): Verb (eat)
- 苹果 (píngguǒ): Object (apples)
Changing the word order can either make the sentence ungrammatical or completely change its meaning.
苹果吃我。
Píngguǒ chī wǒ.
The apples eat me.
Same words, completely different meaning due to different word order.
You can add time expressions or location phrases, but they typically appear in specific positions, and the core SVO structure stays the same.
Example with a location expression:
我在学校吃苹果。
Wǒ zài xuéxiào chī píngguǒ.
I eat apples at school.
- 在学校 (zài xuéxiào): Location phrase (at school), placed after the subject and before the verb.
With this in mind, consider what D’Imperio argued in 1978: "The short words, the many sequential repetitions, the rarity of one- or two-letter words, the rarity of doublets, all militate against simple substitution. So also the strange lack of parallel context surrounding different occurrences of the same word." [D’Imperio 1978].
(26-06-2025, 01:07 PM)Jorge_Stolfi Wrote: The division of the lexicon into four largely disjoint sets -- nouns, verbs, adjectives, and adverbs -- with well-distinguished grammatical roles is a feature of Indo-European languages, not a "linguistic universal". (I seem to recall Jacques Guy saying that linguists are still desperately trying to find one of these.) Even English, a creole language that lost most of the characteristic IE features, will often use the same word as any of those categories: "this is a stone", "this is a stone chisel", "stone him", "it is stone hard".
No. All known natural languages use, for instance, some form of function words or grammatical markers, though the way they appear can vary dramatically.
Function words are words that don't carry lexical meaning themselves, but instead serve a grammatical role: structuring the sentence and clarifying relationships between words.
Examples in English:
- Articles: the, a, an
- Pronouns: he, she, it, they
- Prepositions: in, on, at, with
- Conjunctions: and, but, or
- Auxiliary verbs: is, have, do
How do other languages handle this?
Analytic Languages (like Mandarin Chinese):
- Use many function words because they lack inflection.
- Example:
我在学校。 (Wǒ zài xuéxiào.): I am at school.
Here, 在 (zài) marks location. It is a coverb that does the work of the English preposition "at".
Synthetic Languages (like Latin or Finnish):
- Use fewer function words because they encode grammatical roles in word endings (inflection).
- But even here, function words still appear, e.g., conjunctions, particles.
Polysynthetic Languages (like Inuktitut):
- Often bundle many grammatical markers into a single word.
- Function-like roles are expressed through morphology, but the functions themselves still exist, even if not as separate words.
There is no known natural language without function words or functional equivalents. They may appear as standalone words, prefixes, suffixes, or particles, but the grammatical roles they serve are universal and necessary for structured, meaningful communication. If the Voynich text represents plain natural language, it should be easy to identify function words or some common markers used for indicating the relationships between words.
(26-06-2025, 01:07 PM)Jorge_Stolfi Wrote: Quote:The respective frequency counts confirm the general principle: [in the VMS] high-frequency tokens also tend to have high numbers of similar words.
This is true in any natural language, no? Even in English? "cat" "bat" "fat" "hat" "mat" "pat" "rat" "sat" "vat" "kit" "cot" "cut" "cab" "cam" "can" "cap" "car" ... but not so much for "however" or "equinox" ...
Quote:[in the VMS] pairs of frequently used words with high mutual similarity appear. The exact cooccurrences may vary: there are pages where <daiin> is paired with <dain>, but also pages where it is frequently used together with <aiin> (f41v, f46r, f55v, f89v2, f105v and f114r) or <saiin> (f2r, f16r, and f90r2).
Such occurrences are expected if the words are single syllables. Even in the random 5-sentence above there is a "mài shì zài".
My point was that high-frequency tokens also tend to have high numbers of similar words, whereas "isolated" words (i.e. unconnected nodes in the graph) usually appear just once in the entire VMS.
See "for example, folio f108r: the most frequent tokens on that page are <qokeedy>, <qokedy>, and <okedy>, each one appearing sixteen times.
A useful method to analyze the similarity relations between words of a VMS (sub-)section is their representation as nodes in a graph. Starting with the most frequent token one can recursively search for other words differing by just a single glyph, and connect these new nodes with an edge. The resulting network, built around the three most frequent tokens of folio [link] (restricted to their 33 most similar tokens), gives a first impression of an existing deep correlation between frequency, similarity, and spatial vicinity of tokens within the VMS text (cf. Figure 1). Note that besides the aforementioned top-frequency tokens also words like <otedy>, <qokeey>, <okeey>, <qokey>, <qotedy>, and also <okeedy> enter the [link] network.
How does this situation change when we look at the entire VMS? Figure 2 shows the resulting network, connecting 6796 out of 8026 words (=84.67 %). Again, an edge indicates that two words differ by just one glyph. The longest path within this network has a length of 21 steps, substantiating its surprisingly high connectivity. ..." [Timm & Schinner 2019, p. 4]
[attachment=10909]
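For anyone who wants to try the graph construction described in the quote on their own transliteration, here is a rough Python sketch. It is not the code used by Timm & Schinner. It assumes that each character of the transliteration counts as one glyph (which is not strictly true for EVA), that "differing by just a single glyph" means one substitution, insertion, or deletion, and it walks the network with a queue rather than true recursion. The names `one_glyph_apart` and `build_network` are made up for this illustration.

```python
from collections import Counter, deque


def one_glyph_apart(a: str, b: str) -> bool:
    """True if a and b differ by exactly one substitution, insertion, or deletion.

    Assumption: one character of the transliteration equals one glyph.
    """
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Removing one character from the longer word must give the shorter one.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def build_network(tokens, seed=None):
    """Grow a similarity network outward from the seed (default: most frequent token)."""
    counts = Counter(tokens)
    if seed is None:
        seed = counts.most_common(1)[0][0]
    vocab = list(counts)
    nodes = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        for other in vocab:
            if one_glyph_apart(word, other):
                edges.add(frozenset((word, other)))
                if other not in nodes:
                    nodes.add(other)
                    queue.append(other)
    return nodes, edges, counts


# Toy input, not real page data: a handful of word types typed in by hand.
tokens = "qokeedy qokeedy qokeedy qokedy qokedy okedy otedy daiin".split()
nodes, edges, counts = build_network(tokens)
print(sorted(nodes))  # ['okedy', 'otedy', 'qokedy', 'qokeedy']
```

In this toy input <daiin> stays outside the network because no other word is one glyph away from it. The pairwise comparison is quadratic in the number of word types, which is fine for a single page but slow for a whole-manuscript vocabulary.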
This isn't a random or isolated phenomenon; it's a consistent feature that holds true across all pages and for essentially all word types within the manuscript.
(26-06-2025, 01:07 PM)Jorge_Stolfi Wrote: When copying running text, Medieval European Scribes (like today's word processors) routinely disregarded line breaks in the draft and inserted line breaks, abbreviations, capitals, flourishes on their own. Fitting the lines neatly between margins was part of their basic skill set, just as preparing ink and shaping the pen.
Sorry, but I’m not sure how your statement explains the specific situation in the Voynich Manuscript, where the line itself behaves as a functional unit, with both line-start and line-end patterns being consistently observable. If medieval scribes routinely disregarded original line breaks or inserted their own purely for layout purposes, how do you account for these systematic patterns in the VMS? It’s not just neat margins; the structure of the text responds to the line boundaries in a way that suggests the line is an intentional, meaningful unit, not just a visual convenience.
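One simple way to check for line-start and line-end patterns in a transliteration is to tabulate tokens by their position in the line and compare the profiles. The Python sketch below only illustrates that idea. The function names `position_profile` and `share` are hypothetical, the input is assumed to be one whitespace-separated string per manuscript line, and the sample lines and the tested final character are invented for the example, not taken from the manuscript.

```python
from collections import Counter


def position_profile(lines):
    """Count tokens that are line-initial, line-final, or in between."""
    first, last, inner = Counter(), Counter(), Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue
        first[words[0]] += 1
        if len(words) > 1:
            # A one-word line is counted as line-initial only.
            last[words[-1]] += 1
        inner.update(words[1:-1])
    return first, last, inner


def share(counter, predicate):
    """Fraction of token occurrences in the counter that satisfy the predicate."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return sum(n for word, n in counter.items() if predicate(word)) / total


# Invented sample lines, for illustration only.
lines = [
    "daiin chedy qokedy dam",
    "saiin okedy otedy dam",
    "daiin shedy qokeedy okam",
]
first, last, inner = position_profile(lines)


def ends_in_m(word):
    return word.endswith("m")


print(share(last, ends_in_m), share(inner, ends_in_m))
```

If line breaks were inserted purely for layout, the two shares should be roughly equal over a large sample. A consistent gap between line-final and line-internal positions is the kind of pattern at issue here.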
Regards,
Torsten