(7 hours ago)oshfdk Wrote: 1. Matching visual patterns is a much broader process than just modifying words glyph by glyph. The existence of the curve line system may suggest that Voynichese words are visually constrained, and it's possible that the actual self-citation might not work at glyph level at all. For example, maybe it works at stroke level, rearranging strokes according to some rules. Using only glyph-based rules is much easier for testing, but may miss some important part of the visual process. Is it likely that the rules defined using glyph codes may not be a good proxy for how the manuscript was created at all?
You're right that the scribe did operate at the stroke level. Currier noted in 1976 that the glyphs are built from shared base strokes: "you can make up almost any of the other letters out of these two symbols." A glyph has a shape, but at the same time it is composed of strokes.
For instance, the ligature <ch> consists of two <e>-glyphs connected by a dash. But there are also instances of three <e> strokes connected with a dash, for example on f82r. My Rule 1 (replace one or more glyphs by similar ones) is a glyph-level formalization of what is often a stroke-level modification, like the additional stroke changing "ch" into "sh".
Voynich words are also visually constrained. Schwerdtfeger described in 2008 the following four design rules for Voynich words: (1) line-glyphs can follow line-glyphs or <a>; (2) curve-glyphs and <a> can follow curve-glyphs; (3) the <l>-glyph can be used as a curve-glyph or as a line-glyph; and (4) gallows glyphs count as curve glyphs.
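As a rough illustration, the four design rules above can be written as a small transition check. This is only a sketch: the assignment of EVA glyphs to the curve and line classes below is my own illustrative assumption (not Schwerdtfeger's published table), and the names `CURVE`, `LINE`, `tokenize` and `follows_rules` are hypothetical.

```python
# Illustrative glyph classes (an assumption, not Schwerdtfeger's table).
CURVE = {"e", "o", "y", "s", "d", "q", "ch", "sh",
         "k", "t", "p", "f"}          # rule 4: gallows count as curve-glyphs
LINE = {"i", "n", "r", "m"}
BOTH = {"l"}                          # rule 3: <l> may act as curve or line

# (previous class, next class) pairs permitted by rules 1 and 2.
ALLOWED = {("line", "line"), ("a", "line"),
           ("curve", "curve"), ("curve", "a")}


def tokenize(word):
    """Split an EVA string into glyphs, treating 'ch' and 'sh' as single glyphs."""
    glyphs, i = [], 0
    while i < len(word):
        if word[i:i + 2] in ("ch", "sh"):
            glyphs.append(word[i:i + 2])
            i += 2
        else:
            glyphs.append(word[i])
            i += 1
    return glyphs


def classes(glyph):
    """Return the set of classes a glyph may belong to."""
    if glyph == "a":
        return {"a"}
    if glyph in BOTH:
        return {"curve", "line"}
    if glyph in CURVE:
        return {"curve"}
    if glyph in LINE:
        return {"line"}
    raise ValueError(f"unclassified glyph: {glyph}")


def follows_rules(word):
    """True if some class assignment satisfies the transition rules."""
    glyphs = tokenize(word)
    if not glyphs:
        return False
    states = classes(glyphs[0])
    for glyph in glyphs[1:]:
        # Keep every class of this glyph reachable from a surviving state.
        states = {c for c in classes(glyph)
                  if any((p, c) in ALLOWED for p in states)}
        if not states:
            return False
    return True
```

With these assumed classes, `follows_rules("daiin")` and `follows_rules("chol")` hold, while a hypothetical string such as "iod" (a line-glyph followed by a curve-glyph) is rejected. Tracking a set of possible states is what lets <l> be read either way.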
(7 hours ago)oshfdk Wrote: 2. While it's easy to devise a test that would show that self-citation is possible, how to tell apart self citation as primary text generation method from actual citation, as in using past words to actually refer to the same or similar concepts in the text, especially in the context of a constructed language?
Your constructed language scenario — where "daiin" means "heat," "odaiin" means "fire," "qodaiin" means "scorching" — is a genuine alternative that would produce similar word families. But it makes predictions the VMS doesn't satisfy:
A constructed language dictionary produces consistent usage. If "daiin" means "heat," it should appear in consistent semantic contexts. In the VMS, "daiin" appears everywhere — in every section, next to every kind of word. No semantic clustering is detectable.
A constructed language doesn't show a continuous evolutionary gradient. If the dictionary is fixed, the vocabulary shouldn't evolve from section to section. But the VMS shows "chol" dominant in early sections, "chedy" dominant in late sections, with smooth intermediate stages. A dictionary doesn't evolve; a copying process does.
Continuous evolution follows from the incremental nature of the modification process. Each new word differs minimally from its source; accumulated modifications over many copying events produce gradual vocabulary drift. The direction of evolution is determined by the asymmetry of the process: new variants can only appear after their source words exist, creating a temporal arrow from early forms to late forms. “Words typical for Currier A also exist in Currier B, but not the other way round,” because late-emerging variants “could not appear on pages written previously” (Timm and Schinner, 2020, pp. 8–9). This is “an automatic side-effect” of the process, not evidence of changing writing strategies.
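To make the "temporal arrow" concrete, here is a deliberately simplified toy model of copy-and-modify generation. It is not the algorithm from Timm and Schinner (2020): the only modification it knows is substituting a single character, and the function name `generate`, the parameter `p_modify` and the alphabet are assumptions chosen for illustration.

```python
import random


def generate(seed_words, n_words, rng, p_modify=0.5,
             alphabet="acdehiklnoqrsty"):
    """Toy self-citation: each new word copies an earlier word,
    optionally replacing one character (hypothetical rule set)."""
    text = list(seed_words)
    sources = [None] * len(seed_words)   # index of the word each token was copied from
    for _ in range(n_words):
        src = rng.randrange(len(text))   # a source can only be an already written word
        word = text[src]
        if rng.random() < p_modify:
            pos = rng.randrange(len(word))
            word = word[:pos] + rng.choice(alphabet) + word[pos + 1:]
        text.append(word)
        sources.append(src)
    return text, sources


text, sources = generate(["daiin", "chol"], 50, random.Random(0))
```

By construction every generated token points back to an earlier position, and differs from its source in at most one character, so any variant can only show up after the form it was derived from.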
A constructed language doesn't show line-boundary production effects. If words have fixed meanings, their form shouldn't depend on line position. But some words appear almost exclusively at line starts or line ends. For instance, "dsheey" appears at line starts 7 out of 8 times. Words looked up in a dictionary don't care about margins. The reason is that "the source for the first word in each line could only be found within the previous lines. Since the first and the last word in each line are easy to spot, the most obvious way is to pick them as a source for the generation of a group at the beginning or at the end of a line. For the second glyph group it is also possible to select the first group as a source. Since the first group in a line usually has a prefix the simplest change is to remove this prefix." (Timm 2014, p. 19). This is the reason that the second glyph group is shorter than the first group in 48% of the lines and longer in only 32%, leading to the statistical observation by Elmar Vogt [Vogt 2012: p. 4].
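The first-versus-second group comparison is easy to tally on any transliteration. The sketch below assumes one manuscript line per string with groups separated by spaces, and it measures length in EVA characters (so "ch" counts as two), which is a simplification. The function name `compare_first_two` and the sample lines are made up for illustration and are not real transcription data.

```python
def compare_first_two(lines):
    """Return the fractions of lines whose second group is
    shorter than, equal to, or longer than the first group."""
    shorter = equal = longer = 0
    for line in lines:
        words = line.split()
        if len(words) < 2:
            continue                      # need at least two groups to compare
        first, second = len(words[0]), len(words[1])
        if second < first:
            shorter += 1
        elif second > first:
            longer += 1
        else:
            equal += 1
    total = shorter + equal + longer
    if total == 0:
        return 0.0, 0.0, 0.0
    return shorter / total, equal / total, longer / total


# Hypothetical sample lines, not taken from the manuscript.
sample = ["qokeedy okeedy dal", "pchedy chedy ol", "daiin chol", "ol shedy qokain"]
print(compare_first_two(sample))  # (0.75, 0.0, 0.25)
```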
A constructed language doesn't show cross-line independence. If the text carries meaning, that meaning should continue across line breaks — sentences span lines in any meaningful text. But quimqu showed that within-line predictability is strong (~30% top-1) while cross-line predictability drops to near random. A text composed from a dictionary doesn't reset at line boundaries.
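I don't know the exact model quimqu used, so the following is only a stand-in: a plain bigram predictor that guesses the most frequent successor of each word, scored separately on word pairs inside a line and on pairs that straddle a line break. The names `line_pairs` and `top1_accuracy` are hypothetical, and scoring on the same data the table was built from is a shortcut that a real test would avoid.

```python
from collections import Counter, defaultdict


def line_pairs(lines):
    """Split adjacent word pairs into within-line and cross-line pairs."""
    within, across = [], []
    prev_last = None
    for line in lines:
        words = line.split()
        if not words:
            continue
        within.extend(zip(words, words[1:]))
        if prev_last is not None:
            across.append((prev_last, words[0]))   # last word -> first word of next line
        prev_last = words[-1]
    return within, across


def top1_accuracy(train_pairs, test_pairs):
    """Share of test pairs where the most frequent successor seen in training is correct."""
    table = defaultdict(Counter)
    for prev, nxt in train_pairs:
        table[prev][nxt] += 1
    if not test_pairs:
        return 0.0
    hits = 0
    for prev, nxt in test_pairs:
        successors = table.get(prev)
        if successors and successors.most_common(1)[0][0] == nxt:
            hits += 1
    return hits / len(test_pairs)
```

Usage would be `within, across = line_pairs(lines)` followed by comparing `top1_accuracy(within, within)` with `top1_accuracy(within, across)`; a large gap between the two is the kind of reset at line boundaries described above.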
A constructed language with semantic prefixes doesn't show prefix-suffix independence. If "o-" means intensification, then prefix choice constrains meaning, which constrains which suffixes co-occur — because you don't intensify every concept equally often. But as you and dashstofsk demonstrated, prefix and suffix are selected nearly independently in the VMS. A semantic prefix system produces dependencies. A copying system produces independence.
See Table 3 in Timm & Schinner 2020, p. 10. Note that this table reproduces the most frequent word types <daiin>, <ol>, and <chedy>.
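One simple way to quantify "nearly independent" is the mutual information between prefix and suffix, which is zero when the two are chosen independently. This is a generic measure and not necessarily the one used in the earlier posts; how a word is split into prefix and suffix is left to the caller, and the function name `mutual_information` and the toy pairs are assumptions.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) between the two items of each (prefix, suffix) pair."""
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    prefix_counts = Counter(p for p, _ in pairs)
    suffix_counts = Counter(s for _, s in pairs)
    mi = 0.0
    for (p, s), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (prefix_counts[p] * suffix_counts[s]))
    return mi


# Toy data: every prefix combines equally with every suffix -> 0 bits.
independent = [("o", "aiin"), ("o", "edy"), ("qo", "aiin"), ("qo", "edy")]
# Toy data: the prefix fully determines the suffix -> 1 bit.
dependent = [("o", "aiin"), ("o", "aiin"), ("qo", "edy"), ("qo", "edy")]
```

A semantic prefix system would be expected to push this value clearly above zero; values near zero on the real text are what the independence claim amounts to.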
A constructed language with semantic prefixes also wouldn't produce the frequency hierarchy observed in the VMS. In a semantic system, word frequency reflects how often a concept is discussed, which is unrelated to the word's visual similarity to other words. In the VMS, frequency and visual similarity are directly correlated. For instance, in most cases, words with "sh" are less frequent than the corresponding variant using "ch". Also, words using "p" or "f" instead of "k" and "t" are generally less frequent. The frequency-connectivity correlation arises through a feedback loop inherent in the copying process. Frequent words are more likely to be selected as copying templates, generating more variants; the existence of more variants increases the probability that members of that word family are selected in subsequent copying events. This self-reinforcing cycle ensures that the most frequently used words accumulate the most similar neighbors.
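The frequency-connectivity relation can be checked by counting, for each word type, how many other types lie one edit away. The sketch below uses plain character edits on EVA strings as a stand-in for "visual similarity", which is an assumption (a stroke-based measure would differ). The names `within_one_edit` and `neighbor_counts` are hypothetical and the token list is toy data.

```python
from collections import Counter


def within_one_edit(a, b):
    """True if a and b differ by exactly one insertion, deletion or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                       # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1                            # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]     # one substitution at position i
    return a[i:] == b[i + 1:]             # one extra character in b at position i


def neighbor_counts(tokens):
    """Map each word type to (frequency, number of types one edit away)."""
    freq = Counter(tokens)
    types = list(freq)
    return {w: (freq[w], sum(within_one_edit(w, v) for v in types))
            for w in types}


# Toy token list, not real manuscript counts.
tokens = ["daiin"] * 5 + ["dain"] * 2 + ["odaiin", "saiin", "chedy"]
print(neighbor_counts(tokens)["daiin"])  # (5, 3)
```

On a full transliteration, plotting or rank-correlating the two numbers per word type would show whether frequent words really do have the most neighbors.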
Your scenario is testable — and each test distinguishes it from self-citation. The VMS matches the self-citation predictions and fails the constructed-language predictions.