Rafal > 22-12-2025, 08:41 PM
Quote:Maybe the texts of Herbal-A and Herbal-B were taken from sources in two different dialects, or two very similar languages. Like Northwest Lower West Bavarian and Southwest Lower West Bavarian...
Jorge_Stolfi > 22-12-2025, 10:05 PM
(22-12-2025, 08:41 PM)Rafal Wrote: Personally I am still struggling with understanding statistical research of VM. These guys often claim that Voynichese behaves similarly to real languages. ... Because if it is only clusters then it doesn't prove similarity to real languages.
nablator > 22-12-2025, 10:27 PM
(22-12-2025, 10:05 PM)Jorge_Stolfi Wrote: And so on. AFAIK, no one has found a statistical property of Voynichese that does not occur also in some natural language.
kckluge > 23-12-2025, 12:18 AM
(22-12-2025, 08:49 AM)MarcoP Wrote: (20-12-2025, 08:35 AM)MarcoP Wrote: I would be curious to see a version of the plot colored by Scribe/Hand.
As always, it’s possible I made errors. I used the [link] shared by Karl here to re-create the plot in [link], colored by hand and adding labels.
It seems the two clusters match the hands very well.
Scribe4 has pages in both clusters. The two pages that end up in the B cluster on the right are, I think, the Rosettes page (f85v1; I could never quite understand the page numbering for the large foldout) and the last zodiac page, Sagittarius (f73v).
Scribe3 has a single “A” page: f58r. The position of f58v, just across the gap, is also interesting. See also Karl's comment #37 just above.
[link], the zodiac pages show a drift from “close to A” to “close to B” (all in the range of Rene’s C/Cosmo intermediate language). [link] shows that the drift takes place both in circle and label text.
ReneZ > 23-12-2025, 12:55 AM
kckluge > 23-12-2025, 01:10 AM
(22-12-2025, 07:49 PM)Jorge_Stolfi Wrote: (20-12-2025, 04:10 PM)oshfdk Wrote: then I see no way to meaningfully interpret the results of PCA at all.
Quote:* Chances of getting a useful insight from these analyses will improve if one uses fewer data so as to reduce the number of factors that affect them. Like comparing Herbal-A and Herbal-B only, thus hopefully eliminating the "topic" factor. Then maybe one can figure out whether the difference is a change of spelling, or something else. Once one gets some insight on Herbal-A vs. Herbal-B, one can then consider what is happening in other sections.
Quote:* It is not surprising that word frequencies are different in each section. Even for the most common words, which may or may not be "function" words like "much", "is", "find", "good"; not to mention "content" words like "herb", "star", "blood", etc.
* If word frequencies change, digraph frequencies will change too, since they are determined by the digraphs that appear in the most common words. As I mentioned before, "rb" is probably much more frequent in a herbal text (Latin or English) than in a text about astrology. (Unless the latter talks a lot about "orbits"...)
* What is surprising is that (IIUC) Herbal-A differs from Herbal-B noticeably more than either differs from Bio or Stars. Thus, besides the difference of topic, we indeed have a difference of language or spelling (or encryption). Maybe the texts of Herbal-A and Herbal-B were taken from sources in two different dialects, or two very similar languages. Like Northwest Lower West Bavarian and Southwest Lower West Bavarian...
[...]
* And the chances of obtaining useful insights will improve a lot if one uses the good old "scientific method": make a hypothesis, then devise the simplest and most effective way to test for it, and do that.
All the best, --stolfi
Jorge_Stolfi > 23-12-2025, 01:25 AM
(22-12-2025, 10:27 PM)nablator Wrote: AFAIK, no one has found a natural language that explains any of these statistical properties:
Quote:Major inconsistencies in basic glyph and glyph-bigram statistics between pages (some pages full of "or", "ol", "in" etc., some pages totally missing "e", "n", etc.)
Quote:Currier "language" drifts and dialects
Quote:Word pairs statistics: [link] [link]
Quote:Frequent local similarities (reduplication and almost-reduplication) including insanely high levels of clustering of k/t gallows especially in Currier B, [link]
Quote:Patterns across word breaks, by Emma M.S. and Marco P. [link]
See the above answer, especially the first point. Namely, the frequency of a character pair in this statistic is determined by its occurrence in the most common consecutive word pairs. Every occurrence of "it is" in English increases the frequency of "t-i", and so on. Again, if Voynichese did not have anomalous frequencies of bigrams across word breaks, it would be evidence that it was not a natural language.
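The "it is" point can be illustrated with a small sketch. The sentence below is made up, and the function name cross_break_pairs is hypothetical; the code counts (last glyph, first glyph) pairs across word breaks and compares the observed count of "t-i" with what one would expect if word-final and word-initial characters were independent.

```python
from collections import Counter

def cross_break_pairs(words):
    """Count pairs made of the last char of a word and the first char of the next."""
    pairs = Counter()
    for left, right in zip(words, words[1:]):
        pairs[left[-1] + "-" + right[0]] += 1
    return pairs

# Made-up sample text in which the word pair "it is" is common.
words = "it is what it is and it is so".split()
pairs = cross_break_pairs(words)
n_breaks = len(words) - 1

# Expected count of "t-i" if finals and initials were independent.
finals = Counter(w[-1] for w in words[:-1])
initials = Counter(w[0] for w in words[1:])
expected_ti = finals["t"] * initials["i"] / n_breaks

print("observed t-i:", pairs["t-i"], "expected if independent:", expected_ti)
```

In this toy sample the observed count exceeds the independence estimate only because one word pair dominates, which is the mechanism described above. It says nothing by itself about whether the Voynichese figures have the same cause.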
Quote:"Vertical pairs" by Tavie [link]
One "anomaly" discussed in that thread is that the first word (only) of a line is longer on average, while the last 1-3 words are shorter. As I explained in the previous post, this sort of anomaly is a guaranteed result of the trivial line-breaking algorithm. Does it explain precisely the length anomaly of the VMS? I don't know; but until this explanation is tested, the anomaly cannot be used as evidence of LAAFU and/or that Voynichese is not natural language.
Quote:Patrick Feaster's several statistical discoveries are yet to be explained by a quantitative study of any language
These anomalies all seem to be largely consequences of the anomalous distribution of words in line-initial and line-final position, discussed above. There may be additional perturbations due to uncertain word spaces, which are expected to vary with position along a line.
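The line-breaking explanation can be tried on synthetic data. The sketch below assumes the "trivial" algorithm is greedy filling (start a new line when the next word does not fit), and it uses random word lengths from 1 to 10 and a line width of 40; all of these are assumptions for illustration, not VMS measurements. It prints the mean length of line-initial and line-final words next to the overall mean, without presuming how the line-final figure comes out.

```python
import random

def break_lines(lengths, width):
    """Greedy line breaking: start a new line when the next word does not fit."""
    lines, current, used = [], [], 0
    for n in lengths:
        needed = n if not current else n + 1  # +1 for the word space
        if current and used + needed > width:
            lines.append(current)
            current, used = [n], n
        else:
            current.append(n)
            used += needed
    if current:
        lines.append(current)
    return lines

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
lengths = [rng.randint(1, 10) for _ in range(20000)]  # hypothetical word lengths
lines = break_lines(lengths, width=40)

overall_mean = mean(lengths)
first_mean = mean([line[0] for line in lines])
last_mean = mean([line[-1] for line in lines])
print(f"overall {overall_mean:.2f}  line-initial {first_mean:.2f}  line-final {last_mean:.2f}")
```

To test the explanation against the manuscript, one would replace the random lengths with the actual word lengths in reading order and compare the simulated per-position means with the observed ones.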
Jorge_Stolfi > 23-12-2025, 01:43 AM
(23-12-2025, 12:55 AM)ReneZ Wrote: PCA is only used to find the largest dimensions in a multi-dimensional cloud of points. The result can be used for visualisation.
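As a minimal sketch of what "finding the largest dimension" means: the code below computes only the first principal axis of a small cloud of points by power iteration on the covariance matrix, then projects the points onto it. The six rows are made-up feature vectors (think of them as per-page frequencies of three bigrams), not data from the plots discussed in this thread, and the function names are hypothetical. In practice one would use a library routine rather than this hand-rolled version.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (mean, unit direction, explained fraction) for the top PCA axis."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply cov and renormalise.
    # Assumes the start vector is not orthogonal to the top axis.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    top_var = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_var = sum(cov[i][i] for i in range(d))
    return mean, v, top_var / total_var

def project(rows, mean, direction):
    """Coordinate of each row along the given axis."""
    return [sum((x - m) * u for x, m, u in zip(r, mean, direction)) for r in rows]

# Made-up "pages": the first three and the last three form two groups.
pages = [[4.0, 0.5, 1.0], [4.2, 0.4, 1.1], [3.9, 0.6, 0.9],
         [0.5, 4.0, 1.0], [0.4, 4.1, 1.1], [0.6, 3.9, 0.9]]
mean_vec, axis, explained = first_principal_component(pages)
scores = project(pages, mean_vec, axis)
print("axis:", [round(x, 3) for x in axis], "explained:", round(explained, 3))
print("scores:", [round(s, 2) for s in scores])
```

The two groups land on opposite sides of zero along the first axis, which is all the projection shows: it is a way to view the cloud, not an explanation of why the groups differ.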
ReneZ > 23-12-2025, 03:25 AM
kckluge > 23-12-2025, 03:49 AM
(20-12-2025, 04:26 PM)MarcoP Wrote: I find the subject of an A/B switch vs a progressive drift very interesting. It would be great if this problem could get a clear answer.
I hadn't previously realized that Scribe 4 apparently played a major role in the A/B transition or switch.
As always, it is likely I made errors, but I tried computing the frequency of the bigram ‘ed’ in each zodiac page. It’s lucky that the order of these pages is known. I processed Circle and Label text both separately and together. As a reference, the average frequency of 'ed' in Currier B is ~4%.
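A per-page bigram frequency of this kind could be computed roughly as follows. This is only a sketch under assumptions: the quoted post does not say which denominator was used, so here the frequency is the share of within-word character bigrams equal to the target; the page names and word strings are made-up EVA-style samples, and bigram_frequency is a hypothetical helper.

```python
def bigram_frequency(words, bigram):
    """Fraction of within-word character bigrams equal to `bigram`."""
    total = hits = 0
    for w in words:
        for a, b in zip(w, w[1:]):
            total += 1
            if a + b == bigram:
                hits += 1
    return hits / total if total else 0.0

# Made-up samples keyed by hypothetical page names, listed in reading order.
pages = {
    "page_1": "daiin chol shol",
    "page_2": "qokeedy shedy chedy",
}
freqs = {name: bigram_frequency(text.split(), "ed") for name, text in pages.items()}
for name, f in freqs.items():
    print(f"{name}: {100 * f:.1f}%")
```

Running the same function separately on circle text, label text, and their union would give the three series mentioned above, provided the transliteration marks which words belong to which.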