Dunsel > 15-02-2026, 03:04 AM
ReneZ > 15-02-2026, 04:09 AM
Quote:That is "ed" compared to the top 100 bigrams by total count and percentage of pages.
Jorge_Stolfi > 15-02-2026, 08:22 AM
(15-02-2026, 03:04 AM)Dunsel Wrote: It's been known for many years that the bigram "ed" is just plain odd. It occurs in the Voynich as a midfix 4,474 times and as a suffix 186 times. Never as a prefix. That may not sound that striking but this chart shows just how striking it is.
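For anyone who wants to reproduce this kind of prefix/midfix/suffix tally, here is a minimal sketch. It is not the script behind the numbers quoted above. The function name `bigram_positions` and the word list are made up for illustration, and it assumes the transliteration has already been split into words.

```python
from collections import Counter

def bigram_positions(words, bigram):
    """Count occurrences of `bigram` by position within each word."""
    counts = Counter()
    n = len(bigram)
    for word in words:
        start = word.find(bigram)
        while start != -1:
            if len(word) == n:
                counts["whole"] += 1      # the word is exactly the bigram
            elif start == 0:
                counts["prefix"] += 1     # bigram opens the word
            elif start + n == len(word):
                counts["suffix"] += 1     # bigram closes the word
            else:
                counts["midfix"] += 1     # bigram is strictly inside the word
            start = word.find(bigram, start + 1)
    return counts

# Toy list in EVA-like spelling (an assumption, not read from a real transliteration file)
sample = ["qokeedy", "shedy", "chedy", "oked", "daiin"]
result = bigram_positions(sample, "ed")
print(dict(result))
```

Run on a full word list, the same function would give the three totals for any bigram, so "ed" can be compared against others on equal terms.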
ReneZ > 15-02-2026, 09:52 AM
eggyk > 15-02-2026, 11:02 AM
(15-02-2026, 08:22 AM)Jorge_Stolfi Wrote: But I must insist that statistics of characters and digraphs are bound to be more confusing than illuminating. Their frequencies are mostly determined by whether they occur in the most common words; and these in turn may be highly dependent on the topic.
For example, the digraph "rb" may be significantly more common in a Latin herbal text than in an astronomical text, because it occurs in the word "herba".
The digraph "ed" may be more frequent in an English chronicle than in an English herbal, because of its occurrence in verbs inflected in the past tense.
And the "th" digraph may be less common at the beginning of a line in any English text, because its frequency there is determined by the occurrence of the common words that begin with it: "the", "this", "that", "then", "they", "them", "there", "thus", etc -- but those common words are relatively short, and when text is formatted into paragraphs the first word of each line tends to be longer than average, while short words are more likely to fit at the end of each line.
All the best, --stolfi
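The line-wrapping argument in the quoted post can be checked on any plain text. The sketch below is only one way to do it, under the assumption that greedy wrapping with Python's `textwrap` is a fair stand-in for how a scribe or typesetter fills lines. The function name `line_position_lengths` is hypothetical.

```python
import textwrap

def line_position_lengths(text, width):
    """Return mean word length for (line-initial words, line-final words, all words)."""
    # Greedy wrap; long words are kept whole rather than split across lines
    lines = textwrap.wrap(text, width=width, break_long_words=False)
    first = [line.split()[0] for line in lines]
    last = [line.split()[-1] for line in lines]
    all_words = text.split()

    def mean_len(ws):
        return sum(len(w) for w in ws) / len(ws)

    return mean_len(first), mean_len(last), mean_len(all_words)
```

Calling `line_position_lengths(text, 60)` on a long running text gives three averages to compare. Whether line-initial words come out longer than average depends on the text and the width, so the result should be read as a test of the claim rather than a confirmation of it.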
Dunsel > Yesterday, 12:01 AM
(15-02-2026, 04:09 AM)ReneZ Wrote: I like your first graph, which is very striking.
Just a minor question or perhaps a nitpick: what exactly do you mean by:
Quote:That is "ed" compared to the top 100 bigrams by total count and percentage of pages.
These would be two different lists. Is it the union of these two lists, or rather their intersection?
With respect to Currier, in his paper he has a list of points that help to distinguish his A vs. B languages. While 'ed' only appears in B language, he never mentions this bigram as a discriminator.
Have you checked this page? [link not shown]
There is some overlap with what you are showing.
Also, in your first graph, there are some points near the top (close to 100% of page coverage) but with a relatively low total count. These may be of interest as well.
Finally, bigram statistics depend heavily on the transliteration alphabet. It would be worth using something different than Eva, not because it would be better or worse, but because it may bring a different (additional) perspective. This will not affect the specific bigram 'ed' too much though, so if that is your main focus, it is not too important.
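The question about the two "top 100" lists can be made concrete with a short sketch. This is not how the original chart was built; it only shows how the two rankings can differ. It assumes the text is available as a mapping from page name to a list of words, counts bigrams inside words only, and uses made-up function names `bigram_stats` and `top_n_sets`.

```python
from collections import Counter

def bigram_stats(pages):
    """Return (total count per bigram, number of pages each bigram appears on)."""
    total = Counter()
    coverage = Counter()
    for words in pages.values():
        seen = set()
        for w in words:
            for i in range(len(w) - 1):
                bg = w[i:i + 2]
                total[bg] += 1
                seen.add(bg)
        coverage.update(seen)  # each page counts once per bigram
    return total, coverage

def top_n_sets(pages, n):
    """Return (combined list, shared list) of the top-n by count and top-n by page coverage."""
    total, coverage = bigram_stats(pages)
    by_count = {bg for bg, _ in total.most_common(n)}
    by_pages = {bg for bg, _ in coverage.most_common(n)}
    return by_count | by_pages, by_count & by_pages
```

Dividing `coverage[bg]` by `len(pages)` gives the percentage of pages, and the points with high coverage but low total count mentioned above are the bigrams that land in `by_pages` but not in `by_count`.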
Dunsel > Yesterday, 12:10 AM
(15-02-2026, 09:52 AM)ReneZ Wrote: Well, bigram statistics show the behaviour of bigrams.
Comparisons with English (or Latin etc) may not be relevant because the Voynich MS text behaves in a very different way from these natural languages.
Their frequency may depend on the subject matter but it remains to be seen to what extent.
Should the Herbal-A and Herbal-B pages be about the same subject matter?
Their bigram statistics are wildly different, so we have a strong statistical observation that requires an explanation.
Bigram statistics are not the only thing. They are a piece of the puzzle.
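One simple way to put a number on "wildly different" is to compare the bigram frequency distributions of two sections. The sketch below uses total variation distance, which is a choice made here for illustration and not a measure taken from the posts above. It assumes each section is given as a non-empty list of words, and the function names are hypothetical.

```python
from collections import Counter

def bigram_distribution(words):
    """Relative frequency of each within-word bigram (assumes at least one bigram exists)."""
    counts = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0.0 for identical distributions, 1.0 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

With word lists for Herbal-A and Herbal-B pages, `total_variation(bigram_distribution(a), bigram_distribution(b))` gives one number that can be compared against the distance between two halves of the same section.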
Dunsel > Yesterday, 12:29 AM
(15-02-2026, 08:22 AM)Jorge_Stolfi Wrote: Thus statistics of words and word pairs are usually more illuminating than statistics of characters or digraphs.
All the best, --stolfi
oshfdk > Yesterday, 12:37 AM
Rafal > Yesterday, 12:46 AM