The Voynich Ninja

Full Version: Vocabulary size by Illustration Type
(04-07-2022, 03:07 PM)ReneZ Wrote: Between the two views:

- ignoring sample text length

- dividing by sample text length

both views are imperfect. It is not clear to me which one of the two is the more indicative one.

Certainly, dictionary size increases very non-linearly with sample text size.

It is only a problem when text lengths are significantly different. That is the case here of course.
Large differences in sample text size are certainly a problem but it can be addressed by subsampling.

You can split each section into chunks of ~1000 vords each (maybe rounded to full pages) to match text length to the smallest and 2nd smallest ones (Astro and Zodiac). This also allows you to test if the differences between sections of different illustration types are significantly larger than within them.
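A minimal sketch of the chunking idea, assuming each section is already available as a list of vords (the `sections` dictionary and the function names are hypothetical, and chunks are cut by token count rather than rounded to full pages):

```python
def chunk_tokens(tokens, size=1000):
    """Split a token list into consecutive chunks of exactly `size` tokens.

    A trailing remainder shorter than `size` is dropped so that every
    chunk has the same length and vocabulary sizes stay comparable.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def vocab_size(tokens):
    """Number of distinct word types in a token list."""
    return len(set(tokens))


def chunk_vocab_sizes(sections, size=1000):
    """Map each section name to the vocabulary sizes of its equal-length chunks."""
    return {
        name: [vocab_size(chunk) for chunk in chunk_tokens(tokens, size)]
        for name, tokens in sections.items()
    }
```

The spread of the values within one section gives a baseline for how much vocabulary size varies by chance at that chunk length, which the differences between sections can then be compared against.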

(05-07-2022, 12:16 PM)Torsten Wrote: If illustrations do indicate topics, some common terms specific to a particular type of illustration or topic should exist. However, such terms do not exist.
This hypothesis implies a simple encoding in which 'vords' always translate 1:1 into plaintext words regardless of context, which is highly questionable considering the strange properties of 'Voynichese'. If it was that easy, the VM would have been deciphered long ago.

But still so far there is little (if any?) evidence of a connection between text and imagery. Please correct me if I'm wrong.
(05-07-2022, 08:26 PM)Bernd Wrote: Large differences in sample text size are certainly a problem but it can be addressed by subsampling.

You can split each section into chunks of ~1000 vords each (maybe rounded to full pages) to match text length to the smallest and 2nd smallest ones (Astro and Zodiac).

Yes, this is a valid approach statistically, but of course it weakens the statistics of the larger sections.
It also removes the meaning of the idea of a 'corpus'.
Certainly, as you said, there is no perfect solution and it's always a trade-off between sample size and sample homogeneity. We need to find a viable balance and test which approach is the lesser evil.

Still I'd be interested if it is possible to distinguish between subsamples and samples of different sections and how much the length of the text fragment matters.
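One possible way to put a number on this is a permutation test on per-chunk vocabulary sizes. This is only a sketch of a standard technique, not something proposed in the thread; the function name and the inputs `a` and `b` (lists of vocabulary sizes for equal-length chunks from two groups) are assumptions.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of means of a and b.

    Returns the share of random relabelings whose absolute mean
    difference is at least as large as the observed one
    (with the usual +1 correction so the result is never zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Running it once on two halves of the same section and once on chunks from two different sections, and repeating this for several chunk lengths, would show how much the fragment length matters.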

Also as with all such experiments I'd like to see a test run with readable 15th century texts of comparable length covering different topics. This should tell us about the resolving power of this setup.
(05-07-2022, 09:45 PM)Bernd Wrote: Still I'd be interested if it is possible to distinguish between subsamples and samples of different sections and how much the length of the text fragment matters.

There are some interesting blog posts about this subject:
[link]
[link]
[link]
(05-07-2022, 08:26 PM)Bernd Wrote: This hypothesis implies a simple encoding in which 'vords' always translate 1:1 into plaintext words regardless of context, which is highly questionable considering the strange properties of 'Voynichese'. If it was that easy, the VM would have been deciphered long ago.
Indeed.

(05-07-2022, 08:26 PM)Bernd Wrote: But still so far there is little (if any?) evidence of a connection between text and imagery. Please correct me if I'm wrong.

To illustrate my point I have used three different colors to mark all instances of vords containing the sequences 'ed' (plum), 'ho' (green), and 'in' (yellow): [link]

The pages are obviously not independent of one another, since pages colored in a similar way tend to be adjacent to one another in the manuscript. However, if we look into the details, the distribution of vords appears much more complicated.

There are at least two kinds of pages using herbal illustrations. There are herbal pages dominated by green + yellow (Currier A).
But there are also herbal pages dominated by plum + yellow (Currier B). However, some herbal pages in Currier A also contain vords colored in plum (see [link]) and herbal pages in Currier B frequently contain vords colored in green. One page even contains a paragraph colored in green + yellow and another paragraph colored in plum + yellow (see [link]).

For the stars section it is even possible to point to pages dominated by vords colored in plum (see [link]) whereas the very next page is dominated by 'yellow' vords. Yet another page within the very same section contains an unusually high number of vords colored in green (see [link]).

Decide for yourself whether this type of change 'suggests that the illustrations in each part of the text are relevant to the linguistic text in each section' [link, p. 303] or whether it is more correct to describe it as: 'No obvious rule can be deduced which words form the top-frequency tokens at a specific location, since a token dominating one page might be rare or missing on the next one' [link, p. 3].
(08-07-2022, 09:05 AM)Torsten Wrote: To illustrate my point I have used three different colors to mark all instances of vords containing the sequences 'ed' (plum), 'ho' (green), and 'in' (yellow): [link]
I consider that 'in', and even 'ain' and 'aiin', are a single character. That explains their abundance.
'ho' is also part of the trigrams 'hor' and 'hol' when they are not separated by a space. A separate analysis is needed for each of the three.
(08-07-2022, 12:37 PM)Juan_Sali Wrote: I consider that 'in', and even 'ain' and 'aiin', are a single character. That explains their abundance.
'ho' is also part of the trigrams 'hor' and 'hol' when they are not separated by a space. A separate analysis is needed for each of the three.

I chose 'in' since it covers 'ain' as well as 'aiin' and 'aiiin'. In the same way, the sequences 'hor' and 'hol' both contain 'ho'. This means the illustration already covers all instances of 'ain', 'aiin', 'hor', and 'hol', and the distribution of colors is still the same:
[link]
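The substring point can be checked with a few lines of code. This is a rough sketch, not the script behind the colored illustration; it assumes an EVA-style transliteration with one list of vords per page, and the names `SEQUENCES` and `sequence_profile` are made up for the example.

```python
from collections import Counter

# The three marker sequences from the post: 'ed' (plum), 'ho' (green), 'in' (yellow).
SEQUENCES = ("ed", "ho", "in")


def sequence_profile(vords, sequences=SEQUENCES):
    """Share of vords on a page that contain each marker sequence.

    Plain substring matching is used, so 'in' also matches 'ain', 'aiin'
    and 'aiiin', and 'ho' also matches 'hor' and 'hol'. A vord containing
    more than one sequence is counted once for each of them.
    """
    counts = Counter()
    for vord in vords:
        for seq in sequences:
            if seq in vord:
                counts[seq] += 1
    return {seq: (counts[seq] / len(vords) if vords else 0.0) for seq in sequences}
```

Calling `sequence_profile` on each page in manuscript order gives one profile per page, so jumps in the dominant sequence from one page to the next can be compared without relying on the colors.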