Volume of overlapping information

From The Voynich Ninja forum (https://www.voynich.ninja), Voynich Research > Analysis of the text (https://www.voynich.ninja/forum-41.html), thread /thread-2327.html.

Volume of overlapping information - Wladimir D - 04-03-2018

I noticed that on bifolio 104v-115r, in sections 2 and 4, an ink was used whose color is not found on other pages. In section 3 the quality of the letters gets worse as the writing proceeds: either the ink in the inkwell was running dry, or the quill was wearing out. Finding such ink supports the view that the text was written before the quires were stitched.

[Image: bifolio 104v-115r]

I decided to intersect the words of the two pages of this bifolio.
V(tx) - the amount of text on the page (number of words).
N - the number of distinct words that occur on both pages.
N(un) - the number of unique words on the page.
N(U) - the number of intersecting words, counting their repetitions on the corresponding page.
It is useful to introduce the term V(∩), the volume of overlapping information: the ratio N(U) / V(tx), expressed as a percentage.
Page   V(tx)   N(un)         N    N(U)   V(∩)
104v   458     64 = 14.0%    93   207    45.2%
115r   444     64 = 14.4%    93   206    46.4%
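A minimal Python sketch of these quantities for one pair of pages, assuming each page is already available as a list of words taken from a transcription (the tokenisation is an assumption, not part of the original method). N(un) is left out because its exact definition is not spelled out here.

```python
from collections import Counter


def overlap_stats(page_x, page_y):
    """Compare two pages given as lists of words.

    Returns V(tx) for each page, N (distinct words found on both pages),
    N(U) for each page (occurrences of those shared words, repeats included)
    and V(∩) = N(U) / V(tx) as a percentage.
    """
    cx, cy = Counter(page_x), Counter(page_y)
    shared = cx.keys() & cy.keys()        # distinct words present on both pages
    nu_x = sum(cx[w] for w in shared)     # N(U) on page X, counting repeats
    nu_y = sum(cy[w] for w in shared)     # N(U) on page Y, counting repeats
    vx, vy = len(page_x), len(page_y)     # V(tx): number of words on each page
    return {
        "V(tx)": (vx, vy),
        "N": len(shared),
        "N(U)": (nu_x, nu_y),
        "V(∩)": (100 * nu_x / vx, 100 * nu_y / vy),
    }


# Toy "pages" with made-up words, not real transcription data.
x = "daiin qokal chedy daiin ol shedy".split()
y = "qokal qokal daiin otedy ol ol ol".split()
print(overlap_stats(x, y))
```

V(∩) is computed separately for each page, so one pair of pages yields two values, as in the tables.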
The other side of this bifolio (104r-115v) gives lower results.
Page   V(tx)   N(un)         N    N(U)   V(∩)
104r   438     74 = 16.9%    80   158    36.1%
115v   399     74 = 18.5%    80   137    34.3%

Other page pairs:
Pages       V(tx)     N    N(U)      V(∩)
114v/115r   362/444   74   133/160   36.7/36.0%
104v/105r   458/370   60   142/120   31.0/32.4%
103v/104r   450/438   67   196/147   43.6/33.5%
104r/104v   438/458   73   153/178   34.9/38.9%
The results are even higher for bifolio 78v-81r, where the left and right pages are connected by pipes.
Page   V(tx)   N(un)         N    N(U)   V(∩)
78v    292     13 = 4.45%    53   157    53.8%
81r    207     14 = 6.76%    53   112    54.1%

The result is also high, though slightly lower, for bifolio 79v-80r, through which the stitching passes.
Page   V(tx)   N(un)         N    N(U)   V(∩)
79v    355     35 = 9.86%    66   169    47.6%
80r    441     44 = 10.0%    66   225    51.0%

I made about 40 comparisons out of Pn = n! = 204! possible ones.
The lowest V(∩) values I obtained:

Pages      V(tx)     N    N(U)    V(∩)
1r/104v    210/458   28   58/56   27.6/12.2%
49v/104v   142/458   22   42/46   29.6/10.0%
But here the difference in V(tx) between the two pages can have a strong influence.

RE: Volume of overlapping information - Anton - 04-03-2018

That's an interesting approach.

Basically, V(∩) is the number of vords of page X that do occur in page Y, divided by the volume of page X, right?

I think the total number of comparisons will not be n!, but n(n-1).

I wonder if it can be automated... well, surely it can... Rather, I wonder whether anyone could run the check.

At least it would be interesting to see the picture for the botanical folios within a single quire, as a kind of screening test of whether it shows anything.

I think I could write a MATLAB script if given raw transcriptions, but not before July, since I'm desperately busy until then.
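Automating the check is straightforward. Below is a minimal sketch in Python rather than MATLAB, assuming the pages of one quire are already lists of words in a dict keyed by folio name (the folio names and words are placeholders). It runs every unordered pair of pages and reports V(∩) in both directions.

```python
from collections import Counter
from itertools import combinations


def v_overlap(page_x, page_y):
    """V(∩) of page X against page Y, in percent: occurrences on X of words
    that also occur on Y, divided by the number of words on X."""
    cx, cy = Counter(page_x), Counter(page_y)
    shared = cx.keys() & cy.keys()
    return 100 * sum(cx[w] for w in shared) / len(page_x)


def pairwise_table(pages):
    """Compare every unordered pair of pages.

    Each of the n*(n-1)/2 pairs yields two values (X against Y and Y against X),
    so the table holds n*(n-1) numbers in total.
    """
    rows = []
    for a, b in combinations(sorted(pages), 2):
        rows.append((a, b, v_overlap(pages[a], pages[b]), v_overlap(pages[b], pages[a])))
    # Highest mean overlap first, so that unusual pairs stand out.
    rows.sort(key=lambda r: (r[2] + r[3]) / 2, reverse=True)
    return rows


# Placeholder quire: folio names and words are illustrative only.
quire = {
    "f1v": "daiin chol chor shy".split(),
    "f2r": "chol daiin qokal".split(),
    "f2v": "otedy qokal chor".split(),
}
for a, b, vab, vba in pairwise_table(quire):
    print(f"{a}/{b}  V(∩) = {vab:.1f}/{vba:.1f}%")
```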

RE: Volume of overlapping information - Wladimir D - 04-03-2018

Anton wrote: Basically, V(∩) is the number of vords of page X that do occur in page Y, divided by the volume of page X, right?

The number of occurrences of a matching word can differ from page to page. For example, for 79v/80r, qokal occurs 2 and 8 times respectively (2/8). The occurrences on each page should be summed and then divided by the amount of text on that page.

RE: Volume of overlapping information - Wladimir D - 06-03-2018

I'm not a programming specialist. If someone is interested in my idea, feel free to use it.
I see the procedure as follows.
First, find V(∩) for the pages within each Quire. At the same time, build a ranking of the overlapping words within each Quire (not counting repetitions on a particular page).
Then compare the rankings of Quires 1 (excluding 1r), 2, 3, 4, 5, 6, 7, 15, 17 and 19 with the ranking of Quire 9.
If we are right that Quire 9 is an astronomical section, then the words shared by the compared rankings cannot have meanings such as leaf, root, stem, bud, ..., juice, nectar, ...
Conversely, shared words with high rankings may have meanings that fall into several groups:
1 / punctuation marks
2 / pronouns / prepositions
3 / nouns describing the environment - water, fire, earth, air, mountains, ...., life, death ...
4 / adjectives: large, small, medium, ..., colors - blue, red, yellow, ..., star, ....
5 / verbs: grow (spread), be, ...
6 / ......
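The ranking step could be sketched as follows. This is one possible reading, assumed here: each word scores the number of pages of the quire it occurs on (repetitions within a page ignored), only words on at least two pages count as overlapping, and two rankings are compared by intersecting their top entries. The cutoff `top` and all page data are hypothetical.

```python
from collections import Counter


def quire_ranking(pages):
    """Rank the overlapping words of one quire.

    `pages` maps a page name to its list of words. Each word scores the number
    of pages it occurs on (repeats within a page are not counted), and only
    words found on at least two pages are kept.
    """
    page_counts = Counter()
    for words in pages.values():
        page_counts.update(set(words))  # set() ignores repeats on the same page
    return Counter({w: c for w, c in page_counts.items() if c >= 2})


def common_top_words(ranking_a, ranking_b, top=50):
    """Words that appear among the `top` highest-ranked words of both rankings."""
    top_a = {w for w, _ in ranking_a.most_common(top)}
    top_b = {w for w, _ in ranking_b.most_common(top)}
    return top_a & top_b


# Placeholder data: two tiny "quires" with made-up pages.
herbal = {"p1": "daiin chol chol qokal".split(),
          "p2": "daiin chol shy".split(),
          "p3": "qokal daiin".split()}
astro = {"p1": "daiin otedy okal".split(),
         "p2": "otedy daiin".split(),
         "p3": "okal ar".split()}

herbal_rank = quire_ranking(herbal)
astro_rank = quire_ranking(astro)
print(herbal_rank.most_common())
print(common_top_words(herbal_rank, astro_rank, top=3))
```

To compare several herbal quires with Quire 9 at once, their page dicts could be merged before ranking.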

P.S. Anton, we were both mistaken.

When comparing pages two at a time, k = 2:
$$C_n = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\,(n-k)!}{k!\,(n-k)!} = \frac{n(n-1)}{1 \cdot 2} > 20000$$
(for n = 204 this gives 20,706).
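A quick check of the counts discussed here: `math.comb` gives the number of unordered page pairs, n(n-1) gives the number of ordered pairs (V(∩) of X against Y differs from Y against X), and the number of digits of n! is shown only to give a sense of its size. The value n = 204 is the page count used earlier in the thread.

```python
import math

n = 204                                 # page count used in the thread
unordered = math.comb(n, 2)             # distinct page pairs: n(n-1)/2
ordered = n * (n - 1)                   # X vs Y and Y vs X counted separately
digits = len(str(math.factorial(n)))    # size of n!, for comparison

print(unordered, ordered, digits)       # 20706 41412 385
```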

RE: Volume of overlapping information - Davidsch - 06-03-2018

Let me start by saying: the idea is a good one, but:

>>N - is the number of identical words.
Are you saying that the number of words found on both of the compared pages is only N?
So if you compare 104v and 115r, only 93 words are the same?

I don't have my data at hand, but it seems very low. What is a 'word' for you: 2 characters long, 1 character, or...?
And do you count among the unique words of the whole MS, or among the unique words per page? (I assume the latter, because it gives the lowest number.)

The best thing you could do is:
Take all unique words in the MS; call that group the MS-uniques.
Then compare each page or section against that group,
and make a weighted formula for the number of those uniques found on the page (the page-uniques)
relative to the page's share of the total text volume of the MS.
Make a graph of that.
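The weighted formula is only described loosely, so here is a sketch of one possible reading, assumed rather than stated: for each page, (distinct words on the page / distinct words in the whole MS) divided by (words on the page / words in the whole MS). The graph is left out because it would need a plotting library; the page data are made up.

```python
def weighted_vocab_share(ms_pages):
    """Hypothetical reading of the weighted measure.

    For each page: (distinct words on the page / distinct words in the MS)
    divided by (words on the page / words in the MS). Values above 1 mean the
    page covers more of the MS vocabulary than its share of the text suggests.
    """
    ms_vocab = set()
    ms_tokens = 0
    for words in ms_pages.values():
        ms_vocab.update(words)
        ms_tokens += len(words)

    scores = {}
    for name, words in ms_pages.items():
        if not words:                   # skip empty pages to avoid dividing by zero
            continue
        vocab_share = len(set(words)) / len(ms_vocab)
        text_share = len(words) / ms_tokens
        scores[name] = vocab_share / text_share
    return scores


# Made-up two-page "manuscript"; a real run would use every page of the MS.
pages = {"a": "x y z".split(), "b": "x x x y".split()}
for name, score in weighted_vocab_share(pages).items():
    print(name, round(score, 3))
```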

I expect this will not reveal much, because the ratio of MS-uniques to page-uniques will be fairly consistent overall.
However, if you build it cleanly and present it well, it will show the differences Currier talked about.

>>204!

You may not realize it, but the value of that is:
1326057243693621217332951116794432419966106079326253461128571361896219220627592813941309056267893882766353579962753961259654835090125299420645464567948176773341405929566742723734451284877728373946503737685045475883370067891849358707528438515087100654429082243182025720274900300149415790281766550837429303812722914834538921053721492193280000000000000000000000000000000000000000000000000

RE: Volume of overlapping information - Wladimir D - 06-03-2018

(06-03-2018, 04:33 PM) Davidsch wrote: Let me start by saying: the idea is a good one, but:

>>N - is the number of identical words.
Are you saying that the number of words found on both of the compared pages is only N?
So if you compare 104v and 115r, only 93 words are the same?

I don't have my data at hand, but it seems very low. What is a 'word' for you: 2 characters long, 1 character, or...?
And do you count among the unique words of the whole MS, or among the unique words per page? (I assume the latter, because it gives the lowest number.)

Order of comparison for 104v/115r, with the number of occurrences on each page: 1) orar (1/1), 2) opchedy (2/1), 3) aiir (2/1), ……, 92) okeey (3/1), 93) ycheeo (1/1).
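A listing of this kind can be produced mechanically. The sketch below assumes, since the thread does not say so, that the shared words are listed in order of first appearance on the first page and that (a/b) are occurrence counts on the two pages. The sample words are made up to mimic the first three entries.

```python
from collections import Counter


def shared_word_list(page_x, page_y):
    """Shared words in order of first appearance on page X, each written as
    word(count on X/count on Y). The length of the list is N."""
    cx, cy = Counter(page_x), Counter(page_y)
    # dict.fromkeys keeps the first-appearance order while dropping repeats
    return [f"{w}({cx[w]}/{cy[w]})" for w in dict.fromkeys(page_x) if w in cy]


x = "orar opchedy aiir opchedy aiir daiin".split()
y = "aiir orar opchedy qokal".split()
for i, entry in enumerate(shared_word_list(x, y), start=1):
    print(f"{i}) {entry}")
```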

RE: Volume of overlapping information - Wladimir D - 10-03-2018

Here is what V(∩) looks like for 78v-81r.
The words are divided into several groups: **cdy, **ccdy, ****y, ****n, **ar/or, and words containing ol/al. Unique words are highlighted in white.

[Image: 78v-81r]
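A hypothetical sketch of sorting words into these highlight groups. It reads the asterisks as "any prefix" and checks the patterns in the listed order (so -ccdy is tested before -cdy and -y); both readings are assumptions. The groups serve only as a coloring aid and play no part in computing V(∩).

```python
import re

# Assumed reading of the groups: patterns are checked in this order, so a word
# ending in -ccdy is not also counted under -cdy or -y.
GROUPS = [
    ("-ccdy", re.compile(r"ccdy$")),
    ("-cdy", re.compile(r"cdy$")),
    ("-y", re.compile(r"y$")),
    ("-n", re.compile(r"n$")),
    ("-ar/-or", re.compile(r"[ao]r$")),
    ("ol/al", re.compile(r"[ao]l")),   # "containing", so not anchored
]


def color_group(word):
    """Return the label of the first matching group, or 'other'."""
    for label, pattern in GROUPS:
        if pattern.search(word):
            return label
    return "other"


for w in ["qoccdy", "chedy", "daiin", "shor", "qokal", "sh"]:
    print(w, color_group(w))
```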

Interestingly, the set of words in the 78v-81r intersection is characteristic not only of Quire 13 but also of 103r, 108r, 108v and 116r.

Thus, counting only the words that belong to the 78v-81r intersection, 81r/75r gives V(∩) = 53.9%, while the full intersection of 81r/75r gives V(∩) = 58.7/56.5%.
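The restricted figure can be sketched by limiting the shared words to a given vocabulary, here assumed to be the words common to 78v and 81r. The page contents are placeholders, and the thread does not state which page plays the role of X in the 53.9% figure.

```python
from collections import Counter


def v_overlap(page_x, page_y, vocabulary=None):
    """V(∩) of page X against page Y, in percent. If `vocabulary` is given,
    only shared words that also belong to it are counted."""
    cx, cy = Counter(page_x), Counter(page_y)
    shared = cx.keys() & cy.keys()
    if vocabulary is not None:
        shared &= set(vocabulary)       # keep only words in the restricted set
    return 100 * sum(cx[w] for w in shared) / len(page_x)


# Placeholder pages standing in for 78v, 81r and 75r.
f78v = "chol daiin qokal".split()
f81r = "qokal qokal daiin shedy otedy".split()
f75r = "qokal shedy ol".split()

vocab = set(f78v) & set(f81r)           # words of the 78v/81r intersection
print(v_overlap(f81r, f75r, vocab), v_overlap(f81r, f75r))
```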

RE: Volume of overlapping information - Davidsch - 11-03-2018

That is extremely high, but again: it depends very much on the chosen 'groups' or 'words'. If you use short groups such as or and ol, of course there is a 50% or higher overlap.
On the other hand, it is unclear what you are calculating: if you take 98% of all groups, find them on one page and compare them with the groups on another page,
the difference you get is simply the difference in how those groups are counted.

RE: Volume of overlapping information - Wladimir D - 11-03-2018

(11-03-2018, 05:00 PM) Davidsch wrote: That is extremely high, but again: it depends very much on the chosen 'groups' or 'words'. If you use short groups such as or and ol, of course there is a 50% or higher overlap.
On the other hand, it is unclear what you are calculating: if you take 98% of all groups, find them on one page and compare them with the groups on another page,
the difference you get is simply the difference in how those groups are counted.

I'm looking for the intersection of whole words. The groups I mentioned are only for color highlighting. If you want to see all the intersecting words, follow the link to the Word file; it contains the complete list of words.