The Voynich Ninja
Common vocabulary between pages - Printable Version



RE: Common vocabulary between pages - nablator - 01-09-2019

(01-09-2019, 03:54 PM)Koen G Wrote: Doesn't that look like it's not normalized?

It's not. :)

a = number of word types on one page
b = number of word types on the other page
c = number of word types common to both pages

pct = 100*c/(a+b-c)

Maybe averaging c/a and c/b would give a better indicator of how much the pages have in common, giving equal importance to both pages?

pct = 50*(c/a+c/b)

EDIT: no, the results are visually very close.
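
As a minimal sketch of the two formulas above, assuming each page is available as a list of word tokens: the first formula is the Jaccard index of the two type sets written as a percentage, and the second is the mean of the two one-sided overlaps. The function names and the placeholder tokens are illustrative only and do not come from the spreadsheet.

```python
def type_counts(page_a, page_b):
    """Return (a, b, c): types on each page and types common to both."""
    types_a, types_b = set(page_a), set(page_b)
    return len(types_a), len(types_b), len(types_a & types_b)


def pct_union(a, b, c):
    """First formula: common types as a percentage of all distinct types."""
    return 100 * c / (a + b - c)


def pct_mean(a, b, c):
    """Second formula: average of c/a and c/b, as a percentage."""
    return 50 * (c / a + c / b)


# Placeholder tokens, not real transliteration data.
page_1 = ["w1", "w2", "w3", "w2", "w4"]
page_2 = ["w2", "w4", "w5", "w5"]

a, b, c = type_counts(page_1, page_2)
print(a, b, c)                                  # 4 3 2
print(round(pct_union(a, b, c), 2))             # 40.0
print(round(pct_mean(a, b, c), 2))              # 58.33
```

Both functions assume non-empty pages; an empty page would cause a division by zero and needs separate handling.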

   


RE: Common vocabulary between pages - Koen G - 01-09-2019

I have a really hard time wrapping my head around how this could work. 

When one page has 50 types and the other has 50 as well, c can be at most 50.
But when one page has 90 types and the other has 10, c can be at most 10.

So ideally I think we should strive for a situation where the maximum possible c value results in the same score in both cases.
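
The two cases above can be checked numerically. This is only a sketch: it plugs the largest possible c, which is min(a, b), into the two formulas posted earlier in the thread. The function names are illustrative.

```python
def pct_union(a, b, c):
    """100*c/(a+b-c): common types over all distinct types."""
    return 100 * c / (a + b - c)


def pct_mean(a, b, c):
    """50*(c/a+c/b): mean of the two one-sided overlaps."""
    return 50 * (c / a + c / b)


for a, b in [(50, 50), (90, 10)]:
    c_max = min(a, b)  # the smaller page limits how many types can be shared
    print(a, b, c_max,
          round(pct_union(a, b, c_max), 1),
          round(pct_mean(a, b, c_max), 1))
# 50 50 50 100.0 100.0
# 90 10 10 11.1 55.6
```

With equal page sizes both formulas reach 100 at maximum overlap. With 90 and 10 types, neither does, although the averaged formula gets closer.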

I did some tests with both formulas, and your second formula comes closer to the desired numbers.

   

Edit: Nablator, I only just saw your edit. Even though the colors are close, your second attempt should still be better. But it's not yet optimal.


RE: Common vocabulary between pages - Koen G - 01-09-2019

Is there a way to derive a formula from a bunch of examples and desired outcome?


RE: Common vocabulary between pages - nablator - 01-09-2019

To get the values of the DESIRED column, you want 100 * c / min(a, b). It's a good formula.
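
A short sketch of this formula, which expresses c as a percentage of its largest possible value min(a, b). Without the factor of 100 it is commonly called the overlap coefficient. The function name and the sample counts are illustrative and are not the values from the DESIRED column.

```python
def pct_min(a, b, c):
    """100*c/min(a, b): common types as a percentage of the smaller page."""
    return 100 * c / min(a, b)


print(pct_min(50, 50, 50))  # 100.0  (equal pages, full overlap)
print(pct_min(90, 10, 10))  # 100.0  (small page fully contained in large one)
print(pct_min(90, 10, 5))   # 50.0   (half of the small page is shared)
```

Full overlap now scores 100 regardless of how different the page sizes are. The trade-off is that the size of the larger page no longer affects the score at all.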


RE: Common vocabulary between pages - Koen G - 01-09-2019

(01-09-2019, 07:31 PM)nablator Wrote: To get the values of the DESIRED column, you want 100 * c / min(a, b). It's a good formula.
Could you make the sheet with that one?


RE: Common vocabulary between pages - nablator - 01-09-2019

(01-09-2019, 07:47 PM)Koen G Wrote:
(01-09-2019, 07:31 PM)nablator Wrote: To get the values of the DESIRED column, you want 100 * c / min(a, b). It's a good formula.
Could you make the sheet with that one?

Is it better? They all look nearly the same to me.

   


Attachment: commonTypesMinRatioGradientGreenRedZL2HerbalB.xlsx (Size: 16.72 KB)

Attachment: commonTypesMinRatioGradientGreenRedZL2.xlsx (Size: 289.81 KB)


RE: Common vocabulary between pages - Koen G - 01-09-2019

It does look the same. Still, I think this formula normalizes the data as well as we can? Basically, it expresses c as a percentage of the maximum possible c.


RE: Common vocabulary between pages - Koen G - 01-09-2019

It does change the order of folios when you sort them, though. For example, if I sort by f40r, the first sheet gives as its highest hits: [link not shown], f94r, [link not shown].

For the new one it is: [link not shown], f39r, [link not shown].

So nothing major, just some shuffling. But it's better.
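
To illustrate why the sort order can shift between sheets, here is a small sketch that ranks candidate pages against one reference page under the first formula and under the min-based formula. The page labels and type counts are invented for illustration and are not taken from the spreadsheets.

```python
def pct_union(a, b, c):
    """100*c/(a+b-c): common types over all distinct types."""
    return 100 * c / (a + b - c)


def pct_min(a, b, c):
    """100*c/min(a, b): common types as a percentage of the smaller page."""
    return 100 * c / min(a, b)


def rank(ref_types, candidates, score):
    """Sort candidate labels by descending score against the reference page."""
    return sorted(
        candidates,
        key=lambda label: score(ref_types, *candidates[label]),
        reverse=True,
    )


# Hypothetical data: reference page has 60 types; each candidate is
# (number of types on that page, number of types shared with the reference).
ref_types = 60
candidates = {"page_X": (120, 30), "page_Y": (20, 15), "page_Z": (60, 25)}

print(rank(ref_types, candidates, pct_union))  # ['page_Z', 'page_Y', 'page_X']
print(rank(ref_types, candidates, pct_min))    # ['page_Y', 'page_X', 'page_Z']
```

In this made-up example the short page_Y moves to the top under the min-based formula, because most of its few types are shared even though the absolute count is small.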


RE: Common vocabulary between pages - RobGea - 05-09-2019

This may be of use:
[link not shown]


RE: Common vocabulary between pages - DonaldFisk - 06-09-2019

(31-08-2019, 09:23 PM)Koen G Wrote: Has anyone collected data on which pages share most vocabulary? Specifically, I'd like to know when I select a page (for example f1v), which other folios share the most different word types with it?

I did this using Principal Component Analysis for the whole manuscript in [link not shown], and for herbal, text/recipe, and biological pages in [link not shown].