The Voynich Ninja
Are perfect-reduplication and quasi-reduplication related? - Printable Version

+--- Thread: Are perfect-reduplication and quasi-reduplication related? (/thread-3107.html)



RE: Are perfect-reduplication and quasi-reduplication related? - Helmut Winkler - 21-02-2020

Even more common is the repetition of etc (etcetera) three or four times on the last line of a chapter or section without any follow-up, especially in the 15th c. (sorry, I have no digitized example)


RE: Are perfect-reduplication and quasi-reduplication related? - -JKP- - 21-02-2020

Helmut, yes, that's another example.

I probably have clips but no time to hunt them up right now. Sometimes EVA-g was repeated several times at the ends of paragraphs for the same reason.


RE: Are perfect-reduplication and quasi-reduplication related? - davidjackson - 21-02-2020

Not exactly. I'm saying that %red ~ %quasi/(n.Quasi/n.Red).

If n.Quasi and n.Red are independent variables (one being the count of perfect reduplications, the other the count of vords with edit distance = 1), then this shouldn't happen? Or am I being thick?

(21-02-2020, 02:40 PM)MarcoP Wrote: I found that the two % columns have a high correlation: 82%. Could this be an element in favour of the two phenomena being related?
How did you calculate the 82% rate?


RE: Are perfect-reduplication and quasi-reduplication related? - MarcoP - 22-02-2020

(21-02-2020, 08:43 PM)davidjackson Wrote: Not exactly. I'm saying that %red ~ %quasi/(n.Quasi/n.Red).

If n.Quasi and n.Red are independent variables (one being the count of perfect reduplications, the other the count of vords with edit distance = 1), then this shouldn't happen? Or am I being thick?

Hi David,
as I said in the previous post, %red is proportional to n.Red. Your equation connects %red and n.Red, which indeed are not independent.

Let's call C the total number of couples in a sample:

(a1) %red = n.Red/C
(a2) %quasi = n.Quasi/C

(b) the right side of your equation
%quasi/(n.Quasi/n.Red)

(c) we can multiply numerator and denominator by n.Red/n.Quasi
%quasi/(n.Quasi/n.Red) = %quasi*n.Red/n.Quasi

(d) we can replace %quasi according to a2
%quasi*n.Red/n.Quasi = (n.Quasi/C)*n.Red/n.Quasi =
(n.Quasi*n.Red)/(n.Quasi*C) = n.Red/C = %red

So, yes, I think that (b) is equivalent to %red.
%quasi/(n.Quasi/n.Red)=%red

Does this make sense? Am I getting something wrong?

(21-02-2020, 08:43 PM)davidjackson Wrote:
(21-02-2020, 02:40 PM)MarcoP Wrote: I found that the two % columns have a high correlation: 82%. Could this be an element in favour of the two phenomena being related?
How did you calculate the 82% rate?

I am sorry, I messed up something here: thank you for pointing that out. The correct correlation is 0.85 (I used the OpenOffice CORREL function to compute this). Does this seem correct to you?


RE: Are perfect-reduplication and quasi-reduplication related? - davidjackson - 22-02-2020

I think my misunderstanding is in how you calculated NRed and NQuasi. Aren't these numbers supposed to be independent counts of different things? In your first post you said
(20-02-2020, 04:21 PM)MarcoP Wrote: By perfect-reduplication, I mean the exact consecutive repetition of the same word: e.g. daiin.daiin

By quasi-reduplication, I mean two consecutive words that are very similar to each other: e.g. qokedy.qokey


So I assumed that Nred and Nquasi are two independent counts that don't have anything to do with one another (i.e. the number of times they both appear in the corpus).
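To make the two definitions concrete, here is a minimal Python sketch of how the two counts could be produced. It assumes that "very similar" means a Levenshtein edit distance of exactly 1 (as mentioned earlier in the thread) and that words are separated by the EVA "." character; the function names are hypothetical and this is not MarcoP's actual code.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def count_couples(words):
    """Return (C, n_red, n_quasi) over all consecutive word couples."""
    c = n_red = n_quasi = 0
    for w1, w2 in zip(words, words[1:]):
        c += 1
        d = edit_distance(w1, w2)
        if d == 0:
            n_red += 1      # perfect reduplication, e.g. daiin.daiin
        elif d == 1:
            n_quasi += 1    # quasi-reduplication, e.g. qokedy.qokey
    return c, n_red, n_quasi


# Toy line built from the two examples in the opening post.
sample = "daiin.daiin.qokedy.qokey.chol".split(".")
c, n_red, n_quasi = count_couples(sample)
print(c, n_red, n_quasi)
```

Under this reading a couple is counted as either perfect or quasi, never both, so the two counts share only the denominator C.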

But they are percentages of a total count number C which is always >nRed<nQuasi. So NQuasi is always a fixed multiple of NRed (although the fixed multiple changes from section to section).

It doesn't matter what n.Quasi is set to, the ratio is always the same. You could set n.Quasi to 5000 in _Herbal_B, the equation %red = %quasi/(n.Quasi/n.Red) is always true.

It's easy to test: if %red = %quasi/(n.Quasi/n.Red), then with your _Herbal_B data (with a value of C calculated to be 32.53, which is the closest I could get):

C       NRed   %Red       NQuasi   %Quasi
32.53   20     0.612      63       1.928
               20/32.53            63/32.53

so 0.612 = 1.928/(63/20), or to put it another way NQuasi/C = (NRed/C)*(NQuasi/NRed) | 63 = 20*3.15


They are not independent counts of different things, as I first assumed.


Quote:Let's call C the total number of couples in a sample:

(a1) %red = n.Red/C
(a2) %quasi = n.Quasi/C


(What is the value of C for each row? It's not in your first count table.)

Right, reading this post again it doesn't make much sense!!!! My real question:

- What is C for each section? And how do the counts N.red and N.quasi relate to it?

BTW, my OpenOffice gives 0.476 for CORREL(%red;%quasi) using your data. I attach the spreadsheet here:
marcoP_quick_test.ods


RE: Are perfect-reduplication and quasi-reduplication related? - MarcoP - 22-02-2020

Hi David,
thank you very much for checking these numbers: this is quite helpful to me! You already helped me correct one error, and it is quite possible it was not the only one.

%Red and %Quasi are independent in principle, but they appear to be correlated in the VMS.
I added the C values (number of word-couples) for each row here. I also add MATTR200 which, as Koen pointed out, is significantly anti-correlated with %Red. For MATTR, I used a slightly modified version of Nablator's java class. 

Obviously, reduplication contributes to lower MATTR, but the instances of reduplication, however numerous, are not enough to have a great impact: I have to reflect more on this. Of course, MATTR should be totally independent of quasi-reduplication, but in the VMS both appear to be correlated with reduplication and (somewhat more weakly) with each other.
I would also like to test with scrambled versions as suggested by Koen.
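For readers unfamiliar with the measure, the following is a rough Python sketch of MATTR (moving-average type-token ratio), assuming the usual definition: the mean, over every window of W consecutive words, of the number of distinct words divided by W. It is not Nablator's Java class, whose details may differ; the function name and the fallback for short texts are assumptions.

```python
def mattr(words, window=200):
    """Mean type-token ratio over all sliding windows of `window` words.

    Falls back to the plain type-token ratio when the text is not
    longer than one window.
    """
    n = len(words)
    if n == 0:
        return 0.0
    if n <= window:
        return len(set(words)) / n
    ratios = [len(set(words[i:i + window])) / window
              for i in range(n - window + 1)]
    return sum(ratios) / len(ratios)


# Tiny illustration with a window of 2: every reduplicated couple
# yields a window with a ratio of 0.5 instead of 1.0.
toy = ["a", "a", "b", "b", "c", "c"]
print(mattr(toy, window=2))
```

With a window of 200, a single reduplicated couple lowers the ratio of each window containing it by at most 1/200, which is consistent with the remark that reduplication alone cannot move MATTR very far.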

At the bottom of the table, I include an example for a simple synthetic file (attached): an extreme case that could help clarify things.

              C  N.Red    %Red  N.Quasi  %Quasi  MATTR200
Herbal_B   3268     20   0.612       63   1.928     0.767
AstroCZ    1756     11   0.626       24   1.367     0.776
Pharma     2167     16   0.738       44   2.030     0.752
StarsQ20  10091     75   0.743      223   2.210     0.771
Herbal_A   7496     68   0.907      180   2.401     0.752
BioQ13     6347     66   1.040      163   2.568     0.654
-------------------
50_redup    400    200  50.000       40  10.000     0.050
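The correlation discussed in this thread can be re-derived from the table without a spreadsheet. The sketch below computes a plain Pearson coefficient over the six VMS rows (the synthetic 50_redup row is left out); it is assumed, not confirmed, that this matches what OpenOffice's CORREL does.

```python
from math import sqrt

# %Red and %Quasi for Herbal_B, AstroCZ, Pharma, StarsQ20, Herbal_A, BioQ13
pct_red = [0.612, 0.626, 0.738, 0.743, 0.907, 1.040]
pct_quasi = [1.928, 1.367, 2.030, 2.210, 2.401, 2.568]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(pct_red, pct_quasi)
print(f"r = {r:.4f}")
```

With only six sections, a single mistyped cell changes the coefficient considerably, so it is worth checking every input value before comparing results.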


(22-02-2020, 09:09 AM)davidjackson Wrote: the equation %red = %quasi/(n.Quasi/n.Red) is always true.

Yes, the equation is always true.
Since %quasi/N.Quasi is 1/C, you can rewrite the equation as
%red=n.Red/C, which is true by definition.
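A quick numerical check of this point, as a hedged sketch: the counts are taken from the table in this post, while the function names are made up for the illustration. Because %quasi is itself computed from n.Quasi, changing n.Quasi changes both the numerator and the denominator of the right-hand side, and the result is always n.Red/C.

```python
def percent(count, total):
    """Share of `count` in `total`, expressed as a percentage."""
    return 100.0 * count / total


def identity_gap(c, n_red, n_quasi):
    """Difference between %red and %quasi/(n.Quasi/n.Red)."""
    pct_red = percent(n_red, c)
    pct_quasi = percent(n_quasi, c)
    return pct_red - pct_quasi / (n_quasi / n_red)


# (C, N.Red, N.Quasi) for each section of the table
rows = {
    "Herbal_B": (3268, 20, 63),
    "AstroCZ": (1756, 11, 24),
    "Pharma": (2167, 16, 44),
    "StarsQ20": (10091, 75, 223),
    "Herbal_A": (7496, 68, 180),
    "BioQ13": (6347, 66, 163),
}
for name, (c, n_red, n_quasi) in rows.items():
    print(name, abs(identity_gap(c, n_red, n_quasi)) < 1e-9)

# Arbitrary n.Quasi values for Herbal_B: the gap stays at zero
# (up to floating-point rounding) because %quasi moves with n.Quasi.
for n_quasi in (63, 500, 5000):
    print(n_quasi, abs(identity_gap(3268, 20, n_quasi)) < 1e-9)
```

The identity therefore carries no information about whether the two phenomena are related; only the correlation across sections does.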


(22-02-2020, 09:09 AM)davidjackson Wrote: BTW, my OpenOffice gives 0.476 for CORREL(%red;%quasi) using your data. I attach the spreadsheet here:

Please, check the value of C4. Shouldn't it be 0.738 to match my data?


RE: Are perfect-reduplication and quasi-reduplication related? - Koen G - 22-02-2020

Thank you Marco, I like these kinds of tests since they bring us closer to the core of Voynichese issues.

I once tested whether reduplication impacts larger-window TTR by simply removing duplicated words within a short distance. Small-window TTR went way up while larger windows felt little effect. So as you say, reduplication (and even things like a.b.a patterns) are not enough to explain TTR differences.

Your experiment with shuffled texts might teach us more about the relation between TTR and reduplication.


RE: Are perfect-reduplication and quasi-reduplication related? - davidjackson - 22-02-2020

(22-02-2020, 11:14 AM)MarcoP Wrote: Please, check the value of C4. Shouldn't it be 0.738 to match my data?

Whoops! Typo Alert! Sorry.

Thank you for the example file and for providing the C values.



Why are you multiplying %red and %quasi by 100? 66 occurrences in a corpus of 6347 is 0.0139%, not 1.04%.
But yes I now get CORREL to 0.8535.

Calculating the associated p-value (because I can), we get a p-value of .030621 (the result is significant at p < .05).
   

The coefficient of determination is 0.7285.
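For anyone who wants to reproduce these two figures, here is a small Python sketch. It assumes the standard two-tailed t-test for a Pearson coefficient, with r = 0.8535 and n = 6 sections (so 4 degrees of freedom). The closed-form CDF used below is valid only for exactly 4 degrees of freedom; the function names are hypothetical, and the tool actually used for the numbers above is not stated.

```python
from math import sqrt


def t_statistic(r, n):
    """t value for testing whether a Pearson r from n pairs differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def two_tailed_p_df4(t):
    """Two-tailed p-value for Student's t with exactly 4 degrees of freedom."""
    t = abs(t)
    u = 1 + t * t / 4
    cdf = 0.5 + 0.375 * (t / sqrt(u)) * (1 - (t * t / u) / 12)
    return 2 * (1 - cdf)


r, n = 0.8535, 6
t = t_statistic(r, n)
p = two_tailed_p_df4(t)
r_squared = r * r  # coefficient of determination

print(f"t = {t:.3f}, p = {p:.4f}, r^2 = {r_squared:.4f}")
print("significant at 0.05:", p < 0.05)
```

With only six data points the test has little power, so the result is better read as a hint than as proof of a link.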

What does it all mean? No idea.
YES, it suggests that reduplication and quasi-reduplication are linked. But I can think of no logical reason why. Nor do I now really think that we have excluded the possibility of dittography. It may simply be that the number of instances is so low that they appear to be statistically linked, when the real reason is that occurrences of both lie within the realms of possibility.
I.e., it's not unlikely that the scribe fell into the trap of reduplicating words, and semi-reduplicating words, as he (they) went along.

I still think it is worth investigating the number of occurrences by scribe, were such a thing possible.


RE: Are perfect-reduplication and quasi-reduplication related? - -JKP- - 22-02-2020

I finally have a few minutes to really think about this question.


I agree that these are duplicates, particularly if they are adjacent -> daiin daiin. They are apparently similar. Whether it's "duplication" in the strictest sense of being a repeat (rather than two instances with shades of meaning), I need to think about some more. In English we can say, "He had had a bad day," and it has a different grammatical meaning from, "He had a bad day." It's duplication in terms of letters, but not in terms of grammar.


I am not completely comfortable with the word quasi-duplication (you don't need the "re-" part, as it is redundant) when referring to VMS text because it implies an intent to duplicate when, in fact, two things can be almost the same and yet have no such intention. It might need to be defined more specifically.

For example, in a sentence like "The dog dug a hole," dog and dug only differ by one letter, but there's no intent to duplicate and the similarity is coincidental.

I can see that quasi-duplication might be an appropriate term for the example you posted in your opening post where obviously the intention is the same, even though the words are spelled differently (e.g., blan and blanc). We can see by context that they refer to the same thing.

But, in contrast to natural-language examples, we cannot discern intention in the VMS until we know more about it. We can observe similarity, but we need more evidence to know whether it's quasi-duplication (or quasi-triplication or quasi-replication if there are more than two that appear to be similar).


RE: Are perfect-reduplication and quasi-reduplication related? - MarcoP - 23-02-2020

(22-02-2020, 05:19 PM)davidjackson Wrote: Why are you multiplying %red and %quasi by 100? 66 occurrences in a corpus of 6347 is 0.0139%, not 1.04%.

I believe that multiplying by 100 is correct. See Wikipedia.
Half a quantity is 50% not 0.5%.
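As a minimal worked example of the arithmetic (the helper name is an assumption, the numbers are the BioQ13 row quoted above):

```python
def percent(count, total):
    """A fraction becomes a percentage when multiplied by 100."""
    return 100.0 * count / total


bio_fraction = 66 / 6347          # plain ratio of couples
bio_percent = percent(66, 6347)   # the same ratio as a percentage

print(f"fraction = {bio_fraction:.4f}, percent = {bio_percent:.3f}%")
print(f"half of a quantity = {percent(1, 2):.0f}%")
```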