[split] Word length autocorrelation

The Voynich Ninja (https://www.voynich.ninja)

RE: [split] Word length autocorrelation - Jorge_Stolfi - 06-09-2025

[Fixed a bug in the program. The conclusion is the same, but more definite.]

"And then we must consider the need to squeeze words together as we get near the end of the line"
# average length = 3.80
# averages = 3.789 3.842
# variances = 3.730994 -0.035088 3.695906
# deviations = 1.9316 1.9225
# word length correlation = -0.009

"And then we must consider the needto squeezewords togehtheraswe getneartheendoftheline"
# average length = 7.70
# averages = 6.111 8.222
# variances = 16.361111 +20.472222 41.694444
# deviations = 4.0449 6.4571
# word length correlation = +0.784

The averages, variances, and deviations are for the lengths of the first word and the second word of each pair of consecutive words.
The middle number in the "variances" line is the covariance of those two lengths.
The correlation is defined as covariance / sqrt(variance1 * variance2).
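
For anyone who wants to check the numbers, here is a minimal Python sketch of the calculation as defined above. It is not the program used for the post: the function name `word_length_correlation` is made up for illustration, and the n - 1 denominators are an assumption, chosen because they reproduce the variances and covariance quoted above.

```python
from math import sqrt

def word_length_correlation(text):
    """Correlation between the lengths of consecutive words (hypothetical helper)."""
    lengths = [len(w) for w in text.split()]
    # Pair each word with its successor: (w1, w2), (w2, w3), ...
    first, second = lengths[:-1], lengths[1:]
    n = len(first)
    mean1 = sum(first) / n
    mean2 = sum(second) / n
    # Assumption: n - 1 denominators, which match the figures in the post.
    var1 = sum((a - mean1) ** 2 for a in first) / (n - 1)
    var2 = sum((b - mean2) ** 2 for b in second) / (n - 1)
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(first, second)) / (n - 1)
    return cov / sqrt(var1 * var2)

spaced = ("And then we must consider the need to squeeze words together "
          "as we get near the end of the line")
squeezed = ("And then we must consider the needto squeezewords "
            "togehtheraswe getneartheendoftheline")

print(f"{word_length_correlation(spaced):+.3f}")    # -0.009
print(f"{word_length_correlation(squeezed):+.3f}")  # +0.784
```

The second string is copied exactly as posted, including the spelling "togehtheraswe", since the quoted averages depend on it.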

RE: [split] Word length autocorrelation - quimqu - 06-09-2025

You are right. I don't see if these autocorrelation calculations on the Voynich are really very useful.

RE: [split] Word length autocorrelation - Jorge_Stolfi - 06-09-2025

(06-09-2025, 10:24 AM)quimqu Wrote: You are right. I don't see if these autocorrelation calculations on the Voynich are really very useful.

A statistic that should be more robust (in the sense of being independent of word splitting/joining, whether by the Author, Scribe, or Transcribers) would be the distribution of "number of glyphs between successive occurrences of X, ignoring spaces", where X could be gallows or any other selected subset of the glyphs.
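
A rough Python sketch of that statistic, under stated assumptions: each glyph is one character of the transcription (multi-character glyphs would need tokenizing first), and "between" means the glyphs strictly between two successive occurrences. The name `gap_distribution` and the sample strings are made up for illustration, not taken from any transcription.

```python
from collections import Counter

def gap_distribution(text, targets):
    """Histogram of the number of glyphs strictly between successive
    occurrences of any glyph in `targets`, ignoring whitespace."""
    # Assumption: one character per glyph; spaces and line breaks are dropped.
    glyphs = [c for c in text if not c.isspace()]
    positions = [i for i, g in enumerate(glyphs) if g in targets]
    return Counter(b - a - 1 for a, b in zip(positions, positions[1:]))

# Made-up sample: the same glyph sequence with two different word splits.
print(gap_distribution("kab cdk ek", "k"))  # gaps of 4 and 1, once each
print(gap_distribution("kabcd kek", "k"))   # same histogram
```

Since spaces are discarded before counting, any splitting or joining of words leaves the histogram unchanged.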

It might let us infer whether the puff gallows p, f on paragraph head lines stand for the simple gallows t, k or for combinations like te, ke.

(Although it is possible that p with hook = te or et, p without hook = t. There is a word somewhere that is something like cheopy (the ch and y may be something else) where the e is the hook of the p, and the o is nested under the arm of the p, between the hook and the leg...)

Unfortunately, I don't see how one could choose X for other languages in a way that would allow a meaningful comparison of the distributions. Unless the shape of the distribution for the VMS turns out to be really weird, like two peaks at 4 and 7 glyphs apart...

All the best, --jorge