julian > 31-07-2021, 05:52 PM
(31-07-2021, 07:47 AM)MarcoP Wrote: Hi Julian,
I am trying to reproduce your results as a first step to further explore the interesting observations you made.
I am getting slightly higher counts than you. This is what I meant to do (as always, I may have made errors):
- I processed [link not available in this copy].
- I kept all lines (labels included).
- Lines were not merged.
- I treated the unreadable character '?' as an ordinary character.
- I counted overlapping matches (kalshedykchedychT results in two counts for k:7).
The maximum values I get are: f:6(53) p:6(168) k:5(1352) t:5(757)
I attach the linux scripts I used and the output csv.
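The counting rules listed above can be sketched in a few lines of Python. This is not MarcoP's attached script, only a minimal illustration under stated assumptions: the gallows are the EVA characters f, p, k and t in lowercase, word separators (assumed here to be ".", "," and space) do not count as glyphs, each line is processed on its own, and '?' counts as an ordinary glyph. The name `gallows_gaps` is hypothetical.

```python
from collections import Counter

GALLOWS = set("fpkt")        # assumption: lowercase EVA gallows only
SEPARATORS = set(" .,")      # assumption: word separators are not glyphs

def gallows_gaps(line):
    """Count (gallows, n) pairs: n glyphs follow the gallows before the next gallows."""
    counts = Counter()
    current = None           # the gallows we are currently counting from
    gap = 0
    for ch in line:
        if ch in SEPARATORS:
            continue
        if ch in GALLOWS:
            if current is not None:
                counts[(current, gap)] += 1
            # the closing gallows also opens the next gap (overlapping matches)
            current, gap = ch, 0
        elif current is not None:
            gap += 1         # '?' lands here and is counted like any other glyph
    return counts

if __name__ == "__main__":
    # lowercased variant of the example string from the post
    print(gallows_gaps("kalshedykchedycht"))
```

A gallows that is never followed by another gallows on the same line contributes nothing, which matches the "lines were not merged" rule.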
julian > 31-07-2021, 05:58 PM
(31-07-2021, 01:01 PM)MarcoP Wrote: (31-07-2021, 11:53 AM)nablator Wrote: julian Wrote: Another goal was to see whether there are differences in the distributions of the number of glyphs following EVA p, k, f and t. It turns out that statistically there is a difference: EVA t, k tend to be followed by 5 glyphs before the next gallows is written, and EVA f, p tend to be followed by 6 or 7 glyphs.
That's probably because words are longer (or p/f are inserted) on the first lines of paragraphs.
It seems to me that p/f words are longer because they include a bench much more often than k/t words. If benches are represented as single characters (e.g. C and S) instead of two characters (ch and sh), the next-gallows-distance has a maximum at 5 characters for all gallows.
EDIT: if these figures are correct, they confirm Koen's idea about the impact of parsing.
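The effect of the bench re-parsing described above can be illustrated with a small sketch. It assumes only the two substitutions named in the post (ch to C, sh to S) and a string without word separators; the function names are hypothetical and this is not the script that produced the figures.

```python
GALLOWS = set("fpkt")  # assumption: lowercase EVA gallows only

def collapse_benches(text):
    """Represent the benches as single characters: sh -> S, ch -> C."""
    return text.replace("sh", "S").replace("ch", "C")

def gaps_after_gallows(text):
    """Return (gallows, distance) pairs for each gallows followed by another gallows."""
    gaps, start = [], None
    for i, ch in enumerate(text):
        if ch in GALLOWS:
            if start is not None:
                gaps.append((text[start], i - start - 1))
            start = i
    return gaps

if __name__ == "__main__":
    raw = "kalshedykchedycht"
    print(gaps_after_gallows(raw))                    # two-character benches
    print(gaps_after_gallows(collapse_benches(raw)))  # single-character benches
```

Every bench between two gallows shortens the measured distance by one, so words that contain benches more often shift further to the left after the substitution.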
MarcoP > 31-07-2021, 06:05 PM
(31-07-2021, 04:48 PM)MichelleL11 Wrote: does the distance between occurrences of any letter in, say, English or Latin show the same sort of peak around 3 or 5 with a long tail?
julian > 31-07-2021, 06:08 PM
(31-07-2021, 04:48 PM)MichelleL11 Wrote: (31-07-2021, 12:13 AM)Renegade Healer Wrote: Another area for improvement is to exclude Grove words from all the counts. I don't think the gallows that make Grove words have anything to do with the gallows that occur anywhere else in the manuscript.
Hi, Julian:
It's great to see you with a new posting. I continue to mull over what your results could mean.
One thing I would be curious about is whether there is a difference in your numbers between Currier A and Currier B. Of course, breaking it up could reduce the numbers such that the results are less reliable. Apologies if Marco did this -- I have not reviewed his data yet.
Also, in a more general sense: does the distance between occurrences of any letter in, say, English or Latin show the same sort of peak around 3 or 5 with a long tail?
I assume this would be connected to frequency. If so, what frequency in English or Latin is needed to exhibit a similar behavior?
Or is this gallows glyph behavior one of the "non-language" characteristics that is not mimicked by any particular single letter in, say, English or Latin? Is this kind of distance measurement so similar to secondary entropy that it is particular to Voynichese (or other languages with low entropy, etc. -- no need to test)?
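The comparison with ordinary text asked about above could be run with a sketch like the following. One known baseline helps in reading the result: if a letter occurred independently at each position with probability p, the gap n between successive occurrences would follow a geometric law, P(n) = p(1-p)^n, which is highest at n = 0 and only decreases. A peak at 3 or 5 would therefore point to structure beyond frequency alone. The function name and the sample sentence are placeholders, not data from the thread.

```python
from collections import Counter

def letter_gaps(text, letter):
    """Histogram of how many other letters lie between successive occurrences of `letter`."""
    letters = [c for c in text.lower() if c.isalpha()]   # ignore spaces and punctuation
    positions = [i for i, c in enumerate(letters) if c == letter]
    return Counter(b - a - 1 for a, b in zip(positions, positions[1:]))

def geometric_baseline(p, n):
    """P(gap = n) if the letter occurred independently with probability p per position."""
    return p * (1 - p) ** n

if __name__ == "__main__":
    sample = "the cat sat on the mat"   # placeholder text; use a real corpus in practice
    print(sorted(letter_gaps(sample, "t").items()))
    print([round(geometric_baseline(0.09, n), 4) for n in range(6)])
```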
As for Grove words -- my understanding of their definition is that it is all words that have a gallows glyph as the first(?) glyph and if that glyph is removed, a "valid" (e.g. seen elsewhere in the manuscript) word remains. I don't think word position in the paragraph is involved, but I could be wrong.
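Taking the definition above at face value (a gallows as the first glyph, and a word attested elsewhere once it is removed), candidate Grove words could be picked out as sketched below. The function name and the toy word list are illustrative only, and the check deliberately ignores word position in the paragraph, in line with the uncertainty expressed above.

```python
GALLOWS = set("fpkt")  # assumption: lowercase EVA gallows only

def grove_candidates(words):
    """Words that start with a gallows and remain an attested word without it."""
    vocab = set(words)
    return sorted(
        w for w in vocab
        if len(w) > 1 and w[0] in GALLOWS and w[1:] in vocab
    )

if __name__ == "__main__":
    toy_words = ["pchedy", "chedy", "kol", "daiin", "tol", "ol"]  # toy list, not real counts
    print(grove_candidates(toy_words))
```

Excluding these candidates before counting gaps would then be a matter of filtering the word list, or of skipping their initial gallows, before the distance counts are made.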
I agree it would be interesting to see how the graphs change if such examples of gallows use are removed from your numbers.
The theory is that these kinds of gallows glyphs are more likely to serve some function other than the substitution role that the same gallows glyph plays in other word environments. This is because in "non-Grove" words the glyph is what turns the word into a "valid" word, and it is therefore more likely to represent some letter or group of letters in the plaintext.
But "Grove word" gallows glyphs theoretically have a greater chance of serving some non-letter function (paragraph signal/topic signal/item signal/punctuation-like), or perhaps of signalling that the substitution should be interpreted in some particular way going forward.
But this is just my impression from conversations on the board and elsewhere and I could have misinterpreted.
Thanks for considering doing some of these additional analyses and I will be reviewing your and Marco's data with great interest.
Michelle
Anton > 31-07-2021, 07:42 PM
(30-07-2021, 11:46 PM)julian Wrote: Hi Koen: thanks. I used Takeshi. It's true that the counts would go down if some of those glyph sequences you mention are treated as single glyphs, and it's possible that their probability of occurring is different depending on the preceding gallows. I haven't looked at that, preferring to leave the decision on what is a different glyph to Takeshi-san (a cop-out, I know).
julian > 31-07-2021, 09:12 PM
(31-07-2021, 07:42 PM)Anton Wrote: (30-07-2021, 11:46 PM)julian Wrote: Hi Koen: thanks. I used Takeshi. It's true that the counts would go down if some of those glyph sequences you mention are treated as single glyphs, and it's possible that their probability of occurring is different depending on the preceding gallows. I haven't looked at that, preferring to leave the decision on what is a different glyph to Takeshi-san (a cop-out, I know).
Hi Julian, judging from the example at the very beginning of your post, you count ch as a single glyph, do you?
My first impression is that the figures look like the Rice distribution... I'd check that and, if so, estimate its parameters. I suspect that v would be around 1...1.4. Who knows where it may lead us.
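For anyone who wants to eyeball the Rice suggestion against the histograms, the density can be evaluated with the standard library alone. The sketch assumes that "v" above is the ν parameter of the Rice distribution; the scale σ is not given in the thread, so the value used in the example is a placeholder. The modified Bessel function I0 is computed from its power series. This only evaluates the curve and does not perform a fit.

```python
import math

def bessel_i0(z):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    term, total, k = 1.0, 1.0, 0
    q = z * z / 4.0
    while term > 1e-16 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def rice_pdf(x, nu, sigma):
    """Rice density: x/sigma^2 * exp(-(x^2 + nu^2) / (2 sigma^2)) * I0(x nu / sigma^2)."""
    if x < 0:
        return 0.0
    s2 = sigma * sigma
    return (x / s2) * math.exp(-(x * x + nu * nu) / (2.0 * s2)) * bessel_i0(x * nu / s2)

if __name__ == "__main__":
    nu, sigma = 1.2, 3.0   # nu from the range suggested above; sigma is a placeholder
    for distance in range(0, 11):
        print(distance, round(rice_pdf(distance, nu, sigma), 4))
```

With ν = 0 the expression reduces to the Rayleigh density, which gives a quick sanity check of the implementation.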
Anton > 31-07-2021, 09:18 PM
(31-07-2021, 09:12 PM)julian Wrote: EVA ch is counted as two glyphs - which example are you referring to?
Quote:The line above can be represented as
Gxxxx-xxGxxx-Gxx-xGxxx-xx-Gxxxx-xGxxxx
(31-07-2021, 09:12 PM)julian Wrote: If I had to fit the distributions, I'd use a Weibull function, as it is very flexible.
julian > 01-08-2021, 06:51 PM
(31-07-2021, 09:18 PM)Anton Wrote: (31-07-2021, 09:12 PM)julian Wrote: EVA ch is counted as two glyphs - which example are you referring to?
This one (after the second figure in your blog post):
Quote:The line above can be represented as
Gxxxx-xxGxxx-Gxx-xGxxx-xx-Gxxxx-xGxxxx
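The G/x representation quoted above can be generated mechanically, and doing so makes the parsing question concrete: the number of x's per word depends on whether a bench counts as one glyph or two. The sketch below assumes "." as the word separator and lowercase f, p, k, t as the gallows; the function name, the `benches_as_one` switch and the sample string are illustrative and not taken from the blog post.

```python
GALLOWS = set("fpkt")  # assumption: lowercase EVA gallows only

def gallows_mask(line, word_sep=".", benches_as_one=False):
    """Map gallows to 'G', other glyphs to 'x' and word separators to '-'."""
    if benches_as_one:
        # treat the benches as single glyphs before masking
        line = line.replace("sh", "S").replace("ch", "C")
    out = []
    for ch in line:
        if ch == word_sep:
            out.append("-")
        elif ch in GALLOWS:
            out.append("G")
        else:
            out.append("x")
    return "".join(out)

if __name__ == "__main__":
    sample = "kchedy.okal"   # illustrative string only
    print(gallows_mask(sample))
    print(gallows_mask(sample, benches_as_one=True))
```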
(31-07-2021, 09:12 PM)julian Wrote: If I had to fit the distributions, I'd use a Weibull function, as it is very flexible.
Flexible yes... but I'm rather thinking of the phenomena which could produce this picture
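To make the Weibull option concrete, here is a minimal sketch of its density and mode using only the standard library. The shape k and scale λ values are placeholders chosen so that the peak sits at 5; they are not fitted to any counts from this thread.

```python
import math

def weibull_pdf(x, k, lam):
    """Weibull density (k/lam) * (x/lam)^(k-1) * exp(-(x/lam)^k), defined here for x > 0."""
    if x <= 0:
        raise ValueError("this sketch only evaluates the density for x > 0")
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def weibull_mode(k, lam):
    """Location of the peak: lam * ((k-1)/k)^(1/k) for k > 1, otherwise 0."""
    return lam * ((k - 1) / k) ** (1.0 / k) if k > 1 else 0.0

if __name__ == "__main__":
    k = 2.0                                  # placeholder shape
    lam = 5.0 / ((k - 1) / k) ** (1.0 / k)   # scale chosen so that the mode is at 5
    print(round(weibull_mode(k, lam), 3))
    for distance in range(1, 11):
        print(distance, round(weibull_pdf(distance, k, lam), 4))
```

The flexibility cuts both ways, as noted above: with two free parameters the curve can match many peaked, long-tailed histograms without saying anything about the process that produced them.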
Koen G > 01-08-2021, 07:18 PM
julian > 02-08-2021, 12:07 AM
(01-08-2021, 07:18 PM)Koen G Wrote: This is not specifically about your work, Julian, but I think in general parsing is not considered enough. Too often, people take EVA as "Voynichese", while questions of parsing should really come between EVA and certain statistical analyses. Basically anything that focuses on characters: character entropy, glyph counts etc. EVA was never intended to "correctly" represent Voynichese, just to be able to type it into a computer.
I also don't think it was Takahashi's intention to provide a "correct" parsing of the MS, just to somehow represent it in a computer-readable form using EVA. While I greatly appreciate and make use of his efforts, we must keep in mind that at the time he was a biology major who self-describes as, I quote, "an idiot who spent his college days deciphering Voynich".
I think your research is really interesting, so I hate the sound of my own complaining, but I am just afraid that this is an analysis of EVA, not necessarily of the VM. Between EVA and character-based analyses looms the matter of parsing, which is in my opinion the greatest challenge we face.