The Voynich Ninja

Full Version: Glyph counts between Gallows
(31-07-2021, 07:47 AM)MarcoP Wrote: Hi Julian,
I am trying to reproduce your results as a first step to further explore the interesting observations you made.

I am getting slightly higher counts than you. This is what I meant to do (as always, I may have made errors):
  • I processed [link not shown in this view].
  • I kept all lines (labels included).
  • Lines were not merged.
  • I treated the unreadable character '?' as an ordinary character.
  • I counted overlapping matches (kalshedykchedychT results in two counts for k:7).

The maximum values I get are: f:6(53) p:6(168) k:5(1352) t:5(757)

I attach the linux scripts I used and the output csv.

Hi Marco,

Thanks for checking. I am a bit stunned and very pleased that your results are approximately the same as mine! I wonder if the small discrepancies are due to my using a different transcription (the file I have is eva-takeshi.txt and was retrieved some years ago).

In your last bullet point, I agree that this should be two counts for k:7 - that's what my code would count, too.
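
A minimal Python sketch of the counting rule discussed above, for anyone who wants to compare numbers. It is not either of the scripts mentioned in this thread. It assumes the gallows set is EVA f, k, p, t, that each line is processed on its own, and that the characters in `skip` (a hypothetical parameter, here the '.' word separator and spaces) are not counted as glyphs. The final T of the example is written as a lowercase t, on the assumption that it stands for a gallows.

```python
from collections import Counter

GALLOWS = set("fkpt")  # assumption: plain EVA gallows only


def gallows_distances(line, gallows=GALLOWS, skip=". "):
    """Count the glyphs between each gallows and the next gallows on one line.

    Returns a Counter keyed by (starting gallows, number of glyphs in between).
    A gallows that ends one span also starts the next one, which is what
    "overlapping matches" means here.
    """
    # Drop separators so that only glyphs are counted (illustrative choice).
    glyphs = [ch for ch in line if ch not in skip]
    counts = Counter()
    last_pos = None
    last_glyph = None
    for pos, ch in enumerate(glyphs):
        if ch in gallows:
            if last_pos is not None:
                counts[(last_glyph, pos - last_pos - 1)] += 1
            last_pos, last_glyph = pos, ch
    return counts


if __name__ == "__main__":
    # k ... k (7 glyphs between) and k ... t (7 glyphs between)
    print(gallows_distances("kalshedykchedycht"))  # Counter({('k', 7): 2})
```

Summing the Counters over all lines of a transcription file would give per-gallows histograms of the kind compared here; whether spaces and '?' count as glyphs is a choice that changes the totals.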

Julian
(31-07-2021, 01:01 PM)MarcoP Wrote:
(31-07-2021, 11:53 AM)nablator Wrote:
julian Wrote:Another goal was to see whether there are differences in the distributions of number of glyphs following EVA p, k, f and t. It turns out that statistically there is a difference: EVA t, k tend to be followed by 5 glyphs before the next gallows is written, and EVA f, p tend to be followed by 6 or 7 glyphs.

That's probably because words are longer (or p/f are inserted) on the first lines of paragraphs.

It seems to me that p/f words are longer because they include a bench much more often than k/t words. If benches are represented as single characters (e.g. C and S) instead of two characters (ch and sh), the next-gallows-distance has a maximum at 5 characters for all gallows.

EDIT: if these figures are correct, they confirm Koen's idea about the impact of parsing.
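
The re-parsing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the gallows set is EVA f, k, p, t, and benched gallows such as cth or ckh are left untouched because a plain substring replacement does not match them.

```python
def collapse_benches(text):
    """Rewrite the EVA benches 'sh' and 'ch' as the single characters 'S' and 'C'."""
    return text.replace("sh", "S").replace("ch", "C")


def gaps_between_gallows(line, gallows="fkpt"):
    """Return the number of glyphs between consecutive gallows on one line."""
    positions = [i for i, ch in enumerate(line) if ch in gallows]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    line = "kalshedykchedyt"
    print(gaps_between_gallows(line))                    # [7, 5]
    print(gaps_between_gallows(collapse_benches(line)))  # [6, 4]
```

Each bench inside a span shortens that span by one glyph, so words that contain benches more often shift further to the left after re-parsing.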

It may be that using Voyn_101 would be clearer for this study, as GC defined separate codes for the benched and un-benched gallows.
(31-07-2021, 04:48 PM)MichelleL11 Wrote: does the distance between occurrences of any letter in -- say English or Latin -- show the same sort of peak around 3 or 5 with a long tail?

Hi Michelle,
this is what I get for f,k,p,t in Alice in Wonderland.

The eight Voynich gallows occur in 52% of word tokens, but they occur twice (or more) in only 2% of word tokens (the single gallows is the "core" of Stolfi's word structure). The peak of the distance corresponds to the average length of a Voynichese word.

In English, f,k,p,t occur in 47% of word tokens (close enough to 52%). They occur twice (or more) in 8% of word tokens. These are all consonants, so they are not terribly likely to occur consecutively (though, ignoring spaces, this is frequent enough). I guess that the most frequent distance of 2-3 corresponds to a syllable (including words like 'the'). The spike at distance 0 for 'f' is due to the frequent double 'ff' and even more to the sequence "of the".

I don't think it is possible to have anything like the Voynich distribution in English or in Latin-related languages: what is needed is a set of frequent characters that are constrained not to appear in the same word. Also, Voynichese has fewer very short words than ordinary written languages.
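
The two token-level percentages quoted above (tokens containing at least one of the letters, and tokens containing two or more) can be computed along these lines. This is a sketch, not the script used for the figures in this post: the function name is hypothetical, and the tokenizer simply takes runs of the letters a to z after lowercasing, which is an assumption about how words are split.

```python
import re


def letter_set_token_stats(text, letters="fkpt"):
    """Return (share of tokens with >= 1 of `letters`, share with >= 2)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0, 0.0
    hits = [sum(ch in letters for ch in token) for token in tokens]
    at_least_one = sum(1 for h in hits if h >= 1) / len(tokens)
    at_least_two = sum(1 for h in hits if h >= 2) / len(tokens)
    return at_least_one, at_least_two


if __name__ == "__main__":
    one, two = letter_set_token_stats("A fat cat of Kent ran")
    print(f"{one:.2f} {two:.2f}")  # 0.67 0.33
```

For a Voynich transcription the same function would need the word separator of that file and whatever set of gallows characters the transcription uses.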
(31-07-2021, 04:48 PM)MichelleL11 Wrote:
(31-07-2021, 12:13 AM)Renegade Healer Wrote: Another area for improvement is to exclude Grove words from all the counts. I don't think the gallows that make Grove words have anything to do with the gallows that occur anywhere else in the manuscript.

Hi, Julian:

It's great to see you with a new posting.  I continue to mull over what your results could mean.

One thing I would be curious about is whether there is a difference in your numbers between Currier A and Currier B.  Of course, breaking it up could reduce the numbers such that the results are less reliable.  Apologies if Marco did this -- I have not reviewed his data yet.

Also, in a more general sense -- does the distance between occurrences of any letter in -- say English or Latin -- show the same sort of peak around 3 or 5 with a long tail?

I assume this would be connected to frequency.  If so, what frequency in English or Latin is needed to exhibit a similar behavior? 

Or is this gallows glyph behavior one of the "non-language" characteristics that is not mimicked by any particular single letters in, say -- English or Latin?  Is this kind of distance measurement so similar to secondary entropy that it is particular to Voynichese (or other languages with low entropy --- etc., etc. -- no need to test)? 

As for Grove words -- my understanding of their definition is that it is all words that have a gallows glyph as the first(?) glyph and if that glyph is removed, a "valid" (e.g. seen elsewhere in the manuscript) word remains.  I don't think word position in the paragraph is involved, but I could be wrong.
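
Taking the definition above at face value (and it is flagged above as possibly incomplete), candidate Grove words could be listed with a sketch like the following. The function name is hypothetical, the gallows set is assumed to be EVA f, k, p, t, and "valid" is taken to mean "appears somewhere in the supplied word list".

```python
def find_grove_candidates(words, gallows="fkpt"):
    """Return words that start with a gallows and remain attested without it."""
    vocabulary = set(words)
    return sorted(
        word
        for word in vocabulary
        if len(word) > 1 and word[0] in gallows and word[1:] in vocabulary
    )


if __name__ == "__main__":
    sample = ["pchedy", "chedy", "kol", "daiin", "tol", "ol", "kaiin"]
    print(find_grove_candidates(sample))  # ['kol', 'pchedy', 'tol']
```

Excluding these candidates before counting would be one way to test the suggestion; whether paragraph or line position should also be a condition is left open here.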

I agree it would be interesting to see how the graphs change if such examples of gallows use are removed from your numbers.

The theory is that these kinds of gallows glyphs are more likely to serve some other function than "the same" substitution proposed by that same gallows glyph use in other word environments.  This is because in "non-Grove" words the glyph use is what turns the word into a "valid" word and therefore is more likely to be representing some letter or group of letters in the plaintext.

But "Grove word" gallows glyphs theoretically have a greater chance of having some sort of non-letter function (paragraph signal/topic signal/item signal/punctuation-like) or maybe of being a signal to interpret the substitution going forward in some particular way?

But this is just my impression from conversations on the board and elsewhere and I could have misinterpreted.

Thanks for considering doing some of these additional analyses and I will be reviewing your and Marco's data with great interest.

Michelle

Hi Michelle,

Thanks - I also wonder about Currier A/B, but I haven't looked yet. Your comments about distances between letters in English and Latin are interesting. What's the most likely distance between "e"s in English? I don't know, but it must be related to the number of letters in the alphabet and the frequency of the letter itself.
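
One way to make that question concrete is a baseline: if every position were independently an "e" with probability p, the number of letters between consecutive "e"s would follow a geometric distribution, P(d) = p(1-p)^d, whose most likely value is always 0. The sketch below computes that baseline; the function name is hypothetical and p = 0.12 is only an illustrative value for English "e". Real text is not independent, so this is a reference curve to compare against, not a prediction.

```python
def independent_gap_pmf(p, max_gap):
    """Gap probabilities P(d) = p * (1 - p)**d for d = 0..max_gap.

    Models the number of other letters between two occurrences of a letter
    that appears independently at each position with probability p.
    """
    return [p * (1 - p) ** d for d in range(max_gap + 1)]


if __name__ == "__main__":
    pmf = independent_gap_pmf(0.12, 10)
    for gap, prob in enumerate(pmf):
        print(gap, round(prob, 4))
```

Under this baseline the alphabet size matters only through p, and the curve decreases steadily from distance 0. A peak at 5 with few short distances, as seen for the gallows, cannot come from independent placement.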

The thing about Gallows is that they stick out like a sore thumb - they are clearly not "like" the other glyphs. I suppose they are a bit like capital letters in English, but far too frequent to be that. So it seems like their inter-distance properties might be unusual in comparison to the other glyphs.

Thanks for the clarification about Grove :-)

Julian
(30-07-2021, 11:46 PM)julian Wrote: Hi Koen: thanks. I used Takeshi. It's true that the counts would go down if some of those glyph sequences you mention are treated as single glyphs, and it's possible that their probability of occurring is different depending on the preceding gallows. I haven't looked at that, preferring to leave the decision on what is a different glyph to Takeshi-san (a cop-out, I know).

Hi Julian, judging from the example in the very beginning of your post you count ch as a single glyph, don't you?

My first impression is that the figures look like the Rice distribution... I'd check that and if it's so then estimate its parameters - I suspect that v would be around 1...1.4 - who knows where it may lead us
(31-07-2021, 07:42 PM)Anton Wrote:
(30-07-2021, 11:46 PM)julian Wrote: Hi Koen: thanks. I used Takeshi. It's true that the counts would go down if some of those glyph sequences you mention are treated as single glyphs, and it's possible that their probability of occurring is different depending on the preceding gallows. I haven't looked at that, preferring to leave the decision on what is a different glyph to Takeshi-san (a cop-out, I know).

Hi Julian, judging from the example in the very beginning of your post you count ch as a single glyph, don't you?

My first impression is that the figures look like the Rice distribution... I'd check that and if it's so then estimate its parameters - I suspect that v would be around 1...1.4 - who knows where it may lead us

Hi Anton,

EVA ch is counted as two glyphs - which example are you referring to?

If I had to fit the distributions, I'd use a Weibull function, as it is very flexible.
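
For reference, a small Python sketch of the two-parameter Weibull density and its mode, using only the standard library. No fit to the gallows data is performed here; the function names are hypothetical and the shape and scale values in the example are illustrative, not estimates.

```python
import math


def weibull_pdf(x, shape, scale):
    """Two-parameter Weibull density; returns 0.0 for x <= 0 in this sketch."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_mode(shape, scale):
    """Location of the peak: scale * ((shape - 1) / shape) ** (1 / shape) for shape > 1."""
    if shape <= 1:
        return 0.0  # the density has no interior peak when shape <= 1
    return scale * ((shape - 1) / shape) ** (1 / shape)


if __name__ == "__main__":
    shape, scale = 2.5, 6.0  # illustrative values only
    peak = weibull_mode(shape, scale)
    print(round(peak, 2), round(weibull_pdf(peak, shape, scale), 4))
```

A shape parameter above 1 gives a peaked curve with a long right tail, which is the general form of the histograms; the fitted values would have to come from the actual counts.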

Julian
(31-07-2021, 09:12 PM)julian Wrote: EVA ch is counted as two glyphs - which example are you referring to?

This one (after the second figure in your blog post):

Quote:The line above can be represented as

Gxxxx-xxGxxx-Gxx-xGxxx-xx-Gxxxx-xGxxxx

(31-07-2021, 09:12 PM)julian Wrote: If I had to fit the distributions, I'd use a Weibull function, as it is very flexible.

Flexible yes... but I'm rather thinking of the phenomena which could produce this picture
(31-07-2021, 09:18 PM)Anton Wrote:
(31-07-2021, 09:12 PM)julian Wrote: EVA ch is counted as two glyphs - which example are you referring to?

This one (after the second figure in your blog post):

Quote:The line above can be represented as

Gxxxx-xxGxxx-Gxx-xGxxx-xx-Gxxxx-xGxxxx

(31-07-2021, 09:12 PM)julian Wrote: If I had to fit the distributions, I'd use a Weibull function, as it is very flexible.

Flexible yes... but I'm rather thinking of the phenomena which could produce this picture

Hi Anton, you're right - in the counting example "Gxxxx" I'm counting what I see as glyphs, whereas the analysis itself was done using the EVA transcription.
This is not specifically about your work, Julian, but I think in general parsing is not considered enough. Too often, people take EVA as "Voynichese", while questions of parsing should really come between EVA and certain statistical analyses. Basically anything that focuses on characters: character entropy, glyph counts etc. EVA was never intended to "correctly" represent Voynichese, just to be able to type it into a computer. 

I also don't think it was Takahashi's intention to provide a "correct" parsing of the MS, just to somehow represent it in a computer-readable form using EVA. While I greatly appreciate and make use of his efforts, we must keep in mind that at the time he was a biology major who self-describes as, I quote, "an idiot who spent his college days deciphering Voynich".

I think your research is really interesting, so I hate the sound of my own complaining, but I am just afraid that this is an analysis of EVA, not necessarily of the VM. Between EVA and character-based analyses looms the matter of parsing, which is in my opinion the greatest challenge we face.
(01-08-2021, 07:18 PM)Koen G Wrote: This is not specifically about your work, Julian, but I think in general parsing is not considered enough. Too often, people take EVA as "Voynichese", while questions of parsing should really come between EVA and certain statistical analyses. Basically anything that focuses on characters: character entropy, glyph counts etc. EVA was never intended to "correctly" represent Voynichese, just to be able to type it into a computer.

I also don't think it was Takahashi's intention to provide a "correct" parsing of the MS, just to somehow represent it in a computer-readable form using EVA. While I greatly appreciate and make use of his efforts, we must keep in mind that at the time he was a biology major who self-describes as, I quote, "an idiot who spent his college days deciphering Voynich".

I think your research is really interesting, so I hate the sound of my own complaining, but I am just afraid that this is an analysis of EVA, not necessarily of the VM. Between EVA and character-based analyses looms the matter of parsing, which is in my opinion the greatest challenge we face.

Hi Koen, 

I'm not sure what you are getting at when you say that the analysis is of EVA and not the VMS. 

EVA is one person's attempt to make a machine readable version of the text. There are several others, all with their pros and cons. That EVA is a good approximation of what we see on the folios is not in dispute (is it?). I mean, the way EVA represents a gallows as p,f,k or t seems reasonable to me, and the way it represents the benched gallows likewise. In Voyn_101 it is done differently - notably, if I remember correctly, there are 19 different gallows glyphs! 

But whatever transcription is used, if it's faithful to what appears in the manuscript, the counting of glyphs seems a valid activity, and the counts will reflect what is actually on the folios. The values of the counts will surely differ amongst the transcriptions.

I feel like I'm missing your point, apologies if so :-)