The Voynich Ninja

Full Version: An Essay on Entropy: what is it, and why is it so important?
(29-04-2022, 04:14 PM)RenegadeHealer Wrote: In your reply to tavie, you mention h1 and h2 forming a profile which is distinct for any given language. Having read Patrick Feaster’s work on the VMs, I’m inclined to take this out a step further and say that any kind of information has a unique profile of h0, h1, h2, and h1/h2 values.

When I said "language", I most certainly meant "text" :) Well, it's complicated, because both language and text type influence entropy statistics.

I like the possibility of Roman numerals or something similar being involved in Voynichese. If we were to test this, we would need the style of Roman numeral notation where a swoop is added at the end, because these behave a lot like [iin]. These swoops would need to be transcribed as different characters. Subtractive notation is probably best avoided (so iiii instead of iv). We discussed this a little bit in another thread.

Then the question is: only roman numerals? How many glyphs can we get if we include variations? Is this enough to match Voynichese? It's an interesting question that I haven't studied yet.
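To make the notation idea above concrete, here is a minimal sketch of how test data could be generated. It writes numbers in additive notation only (iiii, not iv) and transcribes the final stroke as a different character. The function name `additive_roman` and the choice of "j" as the stand-in for the swooped final stroke are assumptions for illustration only; any otherwise unused symbol would do.

```python
# Additive (non-subtractive) Roman numerals with a distinct final stroke.
# Assumption: "j" stands in for the swooped final "i"; this is only a
# transcription placeholder, not a claim about any particular manuscript.
VALUES = [(1000, "m"), (500, "d"), (100, "c"), (50, "l"),
          (10, "x"), (5, "v"), (1, "i")]


def additive_roman(n: int, final_stroke: str = "j") -> str:
    """Return n in additive notation, e.g. 4 -> 'iiij' rather than 'iv'."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    parts = []
    for value, symbol in VALUES:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    numeral = "".join(parts)
    # Only a trailing unit stroke gets the swoop; 'v', 'x', etc. stay as-is.
    if numeral.endswith("i"):
        numeral = numeral[:-1] + final_stroke
    return numeral


# A toy "ledger line" that could be fed into an entropy calculation.
print(" ".join(additive_roman(n) for n in range(1, 11)))
```

A text built this way uses only eight distinct glyphs plus the space, so it does not by itself answer the question of whether variations would give enough glyphs to match Voynichese.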
(29-04-2022, 08:35 AM)MarcoP Wrote: Finally, this subject was discussed by Lindemann and Bowern, who compared actual abbreviated historical texts in a few languages with normalized, unabbreviated versions:

Lindemann and Bowern Wrote:The usage of abbreviations and special characters has the effect of raising the conditional character entropy of the English, Icelandic, and Latin texts and taking them further from the values we find for Voynichese.

Thanks for confirming my understanding of the prior work by Lindemann and Bowern on abbreviations -- I re-read it last night, thinking I had remembered it having some useful data on this topic, but just didn't have a chance to post it in this string.  I do appreciate the extra explanation -- examples are always helpful.
Simply to remind people: when carrying out experiments in EVA or other transcriptions, you have to account for ligatures, where multiple EVA characters map to a single glyph.

IE, Sh is Sh

ITh is ITh
Not properly accounting for ligatures is one of the main reasons entropy experiments fail to be reproducible. People forget that EVA does not attempt to map glyphs to Roman letters, but is simply a way to represent glyphs on a keyboard.
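As a rough illustration of the adjustment described above, the sketch below collapses multi-character EVA sequences into single placeholder characters before any statistics are computed. The `LIGATURES` table and the function name are hypothetical: which sequences should count as one glyph is exactly the choice that has to match the experiment being repeated, and the table here assumes lower-case EVA input with upper-case letters free to use as placeholders.

```python
import re

# Hypothetical ligature table: EVA sequence -> single placeholder character.
# The entries are an assumption of this sketch, not a settled list.
LIGATURES = {"cth": "T", "ckh": "K", "cph": "P", "cfh": "F",
             "sh": "S", "ch": "C"}


def collapse_ligatures(text: str, table: dict = LIGATURES) -> str:
    """Replace each multi-character sequence in `table` with one character."""
    # Try longer sequences first so a short key never splits a longer one.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


sample = "shedy qokchy cthy"
print(collapse_ligatures(sample))
```

Whatever table is used, it should be reported alongside the results, since two people using different tables are effectively working on different texts.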
Rule number one when repeating entropy experiments - make sure H0 is always identical!
(30-04-2022, 06:30 PM)davidjackson Wrote: Rule number one when repeating entropy experiments - make sure H0 is always identical!

What do you mean by this, David? I think h0 is a pretty worthless statistic for the types of experiments we are doing.

Let me demonstrate this with an example. I took a cleaned up full transcription of the VM, it's only got lower case characters and spaces. Over two hundred thousand characters. I then added 49 characters to the end: 1234567890 ABCDEFGHIJKLMNOP&é"'(§è!èçà)-$µù^=:;²= 

(It started out in a structured manner but then I was just pressing random buttons). 

Now normally this addition at the end should not affect the stats much: it's a small nonsense line in a massive text document. And indeed, h1 and h2 hardly change. 

h1 increases by 0.08%
h2 decreases by 0.01%

But h0 increases by 34%!!

H0 might be good to show how much mess there is in your txt file. But h1 and h2 are what you actually need.
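The effect described above can be reproduced in a few lines. This is only a sketch under stated assumptions: h0 is taken as log2 of the number of distinct characters, h1 as the single-character Shannon entropy, and h2 as the conditional entropy of a character given the previous one (bigram entropy minus unigram entropy). The `base` text is a repeated toy string standing in for a real transcription, so the percentages it prints will not match the figures quoted in the post.

```python
import math
from collections import Counter


def shannon(counts: Counter) -> float:
    """Shannon entropy, in bits, of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def h0(text: str) -> float:
    # Depends only on how many distinct characters occur at least once.
    return math.log2(len(set(text)))


def h1(text: str) -> float:
    return shannon(Counter(text))


def h2(text: str) -> float:
    # Conditional entropy H(next | previous) = H(pairs) - H(previous).
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return shannon(pairs) - shannon(firsts)


def percent_change(before: float, after: float) -> float:
    return (after - before) / before * 100


# Toy stand-in for a cleaned transcription (an assumption, not real data).
base = "qokeedy daiin chol shedy otaiin okal " * 5000
# The 49 characters appended in the post.
junk = "1234567890 ABCDEFGHIJKLMNOP&é\"'(§è!èçà)-$µù^=:;²="

for name, stat in (("h0", h0), ("h1", h1), ("h2", h2)):
    change = percent_change(stat(base), stat(base + junk))
    print(f"{name}: {change:+.2f}%")
```

The reason for the difference is visible in the definitions: h0 counts every character once regardless of how rare it is, while h1 and h2 weight each character or pair by its frequency, so 49 characters out of a couple of hundred thousand barely register.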

Edit: I guess you might mean that when repeating an experiment, an identical h0 should indicate that the same transcription system is used, but this is not reliable. One forgotten upper case letter or punctuation mark and your h0 will change.
(30-04-2022, 08:18 PM)Koen G Wrote: Edit: I guess you might mean that when repeating an experiment, an identical h0 should indicate that the same transcription system is used, but this is not reliable. One forgotten upper case letter or punctuation mark and your h0 will change.

Exactly that. If you want to extend someone else's experiment (i.e., fit your own text experiments into their scatter chart), you first need to repeat their workings to ensure that you get the same result. If you can't even repeat their results, you're comparing apples with pears.
Ensuring that you're working off the same character set is the very first step. Checking that H0 is identical is one way to ensure that the transcriptions match.
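One way to make the character-set check explicit is to compare the two alphabets directly rather than only their sizes. An identical h0 only says that both texts have the same number of distinct characters: "abc" and "abd" have the same h0 but different alphabets. The sketch below uses a hypothetical helper, `alphabet_diff`, and two toy strings standing in for the two transcriptions.

```python
def alphabet_diff(mine: str, theirs: str):
    """Return (characters only in mine, characters only in theirs)."""
    a, b = set(mine), set(theirs)
    return sorted(a - b), sorted(b - a)


# Toy example: one stray upper-case letter in the second transcription.
only_mine, only_theirs = alphabet_diff("daiin chol", "daiin Chol")
print("only in mine:  ", only_mine)
print("only in theirs:", only_theirs)
```

If both lists come back empty, the two files use the same character set, and any remaining difference in h1 or h2 has to come from the text itself.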

Edit: what keyboard are you using that has § as a principal key???
I still think h0 is unreliable in itself. It can be used as an additional check though, so I guess I follow you there. 

My keyboard is azerty Belgian/French. I have no idea what that key is for, I've never used it :D
(30-04-2022, 09:43 PM)Koen G Wrote: I still think h0 is unreliable in itself. It can be used as an additional check though, so I guess I follow you there.

My keyboard is azerty Belgian/French. I have no idea what that key is for, I've never used it :D

§ is an abbreviation for "section". It is used in legal citations: for example, the primary statutes for patent law in the US are 35 USC §§ 101-390, which you would read as "35 USC (sections) 101 to 390." USC stands for United States Code. The only way to get that into a text using a standard US keyboard is "insert" and "special character". It would have been nice to have it as a standard key when I was in law school -- although it probably could have been set up that way . . .
davidjackson Wrote: Edit: what keyboard are you using that has § as a principal key???

That is so on German keyboards: it is upper case 3 (Shift+3), and it is the Paragraphenzeichen, the section symbol used in law books, especially in German federal law, e.g. § 10 BGB. I have found it a pretty good idea not to read only the EN Wikipedia.
Has someone looked at the entropy of those medieval French business ledgers consisting of Roman numerals?
They would probably need to be transcribed first. RenegadeHealer, do you have a link to where you read about this?