The Voynich Ninja


In most cases, they mean the simple substitution cipher (the Caesar cipher, yes). I'd say that when a newbie tries to "decipher" the VMS, s/he tries that first and foremost.

Anton,

"Mathematically that's correct, but from what I've seen in practice, generally when people consider it from the angle of 'plaintext vs ciphertext' they don't think of maths, but rather of the underlying cultural case study....Culturally, a cipher is a technique used to conceal. In this view, natural language written in some script is not a 'cipher' text, and it has not been 'encrypted', for cryptography is secret writing, and there is no secret in plain words put on paper."

I get and agree with the distinction you're making here, and it's certainly the case that things like Benedek Lang's work on the types of content that appear in cipher texts address those kinds of cultural context issues. That's absolutely a useful kind of research question to pursue. However...

You've accidentally made my point for me when you say, "generally when people consider it from the angle of 'plaintext vs ciphertext' they don't think of maths," because part of the point I am making is that considering it from the angle of 'plaintext vs. ciphertext' does not justify in any way, shape, or form not thinking about the statistics of the text.

If someone thinks spaces in the mss. are word separators in an underlying text (a debatable proposition, but one a lot of work on the text makes), then just the word length distribution combined with the 1st and 2nd order entropy statistics massively constrains the space of viable hypotheses about the underlying text (if there is one). The problem is that you get people who make handwavey appeals to "natural language" thinking that somehow changes that fact or allows them to ignore the implications of the statistics. It doesn't. The characteristics of the text are what they are, and any hypothesis has to engage with that.
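To make the statistics mentioned above concrete, here is a minimal Python sketch of how one might compute them. It assumes spaces are treated as word separators and as ordinary characters in the stream, and it takes "second-order entropy" to mean the conditional entropy of a character given the previous one, which is one common convention but not the only one. The function names and the sample string are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def text_statistics(text):
    """Return (word-length distribution, h1, h2) for a space-separated text.

    h1 is the entropy of single characters; h2 is the conditional entropy of
    a character given the one before it. Spaces are counted as characters.
    """
    words = text.split()
    lengths = Counter(len(w) for w in words)
    length_dist = {n: c / len(words) for n, c in sorted(lengths.items())}

    stream = " ".join(words)  # normalise runs of whitespace to single spaces
    singles = Counter(stream)
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])  # first characters of the pairs
    h1 = entropy(singles)
    h2 = entropy(pairs) - entropy(firsts)
    return length_dist, h1, h2


# Made-up sample string, not an extract from any transcription.
sample = "okal okal dar okar dal okal dar"
dist, h1, h2 = text_statistics(sample)
print(dist)
print(round(h1, 3), round(h2, 3))
```

Running the same function over a candidate plaintext corpus and over a transcription of the mss. gives numbers that can be compared directly, which is the kind of engagement with the statistics being asked for here.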

"Putting aside some most extravagant natural language theories, generally people are considering a forgotten language or (another case) an unknown script designed specifically for ethnic group having no their script of their own. Researchers pursuing or inclining to this strand may be linguists or people with some background in linguistics."

...or a script designed by a non-native speaker for their own use in recording a language whose relevant characteristics (phonology, tones, etc.) they don't fully understand or can't hear well enough to record. Or a (possibly lossy or ambiguous) abbreviated text. I am completely on board with considering those kinds of possibilities.

I want to make clear that I am not suggesting that linguistic approaches or methods have no utility here. For example, off the top of my head I can think of several cases where people have applied techniques from linguistics to look at the morphological structure of "words" in the VMS. That's certainly worth doing. What I am suggesting is that hypothesizing something like the text being "a forgotten language [in] an unknown script designed specifically for ethnic group having no their script of their own" is all too often used as an excuse to ignore, dismiss, or otherwise fail/refuse to engage with the actual, concrete statistical properties of the text. It isn't.

"On the contrary, when people try to find substitution cipher solutions, that's mostly because they are novices in the VMS, and its perceived 'simplicity' blinds them at first. A substitution cipher is the simplest cipher, so it's entertaining to seek for possible solutions, especially if they are limited to suggesting several keywords."

I didn't express myself clearly, so I'll try again. Consider the following two hypotheses:

1) The text is a phonetic recording by a non-native speaker using a constructed or non-Latin script (a "natural language" hypothesis), and

2) The text is a phonetic recording by a non-native speaker using the Latin script that has then been enciphered using a constructed cipher alphabet and a simple substitution cipher (a "ciphertext" hypothesis).

From a hypothesis testing point of view, those are not distinguishable cases (assuming the same non-native speaker, and given the ~mid-20ish number of common glyphs in most transcription schemes). If the statistics of the text agree with or rule out Hypothesis #2, they also agree with or rule out Hypothesis #1. It buys nothing to make handwavey appeals to "natural language" -- Hypothesis #1 is just a special case of Hypothesis #2. Applying cryptanalytic methods is not somehow ignoring the possibility that it isn't a ciphertext, it's just recognizing that equivalence.
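The equivalence can be checked numerically. The following is a small Python sketch, assuming a lowercase Latin-alphabet plaintext and a randomly generated one-to-one cipher alphabet (the helper names, the seed, and the sample sentence are all illustrative). It shows that word lengths and the first- and second-order character entropies come out the same before and after a simple substitution, so a test on those statistics cannot tell the two hypotheses apart.

```python
import random
import string
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def h1_h2(text):
    """Single-character entropy and conditional (second-order) entropy."""
    singles = Counter(text)
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # first characters of the pairs
    return entropy(singles), entropy(pairs) - entropy(firsts)


def simple_substitution(text, seed=0):
    """Replace each lowercase letter via a random one-to-one mapping.

    Spaces and any other characters pass through unchanged.
    """
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    table = str.maketrans(string.ascii_lowercase, "".join(shuffled))
    return text.translate(table)


plain = "the quick brown fox jumps over the lazy dog and the dog sleeps"
cipher = simple_substitution(plain)

print([len(w) for w in plain.split()] == [len(w) for w in cipher.split()])
print(h1_h2(plain))
print(h1_h2(cipher))
```

The mapping only relabels the symbols, so every frequency table has the same counts under different names, and any statistic built purely from those counts is unchanged.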

"Just a passing note (I have not read the blog post above discussed)."

Not a problem in the context of what we're talking about here, although I do want to say that Robert's posts on his Goodreads page regarding the papers from the Malta conference are thoughtful and well worth reading...

Apologies if this shows up with excessive unnecessary white space between paragraphs -- not sure why that's happening...

(21-02-2023, 09:55 PM)Mark Knowles Wrote: I never quite sure precisely what people mean when they use the term "substitution cipher". Do they just mean a cipher when each letter is substituted with a letter or symbol? Or do they include a more complex cipher with homophones, nulls, substring substitutions, word substitutions etc.

Mark,

I *think* "substitution cipher" is a class that subsumes most (if not all) of those things. When the mapping is 1-1 (no nulls or homophones), it's a "simple substitution cipher", with an additional axis of whether the mapping is fixed (a monoalphabetic substitution cipher) or varies according to some key sequence and alphabet table (a polyalphabetic substitution cipher). Unless I'm mistaken, variable-length substitutions for letters (straddling checkerboards, for instance) fall under this umbrella. If by "substring substitutions" you mean something like replacing digrams with other digrams, I don't believe those sorts of schemes are considered "substitution ciphers". But I'm saying that off the cuff without double checking standard nomenclature against something like the Riverbank manuals or Lanaki's "Classical Cryptography Course". A relevant document I came across recently that may help answer questions about what buckets different cipher schemes fall into is a 1918 taxonomy by William Friedman.
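For anyone who finds the fixed-versus-varying distinction easier to see in code, here is a rough Python sketch. It uses a Caesar-shifted alphabet as the fixed (monoalphabetic) mapping and a Vigenère-style scheme as the key-driven (polyalphabetic) one; both are stand-ins chosen for brevity, and the function names are made up for this example.

```python
import string

ALPHABET = string.ascii_lowercase


def monoalphabetic(plaintext, cipher_alphabet):
    """Fixed 1-1 mapping: each letter always becomes the same cipher letter."""
    table = str.maketrans(ALPHABET, cipher_alphabet)
    return plaintext.translate(table)


def vigenere(plaintext, key):
    """Polyalphabetic: the shift applied to each letter follows the key."""
    out = []
    i = 0  # position in the key, advanced only on letters
    for ch in plaintext:
        if ch in ALPHABET:
            shift = ALPHABET.index(key[i % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            i += 1
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)


caesar_3 = ALPHABET[3:] + ALPHABET[:3]
print(monoalphabetic("attack", caesar_3))  # both t's map to the same letter
print(vigenere("attackatdawn", "lemon"))   # repeated letters can map differently
```

In the first call both occurrences of "t" come out as the same cipher letter; in the second the two adjacent "t"s come out different, because the key has moved on between them.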

(22-02-2023, 05:15 AM)kckluge Wrote: Apologies if this shows up with excessive unnecessary white space between paragraphs -- not sure why that's happening...

This is a known glitch; however, we don't know how to fix it. BTW, you can use text selection to quote portions of text. Just select the phrase that you wish to quote with your mouse, and then hit the "Reply" pop-up that appears next. This will paste the selection into the editor window.

(22-02-2023, 05:15 AM)kckluge Wrote: What I am suggesting is that hypothesizing something like the text being "a forgotten language [in] an unknown script designed specifically for ethnic group having no their script of their own" is all too often used as an excuse to ignore, dismiss, or otherwise fail/refuse to engage with the actual, concrete statistical properties of the text. It isn't.

Well, I would say that in the first place such hypothesizing goes ahead of even considering the statistical properties of the text. We have seen many such examples. E.g. the late Prof. Bax (who was not some enthusiast amateur but a prominent linguist, and whose 2014 article first attracted my attention to the VMS) did not seem to have considered statistical properties when he issued the said paper - and he was much criticized for that by the VMS-research veterans. I am far from linguistics and do not know whether statistics are or are not second nature to a modern linguist. I think there is computational linguistics, which revolves around statistics, but probably there are other flavours of linguistics which are statistics-unaware. People with a linguistics background may comment on whether I'm right or wrong about this.

My own story was that I did not think about statistics at all when I read that paper of Stephen's (although I'm a technical guy and understand what the stuff is about). It was not until I got a bit acquainted with the VMS research history, mostly through web resources of Nick and Rene, that I began to consider the VMS text from the stats perspective.

(22-02-2023, 05:15 AM)kckluge Wrote: Robert's posts on his Goodreads page regarding the papers from the Malta conference are thoughtful and well worth reading.

I've been damned busy: not only did I have to miss the conference, but I haven't even started on the conference papers yet...

Schiffer Books have advised that Voynich Reconsidered is now scheduled for the Spring 2024 catalogue. Review of the first galley is underway. It includes an additional chapter with my summaries and critiques of selected papers from the Voynich 2022 conference.

I discovered a set of Unicode Voynich fonts which make it possible (at least in Word, Excel and Powerpoint) to write continuous text in Voynich glyphs without laboriously importing the glyphs as jpg or png images.

Some of my thoughts on the conference papers are on my GoodReads/Amazon blog.

[quote='Anton' pid='53906' dateline='1677068304']
[quote]
What I am suggesting is that hypothesizing something like the text being "a forgotten language [in] an unknown script designed specifically for ethnic group having no their script of their own" is all too often used as an excuse to ignore, dismiss, or otherwise fail/refuse to engage with the actual, concrete statistical properties of the text. It isn't.
[/quote]

Well, I would say that in the first place such hypothesizing goes ahead of even considering the statistical properties of the text. We have seen many such examples.
[/quote]

We are in vigorous agreement on that point -- statistical properties are necessarily statistics computed over some specific considered body or bodies of text. Suggestions of additional bodies/types of texts to examine should be welcomed.

[quote='Anton' pid='53906' dateline='1677068304']
E.g. late prof. Bax (who was not some enthusiast amateur but a prominent linguist and whose 2014 article first attracted my attention to the VMS) did not seem to have considered statistical properties when he issued the said paper - and he was much criticized for that by the VMS-research veterans. I am far from linguistics and do not know whether statistics are or are not the second nature of a modern linguist. I think there is computational linguistics which revolves around statistics, but probably there are other flavours of linguistics which are statistics-unaware.
[/quote]

Be that as it may, investing some reasonable level of due diligence in a literature search and engaging with key relevant prior work is an expected part of doing and publishing research. The goal here should be for everyone to broaden their sandboxes, not bury their heads in the one they're accustomed to.

In hindsight, I probably didn't make the point(s) I was trying to make as clearly as I would have liked at the start, and I appreciate the interaction because that helped me recognize that and try to hone what I'm trying to say. To recapitulate/summarize in bullet form:

- People coming up with unusual/unconventional forms of natural language text to consider (or, for that matter, alternatives to the "text" being text at all) is a good thing
- Given that linguists have specialized knowledge of languages, their helping come up with such unusual/unconventional possibilities is a very good thing
- Given that linguists also have knowledge of/access to specialized analytic methods & tools from their field, their applying those to the text of the mss. is also a very good thing
- Having said that, however, the text of the mss. is still what it is...
  - Cipher types aren't *ignoring* the possibility that the mss. text is in a natural language rather than a cipher, they are *explicitly rejecting* it (with a narrow set of known exceptions including abjads & devowelled text)
  - They are doing so for perfectly sound evidence-based reasons
  - This is a consequence of the equivalence between an unenciphered text in some language in an alphabetic script and the same text/script having been enciphered with a Caesar Cipher with shift = 0 -- all the arguments against the text being a (single glyph for single letter) substitution cipher apply
  - Those reasons don't go away simply because someone is approaching the text from a different disciplinary framework; they have to be engaged with
  - For that matter, it's entirely possible to find linguists who are of the same opinion (whether for similar or different reasons)
- With regard to possibilities that fall outside the body of texts used in making those arguments, as always the burden of proof is on the person making the positive claim
  - Anyone disputing the rejection of the "natural language text" hypothesis needs to produce a corpus whose characteristics are consistent with those of the mss. text
  - Making vague handwavy arguments about creoles or phonetic spellings or whatever as if that's a "get out of jail free" card doesn't cut it
  - That's not me being a jerk, that's how science works
  - (Nor for that matter is that me being a naive falsificationist; I always liked Allen Newell's view that hypotheses are like graduate students: you don't abandon them at the first signs of trouble, you try to find ways to fix them)

Karl

(21-02-2023, 09:55 PM)Mark Knowles Wrote: I never quite sure precisely what people mean when they use the term "substitution cipher". Do they just mean a cipher when each letter is substituted with a letter or symbol? Or do they include a more complex cipher with homophones, nulls, substring substitutions, word substitutions etc.

Mark,

While looking for something else, I came across a book chapter on enciphering methods -- its discussion of the different types of substitution ciphers probably answers your question in a more authoritative way relative to standard usage.

Karl

(P.S., sorry if I dragged this thread off topic -- I'm very much looking forward to seeing Robert's book when it comes out)

(11-03-2023, 07:59 PM)kckluge Wrote:
(21-02-2023, 09:55 PM)Mark Knowles Wrote: I never quite sure precisely what people mean when they use the term "substitution cipher". Do they just mean a cipher when each letter is substituted with a letter or symbol? Or do they include a more complex cipher with homophones, nulls, substring substitutions, word substitutions etc.

Mark,

While looking for something else, I came across a book chapter on enciphering methods -- its discussion of the different types of substitution ciphers probably answers your question in a more authoritative way relative to standard usage.

Karl

It clarifies what is meant by a simple substitution cipher, which is what I guessed. Some aspects of what I have described are not referred to. This accounts for my use of the term "diplomatic cipher" given its more specific correspondence.

Submitted third galley for "Voynich Reconsidered" to Schiffer. Expanded Chapter 5 (Currier) and tightened up Chapter 8 (The Gold Bug) and Chapter 9 (Cannabis). We may go now to layout stage.

Received fourth (probably final) galley from Schiffer Books. We have added a postscript to Chapter 5 (Currier) in which we recognise Currier's concern about what looked like "junk glyphs", and we consider whether certain glyphs serve as punctuation in the Voynich manuscript.

We have to return this galley to Schiffer by March 27, 2023.


Anton

kckluge

kckluge

Anton

dfs346

kckluge

kckluge

Mark Knowles

dfs346

dfs346