The Voynich Ninja
Spaces


Spaces - Anton - 08-11-2016

The question is often discussed whether spaces in the VMS are real spaces or not.

In his book, Bennett notes that the space is the most frequent character in the languages considered there (mostly Western European languages). Interestingly, I had never thought of a space in that way before reading the book.

I decided to check whether this is true for Voynichese. I did a very brief check (fewer than 600 characters: the first and fourth paragraphs of [link not preserved] in Takahashi's transcription) as a trial indication, and it seems that it is. The space (represented by the dot in the transcription) is ranked first, accounting for 17% and 16% of the total, respectively. I added a space to the end of each line except the last line of a paragraph. I also included "titles" in the count.
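A minimal sketch of this kind of count, assuming the transcription uses "." as the word separator and that each paragraph is given as a list of lines. The function name and the toy input are illustrative only; the toy lines are not real folio text.

```python
from collections import Counter

def count_characters(paragraph_lines):
    """Count characters in one paragraph of a dot-separated transcription.

    A '.' is added at the end of every line except the last one,
    so each line break inside the paragraph counts as one space.
    """
    stripped = [line.strip(".") for line in paragraph_lines]
    text = ".".join(stripped)
    return Counter(text)

# Toy input (assumption): invented lines, NOT taken from any folio.
toy_paragraph = ["chey.kain.ol", "s.aiin.olsaiin"]
counts = count_characters(toy_paragraph)
total = sum(counts.values())

# Print the three most frequent characters with their share of the total.
for char, n in counts.most_common(3):
    print(char, n, f"{100 * n / total:.1f}%")
```

Running this over a real transcription only requires replacing the toy list with the actual lines of each paragraph and summing the counters.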

I suspect this sort of check must have been performed before. Are there results for a more substantial body of text than two paragraphs? D'Imperio does not count spaces, and neither does the "Voynich Reader" tool.

Do the results speak in favour of the assumption that spaces are spaces? It's 3am in Moscow, so I'm not sure right now.


RE: Spaces - Sam G - 08-11-2016

Well, if you count line breaks as spaces, then a block of text with n words will contain n-1 spaces, which means you have approximately one space per word in the larger blocks of text.  Labels, on the other hand, will have few to no spaces (unless we count the divisions between labels as spaces).

So, in order for a letter to be more frequent than a space in the main text, it would have to occur more than once per word on average.
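To put rough numbers on this: with n words and n-1 spaces, the share of spaces is (n-1) / (letters + n-1), which for a long text approaches 1 / (mean word length + 1). A small sketch of that arithmetic follows; both function names are made up for illustration.

```python
def space_share(word_lengths):
    """Share of spaces in a block of n words separated by n-1 spaces."""
    n = len(word_lengths)
    spaces = n - 1
    letters = sum(word_lengths)
    return spaces / (letters + spaces)

def implied_mean_word_length(share):
    """Invert the long-text approximation share = 1 / (mean_length + 1)."""
    return 1 / share - 1

# 100 words of 5 letters each: 99 spaces out of 599 characters.
print(round(space_share([5] * 100), 4))

# A space share of 17% corresponds to a mean word length a bit under 5.
print(round(implied_mean_word_length(0.17), 2))
```

Under this approximation, a single letter can outrank the space only if it occurs more than once per word on average, as stated above.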

I could imagine that if you had a language with many long words and a relatively low number of phonemes that you might be able to find a character more frequent than a space (assuming 1 character = 1 segmental phoneme), though I don't know if such a language exists.

If you don't require that one character represents a phoneme, then it would be easy to come up with a contrived system that would have characters more common than spaces.  For instance, if you were to replace every instance of o in the VMS with ooo, then o would probably be more common than a space.
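A toy illustration of the "ooo" substitution, assuming a dot-separated string. The sample string is invented for the demonstration and is not real manuscript text.

```python
from collections import Counter

# Toy string (assumption): a few dot-separated word forms, not a real line.
text = "ol.s.aiin.chey.kain.olsaiin"
tripled = text.replace("o", "ooo")

before = Counter(text)
after = Counter(tripled)

# Before the substitution the separator outnumbers 'o'; afterwards it does not.
print("before:", before["o"], "x o,", before["."], "x separator")
print("after: ", after["o"], "x o,", after["."], "x separator")
```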


RE: Spaces - Anton - 08-11-2016

Yes, that's quite obvious. But what about Voynichese? Suppose there are no "real spaces" there at all (i.e. the plain text was prepared for encoding without spaces), or the "real spaces" are represented by some glyph, while the "observed spaces" are just fakes. Does the fact that the "observed space" is the most frequent character disprove such assumptions?


RE: Spaces - Sam G - 08-11-2016

Well... I suppose not... since like I said it's not hard to imagine a system in which a certain letter is more common than a space, but I don't think this is really the most useful way to determine whether the spaces are meaningful or not.

Really, I don't see why this issue is controversial.  Just about everyone will agree that letters have certain places within words where they appear.  q goes at the beginning, n goes at the end, etc.  Since spaces are precisely that which define and delimit words, it follows directly from this that the spaces are meaningful.  E.g. saying "q only appears at the beginning of a word" is functionally the same as saying "q only follows a space".  And then of course the one-word labels confirm that this is correct.


RE: Spaces - Anton - 08-11-2016

Actually, it's not that simple, because we are not sure that vords are words. Suppose the Voynichese text is just a scrambled sequence of characters, where even characters within words are scrambled (yes, there are labels, but let's leave them aside).

For example I can represent this sentence in the following way:

For4ex ampl e4I4c an4repr esen t4this4sen ten ce4in4the4fo llo w ing4w ay

You see, even without scrambling, I substituted "4" for the real spaces and introduced fake spaces (so that they outnumber the "4"s). I made no attempt to keep the procedure reversible; this is just to illustrate what I mean.
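The example above was made by hand. One hypothetical way to mechanise it is sketched below: real spaces become "4" and fake spaces are dropped in at random. The function names, the `fake_rate` parameter and the fixed seed are all assumptions made for the illustration. This toy version happens to be easy to undo, provided the plain text contains no literal "4".

```python
import random

def disguise(sentence, fake_rate=0.25, seed=0):
    """Replace real spaces with '4' and insert fake spaces at random."""
    rng = random.Random(seed)
    out = []
    for ch in sentence.replace(" ", "4"):
        out.append(ch)
        # After each character, sometimes insert a meaningless space.
        if rng.random() < fake_rate:
            out.append(" ")
    return "".join(out).strip()

def undo(disguised):
    """Drop the fake spaces, then turn '4' back into a real space."""
    return disguised.replace(" ", "").replace("4", " ")

plain = "For example I can represent this sentence in the following way"
hand_made = "For4ex ampl e4I4c an4repr esen t4this4sen ten ce4in4the4fo llo w ing4w ay"

print(disguise(plain))
print(undo(hand_made) == plain)
```

In the hand-made string the observed space is more frequent than the "4" that stands for the real space, so raw frequency alone would not reveal which of the two is the true word divider.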


RE: Spaces - Sam G - 08-11-2016

(08-11-2016, 10:37 PM) Anton wrote: Actually not that simple, because we are not sure that vords are words. [...]

Well, your example doesn't address the point I made at all.


RE: Spaces - nablator - 12-07-2018

This "Voynich Manuscript similarity sorted EVA-transcription analysis by Joachim Dathe" (or a similarly sorted list of words) may be useful for identifying potentially fake spaces.

Quote:
* Even showing identical expressions separated into multiple parts:
* > .ol.s.aiin. (3*11)
* > .olsaiin. (2*9)
* > .chey.kain. (4*11)
* > .cheykain. (1*10)

(I don't understand the ">" in the output format)
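A rough sketch of how such candidates could be found without that tool, assuming dot-separated lines: look for runs of adjacent words whose concatenation also occurs somewhere as a single word. This is not Dathe's method, just a simple stand-in; the function name, the `max_parts` parameter and the toy lines are invented for the example.

```python
from collections import Counter

def split_candidates(lines, max_parts=3):
    """Count adjacent word runs whose concatenation is itself an attested word."""
    words_per_line = [line.strip(".").split(".") for line in lines]
    vocab = Counter(w for words in words_per_line for w in words)
    hits = Counter()
    for words in words_per_line:
        for i in range(len(words)):
            for k in range(2, max_parts + 1):
                parts = words[i:i + k]
                if len(parts) < k:
                    break  # ran off the end of the line
                joined = "".join(parts)
                if joined in vocab:
                    hits[(".".join(parts), joined)] += 1
    return hits

# Toy lines (assumption): built from the forms quoted above, not real folio lines.
toy_lines = ["ol.s.aiin.chey.kain", "olsaiin.cheykain", "ol.s.aiin"]
hits = split_candidates(toy_lines)
for (split_form, joined_form), n in hits.most_common():
    print(split_form, "->", joined_form, n)
```

A hit only shows that both spellings exist; it does not say whether the space is fake in the split form or missing in the joined one.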

On the other hand, missing spaces may be the real issue. Some spaces may have been omitted in order to obfuscate and shorten the resulting text.


RE: Spaces - Koen G - 12-07-2018

What if you take the number of spaces as a percentage of the total number of characters and compare it across various languages and Voynichese? Should be easy to do in Word. (I'm on my phone right now)


RE: Spaces - Koen G - 12-07-2018

So for the whole VM, adding a space to the end of each line, I get 16.4%. Without spaces at the end of lines I get 14.7%.

It's hard to say which is the "right" number with all the labels and we can't really see what sentences are, so we don't know whether there should be an implicit space at the end of a line or not. I'm leaning towards 16% as a point of reference.
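For anyone who wants to repeat the comparison outside a word processor, here is a minimal sketch of the two counting conventions, assuming the text is a list of lines with single spaces between words. The function name and the toy lines are placeholders, not real data.

```python
def space_percentage(lines, line_end_spaces=True):
    """Percentage of spaces, with or without one space per line break."""
    lines = [line.strip() for line in lines if line.strip()]
    joiner = " " if line_end_spaces else ""
    text = joiner.join(lines)
    return 100 * text.count(" ") / len(text)

# Toy lines (assumption): placeholders, not real text.
toy = ["ab cd", "ef gh"]
print(round(space_percentage(toy, line_end_spaces=True), 1))
print(round(space_percentage(toy, line_end_spaces=False), 1))
```

The shorter the lines, the bigger the gap between the two conventions, which is one reason label-heavy pages make the "right" figure hard to pin down.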

For a modern Dutch novel I get 18.3% spaces, and 16.8% without end-of-line spaces (though again this is somewhat arbitrary).

I guess the difference may be explained by the fact that Voynichese lacks really short words.


RE: Spaces - Emma May Smith - 12-07-2018

Part of the difference may be what you're counting as a glyph. Is aiiin one glyph or five?
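A small sketch of how much the answer can move, assuming a space-separated string and a hypothetical list of character sequences that are to be counted as single glyphs. The function name and the unit list are illustrative, not an accepted glyph inventory.

```python
import re

def space_share(text, units=()):
    """Share of spaces when the listed sequences each count as one glyph."""
    # Longest units first, so that "aiiin" is tried before "aiin".
    parts = [re.escape(u) for u in sorted(units, key=len, reverse=True)]
    parts.append(".")  # fallback: any single character is one glyph
    tokens = re.findall("|".join(parts), text, flags=re.DOTALL)
    return tokens.count(" ") / len(tokens)

# Toy string (assumption): three word forms, not a real line.
toy = "ol aiin aiiin"
print(round(100 * space_share(toy), 1))                    # every character counted
print(round(100 * space_share(toy, ["aiin", "aiiin"]), 1))  # sequences as one glyph
```

Merging sequences shrinks the total glyph count while the number of spaces stays the same, so the space percentage rises.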