The Voynich Ninja

Pages: 1 2 3

Right. What I'm getting at is that, taken over the whole manuscript, the amount of text is so large that things tend to cancel each other out to a surprising degree. You might notice that some of the stats you are looking at don't change all that much between a transliteration with and one without uncertain spaces.

Of course, it could be that there are specific features that are disproportionately affected by turning uncertain spaces on or off. Identifying those features might be interesting in and of itself.

Edit: I'm looking at it from a perspective where you'd be analyzing large sections of text. If you do a deep reading of a single paragraph, toggling uncertain spaces will have a large effect.

[link] found that:

Quote: cases also correlate pretty well with specific glyph adjacencies -- that is, whenever a common-ish "vord" contains certain glyphs next to each other, it will typically be found with a break between them, without a break between them, and with an ambiguous sort-of-break between them, all in loosely consistent ratios.

From his [link]:

Quote: the glyph pairs with the highest incidences of unexpected behavior also tend to have relatively high proportions of ambiguous word breaks ... there appears to be a correlation between ambiguous spacing and inconsistent spacing between given glyph pairs

He gives the example of r.a, as in [oraiin], [or.aiin] and [or,aiin].

Therefore, in the case of Emma's experiments, I expect that there will be several situations that are not significantly affected by ambiguous spaces (e.g. y.q), while cases that are already problematic when ignoring ambiguous spaces will be even more blurred when comparing with an analysis that includes ambiguous spaces.

(14-03-2024, 09:31 PM)Koen G Wrote: Right. What I'm getting at is that, taken over the whole manuscript, the amount of text is so large that things tend to cancel each other out to a surprising degree. You might notice that some of the stats you are looking at don't change all that much between a transliteration with and one without uncertain spaces.

This is correct in general, but when working with words, one is basically never in the realm of 'large numbers'.
I am just experiencing this myself. The tables towards the end of [link] change considerably when going to a newer and more reliable transliteration (to the point where I have started to suspect errors in my scripts, which I am still double-checking).

(13-03-2024, 08:37 PM)Aga Tentakulus Wrote: You of all people should know this.
Do I now write "le xxx" or "l'xxx"?
Sometimes you have to listen to your instincts.

An "honest" transliteration should be about what is, not what you want it to be.

'Get your facts first, and then you can distort 'em as much as you please.' -Mark Twain

In the 15th century they didn't have a choice between le and l'; when xxx started with a vowel the apostrophe was not written. It is true that spaces were often omitted in manuscripts: when you know the language, it's easy to locate missing spaces, just as in spoken language.

(14-03-2024, 09:31 PM)Koen G Wrote: Right. What I'm getting at is that, taken over the whole manuscript, the amount of text is so large that things tend to cancel each other out to a surprising degree. You might notice that some of the stats you are looking at don't change all that much between a transliteration with and one without uncertain spaces.

I remember posting some graphs on the MATTR thread, then removing them as they are so much dependent on uncertain spaces... some statistics are more affected than others.

If I'm not mistaken, random errors or mischief in spacing should not affect the highest-frequency vords too much: remove, say, 10% of all spaces and they remain the most frequent, down to a certain level of frequency. And since the frequency of vords correlates very well with "regularity", Massimiliano Zattera's claim that vords were built by some process enforcing his 12-slot alphabet/sequence, with some spaces omitted, is even more likely to be mostly correct (I don't think he mentioned the frequency-regularity correlation). Is daiin actually d aiin? Maybe, but neither the slot sequence nor the grammar tells us.
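The "remove 10% of all spaces" thought experiment is easy to try. The following is only a rough sketch: the function names are hypothetical, spaces are dropped independently at random (real spacing errors are probably not random), and any vord list can be fed in.

```python
import random
from collections import Counter


def drop_spaces(vords: list[str], fraction: float, rng: random.Random) -> list[str]:
    """Delete each space with probability `fraction`, gluing neighbours together."""
    if not vords:
        return []
    merged = [vords[0]]
    for vord in vords[1:]:
        if rng.random() < fraction:
            merged[-1] += vord       # space removed: append to previous vord
        else:
            merged.append(vord)      # space kept
    return merged


def top_overlap(original: list[str], damaged: list[str], n: int) -> int:
    """Count how many of the n most frequent original vords are still in the damaged top n."""
    top_a = {v for v, _ in Counter(original).most_common(n)}
    top_b = {v for v, _ in Counter(damaged).most_common(n)}
    return len(top_a & top_b)
```

Usage would be something like `top_overlap(vords, drop_spaces(vords, 0.10, random.Random(1)), 50)` on a full transliteration, repeated for several seeds and several values of n to see at which frequency rank the lists start to diverge.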

He put the final nail in the coffin of natural language substitution cipher theories (even those that include some amount of processing, like removing vowels or duplicate parts), but who noticed? Not the natural language theorists.

(15-03-2024, 06:04 PM)nablator Wrote: An "honest" transliteration should be about what is, not what you want it to be.

'Get your facts first, and then you can distort 'em as much as you please.' -Mark Twain

In the 15th century they didn't have a choice between le and l'; when xxx started with a vowel the apostrophe was not written. It is true that spaces were often omitted in manuscripts: when you know the language, it's easy to locate missing spaces, just as in spoken language.

[attachment=8266]
Example from the 14th century.
I already know what I'm talking about.

The difference: sometimes it is done and sometimes it is not, which is what makes it a problem.
"dachs" or "dachs", read as "dachs" and "d achs".
Badger and axle. With "ddachs" it is clear: the first d is always the article, and you can even find that in the VM: "88xxxx".

My personal approach has been my own subjective judgement on a case-by-case basis, taking into account the surrounding context: how loosely or how tightly the surrounding glyphs are placed. This means that I would avoid reading a space even if that yields something contrary to the "usual" Voynichese morphology.

[attachment=8267]
An example of how the same sentence is written in different books and dialects.
"Wenn der Mond in den ....." ("When the moon enters the .....")
Here "der" (the) is also written as "der", "d'", "dy", "dr" or "die".

In one case even the spaces have to be marked so that the text can still be understood.
The spelling changes from writer to writer and with their origin.

(16-03-2024, 02:37 PM)Anton Wrote: My personal approach has been my own subjective judgement on a case-by-case basis, taking into account the surrounding context: how loosely or how tightly the surrounding glyphs are placed. This means that I would avoid reading a space even if that yields something contrary to the "usual" Voynichese morphology.

It is not uncommon to find places with several consecutive ambiguous spaces. There is no obvious reason to choose just one interpretation.

A few examples:

[attachment=8284]

[attachment=8287]

Lost in transliteration because Takeshi-san used his own subjective judgement instead of objective distance:

[attachment=8286]

I put together a rough Python script based on the word boxes from the XML files by Job (voynichese.com). The script goes through each word box, splitting it into smaller boxes for strokes that are not connected. Distances of 5 pixels or less (~0.2 mm at 600 dpi) are ignored.
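For readers who want to reproduce the arithmetic, here is a small sketch of the unit conversion and the 5-pixel rule. This is not the actual script: the input format (a list of left/right pixel extents for the unconnected stroke groups inside one word box) and all names are assumptions made for illustration.

```python
DPI = 600              # scan resolution stated in the post
UM_PER_INCH = 25_400   # micrometers per inch
IGNORE_PX = 5          # gaps of 5 px or less are ignored


def px_to_um(pixels: float) -> int:
    """Convert a pixel distance to micrometers at DPI, rounded to an integer."""
    return round(pixels * UM_PER_INCH / DPI)


def merge_close_boxes(extents: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge (left, right) extents whose horizontal gap is IGNORE_PX or less."""
    merged: list[tuple[int, int]] = []
    for left, right in sorted(extents):
        if merged and left - merged[-1][1] <= IGNORE_PX:
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


def gaps_um(extents: list[tuple[int, int]]) -> list[int]:
    """Horizontal gaps between consecutive merged extents, in micrometers."""
    boxes = merge_close_boxes(extents)
    return [px_to_um(b[0] - a[1]) for a, b in zip(boxes, boxes[1:])]


# Hypothetical extents: the 3 px gap is swallowed, the 7 px and 32 px gaps are kept.
print(gaps_um([(0, 100), (103, 150), (157, 200), (232, 300)]))  # [296, 1355]
```

At 600 dpi one pixel is about 42.3 micrometers, so 5 px comes to roughly 0.21 mm, in line with the ~0.2 mm quoted above.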

I ran the script on Q20. This is the output for the first 3 lines of f106r. Spaces are expressed in micrometers (1000 = 1 mm).

pshdar 1355 shoefy 2625 yteedy 2498 sho 296 l 2328 korchy 2709 she 423 ky 1905 otchedy 3810 o 423 kshed 2074 qotedy 1651 qoted 1736 yteeo 381 dy

sh 296 edch 254 y 2625 yt 254 chedy 1947 chees 1439 otshes 2540 o 847 kcho 2244 chdy 2074 qo 254 tee 296 dy 2963 ch 423 e 508 d 762 ch 381 e 296 d 254 y 1990 ch 339 e 423 dy 1228 qota 508 r 931 rod

d 296 sh 296 es 847 l 466 che 339 dy 1609 lkch 296 edy 2921 ytchdy 2074 o 847 r 550 ch 423 eo 550 s
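The output alternates fragments and gap widths, so it is simple to load for further analysis. A minimal sketch, assuming exactly that alternating layout (the function name is hypothetical):

```python
def parse_line(line: str) -> list[tuple[str, int, str]]:
    """Turn 'frag gap frag gap frag ...' into (left_frag, gap_um, right_frag) triples."""
    parts = line.split()
    frags = parts[0::2]                      # fragments sit at even positions
    gaps = [int(g) for g in parts[1::2]]     # gap widths sit at odd positions
    return [(frags[i], gaps[i], frags[i + 1]) for i in range(len(gaps))]


# Third line of the f106r output quoted above.
line3 = ("d 296 sh 296 es 847 l 466 che 339 dy 1609 lkch 296 edy 2921 "
         "ytchdy 2074 o 847 r 550 ch 423 eo 550 s")
triples = parse_line(line3)
print(triples[0])    # ('d', 296, 'sh')
print(triples[-1])   # ('eo', 550, 's')
```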

In this image, spaces are coloured according to width (<800 green, <1600 blue, <2400 purple; wider spaces red):

[attachment=8299]
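The colour coding in the image amounts to a simple threshold lookup. A sketch using the thresholds given above (the function name is hypothetical, and how the drawing itself was done is not shown here):

```python
def colour_for(width_um: int) -> str:
    """Map a gap width in micrometers to the colour classes used in the image."""
    if width_um < 800:
        return "green"
    if width_um < 1600:
        return "blue"
    if width_um < 2400:
        return "purple"
    return "red"


# First five gaps of the first f106r line quoted above.
first_gaps = [1355, 2625, 2498, 296, 2328]
print([colour_for(g) for g in first_gaps])
# ['blue', 'red', 'red', 'green', 'purple']
```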

Histograms for the distribution of spaces for a few last-first combinations:

[attachment=8300]
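Grouping the gaps by last-first combination could be sketched as below. It assumes the combination is the last character of the left fragment and the first character of the right fragment (so r.c also covers r followed by ch), and uses an arbitrary 500-micrometer bin width. The names are hypothetical, and the sample triples are the first six gaps of the first f106r line.

```python
from collections import defaultdict


def group_by_pair(triples: list[tuple[str, int, str]]) -> dict[str, list[int]]:
    """Collect gap widths under a 'last.first' key such as 'y.s'."""
    groups: dict[str, list[int]] = defaultdict(list)
    for left, gap, right in triples:
        groups[f"{left[-1]}.{right[0]}"].append(gap)
    return dict(groups)


def histogram(gaps: list[int], bin_um: int = 500) -> dict[int, int]:
    """Count gaps per bin; each key is the lower edge of a bin in micrometers."""
    counts: dict[int, int] = {}
    for gap in gaps:
        start = (gap // bin_um) * bin_um
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


sample = [
    ("pshdar", 1355, "shoefy"), ("shoefy", 2625, "yteedy"),
    ("yteedy", 2498, "sho"), ("sho", 296, "l"),
    ("l", 2328, "korchy"), ("korchy", 2709, "she"),
]
groups = group_by_pair(sample)
print(groups["y.s"])             # [2498, 2709]
print(histogram(groups["y.s"]))  # {2000: 1, 2500: 1}
```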

This analysis is obviously very rough. In particular, spaces are measured horizontally across boxes, which can underestimate the actual distance between strokes. For example, l often has a long leftward descender: I trimmed descenders and ascenders slightly, but they certainly still reduce these measurements considerably.

Moreover, I probably made errors I am not aware of, so everything should be taken with caution.

The histogram including all spaces possibly shows two overlapping distributions: one peaking close to 0 and a much smaller one peaking at ~2 mm.

The comparison between r.a (one of Patrick's examples of ambiguous pairs) and r.c could be particularly interesting. The left stroke of a is very similar to the left stroke of c, so the difference is probably due to something deeper than stroke shape. In the case of r.a, there is a drift away from a normal distance close to 0. For r.c, it is clearer that the space tends to be close to 1.5 mm.

EDIT: the last word of the image above shows one of the problems with my script. Words are broken into unconnected fragments, and parts of the words are assigned to each fragment based on the position of the characters in the word. I made no attempt at OCR here. The parts of cheos were labelled ch-eo-s instead of the correct che-o-s. My impression is that these errors are neither terribly frequent nor very systematic, so I expect that the histogram shapes are significant.


Koen G

MarcoP

ReneZ

nablator

Aga Tentakulus

Anton

Aga Tentakulus

nablator

MarcoP