13-12-2024, 03:58 PM
I've uploaded a version of the VMS images with the paint subdued by a simple linear regression algorithm, using the TIFF images published by the Beinecke Rare Book and Manuscript Library, Yale University. The idea was to boost the ink and get some sense of what the manuscript looked like before it was painted.
Online version: [link]
Image files: [link]
Things to consider (non-technical): the algorithm is very unlikely to introduce shapes that are not present in some form in the original images of the manuscript. If there is a line in the processed images, it was most likely in the original images too, though that doesn't necessarily mean the line was made with ink.
The opposite is not true: the algorithm can (and in many cases does) fail to detect quite obvious ink lines under paint.
I used a single model to process all the image files, so naturally the results are much better for some folios than for others.
You can use/modify/distribute these images in whatever way you like. You should probably credit the Beinecke Rare Book and Manuscript Library, Yale University for the original images. You can credit me as 'oshfdk' (all lowercase) if you wish, but this is not necessary.
More technical details:
The processing is local and based on a diamond-shaped 5x5 px kernel that looks like this:
**2**
*212*
21012
*212*
**2**
0 is the pixel whose class (ink, paint, vellum) the algorithm is trying to identify. The model receives the color information for pixel 0, the average color of all pixels in group 1, and the average color of all pixels in group 2. Only the color information from these 13 pixels is used by the model for each output pixel. Averaging the color values of the pixels in groups 1 and 2 allowed me to give the model some immediate context without providing any spatial or directional information. Prior to averaging, the color information was augmented by combining the RGB and HSV channels and adding second-order polynomial terms (a^2 for each channel and ab for each pair of channels), resulting in 27 values per group and 81 input values in total per output pixel.

The model itself is a simple linear regression; the training data included about 50000 marked pixels from 8 folios: 1r, 1v, 2r, 4v, 7r, 25r, 67r, 83v (I started from 1v and then kept adding folios for which the model couldn't separate the colors well enough).
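To make the feature layout a bit more concrete, here is a rough Python sketch of the construction described above. This is not the actual processing code: the libraries (numpy, scikit-image, scikit-learn), the function names, the 8-bit RGB scaling and the way the ink/paint/vellum labels are fed to the regression are all just assumptions for illustration.

import numpy as np
from skimage import color
from sklearn.linear_model import LinearRegression

# Offsets (dy, dx) for the two rings of the 5x5 diamond kernel shown above.
GROUP1 = [(-1, 0), (1, 0), (0, -1), (0, 1)]                       # the '1' pixels
GROUP2 = [(-2, 0), (2, 0), (0, -2), (0, 2),
          (-1, -1), (-1, 1), (1, -1), (1, 1)]                      # the '2' pixels

def augment(rgb):
    """6 base channels (RGB + HSV) plus squares and pairwise products -> 27 values."""
    rgb = rgb.astype(float) / 255.0                                # assuming 8-bit input
    hsv = color.rgb2hsv(rgb.reshape(1, 1, 3)).reshape(3)
    a = np.concatenate([rgb, hsv])                                 # 6 channels
    squares = a ** 2                                               # 6 terms: a^2
    pairs = [a[i] * a[j] for i in range(6) for j in range(i + 1, 6)]  # 15 terms: ab
    return np.concatenate([a, squares, pairs])                     # 27 values

def pixel_features(img, y, x):
    """81 input values for one output pixel (y, x must be >= 2 px from the border)."""
    f0 = augment(img[y, x])
    f1 = np.mean([augment(img[y + dy, x + dx]) for dy, dx in GROUP1], axis=0)
    f2 = np.mean([augment(img[y + dy, x + dx]) for dy, dx in GROUP2], axis=0)
    return np.concatenate([f0, f1, f2])                            # 27 * 3 = 81

def train(samples):
    # 'samples' would hold (folio_image, y, x, label) tuples for the hand-marked
    # pixels; how the three classes were actually encoded as regression targets
    # is not specified, so numeric labels are used here purely as a placeholder.
    X = np.stack([pixel_features(img, y, x) for img, y, x, _ in samples])
    t = np.array([label for *_, label in samples])
    return LinearRegression().fit(X, t)

Applying the trained model is then just a matter of computing the same 81 values for every pixel of a folio image and thresholding the output to decide which pixels to keep as ink.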