The Voynich Ninja

Full Version: Color Deconvolution / "Depainting"
I haven't used my method for more than a year now, and I only applied it to a handful of pages.
Images are first smoothed, and then the R, G, B values are converted to Hue, Chroma, Lightness. In this I am not trying to do any mapping of hues to compensate for features of the human eye.

It then uses the ranges of these three values to make decisions on parchment, paint and ink.
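
This is not the poster's actual implementation, only a minimal sketch in Python of the general approach as described: smooth the image, convert each pixel from RGB to hue/chroma/lightness, then classify by value ranges. The smoothing radius, the exact conversion formulas, the filename and the thresholds are all assumptions for illustration.

Code:
import numpy as np
from PIL import Image, ImageFilter

def rgb_to_hcl(rgb):
    """Hue, chroma, lightness from an (H, W, 3) float array in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    chroma = mx - mn
    lightness = (mx + mn) / 2.0
    hue = np.zeros_like(mx)                      # hue undefined where chroma is 0
    nz = chroma > 1e-6
    rmax = nz & (mx == r)
    gmax = nz & (mx == g) & ~rmax
    bmax = nz & ~rmax & ~gmax
    hue[rmax] = (60 * (g - b)[rmax] / chroma[rmax]) % 360
    hue[gmax] = 60 * (b - r)[gmax] / chroma[gmax] + 120
    hue[bmax] = 60 * (r - g)[bmax] / chroma[bmax] + 240
    return hue, chroma, lightness

img = Image.open("f67r1.tif").convert("RGB")             # placeholder filename
img = img.filter(ImageFilter.GaussianBlur(radius=1.5))   # smoothing first
hue, chroma, lightness = rgb_to_hcl(np.asarray(img, dtype=np.float64) / 255.0)

# Purely illustrative range tests; real thresholds would need tuning per scan.
ink = lightness < 0.45
paint = ~ink & (chroma > 0.15)
parchment = ~ink & ~paint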

For detection of faint ink and ink on a background of paint there are some additional tricks.

I know that this is still not sufficient: some light brown paint that is easily detected by the human eye cannot yet be separated out. I know what I eventually want to do there, but that is a low-priority thing.
Right now it is the writing I am after.
I created the same two images as posted by @nablator.

Differences (beside the algorithm used) are:
- all parchment is shown in a standard parchment colour, not from the original image
- it did not just detect/remove green, but all colours
- the image was smoothed beforehand (this becomes clearly visible when zooming in)

The images were created as BMP and converted to JPG using Irfanview on Windows.

It clearly mistakes some of the darker parts for 'brown paint', probably because I forgot to tell it not to look at brown. It took me a while to figure out how to do this at all.

Link to image of paint only: [link]
Link to image without paint: [link]

Edit: the ink was drawn with the original colour of the (smoothed) image, so it still has the appearance of colour in some places.
In the following link the ink has been drawn in a 'standard' ink colour.

Link to image without paint: [link]
I've tried removing paint by training on a 3x3-pixel kernel from the visible-light TIFFs: specifically, I take the value of the central pixel (as a 6-channel RGB+HSV vector) and the mean of the other 8 pixels (RGB+HSV as well), and train on the resulting 12-dimensional vector. Here are a few sample images:

[attachment=9536]
[attachment=9537]
[attachment=9538]
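
This is not the actual script, just a sketch in Python of how the 12-dimensional feature described above could be computed: the central pixel's RGB+HSV values plus the mean RGB+HSV of its 8 neighbours. The filename and the scaling to [0, 1] are assumptions.

Code:
import numpy as np
from PIL import Image

def pixel_features(rgb, hsv, y, x):
    """12-dim feature for pixel (y, x): central RGB+HSV plus the mean
    RGB+HSV of the 8 surrounding pixels (all channels scaled to [0, 1])."""
    centre = np.concatenate([rgb[y, x], hsv[y, x]])                     # 6 values
    block = np.concatenate([rgb[y - 1:y + 2, x - 1:x + 2].reshape(9, 3),
                            hsv[y - 1:y + 2, x - 1:x + 2].reshape(9, 3)], axis=1)
    neigh_mean = (block.sum(axis=0) - block[4]) / 8.0                   # drop the centre
    return np.concatenate([centre, neigh_mean])                         # 12 values

img = Image.open("f67r1.tif")                        # placeholder filename
rgb = np.asarray(img.convert("RGB"), dtype=np.float64) / 255.0
hsv = np.asarray(img.convert("HSV"), dtype=np.float64) / 255.0
print(pixel_features(rgb, hsv, 120, 345).shape)      # (12,) for some interior pixel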

I have the whole MS processed this way and I'm more than willing to share it; I'm just not sure how to do this properly. These are basically processed versions of images published by Yale. When I produce an image or two, that looks like clear fair use for research/amusement purposes. However, if I bundle the whole thing together and put it online, will it be sufficient just to say that these are processed images from the Beinecke and give a link, or should I include some license? The problem is that I took the TIFFs from archive.org; they are no longer available from the Yale site, so I'm not even sure about their status. Maybe @LisaFaginDavis can clarify this?
It works more or less fine when detecting text covered by paint.

[attachment=9539]
Aren't all Yale scans of the VM copyright free? People are even selling them printed on clothing.
(09-12-2024, 05:29 PM)oshfdk Wrote: The problem is I took the TIFFs from archive.org, they are no longer available from Yale site, so I'm not even sure about their status.

[link]
Click the button at the bottom-left corner of the viewer.
Select Full size original (tiff)

Quote:Rights
The use of this image may be subject to the copyright law of the United States (Title 17, United States Code) or to site license or other rights management terms and conditions. The person using the image is liable for any infringement.

I guess non-commercial derivative works fall under fair use.

Quote:If you use or reproduce our materials in any format, we ask that the Beinecke Rare Book and Manuscript Library always be cited as the source of the material with the appropriate credit line found at the bottom of this page.
[link]

So you need to credit the Beinecke Rare Book and Manuscript Library, Yale University.
(09-12-2024, 05:29 PM)oshfdk Wrote: I've tried removing paint by training using a 3x3 pixels kernel from the visible light TIFFs, specifically I take the value of the central pixel (as a 6 channel RGBHSV) and the mean of the other 8 pixels (RGBHSV as well) and train on the resulting 12-dim vector.

Did you do the training on labeled data? How exactly?
(10-12-2024, 01:51 AM)ReneZ Wrote: Did you do the training on labeled data? How exactly?

I take an original TIFF into an image editor, mark some areas of interest with a distinctive color (one specific to the desired predicted class) and cover the rest with neutral grey. Then the training script reads the original TIFF and the marked file, extracts the desired values from the marked file, treating each distinct color value as a separate class (and ignoring grey areas), then extracts 3x3 kernels of the same pixels from the original TIFF, shuffles them (this can be skipped for models that don't process data sequentially, e.g. linear regression, but it is important for models that work on batched data) and trains the model.
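
Again, not the actual training script, only a sketch under stated assumptions: the marked file uses pure label colors (red and blue here) on a neutral grey background, the features come from a pixel_features-style helper like the one sketched earlier in the thread, and the classifier is a plain scikit-learn logistic regression. Filenames and label colors are placeholders.

Code:
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

LABELS = {(255, 0, 0): 0, (0, 0, 255): 1}    # marker color -> class id (assumed)

orig = Image.open("f69r.tif")                # placeholder filenames
rgb = np.asarray(orig.convert("RGB"), dtype=np.float64) / 255.0
hsv = np.asarray(orig.convert("HSV"), dtype=np.float64) / 255.0
marks = np.asarray(Image.open("f69r_marked.png").convert("RGB"))

X, y = [], []
h, w = marks.shape[:2]
for yy in range(1, h - 1):                   # skip the 1-pixel image border
    for xx in range(1, w - 1):
        color = tuple(marks[yy, xx])
        if color not in LABELS:              # grey (and anything unmarked) is ignored
            continue
        X.append(pixel_features(rgb, hsv, yy, xx))   # helper from the earlier sketch
        y.append(LABELS[color])

perm = np.random.permutation(len(y))         # shuffle; matters for batch-trained models
model = LogisticRegression(max_iter=1000).fit(np.array(X)[perm], np.array(y)[perm])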

I've labelled 4 images this way, using those that show various colors. On each image I only label (manually, with a pixel brush) a small subset of features, maybe 0.1% or less of all that is present. Normally it takes no more than 10-15 minutes to label an image this way. When adding marks, I primarily target areas that can be clearly identified visually but could be challenging for an algorithm with no spatial awareness (e.g., faint lines with a clearly visible beginning and end, lines running partially under paint, etc.). Normally I split my labels between 50% clearly identifiable features and 50% features that I can only extrapolate from spatial clues, which the model will have to learn the hard way, using local pixel values only. After finishing the training, I applied the resulting model to the whole set.

The following image shows the markup used for f69r: the original on the left, the image after adding color labels in the middle, and the final markup image as seen by the model on the right. For training, the model receives only the leftmost and the rightmost images as input (other than the hyperparameters of the model). For prediction it receives only the original TIFFs, with no annotations.

Note that the model is trained to separate the various paint colors and the ink into different layers, not to remove color; I "converted" it into a paint-removal tool by setting all paint classes to white in a post-processing step. The results could likely be better if I trained it specifically to remove paint, leaving just two training classes: ink and everything else.
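
That post-processing step could look roughly like this (a sketch, assuming class id 0 is ink, every other class is set to white, and an arbitrary "standard" ink color):

Code:
import numpy as np
from PIL import Image

INK_CLASS = 0                          # assumed id of the ink class
INK_COLOR = (60, 40, 20)               # arbitrary "standard" ink color

def depaint(pred):
    """pred: (H, W) array of per-pixel class ids from the trained model."""
    out = np.full(pred.shape + (3,), 255, dtype=np.uint8)   # everything white
    out[pred == INK_CLASS] = INK_COLOR                      # redraw ink only
    return Image.fromarray(out)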

[attachment=9542]
(09-12-2024, 07:44 PM)nablator Wrote:
Quote:If you use or reproduce our materials in any format, we ask that the Beinecke Rare Book and Manuscript Library always be cited as the source of the material with the appropriate credit line found at the bottom of this page.
[link]

So you need to credit the Beinecke Rare Book and Manuscript Library, Yale University.

Thank you, I think this answers it perfectly. It will probably be more convenient to distribute the processed images as separate image files (as opposed to, say, a single PDF, which would be quite large), so I will probably just add something like "processed image based on the originals published by the Beinecke Rare Book and Manuscript Library, Yale University" in small type at the bottom of each image (there is plenty of space there).
Great job, wow. I can't wait to see the whole processed VMS (and my eyesight will greatly benefit from it!).