The Voynich Ninja

Pareidolia, but underwater: What is under the green paint?
Like a responsible parent, I have been trying to pass on to my computer my Superior Pareidolia skills. Specifically, the ability to see inked details that were painted over.

The Painter who applied the semi-opaque tempera colors often painted over inked outlines.   Examples are easily seen where these inked strokes were still dark and clear, like (A,B) below.  
[attachment=11739]
Besides obscuring those strokes, it seems that the painting also washed away some of the ink, and sometimes deposited it a short distance away, as in (D).  

Thus any ink strokes that were already quite faint and faded, like (E), must have become invisible to the naked eye after being painted over.  And that is why we need Artificial Superior Pareidolia.

The idea is as follows; a minimal code sketch follows the list.  
  • Take an image of an area which is suspected of having "invisible" drawings or text under some semi-opaque paint.  
  • Select a set of pixels A representative of what one wants to detect, like places where there is definitely ink covered by green paint. 
  • Select one or more additional sets B, C, ... that are to be distinguished from A -- like places where there is green paint but almost surely no ink underneath.  
  • Look at the colors of those pixels as points in three-dimensional space, within the unit cube where (0,0,0) is black, (1,1,1) is white, (1,0,0) is red, etc.   Here is an example with three subsets of a page, representative of blank vellum (red), dark text ink (green), and green paint over blank vellum (blue):
    [attachment=11744][attachment=11743][attachment=11742]
  • Approximate each cloud of points A, B, C, ... by a trivariate Gaussian probability density function (PDF). This can be visualized as a fuzzy ellipsoid with different widths along its three axes, and some generic orientation in space.
  • Take each pixel of the image and use Bayes's formula to estimate the probability that the pixel belongs to each distribution A, B, C, ...  or is an "outlier" that probably does not belong to any of them.
  • Write one grayscale image for each set, showing the probability of each pixel belonging to that set.
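
Here is a minimal sketch of that pipeline in Python (NumPy/SciPy), with equal priors and a constant density standing in for "OTHER"; the sample arrays are assumed to be hand-picked beforehand, and all names are illustrative:

Code:
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian(samples):
    # samples: (N,3) array of RGB values in [0,1], hand-picked from one set
    return multivariate_normal(samples.mean(axis=0), np.cov(samples, rowvar=False))

def probability_maps(img, classes, outlier_density=1.0):
    # img: (H,W,3) float array in [0,1]; classes: dict of name -> fitted Gaussian
    H, W, _ = img.shape
    pix = img.reshape(-1, 3)
    dens = {name: g.pdf(pix) for name, g in classes.items()}
    total = sum(dens.values()) + outlier_density  # constant density plays "OTHER"
    return {name: (d / total).reshape(H, W) for name, d in dens.items()}

# classes = {"vellum": fit_gaussian(vellum_px), "ink": fit_gaussian(ink_px),
#            "green": fit_gaussian(green_px)}
# from PIL import Image
# for name, m in probability_maps(img, classes).items():
#     Image.fromarray((255 * m).astype(np.uint8)).save("prob_" + name + ".png")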
     
Ideally we should do this with high-resolution uncompressed multispectral images with frontal illumination and linear encoding.  But we don't have multispectral scans for any of the pages that may have significant details hidden under the paint.  (The herbal pages have green paint, but the ink that can be seen under it is just boring nervures or leaf outlines.   At best, those images could be useful to validate this approach.)  And even those that we do have are taken with oblique lighting that creates light and dark spots at every tiny bump on the vellum surface.

So we must make do with the Beinecke 2014 scans, which have frustratingly low resolution (some ink traces being only a couple of pixels across), only the three RGB color coordinates, oblique illumination, non-linear "gamma" encoding, and complex JPEG compression artifacts.  But, sigh, that is life...
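
At least the non-linear encoding can be approximately undone before fitting, assuming the scans follow the standard sRGB transfer curve (an assumption I have not verified for these scans):

Code:
import numpy as np

def srgb_to_linear(c):
    # c: float array in [0,1], sRGB-encoded; returns roughly linear intensities
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)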

[To be continued]

All the best, --stolfi

[Sorry for the big images, but I couldn't figure out how to insert only a thumbnail of the attachment, with the full version opening on a click.  Is that possible?]
This is very similar to what I tried last year; the only difference is that instead of single pixels I used a diamond-shaped kernel of 3x3 or 5x5 pixels, concatenating its pixel vectors in both RGB and Lab, and using linear regression. Unfortunately, the result wasn't particularly remarkable. E.g., the same lady:

[attachment=11747]

The full gallery: [link]
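
Schematically, the feature construction was along these lines (a simplified sketch; the names and the exact neighborhood are illustrative):

Code:
import numpy as np
from skimage.color import rgb2lab

# offsets of a 3x3 "diamond" (center plus its 4-neighbors)
DIAMOND3 = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def features(rgb, lab, y, x, offsets=DIAMOND3):
    # concatenate the RGB and Lab vectors of each neighborhood pixel into one row
    return np.concatenate([np.r_[rgb[y + dy, x + dx], lab[y + dy, x + dx]]
                           for dy, dx in offsets])

# lab = rgb2lab(rgb)  # computed once per image, then fed to features()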
(19-10-2025, 07:16 PM)oshfdk Wrote: This is very similar to what I tried last year, [...] The full gallery: [link]

Thanks!  I think I am getting slightly better results for "invisible" ink.  For instance I get a stronger signal for something between the feet of that nymph, and maybe for something under the east end of the "backrest" she is leaning onto. We will see...
Continuing this thread, here is a [link] that explains the method in more detail.  It includes an example that is just an illustration of the method, not yet the final analysis focused on ink-under-paint.

All the best, --stolfi
I suppose if a model is trained on this folio it's possible to get much more curious results, but I was actually aiming for maximum certainty, so I only trained on samples from 6-7 folios and then applied the result to the whole MS. The main problem I have with automated pareidolia is the same as with the natural one: the texture of the vellum itself provides a lot of subtle lines and curves, and with enough squinting or tweaking of the parameters of models one can see/reveal almost anything.
(20-10-2025, 08:24 AM)oshfdk Wrote: The main problem I have with automated pareidolia is the same as with the natural one: the texture of the vellum itself provides a lot of subtle lines and curves, and with enough squinting or tweaking of the parameters of models one can see/reveal almost anything.

Indeed.  But this approach is honest at least in the sense that the final classification is made independently for each pixel, based only on its color, without trying to look for multi-pixel patterns like lines or characters -- which is where actual pareidolia comes in.  It will be left to the human user to "see" such patterns on the computed probability maps.  The user's choice of sample pixels will influence the classification, but only through their colors, not through their positions or adjacency relations.

Quote:I suppose if a model is trained on this folio it's possible to get much more curious results, but I was actually aiming for maximum certainty, so I only trained on samples from 6-7 folios and then applied the result to the whole MS.

I think that training separately for each page (or even for each clip from each page) is justified because there seems to be some overall variation from folio to folio in the colors of parchment, ink, and paint.  For instance, the green paint of [link] seems to be more bluish than that used in the Bio section.  

And there is also a much larger variety of paints and stains over the whole MS.  On this particular image I don't have to include the red, blue, yellow, and rusty paints as separate provinces; they will be classified as "OTHER" without significantly affecting the classification of the provinces of interest.  And I don't have to worry about ketchup stains, or the gray offset from blue flowers, which are important "noise" features on some other pages.

All the best, --stolfi
(20-10-2025, 11:44 AM)Jorge_Stolfi Wrote: Indeed.  But this approach is honest at least in the sense that the final classification is made independently for each pixel, based only on its color, without trying to look for multi-pixel patterns like lines or characters -- which is where actual pareidolia comes in.  It will be left to the human user to "see" such patterns on the computed probability maps.  The user's choice of sample pixels will influence the classification, but only through their colors, not through their positions or adjacency relations.

Yes, this was my reasoning too for only using local models. Also, it helps if the result can be independently reproduced via a simple (ideally linear, or maybe polynomial) combination of channels. 

For example, I still think that the strongest argument that the signature-like feature at the bottom right of f116v actually exists is that it can be reproduced by a linear combination of MSI channels, and it's possible to verify this with a simple Python script: [link]
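
Schematically, such a check takes only a few lines (the channel stack and the weights here are placeholders, not the actual script):

Code:
import numpy as np

def linear_map(stack, weights):
    # stack: (H, W, C) registered channel images; weights: length-C vector
    out = stack @ weights          # one weighted sum per pixel
    lo, hi = out.min(), out.max()
    return (out - lo) / (hi - lo)  # normalize to [0,1] for display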

The more complex the processing required, the less persuasive the result, because I think for chaotic processes it's possible to keep tweaking the training set or the training parameters to highlight random combinations of pixels and produce the result one desires.
(20-10-2025, 11:57 AM)oshfdk Wrote: it helps if the result can be independently reproduced via a simple (ideally linear, or maybe polynomial) combination of channels.

That will not be the case, because Bayesian classification with Gaussian distributions is inherently non-linear, and usually extremely so.

For instance, imagine that you have only two Gaussian classes (plus "OTHER"), where class A has a very broad spherical distribution centered at middle gray (0.5,0.5,0.5) and class B has a much narrower one centered at slightly darker gray (0.4,0.4,0.4).  Bayesian classification will assign class A to colors inside the A sphere, except within a small region around the darker gray, where it will say B.  A linear classifier will be unable to delimit even the A sphere, much less the B hole inside it.  
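
A quick numeric check of that example (the standard deviations 0.20 and 0.03 are arbitrary choices, and equal priors are assumed):

Code:
import numpy as np
from scipy.stats import multivariate_normal

A = multivariate_normal([0.5] * 3, (0.20 ** 2) * np.eye(3))  # broad sphere
B = multivariate_normal([0.4] * 3, (0.03 ** 2) * np.eye(3))  # narrow sphere

for t in [0.30, 0.38, 0.40, 0.42, 0.50, 0.60]:
    p = [t, t, t]  # grays along the cube diagonal
    print(t, "->", "B" if B.pdf(p) > A.pdf(p) else "A")
# prints "B" only near 0.4: the B region is a small island inside A territory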

That is why linear vector classifiers are usually applied to non-linear functions of the inputs, the (improperly) so-called "kernels" -- which requires the user to come up with suitable kernels.  If one tries to use as kernels all polynomials on the input coordinates up to degree (say) 4, one gets so many kernels that the classification will probably be garbage. 
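
(Even in just three color coordinates the count grows quickly; a one-liner to check the degree-4 case:)

Code:
from math import comb
print(comb(3 + 4, 4))  # 35 monomials of degree <= 4 in 3 variables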

That is also a danger if one uses "magical" non-linear classifiers with zillions of internal parameters, like neural networks...

All the best, --stolfi
(20-10-2025, 01:16 PM)Jorge_Stolfi Wrote: That will not be the case, because Bayesian classification with Gaussian distributions is inherently non-linear, and usually extremely so.

Personally, for me any deterministic result that is robust to minor changes in the source raw pixel values will work. So, if there is a map (R,G,B) -> (Ink, Paint, Vellum) (or whatever it is detecting) which always produces the same result for the same R, G, B, and the result doesn't change much visually if I replace all original RGBs with (R + r1, G + r2, B + r3) where r1, r2, r3 are reasonably small random integers, say in [-2, 2], then I'd say the whole pipeline looks reasonable. There will be a separate question of what exactly it's detecting, but highlighting all ink on the page that is obvious or has faded but certainly was present in the past, while not highlighting a lot of empty space, would be a spectacular result. I couldn't achieve this.
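
That jitter test is easy to script; a sketch, where classify stands for whatever pipeline is being tested:

Code:
import numpy as np

def jitter(img_u8, rng, amp=2):
    # add uniform integer noise in [-amp, amp] to an 8-bit RGB image
    noise = rng.integers(-amp, amp + 1, size=img_u8.shape)
    return np.clip(img_u8.astype(int) + noise, 0, 255).astype(np.uint8)

# maps0 = classify(img_u8)
# maps1 = classify(jitter(img_u8, np.random.default_rng(1)))
# a robust pipeline should give visually near-identical maps0 and maps1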
(20-10-2025, 01:31 PM)oshfdk Wrote: So, if there is a map (R,G,B) -> (Ink, Paint, Vellum) (or whatever it is detecting) which always produces the same result for the same R, G, B, and the result doesn't change much visually if I replace all original RGBs with (R + r1, G + r2, B + r3) where r1, r2, r3 are reasonably small random integers, say in [-2, 2], then I'd say the whole pipeline looks reasonable.

The report I just posted shows the results of that basic analysis.  What do you think of them?  

That test still does not try to detect the ink-under-paint "province", which is my ultimate goal.  I have tried the latter already (with seven distinct color classes plus "OTHER") and the results are encouraging, although I have still not found the best subsets and the best way to combine the results (like ORing the "dark ink under green paint" map with the "faint ink under green paint" one, and things like that).  Please stay tuned...
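
For that combination step, a soft "OR" of two probability maps can be as simple as a pointwise maximum or a noisy-OR; the map names below are placeholders:

Code:
import numpy as np

def soft_or(p, q):
    # noisy-OR of two probability maps: 1 - (1 - p)(1 - q)
    return 1 - (1 - p) * (1 - q)

# combined = np.maximum(p_dark_green, p_faint_green)  # pointwise-max alternative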

All the best, --stolfi