The Voynich Ninja

Full Version: Red Herrings are sometimes useful
I am examining the Voynich Manuscript as part of a major project. I have a number of suspicions and leads on it that have been deferred, because part of the project is to make a tool with a wider scope. In spite of that the system has yielded many insights. One of them is not good. I also realise that if this solution solves it, that is a problem: it ends the adventure, so all efforts so far have deliberately stopped short of getting too close. The one that is not good is a strong candidate for a system of generating text from other texts, but it's only one example and might be spurious even though the probabilities just about check out. The approach here is that even if the text is generative it must be derived to some degree from something, so the tool searches for that source, which covers all the other possibilities as well.

The Voynich Manuscript is perfect for this. It's a bit weird. It's kind of average. The problem isn't that all possibilities are ruled out leaving a mystery. The problem is that you can't easily rule them out. It exists at a strange kind of junction. Sometimes I look at the evidence I have found for it being fake, which I have held back from pursuing so far, and consider the chance of it. It works for lines but not well for single word labels, so that needs to be examined. Prior to that I consider the background. People making fakes like this is not unknown, for multiple reasons. There is a pattern in which a King would pay people, often two on purpose to compete, to go and collect new books for his library. There is a good chance at least one would cheat and make it up.

Around this time is also when people started working on, well, everything, including, yes, DRM. This could be ancient DRM. I'm sorry, I have bad news for you. I recently put this into a Bayesian filter and can confirm that the Voynich Manuscript is in fact ancient spam. The same way you open an email and see it's spam, back then you would go to the library, open a book and see it's spam. What's more, at least 5% of it is a virus, so don't try to convert it into machine code and run it on your CPU. At this time there were people working on ways to prevent people copying their manuscripts. Creating something like this to tie people up is compatible with the many options of the era. When I consider counters to this, such as the images, I then also consider that someone making a forgery might think of this and go even further out of their way to make it look authentic than the author of something genuine would. This is the too authentic to be true test. The book routinely fails the true test derived from UFO studies. There is a barrier that always appears before you can get too close. That is, it always leads you on. It never actually confirms anything. There is a distinct line and drop off that is worrying. You see the same in old pictures of Bigfoot, the Loch Ness Monster, etc. There is this point where the resolution always just plummets, bottoming out rather than following the normal continuous curve, and this book does that.

I have found a huge number of correlations, as anyone else will, and it's easy to do, but where do you move on from there? If you look at the alphabet you see recognisable elements. The approach I'm taking, and the tool I'm working on, is at phase one: detecting correlations both manually and automatically in a holistic approach. The second phase, which has not been properly embarked upon, is dating it. Let's say it has that ribbon symbol which is sometimes also a 4 or a 5 in more recent alphabets. The problem is that doesn't tell you much about it. Loads of texts have that. The system I am working on does many things, and one of them is to detect as many correlations as possible and then roughly order or date them.

This is a kind of loose exclusion principle. The more widespread a trait is, that is, the more it is found elsewhere, the less specific it is. The ideal is to narrow down on traits that are as specific and as close to it in the timeline as possible. Even a very loose preliminary manual application of this methodology has yielded promising results but I shall keep that under my hat. I have enumerated certain characteristics of the text that narrow it into a box, though the permutations are still quite extreme.
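To make the loose exclusion principle concrete, here is a minimal sketch of one way it could be scored. It is not the tool described in this post. It assumes an inverse-frequency weighting (similar in spirit to IDF from text retrieval), and the function name, corpus names and trait names are all made up for illustration.

```python
import math

def specificity_weights(trait_sets):
    """Weight each trait by how rare it is across the comparison corpora.

    trait_sets: dict mapping a corpus name to the set of traits seen in it.
    Returns a dict mapping each trait to log(total corpora / corpora with it).
    """
    total = len(trait_sets)
    counts = {}
    for traits in trait_sets.values():
        for trait in traits:
            counts[trait] = counts.get(trait, 0) + 1
    # A trait present in every corpus gets weight 0: it excludes nothing.
    return {trait: math.log(total / n) for trait, n in counts.items()}

# Hypothetical toy data: the ribbon symbol turns up everywhere,
# the other two traits turn up in one corpus each.
corpora = {
    "corpus_a": {"ribbon_symbol", "rare_trait_x"},
    "corpus_b": {"ribbon_symbol"},
    "corpus_c": {"ribbon_symbol", "rare_trait_y"},
}
weights = specificity_weights(corpora)
for trait, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{trait}: {weight:.3f}")
```

A widespread trait such as the ribbon symbol ends up with no weight, while a trait seen in only one corpus dominates the ranking. Dating the traits would need a separate step that this sketch does not attempt.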

At the moment this system is in testing and is only the first proof of concept version. It has already shown some abilities. This includes detecting, by accident, ancient language patterns such as Indo-European, ancient human transits as far as the Americas, ancient conduits and a strange ancient alternative to GMT with Greece and Egypt along the meridian instead. You pick up on many things like this in it, including natural barriers when plotted on a map. In another case it displayed the ability to detect different character sets within a text. Numbers are quite easy. If the VM uses numbers then either they are used in a specific way and sparingly, or they are like Roman numerals in most cases and use letters.

I think even without using more than test samples for the tool in early versions a picture has emerged. In particular there are matches for Enochian on sight without even putting it into the tool. When putting in preliminary sequences there is a certain kind of match along some dimensions. Prior to the tool just looking at Enochian rang a lot of bells as being similar. I immediately felt like the VM seems like a precursor to it.

I am likely both right and wrong, though I did foresee my error. Enochian or Adamic did not come out of nowhere. When you read things that give you the impression someone just invented it out of the blue, that is incorrect. It's all inspired. When you look at Enochian it's clear that it is some garbled mumbo jumbo based on things like using prior ciphers differently from how they were intended. Even prior to it things weren't separated. Actual pharmacology with real ingredients that worked was not considered different to magic. There was a split after the era in which the VM was likely to have been written. Magic and science, among other things, split. That is quality control, the wheat from the chaff. Alchemy from chemistry, astrology from astronomy. The Voynich Manuscript seems to be from a time and place where these are more fused. Today you go to your hippy friend's basement and there's a Ouija board. You visit your other friend's place, who is a scientist, and there is a microscope. In this era it wasn't always like that. You visit a single friend and both are in the basement. They are both the scientist and the crystal worshipping freak. There is no well maintained separation.

I don't think that the VM is actually Enochian, but in my analysis it has characteristics that suggest the two share a common ancestor that's quite recent. Enochian, if you look at it, takes existing functional mechanisms and makes them creative. It is based on the ciphers of the time but does weird things like inverting them. It's clearly the product in part of people asking funny questions like what happens if you decrypt a text already decrypted, as well as people trying to interpret the result of the cipher at face value and then integrating the concept with other magical notions.

The system I am working on is holistic, but a feature of it is to show you specific things, such as which elements contribute the most to a given correlation. The problem with a lot of statistical systems is that they just give an output and that's it. An aggregate. The individuals removed. The system I am working on is different. It minimises things like the use of libraries and is all hand written, so that it can do things like pull out the pieces of text that cause the correlation, or whatever statistical pattern it is, for manual review.
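As an illustration of what "showing which elements contribute the most" can look like, here is a small hand written sketch with no third party libraries. It is an assumption about one possible approach, not the actual system: it compares two texts by the cosine similarity of their letter bigram counts and returns the per-bigram share of the score alongside the aggregate. The function names are hypothetical.

```python
from collections import Counter
import math

def bigram_counts(text):
    """Count adjacent letter pairs, ignoring case and non-letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(a + b for a, b in zip(letters, letters[1:]))

def cosine_contributions(text_a, text_b):
    """Return (score, ranked) where ranked lists each shared bigram
    with the part of the cosine similarity it is responsible for."""
    a, b = bigram_counts(text_a), bigram_counts(text_b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    if norm == 0:
        return 0.0, []
    parts = {g: a[g] * b[g] / norm for g in a.keys() & b.keys()}
    # Largest contribution first; ties broken alphabetically.
    ranked = sorted(parts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sum(parts.values()), ranked

score, ranked = cosine_contributions("banana", "bandana")
print(f"similarity: {score:.3f}")
for bigram, share in ranked:
    print(f"  {bigram}: {share:.3f}")
```

Because the score is a plain sum of per-bigram terms, the breakdown is exact rather than an after-the-fact explanation, which is the property that makes manual review of the matching pieces possible.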

The point is that I would not entirely dismiss Enochian. It's useful at least to get an idea of what was going on at the time and how creative people could be. Not only that, but if the language was generated or uses a cipher, it likely has correlations through shared ancestry. You can clearly see an ancestral precedent to Enochian in things such as Cistercian symbols. It is useful to include Enochian text and wordlists so that if it matches you can check to find out why. I did this with it matching Pacific Islands, Zulu, Vietnamese and Matan really well, which so far has an explanation for increased rates of coincidence. Even if a red herring, it told me something about the way the language is mutated.

There is something about Fijian that's quite interesting where it seems to have quite a restricted character set which is used heavily against itself into a box. Other characters exist with diacritics but it seems in many texts these are either stripped out or people just don't bother to use them. It's a Latinised foreign language. Although it might be different and for other reasons VM might have some boxing as well. There are signs of two sets of types of symbols but the signal is not yet clear enough. It's not as obvious as numbers normally are but like I said it matches some texts using Roman numerals quite well.
I just wanted to check you are not a Scotsman an inch from death (?). As a Brit, and dear friend of the (some) Scots.
Rest assured, everything else is being handled. 
Thank you for the input. 

(koen... *random points and winks*)
Not even Balneological has that much qokain
Either you are trolling or need to find a more esoteric community to reach the right audience for your ideas.
If you don't get it you should just say so. What I am looking for, and really what anyone should primarily be looking for at this point, is not direct language matches but transformations. I have a tool which has a number of capabilities already, with many more planned. It's more of a suite for analysing and potentially deciphering ancient texts. I already have sample data from a few hundred languages for testing it as it is developed. Naturally, no good hits on direct comparisons.

The text has many characteristics that make it look like it's actually a language. Either it is but something has been done to it, or it is but isn't, by virtue of someone using a means to generate pretend text out of real text. In either case the process is more similar than you might expect. In my case I need to implement a barrage of transformations such as ciphers, obfuscations and so on, apply them to the sample texts and then compare. Enochian however already does this in certain ways I cannot. It's derived from methods going back to that era, so it is useful for scanning for transformations.
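A rough sketch of the "barrage of transformations" idea, under stated assumptions: the transform bank here holds only three toy entries (identity, a Caesar shift and per-word reversal), the comparison is cosine similarity over within-word letter bigrams, and every name is invented for the example. The real candidates and the real comparison would be far broader.

```python
from collections import Counter
import math

def caesar(text, shift):
    """Shift a-z letters by `shift` places, leaving everything else alone."""
    out = []
    for ch in text.lower():
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        else:
            out.append(ch)
    return "".join(out)

def reverse_words(text):
    """Reverse the letters inside each word, keeping word order."""
    return " ".join(word[::-1] for word in text.lower().split())

def bigram_profile(text):
    """Count letter pairs inside words (pairs never span a space)."""
    profile = Counter()
    for word in text.split():
        profile.update(a + b for a, b in zip(word, word[1:]))
    return profile

def cosine(p, q):
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def best_transform(sample, target, transforms):
    """Apply every transform to `sample` and score it against `target`."""
    target_profile = bigram_profile(target.lower())
    scores = {name: cosine(bigram_profile(fn(sample)), target_profile)
              for name, fn in transforms.items()}
    return max(scores, key=scores.get), scores

TRANSFORMS = {
    "identity": lambda t: t.lower(),
    "caesar_3": lambda t: caesar(t, 3),
    "reverse_words": reverse_words,
}

sample = "the quick brown fox jumps over the lazy dog"
target = caesar(sample, 3)  # stand-in for an unknown transformed text
best, scores = best_transform(sample, target, TRANSFORMS)
print(best, scores)
```

The direct comparison scores close to zero here while the right transform scores close to one, which is the reason direct language matches alone say little. The tedious part is that the bank of candidate transforms, and their combinations, grows very quickly.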

In respect to the possible evidence for it being generated from an existing text, it's still one prime suspect out of around half a dozen leads, depending how you count them. I've narrowed the best candidates down to a few specific ciphers, text generation with a single lead for a specific but rough likely method, and a specific method of compression that is also the same as obfuscation. If you minify JavaScript it both compresses and obfuscates. These leads are narrowed down to those it matches best, with the issue that they tend to overlap in terms of the signals and patterns they produce. I have methods in the pipeline to test for these but it is tedious and fiddly.

In respect to it potentially being fabricated, I have a lead that came about by a strange coincidence. A fairly recent poem (19th century) that I translated into as many languages as possible (unfortunately without first realising it had proper nouns in it) happened to contain a standard long phrase used as a heading in botanical and pharmacological texts not far from this era (such as Acta Eruditorum and, I think, the Pantegni of Constantine the African, as well as perhaps earlier works). This is a Latin phrase with two long words (likely to break through fragmentation) with a cumulative length of 21 characters. The line appears to take these words and use them to generate new words, increasing the length by 50% in a highly systematic manner that is hard to ignore, yet the two words are still embedded in the line in a manner that can be isolated despite being scattered. This could be a coincidence but it is a strong lead, though I must confess I have not yet investigated further, as this is a general tool, so things like this I have to get back to. It was an unexpected find when merely testing it.
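One simple way to check a lead like this, sketched here as an assumption rather than as the method actually used: test whether the letters of the source phrase appear in the line in order with gaps allowed, then shuffle the line many times to estimate how often that happens by chance. The strings below are toy placeholders, not the actual phrase or manuscript line, and the function names are hypothetical.

```python
import random

def is_scattered_in(source, line):
    """True if the letters of `source` occur in `line` in order, gaps allowed."""
    remaining = iter(line)
    # `ch in remaining` consumes the iterator up to the first match,
    # so each letter must be found after the previous one.
    return all(ch in remaining for ch in source)

def chance_rate(source, line, trials=2000, seed=0):
    """Fraction of random shuffles of `line` that still embed `source`."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    letters = list(line)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        hits += is_scattered_in(source, letters)
    return hits / trials

# Toy placeholders only: a short word scattered through a longer string.
source = "herba"
line = "hxexrybza"
print(is_scattered_in(source, line))
print(chance_rate(source, line))
```

Spaces should be stripped from both strings first. A low chance rate for a 21 character phrase would support the embedding being deliberate, but a single line is still a single data point, so the same test would need to be run across many lines and many candidate phrases.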

In relation to the languages on the map, what I am saying is that I cannot identify the language with any certainty, but it does strongly show signs of being related to European languages. It matches quite well with Ossetian/Persian sometimes. Adamic is kind of useful. It sort of matches the family tree or ancestral languages. The text does behave like it has been transformed in a way that results in a fragmentation effect. This can be done surprisingly simply.
It's not DRM. It's actually a manual for building a Ford Fiesta.
(28-07-2025, 05:46 AM)i_want_links_damit Wrote: If you don't get it you should just say so.

"So."

Are you looking for any feedback? Do you have anything specific to discuss?
From your texts, I get the impression that you are doing everything at once and nothing in particular.
It is a preliminary report with some tips, insights, intrigues and ideas along the way. I hate to tease but it's better than nothing. I'll get back here in a month or so when I have pursued some leads and further investigated the line of text that appears to be procedurally generated with words from another text. It uses two words that are too long to be scrambled beyond retrieval, which would otherwise be the case for other word pairs or phrases. It's possible it's a remarkable coincidence, or if it is not a sign of generation (I have reasons to doubt it) it might be a crack into something else. But it's only one particularly compelling example, and I need to look for more and test a bunch of other things to eliminate more possibilities, closing the box.
Ok, thank you for clarifying.