The Voynich Ninja

What tools do you use for VMS analysis?
(21-12-2023, 04:38 PM)asteckley Wrote: You are correct that, for this purpose, it (or any SQL database for that matter) is just cumbersome and unnecessary. In data processing terms, the VMS is... well, tiny. The extra overhead of dealing with a persistent database just isn't warranted, since the entire manuscript and all its components can be easily held several times over in RAM on a laptop.
So reading in directly from an IVTT transliteration file and/or from CSV files is more than sufficient.
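For instance, a minimal Python sketch of what reading directly looks like (the file name is just a placeholder, and the locus-tag handling is simplified; real IVTFF files carry more markup than this):

Code:
import re
from collections import Counter

def read_ivtff(path):
    # Minimal sketch: skip '#' comment lines, take the leading <...> tag
    # as the locus, strip any remaining inline <...> markup, and split
    # EVA text on '.' and ',' word separators.
    records = []
    with open(path, encoding="ascii") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            m = re.match(r"<([^>]+)>\s*(.*)", line)
            if not m:
                continue
            locus, text = m.groups()
            text = re.sub(r"<[^>]*>", "", text)  # drop inline markup
            words = [w for w in re.split(r"[.,]", text) if w]
            records.append((locus, words))
    return records

# The whole manuscript fits comfortably in memory:
# counts = Counter(w for _, ws in read_ivtff("transliteration.ivtff") for w in ws)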

I do, however, use MySQL behind applications like The Voynich Garden and also for clipping and managing graphical elements from the Voynich hi-res images.
This was mainly necessary because the graphical editing work is so meticulous and time-consuming that it is worth persisting each action to disk, so that I can backtrack editing mistakes and not lose my work when I accidentally fat-thumb a deletion.
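The idea is simple enough to sketch; here sqlite3 stands in for MySQL, and the table layout is hypothetical rather than my actual schema:

Code:
import json
import sqlite3
import time

con = sqlite3.connect("edits.db")
con.execute("""CREATE TABLE IF NOT EXISTS action_log (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    ts      REAL NOT NULL,
    action  TEXT NOT NULL,   -- e.g. 'clip', 'move', 'delete'
    payload TEXT NOT NULL    -- JSON snapshot of whatever is needed to undo
)""")

def log_action(action, payload):
    # Persist every edit as it happens, so a fat-thumbed deletion
    # can be backtracked from the log.
    con.execute("INSERT INTO action_log (ts, action, payload) VALUES (?, ?, ?)",
                (time.time(), action, json.dumps(payload)))
    con.commit()

def last_action():
    return con.execute(
        "SELECT id, action, payload FROM action_log ORDER BY id DESC LIMIT 1"
    ).fetchone()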

On a separate topic though, and pardon my rant: 
I must disagree with the "terrible reputation" claims about MySQL. I've used it for several large-scale commercial and mission-critical implementations with great success. Any failures concerning security or reliability stem from the hosting and implementation infrastructure, and any alternative database (e.g. Oracle) is equally vulnerable to that. The criticisms of it almost always come from those with a vested interest in its commercial competitors.

I have been discouraged by the complications I encountered, but when I have a bit more time, I will pick it up again. The VMS transliteration data is indeed quite a small dataset, but on the other hand, plain ASCII with in-line encoded data is a dead end. And once several transliterations and character-position information are included, a database is the only option.
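By way of illustration only, here is a hypothetical relational layout, sketched with Python's sqlite3 (the table and column names are my assumptions, not an actual design): one row per glyph occurrence, keyed by transliteration source, makes several transliterations and character positions directly queryable.

Code:
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT);  -- e.g. 'ZL', 'GC'
CREATE TABLE glyph (
    source_id INTEGER REFERENCES source(id),
    folio     TEXT,     -- e.g. 'f1r'
    line_no   INTEGER,  -- line within the folio
    word_no   INTEGER,  -- word index within the line
    char_no   INTEGER,  -- character index within the word (1-based)
    glyph     TEXT
);
""")

# Example: how often each glyph opens a word, per transliteration.
rows = con.execute("""
    SELECT s.name, g.glyph, COUNT(*) AS n
    FROM glyph g JOIN source s ON s.id = g.source_id
    WHERE g.char_no = 1
    GROUP BY s.name, g.glyph
    ORDER BY n DESC
""").fetchall()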

The security issue is clearly off-topic here, and there are other good reasons why I can't go into it. In any case, it is certainly not limited to MySQL...
I'm quite a beginner with the Voynich Manuscript and other ancient ciphers (I work with the Rohonc Codex as well) and haven't achieved much yet with the Voynich, but I'm already using SQL databases (the Microsoft SQL Server variant) combined with my own graphical user interface.

If I may say something, I would strongly advise you to try it if you have the necessary skills. Working with text files and some scripts in Fortran, Perl or another ancient language executed from the command line is so 1990s ;) You will really be able to achieve much more, in a more comfortable way, with the database approach.
My thought behind going for a proper DB is a simple one.
I already have my database in (logically) linked CSV files, and Fortran tools to access them (I am also very 1990s), but that only works for me and nobody else.
An online server would change that...
But I am still quite far away from that.
I use Python, but I guess that's mainly because I'm comfortable with Python and fall back on it for most of the programming I do.  I'm especially interested in the structure of lines and paragraphs, which seems to be relatively untrodden territory, so when I have a statistical question, there usually isn't some pre-existing tool available to answer it, and I have to come up with something from scratch.
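As an example of the from-scratch kind of thing I mean, here is a minimal sketch that counts how often each word type occurs line-initially, line-finally, or elsewhere (the input format is an assumption: one list of words per manuscript line):

Code:
from collections import Counter

def positional_counts(lines):
    # `lines` is a list of word lists, one per manuscript line.
    first, last, middle = Counter(), Counter(), Counter()
    for words in lines:
        if not words:
            continue
        first[words[0]] += 1
        last[words[-1]] += 1
        middle.update(words[1:-1])
    return first, last, middle

# Words over-represented at line start, as a share of all line-initial tokens:
# first, last, middle = positional_counts(lines)
# total = sum(first.values())
# for w, n in first.most_common(10):
#     print(w, n / total)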
I use the C++ programming language and have written a console program to do analysis on the manuscript. The program reads transliteration files in IVTFF format and stores the information inside B-tree data structures. The data stored include words, word prefixes, word suffixes, word positions within lines, word pairs, characters, character pairs, and character positions within words. I have my own C++ classes for storing data in B-tree structures. I create a set of such data structures for each folio, quire, language, hand and illustration type, and also for the last and first words of a line, the first words of lines in a paragraph, single-word labels, and multi-word label sentences. My program then searches this data to get statistics on various word frequencies and on word-pair and character-pair affinities.

The program outputs RTF files with tables of statistics that can also show manuscript words and sentences in a Voynich font, and CSV files containing matrices of affinities.
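The character-pair part is easy to sketch in a few lines; this Python version (an illustration, not my C++ code) counts raw adjacent-pair frequencies within words and writes them out as a CSV matrix:

Code:
import csv
from collections import Counter

def bigram_counts(words):
    # Count adjacent character pairs within words.
    pairs = Counter()
    for w in words:
        pairs.update(zip(w, w[1:]))
    return pairs

def write_matrix(pairs, path):
    # One row and one column per character; cell (a, b) holds the
    # number of times 'b' immediately follows 'a' within a word.
    alphabet = sorted({c for pair in pairs for c in pair})
    with open(path, "w", newline="") as fh:
        wr = csv.writer(fh)
        wr.writerow([""] + alphabet)  # header row
        for a in alphabet:
            wr.writerow([a] + [pairs[(a, b)] for b in alphabet])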

The only SQL database I use is SQLite, which I run in in-memory mode and can interface with directly from within my C++ program. I use SQLite to create a representation of the whole manuscript as a PNG image file, which I occasionally do to locate where a particular word falls within the manuscript.
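The same idea sketched in Python (using Pillow for the image; the table layout and word numbering here are assumptions for the sake of the example):

Code:
import sqlite3
from PIL import Image  # assumes the Pillow package is installed

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE word (line_no INTEGER, word_no INTEGER, word TEXT)")
# ... populate from a transliteration, with line_no 1-based
# and word_no 0-based within the line ...

def word_map(target, width=100):
    # One pixel per word position; lit pixels mark occurrences of the target.
    hits = con.execute("SELECT line_no, word_no FROM word WHERE word = ?",
                       (target,)).fetchall()
    n_lines = con.execute("SELECT MAX(line_no) FROM word").fetchone()[0] or 1
    img = Image.new("L", (width, n_lines), 0)
    for line_no, word_no in hits:
        if word_no < width:
            img.putpixel((word_no, line_no - 1), 255)
    return img

# word_map("daiin").save("daiin_map.png")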

Whenever I feel the need to pursue an additional line of investigation, I can quickly make the coding changes and recompile my program.
Hello,

If anyone is interested, I use Photoshop and try to mimic the effect of scrubbing the page with water to remove the top layer gently.

Maybe that is why deciphering the Voynich is difficult, and maybe that is why some pages are missing...

I believe, and have noticed, that almost every cell on each page, whether inked or blank, contains hidden information.

This book is meant to be looked at in a 3D fashion. What appears to be behind this top layer includes, among other things, golden objects such as handles. I can show you some examples, and you can actually see through the items I am mentioning...

Anyway

Let me know your thoughts : )

Thank you,
Emmanuela
(30-01-2024, 09:25 PM)Moonchild Wrote: I can show you some examples and you can actually see through the items I am mentioning..

Sounds a lot like a restoration technique. I think it would be interesting to see some of your examples but perhaps in a separate thread. Could you start one?

The use of Photoshop has been discussed frequently on this site with varying purposes, methods and outcomes, but so far I haven't found a previous thread discussing this effect or use in particular. There have been several discussions about the VM underdrawings, though. And elsewhere, digital restoration of manuscripts, including uncovering underdrawings, using both Photoshop and other methods (including AI), has been a developing area of great interest, probably for as long as such programs have existed. DIAMM is a notable project group ([link]; in-depth [link] about DIAMM).

Here is a small selection of a few other choice examples (none of them specific to the VM, though):

BMC Blog Network 2019 article, Unveiling the invisible – mathematical methods for restoring and interpreting illuminated manuscripts (original [link] on SpringerOpen; also covered in an [link]; 3D modeling is discussed):
[link]

Adobe Photoshop and Eighteenth-Century Manuscripts: A New Approach to Digital Paleography by Hilary Havens, University of Tennessee, Digital Humanities Quarterly, Volume 8 Number 4, 2014: [link]

Digital Restoration by Denoising and Binarization of Historical Manuscripts Images by Dimitrios E. Ventzas, Nikolaos Ntogas and Maria-Malamo Ventza, March 2012, in book: Advanced Image Acquisition, Processing Techniques and Applications I (instead of discussing specific software, this paper focuses on the methods and techniques used):
[link]
Thank you for all the links, very interesting. I will show some examples and start a new thread in the coming days, and I look forward to some feedback :)

Emmanuela
I do most of my analyses with tools I've written in the Awk language. I've attached the following files to make my frequency analysis tool available to Unix-savvy folks (especially those new to looking at the text):

* ekg2Awk.txt: this is the Awk code file -- I'm releasing it under the GPL v3.0 (or later) license. This is a "Swiss Army knife" tool that handles computing many of the common properties people look at when analyzing the text (Sukhotin's vowel detection algorithm, word type and token length distributions, fitting Zipf's law to word frequencies, (unconditional) entropies, type/token ratio; see the sketch after this list for what the Sukhotin step does).

* COPYING.txt -- the GPL v3.0 license text

* HerbA.txt and BioB.txt -- the Herbal A and Biological B portions of the D'Imperio transcription using Currier's transcription alphabet. These are provided because they're easy to preprocess in order to simplify exploring use of the tool. You can use the transcription of your choice as long as you figure out how to strip off line identification prefixes, handle alternate possible glyph readings, and convert non-ASCII transcription characters into something ASCII (probably the transcription alphabet's "weirdo" character). I have sed scripts that handle that for the EVA-based transcriptions and the v101 transcription, but I didn't want to have to get into documenting and releasing those scripts as part of releasing the basic tool. You can also feed appropriately preprocessed natural language text samples into the program for comparison of their properties to those of the Voynich mss text.

* OutputExamples.txt -- some additional examples of using the program beyond those in the documentation comments at the start of the program file. I included this file because I suspect that the program output would get mangled (line wrapped, etc.) when put in the body of a post. (Also, it doesn't look like any of the available fonts for posts are fixed-width.) Look at this file using a fixed-width font (for example, Consolas Regular in Notepad under Windows).
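For anyone new to Sukhotin's algorithm, here is a minimal Python sketch of the classic procedure (an illustration only -- the Awk file has its own implementation): build a symmetric adjacency-count matrix over characters within words, then repeatedly promote the character with the largest positive row sum to "vowel" and subtract twice its adjacencies from the remaining row sums.

Code:
from collections import Counter

def sukhotin(words):
    # Adjacency counts; same-character pairs are skipped, which is
    # equivalent to zeroing the matrix diagonal in the classic version.
    adj = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            if a != b:
                adj[(a, b)] += 1
                adj[(b, a)] += 1
    chars = {c for w in words for c in w}
    rowsum = {c: sum(adj[(c, d)] for d in chars) for c in chars}
    vowels = []
    while True:
        c = max((x for x in chars if x not in vowels),
                key=lambda x: rowsum[x], default=None)
        if c is None or rowsum[c] <= 0:
            break
        vowels.append(c)
        for d in chars:
            if d not in vowels:
                rowsum[d] -= 2 * adj[(c, d)]
    return vowels  # characters classified as vowels, most confident first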

It is important to run this in a terminal window that's using a fixed-width font.

I hope folks will find this of value.

Karl