The Voynich Ninja

Full Version: Transliteration-related information
In this thread I will collect information related to updates of transliteration files and the corresponding tools.

Everyone is also welcome to ask questions, raise comments and report issues.

To start with, following are some recent updates already posted here.

As I posted previously [link], the GC and ZL transliteration files have been updated to versions 1b and 2b respectively, to correct the page variable settings. There have been no changes to the actual transliterations.

Then, in the following message, I indicated that the same has been done for the other three files.
FG (output of the First Study Group) was improved to version 1d,
CD (the good old Currier transliteration) was improved to version 1a
and IT (the popular transliteration of Takeshi Takahashi) was improved to version 1a.

In the meantime, FG has been updated again to version 1e.
All changes were small, and did not affect the text.

As I am now working in this area, I have found several additional issues, or rather inconsistencies. As I sort these out, I will add the information here.

Probably more importantly, I recently reported an update to ivtt ([link]).
While that fixed a bug, it introduced a new one, which has now also been fixed, so ivtt version 1.3 is now available [link].
While there are basic Eva and extended Eva, the Eva used in Takeshi Takahashi's transliteration is neither.
It looks like basic Eva, but it includes a number of character combinations that belong to extended Eva yet are not written according to its rules.

For people who care about details, and being able to double-check results, here are the specifics.
The following list shows character sequences that exist in a number of transliteration files, followed by the correct syntax according to extended Eva:

Quote:
cfhh    {cfhh}
ckhh    {ckhh}
ckhhh   {ckhhh}
co      {co}
cy      {cy}
cphh    {cphh}
cthh    {cthh}
cthhh   {cthhh}
ih      {ih}
oh      {oh}
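
For those who want to double-check the list above programmatically, here is a minimal Python sketch. It is not the Teva.bit table, only an illustration of the same substitution. It assumes the input is bare transliterated text, without locus identifiers or comments, and the names EVAT_SEQUENCES and to_extended_eva are made up for this example. Longer sequences are tried first so that cthhh is not read as cthh followed by h, and anything already in braces is left alone.

```python
import re

# Sequences taken from the list above (hypothetical constant name).
EVAT_SEQUENCES = ["cfhh", "ckhh", "ckhhh", "co", "cy",
                  "cphh", "cthh", "cthhh", "ih", "oh"]

# Longest sequences first, so "cthhh" wins over "cthh".
_alternatives = "|".join(sorted(EVAT_SEQUENCES, key=len, reverse=True))

# First alternative matches an existing {...} group so it can be skipped.
_PATTERN = re.compile(r"\{[^}]*\}|(" + _alternatives + ")")


def to_extended_eva(text: str) -> str:
    """Wrap the listed sequences in braces; leave braced groups untouched."""
    def wrap(match):
        if match.group(1) is None:
            # Already written as {...}: return it unchanged.
            return match.group(0)
        return "{" + match.group(1) + "}"
    return _PATTERN.sub(wrap, text)
```

Because braced groups are skipped, running the function twice on the same text gives the same result as running it once.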

To remove uncertainties, this variant of Eva is now identified in transliteration files at my site as "EvaT" instead of "Eva-". These codes are found in the header records of IVTFF formatted files [link].
Corresponding bitrans tables are:
Teva.bit        (to convert Eva-T to extended Eva, and back)
IS16-Teva_def.bit   (to convert STA to Eva-T, and back)

This form of Eva is also found on the much-used web site [link].

Note that the text used at this web site is now also available as an IVTFF formatted file, through the link already provided above. This file is called VT_ivtff_0c.txt and is at beta level.

Finally, this form of Eva is also used in the interlinear file.
How many total characters and/or words in the text are affected?
This concerns around 100 characters out of a total of approximately 155,000, i.e. roughly 0.06% of the text.
Those who attended my talk at the recent Voynich conference, and those who read the corresponding paper ([link]), will be aware that I set up a new transliteration alphabet which I have called STA.
This is an abbreviation of Super Transliteration Alphabet, which does not mean that it is super-duper, but that it is a super-set of all existing alphabets. It combines all details of extended Eva and v101, and thereby also the older alphabets.

Since this alphabet has of the order of 300 different symbols (including all possible ligatures), Voynich characters are encoded by a pair of alphanumeric characters. Files using this alphabet are not suitable for human interpretation. Strings like 'B1A3P1' or 'L1A1E1G2A2' just don't mean anything.
However, it has various other uses, at least for me, and I will come back to it.
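
To illustrate the two-characters-per-symbol encoding, here is a small Python sketch that cuts an STA string into its symbol codes. It assumes that every symbol is exactly one pair and that the string contains no separators or markup. The function name split_sta is made up for this example, and it makes no attempt to say what any code stands for.

```python
from typing import List


def split_sta(text: str) -> List[str]:
    """Split an STA string into two-character symbol codes."""
    if len(text) % 2 != 0:
        # An odd length cannot be a whole number of pairs.
        raise ValueError("STA text must have an even number of characters")
    return [text[i:i + 2] for i in range(0, len(text), 2)]
```

Applied to the strings quoted above, 'B1A3P1' gives three codes and 'L1A1E1G2A2' gives five.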

Those who want to know more can also read [link].
(13-02-2023, 10:56 AM)nablator Wrote: TT_ivtff_v0a on your website.

Thanks!
Good to know that these older versions are still being used. The "TT" may be a bit misleading as it belongs to the "IT" series (so to speak), in the current terminology.
I wanted to use "TT" for the transliteration included in Takeshi's web pages, but haven't found the time to do a conversion of that.
From the earliest days of creating transliterations, the symbol "-" has been used to indicate an interruption of the text by a drawing element, usually part of a plant.
This symbol has made it into the IVTFF format as: <->

However, its use has been quite inconsistent. Often, but not always, it is preceded or followed by a period in order to indicate a word space. As I am doing more and more automated processing, these relatively minor inconsistencies have become a bit annoying, so I decided to consolidate its use and, more generally, the use of word spaces.
This has led to a consolidation of the IVTFF format version 1.7, which I now call 2.0, but which is fundamentally the same.

The main rules are:
<-> implies a word space.
<~>, which marks bad vertical alignment and is hardly used, implies <-> and therefore also a word space.

The four symbols for word spaces (comma, period, <-> and <~>) shall always stand alone.
They shall only occur between words, never at the start or end of a line.
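
The rules above can be checked mechanically. The following Python sketch is one possible way to do it, not part of ivtt or of the format definition. It assumes the text of a single line with the locus identifier, comments and any other <...> tags already stripped, and the names check_word_spaces and split_words are made up for this example.

```python
import re

# The four word-space symbols: <->, <~>, period and comma.
SEPARATOR = re.compile(r"<->|<~>|[.,]")


def check_word_spaces(text):
    """Return a list of rule violations found in one line of text."""
    problems = []
    previous_end = None
    for match in SEPARATOR.finditer(text):
        if match.start() == 0:
            problems.append("separator at start of line")
        if match.end() == len(text):
            problems.append("separator at end of line")
        if previous_end == match.start():
            # Two separators side by side: neither stands alone.
            problems.append("adjacent separators at position %d" % match.start())
        previous_end = match.end()
    return problems


def split_words(text):
    """Split a line into words on any of the four separators."""
    return [word for word in SEPARATOR.split(text) if word]
```

An empty list from check_word_spaces means that none of the three conditions was violated. A construct such as a period directly followed by <-> is reported, since <-> already implies the word space.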

The full details are available here: [link]
(18-02-2023, 01:27 PM)nablator Wrote: Did you remove the alternates ([:])? I made a few mistakes in the counts because there are some complicated cases that I failed to remove.


The command:


Code:
ivtt -u1 input-file >output-file

will effectively remove all of them, by replacing the construct with the first option.
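
For anyone who wants to cross-check counts without ivtt, the same idea can be sketched in a few lines of Python. This is not how ivtt is implemented. It only covers simple, non-nested alternates of the form [a:b] or [a:b:c], so the complicated cases mentioned in the question may need more care. The function name keep_first_option is made up for this example.

```python
import re

# A bracketed group with at least one colon and no nested brackets.
_ALTERNATE = re.compile(r"\[([^:\[\]]*):[^\[\]]*\]")


def keep_first_option(text: str) -> str:
    """Replace each [a:b...] alternate reading by its first option."""
    return _ALTERNATE.sub(r"\1", text)
```

For example, "qo[k:t]eedy" becomes "qokeedy". Brackets without a colon are left as they are.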
(02-02-2023, 02:00 AM)ReneZ Wrote: Those who attended my talk at the recent Voynich conference, and those who read the corresponding paper will be aware that I set up a new transliteration alphabet which I have called STA.

[...]

However, it has various other uses, at least for me, and I will come back to it.

Coming back to it now...

I have recently consolidated the definition of this alphabet, and it is now at the first released version (STA-1).
This is different in some areas from the version I had at the time of the conference.

The new version has been fully integrated at my web site, mainly [link].
(14-02-2023, 03:36 AM)ReneZ Wrote: This has led to a consolidation of the IVTFF format version 1.7, which I now call 2.0, but is fundamentally the same.

[...]

The full details are available here: [link]

The new format version 2.0 has now been made the standard at my web site.
For users who need access to older material, earlier versions of some items have now been collected on a legacy items page: [link]