The Voynich Ninja
Text parsing and BITRANS - Printable Version



Text parsing and BITRANS - ReneZ - 07-12-2019

There's quite a bit of confusion going round.
Nick used the word 'parsing' correctly in his post [link unavailable], although there is a point to be made about transcription vs. transliteration. However, that's not my purpose here.
One can transliterate the Voynich MS text using any system, be it Eva, Currier, v101 etc. The result is a text file.
It is this text file that we want to parse as part of statistical analysis.
So:
- step 1 is to create a text file using some definition;
- step 2 is parsing this text file with the aim to figure out which are the real 'units' of the Voynich MS text.
For this second step I used to rely on BITRANS, a tool written in the 1990s by Jacques Guy. It was only available as a DOS command-line tool, which still worked in early versions of Windows. It no longer seems to run on current Windows versions, and I never saw a Unix / Linux version.
Dennis Stallings' page [link unavailable] points to a download.
This tool was perfectly suited to the parsing task. It was used extensively in the definition of the Eva alphabet, and in the creation of the interlinear file by Gabriel Landini and Jorge Stolfi.
Just to illustrate, the example given by Nick in the above-mentioned post:
Quote:Task #2: Parsing the raw transcription to determine the fundamental units (its tokens) e.g. [qo][k][ee][dy]
is easily done by defining substitution rules. BITRANS then allows these rules to be applied in both directions.
It also allows context-dependent rules to be defined, for example rules that apply only at the start or end of words.
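To make the token example concrete, here is a minimal Python sketch that splits a transliterated word into units by greedy longest-match. It is not BITRANS and not necessarily how BITRANS works internally. The function names `tokenize` and `bracket` and the unit inventory `UNITS` are assumptions, chosen only to reproduce the [qo][k][ee][dy] example.

```python
# Hypothetical unit inventory, just enough for the example word.
UNITS = ["qo", "ee", "dy", "k"]


def tokenize(word, units):
    """Split word into units, always taking the longest unit that fits.

    Characters that match no unit are emitted as single-character tokens.
    """
    ordered = sorted(units, key=len, reverse=True)  # longest units first
    tokens = []
    i = 0
    while i < len(word):
        for unit in ordered:
            if word.startswith(unit, i):
                tokens.append(unit)
                i += len(unit)
                break
        else:
            # No unit matched at position i: keep the raw character.
            tokens.append(word[i])
            i += 1
    return tokens


def bracket(tokens):
    """Render tokens in the [..][..] notation used in the quote."""
    return "".join("[" + t + "]" for t in tokens)


print(bracket(tokenize("qokeedy", UNITS)))
```

A different inventory gives a different split of the same word, which is exactly why the choice of units has to be tested statistically rather than assumed.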
Now this tool seems lost, but I have been making good progress with a revival implementation. It does not allow multi-pass parsing, but it does support most other features that I used to find important.
The context-dependent rules are not needed when converting between different transliteration alphabets, but they are likely to play a role when parsing the results for interpretation of the text.
Just to give a simple example, the following definition file changes:
- 'con' at word starts into '9'
- 'us' at word ends into '9'.
#con #9
us# 9#

It changes this text:
consensus contract proconsul tempus couscous
into this:
9sens9 9tract proconsul temp9 cousco9
By specifying the 'backwards' option, the same command with the same definition file changes the second text back into the first.
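The behaviour described above can be imitated in a few lines of Python. This is only a sketch under assumptions, not the BITRANS code and not the revival implementation. It assumes that each rule line holds two whitespace-separated fields, that '#' marks a word boundary, that words are separated by single spaces, and that rules are applied in file order (reverse order when going backwards). The names `parse_rules` and `apply_rules` are made up for illustration.

```python
def parse_rules(definition):
    """Read 'source target' pairs, one per non-empty line."""
    rules = []
    for line in definition.splitlines():
        fields = line.split()
        if len(fields) == 2:
            rules.append((fields[0], fields[1]))
    return rules


def apply_rules(text, rules, backwards=False):
    """Apply the rules to every word; '#' in a rule matches a word edge."""
    if backwards:
        # Swap source and target, and undo the rules in reverse order.
        rules = [(dst, src) for src, dst in reversed(rules)]
    result = []
    for word in text.split(" "):
        marked = "#" + word + "#"  # make the word edges visible to the rules
        for src, dst in rules:
            marked = marked.replace(src, dst)
        result.append(marked[1:-1])  # drop the two edge markers again
    return " ".join(result)


DEFINITION = """
#con #9
us# 9#
"""

rules = parse_rules(DEFINITION)
plain = "consensus contract proconsul tempus couscous"
coded = apply_rules(plain, rules)
print(coded)
print(apply_rules(coded, rules, backwards=True))
```

Marking the word edges with '#' before substituting keeps the sketch short, at the cost of misbehaving on input that already contains a literal '#'.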


RE: Text parsing and BITRANS - -JKP- - 07-12-2019

(07-12-2019, 12:57 PM)ReneZ Wrote: ...
It is this text file that we want to parse as part of statistical analysis.
...

Yes, exactly.

As is probably apparent by now from the other thread, I do not like the word "parse" being used for mapping the Voynich glyphs to plaintext. One can parse complex glyphs into separate units, but that is not a standard use of the term. "Mapping" (as suggested by Anton) is better for describing the correspondence and breakdown of the VMS glyphs into their plaintext representatives, because it is less likely to be confused with discussions about parsing the transcript (the Voynich text).


RE: Text parsing and BITRANS - -JKP- - 07-12-2019

For search and replace functions, I use a tool that combines an intuitive graphical, text-box-based search with grep capabilities. What is nice is that it will also search and replace on text formatting (bold, italic) and colors. However, it's not a stand-alone application. Part of the reason I can't release my transliteration tools is that they are all integrated... my own fonts, my own search-and-replace, my own transcripts. It took about four years to get it working really well, but the parts are integrated into a whole. They are not individual utilities, so you can't just peel it apart. It is very flexible, however. I can change almost anything to anything.


RE: Text parsing and BITRANS - Aga Tentakulus - 07-12-2019

Apart from the correct interpretation of a glyph, there is also the correct assignment.
Let's take -9 based on the word Taurus, -um / -us. So (-9 -um) stands for singular.
"um" is also the abbreviation for "unum".
If 9- is at the beginning, "um" stands for one, one.
If 9 stands alone, it has a similar meaning. But for an electronic translator to understand it, I have to write it out. "unum". So the "9" corresponds to 4 normal letters.




RE: Text parsing and BITRANS - -JKP- - 07-12-2019

(07-12-2019, 02:26 PM)Aga Tentakulus Wrote: ...
If 9- is at the beginning, "um" stands for one, one.
...

It's just a small point, but most of the time (in Latin scripts) when 9- is at the beginning, it stands for "con-" or "com-".


RE: Text parsing and BITRANS - Aga Tentakulus - 07-12-2019

JKP
Good question.
Does he use real abbreviations?
I suspect that he uses German spelling on Latin soil.

Is it what it looks like?

   


RE: Text parsing and BITRANS - -JKP- - 07-12-2019

I don't want to get too far off René's topic, Aga, but if the chart you posted is about Latin scribal conventions, a few of those are correct, but quite a few of them are not. Is this something you did? Or by someone else? It isn't quite correct. Here are some of the inconsistencies with Latin scribal conventions:

  • For example, the r shape in the top-right corner is not used for -ur or -tur. -ur/-tur is written like the number 2 and it is usually superscripted. If you want to see examples, I have many of them.
  • The long-cee (c) in the top-left corner of the chart is not used for con-/com-. I have never seen it used that way. For con-/com- they almost always use 9 or a reverse-cee shape.
  • The s is not usually used for con or cum; it is not even usually a "c". Most of the time it is an "e" with a tail (for words like eius). Sometimes it is a "c" with a tail, but then usually for ce or cer or similar syllables or words, not for con or cum.
  • The g shape is not usually used for eius. Most of the time they used s for eius. The g shape in Latin languages is usually the suffix -cis, just as the m is usually -ris (or sometimes -tis).
  • The k is usually "Item". In French, it is a ligature for "Il".


BUT... I'm not quite sure what the chart represents:

  1. Is it a chart of Latin conventions (if so, several of them are wrong and should probably be corrected), OR
  2. Is it a chart of possible interpretations of Voynich glyphs (in other words, not really Latin, but some guesses about what these shapes might mean in Voynichese)??
If the chart represents a theory about what the shapes might mean in Voynichese, then it doesn't have to be perfect Latin, it just has to have good logic for the choices.


RE: Text parsing and BITRANS - Aga Tentakulus - 07-12-2019

No, I wasn't.

NSA Report 1978


RE: Text parsing and BITRANS - Stephen Carlson - 10-12-2019

(07-12-2019, 02:26 PM)Aga Tentakulus Wrote: [Image: attachment.php?aid=3763]

I do not recommend using Google Translate for Latin, because the result is often wrong, as is the case here (but then, garbage in, garbage out).

totis means something like "to/for whole things" (assuming dative rather than ablative) and unum means "one." Thus, totis totis unum should mean something like "for whole things, for whole things, one." It's hard to imagine a context that makes this phrase make sense.


RE: Text parsing and BITRANS - DONJCH - 10-12-2019

(10-12-2019, 09:03 AM)Stephen Carlson Wrote: It's hard to imagine a context that makes this phrase make sense.

Oh I don't know, as part of a recipe, maybe.
Depending on what came before.