07-12-2019, 12:57 PM
There's quite a bit of confusion going round.
Nick used the word 'parsing' correctly in his post, although there is a point to be made about transcription vs. transliteration. However, that's not my purpose here.
One can transliterate the Voynich MS text using any system, be it Eva, Currier, v101 etc. The result is a text file.
It is this text file that we want to parse as part of statistical analysis.
So:
- step 1 is to create a text file using some definition;
- step 2 is parsing this text file with the aim of figuring out what the real 'units' of the Voynich MS text are.
For this second step I used to rely on BITRANS, a tool written in the 1990s by Jacques Guy. It was only available as a DOS command-line tool, and still worked in early versions of Windows. It no longer seems to run on current Windows versions, and I never saw a Unix / Linux version.
Dennis Stallings has a page pointing to a download.
This tool was perfectly suited to the parsing task. It was used extensively in the definition of the Eva alphabet, and in the creation of the interlinear file by Gabriel Landini and Jorge Stolfi.
Just to illustrate, the example given by Nick in the above-mentioned post:
Quote: Task #2: Parsing the raw transcription to determine the fundamental units (its tokens) e.g. [qo][k][ee][dy]
is easily done by defining substitution rules. BITRANS then allows these rules to be used back and forth.
It also allows context-dependent rules to be defined, for example at the start or end of words.
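To make the quoted example concrete, here is a minimal Python sketch of splitting a transliterated word into units. It is not how BITRANS works internally. The unit list and the function name parse_units are my own assumptions, chosen only to reproduce [qo][k][ee][dy]. It simply takes the longest listed unit at each position and falls back to a single character.

```python
def parse_units(word, units):
    """Split a word into units, preferring the longest listed unit at each position."""
    ordered = sorted(units, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for unit in ordered:
            if word.startswith(unit, i):
                tokens.append(unit)
                i += len(unit)
                break
        else:
            # No listed unit starts here: keep the single character as its own unit.
            tokens.append(word[i])
            i += 1
    return tokens


# Hypothetical unit list, chosen only to reproduce the quoted example.
UNITS = ["qo", "ee", "dy"]

print("".join(f"[{t}]" for t in parse_units("qokeedy", UNITS)))
# prints [qo][k][ee][dy]
```

Which units belong in the list is exactly the open question of step 2, so the list is the part one would vary in an analysis.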
Now this tool seems lost, but I have been making good progress with a revival implementation. It does not allow multi-pass parsing, but it does support most other features that I used to find important.
The context-dependent rules are not needed when converting between different transliteration alphabets, but they are likely to play a role when parsing the results for interpretation of the text.
Just to give a simple example, the following definition file changes:
- 'con' at word starts into '9'
- 'us' at word ends into '9'.
#con #9
us# 9#
It changes this text:
consensus contract proconsul tempus couscous
into this:
9sens9 9tract proconsul temp9 cousco9
By specifying the 'backwards' option, the same command with the same definition file changes the second text back into the first.
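For anyone who wants to experiment with this kind of rule, the behaviour in the example can be imitated in a few lines of Python. This is only a sketch under my own assumptions, not the BITRANS code or the revival implementation: the names load_rules, apply_to_word and convert are made up, words are taken to be separated by whitespace, '#' is assumed never to occur in the text itself, and rules are applied one after the other in file order.

```python
def load_rules(text):
    """Read 'source target' pairs, one per line; '#' marks a word boundary."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            rules.append((parts[0], parts[1]))
    return rules


def apply_to_word(word, rules):
    # Wrap the word in '#' so that boundary rules become plain substring matches.
    marked = "#" + word + "#"
    for src, dst in rules:
        marked = marked.replace(src, dst)
    return marked.strip("#")


def convert(text, rules, backwards=False):
    """Apply the rules to every whitespace-separated word; swap each pair if backwards."""
    if backwards:
        rules = [(dst, src) for src, dst in rules]
    return " ".join(apply_to_word(w, rules) for w in text.split())


RULES = load_rules("#con #9\nus# 9#")

forward = convert("consensus contract proconsul tempus couscous", RULES)
print(forward)
# prints 9sens9 9tract proconsul temp9 cousco9
print(convert(forward, RULES, backwards=True))
# prints consensus contract proconsul tempus couscous
```

Because the rules run in sequence, a later rule can act on the output of an earlier one, and a reverse pass is only guaranteed to restore the original when the targets are unambiguous. A word consisting of just '9' would be one such ambiguous case.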