The Voynich Ninja

As suggested by this post:

(23-10-2020, 04:29 PM)RobGea Wrote:
(23-10-2020, 12:45 PM)Koen G Wrote: Does anyone happen to have separate text files for each folio?

I can make them from Takahashi (they would be created from the lowercase version).
For Takahashi with capitals, most individual pages can be found on his site.

For some other transcriptions you can (if I understand the manual) extract pages using IVTT (a thread with some command-line examples of IVTT would be nice).

Here's an IVTT recipe to create one file per page. This works with any file in IVTFF format, in particular the five main transliteration files by Friedman (FSG), Currier, Takahashi, GC and Zandbergen-Landini.

It is based on the 'csh' scripting language, and can be varied in many different ways. The output files only have the plain transliteration without any annotations.

Code:
foreach qq ( A B C D E F G H I J K L M N O Q S T )
  # Extract one quire, keeping all annotations
  ivtt +Q${qq} ZL.txt temp.txt >&/dev/null
  foreach pp ( A B C D E F G H I J K L M N O P Q R S T U V W X )
    # Extract one page of that quire, removing all annotations
    ivtt -x8 +P${pp} temp.txt ${qq}${pp}.txt >&/dev/null
  end
  \rm temp.txt
end

The first ivtt command splits the file into quires, preserving all annotations. The second one splits each quire into pages, removing all annotations.
The result is a series of files: AA.txt, AB.txt, AC.txt, etc.
If a page does not exist, the file will be created but will be empty. It could be removed with another line in the script.

It may take some time getting used to these two-character codes, but they have some advantages.
The shell syntax ??.txt matches all pages in their correct order.
The first character indicates the quire: A=1, T=20
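
As a small illustration of the naming scheme, the quire number can be recovered from a file name by taking the position of its first letter in the alphabet. This is only a sketch in plain sh; the function name quire_number is made up for this example and is not part of ivtt.

```shell
# Hypothetical helper: print the quire number encoded in a two-letter
# page file name such as AA.txt (A=1 ... T=20).
quire_number() (
  name=${1##*/}                      # drop any leading directory
  letter=${name%"${name#?}"}         # keep only the first character
  code=$(printf '%d' "'$letter")     # character code of that letter
  n=$((code - 64))                   # 'A' has code 65, so A maps to 1
  if [ "$n" -lt 1 ] || [ "$n" -gt 20 ]; then
    echo "unexpected quire letter: $letter" >&2
    exit 1
  fi
  echo "$n"
)
```

For example, quire_number QC.txt prints 17. Note that the letters P and R do not occur in the quire list of the recipe above, so no files starting with those letters are produced.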

One can add further ivtt arguments to select only one Currier language, one illustration type or only text in paragraphs (for example).

The problem of 'high Ascii'.


Both the GC and ZL transliterations include characters whose representations are in the 'high Ascii' area (Ascii code between 128 and 255).
From a present-day point of view, this is an old-fashioned way of doing things, but there is no Unicode area for Voynich characters yet, and at least it results in having a single byte per character.

Since these high-Ascii codes cannot be visualised in a standard way, and can cause issues with tools expecting Unicode, there is an IVTFF convention to represent them as @nnn; for Ascii code nnn. That makes the file standard Ascii, but more difficult to process by quickly-patched-together tools.

Both in Eva and in v101, all these characters are rare. They don't appear in the Takahashi transliteration, and they are very rare in the ZL transliteration. In v101 they are more numerous, but still well under 1% of the text (quick guess).

ivtt has an option to easily swap between the 1-byte high-Ascii representation and the extended @nnn; representation:

-a1 goes from single-byte high-Ascii to @nnn;
-a2 goes from @nnn; to single-byte high-Ascii
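
To make the @nnn; convention concrete, here is a minimal sketch of both directions in plain sh. It is not how ivtt implements -a1 and -a2; it only illustrates the mapping described above. The function names at_to_bytes and bytes_to_at are invented for this example, and the sketch assumes that only codes 128 to 255 are to be expanded.

```shell
# Hypothetical helper: expand each "@nnn;" (128..255) in one line of
# text to the single byte with that value. Other text is copied as is.
at_to_bytes() (
  rest=$1
  while :; do
    case $rest in
      *@[0-9][0-9][0-9]\;*) ;;       # at least one code left
      *) break ;;
    esac
    prefix=${rest%%@[0-9][0-9][0-9];*}   # text before the first code
    tail=${rest#"$prefix"}               # starts with "@nnn;"
    code=${tail#@}
    code=${code%%;*}                     # the three digits
    rest=${tail#@???;}                   # text after the code
    printf '%s' "$prefix"
    if [ "$code" -ge 128 ] && [ "$code" -le 255 ]; then
      printf '%b' "\\0$(printf '%03o' "$code")"   # emit the raw byte
    else
      printf '%s' "@$code;"                       # leave it untouched
    fi
  done
  printf '%s\n' "$rest"
)

# Hypothetical helper: read bytes on stdin and write every byte
# >= 128 as "@nnn;", copying all other bytes unchanged.
bytes_to_at() (
  for n in $(od -An -v -tu1); do
    if [ "$n" -ge 128 ]; then
      printf '@%03d;' "$n"
    else
      printf '%b' "\\0$(printf '%03o' "$n")"
    fi
  done
)
```

A round trip such as at_to_bytes 'ab@200;cd' | bytes_to_at should give back the original line. For real files, the ivtt options above are the better choice.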

(28-10-2020, 11:34 AM)ReneZ Wrote: It is based on 'csh' scripting language .....

Is there also a script for the bash shell?
With the csh script I get syntax errors.

I am very far from being a shell guru.
All of the scripts I use are very simple. Some are /bin/sh and some are /bin/csh

I guess that bash should be similar to 'sh'.
In the Bourne shell, the equivalent of 'foreach' is 'for' with quite a different syntax.
If my example generates errors in /bin/csh, please let me know and I can correct it.
However, it is there to show a simple logic that can be translated to other syntaxes.

(28-10-2020, 03:06 PM)ReneZ Wrote: In the Bourne shell, the equivalent of 'foreach' is 'for'

Yes, the syntax is "for i in ...; do ...; done". I will see if I can rewrite that.
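
For reference, the nested loop structure of the csh recipe looks roughly like this in Bourne-style syntax. The sketch only prints the file names that the recipe would try to create and does not call ivtt; the function name list_page_codes is made up, and the two letter lists are copied from the csh example.

```shell
# Hypothetical helper: print every quire/page file name that the
# recipe loops over, one per line, in loop order.
list_page_codes() {
  for qq in A B C D E F G H I J K L M N O Q S T; do
    for pp in A B C D E F G H I J K L M N O P Q R S T U V W X; do
      echo "${qq}${pp}.txt"
    done
  done
}
```

The csh 'foreach var ( list ) ... end' becomes 'for var in list; do ... done'. With 18 quire letters and 24 page letters this yields 432 candidate names, most of which correspond to pages that do not exist.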

In the example, all messages from the tool are suppressed by routing both stdout and stderr to /dev/null (that is what '>&' does).

One might want to see something happening when executing a script like this, which will take some time to complete, so one "echo" command in the outer loop might be nice.

The csh-script works with slight modification:
csh -f extractor.csh

Code:
#!/bin/csh

foreach qq ( A B C D E F G H I J K L M N O Q S T )
  # Extract one quire, keeping all annotations
  ./ivtt +Q${qq} ZL.txt temp.txt >&/dev/null
  foreach pp ( A B C D E F G H I J K L M N O P Q R S T U V W X )
    # Extract one page of that quire, removing all annotations
    ./ivtt -x8 +P${pp} temp.txt ${qq}${pp}.txt >&/dev/null
  end
  \rm temp.txt
end

The only difference I see is the use of ./ivtt, which means that ivtt is not on the PATH.
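
A script can be made to work in both situations by checking the PATH first and falling back to the current directory. The following is just one possible sketch; the function name resolve_tool is invented for this example.

```shell
# Hypothetical helper: print how a tool should be invoked, preferring
# a copy on the PATH and falling back to one in the current directory.
resolve_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1"
  elif [ -x "./$1" ]; then
    echo "./$1"
  else
    echo "$1: not found on PATH or in the current directory" >&2
    return 1
  fi
}
```

In a script one could then write IVTT=$(resolve_tool ivtt) || exit 1 and call "$IVTT" instead of a hard-coded ivtt or ./ivtt.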

Yes, that's right. But this little detail can already outsmart you.

This script works with bash:

GNU bash, Version 5.0.17(1)-release (x86_64-pc-linux-gnu). Version 4.X should also work.

./extractor.sh

Code:
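#!/bin/bash

for qq in A B C D E F G H I J K L M N O Q S T
do
  # Extract one quire, keeping all annotations
  ./ivtt +Q${qq} ZL.txt temp.txt >&/dev/null
  for pp in A B C D E F G H I J K L M N O P Q R S T U V W X
  do
    # Progress message, overwritten in place on the same terminal line
    echo -en "\015\033[Kwrite ${qq}${pp}.txt"
    # Extract one page of that quire, removing all annotations
    ./ivtt -x8 +P${pp} temp.txt ${qq}${pp}.txt >&/dev/null
  done
  rm temp.txt
done

echo
echo done

# Delete the empty files that were created for non-existent pages
find . -name '*.txt' -type f -empty -delete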

(28-10-2020, 11:34 AM)ReneZ Wrote: If a page does not exist, the file will be created but will be empty. It could be removed with another line in the script.

I have added a line to delete the empty text files. There remain 227 files.
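
Note that the find line removes every empty .txt file below the current directory, not only the page files. If that is too broad, a narrower variant that only looks at two-character names in one directory could look like the sketch below. The function name remove_empty_pages is made up for this example.

```shell
# Hypothetical helper: delete empty two-character page files (??.txt)
# in the given directory (default: current directory), nothing else.
remove_empty_pages() (
  dir=${1:-.}
  for f in "$dir"/??.txt; do
    # -f skips the unexpanded pattern when nothing matches,
    # ! -s is true for files of size zero
    if [ -f "$f" ] && [ ! -s "$f" ]; then
      rm -- "$f"
    fi
  done
)
```

Called without arguments it works on the current directory, so it could replace the find line at the end of the script.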

(28-10-2020, 03:34 PM)ReneZ Wrote: One might want to see something happening when executing a script like this, which will take some time to complete, so one "echo" command in the outer loop might be nice.

Echo is inserted (pretty fast).

Posts in this thread, in order: ReneZ, ReneZ, bi3mw, ReneZ, bi3mw, ReneZ, bi3mw, ReneZ, bi3mw, bi3mw