The Voynich Ninja
IVTT recipes - Printable Version



IVTT recipes - ReneZ - 28-10-2020

As suggested by this post:

(23-10-2020, 04:29 PM)RobGea Wrote:
(23-10-2020, 12:45 PM)Koen G Wrote: Does anyone happen to have separate text files for each folio?

I can make them from Takahashi (they would be created from the lowercase version).
For Takahashi with capitals, most individual pages can be found on his site.

For some other transcriptions you can (if I understand the manual) extract pages using IVTT (a thread with some cmdline examples of IVTT would be nice).


Here's an IVTT recipe to create one file per page. This works with any file in IVTFF format, in particular the five main transliteration files by Friedman (FSG), Currier, Takahashi, GC and Zandbergen-Landini.

It is based on the 'csh' scripting language, and can be varied in many different ways. The output files contain only the plain transliteration, without any annotations.

Code:
foreach qq ( A B C D E F G H I J K L M N O Q S T )
  ivtt +Q${qq} ZL.txt temp.txt >&/dev/null
  foreach pp ( A B C D E F G H I J K L M N O P Q R S T U V W X )
     ivtt -x8 +P${pp} temp.txt ${qq}${pp}.txt >&/dev/null
  end
  \rm temp.txt
end

The first ivtt command splits the file into quires, preserving all annotations. The second one splits each quire into pages, removing all annotations.
The result is a series of files: AA.txt , AB.txt , AC.txt etc.
If a page does not exist, the file will be created but will be empty. It could be removed with another line in the script.

It may take some time getting used to these two-character codes, but they have some advantages.
The shell syntax  ??.txt  matches all pages in their correct order.
The first character indicates the quire: A=1, T=20
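
The letter codes can be turned back into numbers with a few lines of shell. This is only a sketch: the function names are made up for illustration and are not part of ivtt, and it assumes that the second letter counts pages within the quire in the same way (A = first page), which the loops above suggest but which should be checked against the IVTT manual.

```bash
#!/bin/bash
# Hypothetical helpers (not part of ivtt).

# Convert a single capital letter A..Z to a number 1..26.
letter_to_number() {
  local code
  # printf with a leading quote yields the numeric code of the character.
  code=$(printf '%d' "'$1")
  echo $(( code - 64 ))   # 'A' is 65 in Ascii
}

# Describe a two-character file stem such as "AA" or "TB".
# Assumption: the second letter is the page position within the quire.
describe_page_code() {
  local stem=$1
  local quire page
  quire=$(letter_to_number "${stem:0:1}")
  page=$(letter_to_number "${stem:1:1}")
  echo "quire ${quire}, page ${page} within the quire"
}
```

For example, `describe_page_code TB` prints "quire 20, page 2 within the quire". Because the codes sort alphabetically, something like `cat ??.txt > all.txt` would put the pages back together in order.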

One can add further ivtt arguments to select only one Currier language, one illustration type or only text in paragraphs (for example).


RE: IVTT recipes - ReneZ - 28-10-2020

The problem of 'high Ascii'.

For all transliterations, see here: (link not preserved)

Both the GC and ZL transliterations include characters whose representations are in the 'high Ascii' area (Ascii code between 128 and 255).
From a present-day point of view, this is an old-fashioned way of doing things, but there is no Unicode area for Voynich characters yet, and at least it results in having a single byte per character.

Since these high-Ascii codes cannot be visualised in a standard way, and can cause issues with tools expecting Unicode, there is an IVTFF convention to represent them as @nnn; for Ascii code nnn. That makes the file standard Ascii, but more difficult to process by quickly-patched-together tools.

Both in Eva and in v101, all these characters are rare. They don't appear in the Takahashi transliteration, and they are very rare in the ZL transliteration. In v101 they are more numerous, but still well under 1% of the text (quick guess).

ivtt has an option to easily swap between the 1-byte high-Ascii representation and the extended @nnn; representation:

-a1 goes from single-byte high-Ascii to @nnn;
-a2 goes from @nnn; to single-byte high-Ascii
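
Before converting, it can help to check which representation a file actually uses. The following is a minimal sketch using only standard tools (tr, grep, wc); the function names are hypothetical, and it assumes the extended codes always have exactly three digits, as in the @nnn; description above.

```bash
#!/bin/bash
# Hypothetical helpers (not part of ivtt).

# Count bytes with a value of 128 or more (the 'high Ascii' range).
count_high_ascii() {
  # Delete every 7-bit byte; whatever is left is high Ascii.
  LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d '[:space:]'
}

# Count codes written in the extended form, e.g. @152;
# Assumption: always '@', three digits, ';'.
count_extended_codes() {
  LC_ALL=C grep -o '@[0-9][0-9][0-9];' "$1" | wc -l | tr -d '[:space:]'
}
```

If `count_high_ascii ZL.txt` reports a non-zero number, -a1 is the direction that makes the file plain Ascii; if only `count_extended_codes` does, -a2 is the one that would bring back single bytes.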


RE: IVTT recipes - bi3mw - 28-10-2020

(28-10-2020, 11:34 AM)ReneZ Wrote: It is based on 'csh' scripting language .....
Is there also a script for the bash shell?
With the csh script I get syntax errors.


RE: IVTT recipes - ReneZ - 28-10-2020

I am very far from being a shell guru.
All of the scripts I use are very simple. Some are /bin/sh and some are /bin/csh

I guess that bash should be similar to 'sh'.
In the Bourne shell, the equivalent of 'foreach' is 'for' with quite a different syntax.
If my example generates errors in /bin/csh, please let me know and I can correct it.
However, it is there to show a simple logic that can be translated to other syntaxes.


RE: IVTT recipes - bi3mw - 28-10-2020

(28-10-2020, 03:06 PM)ReneZ Wrote: In the Bourne shell, the equivalent of 'foreach' is 'for'
Yes, the syntax is "for i in ...; do ...; done". I will see if I can rewrite that.


RE: IVTT recipes - ReneZ - 28-10-2020

In the example, all messages from the tool are suppressed by routing its output to /dev/null (in csh, >& redirects both stdout and stderr).

One might want to see something happening when executing a script like this, which will take some time to complete, so one "echo" command in the outer loop might be nice.


RE: IVTT recipes - bi3mw - 28-10-2020

The csh-script works with slight modification:
csh -f extractor.csh

Code:
#!/bin/csh
foreach qq ( A B C D E F G H I J K L M N O Q S T )
  ./ivtt +Q${qq} ZL.txt temp.txt >&/dev/null
  foreach pp ( A B C D E F G H I J K L M N O P Q R S T U V W X )
    ./ivtt -x8 +P${pp} temp.txt ${qq}${pp}.txt >&/dev/null
  end
  \rm temp.txt
end



RE: IVTT recipes - ReneZ - 28-10-2020

The only difference I see is the use of ./ivtt , which means that ivtt is not part of the PATH.
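
One way to make a script independent of this detail is to look for ivtt on the PATH first and fall back to the current directory. A sketch for bash, with find_ivtt as a made-up helper name:

```bash
#!/bin/bash
# Hypothetical helper: print the command to use for ivtt, or fail.
find_ivtt() {
  if command -v ivtt >/dev/null 2>&1; then
    echo ivtt            # found on the PATH
  elif [ -x ./ivtt ]; then
    echo ./ivtt          # executable in the current directory
  else
    echo "ivtt not found" >&2
    return 1
  fi
}
```

A script could then start with `IVTT=$(find_ivtt) || exit 1` and call "$IVTT" wherever the examples here call ivtt or ./ivtt.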


RE: IVTT recipes - bi3mw - 28-10-2020

Yes, that's right. But this little detail can already outsmart you.


RE: IVTT recipes - bi3mw - 28-10-2020

This script works with bash:

GNU bash, Version 5.0.17(1)-release (x86_64-pc-linux-gnu). Version 4.X should also work.

./extractor.sh

Code:
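#!/bin/bash
# Outer loop: one pass per quire letter.
for qq in A B C D E F G H I J K L M N O Q S T
do
  # Extract one quire, keeping all annotations.
  ./ivtt +Q${qq} ZL.txt temp.txt >&/dev/null
  for pp in A B C D E F G H I J K L M N O P Q R S T U V W X
  do
    # Progress display: overwrite the same terminal line each time.
    echo -en "\015\033[Kwrite ${qq}${pp}.txt"
    # Extract one page of the quire, removing annotations.
    ./ivtt -x8 +P${pp} temp.txt ${qq}${pp}.txt >&/dev/null
  done
  rm temp.txt
done
echo
echo done
# Delete the empty files created for pages that do not exist.
find . -name '*.txt' -type f -empty -delete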

(28-10-2020, 11:34 AM)ReneZ Wrote: If a page does not exist, the file will be created but will be empty. It could be removed with another line in the script.
I have added a line to delete the empty text files. There remain 227 files.
(28-10-2020, 03:34 PM)ReneZ Wrote: One might want to see something happening when executing a script like this, which will take some time to complete, so one "echo" command in the outer loop might be nice.
An echo is inserted (the script is pretty fast).