28-10-2020, 11:34 AM
As suggested by this post:
(23-10-2020, 04:29 PM)RobGea Wrote:
(23-10-2020, 12:45 PM)Koen G Wrote: Does anyone happen to have separate text files for each folio?
I can make them from Takahashi ( they would be created from the lowercase version )
For Takahashi with capitals most individual pages can be found on his site
For some other transcriptions you can ( if i understand the manual ) extract pages using IVTT ( a thread with some cmdline examples of IVTT would be nice )
Here's an IVTT recipe to create one file per page. This works with any file in IVTFF format, in particular the five main transliteration files by Friedman (FSG), Currier, Takahashi, GC and Zandbergen-Landini.
It is based on the 'csh' scripting language, and can be varied in many different ways. The output files contain only the plain transliteration, without any annotations.
Code:
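# Outer loop: one pass per quire letter.
foreach qq ( A B C D E F G H I J K L M N O Q S T )
    # Extract quire ${qq} into a temporary file, preserving all annotations.
    ivtt +Q${qq} ZL.txt temp.txt >&/dev/null
    # Inner loop: one pass per page letter within the quire.
    foreach pp ( A B C D E F G H I J K L M N O P Q R S T U V W X )
        # Write page ${pp} to ${qq}${pp}.txt (e.g. AA.txt), removing all annotations.
        ivtt -x8 +P${pp} temp.txt ${qq}${pp}.txt >&/dev/null
    end
    # Remove the temporary quire file; the backslash bypasses any alias for rm.
    \rm temp.txt
end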
The first ivtt command splits the file into quires, preserving all annotations. The second one splits each quire into pages, removing all annotations.
The result is a series of files: AA.txt, AB.txt, AC.txt, etc.
If a page does not exist, its file will still be created but will be empty. Such empty files could be removed with one more line in the script.
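For anyone who prefers to clean up after the fact, here is a minimal sketch of that removal step. It is written in plain Bourne shell (sh) rather than csh, and remove_empty_pages is just a hypothetical helper name, not something that comes with ivtt. It assumes the page files sit in one directory and follow the ??.txt naming.

```shell
#!/bin/sh
# Delete zero-length page files (??.txt) from a directory.
# remove_empty_pages is a hypothetical helper name, not part of ivtt.
remove_empty_pages() {
    dir=${1:-.}                      # default to the current directory
    for f in "$dir"/??.txt; do
        [ -e "$f" ] || continue      # pattern matched nothing: skip
        # -s is true only for files larger than zero bytes.
        [ -s "$f" ] || rm -f "$f"
    done
}
```

Run it as "remove_empty_pages ." once the loop has finished. Within the csh loop itself, a line such as "if ( -z ${qq}${pp}.txt ) \rm ${qq}${pp}.txt" after the second ivtt call should have the same effect, but I have not tested that variant here.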
It may take some time to get used to these two-character codes, but they have some advantages:
The shell pattern ??.txt matches all pages in their correct order.
The first character indicates the quire: A=1, T=20.
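To illustrate both points, here is a small sketch in plain Bourne shell (sh, not csh). It walks the ??.txt files in glob order and prints each file name with its quire number and line count. The helper names quire_number and list_pages are hypothetical, and the letter-to-number mapping assumes the quire letters are simply consecutive (A=1, B=2, ... T=20), which fits the two values given above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ivtt) for working with the ??.txt page files.

# Map a quire letter to its number, assuming consecutive letters: A=1 ... T=20.
quire_number() {
    # A leading quote makes printf return the character's numeric code (A is 65).
    code=$(printf '%d' "'$1")
    echo $((code - 64))
}

# List every page file in glob order with its quire number and line count.
list_pages() {
    dir=${1:-.}                                  # default to the current directory
    for f in "$dir"/??.txt; do
        [ -e "$f" ] || continue                  # pattern matched nothing: skip
        name=${f##*/}                            # strip the directory part
        letter=$(printf '%s' "$name" | cut -c1)  # first character = quire letter
        lines=$(wc -l < "$f" | tr -d ' ')        # drop padding some wc versions add
        printf '%s quire=%s lines=%s\n' "$name" "$(quire_number "$letter")" "$lines"
    done
}
```

Calling "list_pages ." in the output directory gives one line per page, e.g. "AA.txt quire=1 lines=..." for the first page of the first quire.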
One can add further ivtt arguments to select only one Currier language, one illustration type or only text in paragraphs (for example).