Text parsing and BITRANS - Printable Version
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Text parsing and BITRANS (/thread-3013.html)
Text parsing and BITRANS - ReneZ - 07-12-2019

There's quite a bit of confusion going round. Nick used the word 'parsing' correctly in his post (link not shown), although there is a point to be made about transcription vs. transliteration. However, that's not my purpose here.

One can transliterate the Voynich MS text using any system, be it Eva, Currier, v101 etc. The result is a text file. It is this text file that we want to parse as part of statistical analysis. So:
- step 1 is to create a text file using some definition;
- step 2 is parsing this text file with the aim of figuring out which are the real 'units' of the Voynich MS text.

For this second step I used to rely on BITRANS, a tool made in the 1990s by Jacques Guy. It was only available as a DOS command-line tool, which still worked in early versions of Windows. It no longer seems to run, due to changes in Windows, and I never saw a Unix / Linux version. Dennis Stallings' page (link not shown) points to a download.

This tool was perfectly suited to the parsing task. It was used extensively in the definition of the Eva alphabet, and in the creation of the interlinear file by Gabriel Landini and Jorge Stolfi.

Just to illustrate, the example given by Nick in the above-mentioned post:

Quote: Task #2: Parsing the raw transcription to determine the fundamental units (its tokens) e.g. [qo][k][ee][dy]

is easily done by defining substitution rules. BITRANS then allows these rules to be applied in both directions. It also allows context-dependent rules to be defined, for example at the start or end of words.

Now this tool seems lost, but I have been making good progress with a revival implementation. It does not allow multi-pass parsing, but it does support most of the other features that I used to find important.
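The [qo][k][ee][dy] example quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BITRANS or the revival implementation: the unit inventory `UNITS` is hypothetical, and greedy longest-match is just one plausible way to choose between overlapping units.

```python
# Hypothetical unit inventory, chosen only to reproduce the quoted example.
UNITS = ["qo", "ee", "dy", "k"]


def tokenize(word, units=UNITS):
    """Split a word into units by greedy longest-match (an assumed strategy).

    Characters not covered by any unit become single-character tokens.
    """
    # Try longer units first so that 'ee' wins over a hypothetical 'e'.
    ordered = sorted((u for u in units if u), key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for unit in ordered:
            if word.startswith(unit, i):
                tokens.append(unit)
                i += len(unit)
                break
        else:
            # No unit matched at this position: keep the raw character.
            tokens.append(word[i])
            i += 1
    return tokens


def bracket(word, units=UNITS):
    """Render the tokenized word in the [..][..] notation used in the quote."""
    return "".join(f"[{t}]" for t in tokenize(word, units))


def unbracket(text):
    """Reverse direction: drop the brackets to recover the raw transliteration."""
    return text.replace("[", "").replace("]", "")


if __name__ == "__main__":
    print(bracket("qokeedy"))
    print(unbracket(bracket("qokeedy")))
```

Changing `UNITS` is the whole point of the exercise: different candidate inventories give different tokenizations of the same transliteration file, which can then be compared statistically.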
The context-dependent rules are not needed when converting between different transliteration alphabets, but they are likely to play a role when parsing the results for interpretation of the text. Just to give a simple example, the following definition file changes:
- 'con' at word starts into '9'
- 'us' at word ends into '9'.

#con #9
us# 9#

It changes this text:

consensus contract proconsul tempus couscous

into this:

9sens9 9tract proconsul temp9 cousco9

By specifying the 'backwards' option, the same command with the same definition file changes the second back into the first.

RE: Text parsing and BITRANS - -JKP- - 07-12-2019

(07-12-2019, 12:57 PM) ReneZ Wrote: ...

Yes, exactly. As is probably apparent by now from the other thread, I do not like the word "parse" being used for mapping the Voynich glyphs to plaintext. One can parse complex glyphs into separate units, but that is not a standard use of the term. "Mapping" (as suggested by Anton) is better for describing the correspondence and breakdown of the VMS glyphs into their plaintext representatives, because it is less likely to be confused with discussions about parsing the transcript (the Voynich text).

RE: Text parsing and BITRANS - -JKP- - 07-12-2019

For search and replace functions, I use a tool that combines an intuitive graphical text-box-based search with grep capabilities. What is nice is that it will also search and replace on text formatting (bold, italic) and colors. However, it's not a stand-alone application.

Part of the reason I can't release my transliteration tools is that they are all integrated: my own fonts, my own search-and-replace, my own transcripts. It took about four years to get it working really well, but the parts are integrated into a whole; they are not individual utilities, so you can't just peel it apart. It is very flexible, however. I can change almost anything to anything.
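Returning to the definition file in ReneZ's post above ('#con #9' and 'us# 9#'), the word-start and word-end rules and the 'backwards' option can be sketched as follows. This is a minimal sketch, not the BITRANS algorithm: it assumes that '#' marks a word boundary, that words are separated by whitespace, and that rules are applied one after another over the whole text. The function names are made up for illustration.

```python
import re

# The two rules from the example definition file, one rule per line.
DEFINITIONS = "#con #9\nus# 9#"


def parse_rules(text):
    """Read 'left right' pairs, one rule per line; other lines are skipped."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            rules.append((parts[0], parts[1]))
    return rules


def _to_regex(pattern):
    """Build a regex for a pattern; a leading/trailing '#' is read as a
    word boundary (assumed meaning), with words delimited by whitespace."""
    at_start = pattern.startswith("#")
    at_end = pattern.endswith("#")
    core = re.escape(pattern.strip("#"))
    prefix = r"(?<!\S)" if at_start else ""
    suffix = r"(?!\S)" if at_end else ""
    return re.compile(prefix + core + suffix)


def apply_rules(text, rules, backwards=False):
    """Apply each rule in turn; with backwards=True the two sides are swapped."""
    for left, right in rules:
        source, target = (right, left) if backwards else (left, right)
        replacement = target.strip("#")
        # A function is used as replacement so that it is inserted literally.
        text = _to_regex(source).sub(lambda m: replacement, text)
    return text


if __name__ == "__main__":
    rules = parse_rules(DEFINITIONS)
    plain = "consensus contract proconsul tempus couscous"
    coded = apply_rules(plain, rules)
    print(coded)
    print(apply_rules(coded, rules, backwards=True))
```

The round trip only works because the two uses of '9' are told apart by position. A word consisting of '9' alone would match both backward rules, so a real tool needs some policy for such cases.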
RE: Text parsing and BITRANS - Aga Tentakulus - 07-12-2019

Apart from the correct interpretation of a glyph, there is also the correct assignment. Let's take -9, based on the word Taurus, -um / -us. So (-9 -um) stands for the singular. "um" is also the abbreviation for "unum". If 9- is at the beginning, "um" stands for "a" or "one". If 9 stands alone, it has a similar meaning. But for an electronic translator to understand it, I have to write it out: "unum". So the "9" corresponds to 4 normal letters.

RE: Text parsing and BITRANS - -JKP- - 07-12-2019

(07-12-2019, 02:26 PM) Aga Tentakulus Wrote: ...

It's just a small point, but most of the time (in Latin scripts) when 9- is at the beginning, it stands for "con-" or "com-".

RE: Text parsing and BITRANS - Aga Tentakulus - 07-12-2019

JKP, good question. Does he use real abbreviations? I suspect that he uses German spelling on Latin soil. Is it what it looks like?

RE: Text parsing and BITRANS - -JKP- - 07-12-2019

I don't want to get too far off René's topic, Aga, but if the chart you posted is about Latin scribal conventions, a few of those are correct, but quite a few of them are not. Is this something you did? Or is it by someone else? It isn't quite correct. Here are some of the inconsistencies with Latin scribal conventions:
BUT... I'm not quite sure what the chart represents:
RE: Text parsing and BITRANS - Aga Tentakulus - 07-12-2019

No, it wasn't me. NSA Report 1978

RE: Text parsing and BITRANS - Stephen Carlson - 10-12-2019

(07-12-2019, 02:26 PM) Aga Tentakulus Wrote: ...

I do not recommend using Google Translate for Latin, because the result is often wrong, as is the case here (but then, garbage in, garbage out). totis means something like "to/for whole things" (assuming dative rather than ablative) and unum means "one." Thus, totis totis unum should mean something like "for whole things, for whole things, one." It's hard to imagine a context that makes this phrase make sense.

RE: Text parsing and BITRANS - DONJCH - 10-12-2019

(10-12-2019, 09:03 AM) Stephen Carlson Wrote: It's hard to imagine a context that makes this phrase make sense.

Oh I don't know, as part of a recipe, maybe. Depending on what came before.