The Voynich Ninja
Agreeing on standard transliteration files to use - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Voynich Talk (https://www.voynich.ninja/forum-6.html)
+--- Thread: Agreeing on standard transliteration files to use (/thread-3357.html)



RE: Agreeing on standard transliteration files to use - RobGea - 16-09-2020

These are just some standard files for computing entropy, right?
Then it would just be pure text, no locators or anything fancy?

If so, bi3mw posted a 'voynich_as_text' file based on TT; just post that here as an attachment, or something similar.

Then it would be consistent and freely available to all: a nice, straightforward baseline.

Maybe do the same with the current ZL version (ZL_ivtff_1r). I just tried, but I don't know what to do with the rare chars @130 etc. and the apostrophes.
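For what it's worth, something along these lines might do for the conversion. This is a rough sketch assuming the usual IVTFF conventions ('#' starts a file comment, every text line begins with a locus identifier in angle brackets, other '<...>' groups are inline markup, '@nnn;' encodes rare glyphs, '.' and ',' separate words); treating rare glyphs as uncertain and dropping apostrophes are just one possible set of choices, not the "right" ones:

Code:
import re

def ivtff_to_words(path):
    """Rough IVTFF -> plain word list; drops unreliable words."""
    words = []
    with open(path, encoding="ascii", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            # skip blanks, '#' file comments and anything without a locus id
            if not line or line.startswith("#") or not line.startswith("<"):
                continue
            line = re.sub(r"<[^>]*>", "", line)  # locus id + inline markup
            line = re.sub(r"@\d+;", "?", line)   # rare glyphs -> uncertain
            line = line.replace("'", "")         # one choice: drop apostrophes
            for w in re.split(r"[.,\s]+", line):
                # drop words containing uncertain or illegible glyphs
                if w and not any(c in w for c in "?*[]"):
                    words.append(w)
    return words

# words = ivtff_to_words("ZL_ivtff_1r.txt")   # filename from the post above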


RE: Agreeing on standard transliteration files to use - Aga Tentakulus - 16-09-2020

When it comes to transliteration files, this link is certainly useful.

I think most of them are represented there. I'm not sure about this one, but I think Rene knows more about it.


RE: Agreeing on standard transliteration files to use - Koen G - 16-09-2020

(16-09-2020, 07:10 PM)RobGea Wrote: Maybe do the same with the current ZL version (ZL_ivtff_1r). I just tried, but I don't know what to do with the rare chars @130 etc. and the apostrophes.
Right, this may be why I went with TT initially: I probably found it easier to convert to pure text. However, if I understood Rene correctly in the other thread, he thinks using TT is not ideal. If that's the case, it may be worth addressing.


RE: Agreeing on standard transliteration files to use - bi3mw - 17-09-2020

(16-09-2020, 07:10 PM)RobGea Wrote: If so, bi3mw posted a 'voynich_as_text' file based on TT; just post that here as an attachment, or something similar.
Yes, I did.

The file was created based on the VIB.


RE: Agreeing on standard transliteration files to use - Ruby Novacna - 17-09-2020

Might this be the time to improve the transcription, taking into account the different shapes of some glyphs, such as 8, k, l and especially 9? And to take the opportunity to try several variants of ligature readings?
Unfortunately I don't have enough programming skill to help, but I am willingly available to proofread a few pages.


RE: Agreeing on standard transliteration files to use - ReneZ - 17-09-2020

I find this discussion a bit surprising, because everything being discussed has already existed for quite a while.

Let's take Nick's most recent post as an example:

Quote: I think that it should (in principle, at least) be possible to write a programme along the following lines:
  • Start with a corpus (say, Q20)
  • Filter out all unreliable words (e.g. words containing unreadable glyphs, weirdos, or glyphs that occur less than five times, say)
  • Create a starting list of tokens based on all the glyphs that remain
  • Calculate a metric based on the entropy of the text (more on this below)
  • Find the pair of tokens that, when combined as a new composite token, yields the best metric
  • Halt if the best metric found is worse than the previous best metric
  • Else add that new composite token to the list, reparse and repeat

The reason this is possible is that adding tokens to the list shortens the total amount of text but flattens out the statistics, so there is a ceiling to the process.
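For concreteness, here is a minimal sketch of that loop, assuming the words come in as a list of token lists, that merges never cross word boundaries, and that conditional token entropy is the metric (with "best" meaning highest). None of these choices is fixed by the quote:

Code:
import math
from collections import Counter
from itertools import pairwise  # Python 3.10+

def conditional_entropy(tokens):
    """H(next token | current token) of the stream, in bits."""
    pair_counts = Counter(pairwise(tokens))
    ctx_counts = Counter(tokens[:-1])
    n = sum(pair_counts.values())
    h = 0.0
    for (a, b), c in pair_counts.items():
        h -= (c / n) * math.log2(c / ctx_counts[a])
    return h

def merge_word(word, pair):
    """Replace every occurrence of the adjacent pair by one composite token."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

def greedy_merge(words, metric=conditional_entropy):
    """words: list of token lists, one per word; merges stay inside words.
    The metric flattens across word boundaries -- a simplification."""
    flat = lambda ws: [t for w in ws for t in w]
    best = metric(flat(words))
    while True:
        candidates = Counter(p for w in words for p in pairwise(w))
        best_pair = None
        for pair, _ in candidates.most_common(50):  # cap the search for speed
            score = metric(flat([merge_word(w, pair) for w in words]))
            if score > best:
                best_pair, best = pair, score
        if best_pair is None:
            return words, best                      # halt: no merge improves it
        words = [merge_word(w, best_pair) for w in words]

# e.g. greedy_merge([list(w) for w in plain_words]) for single-char glyphs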


Suppose that one wants to do this not just once, but for several different cases:
- the whole MS
- the A-language corpus
- quire 20 or quire 13
- and for the v101 transliteration as well
- and also when converted to Cuva

Then one would spend a lot of time just preparing each version of the text manually. (The bullets above already imply 16 different text versions, and that is still just a selection.)

For me, all of the first steps in Nick's list, and all of the bullets I listed after the quote, are simple commands that can be combined in a single script, thanks to the use of some standards.

The only 'work' that remains is the interesting part: computing a clever metric and a clever optimisation scheme.
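As a toy illustration of where the 16 versions come from (the names below are placeholders, not the actual scripts or files):

Code:
from itertools import product

corpora   = ["whole_MS", "language_A", "quire_13", "quire_20"]
sources   = ["ZL", "v101"]      # transliteration files
alphabets = ["as_is", "cuva"]   # optional alphabet conversion

# 4 corpora x 2 transliterations x 2 alphabets = 16 text versions
for corpus, source, alphabet in product(corpora, sources, alphabets):
    print(f"prepare {source}_{corpus}_{alphabet}.txt")
    # each step would be one call to an extraction/conversion command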


RE: Agreeing on standard transliteration files to use - Emma May Smith - 17-09-2020

I hope that any potential programme lets a human choose which replacements/combinations they would like to try. A text optimised for entropy by a computer would only repeat Koen's work. What we're really after is the crossover point between higher entropy and reasoned replacements: how far can entropy be raised based on our understanding of the text and script?
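Concretely, the fully automatic search could be replaced by something that just reports the effect of human-proposed merges. A sketch, reusing conditional_entropy and merge_word from the snippet above (the EVA digraphs in the comment are placeholders, not recommendations):

Code:
# Reuses conditional_entropy() and merge_word() from the sketch above.
def report(words, proposed):
    """Print the entropy change for each human-proposed merge pair."""
    flat = lambda ws: [t for w in ws for t in w]
    base = conditional_entropy(flat(words))
    for pair in proposed:
        merged = [merge_word(w, pair) for w in words]
        print(pair, round(conditional_entropy(flat(merged)) - base, 4))

# report(words, [("c", "h"), ("s", "h"), ("i", "n")])  # placeholder digraphs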


RE: Agreeing on standard transliteration files to use - RobGea - 17-09-2020

Humans? No way! I'm gonna whip me up a genetic algo and maybe some simulated annealing.
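(In case anyone does want to whip one up: a toy simulated-annealing sketch over sets of merge pairs, reusing conditional_entropy and merge_word from above. The move set and cooling schedule are arbitrary choices, nothing more:)

Code:
import math
import random

def anneal(words, steps=2000, t0=0.5):
    """Toy simulated annealing: each move adds or drops one merge pair;
    worse states are accepted with probability exp(dE / T)."""
    flat = lambda ws: [t for w in ws for t in w]

    def score(merges):
        merged = words
        for pair in merges:
            merged = [merge_word(w, pair) for w in merged]
        return conditional_entropy(flat(merged))

    # candidate pairs come from the unmerged text -- a simplification
    all_pairs = sorted({p for w in words for p in zip(w, w[1:])})
    if not all_pairs:
        return [], score([])
    state, cur = [], score([])
    for step in range(steps):
        t = max(t0 * (1 - step / steps), 1e-9)     # linear cooling
        cand = list(state)
        if cand and random.random() < 0.5:
            cand.remove(random.choice(cand))        # move: drop a merge
        else:
            cand.append(random.choice(all_pairs))   # move: add a merge
        new = score(cand)
        if new >= cur or random.random() < math.exp((new - cur) / t):
            state, cur = cand, new
    return state, cur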


RE: Agreeing on standard transliteration files to use - ReneZ - 17-09-2020

Genetic is the way to go.


RE: Agreeing on standard transliteration files to use - Koen G - 17-09-2020

No, but seriously: if one could train one of those "AIs" to optimize Voynichese, that would be interesting. Is this kind of software accessible nowadays? Not to me, obviously, but perhaps to experienced programmers.