RobGea > 16-09-2020, 07:10 PM
Aga Tentakulus > 16-09-2020, 08:05 PM
Koen G > 16-09-2020, 08:05 PM
(16-09-2020, 07:10 PM)RobGea Wrote: Maybe do the same with the current ZL version (ZL_ivtff_1r). I just tried, but I don't know what to do with the rare chars @130 etc. and the apostrophes.
Right, this may have been why I went with TT initially; I may have found it easier to convert to pure text. However, if I understood Rene correctly in the other thread, he thinks using TT is not ideal. If that is the case, it may be worth addressing.
bi3mw > 17-09-2020, 09:25 AM
(16-09-2020, 07:10 PM)RobGea Wrote: If so, bi3mw posted a 'voynich_as_text' file based on TT; just post that here as an attachment, or something similar.
Yes, I did. The link is here:
Ruby Novacna > 17-09-2020, 01:30 PM
ReneZ > 17-09-2020, 01:53 PM
Quote: I think that it should (in principle, at least) be possible to write a programme along the following lines:
- Start with a corpus (say, Q20)
- Filter out all unreliable words (e.g. words containing unreadable glyphs, weirdos, or glyphs that occur fewer than five times, say)
- Create a starting list of tokens based on all the glyphs that remain
- Calculate a metric based on the entropy of the text (more on this below)
- Find the pair of tokens that, when combined as a new composite token, yields the best metric
- Halt if the best metric found is worse than the previous best metric
- Else add that new composite token to the list, reparse and repeat
The reason this is possible is that adding tokens to the list of tokens shortens the total amount of text but flattens out the stats, so there is a ceiling to the process.
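The quoted procedure is essentially a greedy pair-merging loop (similar in spirit to byte-pair encoding). Here is a minimal Python sketch under stated assumptions: the unspecified "metric" is taken to be the total Shannon entropy of the parsed text (token count × bits per token), which merging shortens until it hits the ceiling described above; the longest-match parse, the candidate cap, and all function names are illustrative, not anyone's actual implementation.

```python
from collections import Counter
import math

def parse(word, tokens):
    """Greedy longest-match parse of one word into tokens (assumption:
    the quote does not specify the parsing rule)."""
    out, i = [], 0
    by_len = sorted(tokens, key=len, reverse=True)
    while i < len(word):
        for t in by_len:
            if word.startswith(t, i):
                out.append(t)
                i += len(t)
                break
        else:
            i += 1  # glyph covered by no token; unreliable words are filtered upstream
    return out

def metric(corpus, tokens):
    """Total bits: token count x per-token Shannon entropy. Lower = better.
    Merging shortens the text but flattens the stats, so this bottoms out."""
    counts = Counter(t for w in corpus for t in parse(w, tokens))
    n = sum(counts.values())
    h = -sum(c / n * math.log2(c / n) for c in counts.values())
    return n * h

def grow_tokens(corpus):
    """Start from single glyphs; repeatedly add the composite token that
    most improves the metric; halt when no candidate improves it."""
    tokens = sorted({g for w in corpus for g in w})
    best = metric(corpus, tokens)
    while True:
        # candidate composites = adjacent token pairs seen in the current parse
        pairs = Counter()
        for w in corpus:
            p = parse(w, tokens)
            pairs.update(zip(p, p[1:]))
        candidate, score = None, best
        for (a, b), _ in pairs.most_common(50):  # cap the search for speed
            m = metric(corpus, tokens + [a + b])
            if m < score:
                candidate, score = a + b, m
        if candidate is None:
            return tokens, best  # best merge is worse than before: halt
        tokens.append(candidate)
        best = score
```

For example, `grow_tokens(["abab", "abab", "abcd"])` first merges `a` and `b` into the composite `ab`, then continues merging until no composite lowers the total-entropy metric. With a different metric (e.g. bits per character, or conditional entropy) the merge order and stopping point would change, which is presumably why the choice of metric is flagged for discussion.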
Emma May Smith > 17-09-2020, 03:17 PM
RobGea > 17-09-2020, 04:18 PM
ReneZ > 17-09-2020, 05:23 PM
Koen G > 17-09-2020, 05:33 PM