The Voynich Ninja

Full Version: Curve-Line System - Bluetoes edition
(29-01-2025, 12:24 AM)Bluetoes101 Wrote: See, I'm hoping I missed the "After introducing the concept of ‘looping’ slot grammars" and that is why it sounds like Mauro is casting spells at me Big Grin . I may need to backtrack some to get a foothold into that, but I certainly will, just learning what an Nbit and LOOP-L (rappers?) is will help demystify things greatly.

I have the advantage of not being able to fully understand either your CLS or Mauro's grammars  Smile So for me they look similar enough.

But as far as I understand, both systems identify certain Voynichese sequences as "conforming" and certain other sequences as "non-conforming". And you seem to be using a metric, normally called "coverage", that tells how many of the actual Voynichese words (from a particular page or the whole manuscript) are identified as conforming. Or how many are identified as non-conforming, which is basically a variant of the same metric.

However, coverage alone doesn't show how good a system actually is, because you can make a very flexible "anything goes" system, which would obviously have perfect coverage. So, there should be another metric, for example "specificity", which tells us how many possible sequences a system can generate, whether attested in the MS or not. For a loop-like system, which can generate sequences of any length, we can talk about how many possible sequences up to length X the system can produce. Then, using a combination of "coverage" and "specificity", you can argue which system better describes the text.
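To make this concrete, here is a minimal sketch in Python; everything in it is a stand-in (the toy alphabet and the conforms() rule would have to be replaced by whatever CLS or the grammar actually says):

Code:
from itertools import product

# Toy alphabet and conformance test, standing in for CLS or a slot grammar.
ALPHABET = list("oaydklreschqt")

def conforms(word):
    return "qq" not in word   # placeholder rule; swap in the real system here

def coverage(attested_words):
    # Share of attested word tokens the system accepts as conforming.
    return sum(conforms(w) for w in attested_words) / len(attested_words)

def specificity(max_len):
    # How many distinct sequences up to max_len the system can generate.
    return sum(
        1
        for n in range(1, max_len + 1)
        for seq in product(ALPHABET, repeat=n)
        if conforms("".join(seq))
    )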

But then again, you can make a system that just lists all known Voynichese words, either explicitly (just making a list) or implicitly (via a long list of rules like "qo can precede t if t is followed by edy, unless it's followed by..."). This system would have perfect coverage (all words present in the MS are listed) and perfect specificity (only words present in the MS are listed). So, "coverage" and "specificity" might not be enough; there probably should be another metric reflecting how complex the system itself is.

As far as I understand this, Mauro has found a good answer to this by making a metric that shows how many bits it takes to encode the whole text and the grammar together. But this could be overkill for a relatively simple system; maybe it makes sense to just start with understanding what the specificity of the original CLS and your version of CLS is. E.g., how many possible glyph sequences of length 3 can either system produce?
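Just to put numbers on that last question (the slots below are invented for the example, not the actual CLS slots): a rigid slot grammar generates exactly the product of its slot sizes, and log2 of that count is the number of bits needed to point at any one word it generates.

Code:
from itertools import product
from math import log2

# Invented three-slot toy grammar, purely for illustration.
slots = [["", "q", "o"], ["k", "t", "ch"], ["y", "dy", "ol"]]

words = {"".join(p) for p in product(*slots)}
print(len(words))        # 27 distinct sequences generated by this grammar
print(log2(len(words)))  # ~4.75 bits to pick one of them, given the grammar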
Yeah I think it might be too simple. I could probably fit it on a couple of napkins in a bar. I guess we will see what happens. 

I worry a little about making it score better for the sake of it, but I should probably think about stuff like "how many possible glyph sequences of length 3 can either system produce?" - As well as the other information and ideas you have given me, so thank you for taking the time to explain and giving me a look at what may await.

On - "how many possible glyph sequences of length 3 can either system produce?"
Thinking about how this answer could be a "better" one (I haven't checked yet, but the answer will be a lot higher than for CLS)

I have "y" mapped to "X" currently, which is essentially *spacebar* 
As "y" is mostly word start and end, I could just un-map it. Now *spacebar* becomes *delete*

This would result in much fewer possible glyph sequences while having a very negligible effect on overall conformance. 
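(Rough arithmetic, just to show the scale: with an unconstrained alphabet of g symbols there are g^3 possible length-3 strings, so going from, say, 12 symbols to 11 already drops 1728 to 1331; my real numbers would depend on how the rest of the mapping shakes out.)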

I wonder though if the path of deciding "chokyeedy" is "chokeed" rather than "chok-eed-" to appease a scoring metric is "better". Also do the new glyph pairings I create here and there (y in middle of words) have any validity to them if it was me who created them? I suppose I did create "k-e" "d-" so the same question could be asked of that.

This is a core feature of CLS: rare glyphs plus q, t, k, p, f are deleted. So in a way I guess I would just be following suit... but I just don't like the idea.
(29-01-2025, 04:18 AM)oshfdk Wrote:
(29-01-2025, 12:24 AM)Bluetoes101 Wrote: See, I'm hoping I missed the "After introducing the concept of ‘looping’ slot grammars" and that is why it sounds like Mauro is casting spells at me Big Grin . I may need to backtrack some to get a foothold into that, but I certainly will, just learning what an Nbit and LOOP-L (rappers?) is will help demystify things greatly.

I have the advantage of not being able to fully understand either your CLS or Mauro's grammars  Smile So for me they look similar enough.

But as far as I understand, both systems identify certain Voynichese sequences as "conforming" and certain other sequences as "non-conforming". And you seem to be using a metric, normally called "coverage", that tells how many of the actual Voynichese words (from a particular page or the whole manuscript) are identified as conforming. Or how many are identified as non-conforming, which is basically a variant of the same metric.

However, coverage alone doesn't show how good a system actually is, because you can make a very flexible "anything goes" system, which would obviously have perfect coverage. So, there should be another metric, for example "specificity", which tells us how many possible sequences a system can generate, whether attested in the MS or not. For a loop-like system, which can generate sequences of any length, we can talk about how many possible sequences up to length X the system can produce. Then, using a combination of "coverage" and "specificity", you can argue which system better describes the text.

But then again, you can make a system that just lists all known Voynichese words, either explicitly (just making a list) or implicitly (via a long list of rules like "qo can precede t if t is followed by edy, unless it's followed by..."). This system would have perfect coverage (all words present in the MS are listed) and perfect specificity (only words present in the MS are listed). So, "coverage" and "specificity" might not be enough; there probably should be another metric reflecting how complex the system itself is.

Perfectly summarized.


(29-01-2025, 04:18 AM)oshfdk Wrote: As far as I understand this, Mauro has found a good answer to this by making a metric that shows how many bits it takes to encode the whole text and the grammar together. But this could be overkill for a relatively simple system; maybe it makes sense to just start with understanding what the specificity of the original CLS and your version of CLS is. E.g., how many possible glyph sequences of length 3 can either system produce?

I think the ideal approach would be to use a compression algorithm which is not constrained in the set of possible chunks it can use but which can find the optimal solution nonetheless. Unfortunately I don't know how to do it: an exhaustive search is impossible (the search space is too big), or maybe a variant of Lempel-Ziv LZ78 could be used (but I'm not sure at all, LZ78 may not even guarantee optimality). The "slot grammars" approach was a (rather good, I think) workaround for this problem.
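Just to show the flavor of what I mean by LZ78 (this is the textbook greedy parse, not something I have actually run on Voynichese):

Code:
def lz78_phrases(text):
    # Greedy LZ78 parse: returns a list of (prefix_index, new_char) phrases.
    dictionary = {"": 0}              # phrase -> index
    phrases = []
    w = ""
    for ch in text:
        if w + ch in dictionary:
            w += ch                   # keep extending the longest known prefix
        else:
            phrases.append((dictionary[w], ch))
            dictionary[w + ch] = len(dictionary)
            w = ""
    if w:                             # flush whatever prefix is left at the end
        phrases.append((dictionary[w], ""))
    return phrases

The parse is greedy and depends on scan order, which is exactly why I would not trust it to find the optimal chunking.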

With an algorithm like that it would not matter much if one starts from a curve-line system or from EVA. In effect, a working algorithm should be able to pull together 'C's and '\'s into 'characters' (thus progressing on the transcription problem) and then 'chunks of characters' (thus progressing on word structure) in the most sensible way.

PS.: waiting for Bluetoes to post his curve-line model Smile
(29-01-2025, 03:48 PM)Bluetoes101 Wrote: Yeah I think it might be too simple. I could probably fit it on a couple of napkins in a bar. I guess we will see what happens. 

Simple is not bad; to me, simple is actually better, both on the metrics and in explanatory capacity. Also, I doubt that a 240+ page manuscript was created using a complex system with many processing steps. It is possible, but would require a separate explanation. To me it seems more likely that the underlying principle is quite simple.

(29-01-2025, 03:48 PM)Bluetoes101 Wrote: I worry a little about making it score better for the sake of it, but I should probably think about stuff like "how many possible glyph sequences of length 3 can either system produce?" - As well as the other information and ideas you have given me, so thank you for taking the time to explain and giving me a look at what may await.

I don't know if it makes sense to introduce changes that would improve the score if they go against the logic of the system. I'm certainly not arguing for that. I'm just curious to what extent your version of CLS represents an improvement over the original CLS according to some well defined metric. I think coverage alone (comparing the number of conforming words) is probably not a good indicator without an additional way to account for the predictive power of the system.
(27-01-2025, 03:00 AM)oshfdk Wrote: I thought l/y pairing was already included at least in pfeaster's summary of the curve-line system? I don't remember the history of curve-line systems very well, but I think I remember l/y mentioned somewhere.

The pairing of [l] and [y] comes up in Michael Winkelmann's [link] (2008), Brian Cham's [link] (2014/5), and my own [link] (2020).  I believe we each came up with it independently, which may weigh in its favor.

Along with the obvious similarities, there's also a difference among "curve-line" hypotheses that may be worth pointing out.   

Winkelmann writes in terms of Harmoniegesetze or "laws of harmony" that govern the order of discrete glyphs depending on whether they belong to what he calls the e-class or the i-class.  Cham writes similarly of constraints on the order of discrete glyphs based on whether their "base shape" is a "curve" or a "line."

My own suspicion is that Voynichese is built on a frame of hatchmarks (made up of bare "curves" and bare "lines") that are punctuated or augmented with flourishes ("tails" and "loops" and such).  The idea is that there's not a discrete glyph [r] that prefers to come after [i] as [ir], and a discrete glyph [s] that prefers to come after [e] as [es]; rather, the composite forms transcribed as [r] and [s] happen to occur when the same flourish is added to the second hatchmark in a sequence [ii] or [ee].  In other words, I think it's probably not so much that similar glyphs "prefer" to appear next to each other, but rather that the forms we usually treat as "glyphs" don't correspond to the building blocks of Voynichese in the first place.  I see that Anton proposes much the same idea in a comment from 2015 at the end of Brian's article, referring to it as "superimposition."

So I'd suggest there are really two competing hypotheses here:

1. One treating curves and lines as parts of unitary glyphs (e.g. Winkelmann, Cham, Bluetoes)
2. One treating curves and lines as separable elements (e.g., Alipov, Feaster)
This separation of two possible approaches in interpreting the glyph shapes is central to my present thinking.
Right now, I am not yet beyond the "what if... " stage. Too early to decide, but I cannot see how 'a bit of both' would be possible.
Option 2 does lead to more interesting consequences...
Thank you very much Patrick, 

I was not aware of much of this. Unfortunately the first link is broken, but I look forward to reading your work; that one links OK.

I think I started out with a work that believed glyphs band together and so followed suit. I don't think I align with that so much anymore, but I do believe the core glyphs were paired into curve-built and line-built alternatives. I don't know how it started or why (hatchmarks, for example), but I do believe the glyphs can be sorted; my sorting is much like CLS, as I would imagine anyone's is who subscribes to the "some are lines, some are curves" way of thinking. However, I don't think "like attracts like". My sorting also abandons CLS logic for one glyph, because that better represents my system.

EVA: d, in my opinion, is not a shape starting with a curve; it only ends with one. Which is weird, because the rest of the "c-built" glyphs are not this way. Obviously I have no evidence to back any of this up, but something I wondered when analysing the relationships of this glyph was:

[Image: 88.jpg]

Did the idea of a line-based "d" just not really work? Maybe it got more and more curve-like until the result was too hard to tell apart. Just chucking a ball at the target, no idea of facts obviously. Some parts of the early manuscript make me think "d" was always a bit of an issue. The top paragraph of "f2r" would be my prime example... look at how many different attempts happen in different ways.


The key idea of my system is switching from curve to line and line to curve; I don't think whoever wrote this text cared about camps of "c" and camps of "l" anymore. "ol" and "or" make up over 8000 pairings; for this alone I don't think the idea of separating curves and lines by "a" came into their thinking. While this idea was born out of CLS, it has become quite distant in some areas.
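(For anyone who wants to re-check a count like that, something along these lines over a plain EVA word list would do; "eva_words.txt" is just a hypothetical file name, and the exact totals will vary with the transliteration you start from.)

Code:
from collections import Counter

# One EVA word per whitespace-separated token; counts vary by transliteration.
with open("eva_words.txt") as f:
    words = f.read().split()

bigrams = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
print(bigrams["ol"], bigrams["or"])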





  
@Mauro, what do you need from me transcription-cleaning-wise (is this even a term, or did I just make it up?)
Until now I have been using the below as a "whole manuscript" base check 

#=IVTFF EvaT 2.0 M 3
# Extracted from LSI_ivtff_0d.txt
# Version 2a of 02/02/2023

My preference, however, is very strongly with Rene's transcription "LZ v. 3a". The other was just easier for bulk conformance checks. 
My code is set up to deal with some things in this transcription; however, ambiguity, extended EVA, etc., I was dealing with myself. 
Over 10 pages this was fine (I felt), but going beyond that, does my code need to account for this in some way? I'm guessing "delete" is the usual option... I doubt this makes a massive difference overall, but for things like "is this a, or o?", when a or o make my system work, it's a bit painful to say "delete". Moreover, I probably slightly disagree with Rene on maybe 1 glyph per page on average, but others can be (and usually are) several; it adds up.
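For the bulk checks I have in mind something along these lines; the IVTFF handling here is my assumption ('#' comment lines skipped, angle-bracket locus tags and inline markup stripped, '[a:o]' alternative readings resolved to the first option rather than deleted, '?' illegible glyphs dropped, '.'/',' as word separators), so treat it as a sketch rather than a finished cleaner.

Code:
import re

def clean_ivtff(path):
    # Rough IVTFF pre-cleaning: returns one flat list of words.
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue                                       # file comments
            line = re.sub(r"<[^>]*>", "", line)                # locus tags, <!...> notes
            line = re.sub(r"\[([^:\]]*)[^\]]*\]", r"\1", line) # [a:o] -> a
            line = line.replace("?", "")                       # illegible glyphs
            words.extend(w for w in re.split(r"[.,]", line) if w)
    return words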
In system update news... I'm a wally, as per usual.

I expressed an idea based on Tiltman's words, then went off on some brain adventure for no reason. 
I have removed "X" from the system. I have gone back to thinking "y" is sometimes a stand-in for "a" or maybe "o". It is now my 3rd "Switch". 

I feel like I searched my entire house for my keys, then found them in my pocket. Maybe the "l/y" idea chucked me sideways, but while I think they are related via glyph creation/pairing, the function is that "l" belongs with glyphs made from lines, and "y" is a tailed "o" and/or "a". "q" and "qo" still elude me, but they are the final puzzle piece, so it is close to submitting (... and tearing to bits Big Grin )
(30-01-2025, 01:07 AM)Bluetoes101 Wrote: Unfortunately the first link is broken

It looks like the problem may be that the connection isn't secure, so it can't use the "https" protocol.  This URL should work, however (copy and paste without the surrounding quotation marks):

"http://voynich.tamagothi.de/2008/07/05/die-harmonie-der-glyphenfolgen/"

If you don't read German and don't want to risk the non-secure "http," you can also plug that URL into Google Translate and get a securely accessible translation.  Definitely worth a read, given your interests!

Incidentally, I see the author identified in different places as Michael Winkelmann and Elias Schwerdtfeger.  Can anyone clarify?