The Voynich Ninja
Ambiguous Spaces - Printable Version



RE: Ambiguous Spaces - RobGea - 13-03-2024

Bah... those pesky Button Clickers are drawing level with us Hummus folk. Come, rise up, you Army of Hummus Lovers, let us defeat this foe that knows not the savor of sesame. :P


RE: Ambiguous Spaces - pfeaster - 13-03-2024

The best way of handling ambiguous spacing probably depends on just what you're looking to investigate and how.  But if you're approaching "vords" as words in a linguistic sense, you might want some approach that would preserve the ambiguity all the way through to your results, since it could provide some clues of its own.

After all, ambiguous spacing isn't limited to the VMs, and in other cases it often seems to reflect real ambiguities in the structure of a language.  Consider French texts from around the time of the VMs (and for some time afterwards), which frequently take an ambivalent approach to spacing between definite articles or prepositions and the nouns that follow them.  Sometimes the article or preposition is written together with the following noun as though it were a prefix, sometimes it's written with a definite space, and sometimes it's hard to judge whether there's a space or not.  The presence or absence of a space doesn't seem to have been governed strictly by any set rules as it is today (e.g., le vin but l'eau).  And this inconsistency doesn't apply everywhere -- just in specific situations that were legitimately ambiguous until they got resolved by later orthographic convention.

If Voynichese behaves similarly, perhaps it's for the same general reason (though the specifics could be very different).  And my own impression is that it does behave similarly -- at least, insofar as situations where we find ambiguous spacing seem largely to match situations where we also find inconsistent but unambiguous spacing.

That said, it seems these cases also correlate pretty well with specific glyph adjacencies -- that is, whenever a common-ish "vord" contains certain glyphs next to each other, it will typically be found with a break between them, without a break between them, and with an ambiguous sort-of-break between them, all in loosely consistent ratios.  That's admittedly not the case with French.

Representative examples include [ra] [r.a] [r,a] and [lch] [l.ch] [l,ch] for adjacencies and [oraiin] [or.aiin] [or,aiin] and [olchedy] [ol.chedy] [ol,chedy] for "vords."
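The ratios described above could be tallied with something like the following minimal sketch. It assumes a transcription string in which `.` marks a definite space and `,` marks an ambiguous one (the convention used in the bracketed examples); the function name `adjacency_counts` and the sample string are made up for illustration and are not taken from any real transcription file.

```python
import re

def adjacency_counts(text, left, right):
    """Count how often `left` is followed by `right` with no break,
    a definite break ('.'), or an ambiguous break (',')."""
    labels = {"": "joined", ".": "definite", ",": "ambiguous"}
    counts = {"joined": 0, "definite": 0, "ambiguous": 0}
    # Optional separator between the two glyph strings.
    pattern = re.compile(re.escape(left) + r"([.,]?)" + re.escape(right))
    # finditer() returns non-overlapping matches, which is enough
    # as long as `left` and `right` are different glyphs.
    for match in pattern.finditer(text):
        counts[labels[match.group(1)]] += 1
    return counts

# Invented sample, not real manuscript text.
sample = "or.aiin.chor,aiin.oraiin.dar.al.or,aiin"
print(adjacency_counts(sample, "r", "a"))
# {'joined': 1, 'definite': 2, 'ambiguous': 2}
```

Run over a full transcription, the three counts for each adjacency would show whether the ratios really are loosely consistent from one "vord" to the next.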

Apart from that, if you're analyzing word morphologies, treating all ambiguous spaces as "real" spaces will risk giving you non-valid truncated forms, while ignoring all ambiguous spaces will risk giving you non-valid concatenated forms.  Does one of those outcomes seem likely to create more problems for you than the other?

I'm not sure your option 4 would leave you "with the question which set of results is correct" -- I'd think it would give you two sets of results that would both contain some errors, but that would also share a lot of overlap.  Maybe you could be confident about any conclusions that are true for both sets of results while taking any other conclusions that are true for only one or the other set of results with a grain of salt?  If you were looking at word morphologies, for example, you could be skeptical of longer structures that occur only when ignoring ambiguous spaces, and also of shorter structures that occur only when treating all ambiguous spaces as "real."
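As a rough sketch of comparing the two sets of results, the snippet below tokenizes the same lines under both readings and separates the word types found in both from those found in only one. It assumes `.` is a definite space and `,` an ambiguous one; `tokenize`, `compare_readings` and the sample lines are hypothetical names and data used only for illustration.

```python
def tokenize(line, ambiguous_as_space):
    """Split a transcription line into words under one reading of ','."""
    # Treat ',' either as a real space or as no space at all.
    line = line.replace(",", "." if ambiguous_as_space else "")
    return [word for word in line.split(".") if word]

def compare_readings(lines):
    """Return word types shared by both readings and unique to each."""
    split_types, joined_types = set(), set()
    for line in lines:
        split_types.update(tokenize(line, ambiguous_as_space=True))
        joined_types.update(tokenize(line, ambiguous_as_space=False))
    return {
        "both": split_types & joined_types,         # safest to rely on
        "split_only": split_types - joined_types,   # possibly truncated forms
        "joined_only": joined_types - split_types,  # possibly concatenated forms
    }

# Invented sample lines, not real manuscript text.
lines = ["ol,chedy.qokeedy.ol.daiin", "chedy.or,aiin.olchedy"]
result = compare_readings(lines)
print(sorted(result["both"]))         # ['chedy', 'daiin', 'ol', 'olchedy', 'qokeedy']
print(sorted(result["split_only"]))   # ['aiin', 'or']
print(sorted(result["joined_only"]))  # ['oraiin']
```

Conclusions that hold for the "both" set are independent of how the ambiguous spaces are resolved; the other two sets are the ones to treat with a grain of salt.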

Personally, I've usually gone with the "try it both ways" approach, but it's a tough call.  Whatever you decide to do, I've missed reading your work at "Agnostic Voynich" and will definitely look forward to seeing where your thoughts have been taking you lately!


RE: Ambiguous Spaces - Emma May Smith - 13-03-2024

The work which I'm doing involves measuring the "distance" between features. If distance is defined by X number of words away, then ambiguous spaces might make a difference. Two features which are seemingly adjacent might appear either in the same word or two words apart.

Given that I'm looking to measure these distances over a larger set of data, maybe it won't make a massive difference. But I want to start out in the right way as I know other inaccuracies will also creep in, and they could all add up.


RE: Ambiguous Spaces - pfeaster - 14-03-2024

(13-03-2024, 10:29 PM)Emma May Smith Wrote: The work which I'm doing involves measuring the "distance" between features. If distance is defined by X number of words away, then ambiguous spaces might make a difference. Two features which are seemingly adjacent might appear either in the same word or two words apart.

Could you do something like assign a value of 0.5 to an ambiguous space and 1 to a "definite" space?  Or maybe 0.4 or 0.6, so that pairs of ambiguous spaces would add a sum distinct from the value of a single "definite" space?
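A minimal sketch of that weighting idea, assuming positions are character indices into a transcription line where `.` is a definite space and `,` an ambiguous one. The function name `weighted_distance` and the default weights are illustrative choices, not an established measure.

```python
def weighted_distance(line, i, j, ambiguous_weight=0.4, definite_weight=1.0):
    """Sum the weights of all spaces lying between character positions i and j."""
    lo, hi = sorted((i, j))
    weights = {".": definite_weight, ",": ambiguous_weight}
    # Glyph characters contribute nothing; only separators add distance.
    return sum(weights.get(ch, 0.0) for ch in line[lo:hi])

# Invented sample line, not real manuscript text.
line = "qokeedy,ol.chedy,daiin"
start = line.index("q")    # first glyph of the line
end = len(line) - 1        # last glyph of the line
print(weighted_distance(line, start, end))                        # roughly 1.8
print(weighted_distance(line, start, end, ambiguous_weight=0.5))  # 2.0
```

With a weight of 0.4, two ambiguous spaces sum to 0.8 and stay distinguishable from one definite space, whereas with 0.5 the two cases collapse to the same value of 1.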


RE: Ambiguous Spaces - ReneZ - 14-03-2024

One other thing you could do is to define your own units, and use these units instead of words.

I agree it is a non-trivial exercise.


RE: Ambiguous Spaces - Emma May Smith - 14-03-2024

(14-03-2024, 01:34 AM)pfeaster Wrote: Could you do something like assign a value of 0.5 to an ambiguous space and 1 to a "definite" space?  Or maybe 0.4 or 0.6, so that pairs of ambiguous spaces would add a sum distinct from the value of a single "definite" space?

(14-03-2024, 01:51 AM)ReneZ Wrote: One other thing you could do is to define your own units, and use these units instead of words. I agree it is a non-trivial exercise.

These are interesting alternatives, and not something I had considered. It wouldn't be possible to have a 0.5 with the measures I'm currently taking, but certainly if I change the units then this becomes more of a possibility.

But then the features themselves depend on whether they're word-start or word-end. The patterning of glyphs inside words is one of those key things about the text that it's just so useful to include in what's being measured.

Hmmm...


RE: Ambiguous Spaces - Koen G - 14-03-2024

Is there any metric where your results change drastically depending on whether or not you count ambiguous spaces? Other than the obvious ones, of course, like word count and the frequency of the space character.


RE: Ambiguous Spaces - Hider - 14-03-2024

(13-03-2024, 10:29 PM)Emma May Smith Wrote: The work which I'm doing involves measuring the "distance" between features. If distance is defined by X number of words away, then ambiguous spaces might make a difference. Two features which are seemingly adjacent might appear either in the same word or two words apart.

Try this.
Small spaces are syllable separators, large spaces are word separators.


RE: Ambiguous Spaces - dashstofsk - 14-03-2024

Have you considered doing this: remove all spaces, ambiguous or otherwise, to get a continuous stream of characters. Then use techniques of statistical analysis to determine the frequency and spread of character pairs and 3-character sequences. If you were to do this for each folio you would then be able to determine which folios are close in language. Then determine how statistically significant your results are. Will the results indicate consistency in language, or will they show something that is outside of what would be expected by statistical variance?
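For illustration, a space-free n-gram count might look like the sketch below. It assumes `.` and `,` are the only space markers and that each transcription character counts as one unit (so a two-character glyph such as "ch" is split). Cosine similarity is just one possible way to compare folios, chosen here as an example; the post does not name a measure. All function names and sample strings are hypothetical.

```python
import math
from collections import Counter

def strip_spaces(text):
    """Remove definite ('.') and ambiguous (',') spaces."""
    return text.replace(".", "").replace(",", "")

def ngram_counts(text, n):
    """Count n-character sequences in the space-free character stream."""
    stream = strip_spaces(text)
    return Counter(stream[i:i + n] for i in range(len(stream) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented sample "folios", not real manuscript text.
folio_a = "ol.chedy,ol.daiin"
folio_b = "qokeedy.ol,chedy"
print(ngram_counts(folio_a, 2).most_common(1))   # [('ol', 2)]
print(cosine_similarity(ngram_counts(folio_a, 2), ngram_counts(folio_b, 2)))
```

Judging whether a given similarity is statistically significant would still need a separate step, for example comparing against shuffled text, which this sketch does not attempt.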


RE: Ambiguous Spaces - Emma May Smith - 14-03-2024

(14-03-2024, 09:47 AM)Koen G Wrote: Is there any metric where your results are drastically changed depending on whether or not you use ambiguous spaces? Other than the obvious of course, like word count and the frequency of the space character.

I don't know if drastically is apposite, but affected, yes. The features I want to measure are not words, but parts of words, sometimes in relation to the start and end. And each ambiguous space could potentially do three things:
  • Miss a feature at the end of a word, by placing it in the middle of a word.
  • Miss a feature at the start of a word, by placing it in the middle of a word.
  • Create a "new" feature in the middle of a word by putting two glyphs adjacent where they shouldn't be.
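The three effects listed above can be made concrete with a small sketch that reports, for each ambiguous space, what is at stake under either reading. It assumes `,` marks an ambiguous space and works on single transcription characters rather than true glyphs; `ambiguous_space_effects` and the sample line are hypothetical.

```python
def ambiguous_space_effects(line):
    """For each ambiguous space, list the features that depend on how it is read."""
    effects = []
    for i, ch in enumerate(line):
        # Skip ambiguous spaces at the very edge of the line.
        if ch == "," and 0 < i < len(line) - 1:
            left, right = line[i - 1], line[i + 1]
            effects.append({
                "word_final_if_split": left,           # lost if the space is ignored
                "word_initial_if_split": right,        # lost if the space is ignored
                "medial_pair_if_joined": left + right, # created if the space is ignored
            })
    return effects

# Invented sample line, not real manuscript text.
for effect in ambiguous_space_effects("or,aiin.ol,chedy"):
    print(effect)
# {'word_final_if_split': 'r', 'word_initial_if_split': 'a', 'medial_pair_if_joined': 'ra'}
# {'word_final_if_split': 'l', 'word_initial_if_split': 'c', 'medial_pair_if_joined': 'lc'}
```

Listing these per ambiguous space gives a way to check how many of the measured features actually hinge on the choice.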

Now, it's not that these issues haven't cropped up before, I just want to assure myself that I've done what I can to eliminate them.