Need advice for testing of hypotheses related to the self-citation method - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Need advice for testing of hypotheses related to the self-citation method (/thread-4765.html)
Need advice for testing of hypotheses related to the self-citation method - nablator - 25-06-2025

For the choice of a previous word in the "self-citation" (with modification) method, even if the selection of words is non-deterministic and more or less random, humans don't act like computers, so there would certainly be a human bias in the selection... detectable or not in the text of the VM? Would the detection of a bias be good evidence for the "self-citation" (with modification) method or not? I'm not sure.

Likely patterns for the selection of source-target words could be: use two (or more) consecutive words together for the next two generated words (in either order), or skip one word (or two) while reading, because it's easier to read several close words together than to read just one and then move to a totally different area of the page for the next word.

Type 1a: ... source1 source2 ... ... ... target1 target2 ...
Type 1b: ... source2 source1 ... ... ... target1 target2 ...
Type 2a: ... source1 skipped source2 ... ... ... target1 target2 ...
Type 2b: ... source2 skipped source1 ... ... ... target1 target2 ...

So far I have only tested these patterns, with both source words on the same line, both target words on the same line, and everything on the same page. They create many more "hits" (possible source-target locations, identified by small Levenshtein distance) than a word-shuffled page does. But we already know that word order is not random in the VM, so I'm not sure whether the statistics really show a bias in the selection of source-target pairs, or whether the word-ordering biases (such as the known y_q affinity and others, known and unknown) are simply transferred to the similar-but-modified words. I would appreciate advice on how to resolve the issue.

A more general human bias for the selection of the next source word(s) would be close proximity on the page, not necessarily on the same line: proximity between source and target, or proximity between sources.
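The four pattern types above can be counted mechanically. The sketch below is my own illustration, not nablator's actual program: `levenshtein`, `count_patterns`, the distance threshold of 1, and the restriction to source and target pairs on different lines are all assumptions made for the example.

```python
# Count Type 1a/1b (adjacent source words) and 2a/2b (one word skipped)
# on a page given as a list of lines, each line a list of word tokens.
# "Similar" is taken to mean Levenshtein distance <= max_dist.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def count_patterns(lines, max_dist=1):
    """Count the four source-target selection patterns on one page."""
    sim = lambda x, y: levenshtein(x, y) <= max_dist
    counts = {"1a": 0, "1b": 0, "2a": 0, "2b": 0}
    for li, src_line in enumerate(lines):
        for lj in range(li + 1, len(lines)):     # targets on a later line
            tgt_line = lines[lj]
            for t in range(len(tgt_line) - 1):
                t1, t2 = tgt_line[t], tgt_line[t + 1]
                # Type 1: two consecutive source words
                for s in range(len(src_line) - 1):
                    if sim(src_line[s], t1) and sim(src_line[s + 1], t2):
                        counts["1a"] += 1
                    if sim(src_line[s], t2) and sim(src_line[s + 1], t1):
                        counts["1b"] += 1
                # Type 2: one source word skipped in between
                for s in range(len(src_line) - 2):
                    if sim(src_line[s], t1) and sim(src_line[s + 2], t2):
                        counts["2a"] += 1
                    if sim(src_line[s], t2) and sim(src_line[s + 2], t1):
                        counts["2b"] += 1
    return counts
```

Running the same counter over word-shuffled versions of the page would then give the baseline that the "many more hits" comparison needs.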
Note to self: I need to try various patterns, expected and unexpected, and explain better which hypotheses I'm trying to test.

RE: Need advice for testing of hypotheses related to the self-citation method - ReneZ - 26-06-2025

I will be curious to hear what you find out. Just a small collection of thoughts from my side...

I had always understood that the main 'mode' of self-citation would be from one source to one target, but I guess that there could be a combination of modes as well.

I have often wondered about the initialisation method. After all, self-citation explains (in a way) how to set up the 'next' word from a previous section, but I never saw anything about how to start. Can one detect how many words are needed for the initialisation? Only once? Once per page? Once per paragraph?

Another thing I have been curious about is how the most frequent words come about. Are these part of the initialisation, or do they always appear as the result of modification of previous words? So far, I have not been able to find a good answer, but your experimentation might...

RE: Need advice for testing of hypotheses related to the self-citation method - Mauro - 26-06-2025

(26-06-2025, 01:47 AM)ReneZ Wrote: I have often wondered about the initialisation method. After all, self-citation explains (in a way) how to set up the 'next' word from a previous section, but I never saw anything about how to start.

The seed sentence is in the 'metadata' at the top of Torsten's output files. For the one posted on GitHub:

Quote:#text.initial_line=pchal shal shorchdy okeor okain shedy pchedy qotchedy qotar ol lkar

Note: the last two words 'ol lkar' are actually not used in generating the text. I don't know why the sentence is truncated, maybe a bug, but it's not important for the subsequent elaboration.

(26-06-2025, 01:47 AM)ReneZ Wrote:
Another thing I have been curious about is how the most frequent words come about.

There are probability parameters inside the software, e.g. this instruction (from one of the source files):

Quote:{{"k"} , { new Substitution( new String[] {"t"}, 77), new Substitution(new String[] {"p"}, 94), new Substitution(new String[] {"f"}, 100)}},

I think it determines what can be substituted for 'k': I guess 77% of the time with 't', (94-77) = 17% of the time with 'p', and the remaining 6% of the time with 'f'. I think a human being would behave in more or less the same way, just with greater fuzziness, and with the 'parameters' varying over time.

RE: Need advice for testing of hypotheses related to the self-citation method - nablator - 26-06-2025

Thank you for the comments. I need to discuss these ideas, it helps a lot!

(26-06-2025, 01:47 AM)ReneZ Wrote: I had always understood that the main 'mode' of self-citation would be from one source to one target, but I guess that there could be a combination of modes as well.

I need to generate texts to see the effect of using consistent patterns for the selection of sources. This could potentially explain how word-ordering preferences are preserved, why the first and last words of lines are different, etc. Much depends on the rules for evolving words (if there are any rules; there doesn't need to be an exact algorithm). So there can be no perfect solution, only models that fit the VM better than others. There is much to experiment with.

Quote:I have often wondered about the initialisation method. After all, self-citation explains (in a way) how to set up the 'next' word from a previous section, but I never saw anything about how to start.

The transfer of words (how many is an interesting question) from other pages to the blank page creates an evolutionary bottleneck: the result is that the statistics are skewed, sometimes extremely so.
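A toy version of this seed-plus-copy-with-modification process can be sketched as follows. Everything here is an illustrative assumption, not Timm's actual algorithm: `generate_page`, the uniform choice of an earlier word, the mutation probability, and the two-entry substitution table (written in the cumulative-threshold style Mauro reads from the source above) are all mine.

```python
# Toy "evolutionary bottleneck" simulation: a page is seeded with a few
# words copied from elsewhere, then each further word is produced by
# self-citation with modification (copy a random earlier word, sometimes
# mutating one glyph via a cumulative-threshold substitution table).
import random

# Cumulative thresholds: for 'k', roll 0-99; < 77 -> 't', < 94 -> 'p',
# else 'f'. Only two glyphs are covered here, purely for illustration.
SUBST = {"k": [("t", 77), ("p", 94), ("f", 100)],
         "t": [("k", 77), ("p", 94), ("f", 100)]}

def substitute(glyph, rng):
    """Replace a glyph using the cumulative-threshold table (if covered)."""
    roll = rng.randrange(100)
    for repl, threshold in SUBST.get(glyph, []):
        if roll < threshold:
            return repl
    return glyph  # glyph not in the table: leave it unchanged

def generate_page(seed_words, n_words, p_mutate=0.5, rng=None):
    """Grow a page from seed_words up to n_words by self-citation."""
    rng = rng or random.Random(0)
    page = list(seed_words)
    while len(page) < n_words:
        word = rng.choice(page)              # self-citation: pick earlier word
        if rng.random() < p_mutate and word:
            i = rng.randrange(len(word))     # modification: mutate one glyph
            word = word[:i] + substitute(word[i], rng) + word[i + 1:]
        page.append(word)
    return page
```

Generating pages with different numbers of seed words and comparing their word statistics against real VM pages is one way to probe how narrow the bottleneck is.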
This can be simulated and compared to the VM to evaluate how many seed words are needed.

Quote:Another thing I have been curious about is how the most frequent words come about.

Since the list of most frequent words is inconsistent from one section to another, there doesn't need to be a constraining mechanism: the most frequent words are what they are.

RE: Need advice for testing of hypotheses related to the self-citation method - nablator - 26-06-2025

Examples of selection patterns (maybe) visible in some instances (not the result of a systematic search; I noticed these similarities without searching for them):

Sequential transfers with skips:

f89r1.8:  qokeol.chol.qodaiin.chol.cheody.qokechy.daiin.ctheody.dam
          1 2 3 4 5
f89r1.17: qeaiin.cheyl.seey.qotey.qokeeol.daiin.ykhedy.daiin.dam
          1 2 3 4 5

Sequential transfers from one line to the next:

f103v.6: dain.shey.qokeedy.cheol.qoeeor.lshor.qoky.shedy.qokaiin.chedy.qokam
         1 2 3 4 5 6 7
f103v.7: daiin.shey.chol.chey.oteey.lkeeor.okaiin.shedy.shedy.qokaiin.ol.chedydy
         1 2 3 4 5 5 6 7

RE: Need advice for testing of hypotheses related to the self-citation method - Mauro - 26-06-2025

(26-06-2025, 09:06 AM)nablator Wrote: Examples of selection patterns (maybe) visible in some instances (not the result of a systematic search, I noticed these similarities without searching for them):

Not exactly what you asked for, but while reading your post above a procedure came to mind which could be tried as a first step:

- Scan the VMS words starting from the last word.
- Search for the first word which is 'similar' to the word under consideration, always going backward in the text, and see how the resulting matches are distributed.

Roughly, something like this:

string[] text = { ... Voynich text, separated into words, goes here...
}

int source_word_index = text.Length - 1;
while (source_word_index >= 0)
{
    string source_word = text[source_word_index];
    // restart the backward search for every source word
    int target_word_index = source_word_index - START_DISTANCE;
    while (target_word_index >= 0)
    {
        if (is_similar(text[target_word_index], source_word))
        {
            // match found, log the result (e.g. the distance
            // source_word_index - target_word_index)
            break;
        }
        target_word_index--;
    }
    source_word_index--;
}

Of course one must define what 'similar' means and write the is_similar() function; I guess one could start from what Torsten did in his software.

RE: Need advice for testing of hypotheses related to the self-citation method - nablator - 26-06-2025

(26-06-2025, 03:42 PM)Mauro Wrote: Not exactly what you asked for, but while reading your post above a procedure came to mind which could be tried as a first step.

Thanks, but I already wrote the program to count patterns of type 1a, 1b, 2a and 2b on each page a few months ago. Now I am looking for something to do next.

RE: Need advice for testing of hypotheses related to the self-citation method - Mauro - 26-06-2025

(26-06-2025, 04:27 PM)nablator Wrote: (26-06-2025, 03:42 PM)Mauro Wrote: Not exactly what you asked for, but while reading your post above a procedure came to mind which could be tried as a first step.

I'm not sure I understand what you mean by 1a, 1b, etc., but it looks interesting. Did you post the results? I may have missed the thread.
See the first post in this thread for the simple word-selection patterns that I counted. I didn't post the results because I didn't know what to do with them. There are more of these patterns in actual VM pages than in word-shuffled pages (but there are exceptions): what does that mean? I'd like to test the hypothesis of word-selection patterns, but I don't know how.

Now I think I should try other patterns to get a better idea of which patterns and which pages work best/worst. Then I will post the program and results. The next step will be to generate pseudo-Voynichese using these word-selection patterns, to understand the consequences.

RE: Need advice for testing of hypotheses related to the self-citation method - ReneZ - 27-06-2025

(26-06-2025, 08:34 AM)nablator Wrote: Since the list of most frequent words is inconsistent from one section to another, there doesn't need to be a constraining mechanism: the most frequent words are what they are.

For me it remains an open question, perhaps precisely because there should be no constraining mechanism. In normal text, the most frequent word (whatever it is) comes up most because it is used a lot. In an auto-citation environment, it comes up (like every other word?) because it is similar to a previous word on the page. But it comes up more often. How is that? This is something that can be checked. I guess the auto-citation should allow for words to be simply copied from previous ones without change...
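The "this is something that can be checked" remark invites a toy experiment. The sketch below is entirely hypothetical (`simulate`, the uniform choice of an earlier word, and the stand-in modification step are my assumptions, not Timm & Timm's algorithm): it only illustrates that copying without change is a rich-get-richer process, so a few words can become very frequent even with no constraining mechanism, and that raising the modification rate flattens the distribution.

```python
# Toy check of copy-vs-modify: each new word is a copy of a uniformly
# chosen earlier word; with probability p_modify it is altered into a
# brand-new form (a unique suffix stands in for a real glyph change).
import random
from collections import Counter

def simulate(seed, n_words, p_modify, rng):
    """Grow a word list by self-citation and return token frequencies."""
    words = list(seed)
    next_id = 0
    while len(words) < n_words:
        w = rng.choice(words)            # self-citation: copy an earlier word
        if rng.random() < p_modify:      # modification: create a new form
            next_id += 1
            w = f"{w}-{next_id}"         # stand-in for a real glyph change
        words.append(w)
    return Counter(words)

rng = random.Random(42)
for p in (0.0, 0.3, 0.7):
    freq = simulate(["daiin", "chedy", "qokeedy"], 2000, p, rng)
    top_word, top_count = freq.most_common(1)[0]
    print(f"p_modify={p}: top word '{top_word}' share {top_count / 2000:.2f}, "
          f"{len(freq)} distinct words")
```

With p_modify = 0 the three seed words are the only types, so the most frequent word dominates by copying alone; as p_modify grows, the vocabulary expands and the head of the distribution shrinks.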