The Voynich Ninja
The Textual Work of August Walla, mentally disabled artist - Printable Version



RE: The Textual Work of August Walla, mentally disabled artist - Ben Trovato - 16-05-2020

It took longer than I expected (due to limited access to libraries), but I have now finished compiling text samples from August Walla's works. It's about 15,000 characters extracted from paintings, drawings, and documents where he uses "foreign" or "encrypted" language. I did not include "plain German" text, of which there would be a lot more...

Alas, it was not possible to arrange the texts in chronological order, as in most cases the date of origin has not been recorded. But there are two exceptions: the works taken from "Feilacher Bd. [=vol.] 3" and "Feilacher Bd. 4" should be strictly chronological within the respective volumes.

I have labeled two groups of texts separately: those where Walla himself claims to write in Latin (="CL", Corpus Latinum) or in Russian (="CC", Corpus Cyrillicum). Each of them makes up roughly one third of the texts, but I have no idea whether this is sufficient for separate analysis. I think that the "Russian" texts especially might be interesting, because Walla was forced to deal with Cyrillic letters from his dictionary which he had no knowledge of, so he replaced those letters with "similar" Latin ones. There are also texts which include "translations" or word lists in the style of a dictionary. The label and the source of each text are given in square brackets preceding it. Within the text, "[@]" indicates that the text is interrupted by the drawing at that point. Line breaks, capitalization and all other special characters are kept as in the original.

... feel free to use this for your research, but then I'd ask you to feel obliged to share your findings!


RE: The Textual Work of August Walla, mentally disabled artist - Alin_J - 21-05-2020

Ben, 
So, if I understand correctly, everything that is not inside the square brackets is verbatim what Walla wrote, correct? 
If so, sorry, my confusion was about the '=' signs (equals signs) most often found at the end of lines in Walla's text, because it's common in Voynich transcription formats to use '=' as a line-end marker.


RE: The Textual Work of August Walla, mentally disabled artist - Ben Trovato - 21-05-2020

Yes, your understanding is correct.

Walla uses the "=" sign mostly as a hyphen at line breaks. But sometimes, when he runs out of space, he places the "=" a few characters earlier, wherever he finds a gap between characters or words. It has nothing to do with Voynichese :-)


RE: The Textual Work of August Walla, mentally disabled artist - Alin_J - 31-05-2020

Here are some results obtained so far on the WALLA.doc text file. These results are mainly about the inter-word relationships and word sequences. I will probably present further analyses later, but then mostly on the intra-word relationships (characters and character relationships). First of all, I must comment on the limited size of the Walla corpus, which can affect most of the statistics. I have not counted the "=" (equals) sign, which Walla used extensively, as part of any word, but instead treated it as a word-spacing marker.

Some basic statistics:

Average word length (counting each unique word only once): 7.29446 characters, standard deviation: 4.5279
Average word length (text aggregate): 6.62016 characters, standard deviation: 4.24014
Number of unique words: 1355
Total number of words (tokens): 1885
Number of lines of text analyzed: 974
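For readers who want to reproduce figures of this kind, here is a minimal sketch of how such statistics could be computed. It is not the code used for the numbers above: the function name `basic_stats` is hypothetical, words are assumed to be whitespace-delimited with "=" treated as a space, and the population standard deviation is an assumption (the post does not say which variant was used).

```python
import statistics


def basic_stats(text: str) -> dict:
    """Word-level statistics, treating "=" as a word separator (assumption)."""
    tokens = text.replace("=", " ").split()
    if not tokens:
        raise ValueError("text contains no words")
    types = set(tokens)  # unique words, case-sensitive
    token_lengths = [len(w) for w in tokens]
    type_lengths = [len(w) for w in types]
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens),
        # each unique word counted once
        "mean_len_types": statistics.mean(type_lengths),
        "sd_len_types": statistics.pstdev(type_lengths),
        # every occurrence counted ("text aggregate")
        "mean_len_tokens": statistics.mean(token_lengths),
        "sd_len_tokens": statistics.pstdev(token_lengths),
    }


# Invented sample, not taken from the Walla corpus.
stats = basic_stats("ab cd=ef ab")
print(stats["tokens"], stats["types"], stats["type_token_ratio"])
```

With the two counts reported above, the ratio of unique words to tokens is 1355 / 1885, or about 0.72.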

First of all, the type-token ratio (number of unique words/total number of tokens) is extremely high (72%). This figure might well have been lower if the text had been longer. But let's jump to the results anyway. I have run an analysis to identify repeated sequences of 2-6 words in length. Most meaningful texts in any language contain a lot of these, but the most puzzling thing about the Voynich manuscript is that the number of repeated sequences of 3-4 words or more drops off dramatically, while the number of repeated 2-word sequences (pairs) corresponds to what you find in many natural language texts (still a bit low, but not incompatibly so). Below is a graph of the percentage of repeated n-tuples (n = 2 to 6) for different texts. The files are some of the text files in Brian Cham's corpus. I have also thrown in some files from the Project Gutenberg (ebook) website (prefixed by "pg_"). One of these is an English poetry sample, and another is an encyclopedia sample in English. These were mostly used to illustrate the difference between various types of texts. The results for the Walla text are furthest right in the graph, and next to them are those for the Voynich Manuscript (VM). All texts longer than about 40 000 tokens have been truncated to the first 40 000 tokens, both to save time and to make their statistics more comparable with the VM, which is about this size.

   

As you can see, there is a large amount of variation in the repeated-sequence data. The N-GRN (Guarani language) sample is the only text that completely lacks 5- and 6-tuples. However, if you disregard the sequences of single letters/glyphs in the circular diagrams in the VM (it is dubious to treat these glyphs as individual words), then the VM also completely lacks 5- and 6-tuples. It is difficult to see the low numbers in the graph, but you can also have a look at the raw data (attached "data.xls" file). The N-GRN sample contains text that is of deliberately poor quality, which is the probable reason for the absence of these n-tuples.
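As a rough illustration of the repeated-sequence counts discussed above, here is a minimal sketch. It is not the program that produced the graph. The function name `repeated_ngram_share` is hypothetical, and the percentage definition (occurrences of n-tuples that appear more than once, divided by all n-tuple positions) is only one possible reading of "percentage of repeated n-tuples". It also assumes the text is one continuous token stream, so sequences may run across line breaks.

```python
from collections import Counter


def repeated_ngram_share(tokens, n, limit=40_000):
    """Return (number of distinct repeated n-tuples, percentage of n-tuple
    positions occupied by a repeated n-tuple). The definition is an assumption."""
    tokens = tokens[:limit]  # truncate long texts to the first `limit` tokens
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0, 0.0
    counts = Counter(ngrams)
    repeated = {g: c for g, c in counts.items() if c > 1}
    share = 100.0 * sum(repeated.values()) / len(ngrams)
    return len(repeated), share


# Invented token sequence, not taken from any of the corpora.
tokens = "a b c a b c d".split()
for n in range(2, 7):
    print(n, repeated_ngram_share(tokens, n))
```

Other definitions (for example, counting only the distinct repeated n-tuples relative to the number of tokens) would give different percentages, so the numbers from this sketch are not directly comparable to the graph.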

The Walla text could be seen as a bit unusual. As expected, because of the high number of unique words, the number of repeated pairs is low. In contrast to this, however, the number of longer sequences is comparatively high, which is more compatible with texts in known natural languages than with the VM. Additionally, Walla wrote long passages in all upper case as well as in all lower case, and since this analysis was case-sensitive you would expect a lower number of all n-tuples than if the whole text were mostly in one case. So, even though Walla invented his own words, which were both numerous and significantly longer than words in most languages, he managed to write identical longer sequences repeatedly. It should be mentioned that all of the repeated word pairs/word n-tuples in Walla's text were repeated only once, except two pairs which were repeated twice each. These are "Alois Walla" and "Walla George" - both his name?

I am hesitant to draw any more conclusions for now because of the limited text size, but I am still thinking about what more I can do with it. Ben, more text would be very welcome if you have time.


RE: The Textual Work of August Walla, mentally disabled artist - Ben Trovato - 31-05-2020

Thank you for the effort!

I don't quite understand the decision to regard the "=" sign as a word separator. Walla clearly uses it to connect parts of a word across a line break, so wouldn't it be reasonable to just delete it and treat both parts as one word? That was Walla's intention ... As he separates words in a quite arbitrary way, this would - maybe significantly - decrease the number of unique words and increase repetitions.
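To make the two treatments of "=" concrete, here is a small sketch. The function name `tokenize` and its `equals` parameter are hypothetical, and the rule for "join" mode (an "=" at a line end glues the two line halves together; any other "=" is simply deleted) is an assumption about how the transcription could be handled, not a description of either analysis in this thread.

```python
import re


def tokenize(text: str, equals: str = "join") -> list:
    """Split text into words, handling "=" in one of two ways.

    equals="split": every "=" acts as a word separator.
    equals="join":  "=" is a hyphen; the word parts around it are merged.
    """
    if equals == "split":
        text = text.replace("=", " ")
    elif equals == "join":
        # "=" at a line end: drop it together with the line break
        text = re.sub(r"=[ \t]*\r?\n[ \t]*", "", text)
        # any remaining "=" inside a line: drop it in place
        text = text.replace("=", "")
    else:
        raise ValueError("equals must be 'split' or 'join'")
    return text.split()


# Invented sample, not taken from the Walla corpus.
sample = "ABRA=\nKADA bra\nka=da"
print(tokenize(sample, "split"))  # five short words
print(tokenize(sample, "join"))   # three longer words
```

A mid-line "=" followed by a space would still leave two separate words in "join" mode, so cases where Walla placed the sign early for lack of space may need manual checking.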

More text of this kind will be hard to find. As I said, I restricted myself to the "foreign" and "fantastic" language in Walla's work. I think I covered most of what is available in print (4 catalogues of his work, 1 monograph on Walla and 1 anthology on the Gugging artists). There may be some more images around on the internet, but not in abundance. There is much more of his production, but it is not published (there is no online archive from the Gugging museum).

What I can do is transcribe some of his other writings. He wrote pages and pages on an old-fashioned typewriter, in German, covering all different kinds of text. I could present some samples and then we could decide whether it makes sense to delve into this.


RE: The Textual Work of August Walla, mentally disabled artist - Alin_J - 31-05-2020

(31-05-2020, 06:45 PM) Ben Trovato wrote: I don't quite understand the decision to regard the "=" sign as a word separator. [...] What I can do is transcribe some of his other writings. [...]

Ok, sorry. I thought leaving them in made a number of words unreasonably long, so I treated them as separators. Treating them as you said could convert some pairs to single words, some triplets to pairs, and so on, and also decrease the number of unique words... I can do what you said, run the analysis again and report any differences here in a short while. 

Yeah, I think it would be a good idea to examine the writings in German too, if you have the time. Thank you.


RE: The Textual Work of August Walla, mentally disabled artist - Alin_J - 31-05-2020

I did what you said, but the changes in the output weren't large enough to make any noticeable difference in the graph, so I don't think I'll bother to update it. 
The changes to the basic statistics were as follows (the old values are given in parentheses):

Average word length (counting each unique word only once): 7.39121 (7.29446) characters, standard deviation: 4.60227 (4.5279)
Average word length (text aggregate): 6.68255 (6.62016) characters, standard deviation: 4.31132 (4.24014)
Number of unique words: 1342 (1355)
Total number of words (tokens): 1868 (1885)


Changes to the number of repeated phrases:
Number of pairs decreased from 74 to 70
Number of triplets decreased from 42 to 39
Number of quadruplets decreased from 30 to 28
Number of quintuplets decreased from 22 to 21
Number of sextuplets decreased from 18 to 17


RE: The Textual Work of August Walla, mentally disabled artist - Ben Trovato - 31-05-2020

Wow! I wouldn't have thought this had such little impact. I remember typing millions of "="-signs...

At first, it seems odd that the number of repeated phrases is going down. But it's coherent with how Walla creates new words by sticking two (or more) previous words together, I guess - you actually have more repetitions if you split them again. Did I get this right?
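A toy example may help to check this reasoning. The sketch below uses an invented sample (not from the corpus) in which a word is written once across a line break with "=" and once as two separate words. The helper name `repeated_pairs` is hypothetical, and a "repeated pair" is assumed here to mean any pair of adjacent words that occurs more than once.

```python
import re
from collections import Counter


def repeated_pairs(tokens):
    """Return the sorted list of adjacent word pairs occurring more than once."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return sorted(p for p, c in pairs.items() if c > 1)


# Invented sample: "wal=la" is hyphenated across a line break the first time,
# and written as two separate words the second time.
sample = "alo wal=\nla geo alo wal la"

split_tokens = sample.replace("=", " ").split()      # "=" as word separator
joined_tokens = re.sub(r"=\s*", "", sample).split()  # "=" as hyphen, parts merged

split_repeats = repeated_pairs(split_tokens)
joined_repeats = repeated_pairs(joined_tokens)
print(len(split_repeats), len(joined_repeats))
```

In this sample the split reading yields two repeated pairs and the joined reading yields none, because the merged word no longer matches its separately written parts. Whether this effect explains the actual decrease in the Walla counts is not shown by the example.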

I hope I'll find the time to transcribe some pages in the next few days. I think you are a bit familiar with German, right? So you can at least make some guesses as to what the text is about.


RE: The Textual Work of August Walla, mentally disabled artist - Alin_J - 01-06-2020

(31-05-2020, 09:08 PM) Ben Trovato wrote: Wow! I wouldn't have thought this had such little impact. I remember typing millions of "="-signs...

Yes, it is a bit strange. I have to check my code to see if I did something wrong with the Walla file...


RE: The Textual Work of August Walla, mentally disabled artist - Ben Trovato - 01-06-2020

I would also suggest not making the search case-sensitive. Maybe this would decrease the unique word percentage to a more reasonable number.
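As a sketch of what this suggestion would change, the snippet below compares the ratio of unique words to tokens with and without case folding. The function name `type_token_ratio` is hypothetical, and the word list is invented for illustration, so it says nothing about how large the effect would be on the real corpus.

```python
def type_token_ratio(tokens, case_sensitive=True):
    """Unique words divided by total words; optionally ignore letter case."""
    if not case_sensitive:
        tokens = [t.casefold() for t in tokens]  # "WALLA" and "walla" become one type
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented word list, not taken from the Walla corpus.
words = ["WALLA", "walla", "Walla", "geo"]
print(type_token_ratio(words))                        # case-sensitive
print(type_token_ratio(words, case_sensitive=False))  # case-insensitive
```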