qo and the 15.5% factor - The Voynich Ninja (https://www.voynich.ninja), Forum: Voynich Research > Analysis of the text (https://www.voynich.ninja/forum-41.html), Thread: /thread-3380.html
qo and the 15.5% factor - Voynichgibberish - 03-10-2020
How rare is it for a text of about 34,000 vords (words) to have a prefix like qo that is present in about 15.5% of the corpus?
If any of our researchers were able to scan several different languages, such as Latin, German, English and lastly Italian, would it be common to find a prefix that stands out with such a large footprint, as qo does in the VMS? To be fair, anyone studying the statistics for the other languages should use a sample of only 34,000 words from a text in each language.
I know I'm being a bit biased in the languages I choose to test against. However, the VMS was found in Italy, so I presume that, whether or not it is gibberish (see the Zipf analysis in my other thread), it is most likely Latin. That being said, if the author just invented vords and attached them to Latin words, maybe that explains the qo phenomenon.
On the other hand, if we do a study and this turns out to be common, then we all learn that qo is unremarkable.
So I am asking again: does anyone know how, in Python, to extract just the first two characters of each word in a text file and count those pairs? I would like to test this myself, but as a Python beginner I don't know how to write the code.
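A minimal Python sketch of the count requested above: take the first two characters of each word, tally them, and report each pair's share of all words. It assumes a plain-text UTF-8 input and treats a "word" as any run of letters, so punctuation and digits act as separators. Words are lower-cased so that "Th" and "th" are counted together, and one-letter words count toward the total but have no two-character prefix. The function names `prefix_counts` and `report` are illustrative, not taken from the thread.

```python
import re
from collections import Counter


def prefix_counts(text, n=2):
    """Return (Counter of n-character word prefixes, total number of words).

    A "word" here is any run of letters (an assumption: punctuation, digits and
    apostrophes act as separators). Words are lower-cased so "Th" and "th" are
    counted together; words shorter than n count toward the total but have no
    prefix of length n.
    """
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(w.lower()[:n] for w in words if len(w) >= n)
    return counts, len(words)


def report(text, n=2, top=10):
    """Print the most common n-character prefixes with count and share of all words."""
    counts, total = prefix_counts(text, n)
    for prefix, count in counts.most_common(top):
        print(f"{prefix}\t{count}\t{100 * count / total:.1f}%")
```

For example, `report(open("genesis.txt", encoding="utf-8").read())` prints the ten most common two-letter prefixes with their percentages (`genesis.txt` is a hypothetical file name). To compare like with like across languages, trim each text to the same number of words before counting.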
RE: qo and the 15.5% factor - MarcoP - 03-10-2020
Hi Voynichgibberish, the script you want is perfect for a beginner Python developer. I encourage you to write it yourself. The behaviour of qo is a special case of a wider subject known as "character entropy". Most of what you have posted on the forum (Zipf's law, the "gibberish" idea, character entropy) is discussed by Bowern and Lindemann in their paper. You should definitely read it with care. Their references will give you further information about the main topics, if needed.

RE: qo and the 15.5% factor - Voynichgibberish - 03-10-2020
These bigrams were stripped from Genesis and 2 Samuel of the Latin Vulgate Bible: they are the first two characters of every word in these texts, 34,049 words in total. "et" was the most frequent bigram, with 2,586 occurrences, about 7.5% of the text. I will look at English, German and Italian to see whether qo is an anomaly. I believe qo was used to shorten Latin words during copying, meaning qo really shortens what was 4 tokens. I simply believe the author of the VMS found a lazy way to avoid writing out all the text and to hide the fact that the language is Latin. If I'm correct, and I find qo to be an anomaly against other texts while being substantial, as it is with Latin, then to me this would point to the VMS being a gibberish text.
Here is the code to find bigrams, i.e. the first two characters of words:
Code: # dictionary to store count of each word (2 characters) eg. "an": 2
Code: in 1481

RE: qo and the 15.5% factor - Voynichgibberish - 04-10-2020
Everyone, I'm sorry if this info is lame; I believe I'm making too many assumptions about a text that is mission impossible.
"What if", "I believe" and "it could be" are statements that need facts behind them, and that's what I'm trying to provide. I'm probably not the only one who has gone down the qo hole before. Now prepare for this assumption before we look at German and Italian: could qo be "th"? Has anyone ever stated this before? Is this just an assumption until I look at the other two languages? Probably. If I'm right about "th", does this imply the VMS is not gibberish and is in Middle English? Was the Voynich Manuscript made by a doctor in the Middle Ages? These are assumptions, though with a few statistics behind them. I have to say I feel like a fool, because I found that "th" in Jeremiah from the Wycliffe Bible occurs 5,740 times in a text of 34,054 words, which is 16.8%! What does this tell us all? That I was wrong about qo being an anomaly, or a two-glyph prefix that shortens vords that were 4 tokens. If one wants to look at the first two characters of Voynich vords, remember that some glyphs are represented by two letters in the transcription. So for comparing languages, please only target the VMS glyphs that have a one-to-one relationship between glyph and letter. Here is the distribution of the first two characters for the Voynich (sorry, some single characters are in there too!). Distinct vords total: 6,818. Remember these are distinct vords: a vord used more than once is counted only once. If I could use all ~34,000 vords the counts would be higher, but I don't have a way to obtain that.
Code: a 1

RE: qo and the 15.5% factor - aStobbart - 04-10-2020
Just realized that there is not a single instance of 4o in f1r. The first one appears in f1v. Has anyone else noticed this before?
If it stands for "th" and the language is Middle English, what are the odds it would be missing from an entire page?

RE: qo and the 15.5% factor - DONJCH - 04-10-2020
(never mind)

RE: qo and the 15.5% factor - Voynichgibberish - 05-10-2020
I just remembered: if anyone wants me to post all the returned two-character prefixes for Jeremiah from the Wycliffe Bible, just reply with the request. I just finished the statistics for the first two characters of every word in about 34,000 words of Dante's Inferno. "ch" topped the list at 1,813 in a text of 34,093 words, which is 5.3% of those words. Now I have to find a German text of about 34k words. "th" is looking like a strong candidate for qo!
Code: Ne 17

RE: qo and the 15.5% factor - Voynichgibberish - 05-10-2020
Here are the statistics for Genesis from the Luther German Bible. "un" (don't forget to add in "Un") has a count of 2,862, 8.4% of 33,812 words, and stood out as the most frequent two-character prefix among all those words. So what's the takeaway from these four languages, which are often compared to the Voynich Manuscript? If I had to study one language on a hunch, and this were all the information I had, I would choose Middle English, because "th" stands out, compares with qo and fits nicely. Yet what if qo is a null? Does the Voynich Manuscript point to being enciphered? "o" is the most used glyph, so if qo is "th", "o" would be "h" in Middle English. The Voynich Manuscript would then be a substitution cipher, but of what type? Is it polyalphabetic, like an Alberti cipher, or just a monoalphabetic substitution cipher? At any rate, could one of the most used vords, like "oe", translate to "he"?
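To make the "oe" → "he" idea concrete: in a monoalphabetic substitution each glyph always stands for the same plaintext letter wherever it occurs, whereas a polyalphabetic cipher such as Alberti's changes the mapping as the text proceeds. Below is a minimal Python sketch of the monoalphabetic case only. The key is hypothetical and built solely from guesses in this thread (o → h, plus t → i and l → e so that "otol" reads "hihe"); it illustrates the mechanism and is not a proposed decipherment.

```python
def make_key(mapping):
    """Build a str.translate table from a {cipher_glyph: plain_letter} dict."""
    return str.maketrans(mapping)


def decipher(vord, key):
    """Apply a monoalphabetic substitution: each glyph maps to the same
    plaintext letter wherever it appears; glyphs not in the key are kept."""
    return vord.translate(key)


# Hypothetical key from this thread's guesses only, NOT an established reading.
GUESS = make_key({"o": "h", "t": "i", "l": "e"})
```

With this key, `decipher("oe", GUESS)` gives "he" and `decipher("otol", GUESS)` gives "hihe". A polyalphabetic scheme could not be written as a single fixed table like `GUESS`, which is one quick way to tell the two types apart.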
I guess my handle seems a little silly now, maybe not! On a deeper note, without a cipher yet and pending further study, I just looked up a Middle English word for "otol", which is a label for a star, an empty pipe, and a leaf. If you follow the rule, then "hihe" could work for "otol".
Code: Am 28

RE: qo and the 15.5% factor - MichelleL11 - 05-10-2020
(04-10-2020, 03:43 PM) aStobbart wrote: Just realized that there is not a single instance of 4o in f1r. The first one appears in f1v. Has anyone else noticed this before? If it stands for "th" and the language is Middle English, what are the odds it would be missing from an entire page?

If the VM plaintext had the same frequency as the Wycliffe Bible, about 16.8% of all word starts would be "th", based on Voynichgibberish's data. If words in the VM are actual words in the plaintext, then f1r, which has 210 words (according to the Takahashi transcription), would on average be expected to have about 35 words starting with "th" (i.e. starting with qo), assuming words with the Wycliffe Bible's word-start frequencies were randomly placed. I think the problem with figuring the "odds" of this happening is that any such calculation assumes the qo words are evenly distributed. A quick look at a qo search (hit the two-lines button to show the histograms) reveals that this does not seem to be the case. About 20% are in Currier A and about 80% are in Currier B. There is moderate, constant use in the "herbal" texts, little to no use in the "astrology" and "zodiac-like" pages, and then extremely high use in the remaining sections of the manuscript.
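Under the even-distribution assumption that MichelleL11 questions, the "odds" can be put into numbers with a simple binomial model: if each of f1r's 210 words independently starts with "th" with probability 0.168 (the Wycliffe Jeremiah share), the expected count is 210 × 0.168 ≈ 35, and the chance of none is (1 − 0.168)^210. A minimal Python sketch, assuming exactly that independence model:

```python
def expected_and_p_zero(n_words, p):
    """Binomial baseline: expected number of words with a given prefix, and the
    probability of seeing none, if each of n_words independently has the
    prefix with probability p. The independence assumption is exactly what the
    uneven qo histograms call into question, so this is a baseline only."""
    expected = n_words * p
    p_zero = (1 - p) ** n_words
    return expected, p_zero


# 210 words on f1r (Takahashi transcription); 0.168 = "th" share in Wycliffe Jeremiah
expected, p_zero = expected_and_p_zero(210, 0.168)
print(f"expected about {expected:.1f}, P(no th-words) about {p_zero:.1e}")
```

The probability of zero comes out on the order of 10^-17, so under independence a qo-free f1r would be essentially impossible if qo were "th". That is why the clustering of qo by section and by Currier language, rather than this number, is the real question.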
This could be argued to be related to the underlying language; related to whatever differentiates Currier A from Currier B, if that is something other than the underlying language; or perhaps, though less convincingly in my mind (assuming a constant language), related to subject matter. I find this less convincing because I don't know of any reason why the "th" sound would not start words related to certain subjects, unless perhaps the subject was dominated by loan words that do not use that sound at the beginning of words. Which could be the case . . . . But it should be noted that f1r is not unique in its complete lack of qo words. There are a total of six folios with no qo words on them: f1r (210 words), a plant folio with only 3 words, a folio with 101 words, f72v1 (102 words), f72v3 (121 words), and f72r3 (169 words). Admittedly, f1r does have the most text of all of these, although its topic is usually surmised to be some sort of introductory page, so the fact that topics introduced in such a summary somehow avoid qo there, but use it so extensively in the actual pages, does seem a bit suspicious. I realize this discussion can provide no conclusion. But, like a lot of VM questions, the answer seems to be maybe.

RE: qo and the 15.5% factor - Voynichgibberish - 05-10-2020
Dear MichelleL11,
Quote: If the VM plaintext had the same frequency as the Wycliffe Bible, about 16.8% of all word starts are "th," based on Voynichgibberish's data.
I decided to post the two-character counts for Jeremiah from the Wycliffe Bible, limited to 34,054 words. "th" (remember to add "Th") was 5,740, or 16.8% of the text. MichelleL11, the Voynich Manuscript contains between 34k and 35k vords.
Jeremiah, Wycliffe Bible:
Code: JE 1
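The thread compares Voynich vords with plain-letter texts such as Jeremiah, and Voynichgibberish raised two caveats earlier: some Voynich glyphs are written with two or more letters in EVA transcription, and the Voynich table counted distinct vords (types) rather than every running vord (tokens). A minimal Python sketch that handles both, under a stated assumption: the set `MULTI_GLYPHS` below is a hypothetical simplification, since transcribers disagree on what counts as one glyph (for example "iin"), so adjust it to your own conventions.

```python
from collections import Counter

# Hypothetical set of multi-letter EVA sequences treated as single glyphs.
MULTI_GLYPHS = ("cth", "ckh", "cph", "cfh", "ch", "sh")
_ORDERED = sorted(MULTI_GLYPHS, key=len, reverse=True)  # match longest first


def glyphs(vord):
    """Split an EVA vord into glyphs, preferring longer multi-letter glyphs."""
    out, i = [], 0
    while i < len(vord):
        for g in _ORDERED:
            if vord.startswith(g, i):
                out.append(g)
                i += len(g)
                break
        else:  # no multi-letter glyph starts here: take one letter
            out.append(vord[i])
            i += 1
    return out


def glyph_prefix_counts(vords, n=2, distinct=False):
    """Count the first n glyphs of each vord.

    distinct=False counts every running vord (tokens); distinct=True counts
    each different vord once (types). Vords with fewer than n glyphs are
    skipped."""
    if distinct:
        vords = set(vords)
    counts = Counter()
    for v in vords:
        g = glyphs(v)
        if len(g) >= n:
            counts["".join(g[:n])] += 1
    return counts
```

For example, with "ch" as one glyph the first two glyphs of "chedy" are "ch" + "e", so it is counted under "che" rather than "ch". Running the token version over a full transcription (the ~34k vords) gives counts directly comparable to the 34,000-word samples used for the other languages.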