If you look at each herbal as a unit being displayed, you could split a herbal drawing up into:
top with flowers
top leaves
branch (small)
(fruit or) seed
main branch (thick)
----
bottom base, root crown
root
For comparison with Arabic counterparts, see [link].
If you count the parts on the left and right side of each herbal, you will notice that there are balanced but also unbalanced herbals.
Sometimes you count 4 big tubers on one side and 5 on the other, 7 leaves on one side and 6 on the other, etc.
Some random picks.
f10v: 2 flowers; leaves left: 3, leaves right: 5.
f16r: 1 big flower in the middle, 2 flowers left, 2 right; on the left 3 branches with 3 leaves, on the right 4 branches with 4 leaves.
f41v: 4 branches left, 3 branches right. 2 root tubers balanced. 4 leaves left, 3 right. 1 long root thing.
There are also balanced herbals, though.
For example f33v: 3 bulbs, 3 flowers. 3 leaves left, 3 leaves right.
f16v: 1 big flower in middle, 2 left, 2 right
f48r: 3 flowers, 1 longer on the left, 5 branches left, 5 branches right, 6 green things on both sides, 3 root things balanced
By contrast, most of the Dioscorides and pseudo herbals that I've seen show a balanced drawing.
I do not have hard figures; this is based on a general impression over the last year.
If there is a (hidden) significance in showing a balanced or unbalanced herbal, and you have such information to share, please share that.
If you don't think it is relevant, please mention that together with any of your research references; that would be much appreciated.
Funny thing with the corpus regex (ace, btw, thx): Google Translate (yes, again) thinks the chol example is Welsh. A Welsh dictionary tells me "from her lap"; an old English-Welsh thesaurus tells me chol comes from the Arabic "chalel", "to expect" (I expect it means expecting a child).
Hello everyone!
I'm a herbalist from the Adriatic area and I would like to join in on the discussion of the VM MS408 on this forum. The community seems fairly active and solid, cheers.
As the name of the thread says, this is about plant ID, namely the items on folios F3r and F3v:
Asplenium billotii
The company just told me they will be at the Frankfurt manuscript fair, 11-15 October, showing off their very first handwritten copy.
Anyone nearby to take a look? I can probably bag you a chat with them.
The official presentation is in November.
I wonder how useful it would be to have an agreed set of meta characters as part of EVA.
I have seen others use capital letters to stand for groups of characters which are similar in some ways. For example [E] can be used to stand for [e, ee, eee], ignoring exactly how many [e] characters are in a row. So [okEdy] could stand for [okedy, okeedy, okeeedy]. This is useful in talking about more general and abstract word structure.
Some proposals for meta EVA:
[B]: any bench [ch, sh]
[E]: any [e] sequence [e, ee, eee]
[G]: any non-bench gallows [k, t, f, p]
[I]: any [i] sequence [i, ii, iii]
[N]: any sequence ending [n] with any number of [i], so [n, in, iin, iiin]
[R]: any sequence ending [r] with any number of [i], so [r, ir, iir]
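One way to try out such meta characters is to translate them into regular expressions. The sketch below is only an illustration, assuming lowercase EVA input and exactly the sets listed above. The names `META_EVA`, `meta_to_regex` and `matches` are made up for this example and are not part of any existing EVA tool.

```python
import re

# Each proposed meta character mapped to a regex fragment over plain EVA.
# The sets are copied from the proposal above; nothing else is assumed.
META_EVA = {
    "B": r"(?:ch|sh)",  # any bench
    "E": r"e{1,3}",     # e, ee, eee
    "G": r"[ktfp]",     # any non-bench gallows
    "I": r"i{1,3}",     # i, ii, iii
    "N": r"i{0,3}n",    # n, in, iin, iiin
    "R": r"i{0,2}r",    # r, ir, iir
}


def meta_to_regex(pattern):
    """Compile a meta-EVA pattern such as 'okEdy' into a regex object."""
    parts = [META_EVA.get(ch, re.escape(ch)) for ch in pattern]
    return re.compile("".join(parts))


def matches(pattern, word):
    """True if the whole EVA word fits the meta-EVA pattern."""
    return meta_to_regex(pattern).fullmatch(word) is not None


if __name__ == "__main__":
    for w in ["okedy", "okeedy", "okeeedy", "okdy"]:
        print(w, matches("okEdy", w))
```

Capital letters are used as meta characters here, so any transcription that already uses capitals for other purposes would need a different escape convention.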
After reading [link], I was curious to understand more about Hidden Markov Models and language analysis.
I found a reference to a 1980 paper by Cave and Neuwirth. Apparently, they experimented with several HMM configurations, mapping the symbols most likely to be produced by each state to specific phonetic properties.
A Python implementation of their experiment is available online: [link]
Most of the theory escapes me, but I have run some simple experiments with the Python software.
In order to reproduce something vaguely similar to what Reddy and Knight did, I set the number of nodes to 2. I made tests with Latin, Italian and English and the algorithm is rather consistent in assigning vowels and consonants to two different states. Since the initial parameters of the HMM (the transition probabilities between the two nodes and the probability for each node to generate each symbol) are initially randomly set, the optimization phase (Baum–Welch) can produce different results in different runs.
Here is an example of what I get for Latin (the XVI Century Matthioli herbal in [link]).
These are the results for what the python implementation calls “matrix A”: they are the probabilities with which the model passes from the current state to the next state:
Code:
       0      1
0  0.203  0.797
1  0.785  0.215
The matrix is almost symmetrical. Both states have a higher probability of passing to the other state than remaining in the current state. The optimization has configured state 0 to emit consonants and state 1 to emit vowels: in Latin consonants and vowels tend to alternate, so the model alternates between the two states.
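To make the alternation concrete, here is a small sketch that takes the matrix A reported above and works out two things: the long-run share of symbols each state produces, and how often two consecutive symbols come from different states. It uses the standard closed form for a two-state Markov chain. The function names are my own and have nothing to do with the Python software mentioned above.

```python
# Transition matrix A as reported above:
# A[i][j] = probability of moving from state i to state j.
A = [
    [0.203, 0.797],
    [0.785, 0.215],
]


def stationary_two_state(a):
    """Long-run fraction of steps spent in each state of a 2-state chain."""
    leave0 = a[0][1]  # probability of leaving state 0
    leave1 = a[1][0]  # probability of leaving state 1
    total = leave0 + leave1
    return [leave1 / total, leave0 / total]


def alternation_rate(a):
    """Long-run probability that consecutive symbols come from different states."""
    pi = stationary_two_state(a)
    return pi[0] * a[0][1] + pi[1] * a[1][0]


pi = stationary_two_state(A)
rate = alternation_rate(A)

if __name__ == "__main__":
    print("stationary distribution:", [round(p, 3) for p in pi])
    print("alternation rate:", round(rate, 3))
```

With these numbers both states get close to half of the symbols, and roughly four out of five transitions switch state, which is the consonant/vowel alternation described above.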
The color diagram above is the visual representation of Matrix B (the probability of emission of each symbol for each state) as produced by the python script. The top row corresponds to state 0, the bottom row corresponds to state 1. It should be clear that the two are complementary and that consonants are only emitted in state 0, while vowels are only emitted in state 1. It is also interesting to observe that space is the only symbol that is likely to be emitted by both states: this is because Latin words tend to end both with consonants and vowels: the probability of a consonant ending is higher, so the probability of state 1 emitting space is also higher. A typical pattern might be:
Code:
State1  State0     State1  State0     State1  State0     State1
SPACE   Consonant  Vowel   Consonant  Vowel   Consonant  SPACE
I have drawn by hand the black and white diagram, in order to summarize the configuration produced by the optimization algorithm. Symbols are sorted by decreasing emission probability. The 0-0 and 1-1 loops correspond to the generation of consonant-consonant and vowel-vowel digraphs.
I decided to go after otol otol. In essence, I wanted to see what affixes otol could have; and whether those affixes serve with other possible functors ([link]).
I used my regex parser to [link]. This tool is in beta, feel free to use it but don't rely on me not changing it.
47 Unique values found
Prefixes are (ignoring the damned !) (count is one unless specified - count includes duplicates with distinct suffixes):
p, q (59), che (2), ch (10), sh (4), ksh (and one t! on f80r.P.28 which I will ignore)
Suffixes are (again ignoring the damned !) (count is one unless specified - count includes duplicates with distinct prefixes):
om, s, o, chey, osheey, dy (12), dyl, y (3), cheo, ol (2), oaiin, ches, dos, or (2), chcthy, olees, am (2), oaram, chd, aiino, dyl (2), fcho, arol, sar, ky, chy, chd
It appears 74 times with a prefix ignoring suffix. 62 times with a suffix ignoring prefix.
18 times with a prefix and NO suffix. 49 times with a suffix and NO prefix.
13 times with a prefix and a suffix. 84 times by itself.
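For anyone who wants to redo this kind of tally on their own transcription, here is a minimal sketch of the counting step. It is not the regex parser mentioned above. The word list is a made-up toy example, and the function names are hypothetical. It splits each word at the first occurrence of the stem and counts what sits before and after it.

```python
import re
from collections import Counter


def split_around_stem(word, stem):
    """Return (prefix, suffix) around the first occurrence of stem, or None."""
    m = re.fullmatch(rf"(.*?){re.escape(stem)}(.*)", word)
    return (m.group(1), m.group(2)) if m else None


def tally_affixes(words, stem):
    """Count prefixes, suffixes and the overall shape of every word holding stem."""
    prefixes, suffixes, shapes = Counter(), Counter(), Counter()
    for word in words:
        parts = split_around_stem(word, stem)
        if parts is None:
            continue  # word does not contain the stem at all
        pre, suf = parts
        if pre:
            prefixes[pre] += 1
        if suf:
            suffixes[suf] += 1
        if pre and suf:
            shapes["prefix+suffix"] += 1
        elif pre:
            shapes["prefix only"] += 1
        elif suf:
            shapes["suffix only"] += 1
        else:
            shapes["bare"] += 1
    return prefixes, suffixes, shapes


# Toy word list for illustration only; not real corpus counts.
words = ["otol", "qotol", "otoldy", "qotoldy", "chotol", "otol", "daiin"]
prefixes, suffixes, shapes = tally_affixes(words, "otol")

if __name__ == "__main__":
    print(prefixes)
    print(suffixes)
    print(shapes)
```

Splitting at the first occurrence is a design choice; a word that contains the stem twice would need a separate rule.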
So, taking the numbers above, we can postulate that otol is a function word; and furthermore, that it takes modifications with the affixes. This is logical, because it can appear either by itself, or in conjunction with affixes - they are obviously modifying the core word, otherwise, what is their function?
And yes, otol may very well be a fusion of ot/ol, except for the fact that it appears primarily as a single entity; and both ot and ol are affixes which do not run concurrently in the corpus (according to a very quick visual examination by me). OK - so this is circular logic. But we have to draw the line somewhere. So the evidence points towards otol being a separate word.
I run a search for the prefixes in the corpus to see whether they are attaching themselves to other words.
They all appear thousands of times as prefixes. I'm not counting them.
I repeat the search for the suffixes in the corpus.
Again, they are all popular as suffixes.
But what is even more interesting is that the suffixes appear, at first glance, to have an order. Ch can be joined on with /ey,/eo,/es,/ct,/d, etc. It can then have further suffixes plugged in, so we see chcthy ch/ct/ch/y. Etc, etc. Prefixes don't do this as much - we get infrequent versions of them, but not such long constructs as the suffixes.
I take another word. Chol.
Chol is a more popular word, and has 165 distinct forms in the corpus. All of the affixes of otol appear in the list, plus quite a few more. Here is the [link].
So to sum up: we have a popular word (otol) which survives perfectly well by itself. Sometimes it has a prefix; sometimes a suffix. The appendages can be tagged onto one another, in a rhythm. These appendages appear tagged onto another even more popular word (chol).
It's a bold thought but - I'm going to suggest that what we see here is evidence of a fusional grammar, where we get a content-word which is modified by prefixes and suffixes to give context to the original content-word.
I'm not just talking about declensions; this is evidence of a strongly synthetic language. I take as an example Spanish, with the perfectly natural word grabándomelo. You take the gerund of to record (grabar), add "me" and "it" (me, lo), and end up with this fused word. It's not exactly the same process as we're seeing here - there are no prefixes and the Voynich verb, if it is a verb, doesn't seem to decline - but it's a similar process.
The next step is to build up a list of the affixes and see if we can develop a comprehensive index to them.
Is there already a thread discussing glyph shape similarities between the different Voynich glyphs?
Something like a relationship diagram or periodic table of glyphs.
I know about a blog post from
[--have to search for the link--]
which classifies glyphs into three types (c \ and E ). This seems to be a good beginning for relationships/similarities of glyphs.
I think the first step in reading Voynich is to classify the glyphs again before transcribing them very precisely:
- Pen stroke order
- classification of which part is written by which scribe, and their personal differences
- making a really big dictionary with examples of the glyphs and ligatures (the atoms of the script)
After that it is possible to construct a metric which defines similarity on glyphs/ligatures.
Perhaps with this toolset we can better correct transcription errors in a computational way.
When this classification is done, the next step would be to analyze the things which seem to be the syllables. There is a blog post by JKP which discusses Janus Pairs. I think they are good candidates for syllables.
[link]
Not looking for exact matches in the transcription but for similar words, defined by these metrics: the Levy-Metric, but modified to take into account that the transcription may have some errors.
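As a rough sketch of what such a modified metric could look like: assuming the "Levy-Metric" means Levenshtein edit distance, one option is to make substitutions between easily confused glyphs cheaper than other substitutions. The confusable pairs and their costs below are invented placeholders, not measured values, and the function name is hypothetical.

```python
def weighted_edit_distance(a, b, sub_cost=None, indel=1.0):
    """Levenshtein distance where some substitutions are cheaper than others.

    a, b     : sequences of glyphs (strings, or lists of glyph tokens)
    sub_cost : dict mapping frozenset({x, y}) to the cost of swapping x and y;
               pairs not listed cost 1.0
    indel    : cost of inserting or deleting one glyph
    """
    sub_cost = sub_cost or {}
    # prev[j] = distance between the processed part of a and b[:j]
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ga in enumerate(a, start=1):
        cur = [i * indel]
        for j, gb in enumerate(b, start=1):
            if ga == gb:
                swap = 0.0
            else:
                swap = sub_cost.get(frozenset((ga, gb)), 1.0)
            cur.append(min(prev[j] + indel,      # delete ga
                           cur[j - 1] + indel,   # insert gb
                           prev[j - 1] + swap))  # substitute or match
        prev = cur
    return prev[-1]


# Placeholder costs for glyph pairs a transcriber might confuse (assumption).
CONFUSABLE = {
    frozenset(("r", "s")): 0.3,
    frozenset(("k", "t")): 0.5,
}

if __name__ == "__main__":
    print(weighted_edit_distance("dar", "das", CONFUSABLE))
    print(weighted_edit_distance("dar", "dal", CONFUSABLE))
```

Because the function accepts any sequence, words can be passed as lists of glyph tokens (for example `["ch", "o", "l"]`) so that multi-letter EVA glyphs count as one unit.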
==================
The Voynich script still reminds me of Indic script systems. Only a little flavour, but perhaps there are some rules which can also help to see more patterns in Voynichese?
For comparing the scripts of the world, their glyph shapes and so on, this website seems to be a really good reference:
[link]
Posted by: Koen G - 21-09-2017, 09:30 PM - Forum: Voynich Talk
These parts from our talk with Stephen Bax are about his thoughts on the MS and its study in general (i.e. not specifically focused on his 2014 paper).
[David] Part of the problem with the VM is that it *looks* common enough for people to start trying to identify it, and then build it up on top of their own experiences. But of course, as you say, all we have to do is completely eradicate our own cultural references and start afresh with it, building it up without any preconceptions.
[Stephen] That's what I call the pick and mix: you pick the thing that you identify, and too many people email me, literally every week, saying: "I hope I am helping you, and I think this letter looks like a particular letter in Korean." Okay, you picked something, but then the danger is that people stick with that identification: it must be Korean, or it must be Icelandic, etc. And then they try to force home that conclusion, instead of just taking the next step and being flexible: "that letter looks like Icelandic, but the next one doesn't, what am I gonna do about that?" You got to be very critical of your own position first of all. That kind of skepticism is the only way we're going to get forward.
[Koen] One thing that contributes to that as well is that actually the script is really easy to read. So it's accessible for people who normally can't access such manuscripts. For example if you read a manuscript in a cursive script, you can't even see what's written there if you have zero experience with manuscripts. But in the VM you can easily distinguish the separate glyphs and tell them apart. Do you think that might also tell us something about the people who made this? Because the glyphs do look similar to Latin glyphs to the extent that people think they must have been familiar with them.
[Stephen] Enough of them do look similar to Latin, to make people think: ah, there's a Latin element to it. But yet again there are enough which are unique and so completely different and have a Caucasian or Oriental feel, to make us question that as well. My personal theory - completely in the world of speculation - is that the MS is written quite confidently and consistently - obviously in different hands. But it looks to me like this was not the first MS which was written using this script. Cause if you've ever tried to make up a script, you start doing things, you change it, you scratch it out... and there's not much evidence of that in the MS, so it seems to me that the script is probably not in the first stage of the development in the VM, probably the second stage of development.
[Koen] So you want to study it as a product of a culture?
[Stephen] That's the way I see it, the way that it's written is confident. Take a single page of the Voynich, one of the herbal pages. You'll see it's fairly confidently written and quite neatly written as if somebody is quite fluent in the production of that text. It's not for example so laboured that it must be a copy, with every single letter copied laboriously; there's an element of fluency within the script. That makes me think it's possibly a second level of development of the script, which they are using to produce a compendium of knowledge for a particular group of people.
One theory that I haven't written out in great detail, about the page which I call the alphabet page, folio 57. There's a circle of letters. What I'm quite interested in, is that some of the most frequent symbols in the VM are not within that circle! One possibility which has not been thought about elsewhere, is that what has been written down might simply be the consonants only. That's exactly what you would do with an abjad in Arabic language for example. You don't write the vowels down when you write the alphabet. There is the glyph which I have interpreted as "a", but this is a glottal stop, considered a consonant in Arabic. This is one possibility, which again makes me think that this was a script in some development.
[David who wasn't paying attention] So you think it was an invented language?
[Stephen] No! An invented script!
[David] Sorry, I thought of that as I said it!
[Stephen] It's a very important difference because I obviously think it is an invented script, there can be no doubt about that. But then, all scripts are invented. If you think of Glagolitic script or Rongorongo, somebody sat down and developed this script, as for example with Armenian. A monk developed the script for the Armenians. The language had been existing for hundreds of years, but he invented a script to encode that language. That is how I see the VM script coming into existence. I would still hope to find that the language underneath the invented script is something we can identify.
[Koen] Would you agree that the person or persons who invented this script were familiar with other scripts?
[Stephen] Yes, absolutely. That's the case with Armenian. The monk who invented the script borrowed from other scripts, but he then made it work for the Armenian language features.
[Koen] So the reason to develop the new script was linguistic, because they thought this language is different, so we need to add new symbols or change them?
[Stephen] Yes, it can be linguistic, but it can also have to do with pride, or the idea that you want your own - we are a proud nation of Armenians, we would like our own script for ourselves, then it's taken up by a few monks and scholars and the people more broadly say, for their national identity: "this is going to be our script". And they teach it in schools and so on.
For me, the way to understand the VM is that it was developed for a particular group of people, for, as you say, particular language needs. But it could also be reasons of pride or secrecy. They may have been a persecuted group for some reason, and they decided to encode their own language in a way that was just for them. There are a lot of possible motivations. But as I see it in historical terms, the next stage didn't happen. The people who were using it died out or it was simply never taken on. Or, another thing, it could be that the script was so faulty that they decided to revert to Latin script.
[Koen] Or that their documents stayed under the radar and are now gone, and that the VM is the only survivor that we have.
[Stephen] It could be that, and it would be great if one day we find another sample of Voynich script.
[Koen] With a manual attached to it.
[Stephen] We're dreaming!
[David] A Rosetta Stone! But, when we look at the actual glyphs: they are very positionally aware, and you picked up on EVA [k], which you've also given the sound value of /k/, and sometimes it appears in front of the word [oror] and sometimes it doesn't. How would that fit into your reading of the word "juniper" as a noun? What's that glyph doing - is it modifying the noun in some way?
[Stephen] We don't know. We're still grappling with the gallows characters, trying to understand what exactly is going on, because they very often appear as the first character in a page, but they very often appear down the page within another word as well. The first line tends to have a proliferation of them, as if they add some kind of decorative element to that line. If that's true, then that implies that they are sometimes written as gallows, but further down they are written as a different character.
To be honest, we still don't know what these characters are doing. But the identification of [oror] as the word perhaps meaning /arar/ for "juniper" is more than that. It's looking at the distribution of the word throughout the MS, particularly on the folio where I've identified the possible juniper plant. The final /r/ seems to be differentiated when it appears at the end of a line. That causes some confusion since it seems that there are lots of r's in the script, but we'll come to that in a minute.
Occasionally there's a glyph in front of [oror], which could be... I mean, the starting glyph in some of the pages could just be a character indicating the beginning of the page, or a discourse marker or writing marker. But we can't be sure about that, the jury is out on that one.
[Koen] You said that the sound represented by gallows could be represented by a different glyph later on. Would that make them similar to the way we use capitals at the beginning of a sentence?
[Stephen] It could be, although it doesn't seem like a very consistent thing. If you look at the way they are used. Also, on the alphabet page, they seem to be there in the list of letters, consistently as a single letter. So I can't really throw much light on that. In the Roman alphabet, the k-symbol is quite tall, so it wouldn't be entirely surprising if this also represented a /k/. But again, that's speculation.
..............................
[David] One thing people often mention is the line as a functional unit (LAAFU) which was first mentioned in the 70's by Currier. So you'd say that's symptomatic of what we've just been discussing?
[Stephen] It's quite interesting this LAAFU. Yeah, you could say that. If you see for example at the end of lines there are flourishes in terms of the script. But again you got to be careful with that. The line might be a functional unit in terms of the *script*, but that doesn't mean it's necessarily a functional unit in terms of the meaning underlying it. It could be a decorative element, saying that basically it's just a paragraph of text. But in order to decorate it at the end of each line, they make some kind of flourish. So it could be that when we look at it, we say "oh, this line is a functional unit", but if we then assume it's a functional unit in terms of meaning, we're making a mistake.
[David] That's a problem with the statistical analysis of the text, we get so tied up with saying "this glyph appeared here", we don't consider it in the way that you're formulating.
[Stephen] Right, the statistical analysis of the text - people pay a lot of heed to it. For example, it seems to follow Zipf's law and this kind of thing. But until we actually understand what each of these symbols means and is, we can't really transliterate it properly. There's the example of the minims: in Latin script you have a slash for the letter "i", but you then can put two of them together for the letter "n", and three for "m", without joining them as we do now. So you then have three slashes representing one sound /m/. So how many letters do we actually have? When we see in the VM three minims, how many letters do we have? Until we know that, we can't really transliterate it accurately in a form that will allow computers' statistical analysis on the text.
[Koen] About LAAFU, I've been reading the work of Emma May Smith, she's also a linguist Voynich researcher. She says that sandhi effects might be at work there, which is for example when you're speaking and you link two words by inserting an extra sound. That the person who wrote the VM was uncertain about how to write his language onto the paper and would insert sandhi effects only at the end of the line. Do you think something like that, some linguistic reality might be at the basis of the LAAFU? Or is it purely a script thing?
[Stephen] It's perfectly possible. Let's take for example Turkish. It's agglutinative, so that means you take a word, add on lots of pieces and end up with a very long word. But you could equally write it out as several separate words. Now we don't know the nature of the underlying VM language, and we don't know how they decided to encode it. And it might be that they separated things we would write together or the other way around. Again, we are in very early stages in terms of trying to understand what this script is actually doing. But I think we have to approach it with that kind of sensitivity, to how languages are encoded in the script. In the example you just gave, there is an element of variation which we have to expect. I think that means a job of at least ten years trying to unpick what's going on.
[Koen] Especially if the writing culture was in development, we might expect even more of these irregularities of spoken language.
[Stephen] Exactly. And another hint is that if the writing system did not catch on, it could be that it's a little bit defective in representing the underlying language, and they said "well actually, let's go over to another script instead". And therefore the script was abandoned. That adds another problem if we want to decode it, because if it's a really poor, defective script in terms of representing the underlying language, then it's a huge difficulty to decode it.
[Koen] It must have worked to some extent, because they wrote a few hundred pages!
[Stephen] Absolutely, you're right. And also, if Currier's right, and I think he is right, that there is more than one hand writing, then obviously more than one person was involved and therefore that involves an element of success.
[David] Currier also suggested that there were different dialects within the MS, the famous Currier languages. Is that something you've seen evidence of?
[Stephen] Well, again I think that needs to be grounded a bit more in linguistic reality. First of all, there's no reason to say that they need to be different *dialects*, it could simply be script variation. For example, William Shakespeare, we have multiple examples of his signature and each one spells his own name differently, as a mature literate person! So there's no reason for thinking that the Currier A scribe wasn't writing exactly the same language as the Currier B scribe. But there was enough variation, especially since it was a newly developed script, that they wrote things slightly differently, each one consistent to themselves. When Currier called them "language A" and "language B" I think that really confused a lot of people. I think you can interpret what you see in the MS as simply script variation.
[Koen] Currier did express himself carefully, but of course these terms take on a life of their own.
[Stephen] I think you're right. But he did say language A and language B, which caused a lot of confusion. I would say "script version A" and "script version B", that's the furthest I would go. As we said earlier on, David, I would not invoke any language, speak in terms of script, until we get a better idea of the language beneath it.
[David] Yeah, I was using his terminology. But to take a slightly different angle on this, a lot of the words appear to be made up of stems with prefixes and suffixes. What do you think of reuse of a stem like [daiin]? It's not something we're used to when looking at natural languages.
[Stephen] Well, is that true? I mean, you look at English words and very often they are made up of different parts. Just look at the word "catalog". You've got the word "cat", "a" and "log", different parts that make up a word itself. This was a criticism I think that came from Pelling's thing as well. And the idea is that for example the word "catalog" can't be a word because the word "cat" exists in the language already. This just seems to me... illogical.
I don't myself, as a linguist, see this as an issue. You could have the word that I read as "kentaur" as a perfectly sensible word, and then the word "daor" or whatever as a word that's used somewhere else in the language very frequently. That is not a particular mystery, languages do that a lot.
Your point about it looking a bit odd to English speakers, that's also true. But a language like Arabic has a lot of cases where a root form, made up of three letters, gets additions and subtractions around it, and sometimes you get something in the middle of the root as well. So I don't see much of a problem about particular roots, stems, additions etc. I wouldn't take this as a reason to say: no, this is not a real language, it must be something else. I think it's just another intriguing problem that we need to work through systematically.
[Koen] Do you think it might have to do something with tonal languages? There you get a lot of homonyms - to us - which they pronounce on a different tone.
[Stephen] Not really, the difficulty with that is that they would have to signal somewhere that this particular form of the word "ma" is different from the form "má". The VM doesn't seem to signal this in any way, which means it would be impossible to read it. I wouldn't myself think in terms of tonal languages.
[David] What about repeating words? Quite often you get words that are repeated or very similar. And then there's Timm's Pairs, with one word above and then a very similar word underneath. Timm himself argues that this shows that the whole thing was a nonsense script. Is this something we see in natural languages?
[Stephen] There could be elements of literary embellishments for example, but again we don't know. I haven't entirely gone away from the idea that there might be a poetic element to it. Coming back to LAAFU, there are examples of even herbals written in poetry. There are aspects of the MS where you'd like to say they include some kind of artistic elaboration. We might agree that the illustrations are not very fine artistically but there are elements, some of the letters are embellished in interesting ways... and you could argue that underneath it there is an embellished language which has a poetic dimension to it, which could account for some of the repetitions.
[Koen] We've just been discussing this on the forum and there are also languages which in their structure use a lot of reduplication, for example for forming plurals, for reinforcing, so you'd have to know a lot about linguistics to be aware of all the possibilities.
[Stephen] Yes, but I would agree with the people who raise the issue, that it's still a live issue, because there are some cases where it seems to be four or five repetitions of the same or nearly the same word, which is a curiosity. It's an interesting issue, but for me, I'd park it and see if we can explain it at a later stage.
[David] When you're looking at the MS, are you using the EVA transcriptions or are you working directly from the page?
[Stephen] I do use the transcriptions, and I credit the people who developed them because they are a great help. But I do tend now, actually, to look more directly at the script itself, because I'm slightly worried that we might be missing things in the transliteration. It's very useful to have the original text and the transliteration side by side and look at both of them with caution. But the people who transcribed it did an incredible job!
[Koen] Do you actually still study the MS often?
[Stephen] Yes, I work less on it since I'm still in full-time employment doing different things, and I'm also recovering from some health issues. I'm not so lively in terms of putting things on the net as I used to be, but I still follow it keenly. I follow also your own site, actually (Voynich.ninja forum), but I haven't ever posted anything on there. One thing I'm quite keen to do is to avoid Voynich flaming. In a sense we all know that there's a danger of trivia overtaking, and we've all got other things to do. So I tend to avoid that kind of discussion where it's too nitty-gritty. But I do read them and follow them with great interest.
[Koen] My impression is that many people have gotten to the point where they discuss each other more than the actual MS. That's what you want to avoid, basically.
[Stephen] I think so, and I mean, what I would hope to do is to sometimes have a block of time where I'd be able to sit down and work on the approach to the MS which I see is the most fruitful one, and to try and take some things further in that direction. What this MS needs is some period of careful, detailed analysis, working through the possibilities. That's how I got to my 2014 paper. During 6 to 8 months I managed to devote a lot of time, even some weekends to it. And a lot of library time in the British Library with manuscripts and other documents. And if I'd be able to reproduce that, I might be able to get some progress that satisfied my own curiosity in the MS a bit more.