ReneZ > 11-02-2018, 01:50 PM
MarcoP > 11-02-2018, 02:59 PM
(11-02-2018, 12:44 PM)davidjackson Wrote: Quote:MarcoP: David, what do you mean when you say that the ratio of creation is "constant"? Do you mean that the number of unique words increases linearly with text length?
Here I was referring to Torsten's text. Words in his corpus were created at a linear rate, whereas the Heaps' law line is a power-law curve, and hence the two diverge.
If you look at the graph from the stars section you see how the creation rate rises and falls (something which I assume without proof is due to the introduction of new topics), but it still follows the power-law curve.
==> rnd8.txt <==
fbba bgfc bhgc adba bced cded dfhc aceh egag fgaa ahge ecde ebge hhed egab edah heff efaf fddg hbhc
hhhh aace eegd bcag bcaf egbc fcfh accg cggh gfdh hdgg hcfd eede ccbg cceh aabd eheh bfeg hcdg dgbh
==> rnd12.txt <==
gjbl dglh jgkc hhke cjha babg idih ejge egjf ffef klhj fbgk jldh aajc fljj lcie ljga hleh lhda jhag
bjal lbde jeha lhfd efif afah iflj hfae lfel bklj bkca dlbk eleb bdfe ebck lehi figk ldag ahie ehga
==> rnd26.txt <==
lrdb revy asdn ajsn zhmy ajrs glve lhkr pywn kdoj jijz asbs xuku cpdk vsvz uwyw aacf okdy pgxa hsik
rlvl uagf qmrr rlmi qdmf zked luvj zhcz mhoa pqgw praj icxo wzdb dbie anmv dytl dvul vkea mmxh bxds
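Judging by the samples above, the three files look like four-letter words drawn from alphabets of 8, 12 and 26 letters. As a rough sketch of how such text behaves when you count distinct words against text length, the Python below generates comparable random text and tracks the running vocabulary size. The uniform sampling, the fixed word length of 4, the seed, and the function names `random_words` and `vocabulary_growth` are all assumptions for illustration; this is not a description of how the rnd files were actually produced.

```python
import random
import string


def random_words(alphabet_size, n_words, word_len=4, seed=0):
    """Return n_words random words over the first alphabet_size letters.

    Assumption: letters are drawn uniformly and independently.
    """
    rng = random.Random(seed)
    letters = string.ascii_lowercase[:alphabet_size]
    return [
        "".join(rng.choice(letters) for _ in range(word_len))
        for _ in range(n_words)
    ]


def vocabulary_growth(words):
    """Return the number of distinct words seen after each token."""
    seen = set()
    growth = []
    for word in words:
        seen.add(word)
        growth.append(len(seen))
    return growth


if __name__ == "__main__":
    for k in (8, 12, 26):
        growth = vocabulary_growth(random_words(k, 10000))
        # Columns: alphabet size, number of possible words, distinct words seen
        print(k, k ** 4, growth[-1])
```

With 8 letters only 8^4 = 4096 different words are possible, so the curve has to flatten out; with 26 letters there are 456976 possible words, so over a few thousand tokens almost every word is new and the growth stays close to linear.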
Torsten > 11-02-2018, 08:14 PM
(11-02-2018, 02:59 PM)MarcoP Wrote: Now, if Torsten's algorithm produces something close to f(x)=0.5*x, it could be that it copies half of the words from the already generated text (and these words obviously do not add to the Heaps total of unique words) and randomly generates the other words (similarly to the blue line representing the rnd26 file). While copying already generated words is an excellent way to approximate Heaps' law, one must also find a way to appropriately constrain the generation of new words.
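To make the f(x)=0.5*x scenario concrete, here is a toy sketch in Python. It is not Torsten's algorithm: it only models the hypothesis in the quote, i.e. each new word is either copied from the text so far or freshly generated at random. The copy probability of 0.5, the 8-letter random words (chosen so that accidental repeats are negligible), the seed, and the name `copy_or_create` are assumptions made for the example.

```python
import random
import string


def copy_or_create(n_words, copy_prob=0.5, word_len=8, seed=0):
    """Generate text by copying an earlier word or inventing a new one.

    Returns the text and the running count of distinct words.
    Assumption: copies are picked uniformly from all earlier tokens.
    """
    rng = random.Random(seed)
    text = []
    seen = set()
    unique_counts = []
    for _ in range(n_words):
        if text and rng.random() < copy_prob:
            # Copied words never add to the count of unique words.
            word = rng.choice(text)
        else:
            # Unconstrained random word, comparable to the rnd26 case.
            word = "".join(
                rng.choice(string.ascii_lowercase) for _ in range(word_len)
            )
        text.append(word)
        seen.add(word)
        unique_counts.append(len(seen))
    return text, unique_counts


if __name__ == "__main__":
    text, unique_counts = copy_or_create(20000)
    for n in (1000, 5000, 20000):
        print(n, unique_counts[n - 1], unique_counts[n - 1] / n)
```

In this toy model the ratio of unique words to text length stays near the share of newly generated words, so the curve is a straight line, not a power law. Bending it toward a Heaps-like curve would require the share of new words to drop as the text grows, which is the constraint the quote asks about.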
davidjackson > 12-02-2018, 07:05 PM
Quote:ReneZ: To characterise the text in this manner, what one could do is take a group of words (say 1000) anywhere in the MS, and then inspect the following words (a smaller group, e.g. 100) to see how many of them occur in the previous 1000 and how many are new. One can do this in two different ways, either counting all words of the 100, or only the unique words of the 100.
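The windowed count described in this quote can be sketched in a few lines of Python. The block sizes of 1000 and 100 come from the quote; the function names `novelty` and `novelty_profile`, the step size, and the assumption that the text is already available as a list of word tokens are mine. Both counting methods are reported: all words of the following block, and only its unique words.

```python
def novelty(words, start, context=1000, probe=100):
    """Compare a probe block with the context block that precedes it.

    words: list of word tokens (assumed to be pre-tokenised).
    Returns counts for both methods: per token and per unique word.
    """
    context_vocab = set(words[start:start + context])
    following = words[start + context:start + context + probe]
    following_types = set(following)
    return {
        "tokens": len(following),
        "new_tokens": sum(1 for w in following if w not in context_vocab),
        "types": len(following_types),
        "new_types": len(following_types - context_vocab),
    }


def novelty_profile(words, context=1000, probe=100, step=100):
    """Slide the context/probe pair through the text in steps of `step`."""
    last_start = len(words) - context - probe
    return [
        novelty(words, start, context, probe)
        for start in range(0, last_start + 1, step)
    ]
```

Calling `novelty_profile(words)` on a transcription gives one result per window position; a run of positions with a high share of new words would be a candidate for the kind of burst discussed in this thread.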
Quote:MarcoP - While copying already generated words is an excellent way to approximate Heaps' law, one must also find a way to appropriately constrain the generation of new words.
Indeed. One must of course assume that power laws were unknown to the scribes, and so such a constraint must make sense for the imagined mindset of the scribe. More to the point - why constrain the text in this way? It makes no sense within our imagined knowledge of the time when the text was written.
Quote: The generated text only contains words copied from each other.
-JKP- > 12-02-2018, 09:47 PM
Quote:davidjackson: This could be a way to identify "subjects" within the corpus. Sudden bursts of new words would suggest that a topic change has been introduced - something the larger scale graphs that I have posted above seem to show.
Torsten > 13-02-2018, 12:57 AM
(12-02-2018, 07:05 PM)davidjackson Wrote: Torsten - this certainly appears to be true. However, I feel that your theory, which otherwise appears to account for much of the way the text was written, fails to explain the grammatical structure which appears to constrain Voynichese (see Stolfi's paradigm). What exactly are these structures, and why do they apply throughout the manuscript? Because if you could link those structures into your generation algorithm, we would be some way closer to proving your theory. And why was the text created in such a way that unique words appear to be introduced in a topic-like way? (OK, I don't expect you to answer that second question!)
Wladimir D > 13-02-2018, 01:51 PM
Quote:ReneZ: To characterise the text in this manner, what one could do is take a group of words (say 1000) anywhere in the MS, and then inspect the following words (a smaller group, e.g. 100) to see how many of them occur in the previous 1000 and how many are new. One can do this in two different ways, either counting all words of the 100, or only the unique words of the 100.
Koen G > 13-02-2018, 03:02 PM
Anton > 13-02-2018, 10:43 PM
Quote:For example - the stars are not stars, but symbols of flowers.
-JKP- > 14-02-2018, 01:23 AM