The Voynich Ninja


I really like how Obelus' 2d histograms make Patrick's method even more readable. From these examples, the left-to-right shift in the behaviour of 'qo' is apparent. It is also clear that there are strong line-effects at the end of lines. Vertically, the last lines of paragraphs also appear to behave differently. The relative rarities of 'qo' in the last row and in the last column appear to add up in the bottom-right cell, which shows a spike in the o/qo distribution in the last plot. This could be largely due to the very last word of paragraphs, where I count 202 o-words and only 35 qo-words (ratio 5.8 vs an average of 1.4).
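For anyone wanting to check counts like these, here is a minimal sketch. It assumes a hypothetical data layout in which each paragraph is a list of lines and each line is a list of vord strings from some transcription; the function name is also hypothetical. It tallies paragraph-final vords beginning with [o] versus [qo]. With the counts above, 202 / 35 ≈ 5.77, which rounds to 5.8.

```python
def final_word_ratio(paragraphs):
    """Count paragraph-final vords beginning with 'o' vs 'qo'.

    Hypothetical layout: paragraph = list of lines, line = list of vords.
    Returns (o_count, qo_count, o/qo ratio).
    """
    o_count = qo_count = 0
    for para in paragraphs:
        last = para[-1][-1]  # last vord of the last line
        if last.startswith("qo"):
            qo_count += 1
        elif last.startswith("o"):
            o_count += 1
    ratio = o_count / qo_count if qo_count else float("inf")
    return o_count, qo_count, ratio
```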

Once again, Voynichese patterns appear to be soft preferences rather than hard rules. Though qo- at the end of paragraphs appears to be avoided, there are still enough "exceptions" that one has to admit that the occurrence of qo- in that position is a valid option.

Just pitching in, as a statistically challenged person, to say that I find Obelus' method of representation much more intuitive to comprehend. I'm following the discussion with interest.

(01-09-2021, 12:48 PM) obelus Wrote: No specific question is addressed by these particular graphics, but the general form may be a natural way of representing paragraph-level structure. Binning artifacts discussed earlier in the thread are naturally still present.

Belated thanks to Obelus for the idea and demonstration of using this kind of display for downward/rightward statistics. I agree that it's a big improvement over the approaches I was trying before, and I think we can get rid of those binning artifacts as well, if we want to, by following steps something like this:

(1) Assign a value to each successive point in each line (e.g. each vord): "no" (the point doesn't meet the criterion) = 0, "yes" = the total count of points in the line.*
(2) Interpolate to expand the series to 500 points in width.
(3) Stack the results vertically by paragraph.
(4) Interpolate to expand the result to 250 points in height.
(5) Overlay the results for all paragraphs and sum them.
(6) Rescale the summed values to the range 0-255.
(7) Export as an image.

*The idea in assigning higher values to points in lines that contain more points is to help compensate for the smaller space they occupy in the final image. There's likely a better way to handle this.
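Steps 1-7 can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the code behind the attached images: each paragraph is assumed to be a list of lines and each line a list of vord strings; steps 2 and 4 use linear interpolation (i.e. a first-order spline); and step 7 writes a plain-text PGM image, which is simply a convenient format choice. All function names are hypothetical.

```python
def resample(seq, m):
    """Linearly interpolate a numeric sequence to m evenly spaced points."""
    n = len(seq)
    if n == 1 or m == 1:
        return [float(seq[0])] * m
    out = []
    for i in range(m):
        x = i * (n - 1) / (m - 1)  # position within the original series
        lo = int(x)
        hi = min(lo + 1, n - 1)
        frac = x - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out

def resample_rows(rows, m):
    """Interpolate a stack of equal-length rows vertically to m rows."""
    cols = [resample(list(col), m) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def positional_map(paragraphs, matches, width=500, height=250):
    """Steps 1-5: per-paragraph grids, expanded and summed."""
    total = [[0.0] * width for _ in range(height)]
    for para in paragraphs:
        rows = []
        for line in para:
            n = len(line)
            # Step 1: 0 for "no", line length for "yes" (the footnoted weighting).
            values = [n if matches(v) else 0 for v in line]
            rows.append(resample(values, width))  # step 2
        # Step 3: `rows` is the paragraph stacked line by line.
        grid = resample_rows(rows, height)  # step 4
        for y in range(height):  # step 5: overlay and sum
            for x in range(width):
                total[y][x] += grid[y][x]
    return total

def to_8bit(grid):
    """Step 6: rescale to integers 0-255 relative to the maximum."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    return [[round(v / peak * 255) for v in row] for row in grid]

def to_pgm(pixels):
    """Step 7: serialize as a plain (ASCII) PGM grayscale image."""
    h, w = len(pixels), len(pixels[0])
    lines = ["P2", f"{w} {h}", "255"]
    lines += [" ".join(str(p) for p in row) for row in pixels]
    return "\n".join(lines) + "\n"
```

For example, `to_pgm(to_8bit(positional_map(paras, lambda v: v.startswith("qo"))))` would give a 500×250 image for vords beginning [qo], which could be saved to a `.pgm` file.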

What I've described so far would give us a grayscale image, but we can use color to add further nuance, for example:

(8) Carry out steps 1-7 separately for Currier A and Currier B.
(9) Assign the result for Currier A to one RGB color channel, the result for Currier B to another color channel, and the sum of the values for Currier A and Currier B to the remaining color channel.
(10) Rescale the summed values to the range 0-1.
(11) Adjust the Currier A and Currier B channels (but not the "sum" channel) from linear to sRGB encoding -- this makes the contrast between them stand out more sharply.
(12) Multiply all values by 255 and export as an image.
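A sketch of steps 8-12, assuming two summed but not yet rescaled grids of equal size (one for Currier A, one for Currier B), stored as lists of rows. Two details here are assumptions rather than anything stated in the steps: in step 10 every channel is divided by the peak of the sum channel, so all three share one scale; and the channel order (B in red, sum in green, A in blue) is an arbitrary choice. The linear-to-sRGB function is the standard sRGB transfer curve. Output is a plain-text PPM image; function names are hypothetical.

```python
def linear_to_srgb(c):
    """Standard sRGB transfer function for a linear value in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055

def colour_composite(grid_a, grid_b):
    """Steps 9-12: combine Currier A and B grids into 8-bit RGB pixels."""
    h, w = len(grid_a), len(grid_a[0])
    total = [[grid_a[y][x] + grid_b[y][x] for x in range(w)] for y in range(h)]
    peak = max(max(row) for row in total) or 1.0  # step 10 (shared scale)
    image = []
    for y in range(h):
        row = []
        for x in range(w):
            a = linear_to_srgb(grid_a[y][x] / peak)  # step 11: A and B only
            b = linear_to_srgb(grid_b[y][x] / peak)
            s = total[y][x] / peak  # the "sum" channel stays linear
            # Hypothetical channel order: B -> red, sum -> green, A -> blue.
            row.append((round(b * 255), round(s * 255), round(a * 255)))  # step 12
        image.append(row)
    return image

def to_ppm(image):
    """Serialize RGB pixels as a plain (ASCII) PPM image."""
    h, w = len(image), len(image[0])
    out = ["P3", f"{w} {h}", "255"]
    for row in image:
        out.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(out) + "\n"
```

Inverting the result, as described next, amounts to replacing each channel value `v` with `255 - v`.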

If we invert the result of all these steps, we get a display comparable in scale to the one Obelus proposed -- showing rightwardness and downwardness of vords beginning [qo] -- but with gentler gradients and somewhat cloudlike shapes.

[attachment=6046]

For comparison, here's the same display in its original form, with higher values mapped to brightness rather than darkness:

[attachment=6045]

Here's another display showing the distribution of all vords beginning [sh] on the left and all vords beginning [ch] on the right:

[attachment=6047]

And here's a similar display for two specific vords: [shedy] on the left, [chedy] on the right. Note the different coloration resulting from the vast majority of tokens coming from Currier B.

[attachment=6048]

Finally, here's a display for the vord [daiin], which shows a conspicuous peak I wasn't expecting:

[attachment=6049]

(27-11-2021, 03:55 AM) pfeaster Wrote: Belated thanks to Obelus for the idea and demonstration of using this kind of display for downward/rightward statistics. I agree that it's a big improvement over the approaches I was trying before, and I think we can get rid of those binning artifacts as well, if we want to, by following steps something like this:

... xray images

Very interesting images, but what exactly do the RGB colours / brightness show in these images? Could you please explain?

If, for example with daiin, the brightness indicates the left position, the image cannot be correct, because daiin does not have a preferred position at the beginning of lines.

Really interesting, thanks, but I'd also benefit from an explanation of how these charts work.

Daiin does appear in line initial positions more than it "ought" to, so I'm not surprised if that's what the brightness is signifying here. But I'd also expect the rest to be more "glowy" - is it less bright because its middle-line occurrences are more spread out? But then we'd expect that with the others too. It would be good to see them all in the same colour scheme so we can compare.

(28-11-2021, 08:13 PM) Davidsch Wrote: Very interesting images, but what exactly do the RGB colours / brightness show in these images? Could you please explain?

If, for example with daiin, the brightness indicates the left position, the image cannot be correct, because daiin does not have a preferred position at the beginning of lines.

Brightness corresponds to greater prevalence in a given position. Since submitting my previous post, I've simplified things a bit, so that the numerical pixel value at each point represents a straightforward fraction of the maximum value, with no adjustment for sRGB encoding.

Here are new grayscale images for [daiin] in Currier A and B, side by side.

[attachment=6053]

It's the image for Currier B that shows a bright peak at the left of lines and the vertical center of paragraphs. If I'm reading my spreadsheets correctly, in Currier B, about 30% of tokens of [daiin] are the first vord in a line; about 8.5% are the last vord in a line; and about 61.5% fall somewhere in the middle of the line. In all of Currier B, by contrast, about 11.5% of vords are first in their lines and another 11.5% are last, while about 77% fall somewhere in the middle. So [daiin] does seem significantly overrepresented in the first line position in Currier B (about 30% against a baseline of 11.5%, roughly 2.6 times as often). The image suggests that this overrepresentation is also limited to the vertical center of the paragraph. I haven't double-checked that, but if it's true, then the anomaly would be even stronger than the line-position statistics would suggest by themselves.
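Line-position shares like these can be tallied with a short sketch such as the one below. It assumes lines are lists of vord strings and treats the only vord of a one-vord line as "first", which is an assumption that may not match how the spreadsheet figures were produced. Dividing a target vord's share by the baseline share gives an overrepresentation ratio; with the Currier B figures above, 30 / 11.5 ≈ 2.6. The function name is hypothetical.

```python
def line_position_shares(lines, matches=lambda v: True):
    """Fraction of matching vords that are first, middle, or last in their line.

    Assumption: a line with a single vord counts that vord as "first".
    """
    counts = {"first": 0, "middle": 0, "last": 0}
    for line in lines:
        for i, vord in enumerate(line):
            if not matches(vord):
                continue
            if i == 0:
                counts["first"] += 1
            elif i == len(line) - 1:
                counts["last"] += 1
            else:
                counts["middle"] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}
```

Calling it once with a predicate such as `lambda v: v == "daiin"` and once with the default (all vords) gives the two sets of shares to compare.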

If we were to overlay the images for Currier A and Currier B in two separate color channels, the bright peak would stand out in whatever color we've assigned to Currier B. In the color image I shared earlier, it's yellowish.

I've also tried separating out the first and last lines of paragraphs and the first, second, and last vords of lines, since these are sometimes regarded as significant "absolute" positions. If there are fewer than three lines, I assign whatever there is in the order first, last; and if there are fewer than four vords in a line, I assign whatever there is in the order first, last, second. Here's what that approach looks like for [daiin] in Currier A and B:
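The assignment rule just described can be expressed as a small sketch. It assumes a paragraph is a list of lines and a line is a list of vord strings; "middle" is a hypothetical label for everything that isn't one of the absolute positions, and the function names are illustrative only.

```python
def line_roles(n_lines):
    """Roles for the lines of a paragraph: fill 'first', then 'last'."""
    if n_lines == 1:
        return ["first"]
    return ["first"] + ["middle"] * (n_lines - 2) + ["last"]

def vord_roles(n_vords):
    """Roles for the vords of a line: fill 'first', then 'last', then 'second'."""
    if n_vords == 1:
        return ["first"]
    if n_vords == 2:
        return ["first", "last"]
    return ["first", "second"] + ["middle"] * (n_vords - 3) + ["last"]

def tag_positions(paragraph):
    """Return (line_role, vord_role, vord) for every vord in a paragraph."""
    tags = []
    for line_role, line in zip(line_roles(len(paragraph)), paragraph):
        for vord_role, vord in zip(vord_roles(len(line)), line):
            tags.append((line_role, vord_role, vord))
    return tags
```

So a three-vord line comes out as first, second, last, matching the "first, last, second" fill order.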

[attachment=6054]

From this it appears that [daiin] is also underrepresented as the second vord of a line, though I haven't looked into that further.

Color-coding can be used in a variety of ways to draw out contrasts. Here's an example of a different kind, limited to Currier B: vords beginning [ok] are assigned to the blue channel, and vords beginning [ot] are assigned to the green channel. Thus, [ok*] vords are more common in areas that look bluer, while [ot*] vords are more common in areas that look greener.
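Channel assignments of this kind can be sketched generically: pass up to three grids of summed values (for example [ok*] as blue and [ot*] as green, or a third pattern as red) and each is scaled into 0-255. Scaling every channel by one shared peak is an assumption in this sketch; it keeps the channels comparable, whereas scaling each channel by its own peak would make a rare pattern look as bright as a common one. The function name is hypothetical, and at least one grid must be given.

```python
def channel_mix(red=None, green=None, blue=None):
    """Combine up to three equal-size grids into 8-bit RGB pixels.

    All channels share one peak (assumption), so relative prevalence
    between patterns is preserved; missing channels are left at 0.
    """
    grids = [g for g in (red, green, blue) if g is not None]
    h, w = len(grids[0]), len(grids[0][0])
    peak = max(max(max(row) for row in g) for g in grids) or 1.0

    def channel(g, y, x):
        return 0 if g is None else round(g[y][x] / peak * 255)

    return [[(channel(red, y, x), channel(green, y, x), channel(blue, y, x))
             for x in range(w)] for y in range(h)]
```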

[attachment=6055]

Here's a similar display, again for Currier B, showing vords beginning [qo]+gallows in blue and vords beginning [o]+gallows in green.

[attachment=6056]

Three-way color combinations can be prettier but get harder to interpret. Here, again for Currier B, is [chedy] in blue, [qokeedy] in green, and [qokaiin] in red. The various shades represent areas in which those vords are differently concentrated. Note also how the separate treatment of first/second/last vords and first/last lines draws out (or creates?) different patterns.

[attachment=6057]

(28-11-2021, 09:35 PM) tavie Wrote: Really interesting, thanks, but I'd also benefit from an explanation of how these charts work.

Daiin does appear in line initial positions more than it "ought" to, so I'm not surprised if that's what the brightness is signifying here. But I'd also expect the rest to be more "glowy" - is it less bright because its middle-line occurrences are more spread out? But then we'd expect that with the others too. It would be good to see them all in the same colour scheme so we can compare.

Here's an explanatory diagram that may help clarify how these charts are designed to work.

[attachment=6059]

For these examples, I expanded the horizontal and vertical dimensions using a "blockier" algorithm than usual so that the individual data points are easier to see (otherwise I've been using a first-order spline interpolation). For each vord token that fits the target criteria, a rectangle of ones gets added to the blank canvas in the corresponding place with its width inversely proportional to line length and its height inversely proportional to paragraph length. Once all tokens have been added, all pixel values are divided by the maximum and then multiplied by 255 to yield an 8-bit image, brightest where the greatest number of rectangles overlap. The effect of "spreading out" mid-line and mid-paragraph tokens is an open question.
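The rectangle-accumulation scheme just described might be sketched as follows. It assumes paragraphs are lists of lines and lines are lists of vord strings, and it uses integer pixel boundaries, so rectangle sizes are only approximately inversely proportional to line and paragraph length. The function name and default canvas size are assumptions.

```python
def rectangle_map(paragraphs, matches, width=500, height=250):
    """Add a block of ones per matching token, then scale to 8-bit.

    Each block is about width/len(line) wide and height/len(paragraph) tall,
    placed at the token's relative position within line and paragraph.
    """
    canvas = [[0] * width for _ in range(height)]
    for para in paragraphs:
        p = len(para)
        for i, line in enumerate(para):
            n = len(line)
            y0, y1 = i * height // p, (i + 1) * height // p
            for j, vord in enumerate(line):
                if not matches(vord):
                    continue
                x0, x1 = j * width // n, (j + 1) * width // n
                for y in range(y0, y1):
                    for x in range(x0, x1):
                        canvas[y][x] += 1
    peak = max(max(row) for row in canvas) or 1
    # Divide by the maximum and multiply by 255: brightest where most blocks overlap.
    return [[round(v / peak * 255) for v in row] for row in canvas]
```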

I'm wondering if you would see anything like this at all in regular manuscripts. Maybe in the first position due to capitalization, and maybe in the final position due to increased incentive for using abbreviation-related glyphs. For anything outside of that, I can't think of any good examples.

(28-11-2021, 10:38 PM) pfeaster Wrote: Here are new grayscale images for [daiin] in Currier A and B, side by side.

[attachment=6060]

Thank you, Patrick. Once again, your research is so interesting! These graphs make me question an assumption that is very appealing in other respects: that daiin "means" the same in Currier A and B. Your plots point out that the word behaves quite differently:

in A it shows a marked preference to appear at the end of paragraphs;
in B it is almost totally absent at the end of paragraphs; it mostly appears at the start of lines (but not first lines).

Given that aiin is so frequent in B, it seems possible that (in B) daiin is a line-initial variant of aiin, one of Emma's [link]. As I discussed with Emma (and possibly on the forum) years ago, it does not seem that daiin in A can be equivalent to aiin in B, since daiin often appears reduplicated as daiin.daiin while aiin never does.

Both A and B have dark areas at the start of paragraphs: I guess this is due to the prevalence of Grove words and does not tell us much about A:daiin vs B:daiin.

(29-11-2021, 03:49 PM) Koen G Wrote: I'm wondering if you would see anything like this at all in regular manuscripts. Maybe in the first position due to capitalization, and maybe in the final position due to increased incentive for using abbreviation-related glyphs. For anything outside of that, I can't think of any good examples.

Hi Koen,
similar patterns can be observed in poetry (see [link]). I hope to replicate Patrick's new plotting method in the future and run a few experiments, but I do expect that Shakespeare's sonnets will show some patterns. Of course, in a sonnet, a line really is "a functional unit", often a complete sentence or phrase; Voynichese does not look like poetry, and these paragraph-level patterns are unexpected in prose. It would be interesting to experiment with abbreviated texts: as you say, something will likely show in those cases too; here we have the usual problem that diplomatic transcriptions are hard to find.

I have nothing useful to add to this discussion, but I just wanted to say that these visualizations are so beautiful and really really interesting! Do other languages reveal similar patterns in paragraphs?

I'd love to hear what you see as the implications of these positional observations. Do different sections reveal different patterns? Are these patterns the result of different topics? Or a sophisticated encipherment? What about the different scribes who write in Currier B (i.e. my scribe 2 vs. 3 vs. 4 vs. 5)? Do they reveal different underlying patterns?


Post authors on this page, in order: MarcoP, Koen G, pfeaster, Davidsch, tavie, pfeaster, pfeaster, Koen G, MarcoP, LisaFaginDavis