| Weird linguistics: features of natural languages that might be relevant |
|
Posted by: bt2901 - 07-02-2025, 04:55 PM - Forum: Voynich Talk
- No Replies
|
 |
I'm not proposing a solution. Rather, I want to highlight some of the more exotic properties of natural languages that could bring a plaintext closer to the VM's level of regularity and which I don't believe have been discussed here.
I don't think these will be enough to explain features of VM text, but I believe that no "natural language plaintext substitution" theory could be plausible unless it uses at least some of them. So I want to play Devil's Advocate a little.
1) Phrasal clitics
A clitic is something in between a bound morpheme and an independent word. Clitics were present in Proto-Indo-European, and many languages descended from PIE still contain some vestiges. The position of phrasal clitics obeys Wackernagel's law: roughly speaking, it requires phrasal clitic(s) to always be in the second position of the sentence.
An example from contemporary Czech: Já jsem si ho prohlížel. Here jsem, si, and ho are clitics. As you can see, they are gathered in the same position -- importantly, their order inside this position is also very strict (e.g. "li", if present, should be placed first).
This effect could explain some of the line-position and paragraph-position dependence. If we assume that each line/paragraph is a complete sentence, then the occurrence of clitics close to the beginning is pretty much expected (I'm not sure if repeating the same clitic is allowed though; it is ungrammatical in the languages I know of).
2) Strong vowel harmony, imperfectly transcribed
Turkic and Mongolian languages have vowel harmony systems. For example, in the Turkish language all vowels in a given word are either front or back (nowadays it's more complicated thanks to loanwords, but let's ignore that). The modern Latin orthography divides vowels into a/ı/o/u and e/i/ö/ü. You could roughly describe vowel harmony in Turkish as "either every vowel is dotted, or none is". You could probably analyze Turkish as having 4 vowels and a "frontness" suprasegmental feature instead of 8 vowels.
This could plausibly cut the number of required vowel letters by half.
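To make the "4 vowels plus a frontness feature" idea concrete, here is a minimal Python sketch. It assumes the textbook pairing a/e, ı/i, o/ö, u/ü and uses a made-up apostrophe as the word-level frontness sign; both the marker and the function names are illustrative assumptions, and loanwords that break harmony are simply rejected.

```python
# Assumed pairing of Turkish vowels: each front vowel has a back counterpart.
FRONT_TO_BACK = {"e": "a", "i": "ı", "ö": "o", "ü": "u"}
BACK_TO_FRONT = {back: front for front, back in FRONT_TO_BACK.items()}
FRONT_MARK = "'"  # hypothetical word-level "frontness" sign


def encode(word):
    """Rewrite a harmonic word using only back-vowel letters plus one marker."""
    has_front = any(c in FRONT_TO_BACK for c in word)
    has_back = any(c in BACK_TO_FRONT for c in word)
    if has_front and has_back:
        raise ValueError(f"{word!r} mixes front and back vowels")
    base = "".join(FRONT_TO_BACK.get(c, c) for c in word)
    return FRONT_MARK + base if has_front else base


def decode(token):
    """Invert encode(): the marker tells us to front every vowel."""
    if token.startswith(FRONT_MARK):
        return "".join(BACK_TO_FRONT.get(c, c) for c in token[1:])
    return token


words = ["evler", "kızlar", "gözler", "kollar"]
encoded = [encode(w) for w in words]
print(encoded)
print([decode(t) for t in encoded])
```

After encoding, only the four letters a, ı, o, u are needed for vowels, and the original spelling is still recoverable from the marker.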
3) Tone indicators
Some languages spoken in China and Southeast Asia (Hmong, Zhuang, Unified Miao) have romanizations that use letters both for their sound value and as tone indicators.
[link hidden]
Quote:The Hmong alphabet uses its letters for a job traditionally assigned to diacritics. It’s a rare case where a K isn’t a K and a second O isn’t like the first O: they’re qualities of the vowels before them.
If we again assume that lines/paragraphs correspond to sentences, and the words that require distinguishing with tone prefer specific positions in the sentence (maybe they all are verbs, for example), then non-flat letter distribution could arise as a result.
4) Non-rhyming poetry
Some forms of poetry are unrhymed (most notably, Japanese haikus) but instead employ verbal and grammatical parallelism.
5) Vowels affecting the consonants
[link hidden] that iotated/non-iotated vowels in the Cyrillic alphabet could create a digraph system where the digraph doesn't look like a ligature. She asks:
Quote:The question is whether such relationships between any related series of sounds (such as palatized and non-palatized) is realistic.
Well, aside from palatalization in Slavic languages, there are also [link hidden].
Abkhaz (which [link hidden] as a plausible language candidate before) has a three-way voiced/voiceless/ejective and palatalized/labialized/plain distinction in its consonant system and only two different vowels. John Colarusso conjectures that the ancestor of Abkhaz (Proto-Northwest Caucasian) originally had a normal 5-vowel system that collapsed as /a e i o u/ > /a ʲa ʲə ʷa ʷə/, where the vowel affected the quality of the preceding consonant.
|
|
|
| KMeans analysis of Currier A, B, C |
|
Posted by: Addsamuels - 06-02-2025, 11:19 PM - Forum: Analysis of the text
- Replies (6)
|
 |
I hope these links are useful
[link hidden]
[link hidden]
Generated using ChatGPT
Note the 'transitioning' [link hidden] and [link hidden]
(Ignore [link hidden] since there's not enough writing.)
ChatGPT says: What Do PCA Component 1 and PCA Component 2 Mean?
PCA (Principal Component Analysis) is a technique used to reduce the dimensionality of data while keeping as much important information as possible.
1. Why Do We Use PCA in This Case?
Your text data is transformed into a high-dimensional numerical representation using TF-IDF (Term Frequency-Inverse Document Frequency). This means that each folio is represented as a vector with many dimensions (one for each unique word). Since we can't easily visualize such high-dimensional data, we use PCA to compress this data into just two principal components (PCA Component 1 & PCA Component 2), the two directions along which the folios vary the most. How much of the total information these two components retain depends on the data.
2. What Are PCA Components?
- PCA Component 1 (X-axis in the plot) → The most important direction of variance in the data.
- PCA Component 2 (Y-axis in the plot) → The second most important direction of variance (uncorrelated with Component 1).
These two components are new axes in a transformed space, where:
- Similar folios (in terms of text content) are closer together.
- Different folios are farther apart.
3. What Does This Mean in the 3D Chart?
- X (PCA Component 1): Main variation in folio text.
- Y (PCA Component 2): Second major variation in folio text.
- Z (Depth): Position of the folio within the document (line number).
This lets us see clusters of similar folios while keeping it visually interpretable.
Would you like to extract the most important words contributing to PCA 1 and PCA 2? That would tell you what words are driving the differences!
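For readers who want to see what the TF-IDF plus PCA step described above amounts to, here is a small sketch using NumPy. It is not the code behind the linked charts (that code isn't shown in the post). The four "folios" are made-up token lists, the smoothed IDF formula is one common variant chosen as an assumption, and PCA is done via an SVD of the centered matrix.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Rows = documents, columns = vocabulary words, values = tf * idf."""
    vocab = sorted({w for doc in docs for w in doc})
    col = {w: i for i, w in enumerate(vocab)}
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    X = np.zeros((n_docs, len(vocab)))
    for row, doc in enumerate(docs):
        for w, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF (an assumed variant; other definitions exist).
            idf = math.log((1 + n_docs) / (1 + doc_freq[w])) + 1
            X[row, col[w]] = tf * idf
    return X, vocab


def pca_2d(X):
    """Project rows of X onto the first two principal components."""
    centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    coords = U[:, :2] * S[:2]            # folio positions on PC1 / PC2
    explained = S**2 / np.sum(S**2)      # share of variance per component
    return coords, explained[:2]


# Hypothetical toy folios: two "A-like" and two "B-like" word lists.
folios = [
    "daiin chol chor daiin shol".split(),
    "chol daiin chor shol chol".split(),
    "qokeedy shedy qokain chedy shedy".split(),
    "shedy qokeedy chedy qokain qokeedy".split(),
]
X, vocab = tfidf_matrix(folios)
coords, explained = pca_2d(X)
print(coords)
print(explained)
```

The `explained` values show directly how much of the variance the two plotted axes keep, which is worth checking before reading clusters off a 2D chart.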
|
|
|
| So.. What should I actually be doing? |
|
Posted by: eggyk - 06-02-2025, 01:52 PM - Forum: Voynich Talk
- Replies (18)
|
 |
Hi everyone,
This is my first post here, my apologies if it's formatted badly or a little disorganised in structure. I've been somewhat nervous to post, but I have a lot of things I want to talk about and no easy way to get them all out! So.. if this post is in the wrong section, or should be split into multiple areas please let me know!
I recently became interested in the VM and had no idea where to begin with regard to solving it. As I'm a total beginner with no expertise, I decided to read up on many of the theories, arguments, solutions, interpretations, and tasks that have been shared here (with many of these threads dating back 8 years!). I'm amazed some of these threads are still alive, and I appreciate the threads of banned users being archived and readable; they were rather entertaining to go through.
I have a few general observations, and a couple of ideas and theories, so I thought I would make a thread to go through them and (hopefully) be sent somewhere to catch up on what's been discussed/done already, to be told to keep my opinions to myself, or be sent out to do something useful so I can actually contribute something.
The main reason I am bringing any of these up is that I would like to avoid some of the issues others have run into and decide which rabbit hole may be worth going into. I would love to contribute something worthwhile, if I am able. As for my suggestions, I am throwing them out on the off chance that someone finds them worthwhile/valuable.
1) Very few things seem to be generally accepted by most people
Almost everything is still up for debate, including the transcriptions into voynichese. If there are 6 different possible transcriptions/interpretations of the symbols on the pages, many forms of statistical analysis, frequency analysis, cipher cracking, and pattern tracking become "useless" in the eyes of many, even if perfectly executed. The best case scenario is that they will say "Your analysis is correct, but you have done it from an incorrect data set". If I were to go down that path, I would like to know what the baseline would have to be for it to be considered correct.
The motivation to finally post here was actually checking the "Positions we can all agree upon" section of the forum. It seems to me that everyone gave up on doing that long ago.
2) Many ideas and theories that led to nothing are hard to find
The forum has a "Solutions" section, which is great! I would love it if proposed methods or ideas were codified somewhere as well. For example, ordering and folding the pages a certain way to reveal a secret message, or removing a certain symbol from the text and then running frequency analysis, or drawing lines between gallows symbols to make a smiley face or map. None of these are claimed solutions, but it's useful to know they have been tried. They may not be too awful to categorise (deciphering attempts, transliteration attempts, frequency and position analysis etc.).
I have seen many instances of someone saying "I have done this thing over the last few months" only to be greeted by a helpful commenter or two that had already tried it and found nothing. If those instances could be reduced as much as possible, it's more likely that progress would be made. I certainly would like to avoid that happening to me.
3) Many threads get bogged down with mutually exclusive ideas
I would much prefer to pursue ideas that could produce results that could be widely agreed upon, and not lead into huge "agree to disagree" territory. If someone thinks a rosette is the sea, another thinks that it's the embodiment of Christ, a third thinks that it's a naval map, and the fourth thinks it's a description of the heavens and earth, I don't see much benefit to me chipping in and saying that it's really a picture of Constantinople without convincing proof that it's the case. In fact, my doing so may take attention away from someone else's theory when what the theory needs is deeper discussion.
4) People constantly seem to join the forum, claim to have solved it, do not listen to criticism and then leave
I would very much like to not be added to that list, if I can help it.
What things can I do, that are actually worth doing?
So this leads me to the post title, what should I be doing actually? What are the things that -if achieved- would actually be considered measurable, agreed upon progress on the VM?
A few I could think of:
-Finding voynichese text in another manuscript
-Finding a text match for a sentence or paragraph in another plain text manuscript (Lord's Prayer, charm, recipe etc.)
-Finding a 1:1 matching image/diagram in another manuscript
-Finding proof of ownership in the currently unknown time period
-Finding a signature/other marginalia match
-Creating a reproducible translation method that translates at least full paragraphs into meaningful text (not things like "apply forcefully candle goat king slowly moon") with very few degrees of freedom in how you choose the translated word (would even this be enough?)
Please tell me if there are any others.
Additionally, if I have the basis of a plan or idea, is it acceptable to make a thread floating such ideas even if I have not been able to execute that idea or need help to begin? I would use the search function of course, but many discussions only seem accessible/locatable through the links of people that remember where and when it was discussed.
Hopefully I can help in some way, I'd love to hear what you think.
|
|
|
| Levenshtein SVG |
|
Posted by: Addsamuels - 05-02-2025, 12:38 AM - Forum: Analysis of the text
- Replies (1)
|
 |
After many failed attempts, I've got an SVG of the Levenshtein distances between words in MS 408 (although the code was crafted by AI, so it could be completely inaccurate)
Enjoy.
[link hidden]
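For anyone wanting to sanity-check distances like those in the SVG, Levenshtein distance is easy to compute independently. The sketch below is the standard dynamic-programming version, not the AI-written code mentioned in the post (which isn't shown), and the short word list is a made-up stand-in for a real transliteration.

```python
from itertools import combinations


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical sample words; replace with tokens from a transliteration file.
sample = ["daiin", "dain", "chol", "chor", "qokeedy"]
for w1, w2 in combinations(sample, 2):
    print(w1, w2, levenshtein(w1, w2))
```

Computing all pairs is quadratic in the number of distinct words, so for a full vocabulary it may be worth restricting the comparison to words of similar length.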
|
|
|
| How to fool men for centuries |
|
Posted by: voynichrose - 04-02-2025, 06:02 AM - Forum: Analysis of the text
- Replies (1)
|
 |
Original Text
How to fool men for centuries
Use some unknown symbols
Make sure the words from the symbols are not the same length as the original text
Once the work is completed burn the original text
Cipher
dkim lki wkka imn wkr cmnt
Usm pkimy un syi
iajsjeuwl aurn suel fuels dkel sruwow arei suel saim amen to are kree
Kn suel wok iss comp burn suel original text
I got sloppy and did not repeat some tokens, but maybe the author did the same. Some Voynich vords repeat, but if you just had the cipher you would never decode it. This is what I believe the Voynich is, but with the tokens kept intact and copied the same way throughout the manuscript, so similar vords come up more often.
|
|
|
| Folio, Quire and Section Word Correlation |
|
Posted by: 008348dc760f858fd668476b75fb6f - 28-01-2025, 10:26 AM - Forum: Analysis of the text
- Replies (8)
|
 |
I spent some time with the manuscript over the festive break. I have an idea as to what I think is happening here regarding the text but I'll set that aside for the end. This write-up is a byproduct of the overall investigation, on its own it might be beneficial to people following their own research paths.
The experiment:
Can we establish a correlation between folios and sections based on the number of words they share?
The method:
I wrangled a Python script that goes through each folio, takes each individual word (longer than 3 characters) and counts how many times it appears in every other folio. Because folios differ in word count, I divide the final score by the number of words in the comparison page. The idea is to balance the count against the opportunity for a match (although I admit this is likely not a *clean* scoring method).
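One possible reading of the method described above, as a short Python sketch. The original script isn't shown, so several details are assumptions: each distinct word of the source folio is looked up once, every occurrence in the comparison folio counts as a hit, and the divisor is the comparison folio's total word count. The folio contents are made-up tokens.

```python
from collections import Counter


def shared_word_scores(folios, min_len=4):
    """Score every ordered folio pair (a, b) by how often a's words occur in b.

    folios: dict mapping folio name -> list of word tokens.
    min_len=4 keeps words "above a length of 3 characters".
    """
    counts = {name: Counter(words) for name, words in folios.items()}
    scores = {}
    for a, words_a in folios.items():
        # Assumption: each distinct long word of folio a is looked up once.
        vocab_a = {w for w in words_a if len(w) >= min_len}
        for b, words_b in folios.items():
            if a == b or not words_b:
                continue
            hits = sum(counts[b][w] for w in vocab_a)
            # Normalise by the size of the comparison folio.
            scores[(a, b)] = hits / len(words_b)
    return scores


# Hypothetical toy data standing in for a real transliteration.
folios = {
    "f1r": ["daiin", "chol", "chor", "ol"],
    "f1v": ["daiin", "daiin", "shol", "ol"],
    "f2r": ["qokeedy", "shedy"],
}
for pair, score in sorted(shared_word_scores(folios).items()):
    print(pair, round(score, 3))
```

Under these assumptions score(a, b) and score(b, a) can differ, so a different choice of counting or normalisation would be needed to get an exactly symmetric table.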
The output & processing:
The output is a big CSV file with the scores for each folio/folio row/column. I imported this into a spreadsheet so that I could do further analysis. I formatted the cells so that they are colored on a gradient based on how much higher they are than the average.
A word of advance warning: this method of displaying the results creates an implicit symmetry between an entry and its counterpart. Basically everything is mirrored across the diagonal, and this will create patterns that don't exist. The output spreadsheet looks like the following:
![[Image: AP1GczNAoeoFK9D-S74M8UNov2z0ridcCM9zV1EO...authuser=0]](https://lh3.googleusercontent.com/pw/AP1GczNAoeoFK9D-S74M8UNov2z0ridcCM9zV1EOelTmYqNJ52XydpZWAPnYrx9Ao-jYVciN-aMWyORGTkW38Snz7ZfGfl_k0VgHZK8UlVTvQdafilYeiRbKGKrekUQDwJvwHaKp3GL2Mr61KjVi6SmYyz7TCg=w1036-h825-s-no?authuser=0)
Starting from the top left corner F1r.
The darker the red color the higher above the average score.
The bordered lines are quires.
The gaps in the column and row headers are the missing folios.
Observations:
We can see very clear correlation between two sections in the bottom right, these are exactly the bathing and recipe sections.
Both the bathing section and the recipe section have very distinct correlation to plants in the second half of the herbal section.
These same plant folios have a strong correlation with the preparation section as well as the bathing and recipe section.
The second clear block is the plant section itself, we see clear correlation between all entries, and perhaps an indication that the last 3 quires of this section are somehow more tightly correlated than the previous 4.
The quires wrap very nicely to these... "islands of correlation", we can see the borders of the quires outlining the sections.
A nice example of this is the "vine" plant that appears in f17v, f96v, f99. It's correlated through theme and word correlation.
How does this help:
It helps us isolate sections to work on, there is a lot of text in the manuscript and reducing the attack surface can help.
We can also use this to validate folio and quire order. For example, I've always been confused by the location of quire 19 and suspected it belonged somewhere else, but looking at its position on the sheet, it shares the same correlations as its neighbors, the bathing section and the recipe section, meaning it likely belongs where it is.
The speculation part:
The current status of the "is it a cipher or not" debate boils down to either "no, it's the product of a generative method that produces pseudo-language" or "yes, but we don't know how or what" or "something else". If this research shows anything, it shows that the manuscript text shares a logical correlation with its pictographic themes and even with the physical structure (the quires). If this is pseudo-language, then it follows themes and relations. Also, I don't think these concepts are mutually exclusive; I think you absolutely can use a generative method as an encoding or enciphering mechanism. If I were to state it simply, it would be: "you hide a needle in a bunch of haystacks, but you're going to need to manufacture those haystacks". I think the generative mechanism takes words as a seed and produces pseudo-language as an obscured output. I also think there is a sort of pun going on here: it's not just the text being generated, the characters and plants are too, which is why we see so many unrecognisable plants.
The plants are gathered, cut up thrown into a device and turned into something new, like the words.
If I had to say anything, my theory is that a generative mechanism turns a word into many words, perhaps letter by letter. It adds prefixes and suffixes, modifies letters, adds false roots, etc., so that we end up with what we have: a text poor in individual characters and overburdened with words.
I think the generative mechanism is shown in [link hidden] and is based on a solar quadrant ([link hidden]).
I think the female characters in the center are showing how a word is modified, what that process is.
I think the second outermost circle (containing the repeating pattern of single characters) manages the substitutions.
I think the contents of the generative mechanism are defined by the astral/zodiac section, allowing it to be adjusted based on themes.
I think some settings for the mechanism are indicated by the specific female character and their association to a star, or the star present.
I think we have some working notes in the margins of [link hidden] and [link hidden] that are further clues to the process.
I think sections sharing an above average correlation are encrypted using the same scheme/settings.
What next:
I'd like to focus on identifying the core word roots and re-scoring everything based on that, I think that would help identify how the prefixes and suffixes are defined based on the mechanism state/settings.
Notes:
I added the missing folios and quires based on [link hidden].
The solar quadrant was taken from a reference in the interesting academic paper "The Voynich Manuscript as a Manual for the Habsburgs" by [link hidden]
You can get a PDF of the spreadsheet here: [link hidden]
You can get the spreadsheet itself here: [link hidden]
As is the way with such things, it turns out ReneZ had done this research prior. Head [link hidden] for more correlation goodness.
|
|
|
|