20-10-2024, 04:46 PM
I'm getting a lot of comments on my videos (many more than I can answer properly), and sometimes they are really good questions. What I like most is that a lot of people are thinking about the system, rather than coming up with obscure languages.
On my first video about entropy, @stoplight2554 commented:
Quote:wouldnt this suggest some sort of n-gram based system? there has to be some trade off between character length of a text block and encoded characters per length. this sort of has to assume that spaces are to be ignored though..
also, this only works if the ability to predict the next character from the last is not 'continuous' across a section of text. if it reliably fails to predict at a certain interval, then you have your n-gram length. if it never fails to predict the next character, then its too deterministic to express any meaning whatsoever (unless the meaning itself is the repeated pattern)
I like the way they think: the system does suggest n-grams as a possible part of the solution. They also take into account that heavy use of n-grams would almost certainly mean that spaces aren't spaces.
Their experiment sounds interesting: take a long string of characters with spaces removed, and test at which intervals entropy goes up. But would this be testable at all? You'd need to make parsing choices (e.g. what's your initial treatment of [iin]?), and a single missing or extra character (a scribal error) would throw the whole system off.
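Just to make the idea concrete, here's a rough sketch of how such an interval test could look in Python. The file name, the transliteration, and the parsing choices are placeholders, not a real setup: the point is only to measure, for each candidate block length, the conditional entropy of transitions at each position modulo that length, and see whether one phase (the block boundary) stands out. Note that this only works if the blocks stay aligned from the start of the string, which is exactly where a single scribal error would ruin it.

Code:
from collections import Counter, defaultdict
from math import log2

def conditional_entropy(pairs):
    """H(next | prev) in bits, estimated from a list of (prev, next) pairs."""
    joint = Counter(pairs)
    prev_totals = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (prev, nxt), count in joint.items():
        h -= (count / total) * log2(count / prev_totals[prev])
    return h

def phase_entropies(text, period):
    """Group transitions by position modulo `period` and return one
    conditional-entropy estimate per phase.  If the text were built from
    fixed-length blocks of this length, the phase that crosses block
    boundaries should show clearly higher entropy than the others."""
    buckets = defaultdict(list)
    for i in range(len(text) - 1):
        buckets[i % period].append((text[i], text[i + 1]))
    return [conditional_entropy(buckets[p]) for p in range(period)]

# Hypothetical usage: "voynich.txt" stands in for whatever transliteration
# you use, with spaces stripped and [iin] etc. parsed one way or another.
if __name__ == "__main__":
    with open("voynich.txt", encoding="utf-8") as f:
        text = "".join(f.read().split())
    for n in range(2, 6):
        print(n, [round(h, 2) for h in phase_entropies(text, n)])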
Maybe it's more useful to think in terms of entropy, which is more of an average? So, for example: how easy is it to predict the character two positions ahead once spaces are removed?
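Something like this is what I have in mind (again just a sketch, with the same assumptions as above): estimate how predictable the character k positions further along is, averaged over the whole space-stripped string, rather than looking for one specific interval.

Code:
from collections import Counter
from math import log2

def entropy_at_offset(text, k):
    """H(X_{i+k} | X_i) in bits: on average, how hard is it to predict the
    character k positions further along the space-stripped string?"""
    pairs = [(text[i], text[i + k]) for i in range(len(text) - k)]
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    total = len(pairs)
    return -sum((c / total) * log2(c / left[a]) for (a, _), c in joint.items())

# Hypothetical usage, on the same space-stripped string as above:
# for k in (1, 2, 3):
#     print(k, round(entropy_at_offset(text, k), 3))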
I'd also assume that consistent use of, say, bigrams would inflate your alphabet to such an extent that it becomes impossible to compare to other texts?
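You can see the inflation quickly with a sketch like the one below (the helper names are made up): re-read the same string as non-overlapping bigrams, each pair treated as one symbol of a new alphabet, and compare alphabet size and entropy per symbol with the single-character reading.

Code:
from collections import Counter
from math import log2

def unigram_entropy(tokens):
    """Plain entropy in bits per token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def as_bigram_symbols(text):
    """Re-read the string as non-overlapping bigrams, each one treated as a
    single symbol of a new (much larger) alphabet."""
    return [text[i:i + 2] for i in range(0, len(text) - 1, 2)]

# Hypothetical comparison on the same space-stripped string:
# chars = list(text)
# pairs = as_bigram_symbols(text)
# print(len(set(chars)), round(unigram_entropy(chars), 2))
# print(len(set(pairs)), round(unigram_entropy(pairs), 2))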