Urtx13 > Yesterday, 10:57 AM
(Yesterday, 10:34 AM) oshfdk Wrote: Sounds fine. Are you looking for some specific feedback?
oshfdk > Yesterday, 10:57 AM
(Yesterday, 10:51 AM) Urtx13 Wrote: Actually, I opened a new thread because, although the aim is the same, the approaches are different!
Urtx13 > Yesterday, 12:10 PM
nablator > Yesterday, 12:29 PM
(Yesterday, 12:10 PM) Urtx13 Wrote: Earlier researchers estimated around 35,000 to 38,000 tokens.
In my case, after strict cleaning and tokenization (only [a-z]{1,10} characters, with internal comments removed), I obtain 46,675 tokens and 8,421 unique tokens.
Urtx13 > Yesterday, 12:39 PM
(Yesterday, 12:29 PM) nablator Wrote: (Yesterday, 12:10 PM) Urtx13 Wrote: Earlier researchers estimated around 35,000 to 38,000 tokens.
In my case, after strict cleaning and tokenization (only [a-z]{1,10} characters, with internal comments removed), I obtain 46,675 tokens and 8,421 unique tokens.
The ZL transliteration has uncertain word spaces ",", extended EVA "@" codes and illegible characters "?" that are not generally accepted as word separators.
RF1a-n has 38510 words with "," interpreted as a space, 37851 words without.
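To make the effect of the "," convention concrete, here is a minimal sketch of counting words both ways. It assumes that "." marks a definite word space and "," an uncertain one; the sample line is made up for illustration and is not taken from any transliteration file, and count_words is a hypothetical helper name.

```python
import re


def count_words(line: str, comma_is_space: bool) -> int:
    """Count words in one transliterated line.

    Assumption: '.' is a definite word space, ',' an uncertain one.
    """
    if comma_is_space:
        # Treat both '.' and ',' as word separators.
        parts = re.split(r"[.,]", line)
    else:
        # Drop ',' so the two halves are joined into one word.
        parts = line.replace(",", "").split(".")
    # Ignore empty strings left by leading or trailing separators.
    return sum(1 for p in parts if p)


# Made-up toy line, not real data.
sample = "daiin,shedy.qokeey.ol,chedy"

print(count_words(sample, comma_is_space=True))   # 5
print(count_words(sample, comma_is_space=False))  # 3
```

Every "," adds one word under the first reading, which is the kind of gap seen between the two RF1a-n counts quoted above.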
nablator > Yesterday, 12:51 PM
ReneZ > Yesterday, 01:03 PM
(Yesterday, 12:39 PM) Urtx13 Wrote: because the ZL3a-n transcription includes more material than earlier versions (such as RF1a-n),
ReneZ > Yesterday, 01:07 PM
(Yesterday, 12:51 PM) nablator Wrote: "a@123;b" is not a valid [a-z]{1,10} token so this word will be split in two: "a" and "b".
Urtx13 > Yesterday, 02:10 PM
(Yesterday, 01:07 PM) ReneZ Wrote: (Yesterday, 12:51 PM) nablator Wrote: "a@123;b" is not a valid [a-z]{1,10} token so this word will be split in two: "a" and "b".
To @urtx13:
Note that @123; is just a low-ASCII way to describe a single character with ASCII value 123 (decimal), which is a rare character shape.
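The splitting described here is easy to reproduce. The sketch below is only an illustration, not anyone's actual pipeline: it first applies the plain [a-z]{1,10} pattern, then shows one possible workaround that collapses each @NNN; code into a single placeholder character before tokenizing. The "*" placeholder and the function names are assumptions made for this example.

```python
import re

# The strict pattern discussed in the thread.
TOKEN = re.compile(r"[a-z]{1,10}")
# Extended code of the form @NNN; (digits between '@' and ';').
EXT_CODE = re.compile(r"@\d+;")


def naive_tokens(text: str) -> list[str]:
    """Tokenize with the strict pattern only."""
    return TOKEN.findall(text)


def code_aware_tokens(text: str) -> list[str]:
    """Collapse each @NNN; code to one placeholder, then tokenize.

    '*' is a hypothetical stand-in for the rare character.
    """
    collapsed = EXT_CODE.sub("*", text)
    return re.findall(r"[a-z*]{1,10}", collapsed)


print(naive_tokens("a@123;b"))       # ['a', 'b']
print(code_aware_tokens("a@123;b"))  # ['a*b']
```

With the strict pattern the single word becomes two tokens, which inflates both the token count and the number of unique tokens; treating the code as one character keeps the word intact.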
RobGea > Yesterday, 03:06 PM