nablator > 08-06-2019, 04:48 PM
(07-06-2019, 09:11 PM)nablator Wrote: I will try other Latin texts that are much more repetitive.
ReneZ > 10-06-2019, 08:53 AM
nablator > 10-06-2019, 09:52 AM
(10-06-2019, 08:53 AM)ReneZ Wrote: If I were to generate a sample text, would it be possible to create a similar graph from it?
Quote: What I have in mind is to take a meaningful text and to present it in three different ways.
Would it be a problem if it were presented as one word per line?
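For reference, putting a text into the one-word-per-line layout asked about here takes only a few lines. This is a minimal sketch, not anything posted in the thread: the class name OneWordPerLine and the method reformat are made up for illustration, and it assumes that words are simply separated by whitespace (punctuation stays attached to the word).

```java
import java.util.ArrayList;
import java.util.List;

class OneWordPerLine {
    // Hypothetical helper: split on runs of whitespace and put each
    // non-empty token on its own line. Punctuation is left untouched.
    static String reformat(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return String.join("\n", words);
    }

    public static void main(String[] args) {
        System.out.println(reformat("a meaningful text, presented   in three ways"));
    }
}
```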
ReneZ > 10-06-2019, 12:52 PM
nablator > 10-06-2019, 01:50 PM
(10-06-2019, 12:52 PM)ReneZ Wrote: Thanks!
I started with a relatively short text in English, namely Poe's "The Fall of the House of Usher".
Here is the plain text, one word per line: [link]
Here is the same text, with a word-for-word substitution, where each word is replaced by a Roman numeral, following some rule that I can explain later: [link]
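To make the idea of a word-for-word substitution concrete, here is a minimal sketch. The rule used in the post is not stated, so the rule below is purely an assumption for illustration: the k-th distinct word, in order of first appearance, is replaced by the Roman numeral for k. The names RomanSubstitution, toRoman and substitute are hypothetical, and standard Roman numerals only cover 1 to 3999, so this sketch fails on texts with more distinct words than that.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RomanSubstitution {
    private static final int[] VALUES =
        {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
        {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    // Standard subtractive notation, valid for 1..3999.
    static String toRoman(int n) {
        if (n < 1 || n > 3999) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            while (n >= VALUES[i]) {
                sb.append(SYMBOLS[i]);
                n -= VALUES[i];
            }
        }
        return sb.toString();
    }

    // ASSUMED rule (not the one from the post): number the distinct words
    // in order of first appearance and emit each word's number as a numeral.
    static List<String> substitute(List<String> words) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        List<String> out = new ArrayList<>();
        for (String word : words) {
            Integer id = ids.get(word);
            if (id == null) {
                id = ids.size() + 1;
                ids.put(word, id);
            }
            out.add(toRoman(id));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("the", "fall", "of", "the", "house", "of", "usher");
        System.out.println(String.join("\n", substitute(words)));
    }
}
```

Any one-to-one rule of this kind keeps the word order and the repetition pattern of the text and changes only the spelling of each word, which is what edit distances between words are sensitive to.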
nablator > 10-06-2019, 02:00 PM
ReneZ > 10-06-2019, 02:07 PM
nablator > 10-06-2019, 02:24 PM
/*
 * Prints, for each line offset N = 1 .. maxLineDistance, the mean Levenshtein
 * distance between every word of a line and every word of the line N lines later.
 * Edit distance calculation shamelessly borrowed from https://rosettacode.org/wiki/Levenshtein_distance#Java
 * Example of use: java MDistBetweenLines text.txt 250 > text_dist.txt
 */
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class MDistBetweenLines {
    static int maxLineDistance;
    // One entry per non-empty input line; each entry holds the words of that line.
    static ArrayList<ArrayList<String>> lineList = new ArrayList<>();

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java MDistBetweenLines <text file> <max line distance>");
            return;
        }
        try {
            maxLineDistance = Integer.parseInt(args[1]);
            loadFile(new File(args[0]));
            processLines();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void loadFile(File f) throws Exception {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    // Lowercase, turn punctuation into spaces, then split on runs of digits, spaces and tabs
                    String[] words = line.toLowerCase().replaceAll("\\p{P}", " ").split("[0-9 \t]+");
                    ArrayList<String> wordList = new ArrayList<String>();
                    for (String word : words) {
                        if (!word.isEmpty())
                            wordList.add(word);
                    }
                    // A line with no words left still counts as a line (its word list is empty)
                    lineList.add(wordList);
                }
            }
        }
        System.err.println("Loaded " + lineList.size() + " lines");
    }

    static void processLines() {
        for (int lineDistance = 1; lineDistance <= maxLineDistance; lineDistance++) {
            // long, so that the sums cannot overflow on large inputs
            long nSum = 0, totalDist = 0;
            for (int iLine = 0; iLine < lineList.size() - lineDistance; iLine++) {
                ArrayList<String> wordList1 = lineList.get(iLine);
                ArrayList<String> wordList2 = lineList.get(iLine + lineDistance);
                // Compare every word of the first line with every word of the second
                for (String word1 : wordList1) {
                    for (String word2 : wordList2) {
                        totalDist += distance(word1, word2);
                        nSum++;
                    }
                }
            }
            // Output: offset, tab, mean distance (NaN if there were no word pairs at this offset)
            System.out.println(lineDistance + "\t" + (double)totalDist/nSum);
        }
    }

    // Levenshtein distance computed with a single row of costs
    public static int distance(String a, String b) {
        int [] costs = new int [b.length() + 1];
        for (int j = 0; j < costs.length; j++)
            costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            costs[0] = i;
            // nw is the cost from the previous row, one column to the left (the diagonal)
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                // Cheapest of: deletion or insertion (+1), or match/substitution via the diagonal
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]), a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }
}
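To show what the numbers in the output mean, here is a small in-memory sketch of the same statistic on a toy input. The class MeanDistanceDemo, the method meanDistance and the three sample words are made up for illustration. The distance function is the same single-row Levenshtein calculation as in the program above, repeated so that the example is self-contained.

```java
import java.util.Arrays;
import java.util.List;

class MeanDistanceDemo {
    // Levenshtein distance, single-row form as in the program above.
    static int distance(String a, String b) {
        int[] costs = new int[b.length() + 1];
        for (int j = 0; j < costs.length; j++)
            costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            costs[0] = i;
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]),
                        a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }

    // Mean distance over all word pairs taken from line i and line i + offset.
    // Returns NaN when no such pair exists.
    static double meanDistance(List<List<String>> lines, int offset) {
        long total = 0, pairs = 0;
        for (int i = 0; i + offset < lines.size(); i++) {
            for (String w1 : lines.get(i)) {
                for (String w2 : lines.get(i + offset)) {
                    total += distance(w1, w2);
                    pairs++;
                }
            }
        }
        return pairs == 0 ? Double.NaN : (double) total / pairs;
    }

    public static void main(String[] args) {
        // Toy input in the one-word-per-line layout: each line holds one word.
        List<List<String>> lines = Arrays.asList(
                Arrays.asList("cat"), Arrays.asList("cot"), Arrays.asList("dog"));
        for (int offset = 1; offset <= 2; offset++) {
            System.out.println(offset + "\t" + meanDistance(lines, offset));
        }
    }
}
```

With one word per line, each line contributes a single word, so the value at offset N is simply the mean edit distance between words that are N positions apart in the text.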
nablator > 10-06-2019, 02:31 PM
(10-06-2019, 12:52 PM)ReneZ Wrote: Here is the same text, with a word-for-word substitution, where each word is replaced by a Roman numeral, following some rule that I can explain later: [link]
I wonder how you did it. Did you somehow choose the Roman numerals to maximize the effect?
ReneZ > 10-06-2019, 02:59 PM