I have been trying to run bash commands in parallel on Windows, using the bash provided by msys, with some control over the number of processes spawned. This setup does not give me access to the parallel command.
As an example of the kind of control I mean, make accepts the number of parallel jobs on the command line:
$ make -j4
which runs at most 4 jobs at a time. After much trial and error, I finally figured out how to run arbitrary commands with the same kind of limit.
Suppose we have a command file with one command per line. For example, I am using WEKA to build different machine learning models that predict outcomes on various datasets, in parallel on a multi-core machine. A script prepares a text file, cmd.txt, that contains lines like:
$JBIN -Xmx3g weka.classifiers.trees.J48 -C 0.25 -M 2 -A -i -t result-1-1/filter6-weka-train.arff -T result-1-1/filter6-weka-test.arff -p 0 -d result-1-1/filter6-J48.model > result-1-1/filter6-J48-report.txt
$JBIN -Xmx3g weka.classifiers.functions.LibSVM -S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 1000.0 -C 1000000.0 -E 0.0010 -P 0.1 -Z -o -i -t result-1-1/filter6-weka-train.arff -T result-1-1/filter6-weka-test.arff -p 0 -d result-1-1/filter6-SVM.model > result-1-1/filter6-SVM-report.txt
$JBIN -Xmx3g weka.classifiers.functions.LibSVM -S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 1000.0 -C 1000000.0 -E 0.0010 -P 0.1 -Z -W '1 2' -o -i -t result-1-1/filter6-weka-train.arff -T result-1-1/filter6-weka-test.arff -p 0 -d result-1-1/filter6-SVM-w-1-2.model > result-1-1/filter6-SVM-w-1-2-report.txt
$JBIN -Xmx3g weka.classifiers.functions.LibSVM -S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 1000.0 -C 1000000.0 -E 0.0010 -P 0.1 -Z -W '2 1' -o -i -t result-1-1/filter6-weka-train.arff -T result-1-1/filter6-weka-test.arff -p 0 -d result-1-1/filter6-SVM-w-2-1.model > result-1-1/filter6-SVM-w-2-1-report.txt
$JBIN -Xmx3g weka.classifiers.trees.J48 -C 0.25 -M 2 -A -i -t result-1-1/filter9-weka-train.arff -T result-1-1/filter9-weka-test.arff -p 0 -d result-1-1/filter9-J48.model > result-1-1/filter9-J48-report.txt
$JBIN -Xmx3g weka.classifiers.functions.LibSVM -S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 1000.0 -C 1000000.0 -E 0.0010 -P 0.1 -Z -o -i -t result-1-1/filter9-weka-train.arff -T result-1-1/filter9-weka-test.arff -p 0 -d result-1-1/filter9-SVM.model > result-1-1/filter9-SVM-report.txt
$JBIN -Xmx3g weka.classifiers.functions.LibSVM -S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 1000.0 -C 1000000.0 -E 0.0010 -P 0.1 -Z -W '1 2' -o -i -t result-1-1/filter9-weka-train.arff -T result-1-1/filter9-weka-test.arff -p 0 -d result-1-1/filter9-SVM-w-1-2.model > result-1-1/filter9-SVM-w-1-2-report.txt
$JBIN -Xmx3g weka.classifiers.functions.LibSVM -S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 1000.0 -C 1000000.0 -E 0.0010 -P 0.1 -Z -W '2 1' -o -i -t result-1-1/filter9-weka-train.arff -T result-1-1/filter9-weka-test.arff -p 0 -d result-1-1/filter9-SVM-w-2-1.model > result-1-1/filter9-SVM-w-2-1-report.txt
where $JBIN is an environment variable that points to the java binary. To run these in parallel with a limit on the number of processes, use the xargs command to split the input into lines as follows:
$ cat cmd.txt | xargs -0 -d '\n' -L 1 -I {} -P 3 bash -c "eval \"{}\""
The options used are:
- -0 to take quotes and backslashes in the input literally and treat arguments as terminated by \0 characters
- -d '\n' to set newline as the delimiter between arguments, overriding the \0 from the previous point while still taking quotes literally
- -L 1 to read one line at a time (GNU xargs documents -I as implying this, so it is redundant here)
- -I {} to set a pair of braces as the replacement string, which is substituted with the argument read, in this case an entire line
- -P 3 to limit execution to a maximum of 3 processes at a time
- bash -c "eval \"{}\"" to execute the substituted command within bash, which also expands variables such as $JBIN
And that is it. It works as long as the commands are on a single line. I have yet to test it on commands spanning multiple lines.
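To try the technique without WEKA, here is a minimal sketch that runs four throwaway commands, two at a time. It assumes GNU xargs (for -d). The file name demo-cmd.txt, the OUT variable and the sleep/echo jobs are made up for illustration and stand in for cmd.txt, $JBIN and the java commands. The sketch keeps only -d, -I and -P, on the assumption that -0 and -L 1 are redundant once those are given.

```shell
#!/usr/bin/env bash
# Demo: run four throwaway commands, at most two at a time (GNU xargs assumed).

# Scratch directory; exported so the child shells started by xargs can expand $OUT,
# just as $JBIN must be visible to them in the real command file.
OUT=$(mktemp -d)
export OUT

# Hypothetical stand-in for cmd.txt: one command per line.
# The quoted heredoc delimiter keeps $OUT and the single quotes literal in the file.
cat > "$OUT/demo-cmd.txt" <<'EOF'
sleep 1; echo 'job  1' > $OUT/job1.txt
sleep 1; echo 'job  2' > $OUT/job2.txt
sleep 1; echo 'job  3' > $OUT/job3.txt
sleep 1; echo 'job  4' > $OUT/job4.txt
EOF

# SECONDS is a bash builtin counter; resetting it gives a rough wall-clock timer.
SECONDS=0
xargs -d '\n' -I {} -P 2 bash -c "eval \"{}\"" < "$OUT/demo-cmd.txt"
elapsed=$SECONDS

# xargs returns only after every child has finished, so all files exist here.
cat "$OUT"/job*.txt
echo "elapsed: ${elapsed}s"
```

If the limit is respected, the elapsed time should be close to half of what the four sleeps would take one after another, and the double space inside each quoted string should survive in the output files. Because each line is pasted inside double quotes before eval sees it, command lines that themselves contain double quotes may need extra escaping.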