Shell script for splitting tasks across multiple cores

by Jarno Elonen, 2007-12, released into the Public Domain

This shell script reads a sequence of shell commands from stdin and distributes them among several detached screen sessions that run in parallel. The aim is to make full use of dual/quad-core or multiprocessor systems when each program that handles an individual task can only use one processing unit.

NOTE: if you don't need a progress counter, see also the --max-procs (-P) option of the xargs command.
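For comparison, a minimal sketch of the xargs route, assuming an xargs that supports -0 and -P (GNU and BSD xargs both do). Each input line is handed to sh as a single argument and evaluated as a command line; the function name run_parallel is hypothetical.

```shell
# Hypothetical helper: run the commands read from stdin (one per line),
# at most $1 of them at a time, using xargs instead of screen sessions.
run_parallel() {
  # tr turns every newline into NUL, so xargs -0 passes each whole line as
  # one argument without interpreting quotes; the inner sh then evals it.
  # Inside the single quotes, "$1" is the inner sh's argument (the command).
  tr '\n' '\0' | xargs -0 -n 1 -P "$1" sh -c 'eval "$1"' sh
}

# Example: the inpaint jobs from the usage section, four at a time.
# for x in 0*.png; do echo "inpaint $x"; done | run_parallel 4
```

Output from the parallel jobs arrives in completion order, not input order, and there is no progress report.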

Usage

The script takes no command line arguments; it only reads commands from stdin, one per line. To run a fictitious command inpaint on files 0001.png to 0080.png, you would do:

for x in 0*.png; do echo "inpaint $x"; done | core-split.sh
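When the file names follow a fixed numeric pattern, the same command list can also be generated without globbing. A sketch, assuming a seq that accepts -f with a printf-style floating-point format (GNU coreutils and BSD seq both do); inpaint is still the fictitious command from above, and make_cmds is a hypothetical name.

```shell
# Hypothetical helper: print one "inpaint NNNN.png" command per line for
# frames 1..$1, zero-padded to four digits (0001.png, 0002.png, ...).
make_cmds() {
  seq -f 'inpaint %04g.png' 1 "$1"
}

# Preview the first few commands, then feed the whole list to the script:
# make_cmds 80 | head -n 3
# make_cmds 80 | core-split.sh
```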

...and the output would look like this:

Scheduling 80 commands, about 21 per processing unit...
Processes started. Waiting for completion...
  23%...
  47%...
  71%...
  92%...
  100%...
All done.
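The numbers in this output are plain integer arithmetic. As a sketch (the helper names are hypothetical, and the per-unit count is assumed to be rounded up), 19 progress markers out of 80 commands give the 23% shown in the first progress line:

```shell
# Hypothetical helpers illustrating the arithmetic behind the progress output.

# Commands per processing unit, rounded up so that
# units * per_unit >= total and no command is left over.
per_unit() {  # usage: per_unit TOTAL UNITS
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Integer percentage of progress markers; shell division truncates.
percent() {   # usage: percent DONE TOTAL
  echo $(( $1 * 100 / $2 ))
}

# per_unit 81 4   -> 21
# percent 19 80   -> 23
```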

The code

Download core-split.sh or copy-paste it from below. This version is for quad-core processors; change the CORES= line if you have a different number of processing units (one two-digit entry per unit, starting from 00):

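#!/bin/sh
# One two-digit suffix per processing unit (split -d numbers the chunks 00, 01, ...)
CORES="00 01 02 03"

if [ "$#" -gt 0 ]; then
  echo "The 'core-split' command doesn't take command line arguments. Feed commands to run to stdin."
  exit 2
fi

TMP=$(mktemp)
BATCHID=$(echo "$TMP" | md5sum | head -c 16)
PROGRESS_FILE="${TMP}_progress"
: > "$PROGRESS_FILE"

# Read all commands, then calculate tasks per processing unit
cat > "$TMP"
LINES=$(wc -l < "$TMP")
if [ "$LINES" -eq 0 ]; then
  echo "No commands on stdin, nothing to do."
  rm -f "$TMP" "$PROGRESS_FILE"
  exit 0
fi
NCORES=$(echo $CORES | wc -w)
# Round up, so there are never more chunk files than entries in CORES
SL=$(( (LINES + NCORES - 1) / NCORES ))

# Prefix each command with a marker write (one line per started command), then chunk
sed "s@^@(echo . >> '$PROGRESS_FILE'); @" "$TMP" | split -l "$SL" -d - "${TMP}_proc"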

# Split processes into 'screen's
echo "Scheduling $LINES commands, about $SL per processing unit..."
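for x in $CORES; do
  # With fewer commands than cores, some chunk files are never created
  [ -e "${TMP}_proc$x" ] || continue
  echo "...core $x"
  screen -d -m -S "${BATCHID}_proc$x" bash "${TMP}_proc$x"
  sleep 1
done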

# Wait for screens to end
echo "Processes started. Waiting for completion..."
while screen -ls | grep -q "$BATCHID"; do
  # One marker line per started command; a missing file counts as zero
  DONE=$(cat "$PROGRESS_FILE" 2>/dev/null | wc -l)
  PERC=$(( DONE * 100 / LINES ))
  echo "  $PERC%..."
  sleep 2
done

# Clean up temporaries, including the progress file
rm -f "$TMP" "$PROGRESS_FILE"
for x in $CORES; do
  rm -f "${TMP}_proc$x"
done

echo "All done."
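Instead of editing the CORES= line by hand, the list can be derived from the number of online processing units. A sketch, assuming GNU coreutils' nproc is available and otherwise falling back to getconf _NPROCESSORS_ONLN (widely supported, though not required by POSIX); core_list is a hypothetical name.

```shell
# Hypothetical helper: print the two-digit suffixes "00 01 ... NN" that
# split -d assigns to the chunk files, one per processing unit.
core_list() {
  # Use the argument if given, else ask the system how many units are online.
  n=${1:-$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)}
  i=0
  list=""
  while [ "$i" -lt "$n" ]; do
    list="$list $(printf '%02d' "$i")"
    i=$((i + 1))
  done
  echo $list   # unquoted on purpose: word splitting drops the leading space
}

# In core-split.sh: CORES=$(core_list)
```

Because split -d uses two-digit suffixes by default, this pairing assumes no more than 100 processing units.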