Overview

Occasionally I find myself wanting to use a python function in a bash script. This is fairly easy to do with python's -c option and a multi-line string, but each call starts a fresh python interpreter, which carries a small initialisation overhead, especially if the python script needs to import any modules. This is fine if it only runs once within the bash script, but it can be a significant performance hit inside a loop.

Instead of giving up and rewriting the whole script in python itself, we can use Linux named pipes to pipe to and from a single background instance of the python script.

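This may be best described with an example. For the sake of demonstration, the python script just prints out the input it receives; you would obviously perform whatever python transformation you require. Both scripts below assume that python runs Python 3: Python 2's input() evaluates what it reads as an expression, and its for line in sys.stdin loop reads ahead in large blocks, which would stall the named pipe approach.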

Naive approach

naive.sh

#!/usr/bin/env bash

# Sample data
arr=(apple orange pear strawberry raspberry blueberry grape banana)

# The main loop: a new python process is started on every iteration
for x in "${arr[@]}"; do
	result="$(echo "$x" | python -c "
val = input()
# Perform python processing on val
print('python: ' + val)
	")"
	
	# Perform shell processing on $result

	echo "$result"
done

When I time this with time naive.sh, it takes about 0.2s real time to complete: this is incredibly slow when all we are doing is printing 8 values! Because a new python process is started on every iteration, the total run time grows linearly with the size of the data array.

Named pipe approach

fifos.sh

#!/usr/bin/env bash

# Sample data
arr=(apple orange pear strawberry raspberry blueberry grape banana)

# Create temporary file names
in_pipe="$(mktemp -u)"
out_pipe="$(mktemp -u)"

# Create input/output pipes
mkfifo "$in_pipe"
mkfifo "$out_pipe"

# Run python with -u to prevent output buffering
python -uc "
import sys
# Loop over stdin, stripping trailing whitespace (including the newline)
for val in map(str.rstrip, sys.stdin):
	# Perform processing on val
	print('python: ' + val)
" <"$in_pipe" >"$out_pipe" &
# ^ Redirect stdin and stdout to the pipes, and run in the background

# Hold pipes open
exec 3<>"$in_pipe"
exec 4<>"$out_pipe"

# The main loop
for x in "${arr[@]}"; do
	# Write to input pipe
	echo "$x" >"$in_pipe"
	
	# Read one line from output pipe (-r keeps backslashes literal)
	read -r result <"$out_pipe"
	
	# Perform shell processing on $result

	echo "$result"
done

# Cleanup: close pipes
exec 3>&-
exec 4>&-

# Cleanup: remove fifo files
rm "$in_pipe"
rm "$out_pipe"

When I time the main loop here, it completes in 0.035s real time: quite an improvement. You may think that 0.035s is still too long to print 8 values, but this includes the one-off overhead of initialising the python script. Crucially, this delay won't scale with the size of the input array. If I time just the main loop once the python script has already started up, it completes in 0.001s real time.

Explanation

Create named pipes

in_pipe="$(mktemp -u)"
mkfifo "$in_pipe"

We start by creating two named pipes to serve as the input and output of our python script. The -u option to mktemp prints a unique temporary file name without creating the file, and mkfifo then creates a named pipe at that path. You could use hardcoded names instead of mktemp, but you may not want the pipe files cluttering your working directory, and mkfifo will fail if a file with that name already exists.
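A possible variation, sketched under the assumption of GNU or BSD mktemp: both man pages describe -u as unsafe, because another process could create a file at the printed path before mkfifo runs. Creating a fresh directory with mktemp -d and putting both pipes inside it avoids that window, and a single rm -r removes both. The names tmp_dir, in and out are hypothetical.

```shell
#!/usr/bin/env bash
# Make a new, empty directory with a unique name and print its path
tmp_dir="$(mktemp -d)"

# Hypothetical names: any names inside the new directory will do
in_pipe="$tmp_dir/in"
out_pipe="$tmp_dir/out"
mkfifo "$in_pipe" "$out_pipe"

# [ -p PATH ] is true only if PATH exists and is a named pipe
if [ -p "$in_pipe" ] && [ -p "$out_pipe" ]; then
    fifos_created=yes
    echo "FIFOs ready in $tmp_dir"
fi

# ... start python and run the main loop here ...

# Removing the directory removes both FIFOs with it
rm -r "$tmp_dir"
```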

Prevent python from buffering

python -uc "

The -u option is important as it makes python's output unbuffered. Without it, python treats a pipe differently from a terminal: printed output collects in a buffer and is only written out once the buffer fills or the script exits. Our read would then wait for a result that never arrives, and the script would hang.
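To see the hang that -u prevents, here is a sketch (assuming python3 is on PATH) that runs the same round trip twice, once without -u and once with it. It uses read -t 2 so that a missing reply gives up after two seconds instead of blocking forever; round_trip, buffered and unbuffered are hypothetical names. Calling print(..., flush=True) in the python code, or setting the PYTHONUNBUFFERED environment variable, would also work in place of -u.

```shell
#!/usr/bin/env bash
unset PYTHONUNBUFFERED   # make sure the environment isn't already forcing -u
dir="$(mktemp -d)"

# Hypothetical helper. Usage: round_trip [python options...]; sets $outcome
round_trip() {
    mkfifo "$dir/in" "$dir/out"
    python3 "$@" -c "
import sys
for val in map(str.rstrip, sys.stdin):
    print('python: ' + val)
" <"$dir/in" >"$dir/out" &
    exec 3<>"$dir/in" 4<>"$dir/out"

    echo apple >&3
    # -t 2: give up after two seconds instead of blocking forever
    if read -r -t 2 reply <&4; then
        outcome="got: $reply"
    else
        outcome="timed out"
    fi

    exec 3>&-   # EOF on stdin: python's loop ends and it flushes its buffer
    wait        # let python exit while fd 4 still holds a reader open
    exec 4>&-
    rm "$dir/in" "$dir/out"
}

round_trip;    buffered="$outcome"     # timed out
round_trip -u; unbuffered="$outcome"   # got: python: apple
echo "without -u: $buffered"
echo "with -u:    $unbuffered"
rm -r "$dir"
```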

Instruct python to iterate stdin

for val in map(str.rstrip, sys.stdin):

Because we will repeatedly be piping into the same python instance, we need a loop inside our python script that runs the same instructions for every input. We can loop over sys.stdin, a file object that yields one line of input at a time, each still ending in its newline character. We likely don't want that newline, so map(str.rstrip, ...) removes it. Note that str.rstrip with no argument strips all trailing whitespace, so trailing spaces in an input value are removed as well.
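If trailing spaces in your data matter, stripping only the newline keeps them. A small sketch comparing the two, assuming python3 is on PATH; the input values are made up for illustration.

```shell
#!/usr/bin/env bash
# Made-up input: the first line ends in two spaces
input=$'pear  \nfig\n'

# str.rstrip with no argument strips all trailing whitespace
stripped="$(printf '%s' "$input" | python3 -c "
import sys
for val in map(str.rstrip, sys.stdin):
    print(repr(val))
")"

# rstrip('\n') strips only the newline, keeping other trailing whitespace
kept="$(printf '%s' "$input" | python3 -c "
import sys
for line in sys.stdin:
    print(repr(line.rstrip('\n')))
")"

echo "$stripped"   # 'pear' then 'fig'
echo "$kept"       # 'pear  ' then 'fig'
```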

Redirect python's stdin/stdout and send to background

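<"$in_pipe" >"$out_pipe" &

<$in_pipe and >$out_pipe replace python's stdin and stdout with our input and output pipes, and the & at the end runs the process in the background. Opening a FIFO normally blocks until its other end has been opened as well, so the background job waits on these redirections until the exec lines below open the pipes from the shell's side.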
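The script above never checks on the background process, but if you want to know when python has finished (or whether it failed), bash stores the process ID of the most recent background job in $!. A sketch, assuming python3 is on PATH, that records it, uses kill -0 to confirm python stays alive between writes, and later uses wait to collect python's exit status. The names py_pid, py_status and still_running are hypothetical.

```shell
#!/usr/bin/env bash
dir="$(mktemp -d)"
in_pipe="$dir/in"
out_pipe="$dir/out"
mkfifo "$in_pipe" "$out_pipe"

python3 -uc "
import sys
for val in map(str.rstrip, sys.stdin):
    print('python: ' + val)
" <"$in_pipe" >"$out_pipe" &
py_pid=$!   # $! holds the PID of the most recent background job

exec 3<>"$in_pipe"
exec 4<>"$out_pipe"

echo banana >"$in_pipe"
read -r result <"$out_pipe"

# kill -0 sends no signal; it only checks that the process still exists
if kill -0 "$py_pid" 2>/dev/null; then still_running=yes; fi

exec 3>&-          # EOF on python's stdin ends its loop
wait "$py_pid"     # wait for python to exit...
py_status=$?       # ...and record its exit status
exec 4>&-
rm -r "$dir"

echo "$result (python exited with status $py_status)"
```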

Keep the in/out pipes open

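exec 3<>"$in_pipe"
exec 4<>"$out_pipe"

These commands are probably the least obvious, but they are critical. If you want to understand them in depth you should read up on bash redirections, but I will attempt a TLDR. Our sys.stdin loop in python keeps iterating until it reaches end of file (EOF). EOF is not a character that gets sent down the pipe: a process reading from a pipe sees EOF once every process that had it open for writing has closed it. Every write in our main loop opens the input pipe, writes a line and closes it again, so without another writer python would see EOF right after our first write and exit. The command exec 3<>$in_pipe opens the input pipe and assigns it to the shell's file descriptor 3, keeping a write end open until we explicitly close it later. The output pipe is a similar story in the other direction: every read in the main loop opens the output pipe and closes it again, and if python writes to a pipe that nobody has open for reading, the write fails with a broken pipe error. Assigning the output pipe to file descriptor 4 keeps a reader open between our reads. Note that opening a FIFO for both reading and writing with <> (which succeeds without blocking) is Linux behaviour; POSIX leaves it undefined.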
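Where you can't rely on the Linux behaviour of <>, one alternative is to open each pipe in a single direction: fd 3 as a writer of the input pipe and fd 4 as a reader of the output pipe. A sketch, assuming python3 is on PATH (file names hypothetical). Each of these opens blocks until python opens the matching end, so the order matters: python's redirections open the input pipe first, so the shell must too, or both sides wait on each other forever.

```shell
#!/usr/bin/env bash
dir="$(mktemp -d)"
in_pipe="$dir/in"
out_pipe="$dir/out"
mkfifo "$in_pipe" "$out_pipe"

python3 -uc "
import sys
for val in map(str.rstrip, sys.stdin):
    print('python: ' + val)
" <"$in_pipe" >"$out_pipe" &

# Each open waits for python to open the other end of the same pipe,
# in the same order python opens them: input first, then output.
exec 3>"$in_pipe"    # write end only: python sees no EOF until fd 3 closes
exec 4<"$out_pipe"   # read end only: python's writes always have a reader

results=()
for x in pear grape; do
    echo "$x" >"$in_pipe"
    read -r result <"$out_pipe"
    results+=("$result")
done
printf '%s\n' "${results[@]}"   # python: pear, then python: grape

exec 3>&-   # last writer gone: python's loop ends
wait
exec 4>&-
rm -r "$dir"
```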

Main loop: write

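echo "$x" >"$in_pipe"

Within our main loop: rather than piping each value into a new python process as in the naive example, we write it to the input pipe that the running python process reads from. Each value must fit on a single line, since python treats every line it reads as a separate input.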
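One thing to watch when writing arbitrary values this way: bash's echo treats an argument such as -n or -e as an option rather than text. With x set to -n, echo writes nothing at all, so python never receives a line and the following read blocks. A small sketch showing printf '%s\n' as a safer way to write each value as exactly one line; the value of x is a made-up example.

```shell
#!/usr/bin/env bash
x='-n'   # hypothetical input value that looks like an echo option

# echo interprets -n as "no trailing newline" and prints nothing at all
from_echo="$(echo "$x")"

# printf prints its argument verbatim, followed by a newline
from_printf="$(printf '%s\n' "$x")"

echo "echo gave '$from_echo', printf gave '$from_printf'"
# echo gave '', printf gave '-n'
```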

Main loop: read

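read -r result <"$out_pipe"

Within our main loop: read each result from $out_pipe, which is where we told python to write its output. read takes a single line, so this relies on python printing exactly one line per input: if it printed nothing, read would block forever, and if it printed two lines, the results would drift out of step with the inputs. The -r option stops read from treating backslashes in the output as escape characters.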
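Since file descriptors 3 and 4 already hold both pipes open, the main loop could also write to and read from them directly, which avoids opening and closing each FIFO on every iteration. A sketch of that variation, assuming python3 is on PATH; the temporary directory layout is hypothetical.

```shell
#!/usr/bin/env bash
dir="$(mktemp -d)"
mkfifo "$dir/in" "$dir/out"

python3 -uc "
import sys
for val in map(str.rstrip, sys.stdin):
    print('python: ' + val)
" <"$dir/in" >"$dir/out" &

exec 3<>"$dir/in"
exec 4<>"$dir/out"

results=()
for x in apple orange pear; do
    printf '%s\n' "$x" >&3   # write one line on the already-open fd 3
    read -r result <&4       # read one line back from fd 4
    results+=("$result")
done
printf '%s\n' "${results[@]}"

exec 3>&-   # EOF for python
wait
exec 4>&-
rm -r "$dir"
```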

Cleanup: close the input pipe

exec 3>&-

Remember that python will keep iterating until its stdin reaches EOF, and we have held the write end of the input pipe open on fd 3. Closing that file descriptor removes the last writer, so python's sys.stdin stops providing values: the python for loop ends, and the python script exits in the background. If your python script needs any cleanup, just insert it after the for val in map(str.rstrip, sys.stdin): loop; it will run as normal when the loop finishes, before python exits. Note that the bash script doesn't wait for this to happen; it carries straight on to the next line.
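A sketch of python-side cleanup that runs after the loop, assuming python3 is on PATH: here the cleanup prints a summary line, which the shell reads from fd 4 after closing fd 3, then waits for python to exit. The count variable and summary text are hypothetical.

```shell
#!/usr/bin/env bash
dir="$(mktemp -d)"
mkfifo "$dir/in" "$dir/out"

python3 -uc "
import sys
count = 0
for val in map(str.rstrip, sys.stdin):
    count += 1
    print('python: ' + val)
# Cleanup code: runs only once stdin reaches EOF
print('processed %d items' % count)
" <"$dir/in" >"$dir/out" &

exec 3<>"$dir/in" 4<>"$dir/out"

for x in apple pear; do
    echo "$x" >&3
    read -r result <&4
done

exec 3>&-            # EOF: python leaves its loop and runs the cleanup code
read -r summary <&4  # read the line printed by the cleanup code
wait                 # python has exited once this returns
exec 4>&-
rm -r "$dir"

echo "$summary"      # processed 2 items
```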

Cleanup: other

exec 4>&-
rm "$in_pipe"
rm "$out_pipe"

Keep everything clean by closing file descriptor 4 and removing our temporary pipes. We're all done!
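If the script can stop partway through (for example, a failing command under set -e), the cleanup lines above never run and the pipe files are left behind. A sketch, assuming python3 is on PATH, that moves the cleanup into an EXIT trap so it runs however the script exits; cleanup is a hypothetical function name.

```shell
#!/usr/bin/env bash
set -e
dir="$(mktemp -d)"
mkfifo "$dir/in" "$dir/out"

python3 -uc "
import sys
for val in map(str.rstrip, sys.stdin):
    print('python: ' + val)
" <"$dir/in" >"$dir/out" &

exec 3<>"$dir/in" 4<>"$dir/out"

cleanup() {
    exec 3>&-      # EOF on python's stdin, so its loop ends
    wait           # let the background python process finish
    exec 4>&-
    rm -r "$dir"   # removes both FIFOs along with the directory
}
# Run cleanup whenever the shell exits, including an early exit from set -e
trap cleanup EXIT

echo apple >&3
read -r result <&4
echo "$result"   # python: apple
```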