Using IPython for parallel computing

IPython can be used to run commands in parallel, either on a single machine or across several remote machines. Here is a short how-to for setting this up over SSH.

Note: this how-to requires every machine involved to have SSH access to the controller. If that is not the case, it can still be done using SSH tunnels (see the note at the end).

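First, on the machine that will run your IPython client code, we start a controller and specify the ports it listens on. Fixing these ports is important, because we will have to create SSH tunnels to them. --ip=0.0.0.0 makes the controller listen on all interfaces, and --location=127.0.0.1 is the address engines will use to reach it, so that they connect to their local end of the SSH tunnels.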

controller-host ~ $ ipcontroller --ip=0.0.0.0 --location=127.0.0.1 --port=10101 --HubFactory.hb=10102,10112 --HubFactory.control=10203,10103 --HubFactory.mux=10204,10104 --HubFactory.task=10205,10105
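The port flags above come in pairs. Assuming the first port of each control/mux/task pair faces clients and the second faces engines (this is how I read IPython's HubFactory options), and that both heartbeat ports and the registration port (--port) are used by engines, the short sketch below derives the ports an engine host has to tunnel. The helper name engine_ports is hypothetical; its output matches the port list used in the tunnel loop further down.

```python
# Port layout passed to ipcontroller above.
# ASSUMPTION: control/mux/task pairs are (client_port, engine_port);
# both heartbeat ports and the registration port are used by engines.
REGISTRATION_PORT = 10101
HEARTBEAT_PORTS = (10102, 10112)
QUEUE_PAIRS = {
    "control": (10203, 10103),
    "mux": (10204, 10104),
    "task": (10205, 10105),
}


def engine_ports():
    """Return the controller ports an engine connects to (hypothetical helper)."""
    ports = [REGISTRATION_PORT, *HEARTBEAT_PORTS]
    # Keep only the engine-facing half of each queue pair.
    ports += [engine for _client, engine in QUEUE_PAIRS.values()]
    return ports


print(" ".join(str(p) for p in engine_ports()))
```

The client-facing ports (10203, 10204, 10205) need no tunnel here, since the client script runs on the controller host itself.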

The controller writes its connection files (ipcontroller-engine.json and ipcontroller-client.json) to ~/.ipython/profile_default/security. This directory must be accessible from every engine machine.
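A minimal sketch for locating the security folder and listing the connection files in it, assuming the default IPython directory ~/.ipython (which can be moved with the IPYTHONDIR environment variable). Note the underscore in profile_default. Both function names are hypothetical.

```python
import os
from pathlib import Path


def security_dir(profile="default"):
    """Security folder of an IPython profile (hypothetical helper).

    Profile directories use an underscore: profile_default, not profile-default.
    """
    ipython_dir = Path(os.environ.get("IPYTHONDIR", Path.home() / ".ipython")).expanduser()
    return ipython_dir / f"profile_{profile}" / "security"


def connection_files(profile="default"):
    """Sorted names of the ipcontroller-*.json files in the security folder."""
    folder = security_dir(profile)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("ipcontroller-*.json"))


print(security_dir())
print(connection_files())
```

On the controller host, right after ipcontroller has started, this should list both the client and the engine file.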

Next, on each machine that will run engines, we mount this “security” folder using SSHFS. First, we enable password-less SSH login to the controller. The local security folder is then emptied, because sshfs expects an empty mount point.

engine-host ~ $ ssh-copy-id USER@IP_CONTROLLER
engine-host ~ $ mkdir -p ~/.ipython/profile_default/security
engine-host ~ $ rm -rf ~/.ipython/profile_default/security/*
engine-host ~ $ sshfs USER@IP_CONTROLLER:.ipython/profile_default/security ~/.ipython/profile_default/security
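Before starting engines, it is worth checking that the SSHFS mount is really in place; otherwise the engine would find no connection file, or a stale one. A minimal sketch using os.path.ismount and the engine connection file name; the helper name security_mount_ok is hypothetical.

```python
import os


def security_mount_ok(folder):
    """Return True if `folder` is a mount point holding the engine connection file.

    Hypothetical helper; `folder` would be ~/.ipython/profile_default/security.
    """
    folder = os.path.expanduser(folder)
    engine_file = os.path.join(folder, "ipcontroller-engine.json")
    return os.path.ismount(folder) and os.path.isfile(engine_file)


print(security_mount_ok("~/.ipython/profile_default/security"))
```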

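We then create SSH tunnels to the controller machine, one for each port an engine connects to: the registration port (10101), the two heartbeat ports, and the engine-facing port of the control, MUX and task queues.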

engine-host ~ $ for port in 10101 10102 10112 10103 10104 10105; do ssh USER@IP_CONTROLLER -f -N -L $port:localhost:$port; done
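A quick way to check that the loop worked is to try a TCP connection to each forwarded port on localhost. This is a stdlib-only sketch with a hypothetical helper name. Keep in mind that ssh accepts local connections on a forwarded port even when the remote side is not answering, so this only confirms that the local end of each tunnel is listening.

```python
import socket


def port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports forwarded by the tunnel loop above.
for port in (10101, 10102, 10112, 10103, 10104, 10105):
    print(port, "ok" if port_open(port) else "not forwarded")
```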

Finally, we start the engines. By default, ipcluster engines starts one engine per processor core detected on the machine; use -n to choose a different number.

engine-host ~ $ ipcluster engines
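Engines take a moment to register with the controller. On the client side, a simple polling loop on Client.ids (the attribute used in the example script below) can wait until the expected number has shown up. The helper wait_until is hypothetical and stdlib-only; the commented usage assumes the IPython.parallel Client from that script and an arbitrary target of 8 engines.

```python
import time


def wait_until(condition, timeout=60.0, interval=1.0):
    """Poll `condition()` until it returns True or `timeout` seconds elapse.

    Returns True on success, False on timeout (hypothetical helper).
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with the IPython.parallel client (requires a running controller):
# from IPython.parallel import Client
# c = Client()
# wait_until(lambda: len(c.ids) >= 8, timeout=120)
```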

Done!

Here is an example script. It runs on the controller host, and the function f is executed on the remote engines:

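#!/usr/bin/env python

from IPython.parallel import Client

# Connects using ~/.ipython/profile_default/security/ipcontroller-client.json
c = Client()
print("Using %d engines!" % len(c.ids))

def f(length):
    # Runs on an engine, so numpy is imported inside the function
    import numpy as np
    rnd_matrix = np.random.random((length, length))
    return np.sum(np.dot(rnd_matrix, rnd_matrix))

# Matrix sizes 1000, 1050, ..., 4950: 80 tasks of increasing cost
lengths = range(1000, 5000, 50)

# The scheduler hands each task to an available engine;
# block=True waits until all results are back
lb_view = c.load_balanced_view()
results = lb_view.map(f, lengths, block=True)
print(results)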
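Why a load-balanced view? The matrix sizes grow from 1000 to 4950 and the cost of a matrix product grows roughly with the cube of the size, so the tasks are very uneven. The toy simulation below is not IPython's scheduler; it only compares a static round-robin split with handing each task to the worker that frees up first, which is the spirit of load balancing. Costs of n**3 and the worker counts are assumptions.

```python
import heapq


def makespan_round_robin(costs, workers):
    """Finish time when task i is statically sent to worker i % workers."""
    loads = [0] * workers
    for i, cost in enumerate(costs):
        loads[i % workers] += cost
    return max(loads)


def makespan_least_loaded(costs, workers):
    """Finish time when each task goes to the worker that frees up first."""
    loads = [0] * workers  # min-heap of per-worker busy time
    for cost in costs:
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)


# Same sizes as the example; cost of a size-n matrix product taken as n**3.
costs = [n ** 3 for n in range(1000, 5000, 50)]
workers = 4  # arbitrary assumption
print("round-robin :", makespan_round_robin(costs, workers))
print("least-loaded:", makespan_least_loaded(costs, workers))
```

The least-loaded rule always finishes within one task's cost of a perfect split (total / workers + largest cost), whatever the task order; a static split has no such guarantee. For instance, with costs [10, 1, 10, 1] on two workers, round-robin finishes at 20 while least-loaded finishes at 11.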

Note on SSH tunnels:

To forward local port 2222 to port 22 of a machine we cannot reach directly (machine1, e.g. the controller), through an intermediate machine (machine2) that can reach it:

$ ssh user@machine2 -L 2222:machine1:22

The security folder can then be mounted through this tunnel with sshfs:

$ sshfs -p 2222 localhost:remote_folder local_folder

Finally, here is the adapted tunnel creation loop:

$ for port in 10101 10102 10112 10103 10104 10105; do ssh -p 2222 localhost -f -N -L $port:localhost:$port; done
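Both tunnel loops (direct, and through port 2222) follow the same pattern, so they can be generated from one list of ports. A minimal Python sketch with hypothetical helper names; it builds the same ssh commands as the shell loops and leaves running them to open_tunnels.

```python
import shlex
import subprocess

ENGINE_PORTS = (10101, 10102, 10112, 10103, 10104, 10105)


def tunnel_commands(target, ports=ENGINE_PORTS, ssh_port=None):
    """Build one `ssh [-p PORT] TARGET -f -N -L port:localhost:port` per port.

    `target` is USER@IP_CONTROLLER for a direct connection, or "localhost"
    with ssh_port=2222 when going through the machine2 tunnel.
    """
    base = ["ssh"]
    if ssh_port is not None:
        base += ["-p", str(ssh_port)]
    return [base + [target, "-f", "-N", "-L", f"{p}:localhost:{p}"] for p in ports]


def open_tunnels(commands):
    """Run each ssh command; with -f, ssh goes to the background after login."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


for cmd in tunnel_commands("localhost", ssh_port=2222):
    print(shlex.join(cmd))
```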