
[ using multiprocessing with theano ]

I'm trying to use Theano with CPU multiprocessing together with a neural network library, Keras.

I use the device=gpu flag and load the Keras model. Then, to extract features for over a million images, I'm using a multiprocessing pool.

The function looks something like this:

import multiprocessing as mp
import cPickle
from scipy.misc import imread  # or cv2.imread, depending on your setup
from keras import backend as K

# function mapping an input image to the activations of the third-to-last layer
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-3].output])

def feature_gen(flz):
    im = imread(flz)
    # run the image through the network (learning_phase=0, i.e. test mode)
    # and pickle the extracted features
    cPickle.dump(f([im, 0])[0][0], open(flz, 'wb'), -1)

pool = mp.Pool(processes=10)
results = [pool.apply_async(feature_gen, args=(fl,)) for fl in filelist]

However, this spawns worker processes that each allocate GPU memory, and my code fails with a memory error. Is it possible to force multiprocessing to keep the workers in CPU memory and only use the GPU for specific parts, such as the feature-extraction call f([im, 0])[0][0]?

If not, is there an alternative to do the same thing in parallel in python?

Answer 1

It is possible to use multiple processes as long as the other processes do not use Keras; to my knowledge, you need to restrict Keras usage to a single process. This seems to include all Keras classes and methods, even those that do not appear to use the GPU, e.g. ImageDataGenerator.

If the workload is GPU-limited, you can also use the threading library, which creates threads instead of processes, e.g. to load data while the GPU processes the previous batch; in that case the single-process restriction does not apply. Due to the global interpreter lock (GIL), however, this is not a solution in CPU-limited environments.

Your situation looks like a parallel [read, do work on GPU, write]. This can be reworked into a pipeline, e.g. some processes reading, the main process performing the GPU work, and some processes writing:

  1. Create queue objects for input and output (threading.Queue or multiprocessing.Queue).
  2. Create background worker threads/processes that read data from disk and feed it into the input queue.
  3. Create background worker threads/processes that write data from the output queue to disk.
  4. Run a main loop that takes data from the input queue, creates batches, processes the data on the GPU, and fills the output queue.
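The steps above can be sketched with multiprocessing.Queue. Here the disk I/O and the GPU call are replaced by arithmetic stand-ins (doubling for imread, squaring for the Keras function f), and the names reader/writer/run_pipeline are illustrative only; in real code only the main process would import Keras and touch the GPU.

```python
import multiprocessing as mp

def reader(paths, in_q):
    # Background process: "load" each input and feed it to the input queue.
    for p in paths:
        in_q.put((p, p * 2))  # stand-in for imread(path)
    in_q.put(None)  # sentinel: no more work

def writer(out_q, results):
    # Background process: drain the output queue and "write to disk".
    while True:
        item = out_q.get()
        if item is None:
            break
        results.append(item)

def run_pipeline(paths):
    in_q, out_q = mp.Queue(maxsize=8), mp.Queue(maxsize=8)
    manager = mp.Manager()
    results = manager.list()  # shared list standing in for files on disk
    r = mp.Process(target=reader, args=(paths, in_q))
    w = mp.Process(target=writer, args=(out_q, results))
    r.start()
    w.start()
    # Main loop: the only place that would touch the GPU.
    while True:
        item = in_q.get()
        if item is None:
            break
        path, data = item
        out_q.put((path, data ** 2))  # stand-in for f([im, 0])[0][0]
    out_q.put(None)  # tell the writer to stop
    r.join()
    w.join()
    return sorted(results)
```

Because the reader and writer never import Keras, only the main process allocates GPU memory, which avoids the out-of-memory failure from the question.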