[ How to optimize batch loading in TensorFlow? ]
I have a problem with very slow batch loading in TensorFlow. Each training step is reasonably fast, but my function to load data is extremely slow.
I was wondering whether there is any way to make it faster, or to run it in the background while the train operation is running, so that the next batch is ready by the time one step finishes.
My features are stored in numpy arrays.
Any ideas? This is my code.
def test_loadbatch(no_timesteps, list_of_file_paths, batch_size):
    nof = no_timesteps  # no. of combined timesteps
    files = list_of_file_paths
    files = shuffle_list(files)
    classes = get_class_number(files)
    temp_batch = np.zeros(shape=(batch_size, no_timesteps, 4096), dtype=np.float32)
    temp_classes = np.zeros(shape=(batch_size, 101), dtype=np.float32)
    bat_num = 0
    fileno = 0
    while bat_num != batch_size:
        if os.path.isfile(str(files[fileno])):
            val = np.load(str(files[fileno]))
            try:
                if val.shape[0] > no_timesteps + 2:
                    num = random.randint(0, val.shape[0] - (no_timesteps + 2))
                    temp_batch[bat_num, :, :] = val[num:num + nof, :]
                    temp_classes[bat_num, :] = create_one_hot(classes[fileno])
                    bat_num = bat_num + 1
            except Exception as ex:
                fileno = fileno + 1
        fileno = fileno + 1
    return np.maximum(np.tanh(temp_batch), 0), temp_classes  # normalize in range 0->1
Input data preparation and model training can be decoupled in TensorFlow using queues. You can create a queue with tf.RandomShuffleQueue and enqueue your mini-batches into it with the queue's enqueue op; the training part of the graph then gets a mini-batch by running the queue's dequeue op. Note that you should run the data preparation and the training in different Python threads to get concurrency. Please have a look at the how-to on threading and queues for more explanation and examples.
Also note that the throughput of the data preparation + training pipeline is limited by its slowest stage. In your case, if preparing one mini-batch takes longer than one training step, you may have to run multiple threads that create mini-batches in parallel to keep up with the training thread.
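The producer/consumer pattern above can be sketched with just the standard library, independent of the TensorFlow queue API: worker threads call the slow loading function and put finished batches into a bounded queue, while the training loop pulls ready batches from it. Everything here (the `load_batch` stand-in, the thread count, the queue capacity) is a hypothetical placeholder, not code from the question.

```python
import queue
import threading
import time

def load_batch(i):
    # Stand-in for the slow numpy-loading function (hypothetical).
    time.sleep(0.01)
    return [i] * 4  # pretend this is a (batch_size, ...) array

def producer(batch_queue, num_batches, start, step):
    # Each worker prepares every `step`-th batch, offset by `start`.
    for i in range(start, num_batches, step):
        batch_queue.put(load_batch(i))  # blocks while the queue is full

def prefetch(num_batches, num_workers=2, capacity=4):
    # Bounded queue: workers stay at most `capacity` batches ahead.
    batch_queue = queue.Queue(maxsize=capacity)
    threads = [
        threading.Thread(target=producer,
                         args=(batch_queue, num_batches, w, num_workers),
                         daemon=True)
        for w in range(num_workers)
    ]
    for t in threads:
        t.start()
    for _ in range(num_batches):
        yield batch_queue.get()  # training thread consumes ready batches

if __name__ == "__main__":
    for batch in prefetch(8):
        pass  # run one training step per prefetched batch here
```

With more than one worker, batches can arrive out of order; if ordering matters, use a single worker or tag each batch with its index. The bounded `capacity` is what keeps memory in check while still letting preparation overlap with training.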