[ Efficient way to drop a column from a Numpy array? ]

If I have a very large numpy array with one useless column, how could I drop it without creating a copy of the original array?

np.delete(my_np_array, 0, 1)

The above code will return a copy of the array without the zero-th column. But instead I would like to simply delete that column from my_np_array since I don't need it. For very large datasets, the memory management becomes important and copying may not be an option.
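A quick way to confirm that the result is a copy, as a minimal sketch (the small array here is only a stand-in for the real data):

```python
import numpy as np

# Small stand-in for the "very large" array from the question.
my_np_array = np.arange(12.0).reshape(3, 4)

# np.delete builds a new array; the original is left untouched.
without_first = np.delete(my_np_array, 0, 1)

print(without_first.shape)                           # (3, 3)
print(np.shares_memory(without_first, my_np_array))  # False
```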

Answer 1


If memory is the main concern, you can move columns around within your array so that the unneeded column ends up at the very end, then use ndarray.resize, which modifies the array in place, to shrink it and discard the last column.

You cannot simply remove the first column of an array in-place using the provided API, and I suspect it is because of the memory layout of an ndarray that maps multidimensional indexing to unidimensional byte-oriented addressing within blocks of contiguous memory.
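To make the layout point concrete, here is a small sketch (the array and its dtype are arbitrary choices for illustration) showing how a default row-major array maps indices to byte offsets:

```python
import numpy as np

A = np.arange(6, dtype=np.int64).reshape(2, 3)  # row-major (C order) by default

# Strides are the byte steps per axis: 24 bytes to the next row, 8 to the next column.
print(A.strides)            # (24, 8)

# The buffer holds the rows back to back, so column 0 (values 0 and 3)
# is scattered through memory rather than stored as one removable block.
print(A.ravel(order="K"))   # [0 1 2 3 4 5]
```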

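The following example copies the last column into the first and then deletes the last (now unneeded) one, releasing the associated memory. So it removes the obsolete column from memory completely, at the cost of changing your column order. Note that resize truncates the underlying buffer in memory order, so this drops the last column only if the array is stored column-major (Fortran order); on a default row-major array the remaining values would be reshuffled across rows instead.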

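D1, D2 = A.shape
A[:, 0] = A[:, D2-1]  # overwrite the unwanted first column with the last one
A.resize((D1, D2-1), refcheck=False)  # shrink in place; refcheck=False skips the reference-count check
A.shape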
# => would be (5, 4) if the shape was initially (5, 5) for example
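As a self-contained sketch of the same idea, the snippet below wraps the steps in a hypothetical helper (drop_first_column_inplace is not a NumPy function) and assumes the array is Fortran-ordered and owns its data. The copy(order="F") call is there only to build a test array; in practice the data would have to be created column-major in the first place.

```python
import numpy as np

def drop_first_column_inplace(A):
    """Hypothetical helper: overwrite column 0 with the last column, then shrink.

    Assumes A is Fortran-contiguous and owns its data.
    """
    if not A.flags.f_contiguous:
        raise ValueError("expected a column-major (Fortran-ordered) array")
    D1, D2 = A.shape
    A[:, 0] = A[:, D2 - 1]
    # refcheck=False skips the reference-count check; only safe when no
    # other array or view still relies on the old buffer.
    A.resize((D1, D2 - 1), refcheck=False)

original = np.arange(25.0).reshape(5, 5)
A = original.copy(order="F")   # fresh column-major array that owns its data
drop_first_column_inplace(A)

print(A.shape)                                        # (5, 4)
print(np.array_equal(A, original[:, [4, 1, 2, 3]]))   # True
```

The remaining columns are the old last column followed by the old columns 1 to 3, which is the reordering the answer warns about.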

Answer 2


If you use slicing, numpy won't make a copy; in other words:

import numpy

a = numpy.array([1, 2, 3, 4, 5])
b = a[1:]  # view elements from second to last, NOT making a copy
b[0] = 12  # Change first element of `b`, i.e. second of `a`
print(a)

will print [ 1 12  3  4  5]

If you need to delete an element in the middle, however, a single slice won't work.
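Applied to the question's case of an unwanted first column, a sketch with a small stand-in array might look like this:

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)

# Basic slicing returns a view: no data is copied.
B = A[:, 1:]          # everything except the first column

print(B.shape)                  # (3, 3)
print(np.shares_memory(A, B))   # True
```

Because B shares A's buffer, the dropped column still occupies memory for as long as the view is alive; slicing avoids the copy but does not free anything.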

Answer 3


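NumPy arrays have a fixed size once allocated (the elements are mutable, but the block of memory is not designed to grow or shrink), so in general they can't be resized without creating an intermediate copy. See also: How to remove specific elements in a numpy array. Creating a view with slicing and then making a copy of that view is probably the fastest you can do.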

In [804]: a = np.ones((2,2))

In [805]: a
Out[805]:
array([[ 1.,  1.],
       [ 1.,  1.]])

In [806]: np.resize(a,(3,2))
Out[806]:
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

In [807]: a  # unchanged: np.resize returned a new array rather than resizing a in place
Out[807]:
array([[ 1.,  1.],
       [ 1.,  1.]])
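Note that the session above calls the function np.resize, which always returns a new array. The method ndarray.resize behaves differently, as this short sketch shows (refcheck=False is used here only so the snippet runs the same way inside interactive shells):

```python
import numpy as np

a = np.ones((2, 2))

# The function np.resize returns a new array and leaves `a` alone.
b = np.resize(a, (3, 2))
print(a.shape, b.shape)   # (2, 2) (3, 2)

# The method ndarray.resize changes the array itself; new entries are zero-filled.
a.resize((3, 2), refcheck=False)
print(a.shape)            # (3, 2)
print(a[2])               # [0. 0.]
```

Whether the underlying reallocation moves the data to a new block is up to the memory allocator, so this is not a guarantee that no copying happens internally.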