
[ Speed up minimum search in Numpy/Python ]

I have two floating-point arrays and want to find the data points that match within a certain range. This is what I have so far:

import numpy as np

for vx in range(len(arr1)):
    match = (np.abs(arr2-arr1[vx])).argmin()
    if abs(arr1[vx]-arr2[match])<0.375:
        point = arr2[match]

The problem is that arr1 contains about 150,000 elements and arr2 around 110,000. This takes an awfully long time. Do you have any suggestions to speed things up?

Answer 1

In addition to not being vectorized, your current search is O(n*m), where n is the size of arr2 and m is the size of arr1. In these kinds of searches it helps to sort arr2 so you can use a binary search for each element of arr1. Sorting ends up being the slowest step, but it still pays off when m is large, because an O(n*log(n)) sort plus O(m*log(n)) lookups is faster than O(n*m).

Here is how you can do the search in a vectorized way using the sorted array:

import numpy as np

def find_closest(A, target):
    # A must be sorted in ascending order
    idx = A.searchsorted(target)           # index of first element >= each target
    idx = np.clip(idx, 1, len(A) - 1)      # keep idx-1 and idx in bounds
    left = A[idx - 1]
    right = A[idx]
    idx -= target - left < right - target  # step back where the left neighbour is closer
    return A[idx]

closest = find_closest(arr2, arr1)  # arr2 must be sorted first
closest = np.where(np.abs(closest - arr1) < 0.375, closest, np.nan)
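As a quick sanity check, here is a self-contained run of find_closest on made-up data (the tiny arr1 and arr2 below are illustrative, not the asker's real arrays):

```python
import numpy as np

def find_closest(A, target):
    # A must be sorted in ascending order
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A) - 1)
    left = A[idx - 1]
    right = A[idx]
    idx -= target - left < right - target
    return A[idx]

arr2 = np.sort(np.array([5.0, 0.0, 2.0, 1.0]))  # sort before searching
arr1 = np.array([0.2, 1.6, 4.9])

closest = find_closest(arr2, arr1)
# closest is [0.0, 2.0, 5.0]; only 0.2 and 4.9 are within 0.375 of a neighbour
matched = np.where(np.abs(closest - arr1) < 0.375, closest, np.nan)
```

Every element of arr1 is resolved in one vectorized pass; the non-matches become NaN so the result stays aligned with arr1.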

Answer 2

The whole idea of using numpy is to avoid computation with Python-level loops.

Extracting a new array of elements that satisfy some criterion is easy with array computation. Here's an example extracting the values from array a whose absolute difference from the corresponding element of array b is less than 0.75:

import numpy as np

a = np.array([1, 0, 0.5, 1.2])

b = np.array([1.2, 1.1, 1.3, 1.4])

c = a[np.abs(a - b) < 0.75]

Which gives us

array([ 1. ,  1.2])
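The same boolean-mask idea can also recover the positions of the matches, which is often what you want when the values themselves come from another array. A minimal sketch, reusing the tiny illustrative a and b from above:

```python
import numpy as np

a = np.array([1, 0, 0.5, 1.2])
b = np.array([1.2, 1.1, 1.3, 1.4])

mask = np.abs(a - b) < 0.75  # element-wise criterion, no Python loop
c = a[mask]                  # values of a meeting the criterion
idx = np.nonzero(mask)[0]    # their indices, usable to index b as well
```

Here mask is [True, False, False, True], so c is [1.0, 1.2] and idx is [0, 3].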