TAGS :Viewed: 2 - Published at: a few seconds ago

[ result difference between scipy.fftpack.fft2 and cufft ]

Now, I am porting my python script to CUDA program. In my python script, scipy.fftpack.fft2 is used. To validate the results of cufft, I wrote the sample program using cufft. However, it seems that there were differences between scipy.fftpack.fft2 and cufft.

Is there any suggestion?

python script:

def test2():
   g = [18,19,19,23,24,24,23,24,24]
   g = numpy.array(g)
   g.shape = [3,3]
   G = fft2(g)

   print "---------------"
   print g
   print G

results of python script:

    [[18 19 19]
     [23 24 24]
     [23 24 24]]
    [[ 198.+0.j   -3.+0.j   -3.+0.j]
     [ -15.+0.j    0.+0.j    0.+0.j]
     [ -15.+0.j    0.+0.j    0.+0.j]]

cuda program:

        cufftHandle plan;
        int nRows = 3;
        int nCols = 3;
        cufftPlan2d(&plan, nRows, nCols, CUFFT_R2C);
        float h_in[9] = {18,19,19,23,24,24,23,24,24};
        float* d_in;
        cudaMalloc(&d_in, sizeof(cufftComplex)*9); 
        cufftComplex* d_freq;
        cudaMalloc(&d_freq, sizeof(cufftComplex)*9); 
        cudaMemcpy(d_in,h_in,sizeof( cufftComplex)*9,cudaMemcpyHostToDevice);
        cufftExecR2C(inverse_plan, d_in, d_freq);
        cufftComplex* h_freq = (float2*)malloc(sizeof( cufftComplex)*9);    
        cudaMemcpy(h_freq,d_freq,sizeof( cufftComplex)*9,cudaMemcpyDeviceToHost);
        for(int i=0; i<9; i++) {
        printf("%i %f %f\n", i, h_freq[i].x, h_freq[i].y);

results of cuda program:

0 198.000000 -0.000001
1 -2.999996 -0.000001
2 -15.000000 0.000000
3 -0.000000 0.000000
4 -15.000000 0.000000
5 -0.000000 0.000000
6 497922732955248410000000000000.000000 8589934592.000000
7 572199135312371230000000000000.000000 8589934592.000000
8 -0.000000 0.000000

Answer 1

I am no cufft expert, but the naming kind of gives away what's going on:

  • In numpy you are running a full 2D FFT. Because your input is real, the output is symmetrical, as you can see: the last item in every row (or column) is equal to the previous one.

  • You can take advantage of this to run the FFT faster, and in numpy this is implemented with the rfft2 function:

    >>> np.fft.rfft2(g)
    array([[ 198.+0.j,   -3.+0.j],
           [ -15.+0.j,    0.+0.j],
           [ -15.+0.j,    0.+0.j]])
  • My guess is that the R2C in your CUFFT_R2C plan name means 'real to complex', so you are asking for the equivalent of np.rfft2. And if you set aside the unused last 3 items of your array, the results are almost identical, aside from rounding errors, and the fact that your CUDA implementation is working with 32 bit floats, instead of the 64 that numpy willl be using by default.

  • A quick google search reveals that CUFFT_C2C and cufftExecR2C are valid cufft identifiers. Using those should yield the correct result you are after. For an even closer reproduction, refactor your code and go with the Z2Z versions, which work on double, not float.