[ result difference between scipy.fftpack.fft2 and cufft ]
Now, I am porting my python script to CUDA program. In my python script, scipy.fftpack.fft2 is used. To validate the results of cufft, I wrote the sample program using cufft. However, it seems that there were differences between scipy.fftpack.fft2 and cufft.
Is there any suggestion?
python script:
def test2():
g = [18,19,19,23,24,24,23,24,24]
g = numpy.array(g)
g.shape = [3,3]
G = fft2(g)
print "---------------"
print g
print G
return
results of python script:
---------------
[[18 19 19]
[23 24 24]
[23 24 24]]
[[ 198.+0.j -3.+0.j -3.+0.j]
[ -15.+0.j 0.+0.j 0.+0.j]
[ -15.+0.j 0.+0.j 0.+0.j]]
cuda program:
cufftHandle plan;
int nRows = 3;
int nCols = 3;
cufftPlan2d(&plan, nRows, nCols, CUFFT_R2C);
float h_in[9] = {18,19,19,23,24,24,23,24,24};
float* d_in;
cudaMalloc(&d_in, sizeof(cufftComplex)*9);
cufftComplex* d_freq;
cudaMalloc(&d_freq, sizeof(cufftComplex)*9);
cudaMemcpy(d_in,h_in,sizeof( cufftComplex)*9,cudaMemcpyHostToDevice);
cufftExecR2C(inverse_plan, d_in, d_freq);
cufftComplex* h_freq = (float2*)malloc(sizeof( cufftComplex)*9);
cudaMemcpy(h_freq,d_freq,sizeof( cufftComplex)*9,cudaMemcpyDeviceToHost);
for(int i=0; i<9; i++) {
printf("%i %f %f\n", i, h_freq[i].x, h_freq[i].y);
}
results of cuda program:
0 198.000000 -0.000001
1 -2.999996 -0.000001
2 -15.000000 0.000000
3 -0.000000 0.000000
4 -15.000000 0.000000
5 -0.000000 0.000000
6 497922732955248410000000000000.000000 8589934592.000000
7 572199135312371230000000000000.000000 8589934592.000000
8 -0.000000 0.000000
Answer 1
I am no cufft expert, but the naming kind of gives away what's going on:
In numpy you are running a full 2D FFT. Because your input is real, the output is symmetrical, as you can see: the last item in every row (or column) is equal to the previous one.
You can take advantage of this to run the FFT faster, and in numpy this is implemented with the
rfft2
function:>>> np.fft.rfft2(g) array([[ 198.+0.j, -3.+0.j], [ -15.+0.j, 0.+0.j], [ -15.+0.j, 0.+0.j]])
My guess is that the
R2C
in yourCUFFT_R2C
plan name means 'real to complex', so you are asking for the equivalent ofnp.rfft2
. And if you set aside the unused last 3 items of your array, the results are almost identical, aside from rounding errors, and the fact that your CUDA implementation is working with 32 bit floats, instead of the 64 that numpy willl be using by default.A quick google search reveals that
CUFFT_C2C
andcufftExecR2C
are valid cufft identifiers. Using those should yield the correct result you are after. For an even closer reproduction, refactor your code and go with theZ2Z
versions, which work ondouble
, notfloat
.