Fast Fourier Transform and rgb2gray (again)

Note: My thesis is done. However, there are still some things which I find of interest to address here. Due to how our thesis was (mis)handled, I’m only finding the time to post about these things now.

Last time, I told you how we are porting all our Matlab/Octave code to Java. I wrote about a huge difference between Matlab and Octave’s implementation of rgb2gray. However, as it became apparent to me, my issues with rgb2gray are not quite done yet.

First a little background. Our project relies heavily on the Fast Fourier Transform, a numerical algorithm whose products I bet you encounter everyday though you probably are not aware of it. The 2D FFT is built-in with Octave, as expected for a language meant specifically for mathematical processing. The same, however, cannot be said for Java.

It’d be crazy for me to attempt to code it from scratch due to a number of reasons not the least of which is the fact that I don’t really have a solid training with numerical algorithms. So, after browsing through various libraries, we decided to use JTransforms, largely because we can see the source code for ourselves and it is natively in Java, in contrast to, say, FFTW, which has Java bindings but is natively in C. Plus, it does not have power-of-2 constraints, a common trait among various implementations of the FFT.

Now, the thing is, there seems to be quite a considerable offset between the FFT results of Octave¹ and JTransforms; the kind of offset which you’d be so tempted to chalk up to floating-point handling differences though some part of your reason tells you it is too large to be so. The offset I was seeing ranged from 2 to 12 integer values.

As I admitted earlier, I don’t have any solid grounding with numerical algorithms. The way I see it, there are a number of possible culprits on why I’m not getting the results I should get. The FFT, after all, is a blanket term for a family of algorithms which all do the same thing, namely, apply the Fourier Transform to a discrete collection of values (i.e., vectors or matrices).

What I’ve Tried (Or, the scientific probable reason on why there is this offset, as discussed by my adviser)

Warning: Some equations ahead. I wouldn’t be delving too much into the equations and you should be able to understand my main point with just some basic algebra. But just so you know.

Consider this tutorial on the FFT.

It defines the 2D FFT as:

$F[k,l] = \frac{1}{\sqrt{MN}}\sum_{n=0}^{N-1}\sum_{m=0}^{M-1}f[m,n]e^{-2\pi i(\frac{mk}{M}+\frac{nl}{N})}$ $f[m,n]= \frac{1}{\sqrt{MN}}\sum_{l=0}^{N-1}\sum_{k=0}^{M-1}F[k,l]e^{2\pi i(\frac{mk}{M}+\frac{nl}{N})}$ $0 \leq m,k \leq M-1, 0 \leq n,l \leq N-1$

where M and N is the number of samples in the x and y direction (or, for our particular problem, the dimensions of the image).

While it may seem interesting to find mutual recursion here, what my adviser told me to note is the constant at the beginning of the equations, $\frac{1}{\sqrt{MN}}$ . According to him, depending on the implementation of the FFT we are using (remember what I told you above that Fast Fourier Transform is just a blanket term for a family of algorithms), the constant varies to either $\frac{1}{\sqrt{MN}}$ or $\frac{1}{MN}$ . It’s now just a matter of scaling the samples (the pixel values) by the appropriate constant to get the desired results.

The tutorial I linked to presents the following sample signal to demonstrate the FFT:

which should evaluate to the following real and imaginary component respectively

The following code listing directly translates the above example to Octave code:

x_real = zeros(8, 8);
x_real(2,3) = 70;
x_real(2,4) = 80;
x_real(2,5) = 90;
x_real(3,3) = 90;
x_real(3,4) = 100;
x_real(3,5) = 110;
x_real(4,3) = 110;
x_real(4,4) = 120;
x_real(4,5) = 130;
x_real(5,3) = 130;
x_real(5,4) = 140;
x_real(5,5) = 150
x_fft = fft2(x_real);
x_r = real(x_fft)
x_i = imag(x_fft)

Which results to the following output:

x_real =
 
     0     0     0     0     0     0     0     0
     0     0    70    80    90     0     0     0
     0     0    90   100   110     0     0     0
     0     0   110   120   130     0     0     0
     0     0   130   140   150     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
 
x_r =
 
   1.3200e+03  -7.9113e+02   8.0000e+01  -1.6887e+02   4.4000e+02  -1.6887e+02   8.0000e+01  -7.9113e+02
  -5.0485e+02  -9.0711e+01   2.2142e+02   1.0586e+02  -1.6828e+02   1.2721e+01  -2.6142e+02   6.8527e+02
   1.2000e+02   0.0000e+00  -4.0000e+01  -2.3431e+01   4.0000e+01   0.0000e+00   4.0000e+01  -1.3657e+02
  -3.3515e+02   1.3414e+02   2.1421e+01   5.0711e+01  -1.1172e+02   3.4731e+01  -6.1421e+01   2.6728e+02
   1.2000e+02  -6.8284e+01   0.0000e+00  -1.1716e+01   4.0000e+01  -1.1716e+01   0.0000e+00  -6.8284e+01
  -3.3515e+02   2.6728e+02  -6.1421e+01   3.4731e+01  -1.1172e+02   5.0711e+01   2.1421e+01   1.3414e+02
   1.2000e+02  -1.3657e+02   4.0000e+01   0.0000e+00   4.0000e+01  -2.3431e+01  -4.0000e+01   0.0000e+00
  -5.0485e+02   6.8527e+02  -2.6142e+02   1.2721e+01  -1.6828e+02   1.0586e+02   2.2142e+02  -9.0711e+01
 
x_i =
 
     0.00000  -711.12698   440.00000    88.87302     0.00000   -88.87302  -440.00000   711.12698
  -724.26407   713.55339  -216.56854    55.56349  -241.42136   134.14214   120.00000   158.99495
   120.00000  -136.56854    40.00000     0.00000    40.00000   -23.43146   -40.00000     0.00000
  -124.26407   255.56349  -120.00000    -6.44661   -41.42136    38.99495   103.43146  -105.85786
     0.00000   -68.28427    40.00000    11.71573     0.00000   -11.71573   -40.00000    68.28427
   124.26407   105.85786  -103.43146   -38.99495    41.42136     6.44661   120.00000  -255.56349
  -120.00000    -0.00000    40.00000    23.43146   -40.00000    -0.00000   -40.00000   136.56854
   724.26407  -158.99495  -120.00000  -134.14214   241.42136   -55.56349   216.56854  -713.55339

x_real = 0 0 0 0 0 0 0 0 0 0 70 80 90 0 0 0 0 0 90 100 110 0 0 0 0 0 110 120 130 0 0 0 0 0 130 140 150 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 x_r = 1.3200e+03 -7.9113e+02 8.0000e+01 -1.6887e+02 4.4000e+02 -1.6887e+02 8.0000e+01 -7.9113e+02 -5.0485e+02 -9.0711e+01 2.2142e+02 1.0586e+02 -1.6828e+02 1.2721e+01 -2.6142e+02 6.8527e+02 1.2000e+02 0.0000e+00 -4.0000e+01 -2.3431e+01 4.0000e+01 0.0000e+00 4.0000e+01 -1.3657e+02 -3.3515e+02 1.3414e+02 2.1421e+01 5.0711e+01 -1.1172e+02 3.4731e+01 -6.1421e+01 2.6728e+02 1.2000e+02 -6.8284e+01 0.0000e+00 -1.1716e+01 4.0000e+01 -1.1716e+01 0.0000e+00 -6.8284e+01 -3.3515e+02 2.6728e+02 -6.1421e+01 3.4731e+01 -1.1172e+02 5.0711e+01 2.1421e+01 1.3414e+02 1.2000e+02 -1.3657e+02 4.0000e+01 0.0000e+00 4.0000e+01 -2.3431e+01 -4.0000e+01 0.0000e+00 -5.0485e+02 6.8527e+02 -2.6142e+02 1.2721e+01 -1.6828e+02 1.0586e+02 2.2142e+02 -9.0711e+01 x_i = 0.00000 -711.12698 440.00000 88.87302 0.00000 -88.87302 -440.00000 711.12698 -724.26407 713.55339 -216.56854 55.56349 -241.42136 134.14214 120.00000 158.99495 120.00000 -136.56854 40.00000 0.00000 40.00000 -23.43146 -40.00000 0.00000 -124.26407 255.56349 -120.00000 -6.44661 -41.42136 38.99495 103.43146 -105.85786 0.00000 -68.28427 40.00000 11.71573 0.00000 -11.71573 -40.00000 68.28427 124.26407 105.85786 -103.43146 -38.99495 41.42136 6.44661 120.00000 -255.56349 -120.00000 -0.00000 40.00000 23.43146 -40.00000 -0.00000 -40.00000 136.56854 724.26407 -158.99495 -120.00000 -134.14214 241.42136 -55.56349 216.56854 -713.55339

As evident, the results are very far from what our source says. However, dividing the sample (matrix x_r) by 8 (cf sample size) produces the results as given by our source.

Add the following line before calling fft2

x_real /= 8;

And we get,

 
x_r =
 
   165.00000   -98.89087    10.00000   -21.10913    55.00000   -21.10913    10.00000   -98.89087
   -63.10660   -11.33883    27.67767    13.23223   -21.03553     1.59010   -32.67767    85.65864
    15.00000     0.00000    -5.00000    -2.92893     5.00000     0.00000     5.00000   -17.07107
   -41.89340    16.76777     2.67767     6.33883   -13.96447     4.34136    -7.67767    33.40990
    15.00000    -8.53553     0.00000    -1.46447     5.00000    -1.46447     0.00000    -8.53553
   -41.89340    33.40990    -7.67767     4.34136   -13.96447     6.33883     2.67767    16.76777
    15.00000   -17.07107     5.00000     0.00000     5.00000    -2.92893    -5.00000     0.00000
   -63.10660    85.65864   -32.67767     1.59010   -21.03553    13.23223    27.67767   -11.33883
 
x_i =
 
    0.00000  -88.89087   55.00000   11.10913    0.00000  -11.10913  -55.00000   88.89087
  -90.53301   89.19417  -27.07107    6.94544  -30.17767   16.76777   15.00000   19.87437
   15.00000  -17.07107    5.00000    0.00000    5.00000   -2.92893   -5.00000    0.00000
  -15.53301   31.94544  -15.00000   -0.80583   -5.17767    4.87437   12.92893  -13.23223
    0.00000   -8.53553    5.00000    1.46447    0.00000   -1.46447   -5.00000    8.53553
   15.53301   13.23223  -12.92893   -4.87437    5.17767    0.80583   15.00000  -31.94544
  -15.00000   -0.00000    5.00000    2.92893   -5.00000   -0.00000   -5.00000   17.07107
   90.53301  -19.87437  -15.00000  -16.76777   30.17767   -6.94544   27.07107  -89.19417

Which, at last, agrees with our source.

The Blooper (or, the actual and embarassing reason why JTransforms FFT is not coinciding with Octave FFT)

In my first post about rgb2gray, I mentioned that I was taking the floor of the average of the RGB components of a pixel (or, in Java code, just perform integer division as the resulting precision loss is tantamount to floor). Big mistake. Octave does not merely floor values. It rounds off values.

In code, so there be no misunderstandings,

public static int[][] rgb2gray(BufferedImage bi) {
  int heightLimit = bi.getHeight();
  int widthLimit = bi.getWidth();
  int[][] converted = new int[heightLimit][widthLimit];
 
  for (int height = 0; height < heightLimit; height++) {
    for (int width = 0; width < widthLimit; width++) {
      Color c = new Color(bi.getRGB(width, height) & 0x00ffffff);
      float rgbSum = c.getRed() + c.getGreen() + c.getBlue();
      float ave = (rgbSum / 3f);
      float diff = ave % 1f;
      int iave = diff < 0.5f ? (int) ave : (int) (ave + 1);
      converted[height][width] = iave;
    }
  }
 
  return converted;
}

The moral of this long story? If two supposedly-similar functions are returning different outputs, check that the inputs are exactly the same before jumping to any wild conclusions.

I’d like to note here that, according to the help file, Octave’s FFT is based on FFTW [↩]

{kode.play();}