Optimize scaleFFTData for float FFTs

BUG=

Speed up scaleFFTData by about 30% by doing the scaling on 4
complex (8 float) elements at a time.
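
As an illustration only, the idea of scaling 4 complex (8 float) elements per
iteration can be sketched in portable C. This is not the code from this
change; the function names scale_fft_data_scalar and scale_fft_data_by4, the
interleaved real/imaginary layout, and the scalar tail loop are all
assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference version: scales one float per iteration.
 * Assumes n_complex interleaved (re, im) pairs, i.e. 2 * n_complex floats. */
static void scale_fft_data_scalar(float *data, size_t n_complex, float scale) {
    for (size_t i = 0; i < 2 * n_complex; ++i) {
        data[i] *= scale;
    }
}

/* Hypothetical blocked version: scales 4 complex values (8 floats) per
 * iteration, then finishes any remainder one float at a time. */
static void scale_fft_data_by4(float *data, size_t n_complex, float scale) {
    assert(data != NULL || n_complex == 0);

    size_t n_floats = 2 * n_complex;
    size_t i = 0;

    /* Main loop: one block of 8 floats per pass. */
    for (; i + 8 <= n_floats; i += 8) {
        data[i + 0] *= scale;
        data[i + 1] *= scale;
        data[i + 2] *= scale;
        data[i + 3] *= scale;
        data[i + 4] *= scale;
        data[i + 5] *= scale;
        data[i + 6] *= scale;
        data[i + 7] *= scale;
    }

    /* Tail loop: handles sizes that are not a multiple of 4 complex values. */
    for (; i < n_floats; ++i) {
        data[i] *= scale;
    }
}
```

Both versions produce the same results; the blocked form only reduces loop
overhead and gives a vectorizing compiler or hand-written SIMD code a natural
unit of work.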

Timing measurements taken with perf while running
time_fft_time -T -F -f 1 -n 11 -g 2 -c 1000000

Before optimization:

             samples  pcnt function                               DSO
             _______ _____ ______________________________________ _____________

             2364.00 25.9% evenOddButterflyLoopInv                [vectors]
             1957.00 21.4% radix4SetLoopINV                       [vectors]
             1197.00 13.1% radix4SkipReadINV                      [vectors]
             1009.00 11.0% scaleFFTData                           [vectors]

After optimization:

             samples  pcnt function                               DSO
             _______ _____ ______________________________________ _____________

             3806.00 25.9% evenOddButterflyLoopInv                [vectors]
             3523.00 23.9% radix4SetLoopINV                       [vectors]
             2103.00 14.3% radix4SkipReadINV                      [vectors]
             1471.00 10.0% radix4lsGrpLoopinv                     [vectors]
             1134.00  7.7% scaleFFTData                           [vectors]

The share of time spent in scaleFFTData has gone down from 11% to 7.7%,
a relative reduction of about 30%.

R=aedla@chromium.org, andrew@webrtc.org, kma@webrtc.org

Review URL: https://webrtc-codereview.appspot.com/1574005

git-svn-id: http://webrtc.googlecode.com/svn/deps/third_party/openmax@4148 4adac7df-926f-26a2-2b94-8c16560cd09d
1 file changed