Fortran Programming Language

Speed of complex evaluation


Hello,

I received some code that computes the squared absolute value of a complex number like so:

   rz = real(z,fp)
   iz = aimag(z)
   z2 = rz*rz + iz*iz

at which point I thought: "wouldn't

   z2 = abs(z)**2

be faster?" I ran a quick test and the first method above is consistently faster than the
other (3-10x, depending on the compiler; no optimization; Linux RHEL 4.0)

Any gurus out there know a possible reason why?

cheers,

paulv

--
Paul van Delst             Ride lots.
CIMSS @ NOAA/NCEP/EMC               Eddy Merckx

With most compilers, unless you use an option like -ffast-math,
precautions will be taken against over/underflow in abs(), likely with
an external function call. Even if they are not, the unnecessary sqrt()
is time consuming.

Paul van Delst wrote:
> I received some code that computes the squared absolute value of a complex number like so:
>   rz = real(z,fp)
>   iz = aimag(z)
>   z2 = rz*rz + iz*iz
> at which point I thought: "wouldn't
>   z2 = abs(z)**2
> be faster?" I ran a quick test and the first method above is
> consistently faster than the other (3-10x, depending on the compiler;
> no optimization; Linux RHEL 4.0)

abs(z) is at least computed as sqrt(real(z)**2+aimag(z)**2),
with extra work to avoid overflow for large real(z) or
aimag(z) when abs(z) doesn't overflow.

So the latter method at least takes an extra square
root that isn't needed.  I believe abs(z) is actually
computed closer to

  abs(real(z)) * sqrt(1 + (aimag(z)/real(z))**2)

when abs(real(z)) is larger than abs(aimag(z)), or
the other way around when abs(aimag(z)) is larger.

-- glen

On Apr 11, 3:53 pm, Paul van Delst <Paul.vanDe@noaa.gov> wrote:

As well as the abs() difference, you might also check out what your
particular compiler(s) do w/ the "**2" as opposed to "*" w/o
optimization.  W/ BASIC in the olden days, replacing exponentiation w/
repeated multiplies was routine.

dpb wrote:

(snip)

> As well as the abs() difference, you might also check out what your
> particular compiler(s) do w/ the "**2" as opposed to "*" w/o
> optimization.  W/ BASIC in the olden days, replacing exponentiation w/
> repeated multiplies was routine.

Good question, but note that BASIC, at least traditionally, doesn't
have integer variables (or constants).

In Fortran X**2 and X**2.0 are very different.  The former is
done by multiplication, even if done through a subroutine call.
Even at low optimization levels, it is usual to do X**2 inline,
less usual as the power gets higher.

There is a convenient algorithm (binary, or "square-and-multiply",
exponentiation) that does integer exponentiation.  If I remember it
right, the number of multiplications needed is the number of bits in
the binary representation of the power (not including leading zeroes),
minus one, plus the number of '1' bits in the representation, minus
one.  (Add one divide for negative powers.)  It isn't always optimal,
but it is close.  (Note that X**27 takes seven multiplies this way,
but could be done in six with a better addition chain.)

-- glen

glen herrmannsfeldt wrote:

> In Fortran X**2 and X**2.0 are very different.  The former is
> done by multiplication, even if done through a subroutine call.
> Even at low optimization levels, it is usual to do X**2 inline,
> less usual as the power gets higher.

Widely quoted benchmarks, such as Polyhedron, encourage compilers to
optimize integral real powers as if written with integer powers.

Hello,

This is what I get with 3 different compilers on Linux (performing the calculation 10^9
times and writing the results to disk so they're not inadvertently optimised away... I hope)
for a bog-standard compile. Also note that the timing for the "z2(i) = rz*rz + iz*iz" case
includes the statements
   rz = real(z(i),fp)
   iz = aimag(z(i))
to get the complex number components.

Lahey f95
---------
lnx:/f90_Test/Generic_Test_Programs : lf95 --version
Lahey/Fujitsu Fortran 95 Express Release L6.20d

lnx:f90_Test/Generic_Test_Programs : lf95 test_zabs2.f90
Encountered 0 errors, 0 warnings in file test_zabs2.f90.
lnx:f90_Test/Generic_Test_Programs : a.out
  Elapsed time: 00:00:05.849 z2(i) = rz*rz + iz*iz
  Elapsed time: 00:00:29.817 z2(i) = abs(z(i))**2
  Elapsed time: 00:00:14.255 z2(i) = (sqrt(real(z(i),fp)**2+aimag(z(i))**2))**2

PGI f95 (old version)
---------------------
lnx:f90_Test/Generic_Test_Programs : pgf90 -V
pgf90 6.0-5 32-bit target on x86 Linux

lnx:f90_Test/Generic_Test_Programs : pgf90 test_zabs2.f90
lnx:f90_Test/Generic_Test_Programs : a.out
  Elapsed time: 00:00:06.371 z2(i) = rz*rz + iz*iz
  Elapsed time: 00:00:16.196 z2(i) = abs(z(i))**2
  Elapsed time: 00:00:16.004 z2(i) = (sqrt(real(z(i),fp)**2+aimag(z(i))**2))**2

g95
---
lnx:f90_Test/Generic_Test_Programs : g95 --version
G95 (GCC 4.0.3 (g95 0.91!) Mar 20 2007)

lnx:f90_Test/Generic_Test_Programs : g95 test_zabs2.f90
lnx:f90_Test/Generic_Test_Programs : a.out
  Elapsed time: 00:00:12.514 z2(i) = rz*rz + iz*iz
  Elapsed time: 00:01:42.810 z2(i) = abs(z(i))**2
  Elapsed time: 00:00:16.674 z2(i) = (sqrt(real(z(i),fp)**2+aimag(z(i))**2))**2

This suggests to me that sqrt() is to be avoided at all costs. Optimisation up to -O2 for
all three compilers only had an effect on the first method with g95: 6s instead of 12s. The very
long time for abs() in g95 is a bit strange (I assume it's an anomaly that is fixable).

cheers,

paulv

--
Paul van Delst             Ride lots.
CIMSS @ NOAA/NCEP/EMC               Eddy Merckx
