




Fortran Programming Language









Speed of complex evaluation
Hello, I received some code that compute the square of complex abs value like so: rz = real(z,fp) iz = aimag(z) z2 = rz*rz + iz*iz at which point I thought: "wouldn't z2 = abs(z)**2 be faster?" I ran a quick test and the first method about is consistently faster than the other (310x faster depending on compiler used; no optimization; linux RHE 4.0) Any gurus out there know possible reason why? cheers, paulv  Paul van Delst Ride lots. CIMSS @ NOAA/NCEP/EMC Eddy Merckx
Paul van Delst wrote: > Hello, > I received some code that compute the square of complex abs value like so: > rz = real(z,fp) > iz = aimag(z) > z2 = rz*rz + iz*iz > at which point I thought: "wouldn't > z2 = abs(z)**2 > be faster?" I ran a quick test and the first method about is > consistently faster than the other (310x faster depending on compiler > used; no optimization; linux RHE 4.0) > Any gurus out there know possible reason why?
With most compilers, unless you use an option like ffastmath, precautions will be taken against over/underflow in abs(), likely with an external function call. Even if they are not, the unnecessary sqrt() is time consuming.
Paul van Delst wrote: > I received some code that compute the square of complex abs value like so: > rz = real(z,fp) > iz = aimag(z) > z2 = rz*rz + iz*iz > at which point I thought: "wouldn't > z2 = abs(z)**2 > be faster?" I ran a quick test and the first method about is > consistently faster than the other (310x faster depending on compiler > used; no optimization; linux RHE 4.0)
abs(z) is at least computed as sqrt(real(z)**2+aimag(z)**2), with extra work to avoid overflow for large real(z) or aimag(z) when abs(z) doesn't overflow. So the latter method at least takes an extra square root that isn't needed. I believe it is actually closer to real(z)*sqrt(1+(iamag(z)/real(z))**2) when abs(real(z)) is larger than abs(aimag(z)), or the other way around when abs(aimag(z)) is larger.  glen
On Apr 11, 3:53 pm, Paul van Delst <Paul.vanDe@noaa.gov> wrote:
> Hello, > I received some code that compute the square of complex abs value like so: > rz = real(z,fp) > iz = aimag(z) > z2 = rz*rz + iz*iz > at which point I thought: "wouldn't > z2 = abs(z)**2 > be faster?" I ran a quick test and the first method about is consistently faster than the > other (310x faster depending on compiler used; no optimization; linux RHE 4.0) > Any gurus out there know possible reason why?
As well as the abs() difference, you might also check out what your particular compiler(s) do w/ the "**2" as opposed to "*" w/o optimization. W/ BASIC in the olden days, replacing exponentiation w/ repeated multiplies was routine.
dpb wrote:
(snip) > As well as the abs() difference, you might also check out what your > particular compiler(s) do w/ the "**2" as opposed to "*" w/o > optimization. W/ BASIC in the olden days, replacing exponentiation w/ > repeated multiplies was routine.
Good question, but note that BASIC, at least traditionally, doesn't have integer variables (or constants). In Fortran X**2 and X**2.0 are very different. The former is done by multiplication, even if done through a subroutine call. Even at low optimization levels, it is usual to do X**2 inline, less usual as the power gets higher. There is a convenient algorithm that does integer exponentiation. If I remember it right, the number of multiplications needed is one less than the sum of the number of bits in the binary representation of the power (not including leading zeroes) plus the number of '1' bits in the representation. (Add one divide for negative powers.) It isn't always optimal, but it is close. (Note that X**27 takes seven multiplies, but could be done in six.)  glen
glen herrmannsfeldt wrote: > In Fortran X**2 and X**2.0 are very different. The former is > done by multiplication, even if done through a subroutine call. > Even at low optimization levels, it is usual to do X**2 inline, > less usual as the power gets higher.
Widely quoted benchmarks, such as Polyhedron, encourage compilers to optimize integral real powers as if written with integer powers.
glen herrmannsfeldt wrote: > Paul van Delst wrote: >> I received some code that compute the square of complex abs value like >> so: >> rz = real(z,fp) >> iz = aimag(z) >> z2 = rz*rz + iz*iz >> at which point I thought: "wouldn't >> z2 = abs(z)**2 >> be faster?" I ran a quick test and the first method about is >> consistently faster than the other (310x faster depending on compiler >> used; no optimization; linux RHE 4.0) > abs(z) is at least computed as sqrt(real(z)**2+aimag(z)**2), > with extra work to avoid overflow for large real(z) or > aimag(z) when abs(z) doesn't overflow.
Hello, This is what I get with 3 different compilers on linux (performing the calculation 10^9 times writing the results to disk so they're not inadvertently optimised away...I hope) for a bogstandard compile. Also note that the timing for the "z2(i) = rz*rz + iz*iz" case includes the statements rz = real(z(i),fp) iz = aimag(z(i)) to get the complex number components. Lahey f95  lnx:/f90_Test/Generic_Test_Programs : lf95 version Lahey/Fujitsu Fortran 95 Express Release L6.20d lnx:f90_Test/Generic_Test_Programs : lf95 test_zabs2.f90 Encountered 0 errors, 0 warnings in file test_zabs2.f90. lnx:f90_Test/Generic_Test_Programs : a.out Elapsed time: 00:00:05.849 z2(i) = rz*rz + iz*iz Elapsed time: 00:00:29.817 z2(i) = abs(z(i))**2 Elapsed time: 00:00:14.255 z2(i) = (sqrt(real(z(i),fp)**2+aimag(z(i))**2))**2 PGI f95 (old version)  lnx:f90_Test/Generic_Test_Programs : pgf90 V pgf90 6.05 32bit target on x86 Linux lnx:f90_Test/Generic_Test_Programs : pgf90 test_zabs2.f90 lnx:f90_Test/Generic_Test_Programs : a.out Elapsed time: 00:00:06.371 z2(i) = rz*rz + iz*iz Elapsed time: 00:00:16.196 z2(i) = abs(z(i))**2 Elapsed time: 00:00:16.004 z2(i) = (sqrt(real(z(i),fp)**2+aimag(z(i))**2))**2 g95  lnx:f90_Test/Generic_Test_Programs : g95 version G95 (GCC 4.0.3 (g95 0.91!) Mar 20 2007) lnx:f90_Test/Generic_Test_Programs : g95 test_zabs2.f90 lnx:f90_Test/Generic_Test_Programs : a.out Elapsed time: 00:00:12.514 z2(i) = rz*rz + iz*iz Elapsed time: 00:01:42.810 z2(i) = abs(z(i))**2 Elapsed time: 00:00:16.674 z2(i) = (sqrt(real(z(i),fp)**2+aimag(z(i))**2))**2 This suggest to me that sqrt() is to be avoided at all costs. Optimisation up to O2 for all three only had an effect on the first method in g95. 6s instead of 12s. The very long time for use of abs() in g95 is a bit strange (I assume it's an anomaly that is fixable) cheers, paulv  Paul van Delst Ride lots. CIMSS @ NOAA/NCEP/EMC Eddy Merckx





