
Vector-valued functions slow?


Dear newsgroup,

I encountered some problems with vector-valued functions in F90/F95.
Here is a simple demo program:

program quick
   implicit none

   integer, parameter :: dp=8
   real(dp), dimension(:), pointer :: u,v,r
   real :: tt0,tt1
   integer :: i,n

   n=1000000

   allocate(u(n),v(n),r(n))

   u=2; v=2

   call cpu_time(tt0)
   do i=1,1000
     r=feval(u,v)
   end do
   call cpu_time(tt1)
   print *, tt1-tt0

   call cpu_time(tt0)
   do i=1,1000
     call seval(u,v,r)
   end do
   call cpu_time(tt1)
   print *, tt1-tt0

   deallocate(u,v,r)

contains

   function feval(u,v) result(r)
     real(dp), dimension(:), intent(in) :: u,v
     real(dp), dimension(size(u)) :: r

     r=u*v
   end function feval

   subroutine seval(u,v,r)
     real(dp), dimension(:), intent(in) :: u,v
     real(dp), dimension(:), intent(out) :: r

     r=u*v
   end subroutine seval

end program quick

I compiled it with both Ifort (9.1) and G95 (0.91) and see that
functions are much slower. My question is: why? Nothing needs to be
allocated on the fly. Shouldn't the code perform similarly in both cases?
Here are the timings (in seconds, as reported by cpu_time):

G95
function:   19.70523
subroutine:  5.620352

Ifort
function:   9.824615
subroutine: 5.768361

Many thanks in advance,

Matthias Moeller

Matthias Möller wrote:

> I compiled it with both Ifort (9.1) and G95 (0.91) and see that
> functions are much slower. My question is: why? Nothing needs to be
> allocated on the fly. Shouldn't the code perform similarly in both cases?

The function result needs to be allocated on the fly, and depending on
compiler options and its size that may be on the heap or on the stack.

Perhaps more importantly, the function result also needs to be copied
from the internal variable to the parent r on assignment, whereas
the assignment takes place only once in the case of the subroutine.  You
didn't indicate what optimization options you used, but I'd be
interested to see whether ifort does a lot better in the function case
with inter-procedural optimization enabled. With inlining, it's
possible the compiler would be able to eliminate the extra assignment.
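
To make the extra work concrete, here is a rough sketch of what a call like r=feval(u,v) may amount to when nothing is inlined, next to the single pass the subroutine performs. This is only an illustration, not what any particular compiler emits; the names tmp, r_fun and r_sub are made up for the example, and dp is defined via kind(1.0d0) rather than the literal 8 used in the original post.

```fortran
program temp_sketch
   implicit none

   ! Assumption: double precision selected portably instead of dp=8.
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 1000
   ! tmp, r_fun and r_sub are hypothetical names used only in this sketch.
   real(dp), allocatable :: u(:), v(:), r_fun(:), r_sub(:), tmp(:)

   allocate(u(n), v(n), r_fun(n), r_sub(n))
   u = 2; v = 2

   ! Roughly what "r = feval(u,v)" may turn into without inlining:
   allocate(tmp(n))      ! storage for the function result, created per call
   tmp = u*v             ! the body of feval fills the result
   r_fun = tmp           ! the assignment copies the result into r
   deallocate(tmp)

   ! What "call seval(u,v,r)" does: one pass that writes r directly.
   r_sub = u*v

   ! Both paths must agree; only the amount of memory traffic differs.
   if (any(r_fun /= r_sub)) stop 'results differ'
   print *, 'both paths give', r_fun(1)

   deallocate(u, v, r_fun, r_sub)
end program temp_sketch
```

The function path touches an extra n-element array on every call (one more write and one more read per element), which is consistent with it being slower for a memory-bound operation like u*v.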

Matthias Möller wrote:
> Dear newsgroup,

> I encountered some problems with vector-valued functions in F90/F95.
> Here is a simple demo program:

> program quick
>   implicit none

>   integer, parameter :: dp=8
>   real(dp), dimension(:), pointer :: u,v,r

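These really should be allocatable, not pointer. Pointer arrays may alias
one another and need not be contiguous, which limits what the compiler
can optimize.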
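
A minimal sketch of the same setup with allocatable arrays, assuming nothing else in the program relies on pointer semantics. The program name quick_alloc is invented for the example, dp is taken from kind(1.0d0) instead of the literal 8, and only the subroutine variant is shown to keep it short.

```fortran
program quick_alloc
   implicit none

   ! Assumption: double precision selected portably instead of dp=8.
   integer, parameter :: dp = kind(1.0d0)
   ! allocatable instead of pointer: no aliasing, contiguous storage.
   real(dp), dimension(:), allocatable :: u, v, r
   integer :: n

   n = 1000000
   allocate(u(n), v(n), r(n))

   u = 2; v = 2
   call seval(u, v, r)
   print *, r(1), r(n)

   deallocate(u, v, r)

contains

   ! Same subroutine as in the original post; the assumed-shape dummy
   ! arguments accept allocatable actual arguments unchanged.
   subroutine seval(u, v, r)
     real(dp), dimension(:), intent(in)  :: u, v
     real(dp), dimension(:), intent(out) :: r

     r = u*v
   end subroutine seval

end program quick_alloc
```

The rest of the original program should work unchanged with this declaration, since allocate and deallocate are used the same way for both attributes.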