Fortran Programming Language
passing large arrays to functions
Hello Fortran Experts,

I've noticed that passing large arrays to functions will cause a segmentation fault (core dump) while passing the same array to a functionally identical subroutine will not. I've also noticed that functions tend to evaluate around 10% slower than their subroutine counterparts in general.

Does anyone know why this is? I've read that subroutines don't typically create new memory, they just pass the address and work on the same memory. Are functions set up differently?

The routines I'm writing are fundamental operators (like gradients) and I would rather write them as functions but I'm dealing with big arrays and optimizing for speed. Suggestions?

Thanks,
Gabe
--------------------------------------------------
here's an example pseudo-code:

module gradient
contains
  pure function grad_fnc(p)
    real, intent(in) :: p(:,:,:)
    real, allocatable :: grad_fnc(:,:,:,:)
    ... allocate and fill grad_fnc...
  end function grad_fnc

  pure subroutine grad_sub(p,gp)
    real, intent(in) :: p(:,:,:)
    real, intent(out) :: gp(:,:,:,:)
    ... fill gp ...
  end subroutine grad_sub
end module

program test
  use gradient
  integer, parameter :: size = 100
  real :: p(size,size,size), gp(3,size,size,size)
  ... fill p ...
  call grad_sub(p,gp) ! takes .27 seconds
  gp = grad_fnc(p)    ! takes .31 seconds. wouldn't work with size = 150
end program
Gabe <DrWeymo @gmail.com> wrote:

> I've read that subroutines don't
> typically create new memory, they just pass the address and work on
> the same memory.

That is a *SEVERE* overgeneralization. It is true of some things, but certainly not with that kind of generality.

> Are functions set up differently?
To the extent that you are talking dummy arguments (which it sounds like you mostly are), functions are the same as subroutines. But subroutines don't have result variables. No, a function result variable is not much like anything in a subroutine.

Although there are some situations where a function result variable can be optimized to play a role somewhat like a subroutine argument, that is an optimization. Fundamentally, there isn't anything for the function result variable to be associated with. In trivial cases like

   x = f(args)

the compiler might note that it can optimize things. But fundamentally, the definition of a statement like that involves first evaluating f(args) without even looking at x. Then, after f(args) is evaluated, the result is assigned (copied) to x. Yes, this involves extra allocation if the result is allocatable. The compiler might optimize that away, but you are starting at a disadvantage.

> The routines I'm writing are fundamental operators (like gradients)
> and I would rather write them as functions but I'm dealing with big
> arrays and optimizing for speed.

> pure function grad_fnc(p)
> real, intent(in) :: p(:,:,:)
> real, allocatable :: grad_fnc(:,:,:,:)
Allocatables are nice and handy, but if you are really optimizing for speed, you probably don't want to allocate every time you invoke the code. Typically, a gradient routine is going to get invoked many times with the same size. Things are likely to be faster if you can allocate once (or anyway, a few times) instead of many. That's so whether you use a function or subroutine, though it is likely to be more obvious how to do it for a subroutine.

Also, some compilers have unreasonably low default stack sizes. That's likely what is causing your failures. You might want to up that (by a lot). Details of how to do so vary by compiler.

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain
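The "allocate once, call many times" advice above might look roughly like the following. This is only a sketch under stated assumptions: the module name grad_reuse_sketch and the driver run_steps are hypothetical, and the body of grad_sub is a stand-in forward difference with unit grid spacing, since the original post elides what the real routine computes.

```fortran
module grad_reuse_sketch
  implicit none
  private
  public :: grad_sub, run_steps
contains

  ! Stand-in gradient: forward difference with unit grid spacing
  ! (an assumption for this sketch). The difference is zero at the
  ! upper boundary of each dimension.
  pure subroutine grad_sub(p, gp)
    real, intent(in)  :: p(:,:,:)
    real, intent(out) :: gp(:,:,:,:)   ! caller supplies shape (3, ni, nj, nk)
    integer :: i, j, k, ni, nj, nk

    ni = size(p, 1)
    nj = size(p, 2)
    nk = size(p, 3)
    do k = 1, nk
      do j = 1, nj
        do i = 1, ni
          gp(1,i,j,k) = p(min(i+1, ni), j, k) - p(i,j,k)
          gp(2,i,j,k) = p(i, min(j+1, nj), k) - p(i,j,k)
          gp(3,i,j,k) = p(i, j, min(k+1, nk)) - p(i,j,k)
        end do
      end do
    end do
  end subroutine grad_sub

  ! Hypothetical driver: allocates gp on first use only, then reuses
  ! the same buffer for every call in the loop.
  subroutine run_steps(p, nsteps, gp)
    real, intent(in)                 :: p(:,:,:)
    integer, intent(in)              :: nsteps
    real, allocatable, intent(inout) :: gp(:,:,:,:)
    integer :: step

    if (.not. allocated(gp)) then
      allocate(gp(3, size(p,1), size(p,2), size(p,3)))
    end if
    do step = 1, nsteps
      call grad_sub(p, gp)
    end do
  end subroutine run_steps

end module grad_reuse_sketch
```

A caller would declare gp as allocatable, leave it unallocated, and pass it to run_steps; the buffer then lives on the heap and is allocated exactly once, however many times grad_sub is invoked.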
Thanks for the reply,

> To the extent that you are talking dummy arguments (which it sounds like
> you mostly are), functions are the same as subroutines. But subroutines
> don't have result variables. No, a function result variable is not much
> like anything in a subroutine.
I was talking about dummy arguments, and I wasn't thinking through the differences between function results and subroutine output dummy arguments.

> Allocatables are nice and handy, but if you are really optimizing for
> speed, you probably don't want to allocate every time you invoke the
> code. Typically, a gradient routine is going to get invoked many times
> with the same size. Things are likely to be faster if you can allocate
> once (or anyway, a few times) instead of many.
I fall into the typical case you mention and I also tried using the header:

--------------------------------------------------
pure function grad_fnc(p)
  real, intent(in) :: p(ni,nj,nk)
  real :: grad_fnc(nd,ni,nj,nk)
--------------------------------------------------

but see no real speed up on my machine AND a further decrease in the allowable array size.

I suppose the practical thing to do is simply use subroutines in these cases.
Gabe wrote:

> I've noticed that passing large arrays to functions will cause a
> segmentation fault (core dump) while passing the same array to a
> functionally identical subroutine will not. I've also noticed that
> functions tend to evaluate around 10% slower than their subroutine
> counterparts in general.
In most cases, functions are either exactly like subroutines, but put a return value in a register, or are like subroutines with one extra argument (for the return value).

> Does anyone know why this is? I've read that subroutines don't
> typically create new memory, they just pass the address and work on
> the same memory. Are functions set up differently?
I don't know what that means. Both subroutines and functions will pass arguments the same way. In the usual case, the address of an array (or non-array) is passed, and the called routine modifies the original. In some cases the called routine works on a copy, which is then copied back to the original. That should be the same for subroutines and functions.

One complication is array-valued functions, especially allocatable array-valued functions. That might be slower, and would only apply to functions. If that is what you mean, you should explain in more detail what you want to do.

-- glen
Gabe wrote:

> Thanks for the reply,

>> To the extent that you are talking dummy arguments (which it sounds like
>> you mostly are), functions are the same as subroutines. But subroutines
>> don't have result variables. No, a function result variable is not much
>> like anything in a subroutine.

> I was talking about dummy arguments, and I wasn't thinking through the
> differences between function results and subroutine output dummy
> arguments.

>> Allocatables are nice and handy, but if you are really optimizing for
>> speed, you probably don't want to allocate every time you invoke the
>> code. Typically, a gradient routine is going to get invoked many times
>> with the same size. Things are likely to be faster if you can allocate
>> once (or anyway, a few times) instead of many.

> I fall into the typical case you mention and I also tried using the
> header:
> --------------------------------------------------
> pure function grad_fnc(p)
> real, intent(in) :: p(ni,nj,nk)
> real :: grad_fnc(nd,ni,nj,nk)
> --------------------------------------------------
> but see no real speed up on my machine AND a further decrease in the
> allowable array size.

> I suppose the practical thing to do is simply use subroutines in these
> cases.
At the risk of stating the obvious, the problem isn't with functions, it's with array-valued functions. It sounds like the result is being copied. There may be ways to avoid that; someone who knows more than I do can answer.

My guess is that the reason your last example failed with smaller arrays is that allocatable arrays are allocated on the heap and other arrays are allocated on the stack (which as Richard noted may be way too small).

Louis
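To make the stack-versus-heap guess above concrete, here is a small sketch contrasting the two kinds of local array. The module and function names are hypothetical, and where each array actually lives is up to the compiler and its options: many compilers place automatic arrays on the stack by default, while allocatable arrays normally go on the heap.

```fortran
module work_array_sketch
  implicit none
contains

  ! Automatic array: its size comes from the dummy argument. Many
  ! compilers put this on the stack by default, so a very large p
  ! can overflow a small stack limit.
  function sum_sq_automatic(p) result(total)
    real, intent(in) :: p(:)
    real :: total
    real :: work(size(p))          ! automatic array

    work = p * p
    total = sum(work)
  end function sum_sq_automatic

  ! Allocatable array: normally heap storage, so it is limited by
  ! available memory rather than by the stack limit.
  function sum_sq_allocatable(p) result(total)
    real, intent(in) :: p(:)
    real :: total
    real, allocatable :: work(:)

    allocate(work(size(p)))
    work = p * p
    total = sum(work)
    ! work is deallocated automatically when the function returns
  end function sum_sq_allocatable

end module work_array_sketch
```

Both functions return the same value; the only difference is how the scratch array is declared. An explicit-shape function result such as grad_fnc(nd,ni,nj,nk) behaves like the automatic case here, which is consistent with the smaller allowable size Gabe reported.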
Louis Krupp wrote:

(snip)

> At the risk of stating the obvious, the problem isn't with functions,
> it's with array-valued functions. It sounds like the result is being
> copied. There may be ways to avoid that; someone who knows more than I
> do can answer.
Avoid it by using subroutines (or function arguments). As previously mentioned, functions are processed and then the result is used in an expression. If the function is allowed to specify the size, copying from the array returned is probably the only way.

-- glen
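One way to follow this advice while keeping function syntax available for small cases is to put the work in a subroutine and add a thin function wrapper around it. The sketch below uses a 1-D difference for brevity; the names diff_wrapper_sketch, diff_sub and diff_fnc are hypothetical and are not from the original post.

```fortran
module diff_wrapper_sketch
  implicit none
contains

  ! Workhorse: writes straight into the caller's array, so no separate
  ! result object is involved.
  pure subroutine diff_sub(p, dp)
    real, intent(in)  :: p(:)
    real, intent(out) :: dp(:)     ! assumed to have the same length as p
    integer :: i, n

    n = size(p)
    do i = 1, n - 1
      dp(i) = p(i+1) - p(i)
    end do
    if (n > 0) dp(n) = 0.0
  end subroutine diff_sub

  ! Convenience wrapper: dp here is the function result, a separate
  ! object that is assigned to the caller's variable after the call
  ! unless the compiler manages to optimize the copy away.
  pure function diff_fnc(p) result(dp)
    real, intent(in) :: p(:)
    real :: dp(size(p))

    call diff_sub(p, dp)
  end function diff_fnc

end module diff_wrapper_sketch
```

Performance-critical code with big arrays would call diff_sub directly, while diff_fnc remains handy inside expressions on small arrays where the extra copy does not matter.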
Gabe wrote in message <1179255184.451199.29 @l77g2000hsb.googlegroups.com>...

>Hello Fortran Experts,

>I've noticed that passing large arrays to functions will cause a
>segmentation fault (core dump) while passing the same array to a
>functionally identical subroutine will not. I've also noticed that
>functions tend to evaluate around 10% slower than their subroutine
>counterparts in general.
The difference between the two examples is not in passing large arrays to subroutines and functions, but in the way in which an even larger array is returned to the caller.

The array being passed back is 4-D. In the subroutine case, the array is passed back via the dummy. In the function case, the 4-D array is passed back via the function return mechanism. Probably a copy is made. Whether the return value is placed on the stack, and then copied to the 4-D array gp, or whether it's computed in store and then copied to gp would depend on the implementation. If it's the stack, you may need to increase the stack limit to avoid the segmentation fault.

There is another difference between the two examples: as to the definition of the space for the value to be passed back. In the subroutine case, you are using a normal array, but in the function, you are using an allocatable array.
>Does anyone know why this is? I've read that subroutines don't
>typically create new memory, they just pass the address and work on
>the same memory. Are functions set up differently?

>The routines I'm writing are fundamental operators (like gradients)
>and I would rather write them as functions but I'm dealing with big
>arrays and optimizing for speed. Suggestions?

>Thanks,
>Gabe
>--------------------------------------------------
>here's an example pseudo-code:

>module gradient
>contains
> pure function grad_fnc(p)
> real, intent(in) :: p(:,:,:)
> real, allocatable :: grad_fnc(:,:,:,:)
>... allocate and fill grad_fnc...
> end function grad_fnc

> pure subroutine grad_sub(p,gp)
> real, intent(in) :: p(:,:,:)
> real, intent(out) :: gp(:,:,:,:)
>... fill gp ...
> end subroutine grad_sub
>end module

>program test
> use gradient
> integer, parameter :: size = 100
> real :: p(size,size,size), gp(3,size,size,size)
>... fill p ...
> call grad_sub(p,gp) ! takes .27 seconds
> gp = grad_fnc(p) ! takes .31 seconds. wouldn't work with size = 150
>end program