Fortran Programming Language
problems with XLF/64 bit
Hi newsgroup!

I have encountered some problems when trying to port my code to a 64-bit
(powerpc) machine using XLF. In short: when compiling with ifc on a
32-bit machine it works fine, giving reasonable results, but it no longer
works on a 64-bit (cluster) machine, i.e. the results are nonsense.
FYI: the code is a numerical MHD solver (zeus-mp).

I invoke the compiler via the mpif90 wrapper script using

  xlf90_r -q32 -O0 ...

-q32 because I thought the problems might be due to 64-bit stuff; it
doesn't look like it, though.

What happens is that it introduces NaN into the calculations. I tried to
track the source of these NaNs down, and while doing so, I stumbled
across some interesting (=weird) behaviour.

I have code like:

      do 350 j=jbeg,jend
        do 340 i=ibeg,iend
          w1(i,j,k) = u1(i,j,k) + q1 * st1(i,j,k) / srd1(i,j,k)**2
          w2(i,j,k) = u2(i,j,k) + q1 * st2(i,j,k) / srd2(i,j,k)**2
          w3(i,j,k) = u3(i,j,k) + q1 * st3(i,j,k) / srd3(i,j,k)**2
340     continue
350   continue
360   continue
c     write (*,*) '=='
c     write (*,*) u2(42,:,ks)
c     write (*,*) '--'
c     write (*,*) st2(42, :, ks)
c     write (*,*) '++'
c     write (*,*) srd2(42, :, ks)
      return
      end

I know that after the code has returned from this subroutine, there are a
couple of NaNs in w2 (I know that by just writing the array to stdout).
However, if I uncomment the write statements at the end, the NaNs go
away! The results of the code still don't make sense, but at least they
are numbers, so I guess you could consider that progress :-)

Obviously, there shouldn't be a difference when I just print out the
arrays... I suspect some compiler optimization that takes place when the
write commands aren't there and doesn't when they are.

Do you have any ideas? I'd also be really grateful for any comments
about how to get xlf to be compatible with ifc code. There is the
possibility that this is some library/architecture problem, but I
strongly suspect the compiler at the moment.

Thanks heaps!
Cheers,
Matthias

--
http://astro.ph.unimelb.edu.au/~mvigeliu/
Matthias Vigelius wrote:
> Hi newsgroup!
> I have encountered some problems when trying to port my code to a 64-bit
> (powerpc) machine using XLF. In short: when compiling with ifc on a
> 32-bit machine it works fine, giving reasonable results, but it no longer
> works on a 64-bit (cluster) machine, i.e. the results are nonsense.
> FYI: the code is a numerical MHD solver (zeus-mp).
> I invoke the compiler via the mpif90 wrapper script using
>   xlf90_r -q32 -O0 ...
> -q32 because I thought the problems might be due to 64-bit stuff; it
> doesn't look like it, though.
> What happens is that it introduces NaN into the calculations. I tried to
> track the source of these NaNs down, and while doing so, I stumbled
> across some interesting (=weird) behaviour.
> I have code like:
>       do 350 j=jbeg,jend
>         do 340 i=ibeg,iend
>           w1(i,j,k) = u1(i,j,k) + q1 * st1(i,j,k) / srd1(i,j,k)**2
>           w2(i,j,k) = u2(i,j,k) + q1 * st2(i,j,k) / srd2(i,j,k)**2
>           w3(i,j,k) = u3(i,j,k) + q1 * st3(i,j,k) / srd3(i,j,k)**2
> 340     continue
> 350   continue
> 360   continue
> c     write (*,*) '=='
> c     write (*,*) u2(42,:,ks)
> c     write (*,*) '--'
> c     write (*,*) st2(42, :, ks)
> c     write (*,*) '++'
> c     write (*,*) srd2(42, :, ks)
>       return
>       end
> I know that after the code has returned from this subroutine, there are a
> couple of NaNs in w2 (I know that by just writing the array to stdout).
> However, if I uncomment the write statements at the end, the NaNs go
> away! The results of the code still don't make sense, but at least they
> are numbers, so I guess you could consider that progress :-)
A likely one is array bound violation. The write statements add some memory space for constants and character strings, so when they are active, those values are overwritten innocuously instead of the important numbers. Another possibility is an "unsaved" variable that happens, by chance, to retain its value when the writes are there.
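To make the bounds-violation scenario above concrete, here is a minimal sketch. It is not taken from zeus-mp; the program name, the arrays a and b, the common block /blk/ and the variable last are all hypothetical. The two arrays are placed next to each other in a common block, so writing past the end of a typically lands in b. Strictly speaking the behaviour is undefined, which is exactly why it can differ between compilers or change when unrelated statements are added.

```fortran
! Hypothetical illustration of an array bound violation (not zeus-mp code).
program bounds_demo
  implicit none
  integer, parameter :: n = 4
  real :: a(n), b(n)
  common /blk/ a, b        ! a and b occupy adjacent storage
  integer :: i, last

  b = 1.0
  last = n + 2             ! bug: loop limit exceeds the declared size of a
  do i = 1, last
     a(i) = -99.0          ! a(5) and a(6) do not exist
  end do

  ! Undefined behaviour: with this layout the stray writes usually
  ! end up in b(1) and b(2), but nothing guarantees that.
  write (*,*) 'b = ', b
end program bounds_demo
```

Compiled with run-time bounds checking enabled, a program like this should stop at the first out-of-range index instead of silently corrupting its neighbour.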
Hi Michel,

Michel Olagnon wrote:
>> I know that after the code has returned from this subroutine, there are a
>> couple of NaNs in w2 (I know that by just writing the array to
>> stdout). However, if I uncomment the write statements at the end, the
>> NaNs go away! The results of the code still don't make sense, but at
>> least they are numbers, so I guess you could consider that
>> progress :-)
> A likely one is array bound violation. The write statements add some
> memory space for constants and character strings, so when they are
> active, those values are overwritten innocuously instead of the
> important numbers.

OK, that sounds interesting. Any ideas how that can occur with xlf while
it doesn't occur with ifc? Probably some side effects in the storage
handling of arrays... I'm trying to work through the thousand compiler
options of xlf but haven't found anything interesting yet.

> Another possibility is an "unsaved" variable that happens, by chance,
> to retain its value when the writes are there.

I thought about this as well. Indeed, it sometimes helps to add -qsave,
but not in all cases. It seems there are multiple errors here.

Thanks heaps!
Cheers,
Matthias
-- http://astro.ph.unimelb.edu.au/~mvigeliu/
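As an illustration of the "unsaved" variable issue discussed in this exchange, the following sketch shows a local variable that is set only on the first call and relied on afterwards. It is not from zeus-mp; save_demo, apply_factor, first and factor are hypothetical names. The version shown is the corrected one, with an explicit SAVE attribute.

```fortran
! Hypothetical illustration of the "unsaved variable" problem (not zeus-mp code).
program save_demo
  implicit none
  real :: x

  x = 1.0
  call apply_factor(x)
  call apply_factor(x)
  write (*,*) 'x = ', x    ! 1.0 * 2.0 * 2.0, because factor is SAVEd

contains

  subroutine apply_factor(v)
    real, intent(inout) :: v
    logical, save :: first = .true.
    ! Without the SAVE attribute, factor would be undefined on the
    ! second call; it might still hold 2.0 by chance, or it might not.
    real, save :: factor

    if (first) then
       factor = 2.0
       first = .false.
    end if
    v = v * factor
  end subroutine apply_factor

end program save_demo
```

A blanket option such as -qsave can mask this kind of bug, but adding SAVE to the specific variables that need it makes the dependence explicit and portable across compilers.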
On Fri, 13 Apr 2007 18:00:29 +1000, Matthias Vigelius wrote:
> Michel Olagnon wrote:
>>> I know that after the code has returned from this subroutine, there are a
>>> couple of NaNs in w2 (I know that by just writing the array to
>>> stdout). However, if I uncomment the write statements at the end, the
>>> NaNs go away! The results of the code still don't make sense, but at
>>> least they are numbers, so I guess you could consider that
>>> progress :-)
>> A likely one is array bound violation. The write statements add some
>> memory space for constants and character strings, so when they are
>> active, those values are overwritten innocuously instead of the
>> important numbers.
> OK, that sounds interesting. Any ideas how that can occur with xlf while
> it doesn't occur with ifc? Probably some side effects in the storage
> handling of arrays... I'm trying to work through the thousand compiler
> options of xlf but haven't found anything interesting yet.

If a program has array bounds errors, then it will typically produce
different behavior with different compilers. There is no reason to
suspect anything special with respect to those two compilers, other than
the fact that they happen to use different memory layouts.

If you have any other compilers available, it may be instructive to try
the program with them. Something may pop up that sheds light on the
problem.

--
Dave Seaman
Oral Arguments in Mumia Abu-Jamal Case to be heard May 17
U.S. Court of Appeals, Third Circuit
<http://mumia2000.org/>
On Fri, 13 Apr 2007 18:00:29 +1000, Matthias Vigelius
<mvige@physics.unimelb.edu.au> wrote in <evndb0$i8@news.albasani.net>:
> Michel Olagnon wrote:
>>> I know that after the code has returned from this subroutine, there are a
>>> couple of NaNs in w2 (I know that by just writing the array to
>>> stdout). However, if I uncomment the write statements at the end, the
>>> NaNs go away! The results of the code still don't make sense, but at
>>> least they are numbers, so I guess you could consider that
>>> progress :-)
>> A likely one is array bound violation. The write statements add some
>> memory space for constants and character strings, so when they are
>> active, those values are overwritten innocuously instead of the
>> important numbers.
> OK, that sounds interesting. Any ideas how that can occur with xlf while
> it doesn't occur with ifc? Probably some side effects in the storage
> handling of arrays... I'm trying to work through the thousand compiler
> options of xlf but haven't found anything interesting yet.

Have you found the one to turn on bounds-checking? That's the first
thing I'd try. (Others mentioned bounds errors, but not turning on
checking, as far as I've seen so far.) Hmm, a quick google suggests -C
(or -qcheck). Maybe -qextchk and -qflttrap would be useful, too.

>> Another possibility is an "unsaved" variable that happens, by chance,
>> to retain its value when the writes are there.
> I thought about this as well. Indeed, it sometimes helps to add -qsave,
> but not in all cases. It seems there are multiple errors here.
-- Ivan Reid, School of Engineering & Design, _____________ CMS Collaboration, Brunel University. Ivan.Reid@[brunel.ac.uk|cern.ch] Room 40-1-B12, CERN KotPT -- "for stupidity above and beyond the call of duty".