Python Programming Language
Can os.remove followed by os.path.isfile disagree?
Can os.path.isfile(x) ever return True after os.remove(x) has successfully completed? (Windows 2003, Python 2.3)

We had a couple of failures in a server application that we cannot yet reproduce in a simple case. Analysis of the code suggests that the only possible explanation is that the following occurs:

    os.remove(x)
    .... stuff
    if os.path.isfile(x):
        raise "Ooops, how did we get here?"
    file(x, "wb").write(content)

We end up in the raise. By the time we get to look at the system the file is actually gone, presumably because of the os.remove.

The "stuff" is a handful of lines of code which don't take any significant time to perform. There are no "try" blocks to mask a failure in the os.remove call.

The application is multithreaded so it is possible that another thread writes to the file between the "remove" and the "isfile", but at the end of the failure the file is actually not on the filesystem and I don't believe there is a way that the file could be removed again in this scenario.

Regards, Paul
ppater @gmail.com wrote: > Can os.path.isfile(x) ever return True after os.remove(x) has > successfully completed? (Windows 2003, Python 2.3) > We had a couple of failures in a server application that we cannot yet > reproduce in a simple case. Analysis of the code suggests that the > only possible explanation is that the following occurs, > os.remove(x) > .... stuff > if os.path.isfile(x): > raise "Ooops, how did we get here?" > file(x, "wb").write(content) > We end up in the raise. By the time we get to look at the system the > file is actually gone, presumably because of the os.remove. > The "stuff" is a handful of lines of code which don't take any > significant time to perform. There are no "try" blocks to mask a > failure in the os.remove call. > The application is multithreaded so it is possible that another thread > writes to the file between the "remove" and the "isfile", but at the > end of the failure the file is actually not on the filesystem and I > don't believe there is a way that the file could be removed again in > this scenario.
The os.remove implementation (in posixmodule.c) uses the DeleteFileW/A API call (depending on Unicode or not): http://msdn2.microsoft.com/En-US/library/aa363915.aspx

There's no suggestion there that it might return before completion, and I'd be surprised if it did. The most likely bet would seem to be a race condition as you suggest below. Doesn't have to be from a thread in your program, although I assume you know best about your own filesystem ;)

Another possibility -- I suppose, though without any authority -- is that the .remove is silently swallowing an error (ie failing at OS level without telling Python so no exception is raised). Much more likely is that somewhere in that "...stuff" is something which inadvertently recreates the file.

Don't suppose you've got some kind of flashy software running which intercepts OS file-manipulation calls for Virus or Archiving purposes?

TJG
ppater @gmail.com wrote: > Can os.path.isfile(x) ever return True after os.remove(x) has > successfully completed? (Windows 2003, Python 2.3) As an afterthought, have you tried NTFS auditing, or directory monitoring, such as: http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_chan... to see the sequence of events on the directory? At least that might confirm whether you are seeing delete-create-delete or some other pattern. TJG
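For anyone wanting a quick way to see the sequence of events without extra tooling, here is a minimal polling sketch using only the standard library. The helper names (snapshot, diff_snapshots, watch) are made up for this illustration and are not taken from the page linked above. Polling can easily miss a delete-create pair that happens between two polls, so NTFS auditing or the Win32 change-notification APIs are likely to be more reliable for a race this tight.

```python
import os
import time


def snapshot(directory):
    """Return the set of entry names currently in the directory."""
    return set(os.listdir(directory))


def diff_snapshots(before, after):
    """Return (created, deleted) name lists between two snapshots, sorted."""
    return sorted(after - before), sorted(before - after)


def watch(directory, interval=0.05, rounds=100):
    """Poll the directory and yield (timestamp, event, name) tuples.

    Hypothetical helper: interval and rounds are arbitrary defaults.
    """
    before = snapshot(directory)
    for _ in range(rounds):
        time.sleep(interval)
        after = snapshot(directory)
        created, deleted = diff_snapshots(before, after)
        now = time.time()
        for name in created:
            yield (now, "created", name)
        for name in deleted:
            yield (now, "deleted", name)
        before = after
```

Printing each tuple from watch(some_dir) while the server runs should at least show whether the file reappears after the delete, provided the events are further apart than the polling interval.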
<SNIP> > The application is multithreaded so it is possible that another thread > writes to the file between the "remove" and the "isfile", but at the > end of the failure the file is actually not on the filesystem and I > don't believe there is a way that the file could be removed again in > this scenario.
This sure sounds like a thread race condition. In theory, the os.remove call failing to actually remove the file before returning might also do this, but it seems unlikely that a bug that blatant in a fundamental OS call could survive very long, even in Windoze. I'd take the time to really examine the multiple threads of work you're running to make sure one of them isn't removing the file just as another creates it. Better still, use a locking semaphore around the code that creates/deletes the file to guarantee mutual exclusion.
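A rough sketch of that locking idea, written for a current Python rather than 2.3. The names _file_lock and replace_file are invented for the example, and it assumes every thread that touches the file goes through the same function. It only protects against threads in the same process; it does nothing about other processes that open the file.

```python
import os
import threading

# Hypothetical module-level lock shared by every thread that touches the file.
_file_lock = threading.Lock()


def replace_file(path, content):
    """Delete any existing file at path and write new content.

    The whole remove-then-write sequence runs while holding _file_lock,
    so no other thread using this function can interleave with it.
    """
    with _file_lock:
        if os.path.isfile(path):
            os.remove(path)
        with open(path, "wb") as f:
            f.write(content)
```

With this in place, a stray file appearing between the remove and the write can no longer be blamed on another thread in the application, which narrows the search to outside processes.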
ppater @gmail.com wrote: > Can os.path.isfile(x) ever return True after os.remove(x) has > successfully completed? (Windows 2003, Python 2.3)
Is the file on a network drive by any chance? Diez
Thanks for the quick and detailed response! > The most likely bet would seem to be a race condition > as you suggest below. Doesn't have to be from a thread > in your program, although I assume you know best about > your own filesystem ;)
My first thought, after discounting the os.remove early return, was that it has to be from a thread in our application. But,

a) it is highly unlikely due to the way tasks are scheduled
b) even if it did occur I don't see a code path that ends with the file not there

But, until I read the next part of your note, it was still the only credible explanation ...

> Don't suppose you've got some kind of flashy software > running which intercepts OS file-manipulation calls for > Virus or Archiving purposes?
... I'm wondering if this is the culprit. I now recall that the Spambayes project saw a weird error due to Google Desktop Search where GDS would intervene at such a low level that some file system level "invariants" ... aren't! I don't remember the details but I think you delete or create a file and GDS jumps in to backup / index it and you don't have the access you thought you had moments ago. I don't think GDS is running on this server but it is running a lot of other enterprise monitoring apps and maybe they are doing a similar thing. I'm off to investigate more on this front! Thanks, Paul
Thanks for the response! > I'd take the time to really examine the multiple threads of work you're running > to make sure one of them isn't removing the file just as another creates it. > Better still, use a locking semaphore around the code the creates/deletes the file > to guarantee mutual exclusion.
The locking-semaphore idea is a good one - it would remove any possibility that this kind of race condition is causing the problem. Thanks, Paul
> Is the file on a network drive by any chance? > Diez
No, but the server is actually a VMWare VM and the drive is a virtual drive. I'm thinking that this may be significant as it may be that the VMWare VHD driver is the "flashy software running which intercepts OS file-manipulation calls" that Tim Golden suggested. Paul
> Don't suppose you've got some kind of flashy software > running which intercepts OS file-manipulation calls for > Virus or Archiving purposes? > TJG
As I mentioned in another reply, this server is virtual and so is the drive. I'm wondering if this might also be significant. Paul
ppater @gmail.com wrote: > Can os.path.isfile(x) ever return True after os.remove(x) has > successfully completed? (Windows 2003, Python 2.3)
Yes. If another application has opened the file with FILE_SHARE_DELETE, os.remove succeeds but the file doesn't actually disappear until the last open handle to it is closed. Roger
On Jun 6, 12:30 pm, "Roger Upole" <rup@hotmail.com> wrote: > ppater @gmail.com wrote: > > Can os.path.isfile(x) ever return True after os.remove(x) has > > successfully completed? (Windows 2003, Python 2.3) > Yes. If another application has opened the file with FILE_SHARE_DELETE, > os.remove succeeds but the file doesn't actually disappear until the last open > handle to it is closed.
Roger,

Thanks - you've hit the nail on the head. This is the final piece of the puzzle and I've now been able to reproduce the problem! The cause is:

- a TSVNCache.exe (TortoiseSVN) process is scanning the file with FILE_SHARE_DELETE access at the moment that the os.remove occurs
- this causes os.remove to return but the file is still there while the scan completes
- next, os.path.isfile returns True and the app raises an exception
- a short while later the scan is complete and Windows deletes the file

Thanks to everyone who responded - I didn't expect to be able to get to the bottom of this so quickly!

Thanks, Paul
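One possible way to cope with the sequence above is to treat the delete as pending and poll until the path is really gone before going on. This is only a sketch: remove_and_wait, its timeout and its interval are invented for the example, and whether a delete-pending file still shows up to os.path.exists probably depends on the Windows and Python versions in use.

```python
import os
import time


def remove_and_wait(path, timeout=5.0, interval=0.05):
    """Remove path, then poll until it is no longer visible.

    Returns True once os.path.exists(path) is False, or False if the
    path is still visible when the timeout expires.
    """
    os.remove(path)
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller would check the return value and only rewrite the file when it is True, instead of raising as soon as the first isfile check disagrees with the remove.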