Python Programming Language
Python 3000: Standard API for archives?
I'm a relative newbie to Python, so please bear with me.

There are currently two standard modules used to access archived data: zipfile and tarfile. Their interfaces are completely different, so a script that wants to analyze different types of archives must duplicate substantial pieces of logic. The problem is not limited to method names; it also includes how stat-like information is accessed.

I think it would be a good thing if a standardized interface existed, similar to PEP 247. This would make it easier for one script to access multiple types of archives, such as RAR, 7-Zip, ISO, etc. In particular, a single factory class could produce PEP 302 import hooks for future as well as current archive formats.

I think that an archive module adhering to the standard should adopt a least-common-denominator approach, initially supporting read-only access without seek, i.e. tar files on actual tape. For applications that require a seek method (such as importers), a standard wrapper class could transparently cache archive members in temp files; this would fit in well with Python 3000's rewrite of the I/O interface to support stackable interfaces. To this end, we'd need is_seekable and is_writable attributes for both the module and its instances (the module-level attribute would declare whether something is possible, not whether it is always true).

Most importantly, all archive modules should provide a standard API for accessing their individual files via a single archive_content class that provides a standard 'read' method. Less important, but nice to have, would be a way for archives to be auto-magically scanned during walks of directories.

Feedback?
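To make the proposal above concrete, here is a minimal sketch of what such a least-common-denominator interface might look like over the two existing modules. Only is_seekable, is_writable, and the idea of an archive_content class with a 'read' method come from the post; the names ArchiveContent, ZipArchive, TarArchive, and open_archive are hypothetical, and the format is passed in explicitly because sniffing it would require a seek. This is an illustration under those assumptions, not a proposed specification.

```python
import tarfile
import zipfile


class ArchiveContent:
    """One archive member: a name, a size, and a read() method (hypothetical)."""

    def __init__(self, name, size, opener):
        self.name = name
        self.size = size
        self._opener = opener  # zero-argument callable returning a binary file

    def read(self):
        # Return the member's full contents as bytes.
        with self._opener() as member:
            return member.read()


class ZipArchive:
    is_seekable = True    # zipfile needs random access to the central directory
    is_writable = False   # this sketch is read-only

    def __init__(self, fileobj):
        self._zf = zipfile.ZipFile(fileobj)

    def __iter__(self):
        for info in self._zf.infolist():
            if not info.is_dir():
                yield ArchiveContent(info.filename, info.file_size,
                                     lambda info=info: self._zf.open(info))


class TarArchive:
    is_seekable = False   # opened as a forward-only stream, like a tape
    is_writable = False   # this sketch is read-only

    def __init__(self, fileobj):
        # "r|*" reads a (possibly compressed) tar as a non-seekable stream.
        self._tf = tarfile.open(fileobj=fileobj, mode="r|*")

    def __iter__(self):
        for info in self._tf:
            if info.isfile():
                yield ArchiveContent(info.name, info.size,
                                     lambda info=info: self._tf.extractfile(info))


def open_archive(fileobj, kind):
    """Hypothetical factory: kind is 'zip' or 'tar'."""
    handlers = {"zip": ZipArchive, "tar": TarArchive}
    return handlers[kind](fileobj)
```

A caller could then write `for member in open_archive(f, "tar"): data = member.read()` without caring which module sits underneath. Because the tar adapter is stream-based, each member has to be read while it is the current one in the iteration, which is exactly the restriction a caching wrapper for seek-dependent callers would need to hide.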
samwyse wrote this on Mon, 04 Jun 2007 12:02:03 +0000. My reply is below.

> I think it would be a good thing if a standardized interface
> existed, similar to PEP 247. This would make it easier for one
> script to access multiple types of archives, such as RAR, 7-Zip,
> ISO, etc.
Gee, it would be great to be able to open an archive member for update I/O. This is kind of hard to do now. If it were possible, though, it would obscure the difference between file directories and archives, which would be kind of neat. Furthermore, you could navigate archives of archives (zips of tars and other abominations).

--
.. Chuck Rhode, Sheboygan, WI, USA
.. Weather: http://LacusVeris.com/WX
.. 62 Wind N 7 mph Sky overcast. Mist.
Chuck Rhode wrote:

> samwyse wrote this on Mon, 04 Jun 2007 12:02:03 +0000. My reply is
> below.

>> I think it would be a good thing if a standardized interface
>> existed, similar to PEP 247. This would make it easier for one
>> script to access multiple types of archives, such as RAR, 7-Zip,
>> ISO, etc.

> Gee, it would be great to be able to open an archive member for update
> I/O. This is kind of hard to do now. If it were possible, though, it
> would obscure the difference between file directories and archives,
> which would be kind of neat. Furthermore, you could navigate archives
> of archives (zips of tars and other abominations).
FWIW, there's no need to get hung up on Python 3000 or any other release. Just put together a module called "archive" or whatever, which exposes the kind of API you're thinking of, offering support across zip, bz2 and whatever else you want. Put it up on the Cheeseshop, announce it on c.l.py.ann and anywhere else which seems apt. See if it gains traction. Take it from there.

NB This has the advantage that you can start small, say with zip and bz2 support, and maybe see if you get contributions for less common formats, even via 3rd party libs. If you were to try to get it into the stdlib it would need to be much more fully specified up front, I suspect.

TJG
Tim Golden wrote this on Mon, 04 Jun 2007 15:55:30 +0100. My reply is below.
> Chuck Rhode wrote:

>> samwyse wrote this on Mon, 04 Jun 2007 12:02:03 +0000. My reply is
>> below.

>>> I think it would be a good thing if a standardized interface
>>> existed, similar to PEP 247. This would make it easier for one
>>> script to access multiple types of archives, such as RAR, 7-Zip,
>>> ISO, etc.

>> Gee, it would be great to be able to open an archive member for
>> update I/O. This is kind of hard to do now. If it were possible,
>> though, it would obscure the difference between file directories
>> and archives, which would be kind of neat. Furthermore, you could
>> navigate archives of archives (zips of tars and other
>> abominations).

> Just put something together a module called "archive" or whatever,
> which exposes the kind of API you're thinking of, offering support
> across zip, bz2 and whatever else you want. Put it up on the
> Cheeseshop, announce it on c.l.py.ann and anywhere else which seems
> apt. See if it gains traction. Take it from there.

> NB This has the advantage that you can start small, say with zip and
> bz2 support and maybe see if you get contributions for less common
> formats, even via 3rd party libs. If you were to try to get it into
> the stdlib it would need to be much more fully specified up front, I
> suspect.
Yeah, this is in the daydreaming stages. I'd like to maintain not-just-read-only libraries of geographic shapefiles, which are available free from governmental agencies and which are riddled with obvious errors. Typically these are published in compressed archives within which every subdirectory is likewise compressed (apparently for no other purpose than a rather vain attempt at flattening the directory structure, which must be reconstituted on the User's end anyway).

Building a comprehensive index to what member name(s) the different map layers (roads, political boundaries, watercourses) have in various political districts of varying geographic resolutions is much more than merely frustrating. I've given it up.

However, I believe that once I've located something usable, the thing to do is save a grand unified reference locator (GURL) for it. The GURL would specify a directory path to the highest level archive followed by a (potential cascade of) archive member name(s for enclosed archives) of the data file(s) to be operated on. Unpacking and repacking would be behind the scenes. Updates (via FTP) of non-local resources would be transparent, too. I think, though, that notes about the publication date, publisher, resolution, area covered, and format of the map or map layer ought to be kept out of the GURL.

My whole appetite for this sort of thing would vanish if access to the shapefiles were more tractable to begin with.

--
.. Chuck Rhode, Sheboygan, WI, USA
.. 1979 Honda Goldwing GL1000 (Geraldine)
.. Weather: http://LacusVeris.com/WX
.. 52 Wind N 9 mph Sky overcast.
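The read-only half of the GURL idea (a path to the top-level archive followed by a cascade of member names) can be sketched with today's zipfile and tarfile modules. The function names read_member and resolve are hypothetical, and the sketch assumes the archive type can be told from the file extension (".zip" means zip, anything else is handed to tarfile), which real agency downloads may not honour. It unpacks each level into memory; repacking, updating, and FTP access are left out entirely.

```python
import io
import tarfile
import zipfile


def read_member(archive_name, data, member):
    """Return the bytes of `member` inside the archive held in `data`.

    Assumption: a name ending in ".zip" is a zip file; anything else is
    treated as a tar file, optionally gzip/bz2/xz compressed.
    """
    buf = io.BytesIO(data)
    if archive_name.lower().endswith(".zip"):
        with zipfile.ZipFile(buf) as zf:
            return zf.read(member)
    with tarfile.open(fileobj=buf, mode="r:*") as tf:
        extracted = tf.extractfile(member)
        if extracted is None:
            raise KeyError(f"{member!r} is not a regular file")
        return extracted.read()


def resolve(archive_path, member_names):
    """Follow a cascade of member names down through nested archives."""
    with open(archive_path, "rb") as f:
        data = f.read()
    current = str(archive_path)
    for member in member_names:
        data = read_member(current, data, member)
        current = member  # the member just read is the next archive to open
    return data
```

With a made-up layout, `resolve("county.zip", ["layers.tar.gz", "roads.shp"])` would return the bytes of roads.shp from the tarball stored inside the zip. Holding every level in memory keeps the sketch short; large map archives would more likely call for temp files.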