Python Programming Language
multiline regular expression (replace)
Hi all,

I would like to perform a regular expression replace (e.g. removing everything from within tags in an XML file) with a multiple-line pattern. How can I do this?

where = open("filename").read()
multilinePattern = "^<tag> .... <\/tag>$"
re.search(multilinePattern, where, re.MULTILINE)

Thanks greatly,
Zdenek
On May 29, 2:03 am, Zdenek Maxa <zdenekm@yahoo.co.uk> wrote:
> Hi all,
> I would like to perform regular expression replace (e.g. removing
> everything from within tags in a XML file) with multiple-line pattern.
> How can I do this?
> where = open("filename").read()
> multilinePattern = "^<tag> .... <\/tag>$"
> re.search(multilinePattern, where, re.MULTILINE)
> Thanks greatly,
> Zdenek
Why not use an xml package for working with xml files? I'm sure they'll handle your multiline tags.

http://effbot.org/zone/element-index.htm
http://codespeak.net/lxml/

~Sean
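A minimal sketch of the XML-package route suggested above, using the standard library's xml.etree.ElementTree. The sample document, the element name "tag" and the helper name empty_elements are assumptions made for illustration; they do not come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element name "tag" mirrors the question.
SAMPLE = """<root>
  <tag>
    first line
    <child>nested</child>
    second line
  </tag>
  <keep>untouched</keep>
</root>"""


def empty_elements(xml_text, name):
    """Strip text and child elements from every element called `name`."""
    root = ET.fromstring(xml_text)
    # Materialise the iterator first so the tree is not modified while
    # it is being walked.
    for elem in list(root.iter(name)):
        tail = elem.tail            # whitespace/text following the element
        attrib = dict(elem.attrib)  # clear() also drops attributes
        elem.clear()
        elem.tail = tail
        elem.attrib.update(attrib)
    return ET.tostring(root, encoding="unicode")


result = empty_elements(SAMPLE, "tag")
print(result)
```

Because the parser works on elements rather than on lines, it makes no difference whether the content of the element spans one line or many.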
Hi,

that was merely an example of what I would like to achieve. However, in general, is there a way of handling multiline regular expressions in Python, using presumably only modules from the distribution like re?

Thanks,
Zdenek
So you mean you don't know how to *create* multiline patterns?

One way is to use """ ... """ or ''' ... ''' quoting, which allows you to include newlines as part of your strings. Another is to use \n in your strings to represent newlines.

regards
 Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
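To illustrate the two spellings described above, here is a small sketch; the sample text and variable names are made up for the example. It also shows a third option, a raw string, where the re module itself interprets \n as a newline.

```python
import re

# Three ways of writing the same two-line pattern.
pattern_escaped = "<tag>\n</tag>"   # Python turns \n into a newline
pattern_triple = """<tag>
</tag>"""                            # literal newline inside the string
pattern_raw = r"<tag>\n</tag>"      # re itself interprets the \n escape

# The first two are the very same string.
same_string = pattern_escaped == pattern_triple

text = "before\n<tag>\n</tag>\nafter"
matches = [
    re.search(p, text).group(0)
    for p in (pattern_escaped, pattern_triple, pattern_raw)
]
print(same_string)
print(matches)
```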
> From: Zdenek Maxa <zdenekm@yahoo.co.uk>
> Subject: Re: multiline regular expression (replace)
> Date: 29.5.2007 13:46:32
> ----------------------------------------
> Hi,
> that was merely an example of what I would like to achieve. However, in
> general, is there a way for handling multiline regular expressions in
> Python, using presumably only modules from distribution like re?
> Thanks,
> Zdenek
There shouldn't be any problems matching multiline strings using re (even without flags). There might be some problem with the search pattern, however, especially the "..." part :-) if you are in fact using dots, which don't match newlines in this pattern.

The flag re.M only changes the behaviour of the ^ and $ metacharacters, cf. the docs:

re.M
MULTILINE
    When specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.

You may also check the S flag:

re.S
DOTALL
    Make the "." special character match any character at all, including a newline; without this flag, "." will match anything except a newline.

See
http://docs.python.org/lib/node46.html
http://docs.python.org/lib/re-syntax.html

Vlasta
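The difference between the two flags quoted above can be seen in a short sketch; the sample text is an assumption chosen for the demonstration.

```python
import re

text = "<tag>\nline one\nline two\n</tag>\n"

# Without flags "." stops at newlines, so the pattern cannot span lines.
no_flags = re.search(r"<tag>.*</tag>", text)

# re.DOTALL (re.S) lets "." match newlines as well.
dotall = re.search(r"<tag>.*</tag>", text, re.DOTALL)

# re.MULTILINE (re.M) only changes where "^" and "$" are allowed to match.
anchored_default = re.findall(r"^line \w+$", text)
anchored_multiline = re.findall(r"^line \w+$", text, re.MULTILINE)

print(no_flags)               # None
print(repr(dotall.group(0)))
print(anchored_default)       # []
print(anchored_multiline)     # ['line one', 'line two']
```

The two flags are independent and can be combined with `re.MULTILINE | re.DOTALL` when a pattern needs both behaviours.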
If it helps, I have the following function:

8<-----------------------------------------------------------
def update_xml(infile, outfile, mapping, deep=False):
    from xml.etree import ElementTree as ET
    from utils.elementfilter import ElementFilter
    doc = ET.parse(infile)
    efilter = ElementFilter(doc.getroot())
    changes = 0
    # each key selects elements/attributes; each value is (pattern, repl)
    for key, val in mapping.items():
        pattern, repl = val
        efilter.filter = key
        changes += efilter.sub(pattern, repl, deep=deep)
    doc.write(outfile, encoding='UTF-8')
    return changes

mapping = {
    '/portal/content-node[@type=="page"]/@action': ('.*', 'ZZZZ'),
    '/portal/web-app/portlet-app/portlet/localedata/title': ('Portal', 'Gateway'),
}

changes = update_xml('c:\\working\\tmp\\test.xml',
                     'c:\\working\\tmp\\test2.xml', mapping, True)
print('There were %s changes' % changes)
8<-----------------------------------------------------------

where utils.elementfilter is this module:

http://gflanagan.net/site/python/elementfilter/elementfilter.py

It doesn't support `re` flags, but you could change the sub method of elementfilter.ElementFilter to do so, e.g. (UNTESTED!):

def sub(self, pattern, repl, count=0, deep=False, flags=None):
    changes = 0
    if flags:
        pattern = re.compile(pattern, flags)
    for elem in self.filtered:
        ...
        [rest of method unchanged]
        ...

Gerard
Hi,

yes:

import re

a = """
I Am
Multiline
but short anyhow"""

b = r"(I[\s\S]*line)"

print(re.search(b, a, re.MULTILINE).group(1))

gives

I Am
Multiline

Be aware that . does NOT match newlines (unless re.DOTALL is given)!!!
Maybe this caused your problems?

regards
Holger
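Since the original question was about replacing rather than searching, here is a hedged sketch of how the same idea might be applied with re.sub to empty everything between an opening and a closing tag. The sample text and the tag name are assumptions, and this only suits simple, non-nested markup; for real XML a parser is the safer choice.

```python
import re

text = "<doc>\n<tag>\nfirst\nsecond\n</tag>\n<tag>third</tag>\n</doc>\n"

# Non-greedy ".*?" together with re.DOTALL stops at the nearest closing
# tag; a greedy ".*" would swallow everything up to the last "</tag>".
pattern = re.compile(r"(<tag>).*?(</tag>)", re.DOTALL)
cleaned = pattern.sub(r"\1\2", text)

# The [\s\S] character class gives the same result without any flag.
cleaned_alt = re.sub(r"(<tag>)[\s\S]*?(</tag>)", r"\1\2", text)

print(cleaned)
```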
Hi, Thanks a lot for useful hints to all of you who replied to my question. I could easily do now what I wanted. Cheers, Zdenek