
multiline regular expression (replace)


Hi all,

I would like to perform a regular expression replace (e.g. removing
everything from within tags in an XML file) using a pattern that spans
multiple lines. How can I do this?

import re

where = open("filename").read()
multilinePattern = "^<tag> .... <\/tag>$"
re.search(multilinePattern, where, re.MULTILINE)

Thanks greatly,
Zdenek

On May 29, 2:03 am, Zdenek Maxa <zdenekm@yahoo.co.uk> wrote:

> Hi all,

> I would like to perform regular expression replace (e.g. removing
> everything from within tags in a XML file) with multiple-line pattern.
> How can I do this?

> where = open("filename").read()
> multilinePattern = "^<tag> .... <\/tag>$"
> re.search(multilinePattern, where, re.MULTILINE)

> Thanks greatly,
> Zdenek

Why not use an XML package for working with XML files?  I'm sure
they'll handle your multiline tags.

http://effbot.org/zone/element-index.htm
http://codespeak.net/lxml/
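Following that suggestion, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree (the standard-library version of the ElementTree package from the first link) instead of regular expressions. It is written for Python 3, and the helper name empty_tags and the sample document are made up for illustration. It removes the text and child elements inside every <tag> element while keeping the tags themselves:

```python
import xml.etree.ElementTree as ET


def empty_tags(xml_text, tag):
    """Hypothetical helper: strip text and children from every <tag> element."""
    root = ET.fromstring(xml_text)
    # Materialize the matches first so the tree is not mutated mid-iteration.
    for elem in list(root.iter(tag)):
        # Copy the child list before removing items from it.
        for child in list(elem):
            elem.remove(child)
        elem.text = None  # text directly inside the element
        # elem.tail (text after the closing tag) is deliberately left alone.
    return ET.tostring(root, encoding="unicode")


doc = """<root>
<tag>
  line one
  <b>line two</b>
</tag>
<keep>kept</keep>
</root>"""

result = empty_tags(doc, "tag")
print(result)
```

Because the parser already knows where each element starts and ends, newlines inside the element need no special handling.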

~Sean

Hi,

that was merely an example of what I would like to achieve. However, in
general, is there a way of handling multiline regular expressions in
Python, presumably using only modules from the standard distribution,
such as re?

Thanks,
Zdenek

So you mean you don't know how to *create* multiline patterns?

One way is to use """ ... """ or ''' ... ''' quoting, which allows you
to include newlines as part of your strings. Another is to use \n in
your strings to represent newlines.
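A small sketch of both approaches, written for Python 3 with made-up sample text: the two patterns below are equivalent, because the literal line breaks inside the triple-quoted string become newline characters in the pattern, just as the \n escapes do:

```python
import re

text = "<tag>\nfirst\nsecond\n</tag>"

# Pattern built with \n escapes: the newlines are part of what must match.
escaped = r"<tag>\n(\w+)\n(\w+)\n</tag>"

# The same pattern written with triple quotes; each line break in the
# source becomes a newline character in the pattern string.
triple = r"""<tag>
(\w+)
(\w+)
</tag>"""

m1 = re.search(escaped, text)
m2 = re.search(triple, text)
print(m1.groups())  # ('first', 'second')
print(m2.groups())  # ('first', 'second')
```

Note that this only holds without re.VERBOSE; in verbose mode, whitespace (including newlines) in the pattern is ignored.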

regards
  Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden

There shouldn't be any problem matching multiline strings with re (even without flags). There might, however, be a problem with the search pattern, especially the "..." part :-) if you are in fact using dots, which don't match newlines by default.

The re.M flag only changes the behaviour of the ^ and $ metacharacters; cf. the docs:
re.M
MULTILINE
When specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.
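A quick illustration of that description, as a Python 3 sketch with made-up input: re.M affects only where ^ and $ can match, not what . matches:

```python
import re

text = "alpha\nbeta\ngamma"

# Without re.M, ^ and $ anchor only at the start and end of the whole
# string, and no run of \w characters spans the entire text.
plain = re.findall(r"^\w+$", text)

# With re.M, ^ and $ also match just after / just before each newline.
multi = re.findall(r"^\w+$", text, re.M)

# re.M does not let '.' cross a newline, so this still finds nothing.
dot = re.findall(r"alpha.beta", text, re.M)

print(plain)  # []
print(multi)  # ['alpha', 'beta', 'gamma']
print(dot)    # []
```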

You may also want to check the S flag:
re.S
DOTALL
Make the "." special character match any character at all, including a newline; without this flag, "." will match anything except a newline.
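Applied to the original question, a minimal Python 3 sketch might look like the following; the sample text is made up, and a regex like this assumes <tag> elements are not nested (for real XML, the parser-based approach suggested earlier in the thread is more robust). The non-greedy .*? stops at the first closing tag, and re.S lets it run across newlines:

```python
import re

text = "<tag>\nline one\nline two\n</tag>\n<other>keep</other>\n<tag>x</tag>"

# Capture the opening and closing tags so the replacement can put them back.
pattern = r"(<tag>).*?(</tag>)"

# Without DOTALL, '.' stops at newlines: only the one-line element is emptied.
without_s = re.sub(pattern, r"\1\2", text)

# With DOTALL, '.' also matches '\n', so the multi-line element is emptied too.
# (Pass flags by keyword; the fourth positional argument of re.sub is count.)
with_s = re.sub(pattern, r"\1\2", text, flags=re.S)

print(repr(without_s))
print(repr(with_s))
```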

See:
http://docs.python.org/lib/node46.html
http://docs.python.org/lib/re-syntax.html

Vlasta

On May 29, 11:03 am, Zdenek Maxa <zdenekm@yahoo.co.uk> wrote:

> Hi all,

> I would like to perform regular expression replace (e.g. removing
> everything from within tags in a XML file) with multiple-line pattern.
> How can I do this?

> where = open("filename").read()
> multilinePattern = "^<tag> .... <\/tag>$"
> re.search(multilinePattern, where, re.MULTILINE)

If it helps, I have the following function:

8<-----------------------------------------------------------
def update_xml(infile, outfile, mapping, deep=False):
    # ElementTree is in the standard library (Python 2.5+); the old
    # cElementTree accelerator module was removed in Python 3.9
    from xml.etree import ElementTree as ET
    # Third-party module, see the link below
    from utils.elementfilter import ElementFilter
    doc = ET.parse(infile)
    efilter = ElementFilter(doc.getroot())
    changes = 0
    # mapping: {filter expression: (regex pattern, replacement)}
    for key, val in mapping.items():
        pattern, repl = val
        efilter.filter = key
        changes += efilter.sub(pattern, repl, deep=deep)
    doc.write(outfile, encoding='UTF-8')
    return changes

mapping = {
    '/portal/content-node[@type=="page"]/@action': ('.*', 'ZZZZ'),
    '/portal/web-app/portlet-app/portlet/localedata/title': ('Portal', 'Gateway'),
}

changes = update_xml('c:\\working\\tmp\\test.xml',
                     'c:\\working\\tmp\\test2.xml', mapping, True)

print('There were %s changes' % changes)
8<-----------------------------------------------------------

where utils.elementfilter is this module:

    http://gflanagan.net/site/python/elementfilter/elementfilter.py

It doesn't support `re` flags, but you could change the sub method of
elementfilter.ElementFilter to do so, e.g. (UNTESTED!):

    def sub(self, pattern, repl, count=0, deep=False, flags=None):
        changes = 0
        if flags:
            pattern = re.compile(pattern, flags)
        for elem in self.filtered:
            ...
            [rest of method unchanged]
            ...
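For comparison, here is a standard-library-only sketch of the same idea, compiling the pattern once with its flags and applying it to element text. It is not ElementFilter's API: sub_text is a hypothetical helper that matches elements by tag name only, and the sample document is made up. Written for Python 3:

```python
import re
import xml.etree.ElementTree as ET


def sub_text(root, tag, pattern, repl, flags=0):
    """Hypothetical helper: run re.sub with the given flags on the text of
    every <tag> element under root; return how many texts changed."""
    regex = re.compile(pattern, flags)  # flags are baked into the compiled pattern
    changes = 0
    for elem in root.iter(tag):
        if elem.text is None:
            continue
        new_text = regex.sub(repl, elem.text)
        if new_text != elem.text:
            elem.text = new_text
            changes += 1
    return changes


root = ET.fromstring("<doc><title>Portal\nhome</title><title>Other</title></doc>")
# re.I makes "portal" match "Portal"; re.S lets .* run past the newline.
n = sub_text(root, "title", r"^portal.*", "Gateway", re.I | re.S)
print(n, [e.text for e in root.iter("title")])
```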

Gerard

Hi,

yes:

import re

a = """
I Am
Multiline
but short anyhow"""

# raw string, so \s and \S reach the regex engine unchanged
b = r"(I[\s\S]*line)"

print(re.search(b, a, re.MULTILINE).group(1))

gives

I Am
Multiline

Be aware that . does NOT match newlines (unless you use re.DOTALL)!
Maybe this caused your problems?
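To make the difference concrete, here is a small Python 3 sketch reusing the string above: with a plain dot the pattern fails, because no single line contains both "I" and "line", while the [\s\S] character class ("whitespace or non-whitespace", i.e. any character) and re.DOTALL both let the match cross the newline:

```python
import re

a = """
I Am
Multiline
but short anyhow"""

# Plain '.': cannot cross the newline after "I Am", so there is no match.
dot_only = re.search(r"(I.*line)", a)

# [\s\S] matches any character, newlines included.
char_class = re.search(r"(I[\s\S]*line)", a).group(1)

# re.DOTALL makes '.' behave like [\s\S].
dotall = re.search(r"(I.*line)", a, re.DOTALL).group(1)

print(dot_only)         # None
print(char_class == dotall)  # True
```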

regards
        Holger

Hi,

Thanks a lot for the useful hints to all of you who replied to my question.
I could now easily do what I wanted.

Cheers,
Zdenek
