Home     |     .Net Programming    |     cSharp Home    |     Sql Server Home    |     Javascript / Client Side Development     |     Ajax Programming

Ruby on Rails Development     |     Perl Programming     |     C Programming Language     |     C++ Programming     |     IT Jobs

Python Programming Language     |     Laptop Suggestions?    |     TCL Scripting     |     Fortran Programming     |     Scheme Programming Language


 
 
Cervo Technologies
The Right Source to Outsource

MS Dynamics CRM 3.0

Perl Programming Language

perlre: why {n}?, {n,}?, {n,m}? ?


In perlre the descriptions given for each item in the following
pairs are identical:

  {n}   and {n}?     Match exactly n times
  {n,}  and {n,}?    Match at least n times
  {n,m} and {n,m}?   Match at least n but not more than m times

I.e. it seems that in all the cases above the ? serves no purpose
other than to add superfluous complexity to a regular expression.

I don't get it.  Is there a reason for this?  (The only one I can
come up is that it may simplify the automatic construction of
regexps, but this hardly seems worth the cost of the added line
noise.)

kj

--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.

>>>>> "k" == kj  <s@987jk.com.invalid> writes:

  k> In perlre the descriptions given for each item in the following
  k> pairs are identical:

the ? quantifier is non-greedy. it may have useful effects on some of these.

  k>   {n}   and {n}?     Match exactly n times

don't see how ? can change this match. probably there for syntactic consistancy

  k>   {n,}  and {n,}?    Match at least n times
  k>   {n,m} and {n,m}?   Match at least n but not more than m times

non-greedy makes sense with those two. with ? perl will try to match n times
and succeed and not go further. backtracking later one can still cause
more matching to occur up to the limit (none or m).

  k> I don't get it.  Is there a reason for this?  (The only one I can
  k> come up is that it may simplify the automatic construction of
  k> regexps, but this hardly seems worth the cost of the added line
  k> noise.)

you didn't analyze it enough to realize that ? can affect the latter two
quantifiers. only in the first one it is a no-op.

uri

--
Uri Guttman  ------  u@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org

In article <f34fqm$q7@reader2.panix.com>,
    kj  <s@987jk.com.invalid> wrote:

: In perlre the descriptions given for each item in the following
: pairs are identical:
:
:   {n}   and {n}?     Match exactly n times
:   {n,}  and {n,}?    Match at least n times
:   {n,m} and {n,m}?   Match at least n but not more than m times
:
: I.e. it seems that in all the cases above the ? serves no purpose
: other than to add superfluous complexity to a regular expression.
:
: I don't get it.  Is there a reason for this?  (The only one I can
: come up is that it may simplify the automatic construction of
: regexps, but this hardly seems worth the cost of the added line
: noise.)

Note the following passage in the same manpage:

    By default, a quantified subpattern is "greedy", that
    is, it will match as many times as possible (given a
    particular starting location) while still allowing the
    rest of the pattern to match.  If you want it to match
    the minimum number of times possible, follow the
    quantifier with a "?".

For example:

    $ cat try
    #! /usr/bin/perl

    $_ = "aaaa";

    foreach my $pattern (qr/(a{2,3})/, qr/(a{2,3}?)/) {
      if (/$pattern/) {
        print "$pattern: \$1 = '$1' (length ", length($1), ")\n";
      }
      else {
        warn "$0: no match for $pattern!\n";
      }
    }

    $ ./try
    (?-xism:(a{2,3})): $1 = 'aaa' (length 3)
    (?-xism:(a{2,3}?)): $1 = 'aa' (length 2)

Note how the quantifier with the question mark is satisfied with a
shorter match.

Hope this helps,
Greg
--
Now let us retract the foreskin of misconception and apply the
wire brush of enlightenment.
    -- Geoff Miller

On Thu, 24 May 2007 16:51:34 +0000, kj wrote:
> In perlre the descriptions given for each item in the following
> pairs are identical:

>   {n}   and {n}?     Match exactly n times
>   {n,}  and {n,}?    Match at least n times
>   {n,m} and {n,m}?   Match at least n but not more than m times

> I.e. it seems that in all the cases above the ? serves no purpose
> other than to add superfluous complexity to a regular expression.

> I don't get it.  Is there a reason for this?  (The only one I can
> come up is that it may simplify the automatic construction of
> regexps, but this hardly seems worth the cost of the added line
> noise.)

Yes, there is a reason. Perl regular expressions are by default "greedy".
That means Perl will match the biggest possible pattern from the given
input. ? character denotes non-greedy match (+?, *? etc).

For example:

warbird:~% echo "abcabc" | perl -lne 'print $& if /a.*c/'
abcabc
warbird:~% echo "abcabc" | perl -lne 'print $& if /a.*?c/'
abc

--
Igor Pozgaj | ipozgaj at fly.srk.fer.hr
ICQ: 126002505  | IRC: @thunder (#linux@IdolNet)
PGP: 0xEF36A092 | http://fly.srk.fer.hr/~ipozgaj
http://ipozgaj.blogspot.com (/atom.xml RSS feed)

On May 24, 1:10 pm, Uri Guttman <u@stemsystems.com> wrote:

> >>>>> "k" == kj  <s@987jk.com.invalid> writes:

>   k> In perlre the descriptions given for each item in the following
>   k> pairs are identical:

> the ? quantifier is non-greedy.

Just to be pedantic, the ? *quantifier* is in fact greedy.  That is,
the quantifier that means "0 or 1 of the previous token" will prefer
to match 1.  A separate and distinct use of the question-mark
character is to make a preceding quantifier non-greedy.  So the ?
quantifier, modified by a ? will match 0 or 1 of the previous token,
preferring to match 0.

"ab" =~ /(ab?)/
matches, and sets $1 to 'ab'
"ab" =~ /(ab??)/
matches, and sets $1 to 'a'

Paul Lalli

>>>>> "PL" == Paul Lalli <mri@gmail.com> writes:

  PL> On May 24, 1:10 pm, Uri Guttman <u@stemsystems.com> wrote:
  >> >>>>> "k" == kj  <s@987jk.com.invalid> writes:
  >>
  k> In perlre the descriptions given for each item in the following
  k> pairs are identical:
  >>
  >> the ? quantifier is non-greedy.

  PL> Just to be pedantic, the ? *quantifier* is in fact greedy.  That is,
  PL> the quantifier that means "0 or 1 of the previous token" will prefer
  PL> to match 1.  A separate and distinct use of the question-mark
  PL> character is to make a preceding quantifier non-greedy.  So the ?
  PL> quantifier, modified by a ? will match 0 or 1 of the previous token,
  PL> preferring to match 0.

well, i should have said ? modifier vs the ? quantifier which is 0 or 1
times. but it is obvious from the OP that it is the ? modifier in
question.

uri

--
Uri Guttman  ------  u@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org

In <x78xbe2lm5.@mail.sysarch.com> Uri Guttman <u@stemsystems.com> writes:

>>>>>> "k" == kj  <s@987jk.com.invalid> writes:
>  k> In perlre the descriptions given for each item in the following
>  k> pairs are identical:
>the ? quantifier is non-greedy.

I'm aware of this, and of the greediness when the ? is absent.
This alone does not explain to me the usefulness of ? in the cases
cited.

>  k>   {n}   and {n}?     Match exactly n times
>don't see how ? can change this match. probably there for syntactic consistancy
>  k>   {n,}  and {n,}?    Match at least n times
>  k>   {n,m} and {n,m}?   Match at least n but not more than m times
>non-greedy makes sense with those two. with ? perl will try to match n times
>and succeed and not go further. backtracking later one can still cause

                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>more matching to occur up to the limit (none or m).

 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OK, this is what I was missing, the "backtracking later..." bit.
I mean, if I just wanted "perl to try to match n times and succeed
and not go further" I'd just use {n}; no need for ? anywhere.

Still, I'm not entirely clear on the backtracking stuff you allude
to.  Could someone give me a concrete example where

  {n,}?

would be preferable to plain ol'

{n}

?

Thanks!

kj

--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.

kj wrote:
> In <x78xbe2lm5.@mail.sysarch.com> Uri Guttman <u@stemsystems.com> writes:

>>>>>>> "k" == kj  <s@987jk.com.invalid> writes:

>>  k> In perlre the descriptions given for each item in the following
>>  k> pairs are identical:

>> the ? quantifier is non-greedy.

> I'm aware of this, and of the greediness when the ? is absent.
> This alone does not explain to me the usefulness of ? in the cases
> cited.

Please read

     perldoc -q greedy

That FAQ entry might help clarifying the greediness concept.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

>>>>> "k" == kj  <s@987jk.com.invalid> writes:

  k> In <x78xbe2lm5.@mail.sysarch.com> Uri Guttman <u@stemsystems.com> writes:

  k> {n,}  and {n,}?    Match at least n times
  k> {n,m} and {n,m}?   Match at least n but not more than m times

  >> non-greedy makes sense with those two. with ? perl will try to
  >> match n times and succeed and not go further. backtracking later
  >> one can still cause
  k>                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  >> more matching to occur up to the limit (none or m).
  k>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  k> OK, this is what I was missing, the "backtracking later..." bit.
  k> I mean, if I just wanted "perl to try to match n times and succeed
  k> and not go further" I'd just use {n}; no need for ? anywhere.

  k> Still, I'm not entirely clear on the backtracking stuff you allude
  k> to.  Could someone give me a concrete example where

  k>   {n,}?

  k> would be preferable to plain ol'

  k> {n}

you have to know what backtracking is first. it happens when a variable
length match fails in a later part. this can happen with {n,m}? if that
matches n or more things but then fails to match afterwards. the regex
engine will then backtrack over the failed match and redo the previous
part (the n,m stuff) with a longer match. so it could fail on n but
match n+1 and then the next part could match. now if it could actually
match up to m times it won't with ? since non-greedy means stop as soon
as you have enough for a complete match. it wouldn't be too hard to show
a case where {n} fails but {n,m}? would match with more than n repeated
matches. but i am too tired to come up with it. i am sure others here
can do it.

but as i said, {n}? is the same as {n} since there is no variable length
there. ? only makes sense on a variable length match like + * {n,} and
{n,m}.

and do what another poster just said, rtfm on non-greedy.

uri

--
Uri Guttman  ------  u@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org

On 5 25 , 11 36 , kj <s@987jk.com.invalid> wrote:

> Still, I'm not entirely clear on the backtracking stuff you allude
> to.  Could someone give me a concrete example where

>   {n,}?

> would be preferable to plain ol'

> {n}

> ?

> Thanks!

> kj

{n,}? is not identical to {n}, though sometimes it seems to be so.

Consider, for example, two patterns "ab{2,}?c" and "ab{2}c". The
latter can match "aabbbcc", while the former can't.

In fact, "?" is not needed in the above example. But I think you know
the difference between "{n,}?" and "{n,}". :-)

> Consider, for example, two patterns "ab{2,}?c" and "ab{2}c". The
> latter can match "aabbbcc", while the former can't.

Oops, I should have written: "ab{2}c" and "ab{2,}c". I'm sorry if
anyone was confused.
On May 24, 3:43 pm, Uri Guttman <u@stemsystems.com> wrote:

Since we're in pedantic mode, perlre.pod seems a bit loose with
"modifier"
vs. "quantifier" in a few places.  Below, I've replaced "modifier"
with
"quantifier" where I think appropriate.

This is perl, v5.8.6 built for i386-openbsd, but it seems to agree
with
http://perldoc.perl.org/perlre.html (Perl 5.8.8)

<quasi-patch>
diff -u perlre.pod.orig perlre.pod
--- perlre.pod.orig     Fri May 25 20:32:10 2007
+++ perlre.pod  Fri May 25 20:47:55 2007
@@ -122,8 +122,8 @@

 (If a curly bracket occurs in any other context, it is treated
 as a regular character.  In particular, the lower bound
-is not optional.)  The "*" modifier is equivalent to C<{0,}>, the "+"
-modifier to C<{1,}>, and the "?" modifier to C<{0,1}>.  n and m are
limited
+is not optional.)  The "*" quantifier is equivalent to C<{0,}>, the
"+"
+quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>.  n and m
are limited
 to integral values less than a preset limit defined when perl is
built.
 This is usually 32766 on the most common platforms.  The actual limit
can
 be seen in the error message generated by code such as this:
@@ -1103,7 +1103,7 @@

 The C<o?> can match at the beginning of C<'foo'>, and since the
position
 in the string is not moved by the match, C<o?> would match again and
again
-because of the C<*> modifier.  Another common way to create a similar
cycle
+because of the C<*> quantifier.  Another common way to create a
similar cycle
 is with the looping modifier C<//g>:

     @matches = ( 'foo' =~ m{ o? }xg );
@@ -1123,7 +1123,7 @@

 Thus Perl allows such constructs, by I<forcefully breaking
 the infinite loop>.  The rules for this are different for lower-level
-loops given by the greedy modifiers C<*+{}>, and for higher-level
+loops given by the greedy quantifiers C<*+{}>, and for higher-level
 ones like the C</g> modifier or split() operator.

 The lower-level loops are I<interrupted> (that is, the loop is
</quasi-patch>

Otherwise the pod consistently refers to /imszge as "modifiers".

The places (unless I missed any) where it refers to the
non-greedy '?' thingy with words are:

If you want it to match the minimum number of times possible, follow
the quantifier with a "?". Note that the meanings don't change, just
the "greediness":
...
A fundamental feature of regular expression matching involves the
notion called backtracking, which is currently used (when needed)
by all regular expression quantifiers, namely * , *? , + , +?, {n,m},
and {n,m}?. Backtracking is often optimized internally, but the
general principle outlined here is valid.
...
At each position of the string the best match given by non-greedy ??
is the zero-length match, and the second best match is what is matched
by \w.

Conclusion: I don't see any place that explicitly refers to the
non-greedy ? as a modifier of a quantifier, even though that's
logically what it is.

Personally, I think it's questionable the number of different
purposes that '?' serves.  <maniacal_laughter/>

--
Brad

Add to del.icio.us | Digg this | Stumble it | Powered by Megasolutions Inc