|
|
 |
 |
 |
 |
Perl Programming Language
|
 |
 |
 |
 |
 |
 |
 |
 |
perlre: why {n}?, {n,}?, {n,m}? ?
In perlre the descriptions given for each item in the following pairs are identical: {n} and {n}? Match exactly n times {n,} and {n,}? Match at least n times {n,m} and {n,m}? Match at least n but not more than m times I.e. it seems that in all the cases above the ? serves no purpose other than to add superfluous complexity to a regular expression. I don't get it. Is there a reason for this? (The only one I can come up is that it may simplify the automatic construction of regexps, but this hardly seems worth the cost of the added line noise.) kj -- NOTE: In my address everything before the first period is backwards; and the last period, and everything after it, should be discarded.
>>>>> "k" == kj <s @987jk.com.invalid> writes: k> In perlre the descriptions given for each item in the following k> pairs are identical: the ? quantifier is non-greedy. it may have useful effects on some of these. k> {n} and {n}? Match exactly n times don't see how ? can change this match. probably there for syntactic consistancy k> {n,} and {n,}? Match at least n times k> {n,m} and {n,m}? Match at least n but not more than m times non-greedy makes sense with those two. with ? perl will try to match n times and succeed and not go further. backtracking later one can still cause more matching to occur up to the limit (none or m). k> I don't get it. Is there a reason for this? (The only one I can k> come up is that it may simplify the automatic construction of k> regexps, but this hardly seems worth the cost of the added line k> noise.) you didn't analyze it enough to realize that ? can affect the latter two quantifiers. only in the first one it is a no-op. uri -- Uri Guttman ------ u@stemsystems.com -------- http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
In article <f34fqm$q7@reader2.panix.com>, kj <s@987jk.com.invalid> wrote: : In perlre the descriptions given for each item in the following : pairs are identical: : : {n} and {n}? Match exactly n times : {n,} and {n,}? Match at least n times : {n,m} and {n,m}? Match at least n but not more than m times : : I.e. it seems that in all the cases above the ? serves no purpose : other than to add superfluous complexity to a regular expression. : : I don't get it. Is there a reason for this? (The only one I can : come up is that it may simplify the automatic construction of : regexps, but this hardly seems worth the cost of the added line : noise.) Note the following passage in the same manpage: By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". For example: $ cat try #! /usr/bin/perl $_ = "aaaa"; foreach my $pattern (qr/(a{2,3})/, qr/(a{2,3}?)/) { if (/$pattern/) { print "$pattern: \$1 = '$1' (length ", length($1), ")\n"; } else { warn "$0: no match for $pattern!\n"; } } $ ./try (?-xism:(a{2,3})): $1 = 'aaa' (length 3) (?-xism:(a{2,3}?)): $1 = 'aa' (length 2) Note how the quantifier with the question mark is satisfied with a shorter match. Hope this helps, Greg -- Now let us retract the foreskin of misconception and apply the wire brush of enlightenment. -- Geoff Miller
On Thu, 24 May 2007 16:51:34 +0000, kj wrote: > In perlre the descriptions given for each item in the following > pairs are identical: > {n} and {n}? Match exactly n times > {n,} and {n,}? Match at least n times > {n,m} and {n,m}? Match at least n but not more than m times > I.e. it seems that in all the cases above the ? serves no purpose > other than to add superfluous complexity to a regular expression. > I don't get it. Is there a reason for this? (The only one I can > come up is that it may simplify the automatic construction of > regexps, but this hardly seems worth the cost of the added line > noise.)
Yes, there is a reason. Perl regular expressions are by default "greedy". That means Perl will match the biggest possible pattern from the given input. ? character denotes non-greedy match (+?, *? etc). For example: warbird:~% echo "abcabc" | perl -lne 'print $& if /a.*c/' abcabc warbird:~% echo "abcabc" | perl -lne 'print $& if /a.*?c/' abc -- Igor Pozgaj | ipozgaj at fly.srk.fer.hr ICQ: 126002505 | IRC: @thunder (#linux@IdolNet) PGP: 0xEF36A092 | http://fly.srk.fer.hr/~ipozgaj http://ipozgaj.blogspot.com (/atom.xml RSS feed)
On May 24, 1:10 pm, Uri Guttman <u@stemsystems.com> wrote: > >>>>> "k" == kj <s @987jk.com.invalid> writes: > k> In perlre the descriptions given for each item in the following > k> pairs are identical: > the ? quantifier is non-greedy.
Just to be pedantic, the ? *quantifier* is in fact greedy. That is, the quantifier that means "0 or 1 of the previous token" will prefer to match 1. A separate and distinct use of the question-mark character is to make a preceding quantifier non-greedy. So the ? quantifier, modified by a ? will match 0 or 1 of the previous token, preferring to match 0. "ab" =~ /(ab?)/ matches, and sets $1 to 'ab' "ab" =~ /(ab??)/ matches, and sets $1 to 'a' Paul Lalli
>>>>> "PL" == Paul Lalli <mri @gmail.com> writes: PL> On May 24, 1:10 pm, Uri Guttman <u@stemsystems.com> wrote: >> >>>>> "k" == kj <s@987jk.com.invalid> writes: >> k> In perlre the descriptions given for each item in the following k> pairs are identical: >> >> the ? quantifier is non-greedy. PL> Just to be pedantic, the ? *quantifier* is in fact greedy. That is, PL> the quantifier that means "0 or 1 of the previous token" will prefer PL> to match 1. A separate and distinct use of the question-mark PL> character is to make a preceding quantifier non-greedy. So the ? PL> quantifier, modified by a ? will match 0 or 1 of the previous token, PL> preferring to match 0. well, i should have said ? modifier vs the ? quantifier which is 0 or 1 times. but it is obvious from the OP that it is the ? modifier in question. uri -- Uri Guttman ------ u@stemsystems.com -------- http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
In <x78xbe2lm5.@mail.sysarch.com> Uri Guttman <u@stemsystems.com> writes: >>>>>> "k" == kj <s @987jk.com.invalid> writes: > k> In perlre the descriptions given for each item in the following > k> pairs are identical: >the ? quantifier is non-greedy. I'm aware of this, and of the greediness when the ? is absent. This alone does not explain to me the usefulness of ? in the cases cited. > k> {n} and {n}? Match exactly n times >don't see how ? can change this match. probably there for syntactic consistancy > k> {n,} and {n,}? Match at least n times > k> {n,m} and {n,m}? Match at least n but not more than m times >non-greedy makes sense with those two. with ? perl will try to match n times >and succeed and not go further. backtracking later one can still cause
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >more matching to occur up to the limit (none or m).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ OK, this is what I was missing, the "backtracking later..." bit. I mean, if I just wanted "perl to try to match n times and succeed and not go further" I'd just use {n}; no need for ? anywhere. Still, I'm not entirely clear on the backtracking stuff you allude to. Could someone give me a concrete example where {n,}? would be preferable to plain ol' {n} ? Thanks! kj -- NOTE: In my address everything before the first period is backwards; and the last period, and everything after it, should be discarded.
kj wrote: > In <x78xbe2lm5. @mail.sysarch.com> Uri Guttman <u @stemsystems.com> writes: >>>>>>> "k" == kj <s@987jk.com.invalid> writes: >> k> In perlre the descriptions given for each item in the following >> k> pairs are identical: >> the ? quantifier is non-greedy. > I'm aware of this, and of the greediness when the ? is absent. > This alone does not explain to me the usefulness of ? in the cases > cited.
Please read perldoc -q greedy That FAQ entry might help clarifying the greediness concept. -- Gunnar Hjalmarsson Email: http://www.gunnar.cc/cgi-bin/contact.pl
>>>>> "k" == kj <s @987jk.com.invalid> writes: k> In <x78xbe2lm5.@mail.sysarch.com> Uri Guttman <u@stemsystems.com> writes: k> {n,} and {n,}? Match at least n times k> {n,m} and {n,m}? Match at least n but not more than m times >> non-greedy makes sense with those two. with ? perl will try to >> match n times and succeed and not go further. backtracking later >> one can still cause k> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >> more matching to occur up to the limit (none or m). k> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ k> OK, this is what I was missing, the "backtracking later..." bit. k> I mean, if I just wanted "perl to try to match n times and succeed k> and not go further" I'd just use {n}; no need for ? anywhere. k> Still, I'm not entirely clear on the backtracking stuff you allude k> to. Could someone give me a concrete example where k> {n,}? k> would be preferable to plain ol' k> {n} you have to know what backtracking is first. it happens when a variable length match fails in a later part. this can happen with {n,m}? if that matches n or more things but then fails to match afterwards. the regex engine will then backtrack over the failed match and redo the previous part (the n,m stuff) with a longer match. so it could fail on n but match n+1 and then the next part could match. now if it could actually match up to m times it won't with ? since non-greedy means stop as soon as you have enough for a complete match. it wouldn't be too hard to show a case where {n} fails but {n,m}? would match with more than n repeated matches. but i am too tired to come up with it. i am sure others here can do it. but as i said, {n}? is the same as {n} since there is no variable length there. ? only makes sense on a variable length match like + * {n,} and {n,m}. and do what another poster just said, rtfm on non-greedy. uri -- Uri Guttman ------ u@stemsystems.com -------- http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
On 5 25 , 11 36 , kj <s@987jk.com.invalid> wrote: > Still, I'm not entirely clear on the backtracking stuff you allude > to. Could someone give me a concrete example where > {n,}? > would be preferable to plain ol' > {n} > ? > Thanks! > kj
{n,}? is not identical to {n}, though sometimes it seems to be so. Consider, for example, two patterns "ab{2,}?c" and "ab{2}c". The latter can match "aabbbcc", while the former can't. In fact, "?" is not needed in the above example. But I think you know the difference between "{n,}?" and "{n,}". :-)
> Consider, for example, two patterns "ab{2,}?c" and "ab{2}c". The > latter can match "aabbbcc", while the former can't.
Oops, I should have written: "ab{2}c" and "ab{2,}c". I'm sorry if anyone was confused.
On May 24, 3:43 pm, Uri Guttman <u@stemsystems.com> wrote:
> >>>>> "PL" == Paul Lalli <mri @gmail.com> writes: > PL> On May 24, 1:10 pm, Uri Guttman <u@stemsystems.com> wrote: > >> >>>>> "k" == kj <s@987jk.com.invalid> writes: > k> In perlre the descriptions given for each item in the following > k> pairs are identical: > >> the ? quantifier is non-greedy. > PL> Just to be pedantic, the ? *quantifier* is in fact greedy. That is, > PL> the quantifier that means "0 or 1 of the previous token" will prefer > PL> to match 1. A separate and distinct use of the question-mark > PL> character is to make a preceding quantifier non-greedy. So the ? > PL> quantifier, modified by a ? will match 0 or 1 of the previous token, > PL> preferring to match 0. > well, i should have said ? modifier vs the ? quantifier which is 0 or 1 > times. but it is obvious from the OP that it is the ? modifier in > question.
Since we're in pedantic mode, perlre.pod seems a bit loose with "modifier" vs. "quantifier" in a few places. Below, I've replaced "modifier" with "quantifier" where I think appropriate. This is perl, v5.8.6 built for i386-openbsd, but it seems to agree with http://perldoc.perl.org/perlre.html (Perl 5.8.8) <quasi-patch> diff -u perlre.pod.orig perlre.pod --- perlre.pod.orig Fri May 25 20:32:10 2007 +++ perlre.pod Fri May 25 20:47:55 2007 @@ -122,8 +122,8 @@ (If a curly bracket occurs in any other context, it is treated as a regular character. In particular, the lower bound -is not optional.) The "*" modifier is equivalent to C<{0,}>, the "+" -modifier to C<{1,}>, and the "?" modifier to C<{0,1}>. n and m are limited +is not optional.) The "*" quantifier is equivalent to C<{0,}>, the "+" +quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>. n and m are limited to integral values less than a preset limit defined when perl is built. This is usually 32766 on the most common platforms. The actual limit can be seen in the error message generated by code such as this: @@ -1103,7 +1103,7 @@ The C<o?> can match at the beginning of C<'foo'>, and since the position in the string is not moved by the match, C<o?> would match again and again -because of the C<*> modifier. Another common way to create a similar cycle +because of the C<*> quantifier. Another common way to create a similar cycle is with the looping modifier C<//g>: @matches = ( 'foo' =~ m{ o? }xg ); @@ -1123,7 +1123,7 @@ Thus Perl allows such constructs, by I<forcefully breaking the infinite loop>. The rules for this are different for lower-level -loops given by the greedy modifiers C<*+{}>, and for higher-level +loops given by the greedy quantifiers C<*+{}>, and for higher-level ones like the C</g> modifier or split() operator. The lower-level loops are I<interrupted> (that is, the loop is </quasi-patch> Otherwise the pod consistently refers to /imszge as "modifiers". The places (unless I missed any) where it refers to the non-greedy '?' thingy with words are: If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don't change, just the "greediness": ... A fundamental feature of regular expression matching involves the notion called backtracking, which is currently used (when needed) by all regular expression quantifiers, namely * , *? , + , +?, {n,m}, and {n,m}?. Backtracking is often optimized internally, but the general principle outlined here is valid. ... At each position of the string the best match given by non-greedy ?? is the zero-length match, and the second best match is what is matched by \w. Conclusion: I don't see any place that explicitly refers to the non-greedy ? as a modifier of a quantifier, even though that's logically what it is. Personally, I think it's questionable the number of different purposes that '?' serves. <maniacal_laughter/> -- Brad
|
 |
 |
 |
 |
|