Perl Programming Language
find last match in a string?
Normally, perl searches left-to-right, and (consequently) will normally give me the first match for a regexp in a string. Is there any way to get the LAST match? e.g.

    my $a = "abcabd";
    $a =~ m/(ab.)/;
    print "$1\n";

I'd like (somehow) to get "abd" printed, not "abc", which is what the above code (obviously) does.

BugBear
bugbear wrote: > Normally, perl searches left-to-right, and > (consequently) will normally give me > the first match for a regexp in a string. > Is there anyway to get the LAST match? > e.g. > my $a = "abcabd"; > $a =~ m/(ab.)/; > print "$1\n"; > I'd like (somehow) to get "abd" printed, not "abc" > which is what the above code (obviously) does.
    $a =~ /.*(ab.)/s;

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
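For readers wondering why the leading `.*` changes the result: the sketch below contrasts the two patterns on the string from the question. It is only an illustration; the variable names `$text`, `$first`, and `$last` are not from the thread.

```perl
use strict;
use warnings;

my $text = "abcabd";

# First match: the engine scans left to right and stops at "abc".
my ($first) = $text =~ /(ab.)/;

# Last match: the greedy .* first swallows the whole string, then gives
# back one character at a time until (ab.) can match, so the rightmost
# occurrence wins. /s lets . match newlines as well.
my ($last) = $text =~ /.*(ab.)/s;

print "first=$first last=$last\n";   # prints: first=abc last=abd
```

Because only `(ab.)` is in capturing parentheses, `$1` holds just the three-character match, not the prefix consumed by `.*`.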
On May 29, 6:26 am, bugbear <bugbear@trim_papermule.co.uk_trim> wrote: > Normally, perl searches left-to-right, and > (consequently) will normally give me > the first match for a regexp in a string. > Is there anyway to get the LAST match? > e.g. > my $a = "abcabd"; > $a =~ m/(ab.)/; > print "$1\n"; > I'd like (somehow) to get "abd" printed, not "abc" > which is what the above code (obviously) does.
Gunnar's solution is the right way to go, IMHO, but in the spirit of TIMTOWTDI...

    $ perl -le'
    my $a = "abcabd";
    my $last = ($a =~ m/(ab.)/g)[-1];
    print $last;
    '
    abd

Or even...

    $ perl -le'
    my $a = "abcabd";
    (reverse $a) =~ m/(.ba)/;
    print scalar reverse $1;
    '
    abd

Please don't do that, though... icky. :-)

Paul Lalli
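One caveat worth checking before choosing between these: with plain `/g`, each match resumes where the previous one ended, so the list-slice version and the greedy-prefix version can disagree when candidate matches overlap. The sketch below uses a made-up input, "ababd", that is not from the thread, and shows a lookahead variant as one possible way to make `/g` see overlapping candidates.

```perl
use strict;
use warnings;

my $text = "ababd";   # "aba" at offset 0 overlaps "abd" at offset 2

# Plain /g resumes after the previous match, so "abd" is never seen.
my $last_g = ($text =~ /(ab.)/g)[-1];

# Greedy prefix: finds the rightmost starting position.
my ($last_greedy) = $text =~ /.*(ab.)/s;

# Capture inside a lookahead: the match itself is zero-width, so /g
# steps through every position and reports overlapping candidates.
my $last_overlap = ($text =~ /(?=(ab.))/g)[-1];

print "g=$last_g greedy=$last_greedy overlap=$last_overlap\n";
# prints: g=aba greedy=abd overlap=abd
```

If the real data cannot contain overlapping matches, the three forms give the same answer and the choice is a matter of taste.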
Gunnar Hjalmarsson wrote: > bugbear wrote: >> Normally, perl searches left-to-right, and >> (consequently) will normally give me >> the first match for a regexp in a string. >> Is there anyway to get the LAST match? >> e.g. >> my $a = "abcabd"; >> $a =~ m/(ab.)/; >> print "$1\n"; >> I'd like (somehow) to get "abd" printed, not "abc" >> which is what the above code (obviously) does. > $a =~ /.*(ab.)/s;
Are there any performance or memory issues with this approach if $a is rather large? (oh, and thanks for replying so fast, and so well) BugBear
bugbear wrote: > Gunnar Hjalmarsson wrote: >> bugbear wrote: >>> my $a = "abcabd"; >>> $a =~ m/(ab.)/; >>> print "$1\n"; >>> I'd like (somehow) to get "abd" printed, not "abc" >>> which is what the above code (obviously) does. >> $a =~ /.*(ab.)/s; > Are there any performance or memory issues > with this approach if $a is rather large?
Not that I'm aware of. Btw, how large is "rather large"? -- Gunnar Hjalmarsson Email: http://www.gunnar.cc/cgi-bin/contact.pl
Gunnar Hjalmarsson wrote: > bugbear wrote: >> Gunnar Hjalmarsson wrote: >>> bugbear wrote: >>>> my $a = "abcabd"; >>>> $a =~ m/(ab.)/; >>>> print "$1\n"; >>>> I'd like (somehow) to get "abd" printed, not "abc" >>>> which is what the above code (obviously) does. >>> $a =~ /.*(ab.)/s; >> Are there any performance or memory issues >> with this approach if $a is rather large? > Not that I'm aware of. > Btw, how large is "rather large"?
No probs - in my actual usage, a coupla' K, but I was thinking more generally, since the regexp would essentially be matching the majority of the input (assuming my "real" target is near the end) BugBear
On Tue, 29 May 2007 13:05:18 +0100, bugbear <bugbear@trim_papermule.co.uk_trim> wrote: >> $a =~ /.*(ab.)/s; >Are there any performance or memory issues >with this approach if $a is rather large?
Yes, terrible ones. I would start worrying as soon as possible. Fire up Benchmark.pm and start micro-optimizing. Michele -- {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB=' .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_, 256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
bugbear <bugbear@trim_papermule.co.uk_trim> wrote: > Gunnar Hjalmarsson wrote: > > bugbear wrote: > >> Normally, perl searches left-to-right, and > >> (consequently) will normally give me > >> the first match for a regexp in a string. > >> Is there anyway to get the LAST match? > >> e.g. > >> my $a = "abcabd"; > >> $a =~ m/(ab.)/; > >> print "$1\n"; > >> I'd like (somehow) to get "abd" printed, not "abc" > >> which is what the above code (obviously) does. > > $a =~ /.*(ab.)/s; > Are there any performance or memory issues > with this approach if $a is rather large?
Not particularly, other than the ones associated with $a being large in the first place. Xho -- -------------------- http://NewsReader.Com/ -------------------- Usenet Newsgroup Service $9.95/Month 30GB
bugbear (bugbear@trim_papermule.co.uk_trim) wrote on M September MCMXCIII in <URL:news:465c16ff$0$8715$ed2619ec@ptn-nntp-reader02.plus.net>: )) Gunnar Hjalmarsson wrote: )) > bugbear wrote:
)) >> Normally, perl searches left-to-right, and
)) >> (consequently) will normally give me
)) >> the first match for a regexp in a string.
)) >>
)) >> Is there anyway to get the LAST match?
)) >>
)) >> e.g.
)) >>
)) >> my $a = "abcabd";
)) >> $a =~ m/(ab.)/;
)) >> print "$1\n";
)) >>
)) >> I'd like (somehow) to get "abd" printed, not "abc"
)) >> which is what the above code (obviously) does.
)) >
)) > $a =~ /.*(ab.)/s;
))
)) Are there any performance or memory issues
)) with this approach if $a is rather large?

Of course. It's the same as if you enter a large hallway of, say, a castle. Getting to the *last* room will take longer than getting to the *first* room.

Abigail
--
perl -wle 'eval {die [[qq [Just another Perl Hacker]]]};; print ${${${@}}[$#{@{${@}}}]}[$#{${@{${@}}}[$#{@{${@}}}]}]'
bugbear wrote: > Normally, perl searches left-to-right, and > (consequently) will normally give me > the first match for a regexp in a string. > Is there anyway to get the LAST match? > e.g. > my $a = "abcabd"; > $a =~ m/(ab.)/; > print "$1\n"; > I'd like (somehow) to get "abd" printed, not "abc" > which is what the above code (obviously) does.
    $ perl -le'
    my $string  = reverse "abcabd";
    my $pattern = reverse "ab.";
    print scalar reverse $1 if $string =~ /($pattern)/;
    '
    abd

John
--
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order.
-- Larry Wall
On May 29, 2:29 pm, bugbear <bugbear@trim_papermule.co.uk_trim> wrote:
> Gunnar Hjalmarsson wrote: > > bugbear wrote: > >> Gunnar Hjalmarsson wrote: > >>> bugbear wrote: > >>>> my $a = "abcabd"; > >>>> $a =~ m/(ab.)/; > >>>> print "$1\n"; > >>>> I'd like (somehow) to get "abd" printed, not "abc" > >>>> which is what the above code (obviously) does. > >>> $a =~ /.*(ab.)/s; > >> Are there any performance or memory issues > >> with this approach if $a is rather large? > > Not that I'm aware of. > > Btw, how large is "rather large"? > No probs - in my actual usage, a coupla' K, > but I was thinking more generally, since the regexp > would essentially be matching the majority > of the input (assuming my "real" target > is near the end)
If the data is large, speed is of the essence and the target is a literal string then use rindex(). It is one $EXPLETIVE of a lot faster!

Do not let personalities[1] put you off using an index()/rindex()/substr() approach rather than a regex approach.

If ease of coding is more important than speed use the pattern match.

Note, there's no big memory issue with the regex as the pattern is not capturing the .* bit.

    use strict;
    use warnings;
    use Benchmark;

    # ~490 KB test string; the only "ab" in each repeat sits near its end
    my $a="sadihdiasjdisajdisadisadjsiadjisadjisadjsadabzaaa" x 10_000;

    timethese 10_000, {
        # greedy regex: .* runs to the end, then backtracks to the last "ab."
        match => sub {
            my ($q) = $a =~ /.*(ab.)/;
        },
        # literal search from the right, then take three characters
        rindex => sub {
            my $i = rindex $a,'ab';
            my $q = $i == -1 ? undef : substr($a,$i,3);
        }
    };
    __END__
    Benchmark: timing 10000 iterations of match, rindex...
         match: 21 wallclock secs (15.96 usr + 0.17 sys = 16.13 CPU) @ 619.85/s (n=10000)
        rindex:  0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) @ 1000000.00/s (n=10000)
                (warning: too few iterations for a reliable count)

[1] In particular a certain "personality" who recommends index()/rindex()/substr() for _all_ problems that would usually be solved with a simple regex.
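The "too few iterations" warning in the output above means the rindex figure is not a reliable rate, only an indication that it is much faster. One way to get a steadier comparison, sketched here as an assumption rather than as what the poster ran, is Benchmark's `cmpthese` with a negative count, which runs each candidate for at least that many CPU seconds. The shorter test string (`x 1_000`) and the variable name `$text` are choices made for this sketch to keep the run brief.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Smaller input than in the post above, purely to keep the run short.
my $text = "sadihdiasjdisajdisadisadjsiadjisadjisadjsadabzaaa" x 1_000;

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds instead of a fixed number of iterations.
cmpthese(-1, {
    match  => sub { my ($q) = $text =~ /.*(ab.)/s; },
    rindex => sub {
        my $i = rindex $text, 'ab';
        my $q = $i == -1 ? undef : substr($text, $i, 3);
    },
});
```

The table it prints shows iterations per second for each candidate and their relative speed; the absolute numbers depend on the machine and the perl build.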
On May 29, 1:35 pm, Brian McCauley <nobul@gmail.com> wrote:
> On May 29, 2:29 pm, bugbear <bugbear@trim_papermule.co.uk_trim> wrote: > > Gunnar Hjalmarsson wrote: > > > bugbear wrote: > > >> Gunnar Hjalmarsson wrote: > > >>> bugbear wrote: > > >>>> my $a = "abcabd"; > > >>>> $a =~ m/(ab.)/; > > >>>> print "$1\n"; > > >>>> I'd like (somehow) to get "abd" printed, not "abc" > > >>>> which is what the above code (obviously) does. > > >>> $a =~ /.*(ab.)/s; > > >> Are there any performance or memory issues > > >> with this approach if $a is rather large? > > > Not that I'm aware of. > > > Btw, how large is "rather large"? > > No probs - in my actual usage, a coupla' K, > > but I was thinking more generally, since the regexp > > would essentially be matching the majority > > of the input (assuming my "real" target > > is near the end) > If the data is large, speed is of the essence and the target is a > literal string then use rindex(). It is one $EXPLETIVE of a lot > faster! > Do not let personalities[1] put you off using an index()/rindex()/ > substr() approach rather than a regex approach. > If ease of coding is more important then speed use the pattern match. > Note, there's no big memory issue with the regex as the pattern is not > capturing the .* bit. > use strict; > use warnings; > use Benchmark; > my $a="sadihdiasjdisajdisadisadjsiadjisadjisadjsadabzaaa" x 10_000; > timethese 10_000, { > match => sub { > my ($q) = $a =~ /.*(ab.)/; > }, > rindex => sub { > my $i = rindex $a,'ab'; > my $q = $i == -1 ? undef : substr($a,$i,3); > }};
What if

    $a = "abzab";

You need to fix this first...

Regards,
Xicheng
On May 29, 6:52 pm, Xicheng Jia <xich@gmail.com> wrote:
> On May 29, 1:35 pm, Brian McCauley <nobul @gmail.com> wrote: > > On May 29, 2:29 pm, bugbear <bugbear@trim_papermule.co.uk_trim> wrote: > > > Gunnar Hjalmarsson wrote: > > > > bugbear wrote: > > > >> Gunnar Hjalmarsson wrote: > > > >>> bugbear wrote: > > > >>>> my $a = "abcabd"; > > > >>>> $a =~ m/(ab.)/; > > > >>>> print "$1\n"; > > > >>>> I'd like (somehow) to get "abd" printed, not "abc" > > > >>>> which is what the above code (obviously) does. > > > >>> $a =~ /.*(ab.)/s; > > If the data is large, speed is of the essence and the target is a > > literal string then use rindex(). It is one $EXPLETIVE of a lot > > faster! > > my $i = rindex $a,'ab'; > > my $q = $i == -1 ? undef : substr($a,$i,3); > what if > $a = "abzab"; > you need to fix this first...
It would be more accurate to say you need to _consider_ this. Although the rindex() solution does not exactly match the regex solution it is very likely that in the OP's real situation this is a non-issue so there's nothing to fix.
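To make the difference concrete: for "abzab", rindex() finds the final "ab" at offset 3, where only two characters remain, so substr() returns "ab", whereas /.*(ab.)/ returns "abz" because the dot must match a character. If that case does matter, one possible adjustment is rindex()'s optional third argument, which limits how far right a match may begin. The helper name `last_ab_triple` below is hypothetical, and the sketch treats the dot as "any character", i.e. it mirrors the /s form of the regex.

```perl
use strict;
use warnings;

# Hypothetical helper: last "ab" that is followed by at least one character.
sub last_ab_triple {
    my ($str) = @_;
    return if length($str) < 3;            # no room for "ab" plus one char

    # Only accept an "ab" starting at or before length-3, so that
    # substr() always has three characters to return.
    my $i = rindex $str, 'ab', length($str) - 3;
    return $i == -1 ? undef : substr($str, $i, 3);
}

for my $s ("abcabd", "abzab") {
    my $hit = last_ab_triple($s);
    print "$s => ", (defined $hit ? $hit : 'undef'), "\n";
}
# prints:
# abcabd => abd
# abzab => abz
```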
bugbear wrote: >Gunnar Hjalmarsson wrote: >> bugbear wrote: >>> Gunnar Hjalmarsson wrote: >>>> bugbear wrote: >>>>> I'd like (somehow) to get "abd" printed, not "abc" >>>>> which is what the above code (obviously) does. >>>> $a =~ /.*(ab.)/s; >>> Are there any performance or memory issues >>> with this approach if $a is rather large? >> Not that I'm aware of. >> Btw, how large is "rather large"? >No probs - in my actual usage, a coupla' K, >but I was thinking more generally, since the regexp >would essentially be matching the majority >of the input (assuming my "real" target >is near the end)
Good point. Once your string gets big, you might consider the "sexeger" technique, which is just "regexes" reversed, i.e. regexes applied from the back: reverse the string, match a reversed pattern, and reverse the result. See the post on Perlmonks: http://perlmonks.org/?node_id=33410

You (likely still) have to reverse the regex by hand, and you may have to test to see if it's worth the trouble.

--
Bart.