Ruby Programming Language
No way of looking for a regexp match starting from a particular point in a string?
I'm probably just missing something obvious, but I haven't found a way to match a regular expression against only part of a string, in particular only past a certain point of a string, as a way of finding successive matches. Of course, one could do a match against a string, take the substring past that match and do a match against the substring, and so on, to find all of the matches for the string, but that could be very expensive for very large strings.

I'm aware of the String.scan method, but that doesn't work for me because it doesn't return MatchData instances.

What I want is just something like regexp.match(string, n), where the regexp starts looking for a match at or after position n in the string.

Thanks,
Ken
Hi,

At Sun, 3 Jun 2007 12:59:24 +0900, Kenneth McDonald wrote in [ruby-talk:254054]:
> What I want is just something like regexp.match(string, n), where the
> regexp starts looking for a match at or after position n in the string.

    string.index(regexp, n)

--
Nobu Nakada
On 6/3/07, Kenneth McDonald <kenneth.m.mcdon@sbcglobal.net> wrote: > What I want is just something like regexp.match(string, n), where the > regexp starts looking for a match at or after position n in the string. > Thanks, > Ken
You could match the string but ignore the first part of the match.

    str = "abcdefghabcehjjjuabcfjkiabcgdfg"
    str =~ /(abc.)/
    p $1   # abcd
    str =~ /a.*ju(abc.)/
    p $1   # abcf

Harry
--
A Look into Japanese Ruby List in English
http://www.kakueki.com/
On 6/2/07, Kenneth McDonald <kenneth.m.mcdon@sbcglobal.net> wrote:
> I'm probably just missing something obvious, but I haven't found a way > to match a regular expression against only part of a string, in > particular only past a certain point of a string, as a way of finding > successive matches. Of course, one could do a match against a string, > take the substring past that match and do a match against the substring, > and so on, to find all of the matches for the string, but that could be > very expensive for very large strings. > I'm aware of the String.scan method, but that doesn't work for me > because it doesn't return MatchData instances. > What I want is just something like regexp.match(string, n), where the > regexp starts looking for a match at or after position n in the string. > Thanks, > Ken
I don't know of anything obvious, but I would probably do something a little more like:

    class String
      # Yields a MatchData for each successive match of exp.
      def match_each(exp)
        str = self
        while md = str.match(exp)
          yield md
          str = md.post_match
        end
      end
    end

    foo = "foo bar foo bar foo"
    foo.match_each(/[oa][or]/) do |md|
      puts "Found: #{md}"
    end

pth
On 6/3/07, Nobuyoshi Nakada <n@ruby-lang.org> wrote: > Hi, > At Sun, 3 Jun 2007 12:59:24 +0900, > Kenneth McDonald wrote in [ruby-talk:254054]: > > What I want is just something like regexp.match(string, n), where the > > regexp starts looking for a match at or after position n in the string. > string.index(regexp, n) > -- > Nobu Nakada
I think he wanted MatchData objects. The String#index method returns the index (numeric position of the match). But if all you want are captures, then index is a good solution. pth
Kenneth McDonald wrote: > I'm probably just missing something obvious, but I haven't found a way > to match a regular expression against only part of a string, in > particular only past a certain point of a string, as a way of finding > successive matches. Of course, one could do a match against a string, > take the substring past that match and do a match against the substring, > and so on, to find all of the matches for the string, but that could be > very expensive for very large strings. > I'm aware of the String.scan method, but that doesn't work for me > because it doesn't return MatchData instances. > What I want is just something like regexp.match(string, n), where the > regexp starts looking for a match at or after position n in the string. > Thanks, > Ken
How about this?

    def match(s, re, n)
      /(?:.{#{n}})(#{re})/.match(s)
    end

    irb(main):043:0> p s
    "abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh "
    irb(main):044:0> p match(s, /abd/, 10).begin(1)
    16
    irb(main):045:0> p match(s, /abd/, 20).begin(1)
    24

--
Posted via http://www.ruby-forum.com/.
On 6/3/07, Harry Kakueki <list.p@gmail.com> wrote:
> On 6/3/07, Kenneth McDonald <kenneth.m.mcdon @sbcglobal.net> wrote: > > What I want is just something like regexp.match(string, n), where the > > regexp starts looking for a match at or after position n in the string. > > Thanks, > > Ken > You could match the string but ignore the first part of the match. > str = "abcdefghabcehjjjuabcfjkiabcgdfg" > str =~ /(abc.)/ > p $1 # abcd > str =~ /a.*ju(abc.)/ > p $1 #abcf > Harry
If you want to specify the point in the string by number, you could do this.

    str = "abcdefghabcehjjjuabcfjkiabcgdfg"
    str =~ /.{10}(abc.).*(abc.)/
    p $1   # abcf
    p $2   # abcg

Harry
Edwin Fine wrote: > Kenneth McDonald wrote: >> I'm probably just missing something obvious, but I haven't found a way >> to match a regular expression against only part of a string, in >> particular only past a certain point of a string, as a way of finding >> successive matches. Of course, one could do a match against a string, >> take the substring past that match and do a match against the substring, >> and so on, to find all of the matches for the string, but that could be >> very expensive for very large strings. >> I'm aware of the String.scan method, but that doesn't work for me >> because it doesn't return MatchData instances. >> What I want is just something like regexp.match(string, n), where the >> regexp starts looking for a match at or after position n in the string. >> Thanks, >> Ken > How about this? > def match(s, re, n) > /(?:.{#{n}})(#{re})/.match(s) > end > irb(main):043:0> p s > "abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh > abdefgh " > irb(main):044:0> p match(s, /abd/, 10).begin(1) > 16 > irb(main):045:0> p match(s, /abd/, 20).begin(1) > 24
That's clever. Obscure, but clever :-). I wonder if the regexp engine is clever enough to turn a match like .{n} into a constant time operation? Thanks, Ken
Hi, At Sun, 3 Jun 2007 13:56:05 +0900, Patrick Hurley wrote in [ruby-talk:254059]: > I think he wanted MatchData objects. The String#index method returns > the index (numeric position of the match). But if all you want are > captures, then index is a good solution.
String#index also sets $~. -- Nobu Nakada
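A minimal sketch of how the point above could be turned into the successive-match loop Ken asked about. The helper name `each_match_from` is hypothetical (it does not appear in the thread); it relies only on `String#index` accepting a start offset and on the resulting MatchData being readable through `Regexp.last_match` (the same object as `$~`).

```ruby
# Hypothetical helper, not from the thread: yields a MatchData for every
# match found at or after +start+, without slicing the string.
def each_match_from(string, regexp, start = 0)
  return enum_for(:each_match_from, string, regexp, start) unless block_given?

  pos = start
  while pos <= string.length && string.index(regexp, pos)
    md = Regexp.last_match   # the MatchData that String#index just stored in $~
    yield md
    # Continue after this match; step one character further on an empty
    # match so the loop always makes progress.
    pos = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
  end
end

matches = each_match_from("foo bar foo bar foo", /[oa][or]/, 4).to_a
p matches.map { |m| [m[0], m.begin(0)] }
# => [["ar", 5], ["oo", 9], ["ar", 13], ["oo", 17]]
```

Because the search always runs against the original string, the offsets reported by each MatchData are relative to the whole string rather than to a substring.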
On 6/3/07, Nobuyoshi Nakada <n@ruby-lang.org> wrote: > Hi, > At Sun, 3 Jun 2007 13:56:05 +0900, > Patrick Hurley wrote in [ruby-talk:254059]: > > I think he wanted MatchData objects. The String#index method returns > > the index (numeric position of the match). But if all you want are > > captures, then index is a good solution. > String#index also sets $~. > -- > Nobu Nakada
I should have known never to question Nobu Nakada :-), I always forget about those variables.

Thanks
pth
On 03.06.2007 07:30, Nobuyoshi Nakada wrote: > Hi, > At Sun, 3 Jun 2007 13:56:05 +0900, > Patrick Hurley wrote in [ruby-talk:254059]: >> I think he wanted MatchData objects. The String#index method returns >> the index (numeric position of the match). But if all you want are >> captures, then index is a good solution. > String#index also sets $~.
But then you can also use String#scan:

    irb(main):002:0> "ababb".scan(/(a)b+/) {p $~}
    #<MatchData:0x7ff94618>
    #<MatchData:0x7ff94578>
    => "ababb"
    irb(main):003:0> "ababb".scan(/(a)b+/) {p $~.to_a}
    ["ab", "a"]
    ["abb", "a"]
    => "ababb"

Ken, why do you need MatchData objects?

Kind regards

robert
Nobuyoshi Nakada wrote: > String#index also sets $~.
For that matter, so does String#scan.
On Sun, Jun 03, 2007 at 12:59:24PM +0900, Kenneth McDonald wrote: > I'm probably just missing something obvious, but I haven't found a way > to match a regular expression against only part of a string, in > particular only past a certain point of a string, as a way of finding > successive matches. Of course, one could do a match against a string, > take the substring past that match and do a match against the substring, > and so on, to find all of the matches for the string, but that could be > very expensive for very large strings. > I'm aware of the String.scan method, but that doesn't work for me > because it doesn't return MatchData instances. > What I want is just something like regexp.match(string, n), where the > regexp starts looking for a match at or after position n in the string.
    require 'strscan'

    scanner = StringScanner.new(string)
    scanner.pos = n
    # Note: scan only matches exactly at the current position;
    # scan_until searches forward from it.
    if scanner.scan(regexp)
      p scanner[1]
      p scanner.matched
      p scanner.pos
    end

It's in the stdlib. (Note, it doesn't actually give you a MatchData, or set $~, but off the top of my head I can't think of anything that a MatchData can do that the StringScanner can't.)
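For the "at or after position n" case, `StringScanner#scan_until` is probably the closer fit, since `scan` is anchored at the current position. A small sketch follows; the helper name `find_from` is hypothetical, and the offset arithmetic assumes a single-byte encoding, because `pos` and `matched_size` count bytes.

```ruby
require 'strscan'

# Hypothetical helper, not from the thread: returns [matched_text, start_offset]
# for the first match at or after position n, or nil if there is none.
def find_from(string, regexp, n)
  scanner = StringScanner.new(string)
  scanner.pos = n
  return nil unless scanner.scan_until(regexp)

  # After scan_until, pos sits just past the match, so the start of the
  # match is pos minus the size of the matched text (both in bytes).
  [scanner.matched, scanner.pos - scanner.matched_size]
end

s = "abdefgh abdefgh abdefgh"
p StringScanner.new(s).tap { |sc| sc.pos = 10 }.scan(/abd/)  # => nil (no match exactly at 10)
p find_from(s, /abd/, 10)                                    # => ["abd", 16]
```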
On 6/3/07, Devin Mullins <twif@comcast.net> wrote: > Nobuyoshi Nakada wrote: > > String#index also sets $~. > For that matter, so does String#scan.
Hence:

    irb(main):001:0> "abcdefabc".scan(/abc/) {puts "#{$~.inspect}, #{$~}"}
    #<MatchData:0xb7b0220c>, abc
    #<MatchData:0xb7b021e4>, abc
    => "abcdefabc"

--
Rick DeNatale
My blog on Ruby
http://talklikeaduck.denhaven2.com/
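If the goal is a list of MatchData objects rather than printed output, the same idiom can collect them. This is only a sketch; the method name `all_matches` is made up for illustration.

```ruby
# Hypothetical helper, not from the thread: collects one MatchData per match
# by reading $~ inside the block that String#scan yields to.
def all_matches(string, regexp)
  matches = []
  string.scan(regexp) { matches << $~ }
  matches
end

ms = all_matches("ababb", /(a)b+/)
p ms.map(&:to_a)                 # => [["ab", "a"], ["abb", "a"]]
p ms.map { |m| m.begin(0) }      # => [0, 2]
```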
Is $~ thread safe?

Too bad it has to be done this way (though my library will hide it). I first looked at Ruby several years ago, and at that time, didn't go further with it because it was too PERLish for me. (PERL was great for its time, but speaking as someone who actually had to maintain a lot of PERL code, it's actually a pretty grotty language). One of the things that brought me back to Ruby was the fact that an effort was being made to move Ruby away from its PERLisms. But I guess it'll take a while longer...

Thanks everyone,
Ken
Rick DeNatale wrote: > On 6/3/07, Devin Mullins <twif @comcast.net> wrote: >> Nobuyoshi Nakada wrote: >> > String#index also sets $~. >> For that matter, so does String#scan. > Hence: > irb(main):001:0> "abcdefabc".scan(/abc/) {puts "#{$~.inspect}, #{$~}"} > #<MatchData:0xb7b0220c>, abc > #<MatchData:0xb7b021e4>, abc > => "abcdefabc"
Kenneth McDonald wrote: > Is $~ thread safe?
Yes. All the regex match "global" variables are actually per-thread. See p.319 of Pick Axe 2nd ed. -- vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
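A small sketch that illustrates the per-thread behaviour described above. The method name `backref_demo` is hypothetical; the example only shows that a match performed inside another thread does not disturb the `$~` seen by the thread that started it.

```ruby
# Hypothetical demo, not from the thread.
def backref_demo
  "abc" =~ /b/                 # sets $~ for the calling thread
  in_thread = Thread.new do
    "xyz" =~ /y/               # sets $~ for the new thread only
    $~[0]
  end.value
  in_caller = $~[0]            # still the match made before the thread ran
  [in_thread, in_caller]
end

p backref_demo
```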
On 04.06.2007 00:44, Kenneth McDonald wrote: > Is $~ thread safe?
Yes.

> To bad it has to be done this way (though my library will hide it). I
> first looked at Ruby several years ago, and at that time, didn't go
> further with it because it was too PERLish for me. (PERL was great for
> its time, but speaking as someone who actually had to maintain a lot of
> PERL code, it's actually a pretty grotty language). One of the things
> that brought me back to Ruby was the fact that an effort was being made
> to move Ruby away from its PERLisms. But I guess it'll take a while
> longer...
> Thanks everyone,
Ken, I still don't understand why exactly you need MatchData objects. What are you trying to achieve? Kind regards robert
On 6/3/07, Kenneth McDonald <kenneth.m.mcdon@sbcglobal.net> wrote: > I'm probably just missing something obvious, but I haven't found a way > to match a regular expression against only part of a string, in > particular only past a certain point of a string, as a way of finding > successive matches. Of course, one could do a match against a string, > take the substring past that match and do a match against the substring, > and so on, to find all of the matches for the string, but that could be > very expensive for very large strings. > I'm aware of the String.scan method, but that doesn't work for me > because it doesn't return MatchData instances. > What I want is just something like regexp.match(string, n),
Hmm, apart from using #scan and #index with $~ as indicated, I do not think that there is a performance penalty if you do

    rg.match(string[n..-1])

Cheers
Robert
--
You see things; and you say Why? But I dream things that never were; and I say Why not?
-- George Bernard Shaw
On 6/4/07, Robert Dober <robert.do@gmail.com> wrote: > rg.match(string[n..-1])
My bad, how stupid. Am I thinking in C?

Robert
On Jun 4, 6:19 am, "Robert Dober" <robert.do@gmail.com> wrote:
> On 6/3/07, Kenneth McDonald <kenneth.m.mcdon @sbcglobal.net> wrote:> I'm probably just missing something obvious, but I haven't found a way > > to match a regular expression against only part of a string, in > > particular only past a certain point of a string, as a way of finding > > successive matches. Of course, one could do a match against a string, > > take the substring past that match and do a match against the substring, > > and so on, to find all of the matches for the string, but that could be > > very expensive for very large strings. > > I'm aware of the String.scan method, but that doesn't work for me > > because it doesn't return MatchData instances. > > What I want is just something like regexp.match(string, n), > Hmm apart of using #scan and #index with $~ as indicated, I do not > think that there is a performance penalty if you do > rg.match(string[n..-1])
How can that be? You have to create a whole new String. If that can be avoided in the internal implementation then adding an optional offset index to #match is not an unreasonable idea. T.
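For readers on a later Ruby: releases from 1.9 onward (which postdate this thread) accept an optional start position as a second argument to `Regexp#match`, which is essentially the `regexp.match(string, n)` form requested at the top of the thread. A hedged sketch; the helper name `matches_from` is made up for illustration.

```ruby
# Hypothetical helper, not from the thread: collects successive matches
# using the optional start position of Regexp#match (Ruby 1.9 and later).
def matches_from(regexp, string, pos = 0)
  found = []
  while pos <= string.length && (md = regexp.match(string, pos))
    found << md
    # Move past this match; step one extra character on an empty match.
    pos = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
  end
  found
end

s = "abdefgh abdefgh abdefgh"
p (/abd/.match(s, 10)).begin(0)                    # => 16
p matches_from(/abd/, s).map { |m| m.begin(0) }    # => [0, 8, 16]
```

The offsets in each MatchData refer to the original string, so no substring is created along the way.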
On 6/4/07, Trans <transf@gmail.com> wrote:
> On Jun 4, 6:19 am, "Robert Dober" <robert.do@gmail.com> wrote: > > On 6/3/07, Kenneth McDonald <kenneth.m.mcdon@sbcglobal.net> wrote:> I'm probably just missing something obvious, but I haven't found a way > > > to match a regular expression against only part of a string, in > > > particular only past a certain point of a string, as a way of finding > > > successive matches. Of course, one could do a match against a string, > > > take the substring past that match and do a match against the substring, > > > and so on, to find all of the matches for the string, but that could be > > > very expensive for very large strings. > > > I'm aware of the String.scan method, but that doesn't work for me > > > because it doesn't return MatchData instances. > > > What I want is just something like regexp.match(string, n), > > Hmm apart of using #scan and #index with $~ as indicated, I do not > > think that there is a performance penalty if you do > > rg.match(string[n..-1]) > How can that be? You have to create a whole new String.
Beating a dead man, Tom? As mentioned, I had a terrible slip into C in my reasoning, no idea why :(

> If that can be avoided in the internal implementation then adding an optional offset
> index to #match is not an unreasonable idea.
> T.
Hi -- On Mon, 4 Jun 2007, Kenneth McDonald wrote: > Is $~ thread safe? > To bad it has to be done this way (though my library will hide it). I first > looked at Ruby several years ago, and at that time, didn't go further with it > because it was too PERLish for me. (PERL was great for its time, but speaking > as someone who actually had to maintain a lot of PERL code, it's actually a > pretty grotty language). One of the things that brought me back to Ruby was > the fact that an effort was being made to move Ruby away from its PERLisms. > But I guess it'll take a while longer...
The best thing is really just to use Ruby without thinking about Perl. They're very different languages, and get mentioned in the same breath far too often. David -- Q. What is THE Ruby book for Rails developers? A. RUBY FOR RAILS by David A. Black (http://www.manning.com/black) (See what readers are saying! http://www.rubypal.com/r4rrevs.pdf) Q. Where can I get Ruby/Rails on-site training, consulting, coaching? A. Ruby Power and Light, LLC (http://www.rubypal.com)
On 04.06.2007 13:28, Robert Dober wrote:
> On 6/4/07, Trans <transf @gmail.com> wrote: >> On Jun 4, 6:19 am, "Robert Dober" <robert.do@gmail.com> wrote: >> > On 6/3/07, Kenneth McDonald <kenneth.m.mcdon@sbcglobal.net> >> wrote:> I'm probably just missing something obvious, but I haven't >> found a way >> > > to match a regular expression against only part of a string, in >> > > particular only past a certain point of a string, as a way of finding >> > > successive matches. Of course, one could do a match against a string, >> > > take the substring past that match and do a match against the >> substring, >> > > and so on, to find all of the matches for the string, but that >> could be >> > > very expensive for very large strings. >> > > I'm aware of the String.scan method, but that doesn't work for me >> > > because it doesn't return MatchData instances. >> > > What I want is just something like regexp.match(string, n), >> > Hmm apart of using #scan and #index with $~ as indicated, I do not >> > think that there is a performance penalty if you do >> > rg.match(string[n..-1]) >> How can that be? You have to create a whole new String. > Beating a dead man Tom? As mentioned I had a terrible slip to C in my > reasoning, no idea why :( >> If that can be avoided in the internal implementation then adding an >> optional offset >> index to #match is not an unreasonable idea.
Robert, actually string[n..-1] is cheaper than you might assume: I believe the new string shares the char buffer with the old string, so you basically just get a new String object with a different offset - the large bit (the char data) is not copied. Kind regards robert
On 6/4/07, Robert Klemme <shortcut@googlemail.com> wrote: > On 04.06.2007 13:28, Robert Dober wrote: > Robert, actually string[n..-1] is cheaper than you might assume: I > believe the new string shares the char buffer with the old string, so > you basically just get a new String object with a different offset - the > large bit (the char data) is not copied.
I am afraid that this is no longer true when the slice is passed as a formal parameter; the data has to be copied :(

    irb(main):011:0> def change(x)
    irb(main):012:1> x << "changed"
    irb(main):013:1> end
    => nil
    irb(main):014:0> a="abcdef"
    => "abcdef"
    irb(main):015:0> change(a[1..2])
    => "bcchanged"
    irb(main):016:0> a
    => "abcdef"

Cheers
Robert
On 04.06.2007 14:06, Robert Dober wrote:
> On 6/4/07, Robert Klemme <shortcut @googlemail.com> wrote: >> On 04.06.2007 13:28, Robert Dober wrote: >> Robert, actually string[n..-1] is cheaper than you might assume: I >> believe the new string shares the char buffer with the old string, so >> you basically just get a new String object with a different offset - the >> large bit (the char data) is not copied. > I am afraid that this is not true anymore when the slice is passed as > a formal parameter, the data has to be copied :( > irb(main):011:0> def change(x) > irb(main):012:1> x << "changed" > irb(main):013:1> end > => nil > irb(main):014:0> a="abcdef" > => "abcdef" > irb(main):015:0> change(a[1..2]) > => "bcchanged" > irb(main):016:0> a > => "abcdef"
Copying in this case is not caused by using the string as a parameter but by appending to it. I thought this thread was about /scanning/, which is a read-only operation. Did I miss something?

Kind regards

robert