Ruby Programming Language
efficient regex scanning
Hello there, I want to extract all the words from a file, so I wrote the following code:

    file = ARGV[0]
    File.open('output', 'w') { |f|
      IO.read(file).scan(/\w+/).each { |w| f.print w }
    }
The problem with this code is that it stores all the words in an array, which is not very efficient. Is there a better way to do it? Something like IO.read(file).each_scan { foo }

Thanks,
Christos
Trochalakis Christos wrote:
> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }
Scan takes a block form:

    ri String.scan

    IO.read(file).scan(/\w+/) {|w| f.print w}

Cheers

-- 
Ola Bini (http://ola-bini.blogspot.com)
JRuby Core Developer
Developer, ThoughtWorks Studios (http://studios.thoughtworks.com)
"Yields falsehood when quined" yields falsehood when quined.
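A minimal complete version of Ola's suggestion, assuming the input path comes from ARGV[0] and the output file is named output as in the original post. The block form of String#scan yields each match as it is found instead of collecting an array of matches. Note that IO.read still loads the whole file into one String, so only the array of words is avoided. The method name write_words is hypothetical, and this sketch writes one word per line (the original printed the words with no separator).

```ruby
# Sketch: copy every \w+ word from one file to another using the
# block form of String#scan, which yields matches one at a time
# instead of building an array of all matches.
# write_words is a hypothetical helper name.
def write_words(input_path, output_path)
  File.open(output_path, "w") do |f|
    # IO.read still reads the entire input into memory as one String;
    # only the intermediate array of words is avoided.
    IO.read(input_path).scan(/\w+/) { |w| f.puts w }
  end
end

# Command-line use, as in the original post: ruby words.rb input.txt
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  write_words(ARGV[0], "output")
end
```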
Trochalakis Christos wrote:
> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }
Does just using a block with scan do what you need?

    IO.read(file).scan(/\w+/) { |word| f.print word }

http://www.ruby-doc.org/core/classes/String.html#M000827

best,
Dan

-- 
Posted via http://www.ruby-forum.com/.
Hi --

On Wed, 6 Jun 2007, Trochalakis Christos wrote:
> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }
You could do something like this (untested, and reversing your logic somewhat):

    File.open(file).each {|line| f.print(line.scan(/\w+/)) }

(You might want to join them with a space or something so they don't all run together.)

David

-- 
Q. What is THE Ruby book for Rails developers?
A. RUBY FOR RAILS by David A. Black (http://www.manning.com/black)
   (See what readers are saying! http://www.rubypal.com/r4rrevs.pdf)
Q. Where can I get Ruby/Rails on-site training, consulting, coaching?
A. Ruby Power and Light, LLC (http://www.rubypal.com)
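A fuller sketch of David's line-by-line idea: both files are opened with blocks so they are closed automatically, and the words of each line are joined with a space, as he suggests. Joining explicitly also matters on Ruby 1.9 and later, where print given an Array writes the array's inspect form (e.g. ["a", "b"]) rather than concatenating the elements. Only the current line and its words are held in memory. The method name copy_words_by_line is hypothetical, and skipping lines with no words is an assumption of this sketch.

```ruby
# Sketch: read the input one line at a time so that only the current
# line and its words are in memory, writing the words of each line
# joined by single spaces. copy_words_by_line is a hypothetical name.
def copy_words_by_line(input_path, output_path)
  File.open(output_path, "w") do |f|
    # The block form of File.open closes each file when the block exits.
    File.open(input_path) do |input|
      input.each_line do |line|
        words = line.scan(/\w+/)
        # Skip lines that contain no words at all.
        f.puts words.join(" ") unless words.empty?
      end
    end
  end
end
```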
On Jun 6, 2:00 pm, Ola Bini <ola.b@gmail.com> wrote:
> Scan takes a block form:
> ri String.scan
> IO.read(file).scan(/\w+/) {|w| f.print w}
Thanks a lot! I suppose I should have checked first :)
On 06.06.2007 13:08, dbl@wobblini.net wrote:
> You could do something like this (untested, and reversing your logic
> somewhat):
> File.open(file).each {|line| f.print(line.scan(/\w+/)) }
> (You might want to join them with a space or something so they don't
> all run together.)
You're not closing the IO. I know it's not an issue for a small script, but... I'd do this:

    ARGF.each {|line| puts line.scan(/\w+/)}

:-)

Kind regards

    robert
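Robert's one-liner works because ARGF reads the files named in ARGV one after another as a single stream (or standard input when ARGV is empty) and supports each_line like any IO, and because puts given an Array writes each element on its own line. The sketch below keeps that behaviour but takes the input and output streams as parameters, so the same code can be exercised with a StringIO. The method name print_words is hypothetical, and skipping lines with no words is an assumption (puts with an empty Array writes a blank line).

```ruby
require "stringio"

# Sketch: print every \w+ word from an IO-like source, one per line.
# Any object that responds to each_line works: ARGF, a File, a StringIO.
# print_words is a hypothetical helper name.
def print_words(input, output = $stdout)
  input.each_line do |line|
    words = line.scan(/\w+/)
    # puts with an Array argument writes each element on its own line;
    # with an empty Array it would write a blank line, so skip those.
    output.puts words unless words.empty?
  end
end

# As a script: ruby words.rb a.txt b.txt (ARGF reads a.txt, then b.txt)
print_words(ARGF) if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
```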
Hi --
On Wed, 6 Jun 2007, Robert Klemme wrote:
> On 06.06.2007 13:08, dbl@wobblini.net wrote:
>> File.open(file).each {|line| f.print(line.scan(/\w+/)) }
> You're not closing the IO. I know it's not an issue for a small script
> but...
It's not a complete script; I was only showing one line. At the very least it's not going to run unless you assign something to f :-)

David
Trochalakis Christos wrote:
> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }
Here's a thought. Note that it doesn't handle //m regexen. Like David's and Robert's solutions, it doesn't read the whole file at once. (I guess one could check for pat.options&Regexp::MULTILINE, and read the whole IO in that case.)

    class IO
      def scan pat
        if block_given?
          each {|line| line.scan(pat) {|s| yield s} }
        else
          read.scan(pat)
        end
      end
    end

    File.open(filename) do |f|
      f.scan(/\w+/) {|word| puts word}
    end

-- 
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
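Joel's parenthetical suggestion can be sketched as a standalone method rather than by reopening IO: it scans line by line unless the pattern carries the Regexp::MULTILINE option (in Ruby, //m lets "." match a newline), in which case it reads the whole stream. Without a block it returns an Enumerator instead of an array. The name scan_io is hypothetical. The MULTILINE check is only a heuristic: a pattern such as /\w+\s+\w+/ can match across a newline even without //m, and line-by-line scanning will miss such matches.

```ruby
require "stringio"

# Sketch of the MULTILINE check suggested in the message above, written
# as a standalone helper (hypothetical name) instead of patching IO#scan.
def scan_io(io, pat, &block)
  # Without a block, return an Enumerator over the matches.
  return enum_for(:scan_io, io, pat) unless block

  if (pat.options & Regexp::MULTILINE) != 0
    # With //m a match may span lines, so fall back to reading the
    # whole stream into a single String.
    io.read.scan(pat, &block)
  else
    # Otherwise scan one line at a time, so the whole input is never
    # held in memory at once.
    io.each_line { |line| line.scan(pat, &block) }
  end
  nil
end
```

For example, scan_io(File.open(filename), /\w+/) { |word| puts word } behaves like Joel's f.scan call, and scan_io(io, /\w+/).first(10) stops reading once ten words have been found.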