Perl Programming Language
Sorting
Hello,

I've had a search through CPAN and have not been able to find an answer yet, but I would like to know if there is something like File::Sort which will allow me to specify that there is one or more header records at the start of the input which should be untouched by the sort.

Does anyone know of such a module (or an easy way to do this using File::Sort)?

Thx,
k
On 7 Jun, 09:44, k@bytebrothers.co.uk wrote:
> File::Sort which will allow me to specify that there is one or more
> header records at the start of the input which should be untouched by
> the sort.

OK, no responses, so I had time to find more research material, which led me to this solution. Any advice on ways to tighten this up a tad without losing too much readability?

The data look like this (delimiters line up vertically):
==================================
Licence | Created| Crtd By | Products | Qty | To Loc | Last   | DZone
   01799|05/06/07|     OOS1|  NIV0327R|  960|  YH3621|        | BACK
       1|07/06/07|    SPODE|  STT0014V|  156|   SFF15|        | S
   10106|06/06/07|    DALEC|  VAN1383T|    0|   JLE12|        | GDSIN1
    1015|29/05/07|  OOSOFFC|  CIF0012T|  192|  XP4417|        | BACK
    1022|31/05/07|    WOODC|  DET0065Y|  141|  XE4313|        | BACK
   10222|04/06/07|  COLEROB|  FLU0473P| 1640|   UAB12|  SMITHN| None
   10319|07/06/07| HALLPHIL|  SCH3318Q|  240|   MDL22|        | GDSIN1
   10350|07/06/07|   QUINNJ|  DOS0030K| 4072|   CRH52|        | GDSIN1
==================================

So, to preserve the header and sort by the 'Products' column:
==================================
#!/usr/local/bin/perl -w
@lines = ();
@key = ();
while (<>)
{
    $row++;
    if ($row == 1)
    {
        print;
        next;
    }
    chomp;
    push @lines,$_;
    push @key, (split(/\|/))[3];
}

@indices = sort {$key[$a] cmp $key[$b]} 0..$#lines;
foreach $index (@indices)
{
    print "$lines[$index]\n";
}
==================================
k@bytebrothers.co.uk wrote:
> I would like to know if there is something like File::Sort which will
> allow me to specify that there is one or more header records at the
> start of the input which should be untouched by the sort.

my ( @headers, @records );
while ( <DATA> ) {
    push @headers, $_;
    push @records, <DATA> if /^===/;
}

print @headers, sort @records;

__DATA__
First header
Another header
============================
Record B
Record C
Record A

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
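
When the input has no separator line and the headers are known only by count, the same idea can be sketched roughly as follows. The helper name sort_keeping_headers and the sample records are made up for illustration; this is a plain string sort, not a File::Sort feature.

```perl
use strict;
use warnings;

# Hypothetical helper: leave the first $n_headers lines in place and
# sort the remaining lines as plain strings.
sub sort_keeping_headers {
    my ($n_headers, @lines) = @_;
    $n_headers = @lines if $n_headers > @lines;    # guard against short input
    my @headers = splice @lines, 0, $n_headers;    # removes headers from @lines
    return (@headers, sort @lines);
}

# Made-up sample input: two header lines, then three records.
my @input = (
    "First header\n",
    "Another header\n",
    "Record B\n",
    "Record C\n",
    "Record A\n",
);

print sort_keeping_headers(2, @input);
```

Passing the header count as an argument keeps the sort itself unchanged, so a custom comparison could be dropped in later without touching the header handling.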
On Jun 7, 11:27 am, k@bytebrothers.co.uk wrote:
> On 7 Jun, 09:44, k@bytebrothers.co.uk wrote:
> > File::Sort which will allow me to specify that there is one or more
> > header records at the start of the input which should be untouched by
> > the sort.
> OK, no responses, so I had time to find more research material, which
> led me to this solution. Any advice on ways to tighten this up a tad
> without losing too much readability?
> The data look like this (delimiters line up vertically):
> ==================================
> Licence | Created| Crtd By | Products | Qty | To Loc | Last   | DZone
>    01799|05/06/07|     OOS1|  NIV0327R|  960|  YH3621|        | BACK
>        1|07/06/07|    SPODE|  STT0014V|  156|   SFF15|        | S
>    10106|06/06/07|    DALEC|  VAN1383T|    0|   JLE12|        | GDSIN1
>     1015|29/05/07|  OOSOFFC|  CIF0012T|  192|  XP4417|        | BACK
>     1022|31/05/07|    WOODC|  DET0065Y|  141|  XE4313|        | BACK
>    10222|04/06/07|  COLEROB|  FLU0473P| 1640|   UAB12|  SMITHN| None
>    10319|07/06/07| HALLPHIL|  SCH3318Q|  240|   MDL22|        | GDSIN1
>    10350|07/06/07|   QUINNJ|  DOS0030K| 4072|   CRH52|        | GDSIN1
> ==================================
> So, to preserve the header and sort by the 'Products' column:
> ==================================
> #!/usr/local/bin/perl -w

use strict;

> @lines = ();
> @key = ();

No need to initialize an array to the empty list. That's what it is already.

> while (<>)
> {
>     $row++;

This variable already exists for you. Its name is '$.'. No need to keep track of the line count separately.

>     if ($row == 1)
>     {
>         print;
>         next;
>     }
>     chomp;
>     push @lines,$_;
>     push @key, (split(/\|/))[3];
> }
> @indices = sort {$key[$a] cmp $key[$b]} 0..$#lines;
> foreach $index (@indices)
> {
>     print "$lines[$index]\n";}

rather than messing with a bunch of indices, I would prefer a Schwartzian transform. The syntax has a bit of a learning curve, but once you "get it", it becomes intuitive.

So my rewrite of your script comes down to:

#!/opt2/perl/bin/perl
use strict;
use warnings;

my @lines;
while (<DATA>) {
    print and next if $. == 1;
    push @lines, $_;
}

print map { $_->[0] }
      sort { $a->[1] cmp $b->[1] }
      map { [ $_, (split /\|/)[3] ] }
      @lines;

__DATA__
Licence | Created| Crtd By | Products | Qty | To Loc | Last   | DZone
   01799|05/06/07|     OOS1|  NIV0327R|  960|  YH3621|        | BACK
       1|07/06/07|    SPODE|  STT0014V|  156|   SFF15|        | S
   10106|06/06/07|    DALEC|  VAN1383T|    0|   JLE12|        | GDSIN1
    1015|29/05/07|  OOSOFFC|  CIF0012T|  192|  XP4417|        | BACK
    1022|31/05/07|    WOODC|  DET0065Y|  141|  XE4313|        | BACK
   10222|04/06/07|  COLEROB|  FLU0473P| 1640|   UAB12|  SMITHN| None
   10319|07/06/07| HALLPHIL|  SCH3318Q|  240|   MDL22|        | GDSIN1
   10350|07/06/07|   QUINNJ|  DOS0030K| 4072|   CRH52|        | GDSIN1

Paul Lalli
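
The two points made in this reply, using '$.' for the line number and the map/sort/map idiom for the sort, can be combined into one reusable routine. What follows is only a sketch: the name sort_body_by_column, the column-index parameter, and the four-line sample input are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: read from $fh, keep line 1 as the header, and
# return the header followed by the remaining lines sorted (as strings)
# on the pipe-delimited field at 0-based index $col.
sub sort_body_by_column {
    my ($fh, $col) = @_;
    my (@header, @body);
    while (my $line = <$fh>) {
        # $. holds the current line number of the last-read filehandle
        if ($. == 1) { push @header, $line } else { push @body, $line }
    }
    my @sorted = map  { $_->[0] }                        # 3) unwrap the line
                 sort { $a->[1] cmp $b->[1] }            # 2) compare keys
                 map  { [ $_, (split /\|/)[$col] ] }     # 1) pair line with key
                 @body;
    return (@header, @sorted);
}

# Made-up sample input, read through an in-memory filehandle.
my $text = "ID|Product\n2|STT0014V\n1|NIV0327R\n3|CIF0012T\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print sort_body_by_column($fh, 1);
close $fh;
```

The key for each line is extracted once, in the first map, rather than on every comparison inside sort; that is the point of the transform.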
On 7 Jun, 16:46, Paul Lalli <mri@gmail.com> wrote:
> On Jun 7, 11:27 am, k@bytebrothers.co.uk wrote:
> > Any advice on ways to tighten this up a tad without losing too much readability?
> rather than messing with a bunch of indices, I would prefer a
> Schwartzian transform. The syntax has a bit of a learning curve, but
> once you "get it", it becomes intuitive.
> print map { $_->[0] }
>       sort { $a->[1] cmp $b->[1] }
>       map { [ $_, (split /\|/)[3] ] }
>       @lines;

Oh, that's sweet! All I need to do now is sit down and work out exactly how the feck that works!
On 8 Jun, 09:28, k@bytebrothers.co.uk wrote:
> On 7 Jun, 16:46, Paul Lalli <mri@gmail.com> wrote:
> > On Jun 7, 11:27 am, k@bytebrothers.co.uk wrote:
> > > Any advice on ways to tighten this up a tad without losing too much readability?
> > rather than messing with a bunch of indices, I would prefer a
> > Schwartzian transform. The syntax has a bit of a learning curve, but
> > once you "get it", it becomes intuitive.
> > print map { $_->[0] }
> >       sort { $a->[1] cmp $b->[1] }
> >       map { [ $_, (split /\|/)[3] ] }
> >       @lines;
> Oh, that's sweet! All I need to do now is sit down and work out
> exactly how the feck that works!

I've been working through this, and I think I'm getting there, slowly; there's something going on here with anonymous list references, for a start.

But how would I use this paradigm if there was a more complicated key? For example, in my original example, if I needed to sort by the second column, which contains a date, I would have done something like:

@fields = split(/\|/);
($dy,$mn,$yr) = split(/\//,$fields[1]);
push @key, "$yr$mn$dy";
etc...

How would this transform approach allow me to do something similar?
On Jun 8, 5:59 am, k@bytebrothers.co.uk wrote:
> > On 7 Jun, 16:46, Paul Lalli <mri@gmail.com> wrote:
> > > print map { $_->[0] }
> > >       sort { $a->[1] cmp $b->[1] }
> > >       map { [ $_, (split /\|/)[3] ] }
> > >       @lines;
> But how would I use this paradigm if there was a more
> complicated key? For example, in my original example, if I
> needed to sort by the second column, which contains a date, I
> would have done something like:
> @fields = split(/\|/);
> ($dy,$mn,$yr) = split(/\//,$fields[1]);
> push @key, "$yr$mn$dy";
> etc...
> How would this transform approach allow me to do something similar?

Well, obviously, it's going to be a little messier, but the concept is the same:

print map { $_->[0] }
      sort { $a->[1] cmp $b->[1] }
      map { [ $_,
              do {
                  my ($d,$m,$y) = split '/', (split /\|/)[1];
                  "$y$m$d";
              }
            ] }
      @lines;

When trying to decipher a Schwartzian transform, read it backwards.

1) We start with the array of @lines.
2) The bottom map transforms the array of lines into a list of array references. The first element of each referenced array is the line itself, and the second is the value we want to sort by eventually. In this case, that's the "year-month-day" value.
3) The sort now takes this list of array references, and sorts it by the second element of each referenced array. That is, it sorts the array references on our sort key.
4) The top map takes this sorted list of array references and transforms it to a new list containing the first element of each referenced array - that is, the original line.
5) print is passed this list of lines.

It might be helpful if you break it out into its individual steps. In this case, I'll use a generic get_key() to represent obtaining the sort key from your line. That's the only part of a Schwartzian transform that ever changes. The syntax is always the same for the rest of it.

my @lines_keys = map { [ $_, get_key($_) ] } @lines;
my @sorted_lines_keys = sort { $a->[1] cmp $b->[1] } @lines_keys;
my @sorted_lines = map { $_->[0] } @sorted_lines_keys;
print @sorted_lines;

Hope that helps,
Paul Lalli
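
To watch the broken-out steps run, here is a small self-contained sketch with one possible get_key() filled in for the date column. The three sample lines are trimmed from the data earlier in the thread, and this particular get_key() body is an illustration, not the only way to write it. Note that a two-digit "yymmdd" string only sorts chronologically while all dates fall in the same century.

```perl
use strict;
use warnings;

# Illustrative get_key(): turn the dd/mm/yy date in field 1 into "yymmdd",
# which sorts chronologically as a string (within a single century).
sub get_key {
    my ($line) = @_;
    my ($d, $m, $y) = split m{/}, (split /\|/, $line)[1];
    return "$y$m$d";
}

# Three lines trimmed from the sample data, deliberately out of date order.
my @lines = (
    "01799|05/06/07|OOS1\n",
    "1015|29/05/07|OOSOFFC\n",
    "10222|04/06/07|COLEROB\n",
);

my @lines_keys        = map  { [ $_, get_key($_) ] } @lines;          # step 2
my @sorted_lines_keys = sort { $a->[1] cmp $b->[1] } @lines_keys;     # step 3
my @sorted_lines      = map  { $_->[0] } @sorted_lines_keys;          # step 4

# Show each key next to its line so the ordering is visible.
print "$_->[1]  $_->[0]" for @sorted_lines_keys;
```

Printing the intermediate @sorted_lines_keys rather than @sorted_lines makes it easy to check that the key, not the raw dd/mm/yy text, is what drives the order.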
On 8 Jun, 11:32, Paul Lalli <mri@gmail.com> wrote:
> When trying to decipher a Schwartzian transform, read it backwards.
> 1) We start with the array of @lines.
> 2) The bottom map transform the array of lines into a list of array
> references. The first element of the array reference is the line
> itself, and the second is the value we want to sort by eventually. In
> this case, that's the "year-month-day" value.
> 3) The sort now takes this list of array references, and sorts it by
> the second element of each referenced array. That is, it sorts the
> array references on our sort key.
> 4) The top map takes this sorted list of array references and
> transforms it to a new list containing the first element of each
> referenced array - that is, the original line.
> 5) print is passed this list of lines.

I think I just had a religious experience. That is new and wonderful, and thank you for explaining it for me!
On Jun 8, 6:46 am, k@bytebrothers.co.uk wrote:
> On 8 Jun, 11:32, Paul Lalli <mri@gmail.com> wrote:
> > [description of Schwartzian Transform]
> I think I just had a religious experience. That is new and
> wonderful, and thank you for explaining it for me!

You're welcome. Glad to help.

I would be remiss, however, if I didn't point out that Uri has created a module which generalizes the creation of a Schwartzian Transform sort algorithm (amongst other things). It is available on the CPAN, named Sort::Maker. Using that module, the process becomes:

use Sort::Maker;
my $sorter = make_sorter('ST', string => \&get_key);
print $sorter->(@lines);

#get_key simply extracts the key from your data
#so in the second example, it would be:
sub get_key {
    my $date = (split /\|/, $_)[1];
    my ($d, $m, $y) = split '/', $date;
    "$y$m$d";
}

#in the original, it would be as simple as:
sub get_key {
    (split /\|/)[3];
}

Paul Lalli
Paul Lalli <mri@gmail.com> writes:
> When trying to decipher a Schwartzian transform, read it backwards.

That was the most difficult part to wrap my brain around. It's the reason that, upon encountering an ST, I *still* have to stop and think about it for a moment to parse it.

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net
>>>>> "k" == keith <k@bytebrothers.co.uk> writes:

k> I think I just had a religious experience. That is new and wonderful,
k> and thank you for explaining it for me!

if you want a module to do all that (and more) for you, check out Sort::Maker.

uri

--
Uri Guttman ------ u@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org