Perl Programming Language
reverse a glob expansion
Hello,

I need a piece of code that can "unexpand" a glob pattern. For example, given the following list:

Foo-1-Bar
Foo-2-Bar
Foo-3-Bar

I would like to get back:

Foo-{1,2,3}-Bar

Any help would be greatly appreciated.

Thanks,
-Topher
topher67 <t0ph3r1 @netscape.net> wrote:
> Hello,
> I need a piece of code that can "unexpand" a glob pattern. For
> example, given the following list:
> Foo-1-Bar
> Foo-2-Bar
> Foo-3-Bar
> I would like to get back:
> Foo-{1,2,3}-Bar
> Any help would be greatly appreciated.
This could be either very easy or very hard.

Are curlies the only specials allowed? Are the things inside the curlies always exactly one character long? And is there only ever going to be one set of curlies per pattern?

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
On May 25, 10:19 am, xhos@gmail.com wrote:
> This could be either very easy or very hard.
> Are curlies the only specials allowed, and are the things in the curly
> always to be exactly one character long, and is there only going to be
> exactly one set of curlies per pattern?
Let's assume the following:

* curlies are the only specials allowed
* the substrings inside the curlies can be of differing lengths
* there may be more than one expanded set in the input list
* we won't handle nested curlies (e.g. Foo{A{1,2,3}Z,XY}Bar )

Here's another example:

FooZZZBar
FooYBar
FooXXBar
Baz11
Baz222
Nop

Becomes:

Foo{ZZZ,Y,XX}Bar
Baz{11,222}
Nop

I realize that this is a hard problem to solve. Any help is greatly appreciated.
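For what it's worth, the part of this that is easy can be sketched as follows: once you already know which lines belong to one set, the longest common prefix and suffix give you the single pair of curlies. This is only a minimal sketch under that assumption. The names collapse_group and common_prefix are made up for illustration, and deciding which lines belong together (the hard part) is not attempted here.

```perl
use strict;
use warnings;

# Longest string that every argument starts with (hypothetical helper).
sub common_prefix {
    my @words  = @_;
    my $prefix = shift @words;
    for my $word (@words) {
        # Shorten the candidate until $word starts with it.
        chop $prefix while length($prefix) && index($word, $prefix) != 0;
    }
    return $prefix;
}

# Collapse one already-grouped set of strings into PREFIX{a,b,c}SUFFIX.
sub collapse_group {
    my @words = @_;
    return $words[0] if @words == 1;

    my $prefix = common_prefix(@words);

    # Take the suffix from what is left after the prefix,
    # so that prefix and suffix can never overlap.
    my @rest   = map { substr $_, length $prefix } @words;
    my $suffix = scalar reverse common_prefix(map { scalar reverse $_ } @rest);

    my @middle = map { substr $_, 0, length($_) - length($suffix) } @rest;
    return $prefix . '{' . join(',', @middle) . '}' . $suffix;
}

print collapse_group(qw(Foo-1-Bar Foo-2-Bar Foo-3-Bar)), "\n";
print collapse_group(qw(FooZZZBar FooYBar FooXXBar)), "\n";
print collapse_group(qw(Baz11 Baz222)), "\n";
```

Note that a group such as FooBar and FooXBar would come out as Foo{,X}Bar, with an empty alternative, which brace expansion accepts but which may not be what a human reader wants.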
> I realize that this is a hard problem to solve. Any help is greatly > appreciated.
I think I might be able to make use of this module: Regexp::List - builds regular expressions out of a list of words
topher67 <t0ph3r1 @netscape.net> wrote:
> Let's assume the following:
> * curlies are the only specials allowed
> * the substrings inside the curlies can be of differing lengths
Ah, that makes it harder than I had hoped...

> * there may be more than one expanded set in the input list
Do you mean like "abc{d,e,f}ghi{j,k,l}mn", where you have a Cartesian join, or do you mean like in your example below, where there is more than one line of patterns but any given line has at most one set of curlies?

> * we won't handle nested curlies (e.g. Foo{A{1,2,3}Z,XY}Bar )
Nesting actually probably wouldn't be so bad to implement, at least compared to Cartesian joins. In fact, the example you give below is just a special kind of nesting, equivalent to {Foo{ZZZ,Y,XX}Bar,Baz{11,222},Nop}. It is special because you can only have two levels, and the outer level cannot have any fixed characters before or after it, but it is still nested.
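One way to see that equivalence is to let File::Glob expand the nested form. This is just a small sketch: bsd_glob with GLOB_BRACE does the brace expansion, and GLOB_NOCHECK hands back names that don't exist on disk instead of dropping them.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);

# The three separate patterns written as one two-level nested pattern.
my $nested = '{Foo{ZZZ,Y,XX}Bar,Baz{11,222},Nop}';

# No files need to exist: GLOB_NOCHECK returns each expanded name as-is.
my @expanded = bsd_glob($nested, GLOB_BRACE | GLOB_NOCHECK);

print "$_\n" for @expanded;
```

This should print the same six names as the flat list in the example (FooZZZBar, FooYBar, FooXXBar, Baz11, Baz222, Nop).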
> Here's another example:
> FooZZZBar
> FooYBar
> FooXXBar
> Baz11
> Baz222
> Nop
> Becomes:
> Foo{ZZZ,Y,XX}Bar
> Baz{11,222}
> Nop
> I realize that this is a hard problem to solve. Any help is greatly
> appreciated.
There are many possible solutions, and it is not obvious how to assign a score to each so that we can choose a single best one. Also, once a scoring system is designed, the best-scoring solution may be computationally expensive to find. So some kind of heuristic is probably needed.

In the example you give, the best matching at the front (Foo) corresponds to the best matching at the rear (Bar). Is that likely to be a common occurrence in your data, or was it just a coincidence?

Does Regexp::List come up with a regex which matches all of the given words *and nothing else*? The docs didn't seem to address that issue. Anyway, if your goal is to condense, say, a large directory listing down to a handful of patterns that a human could easily discern, I'm not sure that something optimized for a regex engine would do a good job. (Although looking at the techniques it uses could certainly be informative.)

If this is for human consumption, I would have a preference for patterns in which the curlies occur at natural boundaries, such as transitions from letter to number, number to letter, or punctuation to non-punctuation. As someone who frequently looks at very long directory listings of computer-generated file names, this is something I've often thought about but never actually attempted.

Xho
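The natural-boundary idea could be sketched roughly like this. It is only one possible heuristic, and a narrow one: names are split into runs of letters, digits and other characters, and names that have the same shape and differ in exactly one numeric field are folded together. The function names tokens and fold_numeric_fields are hypothetical. It would handle Foo-1-Bar and Baz11, but not the FooZZZBar case, where the varying part is made of letters.

```perl
use strict;
use warnings;

# Split a name at natural boundaries: runs of letters, digits, or anything else.
sub tokens { return $_[0] =~ /([A-Za-z]+|[0-9]+|[^A-Za-z0-9]+)/g }

# Fold names that share a shape and differ in exactly one numeric field.
sub fold_numeric_fields {
    my @names = @_;
    my (%members, @keys);
    for my $name (@names) {
        # Names with the same key differ only in their digit runs.
        (my $key = $name) =~ s/[0-9]+/\0/g;
        push @keys, $key unless $members{$key};
        push @{ $members{$key} }, [ tokens($name) ];
    }

    my @out;
    for my $key (@keys) {
        my @rows = @{ $members{$key} };

        # Token positions whose value is not the same in every row.
        my @varying = grep {
            my $i = $_;
            grep { $_->[$i] ne $rows[0][$i] } @rows;
        } 0 .. $#{ $rows[0] };

        if (@varying == 1) {
            my $i = $varying[0];
            my @t = @{ $rows[0] };
            $t[$i] = '{' . join(',', map { $_->[$i] } @rows) . '}';
            push @out, join '', @t;
        }
        else {
            # Zero or several varying fields: leave the names untouched.
            push @out, map { join '', @$_ } @rows;
        }
    }
    return @out;
}

print "$_\n" for fold_numeric_fields(qw(Foo-1-Bar Foo-2-Bar Foo-3-Bar Baz11 Baz222 Nop));
```

Groups with more than one varying field (the Cartesian-join case) are deliberately left alone rather than guessed at.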
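While it may not always return a human-friendly result, it does seem to work:

    use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);
    use Regexp::List;

    # refactor a glob
    sub reglob {
        my ($pat) = @_;

        # glob2list: expand the braces into a flat list of words
        my @list;
        my @glob = bsd_glob($pat, GLOB_NOCHECK | GLOB_BRACE);
        if (@glob) {
            for my $glob (@glob) {
                push @list, $glob;
            }
        }
        else {
            push @list, $pat;
        }

        # list2re: let Regexp::List factor out the common parts
        my $rl = Regexp::List->new(lookahead => 0, quotemeta => 0);
        my $re = $rl->list2re(@list);

        # re2glob: strip the qr// wrapper, then turn (a|b) into {a,b}.
        # Perl 5.14 and later stringify qr// as (?^:...) rather than
        # (?-xism:...), so both forms are accepted here.
        $re =~ s/\(\?(?:\^|-xism):(.*)\)/$1/g;
        $re =~ s/\(\?:/(/g;
        $re =~ s/^\(// and $re =~ s/\)$//;
        $re =~ tr/()|/{},/;
        $re;
    }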
Sample in: aaa{11,22},aaa33
Sample out: aaa{11,22,33}
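Since the earlier question of whether the result matches the given words and nothing else is still open, a round-trip check might be a reasonable safeguard: expand whatever patterns come out and compare them with the original list. A minimal sketch, with round_trips as a made-up name, and assuming the names contain no other glob characters (*, ?, [), since bsd_glob would try to match those against the filesystem:

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);

# True if expanding the patterns yields exactly the original names
# (compared as sorted lists, so order does not matter).
sub round_trips {
    my ($patterns, $names) = @_;
    my @expanded = map { bsd_glob($_, GLOB_BRACE | GLOB_NOCHECK) } @$patterns;
    return join("\n", sort @expanded) eq join("\n", sort @$names);
}

my @names    = qw(FooZZZBar FooYBar FooXXBar Baz11 Baz222 Nop);
my @patterns = ('Foo{ZZZ,Y,XX}Bar', 'Baz{11,222}', 'Nop');

print round_trips(\@patterns, \@names) ? "exact\n" : "mismatch\n";
```

A pattern that expands to extra names, or misses some, makes the check fail, so it could be run on the output of any of the approaches discussed in this thread.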