Perl Programming Language

reverse a glob expansion


Hello,

I need a piece of code that can "unexpand" a glob pattern.  For
example, given the following list:

Foo-1-Bar
Foo-2-Bar
Foo-3-Bar

I would like to get back:

Foo-{1,2,3}-Bar

Any help would be greatly appreciated.

Thanks,
-Topher

topher67 <t0ph3r1@netscape.net> wrote:
> Hello,

> I need a piece of code that can "unexpand" a glob pattern.  For
> example, given the following list:

> Foo-1-Bar
> Foo-2-Bar
> Foo-3-Bar

> I would like to get back:

> Foo-{1,2,3}-Bar

> Any help would be greatly appreciated.

This could be either very easy or very hard.

Are curlies the only specials allowed, and are the things in the curly
always to be exactly one character long, and is there only going to be
exactly one set of curlies per pattern?

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB

On May 25, 10:19 am, xhos@gmail.com wrote:

Let's assume the following:
* curlies are the only specials allowed
* the substrings inside the curlies can be of differing lengths
* there may be more than one expanded set in the input list
* we won't handle nested curlies  (e.g. Foo{A{1,2,3}Z,XY}Bar )

Here's another example:

FooZZZBar
FooYBar
FooXXBar
Baz11
Baz222
Nop

Becomes:

Foo{ZZZ,Y,XX}Bar
Baz{11,222}
Nop

I realize that this is a hard problem to solve.  Any help is greatly
appreciated.
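
For the two examples above, a minimal sketch of one possible heuristic: bucket the names by their first character, then factor each bucket into its longest common prefix and suffix. The bucketing rule and the names common_prefix, factor, and unglob are assumptions made for illustration, not anything proposed in this thread, and the sketch assumes the input names are distinct.

```perl
use strict;
use warnings;

# Longest common prefix of a list of strings.
sub common_prefix {
    my @s = @_;
    my $p = shift @s;
    for my $s (@s) {
        # Shorten $p until it is a prefix of $s (the empty string always is).
        chop $p while length($p) && index($s, $p) != 0;
    }
    return $p;
}

# Factor one bucket into prefix{a,b,...}suffix.
sub factor {
    my @words = @_;
    return $words[0] if @words == 1;
    my $pre  = common_prefix(@words);
    # Strip the prefix first so the suffix can never overlap it.
    my @rest = map { substr($_, length $pre) } @words;
    my $suf  = scalar reverse common_prefix(map { scalar reverse $_ } @rest);
    my @mid  = map { substr($_, 0, length($_) - length($suf)) } @rest;
    return $pre . '{' . join(',', @mid) . '}' . $suf;
}

# Hypothetical heuristic: bucket by first character, keeping input order.
sub unglob {
    my (%bucket, @order);
    for my $w (@_) {
        my $key = substr($w, 0, 1);
        push @order, $key unless $bucket{$key};
        push @{ $bucket{$key} }, $w;
    }
    return map { factor(@{ $bucket{$_} }) } @order;
}

print "$_\n" for unglob(qw(FooZZZBar FooYBar FooXXBar Baz11 Baz222 Nop));
```

Bucketing by first character is crude: two unrelated names that happen to share an initial letter land in the same set of curlies, so real data would probably need a better grouping rule.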

> I realize that this is a hard problem to solve.  Any help is greatly
> appreciated.

I think I might be able to make use of this module:

Regexp::List - builds regular expressions out of a list of words

Ah, that makes it harder than I had hoped...

> * there may be more than one expanded set in the input list

Do you mean like "abc{d,e,f}ghi{j,k,l}mn" where you have a cartesian join,
or do you mean like in your example below, where there is more than one
"lines" of pattern but any given one of them has at most one set of
curlies?
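
To make the Cartesian case concrete, here is a small sketch that expands the pattern quoted above with bsd_glob from File::Glob. Two sets of three alternatives give nine names, and reversing that means recognizing that the nine names form a full 3 x 3 grid.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);

# GLOB_NOCHECK returns each brace expansion even when no such file exists.
my @names = bsd_glob('abc{d,e,f}ghi{j,k,l}mn', GLOB_BRACE | GLOB_NOCHECK);

printf "%d names\n", scalar @names;
print "$_\n" for @names;
```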

> * we won't handle nested curlies  (e.g. Foo{A{1,2,3}Z,XY}Bar )

Nesting actually probably wouldn't be so bad to implement, at least
compared to Cartesian joins.  In fact, the example you give below is just a
special kind of nesting, equivalent to {Foo{ZZZ,Y,XX}Bar,Baz{11,222},Nop}.
A special kind because you can only have two levels, and the outer level
cannot have any fixed characters before or after it--but still it is
nested.

There are many possible solutions, and it is not obvious how to assign a
score to each so that we can choose a single best one.  Also, once a
scoring system is designed, the best-scoring solution may be
computationally expensive to find.  So some kind of heuristic is probably
needed.  In the example you give, the best matching at the front (Foo)
corresponds to the best matching at the rear (Bar).  Is that likely to be
a common occurrence in your data, or was it just a coincidence?

Does Regexp::List come up with a regex which matches all of the given words
*and nothing else*?  The docs didn't seem to address that issue.
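
Whatever produces the pattern, one way to test the "and nothing else" property for a brace-only glob is to expand it again and compare the result with the original list. A minimal sketch, assuming the pattern contains no *, ? or [ (so the file system does not affect the result); expands_exactly is a name made up for this example.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);

# True if $pattern brace-expands to exactly the words in @words (as a set).
sub expands_exactly {
    my ($pattern, @words) = @_;
    my %want = map { $_ => 1 } @words;
    my %got  = map { $_ => 1 } bsd_glob($pattern, GLOB_BRACE | GLOB_NOCHECK);
    return 0 if grep { !$got{$_} }  keys %want;   # a wanted word is missed
    return 0 if grep { !$want{$_} } keys %got;    # an extra name is produced
    return 1;
}

my @words = qw(Foo-1-Bar Foo-2-Bar Foo-3-Bar);
print expands_exactly('Foo-{1,2,3}-Bar', @words) ? "exact\n" : "not exact\n";
```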

Anyway, if your goal is to condense, say, a large directory listing down to
a handful of patterns that a human could easily discern, I'm not sure that
something optimized for a regex engine would do a good job.  (Although
looking at the techniques used by it could certainly be informative.)

If this is for human consumption, I would have a preference for patterns
in which the curlies occur at natural boundaries, such as transitions
from letter to number or number to letter or punctuation to
non-punctuation, etc.
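
A rough sketch of that preference, assuming "natural boundary" means a change between runs of letters, digits, and everything else: split each name into such runs and look for the common prefix and suffix in whole runs instead of single characters. The function names are hypothetical, and the sketch handles only a single set of curlies.

```perl
use strict;
use warnings;

# Split a name into runs of letters, digits, or other characters.
sub tokens { return $_[0] =~ /[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+/g }

# Count the leading tokens shared by every token list.
sub shared_prefix_len {
    my @lists = @_;
    return 0 unless @lists;
    my $n = 0;
    TOKEN: while (1) {
        for my $l (@lists) {
            last TOKEN if $n >= @$l || $l->[$n] ne $lists[0][$n];
        }
        $n++;
    }
    return $n;
}

sub factor_at_boundaries {
    return $_[0] if @_ == 1;
    my @lists = map { [ tokens($_) ] } @_;
    my $pre   = shared_prefix_len(@lists);
    my @head  = @{ $lists[0] }[0 .. $pre - 1];
    # Drop the shared head so the shared tail cannot overlap it.
    my @rest  = map { [ @{$_}[$pre .. $#{$_}] ] } @lists;
    my $suf   = shared_prefix_len(map { [ reverse @$_ ] } @rest);
    my @tail  = $suf ? @{ $rest[0] }[-$suf .. -1] : ();
    my @mid   = map { join '', @{$_}[0 .. $#{$_} - $suf] } @rest;
    return join('', @head) . '{' . join(',', @mid) . '}' . join('', @tail);
}

print factor_at_boundaries(qw(file10.txt file11.txt file12.txt)), "\n";
```

A character-by-character comparison of the same three names would split inside the number and give file1{0,1,2}.txt; working on whole runs keeps the number together as file{10,11,12}.txt.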

As someone who frequently looks at very long directory listings of
computer-generated file names, this is something I've often thought about,
but never actually attempted.

Xho


While it may not always return a human friendly result, it does seem
to work:

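use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);
use Regexp::List;

# refactor a glob
sub reglob {
    my ($pat) = @_;

    # glob2list: expand the braces; GLOB_NOCHECK hands back the
    # pattern itself when nothing matches, so fall back only if empty
    my @list = bsd_glob($pat, GLOB_NOCHECK | GLOB_BRACE);
    @list = ($pat) unless @list;

    # list2re: let Regexp::List factor out the common parts
    my $rl = Regexp::List->new(lookahead => 0, quotemeta => 0);
    my $re = $rl->list2re(@list);

    # re2glob: strip the qr// wrapper, "(?-xism:...)" on older perls
    # and "(?^:...)" on 5.14 and later
    $re =~ s/\(\?(?:\^[a-z]*|-xism):(.*)\)/$1/g;
    # non-capturing groups become plain parens
    $re =~ s/\(\?:/(/g;
    # strip a leading "(" and a trailing ")" (assumes they form one pair)
    $re =~ s/^\(// and $re =~ s/\)$//;
    # regex alternation becomes a brace list
    $re =~ tr/()|/{},/;

    return $re;
}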

Sample in: aaa{11,22},aaa33

Sample out: aaa{11,22,33}
