[23014] in Perl-Users-Digest


Perl-Users Digest, Issue: 5234 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jul 17 18:05:39 2003

Date: Thu, 17 Jul 2003 15:05:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 17 Jul 2003     Volume: 10 Number: 5234

Today's topics:
        Behind the scene <mpapec@yahoo.com>
    Re: Behind the scene <uri@stemsystems.com>
    Re: better way of building string from hash <uri@stemsystems.com>
        HTML REGEX <bruce@ghbraille.com>
    Re: HTML REGEX <asu1@c-o-r-n-e-l-l.edu>
    Re: macros in perl <tzz@lifelogs.com>
    Re: macros in perl <tassilo.parseval@rwth-aachen.de>
    Re: macros in perl <uri@stemsystems.com>
    Re: Need Perl teacher/school: Network programming nobull@mail.com
    Re: Need Perl teacher/school: Network programming <flavell@mail.cern.ch>
        perlstyles <mpapec@yahoo.com>
    Re: perlstyles <uri@stemsystems.com>
    Re: perlstyles <tassilo.parseval@rwth-aachen.de>
    Re: perlstyles <uri@stemsystems.com>
        Regular Expression help (Joseph)
    Re: Regular Expression help <mpapec@yahoo.com>
    Re: Regular Expression help (Greg Bacon)
        tab delimited file processing problem (Domenico Discepola)
    Re: tab delimited file processing problem (Greg Bacon)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 17 Jul 2003 22:27:07 +0200
From: Matija Papec <mpapec@yahoo.com>
Subject: Behind the scene
Message-Id: <qjtdhv4o0be297sg1m8qeog39p6qdq77kf@4ax.com>


http://www.perl.com/lpt/a/2003/07/16/soto2003.html
>I'm not looking for sympathy, but I want you to know that I almost certainly could have landed a full-time job 20 months ago if I'd been willing to forget about Perl 6.

huh?



-- 
Matija


------------------------------

Date: Thu, 17 Jul 2003 21:01:41 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Behind the scene
Message-Id: <x71xwoj2a3.fsf@mail.sysarch.com>

>>>>> "MP" == Matija Papec <mpapec@yahoo.com> writes:

  MP> http://www.perl.com/lpt/a/2003/07/16/soto2003.html

  >> I'm not looking for sympathy, but I want you to know that I almost
  >> certainly could have landed a full-time job 20 months ago if I'd
  >> been willing to forget about Perl 6.

  MP> huh?

larry has been working on the perl6 apocalypses and design for much of
this period. he has been funded partly by the perl foundation but has
not had major income from them. and he has 2 kids in college, a
mortgage, etc. so he has sacrificed a lot. his comment above means he
could have had a decent paying job but he turned it down instead to work
on perl6.

so the way you can help is to donate money to the perl foundation and
help larry pay for his mortgage. :)

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


------------------------------

Date: Thu, 17 Jul 2003 18:15:19 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: better way of building string from hash
Message-Id: <x765m1hvex.fsf@mail.sysarch.com>

>>>>> "SS" == Sundial Services <info_ns5@sundialservices.com> writes:

  >> my $arrMess  = join ',', map "'$_'", values %required;
  >> my $arrField = join ',', map "'$_'", keys   %required;

  SS> Just remember though ... "it has to be clear."  Abundantly clear.
  SS> And it has to be resistant to side-effects, caused by an unrelated
  SS> change made to the software at some future time.  (A change, that
  SS> is, that you intended to be "unrelated," but because of the
  SS> original design, "suh-prize!")

and you think join and map will change and are unclear?
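
For readers following along, the quoted idiom can be sketched roughly as
below. The contents of %required are made up here, since the thread does
not show them. The sketch fixes the key order once and builds both strings
from it, so the fields and messages stay aligned and the output order is
predictable.

```perl
use strict;
use warnings;

# Hypothetical data; the original %required is not shown in the thread.
my %required = (
    name  => 'Name is required',
    email => 'Email is required',
);

# Sort the keys once, then build both lists from the same key order.
my @fields = sort keys %required;

my $arrField = join ',', map "'$_'", @fields;
my $arrMess  = join ',', map "'$_'", @required{@fields};   # hash slice

print "$arrField\n";
print "$arrMess\n";
```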

  SS> Perl teeters on being a "write-only language."  Keep your code
  SS> simple and direct, and extremely well-documented AS you write it.

all langs can be write only. it is the coder's issue and not the
language. this is patent FUD and you should stop spreading it.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


------------------------------

Date: Thu, 17 Jul 2003 14:20:23 -0500
From: "Boudga" <bruce@ghbraille.com>
Subject: HTML REGEX
Message-Id: <vhdtd7jrp2mi50@corp.supernews.com>

I want to remove the Style tags from my HTML files so I wrote this Perl
script to remove the line breaks and attempt to remove the Style tags but it
fails....any help would be much appreciated!


#!/usr/bin/perl

$ARGV[0] =~ /(.*)\.([^\.]*)/;
$outfile = "$1_cleaned.$2";

open (INFILE, "<$ARGV[0]");
open (OUTFILE, ">$outfile");

$outline="";

while($outline=<INFILE>){


 #$outline =~ s/\n/ /gi;
 #$outline =~ s/<style[^>]*>//gi;


 print OUTFILE "$outline";
}

print OUTFILE "$outline";
close INFILE;
close OUTFILE;

eof

Boudga





------------------------------

Date: 17 Jul 2003 19:35:39 GMT
From: "A. Sinan Unur" <asu1@c-o-r-n-e-l-l.edu>
Subject: Re: HTML REGEX
Message-Id: <Xns93BB9EA351392asu1cornelledu@132.236.56.8>

"Boudga" <bruce@ghbraille.com> wrote in
news:vhdtd7jrp2mi50@corp.supernews.com: 

> I want to remove the Style tags from my HTML files so I wrote this
> Perl script to remove the line breaks and attempt to remove the Style
> tags but it fails....any help would be much appreciated!

How does it fail?

In any case, HTML::Parser might be of some help. The example below is 
based on an example to remove comments in the documentation for that 
module:

#! C:/Perl/bin/perl.exe -w

use diagnostics;
use strict;
use warnings;

use HTML::Parser;

my $parser = HTML::Parser->new(
    default_h => [ sub { print shift }, 'text' ],
);
$parser->ignore_elements('style');

$parser->parse_file(shift || die) || die $!;

__END__
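
For comparison, here is a regex-only sketch that needs no extra modules.
It assumes the whole file fits in memory; the posted script reads one line
at a time, so no single pattern can see a style block that spans several
lines, and the commented-out substitution would only remove the opening
tag anyway. The name strip_style and the sample HTML are made up for
illustration, and a regex like this is easy to fool (comments, scripts,
odd quoting), which is why a parser is usually the safer choice.

```perl
use strict;
use warnings;

# Remove <style>...</style> elements from an HTML string.
# /s lets . match newlines, /i ignores case, /g removes every block.
sub strip_style {
    my ($html) = @_;
    $html =~ s{<style\b[^>]*>.*?</style\s*>}{}gis;
    return $html;
}

my $html = <<'HTML';
<html><head>
<style type="text/css">
  p { color: red; }
</STYLE>
<title>t</title></head>
<body><p>hi</p></body></html>
HTML

print strip_style($html);
```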

-- 
A. Sinan Unur
asu1@c-o-r-n-e-l-l.edu
Remove dashes for address
Spam bait: mailto:uce@ftc.gov


------------------------------

Date: Thu, 17 Jul 2003 15:02:37 -0400
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: macros in perl
Message-Id: <4n4r1lgenm.fsf@lockgroove.bwh.harvard.edu>

On Wed, 16 Jul 2003, marc0@autistici.org wrote:
> IMHO the point in using long and specific names instead of the
> general purpose ones is that you can guess what the
> variable/piece-of-code contains/does more easily, the code becomes
> almost self-documenting, and the code looks better too (but this
> last is a matter of taste).

Unfortunately this is not always true.  If the code maintainer 5 years
from now has to look up all your long and specific constructs, it's
going to make his job harder, not easier no matter how
self-documenting it looks to you.  This is because we imagine that the
code we know intimately is just as clear to everyone else - a very
common pitfall.

Another problem with the sort of macros you imagine is that they don't
really help you, and in fact may prevent you from learning the better
idioms that people have pointed out already.  Whereas C can definitely
benefit from macros, Perl is (in my experience) not a repetitive
language because of its rich syntax; if you find yourself repeating
things so you need a macro it's time to write a subroutine.

You may imagine a gain in performance from the inline expansion of
macros vs. the call penalty of subroutines, but remember that
"premature optimization is the root of all evil."  Better to write
standard subroutines and later convert them to inline expansion
through Filter::Simple or whatever you want, if the optimization is
necessary.  Use the Benchmark module and the profiler to study your
subroutines' performance.
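
A minimal sketch of that kind of measurement, using the core Benchmark
module. The square() helper and the data are stand-ins invented for this
example; the numbers you get will depend on your machine and perl build,
so treat the table as a rough guide only.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical helper standing in for "the thing you wanted a macro for".
sub square { return $_[0] * $_[0] }

my @data = (1 .. 1000);

# Run each variant 2000 times; 'none' suppresses timethese's own report.
my $results = timethese(2000, {
    subroutine => sub { my $sum = 0; $sum += square($_) for @data; $sum },
    inlined    => sub { my $sum = 0; $sum += $_ * $_    for @data; $sum },
}, 'none');

# Print a table comparing the two variants.
cmpthese($results);
```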

Abigail pointed out a module like Switch, which is certainly useful.
Switch.pm, however, was written by a Perl expert after many
people requested that functionality over the years.  You should learn
the basics of Perl well, and make sure you really need the macros you
think you need, before you spend the time writing them.

Ted


------------------------------

Date: 17 Jul 2003 21:46:40 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: macros in perl
Message-Id: <bf75g0$baf$1@nets3.rz.RWTH-Aachen.DE>

Also sprach Ted Zlatanov:

> Another problem with the sort of macros you imagine is that they don't
> really help you, and in fact may prevent you from learning the better
> idioms that people have pointed out already.  Whereas C can definitely
> benefit from macros, Perl is (in my experience) not a repetitive
> language because of its rich syntax; if you find yourself repeating
> things so you need a macro it's time to write a subroutine.

Functions do not serve the same purpose as macros. If they did, you
wouldn't need macros in C. I often came across a situation in which I
would have liked to have macros in Perl, too. The main difference is
that macros only have a compile-time effect. You don't usually want to
find out the machine's byte-order in order to do some byte-swapping at
run-time. That's a job for a macro because all the needed information is
already present at compile-time (actually, even by the time your
processor was laid out by the engineers on their scratch-pad).

Perl6 will have macros, won't it? I think that should give some
indication that they are even useful for such non-repetitive languages
as Perl is.
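
Perl 5 has no macros, but the byte-order case can be approximated with a
compile-time constant. The sketch below is one way to do it, not the only
one: IS_LITTLE_ENDIAN and to_big_endian are names invented for this
example. Because the constant is computed while the program is compiled,
perl can optimize away an "if" block whose condition is a false constant.

```perl
use strict;
use warnings;

# Evaluated once, while the program is being compiled:
# the first byte of a native short holding 1 is 1 only on little-endian hosts.
use constant IS_LITTLE_ENDIAN => unpack('C', pack('S', 1)) == 1;

# Return the 32-bit integer whose native in-memory layout is the
# big-endian ("network") byte sequence of $n.
sub to_big_endian {
    my ($n) = @_;
    if (IS_LITTLE_ENDIAN) {
        # Swap the four bytes: write little-endian, read big-endian.
        return unpack('N', pack('V', $n));
    }
    return $n;    # already big-endian, nothing to do
}

printf "little-endian host: %s\n", IS_LITTLE_ENDIAN ? 'yes' : 'no';
printf "0x%08x\n", to_big_endian(0x01020304);
```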

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval


------------------------------

Date: Thu, 17 Jul 2003 21:55:47 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: macros in perl
Message-Id: <x7u19khl7g.fsf@mail.sysarch.com>

>>>>> "TvP" == Tassilo v Parseval <tassilo.parseval@rwth-aachen.de> writes:

  TvP> Perl6 will have macros, won't it? I think that should give some
  TvP> indication that they are even useful for such non-repetitive
  TvP> languages as Perl is.

and they will be very cool too. the definition of a macro in perl6 is
that it is just a sub that executes as soon as it is parsed (in the
compile phase). this integrates the macro concept into perl6 in a very
neat way. if the macro returns a string, it replaces the original macro
call. if it returns a code block (all blocks are code refs in perl6), it
inserts that compiled opcode tree at this point in the main tree.

also macros can alter the way they are parsed when called! i won't go
into that as it is confusing to me so far. but i see its possibilities.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


------------------------------

Date: 17 Jul 2003 12:39:57 -0700
From: nobull@mail.com
Subject: Re: Need Perl teacher/school: Network programming
Message-Id: <4dafc536.0307171139.2d962720@posting.google.com>

"Alan J. Flavell" <flavell@mail.cern.ch> wrote in message news:<Pine.LNX.4.53.0307160041290.23806@lxplus084.cern.ch>...
> On Tue, Jul 15, Jacqui Caren inscribed on the eternal scroll:
> > Not sure if anyone has gotten any of the proxy modules
> > working with SSL. Be very interested if they have though
> > although it does seem rather silly to app-relay what should
> > be a single SSL encrypted connection and then store the contents
> > in plain text ;_)
> 
> I'm a bit confused as to what you have in mind here.
> 
> If that's possible at all, I mean other than deliberate co-operation
> between the server and the proxy, then it represents a complete
> security failure.

Actually this is a feature I've considered adding to Apache to help me
reverse engineer some forms on an https site for use with LWP.

It does not require the co-operation of the server.  It only requires
the co-operation of _either_ the client _or_ the server.

So long as the proxy holds a CA private key that's trusted by the
client you can get away with it.

> The client and server are supposed to negotiate an
> end to end encrypted path precisely in order to prevent any
> intermediate from overhearing what goes on.  If the proxy succeeds in
> masquerading as the target server, then that whole purpose is
> defeated, and the crypto folk would surely be working on overtime to
> solve the problem, no?

There is no technological solution to the human problem of choosing
who to trust.  If I can trick you into trusting my CA then I can
intercept your https traffic.


------------------------------

Date: Thu, 17 Jul 2003 22:50:06 +0200
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: Need Perl teacher/school: Network programming
Message-Id: <Pine.LNX.4.53.0307172245100.27244@lxplus076.cern.ch>

On Thu, Jul 17, nobull@mail.com inscribed on the eternal scroll:

> > If that's possible at all, I mean other than deliberate co-operation
> > between the server and the proxy, then it represents a complete
> > security failure.
[..]
> It does not require the co-operation of the server.  It only requires
> the co-operation of _either_ the client _or_ the server.

Good point, thanks.

> So long as the proxy holds a CA private key that's trusted by the
> client you can get away with it.
[..]
> There is no technological solution to the human problem of choosing
> who to trust.  If I can trick you into trusting my CA then I can
> intercept your https traffic.

Can't argue with that.

all the best


------------------------------

Date: Thu, 17 Jul 2003 22:27:08 +0200
From: Matija Papec <mpapec@yahoo.com>
Subject: perlstyles
Message-Id: <g1udhv065eijbi9kd4flhr3k2o1n19uoa7@4ax.com>


I was wondering what would be more readable for code maintainers, I prefer
first as it's far more obvious what's going on(not to mention typing
laziness), but that's just me.

#1
@arr = join ',', map s|'|\\'|g && "'$_'", grep /^MB/, @arr;
#2
@arr = join(',', map(s|'|\\'|g && "'$_'", grep(/^MB/, @arr)));

and another one, how do you prefer $h{key} over $h{'key'} in case where key
is strictly English \w class? 

Most importantly, what would you tell someone who doesn't have a clue
about use strict and warnings, and at the same time insisting on quoted hash
keys? :)



-- 
Matija


------------------------------

Date: Thu, 17 Jul 2003 21:05:57 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: perlstyles
Message-Id: <x7y8ywhnij.fsf@mail.sysarch.com>

>>>>> "MP" == Matija Papec <mpapec@yahoo.com> writes:

  MP> I was wondering what would be more readable for code maintainers,
  MP> I prefer first as it's far more obvious what's going on(not to
  MP> mention typing laziness), but that's just me.

  MP> #1
  MP> @arr = join ',', map s|'|\\'|g && "'$_'", grep /^MB/, @arr;
  MP> #2
  MP> @arr = join(',', map(s|'|\\'|g && "'$_'", grep(/^MB/, @arr)));

when using nested stuff like that, i usually prefer parens to help
out. but i am not strict about it. it may depend subtly on the actual
code and my mood.
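
Whichever style is used, the quoted one-liner has a pitfall that is easy
to miss. The sketch below, with made-up data, shows it: s///g returns a
false value when nothing was replaced, so elements without an apostrophe
become empty strings, and the substitution also edits the source array in
place. (Separately, assigning the result of join back to @arr leaves a
one-element array.) The alternative shown is just one option and assumes
perl 5.14 or later for the /r flag.

```perl
use strict;
use warnings;

my @arr = ("MB1", "MB'2", "XX3");   # hypothetical input

# As posted: "MB1" has no apostrophe, so s///g is false and && yields "".
# A copy is used here because grep and map alias $_ to the real elements.
my @copy   = @arr;
my $posted = join ',', map s|'|\\'|g && "'$_'", grep /^MB/, @copy;

# One possible alternative: /r returns the modified copy and leaves
# the original element untouched.
my $fixed = join ',',
            map  { my $q = s/'/\\'/gr; "'$q'" }
            grep { /^MB/ } @arr;

print "posted: $posted\n";
print "fixed:  $fixed\n";
```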

  MP> and another one, how do you prefer $h{key} over $h{'key'} in case
  MP> where key is strictly English \w class?

i prefer quotes all the time for fixed string hash keys. combination of
old habit and being cautious. i have seen issues where a key was also a
function name and with quotes you know which one it is (even if it works
the way you want).

  MP> Most importantly, what would you tell someone who doesn't have
  MP> a clue about use strict and warnings, and at the same time
  MP> insisting on quoted hash keys? :)

that is another matter and i don't know the person. the size of the
cluebat you use will be critical. :)

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


------------------------------

Date: 17 Jul 2003 21:54:08 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: perlstyles
Message-Id: <bf75u0$bnm$1@nets3.rz.RWTH-Aachen.DE>

Also sprach Uri Guttman:

>>>>>> "MP" == Matija Papec <mpapec@yahoo.com> writes:

>  MP> and another one, how do you prefer $h{key} over $h{'key'} in case
>  MP> where key is strictly English \w class?
> 
> i prefer quotes all the time for fixed string hash keys. combination of
> old habit and being cautious. i have seen issues where a key was also a
> function name and with quotes you know which one it is (even if it works
> the way you want).

Is that a problem actually? I tend to have problems with the opposite
case: Where something is meant to be a keyword but perl treats it as a
string, as in

    $hash{ shift } = 1;

That leads to some contortions like

    $hash{ +shift } = 1;

etc.

I've lately come to use quotes, too. My reason is the syntax-highlighter
of vim that marks hash-keys in the same color as strings only when they
are enclosed in some pair of quotation marks. Otherwise they won't get
colored at all.

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval


------------------------------

Date: Thu, 17 Jul 2003 21:58:21 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: perlstyles
Message-Id: <x7r84ohl37.fsf@mail.sysarch.com>

>>>>> "TvP" == Tassilo v Parseval <tassilo.parseval@rwth-aachen.de> writes:

  TvP> Also sprach Uri Guttman:
  >>>>>>> "MP" == Matija Papec <mpapec@yahoo.com> writes:

  MP> and another one, how do you prefer $h{key} over $h{'key'} in case
  MP> where key is strictly English \w class?
  >> 
  >> i prefer quotes all the time for fixed string hash keys. combination of
  >> old habit and being cautious. i have seen issues where a key was also a
  >> function name and with quotes you know which one it is (even if it works
  >> the way you want).

  TvP> Is that a problem actually? I tend to have problems with the opposite
  TvP> case: Where something is meant to be a keyword but perl treats it as a
  TvP> string, as in

  TvP>     $hash{ shift } = 1;

  TvP> That leads to some contortions like

  TvP>     $hash{ +shift } = 1;

maybe it isn't a real problem but i don't like the possible visual
ambiguity. and the issue you raise is true. i would do shift() instead
though. and i can't ever recall needing that kind of code. i would more
likely be using a var for the shift and the var as the hash key. again,
the outer code matters as to what i would do.
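
A small sketch of the three spellings discussed above; the demo() wrapper
and its arguments are invented for illustration. A bare identifier inside
the braces is treated as a string, while the unary plus or the explicit
parentheses force a call to shift.

```perl
use strict;
use warnings;

sub demo {
    my %hash;
    $hash{ shift }   = 'bareword';     # key is the literal string "shift"
    $hash{ +shift }  = 'unary plus';   # calls shift, uses the first argument
    $hash{ shift() } = 'parens';       # calls shift, uses the next argument
    return \%hash;
}

my $h = demo('first', 'second');
print join(',', sort keys %$h), "\n";
```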

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


------------------------------

Date: 17 Jul 2003 12:57:35 -0700
From: joseph.vanquakebeke@ngc.com (Joseph)
Subject: Regular Expression help
Message-Id: <d216aa8d.0307171157.5e63c194@posting.google.com>

Hi all;

   I am having trouble with a regular expression not working the way I
expect it to.  My code:

 if (/^\s*<!--\s*\w*\s*-->|^\s*<!--\s*\w*\s*\w*\s*-->$/) {
             if (DEBUG) {
                              print "<!-- Some words and spaces -->
comment line pattern single line : \n";
                              print "\n$_";
             }
             # match a multi-line comment that spans 1 or lines.
             $count++;
             next;
         }

As you can see, I am matching the beginning of a line followed by 0 or
more whitespace followed by <!-- followed by 0 or more whitespace
followed by --> OR the same beginning with the addition of another 0 or
more spaces followed by 0 or more words with the same ending.  Now if I run
this snippet on an xml file I get the following output:
   <!-- finder permissions -->
blank line pattern:
<!-- Some words and spaces --> comment line pattern single line :

   <!-- transactions -->
blank line pattern:
<!-- Some words and spaces --> comment line pattern single line :

   <!-- finder transactions -->
blank line pattern:

but if I take out the OR and just have the first pattern I get:
blank line pattern:
blank line pattern:
blank line pattern:
blank line pattern:
blank line pattern:
<!-- Some words and spaces --> comment line pattern single line :

   <!-- transactions -->
blank line pattern:
blank line pattern:


Where are the lines with more than one word?  I really do not want to
have to write a separate expression for 1 .. N words.  I thought the
asterisk was supposed to be greedy. I have tried the + but it has the
same effect.

Can someone explain why this is not working?  I think I can fix this
if I try to use a character class but I am not sure. I am still
reading the camel book to find out.

TIA
Joseph


------------------------------

Date: Thu, 17 Jul 2003 22:27:06 +0200
From: Matija Papec <mpapec@yahoo.com>
Subject: Re: Regular Expression help
Message-Id: <av0ehv0tdd9le46u5glfj1lkpau21a4sta@4ax.com>

X-Ftn-To: Joseph 

joseph.vanquakebeke@ngc.com (Joseph) wrote:
>Hi all;
>
>   I am having trouble with a regular expression not working the way I
>expect it to.  My code:
>
> if (/^\s*<!--\s*\w*\s*-->|^\s*<!--\s*\w*\s*\w*\s*-->$/) {

imo, you'd be better off grouping things with (),

/^\s*<!--(?:\s*\w*\s*|\s*\w*\s*\w*\s*)-->$/

or something similar.
(?: .. ) -> doesn't capture its contents to $1,$2..

You could also chomp before the condition, as a newline character is
probably at the end of $_.



-- 
Matija


------------------------------

Date: Thu, 17 Jul 2003 22:02:06 -0000
From: gbacon@hiwaay.net (Greg Bacon)
Subject: Re: Regular Expression help
Message-Id: <vhe76ucu4ds8d8@corp.supernews.com>

In article <d216aa8d.0307171157.5e63c194@posting.google.com>,
    Joseph <joseph.vanquakebeke@ngc.com> wrote:

:    I am having trouble with a regular expression not working the way I
: expect it to.  My code:
: 
:  [reformatted]
:  if (/^\s*<!--\s*\w*\s*-->|^\s*<!--\s*\w*\s*\w*\s*-->$/) {
:      if (DEBUG) {
:          print "<!-- Some words and spaces --> ",
:                "comment line pattern single line : \n",
:                "\n$_";
:      }
:      # match a multi-line comment that spans 1 or lines.
:      $count++;
:      next;
:  }
: 
: [snip output]

In general with Usenet questions, it's almost always better to give a
complete -- but BRIEF! -- example.  The less effort people have to
expend in understanding your post, the more likely you are to get good
answers.

Where did the blank line pattern come from?

: Where are the lines with more than one word?  I really do not want to
: have to write a separate expression for 1 .. N words.  I thought the
: asterisk was supposed to be greedy. I have tried the + but it has the
: same effect.

You could write your pattern as

    if (/^\s*<!--\s*[\w\s]*\s*-->$/) {
        ...
    }

This will look for alternating runs of any length of either whitespace
or word characters within your comments.

IMPORTANT: this regular expression will not correctly recognize all XML
comments and could yield false positives.  If you want to parse XML, use
an XML parser. :-)

: Can someone explain why this is not working?  I think I can fix this
: if I try to use a character class but I am not sure. I am still
: reading the camel book to find out.

The Kleene star is greedy in perl's regular expression matcher, but
you're anchoring your match, with a possibility for at most two runs of
word characters.

Consider the following comment:

    <!-- a b c -->

The matcher will try your first alternative (^\s*<!--\s*\w*\s*-->), but
will only get as far as indicated:

    substring   subpattern
    =========   ==========  
    BEGIN           ^
    SPACES         \s*
    <!--          <!--
    SPACE          \s*
    a              \w*
    SPACE          \s*
    b              -->     FAIL!

Then it'll try the other alternative (^\s*<!--\s*\w*\s*\w*\s*-->$):

    substring   subpattern
    =========   ==========  
    BEGIN           ^
    SPACES         \s*
    <!--          <!--
    SPACE          \s*
    a              \w*
    SPACE          \s*
    b              \w*
    SPACE          \s*
    c              -->     FAIL!

With the pattern I gave (/^\s*<!--\s*[\w\s]*\s*-->$/), it'll match:


    substring   subpattern
    =========   ==========  
    BEGIN           ^
    SPACES         \s*
    <!--          <!--
    SPACE          \s*
    a b c        [\w\s]*
    SPACE          \s*
    -->            -->
    END             $
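
The traces above can be checked with a short script. The three sample
lines are made up to cover one, two, and three words; $original is the
pattern from the question and $class is the character-class version.

```perl
use strict;
use warnings;

my @lines = (
    "   <!-- transactions -->\n",
    "   <!-- finder permissions -->\n",
    "   <!-- a b c -->\n",
);

# Pattern from the question: at most two runs of word characters.
my $original = qr/^\s*<!--\s*\w*\s*-->|^\s*<!--\s*\w*\s*\w*\s*-->$/;
# Character-class version: any mix of word characters and whitespace.
my $class    = qr/^\s*<!--\s*[\w\s]*\s*-->$/;

my (@orig_hits, @class_hits);
for my $line (@lines) {
    push @orig_hits,  ($line =~ $original ? 1 : 0);
    push @class_hits, ($line =~ $class    ? 1 : 0);
}

print "original: @orig_hits\n";
print "class:    @class_hits\n";
```

The original pattern matches the first two lines but not the third, while
the character-class pattern matches all three.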

Hope this helps,
Greg
-- 
The whole aim of practical politics is to keep the populace alarmed -- and
thus clamorous to be led to safety -- by menacing it with an endless
series of hobgoblins, all of them imaginary. 
    -- H.L. Mencken


------------------------------

Date: 17 Jul 2003 14:02:20 -0700
From: joeminga@yahoo.com (Domenico Discepola)
Subject: tab delimited file processing problem
Message-Id: <698c67f.0307171302.5b601332@posting.google.com>

Hi all.  I have constructed a script that uses Win32::OLE to save an Excel
workbook as a tab-delimited text file (TSV file).  This works fine.  My next
step is to perform formatting on each field per line in the TSV file while
retaining the # of fields.  The problem lies with "empty" cells in the 1st
column of the Excel file.

Example Excel file row:
col A's value=<empty>
col B's value = "1"
col C's value = "2"
<end of row>

When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
/^\t12$/  (using regex notation).  In other words, I lose the existence of
col A (which I need).

I was thinking of the following solution:
s/^\t/\s\t/, $my_line;
but could there be a 'better' way to handle it?

Any suggestions on how to best solve this problem would be appreciated.

Thanks in advance.


------------------------------

Date: Thu, 17 Jul 2003 21:29:54 -0000
From: gbacon@hiwaay.net (Greg Bacon)
Subject: Re: tab delimited file processing problem
Message-Id: <vhe5ainqmno028@corp.supernews.com>

In article <698c67f.0307171302.5b601332@posting.google.com>,
    Domenico Discepola <joeminga@yahoo.com> wrote:

: [...]
: Example Excel file row:
: col A's value=<empty>
: col B's value = "1"
: col C's value = "2"
: <end of row>
: 
: When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
: the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
: /^\t12$/  (using regex notation).  In other words, I lose the existence of
: col A (which I need).

How have you lost column A?  Consider the example below:

    C:\Temp>type try
    #! perl

    use warnings;
    use strict;

    use Data::Dumper;

    my $data   = "\t12";
    my @fields = split /\t/, $data;

    print Dumper \@fields;

    C:\Temp>perl try
    $VAR1 = [
              '',
              '12'
            ];

Are you sure there wasn't a TAB between the 1 and the 2?  Even so,
you're still happy; note that the first element of @fields is empty:

    #! perl

    use warnings;
    use strict;

    use Data::Dumper;

    my $data   = "\t1\t2";
    my @fields = split /\t/, $data;

    print Dumper \@fields;

    C:\Temp>perl try
    $VAR1 = [
              '',
              '1',
              '2'
            ];

How were you trying to extract the fields in your TSV file?
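
One related thing to watch when the number of fields must be retained:
by default split drops trailing empty fields, although it keeps a leading
one. A negative limit keeps them all. The sample row below is made up
(columns A, D and E empty) purely to show the difference.

```perl
use strict;
use warnings;

# Hypothetical row: A empty, B = 1, C = 2, D and E empty.
my $line = "\t1\t2\t\t\n";
chomp $line;

my @default = split /\t/, $line;        # trailing empty fields are dropped
my @all     = split /\t/, $line, -1;    # negative limit keeps every field

print scalar(@default), " fields by default, ",
      scalar(@all), " fields with a limit of -1\n";
```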

Technical side note: what you're calling tab-delimited is really
tab-separated.  Using [TAB] to make things stand out, a tab-delimited
record would look like

    [TAB]field_1[TAB]field_2[TAB]...[TAB]field_n[TAB]

Tab-*separated*, however, would look like

    field_1[TAB]field_2[TAB]...[TAB]field_n

Hope this helps,
Greg
-- 
It remains true today as it did in fascist Italy, socialist Germany, New
Deal America, and socialist Russia: freedom has no greater opponents than
those who despise and demonize commercial society.
    -- Lew Rockwell


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5234
***************************************

