[24657] in Perl-Users-Digest


Perl-Users Digest, Issue: 6821 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Aug 3 14:12:00 2004

Date: Tue, 3 Aug 2004 11:11:21 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 3 Aug 2004     Volume: 10 Number: 6821

Today's topics:
        print join to a file <laura.hradowy@NOSPAM.mts.caaaaa>
    Re: print join to a file <nobull@mail.com>
    Re: print join to a file <Joe.Smith@inwap.com>
    Re: Priority (J. Romano)
    Re: Priority <bik.mido@tiscalinet.it>
        PROBLEM IDENTIFIED Re: Parsing form POST without CGI.pm <aaron@deloachcorp.com>
    Re: PROBLEM IDENTIFIED Re: Parsing form POST without CG <gnari@simnet.is>
    Re: PROBLEM IDENTIFIED Re: Parsing form POST without CG <ceo@nospam.on.net>
        Problem with file upload in forum <rijt@dse.nl>
    Re: Problem with file upload in forum <tadmc@augustmail.com>
    Re: Problem with file upload in forum <rijt@dse.nl>
    Re: Problem with file upload in forum <ebohlman@omsdev.com>
    Re: Problem with file upload in forum (Andrew Palmer)
    Re: Problem with file upload in forum <rijt@dse.nl>
    Re: Problem with file upload in forum <andrewpalmer@email.com>
        RegEx issue (Dan)
    Re: RegEx issue <mritty@gmail.com>
    Re: RegEx issue <nobull@mail.com>
    Re: RegEx issue <noreply@gunnar.cc>
    Re: RegEx issue <mritty@gmail.com>
    Re: RegEx issue <noreply@gunnar.cc>
    Re: RegEx issue (Charles DeRykus)
    Re: RegEx issue <noreply@gunnar.cc>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 27 Jul 2004 11:19:52 -0500
From: "LHradowy" <laura.hradowy@NOSPAM.mts.caaaaa>
Subject: print join to a file
Message-Id: <JgvNc.2255$tK5.11262@news1.mts.net>

I have been able to prompt user for data, then use join to add it to the
data. But, I need to go further.

First off I need to open the file, right now I just type script.pl <file>

Then I need to take the output of the data, and put it into a file. I have
tried opening then print to it, then close the file but it does not work.
Actually I need to do 3 things, print the join, then grep for CUST and send
that to a file, grep for TELN send that to a file, and grep -v and send
everything that does not have CUST or TELN to another file.

#!/opt/perl/bin/perl
system ("clear");   #Clear the screen
$acode="204";

$[ = 1;                 # set array base to 1
$, = ',';               # set output field separator
$\ = "\n";              # set output record separator
Enter BLD: $room=<STDIN> ; chomp $room;
print "Enter ROOM:"; $room=<STDIN> ; chomp $room;

$[ = 1;                 # set array base to 1
$, = ',';               # set output field separator
$\ = "\n";              # set output record separator

while(<ARGV>) {
chomp;
@a=split(",",$_);
print FH  join(",",$acode.$a[0],$bld,$room,$a[2],$a[3],"\n") ;
}




------------------------------

Date: 27 Jul 2004 18:05:53 +0100
From: Brian McCauley <nobull@mail.com>
Subject: Re: print join to a file
Message-Id: <u97jspcra9.fsf@wcl-l.bham.ac.uk>

"LHradowy" <laura.hradowy@NOSPAM.mts.caaaaa> writes:

> Newsgroups: comp.lang.perl.misc,comp.lang.perl.tk

Can you explain why you think this has the slightest thing to do with TK?

> I have been able to prompt user for data, then use join to add it to the
> data. But, I need to go further.
> 
> First off I need to open the file, right now I just type script.pl <file>

You open files with open().

> Then I need to take the output of the data, and put it into a file. I have
> tried opening then print to it, then close the file but it does not
> work.

Please show us a minimal but complete example of code in which you
have actually tried to do this and explain in what way it does not
work.  (The code you posted would not compile and does not attempt to open
any output files.)  This and much other useful advice can be found
in the posting guidelines for this group.

> Actually I need to do 3 things, print the join, then grep for CUST and send
> that to a file, grep for TELN send that to a file, and grep -v and send
> everything that does not have CUST or TELN to another file.
> 
> #!/opt/perl/bin/perl
> system ("clear");   #Clear the screen
> $acode="204";
> 
> $[ = 1;                 # set array base to 1

Why are you doing this?  Messing with $[ is very strongly discouraged.
Your code then later tries to access element 0 of an array which will
not, of course, exist if your array subscripts start at 1.

> $, = ',';               # set output field separator

Why are you doing this?  There are no prints in your code that have
more than one argument.

> $\ = "\n";              # set output record separator

Why are you doing this?  You may have a reason but I suspect you are
just doing random things without knowing what they mean.

> Enter BLD: $room=<STDIN> ; chomp $room;
> print "Enter ROOM:"; $room=<STDIN> ; chomp $room;

That's not even syntactically valid Perl.

You should always declare all variables as lexically scoped in the
smallest applicable lexical scope unless you have a positive reason to
do otherwise. BTW: this is not peculiar to Perl, it applies in all
programming languages - allowing that a language not having lexical
variables is a positive reason :-).

For Perl this means that most of the time the declaration of scalars
should be combined with the first assignment. BTW: this too is not
peculiar to Perl, it also applies in other programming languages
where assignment and declaration can be combined.

By following this convention you will be able to get maximum benefit
out of putting "use strict" at the top of all your scripts.

Try to get into this habit now, do not wait for your failure to do so
to cause you the unnecessary distress of wasting your own time and that
of other people.  The longer you leave it the harder you will find it
to adjust.  Worse still, if you leave it too long you may never adjust
and may mutate into a bitter and twisted troll.

> $[ = 1;                 # set array base to 1
> $, = ',';               # set output field separator
> $\ = "\n";              # set output record separator

Why are you doing these again?  Perl is not going to have forgotten, you know.

> while(<ARGV>) {
> chomp;
> @a=split(",",$_);

The point of $_ is that it is the implicit "current thing that goes
without saying".  So don't say it! :-) 

> print FH  join(",",$acode.$a[0],$bld,$room,$a[2],$a[3],"\n") ;

You have not opened FH.

Did you really want a comma before the newline?
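
The points above can be put together in a minimal sketch. It assumes
comma-separated input and three destination files; the sub name
route_records, the keys cust/teln/other and the choice of fields 0, 2 and 3
(counted from zero) are illustrative assumptions, not something the
original post pins down.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: prefix the first field with $acode, insert $bld and
# $room, and route each joined line to one of three output files.
sub route_records {
    my ($in_path, $acode, $bld, $room, %out_path) = @_;

    open my $in, '<', $in_path or die "Cannot open $in_path: $!";

    # One lexical filehandle per destination: cust, teln, other.
    my %out;
    for my $key (qw(cust teln other)) {
        open $out{$key}, '>', $out_path{$key}
            or die "Cannot open $out_path{$key}: $!";
    }

    while (my $line = <$in>) {
        chomp $line;
        my @a = split /,/, $line;    # zero-based: $a[0] is the first field
        my $joined = join ',', $acode . $a[0], $bld, $room, @a[2, 3];

        # "grep" for CUST, then TELN, and send everything else elsewhere.
        my $key = $joined =~ /CUST/ ? 'cust'
                : $joined =~ /TELN/ ? 'teln'
                :                     'other';
        print { $out{$key} } "$joined\n";
    }

    close $in;
    for my $key (keys %out) {
        close $out{$key} or die "Cannot close $out_path{$key}: $!";
    }
    return;
}
```

It could be called as route_records('input.csv', '204', $bld, $room,
cust => 'cust.txt', teln => 'teln.txt', other => 'other.txt'), where the
file names are again only placeholders.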

-- 
     \\   ( )
  .  _\\__[oo
 .__/  \\ /\@
 .  l___\\
  # ll  l\\
 ###LL  LL\\


------------------------------

Date: Tue, 27 Jul 2004 17:44:53 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: print join to a file
Message-Id: <pwwNc.38166$8_6.37192@attbi_s04>

LHradowy wrote:

> $[ = 1;                 # set array base to 1

Ugh.  That is horrible.  Don't do that.
Telling perl that you will be using $a[1] to mean the first element
in the array (as opposed to $a[0]) is not good perl programming.

> print FH  join(",",$acode.$a[0],$bld,$room,$a[2],$a[3],"\n") ;

You've told perl that you are never going to use $a[0], yet here
you are using it.  With contradictions like that, the program won't work.
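
A short sketch of the alternative, assuming the default array base is left
alone; the sample string is made up for illustration.

```perl
use strict;
use warnings;

my @a = split /,/, 'first,second,third';

my $first = $a[0];       # with the default base, subscripts start at 0
my $last  = $a[-1];      # negative subscripts count from the end
my $count = scalar @a;   # number of elements

print "$first $last $count\n";
```

If one-based numbering is wanted for display, it is usually simpler to add 1
when printing than to change how every array in the program is indexed.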

	-Joe


------------------------------

Date: 23 Jul 2004 00:22:32 -0700
From: jl_post@hotmail.com (J. Romano)
Subject: Re: Priority
Message-Id: <b893f5d4.0407222322.11f951aa@posting.google.com>

Vito Corleone <corleone@godfather.com> wrote in message news:<20040723110810.7311a9d6.corleone@godfather.com>...
> 
> If I write:
> return $x || undef();
> Which will be executed first? Is it:
> (return $x) || undef(); 
> or
> return ($x || undef());


Dear Vito,

   Sometimes it's not immediately clear how Perl will parse an
expression.  In cases like these, I use the "-MO=Deparse,-p" switch,
like this:

      # In UNIX:
      perl -MO=Deparse,-p -e 'sub { return $x || undef(); }'

      # In DOS:
      perl -MO=Deparse,-p -e "sub { return $x || undef(); }"

(I put your expression inside a subroutine in order to make the
one-liner compile.)

   The output of this command shows the line:

      return(($x || undef()));

signifying that your second guess is correct.
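
   The same thing can be checked at run time. This is only a sketch; the
sub names and the 'fallback' value are made up for illustration.

```perl
use strict;
use warnings;

# || binds tighter than return, so the whole expression is evaluated
# first and its result is what gets returned.
sub with_double_pipe {
    my ($x) = @_;
    return $x || 'fallback';    # parsed as return($x || 'fallback')
}

# 'or' binds looser than return, so the right-hand side is never reached.
sub with_or {
    my ($x) = @_;
    no warnings;                # perl may otherwise warn about this line
    return $x or 'fallback';    # parsed as (return $x) or 'fallback'
}
```

   With a false argument such as 0, with_double_pipe(0) gives 'fallback'
while with_or(0) gives back 0.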

   Hope this helps,

   -- Jean-Luc


------------------------------

Date: Sat, 24 Jul 2004 20:28:25 +0200
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: Priority
Message-Id: <ih64g0hdaf6s3mj10rvsoq4jcps6bp82l4@4ax.com>

On Fri, 23 Jul 2004 11:08:10 +0900, Vito Corleone
<corleone@godfather.com> wrote:

>Hi,
>
>If I write:
>return $x || undef();
>
>Which will be executed first? Is it:
>(return $x) || undef();
>
>or
>return ($x || undef());

  perldoc perlop

and/or, as another poster already suggested,

  perl -MO=Deparse,-p


HTH,
Michele
-- 
you'll see that it shouldn't be so. AND, the writting as usuall is
fantastic incompetent. To illustrate, i quote:
- Xah Lee trolling on clpmisc,
  "perl bug File::Basename and Perl's nature"


------------------------------

Date: Mon, 2 Aug 2004 00:22:40 -0500
From: "Aaron DeLoach" <aaron@deloachcorp.com>
Subject: PROBLEM IDENTIFIED Re: Parsing form POST without CGI.pm on Win32
Message-Id: <v4udnXi4E8kxUpDcRVn-hQ@eatel.net>

"Aaron DeLoach" <aaron@deloachcorp.com> wrote in message
news:KM6dnaNCa5IO8pHcRVn-hA@eatel.net...
> My Perl programs are developed in the Win32 environment. Some of my work
> gets ported to the Unix OS.
>
> I use the CGI.pm module to 'paramitize' form post data. Everything works
> well with this great module.
>
> However, I have a program that will be run every ten seconds or so (maybe
> more?).  I use the CGI.pm just to parse the initial form post data into
> parameters that I immediately place and work with in hashes (I love hashes).
> We control the form post data, so I'm not terribly worried about problems
> that the CGI.pm module tends to regarding such. This seems like a bit of
> overkill just to parse parameters I know, but in Windows there is no STDIN
> to parse form posts from like the Unix OS.
>
> Does anybody have a work-around/solution/tip/anything to get around using
> the CGI.pm for this instance?
>
> Regards,
> Aaron
>
>
>

I am posting this message to update the thread. Maybe it will help someone
else.

Throughout my trials with this subject I was led to believe that WinXP Home
did not expose the STDIN object. The problem is an Internet Explorer 6 issue
(I don't know about earlier versions). I could not read the STDIN via Perl
when a form was submitted with IE 6. On NN and Opera the STDIN was
available. Now I'll try to find the solution... (I'll update the ng)

Regards,
Aaron
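
For reference, reading a form POST without CGI.pm usually comes down to
reading CONTENT_LENGTH bytes from STDIN and decoding the pairs. The sketch
below assumes an application/x-www-form-urlencoded body (not
multipart/form-data); the sub names are hypothetical, and it does not
address the IE 6 behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a urlencoded body into a hash.
# Repeated keys keep only the last value in this sketch.
sub parse_urlencoded {
    my ($body) = @_;
    my %param;
    for my $pair (split /[&;]/, $body) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                                 # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # %XX encodes one byte
        }
        $param{$key} = $value;
    }
    return %param;
}

# Hypothetical reader: pull the POST body from STDIN as the web server
# hands it to a CGI program.
sub read_post_body {
    binmode STDIN;    # avoids CRLF translation on Win32
    my $remaining = $ENV{CONTENT_LENGTH} || 0;
    my $body = '';
    while ($remaining > 0) {
        my $n = read(STDIN, $body, $remaining, length $body);
        last unless $n;    # undef on error, 0 at end of input
        $remaining -= $n;
    }
    return $body;
}
```

A CGI program would then do something like
my %form = parse_urlencoded(read_post_body()); after checking that
REQUEST_METHOD is POST.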





------------------------------

Date: Mon, 2 Aug 2004 11:30:55 -0000
From: "gnari" <gnari@simnet.is>
Subject: Re: PROBLEM IDENTIFIED Re: Parsing form POST without CGI.pm on Win32
Message-Id: <cel8hk$ktn$1@news.simnet.is>

"Aaron DeLoach" <aaron@deloachcorp.com> wrote in message
news:v4udnXi4E8kxUpDcRVn-hQ@eatel.net...

[ seemingly weird problem with STDIN ]

>
> I am posting this message to update the thread. Maybe it will help someone
> else.
>
> Throughout my trials with this subject I was led to believe that WinXP Home
> did not expose the STDIN object. The problem is an Internet Explorer 6 issue
> (I don't know about earlier versions). I could not read the STDIN via Perl
> when a form was submitted with IE 6. On NN and Opera the STDIN was
> available. Now I'll try to find the solution... (I'll update the ng)

[ from another branch of this thread ]

"Aaron DeLoach" <aaron@deloachcorp.com> wrote in message
news:uY2dnbT3OeRYPJHcRVn-rw@eatel.net...
>
> "ChrisO" <ceo@nospam.on.net> wrote in message
> news:zJ_Oc.3614$iH4.2189@newssvr15.news.prodigy.com...
> > Aaron DeLoach wrote:
>
> >
> > perl -e "print while (<STDIN>)"
> > Dude, this works fine!
> > Dude, this works fine!
> > ^Z
> >
>
> This does not work for me (Win XP Home/Perl 5.8.4)

What about this red herring, then?

gnari





------------------------------

Date: Tue, 03 Aug 2004 02:46:34 GMT
From: ChrisO <ceo@nospam.on.net>
Subject: Re: PROBLEM IDENTIFIED Re: Parsing form POST without CGI.pm on Win32
Message-Id: <e0DPc.1779$TR5.1176@newssvr16.news.prodigy.com>

gnari wrote:
> "Aaron DeLoach" <aaron@deloachcorp.com> wrote in message
> news:v4udnXi4E8kxUpDcRVn-hQ@eatel.net...
> 
> [ seemingly weird problem with STDIN ]
> 
> 
>>I am posting this message to update the thread. Maybe it will help someone
>>else.
>>
>>Throughout my trials with this subject I was led to believe that WinXP Home
>>did not expose the STDIN object. The problem is an Internet Explorer 6 issue
>>(I don't know about earlier versions). I could not read the STDIN via Perl
>>when a form was submitted with IE 6. On NN and Opera the STDIN was
>>available. Now I'll try to find the solution... (I'll update the ng)
> 
> 
> [ from another branch of this thread ]
> 
> "Aaron DeLoach" <aaron@deloachcorp.com> wrote in message
> news:uY2dnbT3OeRYPJHcRVn-rw@eatel.net...
> 
>>"ChrisO" <ceo@nospam.on.net> wrote in message
>>news:zJ_Oc.3614$iH4.2189@newssvr15.news.prodigy.com...
>>
>>>Aaron DeLoach wrote:
>>
>>>perl -e "print while (<STDIN>)"
>>>Dude, this works fine!
>>>Dude, this works fine!
>>>^Z
>>>
>>
>>This does not work for me (Win XP Home/Perl 5.8.4)
> 
> 
> what about this red herring , then ?
> 

The immediate exact question I had/have...

-ceo


------------------------------

Date: Sat, 31 Jul 2004 10:53:17 +0200
From: "Maarten" <rijt@dse.nl>
Subject: Problem with file upload in forum
Message-Id: <410b5d0b$0$138$e4fe514c@dreader19.news.xs4all.nl>

I'm running a forum on a debian box and all of a sudden my uploads don't
seem to be working anymore.

The code for the upload is the following:

sub _save_file{
    my ($self, $file_hdr, $file_name) = @_;
    my ($ret, $bytesread, $data);

    open (SAVE, ">$config->{'file_path'}/$file_name") or die "an error
occured:
    while ($bytesread = read($file_hdr,$data,1024)) {
    print SAVE $data;
    }
    close SAVE;
$ret = -s "$config->{'file_path'}/$file_name";
return $ret;

Is there something wrong with this code?
Because I'm getting files with a filesize of 0 kb instead of the real
filesize.




------------------------------

Date: Sat, 31 Jul 2004 08:44:02 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: Problem with file upload in forum
Message-Id: <slrncgn8h2.22j.tadmc@magna.augustmail.com>

Maarten <rijt@dse.nl> wrote:

>     open (SAVE, ">$config->{'file_path'}/$file_name") or die "an error
> occured:


Where is the end of the die() string?


>     while ($bytesread = read($file_hdr,$data,1024)) {
>     print SAVE $data;
>     }
>     close SAVE;
> $ret = -s "$config->{'file_path'}/$file_name";
> return $ret;
> 
> Is there something wrong with this code?


Yes, it has a syntax error and thus will not execute at all!


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Sun, 1 Aug 2004 20:36:04 +0200
From: "Maarten" <rijt@dse.nl>
Subject: Re: Problem with file upload in forum
Message-Id: <410d373d$0$141$e4fe514c@dreader11.news.xs4all.nl>

Sorry this wasn't the correct code it should be this:

     sub _save_file{
            my ($self, $file_hdr, $file_name) = @_;
            my ($ret, $bytesread, $buffer);

            open FILE, ">$config->{'file_path'}/$file_name" ||
                return -1;
            binmode(FILE);
            while ($bytesread=read($file_hdr,$buffer,1024)) {
                print FILE $buffer;
            }
            warn( "$file_hdr");

            close FILE;
            $ret = -s "$config->{'file_path'}/$file_name";
            return $ret;
        }

If you got a solution please explain what is wrong and how to fix it. I'm
not really familiar with perl.




------------------------------

Date: 1 Aug 2004 20:40:57 GMT
From: Eric Bohlman <ebohlman@omsdev.com>
Subject: Re: Problem with file upload in forum
Message-Id: <Xns9538A02EECEC8ebohlmanomsdevcom@130.133.1.4>

"Maarten" <rijt@dse.nl> wrote in 
news:410d373d$0$141$e4fe514c@dreader11.news.xs4all.nl:

> Sorry this wasn't the correct code it should be this:
> 
>      sub _save_file{
>             my ($self, $file_hdr, $file_name) = @_;
>             my ($ret, $bytesread, $buffer);
> 
>             open FILE, ">$config->{'file_path'}/$file_name" ||
>                 return -1;

Due to precedence issues, this statement will never return even if the open 
fails.  Change the '||' to 'or'.

>             binmode(FILE);
>             while ($bytesread=read($file_hdr,$buffer,1024)) {

There's no need for $bytesread.

>                 print FILE $buffer;
>             }
>             warn( "$file_hdr");

What is this supposed to do?

 
>             close FILE;
>             $ret = -s "$config->{'file_path'}/$file_name";
>             return $ret;

You don't need $ret, and I get the sneaking suspicion that what you really 
want to do is test if the file was written successfully, which you would do 
by checking the result of the close.
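
A minimal sketch of the routine with those changes applied. It assumes the
first argument is an already-open input handle; the sub name and the
three-argument open with a lexical handle are illustrative choices, not the
forum's own code.

```perl
use strict;
use warnings;

# Hypothetical rewrite: copy everything from an open input handle to
# $path, returning the number of bytes written, or -1 on failure.
sub save_handle_to_file {
    my ($in, $path) = @_;

    # 'or' binds looser than the list operator open, so the return
    # really fires when open fails. With '||' it would bind to $path.
    open my $out, '>', $path or return -1;
    binmode $out;

    my $total = 0;
    while (my $n = read($in, my $buffer, 1024)) {
        print {$out} $buffer or return -1;
        $total += $n;
    }

    # A failed close can mean buffered data never reached the disk.
    close $out or return -1;
    return $total;
}
```

A return value of 0 then means the open and close succeeded but nothing
could be read from the input handle, which points at the caller rather than
at the output file.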

>         }


------------------------------

Date: 1 Aug 2004 14:13:55 -0700
From: andrewpalmer@email.com (Andrew Palmer)
Subject: Re: Problem with file upload in forum
Message-Id: <faebb1fb.0408011313.4330a3e4@posting.google.com>

"Maarten" <rijt@dse.nl> wrote in message news:<410d373d$0$141$e4fe514c@dreader11.news.xs4all.nl>...
> Sorry this wasn't the correct code it should be this:
> 
>      sub _save_file{
>             my ($self, $file_hdr, $file_name) = @_;
>             my ($ret, $bytesread, $buffer);
> 
>             open FILE, ">$config->{'file_path'}/$file_name" ||
>                 return -1;
>             binmode(FILE);
>             while ($bytesread=read($file_hdr,$buffer,1024)) {

read() takes a FILEHANDLE, not a file name. What is $file_hdr?

>                 print FILE $buffer;
>             }
>             warn( "$file_hdr");

This should report something like "GLOB(0x...) at line whatever" if
$file_hdr is a FILEHANDLE. If this is a file name, you need to open()
it first and send the FILEHANDLE to read().
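
One way to tell the two cases apart is openhandle() from the core
Scalar::Util module. The helper below is a hypothetical sketch; it assumes
that a file name refers to a file readable on the machine running the
script, which is not necessarily true of a name sent by a browser.

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Hypothetical helper: return a readable handle for $source, whether the
# caller passed an open filehandle or the name of a file on disk.
sub handle_for {
    my ($source) = @_;

    # openhandle() returns its argument if it is an open handle, else undef.
    return $source if defined openhandle($source);

    open my $fh, '<', $source or die "Cannot open '$source': $!";
    binmode $fh;
    return $fh;
}
```

The result can be passed straight to read(), so the copy loop does not need
to know which kind of value it was given.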

> 
>             close FILE;
>             $ret = -s "$config->{'file_path'}/$file_name";
>             return $ret;
>         }
> 
> If you got a solution please explain what is wrong and how to fix it. I'm
> not really familiar with perl.


------------------------------

Date: Mon, 2 Aug 2004 00:55:59 +0200
From: "Maarten" <rijt@dse.nl>
Subject: Re: Problem with file upload in forum
Message-Id: <410d74e2$0$30778$e4fe514c@dreader16.news.xs4all.nl>

The code I posted is how it used to work. It's from a forum called
Sporum. But after some upgrade of perl, I guess, it isn't working anymore.
This script is executed by another script. I don't really understand how it
works.
The warn() gives me a filename.


The page that opens the script looks like this:

<FORM METHOD="post" ACTION="$config->{'cgidir'}/comments.cgi"
ENCTYPE="multipart/form-data">
<INPUT TYPE="file" name="file_hdl" size=30 maxlength=80>
$lang->{'select_file'}:</B><BR>
          <font size=1>$file_namef
<INPUT TYPE="submit" name="op" value="$lang->{'AttachFile'}">|


The file attach section in comments.cgi looks like this:

    require Ops::FileAttach;
      # --- save the file, and return error if one
      my ($fid, $caution) =
      Ops::FileAttach->new("", $spdb, $STATE)->save_file_attachment();

       require Templates::NewComment;

                # --- create a template object
                $template = Templates::NewComment->new($spcgi, $spdb,
                                                       $fid, $caution);

I hope this helps.
"Andrew Palmer" <andrewpalmer@email.com> wrote in message
news:faebb1fb.0408011313.4330a3e4@posting.google.com...
> "Maarten" <rijt@dse.nl> wrote in message
> news:<410d373d$0$141$e4fe514c@dreader11.news.xs4all.nl>...
> > Sorry this wasn't the correct code it should be this:
> >
> >      sub _save_file{
> >             my ($self, $file_hdr, $file_name) = @_;
> >             my ($ret, $bytesread, $buffer);
> >
> >             open FILE, ">$config->{'file_path'}/$file_name" ||
> >                 return -1;
> >             binmode(FILE);
> >             while ($bytesread=read($file_hdr,$buffer,1024)) {
>
> read() takes a FILEHANDLE, not a file name. What is $file_hdr?
>
> >                 print FILE $buffer;
> >             }
> >             warn( "$file_hdr");
>
> This should report something like "GLOB(0x...) at line whatever" if
> $file_hdr is a FILEHANDLE. If this is a file name, you need to open()
> it first and send the FILEHANDLE to read().
>
> >
> >             close FILE;
> >             $ret = -s "$config->{'file_path'}/$file_name";
> >             return $ret;
> >         }
> >
> > If you got a solution please explain what is wrong and how to fix it. I'm
> > not really familiar with perl.




------------------------------

Date: Mon, 2 Aug 2004 12:11:11 -0500
From: "Andrew Palmer" <andrewpalmer@email.com>
Subject: Re: Problem with file upload in forum
Message-Id: <cyuPc.917$zc1.550@fe40.usenetserver.com>


"Maarten" <rijt@dse.nl> wrote in message
news:410d74e2$0$30778$e4fe514c@dreader16.news.xs4all.nl...

> The warn() gives me a filename.

So...

> > >
> > >      sub _save_file{
> > >             my ($self, $file_hdr, $file_name) = @_;
> > >             my ($ret, $bytesread, $buffer);
> > >
> > >             open FILE, ">$config->{'file_path'}/$file_name" ||
> > >                 return -1;
> > >             binmode(FILE);

open(IN,"< $file_hdr") or die;

> > >             while ($bytesread=read($file_hdr,$buffer,1024)) {

while ($bytesread=read(IN,$buffer,1024)) {

> > >                 print FILE $buffer;
> > >             }

close IN;

> > >             warn( "$file_hdr");
> > >
> > >             close FILE;
> > >             $ret = -s "$config->{'file_path'}/$file_name";
> > >             return $ret;
> > >         }
> > >
> > > If you got a solution please explain what is wrong and how to fix it. I'm
> > > not really familiar with perl.
>
>





------------------------------

Date: 29 Jul 2004 14:45:34 -0700
From: dodonnell@gmail.com (Dan)
Subject: RegEx issue
Message-Id: <6e52dd80.0407291345.11023ad6@posting.google.com>

OK, I have a perl script that reads in html files and makes some link
replacements. Everything works OK except it changes something it
shouldn't. Here is my line of code:

@getstaf2[0] =~ s!href=\"([^(/)]+)[\.]+([^@?]+)\"!href=\"_miscfiles\/$1\.$2\"!ig;

This code replaces a file of the form <a href="whatever.xxx"> to <a
href="_miscfiles/whatever.xxx">.

Now that works fine, but it seems to change things it shouldn't be,
namely instances of <a href="mailto:whoever@whatever.com"> to <a
href="_miscfiles/whoever@whaever.com">.

Interestingly, if I have two or more mailto references on a page, it
will nicely not touch the first, but will change the second. More
interestingly, if I take out the global parameter 'g' from the end of
the regex, things work fine for the emails (it doesn't touch them), but
then the actual whatever.xxx replacements don't get done.

So I don't understand why it would (a) leave one alone but not the
other since the 'g' should make it do the same for all instances, or
(b) touch the email references at all. The [^?@] atom should make sure
it skips over any email address that happen to be of the form
whoever.whoever@whatever.com.

Any help is greatly appreciated!! I've been trying to get this solved
for days!

Thanks,

Dan


------------------------------

Date: Thu, 29 Jul 2004 18:05:53 -0400
From: Paul Lalli <mritty@gmail.com>
Subject: Re: RegEx issue
Message-Id: <20040729175851.P3404@barbara.cs.rpi.edu>

On Thu, 29 Jul 2004, Dan wrote:

> OK, I have a perl script that reads in html files and makes some link
> replacements. Everything works OK except it changes something it
> shouldn't. Here is my line of code:
>
> @getstaf2[0] =~ s!href=\"([^(/)]+)[\.]+([^@?]+)\"!href=\"_miscfiles\/$1\.$2\"!ig;
>
> This code replaces a file of the form <a href="whatever.xxx"> to <a
> href="_miscfiles/whatever.xxx">.
>
> Now that works fine, but it seems to change things it shouldn't be,
> namely instances of <a href="mailto:whoever@whatever.com"> to <a
> href="_miscfiles/whoever@whaever.com">.
>
> Interestingly, if I have two or more mailto references on a page, it
> will nicely not touch the first, but will change the second. More
> interestingly, if I take out the global parameter 'g' from the end of
> the regex, things work fine for the emails (it doesn't touch them), but
> then the actual whatever.xxx replacements don't get done.
>
> So I don't understand why it would (a) leave one alone but not the
> other since the 'g' should make it do the same for all instances, or

My guess - one is contained on a single line, another spans multiple
lines, and your methodology is reading the HTML file line by line.

> (b) touch the email references at all. The [^?@] atom should make sure
> it skips over any email address that happen to be of the form
> whoever.whoever@whatever.com.

That's not what's in the regexp above.  What's in the regexp above is
[^@?]  which is looking for any pattern that doesn't match the @?
variable.  @ needs to be escaped in regexps, because they undergo
double-quotish interpolation.

> Any help is greatly appreciated!! I've been trying to get this solved
> for days!

The canonical answer to this question is: Don't parse HTML with RegExps!
Use one of the plethora of modules available on CPAN.

Paul Lalli


------------------------------

Date: 29 Jul 2004 23:17:52 +0100
From: Brian McCauley <nobull@mail.com>
Subject: Re: RegEx issue
Message-Id: <u9pt6e1mr3.fsf@wcl-l.bham.ac.uk>

dodonnell@gmail.com (Dan) writes:

> OK, I have a perl script that reads in html files and makes some link
> replacements. Everything works OK except it changes something it
> shouldn't. Here is my line of code:
> 
> @getstaf2[0] =~ s!href=\"([^(/)]+)[\.]+([^@?]+)\"!href=\"_miscfiles\/$1\.$2\"!ig;
> 
> This code replaces a file of the form <a href="whatever.xxx"> to <a
> href="_miscfiles/whatever.xxx">.
> 
> Now that works fine, but it seems to change things it shouldn't be,
> namely instances of <a href="mailto:whoever@whatever.com"> to <a
> href="_miscfiles/whoever@whaever.com">.

Define "it shouldn't be".  That target matches your regex.

> Interestingly, if I have two or more mailto references on a page, it
> will nicely not touch the first, but will change the second.

Actually that's probably not what's happening.  Note that the regex
[^(/)]+ can match quote characters and angle brackets so can run right
out of one tag and into another.

> So I don't understand why it would (a) leave one alone but not the
> other since the 'g' should make it do the same for all instances, or
> (b) touch the email references at all. The [^?@] atom should make sure
> it skips over any email address that happen to be of the form
> whoever.whoever@whatever.com.

It does not prevent the @ being matched by the [^(/)]

> Any help is greatly appreciated!! I've been trying to get this solved
> for days!

There is a reason we keep telling everyone who comes here trying to
parse HTML using simple regex[1] not to do that[2].

Can you guess what that reason is?

[1] Typically at least a couple a week.

[2] And use an HTML parsing module instead.
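
If a regex is used anyway, the run-away matching described above can at
least be contained by keeping the double quote out of the character
classes. This is only a sketch under the assumption that every href value
is double-quoted and sits on one line; the sub name is hypothetical, and a
real HTML parsing module remains the more robust route.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix bare "name.ext" hrefs with _miscfiles/.
# Both classes exclude '"', so a match cannot leave the attribute value,
# and exclude '/', ':', '@' and '?', so paths, schemes such as mailto:
# and query strings are left alone.
sub relocate_links {
    my ($html) = @_;
    $html =~ s{href="([^"/:\@?]+\.[^"/:\@?]+)"}{href="_miscfiles/$1"}ig;
    return $html;
}
```

Applied to a string holding both href="whatever.xxx" and
href="mailto:whoever@whatever.com", only the first link is rewritten.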

-- 
     \\   ( )
  .  _\\__[oo
 .__/  \\ /\@
 .  l___\\
  # ll  l\\
 ###LL  LL\\


------------------------------

Date: Fri, 30 Jul 2004 00:39:15 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: RegEx issue
Message-Id: <2mtcniFr5s74U1@uni-berlin.de>

Paul Lalli wrote:
> On Thu, 29 Jul 2004, Dan wrote:
>> (b) touch the email references at all. The [^?@] atom should make
>> sure it skips over any email address that happen to be of the
>> form whoever.whoever@whatever.com.
> 
> That's not what's in the regexp above.  What's in the regexp above
> is [^@?]

That's the same character class.

> which is looking for any pattern that doesn't match the @? 
> variable.  @ needs to be escaped in regexps, because they undergo 
> double-quotish interpolation.

That's not true when defining a character class, is it?

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Thu, 29 Jul 2004 19:06:37 -0400
From: Paul Lalli <mritty@gmail.com>
Subject: Re: RegEx issue
Message-Id: <20040729190544.K3404@barbara.cs.rpi.edu>

On Fri, 30 Jul 2004, Gunnar Hjalmarsson wrote:

> Paul Lalli wrote:
> > On Thu, 29 Jul 2004, Dan wrote:
> >> (b) touch the email references at all. The [^?@] atom should make
> >> sure it skips over any email address that happen to be of the
> >> form whoever.whoever@whatever.com.
> >
> > That's not what's in the regexp above.  What's in the regexp above
> > is [^@?]
>
> That's the same character class.

It would seem not.

> > which is looking for any pattern that doesn't match the @?
> > variable.  @ needs to be escaped in regexps, because they undergo
> > double-quotish interpolation.
>
> That's not true when defining a character class, is it?

It would seem it is.

#!/usr/bin/perl
@f = qw/a-z/;
print "letters\n" if 'abc' =~ /[@f]/;
print "numbers\n" if '123' =~ /[@f]/;

__END__
letters



Paul Lalli


------------------------------

Date: Fri, 30 Jul 2004 01:35:36 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: RegEx issue
Message-Id: <2mtg17FrfhmkU1@uni-berlin.de>

Paul Lalli wrote:
> Gunnar Hjalmarsson wrote:
>> Paul Lalli wrote:
>>> On Thu, 29 Jul 2004, Dan wrote:
>>>> (b) touch the email references at all. The [^?@] atom should
>>>> make sure it skips over any email address that happen to be
>>>> of the form whoever.whoever@whatever.com.
>>> 
>>> That's not what's in the regexp above.  What's in the regexp
>>> above is [^@?]
>> 
>> That's the same character class.
> 
> It would seem not.
> 
>>> which is looking for any pattern that doesn't match the @? 
>>> variable.  @ needs to be escaped in regexps, because they
>>> undergo double-quotish interpolation.
>> 
>> That's not true when defining a character class, is it?
> 
> It would seem it is.
> 
> #!/usr/bin/perl
> @f = qw/a-z/;
> print "letters\n" if 'abc' =~ /[@f]/;
> print "numbers\n" if '123' =~ /[@f]/;
> 
> __END__
> letters

Hmm... It would seem I stand corrected. :)

Nevertheless, before posting I did something like this:

     print "No match\n" unless 'abc@def' =~ /^[^@?]+$/;
     print "Match\n" if 'abcdef' =~ /^[^@?]+$/;

Outputs:
No match
Match

So the case seems not to be *that* obvious...

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Fri, 30 Jul 2004 18:19:35 GMT
From: ced@bcstec.ca.boeing.com (Charles DeRykus)
Subject: Re: RegEx issue
Message-Id: <I1oG8n.CKv@news.boeing.com>

In article <2mtg17FrfhmkU1@uni-berlin.de>,
Gunnar Hjalmarsson  <noreply@gunnar.cc> wrote:
>Paul Lalli wrote:
>> Gunnar Hjalmarsson wrote:
>>> Paul Lalli wrote:
>>>> On Thu, 29 Jul 2004, Dan wrote:
>>>>> (b) touch the email references at all. The [^?@] atom should
>>>>> make sure it skips over any email address that happen to be
>>>>> of the form whoever.whoever@whatever.com.
>>>> 
>>>> That's not what's in the regexp above.  What's in the regexp
>>>> above is [^@?]
>>> 
>>> That's the same character class.
>> 
>> It would seem not.
>> 
>>>> which is looking for any pattern that doesn't match the @? 
>>>> variable.  @ needs to be escaped in regexps, because they
>>>> undergo double-quotish interpolation.
>>> 
>>> That's not true when defining a character class, is it?
>> 
>> It would seem it is.
>> 
>> #!/usr/bin/perl
>> @f = qw/a-z/;
>> print "letters\n" if 'abc' =~ /[@f]/;
>> print "numbers\n" if '123' =~ /[@f]/;
>> 
>> __END__
>> letters
>
>Hmm... It would seem I stand corrected. :)
>
>Nevertheless, before posting I did something like this:
>
>     print "No match\n" unless 'abc@def' =~ /^[^@?]+$/;
>     print "Match\n" if 'abcdef' =~ /^[^@?]+$/;
>
>Outputs:
>No match
>Match
>
>So the case seems not to be *that* obvious...
>

Looks like you're right...  
  
perl -MO=Deparse -wle '/[@?]/'
/[\@?]/;

perl -MO=Deparse -wle '/[ab@]/'
/[ab\@]/;

perl -MO=Deparse -wle '/[@m]/'
Possible unintended interpolation of @m in string at -e line 1.
Name "main::m" used only once: possible typo at -e line 1.
/[@m]/;


--
Charles DeRykus




------------------------------

Date: Sat, 31 Jul 2004 23:23:40 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: RegEx issue
Message-Id: <2n2gu3Fsf2gjU1@uni-berlin.de>

Charles DeRykus wrote:
> Gunnar Hjalmarsson wrote:
>> 
>>     print "No match\n" unless 'abc@def' =~ /^[^@?]+$/;
>>     print "Match\n" if 'abcdef' =~ /^[^@?]+$/;
>> 
>> Outputs:
>> No match
>> Match
> 
> Looks like you're right...

I seem to be right about /[^@?]/, but I apparently jumped to conclusions.

> perl -MO=Deparse -wle '/[@?]/'
> /[\@?]/;
> 
> perl -MO=Deparse -wle '/[ab@]/'
> /[ab\@]/;
> 
> perl -MO=Deparse -wle '/[@m]/'
> Possible unintended interpolation of @m in string at -e line 1.
> Name "main::m" used only once: possible typo at -e line 1.
> /[@m]/;

Those warnings are displayed if strictures are not enabled and you
haven't declared the @m variable.

So, I'm a little confused. The lesson here is that @ gets interpolated
in regexes sometimes. Maybe a good enough reason to always escape that
character, but a less ambiguous conclusion would be nice. :)

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
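
A small sketch of the two cases side by side, assuming a declared array
@f as in the earlier test; the variable names are made up for
illustration.

```perl
use strict;
use warnings;

my @f = ('a-z');

# A declared array interpolates, so this class is really [a-z].
my $letters = 'abc' =~ /[@f]/  ? 1 : 0;
my $digits  = '123' =~ /[@f]/  ? 1 : 0;

# Escaped, the class holds the two literal characters '@' and 'f'.
my $at_sign = 'a@b' =~ /[\@f]/ ? 1 : 0;
my $plain   = 'abc' =~ /[\@f]/ ? 1 : 0;

print "letters=$letters digits=$digits at_sign=$at_sign plain=$plain\n";
```

This prints letters=1 digits=0 at_sign=1 plain=0. Escaping the @ gives the
same result whether or not an array of that name happens to exist, which is
the practical argument for always writing \@ in a pattern.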


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6821
***************************************
