[30071] in Perl-Users-Digest


Perl-Users Digest, Issue: 1314 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Feb 27 21:09:42 2008

Date: Wed, 27 Feb 2008 18:09:09 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 27 Feb 2008     Volume: 11 Number: 1314

Today's topics:
    Re: a very simplistic example of a perl module <tadmc@seesig.invalid>
    Re: how to process each directory <rsarpi@gmail.com>
    Re: how to process each directory <jimsgibson@gmail.com>
    Re: how to process each directory <rsarpi@gmail.com>
    Re: how to process each directory <jimsgibson@gmail.com>
    Re: how to process each directory <tadmc@seesig.invalid>
        how to remove the letter n from a line of text <pauls@nospam.off>
    Re: how to remove the letter n from a line of text <jurgenex@hotmail.com>
    Re: how to remove the letter n from a line of text <jimsgibson@gmail.com>
    Re: how to remove the letter n from a line of text <john@castleamber.com>
        parameterized sorting function <agw@comcast.net>
    Re: parameterized sorting function <ben@morrow.me.uk>
    Re: parameterized sorting function <uri@stemsystems.com>
        RegEx - matching previous match <jellings@gmail.com>
        RegEx - matching previous match <jellings@gmail.com>
    Re: RegEx - matching previous match <noreply@gunnar.cc>
    Re: RegEx - matching previous match <tadmc@seesig.invalid>
        subroutines vs method vs function <Benson.Hoi@googlemail.com>
    Re: subroutines vs method vs function <joost@zeekat.nl>
    Re: subroutines vs method vs function <uri@stemsystems.com>
    Re: subroutines vs method vs function <joost@zeekat.nl>
    Re: uc() and utf8 <ben@morrow.me.uk>
    Re: uc() and utf8 <stoupa@practisoft.cz>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 27 Feb 2008 19:37:57 -0600
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: a very simplistic example of a perl module
Message-Id: <slrnfsc43l.ijc.tadmc@tadmc30.sbcglobal.net>

[ If you don't want to continue to look silly, then do not top-post.
  Text rearranged into actual chronological order.
]

BH <Benson.Hoi@googlemail.com> wrote:
> On Feb 26, 7:51 pm, xhos...@gmail.com wrote:


>> Who will receive credit for your homework, me or you?
>>
> Interesting! It's interesting someone actually thinks that any typical
> school/university will teach Perl as a subject? ;)


I don't "think" it is taught, I "know" it is.

I have taught a for-credit Perl class at a typical university.


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Wed, 27 Feb 2008 13:17:09 -0800 (PST)
From: monk <rsarpi@gmail.com>
Subject: Re: how to process each directory
Message-Id: <c1587ec1-4592-4c55-bb4e-a6918120cacd@n75g2000hsh.googlegroups.com>

I get from the shell:

<< is not a directory.r/logs
what the hell?...No such file or directory

Then I do a regular manual cd to the logs directory and I'm in, even
with the full path that the program itself printed with print "$_\n";

any clues?


------------------------------

Date: Wed, 27 Feb 2008 14:16:18 -0800
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: how to process each directory
Message-Id: <270220081416187878%jimsgibson@gmail.com>

In article
<c1587ec1-4592-4c55-bb4e-a6918120cacd@n75g2000hsh.googlegroups.com>,
monk <rsarpi@gmail.com> wrote:

> I get from the shell:
> 
> << is not a directory.r/logs

The above output line is a clue.

> what the hell?...No such file or directory
> 
> Then I do a regular manual cd to the logs directory and I'm in, even
> with the full path that the program itself printed with print "$_\n";
> 
> any clues?

Yes. Your output starts with '<<'. Where is the '>>$_' part of your
line? And why do you have '.r/logs' at the end of the line that isn't
in the original print statement?

Maybe you have an embedded \r character in your file? Or some other
unprintable character. On *nix, use 'od -c' to see just what characters
are in your file. Try using a string value or array in your program
instead of reading an external file to ensure you can actually open the
directory with opendir.
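
If 'od -c' is not at hand, something similar can be done from Perl itself.
The following is only a sketch: show_hidden is a hypothetical helper name
and the path is made up. It prints every character outside printable ASCII
as a \xNN escape, so a stray carriage return becomes visible.

```perl
use strict;
use warnings;

# Hypothetical helper: make control and non-ASCII characters visible
# by replacing each one with a \xNN escape.
sub show_hidden {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
    return $text;
}

my $line = "/var/log\r\n";    # made-up path with a DOS line ending
chomp(my $copy = $line);      # with the default $/, chomp removes only "\n"
print show_hidden($copy), "\n";
```

A leftover \x0d at the end of the path would also explain why the error
message appeared to overwrite the start of its own line.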

If that isn't it, please post a short, complete program (with 'use
strict;' and 'use warnings;') that demonstrates your problem. Use the
special <DATA> filehandle and include your data file at the end of your
program after the line

__DATA__

Also please observe the other guidelines for posting to
comp.lang.perl.misc:
<http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html>
(currently not accessible to me) and posted here twice a week. 

Also note that many of the most helpful people reading this newsgroup
are not using Google Groups as a newsreader, so it helps to include the
lines of code to which you are referring.

Good luck!

-- 
Jim Gibson

 Posted Via Usenet.com Premium Usenet Newsgroup Services
----------------------------------------------------------
    ** SPEED ** RETENTION ** COMPLETION ** ANONYMITY **
----------------------------------------------------------        
                http://www.usenet.com


------------------------------

Date: Wed, 27 Feb 2008 15:23:24 -0800 (PST)
From: monk <rsarpi@gmail.com>
Subject: Re: how to process each directory
Message-Id: <541fc356-fc1c-4fd1-a2ea-e9ea1de6fe65@s19g2000prg.googlegroups.com>

Thanks for the tip 'od -c <filename>'

I uncovered an extra character was being added somehow when slurping
the config file.
I added one little line to my code and now it works like a charm.

 foreach (@directories_to_clean){
    chomp;

    chop; # one extra line that solved my problem.

    print "$_\n";

    #and everything else is the same

}

Thanks everybody again.




------------------------------

Date: Wed, 27 Feb 2008 15:51:44 -0800
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: how to process each directory
Message-Id: <270220081551448116%jimsgibson@gmail.com>

In article
<541fc356-fc1c-4fd1-a2ea-e9ea1de6fe65@s19g2000prg.googlegroups.com>,
monk <rsarpi@gmail.com> wrote:

> Thanks for the tip 'od -c <filename>'
> 
> I uncovered an extra character was being added somehow when slurping
> the config file.
> I added one little line to my code and now it works like a charm.
> 
>  foreach (@directories_to_clean){
>     chomp;
> 
>     chop; # one extra line that solved my problem.
> 
>     print "$_\n";
> 
>     #and everything else is the same
> 
> }

That is not a good, long-term solution. It is not likely that slurping
the file is adding extra characters. It is more likely that the file
does not have the proper line endings for your system. As soon as
somebody edits your file with the proper editor, it may remove the
extra characters and your program will no longer work.

You should determine exactly what the extra characters are and remove
them. chop will remove the last character of your string, regardless of
what it is.

In your original program, which you do not show, you have already used
chomp on the lines read from your file, and the chomp shown above is a
no-op. chomp will remove the expected line-ending character or
characters from a string. If they are not found, chomp does nothing.

Use the substitute operator or the tr operator to remove only those
characters that do not belong:

  s/[\r\n]+//g;
  tr/\r\n//d;
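
Applied to the loop from the earlier post, the idea might look like the
sketch below. The sample paths are made up, and the pattern is anchored at
the end of the string on the assumption that only trailing line-ending
characters need to go.

```perl
use strict;
use warnings;

# Made-up sample lines with Unix, DOS, and bare-CR line endings.
my @directories_to_clean = ("/tmp/a\n", "/tmp/b\r\n", "/tmp/c\r");

for (@directories_to_clean) {
    s/[\r\n]+\z//;    # strip only trailing CR/LF characters, nothing else
}

print "<<$_>>\n" for @directories_to_clean;
```

Unlike chop, this leaves a line alone when it has no line ending at all.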

-- 
Jim Gibson



------------------------------

Date: Wed, 27 Feb 2008 19:31:58 -0600
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: how to process each directory
Message-Id: <slrnfsc3oe.ijc.tadmc@tadmc30.sbcglobal.net>

Jim Gibson <jimsgibson@gmail.com> wrote:

><http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html>
> (currently not accessible to me) and posted here twice a week. 


It changed location last summer (at revision 1.8):

    http://www.rehabitation.com/clpmisc.shtml
or
    http://www.rehabitation.com/clpmisc/clpmisc_guidelines.html


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Wed, 27 Feb 2008 15:19:44 -0800
From: pauls <pauls@nospam.off>
Subject: how to remove the letter n from a line of text
Message-Id: <Tcadnbal4eWBbVjanZ2dnUVZ_j6dnZ2d@seanet.com>

I want to replace all occurrences of the letter n in a line of text.
The confusion I am having is due to the fact that n denotes a new line 
in reg expressions.

I tried to do this:

s/n/fred/;        to replace the letter n with fred. But, it did not happen.

This is an area of PERL for which I have not seemed to get it straight. 
That is, when you want to replace a letter that is used by PERL in reg 
expression operations.

Thanks!

P.


------------------------------

Date: Wed, 27 Feb 2008 23:40:17 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: how to remove the letter n from a line of text
Message-Id: <lmsbs31lpf4m6b9fhp2a0o1gocv1acev1u@4ax.com>

pauls <pauls@nospam.off> wrote:
>I want to replace all occurrences of the letter n in a line of text.
>The confusion I am having is due to the fact that n denotes a new line 
>in reg expressions.

No, it doesn't. "\n" denotes a newline when interpolated in a double quoted
string.

>I tried to do this:
>s/n/fred/;        to replace the letter n with fred. But, it did not happen.

Works for me

	use strict; use warnings;
	$_ = 'banana';
	s/n/fred/g;
	print;

	C:\tmp>t.pl
	bafredafreda
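
Note that the example above uses the /g modifier, which the original
s/n/fred/ did not have. Without /g only the first match is replaced. A
small sketch of the difference (the variable names are just for
illustration):

```perl
use strict;
use warnings;

my $first_only = 'banana';
my $all        = 'banana';

$first_only =~ s/n/fred/;     # replaces the first n only
$all        =~ s/n/fred/g;    # /g replaces every n

print "$first_only\n";
print "$all\n";
```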

jue


------------------------------

Date: Wed, 27 Feb 2008 16:00:02 -0800
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: how to remove the letter n from a line of text
Message-Id: <270220081600028050%jimsgibson@gmail.com>

In article <Tcadnbal4eWBbVjanZ2dnUVZ_j6dnZ2d@seanet.com>, pauls
<pauls@nospam.off> wrote:

> I want to replace all occurrences of the letter n in a line of text.
> The confusion I am having is due to the fact that n denotes a new line 
> in reg expressions.
> 
> I tried to do this:
> 
> s/n/fred/;        to replace the letter n with fred. But, it did not happen.
> 
> This is an area of PERL for which I have not seemed to get it straight. 
> That is, when you want to replace a letter that is used by PERL in reg 
> expression operations.

'n' only denotes a newline if it is preceded by a backslash character
in double-quotish mode: "\n".

Examples:

perl -e '$x="alno";$x=~s/n/fred/;print"$x\n";'
alfredo

perl -e '$x="al\no";$x=~s/n/fred/;print"$x\n";'
al
o

perl -e '$x=q(al\no);$x=~s/n/fred/;print"$x\n";'
al\fredo

If this doesn't work for you, please post some code showing what you
get and how it differs from what you expect.

(FYI: you can type Perl as a name, not an acronym)

-- 
Jim Gibson



------------------------------

Date: 28 Feb 2008 00:19:37 GMT
From: John Bokma <john@castleamber.com>
Subject: Re: how to remove the letter n from a line of text
Message-Id: <Xns9A51BA6F9B4C2castleamber@130.133.1.4>

Jürgen Exner <jurgenex@hotmail.com> wrote:

>      bafredafreda

LOL

-- 
John

Arachnids near Coyolillo - part 1
http://johnbokma.com/mexit/


------------------------------

Date: Wed, 27 Feb 2008 17:24:41 -0500
From: Art Werschulz <agw@comcast.net>
Subject: parameterized sorting function
Message-Id: <m2wsoqm40m.fsf@comcast.net>

Hi.

I would like to do radix sort on an array @A of strings having equal
length.  Is there some way to do something like the following?

sub radixSort{
  my @A = @_;
  my $d = length($A[0]);
  for (my $pos = $d - 1; $pos >= 0; $pos--) {
    @A = sort  sortFunction($pos) @A;
    displayArray($pos, @A);
  }
}

sub sortFunction {$a[$pos] <=> $b[$pos]}

IOW, I would like sortFunction to depend on a parameter.

Thanks.

-- 
Art Werschulz (agw STRUDEL comcast.net)
207 Stoughton Ave Cranford NJ 07016
(908) 272-1146


------------------------------

Date: Wed, 27 Feb 2008 22:44:49 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: parameterized sorting function
Message-Id: <1o5h95-915.ln1@osiris.mauzo.dyndns.org>


Quoth agw@dsm.fordham.edu:
> 
> I would like to do radix sort on an array @A of strings having equal
> length.  Is there some way to do something like the following?
> 
> sub radixSort{
>   my @A = @_;
>   my $d = length($A[0]);
>   for (my $pos = $d - 1; $pos >= 0; $pos--) {
>     @A = sort  sortFunction($pos) @A;
>     displayArray($pos, @A);
>   }
> }
> 
> sub sortFunction {$a[$pos] <=> $b[$pos]}
> 
> IOW, I would like sortFunction to depend on a parameter.

Use a closure:

    sub radixSort {
        my @A = @_;
        my $d = length($A[0]);
        for my $pos (reverse 0 .. $d - 1) {
            # $a and $b are strings, so compare the characters at $pos
            @A = sort { substr($a, $pos, 1) cmp substr($b, $pos, 1) } @A;
            displayArray($pos, @A);
        }
        return @A;
    }

Ben



------------------------------

Date: Wed, 27 Feb 2008 22:55:40 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: parameterized sorting function
Message-Id: <x7lk5657rn.fsf@mail.sysarch.com>

>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:

  BM> Quoth agw@dsm.fordham.edu:
  >> 
  >> I would like to do radix sort on an array @A of strings having equal
  >> length.  Is there some way to do something like the following?
  >> 
  >> sub radixSort{
  >> my @A = @_;
  >> my $d = length($A[0]);
  >> for (my $pos = $d - 1; $pos >= 0; $pos--) {
  >> @A = sort  sortFunction($pos) @A;
  >> displayArray($pos, @A);
  >> }
  >> }
  >> 
  >> sub sortFunction {$a[$pos] <=> $b[$pos]}
  >> 
  >> IOW, I would like sortFunction to depend on a parameter.

  BM> Use a closure:

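  BM>     sub radixSort {
  BM>         my @A = @_;
  BM>         my $d = length($A[0]);
  BM>         for my $pos (reverse 0 .. $d - 1) {
  BM>             # $a and $b are strings, so compare the characters at $pos
  BM>             @A = sort { substr($a, $pos, 1) cmp substr($b, $pos, 1) } @A;
  BM>             displayArray($pos, @A);
  BM>         }
  BM>         return @A;
  BM>     }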

or a sort block:

@A = sort { sortFunction($pos) } @A;

try looking at Sort::Maker as well. you can create sort functions and
dump the code for copy/paste use or call them directly. it will also
generate very fast sorts.
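
For the "sort function that depends on a parameter" part of the question,
one possible shape is a sub that returns a comparator closed over $pos.
This is a sketch only: make_cmp and radix_sort are hypothetical names,
the strings are assumed to have equal length, and 'use sort "stable"' is
there because a radix sort depends on each pass keeping equal keys in
their previous order.

```perl
use strict;
use warnings;
use sort 'stable';    # each pass must preserve the order of equal keys

# Hypothetical helper: returns a comparator closed over $pos.
sub make_cmp {
    my ($pos) = @_;
    return sub { substr($a, $pos, 1) cmp substr($b, $pos, 1) };
}

sub radix_sort {
    my @A = @_;
    return @A unless @A;
    # Sort on the last character first, then work toward the front.
    for my $pos (reverse 0 .. length($A[0]) - 1) {
        my $by_pos = make_cmp($pos);
        @A = sort $by_pos @A;
    }
    return @A;
}

my @sorted = radix_sort(qw(cab abc bca bac));
print "@sorted\n";
```

This works because everything lives in one package, so the $a and $b seen
by the anonymous sub are the same ones that sort sets.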

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Architecture, Development, Training, Support, Code Review  ------
-----------  Search or Offer Perl Jobs  ----- http://jobs.perl.org  ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Wed, 27 Feb 2008 14:12:56 -0800 (PST)
From: j  ellings <jellings@gmail.com>
Subject: RegEx - matching previous match
Message-Id: <783aa06f-07ec-469c-bb85-e7777be0a622@64g2000hsw.googlegroups.com>

Hello.

I have an html file converted from PDF that includes the following
sample lines:

(html has been converted)

&lt;i&gt;&lt;b&gt;Z &amp; A Newsstand&lt;/b&gt;&lt;/i&gt;&lt;br&gt;
&lt;i&gt;Retail Food: Mobile Food Vendor&lt;/i&gt;&lt;br&gt;
&lt;i&gt;2 N 10th St&lt;/i&gt;&lt;br&gt;
&lt;i&gt;Philadelphia, PA 19107&lt;/i&gt;&lt;br&gt;
&lt;b&gt;Inspection Date&lt;/b&gt;&lt;br&gt;
&lt;i&gt;4/11/07&lt;/i&gt;&lt;br&gt;
No Critical Violations&lt;br&gt;
&lt;i&gt;4/11/07&lt;/i&gt;&lt;br&gt;
No Critical Violations&lt;br&gt;
&lt;i&gt;11/28/06&lt;/i&gt;&lt;br&gt;
No Critical Violations&lt;br&gt;
&lt;i&gt;4/24/06&lt;/i&gt;&lt;br&gt;
No Critical Violations&lt;br&gt;
&lt;i&gt;&lt;b&gt;Newstand&lt;/b&gt;&lt;/i&gt;&lt;br&gt;
&lt;i&gt;Retail Food: Mobile Food Vendor&lt;/i&gt;&lt;br&gt;
&lt;i&gt;32 N 10th St&lt;/i&gt;&lt;br&gt;
&lt;i&gt;Philadelphia, PA 19107&lt;/i&gt;&lt;br&gt;
&lt;b&gt;Inspection Date&lt;/b&gt;&lt;br&gt;
&lt;i&gt;7/2/07&lt;/i&gt;&lt;br&gt;
No Critical Violations&lt;br&gt;
&lt;i&gt;&lt;b&gt;Pudgies Deli&lt;/b&gt;&lt;/i&gt;&lt;br&gt;
&lt;i&gt;Retail Food: Restaurant, Eat-in&lt;/i&gt;&lt;br&gt;
&lt;i&gt;46 N 10th St&lt;/i&gt;&lt;br&gt;
&lt;i&gt;Philadelphia, PA 19107&lt;/i&gt;&lt;br&gt;
&lt;b&gt;Inspection Date&lt;/b&gt;&lt;br&gt;
&lt;i&gt;1/11/07&lt;/i&gt;&lt;br&gt;
No Critical Violations&lt;br&gt;
&lt;i&gt;9/25/06&lt;/i&gt;&lt;br&gt;
No Critical Violations&lt;br&gt;
&lt;i&gt;8/7/06&lt;/i&gt;&lt;br&gt;
No Critical Violations&lt;br&gt;


I am trying to capture the information between the &lt;i&gt;&lt;b&gt;
tags as these are the only unique delimiters between entries.

My regex is as follows:

while ($html =~ m{<i><b>(.*?)<i><b>}gs) {
#do something
}

Unfortunately, the regex will match the first instance( Z &amp; A
Newsstand), but ignore the second (Newstand) and then match on the
third (Pudgies Deli).

I can see that the match is working according to what I wrote;  I am
trying to fine tune it so that I can grab every match.  Is there a way
to include the previous &lt;i&gt;&lt;b&gt; in the next match such that
it will not skip  a potential match?

Any suggestions or advice would be most appreciated.

John

Any



------------------------------

Date: Thu, 28 Feb 2008 02:21:09 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: RegEx - matching previous match
Message-Id: <62mgobF24drntU1@mid.individual.net>

j ellings wrote:
>
> (html has been converted)

Yes, but why on earth did you post the data in that format?

<non-html data snipped>

> I am trying to capture the information between the &lt;i&gt;&lt;b&gt;
> tags as these are the only unique delimiters between entries.
> 
> My regex is as follows:
> 
> while ($html =~ m{<i><b>(.*?)<i><b>}gs) {
> #do something
> }
> 
> Unfortunately, the regex will match the first instance( Z &amp; A
> Newsstand), but ignore the second (Newstand) and then match on the
> third (Pudgies Deli).
> 
> I can see that the match is working according to what I wrote;  I am
> trying to fine tune it so that I can grab every match.  Is there a way
> to include the previous &lt;i&gt;&lt;b&gt; in the next match such that
> it will not skip  a potential match?

A zero-width positive look-ahead assertion may be what you are after; 
see "perldoc perlre".

     while ($html =~ m{<i><b>(.*?)(?=<i><b>)}gs) {
---------------------------------^^^------^
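
A sketch of the look-ahead in use, on a shortened, decoded copy of the
sample data. The "|\z" alternative is an addition not in the pattern
above; it is assumed to be wanted so that the last entry, which has no
following <i><b>, is captured too.

```perl
use strict;
use warnings;

# Shortened, decoded version of the sample data from the question.
my $html = join '',
    '<i><b>Z & A Newsstand</b></i><br>', '<i>2 N 10th St</i><br>',
    '<i><b>Newstand</b></i><br>',        '<i>32 N 10th St</i><br>',
    '<i><b>Pudgies Deli</b></i><br>',    '<i>46 N 10th St</i><br>';

my @entries;
# The look-ahead checks for the next <i><b> (or end of string)
# without consuming it, so the next match can start there.
while ($html =~ m{<i><b>(.*?)(?=<i><b>|\z)}gs) {
    push @entries, $1;
}

print scalar(@entries), " entries\n";
print "$_\n" for @entries;
```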

Another approach that doesn't slurp the whole file into a scalar variable:

     local $/ = '<i><b>';
     while ( my $html = <> ) {
         #do something
     }
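
A slightly fuller sketch of the record-separator approach. It reads from
an in-memory filehandle so that it is self-contained; the data is a
shortened, decoded sample, and in real use the handle would be the file
(or <>).

```perl
use strict;
use warnings;

# Shortened, decoded sample; a real script would open the actual file.
my $data = '<i><b>Newstand</b></i><br><i>32 N 10th St</i><br>'
         . '<i><b>Pudgies Deli</b></i><br><i>46 N 10th St</i><br>';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '<i><b>';            # each "line" now ends at <i><b>
    while (my $record = <$fh>) {
        chomp $record;              # chomp removes the current $/ value
        next unless length $record; # skip the empty chunk before the first tag
        push @records, $record;
    }
}
close $fh;

print "$_\n" for @records;
```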

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Wed, 27 Feb 2008 19:21:26 -0600
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: RegEx - matching previous match
Message-Id: <slrnfsc34m.ijc.tadmc@tadmc30.sbcglobal.net>

j ellings <jellings@gmail.com> wrote:
> Hello.
>
> I have an html file converted from PDF that includes the following
> sample lines:
>
> (html has been converted)


Why has HTML been converted?

This is a plain-text medium...


> &lt;i&gt;&lt;b&gt;Z &amp; A Newsstand&lt;/b&gt;&lt;/i&gt;&lt;br&gt;
                                           ^^        ^^
                                           ^^        ^^


> My regex is as follows:
>
> while ($html =~ m{<i><b>(.*?)<i><b>}gs) {


End tags have slash characters in them that your pattern will not match.

Your data closes the bold before the italic, but your regex looks
for the italic close before the bold close.


> I can see that the match is working according to what I wrote;


You have a strange definition of "working" then...


> trying to fine tune it so that I can grab every match.  Is there a way
> to include the previous &lt;i&gt;&lt;b&gt; in the next match such that
> it will not skip  a potential match?


You do not need a way to include the previous <i><b> in the next match.


> Any suggestions or advice would be most appreciated.


    while ($html =~ m{<i><b>(.*?)</b></i>}gs) {


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Wed, 27 Feb 2008 14:38:00 -0800 (PST)
From: BH <Benson.Hoi@googlemail.com>
Subject: subroutines vs method vs function
Message-Id: <076d22a1-8d0c-478f-8cd9-5a32eff935ae@28g2000hsw.googlegroups.com>

Hi,

Can someone clarify the 3 terms wrt Perl providing with some examples?

Regards,

BH


------------------------------

Date: Wed, 27 Feb 2008 23:57:45 +0100
From: Joost Diepenmaat <joost@zeekat.nl>
Subject: Re: subroutines vs method vs function
Message-Id: <873arevwgm.fsf@zeekat.nl>

BH <Benson.Hoi@googlemail.com> writes:

> Hi,
>
> Can someone clarify the 3 terms wrt Perl providing with some examples?

they're all subroutines. functions and subroutines are the same thing in
perl.

methods are subroutines that are called using method resolution and get
the object passed as the first argument. IOW there methods as such, just
functions that get called using method call semantics.

package Bla;
sub something { print "my arguments are '@_'\n" }

Bla->something();   # call as (class) method

something();        # call as function

Bla::something();   # call as function with explicit package name

my $o = bless {},"Bla";  # make a "Bla" object
$o->something();    # call as object method
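
A variation on the above that returns the text instead of printing it,
as a sketch to show what ends up in @_ for each call style. The
Bla->can(...) check at the end is an addition for illustration: it
returns a reference to the very same sub, whichever way it gets called.

```perl
use strict;
use warnings;

package Bla;
sub something { return "my arguments are '@_'" }

package main;

my $o = bless {}, 'Bla';                # make a "Bla" object

my $as_class_method = Bla->something();   # class name arrives in $_[0]
my $as_function     = Bla::something();   # nothing is passed
my $as_object       = $o->something();    # the object arrives in $_[0]

print "$as_class_method\n$as_function\n$as_object\n";

# Method lookup finds the same code reference as the plain sub.
print "same sub\n" if Bla->can('something') == \&Bla::something;
```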

-- 
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/


------------------------------

Date: Wed, 27 Feb 2008 22:58:43 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: subroutines vs method vs function
Message-Id: <x7hcfu57mj.fsf@mail.sysarch.com>

>>>>> "B" == BH  <Benson.Hoi@googlemail.com> writes:

  B> Can someone clarify the 3 terms wrt Perl providing with some examples?

put your question in the BODY of your message. then it reads properly
and it is easier to write replies.

subroutines vs method vs function

subs are perl level routines you code and call.

sub bar {
	print "bar was called\n" ;
}


functions are things built into perl. read perldoc perlfunc for a list
and description of them all.

methods are just perl subs that are called via an object or a class
using a object oriented call. read perldoc perlobj for more on that.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Architecture, Development, Training, Support, Code Review  ------
-----------  Search or Offer Perl Jobs  ----- http://jobs.perl.org  ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Wed, 27 Feb 2008 23:59:57 +0100
From: Joost Diepenmaat <joost@zeekat.nl>
Subject: Re: subroutines vs method vs function
Message-Id: <87y796uhsi.fsf@zeekat.nl>

Joost Diepenmaat <joost@zeekat.nl> writes:

> IOW there methods as such, just functions that get called using method
> call semantics.

That should read "there are no methods as such"

-- 
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/


------------------------------

Date: Wed, 27 Feb 2008 22:12:20 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: uc() and utf8
Message-Id: <4r3h95-on4.ln1@osiris.mauzo.dyndns.org>


Quoth Alex <check.sig@for.email.invalid>:
> Ben Morrow wrote:
> > Quoth Alex <check.sig@for.email.invalid>:
> >> 
> >> I suggest you also look into, and play around with, the functions 
> >> is_utf8(), _utf8_on(), _utf8_off() and from_to(). This will give you a 
> >> good overall picture of how Perl does UTF-8, but you probably won't need 
> >> these here.
> > 
> > No, don't touch any of the functions in the utf8:: namespace. They are
> > part of the internals of perl's Unicode implementation, and shouldn't be
> > used by ordinary Perl code; especially _utf8_{on,off}. Use the functions
> > in Encode:: instead.
> 
> Those were the ones I was talking about. I thought it was clear from the 
> fact that Encode:: was the only namespace I mentioned.

Sorry: I had forgotten they were in Encode:: as well. Don't use any of
the *utf8* functions from there, either: stick to encode and decode.

Ben



------------------------------

Date: Thu, 28 Feb 2008 02:25:48 +0100
From: "Petr Vileta" <stoupa@practisoft.cz>
Subject: Re: uc() and utf8
Message-Id: <fq52sk$3gq$2@ns.felk.cvut.cz>

Ben Morrow wrote:
>> My data source is in cp1250 so I *really* need to use this codepage.
>
> Your data source (input) and your output do not need to be in the same
> encoding.
>
Yes, and must NOT be ;-) Maybe my English is too poor. I wanted to say I must 
use cp1250 because my source data is coded in this codepage, but I want the 
output to be in utf8.

>    1. read in binary data (make sure you use binmode on your
>        filehandles, btw),
I'm not sure whether I must do it every time. My data comes from $ARGV, from 
param() (CGI module), from a disk file or from the LWP module.

>    2. convert that binary data into characters, using the CP1250
>        encoding (Encode::decode),
Sure

>    3. uc or lc those characters,
This is what I was asking for ;-)

>    4. convert them back into binary data, using any encoding of your
>        choice (Encode::encode),
>    5. write that data out to a filehandle (again, make sure you use
>        binmode).
Must I really do it? My output goes to a browser; in other words, my script 
is a CGI script on a web server (Linux/Apache).
I tested the output without Encode::encode and it works as I need.

>> my $utftxt = Encode::decode( 'CP1250', $wintxt );
>
> This far is good.
>
>> print "<br>utf8:$utftxt, uppercase utf8: ", uc($utftxt), "\n";
>
> No, you've missed a step: read the code I posted again. You can't just
> print character data to a filehandle: you'll get 'Wide character in
> print' warnings, and you'll get output in perl's internal data format,
> which is an incomprehensible mixture of ISO8859-1 and UTF8. You have
> to convert the characters back into bytes, using any encoding of your
> choice.
Hmm, curious, but this works for me. IMHO all characters in the range 
\x20 - \x7f are at the same positions in utf8, right? And all national 
characters \x80 - \xff were converted to utf8 by 
Encode::decode( 'CP1250', $wintxt ).
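
For reference, the decode / uc / encode steps quoted above might be
sketched like this. cp1250_uc_to_utf8 is a hypothetical helper name and
the input bytes are made up; \x9a is a lowercase s with caron in CP1250.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes CP1250 bytes, returns upper-cased UTF-8 bytes.
sub cp1250_uc_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('cp1250', $bytes);   # bytes -> characters
    return encode('UTF-8', uc $chars);      # characters -> bytes
}

binmode STDOUT;                             # the data is already encoded bytes
my $out = cp1250_uc_to_utf8("\x9akoda");    # made-up input in CP1250
print $out, "\n";
```

With the encode step in place, what reaches the browser no longer
depends on how perl happens to store the string internally.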
-- 
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your
mail from another non-spammer site please.)

Please reply to <petr AT practisoft DOT cz>



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1314
***************************************
