[22112] in Perl-Users-Digest


Perl-Users Digest, Issue: 4334 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jan 2 18:06:06 2003

Date: Thu, 2 Jan 2003 15:05:10 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 2 Jan 2003     Volume: 10 Number: 4334

Today's topics:
    Re: \U or how to uppercase <jurgenex@hotmail.com>
    Re: \U or how to uppercase <Jan.Schubert@GMX.li>
    Re: AWK vs PERL - splitting fields (Christopher Hamel)
    Re: Charting/Graphs <yanoff@yahoo.com>
        conditional search and replace <member@dbforums.com>
    Re: conditional search and replace (Tad McClellan)
    Re: Dealing with split() and quotes <mpapec@yahoo.com>
    Re: Dealing with split() and quotes ctcgag@hotmail.com
    Re: Direct Shared Memory Mapping? ctcgag@hotmail.com
    Re: Direct Shared Memory Mapping? ctcgag@hotmail.com
    Re: Direct Shared Memory Mapping? <goldbb2@earthlink.net>
        IDS: Image Display System <majiksznak@yahoo.com>
    Re: LWP & Proxy/Firewalls <pkrupa@redwood.rsc.raytheon.com>
        MSSQL::DBlib 1.009 available <sommar@algonet.se>
        Net::DNS problem on Win 98 <nobody@dev.null>
    Re: Passing an array to a subroutine <nobull@mail.com>
    Re: Printed string truncated. ctcgag@hotmail.com
    Re: RecDescent and variables (Tad McClellan)
    Re: RecDescent and variables (Tad McClellan)
        Regular Expression Help (James Colby)
    Re: Regular Expression Help <jurgenex@hotmail.com>
    Re: Regular Expression Help <perl-dvd@darklaser.com>
        Sorting hash tree from Xml::simple. (stew dean)
    Re: Sorting hash tree from Xml::simple. (Tad McClellan)
    Re: The diamond operator <goldbb2@earthlink.net>
    Re: vectors & large amounts of data - time & space prob <goldbb2@earthlink.net>
    Re: vectors & large amounts of data - time & space prob (Robert McArthur)
    Re: while and eof <news@roth.lu>
    Re: while and eof (Anno Siegel)
    Re: while and eof <news@roth.lu>
    Re: xmlgrep (bill schaller)
    Re: xmlgrep <noemail@nowhere.net>
    Re: Yet Another Question  About: "my" and scope of vars <nobull@mail.com>
    Re: Yet Another Question  About: "my" and scope of vars <goldbb2@earthlink.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 02 Jan 2003 17:19:49 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: \U or how to uppercase
Message-Id: <Vw_Q9.21702$Ab.4121@nwrddc03.gnilink.net>

Jan Schubert wrote:
> I've a very simple problem, but can't find an elegant one-liner to
> solve it (Shame on me):
> I just want to switch the 1st character of a string to uppercase but
> every time I get more than one or even two lines!?

Maybe using the function "ucfirst" is easier than fumbling around with REs?

> BTW: Could someone explain to me how to use \U? I can't find any
> documentation on it, so this might be a cool solution for me...

perldoc perlre
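
A small sketch of the options mentioned in this thread, for comparison. The sub names and the sample string are made up for illustration; ucfirst and the \u, \U and \E escapes are standard Perl.

```perl
use strict;
use warnings;

# Uppercase only the first character with the built-in function.
sub first_up_builtin { my ($s) = @_; return ucfirst $s; }

# \u in a double-quoted string uppercases just the next character.
sub first_up_escape { my ($s) = @_; return "\u$s"; }

# \U uppercases everything up to \E (or the end of the string).
sub all_up_escape { my ($s) = @_; return "\U$s\E"; }

# The same escapes work in the replacement part of s///.
sub first_up_regex { my ($s) = @_; $s =~ s/^(.)/\u$1/; return $s; }

print first_up_builtin('hello world'), "\n";
print first_up_escape('hello world'),  "\n";
print all_up_escape('hello world'),    "\n";
print first_up_regex('hello world'),   "\n";
```

Whether non-ASCII letters are handled depends on the locale and Unicode settings in effect, which is the caveat raised in the follow-up.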

jue




------------------------------

Date: Thu, 02 Jan 2003 18:30:38 +0100
From: Jan Schubert <Jan.Schubert@GMX.li>
Subject: Re: \U or how to uppercase
Message-Id: <av1t4b$ad2hr$1@ID-2265.news.dfncis.de>

Jürgen Exner wrote:
> Jan Schubert wrote:
> 
>>I've a very simple problem, but can't find an elegant one-liner to
>>solve it (Shame on me):
>>I just want to switch the 1st character of a string to uppercase but
>>every time I get more than one or even two lines!?
> 
> 
> Maybe using the function "ucfirst" is easier than fumbling around with REs?
Thx, after submitting this post I just saw the corresponding FAQ :-).

Yes, this might be elegant (like \U), if you do not have to fiddle with
these locale settings. So something using tr/// would be great.

Happy new year,
Jan



------------------------------

Date: 2 Jan 2003 08:32:22 -0800
From: hamelcd@hotmail.com (Christopher Hamel)
Subject: Re: AWK vs PERL - splitting fields
Message-Id: <4f60d5b3.0301020832.8f791ac@posting.google.com>

Martien Verbruggen <mgjv@tradingpost.com.au> wrote in message news:<slrnb10cqr.4tt.mgjv@martien.heliotrope.home>...
> On Mon, 30 Dec 2002 11:11:36 +0000,
> 	Miguel Angelo Lapa Duarte <Miguel.Duarte@tmn.pt> wrote:
> > Once I argued with an Un*x old-timer at my company that perl was better
> > than awk. He told me that perl, although more flexible, could be
> > orders of magnitude slower than AWK while splitting fields.
> 
> He's right. And awk has other advantages over perl. 
> 
> $ man perlvar
> [snip]
>                Remember: the value of "$/" is a string, not a
>                regex.  awk has to be better for something. :-)
> [snip]
> 

On that note, 'cut' is likely faster than AWK if the only goal is
splitting fields, but neither AWK nor cut nor <insert favorite OS tool
here> is really a programming language.  AWK is a nice tool, and I
like it a lot, but it's no more a programming language than 'cat.'

If performance is REALLY that big of an issue, I personally have no
problem with imbedding the OS tools into the Perl program:

  open IN1, "cut -d\\| -f23 $file1 |" or die;
  open IN2, "grep -v ^M- $file2 |" or die;

I realize this is frowned upon, as it makes the program non-portable,
and the performance increase is typically marginal in the grand scheme
of the overall program, but it can help if you're fighting for seconds
here and there.
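
For comparison, a portable pure-Perl sketch of what the cut pipeline extracts. The sub name and the sample line are hypothetical, and no claim is made here about its speed relative to cut or AWK.

```perl
use strict;
use warnings;

# Return field number $n (1-based, like cut -f) from a '|'-delimited line.
sub nth_field {
    my ($line, $n) = @_;
    chomp $line;
    my @fields = split /\|/, $line, -1;   # -1 keeps trailing empty fields
    return $fields[$n - 1];
}

print nth_field("a|b|c|d\n", 3), "\n";
```

Unlike cut, this returns undef when the line has fewer than $n fields, so real input may need an extra check.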

CH


------------------------------

Date: Thu, 02 Jan 2003 16:41:51 -0600
From: Scott Yanoff <yanoff@yahoo.com>
Subject: Re: Charting/Graphs
Message-Id: <3e14bdc2$0$588$39cecf19@nnrp1.twtelecom.net>

Joe Smith wrote:
> In article <3E0FFD55.6030003@thecouch.homeip.net>,
> Mina Naguib  <spam@thecouch.homeip.net> wrote:
> 
>>Koos Pol wrote:
>>| Robert Sipe wrote (Monday 30 December 2002 05:48):
>>|
>>|
>>|>I need to chart performance data I extract from an SNMP daemon.  Thus, I
>>|>will periodically poll the system for a few performance values, store
>>|>the data to a file, then chart it out in on a Web page.  I have the SNMP
>>|>polling and data collection all taken care of.  It is simple graph of
>>|>perf data vs. time.  What is the recommended module/method to
>>|>chart/graph this data using a cgi script?  Thanx in advance!

Another graphing solution is to use "fly":
http://martin.gleeson.com/fly/

Good luck,


-- 
-Scott
yanoff@yahoo.com | http://www.yanoff.org | AOL IM: SAY KJY



------------------------------

Date: Thu, 02 Jan 2003 17:44:31 +0000
From: Damian Ibbotson <member@dbforums.com>
Subject: conditional search and replace
Message-Id: <2340326.1041529471@dbforums.com>


I posted this in comp.unix.shell but I reckon it's probably
better suited to this forum.  I really don't know much Perl, so
go easy on me...

I have a requirement to modify a string to correct assumed typos.
Basically the logic I have to apply is to convert the letters "I" or "O"
to the numbers "1" or "0" respectively if they occur in positions 12 or
13 of the string. I apply the same logic for the 14th to the last (the
17th) character but in this instance must also map the letter "Z" to the
number "2".

I could do this with some messy logic and/or numerous pipes but I'm
somewhat foolishly trying to do it in Perl (foolish because I'm hopeless
with Perl).

I've come up with something like this...

echo "12345678901IOZZZZ\n12345678901IOZZZX" | perl -pne 'BEGIN
{
%tr1=("O"=>"0", "I"=>"1");
%tr2=("O"=>"0", "I"=>"1", "Z"=>"2");
}
s/(^.{11})(.)(.)(.)(.)(.)(.)/"$1$tr1{$2}$tr1{$3}$tr2{$4}$tr2{$5}$tr2{$6}$tr2{$7}
"/eg'

Returns...

12345678901102222
1234567890110222

The problem is, I need to account for characters that are not held in
the hashmap so that they are printed as they are without being nulled
out (as in the second line of output).

Any ideas (with explanation)? An elegant solution in awk or similar
would also be appreciated.

--
Posted via http://dbforums.com


------------------------------

Date: Thu, 2 Jan 2003 13:26:53 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: conditional search and replace
Message-Id: <slrnb194jt.n88.tadmc@magna.augustmail.com>

Damian Ibbotson <member@dbforums.com> wrote:

> I have a requirement to modify a string to correct assumed typos.
> Basically the logic I have to apply is to convert the letters "I" or "O"
> to the numbers "1" or "0" respectively if they occur in positions 12 or
> 13 of the string. I apply the same logic for the 14th to the last (the
> 17th) character but in this instance must also map the letter "Z" to the
> number "2".


> Any ideas (with explanation)? 


Use substr() as an lvalue to restrict where a tr/// is to be applied:

   perl -pe 'substr($_, 11) =~ tr/OI/01/; substr($_, 13) =~ tr/Z/2/'
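
The same idea as a small script, as a sketch. The sub name is made up, and the sample strings are the ones from the original question.

```perl
use strict;
use warnings;

# Apply the corrections to a copy of the string and return it.
# Assumes the string is at least 13 characters long; substr() used as an
# lvalue dies if the offset lies beyond the end of the string.
sub fix_typos {
    my ($s) = @_;
    substr($s, 11) =~ tr/OI/01/;   # position 12 to the end: O -> 0, I -> 1
    substr($s, 13) =~ tr/Z/2/;     # position 14 to the end: also Z -> 2
    return $s;
}

print fix_typos($_), "\n" for '12345678901IOZZZZ', '12345678901IOZZZX';
```

Because tr/// leaves unlisted characters alone, the trailing X in the second string survives, which is what the hash lookup in the question could not do.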


> An elegant solution in awk or similar
> would also be appreciated.


From the Perl newsgroup?

Pffft!   :-)


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Thu, 02 Jan 2003 17:00:01 +0100
From: Matija Papec <mpapec@yahoo.com>
Subject: Re: Dealing with split() and quotes
Message-Id: <8nn81vsgk99ueq6451qkb25neullv00lpk@4ax.com>

Benjamin Goldberg <goldbb2@earthlink.net> wrote:
>[snip]
>> >1, high, "text1, text2", some_more_text
>> >
>> >I need the "text1, text 2" string to count as one element. When I use
>[snip]
>> for (split(/"/, $line)) {
>>   push @fields, ++$i%2 ? split(/, /) : $_
>
>What happens if $line is something like

Wrong results; it seems that Max Power has so far managed to live on the
edge with such dangers. I think one should know what the input files
look like and then come up with a proper solution.

>    1, high, "This string contains a \" quote", some_more_text
>Or:
>    this, list, contains, an, unbalanced, ", quote mark.


-- 
Matija


------------------------------

Date: 02 Jan 2003 19:08:45 GMT
From: ctcgag@hotmail.com
Subject: Re: Dealing with split() and quotes
Message-Id: <20030102140845.532$gp@newsreader.com>

Mike@Kordik.net (Max Power) wrote:
> I have a comma delimited file that I need to parse. I need to do
> something with each element in between commas however there are a few
> strings that have commas in them. For example:
>
> 1, high, "text1, text2", some_more_text
>
> I need the "text1, text 2" string to count as one element. When I use
> split and split on a comma I get two elements. Not all strings have
> commas in them though. How do I get PERL to treat text in between
> quotes as one element even though it might have the delimiter in it?

As long as literal quotes are escaped by doubling them (or literal quotes
are not allowed) and embedded newlines are not allowed, then it's a simple
matter of splitting each line on all commas that are followed by an even
number of quotes (zero is an even number).

I posted this a while ago, here it is again (with the same caveats):

This seems to work, but I'm sure someone will find some pathological cases
where it won't.  (Other than the obvious of unpaired quotes).  I don't
imagine that it is horribly efficient.

my @x = split /,(?=[^\"]*(?:\"[^\"]*\"[^\"]*)*$)/ ;

Or, as innumerable people have said, use a module.
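
A sketch of the split above applied to the sample line from the question. The sub name is made up, and the same caveats apply: no embedded newlines and no unpaired quotes.

```perl
use strict;
use warnings;

# Split on commas that are followed by an even number of double quotes,
# i.e. commas that are not inside a quoted field.
sub split_csv_line {
    my ($line) = @_;
    chomp $line;
    return split /,(?=[^"]*(?:"[^"]*"[^"]*)*$)/, $line;
}

my @fields = split_csv_line(qq{1, high, "text1, text2", some_more_text\n});
print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

The quotes and the leading spaces stay in the returned fields, so a cleanup pass is still needed, which is one more argument for a CSV module.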

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service              New Rate! $9.95/Month 50GB


------------------------------

Date: 02 Jan 2003 20:02:34 GMT
From: ctcgag@hotmail.com
Subject: Re: Direct Shared Memory Mapping?
Message-Id: <20030102150234.032$na@newsreader.com>

Gordan <gordan@_NOSPAM_bobich.net> wrote:
>
> Say I have a simple parallel web crawler and I need to store a list of
> URLs. I only ever need to do 3 things on the array:

What degree of parallelism are you aiming for here?

I'd be tempted to dispense with all the perl stuff for the shared
memory and go directly to a database.  The below is written with
MySQL in mind.  to_visit is a table with a single column, url, which
is the primary key.


> 1) Find if ($#List > 1)

last I_HAVE_STUFF_TO_DO unless ($dbh->selectrow_array("select count(*) from to_visit"));
# But the next step needs to check this anyway, so you can skip it here

> 2) $URL = shift(@List)

my $x=0;
do {
  my $URL= $dbh->selectrow_array("select url from to_visit limit 1");
  last I_HAVE_STUFF_TO_DO unless defined $URL;
  $x=$dbh->do("delete from to_visit where url=?",undef, $URL)
} until $x>0;  # returns 0E0, so can't just say 'until $x;'
# The loop detects race conditions, it is the deletion, not selection,
#   that determines the winner.
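
A sketch of why the loop compares numerically: DBI's do() returns the string "0E0" when zero rows were affected, which is true as a boolean but zero as a number. The sub name is hypothetical and no database is needed to see the effect.

```perl
use strict;
use warnings;

# Classify a return value of the kind $dbh->do(...) hands back.
sub describe_rv {
    my ($rv) = @_;
    return 'error'   unless defined $rv;   # do() returns undef on failure
    return 'no rows' if $rv == 0;          # "0E0" is numerically zero
    return "$rv row(s)";
}

my $zero_rows = '0E0';
print "as a boolean: ", ($zero_rows ? 'true' : 'false'), "\n";
print "as a number:  ", describe_rv($zero_rows), "\n";
```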


> 3) push(@List, $URL)

$dbh->do('insert ignore into to_visit values (?)',undef,$URL);


To combine this with the hash functionality, add a second column,
"visited", which is zero by default to indicate not visited yet,
and one to indicate visited.  Then the select turns into:
  select url from to_visit where visited=0 limit 1;
and the delete turns into:
  update to_visit set visited=visited+1 where url=? and visited=0;

(You'd probably want a non-unique index on visited to speed up the
select).


> Similar approach could be used in coordinating saving of those pages, by
> having multiple crawler bots retrieve pages, and save the content into a
> list of content strings. A separate database thread could then go through
> that list and save the elements one by one.

Why do that?  Handling concurrent access is what databases specialize in,
let the database do it.

Xho



------------------------------

Date: 02 Jan 2003 20:24:33 GMT
From: ctcgag@hotmail.com
Subject: Re: Direct Shared Memory Mapping?
Message-Id: <20030102152433.667$Gg@newsreader.com>

Gordan <gordan@_NOSPAM_bobich.net> wrote:
>
> Yes, this is how I did it initially - by having a separate DB connection
> for each thread. However, DBI would very frequently (but inconsistently)
> crash the program. The problem seemed vastly reduced if only one thread
> ever had an open DB connection (but it still occasionally crashes out).

That sounds very unfortunate.  Perhaps I'll continue to put off learning
about threads, if they are that unstable.


> I'll still need a small array in shared memory for status checking across
> all processes. If all threads are waiting for a URL, then the queue is
> empty and nothing is going to re-fill it, therefore, the task is
> complete. But this is not a problem, as it only needs to be an array of N
> integers, where N is the number of threads. I suppose I can live with
> copying that in and out of shared memory every time a process finds it's
> time to sleep for a few seconds while the queue size increases to
> something above 0. :-)

I'd be tempted to do this through the database also, having a table
with a single row, storing non-waiting-processes.
On process start-up, you increment the count.  When there is nothing
to do, you decrement the count and sleep.  Then you check the count,
exiting if it is zero and incrementing it and continuing otherwise.

Actually, I'd probably go more complicated, having one row per process,
with the columns being PID and working/waiting flag, that way I could
manually recover from processes that died abnormally.

Xho



------------------------------

Date: Thu, 02 Jan 2003 16:10:45 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Direct Shared Memory Mapping?
Message-Id: <3E14AAD5.AE205C60@earthlink.net>

ctcgag@hotmail.com wrote:
> Gordan <gordan@_NOSPAM_bobich.net> wrote:
> > Yes, this is how I did it initially - by having a separate DB
> > connection for each thread. However, DBI would very frequently (but
> > inconsistently) crash the program. The problem seemed vastly reduced
> > if only one thread ever had an open DB connection (but it still
> > occasionally crashes out).
> 
> That sounds very unfortunate.  Perhaps I'll continue to put off
> learning about threads, if they are that unstable.

It isn't precisely that threads themselves are unstable, but rather that
some XS modules are unstable when used with threads (most likely having
to do with them using global static data, when they should either be
using per-thread data or else use a mutex to govern access to that
global data).

Of course, I suppose that "some XS modules being unstable" is almost the
same thing as "perl being unstable", whichever projects need those
particular XS modules.

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: Thu, 02 Jan 2003 13:43:29 -0700
From: Majik Sznak <majiksznak@yahoo.com>
Subject: IDS: Image Display System
Message-Id: <iv891v0r40fqk5ukcg2ej31h6o9suim43a@4ax.com>

I'm trying to get IDS working in Windows using Apache.  Has anybody
here done it?  Would you be able to tell me what packages I need to
get and where?  All I know is that I need the Image::Magick and
Image::Info perl libraries.

Thanks.


------------------------------

Date: Thu, 02 Jan 2003 19:42:03 +0000
From: "Peter A. Krupa" <pkrupa@redwood.rsc.raytheon.com>
Subject: Re: LWP & Proxy/Firewalls
Message-Id: <3E14960B.8257B68F@redwood.rsc.raytheon.com>

Show us what you tried.  I use the following from behind our firewall.
Change my proxy, username and password, and let us know what status code
you get.


use LWP::UserAgent;

$ua = new LWP::UserAgent;

$ua->proxy ( ['http', 'ftp'] => 'http://proxy-dn:80' );

$req = new HTTP::Request 'GET',"http://www.perl.com/images/ng2_3.gif";

$req->proxy_authorization_basic ( "pkrupa", "password" );

$res = $ua->request ( $req, "ng2_3.gif" );

print $res->code;



------------------------------

Date: Wed, 1 Jan 2003 20:50:27 +0000 (UTC)
From: Erland Sommarskog <sommar@algonet.se>
Subject: MSSQL::DBlib 1.009 available
Message-Id: <3e14b743$1_2@news.teranews.com>

I have now released version 1.009 MSSQL::DBlib. Its accompanying module
MSSQL::Sqllib is unchanged.

There are two main news items:
* There is now a binary for ActivePerl 8xx.
* The problem that you could not invoke MSSQL::* a second time from things
  like ISAPI appears to be solved.

I have uploaded the module to CPAN. It is also available at
http://www.algonet.se/~sommar/mssql/. Binary modules for ActivePerl
are available.

MSSQL::DBlib is a module for accessing MS SQL Server on Windows 
platforms through DB-Library. As Microsoft is no longer developing
DB-Library, you should not consider MSSQL::DBlib or MSSQL::Sqllib
for new development, unless you can live with the restrictions they 
imply.

-- 
Erland Sommarskog, Stockholm, sommar@algonet.se




------------------------------

Date: Thu, 02 Jan 2003 17:45:30 GMT
From: Andras Malatinszky <nobody@dev.null>
Subject: Net::DNS problem on Win 98
Message-Id: <3E1479F5.80508@dev.null>

I am trying to run Net::DNS (version 0.31) on my Windows 98 machine with 
ActivePerl 5.6.1, but I keep running into a "can't read registry: No 
such file or directory" error message. Apparently, the message emanates 
from Net::DNS::Resolver, a module used internally by Net::DNS. As far as 
I can tell, Net::DNS::Resolver detects whether it's running on a Windows 
box, and if so, it tries to read the

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

entry in the registry (this is in lines 199-201 of Net/DNS/Resolver.pm). 
Problem is, my machine, along with three others I've checked, does not 
have that entry in its registry, hence the error the module is 
complaining about.

I'm wondering if any of you have encountered the same problem and if you 
possibly know a way to resolve it.

Thanks in advance.



------------------------------

Date: 02 Jan 2003 14:15:22 +0000
From: Brian McCauley <nobull@mail.com>
Subject: Re: Passing an array to a subroutine
Message-Id: <u9vg17bptx.fsf@wcl-l.bham.ac.uk>

happyman_132000@yahoo.com (HM) writes:

 [ exactly the same thing he did 2 days ago ]

The OP has got two perfectly good answers already so I suggest people
don't waste their time trying to help.

-- 
     \\   ( )
  .  _\\__[oo
 .__/  \\ /\@
 .  l___\\
  # ll  l\\
 ###LL  LL\\


------------------------------

Date: 02 Jan 2003 22:24:32 GMT
From: ctcgag@hotmail.com
Subject: Re: Printed string truncated.
Message-Id: <20030102172432.105$g9@newsreader.com>

chris_snow@bigfoot.com (Chris Snow) wrote:
> All,
>
> If I assign the output of a command to a scalar variable then print
> that scalar variable to the screen the output is truncated.
>
> eg
>
> $string = `somecommand -last 100`;
> print $string;
>
> (data protection forbids that I post the output or the command!

Geez, then make up a dummy command with dummy output that you *can* post.

> Sorry.
> But suffice to say it gives me the last 100 records of a log file)

Well, then what is the problem? From the "-last 100" options given to
somecommand, I assume you only wanted the last 100 records.  If that's
what you get, then what is wrong?

Xho



------------------------------

Date: Thu, 2 Jan 2003 08:58:00 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: RecDescent and variables
Message-Id: <slrnb18kro.m3v.tadmc@magna.augustmail.com>

Peter H.J. v.d. Kamp <kamp@inl.nl_nospam> wrote:
> When trying to run the following script I got
> the following error:
> Global symbol "%tables" requires explicit package name.


That message is from "use strict".


> I can't figure out what I'm doing wrong; 


Attempting to access a lexical variable that is out of scope.


> from Damian's documentation
> I understand that it must be possible to use variables in actions.


Yes, but you want a "package variable" rather than a "lexical variable".

The relevant part of the module's docs:

  Actions
    An action is a block of Perl code which is to be executed (as the block
    of a "do" statement) when the parser reaches that point in a production.
    The action executes within a special namespace belonging to the active
    parser, so care must be taken in correctly qualifying variable names
    (see also "Start-up Actions" below).


For more on Perl's two separate systems of variables 
(package and lexical) see:

   perldoc -q lexical

and

   "Coping with Scoping":

      http://perl.plover.com/FAQs/Namespaces.html



>    mediumQuery:
>       '/m/' querystring
>       { print "Gevonden: $item[1] $item[2]\n";
>          $tables{'bron'}=1;


   $main::tables{'bron'}=1;  # the %tables that is in the "main" namespace


>       }
>             


> my %tables = ('bron', 0, 'topic', 0, 'auteur', 0);


   our %tables = ('bron', 0, 'topic', 0, 'auteur', 0);  # package variable
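
A minimal sketch of the package-variable approach without Parse::RecDescent itself. The package Hypothetical::ParserNamespace merely stands in for the parser's private namespace; it is not part of the module.

```perl
use strict;
use warnings;

# Package variable: lives in the symbol table as %main::tables.
our %tables = (bron => 0, topic => 0, auteur => 0);

{
    # Stand-in for the namespace in which the parser runs its actions.
    package Hypothetical::ParserNamespace;

    sub mark_bron {
        # Fully qualified, so it reaches %main::tables from any package.
        $main::tables{bron} = 1;
        return;
    }
}

Hypothetical::ParserNamespace::mark_bron();
print "bron is now $tables{bron}\n";
```

A variable declared with my has no entry in the symbol table, so there is no qualified name by which code compiled elsewhere could reach it.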


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Thu, 2 Jan 2003 09:28:20 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: RecDescent and variables
Message-Id: <slrnb18mkk.meu.tadmc@magna.augustmail.com>

Tad McClellan <tadmc@augustmail.com> wrote:
> Peter H.J. v.d. Kamp <kamp@inl.nl_nospam> wrote:


>> from Damian's documentation
>> I understand that it must be possible to use variables in actions.
> 
> 
> Yes, but you want a "package variable" rather than a "lexical variable".


That is not strictly true.

If you want to access %tables outside of the parser, then it is true.

If you only need %tables internal to the parser...


> The relevant part of the module's docs:
[snip]
>     (see also "Start-up Actions" below).


 ... then see that section of the module docs.


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: 2 Jan 2003 11:51:07 -0800
From: rainmak23@yahoo.com (James Colby)
Subject: Regular Expression Help
Message-Id: <8a263857.0301021151.6ff78e88@posting.google.com>

Hello Everybody -

I was hoping that somebody here would be able to tell me what is wrong
with the following script:

#!/usr/local/bin/perl
$time = 120031;

($H, $M, $S) = split(/[0-9]{3}/, $time);
print("$H:$M:$S\n");

I am looking for the output to look like  12:00:31.  Does anyone know
what I am doing wrong.

Thanks,
James


------------------------------

Date: Thu, 02 Jan 2003 20:14:51 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Regular Expression Help
Message-Id: <%41R9.56099$Jb.13582@nwrddc02.gnilink.net>

James Colby wrote:
> Hello Everybody -
>
> I was hoping that somebody here would be able to tell me what is wrong
> with the following script:
>
> #!/usr/local/bin/perl
> $time = 120031;
>
> ($H, $M, $S) = split(/[0-9]{3}/, $time);
> print("$H:$M:$S\n");
>
> I am looking for the output to look like  12:00:31.  Does anyone know
> what I am doing wrong.

Couple of things.
First you are not using strict and warnings (although that wouldn't have
helped in this particular case).
Second upper case variable names are normally used for file handles only
(but this is more style issue).

And third you seem to misunderstand what split does.
The RE matches any three digits. For your example it will match the part
"120". Therefore $H will be assigned the text that comes before that text
segment. Unfortunately that is .... nothing.
Same for the $M and $S: they will contain the text between the first and
second match resp. the second and third match which happens to be ....
nothing again.
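
A sketch that shows the effect described above. The sub names are made up; the third argument -1 asks split to keep trailing empty fields, which it drops by default.

```perl
use strict;
use warnings;

# What the original script does: the digits are consumed as separators.
sub split_on_three_digits {
    my ($t) = @_;
    return split /[0-9]{3}/, $t;
}

# Same split, but keeping the empty fields so they can be counted.
sub split_keep_empties {
    my ($t) = @_;
    return split /[0-9]{3}/, $t, -1;
}

my @default = split_on_three_digits('120031');
my @kept    = split_keep_empties('120031');
print "default: ", scalar(@default), " fields\n";
print "kept:    ", scalar(@kept), " fields, all empty\n";
```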

If you want to capture the matched part of the text then use grouping in the
RE of a match operation (for details see "perldoc -f m" and "perldoc
perlre").
However in your case a simple substr should be the easiest.
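
A sketch of the substr route, with unpack as an alternative. The sub names are made up, and the time is assumed to be a six-character HHMMSS string.

```perl
use strict;
use warnings;

# Cut the string at fixed offsets with substr.
sub hms_substr {
    my ($t) = @_;
    return join ':', substr($t, 0, 2), substr($t, 2, 2), substr($t, 4, 2);
}

# The same with unpack: three fields of two characters each.
sub hms_unpack {
    my ($t) = @_;
    return join ':', unpack 'A2 A2 A2', $t;
}

print hms_substr('120031'), "\n";
print hms_unpack('120031'), "\n";
```

Keeping the time as a quoted string matters for values such as 090501, where a bare numeric literal would not preserve the leading zero.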

jue




------------------------------

Date: Thu, 02 Jan 2003 20:16:03 GMT
From: "David" <perl-dvd@darklaser.com>
Subject: Re: Regular Expression Help
Message-Id: <761R9.29$Nt3.27362@news-west.eli.net>

"James Colby" <rainmak23@yahoo.com> wrote in message
news:8a263857.0301021151.6ff78e88@posting.google.com...
> Hello Everybody -
>
> I was hoping that somebody here would be able to tell me what is wrong
> with the following script:
>
> #!/usr/local/bin/perl
> $time = 120031;
>
> ($H, $M, $S) = split(/[0-9]{3}/, $time);
> print("$H:$M:$S\n");
>
> I am looking for the output to look like  12:00:31.  Does anyone know
> what I am doing wrong.


Um, yes. To start, you might want to request 2 digits per match rather
than 3.
Second, split doesn't work quite the way you are trying to use it.
Instead, try a plain regex; this should do the trick for you:

my ($H, $M, $S) = $time =~ /(\d{2})/g;
# \d means any digit {2} means exactly two of the preceding character
# parens cause the matched values to get placed in your variables
# the g on the end tells the regex to match as many as it can

Regards,
David





------------------------------

Date: 2 Jan 2003 07:42:40 -0800
From: stewart@webslave.dircon.co.uk (stew dean)
Subject: Sorting hash tree from Xml::simple.
Message-Id: <2b68957a.0301020742.2fbdf7a8@posting.google.com>

Let me start by saying I'm newish to the perl stuff and have a fairly
defined problem which, so far, I've been sorting out.

I've now got a problem that I have found answers for but don't
understand the answers.

I am using XML::Simple to read an xml file into a hash thingie. So far
I can read values and am spitting them out into HTML files. Now what I
need to do is sort my hash by alphabetical order.

Here's some code:

#!/usr/bin/perl
use XML::Simple;

# grabs file.
my $venueml = XMLin('../myxmlfile.xml');

# header
print "Content-Type: text/html\n\n";
print "<br>\n";

# sort subroutine that I need to apply to venueName value.

sub ascend_alpha {lc($a) cmp lc($b)} ;
     
# print list of venuenames

$x=0;
foreach my $venue (@{$venueml->{venueList}->{venue}})
   {
   print "<font size=+2>
$venueml->{venueList}->{venue}[$x]->{venueName} <br></font>";
   $x++;
   }


Now as you can see the tree is quite complex and all the answers I
have appear to sort by a key or a value - they don't tell me how to
sort by something like venueName.

I'm kinda stumped here. I've searched for hash sort perl etc and got
back lots of answers but these appear to only work with simple hashes.

Could anyone help me with either code or an explanation that will get
me towards an answer (or both). I have done a lot of reading up on
this but time is also tight.

Cheers

Stew Dean


------------------------------

Date: Thu, 2 Jan 2003 12:26:26 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Sorting hash tree from Xml::simple.
Message-Id: <slrnb1912i.msd.tadmc@magna.augustmail.com>

stew dean <stewart@webslave.dircon.co.uk> wrote:

> Let me start by saying I'm newish to the perl stuff


You should check out the Posting Guidelines that are posted
here weekly. You've already hurt your chances of getting
help with future Perl problems.

Also, the Data::Dumper module is invaluable for debugging
complex data structures.


> I've now got a problem that I have found answers for but don't
> understand the answers.


I expect that the answers you've found are for a different problem.

Most "sort a hash" questions are about sorting values that are
within the same hash...


> I am using XML::simple to read an xml file into a hash thingie. 


 ... but XML::Simple puts the "values of interest" in _separate_ hashes.


> Now what I
> need to do sort my hash by alphabetical order.

> #!/usr/bin/perl


You should ask for all the help you can get:

   #!/usr/bin/perl
   use strict;
   use warnings;


> my $venueml = XMLin('../myxmlfile.xml');


It would have been easier to answer your question about sorting
data if you had included the data to be sorted.


> $x=0;
> foreach my $venue (@{$venueml->{venueList}->{venue}})
>    {
>    print "<font size=+2>
> $venueml->{venueList}->{venue}[$x]->{venueName} <br></font>";
>    $x++;
>    }


There are 2 "red flags" there.

You seldom need to maintain array indexes in Perl.

You never use $venue in the body of your loop.

You could replace all of that code with just:

   foreach my $venue (@{$venueml->{venueList}->{venue}}) {
      print "<font size=+2>$venue->{venueName} <br></font>";
   }


> I'm kinda stumped here.


I suggest using map to grab the values first, then sort them:

   my @names = map $_->{venueName}, @{$venueml->{venueList}->{venue}};

   foreach my $name ( sort ascend_alpha @names ) {
      print "$name\n";
   }


Of course you don't really need the @names temporary variable,
though it may be easier to read and understand if you left it in:

   foreach my $name (sort ascend_alpha
                     map $_->{venueName}, @{$venueml->{venueList}->{venue}} ) {
      print "$name\n";
   }
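
If the other fields of each venue are needed after sorting, a variation is to sort the hash references themselves rather than the extracted names. This is a sketch only: the data below is a hypothetical stand-in for what XMLin() might return, and the city key is invented.

```perl
use strict;
use warnings;

# Return the venue hashrefs ordered case-insensitively by venueName.
sub sorted_venues {
    my ($data) = @_;
    return sort { lc($a->{venueName}) cmp lc($b->{venueName}) }
           @{ $data->{venueList}{venue} };
}

# Hypothetical stand-in for the structure read from the XML file.
my $venueml = {
    venueList => {
        venue => [
            { venueName => 'the Garage', city => 'London' },
            { venueName => 'Astoria',    city => 'London' },
            { venueName => 'Zodiac',     city => 'Oxford' },
        ],
    },
};

for my $venue (sorted_venues($venueml)) {
    print "$venue->{venueName} ($venue->{city})\n";
}
```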


-- 
    Tad McClellan                          Pedantic Prick
    tadmc@augustmail.com                   SGML consulting
    Fort Worth, Texas                      Perl programming


------------------------------

Date: Thu, 02 Jan 2003 16:20:48 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: The diamond operator
Message-Id: <3E14AD30.3465E956@earthlink.net>

trwww wrote:
> Benjamin Goldberg wrote:
[snip]
> > The diamond operator with no argument (that is, <>) is a syntactic
> > shortcut for <ARGV>.
[snip]
> This is a good reply , but the two constructs are not exactly the same
> =0)
> 
> from perldoc perlop:
[snip]

Sure, but read the part immediately after the Perl-like
pseudocode, where perldoc perlop continues with:

> ...except that it isn't so cumbersome to say, and will actually work.
> It really does shift the @ARGV array and put the current filename into
> the $ARGV variable. It also uses filehandle ARGV internally--<> is
> just a synonym for <ARGV>, which is magical. (The pseudo code above
> doesn't work because it treats <ARGV> as non-magical.)
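
A small sketch to check this for yourself: it reads the same files once
with <> and once with <ARGV> and collects the $ARGV filename alongside
each line. The temporary files and the two helper subs are made up for
the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small throwaway files to read.
my @files;
for my $content ("one\ntwo\n", "three\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "Cannot close $name: $!";
    push @files, $name;
}

# Hypothetical helper: read the named files through the bare diamond.
sub read_with_diamond {
    local @ARGV = @_;
    my @seen;
    while (my $line = <>) {
        chomp $line;
        push @seen, join "\t", $ARGV, $line;   # $ARGV is the current filename
    }
    return @seen;
}

# Hypothetical helper: the same loop, spelled with an explicit <ARGV>.
sub read_with_argv {
    local @ARGV = @_;
    my @seen;
    while (my $line = <ARGV>) {
        chomp $line;
        push @seen, join "\t", $ARGV, $line;
    }
    return @seen;
}

my @via_diamond = read_with_diamond(@files);
my @via_argv    = read_with_argv(@files);

print "$_\n" for @via_diamond;
print "same result either way\n" if "@via_diamond" eq "@via_argv";
```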

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: Thu, 02 Jan 2003 16:37:18 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: vectors & large amounts of data - time & space problems
Message-Id: <3E14B10E.50A2A01D@earthlink.net>

Robert McArthur wrote:
[snip]
> We've implemented this fine in Perl using a hash for a vector, the
> keys of the hash being the strings and the values being the real
> numbers of the dimensions. The vector is an object with name and a
> couple of other meta-information things as well as the dimension's
> hash. A set of vectors is simply another object which is a hash
> storing the name (keys) and the vector (values).
> It all works fine...
>
> except we're now trying it with large amounts of data and running
> into problem with both space and time. We're using a vocab of about
> 100,000 strings, each of which is on average 6 characters long.

So you've got something like:

my %vectors;
my $vector = $vectors{ $vectorname };
my $vector_name = $vector->{name}; # and other meta-info
my $vector_dimensions = $vector->{dimensions};
my $dimension_number = $vector_dimensions->{$dimension_name};

Where for each $vectorname, the 'dimensions' subhash contained (with
your test data) about 200 different $_string/$_number pairs, but now
(with the real data) has about 100_000 pairs.

How many different vectors (elements of %vectors) are there?

Assuming that it's a small number, then the obvious thing to do is to
tie the %{ $vector->{dimensions} } part to a DB_File or SDBM_File when
it's created.  Use filter_store_value and filter_fetch_value to pack()
and unpack() the value using the 'd' format, for optimum disk space
efficiency.
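
A minimal sketch of that idea, using SDBM_File because it ships with
Perl. The file location and the hash key are made up for illustration,
and error handling is kept to the bare minimum.

```perl
use strict;
use warnings;
use Fcntl;                      # for O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical location for the DBM files.
my $dir = tempdir(CLEANUP => 1);

my %dimensions;
my $db = tie %dimensions, 'SDBM_File', "$dir/dims", O_RDWR | O_CREAT, 0666
    or die "Cannot tie to $dir/dims: $!";

# Store each value as a packed native double; unpack it again on fetch.
$db->filter_store_value(sub { $_ = pack   'd', $_ });
$db->filter_fetch_value(sub { $_ = unpack 'd', $_ });

$dimensions{perl}  = 0.5;       # create a dimension
$dimensions{perl} += 1.25;      # update it in place

my $strength = $dimensions{perl};
print "perl => $strength\n";

undef $db;                      # drop the object before untie
untie %dimensions;
```

The rest of the code keeps using the hash as before; only the tie call
and the two filters know that the values live on disk in packed form.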

[snip]
> Can anyone give me any ideas for a better way to do this in Perl?
> 
> We're doing too many lookups, I believe, to have it on disk rather
> than in memory so that road's out.

Don't be so sure of that -- many tied-hash implementations have pretty
good caching schemes.

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: 2 Jan 2003 22:58:42 GMT
From: mcarthur@dstc.edu.au (Robert McArthur)
Subject: Re: vectors & large amounts of data - time & space problems
Message-Id: <1041548322.763959@eeyore.dstc.edu.au>

anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) writes:
>It is hard to come up with a space-saving data structure without
>knowing what kind of access those vectors have to support.

Sorry, you're right. The algorithm is called HAL - hyperspace analogue
to language (see http://locutus.ucr.edu/abstracts/ab-comput.html ).
The vectors are created by working through text. The
dimensions are words associated with the word that is the name of the
vector; the dimension's value is the 'strength' of the association.
As new associations are found when you go through the text, new
dimensions are added. When an existing association is found again,
the weight (value) is updated.

So for every association in the text we need to determine whether
the vector exists; if it does, does the dimension exist; if it does,
then add something to the value, otherwise create the dimension in
the vector and give it the initial value. If the vector doesn't exist,
then create it.  This happens a *lot* - the window size (see HAL) is
six words so for *every word* in the text, we're doing these checks
and updates 12 times. We're trying to work with about 250,000 documents,
each of which is about a page long :-) That's a lot of looking and
updating! Hence the need for in-memory storage. The hashes in Perl have
been great for the lookups - quite possible in, say, C, but Perl
has them for free. It looks like we just need SV's to take up less
room!
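
To make the update step concrete, here is a rough sketch of it. With
nested hashes, autovivification takes care of the "does the vector
exist, does the dimension exist" checks in a single statement. The sub
name, the symmetric update in both directions, and the weighting
(window size minus distance plus one) are assumptions for illustration,
not a description of our actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: add one document's associations to the vectors.
sub update_vectors {
    my ($vectors, $window, @words) = @_;
    for my $i (0 .. $#words) {
        my $last = $i + $window;
        $last = $#words if $last > $#words;
        for my $j ($i + 1 .. $last) {
            # Assumed weighting: closer words are more strongly associated.
            my $weight = $window - ($j - $i) + 1;
            # Autovivification creates the vector and the dimension on
            # first use; later hits just add to the existing value.
            $vectors->{ $words[$i] }{ $words[$j] } += $weight;
            $vectors->{ $words[$j] }{ $words[$i] } += $weight;
        }
    }
    return $vectors;
}

my %vectors;
update_vectors(\%vectors, 6, qw(the quick brown fox));

for my $name (sort keys %vectors) {
    my $dims = $vectors{$name};
    print "$name: ", join(', ', map { "$_=$dims->{$_}" } sort keys %$dims), "\n";
}
```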

>Here is one suggestion:  Use a compressed form of the hash for
>storage and only expand to real hashes the one(s) you are actually
>working with.  If there is a character (like "\0") that can never
>appear in the hash keys or values, you can use it as a separator
>as follows:

Good idea. We thought of something similar but that's more elegant.
However, I suspect that it's going to be too much of a performance hit.
I'll try it though and see if it stops us swapping, which is
probably even more of a performance hit given how many essentially
random lookups and updates we need to do.
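
For the record, my rough understanding of the compressed-storage idea
looks something like the sketch below. Anno's own code is not quoted
here, so the sub names and details are my assumptions; it relies on
"\0" never appearing in a key or value.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a dimensions hash into one string.
sub freeze_dimensions {
    my ($dims) = @_;
    return join "\0", %$dims;
}

# Hypothetical helper: expand the string back into a real hash.
sub thaw_dimensions {
    my ($packed) = @_;
    my %dims = split /\0/, $packed, -1;
    return \%dims;
}

# Vectors are kept in compressed form ...
my %store;
$store{perl} = freeze_dimensions({ hash => 3, regex => 5 });

# ... and only the one being worked on is expanded.
my $dims = thaw_dimensions($store{perl});
$dims->{regex} += 2;        # existing association found again
$dims->{camel}  = 1;        # new association
$store{perl} = freeze_dimensions($dims);

my $check = thaw_dimensions($store{perl});
print join(', ', map { "$_=$check->{$_}" } sort keys %$check), "\n";
```

Each vector then costs one scalar instead of one SV per dimension, at
the price of a split and a join on every update.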

Thanks Anno!
Robert
--
Robert McArthur		CRC for Enterprise Distributed System Technology
  BSc(Hons)		  Ph. +61 7 3365 4310        Brisbane, Australia
  MInfTech		  Fax +61 7 3365 4311	
  Grad.Cert.Ed.		  mcarthur@dstc.edu.au	


------------------------------

Date: Thu, 2 Jan 2003 15:20:10 +0100
From: "J.M.Roth" <news@roth.lu>
Subject: Re: while and eof
Message-Id: <3e144a9a$1_2@news.vo.lu>

Hi again,
I don't know where we're going but now I'm at the point where the script
only outputs the first line of each matching file.
E.g. the line before the while, $. is undefined
Then there is the "while( my($line) = <$fh> ) {" line.
Immediately after that $. contains the number of the last line, so it goes
on to the next file and never reads what's left of the file
What I find amazing is that although $. points to the last line (before the
push), @result contains the first line of the file, ...
Greets
jm




------------------------------

Date: 2 Jan 2003 14:37:25 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: while and eof
Message-Id: <av1ir5$rvt$1@mamenchi.zrz.TU-Berlin.DE>

J.M.Roth <news@roth.lu> wrote in comp.lang.perl.misc:
> Hi again,
> I don't know where we're going but now I'm at the point where the script
> only outputs the first line of each matching file.
> E.g. the line before the while, $. is undefined
> Then there is the "while( my($line) = <$fh> ) {" line.
                              ^     ^
> Immediately after that $. contains the number of the last line, so it goes
> on to the next file and never reads what's left of the file
> What I find amazing is that although $. points to the last line (before the
> push), @result contains the first line of the file, ...

The parentheses spoil it, giving list context to the expression.  Since
<$fh> is read in list context, all remaining lines are read, but only
the first one is assigned to $line.  That's why $. points to the last
line.
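
A small sketch that shows the difference, reading from an in-memory
file (the sample text is made up):

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# List context: my($line) makes <$fh> return all remaining lines at
# once; only the first is kept, and the loop body runs a single time.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @list_seen;
while ( my ($line) = <$fh> ) {
    push @list_seen, $line;
}
close $fh;

# Scalar context: my $line reads one line per iteration.
open $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @scalar_seen;
while ( my $line = <$fh> ) {
    push @scalar_seen, $line;
}
close $fh;

printf "list context: %d iteration(s), scalar context: %d iteration(s)\n",
    scalar @list_seen, scalar @scalar_seen;
```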

Anno


------------------------------

Date: Thu, 2 Jan 2003 15:42:33 +0100
From: "J.M.Roth" <news@roth.lu>
Subject: Re: while and eof
Message-Id: <3e144fd9$1_2@news.vo.lu>

Never mind
It should've been my $line (scalar, not list context...)
jm

"J.M.Roth" <news@roth.lu> wrote in message news:3e144a9a$1_2@news.vo.lu...
> Hi again,
> I don't know where we're going but now I'm at the point where the script
> only outputs the first line of each matching file.
> E.g. the line before the while, $. is undefined
> Then there is the "while( my($line) = <$fh> ) {" line.
> Immediately after that $. contains the number of the last line, so it goes
> on to the next file and never reads what's left of the file
> What I find amazing is that although $. points to the last line (before the
> push), @result contains the first line of the file, ...
> Greets
> jm
>
>




------------------------------

Date: 2 Jan 2003 15:36:23 GMT
From: schalleb@mayo.edu (bill schaller)
Subject: Re: xmlgrep
Message-Id: <av1m9n$v1p$1@tribune.mayo.edu>

Several people have commented about not using the xml libraries in Perl.  I didn't
use them because I wanted to parse non XML files that may contain XML fragments.
I have not used the XML libraries, so I don't know how well they deal with this
situation.  All my XML work is Java based, and over there, if the file isn't
perfect XML, forget it.  Maybe the Perl XML libraries are more forgiving?

Also I would like to make the parsing more grep-like.  I have thought of ways
to do this, but have not pursued any of them yet.

So where might one find a chunk of XML that was not in a standard XML file?
Maybe in a program that writes XML, or a log file...
-- 
Bill  Schaller   N0PUJ  schaller.william@mayo.edu




------------------------------

Date: 2 Jan 2003 17:24:15 GMT
From: nobody <noemail@nowhere.net>
Subject: Re: xmlgrep
Message-Id: <Xns92F77EEBF2DC3abccbaabc@129.250.170.99>

> Several people have commented about not using the xml libraries in
> Perl.  I didn't use them because I wanted to parse non XML files
> that may contain XML fragments. 

"Parse" the raw data with custom code, extracting the XML parts
however you see fit. Then pass the raw XML to the XML parser
as a string. (Each fragment you pass must be well-formed.)
XML::Parser will parse a string. 
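
A rough sketch of that two-step approach. The log text and the <event>
element name are invented for the example, and the regex assumes the
fragments are never nested inside one another. XML::Parser is loaded
only if it is installed, so the extraction step can be tried on its own.

```perl
use strict;
use warnings;

# Hypothetical log text with XML fragments embedded in it.
my $log = <<'END';
12:00 starting up
12:01 sent <event id="1"><name>login</name></event> to server
12:02 sent <event id="2"><name>logout</name></event> to server
END

# Step 1: custom code pulls out the XML parts.
# Assumption: each fragment is a non-nested <event>...</event> element.
my @fragments = $log =~ m{(<event\b.*?</event>)}gs;

# Step 2: hand each fragment to the parser as a string.
my @well_formed;
if ( eval { require XML::Parser; 1 } ) {
    my $parser = XML::Parser->new;
    for my $fragment (@fragments) {
        # parse() dies if the fragment is not well-formed
        push @well_formed, $fragment if eval { $parser->parse($fragment); 1 };
    }
}

print scalar(@fragments), " fragment(s) extracted\n";
```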


------------------------------

Date: 02 Jan 2003 18:30:11 +0000
From: Brian McCauley <nobull@mail.com>
Subject: Re: Yet Another Question  About: "my" and scope of vars...
Message-Id: <u97kdnbe18.fsf@wcl-l.bham.ac.uk>

anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) writes:

> >   /href\s*=\s*([\"\'])([^\"\']+)\1/i and push my @urlz, $2 while (<>);
> 
> This behavior is expected and useful in larger loops, 

This is true.

> so Perl can't treat it as an error.

I disagree.  

Perl should treat 'my' as an error (or at the very least emit a
warning) where it appears in a statement with a while/until/if/unless
statement qualifier.
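
Until then, the unambiguous way to write it is to declare the array
before the loop (perlsyn notes that the behaviour of a my modified by a
statement modifier is undefined). A minimal sketch, with a made-up
helper sub and sample input:

```perl
use strict;
use warnings;

# Hypothetical helper: collect href values from lines of HTML.
sub extract_urls {
    my @urlz;    # declared once, outside the loop
    for my $line (@_) {
        push @urlz, $2 if $line =~ /href\s*=\s*(["'])([^"']+)\1/i;
    }
    return @urlz;
}

my @found = extract_urls(
    qq{<a href="http://www.perl.com/">Perl</a>\n},
    qq{no link on this line\n},
    qq{<A HREF='http://www.cpan.org/'>CPAN</A>\n},
);

print "$_\n" for @found;
```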

-- 
     \\   ( )
  .  _\\__[oo
 .__/  \\ /\@
 .  l___\\
  # ll  l\\
 ###LL  LL\\


------------------------------

Date: Thu, 02 Jan 2003 16:03:54 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Yet Another Question  About: "my" and scope of vars...
Message-Id: <3E14A93A.3BA61F9E@earthlink.net>

Brian McCauley wrote:
> 
> anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) writes:
> 
> > > /href\s*=\s*([\"\'])([^\"\']+)\1/i and push my @urlz, $2 while <>;
> >
> > This behavior is expected and useful in larger loops,
> 
> This is true.
> 
> > so Perl can't treat it as an error.
> 
> I disagree.
> 
> Perl should treat 'my' as an error (or at the very least emit a
> warning) where it appears in a statement with a while/until/if/unless
> statement qualifier.

Then use the perlbug program and make a request for it.  There is, after
all, a 'wishlist' level of bug severity, which seems appropriate.

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4334
***************************************

