[25455] in Perl-Users-Digest
Perl-Users Digest, Issue: 7700 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jan 27 09:10:29 2005
Date: Thu, 27 Jan 2005 06:10:23 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 27 Jan 2005 Volume: 10 Number: 7700
Today's topics:
Re: Regexp kicking my ass <noreply@gunnar.cc>
Re: Regexp kicking my ass <terrylr@blauedonau.com>
Re: Regexp kicking my ass <terrylr@blauedonau.com>
Re: Regexp kicking my ass <noreply@gunnar.cc>
Script dumps core....? Any suggestions... <ganesh_tiwari@hotmail.com>
sorting just the largest values <alex_the_hart@yahoo.com>
Re: sorting just the largest values <do-not-use@invalid.net>
Re: sorting just the largest values <nospam@bigpond.com>
Re: sorting just the largest values <someone@example.com>
Re: sorting just the largest values <ebohlman@omsdev.com>
Re: sorting just the largest values <alex_the_hart@yahoo.com>
Why My XML can't be displayed <user@email.com>
Re: Why My XML can't be displayed <terrylr@blauedonau.com>
Re: Why My XML can't be displayed <spamtrap@dot-app.org>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 27 Jan 2005 06:23:28 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Regexp kicking my ass
Message-Id: <35rc69F4pqmggU1@individual.net>
Tuc wrote:
> I'm trying to get a regexp to make a match, and its not working,
> and its kicking my ass. The text I'm going against is :
>
> $text='<div id="sr_SearchResultsPageNavTop"> <div
> id="sr_SaveSearchImage"><img
> src="http://images.match.com/match//search/sr_NavIconPlaceHolder.gif"
> width="15
> " height="12" alt="" border="0"></div> <div
> id="sr_ViewPhotoGalleryText"><a
> href="come.aspx?sid=A1065D66-8275-47BE-85F2-AC161E2D6D26&theme=214&trackingid=0
> &RN=2102522&lid=7&PN=1&DO=2" class="cssGlobalLinks_PageNav"
> id="lnkSaveThisSearch">viewas photo gallery</a></div> <div
> id="sr_Pagination"><span
> class="cssGlobalSysText_LightGray">page </span><a
> href="some.aspx?sid=A1065D66-8275-47BE-85F2-AC161E2D6D26&theme=214&trackingid=0&RN=2102522&lid=8&PN=1&DO=0"
> class="cssSr_PaginationCurrentPage" id="lnkPage">1</a><a
> href="come.aspx?sid=A1065D66-8275-47BE-85F2-AC161E2D6D26&theme=214&trackingid=0&RN=2102522&lid=8&PN=2&DO=0"class="cssSr_PageNav"
> id="lnkPage">2</a><a
> href="come.aspx?sid=A1065D66-8275-47BE-85F2-AC161E2D6D26&theme=214&trackingid=0&RN=2102522&lid=8&PN=3&DO=0"
> class="cssSr_PageNav" id="lnkPage">';
>
> What I'm looking for is the url between the href and
> cssSr_PaginationCurrentPage .
This may or may not work:
if ( $text =~ /<a\s+href\s*=\s*
(?:(?:(["'])(\S+)\1)|(\S+))
[^>]*class\s*=\s*(?:["'])?cssSr_PaginationCurrentPage/x ) {
print $+;
}
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Wed, 26 Jan 2005 23:49:26 -0600
From: "terry l. ridder" <terrylr@blauedonau.com>
Subject: Re: Regexp kicking my ass
Message-Id: <Pine.LNX.4.61.0501262346270.25620@johann.blauedonau.com>
On Wed, 26 Jan 2005, Tuc wrote:
> Hi,
>
> I'm trying to get a regexp to make a match, and its not working,
> and its kicking my ass. The text I'm going against is :
>
<snip>
>
> What I'm looking for is the url between the href and
> cssSr_PaginationCurrentPage . When I do it, it ends ip starting at
> the first href and going all the way to the
> cssSr_PaginationCurrentPage. I've tried \b, I've tried {}, I tried
> ()'s.... And I just can't get it to get the one url of
> some.aspx?sid=A1065D66-8275-47BE-85F2-AC161E2D6D26&theme=214&trackingid=0&RN=2102522&lid=8&PN=1&DO=0
>
> How am I to tell it to start at the cssSr_PaginationCurrentPage
> and work backwards to the first instance of href="
>
perhaps you need to 'divide and conquer'.
this works for me.
use strict;
use warnings;
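if ( $text =~ /href="(.*?)class="cssSr_PaginationCurrentPage/s )
{
    my $url = $1;
    # $1 starts right after the first href=", so strip everything
    # up to and including the last href=" that precedes the class
    $url =~ s/^.*href="//s;
    # drop the closing quote and any whitespace that follows it
    $url =~ s/"\s*$//;
    print STDOUT "url == ``". $url . "''\n";
}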
>
>
> Thanks, Tuc
>
>
--
terry l. ridder ><>
------------------------------
Date: Thu, 27 Jan 2005 03:03:20 -0600
From: "terry l. ridder" <terrylr@blauedonau.com>
Subject: Re: Regexp kicking my ass
Message-Id: <Pine.LNX.4.61.0501270302170.25620@johann.blauedonau.com>
On Thu, 27 Jan 2005, Gunnar Hjalmarsson wrote:
>
> This may or may not work:
>
> if ( $text =~ /<a\s+href\s*=\s*
> (?:(?:(["'])(\S+)\1)|(\S+))
> [^>]*class\s*=\s*(?:["'])?cssSr_PaginationCurrentPage/x ) {
> print $+;
> }
>
>
that works rather well.
beats my 'divide and conquer' approach.
--
terry l. ridder ><>
------------------------------
Date: Thu, 27 Jan 2005 12:41:13 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Regexp kicking my ass
Message-Id: <35s2frF4nocevU1@individual.net>
terry l. ridder wrote:
> On Thu, 27 Jan 2005, Gunnar Hjalmarsson wrote:
>> This may or may not work:
>>
>> if ( $text =~ /<a\s+href\s*=\s*
>> (?:(?:(["'])(\S+)\1)|(\S+))
>> [^>]*class\s*=\s*(?:["'])?cssSr_PaginationCurrentPage/x ) {
>> print $+;
>> }
>
> that works rather well.
A shorter (and clearer) variant would be:
if ( $text =~ /href\s*=\s*
(?:
(?:
(["'])(\S+)\1 # quoted URL
)
|
(\S+) # non-quoted URL
)
[^>]+cssSr_PaginationCurrentPage/x ) {
print $+;
}
Yeah, it works, provided that
1) the class attribute actually does come after the href attribute, and
2) no 'weird' attribute such as
someattr="x > z"
has been put in between.
Which I suppose illustrates Bob's point that it *is* difficult to parse
HTML with regular expressions...
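
For what it's worth, proviso 1 (attribute order) can be side-stepped by first isolating each <a ...> start tag and then picking the attributes out of it in whatever order they appear. The following is only a sketch under assumptions: the sample markup is a cut-down, hypothetical version of the text in the original post with the attributes deliberately reordered, and find_current_href() is an illustrative helper name, not something from a module. It still breaks on a '>' inside an attribute value (proviso 2), so a real HTML parser remains the safer choice.

```perl
use strict;
use warnings;

# Hypothetical sample: a cut-down version of the posted markup, with
# class placed BEFORE href in the tag we are interested in.
my $text = <<'HTML';
<a href="come.aspx?PN=0" class="cssGlobalLinks_PageNav" id="lnkSaveThisSearch">gallery</a>
<a class="cssSr_PaginationCurrentPage" id="lnkPage" href="some.aspx?PN=1&DO=0">1</a>
<a href="come.aspx?PN=2" class="cssSr_PageNav" id="lnkPage">2</a>
HTML

# Illustrative helper: returns the href of the first <a> tag whose class
# is cssSr_PaginationCurrentPage, or nothing if there is no such tag.
sub find_current_href {
    my ($html) = @_;
    # Step 1: isolate each <a ...> start tag.
    while ( $html =~ /<a\s+([^>]*)>/gi ) {
        my $attrs = $1;
        # Step 2: collect quoted name="value" pairs, in any order.
        my %attr;
        while ( $attrs =~ /([\w-]+)\s*=\s*(["'])(.*?)\2/gs ) {
            $attr{ lc $1 } = $3;
        }
        return $attr{href}
            if defined $attr{class}
            && $attr{class} eq 'cssSr_PaginationCurrentPage';
    }
    return;
}

my $url = find_current_href($text);
print defined $url ? "$url\n" : "no match\n";
```

Splitting the job into "find the tag" and "read its attributes" keeps each pattern short, at the cost of ignoring unquoted attribute values.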
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: 27 Jan 2005 02:11:03 -0800
From: "Gancy" <ganesh_tiwari@hotmail.com>
Subject: Script dumps core....? Any suggestions...
Message-Id: <1106820663.908332.11220@f14g2000cwb.googlegroups.com>
Here is a snippet of the Perl script. I have perl v5.8.5 built for
sun4-solaris. I have run this script on thousands of C and C++ headers
and source files, and it runs as smoothly as my new ESTEEM car. But I
have one source file, toke.c, in my test case: as soon as the script
hits this file, it dumps core. I have tried, and am still trying, to
debug it, but have no solution yet. If anybody can help me with this, I
would greatly appreciate it. I can upload the source file (toke.c) as
well as the core file frames (core), if needed.
#!/usr/bin/perl
$np = qr{
\(
(?:
(?>[^()]+ )
|
(??{ $np })
)*
\)
}x;
$funpat = qr/((\W)?(\*?\*?\w+)\s*($np))/;
my $temp;
open (FILE, "toke.c") || die "Cannot open file";
while($temp = <FILE>)
{
$tstring.=$temp;
}
close FILE;
get_fn_call($tstring);
sub get_fn_call($){
my ($cur_str) = @_;
while( $cur_str =~ m/$funpat/g )
{
$4 =~ /^\(((.*\n*.*)*)\)$/;
get_fn_call($1);
}
}
------------------------------
Date: 27 Jan 2005 00:34:52 -0800
From: "Alex Hart" <alex_the_hart@yahoo.com>
Subject: sorting just the largest values
Message-Id: <1106814892.550560.192190@c13g2000cwb.googlegroups.com>
I have a very large list of data (up to 20,000 lines), and I just want
to print out the largest 50 values. What is the quickest way to sort
this using perl?
Thanks in advance.
- Alex Hart
------------------------------
Date: 27 Jan 2005 09:43:24 +0100
From: Arndt Jonasson <do-not-use@invalid.net>
Subject: Re: sorting just the largest values
Message-Id: <yzdmzuvjm8j.fsf@invalid.net>
"Alex Hart" <alex_the_hart@yahoo.com> writes:
> I have a very large list of data (up to 20,000 lines), and I just want
> to print out the largest 50 values. What is the quickest way to sort
> this using perl?
For very large lists of data: while reading in the lines, keep a
sorted array with the up to 50 largest seen values.
But 20000 is not very large, or even large. Read the whole file and
sort it.
If you say "sort" to search.cpan.org, it will show you some modules
that are likely to be useful.
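
The "read the whole file and sort it" route might look roughly like the sketch below. It assumes the values are plain numbers; top_n() is an illustrative helper name and the random data merely stands in for the real 20,000-line list.

```perl
use strict;
use warnings;

# Illustrative helper: sort everything, then slice off the $n largest.
sub top_n {
    my ($n, @values) = @_;
    my @sorted = sort { $b <=> $a } @values;    # numeric, descending
    $n = @sorted if $n > @sorted;               # fewer than $n values given
    return @sorted[ 0 .. $n - 1 ];
}

# Hypothetical data: 20,000 pseudo-random numbers.
my @data  = map { int rand 1_000_000 } 1 .. 20_000;
my @top50 = top_n(50, @data);
print "$_\n" for @top50;
```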
------------------------------
Date: Thu, 27 Jan 2005 19:15:24 +1000
From: Gregory Toomey <nospam@bigpond.com>
Subject: Re: sorting just the largest values
Message-Id: <35rppdF4qn7beU1@individual.net>
Alex Hart wrote:
> I have a very large list of data (up to 20,000 lines), and I just want
> to print out the largest 50 values. What is the quickest way to sort
> this using perl?
>
> Thanks in advance.
>
> - Alex Hart
Selection sort - just do the outer loop 50 times.
http://en.wikipedia.org/wiki/Selection_sort
The time complexity is O(k*n) for the top k items, which is
effectively O(n) when k << n, compared to O(n log n) for sorting
the whole list.
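
A minimal sketch of that idea, assuming numeric values: run only the first k passes of a descending selection sort and return the first k slots. The name partial_selection_sort() is made up for illustration.

```perl
use strict;
use warnings;

# Illustrative helper: after pass $i, slot $i holds the ($i+1)-th
# largest value, so $k passes are enough for the top $k.
sub partial_selection_sort {
    my ($k, @v) = @_;
    $k = @v if $k > @v;
    for my $i ( 0 .. $k - 1 ) {
        my $max = $i;
        for my $j ( $i + 1 .. $#v ) {
            $max = $j if $v[$j] > $v[$max];
        }
        @v[ $i, $max ] = @v[ $max, $i ];    # swap the largest into slot $i
    }
    return @v[ 0 .. $k - 1 ];
}

my @top3 = partial_selection_sort(3, 4, 8, 1, 9, 2);
print "@top3\n";    # prints: 9 8 4
```

Each pass scans the remaining list once, so the work is roughly k*n comparisons.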
gtoomey
------------------------------
Date: Thu, 27 Jan 2005 09:29:01 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: sorting just the largest values
Message-Id: <xv2Kd.47543$Qb.27450@edtnps89>
Alex Hart wrote:
> I have a very large list of data (up to 20,000 lines), and I just want
> to print out the largest 50 values. What is the quickest way to sort
> this using perl?
my @largest = `sort -n yourfile | tail -50`;
John
--
use Perl;
program
fulfillment
------------------------------
Date: 27 Jan 2005 09:32:06 GMT
From: Eric Bohlman <ebohlman@omsdev.com>
Subject: Re: sorting just the largest values
Message-Id: <Xns95EB253494BA2ebohlmanomsdevcom@130.133.1.4>
"Alex Hart" <alex_the_hart@yahoo.com> wrote in
news:1106814892.550560.192190@c13g2000cwb.googlegroups.com:
> I have a very large list of data (up to 20,000 lines), and I just want
> to print out the largest 50 values. What is the quickest way to sort
> this using perl?
Most people wouldn't consider 20,000 items to be a "very large list."
I'd suggest first just trying the brute-force approach of reading them
into an array, sorting them, and taking the last 50 results. Only if
this turns out to be unacceptably slow or memory-consuming for your
application should you consider something different. Remember that
perl's sort routines are implemented in optimized C code, so even if you
implement a lower-time-complexity algorithm yourself, the value of N for
which it becomes faster may be quite large.
If the brute-force approach turns out to be unacceptable (again, based on
actual measurement, not guesswork), here's a linear-time (O(N)) selection
algorithm:
1) Read the first 50 values into the top 50 list.
2) For each subsequent value, find the smallest value in the top 50 list
that's less than it. If there isn't one, do nothing. If there is, kick
it out of the list and add the new value to the list.
Step 2 will be simpler to program if you sort the list once after step 1.
You'll find shift() and splice() particularly helpful.
[UNTESTED]
my @top;
while (<>) {
    chomp;
    push @top, $_ if $. <= 50;
    # sort once, numerically, smallest first
    @top = sort { $a <=> $b } @top if $. == 50;
    if ($. > 50) {
        for (my $i = 49; $i >= 0; --$i) {
            if ($top[$i] < $_) {
                # the new value belongs just after position $i
                splice @top, $i + 1, 0, $_;   # insert the new value
                shift @top;                   # remove the smallest value
                last;
            }
        }
    }
}
------------------------------
Date: 27 Jan 2005 03:13:09 -0800
From: "Alex Hart" <alex_the_hart@yahoo.com>
Subject: Re: sorting just the largest values
Message-Id: <1106824389.127379.125340@c13g2000cwb.googlegroups.com>
> Most people wouldn't consider 20,000 items to be a "very large list."
> I'd suggest first just trying the brute-force approach of reading them
> into an array, sorting them, and taking the last 50 results. Only if
I'm developing a real-time application, and a particular sort was
taking several minutes when the list got up to 20,000. It might be a
problem with the data being too ordered to start with (or does perl
shuffle before it sorts?). I already extract the sort key and just
sort on that. I can try the packed string sort-key, but I think even a
slow method of finding the top 50 must be faster than sorting the whole
list.
I'll try some of the suggestions here, and I'll be sure to benchmark
along the way.
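
One way such a benchmark could be set up, using the core Benchmark module. This is a sketch only: the data is random rather than the real (possibly pre-ordered) list, the two sub names are invented for illustration, and the result will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 20,000 pseudo-random numbers.
my @data = map { int rand 1_000_000 } 1 .. 20_000;

# Candidate 1: sort everything, keep the 50 largest.
sub by_full_sort {
    my @s = sort { $b <=> $a } @data;
    return @s[ 0 .. 49 ];
}

# Candidate 2: one pass, keeping a sorted list of the 50 largest so far.
sub by_running_top {
    my @top = sort { $a <=> $b } @data[ 0 .. 49 ];    # smallest first
    for my $v ( @data[ 50 .. $#data ] ) {
        next if $v <= $top[0];                  # not bigger than the smallest kept
        my $i = $#top;
        $i-- while $top[$i] >= $v;              # stops by index 0, since $top[0] < $v
        splice @top, $i + 1, 0, $v;             # insert after the last smaller value
        shift @top;                             # drop the smallest
    }
    return reverse @top;                        # largest first
}

# Run each candidate for at least one CPU second and print a comparison.
cmpthese( -1, {
    full_sort   => \&by_full_sort,
    running_top => \&by_running_top,
} );
```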
------------------------------
Date: Thu, 27 Jan 2005 14:55:30 +0800
From: Belinda Wu <user@email.com>
Subject: Why My XML can't be displayed
Message-Id: <1106808930.666345@cswreg.cos.agilent.com>
I have a Perl CGI script that generates an XML file, and I'd like to
display this XML on the web. The code looks like this:
use CGI::Carp qw(fatalsToBrowser);
use XML::Writer;
my $xmlWri = XML::Writer->new( );
print "Content-type: text/xml\n\n";
$xmlWri->xmlDecl ('UTF-8', 'yes');
$xmlWri->pi('xml-stylesheet', 'type="text/xsl" href="Standard.xsl"');
$xmlWri->startTag('TEST");
....
.....
$xmlWri->endTag("TEST");
I get this error information from the Apache error log file:
[Wed Jan 26 18:58:37 2005] access to
/opt/apache/cgi-bin/cats/Reports/Standard.xsl failed for
belinda.chn.agilent.com, reason: Premature end of script headers
exec of /opt/apache/cgi-bin/cats/Reports/Standard.xsl failed,
reason: Exec format error (errno = 8)
[Wed Jan 26 18:58:37 2005] access to
/opt/apache/cgi-bin/cats/Reports/Standard.xsl failed for
belinda.chn.agilent.com, reason: Premature end of script headers
I got error message from IE
The stylesheet does not contain a document element. The stylesheet may
be empty, or it may not be a well-formed XML documen...
When I save the XML generated by that script under htdocs, it
is displayed correctly.
Can somebody tell me why this happens?
------------------------------
Date: Thu, 27 Jan 2005 02:20:55 -0600
From: "terry l. ridder" <terrylr@blauedonau.com>
Subject: Re: Why My XML can't be displayed
Message-Id: <Pine.LNX.4.61.0501270217050.25620@johann.blauedonau.com>
On Thu, 27 Jan 2005, Belinda Wu wrote:
> I have a perl cgi script it can generate a xml file then I'd like to
> display this xml file on the web. Code looks like below.
>
> use CGI::Carp qw(fatalsToBrowser);
> use XML::Writer;
> my $xmlWri = XML::Writer->new( );
> print "Content-type: text/xml\n\n";
> $xmlWri->xmlDecl ('UTF-8', 'yes');
> $xmlWri->pi('xml-stylesheet', 'type="text/xsl" href="Standard.xsl"');
> $xmlWri->startTag('TEST"); <------note a single quote and a double quote.
well it would help to use the same type of quote.
> ....
> .....
> $xmlWri->endTag("TEST");
>
>
> I get error infromation from Apache errlog file
> [Wed Jan 26 18:58:37 2005] access to
> /opt/apache/cgi-bin/cats/Reports/Standard.xsl failed for
> belinda.chn.agilent.com, reason:
> Premature end of script headers
> exec of /opt/apache/cgi-bin/cats/Reports/Standard.xsl failed,
> reason: Exec format error (errno = 8)
> [Wed Jan 26 18:58:37 2005] access to
> /opt/apache/cgi-bin/cats/Reports/Standard.xsl failed for
> belinda.chn.agilent.com, reason: Premature end of script headers
>
generally those would be the errors received when you mix single and
double quotes.
>
> I got error message from IE
> The stylesheet does not contain a document element. The stylesheet may
> be empty, or it may not be a well-formed XML documen...
>
>
> I'm trying to save the xml, generated by that script, under htdocs it
> can be displayed correctly.
>
> Can somebody tell me why this?
>
see above.
--
terry l. ridder ><>
------------------------------
Date: Thu, 27 Jan 2005 03:56:24 -0500
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: Why My XML can't be displayed
Message-Id: <v42dnZYa9qUkMWXcRVn-hA@adelphia.com>
terry l. ridder wrote:
> On Thu, 27 Jan 2005, Belinda Wu wrote:
>
>> $xmlWri->pi('xml-stylesheet', 'type="text/xsl" href="Standard.xsl"');
>> $xmlWri->startTag('TEST"); <------note a single quote and a double quote.
>
> well it would help to use the same type of quote.
Yes it would, but...
>> [Wed Jan 26 18:58:37 2005] access to
>> /opt/apache/cgi-bin/cats/Reports/Standard.xsl failed for
>> belinda.chn.agilent.com, reason:
>> Premature end of script headers
>
> generally those would be the errors received when you mix single and
> double quotes.
Terry: Did you *read* the errors? Apache is trying to run the XSL as a CGI,
which it obviously is not.
Belinda: Store your XSL somewhere outside of cgi-bin, and specify the full
path to it in your href.
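
A rough sketch of what that could look like. Assumptions: /xsl/Standard.xsl is a hypothetical URL that your server maps to a directory outside cgi-bin (under htdocs, say), xml_response() is just an illustrative helper, and plain strings are used instead of XML::Writer only to keep the example dependency-free. A bare "Standard.xsl" is resolved by the browser relative to the script's own URL, which is presumably why the request lands in cgi-bin.

```perl
use strict;
use warnings;

# Illustrative helper: build the CGI response with a stylesheet href
# that does NOT point into cgi-bin.
sub xml_response {
    my ($xsl_href) = @_;
    return join '',
        "Content-type: text/xml\n\n",
        qq{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n},
        qq{<?xml-stylesheet type="text/xsl" href="$xsl_href"?>\n},
        "<TEST>\n",
        "</TEST>\n";
}

# Hypothetical location of the stylesheet, outside cgi-bin.
print xml_response('/xsl/Standard.xsl');
```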
sherm--
--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 7700
***************************************