[32340] in Perl-Users-Digest
Perl-Users Digest, Issue: 3607 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Feb 7 03:09:26 2012
Date: Tue, 7 Feb 2012 00:09:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 7 Feb 2012 Volume: 11 Number: 3607
Today's topics:
Re: Loop in regex match <ben@morrow.me.uk>
Re: Loop in regex match sln@netherlands.com
Re: Loop in regex match sln@netherlands.com
Re: Loop in regex match (Seymour J.)
Re: Loop in regex match (Seymour J.)
Re: Perl 32-bit vs 64-bit question <vilain@NOspamcop.net>
Re: Perl 32-bit vs 64-bit question <ben@morrow.me.uk>
Re: WWW::Mechanize and outputing what's returned <justin.1201@purestblue.com>
Re: WWW::Mechanize and outputing what's returned <ben@morrow.me.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 5 Feb 2012 23:43:29 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Loop in regex match
Message-Id: <1qg309-jq91.ln1@anubis.morrow.me.uk>
Quoth Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>:
> I have a regex that appears to hang where I would expect it to fail
> due to \d not matching a comma. If I remove the \d then the regex
> correctly matches the partial string. The match that hangs is at line
> 452 of the script:
The only usual reason for a regex to hang is because you have nested
quantifiers that require an exponential number of backtracks to check
all possibilities. The standard example would be something like
("a" x 50) =~ /(a+a+)+b/
which will not match, but will do a lot of backtracking before it
decides that. I can't see any obvious case of that in the code you
posted, but you may want to run the program under use re "debug" to get
some idea of where the match is getting stuck. Otherwise, you'll need to
start ripping bits out of the pattern and/or the string until you find
the problem.
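A minimal sketch of running a single match under use re "debug". The
sample string is made up and kept short so the trace stays readable. As
far as I know the pragma has been lexically scoped since 5.10, so it can
be confined to the one block containing the suspect match. The trace is
written to STDERR.

```perl
use strict;
use warnings;

# Hypothetical sample input; kept short so the trace stays readable.
my $text = "aaaa";
my $matched;
{
    # Trace compilation and execution of regexes compiled in this block only.
    use re 'debug';
    $matched = ($text =~ /(a+a+)+b/) ? 1 : 0;
}
print STDERR "matched: $matched\n";
```

In a real script the block would wrap only the match that appears to
hang, which keeps the amount of trace output manageable.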
If you do find that's the problem, the usual solution (assuming you
can't get rid of the nested quantifiers) is to use (?>) to limit the
backtracking. (You have to be careful, of course, to still allow
backtracking where it is needed.) 5.10 also has '++', '?+' and '*+'
variants of the quantifiers which never backtrack.
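A small sketch of what "never backtrack" means in practice, and of why
care is needed. The string and variable names are made up for
illustration; the possessive form needs 5.10 or later.

```perl
use strict;
use warnings;

my $s = "aaab";

# Ordinary greedy quantifier: a+ first takes "aaa", then gives one "a" back
# so that the trailing "ab" can match.
my $greedy = ($s =~ /\A a+ ab \z/x) ? 1 : 0;

# Atomic group: once (?>a+) has taken "aaa" it never gives anything back,
# so the trailing "ab" cannot match.
my $atomic = ($s =~ /\A (?>a+) ab \z/x) ? 1 : 0;

# Possessive quantifier: a++ behaves like (?>a+).
my $possessive = ($s =~ /\A a++ ab \z/x) ? 1 : 0;

# Where nothing needs to be given back, the atomic form still matches.
my $atomic_ok = ($s =~ /\A (?>a+) b \z/x) ? 1 : 0;

print "greedy=$greedy atomic=$atomic possessive=$possessive atomic_ok=$atomic_ok\n";
# prints: greedy=1 atomic=0 possessive=0 atomic_ok=1
```

The second and third cases are the trap: making a quantifier atomic or
possessive changes which strings match whenever what follows it
overlaps with what it consumes.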
Ben
------------------------------
Date: Sun, 05 Feb 2012 18:24:33 -0800
From: sln@netherlands.com
Subject: Re: Loop in regex match
Message-Id: <udeui79a99fmv3kvaebcvku3spdfhgd145@4ax.com>
On Sun, 05 Feb 2012 11:13:17 -0500, Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid> wrote:
>I have a regex that appears to hang where I would expect it to fail
>due to \d not matching a comma. If I remove the \d then the regex
>correctly matches the partial string. The match that hangs is at line
>452 of the script:
>
>if (/($RecFromPat $RecByPat? $RecOptInfo \; $CFWS? \d )/xi)
>
I suspect the massive backtracking is being caused by the
usage of the combination of:
/$RecFromPat $RecByPat? $RecOptInfo/x
wherever it is being used like that.
This means there are underlying real-world problems with the construction
(i.e., lack of thought) of the core regexes.
I fixed your temporary problems, but will not try to debug the flawed massive
composite regexes in your program.
From the comments in the code, it looks as if you are trying to program to a standard.
That is more in the realm of validation first, data extraction second.
These things are easy to get wrong when you mix and match
quantifiers.
I was going to explicitly state the changes you should make, but it was too much work
given the flawed basis. You should pick out the changes.
Good luck, Shmuel!
PS. I can donate my services for a donation!
(Donations are negotiable)
PPS. Be careful with (?R): it recurses to the beginning of the whole (composite?) regex,
although I'm only 90% sure that it escapes the enclosing qr// object.
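A small sketch that tests this directly. The patterns and strings are
made up for illustration and are not taken from the script. The
assumption being checked is that an interpolated qr// object is
recompiled as part of the enclosing pattern, so (?R) re-enters the
whole composite, whereas a relative reference such as (?-1) re-enters
only its own capture group.

```perl
use strict;
use warnings;

# Balanced parentheses, recursing with (?R): the WHOLE pattern is re-entered.
my $paren_R = qr/ \( (?: [^()]++ | (?R) )* \) /x;

# Same idea, recursing with (?-1): only the enclosing capture group
# is re-entered.
my $paren_rel = qr/ ( \( (?: [^()]++ | (?-1) )* \) ) /x;

my $nested = "(a(b)c)";

# Used on its own, the (?R) version handles nesting.
my $alone_R = ($nested =~ $paren_R && $& eq $nested) ? 1 : 0;

# Interpolated into a larger pattern, (?R) would have to match "\A x ..."
# again in the middle of the string, so the nested comment is rejected.
my $embedded_R = ("x$nested" =~ /\A x $paren_R \z/x) ? 1 : 0;

# The relative reference keeps working after interpolation.
my $embedded_rel = ("x$nested" =~ /\A x $paren_rel \z/x) ? 1 : 0;

print "alone_R=$alone_R embedded_R=$embedded_R embedded_rel=$embedded_rel\n";
# prints: alone_R=1 embedded_R=0 embedded_rel=1
```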
-sln
-----------------
use Data::Dumper;
use Regexp::Common qw /net URI/;
use Socket;
use strict;
my $decOctetPat = qr/ \d |
[1-9] \d |
1 \d \d |
2 [0-4] \d |
25 [0-5]
/x;
my $IPv4addressPat = qr/ (?:$decOctetPat\.){3} $decOctetPat /x;
my $IPv6h16 = qr/[[:xdigit:]]{1,4}/;
my $IPv6ls32 = qr/ $IPv6h16 \: $IPv6h16 | $IPv4addressPat /x;
my $IPv6AddrPat = qr/ (?: (?: $IPv6h16 \: ){6} $IPv6ls32 ) |
(?: \:\: (?: $IPv6h16 \: ){5} $IPv6ls32 ) |
(?: (?: $IPv6h16 )? \:\: (?: $IPv6h16 \: ){4} $IPv6ls32 ) |
(?: (?: $IPv6h16 \: $IPv6h16 )? \:\: (?: $IPv6h16 \: ){3} $IPv6ls32 ) |
(?: (?: (?: $IPv6h16 \: ){2} $IPv6h16 )? \:\: (?: $IPv6h16 \: ){2} $IPv6ls32 ) |
(?: (?: (?: $IPv6h16 \: ){3} $IPv6h16 )? \:\: $IPv6h16 \: $IPv6ls32 ) |
(?: (?: (?: $IPv6h16 \: ){5} $IPv6h16 )? \:\: $IPv6ls32 ) |
(?: (?: (?: $IPv6h16 \: ){6} $IPv6h16 )? \:\: )
/x;
my $domainPat = qr/[[:alnum:]]+
[[:alnum:]-]*
(?:\. [[:alnum:]]+ [[:alnum:]-]*)*
/x;
my $addressLiteralPat = qr/\[
(?:$IPv4addressPat |
$IPv6AddrPat
)
\]
/x;
my $atextPat = qr"(?:[\w!#\$%&'*+/=?^`{|}~-]+)";
#y $FWS = qr/ (?:[ \t]*\15?\12)? [ \t]+ /x;
my $FWS = qr/ (?:[ \t]*\15?\n)? [ \t]+ /x;
my $atomPat = qr/$atextPat+/x;
my $ctext = '[\x21-\x27\x2A-\x5B\x5D-\x7E]';
my $quotedPairPat = qr/ \\ [\x20-\x7E] /x;
my $commentPat =qr/
(
\(
(?:
(?> (?:\\[\s\S] | [^(\\)])+ )
| (?-1)
)*
\)
)
/x;
my $CFWS = qr/
(?: (?:$FWS+ $commentPat)+ $FWS?) |
$FWS
/x;
my $dayPat = qr/$CFWS? (?<DAY>\d{1,2}) $CFWS?/x;
# per RFC 5322 case matters
my $day_of_weekPat = qr/
$CFWS
(?<DAY_OF_WEEK>
Mon |
Tue |
Wed |
Thu |
Fri |
Sat |
Sun
)
$CFWS?
/x;
my $dotStringPat = qr/$atextPat+ (?:\. $atextPat)*/x;
# RFC 5322 dtext: %d33-90 / %d94-126 (printable ASCII except "[", "\" and "]")
my $dtextPat = '[\x21-\x5A\x5E-\x7E]';
my $hourPat = qr/(?<HOUR>
(\d\d)
(?(?{$^N > 24})
(*FAIL)
)
)
/x;
my $idLeftPat = qr/$dotStringPat/;
my $LinkPat = qr/TCP | $atomPat/xi;
my $noFoldLiteralPat = qr/\[ $dtextPat* \]/x;
my $idRightPat = qr/$dotStringPat | $noFoldLiteralPat/x;
my $msgIdPat = qr/\s* \< $idLeftPat \@ $idRightPat \> \s*/x;
my $minutePat = qr/(?<MINUTE>
(\d\d)
(?(?{$^N > 59})
(*FAIL)
)
)
/x;
# per RFC 5322 case matters
my $monthPat = qr/
(?<MONTH>
Jan |
Feb |
Mar |
Apr |
May |
Jun |
Jul |
Aug |
Sep |
Oct |
Nov |
Dec
)
/x;
my $obs_zonePat = qr/
CDT |
CST |
EDT |
EST |
GMT |
MDT |
MST |
PDT |
PST |
UT |
[A-IK-Za-ik-z]
/x;
my $qtextPat = '[\x20-\x21\x23-\x5B\x5d-\x7E]';
my $QcontentPat = qr/$qtextPat | $quotedPairPat/x;
my $QuotedStringPat = qr/"$QcontentPat*"/;
my $rDNSstat =qr/
\s*
(?:\s* \( may \s be \s forged \) ) |
(?: \s* \( misconfigured \s sender \) ) |
(?: \s* RDNS \s failed)
/x;
# Non-5321 prefix to domain in TCP-INFO
my $RecLocalPat = '(?:(?:IDENT:)?[\w+-]+[\w\.+-]*@)?';
my $secondPat = qr/(?<SECOND>
(\d\d)
(?(?{$^N > 59})
(*FAIL)
)
)
/x;
# Malformed Received headers may have 'RDNS failed' after the IP address
# or a dotted quad without framing []
my $TCPinfoPat =qr/
(?<IP>$addressLiteralPat) (?:\s* RDNS \s failed)? |
(?<IP>$IPv4addressPat) |
(?:$RecLocalPat
(?<RDNS>$domainPat)
$FWS
(?<IP>$addressLiteralPat)
$rDNSstat*
)
/x;
my $time_of_day = qr/$hourPat : $minutePat (?: : $secondPat)?/x;
my $yearPat = qr/$FWS (?<YEAR>\d{2,4}) $FWS/x;
# RFC 5322 semantic constraint not applied in order to match malformed zones.
my $zonePat = qr/$FWS
(?<ZONE>
(?:
(?:
[+-]
\d\d\d\d
) |
$obs_zonePat
)
)
/x;
# RFC 5322 shows spaces in day and year, not here
my $datePat = qr/$dayPat $monthPat $yearPat/x;
# Received: FROM non-5321 tokens seen in the wild
my $Non5321DomainPat = qr/
\. |
$IPv4addressPat |
\d+
/x;
# Malformed Received headers may have a leading hyphen in a
# domain name, a period as a domain name or an address
# literal without TCPINFO. They may also have an IPv4
# address expressed as a hexadecimal, decimal or octal constant.
my $ExtendedDomainPat = qr/
(?:(?<HELO>-?$domainPat) \s (?<IP>$addressLiteralPat)) |
(?:(?<HELO>-?$domainPat) (?:$FWS \( $TCPinfoPat \))?) |
(?<IP>$addressLiteralPat) (?:$FWS \( $TCPinfoPat \))? |
$Non5321DomainPat (?:$FWS \( $TCPinfoPat \))?
/x;
my $localPartPat = qr/$dotStringPat | $QuotedStringPat/x;
my $MailboxPat = qr/(?<LOCAL_PART>$localPartPat) \@ (?<DOMAIN>$domainPat | $addressLiteralPat)/x;
my $protocolPat = qr/SMTP | ESMTP | $atomPat/xi;
# I don't expect to see source routing in the wild
my $RecPathPat = qr/
\<
(?:\@ $domainPat (?:, \@ $domainPat)* :)?
$MailboxPat
\>
/x;
# Can't use $RE{net}{domain} due to malformed domain names
my $RecHELOpat = "(?<HELO>(?:-?$domainPat)|" .
"\\.|" .
"(?:\\[$RE{net}{IPv4}\\])|" .
"$RE{net}{IPv4}|" .
"\\d+)";
# Road Runner Received: FROM
my $RRfromPat = qr/
(?<HELO>$RE{net}{IPv4})
\s+
\(
Forwarded-For:
\s
\[
(?<IP>$RE{net}{IPv4})
\]
\)
/x;
# QMAIL Received: FROM
my $QMfromPat = qr/(?<IP>$RE{net}{IPv4})
\s+
\(
\[
(?<RDNS>$domainPat)
\]
:
\d+
\s+
"
\w+
\s*
\[
(?<HELO>$domainPat)
\]
"
[^)]*
\)
/x;
my $RecSrcPat = qr/$RecLocalPat
(?<RDNS>$domainPat)?
\s*
\[
(?<IP>$RE{net}{IPv4})
\]
\s*
(?:\(may\sbe\sforged\))?
\s*
(?:\(misconfigured\ssender\))?
\s*
(?:\s*RDNS\sfailed)?
/x;
# The RFC 5321 syntax for From-domain does not allow an address literal without
# TCP-info in parentheses, but Yahoo creates a Stamp in that format.
# Some software puts significant information in comments beyond the
# TCPINFO of the Extended-Domain.
my $RecFromPat = qr/^
FROM
$FWS
(?<FROM>
$ExtendedDomainPat |
(?:
(
\[
(?<IP>$IPv4addressPat)
\]
)
\s*
\(
HELO=$RecHELOpat
\)
) |
(?:(?<RDNS>$domainPat)
\s+
\(
\[
(?<IP>$IPv4addressPat)
\]
\s+
HELO=$RecHELOpat
\)
) |
(?:(?<RDNS>$domainPat)
\s+
\(
HELO
\s
$RecHELOpat
\)
\s+
\(
\[
(?<IP>$IPv4addressPat)
\]
\)
) |
$QMfromPat |
$RRfromPat
)
/xi;
# per RFC 5321 it's CFWS "BY" FWS Extended-Domain
# in the wild it's CFWS "BY" FWS Domain FWS '(' MTA ')'
my $RecByPat = qr!$CFWS
BY
$FWS
(?<BY1>
(?:$domainPat |
\[ $RE{net}{IPv4} \]
)
)
(?:
$FWS
\(
(?<BY2>[\s\w\./-]+)
\)
)?
!xi;
my $RecForPat = qr/$CFWS FOR $FWS (?: $RecPathPat | $MailboxPat)/xi;
my $RecIdPat = qr/$CFWS ID $FWS (?<ID>$atomPat | $msgIdPat)/xi;
my $RecViaPat = qr/$CFWS VIA $FWS (?<LINK>$LinkPat)/xi;
# m$ lookout violates RFC 5321 syntax
my $RecWithMS = qr/Microsoft \s+ (?:ESMTP|SMTP) (?:\s+ Server | SVC\(\d+(?:\.\d+)*\))/xi;
my $RecWithPat = qr/$CFWS
WITH $FWS
(?:
(?:ESMTP|SMTP) |
$RecWithMS |
NNFMP # Yahoo
)
(?:
$FWS
\(
SMTP
[\d\w\.-]*
\)
)?
/xi;
my $RecOptInfo = qr/
(?<VIA>$RecViaPat)?
(?<WITH>$RecWithPat)?
(?:$RecIdPat)?
(?<FOR>$RecForPat)?
/xi;
my $timePat = qr/$time_of_day $zonePat/x;
# RFC 5322 shows spaces in day and year, not here
my $date_timePat = qr/
(?: $day_of_weekPat [,])?
$datePat
$timePat
(?:$CFWS)?
/x;
my $RecPat = qr/^
$RecFromPat
$RecByPat
$RecOptInfo
$CFWS?
\;
# $date_timePat
$datePat
# $timePat
# $time_of_day
$hourPat
/x;
my @testheaders = (<<'EOF1',<<'EOF2',<<'EOF3',<<'EOF4',<<'EOF5',<<'EOF6',<<EOF7,<<EOF8);
from amethyst.nstc.com (majordomo@amethyst.nstc.com [207.166.196.179]) by mail.acm.org (8.8.5/8.7.5) with ESMTP id AAA44952 for <Shmuel@ACM.Org>; Wed, 6 Jan 1999 00:06:57 -0500
EOF1
(from majordomo@localhost)
by amethyst.nstc.com (8.9.1/8.9.1/nstc.com) id AAA16796
for freemail-outgoing; Wed, 6 Jan 1999 00:26:52 -0500
EOF2
from devel.nacs.net (IDENT:root@devel.nacs.net [207.166.192.85])
by amethyst.nstc.com (8.9.1/8.9.1/nstc.com) with ESMTP id AAA16789
for <freemail@nstc.com>; Wed, 6 Jan 1999 00:26:49 -0500
EOF3
from relay1.mnsinc.com (relay1.mnsinc.com [206.55.3.25])
by devel.nacs.net (8.8.7/8.8.8) with ESMTP id WAA10733
for <freemail@nstc.com>; Tue, 5 Jan 1999 22:44:35 -0500
EOF4
from U86 (u86.os2bbs.com [206.55.10.86])
by relay1.mnsinc.com (8.9.0/8.9.0) with SMTP id WAA29325
for <freemail@nstc.com>; Tue, 5 Jan 1999 22:39:39 -0500 (EST)
EOF5
from localhost (localhost [127.0.0.1])
by lincoln-at-leros.patriot.net (Postfix) with ESMTP id 12BBE55E73
for <marianne@patriot.net>; Fri, 27 Jan 2012 09:23:59 -0500 (EST)
EOF6
from mail.acm.org [199.222.69.4] by piglet.toward.com with ESMTP
(SMTPD32-4.06) id AF982F028E; Wed, 06 Jan 1999 00:07:36 EDT
EOF7
from mail.acm.org [199.222.69.4] by piglet.toward.com with ESMTP
(foo) id AF982F028E; Wed, 06 Jan 1999 00:07:36 EDT
EOF8
msg("\n\@testheaders has " . scalar @testheaders . " lines\n");
foreach (@testheaders) {
msg("\n\t --> $_\n");
my $RecTest = qr/
(?>
$RecFromPat
$RecByPat?
$RecOptInfo
)
$CFWS?
\;
(?: $day_of_weekPat [,])?
$datePat
$timePat
(?:$CFWS)?
/xi;
if (/$RecTest/) {
msg("\nMatched \$RecTest\n");
foreach my $key (sort keys %+) {
print STDERR "\$+{$key}=$+{$key}\n";
}
msg("\n");
foreach my $key (sort keys %-) {
print STDERR "\$-{$key}=",grep defined, @{$-{$key}},"\n";
}
} else {
msg("\nDid not match \$RecTest\n");
}
# if (/($RecFromPat $RecByPat? $RecOptInfo \; $CFWS? (?<DAY>\d{1,2}) )/xi) {
# if (/($RecFromPat $RecByPat? $RecOptInfo \; $CFWS? \d{1,2} )/xi) {
if (/((?> $RecFromPat $RecByPat? $RecOptInfo) \; $CFWS? \d )/xi) {
msg("\nMatched $1\n");
# msg("\nDumper(\%+):\n");
# msg(Dumper(%+),"\n");
# msg("\nDumper(\%-):\n");
# msg(Dumper(%-),"\n");
foreach my $key (sort keys %+) {
print STDERR "\$+{$key}=$+{$key}\n";
}
msg("\n");
foreach my $key (sort keys %-) {
print STDERR "\$-{$key}=",grep defined, @{$-{$key}},"\n";
}
} else {
msg("\nDid not match\n");
}
msg("\n");
}
sub msg {
print STDERR @_;
}
1;
__END__
------------------------------
Date: Sun, 05 Feb 2012 18:27:57 -0800
From: sln@netherlands.com
Subject: Re: Loop in regex match
Message-Id: <iieui7dc8d50dnjdjufp4i3j5muhm5g92c@4ax.com>
On Sun, 5 Feb 2012 23:43:29 +0000, Ben Morrow <ben@morrow.me.uk> wrote:
>
>Quoth Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>:
>> I have a regex that appears to hang where I would expect it to fail
>> due to \d not matching a comma. If I remove the \d then the regex
>> correctly matches the partial string. The match that hangs is at line
>> 452 of the script:
>
>The only usual reason for a regex to hang is because you have nested
>quantifiers that require an exponential number of backtracks to check
>all possibilities. The standard example would be something like
>
> ("a" x 50) =~ /(a+a+)+b/
>
>which will not match, but will do a lot of backtracking before it
>decides that. I can't see any obvious case of that in the code you
>posted, but you may want to run the program under use re "debug" to get
>some idea of where the match is getting stuck. Otherwise, you'll need to
>start ripping bits out of the pattern and/or the string until you find
>the problem.
>
>If you do find that's the problem, the usual solution (assuming you
>can't get rid of the nested quantifiers) is to use (?>) to limit the
>backtracking. (You have to be careful, of course, to still allow
>backtracking where it is needed.) 5.10 also has '++', '?+' and '*+'
>variants of the quantifiers which never backtrack.
>
>Ben
The backtracking is massive.
-sln
------------------------------
Date: Mon, 06 Feb 2012 10:51:38 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: Loop in regex match
Message-Id: <4f2ff70a$13$fuzhry+tra$mr2ice@news.patriot.net>
In <udeui79a99fmv3kvaebcvku3spdfhgd145@4ax.com>, on 02/05/2012
at 06:24 PM, sln@netherlands.com said:
>I suspect the massive backtracking is being caused by the usage of
>the combination of:
>/$RecFromPat $RecByPat? $RecOptInfo/x
my $RecOptInfo = qr/
(?<VIA>$RecViaPat)?+
(?<WITH>$RecWithPat)?+
(?:$RecIdPat)?+
(?<FOR>$RecForPat)?+
/xi;
>wherever it is being used like that.
At Ben's suggestion I made some of the quantifiers possessive and that seems
to have solved the problem. $CFWS was indeed one of the regexen I had
to change:
my $CFWS = qr/
(?: (?:$FWS+ $commentPat)++ $FWS?+) |
$FWS
/x;
>I fixed your temporary problems,
It looks like you changed the syntax of comments; \S will match
characters that were excluded from $ctext.
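A rough sketch of that difference. The strict pattern here is a
simplified stand-in (no nesting, plain space or tab instead of FWS)
rather than the one from my script, and the test strings are made up.

```perl
use strict;
use warnings;

# ctext as defined in the script: printable ASCII except "(", ")" and "\".
my $ctext = qr/[\x21-\x27\x2A-\x5B\x5D-\x7E]/;

# Simplified strict comment, for illustration only.
my $strict_comment = qr/\A \( (?: [ \t] | $ctext | \\ [\x20-\x7E] )* \) \z/x;

# Comment pattern in the style of the modified script.
my $loose_comment = qr/\A
    (
      \(
        (?:
            (?> (?: \\ [\s\S] | [^(\\)] )+ )
          | (?-1)
        )*
      \)
    )
\z/x;

my $plain   = "(Postfix relay)";
my $control = "(bell\x07 here)";   # contains a control character

for my $c ($plain, $control) {
    my $strict = ($c =~ $strict_comment) ? 1 : 0;
    my $loose  = ($c =~ $loose_comment)  ? 1 : 0;
    print "strict=$strict loose=$loose\n";
}
# prints: strict=1 loose=1
#         strict=0 loose=1
```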
>From the comments in the code, it looks as if you are trying to
>program to a standard.
Theoretically. The Received header field is described by RFC 5321,
but there's a lot of software that's non-compliant and I need to parse
the header fields that they generate as well as the compliant ones.
That forces this to be an exercise in rapid prototyping :-(
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
------------------------------
Date: Mon, 06 Feb 2012 10:55:29 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: Loop in regex match
Message-Id: <4f2ff74e$14$fuzhry+tra$mr2ice@news.patriot.net>
In <1qg309-jq91.ln1@anubis.morrow.me.uk>, on 02/05/2012
at 11:43 PM, Ben Morrow <ben@morrow.me.uk> said:
>Otherwise, you'll need to start ripping bits out of the pattern
>and/or the string until you find the problem.
I've been able to reproduce the problem using a much smaller pattern,
and I initially tried making things possessive in $CFWS, but it was still
hanging:
my $FWS = qr/ (?:[ \t]*+\R)? [ \t]++ /x;
my $ctext = '[\x21-\x27\x2A-\x5B\x5D-\x7E]';
my $quotedPairPat = qr/ \\ [\x20-\x7E] /x;
my $commentPat =qr/
\(
(?:$FWS?+
(?:$ctext | $quotedPairPat | (?R))
)*
$FWS?+
\)
/x;
my $CFWS = qr/
(?: (?:$FWS+ $commentPat)++ $FWS?+) |
$FWS
/x;
if (/( \; (?>$CFWS)? \d )/xi)
The other piece I needed was possessive matches in
my $RecOptInfo = qr/
(?<VIA>$RecViaPat)?+
(?<WITH>$RecWithPat)?+
(?:$RecIdPat)?+
(?<FOR>$RecForPat)?+
/xi;
Thanks.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
------------------------------
Date: Mon, 06 Feb 2012 21:41:22 -0800
From: Michael Vilain <vilain@NOspamcop.net>
Subject: Re: Perl 32-bit vs 64-bit question
Message-Id: <vilain-3A3E09.21412106022012@news.individual.net>
In article
<7e2122cd-2e89-4f88-b05b-acaedb1e3ba8@4g2000pbz.googlegroups.com>,
snorble <snorble@hotmail.com> wrote:
> When I have used Python 64-bit I have run into problems (third party
> libraries not working, etc). So I just use Python 32-bit on everything
> and have no problems. Are there any similar problems with Perl? On a
> 64-bit computer, am I better off with 64-bit, or better off running
> 32-bit Perl on everything? I'm looking to use Perl on Windows systems for
> administration purposes. Thank you for any guidance you can provide.
I've always heard that a 64-bit executable can't load 32-bit shared
libraries. I may be wrong with that in MacOS in that 10.6 (Snow
Leopard) is 32-bit but can run 64-bit executables on a suitable
processor. Windows 7 has two flavors--32-bit and 64-bit. The drivers
have to be the same but I think programs don't matter as much unless you
attempt to mix 32 and 64-bit libraries.
Mixing compiled shared libraries is the problem. I don't think this
problem is only with Python. But I could be wrong.
--
DeeDee, don't press that button! DeeDee! NO! Dee...
[I filter all Goggle Groups posts, so any reply may be automatically ignored]
------------------------------
Date: Tue, 7 Feb 2012 07:32:14 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Perl 32-bit vs 64-bit question
Message-Id: <uk0709-bu92.ln1@anubis.morrow.me.uk>
Quoth Michael Vilain <vilain@NOspamcop.net>:
> In article
> <7e2122cd-2e89-4f88-b05b-acaedb1e3ba8@4g2000pbz.googlegroups.com>,
> snorble <snorble@hotmail.com> wrote:
>
> > When I have used Python 64-bit I have run into problems (third party
> > libraries not working, etc). So I just use Python 32-bit on everything
> > and have no problems. Are there any similar problems with Perl? On a
> > 64-bit computer, am I better off with 64-bit, or better off running
> > 32-bit Perl on everything? I'm looking to use Perl on Windows systems for
> > administration purposes. Thank you for any guidance you can provide.
I'm running a 64bit perl on this machine with no problems, but then I'm
running FreeBSD so everything running on this machine was compiled on
this machine, in 64bit mode. Running under Windows where you potentially
need to deal with 3rd-party binary-only libraries is rather different.
I would expect 32bit perl to run without problems on Win64. According to
strawberryperl.com there are currently some important modules which will
not build in 64bit mode; probably the most important are the SSL
modules, none of which apparently currently build in 64bit mode under
Windows.
The most important question is whether there is anything in particular
you know you are going to need to use. If there is, you need to check it
will work with your chosen version of perl. However, I rather get the
impression you're just starting to learn Perl: in that case, I would
recommend going to http://strawberryperl.com and installing the default
version from there. Currently that is a 32bit build of 5.12.3, so I
would expect that to be the most reliable and generally useful version.
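If you are not sure which kind of build is already installed, a short
sketch using the core Config module will tell you. The output wording
is my own; the keys are standard %Config entries.

```perl
use strict;
use warnings;
use Config;

# ptrsize is the size of a C pointer in bytes: 8 on a 64-bit build,
# 4 on a 32-bit one. ivsize is the size of Perl's integer type, which
# can be 8 even on a 32-bit build configured with 64-bit integers.
my $bits = $Config{ptrsize} * 8;
print "perl $Config{version} ($Config{archname}): ${bits}-bit pointers, ",
      "$Config{ivsize}-byte integers\n";
```

It is the pointer size that decides which compiled libraries the
interpreter can load.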
> I've always heard that a 64-bit executable can't load 32-bit shared
> libraries.
It is possible to load a 32bit library from a 64bit executable, but not
with the normal runtime linker under most OSs. A few programs like
mplayer have their own custom runtime linking code which allows them to
load 32bit Win32 libraries into a 64bit Unix process.
> I may be wrong with that in MacOS in that 10.6 (Snow
> Leopard) is 32-bit but can run 64-bit executables on a suitable
> processor.
10.6 has the whole OS available in both 32- and 64bit versions. (The two
versions live in the same files, using the same 'fat binaries' as
earlier versions used for Intel vs. PPC.) Many machines normally run a
32bit kernel even though their processors are 64bit-capable; this
doesn't stop you from running 64bit processes, since the 64bit libraries
are available.
> Windows 7 has two flavors--32-bit and 64-bit. The drivers
> have to be the same but I think programs don't matter as much unless you
> attempt to mix 32 and 64-bit libraries.
Yes. A device driver (or, at least, a device driver that runs in kernel
mode: Windows uses the term rather broadly) is essentially a shared
library to be loaded into the kernel. That means the word size has to
match. A suitably-written 32bit kernel can run a 64bit process, but only
if 64bit libraries are available.
> Mixing compiled shared libraries is the problem. I don't think this
> problem is only with Python. But I could be wrong.
There is nothing Python- (or Perl-) specific about any of this, no.
Ben
------------------------------
Date: Mon, 6 Feb 2012 15:48:05 +0000
From: Justin C <justin.1201@purestblue.com>
Subject: Re: WWW::Mechanize and outputing what's returned
Message-Id: <la9509-ccn.ln1@zem.masonsmusic.co.uk>
On 2012-02-03, Ben Morrow <ben@morrow.me.uk> wrote:
>
> Quoth Justin C <justin.1201@purestblue.com>:
>> I've just written my first WWW::Mechanize program, it does its job,
>> and I can export the data to PDF using PDF::FromHTML. What I don't get
>> with this, however, are the images on the page, so my PDF is ugly.
>
> Also, that module makes no attempt to handle CSS, so for most ordinary
> web pages it's probably useless.
>
>> I've tried using $mech->find_all_images(), and downloading them, but
>> the images on the page are all relative links - and, it seems, the
>> relative path is being set depending which style sheet is in force at
>> the time.
>
> I'm not sure what you mean here. ->find_all_images returns
> WWW::Mech::Image objects, which have both ->url and ->base methods. Is
> that not enough to download the image and put it in the right place in a
> tree?
I may look at that again in a while. For now I've given up on the
images...
>
>> Can anyone suggest where I start reading so that I can learn how to
>> get the entire page, including images, and have the html in
>> $mech->content display links to the locally downloaded copies of the
>> images?
>>
>> Or is there a better way to submit a form and get what is returned
>> into a PDF?
>
> Rendering modern HTML is an extremely complicated business. I wouldn't
> try to do it in pure Perl unless there's no other option. For rendering
> to PDF I'd look at PDF::WebKit, which uses an external WebKit-based
> binary to do the rendering; unfortunately it also requires Qt, which may
> mean you can't use it.
It wasn't fun installing, but my PDFs look much better (hence being
able to do without the images).
The major problem I had is that wkhtmltopdf (what PDF::WebKit drives
to get PDF output) requires a running X server, but I wanted to run
this on a headless box that has no X. So I installed xvfb (it still
dragged in a whole bunch of dependencies), I then needed a bash
script:
xvfb-run wkhtmltopdf "$@"
and then a hack to PDF::WebKit so that it doesn't look for
wkhtmltopdf, but uses my script instead. It's dirty, but it works.
wkhtmltopdf, if run from the command line in an xterm/rxvt/whatever,
works fine, but it will not run outside an X server. :-(
> If you are having trouble because you're feeding WebKit HTML from Mech
> and it can't resolve the URLs, you probably want to use the base_href
> parameter to Mech->content.
I had that in my code, I was probably doing something wrong, but I
wasn't getting what I wanted. I'll give it another try when I've got
this whole thing worked out, not just this part.
Thank you for the pointers, PDF::WebKit (apart from the install
overhead) is much easier than PDF::FromHTML.
Justin.
--
Justin C, by the sea.
------------------------------
Date: Mon, 6 Feb 2012 22:05:21 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: WWW::Mechanize and outputing what's returned
Message-Id: <1ev509-smq1.ln1@anubis.morrow.me.uk>
Quoth Justin C <justin.1201@purestblue.com>:
> On 2012-02-03, Ben Morrow <ben@morrow.me.uk> wrote:
> >
> > Rendering modern HTML is an extremely complicated business. I wouldn't
> > try to do it in pure Perl unless there's no other option. For rendering
> > to PDF I'd look at PDF::WebKit, which uses an external WebKit-based
> > binary to do the rendering; unfortunately it also requires Qt, which may
> > mean you can't use it.
>
> It wasn't fun installing, but my PDFs look much better (hence being
> able to do without the images).
>
> The major problem I had is that wkhtmltopdf (what PDF::WebKit drives
> to get PDF output) requires a running X server, but I wanted to run
> this on a headless box that has no X.
See <http://madalgo.au.dk/~jakobt/wkhtmltoxdoc/wkhtmltopdf-0.9.9-doc.html>,
particularly the section 'Reduced Functionality', and
<http://code.google.com/p/wkhtmltopdf/downloads/list>.
(Quite *why* it links Qt I can't imagine, but there we are...)
Ben
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3607
***************************************