[10699] in Perl-Users-Digest
Perl-Users Digest, Issue: 4291 Volume: 8
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Nov 24 20:07:24 1998
Date: Tue, 24 Nov 98 17:01:38 -0800
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 24 Nov 1998 Volume: 8 Number: 4291
Today's topics:
Re: regexp on multiple lines, etc. <uri@fastengines.com>
Re: Returning to a previous URL from a CGI script dturley@pobox.com
Re: sending patterns for substitutes as a parameter to <rootbeer@teleport.com>
Re: should `-w' become the default? <rootbeer@teleport.com>
simple problem bkraymond@geocities.com
Re: simple problem (Sean McAfee)
Re: simple problem (BenJamin Prater)
Re: sysread / need help w/ single byte I/O, urgent! pl <rootbeer@teleport.com>
Re: Trouble with Perl-Postres <rootbeer@teleport.com>
Special: Digest Administrivia (Last modified: 12 Mar 98 (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 24 Nov 1998 18:39:28 -0500
From: Uri Guttman <uri@fastengines.com>
To: jjoerg@camlaw.rutgers.edu (john joergensen)
Subject: Re: regexp on multiple lines, etc.
Message-Id: <sarww4k8xu7.fsf@camel.fastserv.com>
this is a long post. enjoy if you dare.
uri
>>>>> "jj" == john joergensen <jjoerg@camlaw.rutgers.edu> writes:
jj> In article <sarg1b99b80.fsf@camel.fastserv.com>, Uri Guttman
jj> <uri@fastengines.com> wrote:
>>>>>>> "JJ" == John Joergensen <jjoerg@crab.rutgers.edu> writes:
>>
JJ> I have the regexp set to global (ie. m/regexp/g), and I set the $/
JJ> parameter to null (so I can slurp paragraphs, but not the whole
JJ> file). I also set the expression to accept return characters in
JJ> all the places where the expression may be wrapped. So far, so good.
>>
JJ> The problem is, when I set the $/ to null, I am only getting one
JJ> match per paragraph.
>>
>> $/ has nothing to do with regexes, only reading files.
i finally understand what you are trying to do. you have a lot of perl
to learn. the code below is broken every which way but loose. and it is
formatted from hell. i will try to comment on it and clear up your
misconceptions.
first off, how you read the file in and how you match are separate
issues. but since you want to match over multiple lines, you have to have
paragraphs to match against. so setting $/ to '' is correct. but then
you have to LOOP over the paragraph to see each match. your line
oriented loop (not clearing $/) partly worked since it saw more
citations (at most one per line) and i didn't see any that line
wrapped.
jj> #!/usr/bin/perl -w
good, you use -w, but where is use strict?
jj> while(defined($filename = glob("*.html"))) {
this makes no sense. you loop over all the files but only open
test.html. if you want to loop over the files found by glob use this:
foreach $file_name ( glob("*.html") ) {
jj> $/ = '';
put this outside the loop. it only needs to be set once.
jj> open (IN, "test.html");
jj> open (OUT, ">result.html");
ALWAYS test the result of opens
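A minimal sketch of what testing an open can look like. The helper name open_for_reading and the file name are made up for illustration, and the three-argument open with a lexical filehandle is a newer form than the bareword handles in the quoted code.

```perl
use strict;

# hypothetical helper: open a file for reading, or die saying why
sub open_for_reading {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or die "can't open $path for reading: $!\n";
    return $fh;
}

# a file that should not exist, to show the failure path
my $fh = eval { open_for_reading("no_such_file_$$.html") };
print "open failed: $@" unless $fh;
```

The point is only that the failure is reported (with $! giving the reason) instead of the loop silently reading from an unopened handle.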
jj> while(<IN>){if(m/[\d]{1,3}[\s\n<]{1,2}[A-W<][A-Za-z234<>\.
jj> \s\/\&\;]*[\s\n]{1,2}[\d]{1,4}/g) {
this regex is from hell. it doesn't even work the way you want it to
and it is way too complex for the job you are trying to do. you don't
get what char classes are and many other regex issues. get "mastering
regular expressions" and read it.
but as i said before you need to loop over each match to find all of
them in each paragraph so the if should be a while.
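A small sketch of the if-versus-while point. The pattern here is a simplified stand-in (an assumption for illustration, not the citation regex discussed in this post); what matters is that a while loop over m//g visits every match in the paragraph, where an if would stop after the first.

```perl
use strict;

my $para = "326 U.S. 310, 66 S. Ct. 154,\n90 L. Ed. 95 (1945)";

# simplified stand-in pattern: number, reporter name, number
my $cite_re = qr/(\d+)\s+([A-Z][A-Za-z. ]*?)\s+(\d+)/;

my @found;
# in scalar context each pass of m//g resumes where the last match ended
while ($para =~ /$cite_re/g) {
    push @found, "$1 $2 $3";
}
print scalar(@found), " matches: @found\n";
```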
this regex seems to do the job very nicely and it is accurate (assuming
i understand the format of citations).
the citations seem to be a number followed by the court (in <u>) and more
numbers. the tricky part is finding the last number. if it is followed
by another citation then it ends in a comma, but it could end with a
'(quoting' or a '(year)' too.
also i don't know the format of the rutgers search. do they want only
the first number of the trailing numbers like you have it? i assume all
the numbers are useful and i delete the trailing , ( and (year).
i will show it in the expanded form to explain it:
m[
    (                       # group the match
        \d{1,3}             # a 1 to 3 digit number
        \s*                 # optional white space
        <u>                 # html markup
        .+?                 # any text (non-greedy)
        </u>                # html markup end
        \s*                 # optional white space
        (?:at\s+)?          # optional word 'at '
        (?:\d{1,3}          # 1 to 3 digits followed by
            (?:[,(]         # , (
                |           # or
                \s+         # white space
                \(\d+\)     # and a year in ()
            )               # end of following group
        )                   # end of digits group
    )                       # end of full match group
]gxs                        # global match, extended, . matches \n
NOTE: i found cites with 4 digits which i handle in the example below
but the code after this is just as confusing.
jj> $cite = $&;
jj> $saveorig = $&;
since the match is now grouped you can use $1.
another major improvement would be to use the /e modifier and make the
m// a s/// so you don't need to save the original citation and do another
s/// on it. in fact that could be a sub call which would make it nice
and clean
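A tiny sketch of the s///e idea with a sub call in the replacement. The sub mark_cite and the sample text are placeholders for illustration, and the pattern is deliberately much simpler than a real citation match.

```perl
use strict;

# hypothetical stand-in for a real link-building sub
sub mark_cite {
    my ($cite) = @_;
    return "<<$cite>>";
}

my $text = "see 326 U.S. 310 and 311 U.S. 457";

# /e runs the replacement as perl code, /g repeats it for every match
$text =~ s/(\d+ U\.S\. \d+)/mark_cite($1)/ge;
print "$text\n";
```

Because the replacement is computed from $1 on each match, there is no need to save the original text and run a second substitution.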
jj> print "Found a cite: $cite\n";
jj> @parts = split(/\s/, $cite);
jj> foreach $b (@parts) {
jj> if ($b ne "at") {
jj> @parts2 = (@parts2, $b);
this should be
push( @parts2, $b ) ;
jj> }
jj> }
jj> $search = join("\%20", @parts2);
no need to \ a % in a string.
jj> undef @parts2;
this works, but the usual way to empty an array is
@parts2 = () ;
but the whole section above could be done with this:
# start from the matched citation
$search = $cite ;
# remove 'at '
$search =~ s/\bat\s+//g ;
# change white space to escape code. this is better done with a CGI.pm
# routine as i am sure others will yell
$search =~ s/\s+/%20/g ;
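For comparison, a hand-rolled sketch of general URL escaping rather than only turning white space into %20. The sub name url_escape is made up for illustration, and the set of characters left alone is an assumption; a library routine would be the safer choice in real code.

```perl
use strict;

# hypothetical helper: percent-encode everything except a small safe set
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

print url_escape("326 U.S. 310"), "\n";
print url_escape("Smith & Jones"), "\n";
```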
jj> } # close the else
you mean the if!
jj> @fixthecite = split(/<[u\/]{1,2}>/, $search);
ugly!!
why not just delete the <u> and </u> from the string (and it should be
done before you encode the spaces):
$search =~ s|</?u>||g ;
jj> $ready = join("", @fixthecite);
jj> if ($ready =~ /term/i) {
jj> print "Whoa! Leave this one out: $saveorig!\n";
this is unclear but i assume it is a special case
jj> } else {
jj> s/$saveorig/<ahref=\"http:\/\/lawlibrary.rutgers.edu\/cgi-bin\/getlink.
jj> cgi?cite=$ready\">$saveorig<\/a>\n/;
this is broken as the tag is 'ahref' when it should be 'a' followed by
href.
if you used a different delimiter (like i show above), you wouldn't have
to backwhack the /
so here is a working program that does close to what you want.
i made 2 paragraphs and munged some data to test it out more:
#!/usr/local/bin/perl
$/ = '';
while(<DATA>) {
    s[
        (
            \d{1,4}
            \s*
            <u>
            .+?
            </u>
            \s*
            (?:at\s+)?
            (?:\d{1,4}
                (?:[,(]
                    |
                    \s+
                    \(\d+\)
                )
            )+
        )]
     [ &fix_cite( $1 ) ]gexs ;
    print ;
}
sub fix_cite {
    my( $cite ) = @_ ;
    my( $search ) ;
    #print "Found a cite: [$cite]\n";
    $search = $cite ;
    $search =~ s/\bat\s+//g ;
    $search =~ s|</?u>||g ;
    $search =~ s/[,(]$// ;
    $search =~ s/\s+\(\d+\)$// ;
    $search =~ s/\s+/%20/g ;
    #print "search [$search]\n" ;
    return( <<HREF ) ;
<a href="http://lawlibrary.rutgers.edu/cgi-bin/getlink.cgi?cite=$search">
$cite</a>
HREF
}
__DATA__
<p><p> The U.S. Supreme Court formerly resorted to
various fictions, such as implied consent. <u>E.g.</u>, <u>Hess v.
Pawloski</u>, 274<u>U.S.</u> 352, 47 <u>S. Ct.</u> 632, 71 <u>L. Ed.</u> 1091
(1927). Eventually, in<u>International Shoe Co. v. Washington</u>, 326
<u>U.S.</u> 310, 66 <u>S. Ct.</u>154, 90 <u>L. Ed.</u> 95 (1945), the Court
cast those fictions aside andheld that a state court's assertion of personal
jurisdiction doesnot violate the Due Process Clause if the defendant has
"certainminimum contacts with it such that the maintenance of the
suitdoes not offend `traditional notions of fair play and
substantialjustice.'" 326 <u>U.S.</u> at 316, 66 <u>S. Ct.</u> at 158,
90 <u>L. Ed.</u> at 102(quoting <u>Milliken v. Meyer</u>, 311 <u>U.S.</u> 457,
463, 61 <u>S. Ct.</u> 339,343, 85 <u>L. Ed.</u> 278, 283 (1940)). The
concomitant understandingof legislative jurisdiction was similarly modified.
<br><p><p>
<p><p> The U.S. Supreme Court formerly resorted to
various fictions, such as implied consent. <u>E.g.</u>, <u>Hess v.
Pawloski</u>, 123<u>U.S.</u> 28, 284, 47 <u>S. Ct.
</u> 632, 71 <u>L. Ed.</u> 1091
(1927). Eventually, in<u>International Shoe Co. v. Washington</u>, 326
<u>U.S.</u> 310, 66
<u>S. Ct.</u>154, 90 <u>L. Ed.</u> 95 (1945), the Court
cast those fictions aside andheld that a state court's assertion of personal
jurisdiction doesnot violate the Due Process Clause if the defendant has
"certainminimum contacts with it such that the maintenance of the
suitdoes not offend `traditional notions of fair play and
substantialjustice.'" 326 <u>U.
S.</u> at 316, 66 <u>S. Ct.</u> at 158, 18,
90 <u>L. Ed.</u> at 102(quoting <u>Milliken v. Meyer</u>, 311 <u>U.S.</u> 457,
463, 61 <u>S. Ct.</u> 339,343, 85 <u>L. Ed.</u> 278, 283 (1940)). The
concomitant understandingof legislative jurisdiction was similarly modified.
<br><p><p>
hth,
uri
--
Uri Guttman Fast Engines -- The Leader in Fast CGI Technology
uri@fastengines.com http://www.fastengines.com
------------------------------
Date: Wed, 25 Nov 1998 00:04:40 GMT
From: dturley@pobox.com
Subject: Re: Returning to a previous URL from a CGI script
Message-Id: <73fhia$goc$1@nnrp1.dejanews.com>
In article <73ehpj$e494@shark.ncr.pwgsc.gc.ca>,
"Brian Gaber" <brian.gaber@pwgsc.gc.ca> wrote:
> How can I create a link at the end of my CGI script for the user to go back
> two URLs (i.e. not the CGI calling URL, but the one before that). Many
> URL's can lead to the CGI calling URL.
Try JavaScript:
# using CGI.pm
print a({href=>'javascript:history.go(-2);'},'Go back 2.');
This creates a link that jumps back two entries in the browser's history.
Of course, the browser must support JavaScript.
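As a rough sketch, the same link can be built as a plain string when CGI.pm is not loaded. The sub name back_link is made up for illustration.

```perl
use strict;

# hypothetical helper: anchor that steps back $steps entries in the history
sub back_link {
    my ($steps) = @_;
    return qq{<a href="javascript:history.go(-$steps);">Go back $steps.</a>};
}

print back_link(2), "\n";
```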
david
-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
------------------------------
Date: Wed, 25 Nov 1998 00:46:22 GMT
From: Tom Phoenix <rootbeer@teleport.com>
Subject: Re: sending patterns for substitutes as a parameter to a function
Message-Id: <Pine.GSO.4.02A.9811241643020.4375-100000@user2.teleport.com>
On Mon, 23 Nov 1998, Ori Aruj wrote:
> I've had some problems sending patterns for the function s/// as
> parameters.
It's tricky, until you get the quoting right.
> the main problems are:
>
> the $1,...,$10 variables - the interpolating time is too early or to
> late
Now, are you talking about the pattern or the replacement string? The FAQ
has information on how to do this kind of repeated interpolation.
> s/\[(\d+)\]/_$1/
>
> s/\[(\d)\-(\d*)\]/_$1_$2/
Nothing wrong there, but you're not passing any patterns as parameters.
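One possible way to pass both halves of a substitution to a function, sketched under assumptions: the pattern travels as a compiled qr// object and the replacement as a code reference, so $1 and $2 are only read after each match. The sub name substitute_all is made up for illustration, and this is not necessarily the approach the FAQ describes.

```perl
use strict;

# hypothetical helper: apply $pattern everywhere, building each
# replacement by calling $replacer with the first two captures
sub substitute_all {
    my ($text, $pattern, $replacer) = @_;
    $text =~ s/$pattern/$replacer->($1, $2)/ge;
    return $text;
}

print substitute_all('a[12]', qr/\[(\d+)\]/,
                     sub { '_' . $_[0] }), "\n";
print substitute_all('b[3-45]', qr/\[(\d)-(\d*)\]/,
                     sub { '_' . $_[0] . '_' . $_[1] }), "\n";
```

Using a code reference avoids the interpolation-timing problem, because nothing in the replacement is evaluated until the match has set the capture variables.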
Cheers!
--
Tom Phoenix Perl Training and Hacking Esperanto
Randal Schwartz Case: http://www.rahul.net/jeffrey/ovs/
------------------------------
Date: Wed, 25 Nov 1998 00:28:41 GMT
From: Tom Phoenix <rootbeer@teleport.com>
Subject: Re: should `-w' become the default?
Message-Id: <Pine.GSO.4.02A.9811241626150.4375-100000@user2.teleport.com>
On Sat, 21 Nov 1998, Russell Schulz wrote:
> Subject: should `-w' become the default?
Whether it should or not, it's not gonna, so this discussion is moot.
> so why not enable it by default, and have a separate switch the
> experts can use (if they must) to disable it?
I think Randal answered that well in what you quoted:
> > There are many of my one-off programs that are not -w clean. Some are
> > not even "use strict" clean. Perl is a tool to get the job done. Some
> > parts of Perl prevent some kinds of debugging, and are therefore
> > redundant with some parts of my 29+ years of programming experience.
> > I don't need to put training wheels on my motorcycle, thank you.
> how much grief would it cause you to have to add a switch to these
> invocations to shut off the warnings?
Lots of grief. If we implement this, we'll change perlbug to send the bug
reports to your email address. :-)
--
Tom Phoenix Perl Training and Hacking Esperanto
Randal Schwartz Case: http://www.rahul.net/jeffrey/ovs/
------------------------------
Date: Tue, 24 Nov 1998 22:52:57 GMT
From: bkraymond@geocities.com
Subject: simple problem
Message-Id: <73fdc2$crr$1@nnrp1.dejanews.com>
Let's say I have two variables: $name = "My Name" and $age = 21. How could I
save these to a file, then in future get the data back from the file and store
it in the original variables $name and $age? (I don't actually need to
print them to the screen.) ThanQ in advance.
------------------------------
Date: Tue, 24 Nov 1998 23:07:40 GMT
From: mcafee@waits.facilities.med.umich.edu (Sean McAfee)
Subject: Re: simple problem
Message-Id: <0%G62.1284$CY1.5249945@news.itd.umich.edu>
In article <73fdc2$crr$1@nnrp1.dejanews.com>, <bkraymond@geocities.com> wrote:
>Let's say I have two variables; $name = My Name and $age=21 How could a
>save this to a file. Then in future get this data from the file and store
>them in the original variable names $name and $age. (I don't actually need to
>print them to the screen.) ThanQ in advance.
To store the data to a file:
use Data::Dumper;
open(FILE, ">myfile") || die "Can't open file myfile: $!\n";
print FILE Data::Dumper->Dump([$name, $age], [qw(name age)]);
close(FILE);
To retrieve the data from the file (the leading ./ keeps do from
searching @INC for it):
do "./myfile";
Not exactly pretty, but it gets the job done.
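A round-trip sketch of the Data::Dumper approach, under a few assumptions: the scratch file name is made up, and $name and $age are package variables (declared with our), since a file run with do cannot see my variables in the calling code.

```perl
use strict;
use Data::Dumper;

# package variables, because do FILE cannot see 'my' lexicals
our ($name, $age) = ('My Name', 21);

my $file = "./dumper_demo_$$.pl";    # hypothetical scratch file

open(my $out, '>', $file) or die "Can't open $file: $!\n";
print {$out} Data::Dumper->Dump([$name, $age], [qw(name age)]);
close($out) or die "Can't close $file: $!\n";

($name, $age) = ();                  # forget the values

# do FILE runs the dumped assignments and restores both variables
defined(do $file) or die "Can't reload $file: ", $@ || $!, "\n";
unlink $file;

print "$name is $age\n";
```

Since do executes whatever perl code the file contains, this is only reasonable for files the program wrote itself.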
--
Sean McAfee | GS d->-- s+++: a26 C++ US+++$ P+++ L++ E- W+ N++ |
| K w--- O? M V-- PS+ PE Y+ PGP?>++ t+() 5++ X+ R+ | mcafee@
| tv+ b++ DI++ D+ G e++>++++ h- r y+>++** | umich.edu
------------------------------
Date: 24 Nov 1998 18:37:08 -0600
From: ben@sofnet.com (BenJamin Prater)
Subject: Re: simple problem
Message-Id: <365b4f8d.1497058@news.sofnet.com>
Sean's idea is good, but might be hard for you to follow if you are
new to perl.
There are a million ways to save data, but for simplicity's sake, just
write to a file and separate the items, such as:
open FILE, ">$file" or die $!;
print FILE "$name||$age";
close FILE or die $!;
The file will now look like:
My Name||21
to read it back:
open FILE, "<$file" or die $!;
$first_line = <FILE>;
chomp $first_line;
($name, $age) = split /\|\|/, $first_line;
close FILE or die $!;
Very simple, but understandable, I hope.
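The same idea wrapped into a pair of subs, as a sketch. The names save_record and load_record and the scratch file name are made up for illustration, and it assumes no field ever contains the || separator or a newline.

```perl
use strict;

# hypothetical helper: write the fields on one line, joined by ||
sub save_record {
    my ($file, @fields) = @_;
    open(my $out, '>', $file) or die "Can't write $file: $!\n";
    print {$out} join('||', @fields), "\n";
    close($out) or die "Can't close $file: $!\n";
}

# hypothetical helper: read the first line back and split it apart
sub load_record {
    my ($file) = @_;
    open(my $in, '<', $file) or die "Can't read $file: $!\n";
    my $line = <$in>;
    close($in);
    die "$file is empty\n" unless defined $line;
    chomp $line;
    return split /\|\|/, $line, -1;
}

my $file = "record_demo_$$.txt";     # hypothetical scratch file
save_record($file, 'My Name', 21);
my ($name, $age) = load_record($file);
unlink $file;

print "$name is $age\n";
```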
Ben
On Tue, 24 Nov 1998 22:52:57 GMT, bkraymond@geocities.com wrote:
>Let's say I have two variables; $name = My Name and $age=21 How could a
>save this to a file. Then in future get this data from the file and store
>them in the original variable names $name and $age. (I don't actually need to
>print them to the screen.) ThanQ in advance.
------------------------------
Date: Wed, 25 Nov 1998 00:41:48 GMT
From: Tom Phoenix <rootbeer@teleport.com>
Subject: Re: sysread / need help w/ single byte I/O, urgent! please!
Message-Id: <Pine.GSO.4.02A.9811241639130.4375-100000@user2.teleport.com>
On 23 Nov 1998, Ralph Forsythe wrote:
> I need to read the bytes one-at-a-time and do ord() conversions on
> them - but when I try this it just hangs the program.
What hangs? Reading bytes, doing ord, something else?
> MSB$ = INPUT$ (1, #1)
> LSB$ = INPUT$ (1, #1)
> reading1 = (ASC(MSB$)*256) + ASC(LSB$)
Maybe you should write this in Perl. I don't know what language this is,
and I don't know what it's supposed to do. But maybe you want the read()
function, or getc(), documented in perlfunc. Hope this helps!
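A guess at what the quoted snippet might look like in Perl, assuming it reads two bytes and combines them as a big-endian 16-bit value (most significant byte first). The scratch file and its two test bytes are made up so the sketch can run on its own.

```perl
use strict;

my $file = "bytes_demo_$$.bin";      # hypothetical scratch file

# write two known bytes so there is something to read back
open(my $out, '>', $file) or die "Can't write $file: $!\n";
binmode($out);
print {$out} chr(0x12), chr(0x34);
close($out) or die "Can't close $file: $!\n";

open(my $in, '<', $file) or die "Can't read $file: $!\n";
binmode($in);                        # no newline translation on the bytes

my ($msb, $lsb);
(read($in, $msb, 1) || 0) == 1 or die "short read on first byte\n";
(read($in, $lsb, 1) || 0) == 1 or die "short read on second byte\n";
close($in);
unlink $file;

# most significant byte first, as in the quoted snippet
my $reading = ord($msb) * 256 + ord($lsb);
print "$reading\n";
```

If the input is a device or pipe rather than a file, read() will block until a byte arrives, which may be what looks like a hang.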
--
Tom Phoenix Perl Training and Hacking Esperanto
Randal Schwartz Case: http://www.rahul.net/jeffrey/ovs/
------------------------------
Date: Tue, 24 Nov 1998 23:08:50 GMT
From: Tom Phoenix <rootbeer@teleport.com>
Subject: Re: Trouble with Perl-Postres
Message-Id: <Pine.GSO.4.02A.9811241507500.4375-100000@user2.teleport.com>
On Fri, 20 Nov 1998, Tester wrote:
> Newsgroups: comp.lang.perl, comp.lang.perl.misc
If your news server still lists comp.lang.perl as an active newsgroup,
replace your news admin. That group is defunct. When it was active,
Kevin Costner was still thought of as a promising film director.
> Can't load '/usr/lib/perl5/site_perl/i686-linux/auto/Postgres/Postgres.so'
> for m
> odule Postgres: File not found at
> /usr/lib/perl5/i686-linux/5.00401/DynaLoader.p
> m line 155.
Have you seen what the perldiag manpage says about this message? You need
to (re-)install Postgres properly on your system. Good luck!
--
Tom Phoenix Perl Training and Hacking Esperanto
Randal Schwartz Case: http://www.rahul.net/jeffrey/ovs/
------------------------------
Date: 12 Jul 98 21:33:47 GMT (Last modified)
From: Perl-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Special: Digest Administrivia (Last modified: 12 Mar 98)
Message-Id: <null>
Administrivia:
Special notice: in a few days, the new group comp.lang.perl.moderated
should be formed. I would rather not support two different groups, and I
know of no other plans to create a digested moderated group. This leaves
me with two options: 1) keep on with this group 2) change to the
moderated one.
If you have opinions on this, send them to
perl-users-request@ruby.oce.orst.edu.
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.
The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address; I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V8 Issue 4291
**************************************