[10900] in Perl-Users-Digest
Perl-Users Digest, Issue: 4501 Volume: 8
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Dec 27 13:08:28 1998
Date: Sun, 27 Dec 98 10:00:19 -0800
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 27 Dec 1998 Volume: 8 Number: 4501
Today's topics:
ANNOUNCE: HTML::FromText <garethr@cre.canon.co.uk>
ANNOUNCE: Parse-Yapp-0.21, Christmas Release <desar@club-internet.fr>
ANNOUNCE: Statistics::MaxEntropy v0.9 <terdoest@cs.utwente.nl>
ANNOUNCE: v1998.1204 Squeeze.pm -- Shorten text to pagers and GSM phones (Jari Aalto+mail.perl)
Re: Basic Perl DOS/Win95 + WWW + CGI course for Newbies <mlabor@sprintmail.com>
Re: Get Title <gellyfish@btinternet.com>
Re: get webpage with perl <gellyfish@btinternet.com>
Java/Perl Tool Available as Open Source Software <silver@oreilly.com>
Makepatch version 2.00 released (Johan Vromans)
Re: mkdir and -p <tchrist@mox.perl.com>
News::Newsrc 1.07 released (Steven W McDougall)
Set::IntSpan 1.07 released (Steven W McDougall)
Special: Digest Administrivia (Last modified: 12 Dec 98) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 27 Dec 1998 17:02:23 GMT
From: Gareth Rees <garethr@cre.canon.co.uk>
Subject: ANNOUNCE: HTML::FromText
Message-Id: <765p6v$36o$1@play.inetarena.com>
See http://www.perl.com/CPAN/authors/id/G/GD/GDR/HTML-FromText-1.000.tar.gz
NAME
HTML::FromText - flexibly mark up plain text as HTML
SYNOPSIS
use HTML::FromText 'text2html';
print text2html($text, paras => 1, urls => 1);
DESCRIPTION
The function `text2html' converts plain text to HTML. It can
apply the following transformations (each transformation is
selected by passing the appropriate flag as an argument):
* Turn HTML metacharacters into HTML entities.
* Spot URLs and convert them to links.
* Spot e-mail addresses and convert them to `mailto:' links.
* Preserve line breaks.
* Expand tabs and preserve spaces throughout the text.
* Mark up words surrounded with *asterisks* as bold.
* Mark up words surrounded with _underscores_ as underlined.
* Format the text as paragraphs.
* Spot paragraphs where every line begins with whitespace, and
mark them up as block quotes.
* Spot bulleted paragraphs and mark them up as an unordered
list.
* Spot numbered paragraphs and mark them up as an ordered list.
* Spot headings (paragraphs starting with numbers) and mark them
up as headings of the appropriate level.
* Format the first paragraph of the text as a first-level
heading.
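As a rough illustration of two of these transformations, here is a minimal pure-Perl sketch that escapes HTML metacharacters and marks up *bold* and _underlined_ words. The function name simple_markup is hypothetical; it is not part of HTML::FromText, and the module's own text2html handles far more cases (URLs, paragraphs, lists, headings).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::FromText: escape HTML
# metacharacters, then mark up *bold* and _underlined_ words.
sub simple_markup {
    my ($text) = @_;

    # Escape & first so the entities added below are not re-escaped.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;

    # *word* -> <b>word</b>
    $text =~ s{\*(\w+)\*}{<b>$1</b>}g;
    # _word_ -> <u>word</u> (only when the underscores delimit a word)
    $text =~ s{\b_([[:alnum:]]+)_\b}{<u>$1</u>}g;

    return $text;
}

# prints: Use <b>bold</b> &amp; <u>underline</u> &lt;here&gt;
print simple_markup('Use *bold* & _underline_ <here>'), "\n";
```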
INSTALLATION
perl Makefile.PL && make && make test && make install
BUGS
* There are lots of transformations it doesn't do.
--
Gareth Rees
------------------------------
Date: 27 Dec 1998 17:01:48 GMT
From: Francois Desarmenien <desar@club-internet.fr>
Subject: ANNOUNCE: Parse-Yapp-0.21, Christmas Release
Message-Id: <765p5s$36e$1@play.inetarena.com>
I'm pleased to announce that Parse-Yapp-0.21 (Christmas Release) has been
uploaded to CPAN. It should be available soon on your nearest CPAN mirror.
Note that beginning with version 0.20, Parse::Yapp is no longer alpha
software: it has been promoted to beta.
Enjoy, and merry Christmas!
François Désarménien
-----------------------------------------------------------------------------
Parse::Yapp - Parse::Yapp Yet Another Perl Parser compiler
Compiles yacc-like LALR grammars to generate Perl OO parser modules.
COPYRIGHT
(c) 1998 Francois Desarmenien, all rights reserved.
(see the Copyright section in Yapp.pm for usage and distribution rights)
IMPORTANT NOTES
THIS IS BETA SOFTWARE.
Though it has been tested a lot, there are probably bugs in it ;-)
The BETA status does not reflect the quality of the code, but possible
changes in the generated parser modules.
I need FEEDBACK on every problem or bug you encounter so I can fix
them in the next release. Comments are welcome too.
But I also need FEEDBACK if you use it and it works fine, so I can move
on to production releases. Just drop me a mail.
The Parse::Yapp pod section is the main documentation and it assumes
you already have a good knowledge of yacc. If not, I suggest the GNU
Bison manual which is a very good tutorial to LALR parsing and yacc's
grammar syntax.
The documentation is only a draft and should be rewritten (I think).
Any help on this issue would be very welcome.
DESCRIPTION
This is the beta release 0.21 of the Parse::Yapp parser generator.
It lets you create Perl OO fully reentrant LALR(1) parser
modules (see the Yapp.pm pod pages for more details) and has
been designed to be functionally as close as possible to yacc,
while using the full power of Perl and remaining open to enhancements.
REQUIREMENTS
Requires perl5.004 or better :)
It is written only in Perl, with standard distribution modules,
so you need neither a compiler nor any special modules.
INSTALLATION
perl Makefile.PL
make
make test
make install
WARRANTY
This software comes with absolutely NO WARRANTY of any kind.
I just hope it can be useful.
FEEDBACK
Send feedback, comments and bug reports to:
Francois Desarmenien
desar@club-internet.fr
------------------------------
Date: 27 Dec 1998 17:03:39 GMT
From: Hugo ter Doest <terdoest@cs.utwente.nl>
Subject: ANNOUNCE: Statistics::MaxEntropy v0.9
Message-Id: <765p9b$374$1@play.inetarena.com>
CHANGES:
- Now has own support for sparse vectors, no longer requires
Bit::Vector, and no longer supports it!
- Added support for the (Abney 1997) Newton estimation.
- Enumeration of event space no longer stored in memory.
README:
NAME
MaxEntropy - Perl5 module for Maximum Entropy Modeling and
Feature Induction
SYNOPSIS
use Statistics::MaxEntropy;
# debugging messages; default 0
$Statistics::MaxEntropy::debug = 0;
# maximum number of iterations for Newton's method; default 100
$Statistics::MaxEntropy::NEWTON_max_it = 100;
# minimal distance between new and old x for Newton's method;
# default 0.001
$Statistics::MaxEntropy::NEWTON_min = 0.001;
# maximum number of iterations for IIS; default 100
$Statistics::MaxEntropy::KL_max_it = 100;
# minimal difference between successive KL divergences; default 0.001
$Statistics::MaxEntropy::KL_min = 0.001;
# the size of Monte Carlo samples; default 1000
$Statistics::MaxEntropy::SAMPLE_size = 1000;
# creation of a new event space from an events file
$events = Statistics::MaxEntropy::new($file);
# Generalised Iterative Scaling, "corpus" means no sampling
$events->scale("corpus", "gis");
# Improved Iterative Scaling, "mc" means Monte Carlo sampling
$events->scale("mc", "iis");
# Feature Induction algorithm, also see Statistics::Candidates POD
$candidates = Statistics::Candidates->new($candidates_file);
$events->fi("iis", $candidates, $nr_to_add, "mc");
# writing new events, candidates, and parameters files
$events->write($some_other_file);
$events->write_parameters($file);
$events->write_parameters_with_names($file);
# dump/undump the event space to/from a file
$events->dump($file);
$events->undump($file);
DESCRIPTION
This module is an implementation of the Generalised and Improved
Iterative Scaling (GIS, IIS) algorithms and the Feature
Induction (FI) algorithm as defined in (Darroch and Ratcliff
1972) and (Della Pietra et al. 1997). The purpose of the scaling
algorithms is to find the maximum entropy distribution given a
set of events and (optionally) an initial distribution. Also a
set of candidate features may be specified; then the FI
algorithm may be applied to find and add the candidate
feature(s) that give the largest `gain' in terms of Kullback
Leibler divergence when it is added to the current set of
features.
Events are specified in terms of a set of feature functions
(properties) f_1...f_k that map each event to {0,1}: an event is
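a string of bits. In addition, the frequency of each event is
given. We assume the event space to have a probability
distribution that can be described by
p(x) = 1/Z e^{sum_i alpha_i f_i(x)}
where Z = sum_x e^{sum_i alpha_i f_i(x)}, taken over the whole event
space, normalises the probabilities so that they sum to 1.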
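To make the formula concrete, here is a small sketch that computes p(x) for every event of a completely enumerated event space, with Z taken as the sum of exp(sum_i alpha_i f_i(x)) over all events. It assumes events are bit strings read right to left (f_1 is the rightmost bit, as in the FILE SYNTAX section) and uses made-up parameter values; the subroutine names weight and distribution are hypothetical, not part of the module's interface.

```perl
use strict;
use warnings;

# exp(sum_i alpha_i f_i(x)) for one event given as a bit string.
# Bits are read right to left, so f_1 is the rightmost character.
sub weight {
    my ($bits, $alpha) = @_;          # $alpha->[0] holds alpha_1
    my @f   = reverse split //, $bits;
    my $sum = 0;
    $sum += $alpha->[$_] * $f[$_] for 0 .. $#f;
    return exp($sum);
}

# p(x) = weight(x) / Z for every x in the complete space {0,1}^n,
# where Z is the sum of all weights.
sub distribution {
    my ($alpha) = @_;
    my $n     = @$alpha;
    my @space = map { sprintf "%0${n}b", $_ } 0 .. 2**$n - 1;
    my %w     = map { ($_ => weight($_, $alpha)) } @space;
    my $Z     = 0;
    $Z += $_ for values %w;
    my %p = map { ($_ => $w{$_} / $Z) } @space;
    return \%p;
}

# Two features with made-up parameters alpha_1 = 0.5, alpha_2 = -1.
my $p = distribution([0.5, -1.0]);
printf "%s  %.4f\n", $_, $p->{$_} for sort keys %$p;
```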
The module requires the `Bit::SparseVector' module by Steffen
Beyer and the `Data::Dumper' module by Gurusamy Sarathy. Both
can be obtained from CPAN just like this module.
CONFIGURATION VARIABLES
`$Statistics::MaxEntropy::debug'
If set to `1', lots of debug information and intermediate
results will be output. Default: `0'.
`$Statistics::MaxEntropy::NEWTON_max_it'
Sets the maximum number of iterations in Newton's method.
Newton's method is applied to find the new parameters
`alpha_i' of the features `f_i'. Default: `100'.
`$Statistics::MaxEntropy::NEWTON_min'
Sets the minimum difference between x' and x in Newton's
method (used for computing parameter updates in IIS); if
either the maximum number of iterations is reached or the
difference between x' and x is small enough, the iteration
is stopped. Default: `0.001'. Sometimes features have
Infinity or -Infinity as a solution; these features are
excluded from future iterations.
`$Statistics::MaxEntropy::KL_max_it'
Sets the maximum number of iterations applied in the IIS
algorithm. Default: `100'.
`$Statistics::MaxEntropy::KL_min'
Sets the minimum difference between KL divergences of two
distributions in the IIS algorithm; if either the maximum
number of iterations is reached or the difference between
the divergences is small enough, the iteration is stopped.
Default: `0.001'.
`$Statistics::MaxEntropy::SAMPLE_size'
Determines the number of (unique) events a sample should
contain. Only meaningful if "mc" is selected for sampling
(see below). Its default is `1000'.
METHODS
`new'
$events = Statistics::MaxEntropy::new($events_file);
A new event space is created, and the events are read from
`$events_file'. The events file is required; its syntax is
described in the section on "FILE SYNTAX".
`write'
$events->write($file);
Writes the events to a file. Its syntax is described in the
section on "FILE SYNTAX".
`scale'
$events->scale($sample, $scaler);
If `$scaler' equals `"gis"', the Generalised Iterative
Scaling algorithm (Darroch and Ratcliff 1972) is applied on
the event space; if `$scaler' equals `"iis"', the Improved
Iterative Scaling Algorithm (Della Pietra et al. 1997) is
used. If `$sample' is `"corpus"', no sampling is done
to re-estimate the parameters (the events previously read
are considered a good sample); if it equals `"mc"', Monte
Carlo (Metropolis-Hastings) sampling is performed to obtain
a random sample; if `$sample' is `"enum"', the complete event
space is enumerated.
`fi'
$events->fi($scaler, $candidates, $nr_to_add, $sampling);
Calls the Feature Induction algorithm. The parameter
`$nr_to_add' is the number of candidates it should add.
If this number is greater than the number of candidates, all
candidates are added. Meaningful values for `$scaler' are
`"gis"' and `"iis"'; default is `"gis"' (see previous item).
`$sampling' should be one of `"corpus"', `"mc"', `"enum"'.
`$candidates' should be in the `Statistics::Candidates'
class:
$candidates = Statistics::Candidates->new($file);
See the Statistics::Candidates manpage.
`write_parameters'
$events->write_parameters($file);
`write_parameters_with_names'
$events->write_parameters_with_names($file);
`dump'
$events->dump($file);
`$events' is written to `$file' using `Data::Dumper'.
`undump'
$events = Statistics::MaxEntropy->undump($file);
The contents of file `$file' are read and eval'ed into
`$events'.
FILE SYNTAX
Lines that start with a `#' and empty lines are ignored.
Below we give the syntax of the input and output files.
EVENTS FILE (input/output)
Syntax of the event file (`n' features, and `m' events); the
following holds for features:
* each line is an event;
* each column represents a feature function; the co-domain of a
feature function is {0,1};
* no space between feature columns;
* constant features (i.e. columns that are completely 0 or 1) are
forbidden;
* 2 or more events should be specified (this is in fact a
consequence of the previous requirement).
The frequency of each event precedes the feature columns.
Features are indexed from right to left. This is a consequence
of how `Bit::SparseVector' reads bit strings. Each `f_ij' is a
bit and `freq_i' an integer in the following schema:
name_n <tab> name_n-1 ... name_2 <tab> name_1 <newline>
freq_1 <white> f_1n ... f_13 f_12 f_11 <newline>
. .
. .
. .
freq_i <white> f_in ... f_i3 f_i2 f_i1 <newline>
. .
. .
. .
freq_m <white> f_mn ... f_m3 f_m2 f_m1
(`m' events, `n' features) The feature names are separated by
tabs, not white space. The line containing the feature names
will be split on tabs; this implies that (non-tab) white space
may be part of the feature names.
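A reader for the events-file layout above might look like the following sketch. The subroutine read_events is hypothetical (not the module's own parser); it assumes the feature-name line is the first non-comment line and that each event line is an integer frequency, white space, and a string of 0/1 bits.

```perl
use strict;
use warnings;

# Hypothetical reader for the events-file layout (not the module's
# own code). Returns feature names in index order (name_1 first) and
# a list of [freq, [f_1 .. f_n]] pairs.
sub read_events {
    my ($fh) = @_;
    my (@names, @events);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line =~ /^\s*$/;
        if (!@names) {
            # Names are tab-separated and listed name_n ... name_1.
            @names = reverse split /\t/, $line;
            next;
        }
        my ($freq, $bits) = $line =~ /^(\d+)\s+([01]+)\s*$/
            or die "malformed event line: $line\n";
        die "expected ", scalar(@names), " features: $line\n"
            unless length($bits) == @names;
        # Features are indexed right to left: f_1 is the last bit.
        push @events, [ $freq, [ reverse split //, $bits ] ];
    }
    return (\@names, \@events);
}

# A feature name may contain (non-tab) white space, e.g. "has space".
my $data = "# toy example\nhas space\tis_long\n3 10\n5 01\n";
open my $fh, '<', \$data or die $!;
my ($names, $events) = read_events($fh);
print "f_1 = $names->[0]\n";
printf "freq %d: f_1=%d f_2=%d\n", $_->[0], @{ $_->[1] } for @$events;
```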
PARAMETERS FILE (input/output)
Syntax of the initial parameters file; one parameter per line:
par_1 <newline>
.
.
.
par_i <newline>
.
.
.
par_n
The syntax of the output distribution is the same. The
alternative procedure for saving parameters to a file
`write_parameters_with_names' writes files that have the
following syntax:
n <newline>
name_1 <tab> par_1 <newline>
.
.
.
name_i <tab> par_i <newline>
.
.
.
name_n <tab> par_n <newline>
bitmask
where bitmask can be used to tell other programs which features
to use in computing probabilities. Features that were ignored
during scaling, or because they are constant functions, receive a
`0' bit.
DUMP FILE (input/output)
A dump file contains the event space (which is a hash blessed
into class `Statistics::MaxEntropy') as a Perl expression that
can be evaluated with eval.
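The dump/undump round trip can be sketched with Data::Dumper and eval as follows. This is an assumption about the mechanism, not the module's code: the helper names and the hash fields of the toy event space are hypothetical, and the sketch works on a string rather than a file.

```perl
use strict;
use warnings;
use Data::Dumper;

# Write a (blessed) data structure as a Perl expression ...
sub dump_to_string {
    my ($obj) = @_;
    local $Data::Dumper::Terse    = 1;   # bare expression, no "$VAR1 = "
    local $Data::Dumper::Sortkeys = 1;   # stable key order
    return Dumper($obj);
}

# ... and rebuild it with eval, as the undump step does.
sub undump_from_string {
    my ($code) = @_;
    my $obj = eval $code;
    die "cannot undump: $@" if $@;
    return $obj;
}

# Toy "event space"; the field names are made up for illustration.
my $space = bless { n => 3, freq => [4, 1] }, 'Statistics::MaxEntropy';
my $text  = dump_to_string($space);
my $copy  = undump_from_string($text);
print $text;
print ref($copy), " with ", scalar @{ $copy->{freq} }, " events\n";
```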
BUGS
It's slow.
SEE ALSO
the perl(1) manpage, the Statistics::Candidates manpage, the
Statistics::SparseVector manpage, the Bit::Vector manpage, the
Data::Dumper manpage, the POSIX manpage, the Carp manpage.
DIAGNOSTICS
The module dies with an appropriate message if
* it cannot open a specified events file;
* you specified a constant feature function (in the events file
or the candidates file);
* the events file, candidates file, or the parameters file is
not consistent; possible causes are (among others): insufficient or
too many features for some event; inconsistent candidate
lines; insufficient or too many event lines in the
candidates file.
The module captures `SIGQUIT' and `SIGINT'. On a `SIGINT'
(typically <CONTROL-C>) it will dump the current event space(s)
and die. If a `SIGQUIT' (<CONTROL-BACKSLASH>) occurs, it dumps
the current event space as soon as possible after the first
iteration it finishes.
REFERENCES
(Abney 1997)
Steven P. Abney, Stochastic Attribute-Value Grammars,
Computational Linguistics 23(4), 1997.
(Darroch and Ratcliff 1972)
J. Darroch and D. Ratcliff, Generalised Iterative Scaling
for log-linear models, Ann. Math. Statist., 43, 1470-1480,
1972.
(Jaynes 1983)
E.T. Jaynes, Papers on probability, statistics, and
statistical physics. Ed.: R.D. Rosenkrantz. Kluwer Academic
Publishers, 1983.
(Jaynes 1997)
E.T. Jaynes, Probability theory: the logic of science, 1997,
unpublished manuscript.
`URL:http://omega.math.albany.edu:8008/JaynesBook.html'
(Della Pietra et al. 1997)
Stephen Della Pietra, Vincent Della Pietra, and John
Lafferty, Inducing features of random fields, IEEE
Transactions on Pattern Analysis and Machine Intelligence,
19(4), April 1997.
VERSION
Version 0.8.
AUTHOR
Hugo WL ter Doest, terdoest@cs.utwente.nl
COPYRIGHT
`Statistics::MaxEntropy' comes with ABSOLUTELY NO WARRANTY and
may be copied only under the terms of the GNU Library General
Public License (version 2, or later), which may be found in the
distribution.
------------------------------
Date: 27 Dec 1998 17:03:28 GMT
From: jari.aalto@poboxes.com (Jari Aalto+mail.perl)
Subject: ANNOUNCE: v1998.1204 Squeeze.pm -- Shorten text to pagers and GSM phones
Message-Id: <765p90$373$1@play.inetarena.com>
What's New: Variable SQZ_OPTIMIZE_LEVEL
Title
ANNOUNCE: v1998.1204 Squeeze.pm -- Shorten text to minimum syllables
The version number is based on date format YYYY.MMDD
Download
Home page:
(eg. ftp://ftp.funet.fi/pub/languages/perl/CPAN/)
CPAN//modules/by-module/Lingua/
Perl language interpreter pointers at (Win32/Unix etc.)
Perl: http://language.perl.com/info/software.html
Description
A module that I use to compress text from email before it is
sent to my cellular phone. If you have a pager, you know how
tight the space is and that every character saved is a plus.
A shortened POD page follows. The module's interface functions
and interface variables are not included in this announcement.
I would welcome more text compression rules, so feel free to
suggest more hash entries like:
WORD => CONVERSION
MULTI WORD => CONVERSION
NAME
Squeeze.pm - Shorten text to minimum syllables by using hash and vowel
deletion
REVISION
$Id: Squeeze.pm,v 1.24 1998/10/08 14:58:15 jaalto Exp $
SYNOPSIS
use Squeeze;                # import only the default function
use Squeeze qw( :ALL ); # import all functions and variables
use English;
while (<>)
{
print SqueezeText $ARG;
}
DESCRIPTION
Squeeze English text into the most compact format possible, so that it is
barely readable. You should convert all text to lowercase for maximum
compression, because the optimizations have been designed mostly for
lowercase letters.
`Warning: Each line is processed multiple times, so expect slow
conversion times'
You can use this module e.g. to preprocess text before it is sent to
electronic media that have a maximum text size limit. For example,
pagers have an arbitrary text size limit, typically 200 characters,
which you want to fill as much as possible. Alternatively, you may have a
GSM cellular phone which is capable of receiving Short Messages (SMS),
whose message size limit is 160 characters. To demonstrate this
module's SqueezeText() function, the description text of this paragraph
has been converted below. See for yourself whether it's readable (yes, it
takes some time to get used to). The compression ratio is typically 30-40%.
u _n use thi mod e.g. to prprce txt bfre i_s snt to
elrnic mda has som max txt siz lim. f_xmple pag
hv abitry txt siz lim, tpcly 200 chr, W/ u wnt
to fll as mch as psbleAlternatvly u may hv GSM cllar P8
w_s cpble of rcivng Short msg (SMS), WS/ msg siz
lim is 160 chr. 4 demonstrton of thi mods SquezText
fnc , dsc txt of thi prgra has ben cnvd_ blow
See uself if i_s redble (Yes, it tak som T to get usdto
compr rat is tpcly 30-40
And with $SQZ_OPTIMIZE_LEVEL set to a non-zero value:
u_nUseThiModE.g.ToPrprceTxtBfreI_sSntTo
elrnicMdaHasSomMaxTxtSizLim.F_xmplePag
hvAbitryTxtSizLim,Tpcly200Chr,W/UWnt
toFllAsMchAsPsbleAlternatvlyUMayHvGSMCllarP8
w_sCpbleOfRcivngShortMsg(SMS),WS/MsgSiz
limIs160Chr.4DemonstrtonOfThiModsSquezText
fnc,DscTxtOfThiPrgraHasBenCnvd_Blow
SeeUselfIfI_sRedble(Yes,ItTakSomTToGetUsdto
comprRatIsTpcly30-40
A comparison of the two:
Original text : 627 characters
Level 0 : 433 characters reduction 31 %
Level 1 : 345 characters reduction 45 % (+14 improvement)
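The level-1 output above looks as if the blanks between words are removed and the first character of each following word is capitalised. The sketch below reproduces that one step only; it is inferred from the sample, not taken from Squeeze.pm, and join_words is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the level-1 step, inferred from the
# sample output: drop the blanks between words and capitalise the
# first character of each following word.
sub join_words {
    my ($line) = @_;
    $line =~ s/[ \t]+(\S)/\U$1/g;   # [ \t] keeps line breaks intact
    return $line;
}

# prints: limIs160Chr.4DemonstrtonOfThiMods
print join_words("lim is 160 chr. 4 demonstrton of thi mods"), "\n";
```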
There are a few grammar rules which are used to shorten some English
tokens considerably:
Word that has _ is usually a verb
Word that has / is usually a substantive, noun,
pronoun or other non-verb
For example, these tokens must be understood before the text can be read.
This is not yet like Geek Code, because you don't need an external parser
to understand it, just some common sense and time to adapt
yourself to this text. *For a complete, up-to-date list, you have to peek at
the source code*
automatically => 'acly_'
for => 4
for him => 4h
for her => 4h
for them => 4t
for those => 4t
can => _n
does => _s
it is => i_s
that is => t_s
which is => w_s
that are => t_r
which are => w_r
less => -/
more => +/
most => ++
however => h/ver
think => thk_
useful => usful
you => u
your => u/
you'd => u/d
you'll => u/l
they => t/
their => t/r
will => /w
would => /d
with => w/
without => w/o
which => W/
whose => WS/
Time is expressed with capital letters
time => T
minute => MIN
second => SEC
hour => HH
day => DD
month => MM
year => YY
Other capital-letter acronyms
phone => P8
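The table lookup itself can be sketched as a whole-word substitution driven by a hash. This is a simplified illustration, not the module's implementation: squeeze_words is a hypothetical name, only a handful of entries from the table are used, and none of the module's other rules (such as vowel deletion) are applied.

```perl
use strict;
use warnings;

# A few entries copied from the table above.
my %xlate = (
    'which is' => 'w_s',
    'it is'    => 'i_s',
    'for'      => '4',
    'you'      => 'u',
    'with'     => 'w/',
    'without'  => 'w/o',
    'phone'    => 'P8',
);

# Hypothetical helper: replace whole words and phrases. Longer keys
# are tried first, so multi-word phrases win over shorter entries.
sub squeeze_words {
    my ($text) = @_;
    my $alt = join '|', map { quotemeta }
              sort { length $b <=> length $a } keys %xlate;
    $text =~ s/\b($alt)\b/$xlate{lc $1}/gie;
    return $text;
}

# prints: i_s 4 u, w/ or w/o a P8
print squeeze_words("it is for you, with or without a phone"), "\n";
```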
EXAMPLES
To add new words e.g. to the word conversion hash table, define your
custom set and merge it with the existing ones. Do the same for
`%SQZ_WXLATE_MULTI_HASH' and `$SQZ_ZAP_REGEXP', and then start using the
conversion function.
use English;
use Squeeze qw( :ALL );
my %myExtraWordHash =
(
    'new-word1' => 'conversion1'
  , 'new-word2' => 'conversion2'
  , 'new-word3' => 'conversion3'
  , 'new-word4' => 'conversion4'
);
# First take the existing tables and merge them with my
# translation table
my %myCustomWordHash =
(
    %SQZ_WXLATE_HASH
  , %SQZ_WXLATE_EXTRA_HASH
  , %myExtraWordHash
);
my $myXlat = 0;     # state flag: 1 while my table is active
while (<>)
{
    # $condition stands for your own test, e.g. a pattern match on $ARG
    if ( $condition and not $myXlat )
    {
        SqueezeHashSet \%myCustomWordHash;   # Use MY conversions
        $myXlat = 1;
    }
    elsif ( $myXlat and not $condition )
    {
        SqueezeHashSet "reset";              # Back to default table
        $myXlat = 0;
    }
    print SqueezeText $ARG;
}
Similarly, you can redefine the multi-word translate table by supplying
another hash reference in the call to SqueezeHashSet(), and to remove more
text immediately in addition to the defaults, just concatenate your regexps
to *$SQZ_ZAP_REGEXP*
KNOWN BUGS
There may be a lot of false conversions, and if you think that some word
squeezing went too far, please turn on debugging and send the log to the
maintainer. To see how the conversion goes e.g. for the word *Messages*:
use English;
use Lingua::EN::Squeeze;
SqueezeDebug( 1, '(?i)Messages' );
$ARG = "This line has some Messages in it";
print SqueezeText $ARG;
AVAILABILITY
The author can be reached at jari.aalto@poboxes.com. The home page is
available via a forwarding service at http://www.netforward.com/poboxes/?jari.aalto
or, alternatively, at the absolute URL ftp://cs.uta.fi/pub/ssjaaa/ (which
may move without notice). Prefer keeping the forwarding service link in
your bookmarks.
The latest version of this module can be found at
$CPAN/modules/by-module/Lingua/
AUTHOR
Copyright (C) 1998-1999 Jari Aalto. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself or under the terms of the GNU General Public License
v2 or later.
------------------------------
Date: Sun, 27 Dec 1998 11:42:42 -0500
From: "Manual Labor" <mlabor@sprintmail.com>
Subject: Re: Basic Perl DOS/Win95 + WWW + CGI course for Newbies , Christmas free offer .
Message-Id: <765o1c$j0$1@fir.prod.itd.earthlink.net>
I would be very interested in subjecting myself to your expert tutelage.
Contact me at manlabor@hotmail.com
Franklin wrote in message <3687237a.387639806@news>...
>How do I access your course?
>
>On Sat, 26 Dec 1998 17:51:37 -0800, TRG Software
><chatmaster@c-zone.net> wrote:
>
>>Expert wrote:
>>>
>>> I would like to give basic Perl course for newbies.
>>> Integration of CGI Perl scripts with WWW pages.
>>> Setting up simple WIN95/ web server and setting up web pages + CGI
>>> programs running on your PC , for testing purposes.
>>>
>>> Hope this course to be free, interactive, mayby on shareware basis.
>>>
>>> If some of you guys are interested I set up news group to move us
>>> there.
>>> regards,
>>> Jack
>>
>>I might be interested in helping you with this (if you need any more
>>help), but I'll need more details. :-)
>>
>>Good luck
>
------------------------------
Date: 27 Dec 1998 13:56:10 -0000
From: Jonathan Stowe <gellyfish@btinternet.com>
Subject: Re: Get Title
Message-Id: <765e9q$1f4$1@gellyfish.btinternet.com>
On Sun, 27 Dec 1998 13:29:24 +0100 Frank de Bot <debot@xs4all.nl> wrote:
> Does anybody know a script to get the Title of a webpage?
> Some help for making a own script is OK.
You will probably want to use the HTML::HeadParser module (part of the
HTML::Parser package available from CPAN) to obtain the title of the
document from the HTML. If you need to do this via HTTP rather than from
a file, then you will also probably want to use the LWP::UserAgent module
to obtain the document.
The following is an example of using both modules to obtain the title of
a document from a URL given on the command line:
#!/usr/bin/perl -w
use strict;
use HTML::HeadParser;
use HTTP::Request;
use LWP::UserAgent;
my $url = shift or die "usage: $0 URL\n";
my $ua = LWP::UserAgent->new;
$ua->agent("$0/0.1 " . $ua->agent);
my $req = HTTP::Request->new(GET => $url);
$req->header('Accept' => 'text/html');
# send request
my $res = $ua->request($req);
if ($res->is_success)
{
    # HTML::HeadParser only looks at the <head> section
    my $parser = HTML::HeadParser->new;
    $parser->parse($res->content);
    my $title = $parser->header('Title');
    print defined $title ? $title : '(no title)', "\n";
}
else
{
    print $res->status_line, "\n";
}
__END__
--
Jonathan Stowe <jns@btinternet.com>
Some of your questions answered:
<URL:http://www.btinternet.com/~gellyfish/resources/wwwfaq.htm>
Hastings: <URL:http://www.newhoo.com/Regional/UK/England/East_Sussex/Hastings>
------------------------------
Date: 27 Dec 1998 14:09:12 -0000
From: Jonathan Stowe <gellyfish@btinternet.com>
Subject: Re: get webpage with perl
Message-Id: <765f28$1fq$1@gellyfish.btinternet.com>
On Sun, 27 Dec 1998 11:57:31 GMT h9250293@obelix.wu-wien.ac.at wrote:
> hi!
>
> I would like to go through webpages on the web and grep the essential
> information for me with PERL and mail it to my account. How can i do that? is
> there anywhere a ready source for retriving the HTML of a webpage?
>
You will probably want to use the LWP::* modules available from CPAN to do
this - the distribution of this package comes with a document 'lwpcook' that
has an example that would form a reasonable basis to do this. If you
require a program that will do 'spidering' - following the links on each
successive page then you will require the HTML::Parser module to parse the
documents and retrieve the URLs of the linked pages.
/J\
--
Jonathan Stowe <jns@btinternet.com>
Some of your questions answered:
<URL:http://www.btinternet.com/~gellyfish/resources/wwwfaq.htm>
Hastings: <URL:http://www.newhoo.com/Regional/UK/England/East_Sussex/Hastings>
------------------------------
Date: 27 Dec 1998 17:03:01 GMT
From: Ellen Maremont Silver <silver@oreilly.com>
Subject: Java/Perl Tool Available as Open Source Software
Message-Id: <765p85$371$1@play.inetarena.com>
This announcement was sent to the press on December 1, 1998.
For further information please see http://www.perl.com
and http://perl.oreilly.com.
JAVA/PERL TOOL AVAILABLE AS OPEN SOURCE SOFTWARE
Programmers Can Use Strengths of Two Popular Languages in the Same Environment
Java/Perl Lingo (JPL), software which enables programmers to use the
strengths of both Java and Perl in the same environment, is now freely
available as open source software. Until now, the tool has been available
exclusively in O'Reilly & Associates' Perl Resource Kit-UNIX Edition, a
commercial product. JPL was developed by Larry Wall, creator of Perl and
Senior Software Developer at O'Reilly & Associates.
JPL, available since November, 1997, is a unique project whose goal is to
seamlessly unite the two popular languages in a way which lets them
complement each other's strengths. Java excels at helping computers across
a network or the Internet communicate and share data; Perl is used
especially for system administration and interactive Web sites. JPL enables
programmers to implement Java methods in Perl, and lets Perl code access
Java via the Java Native Interface (JNI). It includes a translator and
build system that make it easy to create JPL applications.
The JPL tool and its source code are being made available as part of the
latest development release of Perl (version 5.005_54) and can be obtained
at http://www.perl.com/CPAN/authors/id/GSAR/ (perl5.005_54.patch.gz and
perl5.005_54.tar.gz; see the directory notes for important caveats).
Subscription information for the JPL mailing list is available at
http://www.perl.org/maillist.html.
"O'Reilly has been a strong supporter of open source software, so releasing
JPL as open source matches our company values," said Gina Blaber, Director
of O'Reilly's Software Products. "JPL will benefit from the attention of
the broader development community. Further, our Perl books and software are
an important part of the O'Reilly business, so we want to thank and support
the open source community by making the JPL source available."
O'Reilly first released the Perl Resource Kit-UNIX in November, 1997, and
followed it in August with the Perl Resource Kit-Win32 Edition.
# # #
-------------------------------------------------------
Ellen Maremont Silver (formerly Elias) Publicist
O'Reilly & Associates, Inc.
101 Morris St., Sebastopol, CA 95472
Cambridge - Koeln - Paris - Sebastopol - Tokyo
phone: (707) 829-0515 ext. 322 fax: (707) 829-0104
Online: software.oreilly.com, www.oreilly.com
------------------------------
Date: 27 Dec 1998 17:03:17 GMT
From: JVromans@Squirrel.nl (Johan Vromans)
Subject: Makepatch version 2.00 released
Message-Id: <765p8l$372$1@play.inetarena.com>
I'm very pleased to announce release 2.00 of the makepatch package.
URL: $CPAN/authors/id/JV/makepatch-2.00a.tar.gz
This package contains a pair of programs to assist in the generation
and application of patch kits to synchronise source trees.
INTRODUCTION
Traditionally, source trees are updated with the 'patch' program,
processing patch information that is generated by the 'diff' program.
Although 'diff' and 'patch' do a very good job of patching file
contents, most versions do not handle creating and deleting files and
directories, or adjusting file modes and time stamps. Newer
versions of 'diff' and 'patch' seem to be able to create files, and
very new versions of 'patch' can remove files. But that's about it.
Another common problem is that patch kits are typically downloaded
from the Internet, or transmitted via electronic mail. It is often
desirable to verify the correctness of a patch kit before even
attempting to apply it.
The makepatch package is designed to overcome these limitations.
DESCRIPTION
The makepatch package contains two programs, both written in Perl:
'makepatch' and 'applypatch'.
'makepatch' will generate a patch kit from two source trees.
It traverses the source directory and runs a 'diff' on each pair of
corresponding files, accumulating the output into a patch kit. It
knows about the conventions for patch kits: if a file named
patchlevel.h exists, it is handled first, so 'patch' can check the
version of the source tree. Also, to deal with the non-perfect
versions of 'patch' that are in use, it supplies 'Index:' and
'Prereq:' lines, so 'patch' can correctly locate the files to patch,
and it relocates the patch to the current directory to avoid problems
with creating new files.
The list of files can be specified in a so-called 'manifest' file, but
it can also be generated by recursively traversing the source tree.
Files can be excluded using shell-style wildcards and Perl regex
patterns.
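The traversal and exclusion step can be approximated with File::Find, as in the sketch below. It is hypothetical and not makepatch's code: tree_files and compare_trees are made-up names, exclusions are a single Perl regex (shell-style wildcards are not handled), and the files listed as common are the pairs on which 'diff' would then be run.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# List the files under $root relative to $root, skipping any path
# that matches the optional $exclude regex.
sub tree_files {
    my ($root, $exclude) = @_;
    my %files;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $File::Find::name;
        my $rel = File::Spec->abs2rel($File::Find::name, $root);
        $files{$rel} = 1 unless defined $exclude and $rel =~ $exclude;
    } }, $root);
    return \%files;
}

# Classify files as deleted, new, or common to both trees.
sub compare_trees {
    my ($old, $new, $exclude) = @_;
    my ($o, $n) = (tree_files($old, $exclude), tree_files($new, $exclude));
    return {
        deleted => [ sort grep { !$n->{$_} } keys %$o ],
        new     => [ sort grep { !$o->{$_} } keys %$n ],
        common  => [ sort grep {  $n->{$_} } keys %$o ],
    };
}
```

For example, compare_trees('old-src', 'new-src', qr/\.o$/) would classify the two trees while ignoring object files entirely.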
Moreover, 'makepatch' prepends a small shell script in front of the
patch kit that creates the necessary files and directories for the
patch process. By running the patch kit as a shell script your source
directory is prepared for the patching process.
But that's not all! 'makepatch' also inserts some additional
information in the patch kit for use by the 'applypatch' program.
The 'applypatch' program will do the following:
- It will extensively verify that the patch kit is complete and not
corrupted during transfer.
- It will apply some heuristics to verify that the directory in
which the patch will be applied does indeed contain the expected
sources.
- It creates files and directories as necessary.
- It applies the patch by running the 'patch' program.
- Upon completion, obsolete files, directories and .orig files are
removed, file modes of new files are set, and the timestamps of
all patched files are adjusted.
Note that 'applypatch' only requires the 'patch' program. It does not
rely on a shell or shell tools. This makes it possible to apply
'makepatch' generated patches on non-Unix systems.
REQUIREMENTS
- Perl 5.005 standard installation.
- For 'makepatch': the 'diff' program.
- For 'applypatch': the 'patch' program.
AVAILABILITY
CPAN and its mirrors, e.g.
http://www.perl.com/CPAN/authors/id/JV/makepatch-2.00a.tar.gz
--------------------------------------------------------------------------
Johan Vromans jvromans@squirrel.nl
Squirrel Consultancy Haarlem, the Netherlands
http://www.squirrel.nl http://www.squirrel.nl/people/jvromans
PGP Key 2048/4783B14D KFP=65 44 CA 66 B3 50 0B 34 CE 0E FB CA 2D 95 34 D0
---------------------- "Arms are made for hugging" -----------------------
------------------------------
Date: 27 Dec 1998 16:21:52 GMT
From: Tom Christiansen <tchrist@mox.perl.com>
Subject: Re: mkdir and -p
Message-Id: <765mr0$a3q$1@csnews.cs.colorado.edu>
[courtesy cc of this posting sent to cited author via email]
In comp.lang.perl.misc,
webmaster@skatesearch.com writes:
:I'd like to invoke mkdir -p from perl. I know I can shell it out to do this,
:but is there a way to do this from the built-in version of mkdir in perl?
No, there isn't. mkdir makes one directory. It's ok to use the
toolbox now and then, you know.
You might look at the File::Path module.
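A minimal sketch of 'mkdir -p' in Perl with File::Path, which ships with Perl. The mkdir_p wrapper name is hypothetical; mkpath itself creates any missing parent directories, croaks on failure, and returns the list of directories it actually created.

```perl
use strict;
use warnings;
use File::Path qw(mkpath);

# Hypothetical wrapper that behaves like "mkdir -p": create $dir and any
# missing parents; do nothing if $dir already exists. mkpath() croaks
# on failure and returns only the directories it newly created.
sub mkdir_p {
    my ($dir, $mode) = @_;
    $mode = 0777 unless defined $mode;   # the process umask still applies
    return mkpath($dir, 0, $mode);       # 0 = not verbose
}
```

For example, mkdir_p("logs/1998/12") creates whichever of logs, logs/1998 and logs/1998/12 do not already exist.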
--tom
--
double value; /* or your money back! */
short changed; /* so triple your money back! */
--Larry Wall in cons.c from the 4.0 perl source code
------------------------------
Date: 27 Dec 1998 17:02:34 GMT
From: swmcd@world.std.com (Steven W McDougall)
Subject: News::Newsrc 1.07 released
Message-Id: <765p7a$36p$1@play.inetarena.com>
News::Newsrc 1.07 has been uploaded to PAUSE and will soon propagate
through CPAN.
From the README file:
News::Newsrc VERSION 1.07 - manage newsrc files
DESCRIPTION
News::Newsrc manages newsrc files, of the style
alt.foo: 1-21,28,31-34
alt.bar! 3,5,9-2900,2902
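To make the format concrete, here is a hand-written Perl sketch that parses one such line. It is not the News::Newsrc interface; parse_newsrc_line is a hypothetical name. It follows the usual newsrc convention: ':' after the group name marks a subscribed group, '!' an unsubscribed one, and the rest is a comma-separated list of read article numbers and ranges.

```perl
use strict;
use warnings;

# Hypothetical parser for one newsrc line, e.g. "alt.foo: 1-21,28,31-34".
# Returns undef if the line does not look like a newsrc entry.
sub parse_newsrc_line {
    my ($line) = @_;
    my ($group, $mark, $list) =
        $line =~ /^\s*([^:!\s]+)\s*([:!])\s*(.*?)\s*$/
        or return;
    my @ranges;                              # list of [low, high] pairs
    for my $item (split /,/, $list) {
        next unless $item =~ /^\s*(\d+)(?:-(\d+))?\s*$/;
        push @ranges, [ $1, defined $2 ? $2 : $1 ];
    }
    return {
        group      => $group,
        subscribed => $mark eq ':' ? 1 : 0,
        ranges     => \@ranges,
    };
}
```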
From the Changes file:
Revision history for Perl extension News::Newsrc
1.07 1998 Dec 21
- added import_rc and export_rc
- added VERSION_FROM, DISTNAME, ABSTRACT, AUTHOR, and dist
keys in Makefile.PL
Thanks to Philip Hallstrom for suggesting the import/export methods.
- SWM
------------------------------
Date: 27 Dec 1998 17:02:41 GMT
From: swmcd@world.std.com (Steven W McDougall)
Subject: Set::IntSpan 1.07 released
Message-Id: <765p7h$36q$1@play.inetarena.com>
Set::IntSpan 1.07 has been uploaded to PAUSE and will soon propagate
through CPAN.
From the Changes file:
Revision history for Perl extension Set::IntSpan
1.07 1998 Dec 03
- fixes to facilitate subclassing
o use ref $this instead of hardcoded "Set::IntSpan"
o made internal functions into methods
o use method call syntax on all internal method calls,
not function call syntax
o use direct object syntax on all internal method calls,
because indirect object syntax sometimes parses as a
function call
- added ABSTRACT and AUTHOR keys in Makefile.PL
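The subclassing fixes listed above follow a common Perl idiom, sketched below with two hypothetical classes, Counter and LoudCounter (not part of Set::IntSpan): the class is taken from the invocant with ref instead of being hardcoded, and internal calls use arrow (direct) method syntax so that a subclass's overrides are found.

```perl
use strict;
use warnings;

package Counter;

# Works for subclasses: the class comes from the invocant (ref $this for
# an object, $this itself for a class name), not a hardcoded "Counter".
sub new {
    my $this  = shift;
    my $class = ref($this) || $this;
    my $self  = bless {}, $class;
    $self->_init(@_);    # method call, not _init($self), so overrides apply
    return $self;
}

sub _init {
    my ($self, %args) = @_;
    $self->{n} = $args{start} || 0;
}

# Returns a new object of the same class as $self.
sub copy {
    my $self  = shift;
    my $class = ref $self;
    # $class->new(...) rather than indirect "new $class (...)", which
    # can be misparsed as a function call
    return $class->new(start => $self->{n});
}

package LoudCounter;
our @ISA = ('Counter');

sub _init {
    my ($self, %args) = @_;
    $self->SUPER::_init(%args);
    $self->{loud} = 1;
}

package main;
```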
From the README file:
Set::IntSpan VERSION 1.07 - Manages sets of integers
DESCRIPTION
Set::IntSpan manages sets of integers. It is optimized for sets that
have long runs of consecutive integers. These arise, for example, in
.newsrc files, which maintain lists of articles:
alt.foo: 1-21,28,31
alt.bar: 1-14192,14194,14196-14221
Sets are stored internally in a run-length coded form. This provides
for both compact storage and efficient computation. In particular,
set operations can be performed directly on the encoded
representation.
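The idea behind a run-length coded set can be sketched in a few lines of Perl. This is an illustration, not Set::IntSpan's internals: the helper names are hypothetical, and the sketch assumes run lists are written in ascending order. The union and the membership test work directly on the [low, high] runs without expanding them into individual integers.

```perl
use strict;
use warnings;

# Parse a run list such as "1-21,28,31" into [[1,21],[28,28],[31,31]].
sub parse_runs {
    my ($text) = @_;
    my @runs;
    for my $item (split /,/, $text) {
        next unless $item =~ /^\s*(\d+)(?:-(\d+))?\s*$/;
        push @runs, [ $1, defined $2 ? $2 : $1 ];
    }
    return \@runs;
}

# Union computed on the runs themselves: sort by start, then merge runs
# that overlap or touch (e.g. 1-21 and 22-27 become 1-27).
sub union_runs {
    my ($x, $y) = @_;
    my @out;
    for my $r (sort { $a->[0] <=> $b->[0] } @$x, @$y) {
        if (@out && $r->[0] <= $out[-1][1] + 1) {
            $out[-1][1] = $r->[1] if $r->[1] > $out[-1][1];
        } else {
            push @out, [ @$r ];
        }
    }
    return \@out;
}

# Membership by binary search over sorted, disjoint runs.
sub member {
    my ($runs, $n) = @_;
    my ($lo, $hi) = (0, $#$runs);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($first, $last) = @{ $runs->[$mid] };
        if    ($n < $first) { $hi = $mid - 1 }
        elsif ($n > $last)  { $lo = $mid + 1 }
        else                { return 1 }
    }
    return 0;
}

# Turn runs back into the textual run-list form.
sub runs_to_string {
    my ($runs) = @_;
    return join ',',
        map { $_->[0] == $_->[1] ? $_->[0] : "$_->[0]-$_->[1]" } @$runs;
}
```

For example, the union of "1-21,28,31" and "22-27,30" comes out as "1-28,30-31", still only two runs.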
Thanks to Chris Sidi for showing me how to fix the code to support
subclassing.
- SWM
------------------------------
Date: 12 Dec 98 21:33:47 GMT (Last modified)
From: Perl-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Special: Digest Administrivia (Last modified: 12 Dec 98)
Message-Id: <null>
Administrivia:
Well, after 6 months, here's the answer to the quiz "What do we do about
comp.lang.perl.moderated?" Answer: nothing.
]From: Russ Allbery <rra@stanford.edu>
]Date: 21 Sep 1998 19:53:43 -0700
]Subject: comp.lang.perl.moderated available via e-mail
]
]It is possible to subscribe to comp.lang.perl.moderated as a mailing list.
]To do so, send mail to majordomo@eyrie.org with "subscribe clpm" in the
]body. Majordomo will then send you instructions on how to confirm your
]subscription. This is provided as a general service for those people who
]cannot receive the newsgroup for whatever reason or who just prefer to
]receive messages via e-mail.
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.
The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending Perl questions to the -request address; I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V8 Issue 4501
**************************************