[28074] in Perl-Users-Digest
Perl-Users Digest, Issue: 9438 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Jul 8 18:05:40 2006
Date: Sat, 8 Jul 2006 15:05:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sat, 8 Jul 2006 Volume: 10 Number: 9438
Today's topics:
converting line input into columns vanagas99@yahoo.com
Re: converting line input into columns <1usa@llenroc.ude.invalid>
Re: converting line input into columns <mumia.w.18.spam+nospam.usenet@earthlink.net>
Re: converting line input into columns <DJStunks@gmail.com>
Re: converting line input into columns <1usa@llenroc.ude.invalid>
Re: Get the reference to an array from a function... <tadmc@augustmail.com>
Re: Get the reference to an array from a function... <sherm@Sherm-Pendleys-Computer.local>
Re: Get the reference to an array from a function... <David.Squire@no.spam.from.here.au>
Re: How to force formatted date (month) language ? <ynleder@nspark.org>
Re: How to force formatted date (month) language ? <DJStunks@gmail.com>
Re: How to force formatted date (month) language ? <bart@nijlen.com>
Re: kill the process <wcooley@nakedape.cc>
Re: kill the process <ced@blv-sam-01.ca.boeing.com>
Need help to find byte offsets for regexps in a file <robert.dodier@gmail.com>
Pls excuse if you consider this off-topic. Conceptual a M_Mann@artenom.com
Profanity checking, phonetically. <shrike@cyberspace.org>
Re: Profanity checking, phonetically. <john@castleamber.com>
Using References to Formats? <vtatila@mail.student.oulu.fi>
Re: Using References to Formats? <attn.steven.kuo@gmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 8 Jul 2006 07:29:50 -0700
From: vanagas99@yahoo.com
Subject: converting line input into columns
Message-Id: <1152368990.670158.258930@m79g2000cwm.googlegroups.com>
hi,
I have a file with formatted output like this:
Severity: Important
Status: Unknown
PDI ID: 1895
Finding Details
This vulnerability... blah, blah, blah
Vulnerability Discussion
blah, blah, blah text
Fix recommendations
blah, blah, blah text
Please advise on how to parse such a file so that I can put it in a
column-type format. As you can see, I can't use : as a separator since
not all categories have it. Plus, some of the details of these
categories are plopped on a separate line instead of next to the label. Best
way would probably be to put all of it in one tab-separated line
(cleaning out severity, status, etc. later); I just don't know how to do
that. Please advise.
Thanks,
AV
------------------------------
Date: Sat, 08 Jul 2006 18:52:10 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: converting line input into columns
Message-Id: <Xns97FA975B35B23asu1cornelledu@127.0.0.1>
vanagas99@yahoo.com wrote in news:1152368990.670158.258930
@m79g2000cwm.googlegroups.com:
> I have a file with formatted output like this:
>
> Severity: Important
> Status: Unknown
> PDI ID: 1895
> Finding Details
> This vulnerability... blah, blah, blah
> Vulnerability Discussion
> blah, blah, blah text
> Fix recommendations
> blah, blah, blah text
>
> Please advice on how to parse such a file allowing me to put it in a
> column type format.
Please consult the posting guidelines for this group. You can help
others help you by posting what you have tried so far, and explaining
the problems you have encountered.
On the other hand, it has been a month or so since I wrote any code, so
I thought this might be a good warm-up exercise for me.
I am sure someone will correct my errors.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @single_line_items = ( 'Severity', 'Status', 'PDI ID' );
my @multi_line_items = (
    'Finding Details',
    'Vulnerability Discussion',
    'Fix recommendations',
);
my @records;
my $current = 1;

while ( my $line = <DATA> ) {
    next unless $line =~ /^Severity/;
    my $text = $line;
    while ( <DATA> ) {
        last if /^\s+$/;
        $text .= $_;
    }
    $text .= "\n";
    eval {
        push @records, parse_record( \$text );
    };
    $@ and warn "Malformed record: $current: $@\n";
    ++ $current;
}

print Dumper \@records;

sub parse_record {
    my ($text_ref) = @_;
    my $record = { };
    for my $item ( @single_line_items ) {
        if ( $$text_ref =~ /$item:\s+([^\n]+)/mg ) {
            $record->{$item} = $1;
        }
        else {
            die "Missing '$item' in\n$$text_ref";
        }
    }
    for my $item ( @multi_line_items ) {
        $$text_ref =~ /^$item$/mg
            or die "Missing '$item' in\n$$text_ref";
        if ( $$text_ref =~ /\s+(.+?)\n(?:\n|\w)/sg ) {
            $record->{$item} = $1;
            pos $$text_ref -= 1;
        }
        else {
            die "Missing text for '$item' in\n$$text_ref";
        }
    }
    return $record;
}
__DATA__
Severity: Trivial
Status: Uppity
PDI ID: 1895
Finding Details
    Finding details for id 1895
Vulnerability Discussion
    Vulnerability discussion for id 1895
    more discussion
Fix recommendations
    Fix recommendations for id 1895
    more recommendations

Severity: Severe
Status: Fixed
PDI ID: 1897
Finding Details
    Finding details for id 1897
Vulnerability Discussion
    Vulnerability discussion for id 1897
    more discussion
Fix recommendations
    Fix recommendations for id 1897
    more recommendations

Severity: Offensive
Status: What's That?
PDI ID: 1898
Finding Details
    Finding details for id 1898
Vulnerability Discussion
    Vulnerability discussion for id 1898
    more discussion
Fix recommendations
    Fix recommendations for id 1898
    more recommendations

--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
------------------------------
Date: Sat, 08 Jul 2006 19:03:43 GMT
From: "Mumia W." <mumia.w.18.spam+nospam.usenet@earthlink.net>
Subject: Re: converting line input into columns
Message-Id: <jkTrg.5039$ye3.1658@newsread1.news.pas.earthlink.net>
vanagas99@yahoo.com wrote:
> hi,
>
> I have a file with formatted output like this:
>
> Severity: Important
> Status: Unknown
> PDI ID: 1895
> Finding Details
> This vulnerability... blah, blah, blah
> Vulnerability Discussion
> blah, blah, blah text
> Fix recommendations
> blah, blah, blah text
>
> Please advice on how to parse such a file allowing me to put it in a
> column type format. [...]
Let's make the problem simpler by breaking it into pieces. You need two
reg-ex's, one for grabbing things like 'Severity: Important' and one for
grabbing things like 'Finding Details....'
Can you think of a reg-ex that'll match 'Severity: Important' and grab
'Important'?
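For reference, a minimal sketch of the first kind of regex, assuming the label and its value always sit on one line separated by a colon. The helper name field_value is made up for illustration and is not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: return the value that follows "Label:" on a line,
# or undef if the line does not start with that label.
sub field_value {
    my ($label, $line) = @_;
    # \Q...\E quotes the label so that spaces and punctuation match literally;
    # (.*\S) captures up to the last non-whitespace character.
    return $line =~ /^\Q$label\E:\s*(.*\S)/ ? $1 : undef;
}

for my $line ( 'Severity: Important', 'PDI ID: 1895', 'Finding Details' ) {
    my ($label) = $line =~ /^([^:]+)/;
    my $value = field_value( $label, $line );
    print defined $value ? "$label => $value\n" : "no value on '$line'\n";
}
```

The second kind of item ('Finding Details' followed by free text) has no colon, so it needs a separate pattern that recognises the heading lines themselves.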
------------------------------
Date: 8 Jul 2006 12:54:21 -0700
From: "DJ Stunks" <DJStunks@gmail.com>
Subject: Re: converting line input into columns
Message-Id: <1152388461.361579.37150@m79g2000cwm.googlegroups.com>
A. Sinan Unur wrote:
> vanagas99@yahoo.com wrote in news:1152368990.670158.258930
> @m79g2000cwm.googlegroups.com:
>
> > I have a file with formatted output like this:
> >
> > Severity: Important
> > Status: Unknown
> > PDI ID: 1895
> > Finding Details
> > This vulnerability... blah, blah, blah
> > Vulnerability Discussion
> > blah, blah, blah text
> > Fix recommendations
> > blah, blah, blah text
> >
> > Please advice on how to parse such a file allowing me to put it in a
> > column type format.
>
> Please consult the posting guidelines for this group. You can help
> others help you by posting what you have tried so far, and explaining
> the problems you have encountered.
>
> On the other hand, it has been a month or so since I wrote any code, so
> I thought this might be a good warm-up exercise for me.
I was wondering if you were on vacation..... :p
> I am sure someone will correct my errors.
> <script snipped>
I don't see any errors, but it does seem needlessly complex? Perhaps
you were trying to stretch your Perl muscles after your hiatus.
If the record is as static as presented, I would just parse the whole
thing in one fell swoop, repairing leading, trailing and multiline
spacing afterward:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use English qw{ -no_match_vars };

$INPUT_RECORD_SEPARATOR = '';

RECORD:
while (my $record = <DATA>) {
    my (%record) = $record =~ m{\A \s*
        (Severity)                   :(.+?)
        (Status)                     :(.+?)
        (PDI . ID)                   :(.+?)
        (Finding . Details)           (.+?)
        (Vulnerability . Discussion)  (.+?)
        (Fix . recommendations)       (.+?)
    \z}xms;
    if (not %record) {
        warn "Malformed record";
        next RECORD;
    }
    else {
        # fix up spacing
        for my $entry ( values %record ) {
            $entry =~ s/^\s+//gm;
            $entry =~ s/\s+$//gm;
            $entry =~ s/\n/ /g;
        }
        print Dumper \%record;
    }
}
__DATA__
Severity: Trivial
Status: Uppity
PDI ID: 1895
Finding Details
    Finding details for id 1895
Vulnerability Discussion
    Vulnerability discussion for id 1895
    more discussion
Fix recommendations
    Fix recommendations for id 1895
    more recommendations

Severity: Severe
Status: Fixed
PDI ID: 1897
Finding Details
    Finding details for id 1897
Vulnerability Discussion
    Vulnerability discussion for id 1897
    more discussion
Fix recommendations
    Fix recommendations for id 1897
    more recommendations

Severity: Offensive
Status: What's That?
PDI ID: 1898
Finding Details
    Finding details for id 1898
Vulnerability Discussion
    Vulnerability discussion for id 1898
    more discussion
Fix recommendations
    Fix recommendations for id 1898
    more recommendations

Comments welcome,
-jp
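Since the original request was for one tab-separated line per record, here is a rough sketch of that last step. It assumes each record is already in a hash keyed by the category names used above; the function name record_to_tsv and the column order are only illustrative:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one parsed record (a hash ref) into a single
# tab-separated line, with the columns in the order given.
sub record_to_tsv {
    my ($record, @columns) = @_;
    return join "\t", map {
        my $v = defined $record->{$_} ? $record->{$_} : '';
        $v =~ s/\s+/ /g;     # collapse tabs and newlines so the row stays on one line
        $v =~ s/^ | $//g;    # drop the single leading/trailing space left over
        $v;
    } @columns;
}

my @columns = ( 'Severity', 'Status', 'PDI ID', 'Finding Details' );
my %record  = (
    'Severity'        => 'Trivial',
    'Status'          => 'Uppity',
    'PDI ID'          => '1895',
    'Finding Details' => "Finding details\n  for id 1895",
);
print join( "\t", @columns ), "\n";
print record_to_tsv( \%record, @columns ), "\n";
```

Collapsing whitespace inside the values matters here, because a stray tab or newline in the free text would otherwise break the column layout.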
------------------------------
Date: Sat, 08 Jul 2006 20:05:56 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: converting line input into columns
Message-Id: <Xns97FAA3DD5812Basu1cornelledu@127.0.0.1>
"DJ Stunks" <DJStunks@gmail.com> wrote in
news:1152388461.361579.37150@m79g2000cwm.googlegroups.com:
> A. Sinan Unur wrote:
>> vanagas99@yahoo.com wrote in news:1152368990.670158.258930
>> @m79g2000cwm.googlegroups.com:
>>
>> > I have a file with formatted output like this:
>> >
>> > Severity: Important
>> > Status: Unknown
>> > PDI ID: 1895
>> > Finding Details
>> > This vulnerability... blah, blah, blah
>> > Vulnerability Discussion
>> > blah, blah, blah text
>> > Fix recommendations
>> > blah, blah, blah text
>> >
>> > Please advice on how to parse such a file allowing me to put it in
>> > a column type format.
>>
>> Please consult the posting guidelines for this group. You can help
>> others help you by posting what you have tried so far, and explaining
>> the problems you have encountered.
>>
>> On the other hand, it has been a month or so since I wrote any code,
>> so I thought this might be a good warm-up exercise for me.
>
> I was wondering if you were on vacation..... :p
Thanks for noticing. Some vacation ... some family business ;-)
>> I am sure someone will correct my errors.
>> <script snipped>
>
> I don't see any errors, but it does seem needlessly complex?
Agreed. Your solution is quite elegant.
I do need the warm up.
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
------------------------------
Date: Sat, 8 Jul 2006 08:17:50 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: Get the reference to an array from a function...
Message-Id: <slrneavc3u.njt.tadmc@magna.augustmail.com>
David Squire <David.Squire@no.spam.from.here.au> wrote:
> This implies that the subroutine knows the
> context in which it was called,
So you can make your own subroutines that have different scalar context
vs. list context behaviors, just like Perl's builtin functions do.
perldoc -f wantarray
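A small sketch of what that looks like in practice, tied to the subject of this thread. The sub name get_items and its data are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical sub: returns the list itself in list context and a
# reference to the array in scalar context.
sub get_items {
    my @items = qw(alpha beta gamma);
    return wantarray ? @items : \@items;
}

my @list = get_items();    # list context: the three elements
my $ref  = get_items();    # scalar context: an array reference

print "list has ", scalar(@list), " elements\n";
print "second element via reference: $ref->[1]\n";
```

wantarray returns true in list context, false but defined in scalar context, and undef in void context, so a sub can also detect that its result is being thrown away.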
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Sat, 08 Jul 2006 11:01:41 -0400
From: Sherm Pendley <sherm@Sherm-Pendleys-Computer.local>
Subject: Re: Get the reference to an array from a function...
Message-Id: <m2veq8m97u.fsf@Sherm-Pendleys-Computer.local>
David Squire <David.Squire@no.spam.from.here.au> writes:
> True, and, to me, surprising. This implies that the subroutine knows
> the context in which it was called
It does - have a look at "perldoc -f wantarray".
sherm--
--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
------------------------------
Date: Sat, 08 Jul 2006 16:32:17 +0100
From: David Squire <David.Squire@no.spam.from.here.au>
Subject: Re: Get the reference to an array from a function...
Message-Id: <e8oj61$fdj$1@gemini.csx.cam.ac.uk>
Sherm Pendley wrote:
> David Squire <David.Squire@no.spam.from.here.au> writes:
>
>> True, and, to me, surprising. This implies that the subroutine knows
>> the context in which it was called
>
> It does - have a look at "perldoc -f wantarray".
Yes. If I had thought longer, I would have remembered wantarray. I guess
it is just a bit of implicit polymorphism in how the argument to return
is handled when putting stuff on the stack.
Regards,
DS
------------------------------
Date: Sat, 8 Jul 2006 19:29:39 +0200
From: Yohan N. Leder <ynleder@nspark.org>
Subject: Re: How to force formatted date (month) language ?
Message-Id: <MPG.1f1a09f1865c5f4e9897c8@news.free.fr>
In article <44af432e$0$25284$afc38c87@news.optusnet.com.au>, sisyphus1
@nomail.afraid.org says...
> For Win32 (and perhaps others ?), try:
> setlocale(LC_TIME, "English_USA.1252");
> [...]
> I'm on an "English" Win32 Machine - and, with the second strftime() call, I
> still got 'Jul' instead of 'juil.' However, when I changed your second
> setlocale() call to:
> setlocale(LC_TIME, "French_France.1252");
>
> I then got the desired 'juil.' in the output.
>
> Cheers,
> Rob
Indeed, I've tried 'setlocale(LC_TIME, "English_USA.1252");' and
'setlocale(LC_TIME, "French_France.1252");' under Win 2K FR and they
give 'Jul' and 'juil.' respectively, as expected. Does that mean that
'en_US' and 'fr_FR' are not supported under Win32? Thanks ;)
Also, I've run the same test under a FreeBSD US system and it still gives:
ENGLISH => 08 Jul 2006 @ 17:26:58 GMT
FRENCH => 08 Jul 2006 @ 17:26:58 GMT
What's the right LC_TIME specification for Unix flavors and, more
generally, non-Win32 operating systems?
------------------------------
Date: 8 Jul 2006 12:22:02 -0700
From: "DJ Stunks" <DJStunks@gmail.com>
Subject: Re: How to force formatted date (month) language ?
Message-Id: <1152386522.438934.268850@35g2000cwc.googlegroups.com>
Yohan N. Leder wrote:
> In article <44af432e$0$25284$afc38c87@news.optusnet.com.au>, sisyphus1
> @nomail.afraid.org says...
> > For Win32 (and perhaps others ?), try:
> > setlocale(LC_TIME, "English_USA.1252");
> > [...]
> > I'm on an "English" Win32 Machine - and, with the second strftime() call, I
> > still got 'Jul' instead of 'juil.' However, when I changed your second
> > setlocale() call to:
> > setlocale(LC_TIME, "French_France.1252");
> >
> > I then got the desired 'juil.' in the output.
> >
> > Cheers,
> > Rob
>
> Effectively, I've tried 'setlocale(LC_TIME, "English_USA.1252");' and
> 'setlocale(LC_TIME, "French_France.1252");' under Win 2K FR and it
> respectively gives 'Jul' and "juil.' as expected. Does it means the
> 'en_US' and 'fr_FR' are not supported under Win32 ? Thanks ;)
To build up a locale string in Win32 follow the instructions on this
page:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_language_and_country_strings.asp
The possible choices for lang are listed:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_language_strings.asp
The optional Country/Region and Code Pages supported by Windows are:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_country_strings.asp
http://www.microsoft.com/globaldev/reference/wincp.mspx
> Also, I've done the same test under a FreeBSD US and it still gives :
> ENGLISH => 08 Jul 2006 @ 17:26:58 GMT
> FRENCH => 08 Jul 2006 @ 17:26:58 GMT
>
> What's the right LC_TIME specification for Unix flavors and, more
> generally, no-Win32 operating systems ?
as perllocale states, you should be able to produce a list of supported
locales on a UNIX-ish OS by using
$ locale -a
in this case, I would say the easiest way to get the same script to
produce similar** results is to simply use 'english' and 'french' as
your locale strings.
HTH,
-jp
Note **: on redhat linux, 'french' produced "jui" rather than "juil.".
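Because the accepted names differ from system to system, one option is to try several candidates and use the first one setlocale() accepts. The following is only a sketch: the helper name first_available_locale is made up, the candidate names are the ones mentioned in this thread, and which of them (if any) works depends on what is installed on the machine:

```perl
use strict;
use warnings;
use POSIX qw(setlocale strftime LC_TIME);

# Hypothetical helper: return the first locale name that setlocale()
# accepts for LC_TIME, or undef if none of the candidates is available.
# The LC_TIME setting in effect before the call is restored afterwards.
sub first_available_locale {
    my @candidates = @_;
    my $saved = setlocale(LC_TIME);          # one-argument form queries the current value
    my $found;
    for my $name (@candidates) {
        if ( setlocale(LC_TIME, $name) ) {   # false when the locale is not available
            $found = $name;
            last;
        }
    }
    setlocale(LC_TIME, $saved) if defined $saved;
    return $found;
}

my @french = ( 'fr_FR.ISO8859-1', 'fr_FR.ISO_8859-1', 'fr_FR',
               'French_France.1252', 'french', 'fr' );
my $locale = first_available_locale(@french);
if ( defined $locale ) {
    setlocale(LC_TIME, $locale);
    print "FRENCH ($locale) => ", strftime('%d %b %Y', gmtime), "\n";
}
else {
    print "No French locale found; check the output of 'locale -a'.\n";
}
```

The month abbreviation printed still comes from the system's locale data, so the 'juil.' versus 'jui' difference noted above remains.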
------------------------------
Date: 8 Jul 2006 12:40:03 -0700
From: "Bart Van der Donck" <bart@nijlen.com>
Subject: Re: How to force formatted date (month) language ?
Message-Id: <1152387603.366980.217680@p79g2000cwp.googlegroups.com>
Yohan N. Leder wrote:
> Effectively, I've tried 'setlocale(LC_TIME, "English_USA.1252");' and
> 'setlocale(LC_TIME, "French_France.1252");' under Win 2K FR and it
> respectively gives 'Jul' and "juil.' as expected. Does it means the
> 'en_US' and 'fr_FR' are not supported under Win32 ?
Both English and French locales should be available.
> Also, I've done the same test under a FreeBSD US and it still gives :
> ENGLISH => 08 Jul 2006 @ 17:26:58 GMT
> FRENCH => 08 Jul 2006 @ 17:26:58 GMT
1252 is a Microsoft proprietary code page and thus not very useful for
general use across various OS's.
The code of your first post should be okay, but there is something
going on you're not aware of.
The "official" French month abbreviations are the following:
janvier : janv.
février : févr.
mars : mars
avril : avr.
mai : mai
juin : juin
juillet : juil. ou juill.
août : août
septembre : sept.
octobre : oct.
novembre : nov.
décembre : déc.
(http://fr.wikipedia.org/wiki/Mois)
It seems that MS' French_France.1252 follows these rules. But if we
want to force a three-char month notation (more English & informatics
style), then Houston has a problem:
French for June = Juin
French for July = Juillet
So, the three first chars are 'jui' for both months.
The logic would then be to take the first following character, like
Juin = jui
Juillet = jul
> What's the right LC_TIME specification for Unix flavors and, more
> generally, no-Win32 operating systems ?
General UNIX/Linux: setlocale('LC_TIME', 'fr_FR.ISO_8859-1');
Win32: setlocale('LC_TIME', 'fr');
Solaris: setlocale("LC_TIME", "fr");
FreeBSD: setlocale("LC_TIME", "fr_FR.ISO8859-1");
(out of http://www.oscommerce-fr.info/faq/qa_info.php?qID=52)
I also met:
French_France.1252
french.ISO_8859-1
french
fr_FR
A last super-safe option is to switch to the month number (like '7'
instead of July), and then manually code the part that ties it to the
actual month name/abbreviation you wish (according to the language the
user specified on the web page).
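A minimal sketch of that approach, using the French abbreviations listed earlier in this message. The table, the language keys 'en' and 'fr', and the function name format_date are all chosen here for illustration only:

```perl
use strict;
use warnings;
use utf8;                                   # the source contains accented literals
binmode( STDOUT, ':encoding(UTF-8)' );

# Month abbreviations per language, indexed 0..11 (January first).
my %MONTH_ABBR = (
    en => [ qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec) ],
    fr => [ 'janv.', 'févr.', 'mars', 'avr.', 'mai', 'juin',
            'juil.', 'août', 'sept.', 'oct.', 'nov.', 'déc.' ],
);

# Hypothetical helper: format a date from numeric parts, with no locale involved.
# $mon is 1..12.
sub format_date {
    my ($lang, $year, $mon, $mday) = @_;
    my $names = $MONTH_ABBR{$lang} or die "Unknown language '$lang'\n";
    return sprintf '%02d %s %d', $mday, $names->[ $mon - 1 ], $year;
}

print "ENGLISH => ", format_date( 'en', 2006, 7, 8 ), "\n";
print "FRENCH  => ", format_date( 'fr', 2006, 7, 8 ), "\n";
```

This gives the same output on every platform, at the price of maintaining the tables by hand for each language the page offers.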
-- 
Bart
------------------------------
Date: Sat, 08 Jul 2006 12:13:28 -0700
From: Wil Cooley <wcooley@nakedape.cc>
Subject: Re: kill the process
Message-Id: <pan.2006.07.08.19.13.28.82503@nakedape.cc>
On Fri, 07 Jul 2006 12:51:15 -0700, blackdog wrote:
> I have a perl script, I like to kill it (commit suicide) if the script
> is running on the system for more than one hour. What is the best way
> to do it?
If you're not otherwise using sleep() or anything else that sets an alarm,
you can use alarm()--see 'perldoc -f alarm'. Here's a little example:
#!/usr/bin/perl
#
# alarm-test.pl - Simple alarm() demonstration
#
use strict;
use warnings;
use Carp;
# How long is the alarm set for?
my $alarm_after_secs = 3;
# Set a handler for the ALRM signal; the POSIX default action is to
# terminate the program. See perlipc and signal(7).
$SIG{'ALRM'} = sub { croak 'Received alarm signal'; };
# Set the alarm
alarm $alarm_after_secs;
# Do something to kill time here
while (1) {
    my $x = <STDIN>;
}
Wil
------------------------------
Date: Sat, 8 Jul 2006 20:26:45 GMT
From: Charles DeRykus <ced@blv-sam-01.ca.boeing.com>
Subject: Re: kill the process
Message-Id: <J23q4J.G0q@news.boeing.com>
blackdog wrote:
> I have a perl script, I like to kill it (commit suicide) if the script
> is running on the system for more than one hour. What is the best way
> to do it?
>
Here's a possible Unix solution if you mean what I think you mean:
# near the top of your script
$SIG{ALRM} = sub { die 'internal timeout'; };
alarm(3600);
Of course, you should read all the caveats in the docs about
mixing sleep and alarm, stacking alarms, etc. Also, sometimes
signals may be lost, particularly across fork boundaries.
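If only one part of the script needs a time limit rather than the whole run, the same signal can be caught with eval so the script can clean up and carry on. This is a sketch of the eval/alarm idiom described under 'perldoc -f alarm'; the helper name with_timeout is invented, and it assumes a Unix-like system and no other use of alarm in the wrapped code:

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $secs seconds.
# Returns 1 if $code finished in time, 0 if it was interrupted by the alarm.
# Any other exception thrown by $code is re-thrown.
sub with_timeout {
    my ($secs, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };   # the "\n" keeps the message exact
        alarm $secs;
        $code->();
        alarm 0;                                      # finished in time: cancel the alarm
        1;
    };
    alarm 0;                                          # make sure no alarm is left pending
    return 1 if $ok;
    return 0 if $@ eq "timeout\n";
    die $@;
}

my $finished = with_timeout( 1, sub { 1 while 1 } );  # a loop that never ends by itself
print $finished ? "finished in time\n" : "gave up after 1 second\n";
```

For the one-hour limit asked about, the plain alarm(3600) near the top of the script is simpler; the wrapper is only worth it when the script should survive the timeout.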
Hth,
--
Charles DeRykus
------------------------------
Date: 8 Jul 2006 13:18:03 -0700
From: "Robert Dodier" <robert.dodier@gmail.com>
Subject: Need help to find byte offsets for regexps in a file
Message-Id: <1152389883.658944.161960@75g2000cwc.googlegroups.com>
Hello,
I am hoping to find byte offsets of regular expressions in a file.
I'm working on the built-in doc system for Maxima, an open-
source computer algebra system. The doc text is a Texinfo
output file. I want to find the strings " -- Function: FOO (x, y, z)
..."
and print their byte offsets, and the number of bytes from one such
string to the end of the corresponding documentation item
(which might be the next " -- Function: " item or a different regex).
Here is some pseudocode to illustrate what I am attempting --
let re1 = " -- Function: <some name>"
let re2 = FOO (not sure what to put here yet)
slurp file into string S (this is OK, texinfo limits file to 300 k)
byte_offset_1 = 0
while search for re1 beginning from byte_offset_1 succeeds
    extract <some name> from re1 match
    search for re2 beginning from byte_offset_1
    let byte_offset_2 = byte offset of re2 match
    print <some name>, byte_offset_1, byte_offset_2
    let byte_offset_1 = byte_offset_2
I'm planning to slurp the resulting output into another program
that will then carry out matching on the list of <some name> strings
and use file seek to grab the corresponding texts. That program
will be written in another programming language so let's not worry
about that now.
If anyone has some advice about making a workable Perl
program from this pseudocode, I'll be very grateful.
Thanks in advance & all the best.
Robert Dodier
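One way the pseudocode above might be turned into Perl, offered as a sketch under stated assumptions rather than a finished tool. Since re2 is not pinned down, this version assumes an item simply runs until the next " -- Function: " line or the end of the text. It also assumes the file is read without any decoding layer, so that string offsets are byte offsets. The names slurp_raw and function_offsets are made up:

```perl
use strict;
use warnings;

# Read a whole file as raw bytes, so that offsets within the string
# are byte offsets in the file.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                       # slurp mode
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Return a list of [ name, start_offset, length_in_bytes ] entries,
# one per " -- Function: NAME ..." line found in $text.
sub function_offsets {
    my ($text) = @_;
    my @starts;
    while ( $text =~ /^ -- Function: (\S+)/mg ) {
        push @starts, [ $1, $-[0] ];    # $-[0] is the offset where the match began
    }
    my @items;
    for my $i ( 0 .. $#starts ) {
        my ($name, $start) = @{ $starts[$i] };
        # Assumption: the item ends where the next one starts, or at end of text.
        my $end = $i < $#starts ? $starts[ $i + 1 ][1] : length $text;
        push @items, [ $name, $start, $end - $start ];
    }
    return @items;
}

my $sample = " -- Function: foo (x)\nbody\n -- Function: bar (y, z)\nmore\n";
printf "%s %d %d\n", @$_ for function_offsets($sample);
```

For a real file the call would be function_offsets( slurp_raw($path) ). A different end-of-item pattern can be handled the same way, by matching it with /g starting from pos() of the first match and reading its offset from $-[0].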
------------------------------
Date: 8 Jul 2006 10:46:08 -0700
From: M_Mann@artenom.com
Subject: Pls excuse if you consider this off-topic. Conceptual artists seek programmers here.
Message-Id: <1152380768.581234.255960@p79g2000cwp.googlegroups.com>
Hello,
Pls excuse if you consider this off-topic. Conceptual artists seek
programmers here.
We are authors of "Exhibition of Living Managers" (MANAGEX,
www.managex.info) which is global conceptual art project, performed in
world's leading contemporary art centres. Art objects at MANAGEX are
real employed managers, who volunteer to exhibit themselves in a
gallery setting. Our new project is "Exhibition of Living Programmers"
(PROGRAMEX), which is similar to MANAGEX but focusing on professional
programmers.
Managex' official website is www.managex.info. We have also just opened
a Google Group here on Managex project:
http://groups.google.com/group/Exhibition-of-Living-Managers-MANAGEX,
where you are welcome to register and participate. Once we have
substantial number of interested programmers, we will open a dedicated
group on Programex.
Hope to hear from you and see you at Programex. Again, sorry if this
posting disturbed anybody.
Best,
MANAGEX / PROGRAMEX team www.managex.info
------------------------------
Date: 8 Jul 2006 13:35:00 -0700
From: "shrike@cyberspace.org" <shrike@cyberspace.org>
Subject: Profanity checking, phonetically.
Message-Id: <1152390900.669510.327250@s13g2000cwa.googlegroups.com>
Howdy,
I have a randomly generated alphabetic string, and I need to profanity
check it, phonetically. I didn't see anything like this on CPAN.
Anybody done anything like this?
-Thanks in advance
-Matt
------------------------------
Date: 8 Jul 2006 20:42:02 GMT
From: John Bokma <john@castleamber.com>
Subject: Re: Profanity checking, phonetically.
Message-Id: <Xns97FA9FB64366Dcastleamber@130.133.1.4>
"shrike@cyberspace.org" <shrike@cyberspace.org> wrote:
> Howdy,
>
> I have a randomly generated alphabetic string, and I need to profanity
> check it, phonetically. I didn't see anything like this on CPAN.
>
> Anybody done anything like this?
Soundex? And there is a better algorithm IIRC.
OTOH, why bother, people start using fsck, or f*kc etc.
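To show what a Soundex comparison amounts to, here is a hand-rolled sketch of the classic American Soundex code with a lookup against a word list. It is written out in plain Perl so that it does not depend on any module being installed; the function names and the deliberately mild word list are invented for the example:

```perl
use strict;
use warnings;

# Letter-to-digit table of classic Soundex; vowels, H, W and Y have no code.
my %SOUNDEX_CODE;
@SOUNDEX_CODE{ qw(B F P V) }         = (1) x 4;
@SOUNDEX_CODE{ qw(C G J K Q S X Z) } = (2) x 8;
@SOUNDEX_CODE{ qw(D T) }             = (3) x 2;
$SOUNDEX_CODE{L}                     = 4;
@SOUNDEX_CODE{ qw(M N) }             = (5) x 2;
$SOUNDEX_CODE{R}                     = 6;

# Return the four-character Soundex code of a word ('' for no letters).
sub soundex {
    my ($word) = @_;
    my $w = uc $word;
    $w =~ s/[^A-Z]//g;
    return '' if $w eq '';
    my $first = substr $w, 0, 1;
    my $prev  = exists $SOUNDEX_CODE{$first} ? $SOUNDEX_CODE{$first} : '';
    my $out   = $first;
    for my $ch ( split //, substr $w, 1 ) {
        next if $ch eq 'H' || $ch eq 'W';    # H and W do not separate equal codes
        my $code = exists $SOUNDEX_CODE{$ch} ? $SOUNDEX_CODE{$ch} : '';
        $out .= $code if $code ne '' && $code ne $prev;
        $prev = $code;                       # a vowel resets the "previous code"
    }
    return substr $out . '000', 0, 4;
}

# True if $candidate has the same Soundex code as any word in the list.
sub sounds_like_any {
    my ($candidate, @words) = @_;
    my %codes = map { soundex($_) => 1 } @words;
    return exists $codes{ soundex($candidate) } ? 1 : 0;
}

my @blocklist = qw(darn heck);
for my $string ( qw(dern hek table) ) {
    printf "%-6s %s %s\n", $string, soundex($string),
        sounds_like_any( $string, @blocklist ) ? 'flagged' : 'ok';
}
```

Soundex is coarse, so a check like this will flag plenty of innocent strings as well, which is part of the objection raised above.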
--
John Bokma Freelance software developer
&
Experienced Perl programmer: http://castleamber.com/
------------------------------
Date: Sat, 8 Jul 2006 17:26:05 +0300
From: "Veli-Pekka Tätilä" <vtatila@mail.student.oulu.fi>
Subject: Using References to Formats?
Message-Id: <e8ofa7$3i1$1@news.oulu.fi>
Hi,
Browsing perldiag, I noticed messages related to format references. So being
curious and wishing to continue my exploration of Perl's dark and archaic
corners, I decided to write a sample program to see how format references
could be used in Perl. First is an account of what I've attempted, the
relevant code in small chunks and the output received. The mail ends with
the full program source and output from a sample run.
Curiously, references to formats are not documented in perlref, perlform
etc... I'm running ActiveState Perl v5.8.7 (build 815, XP Pro SP2 English).
And now to the program:
To motivate taking references to formats I started out with a rather useless
toy function that generates formats using eval. The format name and the
number of chars it extracts from a global named $text can be parameterized.
sub genForm
{ # A simple format named $name outputting $length chars of $text.
    my($name, $length) = @_;
    eval
    (
        "format $name =\nFirst $length chars: @" .
        '<' x $length . "\n\$text\n."
    ); # eval
    die $@ unless $@ eq ''; # Eval failed.
} # sub
There's a related output function, which given a format name, writes it out
to the default file handle:
sub writeForm
{ # Write out the specified format.
    local $~ = shift;
    write;
} # sub
At this point I started wondering whether I could use a real reference to a
format instead of an "indirect format" (in analogy to indirect file
handles). First I had to use the *foo{THING} syntax to get at a format. The
following statement, using main's symbol table and *foo{FORMAT}, did the
trick for me:
my $formRef = *{ $::{$name} }{FORMAT}; # *foo{FORMAT} syntax in main package.
To test what info could be gleaned from a format reference I made a function
for that, too. Here it is:
sub dumpForm
{ # Dump info on a format reference.
    my $formRef = shift;
    print "The formref $name is:";
    print "Stringified: $formRef";
    print "of type: " . ref($formRef);
    print "Dumped: ";
    eval { print Dumper($formRef) };
} # sub
Oddly, neither the docs for the ref built-in nor Data::Dumper mentioned
references to formats. Despite this the ref function and stringification
worked all right but Dumper didn't. Here's some output:
The formref eight is:
Stringified: FORMAT(0x18c4ecc)
of type: FORMAT
Dumped:
cannot handle ref type 14 at C:/Perl/lib/Data/Dumper.pm line 167.
$VAR1 = ;
I wonder if the debugger does any better. I have not tested it yet.
To make format references useful at all, I suppose one would have to be able
to dereference them somehow. Is that possible, and if so, how? Formats have
no sigil, so I myself have absolutely no idea how they could be dereferenced.
Would being able to work with format references bring any benefits compared
to referring to formats by name? I suppose not, though using format references
does seem to sort of work.
The first thing that occurred to me was to try assigning a format reference
to $~, as opposed to a format name. The same writeForm function could be
used, just passing it a reference:
eval { writeForm($formRef) };
print "Using formref for $~: $@";
This strategy didn't work all that well. The statement printing the eval
error outputs:
Using formref for STDOUT: Undefined format "FORMAT(0x18c4eb4)" called at
C:\programming\plx\test.plx line 29.
Apparently no magical dereferencing is going on here. Starting to run out of
ideas, I thought of testing what would happen if I tried to dereference the
format as a scalar. I have no real rationale for that apart from scalar
derefs working for elements in arrays and hashes. I did realize right from
the start this wouldn't work for formats but typed in the following
nevertheless:
eval { writeForm(${$formRef}) };
print "Using desperate scalar deref for $~: $@";
And the output is:
Using desperate scalar deref for STDOUT: Not a format reference at
C:\programming\plx\test.plx line 29.
Quite right, not a format reference. But the thing that puzzles me here is
that the error is phrased as though Perl expected a format reference. Yet
when I give it one, as in the previous attempt, it doesn't seem to like it
any better, either. It just takes the stringified form of the reference to
be a format name which is no good.
Finally, here's the full code followed by some sample output:
Full code:
use strict; use warnings;
use Data::Dumper;
our $text = 'this is a test';
(my $name, local $\) = ('eight', "\n");
genForm($name , 8);
writeForm($name);
my $formRef = *{ $::{$name} }{FORMAT}; # *foo{FORMAT} syntax in main package.
dumpForm($formRef);
# Try using a format reference instead of a format name for writing the data.
eval { writeForm($formRef) };
print "Using formref for $~: $@";
eval { writeForm(${$formRef}) };
print "Using desperate scalar deref for $~: $@";
sub genForm
{ # A simple format named $name outputting $length chars of $text.
    my($name, $length) = @_;
    eval
    (
        "format $name =\nFirst $length chars: @" .
        '<' x $length . "\n\$text\n."
    ); # eval
    die $@ unless $@ eq ''; # Eval failed.
} # sub
sub writeForm
{ # Write out the specified format.
    local $~ = shift;
    write;
} # sub
sub dumpForm
{ # Dump info on a format reference.
    my $formRef = shift;
    print "The formref $name is:";
    print "Stringified: $formRef";
    print "of type: " . ref($formRef);
    print "Dumped: ";
    eval { print Dumper($formRef) };
} # sub
Sample output:
First 8 chars: this is a
The formref eight is:
Stringified: FORMAT(0x18c4eb4)
of type: FORMAT
Dumped:
cannot handle ref type 14 at C:/Perl/lib/Data/Dumper.pm line 167.
$VAR1 = ;
Using formref for STDOUT: Undefined format "FORMAT(0x18c4eb4)" called at
C:\programming\plx\test.plx line 29.
Use of uninitialized value in scalar assignment at
C:\programming\plx\test.plx line 28.
Using desperate scalar deref for STDOUT: Not a format reference at
C:\programming\plx\test.plx line 29.
--
With kind regards Veli-Pekka Tätilä (vtatila@mail.student.oulu.fi)
Accessibility, game music, synthesizers and programming:
http://www.student.oulu.fi/~vtatila/
------------------------------
Date: 8 Jul 2006 10:15:03 -0700
From: "attn.steven.kuo@gmail.com" <attn.steven.kuo@gmail.com>
Subject: Re: Using References to Formats?
Message-Id: <1152378903.285640.148110@75g2000cwc.googlegroups.com>
Veli-Pekka Tätilä wrote:
(snipped)
> At this point I started wondering whether I could use a real reference to a
> format instead of an "indirect format" (in analogy to indirect file
> handles). First I had to use the *foo{THING} syntax to get at a format. The
> following statement, using main's symbol table and *foo{FORMAT} did the
> trick for me:
>
> my $formRef = *{ $::{$name} }{FORMAT}; # *foo{FORMAT} syntax in main package.
>
> To test what info could be gleaned from a format reference I made a function
> for that, too. Here it is:
>
> sub dumpForm
> { # Dump info on a format reference.
> my $formRef = shift;
> print "The formref $name is:";
> print "Stringified: $formRef";
> print "of type: " . ref($formRef);
> print "Dumped: ";
> eval { print Dumper($formRef) };
> } # sub
>
> Oddly, neither the docs for the ref built-in nor Data::Dumper mentioned
> references to formats. Despite this the ref function and stringification
> worked all right but Dumper didn't. Here's some output:
>
> The formref eight is:
> Stringified: FORMAT(0x18c4ecc)
> of type: FORMAT
> Dumped:
> cannot handle ref type 14 at C:/Perl/lib/Data/Dumper.pm line 167.
> $VAR1 = ;
You can use Devel::Peek instead of Data::Dumper
if you want to look at the guts of a format reference.
For me Devel::Peek::Dump prints:
SV = RV(0x1821258) at 0x182ad40
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,ROK)
  RV = 0x182ae60
  SV = PVFM(0x4489e0) at 0x182ae60
    REFCNT = 3
    FLAGS = (CLONE)
    IV = 0
    NV = 0
    COMP_STASH = 0x0
    START = 0x448b90 ===> 5120
    ROOT = 0x448c90
    XSUB = 0x0
    XSUBANY = 0
    GVGV::GV = 0x182ae9c "main" :: "eight"
etc.,
> I wonder if the debugger does any better. I have not tested it yet.
>
> To make format references useful at all, I suppose one would have to be able
> to dereference them somehow. Is that possible, and if so how? Formats have
> no sigil so I myself have absolutely no idea how they could be dereferenced.
> Would being able to work with format references bring any benefits compared
> to referring to formats by name? I suppose not though using format references
> does seem to sort of work.
Well, just use another typeglob for dereferencing.
Throw in the 'reftype' function from the
Scalar::Util module and then you can
pass either a format NAME or format
reference to writeForm:
> sub writeForm
> { # Write out the specified format.
> local $~ = shift;
> write;
> } # sub
becomes:
use Scalar::Util ('reftype');
sub writeForm {
    if (reftype $_[0] and reftype $_[0] eq 'FORMAT') {
        local *FOO;
        *FOO = $_[0];
        local $~ = 'FOO';
        write;
    } else {
        local $~ = shift;
        write;
    }
}
-- 
Hope this helps,
Steven
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 9438
***************************************