[25027] in Perl-Users-Digest


Perl-Users Digest, Issue: 7277 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Oct 20 18:12:09 2004

Date: Wed, 20 Oct 2004 15:10:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 20 Oct 2004     Volume: 10 Number: 7277

Today's topics:
        Perl output displaying ??? characteres (Jay)
    Re: Perl output displaying ??? characteres <uguttman@athenahealth.com>
    Re: Perl output displaying ??? characteres <1usa@llenroc.ude.invalid>
    Re: perl to english <uguttman@athenahealth.com>
    Re: perl to english (Malcolm Dew-Jones)
    Re: perl to english <ioneabu@yahoo.com>
    Re: Perl Whirl IV from Dubrovnik <perl@my-header.org>
    Re: Regex matching non-contiguous sheds of text <Jon.Ericson@jpl.nasa.gov>
    Re: Regex matching non-contiguous sheds of text <elektrophyte-yahoo>
    Re: Regex matching non-contiguous sheds of text <mritty@gmail.com>
    Re: Regex matching non-contiguous sheds of text <mritty@gmail.com>
    Re: Regex matching non-contiguous sheds of text <elektrophyte-yahoo>
    Re: regex to clean path <parv_@yahooWhereElse.com>
    Re: Regular Expression for HTML Tags and Special Charac (Vijai Kalyan)
    Re: source a config file (Malcolm Dew-Jones)
    Re: source a config file carloschoenberg@yahoo.com
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 20 Oct 2004 12:24:37 -0700
From: jay.mistry@gmail.com (Jay)
Subject: Perl output displaying ??? characteres
Message-Id: <3490f341.0410201124.6a9f15a4@posting.google.com>

Hi,

I'm trying to write a simple Perl script that searches a given file for
a given regexp. The files that I'm trying to search are over
10000 lines of code. Following is my code:

print "Enter file name and location:";
chop($basfile = <STDIN>);
open INFILE, "< $basfile" or die "Can't open file:$basfile";
open OUTFILE, "> $basfile.out";
while(<INFILE>) {
	$_;
	print OUTFILE $_;
	if($_ =~ m/ProcedureName/) {
		print "$_\n";
		break;
	}
}

This seems to output stuff that is really crazy. It prints out a bunch
of ???????????? characters when viewed in EditPlus; these are chinese
characters when viewed in WordPad. I even try to do just simple input
and output (removed the if statement) and it still does that. Funny
thing is, if I take a subset of this file, then it seems to do the
correct output. What am I doing wrong? Any help will be very helpful.

Thank you.

Jay.


------------------------------

Date: Wed, 20 Oct 2004 15:44:15 -0400
From: Uri Guttman <uguttman@athenahealth.com>
Subject: Re: Perl output displaying ??? characteres
Message-Id: <m3k6tljhtc.fsf@linux.local>

>>>>> "J" == Jay  <jay.mistry@gmail.com> writes:

  J> print "Enter file name and location:";
  J> chop($basfile = <STDIN>);

use chomp.

  J> open INFILE, "< $basfile" or die "Can't open file:$basfile";
  J> open OUTFILE, "> $basfile.out";
  J> while(<INFILE>) {
  J> 	$_;

what is that supposed to do?

  J> 	print OUTFILE $_;
  J> 	if($_ =~ m/ProcedureName/) {

no need for the $_ as it is the default string to match against.
also no need for the m/ as // is a match op by itself.

  J> 		break;

that isn't perl (the loop exit is 'last'). please paste your code and
not retype it.

  J> This seems to output stuff that is really crazy. It prints out a bunch
  J> of ???????????? characters when viewed in EditPlus; these are chinese
  J> characters when viewed in WordPad. I even try to do just simple input
  J> and output (removed the if statement) and it still does that. Funny
  J> thing is, if I take a subset of this file, then it seems to do the
  J> correct output. What am I doing wrong? Any help will be very helpful.

how things display has to do with your screen/terminal emulator and
nothing to do with perl.

but how text data is interpreted could be an issue. since you mention
seeing chinese characters i suspect a unicode/utf issue and i will leave
that to others as i am an ascii bigot :)
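
The points above can be put together in a minimal sketch. It is not
Jay's actual program: the sub name copy_until_match is made up, and the
file name is taken from the command line rather than from a prompt, so
that the sketch stays non-interactive. It uses chomp, lexical
filehandles, three-argument open with $! in the error message, and
'last' to leave the loop.

```perl
use strict;
use warnings;

# Copy $in_path to $out_path line by line, stopping after the first line
# that matches $pattern. Returns that line (chomped), or undef if no line
# matched. (Hypothetical helper, named here for illustration only.)
sub copy_until_match {
    my ($in_path, $out_path, $pattern) = @_;

    open my $in,  '<', $in_path  or die "Can't open $in_path: $!";
    open my $out, '>', $out_path or die "Can't open $out_path: $!";

    my $found;
    while (my $line = <$in>) {
        print {$out} $line;
        if ($line =~ $pattern) {
            chomp($found = $line);   # chomp strips only a trailing newline
            last;                    # Perl's loop exit; there is no 'break'
        }
    }

    close $in;
    close $out or die "Can't close $out_path: $!";
    return $found;
}

# Assumption: the file name arrives as the first command-line argument.
my ($basfile) = @ARGV;
if (defined $basfile) {
    my $hit = copy_until_match($basfile, "$basfile.out", qr/ProcedureName/);
    print defined $hit ? "$hit\n" : "no match\n";
}
```

If the input turns out to be UTF-16 (only a guess, nothing in the post
confirms it), opening it with a '<:encoding(UTF-16)' layer instead of
plain '<' would be the next thing to try.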

uri


------------------------------

Date: 20 Oct 2004 19:58:45 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Perl output displaying ??? characteres
Message-Id: <Xns9588A28CD5DE3asu1cornelledu@132.236.56.8>

jay.mistry@gmail.com (Jay) wrote in news:3490f341.0410201124.6a9f15a4
@posting.google.com:

> This seems to output stuff that is really crazy. It prints out a bunch
> of ???????????? characters when viewed in EditPlus; these are chinese
> characters when viewed in WordPad. I even try to do just simple input
> and output (removed the if statement) and it still does that. Funny
> thing is, if I take a subset of this file, then it seems to do the
> correct output. 

> What am I doing wrong? 

Other than you, who cares?

> Any help will be very helpful.

Here is some help: EditPlus is a supported, commercial product. Why don't 
you ask them how to use it? Oh, I see, you did not register.

Go help yourself.

Sinan.


------------------------------

Date: Wed, 20 Oct 2004 14:19:44 -0400
From: Uri Guttman <uguttman@athenahealth.com>
Subject: Re: perl to english
Message-Id: <m3sm89jlq7.fsf@linux.local>

>>>>> "w" == wana  <ioneabu@yahoo.com> writes:

  w> Helgi Briem wrote:
  >> 
  >> Well written Perl is already as close to English as code gets.

  w> In my opinion, SQL is slightly closer to English than Perl.  But
  w> then again, SQL is a 4G language and Perl is 3G I think.  How about
  w> writing English comments mixed in with your code and write a
  w> program that can extract and format the comments?  Has anyone tried
  w> doing that?

sql close to english? as cobol is also close? did you read my comment on
how useless english is to describing program logic? ever heard of
ambiguities in language? my favorite (especially in poorly written
technical documentation) is dangling pronouns. which previous thing does
'it' refer to?

so again, you are barking up the wrong tree. this has been tried many
times and is doomed to failure. natural languages are not good for
logical flow descriptions and only perl is good for program poetry!

uri


------------------------------

Date: 20 Oct 2004 11:26:50 -0800
From: yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones)
Subject: Re: perl to english
Message-Id: <4176adea@news.victoria.tc.ca>

buildmorelines (bulk88@hotmail.com) wrote:
: Is there any script or pragma or something that will translate perl
: code to pure english, like that perl latin module, just in english. I
: want to show a person who cant read perl code or any computer
: language, some perl code, so they have remotly a clue what the code
: does or how it flows. It doesnt need to perfectly make sense or be
: proper english sentences. Just soemthing that will translate perl code
: and/or syntax to english.

I don't know the name of the project, and I suspect it is _written_ in
perl, not _translating_ perl.

However, there was a project that converted program code into proper
english.

The object was to prove that computer code is an expression of an idea,
not merely functional, and therefore (in the U.S.) protected.  This came
about due to issues around the licensing of DVD code and the intention of
the owners of the code to prevent people from being allowed to read copies
of the code.  (not whether they could _use_ the code, but whether they
could even be allowed to _read_ the code).

The argument was that by proving computer code expresses ideas, they would
have proved that people can not be prevented from reading (and discussing
etc) those ideas.

So, googling over dvd topics might find something.



------------------------------

Date: Wed, 20 Oct 2004 17:21:43 -0400
From: wana <ioneabu@yahoo.com>
Subject: Re: perl to english
Message-Id: <10ndlt9dh9og462@news.supernews.com>

buildmorelines wrote:

> Is there any script or pragma or something that will translate perl
> code to pure english, like that perl latin module, just in english. I
> want to show a person who cant read perl code or any computer
> language, some perl code, so they have remotly a clue what the code
> does or how it flows. It doesnt need to perfectly make sense or be
> proper english sentences. Just soemthing that will translate perl code
> and/or syntax to english.

It is possible!  I have written and tested a preliminary version of the
program that performs the task you are requesting.  Here it is:

#! /usr/bin/perl

use STD; #my home made module
use strict;
use warnings;

my $file = $ARGV[0];
ReplaceInFile('=', 'is equal to', $file);

I tested it on several perl scripts and it worked beautifully.





------------------------------

Date: Wed, 20 Oct 2004 21:33:57 +0200
From: Matija Papec <perl@my-header.org>
Subject: Re: Perl Whirl IV from Dubrovnik
Message-Id: <a0fdn01v88e8706p4j884hl6ke5jkcfu5v@4ax.com>

X-Ftn-To: David K. Wall 

"David K. Wall" <dwall@fastmail.fm> wrote:
>> it's Larry on most of the pictures for everybody wanted to take
>> picture with him.
>
>No names with the pictures? I recognize Larry and Randal, but if there 
>are other names I might know, it would be pleasant (although by no 
>means necessary) to be able to associate a face with them. 

Sorry, can't help you there; I've heard that whirl (or captain?) was moving
very fast so there was just a quick lunch break.



-- 
Matija


------------------------------

Date: Wed, 20 Oct 2004 11:39:45 -0700
From: Jon Ericson <Jon.Ericson@jpl.nasa.gov>
Subject: Re: Regex matching non-contiguous sheds of text
Message-Id: <rcgoeixb5e6.fsf@Jon-Ericson.sdsio.prv>

DM <elektrophyte-yahoo> writes:

> I'm trying to design a regular expression to match the href attribute
> of <a> tags. I'm testing it on the command line (on Redhat Linux
> Enterprise Server) using grep with the Perl regex option.
>
> Here's the command I'm using:
>
> # grep -rHInPo --color=auto 'href=.*TEA-21_Side-by-Side\.pdf[^>]*>'
> /home/mtc_website/
>
> (On my console, the above is all one line. The URL part -- 
> "TEA-21_Side-by-Side\.pdf" in this example, would be determined at
> runtime in the actual Perl script.)
>
> It almost works as expected. I set the color and -o options in order
> to clearly show the highlighted match. In most cases it *does* match
> exactly what I want it to.
>
> However, in a few cases what is matched is totally unexpected.

If you were actually using perl, this wouldn't be too difficult with
the HTML::Parser module.  See perldoc -q html for some discussion
about the pitfalls of using a regex to parse HTML.

Jon


------------------------------

Date: Wed, 20 Oct 2004 12:39:53 -0700
From: DM <elektrophyte-yahoo>
Subject: Re: Regex matching non-contiguous sheds of text
Message-Id: <4176bf32$0$801$2c56edd9@news.cablerocket.com>

Jon Ericson wrote:

> DM <elektrophyte-yahoo> writes:
> 
> 
>>I'm trying to design a regular expression to match the href attribute
>>of <a> tags. I'm testing it on the command line (on Redhat Linux
>>Enterprise Server) using grep with the Perl regex option.
>>
>>Here's the command I'm using:
>>
>># grep -rHInPo --color=auto 'href=.*TEA-21_Side-by-Side\.pdf[^>]*>'
>>/home/mtc_website/

[ ... ]

>>However, in a few cases what is matched is totally unexpected.
> 
> 
> If you were actually using perl, this wouldn't be too difficult with
> the HTML::Parser module.  See perldoc -q html for some discussion
> about the pitfalls of using a regex to parse HTML.
> 
> Jon

Thanks for the reply. I don't see how the HTML::Parser module would help me in 
the task I described in my original post.

I checked perldoc as you recommended, but the "pitfalls" mentioned don't seem to 
apply to what I'm doing.

As I explained in my original post, I'm not trying to do some kind of general 
HTML parsing operation, such as stripping out HTML tags. I'm trying to find this 
string:

href="[SOME_URL_FRAGMENT].pdf">

My regex almost works, but is acting really weird in a few cases. I'm trying to 
nail down the reason for that. Perhaps I have a misconception or 
misunderstanding of regex syntax?


------------------------------

Date: Wed, 20 Oct 2004 19:52:21 GMT
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Regex matching non-contiguous sheds of text
Message-Id: <Vlzdd.4983$bV5.4448@trndny07>

"DM" <elektrophyte-yahoo> wrote in message
news:41769fe2$0$796$2c56edd9@news.cablerocket.com...
> I'm trying to design a regular expression to match the href attribute
> of <a> tags. I'm testing it on the command line (on Redhat Linux
> Enterprise Server) using grep with the Perl regex option.
>
> Here's the command I'm using:
>
> # grep -rHInPo --color=auto 'href=.*TEA-21_Side-by-Side\.pdf[^>]*>'
> /home/mtc_website/

[^>]*

means matches EVERYTHING it can in the string.  Here it can match
everything until the very last > in the string.

You need to make it non-greedy.

[^>]*?

means to match only as much as necessary to make the pattern match
succeed.

Paul Lalli




------------------------------

Date: Wed, 20 Oct 2004 19:53:22 GMT
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Regex matching non-contiguous sheds of text
Message-Id: <Smzdd.3083$TU5.2496@trndny06>

"Paul Lalli" <mritty@gmail.com> wrote in message
news:Vlzdd.4983$bV5.4448@trndny07...
> "DM" <elektrophyte-yahoo> wrote in message
> news:41769fe2$0$796$2c56edd9@news.cablerocket.com...
> > I'm trying to design a regular expression to match the href attribute
> > of <a> tags. I'm testing it on the command line (on Redhat Linux
> > Enterprise Server) using grep with the Perl regex option.
> >
> > Here's the command I'm using:
> >
> > # grep -rHInPo --color=auto 'href=.*TEA-21_Side-by-Side\.pdf[^>]*>'
> > /home/mtc_website/
>
> [^>]*
>
> means matches EVERYTHING it can in the string.  Here it can match
> everything until the very last > in the string.
>
> You need to make it non-greedy.
>
> [^>]*?
>
> means to match only as much as necessary to make the pattern match
> succeed.

Of course, this applies to the first * in your regexp as well.

href=.*?
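
A small Perl sketch of the difference, run on a made-up line of HTML.
The sample line and the double-quoted href="..." convention are
assumptions, not taken from the actual site. Two things are worth
noting: [^>]* on its own can never run past a >, so the .* after href=
is the part that spans tags; and because a match always starts at the
leftmost href= on the line, .*? shortens the match but can still cross
an earlier tag. Keeping the match inside one quoted attribute value is
one way around both.

```perl
use strict;
use warnings;

# Hypothetical sample: three links on one line, two of them to the PDF.
my $line = '<a href="a.html">A</a> '
         . '<a href="x/TEA-21_Side-by-Side.pdf">one</a> '
         . '<a href="y/TEA-21_Side-by-Side.pdf">two</a>';

# The file name is only known at runtime, so escape it before interpolating.
my $pdf = quotemeta 'TEA-21_Side-by-Side.pdf';

# Greedy: starts at the first href= and runs to the LAST PDF link.
my ($greedy) = $line =~ /(href=.*$pdf[^>]*>)/;

# Lazy: same starting point, but stops at the FIRST PDF link.
my ($lazy) = $line =~ /(href=.*?$pdf[^>]*>)/;

# Bounded: [^">]* cannot leave the quoted value, so only a PDF href matches.
my ($bounded) = $line =~ /(href="[^">]*$pdf"[^>]*>)/;

print "greedy:  $greedy\n";
print "lazy:    $lazy\n";
print "bounded: $bounded\n";
```

With grep -P the bounded pattern would be written the same way, with the
file name escaped by hand.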

Paul Lalli




------------------------------

Date: Wed, 20 Oct 2004 14:07:55 -0700
From: DM <elektrophyte-yahoo>
Subject: Re: Regex matching non-contiguous sheds of text
Message-Id: <4176d3d4$0$801$2c56edd9@news.cablerocket.com>

>>>Here's the command I'm using:
>>>
>>># grep -rHInPo --color=auto 'href=.*TEA-21_Side-by-Side\.pdf[^>]*>'
>>>/home/mtc_website/
>>
>>[^>]*
>>
>>means matches EVERYTHING it can in the string.  Here it can match
>>everything until the very last > in the string.
>>
>>You need to make it non-greedy.
>>
>>[^>]*?
>>
>>means to match only as much as necessary to make the pattern match
>>succeed.
> 
> 
> Of course, this applies to the first * in your regexp as well.
> 
> href=.*?
> 
> Paul Lalli
> 
> 

OK, thanks. That seems to help.

dm


------------------------------

Date: Wed, 20 Oct 2004 15:03:45 -0500
From: parv <parv_@yahooWhereElse.com>
Subject: Re: regex to clean path
Message-Id: <slrncndh9o.dj.parv_@localhost.holy.cow>

in message <el1bn0la63rr4td74r1nfv00dfgsdmuvgq@4ax.com>,
wrote Michele Dondi ...

> On Mon, 18 Oct 2004 16:45:22 -0500, parv <parv_@yahooWhereElse.com>
> wrote:
>
>>> Does anybody know an elegant oneliner using regex?
>>
>>No regex, just split(); on top of that, in more than one line ...
>
> Huh?!? 66 lines of code? I'm not even watching into it, but... I

66.  How did you get that number? (No, no need to reply.)


> heartily hope that it does *much* more than the OP requested...

If interested, read or try it yourself.


>>  printf "Unordered: %s\n\nOrdered: %s\n"
>>  , ${ make_path( @paths ) }
>>  , ${ make_path_ordered( @paths ) }
>>  ;
>
> Hmmm, I couldn't help giving a peek at least into the first few
> lines... however this makes me think I should be moderately glad
> I'm refusing to read it all!

Hey, whatever suits you, fine w/ me.


  - parv

-- 
As nice it is to receive personal mail, too much sweetness causes
tooth decay.  Unless you have burning desire to contact me, do not do
away w/ WhereElse in the address for private communication.



------------------------------

Date: 20 Oct 2004 12:46:30 -0700
From: vijai.kalyan@gmail.com (Vijai Kalyan)
Subject: Re: Regular Expression for HTML Tags and Special Characters
Message-Id: <18b36e50.0410201146.324adeeb@posting.google.com>

> How can I allowed some HTML-Tags like <BR>, <B>, <P> but
> filter out <, >, when they stand alone? 
> 
> Must be something like: "^[A-Za-Z0-9\>\<]+$"
> for the < and >, but where do i have to put in my tags?

As others said below, you should be using a parser instead of regexp
for this, but I am just a beginner with perl and am trying to answer
questions to get practice.

If you really want to use a regexp, look up an example that's in the
first chapter of the Camel book.

It goes something like this: (I will let u do the homework :)

m/<(.*?)>.*?(\/\1)/

which means, 

a. minimally match something within a < and a >

b. minimally match anything (. matches everything but newline, so u
might want to modify that - again, homework :)

c. make a back reference to what was found between the first < and >.

NOTE:

a. This probably won't work if you have attributes so a modification
might be:

m/<\s*(\w+)\s+.*?>.*?(\/\1)/

which (I think) means:

i. Match a < followed by any number of ws chars, followed by one or more
word chars followed again by ws chars.

ii. Finally any number of chars is minimally matched till again a > is
met.

iii. Again the back reference is used to force the same pattern (here,
this will be the tag) to match at the end.

As someone said, it gets complicated. 
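
To see the back reference at work, here is a small sketch on made-up
input. It is a variation on the patterns above, not the Camel book's
code: the closing part is written out as </tag> (the original (\/\1)
would match just "/b"), the whitespace after the tag name is optional so
that a bare <b> also matches, and /s lets . cross newlines. It is still
not a parser; nested tags of the same name will break it.

```perl
use strict;
use warnings;

# Hypothetical sample text with attributes and an embedded newline.
my $html = qq{<p class="x">some\n<b>bold</b> text</p>};

# <tag ...> body </tag>: \1 forces the closing name to equal the
# opening name captured by (\w+).
my $paired = qr{<\s*(\w+)\b[^>]*>(.*?)<\s*/\s*\1\s*>}s;

my ($tag, $body) = $html =~ $paired;

if (defined $tag) {
    print "tag:  $tag\n";
    print "body: $body\n";
}
```

The inner </b> is skipped because \1 holds "p" at that point, so the
lazy (.*?) keeps extending until it reaches </p>.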

hth,
----
vijai.


------------------------------

Date: 20 Oct 2004 11:49:44 -0800
From: yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones)
Subject: Re: source a config file
Message-Id: <4176b348@news.victoria.tc.ca>

Sam Holden (sholden@flexal.cs.usyd.edu.au) wrote:
: On 19 Oct 2004 21:23:59 -0700, carloschoenberg@yahoo.com wrote:
: > I want to source (do/require/use) a config file. It must be compatible
: > with warnings and strict.
: >
: > I don't want warnings about a variable being used only once.
: >
: > I don't want to put too much cruft in the config file. A package
: > statement or a my is ok but an exporter is not. There will be many
: > config files.

: That's nice. Do you have a question?


I think he's asking for advice on how best to do this.

There are some nice code examples amongst the perl distrib that use the
constant module to define config constants.  grep "use constant" in the
perl directory tree would probably find them.

Otherwise, the following is the stripped down version of one similar
config file I have.  If the syntax is wrong it's because I just stripped
it down - the original works.  It includes exporter, but if the config
uses the correct packages then exporting is not required.  I didn't
quickly find an example without exporter, but I have done this same basic
setup without exporter in the past.  The point is to get all globals into
a single @list_of_globals so that all programs can refer to a single value
to define a set of globals, with little redundancy, and allowing warnings
and strict to check (almost) everything.



# the config file

	use strict;

	######################
	package My::Config;
	######################
		our (@ISA, @EXPORT);
		require Exporter;
		@ISA = ('Exporter');
		@EXPORT = @::MY_GLOBALS;

	# ----------------------------
	# configuration options
	# ----------------------------
	BEGIN{  @::MY_GLOBALS =
	qw(
	    $MAX_SIZE
	    $USER_DOC_ROOT
	  )  } #/BEGIN

	use vars @::MY_GLOBALS ; # this checks for typos in the config file

	$MAX_SIZE = 100_000;
	$USER_DOC_ROOT = '/a/path/to/a/dir/';

	1; # true for caller



# example use of the config file

	use My::Config;
	use vars @::MY_GLOBALS;


There is only a single variable that doesn't get checked - @::MY_GLOBALS


------------------------------

Date: 20 Oct 2004 15:04:22 -0700
From: carloschoenberg@yahoo.com
Subject: Re: source a config file
Message-Id: <8c526b62.0410201404.55667a9a@posting.google.com>

"A. Sinan Unur" <usa1@llenroc.ude.invalid> wrote in message news:<Xns9588E677D7Easu1cornelledu@132.236.56.8>...
> carloschoenberg@yahoo.com wrote in 
> news:8c526b62.0410192023.4c2ce6b@posting.google.com:
> 
> > I want to source (do/require/use) a config file. It must be compatible
> > with warnings and strict.
> > 
> > I don't want warnings about a variable being used only once.
> > 
> > I don't want to put too much cruft in the config file. A package
> > statement or a my is ok but an exporter is not. There will be many
> > config files.
> 
> It is always nice to know what you want and what you don't want.
> 
> Did you have a Perl question?

I want to do what I want to do (described above), in Perl. I want to
do this without doing what I don't want to do (described above).

Here's one way I tried to do it, based on a random suggestion found on
the net:

$ cat a
$x="hi";

$ cat b
use strict;
use warnings;

{ package MyConfig; do './a'; }

print "$MyConfig::x\n";

But this happens:
$ perl b
Name "MyConfig::x" used only once: possible typo at b line 6.
hi
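
One way this might be avoided, sketched under assumptions: predeclare
the variable with our inside the package block, so perl has already seen
the name when it compiles the print. The temporary file below merely
stands in for the file "a" so that the sketch is self-contained; the
File::Temp usage and the error check around do are additions, not part
of the original test case.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the config file "a": a single assignment, no package
# statement, no "my", no exporter.
my ($fh, $config_path) = tempfile(SUFFIX => '.conf', UNLINK => 1);
print {$fh} qq{\$x = "hi";\n};
close $fh or die "Can't close $config_path: $!";

{
    package MyConfig;
    our $x;    # predeclared, so no "used only once" warning for MyConfig::x

    # do FILE returns undef on failure: $@ is set if the file did not
    # compile, $! if it could not be read.
    defined(do $config_path)
        or die "Can't load $config_path: ", ($@ || $!);
}

print "$MyConfig::x\n";
```

The cost is one our line per config variable in the loading program; the
config files themselves stay free of cruft. A lexically scoped
no warnings 'once'; around the print would be a blunter alternative.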


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 7277
***************************************

