[30893] in Perl-Users-Digest

Perl-Users Digest, Issue: 2138 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Jan 18 18:09:40 2009

Date: Sun, 18 Jan 2009 15:09:04 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 18 Jan 2009     Volume: 11 Number: 2138

Today's topics:
    Re: Checking return values (was: Re: opening a file) <whynot@pozharski.name>
    Re: Checking return values (was: Re: opening a file) <tim@burlyhost.com>
    Re: Checking return values <tim@burlyhost.com>
    Re: Circular lists <gamo@telecable.es>
    Re: Circular lists <xhoster@gmail.com>
    Re: fastest way to allocate memory ? <stoupa@practisoft.cz>
    Re: fastest way to allocate memory ? <hjp-usenet2@hjp.at>
    Re: fastest way to allocate memory ? <xhoster@gmail.com>
    Re: Parsing out text from in between HTML tags <tadmc@seesig.invalid>
    Re: unable to open file <Tintin@teranews.com>
    Re: unable to open file <tadmc@seesig.invalid>
    Re: unable to open file <hjp-usenet2@hjp.at>
    Re: What do you need to have to be considered a Master  <hjp-usenet2@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 18 Jan 2009 14:07:54 +0200
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Checking return values (was: Re: opening a file)
Message-Id: <slrngn66us.l81.whynot@orphan.zombinet>

On 2009-01-18, Tad J McClellan <tadmc@seesig.invalid> wrote:
> Eric Pozharski <whynot@pozharski.name> wrote:
>> On 2009-01-17, Tad J McClellan <tadmc@seesig.invalid> wrote:
>>> Tim Greer <tim@burlyhost.com> wrote:
>>>
>>> [ recap: I wrote:
>>>
>>>     You should always, yes *always*, check the return value from open()
>>> ]
>>>
>>>
>>>> I would personally never intentionally fail
>>>> to check a return on a call
>>>
>>>
>>> Just to show that my "pendulum of inflexibility" can also swing
>>> the other way, I *never* check the return value from print().
>>>
>>> print() has a return value to indicate success, as do many functions
>>> in Perl.
>>>
>>> But once you have a successfully opened write filehandle, about the only
>>> thing that can go wrong with a print() is "filesystem full".
>>>
>>> (assuming the filehandle is connected to a real file rather than a 
>>>  socket or something.)
>>>
>>> If the filesystem is full, I won't need my little Perl program to tell
>>> me that something is wrong, because just about everything will fail to work.
>>
>> Some time ago I'd concluded that checking the return of B<close> on a
>> read-only filehandle is useless, since that syscall would fail only if
>> the whole system had crashed.  While reading that braindead thread I've
>> come to the idea that I was somewhat wrong.  Am I?
>
>
> If it was a pipe open, then yes, you should have checked.

I recall the thread about closing pipes.  It was very noisy, thanks to
me.

> See the 3rd paragraph in:
>
>    perldoc -f close

I was asking about reading regular files, though.
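
For the pipe case mentioned above, here is a minimal sketch of why the
return value of close() matters there.  The child command (the running
perl, via $^X, exiting with status 3) is only an assumption made up for
the illustration; list-form pipe open is assumed to be available on the
platform.

```perl
use strict;
use warnings;

# The child is the running perl itself ($^X); it deliberately exits with 3.
open my $pipe, '-|', $^X, '-e', 'print "hello\n"; exit 3'
    or die "cannot start child: $!";
my @lines = <$pipe>;

my $exit;
unless (close $pipe) {
    # On a pipe, close() waits for the child and returns false when it
    # exited non-zero; $? then holds the wait status.
    $exit = $? >> 8;
}

print "child said: $lines[0]";
print "child exit status: $exit\n" if defined $exit;
```

Without the check on close(), the failed child would go unnoticed even
though every read succeeded.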


-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: Sun, 18 Jan 2009 12:47:49 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: Checking return values (was: Re: opening a file)
Message-Id: <WzMcl.42054$H12.23661@newsfe12.iad>

Peter J. Holzer wrote:

> You may not have to check each print, because - as mentioned above -
> errors are "sticky". So if you have a loop like
> 
> while (...) {
> print $fd $record_header;
> print $fd $first_part;
> some_sub_which_prints_more_parts();
> print $fd $record_trailer or die "...";
> }
> 
> it's probably sufficient to check the last print in the loop.
> Or you can use $io->error:
> 
> use IO::Handle;
> ...
> while (...) {
> print $fd $record_header;
> print $fd $first_part;
> some_sub_which_prints_more_parts();
> print $fd $record_trailer;
> die "..." if $fd->error;
> }

I'm sure there are cases where, if you really want or need to check
whether print was successful, it could come in handy.  I don't recall
ever running into a situation where it would have been really
necessary -- but again, I could see that, depending on what you're
doing, some things could benefit from ultra-paranoid checking.
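
For reference, a minimal self-contained sketch of the "sticky error"
approach quoted above.  The scratch file from File::Temp and the record
contents are assumptions for the illustration, not anything from the
original code.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Scratch file standing in for the real output file.
my ($fh, $path) = tempfile(UNLINK => 1);

for my $n (1 .. 3) {
    print {$fh} "header $n\n";
    print {$fh} "body $n\n";
    print {$fh} "trailer $n\n";
    # error() is true if any earlier write on this handle has failed,
    # so one check per record covers all three prints.
    die "write to $path failed" if $fh->error;
}

# Buffered data may only reach the disk here, so close() is checked too.
close $fh or die "close of $path failed: $!";
```

The check on close() is the one that catches errors that only show up
when the buffer is flushed.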
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Sun, 18 Jan 2009 12:45:22 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: Checking return values
Message-Id: <DxMcl.42053$H12.38631@newsfe12.iad>

Andrew DeFaria wrote:


> And, if you think about it, if print fails what are you gonna do? I
> mean how are you gonna inform the outside world of the failure if you
> can't talk to the outside world?!?
> 
> ;-)
> 
> Now print to a file... That's another matter...

:-)
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Sun, 18 Jan 2009 14:29:53 +0100
From: gamo <gamo@telecable.es>
Subject: Re: Circular lists
Message-Id: <alpine.LNX.2.00.0901181422001.8399@jvz.es>

On Sun, 18 Jan 2009, gamo wrote:

> > 
> > 2a) So how can I efficiently communicate with those past and future
> > representations so that one and only one of us knows whether we are the "it"
> > one or not?  Do it by rule, not by explicit communication.  Each of us tests
> > whether our representation canonicalizes to itself.  If it does, I know I am
> > "it".  If it doesn't,  I know that one of the other representations is (or
> > will be) "it", so I bow out gracefully.
> > 
> Good idea.

 ...and it definitively solves the problem, but I wonder whether it
could be optimized even more.  I.e., given a large list, could one
predict which candidates will fit a successful pattern?

Thanks!


-- 
http://www.telecable.es/personales/gamo/
"Was it a car or a cat I saw?"
perl -E 'say 111_111_111**2;'


------------------------------

Date: Sun, 18 Jan 2009 13:02:35 -0800
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: Circular lists
Message-Id: <49739da5$0$25672$ed362ca5@nr5c.newsreader.com>

gamo wrote:
> On Sun, 18 Jan 2009, gamo wrote:
> 
>>> 2a) So how can I efficiently communicate with those past and future
>>> representations so that one and only one of us knows whether we are the "it"
>>> one or not?  Do it by rule, not by explicit communication.  Each of us tests
>>> whether our representation canonicalizes to itself.  If it does, I know I am
>>> "it".  If it doesn't,  I know that one of the other representations is (or
>>> will be) "it", so I bow out gracefully.
>>>
>> Good idea.
> 
> ...and it definitively solves the problem, but I wonder whether it
> could be optimized even more.  I.e., given a large list, could one
> predict which candidates will fit a successful pattern?


Yes, but that is not going to get you orders of magnitude improvement.

The first thing I did was translate the algorithm already described 
(including the part about fixing one of the "a"s to be the first slot, 
which means fewer permutations plus fewer rotations, since you only have
to find the other "a" and rotate it, rather than doing all 20 rotations)
into C, because that will get you orders of magnitude improvement and my
gut tells me that nothing left that you can do in Perl will do so (other 
than large scale parallelization, assuming you have a 100 CPU cluster 
lying around).  Also, because at this point it was pretty 
straightforward to translate to C, but after more optimization done in 
Perl the translation to C would become exponentially harder.
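
A minimal Perl sketch of that self-canonicalization test, assuming the
string already starts with the rarest letter and that this letter occurs
exactly twice.  The name is_canonical and the sample strings are
hypothetical; this is not the C code described here.

```perl
use strict;
use warnings;

# True if $s is no greater than the one other rotation that also starts
# with $first, i.e. the rotation bringing the second $first to the front.
sub is_canonical {
    my ($s, $first) = @_;
    my $pos = index($s, $first, 1);      # position of the second occurrence
    return 1 if $pos < 0;                # no second occurrence: nothing to compare
    my $rotated = substr($s, $pos) . substr($s, 0, $pos);
    return $s le $rotated ? 1 : 0;
}

for my $s (qw(ababc abcab)) {
    printf "%s %s\n", $s, is_canonical($s, 'a') ? "is canonical" : "is not canonical";
}
```

Only one rotation has to be built and compared per candidate, which is
where the saving over trying all rotations comes from.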

The next optimization is to hack the dpermute code itself (as opposed to 
the callback made from it) in the manner you describe, so that as soon 
as the second 'a' is placed into the variable 'prefix', I can start 
testing to see if the "rotated" string is, to our knowledge so far, 
going to be less than, greater than, or undecidable compared to the 
original string.  If the rotated will be less than the original, then we 
will fail the self-canonicalization test and I can abort now before 
filling in any more letters.  If the rotated will be greater than the 
original, then I know that any string I make by filling in more letters 
will pass the canonicalization test, so I can set a flag so that from 
here (recursively) on, no more lexical order testing is needed, it 
passes by induction.  If it is undecidable, then I just have to do 
the test-skip-flag thing when the next letter is added.  (And of course 
I only need to test the added letter against its mate; I don't have to test 
all preceding letters, as I know that they are equal because otherwise I 
wouldn't have reached this point without the flag being set.)

After all that, I got C code that could generate all 4,888,643,760 
answers for the size 2-3-4-5-6 problem in about one hour, on a 900MHz 
processor, using negligible memory.

Of course, if there were not exactly two instances of the rarest letter, 
then this optimization would not work.  A general-case optimization 
should be possible, but much more difficult.   (and if the rarest letter
were not also the alphabetically smallest, then rather than changing my 
code to work under that scenario, I would just remap the alphabet to 
make it be the case that the rarest was the smallest, and then use 
either Perl's or linux's "tr" to map the answers back to the original 
alphabet.)
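
A small sketch of that remapping with Perl's tr, assuming a hypothetical
input in which 'c' is the rarest letter and gets swapped with 'a'.

```perl
use strict;
use warnings;

# Hypothetical input: 'c' occurs twice, 'a' and 'b' three times each.
my $original = "cabbbcaa";

# Swap 'a' and 'c' so the rarest letter becomes the smallest one.
(my $mapped = $original) =~ tr/ac/ca/;

# The swap is its own inverse, so the same tr maps the answers back.
(my $restored = $mapped) =~ tr/ac/ca/;

print "$original -> $mapped -> $restored\n";
```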

But this is all because I think it's fun.  If I were doing this for real, 
I'd probably be asking, "What the heck am I going to do with over 4 
billion 20-letter strings, and maybe I should focus my optimization on 
the use of them rather than the generation of them?"


Xho


------------------------------

Date: Sun, 18 Jan 2009 14:53:34 +0100
From: "Petr Vileta \"fidokomik\"" <stoupa@practisoft.cz>
Subject: Re: fastest way to allocate memory ?
Message-Id: <gkvcgq$30kg$1@ns.felk.cvut.cz>

"Ilya Zakharevich" <nospam-abuse@ilyaz.org> wrote in message 
news:gkupi0$m2m$1@agate.berkeley.edu...

> $gras x= 1024 * 1024 * 10000;

From which Perl version does this x= operator work (5.8 or 5.10)?

-- 
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail.
Send me your mail from another non-spammer site please.)
Please reply to <petr AT practisoft DOT cz>



------------------------------

Date: Sun, 18 Jan 2009 18:26:53 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: fastest way to allocate memory ?
Message-Id: <slrngn6piu.81b.hjp-usenet2@hrunkner.hjp.at>

On 2009-01-18 13:53, Petr Vileta "fidokomik" <stoupa@practisoft.cz> wrote:
> "Ilya Zakharevich" <nospam-abuse@ilyaz.org> wrote in message 
> news:gkupi0$m2m$1@agate.berkeley.edu...
>
>> $gras x= 1024 * 1024 * 10000;
>
> From which Perl version does this x= operator work (5.8 or 5.10)?

At least 5.005_03:

% /usr/bin/perl -le '$x = "a"; $x x= 5; print $x'
aaaaa
% /usr/bin/perl -V                               
Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.2.5-22smp, archname=i386-linux
    uname='linux porky.devel.redhat.com 2.2.5-22smp #1 smp wed jun 2 09:11:51 edt 1999 i686 unknown '

	hp


------------------------------

Date: Sun, 18 Jan 2009 10:45:54 -0800
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: fastest way to allocate memory ?
Message-Id: <4973797e$0$25707$ed362ca5@nr5c.newsreader.com>

Ilya Zakharevich wrote:
> [A complimentary Cc of this posting was sent to
> georg.heiss@gmx.de
> <georg.heiss@gmx.de>], who wrote in article <bbace9ea-54e6-4b7e-8b89-4e81e89d6a60@b38g2000prf.googlegroups.com>:
>> Hi, i try to allocate 10GB of memory on my box and it takes about 27
>> seconds.
> 
> You are allocating 30000MiB, not 10GB (and not 20GB, as somebody
> mentioned) + malloc() overhead.

I'm the one who mentioned 20GB.  And I didn't just pull that number out 
of my butt; it was verified using system monitoring tools.

>> my $gras = "A" x (1024 * 1024 * 10000);
> 
> RHS is constant, thus computed at compile time ==> 10000MiB.
> LHS takes another 10000MiB.
> 
>> my $needle = "B";
>> print "\nAllocated " . length($gras) . " byte buffer\n";
>> $gras = $gras.$needle;
> 
> RHS takes another 10000MiB.

It seems that, at least as of 5.8.8, this construct is optimized
to be about the same as $gras.=$needle, and as long as there is room to 
grow $gras in-place it does not need to be copied.


Xho


------------------------------

Date: Sun, 18 Jan 2009 08:13:58 -0600
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: Parsing out text from in between HTML tags
Message-Id: <slrngn6e96.2ot.tadmc@tadmc30.sbcglobal.net>

tgwaltz@googlemail.com <tgwaltz@googlemail.com> wrote:


> I'm new to perl and am having a tough time trying to complete a
> theoretically simple statement.  


What you want to do (parse a context-free language) is not
as simple as it seems. It is, in fact, pretty darn complex.


> What I'm trying to do is write a very
> simple search engine that searches an html file for a given
> searchQuery.  The way it's set up now is that if the searchQuery is
> something like "java," every single page is a hit because the word
> "javascript" is in the code in the form of the "<script
> language="javascript">" etc.  


Should it match the below, or should it not match the below?

   <p>You can use <strong>javascript</strong> for client-side programming</p>

If it should not match, then you probably want word-boundaries (\b) in 
your pattern.


> I want to specify that $searchQuery
> should be surrounded like so:
>
> ">(anything)searchQuery(anything)<"


If $searchQuery = 'HTML tags' then should it match or not match the below?

   <p><acronym title="HyperText Markup Language">HTML</acronym> 
   tags have angle-brackets</p>

If it should match, then "anything" above does not really mean anything...

"HTML tags", "HTML&nbsp;tags" and "HTML\ntags" should probably all match...


> In other words, the searchQuery has to be in between two HTML tags.
                                               ^^^^^^^^^^^^^^^^^^^^^

That too is over-simplified.

    <p>It is spelled ja<strong>v</strong>a, not "jabba"</p>


> Here's what I have at this point (the wrong way):
>
> return unless ($fileName =~ /\Q$searchQuery\E/i);
                      ^^^^

Do you want to search the name or search the content?

If you want to search the content, then you have chosen an extremely
poor name for your variable...

Once you have culled the data to only its content (ie. removed all markup),
and normalized it (eg. folded whitespace) then you probably want something like:

   ... $file_content =~ /\b\Q$searchQuery\E\b/i ...


> Any help would be greatly appreciated!


Use a module that understands HTML for processing HTML data.

    perldoc -q "remove HTML"

suggests a couple of modules that can help you (and there are many others as well).
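
A minimal sketch of the normalize-then-match step only.  It assumes the
markup has already been removed by an HTML-aware module; the helper name
content_matches and the handling of just the &nbsp; entity are
simplifications made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: $text is assumed to be markup-free already.
sub content_matches {
    my ($text, $query) = @_;
    $text =~ s/&nbsp;/ /g;    # treat non-breaking-space entities as spaces
    $text =~ s/\s+/ /g;       # fold runs of whitespace, newlines included
    return $text =~ /\b\Q$query\E\b/i ? 1 : 0;
}

print content_matches("HTML\ntags have angle-brackets", "HTML tags")
    ? "match\n" : "no match\n";
print content_matches("You can use javascript here", "java")
    ? "match\n" : "no match\n";
```

The \b anchors are what keep "java" from matching inside "javascript".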


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Mon, 19 Jan 2009 08:44:04 +1300
From: "Tintin@teranews.com" <Tintin@teranews.com>
Subject: Re: unable to open file
Message-Id: <eELcl.49075$Jy.42296@newsfe06.iad>

Chris Mattern wrote:
> On 2009-01-16, Tintin@teranews.com <Tintin@teranews.com> wrote:
> 
> <snip>
> 
>> Tad J McClellan wrote:
>>> FindBin tells you where the program is.
>>>
>>> Where the program is does not matter with regard to relative paths.
>> It does in relation to the OP's question.  It is quite common to 
>> reference configuration files or similar from a Perl/CGI script in 
>> relation to the location where the script exists.
> 
> No, it isn't.  That's Tad's point.  Relative pathnames are relative to your
> current working directory.  Period.  It is pointless to speculate how your
> CWD may relate to where your script is located when you can just find out 
> where the CWD is, and also set it to where you want it to be.
> 

I don't want to flog a dead horse (or camel) here, but I'll give you an 
example of where knowing the script location rather than CWD is useful.

Say you have the dir structure

/var/www/cgi-bin
/var/www/config

and in your Perl/CGI script, you have

require '../config/file.cfg';

Now this path is reliant on the location of the script in the cgi-bin 
directory, which may or may not be CWD.


------------------------------

Date: Sun, 18 Jan 2009 14:17:14 -0600
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: unable to open file
Message-Id: <slrngn73ia.78c.tadmc@tadmc30.sbcglobal.net>

Tintin@teranews.com <Tintin@teranews.com> wrote:
> Chris Mattern wrote:
>> On 2009-01-16, Tintin@teranews.com <Tintin@teranews.com> wrote:
>> 
>> <snip>
>> 
>>> Tad J McClellan wrote:
>>>> FindBin tells you where the program is.
>>>>
>>>> Where the program is does not matter with regard to relative paths.
>>> It does in relation to the OP's question.  It is quite common to 
>>> reference configuration files or similar from a Perl/CGI script in 
>>> relation to the location where the script exists.
>> 
>> No, it isn't.  That's Tad's point.  Relative pathnames are relative to your
>> current working directory.  Period.  It is pointless to speculate how your
>> CWD may relate to where your script is located when you can just find out 
>> where the CWD is, and also set it to where you want it to be.
>> 
>
> I don't want to flog a dead horse (or camel) here, 


I will do you the favor of continuing to flog it until you get the point.


> but I'll give you an 
> example of where knowing the script location rather than CWD is useful.


Your example does NOT show where knowing the script location 
rather than CWD is useful...

It shows where knowing the script location is useful so
that you can make the cwd be what you need it to be.


> Say you have the dir structure
>
> /var/www/cgi-bin
> /var/www/config
>
> and in your Perl/CGI script, you have
>
> require '../config/file.cfg';


If your cwd is '/tmp', then that require will FAIL!


> Now this path is reliant on the location of the script in the cgi-bin 


No it is not.

Relative paths are always relative to your cwd.

Relative paths are never relative to the location of the program.


> directory, which may or may not be CWD.


If it is the cwd, then the require will succeed.

If it is not the cwd, then the require will fail.

What FindBin *is* useful for is getting a value to feed to chdir()
so that the situation you describe will succeed.

    use FindBin qw($Bin);
    chdir $Bin or die "could not cd to '$Bin': $!";
    require '../config/file.cfg';

Now it will work even if the cwd started out at '/tmp'.
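
A self-contained sketch that shows the same relative path being found or
not depending only on the cwd.  The temporary directory tree stands in
for /var/www, and the "other/sub" directory is a made-up stand-in for
some unrelated cwd.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Temporary stand-in for /var/www with cgi-bin and config directories.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/$_" or die "mkdir $_: $!" for qw(cgi-bin config other other/sub);

open my $fh, '>', "$root/config/file.cfg" or die "open: $!";
print {$fh} "1;\n";
close $fh or die "close: $!";

my $rel = '../config/file.cfg';

# From an unrelated cwd the relative path points somewhere else entirely.
chdir "$root/other/sub" or die "chdir: $!";
my $found_elsewhere = -e $rel ? 1 : 0;

# From cgi-bin the very same string resolves to the config file.
chdir "$root/cgi-bin" or die "chdir: $!";
my $found_from_cgi = -e $rel ? 1 : 0;

print "from other/sub: $found_elsewhere, from cgi-bin: $found_from_cgi\n";

# Leave the temporary tree so that it can be cleaned up at exit.
chdir File::Spec->rootdir or die "chdir: $!";
```

The path string never changes between the two tests; only the cwd does.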


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Sun, 18 Jan 2009 21:33:26 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: unable to open file
Message-Id: <slrngn74gm.9n0.hjp-usenet2@hrunkner.hjp.at>

On 2009-01-18 19:44, Tintin@teranews.com <Tintin@teranews.com> wrote:
> I don't want to flog a dead horse (or camel) here, but I'll give you an 
> example of where knowing the script location rather than CWD is useful.
>
> Say you have the dir structure
>
> /var/www/cgi-bin
> /var/www/config
>
> and in your Perl/CGI script, you have
>
> require '../config/file.cfg';
>
> Now this path is reliant on the location of the script in the cgi-bin 
> directory, which may or may not be CWD.

No.

	hp


------------------------------

Date: Sun, 18 Jan 2009 13:11:23 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: What do you need to have to be considered a Master at Perl?
Message-Id: <slrngn673b.55m.hjp-usenet2@hrunkner.hjp.at>

On 2009-01-17 22:29, ~greg <g_m@remove-comcast.net> wrote:
> Jürgen Exner > ...
>> Sherm Pendley > ..
>>>Someone who understands the Chomsky hierarchy,
>>> and why not every context-free language can be
>>> described with a regular grammar.
>>
>> :-))
>>
>> YMMD
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~
> I looked it up:
>
> Acronym     Definition
>
> YMMD       You Made My Day
> or
> YMMD       Your Mileage May Differ

The latter is rare. It is normally written

  YMMV       Your Mileage May Vary

So you can assume that the former was meant.

> However, whether "recursive regular expressions" is an oxymoron,
> or whether perl forces us to expand the definition of "regular expression",
> isn't the sort of terminological question that could ever bother me.

But as a Perl programmer, you should be aware that Perl regexps are not
regular expressions in the mathematical sense.
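
A small illustration, assuming Perl 5.10 or later for the recursive
group (?1): the pattern below recognizes a^n b^n, which is context-free
but not regular, so no regular expression in the mathematical sense
could match it.

```perl
use strict;
use warnings;

# (?1) re-enters capture group 1, so the pattern nests to any depth.
my $anbn = qr/^(a(?1)?b)$/;

for my $s (qw(ab aabb aaabbb aab abb)) {
    printf "%-7s %s\n", $s, $s =~ $anbn ? "matches" : "does not match";
}
```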

	hp



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2138
***************************************

