[29457] in Perl-Users-Digest
Perl-Users Digest, Issue: 701 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jul 30 21:14:24 2007
Date: Mon, 30 Jul 2007 18:14:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 30 Jul 2007 Volume: 11 Number: 701
Today's topics:
Using DBI, better option than importing into @array <jwcarlton@gmail.com>
Re: Using DBI, better option than importing into @array xhoster@gmail.com
Re: Using DBI, better option than importing into @array usenet@DavidFilmer.com
Re: Using DBI, better option than importing into @array <glex_no-spam@qwest-spam-no.invalid>
Re: Using DBI, better option than importing into @array <noreply@gunnar.cc>
Re: Using DBI, better option than importing into @array <jwcarlton@gmail.com>
Re: Using DBI, better option than importing into @array <jwcarlton@gmail.com>
Re: Using DBI, better option than importing into @array <jwcarlton@gmail.com>
Re: Using DBI, better option than importing into @array <tzz@lifelogs.com>
Re: Using SQLServer Stored Procedure returns Data with <tadmc@seesig.invalid>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Mon, 30 Jul 2007 21:30:36 -0000
From: Jason <jwcarlton@gmail.com>
Subject: Using DBI, better option than importing into @array
Message-Id: <1185831036.355199.126500@w3g2000hsg.googlegroups.com>
I've posted a few times now that I'm rebuilding a message board
program to use MySQL instead of flat text files. I'm relatively new to
MySQL, though, so I'm having fun with the challenges along the way.
The database has 2 tables: one to hold subjects, and one to hold all
of the posts. The subjects table has around 17000 rows, while the
posts table has around 600,000.
The most current problem is SPEED! From sheer lack of knowledge, I'm
importing the subjects table into an array in the beginning of the
script, then using a for loop throughout the program to access that
array. But the program is running pretty slow, and I'm sure that the
bottleneck is with this array.
Here is the code that I'm using:
# Push subjects into Perl array
my $filelist = $dbh->selectall_arrayref("SELECT `id`, `lastmodified`,
`subject` FROM $forum_subjects ORDER BY lastmodified DESC");
my @filenames;
for my $row (@$filelist) {
my ($id, $lastmodified, $subject) = @$row;
push(@filenames, $id . "|:|" . $lastmodified . "|:|" . $subject);
}
# In the "view subject" section, loop through last 20 indexes of
@filenames
# 0 and 20 are dynamic in the real script
for ($count=0; $count < 20; $count++) {
($id, $lastmodified, $subject) = split(/\|:\|/, $filenames[$count]);
my $topiclist = $dbh->selectall_arrayref("SELECT `id`, `subject`,
`postdate`, `username`, `email`, `comment` FROM $forum_posts WHERE
id=" . $dbh->quote($id) . " ORDER BY postdate ASC");
print ...
}
I know that this has got to be the most inefficient method possible,
but I haven't found a better way. Is there a faster method to get the
information I want than this?
TIA,
Jason
------------------------------
Date: 30 Jul 2007 22:25:46 GMT
From: xhoster@gmail.com
Subject: Re: Using DBI, better option than importing into @array
Message-Id: <20070730182549.467$qp@newsreader.com>
Jason <jwcarlton@gmail.com> wrote:
> I've posted a few times now that I'm rebuilding a message board
> program to use MySQL instead of flat text files. I'm relatively new to
> MySQL, though, so I'm having fun with the challenges along the way.
>
> The database has 2 tables: one to hold subjects, and one to hold all
> of the posts. The subjects table has around 17000 rows, while the
> posts table has around 600,000.
>
> The most current problem is SPEED! From sheer lack of knowledge, I'm
> importing the subjects table into an array in the beginning of the
> script, then using a for loop throughout the program to access that
> array. But the program is running pretty slow, and I'm sure that the
> bottleneck is with this array.
>
> Here is the code that I'm using:
>
> # Push subjects into Perl array
> my $filelist = $dbh->selectall_arrayref("SELECT `id`, `lastmodified`,
> `subject` FROM $forum_subjects ORDER BY lastmodified DESC");
>
> my @filenames;
> for my $row (@$filelist) {
> my ($id, $lastmodified, $subject) = @$row;
> push(@filenames, $id . "|:|" . $lastmodified . "|:|" . $subject);
Why join the data with |:| just to later use split on it? Unless you are
severally strapped for memory, that is probably not a good thing to do.
> }
>
> # In the "view subject" section, loop through last 20 indexes of
> @filenames
> # 0 and 20 are dynamic in the real script
You could use the "limit" keyword (mysql specific) to retrieve only
the topics you are interested in, rather than returning all of them
and then using only a subset. I don't know how much, if any, of a speed
improvement this would make.
> for ($count=0; $count < 20; $count++) {
> ($id, $lastmodified, $subject) = split(/\|:\|/, $filenames[$count]);
>
> my $topiclist = $dbh->selectall_arrayref("SELECT `id`, `subject`,
> `postdate`, `username`, `email`, `comment` FROM $forum_posts WHERE
> id=" . $dbh->quote($id) . " ORDER BY postdate ASC");
As a side note, it is generally better to use placeholders than to use
dbh->quote.
Is there an index on the id column of $forum_posts? If not, then MySQL has
to read through all 600,000 rows, for each of the 21 topics. I'm guessing
that this will make MySQL, not Perl, the bottleneck.
Xho
--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
------------------------------
Date: Mon, 30 Jul 2007 22:26:42 -0000
From: usenet@DavidFilmer.com
Subject: Re: Using DBI, better option than importing into @array
Message-Id: <1185834402.341938.30100@m37g2000prh.googlegroups.com>
On Jul 30, 2:30 pm, Jason <jwcarl...@gmail.com> wrote:
> The most current problem is SPEED!
Yeah, I imagine so. The killer is that you are submitting a full
selectall inside a loop. You should generally NEVER do that. By
constructing a different SQL statement each time (because you hard-
code the id in the SQL) it means the statement will never be found in
your database server's cache, so the server must compute an execution
plan over and over again. Instead, you should always use the prepare
method (outside of the loop) and then an execute (inside the loop).
The array is almost surely also a problem, but I'm not fully
understanding your intent. It would be very helpful to see a few
sample rows (or mock-up rows) from each table. I think you should be
doing a table straight join to avoid the need to pre-load the array,
but it would be far easier to make a recommendation if I saw some
sample data and you gave a specific example of what you want to do
with this data.
On other points related to style (and having nothing to do with
whether these points should be used in your instance, because I don't
think they should):
> for ($count=0; $count < 20; $count++) {
> ($id, $lastmodified, $subject) = split(/\|:\|/, $filenames[$count]);
It's more Perlish to do a simple iterative loop like this:
for (0..20) { #Perl ain't C!
But your intent seems to be just to do an array slice. If you want to
take an array slice, don't create a counter and increment it; just
slice the array directly, such as:
foreach my $line_of_the_array ( @array[0..20] ) {
Oh, and, also:
> $dbh->quote($id)
If you are sure your $id cannot be tainted (such as it has an integer
type imposed by the database) then you don't need to quote it (and
doing so further slows you down).
Show me some sample rows and I'll respond with some more specific
info...
--
The best way to get a good answer is to ask a good question.
David Filmer (http://DavidFilmer.com)
------------------------------
Date: Mon, 30 Jul 2007 17:33:04 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: Using DBI, better option than importing into @array
Message-Id: <46ae6721$0$3567$815e3792@news.qwest.net>
Jason wrote:
> The most current problem is SPEED! From sheer lack of knowledge, I'm
> importing the subjects table into an array in the beginning of the
> script, then using a for loop throughout the program to access that
> array. But the program is running pretty slow, and I'm sure that the
> bottleneck is with this array.
You probably want to look at the SQL 'limit' command. Fetching 17000
rows is a complete waste of time because there's no way you're going
to be able to efficiently display that much data.
>
> Here is the code that I'm using:
>
> # Push subjects into Perl array
> my $filelist = $dbh->selectall_arrayref("SELECT `id`, `lastmodified`,
> `subject` FROM $forum_subjects ORDER BY lastmodified DESC");
>
> my @filenames;
> for my $row (@$filelist) {
> my ($id, $lastmodified, $subject) = @$row;
> push(@filenames, $id . "|:|" . $lastmodified . "|:|" . $subject);
> }
No need to do that. You have $filelist, iterate through it below,
instead of creating @filenames.
>
> # In the "view subject" section, loop through last 20 indexes of
> @filenames
> # 0 and 20 are dynamic in the real script
No, use 'limit' and 'order by' to get the last 20 'id' fields.
> for ($count=0; $count < 20; $count++) {
> ($id, $lastmodified, $subject) = split(/\|:\|/, $filenames[$count]);
>
> my $topiclist = $dbh->selectall_arrayref("SELECT `id`, `subject`,
> `postdate`, `username`, `email`, `comment` FROM $forum_posts WHERE
> id=" . $dbh->quote($id) . " ORDER BY postdate ASC");
Look into placeholders and bind_columns and you don't need any of
those '`'.
my $sql = qq{ SELECT id, subject, postdate, username, email, comment
FROM $forum_posts
WHERE id=?
ORDER BY postdate ASC};
You could probably do it all in one query. There are plenty of
sites that discuss using DBI and many on MySQL, give those
a try too.
------------------------------
Date: Tue, 31 Jul 2007 00:37:42 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Using DBI, better option than importing into @array
Message-Id: <5h77o4F3hpa1vU1@mid.individual.net>
Jason wrote:
> The most current problem is SPEED! From sheer lack of knowledge, I'm
> importing the subjects table into an array in the beginning of the
> script, then using a for loop throughout the program to access that
> array. But the program is running pretty slow, and I'm sure that the
> bottleneck is with this array.
How do you know that? Anyway, it seems to me that you can easily skip
that array.
> my $filelist = $dbh->selectall_arrayref("SELECT `id`, `lastmodified`,
> `subject` FROM $forum_subjects ORDER BY lastmodified DESC");
for ( 0 .. 20 ) {
my $topiclist = ... id=" . $dbh->quote( $filelist->[$_][0] ) . " ...
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Mon, 30 Jul 2007 23:16:41 -0000
From: Jason <jwcarlton@gmail.com>
Subject: Re: Using DBI, better option than importing into @array
Message-Id: <1185837401.958440.267280@57g2000hsv.googlegroups.com>
> Why join the data with |:| just to later use split on it? Unless you are
> severally strapped for memory, that is probably not a good thing to do.
To be quite frank, the only reason I'm doing that is because I don't
know a better way. I know that it's sloppy, but since the
selectall_arrayref is creating a dynamically named array I couldn't
figure out how to pick out the right thing later.
LIMIT is probably what I need there, though.
> As a side note, it is generally better to use placeholders than to use
> dbh->quote.
I didn't realize that, I was just looking at overall character size.
I'll giving the ? a shot, though.
> Is there an index on the id column of $forum_posts? If not, then MySQL has
> to read through all 600,000 rows, for each of the 21 topics. I'm guessing
> that this will make MySQL, not Perl, the bottleneck.
No, but I don't quite understand how I can use an index to help me on
this one. I posted that question to a MySQL ng earlier today, though;
I know it's not really appropriate for CLPM.
Thanks for the tips.
------------------------------
Date: Mon, 30 Jul 2007 23:34:44 -0000
From: Jason <jwcarlton@gmail.com>
Subject: Re: Using DBI, better option than importing into @array
Message-Id: <1185838484.562454.41110@22g2000hsm.googlegroups.com>
> Yeah, I imagine so. The killer is that you are submitting a full
> selectall inside a loop. You should generally NEVER do that. By
> constructing a different SQL statement each time (because you hard-
> code the id in the SQL) it means the statement will never be found in
> your database server's cache, so the server must compute an execution
> plan over and over again. Instead, you should always use the prepare
> method (outside of the loop) and then an execute (inside the loop).
OK, I'll give that a shot.
> The array is almost surely also a problem, but I'm not fully
> understanding your intent. It would be very helpful to see a few
> sample rows (or mock-up rows) from each table. I think you should be
> doing a table straight join to avoid the need to pre-load the array,
> but it would be far easier to make a recommendation if I saw some
> sample data and you gave a specific example of what you want to do
> with this data.
It's really a simple message board, just with a lot of rows. So the
subjects table is
Categories: id - lastmodified - subject
Actual row: 17090 - 20070730192222 - This is a test
The posts table would be like:
Categories: id - subject - postdate - username - email - comment
Actual row: 17090 - This is a test - 20070730192222 - Jason -
jwcarlton@gmail.com - This is a test comment.
In retrospect, I can't quite remember why I'm duplicating the
information into the subjects table; I think it was at the suggestion
of someone in another NG. I'm sure that the goal was to make the "view
subjects" section load faster by reading 17,000 rows instead of
600,000.
Would it be better to get rid of the "subjects" table altogether, and
just create an index with the ID field?
> On other points related to style (and having nothing to do with
> whether these points should be used in your instance, because I don't
> think they should):
>
> > for ($count=0; $count < 20; $count++) {
> > ($id, $lastmodified, $subject) = split(/\|:\|/, $filenames[$count]);
>
> It's more Perlish to do a simple iterative loop like this:
>
> for (0..20) { #Perl ain't C!
>
> But your intent seems to be just to do an array slice. If you want to
> take an array slice, don't create a counter and increment it; just
> slice the array directly, such as:
>
> foreach my $line_of_the_array ( @array[0..20] ) {
I guess my background is showing through, isn't it? LOL I'm using the
counter for a few other things, too, though; in this case, I use it to
determine the background color of the table row (it alternates based
on whether $count is divisible by 2).
I don't think that I've seen your foreach version before, though. I'll
have to remember that for future reference.
> Oh, and, also:
>
> > $dbh->quote($id)
>
> If you are sure your $id cannot be tainted (such as it has an integer
> type imposed by the database) then you don't need to quote it (and
> doing so further slows you down).
This confused me, too. When I didn't use quote(), it gave me an error;
something along the lines of "$id cannot be null." But it in the
database as a number, every time, so I couldn't understand why it did
that. Using quote() solved the problem, but I don't know why.
- J
------------------------------
Date: Mon, 30 Jul 2007 23:37:16 -0000
From: Jason <jwcarlton@gmail.com>
Subject: Re: Using DBI, better option than importing into @array
Message-Id: <1185838636.859966.142540@l70g2000hse.googlegroups.com>
> > I'm sure that the
> > bottleneck is with this array.
>
> How do you know that? Anyway, it seems to me that you can easily skip
> that array.
I don't actually know it, but it's my best guess. When the program
used flat text files, it had gotten pretty slow, too, and the only
thing that the 2 scripts have in common is this large array.
> for ( 0 .. 20 ) {
> my $topiclist = ... id=" . $dbh->quote( $filelist->[$_][0] ) . " ...
Thanks, Gunnar,
Jason
------------------------------
Date: Mon, 30 Jul 2007 20:00:34 -0400
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Using DBI, better option than importing into @array
Message-Id: <m2sl75l9wd.fsf@lifelogs.com>
On Mon, 30 Jul 2007 21:30:36 -0000 Jason <jwcarlton@gmail.com> wrote:
J> # Push subjects into Perl array
J> my $filelist = $dbh->selectall_arrayref("SELECT `id`, `lastmodified`,
J> `subject` FROM $forum_subjects ORDER BY lastmodified DESC");
...
J> # In the "view subject" section, loop through last 20 indexes of
J> @filenames
J> # 0 and 20 are dynamic in the real script
J> for ($count=0; $count < 20; $count++) {
J> ($id, $lastmodified, $subject) = split(/\|:\|/, $filenames[$count]);
J> my $topiclist = $dbh->selectall_arrayref("SELECT `id`, `subject`,
J> `postdate`, `username`, `email`, `comment` FROM $forum_posts WHERE
J> id=" . $dbh->quote($id) . " ORDER BY postdate ASC");
J> print ...
J> }
J> I know that this has got to be the most inefficient method possible,
J> but I haven't found a better way. Is there a faster method to get the
J> information I want than this?
First of all, you may want to use sprintf() to make your SQL strings
clearer. Also, don't use $dbh->quote() only sometimes, use it all the
time.
Second, learn SQL better and you'll be able to write the exact query
that will return the last 20 items with the detail you need. It's not a
Perl problem that your approach is slow. I won't write the query for
you, because you should look it up, and it may make sense for you to use
the MySQL-specific version as opposed to the standard SQL version.
Last but most important, use Rose::DB::Object or Class::DBI (both on
CPAN) to automate your database work. You'll spend 2 hours learning how
to set up a Rose::DB connection, then you'll save weeks of your time
using all the functionality that RDBO provides for you. I like
Class::DBI too, but RDBO is IMHO much better supported and designed, so
I reccomend it.
Ted
------------------------------
Date: Mon, 30 Jul 2007 17:55:59 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Using SQLServer Stored Procedure returns Data with DBI...
Message-Id: <slrnfasr3v.bfi.tadmc@tadmc30.sbcglobal.net>
sharemania@gmail.com <sharemania@gmail.com> wrote:
> I am trying
Show us what you tried.
> to use stored Procedure in SQL Server with the DBi module,
> and I am not getting the data that have been returned.
>
> How should I do it?
Some way other than the way you tried.
But since we don't know what way you tried, we cannot help
you fix it...
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 701
**************************************