
Perl-Users Digest, Issue: 1439 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Apr 15 15:36:30 2008

Date: Tue, 15 Apr 2008 12:35:58 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 15 Apr 2008     Volume: 11 Number: 1439

Today's topics:
        design strategy on handling large DB <ela@yantai.org>
    Re: design strategy on handling large DB <tzz@lifelogs.com>
    Re: design strategy on handling large DB <ela@yantai.org>
    Re: design strategy on handling large DB <RedGrittyBrick@SpamWeary.foo>
    Re: design strategy on handling large DB xhoster@gmail.com
    Re: design strategy on handling large DB <tzz@lifelogs.com>
    Re: design strategy on handling large DB <ela@yantai.org>
    Re: design strategy on handling large DB <hjp-usenet2@hjp.at>
    Re: design strategy on handling large DB xhoster@gmail.com
    Re: design strategy on handling large DB <hjp-usenet2@hjp.at>
    Re: design strategy on handling large DB <szrRE@szromanMO.comVE>
    Re: design strategy on handling large DB <hjp-usenet2@hjp.at>
    Re: design strategy on handling large DB <simon.chao@fmr.com>
    Re: design strategy on handling large DB <simon.chao@fmr.com>
    Re: design strategy on handling large DB <tzz@lifelogs.com>
    Re: design strategy on handling large DB <cartercc@gmail.com>
    Re: design strategy on handling large DB <yankeeinexile@gmail.com>
    Re: design strategy on handling large DB <simon.chao@fmr.com>
    Re: design strategy on handling large DB xhoster@gmail.com
    Re: design strategy on handling large DB xhoster@gmail.com
    Re: design strategy on handling large DB <simon.chao@fmr.com>
    Re: design strategy on handling large DB <pgodfrin@gmail.com>
    Re: design strategy on handling large DB <szrRE@szromanMO.comVE>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 11 Apr 2008 20:46:13 +0800
From: "Ela" <ela@yantai.org>
Subject: design strategy on handling large DB
Message-Id: <ftnmio$7ns$1@ijustice.itsc.cuhk.edu.hk>

Apart from using mysql or related programs, how can Perl be used to deal with 
large databases? Furthermore, how can I load that large DB into cache in 
advance and later use another Perl program to access/search that DB? 




------------------------------

Date: Fri, 11 Apr 2008 09:31:50 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: design strategy on handling large DB
Message-Id: <863apsv4x5.fsf@lifelogs.com>

On Fri, 11 Apr 2008 20:46:13 +0800 "Ela" <ela@yantai.org> wrote: 

E> except in using mysql or related programs, how can perl be used to deal with 
E> large databases? 

In order (you may already know some of these):

1) learn SQL and at least some RDBMS design theory (select+join, update,
   delete, indexing, foreign keys, normalization, data types, etc.)

2) read the DBI documentation

3) try Rose::DB::Object or DBIx::Class to see if they will work for you
instead of direct DBI queries.

I hope by "large" you mean at most a hundred gigabytes of data (which is
considered a mid-size database by most people).  Large databases are
usually in the hundreds of gigabytes - terabytes range and very
different beasts.

E> Furthermore, how can I load that large db into cache in advance and
E> later use another perl to access/search that db?

I don't understand exactly what you want; there are 3 possibilities I
can think of and I'd rather not guess.  Can you please explain what you
want to accomplish in more detail?

Ted


------------------------------

Date: Fri, 11 Apr 2008 23:03:20 +0800
From: "Ela" <ela@yantai.org>
Subject: Re: design strategy on handling large DB
Message-Id: <ftnujs$b70$1@ijustice.itsc.cuhk.edu.hk>

> I hope by "large" you mean at most a hundred gigabytes of data (which is
> considered a mid-size database by most people).  Large databases are
> usually in the hundreds of gigabytes - terabytes range and very
> different beasts.

It's only 3GB... but I find when I run the current java program with the 
mysql, it uses up all the 2GB memory...

>
> E> Furthermore, how can I load that large db into cache in advance and
> E> later use another perl to access/search that db?
>
> I don't understand exactly what you want; there are 3 possibilities I
> can think of and I'd rather not guess.  Can you please explain what you
> want to accomplish in more detail?
>
> Ted

In MySQL, the db is already structured and therefore a runtime prog can access 
the structured DB. But in Perl, I don't know whether I can put the DB flat 
file into some data structures in advance, so users can query the DB 
whenever they want instead of dynamically creating the DB every time. 




------------------------------

Date: Fri, 11 Apr 2008 16:40:29 +0100
From: RedGrittyBrick <RedGrittyBrick@SpamWeary.foo>
Subject: Re: design strategy on handling large DB
Message-Id: <47ff866d$0$32050$da0feed9@news.zen.co.uk>

Ela wrote:
> 
> It's only 3GB... but I find when I run the current java program with the 
> mysql, it uses up all the 2GB memory...

3 is usually greater than 2. Unless the DBMS is very space-inefficient 
at storing data and the Java program is very space-efficient at storing 
data.


>> E> Furthermore, how can I load that large db into cache in advance 

cache usually means memory doesn't it?


There must be something I haven't understood that explains how 3 GB of data 
can be expected to fit into less than 2 GB of memory.

-- 
RGB


------------------------------

Date: 11 Apr 2008 17:55:08 GMT
From: xhoster@gmail.com
Subject: Re: design strategy on handling large DB
Message-Id: <20080411135510.515$Aq@newsreader.com>

"Ela" <ela@yantai.org> wrote:
> except in using mysql or related programs, how can perl be used to deal
> with large databases?

Many ways.  Flat files which you write your own access to, flat
files used with some DBD module to access, special formats like DBM::Deep,
etc.
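For instance, a tied DBM hash keeps the structure on disk, so a second Perl process can open it later without rebuilding anything. A sketch using the core SDBM_File module (the file name and record are made up):

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;

# Writer: tie a hash to an on-disk DBM file (file name is illustrative).
tie my %db, 'SDBM_File', 'flatdb', O_RDWR | O_CREAT, 0666
    or die "cannot tie: $!";
$db{chr1} = 'some record';
untie %db;

# Reader -- could be a completely separate Perl process started later:
tie my %db2, 'SDBM_File', 'flatdb', O_RDONLY, 0666
    or die "cannot tie: $!";
print $db2{chr1}, "\n";
untie %db2;
```

Caveat: SDBM_File limits each key/value pair to roughly 1 kB; DB_File or DBM::Deep handle larger records.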

> Furthermore, how can I load that large db into
> cache in advance

To my mind, a DB that can be loaded into cache is by definition not large.

> and later use another perl to access/search that db?

I don't know of any way of reliably doing this that is less annoying
than just running mysql or related programs.  After all, that is what they
are designed for.

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Fri, 11 Apr 2008 13:12:23 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: design strategy on handling large DB
Message-Id: <86r6dcs1ko.fsf@lifelogs.com>

On Fri, 11 Apr 2008 23:03:20 +0800 "Ela" <ela@yantai.org> wrote: 

>> I hope by "large" you mean at most a hundred gigabytes of data (which is
>> considered a mid-size database by most people).  Large databases are
>> usually in the hundreds of gigabytes - terabytes range and very
>> different beasts.

E> It's only 3GB... but I find when I run the current java program with the 
E> mysql, it uses up all the 2GB memory...

You are loading all your data into memory.  That's almost certainly the
wrong approach.  Do you need all the data?  What does the data look
like?  What operations are you doing across the data?  SQL has almost
every operation covered, and it's rare indeed that you will find it
inadequate for managing structured data.

E> Furthermore, how can I load that large db into cache in advance and
E> later use another perl to access/search that db?
>> 
>> I don't understand exactly what you want; there are 3 possibilities I
>> can think of and I'd rather not guess.  Can you please explain what you
>> want to accomplish in more detail?

E> In MySQL, the db is already structured and therefore a runtime prog can access 
E> the structured DB. But in Perl, I don't know whether I can put the DB flat 
E> file into some data structures in advance, so users can query the DB 
E> whenever they want instead of dynamically creating the DB every time. 

I still don't understand why Perl can't query the database directly, but
you have to answer the questions above before I or anyone else can
suggest the next step without guessing.

Ted


------------------------------

Date: Sat, 12 Apr 2008 08:57:19 +0800
From: "Ela" <ela@yantai.org>
Subject: Re: design strategy on handling large DB
Message-Id: <ftp1df$r49$1@ijustice.itsc.cuhk.edu.hk>


"Ted Zlatanov" <tzz@lifelogs.com> wrote in message 
news:86r6dcs1ko.fsf@lifelogs.com...
> On Fri, 11 Apr 2008 23:03:20 +0800 "Ela" <ela@yantai.org> wrote:
>
> You are loading all your data into memory.  That's almost certainly the
> wrong approach.  Do you need all the data?  What does the data look
> like?  What operations are you doing across the data?  SQL has almost
> every operation covered, and it's rare indeed that you will find it
> inadequate for managing structured data.


Do you think the bottleneck according to the following statistics is from 
mysqld?

 08:56:56  up 11:45,  5 users,  load average: 1.00, 1.00, 1.00
84 processes: 82 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   47.5%    0.0%    2.6%   0.0%     0.0%    0.0%   49.8%
           cpu00   95.1%    0.0%    4.8%   0.0%     0.0%    0.0%    0.0%
           cpu01    0.0%    0.0%    0.4%   0.0%     0.0%    0.0%   99.6%
Mem:  2061424k av, 2043432k used,   17992k free,       0k shrd,   17308k 
buff
                    515388k actv,      12k in_d,   37452k in_c
Swap: 2040212k av,       0k used, 2040212k free                 1278708k 
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 3825 ela    25   0 20732  20M  3024 R    49.7  1.0 550:05   0 mysqld
 3815 ela      24   0 60772  59M 10428 S     0.2  2.9   1:11   0 java
 4373 ela      15   0  1044 1044   808 R     0.2  0.0   0:00   1 top





------------------------------

Date: Sat, 12 Apr 2008 18:18:59 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: design strategy on handling large DB
Message-Id: <slrng01o7j.1to.hjp-usenet2@hrunkner.hjp.at>

On 2008-04-12 00:57, Ela <ela@yantai.org> wrote:
> Do you think the bottleneck according to the following statistics is from 
> mysqld?

[top output snipped]

The programmer.

	hp


------------------------------

Date: 13 Apr 2008 02:05:06 GMT
From: xhoster@gmail.com
Subject: Re: design strategy on handling large DB
Message-Id: <20080412220510.051$wL@newsreader.com>

"Ela" <ela@yantai.org> wrote:
> > I hope by "large" you mean at most a hundred gigabytes of data (which
> > is considered a mid-size database by most people).  Large databases are
> > usually in the hundreds of gigabytes - terabytes range and very
> > different beasts.
>
> It's only 3GB... but I find when I run the current java program with the
> mysql, it uses up all the 2GB memory...

What uses up all 2GB?  MySQL or Java?  If MySQL, then you have it
poorly configured.  If Java, then you must be loading all the data into
memory at one time, either accidentally or on purpose.  Bad concepts aren't
going to become good ones just by re-writing in a different language.


Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Sun, 13 Apr 2008 13:26:42 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: design strategy on handling large DB
Message-Id: <slrng03rfi.4ek.hjp-usenet2@hrunkner.hjp.at>

On 2008-04-13 02:05, xhoster@gmail.com <xhoster@gmail.com> wrote:
> "Ela" <ela@yantai.org> wrote:
>> It's only 3GB... but I find when I run the current java program with the
>> mysql, it uses up all the 2GB memory...
>
> What uses up all 2GB?  MySQL or Java?  If MySQL, then you have it
> poorly configured.  If Java, then you must be loading all the data into
> memory at one time, either accidentally or on purpose.

I don't know about Java, but in Perl this is easy to do accidentally.

if you do something like 

    my $sth = $dbh->prepare("select * from largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
	process_one_row($r);
    }

You read one row at a time into memory. Right?

Wrong. DBD::mysql by default uses mysql_store_result, so the
$sth->execute will (try to) load the entire table into memory. You need
to tell DBD::mysql to use mysql_use_result instead:

    $dbh->{mysql_use_result} = 1;

But then you cannot issue other queries until you have finished reading
the results ...

> Bad concepts aren't going to become good ones just by re-writing in a
> different language.

Right.

	hp



------------------------------

Date: Sun, 13 Apr 2008 08:28:14 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: design strategy on handling large DB
Message-Id: <ftt8qe01s92@news3.newsguy.com>

Peter J. Holzer wrote:
> On 2008-04-13 02:05, xhoster@gmail.com <xhoster@gmail.com> wrote:
>> "Ela" <ela@yantai.org> wrote:
>>> It's only 3GB... but I find when I run the current java program
>>> with the mysql, it uses up all the 2GB memory...
>>
>> What uses up all 2GB?  MySQL or Java?  If MySQL, then you have it
>> poorly configured.  If Java, then you must be loading all the data
>> into memory at one time, either accidentally or on purpose.
>
> I don't know about Java, but in Perl this is easy to do accidentally.
>
> if you do something like
>
>    my $sth = $dbh->prepare("select * from largetable");
>    $sth->execute();
>
>    while (my $r = $sth->fetchrow_hashref) {
> process_one_row($r);
>    }
>
> You read one row at a time into memory. Right?
>
> Wrong. DBD::mysql by default uses mysql_store_result, so the
> $sth->execute will (try to) load the entire table into memory. You
> need to tell DBD::mysql to use mysql_use_result instead:
>
>    $dbh->{mysql_use_result} = 1;
>
> But then you cannot issue other queries until you have finished
> reading the results ...

Couldn't one get around that by just using multiple DBD objects (with 
mysql_use_result set) ?

-- 
szr 




------------------------------

Date: Sun, 13 Apr 2008 18:46:57 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: design strategy on handling large DB
Message-Id: <slrng04e81.6qk.hjp-usenet2@hrunkner.hjp.at>

On 2008-04-13 15:28, szr <szrRE@szromanMO.comVE> wrote:
> Peter J. Holzer wrote:
>> DBD::mysql by default uses mysql_store_result, so the $sth->execute
>> will (try to) load the entire table into memory. You need to tell
>> DBD::mysql to use mysql_use_result instead:
>>
>>    $dbh->{mysql_use_result} = 1;
>>
>> But then you cannot issue other queries until you have finished
>> reading the results ...
>
> Couldn't one get around that by just using multiple DBD objects (with 
> mysql_use_result set) ?
>

Yes, that would work. But then you have two sessions and have to be
careful with transactions, locking, etc. All solvable, of course, you
just have to be aware of it. 

	hp


------------------------------

Date: Mon, 14 Apr 2008 07:04:18 -0700 (PDT)
From: nolo contendere <simon.chao@fmr.com>
Subject: Re: design strategy on handling large DB
Message-Id: <76bd70f0-5e0c-4584-9340-df9d52ae70cd@e67g2000hsa.googlegroups.com>

On Apr 13, 7:26 am, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:

> if you do something like
>
>     my $sth = $dbh->prepare("select * from largetable");
>     $sth->execute();
>
>     while (my $r = $sth->fetchrow_hashref) {
>         process_one_row($r);
>     }
>
> You read one row at a time into memory. Right?
>
> Wrong. DBD::mysql by default uses mysql_store_result, so the
> $sth->execute will (try to) load the entire table into memory. You need
> to tell DBD::mysql to use mysql_use_result instead:
>
>     $dbh->{mysql_use_result} = 1;
>
> But then you cannot issue other queries until you have finished reading
> the results ...


This is weird. So what would be the difference between any of the
fetch/select<row> methods and the fetch/select<all> methods for mysql
if the default 'mysql_store_result' were used?


------------------------------

Date: Mon, 14 Apr 2008 07:15:28 -0700 (PDT)
From: nolo contendere <simon.chao@fmr.com>
Subject: Re: design strategy on handling large DB
Message-Id: <63983b3b-e6a0-400d-adf3-24643a4d6a70@2g2000hsn.googlegroups.com>

On Apr 14, 10:04 am, nolo contendere <simon.c...@fmr.com> wrote:
> On Apr 13, 7:26 am, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
>
>
>
> > if you do something like
>
> >     my $sth = $dbh->prepare("select * from largetable");
> >     $sth->execute();
>
> >     while (my $r = $sth->fetchrow_hashref) {
> >         process_one_row($r);
> >     }
>
> > You read one row at a time into memory. Right?
>
> > Wrong. DBD::mysql by default uses mysql_store_result, so the
> > $sth->execute will (try to) load the entire table into memory. You need
> > to tell DBD::mysql to use mysql_use_result instead:
>
> >     $dbh->{mysql_use_result} = 1;
>
> > But then you cannot issue other queries until you have finished reading
> > the results ...
>
> This is weird. So what would be the difference between any of the
> fetch/select<row> methods and the fetch/select<all> methods for mysql
> if the default 'mysql_store_result' were used?

Sorry, my question should be: Why would you choose to use any of the
'row' methods over the 'all' methods if the default were being used?


------------------------------

Date: Mon, 14 Apr 2008 09:36:17 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: design strategy on handling large DB
Message-Id: <86lk3g5wri.fsf@lifelogs.com>

On Sat, 12 Apr 2008 08:57:19 +0800 "Ela" <ela@yantai.org> wrote: 

E> "Ted Zlatanov" <tzz@lifelogs.com> wrote in message 
E> news:86r6dcs1ko.fsf@lifelogs.com...
>> On Fri, 11 Apr 2008 23:03:20 +0800 "Ela" <ela@yantai.org> wrote:
>> 
>> You are loading all your data into memory.  That's almost certainly the
>> wrong approach.  Do you need all the data?  What does the data look
>> like?  What operations are you doing across the data?  SQL has almost
>> every operation covered, and it's rare indeed that you will find it
>> inadequate for managing structured data.

E> Do you think the bottleneck according to the following statistics is from 
E> mysqld?

E>  08:56:56  up 11:45,  5 users,  load average: 1.00, 1.00, 1.00
E> 84 processes: 82 sleeping, 2 running, 0 zombie, 0 stopped
E> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
E>            total   47.5%    0.0%    2.6%   0.0%     0.0%    0.0%   49.8%
E>            cpu00   95.1%    0.0%    4.8%   0.0%     0.0%    0.0%    0.0%
E>            cpu01    0.0%    0.0%    0.4%   0.0%     0.0%    0.0%   99.6%
E> Mem:  2061424k av, 2043432k used,   17992k free,       0k shrd,   17308k 
E> buff
E>                     515388k actv,      12k in_d,   37452k in_c
E> Swap: 2040212k av,       0k used, 2040212k free                 1278708k 
E> cached

E>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
E>  3825 ela    25   0 20732  20M  3024 R    49.7  1.0 550:05   0 mysqld
E>  3815 ela      24   0 60772  59M 10428 S     0.2  2.9   1:11   0 java
E>  4373 ela      15   0  1044 1044   808 R     0.2  0.0   0:00   1 top

I don't mean to be rude, but you haven't answered any of the questions
above, and as I said, no one can help you without understanding what
you're trying to do.

You could post your source code, but that's less useful than some
answers.

For what it's worth, the Java interpreter is taking up just a few MB of
memory in the `top' output you've pasted, and MySQL is not using much
either.  Hit `M' in `top' to see the processes sorted by memory usage,
which should be helpful.

Ted


------------------------------

Date: Mon, 14 Apr 2008 08:37:40 -0700 (PDT)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: design strategy on handling large DB
Message-Id: <0c9952a7-afa8-40f9-8034-7f62b42c0703@b64g2000hsa.googlegroups.com>

On Apr 11, 8:46 am, "Ela" <e...@yantai.org> wrote:
> except in using mysql or related programs, how can perl be used to deal with
> large databases? Furthermore, how can I load that large db into cache in
> advance and later use another perl to access/search that db?

What kind of user interface do you want? If you have your database,
you already have all you need to access and search the data, i.e.,
SQL. You can certainly do it from the command line, and depending of
the kind of database you may have graphical utilities. Enterprise
Manager for SQL Server, or Toad for Mysql, or even Access (via ODBC)
for Mysql.

I would suggest that using Access as a front end for your database is a
good, efficient, easy option for users who are comfortable with Access.
They can easily do inserts, updates, deletes, and select queries using
query objects and the graphical query builder.

If that doesn't work, you can always build an HTML-based web interface
that uses a browser for user interaction. This is commonly done on the
web and isn't difficult to do. You can build dynamic queries on the fly
using user-supplied parameters.

The important thing to remember is that the RDBMS already does
everything you want -- no need to reinvent the wheel. All you have to
do is connect it to your chosen interface. Perl can easily translate
between the two.

CC


------------------------------

Date: 14 Apr 2008 11:27:35 -0500
From: Lawrence Statton <yankeeinexile@gmail.com>
Subject: Re: design strategy on handling large DB
Message-Id: <87ej98wgeg.fsf@hummer.cluon.com>

nolo contendere <simon.chao@fmr.com> writes:
> 
> Sorry, my question should be: Why would you choose to use any of the
> 'row' methods over the 'all' methods if the default were being used?

Because DBI is providing an abstraction layer -- it is a "quirk" of
the DBD::mysql driver that the row methods have the same footprint as
the "all" methods.

-- 
	Lawrence Statton - lawrenabae@abaluon.abaom s/aba/c/g
Computer  software  consists of  only  two  components: ones  and
zeros, in roughly equal proportions.   All that is required is to
place them into the correct order.


------------------------------

Date: Mon, 14 Apr 2008 09:31:01 -0700 (PDT)
From: nolo contendere <simon.chao@fmr.com>
Subject: Re: design strategy on handling large DB
Message-Id: <3f923c0a-5b6e-4ed6-9709-af6ef051d76b@k13g2000hse.googlegroups.com>

On Apr 14, 12:27 pm, Lawrence Statton <yankeeinex...@gmail.com> wrote:
> nolo contendere <simon.c...@fmr.com> writes:
>
> > Sorry, my question should be: Why would you choose to use any of the
> > 'row' methods over the 'all' methods if the default were being used?
>
> Because DBI is providing an abstraction layer -- it is a "quirk" of
> the DBD::mysql driver that the row methods have the same footprint as
> the "all" methods.

Ok, didn't know if there was some non-obvious benefit for the mysql
driver to implement it that way...


------------------------------

Date: 14 Apr 2008 16:43:50 GMT
From: xhoster@gmail.com
Subject: Re: design strategy on handling large DB
Message-Id: <20080414124352.073$gi@newsreader.com>

nolo contendere <simon.chao@fmr.com> wrote:
> On Apr 14, 10:04 am, nolo contendere <simon.c...@fmr.com> wrote:
> > On Apr 13, 7:26 am, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
 ...
> > > Wrong. DBD::mysql by default uses mysql_store_result, so the
> > > $sth->execute will (try to) load the entire table into memory. You
> > > need to tell DBD::mysql to use mysql_use_result instead:
> >
> > >     $dbh->{mysql_use_result} = 1;
> >
> > > But then you cannot issue other queries until you have finished
> > > reading the results ...
> >
> > This is weird. So what would be the difference between any of the
> > fetch/select<row> methods and the fetch/select<all> methods for mysql
> > if the default 'mysql_store_result' were used?
>
> Sorry, my question should be: Why would you choose to use any of the
> 'row' methods over the 'all' methods if the default were being used?

For one, because you are using DBI and trying to abstract away the
peculiarities of the particular DBDs, to the extent possible.  The behavior
of DBD::mysql is certainly peculiar; you should only bow down to it as
a last resort.

For another, store_result is much more memory efficient, as it seems to
store all the data in a highly compact low level structure.  Then fetchrow
parcels it out into memory-inefficient Perl structures one row at a time.
While fetchall stores the whole result set in Perl structures, taking
several times as much memory.
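A middle ground in plain DBI is fetchall_arrayref with a row limit, which turns each call into a bounded batch of Perl structures. A sketch of the idiom; an in-memory DBD::SQLite database stands in for MySQL here purely to make it runnable, since the batching itself is driver-independent:

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite stands in for MySQL; the batching idiom is plain DBI.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0 });
$dbh->do('CREATE TABLE t (n INTEGER)');
my $ins = $dbh->prepare('INSERT INTO t (n) VALUES (?)');
$ins->execute($_) for 1 .. 10;

my $sth = $dbh->prepare('SELECT n FROM t');
$sth->execute;

# Pull rows in batches of 3: memory stays bounded, and there are far
# fewer method calls than one fetchrow per row.
my $total = 0;
while (my $batch = $sth->fetchall_arrayref(undef, 3)) {
    $total += $_->[0] for @$batch;
    last if @$batch < 3;    # a short batch means the result set is done
}
print "$total\n";
```

With 10 rows this fetches batches of 3, 3, 3 and 1, summing 1..10.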

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: 14 Apr 2008 16:48:21 GMT
From: xhoster@gmail.com
Subject: Re: design strategy on handling large DB
Message-Id: <20080414124823.289$Om@newsreader.com>

"Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
> On 2008-04-13 02:05, xhoster@gmail.com <xhoster@gmail.com> wrote:
> > "Ela" <ela@yantai.org> wrote:
> >> It's only 3GB... but I find when I run the current java program with
> >> the mysql, it uses up all the 2GB memory...
> >
> > What uses up all 2GB?  MySQL or Java?  If MySQL, then you have it
> > poorly configured.  If Java, then you must be loading all the data into
> > memory at one time, either accidentally or on purpose.
>
> I don't know about Java, but in Perl this is easy to do accidentally.
>
> if you do something like
>
>     my $sth = $dbh->prepare("select * from largetable");
>     $sth->execute();
>
>     while (my $r = $sth->fetchrow_hashref) {
>         process_one_row($r);
>     }
>
> You read one row at a time into memory. Right?
>
> Wrong. DBD::mysql by default uses mysql_store_result, so the
> $sth->execute will (try to) load the entire table into memory. You need
> to tell DBD::mysql to use mysql_use_result instead:
>
>     $dbh->{mysql_use_result} = 1;
>
> But then you cannot issue other queries until you have finished reading
> the results ...

I wasn't aware of that last limitation, or at least had forgotten it.  I
thought store_result was the default because MySQL locks the table during
the read and they wanted to make those locks last as briefly as possible,
by reading all of the data up front.  But I guess it is also there to work
around this other limitation you point out.


Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Mon, 14 Apr 2008 09:55:03 -0700 (PDT)
From: nolo contendere <simon.chao@fmr.com>
Subject: Re: design strategy on handling large DB
Message-Id: <71923065-4fe4-4102-a027-dc71b27b5a7e@d1g2000hsg.googlegroups.com>

On Apr 14, 12:43 pm, xhos...@gmail.com wrote:
> nolo contendere <simon.c...@fmr.com> wrote:
> > On Apr 14, 10:04 am, nolo contendere <simon.c...@fmr.com> wrote:
> > > On Apr 13, 7:26 am, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
> ...
> > > > Wrong. DBD::mysql by default uses mysql_store_result, so the
> > > > $sth->execute will (try to) load the entire table into memory. You
> > > > need to tell DBD::mysql to use mysql_use_result instead:
>
> > > >     $dbh->{mysql_use_result} = 1;
>
> > > > But then you cannot issue other queries until you have finished
> > > > reading the results ...
>
> > > This is weird. So what would be the difference between any of the
> > > fetch/select<row> methods and the fetch/select<all> methods for mysql
> > > if the default 'mysql_store_result' were used?
>
> > Sorry, my question should be: Why would you choose to use any of the
> > 'row' methods over the 'all' methods if the default were being used?
>
> For one, because you are using DBI and trying to abstract away the
> peculiarities of the particular DBDs, to the extent possible.  The behavior
> of DBD::mysql is certainly peculiar; you should only bow down to it as
> a last resort.
>
> For another, store_result is much more memory efficient, as it seems to
> store all the data in a highly compact low level structure.  Then fetchrow
> parcels it out into memory-inefficient Perl structures one row at a time.
> While fetchall stores the whole result set in Perl structures, taking
> several times as much memory.
>

Ok, that makes sense--just seems like a 'gotcha' to keep in mind when
using DBI with mysql, as Peter pointed out earlier in this thread.
Thx.


------------------------------

Date: Mon, 14 Apr 2008 10:27:11 -0700 (PDT)
From: pgodfrin <pgodfrin@gmail.com>
Subject: Re: design strategy on handling large DB
Message-Id: <97cece15-368a-45d6-94ba-3269aa5e37dd@u12g2000prd.googlegroups.com>

On Apr 14, 11:55 am, nolo contendere <simon.c...@fmr.com> wrote:
> On Apr 14, 12:43 pm, xhos...@gmail.com wrote:
>
>
>
> > nolo contendere <simon.c...@fmr.com> wrote:
> > > On Apr 14, 10:04 am, nolo contendere <simon.c...@fmr.com> wrote:
> > > > On Apr 13, 7:26 am, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
> > ...
> > > > > Wrong. DBD::mysql by default uses mysql_store_result, so the
> > > > > $sth->execute will (try to) load the entire table into memory. You
> > > > > need to tell DBD::mysql to use mysql_use_result instead:
>
> > > > >     $dbh->{mysql_use_result} = 1;
>
> > > > > But then you cannot issue other queries until you have finished
> > > > > reading the results ...
>
> > > > This is weird. So what would be the difference between any of the
> > > > fetch/select<row> methods and the fetch/select<all> methods for mysql
> > > > if the default 'mysql_store_result' were used?
>
> > > Sorry, my question should be: Why would you choose to use any of the
> > > 'row' methods over the 'all' methods if the default were being used?
>
> > For one, because you are using DBI and trying to abstract away the
> > peculiarities of the particular DBDs, to the extent possible.  The behavior
> > of DBD::mysql is certainly peculiar; you should only bow down to it as
> > a last resort.
>
> > For another, store_result is much more memory efficient, as it seems to
> > store all the data in a highly compact low level structure.  Then fetchrow
> > parcels it out into memory-inefficient Perl structures one row at a time.
> > While fetchall stores the whole result set in Perl structures, taking
> > several times as much memory.
>
> Ok, that makes sense--just seems like a 'gotcha' to keep in mind when
> using DBI with mysql, as Peter pointed out earlier in this thread.
> Thx.

Yes that is certainly a "gotcha". However, the OP's question was never
answered (although the question was strangely posed...).

Question to the OP - why would you want to cache the table outside of
the DBMS? I can see it in certain cases, like very (VERY) small 'lookup'
tables, where you would prefer not to do I/O to the database every time
your user interface is called, but I think there is already a module to
do that sort of thing - Cache. Still, I would not try to cache the
underlying database file - I would cache the results of a select
statement. For example, if you have a code table which is 20 rows, I
would select the 20 rows and cache the result, not try to cache the
underlying database file in which those 20 rows are located...
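A hand-rolled sketch of that idea, with invented table, column, and sub names (in real use load_codes() would run a single DBI select at startup instead of returning fixed rows):

```perl
use strict;
use warnings;

# Stand-in for one startup query like "SELECT code, label FROM codes";
# in real use this sub would do the single DBI select.
sub load_codes {
    return ([ 'A', 'Active' ], [ 'X', 'Expired' ]);
}

# Cache the *result set* once; nothing touches the database afterwards.
my %code_cache = map { $_->[0] => $_->[1] } load_codes();

sub code_label {
    my ($code) = @_;
    return exists $code_cache{$code} ? $code_cache{$code} : '(unknown)';
}

print code_label('A'), "\n";
print code_label('Z'), "\n";
```

Every later lookup is a hash access; the 20-row table is read exactly once per process.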

phil


------------------------------

Date: Mon, 14 Apr 2008 13:44:43 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: design strategy on handling large DB
Message-Id: <fu0fns028fb@news2.newsguy.com>

pgodfrin wrote:
> On Apr 14, 11:55 am, nolo contendere <simon.c...@fmr.com> wrote:
>> On Apr 14, 12:43 pm, xhos...@gmail.com wrote:
>>
>>
>>
>>> nolo contendere <simon.c...@fmr.com> wrote:
>>>> On Apr 14, 10:04 am, nolo contendere <simon.c...@fmr.com> wrote:
>>>>> On Apr 13, 7:26 am, "Peter J. Holzer" <hjp-usen...@hjp.at>
>>>>> wrote:
>>> ...
>>>>>> Wrong. DBD::mysql by default uses mysql_store_result, so the
>>>>>> $sth->execute will (try to) load the entire table into memory.
>>>>>> You need to tell DBD::mysql to use mysql_use_result instead:
>>
>>>>>>     $dbh->{mysql_use_result} = 1;
>>
>>>>>> But then you cannot issue other queries until you have finished
>>>>>> reading the results ...
>>
>>>>> This is weird. So what would be the difference between any of the
>>>>> fetch/select<row> methods and the fetch/select<all> methods for
>>>>> mysql if the default 'mysql_store_result' were used?
>>
>>>> Sorry, my question should be: Why would you choose to use any of
>>>> the 'row' methods over the 'all' methods if the default were being
>>>> used?
>>
>>> For one, because you are using DBI and trying to abstract away the
>>> peculiarities of the particular DBDs, to the extent possible.  The
>>> behavior of DBD::mysql is certainly peculiar; you should only bow
>>> down to it as
>>> a last resort.
>>
>>> For another, store_result is much more memory efficient, as it
>>> seems to store all the data in a highly compact low level
>>> structure.  Then fetchrow parcels it out into memory-inefficient
>>> Perl structures one row at a time. While fetchall stores the whole
>>> result set in Perl structures, taking several times as much memory.
>>
>> Ok, that makes sense--just seems like a 'gotcha' to keep in mind when
>> using DBI with mysql, as Peter pointed out earlier in this thread.
>> Thx.
>
> Yes that is certainly a "gotcha". However, the OP's question was never
> answered (although the question was strangely posed...).

You could also take a look at DBD::mysqlPP

(http://search.cpan.org/~oyama/DBD-mysqlPP-0.04/mysqlPP.pm)

which is a pure Perl version of DBD::mysql

-- 
szr 




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1439
***************************************

