[30319] in Perl-Users-Digest
Perl-Users Digest, Issue: 1562 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed May 21 03:10:13 2008
Date: Wed, 21 May 2008 00:09:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 21 May 2008 Volume: 11 Number: 1562
Today's topics:
Re: FAQ 6.19 What good is "\G" in a regular expression? <szrRE@szromanMO.comVE>
Re: FAQ 6.19 What good is "\G" in a regular expression? <benkasminbullock@gmail.com>
Re: FAQ 6.19 What good is "\G" in a regular expression? <szrRE@szromanMO.comVE>
Re: FAQ 6.19 What good is "\G" in a regular expression? <someone@example.com>
Re: high and low bytes of a decimal <jl_post@hotmail.com>
Re: How to determine if a word has an extended characte <someone@example.com>
Re: How to determine if a word has an extended characte <benkasminbullock@gmail.com>
Re: How to determine if a word has an extended characte <benkasminbullock@gmail.com>
Re: I need ideas on how to sort 350 million lines of da <RedGrittyBrick@SpamWeary.foo>
Re: I need ideas on how to sort 350 million lines of da chadda@lonemerchant.com
Re: I need ideas on how to sort 350 million lines of da <bill@ts1000.us>
import CSV files to Excel <slick.users@gmail.com>
Re: import CSV files to Excel <jimsgibson@gmail.com>
Re: import CSV files to Excel <see@sig.invalid>
Re: import CSV files to Excel <slick.users@gmail.com>
Re: import CSV files to Excel <slick.users@gmail.com>
new CPAN modules on Wed May 21 2008 (Randal Schwartz)
PerlNET 7.2.0 build 284799 compile error <arlie.c@gmail.com>
Re: script to find the files with very long names <jurgenex@hotmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 20 May 2008 20:21:14 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: FAQ 6.19 What good is "\G" in a regular expression?
Message-Id: <g104fb0aie@news4.newsguy.com>
PerlFAQ Server wrote:
[...]
> while (<>) {
> chomp;
> PARSER: {
> m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
Just wondering, why is there a C< \b > following the C< \d+ >? I mean,
doesn't the C< \d+ > already match until it encounters a non-digit,
which implicitly includes word boundaries? If not, what is the reasoning
for the C< \b > in this case?
--
szr
------------------------------
Date: Wed, 21 May 2008 04:16:07 +0000 (UTC)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Re: FAQ 6.19 What good is "\G" in a regular expression?
Message-Id: <g107m6$l7a$1@ml.accsnet.ne.jp>
On Tue, 20 May 2008 20:21:14 -0700, szr wrote:
>> m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
>
> Just wondering, why is there a C< \b > following the C< \d+ >? I mean,
> doesn't the C< \d+ > already match until it encounters a non-digit,
> which implicitly includes word boundaries? If not, what is the reasoning
> for the C< \b > in this case?
Notice that \d+\b and \d+ are different:
#!/usr/local/bin/perl -w
use strict;
my @tests = qw/123abc 123 abc123 abc123def/;
my @regex = qw/\d+\b \d+/;
for (@tests) {
    print "'$_': ";
    for my $r (@regex) {
        print " '$r' matches" if (/$r/);
    }
    print "\n";
}
In this case it seems like Jeffrey had wanted to break text into words,
space, numbers, and "other", so he used \b to not match numbers in the
middle of words.
--
perl -e'@a=qw/Harder Better Faster Stronger/;use Time::HiRes"ualarm";@z=(0,2,4,
6);@v=("Work It","Make It","Do It","Makes Us");sub w{("")x$_[0]}$|=$t=432250;@f
=split"/","More Than/Ever/Hour/After/Our/Work Is/Never/Over";@e=((map{join(":",
@f[$_,$_+1])}@z),"");@w=map"$v[$_]:$a[$_]",0..3;@h=(@w,@e);@j=w(5);@t=(@v,@j,@a
,@j);@l=(@t,@f[@z],@j,(map{$f[$_+1]}@z),@j,@t,@w,@j,@e,w(4),(@h)x6,w(9),(@h)x7)
;ualarm$t,$t;$SIG{ALRM}=sub{print p()};while(1){}sub p{if(($c++)%2){exit if!@l;
if($_=shift@l){if(/(.*):(.*)/){$s=$2;$1}else{"$_\n"}}}elsif($s){" $s\n",$s=""}}'
------------------------------
Date: Tue, 20 May 2008 22:30:12 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: FAQ 6.19 What good is "\G" in a regular expression?
Message-Id: <g10c140i45@news4.newsguy.com>
Ben Bullock wrote:
> On Tue, 20 May 2008 20:21:14 -0700, szr wrote:
>
>>> m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo;
>>> };
>>
>> Just wondering, why is there a C< \b > following the C< \d+ >? I mean,
>> doesn't the C< \d+ > already match until it encounters a non-digit,
>> which implicitly includes word boundaries? If not, what is the
>> reasoning for the C< \b > in this case?
>
> Notice that \d+\b and \d+ are different:
>
> #!/usr/local/bin/perl -w
> use strict;
> my @tests = qw/123abc 123 abc123 abc123def/;
> my @regex = qw/\d+\b \d+/;
> for (@tests) {
> print "'$_': ";
> for my $r (@regex) {
> print " '$r' matches" if (/$r/)
> }
> print "\n";
> }
>
> In this case it seems like Jeffrey had wanted to break text into
> words, space, numbers, and "other", so he used \b to not match
> numbers in the middle of words.
Ah, thanks, that makes great sense now :-)
--
szr
------------------------------
Date: Wed, 21 May 2008 06:23:43 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: FAQ 6.19 What good is "\G" in a regular expression?
Message-Id: <PdPYj.3227$Yp.1246@edtnps92>
PerlFAQ Server wrote:
>
> Typically you use the "\G" anchor with the "c" flag when you want to try
> a different match if one fails, such as in a tokenizer. Jeffrey Friedl
> offers this example which works in 5.004 or later.
>
> while (<>) {
> chomp;
> PARSER: {
> m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
> m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; };
> m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; };
> m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
> }
> }
The character class [^\w\d] doesn't make sense because \w includes \d so
perhaps it is supposed to be [^\w\s] instead?
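Here is a minimal sketch of the FAQ's tokenizer with the last class
changed to [^\w\s], as suggested above. The tokenize() wrapper and the
[type, text] pairs it returns are my own illustrative assumptions, not
part of the FAQ text:

```perl
use strict;
use warnings;

# Split one line into [type, text] pairs using \G with the /gc flags.
# Because of /c, a failed match leaves pos() untouched, so the next
# pattern is tried at the same position.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    PARSER: {
        $line =~ m/ \G( \d+\b    )/gcx && do { push @tokens, [number => $1]; redo; };
        $line =~ m/ \G( \w+      )/gcx && do { push @tokens, [word   => $1]; redo; };
        $line =~ m/ \G( \s+      )/gcx && do { push @tokens, [space  => $1]; redo; };
        $line =~ m/ \G( [^\w\s]+ )/gcx && do { push @tokens, [other  => $1]; redo; };
    }
    return @tokens;
}

# Example: "123abc" comes out as a word because \b rejects the number.
print "$_->[0]: '$_->[1]'\n" for tokenize("123abc 42, ok");
```

With the original [^\w\d] class, the comma and the space after it would
come out as a single "other" token, since that class also matches
whitespace.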
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
------------------------------
Date: Tue, 20 May 2008 12:53:37 -0700 (PDT)
From: "jl_post@hotmail.com" <jl_post@hotmail.com>
Subject: Re: high and low bytes of a decimal
Message-Id: <43799df1-6c0f-4847-97d8-955e5cdade97@w1g2000prd.googlegroups.com>
On May 20, 11:29 am, Susanne West <sw...@gmx.de> wrote:
>
> thanks very much for clarifying! yes, you are more
> or less right. but i do need all the variants:
> - encoded little endian (goes into bytestream)
> - encoded big endian (goes into bytestream)
> - only low byte (for other reasons)
> - only high byte (for other reasons)
Now there's one thing you have to be aware of with bytes, or else
you'll mix things up: An 8-bit byte is basically just one
"character." Bytes have numerical "values," however, which are often
expressed in hexadecimal or decimal representation. What's important
to know is that these numerical "values" may represent the byte (or
"character"), but they are NOT the same.
Let me clarify with an example. Let's say you had a byte, like
this:
my $byte = 'A';
The $byte itself equals the character 'A', so this would be true:
if ($byte eq 'A') # evaluates to true (note the use of "eq")
and this would print out 'A':
print $byte; # prints out 'A'
but its "value" is actually 65 (or 0x41 in hexadecimal
representation). You can see this with the following code:
my $byteValue = ord($byte);
print $byteValue; # prints out "65"
As you might already know, the ord() function takes a single byte
and returns its numeric value. (The opposite of ord() is chr(), and
you can read about both in "perldoc -f ord" and "perldoc -f chr".)
So I have a question for you: When you say you need the high/low
byte, does that mean you need the literal byte "character," or the
byte's numerical "value"?
If you only need the numerical byte values, this is easy to do, as
it just involves doing mathematical operations on the $integerToSend,
like this:
my $integerToSend = 2008;
my $lowByteValue = $integerToSend % 256; # 216
my $nextByteValue = ($integerToSend >> 8) % 256; # 7
But if instead of the numerical byte value, you need the literal
byte character, you would just extract out the characters from the
$stringToSend, like this:
my $integerToSend = 2008;
my $stringToSend = pack('v', $integerToSend); # note: little-endian
my $lowByte = substr($stringToSend, 0, 1);
my $nextByte = substr($stringToSend, 1, 1);
(To learn more about the substr() function, read "perldoc -f substr".)
Note that if you print() out the $lowByte and $nextByte variables
you won't necessarily see numbers; what you'll see are characters that
correspond to values 216 and 7.
Also note that I've avoided using the term $highByte (even though I
used the term $lowByte). The reason for this is because if you try to
pack() a large value (like 200,000) into a two-byte string, the high-
bytes will be lost (meaning that the byte next to the low byte isn't
really the high byte, since the high byte(s) weren't recorded). To
avoid the confusion, I use $nextByte instead.
> but what happens if for unexpected reasons:
> $integerToSend = 200000;
> my $stringToSend = pack('n', $integerToSend);
> is it correct, that this is truncated to 65025?
I'm curious: Why do you say "65025"? Is it because you mean the
largest unsigned two-byte (16-bit) integer? If so, that's actually
256^2-1, which is actually 65535.
At any rate, no, it doesn't truncate it to either 65025 or 65535.
What it does is convert the value 200000 to bytes whose values are 0,
3, 13, and 64, but since it can only keep two of them, it discards the
first two bytes and only keeps the lower two bytes (the ones whose
values are 13 and 64). And so 13*256 + 64 equals 3392.
So if you modified the code you gave a little while ago to this
code and ran it:
my $integerToSend = 200000;
my $stringToSend = '';
print "\n4 byte-values (big-endian):\n";
$stringToSend = pack('N', $integerToSend);
print ord, "\n" foreach split //, $stringToSend;
print "\n2 byte-values (also big-endian):\n";
$stringToSend = pack('n', $integerToSend);
print ord, "\n" foreach split //, $stringToSend;
you would see this output:
4 byte-values (big-endian):
0
3
13
64
2 byte-values (also big-endian):
13
64
(Note that the first two values are discarded when displaying only two
byte-values.)
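The arithmetic above can be checked with a short sketch. The low16()
helper name is my own, and I mask the value explicitly before packing
so the example doesn't depend on how pack('n') handles an oversized
argument:

```perl
use strict;
use warnings;

# Keep only the low 16 bits of a non-negative integer, i.e. the part
# that fits in the two bytes written by pack('n', ...).
sub low16 {
    my ($n) = @_;
    return $n & 0xFFFF;
}

my $n        = 200000;
my $kept     = low16($n);            # same as $n % 65536
my $packed   = pack('n', $kept);     # two bytes, big-endian
my @bytes    = unpack('C*', $packed);
my $restored = unpack('n', $packed);

printf "%d -> low 16 bits %d -> bytes %s -> unpacked %d\n",
    $n, $kept, join(' ', @bytes), $restored;
# prints: 200000 -> low 16 bits 3392 -> bytes 13 64 -> unpacked 3392
```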
> problem 2: only high and low bytes (similar story)
> what is the best (fastest, safest) way to extract
> the lowest two bytes of an decimal of unknown length?
> i'm currently using
> my $integerToSend = 2008;
> my $lowbyte = $decimal % 256;
> my $highbyte = $decimal >> 8;
> but i doubt that this is the 'proper' way to do it.
> especially when (again) for uexpected reasons:
> my $integerToSend = 200000;
> my $lowbyte = $decimal % 256;
> my $highbyte = $decimal >> 8;
If you're looking for byte "values," you can use what I did above:
my $integerToSend = 20000;
my $lowByteValue = $integerToSend % 256; # 32
my $nextByteValue = ($integerToSend >> 8) % 256; # 78
If you're looking for the literal byte "characters," you can
convert the values to byte-characters with the chr() function, like
this:
my $lowByte = chr($lowByteValue);
my $nextByte = chr($nextByteValue);
or you can pack() the $integerToSend to a $stringToSend with little-
endian ordering, and just extract out the first and second characters,
like this:
my $stringToSend = pack('v', $integerToSend); # note: little-endian
my $lowByte = substr($stringToSend, 0, 1); # gets first byte
my $nextByte = substr($stringToSend, 1, 1); # gets next byte
> thanks for your comments. you've almost put me back
> on track...
You're very welcome.
Don't forget the distinction between the byte values and the bytes
themselves. Otherwise, if you want to send the bytes for the number
2008 over a bytestream, but send the strings "7" and "216" over
instead, you'll be sending the string "7216" (a four-byte string)
instead of $stringToSend (a two-byte string).
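To make that last point concrete, here is a small sketch comparing the
two. The variable names $wrongString, $highValue and $lowValue are just
for illustration:

```perl
use strict;
use warnings;

my $integerToSend = 2008;

# The two raw bytes, big-endian: the bytes with values 7 and 216.
my $stringToSend = pack('n', $integerToSend);

# The numeric values of those bytes, as ordinary Perl numbers.
my ($highValue, $lowValue) = unpack('C2', $stringToSend);

# Concatenating the numbers as text gives the characters
# '7', '2', '1', '6', which is not the same data at all.
my $wrongString = $highValue . $lowValue;

printf "packed bytes: %d byte(s)\n", length $stringToSend;   # 2
printf "decimal text: %d byte(s) ('%s')\n",
    length $wrongString, $wrongString;                        # 4 ('7216')
```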
I hope this helps, Susanne.
-- Jean-Luc
------------------------------
Date: Tue, 20 May 2008 23:29:08 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: How to determine if a word has an extended character?
Message-Id: <89JYj.3989$KB3.3516@edtnps91>
Hartmut Camphausen wrote:
> In <<405f2950-fa4a-4a3e-b5a7-c030a4604b2f@k1g2000prb.googlegroups.com>>
> wrote ...
>> I have a file which contains just one word. My task is just to find
>> out if the word has any extended character. Thats all.
>>
>> I can use regex, but am not able to find out a regex pattern for
>> extended character. Any hints?
>>
>>
>> For example, if the file content is: sample, then the Perl code prints
>> false; and if the file content is samplé, then the Perl code prints
>> true.
>
>
> $string =~ m/[^\w]/ ? print "\nhas extended." : print "\nOK.";
[^\w] is usually written as \W.
> should do the trick.
>
> This prints "has extended" if $string contains any characters other
> ([^...]) than 'a' to 'z', 'A' to 'Z', '0' to '9' plus '_' (the \w
> character class).
From perlre.pod:
<QUOTE>
If "use locale" is in effect, the list of alphabetic characters
generated by "\w" is taken from the current locale. See perllocale.
</QUOTE>
In other words, if your locale supports it then 'é' will be included in
\w.
> If you want to exclude the '_' (contained in \w), use [^a-zA-Z0-9]
[^a-zA-Z0-9] means any character that is *not* alphanumeric. You
probably meant [a-zA-Z0-9].
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
------------------------------
Date: Wed, 21 May 2008 04:28:32 +0000 (UTC)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Re: How to determine if a word has an extended character?
Message-Id: <g108dg$l7a$2@ml.accsnet.ne.jp>
On Tue, 20 May 2008 23:29:08 +0000, John W. Krahn wrote:
> Hartmut Camphausen wrote:
>>
>> $string =~ m/[^\w]/ ? print "\nhas extended." : print "\nOK.";
>
> [^\w] is usually written as \W.
Hartmut mentioned that one could add more characters to the ^\w in the
following part of his post, which may explain why he chose this method rather
than using \W.
>> This prints "has extended" if $string contains any characters other
>> ([^...]) then 'a' to 'z', 'A' to 'Z', '0' to '9' plus '_' (the \w
>> character class).
>
> From perlre.pod:
>
> <QUOTE>
> If "use locale" is in effect, the list of alphabetic characters
> generated by "\w" is taken from the current locale. See perllocale.
> </QUOTE>
>
> In other words, if your locale supports it then 'é' will be included in
> \w.
Or if you use Unicode:
#!/usr/bin/perl
use warnings;
use strict;
use Unicode::UCD 'charinfo';
sub count_match
{
my ($re)=@_;
my $c;
for my $n (0x00 .. 0xD7FF, 0xE000 .. 0xFDCF, 0xFDF0.. 0xFFFD) {
if (chr($n) =~ /$re/) {
my $ci = charinfo($n);
# print sprintf ('%02X', $n), " which is ", $$ci{name}, " matches\n";
$c++;
}
}
print "There are $c characters matching \"$re\".\n";
}
count_match('\w');
Uncommenting the "print" statement will produce a lot of output.
>> If you want to exclude the '_' (contained in \w), use [^a-zA-Z0-9]
>
> [^a-zA-Z0-9] means any character that is *not* alphanumeric. You
> probably meant [a-zA-Z0-9].
I think he meant what he said, [^\w] matches _ but [^a-zA-Z0-9] doesn't.
------------------------------
Date: Wed, 21 May 2008 05:24:51 +0000 (UTC)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Re: How to determine if a word has an extended character?
Message-Id: <g10bn3$mb9$1@ml.accsnet.ne.jp>
On Wed, 21 May 2008 04:28:32 +0000, Ben Bullock wrote:
>>> If you want to exclude the '_' (contained in \w), use [^a-zA-Z0-9]
>>
>> [^a-zA-Z0-9] means any character that is *not* alphanumeric. You
>> probably meant [a-zA-Z0-9].
>
> I think he meant what he said, [^\w] matches _ but [^a-zA-Z0-9] doesn't.
Sorry, I meant to say "[^\w] doesn't match _, but [^a-zA-Z0-9] does."
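For the original question, here is one possible sketch. It assumes that
"extended character" means any character outside the 7-bit ASCII range,
which the original poster never actually confirmed; has_extended() is a
made-up name. Unlike \w, this test does not depend on the locale or on
Unicode semantics:

```perl
use strict;
use warnings;

# True if the word contains any character with a code point above 127.
# Assumption: "extended" means "not 7-bit ASCII".
sub has_extended {
    my ($word) = @_;
    return $word =~ /[^\x00-\x7F]/ ? 1 : 0;
}

for my $word ("sample", "sampl\xE9") {
    my $answer = has_extended($word) ? "true" : "false";
    print "$answer\n";
}
```

Note that this treats '_' and punctuation as ordinary characters, so it
answers a narrower question than the [^\w] and [^a-zA-Z0-9] tests
discussed above.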
------------------------------
Date: Tue, 20 May 2008 19:51:17 +0100
From: RedGrittyBrick <RedGrittyBrick@SpamWeary.foo>
Subject: Re: I need ideas on how to sort 350 million lines of data
Message-Id: <SOqdnWhORq4rgK7VnZ2dnUVZ8qbinZ2d@bt.com>
Bill H wrote:
> On May 17, 11:21 am, cha...@lonemerchant.com wrote:
>> I have roughly 350 million lines of data in the following form
>>
>> name, price, weight, brand, sku, upc, size
>>
>> sitting on my home PC.
>>
>> Is there some kind of sane way to sort this without taking up too much
>> ram or jacking up my limited CPU time?
>
> Just out of curiosity I would like to know how someone has a file
> containing 350 million line of product information sitting on a home
> pc in the first place. I mean it had to have come from some sort of
> database to start with, and with those numbers we aren't talking about
> a second hand store.
>
In an earlier thread* you'll see the OP is planning to download 350
million records one at a time from the doba.com website. Sinan pointed
out this would take 3.7 years of continuous scraping (at 3 pages/sec).
Perhaps the OP is planning ahead.
--
RGB
* "Need ideas on how to make this code faster than a speeding turtle"
------------------------------
Date: Tue, 20 May 2008 13:07:17 -0700 (PDT)
From: chadda@lonemerchant.com
Subject: Re: I need ideas on how to sort 350 million lines of data
Message-Id: <f5b27b49-a782-465a-9767-33dd8ecbfd70@t12g2000prg.googlegroups.com>
On May 18, 1:51 pm, xhos...@gmail.com wrote:
> cha...@lonemerchant.com wrote:
> > I have roughly 350 million lines of data in the following form
>
> > name, price, weight, brand, sku, upc, size
>
> Name, in particular, seems like it might be able to contain embedded
> punctuation and might be escaped in some way. That could complicate
> things
>
> > sitting on my home PC.
>
> What kind of PC is your home PC?
>
My home PC is a 700 MHz Intel with 256 MB RAM, running Fedora Core 6 Linux.
------------------------------
Date: Tue, 20 May 2008 15:04:01 -0700 (PDT)
From: Bill H <bill@ts1000.us>
Subject: Re: I need ideas on how to sort 350 million lines of data
Message-Id: <b2ccceaa-9d9c-431a-9931-8bdf0e2c8a3f@t12g2000prg.googlegroups.com>
On May 20, 2:51 pm, RedGrittyBrick <RedGrittyBr...@SpamWeary.foo>
wrote:
> Bill H wrote:
> > On May 17, 11:21 am, cha...@lonemerchant.com wrote:
> >> I have roughly 350 million lines of data in the following form
>
> >> name, price, weight, brand, sku, upc, size
>
> >> sitting on my home PC.
>
> >> Is there some kind of sane way to sort this without taking up too much
> >> ram or jacking up my limited CPU time?
>
> > Just out of curiosity I would like to know how someone has a file
> > containing 350 million line of product information sitting on a home
> > pc in the first place. I mean it had to have come from some sort of
> > database to start with, and with those numbers we aren't talking about
> > a second hand store.
>
> In an earlier thread* you'll see the OP is planning to download 350
> million records one at a time from the doba.com website. Sinan pointed
> out this would take 3.7 years of continuous scraping (at 3 pages/sec).
>
> Perhaps the OP is planning ahead.
>
> --
> RGB
> * "Need ideas on how to make this code faster than a speeding turtle"
Well if he was downloading them individually he should have sorted
them at the same time and killed 2 birds with one stone in those 3.7
years.
Bill H
BTW - what's up with Google now using CAPTCHA in their posting??
------------------------------
Date: Tue, 20 May 2008 15:24:22 -0700 (PDT)
From: Slickuser <slick.users@gmail.com>
Subject: import CSV files to Excel
Message-Id: <9f5328bc-31bd-4f5a-9982-034c30e8d01f@i36g2000prf.googlegroups.com>
I have roughly around 2000 files of CSV (each file is in the range of
49bytes-10MB, 117218 lines), a total of ~35MB.
I would like to import each CSV file to an individual sheet in Excel
workbook.
Is it best to open each CSV file, add new sheet, copy & paste to Excel
workbook?
After that, I will do all the text formatting, sorting, and hyperlinks for
all sheets.
Thanks.
------------------------------
Date: Tue, 20 May 2008 17:21:11 -0700
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: import CSV files to Excel
Message-Id: <200520081721111755%jimsgibson@gmail.com>
In article
<9f5328bc-31bd-4f5a-9982-034c30e8d01f@i36g2000prf.googlegroups.com>,
Slickuser <slick.users@gmail.com> wrote:
> I have roughly around 2000 files of CSV (each file is in the range of
> 49bytes-10MB, 117218 lines), a total of ~35MB.
>
> I would like to import each CSV file to an individual sheet in Excel
> workbook.
>
> Is it best to open each CSV file, add new sheet, copy & paste to Excel
> workbook?
"best" is a very subjective term. Best in what sense?
How are you going to copy & paste to an Excel workbook? Are you going
to use the Spreadsheet::WriteExcel module to create an xls file
directly, or are you going to use Win32::OLE to control the Excel
application on Windows?
If you are going to use the Spreadsheet::WriteExcel module, there will
be no "copy & paste"-ing. You will read data from each CSV file and
write to each worksheet.
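A rough sketch of that approach follows. It assumes the
Spreadsheet::WriteExcel and Text::CSV modules from CPAN are installed;
the sub names sheet_name_for() and csv_files_to_workbook() are made up
for this example, and I have not timed it against 2000 files:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Derive a worksheet name from a CSV path. Excel limits sheet names to
# 31 characters and rejects the characters [ ] : * ? / \
sub sheet_name_for {
    my ($path) = @_;
    my ($name) = fileparse($path, qr/\.csv\z/i);
    $name =~ s{[\[\]:*?/\\]}{_}g;
    return substr($name, 0, 31);
}

# Read each CSV file and write its rows to a worksheet of its own.
# Sheet names must be unique, so names that collide after truncation
# to 31 characters would need extra handling.
sub csv_files_to_workbook {
    my ($xls_path, @csv_paths) = @_;

    # Loaded here so the helper above works without the CPAN modules.
    require Spreadsheet::WriteExcel;
    require Text::CSV;

    my $workbook = Spreadsheet::WriteExcel->new($xls_path)
        or die "Cannot create $xls_path: $!";
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot create CSV parser";

    for my $path (sort @csv_paths) {
        my $sheet = $workbook->add_worksheet(sheet_name_for($path));
        open my $fh, '<', $path or die "Cannot open $path: $!";
        my $row = 0;
        while (my $fields = $csv->getline($fh)) {
            $sheet->write_row($row++, 0, $fields);   # one CSV line per row
        }
        close $fh;
    }
    $workbook->close();
    return;
}

# Possible usage (paths are placeholders):
#   csv_files_to_workbook('out.xls', glob('csv_dir/*.csv'));
```

No Excel process is involved here, so there is no clipboard and no
"copy & paste" step at all.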
>
> After that, I will do all text formating, sorting, and hyper links for
> all sheets.
Do you intend to do this formatting, sorting, and hyperlinking manually
or programmatically? I hope you mean programmatically, because it will
take you a long time to format, sort, and link 2000 worksheets by hand.
If that is the case, you are better off doing the formatting and sorting
before you write the data to the spreadsheet. I am not sure about the
linking, though, or if that is even possible with the
Spreadsheet::WriteExcel module.
So you will end up with a 2000-sheet workbook? What possible use can
you have for such a beast?
It sounds like you could use a database, instead.
If you want people to advise you, you need to explain better what you
are trying to do, how you intend to create the spreadsheet, and how you
intend to use the resulting data.
--
Jim Gibson
Posted Via Usenet.com Premium Usenet Newsgroup Services
----------------------------------------------------------
http://www.usenet.com
------------------------------
Date: Tue, 20 May 2008 20:30:35 -0400
From: Bob Walton <see@sig.invalid>
Subject: Re: import CSV files to Excel
Message-Id: <48336cb5$0$30241$4c368faf@roadrunner.com>
Slickuser wrote:
> I have roughly around 2000 files of CSV (each file is in the range of
> 49bytes-10MB, 117218 lines), a total of ~35MB.
>
> I would like to import each CSV file to an individual sheet in Excel
> workbook.
>
> Is it best to open each CSV file, add new sheet, copy & paste to Excel
> workbook?
>
> After that, I will do all text formating, sorting, and hyper links for
> all sheets.
...
I assume you mean how to automate it using Perl, since you asked in a
Perl newsgroup? One way is to
use Win32::OLE;
Look in the docs for that module for usage examples. If you have
specific Perl problems doing that, post what you have done and the
specific problems you are having.
However, Excel has limitations. There is a limit (65536) to how many
rows there are in a spreadsheet, so your 10 MB file may or may not fit,
depending upon how many lines there are. If the 117218 lines are all in
the 10 MB file (rather than the total of all the files, as you might
mean), then your scheme won't work. Also, it is claimed that the number
of sheets in a workbook is limited only by memory, but I would be
surprised if 2000 came off without a hitch. It doesn't seem like it
would be very useful, either. It would take gobs of memory, and be
sluggish to load and save, and probably sluggish on other tasks as well.
I have no clue what your requirements are, but you probably should
explore the use of other tools, like perhaps a database (Access,
perhaps, or MySQL, PostgreSQL, etc). Perl's DBI module would prove very
useful in populating and manipulating such a database.
--
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl
------------------------------
Date: Tue, 20 May 2008 22:28:45 -0700 (PDT)
From: Slickuser <slick.users@gmail.com>
Subject: Re: import CSV files to Excel
Message-Id: <3b507a0b-314f-486e-bfb8-ea6da6efc61d@d77g2000hsb.googlegroups.com>
I was using Win32::OLE to print my data to Excel, but it seems to take
more than 30 minutes. Once I'm done, I format all worksheets, sort, and
add hyperlinks automatically.
So I decided to print each sheet to an individual CSV file first (this
takes only 4 minutes).
Now I have about 2000 CSV files and I want to combine them into one Excel
workbook, with each CSV file as a worksheet, using Win32::OLE.
I split the CSV once I reach 65536 rows to stay within the Excel limit.
I'm using Win32::OLE to open each CSV file.
Here is the code; it takes about 14 minutes just to copy the CSV files
to Excel.
But it seems to paste whatever I last copied with Ctrl+C, not the
range that Excel copied.
use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Excel';
print localtime() ." START \n";
my $localTime = localtime();
$localTime =~ s/\s/_/g;
$localTime =~ s/\:/_/g;
my $csv_path = "C:/slickuser/Tue_May_20_20_49_44_2008/*.csv";
my $fileOutput = "C:/slickuser/out.xls";
my $MAX_ROW = 65536;
my @csvFile = glob($csv_path);
my ($Excel,$CSV,$Workbook,$CurrentSheet,$CSV_WB);
$Excel = Win32::OLE->new('Excel.Application', 'Quit')
    || die "Can't create Excel object \n";
$Excel->{'Visible'} = 0; #0 is hidden, 1 is visible
$Excel->{DisplayAlerts}= 0; #0 is hide alerts
$Excel->{SheetsInNewWorkbook} = 0;
$CSV = Win32::OLE->new('Excel.Application', 'Quit')
    || die "Can't create Excel object \n";
$CSV->{'Visible'} = 0; #0 is hidden, 1 is visible
$CSV->{DisplayAlerts}= 0; #0 is hide alerts
$Workbook = $Excel->Workbooks->Add();
$Workbook->SaveAs($fileOutput) or die $!;
foreach my $filename (sort(@csvFile))
{
$filename =~ s/\\/\//g;
my @tempFile = split(/\//,$filename);
my @sheetName = split(/\./,$tempFile[scalar(@tempFile)-1]);
$CSV_WB = $CSV->Workbooks->Open($filename);
$CSV_WB->ActiveSheet->Range("A1:Z$MAX_ROW")->Select;
$CSV_WB->Copy;
$Workbook->Worksheets->Add( {after => $Workbook->Worksheets($Workbook->Worksheets->{count})} );
$CurrentSheet = $Workbook->ActiveSheet;
$CurrentSheet->{Name} = $sheetName[0];
$Workbook->ActiveSheet->Paste();
$Excel->ActiveWorkbook->Save();
$CSV_WB->Close();
}
$Excel->ActiveWorkbook->Save();
$Excel->Quit();
$CSV->Quit();
Win32::OLE->FreeUnusedLibraries();
print localtime() ." END \n";
------------------------------
Date: Tue, 20 May 2008 23:06:59 -0700 (PDT)
From: Slickuser <slick.users@gmail.com>
Subject: Re: import CSV files to Excel
Message-Id: <605ade98-8449-4569-a3b8-e0998012d7af@m73g2000hsh.googlegroups.com>
If I change the copy code to:
$CSV_WB->ActiveSheet->Range("A1:Z$MAX_ROW")->Copy;
With a sample of 3 CSV files, 17 lines each, the script took about 7
minutes to run.
I think it was the A1:Z65536 range that slowed the script down.
Is there a better approach to speed things up with 2000 files?
Thanks.
On May 20, 10:28 pm, Slickuser <slick.us...@gmail.com> wrote:
> I was using Win32:OLE to my print my data to Excel but it seem to take
> more than 30 minutes. Once I'm done, I format all worksheet, sort, and
> add hyper links automatically.
>
> So I decide to print to an individual file as CSV format first (this
> take only 4 minutes).
> Now I have about 2000 CSV files and I want to combine into one Excel
> workbook as each CSV file is a worksheet using Win32:OLE.
>
> I split the CSV once I reach 65536 row to prevent Excel limitation.
>
> I'm using Win32:OLE to open up the CSV file.
>
> Here is the code, it takes about 14 minutes to run to copy csv to
> excel only.
>
> But it seem it's pasting the content from my Ctrl+C not from Excel
> copy to memory.
>
> use strict;
> use warnings;
> use Win32::OLE;
> use Win32::OLE::Const 'Microsoft Excel';
>
> print localtime() ." START \n";
>
> my $localTime = localtime();
> $localTime =~ s/\s/_/g;
> $localTime =~ s/\:/_/g;
>
> my $csv_path = "C:/slickuser/Tue_May_20_20_49_44_2008/*.csv";
> my $fileOutput = "C:/slickuser/out.xls";
> my $MAX_ROW = 65356;
>
> my @csvFile = glob($csv_path);
> my ($Excel,$CSV,$Workbook,$CurrentSheet,$CSV_WB);
>
> $Excel = Win32::OLE->new('Excel.Application', 'Quit') || die "Can't
> create Excel object \n";
> $Excel->{'Visible'} = 0; #0 is hidden, 1 is visible
> $Excel->{DisplayAlerts}= 0; #0 is hide alerts
> $Excel->{SheetsInNewWorkbook} = 0;
>
> $CSV = Win32::OLE->new('Excel.Application', 'Quit') || die "Can't
> create Excel object \n";
> $CSV->{'Visible'} = 0; #0 is hidden, 1 is visible
> $CSV->{DisplayAlerts}= 0; #0 is hide alerts
>
> $Workbook = $Excel->Workbooks->Add();
> $Workbook->SaveAs($fileOutput) or die $!;
>
> foreach my $filename (sort(@csvFile))
> {
> $filename =~ s/\\/\//g;
> my @tempFile = split(/\//,$filename);
> my @sheetName = split(/\./,$tempFile[scalar(@tempFile)-1]);
> $CSV_WB = $CSV->Workbooks->Open($filename);
> $CSV_WB->ActiveSheet->Range("A1:Z$MAX_ROW")->Select;
> $CSV_WB->Copy;
> $Workbook->Worksheets->Add( {after => $Workbook->Worksheets($Workbook->Worksheets->{count})} );
>
> $CurrentSheet = $Workbook->ActiveSheet;
> $CurrentSheet->{Name} = $sheetName[0];
> $Workbook->ActiveSheet->Paste();
> $Excel->ActiveWorkbook->Save();
> $CSV_WB->Close();
> }
>
> $Excel->ActiveWorkbook->Save();
> $Excel->Quit();
> $CSV->Quit();
> Win32::OLE->FreeUnusedLibraries();
>
> print localtime() ." END \n";
>
> On May 20, 5:30 pm, Bob Walton <s...@sig.invalid> wrote:
>
> > Slickuser wrote:
> > > I have roughly around 2000 files of CSV (each file is in the range of
> > > 49bytes-10MB, 117218 lines), a total of ~35MB.
>
> > > I would like to import each CSV file to an individual sheet in Excel
> > > workbook.
>
> > > Is it best to open each CSV file, add new sheet, copy & paste to Excel
> > > workbook?
>
> > > After that, I will do all text formating, sorting, and hyper links for
> > > all sheets.
>
> > ...
> > I assume you mean how to automate it using Perl, since you asked in a
> > Perl newsgroup? One way is to
>
> > use Win32::OLE;
>
> > Look in the docs for that module for usage examples. If you have
> > specific Perl problems doing that, post what you have done and the
> > specific problems you are having.
>
> > However, Excel has limitations. There is a limit (65536) to how many
> > rows there are in a spreadsheet, so your 10 MB file may or may not fit,
> > depending upon how many lines there are. If the 117218 lines are all in
> > the 10 MB file (rather than the total of all the files, as you might
> > mean), then your scheme won't work. Also, it is claimed that the number
> > of sheets in a workbook is limited only by memory, but I would be
> > surprised if 2000 came off without a hitch. It doesn't seem like it
> > would be very useful, either. It would take gobs of memory, and be
> > sluggish to load and save, and probably sluggish on other tasks as well.
>
> > I have no clue what your requirements are, but you probably should
> > explore the use of other tools, like perhaps a database (Access,
> > perhaps, or MySQL, PostgreSQL, etc). Perl's DBI module would prove very
> > useful in populating and manipulating such a database.
> > --
> > Bob Walton
> > Email:http://bwalton.com/cgi-bin/emailbob.pl
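The row limit mentioned in the quoted reply can be checked before any import is attempted. What follows is only a rough sketch: it counts physical lines, so it assumes no CSV field contains an embedded newline, and the names count_lines and partition_by_row_limit are made up for illustration, not taken from any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Row limit of a worksheet as cited in the quoted reply.
my $EXCEL_MAX_ROWS = 65536;

# Count the physical lines in one file.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my $lines = 0;
    $lines++ while <$fh>;
    close $fh;
    return $lines;
}

# Split a list of paths into those at or under the limit and those over it.
# Returns two array references: (fits, too_long).
sub partition_by_row_limit {
    my ($limit, @paths) = @_;
    my (@fits, @too_long);
    for my $path (@paths) {
        if (count_lines($path) <= $limit) { push @fits, $path }
        else                              { push @too_long, $path }
    }
    return (\@fits, \@too_long);
}

# Example run over every *.csv in the current directory (hypothetical layout).
my ($fits, $too_long) = partition_by_row_limit($EXCEL_MAX_ROWS, glob('*.csv'));
printf "%d file(s) fit, %d exceed %d rows\n",
    scalar @$fits, scalar @$too_long, $EXCEL_MAX_ROWS;
print "  too long: $_\n" for @$too_long;
```

Files reported as too long would have to be split, or kept out of Excel altogether, before the copy-and-paste loop runs.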
------------------------------
Date: Wed, 21 May 2008 04:42:20 GMT
From: merlyn@stonehenge.com (Randal Schwartz)
Subject: new CPAN modules on Wed May 21 2008
Message-Id: <K17BqK.1BG4@zorch.sf-bay.org>
The following modules have recently been added to or updated in the
Comprehensive Perl Archive Network (CPAN). You can install them using the
instructions in the 'perlmodinstall' page included with your Perl
distribution.
BDB-Wrapper-0.08
http://search.cpan.org/~hikarine/BDB-Wrapper-0.08/
Wrapper module for BerkeleyDB.pm
----
BLOB-1.00
http://search.cpan.org/~juerd/BLOB-1.00/
Perl extension for explicitly marking binary strings
----
CPANPLUS-Dist-Gentoo-0.01
http://search.cpan.org/~vpit/CPANPLUS-Dist-Gentoo-0.01/
CPANPLUS backend generating Gentoo ebuilds.
----
Cache-Swifty-0.07
http://search.cpan.org/~kazuho/Cache-Swifty-0.07/
A Perl frontend for the Swifty cache engine
----
Catalyst-View-TT-ForceUTF8-0.09
http://search.cpan.org/~lyokato/Catalyst-View-TT-ForceUTF8-0.09/
Template View Class with utf8 encoding
----
Class-DBI-Plugin-FilterOnClick-1.2
http://search.cpan.org/~aaronjj/Class-DBI-Plugin-FilterOnClick-1.2/
Generate browsable and searchable HTML Tables using FilterOnClick in conjunction with Class::DBI
----
Config-Model-CursesUI-1.007
http://search.cpan.org/~ddumont/Config-Model-CursesUI-1.007/
Curses interface to edit config data
----
Config-Model-Itself-0.202
http://search.cpan.org/~ddumont/Config-Model-Itself-0.202/
Model editor for Config::Model
----
Config-Model-Xorg-0.512
http://search.cpan.org/~ddumont/Config-Model-Xorg-0.512/
Xorg configuration model for Config::Model
----
DBIx-Class-Schema-Slave-0.02400
http://search.cpan.org/~travail/DBIx-Class-Schema-Slave-0.02400/
DBIx::Class::Schema for slave (EXPERIMENTAL)
----
DateTime-0.4302
http://search.cpan.org/~drolsky/DateTime-0.4302/
A date and time object
----
Devel-CoverX-Covered-0.01
http://search.cpan.org/~johanl/Devel-CoverX-Covered-0.01/
Collect and report caller (test file) and covered (source file) statistics from the cover_db
----
Dowse-BadSSH-0.09
http://search.cpan.org/~samv/Dowse-BadSSH-0.09/
----
EV-3.4
http://search.cpan.org/~mlehmann/EV-3.4/
perl interface to libev, a high performance full-featured event loop
----
GRID-Machine-0.092
http://search.cpan.org/~casiano/GRID-Machine-0.092/
Remote Procedure Calls over a SSH link
----
Games-RolePlay-MapGen-1.2.15
http://search.cpan.org/~jettero/Games-RolePlay-MapGen-1.2.15/
The base object for generating dungeons and maps
----
Helios-1.19_05
http://search.cpan.org/~lajandy/Helios-1.19_05/
----
IO-CaptureOutput-1.08_50
http://search.cpan.org/~dagolden/IO-CaptureOutput-1.08_50/
capture STDOUT and STDERR from Perl code, subprocesses or XS
----
IO-Lambda-0.17
http://search.cpan.org/~karasik/IO-Lambda-0.17/
non-blocking I/O in lambda style
----
Image-Size-FillFullSelect-0.0.0
http://search.cpan.org/~vvelox/Image-Size-FillFullSelect-0.0.0/
Choose whether an image fill setting for an image should be fill or full.
----
Imager-0.65
http://search.cpan.org/~tonyc/Imager-0.65/
Perl extension for Generating 24 bit Images
----
Kephra-0.3.9.10
http://search.cpan.org/~lichtkind/Kephra-0.3.9.10/
crossplatform, CPAN-installable GUI-Texteditor along perllike Paradigms
----
Log-Log4perl-1.16
http://search.cpan.org/~mschilli/Log-Log4perl-1.16/
Log4j implementation for Perl
----
MRO-Compat-0.07
http://search.cpan.org/~blblack/MRO-Compat-0.07/
mro::* interface compatibility for Perls < 5.9.5
----
Math-Random-MT-Perl-1.00
http://search.cpan.org/~jfreeman/Math-Random-MT-Perl-1.00/
Pure Perl Pseudorandom Number Generator
----
Net-Akamai-0.10
http://search.cpan.org/~jgoulah/Net-Akamai-0.10/
----
Number-Compare-Duration-0.001
http://search.cpan.org/~hdp/Number-Compare-Duration-0.001/
numeric comparisons of time durations
----
PDF-Create-0.9
http://search.cpan.org/~markusb/PDF-Create-0.9/
create PDF files
----
POE-Filter-Zlib-1.96
http://search.cpan.org/~bingos/POE-Filter-Zlib-1.96/
A POE filter wrapped around Compress::Zlib
----
POSIX-Regex-0.90.3
http://search.cpan.org/~jettero/POSIX-Regex-0.90.3/
OO interface for the gnu regex engine
----
Perl-Critic-1.083_005
http://search.cpan.org/~elliotjs/Perl-Critic-1.083_005/
Critique Perl source code for best-practices.
----
Sjis-0.18
http://search.cpan.org/~ina/Sjis-0.18/
Source code filter for ShiftJIS script
----
Spreadsheet-XLSX-0.03
http://search.cpan.org/~dmow/Spreadsheet-XLSX-0.03/
Perl extension for reading MS Excel 2007 files;
----
Term-GentooFunctions-1.1.7
http://search.cpan.org/~jettero/Term-GentooFunctions-1.1.7/
provides gentoo's einfo, ewarn, eerror, ebegin and eend.
----
Test-File-1.24
http://search.cpan.org/~bdfoy/Test-File-1.24/
test file attributes
----
Test-POE-Client-TCP-0.04
http://search.cpan.org/~bingos/Test-POE-Client-TCP-0.04/
A POE Component providing TCP client services for test cases
----
Test-POE-Server-TCP-0.10
http://search.cpan.org/~bingos/Test-POE-Server-TCP-0.10/
A POE Component providing TCP server services for test cases
----
Text-Wrap-Smart-XS-0.04_01
http://search.cpan.org/~schubiger/Text-Wrap-Smart-XS-0.04_01/
Wrap text fast into chunks of (mostly) equal length
----
WWW-MobileCarrierJP-0.16
http://search.cpan.org/~tokuhirom/WWW-MobileCarrierJP-0.16/
scrape mobile carrier information
----
ZConf-0.0.0
http://search.cpan.org/~vvelox/ZConf-0.0.0/
A configuration system allowing for either file or LDAP backed storage.
----
autobox-2.51
http://search.cpan.org/~chocolate/autobox-2.51/
call methods on native types
----
autobox-2.52
http://search.cpan.org/~chocolate/autobox-2.52/
call methods on native types
----
openStatisticalServices-0.014
http://search.cpan.org/~rphaney/openStatisticalServices-0.014/
----
parrot-0.6.2
http://search.cpan.org/~chromatic/parrot-0.6.2/
----
perl-GPS-0.16
http://search.cpan.org/~srezic/perl-GPS-0.16/
----
pod-mode-0.5
http://search.cpan.org/~schwigon/pod-mode-0.5/
If you're an author of one of these modules, please submit a detailed
announcement to comp.lang.perl.announce, and we'll pass it along.
This message was generated by a Perl program described in my Linux
Magazine column, which can be found on-line (along with more than
200 other freely available past column articles) at
http://www.stonehenge.com/merlyn/LinuxMag/col82.html
print "Just another Perl hacker," # the original
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
------------------------------
Date: Tue, 20 May 2008 22:47:01 -0700 (PDT)
From: Arlie <arlie.c@gmail.com>
Subject: PerlNET 7.2.0 build 284799 compile error
Message-Id: <89161510-4265-4b65-b067-b9b03bcc4e88@w5g2000prd.googlegroups.com>
Hi,
I have Microsoft .NET Framework 2.0 Redistributable Package (x86).exe
and Microsoft .NET Framework 2.0 Software Development Kit (SDK) (x86)
- setup.exe both installed.
I'm now testing ActiveState PDK 7 and tried to compile the WinForms
example, but got this error:
PerlNET 7.2.0 build 284799
Copyright (C) 1998-2008 ActiveState Software Inc. All rights
reserved.
Commercial license for ibm
System.ApplicationException: Can't locate type TextBox
at PerlRuntime.Interpreter.GetType(Int32 x, String klass)
at PerlRuntime.Interpreter.typeof(Int32 x, Int32 klass_, Int32& e,
Int32& rt)
I really need help with this because I am now testing Perl.NET. Below
is the code I got from the User's Guide:
package HelloWorldForm;
use strict;
use PerlNET qw(with AUTOCALL);
use namespace "System";
use namespace "System.Windows.Forms";
use namespace "System.Drawing";

=for interface
    [extends: Form]
    [STAThread]
    static void Main();
    private field TextBox textBox1;
    private void button1_Click(any sender, EventArgs evArgs);
=cut

sub Main {
    my $self = HelloWorldForm->new;
    Application->Run($self);
}

sub HelloWorldForm {
    my $this = shift;
    with(my $textBox1 = TextBox->new(),
         Text     => "Hello Windows Forms World",
         Location => Point->new(16, 24),
         Size     => Size->new(360, 20),
         TabIndex => 1);
    with(my $button1 = Button->new(),
         Text     => "Click Me!",
         Location => Point->new(256, 64),
         Size     => Size->new(120, 40),
         TabIndex => 2);
    $button1->add_Click(EventHandler->new($this, "button1_Click"));
    my $height = SystemInformation->CaptionHeight;
    with($this,
         Text              => "Hello Windows Forms World",
         AutoScaleBaseSize => Size->new(5, 13),
         ClientSize        => Size->new(392, 117),
         MinimumSize       => Size->new(392, int(117 + $height)),
         AcceptButton      => $button1,
         textBox1          => $textBox1);
    $this->{Controls}->Add($_) for $textBox1, $button1;
}

sub button1_Click {
    my($this, $sender, $evargs) = @_;
    MessageBox->Show("Text is: '$this->{textBox1}->{Text}'");
}
------------------------------
Date: Tue, 20 May 2008 22:34:53 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: script to find the files with very long names
Message-Id: <i8k634p52p650e4n1su69pdknbmbl6ime4@4ax.com>
Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote:
>[A complimentary Cc of this posting was NOT [per weedlist] sent to
>Jürgen Exner
><jurgenex@hotmail.com>], who wrote in article <d94434hooi2jh36ger39usqo47ro1vhtfo@4ax.com>:
>> use strict; use warnings;
>> use File::Find;
>> sub wanted{print "$_\n" if length>26;}
>> find(\&wanted, '.');
>
>Wrong. You do not want to use $_ there (or use 'nochdir').
Why? The docs say
"$_" [contains] the current filename
As far as I can tell that's exactly what the OP asked for: filenames
longer than 26 characters.
>Better, use
> pfind . "length > 26"
C:\tmp>perldoc -f pfind
No documentation for perl function `pfind' found
jue
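For anyone following along, the difference between the two readings of "long name" can be sketched as below. By default File::Find changes into each directory, so $_ holds the bare file name and $File::Find::name holds the path; with no_chdir => 1, $_ is the same as $File::Find::name. The helper names long_basenames and long_paths are invented for this illustration, and unlike the one-liner quoted above, the sketch reports plain files only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $MAX = 26;   # threshold used in the thread

# Files under $top whose bare name is longer than $max characters.
sub long_basenames {
    my ($top, $max) = @_;
    my @hits;
    find(sub {
        # Default mode: find() has chdir'ed here, so $_ is the bare name.
        push @hits, $File::Find::name if -f $_ && length($_) > $max;
    }, $top);
    @hits = sort @hits;
    return @hits;
}

# Files under $top whose whole path, as find() reports it, is longer than $max.
sub long_paths {
    my ($top, $max) = @_;
    my @hits;
    find({
        no_chdir => 1,
        wanted   => sub {
            # With no_chdir, $_ is the same string as $File::Find::name.
            push @hits, $_ if -f $_ && length($_) > $max;
        },
    }, $top);
    @hits = sort @hits;
    return @hits;
}

print "$_\n" for long_basenames('.', $MAX);
```

Which of the two is wanted depends on whether the limit applies to the file name alone or to the full path.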
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 1562
***************************************