[17373] in Perl-Users-Digest

Perl-Users Digest, Issue: 4795 Volume: 9

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Nov 2 14:15:48 2000

Date: Thu, 2 Nov 2000 11:15:17 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <973192517-v9-i4795@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Thu, 2 Nov 2000     Volume: 9 Number: 4795

Today's topics:
    Re: Want to process all files less than 24 hours old (Jerome O'Neil)
    Re: Want to process all files less than 24 hours old (Jerome O'Neil)
    Re: Want to process all files less than 24 hours old (Tom Christiansen)
    Re: Want to process all files less than 24 hours old <bh_ent@my-deja.com>
    Re: Want to process all files less than 24 hours old <bart.lateur@skynet.be>
    Re: Want to process all files less than 24 hours old <mark-lists@webstylists.com>
    Re: Want to process all files less than 24 hours old <mark-lists@webstylists.com>
        Win32::OLE with office2000 <lukus@hongkong.com>
        Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 02 Nov 2000 17:58:19 GMT
From: jerome@activeindexing.com (Jerome O'Neil)
Subject: Re: Want to process all files less than 24 hours old
Message-Id: <%WhM5.814$3T.123631@news.uswest.net>

tadmc@metronet.com (Tad McClellan) elucidates:

>>I have the idea that there's an entire
>>online collection of documentation that I'm missing if I could ever
>>find it.
> 
> 
> Ah hah!
> 
> My PSI::ESP module works like a well-oiled machine!

Damn!  You are good.

-- 
"Civilization rests on two things: the discovery that fermentation 
produces alcohol, and the voluntary ability to inhibit defecation.  
And I put it to you, where would this splendid civilization be without 
both?" --Robertson Davies "The Rebel Angels" 


------------------------------

Date: Thu, 02 Nov 2000 18:09:27 GMT
From: jerome@activeindexing.com (Jerome O'Neil)
Subject: Re: Want to process all files less than 24 hours old
Message-Id: <r5iM5.831$3T.128256@news.uswest.net>

Mark Thompson <mark-lists@webstylists.com> elucidates:

>>I think once we know that, you'll be much enlightened.
> 
> The only thing I could find on the subject in the documentation was
> what's at:
> 
> 	http://search.cpan.org/doc/JHI/perl-5.7.0/lib/File/Find.pm
> 
> This lets me know that there has to be way more than I can see here
> (by mentioning things such as Find::File::name), but it doesn't point
> me in the right direction.   I have the idea that there's an entire
> online collection of documentation that I'm missing if I could ever
> find it.

There *is* an entire online collection of documentation that
you're missing, and it's most likely right on your disk.  While some find
the documentation hard to navigate, once you are familiar with its layout,
I think you will find it quite thorough.

But let's go back to File::Find.  To use it properly, you should
understand file tests and references, as you need to pass a code
reference to the find function.

Do you feel you have a good grasp on references and file tests?

Try this, and see if it helps you grok the solution.

Substitute a known directory for /home/foo.

perl -MFile::Find -e 'find(sub {print $File::Find::name,"\n"}, "/home/foo")'
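
As a minimal sketch of those two ideas together, assuming a starting
directory given on the command line (defaulting to '.'), the script below
passes find a reference to a named sub and uses the -f and -M file tests
inside it.  The names wanted and @recent are illustrative choices only.

```perl
#!/usr/bin/perl -w
use strict;
use File::Find;

# Assumed starting directory: first argument, else the current directory.
my $dir = @ARGV ? shift : '.';

my @recent;    # full paths of plain files modified within the last day

# A named sub; \&wanted below takes a code reference to it.
sub wanted {
    return unless -f $_;       # file test: plain files only
    return unless -M _ < 1;    # -M is age in days; _ reuses the last stat
    push @recent, $File::Find::name;
}

find(\&wanted, $dir);
print "$_\n" for @recent;
```

The \&wanted syntax is what makes this a code reference; writing
wanted() there instead would call the sub once and pass its return value.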







> 
> Thanks,
> 
> Mark

-- 
"Civilization rests on two things: the discovery that fermentation 
produces alcohol, and the voluntary ability to inhibit defecation.  
And I put it to you, where would this splendid civilization be without 
both?" --Robertson Davies "The Rebel Angels" 


------------------------------

Date: 2 Nov 2000 11:16:26 -0700
From: tchrist@perl.com (Tom Christiansen)
Subject: Re: Want to process all files less than 24 hours old
Message-Id: <3a01af7a@cs.colorado.edu>

In article <p9p10tk2sp6s2q8poklv74onp8afh910vh@4ax.com>,
Mark Thompson  <mark-lists@webstylists.com> wrote:
>The only thing I could find on the subject in the documentation was
>what's at:
>
>	http://search.cpan.org/doc/JHI/perl-5.7.0/lib/File/Find.pm


Huh?  What are you doing over there?  That's not on your system.
YOU HAVE THE PERL DOCUMENTATION.  

There *is* documentation you might not have, like

    =head1 File::Find

	use File::Find;

	# Print out all directories below current one.
	find sub { print "$File::Find::name\n" if -d }, ".";

	# Compute total space used by all files in listed directories.
	@dirs = @ARGV ? @ARGV : ('.');
	my $sum = 0;
	find sub { $sum += -s }, @dirs;
	print "@dirs contained $sum bytes\n";

	# Alter default behavior to go through symlinks
	# and visit sub-directories first.
	find { wanted => \&myfunc, follow => 1, bydepth => 1 }, ".";

    The C<File::Find> module's C<find> function recursively descends
    directories.  Its first argument should be a reference to a
    function, and all following arguments should be directories.
    The function is called on each filename from the listed
    directories.  Within that function, the C<$_> variable is set
    to the basename of the current filename visited, and the process's
    current working directory is by default set to that directory.
    The package variable C<$File::Find::name> is the full pathname
    of the visited filename.  An alternative calling convention
    takes as its first argument a reference to a hash containing
    option specifications, including "C<wanted>", "C<bydepth>",
    "C<follow>", "C<follow_fast>", "C<follow_skip>", "C<no_chdir>",
    "C<untaint>", "C<untaint_pattern>", and "C<untaint_skip>", as
    fully explained in the online documentation.  This module is
    also used by the standard I<find2perl>(1) translator program
    that comes with Perl.  See the online docs for more on that.

But even still, you should
notice the stressed importance of the online (which means "on the
computer, not in hardcopy") documentation.

--tom

    =head1 Processing All Files in a Directory Recursively

    =head2 Problem

    You want to do something to each file and subdirectory in a particular
    directory.

    =head2 Solution

    Use the standard File::Find module.

	use File::Find;
	sub process_file {
	    # do whatever;
	}
	find(\&process_file, @DIRLIST);

    =head2 Discussion

    File::Find provides a convenient way to process a directory
    recursively.  It takes care of the directory scans and recursion
    for you.  All you do is pass C<find> a code reference and a
    list of directories.  For each file in those directories,
    recursively, C<find> calls your function.

    Before calling your function, C<find> changes directory to the
    directory being visited, whose path relative to the starting
    directory is stored in the C<$File::Find::dir> variable.  $_
    is set to the basename of the file being visited, and the full
    path of that file can be found in C<$File::Find::name>.  Your
    code can set C<$File::Find::prune> to true, to tell C<find> not
    to descend into the directory just seen.
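
A minimal sketch of pruning, assuming you want to skip any directory
named CVS (a hypothetical choice; substitute whatever you need to skip).
The names skip_cvs and @seen are illustrative only.

```perl
use strict;
use File::Find;

my @seen;    # every path visited outside the pruned directories

sub skip_cvs {
    # Hypothetical rule: never descend into a directory called CVS.
    if (-d $_ && $_ eq 'CVS') {
        $File::Find::prune = 1;    # tell find not to look inside it
        return;
    }
    push @seen, $File::Find::name;
}

find(\&skip_cvs, @ARGV ? @ARGV : '.');
print "$_\n" for @seen;
```

Note that pruning has no effect when the bydepth option is in use,
because by then the directory's contents have already been visited.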

    This simple example shows how to use File::Find.  We give C<find>
    an anonymous code subroutine that prints the name of each file
    visited and adds a C</> to the names of directories:

	@ARGV = qw(.) unless @ARGV;
	use File::Find;
	find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV;

    This prints a C</> after directory names using the B<-d> file test
    operator, which returns the empty string C<''> if it fails.

    The following program prints the total size of everything in the
    given directories.  It gives C<find> an anonymous subroutine to keep a
    running sum of the size of each file [FOOTNOTE: That includes
    all inode types, including the sizes of directories and symbolic
    links, not just regular files.] it visits.  Once the C<find>
    function returns, the accumulated sum is displayed.

	use File::Find;
	@ARGV = ('.') unless @ARGV;
	my $sum = 0;
	find sub { $sum += -s }, @ARGV;
	print "@ARGV contains $sum bytes\n";

    This code finds the largest single file within a given set of
    directories:

	use File::Find;
	@ARGV = ('.') unless @ARGV;
	my ($saved_size, $saved_name) = (-1, '');
	sub biggest {
	    return unless -f && -s _ > $saved_size;
	    $saved_size = -s _;
	    $saved_name = $File::Find::name;
	}
	find(\&biggest, @ARGV);
	print "Biggest file $saved_name in @ARGV is $saved_size bytes long.\n";

    We use $saved_size and $saved_name to keep track of the name
    and the size of the largest file visited.  If we find a file
    bigger than the largest so far seen, we replace the saved name
    and size with the current ones.  When the C<find> is done
    running, the largest file and its size are printed out, rather
    verbosely.  A more general tool would probably just print the
    file name, its size, or both.  This time we used a named function
    rather than an anonymous one because the function was getting
    a bit big.

    It's a simple matter to change this to find the most recently changed
    file:

	use File::Find;
	@ARGV = ('.') unless @ARGV;
	my ($age, $name);
	sub youngest {
	    # $age holds an mtime in epoch seconds, so compare it with mtime, not -M (days)
	    return if defined $age && $age > (stat($_))[9];
	    $age = (stat(_))[9];
	    $name = $File::Find::name;
	}
	find(\&youngest, @ARGV);
	print "$name " . scalar(localtime($age)) . "\n";

    The File::Find module doesn't export its $name variable, so you
    must always refer to it by its fully qualified name.  The final
    example is more a demonstration of namespace munging than of
    recursive directory traversal, although it does serve to find
    all the directories.  It makes $name in our current package an
    alias for the one in File::Find, which is essentially identical
    to the way Exporter works.  Then it declares its own version
    of find(), this one with a prototype that lets it be called
    more like C<grep> or C<map>.

	#!/usr/bin/perl -lw
	# fdirs - find all directories
	@ARGV = qw(.) unless @ARGV;
	use File::Find ();
	sub find(&@) { &File::Find::find }
	*name = *File::Find::name;
	find { print $name if -d } @ARGV;

    Our C<find> does nothing but call the C<find> in File::Find,
    which we were careful not to import by specifying an C<()> empty
    list in the C<use> statement. Now rather than writing this:

	find sub { print $File::Find::name if -d }, @ARGV;

    We can write the more pleasant:

	find { print $name if -d } @ARGV;

    =head2 See Also

    The File::Find and the Exporter modules.



------------------------------

Date: Thu, 02 Nov 2000 18:35:11 GMT
From: Drew Myers <bh_ent@my-deja.com>
Subject: Re: Want to process all files less than 24 hours old
Message-Id: <8tsc4v$h3g$1@nnrp1.deja.com>

In article <3a01073d$1@cs.colorado.edu>,
  tchrist@perl.com (Tom Christiansen) wrote:


> Time to post perlrtfm again.
>
> --tom
>
Tom,

Please do!


--
Drew Myers
perotsystems


Sent via Deja.com http://www.deja.com/
Before you buy.


------------------------------

Date: Thu, 02 Nov 2000 18:50:32 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: Want to process all files less than 24 hours old
Message-Id: <led30t428gg0jj945mdi5p2pegqqc8csaj@4ax.com>

Mark Thompson wrote:

>I'm wondering how would be the best way to go about processing all
>files in a directory that matches a certain extension that have been
>modified within the past 24 hours.
>
>The way I'm thinking about doing this is cycling through all of the files
>in the directory, determining whether the extension of the files is .d
>and then determining whether performing a -M on the file gets a result
>of less than 1 and if it passes all of these tests, it will process
>the file.
>
>Any thoughts on whether there's a better way of doing this?

Basic intro to File::Find :

 * the first parameter is a reference to a sub, which will be called for
each directory and each file, including '.' on the root directory
 * the other parameters are any directories and/or files you want to
scan.

For each file/directory:

 * the current directory will be the directory the file/directory is in;
 * $_ will be set to the file/directory name, or to '.';
 * $File::Find::name will contain the full path, which will be an
absolute path only if the original path was an absolute path.

So: -d $_ will work.

What else? Ah, yes: for a directory, you can set $File::Find::prune to
true, in order to prevent scanning the contents of this directory.

Example:
	use File::Find;
	find sub {
	    -f or return; # only plain files
	    print "$File::Find::name\n" if -M _ < 1;
	}, $dir1, $dir2;

Provided that those directories $dir1 and $dir2 are set, this will list
all files that are under a day old.
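
To tie this back to the original question, here is a sketch that also
filters on the .d extension.  It assumes the directories come from the
command line (defaulting to '.'); $recent_d and @matches are names made
up for the example.

```perl
use strict;
use File::Find;

my @dirs = @ARGV ? @ARGV : ('.');
my @matches;    # full paths of *.d files modified within the last day

my $recent_d = sub {
    return unless /\.d$/;    # extension from the original question
    return unless -f $_;     # only plain files
    push @matches, $File::Find::name if -M _ < 1;
};

find($recent_d, @dirs);
print "$_\n" for @matches;
```

Checking the name before the file tests avoids a stat call on every
file that has the wrong extension anyway.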

-- 
	Bart.


------------------------------

Date: Thu, 02 Nov 2000 10:54:45 -0800
From: Mark Thompson <mark-lists@webstylists.com>
Subject: Re: Want to process all files less than 24 hours old
Message-Id: <dqd30tsss8m4onlt1r9698lso8ngtmi1ac@4ax.com>



On 2 Nov 2000 11:16:26 -0700, tchrist@perl.com (Tom Christiansen)
wrote:

>In article <p9p10tk2sp6s2q8poklv74onp8afh910vh@4ax.com>,
>Mark Thompson  <mark-lists@webstylists.com> wrote:
>>The only thing I could find on the subject in the documentation was
>>what's at:
>>
>>	http://search.cpan.org/doc/JHI/perl-5.7.0/lib/File/Find.pm
>
>
>Huh?  What are you doing over there?  That's not on your system.
>YOU HAVE THE PERL DOCUMENTATION.  
>
>There *is* documentation you might not have, like

ok ok ok, I found it :)

I've gotten so used to finding everything on the web and getting
examples on Usenet that I didn't even realize how much documentation
I could get with the perldoc command on my local machine.

Thanks for the help!

Mark




------------------------------

Date: Thu, 02 Nov 2000 11:03:21 -0800
From: Mark Thompson <mark-lists@webstylists.com>
Subject: Re: Want to process all files less than 24 hours old
Message-Id: <t4e30t41pgl30fgn8trif9spklekr1dkcc@4ax.com>

On Wed, 01 Nov 2000 13:43:41 -0800, Mark Thompson
<mark-lists@webstylists.com> wrote:

>Hi,
>
>I'm wondering how would be the best way to go about processing all
>files in a directory that matches a certain extension that have been
>modified within the past 24 hours.
>
>The way I'm thinking about doing this is cycling through all of the files
>in the directory, determining whether the extension of the files is .d
>and then determining whether performing a -M on the file gets a result
>of less than 1 and if it passes all of these tests, it will process
>the file.
>
>Any thoughts on whether there's a better way of doing this?  What I'm
>basically trying to do is determine what files appear in my logs
>directory in the past 24 hours.

I finally realized that my goal was to process every log file that
moved into our archive directory within the past 24 hours.  I found
that even though -C doesn't show file creation time, it does let me
know, assuming files aren't changed within that directory, how long ago
a file appeared in that directory.
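
For anyone comparing the tests, a small sketch of the difference: -M is
days since last modification, -A days since last access, and -C days
since the inode last changed (on Unix, not a creation time).  The helper
name ages is made up for this example, and the paths are assumed to come
from the command line.

```perl
use strict;

# Return the (-M, -A, -C) ages of a path in days, relative to script
# start time ($^T), or an empty list if the path does not exist.
sub ages {
    my ($path) = @_;
    return unless -e $path;    # one stat call; _ reuses its result below
    return (-M _, -A _, -C _);
}

for my $path (@ARGV) {
    my ($mod, $acc, $chg) = ages($path) or next;
    printf "%s: modified %.2f, accessed %.2f, inode changed %.2f days ago\n",
        $path, $mod, $acc, $chg;
}
```

A file that was written long ago but moved or renamed today will
typically show a large -M and a small -C, which is why -C works for
spotting recent arrivals.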

Anyway, for anyone who wants a code sample (it's kind of ugly), here is
what I've done.  Feel free to comment on it if you're bored :) I've
created a program that takes a data file that has a field that
contains the contents of a unique cookie.  I also take the results of
a small log file that shows what files the machine with that cookie
has downloaded.  I then print out a report of all users who downloaded
PDF files and what PDF files they downloaded. Like I said, it's ugly
but I haven't cleaned it up all pretty yet (yeah, I should do it now
but I'll have to get back to it):

#!/usr/local/bin/perl5

$logfiledir = '/u/logs/companyname';
$databasefile = '/u/companyname/data/download_req.txt';

use File::Find;

readdatabasefile();
find(\&wanted, $logfiledir);
outputreport();

sub readdatabasefile
{
    open(DBFILE, $databasefile) or die "Can't open $databasefile: $!";
    while(<DBFILE>)
    {
        chomp;
        ($dataitem, $rest) = split(/\t/, $_, 2);
        
        $data{$dataitem} = $_;
        
    }
    close (DBFILE);
}
    

sub wanted 
{ 
    if (-f && (-C ($_) < 1) && /\.d$/)
    {

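        # $File::Find::name is usable here because $logfiledir is absolute
        open(LOGFILE, $File::Find::name)
            or die "Can't open $File::Find::name: $!";
        # Read into a lexical so find's own $_ is not clobbered
        while(my $line = <LOGFILE>)
        {
            chomp $line;
            @logitem = split(/\t/, $line, 4); 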
            $time = $logitem[0];
            $cookie = $logitem[1];
            $file = $logitem[2];
            @cookies = split(/;/, $cookie);
            @cookies = trim(@cookies);
            $count = 1;
            if ($file =~ /pdf$/i)
            {
	        while ($count <= @cookies) 
	        {
	            # Test the match itself; $1 would be stale after a failed match
	            if ($cookies[$count - 1] =~ /ManualDownloadCode=(.+)/)
	            {     
	                $manualcode = $1;
	                $log{$manualcode}{$file} = 1;
	            }  
	            $count++;
	        }
	    }
        }
        close(LOGFILE);
    }
}

sub outputreport
{

    # print the whole thing
    foreach $user ( keys %log ) 
    {
        @dataarray = split("\t",$data{$user});
        for ($i = 0; $i < @dataarray; $i++)
        {
            if (length($dataarray[$i]))
            {
                print "$dataarray[$i]\n"
            }
        }
    
    
        print "Files Downloaded:\n";
        for $filedl ( keys %{ $log{$user} } ) 
        {
            print "  $filedl\n";
        }
        print "\n";
    }
}


sub trim {
    my @out = @_;
    for (@out) {
        s/^\s+//;
        s/\s+$//;
    }
    return wantarray ? @out : $out[0];
}


------------------------------

Date: Thu, 2 Nov 2000 22:40:16 +0800
From: "Lucas Gump" <lukus@hongkong.com>
Subject: Win32::OLE with office2000
Message-Id: <8truap$1r51@imsp212.netvigator.com>

Does Win32::OLE not support Office 2000?




------------------------------

Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address; I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V9 Issue 4795
**************************************

