[24282] in Perl-Users-Digest


Perl-Users Digest, Issue: 6473 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Apr 26 21:10:41 2004

Date: Mon, 26 Apr 2004 18:10:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 26 Apr 2004     Volume: 10 Number: 6473

Today's topics:
    Re: RFC: Text similarity <tore@aursand.no>
    Re: RFC: Text similarity <tore@aursand.no>
    Re: sending data from one program to a perl prog <noreply@gunnar.cc>
        SSL.pm decryption failed or bad record mac (Asier)
        SV_TYPE_* constants? <fixerdave@hotmail.com>
    Re: textarea problem <robin @ infusedlight.net>
    Re: Tripod wont find my lib files <robin @ infusedlight.net>
    Re: Tripod wont find my lib files <invalid-email@rochester.rr.com>
        what I was doing (originally textarea problem) <robin @ infusedlight.net>
    Re: what I was doing (originally textarea problem) <matthew.garrish@sympatico.ca>
    Re: wow...jackpot. <robin @ infusedlight.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 27 Apr 2004 00:37:29 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: RFC: Text similarity
Message-Id: <pan.2004.04.26.22.36.19.167562@aursand.no>

On Fri, 23 Apr 2004 21:50:52 +0200, Michele Dondi wrote:
>> I have a large (more than 3,000 at the moment) set of documents in
>> various formats (mostly PDF and Word).  I need to create a sort of
>> (...) index of these documents based on their similarity.  I thought it
>> would be nice to gather some suggestions from the people in this group
>> before I proceeded.

> I know that this may seem naive, but in a popular science magazine I
> read that a paper has been published about a technique that indeed
> identifies the (natural) language some documents are written in by
> compressing (e.g. LZW) them along with some more text from samples taken
> from a bunch of different languages and comparing the different
> compressed sizes. You may try some variation on this scheme...
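
The compression idea quoted above could be sketched roughly as follows. This is only an illustration, not the method from the paper: it assumes zlib (DEFLATE) via Compress::Zlib in place of LZW, and the names compressed_size, extra_cost and guess_language are made up for the example. The texts must be byte strings, so Norwegian text would need encoding (e.g. to UTF-8) first.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress);

# Size in bytes of the zlib-compressed form of a byte string.
sub compressed_size {
    my ($text) = @_;
    return length compress($text);
}

# Extra compressed bytes needed when $sample is appended to $reference.
# A sample that resembles the reference should cost fewer extra bytes.
sub extra_cost {
    my ($reference, $sample) = @_;
    return compressed_size($reference . $sample) - compressed_size($reference);
}

# Return the label whose reference text makes $sample cheapest to compress.
sub guess_language {
    my ($sample, %reference_for) = @_;
    my ($best, $best_cost);
    for my $label (sort keys %reference_for) {
        my $cost = extra_cost($reference_for{$label}, $sample);
        if (!defined $best_cost || $cost < $best_cost) {
            ($best, $best_cost) = ($label, $cost);
        }
    }
    return $best;
}
```

With realistic reference texts of a few kilobytes per language, guess_language($text, no => $norwegian, en => $english) would return whichever label compresses the sample best; very short samples are likely to give unreliable answers.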

I really don't have the opportunity to categorize any of the documents;
everything must be 100% automatic, without human intervention.

I should also point out that the text is mainly in Norwegian, but there
might be occurrences of English text (as we're talking about technical
manuals).

> I for one would be interested in the results, BTW!

I will keep you updated! :)


-- 
Tore Aursand <tore@aursand.no>
"First, God created idiots. That was just for practice. Then He created
 school boards." (Mark Twain)


------------------------------

Date: Tue, 27 Apr 2004 00:37:29 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: RFC: Text similarity
Message-Id: <pan.2004.04.26.22.34.02.350587@aursand.no>

On Fri, 23 Apr 2004 11:46:44 -0400, James Willmore wrote:
>> First of all:  Converting the documents to a more sensible format (text
>> in my case) is not the problem.  The problem is the indexing and how to
>> store the data which represents the similarity between the documents.

> I'd use a database to store information about each document in.

That has already been taken care of;  I will use MySQL for this, and already
have a database up and running which holds meta information about
each document (title, description and where it is stored).

The next step will be to retrieve all the words from each document, remove
obvious stopwords, and then associate each document with its words (and
how many times each word appears in the document).
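
A minimal sketch of that counting step, under some assumptions: the stopword list here is a tiny made-up sample, word_counts is a hypothetical name, and the text is assumed to be already decoded into Perl characters so that \w matches letters such as "æ", "ø" and "å".

```perl
use strict;
use warnings;

# Hypothetical stopword list; a real one would be much longer.
my %is_stopword = map { $_ => 1 } qw(og i er det som en et the a of);

# Return a hash reference mapping each non-stopword (lowercased)
# to the number of times it appears in $text.
sub word_counts {
    my ($text) = @_;
    my %count;
    for my $word (lc($text) =~ /(\w+)/g) {
        next if $is_stopword{$word};
        $count{$word}++;
    }
    return \%count;
}
```

The resulting hash could then be stored as one row per (document, word, count) in the database.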

Based on this information I will create a script which tries to find
similar documents based on the associated words; if two documents share a
majority of their words, they are doomed to be similar. :)
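
One common way to turn such word counts into a score is cosine similarity. The sketch below is an assumption about how the comparison might be done, not the script described here; cosine_similarity is a hypothetical name and both arguments are hash references of word => count.

```perl
use strict;
use warnings;

# Cosine similarity between two word-count hashes:
# 0 means no words in common, 1 means identical proportions.
sub cosine_similarity {
    my ($x, $y) = @_;

    # Dot product over the words both documents contain.
    my $dot = 0;
    for my $word (keys %$x) {
        $dot += $x->{$word} * $y->{$word} if exists $y->{$word};
    }

    # Euclidean length of each count vector.
    my ($norm_x, $norm_y) = (0, 0);
    $norm_x += $_ ** 2 for values %$x;
    $norm_y += $_ ** 2 for values %$y;

    return 0 unless $norm_x && $norm_y;   # an empty document matches nothing
    return $dot / (sqrt($norm_x) * sqrt($norm_y));
}
```

Because frequent words dominate the score, the stopword removal mentioned above matters a good deal with this measure.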

The documents are in Norwegian, though, so I'm not able to rely on some of
the excellent Lingua- and Stem-modules out there.  I'm aware that there
are a few modules for the Norwegian language, too, but I'm not quite sure
about the quality of them (and if they rely too much on the Danish
language, which at least some of the modules do).

The whole application is - of course - split into more than one script:

  * Processing: Converting the documents to text, and converting the
    text into words (and how many times each word appears).
  * Inserting into the database.
  * Similarity checking: A script which checks every document in the
    database against all the other documents. Quite expensive, this one,
    but easily run around 5 in the morning when everyone is asleep. :)
  * Web frontend for querying the database (ie. selecting/reading the
    documents and letting the user choose to see related documents).

> There are Statistics modules as well.  You could perform tests against
> two documents and get a statistical correlation between the documents
> to see *how* similar they are.

Hmm.  Do you have any module names?  A "brief search" didn't yield any
useful hits.

> I'm rusty on Statistics 101, but my thinking is maybe using a t-test
> between the two documents might be the way to go.

I don't even know what a "t-test" is, but googling for "t-test" may give
me the answer...?  Or should I search for something else (specific)?

> Just my $0.02 :-)

Great!  Thanks a lot!


-- 
Tore Aursand <tore@aursand.no>
"Then there was the man who drowned crossing a stream with an average
 depth of six inches." (W.I.E. Gates)


------------------------------

Date: Tue, 27 Apr 2004 01:34:24 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: sending data from one program to a perl prog
Message-Id: <c6k6fc$d4e92$1@ID-184292.news.uni-berlin.de>

Gunnar Hjalmarsson wrote:
> 
>     my $msg;
>     my $maxsize = 131072;
>     unless ($ENV{CONTENT_LENGTH} > $maxsize) {
>         $msg = do {local $/; <STDIN>};

I'd better add that such a check of message size does not always work.
For instance, the sendmail configuration on my own brand new server is
so 'secure' that no %ENV variable at all is present when a process
is run as the mail program. Perl's stat() function, i.e.
(stat STDIN)[7], does not contain the size either...
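
When neither $ENV{CONTENT_LENGTH} nor stat() gives a size, one possible fallback is to cap the read itself. This is only a sketch; read_limited is a hypothetical helper and the 131072 limit is taken from the snippet quoted above.

```perl
use strict;
use warnings;

# Read everything from $fh, but give up (return undef) as soon as
# more than $maxsize bytes have arrived.
sub read_limited {
    my ($fh, $maxsize) = @_;
    my $msg = '';
    while (1) {
        # Ask for one byte more than allowed so oversize input is detectable.
        my $got = read($fh, $msg, $maxsize + 1 - length($msg), length($msg));
        die "read failed: $!" unless defined $got;
        last if $got == 0;                   # end of input
        return if length($msg) > $maxsize;   # too large
    }
    return $msg;
}
```

It could be called as my $msg = read_limited(\*STDIN, 131072); with an undefined result meaning the message was too big.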

Oh, well.

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: 26 Apr 2004 16:12:13 -0700
From: kasier@hispavista.com (Asier)
Subject: SSL.pm decryption failed or bad record mac
Message-Id: <8ae0c44.0404261512.1c6b2cec@posting.google.com>

Hi;

I have developed an SSL application in Perl: basically a server and
a client that exchange information. I have created my own certification
authority and a server key, which is certified.

I do not know whether this is the cause, but after some time, when
everything has gone fine sending and receiving information through
the socket and I try to send application data again, I lose the
socket.

Somebody suggested that I use SSLDUMP, but it does not give me much
information.

This is all I have:

My server's output;
*****************

Dealing with 2 clients
SSL read errorerror:1408F455:SSL routines:SSL3_GET_RECORD:decryption
failed or bad record mac
 at /usr/lib/perl5/site_perl/5.8.0/IO/Socket/SSL.pm line 480
CLIENT sudden close in 2nd stage
Dealing with 1 clients

SSLDUMP's output;
****************

New TCP connection #1: localhost.localdomain(40006) <->
localhost.localdomain(1002)
1 1  0.0039 (0.0039)  C>S SSLv2 compatible client hello
  Version 3.1
  cipher suites
  Unknown value 0x3a
  Unknown value 0x39
  Unknown value 0x38
  Unknown value 0x35
  Unknown value 0x34
  Unknown value 0x33
  Unknown value 0x32
  Unknown value 0x2f
  TLS_DHE_DSS_WITH_RC4_128_SHA
  TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA
  TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA
  TLS_RSA_WITH_3DES_EDE_CBC_SHA
  TLS_RSA_WITH_RC4_128_SHA
  TLS_RSA_WITH_RC4_128_MD5
  TLS_DH_anon_WITH_3DES_EDE_CBC_SHA
  TLS_DH_anon_WITH_RC4_128_MD5
  SSL2_CK_3DES
  SSL2_CK_RC2
  SSL2_CK_RC4
1 2  0.0049 (0.0009)  S>C  Handshake
      ServerHello
        Version 3.1
        session_id[32]=
          cf 83 c9 1d af 25 4d 44 1c 85 bc df f4 60 d9 04
          8c 1a 79 7d 2a 56 da 7d 18 d5 31 58 5c f0 42 26
        cipherSuite         Unknown value 0x35
        compressionMethod                   NULL
1 3  0.0049 (0.0000)  S>C  Handshake
      Certificate
1 4  0.0049 (0.0000)  S>C  Handshake
      ServerHelloDone
1 5  0.0231 (0.0182)  C>S  Handshake
      ClientKeyExchange
1 6  0.0231 (0.0000)  C>S  ChangeCipherSpec
1 7  0.0231 (0.0000)  C>S  Handshake
1 8  0.0279 (0.0047)  S>C  ChangeCipherSpec
1 9  0.0279 (0.0000)  S>C  Handshake
1 10 13.7405 (13.7126)  C>S  application_data
1 11 13.7426 (0.0020)  S>C  application_data
1 12 13.7487 (0.0061)  C>S  application_data
1 13 78.1044 (64.3557)  C>S  application_data
1 14 78.1051 (0.0007)  S>C  Alert
1 15 78.1120 (0.0068)  C>S  Alert
1 16 78.1423 (0.0303)  C>S  application_data
1 17 78.1423 (0.0000)  C>S  Alert
1 18 78.1423 (0.0000)  C>S  application_data
1 19 103.2412 (25.0989)  C>S  application_data


Suggestions, ideas? I am lost.
Does this have something to do with Kerberos? (I have heard about some
problems with Red Hat 9, my current Linux box.)
Thank you very much in advance.


------------------------------

Date: Mon, 26 Apr 2004 15:21:54 -0700
From: "FixerDave" <fixerdave@hotmail.com>
Subject: SV_TYPE_* constants?
Message-Id: <408d8b81$1@obsidian.gov.bc.ca>

Hi,

Is anyone willing to point me in the right direction for finding information
on "SV_TYPE_* constants"?  The docs for Win32::NetAdmin say "For flags, see
SV_TYPE_* constants," with no other mention of them.  I've been running
around in circles trying to figure out what these are.

Actually, all I really want is to check ONE stupid little checkbox in NT's
User Manager for "User must change password at next logon".  Everything else
works fine...  Oh, I'm trying to avoid using Win32::AdminMisc as I'm on
ActiveState's 5.8.0 build.  I'm hoping this flag thing will let me make the
setting I need.

    David...




------------------------------

Date: Mon, 26 Apr 2004 16:39:13 -0800
From: "Robin" <robin @ infusedlight.net>
Subject: Re: textarea problem
Message-Id: <c6k32d$t7i$1@news.f.de.plusline.net>

I figured it out, sorry for not including a working script.
-Robin





------------------------------

Date: Mon, 26 Apr 2004 17:14:46 -0800
From: "Robin" <robin @ infusedlight.net>
Subject: Re: Tripod wont find my lib files
Message-Id: <c6k5ih$uhj$1@news.f.de.plusline.net>


"javatiger" <tigermott@yahoo.com> wrote in message
news:962e2a34.0404251428.28daf177@posting.google.com...
> Every time I upload a library file into my cgi bin, tripod refuses to
> find it and comes up with an error.
>
> It looks like there was an error:
>
> Your script produced this error:
> Can't locate File/Glob.pm in @INC (@INC contains: . / /lib /site_perl)
> at msg.pl line 8.
> BEGIN failed--compilation aborted at msg.pl line 8.
>
> I get usual scripts to work but just can't get the lib ones too.
>
> Is there something I need to be doing?
>
> Cheers

try www.free-webhosts.com - and specify the full path to your library,
you'll probably have to find out from tripod. It might be something like
/usr/home/myuser or something.
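
One way to point Perl at a library directory without hard-coding the host's path is sketched below. It assumes a hypothetical layout with a lib/ directory next to the script, and it assumes the host lets FindBin work out the script's location; neither is known for Tripod.

```perl
use strict;
use warnings;

# FindBin records the directory the running script lives in.
use FindBin qw($Bin);

# Add the (assumed) lib/ directory beside the script to @INC,
# so that require and use look there first.
use lib "$Bin/lib";
```

If the absolute path is known, use lib '/full/path/to/lib'; with the literal path does the same thing.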
-Robin





------------------------------

Date: Mon, 26 Apr 2004 23:29:03 GMT
From: Bob Walton <invalid-email@rochester.rr.com>
Subject: Re: Tripod wont find my lib files
Message-Id: <408D9B0D.4030701@rochester.rr.com>

javatiger wrote:

> What's "tripod"?
> 
> It's a free webhosting site with CGI, http://www.tripod.lycos.com/
> 
> This is the script that I have loaded into the cgi bin
> -------------------------------------------------------------------
> #!/usr/bin/perl


You are missing:

     use warnings; #if your Perl is too old to support this, add -w
                   #to the #! line above
     use strict;
     use CGI; #when you are dealing with a CGI script, you should
              #really make use of the CGI module.  Not to do so is
              #to invite gobs of errors and make lots and lots of
              #additional work for you.


> require "formparser.lib"; &parseform;

----------------------------^
Minor pick:  You should only use & in front of a sub call if you
want the behaviors it generates.  You don't, so use:

     parseform();

instead.  And, of course, you should really be using the CGI
module to "parse your form".

    my %formdata; #used as a global in your sub below


> $txt = $formdata{'msg'};

my $txt = ...


> $name = $formdata{'from'};

   my $name = ...


> open( TXT, ">>messages.txt" );                        
> print TXT "Message: $txt - From $name \n";                     
> close(TXT);
> open( DATA, "<messages.txt" );

--------^^^^
This is a special filehandle in Perl.  You can use it as
you did, but you should really choose a different name.


> @data = < DATA >;

-----------^----^
   my @data = <DATA>;

Get rid of the spaces in the <...> operator.  That
is the source of the call to module File::Glob
which isn't being found.  Read the docs on the
<...> operator very carefully to see why this is
the case:

     perldoc perlop

particularly the section on I/O operators.
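
The difference can be seen in a small self-contained sketch. The in-memory filehandle and the variable names are assumptions made for the example; they stand in for messages.txt.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for messages.txt.
my $contents = "first\nsecond\n";
open(my $in, '<', \$contents) or die "Cannot open: $!";

# No spaces inside the brackets: readline, one element per line.
my @lines = <$in>;
close($in);

# Spaces inside the brackets: this is glob(), not readline.
# The pattern has no wildcard, so the word itself comes back
# rather than any file contents.
my @globbed = < DATA >;

print scalar(@lines), " lines read; glob returned: @globbed\n";
```

This is why the spaced form drags in File::Glob: perl treats it as a filename pattern and never touches the filehandle.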


> close( DATA );
> print "Content-type:text/html\n\n";               
> foreach $item(@data){ print "<li>$item"; } 

           my $item(...


> 
> ------------------------------------------------------------------------
> and ive uploaded the formparser lib
> -------------------------------------------------------------------------
> sub parseform
> {                                    


     my @pairs;


>   if( $ENV{'REQUEST_METHOD'} eq 'GET' ) 
>   {	@pairs = split( /&/, $ENV{'QUERY_STRING'} ); }   
>   elsif( $ENV{'REQUEST_METHOD'} eq 'POST' ) 
>   {                                                   
> 	read( STDIN, $buffer, $ENV{'CONTENT_LENGTH'} );
> 	@pairs = split( /&/, $buffer );                          	
> 	if( $ENV{'QUERY_STRING'} ) 
> 	{ @getpairs = split( /&/, $ENV{'QUERY_STRING'} );

           my @getpairs = ...


> 	  push( @pairs, @getpairs ); }
>   }                                                 
>   else 
>   {                                               
>     print "Content-type:text/html\n\n";            
>     print "Unrecognized Method - Use GET or POST.";
>   }
>   foreach $pair( @pairs ) 

             my $pair ...


>   {                                           
>     ( $key, $value ) = split( /=/, $pair );

       my($key,$value) = ...


>     $key =~ tr/+/ /;	
>     $value =~ tr/+/ /;                                   
>     $key =~ s/%(..)/pack("c", hex($1))/eg;
>     $value =~ s/%(..)/pack("c", hex($1))/eg;         
>     $value =~ s/<!--(.|\n)*-->//g;  		# ignore SSI
>     if( $formdata{$key} ){$formdata{$key} .= ", $value";}
>     else{ $formdata{$key} = $value; }                         
>   }                                                           }	
> 1;
> ---------------------------------------------------------------------
> 
> But it throws the error.
> 
> Could you edit what need to be changed please.
> 
> ---------------------------------------------------------------------
> 
> Bob Walton <invalid-email@rochester.rr.com> wrote in message news:<408C4F17.6030702@rochester.rr.com>...
> 
 ...
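
For illustration only, here is roughly what the quoted parser is trying to do, written so that it runs under strict and warnings. The CGI module remains the better choice; parse_query is a hypothetical name, and chr(hex(...)) with a stricter pattern is swapped in for the quoted pack("c", ...).

```perl
use strict;
use warnings;

# Turn "a=1&b=two+words" into a hash; repeated keys are joined
# with ", " as in the quoted code.
sub parse_query {
    my ($query) = @_;
    my %formdata;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                               # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # decode %XX escapes
        }
        if (exists $formdata{$key}) {
            $formdata{$key} .= ", $value";
        }
        else {
            $formdata{$key} = $value;
        }
    }
    return %formdata;
}
```

A call such as my %formdata = parse_query($ENV{QUERY_STRING} || ''); would replace the global that the quoted sub fills in.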

-- 
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl



------------------------------

Date: Mon, 26 Apr 2004 17:21:41 -0800
From: "Robin" <robin @ infusedlight.net>
Subject: what I was doing (originally textarea problem)
Message-Id: <c6k5ii$uhj$2@news.f.de.plusline.net>


"Robin" <robin @ infusedlight.net> wrote in message
news:c6jrk6$ov7$1@news.f.de.plusline.net...
> I have a script that displays a textarea, and when the user submits
> something it sends the textarea contents back to the browser. Why is it
> that when I press the return key within the textarea and type more text
> below the carriage return, the text below it is not displayed? I
> don't have any special parse code; I'm using CGI.pm.


I'm posting this so users won't get confused about textareas and CGI.

It was a problem with the script, not with the textarea or the
parse code. I was basically writing the textarea contents to a text
file, preceded by a number, then a special string (to
split on), then an identifier (post or comment). The problem was that
whenever the textarea input included a newline, it
broke the way the file was parsed and printed later on. I finally
figured out that all I had to do was replace the newlines
with "<br>" and remove the carriage returns, so that there is a "<br>"
wherever someone entered a newline in the
textarea. The code is as follows.
  $posttext = param ('posttext');
  $posttext =~ s/\r//g;      # remove every carriage return
  $posttext =~ s/\n/<br>/g;  # replace every newline with "<br>"
  #and then I printed $posttext - and it worked
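
A slightly more general sketch of the same idea is shown below. It assumes the browser may send CRLF, a bare CR or a bare LF as the line ending, and newlines_to_br is a hypothetical helper name.

```perl
use strict;
use warnings;

# Replace every line ending (CRLF, CR or LF) with a single "<br>",
# so the text fits on one line of the data file.
sub newlines_to_br {
    my ($text) = @_;
    $text =~ s/\r\n|\r|\n/<br>/g;
    return $text;
}
```

Trying "\r\n" first in the alternation keeps a CRLF pair from turning into two "<br>" tags.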

I'm stupid.

--
Regards,
-Robin
--
[ webmaster @ infusedlight.net ]
www.infusedlight.net




------------------------------

Date: Mon, 26 Apr 2004 20:07:42 -0400
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: what I was doing (originally textarea problem)
Message-Id: <bvhjc.18190$OU.359798@news20.bellglobal.com>


"Robin" <robin @ infusedlight.net> wrote in message
news:c6k5ii$uhj$2@news.f.de.plusline.net...
>
> I'm stupid.
>

No need to state the obvious when posting here...

Matt




------------------------------

Date: Mon, 26 Apr 2004 17:01:06 -0800
From: "Robin" <robin @ infusedlight.net>
Subject: Re: wow...jackpot.
Message-Id: <c6k4kg$u6e$2@news.f.de.plusline.net>


"Tore Aursand" <tore@aursand.no> wrote in message
news:pan.2004.04.14.04.05.25.380100@aursand.no...
> On Tue, 13 Apr 2004 20:33:50 -0700, Robin wrote:
> > http://sunsite.iisc.ernet.in/virlib/cgi/week/ewtoc.html
> > I suppose the editors know about this?
>
> I don't know, but I don't think they would bother, either.  The book is
> from 1996, and - therefore - quite outdated.
>
> And:  Every book which has a title that promises you to learn something in
> x weeks, days or even hours is lying.

too true..hehe...





------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6473
***************************************
