[69280] in North American Network Operators' Group



RE: Spam with no purpose?

daemon@ATHENA.MIT.EDU (Paul Jakma)
Fri Apr 2 19:08:43 2004

Date: Sat, 3 Apr 2004 01:00:58 +0100 (IST)
From: Paul Jakma <paul@clubi.ie>
To: Michel Py <michel@arneill-py.sacramento.ca.us>
Cc: Deepak Jain <deepak@ai.net>, nanog@merit.edu
In-Reply-To: <DD7FE473A8C3C245ADA2A2FE1709D90B0DB04C@server2003.arneill-py.sacramento.ca.us>
Errors-To: owner-nanog-outgoing@merit.edu

On Wed, 31 Mar 2004, Michel Py wrote:

> 1. Reduce the efficiency of Bayesian-like filters: Trouble with this
> kind of email is that they are a) of sufficient length b) contain only
> "real" words c) contain none of the words regularly used by spammers
> such as the v. word.

Good Bayesian filters do not score on single words alone; they also
score on "phrases" (i.e. multiple words). Random strings of words will
result in neutral scores (presuming those words are also used in
non-spam), while the phrases will score slightly higher. Re-used
gibberish (i.e. apparently random) strings of words will result in
"phrases" from that gibberish having high scores.

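To illustrate the word-versus-phrase point, here is a rough sketch in
Python. It is not how any particular filter does it: the names
(features, PhraseFilter), the use of adjacent word pairs as "phrases",
and the add-one smoothing are all assumptions made for the example.

```python
from collections import Counter


def features(text):
    """Return single words plus adjacent word pairs (the "phrases")."""
    words = text.lower().split()
    pairs = [" ".join(p) for p in zip(words, words[1:])]
    return words + pairs


class PhraseFilter:
    """Hypothetical toy filter: per-feature spam probability only."""

    def __init__(self):
        self.spam = Counter()  # feature -> number of spam messages containing it
        self.ham = Counter()   # feature -> number of non-spam messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        counts = self.spam if is_spam else self.ham
        # Count each feature at most once per message.
        counts.update(set(features(text)))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def score(self, feature):
        """Smoothed estimate of P(spam | feature); 0.5 is neutral."""
        s = (self.spam[feature] + 1) / (self.n_spam + 2)
        h = (self.ham[feature] + 1) / (self.n_ham + 2)
        return s / (s + h)
```

Trained on three spams that all re-use "purple monkey dishwasher" and
three non-spams that use the same words apart from each other, the
single word "purple" scores a neutral 0.5 while the phrase
"purple monkey" scores about 0.8.
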
Also, a good Bayesian filter should regularly prune its database of
phrases (including one-word phrases) whose scores have not been
updated recently, further reducing "pollution" by random words and
phrases.
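
The pruning step could look something like the following. Again only a
sketch: the database layout (a dict mapping each phrase to a score and
a last-updated timestamp) and the function name prune are assumptions
for illustration, not any real filter's storage format.

```python
import time


def prune(db, max_age_seconds, now=None):
    """Drop phrases whose score was not updated within max_age_seconds.

    db maps phrase -> (score, last_updated_epoch_seconds); this layout
    is hypothetical. Returns the number of entries removed.
    """
    now = time.time() if now is None else now
    # Collect stale keys first so the dict is not modified while iterating.
    stale = [phrase for phrase, (_, updated) in db.items()
             if now - updated > max_age_seconds]
    for phrase in stale:
        del db[phrase]
    return len(stale)
```

A one-off random word that never recurs stops getting updates and ages
out, while phrases the spammers keep re-using are refreshed on every
hit and survive.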

Noise is just noise. The spam-specific stuff will still be
statistically significant, hopefully.

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
	warning: do not ever send email to spam@dishone.st
Fortune:
It's currently a problem of access to gigabits through punybaud.
-- J. C. R. Licklider

