[69244] in North American Network Operators' Group


RE: Spam with no purpose?

daemon@ATHENA.MIT.EDU (Michel Py)
Wed Mar 31 23:25:56 2004

Date: Wed, 31 Mar 2004 20:25:15 -0800
From: "Michel Py" <michel@arneill-py.sacramento.ca.us>
To: "Deepak Jain" <deepak@ai.net>, <nanog@merit.edu>
Errors-To: owner-nanog-outgoing@merit.edu

> Deepak Jain wrote:
> Can someone explain to me (publicly or privately) why someone
> would send spam with no product to sell, no position to pitch,
> nothing except text designed to get by a spam filter -- without
> even HTML to KNOW it got by a spam filter..

Likely two different goals here:

1. Reduce the efficiency of Bayesian-like filters. The trouble with this
kind of email is that it a) is of sufficient length, b) contains only
"real" words, and c) contains none of the words regularly used by
spammers, such as the v. word.

It's a lose-lose situation for the spam engine:

- If this message is marked as spam, it increases the likelihood of
false positives, as the message has a lot in common with real email on
spam-measuring metrics such as length, percentage of real words, etc.

- If this message is marked as legit, it reduces the catching ability
of the spam engines, as it shares patterns with a later spam that would
be essentially the same text altered to carry spam content.
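
A toy sketch of that dilemma, assuming a very crude per-token score
(share of spam messages containing a word versus share of legit ones).
The function name, the tiny corpora, and the scoring formula are all
made up for illustration and are not how any particular filter works:

```python
def token_spam_probability(token, spam_msgs, ham_msgs):
    """Crude estimate of P(spam | token) from per-class message frequencies.

    Hypothetical helper for illustration; real filters smooth and
    weight these counts in more careful ways.
    """
    spam_freq = sum(token in m for m in spam_msgs) / len(spam_msgs)
    ham_freq = sum(token in m for m in ham_msgs) / len(ham_msgs)
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen: no evidence either way
    return spam_freq / (spam_freq + ham_freq)


# Each message is reduced to its set of words (assumed toy data).
ham = [{"meeting", "schedule", "router"}, {"router", "outage", "schedule"}]
spam = [{"cheap", "pills", "offer"}, {"offer", "pills", "free"}]

# The "no purpose" message: real words only, none of the usual spam words.
salad = {"meeting", "router", "garden", "window"}

# Horn 1: train the word salad as spam, and an ordinary word gets spammier.
router_before = token_spam_probability("router", spam, ham)             # 0.0
router_if_spam = token_spam_probability("router", spam + [salad], ham)  # about 0.25

# Horn 2: train it as legit, and its filler words now look innocent,
# so a later spam padded with the same words is scored more kindly.
garden_before = token_spam_probability("garden", spam, ham)             # 0.5
garden_if_ham = token_spam_probability("garden", spam, ham + [salad])   # 0.0
```

Either way the training data gets worse: in the first case "router"
starts counting against real mail, in the second "garden" starts
counting in favor of whatever spam carries it.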

You can bet that it won't be long until we see messages that not only
stick to dictionary words, but are also constructed with valid grammar
(and still mean nothing). One of the next fronts in spam detection is
grammatical correctness. What we are looking at is
the eternal battle between the shield and the weapon: as soon as someone
invents a new shield, someone else develops a new weapon that will
pierce it.

2. It might be a statistical probe spam:
Spammer xyz has a list of 1 million addresses, out of which 500,000 are
invalid and bounce. By using a return address that is actually not
bogus, the spammer can indirectly measure the efficiency of Bayesian
outsmarting strategies.

First the spammer sends a spam that will be blocked by the majority of
spam-detection engines (even the dumber ones) by including correctly
spelled well-known spam words in both subject and text. You know, the
stuff that promises to put a foot in your pants that features
"always-on" service.

Let's say that our spammer gets 150,000 bounces out of this one. The
math is simple: out of 500,000 potential bounces they got only 150,000,
which means that 350,000 were blocked by spam engines before the
non-existent-user bounce could be generated.

Then, our spammer sends the kind of email you referred to to the same
list and measures the bounce rate one more time. If this time he gets
450,000 bounces, it means that only 50,000 out of the 500,000 potential
bounces have been blocked by spam engines, which in turn means that the
same email, slightly altered, will reach a large part of its targets.
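
The arithmetic of the two rounds can be written out as a small sketch.
The function names are made up for illustration, and it assumes, as the
example does, that every bounce that fails to come back was eaten by a
spam engine:

```python
def blocked_before_bounce(invalid_addresses: int, bounces_received: int) -> int:
    """Messages to invalid addresses that never produced a bounce.

    Assumption from the example: a missing bounce means a spam engine
    dropped the message before the non-existent-user check.
    """
    if not 0 <= bounces_received <= invalid_addresses:
        raise ValueError("bounces must be between 0 and the invalid count")
    return invalid_addresses - bounces_received


def block_rate(invalid_addresses: int, bounces_received: int) -> float:
    """Share of the potential bounces that were blocked instead."""
    return blocked_before_bounce(invalid_addresses, bounces_received) / invalid_addresses


INVALID = 500_000  # addresses on the list known to bounce

# Round 1: obvious spam, 150,000 bounces come back.
obvious_blocked = blocked_before_bounce(INVALID, 150_000)  # 350000
obvious_rate = block_rate(INVALID, 150_000)                # 0.7

# Round 2: the "no purpose" message, 450,000 bounces come back.
probe_blocked = blocked_before_bounce(INVALID, 450_000)    # 50000
probe_rate = block_rate(INVALID, 450_000)                  # 0.1
```

The drop from 70% blocked to 10% blocked is the number the spammer is
after.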

This is a simplified view, as bounce rate alone is not a valid
measurement of outsmarting strategies, but correlating two or three
metrics of that kind gives a reasonably precise picture of which
spamming techniques still work, and which have become a waste of
bandwidth.


When someone dies, it's a tragedy.
When millions die, it's a statistic.
    -- Josef Stalin --
