Tu Ouyang, Soumya Ray, Mark Allman, Michael Rabinovich. A Large-Scale Empirical Analysis of Email Spam Detection Through Transport-Level Characteristics. Technical Report 10-001, International Computer Science Institute, January 2010.
Abstract:
Spam is a never-ending issue that constantly consumes resources to no
useful end. In this paper we evaluate the efficacy of using a machine
learning-based model of the transport layer characteristics of email
traffic to identify spam. The underlying idea is that the manner in
which spam is transmitted has an impact that is statistically observable
in the traffic (e.g., in the network round-trip time or jitter between
packets). Therefore, by identifying a solid set of traffic features we
can construct a model that can identify spam without relying on
expensive content filtering. We carry out a large-scale empirical
analysis of this idea with data collected over the course of one year
(roughly 600K messages). With this data, we train classifiers using
machine learning methods and test several hypotheses. First, we validate
prior results using similar techniques. Second, we determine which
transport characteristics contribute most significantly to the detection
process. Third, we analyze the behavior of our detectors over weekly
and monthly intervals and in the presence of major network events.
Finally, we evaluate the behavior of our detectors in a practical
setting where they are used in a filtering pipeline along with standard
off-the-shelf content filtering methods, and demonstrate that they can
lead to computational savings in practice.
BibTeX:
@techreport{ORAR10,
author = "Tu Ouyang and Soumya Ray and Mark Allman and Michael Rabinovich",
title = "{A Large-Scale Empirical Analysis of Email Spam Detection Through Transport-Level Characteristics}",
institution = "International Computer Science Institute",
year = 2010,
number = "10-001",
month = jan,
}