Bayesian Spam Filter :: POPFile Automatic Email Sorting


POPFile Automatic Email Sorting using Naive Bayes

Paul has been working on the theory, and on an application in Arc, for filtering spam based on message content.
POPFile is based on that theory and is written in Perl.


Imagine that you have three folders you'd like to sort email into: work, personal and spam (POPFile calls these folders 'buckets'). Setting up an email client to sort the mail by hand-written rules ranges from hard (in the case of work, where you'd have to tell it about everyone in your company) to impossible (in the case of spam, because spammers keep changing their emails to evade filtering).

Bayes' theorem gives POPFile a way to calculate the probability that an email is work, personal or spam by calculating P(work|E), P(personal|E) and P(spam|E), where E is the new email, P(work|E) is the probability that email E is a 'work' email, and so on. By picking the largest of the three probabilities, POPFile can automatically choose the appropriate folder. POPFile calculates these probabilities by looking at the frequency with which words occur in each folder and applying Bayes' theorem.
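To make the idea concrete, here is a minimal sketch of a word-frequency Naive Bayes sorter. It is not POPFile's code (POPFile is written in Perl); it is written in Python for illustration, and the class name `NaiveBayesSorter`, its methods, the whitespace tokenizer, the add-one smoothing and the use of log-probabilities are all assumptions made for this example. The score for each bucket is log P(bucket) plus the sum of log P(word|bucket) over the words in the email, which ranks the buckets in the same order as P(bucket|E).

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSorter:
    """Hypothetical illustration of bucket sorting; not POPFile's implementation."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> occurrences
        self.message_counts = Counter()          # bucket -> training messages seen

    def train(self, bucket, text):
        # Naive tokenizer (an assumption): lowercase and split on whitespace.
        self.word_counts[bucket].update(text.lower().split())
        self.message_counts[bucket] += 1

    def scores(self, text):
        # Vocabulary is every distinct word seen in any bucket during training.
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_messages = sum(self.message_counts.values())

        result = {}
        for bucket, n_messages in self.message_counts.items():
            counts = self.word_counts[bucket]
            total_words = sum(counts.values())
            # Prior: log P(bucket), estimated from the share of training messages.
            score = math.log(n_messages / total_messages)
            for word in text.lower().split():
                # Likelihood: log P(word | bucket) with add-one smoothing, so an
                # unseen word lowers the score instead of zeroing the product.
                score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
            result[bucket] = score
        return result

    def classify(self, text):
        # Pick the bucket with the largest (log) score.
        bucket_scores = self.scores(text)
        return max(bucket_scores, key=bucket_scores.get)


sorter = NaiveBayesSorter()
sorter.train("work", "meeting agenda for the project deadline")
sorter.train("personal", "dinner with mom on saturday")
sorter.train("spam", "win free money now click here")

print(sorter.classify("free money click"))        # spam
print(sorter.classify("project meeting agenda"))  # work
```

Working in logarithms avoids multiplying many small probabilities together, which would underflow to zero on long emails. A real filter would also need a better tokenizer and far more training mail than the three toy messages used here.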


About this Entry

This page contains a single entry by klsh published on November 13, 2002 8:08 AM.
