Bayesian Spam Filter :: POPFile Automatic Email Sorting
POPFile Automatic Email Sorting using Naive Bayes
Paul has been working on the theory, and on an application written in Arc, for filtering spam based on message content.
This project is based on that theory and is written in Perl.
Imagine that you have three folders you'd like to sort email into: work, personal and spam (POPFile calls these folders 'buckets'). Setting up an email client to sort the mail by hand-written rules ranges from hard (in the case of work, where you'd have to tell it about everyone in your company) to impossible (spammers keep changing their emails to evade filtering).
Bayes' Theorem gives POPFile a way to estimate which bucket an email belongs in by calculating P(work|E), P(personal|E) and P(spam|E), where E is the new email, P(work|E) is the probability that E is a 'work' email, and so on. By choosing the largest of the three probabilities, POPFile can automatically pick the appropriate folder. POPFile calculates these probabilities by looking at the frequency with which words occur in each folder and applying Bayes' Theorem.
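
To make the idea concrete, here is a minimal Python sketch of a word-frequency naive Bayes sorter. It is an illustration only, not POPFile's code (POPFile is written in Perl). The class name `BucketClassifier`, the whitespace tokenizer, the add-one smoothing and the use of log probabilities are all assumptions made for this example. Because P(E) is the same for every bucket, it is enough to compare P(bucket) multiplied by the product of P(word|bucket) over the words in E, which the sketch does in log space.

```python
import math
from collections import Counter, defaultdict


class BucketClassifier:
    """Toy naive Bayes sorter (hypothetical; not POPFile's implementation)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.email_counts = Counter()            # bucket -> emails trained

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, bucket, text):
        """Record the words of one email known to belong in `bucket`."""
        self.word_counts[bucket].update(self.tokenize(text))
        self.email_counts[bucket] += 1

    def scores(self, text):
        """Return log P(bucket) + sum of log P(word|bucket) per bucket."""
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_emails = sum(self.email_counts.values())

        result = {}
        for bucket, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Prior: the share of training emails that went to this bucket.
            score = math.log(self.email_counts[bucket] / total_emails)
            for word in self.tokenize(text):
                # Add-one smoothing (an assumption) so that a word never
                # seen in a bucket does not produce log(0).
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocabulary))
                )
            result[bucket] = score
        return result

    def classify(self, text):
        """Pick the bucket with the largest score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


classifier = BucketClassifier()
classifier.train("work", "meeting agenda for the quarterly budget review")
classifier.train("personal", "dinner with family this weekend")
classifier.train("spam", "win free money now click here")

print(classifier.classify("free money click now"))
```

With this tiny, made-up training set the last line prints `spam`, because those four words have only been seen in the spam bucket. A real filter would train on many more messages and would probably need a more careful tokenizer.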