Quick Start for Spam Identification and Filtering
What is spam?
Spam is a term used to describe unsolicited email, also known as unsolicited commercial email (UCE) or junk email. The messages are usually mass mailed and considered invasive by those who receive them. If you haven't heard of spam before, you probably haven't received much of it and should consider yourself lucky. The name is generally believed to come from the song in a Monty Python skit where the Vikings sing "Spam spam spam spam, spam spam spam spam, lovely spam, wonderful spam…": a continual repetition of worthless text that eventually drowns out all other communication.
Why is it so difficult to identify spam?
Unfortunately, spam email is like regular junk mail. When you look through the clear address window of an envelope and the mail appears to contain a check, you can't tell whether it actually does unless you open it and examine it. Sometimes it looks like a check but is really an advertisement or loan offer. If the envelope looks suspicious, you'll probably open it anyway, just to make sure. If the envelope looks normal, you're very likely to open it and look inside.
Email is no different, and can actually be a bit more difficult. If it were possible to identify the real source (i.e., From) address of a spam message, the message could be blocked based on that information alone. However, source addresses are often fake, and even when they are not fake they change very quickly: it takes little effort to set up a new email account and start spamming again. It is also not feasible to block entire domains (for example, hotmail.com) because many legitimate emails would be blocked along with the spam.
If spam messages could be identified without error, it might be realistic to block spam prior to final delivery. The best spam identification methods are heuristic, however, and are not 100% accurate. Each email message is examined and given a score that reflects how likely it is to be spam, based on observed characteristics of the message. This method can be very effective but is not always accurate, especially since individuals differ in their opinions about what should be defined as spam. Some email marked as potential spam will be normal email, and some email marked as normal will be spam. This is why email identified as potential spam should not be deleted by the email delivery system.
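To make the scoring idea concrete, here is a minimal Python sketch. The characteristic names, weights, and sample messages are invented for illustration and are not SpamAssassin's actual rules; only the idea of adding up per-characteristic scores and comparing the total with a threshold comes from the description above.

```python
# Hypothetical characteristics and weights, chosen only for illustration.
RULES = {
    "subject_all_caps": 1.5,
    "mentions_free_money": 2.5,
    "many_exclamation_marks": 1.0,
    "sender_in_address_book": -3.0,  # negative weights make a message look less like spam
}


def spam_score(observed):
    """Sum the weights of the characteristics observed in a message."""
    return sum(RULES[name] for name in observed)


def is_probable_spam(observed, threshold=5.0):
    """Flag a message when its total score reaches the threshold."""
    return spam_score(observed) >= threshold


# Two made-up messages, described by the characteristics they show.
newsletter = ["subject_all_caps", "many_exclamation_marks"]
junk = ["subject_all_caps", "mentions_free_money", "many_exclamation_marks"]
```

With these made-up weights the newsletter scores 2.5 and is left alone, while the junk message scores 5.0 and is flagged. Lowering the threshold to 2.5 would flag both, which is one way a legitimate message can end up marked as spam.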
Why do I get so much spam? Am I being targeted? Do you sell or distribute my email address?
Spammers can get your email address from many sources. Most commonly, spammers use robots to search sites on the Internet for email addresses. Addresses can be gathered from web pages (including your personal web page if your email address appears anywhere in it, even if it is not directly displayed), mailing lists, message boards and discussion groups, newsgroups, directories, and other online sources. Harvested addresses are often then sold to others.
Sometimes, you give spammers your address directly or indirectly by signing up for a service or by purchasing products via the web. They may in turn sell your address to others or otherwise distribute it to their marketing partners. You should always check a site's privacy policy so you at least know their position on sharing your personal information. We do not sell or distribute email addresses.
The more public your email address is, and the more mailing lists you are on, the more likely you are to get lots of spam. If you order products and services via the Internet, participate in open mailing lists and discussion groups, or provide your email address to companies and other organizations, you might consider having a "public" email address through a different provider and a more private one for personal correspondence. Even this may be of limited use since spammers often just generate addresses and send email to them with the expectation that some will actually be real addresses.
What is Computing & Networking's policy about spam?
CSM does not want systems connected to its network contributing to the spam problem so all computer systems connected to the campus network that provide mail services must be configured to prevent their use as an open relay. Servers detected with open relays will be disconnected from the campus network until the relay is closed. Using CSM's computers or network to originate or transmit email judged to be spam is prohibited and exposes the sender to disciplinary action. We do not sell or distribute email addresses.
Our policies regarding incoming unsolicited email are governed by the principle that identification of spam is a personal decision which an individual email recipient must make. Furthermore, the centralized blocking of sites or addresses is generally not effective and raises privacy concerns for users. Therefore,
- we will not do site-wide spam blocking or removal except in unusual circumstances such as denial of service or virus and other resource attacks,
- we will not, electronically or via a staff member, examine individual email messages to determine whether they could be spam without a user's permission, and
- to accommodate individual needs and opinions, individual users should have as much control as is practical in identifying the characteristics that define spam, in configuring how spam is identified, and in the filtering and disposition of spam.
Therefore, CSM's Computing & Networking policy with respect to spam identification and filtering is an opt-in policy that provides customized user control of spam identification parameters where possible. Given this policy and that CSM uses email as an official communication method, it is your responsibility to ensure that you do not filter out official CSM communications.
What is Computing & Networking doing about spam?
We know that junk mail is very annoying. Because of our public and very commonly used email addresses (such as webmaster) we probably receive more of it than any other group on campus. To provide our account holders with a method to manage spam, we have made a spam filtering solution available through the central mail shell server, Imagine. Most students, faculty, and staff in academic departments receive mail through this server. If you do not receive your email through an account on the AC&N mail server, then you should talk with your local system administrator about what options may be available to you.
The spam identification solution we have chosen is called SpamAssassin. This program runs on the central mail server and each user must elect to run it (opt-in) by setting it up (or having it set up) to evaluate all incoming email for the user. SpamAssassin will initially be set up for you with some default options about how to identify spam. You can customize these settings after the initial setup. Common changes would be to always exempt email from or to certain addresses (whitelisting), always identify email from certain addresses as spam (blacklisting), change the threshold score which causes an email to be marked as spam, or change the scoring of certain message characteristics.
Effective spam management is at least a two-part process. You need to 1) identify the spam (done by SpamAssassin) and 2) filter it to a spam/junk mail box (recommended) or the trash (not recommended) in your email client. If you filter it to a junk/spam mailbox, then you need to (at least periodically) review that mailbox for "false positives" and delete the actual spam messages. Deleting the messages from your spam/junk mailbox is especially important if you use webmail since those messages continue to take up space against your disk quota.
How does SpamAssassin work?
SpamAssassin examines email messages, evaluates characteristics about them, computes a "spam score," and then identifies them as spam in the subject line or leaves the subject line intact based on the computed spam score. Messages that appear to be spam are marked in the "Subject:" line so they can be easily identified and automatically filtered into a separate mail folder.
How do I initiate spam identification?
To initiate spam filtering yourself, you must not already have a .forward file in your Imagine home directory. If you do have an existing .forward file you should open a ticket at the Mines Help Center to ask for assistance in setting up spam filtering. If you don't have an existing .forward file (if you don't know what a .forward file is, you probably don't have one), then do the following:
- Log in to Imagine using a secure shell client (such as SSH Secure Shell or PuTTY), or with telnet if you must.
- Once you are logged into Imagine, type filterspam
- You will get an error message telling you to contact support if you already have a .forward file.
- If it works you will get an information message telling you SpamAssassin has been activated.
- Type exit to log out.
The next email you receive will cause a directory to be created in your Imagine home space called ".spamassassin" (without the quotes but with the dot). A file called user_prefs will be placed in that directory. This is the CSM default SpamAssassin user configuration file. The site-wide and default CSM user configuration settings are as follows:
- Messages identified as spam will have the following added at the beginning of the email message's subject line
- [spam: xx.xx/5.00] where xx.xx is the spam "score" assigned to this message
- when you set up a filter in your email client, you should look for "[spam:" to trigger moving the message to a different mailbox.
- All email messages will have "X headers" inserted in them that start with "X-Spam". These headers will typically only be visible if you turn on the "view all headers" option in your mail client.
- The default score to identify a message as spam is 5.00. You can increase or reduce this score in the user_prefs file, and you can change the scoring of any characteristic, but you'll need to learn more about SpamAssassin if you want to do this. The SpamAssassin web site is located at http://www.spamassassin.org
- All email messages appearing to be from addresses at mines.edu are whitelisted by default so no messages appearing to come from CSM should be marked as spam unless you make changes in the configuration file. This is important since mail messages such as Campus In Brief, vacation/sick leave reporting messages, and emails to groups such as faculty, staff, and students may otherwise be identified as spam.
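The subject tag described above can also be read by a script. The following Python sketch is only an illustration: the function name is hypothetical, and the regular expression assumes the tag looks exactly like "[spam: xx.xx/5.00]" with plain decimal numbers, which may not cover every variation the server produces.

```python
import re

# Matches the tag described above, e.g. "[spam: 12.30/5.00] Cheap loans".
SPAM_TAG = re.compile(r"^\[spam:\s*(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\]\s*")


def parse_spam_tag(subject):
    """Return (score, threshold, original_subject), or None if the subject is untagged."""
    match = SPAM_TAG.match(subject)
    if match is None:
        return None
    score, threshold = float(match.group(1)), float(match.group(2))
    # Everything after the tag is the subject the sender wrote.
    return score, threshold, subject[match.end():]
```

A subject without the tag returns None, so the same function can be used to tell marked and unmarked messages apart.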
How do I filter email identified as spam?
How you actually filter email identified as spam will depend on what email client or clients you use. No matter which email client you use, the test you apply will be the same. In your email client program, you should create a mailbox where you want to send messages identified as spam. Call it something like spambox, spam, junkmail, or something else meaningful to you. Then, create a new filter in your email client and test for "[spam:" (without the quotes) at the beginning of the subject line. Some basic instructions for popular email clients are available here:
General spam filtering instructions page
--or go directly to instructions for--
CSM Webmail
Netscape Messenger 4.x
Netscape Messenger 6.x
Microsoft Outlook Express
Microsoft Outlook 2000 & 2002
Eudora
Once you have created the filters, messages identified as spam should be handled according to the instructions you gave when you created the filter.
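Whatever client you use, the filter applies the same test. As a rough illustration of that test, and not of any particular client's filter engine, this Python sketch picks a mailbox for a raw message. The function name and the mailbox names "spambox" and "inbox" are assumptions made for the example.

```python
from email import message_from_string


def choose_mailbox(raw_message, spam_mailbox="spambox"):
    """Pick a mailbox name using the same test a mail-client filter would apply."""
    subject = message_from_string(raw_message).get("Subject", "")
    # The filter test from the text: "[spam:" at the beginning of the subject line.
    if subject.startswith("[spam:"):
        return spam_mailbox
    return "inbox"
```

Testing only the beginning of the subject line means that a reply quoting a tagged subject further along (for example "Re: [spam: ...") is not moved to the spam mailbox.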
What are false positives and negatives and should I care?
Yes, you should care if you want to effectively manage your email. A false positive is simply an email that gets identified as spam when you would not want it to be marked as spam. Only you can decide whether a message is a false positive. Conversely, you will probably receive some messages you think are spam that haven't been marked as spam. These cases are called false negatives. You should routinely scan your spam mailbox (the subject and sender at least - you don't actually have to open the messages), especially when you first start filtering, to look for false positives. You can edit your user_prefs file to "whitelist" (i.e. exempt) future messages from this sender, change scoring characteristics, and so on to reduce the number of false positives. False negatives will be obvious to you because they will appear in your regular mailbox. You can also edit your user_prefs file and add false negative addresses by "blacklisting" them or by further refining scoring characteristics.
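If you want to keep track of how well filtering is working for you, the two kinds of error can be counted separately. This is a small Python sketch with invented sample data; the function name is hypothetical, and "you consider it spam" stands for your own judgment of each message.

```python
def count_errors(messages):
    """Count false positives and false negatives.

    Each item is a (marked_as_spam, you_consider_it_spam) pair of booleans.
    """
    false_positives = sum(1 for marked, actual in messages if marked and not actual)
    false_negatives = sum(1 for marked, actual in messages if not marked and actual)
    return false_positives, false_negatives


# Invented sample data: four messages and how they were handled.
sample = [
    (True, True),    # spam, correctly marked
    (True, False),   # wanted mail marked as spam: false positive
    (False, True),   # spam that slipped through: false negative
    (False, False),  # wanted mail, correctly left alone
]
```

For this sample the function reports one false positive and one false negative.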
What common changes can I make to fix false positives or negatives?
The two most common changes/additions you would make to correct false positives and negatives are to whitelist or blacklist email addresses. You will see samples of how to do this within the user_prefs file in the .spamassassin directory. The most common directives to use are "whitelist_from" and "blacklist_from". Other directives are also available and can be found in the SpamAssassin documentation. Here are some examples:
Each example gives the directive, the address or parameter, and its meaning:
- whitelist_from yourfriend@yourfriend.com: Never mark email from this address as spam.
- whitelist_to maillinglist@maillist.com: Never mark email sent to you as part of this mailing list or alias as spam.
- whitelist_from *@colostate.edu together with whitelist_from *@*.colostate.edu: Never mark email sent to you from any address at CSU as spam. (You would need both of these lines to accomplish this.)
- blacklist_from annoying@spammer.com: Always mark email sent to you from this address as spam.
- blacklist_from *@spammer.com: Always mark email received from any address at spammer.com as spam.
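The reason both colostate.edu lines are needed can be checked with a short Python sketch. It uses Python's fnmatch module as a stand-in for SpamAssassin's own wildcard matching, which is an assumption and may differ in details. The function name and the sample addresses are made up for the example.

```python
from fnmatch import fnmatchcase


def matches_any(address, patterns):
    """Return True if the address matches at least one wildcard pattern."""
    address = address.lower()
    return any(fnmatchcase(address, pattern.lower()) for pattern in patterns)


# The two patterns from the CSU example above.
csu_patterns = ["*@colostate.edu", "*@*.colostate.edu"]
```

Under this approximation, "*@colostate.edu" matches prof@colostate.edu but not prof@cs.colostate.edu, and "*@*.colostate.edu" does the opposite, so only the pair together covers addresses both at the main domain and at its subdomains.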
Click here to view the documentation about user preferences options that you can insert in your user_prefs file.
Can I use SpamAssassin with other ".forward" functions such as vacation messages and mail forwarding?
Yes, it is possible to use SpamAssassin in conjunction with the vacation program, forwarding, procmail, and other functions that typically hook into the email system via the .forward file. Care must be taken, however, as getting all the programs integrated correctly can become somewhat complex. SpamAssassin is very flexible and can be configured to do a number of things. The default configuration is very conservative with respect to what is done with spam. It is possible to do much more aggressive and automatic spam filtering. Please consult the SpamAssassin documentation, the Mines Help Center, and/or your system administrator if you have questions.
I'm confused and need some help to get started. Will you help?
Yes. If you are not comfortable setting up SpamAssassin initially, assistance is available from the Computing and Networking support staff. We will not administer personal whitelists and blacklists for individual users, however. If you wish to maintain personal whitelists and blacklists of email addresses, you must learn to administer those lists yourself. A web interface will eventually be developed to help you administer these. Currently, however, you must learn to edit these files yourself on Imagine.
You must set up and maintain filtering in your own email client. If the instructions about that don't work for you, please seek help from support or your local support team.
Spam Tips
Be cautious about using unsubscribe options in spam messages you receive. Sometimes the options to get off the list are legitimate, but other times you are just confirming to the sender that your address is real, and you might end up receiving even more spam.