[Tutorial] Spam Filtering Based on SMTP Header
Spam is incessant, so every responsible mail administrator should keep up with spam filtering methods based on the analysis of SMTP headers. In this article, we use Postfix as the MTA.
An MTA (Mail Transfer Agent) is the main component of the Internet mail transfer system.
This article is aimed at administrators who want to set up a highly efficient spam filter. It also contains essential information for those who use e-mail a lot and want to understand how the mechanisms of electronic correspondence work.
We assume that the Postfix mail server is used as the MTA, but the article is largely about the general approach rather than one product. The Postfix options have to be specified in the relevant *_restrictions parameters of the configuration file; for more information, refer to the Postfix configuration guide.
At a glance: SMTP protocol
The main task of SMTP (Simple Mail Transfer Protocol) is to transfer electronic mail (e-mail). To use SMTP, the client establishes a TCP connection with the server on port 25. The client and the SMTP server then exchange information until the connection is closed or interrupted. The main procedure in SMTP is the mail transaction (Mail Procedure). Beyond it there are procedures for mail forwarding, verifying mailbox names and expanding mailing lists. The very first procedure opens the transmission channel, and the last one closes it.
SMTP commands tell the server which operation the client requires. A command consists of a keyword, followed where needed by one or more parameters. The keyword is four characters long and is separated from its argument by one or more spaces. Each command line ends with a CRLF sequence. Here is the syntax of the SMTP commands (SP is a space):
HELO <SP> <domain> <CRLF>
MAIL <SP> FROM:<reverse-path> <CRLF>
RCPT <SP> TO:<forward-path> <CRLF>
DATA <CRLF>
RSET <CRLF>
SEND <SP> FROM:<reverse-path> <CRLF>
SOML <SP> FROM:<reverse-path> <CRLF>
SAML <SP> FROM:<reverse-path> <CRLF>
VRFY <SP> <string> <CRLF>
EXPN <SP> <string> <CRLF>
HELP <SP> <string> <CRLF>
NOOP <CRLF>
QUIT <CRLF>
A typical SMTP server response consists of a response code followed by explanatory text, separated by a space. The response code indicates the server status.
As an example, let us connect to an SMTP server on port 25. First we send the HELO command with our IP address to the server:
C: HELO 188.8.131.52
S: 250 smtp.example.com is ready
When sending mail, we transfer some required data (sender, receiver and the message itself):
C: MAIL FROM: <user1>
(specify the sender)
S: 250 OK
C: RCPT TO: <[email protected]>
(specify the receiver)
S: 250 OK
Tell the server that we are going to transmit the message contents (header and message body):
C: DATA
S: 354 Start mail input; end with <CRLF>.<CRLF>
The message transfer has to be completed with the CRLF.CRLF sequence. Until then the server sends no replies, and the client transmits the header lines:
C: From: User1 <[email protected]>
C: To: User2 <[email protected]>
C: Subject: Hello My Friend
Between the message header and the message text there is not one CRLF pair but two, that is, an empty line:
C:
C: Hello User2.
C: How do you do!
Complete the transmission with CRLF.CRLF:
C: .
S: 250 OK
Now to complete the operation we send the QUIT command:
C: QUIT
S: 221 smtp.example.com is closing transmit channel
Now let us consider what happens when a client connects to your mail server. A mail server can check incoming e-mail using nothing but the address information in the SMTP envelope headers.
As you already know, e-mail is transferred between mail servers via SMTP. An SMTP session begins with three mandatory headers: HELO, MAIL FROM and RCPT TO. In other words, before any data is transferred, the sending server first introduces itself with HELO, then gives the return (sender) address with MAIL FROM, and then the recipient address with RCPT TO. These three headers form the signature on the electronic envelope, and most spam can be eliminated by analyzing them. On my server, analysis of the MAIL FROM header alone rejects most unwanted delivery attempts, that is, the letters are refused before they are even received, which eases the load considerably.
Let’s check the order of steps you need to take to reject emails that are obviously spam.
In the early days of the Internet, a message was delivered directly to the node named in the mail address. To deliver a letter to [email protected], the mail server looked up the IP address of example.com and attempted to send the e-mail to that IP. The introduction of MX records solved many of the problems with this way of organizing mail delivery. Some programs still fall back to A records, but a properly organized mail domain needs at least one MX record.
MX records contain the names of the servers to which e-mail for the specified domain should be delivered. SPF (Sender Policy Framework) is also used to combat spam: an SPF record in DNS lists the servers that are allowed to send mail on behalf of the specified domain.
For a more detailed introduction to SPF, refer to RFC 4408 (since superseded by RFC 7208). The following SPF record says that mail from your domain should come from the servers listed in its MX records; the ~all qualifier marks every other source as a soft fail rather than a hard failure.
v=spf1 +mx ~all
It looks like ‘fairy dust’ that eliminates the spam problem once and for all: just enable strict SPF-based filtering and forget about the detestable spam forever. Note, however, that letters can legitimately pass through forwarders, which is why hard SPF-based filtering is extremely undesirable. Still, be sure to add an SPF record to your domain and enable SPF checking on your mail servers. Guides to setting up SPF for Postfix are easy to find with a web search.
A few more words about DNS records. The basic DNS records are “A” records, which map a domain name to an IP address. There are also CNAME records, which define an alias for an existing domain name. These are the main records of the domain name system, but not every user is aware of the so-called PTR (reverse) record, which maps an IP address back to a domain name. If you follow the two rules below, messages from your mail server are less likely to be seen as spam.
First, the host name of your mail server must have an “A” record and a matching PTR record, so that the host name resolves to the IP address and the IP address resolves back to the host name. Second, an MX record must always point to a host name that has an “A” record. Using an IP address or a CNAME alias in an MX record is forbidden.
Ignoring either of these rules causes some of the e-mail coming from your domain to be classified as spam.
Keep an Eye on the Client
At the start of the SMTP session you learn the client’s IP address, and you can already draw some conclusions from it. The next option checks the client’s PTR record.
This option requires that the IP address initiating the connection resolves to a name via PTR, and that this name in turn resolves back to the IP address of the connecting client.
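The option name itself is not shown in the text at this point. The behaviour described matches the Postfix restriction reject_unknown_client_hostname, so the fragment below is a minimal sketch that assumes this is the option meant; the file location and the choice of smtpd_client_restrictions are likewise assumptions.

```
# main.cf fragment (typically /etc/postfix/main.cf)
# Assumption: reject_unknown_client_hostname is the option described above.
# It rejects the client when its IP address has no PTR name, when that name
# has no address record, or when the name does not resolve back to the
# client's IP address.
smtpd_client_restrictions =
    reject_unknown_client_hostname
```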
The following option provides a less tight constraint.
This option only checks for a PTR record, without looking for the existence of a corresponding “A” record.
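Again the option name is missing from the text. The description fits the Postfix restriction reject_unknown_reverse_client_hostname, which the following sketch assumes is the one intended.

```
# main.cf fragment
# Assumption: reject_unknown_reverse_client_hostname is the option described.
# It rejects the client only when its IP address has no PTR record at all;
# the forward lookup of the resulting name is not checked.
smtpd_client_restrictions =
    reject_unknown_reverse_client_hostname
```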
The next trick eliminates a large share of incoming spam. Make your server respond to the client not immediately but after a short delay. Spammers have no time to wait, whereas a normal SMTP server never drops the connection without waiting at least a couple of seconds. A small delay after the connection request is therefore very helpful: the connection is already established, the client and server are already talking, and a short pause will not make a legitimate client disconnect.
The delay during the connection is enabled by the following option.
A delay of around 20 seconds is optimal, because by default Postfix considers a server unavailable after 30 seconds of waiting.
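The command itself is missing from the text. Postfix does provide a generic "sleep seconds" restriction that pauses before the next restriction is evaluated, so the sketch below assumes that this is what was meant and uses the 20 seconds suggested above.

```
# main.cf fragment
# Assumption: the "sleep" restriction is the command referred to above.
# Postfix pauses for 20 seconds when it reaches this entry, then continues
# with the next restriction in the list, if any.
# Note: with the default deferred evaluation of restrictions, the pause takes
# place when the RCPT TO command arrives, not at connection time.
smtpd_client_restrictions =
    sleep 20
```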
Before a letter is transmitted to your mail server, the session begins with a HELO introduction. If the sender does not give an FQDN in HELO, we can refuse it without a second thought. This requires the following options.
The first one prohibits receiving letters from hosts that introduce themselves with incorrect syntax, the second from hosts that send a non-FQDN name in the HELO request.
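The two option names are not shown in the text. Judging by the descriptions, they are most likely reject_invalid_helo_hostname and reject_non_fqdn_helo_hostname; the sketch below assumes so. Adding smtpd_helo_required is also an assumption of this example: without it a client could avoid both checks simply by not sending HELO at all.

```
# main.cf fragment
# Assumption: these are the two HELO options described above.
# Require every client to send HELO or EHLO before MAIL FROM.
smtpd_helo_required = yes
# First entry: reject HELO names with invalid syntax.
# Second entry: reject HELO names that are not fully qualified.
smtpd_helo_restrictions =
    reject_invalid_helo_hostname,
    reject_non_fqdn_helo_hostname
```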
However, only the most careless spammers send a non-FQDN name; it is not difficult to introduce oneself as, say, gmail.com. The next option is meant for that case.
This option prohibits receiving letters from servers that introduce themselves with a name that has no “A” or MX record. SMTP clients must strictly adhere to RFC 5321: begin the session with the EHLO (HELO) command and present their fully qualified domain name (FQDN), or an IP address literal if they have no domain name. Since mail servers on the Internet cannot do without a domain name, all of the above options can be enabled without hesitation.
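The option name is missing here as well. The description corresponds to the Postfix restriction reject_unknown_helo_hostname, which this sketch assumes is the intended one.

```
# main.cf fragment
# Assumption: reject_unknown_helo_hostname is the option described above.
# It rejects the request when the HELO or EHLO name has neither an
# A record nor an MX record in DNS.
smtpd_helo_restrictions =
    reject_unknown_helo_hostname
```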
After the server has introduced itself, we turn to the sender’s address, which also contains useful information. It is a misconception that the sender’s address and the server itself always belong to the same domain; a single mail server can serve several domains.
If the sender’s address contains a non-existent domain, or is nothing but a random string of characters or even blank space, the letter should obviously be refused. The following options are used to reject such messages.
The first option rejects the request if the domain in the MAIL FROM address is not in FQDN form, as required by the RFC. The second option rejects the message if the domain in the MAIL FROM address has neither an “A” nor an MX record in DNS, or if its MX record is malformed, for example an MX record with a zero-length host name.
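The option names are not shown in the text. The two descriptions match the Postfix restrictions reject_non_fqdn_sender and reject_unknown_sender_domain, so the following sketch assumes these are the options meant.

```
# main.cf fragment
# Assumption: these are the two sender options described above.
# First entry: reject MAIL FROM addresses whose domain is not fully qualified.
# Second entry: reject MAIL FROM addresses whose domain has no A or MX
# record, or has a malformed MX record.
smtpd_sender_restrictions =
    reject_non_fqdn_sender,
    reject_unknown_sender_domain
```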
Next, we ask the server responsible for the sender’s address whether a user with that address exists. This allows us to verify that the return address is real.
Verification works as follows: our server initiates a reverse SMTP session and attempts to send an e-mail to the sender’s address. If RCPT TO with this address succeeds, i.e. the receiving server does not reply that the mailbox does not exist, the address is considered valid. No data (that is, no message) is transmitted during verification; the session is terminated after RCPT TO.
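The text does not name the option that turns this check on. In Postfix, sender address verification of this kind is requested with the reject_unverified_sender restriction; the fragment below is a sketch under the assumption that this is the option the article has in mind.

```
# main.cf fragment
# Assumption: reject_unverified_sender is the option behind the
# verification procedure described above.
# Postfix probes the sender address with a separate SMTP session and
# rejects the request if the address is known to be undeliverable.
smtpd_sender_restrictions =
    reject_unverified_sender
```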
It follows that every address in your domain from which you send e-mail must have a mailbox on your server. Otherwise, your messages will fail return address verification on the receiving side and will not be delivered.
If you plan to send a newsletter, always create mailboxes for all the addresses you send from. If you do not want to read mail sent to a particular address, deliver it to /dev/null, but the server must still accept it.
How to Verify a Recipient!
The following directive checks that the recipient address on the envelope is actually a well-formed e-mail address.
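The directive itself is missing from the text. The closest matching Postfix restriction is reject_non_fqdn_recipient, which the sketch below assumes is the one meant.

```
# main.cf fragment
# Assumption: reject_non_fqdn_recipient is the directive referred to above.
# It rejects RCPT TO addresses whose domain part is not fully qualified.
smtpd_recipient_restrictions =
    reject_non_fqdn_recipient
```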
The next step is to refuse e-mail addressed to nonexistent recipients. To achieve this, first create a list of the mailboxes you serve, then make it known to Postfix through one of the *_recipient_maps parameters of the configuration file, and then set the configuration parameter
smtpd_reject_unlisted_recipient = yes
or use a restriction option that has the same effect:
With either setting, Postfix stops accepting e-mail for the domains it serves when the recipient has no mailbox.
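The name of the equivalent restriction is not shown in the text. Postfix has a reject_unlisted_recipient restriction that performs the same check from within a restriction list, so the following sketch assumes this is the option intended.

```
# main.cf fragment
# Assumption: reject_unlisted_recipient is the equivalent option mentioned.
# It rejects RCPT TO addresses that are not in the list of valid
# recipients for a domain this server handles.
smtpd_recipient_restrictions =
    reject_unlisted_recipient
```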
We also prohibit open relaying through Postfix, leaving only the ability to receive letters for known addresses. The following option serves this purpose.
Thus, we prohibit relaying letters to addresses that our server does not serve, except for clients that have authenticated or are covered by permit_mynetworks. Using this option is strongly recommended; otherwise, you risk ending up in a DNSBL (RBL), a Domain Name Service-based Block List.
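The option name is missing from the text. In Postfix, relay protection of this kind is provided by reject_unauth_destination; the sketch below assumes this is the option meant and places it after the two exceptions mentioned above. Using permit_sasl_authenticated for "clients that have authenticated" is also an assumption.

```
# main.cf fragment
# Assumption: reject_unauth_destination is the option described above.
# Clients in $mynetworks and SASL-authenticated clients may relay;
# everyone else may only send to domains this server is responsible for.
smtpd_recipient_restrictions =
    permit_mynetworks,
    permit_sasl_authenticated,
    reject_unauth_destination
```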
Based on the analysis of the three envelope headers, you can eliminate a huge amount of spam. Next, consider the following mechanism to combat spam.
Postgrey – implementation of gray lists for spam filtering.
Greylisting (a gray list) is an effective instrument for combating spam. The method works as follows: when the mail server receives a message for the first time, it does not accept it but replies with a temporary failure and records the sender’s details in a special list. If the sender’s mail server is configured correctly, it will soon make a second attempt to send the same message, and this time our mail server will accept it. The decision whether to accept or defer a letter is taken from the greylist database: the first attempt is turned away, and if the same mail arrives a second time, it is accepted.
As practice shows, 50-70% of spam software makes only one delivery attempt, so this method can safely cut junk mail by half.
Cons of the greylist technique:
• Delivery time increases, sometimes significantly; it all depends on the settings of the senders’ mail servers.
• If the sender’s mail server is configured incorrectly, it will not attempt to deliver the message again. Alas, this happens sometimes, though not very often.
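Postgrey runs as a separate policy daemon that Postfix consults through the check_policy_service restriction. The fragment below is a minimal sketch of that hookup; the address inet:127.0.0.1:10023 is an assumption and must match the address and port your Postgrey instance actually listens on.

```
# main.cf fragment
# Assumption: Postgrey listens on 127.0.0.1 port 10023; adjust as needed.
# The policy service is placed after reject_unauth_destination so that
# greylisting is only applied to mail this server would otherwise accept.
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_policy_service inet:127.0.0.1:10023
```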
Blocklists, the so-called DNSBL (RBL, Domain Name Service-based Block List), can cut off the lion’s share of the spam that arrives at your mail server. However, there are two reasons to be wary of them. First, hosts end up in these lists quite arbitrarily, and there is no guarantee that a valid host will not be included: perhaps it was infected for a while by a spam-sending virus that has long since been removed, or, simpler and far more likely, it is the single external IP of a large network in which a spammer has dug in. The second reason is more trivial: the filtering mechanism proposed above is much more efficient than any DNSBL and does not rely on unverified data from third parties.
We have considered the main mechanisms for combating spam; now it is time to look at our own servers from the outside so that our own mail is not mistaken for spam.
For mail server administrators:
• Always point MX records to a host name that has an “A” record.
• The “A” record of the mail server should always have a matching PTR record.
• The host name given in the HELO header should have an “A” or MX record.
• Always create SPF records (this is not mandatory, but it is good practice).
• For every message sent from a domain you serve, the return address should exist and accept mail.
For those who send mail:
• Always send mail only with a valid return address.
• Never send mail from a domain that is not under your control without checking the SPF rules.
• Never send mail through incorrectly configured SMTP servers. Check the server for DNS records using dig or nslookup utilities.
Of course, the settings above will not eliminate all spam. You will probably need an additional content filter that analyzes the content of the messages, for example SpamAssassin.
Still, keep the checks described in this article in mind when configuring mail: compared with content filters alone, they reduce the server load and provide excellent additional filtering.
You have to decide for yourself which restrictions you are ready to apply on your server and which are not yet acceptable to you. Many people take a dim view of the checks described above. However, all of them are used by real SMTP servers on the Internet, so even if you turn on every single one, you will not be alone. Therefore, if you notice a server that is configured incorrectly, do not hesitate to write to its administrator; perhaps you will help him or her avoid complaints from users whose mail was not delivered. And never forget that problematic hosts can be added to a whitelist so that mail from them is accepted without verification.
Comments on Postfix options
First, as already mentioned, all of the above restriction options must be specified in one of the four *_restrictions parameters of the Postfix configuration file that relate to the SMTP session headers. The first one, smtpd_client_restrictions, is responsible for checking the client’s IP address; the remaining three, smtpd_helo_restrictions, smtpd_sender_restrictions and smtpd_recipient_restrictions, handle the corresponding SMTP headers.
Note that the four parameters are evaluated in sequence, and if the message is rejected by even one of them, it is not accepted at all and the sender receives an error notification. A permit rule, however, ends further checking only within its own parameter, after which Postfix proceeds to the next one. Therefore, a letter that has successfully passed smtpd_client_restrictions can still be rejected by smtpd_helo_restrictions.
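The fragment below is a small sketch of how the four parameters sit side by side in main.cf. The particular restrictions chosen are only illustrative assumptions; the point is the layout and the scope of permit rules.

```
# main.cf fragment (illustrative selection of restrictions)
# The four lists are evaluated in this order: client, helo, sender, recipient.
# A "permit" result ends evaluation of its own list only, so a client
# accepted by permit_mynetworks in the first list is still checked
# against the lists that follow.
smtpd_client_restrictions =
    permit_mynetworks,
    reject_unknown_reverse_client_hostname
smtpd_helo_restrictions =
    permit_mynetworks,
    reject_non_fqdn_helo_hostname
smtpd_sender_restrictions =
    reject_non_fqdn_sender
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination
```

Repeating permit_mynetworks in each list where local clients should be exempt is one common way to account for the per-list scope of permit rules.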
Moreover, by default modern Postfix versions do not reject a letter before the RCPT TO command has been received. This way you can always see the full information about the letter: who it is from and who the intended recipient is. Do not try to change this behaviour in Postfix.
Finally, any reject_* option can be tested without actually rejecting messages. To do so, put warn_if_reject before it in the corresponding parameter of the Postfix configuration file. Then, when the check fails, a reject_warning entry is written to the log file, but the letter is not rejected. For more information about the available restrictions and parameters of the Postfix configuration file, see the official documentation.
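As a short sketch of this test mode, the fragment below puts warn_if_reject in front of one restriction; the restriction chosen, reject_unverified_sender, is only an example.

```
# main.cf fragment
# warn_if_reject applies to the restriction that immediately follows it.
# A failed check is logged as "reject_warning" and the mail is still accepted.
smtpd_sender_restrictions =
    warn_if_reject reject_unverified_sender
```

Once the log shows that only unwanted mail would have been refused, the warn_if_reject prefix can be removed to make the restriction effective.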