An Introduction to Internet E-Mail

This document describes the fundamental concepts, and a few of the Unix implementation details, for Internet E-Mail systems. It's intended to serve as a basic primer for newcomers to Unix mail who are confused by the questions that their software's configuration script is asking.

Some Definitions

Mail Transfer Agent (MTA): a program which acts as a "mail server". Specifically, it's responsible for managing a queue of outgoing mail, and for accepting (or rejecting) incoming mail. Examples: sendmail, qmail, postfix, exim.

Mail User Agent (MUA): a program which provides a human user interface for reading and sending mail. Examples: elm, pine, mutt, Outlook, Netscape, Thunderbird.

Simple Mail Transfer Protocol (SMTP): the protocol used between MTAs for sending mail from one host to another. This protocol is also sometimes used between an MUA and an MSA (see below). The official TCP port number for SMTP is 25.

Mail Submission Agent (MSA): a relatively new term in the e-mail field. This is the component of an MTA which accepts new mail messages from an MUA, using SMTP. (Traditional Unix MUAs send their mail using a pipe to one of the MTA's component programs on the same host. Most Windows MUAs use SMTP to talk to an MSA because there is no MTA on the Windows host.) Most MTA implementations use the same program as both their MSA and the part which accepts incoming mail from other hosts. In other cases, these functions are implemented separately. The official TCP port number for an MSA is 587 (although in many cases it's run on port 25).

Mail Delivery Agent (MDA): the component of an MTA which is responsible for the final delivery of a message to a local mailbox on disk. Sometimes this is a separate program, and sometimes it's built into the MTA.

Post Office Protocol (POP): a protocol used by some MUAs to retrieve mail from a user's mailbox on a remote server. Often written "POP3". The official TCP port number for POP3 is 110.

Internet Message Access Protocol (IMAP): a protocol used by some MUAs to retrieve mail from a user's mailbox on a remote server. This is a newer and more complicated protocol than POP, with a lot more functionality. The official TCP port number for IMAP is 143.

Local Mail Storage

Long before Unix systems were networked to each other, users on a Unix host sent e-mail to each other on the local system. Every Unix user account is capable of receiving e-mail, unless special steps have been taken to prevent this from happening. (On most systems, root is not able to receive e-mail directly.)

Mail is stored in one of four different formats:

Some Mail Transfer Agents (MTAs) deliver messages directly to local mailboxes by themselves. Others use a Mail Delivery Agent (MDA) to do that for them.

Simple Mail Transfer Protocol (SMTP)

SMTP is a protocol that runs on top of TCP/IP. It allows Mail Transfer Agents (MTAs) to exchange e-mail with each other over a network. It is also used by some Mail User Agents (MUAs), especially MS-Windows ones, to send e-mail to a more competent or better-connected host for later delivery.

The basic SMTP protocol has no authentication. Each message sent by SMTP is accompanied by addressing information called the envelope: the envelope sender address, and one or more envelope recipient addresses.
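To make the envelope concrete, here is a minimal Python sketch of the commands a sending host issues to transmit it. The function name envelope_commands and the example.org/example.net addresses are made up for illustration; MAIL FROM, RCPT TO, and DATA are the standard SMTP commands.

```python
def envelope_commands(sender, recipients):
    """Return the SMTP commands that carry the envelope for one message.

    Illustrative helper only: a real client sends a greeting (HELO/EHLO)
    first and checks the server's reply after every command.
    """
    if not recipients:
        raise ValueError("an envelope needs at least one recipient")
    commands = [f"MAIL FROM:<{sender}>"]                      # envelope sender
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]  # one per recipient
    commands.append("DATA")                                   # headers and body follow
    return commands


for line in envelope_commands("alice@example.org", ["bob@example.net"]):
    print(line)
```

The envelope addresses are given in the SMTP dialogue itself, so they need not match the From: and To: headers inside the message.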

Relaying

When an MTA accepts a message via SMTP, it takes on the responsibility for seeing that it is delivered to the recipients listed in the envelope, or at least to another MTA which is closer to the ultimate destination. This can take several tries, because the destination MTA might be down temporarily (or simply too busy at the moment).
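The retry behaviour can be pictured as a schedule of waiting times between delivery attempts. The sketch below is only an illustration: the numbers (a 15-minute first wait, doubling up to a 4-hour cap, giving up after 5 days) are assumptions, not the defaults of any particular MTA.

```python
def retry_schedule(first_delay=900, factor=2, max_delay=4 * 3600,
                   lifetime=5 * 24 * 3600):
    """Return the waits (in seconds) between successive delivery attempts.

    Each wait grows by `factor` up to `max_delay`. The schedule stops when
    another wait would push the message past `lifetime`; at that point a
    real MTA would give up and bounce the message to the envelope sender.
    """
    waits = []
    elapsed = 0
    delay = first_delay
    while elapsed + delay <= lifetime:
        waits.append(delay)
        elapsed += delay
        delay = min(delay * factor, max_delay)
    return waits


schedule = retry_schedule()
print(len(schedule), "retries spread over", sum(schedule) // 3600, "hours")
```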

In the ideal situation, every MTA would have a set of local domains for which it is responsible, and would accept messages only for users in those domains. Sending MTAs would deliver their messages straight to the final destination MTA, which would accept them and deliver them to the appropriate local mailboxes.

However, the real world is not that simple. There are systems that are connected to the Internet only intermittently, and so they cannot realistically queue messages and retry them periodically. (There are also operating systems that cannot handle this sort of multitasking with any reliability.) So, MTAs are set up that are intended to serve as relays for these less capable systems. They accept messages from the crippled hosts, and take on the responsibility for delivering them, retrying when necessary for a reasonable length of time. A relay is sometimes known as an SMTP server or a smart host.

Spam

Unfortunately, the generous nature of Internet SMTP service was noticed by people of low moral fiber. These people have decided, for whatever reason, that they wish to send e-mail to every address in the known universe. And moreover, they don't want their own systems to bear the burden of so many deliveries; they would prefer if someone else would do the hard work for them.

Such messages are called spam in the vernacular tongue. Another common term is unsolicited commercial e-mail (UCE), since the majority of such messages are advertisements.

In order to make it more difficult for "spammers" to exploit their systems, administrators of SMTP relays have had to take measures which were not envisioned by the people who originally created SMTP. Specifically, it is no longer recommended to run an open relay which accepts messages from anywhere and delivers them to anyone. Open relays are vigorously hunted by spammers, and once found, they will be exploited to spread spam.

There are at least three common strategies used by relays to attempt to stop the injection of spam: restricting relaying to specific hosts or networks, POP before SMTP, and SMTP AUTH.

Several organizations have attempted to fight spam by collecting lists of hostnames (or IP addresses) which are known to be open relays. These lists are made available to the public. MTA administrators who wish to do so may configure their MTAs to reject messages which come from the listed hosts, on the grounds that any such messages are probably spam. The most famous of these lists is called the Realtime Blackhole List (RBL). Another is ORBS.
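Lists of this kind are commonly published through the DNS: the connecting host's IPv4 address is written with its octets reversed, the list's zone name is appended, and the resulting name is looked up. Here is a minimal sketch of building that name. The zone dnsbl.example is a placeholder, not a real list, and the function name is made up for this example.

```python
import ipaddress


def dnsbl_query_name(client_ip, zone):
    """Build the DNS name to look up for an IPv4 client in a blocklist zone."""
    addr = ipaddress.IPv4Address(client_ip)   # raises ValueError if malformed
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.25 is a documentation address; dnsbl.example is hypothetical.
print(dnsbl_query_name("192.0.2.25", "dnsbl.example"))
```

If the name resolves, the address is on the list; if the lookup reports that the name does not exist, it is not.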

Implementation and Configuration of an MTA

When configuring an MTA, certain pieces of information must be supplied. Some of this is implicit; most of it must be explicitly supplied by the system administrator when the MTA is installed or configured.

An MTA has to know what domains are to be treated as local addresses. The MTA handles messages sent to users in these domains by delivering them to local mailboxes. Messages to domains that are not local will be sent to another MTA, using SMTP.

Some MTAs divide local domains into two separate categories: virtual domains and normal local domains. Suppose a system is handling two domains, dom1.foo and dom2.foo. If this system is not using virtual domains, then bob@dom1.foo and bob@dom2.foo both refer to the same local user, and are delivered to the same mailbox. However, if one or both of these domains are virtual domains, then the namespaces do not overlap: bob@dom1.foo and bob@dom2.foo are separate users. Virtual domains are popular with certain types of ISPs who manage large numbers of domains for their customers.
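The difference between the two kinds of local domain can be sketched as a lookup that maps a recipient address to a mailbox. This is an illustration only; the function name and the idea of a string "mailbox key" are assumptions, and real MTAs each have their own configuration for this.

```python
def mailbox_key(address, local_domains, virtual_domains):
    """Map a recipient address to the mailbox it should be delivered to.

    Returns None when the domain is not handled locally, meaning the
    message must be passed to another MTA via SMTP.
    """
    local_part, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain in virtual_domains:
        return f"{local_part}@{domain}"   # separate namespace per virtual domain
    if domain in local_domains:
        return local_part                 # shared namespace: one mailbox per user
    return None


both = {"dom1.foo", "dom2.foo"}
# Normal local domains: both addresses reach the same mailbox.
print(mailbox_key("bob@dom1.foo", both, set()), mailbox_key("bob@dom2.foo", both, set()))
# Virtual domains: the two addresses are separate users.
print(mailbox_key("bob@dom1.foo", set(), both), mailbox_key("bob@dom2.foo", set(), both))
```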

The MTA must know how to deliver messages to users in its local domains. It must know what mailbox format to use, and where the mailbox should reside within the file system. (Most MTAs have a standard delivery mechanism, which can be overridden by a "dot file" in the user's home directory.) Virtual domains complicate this, because an MTA must handle each virtual domain user separately.

The MTA must know whether it should permit relaying, and if so, it must know what hosts are allowed to use the relay. If you are planning to use POP before SMTP, or SMTP AUTH, then special steps must be performed here. (Setting up POP before SMTP or SMTP AUTH is outside the scope of this document.)
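The simplest relay policy, based only on the connecting host's address, might look like the following sketch. The function name, the domain dom1.foo, and the network 192.168.1.0/24 are example values; POP before SMTP and SMTP AUTH are not modelled here.

```python
import ipaddress


def may_accept(client_ip, recipient_domain, local_domains, relay_networks):
    """Decide whether to accept a message from client_ip for recipient_domain."""
    if recipient_domain.lower() in local_domains:
        return True   # mail for our own users is accepted from anywhere
    client = ipaddress.ip_address(client_ip)
    # Relaying to other domains is allowed only for trusted networks.
    return any(client in ipaddress.ip_network(net) for net in relay_networks)


local = {"dom1.foo"}
trusted = ["192.168.1.0/24"]
print(may_accept("203.0.113.9", "dom1.foo", local, trusted))            # incoming mail
print(may_accept("192.168.1.20", "elsewhere.example", local, trusted))  # trusted client
print(may_accept("203.0.113.9", "elsewhere.example", local, trusted))   # stranger relaying
```

A host for which the last case also succeeded would be an open relay.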

If your MTA is inside a firewall, or is not going to be connected to the Internet permanently, then you might need to set it up to use some other host as a relay (or smart host). In this case, you will have to supply the IP address or host name of the relay you wish to use.
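The effect of configuring a smart host can be sketched as a small routing decision. The function name and the host name smtp.myisp.example are hypothetical; a real MTA that delivers directly would also look up the destination domain's mail exchanger in the DNS, which is omitted here.

```python
def next_hop(recipient_domain, local_domains, smart_host=None):
    """Return a (method, target) pair describing where a message goes next."""
    domain = recipient_domain.lower()
    if domain in local_domains:
        return ("local", domain)      # deliver to a local mailbox
    if smart_host:
        return ("smtp", smart_host)   # hand everything else to the relay
    return ("smtp", domain)           # deliver toward the destination's own MTA


print(next_hop("dom1.foo", {"dom1.foo"}, smart_host="smtp.myisp.example"))
print(next_hop("elsewhere.example", {"dom1.foo"}, smart_host="smtp.myisp.example"))
print(next_hop("elsewhere.example", {"dom1.foo"}))
```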

Finally, your MTA will generally have a set of behaviors and rules regarding how it rewrites addresses. These fall into a few categories, based on which address is considered:

Dynamic IP

All of the configuration shown so far has been applicable to normal Internet hosts: that is, ones with a static IP address and a fixed host name. With the popularity of Linux and *BSD on home systems recently, a lot of people are attempting to set up services on systems that have a dynamic IP address. This presents special challenges for all types of services, including e-mail.

By far, the easiest way to set up e-mail on a system with a dynamic IP address is to register with one of the dynamic DNS services on the Internet. This gives you a fixed host name, which is all you really need for running a mail server. If you want to have your own domain, that can also be done.

Your MTA will probably need to be able to resolve its own hostname when it starts up. If you rely solely on a dynamic DNS name, this generally means that you'll have to be connected to the Internet when you start your MTA. Obviously that's not a good way to configure your system, because you don't want to have to restart your MTA manually every time you reboot.

The easiest workaround is to add a line to your /etc/hosts file that assigns a made-up fully qualified domain name (FQDN) to your system's hostname. Even if you don't have a LAN, you can still use a private IP address (such as 192.168.1.1) in /etc/hosts. Here is an example:


  127.0.0.1	localhost
  192.168.1.1	MyHostname.local	MyHostname

Now, when your MTA tries to resolve MyHostname, it will get the IP address 192.168.1.1. When it looks up that IP address, it will get the FQDN MyHostname.local. Since this has a period in it, your MTA will be happy. (Apache and some FTP servers also require similar configuration.)
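The two lookups can be imitated with a small Python sketch that parses the same two lines. This is a simulation for illustration, not the system resolver: the parse_hosts function is made up for this example, and a real lookup may also consult the DNS.

```python
HOSTS_TEXT = """\
127.0.0.1\tlocalhost
192.168.1.1\tMyHostname.local\tMyHostname
"""


def parse_hosts(text):
    """Return (name -> address, address -> canonical name) from hosts-file text."""
    forward, reverse = {}, {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, split on whitespace
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        reverse.setdefault(address, names[0])    # the first name is the canonical one
        for name in names:
            forward.setdefault(name.lower(), address)
    return forward, reverse


forward, reverse = parse_hosts(HOSTS_TEXT)
address = forward["myhostname"]   # forward lookup of the short hostname
fqdn = reverse[address]           # reverse lookup of that address
print(address, fqdn, "." in fqdn)
```

On a live system, Python's socket.getfqdn() performs a comparable lookup through the real resolver.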

Note that if you're using a dynamic DNS name, the line(s) in /etc/hosts will probably be totally unrelated to the domain name that you use for e-mail. This is to be expected.

Envelope Sender

We've discussed the SMTP envelope in previous sections, but you may not have understood why it's so important, and what steps you may need to take if you encounter problems with it.

Normally, your MTA constructs the envelope sender address by combining your local user name (ID of the user invoking /usr/sbin/sendmail), an "@" sign, and whatever your MTA thinks your local domain name is. If you're on a system whose MTA has a real domain name (either because you registered it, or because you set up your MTA to do full domain masquerading), then this usually works fine. The address the MTA constructs will be a valid address.
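In Python terms, the default construction amounts to simple string concatenation. The sketch below adds a rough sanity check on the result. Both function names are made up for this example, and the check (a dot in the domain part) is only a heuristic: it cannot tell whether the address really exists.

```python
def default_envelope_sender(user, mail_domain):
    """Combine a local user name, an "@" sign, and the MTA's idea of its domain."""
    return f"{user}@{mail_domain}"


def has_qualified_domain(address):
    """Rough check: does the domain part contain at least one dot?"""
    _, _, domain = address.rpartition("@")
    return "." in domain.strip(".")


sender = default_envelope_sender("root", "example.org")
print(sender, has_qualified_domain(sender))
```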

However, this falls apart -- badly -- when you have a dynamic IP address, and haven't chosen to use a dynamic DNS service. The namespace of your ISP's e-mail addresses will not map one-to-one onto the namespace of your local users (in /etc/passwd). For example, if your hostname is myhost and you have internet service with myisp.com, and you send mail as root, then a naive MTA setup would generate messages with one of the following envelope sender addresses:

Out of these, only the second one is actually correct, and it's only correct for messages on your LAN; you can't send e-mail on the Internet like that. The third and fourth ones are flat-out wrong: the third one will probably give an invalid FQDN because myhost.myisp.com is not in the DNS; the fourth is even worse, because you are not root on your ISP's mail server, yet you're claiming to be. (Bounce messages are sent to the envelope sender address. If you send out messages with root@myisp.com in the envelope, and they get bounced for some reason, your ISP's mail administrator is going to get the bounces -- and probably either delete them, or complain to you.)

If you send messages with an invalid envelope sender address (one that's not a real address at all, such as the first one, which is a very common mistake), there are two possible outcomes: either the message will be accepted by the receiving MTA, or it will be rejected. If the receiving MTA accepts it, but cannot deliver it, it will try to bounce the message -- and will discover that it cannot do so. Therefore, it will either discard the message, or it will "double bounce" the message to the remote system's postmaster (who will probably delete it, since (s)he has no idea who "myhost" is). If the MTA rejects the message, you will see an error message almost immediately. In any case, you won't be able to send outgoing Internet e-mail until you fix your setup.

So what's the answer?

If you choose not to use a dynamic DNS service, you will have to change the envelope sender on your outgoing messages. This can be done in several different ways, depending on which MTA you use, which MUA you use, and your local policy decisions:

From an end user's point of view, a virtual user table combined with rewriting of the headers is probably the easiest approach, because the end user doesn't have to do anything. Of course, this means that the administrator (you) has a bit more work to do. It also prevents users from changing their envelope sender address easily, which could be either good or bad depending on your point of view. (A user who has multiple e-mail addresses and wants to use a different one depending on the recipient would probably prefer the flexibility of setting -f on a per-message basis. An administrator who has a group of college student users may not want them to be able to spoof their addresses quite so easily.)
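As a rough illustration of both ideas, the following sketch looks up a local user in a table and builds a sendmail command line that sets the envelope sender with -f. The table contents and the myisp.example addresses are hypothetical, and this wrapper approach is an assumption for the example: MTAs that support virtual user tables have their own formats for them.

```python
SENDMAIL = "/usr/sbin/sendmail"

# Hypothetical virtual user table: local account -> address valid on the Internet.
VIRTUAL_USERS = {
    "root": "jane.doe@myisp.example",
    "jane": "jane.doe@myisp.example",
}


def sendmail_argv(local_user, recipients, table=VIRTUAL_USERS):
    """Build a sendmail command line with an explicit envelope sender (-f)."""
    try:
        envelope_sender = table[local_user]
    except KeyError:
        raise LookupError(f"no external address configured for {local_user!r}") from None
    return [SENDMAIL, "-f", envelope_sender, *recipients]


print(sendmail_argv("root", ["bob@example.net"]))
```

The resulting list could be passed to subprocess.run() with the message supplied on standard input.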

If you haven't chosen an MTA yet, and you plan to use a dynamic IP setup without a dynamic DNS service, you should look carefully at the different MTAs and decide which one best fits your needs and your site policy.

POP3 and IMAP

A POP3 or IMAP server is extremely useful for sites where users will be reading their mail without shell access to the mail server. This describes just about every local area network (LAN) and every Internet service provider (ISP) today.

Although some MTAs may incorporate POP3 or IMAP service in their software, POP3 and IMAP are usually provided by separate programs. A POP3 server is a relatively simple beast: it simply listens for connections on TCP port 110, authenticates the user by username and password, then allows the user to list, retrieve, and delete mail from the mailbox. The normal POP3 user paradigm involves "downloading" all the messages from the mail server to a local PC, then deleting them from the mail server. Mail storage, backup, filtering, and so on are expected to be managed by the client. POP3 connections are typically short-lived.
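The "download, then delete" paradigm can be sketched with Python's standard poplib module. The function names are made up for this example, and error handling is omitted; treat it as an outline of the session rather than a finished client.

```python
import poplib


def download_and_delete(pop):
    """Fetch every message from an authenticated POP3 session, then delete it.

    `pop` is expected to behave like a logged-in poplib.POP3 object.
    Returns the messages as a list of bytes objects.
    """
    messages = []
    _response, listing, _octets = pop.list()          # one b"number size" entry each
    for entry in listing:
        number = int(entry.split()[0])
        _response, lines, _octets = pop.retr(number)  # message as a list of lines
        messages.append(b"\r\n".join(lines))
        pop.dele(number)                              # mark for deletion on the server
    pop.quit()                                        # deletions take effect at QUIT
    return messages


def fetch_mailbox(host, username, password):
    """Connect on the standard port (110), log in, and download the mailbox."""
    pop = poplib.POP3(host, 110)
    pop.user(username)
    pop.pass_(password)
    return download_and_delete(pop)
```

Calling fetch_mailbox() with a server name and credentials would run one short-lived session of the kind described above.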

An IMAP server is conceptually similar; but the protocol allows a much greater degree of subtlety (and efficiency) in how the client retrieves information from the server. The IMAP server listens on port 143, authenticates the user by username and password, and then interacts with the user's MUA over an extended period of time. Mail is often left on the mail server (although this is not required); IMAP connections generally last as long as the user's MUA is still running.
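By way of contrast with POP3, here is a sketch using Python's standard imaplib module that reads only the Subject headers of unread messages and leaves everything on the server. The function names are made up for this example, and error handling is omitted.

```python
import imaplib


def unseen_headers(imap, mailbox="INBOX"):
    """Return the Subject header block of each unread message.

    `imap` is expected to behave like a logged-in imaplib.IMAP4 object.
    Nothing is downloaded in full, deleted, or marked as read.
    """
    imap.select(mailbox, readonly=True)           # read-only: flags stay untouched
    _status, data = imap.search(None, "UNSEEN")   # data[0] looks like b"1 2 3"
    headers = []
    for number in data[0].split():
        _status, parts = imap.fetch(number.decode(),
                                    "(BODY.PEEK[HEADER.FIELDS (SUBJECT)])")
        for part in parts:
            if isinstance(part, tuple):           # (description, header bytes)
                headers.append(part[1])
    return headers


def open_session(host, username, password):
    """Connect on the standard port (143) and log in."""
    imap = imaplib.IMAP4(host, 143)
    imap.login(username, password)
    return imap
```

The object returned by open_session() can stay open and be queried repeatedly, which matches the long-lived connections described above.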

The coordination between a POP3 or IMAP server, and the MTA, is fairly loose. Ultimately, the only requirement is that they all agree on where the mail is stored, and in what format. It helps if they agree on the username, so that the POP3 login name is the same as the local part of the e-mail address; but this is not mandatory. (In fact, some POP3 implementations use the entire e-mail address as the username.) As long as the POP3 or IMAP server can find and read the correct user's mailbox, everything should be OK.
