N.B.: The following is the opinion of the author, and does not necessarily reflect the views of any other CSLab personnel.
What's in a name? More than you might think.
A computer can have any number of names. A server machine which supports several functions might be known under the aliases "ftp.xmpl.toronto.edu," "www.xmpl.toronto.edu," "wais.xmpl.toronto.edu," and so on, in addition to whatever "official" name it might have. In fact, most machines at the University of Toronto have at least two names: one in toronto.edu, and one in utoronto.ca.
However, a machine must also have one distinguished name, called its canonical hostname. This is the name which is returned when the system's IP address is looked up, and there are a number of situations in which the canonical name must be used. For example:
A number of departments at the University of Toronto prefer to use a partially qualified domain name as the canonical name for on-campus hosts, instead of the fully qualified name. For example, within the Department of Computer Science, the canonical name of one of the compute servers is the partially qualified domain name "qew.cs," rather than the fully qualified domain name "qew.cs.toronto.edu."
This is usually accomplished by configuring the department's systems to use the /etc/hosts file to perform name and address lookups before querying the Domain Name System (DNS), and by listing the on-campus hosts in that file using their short (partially qualified) names, like this:
# /etc/hosts file
127.0.0.1      localhost
128.100.1.13   qew.cs qew
128.100.1.15   dvp.cs dvp
# and so on, for lots and lots of other hosts on campus
There are a number of reasons why this is a really, really bad idea.
A consequence of using a local /etc/hosts file to produce different answers than DNS would give is that hosts which don't use your /etc/hosts file--that is, every host in the world outside the University--will think that your host's canonical name is different from what the host itself thinks its own canonical name is. With certain services, this can cause confusion. For example, part of the specification of the SMTP protocol is that the name presented on the HELO line must be the host's canonical name. If the machine foo.xmpl.toronto.edu is talking to an off-campus mail server, the conversation might begin like this:
220 mail.remote.com SMTP server ready, please send HELO
HELO foo.xmpl
551-Your hostname is foo.xmpl.toronto.edu, not foo.xmpl
551 HELO command rejected; please try again.
Of course, you could hack your mailer's configuration to hard-code the name it presents in HELO lines; however, this could interfere with its communication with your own mail server, since it will be expecting your host to use the partially qualified name.
In practice, most (but not necessarily all) mailers are somewhat more forgiving than this--usually they will either ignore the incorrect name offered on the HELO line, or accept it with a warning that it was not the name they were expecting. However, as more sites implement measures to prevent third-party relaying of mail, expect some of them to be that much more paranoid about sites which introduce themselves using a name that doesn't correspond to their IP address.
The problem is not restricted to mailers, either; any client-server software model may reasonably make the assumption that both client and server will get the same answer if they ask the same question.
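The HELO mismatch described above can be illustrated with a minimal Python sketch. It is not how any particular mailer is implemented; the two lookup tables, the address 128.100.253.7, and the function name check_helo are hypothetical stand-ins for the client's /etc/hosts view and the reverse mapping that the rest of the world sees.

```python
# Hypothetical data: each side's idea of the canonical name for one address.
LOCAL_HOSTS_FILE = {"128.100.253.7": "foo.xmpl"}            # client's /etc/hosts view
PUBLIC_DNS_PTR = {"128.100.253.7": "foo.xmpl.toronto.edu"}  # what off-campus hosts see


def check_helo(helo_name, peer_ip, ptr_table):
    """Return an SMTP-style reply after comparing the HELO name with the reverse lookup."""
    expected = ptr_table.get(peer_ip)
    if expected is None:
        # No reverse mapping at all: this sketch simply accepts the name.
        return "250 Hello %s (no reverse mapping found)" % helo_name
    if helo_name.lower() == expected.lower():
        return "250 Hello %s" % helo_name
    return "551 Your hostname is %s, not %s" % (expected, helo_name)


client_ip = "128.100.253.7"
announced = LOCAL_HOSTS_FILE[client_ip]   # the client trusts its own hosts file
reply = check_helo(announced, client_ip, PUBLIC_DNS_PTR)
print(reply)
```

A strict server modelled this way rejects "foo.xmpl" and accepts "foo.xmpl.toronto.edu"; a more forgiving one would log the mismatch and carry on.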
Conversely, when a service on your own machine looks up an IP address, there are two different forms that the answer it gets back might take: it might be a partially qualified name found from your /etc/hosts file, or it might be a fully qualified name fetched from the DNS.
This can be a serious obstacle when trying to set up control mechanisms for services which are only permitted for local machines. For instance, many departments use Wietse Venema's tcp_wrappers package, and the tcpd program which comes with it, in order to control access to services. However, if some names are partially qualified and other names are fully qualified, you can't effectively implement access control based on the canonical name of the machine. For instance, you can't simply put the following line into your /etc/hosts.allow file:
in.rlogind: .xmpl.toronto.edu
because the canonical names for your hosts end in .xmpl, not .xmpl.toronto.edu!
If you wanted to refer to your own hosts in an /etc/hosts.allow file, you would have to permit ".xmpl" rather than ".xmpl.toronto.edu". This is, at best, imprecise, and at worst, a security hole. If your department's name happens to coincide with a top-level domain (or if your name servers can be fooled into believing such a top-level domain exists), you may be extending your service to more hosts than you intended. For example, if your department extends services to the Numerical Analysis group and you add ".na" to your /etc/hosts.allow file, you are granting access not only to those local hosts which are partially qualified as "*.na", but also to those hosts in the Republic of Namibia whose fully qualified names are in the top-level .na domain.
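The suffix-matching hazard can be seen in a small Python sketch. It is a simplified model of a leading-dot host pattern (match on the trailing components of the name), not the tcp_wrappers code itself; the function name and the hostnames "sparc1.na" and "www.example.na" are hypothetical.

```python
def suffix_pattern_matches(pattern, hostname):
    """Simplified model: a pattern starting with '.' matches on the end of the name."""
    pattern = pattern.lower()
    hostname = hostname.lower()
    if not pattern.startswith("."):
        # Without a leading dot, require an exact name match.
        return pattern == hostname
    return hostname.endswith(pattern)


# A partially qualified local host and a fully qualified host in the .na TLD.
for host in ["sparc1.na", "www.example.na", "foo.xmpl", "foo.xmpl.toronto.edu"]:
    print(host, suffix_pattern_matches(".na", host),
          suffix_pattern_matches(".xmpl.toronto.edu", host))
```

In this model ".na" admits both the local "sparc1.na" and the off-campus "www.example.na", while ".xmpl.toronto.edu" fails to admit a host whose canonical name is only "foo.xmpl".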
While it is still possible to control access to services based on IP addresses (e.g., "in.rlogind: 128.100.253"), there are many situations where it is more convenient to do so by name--such as when a few of your department's hosts happen to be connected through other networks, or when you need to put a host on your network which is not supposed to have access to all services. If you want to make a service available to the entire University of Toronto community, it's much easier to allow ".toronto.edu" and ".utoronto.ca" than to try to maintain a correct list of which IP networks belong to U of T; you can't do this if the canonical names are only partially qualified.
There is a particular technical problem which arises when combining the /etc/hosts mechanism with hosts which have network interfaces on more than one network. Such machines are called "multi-homed hosts."
The problem is that when a hostname is looked up in the /etc/hosts file, the search ends when the first match is found. This means that if a host has two or more IP addresses, a name lookup will only find the first one that appears in /etc/hosts.
This is particularly problematic when combined with the tcpd program, or other daemons which verify that the hostname and IP address are consistent (such as IRIX's rlogind daemon with the -a option).
By way of example, the compute server qew.cs.toronto.edu has three IP addresses: 128.100.1.13, 128.100.2.15, and 128.100.3.10. Suppose qew.cs.toronto.edu tries to connect, from a source address of 128.100.3.10, to a tcpd-wrapped service on a host which lists it as "qew.cs" in its /etc/hosts file:
Note that this problem has nothing directly to do with the use of short names versus long names; the problem is the use of the /etc/hosts mechanism instead of DNS, which will return all of the IP addresses associated with a given name.
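The multi-homed failure can be sketched in Python as follows. This is only a model of the consistency check, under the assumption that the server's /etc/hosts file lists all three of qew's addresses under the name "qew.cs"; the function names are hypothetical.

```python
# Assumed contents of the server's /etc/hosts file (addresses from the example).
HOSTS_FILE = [
    ("128.100.1.13", "qew.cs"),
    ("128.100.2.15", "qew.cs"),
    ("128.100.3.10", "qew.cs"),
]


def addr_to_name(ip):
    """Address-to-name lookup: first line whose address matches."""
    for addr, name in HOSTS_FILE:
        if addr == ip:
            return name
    return None


def name_to_addrs_first_match(name):
    """Models the hosts-file lookup described above: stop at the first match."""
    for addr, entry_name in HOSTS_FILE:
        if entry_name == name:
            return [addr]
    return []


def name_to_addrs_all(name):
    """Models a DNS lookup that returns every address for the name."""
    return [addr for addr, entry_name in HOSTS_FILE if entry_name == name]


def consistent(ip, forward_lookup):
    """Check that the name found for ip maps back to a set containing ip."""
    name = addr_to_name(ip)
    return name is not None and ip in forward_lookup(name)


source = "128.100.3.10"
print(consistent(source, name_to_addrs_first_match))
print(consistent(source, name_to_addrs_all))
```

With the first-match lookup, the name "qew.cs" maps back only to 128.100.1.13, so a connection from 128.100.3.10 fails the check; when every address is returned, the same connection passes.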
U of T has grown over the years: today, a complete /etc/hosts file which lists every host on campus is over 23,000 lines long and over 650KB in size.
This file must be searched linearly from the beginning every time an application wants to look up a name or an IP address. This can amount to an appreciable delay for services which accept connections frequently, such as web servers. (Furthermore, whenever the machine being looked up is off-campus, the entire /etc/hosts file must be searched all the way to the end before the server will fall back to DNS.) The impact on the performance of your servers can be very noticeable; on a Sun SparcStation 5/85, it takes about a second and a half for the resolver library to read through and parse an /etc/hosts file of that size.
It is far more efficient to use DNS for all lookups, and keep only a skeleton /etc/hosts file for use while booting and when in single-user mode.
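The cost of the linear search can be illustrated with a short Python sketch. The table is synthetic (23,000 made-up entries to mirror the file size quoted above), and the function names are hypothetical; it counts lines examined rather than measuring real resolver time.

```python
def build_hosts(n):
    """Build a synthetic hosts table with n made-up entries."""
    table = []
    for i in range(n):
        addr = "10.%d.%d.%d" % ((i >> 16) & 255, (i >> 8) & 255, i & 255)
        table.append((addr, "host%d.xmpl" % i))
    return table


def linear_lookup(table, ip):
    """Scan from the top; return (name, number_of_lines_examined)."""
    examined = 0
    for addr, name in table:
        examined += 1
        if addr == ip:
            return name, examined
    return None, examined


table = build_hosts(23000)
_, cost_first = linear_lookup(table, "10.0.0.0")    # first entry in the table
_, cost_miss = linear_lookup(table, "192.0.2.1")    # an address not in the table
print(cost_first, cost_miss)
```

A lookup for an address that is not in the table, which is the case for every off-campus host, has to examine every line before it can give up.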
One argument which is sometimes put forward for using partially qualified canonical names instead of fully qualified names is that the short names are easier to type.
However, in practice, this really isn't much of an issue. There are only a handful of cases where the full name needs to be used; typically, these involve cases where the IP address is mapped to a name.
Using fully qualified canonical names does not mean that your users have to start typing "telnet foo.xmpl.toronto.edu" instead of "telnet foo.xmpl" in order to connect to a local machine. When your hosts are mapping names to addresses via DNS, they do so according to the search rules explained in the resolver(5) or resolv.conf(5) manual page. For example, if your /etc/resolv.conf file lists the line:
search xmpl.toronto.edu toronto.edu utoronto.ca
then "telnet foo" or "telnet foo.xmpl" will both still work, just as they did before.
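To see why the short names keep working, here is a rough Python sketch of how a search list expands a typed name into candidates. It is a simplified model, not the resolver's actual code: the ordering rule (try the name as typed first when it already contains at least ndots dots) is an assumption based on common resolver behaviour, and the function name is hypothetical.

```python
def candidate_names(name, search, ndots=1):
    """Return the fully qualified candidates to try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot means the name is already fully qualified.
        return [name]
    with_search = [name + "." + domain + "." for domain in search]
    as_typed = [name + "."]
    if name.count(".") >= ndots:
        return as_typed + with_search
    return with_search + as_typed


SEARCH = ["xmpl.toronto.edu", "toronto.edu", "utoronto.ca"]
for typed in ["foo", "foo.xmpl"]:
    print(typed, candidate_names(typed, SEARCH))
```

Both "foo" and "foo.xmpl" produce "foo.xmpl.toronto.edu." among their candidates, so either form reaches the same host.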
The only places where the canonical name must be used are those places where the name is compared against the result of an IP address lookup, such as /etc/hosts.equiv, ~/.rhosts, /etc/hosts.allow and /etc/hosts.deny, and so on. For most other situations, the short names work whether they're canonical or not.
This article deals with the question of whether the hostnames returned from /etc/hosts should be the same as the ones returned by DNS. (In case you're not paying attention, the answer to that question is "yes.")
There is also a related question: does the host's name as set by the /bin/hostname command need to correspond with its canonical name and/or its fully qualified domain name?
Strictly speaking, it doesn't have to; however, the comp.protocols.tcp-ip-domains Frequently Asked Questions (FAQ) list recommends that the Unix kernel's view of the hostname be set to the fully qualified domain name of the system. If you're changing your /etc/hosts file, you may as well change the name used by /bin/hostname while you're at it. The one thing to watch for is existing software or scripts which assume that the name returned by the kernel is partially qualified, and try to qualify it with "toronto.edu" or "utoronto.ca."
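A script that qualifies the hostname itself can be made safe against this change by checking whether the name is already qualified. The following Python sketch shows one way to do that; the function name, the KNOWN_DOMAINS tuple, and the default domain are hypothetical choices for illustration.

```python
KNOWN_DOMAINS = ("toronto.edu", "utoronto.ca")


def qualify(hostname, default_domain="toronto.edu"):
    """Append default_domain only when the name is not already fully qualified."""
    host = hostname.rstrip(".").lower()
    for domain in KNOWN_DOMAINS:
        if host == domain or host.endswith("." + domain):
            return host   # already ends in a known domain; leave it alone
    return host + "." + default_domain


print(qualify("qew.cs"))
print(qualify("qew.cs.toronto.edu"))
```

A script written this way gives the same result whether the kernel reports "qew.cs" or "qew.cs.toronto.edu", instead of producing "qew.cs.toronto.edu.toronto.edu" after the hostname is changed.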
A canonical hostname should be just that: canonical. Using partially qualified names for local hosts is a shortsighted practice which ignores many of the realities of a global Internet, and introduces many more problems than it purports to solve.