4 Implementation of `hlfsd`

We used a prototype NFS server, and implemented only the operations that were needed. We generated NFS stubs using rpcgen. The server was developed first under SunOS version 4.1.2. This server was incorporated into the amd source tree, and we used some of amd's sources as utility functions, since they are well-written to handle a variety of architectures and operating systems. (See Section 4.7 for source code availability.)

4.1 The ``Home-Link'' File Service

This subsection includes technical details of the NFS operations and may be skipped. However, it provides an example of the design and implementation of a small special-purpose NFS server and may be of use to others.

HLFS is a read-only filesystem, and as such, all operations that require write access return the error code NFSERR_ROFS (``Read-Only Filesystem''): setattr, write, create, remove, rename, link, unlink, symlink, mkdir, and rmdir. Trivially implemented were the null, root, and writecache operations. We decided to have statfs return some value (all zeros in most cases). The read operation simply returns NFSERR_ACCES (``Permission Denied'').
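The fixed replies described above can be summarized in a small dispatch routine. The following is a minimal sketch, not the code from hlfsd itself: the function name `hlfs_fixed_status` is hypothetical, while the procedure numbers and status values are those of NFS version 2 (RFC 1094). Treating null, root, writecache, and statfs as plain "success" is a simplification made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* NFS version 2 status codes used here (values from RFC 1094). */
enum nfsstat { NFS_OK = 0, NFSERR_ACCES = 13, NFSERR_ROFS = 30 };

/* NFS version 2 procedure numbers (RFC 1094). */
enum nfsproc {
    NFSPROC_NULL = 0, NFSPROC_GETATTR = 1, NFSPROC_SETATTR = 2,
    NFSPROC_ROOT = 3, NFSPROC_LOOKUP = 4, NFSPROC_READLINK = 5,
    NFSPROC_READ = 6, NFSPROC_WRITECACHE = 7, NFSPROC_WRITE = 8,
    NFSPROC_CREATE = 9, NFSPROC_REMOVE = 10, NFSPROC_RENAME = 11,
    NFSPROC_LINK = 12, NFSPROC_SYMLINK = 13, NFSPROC_MKDIR = 14,
    NFSPROC_RMDIR = 15, NFSPROC_READDIR = 16, NFSPROC_STATFS = 17
};

/*
 * Hypothetical helper: if proc always gets the same reply from a
 * read-only server, store that reply in *status and return true.
 * Return false for the four operations that need real work
 * (readdir, getattr, lookup, readlink).
 */
bool hlfs_fixed_status(enum nfsproc proc, enum nfsstat *status)
{
    assert(status != NULL);
    switch (proc) {
    case NFSPROC_SETATTR: case NFSPROC_WRITE:  case NFSPROC_CREATE:
    case NFSPROC_REMOVE:  case NFSPROC_RENAME: case NFSPROC_LINK:
    case NFSPROC_SYMLINK: case NFSPROC_MKDIR:  case NFSPROC_RMDIR:
        *status = NFSERR_ROFS;      /* anything that writes */
        return true;
    case NFSPROC_READ:
        *status = NFSERR_ACCES;     /* there are no regular files to read */
        return true;
    case NFSPROC_NULL: case NFSPROC_ROOT: case NFSPROC_WRITECACHE:
    case NFSPROC_STATFS:
        *status = NFS_OK;           /* trivially implemented (simplified) */
        return true;
    default:
        return false;
    }
}
```

A server loop could call this helper first and fall through to the real handlers only when it returns false.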

The remaining operations are the heart of this filesystem: readdir, getattr, lookup, and readlink.

Our server must distinguish between the directory and the link, so we assigned them different integers to serve as filehandles. Note that these need not be as complicated as the filehandles usually generated by NFS. They need only to be unique, and their value is meaningful only to the server.

4.1.1 The readdir Operation

A readdir on the directory returns the ``.'' and ``..'' entries, and one symbolic link, home. Attempting to readdir on the symbolic link results in an NFSERR_NOTDIR (``Not a Directory'') error. Any other filehandle is considered stale.

4.1.2 The getattr Operation

Getattr returns r-xr-xr-x for the ``.'' and ``..'' directories. The link itself, named home by default, is protected as rwxrwxrwx. It does not matter for the link that it is world-writable. The modification and creation times for the link and directories are the startup time of the server. If the effective gid of the process is HLFS_GID, then some fixed valid attributes are returned. Any other filehandle given to hlfsd is considered stale and the NFSERR_STALE (``Stale Filehandle'') result code is returned.
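As an illustration of the attribute rules above, here is a minimal sketch. The filehandle values (`HLFS_FH_DIR`, `HLFS_FH_HOME`), the reduced `struct hlfs_attr`, and the function name are assumptions made for this example; the real NFS attribute structure has more fields, and the HLFS_GID case is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

enum { NFS_OK = 0, NFSERR_STALE = 70 };       /* RFC 1094 values */
enum nfs_ftype { NFDIR = 2, NFLNK = 5 };      /* RFC 1094 file types */

/* Hypothetical filehandles: small integers, meaningful only to the server. */
enum { HLFS_FH_DIR = 1, HLFS_FH_HOME = 2 };

/* Reduced attribute record, for illustration only. */
struct hlfs_attr {
    enum nfs_ftype type;
    unsigned       mode;
    time_t         mtime;
    time_t         ctime;
};

/* Fill *out for a known filehandle; anything else is stale. */
int hlfs_getattr(int fh, time_t startup, struct hlfs_attr *out)
{
    assert(out != NULL);
    switch (fh) {
    case HLFS_FH_DIR:                 /* "." and "..": r-xr-xr-x */
        out->type = NFDIR;
        out->mode = 0555;
        break;
    case HLFS_FH_HOME:                /* the link: rwxrwxrwx */
        out->type = NFLNK;
        out->mode = 0777;
        break;
    default:
        return NFSERR_STALE;
    }
    /* Both timestamps are the server's startup time. */
    out->mtime = startup;
    out->ctime = startup;
    return NFS_OK;
}
```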

4.1.3 The lookup Operation

We only allow lookups in the ``.'' and ``..'' directories, both of which return the same values. Trying to lookup ``in'' the link results in an NFSERR_NOTDIR (``Not a Directory'') error code. Any name not known to the server returns an NFSERR_NOENT (``No Such Entry'') error, unless the gid of the requesting process is HLFS_GID and the name corresponds to a valid user. In this case the username for that user is used in the returned filehandle, allowing the readlink operation to return the correct link. Any other filehandle is considered stale.

4.1.4 The readlink Operation

This is the most important operation, the central point of this work. We get the uid number from the credentials sent with the RPC operation. We make sure that Unix Authentication or DES is used or else we return the NFSERR_PERM (``Not Owner'') code.

If the gid of the accessing process is not HLFS_GID, the value we return for the symbolic link named home is a string representing the home directory of the user whose uid we just found, concatenated with a fixed component name representing a subdirectory within it. We use a binary search on the lookup table to quickly get the right pathname. If multiple password database entries share the same uid but list different home directories, any one of those home directories may be returned. Only uid 0 is guaranteed to return ``/''. See also Section 5.3.
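The uid lookup can be sketched with the standard `qsort` and `bsearch` routines. This is an illustration under assumptions, not the table code from hlfsd: `struct uid_entry` and the function names are hypothetical, and the table is assumed to be sorted once after it is loaded.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical table entry: one uid and the home directory it maps to. */
struct uid_entry {
    uid_t       uid;
    const char *home;
};

/* Order entries by uid; used for both sorting and searching. */
static int cmp_uid(const void *a, const void *b)
{
    uid_t x = ((const struct uid_entry *)a)->uid;
    uid_t y = ((const struct uid_entry *)b)->uid;
    return (x > y) - (x < y);
}

/* Sort the table once after (re)loading it. */
void uid_table_sort(struct uid_entry *table, size_t n)
{
    assert(table != NULL || n == 0);
    qsort(table, n, sizeof *table, cmp_uid);
}

/* Binary search for uid; returns its home directory or NULL. */
const char *uid_to_home(const struct uid_entry *table, size_t n, uid_t uid)
{
    struct uid_entry key;
    const struct uid_entry *hit;

    key.uid = uid;
    key.home = NULL;
    hit = bsearch(&key, table, n, sizeof *table, cmp_uid);
    return hit != NULL ? hit->home : NULL;
}
```

If two entries share a uid, `bsearch` may return either of them, which matches the behavior described above for duplicate uid numbers.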

If the symbolic link is named home and the gid is HLFS_GID, we return a link to ``.'', which causes hlfsd to be used to resolve the next pathname component. This is designed to maintain functionality of programs such as from. If the symbolic link is not named home and the gid of the accessing process is HLFS_GID, we return a value pointing to the user's mailbox file in their mail spool directory. To do this, we extract the username from the filehandle, which was returned by the lookup operation. See Table 2.

Trying to readlink on one of the two directories results in an NFSERR_ISDIR (``Is a directory'') error. Anything else is a stale filehandle.
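The readlink cases described in this subsection can be collected into one decision routine. The sketch below is hedged in several ways: the `fh_kind` classification and the function name are hypothetical, the authentication check that yields NFSERR_PERM is omitted, and the mailbox path for the HLFS_GID case is assumed to be the user's spool subdirectory followed by the username. The subdirectory name `.mailspool` is the one that appears in the conditions for Table 3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* RFC 1094 status values. */
enum { NFS_OK = 0, NFSERR_ISDIR = 21, NFSERR_NAMETOOLONG = 63,
       NFSERR_STALE = 70 };

/* Hypothetical classification of the filehandle being read. */
enum fh_kind { FH_DIR, FH_HOME, FH_USER, FH_UNKNOWN };

#define SPOOL_SUBDIR ".mailspool"

/*
 * Compute the link value into buf. is_hlfs_gid is nonzero when the
 * caller's gid is HLFS_GID; home_dir is the caller's (or user's) home.
 */
int hlfs_readlink(enum fh_kind kind, int is_hlfs_gid, const char *home_dir,
                  const char *username, char *buf, size_t len)
{
    int n;

    assert(buf != NULL && len > 0 && home_dir != NULL);
    switch (kind) {
    case FH_DIR:
        return NFSERR_ISDIR;
    case FH_HOME:
        if (is_hlfs_gid)    /* send the next component back to hlfsd */
            n = snprintf(buf, len, ".");
        else                /* ordinary delivery: ~user/.mailspool */
            n = snprintf(buf, len, "%s/%s", home_dir, SPOOL_SUBDIR);
        break;
    case FH_USER:           /* per-user handle, created by lookup */
        if (!is_hlfs_gid || username == NULL || strlen(username) == 0)
            return NFSERR_STALE;
        n = snprintf(buf, len, "%s/%s/%s", home_dir, SPOOL_SUBDIR, username);
        break;
    default:
        return NFSERR_STALE;
    }
    /* Report truncation instead of returning a partial path. */
    return (n < 0 || (size_t)n >= len) ? NFSERR_NAMETOOLONG : NFS_OK;
}
```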

4.2 Execution Flow

At initialization time, hlfsd creates a UDP service and forks a child. The child builds the uid lookup table and sets up signal handlers and interval timers. The signal handlers reload the lookup table when the interval timer expires, or when a SIGHUP is sent to the server (presumably by a superuser). A special cleanup handler is set up for SIGTERM, to ensure the server terminates cleanly. Then the svc_run() routine is invoked.

Meanwhile, the parent waits for the child to initialize. When it does, the parent mounts the server on the mount point. It is of utmost importance to make sure the attribute cache is turned off. If the attribute cache is not turned off, successive accesses to /mail/home would return previously computed pathnames pointing to another user's mail, resulting in mail loss or misdelivery. If it is not possible to turn off the attribute cache, hlfsd will exit. However, the SA has the option to force hlfsd to continue running and set the attribute cache to as short an interval as possible (see also Section 5.3). At this point the parent terminates, leaving the child to run.

When an interval timer goes off (SIGALRM) or a SIGHUP is sent to the server, the server forks a child that continues serving, while the parent reloads the lookup table. When the parent is finished loading, it sends a SIGKILL to the child process, and resumes serving. When a SIGTERM is received, the server forks a child that continues serving, while it tries to unmount the filesystem. If and when that succeeds, both parent and child exit.
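The reload sequence above (fork a stand-in, reload in the parent, kill the stand-in) can be sketched as follows. The function names are hypothetical, and `idle_serve` and `noop_reload` are placeholders for the real service loop and table loader; they exist only so the sketch is complete.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder service loop: a real server would answer RPCs here. */
void idle_serve(void)
{
    for (;;)
        pause();
}

/* Placeholder loader: a real one would rebuild the uid table. */
int noop_reload(void)
{
    return 0;
}

/*
 * Fork a child that keeps serving with the old table while the
 * parent reloads. When the reload finishes, kill and reap the child.
 * Returns the loader's result, or -1 if fork failed.
 */
int reload_with_standin(void (*serve)(void), int (*reload)(void))
{
    pid_t child;
    int rc;

    assert(serve != NULL && reload != NULL);
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        serve();            /* stand-in; normally never returns */
        _exit(0);
    }
    rc = reload();          /* parent: potentially slow */
    kill(child, SIGKILL);   /* stand-in is no longer needed */
    waitpid(child, NULL, 0);
    return rc;              /* caller resumes serving */
}
```

Reaping the child with `waitpid` is an addition made here to avoid leaving a zombie process; the text above does not say how hlfsd handles this.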

As mail service is very important, we wanted to make hlfsd as robust as possible. We could have designed it as another amd ``filesystem type'', but decided that a separate daemon provides better reliability and faster service. In general, we try to do as much as possible: we make sure filesystems are accessible and contain some disk space to have mail delivered there. Where directories are expected we make sure there are no files by these names; where symbolic links are expected, we make sure there are no real files or directories with the same name. Whenever possible, we create directories, with proper ownership and permissions. We even check that the mount point for hlfsd is world readable and executable, since if it isn't, getwd("..") might fail.

4.3 Alternate Mail Spool Directories

Hlfsd tries to ensure that the user's home directory is accessible. Periodically it also tests that it can be written into (Section 4.5). If for any reason a failure occurs, hlfsd repoints the symbolic link for that user to an alternate local directory, which is presumably highly available. We use /var/spool/alt_mail in our environment. See Table 3.

Conditions: Any uid, gid $\neq$ HLFS_GID, and `~USER/.mailspool/` is not writable.

**Table 3:** Resolving `/var/mail/`*NAME* to `/var/alt_mail/`*NAME*

| Resolving component | Pathname left | Value if symbolic link |
|---|---|---|
| `/` | `var/mail/`NAME | |
| `var/` | `mail/`NAME | |
| `mail@` | `/mail/home/`NAME | `mail@` $\Rightarrow$ `/mail/home` |
| `/` | `mail/home/`NAME | |
| `mail/` | `home/`NAME | |
| `home@` | NAME | `home@` $\Rightarrow$ `/var/alt_mail` |
| `/` | `var/alt_mail/`NAME | |
| `var/` | `alt_mail/`NAME | |
| `alt_mail/` | NAME | |
| NAME | | |

When hlfsd starts up, before it mounts itself on top of the mount point (hiding anything underneath), it creates a fixed symbolic link to the alternate spool directory if one does not already exist. This ensures that /var/spool/mail is never a ``dangling'' symbolic link: it points to a real directory at all times, even after hlfsd terminates. While hlfsd runs, it hides this fixed symbolic link and provides our ``dynamic'' symbolic link instead. This trick at least provides us with an alternate place to deliver mail when things go wrong, rather than bouncing or dropping the mail.

A cron job on our systems checks the alternate mail spool directory several times a day. Any messages found there are resent to their rightful owners. The remailing script can be run as often as needed. Each invocation of the script deals only with newly lost mail since the previous invocation; the script locks and renames the lost mailbox file to a unique name, before parsing and remailing it.

Similar to amd, hlfsd can log debugging and various status information to a designated log file or using the syslog[22] facility. The SA may choose to watch these log files and facilities and be notified when serious problems occur such as a full filesystem.

4.4 Avoiding Hangs

As described in Section 4.2, hlfsd forks a child at any point where we suspect that an operation might hang. If, for example, the home machine of the user is down, and the filesystem on a client is hard-mounted, hlfsd will hang until the remote server is back up. Performing these operations in the background provides added reliability, an idea taken from amd.

4.5 Disk Space Problems

Hlfsd checks whether the user's home directory is full or the user has exceeded their quota. It attempts to create and then remove a simple nonzero-length file in the user's spool directory, with the effective uid set to that of the user. If that fails, it instead returns the name of the alternate spool directory as the value of the home symbolic link. Otherwise mail might be dropped or bounced.
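A write probe of this kind can be sketched as below. This is an illustration, not the hlfsd code: the function names and the probe file name are hypothetical, and switching the effective uid to that of the user (which requires superuser privileges) is left out. The result of `close` is checked because on some filesystems a full disk or exceeded quota is reported only when the data is flushed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if a one-byte file can be created, written, and removed
 * in dir; return 0 otherwise.
 */
int spool_is_writable(const char *dir)
{
    char path[1024];
    int fd, n, ok;

    assert(dir != NULL);
    /* Hypothetical probe name; the pid keeps concurrent probes apart. */
    n = snprintf(path, sizeof path, "%s/.hlfsd_probe.%ld", dir,
                 (long)getpid());
    if (n < 0 || (size_t)n >= sizeof path)
        return 0;

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return 0;
    ok = (write(fd, "x", 1) == 1);   /* nonzero-length file */
    if (close(fd) != 0)
        ok = 0;
    unlink(path);                    /* always clean up the probe */
    return ok;
}

/* Pick the user's spool if it passes the probe, else the alternate. */
const char *choose_spool(const char *user_spool, const char *alt_spool)
{
    return spool_is_writable(user_spool) ? user_spool : alt_spool;
}
```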

Each success or failure result is recorded in hlfsd and kept there for a specified number of seconds. Until the entry ``times out'', the cached result is used and no expensive fork is required; after that, a new backgrounded lookup is performed. This simple caching feature of hlfsd has greatly improved its performance and reliability. See also Section 5.3.
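A minimal sketch of such a timed cache entry follows. The structure and function names are hypothetical, and the current time is passed in explicitly so the logic is easy to follow; a real server would keep one entry per user and obtain the time from `time()`.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-user cache entry for the last probe result. */
struct probe_cache {
    int    valid;        /* has any result been stored yet? */
    int    writable;     /* last probe result: 1 = ok, 0 = failed */
    time_t checked_at;   /* when that result was obtained */
};

/*
 * Return 1 and store the cached result in *writable if the entry is
 * still fresh; return 0 if a new (expensive) probe is required.
 */
int cache_lookup(const struct probe_cache *c, time_t now, time_t ttl,
                 int *writable)
{
    assert(c != NULL && writable != NULL);
    if (!c->valid || now - c->checked_at >= ttl)
        return 0;                   /* never filled, or timed out */
    *writable = c->writable;
    return 1;
}

/* Record the outcome of a fresh probe. */
void cache_store(struct probe_cache *c, time_t now, int writable)
{
    assert(c != NULL);
    c->valid = 1;
    c->writable = writable;
    c->checked_at = now;
}
```

Both successes and failures are stored, so a user whose spool is unwritable does not trigger a new fork on every access during the timeout period.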

4.6 Lock Files

An alternative design for hlfsd is to mount it directly on top of the mail spool directory, instead of having the mail spool directory be a symbolic link to another link (home) within the HLFS, which points to a real subdirectory of the user's home. With some modifications to the server, we could have made all of the users' mailbox files point to the right place, but this design suffered from serious drawbacks:

- The spool directory would no longer be a regular directory. It would have to be managed by hlfsd. This would require the implementation of more NFS operations.
- The user's spool file would not be a regular file, but a symbolic link to one. Some mail programs remove that file without checking whether it is a symbolic link, so the symbolic link would be removed. We would have had to change the server so that removing the symbolic link would first follow it and remove the file it was pointing to. The same goes for all operations that require access to the user's mail spool file.
- The worst problem was that different UAs and MTAs use different methods for locking the mail file. Some of them create temporary files named ${USER}.lock; others use the mktemp library call to generate unique names. Our method avoids the need to figure out all the different methods used for locking mail files and for naming temporary files.

An alternate way to avoid the need for lock files is to deliver mail one message per file, as is done in systems such as INN[19] and NNTP[10]; however, this would require modifications to all UAs and MTAs.

4.7 Source Code Size, Availability, and Portability

Hlfsd is less than 2500 lines of C code, including comments and whitespace. However, it makes use of almost 4000 lines of code from the amd distribution itself.

Hlfsd is available in source form as part of a special distribution of amd. It can be retrieved via anonymous ftp from ftp.cs.columbia.edu in the directory /pub/amd.

Hlfsd is built as part of the special distribution of amd available from our ftp server, and it is almost as portable as amd itself. Only the lack of access to certain machines stopped us from porting hlfsd to the numerous platforms amd runs on. At the time of writing, hlfsd has been successfully ported to, and runs on, SunOS 4.1.3, HP-UX 9.0.1, and Solaris 2.2. These represent the three main system types amd runs on and span most Unix flavors: a BSD-style system, an SVR-BSD hybrid, and a system very close to SVR4, respectively.

Erez Zadok
12/6/1997