HLFS is a read-only filesystem, so every operation that requires write access returns the error code NFSERR_ROFS (``Read-Only Filesystem''): setattr, write, create, remove, rename, link, symlink, mkdir, and rmdir. The null, root, and writecache operations are implemented trivially. We decided to have statfs return some value (all zeros in most cases). The read operation simply returns NFSERR_ACCES (``Permission Denied'').
The remaining operations are the heart of this filesystem: readdir, getattr, lookup, and readlink.
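The division between the trivial operations and the four substantive ones can be illustrated with a minimal sketch. This is not hlfsd's code: the function name `trivial_status` and the use of procedure names as strings are assumptions made for illustration, while the numeric status values are those of the NFS version 2 protocol.

```python
# NFS version 2 status codes.
NFS_OK = 0
NFSERR_ACCES = 13
NFSERR_ROFS = 30

# Procedures that would modify the filesystem.
WRITE_PROCS = {"setattr", "write", "create", "remove", "rename",
               "link", "symlink", "mkdir", "rmdir"}
# Procedures that need no real work.
TRIVIAL_PROCS = {"null", "root", "writecache"}


def trivial_status(proc):
    """Return the fixed status of a procedure that needs no real logic.

    Returns None for statfs and for the four core procedures (readdir,
    getattr, lookup, readlink), which are handled separately.
    """
    if proc in WRITE_PROCS:
        return NFSERR_ROFS
    if proc in TRIVIAL_PROCS:
        return NFS_OK
    if proc == "read":
        return NFSERR_ACCES
    return None
```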
Our server must distinguish between the directory and the link, so we assigned them different integers to serve as filehandles. Note that these need not be as complicated as the filehandles usually generated by NFS. They need only be unique, and their value is meaningful only to the server.
A readdir on this directory returns the ``.'' and ``..'' entries and one symbolic link, home. Attempting to readdir on the symbolic link results in an NFSERR_NOTDIR. Any other filehandle is treated as stale.
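A minimal sketch of this readdir behavior follows. The filehandle values `ROOT_FH` and `LINK_FH` are hypothetical (the text only requires that they be distinct integers), and the return convention of a status paired with a list of names is an assumption of the sketch.

```python
# NFS version 2 status codes.
NFS_OK = 0
NFSERR_NOTDIR = 20
NFSERR_STALE = 70

# Hypothetical filehandle values; they only need to be unique.
ROOT_FH = 1   # the directory served at the mount point
LINK_FH = 2   # the symbolic link inside it


def readdir(fh, link_name="home"):
    """Return (status, entry names) for a readdir on filehandle fh."""
    if fh == ROOT_FH:
        return NFS_OK, [".", "..", link_name]
    if fh == LINK_FH:
        return NFSERR_NOTDIR, []      # a link cannot be listed
    return NFSERR_STALE, []           # any other handle is stale
```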
Getattr returns r-xr-xr-x for the ``.'' and ``..'' directories. The link itself, named home by default, is protected as rwxrwxrwx. It does not matter for the link that it is world-writable. The modification and creation times for the link and directories are the startup time of the server. If the effective gid of the process is HLFS_GID, then some fixed valid attributes are returned. Any other filehandle given to hlfsd is considered stale and the NFSERR_STALE (``Stale Filehandle'') result code is returned.
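The attribute rules can be sketched as below. The filehandle values and the dictionary layout are assumptions for illustration; the special case for HLFS_GID is left out because the text does not spell out which fixed attributes are returned.

```python
import stat
import time

# NFS version 2 status codes.
NFS_OK = 0
NFSERR_STALE = 70

ROOT_FH = 1                     # hypothetical handle for the directory
LINK_FH = 2                     # hypothetical handle for the link
START_TIME = int(time.time())   # stands in for the server startup time


def hlfs_getattr(fh):
    """Return (status, attributes) for filehandle fh."""
    if fh == ROOT_FH:
        mode = stat.S_IFDIR | 0o555   # r-xr-xr-x
    elif fh == LINK_FH:
        mode = stat.S_IFLNK | 0o777   # rwxrwxrwx
    else:
        return NFSERR_STALE, None
    # Modification and creation times are the startup time of the server.
    return NFS_OK, {"mode": mode, "mtime": START_TIME, "ctime": START_TIME}
```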
Obviously, we only allow looking up in the ``.'' and ``..'' directories, both of which return the same values. Trying to lookup ``in'' the link results in an NFSERR_NOTDIR (``Not a Directory'') error code. Any link not known to the server returns an NFSERR_NOENT (``No Such Entry'') error, unless the gid of the requesting process is HLFS_GID and the name corresponds to a valid user. In this case the username for that user is used in the returned filehandle, allowing the readlink operation to return the correct link. Anything else is a stale filehandle.
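The lookup rules can be sketched as follows. The numeric value of `HLFS_GID`, the filehandle values, and the representation of a per-user filehandle as a `("user", name)` tuple are all assumptions of this sketch; the text says only that the username is carried in the returned filehandle.

```python
# NFS version 2 status codes.
NFS_OK = 0
NFSERR_NOENT = 2
NFSERR_NOTDIR = 20
NFSERR_STALE = 70

ROOT_FH = 1       # hypothetical handle for the directory
LINK_FH = 2       # hypothetical handle for the link named home
HLFS_GID = 200    # hypothetical value for the special group id


def lookup(dir_fh, name, gid, known_users, link_name="home"):
    """Return (status, filehandle) for looking up name inside dir_fh."""
    if dir_fh == LINK_FH or isinstance(dir_fh, tuple):
        return NFSERR_NOTDIR, None        # cannot look up "in" a link
    if dir_fh != ROOT_FH:
        return NFSERR_STALE, None
    if name in (".", ".."):
        return NFS_OK, ROOT_FH            # both return the same values
    if name == link_name:
        return NFS_OK, LINK_FH
    if gid == HLFS_GID and name in known_users:
        return NFS_OK, ("user", name)     # username embedded in the handle
    return NFSERR_NOENT, None
```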
Readlink is the most important operation, the central point of this work. We get the uid number from the credentials sent with the RPC operation. We make sure that Unix or DES authentication is used; otherwise we return the NFSERR_PERM (``Not Owner'') code.
If the gid of the accessing process is not HLFS_GID, the value we return for the symbolic link named home is a string representing the home directory of the user whose uid we just found, concatenated with a fixed component name representing a subdirectory within it. We use a binary search on the lookup table to quickly get the right pathname. When multiple password database entries share a uid number but list different home directories, any one of those home directories may be returned. Only uid 0 is guaranteed to return ``/''. See also Section 5.3.
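The uid-to-pathname step can be sketched with a sorted table and a binary search. The subdirectory name `.mailspool`, the function names, and the choice to return `None` for an unknown uid are assumptions of this sketch, not details taken from hlfsd.

```python
import bisect

SUBDIR = ".mailspool"   # hypothetical fixed component name


def build_table(passwd_entries):
    """Sort (uid, home directory) pairs by uid so they can be searched."""
    return sorted(passwd_entries)


def home_link(table, uid):
    """Return the link value for uid, or None if the uid is unknown."""
    if uid == 0:
        return "/"                       # uid 0 always maps to the root
    # (uid, "") sorts before every real entry carrying the same uid.
    i = bisect.bisect_left(table, (uid, ""))
    if i < len(table) and table[i][0] == uid:
        # With duplicate uids this picks one of the candidate homes.
        return table[i][1].rstrip("/") + "/" + SUBDIR
    return None
```

With duplicate uid numbers, this sketch happens to return the entry whose home directory sorts first, which is consistent with the statement that any of the candidates may be returned.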
If the symbolic link is named home and the gid is HLFS_GID, we return a link to ``.'', which causes hlfsd to be used to resolve the next pathname component. This is designed to maintain functionality of programs such as from. If the symbolic link is not named home and the gid of the accessing process is HLFS_GID, we return a value pointing to the user's mailbox file in their mail spool directory. To do this, we extract the username from the filehandle, which was returned by the lookup operation. See Table 2.
Trying to readlink on one of the two directories results in an NFSERR_ISDIR (``Is a directory'') error. Anything else is a stale filehandle.
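Putting the readlink cases together gives roughly the following decision sequence. This is a sketch under stated assumptions: the filehandle values, the `HLFS_GID` value, the `("user", name)` handle shape, and the two resolver callbacks `home_of_uid` and `mailbox_of_user` are hypothetical. The authentication flavor numbers are the standard ONC RPC values.

```python
# NFS version 2 status codes.
NFS_OK = 0
NFSERR_PERM = 1
NFSERR_ISDIR = 21
NFSERR_STALE = 70

# ONC RPC authentication flavors.
AUTH_UNIX = 1
AUTH_DES = 3

ROOT_FH = 1       # hypothetical handle shared by "." and ".."
LINK_FH = 2       # hypothetical handle for the link named home
HLFS_GID = 200    # hypothetical value for the special group id


def readlink(fh, flavor, uid, gid, home_of_uid, mailbox_of_user):
    """Return (status, link value) for a readlink request."""
    if flavor not in (AUTH_UNIX, AUTH_DES):
        return NFSERR_PERM, None
    if fh == ROOT_FH:
        return NFSERR_ISDIR, None
    if fh == LINK_FH:
        if gid == HLFS_GID:
            return NFS_OK, "."            # resolve the next component here
        return NFS_OK, home_of_uid(uid)   # per-user dynamic value
    if isinstance(fh, tuple) and fh[0] == "user" and gid == HLFS_GID:
        # The username was stored in the handle by the lookup operation.
        return NFS_OK, mailbox_of_user(fh[1])
    return NFSERR_STALE, None
```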
Meanwhile the parent waits for the child to initialize. When it does, the parent mounts the server on the mount point. It is of utmost importance that the attribute cache be turned off: otherwise, successive accesses to /mail/home would return previously computed pathnames pointing to another user's mail, resulting in mail loss or misdelivery. If it is not possible to turn off the attribute cache, hlfsd will exit. However, the SA has the option to force hlfsd to continue running and set the attribute cache to as short an interval as possible (see also Section 5.3). At this point the parent terminates, leaving the child to run.
When an interval timer goes off (SIGALRM) or a SIGHUP is sent to the server, the server forks a child that continues serving, while the parent reloads the lookup table. When the parent is finished loading, it sends a SIGKILL to the child process, and resumes serving. When a SIGTERM is received, the server forks a child that continues serving, while it tries to unmount the filesystem. If and when that succeeds, both parent and child exit.
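The reload path can be sketched with fork and SIGKILL as described. The function and parameter names are hypothetical, and reaping the child with `waitpid` is an addition of this sketch; the text does not say how hlfsd collects the child.

```python
import os
import signal


def reload_while_child_serves(reload_table, serve_forever):
    """Fork a stand-in server, reload in the parent, then kill the child.

    Returns the freshly loaded table and the child's wait status.
    """
    pid = os.fork()
    if pid == 0:
        # Child: keep answering requests using the old lookup table.
        try:
            serve_forever()
        finally:
            os._exit(0)
    # Parent: rebuild the lookup table while the child covers for it.
    table = reload_table()
    os.kill(pid, signal.SIGKILL)      # retire the stand-in
    _, status = os.waitpid(pid, 0)    # assumption: reap it explicitly
    return table, status              # the parent now resumes serving
```

In a real daemon this function would be invoked from the main loop after a SIGALRM or SIGHUP handler sets a flag, rather than from inside the signal handler itself.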
As mail service is very important, we wanted to make hlfsd as robust as possible. We could have designed it as another amd ``filesystem type'', but decided that a separate daemon provides better reliability and faster service. In general, we try to do as much as possible: we make sure filesystems are accessible and contain some disk space to have mail delivered there. Where directories are expected we make sure there are no files by these names; where symbolic links are expected, we make sure there are no real files or directories with the same name. Whenever possible, we create directories, with proper ownership and permissions. We even check that the mount point for hlfsd is world readable and executable, since if it isn't, getwd("..") might fail.
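The kind of defensive check described here might look like the sketch below. The function name, the default mode, and the free-space threshold are hypothetical, and ownership changes are omitted because they require privileges.

```python
import os
import stat


def check_spool_dir(path, mode=0o700, min_free_bytes=1):
    """Make sure path is a usable directory with some free disk space.

    Creates the directory if it is missing. Returns False if something
    other than a directory already occupies the name, or if the
    filesystem has less than min_free_bytes available.
    """
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        os.makedirs(path, mode)
        st = os.lstat(path)
    if not stat.S_ISDIR(st.st_mode):
        return False                  # a file or link is in the way
    vfs = os.statvfs(path)
    return vfs.f_bavail * vfs.f_frsize >= min_free_bytes
```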
| Component | Pathname left | Value if symbolic link |
When hlfsd starts up, and before it mounts itself on top of the mount point, hiding anything that is underneath, hlfsd creates a fixed symbolic link to the alternate spool directory (if it does not exist already). This is done so that /var/spool/mail is never a ``dangling'' symbolic link and points to a real directory at all times, even after hlfsd terminates. When hlfsd runs, it hides this symbolic link and provides our ``dynamic'' symbolic link. This trick at least provides us with an alternate place to deliver mail when things go wrong, rather than bouncing or dropping the mail.
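The creation of the fixed fallback link can be sketched as follows. The function name is hypothetical, and creating the alternate spool directory when it is missing is an assumption of the sketch, made so that the link never dangles.

```python
import os


def ensure_fallback_link(link_path, alt_spool_dir):
    """Create link_path -> alt_spool_dir unless link_path already exists.

    Returns True if the link was created, False if something was
    already there.
    """
    if os.path.lexists(link_path):    # true even for a dangling link
        return False
    os.makedirs(alt_spool_dir, exist_ok=True)
    os.symlink(alt_spool_dir, link_path)
    return True
```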
A cron job on our systems checks the alternate mail spool directory several times a day. Any messages found there are resent to their rightful owners. The remailing script can be run as often as needed. Each invocation of the script deals only with newly lost mail since the previous invocation; the script locks and renames the lost mailbox file to a unique name, before parsing and remailing it.
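The lock-and-rename step of such a remailing script can be sketched as below. The use of `flock` and the unique-name scheme are assumptions; the text does not say which locking method or naming convention our script uses.

```python
import fcntl
import os
import time


def claim_lost_mailbox(path):
    """Lock a lost mailbox and rename it to a unique name.

    Returns the new pathname, or None if there is no mailbox to claim.
    Mail arriving after the rename lands in a fresh file, so each
    invocation handles only mail lost since the previous one.
    """
    try:
        fd = os.open(path, os.O_RDWR)
    except FileNotFoundError:
        return None
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        unique = "%s.%d.%d" % (path, os.getpid(), time.time_ns())
        os.rename(path, unique)
        return unique
    finally:
        os.close(fd)                  # closing also releases the lock
```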
Like amd, hlfsd can log debugging and various status information to a designated log file or through the syslog facility. The SA may choose to watch these log files and facilities and be notified when serious problems occur, such as a full filesystem.
Each success or failure result is cached in hlfsd for a specified number of seconds, after which the entry ``times out'' and a fresh backgrounded lookup is required. Until then, the cached result is used and no expensive fork is required. This simple caching feature of hlfsd has greatly improved its performance and reliability. See also Section 5.3.
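A timed cache of this kind can be sketched in a few lines. The class name, the injectable clock, and the callback interface are assumptions of the sketch; the `lookup` callback stands in for the expensive backgrounded check.

```python
import time


class ResultCache:
    """Remember success or failure results for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}             # key -> (time stored, result)

    def get(self, key, lookup):
        """Return the cached result for key, calling lookup(key) on a miss."""
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]             # still fresh: no expensive lookup
        result = lookup(key)          # entry missing or timed out
        self.entries[key] = (now, result)
        return result
```

Failures are cached as well as successes, so a home directory that is temporarily unreachable does not trigger a new expensive check on every access.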
An alternative way to avoid the need for lock files is to deliver mail one message per file, using a different system along the lines of INN and NNTP; however, this would require modifications to all UAs and MTAs.
Hlfsd is available in source form as part of a special distribution of amd. It can be retrieved via anonymous ftp from ftp.cs.columbia.edu in the directory /pub/amd.
Hlfsd is built as part of the special distribution of amd available from our ftp server, and it is almost as portable as amd. Only the lack of access to certain machines stopped us from porting hlfsd to the numerous platforms amd runs on. At the time of writing, hlfsd has been ported to and runs on SunOS 4.1.3, HP-UX 9.0.1, and Solaris 2.2. These represent the three main system types amd runs on and span most Unix flavors: a BSD-style system, an SVR-BSD hybrid, and a system very close to SVR4, respectively.