4 Evaluation

The system is implemented and is in use on a limited number of machines.

The goal of this work is to improve overall file system performance -- under certain circumstances, at least -- and to improve it enough to justify the extra complexity. For this method to really work, it must have:

1. Low overhead latency measurement between switches.
2. A quick switch.
3. Low overhead access to the replacement after a switch.
4. No anomalies or instabilities, like ping-pong switching.
5. No process hangs due to server failures.
6. No security or administrative complications.

We have carried out several measurements aimed at evaluating how well our mechanism meets these goals.

The overhead between switches is that of performance monitoring. We found the added cost of timing every rfscall() too small to measure. The cost of computing medians could be significant, since we retain 300 values, but we implemented a fast incremental median algorithm that requires only a negligible fraction of the time in nfs_lookup(). The space cost of the kernel data structures is less negligible: retaining 300 latency measurements costs about 2KB per file system. The measurements take more space than the raw values alone because of the extra pointers that must be maintained to make the incremental median algorithm work. The extra fields in struct vfs, struct vnode, and struct file are small, with the exception of the DFT, which is large. Each (per-filesystem) DFT currently has 60 slots, which occupy a total of 1KB-2KB on average.
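To illustrate the idea of maintaining a median incrementally over a fixed window of latency samples, here is a minimal user-level sketch in Python. It is not the kernel algorithm described above: the class name `WindowMedian` and its methods are hypothetical, and it uses a sorted list with binary search rather than the pointer-based structure the implementation relies on.

```python
from bisect import bisect_left, insort
from collections import deque


class WindowMedian:
    """Median over the most recent `size` samples (hypothetical helper)."""

    def __init__(self, size=300):
        self.size = size
        self.window = deque()  # samples in arrival order
        self.ordered = []      # the same samples, kept sorted

    def add(self, sample):
        # Once the window is full, evict the oldest sample from both views.
        if len(self.window) == self.size:
            oldest = self.window.popleft()
            del self.ordered[bisect_left(self.ordered, oldest)]
        self.window.append(sample)
        insort(self.ordered, sample)

    def median(self):
        if not self.ordered:
            raise ValueError("no samples recorded")
        n = len(self.ordered)
        mid = n // 2
        if n % 2:
            return self.ordered[mid]
        # Even count: average the two middle samples.
        return (self.ordered[mid - 1] + self.ordered[mid]) / 2
```

Each new sample costs one binary search plus a list shift, so the median is available without re-sorting all retained values on every call.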

Our measured overall switch time is approximately 3 sec. This is the time between the request for a new file system and when the new file system is mounted (messages 1-8 in Figure 3). Three seconds is comparable to the time needed in our facility to mount a file system whose location is already encoded in Amd's maps, suggesting that most of the time goes to the mount operation.

The overhead after a switch consists mostly of doing equivalence checks outside the kernel; the time to access the vfs of the replacement file system and DFT during au_lookuppn() is immeasurably small. Only a few milliseconds are devoted to calling checksumd: 5-7 msec if the checksum is already computed. This call to checksumd is done once and need not be done again so long as a record of equivalence remains in the DFT.
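The pattern of computing a checksum once and answering later equivalence queries from a stored value can be sketched as follows. This is an illustration only, not checksumd itself: the names `ChecksumCache` and `equivalent` are hypothetical, and SHA-256 is an assumed choice, since the text does not say which checksum is used.

```python
import hashlib


class ChecksumCache:
    """Remembers file checksums so each file is read at most once."""

    def __init__(self):
        self.known = {}    # path -> hex digest
        self.computed = 0  # how many files were actually read

    def checksum(self, path):
        if path not in self.known:
            digest = hashlib.sha256()  # assumed algorithm
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            self.known[path] = digest.hexdigest()
            self.computed += 1
        return self.known[path]


def equivalent(cache, master_path, replacement_path):
    """True when the two files have identical checksums."""
    return cache.checksum(master_path) == cache.checksum(replacement_path)
```

A repeated query for the same pair of files is answered from the stored digests without rereading either file.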

A major issue is how long to cache DFT entries that indicate equivalence. Being stateless, NFS does not provide any sort of server-to-client cache invalidation information. Not caching at all ensures that files on the replacement file system are always equal to those on the master copy; but of course this defeats the purpose of using the replacement. We suppose that most publicly-exported read-only file systems have their contents changed rarely, and thus one should cache to the maximum extent. Accordingly, we manage the DFT cache by LRU.
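A fixed-size table managed by LRU, as described above, might look like the following sketch. The class `EquivalenceCache` and its method names are hypothetical, the key is assumed to be a pathname, and the default of 60 slots mirrors the DFT size quoted earlier; the real DFT is a kernel data structure with a different layout.

```python
from collections import OrderedDict


class EquivalenceCache:
    """Fixed-size table of equivalence results with LRU eviction."""

    def __init__(self, slots=60):
        self.slots = slots
        self.entries = OrderedDict()  # path -> bool (equivalent or not)

    def lookup(self, path):
        """Return the cached result, or None if the path is not cached."""
        if path not in self.entries:
            return None
        self.entries.move_to_end(path)  # mark as most recently used
        return self.entries[path]

    def record(self, path, is_equivalent):
        self.entries[path] = is_equivalent
        self.entries.move_to_end(path)
        if len(self.entries) > self.slots:
            self.entries.popitem(last=False)  # drop least recently used
```

Under this policy an entry is never expired by age; it leaves the table only when a more recently used entry needs its slot.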

As mentioned above, switching instabilities are all but eliminated by preventing switches more frequently than every 5 minutes.
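The 5-minute rule amounts to a minimum interval between switches. A minimal sketch, assuming timestamps in seconds and using the hypothetical name `SwitchGovernor`:

```python
class SwitchGovernor:
    """Refuses switches that follow the previous one too closely."""

    def __init__(self, min_interval=300.0):
        self.min_interval = min_interval  # seconds; 300 s = 5 minutes
        self.last_switch = None           # time of the most recent switch

    def may_switch(self, now):
        """True if no switch has happened within the last min_interval."""
        if self.last_switch is None:
            return True
        return now - self.last_switch >= self.min_interval

    def record_switch(self, now):
        self.last_switch = now
```

Because a second switch cannot follow the first within the interval, two servers with similar latencies cannot trade places more than once per 5 minutes.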

4.1 Experience

4.1.1 What is Read-Only

Most of the files in our facility reside on read-only file systems. However, sometimes one can be surprised. For example, GNU Emacs is written to require a world-writable lock directory. In this directory Emacs writes files indicating which users have which files in use. The intent is to detect and prevent simultaneous modification of a file by different processes. A side effect is that the ``system'' directory in which Emacs is housed (at our installation, /usr/local) must be exported read-write.

Deployment of our file service spurred us to change Emacs. We wanted /usr/local to be read-only so that we could mount replacements dynamically. Also, at our facility there are several copies of /usr/local per subnet, which defeats Emacs' intention of using /usr/local as a universally shared location. We rewrote Emacs to write its lock files in the user's home directory, since (1) for security, our system administrators wish to have as few read-write system areas as possible, and (2) in our environment by far the likeliest scenario of simultaneous modification is between two sessions of the same user, rather than between users.

4.1.2 Suitability of Software Base

Kernel. The vfs and vnode interfaces in the kernel greatly simplified our work. The hot replacement, in particular, proved far easier than we had feared, thanks to the vnode interface. The special out-of-kernel RPC library was also a major help. Nevertheless, work such as ours makes painfully obvious the benefits of implementing file service out of the kernel. The length and difficulty of the edit-compile-debug cycle, and the primitive debugging tools available for the kernel, were truly debilitating.

RLP. RLP was designed in 1983, when the evils of over-broadcasting were not as deeply appreciated as they are today and when there were few multicast implementations. Accordingly, RLP is specified as a broadcast protocol. A more up-to-date protocol would use multicast. The benefits would include causing much less waste (i.e., bothering hosts that lack an RLP daemon) and contacting many more RLP daemons. Not surprisingly, we encountered considerable resistance from our bridges and routers when trying to propagate an RLP request. A multicast RLP request would travel considerably farther.

NFS. NFS is ill-suited for ``cold replacement'' (i.e., new opens on a replacement file system) caused by mobility, but is well suited for ``hot replacement'' because of its statelessness.

NFS' lack of cache consistency callbacks has long been bemoaned, and it affects this work since there is no way to invalidate DFT entries. Since we restrict ourselves to read-only files, the danger is assumed to be limited, but is still present. Most newer file service designs include cache consistency protocols. However, such protocols are not necessarily a panacea. Too much interaction between client and server can harm performance, especially if these interactions take place over a long distance and/or a low bandwidth connection. See [27] for a design that can ensure consistency with relatively little client-server interaction.

The primary drawback of using NFS for mobile computing is its limited security model. Not only can a client from one domain access files in another domain that are made accessible to the same user ID number, but even a well-meaning client cannot prevent itself from doing so, since there is no good and easy way to tell when a computer has moved into another uid/gid domain.


Erez Zadok
12/6/1997