USENET is a popular world-wide network consisting of thousands of discussion and informational ``news groups.'' Many of these are heavily used and receive thousands of articles each day. In addition, many control messages are exchanged between news servers around the world, a large portion of which are article cancellation messages generated by anti-spam detection software. All of these articles and messages must be processed quickly. If they are not, new incoming articles may be dropped.
Traditional Unix file system directories are structured as a flat, linear
sequence of entries representing files. When the operating system wants to
look up an entry in a directory with N entries, it may have to search all
N entries to find the file in question. Portions of directories are often
cached in the file system, so that subsequent lookups do not have to
retrieve the data from disk. Table 1 shows the frequency
of all file system operations that use a pathname on our news spool over a
period of 24 hours.1
The table shows that the bulk of all
operations are for looking up files, so these should run very fast
regardless of the directory size.
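
The cost of a lookup in a flat directory can be illustrated with a minimal sketch. It is only a user-level model, not kernel code: the function name linear_lookup and the 100,000-entry spool are assumptions made for illustration, and caching is ignored.

```python
def linear_lookup(entries, name):
    """Scan a flat list of directory entries in order.

    Returns (found, comparisons), where comparisons is the number of
    entries examined before the scan stopped.
    """
    comparisons = 0
    for entry in entries:
        comparisons += 1
        if entry == name:
            return True, comparisons
    return False, comparisons


# Hypothetical newsgroup directory holding articles named 1..100000.
spool = [str(n) for n in range(1, 100_001)]

# Looking up the last entry examines every entry in the directory.
found_last, cost_last = linear_lookup(spool, "100000")

# Looking up a name that does not exist also examines every entry.
found_missing, cost_missing = linear_lookup(spool, "no-such-article")
```

In this model both the worst-case hit and every miss cost N comparisons, which is why lookup time grows with directory size.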
These requirements necessitate a powerful news server that can copy memory fast and has fast disks and I/O. As demands grow, the ability of the news server to process articles diminishes to the point where it starts rejecting or ``dropping'' articles. The effort to upgrade a site's news server is significant: large amounts of data need to be copied to a new server as fast as possible, because while an upgrade is in progress, new articles are not processed and can be lost.
In practice, many sites have resorted to reducing the number of articles they store by removing large newsgroups from their distribution and expiring articles more often, sometimes as often as several times a day. Most site administrators have accepted the fact that their news servers will lose articles on occasion.
For example, our department runs an average-size news server. We have several hundred users and three feeds from neighboring sites. Our server has had two major upgrades in the past 5 years, and several smaller upgrades in between. The major upgrades were from SunOS 4.1.3 to Solaris 2.x, and the last one was to Linux 2.0. Each major upgrade included a news server (INN) software upgrade, a faster CPU, more memory, and more and faster disk space. Our previous news server was running on a Sun SparcStation 5 with 8GB of striped disk space, 196MB of RAM, and Fast Ethernet. But the CPU and I/O bus had not been able to keep up with traffic, and for the last two years of that server's life, it kept losing more and more articles. Just before it was replaced, our old news server was dropping 50% of all articles.
A few months ago we upgraded our news server to an AMD K6/200MHz with faster disks and tripled the overall disk space available. We used top-of-the-line SCSI cards and Fast Ethernet adapters. We also upgraded the operating system to Linux 2.0.34, because the Linux operating system is small, fast, and highly optimized for the x86 platform. In addition, Linux's disk-based file system (ext2fs) has two features useful for optimizing disk performance:
Since the upgrade, our new news server has dropped no articles and has kept up with traffic. However, we have noticed that its network utilization is over 80% and that more disk space is constantly being added. At the current growth rate, we expect it to outrun its capabilities in a couple of years.
Several current solutions are available to the problem of slow performance of large directories used with news servers. They fall into one of two categories:
These solutions suffer from several problems.
Our approach modifies neither the news server/client software nor the native file systems.
Usenetfs is a small file system based on the loopback file system (lofs)[SMCC92]. Usenetfs mounts (``stacks'') itself on top of a news spool hierarchy and interfaces between existing news software and disk-based file systems, as seen in Figure 1. It makes a hierarchy of many small directories appear to be a single large flat directory.
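
One way to picture this mapping is the sketch below. It is an illustration under stated assumptions, not the Usenetfs implementation: the bucketing rule (article number modulo 1000), the ``misc'' fallback directory, and the names BUCKETS, bucket_for, lower_path, and flat_view are all hypothetical, since this section does not describe how articles are assigned to the small directories.

```python
import posixpath

BUCKETS = 1000  # assumed number of small subdirectories per newsgroup


def bucket_for(article):
    """Pick a small subdirectory for an article name (assumed rule)."""
    if article.isdigit():
        return "%03d" % (int(article) % BUCKETS)
    return "misc"  # assumed fallback for non-numeric names


def lower_path(group_dir, article):
    """Translate the flat name seen by news software into the path
    actually stored on the underlying disk-based file system."""
    return posixpath.join(group_dir, bucket_for(article), article)


def flat_view(buckets):
    """Merge the contents of all small directories into the single flat
    listing that the news software expects to see."""
    names = []
    for bucket in sorted(buckets):
        names.extend(buckets[bucket])
    return names


# Hypothetical on-disk layout: subdirectory name -> article files in it.
on_disk = {"456": ["123456"], "007": ["7", "1007"]}
listing = flat_view(on_disk)
```

With such a rule, the news software would still open control/cancel/123456, while the lookup below the stacking layer would only search the small directory control/cancel/456.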
``Vnode
Stacking''[Heidemann94,Rosenthal92,Skinner93]
is a technique for modularizing file system functions, by allowing one vnode
interface to call another. Before stacking existed, there was only a single
vnode interface; higher level operating system code called the vnode
interface which in turn called code for a specific file system. With vnode
stacking, several vnode interfaces may exist and may call each other in
order.
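
The idea of one interface calling another can be sketched at user level as follows. This is a simplified model, not kernel code: the classes DiskFS and StackedFS and their single lookup operation are hypothetical stand-ins for a native file system and a stacked layer.

```python
class DiskFS:
    """Stand-in for a native disk-based file system (the lowest layer)."""

    def __init__(self, files):
        self.files = files  # maps path -> file contents

    def lookup(self, path):
        return self.files.get(path)


class StackedFS:
    """A layer exposing the same interface as the layer below it.

    Each operation may do work of its own (here it only counts calls)
    and then passes the request down to the lower layer.
    """

    def __init__(self, lower):
        self.lower = lower
        self.calls = 0

    def lookup(self, path):
        self.calls += 1
        return self.lower.lookup(path)


# Two stacked layers on top of one disk-based file system.
disk = DiskFS({"comp/os/1": "article body"})
stack = StackedFS(StackedFS(disk))
result = stack.lookup("comp/os/1")
```

Because every layer presents the same interface, the caller at the top does not need to know how many layers sit between it and the disk.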