


3 Elastic Quota File System

To support the elastic quota usage model, we created the Elastic Quota File System (EQFS). An important benefit of the elastic quota system is that it allows elastic files to be mixed together with persistent files and located in any directory in the file system. To provide this benefit, the system must efficiently find elastic files anywhere in the directory hierarchy. Furthermore, the system must account for the disk usage of persistent files separately from elastic files since only persistent files are counted against a user's quota. With substantial implementation effort, one could build a new file system from scratch with an elasticity attribute associated with each file and a quota system that accounts for elastic and persistent files separately. The problem with this approach is that it requires users to migrate to an entirely new file system to use elastic quotas.

EQFS addresses these design issues by first storing persistent and elastic files in separate underlying directories to efficiently account for and identify elastic files. EQFS then uses file system stacking [10,21,25] to stack on top of both the persistent and elastic directories and present a unified view of the files to the user. With file system stacking, a thin layer is inserted directly above an existing file system, allowing the layer to intercept and modify requests coming from upper layers or data being returned from lower layers. Although stackable file systems run in kernel space for best performance, they do not require kernel modifications and can extend file system functionality in a portable way [32].

Section 3.1 describes how EQFS stacks on top of both persistent and elastic directories, and how this supports the multiple views. Section 3.2 describes how EQFS utilizes the separation of persistent and elastic storage with traditional quota functionality to provide efficient disk usage accounting. Finally, Section 3.3 summarizes the implementation of individual EQFS file operations.


3.1 File System Stacking

One of the main features that EQFS must provide is a way to associate an attribute with each file that indicates whether it is elastic or persistent. Taking a file system stacking approach, one way to do this would be to store a separate attributes file for each file in the underlying file system that is manipulated by the upper layer file system. This approach provides a general way to extend file attributes, but it would require accessing an entirely separate file to determine whether a given file is elastic. That is substantial additional overhead for an elasticity attribute that could potentially be stored as a single bit of information. Another design alternative would be for the stackable file system to manipulate special-purpose inode bits on the lower file system in order to flag a file as elastic. However, stackable file systems are designed to be modular and independent from the underlying file systems they mount on. Such access violates the principles of stacking, as it makes a stackable file system dependent on the specific implementation details of the underlying file system.

To provide an efficient stacking approach, we designed EQFS to stack on top of two underlying directories in the native disk file system, one for storing all persistent files and the other for storing all elastic files. Because of the separate directories for persistent and elastic files, EQFS can infer whether a file is persistent or elastic from the location of the file. Although the system introduces a new file property -- namely its persistence or lack thereof -- EQFS does not need to store this property as part of the on-disk inode. In fact, EQFS does not maintain any state other than what it uses to stack on top of the underlying persistent and elastic directories. EQFS can be stacked on top of any existing file system exporting the Virtual File System (VFS) interface [14], such as UFS [16] or EXT2FS [4]. The VFS was designed as a system-independent interface to file systems and is now universally present in UNIX operating systems, including Solaris, FreeBSD, and Linux. By building on top of the VFS, EQFS serves as a higher-level file system abstraction that does not need to know about the specifics of the underlying file system.

In the VFS, a virtual node (vnode) is a handle to a file maintained by a running kernel. This handle provides a common view for public data associated with a file, and is the vehicle for interaction between the kernel proper and the file system. Vnodes export an interface for the set of generic operations commonly applicable to files and directories, known as vnode operations (vops). A stackable file system is one that stacks its vnodes on top of those of another file system. Stacked file systems are thus able to modify the kernel's view of the underlying file system by intercepting data and requests flowing between underlying file systems and the kernel proper through their vnode-private data and stacked vops. Our design of EQFS provides four important benefits:

  1. Compatibility with existing file systems: Because EQFS simply stacks on top of existing file systems, it is compatible with and does not require any changes to existing file systems. Furthermore, EQFS can be used with commodity file systems already deployed and in use. EQFS is ignorant of the underlying file systems and makes no assumptions about the underlying persistent and elastic directories. In particular, the underlying directories need not be on the same file system, or even of the same file system type.

  2. No modifications to commodity operating systems: Since EQFS stacks on top of the widely used VFS interface, EQFS can be implemented as a kernel module that can be loaded and used without modifying the kernel or halting system operation. Users can therefore use elastic quotas in the large installed base of commodity operating systems without upgrading to an entirely new system.

  3. Leveraging existing development investments: EQFS leverages existing functionality in file systems instead of replicating it. EQFS is a thin layer of functionality that extends existing disk-based file systems rather than replacing them. EQFS's ability to reuse existing file system functionality results in a much simpler implementation.

  4. Low performance overhead: Since file system performance is often crucial to overall system performance, elastic quotas should impose as little performance overhead as possible. EQFS runs in kernel space to minimize performance overhead.

EQFS is a thin stackable layer that presents multiple views of the underlying persistent and elastic directories as /home, /ehome, /persistent, and /elastic, as described in Section 2. To provide a unified view of all files, EQFS creates /home and /ehome by merging the contents of the underlying persistent and elastic directories. For example, if files A and B are stored in one directory and C and D are stored in another, merging the two directories will result in a unified directory that contains A, B, C, and D. /persistent and /elastic are created by simply referring to the respective underlying persistent and elastic directories. Figure 1 illustrates the structure of the views and underlying directories in EQFS.

Figure 1: Views and directories in EQFS

EQFS makes merging of the underlying persistent and elastic directories possible by ensuring that both directories have the same structure and by avoiding file name conflicts. As discussed in Section 2, each of the four views exports the same directory structure to the user. Similarly, for each directory visible from these views, EQFS maintains a corresponding underlying directory for persistent files and a corresponding underlying directory for elastic files. If the directory structures were not the same, it would be ambiguous how to unify the two structures when some directories could be present in one but not the other. EQFS avoids file name conflicts by not exposing the underlying directories directly to users and only permitting file creation through /home and /ehome. /persistent and /elastic cannot be used for file creation. If the underlying directories were not protected, a user could create a file in the persistent directory and a file in the elastic directory, both with the same name. This would cause a file name conflict when the underlying directories are unified. File name conflicts are not possible using the views.

EQFS discriminates between the underlying directories unified in /home and /ehome in order to make file creations in /home persistent and file creations in /ehome elastic. EQFS unifies the two underlying directories by treating one of them as the primary directory and the other one as the secondary sister directory. The contents of the two directories are joined into a unified view, but any file creations are always made to the primary directory. EQFS populates /home by treating the underlying persistent directory as the primary directory and the underlying elastic directory as the sister directory. Conversely, /ehome has the elastic directory as its primary underlying directory and the persistent directory as its sister underlying directory. As a result, files created in /home are persistent because the underlying primary directory is persistent, and files created in /ehome are elastic because the underlying primary directory is elastic.
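The primary/sister arrangement can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the UnifiedView class, its method names, and the temporary directories are hypothetical stand-ins for the kernel-level stacking, not the EQFS implementation.

```python
import os
import tempfile


class UnifiedView:
    """Hypothetical user-space model of one merged EQFS view."""

    def __init__(self, primary, sister):
        self.primary = primary  # underlying directory that receives new files
        self.sister = sister    # underlying directory that is only merged in

    def listdir(self):
        # The view shows the union of both underlying directories.
        names = set(os.listdir(self.primary)) | set(os.listdir(self.sister))
        return sorted(names)

    def create(self, name):
        # Refuse a name that already exists in either underlying directory.
        if name in self.listdir():
            raise FileExistsError(name)
        path = os.path.join(self.primary, name)
        open(path, "w").close()
        return path


root = tempfile.mkdtemp()
persistent = os.path.join(root, "persistent")
elastic = os.path.join(root, "elastic")
os.mkdir(persistent)
os.mkdir(elastic)

# /home and /ehome merge the same two directories with the roles swapped.
home = UnifiedView(primary=persistent, sister=elastic)
ehome = UnifiedView(primary=elastic, sister=persistent)

home.create("A")    # lands in the persistent directory
ehome.create("C")   # lands in the elastic directory
```

Both views list the same names, and only the location of a newly created file differs between them.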


3.2 Disk Usage Accounting

Quotas usually keep track of disk blocks or inodes allocated to each user or group. Traditional quota systems are implemented in code specific to each file system. EQFS utilizes this native quota functionality to simplify its implementation. However, because regular quota systems have no notion of elastic files, to which no usage limits apply, EQFS must build these extended semantics using existing primitives.

EQFS solves this disk usage accounting problem by defining a shadow user ID for each user. A shadow user ID is a second unique user ID that is internally assigned by EQFS to each user of elastic quotas. EQFS uses a mapping between normal user IDs and shadow user IDs that allows it to deduce one ID from the other in constant time. Persistent files are owned by and accounted for using normal user IDs, whereas elastic files are owned by and accounted for using shadow IDs. Shadow user IDs are given infinite quotas, so that disk space used by elastic files is not limited by users' quotas. The shadow ID mapping used in our EQFS implementation defines the shadow ID for a given user ID as its two's complement. Since the user ID is typically a 32-bit integer in modern systems and even the largest systems have far fewer than two billion users, at least half of the user ID space is unused. Our implementation takes advantage of the large underutilized ID space using the simple two's-complement mapping.
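The mapping can be written down in a few lines. The following is a minimal sketch assuming 32-bit user IDs; the function names and the rule used by is_shadow (real users occupy the lower half of the ID space) are illustrative assumptions, not taken from the EQFS source.

```python
UID_BITS = 32                    # assumption: user IDs are 32-bit integers
UID_MASK = (1 << UID_BITS) - 1


def shadow_uid(uid):
    """Return the two's complement of uid within UID_BITS bits."""
    return (-uid) & UID_MASK


def is_shadow(uid):
    # Assumption: normal user IDs stay below 2**31, so shadow IDs fall
    # in the upper half of the ID space.
    return uid >= 1 << (UID_BITS - 1)


example = shadow_uid(1000)       # 4294966296 with 32-bit IDs
```

Because negation in two's complement is its own inverse, applying shadow_uid twice returns the original ID, which is what allows either ID to be derived from the other in constant time. In this sketch ID 0 maps to itself; the text does not say how the implementation treats that case.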

Rubberd takes advantage of the underlying quota system to obtain information on users' elastic space consumption in constant time. Even though there is no quota limit set on a user's shadow ID, the quota system still accounts for the elastic file usage.
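The accounting idea can be modeled with a toy quota table. This is a sketch only: QuotaTable and its methods are hypothetical and stand in for the native quota system of the underlying file system, and a missing limit is assumed to mean an unlimited quota.

```python
import math

UID_MASK = (1 << 32) - 1         # assumption: 32-bit user IDs


def shadow_uid(uid):
    return (-uid) & UID_MASK


class QuotaTable:
    """Hypothetical stand-in for a native per-ID quota system."""

    def __init__(self):
        self.usage = {}          # ID -> blocks currently charged
        self.limit = {}          # ID -> block limit; absent means unlimited

    def charge(self, owner, blocks):
        new_total = self.usage.get(owner, 0) + blocks
        if new_total > self.limit.get(owner, math.inf):
            raise OSError("quota exceeded")
        self.usage[owner] = new_total


quota = QuotaTable()
quota.limit[1000] = 100                  # persistent quota for user 1000
quota.charge(1000, 80)                   # persistent file: normal ID
quota.charge(shadow_uid(1000), 500)      # elastic file: shadow ID, no limit

# Elastic consumption is a single table lookup on the shadow ID.
elastic_usage = quota.usage[shadow_uid(1000)]
```

The shadow ID has no limit, yet its usage is still recorded, so a cleaner such as rubberd could read a user's elastic consumption without scanning the file system.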


3.3 File Operations

EQFS provides its own set of vnode operations (vops), most of which can be summarized as transforming the user ID to the shadow user ID if necessary, and then passing the operation on to the underlying vnode(s). The most notable exceptions to this generalization are the following vops, which require additional functionality to maintain EQFS semantics: LOOKUP, READDIR, RENAME, MKDIR, CREATE, and LINK.

The LOOKUP vop returns the vnode for the given file name in the given directory. Since EQFS directory vnodes are associated with two underlying directories, LOOKUP must potentially search both directories for the file before returning the EQFS version of the underlying vnode. To enforce the invariant of always having two underlying directories for an EQFS directory, EQFS lazily creates missing directories in /elastic or /persistent if it cannot find them. This makes it easy to migrate existing file systems to EQFS; simply mount /persistent on a spare partition, and the first access to a directory will cause its sister to be created.
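The two-directory search and the lazy creation of a missing sister directory can be sketched in user space as follows. The function name lookup and its return value (a list of underlying paths) are assumptions made for illustration; the real vop returns an EQFS vnode. The sketch also assumes that a name never refers to a file on one side and a directory on the other.

```python
import os
import tempfile


def lookup(primary_dir, sister_dir, name):
    """Return the underlying path(s) backing name in a merged directory."""
    p = os.path.join(primary_dir, name)
    s = os.path.join(sister_dir, name)
    if os.path.isdir(p) or os.path.isdir(s):
        # A directory must exist on both sides: create the missing one lazily.
        os.makedirs(p, exist_ok=True)
        os.makedirs(s, exist_ok=True)
        return [p, s]
    # A regular file lives on exactly one side: search primary, then sister.
    for candidate in (p, s):
        if os.path.exists(candidate):
            return [candidate]
    raise FileNotFoundError(name)


root = tempfile.mkdtemp()
persistent = os.path.join(root, "persistent")
elastic = os.path.join(root, "elastic")
os.makedirs(os.path.join(persistent, "mary"))
os.mkdir(elastic)
open(os.path.join(persistent, "mary", "foo"), "w").close()

# Looking up "mary" through the /ehome-style view creates elastic/mary.
dirs = lookup(elastic, persistent, "mary")
# Looking up "foo" inside it finds the file on the sister (persistent) side.
found = lookup(dirs[0], dirs[1], "foo")
```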

READDIR returns a subset of entries in a directory. Since directories in an EQFS mount are mirrored in both underlying sources, any given EQFS directory will contain duplicates for its subdirectories, which are eliminated before being returned by the kernel. Our READDIR implementation caches the merged result of both underlying directories for improved performance.

RENAME slightly departs from its traditional semantics to support changing the elasticity of a file: if the file names are the same and the target directory corresponds to the same logical directory on the mirror mount point (e.g., /ehome/mary and /home/mary), the file is moved to the mirror mount's primary directory, and its ownership is updated accordingly. Thus, renaming a file from /home/mary/foo to /ehome/mary/foo will make it elastic, and the converse will make it persistent.
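The elasticity-changing rename can be modeled with an in-memory stand-in for the two underlying directories. Everything here is a hypothetical sketch: the store dictionary, split_view, and the treatment of ordinary renames (the file stays on the side where it already lives) are assumptions, and ownership is represented as a plain integer rather than changed on a real file.

```python
UID_MASK = (1 << 32) - 1         # assumption: 32-bit user IDs


def shadow_uid(uid):
    return (-uid) & UID_MASK


# Each underlying directory maps a view-relative path to the owning ID.
store = {"persistent": {"mary/foo": 1000}, "elastic": {}}
PRIMARY = {"/home": "persistent", "/ehome": "elastic"}


def split_view(path):
    """Split '/home/mary/foo' into ('/home', 'mary/foo')."""
    for view in PRIMARY:
        if path.startswith(view + "/"):
            return view, path[len(view) + 1:]
    raise ValueError(path)


def rename(src, dst):
    src_view, src_rel = split_view(src)
    dst_view, dst_rel = split_view(dst)
    # Find which underlying directory currently holds the file.
    for side in ("persistent", "elastic"):
        if src_rel in store[side]:
            owner = store[side].pop(src_rel)
            break
    else:
        raise FileNotFoundError(src)
    if src_rel == dst_rel and src_view != dst_view:
        # Same logical name on the mirror view: move the file to that
        # view's primary directory and swap between normal and shadow ID.
        target = PRIMARY[dst_view]
        if target != side:
            owner = shadow_uid(owner)
        store[target][dst_rel] = owner
    else:
        # Ordinary rename (assumption): the file keeps its elasticity.
        store[side][dst_rel] = owner


rename("/home/mary/foo", "/ehome/mary/foo")   # foo becomes elastic
```

Renaming the file back from /ehome/mary/foo to /home/mary/foo reverses the move and restores the normal user ID, since the two's-complement mapping is its own inverse.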

MKDIR creates a directory and returns the corresponding vnode. Under EQFS, this vop first checks both the primary and sister sources to make sure that there are no entries with the given name. If none exists, it passes the operation down to both file systems to create the named directory. Directories can be created under either /home or /ehome, but they are mirrored persistently under both views. Note that directories are considered to be persistent and, like persistent files, are removed only when the user explicitly removes them.
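A user-space sketch of the mirrored directory creation follows. The function name eqfs_mkdir and its return value are hypothetical; the real vop operates on vnodes and returns the EQFS vnode for the new directory.

```python
import os
import tempfile


def eqfs_mkdir(primary_dir, sister_dir, name):
    """Create name in both underlying directories, or fail if it exists."""
    p = os.path.join(primary_dir, name)
    s = os.path.join(sister_dir, name)
    # Check both sources first so that no duplicate entry is created.
    if os.path.lexists(p) or os.path.lexists(s):
        raise FileExistsError(name)
    # Mirror the directory on both sides to keep the structures identical.
    os.mkdir(p)
    os.mkdir(s)
    return p, s


root = tempfile.mkdtemp()
persistent = os.path.join(root, "persistent")
elastic = os.path.join(root, "elastic")
os.mkdir(persistent)
os.mkdir(elastic)

# Creating "mary" through the /ehome-style view still mirrors it on both sides.
created = eqfs_mkdir(elastic, persistent, "mary")
```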

CREATE creates the named file if it does not exist; otherwise, it truncates the file's length to zero. Like the MKDIR vop, it must first check both sources to make sure that it does not create duplicates. If the file does not exist, the file is always created in the primary directory, as outlined earlier. Note that to prevent namespace collisions when /persistent and /elastic are merged into /home and /ehome, the system must disallow direct namespace additions such as new files, directories, or links to these underlying directories without passing through the EQFS layer. To ensure this, /persistent and /elastic are covered by a thin loopback-like file system layer which maintains no state and passes all operations on to the underlying file system, except namespace additions.
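The behavior of the loopback-like layer over /persistent and /elastic can be approximated by a pass-through wrapper that rejects namespace additions. The NoAddView class and its method set are hypothetical; the choice of PermissionError as the failure is also an assumption, since the text does not say which error the layer returns.

```python
import os
import tempfile


class NoAddView:
    """Hypothetical pass-through view that refuses namespace additions."""

    def __init__(self, underlying):
        self.underlying = underlying

    def _path(self, name):
        return os.path.join(self.underlying, name)

    # Operations on existing names are passed straight through.
    def listdir(self):
        return sorted(os.listdir(self.underlying))

    def read(self, name):
        with open(self._path(name)) as f:
            return f.read()

    def write(self, name, data):
        if not os.path.exists(self._path(name)):
            raise PermissionError("file creation is not allowed in this view")
        with open(self._path(name), "w") as f:
            f.write(data)

    def remove(self, name):
        os.remove(self._path(name))

    # Namespace additions are rejected outright.
    def create(self, name):
        raise PermissionError("file creation is not allowed in this view")

    def mkdir(self, name):
        raise PermissionError("directory creation is not allowed in this view")

    def link(self, src, dst):
        raise PermissionError("link creation is not allowed in this view")


underlying = tempfile.mkdtemp()
with open(os.path.join(underlying, "cache.dat"), "w") as f:
    f.write("old")

elastic_view = NoAddView(underlying)
elastic_view.write("cache.dat", "new")   # existing file: passed through
```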

LINK creates hard links. A hard link inherits the elasticity of the file being linked, regardless of whether the operation is done under /home or /ehome. When a hard link is created to a persistent file, the hard link is considered persistent; a hard link that is created to an elastic file is considered elastic. Hard links to files across /home and /ehome are disallowed to avoid conflicting semantics in which a file is linked as both persistent and elastic.
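The link rules can be sketched with an in-memory model in which each underlying directory maps a path to an inode number. The store layout, the helper names, and the use of PermissionError for cross-view links are all assumptions made for illustration.

```python
# Each underlying directory maps a view-relative path to an inode number.
store = {"persistent": {"mary/foo": 1}, "elastic": {"mary/tmp": 2}}
VIEWS = ("/home", "/ehome")


def split_view(path):
    """Split '/home/mary/foo' into ('/home', 'mary/foo')."""
    for view in VIEWS:
        if path.startswith(view + "/"):
            return view, path[len(view) + 1:]
    raise ValueError(path)


def find_side(rel):
    """Return the underlying directory holding rel, or None."""
    for side in store:
        if rel in store[side]:
            return side
    return None


def link(src, dst):
    src_view, src_rel = split_view(src)
    dst_view, dst_rel = split_view(dst)
    if src_view != dst_view:
        raise PermissionError("hard links across /home and /ehome are disallowed")
    side = find_side(src_rel)
    if side is None:
        raise FileNotFoundError(src)
    if find_side(dst_rel) is not None:
        raise FileExistsError(dst)
    # The new name lives on the same side as the target file, so the link
    # inherits its elasticity no matter which view was used.
    store[side][dst_rel] = store[side][src_rel]


link("/home/mary/tmp", "/home/mary/tmp2")     # elastic target: elastic link
link("/ehome/mary/foo", "/ehome/mary/foo2")   # persistent target: persistent link
```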


Erez Zadok 2002-06-21