Next: 4. Evaluation Up: Usenetfs: A Stackable File Previous: 2. Design

   
3. Implementation

The implementation of Usenetfs proceeded according to the design. We began by using a loopback file system (lofs) and modified it to our needs.

Each vnode operation that handles a file name, such as lookup, open, and unlink, calls a simple function that converts the file name to a one-level-deep directory hierarchy. For example, here is the (slightly simplified) vnode operation to remove a file:

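int usenetfs_unlink(inode_t *dir, char *name, int len)
{
  int err = -EPERM;
  /* start from the interposed (stacked-on) directory inode */
  inode_t *i_dir = get_interposed_vp(dir);

  /* the setuid bit marks a directory as managed by Usenetfs */
  if (dir->i_mode & S_ISUID) {
    /* replace i_dir with the directory that actually holds the file */
    err = get_transit_dir(name, len, &i_dir);
    if (err < 0)
      return err;
    /* remove the file through the interposed file system */
    err = i_dir->i_op->unlink(i_dir, name, len);
  }
  return err;
}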

The function retrieves the stacked-on (interposed) vnode from the current directory vnode. It then checks whether the directory from which the file name is to be removed is managed by Usenetfs (that is, whether the setuid bit is on). If so, it calls the routine get_transit_dir to get the vnode for the directory where the file is actually located. This routine looks at the file name and computes the directory where the file should be. For example, if we want to remove the file name 987, this function gets the vnode for the directory 0987; that value is put into i_dir. Finally, the unlink function calls the same operation on the interposed directory and returns its result.

   
3.1 Directory Reading

The one complication we faced was with the ``readdir'' vnode operation. Readdir is implemented in the kernel as a restartable function. A user process calls the readdir C library call, which is translated to repeated calls to the getdents(2) system call, each passing a buffer of a given size. The kernel fills the buffer with as many bytes as fit, representing files within the directory being read, but no more. If the kernel has more bytes to offer the process (i.e., the directory has not been completely read), it sets a special EOF flag to false. As long as the user process sees that the flag is false, it must call getdents(2) again. Each call reads more bytes, starting at the file offset of the opened directory where the previous read left off.

The important issue with respect to directory reading is not how to handle the file names, but how to continue reading the directory from exactly the offset where the previous read stopped. Since the readdir kernel function must be implemented as a restartable call, the file system has to store some state in one of the returned variables or structures so that it can be passed back to readdir on the next invocation; at that time the call must continue reading the directory exactly where it left off.
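The restart behavior described above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: struct fake_dir, fake_getdents, and read_whole_dir are hypothetical names invented for illustration, the "directory" is an in-memory array of names, and the buffer holds bare NUL-terminated names rather than real dirent records. It is not the kernel interface or the Usenetfs code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a directory: an in-memory array of names. */
struct fake_dir {
    const char **names;
    size_t count;
};

/* Copy as many NUL-terminated names as fit into buf, starting at entry
 * *offset. On return *offset is the index of the next unread entry, which
 * is the only state the caller must hand back to resume. *eof is set to 1
 * once every entry has been returned. Returns the number of bytes written. */
size_t fake_getdents(const struct fake_dir *d, size_t *offset,
                     char *buf, size_t bufsize, int *eof)
{
    size_t used = 0;

    assert(d != NULL && offset != NULL && buf != NULL && eof != NULL);
    while (*offset < d->count) {
        size_t need = strlen(d->names[*offset]) + 1;
        if (need > bufsize - used)
            break;                      /* buffer full: stop, resume later */
        memcpy(buf + used, d->names[*offset], need);
        used += need;
        (*offset)++;
    }
    *eof = (*offset == d->count);
    return used;
}

/* Caller side: keep calling until the EOF flag is set. Returns the number
 * of calls that were made. bufsize is capped at the local buffer size. */
size_t read_whole_dir(const struct fake_dir *d, size_t bufsize)
{
    char buf[64];
    size_t offset = 0, calls = 0;
    int eof = 0;

    if (bufsize > sizeof buf)
        bufsize = sizeof buf;
    while (!eof) {
        size_t got = fake_getdents(d, &offset, buf, bufsize, &eof);
        calls++;
        if (got == 0 && !eof)
            break;                      /* a name larger than the buffer */
    }
    return calls;
}
```

In this sketch the resume state is a plain entry index kept by the caller; a real file system would instead encode its position in whatever offset or cookie the readdir interface passes back to it.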

We chose Linux as our first development platform for one main reason: directory reading is simpler in the Linux kernel. In other operating systems such as Solaris, we have to read a number of bytes from the interposed file system and parse them into chunks of sizeof(struct dirent), each with the actual file name characters appended. This is cumbersome and requires the file system to perform a lot of bookkeeping. In Linux, much of that complexity was moved into more generic code outside the main implementation of the file system. A file system developer provides the Linux kernel with a callback function for iterating over the entries in a directory. This function is called by higher-level code on each file name. It was easier for us to provide such a function, which in conjunction with our version of readdir reads directories as follows:

1. Generate an entry for ``.'' and ``..'' and return those first.

2. Read the special directory ``aaa'' and return entries within in the order they were read.

3. Read all the directories 000, 001, through 999 and return entries within, also in order.
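The three steps above can be sketched as a callback-driven traversal. The names emit_fn, list_subdir_fn, subdir_name, and read_flat_view are hypothetical and chosen only for illustration; in particular, emit_fn is not the actual Linux callback signature, and the sketch omits the restart logic discussed earlier. The two demo helpers at the end exist only so the traversal can be exercised without a real file system.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-entry callback; NOT the real Linux signature.
 * Returns nonzero to stop the iteration early. */
typedef int (*emit_fn)(void *ctx, const char *name);

/* Hypothetical hook that reports every entry of one lower-level
 * subdirectory through emit. Returns nonzero on error or early stop. */
typedef int (*list_subdir_fn)(const char *subdir, emit_fn emit, void *ctx);

/* Name of the n-th subdirectory to visit: 0 is "aaa", 1..1000 are
 * "000".."999". Returns 0 on success, -1 when n is out of range. */
int subdir_name(int n, char out[4])
{
    assert(out != NULL);
    if (n < 0 || n > 1000)
        return -1;
    if (n == 0) {
        memcpy(out, "aaa", 4);          /* includes the terminating NUL */
        return 0;
    }
    n -= 1;
    out[0] = (char)('0' + n / 100);
    out[1] = (char)('0' + (n / 10) % 10);
    out[2] = (char)('0' + n % 10);
    out[3] = '\0';
    return 0;
}

/* Present the two-level layout as one flat directory. */
int read_flat_view(list_subdir_fn list_subdir, emit_fn emit, void *ctx)
{
    char sub[4];
    int n, rc;

    /* step 1: "." and ".." come first */
    if ((rc = emit(ctx, ".")) != 0)
        return rc;
    if ((rc = emit(ctx, "..")) != 0)
        return rc;
    /* steps 2 and 3: "aaa", then "000" through "999", in order */
    for (n = 0; subdir_name(n, sub) == 0; n++) {
        rc = list_subdir(sub, emit, ctx);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Demo callback: count entries; ctx points to an int. */
int count_entry(void *ctx, const char *name)
{
    (void)name;
    ++*(int *)ctx;
    return 0;
}

/* Demo lister: pretend each subdirectory holds exactly one entry. */
int one_entry_per_subdir(const char *subdir, emit_fn emit, void *ctx)
{
    return emit(ctx, subdir);
}
```

With the demo helpers, calling read_flat_view(one_entry_per_subdir, count_entry, &total) visits the two dot entries plus one entry for each of the 1001 subdirectories.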


Erez Zadok
1999-02-17