OS Reading Group details

Reading Group Session Details

 

 

 

 


Session 14: Speculative Execution in a Distributed File System

 

Date: November 9, 2005 4:00PM

Venue: TBD

Talk by: Avishay Traeger

 

Speculative Execution in a Distributed File System (Local PDF)

Edmund B. Nightingale, Peter Chen, Jason Flinn
Symposium on Operating Systems Principles (SOSP '05)

Abstract:

Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through inter-process communication. It guarantees correct execution by preventing speculative processes from externalizing output, e.g., sending a network message or writing to the screen, until the speculations on which that output depends have proven to be correct. Speculator improves the performance of distributed file systems by masking I/O latency and increasing I/O throughput. Rather than block during a remote operation, a file system predicts the operation's result, then uses Speculator to checkpoint the state of the calling process and speculatively continue its execution based on the predicted result. If the prediction is correct, the checkpoint is discarded; if it is incorrect, the calling process is restored to the checkpoint, and the operation is retried. We have modified the client, server, and network protocol of two distributed file systems to use Speculator. For PostMark and Andrew-style benchmarks, speculative execution results in a factor of 2 performance improvement for NFS over local-area networks and an order of magnitude improvement over wide-area networks. For the same benchmarks, Speculator enables the Blue File System to provide the consistency of single-copy file semantics and the safety of synchronous I/O, yet still outperform current distributed file systems with weaker consistency and safety.
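The checkpoint, predict, and roll-back cycle described in the abstract can be sketched in a few lines. This is only a user-level toy under stated assumptions: the class `SpeculativeProcess` and its methods are hypothetical names, state is checkpointed with a deep copy, and all output is held back until every outstanding speculation has resolved. Speculator itself does this inside the Linux kernel and tracks dependencies across processes, which this sketch does not attempt.

```python
import copy


class SpeculativeProcess:
    """Toy model of speculative execution (hypothetical, not Speculator's API)."""

    def __init__(self, state):
        self.state = state
        self.pending = []   # (checkpoint, predicted result), oldest first
        self.buffered = []  # output withheld while speculations are unresolved
        self.visible = []   # output released to the outside world

    def speculate(self, predicted):
        # Checkpoint the current state, then continue with the predicted result.
        self.pending.append((copy.deepcopy(self.state), predicted))
        return predicted

    def emit(self, message):
        # Speculative output must not be externalized yet.
        (self.buffered if self.pending else self.visible).append(message)

    def resolve(self, actual):
        """Resolve the oldest speculation against the server's actual result."""
        checkpoint, predicted = self.pending.pop(0)
        if actual == predicted:
            # Correct prediction: discard the checkpoint; release output once
            # nothing else is outstanding (a conservative simplification).
            if not self.pending:
                self.visible.extend(self.buffered)
                self.buffered.clear()
            return True
        # Misprediction: restore the checkpoint and drop everything that
        # depended on the wrong result. The caller retries the operation.
        self.state = checkpoint
        self.pending.clear()
        self.buffered.clear()
        return False


proc = SpeculativeProcess({"size": 0})
proc.state["size"] = proc.speculate(predicted=10)
proc.emit("size is 10")
committed = proc.resolve(actual=10)
```

After the correct prediction above, the buffered message becomes visible; a mismatching `actual` would instead restore the checkpointed state and discard the message.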




Session 13: The Mach Microkernel

 

Date: October 26, 2005 4:00PM

Venue: TBD

Talk by: Sean Callanan

 

Mach: A New Kernel Foundation For UNIX Development (Local PDF)

Mike Accetta, Robert Baron, William Bolosky, David Golub, Richard Rashid, Avadis Tevanian and Michael Young
USENIX Summer Conference (Atlanta 1986)

Abstract:

Mach is a multiprocessor operating system kernel and environment under development at Carnegie Mellon University. Mach provides a new foundation for UNIX development that spans networks of uniprocessors and multiprocessors. In contrast to the ongoing trend of adding new functionality to the UNIX kernel, Mach implements all functionality except scheduling, IPC, and memory protection at user level. We discuss Mach's design, its implementation, and selected architectural issues.




Session 12: IRON File Systems

 

Date: October 12, 2005 4:00PM

Venue: TBD

Talk by: Charles Wright

 

IRON File Systems (Local PDF)

Vijayan Prabhakaran, Lakshmi N. Bairavasundaram, Nitin Agrawal, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Symposium on Operating Systems Principles (SOSP '05)

Abstract:

Commodity file systems trust disks to either work or fail completely, yet modern disks exhibit more complex failure modes. We suggest a new fail-partial failure model for disks, which incorporates realistic localized faults such as latent sector errors and block corruption. We then develop and apply a novel failure-policy fingerprinting framework, to investigate how commodity file systems react to a range of more realistic disk failures. We classify their failure policies in a new taxonomy that measures their Internal RObustNess (IRON), which includes both failure detection and recovery techniques. We show that commodity file system failure policies are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures. Finally, we design, implement, and evaluate a prototype IRON file system, Linux ixt3, showing that techniques such as in-disk checksumming, replication, and parity greatly enhance file system robustness while incurring minimal time and space overheads.




Session 11: Self-Securing Storage

 

Date: September 28, 2005 4:00PM

Venue: 1223 Applied Logic Conference Room

Talk by: Gopalan Sivathanu

 

Self-Securing Storage: Protecting Data in Compromised Systems (Local PDF)

Strunk, J.D., Goodson, G.R., Scheinholtz, M.L., Soules, C.A.N. and Ganger, G.R.

Abstract:

Self-securing storage is an exciting new technology for enhancing intrusion survival by enabling the storage device to safeguard data even when the client OS is compromised. It capitalizes on the fact that storage servers (whether file servers, disk array controllers, or even IDE disks) run separate software on separate hardware. This opens the door to server-embedded security that cannot be disabled by any software (even the OS) running on client systems. Of course, such servers have a narrow view of system activity, so they cannot distinguish legitimate users from clever impostors. But, from behind the thin storage interface, a self-securing storage server can actively look for suspicious behavior, retain an audit log of all storage requests, and prevent both destruction and undetectable tampering of stored data. The latter goals are achieved by retaining all versions of all data; instead of over-writing old data when a write command is issued, the storage server simply creates a new version and keeps both. Together with the audit log, the server-retained versions represent a complete history of system activity from the storage system's point of view.
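The version-retention idea in the abstract can be illustrated with a minimal in-memory sketch. The class `VersionedStore` and its method names are hypothetical, chosen only for illustration; a real self-securing storage server sits behind a block or file interface and must also bound the history it keeps, which is ignored here.

```python
class VersionedStore:
    """Toy store that never overwrites: each write appends a new version."""

    def __init__(self):
        self.versions = {}   # block id -> list of versions, oldest first
        self.audit_log = []  # every request, in arrival order

    def write(self, client, block, data):
        self.audit_log.append(("write", client, block))
        # Keep the old data; the new data becomes the latest version.
        self.versions.setdefault(block, []).append(data)

    def read(self, client, block, version=-1):
        # version=-1 returns the latest data; 0 returns the original.
        self.audit_log.append(("read", client, block))
        return self.versions[block][version]


store = VersionedStore()
store.write("alice", 7, b"payroll v1")
store.write("intruder", 7, b"tampered")
original = store.read("admin", 7, version=0)
```

Even after the second write, the first version is still readable, and the audit log records who issued each request.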




PAST SESSIONS DURING SUMMER 2004 AND FALL 2004




Session 1: Log-Structured File Systems

 

Date & Time: 24 August 2004, Tuesday, 6:30PM

Venue: CS 2313A (Conference room)

Talk by: Gopalan Sivathanu

 

Papers:

 

The Design and Implementation of a Log-Structured File System (Local PDF)

Mendel Rosenblum and John K. Ousterhout, University of California-Berkeley.
Proceedings of the 13th Symposium on Operating Systems Principles (SOSP '91)

Abstract:

This paper presents a new technique for disk storage management called a log-structured file system. A log-structured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains indexing information so that files can be read back from the log efficiently. In order to maintain large free areas on disk for fast writing, we divide the log into segments and use a segment cleaner to compress the live information from heavily fragmented segments. We present a series of simulations that demonstrate the efficiency of a simple cleaning policy based on cost and benefit. We have implemented a prototype log-structured file system called Sprite LFS; it outperforms current Unix file systems by an order of magnitude for small-file writes while matching or exceeding Unix performance for reads and large writes. Even when the overhead for cleaning is included, Sprite LFS can use 70% of the disk bandwidth for writing, whereas Unix file systems typically can use only 5-10%.
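The abstract does not spell out the cost-benefit cleaning policy. As a rough sketch, the body of the paper rates each segment by the ratio (1 - u) * age / (1 + u), where u is the fraction of the segment that is still live and age is the age of its youngest data, and cleans the segment with the highest ratio. The function names and the sample segments below are hypothetical.

```python
def cost_benefit(utilization, age):
    """Benefit-to-cost ratio for cleaning one segment.

    Cleaning reads the whole segment (cost 1) and writes back the live
    fraction u (cost u); it frees (1 - u) of a segment, weighted by age.
    """
    return (1.0 - utilization) * age / (1.0 + utilization)


def pick_segment(segments):
    """Return the segment with the highest benefit-to-cost ratio."""
    return max(segments, key=lambda s: cost_benefit(s["u"], s["age"]))


# Hypothetical segments: u is live fraction, age is in arbitrary time units.
segments = [
    {"name": "A", "u": 0.90, "age": 100},  # very full, old
    {"name": "B", "u": 0.50, "age": 10},   # half empty, recently written
    {"name": "C", "u": 0.75, "age": 50},   # fairly full, fairly old
]
victim = pick_segment(segments)
```

With these numbers the policy picks segment C rather than the emptier segment B, which shows how the age term lets cold segments be cleaned at a higher utilization than hot ones.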

 

HyLog: A High Performance Approach to Managing Disk Layout (Local PDF)

Wenguang Wang, Yanping Zhao, and Rick Bunt, University of Saskatchewan.

Proceedings of the 3rd Annual Conference on File and Storage Technologies (FAST '04)

Abstract:

Our objective is to improve disk I/O performance in multi-disk systems supporting multiple concurrent users, such as file servers, database servers, and email servers. In such systems, many disk reads are absorbed by large in-memory buffers, and so disk writes comprise a large portion of the disk I/O traffic. LFS (Log-structured File System) has the potential to achieve superior write performance by accumulating small writes into large blocks and writing them to new places, rather than overwriting on top of their old copies (called Overwrite). Although it is commonly believed that the high segment cleaning overhead of LFS makes it a poor choice for workloads with random updates, in this paper we find that because of the fast improvement of disk technologies, LFS significantly outperforms Overwrite in a wide range of system configurations and workloads (including the random update workload) under modern and future disks.

LFS performs worse than Overwrite, however, when the disk space utilization is very high due to the high cleaning cost. In this paper, we propose a new approach, the Hybrid Log-structured (HyLog) disk layout, to overcome this problem. HyLog uses a log-structured approach for hot pages to achieve high write performance, and Overwrite for cold pages to reduce the cleaning cost. We compare the performance of HyLog to that of Overwrite, LFS and WOLF (the latest improvement on LFS) under various system configurations and workloads. Our results show that, in most cases, HyLog performs comparably to the best of the other three approaches.


Session 2: FFS-like File System Layout

 

Date: August 31, 2004 6:45PM

Venue: CS 2313A (Conference room)

Talk by: Charles P. Wright

 

A Fast File System for UNIX

Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry

Abstract:

A reimplementation of the UNIX file system is described. The reimplementation provides substantially higher throughput rates by using more flexible allocation policies that allow better locality of reference and can be adapted to a wide range of peripheral and processor characteristics. The new file system clusters data that is sequentially accessed and provides two block sizes to allow fast access to large files while not wasting large amounts of space for small files. File access rates of up to ten times faster than the traditional UNIX file system are experienced. Long-needed enhancements to the programmers' interface are discussed. These include a mechanism to place advisory locks on files, extensions of the name space across file systems, the ability to use long file names, and provisions for administrative control of resource usage.
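The benefit of having two block sizes can be shown with a little arithmetic. The sketch below assumes 4096-byte blocks divided into 1024-byte fragments, which is one possible configuration and not a value taken from the abstract; the function name is hypothetical, and indirect blocks and other metadata are ignored.

```python
def allocated_bytes(file_size, block_size=4096, fragment_size=1024):
    """Space used when the tail of a file is stored in fragments.

    Full blocks hold most of the file; only the final partial block is
    rounded up to a whole number of fragments.
    """
    full_blocks, tail = divmod(file_size, block_size)
    tail_fragments = -(-tail // fragment_size)  # ceiling division
    return full_blocks * block_size + tail_fragments * fragment_size


size = 5000
with_fragments = allocated_bytes(size)                       # 4096 + 1024
blocks_only = allocated_bytes(size, fragment_size=4096)      # 2 * 4096
wasted_with_fragments = with_fragments - size
wasted_blocks_only = blocks_only - size
```

For a 5000-byte file, fragments leave 120 bytes unused, whereas allocating only whole 4096-byte blocks would leave 3192 bytes unused.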

Controlling your PLACE in the File System with Gray-box Techniques (Local PDF)

James Nugent, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

Abstract:

We present the design and implementation of PLACE, a gray-box library for controlling file layout on top of FFS-like file systems. PLACE exploits its knowledge of FFS layout policies to let users place files and directories into specific and localized portions of disk. Applications can use PLACE to collocate files that exhibit temporal locality of access, thus improving performance. Through a series of microbenchmarks, we analyze the overheads of controlling file layout on top of the file system, showing that the overheads are not prohibitive, and also discuss the limitations of our approach. Finally, we demonstrate the utility of PLACE through two case studies: we demonstrate the potential of file layout rearrangement in a web-server environment, and we build a benchmarking tool that exploits control over file placement to quickly extract low-level details from the disk system. In the traditional gray-box manner, the PLACE library achieves these ends entirely at user level, without changing a single line of operating system source code.

Other papers possibly of interest:

Journaling Versus Soft Updates: Asynchronous Meta-Data Protection in File Systems (Local PDF)
Margo I. Seltzer, Gregory R. Ganger, M. Kirk McKusick, Keith A. Smith, Craig A. N. Soules, Christopher A. Stein

Transforming Policies into Mechanisms with Infokernel (Local PDF)
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Nathan C. Burnett, Timothy E. Denehy, Thomas J. Engle, Haryadi S. Gunawi, James A. Nugent, Florentina I. Popovici

The Orlov block allocator and Directory Allocation Algorithm For FFS


Session 3: Disk Scheduling I

 

Date: September 8, 2004 6:45PM

Venue: CS 2311 (Conference room)

Talk by: Sean Callanan

 

Anticipatory scheduling: A disk scheduling framework to overcome deceptive idleness in synchronous I/O (Local PDF)

Sitaram Iyer and Peter Druschel

Abstract:

Disk schedulers in current operating systems are generally work-conserving, i.e., they schedule a request as soon as the previous request has finished. Such schedulers often require multiple outstanding requests from each process to meet system-level goals of performance and quality of service. Unfortunately, many common applications issue disk read requests in a synchronous manner, interspersing successive requests with short periods of computation. The scheduler chooses the next request too early; this induces deceptive idleness, a condition where the scheduler incorrectly assumes that the last request issuing process has no further requests, and becomes forced to switch to a request from another process.

We propose the anticipatory disk scheduling framework to solve this problem in a simple, general and transparent way, based on the non-work-conserving scheduling discipline. Our FreeBSD implementation is observed to yield large benefits on a range of microbenchmarks and real workloads. The Apache webserver delivers between 29% and 71% more throughput on a disk-intensive workload. The Andrew filesystem benchmark runs faster by 8%, due to a speedup of 54% in its read-intensive phase. Variants of the TPC-B database benchmark exhibit improvements between 2% and 60%. Proportional-share schedulers are seen to achieve their contracts accurately and efficiently.
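The core decision of a non-work-conserving scheduler, whether to keep the disk idle briefly or to move on to another process's request, can be sketched as a simple comparison. This is a simplified, hypothetical heuristic for illustration only, not the cost-benefit computation from the paper; the function name, parameters, and the 6 ms wait limit are all assumptions.

```python
def should_wait(think_ms, near_positioning_ms, far_positioning_ms,
                max_wait_ms=6.0):
    """Decide whether to idle the disk after finishing a request.

    think_ms:            expected time until the same process issues its
                         next (nearby) request
    near_positioning_ms: positioning time for that nearby request
    far_positioning_ms:  positioning time for the best pending request
                         from another process
    """
    if think_ms > max_wait_ms:
        # The process thinks for too long; idling would waste the disk.
        return False
    # Waiting pays off only if idling plus the short seek is cheaper than
    # seeking away to the other process's request right now.
    return think_ms + near_positioning_ms < far_positioning_ms


wait_for_sequential_reader = should_wait(1.0, 0.5, 8.0)
wait_for_slow_process = should_wait(10.0, 0.5, 8.0)
```

A work-conserving scheduler corresponds to always answering "no" here, which is what produces the deceptive idleness described above.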


Session 4: Disk Scheduling II

 

Date: September 15, 2004 6:45PM

Venue: CS 2311 (Conference room)

Talk by: Akshat Aranya

 

Towards Higher Disk Head Utilization: Extracting "Free" Bandwidth From Busy Disk Drives (Local PDF)

Christopher Lumb, Jiri Schindler, Gregory R. Ganger, Erik Riedel and David F. Nagle

Abstract:

Freeblock scheduling is a new approach to utilizing more of a disk's potential media bandwidth. By filling rotational latency periods with useful media transfers, 20-50% of a never-idle disk's bandwidth can often be provided to background applications with no effect on foreground response times. This paper describes freeblock scheduling and demonstrates its value with simulation studies of two concrete applications: segment cleaning and data mining. Free segment cleaning often allows an LFS file system to maintain its ideal write performance when cleaning overheads would otherwise reduce performance by up to a factor of three. Free data mining can achieve over 47 full disk scans per day on an active transaction processing system, with no effect on its disk performance.

Other papers possibly of interest:
A Framework for Building Unobtrusive Disk Maintenance Applications (Local PDF)
Eno Thereska, Jiri Schindler, Christopher R. Lumb, John Bucy, Brandon Salmon, and Gregory R. Ganger


Session 5: The Exokernel Operating System Architecture

 

Date: September 23, 2004 6:45PM

Venue: CS 2311 (Tentative)

Talk by: Abhishek Rai

 

Exokernel: An Operating System Architecture for Application-Level Resource Management (Local PDF)

Dawson Engler, M. Frans Kaashoek, James O'Toole Jr.

Abstract:

Traditional operating systems limit the performance, flexibility, and functionality of applications by fixing the interface and implementation of operating system abstractions such as IPC and virtual memory. The exokernel operating system architecture addresses this problem by providing application-level management of physical resources. In the exokernel architecture, a small kernel securely exports all hardware resources through a low-level interface to untrusted library operating systems. Library operating systems use this interface to implement system objects and policies. This separation of resource protection from management allows application-specific customization of traditional operating system abstractions by extending, specializing, or even replacing libraries.

In this session, we will go over the design of a prototype exokernel operating system. We will also look at several possible applications of such an architecture and see how the exokernel design benefits them. The key idea of the paper is establishing and maintaining 'secure bindings'.

Other papers possibly of interest:
Application Performance and Flexibility on Exokernel Systems (Local PDF)
M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor M. Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie

Exterminate All Operating System Abstractions (Local PDF)
Dawson Engler and M. Frans Kaashoek


Session 6: RAID 6

 

Date: September 30, 2004 6:45PM

Venue: CS 2311 (Conference room)

Talk by: Nikolai Joukov

 

Row-Diagonal Parity for Double Disk Failure Correction (Local PDF)

Peter Corbett, Bob English, Atul Goel, Tomislav Grcanac, Steven Kleiman, James Leong, and Sunitha Sankar

Abstract:

Row-Diagonal Parity (RDP) is a new algorithm for protecting against double disk failures. It stores all data unencoded, and uses only exclusive-or operations to compute parity. RDP is provably optimal in computational complexity, both during construction and reconstruction. Like other algorithms, it is optimal in the amount of redundant information stored and accessed. RDP works within a single stripe of blocks of sizes normally used by file systems, databases and disk arrays. It can be utilized in a fixed (RAID-4) or rotated (RAID-5) parity placement style. It is possible to extend the algorithm to encompass multiple RAID-4 or RAID-5 disk arrays in a single RDP disk array. It is possible to add disks to an existing RDP array without recalculating parity or moving data. Implementation results show that RDP performance can be made nearly equal to single parity RAID-4 and RAID-5 performance.
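To make the XOR-only construction concrete, here is a small sketch of an RDP stripe as we understand the layout from the paper: for a prime p there are p - 1 data disks, one row-parity disk, and one diagonal-parity disk, with p - 1 rows per stripe; cell (row r, disk d) lies on diagonal (r + d) mod p, and one diagonal is left unstored. The function names are hypothetical, blocks are modeled as small integers, and recovery uses a generic "solve any equation with one unknown" loop rather than the paper's optimized reconstruction order.

```python
from itertools import combinations
import random


def rdp_equations(p):
    """Parity equations as lists of (row, disk) cells whose XOR is zero."""
    rows = range(p - 1)
    # Row parity: data disks 0..p-2 plus the row-parity disk p-1.
    eqs = [[(r, d) for d in range(p)] for r in rows]
    # Diagonal k covers cells on disks 0..p-1 with (r + d) % p == k, plus
    # its parity cell (k, p). Diagonal p-1 is deliberately not stored.
    for k in range(p - 1):
        cells = [(r, d) for r in rows for d in range(p) if (r + d) % p == k]
        eqs.append(cells + [(k, p)])
    return eqs


def rdp_solve(stripe, missing, p):
    """Fill in missing cells by repeatedly solving one-unknown equations."""
    stripe, missing = dict(stripe), set(missing)
    progress = True
    while missing and progress:
        progress = False
        for eq in rdp_equations(p):
            unknown = [c for c in eq if c in missing]
            if len(unknown) == 1:
                value = 0
                for cell in eq:
                    if cell != unknown[0]:
                        value ^= stripe[cell]
                stripe[unknown[0]] = value
                missing.discard(unknown[0])
                progress = True
    if missing:
        raise ValueError("stripe is not recoverable")
    return stripe


def rdp_encode(data, p):
    """data[r][d] for p-1 rows and p-1 data disks; returns cell -> value."""
    stripe = {(r, d): data[r][d] for r in range(p - 1) for d in range(p - 1)}
    parity_cells = {(r, d) for r in range(p - 1) for d in (p - 1, p)}
    # Computing parity is the same as "recovering" the two parity disks.
    return rdp_solve(stripe, parity_cells, p)


random.seed(0)
p = 5
data = [[random.randrange(256) for _ in range(p - 1)] for _ in range(p - 1)]
full = rdp_encode(data, p)
recovered_all = True
for a, b in combinations(range(p + 1), 2):  # every pair of failed disks
    lost = {cell for cell in full if cell[1] in (a, b)}
    survivors = {c: v for c, v in full.items() if c not in lost}
    recovered_all = recovered_all and rdp_solve(survivors, lost, p) == full
```

The loop at the end fails every possible pair of disks in a p = 5 array and checks that the original stripe is rebuilt using nothing but XOR.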

Other papers possibly of interest:
EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures (Local PDF)
M. Blaum, J. Brady, J. Bruck, and J. Menon

A Case for Redundant Arrays of Inexpensive Disks (RAID) (Local PDF)
David A. Patterson, Garth Gibson, and Randy H. Katz


Session 7: The Google File System

 

Date: October 7, 2004 6:45PM

Venue: CS 2311 (Conference room)

Talk by: Avishay Traeger

 

The Google File System (Local PDF)

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Abstract:

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to re-examine traditional choices and explore radically different design points.

The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients.

In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.


Session 8: Unix Buffer Cache

 

Date: October 14, 2004 6:45PM

Venue: CS 2311 (Conference room)

Talk by: Naveen Gupta

 

Improving the efficiency of the Unix File Buffer Caches (Local PS)

Andrew Braunstein, Mark Riley, John Wilkes

Abstract:

This paper reports on the effects of using hardware virtual memory assists in managing file buffer caches in UNIX. A controlled experimental environment was constructed from two systems whose only difference was that one of them (XMF) used the virtual memory hardware to assist file buffer cache search and retrieval. An extensive series of performance characterizations was used to study the effects of varying the buffer cache size (from 3 Megabytes to 70 MB); I/O transfer sizes (from 4 bytes to 64 KB); cache-resident and non-cache-resident data; READs and WRITEs; and a range of application programs. The results: small READ/WRITE transfers from the cache (≤1 KB) were 50% faster under XMF, while larger transfers (≥8 KB) were 20% faster. Retrieving data from disk, the XMF improvement was 25% and 10% respectively, although OPEN/CLOSE system calls took slightly longer in XMF. Some individual programs ran as much as 40% faster on XMF, while an application benchmark suite showed a 7-15% improvement in overall execution time. Perhaps surprisingly, XMF had fewer translation lookaside buffer misses.


Session 9: Windows Driver Model

 

Date: November 18, 2004 6:45PM

Venue: CS 2311 (Conference room)

Talk by: Rakesh Iyer

 

Windows Driver Model

Rakesh N Iyer

Abstract:

This seminar provides an introduction to the Windows kernel architecture. The Windows Driver Model (WDM) was introduced in Windows 98 Second Edition and represents an architecture for driver development. The architecture allows for easy and portable development of drivers; that is, a WDM driver developed on one Windows platform (say, Windows 2000) would work on any other platform. This architecture emphasizes Plug and Play and power management, which are important features for end users as we move towards a wireless world. This seminar will cover some parts of the kernel, such as the I/O manager and, most importantly, the I/O request packet (IRP), which is the most important kernel data structure in Windows.


Session 10: Windows File Systems

 

Date: December 9, 2004 6:45PM

Venue: CS 2311 (Conference room)

Talk by: Rakesh Iyer

 

Windows File Systems

Rakesh N Iyer

Abstract:

The seminar will cover:

  • Brief overview of the fields of the IRP (carried over from the previous seminar) that are relevant to the file system
  • How File systems stack and the mount process
  • File Objects, Context structures
  • SectionObject and its use
  • File Caching and unified model for shared memory
  • Writing a File system filter driver.
  • The Cryptfs (In) Experience


 

This page will be updated regularly