Session 14: Speculative Execution in a Distributed File System
Date: November 9, 2005 4:00PM
Venue: TBD
Talk by: Avishay Traeger
Speculative Execution in a Distributed File System (Local PDF)
Edmund B. Nightingale, Peter Chen, Jason Flinn
Symposium on Operating Systems Principles (SOSP '05)
Abstract:
Speculator provides Linux kernel support for speculative execution. It
allows multiple processes to share speculative state by tracking causal
dependencies propagated through inter-process communication. It
guarantees correct execution by preventing speculative processes from
externalizing output, e.g., sending a network message or writing to the
screen, until the speculations on which that output depends have proven
to be correct. Speculator improves the performance of distributed file
systems by masking I/O latency and increasing I/O throughput. Rather
than block during a remote operation, a file system predicts the
operation's result, then uses Speculator to checkpoint the state of the
calling process and speculatively continue its execution based on the
predicted result. If the prediction is correct, the checkpoint is
discarded; if it is incorrect, the calling process is restored to the
checkpoint, and the operation is retried. We have modified the client,
server, and network protocol of two distributed file systems to use
Speculator. For PostMark and Andrew-style benchmarks, speculative
execution results in a factor of 2 performance improvement for NFS over
local-area networks and an order of magnitude improvement over
wide-area networks. For the same benchmarks, Speculator enables the
Blue File System to provide the consistency of single-copy file
semantics and the safety of synchronous I/O, yet still outperform
current distributed file systems with weaker consistency and safety.
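The checkpoint, speculate, and commit-or-rollback cycle from the abstract can be illustrated with a toy user-level sketch in Python. This is not Speculator's kernel interface: `SpeculativeProcess`, `speculate`, `output`, `resolve`, and `run` are hypothetical names chosen for illustration, the "process state" is just a dictionary, and output is held in a list until the speculation resolves.

```python
import copy


class SpeculativeProcess:
    """Toy model: state is a dict, external output is a list."""

    def __init__(self):
        self.state = {}
        self.externalized = []   # output visible to the outside world
        self._pending = []       # output buffered while speculative
        self._checkpoint = None  # saved state, or None if not speculating

    def speculate(self, predicted):
        """Checkpoint the state and continue with a predicted result."""
        self._checkpoint = copy.deepcopy(self.state)
        return predicted

    def output(self, message):
        """Buffer output while speculative; release it immediately otherwise."""
        if self._checkpoint is not None:
            self._pending.append(message)
        else:
            self.externalized.append(message)

    def resolve(self, predicted, actual):
        """Commit if the prediction held; otherwise roll back to the checkpoint."""
        correct = predicted == actual
        if correct:
            self.externalized.extend(self._pending)  # safe to release
        else:
            self.state = self._checkpoint            # restore saved state
        self._pending = []
        self._checkpoint = None
        return correct


def run(actual_result):
    """Hypothetical caller: guess a remote reply, then learn the real one."""
    proc = SpeculativeProcess()
    guess = proc.speculate("cached-v1")       # predict the remote reply
    proc.state["data"] = guess                # keep working on the guess
    proc.output("read " + guess)
    if not proc.resolve(guess, actual_result):
        proc.state["data"] = actual_result    # retry with the real result
        proc.output("read " + actual_result)
    return proc
```

In this sketch a wrong guess never reaches `externalized`, which mirrors the rule that speculative output is withheld until the speculation it depends on is proven correct.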
Session 13: The Mach Microkernel
Date: October 26, 2005 4:00PM
Venue: TBD
Talk by: Sean Callanan
Mach: A New Kernel Foundation For UNIX Development (Local PDF)
Mike Accetta, Robert Baron, William Bolosky, David Golub, Richard Rashid,
Avadis Tevanian and Michael Young
USENIX Summer Conference (Atlanta 1986)
Abstract:
Mach is a multiprocessor operating system kernel and environment
under development at Carnegie Mellon University. Mach provides a new
foundation for UNIX development that spans networks of uniprocessors
and multiprocessors. In contrast to the ongoing trend of adding new
functionality to the UNIX kernel, Mach implements all functionality
except scheduling, IPC, and memory protection at user level. We
discuss Mach's design, its implementation, and selected architectural
issues.
Session 12: IRON File Systems
Date: October 12, 2005 4:00PM
Venue: TBD
Talk by: Charles Wright
IRON File Systems (Local PDF)
Vijayan Prabhakaran, Lakshmi N. Bairavasundaram, Nitin
Agrawal, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, Remzi H.
Arpaci-Dusseau
Symposium on Operating Systems Principles (SOSP '05)
Abstract:
Commodity file systems trust disks to either work or fail
completely, yet modern disks exhibit more complex failure
modes. We suggest a new fail-partial failure model for disks,
which incorporates realistic localized faults such as latent
sector errors and block corruption. We then develop and apply
a novel failure-policy fingerprinting framework, to
investigate how commodity file systems react to a range of
more realistic disk failures. We classify their failure
policies in a new taxonomy that measures their Internal
RObustNess (IRON), which includes both failure detection and
recovery techniques. We show that commodity file system
failure policies are often inconsistent, sometimes buggy, and
generally inadequate in their ability to recover from partial
disk failures. Finally, we design, implement, and evaluate a
prototype IRON file system, Linux ixt3, showing that
techniques such as in-disk checksumming, replication, and
parity greatly enhance file system robustness while incurring
minimal time and space overheads.
Session 11: Self-Securing Storage
Date: September 28, 2005 4:00PM
Venue: 1223 Applied Logic Conference Room
Talk by: Gopalan Sivathanu
Self-Securing Storage: Protecting Data in Compromised
Systems (Local PDF)
Strunk, J.D., Goodson, G.R., Scheinholtz, M.L., Soules, C.A.N. and Ganger, G.R.
Abstract:
Self-securing storage is an exciting new technology for
enhancing intrusion survival by enabling the storage device to
safeguard data even when the client OS is compromised. It capitalizes
on the fact that storage servers (whether file servers, disk array
controllers, or even IDE disks) run separate software on separate
hardware. This opens the door to server-embedded security that cannot
be disabled by any software (even the OS) running on client systems as
shown in the figure above. Of course, such servers have a narrow view
of system activity, so they cannot distinguish legitimate users from
clever impostors. But, from behind the thin storage interface, a
self-securing storage server can actively look for suspicious
behavior, retain an audit log of all storage requests, and prevent
both destruction and undetectable tampering of stored data. The latter
goals are achieved by retaining all versions of all data; instead of
over-writing old data when a write command is issued, the storage
server simply creates a new version and keeps both. Together with the
audit log, the server-retained versions represent a complete history
of system activity from the storage system's point of view.
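The "keep every version plus an audit log" idea can be sketched as a toy in-memory block store in Python. `VersionedStore` and its methods are hypothetical names for illustration only; a real self-securing storage device would enforce this behind the storage interface, in firmware or server software, not in client-side code.

```python
class VersionedStore:
    """Toy block store that never overwrites: every write adds a version."""

    def __init__(self):
        self._versions = {}   # block id -> list of data, oldest first
        self.audit_log = []   # (sequence number, operation, block id)

    def _log(self, op, block):
        self.audit_log.append((len(self.audit_log), op, block))

    def write(self, block, data):
        """Record the request, then append a new version (old ones are kept)."""
        self._log("write", block)
        self._versions.setdefault(block, []).append(data)

    def read(self, block, version=-1):
        """Return the latest version by default, or an older one by index."""
        self._log("read", block)
        return self._versions[block][version]

    def history(self, block):
        """All retained versions of a block, oldest first."""
        return list(self._versions.get(block, []))
```

Because a write only appends, a compromised client can add a bad version but cannot erase the earlier good one or the log entry that records the request.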
PAST SESSIONS DURING SUMMER 2004 AND FALL 2004
Session 1: Log-Structured File Systems
Date: August 24, 2004 (Tuesday) 6:30PM
Venue: CS 2313A (Conference room)
Talk by: Gopalan Sivathanu
Papers:
The Design and Implementation of a Log-Structured File System (Local PDF)
Mendel Rosenblum and John K. Ousterhout, University of California-Berkeley.
Proceedings of the 13th Symposium on Operating Systems Principles (SOSP '91)
Abstract:
This paper presents a new technique for disk storage management called a log-structured file system. A log-structured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains indexing information so that files can be read back from the log efficiently. In order to maintain large free areas on disk for fast writing, we divide the log into segments and use a segment cleaner to compress the live information from heavily fragmented segments. We present a series of simulations that demonstrate the efficiency of a simple cleaning policy based on cost and benefit. We have implemented a prototype log-structured file system called Sprite LFS; it outperforms current Unix file systems by an order of magnitude for small-file writes while matching or exceeding Unix performance for reads and large writes. Even when the overhead for cleaning is included, Sprite LFS can use 70% of the disk bandwidth for writing, whereas Unix file systems typically can use only 5-10%.
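The core mechanism (sequential appends, an index for reads, and cleaning to reclaim dead records) can be sketched as a toy in Python. `ToyLog` is a hypothetical name, and this sketch simplifies heavily: it compacts the whole log at once instead of cleaning individual segments, and it does not model the cost-benefit cleaning policy evaluated in the paper.

```python
class ToyLog:
    """Append-only log with an index from file id to its latest log position."""

    def __init__(self):
        self.log = []     # list of (file_id, data) records, in write order
        self.index = {}   # file_id -> position of the live record

    def write(self, file_id, data):
        self.index[file_id] = len(self.log)   # newest copy wins
        self.log.append((file_id, data))      # always a sequential append

    def read(self, file_id):
        return self.log[self.index[file_id]][1]

    def utilization(self):
        """Fraction of log records that are still live."""
        return len(self.index) / len(self.log) if self.log else 1.0

    def clean(self):
        """Copy live records into a fresh log, dropping dead ones."""
        live = [(fid, self.log[pos][1]) for fid, pos in sorted(
            self.index.items(), key=lambda item: item[1])]
        self.log, self.index = [], {}
        for fid, data in live:
            self.write(fid, data)
```

Overwriting a file never touches its old record; the old copy simply becomes dead space that lowers `utilization()` until `clean()` runs.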
HyLog: A High Performance Approach to Managing Disk Layout (Local PDF)
Wenguang Wang, Yanping Zhao, and Rick Bunt, University of Saskatchewan.
Proceedings of the 3rd Annual Conference on File and Storage Technologies (FAST '04)
Abstract:
Our objective is to improve disk I/O performance in multi-disk systems
supporting multiple concurrent users, such as file servers, database servers,
and email servers. In such systems, many disk reads are absorbed by large
in-memory buffers, and so disk writes comprise a large portion of the disk I/O
traffic. LFS (Log-structured File System) has the potential to achieve superior
write performance by accumulating small writes into large blocks and writing
them to new places, rather than overwriting on top of their old copies (called
Overwrite). Although it is commonly believed that the high segment cleaning
overhead of LFS makes it a poor choice for workloads with random updates, in
this paper we find that because of the fast improvement of disk technologies,
LFS significantly outperforms Overwrite in a wide range of system
configurations and workloads (including the random update workload) under
modern and future disks.
LFS performs worse than Overwrite, however, when the disk space
utilization is very high due to the high cleaning cost. In this paper, we
propose a new approach, the Hybrid Log-structured (HyLog) disk layout, to
overcome this problem. HyLog uses a log-structured approach for hot pages to
achieve high write performance, and Overwrite for cold pages to reduce the
cleaning cost. We compare the performance of HyLog to that of Overwrite, LFS
and WOLF (the latest improvement on LFS) under various system configurations
and workloads. Our results show that, in most cases, HyLog performs comparably
to the best of the other three approaches.
Session 2: FFS-like File System Layout
Date: August 31, 2004 6:45PM
Venue: CS 2313A (Conference room)
Talk by: Charles P. Wright
A Fast File System for UNIX
Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S.
Fabry
Abstract:
A reimplementation of the UNIX file system is described. The
reimplementation provides substantially higher throughput rates by using more
flexible allocation policies that allow better locality of reference and can be
adapted to a wide range of peripheral and processor characteristics. The new
file system clusters data that is sequentially accessed and provides two block
sizes to allow fast access to large files while not wasting large amounts of
space for small files. File access rates of up to ten times faster than the
traditional UNIX file system are experienced. Long-needed enhancements to the
programmers' interface are discussed. These include a mechanism to place
advisory locks on files, extensions of the name space across file systems, the
ability to use long file names, and provisions for administrative control of
resource usage.
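The benefit of two block sizes can be shown with a small worked calculation in Python. The 4096-byte block and 1024-byte fragment are assumed example values, and `allocated_bytes` and `allocated_bytes_single_size` are hypothetical helper names; the sketch only compares space consumption and does not model FFS allocation policy.

```python
def allocated_bytes(file_size, block=4096, fragment=1024):
    """Space used when the file's tail may occupy fragments, not a full block."""
    full_blocks, tail = divmod(file_size, block)
    tail_fragments = -(-tail // fragment)  # ceiling division
    return full_blocks * block + tail_fragments * fragment


def allocated_bytes_single_size(file_size, block=4096):
    """Space used when every allocation is rounded up to a whole block."""
    return -(-file_size // block) * block
```

Under these assumed sizes, a 5000-byte file occupies one block plus one fragment (5120 bytes) instead of two full blocks (8192 bytes), while large files still transfer mostly in full-size blocks.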
Controlling your PLACE in the File System with Gray-box Techniques (Local PDF)
James Nugent, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Abstract:
We present the design and implementation of PLACE, a gray-box library
for controlling file layout on top of FFS-like file systems. PLACE exploits its
knowledge of FFS layout policies to let users place files and directories into
specific and localized portions of disk. Applications can use PLACE to
collocate files that exhibit temporal locality of access, thus improving
performance. Through a series of microbenchmarks, we analyze the overheads of
controlling file layout on top of the file system, showing that the overheads
are not prohibitive, and also discuss the limitations of our approach. Finally,
we demonstrate the utility of PLACE through two case studies: we demonstrate
the potential of file layout rearrangement in a web-server environment, and we
build a benchmarking tool that exploits control over file placement to quickly
extract low-level details from the disk system. In the traditional gray-box
manner, the PLACE library achieves these ends entirely at user level, without
changing a single line of operating system source code.
Other papers possibly of interest:
Journaling Versus Soft Updates: Asynchronous Meta-Data Protection in File Systems (Local PDF)
Margo I. Seltzer, Gregory R. Ganger, M. Kirk McKusick, Keith A. Smith, Craig A. N. Soules, Christopher A. Stein
Transforming Policies into Mechanisms with Infokernel (Local PDF)
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Nathan C. Burnett, Timothy E. Denehy, Thomas J. Engle, Haryadi S. Gunawi, James A. Nugent, Florentina I. Popovici
The Orlov block allocator and Directory Allocation Algorithm For FFS
Session 3: Disk Scheduling I
Date: September 8, 2004 6:45PM
Venue: CS 2311 (Conference room)
Talk by: Sean Callanan
Anticipatory scheduling: A disk scheduling framework to overcome deceptive idleness in synchronous I/O (Local PDF)
Sitaram Iyer and Peter Druschel
Abstract:
Disk schedulers in current operating systems are generally work-conserving, i.e., they schedule a request as soon as the previous request has finished. Such schedulers often require multiple outstanding requests from each process to meet system-level goals of performance and quality of service. Unfortunately, many common applications issue disk read requests in a synchronous manner, interspersing successive requests with short periods of computation. The scheduler chooses the next request too early; this induces deceptive idleness, a condition where the scheduler incorrectly assumes that the last request issuing process has no further requests, and becomes forced to switch to a request from another process.
We propose the anticipatory disk scheduling framework to solve this problem in a simple, general and transparent way, based on the non-work-conserving scheduling discipline. Our FreeBSD implementation is observed to yield large benefits on a range of microbenchmarks and real workloads. The Apache webserver delivers between 29% and 71% more throughput on a disk-intensive workload. The Andrew filesystem benchmark runs faster by 8%, due to a speedup of 54% in its read-intensive phase. Variants of the TPC-B database benchmark exhibit improvements between 2% and 60%. Proportional-share schedulers are seen to achieve their contracts accurately and efficiently.
Session 4: Disk Scheduling II
Date: September 15, 2004 6:45PM
Venue: CS 2311 (Conference room)
Talk by: Akshat Aranya
Towards Higher Disk Head Utilization: Extracting "Free" Bandwidth
From Busy Disk Drives (Local PDF)
Christopher Lumb, Jiri Schindler, Gregory R. Ganger, Erik Riedel and David F. Nagle
Abstract:
Freeblock scheduling is a new approach to utilizing more of a disk's potential media bandwidth. By filling rotational latency periods with useful media transfers, 20-50% of a never-idle disk's bandwidth can often be provided to background applications with no effect on foreground response times. This paper describes freeblock scheduling and demonstrates its value with simulation studies of two concrete applications: segment cleaning and data mining. Free segment cleaning often allows an LFS file system to maintain its ideal write performance when cleaning overheads would otherwise reduce performance by up to a factor of three. Free data mining can achieve over 47 full disk scans per day on an active transaction processing system, with no effect on its disk performance.
Other papers possibly of interest:
A Framework for Building Unobtrusive Disk Maintenance Applications (Local PDF)
Eno Thereska, Jiri Schindler, Christopher R. Lumb, John Bucy, Brandon Salmon, and Gregory R. Ganger
Session 5: The Exokernel Operating System Architecture
Date: September 23, 2004 6:45PM
Venue: CS 2311 (Tentative)
Talk by: Abhishek Rai
Exokernel: An Operating System Architecture for Application-Level Resource Management (Local PDF)
Dawson Engler, M. Frans Kaashoek, James O'Toole Jr.
Abstract:
Traditional operating systems limit the performance, flexibility, and functionality of applications by fixing the interface and implementation of operating system abstractions such as IPC and virtual memory. The exokernel operating system architecture addresses this problem by providing application-level management of physical resources. In the exokernel architecture, a small kernel securely exports all hardware resources through a low-level interface to untrusted library operating systems. Library operating systems use this interface to implement system objects and policies. This separation of resource protection from management allows application-specific customization of traditional operating system abstractions by extending, specializing, or even replacing libraries.
In this paper, we will go over the design of a prototype exokernel operating system. We will also look at several possible applications of such an architecture and see how the exokernel design benefits them. The key idea of this paper is establishing and maintaining 'secure bindings'.
Other papers possibly of interest:
Application Performance and Flexibility on Exokernel Systems (Local PDF)
M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor M. Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie
Exterminate All Operating System Abstractions (Local PDF)
Dawson Engler and M. Frans Kaashoek
Session 6: RAID 6
Date: September 30, 2004 6:45PM
Venue: CS 2311 (Conference room)
Talk by: Nikolai Joukov
Row-Diagonal Parity for Double Disk Failure Correction (Local PDF)
Peter Corbett, Bob English, Atul Goel, Tomislav Grcanac, Steven Kleiman, James Leong, and Sunitha Sankar
Abstract:
Row-Diagonal Parity (RDP) is a new algorithm for protecting against double disk failures. It stores all data unencoded, and uses only exclusive-or operations to compute parity. RDP is provably optimal in computational complexity, both during construction and reconstruction.
Like other algorithms, it is optimal in the amount of redundant information stored and accessed. RDP works within a single stripe of blocks of sizes normally used by file systems, databases and disk arrays. It can be utilized in a fixed (RAID-4) or rotated (RAID-5) parity placement style.
It is possible to extend the algorithm to encompass multiple RAID-4 or RAID-5 disk arrays in a single RDP disk array. It is possible to add disks to an existing RDP array without recalculating parity or moving data.
Implementation results show that RDP performance can be made nearly equal to single parity RAID-4 and RAID-5 performance.
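The XOR-only construction can be sketched in Python for a single small stripe. This is a sketch under stated assumptions, not the authors' implementation: it assumes the usual RDP layout for a prime P (P - 1 data disks, one row-parity disk, one diagonal-parity disk, P - 1 rows, with block (r, d) on diagonal (r + d) mod P and one diagonal left unstored), and `recover` is a generic loop that solves any parity set with exactly one missing block, not the paper's optimized reconstruction order. All function names are hypothetical.

```python
from functools import reduce
from operator import xor

P = 5  # prime; P - 1 data disks, 1 row-parity disk, 1 diagonal-parity disk
ROWS = P - 1
ROW_PARITY, DIAG_PARITY = P - 1, P


def equations():
    """Each equation is a list of (row, disk) blocks whose XOR is zero."""
    eqs = [[(r, d) for d in range(P)] for r in range(ROWS)]  # row sets
    for i in range(P - 1):  # diagonal P - 1 is deliberately not stored
        diag = [(r, d) for r in range(ROWS) for d in range(P)
                if (r + d) % P == i]
        eqs.append(diag + [(i, DIAG_PARITY)])
    return eqs


def encode(data):
    """data[r][d] for d < P - 1; returns a full stripe with both parities."""
    stripe = {(r, d): data[r][d] for r in range(ROWS) for d in range(P - 1)}
    for r in range(ROWS):
        stripe[(r, ROW_PARITY)] = reduce(xor, data[r])
    for eq in equations()[ROWS:]:
        *members, parity = eq
        stripe[parity] = reduce(xor, (stripe[b] for b in members))
    return stripe


def recover(stripe, failed):
    """Rebuild the blocks of up to two failed disks using only XOR."""
    known = {b: v for b, v in stripe.items() if b[1] not in failed}
    total = ROWS * (P + 1)
    while len(known) < total:
        progress = False
        for eq in equations():
            missing = [b for b in eq if b not in known]
            if len(missing) == 1:
                # The lone missing block equals the XOR of the others.
                known[missing[0]] = reduce(
                    xor, (known[b] for b in eq if b in known), 0)
                progress = True
        if not progress:
            raise ValueError("unrecoverable failure pattern")
    return known
```

A typical use is to call `encode` on a 4x4 grid of integers and then `recover(stripe, {a, b})` for any two disk numbers in 0..5; data blocks are stored unencoded, so reads of healthy disks need no decoding.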
Other papers possibly of interest:
EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures (Local PDF)
M. Blaum, J. Brady, J. Bruck, and J. Menon
A Case for Redundant Arrays of Inexpensive Disks (RAID) (Local PDF)
David A. Patterson, Garth Gibson, and Randy H. Katz
Session 7: The Google File System
Date: October 7, 2004 6:45PM
Venue: CS 2311 (Conference room)
Talk by: Avishay Traeger
The Google File System (Local PDF)
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Abstract:
We have designed and implemented the Google File System, a scalable
distributed file system for large distributed data-intensive applications. It
provides fault tolerance while running on inexpensive commodity hardware, and
it delivers high aggregate performance to a large number of clients. While
sharing many of the same goals as previous distributed file systems, our design
has been driven by observations of our application workloads and technological
environment, both current and anticipated, that reflect a marked departure from
some earlier file system assumptions. This has led us to re-examine traditional
choices and explore radically different design points.
The file system has successfully met our storage needs. It is widely
deployed within Google as the storage platform for the generation and
processing of data used by our service as well as research and development
efforts that require large data sets. The largest cluster to date provides
hundreds of terabytes of storage across thousands of disks on over a
thousand machines, and it is concurrently accessed by hundreds of clients.
In this paper, we present file system interface extensions designed
to support distributed applications, discuss many aspects of our design, and
report measurements from both micro-benchmarks and real world use.
Session 8: Unix Buffer Cache
Date: October 14, 2004 6:45PM
Venue: CS 2311 (Conference room)
Talk by: Naveen Gupta
Improving the efficiency of the Unix File Buffer Caches (Local PS)
Andrew Braunstein, Mark Riley, John Wilkes
Abstract:
This paper reports on the effects of using hardware virtual memory assists in managing file buffer caches in UNIX. A controlled experimental environment was constructed from two systems whose only difference was that one of them (XMF) used the virtual memory hardware to assist file buffer cache search and retrieval. An extensive series of performance characterizations was used to study the effects of varying the buffer cache size (from 3 MB to 70 MB); I/O transfer sizes (from 4 bytes to 64 KB); cache-resident and non-cache-resident data; READs and WRITEs; and a range of application programs. The results: small READ/WRITE transfers from the cache (≤1 KB) were 50% faster under XMF, while larger transfers (≥8 KB) were 20% faster. Retrieving data from disk, the XMF improvement was 25% and 10% respectively, although OPEN/CLOSE system calls took slightly longer in XMF. Some individual programs ran as much as 40% faster on XMF, while an application benchmark suite showed a 7-15% improvement in overall execution time. Perhaps surprisingly, XMF had fewer translation lookaside buffer misses.
Session 9: Windows Driver Model
Date: November 18, 2004 6:45PM
Venue: CS 2311 (Conference room)
Talk by: Rakesh Iyer
Windows Driver Model
Rakesh N Iyer
Abstract:
This seminar provides an introduction to the Windows kernel architecture. WDM was introduced in Win98 Second Edition and represents an architecture for driver development. The architecture allows for easy and portable development of drivers; i.e., a WDM driver developed on one Windows platform (say, Windows 2k) would work on any other platform. This architecture emphasizes Plug and Play and power management, which are important features for end users as we move towards a wireless world. This seminar will cover some parts of the kernel, such as the I/O manager and, above all, the IRP, which is the most important kernel data structure in Windows.
Session 10: Windows File Systems
Date: December 9, 2004 6:45PM
Venue: CS 2311 (Conference room)
Talk by: Rakesh Iyer
Windows File Systems
Rakesh N Iyer
Abstract:
The seminar will cover:
- Brief overview of the fields of the IRP (carried over from the previous seminar) that are relevant to the file system
- How File systems stack and the mount process
- File Objects, Context structures
- SectionObject and its use
- File Caching and unified model for shared memory
- Writing a File system filter driver.
- The Cryptfs (In) Experience