Title: "Performance Availability in Cluster I/O"

Speaker: Remzi H. Arpaci-Dusseau, U.C. Berkeley

Abstract: In this talk, I will discuss how to achieve common-case peak performance for I/O-intensive cluster applications. The motivation for this work is experience with NOW-Sort, a high-performance external sort for clusters of workstations. NOW-Sort attains peak performance only at night -- when machines are otherwise idle, and all potential performance distractions are manually removed from the system. However, when run in a less sterile (and more realistic) environment, performance suffers noticeably.

The main reason for this lack of "performance availability" is the presence of performance anomalies in clustered systems. Due to the complexity of both hardware and software, the behavior of machines across a seemingly homogeneous pool of machines is often quite varied. Software predicated on this homogeneity will exhibit erratic performance, often an order of magnitude worse than expected. To remedy this, systems must assume that such performance variations exist, and contain provisions to operate well in spite of them.

Towards this end, I will describe a system called River. River employs two key ideas to avoid performance faults: load balancing, via high-performance distributed queues, and replication, with graduated declustering. With these two components, River facilitates the construction of applications that perform gracefully even under severe performance anomalies, allowing data to flow seamlessly around such faults. The result is nearly ideal performance at all times -- whether day or night -- with applications effectively utilizing their share of resources.

Biography
Remzi H. Arpaci-Dusseau is currently a graduate student at U.C. Berkeley, under advisor David Patterson. He received a B.S. in Computer Engineering, summa cum laude, from the University of Michigan in 1993, and a Masters in Computer Science from U.C. Berkeley in 1996.
He plans to complete his dissertation work in the fall of 1999. His interests lie largely in the area of experimental distributed and parallel systems, including operating systems, file systems, databases, and computer architecture. His most recent work has been on River, a software system designed to provide consistent, high performance for cluster applications with large I/O demands. He and his wife, Andrea Arpaci-Dusseau, broke and still hold two world records in external sorting. For more information, see: http://www.cs.berkeley.edu/~remzi