CSE 613 (#50842): Parallel Programming, Spring 2012

Lecture Time and Location. TuTh 2:20 pm - 3:40 pm, Computer Science 2129, West Campus

Instructor. Rezaul A. Chowdhury (rezaul{at}cs{dot}stonybrook{dot}edu)
Office Hours. TuTh 12:00 pm - 1:30 pm, 1421 Computer Science Building

Course Description. We will explore algorithms and techniques for programming state-of-the-art shared-memory (e.g., multicores) and distributed-memory parallel computers. The course will include both theoretical and programming components. Topics to be covered include: analytical modeling of parallel programs, bounds on parallel performance, scheduling, synchronization, programming using the message-passing paradigm and for shared address-space platforms, parallel algorithms for dense matrix operations, sorting, searching, graphs, computational geometry, and dynamic programming, concurrent data structures, and transactional memory.

This course is supported by educational grants from XSEDE (Extreme Science and Engineering Discovery Environment) and AWS (Amazon Web Services). We will use the computing environments provided by these two services for all homeworks and projects.

Prerequisites. Background in algorithms analysis (e.g., CSE 373 or CSE 548) and programming languages (e.g., C/C++) is required (or consent of instructor). Computer architecture background (e.g., CSE 320 or CSE 502) will be helpful, but not essential.

Recommended Textbooks.

  1. Ananth Grama, George Karypis, Vipin Kumar, and Anshul Gupta. Introduction to Parallel Computing (2nd Edition), Addison Wesley, 2003.
  2. Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming (1st Edition), Morgan Kaufmann, 2008.
  3. Fayez Gebali. Algorithms and Parallel Computing (1st Edition), Wiley, 2011.
  4. Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein. Introduction to Algorithms (3rd Edition), MIT Press, 2009. (chapter 27 on Multithreaded Algorithms)
  5. Joseph JáJá. An Introduction to Parallel Algorithms (1st Edition), Addison Wesley, 1992.
  6. Peter Pacheco. Parallel Programming with MPI (1st Edition), Morgan Kaufmann, 1996.

Course Requirements. There will be 3 homework assignments (with both theory and programming components), two exams (one midterm, and one final), and a final group project. Each student will be responsible for scribing one lecture. The course grade will be based on the following.

Download the Latex template for scribe notes.

Blackboard. Some course documents will be available through Blackboard. The following documents have already been uploaded.

Lecture Schedule.

Date Topic Notes / Reading Materials
Tue, Jan 24 Introduction -
Thu, Jan 26 Analytical Modeling of Parallel Programs
Tue, Jan 31 The Cilk++ Concurrency Platform
Thu, Feb 2 Scheduling and Work Stealing
  • Chapter 27 (Multithreaded Algorithms), Introduction to Algorithms (3rd Edition) by Cormen et al.
  • Nimar S. Arora, Robert D. Blumofe, and Charles G. Plaxton, “Thread Scheduling for Multiprogrammed Multiprocessors”, Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 119-129, 1998.
Tue, Feb 7 Analysis of the Work-Stealing Scheduler
  • The Arora et al. Paper from Lecture 4.
Thu, Feb 9 High Probability Bounds
  • Torben Hagerup and Christine Rüb, “A Guided Tour of Chernoff Bounds”, Information Processing Letters, 33(6), pp. 305-308, 1990. (If you are unable access the paper please check Blackboard under Documents/Misc for a local copy)
Tue, Feb 14 Basic Parallel Algorithmic Techniques
  • Guy E. Blelloch and Bruce M. Maggs, “Parallel Algorithms”, The Computer Science Handbook, Editor: Allen Tucker, Chapman & Hall/CRC, 2004.
  • Chapter 2 (Basic Techniques), An Introduction to Parallel Algorithms (1992) by Joseph JáJá
Thu, Feb 16 Analyzing Divide-and-Conquer Algorithms
  • Chapter 27 (Multithreaded Algorithms), Introduction to Algorithms (3rd Edition) by Cormen et al.
  • Chapter 4 (Divide-and-Conquer), Introduction to Algorithms (3rd Edition) by Cormen et al.
Tue, Feb 21 Divide-and-Conquer: Partitioning for Selection and Sorting
  • Chapter 9 (Randomized Algorithms), Section 9.6.3 (A Parallel Randomized Quicksort Algorithm), An Introduction to Parallel Algorithms (1992) by Joseph JáJá
  • Chapter 4 (Searching, Merging, and Sorting), Section 4.5 (Selection), An Introduction to Parallel Algorithms (1992) by Joseph JáJá
Thu, Feb 23 Cache Performance of Divide-and-Conquer Algorithms
Tue, Feb 28 Project Proposals -
Thu, Mar 1 Project Proposals ( continued )
Cache Performance of Divide-and-Conquer Algorithms ( continued )
-
Tue, Mar 6 Cache Performance of Divide-and-Conquer Algorithms ( continued ) -
Thu, Mar 8 Graph Algorithms: Connected Components
  • Guy E. Blelloch and Bruce M. Maggs, “Parallel Algorithms” (Section 4.3), The Computer Science Handbook, Editor: Allen Tucker, Chapman & Hall/CRC, 2004.
Tue, Mar 13 Graph Algorithms: Minimum Spanning Trees (and Radix Sort)
Thu, Mar 15 Graph Algorithms: Minimum Spanning Trees (and Radix Sort) ( continued ) -
Tue, Mar 20 Graph Algorithms: Minimum Spanning Trees (and Radix Sort) ( continued ),
Graph Algorithms: Maximal Independent Set
Thu, Mar 22 Graph Algorithms: Maximal Independent Set ( continued ) -
Tue, Mar 27 Midterm (in-class) -
Thu, Mar 29 Midterm Solutions,
Graph Algorithms: Maximal Independent Set ( continued )
-
Tue, Apr 3
Thu, Apr 5
Spring Break -
Tue, Apr 10 The Message Passing Interface
  • Chapter 6 (Programming Using the Message-Passing Paradigm), Introduction to Parallel Computing (2nd Edition) by Grama et al.
  • Chapter 2 (Message-Passing Computing), Parallel Programming (2nd Edition) by Wilkinson & Allen
Thu, Apr 12 The Message Passing Interface ( continued ) -
Tue, Apr 17 Distributed-Memory Algorithms: Dense Matrices
  • Chapter 2 (Parallel Programming Platforms), Section 2.5.1 (Message Passing Costs in Parallel Computers), Introduction to Parallel Computing (2nd Edition) by Grama et al.
  • Chapter 4 (Basic Communication Operations), Introduction to Parallel Computing (2nd Edition) by Grama et al.
  • Chapter 6 (Programming Using the Message-Passing Paradigm), Introduction to Parallel Computing (2nd Edition) by Grama et al.
  • Chapter 8 (Dense Matrix Algorithms), Section 8.2.2 (Cannon's Algorithm), Introduction to Parallel Computing (2nd Edition) by Grama et al.
  • Chapter 10 (Graph Algorithms), Section 10.4.2 (Floyd's Algorithm), Introduction to Parallel Computing (2nd Edition) by Grama et al.
Thu, Apr 19 Distributed-Memory Algorithms: Dense Matrices ( continued ) -
Tue, Apr 24 Distributed-Memory Algorithms: Sorting and Searching
  • Chapter 9 (Sorting), Section 9.5 (Bucket and Sample Sort), Introduction to Parallel Computing (2nd Edition) by Grama et al.
  • Chapter 6 (Programming Using the Message-Passing Paradigm), Section 6.6.10 (Example: Sample Sort), Introduction to Parallel Computing (2nd Edition) by Grama et al.
  • Chapter 11 (Search Algorithms for Discrete Optimization Problems), Sections 11.4.1-11.4.4 (Parallel Depth-First Search), Introduction to Parallel Computing (2nd Edition) by Grama et al.
  • John H. Reif, “Depth-First Search is Inherently Sequential”, Information Processing Letters, vol. 20 (5), pp. 229-234, 1985. (If you are unable access the paper please check Blackboard under Documents/Misc for a local copy)
Thu, Apr 26 Concurrent Data Structures: Queues and Stacks
  • Chapter 3 (Concurrent Objects), The Art of Multiprocessor Programming (2nd Edition) by Herlihy and Shavit.
  • Chapter 10 (Concurrent Queues and the ABA Problem), The Art of Multiprocessor Programming (2nd Edition) by Herlihy and Shavit.
  • Chapter 11 (Concurrent Stacks and Elimination), The Art of Multiprocessor Programming (2nd Edition) by Herlihy and Shavit.
Tue, May 1 A Brief Introduction to Transactional Memory
  • Chapter 18 (Transactioanl Memory), Section 18.1, The Art of Multiprocessor Programming (2nd Edition) by Herlihy and Shavit.
  • James Larus and Christos Kozyrakis, “Transactional Memory”, Communications of the ACM, vol. 51 (7), pp. 80-88, 2008.
Thu, May 3 Optimizing Energy Consumption
Fri, May 11 Final Exam -

Homeworks.

Exams.

Projects.

Programming Resources.