CSE-506 (Fall 2009) Homework Assignment #2
(100 points, 18% of your overall grade)
Version 3 (10/31/2009)
Due Sunday 11/1/2009 @ 11:59pm

* PURPOSE:

To become familiar with the VFS layer of Linux, and especially with
extensible file systems APIs. To build a useful file system using stacking
technologies.

You will use "ecryptfs" as a starting point for this assignment, and choose
one of two assignments:

(1) Ecryptfs2: enhance ecryptfs to add additional security features to it.

(2) Wrapfs: strip down ecryptfs to produce a very lightweight "null-layer"
    stackable file system.

* ECRYPTFS2

Ecryptfs is a stackable file encryption file system that appeared in the
Linux kernel first in 2.6.19. You can find plenty of information about it
online (Google, https://launchpad.net/ecryptfs/, etc.).

Ecryptfs offers stronger security than what you did in HW1, but it uses
similar ideas. Ecryptfs is a file system that offers transparent
encryption, so users don't need to use special system calls or tools: once
the user authenticates to ecryptfs (often during mount time or afterwards),
then ecryptfs will transparently encrypt/decrypt files.

Ecryptfs, however, encrypts only file data (not file names). Moreover, it
stayed away (on purpose) from additional "authorization" methods.
Specifically, once ecryptfs has authenticated a key in the kernel's own
keyring, then that key can be used by ANYONE to decrypt files. In other
words, if you can become root on a machine, you can decrypt any file as
long as the key is currently loaded into the kernel.

In this assignment you are to enhance the authentication and authorization
techniques of ecryptfs so you can further restrict the encryption/decryption
to work based on one or more of the following:

- UID N: only user N can access encrypted files
- GID N: only tasks whose effective group is N can access files
- PID N: only process ID N can access files
- SID N: only processes in session ID N can access files -- see ps(1)
- DIRECTORY N: only files below directory N can be accessed
- TTY N: only access from terminal ID (TTY) N allowed (e.g., /dev/console)

If a user specifies multiple restrictions, then they should be logically
ANDed together to further restrict access (never allow more access than
POSIX permissions provide). That is, if you specify a restriction of "GID
X, SID Y", then ecryptfs will allow access only if the current task is part
of session id Y *and* it has the effective group-id X. Otherwise, return
-EACCES.

Note: the process that authenticates successfully to ecryptfs will have its
credentials saved (in memory) for further checking (uid, gid, pid, etc.).
Subsequent accesses by (possibly) other processes then have to be checked
against the restrictions that were set by the original authenticating
process.

You should come up with a reasonable way to pass these parameters to the
kernel: possible options include ioctl()s, mount-time options, remount
options, sysfs, configfs, /proc, etc. Note that some options may be
easier/harder to implement and use. You have to justify your choice from
several perspectives: security, ease-of-use, simple coding/debugging,
efficiency, etc. Your design may consider storing those extra restrictions
in the file header, so that they persist across reboots and remounts.

You should design clean data structures, or extend existing ones, to
support the required additional security features.
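
To make the "logically ANDed" rule concrete, here is a minimal user-space
sketch of one possible restriction record and its check. It is an
illustration only, not ecryptfs code: the names e2_restrict, e2_task,
e2_check_access, and the E2_RESTRICT_* flags are hypothetical, the task
credentials are passed in explicitly rather than read from the kernel, and
the DIRECTORY and TTY restrictions are left out for brevity.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical flag bits: one per restriction that has been enabled. */
enum {
	E2_RESTRICT_UID = 1 << 0,
	E2_RESTRICT_GID = 1 << 1,
	E2_RESTRICT_PID = 1 << 2,
	E2_RESTRICT_SID = 1 << 3,
};

/* Hypothetical record filled in when restrictions are set. */
struct e2_restrict {
	unsigned int flags;	/* which of the fields below are active */
	uid_t uid;
	gid_t gid;		/* compared against the effective gid */
	pid_t pid;
	pid_t sid;
};

/* Stand-in for the credentials of the task requesting access. */
struct e2_task {
	uid_t uid;
	gid_t egid;
	pid_t pid;
	pid_t sid;
};

/*
 * Every active restriction must match (logical AND).
 * Returns 0 when access is allowed, -EACCES otherwise.
 */
static int e2_check_access(const struct e2_restrict *r,
			   const struct e2_task *t)
{
	if ((r->flags & E2_RESTRICT_UID) && r->uid != t->uid)
		return -EACCES;
	if ((r->flags & E2_RESTRICT_GID) && r->gid != t->egid)
		return -EACCES;
	if ((r->flags & E2_RESTRICT_PID) && r->pid != t->pid)
		return -EACCES;
	if ((r->flags & E2_RESTRICT_SID) && r->sid != t->sid)
		return -EACCES;
	return 0;
}
```

With flags set to E2_RESTRICT_GID | E2_RESTRICT_SID, a task passes only if
both its effective gid and its session id match. A record with no flags set
adds no restriction of its own, so the usual POSIX permission checks still
decide the outcome.
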
Because you are modifying existing code, be sure to keep the same coding
style and conventions as ecryptfs already has.

You must supply a README file that clearly describes your design, data
structures, etc. This'll be an important part of your grade.

* WRAPFS

In this assignment, take ecryptfs, and STRIP it down to provide the
smallest possible "null-layer" pass-through stackable file system.

Such a thin layer file system is very useful as a building block for others
to develop new file systems. For example, if you want to create a file
system which logs operations (e.g., provenance), you could easily add some
printk's in certain places, starting from wrapfs. Or, if you wanted to
create a file system which allows a user to undo a /bin/rm command, then
you could change the unlink method to rename a file instead, or move it
into a designated trashbin directory. Or, if you intercept all read/write
ops, and check them for "bad" data patterns, you could create an anti-virus
file system. The possibilities are endless, but you need a basic small
template file system to start with -- and that is what wrapfs gives you.
(In fact, in some of the HW3 projects, I may offer you the option to take
wrapfs and enhance it to create some new useful functionality.)

Wrapfs should only pass ops from the VFS down to the file system below,
without encrypting or decrypting file data or file names. This means you
have to figure out what code in ecryptfs relates to encryption, and remove
it, leaving only code that relates to plain stacking. Ecryptfs is
currently over 10k LoC. Your Wrapfs should be no more than 5,000 LoC (the
shorter, the better your grade will be).

To go about the code, study the ecryptfs code to understand how stacking
works, and slowly remove features, testing that the code still compiles and
works. Even after you strip all the encryption code, what's left could be
stripped even further.
For example, you can avoid address_space_ops entirely and use only one
vm_ops->fault method (as Unionfs does). You can also simplify the readdir
and filldir methods much more (as fistgen does).

While you can start from ecryptfs and strip it down to create a working
wrapfs, there are two other options you can use:

(1) Start from the latest Unionfs code, which is about the same size as
    ecryptfs (10k LoC):

	http://www.filesystems.org/project-unionfs.html

    Starting from either Ecryptfs or Unionfs would be fine, as both are
    stable and well tested.

(2) Start from the Fistgen templates here:

	ftp://ftp.filesystems.org/pub/fistgen/fistgen-0.2.1.tar.gz

    Using fistgen, you can build and generate a "wrapfs" file system
    already! But, that wrapfs is not very stable, hasn't been updated for
    the latest 2.6 kernels, and includes all kinds of unclean/messy code.
    If you choose this option, you may find that you have to spend a lot
    of time *adding* missing code, fixing bugs, etc. Choose wisely.

* RESOURCES

Read everything here:

	http://www.fsl.cs.sunysb.edu/mailman/private/cse506/2009-October/000407.html

You can use GIT to clone/checkout a kernel source tree. The Unionfs trees
in the FSL's own git server include ecryptfs and unionfs already, useful if
you want to compare which of the two is a better start for you:

	http://git.fsl.cs.sunysb.edu/

You can also checkout git trees from http://git.kernel.org/. Or you can
download kernel tarballs from www.kernel.org or anywhere else.

I would also recommend you consider using "guilt" to manage a set of
patchsets on top of git. You can find guilt in /usr/local/bin in your VMs
and vmpool. While it takes some time to learn how to use git+guilt, they
are much better than CVS.

* COMMON GUIDELINES

Use 2.6.31.1, the latest stable kernel.

Document your code carefully. Your README must be detailed about how you
went about fixing/changing/adding/stripping the code.
The README should also discuss the pros and cons (advantages, limitations)
of your chosen techniques and design. I expect quality code akin to what
the rest of the kernel has.

In HW2 I give you fewer details of the assignment. Once you begin
investigating, designing, and even coding, you may come across issues that
are not mentioned in this hw2.txt document. In that case, you have to
decide how to resolve the issue on your own, and document/defend your
choices, taking into account performance, usability, security,
maintainability, etc. This is done on purpose for several reasons: to
prepare you for the class project (HW3) where you'll have a lot of freedom;
to prepare you for doing independent research in your future studies.

* TEAMS

You may work alone or in a pair for this assignment (no groups larger than
two). Your assignment will be graded equally regardless of group size.
However, I take group sizes into account when assigning final course
grades, especially in borderline cases, as per the grading policy posted on
the class Web site.

If you work in pairs, you can ask for a "shared CVS repository" and I'll
create one for you. In that case, email me the OSLAB user-names and full
names of both partners. If you use git, however, you won't need a shared
CVS repository, as you can share git repositories and patchsets more
easily.

If you work in pairs, you MUST declare your intention within 7 days of
posting of this assignment, and you MUST list the names and emails of both
members in your README file.

* TESTING YOUR CODE

You should thoroughly check your code. There are several common ways in
which people test that a file system works. You're welcome to use any one
or more of those (we will!):

- compile some large software (linux kernel, gcc, openssh, etc.) inside
  your file system. A parallel compile is useful (e.g., "make -j 4").
- use a free POSIX compliance testing suite for Linux called the Linux Test
  Project (LTP). See http://ltp.sourceforge.net/.
- run Postmark, a tool to test I/O throughput
- run fsx, a tool intended to exercise file system read/write operations
- write small C programs to exercise specific system calls
- use basic tools like ls, cat, rm, mv, strace, etc.

Note: LTP is very good at testing compliance. In general, we would expect
that anything that worked before you started changing the code would still
work after the changes (we won't expect you to fix problems in the base
code in 2.6.31.1).

* SUBMISSION

You should submit two files via CVS to your (or your group's) hw2/ subdir:

(1) The README which details your design and what you did.

(2) A single large Unified Diff (diff -u) file against 2.6.31.1, which has
    all of your code changes. This file should be named hw2.patch. Please
    test that your patch applies cleanly against 2.6.31.1, compiles, and
    runs. Your patch must pass "scripts/checkpatch.pl --strict" completely
    cleanly.

No other files should be needed, unless you think otherwise (e.g., any
userland helper code you may have written or modified). If you do submit
additional files, list and explain them in your README.

* MISC

To help make grading simpler, we may decide to hold short demos for each
assignment.

If we see particularly good and stable code, we may help you post it
online.

* EXTRA CREDIT (OPTIONAL)

If you do any of the extra credit work, then your EC code must be wrapped
in

	#ifdef EXTRA_CREDIT
		// EC code here
	#else
		// base assignment code here
	#endif

[10 pts] Ecryptfs2 name hiding: Implement extra support in readdir(),
filldir(), lookup(), and anywhere else needed such that users won't even be
able to see or stat() files they don't have access to. That means, if I
don't have permission to open and/or decrypt a file or directory, then I
should not be able to even see it in /bin/ls, nor stat it directly. You'll
have to worry about the visibility of cached dentry objects, which depends
on which user is currently looking up a cached object.
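
The name-hiding idea can be pictured with a small user-space mock-up: a
directory walk that forwards only the entries the caller may access, plus a
lookup that treats hidden names as nonexistent. Everything here is a
hypothetical illustration (mock_dirent, hide_ctx, may_see, hide_readdir,
hide_lookup, count_names); it does not use the kernel's real readdir or
filldir interfaces. Using a simple owner-uid match as the access test, and
returning -ENOENT for hidden names, are assumptions made for the sketch,
not requirements stated above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Mock lower-level directory entry; owner models "who may decrypt". */
struct mock_dirent {
	const char *name;
	uid_t owner;
};

/* Callback that receives each visible name (loosely like filldir). */
typedef int (*emit_fn)(void *ctx, const char *name);

struct hide_ctx {
	uid_t caller;		/* uid performing the readdir */
	emit_fn emit;		/* the caller's own callback */
	void *emit_ctx;		/* opaque argument passed to emit */
};

/* Assumed access test: only the owner may see an entry. */
static bool may_see(const struct mock_dirent *d, uid_t caller)
{
	return d->owner == caller;
}

/*
 * Forward visible entries and silently skip the rest.
 * Returns the number of names emitted, or the callback's error.
 */
static int hide_readdir(const struct mock_dirent *ents, size_t n,
			struct hide_ctx *h)
{
	int emitted = 0;

	for (size_t i = 0; i < n; i++) {
		if (!may_see(&ents[i], h->caller))
			continue;
		int err = h->emit(h->emit_ctx, ents[i].name);
		if (err)
			return err;
		emitted++;
	}
	return emitted;
}

/* Hidden names report -ENOENT so they look absent, not forbidden. */
static int hide_lookup(const struct mock_dirent *ents, size_t n,
		       const char *name, uid_t caller)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(ents[i].name, name) == 0)
			return may_see(&ents[i], caller) ? 0 : -ENOENT;
	}
	return -ENOENT;
}

/* Example callback: counts the names it is handed. */
static int count_names(void *ctx, const char *name)
{
	(void)name;
	(*(int *)ctx)++;
	return 0;
}
```

The mock-up has no cache, so it sidesteps the hard part: in the kernel, a
dentry cached by one user's lookup must not become visible to a different
user who looks up the same name later.
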

[10 pts] wrapfs-lite: Normally in a stackable file system, the upper dentry
points to a lower dentry, and the upper inode points to a lower inode. But
wrapfs really shares the same file data and meta-data both above and below.
Therefore, you can implement an extra optimization: there will be no upper
inode; instead, both upper and lower dentries will point directly to the
lower inode. This optimization should reduce overhead and memory use of
wrapfs.

[10 pts] Ecryptfs ACL persistency: Store any user restrictions
persistently, whether they came through mount-time options, ioctl's, or
some other form. By storing them persistently, those restrictions would be
automatically read and applied, even after a system reboot.

* ChangeLog: a list of changes that this description had

v1: original version
v2: add 10pts for ecryptfs2 ACL persistency
v3: correct wrapfs-lite description