CSE-506 (Fall 2009) Homework Assignment #1
Handout number 3
(100 points, 12% of your overall grade)
Version 2 (9/7/2009)
Due Tuesday 9/29/2009 @ 11:59pm

* PURPOSE:

To get your Linux kernel development environment working; to make small changes to the kernel and test them out; to learn about system calls.

* BACKGROUND:

Encrypting files is very useful and important nowadays, but Linux does not (yet) support this feature natively. Your task is to create a new system call that takes an input file, encrypts or decrypts it, and produces an output file.

While we give you more details below, it is up to you to inspect the kernel sources for similar, helpful examples of code. Those will give you even more detail than we provide here. We expect this assignment to require 300-500 lines of kernel code and another 100-200 lines of user-level code. Note, however, that you may spend much of your time reading existing sources and debugging the code you write.

* TASK:

Create a Linux kernel module (in vanilla 2.6.24.7 Linux) that, when loaded, supports a new system call:

    sys_crypt(infile, outfile, keybuf, keylen, flags)

where "infile" is the name of the input file to encrypt or decrypt, "outfile" is the name of the output file, "keybuf" is a buffer holding the cipher key, "keylen" is the length of that buffer, and "flags" determines whether you are encrypting or decrypting. If the least significant bit (LSB) of flags is 0, encrypt the infile. If the LSB is 1, decrypt the infile.

An unencrypted (cleartext) file is just a sequence of arbitrary bytes. An encrypted (ciphertext) file has two sections. The first section is a fixed-length "preamble" that contains information to validate the decryption key (e.g., a secure hash/checksum of the user-level passphrase). This first section may include any other information you see fit (e.g., the original file size, and data needed for the extra credit).
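A fixed-length preamble is easiest to manage as a C struct that a shared header gives to both the module and the user tool. The sketch below shows one possible layout. Every name in it (struct crypt_preamble, CRYPT_MAGIC, CRYPT_FLAG_DECRYPT, crypt_key_matches, and so on) is hypothetical, and it assumes that any flag bit other than the LSB is invalid. It is plain user-space C so it can be compiled and tested on its own. Note one design point: if the key is itself a digest of the passphrase, storing that same digest in the file would reveal the key. The sketch therefore assumes the preamble holds a hash of the key buffer instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag names; the handout only defines bit 0. */
#define CRYPT_FLAG_DECRYPT	0x1u		/* LSB: 0 = encrypt, 1 = decrypt */
#define CRYPT_FLAGS_MASK	CRYPT_FLAG_DECRYPT	/* assumption: no other bits */

#define CRYPT_MAGIC		0x43525950u	/* arbitrary hypothetical magic */
#define CRYPT_KEY_HASH_LEN	16		/* e.g., an MD5-sized digest */

/*
 * Hypothetical on-disk preamble. Fields are ordered so the compiler adds
 * no padding: 4 + 4 + 8 + 16 = 32 bytes. A header shared with the kernel
 * would use __u32/__u64 from <linux/types.h> instead of <stdint.h>.
 */
struct crypt_preamble {
	uint32_t magic;				/* identifies the file format */
	uint32_t version;			/* format version */
	uint64_t orig_size;			/* cleartext size in bytes */
	uint8_t key_hash[CRYPT_KEY_HASH_LEN];	/* hash of the key buffer */
};

/* Nonzero if no undefined flag bits are set. */
static inline int crypt_flags_valid(unsigned int flags)
{
	return (flags & ~CRYPT_FLAGS_MASK) == 0;
}

/* Nonzero if the caller asked to decrypt. */
static inline int crypt_flags_decrypt(unsigned int flags)
{
	return (flags & CRYPT_FLAG_DECRYPT) != 0;
}

/* Nonzero if the preamble is ours and was written with the same key. */
static inline int crypt_key_matches(const struct crypt_preamble *p,
				    const uint8_t hash[CRYPT_KEY_HASH_LEN])
{
	return p->magic == CRYPT_MAGIC &&
	       memcmp(p->key_hash, hash, CRYPT_KEY_HASH_LEN) == 0;
}
```

When decrypting, the module would read this struct first, compare the stored hash against a hash of keybuf, and fail before writing any output if they differ.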
The second section is just the input file data, encrypted according to the cipher block size, cipher key, etc.

The first thing a system call must do is verify that its inputs are valid. You must check for ALL possible bad conditions that could result from bad inputs to the system call. In each case, the system call should return the proper errno value (EINVAL, EPERM, EACCES, etc.). Consult the system errno table and pick the right error number for each condition. The kinds of errors that could occur early in the system call's execution include the following (this is a non-exhaustive list):

 - missing arguments
 - null arguments
 - pointers to bad addresses
 - keylen and keybuf don't match
 - invalid flags
 - input file cannot be opened or read
 - output file cannot be opened or written
 - input or output files are not regular, or they point to the same file
 - trying to decrypt a file w/ the wrong key (what errno should you return?)
 - ANYTHING else you can think of (the more error checking you do, the better)

After checking for these errors, open the input and output files and begin copying data from one to the other, encrypting or decrypting the data before it is written.

Your code must be efficient, so do not waste extra kernel memory (dynamic or stack) in the system call, and make sure you are not leaking any memory. At the same time, for efficiency, copy the data in chunks that are native to the system the code is compiled on: the system page size (PAGE_CACHE_SIZE or PAGE_SIZE). Hint: allocate one page as a temporary buffer. Note that the last page you write could be partially filled, and that your code should handle zero-length files as well.

Also note that ciphers have a native block size (e.g., 64 bits), and your file may have to be padded to the cipher block size. Certain ciphers/modes, however, don't care about block sizes, so they won't need padding.
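The handout does not mandate a padding scheme. One standard scheme that works for every input is PKCS#7 padding: append n bytes, each with the value n, where n = block_size - (len % block_size). Because n is always between 1 and the block size, the receiver can always strip it unambiguously, even when the data ends in zero bytes or the file is empty. The sketch below is user-space C with hypothetical function names. It assumes a block size of at most 255 bytes (AES uses 16); in the module, the same arithmetic would apply to the final chunk in the page buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes of PKCS#7 padding for "len" data bytes and block size "bs" (1..255). */
static inline size_t pkcs7_pad_len(size_t len, size_t bs)
{
	return bs - (len % bs);		/* always 1..bs, never 0 */
}

/*
 * Append PKCS#7 padding in place. "buf" holds "len" data bytes and must
 * have room for at least len + bs bytes. Returns the padded length.
 */
static inline size_t pkcs7_pad(uint8_t *buf, size_t len, size_t bs)
{
	size_t n = pkcs7_pad_len(len, bs);

	memset(buf + len, (int)n, n);	/* n copies of the byte value n */
	return len + n;
}

/*
 * Validate and strip PKCS#7 padding from decrypted data of length "len".
 * Returns the unpadded length, or (size_t)-1 if the padding is malformed
 * (e.g., corrupt ciphertext).
 */
static inline size_t pkcs7_unpad(const uint8_t *buf, size_t len, size_t bs)
{
	size_t n, i;

	if (len == 0 || len % bs != 0)
		return (size_t)-1;
	n = buf[len - 1];
	if (n == 0 || n > bs)
		return (size_t)-1;
	for (i = len - n; i < len; i++)
		if (buf[i] != n)
			return (size_t)-1;
	return len - n;
}
```

One consequence of this choice: a file whose size is an exact multiple of the block size (including 0 bytes, or exactly one 4096-byte page) gains a whole extra 16-byte block. With a one-page buffer, that final padding block may need its own write after the last full page.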
The output file should be created with the user/group ownership of the running process, and its protection mode should NOT be less than that of the input file. Both the input and output files may be specified using relative or absolute pathnames. Do not assume that the files are always in the current working directory.

If no error occurred, sys_crypt() should return 0 to the calling process. If an error occurred, the calling process should see a return value of -1 with errno set. (Inside the kernel, this means returning a negative errno value such as -EINVAL. The user-level system call wrapper, e.g., syscall(2), converts it into -1 and sets errno.) Choose your errno values appropriately. If an error occurs while writing the output file, the system call should NOT leave a partial output file behind. Instead, remove any partially written output file and return the appropriate error code.

Write a C program called "cipher" that calls your system call. The program should produce no output on success and should use perror() to report any errors that occurred. The program should accept the following options and arguments:

 - flag: -e to encrypt; -d to decrypt
 - flag: -c ARG to specify the type of cipher (as a string name)
 - flag: -p ARG to specify the encryption/decryption key
 - flag: -h to provide a helpful usage message
 - input file name
 - output file name

You can process options using getopt(3). (Note that specifying the password on the command line is highly insecure, but it makes grading easier. In reality, one would use getpass(3) to read a password.) You should be able to execute the following command:

    ./cipher -p "this is my password" -e infile outfile

User-level passwords should be at least 6 characters long. Even so, you should not pass the password into the kernel as is, because it is too short. You need to ensure that you pass a correctly sized encryption key into the kernel: remove any newline characters ('\n'), and then convert the human-readable password into a key of the proper length using a cryptographic checksum algorithm such as MD5(3) or SHA1(3).
An even better way would be to use a PKCS#5 library to derive secure keys from passwords (check "man -k pkcs" for more info).

* SYSTEM CALLS IN 2.6:

As of kernel 2.6, a kernel module is not allowed to override system calls (long story, I'll tell you in class :-). To help you, we've given you a template that allows you to override a single system call table entry, but this works ONLY for 2.6.24.7. Download, study, and test the following tarball. Read the README file and the source files carefully before using them. Be sure that your user-level code builds against YOUR kernel headers.

* USING THE CIPHERS:

You should perform all of your encryption in Cipher Block Chaining (CBC) mode on whole pages (4KB on Linux x86). Use the Linux kernel's built-in CryptoAPI. To learn how to use it, see the kernel documentation that comes with the CryptoAPI option. You don't need to be an expert in security or encryption to do this assignment. Part of what this assignment will teach you is how to work with someone else's code, even if all you understand is the API to that code (and not the internals).

CBC only accepts input and output whose length is a multiple of the cipher's block size (e.g., 3DES uses a 64-bit block; AES uses a 128-bit block). You will need to use padding to ensure that your input is a multiple of the block size. Your padding scheme must work under all circumstances (e.g., padding with zeros doesn't work, because trailing zero bytes are valid file content). For this assignment, use the AES cipher only (i.e., hard-code it in your kernel code). (But see the Extra Credit section below.)

* READING FILES FROM INSIDE THE KERNEL

Here's an example function that opens a file from inside the kernel, reads some data from it, and then closes it. This will help you in this assignment, and you can easily extrapolate from it how to write data to another file. (Warning: the code below is from 2.4. Adapt it as needed.)

    #include <linux/kernel.h>	/* printk() */
    #include <linux/fs.h>	/* filp_open(), filp_close() */
    #include <linux/err.h>	/* IS_ERR(), PTR_ERR() */
    #include <asm/uaccess.h>	/* get_fs(), set_fs(), KERNEL_DS */

    /*
     * Read "len" bytes from "filename" into "buf".
     * "buf" is in kernel space.
     */
    int wrapfs_read_file(const char *filename, void *buf, int len)
    {
            struct file *filp;
            mm_segment_t oldfs;
            int bytes;

            /* Chroot? Maybe NULL isn't right here */
            filp = filp_open(filename, O_RDONLY, 0);
            if (!filp || IS_ERR(filp)) {
                    printk(KERN_ERR "wrapfs_read_file err %d\n",
                           (int) PTR_ERR(filp));
                    return -1;  /* or do something else */
            }

            if (!filp->f_op || !filp->f_op->read) {
                    filp_close(filp, NULL);  /* don't leak the open file */
                    return -2;  /* file(system) doesn't allow reads */
            }

            /* now read len bytes from offset 0 */
            filp->f_pos = 0;  /* start offset */
            oldfs = get_fs();
            set_fs(KERNEL_DS);  /* let ->read() accept a kernel-space buf */
            bytes = filp->f_op->read(filp, buf, len, &filp->f_pos);
            set_fs(oldfs);

            /* close the file */
            filp_close(filp, NULL);
            return bytes;
    }

* TESTING YOUR CODE:

To load and unload your module, use the runme.sh script provided in the class tarball. Internally, the script calls insmod to load a module and rmmod to unload it. To list loaded modules, use lsmod. See their respective man pages for details.

Once your module is loaded, the new system call should be available, and you can run your program on various input files. Check that each error condition you coded for works as it should. Also check that the output file is produced correctly (e.g., decrypting an encrypted file with the right key should reproduce the original file exactly).

Finally, although you may develop your code on any Linux machine, we will test your code using the same Virtual Machine distribution (with all officially released patches applied as of the date this assignment is released) and the Linux 2.6.24.7 kernel. It is YOUR responsibility to ensure that your code runs well under these conditions. We will NOT test or demo your code on your own machine or laptop! So please plan your work to allow enough time to test your code on the machines for which we have given you a login account (these are the exact same machines we will test your code on when we grade it).

Additionally, we strongly suggest that you enable CONFIG_DEBUG_SLAB and other useful debugging features under the "Kernel hacking" configuration menu.
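A minimal sketch of the relevant .config lines follows. It assumes a 2.6.24 source tree, so confirm the option names in your own "make menuconfig". One catch: in 2.6.24 the default slab allocator is SLUB, and CONFIG_DEBUG_SLAB only appears once the SLAB allocator is selected. That choice lives under "General setup", not "Kernel hacking".

```
# General setup -> Choose SLAB allocator (DEBUG_SLAB depends on SLAB)
CONFIG_SLAB=y
# Kernel hacking
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SPINLOCK=y
# warns when code that may sleep is called with a spinlock held
CONFIG_DEBUG_SPINLOCK_SLEEP=y
```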
When grading the homework, we will use a kernel tuned for debugging, which may expose bugs in your code that you can't easily catch without debugging support. So it's better for YOU to catch and fix those bugs before we do. Lastly, note that even if your system call appears to work well, you may have corrupted some kernel memory state without noticing the impact until much later. If your code begins behaving strangely after having worked before, consider rebooting your VM.

* STYLE AND MORE:

Aside from testing the proper functionality of your code, we will also carefully evaluate its quality. Be sure your code uses a consistent style, is well documented, and is broken into separate functions and/or source files where it makes sense. To be sure your code is very clean, it should compile with "gcc -Wall -Werror" without any errors or warnings! We'll deduct points for any warning that we feel should be easy to fix. Read Documentation/CodingStyle to understand which coding style is preferred in the kernel, and stick to it for this assignment. Run your kernel code through the style checker scripts/checkpatch.pl (with the "strict" option turned on), and fix every warning that comes up. Cleaner code tends to be less buggy.

If the various sources you write require common definitions, do not duplicate those definitions. Use C's code-sharing facilities, such as common header files.

You must include a README file with this and every assignment. The README file should briefly describe what you did, what approach you took, the results of any measurements you made, which files are included in your submission and what they are for, etc. Feel free to include any other information you think would help us in this README; it can only help your grade (esp. for Extra Credit).

We provided you with a Makefile in the homework template.
You can modify it if needed, but remember that we will use the 'make all' and 'make clean' commands while testing your code. 'make all' must produce the hw1-module.ko module, compiled against the currently running kernel, and must also build the hw1-user tool. Please keep all of your kernel code in one file as named in the template (i.e., hw1-module.c), and keep your user code in hw1-user.c. That simplifies our review of your code. Note that any deviations from our guidelines will be penalized.

* SUBMISSION

You will need to submit all of your sources, headers, scripts, Makefiles, and README. Submit all of your files using CVS. See the general CVS submission guidelines on the class Web site.

As part of this assignment, you should also build a 2.6.24.7 kernel that's as small as you can get. For example, there are dozens of file systems available: you need at least ext3 and nfs (client), but you don't need XFS or ReiserFS. Commit your kernel .config file into CVS, but rename it "kernel.config". We will grade you on how small your kernel configuration is, with the following exceptions:

 1. All servers that start by default at boot time in the provided VM should still start without failing.
 2. We won't count "kernel hacking" options, so you may enable as many of them as you'd like.

PLEASE test your submission before submitting it by unpacking it in a separate directory, compiling it cleanly, and testing it again, as described on the class Web site. DO NOT make the common mistake of writing code until the very last minute, then trying to figure out how to use CVS, and skipping the testing of what you submitted. You will lose valuable points if you do not submit on time or if your submission is incomplete!!!

* EXTRA CREDIT (OPTIONAL, total 20 points)

If you do any of the extra credit work, your EC code must be wrapped as follows:

    #ifdef EXTRA_CREDIT
            // EC code here
    #else
            // base assignment code here
    #endif

[A] 4 points.
Augment your module to use the Initialization Vector (IV) part of the cipher. You don't need to know much about the IV, but it is useful to understand that setting it to a different value each time you encrypt or decrypt a chunk of bytes produces stronger encryption that is harder to break. A common way to set the 16 bytes of the IV is as follows:

 - set the first 8 bytes to the index of the page (i.e., the page number) that you are encrypting or decrypting (e.g., on an i386 system with a 4096-byte page size, bytes 0-4095 are in page 0, bytes 4096-8191 are in page 1, etc.);
 - set the remaining 8 bytes to the inode number of the file.

Note: your first IV (assuming you "chain" them) should be stored in the preamble of the encrypted file.

[B] 6 points.

Support multiple ciphers. Pass the cipher name as a string using the "-c ARG" option, and change the system call to accept an extra argument at the end, "char *cipher". This argument should be a constant, null-terminated string whose value can be one of: "blowfish" for the Blowfish cipher; "des" for DES; "des3_ede" for Triple DES; etc. The cipher type must always be specified and must always be a valid cipher that the Linux kernel CryptoAPI understands. All kernel-supported ciphers should be allowed; return EINVAL if the user specifies an invalid cipher name. The cipher name (or ID) should also be stored in the preamble.

[C] 5 points.

Support multiple encryption unit sizes and key lengths. You will have to augment both the system call and the user-level tool as needed to pass the new information. For example:

    $ ./cipher -u 16000 -l 256 -e infile outfile

where -l sets the key length to 256 bits, and -u specifies that the encryption unit should be whole chunks of 16000 bytes (instead of the default 4KB). If not specified, -u should default to PAGE_SIZE, and -l to 128 bits.
Note that the argument to -l can be any valid key length that the cipher accepts (for example, AES accepts only 128-, 192-, or 256-bit keys). The argument to -u, however, can be ANY positive number that the cipher will accept (even an odd number).

Good luck.

* Change History:

 - 9/8/09: corrected the due date to 9/29/09.
 - 9/17/09: increased the IV from 8 to 16 bytes total.
 - 9/28/09: clarified the #define EXTRA_CREDIT wrappers.