Course Description: Overview of computational approaches to language use. Core topics include mathematical and logical foundations, syntax, semantics and pragmatics. Special topics may include speech processing, dialog systems, machine translation, information extraction and information retrieval. Statistical and traditional approaches are included. Students will develop familiarity with the literature and tools of the field.
Prerequisites: For CS graduate students, CSE 537 or permission of instructor; CSE 541 recommended. For graduate students outside CS, permission of instructor.
Have you ever heard someone say, "If only [cats|dogs|fish|ants] could talk, they would say ...."? When someone says something like this, they attribute intelligence along with language ability. That is, they assume that if the animal could talk, it would also be able to form the intention to talk and come up with something coherent and timely to say. The use of complex symbol systems including human language is generally considered to be a clear indicator of intelligence, and therefore it has long been studied by researchers in artificial intelligence. Alan Turing's famous Turing test is in large part a test of language ability.
Computational linguistics involves the application of computational tools and techniques to the study of human languages. These techniques may include first- and higher-order logics, rule-based systems, planning, search, and a variety of fairly complex statistical and probabilistic methods. Some computational linguists come from a formal linguistics background; others are mathematicians; others come from a traditional AI background; and still others are engineers. Some focus on collecting and analyzing large amounts of data to find underlying patterns; some on conducting experiments to uncover rules of interaction; some on signal processing; some on building and deploying language-capable computer systems; and so on. The richness of the toolset and the problem set in this field makes it a very exciting place to work.
The first goal of this course is to give you some flavor of the range of techniques and tools brought to this complex task; that is, to make you a bit of a computational linguist. Generally, we will use English as the example language, because it is by far the most widely studied language in this field. If you have a background in some other language, you are encouraged to bring that to bear in your course project or homework assignments. You are also encouraged to contribute in class, pointing out blind spots or areas of similarity in the research we will discuss.
The second goal of this course is to improve your abilities as a research scientist; that is, to encourage your ability to think creatively, plan and conduct experiments, and make use of the literature. Throughout this course, you will be encouraged to bring your own ideas to bear on a range of problems, most of which you get to choose.
This course may include students from a variety of academic backgrounds. Build on your strengths, and be willing to share them with your fellow students.
CSE507 covers four main topics:
For each part of the course, you will learn what data, knowledge resources, techniques and tools computational linguists have developed and use in that area of computational linguistics.
Each section is set off from the others by a special topic, designed to illustrate alternative approaches to the handling of some aspect of natural languages. These topics may vary depending on student interest and time constraints, but will be drawn from the following: part-of-speech tagging, conversational agents, information extraction, natural language generation and machine translation.
We will use a reader for this course. The reader will contain published papers and other readings on course-related material.
The other resources for this course include:
This page lists the papers discussed during Spring 2002.
This page contains links to other books and websites you may find interesting.
Evaluation is based on four projects, one per section of the course (18% of the final grade each), which are to be completed in small groups, and on a final project presentation in which each student presents one of the projects s/he participated in during the semester (18% of the final grade). There are occasional surprise quizzes in class or on Blackboard (10% of the final grade).
You are encouraged to ask questions and seek help in class and during office hours. Questions you might ask include:
If you have a physical, emotional or medical disability that may impact your ability to complete the course work or which requires extra time on examinations, please contact the Disabled Student Services office in the ECC Building (phone: 633-6748/9TTY). DSS will review your concerns and determine, with you, what accommodations are necessary and appropriate. All information and documentation of disability is confidential.
As a student at Stony Brook, you have agreed to follow the university's rules regarding academic honesty and appropriate conduct. You should read both the academic honesty information and procedures and the student code of conduct, which can be found in the student handbook.
Any academic dishonesty will be reported to the academic judiciary.