CSE 600 (Ongoing Research Seminar)



Date: January 27, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Margo Seltzer, Harvard University
Title:   Provenance-Aware Storage Systems

Abstract:

As our scientific, business, and archival functions become increasingly dependent upon online data and processing, data provenance (or lineage) becomes increasingly important. Scientific disciplines rely on data provenance to validate their experimental results; archivists rely on provenance to prove or preserve an object's integrity; and businesses rely on provenance to adhere to government regulations. In this talk I will present an introduction to the problem of data provenance and how we are tackling the challenges by constructing storage systems that are aware of provenance, treat it as a first-class storage object, and maintain and generate it automatically wherever possible.

Speaker Bio:

Margo I. Seltzer is the Associate Dean for Computer Science and Engineering, the Herchel Smith Professor of Computer Science and a Harvard College Professor in the Division of Engineering and Applied Sciences at Harvard University. Her research interests include file systems, databases, and transaction processing systems. She is the author of several widely-used software packages including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer is also a founder and CTO of Sleepycat Software, the makers of Berkeley DB. She is a Sloan Foundation Fellow in Computer Science, a Bunting Fellow, and was the recipient of the 1996 Radcliffe Junior Faculty Fellowship and the University of California Microelectronics Scholarship. She is recognized as an outstanding teacher and won the Phi Beta Kappa teaching award in 1996 and the Abramson Teaching Award in 1999. Dr. Seltzer received an A.B. degree in Applied Mathematics from Harvard/Radcliffe College in 1983 and a Ph.D. in Computer Science from the University of California, Berkeley, in 1992.

Date: February 3, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Klaus Mueller, SUNY at Stony Brook
Title:   Semantics-Aware Visualization

Abstract:

Multi-modal and multi-resolution datasets have become ubiquitous in a wide range of disciplines, such as science, engineering, medicine, architecture, and even entertainment. There is a great demand to meaningfully fuse these types of data and visualize them efficiently. This requires the incorporation of data semantics into the visualization pipeline. In this talk, I will describe our Semantic Lens framework. It fuses and augments different types and resolutions of image data into one composite representation, using texture grammar and texture synthesis, and it provides a variety of zoom lenses for focus+context GPU-accelerated viewing within the semantic context. The general theme of this framework has been implemented for both scientific and medical visualization, and in a virtual reality application for large-scale urban architecture.

Date: February 10, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Alex Borgida, Rutgers University
Title:   So what's so special about Description Logics?

Abstract:

The vision of a Semantic Web requires an "ontology language" to help express the semantic content of web pages and services. The current W3C proposal, OWL, is based on Description Logics (DLs). Such ontology languages also appear to be highly useful for information integration. Much of this talk is tutorial in nature, expounding on my prejudices of what makes DLs good and bad. This part is illustrated with an example showing how additional "semantic" information can be added to "syntactic" service descriptions in CORBA IDL.

One obstacle to the Semantic Web vision is the acknowledged expectation that there will be no single ontology, but rather a (growing) collection of independently maintained ones. The last, more technical, part of the talk will concern the notion of Distributed Description Logics, which we have investigated with Luciano Serafini. A potential failing of most distributed reasoner semantics is that the inconsistency of a single local source "infects" the entire system, so that an expensive global consistency check must always be made. We show how our simple notion of "semantic holes" provides a solution to this problem.

Speaker Bio:

Alex Borgida holds a PhD degree from the University of Toronto, and is a Professor of Computer Science at Rutgers University, New Brunswick, NJ. His research is mainly concerned with knowledge representation and its applications. He has published in a variety of areas including Artificial Intelligence, Databases, and Software Engineering. The main unifying thread of this work is a belief in the importance of languages, which shape the way we think of a problem (an unabashed Whorfian!), and the need to be precise and logical about the semantics of such languages. Alex is co-recipient of the most influential paper award of the 1994 International Conference on Software Engineering, and is proud to have contributed to the design and implementation of the Classic language/logic, which was used by AT&T as part of a system that configured "billions of dollars' worth of equipment".

Date: February 17, 2006
Time: 12:30 p.m.
Location: Computer Science Bldg 2311
Speaker:   Yaron Caspi, Tel-Aviv University, Israel
Title:   Video Visualization - Beyond Pixels and Frames

Abstract:

Video data is represented by pixels and frames. This restricts the way it is captured, accessed and visualized. On one hand, visual information is distributed across all frames, and therefore, in order to depict the visual information, the entire video sequence must be viewed sequentially, frame by frame. On the other hand, important visual information is lost by the limited frame rate. Similarly in the spatial domain, sensor and optics limit the capturing process, while huge redundancy prevents an efficient visualization of information. In this talk I will show how to exceed the limitations of both capturing devices and visual displays. In particular, I will show how fusion of information from multiple sources allows us to exceed temporal and spatial limitations, and how visualization of video data can benefit from importance ranking. I will describe a process that depicts the essence of video or animation, by embedding high dimensional data in low dimensional Euclidean space. I will also show how super-pixels (in contrast to pixels) contribute to the exploitation of temporal redundancy for the task of spatial segmentation of regions with high importance.

Date: February 24, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Julie Dorsey, Yale University
Title:   Digital Materials: Modeling the Appearance of the Everyday World

Date: March 3, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Amanda Stent, SUNY at Stony Brook
Title:   Adaptation in Spoken Dialog with Computers

Abstract:

In this talk I describe two experiments designed to address aspects of the following question: How do humans adapt in conversation with computers? The first experiment, a Wizard-of-Oz experiment, looks at hyperarticulation, or clear speech, directed at computers after evidence of misrecognition. The second experiment, which uses an implemented spoken dialog system, looks at adaptation in word choice and initiative. I then discuss how the results of these experiments will help us design better behaviors into spoken dialog systems.

Date: March 8, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Darren Gergle, Carnegie Mellon University
Title:   The Value of Shared Visual Information for Collaborative Task Performance

Abstract:

For several decades, researchers and engineers have struggled to develop systems to support distance collaboration. The failure of many collaborative technologies is due, in part, to a limited understanding of how groups coordinate in collocated environments and how the coordination mechanisms of face-to-face collaboration are impacted by technology.

My research is building a theoretical understanding of the role shared visual information plays in communication and collaboration. Visual information provides evidence to support both situation awareness and referential grounding. However, the effectiveness with which it can be used for these purposes depends upon the features of the media (e.g., video refresh-rates) and the features of the task (e.g., the linguistic complexity of the objects being discussed). At a theoretical level, my research leads to an improved understanding of how features of tasks and media, both alone and in combination, influence group coordination. At an applied level, my work benefits developers by identifying the features of technologies that enable people to work remotely, and provides guidelines for the development of new technologies to support distance collaboration.

Date: March 9, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Richard Souvenir, Washington University in St. Louis
Title:   Faculty Candidate Talk

Abstract:

The field of manifold learning provides powerful tools for parameterizing high-dimensional data points with a small number of parameters when this data lies on or near some manifold. Images can be thought of as points in a high-dimensional image space where each coordinate represents the intensity value of a single pixel. Direct application of these manifold learning techniques has been successful on simple image sets such as handwriting data and a statue undergoing rigid motion. However, they tend to fail in the case of natural image sets, even those that only vary due to a single degree of freedom, such as a heart beating in MR data. This talk presents a framework which allows for the parameterization of data sets commonly seen in natural video. This framework specializes manifold learning for images by using image distance metrics that correspond to natural image variations and extends manifold learning to cyclic and intersecting topologies that occur in diagnostic medical image applications. To support such applications on cardiac MRI, the framework directly integrates the manifold embedding to provide new and stronger constraints for segmentation and tracking.

Speaker Bio:

Richard Souvenir is a doctoral candidate in the Department of Computer Science and Engineering at Washington University in St. Louis. He received the M.S. degree in Computer Science in 2003 and B.S. in Computer Science and Biology in 2001, both from Washington University. His research in computer vision focuses on discovering natural parameterizations of video data and includes aspects of machine learning and medical imaging. He is a recipient of the National Science Foundation Graduate Research Fellowship.

Date: March 10, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   John A. Stankovic, University of Virginia
Title:   Self-Organizing Wireless Sensor Networks in Action

Date: March 13, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Colin Dewey, UC Berkeley
Title:   Whole-genome alignments and polytopes for comparative genomics

Abstract:

Whole-genome sequencing of many species has presented us with the opportunity to deduce the evolutionary relationships between each and every nucleotide. In this talk, I will present algorithms for this problem, which is that of multiple whole-genome alignment. The sensitivity of whole-genome alignments to parameter values can be ascertained through the use of alignment polytopes, which will be explained. I will also show how whole-genome alignments are used in comparative genomics, including the identification of novel genes, the location of micro-RNA targets, and the elucidation of cis-regulatory element and splicing signal evolution.

Speaker Bio:

Colin Dewey was an undergraduate at the University of California, Berkeley, where he majored in Electrical Engineering and Computer Sciences with an honors breadth area in Molecular Biology. After receiving his B.S. with high honors in 2001, he continued on as a graduate student at Berkeley under the guidance of Lior Pachter. He will receive his Ph.D. in Electrical Engineering and Computer Sciences with a Designated Emphasis in Computational and Genomic Biology in May 2006.

Driven by his interests in molecular evolution and algorithm design, Colin has focused his graduate research on the development of algorithms for comparing multiple whole genome sequences. He has participated in the international sequencing projects for the mouse, rat, and chicken genomes and is currently a member of the ENCODE Consortium, which aims to construct a catalog of all functional elements in the human genome. He has also collaborated with scientists at the National Center for Biotechnology Information.

Date: March 15, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Xifeng Yang, University of Illinois at Urbana-Champaign
Title:   Mining and searching massive graph databases

Abstract:

Graphs are ubiquitous, with critical applications in domains ranging from software engineering to computational biology. However, it is challenging to analyze any reasonably large collection of graphs due to the high computational complexity of such analysis. Development of scalable methods for the analysis of massive graph databases has thus become a major thrust in data mining and database research.

At the core of many graph-related applications, there are two fundamental problems: how to mine graph patterns and how to process graph queries. My initial study revealed a tight connection between these two seemingly parallel areas. In this talk, I will first present a novel graph canonical labeling system that is able to speed up the discovery of frequent subgraph patterns. Next, discriminative pattern analysis is introduced for constructing compact yet high-quality graph indices, which are then applied to exact and approximate graph search in large graph databases. Such an index mechanism is shown to be very effective in processing graph queries. The finding of graph pattern-based indexing is profound and yet to be fully explored, since the same concept can also be applied to graph classification and clustering. At the end of my talk, I am going to examine broader applications of graph patterns, such as biological network analysis for functional annotation and program flow classification for software bug isolation.

Date: March 17, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Jennifer L. Wong, University of California, Los Angeles
Title:   Design of Embedded Systems using Data-driven Statistical Techniques

Abstract:

A large variety of applications in pervasive computing, sensor networks, and microbiological systems are driven by the capabilities of modern embedded systems. These systems have posed a number of exciting research challenges due to recent technology and application trends. The design and analysis of embedded systems are difficult due to the presence of variability and limited predictability. For example, links in wireless ad-hoc networks are often intermittent and properties of integrated circuits are subject to manufacturing variability. I will address one of the challenges of low-power wireless embedded systems: deployment for enabling more energy-efficient communication. The basic premise of my work is to use collected data from traces of deployed or operational systems to drive both the analysis and realization of the embedded system, and that any proposed optimization must be demonstrated on actual operational systems. My analysis leverages non-parametric statistical techniques, identifies design insights and tractable, accurate optimization objectives, and serves as a basis for fast and realistic simulation.

Specifically, I will demonstrate the importance and effectiveness of fast on-line lossy link prediction. I then apply the link metric in a force-directed optimization framework that addresses the network deployment problem by mapping the location discovery problem into communication space, where the distance between two nodes is a function of the cost of communication between the nodes. By using Delaunay tessellations, I have developed approaches for network deployment, node relocation, and other related deployment problems. I evaluated the effectiveness of both the link model and the optimization approach in several testbeds and for several typical tasks such as peer-to-peer communication and data collection.

Speaker Bio:

Jennifer L. Wong is a PhD candidate at the University of California, Los Angeles. She also received her B.S. in Computer Engineering and M.S. in Computer Science from UCLA. Her main research interests include statistical modeling for the design of embedded systems, low-power design optimization, sensor networks, and intellectual property protection.

Date: March 20, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Bo Pang
Title:   A sentimental education: machine learning problems using opinion-oriented text

Abstract:

Sentiment analysis seeks to identify the viewpoint(s) underlying a text span. Potential applications include question-answering systems that address opinions as opposed to facts and business intelligence systems that analyze user feedback. The research issues raised by such applications are often quite challenging compared to fact-based analysis. These challenges, together with large amounts of opinion-oriented text available through the web, make this area an excellent testing ground for interesting ideas in natural language processing, as well as machine learning, data mining, and theory. In this talk, we illustrate the new challenges and opportunities with two sentiment analysis tasks. In particular, we will describe how we modeled different types of relations, which can have implications outside this area as well.

One task that has attracted a great deal of attention is polarity classification, e.g., classifying a movie review as "thumbs up" or "thumbs down" from textual information alone. We considered a number of approaches, including one that applies text categorization techniques to just the subjective portions of the document. Extracting these portions can be a hard problem itself; we describe an approach based on efficient techniques for finding minimum cuts in graphs that incorporate sentence-level relations. Another task, which can be viewed as a non-standard multi-class classification task, is the rating-inference problem, where one must determine the reviewer's evaluation with respect to a multi-point scale (e.g., one to five "stars"). We apply a meta-algorithm, based on a metric labeling formulation of the problem, that explicitly exploits relations between classes. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.

Portions of this work are joint with Lillian Lee and Shivakumar Vaithyanathan.

Date: March 21, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Fei Sha, University of Pennsylvania
Title:   Novel machine learning algorithms for sequential labeling problems

Abstract:

In this talk, I will describe novel machine learning algorithms that have been successfully applied to problems in speech recognition and natural language processing. The inputs to the algorithms are a sequence of examples (such as words, or acoustic observations); the outputs of the algorithms are a sequence of labels, one for each example. What makes these sequential labeling problems interesting and challenging is that the labeling decision for each example depends on decisions on other examples.

For instance, in automatic speech recognition, the examples correspond to overlapping segments of the speech signal. To recognize the words in the speech, we need to label each segment with a phoneme. It is beneficial to exploit contextual information by making "joint" labeling decisions on adjacent segments as well. This approach often leads to robust performance of speech recognizers.

Our new statistical learning algorithms build on the strengths of conventional sequential labeling paradigms (such as hidden Markov models) and the recent advances in statistical learning theory and convex optimization techniques. The new approaches have yielded state-of-the-art results on well-benchmarked problems in speech recognition and natural language processing, for instance, recognizing sequences of phonemes in utterances and identifying sequences of noun phrases in text.

Sequential labeling problems are ubiquitous and arise in many domains. At the end of the talk, I will discuss the possibility of extending and applying these algorithms to other types of sequential labeling problems, for example, recognizing human activities in a video sequence.

Speaker Bio:

Fei Sha is a Ph.D. candidate in the Department of Computer and Information Science at the University of Pennsylvania. His primary research interests are machine learning and its application to speech, language processing, vision and robotics. He has worked on problems arising from pattern classification as well as exploratory analysis and visualization of high-dimensional data. As a coauthor, he won the Best Student Paper Award at ICML-2004 for his work on manifold learning. He holds B.Sc. and M.Sc. degrees in Biomedical Engineering from Southeast University, China.

Date: March 22, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Dr. Joseph J. LaViola Jr., Brown University
Title:   Mathematical Sketching: A New Approach for Creating and Exploring Dynamic Illustrations

Abstract:

Diagrams and illustrations are often used to help explain mathematical concepts. They are commonplace in mathematics and physics textbooks, providing a form of physical intuition for otherwise abstract principles. Similarly, students often use pencil and paper to create diagrams for math problems as an intuitive aid in visualizing relationships between variables, constants, and functions. However, such static diagrams generally assist only in the initial formulation of mathematical expressions but not in the "debugging" or analysis of those expressions, which can be a severe visualization limitation even for simple problems.

To overcome these limitations, I developed mathematical sketching, a novel approach to rapidly interacting with and visualizing mathematical concepts through the fluid association of handwritten mathematical notation and free-form diagrams. Mathematical sketching derives from the familiar pencil-and-paper process of drawing supporting diagrams to facilitate the formulation of mathematical expressions; however, with a mathematical sketch, users can also leverage their physical intuition by watching their hand-drawn diagrams animate in response to continuous or discrete parameter changes in their written formulas. In this talk, I will discuss the critical components of mathematical sketching and present the results of an initial user evaluation in the context of a prototype application called MathPad^2.

Speaker Bio:

Joseph J. LaViola Jr. is currently a postdoctoral research associate in Computer Science at Brown University. He works under the direction of Andries van Dam and Robert Zeleznik in the Computer Graphics Group. His primary research interests include pen-based interactive computing, 3D interaction techniques, predictive motion tracking, multimodal interaction in virtual environments, and user interface evaluation. His work has appeared in journals such as Presence and IEEE Computer Graphics & Applications, and he has presented research at conferences including ACM SIGGRAPH, the ACM Symposium on Interactive 3D Graphics, IEEE Virtual Reality, and Eurographics Virtual Environments. He has also co-authored "3D User Interfaces: Theory and Practice", the first comprehensive book on 3D user interfaces. Joseph received a Sc.M. in Computer Science in 2000, a Sc.M. in Applied Mathematics in 2001, and a Ph.D. in Computer Science in 2005 from Brown University.

Date: March 27, 2006
Time: 2:15 p.m.
Location: Computer Science Bldg 2311
Speaker:   Andrew Ladd, Rice University
Title:   A Novel Approach to Planning for Physical Systems

Abstract:

Over the last decade, motion planning algorithms have been used to solve complex geometric problems and have contributed to advances in industrial automation, service robots and computer-assisted design of mechanisms. However, some of the most exciting applications for motion planning, such as surgical robots, humanoid robots and autonomous exploration vehicles, are beyond the limits of current planning techniques. The fundamental reason for this gap is that motion planning algorithms typically do not explicitly consider the physics of robot motion. In contrast, predictive models for mechanical systems have become quite good. There are now many accurate and efficient simulation software packages available and a corresponding need for planning techniques capable of leveraging them.

This talk will discuss recent work that has led to a novel algorithm that seamlessly combines geometry and physics. The geometry aspect of the problem is addressed with a combination of sampling and subdivision methods. The implementation uses a general purpose physical simulator for Stewart-Trinkle rigid body dynamics to model contact, friction and arbitrary kinematic constraints. The effectiveness of the planner has been demonstrated in a variety of studies including lifting a heavy weight with an articulated limb and driving a realistic car. Some theoretical aspects of the planner will be discussed, as well as its implications for robotics, graphics, artificial intelligence and, more generally, our capability to compute in the physical world.

Speaker Bio:

Andrew Ladd is finishing his Ph.D. at Rice University. He is interested in algorithmic robotics, which broadly studies how computers reason about physical systems. His work combines theoretical foundations with engineering practice.


Last update on 03/23/2006
Please send any comments on this page to csgsc@cs.sunysb.edu