| Date | Speaker | Topic | Links | |
| 1. | Feb 13 | Patrick McDaniel | Authentication in Interdomain Routing | |
| 2. | Feb 18 | Anita Wasilewska | Proteins Secondary Structure Prediction (P1) | Paper |
| 3. | Feb 25 | Anita Wasilewska | Proteins Secondary Structure Prediction (P2) | Paper |
| 4. | Mar 03 | Beata Sarna-Starosta | Policy Analysis for Security-Enhanced Linux | Paper |
| 5. | Mar 10 | |||
| 6. | Mar 17 | Larry Koved | Beyond Code Generation: Using compiler technology for security and software defect identification | Paper |
| 7. | Mar 24 | Ping Yang | Pi-calculus Verification | |
| 8. | Mar 31 | IV Ramakrishnan | ||
| *** | Apr 07 | Spring Recess | ||
| 9. | Apr 14 | |||
| 10. | Apr 21 | |||
| 11. | Apr 28 | IV Ramakrishnan | On the Power of Semantic Partitioning of Web Documents | Paper |
| 12. | May 05 | |||
| 12. | May 26 | Guizhen Yang | The Complexity of Mining Maximal Frequent Itemsets and Maximal Frequent Patterns | Paper |
Attacks against Internet routing are increasing in number and
severity. Contributing greatly to these attacks is the absence of
origin authentication: there is no way to validate if an entity using
an address has the right to do so. This vulnerability is not only a
conduit for malicious behavior, but indirectly allows seemingly
inconsequential misconfigurations to disrupt large portions of the
Internet. This talk discusses the semantics, design, and costs of
origin authentication in interdomain routing. A formalization of
address usage and delegation is presented and broad classes of
cryptographic proof systems appropriate for origin authentication are
considered.
The costs of origin authentication are largely determined by the form
and stability of the served address space. However, prior to this
work, little was known about the relevant characteristics of address
use on the Internet. Developed from collected interdomain routing
data and presented in this talk, our approximate delegation hierarchy
shows that current IP address delegation is dense and relatively
static. One notable result shows that as few as 16 entities are the
source of 80% of the delegation on the Internet. We further show via
simulation that these features can be exploited to efficiently
implement Internet-scale origin authentication. The talk is concluded
with a presentation of thoughts on major problems in routing security
and other related future work.
Techniques for the prediction of protein secondary structure provide
information that is useful both in ad initio structure
prediction and as an additional constraint for fold-recognition
algorithms. Knowledge of secondary structure alone can help the design
of site-directed or deletion mutants that will not destroy the native
protein structure. However, for all these applications it is essential
that the secondary structure prediction be accurate, or at least that,
the reliability for each residue can be assessed. If a protein
sequence shows clear similarity to a protein of known three
dimensional structure, then the most accurate method of predicting the
secondary structure is to align the sequences by standard dynamic
programming algorithms, as the homology modelling is much more
accurate than secondary structure prediction for high levels of
sequence identity.
Secondary structure prediction methods are of most use when sequence
similarity to a protein of known structure is
undetectable. Accordingly, it is important that there is no detectable
sequence similarity between sequences used to train and test secondary
structure prediction methods.
It our talk we first define a formal mathematical model for PSSP
problems and use it to present an uniform view of some of the newest
supervised classification methods and discuss their results.
We then discuss the most modern PSSP data-sets, and first, second
and third generation of PSSP algorithms.
Finally, we evaluate and compare predictive accuracy of nine of PSSP
algorithms with our recently built meta-classifier. The
meta-classifier uses evolutionary information and combines the
predictions from the nine classifiers. It gives a 2\% improvement in
predictive accuracy over the best single method and 15\% over the
worst one.
Security-Enhanced Linux (SELinux) extends Linux with a flexible mandatory access control mechanism that enforces security policies expressed in SELinux's policy language. Determining whether a given policy meets a site's high-level security goals can be difficult, due to the low-level nature of the policy language and the size and complexity of SELinux policies. We propose a logic-programming-based approach to analysis of SELinux policies. The approach is implemented in a tool that helps users determine whether a policy meets its goals.
Compiler and optimization technology has a rich history. Traditional use
of the technology has been for code generation through a variety of intra-
and inter-procedural code analyses. We have been exploring the use of the
same basic techniques to address hard problems in security and software
defect identification. A number of factors make the analysis problems
more challenging. A significant trend in software development is the
reuse of components -- from libraries to entire applications. Often the
source code for these components is not available. The size and complexity
of the software to be analyzed can be quite large, making the analysis
even more challenging. This talk will give a high level overview of two
of our projects that use whole program control and data flow analyses:
We have been developing a variety of security analyses of Java
applications (e.g., servlets, applets, applications). For example, one of
the most difficult challenges for an application developer or system
administrator is identifying the security authorization requirements of
Java applications ("Permissions" to grant the application). The
traditional approach is either to turn off security or to run test cases
to observe authorization failures and add the required Permissions to the
authorization database until the application no longer fails. This talk
describes a better approach that automatically computes the authorization
policy for an application.
Another project, called SABER, uses whole program analysis to identify
coding defects in application code. Saber identifies code that may result
in performance problems, outright failure of an application, or violates
"best practices" (coding conventions). For example, a performance problem
may result when an application uses a class or method that is known to
result in poor performance when used in a particular context (e.g.,
calling the garbage collector during garbage collection). This talk will
give an overview of the types of defects that SABER can discover.
ON THE POWER OF SEMANTIC PARTITIONING OF WEB DOCUMENTS
IV Ramakrishnan
Stony Brook University
Enormous amount of semantic data is still being encoded in HTML
documents. Identifying and annotating the semantic concepts implicit
in such documents will make them directly amenable for Semantic Web
processing. This talk will describe a highly automated technique for
partitioning HTML documents into semantic structures. Semantic
partitioning has several important and powerful implications in
practice. For example, it eases the task of formulating queries to
retrieve data from Web documents. Knowledge of the schema made
explicit via semantic partitioning facilitates automated
transformation of HTML documents into more semantics-oriented document
formats such as RDF and XML. Yet another application is
audio-browseable Web content. By putting a dialog interface to the
content of a Web page which is reorganized based on the knowledge of
its schema, a user, especially visually disabled individuals, can
browse its content using audio and not suffer from information
overload. Finally semantic partitioning can enable the creation of
self-repairable wrappers, the technology that provides a database-like
interface to Web documents.
THE COMPLEXITY OF MINING MAXIMAL FREQUENT ITEMSETS AND MAXIMAL FREQUENT PATTERNS
Guizhen Yang
Department of Computer Science and Engineering
University at Buffalo, The State University of New York
Since the introduction of the Apriori algorithm about a decade ago, the
field of data mining has flourished into a research area of significant
technological and social importance, with applications ranging from
business intelligence to security to bioinformatics. However, in spite of
the multitude of data mining algorithms developed, not much effort has been
made on the theoretical frontend to study the inherent complexity nature of
data mining problems themselves. A thorough investigation of these
fundamental problems is greatly needed since it will not only provide
invaluable insights into many data mining problems but will also shed new
lights on the characteristics of different data mining algorithms and
benchmark datasets.
In this talk we seek to provide a theoretical account of the computational
difficulty of a genre of data mining problems that deal with maximal
frequent patterns, from the perspective of counting the number of
solutions. We present the first formal proof that the problem of counting
the number of distinct maximal frequent itemsets in a database of
transactions, given an arbitrary support threshold, is #P-complete, thereby
providing strong theoretical evidence that the problem of mining maximal
frequent itemsets is NP-hard.
We will also extend our complexity analysis to other similar data mining
problems dealing with complex data structures, such as sequences, trees,
and graphs, which have attracted intensive research interests in recent
years. Normally, in these problems a partial order among frequent patterns
can be defined in such a way as to preserve the downward closure property,
with maximal frequent patterns being those without any successor with
respect to this partial order. We investigate several variants of these
mining problems in which the patterns of interest are subsequences,
subtrees, or subgraphs, and show that the associated problems of counting
the number of maximal frequent patterns are all either #P-complete or
#P-hard.
This talk will be self-explanatory. No prior knowledge on data mining and
complexity theory will be assumed from audience.
Last updated on Mar 03, 2004 by Radu Grosu