Semantic Web Information Processing: from Semistructured Data to Structural Knowledge

Guizhen Yang
http://www.cs.sunysb.edu/~guizyang/
Department of Computer Science
State University of New York at Stony Brook

The vision of the Semantic Web is to define and share machine processable data on the Web which will enable a variety of automated tasks ranging from information search to data integration to content management to Web services. This talk will present our approach to realizing the Semantic Web vision, by addressing two fundamental issues: (1) creation of semantic content by transforming unstructured Web documents into structured data; (2) infrastructure for reasoning with semantically enriched data.

In the first part of the talk, I will focus on creation of semantic content from Web documents. Specifically, I will describe novel techniques for data extraction from Web documents that exhibit a high degree of precision and recall. The theory behind these techniques is based on the concept of unambiguity in automatic learning of extraction patterns and the notion of resilience to changes in Web documents. I will present complexity results and efficient algorithms for learning unambiguous and resilient extraction patterns, as well as experimental results to demonstrate the effectiveness of these techniques in practice.

In the second part of the talk, I will deal with infrastructure for reasoning with semantically enriched data. I will present my work on the design and implementation of Flora-2. Flora-2 unifies the well-known F-logic, HiLog, and Transaction Logic into one coherent rule-based, object-oriented knowledge representation system. I will discuss the engineering issues of language and compiler design, system architecture, and query optimization, as well as the theoretical issues related to the new semantics and algorithms for nonmonotonic multiple value and code inheritance.
Flora-2 (and its predecessor Flora-1) has been used in a variety of application domains, ranging from Web agents to information integration in bioinformatics to ontology management to building CASE systems. Since its last alpha release less than a year ago, it has been downloaded hundreds of times and has attracted a small community of devoted users. A beta release is planned in the near future. The source code of Flora-2 is freely available at http://flora.sourceforge.net/.

At the end of the talk, I will outline ongoing and future research on the Flora-2 system, tree pattern query aggregation, mining semantic structures of Web documents, and security policy management.