2:20 pm-3:40 pm, Mon, Wed
CS Building 1441
Data Mining, also called Knowledge Discovery in Databases (KDD), is a new multidisciplinary field. It brings together research and ideas from database technology, machine learning, neural networks, statistics, pattern recognition, knowledge-based systems, information retrieval, high-performance computing, and data visualization. Its main focus is the automated extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, and other massive information repositories.
The course will closely follow the book and is designed to give a broad yet in-depth overview of the Data Mining field and to examine the most recognized techniques in more rigorous detail. It will also explore the newest trends and developments in the field through talks based on recent research papers.
This is the educational part. The main goal of the presentation is to teach others the material. Students have to put time and effort into understanding the material, present it slowly, and be prepared to answer students' questions.
Remember that "I don't understand" is also an answer, but don't overuse it! A better answer is: "the book is not very clear, I think that it is ...".
I will distribute subjects very shortly.
Students have total freedom in the choice of subjects. The presentation must consist of two parts.
Part 1 Short overview of the techniques and methods used, as taught during the course or found in the literature.
Part 2 Detailed presentation of the paper or application.
Part 1 A short motivation: WHY you chose those presentations for the report.
Part 2 A one-page summary (in your own words!) of each presentation.
Part 3 Your own evaluation of the presentation.
Step 1 FIND (on the Web or in other sources) a research paper on a DATA MINING subject of your choice.
Step 2 Write a motivation: why you chose this particular paper.
Step 3 Write a summary of the paper, at least one page long, in your own words. Do not copy abstracts or summaries. State whether it is an application paper or a theoretical paper and what the real point of the paper is. It has to be your own summary, not the author's. Specify which techniques and algorithms are used or improved upon, etc.
Step 4 Write your own evaluation of the paper. Address the following:
1. Did the author(s) really accomplish what they said they did?
2. How important is the result, based on what you KNOW (after our course!) about the field?
3. How well is the paper written: motivation, description of related research, statement of the problem of the paper, its history and relevance to the field?
4. How important is the paper with respect to the future development of the field: does it open new directions or, in the case of a general model-building paper, how much of the past research does it cover?
5. Any other remarks and your own reflections.
Any direct citation (even of ONE SENTENCE!) must follow the standard citation form: give the page of the paper and show clearly where the quotation starts and where it ends.
Chapter 1 Introduction. General overview: what is Data Mining, which data, what kinds of patterns can be mined.
Chapter 2 Data Warehouse and OLAP technology for Data Mining. (Student presentations)
Chapter 3 Data preprocessing: data cleaning, data integration and transformation, data reduction, discretization and concept hierarchy generation.
Chapter 4 Data Mining Primitives, Languages and System Architectures. (Student presentations)
Chapter 5 Concept Descriptions: Characteristic and Discriminant rules. Data Generalization. EXTRA: Example of decision tables and Rough Sets.
Chapter 6 Mining Association Rules in Large Databases. Transactional databases and Apriori Algorithm.
Chapter 7 Classification and Prediction. Decision Tree Induction (ID3, C4.5).
Rough Sets.
Bayesian Classification. (Student presentations)
Classification based on concepts from association rule mining.
Genetic algorithms. (Student presentations) Statistical Prediction.
Chapter 8 Cluster Analysis. A categorization of major clustering methods. Some student presentations.
TRENDS and Developments: presentations of the newest research and applications.
Last Updated by Zhiquan Gao, 04/18/2005