CSE634
DATA MINING
Spring 2007
Course Information
News:
THIS IS AN OLD (2007) WEB PAGE
The course WILL NOT be offered in Spring 2008
2007 SYLLABUS is in DOWNLOADS
Time:
Place:
Professor:
Anita Wasilewska
1428 CS Building; 632-8458
e-mail: anita@cs.sunysb.edu
Office Hours:
Teaching Assistant:
tba
E-mail:
Telephone:
Office Hours:
    TBA
Book:
DATA MINING Concepts and Techniques
Jiawei Han, Micheline Kamber
Morgan Kaufmann Publishers, 2003
General Course Description:
Data Mining, also called Knowledge Discovery in Databases (KDD), is a
new multidisciplinary field. It brings together research and ideas from
database technology, machine learning, neural networks, statistics,
pattern recognition, knowledge-based systems, information retrieval,
high-performance computing, and data visualization. Its main focus is
the automated extraction of patterns representing knowledge implicitly
stored in large databases, data warehouses, and other massive
information repositories.
The course will closely follow the book and is designed to give
a broad yet in-depth overview of the Data Mining field and to
examine the most recognized techniques in more rigorous detail.
It will also explore the newest trends and developments in the
field through talks based on recent research papers.
Student Information
Downloads:
2007 SYLLABUS
Lecture Notes:
01. Introduction (Chapter 1)
02. Preprocessing (Chapter 3)
03. Classification (Chapters 5 and 7)
04. Classification (Testing)
05. Classification By Neural Networks
06. APRIORI Algorithm
07. Association Analysis
08. Classification (Example): Protein Secondary Structure Prediction
09. Example 1: Decision Tree
10. Example 2: Decision Tree
DATASETS
Datasets for data mining and knowledge discovery
Datasets for data mining competitions
University of California, Irvine KDD Archive
World Bank datasets
Project Data
Play around with the data and familiarize yourself with it (DOWNLOAD: bakarydata.xls ).
You can download the project description from here (DOWNLOAD: Project Description)
This project will be done in groups.
More details on the project will be put up soon.
Presentations' Schedule and Subjects:
NEWS
Check 'Possible Presentations Subjects' to get an idea of the subjects to choose from.
Please e-mail the T.A. the number of members in the group (not exceeding 4) and the name, e-mail address, and SUNYSB ID of each group member, along with the subject of the presentation.
If you have not formed a group, please send the T.A. the subject you are interested in, and we will assign you to a group.
All groups presenting the same subject MUST collaborate.
Presentations' General Principles:
1. Groups must consist of 3-4 students
2. No more than 2-3 presentations on the same general topic (such as
Clustering, Association Analysis, Neural Networks, etc.) are allowed
3. Groups that choose the same general subject MUST collaborate.
4. No repetition of information within the same general subject is
allowed, except for one or two slides referring to previous
presentation(s) of another group or groups.
5. The "no repetition" principle applies to lecture-type content as well as
to research papers and applications.
6. YOU MUST USE the language developed in the Professor's Lecture Notes and the book.
Possible Presentations Subjects:
1. Data Warehouse and OLAP technology for Data Mining.
2. Data Mining Primitives, Languages and System Architectures
3. CRISP standards for Data Mining
4. Mining Association Rules in Large Databases
5. Classification based on Concepts from Association rule mining.
6. Classification Accuracy testing methods and problems
7. Statistical Methods 1: Statistical Prediction, Prediction by regression, other purely statistical methods
8. Statistical Methods 2: Classification by Neural Networks
9. Statistical Methods 3: Bayesian Classification.
10. Statistical Methods 4: Cluster Analysis. A Categorization of major Clustering methods
11. Evolutionary Computing: Genetic algorithms as optimization, Genetic algorithms as classification. Other evolutionary computing methods.
12. NEW ADVANCES in Data Mining, for example:
    Web Mining: an overview of methods and problems
    Text Mining: an overview of methods and problems
    Visualization and Data Mining techniques
    Natural Language Processing and Data Mining techniques
13. FIND YOUR OWN subject and discuss it with the Professor.
Presentations' Groups, Topics, Schedule and Peer Evaluations:
Students' Presentations Report:
Download a pdf of the report form from here
Students' Presentations Spring 2007
Data Warehousing & OLAP Technologies - I by Anuradha, Maduri, Sumit and Karthik.
Mining Association Rules in Large Databases by Sadler, Beili, Xiang, Xiaoxiang.
Cluster Analysis - I by Nam Kyu Han, Ju Jae Won and Chung Dong Hwan
Cluster Analysis - II by Karthik, Praveen, Shashank and Ravi
Artificial Neural Networks by Shikhir, Mohin, Kapil and Jai
Genetic Algorithms by Marcela, George, Mikhail and Abhishek
Bayesian Classification by Gayatri, Chethan, Joshwini and Krupa
Web Mining - I by Rajat, Pranav, Dhiraj and Abhijit
Web Mining - II by Vaishali, Pallavi, Minnie and Mehru
Text Mining by Alok, Shreta, Rheema and Shruthi
Students' Presentations Spring 2006:
Data Mining Primitives, Languages, and System Architecture by Sushma and Swathi
Cluster Analysis - I by Harpreet, Densel, and Sudipto
Neural Networks - I by Janani, Divya, Arti, and Anjali
Neural Networks - II by Mihir, Jeet, Rituparna, and Shrinand
Bayesian Network by Vaibhav, Srinivasan, Faisal, and Vipin
Web Mining - I by Anushri, Gaurao, Ankush, and Krati
Data Warehouse and OLAP Technology - I by Rohan, Kalpit, Yeshesvini and Smruti
Data Warehouse and OLAP Technology - II by Sathyanarayana, Sunil, Lohit, and George
Clustering - II by Anushree and Fatima
Web Mining - II by Mikhail, Irem, Tania, and Barbara
Text Mining by Rajan, Mohammad, Mahmoud and Munyaradzi
Decision Trees I, II by Vaibhav and Tarun
Visualization in Data Mining by Chidroop and Deepanshu
Genetic Algorithms by Durga, Rajiv, Manikant, and Kannan
Students' Presentations Spring 2005:
Presentation 1: Data Mining Primitives, Languages, and System Architectures by Harshad Kamat
Presentation 2: Neural Network by Hyung-Yeon Gu & Jalal Mahmud
Presentation 3: Genetic Algorithms by Chhavi Kashyap
Presentation 4: Data Warehouse and OLAP Technology For Data Mining by Syed Ahmed
Presentation 5: CRISP-DM by Jae Hong Kil
Presentation 6: Mining Association Rules in Large Databases by Prateek Duble
Presentation 7: Association Rules Hiding (Not Mining) by Prateek Duble
Presentation 8: Introduction of Bayesian Network by Hiroo Kusaba
Presentation 9: Cluster Analysis by Arthy Krishnamurthy & Jing Tun