CSE634
DATA MINING
Spring 2007



Course Information


News:

  • THIS IS AN OLD (2007) WEB PAGE
  • The course WILL NOT be offered in Spring 2008
  • 2007 SYLLABUS is in DOWNLOADS
  • Time:

    Place:

    Professor:

    Anita Wasilewska

    1428 CS Building; 632-8458
    e-mail: anita@cs.sunysb.edu
    Office Hours:

    Teaching Assistant:

    TBA

    E-mail:
    Telephone:
    Office Hours:     TBA

    Book:

    DATA MINING: Concepts and Techniques
    Jiawei Han, Micheline Kamber
    Morgan Kaufmann Publishers, 2003

    General Course Description:

    Data Mining, also called Knowledge Discovery in Databases (KDD), is a new multidisciplinary field. It brings together research and ideas from database technology, machine learning, neural networks, statistics, pattern recognition, knowledge-based systems, information retrieval, high-performance computing, and data visualization. Its main focus is the automated extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, and other massive information repositories.
    The course closely follows the book and is designed to give a broad yet in-depth overview of the Data Mining field and to examine its most recognized techniques in more rigorous detail. It will also explore the newest trends and developments in the field through talks based on recent research papers.

    Student Information

    Downloads:

    2007 SYLLABUS

    Lecture Notes:

    01. Introduction (Chapter 1)
    02. Preprocessing (Chapter 3)
    03. Classification (Chapters 5 and 7)
    04. Classification (Testing)
    05. Classification By Neural Networks
    06. APRIORI Algorithm
    07. Association Analysis
    08. Classification (Example): Protein Secondary Structure Prediction
    09. Example 1: Decision Tree
    10. Example 2: Decision Tree

    DATASETS

    Datasets for data mining and knowledge discovery
    Datasets for data mining competitions
    University of California, Irvine (UCI) KDD Archive
    World Bank datasets

    Project Data

  • Play around with the data and familiarize yourself with it (DOWNLOAD: bakarydata.xls).
  • You can download the project description here (DOWNLOAD: Project Description).
  • This project will be done in groups.
  • More details on the project will be put up soon.
  • Presentations' Schedule and Subjects:


    NEWS

  • Check 'Possible Presentations Subjects' to get an idea of the subjects to choose from.
  • Please e-mail the T.A. the number of members in the group (not exceeding 4), the name, e-mail ID, and SUNYSB ID of each group member, along with the subject of the presentation.
  • If you have not formed a group, please send the T.A. the subject you are interested in, and we will assign you to a group.
  • All groups presenting the same subject MUST collaborate.
  • Presentations' General Principles:

    1. Groups must consist of 3-4 students
    2. No more than 2-3 presentations on the same general topic (such as Clustering, Association Analysis, Neural Networks, etc.) are allowed.
    3. Groups that choose the same general subject MUST collaborate.
    4. No repetition of information within the same general subject is allowed, except for one or two slides referring to previous presentation(s) by another group or groups.
    5. The "no repetition" principle applies to lecture-type content as well as to research papers and applications.
    6. YOU MUST USE the language developed in the Professor's Lecture Notes and the book.

    Possible Presentations Subjects:


    1. Data Warehouse and OLAP technology for Data Mining.
    2. Data Mining Primitives, Languages and System Architectures
    3. CRISP standards for Data Mining
    4. Mining Association Rules in Large Databases
    5. Classification based on Concepts from Association rule mining.
    6. Classification Accuracy testing methods and problems
    7. Statistical Methods 1: Statistical Prediction, Prediction by regression, other purely statistical methods
    8. Statistical Methods 2: Classification by Neural Networks
    9. Statistical Methods 3: Bayesian Classification.
    10. Statistical Methods 4: Cluster Analysis. A Categorization of major Clustering methods
    11. Evolutionary Computing: Genetic algorithms as optimization, Genetic algorithms as classification. Other evolutionary computing methods.
    12. NEW ADVANCES in Data Mining, for example:
            Web Mining: an overview of methods and problems
            Text Mining: an overview of methods and problems
            Visualization and Data Mining techniques
            Natural Language Processing and Data Mining techniques
    13. FIND YOUR OWN subject and discuss it with the Professor.

    Presentations' Groups, Topics, Schedule and Peer Evaluations:


    Students' Presentations Report:


    Download a PDF of the report form from here

    Students' Presentations Spring 2007

    Data Warehousing & OLAP Technologies - I by Anuradha, Maduri, Sumit and Karthik.
    Mining Association Rules in Large Databases by Sadler, Beili, Xiang, Xiaoxiang.
    Cluster Analysis - I by Nam Kyu Han, Ju Jae Won and Chung Dong Hwan
    Cluster Analysis - II by Karthik, Praveen, Shashank and Ravi
    Artificial Neural Networks by Shikhir, Mohin, Kapil and Jai
    Genetic Algorithms by Marcela, George, Mikhail and Abhishek
    Bayesian Classification by Gayatri, Chethan, Joshwini and Krupa
    Web Mining - I by Rajat, Pranav, Dhiraj and Abhijit
    Web Mining - II by Vaishali, Pallavi, Minnie and Mehru
    Text Mining by Alok, Shreta, Rheema and Shruthi

    Students' Presentations Spring 2006:


    Data Mining Primitives, Languages, and System Architecture by Sushma and Swathi
    Cluster Analysis - I by Harpreet, Densel, and Sudipto
    Neural Networks - I by Janani, Divya, Arti, and Anjali
    Neural Networks - II by Mihir, Jeet, Rituparna, and Shrinand
    Bayesian Network by Vaibhav, Srinivasan, Faisal, and Vipin
    Web Mining - I by Anushri, Gaurao, Ankush, and Krati
    Data Warehouse and OLAP Technology - I by Rohan, Kalpit, Yeshesvini and Smruti
    Data Warehouse and OLAP Technology - II by Sathyanarayana, Sunil, Lohit, and George
    Clustering - II by Anushree and Fatima
    Web Mining - II by Mikhail, Irem, Tania, and Barbara
    Text Mining by Rajan, Mohammad, Mahmoud and Munyaradzi
    Decision Trees I, II by Vaibhav and Tarun
    Visualization in Data Mining by Chidroop and Deepanshu
    Genetic Algorithms by Durga, Rajiv, Manikant, and Kannan

    Students' Presentations Spring 2005:


    Presentation 1 Data Mining Primitives, Languages, and System Architectures by Harshad Kamat
    Presentation 2 Neural Network by Hyung-Yeon Gu & Jalal Mahmud
    Presentation 3 Genetic Algorithms by Chhavi Kashyap
    Presentation 4 Data Warehouse and OLAP Technology For Data Mining by Syed Ahmed
    Presentation 5 CRISP-DM by Jae Hong Kil
    Presentation 6 Mining Association Rules in Large Databases by Prateek Duble
    Presentation 7 Association Rules Hiding (Not Mining) by Prateek Duble
    Presentation 8 Introduction of Bayesian Network by Hiroo Kusaba
    Presentation 9 Cluster Analysis by Arthy Krishnamurthy & Jing Tun