Course, academic year 2013/2014
  
Data Mining - N143034
Title: Vytěžování znalostí z dat (Knowledge Mining from Data)
Guaranteed by: Department of Informatics and Chemistry (143)
Faculty: Faculty of Chemical Technology
Actual: from 2013 to 2015
Semester: summer
Points: summer s.:5
E-Credits: summer s.:5
Examination process: summer s.:
Hours per week, examination: summer s.:2/2, C+Ex [HT]
Capacity: 25 / 25 (unknown)
Min. number of students: unlimited
Qualifications:  
State of the course: taught
Language: Czech
Teaching methods: full-time
Level:  
Additional information: http://ich.vscht.cz/~svozil/teaching.html
Guarantor: Svozil Daniel prof. Mgr. Ph.D.
Annotation -
This course provides a broad introduction to machine learning and data mining. Topics include: (i) supervised learning (decision trees, support vector machines, neural networks); (ii) unsupervised learning (clustering, dimensionality reduction); (iii) best practices in machine learning (bias/variance theory, data preprocessing, model quality assessment and model selection). Hands-on exercises draw on numerous case studies and applications and use the open-source data mining system RapidMiner.
Last update: ROZ143 (15.11.2012)
Literature -

R: Larose D. T., Discovering Knowledge in Data: An Introduction to Data Mining, Wiley-Interscience, 2004, ISBN 0471666572

A: Bishop Ch. M., Pattern Recognition and Machine Learning, Springer, 2007, ISBN 0387310738

A: Witten I. H., Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011, ISBN 0123748569

Last update: TAJ143 (02.07.2013)
Requirements to the exam -

Course credit is awarded on the basis of a completed project, which includes a final report and a presentation of the results. At the end of the semester, students take a written exam.

Last update: Svozil Daniel (28.02.2011)
Syllabus -

1) Introduction to data mining. CRISP-DM. Data warehousing. OLAP.

2) Pattern recognition - basic concepts. Supervised/unsupervised learning. Classification and regression. Generalization. Overfitting. Bias-variance tradeoff.

3) Test set. Cross validation. k-nearest neighbors.

4) Cluster analysis.

5) Information theory. Decision trees.

6) Neural networks I. Threshold neuron. ADALINE. Linear perceptron.

7) Neural networks II. Multilayer Perceptron.

8) Neural networks III. Radial Basis Function (RBF) Networks.

9) Neural networks IV. Self-Organizing Map.

10) Support Vector Machines.

11) Genetic Algorithms.

12) Feature Selection. Feature Extraction.

13) Ensemble learning.

14) Summary.

Last update: ROZ143 (15.11.2012)
Learning resources -

Online course materials at http://ich.vscht.cz/~svozil/teaching.html

Course (video and slides) "Learning from data" from Caltech: http://work.caltech.edu/telecourse.html

Course (video and slides) "Machine learning" from Stanford: http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1 and http://cs229.stanford.edu/materials.html

Course "Machine learning" on Coursera: https://www.coursera.org/course/machlearning

Last update: ROZ143 (15.11.2012)
Learning outcomes -

Students will be able to:

Apply basic data mining tools to common problems (classification, regression, clustering).

Choose an algorithm suitable for the given problem and assess its accuracy.

Understand how and why data mining and machine learning algorithms work.

Last update: TAJ143 (03.12.2013)
Registration requirements -

Fundamentals of Statistics

Last update: TAJ143 (02.07.2013)
Teaching methods
Activity Credits Hours
Participation in lectures 1 28
Preparation for lectures, seminars, laboratories, excursions or practice 1 28
Work on an individual project 1 28
Preparation for and taking of the exam 1 28
Participation in seminars 1 28
Total 5 / 5 140 / 140
Coursework assessment
Form Significance
Defense of an individual project 30
Examination test 70

 
VŠCHT Praha