Course, academic year 2018/2019
Data Mining Algorithms - N500012
Title in Czech: Algoritmy data miningu
Guaranteed by: CTU in Prague, Faculty of Information Technology (500)
Actual: from 2017
Semester: summer
Points: summer s.:4
E-Credits: summer s.:4
Examination process: summer s.:
Hours per week, examination: summer s.:2/1 C+Ex [hours/week]
Capacity: unlimited / unlimited (unknown)
Min. number of students: unlimited
Language: Czech
Teaching methods: full-time
For type:  
Note: course can be enrolled in outside the study plan
enabled for web enrollment
Guarantor: Holeňa Martin doc. Ing. RNDr.
Annotation -
Last update: Jirát Jiří Ing. Ph.D. (31.01.2014)
In this course, we discuss the most popular data mining algorithms and optimization techniques, such as decision trees, support vector machines, and multilayered perceptrons. We also explain the basic theoretical elements of statistical learning that are essential for all data engineers.
Aim of the course -
Last update: Jirát Jiří Ing. Ph.D. (31.01.2014)

Students will be able to:

use the theoretical background needed for the skillful application of data mining algorithms in classification, regression, and clustering

Literature -
Last update: Jirát Jiří Ing. Ph.D. (31.01.2014)

R: Hastie T., Tibshirani R., Friedman J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2011

Learning resources -
Last update: Jirát Jiří Ing. Ph.D. (31.01.2014)


Syllabus -
Last update: Jirát Jiří Ing. Ph.D. (31.01.2014)

1. Introduction to data mining, classification, prediction, K-NN algorithm and variants

2. Model, evaluation, plasticity, regularization

3. Classification and regression from a statistical point of view

4. Decision Trees (C4.5, CART, MARS algorithms)

5. Classification by means of perceptrons and their generalization

6. Linear, polynomial and logistic regression, LMS, MLE algorithms

7. Nonlinear SVM-classifiers and the SV-regression

8. Inductive modelling - GMDH MIA, COMBI

9. Nonlinear regression by multilayered perceptrons

10. Ensemble models (AdaBoost algorithm)

11. Statistical approach to neural networks

12. Cluster analysis (K-means, agglomerative clustering, neural gas, SOM)

13. A statistical approach to selecting the number of hidden neurons

Registration requirements -
Last update: Jirát Jiří Ing. Ph.D. (31.01.2014)


Teaching methods
Activity                         Credits   Hours
Attendance at lectures           1         28
Work on an individual project    2.2       61
Attendance at seminars           0.5       14
Total                            4 / 4     103 / 112