Course, academic year 2023/2024
  
Data Mining - N143034
Title: Vytěžování znalostí z dat
Guaranteed by: Department of Informatics and Chemistry (143)
Faculty: Faculty of Chemical Technology
Actual: from 2022
Semester: summer
Points: summer s.:5
E-Credits: summer s.:5
Examination process: summer s.:
Hours per week, examination: summer s.:2/2, C+Ex [HT]
Capacity: unknown / unknown (unknown)
Min. number of students: unlimited
Language: Czech
Teaching methods: full-time
Level:  
For type:  
Additional information: http://ich.vscht.cz/~svozil/teaching.html
Guarantor: Svozil Daniel prof. Mgr. Ph.D.
Annotation -
Last update: ROZ143 (15.11.2012)
This course provides a broad introduction to machine learning and data mining. Topics include: (i) supervised learning (decision trees, support vector machines, neural networks); (ii) unsupervised learning (clustering, dimensionality reduction); (iii) best practices in machine learning (bias/variance theory, data preprocessing, model quality assessment, and model selection). Hands-on exercises draw on numerous case studies and applications and use the open-source data mining system RapidMiner.
Aim of the course -
Last update: TAJ143 (03.12.2013)

Students will be able to:

Apply basic data mining tools to common problems (classification, regression, clustering).

Choose an algorithm suitable for the given problem and assess its accuracy.

Understand how and why data mining and machine learning algorithms work.

Literature -
Last update: TAJ143 (02.07.2013)

R:Larose D. T., Discovering Knowledge in Data: An Introduction to Data Mining, Wiley-Interscience, 2004, ISBN 0471666572

A:Ch. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2007, ISBN 0387310738

A:I. H. Witten, Data Mining: Practical Machine Learning Tools, Morgan Kaufmann, 2011, ISBN 0123748569

Learning resources -
Last update: ROZ143 (15.11.2012)

Online course materials at http://ich.vscht.cz/~svozil/teaching.html

Course (video and slides) "Learning from data" from Caltech: http://work.caltech.edu/telecourse.html

Course (video and slides) "Machine learning" from Stanford: http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1 and http://cs229.stanford.edu/materials.html

Course "Machine learning" on Coursera: https://www.coursera.org/course/machlearning

Requirements for the exam -
Last update: Svozil Daniel prof. Mgr. Ph.D. (28.02.2011)

Course credit is awarded for a completed project, which includes a final report and a presentation of the results. At the end of the semester, students take a written exam.

Syllabus -
Last update: ROZ143 (15.11.2012)

1) Introduction to data mining. CRISP-DM. Data warehousing. OLAP.

2) Pattern recognition - basic concepts. Supervised/unsupervised learning. Classification and regression. Generalization. Overfitting. Bias-variance tradeoff.

3) Test set. Cross validation. k-nearest neighbors.

4) Cluster analysis.

5) Information theory. Decision trees.

6) Neural networks I. Threshold neuron. ADALINE. Linear perceptron.

7) Neural networks II. Multilayer Perceptron.

8) Neural networks III. Radial Basis Function (RBF) Networks.

9) Neural networks IV. Self-Organizing Map.

10) Support Vector Machines.

11) Genetic Algorithms.

12) Feature Selection. Feature Extraction.

13) Ensemble learning.

14) Summary.

Registration requirements -
Last update: TAJ143 (02.07.2013)

Fundamentals of Statistics

Teaching methods
Activity | Credits | Hours
Preparation for lectures, seminars, laboratories, excursions, or practical training | 1 | 28
Work on an individual project | 1 | 28
Preparation for the exam and sitting it | 1 | 28
Total | 3 / 5 | 84 / 140
Coursework assessment
Form Significance
Defense of an individual project 30
Examination test 70

 
VŠCHT Praha