Course, academic year 2023/2024
  
Data Preprocessing - N500014
Title: Předzpracování dat
Guaranteed by: CTU in Prague, Faculty of Information Technology (500)
Faculty: University of Chemistry and Technology, Prague
Actual: from 2021
Semester: winter
Points: winter s.:4
E-Credits: winter s.:4
Examination process: winter s.:
Hours per week, examination: winter s.:2/1, C+Ex [HT]
Capacity: unknown / unknown (unknown)
Min. number of students: unlimited
Language: Czech
Teaching methods: full-time
Level:  
Is provided by: M500004
For type:  
Guarantor: Jiřina Marcel doc. RNDr. Ing. Ph.D.
Is interchangeable with: M500004
Annotation -
Last update: Jirát Jiří Ing. Ph.D. (10.01.2014)
Students learn to prepare raw data for further processing and analysis. They learn which algorithms can be used to extract parameters from various data sources, such as images, texts, and time series, and gain the skills to apply these theoretical concepts to a specific problem in individual projects, e.g., parameter extraction from image data or from the Internet.
Aim of the course -
Last update: Jirát Jiří Ing. Ph.D. (31.01.2014)

Students will be able to:

Apply knowledge of algorithms for extracting parameters from various data sources, a fundamental part of knowledge engineering.

Literature -
Last update: Jirát Jiří Ing. Ph.D. (10.01.2014)

R: Pyle, D. Data Preparation for Data Mining. Morgan Kaufmann, 1999. ISBN 1558605290.

R: Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. A. Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing). Springer, 2006. ISBN 3540354875.

Learning resources -
Last update: Jirát Jiří Ing. Ph.D. (10.01.2014)

https://edux.fit.cvut.cz/courses/MI-PDD/

(login necessary)

Syllabus -
Last update: Jirát Jiří Ing. Ph.D. (10.01.2014)

1. Data exploration, exploratory analysis techniques, visualization of raw data.

2. Descriptive statistics.

3. Methods to determine the relevance of features.

4. Problems with data: dimensionality, noise, outliers, inconsistency, missing values, non-numeric data.

5. Data cleaning, transformation, imputation, discretization, binning.

6. Dimensionality reduction.

7. Reduction of data volume, class balancing.

8. Feature extraction from text.

9. Feature extraction from documents, web. Preprocessing of structured data.

10. Feature extraction from time series.

11. Feature extraction from images.

12. Data preparation case studies.

13. Automation of data preprocessing.
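
As a small illustration of topics 4 and 5 (missing values, outliers, imputation, binning), the following is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function names, the sample data, and the choice of median imputation, Tukey's IQR rule, and equal-width binning are all assumptions made for the example.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outlier_mask(values, k=1.5):
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value the index of one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value falls exactly on the upper edge, so clamp it
    # into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical raw measurements: two missing values and one obvious outlier.
raw = [4.0, 5.0, None, 6.0, 5.5, 120.0, 4.5, None]

imputed = impute_median(raw)
mask = iqr_outlier_mask(imputed)
cleaned = [v for v, is_outlier in zip(imputed, mask) if not is_outlier]
bins = equal_width_bins(cleaned, n_bins=4)

print(imputed)
print(mask)
print(bins)
```

The order of the steps is itself a design choice: here the median is computed before outliers are removed, which is tolerable because the median is robust to a single extreme value, whereas mean imputation at the same point would be pulled toward 120.0.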

Registration requirements -
Last update: Jirát Jiří Ing. Ph.D. (10.01.2014)

Fundamentals of statistics, FCD course in data mining.

Teaching methods
Activity Credits Hours
Participation in lectures 1 28
Work on an individual project 2.2 61
Participation in seminars 0.5 14
4 / 4 103 / 112
 
VŠCHT Praha