SubjectsSubjects(version: 948)
Course, academic year 2023/2024
  
Efficient Text Pattern Matching - N500001
Title: Efektivní vyhledávání v textech
Guaranteed by: CTU in Prague, Faculty of Information Technology (500)
Faculty: University of Chemistry and Technology, Prague
Actual: from 2021
Semester: winter
Points: winter s.:4
E-Credits: winter s.:4
Examination process: winter s.:
Hours per week, examination: winter s.:2/1, C+Ex [HT]
Capacity: unknown / unknown (unknown)
Min. number of students: unlimited
Language: Czech
Teaching methods: full-time
Teaching methods: full-time
Level:  
Is provided by: M500001
For type:  
Guarantor: Holub Jan prof. Ing. Ph.D.
Is interchangeable with: M500001
Examination dates   Schedule   
Annotation -
Last update: Jirát Jiří Ing. Ph.D. (09.01.2014)
Students get knowledge of efficient algorithms for text pattern matching. They learn to use so called succinct data structures that are efficient in both access time and memory complexity. They will be able to use the knowledge in design of applications that utilize pattern matching.
Aim of the course -
Last update: Jirát Jiří Ing. Ph.D. (31.01.2014)

Students will be able to:

Understand pattern matching as a key task in most applications.

Understand efficient algorithms and data structures for pattern matching (not only in information processing, but also in other fields like security - intrusion detection, virus scans).

Have an extensive overview of such algorithms and data structures.

Literature -
Last update: Jirát Jiří Ing. Ph.D. (09.01.2014)

R:W.F. Smyth: ''Computing Patterns in Strings'', Pearson Addison Wesley (UK), 2003, 423 pp. ISBN-10: 0201398397

R:M.Crochemore, W. Rytter: ''Jewels of Stringology''. World Scientific Publishing Company, 2003. ISBN-10: 9810248970.

R:G.Navarro, M. Raffinot: ''Flexible Pattern Matching in Strings''. Cambridge University Press, 2008. ISBN-10: 0521039932.

R:M.Crochemore, C. Hancart, T. Lecroq: ''Algorithms on Strings''. Cambridge University Press, 2007. ISBN-10: 0521848997

Learning resources -
Last update: Jirát Jiří Ing. Ph.D. (09.01.2014)

https://edux.fit.cvut.cz/courses/MI-EVY/

(login necessary)

Syllabus -
Last update: Jirát Jiří Ing. Ph.D. (09.01.2014)

1. Introduction, basic definitions, border array.

2. Text full index: Suffix array.

3. Text full index: Suffix tree, LCP construction.

4. Text full index: Factor, suffix automata, on-line construction.

5. Exact pattern matching algorithms.

6. FFT in pattern matching.

7. Succinct data structure: rank & select.

8. Succinct data structure: wavelet tree.

9. FM-Index.

10. Dictionary representation, spell checking.

11. Approximate pattern matching.

12. Pattern matching in bioinformatics and musicology.

13. Pattern matching in bioinformatics and musicology.

Registration requirements -
Last update: Jirát Jiří Ing. Ph.D. (31.01.2014)

Knowledge of finite automata and complexity.

Teaching methods
Activity Credits Hours
Účast na přednáškách 1 28
Práce na individuálním projektu 2.2 61
Účast na seminářích 0.5 14
4 / 4 103 / 112
 
VŠCHT Praha