Course, academic year 2019/2020
  
Information retrieval - AP500003
Title: Information retrieval
Guaranteed by: CTU in Prague, Faculty of Information Technology (500)
Actual: from 2019
Semester: both
Points: 0
E-Credits: 0
Examination process:
Hours per week, examination: 3/0 other [hours/week]
Capacity: winter:unlimited / unknown (unknown)
summer:unknown / unknown (unknown)
Min. number of students: unlimited
Language: English
Teaching methods: full-time
Level:  
For type: doctoral
Note: you can enroll in the course in both the winter and the summer semester
Guarantor: Kroha Petr prof. Dr. Ing. CSc.
Interchangeability: P500003
Annotation -
Last update: Pátková Vlasta (08.06.2018)
The number of electronic documents grows much faster than people are able to process them. Information retrieval methods help identify documents that are likely to contain given information. Documents are selected on the basis of keywords, which are assigned to characterize document content and are used to specify the aim of a user's search. To this end, information retrieval draws on linear algebra methods that work with the vector model, on statistical and probabilistic methods, on computational linguistics, and on the classification and clustering methods of artificial intelligence.
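
Keyword-based selection of documents is commonly built on an inverted index, which maps each keyword to the documents containing it. The following is a minimal sketch, assuming whitespace tokenization and a Boolean AND query; the function names and toy documents are hypothetical illustrations, not course material.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and_query(index, keywords):
    """Return the ids of documents that contain every keyword (Boolean AND)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return set()
    # Intersecting the posting sets keeps only documents matching all keywords.
    return set.intersection(*postings)

# Hypothetical toy collection.
docs = {
    1: "vector model for text retrieval",
    2: "probabilistic model of relevance",
    3: "text clustering with the vector model",
}
index = build_inverted_index(docs)
print(sorted(boolean_and_query(index, ["vector", "text"])))  # [1, 3]
```

A Boolean match only decides whether a document qualifies; ranking the matches requires a weighting scheme such as the vector model.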
Aim of the course -

Students will know:

  • how to identify documents containing given information
  • how to assign keywords to text documents
  • how to index text documents
  • how to normalize text documents
  • how to categorize text documents
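
One common way to assign keywords to a text document automatically is to pick its terms with the highest TF-IDF weight: terms frequent in the document but rare in the collection. The sketch below assumes a tiny hypothetical stopword list and relative term frequency times `log(N / df)`; it is one of several weighting variants, not necessarily the one taught in the course.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from the course).
STOPWORDS = {"a", "and", "of", "the", "to"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def top_keywords(docs, k=2):
    """Assign each document its k highest-scoring TF-IDF terms."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    keywords = []
    for tokens in tokenized:
        if not tokens:
            keywords.append([])
            continue
        tf = Counter(tokens)
        # Relative term frequency times inverse document frequency.
        scores = {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for a stable result.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:k])
    return keywords

docs = [
    "query terms and query index",
    "cluster centroid and cluster text",
    "class label and class text",
]
print(top_keywords(docs))  # [['query', 'index'], ['cluster', 'centroid'], ['class', 'label']]
```

The term "text" occurs in two of the three documents, so its low inverse document frequency keeps it out of the keyword lists.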
Literature -

R: Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Second edition, Addison-Wesley, 2011.

R: Weiss, S.M. et al.: Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer, 2005.

Learning resources -

Lecturer materials

Syllabus -

Introduction to information retrieval, uncertainty, relevance, text document normalization, Zipf's law
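
Text normalization and Zipf's law can be illustrated together: normalize the text, then list terms by rank and frequency. Zipf's law states that a term's frequency is roughly proportional to 1 / rank, so rank * frequency stays roughly constant; this only shows up on large corpora, not on the toy sentence used here. A minimal sketch, assuming ASCII letters and a small hypothetical stopword list; the function names are illustrative.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use longer lists).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def normalize(text):
    """Lowercase the text, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def rank_frequency(tokens):
    """Return (rank, term, frequency) triples, most frequent term first."""
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, term, freq) for rank, (term, freq) in enumerate(ordered, start=1)]

text = "The Index, the index and THE query: index terms in a query."
tokens = normalize(text)
print(tokens)  # ['index', 'index', 'query', 'index', 'terms', 'query']
for rank, term, freq in rank_frequency(tokens):
    # Zipf's law predicts rank * freq to be roughly constant on a large corpus.
    print(rank, term, freq, rank * freq)
```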

Text document indexing, querying and searching: metrics; vector model: dimensionality reduction, latent semantic indexing
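
In the vector model, documents and queries become term-weight vectors and are ranked by cosine similarity; precision and recall are standard metrics for judging the result. A minimal sketch, assuming raw term count times `log(N / df)` as the weight (one common variant) and a hypothetical relevance judgment; it does not cover dimensionality reduction or latent semantic indexing, which would additionally need a singular value decomposition.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Represent each document as a sparse {term: tf * idf} dictionary."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(n / count) for t, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_documents(query, docs):
    """Return indices of documents with non-zero similarity, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.lower().split()).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]

def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|, recall = hits / |relevant|."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

docs = [
    "latent semantic indexing reduces dimensionality",
    "the vector model ranks documents",
    "cosine similarity in the vector model",
    "decision trees classify documents",
]
retrieved = rank_documents("vector model similarity", docs)
print(retrieved)  # [2, 1]
print(precision_recall(retrieved, relevant=[2]))  # (0.5, 1.0)
```

Here document 2 is assumed to be the only relevant one, so retrieving two documents gives full recall at half precision.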

Document and keyword clustering: distance and similarity metrics, centroid, clustering algorithms
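
A centroid-based clustering algorithm such as k-means alternates two steps: assign each document vector to its nearest centroid, then move each centroid to the mean of its members. The sketch below assumes Euclidean distance and caller-supplied initial centroids (real implementations often use random or k-means++ initialization); the toy two-term vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def k_means(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = centroid(members)
    return assignment, centroids

# Toy 2-D document vectors, e.g. weights of two index terms.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centers = k_means(points, initial_centroids=[points[0], points[2]])
print(labels)  # [0, 0, 1, 1]
```

For text, cosine similarity is often preferred to Euclidean distance because it ignores document length; swapping the distance function is enough to try it.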

Document classification: Bayesian classification, k-nearest neighbors, decision trees, support vector machines
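
Bayesian document classification is often introduced as multinomial naive Bayes: pick the class c maximizing log P(c) + the sum of log P(t | c) over the document's terms. A minimal sketch, assuming add-one (Laplace) smoothing, pre-tokenized input and a hypothetical two-class training set; terms never seen in training are simply ignored.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (token list, class label) pairs."""
    class_counts = Counter(label for _, label in labeled_docs)
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labeled_docs:
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, term_counts, vocabulary

def classify(model, tokens):
    """Return the class with the highest smoothed log-posterior score."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior plus log likelihood of each known term, add-one smoothed.
        score = math.log(class_counts[label] / total_docs)
        denom = sum(term_counts[label].values()) + len(vocabulary)
        for t in tokens:
            if t in vocabulary:
                score += math.log((term_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("cheap offer buy now".split(), "spam"),
    ("buy cheap pills".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train_naive_bayes(training)
print(classify(model, "cheap pills offer".split()))  # spam
print(classify(model, "meeting notes".split()))  # ham
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied.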

The aims and capabilities of text mining, linguistic methods in text mining: tokenization, part-of-speech tagging, named entity recognition, parsing, coreference resolution
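
Tokenization is the first linguistic step in a text-mining pipeline. The sketch below is a simple rule-based tokenizer, assuming that sentences end with `.`, `!` or `?` followed by whitespace and that words may contain inner hyphens or apostrophes; the regular expressions are illustrative assumptions. It mis-splits abbreviations such as "Dr.", and the later steps (part-of-speech tagging, named entity recognition, parsing) normally rely on trained models from dedicated NLP libraries.

```python
import re

# A sentence boundary: whitespace preceded by '.', '!' or '?'.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A word with optional inner hyphens/apostrophes, or a single punctuation mark.
TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def split_sentences(text):
    """Split text into sentences at terminal punctuation."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]

def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN.findall(sentence)

text = "Prague is a city. It's in the Czech Republic!"
for sentence in split_sentences(text):
    print(tokenize(sentence))
# ['Prague', 'is', 'a', 'city', '.']
# ["It's", 'in', 'the', 'Czech', 'Republic', '!']
```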

Text mining in information retrieval: document content extraction, automatic document summarization, automatic question answering
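
Automatic document summarization can be extractive: score each sentence and return the best ones unchanged. The sketch below uses a simple frequency heuristic, assuming a sentence's score is the average collection frequency of its content words (a hypothetical stopword list removes function words); it illustrates the idea and is not a method prescribed by the course.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

def summarize(text, n_sentences=1):
    """Pick the n sentences whose content words are most frequent overall."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    def words(sentence):
        return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence):
        ws = words(sentence)
        # Averaging (not summing) keeps long sentences from winning by length alone.
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    best = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))[:n_sentences]
    # Restore the original sentence order for readability.
    return " ".join(sentences[i] for i in sorted(best))

text = ("Retrieval systems rank documents. Ranking uses term weights. "
        "Retrieval systems rank documents by term weights and retrieval scores.")
print(summarize(text))  # Retrieval systems rank documents.
```

Averaging has the opposite bias and can favor very short sentences, so practical systems usually combine it with sentence position or length constraints.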

Course completion requirements -

oral exam

 
VŠCHT Praha