Subjects

The number of electronic documents grows much faster than people can keep up with. Information retrieval methods help identify documents that are likely to contain the information sought. Documents are selected by keywords, which are assigned to characterize document content and which users employ to specify the aims of their search. To this end, information retrieval draws on linear algebra methods that work with the vector model, statistical and probabilistic methods, methods of computational linguistics, and the classification and clustering methods of artificial intelligence.
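As an illustration of keyword-based selection with the vector model, here is a minimal sketch in Python. It assumes whitespace tokenization, TF-IDF weighting and cosine similarity; the function names and the three sample documents are hypothetical and are not taken from the course materials.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude normalization (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    return sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)


docs = [
    "vector model for document retrieval",
    "clustering of text documents",
    "decision trees and bayesian classification",
]
ranking = rank("vector retrieval", docs)
```

In this toy collection only the first document shares keywords with the query, so it is ranked first and the other two score zero. A real system would add the normalization, indexing and dimensionality reduction steps that the course covers.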

Last update: Svozil Daniel (25.05.2018)

With the advent of electronic documents, their number has been growing far faster than people's capacity, ability and willingness to follow and read them. Information retrieval methods help determine which documents are likely to contain the information sought. They do so by allowing documents to be selected by keywords, which document indexing uses to characterize content and which users use to state the aims of their search. The tools employed include linear algebra methods for working with the vector search model, statistical and probabilistic methods, methods of computational linguistics, and the clustering and classification methods of artificial intelligence.

Last update: Svozil Daniel (23.05.2018)

Oral exam


Oral exam


R: Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Second edition, Addison-Wesley, 2011.

R: Weiss, S.M. et al.: Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer, 2005.


Z: Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Second edition, Addison-Wesley, 2011.

Z: Weiss, S.M. et al.: Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer, 2005.


Introduction to information retrieval, uncertainty, relevance, text document normalization, Zipf's law

Text document indexing, querying and searching - metrics, vector model - dimensionality reduction, latent semantic indexing

Document and keyword clustering, distance, similarity metrics, centroid, clustering algorithms

Document classification, Bayesian classification, k nearest neighbors, decision trees, support vector machines

The aims and capabilities of text mining, linguistic methods in text mining, tokenization, part-of-speech tagging, named entity recognition, parsing, coreference

Text mining in information retrieval: document content extraction, automatic document summarization, automatic question answering
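Of the topics listed above, Zipf's law is easy to illustrate: term frequency is roughly inversely proportional to frequency rank, so the product rank × frequency stays roughly constant. The sketch below builds a rank-frequency table. The regular-expression tokenizer, the function name and the sample sentence are assumptions made for illustration only.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, term, frequency, rank * frequency) rows, most frequent first."""
    # Normalization (an assumption): lowercase, keep runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens).most_common()
    return [(rank, term, freq, rank * freq)
            for rank, (term, freq) in enumerate(counts, start=1)]


sample = "the cat and the dog and the bird saw the fish"
rows = rank_frequency(sample)
for row in rows[:2]:
    print(row)
```

A sentence this short only shows the mechanics; the pattern becomes visible on a corpus of realistic size.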


Introduction to information retrieval, uncertainty, relevance, the fuzzy approach, text document normalization, Zipf's law

Indexing, querying and searching in text documents - metrics, vector model - dimensionality reduction, latent semantic indexing

Clustering of documents and of keywords, distance, similarity metrics, centroid, clustering methods

Document classification. Bayesian classification, the k-NN method, decision trees, support vector machines

The aims and capabilities of text mining, linguistic methods in text mining, lexicon, tokenization, part-of-speech tagging, named entity recognition, parsing, coreference

Applying text mining methods to information retrieval: automatic document content extraction, automatic document summarization, automatic question answering


Lecturer materials


Lecturer materials


Students will know:

how to identify documents containing given information

how to assign keywords to text documents

how to index text documents

how to normalize text documents

how to categorize text documents


Students will be able to:

identify documents containing predefined information

assign relevant keywords to a document

index text documents

normalize text documents

understand the main principles of text document classification
