Courses

With the advent of electronic documents, their number has been growing much faster than people's capacity, ability, and willingness to read them. Information retrieval methods do indicate which documents are likely to contain the information sought, but this only means that they select documents by the keywords with which indexing characterizes their content. They thus merely create a sieve through which an ever larger number of documents flows. Text mining methods aim not only to select documents by keywords but also to determine what the documents say. This is a very difficult task because it depends on the semantics of natural language, which even trained people often interpret ambiguously. It draws on statistical methods, information retrieval methods, methods of computational linguistics, and classification methods from artificial intelligence. Text mining examines in particular the following ways of working with text: Information extraction - identification of key text components and the relationships between them; Topic tracking - intelligent filtering of texts based on a user profile; Summarization - summarizing the content of a text; Sentence extraction - identification of the sentences that are key to a document's content; Categorization, classification, clustering - dividing texts into classes by content similarity; Concept linkage - finding relationships between texts that share common concepts.

Last update: Svozil Daniel (23.05.2018)

The number of electronic documents is growing much faster than people are able to deal with them. While information retrieval methods help to identify documents likely to contain given information based on keywords, text mining approaches deal with the interpretation of the information hidden in the documents. This difficult task is related to the semantics of natural language, which is difficult to interpret unequivocally even for trained experts. Text mining adopts various statistical and information retrieval methods, computational linguistics approaches, and artificial intelligence classification methods. Text mining addresses the following tasks: Information extraction - the identification of key text components and of the relationships between them; Topic tracking - intelligent text filtering based on a user profile; Summarization - summarizing the content of a text; Sentence extraction - the identification of sentences that are important for understanding a text; Categorization, classification, clustering - grouping texts into classes based on content similarity; Concept linkage - the identification of relationships between texts that share common concepts.

Last update: Svozil Daniel (25.05.2018)

Oral exam

Last update: Svozil Daniel (23.05.2018)

oral exam

Last update: Svozil Daniel (23.05.2018)

Z: R: Weiss, S.M. et al.: Text Mining - Predictive Methods for Analyzing Unstructured Information. Springer, 2005

Last update: Svozil Daniel (23.05.2018)

R: Weiss, S.M. et al.: Text Mining - Predictive Methods for Analyzing Unstructured Information. Springer, 2005

Last update: Svozil Daniel (23.05.2018)

Text Mining, Data Mining, Knowledge Discovery, Text Processing - basic concepts

Information Retrieval - basic concepts, text documents and keywords, relevance and fuzzy logic, indexing, the vector model

Latent semantic indexing and singular value decomposition of matrices

Keyword clustering, document clustering

Text classification, probabilistic classification - Naive Bayes, classification using k-NN, decision trees, neural networks, support vector machines

Linguistic methods in text mining, lexicon, part-of-speech tagging, named entity recognition, parsing, coreference

Applications, automatic extraction of document content, automatic summarization of document content, automatic question answering

Last update: Svozil Daniel (25.05.2018)

Text Mining, Data Mining, Knowledge Discovery, Text Processing - basic concepts

Information Retrieval - basic concepts, text documents and keywords, relevance and fuzzy logic, indexing, vector model

Latent semantic indexing and singular value decomposition

Clustering of keywords and documents

Text classification, probabilistic classification - Naive Bayes, k nearest neighbors, decision trees, neural networks, support vector machines

Linguistics in text mining, lexicon, part-of-speech tagging, named entity recognition, parsing, co-references

Text mining applications, automatic content extraction, automatic question answering
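
As an illustration of the vector model topic listed above, the following is a minimal sketch in plain Python, not course material. It assumes a simple TF-IDF weighting (term frequency times the logarithm of inverse document frequency) and cosine similarity; the function names `tf_idf_vectors` and `cosine` and the three sample documents are made up for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dict: term -> weight)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents, tokenized by whitespace.
docs = [
    "text mining extracts information from text".split(),
    "information retrieval selects documents by keywords".split(),
    "decision trees classify documents".split(),
]
vectors = tf_idf_vectors(docs)

# Documents that share a term get a positive similarity;
# documents with no term in common get exactly zero.
sim_01 = cosine(vectors[0], vectors[1])
sim_02 = cosine(vectors[0], vectors[2])
```

In this sketch the first two documents share only the term "information", so their similarity is positive but small, while the first and third documents share no term at all. Latent semantic indexing, the next topic in the list, is aimed at exactly this limitation of pure keyword matching.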

Last update: Svozil Daniel (25.05.2018)

Lecturer's materials

Last update: Svozil Daniel (23.05.2018)

Lecturer's materials

Last update: Svozil Daniel (23.05.2018)

Students will be able to:

identify key text components and the relationships between them

automatically summarize the content of a text

identify the sentences that are key to the content

categorize texts into classes by content similarity

search for relationships between texts with common concepts

Last update: Svozil Daniel (25.05.2018)

Students will be able to:

identify key text components and the relationships between them

automatically summarize text content

identify key factual sentences

categorize texts into classes based on the similarity of their contents

search for relationships between texts that share the same concepts

Last update: Svozil Daniel (25.05.2018)

Lecture Information retrieval

Last update: Svozil Daniel (25.05.2018)

DSP lecture Information retrieval

Last update: Svozil Daniel (25.05.2018)