The number of electronic documents grows much faster than humans are able to deal with. While information retrieval methods help identify documents that are likely to contain given information based on keywords, text mining approaches aim to interpret the information hidden in the documents. This is a difficult task because the semantics of natural language is hard to interpret unambiguously, even for trained experts. Text mining draws on various statistical and information retrieval methods, on computational linguistics, and on classification methods from artificial intelligence. Text mining addresses the following tasks: information extraction (identifying key text components and the relationships between them); topic tracking (intelligent filtering of texts based on a user profile); summarization (producing a condensed summary of the text content); sentence extraction (identifying the sentences that are important for understanding the text); categorization, classification and clustering (grouping or assigning texts to categories based on their content); and concept linkage (identifying relationships between texts that share common concepts).
Last update: Svozil Daniel (25.05.2018)
Students will know:
R: Weiss, S.M. et al.: Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer, 2005
Last update: Svozil Daniel (23.05.2018)
Lecturer materials
Text mining, data mining, knowledge discovery, text processing: basic concepts
Information retrieval: basic concepts, text documents and keywords, relevance and fuzzy logic, indexing, the vector model
Latent semantic indexing and singular value decomposition
Clustering of keywords and documents
Text classification, probabilistic classification: Naive Bayes, k-nearest neighbors, decision trees, neural networks, support vector machines
Linguistics in text mining: lexicon, part-of-speech tagging, named entity recognition, parsing, coreference
Text mining applications: automatic content extraction, automatic question answering
DSP lecture Information retrieval
Oral exam