Course, academic year 2021/2022
  
Text mining - AP500001
Title: Text mining
Guaranteed by: Department of Informatics and Chemistry (143)
Faculty: Faculty of Chemical Technology
Actual: from 2019
Semester: both
Points: 0
E-Credits: 0
Examination process:
Hours per week, examination: 3/0, other [HT]
Capacity: winter:unlimited / unknown (unknown)
summer:unknown / unknown (unknown)
Min. number of students: unlimited
Language: English
Teaching methods: full-time
Level:  
For type: doctoral
Note: can be fulfilled in the future
you can enroll in the course in the winter or in the summer semester
Guarantor: Kroha Petr prof. Dr. Ing. CSc.
Interchangeability : P500001
Annotation -
Last update: Pátková Vlasta (08.06.2018)
The number of electronic documents grows much faster than humans can process them. Although information retrieval methods help to identify documents likely to contain given information on the basis of keywords, text mining approaches deal with interpreting the information hidden in the documents. This difficult task is tied to the semantics of natural language, which is hard to interpret unambiguously even for trained experts. Text mining adopts various statistical and information retrieval methods, approaches from computational linguistics, and classification methods from artificial intelligence. Text mining addresses the following tasks:

  • Information extraction - the identification of key text components and of the relationships between them
  • Topic tracking - intelligent text filtering based on a user profile
  • Summarization - the summarization of text content
  • Sentence extraction - the identification of sentences that are important for understanding the text
  • Categorization, classification, clustering - text categorization based on content similarity
  • Concept linkage - the identification of relationships between texts that share common concepts
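
As an illustration of what "content similarity" can mean in practice, the following is a minimal sketch that compares texts by the cosine of their term-frequency vectors. It is not taken from the course materials; the function names (term_frequencies, cosine_similarity, most_similar) and the simple lowercase tokenizer are assumptions chosen for this example.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split it into word tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, documents):
    """Return the index of the document closest to the query."""
    q = term_frequencies(query)
    scores = [cosine_similarity(q, term_frequencies(d)) for d in documents]
    return max(range(len(documents)), key=scores.__getitem__)
```

A real system would typically add stop-word removal, stemming, and a weighting scheme such as TF-IDF; this sketch leaves them out to keep the similarity idea visible.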
Aim of the course -
Last update: Pátková Vlasta (08.06.2018)

Students will be able to:

  • identify key text components and the relationships between them
  • automatically summarize text content
  • identify key factual sentences
  • categorize texts into classes based on the similarity of their contents
  • search for relationships between texts that share the same concepts
Literature -
Last update: Pátková Vlasta (08.06.2018)

R: Weiss, S.M. et al.: Text Mining - Predictive Methods for Analyzing Unstructured Information. Springer, 2005

Learning resources -
Last update: Pátková Vlasta (08.06.2018)

Lecturer materials

Syllabus -
Last update: Pátková Vlasta (08.06.2018)

Text Mining, Data Mining, Knowledge Discovery, Text Processing - basic concepts

Information Retrieval - basic concepts, text documents and keywords, relevance and fuzzy logic, indexing, vector model

Latent semantic indexing and singular value decomposition

Clustering of keywords and documents

Text classification, probabilistic classification - Naive Bayes, k-nearest neighbors, decision trees, neural networks, support vector machines

Linguistics in text mining, lexicon, part-of-speech tagging, named entity recognition, parsing, co-references

Text mining applications, automatic content extraction, automatic question answering
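
To make one of the classification topics listed above concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is an illustration only, not the lecturer's material; the names tokenize, train, and classify, the lowercase tokenizer, and the choice to ignore words unseen in training are assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Collect class and word counts from a list of (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # assumption: words unseen in training are skipped
            smoothed = word_counts[label][token] + 1  # add-one smoothing
            score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing keeps a word that never occurred in a class from driving that class's probability to zero.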

Registration requirements -
Last update: Pátková Vlasta (08.06.2018)

DSP lecture Information retrieval

Course completion requirements -
Last update: Pátková Vlasta (08.06.2018)

oral exam

 
VŠCHT Praha