What methods does text mining use?
Information retrieval and pre-processing of text documents.
What tasks does text mining perform?
Text classification, text clustering, and text summarization.
What is an issue with text mining compared to traditional data mining?
Traditional data mining works on structured data, whereas text usually has no real structure.
What is a Vector Space Model?
A document is represented as a vector of term weights (e.g., word counts), treating it as a “bag” of words: word order is ignored.
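A minimal sketch of the bag-of-words representation, assuming naive tokenization (lowercase, split on whitespace). The helper names bag_of_words and to_vector are hypothetical. Each document becomes a vector of counts over a shared vocabulary, and word order is discarded.

```python
from collections import Counter


def bag_of_words(doc: str) -> Counter:
    """Represent a document as word -> count, ignoring word order."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    return Counter(doc.lower().split())


def to_vector(bag: Counter, vocabulary: list[str]) -> list[int]:
    """Lay the counts out along a fixed vocabulary so documents become comparable vectors."""
    # A Counter returns 0 for words the document does not contain.
    return [bag[word] for word in vocabulary]


docs = ["the cat sat on the mat", "the dog sat"]
bags = [bag_of_words(d) for d in docs]
# Shared vocabulary: every distinct word in any document, sorted for a stable order.
vocabulary = sorted(set().union(*bags))
vectors = [to_vector(b, vocabulary) for b in bags]
```

Here vocabulary is ['cat', 'dog', 'mat', 'on', 'sat', 'the'] and the first document becomes [1, 0, 1, 1, 1, 2]. Even two short sentences need six dimensions; a real corpus needs one per distinct word, which is why these vectors get very large and mostly zero.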
What is a problem with Vector Space Model?
There are many words in the English language, so document vectors have very high dimensionality and are mostly zeros (sparse).
How do you reduce this limitation of the Vector Space Model?
Removing stop words (“a, the, this, that …”)
Stemming (e.g., combining similar verb forms, such as past and present tense, into one term)
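A rough sketch of both fixes, assuming a tiny hand-picked stop-word list and a crude suffix-stripping rule. STOP_WORDS, SUFFIXES, naive_stem, and preprocess are hypothetical names; naive_stem is a simplified stand-in, not the Porter stemmer or any library's algorithm.

```python
# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "this", "that"}

# Crude suffix rules (hypothetical), checked in this order so "es" is tried before "s".
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix so related forms (walked, walks) map to one term."""
    for suffix in SUFFIXES:
        # Require at least 3 remaining characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(doc: str) -> list[str]:
    """Lowercase, split on whitespace, drop stop words, then stem what remains."""
    tokens = doc.lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


terms = preprocess("The cat walked and this cat walks")
```

The sentence shrinks from seven tokens to ['cat', 'walk', 'cat', 'walk']: the stop words disappear and the past and present tense of "walk" collapse into one term, so the vector needs fewer dimensions.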
How do you assign the weight (importance) of a term in text mining?
Use TF-IDF
Weight = TF * IDF
TF = term frequency (how many times the term appears in the document)
IDF = inverse document frequency = log(total number of documents / number of documents containing the term)
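A minimal sketch of Weight = TF * IDF, assuming documents are already tokenized lists of terms and using the natural logarithm (the base only rescales every weight by the same constant). The function name tf_idf is hypothetical.

```python
import math


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document, with weight = TF * IDF."""
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term at least once.
    df: dict[str, int] = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1

    weights = []
    for doc in docs:
        w = {}
        for term in set(doc):
            tf = doc.count(term)                  # raw count in this document
            idf = math.log(n_docs / df[term])     # log(total docs / docs containing term)
            w[term] = tf * idf
        weights.append(w)
    return weights


docs = [["cat", "cat", "the"], ["dog", "the"], ["cat", "dog", "the"]]
weights = tf_idf(docs)
```

"the" appears in all three documents, so its IDF is log(3/3) = 0 and its weight is 0 everywhere. "cat" appears twice in the first document and in two of three documents overall, so its weight there is 2 * log(1.5).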
What are the steps involved in text mining?
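How do you measure the similarity between two documents?
Use cosine similarity: the cosine of the angle between the two documents' term vectors (cosine distance is 1 minus this value).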
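A minimal sketch of cosine similarity between two term vectors (counts or TF-IDF weights over the same vocabulary), with cosine distance taken as 1 minus the similarity. Returning 0.0 when either vector is all zeros is an assumed convention, not part of the definition.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


sim = cosine_similarity([1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1])
```

For non-negative term vectors the similarity lies between 0 (no shared terms) and 1 (same direction). Because only the angle matters, a long document and a short one with the same word proportions still score 1, which is why cosine is preferred over raw Euclidean distance for documents of different lengths.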