What are the three main types of data?
What is Text Analytics?
Text mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving high-quality information from text. Used for:
What are the challenges of Text Analytics?
Which two parts make up the general process of text analytics?
What are the five main steps of the Text Analytics Process?
What is done in the Pre-Processing phase?
What are Stemming & Lemmatization?
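Stemming is a technique that reduces a word to its root / stem, typically by stripping suffixes. The resulting stem does not have to be a valid word.
Lemmatization is a technique that reduces a word to its lemma, i.e. its dictionary base form.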
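The difference between the two can be sketched in a few lines of Python. This is a deliberately naive illustration, not how real stemmers or lemmatizers are implemented: the suffix list `SUFFIXES`, the mini-dictionary `LEMMAS`, and the function names `naive_stem` and `naive_lemmatize` are all hypothetical and chosen only for this example.

```python
# Toy illustration only: real stemmers (e.g. the Porter stemmer) and real
# lemmatizers use far richer rules and dictionaries than the ones below.

# Assumed suffix list, checked in this order.
SUFFIXES = ("ing", "ies", "ed", "es", "s")

# Hypothetical mini-dictionary mapping inflected forms to their lemma.
LEMMAS = {
    "studies": "study",
    "studying": "study",
    "better": "good",
    "mice": "mouse",
}


def naive_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def naive_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ["studies", "studying", "better", "mice"]:
    print(w, "| stem:", naive_stem(w), "| lemma:", naive_lemmatize(w))
```

Note how the stem of "studies" is "stud" (not a word), while the lemma is "study". Irregular forms such as "better" or "mice" are untouched by suffix stripping and can only be resolved through a dictionary lookup.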
What does the abbreviation TF-IDF stand for and where is it used?
TF-IDF = Term Frequency - Inverse Document Frequency
Used in Text Data Analysis.
Explain TF.
TF = Term Frequency
Measures how often a word occurs in a given document of the corpus. It increases as the number of occurrences of that word within the document increases. Each term has its own TF in each document.
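A minimal sketch of TF in Python. It assumes one common variant, where the raw count of a term is divided by the total number of tokens in the document; other variants (raw counts, log-scaled counts) exist. The function name `term_frequency`, the whitespace tokenization, and the sample corpus are assumptions made for illustration.

```python
from collections import Counter


def term_frequency(document):
    """Return {term: count / number of tokens} for a single document."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokens = document.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical two-document corpus; TF is computed per document.
corpus = ["the dog chased the cat", "the puppy slept"]
for doc in corpus:
    print(term_frequency(doc))
```

In the first document "the" occurs 2 times out of 5 tokens, so its TF there is 0.4; in the second document the same term has a TF of 1/3. This is what "each term has its own TF in each document" means.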
Explain IDF.
IDF = Inverse Document Frequency
Used to weight words by how rare they are across all documents in the corpus. Words that occur in few documents have a high IDF score, while words that occur in most documents have a low one.
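A minimal sketch of IDF in Python. It assumes the plain, unsmoothed variant idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing the term t; libraries often use smoothed variants instead. The function name `inverse_document_frequency` and the sample corpus are assumptions made for illustration.

```python
import math


def inverse_document_frequency(corpus):
    """Return {term: log(N / number of documents containing the term)}."""
    n_docs = len(corpus)
    # One set of terms per document, so repeats inside a document count once.
    tokenized = [set(doc.lower().split()) for doc in corpus]
    vocabulary = set().union(*tokenized)
    return {
        term: math.log(n_docs / sum(term in doc for doc in tokenized))
        for term in vocabulary
    }


# Hypothetical three-document corpus.
corpus = ["the dog chased the cat", "the puppy slept", "the cat slept"]
idf = inverse_document_frequency(corpus)
for term in sorted(idf, key=idf.get, reverse=True):
    print(term, round(idf[term], 3))
```

Here "the" appears in every document and gets an IDF of 0, "cat" appears in two of three documents and gets a small score, and "dog" appears in only one document and gets the highest score. Multiplying a term's TF in a document by its IDF gives the TF-IDF weight.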
What can be done with vectors?
What are the three basic ways of text measurement in Text Mining?
Limitations: These techniques do not handle synonyms (e.g. dog / puppy); words with the same meaning but different spelling are treated as unrelated terms.
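The synonym limitation can be illustrated with a small sketch. It assumes the techniques in question represent text as count-based vectors (one dimension per distinct word), which is an assumption of this example; the helper names `bag_of_words` and `shared_term_score` are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Count vector of a text: one entry per distinct lowercase token."""
    return Counter(text.lower().split())


def shared_term_score(a, b):
    """Dot product of two count vectors, summed over the terms they share."""
    return sum(a[term] * b[term] for term in a.keys() & b.keys())


dog = bag_of_words("dog")
puppy = bag_of_words("puppy")

print(shared_term_score(dog, puppy))                    # 0
print(shared_term_score(dog, bag_of_words("dog dog")))  # 2
```

Because "dog" and "puppy" are different strings, they occupy different dimensions and the score is 0, even though the words are closely related in meaning. Only exact word matches contribute.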
What are common techniques for Text Data Analysis?