Flashcards in Deck 9 - Text Analytics Part 2 Deck (20):
What are two approaches in Topic Extraction?
-Latent Dirichlet Allocation (LDA)/Topic Model
-Latent Semantic Analysis
What does LDA do?
-Document has a distribution of topics
-Topic has a distribution of words
-Words in one topic can participate in another topic
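A minimal sketch of the generative story behind these three points, assuming a hypothetical two-topic model with made-up probabilities (none of these numbers come from the deck). It only shows how a document could be generated from topic and word distributions. Fitting LDA means inferring those distributions from real text, which this sketch does not attempt.

```python
import random

# Hypothetical toy numbers, not estimated from any corpus.
# Each topic is a distribution over words; "data" and "team" appear in both topics.
topic_word = {
    "sports":  {"game": 0.5, "team": 0.4, "data": 0.1},
    "science": {"data": 0.5, "model": 0.4, "team": 0.1},
}
# Each document is a distribution over topics.
doc_topic = {"sports": 0.7, "science": 0.3}


def generate_document(n_words, rng):
    """Draw a topic for each word slot, then draw a word from that topic."""
    words = []
    for _ in range(n_words):
        topic = rng.choices(list(doc_topic), weights=list(doc_topic.values()))[0]
        dist = topic_word[topic]
        word = rng.choices(list(dist), weights=list(dist.values()))[0]
        words.append(word)
    return words


rng = random.Random(0)
doc = generate_document(10, rng)
print(doc)
```

In full LDA the document-topic and topic-word distributions are themselves drawn from Dirichlet priors, which is where the name comes from.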
What provides a way to sample from subjective probability distributions?
Markov Chain Monte Carlo
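As an illustration, here is a sketch of random-walk Metropolis, one common MCMC algorithm. The target (a standard normal), the step size, and the sample count are assumptions chosen to keep the example small. The sampler only needs the target density up to a constant, which is what makes MCMC practical for distributions that are hard to normalize.

```python
import math
import random


def metropolis(log_density, start, n_samples, step, rng):
    """Random-walk Metropolis: propose a nearby point, accept with prob min(1, ratio)."""
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        # Always accept uphill moves; accept downhill moves with prob exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def log_std_normal(x):
    """Unnormalized log density of a standard normal (illustrative target)."""
    return -0.5 * x * x


rng = random.Random(42)
samples = metropolis(log_std_normal, start=0.0, n_samples=20000, step=1.0, rng=rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))
```

The sample mean and variance should land near 0 and 1, the moments of the target.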
What was introduced in 1990 as a method for information retrieval, the task behind search engines (e.g. Google)?
Latent Semantic Analysis (LSA)
How do you quantify text?
By using the Vector Space Model (VSM)
What puts structure on unstructured data by compiling a term-by-document matrix of frequencies (X)?
Vector Space Model
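A small sketch of how such a matrix could be compiled, assuming a hypothetical three-document corpus and plain whitespace tokenization. Real pipelines usually add steps such as stop-word removal and stemming.

```python
from collections import Counter

# Hypothetical three-document corpus for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat",
    "cat and dog",
]

# Whitespace tokenization is a simplifying assumption.
counts = [Counter(doc.split()) for doc in docs]
terms = sorted(set().union(*counts))

# X[i][j] = frequency of term i in document j (terms as rows, documents as columns).
X = [[c[term] for c in counts] for term in terms]

for term, row in zip(terms, X):
    print(f"{term:>4} {row}")
```

Each row is a term and each column is a document, so the row for "the" reads 2, 1, 0 across the three documents.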
How are patterns extracted as latent concepts, producing the singular values Σ (the square roots of the eigenvalues of XᵀX)?
By Singular Value Decomposition (SVD)
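The link between singular values and eigenvalues can be checked numerically. The sketch below uses power iteration on XᵀX to find its largest eigenvalue and takes the square root. Power iteration is a stand-in chosen to avoid external libraries, not the algorithm a statistics package would use for a full SVD, and the matrix is a hypothetical example.

```python
import math


def top_singular_value(X, iterations=200):
    """Largest singular value of X via power iteration on A = X^T X.

    Assumes X is not all zeros and that the all-ones start vector is not
    orthogonal to the dominant eigenvector.
    """
    n_cols = len(X[0])
    # A = X^T X is symmetric; its eigenvalues are the squared singular values of X.
    A = [[sum(row[i] * row[j] for row in X) for j in range(n_cols)]
         for i in range(n_cols)]
    v = [1.0] * n_cols
    for _ in range(iterations):
        w = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the dominant eigenvalue (v has unit length).
    eigenvalue = sum(v[i] * A[i][j] * v[j]
                     for i in range(n_cols) for j in range(n_cols))
    return math.sqrt(eigenvalue)


# Hypothetical 3-term x 2-document matrix of frequencies.
X = [[3.0, 0.0],
     [0.0, 4.0],
     [0.0, 0.0]]
sigma_1 = top_singular_value(X)
print(sigma_1)
```

For this matrix XᵀX is diagonal with entries 9 and 16, so the singular values are 3 and 4 and the printed value should be very close to 4.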
What do the singular values from SVD help indicate?
The relative importance of the latent concepts
What is the purpose of a scree plot?
To help us decide how many latent concepts should be retained
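A scree plot charts the singular values (or their squares) against the concept number, and the analyst looks for the point where the curve flattens. The sketch below prints the numbers such a plot would show, using hypothetical singular values and a 90% cumulative-share cutoff. The cutoff is one possible rule assumed here for illustration; the visual elbow is another.

```python
# Hypothetical singular values, sorted from largest to smallest.
singular_values = [9.0, 4.0, 2.0, 1.0, 0.5]

squared = [s * s for s in singular_values]
total = sum(squared)
share = [sq / total for sq in squared]

cumulative = []
running = 0.0
for p in share:
    running += p
    cumulative.append(running)

# One simple rule (an assumption, not the only option): keep the fewest
# concepts whose cumulative share reaches a chosen threshold.
threshold = 0.90
n_keep = next(k for k, c in enumerate(cumulative, start=1) if c >= threshold)

for k, (p, c) in enumerate(zip(share, cumulative), start=1):
    print(f"concept {k}: share={p:.3f} cumulative={c:.3f}")
print("retain:", n_keep)
```

With these values the first two concepts account for about 95% of the total, so two concepts would be retained under this rule.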
What do Term Loadings reveal?
The relationship between terms and latent concepts
What do Document Loadings reveal?
The relationship between documents and latent concepts
What should you be able to do after examining the high-loading terms and high-loading documents for each latent concept (topic)?
Label the topics, if they exist
What software has the ability to perform Topic Extraction?
SAS Enterprise Miner 14.2
What node in SAS Enterprise Miner 14.2 helps export variables?
The Save Data node
What two approaches can help with cross-tab analysis?
Chi-square test and Correspondence analysis
What does a chi-square test tell us?
Whether topics and sources are dependent; a small p-value indicates that they are
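A worked sketch of the test on a hypothetical 2 x 3 topic-by-source table (the counts are invented for illustration). The p-value line relies on a closed form that holds only for 2 degrees of freedom; other table sizes need a general chi-square distribution function from a statistics library.

```python
import math

# Hypothetical 2 x 3 cross-tab: rows are topics, columns are sources.
observed = [[30, 10, 20],
            [10, 30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom the p-value has the closed form exp(-chi2 / 2).
# This shortcut does not hold for other df values.
assert df == 2
p_value = math.exp(-chi2 / 2)

print(chi2, df, p_value)
```

Here every expected count is 20, the statistic is 20 on 2 degrees of freedom, and the p-value is far below 0.05, so independence of topics and sources would be rejected for this table.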
How can the relationship between rows and columns in the contingency table be quantified?
Using correspondence analysis
How can the rows and columns then be visualized?
Using a Topic-Attribute Map
How does correspondence analysis work?
It uses Principal Components to project the cross-tab data onto fewer dimensions
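Correspondence analysis is commonly computed from the SVD of a matrix of standardized residuals built from the cross-tab. The sketch below covers only that first step, on a hypothetical topic-by-source table, and checks that the total inertia (the quantity the principal dimensions split up) equals the chi-square statistic divided by the grand total. The projection itself would need an SVD routine, which is left out here.

```python
import math

# Hypothetical topic-by-source cross-tab (rows: topics, columns: sources).
table = [[30, 10, 20],
         [10, 30, 20]]

n = sum(sum(row) for row in table)
P = [[x / n for x in row] for row in table]   # correspondence matrix (proportions)
r = [sum(row) for row in P]                   # row masses
c = [sum(col) for col in zip(*P)]             # column masses

# Standardized residuals: departure from independence, scaled by the expected share.
S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
      for j in range(len(c))] for i in range(len(r))]

# Total inertia is the sum of squared residuals and equals chi-square / n.
inertia = sum(s * s for row in S for s in row)
print(inertia)
```

For this table the chi-square statistic is 20 and the grand total is 120, so the inertia should come out at about 0.167.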