Deck 9 - Text Analytics Part 2 Flashcards

Big Data Analytics - Final Exam 2

Flashcards in this deck (20):
1

What are two approaches to Topic Extraction?

-Latent Dirichlet Allocation (LDA)/Topic Model
-Latent Semantic Analysis (LSA)

2

How does LDA model a collection of documents?

-Each document has a distribution over topics
-Each topic has a distribution over words
-The same word can appear in more than one topic
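
LDA's assumptions are easiest to see in its generative story. The Python sketch below samples a tiny synthetic corpus from that story; it is not how LDA is fitted to real text. The function names, the symmetric priors `alpha` and `beta`, and the toy vocabulary are illustrative assumptions.

```python
import random

def dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(vocab, n_topics, n_docs, doc_len, alpha=0.5, beta=0.1, seed=0):
    """Sample documents from LDA's generative process (hypothetical helper)."""
    rng = random.Random(seed)
    # Each topic is a distribution over the whole vocabulary,
    # so the same word can have nonzero probability in several topics.
    topics = [dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    docs = []
    for _ in range(n_docs):
        # Each document gets its own distribution over topics.
        theta = dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]  # pick a topic
            w = rng.choices(vocab, weights=topics[z])[0]        # pick a word from that topic
            words.append(w)
        docs.append(words)
    return topics, docs

vocab = ["data", "text", "topic", "model", "search", "query"]
topics, docs = generate_corpus(vocab, n_topics=2, n_docs=3, doc_len=8)
for d in docs:
    print(" ".join(d))
```

Because every topic is a distribution over the whole vocabulary, a word can receive probability under several topics, which is the third point on the card.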

3

What provides a way to sample from complex probability distributions (for example, Bayesian posteriors) that cannot be sampled directly?

Markov Chain Monte Carlo
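
MCMC builds a Markov chain whose long-run distribution is the target, so it only needs the target density up to a normalizing constant. Below is a minimal random-walk Metropolis sketch, one MCMC algorithm among several (some LDA implementations use Gibbs sampling, another MCMC method). The function name, step size, and standard-normal target are assumptions for illustration.

```python
import math
import random

def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: log_density may omit the normalizing constant."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)   # symmetric proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples

# Unnormalized standard normal: log p(x) = -x^2 / 2 + constant.
samples = metropolis(lambda x: -0.5 * x * x, start=0.0, n_samples=20000)
kept = samples[1000:]  # discard burn-in
print("sample mean:", sum(kept) / len(kept))
```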

4

What was introduced in 1990 as an information-retrieval (indexing) method for search engines?

Latent Semantic Analysis (LSA)

5

How do you quantify text?

By using the Vector Space Model (VSM)

6

What puts structure on unstructured text by compiling a term-by-document matrix of frequencies (X)?

Vector Space Model
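
A minimal sketch of building the term-by-document frequency matrix X, assuming plain lowercase whitespace tokenization. Real text-mining tools typically also remove stop words, stem or lemmatize, and weight the raw counts (for example with TF-IDF). The helper name and sample documents are hypothetical.

```python
from collections import Counter

def term_document_matrix(docs):
    """Build X with one row per term and one column per document (raw counts)."""
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({t for toks in tokenized for t in toks})
    counts = [Counter(toks) for toks in tokenized]
    # Counter returns 0 for terms absent from a document.
    X = [[c[t] for c in counts] for t in terms]
    return terms, X

docs = ["Text mining turns text into data", "Topic models find topics in text"]
terms, X = term_document_matrix(docs)
for term, row in zip(terms, X):
    print(f"{term:10s} {row}")
```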

7

What technique extracts patterns as latent concepts by producing the singular values Σ, which are the square roots of the eigenvalues of XᵀX?

By Singular Value Decomposition (SVD)
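
The relationship between singular values and eigenvalues can be checked numerically: for X = UΣVᵀ, the eigenvalues of XᵀX are the squared singular values. This NumPy sketch uses a small made-up frequency matrix.

```python
import numpy as np

# Hypothetical 4-term x 3-document frequency matrix X.
X = np.array([
    [2, 0, 1],
    [1, 1, 0],
    [0, 3, 1],
    [0, 1, 2],
], dtype=float)

# Thin SVD: X = U @ diag(sigma) @ Vt, singular values in descending order.
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Eigenvalues of the symmetric matrix X^T X, sorted descending to match sigma.
eigenvalues = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]

print("singular values:    ", sigma)
print("sqrt of eigenvalues:", np.sqrt(np.clip(eigenvalues, 0.0, None)))
```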

8

What do the singular values from SVD indicate?

The relative importance of the latent concepts

9

What is the purpose of a scree plot?

To help us decide how many latent concepts should be retained
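
A concept's relative importance is commonly measured by its squared singular value as a share of the total. The sketch prints a crude text scree plot and, as one hypothetical retention rule, keeps the fewest concepts that reach 80% of the total; in practice the cutoff is often judged from the elbow of the scree plot. The matrix, function names, and the 0.8 threshold are assumptions.

```python
import numpy as np

def concept_importance(X):
    """Singular values of X and each concept's share of the sum of squared singular values."""
    sigma = np.linalg.svd(X, compute_uv=False)  # returned in descending order
    share = sigma**2 / np.sum(sigma**2)
    return sigma, share

def text_scree(sigma):
    """Crude text scree plot: one bar per latent concept, scaled to the largest singular value."""
    for i, s in enumerate(sigma, start=1):
        print(f"concept {i}: {'#' * int(round(20 * s / sigma[0]))} {s:.2f}")

# Hypothetical 5-term x 4-document frequency matrix.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 1, 2, 1],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
], dtype=float)

sigma, share = concept_importance(X)
text_scree(sigma)

# Hypothetical rule: keep the fewest concepts whose shares sum to at least 80%.
k = int(np.searchsorted(np.cumsum(share), 0.8)) + 1
print("concepts retained:", k)
```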

10

What do Term Loadings reveal?

The relationship between terms and latent concepts

11

What do Document Loadings reveal?

The relationship between documents and latent concepts
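
Under one common LSA convention (an assumption here; tools such as SAS may rotate or rescale loadings differently), term loadings are the rows of UΣ and document loadings are the rows of VΣ, truncated to the k retained concepts. Listing the highest-loading terms and documents per concept is what makes labeling topics possible. Because SVD signs are arbitrary, the sketch ranks by absolute value. The vocabulary, matrix, and k = 2 are hypothetical.

```python
import numpy as np

# Hypothetical vocabulary and 5-term x 4-document frequency matrix.
terms = ["data", "mining", "text", "topic", "model"]
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 1, 2, 1],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
], dtype=float)

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # number of latent concepts retained (assumed, e.g. from a scree plot)

# Assumed convention: loadings = singular vectors scaled by singular values.
term_loadings = U[:, :k] * sigma[:k]    # shape: terms x concepts
doc_loadings = Vt[:k, :].T * sigma[:k]  # shape: documents x concepts

for c in range(k):
    # SVD signs are arbitrary, so rank by absolute loading.
    top_terms = [terms[i] for i in np.argsort(-np.abs(term_loadings[:, c]))[:3]]
    top_docs = np.argsort(-np.abs(doc_loadings[:, c]))[:2].tolist()
    print(f"concept {c + 1}: top terms {top_terms}, top documents {top_docs}")
```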

12

What should you be able to do after examining the high-loading terms and high-loading documents for each latent concept (topic)?

Label the topics, if they exist

13

Which software can perform Topic Extraction in this course?

SAS Enterprise Miner 14.2

14

What node in SAS Enterprise Miner 14.2 helps export variables?

The Save Data node

15

What two approaches can help with cross-tab analysis?

Chi-square test and Correspondence analysis

16

What does a chi-square test tell us?

Whether topics and sources are independent; a small p-value indicates that they are dependent (associated)
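
A pure-Python sketch of Pearson's chi-square test of independence on a topics-by-sources contingency table. The p-value comes from a series expansion of the regularized lower incomplete gamma function, and no continuity correction is applied. The table counts and function names are hypothetical; a statistics package or SAS would normally report this test directly.

```python
import math

def _log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def chi2_sf(stat, dof):
    """Upper-tail probability P(chi2_dof > stat), via the lower incomplete gamma series."""
    if stat <= 0:
        return 1.0
    s, x = dof / 2.0, stat / 2.0
    log_term = -math.log(s)  # n = 0 term of sum x^n / (s (s+1) ... (s+n))
    log_total = log_term
    n = 1
    while log_term - log_total > -35.0:  # stop when terms are negligible
        log_term += math.log(x) - math.log(s + n)
        log_total = _log_add(log_total, log_term)
        n += 1
    log_lower = s * math.log(x) - x - math.lgamma(s) + log_total
    return max(0.0, 1.0 - math.exp(min(0.0, log_lower)))

def chi_square_test(table):
    """Pearson chi-square test of independence for a rows x columns table of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)

# Hypothetical counts: rows = topics, columns = sources A, B, C.
table = [
    [30, 10, 5],
    [10, 25, 10],
    [5, 10, 30],
]
stat, dof, p = chi_square_test(table)
print(f"chi-square = {stat:.2f}, dof = {dof}, p-value = {p:.3g}")
```

With counts concentrated on the diagonal, as here, the statistic is large and the p-value small, which is evidence that topic and source are associated.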

17

How can the relationship between rows and columns in the contingency table be quantified?

Using correspondence analysis

18

How can the rows and columns then be visualized?

Using a Topic-Attribute Map

19

How does correspondence analysis work?

It uses principal components to project the cross-tab data onto fewer dimensions
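
A NumPy sketch of classical correspondence analysis: the table is converted to proportions, the standardized residuals from independence are decomposed with SVD (the principal-components step), and rows and columns receive principal coordinates in the same low-dimensional space. Plotting both sets of coordinates together gives a map of the kind a Topic-Attribute Map shows (an assumption about how that map is built). The table and function name are hypothetical.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Classical correspondence analysis via SVD of the standardized residuals."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()   # correspondence matrix (proportions)
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    # Standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j).
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates for rows (e.g. topics) and columns (e.g. sources).
    row_coords = (U[:, :n_dims] * sigma[:n_dims]) / np.sqrt(r)[:, None]
    col_coords = (Vt[:n_dims, :].T * sigma[:n_dims]) / np.sqrt(c)[:, None]
    inertia = sigma**2  # principal inertias
    return row_coords, col_coords, inertia

# Hypothetical counts: rows = topics, columns = sources A, B, C.
table = [
    [30, 10, 5],
    [10, 25, 10],
    [5, 10, 30],
]
row_coords, col_coords, inertia = correspondence_analysis(table)
print("topic coordinates:\n", row_coords)
print("source coordinates:\n", col_coords)
```

The principal inertias sum to the chi-square statistic divided by the table total, which ties correspondence analysis back to the chi-square test.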

20

What is LDA?

Latent Dirichlet Allocation/Topic Model