Flashcards in Deck 9 - Text Analytics Part 2 Deck (20):

1

## What are two approaches in Topic Extraction?

###
- Latent Dirichlet Allocation (LDA), a topic model

- Latent Semantic Analysis (LSA)

2

## What does LDA do?

###
- Each document has a distribution over topics

- Each topic has a distribution over words

- A word in one topic can also appear in another topic
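The three points above describe LDA's generative story. A minimal sketch of that story follows, assuming a made-up two-topic model; the topic names, vocabulary, probabilities, and the `generate_document` helper are all hypothetical, and the sketch only generates text from fixed distributions rather than fitting them to data as LDA does.

```python
import random

# Hypothetical toy model: each topic is a distribution over words.
# "score" has weight in both topics, so one word can belong to several topics.
topic_word = {
    "sports": {"game": 0.5, "team": 0.3, "score": 0.2},
    "finance": {"market": 0.5, "stock": 0.3, "score": 0.2},
}


def generate_document(doc_topic, n_words, rng):
    """For each word slot, draw a topic from the document, then a word from that topic."""
    topics = list(doc_topic)
    words = []
    for _ in range(n_words):
        topic = rng.choices(topics, weights=[doc_topic[t] for t in topics])[0]
        vocab = list(topic_word[topic])
        word = rng.choices(vocab, weights=[topic_word[topic][w] for w in vocab])[0]
        words.append(word)
    return words


rng = random.Random(0)
# This document is a mixture: 70% sports, 30% finance (assumed values).
doc = generate_document({"sports": 0.7, "finance": 0.3}, n_words=10, rng=rng)
print(doc)
```

Fitting LDA runs this story in reverse: given only the words, it estimates the document-topic and topic-word distributions.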

3

## What provides a way to sample from subjective probability distributions?

### Markov Chain Monte Carlo
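As an illustration, here is a minimal sketch of one MCMC method, random-walk Metropolis, in one dimension. The target density, step size, burn-in length, and the `metropolis` helper are assumptions chosen for the example, not something the deck specifies.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def log_target(x):
    # Assumed target: normal density centered at 3 with unit variance,
    # known only up to a normalizing constant.
    return -0.5 * (x - 3.0) ** 2


draws = metropolis(log_target, start=0.0, n_samples=20000)
kept = draws[1000:]  # discard burn-in
mean = sum(kept) / len(kept)
print(round(mean, 2))
```

The sampler only ever evaluates ratios of the target density, which is why the normalizing constant is not needed.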

4

## What was introduced in 1990 as an information-retrieval method of the kind used by search engines (e.g. Google)?

### Latent Semantic Analysis (LSA)

5

## How do you quantify text?

### By using the Vector Space Model (VSM)

6

## What puts structure on unstructured data by compiling a term-by-document matrix of frequencies (X)?

### Vector Space Model
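A minimal sketch of building such a matrix X, assuming three made-up documents and whitespace tokenization (real pipelines usually add stop-word removal, stemming, and weighting):

```python
from collections import Counter

# Hypothetical corpus of three short documents.
docs = [
    "the cat sat on the mat",
    "the dog sat",
    "cat and dog",
]

tokenized = [d.split() for d in docs]
terms = sorted({t for doc in tokenized for t in doc})
counts = [Counter(doc) for doc in tokenized]

# X has one row per term and one column per document; entries are raw frequencies.
X = [[c[t] for c in counts] for t in terms]

for term, row in zip(terms, X):
    print(f"{term:>4} {row}")
```

Each column of X is now a numeric vector for one document, which is what later steps such as SVD operate on.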

7

## How are patterns extracted as latent concepts, producing the singular values Σ (the square roots of the eigenvalues of XᵀX)?

### By Singular Value Decomposition (SVD)
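The link between singular values and eigenvalues can be checked by hand on a small case. The sketch below assumes a made-up 3-term by 2-document matrix, so that XᵀX is 2 x 2 and its eigenvalues have a closed form; a real analysis would call a numerical SVD routine instead.

```python
import math

# Hypothetical 3-term x 2-document frequency matrix X.
X = [
    [2, 0],
    [1, 1],
    [0, 2],
]

# Gram matrix G = X^T X, which is 2 x 2 and symmetric: [[a, b], [b, d]].
a = sum(row[0] * row[0] for row in X)
b = sum(row[0] * row[1] for row in X)
d = sum(row[1] * row[1] for row in X)

# Eigenvalues of a symmetric 2 x 2 matrix from its characteristic polynomial.
center = (a + d) / 2
radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
eigenvalues = [center + radius, center - radius]

# Singular values of X are the square roots of the eigenvalues of X^T X.
singular_values = [math.sqrt(ev) for ev in eigenvalues]

print(eigenvalues)      # [6.0, 4.0]
print(singular_values)
```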

8

## What does SVD help indicate?

### The relative importance of the latent concepts

9

## What is the purpose of a scree plot?

### To help us decide how many latent concepts should be retained
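A scree plot charts the singular values (or eigenvalues) against the concept number, and the usual reading is to look for the "elbow" where the curve flattens. The sketch below computes the numbers behind such a plot for four made-up singular values; treating the squared singular value as each concept's share of total variation is a common convention, assumed here.

```python
# Hypothetical singular values, sorted from largest to smallest.
singular_values = [6.0, 3.0, 2.0, 1.0]

# Share of total variation per latent concept, proportional to sigma squared.
squared = [s ** 2 for s in singular_values]
total = sum(squared)
shares = [sq / total for sq in squared]

# Running total, useful for deciding how many concepts to retain.
cumulative = []
running = 0.0
for share in shares:
    running += share
    cumulative.append(running)

for k, (share, cum) in enumerate(zip(shares, cumulative), start=1):
    print(f"concept {k}: share={share:.2f} cumulative={cum:.2f}")
```

In this toy case the first two concepts account for 90% of the variation, so the curve flattens after the second point.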

10

## What do Term Loadings reveal?

### The relationship between terms and latent concepts

11

## What do Document Loadings reveal?

### The relationship between documents and latent concepts

12

## What should you be able to do after examining the high-loading terms and high-loading documents for each latent concept (topic)?

### Label the topics, if they exist

13

## What software has the ability to perform Topic Extraction?

### SAS Enterprise Miner 14.2

14

## What node in SAS Enterprise Miner 14.2 helps export variables?

### The Save Data node

15

## What two approaches can help with cross-tab analysis?

### Chi-square test and Correspondence analysis

16

## What does a chi-square tell us?

### Whether topics and sources are dependent; a small p-value indicates that they are
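A minimal sketch of the test on a made-up 2 x 3 topic-by-source table. The counts are hypothetical, and the p-value shortcut used here is exact only because this table has 2 degrees of freedom; other table sizes need a general chi-square distribution function.

```python
import math

# Hypothetical cross-tab: rows are topics, columns are sources.
observed = [
    [30, 10, 20],
    [10, 30, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected,
# where expected counts assume rows and columns are independent.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# With df = 2 the chi-square survival function is exactly exp(-x / 2).
assert df == 2
p_value = math.exp(-chi2 / 2)

print(chi2, df, p_value)
```

Here the statistic is 20 on 2 degrees of freedom, giving a p-value far below 0.05, so independence of topics and sources would be rejected for this toy table.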

17

## How can the relationship between rows and columns in the contingency table be quantified?

### Using correspondence analysis

18

## How can the rows and columns then be visualized?

### Using a Topic-Attribute Map

19

## How does correspondence analysis work?

### It uses principal components to project the cross-tab data onto fewer dimensions
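One common formulation of correspondence analysis takes the SVD of the table's standardized residuals. The sketch below follows that formulation on a made-up 3 x 3 topic-by-source table and extracts only the first dimension, using power iteration in place of a full SVD routine; the counts, starting vector, and iteration count are assumptions for illustration.

```python
import math

# Hypothetical cross-tab: rows are topics, columns are sources.
table = [
    [30, 10, 20],
    [10, 30, 20],
    [20, 20, 20],
]
n = sum(sum(row) for row in table)
r = [sum(row) / n for row in table]        # row masses
c = [sum(col) / n for col in zip(*table)]  # column masses
rows, cols = range(len(r)), range(len(c))

# Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j).
S = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in cols]
     for i in rows]

# Total inertia equals the chi-square statistic divided by n.
total_inertia = sum(s * s for row in S for s in row)

# Power iteration on S^T S for the leading right singular vector of S.
v = [1.0, 0.0, 0.0]
for _ in range(200):
    u = [sum(S[i][j] * v[j] for j in cols) for i in rows]
    w = [sum(S[i][j] * u[i] for i in rows) for j in cols]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

u = [sum(S[i][j] * v[j] for j in cols) for i in rows]  # equals sigma * unit left vector
sigma = math.sqrt(sum(x * x for x in u))

# Principal coordinates of rows and columns on the first dimension.
row_coords = [u[i] / math.sqrt(r[i]) for i in rows]
col_coords = [sigma * v[j] / math.sqrt(c[j]) for j in cols]

print(total_inertia, sigma)
print(row_coords)
print(col_coords)
```

Rows and columns with coordinates of the same sign on a dimension are associated, which is the information a map of topics and sources displays.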

20