Deck 8 - Text Analytics Part 1 Flashcards Preview

Flashcards in Deck 8 - Text Analytics Part 1 (27):
1

A 2011 report from The Data Warehousing Institute (TDWI) identifies the analysis of _____ as an area of high potential growth.

Unstructured Text

2

In the Rexer Analytics 2013 Data Mining Survey, ___ ___ was listed as the 5th most frequently used algorithm.

Text Mining

3

What are examples of unstructured sources of data?

Web pages, e-mail, news and blog articles, forum postings and other social media, contact-center notes and transcripts, surveys, feedback forms, warranty claims, and every kind of corporate document imaginable

4

What is an issue with unstructured sources?

They may mix fact and sentiment

5

What do people do with electronic documents?

1. Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Information Extraction.

6

For textual documents, text analytics enhances what?

Publishing, managing, and archiving documents, and indexing and searching them

7

For textual documents, text analytics enables what?

Categorizing and classifying documents according to metadata and contents, and information extraction

8

What do you need linguistics for?

Publishing, managing, and archiving documents, and information extraction

9

Is "search" enough?

No

10

What does 'search' involve?

Words & phrases: search terms & natural language.
Qualifiers: include/exclude, and/or, not, etc.
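
As a rough illustration of how such qualifiers behave, here is a minimal Python sketch of include/exclude filtering over whitespace-separated words. The function name `matches`, its parameters, and the sample documents are hypothetical and are not taken from the course material.

```python
def matches(text, include_all=(), include_any=(), exclude=()):
    """Return True if the text satisfies the AND / OR / NOT qualifiers."""
    words = set(text.lower().split())
    # AND: every required term must be present
    if any(term not in words for term in include_all):
        return False
    # OR: at least one optional term must be present (if any were given)
    if include_any and not any(term in words for term in include_any):
        return False
    # NOT: no excluded term may be present
    if any(term in words for term in exclude):
        return False
    return True


docs = [
    "cheap hotel near the beach",
    "luxury hotel with spa",
    "cheap hostel downtown",
]

# "hotel" AND NOT "luxury"
hits = [d for d in docs if matches(d, include_all=["hotel"], exclude=["luxury"])]
print(hits)
```

Note that the filter only finds documents containing terms the user already thought to type.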

11

What do answers involve?

Entities: names, e-mail addresses, phone numbers
Concepts: abstractions of entities.
Facts and relationships.
Abstract attributes, e.g., “expensive,” “comfortable”
Opinions, sentiments: attitudinal information.
Data.
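
To make the "entities" item concrete, the following Python sketch pulls e-mail addresses and phone numbers out of free text with regular expressions. Both patterns are deliberately simplified assumptions (the phone pattern only covers the form 555-123-4567), and `extract_entities` is a hypothetical helper, not a method described in the deck.

```python
import re

# Simplified, illustrative patterns; real-world entity extraction needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")


def extract_entities(text):
    """Return the e-mail addresses and phone numbers found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


note = "Customer jane.doe@example.com called from 555-123-4567 about a refund."
entities = extract_entities(note)
print(entities)
```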

12

Why does 'search' fall short?

-Search helps you find things you already know about. It doesn’t help you discover things you’re unaware of.
-Results often lack relevance.
-It doesn’t enable unified analytics that links data from textual and transactional sources.

13

What can make search better?

Text Analytics

14

How does text analytics enhance information retrieval?

-Recognizes patterns in search queries to enable basic question answering
-Recognizes patterns in search results to enable clustering of results

15

What is the next step beyond Information Retrieval (IR)?

Information Extraction (IE)

16

What is text mining?

Data Mining of textual sources AND
Knowledge Discovery in Text

17

What is an n-gram?

A contiguous sequence of n items from a text, e.g., a string of n characters or n words
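
A short Python sketch may help here. It builds n-grams both from characters and from words; the helper names `char_ngrams` and `word_ngrams` are hypothetical, and splitting words on whitespace is a simplifying assumption.

```python
def char_ngrams(text, n):
    """Return every contiguous run of n characters in the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return every contiguous run of n whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams_chars = char_ngrams("text", 2)
bigrams_words = word_ngrams("big data text analytics", 2)
print(bigrams_chars)
print(bigrams_words)
```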

18

What does API stand for and what does it do?

Application Programming Interface
-A set of routines, protocols, and tools that make an IT system (e.g., a Web server) accessible to outside programmers

19

What does Word Cloud provide?

A convenient way to visualize text content

20

What are some applications for Word Cloud?

-As topic summaries for speeches and written works
-As blog tool or website analysis for search engine optimization
-For visual analysis of qualitative data
-As brand clouds that let companies see how they are perceived
-For data mining a text corpus
-For helping writers and students reflect on their work
-As name tags for conferences and cocktail parties
-As résumés in a single glance
-As visual poetry

21

What are the 4 steps to creating a word cloud in R?

STEP 1: Install some R packages
STEP 2: Load the libraries
STEP 3: Read the text file and clean up the text
STEP 4: Create a term-document matrix, count word frequencies, and produce the word cloud
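
The card describes an R workflow. As a language-neutral illustration of the cleaning in step 3 and the counting in step 4 only, here is a Python sketch using the standard library; it does not draw the cloud. The stop-word list is a tiny hypothetical stand-in, and `clean_text` and `word_frequencies` are illustrative names rather than functions from any R package.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "it"}


def clean_text(text):
    """Lowercase, drop digits and punctuation, and remove stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [w for w in text.split() if w not in STOP_WORDS]


def word_frequencies(text, top_n=10):
    """Count the cleaned words and return the top_n most frequent."""
    return Counter(clean_text(text)).most_common(top_n)


sample = "Text mining is the mining of text: 2 steps, clean the text and count."
freqs = word_frequencies(sample, top_n=3)
print(freqs)
```

A word cloud would then size each word according to its frequency.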

22

What is an important data structure for many “bag-of-words” text analytics algorithms?

Term-Document Matrix
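
As a minimal sketch of the structure, the Python below builds a small matrix with one row per term and one column per document, where each cell holds how often the term occurs in that document. Whitespace tokenization and the function name `term_document_matrix` are assumptions made for illustration.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix): rows are terms, columns are documents."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    # A Counter returns 0 for terms that do not occur in a document.
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


docs = ["good food good service", "bad service"]
terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(term, row)
```

Word order is discarded, which is why this representation is called a "bag of words".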

23

What is another name for Term-Document Matrix?

Term Frequency Matrix

24

What do Word Clouds need?

The row sums of term frequencies

25

What function is used to get the row sums of term frequencies for Word Clouds?

rowSums(m) function
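
To show what the row sums represent, here is a Python analog (not the R function itself) applied to a small hand-written term-document matrix. The matrix values and the helper name `row_sums` are hypothetical.

```python
def row_sums(matrix):
    """Sum each row: a term's total frequency across all documents."""
    return [sum(row) for row in matrix]


# Rows are terms, columns are documents (illustrative values).
terms = ["bad", "food", "good", "service"]
m = [
    [0, 1],
    [1, 0],
    [2, 0],
    [1, 1],
]

totals = dict(zip(terms, row_sums(m)))
# Sort from most to least frequent, as a word cloud would need.
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```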

26

What does Sentiment Analysis do?

Scores the sentiment expressed in documents, using lists of positive and negative words

27

How does Sentiment Analysis work?

-Uses lists of positive and negative words
-Each document is scored on a scale that includes shades of negative and positive sentiment
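
A minimal Python sketch of this word-list approach follows. The two word lists, the score (positive hits minus negative hits), and the five label thresholds are all illustrative assumptions, not the lexicon or scale used in the course.

```python
# Hypothetical word lists for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "comfortable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "expensive"}


def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def sentiment_label(score):
    """Map a score onto a scale with shades of negative and positive."""
    if score <= -2:
        return "very negative"
    if score == -1:
        return "negative"
    if score == 0:
        return "neutral"
    if score == 1:
        return "positive"
    return "very positive"


reviews = [
    "Great room, excellent service, love it!",
    "The food was bad.",
    "Good location but expensive.",
]
labels = [sentiment_label(sentiment_score(r)) for r in reviews]
print(labels)
```

The third review shows a limit of simple counting: one positive and one negative word cancel out to "neutral".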