Big Data Analytics - Final Exam 2: Deck 8 - Text Analytics Part 1 Flashcards

Flashcards in this deck (27):

A 2011 report from The Data Warehousing Institute identifies the analysis of _____ as an area of high potential growth.

Unstructured Text


In the Rexer Analytics 2013 Data Mining Survey, ___ ___ was listed as the 5th most frequently used algorithm.

Text Mining


What are examples of unstructured sources of data?

-Web pages
-E-mail
-News and blog articles, forum postings, and other social media
-Contact-center notes and transcripts
-Surveys, feedback forms, and warranty claims
-Every kind of corporate document imaginable


What is an issue with unstructured sources?

They may mix fact and sentiment


What do people do with electronic documents?

1. Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Information Extraction.


For textual documents, text analytics enhances what?

Publishing, managing, and archiving, plus indexing and searching


For textual documents, text analytics enables what?

Categorizing and classifying according to metadata and contents, plus information extraction


What do you need linguistics for?

Publishing, managing, and archiving, plus information extraction


Is "search" enough?

No

What does 'search' involve?

Words & phrases: search terms & natural language.
Qualifiers: include/exclude, and/or, not, etc.
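The search qualifiers on this card can be sketched as a tiny keyword filter over a handful of documents. This is a minimal Python illustration only: the function names (tokenize, boolean_search) and the sample documents are hypothetical, tokenization is a naive whitespace split, and real search engines use inverted indexes and much richer query syntax.

```python
def tokenize(text):
    # Lowercase each whitespace-separated word and strip surrounding punctuation
    return {w.strip(".,!?;:\"'()").lower() for w in text.split()}

def boolean_search(docs, all_of=(), any_of=(), none_of=()):
    """Return indices of docs matching an AND / OR / NOT keyword query."""
    hits = []
    for i, doc in enumerate(docs):
        words = tokenize(doc)
        if not all(t in words for t in all_of):             # AND: every term required
            continue
        if any_of and not any(t in words for t in any_of):  # OR: at least one term
            continue
        if any(t in words for t in none_of):                # NOT: exclude these terms
            continue
        hits.append(i)
    return hits

docs = [
    "The hotel room was expensive but comfortable.",
    "Cheap hotel, noisy room.",
    "Comfortable flight, friendly crew.",
]
print(boolean_search(docs, all_of=["hotel"], none_of=["noisy"]))  # → [0]
print(boolean_search(docs, any_of=["comfortable", "cheap"]))      # → [0, 1, 2]
```

Note that the filter only returns documents containing terms you already thought to ask for, which is the limitation described on the "Why does 'search' fall short?" card.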


What do answers involve?

Entities: names, e-mail addresses, phone numbers
Concepts: abstractions of entities.
Facts and relationships.
Abstract attributes, e.g., “expensive,” “comfortable”
Opinions, sentiments: attitudinal information.
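Pulling entities such as e-mail addresses and phone numbers out of free text is the simplest kind of answer listed on this card. A minimal Python sketch using regular expressions; the patterns and the extract_entities name are illustrative assumptions (the phone pattern only covers North American-style numbers such as (555) 123-4567), and extracting names, concepts, facts, or sentiment needs far more than regular expressions.

```python
import re

# Deliberately simplified patterns; real-world formats are far more varied.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}")

def extract_entities(text):
    """Return the e-mail addresses and phone numbers found in free text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }

note = "Customer jane.doe@example.com called from (555) 123-4567 about a warranty claim."
print(extract_entities(note))
# → {'emails': ['jane.doe@example.com'], 'phones': ['(555) 123-4567']}
```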


Why does 'search' fall short?

-Search helps you find things you already know about. It doesn’t help you discover things you’re unaware of
-Results often lack relevance
-Doesn’t enable unified analytics that links data from textual and transactional sources


What can make search better?

Text Analytics


How does text analytics enhance information retrieval?

-Recognizes patterns in search queries to enable basic question answering
-Recognizes patterns in search results to enable clustering of results


What is the next step beyond Information Retrieval (IR)?

Information Extraction (IE)


What is text mining?

Data Mining of textual sources AND
Knowledge Discovery in Text


What is an n-gram?

A contiguous sequence of n items (characters or words) taken from a string of text
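An n-gram is a contiguous sequence of n items from text, where the items can be characters or words. A minimal Python sketch of both kinds (the function names are hypothetical, and words are split naively on whitespace):

```python
def word_ngrams(text, n):
    """Contiguous sequences of n words."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def char_ngrams(text, n):
    """Contiguous sequences of n characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(word_ngrams("text analytics enhances search", 2))
# → [('text', 'analytics'), ('analytics', 'enhances'), ('enhances', 'search')]
print(char_ngrams("text", 2))
# → ['te', 'ex', 'xt']
```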


What does API stand for and what does it do?

Application Programming Interface
-A set of routines, protocols, and tools that make an IT system (e.g., a Web server) accessible to outside programmers


What does Word Cloud provide?

A convenient way to visualize text content


What are some applications for Word Cloud?

-As topic summaries for speeches and written works
-As a blog tool or for website analysis in search engine optimization
-For visual analysis of qualitative data
-As brand clouds that let companies see how they are perceived
-For data mining a text corpus
-For helping writers and students reflect on their work
-As name tags for conferences and cocktail parties
-As résumés that can be read at a glance
-As visual poetry


What are the 4 steps to creating a word cloud in R?

STEP 1: Install some R packages
STEP 2: Load the libraries
STEP 3: Read the text file and clean up the text
STEP 4: Create a term-document matrix, count word frequencies, and produce the word cloud
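Step 3 is where most of the preparation happens. In R it is commonly done with transformations from the tm package; the Python sketch below only mimics typical cleanup choices (lowercasing, removing numbers and punctuation, dropping stopwords, collapsing whitespace). The STOPWORDS set and the clean_text name are hypothetical stand-ins, and stemming is left out.

```python
import re

# Hypothetical, tiny stopword list; real pipelines use a full English stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "is", "to", "in", "it"}

def clean_text(text):
    """Step 3: normalize raw text into a list of content words."""
    text = text.lower()                    # convert to lowercase
    text = re.sub(r"[0-9]+", " ", text)    # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation
    words = text.split()                   # splitting also collapses extra whitespace
    return [w for w in words if w not in STOPWORDS]

raw = "The 2 hotels were great, and the staff... the STAFF was great!"
print(clean_text(raw))
# → ['hotels', 'were', 'great', 'staff', 'staff', 'was', 'great']
```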


What is an important data structure for many “bag-of-words” text analytics algorithms?

Term-Document Matrix


What is another name for Term-Document Matrix?

Term Frequency Matrix
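A term-document matrix has one row per term and one column per document, and each cell counts how often that term occurs in that document. Word order is discarded, which is what makes it a "bag-of-words" representation. In R's tm package this structure is produced by TermDocumentMatrix(); the Python sketch below builds the same layout by hand, with a hypothetical function name, sample documents, and a naive lowercase whitespace split for tokenization.

```python
from collections import Counter

def term_document_matrix(docs):
    """Build a term-document (term frequency) matrix.

    Rows are terms, columns are documents, and each cell holds how many
    times the term occurs in that document (order within a document is ignored).
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))          # vocabulary: one row per term
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

docs = ["good food good service", "bad service", "good price"]
terms, m = term_document_matrix(docs)
for term, row in zip(terms, m):
    print(f"{term:8} {row}")
```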


What do Word Clouds need?

The row sums of the term frequencies, i.e., each term's total count across all documents


What function is used to get the row sums of term frequencies for Word Clouds?

The rowSums(m) function in R, where m is the term-document matrix converted to an ordinary R matrix
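In R, rowSums(m) adds up each row of the term-document matrix m, giving every term's total frequency across all documents; sorting those totals puts the most frequent words first. A Python equivalent on a small, hypothetical matrix:

```python
# A small term-document matrix: rows are terms, columns are documents.
terms = ["bad", "food", "good", "price", "service"]
m = [
    [0, 1, 0],  # bad
    [1, 0, 0],  # food
    [2, 0, 1],  # good
    [0, 0, 1],  # price
    [1, 1, 0],  # service
]

# Equivalent of R's rowSums(m): total count of each term across all documents.
row_sums = {term: sum(row) for term, row in zip(terms, m)}

# Sort by frequency, highest first (ties keep their original order).
freqs = sorted(row_sums.items(), key=lambda kv: kv[1], reverse=True)
print(freqs)
# → [('good', 3), ('service', 2), ('bad', 1), ('food', 1), ('price', 1)]
```

These (word, frequency) pairs are what a word cloud is drawn from: in R, the wordcloud package's wordcloud() function takes the words and their frequencies, and more frequent words are drawn larger.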


What does Sentiment Analysis do?

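Identifies the opinion or attitude (positive or negative) expressed in text, typically by matching its words against lists of positive and negative words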


How does Sentiment Analysis work?

-Uses lists of positive and negative words
-Each document is scored on a scale that includes shades of negative and positive sentiment
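One simple way to implement this is to count lexicon hits and subtract: positives minus negatives. This Python sketch assumes that scoring rule and uses tiny, hypothetical word lists (POSITIVE, NEGATIVE) and a hypothetical sentiment_score function; published lexicons are much larger, and many tools also weight words or handle negation such as "not good".

```python
import re

# Hypothetical, tiny lexicons; real analyses use published word lists.
POSITIVE = {"good", "great", "comfortable", "friendly", "love"}
NEGATIVE = {"bad", "noisy", "expensive", "rude", "hate"}

def sentiment_score(doc):
    """Score = (# positive words) - (# negative words).

    Below 0 leans negative, above 0 leans positive, and the magnitude
    gives a rough shade of how strongly the document leans either way.
    """
    words = re.findall(r"[a-z']+", doc.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg

reviews = [
    "Great location, friendly staff, comfortable beds.",
    "Good food but noisy and expensive.",
    "Rude staff. Bad, bad experience.",
]
for r in reviews:
    print(sentiment_score(r), r)
# → 3, -1 and -3 for the three reviews
```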