Flashcards in Deck 8 - Text Analytics Part 1 Deck (27):
A 2011 report from The Data Warehousing Institute (TDWI) identifies the analysis of _____ as an area of high potential growth.
In the Rexer Analytics 2013 Data Mining Survey, ___ ___ was listed as the 5th most frequently used algorithm.
What are examples of unstructured sources of data?
Web pages; e-mail; news and blog articles; forum postings and other social media; contact-center notes and transcripts; surveys, feedback forms, and warranty claims; and every kind of corporate document imaginable
What is an issue with unstructured sources?
They may mix fact and sentiment
What do people do with electronic documents?
1. Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Information Extraction.
For textual documents, text analytics enhances what?
Publish, manage, archive, Index and search
For textual documents, text analytics enables what?
Categorize and Classify according to metadata & contents and Information Extraction
What do you need linguistics for?
Publish, Manage, Archive and information extraction
Is "search" enough?
What does 'search' involve?
Words & phrases: search terms & natural language.
Qualifiers: include/exclude, and/or, not, etc.
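A minimal Python sketch of how search terms and qualifiers combine, assuming plain whitespace tokenization. The function name `matches` and its parameters are hypothetical, chosen only for illustration.

```python
def matches(text, include=(), exclude=(), any_of=()):
    """Return True if the text satisfies all qualifiers."""
    words = set(text.lower().split())
    has_all = all(term in words for term in include)               # AND / include
    has_any = not any_of or any(term in words for term in any_of)  # OR
    has_none = not any(term in words for term in exclude)          # NOT / exclude
    return has_all and has_any and has_none

docs = [
    "warranty claim for a broken phone",
    "customer praised the phone battery",
    "survey feedback about delivery",
]

# Documents that mention "phone" but not "broken".
hits = [d for d in docs if matches(d, include=["phone"], exclude=["broken"])]
print(hits)
```

Note that this only finds documents containing the terms the user already thought to type.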
What do answers involve?
Entities: names, e-mail addresses, phone numbers
Concepts: abstractions of entities.
Facts and relationships.
Abstract attributes, e.g., “expensive,” “comfortable”
Opinions, sentiments: attitudinal information.
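Of the answer types above, entities such as e-mail addresses and phone numbers are the easiest to pull out mechanically. A rough Python sketch using regular expressions follows; the patterns are simplified assumptions (one US-style phone format, a loose e-mail pattern) and the sample note is made up.

```python
import re

# Simplified patterns, for illustration only; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def extract_entities(text):
    """Return the e-mail addresses and phone numbers found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }

note = "Customer jane.doe@example.com called from 555-123-4567 about a refund."
entities = extract_entities(note)
print(entities)
```

Concepts, relationships, and opinions generally need more than pattern matching, which is where linguistic processing comes in.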
Why does 'search' fall short?
-Search helps you find things you already know about. It doesn’t help you discover things you’re unaware of
-Results often lack relevance
-Doesn’t enable unified analytics that links data from textual and transactional sources
What can make search better?
How does text analytics enhance information retrieval?
-Recognizes patterns in search queries to enable basic question answering
-Recognizes patterns in search results to enable clustering of results
What is the next step beyond Information Retrieval (IR)?
Information Extraction (IE)
What is text mining?
Data Mining of textual sources AND
Knowledge Discovery in Text
What is N-Gram?
A contiguous sequence of n items from a text, such as a string of n characters or n consecutive words
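N-grams can be taken over characters or over words. A short Python sketch of both, with `ngrams` as a hypothetical helper name:

```python
def ngrams(items, n):
    """Return every run of n consecutive items as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

# Word bigrams: the items are words.
word_bigrams = ngrams("text analytics is fun".split(), 2)

# Character trigrams: the items are the characters of a string.
char_trigrams = ["".join(g) for g in ngrams("text", 3)]

print(word_bigrams)
print(char_trigrams)
```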
What does API stand for and what does it do?
Application Programming Interface
-A set of routines, protocols, and tools that makes an IT system (e.g., a Web server) accessible to outside programmers
What does Word Cloud provide?
A convenient way to visualize text content
What are some applications for Word Cloud?
-As topic summaries for speeches and written works
-As blog tool or website analysis for search engine optimization
-For visual analysis of qualitative data
-As brand clouds that let companies see how they are perceived
-For data mining a text corpus
-For helping writers and students reflect on their work
-As name tags for conferences and cocktail parties
-As résumés in a single glance
-As visual poetry
What are the 4 steps to creating a word cloud in R?
STEP 1: Install some R packages
STEP 2: Load the libraries
STEP 3: Read the text file and clean up the text
STEP 4: Create a term-document matrix, count word frequencies, and produce the word cloud
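The deck's steps are carried out in R. As a language-neutral illustration of the cleaning and counting in STEP 3 and STEP 4, here is a minimal Python sketch using only the standard library. The stopword list and sample sentence are made up, and drawing the cloud itself is left out.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}

def clean(text):
    """Lowercase, strip digits and punctuation, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [w for w in text.split() if w not in STOPWORDS]

raw = "The text is messy: 3 typos, and the text needs cleaning; cleaning the text helps."
freq = Counter(clean(raw))

# The most frequent words would be drawn largest in the cloud.
top = freq.most_common(2)
print(top)
```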
What is an important data structure for many “bag-of-words” text analytics algorithms?
Term-Document Matrix
What is another name for Term-Document Matrix?
Term Frequency Matrix
What do Word Clouds need?
The row sums of term frequencies
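To make the row sums concrete, here is a small Python sketch that builds a matrix with one row per term and one column per document, then sums each row to get the term's total frequency. The two sample documents are made up, and the dictionary-of-lists layout is just one convenient representation.

```python
from collections import Counter

docs = {
    "doc1": "text mining finds patterns in text",
    "doc2": "search finds documents",
}

# Per-document word counts.
counts = {name: Counter(body.split()) for name, body in docs.items()}

# Every distinct term across all documents, in sorted order.
terms = sorted(set().union(*counts.values()))

# One row per term, one column per document.
tdm = {term: [counts[name][term] for name in docs] for term in terms}

# Row sums: total frequency of each term over the whole corpus.
row_sums = {term: sum(row) for term, row in tdm.items()}
print(row_sums)
```

Word order is discarded here, which is why such methods are called "bag-of-words".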
What function is used to get the row sums of term frequencies for Word Clouds?
rowSums()
What does Sentiment Analysis do?
Judges whether text expresses a positive or negative attitude; a simple approach uses lists of positive and negative words
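A bare-bones Python sketch of the word-list approach: count positive words, count negative words, and take the difference. The two word lists are tiny made-up examples, and the scoring rule is an assumption rather than a standard.

```python
# Hypothetical word lists; real sentiment lexicons contain thousands of entries.
POSITIVE = {"good", "great", "comfortable", "love"}
NEGATIVE = {"bad", "poor", "expensive", "hate"}

def sentiment_score(text):
    """Positive-word count minus negative-word count."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg

review_a = sentiment_score("great seats but expensive and poor service")
review_b = sentiment_score("comfortable and great")
print(review_a, review_b)
```

A score above zero suggests positive sentiment and below zero negative. This sketch ignores negation ("not good") and punctuation attached to words.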