BIS II - Data, Dashboards & Visual Analytics with Tableau Flashcards

1
Q

Performance Dashboards

A
  • Provide visual displays of important information
  • That is consolidated and arranged on a single screen
  • So that the information can be easily digested at a glance and easily drilled into and further explored
2
Q

What to look for in a dashboard

A
  • Use of visual components to highlight data and exceptions
  • Transparent to the user, i.e. require minimal training and are easy to use
  • Combine data from a variety of systems into a single, unified view of the business
  • Enable drill-down or drill-through to underlying data sources/reports
  • Present a dynamic, real-world view with timely data
  • Require little coding to implement/deploy/maintain
3
Q

Best practices in Dashboard Design

A
  • Benchmark KPIs with industry standards
  • Wrap the metrics with contextual data
  • Validate the design by a usability specialist
  • Prioritize and rank alerts and exceptions
  • Enrich the dashboard with business-user comments
  • Present information in three different levels
  • Pick the right visual constructs
  • Provide for guided analytics
4
Q

Text Mining & Text Analytics

Data in DSS – Data

A
  • Is a collection of facts usually obtained as the result of experiences, observations or experiments
  • May consist of numbers, words, images, …
  • Is the lowest level of abstraction (from which information and knowledge are derived)
5
Q

Why text is important

A

• Text is everywhere
o Many legacy applications still produce text
o Medical records (hand-written)
o Consumer complaint logs
o Customer reviews
• 85% of corporate data is stored in some unstructured form, doubling every 18 months
• Text filtering can be applied in many contexts, e.g. measuring the impact of online word of mouth on sales, classifying and filtering junk e-mail, or NLP (natural language processing, as in Alexa)

6
Q

Why text is difficult

A

• Often referred to as “unstructured data”
• Has a linguistic structure – intended for human consumption, not for computers.
o Words have varying lengths
o Text fields can have varying numbers of words
o Sometimes word order matters, sometimes not
• Text is relatively “dirty”, because people…
o Write ungrammatically
o Misspell words
o Abbreviate unpredictably
o … etc.
• Context is important

7
Q

Representation of Textual Data

A

• Goal: to turn the text into a feature-vector form
• General strategy: to use the simplest (least expensive) technique
• Terminology that is borrowed from information retrieval:
o Document = one piece of text, no matter how large or small
o A document is composed of individual tokens or terms, e.g. a word is a token.
o Corpus = a collection of documents
• Representation techniques:
1. Bag of words
2. Term frequency
3. Sparseness: inverse document frequency, TFIDF
4. N-grams

8
Q

1) Bag of Words

A

Approach:
• Treats each document as just a collection of individual words
• Ignores grammar, word order, sentence structure and punctuation
• Straightforward and inexpensive to generate
• Works well for many tasks

Application case: Spam filtering:
• Represent e-mail messages as unordered bag of words
• Then compare them to the typical “spam” bag of words, e.g. containing “Viagra”, “Stock”, “buy”
• Where there is a big overlap, the message is classified as spam e-mail
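The spam-filtering case above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production filter: the spam vocabulary reuses the three example words from the card, and the `threshold` for what counts as a "big overlap" (here 2 shared words) is a hypothetical choice.

```python
import re

# Hypothetical "typical spam" bag of words, taken from the card's examples;
# a real filter would learn this vocabulary from labelled e-mail.
SPAM_WORDS = {"viagra", "stock", "buy"}


def bag_of_words(text: str) -> set[str]:
    """Return the unordered set of lowercase words in `text`.

    Grammar, word order, sentence structure and punctuation are discarded.
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def is_spam(text: str, threshold: int = 2) -> bool:
    """Flag as spam when at least `threshold` words overlap the spam bag.

    `threshold` is an illustrative assumption, not a recommended value.
    """
    overlap = bag_of_words(text) & SPAM_WORDS
    return len(overlap) >= threshold


if __name__ == "__main__":
    print(is_spam("Buy Viagra now!"))           # True
    print(is_spam("Meeting moved to Friday."))  # False
```

Because the bag is a set, "Buy, buy, BUY!" contributes only one word; counting repeats is what the term-frequency step adds.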

9
Q

2) Term frequency

A

• The next step is to use the word count in the document instead of just a zero or one (word present or absent)
• Usually, the following steps are performed first:
o Normalization: every term is converted to lowercase. E.g. iPhone, iphone, IPHONE -> iphone
o Stemming: suffixes are removed and plurals are turned into singular forms. E.g. announces, announced, announcing -> announc; directors -> director
o Removal of stopwords: very common words in the language being parsed are dropped. E.g. the, and, of, on in English
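A minimal Python sketch of the three preprocessing steps followed by term counting. Assumptions to note: the stopword list is a small illustrative one, and `stem` is a naive, hypothetical suffix stripper standing in for a real stemmer (such as Porter's); it happens to reproduce the card's examples but is far cruder than a real algorithm.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption); real systems use larger lists.
STOPWORDS = {"the", "and", "of", "on", "a", "to", "in", "is"}

# Naive suffix stripper: a hypothetical stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_frequency(text: str) -> Counter:
    """Count terms after lowercasing, stopword removal and stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())  # normalization: lowercase
    return Counter(stem(t) for t in tokens if t not in STOPWORDS)


if __name__ == "__main__":
    print(term_frequency("The directors announced and announces; "
                         "the director is announcing"))
    # Counter({'announc': 3, 'director': 2})
```

Unlike the bag-of-words set, the counter keeps how often each term occurs, so "announc" scores 3 rather than 1.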

10
Q

3) Sparseness: inverse document frequency - IDF and TFIDF

A

a. Take into account the distribution of the term over a corpus as well
b. The term should not be too rare and not too common
c. Impose upper & lower limits on term frequency
d. Inverse document frequency:
i. IDF(t) = 1 + log(total # of documents / # of documents containing t)
ii. IDF may be thought of as the boost a term gets for being rare
e. TFIDF: is a product of term frequency (TF) and inverse document frequency (IDF)
i. TFIDF is specific to a single document, whereas IDF depends on the entire corpus
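The IDF formula and the TFIDF product can be sketched directly in Python. This assumes each document is already a list of preprocessed terms and that TF is the raw count within one document; the card does not state a log base, so the natural log is an assumption here.

```python
import math
from collections import Counter


def idf(term: str, corpus: list[list[str]]) -> float:
    """IDF(t) = 1 + log(total # of documents / # of documents containing t).

    Natural log is assumed; the card does not specify a base.
    """
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return 1 + math.log(len(corpus) / containing)


def tfidf(doc: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """TFIDF(t, d) = TF(t, d) * IDF(t), with TF as raw counts in this one doc."""
    counts = Counter(doc)
    return {term: tf * idf(term, corpus) for term, tf in counts.items()}


if __name__ == "__main__":
    corpus = [
        ["data", "mining", "data"],
        ["text", "mining"],
        ["text", "analytics", "mining"],
    ]
    print(idf("mining", corpus))  # 1.0: in every document, so no rarity boost
    print(tfidf(corpus[0], corpus))
```

The example shows the "boost for being rare": "mining" appears in all three documents and gets IDF 1, while "data" appears in only one and gets 1 + log 3, which is then doubled by its count of 2.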

11
Q

4) N-gram sequences

A

a. Used when word order is important & the information about it should be preserved
b. N-grams = Sequences of adjacent words, are included as terms
c. E.g. “The quick brown fox jumps” (with the stopword “The” removed) -> {quick, brown, fox, jumps}, {quick_brown, brown_fox, fox_jumps}, {quick_brown_fox, brown_fox_jumps}
d. Advantage: easy to generate; requires no linguistic knowledge
e. Disadvantage: greatly increases the size of the feature set -> requires special consideration for handling massive numbers of features and the associated computation and storage
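The card's n-gram example can be reproduced with a short Python sketch. The function names are illustrative, and joining adjacent words with "_" follows the card's notation; the input tokens are assumed to be already normalized and stopword-filtered.

```python
def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all sequences of n adjacent tokens, joined with '_'."""
    return ["_".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(tokens: list[str], max_n: int) -> list[str]:
    """Combine unigrams, bigrams, ..., up to max_n-grams into one feature list."""
    features: list[str] = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features


if __name__ == "__main__":
    tokens = ["quick", "brown", "fox", "jumps"]  # "The" removed as a stopword
    print(ngrams(tokens, 2))  # ['quick_brown', 'brown_fox', 'fox_jumps']
    print(len(ngram_features(tokens, 3)))  # 9 (4 + 3 + 2)
```

Even this four-word text grows from 4 features to 9 once bigrams and trigrams are added, which illustrates the feature-set explosion listed as the disadvantage.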

12
Q

Sentiment Analysis

A

• Sentiment: belief, view, opinion, conviction
• Sentiment analysis: also called opinion mining, subjectivity analysis, or appraisal extraction
• Goal: to answer the question: “What do people feel about a certain topic?”
• Explicit vs. implicit sentiment
• Sentiment polarity
o Positive vs. negative vs. neutral
• E.g. Linguistic Inquiry and Word Count (LIWC) Program:
o Counts percentage of words that reflect different emotions, thinking styles, social concerns, etc. in a text, to capture people’s social and psychological states.
o Words are grouped into categories (e.g. swearing or past tense), and the program counts how many words of each category occur in the text
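A LIWC-style count (percentage of words per category) and a simple polarity label can be sketched in Python as follows. The category dictionary below is a tiny, hypothetical word list for illustration only; it is not the real LIWC dictionary, and deriving polarity by comparing the positive and negative shares is an assumed, simplified rule.

```python
import re

# Tiny hypothetical category dictionary (NOT the real LIWC word lists).
CATEGORIES = {
    "positive": {"good", "great", "love", "happy"},
    "negative": {"bad", "awful", "hate", "sad"},
    "past": {"was", "were", "did", "had"},
}


def category_percentages(text: str) -> dict[str, float]:
    """Percentage of words in `text` that belong to each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {cat: 0.0 for cat in CATEGORIES}
    return {
        cat: 100 * sum(w in vocab for w in words) / len(words)
        for cat, vocab in CATEGORIES.items()
    }


def polarity(text: str) -> str:
    """Label text positive, negative or neutral by comparing category shares."""
    pct = category_percentages(text)
    if pct["positive"] > pct["negative"]:
        return "positive"
    if pct["negative"] > pct["positive"]:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    review = "The food was great but the service was awful and I hate waiting"
    print(polarity(review))  # negative (2 negative words vs. 1 positive)
```

Counting categories as percentages rather than raw counts keeps long and short texts comparable, which matches the LIWC description on the card.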
