Quiz 1 Prep Flashcards

(89 cards)

1
Q

When we have so much information that it exceeds the knowledge we have; superfluous information.

A

Information Overload

2
Q

Examples of __________
o Plenty of information about the 2016 presidential election, yet very little knowledge about what led to its outcome!
o Daily information about the wars, yet very little knowledge about the underlying causes!

A

Information Overload

3
Q

Causes of poor data quality (4)

A
  • Platform Availability
  • Formality
  • Cost
  • Competitive Advantage
4
Q

Examples of platforms in data (4)

A

patient portals, social media platforms, online forums, e-commerce rating/review platforms

5
Q

Aspects of platform availability (2)

A

empowered users and automated processes

6
Q

Information provides firms with a _________.

A

competitive advantage

7
Q

Acquiring high quality data is _________.

A

expensive

8
Q

Abundant data is generally cheap but _________.

A

low quality

9
Q

Today’s data characteristics (2)

A

Volume and variety

10
Q

Information overload can lead to ___________.

A

poor decision making

11
Q

Characteristics of information overload (3)

A
  • Complexity
  • Substitution
  • Attention Deficit
12
Q

Replacing high-quality data with low-quality data.

A

Information overload - substitution

13
Q

Example of information overload - substitution

A

Tinder

14
Q

Characteristics of Information Overload: _______
• The illusion of multi-tasking
• When we are exposed to a lot of data, our cognitive power is reduced
• Distraction (Low cognitive power)

A

Attention Deficit

15
Q

Characteristics of Information Overload: ______
• Time needed to consider all offered options (millions)
• Fear of missing important data needed for decision making (FOMO)

A

Complexity

16
Q

Example of information overload: ______

buying a house

A

complexity

17
Q

Characteristics of Paradigm Shift (4)

A
  • Get expert advice
  • Get information from a trusted source
  • Get wisdom of the crowds
  • Get information from a connected source
18
Q

Three characteristics of analytics

A
  1. Recommender Systems – Classification & Prediction
  2. Pattern Recognition
  3. Anomaly Detection algorithms
19
Q

Most companies have plenty of ____, but not enough _______.

A

data; knowledge

20
Q

Companies collect data about customers, products, sales but still lack knowledge to (3):

A
  • Identify products, customers and sales channels that return the highest profit margins.
  • Forecast variations in buying patterns across different types of customers.
  • Predict customer churn
21
Q

Organizations are being compelled to capture, understand, and harness their _____ to support _________ in order to improve __________.

A

data; decision making; business operations

22
Q

The extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions.

A

Business Analytics

23
Q

Two Consequences of the Information Age

A
  1. Every business process generates data.
  2. Every business needs analytics to remain competitive.

24
Q

Idiosyncrasies of Business Analytics (3)

A
  1. The Data
  2. The Users and Sponsors
  3. The Methodology
25
Intelligent use of _______ results in the following:
• better understanding of how technological, economic, and marketplace shifts affect business performance
• ability to consistently and reliably distinguish between effective and ineffective interventions
• efficient use of assets, reduced waste in supplies, and better management of time and resources
• risk reduction via measurable outcomes and reproducible findings
• early detection of market trends hidden in massive data
• continuous improvement in decision making over time
analytics
26
A lot of people confuse analytics with _______.
simple reporting
27
Examples of proactive analytical investigation (5)
- inferential statistics - experimentation - empirical validation - forecasting - optimization
28
____________ answers questions such as:
• What does a change in the market mean for my targets?
• What do other factors tell me about what I can expect from my target?
• What is the best combination of factors to give me the most efficient use of resources and maximum profitability?
• What is the highest price the market will tolerate?
• What will happen in six months if I do nothing? What if I implement an alternative strategy?
Proactive analytical investigation
29
Business Applications for Data Analytics (7)
- Churn analysis - Cross-selling - Fraud detection - Risk management - Customer segmentation - Targeted ads - Sales forecast
30
Data Mining Tasks (6)
- Classification - Association - Regression - Forecasting - Sequence Analysis - Deviation Analysis
31
________ is an iterative process.
Knowledge discovery
32
the core of the knowledge discovery process.
Data mining
33
The Knowledge Discovery Process (KDD) (8 Steps)
1. Data Collection
2. Data Cleaning and Transformation
3. Model Building
4. Model Assessment
5. Reporting
6. Prediction (Scoring)
7. Application Integration
8. Model Management
34
Two types of reports:
- Findings | - Prediction or forecast
35
How can we use data mining models? (3)
* Insight * Prediction * Description
36
Assign items to a discrete class based on training data.
Classification
37
Data Mining Approaches (2)
- Supervised Learning (Estimation and Classification) | - Unsupervised Learning (Clustering)
38
o Step 1: The training data contains target class information for each record.
o Step 2: New records are classified based on the models developed on the training data.
Steps of supervised learning
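The two steps can be sketched in Python with a nearest-centroid classifier. This is only one simple stand-in for "the model", and the customer records, features, and function names below are hypothetical.

```python
# Hypothetical training data: (age, income) features plus a known target class (Step 1).
training = [
    ((25, 30_000), "churn"),
    ((30, 35_000), "churn"),
    ((50, 90_000), "stay"),
    ((55, 95_000), "stay"),
]

def fit_centroids(records):
    """Step 1: learn one centroid (mean feature vector) per target class."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Step 2: assign a new, unlabeled record to the class with the nearest centroid."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

model = fit_centroids(training)
print(classify(model, (28, 32_000)))  # → churn
```

The features are left unscaled here for brevity, so income dominates the distance; in practice they would usually be normalized first.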
39
Systems designed to generate personalized recommendations to users for products and services.
Recommender Systems
40
CRM
Customer relationship management
41
How can large firms with millions of customers know customers individually?
Through analytical CRM - scoring and prediction
42
Recommender Systems – Design (1 of 2)
● Most Popular Recommendations ● Contextual Recommendations
43
Recommendations based on transactions made by the entire population.
Most Popular Recommendations
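A minimal Python sketch of "most popular" recommendations: count how often each item appears across every transaction in the population and return the top n. The transactions and the `most_popular` name are hypothetical.

```python
from collections import Counter

# Hypothetical transactions made by the entire customer population.
transactions = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "milk", "butter"],
]

def most_popular(transactions, n):
    """Rank items by how many times they appear across all transactions."""
    counts = Counter(item for basket in transactions for item in basket)
    return [item for item, _ in counts.most_common(n)]

print(most_popular(transactions, 2))  # → ['milk', 'bread']
```

Every user gets the same list, which is what distinguishes this design from personalized recommendations.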
44
Newspapers recommend top stories
Example of Most Popular Recommendations
45
Types of Contextual Recommendations
- Time-Based | - Location-Based
46
Staples.com: Time to re-order ink
Example of time-based recommendation
47
At the mall, you could see ads/coupons from the store closest to you
Example of location-based recommendation
48
Recommender Systems – Design (2 of 2)
- Personalized Recommendations | - Based on transaction history
49
Types of recommendations based on transaction history
- Content-based - Collaborative-filtering - Social marketing
50
• Recommend similar items • Domain specific OR problem specific • “People who have bought this, also bought that”
Examples of content-based (market-basket) analysis
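"People who have bought this, also bought that" can be sketched by counting which items co-occur in the same basket as a given item. This is a simple co-occurrence count, not a full association-rule algorithm; the baskets and the `also_bought` name are hypothetical.

```python
from collections import Counter

# Hypothetical market baskets (one set of items per purchase).
baskets = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"paper", "pens"},
    {"ink", "paper"},
]

def also_bought(baskets, item, n=2):
    """Return the n items that most often appear in the same basket as `item`."""
    co_counts = Counter()
    for basket in baskets:
        if item in basket:
            co_counts.update(basket - {item})
    return co_counts.most_common(n)

print(also_bought(baskets, "printer"))  # → [('ink', 2), ('paper', 1)]
```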
51
• “Customers like you purchased these products” | • Examples: Amazon, Netflix
Examples of collaborative-filtering
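A minimal user-based collaborative-filtering sketch: find how similar other users are to the target user, then score items the target has not rated by similarity-weighted ratings. Cosine similarity over co-rated items is one common choice among several; the ratings and function names are hypothetical and this is not how Amazon or Netflix necessarily do it.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "C": 5, "E": 4},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Rank unseen items by the similarity-weighted ratings of other users."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ann", ratings))  # → ['D', 'E']
```

Ann's ratings resemble Bob's far more than Cat's, so Bob's item D outranks Cat's item E.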
52
the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
Knowledge discovery
53
_______ usually takes the most effort.
Data preparation
54
A lot of people underestimate the efforts needed for ________.
pre-processing
55
_________ & _________ are very important and take a lot of time and effort.
Data collection; pre-processing
56
Type of learning that includes a target variable or label.
Supervised learning
57
Type of learning that does not include a target variable or label.
Unsupervised learning
58
* The classification of the training data is unknown. | * The aim is to construct a set of clusters, given the data.
Unsupervised learning (clustering)
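Clustering needs no class labels. As an illustration only, here is plain k-means on a single numeric attribute, assuming hypothetical ages and hand-picked starting centroids (real tools choose starting points more carefully and handle many attributes).

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on one numeric attribute, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)]
    return centroids, clusters

ages = [22, 25, 27, 48, 52, 55]  # hypothetical, unlabeled data
centroids, clusters = kmeans_1d(ages, [22, 55])
print(clusters)  # → [[22, 25, 27], [48, 52, 55]]
```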
59
Review all data available for data mining. | Assess and explore data.
Data Understanding
60
What are my available data sources? (4)
* Corporate data sources * External data * Free external sources (e.g., Census Bureau) * Paid external sources (syndicated databases)
61
Characteristics of data selection (4)
- Find available data sources - Find metadata - Clearly identify business objective prior to data selection - Avoid data sets with averages
62
Data types (3)
- Numeric or continuous data - Symbolic data - String data
63
a data type used in programming, like integer and floating-point types, but used to represent text rather than numbers. It consists of a sequence of characters that can also contain spaces and digits.
String data
64
Examples of numeric or continuous data
* Integer: age, income | * Real: claim amount
65
Examples of _______
• Flag / dichotomy / binary variable: only has two categorical values (Yes/No, True/False, Vote/No Vote, Response/No Response)
• Categorical variable: has more than two categorical values
  • Unordered: Region, Plan type, Product code
  • Ordered: satisfaction rating (very satisfied to not very satisfied)
symbolic data
66
Examples of ___________
• Frequency distributions, pie charts, and bar charts help identify potential problems (for example, we have no Hispanics in our dataset)
• Use statistical techniques for continuous variables, for example scatter diagrams, which expose impossible values such as age = 200
Analyzing Data Distribution
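A small Python sketch of both checks, assuming hypothetical customer records: a frequency distribution (printed as a text bar chart) exposes categories that are missing entirely, and a range check flags impossible continuous values. The list of expected regions and the 0-120 plausible age range are assumptions.

```python
from collections import Counter

# Hypothetical customer records; age 200 is a deliberate data-entry error.
records = [
    {"region": "North", "age": 34},
    {"region": "South", "age": 41},
    {"region": "North", "age": 200},
    {"region": "North", "age": 29},
]

# Frequency distribution of a symbolic attribute, shown as a text bar chart.
region_counts = Counter(r["region"] for r in records)
for region, n in region_counts.most_common():
    print(f"{region:<6} {'#' * n}")

# Categories we expected but never observed (assumed list of regions).
missing_regions = {"North", "South", "East", "West"} - set(region_counts)

# Range check on a continuous attribute (0-120 is an assumed plausible range).
suspicious_ages = [r["age"] for r in records if not 0 <= r["age"] <= 120]
print(suspicious_ages)  # → [200]
```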
67
Gain insight into data.
Data exploration
68
* Report key facts on historical data | * Aid in understanding the data
Pros of Summary Statistics
69
* Find only simple relationships | * Statistical values can be deceiving
Cons of Summary Statistics
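To illustrate how a summary statistic can deceive, this sketch uses Python's standard `statistics` module on hypothetical incomes where one extreme value pulls the mean far away from what a typical customer earns.

```python
import statistics

# Hypothetical incomes: four typical customers and one extreme value.
incomes = [30_000, 32_000, 35_000, 38_000, 1_000_000]

mean_income = statistics.mean(incomes)      # 227,000: inflated by the single outlier
median_income = statistics.median(incomes)  # 35,000: closer to a typical customer
print(mean_income, median_income)
```

Reporting the median (or the full distribution) alongside the mean is one way to avoid being misled.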
70
To increase the accuracy of data mining, one has to perform _____________.
data preprocessing
71
Real-world data are (3):
- Incomplete - Noisy - Inconsistent
72
Data Quality Problems (11)
* Missing data * Data out of range * Duplicate data * Invalid data * Bad format data * Out of sequence data * Mixed data * Unformatted data * Incomplete data * Truncated data * Transposition errors
73
multiple observations for the same occurrence.
duplicate data
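A minimal de-duplication sketch, assuming each occurrence is identified by a hypothetical `id` field: keep the first record for each key and drop later repeats.

```python
# Hypothetical records; customer 101 was recorded twice.
records = [
    {"id": 101, "name": "Ann"},
    {"id": 102, "name": "Bob"},
    {"id": 101, "name": "Ann"},
]

def drop_duplicates(records, key):
    """Keep the first record for each key value, preserving the original order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique

print(drop_duplicates(records, "id"))  # → [{'id': 101, 'name': 'Ann'}, {'id': 102, 'name': 'Bob'}]
```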
74
data that is just plain wrong, incorrect.
invalid data
75
fields that should be formatted in a particular manner are not. For example, date fields that should be in ddmmyy format are recorded as ddmmmyyyy.
bad format data
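Using the card's date example, this sketch normalizes a date field to ddmmyy, converting values that were recorded as ddmmmyyyy (e.g. 15Mar2023). It assumes English month abbreviations, as in Python's default C locale; `to_ddmmyy` is a hypothetical helper.

```python
from datetime import datetime

def to_ddmmyy(raw):
    """Return the date as ddmmyy, accepting either ddmmyy or ddmmmyyyy input."""
    for fmt in ("%d%m%y", "%d%b%Y"):
        try:
            return datetime.strptime(raw, fmt).strftime("%d%m%y")
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date format: {raw!r}")

print(to_ddmmyy("15Mar2023"))  # → 150323
```

Values that match neither format raise an error instead of being silently guessed.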
76
where data should be in a particular sequence, such as account number order, but the data is unsorted or incorrectly sorted.
out of sequence data
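A quick check for out-of-sequence data, assuming records should be in ascending account-number order: report the positions where a value is smaller than the one before it. The account numbers are hypothetical.

```python
def out_of_sequence(account_numbers):
    """Return indexes where an account number is smaller than its predecessor."""
    return [
        i for i in range(1, len(account_numbers))
        if account_numbers[i] < account_numbers[i - 1]
    ]

accounts = [1001, 1002, 1005, 1003, 1007]  # hypothetical, 1003 is out of place
print(out_of_sequence(accounts))           # → [3]
print(out_of_sequence(sorted(accounts)))   # → []
```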
77
in some cases differing types of data or even multiple data files are intermixed.
mixed data
78
data that should have been recorded in a particular format, such as an address where the street name and number go in one field, the city in another, the country in another, and the postal or zip code in yet another. If all the information is combined in one field, processing becomes difficult.
unformatted data
79
files or records within a file that have missing fields or periods of coverage.
incomplete data
80
part of the record or file has been cut off or truncated. For example, data meant to cover 08:00-09:00 contains records only up to 08:45 because there was not enough space to store all of the information.
Truncated data
81
when data is copied from a source, some of it is copied incorrectly.
transposition errors
82
the ranges of attribute (feature) values differ, so one feature might overpower the others.
Data Normalization
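Min-max scaling is one common normalization method (z-score standardization is another). This sketch rescales each hypothetical attribute to [0, 1] so that income's large numbers no longer overpower age.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 60]                       # hypothetical attribute on a small scale
incomes = [20_000, 50_000, 80_000, 100_000]   # hypothetical attribute on a large scale

print(min_max_normalize(ages))     # → [0.0, 0.25, 0.5, 1.0]
print(min_max_normalize(incomes))  # → [0.0, 0.375, 0.75, 1.0]
```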
83
Characteristics of Transforming Data (4)
- Consolidate Data - Convert formats to fit algorithm - Data normalization - Scaling data values
84
Preparing your data (3)
o Manage missing values o Handle extreme or unusual values o Use nonnumeric inputs
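A sketch of the first two preparation steps, assuming a hypothetical monthly-spend attribute where `None` marks a missing value and 0-1000 is an assumed acceptable range: extreme values are capped, then missing values are filled with the median of the values present. Encoding nonnumeric inputs is not shown.

```python
import statistics

def cap(values, low, high):
    """Clip extreme values into [low, high]; missing values (None) pass through."""
    return [None if v is None else min(max(v, low), high) for v in values]

def impute_median(values):
    """Replace missing values (None) with the median of the values that are present."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

monthly_spend = [120, None, 95, 5000, 80]  # hypothetical; 5000 is extreme
prepared = impute_median(cap(monthly_spend, 0, 1000))
print(prepared)  # → [120, 107.5, 95, 1000, 80]
```

The median is used for filling because, unlike the mean, it is barely moved by the remaining large value.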
85
Select a subset of records from data set based on selection criteria
Sampling and selecting data
86
Sampling methods
* Simple random sample * Stratified random sample * Over sample
87
Sampling by segment
Stratified random sample
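A stratified random sample draws the same fraction from every segment, so small segments are still represented. A minimal sketch with hypothetical customers split by region, using a seeded `random.Random` so the draw is reproducible:

```python
import random
from collections import defaultdict

def stratified_sample(records, segment_key, fraction, seed=0):
    """Sample `fraction` of the records within each segment (at least one per segment)."""
    rng = random.Random(seed)
    by_segment = defaultdict(list)
    for record in records:
        by_segment[record[segment_key]].append(record)
    sample = []
    for members in by_segment.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 80 customers in the North, 20 in the South.
customers = [{"id": i, "region": "North" if i < 80 else "South"} for i in range(100)]
sample = stratified_sample(customers, "region", 0.1)
print(len(sample))  # → 10 (8 from North, 2 from South)
```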
88
Characteristics of __________
• Remove irrelevant, weakly relevant, and redundant attributes
• Attribute selection
• Often little degradation in predictive performance, or even better performance
Feature selection
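A deliberately simple feature-selection filter, assuming hypothetical columns: a constant column carries no information (irrelevant) and an exact copy of an earlier column adds nothing (redundant), so both are dropped. Real feature selection usually also measures each attribute's relevance to the target.

```python
def select_features(columns):
    """Drop constant columns and columns identical to one already kept."""
    kept = {}
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # constant: cannot distinguish any records
        if any(values == other for other in kept.values()):
            continue  # redundant: exact duplicate of a kept column
        kept[name] = values
    return kept

columns = {
    "age": [25, 40, 33],
    "age_copy": [25, 40, 33],
    "country": ["US", "US", "US"],
    "income": [30, 80, 55],
}
print(list(select_features(columns)))  # → ['age', 'income']
```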
89
In many cases the information that is lost by ___________ is made up for by a more accurate ___________ in the lower-dimensional space
discarding variables; mapping/sampling