Chapter 1: Introduction Flashcards

1
Q

New Sources of Data

A
  • Tweets (12 TB per day)
  • Facebook (25 TB)
  • Google, YouTube, …
  • RFID
  • Smart meters
  • Cameras
  • GPS
2
Q

#1 Source of Large Data

A
  • Customer transactional data –> how do customers behave?
3
Q

Traditional Data Warehousing

A
  • Several sources (e.g., online transaction systems) –>
  • Extractor / monitor –>
  • Integration system (<–> metadata) –>
  • Data warehouse (management decision support)
  • –> Clients
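The pipeline above can be sketched as a toy extract, integrate, load step. This is a minimal illustration only: the two source systems (`online_shop`, `store_pos`), their field names, and the `FIELD_MAP` dictionary (playing the role of the metadata in the diagram) are all hypothetical, and an in-memory SQLite table stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source systems that name the same fields differently.
online_shop = [{"cust": "C1", "amount_eur": 20.0}, {"cust": "C2", "amount_eur": 35.5}]
store_pos = [{"customer_id": "C1", "total": 12.5}]

# Hypothetical metadata: how each source's fields map onto the warehouse schema.
FIELD_MAP = {
    "online_shop": {"cust": "customer_id", "amount_eur": "amount"},
    "store_pos": {"customer_id": "customer_id", "total": "amount"},
}

def extract_and_integrate(source_name, records):
    """Rename source fields to the warehouse schema and tag each row with its origin."""
    mapping = FIELD_MAP[source_name]
    for rec in records:
        row = {mapping[k]: v for k, v in rec.items()}
        row["source"] = source_name
        yield row

# Load into an in-memory SQLite table standing in for the data warehouse.
dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE sales (customer_id TEXT, amount REAL, source TEXT)")
for name, recs in [("online_shop", online_shop), ("store_pos", store_pos)]:
    dwh.executemany(
        "INSERT INTO sales VALUES (:customer_id, :amount, :source)",
        extract_and_integrate(name, recs),
    )

# A client query that could support a management decision: revenue per customer.
revenue = dict(dwh.execute(
    "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id ORDER BY customer_id"
))
print(revenue)  # {'C1': 32.5, 'C2': 35.5}
```

The integration step is where records from different systems become comparable; clients only ever query the consolidated table.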
4
Q

Volume, Velocity, and Variety

A
  • Volume: Enterprises are awash with ever-growing data of all types.
    • Turn 12 terabytes of Tweets each day into improved product sentiment analysis
    • Convert 350 billion annual meter readings to better predict power consumption
  • Velocity: For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
    • Scrutinize 5 million trade events created each day to identify potential fraud
    • Analyze 500 million daily call detail records in real time to predict customer churn faster
  • Variety: Big data can be any type of data, structured or unstructured: text, sensor data, audio, video, click streams, log files, and more.
    • Monitor hundreds of live video feeds from surveillance cameras to target points of interest
    • Exploit the 80% data growth in images, video, and documents to improve customer satisfaction
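The volume and velocity figures on this card are easier to compare as per-second rates. The arithmetic below uses only the numbers stated above and assumes 365-day years, decimal terabytes, and load spread evenly over time (real traffic is bursty, so peak rates are higher).

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000 (non-leap year assumed)

# Figures from the card; rates assume an even spread over time.
meter_readings_per_s = 350e9 / SECONDS_PER_YEAR
trade_events_per_s = 5e6 / SECONDS_PER_DAY
call_records_per_s = 500e6 / SECONDS_PER_DAY
tweet_bytes_per_s = 12e12 / SECONDS_PER_DAY   # 12 TB/day in decimal bytes

print(f"meter readings: {meter_readings_per_s:,.0f}/s")          # 11,098/s
print(f"trade events:   {trade_events_per_s:,.0f}/s")            # 58/s
print(f"call records:   {call_records_per_s:,.0f}/s")            # 5,787/s
print(f"tweet data:     {tweet_bytes_per_s / 1e6:,.0f} MB/s")    # 139 MB/s
```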
5
Q

Aggregating Data from Different Sources

A

The challenge for most organizations is to manage and analyze the various sources of structured, unstructured, and streaming data.

  • Websites
  • Billing, ERP, CRM
  • RFID
  • Network switches
  • Social media
6
Q

New Trends in Data Organization

A
  • Main-memory databases can run queries in seconds that used to take hours
  • Distributed file systems allow for effective parallelization (e.g., Apache Hadoop)
7
Q

Business Analytics (Definition)

A

Business analytics makes extensive use of statistical analysis, including explanatory and predictive modeling, and fact-based management to drive decision making. It is therefore closely related to management science. Analytics may be used as input for human decisions or may drive fully automated decisions.

8
Q

Descriptive Analytics

A

What has occurred?

How much did I sell?

BI, Data engineering, statistics …

Data Engineering and Statistics:

Organize data, execute large queries, describe means and trends, and test hypotheses

9
Q

Predictive Analytics

A

What will occur?

Try to understand behaviour, e.g., predict which customers will switch.

Data Mining and Econometrics

Forecast events, predict time series, or model customers' discrete choice decisions

10
Q

Prescriptive Analytics

A

What should occur?

Network flow, Management science …

Algorithms and Optimization

Develop algorithms and optimization models for planning, scheduling, pricing, and revenue management.
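A prescriptive pricing question ("What price should we set?") can be posed as an optimization problem. The sketch below assumes a hypothetical linear demand curve, demand = 1000 − 20 · price, and a hypothetical unit cost of 10, then searches a grid of candidate prices for the one that maximizes profit. Real revenue-management models use richer demand models and proper solvers; the grid search only keeps the example self-contained.

```python
# Hypothetical demand model and cost; not estimated from real data.
def demand(price: float) -> float:
    return max(0.0, 1000 - 20 * price)

UNIT_COST = 10.0

def profit(price: float) -> float:
    return (price - UNIT_COST) * demand(price)

# Exhaustive search over prices from 0.00 to 50.00 in 0.50 steps.
candidates = [p / 2 for p in range(0, 101)]
best_price = max(candidates, key=profit)
print(best_price, profit(best_price))  # 30.0 8000.0
```

For this demand curve the answer can be checked analytically: profit = −20p² + 1200p − 10000 peaks at p = 1200 / 40 = 30.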

11
Q

Relationship to Business Intelligence (BA related to predictive / inductive statistics and BI related to descriptive analytics / statistics)

A
  • Business analytics (related to predictive analytics / inductive statistics)
    • focuses on developing new insights and understanding of business performance based on data and statistical methods.
    • may be used as input for human decisions or may drive fully automated decisions.
  • Business intelligence (related to descriptive analytics / statistics)
    • traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning, which is also based on data and statistical methods.
    • is often associated with querying, reporting, OLAP, and "alerts".
12
Q

From Data to Information (Flow)

A
  • Data consolidation (data input and queries) –> data warehouse (DWH)
  • Selection and processing (make sense of large tables)
  • Business analytics (find a model that fits the data)
  • Interpretation and evaluation (insights)
13
Q

Predictive Analytics

A
  • Algorithms and Databases
    • Association Rule Algorithms
    • Algorithm Design Techniques
    • Algorithm Analysis
    • Statistics and Econometrics
  • Statistics and Econometrics
    • Bayes Theorem
    • Regression Analysis
    • EM Algorithm
    • Clustering
    • Time Series Analysis
  • Machine Learning and Data Mining
    • Decision Tree and other Classification Algorithms
    • Clustering
    • Neural Networks
14
Q

Numerical prediction

A

Given a collection of data with known numeric outputs, create a function that outputs a predicted value from a new set of inputs.

E.g., given an animal's gestation time, predict its maximum life span.
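Simple linear regression is one way to build such a prediction function. The data points below are invented for illustration (hypothetical gestation times in days and life spans in years), not real biological measurements; the closed-form ordinary least-squares formulas give the slope and intercept.

```python
# Hypothetical training data: (gestation_days, max_life_span_years)
data = [(60, 10), (120, 18), (180, 25), (240, 34), (300, 40)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Ordinary least squares: slope = cov(x, y) / var(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
    (x - mean_x) ** 2 for x, _ in data
)
intercept = mean_y - slope * mean_x

def predict(gestation_days: float) -> float:
    """Predicted maximum life span (years) for a new gestation time."""
    return intercept + slope * gestation_days

print(round(predict(200), 1))  # 27.9
```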

15
Q

Classification

A
  • From data with known labels, create a classifier that determines which label to apply to a new observation
  • E.g. Identify new loan applicants as low, medium, or high risk based on existing applicant behavior.
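One simple classifier is k-nearest neighbours: label a new observation with the majority label among the k most similar known cases. The sketch below is illustrative only; the two features (income in thousands, number of past late payments), the training records, and k = 3 are hypothetical choices. Real credit scoring would at least scale the features (here income dominates the distance) and validate the model on held-out data.

```python
import math
from collections import Counter

# Hypothetical labelled applicants: ((income_k, late_payments), risk_label)
training = [
    ((80, 0), "low"), ((75, 1), "low"), ((90, 0), "low"),
    ((50, 2), "medium"), ((45, 3), "medium"), ((55, 2), "medium"),
    ((25, 6), "high"), ((30, 5), "high"), ((20, 7), "high"),
]

def classify(applicant, k=3):
    """Return the majority risk label among the k nearest training points."""
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], applicant))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((78, 1)), classify((48, 2)), classify((28, 6)))  # low medium high
```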
16
Q

Clustering

A
  • Identify “natural” groupings in data
  • Unsupervised learning, no predefined groups
  • E.g. Identify clusters of “similar” customers.

Difference from classification: the groups are not known in advance.
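k-means is one common clustering algorithm (the card does not name a specific one): it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below runs it on hypothetical customer data (visits per year, average basket value) with k = 2 and fixed starting centroids so the result is deterministic. No labels are used anywhere, which is what makes it unsupervised.

```python
import math

# Hypothetical customers: (visits_per_year, avg_basket_value)
customers = [(2, 15), (3, 20), (4, 18), (40, 60), (45, 55), (50, 65)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means; returns the centroids and the last cluster assignment."""
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, assignment

centroids, labels = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```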

17
Q

Association Rule Analysis

A
  • Identify relationships in data from co-occurring terms or items.
  • E.g., analyze grocery store purchases to identify items most commonly purchased together.

Market basket analysis (milk, sugar, eggs)
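Association rules are usually scored by support (the fraction of baskets containing all items of the rule) and confidence (how often the consequent appears when the antecedent does). The sketch below counts both for every ordered item pair over a handful of hypothetical shopping baskets, with an assumed minimum support of 0.4. Algorithms such as Apriori exist to avoid enumerating all combinations on real data; this brute-force version does not attempt that.

```python
from itertools import permutations

# Hypothetical grocery baskets.
baskets = [
    {"milk", "sugar", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread"},
    {"sugar", "eggs"},
    {"milk", "sugar", "eggs", "bread"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

items = sorted(set().union(*baskets))
rules = []
for a, b in permutations(items, 2):
    pair_support = support({a, b})
    if pair_support >= 0.4:  # minimum support threshold (assumed)
        confidence = pair_support / support({a})
        rules.append((a, b, pair_support, confidence))

for a, b, s, c in sorted(rules, key=lambda r: -r[3]):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Note that confidence is asymmetric: in this data every basket with sugar also has eggs, but not every basket with eggs has sugar.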

18
Q

What is a Model?

A

Mathematical functions

  • Mathematical combination of attribute values
  • E.g. linear model, non-linear model
  • CPU performance prediction

E.g.:

  • Decision tree:
    • Study: >= 10hrs –> Do homework <10 hours test well…
  • Neural networks:
19
Q

Model selection

A
  • Build model
  • Evaluate performance
  • Meets criteria? No –> build a new model
  • Yes –> interpret model
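The build, evaluate, check loop on this card can be written directly. In this sketch the candidate models (a constant "predict the mean" model, then a straight line), the data, the train/validation split, and the acceptance criterion (validation mean squared error below 2.0) are all hypothetical choices made for illustration.

```python
# Hypothetical observations (x, y), split into training and validation sets.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
valid = [(6, 12.0), (7, 13.8)]
MAX_MSE = 2.0  # acceptance criterion (assumed)

def fit_mean(data):
    m = sum(y for _, y in data) / len(data)
    return lambda x: m

def fit_line(data):
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x, _ in data)
    return lambda x: my + slope * (x - mx)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

chosen = None
for name, fit in [("mean", fit_mean), ("line", fit_line)]:
    model = fit(train)          # build model
    error = mse(model, valid)   # evaluate performance
    if error < MAX_MSE:         # meets criteria? yes -> interpret this model
        chosen = (name, model, error)
        break                   # no -> loop on and build the next candidate

print(chosen[0] if chosen else "no model met the criterion")  # line
```

Evaluating on data the model was not fitted to keeps the check honest; otherwise more flexible models would always look better.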
20
Q

Most important algorithms

A
  • Regression
  • Decision tree
  • Cluster analysis
21
Q

Examples of Analytics in Retailing

A
  • Campaign management
  • Product recommendations
  • Customer profitability analysis
  • Customer segmentation analysis
  • Pricing products
  • Forecasting revenues
  • Analysis of clickstream data
22
Q

CRM Marketing and examples

A
  • CRM marketing is #1 area to which data mining is applied.
  • Recommender systems (systems for recommending items): Amazon, Netflix, …
    • Increase sales
    • Customer A buys some items
    • Customer B searches for an item A bought and is shown the other items A also bought …
  • Collaborative Filtering
    • Maintain a database of many users’ ratings of a variety of items. For a given user, find other similar users whose ratings strongly correlate with the current user.
    • Recommend items rated highly by these similar users, but not rated by the current user.
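The two collaborative-filtering steps above (find users whose ratings correlate strongly, then recommend what they rated highly) can be sketched in a few lines. The ratings are hypothetical, similarity is Pearson correlation over co-rated items, and the thresholds (correlation above 0.5, rating of at least 4) are illustrative assumptions, not values used by Amazon, Netflix, or any other real system.

```python
import math

# Hypothetical user -> {item: rating} data (1-5 stars).
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "E": 4},
}

def pearson(u, v):
    """Pearson correlation of two users' ratings on the items both rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    xs = [ratings[u][i] for i in common]
    ys = [ratings[v][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0

def recommend(user, min_corr=0.5, min_rating=4):
    """Items rated highly by strongly correlated users but not rated by `user`."""
    recs = set()
    for other in ratings:
        if other != user and pearson(user, other) > min_corr:
            recs |= {i for i, r in ratings[other].items()
                     if r >= min_rating and i not in ratings[user]}
    return sorted(recs)

print(recommend("ann"))  # ['D']
```

Here `cat` is excluded for `ann` because their ratings are negatively correlated, so item E is not recommended even though `cat` rated it highly.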