New Sources of Data
 Tweets: 12 TB
 Facebook: 25 TB
 Google, YouTube ...
 RFID
 Smart Meters
 Cameras
 GPS
#1 Source of large data
 Customer transactional data > how do customers behave?
Traditional Data Warehousing
 Several Sources (e.g. online transaction system) >
 Extractor / Monitor >
 Integration System ( Meta Data) >
 Data warehouse (Mngmt decision support)
 > Clients
Volume, Velocity, and Variety

Volume: Enterprises are awash with ever-growing data of all types.

Turn 12 terabytes of Tweets each day into improved product sentiment analysis

Convert 350 billion annual meter readings to better predict power consumption

Velocity: For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.

Scrutinize 5 million trade events created each day to identify potential fraud

Analyze 500 million daily call detail records in real time to predict customer churn faster

Variety: Big data is any type of data, structured or unstructured, such as text, sensor data, audio, video, click streams, log files and more.

Monitor hundreds of live video feeds from surveillance cameras to target points of interest

Exploit the 80% data growth in images, video and documents to improve customer satisfaction
Aggregating Data from Different Sources
The challenge for most organizations is to manage and analyze the various sources of structured, unstructured, and streaming data.
 Websites
 Billing, ERP, CRM
 RFID
 Network switches
 Social media
New Trends in Data Organization
 Main-memory databases can run queries in seconds that previously took hours
 Distributed file systems allow for effective parallelization (e.g., Apache Hadoop)
Business Analytics (Definition)
Business analytics makes extensive use of statistical analysis, including explanatory and predictive modeling, and fact-based management to drive decision making. It is therefore closely related to management science. Analytics may be used as input for human decisions or may drive fully automated decisions.
Descriptive Analytics
What has occurred?
How much did I sell?
BI, Data engineering, statistics ...
Data Engineering and Statistics:
Organize data, execute large queries, describe means, trends, and test hypotheses
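A minimal sketch of a descriptive question such as "How much did I sell?", assuming a small in-memory list of sales records. The record layout, the values, and the helper name revenue_by are made up for illustration; in practice this would be a query against the data warehouse.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (month, product, revenue)
sales = [
    ("2024-01", "widget", 120.0),
    ("2024-01", "gadget", 80.0),
    ("2024-02", "widget", 150.0),
    ("2024-02", "gadget", 70.0),
]

def revenue_by(records, key_index):
    """Sum revenue grouped by the field at key_index (0 = month, 1 = product)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

by_product = revenue_by(sales, 1)   # what has been sold, per product
by_month = revenue_by(sales, 0)     # a simple trend over time
average_sale = mean(r[2] for r in sales)
```

Everything here only summarizes what already happened; nothing is predicted.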
Predictive Analytics
What will occur?
Try to understand behaviour. E.g. switching customers
Data Mining and Econometrics
Forecast events, predict time series, or discrete choice decisions of customers
Prescriptive Analytics
What should occur?
Network flow, Management science ...
Algorithms and Optimization
Develop algorithms and optimization models for planning, scheduling, pricing, and revenue management.
Relationship to Business Intelligence (BA related to predictive / inductive statistics and BI related to descriptive analytics / statistics)

Business analytics (related to predictive analytics / inductive statistics)

focuses on developing new insights and understanding of business performance based on data and statistical methods.

may be used as input for human decisions or may drive fully automated decisions.

Business intelligence (related to descriptive analytics / statistics)

traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning, which is also based on data and statistical methods.

is often associated with querying, reporting, OLAP, and "alerts".
From Data to Information (Flow)
 Data consolidation (data input and queries) > DWH
 Selection and processing (make sense out of large table)
 Business analytics (model that fits data)
 Interpretation and evaluation (insights)
Predictive Analytics

Algorithms and Databases

Association Rule Algorithms

Algorithm Design Techniques

Algorithm Analysis

Statistics and Econometrics

Bayes Theorem

Regression Analysis

EM Algorithm

Clustering

Time Series Analysis

Machine Learning and Data Mining

Decision Tree and other Classification Algorithms

Clustering

Neural Networks
Numerical prediction
Given a collection of data with known numeric outputs, create a function that outputs a predicted value from a new set of inputs.
E.g. Given gestation time of an animal, predict its maximum life span.
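As one possible illustration of numerical prediction, the sketch below fits a straight line by ordinary least squares and uses it to predict a new value. The data points are invented for illustration and are not real animal measurements; fit_line and predict are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Made-up illustrative values (gestation in days, maximum life span in years).
gestation_days = [20, 60, 110, 280, 340]
life_span_years = [3, 10, 15, 40, 45]

model = fit_line(gestation_days, life_span_years)
estimate = predict(model, 150)  # predicted life span for a 150-day gestation
```

A straight line is only one choice of function; a nonlinear model would follow the same fit-then-predict pattern.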
Classification
 From data with known labels, create a classifier that determines which label to apply to a new observation
 E.g. Identify new loan applicants as low, medium, or high risk based on existing applicant behavior.
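A minimal sketch of a classifier, here k-nearest neighbours, which is just one of many possible classification methods. The applicant features (debt-to-income ratio, number of missed payments) and all values are assumptions made up for this example.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    nearest = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical existing applicants with known labels.
# Features are not rescaled here; a real application would usually normalize them.
applicants = [
    ((0.10, 0), "low"), ((0.15, 0), "low"), ((0.20, 1), "low"),
    ((0.35, 2), "medium"), ((0.40, 2), "medium"), ((0.45, 3), "medium"),
    ((0.70, 5), "high"), ((0.80, 6), "high"), ((0.90, 7), "high"),
]

risk = knn_classify(applicants, (0.75, 6))  # label for a new applicant
```

The key point is that the labels of the existing applicants are known in advance, which is what separates classification from clustering.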
Clustering

Identify “natural” groupings in data

Unsupervised learning, no predefined groups

E.g. Identify clusters of “similar” customers.
Difference to classification: you do not know the groups
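A rough sketch of clustering using k-means, chosen here only as a common example of an unsupervised method. The customer features (visits per month, average basket value) are invented, and the first k points are used as starting centroids to keep the example deterministic; real implementations usually pick starting points more carefully.

```python
import math

def k_means(points, k, iterations=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point into the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket value)
customers = [(2, 20), (3, 25), (2, 22), (20, 80), (22, 85), (21, 78)]
centroids, clusters = k_means(customers, k=2)
```

No labels are supplied anywhere; the two groups emerge from the distances alone.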
Association Rule Analysis
 Identify relationships in data from co-occurring terms or items.
 E.g., analyze grocery store purchases to identify items most commonly purchased together.
Market basket analysis (milk, sugar, eggs)
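A small sketch of the counting behind market basket analysis, assuming a handful of invented baskets. It computes the support of item pairs (how often they occur together) and the confidence of a rule such as "milk implies eggs"; the function names are hypothetical, and full association rule algorithms do considerably more than this.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Fraction of baskets that contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

# Made-up grocery baskets.
baskets = [
    {"milk", "sugar", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread"},
    {"sugar", "eggs"},
]

support = pair_support(baskets)
milk_then_eggs = confidence(baskets, "milk", "eggs")
```

Pairs are stored in alphabetical order, so the key for milk and eggs is ("eggs", "milk").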
What is a Model?
Mathematical functions
 Mathematical combination of attribute values
 E.g. linear model, nonlinear model
 CPU performance prediction
E.g.:
 Decision tree:
 Study: >= 10hrs > Do homework <10 hours test well...
 Neural networks:
Model selection
 Build model
 evaluate performance
 meet criteria? no > build model
 yes: interpret model
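The build / evaluate / check loop above could be sketched as follows. The two candidate models (a constant mean and a least-squares line), the error measure, the threshold, and the data are all assumptions chosen to keep the example short.

```python
def build_mean(train):
    """Baseline: always predict the mean of the training outputs."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def build_line(train):
    """Least-squares straight line through the training points."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + slope * (x - mx)

def mean_abs_error(model, data):
    """Average absolute difference between prediction and known output."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

def select_model(candidates, train, validation, max_error):
    """Try candidates in order; return the first that meets the error criterion."""
    for name, build in candidates:
        model = build(train)                       # build model
        error = mean_abs_error(model, validation)  # evaluate performance
        if error <= max_error:                     # meet criteria?
            return name, model, error              # yes: go on to interpret it
    return None                                    # no candidate was good enough

# Made-up data following y = 2x + 1.
train = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
validation = [(6, 13), (7, 15)]
chosen = select_model([("mean", build_mean), ("line", build_line)],
                      train, validation, max_error=0.5)
```

Evaluating on data that was not used for building the model is what makes the criterion meaningful.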
Most important algorithms
 Regression
 Decision tree
 Cluster analysis
Examples of Analytics in Retailing

Campaign management

Product recommendations

Customer profitability analysis

Customer segmentation analysis

Pricing products

Forecasting revenues

Analysis of clickstream data
CRM Marketing and examples
 CRM marketing is the #1 area to which data mining is applied.
 Recommender systems (systems for recommending items): Amazon, Netflix ...
 increase sales
 Customer A buys
 Customer B searches for what A bought and is shown what A also bought...

Collaborative Filtering

Maintain a database of many users’ ratings of a variety of items. For a given user, find other similar users whose ratings strongly correlate with the current user.

Recommend items rated highly by these similar users, but not rated by the current user.
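A minimal sketch of user-based collaborative filtering along the lines described above, using Pearson correlation over co-rated items as the similarity measure. The rating data, the thresholds min_similarity and min_rating, and the function names are assumptions for illustration only.

```python
import math

def pearson(a, b):
    """Pearson correlation over items both users rated; 0.0 if undefined."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def recommend(ratings, user, min_similarity=0.5, min_rating=4):
    """Items rated highly by similar users that `user` has not rated yet."""
    seen = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = pearson(seen, theirs)
        if sim < min_similarity:
            continue  # ignore users whose ratings do not correlate strongly
        for item, rating in theirs.items():
            if item not in seen and rating >= min_rating:
                scores[item] = max(scores.get(item, 0.0), sim * rating)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ratings database: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"A": 5, "B": 3, "C": 1},
    "bob": {"A": 5, "B": 3, "C": 1, "D": 5},
    "cat": {"A": 1, "B": 3, "C": 5, "E": 5},
}
suggestions = recommend(ratings, "ann")
```

Here bob's ratings match ann's, so his highly rated item D is suggested, while cat's tastes run opposite and her item E is not.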