Test 1 ISYS 4293 Flashcards
(80 cards)
Business Intelligence and Data Mining
Data mining is a collection of knowledge-discovery technologies used to perform Business Intelligence in order to support an organization's decision-making
Cross-Industry Standard Process for Data Mining (CRISP-DM)
The standard process for carrying out data mining
(1) Business Problem Understanding
-Define business requirements and objectives
-Translate objectives into data mining problem definition
-Prepare initial strategy to meet objectives
(2) Data Understanding Phase
-Collect data
-Assess data quality
-Perform exploratory data analysis (EDA)
(3) Data Preparation Phase
-Cleanse, prepare, and transform data set
-Prepares for modeling in subsequent phases
-Select cases and variables appropriate for analysis
(4) Modeling Phase
-Select and apply one or more modeling techniques
-Calibrate model settings to optimize results
-If necessary, additional data preparation may be required
(5) Evaluation Phase
-Evaluate one or more models for effectiveness
-Determine whether defined objectives are achieved
-Make decision regarding data mining results before deploying to field
(6) Deployment Phase
-Make use of models created
-Simple deployment: generate report
-Complex deployment: implement additional data mining effort in another department
-In business, customer often carries out deployment based on model
How many data mining tasks?
6 data mining tasks
Data Mining Task: Description
-Describes general patterns and trends
-Easy to interpret and explain
-Transparent Models
-Pictures and numbers
-E.g. Scatterplots, Descriptive Stats
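A minimal sketch of the descriptive-statistics side of this task, using only Python's standard `statistics` module. The `purchases` values and the `describe` helper are made up for illustration.

```python
import statistics

# Hypothetical sample: monthly purchase totals for five customers.
purchases = [120.0, 85.5, 230.0, 99.0, 150.5]

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = describe(purchases)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The output is a short table of numbers that anyone can read, which is what makes description a transparent task.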
Data Mining Task: Estimation
-Target Variable = Numerical
-Uses numerical or categorical predictor variables (IVs) to approximate changes in a numerical target variable (DV)
-Ex: Estimate a student's Graduate GPA from their Undergrad GPA
-E.g. Correlation, Linear Regression
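The GPA example can be sketched as a simple least-squares line. This is an illustrative sketch only: the GPA pairs are invented, and `fit_line` and `estimate_graduate_gpa` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up (undergrad GPA, graduate GPA) pairs, not real student data.
undergrad = [2.0, 2.5, 3.0, 3.5, 4.0]
graduate = [2.6, 2.9, 3.2, 3.5, 3.8]

intercept, slope = fit_line(undergrad, graduate)

def estimate_graduate_gpa(undergrad_gpa):
    """Estimate the numerical target (DV) from the numerical predictor (IV)."""
    return intercept + slope * undergrad_gpa

print(f"Estimated graduate GPA for a 3.2 undergrad GPA: {estimate_graduate_gpa(3.2):.2f}")
```

The target is a number, so the result is an estimate on a continuous scale rather than a category.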
Data Mining Task: Classification
-Target Variable (DV) = Categorical
-Examples:
Simple vs Complex tasks
Fraudulent card transactions
Income brackets (ex. high, middle, low)
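One way to illustrate the fraudulent-transaction example is a k-nearest-neighbors vote. This is a sketch under assumptions: k-nearest neighbors is just one possible classification technique, and the transactions, the two features, and the `classify` helper are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled transactions: (amount in dollars, miles from home) -> label.
training = [
    ((12.0, 1.0), "legitimate"),
    ((45.0, 3.0), "legitimate"),
    ((30.0, 2.0), "legitimate"),
    ((900.0, 400.0), "fraudulent"),
    ((1200.0, 650.0), "fraudulent"),
    ((750.0, 500.0), "fraudulent"),
]

def classify(point, examples, k=3):
    """Assign the majority label among the k nearest training examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((25.0, 2.5), training))
print(classify((1000.0, 550.0), training))
```

The target here is a category ("legitimate" or "fraudulent"), which is what separates classification from estimation.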
Data Mining Task: Prediction
-Results lie in the future
-There is a time component in this task
-Ex: What is the probability of Razorbacks winning a game with a particular combination of player profiles?
Data Mining Task: Association
-Finding attributes of data that go together
-Profiling relationships between two or more attributes
-Understand consequent behaviors based on prior (antecedent) behaviors
-Ex: Supermarkets use affinity analysis to see what items are purchased together
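The supermarket example can be sketched with two standard association measures, support and confidence. The baskets below are invented, and the function names are hypothetical.

```python
# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(f"support(bread, milk) = {support({'bread', 'milk'}, baskets):.2f}")
print(f"confidence(bread -> milk) = {confidence({'bread'}, {'milk'}, baskets):.2f}")
```

No target variable is involved; the measures only describe which items tend to appear together.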
Data Mining Task: Clustering
-No target variable
-Segmentation of data
-Ex: Focused marketing campaigns
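Segmentation can be sketched with a small k-means routine on one-dimensional data. This is an assumption-laden illustration: k-means is only one clustering technique, and the spending figures, starting centers, and `kmeans_1d` helper are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd-style k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical annual spend per customer, in dollars.
spend = [100, 120, 110, 900, 950, 1000]
centers, clusters = kmeans_1d(spend, centers=[100, 1000])

for center, members in zip(centers, clusters):
    print(f"segment around {center:.0f}: {sorted(members)}")
```

The segments are found from the data alone, with no target variable, and each one could then receive its own focused marketing campaign.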
Data Mining Task: Learning Types
Supervised and Unsupervised
Supervised
-Have a target variable
-Task:
Classification (Categorical Target Variable)
Estimation (Numeric Target Variable)
Description
Prediction
Unsupervised
-No target variable
-Task:
Association
Clustering
Fallacy 1:
-Set of tools can be turned loose on data repositories
-Finds answers to all business problems
Reality 1:
-No automatic data mining tools solve problems
-Rather, data mining is a process (CRISP-DM)
-Integrates into overall business objectives
Fallacy 2:
-Data mining process is autonomous
-Requires little oversight
Reality 2:
-Requires significant intervention during every phase
-After model deployment, new data often requires the model to be updated
-Continuous evaluative measures monitored by analysts
Fallacy 3:
-Data mining quickly pays for itself
Reality 3:
-Return rates vary
-Depending on startup, personnel, data preparation costs, etc.
Fallacy 4:
-Data mining software easy to use
Reality 4:
-Ease of use varies across projects
-Analysts must combine subject matter knowledge with specific problem domain
Fallacy 5:
-Data mining identifies causes of business problems
Reality 5:
-Knowledge discovery process uncovers patterns of behavior
-Humans interpret results and identify causes
Fallacy 6:
-Data mining automatically cleans data in databases
Reality 6:
-Data mining often uses data from legacy systems
-Data possibly not examined or used in years
-Organizations starting data mining efforts confronted with huge data preprocessing task