Definitions Flashcards
(27 cards)
Data Science is the art of turning data into actions.
Combines: Domain Expertise, Statistics, and Computing Skills
Flows back and forth between deductive and inductive reasoning
Relatively new discipline in which methodologies and frameworks are still being solidified
Inter-related Concepts of Data Science
Analytics, Business Analytics, Data Science, Business Intelligence, Data Analytics, Big Data, Statistical Learning.
Deductive Reasoning
Theory-driven: Hypothesis → Analytics
Inductive Reasoning
Empirically driven: Analytics → Hypothesis
Big Data:
Data in which the volume, variety, or velocity of information prohibits analysis with conventional desktop- or server-scale tools.
Distributed Processing (or computing):
A solution to the big data problem: platforms that let many individual machines work simultaneously on the same big data problem (e.g., Hadoop).
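Hadoop's MapReduce model splits a job into a map step, run on each machine's share of the data, and a reduce step that merges the partial results. The sketch below only simulates that pattern on one machine in plain Python; the chunking scheme and the function names are illustrative assumptions, not Hadoop's API.

```python
from collections import Counter
from functools import reduce

def map_chunk(lines):
    """Map step: one simulated worker counts the words in its own chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(a, b):
    """Reduce step: merge the partial counts from two workers."""
    return a + b

def word_count(lines, n_workers=3):
    # Partition the data so each simulated worker gets its own slice.
    chunks = [lines[i::n_workers] for i in range(n_workers)]
    # On a real cluster these map calls would run in parallel on separate machines.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())

docs = ["big data needs big tools", "data science uses data", "tools for big data"]
print(word_count(docs)["data"])  # → 4
```

The result does not depend on how many workers the data is split across, which is what makes the work safe to distribute.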
Machine Learning:
Most closely associated with inductive reasoning. Algorithms that allow computers to learn from data without explicit instructions from the operator.
Supervised Learning:
Machine learning in which the outcome (target variable) is defined by the operator and is known for the training data. Think of predicting outcomes.
Unsupervised Learning:
Machine learning in which no outcome is defined. Think of grouping observations (e.g., clustering) or dimensions (e.g., dimensionality reduction).
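Clustering is a common unsupervised task: the algorithm receives observations with no outcome column and groups them by similarity. A minimal k-means sketch in plain Python, assuming 2-D numeric points and a naive initialization (the first k points); the data are made up for illustration.

```python
import math

def kmeans(points, k, n_iter=10):
    """Group unlabeled 2-D points into k clusters with plain k-means."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, clusters

# Made-up observations with no outcome column.
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
# → [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
```

No label was ever supplied; the two groups emerge from the distances alone.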
Regression:
A class of problems in which the objective is to predict the value of a numeric outcome.
Classification:
A class of problems in which the objective is to predict which group or “class” an observation is likely to belong to.
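k-nearest neighbors (k-NN) is one simple classification technique: a new observation is assigned the class that is most common among the k labeled observations closest to it. A sketch with made-up data; Euclidean distance and k = 3 are assumptions chosen for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the class of `query` by majority vote of its k nearest labeled neighbors.

    `train` is a list of (features, label) pairs.
    """
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: (hours studied, hours slept) -> exam result class
train = [
    ((1.0, 4.0), "fail"), ((2.0, 5.0), "fail"), ((1.5, 6.0), "fail"),
    ((6.0, 7.0), "pass"), ((7.0, 8.0), "pass"), ((8.0, 6.5), "pass"),
]
print(knn_predict(train, (6.5, 7.0)))  # → pass
print(knn_predict(train, (1.2, 5.0)))  # → fail
```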
Parametric Techniques:
Techniques that make specific assumptions about the nature and/or shape of the relationships between variables. E.g., linear regression assumes a straight-line relationship and fits the slope of that line.
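Linear regression, the card's example, assumes the relationship is a straight line, so fitting it reduces to estimating two parameters: an intercept and a slope. A minimal ordinary least-squares sketch in plain Python; the toy data lie exactly on y = 1 + 2x so the recovered parameters are easy to check.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    The whole model is summarized by two numbers, which is what makes it parametric.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x
intercept, slope = fit_line(xs, ys)
print(intercept, slope)        # → 1.0 2.0
print(intercept + slope * 6)   # predicted numeric outcome at x = 6 → 13.0
```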
Non-parametric Techniques:
Techniques that make no specific assumptions about the nature and/or shape of the relationships between variables. E.g., decision trees.
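Decision trees, the card's example, learn split points from the data instead of assuming a line or curve. The smallest possible tree, a single split called a stump, can be sketched in plain Python; this assumes one numeric feature and exactly two classes, and the data are illustrative.

```python
def fit_stump(xs, labels):
    """Fit a one-split decision tree ("stump") on one numeric feature.

    Assumes exactly two classes. No line or curve shape is assumed: the
    split point is found by trying each observed value as a threshold.
    """
    classes = sorted(set(labels))
    best = None  # (errors, threshold, class if x <= t, class if x > t)
    for t in sorted(set(xs)):
        for left, right in (classes, classes[::-1]):
            errors = sum(
                (lab != left) if x <= t else (lab != right)
                for x, lab in zip(xs, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best[1:]

def predict_stump(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right

xs = [1, 2, 3, 10, 11, 12]
labels = ["low", "low", "low", "high", "high", "high"]
stump = fit_stump(xs, labels)
print(stump)                    # → (3, 'low', 'high')
print(predict_stump(stump, 5))  # → high
```

A full decision tree repeats this search recursively inside each side of the split.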
Unstructured Data:
Data that has no easily identified structure (e.g., free-form text responses).
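One common way to give free-form text some structure is a bag-of-words representation: each response becomes a table of word counts that conventional tools can analyze. A minimal sketch; the tokenization rule (lowercase letters and apostrophes only) and the sample responses are assumptions for illustration.

```python
import re
from collections import Counter

def to_word_counts(responses):
    """Turn free-form text into structured rows: one word-count table per response."""
    rows = []
    for text in responses:
        words = re.findall(r"[a-z']+", text.lower())
        rows.append(Counter(words))
    return rows

responses = [
    "Great service, great price!",
    "Service was slow.",
]
rows = to_word_counts(responses)
print(rows[0]["great"], rows[1]["slow"])  # → 2 1
```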
Types of Analytics
Descriptive Analytics: What is or has been?
Predictive Analytics: What is likely to happen?
Prescriptive Analytics: What should you do?
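A toy example can make the three questions concrete. The sales figures, the straight-line trend, and the 10% safety-stock rule below are all hypothetical assumptions, chosen only to show how each type of analytics answers a different question about the same data.

```python
# Hypothetical weekly unit sales for one product (made-up numbers).
sales = [100, 110, 120, 130]

# Descriptive: what has happened?
average = sum(sales) / len(sales)

# Predictive: what is likely to happen? (naive assumption: the recent trend continues)
trend = (sales[-1] - sales[0]) / (len(sales) - 1)
forecast = sales[-1] + trend

# Prescriptive: what should you do? (hypothetical rule: stock the forecast plus 10%)
current_stock = 90
order_qty = max(0, round(forecast * 1.10) - current_stock)

print(average, forecast, order_qty)  # → 115.0 140.0 64
```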
Good Analytics
Creates Action: What will be different?
Understands context: What are the physics of the problem?
Avoids Bias: In the model and in the setup
Focuses on Impact: What value is generated?
Data Science is a response and solution to the data deluge
Tools and processes to deal with “Big Data”
Creates an advantage for companies that use it effectively
Data Science can handle a breadth of problems
Different domains
Different outcomes
Different purposes (Descriptive, Predictive, Prescriptive)
CRISP-DM Definition
Cross-Industry Standard Process for Data Mining
CRISP-DM Components
Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
CRISP-DM: Business Understanding
Determine Business Objectives:
Business background
Objectives and Success Criteria
Assess the situation:
Resource Inventory (e.g. budget, people)
Requirements, Assumptions, Constraints, Risks, Contingencies
Cost/Benefit
Determine Data Mining Goals and Success Criteria
Produce a Project Plan
CRISP-DM: Data Understanding
Collect Initial Data
Describe the Data
Explore the Data
Verify Data Quality
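The Data Understanding tasks can be sketched on a tiny made-up column of customer ages. The plausible-range rule (0 to 120) is an assumption for illustration; note how the exploration step surfaces the odd minimum that the quality check then flags.

```python
import statistics

# Hypothetical initial data: customer ages with one missing and one suspicious entry.
ages = [34, 29, None, 41, -3, 38, 52]

# Describe: record count and missing values.
n_records = len(ages)
n_missing = sum(a is None for a in ages)
present = [a for a in ages if a is not None]

# Explore: quick summary statistics of the values that are present.
summary = (min(present), max(present), statistics.median(present))

# Verify quality: flag values outside a plausible range (0 to 120 is an assumed rule).
invalid = [a for a in present if not 0 <= a <= 120]

print(n_records, n_missing)  # → 7 1
print(summary)               # → (-3, 52, 36.0)
print(invalid)               # → [-3]
```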
CRISP-DM: Data Preparation
Select Data
Clean Data:
Missing values, invalid values
Construct New Data:
Transformations, Structure Data
Integrate Data
Format Data
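A minimal sketch of the Data Preparation tasks on two made-up sources. Dropping unusable rows, the age-group feature, the default spend of 0.0, and upper-case region codes are illustrative choices, not steps CRISP-DM prescribes; imputing missing values is a common alternative to dropping rows.

```python
# Hypothetical raw data from two sources (made up for illustration).
customers = [
    {"id": 1, "age": "34", "region": "north", "notes": "called twice"},
    {"id": 2, "age": "", "region": "south", "notes": ""},        # missing age
    {"id": 3, "age": "abc", "region": "North", "notes": "vip"},  # invalid age
]
spend_by_id = {1: 250.0, 2: 80.0, 3: 120.0}  # second source, keyed by customer id

def parse_age(raw):
    """Return the age as an int, or None if it is missing or invalid."""
    try:
        return int(raw)
    except ValueError:
        return None

prepared = []
for row in customers:
    # Select: keep only the fields needed for modeling (drops "notes").
    record = {k: row[k] for k in ("id", "age", "region")}
    # Clean: convert age; here rows with missing or invalid ages are dropped.
    age = parse_age(record["age"])
    if age is None:
        continue
    record["age"] = age
    # Construct: derive a new attribute from an existing one.
    record["age_group"] = f"{age // 10 * 10}s"
    # Integrate: bring in spend from the second source by id (0.0 if absent, an assumption).
    record["spend"] = spend_by_id.get(record["id"], 0.0)
    # Format: use one consistent coding for region.
    record["region"] = record["region"].upper()
    prepared.append(record)

print(prepared)
# → [{'id': 1, 'age': 34, 'region': 'NORTH', 'age_group': '30s', 'spend': 250.0}]
```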
CRISP-DM: Modeling
Select Modeling Technique
Generate a Test Design
Build the Model
Assess the Model
Revise the Model
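The modeling tasks can be sketched end to end on made-up data. The systematic holdout (every 4th record), the least-squares line, mean absolute error, and the mean-value baseline are all assumptions chosen for illustration; CRISP-DM does not prescribe any particular test design or metric. If the model did not beat the baseline, that would be the cue to revise it.

```python
def fit_line(xs, ys):
    """Build the model: ordinary least-squares line y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def mae(actual, predicted):
    """Assess the model: mean absolute error on held-out records."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical data: y = 2x + 1 plus a small alternating error term.
data = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(20)]

# Test design: hold out every 4th record for testing; train on the rest.
train = [(x, y) for x, y in data if x % 4 != 3]
test = [(x, y) for x, y in data if x % 4 == 3]

a, b = fit_line([x for x, _ in train], [y for _, y in train])
test_y = [y for _, y in test]
model_error = mae(test_y, [a + b * x for x, _ in test])

# Compare against a naive baseline that always predicts the training mean.
baseline = sum(y for _, y in train) / len(train)
baseline_error = mae(test_y, [baseline] * len(test))

print(f"model MAE={model_error:.2f}, baseline MAE={baseline_error:.2f}")
# → model MAE=0.67, baseline MAE=10.53
```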