Big Quiz 4 Flashcards
(21 cards)
Data Mining
Collecting, aggregating, and visualizing data
Artificial Intelligence
Systems that replicate human reasoning and decision making
Machine Learning
Making predictions from data and updating those predictions as new data arrives
BI Stack
The set of business intelligence (BI) technologies needed for data analysis.
Data Sources
Own systems [CRM, SCM, ACC, HRS, ERP]
Third-party databases
Research & Partners
Primary vs secondary sources
Only research (data you collect yourself) is primary; everything else is secondary
ETL
Extract the data from a data source
Transform the data by cleaning and aggregating it
Load the data into a data warehouse where it can be accessed for future analytics
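A minimal sketch of the three ETL steps in Python, assuming a small CSV export as the source and an in-memory SQLite database standing in for the warehouse. The data, the function names, and the table name `sales_by_region` are hypothetical, chosen only for illustration:

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical raw export from an operational source system (e.g. a CRM).
RAW_CSV = """region,amount
East,100
west, 250
East,
West,50
"""

def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean (normalize region names, drop rows with a missing
    amount) and aggregate (total amount per region)."""
    totals = defaultdict(float)
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop incomplete records
        region = row["region"].strip().title()
        totals[region] += float(amount)
    return sorted(totals.items())

def load(records, conn):
    """Load: write the cleaned, aggregated records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for a real warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
```

The messy source rows ("west", " 250", a blank amount) never reach the warehouse table; only the cleaned totals do, which is the point of putting the warehouse after the transform step.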
Data Lake vs Warehouse
Lakes hold raw, uncleaned ("dirty") data
Warehouses hold cleaned, organized data
Note that the warehouse comes after the ETL step
Analytical vs operational
Warehouse - analytical
Source systems - operational
KPI
KEY: only the most important measures
Performance: how are we doing as a company?
Indicators: measures that show us how we are doing
Good Measures
Simple
Should be easy to understand
Easy to obtain
Shouldn’t be prohibitively costly or time-consuming to obtain
Precisely Definable
Only one way to interpret it
Objective
Not opinion based
Robust
Not likely to be heavily swayed by outside factors
Valid
Are we measuring the right thing the right way?
Types of analytics
Descriptive - Understanding the data you DO have (past values)
Predictive - Understanding the data you do NOT have (future)
Prescriptive - ‘What if’ scenarios (compares options to recommend an action)
Clustering
Grouping customers or products and creating unique strategies for each segment.
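One common way to do this grouping is k-means clustering; the card does not name an algorithm, so treat this as a minimal sketch of one option. The customer data, the choice of two clusters, and the starting centroids are all hypothetical assumptions:

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep a centroid in place if nothing was assigned to it.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per year, average spend per visit)
customers = [(2, 20), (3, 25), (1, 15), (20, 200), (22, 210), (19, 190)]
centroids, segments = kmeans(customers, centroids=[customers[0], customers[3]])
for centroid, members in zip(centroids, segments):
    print(centroid, members)
```

Each resulting segment (here, occasional low spenders vs. frequent high spenders) could then get its own strategy. In practice the number of clusters is a choice you make, and real data is usually scaled first so one feature does not dominate the distance.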
Key influencer analysis
Identifying the most influential variables by measuring correlation.
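A minimal sketch of ranking candidate influencers by correlation, assuming the Pearson correlation coefficient as the measure. The variable names and values are hypothetical:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r, between -1 and 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical candidate influencers and the outcome we care about.
features = {
    "ad_spend":   [10, 20, 30, 40, 50],
    "store_temp": [21, 19, 22, 20, 21],
}
sales = [105, 198, 310, 395, 500]

# Rank influencers by the strength (absolute value) of their correlation with sales.
ranking = sorted(
    ((name, pearson(xs, sales)) for name, xs in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

Sorting by absolute value treats a strong negative correlation as just as influential as a strong positive one. Correlation alone does not show causation.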
Forecasting
Predicting future values at regular time intervals (e.g. monthly) based on known past values of the same series.
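A minimal forecasting sketch, assuming a simple moving average as the method (real forecasting tools typically use more sophisticated models that also capture trend and seasonality). The sales history and the function name are hypothetical:

```python
def moving_average_forecast(history, window=3, periods=1):
    """Forecast the next `periods` values as the mean of the last `window`
    values, feeding each forecast back in as if it had been observed."""
    if not 1 <= window <= len(history):
        raise ValueError("window must be between 1 and len(history)")
    values = list(history)
    forecasts = []
    for _ in range(periods):
        next_value = sum(values[-window:]) / window
        forecasts.append(next_value)
        values.append(next_value)
    return forecasts

monthly_sales = [120, 130, 125, 140, 150, 145]  # hypothetical, one value per month
print(moving_average_forecast(monthly_sales, window=3, periods=2))
```

The second forecast is built partly on the first, so uncertainty grows the further ahead you forecast.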
Recommendation Analysis
Predicting items that a customer may want to purchase based on the shopping baskets of other customers.
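A minimal sketch of basket-based recommendation, assuming simple co-occurrence counting ("customers who bought X also bought Y"). The baskets and the function name are hypothetical:

```python
from collections import Counter
from itertools import permutations

# Hypothetical shopping baskets from past customers.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"butter", "jam", "bread"},
]

# Count how often each ordered pair of items appears in the same basket.
co_counts = Counter(pair for basket in baskets for pair in permutations(basket, 2))

def recommend(cart, top_n=2):
    """Score items not already in the cart by how often they co-occur with cart items."""
    scores = Counter()
    for item in cart:
        for (a, b), count in co_counts.items():
            if a == item and b not in cart:
                scores[b] += count
    return [item for item, _ in scores.most_common(top_n)]

print(recommend({"bread"}))
```

Here butter appears with bread in three baskets and jam in two, so those are suggested first.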
Dependent Variable
The thing we are trying to predict
Y
Label
Independent Variable
The inputs to our prediction
X variables
Features
Important Terms
Null Hypothesis
- Nothing is happening beyond random chance.
Alternative Hypothesis
- Something is happening beyond random chance (there is a real effect).
P-Value
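- The probability of getting results at least as extreme as ours if the null hypothesis were true. A p-value < 0.05 is the usual threshold for rejecting the null hypothesis.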
Effect Size
Measures the size of the impact one variable has on another (how big the effect is, not just whether it exists)
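A minimal sketch tying these terms together, assuming an exact two-sided permutation test for the p-value and Cohen's d as the effect-size measure (both are common choices, but the cards do not name a specific test). The group data and function names are hypothetical:

```python
import math
from itertools import combinations
from statistics import mean, stdev

def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference in means.
    Under the null hypothesis the group labels don't matter, so try every
    relabelling of the pooled data and count how often the difference is
    at least as extreme as the one actually observed."""
    pooled = a + b
    observed = abs(mean(b) - mean(a))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(g2) - mean(g1)) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    return (mean(b) - mean(a)) / pooled_sd

control = [1, 2, 3, 4]        # hypothetical measurements
treatment = [10, 11, 12, 13]
p = permutation_p_value(control, treatment)
print(f"p = {p:.4f}, reject null at 0.05: {p < 0.05}")
print(f"Cohen's d = {cohens_d(control, treatment):.2f}")
```

The p-value answers "is something happening beyond chance?", while Cohen's d answers "how big is it?". Enumerating every relabelling only works for small samples; larger ones usually use a random sample of relabellings or a t-test instead.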
Important Terms 2
Trendline:
A line in a scatterplot that shows the direction a relationship takes
Regression Line:
A line that fits the data best by minimizing the residuals (the vertical distances between the data points and the line); ordinary least squares minimizes the sum of the squared residuals
R-Squared Stat
Ranges from 0 to 1. The closer it is to 1, the more of the variation in your data (Y) your model explains. Measures how well your model fits
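A minimal sketch of fitting a regression line by ordinary least squares and computing R-squared, assuming one independent variable (X, the feature) and one dependent variable (Y, the label). The data and function names are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept,
    which minimizes the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - SS_res / SS_tot: the share of the variation in y explained by the line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: X = ad spend (feature), Y = sales (label).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```

For this data the line is roughly y = 0.6x + 2.2 with an R-squared of about 0.6, meaning the line explains about 60% of the variation in Y.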