Machine Learning for Business Flashcards

Question 1

Q

ML Applications

Answer

A

ML is applying statistical or computer science methods to data to:

Draw causal insights, e.g. determine why customers are cancelling their subscriptions
Predict future events, e.g. determine which customers are likely to cancel their subscription next month
Understand patterns in data, e.g. customer segmentation to uncover groups of customers with similar behaviours, which can be used to customize marketing and other activities

Question 2

Q

Data Needs Pyramid (by priority)

Answer

A

Collection - Extract data from source systems
Storage - Store data reliably and accessibly
Preparation - Organize and clean data so it is usable for generating insights and for other use cases. Includes outlier detection, data quality processes and methods to ensure data reflects reality
Analysis - Understand business trends, distributions (at cohort, geographic, and other levels) and segments. Results in the production of dashboards and other reports
Prototyping and testing ML - Build simple, interpretable models, and conduct A/B tests and experiments to predict desired outputs and drive metrics up, e.g. run a customer retention campaign based on a churn prediction model to reduce churn
ML in production - Automate the model and deploy it into production systems such as CRM, website, mobile apps and other tools.

Question 3

Q

Machine learning principles

Answer

A

Supervised learning - Used to draw causal insights and predict future events. Supervised learning data has a target variable or labels (which supervise what the model is optimizing for) and input features (which are data points collected about the transaction).
Examples:
Marketing - prediction of customers likely to purchase next month, customer’s expected lifetime value
Finance - fraud detection, mortgage default
Manufacturing - quality control, i.e. predict defects, predict likely machine breakdowns and maintenance needs
Transportation - predict parcel delivery time, identify fastest driving routes, predict weekly demand (and therefore stocking needs)
Unsupervised learning - Used to understand patterns in data, such as identifying groups of similar observations. Its data has only input features and no target variable.
Examples:
Marketing - customer segmentation based on past purchases
Finance - segment transactions into profitable, risky and loss-making segments
Manufacturing - anomaly detection

Question 4

Q

Job roles, tools and technologies

Answer

A

Collection - Infrastructure owners
Storage - Data Engineers
Preparation - Data Engineers, Data Analyst
Analysis - Data Analysts, Data Scientist
Prototyping and Testing - Data Scientist, ML Engineer
ML in Production - ML Engineer

Team Structure -

Centralized - All data functions in one central team. Good for small companies and startups
Decentralized - each BU has its own data functions. Good for large companies. Issues with data governance, definition differences, redundancies
Hybrid - Best. Infrastructure, definitions, methods and tooling are centralized. Application and prototyping are decentralized.

Question 5

Q

Prediction vs. inference dilemma

Answer

A

Inference or causal models
- Seek to understand the drivers of a business outcome
- Easily interpretable
- Typically less accurate than prediction models
Prediction models
- Main goal is to predict
- Not easily interpretable; they behave like black boxes
- Typically more accurate than inference models

Question 6

Q

Inference (causal) models

Answer

A

Causality

Identify causal relationships, i.e. how much certain actions affect an outcome of interest
Answer the "why" questions
Optimize for model interpretability rather than accuracy and performance
Detect patterns in observational data and draw causal conclusions

Best practices

Run experiments wherever possible. Their results are more reliable than conclusions drawn from observational data
Repeat the experiments periodically and use them as a benchmark
If experiments are not possible, build an inference model
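As an illustration of reading an experiment, here is a minimal sketch that compares churn between a control group and a group that received a retention campaign. The group sizes and churn counts are made up, and the two-proportion z-test is one common way to check the difference; the card itself does not prescribe a test.

```python
import math

def two_proportion_z(events_a, n_a, events_b, n_b):
    """z statistic for the difference between two observed rates (pooled standard error)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_a - rate_b) / std_error

# Hypothetical experiment: 1000 customers per group
control_churned, control_size = 200, 1000      # no campaign
treated_churned, treated_size = 150, 1000      # received the retention campaign

churn_reduction = control_churned / control_size - treated_churned / treated_size
z = two_proportion_z(control_churned, control_size, treated_churned, treated_size)

print(f"Churn reduction: {churn_reduction:.1%}, z = {z:.2f}")
# |z| above roughly 1.96 corresponds to a two-sided 5% significance level
```

Because customers are assigned to the groups at random, a difference of this size can be attributed to the campaign rather than to pre-existing differences between customers.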

Question 7

Q

Prediction models (supervised learning)

Answer

A

Supervised models types

Classification - Predicts the class / type of an outcome (e.g. fraud, purchase, subscription cancellation). Target variable is categorical or discrete. ML learns the rules from labeled data, and these rules are then applied to new transactions to predict the outcome, i.e. the class.
Regression - Predicts the quantity of an outcome, e.g. dollars spent, hours played, etc. Target variable is continuous. Again, ML learns the rules from labeled data, and these rules are then applied to new transactions to predict the outcome, i.e. the quantity.
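The difference between the two model types can be sketched with one feature and two targets. Everything here is an assumption for illustration: the data is made up, the classifier is a simple nearest-neighbour rule and the regressor is a least-squares line, chosen only because they fit in a few lines.

```python
def nearest_neighbor_class(train_x, train_labels, new_x):
    """Classification: return the label of the closest training observation."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - new_x))
    return train_labels[closest]

def fit_line(xs, ys):
    """Regression: ordinary least squares for one feature, returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical labeled data: one input feature, two possible targets
monthly_logins = [1, 2, 3, 10, 12, 15]
cancelled = ["yes", "yes", "yes", "no", "no", "no"]     # categorical target
dollars_spent = [5.0, 10.0, 15.0, 50.0, 60.0, 75.0]     # continuous target

# Apply the learned rules to a new customer with 4 logins
predicted_class = nearest_neighbor_class(monthly_logins, cancelled, 4)
slope, intercept = fit_line(monthly_logins, dollars_spent)
predicted_spend = slope * 4 + intercept

print(predicted_class, round(predicted_spend, 2))
```

The same feature feeds both models; only the type of target variable decides whether the problem is classification or regression.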

Question 8

Q

Prediction models (unsupervised learning)

Answer

A

Unsupervised ML models have no target variable, only features, and are applied in:

Clustering - grouping observations into similar clusters
Anomaly detection - detecting observations that fall outside the regular pattern; the results can be used as input to supervised learning
Recommender engines - suggesting products or services to customers based on their similarity to other customers
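Anomaly detection needs no labels, which a small sketch can show. This one flags values far from the mean using a z-score; the method, the threshold of 2 standard deviations and the transaction amounts are all assumptions for illustration, not something the card specifies.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts: features only, no target variable
amounts = [20, 22, 19, 21, 20, 23, 18, 21, 250]
anomalies = find_anomalies(amounts)

print(anomalies)
```

The flagged observations could then be reviewed and labeled, giving a target variable for a later supervised model.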

Question 9

Q

Business Requirements

Answer

A

Scoping business needs
- What is the business situation? e.g. increasing fraud rate. Always start with inference questions (why, which, how, etc.), then follow with prediction questions, e.g. can we identify customers likely to churn?
- What is the business opportunity and how big is it? e.g. reduce fraud rate by X%, resulting in Y USD savings. Size up the opportunity and know the drivers of the outcome, the cost and the value.
- What are the business actions to take? e.g. improve the fraud detection system, reduce fraud drivers, and manually review transactions at risk. Carry out an experiment to find out whether you can change the predicted outcome.
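Sizing the opportunity is simple arithmetic once the inputs are agreed. The sketch below assumes X% means a relative reduction in fraud losses; the function name and all figures are hypothetical.

```python
def fraud_savings(transaction_volume_usd, fraud_rate, relative_reduction):
    """Estimated yearly savings from reducing fraud losses by a relative share."""
    current_loss = transaction_volume_usd * fraud_rate
    return current_loss * relative_reduction

# Hypothetical inputs: 10M USD yearly volume, 2% lost to fraud, target reduction X = 25%
savings = fraud_savings(10_000_000, 0.02, 0.25)

print(f"Estimated savings (Y): {savings:,.0f} USD")
```

Comparing this estimate with the cost of building and running the model shows whether the project is worth pursuing.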

Question 10

Q

Model training

Answer

A

Model training - Using input features and the target variable to train the model to detect patterns, so it can then predict the target variable on future data.

Model testing - A smaller portion of the data is held out from training and then used to test model performance on unseen data.

Underfitting - Model fails to learn the patterns in the training data, so it fits both the training data and test / unseen data badly.

Overfitting - Model just memorizes the training data and does not predict unseen data well.

Right fit - Model learns the general patterns in the training data and predicts unseen data well.
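The three fits can be compared by measuring error on training data and on held-out test data. This is a minimal sketch on made-up data where the target is roughly twice the feature; the three toy models (a constant, a nearest-neighbour memorizer and a straight line) are assumptions picked to show each behaviour.

```python
def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def fit_constant(xs, ys):
    """Underfit: ignores the feature and always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_memorizer(xs, ys):
    """Overfit: returns the target of the closest training point, noise included."""
    return lambda x: ys[min(range(len(xs)), key=lambda i: abs(xs[i] - x))]

def fit_line(xs, ys):
    """Right fit for this data: least-squares straight line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: slope * x + (mean_y - slope * mean_x)

# Hypothetical data: training targets are 2 * x plus noise, test targets are 2 * x
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3.5, 2.5, 7.5, 6.5, 11.5, 10.5]
test_x = [1.2, 2.2, 3.2, 4.2, 5.2]
test_y = [2.4, 4.4, 6.4, 8.4, 10.4]

results = {}
for name, fit in [("constant", fit_constant), ("memorizer", fit_memorizer), ("line", fit_line)]:
    model = fit(train_x, train_y)
    train_error = mean_abs_error(train_y, [model(x) for x in train_x])
    test_error = mean_abs_error(test_y, [model(x) for x in test_x])
    results[name] = (train_error, test_error)
    print(f"{name}: train error {train_error:.2f}, test error {test_error:.2f}")
```

The memorizer has zero training error yet a larger test error than the line, which is why performance should be judged on the held-out data.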

Question 11

Q

Model performance measurement

Answer

A

Classification Performance
Accuracy - All correct predictions / All observations
i.e. (TP + TN) / All observations, where TP = true positives, TN = true negatives, FP = false positives and FN = false negatives

Precision - Measures how many of the predictions of a class turned out to be correct, i.e. correct (churn) predictions / observations predicted (as churn) = TP / (TP + FP)

Recall - Measures how many of the actual observations of a class the predictions were able to capture (or recall), i.e. correct (churn) predictions / all actual (churn) observations = TP / (TP + FN)
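The three ratios above can be computed directly from actual and predicted labels. The following is a minimal sketch with made-up labels for ten customers; the function name and the "churn" / "stay" labels are assumptions for illustration.

```python
def classification_metrics(actual, predicted, positive="churn"):
    """Accuracy, precision and recall for one class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical labels: 4 customers actually churned, 6 stayed
actual = ["churn", "churn", "churn", "churn", "stay", "stay", "stay", "stay", "stay", "stay"]
predicted = ["churn", "churn", "churn", "stay", "churn", "stay", "stay", "stay", "stay", "stay"]

metrics = classification_metrics(actual, predicted)
print(metrics)  # accuracy 0.8, precision 0.75, recall 0.75
```

Here TP = 3, TN = 5, FP = 1 and FN = 1, so one false alarm lowers precision and one missed churner lowers recall.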

Regression
Error measurement - Regression error measures how far the prediction is from the observed value.
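The card does not name a specific error metric. Two common choices, shown here as an assumption, are mean absolute error (MAE) and root mean squared error (RMSE); the spend figures are made up.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average distance between predicted and observed values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Like MAE, but squaring penalizes large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical dollars spent per customer
actual_spend = [100.0, 50.0, 80.0, 20.0]
predicted_spend = [90.0, 60.0, 80.0, 40.0]

mae = mean_absolute_error(actual_spend, predicted_spend)
rmse = root_mean_squared_error(actual_spend, predicted_spend)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

Both are expressed in the units of the target (dollars here), which makes them easy to compare against the business's error tolerance.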

Actionable models
A model that performs well is not always actionable.
Test whether the model helps improve business outcomes.

Question 12

Q

Machine learning risks

Answer

A

Poor performance - Bad predictions mean no effect on the business. Be sure to review test performance, not training performance.

Low precision - High false positives: many items predicted as the class of interest do not belong to it.

Low recall - High false negatives: many actual members of the class of interest are missed.

Large regression error - Large differences between predicted and actual values. Assess the cost of mistakes and the error tolerance level for the business.

Question 13

Q

Machine learning mistakes

Answer

A

Mistakes
ML first - Companies should not put ML first but should follow the data needs pyramid.

Not enough data - data availability and quality

Target variable definition (eg. fraud), observation etc

Feature selection:
Inference - Features are what affect the target variable. Choose variables within your control. The business must be involved in this.

Prediction - Can the target variable be estimated in the future? Use readily available data; if model performance is OK, test it. Introduce new features iteratively. The ML team must have a target date for market testing.

Late testing, no impact

Question 14

Q

Communication management

Answer

A

Managing communication between business and ML teams. Topics to cover: defining business requirements, reviewing ML models and business products, inference vs prediction, baseline model results vs outline of model updates, market testing, and production.

Question 15

Q

Machine learning in production

Answer

A

A production system is live, customer-facing and business-critical. Examples:
CRM
Fraud detection system
Online banking platform 
Autonomous cars

Staffing
Prototype ML: Data Scientists, ML Engineers
ML in Production: Software engineers, Data Engineers, Infrastructure owners

(15 cards)