AI for Business Specialization (Wharton, University of Pennsylvania) Flashcards

(129 cards)

1
Q

What are the criteria for AI to be considered a general-purpose technology (GPT)?

A

1) it has pervasive use in a wide range of industries and sectors
2) it stimulates innovation and economic growth
3) it supports a significant number of research jobs across industries

2
Q

What are the 4 Vs of Big Data?

A

VOLUME - Terabytes to petabytes of existing data to process
VARIETY - Structured, unstructured, text, video,…
VELOCITY - Streaming data, milliseconds to seconds to process
VERACITY - Uncertainty due to data inconsistency and incompleteness, ambiguities, latency, …

3
Q

What is the difference between traditional analytics and big data analytics?

A

Traditional analytics is hypothesis driven, i.e. question > hypothesis > analyzed info > answer. It is structured and repeatable.

Big data analytics is data driven, i.e. Data > exploration > correlation > actionable insight. It is iterative and explorative.

4
Q

What are the new skillsets required in Big Data?

A

1) Managing data >>> tool development, data expertise
2) Understanding data >>> data science, visualization
3) Acting on data >>> decision making, applying data to problem solving

5
Q

Types of tools in Big Data…

A

1) DATA MANAGEMENT TOOLS: Data warehouse and Hadoop/Spark

2) DATA ANALYSIS TOOLS: Clustering, association rule mining, machine learning

6
Q

How would you define a Data Warehouse?

A

It is a particular kind of database management system, specialized in historical data from many sources, whose purpose is to enable analytics (e.g. reporting, visualization or BI).

7
Q

Examples of Data Warehouse

A

Azure SQL Data Warehouse
Google BigQuery
Snowflake
Amazon Redshift

8
Q

Where does the term ETL come from and what does it do?

A

ETL = Extract > Transform > Load

It is a process that takes data from different sources (CRM, ERP, billing, supply chain, etc.) and builds a Data Warehouse from which you can generate different analytics as an outcome (e.g. reporting, visualization, BI)
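
A minimal ETL sketch in pandas (illustrative only: the tables, column names and the SQLite "warehouse" are invented stand-ins, not the course's tooling):

    import sqlite3
    import pandas as pd

    # Extract: pull raw data from different source systems (inline stand-ins
    # here; in practice these would come from CRM, ERP, billing, etc.)
    crm = pd.DataFrame({"customer_id": [1, 2], "segment": ["SMB", "Enterprise"]})
    billing = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [100.0, None, 250.0]})

    # Transform: clean and join into an analytics-friendly table
    orders = billing.merge(crm, on="customer_id", how="left")
    orders["amount"] = orders["amount"].fillna(0)

    # Load: write the result into the warehouse (SQLite stands in for e.g. Redshift)
    with sqlite3.connect("warehouse.db") as conn:
        orders.to_sql("fact_orders", conn, if_exists="replace", index=False)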

9
Q

What is the difference between data for operations and data for analytics?

A

Data for operations requires real-time processing in order to take immediate action, while data for analytics does not. The other difference is that analytics considers historical information, while operations does not.

10
Q

What are the main open source Big Data tools?

A

HADOOP and SPARK (an evolution of Hadoop): they store and process massive amounts of data in a distributed fashion on low-cost server architecture, building on ideas from Google's MapReduce and GFS papers.

SNOWFLAKE (a cloud data warehouse, though it is proprietary rather than open source)

11
Q

What is data mining?

A

It is a term encompassing tools for discovering patterns in large datasets. The main difference from traditional regression is that it is data driven (predictive analytics) and not hypothesis driven.

12
Q

What are two of the most popular techniques for data mining?

A

1) Clustering - grouping data, e.g. customer segmentation

2) Association rule mining - finding common co-occurrences in data
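
A minimal clustering sketch for customer segmentation (assuming scikit-learn is available; the spend/visit numbers are invented for illustration):

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is a customer: [annual_spend, visits_per_month]
    X = np.array([[200, 1], [250, 2], [2400, 8], [2600, 9], [1200, 4], [1100, 5]])

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)           # cluster assignment per customer
    print(kmeans.cluster_centers_)  # the "typical" customer in each segment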

13
Q

AI spectrum (types)…

A

WEAK AI - artificial narrow intelligence, good at one specific task

STRONG AI - artificial general intelligence, does things the way humans do (as quickly and easily)

ARTIFICIAL SUPER INTELLIGENCE - does human tasks faster and better

14
Q

In which ways can you build AI?

A

There are two approaches:

1) Expert systems approach: capturing and transferring knowledge using rules. It cannot beat humans, since it has the limitation that tacit knowledge is not transferred.

2) Machine Learning: a subset of AI, used for predictions, that has the ability to learn from data without being explicitly programmed with rules.

15
Q

What are the 3 most common techniques for ML?

A

1) Supervised learning
2) Unsupervised learning
3) Reinforcement learning

16
Q

What are the main characteristics of Supervised (M)Learning?

A
  • learns from past data, coming down to approximating the function f(x)=y with high fidelity and accuracy
  • inputs (x) = features/covariates (labeling and annotations)
  • outputs (y) = targets
  • uses classification and regression methods
  • requires high-quality training data sets
  • ~90% of practical AI use cases use ML, and of those, ~90% use supervised ML
17
Q

What are the main characteristics of unsupervised (M)Learning?

A
  • there is no fixed set of predefined outputs
  • the goal is to cluster and identify important features and patterns so the system can learn by itself
  • requires large training data sets
18
Q

What are the main characteristics of Reinforcement (M)Learning?

A
  • lets algorithms learn by testing various actions and strategies to decide which one works best…it does not begin with large training datasets, but learns by taking actions and observing the results
  • bandit algorithms trade off between EXPLORATION (gathering more info about the decision environment) and EXPLOITATION (making the best decision based on the info available)…a specific example is the multi-armed bandit, where a finite set of resources must be allocated among multiple choices (see the sketch after this list)
  • applications e.g. in gaming or online personalization
  • this type of ML is not widely used
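
A minimal epsilon-greedy multi-armed bandit sketch (the three arms' payout rates and the 10% exploration rate are invented; this is one simple bandit strategy, not the only one):

    import random

    true_rates = [0.05, 0.12, 0.09]  # hidden reward probability of each arm
    counts = [0, 0, 0]               # pulls per arm
    values = [0.0, 0.0, 0.0]         # estimated reward per arm
    epsilon = 0.1                    # fraction of time spent exploring

    for _ in range(10_000):
        if random.random() < epsilon:                # EXPLORATION
            arm = random.randrange(len(true_rates))
        else:                                        # EXPLOITATION
            arm = values.index(max(values))
        reward = 1 if random.random() < true_rates[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running average

    print(values)  # estimates approach the true rates for well-explored arms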
19
Q

What drives accuracy in Supervised ML?

A

1) Quantity of data (number of observations)
2) Quality of data (number of characteristics per observation)
Others: relevance of the information, complexity of the model, feature engineering, etc.

20
Q

What are some of the most common methods in ML to approximate f(x)=y?

A
  • Logistic regression
  • Decision trees and random forests
  • Neural networks
  • others: boosting, SVMs (support vector machines), neural networks more complex than the ones explained, etc.
21
Q

Explain Logistic Regression in ML…

A

It is the most popular method for binary classification, where outcomes can take only 1 of 2 values. The logit function constrains probabilities to between 0 and 1…it is equivalent to finding the ‘best fit’ line/plane that separates the data
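
A minimal logistic regression sketch on synthetic data (assuming scikit-learn); the fitted coefficients define the separating plane, and predicted probabilities follow the logit form p = 1 / (1 + exp(-(b0 + b1*x1 + ...))):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    model = LogisticRegression().fit(X, y)

    print(model.coef_, model.intercept_)  # the fitted separating plane
    print(model.predict_proba(X[:3]))     # probabilities constrained to [0, 1]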

22
Q

Explain Decision Trees in ML…

A

It is an easy-to-interpret model built iteratively by looking for the features in your data that are most predictive. In essence it is about choosing the variable/split that provides the most predictive power at each step.
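
A minimal sketch of why trees are easy to interpret (scikit-learn's built-in iris data is used purely for illustration): the learned splits can be printed and read directly.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
    # Prints the chosen variables/splits, step by step, as readable rules
    print(export_text(tree, feature_names=list(data.feature_names)))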

23
Q

Explain Random Forest in ML…

A

It is an ‘ensemble’ algorithm that harnesses the power of multiple decision trees. It is popular and relatively simple. It takes many random samples of your dataset, trains a decision tree on each one, and chooses the prediction with the most votes. Each individual tree is less accurate than a single decision tree built with the entire dataset…however, the combination is better!
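
A minimal random forest sketch on synthetic data (assuming scikit-learn; 100 trees is an arbitrary choice):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    # Each tree sees a random sample of the data; predictions are a majority vote
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(forest.predict(X[:5]))  # vote across the 100 trees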

24
Q

Explain Neural Networks in ML…

A

Loosely inspired by biological neurons: each unit takes inputs from other neurons, applies a transformation, and passes the signal on. A network normally has several layers (a deep neural network): an input layer, an output layer and hidden layers. Neural networks are often the best algorithms for audio, images, video, etc. due to their ability to build very complex models. Recent advances in GPUs (Graphics Processing Units) and backpropagation algorithms have made it possible to build more layers…the main disadvantage is that they are hard to understand and interpret. A lot of work is being done to open up the black box and understand what happens across the different intermediate layers
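
A minimal feed-forward network sketch (assuming scikit-learn; the two hidden-layer sizes are arbitrary):

    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    # Input layer -> two hidden layers (32, 16 units) -> output layer
    net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
    net.fit(X, y)
    print(net.predict(X[:5]))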

25
How do you choose a specific ML model?
By evaluating each model's performance on a validation dataset
26
What does it mean to partition a data set for ML model selection?
Splitting the data set into different subsets:
- training set: the subset from which the ML algorithm learns
- validation set: another subset to which we apply the ML algorithm to see how accurately it identifies relationships between the known outcomes for the target variable and the features
- holdout set: testing data that provides a final estimate of the ML model's performance after it has been trained and validated; it should never be used to make decisions about which algorithm to use
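
A minimal partitioning sketch (assuming scikit-learn; the 60/20/20 proportions are just one common choice):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    # First carve off 40%, then split it in half: train / validation / holdout
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_hold, y_val, y_hold = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
    print(len(X_train), len(X_val), len(X_hold))  # 600 200 200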
27
What are some types of validation?
1) Holdout validation: partition the available data into a training dataset and a holdout; evaluate model performance on the holdout.
2) Cross-validation: create a number of partitions (validation datasets) from the training dataset; fit the model to the training dataset (excluding the validation data); evaluate the model against each validation dataset; repeat with each validation set and average the results to obtain the cross-validation error.
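
A minimal cross-validation sketch (assuming scikit-learn; 5 folds is an arbitrary choice):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)
    # Fits the model 5 times, each time holding out a different fold
    scores = cross_val_score(LogisticRegression(), X, y, cv=5)
    print(scores.mean())  # the cross-validation estimate of accuracy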
28
What is more important in ML, the model or the data?
The data...among modern methods, performance differences between algorithms are relatively small for a given dataset compared to the differences between the same algorithms with more or less data...this is the so-called 'unreasonable effectiveness of data' (Peter Norvig, Google)
29
What is feature engineering?
It is the part of the ML process that engineers unstructured or raw data into features the algorithm can consume to predict a target
30
TRUE OR FALSE? ...big revolution in AI is the ability to predict from "unstructured data"
TRUE
31
TRUE OR FALSE? ...feature engineering is one of the easiest parts of the ML process, where data scientists spend less time
FALSE. It is one of the most complex and time-consuming parts
32
What is Deep Learning?
It is a kind of AI that eliminates the need for feature extraction (engineering) by using neural networks that optimize loss/cost functions comparing predictions against training labels
33
Why is Deep Learning a "Game Changer"?
- no need for feature engineering, which is expensive, error-prone and uncertain
- can lead to massive performance improvements relative to hand-coded features
- computation is getting cheaper, making deep learning more feasible
- it substitutes more and more data (labeled examples) and computation for domain expertise (e.g. a doctor's)
34
Deep learning examples
Image recognition, detecting fake news, detecting knockoffs from luxury products
35
What are the limitations of Deep Learning?
1) scale of data required (quantity and variety)
2) computational power and storage space
3) lack of interpretability (explainability)
36
While evaluating ML performance, what should we tell the algorithm to optimize on?
Loss/cost functions:
- Accuracy: fraction of labels (answers) that the algorithm predicts correctly
- Precision: what proportion of positive identifications (e.g. fraudulent) were actually correct?
- Sensitivity: how many of the relevant instances (e.g. fraudulent) did you catch?
- Specificity: proportion of legitimate cases correctly identified as such
37
What are the Confusion Matrix and the Receiver Operating Characteristic (ROC) curve used for?
...mapping true/false positives and negatives to evaluate loss/cost functions
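
A minimal sketch deriving the loss/cost metrics above from a confusion matrix (assuming scikit-learn; the toy labels are invented):

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    # Binary confusion matrix unpacks as TN, FP, FN, TP
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    print("accuracy:   ", (tp + tn) / (tp + tn + fp + fn))
    print("precision:  ", tp / (tp + fp))
    print("sensitivity:", tp / (tp + fn))
    print("specificity:", tn / (tn + fp))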
38
Tradeoffs between loss functions depend on the application
e.g. in a medical application like disease screening we want to make sure we don't miss a person with the disease (i.e. sensitivity is key), while when identifying violations that carry severe punishments we want to make sure that those identified are truly violators (i.e. precision is key)
39
What is training data?
The data the algorithm uses to learn the best mapping between the inputs and the right predictions or outputs. Training data is the key to building ML algorithms
40
Where does training data come from?
- archival or historical data in the organization
- human data labeling to generate training data (e.g. platforms for crowdsourcing the task of labeling data)
- using customers to label data (e.g. Google or Gmail spam filtering)
41
What is key to ML algorithms?
Training data, though the point is to predict outcomes where we don't already know what is going to happen
42
What is the over-fitting problem?
It is the danger that the model performs well on training data but not on other data sets...example: studying for the test vs. studying the material
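
A minimal sketch of overfitting (synthetic data with label noise; an unconstrained tree memorizes the training set):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # flip_y adds label noise, so a perfect fit to training data means
    # the tree has memorized noise rather than learned the signal
    X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    print(tree.score(X_tr, y_tr))  # ~1.0 on the data it has seen
    print(tree.score(X_te, y_te))  # noticeably lower on unseen data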
43
What is the "bias-variance" tradeoff?
the challenge of capturing the relevant aspects of the data in the model vs. capturing the noise in the training data
44
What is the Test Data (Hold-out)?
It is the data set that is not used to train or build the model but can be used to validate it. It helps ensure the model also works well on out-of-sample data
45
Where does test data come from?
One common approach is to divide the labeled data into training and test sets (e.g. 70/30)
46
Advantages of Deep Learning vs ML?
- consistency of results
- scale and speed
- works really well for some tasks (e.g. image recognition)
- faster and cheaper to build
47
Possible applications of NLP (Natural Language Processing)...
It is normally used for prediction (from unstructured text), but it can also be used for unsupervised learning such as topic modeling (classifying content in a way that makes it easier to interpret)
48
What is a Generative Model?
It can create new data instances. Instead of classifying data into categories, a generative model asks what underlying process could have generated the kind of data we are seeing in the sample
49
What is a GAN (Generative Adversarial Network)?
It is a model used to generate artificial content that is increasingly hard to tell apart from real content.
50
What is a Generator and a Discriminator Network?
The two networks used in a GAN, competing with one another. The Generator produces new content and the Discriminator tells whether the output of the first one is real or fake. Over time the generator learns what it needs to do to create content that is harder and harder for the discriminator to identify as fake.
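
A toy GAN training loop, a minimal sketch assuming PyTorch is available (the 1-D "real" data distribution and the network sizes are invented for illustration):

    import torch
    import torch.nn as nn

    # Generator maps noise to fake samples; Discriminator scores real vs fake
    G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(2000):
        real = torch.randn(64, 1) * 1.25 + 4.0  # "real" data ~ N(4, 1.25)
        fake = G(torch.randn(64, 8))            # generator's attempt

        # Discriminator step: label real as 1, fake as 0
        opt_d.zero_grad()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        loss_d.backward()
        opt_d.step()

        # Generator step: try to make the discriminator call fakes real
        opt_g.zero_grad()
        loss_g = bce(D(fake), torch.ones(64, 1))
        loss_g.backward()
        opt_g.step()

Over training, the generator's outputs drift toward the real distribution, which is exactly the dynamic the card describes.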
51
TRUE OR FALSE GANs are not controversial
FALSE. Big concerns around deepfakes
52
What is a VAE (Variational AutoEncoder)?
It is an encoder that takes data and boils it down to a simpler representation, which can then be used to recreate the original. It can be used to slightly vary some attributes or aspects of an image
53
What does traditional DevOps focus on?
Practices and tools to build, test and deploy code in production
54
What is CI (Continuous Integration) and CD (Continuous Deployment) about in the DevOps context?
Continuous Integration is about developers working on separate code branches (e.g. to fix problems) in parallel with the production code, merging and testing changes frequently. Once tested, the merged changes are deployed into production automatically (Continuous Deployment).
55
What are the main differences between the DevOps and the ML Ops workflows?
In ML Ops the code is not the only source of change: the data might change, and the model itself might change as it re-trains
56
ML Ops phases
1. Infrastructure Management
2. Data Management
3. Model Management
4. Deployment
5. Monitoring
57
Examples of ML Ops tools
There are tools for each phase of the ML Ops cycle. For example: Amazon SageMaker covers several phases, while Paperspace or Pachyderm focus on only one.
58
What is the so-called Chicken & Egg problem?
When developing a new product you don't have data to train the models...without users you don't have data, and without data you can't build AI models
59
What are the 5 possible strategies to solve the Chicken & Egg problem in AI entrepreneurship?
1. Start with a non-AI product that generates data
2. Partner with an organization that has data
3. Crowdsource the (labeled) data you need
4. Make use of public data
5. Rethink the need for data (e.g. reinforcement learning or expert systems)
60
What are the two main AI applications in Marketing Function?
1. AI to predict or shorten the customer journey
2. Personalization on the web
61
What type of AI can be used?
- Voice AI
- Visual AI
- Language AI
62
How to approach the use of AI
1. Ask why: what is the customer need or problem you are trying to solve?
2. How: what are the technologies and the data assets necessary to best solve it?
3. RoI: is it worth it?
63
What do upstreaming and downstreaming mean in your customer journey?
Upstreaming...the company wants to curate the entire customer journey
Downstreaming...the company wants to keep the customer engaged after the initial purchase
64
what are the 4 main models of organizational structure for analytics?
1. Centralized model. A core unit serves the entire company. PROS: ability to work cross-functionally. CONS: agility, workload and time constraints
2. Center of Excellence (CoE) model. Analysts are based in business units and their activities are coordinated by a small central team. PROS: coordination and training are centralized. CONS: may not have enough control or organizational support
3. Functional model. Analysts are placed within the function(s) that dominate the analytics activity. PROS: analytics is concentrated where it can benefit the most. CONS: other functions may not get support
4. Dispersed model. Analysts are spread through the organization without centralized support. PROS: some units may get support. CONS: not really a model
65
A template for AI Transformation
1. Set up an AI brain trust (8-12 people)
2. Identify data assets and activities to be automated using ML/AI
3. Construct an ML portfolio with 4-5 short-term projects and at least 1 long-term project
4. Figure out how the AI team will fit into the org chart
5. Set up a risk management or audit process
66
What are Recommender Systems?
Systems that use data on purchases, product ratings and user profiles to predict which products are best suited to a particular customer (e.g. "customer who bought this item also bought..." or "people like you bought...")
67
Two main Recommender Designs...
1) Content-based Recommenders: find products with similar attributes
2) Collaborative Filtering (CF): use information on what others buy/like
68
what are the two types of Collaborative Filters (CF)?
1) Item-to-item CF: recommends items bought by others who bought the item you are interested in
2) User-similarity-based CF: recommends items bought by others who are similar to you, based on data the company has about your preferences
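
A minimal item-to-item CF sketch (the tiny purchase matrix is invented; real systems work at vastly larger scale):

    import numpy as np

    # rows = users, columns = items; 1 = bought/liked
    R = np.array([[1, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 0, 1, 1],
                  [0, 1, 1, 1]])

    # Item-item cosine similarity from co-purchase counts
    norms = np.linalg.norm(R, axis=0)
    sim = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0)

    item = 0                   # "you are interested in item 0"
    print(sim[item].argmax())  # most similar item to recommend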
69
What are the PROS of Collaborative Filters?
- easy and cheap to build (no need for detailed metadata about products)
- effective in practice
70
what are the main challenges of Collaborative Systems?
Cold start: how to make recommendations for new users without data, or how to handle new items without reviews or users
71
What is the Long Tail concept in Collaborative Filtering?
...the idea that the main effect of automated recommendations would be to help people move from the world of hits to the world of niches: obscure products that are closer to our individual preferences but never get our attention in mainstream markets. This is only partially correct: CF increases niche product sales but also hit sales, and the hits gain even more market share (i.e. popularity bias)
72
Pros and Cons of Content-Based Recommenders
PROS:
- no popularity bias
- provides relevant suggestions
- can explain recommendations
- works relatively well even for less popular/newer items with less data, as well as for new users
CONS:
- difficult to build because you need detailed metadata
73
Hybrid Recommender: Spotify
- crawls the web to examine blog posts and online discussions to figure out the kind of descriptive terms that listeners use to discuss songs/artists, and then uses these terms as attributes of songs
- when there is less data/discussion, uses ML to analyze the audio signal of a song and extract characteristics
74
What is personalization?
It is about holistically adjusting communications with customers based on customer characteristics (e.g. websites or emails tailored to individual users)
75
Risks of Personalization...
- misapplication
- data privacy concerns...crossing the 'creepy' line
- regulatory compliance
76
TRUE OR FALSE? Finance has long been technology, data and model oriented
TRUE
77
Scientific Method in 4 steps...
1. Clearly articulate a specific question
2. Guess an answer (hypothesize)
3. Identify empirical implications of the guess
4. Compare implications with data
78
Data science workflow
1. Acquisition and verification
2. Preparation (most time consuming and critical)
3. Analysis
4. Communication...to decision makers so they can take action
79
What is corporate credit risk? Why is it important? For whom is it important?
...the inability of firms to repay financial obligations
...it affects the availability and price of credit
...it is important to investors, employees, customers, suppliers and taxpayers
80
What are some of the KPIs for credit rating?
* Liquidity ratios, focused on balance sheet coverage (e.g. current ratio, quick ratio, cash ratio)
* Coverage ratios, focused on operating coverage (e.g. interest, debt service or cash)
* Leverage ratios, focused on how the company finances its operations (e.g. debt-to-EBITDA, debt-to-…)
81
Which are the 3 largest credit rating agencies?
Moody's, Standard & Poor's, Fitch Ratings
82
Pre-requisites and steps to develop a model to distinguish between investment-grade and speculative-grade companies
1. Define very precisely what success is, e.g. a model that balances both types of errors (false positives and false negatives)
2. Data science workflow:
i) data acquisition and verification (e.g. Wharton Research Data Services -WRDS- or the S&P Compustat database): 10,000+ observations from more than 1,400 firms between 1995 and 2016
ii) data preparation...EDA = Exploratory Data Analysis
iii) data analysis
iv) communication
3. Model preparation: Y = f(x1, x2, ...xn)
...Y = outcome variable = 1 if investment grade, 0 otherwise
...(x1, x2, ...xn) = model inputs, predictors, explanatory variables, etc.
4. Train-test split...take a sample and split it for these two purposes
5. Prediction...logit model, confusion matrix
6. Model evaluation:
- Precision = probability of a true positive conditional on a positive prediction
- Recall = probability of a true positive conditional on a positive outcome
- F1 = harmonic mean (weighted average) of recall and precision
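
A compact sketch of steps 4-6 on synthetic data (illustrative only: the generated features stand in for the real financial ratios from Compustat):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the investment-grade task (y=1: investment grade)
    X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    logit = LogisticRegression().fit(X_tr, y_tr)
    pred = logit.predict(X_te)

    print(confusion_matrix(y_te, pred))
    print("precision:", precision_score(y_te, pred))
    print("recall:   ", recall_score(y_te, pred))
    print("F1:       ", f1_score(y_te, pred))  # harmonic mean of precision and recall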
83
TRUE or FALSE? Models and NOT data is what drives ML success
FALSE
84
AI main applications in HR...
1) Hiring
2) Engagement
3) Attrition
4) Internal career paths
85
TRUE or FALSE Sentiment analysis and topic modelling are two of the main techniques to predict employee engagement
TRUE
86
TRUE OR FALSE Bias is intentional
FALSE
87
What is the Blackstone ratio?
Better that ten guilty persons escape than that one innocent suffer
88
TRUE OR FALSE Bias is easy to fix and does not affect other performance metrics
FALSE. It is difficult, and there is a tradeoff between explainability and performance
89
Approaches to fixing the bias problem
- improve training data (e.g. change labels or add weights)
- more information
- training engineers/developers
- AI councils
90
What is explainable AI (XAI)?
It relates to using methods where how and why the algorithm arrives at its results can be understood by human experts
91
Are deep learning and decision trees easy to explain?
Deep learning: no; decision trees: yes
92
In which business contexts is AI explainability or interpretability important?
- HR
- Medical decisions
93
What are the different approaches to Explainable AI?
- SHAP: separates out how much each feature contributes to the output/prediction
- LIME: can start with a model that is very complex and difficult to interpret and generate a simpler comparison model that is locally accurate
- SURROGATE TREES: generates a simpler model (e.g. a decision tree) that mimics the performance of the more complex model and is easy to interpret
- AUTO-ENCODERS: boils data down into a small set of features to make it easier to interpret
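
A minimal surrogate-tree sketch (assuming scikit-learn; synthetic data): fit an interpretable decision tree to mimic a harder-to-interpret random forest's predictions.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
    black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # The surrogate learns the black box's behavior, not the original labels
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, black_box.predict(X))
    print(export_text(surrogate))                    # human-readable rules
    print(surrogate.score(X, black_box.predict(X)))  # fidelity to the black box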
94
AI and blockchain...
AI requires massive amounts of data that need to be stored, with implications in terms of ownership and privacy. Blockchain can help solve this problem by exchanging information between two parties without the need for a third party to own or hold the data
95
What is blockchain?
It is a data storage technology that is immutable: transactions cannot be changed once confirmed, and the ledger therefore serves as the truth.
96
How does blockchain work?
- identical copies of the ledger are stored across all participating nodes of the network
- each transaction is verified by each of the counterparties
- a transaction entry is subsequently never falsified, updated or deleted
- transactions are verified using public-key encryption
97
98
A portfolio approach to AI…
Quick wins + long-term projects
Quick wins: focus on applying off-the-shelf ML to internal employee touchpoints
Long-term projects: most impactful, rethinking R2-D2 processes
99
Resources that are democratizing ML…
1. Hardware…specialized GPUs or TPUs (15-30x faster), available to rent at low cost
2. Software…open-source frameworks and developer tools (e.g. TensorFlow) that automate the data science process
3. Data and algorithms…marketplaces for data and algorithms
100
Key economic variables of AI
1. Software…cost is coming down with the move to open source
2. Skills…the barrier is lowering as things get easier
3. Computation
4. Data…the key to ML applications; the virtuous cycle of data collection means the rich get richer
101
What is Auto ML?
It automates the ML workflow: you just have to upload the data in the right format and it guides you through the whole process (e.g. data preparation, feature engineering, model selection and training, etc.)
102
What is the so called “hubris” concern?
The notion that a lack of understanding about the inner workings will lead to problems…likely to fuel a rapid increase in demand for AI ethicists who can grapple with the implications of algorithmic decisions
103
AI in the Organization Structure…
1. Integrate AI strategy into the broader company strategy
2. Focus on revenue and growth rather than on the cost side alone
3. Get the right data infrastructure in place!
4. Think not only about producing AI but also about consuming it
5. Invest in talent
104
TRUE OR FALSE Data analytics and AI can greatly facilitate process innovation
TRUE
105
AI tools + orientation toward process improvement leads to…
Productivity gains of ca. 7% (lower if there is no process-improvement orientation)
106
TRUE OR FALSE The more oriented toward product and service innovation a firm is, the more likely it is to invest in AI
False
107
What is diverse recombination (1) and radical innovation (2) in the context of AI innovation?
(1) combining many technology elements together in a new way
(2) a technology class that has not existed before
108
TRUE OR FALSE AI and data analytics support more novel innovation than diverse recombination
False
109
Ways to organize innovation at a company
Decentralization and centralization
110
TRUE OR FALSE Data analytics helps decentralized innovation structure
TRUE
111
TRUE OR FALSE Data analytics may be especially great at new combinations or reuse of existing combinations
TRUE
112
TRUE OR FALSE Decentralized innovation structures benefit tremendously from investment in AI and data analytics
TRUE
113
TRUE or FALSE You need to hire people with AI skills who are also inventors
FALSE…they complement each other and work together, though
114
TRUE OR FALSE Companies that disperse AI and data analytics skills across functions are far more productive than firms that do not
TRUE
115
TRUE OR FALSE Robot-adopting firms increase productivity and hire more people
TRUE…though robots are associated with less managerial hiring
116
What does it mean 'algorithm overfit'?
An algorithm that fits historical data too well but fails in realistic test conditions
117
Main AI risks to Society
- Harms of allocation...e.g. unfair loan approvals
- Harms of representation...e.g. an airport screening system more likely to falsely alarm on people of color
118
Main AI risks to Firms
- Reputational risks
- Legal risks
- Regulatory risks
119
What are the legal responses to Algorithm Bias?
- legal claims...quite limited
- disparate impact...US-based, limited
- GDPR...limited context
...new proposals under construction, e.g. the EU AI paper or the US Algorithmic Accountability Act
120
Stages of the Privacy Lifecycle
1. Collection
2. Aggregation/analysis
3. Storage
4. Use
5. Distribution
121
Data protection: what are the legal, technical and operational mechanisms?
Legal: GDPR (EU) is human-rights based and opt-in; the US approach is market based with user choice (opt-out)
Technical:
- federated learning
- differential privacy...adding noise (see the sketch below)
Operational:
- privacy by design
- formal mechanisms such as data impact assessments
- etc.
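
A minimal differential-privacy-flavored sketch via the Laplace mechanism (the records, epsilon and query are invented for illustration):

    import numpy as np

    ages = np.array([34, 45, 29, 51, 62, 38])  # sensitive records
    epsilon = 0.5                               # privacy budget
    sensitivity = 1.0                           # a count changes by at most 1 per person

    # Release a counting query with calibrated Laplace noise added
    true_count = np.sum(ages > 40)
    noisy_count = true_count + np.random.laplace(scale=sensitivity / epsilon)
    print(noisy_count)  # safer to release than the exact count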
122
TRUE OR FALSE Too much user control of the AI algorithm does impact performance
TRUE
123
Does too much transparency and control proportionally improve users' trust?
No. According to different experiments, just some control is enough to build trust, and trust does not increase proportionally with more control
124
Interpretable ML = Explainable ML?
YES
125
What is the difference between global and local interpretability?
- Global interpretability: can we explain at a high level what the most important variables driving a model's predictions are (e.g. income, credit history, etc.)?
- Local interpretability: can we explain the most important variables driving a particular prediction or decision (why Kartik's loan application was not approved)?
126
What would an audit for algorithms look like?
- creation of an inventory of all ML models employed
- the specific use cases of these models
- the names of the developers & business owners of the models
- risk rating: social/financial risks if the model fails
127
What areas should an audit for algorithms look at for high-risks models?
1. Inputs, e.g. quality, bias, etc.
2. Model, e.g. alternative models, statistical tests, transparency, stress tests, etc.
3. Outputs, e.g. decisions with explanations, outliers
128
What are the 3 lines of defense for managing risks at data science?
1. Control...model developer
2. Transparency...data science QA
3. Audits...data science auditor
129