Srinivasan & Chandler - Biases in AI systems Flashcards

1
Q

The critical role of machine learning (ML)

A

Machines are asked to preprocess the data appropriately, choose the right models from several available ones, tune parameters, and adapt model architectures to suit the requirements of an application

  • It’s important to educate ML developers about the various kinds of biases that can creep into the AI pipeline
2
Q

Biases in the AI pipeline

A

A typical AI pipeline starts from the data-creation stage > collecting the data, annotating or labeling it, and preparing or processing it into a format that can be consumed by the rest of the pipeline

  • Data-creation bias > specific types of biases can occur during the creation of datasets
3
Q

Sampling bias

A
  • Selecting particular types of instances more than others (and thereby rendering the dataset underrepresentative of the real world)
  • One of the most common types of dataset biases
  • This can lead to poor generalization of learned algorithms
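The effect can be demonstrated with a minimal synthetic sketch (not from the source; the two groups and their outcome rates are invented for illustration): overselecting one group makes the dataset statistic diverge from the true population statistic.

```python
import random

random.seed(0)

# Hypothetical population: 50% group A, 50% group B, with different outcome rates.
population = [("A", 1 if random.random() < 0.8 else 0) for _ in range(5000)] + \
             [("B", 1 if random.random() < 0.2 else 0) for _ in range(5000)]

def positive_rate(sample):
    return sum(y for _, y in sample) / len(sample)

# Unbiased sample: both groups drawn in proportion.
unbiased = random.sample(population, 1000)

# Sampling bias: group A instances are selected far more often than group B.
biased = [row for row in population if row[0] == "A" or random.random() < 0.1]

print(round(positive_rate(population), 2))  # true rate, ~0.50
print(round(positive_rate(unbiased), 2))    # close to the true rate
print(round(positive_rate(biased), 2))      # inflated: dataset underrepresents B
```

A model fit on the biased dataset would learn an outcome rate far from the real-world one, which is exactly the poor generalization the card describes.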
4
Q

Measurement bias

A

Arises from errors in human measurement, or from certain intrinsic habits of people in capturing data

  • Example > some photographers might tend to take pictures of objects in similar ways, which results in the dataset containing object views from certain angles only
  • There could also be a bias as a result of the device used to capture datasets
  • Example > cameras used to capture images may be defective, leading to poor-quality images
  • Using proxies instead of true values in creating datasets could also lead to bias
  • Example > arrest rates are often used instead of crime rates; doctor visits and medications are used as indicators of medical conditions
5
Q

Label bias

A

Is associated with inconsistencies in the labeling process > different annotators have different styles and preferences that get reflected in the labels created
- Grass vs. lawn, picture vs. painting, etc.

  • It can also happen when the subjective biases of evaluators affect labeling
  • Example: annotating emotions experienced in a text, biased by their cultures, beliefs and introspective capabilities
  • Confirmation bias > the human tendency to search for, interpret, focus on, and remember information in a way that confirms one's preconceptions
  • So, labels may be assigned based on prior belief rather than objective assessments
  • Peak-end effect > memory-related cognitive bias in which people judge an experience based largely on how they felt at its peak (most intense point) and at its end, rather than based on the total sum or average of every moment of the experience
  • Example: some annotators may give more importance to the last part of a conversation in assigning a label
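Labeling inconsistency between annotators can be quantified; a minimal sketch (the texts and labels are invented, and raw observed agreement is used as the simplest possible measure) shows how far two annotators diverge on the same items.

```python
# Two hypothetical annotators labeling the same five texts for emotion.
ann1 = ["joy", "anger", "joy", "sadness", "joy"]
ann2 = ["joy", "joy",   "joy", "sadness", "anger"]

# Raw observed agreement: a first, crude signal of labeling inconsistency.
agreement = sum(a == b for a, b in zip(ann1, ann2)) / len(ann1)
print(agreement)  # 0.6
```

In practice, chance-corrected measures such as Cohen's kappa are preferred over raw agreement, since two annotators can agree often purely by chance.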
6
Q

Negative set bias

A
  • Negative set bias > introduced in the dataset as a consequence of not having enough samples representative of ‘the rest of the world’
  • Example > datasets define a phenomenon (object, scene, event, etc.) not just by what it is (positive instances), but also by what it is not (negative instances)
7
Q

Framing effect bias

A
  • Based on how the problem is formulated and how information is presented, the results obtained can differ and may be biased
8
Q

Sample selection bias

A

Is introduced by the selection of individuals, groups, or data for analysis in such a way that the samples are not representative of the population intended to be analyzed

  • This occurs during data analysis as a result of conditioning on some variables in the dataset (skin color, gender, among others)
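A small synthetic sketch (the variables and effect sizes are invented for illustration) shows how conditioning the analysis on some variables leaves a sample that no longer represents the intended population:

```python
import random

random.seed(1)

# Hypothetical dataset: score depends only on hours studied, not on group.
data = [{"group": random.choice(["X", "Y"]),
         "hours": random.uniform(0, 10)} for _ in range(2000)]
for row in data:
    row["score"] = 5 * row["hours"] + random.gauss(0, 3)

def mean_score(rows):
    return sum(r["score"] for r in rows) / len(rows)

# Sample selection bias: conditioning on group X AND high hours only.
# The retained rows no longer represent the population we meant to analyze.
truncated = [r for r in data if r["group"] == "X" and r["hours"] > 7]

print(round(mean_score(data), 1))       # population mean
print(round(mean_score(truncated), 1))  # shifted upward by the selection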
9
Q

Confounding bias

A

Bias can arise in the AI model if the algorithm learns the wrong relations by not taking into account all the information in the data, or if it misses the relevant relations between features and target outputs

  • Originates from common causes that affect both inputs and outputs
  • Omitted variable: occurs when some relevant features are not included in the analysis
  • Proxy variable: even if sensitive variables such as race and gender are not considered for decision making, certain other variables used in the analysis might serve as proxies for those sensitive variables
  • Example: zip code might be indicative of race, as people of a certain race might predominantly live in a certain neighborhood
  • Also known as indirect bias or indirect discrimination
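The zip-code example can be sketched with synthetic data (the zip codes, group labels, and correlation strength are invented): even though the sensitive attribute is never given to a model, the proxy variable lets it be recovered with high accuracy.

```python
import random

random.seed(2)

# Hypothetical data: race is never used as a feature, but zip code is
# strongly associated with it (e.g. via residential segregation).
rows = []
for _ in range(1000):
    race = random.choice(["R1", "R2"])
    if race == "R1":
        zip_code = "10001" if random.random() < 0.9 else "20002"
    else:
        zip_code = "20002" if random.random() < 0.9 else "10001"
    rows.append((race, zip_code))

# How well does zip code alone predict the sensitive attribute?
best_guess = {"10001": "R1", "20002": "R2"}
accuracy = sum(best_guess[z] == r for r, z in rows) / len(rows)
print(round(accuracy, 2))  # high accuracy: zip code acts as a proxy for race
```

This is why simply dropping the sensitive column is not enough to prevent indirect discrimination: any decision based on the proxy is effectively a decision based on the sensitive variable.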
10
Q

Omitted variable

A

Occurs when some relevant features are not included in the analysis

11
Q

Proxy variable

A

Even if sensitive variables such as race and gender are not considered for decision making, certain other variables used in the analysis might serve as proxies for those sensitive variables

12
Q

Design-related bias

A

Sometimes biases occur as a result of algorithmic limitations or other constraints on the system such as computational power

  • Algorithm bias > bias that is solely induced or added by the algorithm
  • Ranking bias > showing certain results in search engines
13
Q

Human evaluation biases

A

Recall bias: arises from limits on how much information humans can recall when evaluating a system

14
Q

Sample treatment bias

A
  • Sample treatment bias: introduced in the process of selectively subjecting some sets of people to a type of treatment
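A synthetic sketch (the groups and accuracy figures are invented) shows the validation-time version of this: restricting the test conditions to one subset of the population overstates model quality.

```python
import random

random.seed(3)

# Hypothetical model that performs well on group A but poorly on group B.
def model_correct(group):
    return random.random() < (0.95 if group == "A" else 0.60)

test_set = [random.choice(["A", "B"]) for _ in range(2000)]
results = [(g, model_correct(g)) for g in test_set]

overall = sum(ok for _, ok in results) / len(results)

# Sample treatment bias: validating only on group A overstates quality.
a_only = [ok for g, ok in results if g == "A"]
print(round(overall, 2))                    # honest population-wide estimate
print(round(sum(a_only) / len(a_only), 2))  # misleadingly high
```

The gap between the two numbers is invisible unless performance is reported per group, which is why disaggregated evaluation is a standard guard against this bias.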
15
Q

Guidelines for ML developers:

A
  • Incorporation of domain-specific knowledge is crucial in defining and detecting bias
  • It is important to understand which features of the data are deemed sensitive based on the application (‘age’ in determining who gets a loan, medical treatment, etc.)
  • Datasets used for analysis should be representative of the true population under consideration
  • Appropriate standards have to be laid out for annotating the data
  • Identifying all features that may be associated with the target feature of interest is important
  • Features that are associated with both input and output can lead to biased estimates
  • Restricting data analysis to some truncated portions of the dataset can lead to unwanted selection bias
  • In validating the performance of a model, care has to be taken to guard against the introduction of sample treatment bias (not restricting the test conditions to a certain subset of the population)
16
Q

What are the biases in the “AI pipeline”?

A
  1. Sampling bias
  2. Measurement bias
  3. Label Bias
  4. Negative set bias
  5. Framing effect bias
17
Q

Biases related to the algorithm/data analysis

A
  1. Sample selection bias
  2. Confounding bias
  3. Design-related bias
18
Q

Bias related to evaluation or validation

A
  1. Human evaluation bias
  2. Sample treatment bias
19
Q

3 types of biases

A
  1. Biases in the AI pipeline
  2. Biases related to the algorithm/data analysis
  3. Bias related to evaluation or validation