Week 5: Training Neural Networks Flashcards
(28 cards)
What is required to train on a dataset?
Input (x) and ground truth (y)
Explain how input and ground truth are used in training.
- Ground truth is provided by humans
- Provide corrupted input
- Compare the prediction from the neural network (y hat) to the ground truth
- Based on the difference between the prediction and the ground truth, you get an error
- Based on the error, the weights and biases are adjusted to bring the prediction closer to the ground truth
Ground truth
verified, true data used for training
Why do we need a model if you already have the ground truth?
The dataset you input is only a fraction of the data in the real world, where you don’t know the ground truth. We train the neural network on the data we do have so we can apply it to similar scenarios where we don’t know the answer.
You can’t use neural network training for one task and apply it to a different task
E.g. you can’t train a network to predict the cost of a house in Los Angeles and apply it to New York
Is the ground truth supervised or unsupervised?
Supervised - you label each object in an image
Supervised data
annotated data (done by humans)
One hot encoding
Turns every label into a unique binary vector
The length is equal to the number of unique categories in the data
Each vector contains a single 1 at the position corresponding to the category
0s everywhere else
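The definition above can be sketched in a few lines of Python (the category names here are made up for illustration):

```python
# Minimal sketch of one-hot encoding (illustrative category names)
categories = ["cat", "dog", "bird"]

def one_hot(label, categories):
    """Return a binary vector: a single 1 at the label's position, 0s elsewhere."""
    vec = [0] * len(categories)       # length = number of unique categories
    vec[categories.index(label)] = 1  # 1 at the position of the category
    return vec

print(one_hot("dog", categories))  # [0, 1, 0]
```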
Explain how one-hot encoding is used for training.
You compare the one-hot ground truth to the predicted activation outputs
If the one-hot ground truth doesn’t match the activation outputs, there is an error
What is a softmax function?
It turns all of the output numbers into probabilities
The largest activation in the output becomes the highest probability
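A minimal softmax sketch in Python (the activation values below are made up):

```python
import math

def softmax(outputs):
    """Turn raw output activations into probabilities that sum to 1."""
    exps = [math.exp(o) for o in outputs]  # exponentiate each activation
    total = sum(exps)
    return [e / total for e in exps]

# The largest activation (2.0) gets the highest probability
probs = softmax([2.0, 1.0, 0.1])
```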
Why is the error squared when making a prediction?
You don’t want the error to be a negative number
What does the loss number tell you?
an estimation of how your neural network is currently performing
How do you calculate the loss?
- (Predicted output - ground truth output)^2
- Add up all of the individual training values
- Multiply by (1/n)
*n = number of examples in the training dataset
refer to notes on page 19
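The steps above amount to mean squared error; a minimal sketch (the example values are made up):

```python
def mse_loss(predictions, ground_truth):
    """Loss = (1/n) * sum of (predicted - ground truth)^2."""
    n = len(predictions)  # n = number of examples in the training dataset
    return sum((p - t) ** 2 for p, t in zip(predictions, ground_truth)) / n

# ((0.9 - 1.0)^2 + (0.2 - 0.0)^2) / 2 = (0.01 + 0.04) / 2 = 0.025
loss = mse_loss([0.9, 0.2], [1.0, 0.0])
```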
Explain training loss when you first begin training a neural network
At the beginning, the training loss will be very high
Begins by plugging in random weights, but the loss gradually decreases over time
On a plot, x represents how many times you change the weights (i.e. steps), y is the training loss (i.e. the error)
Training loss will never reach 0
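A toy training loop illustrating this curve, assuming a single weight fit by gradient descent on the squared-error loss (the data and learning rate are invented for illustration):

```python
import random

# Toy data with true relationship y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

random.seed(0)
w = random.uniform(-5, 5)  # start from a random weight -> high initial loss
lr = 0.02
losses = []
for step in range(100):
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    losses.append(loss)
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= lr * grad  # adjust the weight based on the error

# losses[0] is high; losses[-1] is much smaller, but never exactly 0
```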
According to historian Lisa Gitelman…
every field has its own definition of what qualifies as data
Explain the CU Colorado Springs AI controversy
- A professor installed a camera to collect images of students and faculty for a training dataset
- Did so without their permission
- Front-facing images are too easy; he wanted people who were looking away
- The state senate banned facial recognition technology in Colorado
Explain how the Duke research used facial recognition.
- Duke research used multiple cameras to track the same person across campus
- A number is associated with each person, making it possible to track an individual
- You don’t need to add a neuron for every person. Instead, two neurons identify whether it is the same person or not; it becomes a yes-or-no question
OpenAI lawsuit with the NYT
OpenAI used articles from the NYT to train its models, despite IP laws
With AI, what qualifies as fair use is unsettled; sometimes, if a use contributes to society, it is allowed
How is DeepSeek trained?
From data generated by other AI models
Explain CAPTCHA
A Turing test used to prove that you are human by clicking on the objects it tells you to (usually cars and traffic lights, because they want to collect data to train self-driving cars)
Labeling is expensive, but this provides a cheap and efficient way to label
What are some ethical issues with collecting data to train AI?
- Consent (e.g. using artists’ work for training)
- Cheap labor (supervised learning requires cheap labor)
e.g. OpenAI paid Kenyan workers less than $2 per hour to make ChatGPT less toxic
Components of ImageNet
One component: not possible without iPhone data
Second component: the labeling aspect
(Paying students to label was too expensive, so the job was posted on an Amazon platform that enabled people all around the world to sign up; outsourcing the labor made it cheaper)
Limitation of loss minimization
This model seems good in theory but can cause a lot of problems
Google’s racist mistake of labeling two Black people as “gorillas”
How does the example of the gorilla miscategorization highlight an issue with training?
The issue with this model is that all errors have the same weight (either 0 or 1)
It treats all miscategorizations the same
(e.g. it treats mislabeling a dog breed and labeling a human as a gorilla in the same way)
Instead, we should change what we place the most importance on when evaluating an error (e.g. prioritizing accurately categorizing humans); the model will then prioritize reducing the error that matters most (e.g. labeling a human as a gorilla)
But the model will be less motivated to correct errors on things that are less important
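One way to act on this idea is a class-weighted loss, sketched below (the weight values are hypothetical, not from the lecture):

```python
def weighted_loss(predictions, ground_truth, weights):
    """Squared error where each example is scaled by how costly its mistake is."""
    n = len(predictions)
    return sum(w * (p - t) ** 2
               for p, t, w in zip(predictions, ground_truth, weights)) / n

# Hypothetical weights: mislabeling a human (10.0) costs far more than a dog breed (1.0)
loss = weighted_loss([0.0, 0.0], [1.0, 1.0], [1.0, 10.0])  # (1 + 10) / 2 = 5.5
```

With these weights, the same raw error contributes ten times as much loss for the human-labeling mistake, so training pressure concentrates there.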
Loss meaning
the error - to what degree does the prediction differ from the ground truth