Chapter 2: From Model to Production Flashcards

1
Q

What are the four key areas that DL is good at?

A

Computer vision, text (NLP), tabular data, and recommendation systems (collaborative filtering).

2
Q

Tell me about Computer vision

A

Computer vision models are good at two major tasks: object recognition and object detection. In object recognition, the model identifies what is in an image, and it can do this about as well as a human. In object detection, it also locates where each object is within the image and labels it. One major drawback is that models struggle with images that are structurally different from those in the training set: if there were no black-and-white or hand-drawn images in the dataset, the model will be bad at handling them. There is no single way to know in advance what kinds of images are missing from your dataset, but there are techniques for recognizing when unexpected images show up in production (out-of-domain data / edge cases). Gathering more images of more types helps, but labeling them is a slow and expensive process.

3
Q

What is data augmentation and what is it used for?

A

Out-of-domain data is a major weak point for computer vision models, so beyond gathering many kinds of images, you can make the model more robust with data augmentation: generating variations of your training images by slightly altering them, for example lowering the brightness or contrast, or rotating the image by a few degrees.
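
A minimal sketch of batch-level augmentation with fastai's aug_transforms; the data path and the specific parameter values are assumptions for illustration.

```python
from fastai.vision.all import *

# Hypothetical folder of labelled images, one subfolder per class.
path = Path("data/bears")

dls = ImageDataLoaders.from_folder(
    path,
    valid_pct=0.2,               # hold out 20% for validation
    item_tfms=Resize(460),       # resize each image on the CPU
    batch_tfms=aug_transforms(   # random augmentations applied per batch
        max_rotate=10.0,         # rotate by up to ~10 degrees
        max_lighting=0.2,        # vary brightness/contrast by up to 20%
        max_zoom=1.1,
    ),
)
dls.show_batch()  # eyeball a batch of augmented images
```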

4
Q

What is an interesting way to use computer vision revolving around sound?

A

One approach is to turn the sound into an image of its waveform or spectrogram and then run an ordinary computer vision model on that image to classify the sound. A related trick: someone built an anti-fraud model by converting users' mouse movements and clicks into images and then using an image model to catch bots.
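
A minimal sketch of the sound-to-image idea using torchaudio; the file path and spectrogram settings are assumptions.

```python
import torchaudio
import matplotlib.pyplot as plt

# Load an audio clip (hypothetical file path).
waveform, sample_rate = torchaudio.load("clips/example.wav")

# Convert the waveform into a mel spectrogram: a 2D picture of
# frequency content over time that an image model can classify.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=64)
spec_db = torchaudio.transforms.AmplitudeToDB()(mel(waveform))  # (channels, n_mels, time)

# Save it as an image; a folder of such images can then be fed to an
# ordinary image classifier.
plt.imsave("clips/example.png", spec_db[0].numpy(), origin="lower")
```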

5
Q

Tell me about NLP

A

NLP is another domain where deep learning models do well. They are good at classifying short or long documents, for example separating spam from non-spam, and at picking out the key parts of a text such as a news article. Deep learning NLP models are also good at generating context-appropriate text, such as replies to social media posts or imitations of a particular author's style. What they are not good at is generating correct text: we could give a model a pile of medical documents and ask it to diagnose patients and explain their diseases, but the output would not be reliably accurate or useful. One societal concern is that bot farms can use these models to flood online discussion with convincing text, and text generation models will likely stay ahead of models that detect machine-generated text, because a generator can use such a classifier during training to learn to avoid being classified.
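
As a sketch of the document classification use case, here is a minimal fastai text classifier on the IMDB movie-review dataset; the dataset choice and hyperparameters are illustrative assumptions.

```python
from fastai.text.all import *

# Download the IMDB reviews and build dataloaders from the labelled
# text files in the train/test folders.
path = untar_data(URLs.IMDB)
dls = TextDataLoaders.from_folder(path, valid='test')

# Fine-tune a pretrained AWD-LSTM as a sentiment classifier.
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(2, 1e-2)

learn.predict("I could not put this book down.")
```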

6
Q

Can we combine text and images?

A

Yes. These models are more capable than people suspect: we can train a model to output content-appropriate captions for images, but there is no guarantee the captions will be correct. Going forward, I think there will be use cases for combining text and images in medicine, for example a model that looks through thousands of radiology images and flags suspicious ones into an "urgent" queue for the doctor to review first.

7
Q

Tell me about Tabular Data

A

Deep learning has been making strides here, but odds are that if you already have a well-tuned system in place, such as random forests or gradient boosting, a deep learning model won't be much better. You can still use deep learning to boost the existing model by adding new columns it could not otherwise use: for example, run an NLP model over each book review to extract a sentiment score (say 1-10), then put that score in a new column of your tabular data.
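
A minimal sketch of that idea, using an off-the-shelf Hugging Face sentiment pipeline to create a new column for a gradient boosting model; the DataFrame contents and column names are made up for illustration.

```python
import pandas as pd
from transformers import pipeline
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical tabular data about books, plus a free-text review column.
df = pd.DataFrame({
    "page_count":   [320, 180, 540],
    "price":        [14.99, 9.99, 24.99],
    "review":       ["Loved it, couldn't put it down.",
                     "Dull and repetitive.",
                     "Solid reference, a bit dry."],
    "weekly_sales": [120, 15, 40],
})

# Turn each review into a signed sentiment score and add it as a column.
sentiment = pipeline("sentiment-analysis")
df["review_sentiment"] = [
    r["score"] if r["label"] == "POSITIVE" else -r["score"]
    for r in sentiment(list(df["review"]))
]

# Train the usual gradient boosting model on the enriched table.
X = df[["page_count", "price", "review_sentiment"]]
y = df["weekly_sales"]
model = GradientBoostingRegressor().fit(X, y)
```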

8
Q

Tell me about recommendation systems

A

In my opinion, they are really just a special type of tabular data: they involve a high-cardinality categorical variable representing users and another representing products. Amazon, for example, builds a huge sparse matrix of customers versus purchases; if customer A bought items 1, 2, 4, and 10 and customer B bought items 2 and 4, then customer B would be recommended items 1 and 10. Recommenders can be used in conjunction with other model types, such as NLP, to diversify the columns. They usually run into feedback issues: if I look up a certain author, buy their book, and then don't buy anything for a while, Amazon will keep recommending me books by that author. They are also bad at recognizing bundles: if I buy book 1 of a trilogy, the system may recommend me a package of the whole trilogy even though I obviously already have book 1.
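
A minimal collaborative filtering sketch with fastai; the ratings table and column names are made up for illustration.

```python
import pandas as pd
from fastai.collab import *

# Hypothetical user/item/rating table; in practice this comes from
# purchase or review logs.
ratings = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 3, 3],
    "item":   [10, 20, 40, 20, 40, 10, 30],
    "rating": [5,  4,  4,  5,  3,  2,  5],
})

dls = CollabDataLoaders.from_df(ratings, user_name="user", item_name="item",
                                rating_name="rating", bs=4)

# Learn a small embedding (latent factors) per user and per item; their
# dot product predicts the rating, and high predicted ratings for items
# a user hasn't bought become recommendations.
learn = collab_learner(dls, n_factors=20, y_range=(0, 5.5))
learn.fit_one_cycle(3)
```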

9
Q

Tell me about the Drivetrain Approach. Why is it important?

A

It is an approach that gives some structure to the mayhem that is modeling. You start by clearly defining the objective (the business question), then figure out which levers you can actually pull (the inputs you control), then work out what data you need to collect, and only then start building the model. A real-life example is Google search. The objective was to show users the most relevant results for their query. The lever Google could pull was how the search results are ranked. After that they had to figure out what data to collect to do that ranking, and only then could model building start.

10
Q

What are some of the main issues of Deep Learning Models, and how can we help to avoid or lessen their impact?

A

A comparison to software engineering is useful here. In software engineering it is relatively easy to find the step where your product went wrong, but with deep learning models, since we don't fully understand what is going on inside them, it becomes much harder. For example, if we built a bear-detection system for a forest, we would have to handle all the kinds of data the deployed model will actually see: night-time images, video frames, low-resolution pictures, and every other kind of out-of-domain data. Another issue is domain shift, where the data changes over time: a health insurance company has to decide how much old patient data to keep using, because medical innovations keep changing life expectancy, treatments, and so on. Before deploying a model, it is important to run a manual process in parallel (keep a park ranger watching the area) and have humans sanity-check the model's outputs. It is also important to limit the scope of the rollout: deploy the insurance predictor to a small group, or run it for a couple of weeks and then review the results; if everything looks good, expand gradually rather than switching everything over at once. Finally, as before, watch out for feedback loops: a predictive-policing model trained on arrest data only ends up predicting arrests, not crime, and can reinforce itself in a positive feedback loop.

11
Q

Why do p values suck?

A
12
Q

What is the difference between a loss and metric?

A
13
Q

Does adding a validation set guarantee we don't overfit?

A

No, it depends on the type of model you are using and how you go about tweaking it. If you check your validation results too many times while tuning, you can end up overfitting to the validation set, so it is always a good idea to keep a separate test set for a final sanity check after all your changes. Also, if you are doing something like predicting future sales for a growing business, you wouldn't want a random chunk from the middle of the data as your validation set; you'd want the validation set to be the data closest to the end of the dataset, so that validation mimics predicting the future.
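
A minimal sketch of a time-based split with pandas; the file name, column name, and 80/20 cut-off are assumptions.

```python
import pandas as pd

# Hypothetical daily sales data with a date column.
sales = pd.read_csv("sales.csv", parse_dates=["date"]).sort_values("date")

# Train on the earliest 80% of rows and validate on the most recent 20%,
# instead of a random split, so validation mimics predicting the future.
cutoff = int(len(sales) * 0.8)
train_df = sales.iloc[:cutoff]
valid_df = sales.iloc[cutoff:]
```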
