Case Study — NextStar Recommender System Flashcards

1
Q

What is Explicit Behavioural Data?

A

Data the user knowingly provides, such as profile details or ratings; the user is aware that it is being collected.

2
Q

Example of Explicit Behavioural Data.

A

When a user rates a video clip: both the act of rating and the rating they give are explicit data.

3
Q

What is Implicit Data?

A

Data gathered from the user's actions rather than provided deliberately; the user is not aware that it is being collected.

4
Q

Example of Implicit Behavioural Data.

A

Click data, purchase data, keystroke logging.

5
Q

What is bad about implicit Data?

A

Advertisements could become so specialised that they are too compelling, leading users to spend too much.

6
Q

What is the right to anonymity?

A

They don’t know who you are, but they know what you are doing.

7
Q

What is the right to privacy?

A

They know who you are, but they don’t know what you are doing.

8
Q

What is Machine Learning?

A

An algorithm makes intelligent decisions based on what it has previously learned. The programmer inputs data and answers, and the algorithm outputs rules that help produce future answers.

9
Q

What are the three types of machine learning?

A

Supervised, unsupervised and reinforcement learning.

10
Q

What does machine learning entail?

A

Using the data and the answers to find the rules/patterns.

11
Q

Why is supervised learning effective?

A

It’s precise: the model trains on labelled examples, so its predictions can be checked against known answers.

12
Q

What is supervised learning?

A

The use of labelled datasets to train algorithms to classify data or predict outcomes accurately.

13
Q

What two techniques are used in supervised learning?

A

Regression and Classification.

14
Q

What is regression?

A

Modelling the relationship between features and results in order to predict a continuous numeric value.
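
As an illustration of regression, here is a minimal sketch of a least-squares straight-line fit. The function name `fit_line` and the data are hypothetical and do not come from the case study; it only shows how a relationship between a feature and a result can be learned and then used to predict.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: a feature value and the result observed for it
feature = [1, 2, 3, 4]
result = [2, 4, 6, 8]

slope, intercept = fit_line(feature, result)
predicted = slope * 5 + intercept  # prediction for an unseen feature value
print(slope, intercept, predicted)
```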

15
Q

What is Classification?

A

Sorting items into categories by features.

16
Q

What is unsupervised learning?

A

Giving the machine unlabelled data and letting it find patterns and groupings itself.

17
Q

Why is unsupervised learning effective?

A

Good for sorting and grouping data when no labelled examples are available.

18
Q

What three techniques are used in unsupervised learning?

A

Clustering, dimensionality reduction and association.

19
Q

What is clustering?

A

Dividing data into groups by similarity, so that similar items end up in the same group.

20
Q

What is association?

A

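Identifying items or events that frequently occur together or in sequence (for example, users who watch one clip often watch another).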

21
Q

What is dimensionality reduction?

A

Taking high-dimensional data (complicated) and reducing it to low-dimensional data (understandable and meaningful).

22
Q

What is reinforcement learning?

A

The algorithm is given a goal and a set of rules, and is left to try to achieve the goal by trial and error. It is then rewarded or punished depending on the outcome.

23
Q

What do most recommender systems use for learning?

A

Supervised Learning

24
Q

What are the two types of filtering?

A

Content-based and collaborative.

25
Q

What is content filtering?

A

Personalised results based on the features of items the user has previously interacted with.
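
As a rough illustration of content filtering, the sketch below ranks items by how much their tags overlap with an item the user already liked. The catalogue, the tag sets, the function names and the choice of Jaccard overlap as the similarity measure are all assumptions made for this example; the case study does not specify them.

```python
def jaccard(a, b):
    """Share of tags two items have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical catalogue: item -> descriptive tags
catalogue = {
    "clip_a": {"action", "cars"},
    "clip_b": {"action", "cars", "heist"},
    "clip_c": {"romance", "drama"},
    "clip_d": {"action", "space"},
}

def recommend_similar(liked, catalogue):
    """Rank the other items by tag overlap with the item the user liked."""
    others = [item for item in catalogue if item != liked]
    return sorted(
        others,
        key=lambda item: jaccard(catalogue[liked], catalogue[item]),
        reverse=True,
    )

print(recommend_similar("clip_a", catalogue))
```

Only the user's own history and the item features are needed here, which is why this approach works with less data but tends to keep recommending more of the same.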

26
Q

What is collaborative filtering?

A

Recommending what similar people have interacted with.
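
As a rough illustration of collaborative filtering, the sketch below finds the user whose ratings are most similar to yours and suggests what they rated that you have not seen. The ratings, the user and clip names, the function names and the use of cosine similarity are assumptions made for this example, not details from the case study.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ana": {"clip_a": 5, "clip_b": 4, "clip_c": 1},
    "ben": {"clip_a": 5, "clip_b": 5, "clip_d": 4},
    "cara": {"clip_a": 1, "clip_c": 5, "clip_e": 4},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Suggest unseen items rated by the single most similar other user."""
    others = [name for name in ratings if name != user]
    nearest = max(
        others, key=lambda name: cosine_similarity(ratings[user], ratings[name])
    )
    unseen = {i: r for i, r in ratings[nearest].items() if i not in ratings[user]}
    # Highest-rated unseen items first
    return sorted(unseen, key=unseen.get, reverse=True)

print(recommend("ana", ratings))
```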

27
Q

What do the best recommender systems use for filtering?

A

A hybrid of both, in combination with different machine learning methods.

28
Q

What is good about Collaborative filtering?

A

Other users' behaviour is used.
Chance is involved, so users can discover things they would not have found themselves.

29
Q

What is bad about Collaborative filtering?

A

Needs more data
Problems for new users/products
If product has no ratings it can’t be used.
People aren’t all the same.

30
Q

What is good about content filtering?

A

Works with less data
User specific

31
Q

What is bad about content filtering?

A

Over-specialisation (the user is not shown anything new).

32
Q

What is the main difference between collaborative and content filtering?

A

Collaborative = emphasises user.
Content = emphasises content.

33
Q

What is the cold start problem?

A

Cold start problem is when the system cannot draw any inferences for users or items about which it has not yet gathered sufficient information.

34
Q

Which filtering system suffers from the cold start problem?

A

Collaborative = the system doesn’t yet know who you are like.

35
Q

The case study uses collaborative filtering. What two algorithms does it use to recommend content?

A

k-nearest neighbour (k-NN)
Matrix factorisation.

36
Q

What is k-NN?

A

An algorithm that stores the available cases and classifies new cases based on similarity, i.e. it classifies data by what it is most similar to.

37
Q

What is k?

A

k is the number of nearest cases (neighbours) that a new case is compared with in order to decide what it matches.
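
To make the role of k concrete, here is a minimal sketch of k-NN classification by majority vote. The function name `knn_classify`, the viewer data and the labels are hypothetical, and straight-line (Euclidean) distance is assumed as the similarity measure.

```python
from collections import Counter
from math import dist

def knn_classify(training, query, k):
    """Label `query` by majority vote among its k nearest training cases."""
    # training is a list of (features, label) pairs
    nearest = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical viewers: (hours of action watched, hours of drama watched)
training = [
    ((9.0, 1.0), "action fan"),
    ((8.0, 2.0), "action fan"),
    ((7.5, 1.5), "action fan"),
    ((1.0, 9.0), "drama fan"),
    ((2.0, 8.0), "drama fan"),
]

print(knn_classify(training, (7.0, 3.0), k=3))
print(knn_classify(training, (2.0, 7.0), k=1))
```

Changing k changes how many stored cases get a vote, which is why it has to be chosen before the algorithm is used.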

38
Q

What type of thing is k?

A

Hyperparameter

39
Q

What is parameter tuning?

A

Choosing the right value of k.

40
Q

How does one know if the parameter tuning is correct?

A

Trial and Error

41
Q

What happens if k is too small?

A

Neighbours are too specific = predictions are sensitive to individual cases and the system won’t suggest new things.

42
Q

What happens if k is too big?

A

More work = more expensive to compute, and the neighbours become less similar, so results are too general.

43
Q

What is matrix factorisation?

A

Breaking down a big matrix (for example, users × items ratings) into smaller matrices whose product approximates it.

44
Q

How does matrix factorisation work?

A

It uses data on how similar people have rated certain movies to find patterns that predict how other people will rate them.
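
As a sketch of the idea, the code below factorises a small table of known ratings into one short vector per user and one per item, learned by gradient descent, and multiplies them back together to predict a missing rating. The ratings, the function names and all parameter values (`n_factors`, `steps`, `lr`, `reg`) are assumptions chosen for illustration, not values from the case study.

```python
import random

def factorise(ratings, n_users, n_items, n_factors=2,
              steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                # Move both factors to reduce the error; reg keeps them small
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of the user's and the item's factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def mean_squared_error(P, Q, ratings):
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)

# Hypothetical known ratings: (user index, item index, rating 1-5)
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 5), (1, 1, 5), (1, 2, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
]

P, Q = factorise(ratings, n_users=3, n_items=4)
print(round(predict(P, Q, 0, 2), 2))  # user 0 never rated item 2
```

Only the two small factor tables need to be stored; any individual rating can be predicted from them when it is needed.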

45
Q

Example of matrix factorisation.

A

If someone who liked action movies loved the movie with the car, then someone else who loves action movies might also love the movie with the car.

46
Q

What are dependencies?

A

People aren’t completely random; they have preferences that often follow patterns.

47
Q

What is good about matrix factorisation?

A

Instead of storing everyone’s individual preferences, you can store compact data about people and preferences and predict a rating when necessary.
It is also easier to use for training models.

48
Q

What is bad about matrix factorisation?

A

Cold start problem = won’t be as accurate until we have data to use.

49
Q

What is cloud computing?

A

The storage of data, and the delivery of other computing resources, on servers and databases that can be accessed over the internet.

50
Q

What are the three types of cloud computing models?

A

Cloud service models describe the structure of the cloud: what the services run through, who runs them and how they run.
The three are IaaS, SaaS and PaaS.

51
Q

What is IaaS?

A

On-demand computing resources in the cloud, provided as infrastructure (such as servers and storage); paying more allocates more capacity.

52
Q

What is SaaS?

A

Software that is run through the internet, e.g. accessed in a browser.

53
Q

What is PaaS?

A

Run through platform.
Dedicated to providing services for apps to run.

54
Q

What are the cons of Infrastructure aaS?

A

Does not have Unique App
Legal limitations
Potential Security flaws.
Doesn’t work without internet connectivity.

55
Q

What are the cons of Software aaS?

A

Not very flexible
Data stored offsite.
Dependent on server infrastructure.
Compromise for security.

56
Q

What are the cons of Platform aaS?

A

Not very scalable.
Data security risks due to third-party ownership.
Your preferred tech stack might not be available.
Transition to PaaS is hard.

57
Q

What cloud computing model gives you the most managing potential?

A

IaaS

58
Q

What cloud computing model gives you the least managing potential?

A

SaaS

59
Q

What is the root mean square error?

A

Measures how much error there is between predicted values and actual values: the square root of the mean of the squared errors.
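
A minimal sketch of the calculation, assuming the predicted and actual ratings are held in two equal-length lists. The function name `rmse` and the sample ratings are hypothetical.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean square error between two equal-length lists of numbers."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings: what the recommender predicted vs what users gave
predicted = [4, 3, 5]
actual = [5, 3, 3]
print(round(rmse(predicted, actual), 3))
```

Because the errors are squared before averaging, one large miss raises the score more than several small ones.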

60
Q

What does the root mean square error let us discover?

A

It lets us evaluate the quality of machine learning predictions.

61
Q

What is online behavioural data?

A

Information generated about you from the things you do online, used to predict your behaviour.

62
Q

What sort of events allow companies to collect online behavioural data?

A

Clicking a link
Watching/reading
Rating an item
Using customer service
Placing an order
Responding to adverts/marketing

63
Q

What is behavioural data?

A

What you like and don’t like
How you respond
What you are likely to do

64
Q

What is personal data?

A

Who you are
Where you live
Who you are related to

65
Q

What is overfitting?

A

The system fits its training data too exactly or too closely, and may therefore fail to fit additional data or predict future observations reliably.

66
Q

State the two ways to measure a recommender system’s effectiveness

A

The click-through rate measures how many people click on recommendations. Predictive accuracy metrics measure how close a recommender’s predicted ratings are to actual user ratings.

67
Q

Which is the best cloud computing model?

A

PaaS is better because you get the mixture of customisation with outside control, meaning it can be efficient with and without your intervention and still be customisable.

68
Q

What is F-measure?

A

A way to measure how accurate a recommender system is. It takes into account both the system’s recall score and precision score.

69
Q

What does MAE stand for?

A

Mean absolute error

70
Q

What is MAE?

A

The average absolute difference between the observed result and the expected result. Used to measure the accuracy of a recommender system.
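
A minimal sketch of the calculation, assuming the observed and expected values are held in two equal-length lists. The function name `mae` and the sample ratings are hypothetical.

```python
def mae(predicted, actual):
    """Mean absolute error between two equal-length lists of numbers."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical ratings: what the recommender predicted vs what users gave
predicted = [4, 3, 5]
actual = [5, 3, 3]
print(mae(predicted, actual))  # absolute errors are 1, 0 and 2
```

Taking the absolute value stops over-predictions and under-predictions from cancelling each other out.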

71
Q

What is Precision?

A

Measures the ratio of correct items identified to the total number of items identified.

72
Q

What is stochastic gradient descent?

A

An optimisation algorithm that finds the model parameters giving the best fit between predicted and actual outputs, adjusting them a little after each training example.
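
To illustrate, here is a minimal sketch that uses stochastic gradient descent to fit a straight line y = w * x + b. The function name `sgd_fit`, the data and the learning rate and epoch count are assumptions chosen for this example; a recommender would apply the same update idea to its own parameters.

```python
import random

def sgd_fit(points, lr=0.02, epochs=2000, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(points)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit the examples in random order
        for x, y in data:
            error = (w * x + b) - y  # predicted minus actual
            w -= lr * error * x      # gradient of 0.5 * error**2 w.r.t. w
            b -= lr * error          # gradient of 0.5 * error**2 w.r.t. b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1
points = [(1, 3), (2, 5), (3, 7), (4, 9)]
w, b = sgd_fit(points)
print(round(w, 3), round(b, 3))
```

Each update uses a single example, so the parameters improve step by step rather than waiting for a pass over the whole data set.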

73
Q

What is cost function?

A

A cost function is a measure of how successful an algorithm is, based on how much error there is at each stage when predicted values are compared with actual values.

74
Q

How can NextStar use cost function to improve their algorithm?

A

By using a cost function, NextStar can see where the errors are in the system, so that it can start spotting and solving them. This makes the algorithm more reliable, as the results keep getting checked.

75
Q

Why is IaaS a good cloud computing model?

A

IaaS would be best for NextStar because it is customisable but also helped by server managers. The best of both worlds.

76
Q

What is F-measure or F-score?

A

A combination of two measures of accuracy, precision and recall, which tests the success rate of filtering.
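
A minimal sketch of how the two scores combine, assuming the common F1 form (the harmonic mean of precision and recall). Here recall is taken to be the share of all relevant items that were actually recommended; the function name and the item lists are hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, F1) for a set of recommendations."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # correct items identified
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: four items recommended, three items truly relevant
recommended = ["a", "b", "c", "d"]
relevant = ["a", "c", "e"]
precision, recall, f1 = precision_recall_f1(recommended, relevant)
print(precision, round(recall, 3), round(f1, 3))
```

The harmonic mean stays low unless both precision and recall are reasonably high, so a system cannot score well by maximising only one of them.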

77
Q

Hyperparameter

A

A hyperparameter is a machine learning parameter whose value is chosen before a learning algorithm is trained.

78
Q

Mean Absolute Error

A

The mean of the absolute errors of the filtering system.