Exam Flashcards

(44 cards)

1
Q

What is the bottleneck of user-based CF, and how does item-based CF avoid it?

A

the real-time search for neighbours among a large user population.
item-based CF avoids this by computing similarities between items instead of users

2
Q

What is the intuition of item-based CF?

A

users are interested in items similar to those previously experienced

3
Q

What edge do item-item similarities have over user-user similarities, and why?

A

They are more “stable”, since the domain of items changes less than the population of users, allowing for less frequent system updates.

4
Q

What is the benefit of adjusted cosine similarity in item-based CF?

A

It accounts for differences in how users rate items

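For reference, the standard adjusted cosine formulation (notation mine): $\bar{r}_u$ is user $u$'s mean rating and $U_{i,j}$ is the set of users who rated both items $i$ and $j$. Subtracting each user's mean removes individual rating bias before comparing the two item columns:

$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U_{i,j}} (r_{u,i} - \bar{r}_u)(r_{u,j} - \bar{r}_u)}{\sqrt{\sum_{u \in U_{i,j}} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U_{i,j}} (r_{u,j} - \bar{r}_u)^2}}$$
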
5
Q

What is the underlying heuristic of CF?

A

people who agreed or disagreed on items in the past are likely to agree or disagree on future items

6
Q

What are the steps in the UBCF (user-based CF) algorithm?

A
  1. data representation
  2. similarity computation
  3. neighbourhood formation
  4. prediction/top-N list
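
As a concrete illustration of the four steps, a minimal sketch (the toy data and function names are illustrative, not from the course):

```python
import numpy as np

# Step 1: data representation -- a user x item ratings matrix, 0 = not rated.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4.]])

def pearson(u, v):
    """Step 2: Pearson similarity over co-rated items only, else 0."""
    mask = (u > 0) & (v > 0)
    if mask.sum() < 2:
        return 0.0
    du, dv = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    denom = np.linalg.norm(du) * np.linalg.norm(dv)
    return float(du @ dv / denom) if denom else 0.0

def predict(R, target, item, k=2):
    """Steps 3 and 4: form a neighbourhood, then a mean-centred weighted prediction."""
    sims = [(pearson(R[target], R[v]), v) for v in range(len(R))
            if v != target and R[v, item] > 0]
    neighbours = sorted(sims, reverse=True)[:k]           # step 3: k nearest
    num = sum(s * (R[v, item] - R[v][R[v] > 0].mean()) for s, v in neighbours)
    den = sum(abs(s) for s, v in neighbours)
    mean = R[target][R[target] > 0].mean()
    return mean + num / den if den else mean              # step 4: prediction

print(predict(R, target=0, item=2))
```
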
7
Q

What is the main issue with the MSD similarity metric?

A

it assumes that users rate according to similar distributions (it does not account for differences in how individual users use the rating scale)

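One common formulation (assuming ratings are normalised to [0,1], so each squared difference, and hence the mean, is also in [0,1]), with $I_{a,b}$ the set of items co-rated by users $a$ and $b$:

$$\mathrm{sim}(a,b) = 1 - \frac{1}{|I_{a,b}|}\sum_{i \in I_{a,b}} (r_{a,i} - r_{b,i})^2$$
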
8
Q

For MSD similarity, what are two important features of the metric?

A
  • summations over co-rated items only -> else set to 0
  • results in a value in [0,1]
9
Q

For Pearson similarity, what are two important features of the metric?

A
  • summations over co-rated items only -> else set to 0
  • results in a value in [-1,1]
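
For reference, the Pearson correlation between users $a$ and $b$ over their co-rated items $I_{a,b}$, with $\bar{r}_a$ and $\bar{r}_b$ the users' mean ratings:

$$\mathrm{sim}(a,b) = \frac{\sum_{i \in I_{a,b}} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_{i \in I_{a,b}} (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_{i \in I_{a,b}} (r_{b,i} - \bar{r}_b)^2}}$$
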
10
Q

What is the benefit of applying significance weighting to Pearson correlation?

A

It adjusts for the number of co-rated items, discounting similarity weights that are based on only a few co-rated items and are therefore unreliable

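One common form (after Herlocker et al.; the threshold $\gamma$, e.g. 50, is a tunable parameter): similarities based on fewer than $\gamma$ co-rated items are scaled down proportionally:

$$w'_{a,b} = \frac{\min(|I_{a,b}|, \gamma)}{\gamma} \cdot w_{a,b}$$
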
11
Q

What impacts the range of cosine similarity results?

A

the non-negativity of ratings

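Cosine similarity is the normalised dot product of the two rating vectors; when all ratings are non-negative the numerator cannot be negative, so the result falls in [0,1] rather than [-1,1]:

$$\mathrm{sim}(a,b) = \frac{\mathbf{r}_a \cdot \mathbf{r}_b}{\lVert \mathbf{r}_a \rVert\,\lVert \mathbf{r}_b \rVert}$$
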
12
Q

Briefly describe some of the extensions to Pearson Correlation

A
  • Jaccard index: modify similarity weights by the number of co-rated items divided by the number of items in the union of both users' items
  • default voting: calculate over the union of items, applying a default rating to non-co-rated items
  • case amplification: emphasise weights which are close to 1 and reduce the influence of lower weights
  • inverse user frequency (IUF): gives more weight to ratings for niche items
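
As an example of one of these, case amplification transforms each weight as below ($\rho > 1$; values around 2.5 are commonly cited), which shrinks small weights far more than large ones:

$$w'_{a,b} = w_{a,b} \cdot |w_{a,b}|^{\rho - 1}$$
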
13
Q

CF advantages

A
  • captures quality and taste
  • no item descriptions/features required
  • serendipitous recommendations
14
Q

CF Limitations

A
  • cold start problem
  • early rater problem
  • sparsity problem
  • scalability
15
Q

What do RS help drive?

A

demand down the long tail, with benefits to both consumers and retailers alike

16
Q

What does CF automate?

A

The “word-of-mouth” process

17
Q

What is the key difference between CF and Content-based recommendation?

A

The use of the item’s descriptions/features (content)

18
Q

How is document-document similarity calculated in Content-based?

A

The cosine of the angle between the document’s vectors
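
A minimal sketch, assuming scikit-learn is available (any TF-IDF implementation would do; the documents are toy examples, not from the notes):

```python
# Pairwise document-document cosine similarity over TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the fox and the hound",
        "the hound chased the fox",
        "interest rates rose again"]
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
print(cosine_similarity(vectors))  # 3x3 matrix of document-document cosines
```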

19
Q

What is case-based recommendation?

A

A form of content-based recommendation which represents items using a well-defined set of features and feature values

20
Q

List sources of recommendation knowledge and give examples of each

A
  • transactional and behavioural data: clicks, purchases, likes.
  • content and meta data: text, features, tags.
  • experiential data: user-generated opinions.
21
Q

List some properties of consumer reviews

A
  • ubiquitous
  • abundant
  • usually independent
  • often insightful
22
Q

Do reviews matter?

A

Yes. Research shows that reviews help users to make better decisions. They increase conversion rates and improve satisfaction.

23
Q

What are some considerations when making recommendations and ranking them?

A
  • business imperatives: e.g. promoting items
  • domain
  • the influence of particular items
24
Q

How are non-personalised recommendations usually presented?

A

in the form of a top-N ranked list

25
Q

Personalised recommendation considerations

A
  • acquiring users' personal information
  • recommendation output
  • personalisation: ephemeral or persistent
26
Q

What is ephemeral personalisation?

A

matching current activity

27
Q

What is persistent personalisation?

A

matching long-term interests

28
Q

Benefits of RSs?

A
  • turning web browsers into buyers
  • cross/up-selling
  • customer loyalty
29
Q

What are the two main approaches to content-based recommendation and how do you distinguish between them?

A
  • traditional content-based (unstructured)
  • case-based (structured)
30
Q

List term-weighting approaches

A
  • term frequency
  • normalised term frequency
  • inverse document frequency (IDF)
  • binary weighting
  • NTF + IDF
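
For reference, common definitions under one convention (notation mine): $f_{t,d}$ is the raw count of term $t$ in document $d$, $N$ is the number of documents, and $n_t$ is the number of documents containing $t$:

$$\mathrm{TF}_{t,d} = f_{t,d} \qquad \mathrm{NTF}_{t,d} = \frac{f_{t,d}}{\max_{t'} f_{t',d}} \qquad \mathrm{IDF}_t = \log\frac{N}{n_t} \qquad w_{t,d} = \mathrm{NTF}_{t,d} \cdot \mathrm{IDF}_t$$
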
31
Q

What is term stemming?

A

considering terms of similar meaning as being the same for matching purposes

32
Q

How are stop words handled?

A

They are omitted from the term-document matrix

33
Q

Differences in making recommendations for NP vs P (non-personalised vs personalised)

A
  • NP: rank recommendation candidates by similarity to the target item
  • P: rank recommendation candidates by similarity to the target user's profile
34
Q

Reasons why case-based recommendation is a powerful approach to recommendation?

A
  • facilitates the search and navigation of complex information spaces
  • flexible user feedback options
  • suitable for e-commerce applications
35
Q

Underlying assumptions of case-based reasoning

A
  • the world is a regular place and similar problems tend to have similar solutions
  • the world is a repetitive place and similar problems tend to recur
36
Q

CBR Cycle

A
  • Retrieve
  • Reuse
  • Revise
  • Retain
37
Q

Key differences between case-based and content-based systems

A
  • case representation
  • similarity assessment
38
Q

In case-based recommenders, what do the following symbols represent: Sim(T, C), w, v, Sim(v1, v2)?

A
  • Sim(T, C): the similarity between the target case and a candidate case
  • w: the relative importance (weight) of a feature
  • v: $v_{c,i}$ is the value of feature i in case C
  • Sim(v1, v2): feature-level similarity
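
These pieces combine in the standard weighted-sum similarity over the $n$ features of the target case $T$ and candidate case $C$:

$$\mathrm{Sim}(T,C) = \frac{\sum_{i=1}^{n} w_i \cdot \mathrm{sim}(v_{t,i}, v_{c,i})}{\sum_{i=1}^{n} w_i}$$
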
39
Q

What is a key issue for case-based recommenders?

A

acquiring similarity knowledge (numerical vs non-numerical features, symmetric vs asymmetric similarity)

40
Q

The ideal balance of similarity vs diversity in similarity-based recommendation

A

we want the top-k retrieved items to be equally similar to the target item/user profile but in different ways

41
Q

Two algorithms used for balancing similarity vs diversity in similarity-based recommendation

A
  • Bounded Greedy selection
  • Shimazu's algorithm
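
A minimal sketch of Bounded Greedy selection (after Smyth & McClave; the parameter names and toy similarity function are my own): retrieve the bound*k items most similar to the target, then greedily pick items that trade off similarity to the target against diversity relative to those already selected:

```python
def bounded_greedy(target, items, sim, k=3, bound=2, alpha=0.5):
    # Bound: keep only the bound*k items most similar to the target.
    candidates = sorted(items, key=lambda c: sim(target, c), reverse=True)[:bound * k]
    selected = []
    while candidates and len(selected) < k:
        def quality(c):
            # Relative diversity: average dissimilarity to the items picked so far.
            div = (sum(1 - sim(c, s) for s in selected) / len(selected)
                   if selected else 1.0)
            return alpha * sim(target, c) + (1 - alpha) * div
        best = max(candidates, key=quality)   # greedy step
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage: items are numbers, similarity decays with distance.
items = [1, 2, 3, 10, 11, 20]
print(bounded_greedy(0, items, lambda a, b: 1 / (1 + abs(a - b)), k=3))
```
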
42
Q

Advantages of content-based systems

A

early recommendations can be made (no ratings from other users are required)

43
Q

Issues of content-based systems

A
  • feature identification and extraction can be problematic
  • content-based filters cannot distinguish between low and high-quality items
  • a "more-like-this" approach -> low serendipity
44
Q

Evaluation methods for systems

A
  • live-user trials
  • offline evaluations