Class Six Flashcards

1
Q

What is outlier detection in machine learning?

A

Outlier detection refers to the process of identifying data points or observations that deviate significantly from the majority of the dataset. Outliers can be caused by errors, anomalies, or rare events.

2
Q

What are the advantages of outlier detection?

A

Advantages of outlier detection include identifying data quality issues, detecting anomalies or fraudulent activities, and improving the accuracy of predictive models by removing influential outliers.

3
Q

What are the limitations of outlier detection?

A

Limitations of outlier detection methods include the subjectivity of defining what constitutes an outlier, masking (several outliers together can hide one another), and the way outliers distort the very statistics, such as the mean and standard deviation, that many detection methods rely on.

4
Q

What is feature selection?

A

Feature selection is the process of selecting a subset of relevant features from a larger set of available features in the data. It aims to improve model performance, reduce overfitting, and enhance interpretability.

5
Q

What are the advantages of feature selection?

A

Advantages of feature selection include improved model interpretability, reduced computational complexity, increased generalization performance, and the elimination of irrelevant or redundant features.

6
Q

What are the limitations of feature selection?

A

Limitations of feature selection include potential loss of information if relevant features are removed, the difficulty of finding the optimal subset of features, and sensitivity to feature interactions: a feature that looks uninformative on its own may be predictive in combination with others.

7
Q

What does finding similar items involve?

A

Finding similar items involves identifying items that are similar or related to a given item based on their attributes, characteristics, or usage patterns. It is commonly used in recommendation systems and search engines.
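
As one possible illustration of attribute-based similarity, the sketch below ranks items by Jaccard similarity of their tag sets. The catalog, item names, and tags are hypothetical, and Jaccard is only one of many similarity measures that could be used here.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def most_similar(target, catalog, top_n=2):
    """Rank the other items in `catalog` by similarity to `target`."""
    scores = [
        (name, jaccard(catalog[target], tags))
        for name, tags in catalog.items()
        if name != target
    ]
    # Highest similarity first; ties broken alphabetically by name.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical catalog: item name -> set of descriptive tags.
catalog = {
    "trail_shoes": {"running", "outdoor", "footwear"},
    "road_shoes": {"running", "footwear"},
    "hiking_boots": {"outdoor", "footwear", "waterproof"},
    "yoga_mat": {"indoor", "fitness"},
}

ranked = most_similar("trail_shoes", catalog)
print(ranked)  # road_shoes (2/3) ranks ahead of hiking_boots (0.5)
```

Comparing every pair of items this way grows quadratically with catalog size, which is the scalability issue that large systems have to work around.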

8
Q

What are the advantages of finding similar items?

A

Advantages of finding similar items include personalized recommendations, improved user experience, identification of related products or content, and the potential for cross-selling or upselling.

9
Q

What are the limitations of finding similar items?

A

Limitations of finding similar items include the challenge of defining similarity metrics, scalability issues with large datasets, and a lack of serendipity: items that closely resemble what a user has already seen may not be relevant or desired.

10
Q

What are recommender systems?

A

Recommender systems are information filtering systems that predict and suggest relevant items to users based on their preferences, historical behavior, or similarities with other users.

11
Q

What are the different types of recommender systems?

A

Content Filtering: Assumes access to side information about items.
* Example: Pandora

Collaborative Filtering: Does not assume access to side information about items.
* Example: Netflix
* Personal tastes are correlated: if Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y.

In summary, collaborative filtering is based on finding similarities in user or item behavior to make recommendations, while content-based filtering relies on item attributes and user preferences for those attributes. Collaborative filtering looks at user-user or item-item relationships, while content-based filtering focuses on item characteristics.

12
Q

What are the two types of collaborative filtering?

A

Neighborhood: Find neighbors based on similarity of movie preferences.
Latent Factor: Assume that both movies and users live in some low-dimensional space describing their properties. Recommend a movie based on its proximity to the user in the latent space.
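
The neighborhood approach can be sketched roughly as follows: score how similar each other user is to the target user, then predict a missing rating as a similarity-weighted average of the nearest neighbors' ratings. The users, movies, and ratings are made up, and cosine similarity over co-rated movies is an assumption; Pearson correlation or mean-centered ratings are common alternatives.

```python
import math


def cosine(u, v):
    """Cosine similarity over the movies both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, movie, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings."""
    neighbors = [
        (cosine(ratings[user], other_ratings), other_ratings[movie])
        for other, other_ratings in ratings.items()
        if other != user and movie in other_ratings
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = [(sim, r) for sim, r in neighbors[:k] if sim > 0]
    if not top:
        return None  # no usable neighbor has rated this movie
    return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)


# Hypothetical ratings on a 1-5 scale: user -> {movie: rating}.
ratings = {
    "alice": {"X": 5, "Y": 4, "Z": 1},
    "bob": {"X": 5, "Z": 1},
    "carol": {"X": 1, "Y": 2, "Z": 5},
}

# Bob's tastes match Alice's, so the prediction leans toward her rating of Y.
prediction = predict(ratings, "bob", "Y")
print(round(prediction, 2))  # 3.44
```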

13
Q

What are the advantages of recommender systems?

A

Advantages of recommender systems include personalized recommendations, increased user engagement, improved customer satisfaction, and potential revenue growth through cross-selling and upselling.

14
Q

What are the limitations of recommender systems?

A

Limitations of recommender systems include the cold-start problem for new users or items, the potential for echo chamber effects or limited diversity, and the need for data privacy and ethical considerations.

Issues:
* Diversity: How different are the recommendations?
* Persistence: How long should recommendations last?
* Trust: Tell the user why you made a recommendation.
* Social recommendation: What did your friends watch?
* Freshness: People tend to get more excited about new or surprising things.

15
Q

How can outlier detection be performed?

A

Outlier detection can be performed using various techniques such as statistical methods (e.g., z-score, modified z-score), distance-based approaches (e.g., k-nearest neighbors), or machine learning algorithms (e.g., isolation forest, one-class SVM).
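
A minimal sketch of the two statistical methods named above, using only the Python standard library. The data and the cutoffs (3 for the z-score, 3.5 for the modified z-score) are illustrative assumptions, conventional but not universal.

```python
import statistics


def z_scores(values):
    """Standard z-score: distance from the mean in standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # Simplification for this sketch; other fallbacks exist.
        return [0.0 for _ in values]
    return [0.6745 * (x - med) / mad for x in values]


# Hypothetical measurements with one obviously extreme value.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

flagged_z = [x for x, z in zip(data, z_scores(data)) if abs(z) > 3]
flagged_mod = [x for x, z in zip(data, modified_z_scores(data)) if abs(z) > 3.5]

print(flagged_z)    # [] : the value 50 inflates the standard deviation
print(flagged_mod)  # [50]
```

On this small sample the plain z-score misses the extreme value because that value inflates the standard deviation it is measured against. The median-based version is more robust to this effect.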

16
Q

How can feature selection be done?

A

Feature selection can be done using methods like filter methods (e.g., correlation, chi-square test), wrapper methods (e.g., recursive feature elimination), or embedded methods (e.g., Lasso regression, decision tree-based feature importance).
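
As a rough sketch of a filter method, the code below scores each feature by its Pearson correlation with the target and keeps those above a cutoff. The feature names, values, and the 0.5 threshold are hypothetical, chosen only to make the behavior visible.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target is high."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Hypothetical housing data: feature name -> column of values.
features = {
    "size_m2": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 4],
    "noise": [3, 1, 4, 1, 5],
    "constant": [7, 7, 7, 7, 7],
}
target = [150, 180, 240, 310, 360]

kept, scores = select_by_correlation(features, target)
print(kept)  # ['size_m2', 'rooms']
```

This filter scores each feature in isolation, so it keeps both `size_m2` and `rooms` even though they are largely redundant, and it would miss a feature that only matters in combination with another.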

17
Q

How can recommender systems be implemented?

A

Recommender systems can be implemented using techniques like collaborative filtering (user-based or item-based), content-based filtering, hybrid methods combining multiple approaches, or more advanced techniques like matrix factorization and deep learning.
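
To make the matrix factorization option more concrete, here is a small sketch that learns user and item vectors by stochastic gradient descent on the observed ratings. The ratings and all hyperparameters (`k`, `lr`, `reg`, `epochs`) are illustrative assumptions, not tuned values, and production systems typically add bias terms and a held-out evaluation.

```python
import math
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
              epochs=300, seed=0):
    """Learn P (users x k) and Q (items x k) so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


# Hypothetical observed ratings as (user index, item index, rating).
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 5), (1, 2, 1),
    (2, 0, 1), (2, 1, 2), (2, 2, 5),
    (3, 1, 1), (3, 2, 4),
]

P0, Q0 = factorize(ratings, n_users=4, n_items=3, epochs=0)
P, Q = factorize(ratings, n_users=4, n_items=3)
before = rmse(ratings, P0, Q0)
after = rmse(ratings, P, Q)

# Predicted rating for a pair that was never observed (user 1, item 1).
unseen = sum(pf * qf for pf, qf in zip(P[1], Q[1]))
```

The learned rows of `P` and `Q` play the role of latent factors: a recommendation for an unrated pair is just the dot product of the two vectors.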

18
Q

What are L1 and L2 penalties?

A

L1 penalty (Lasso regularization): L1 penalty refers to the use of the absolute values of the coefficients as a regularization term in the loss function. Lasso regularization encourages sparsity by driving many coefficients to exactly zero, effectively performing feature selection. It is particularly useful when dealing with high-dimensional datasets and can lead to models that have a subset of important features.

L2 penalty (Ridge regularization): L2 penalty refers to the use of the squared magnitudes of the coefficients as a regularization term in the loss function. Ridge regularization helps control overfitting by penalizing large coefficient values and encourages the model to distribute the weight among all features rather than emphasizing a few. It tends to produce models with smaller but non-zero coefficients.

Both L1 (Lasso) and L2 (Ridge) regularization methods are widely used for controlling model complexity and preventing overfitting in machine learning. They have different effects on the resulting models, with L1 regularization promoting sparsity and feature selection, while L2 regularization encourages a more even distribution of weights across features.
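
The difference in behavior can be seen in the simplest case of a single coefficient whose unpenalized estimate is z. Minimizing 0.5*(w - z)^2 + lam*|w| gives the soft-thresholding rule, while minimizing 0.5*(w - z)^2 + 0.5*lam*w^2 gives w = z / (1 + lam). The sketch below applies both rules to a few made-up estimates; the values and `lam` are illustrative, and a real model with correlated features would need an iterative solver.

```python
def lasso_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small estimates are set exactly to zero


def ridge_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return z / (1.0 + lam)


# Hypothetical unpenalized coefficient estimates.
estimates = [3.0, -0.4, 0.2, -2.5]
lam = 0.5

lasso = [lasso_shrink(z, lam) for z in estimates]
ridge = [ridge_shrink(z, lam) for z in estimates]

print(lasso)  # [2.5, 0.0, 0.0, -2.0]
print(ridge)  # every value shrunk toward zero, none exactly zero
```

The L1 rule zeroes out the two small coefficients, which is the feature-selection effect, while the L2 rule shrinks all four but keeps every one of them non-zero.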