Machine Learning Flashcards

1
Q

Cosine Similarity

A

Measures the cosine of the angle between two vectors to determine the similarity between two items.
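
As an illustration, here is a minimal sketch of the computation in plain Python. The function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for this example, not part of the card.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: the angle is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score close to 1 regardless of their length, perpendicular vectors score 0, and opposite vectors score -1.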

2
Q

Manhattan Distance

A

Calculates the distance between points in a grid-based layout as the sum of the absolute differences of their Cartesian coordinates.
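
A short sketch of this sum of absolute differences, assuming points are given as equal-length sequences of numbers (the function name `manhattan_distance` is hypothetical):

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p, q))
```

For example, moving from (1, 2) to (4, 6) on a grid takes 3 steps horizontally and 4 vertically, so the distance is 7.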

3
Q

Jaccard Similarity

A

Compares the similarity and diversity of sample sets, calculating the size of the intersection divided by the size of the union of the sets.
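
The intersection-over-union ratio can be sketched with Python sets. The name `jaccard_similarity` and the treatment of two empty sets as identical are assumptions for this example.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)
```

With {1, 2, 3} and {2, 3, 4}, two elements are shared out of four distinct elements overall, giving 0.5.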

4
Q

Spearman’s Rank Correlation

A

A measure of rank correlation that assesses how well the relationship between two variables can be described using a monotonic function.
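
One way to see this is to compute the coefficient from rank differences. The sketch below assumes there are no tied values within either variable, which is the case where the shortcut formula 1 - 6Σd²/(n(n² - 1)) is valid; the function names are hypothetical.

```python
def ranks(values):
    """1-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_no_ties(x, y):
    """Spearman's rho via squared rank differences (no ties assumed)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Because only ranks matter, any strictly increasing relationship (such as y = x³) yields 1 even though it is not linear.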

5
Q

K-Nearest Neighbors (KNN)

A

A supervised algorithm that stores all training cases and classifies a new case by a majority vote of its k nearest neighbors (for regression, it averages the neighbors' values instead).
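
The majority vote can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and training data given as `(feature_vector, label)` pairs; the name `knn_classify` and the default `k=3` are choices made for the example.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors.

    train: list of (feature_vector, label) pairs.
    """
    # Sort stored cases by Euclidean distance to the query; keep the k closest.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Note that there is no training step beyond storing the data; all the work happens at prediction time.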

6
Q

Matrix Factorization

A

A collaborative filtering technique using decompositions like SVD to predict missing entries in a user-item interaction matrix.

7
Q

Content-Based Filtering

A

Recommends items based on their similarity to items previously liked by the user, using the features of the items themselves.

8
Q

Cold Start Problem

A

A challenge in recommendation systems where there is insufficient data on new users or items to make accurate recommendations.

9
Q

Item-to-Item Collaborative Filtering

A

A form of collaborative filtering based on calculating the similarity between items using ratings given by users.

10
Q

Hamming Distance

A

Measures the distance between two strings of equal length by counting the number of positions at which the corresponding symbols differ.
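
A brief sketch of the position-by-position count (the function name `hamming_distance` is hypothetical):

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(a != b for a, b in zip(s, t))
```

For "karolin" and "kathrin", the third, fourth, and fifth characters differ, so the distance is 3.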

11
Q

Supervised Learning

A

A type of machine learning where the model is trained on a labeled dataset, learning to predict the output from the input data.

12
Q

Unsupervised Learning

A

Learning from data that has not been labeled, categorized, or classified, aiming to identify significant patterns.

13
Q

Regression

A

A supervised learning task, and the family of statistical methods used for it, that predicts a continuous outcome from input features by fitting a model to previously observed data.

14
Q

Classification

A

A process in machine learning for categorizing data into predefined classes or categories.

15
Q

Decision Trees

A

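A model that predicts an outcome by following a tree of feature-based decisions from the root to a leaf. More generally, a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes.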

16
Q

Random Forest

A

An ensemble learning method for classification, regression, and other tasks that constructs multiple decision trees at training time and combines their outputs: the majority vote for classification or the average prediction for regression.

17
Q

Neural Networks

A

Computing systems vaguely inspired by the biological neural networks that constitute animal brains, capable of pattern recognition and data classification.

18
Q

Gradient Descent

A

An optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient.
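
The iterative update can be sketched in one dimension. This is an illustrative example only: the function f(x) = (x - 3)², the learning rate of 0.1, and the fixed step count are all assumptions chosen so the loop converges quickly.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # Move in the direction of steepest descent: the negative gradient.
        x = x - learning_rate * grad(x)
    return x

# Example: f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Here each step shrinks the distance to the minimum by a constant factor, so `minimum` ends up very close to 3. A learning rate that is too large can make the iterates diverge instead.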

19
Q

Overfitting

A

A modeling error in machine learning where a function is too closely fitted to a limited set of data points and fails to generalize to new data.

20
Q

Cross-Validation

A

A technique for assessing how the results of a statistical analysis will generalize to an independent data set, commonly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice.
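
One common variant is k-fold cross-validation, where each fold serves once as the held-out test set. The sketch below only generates the index splits; it assumes contiguous, unshuffled folds, and the name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

In practice a model would be fitted on each `train` split and scored on the matching `test` split, and the k scores averaged to estimate performance on unseen data.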

21
Q

Collaborative Filtering

A

Collaborative filtering is a technique used in recommendation systems to predict the preferences of a user by collecting preferences or taste information from many users. The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B’s opinion on a different issue than that of a randomly chosen person.

22
Q

Pearson Correlation

A

Pearson correlation measures the strength and direction of the linear relationship between two continuous variables, giving a value between -1 and 1. A score of 1 means a perfect positive linear correlation, -1 a perfect negative linear correlation, and 0 no linear correlation (the variables may still be related nonlinearly).
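
A minimal sketch of the coefficient as covariance divided by the product of the standard deviations. The function name `pearson` is hypothetical, and the code assumes neither variable is constant (otherwise the denominator is zero).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length samples (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)
```

For x = [-2, -1, 0, 1, 2] and y = x², the result is 0 even though y is completely determined by x, which shows that the coefficient captures only linear association.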

23
Q

Euclidean Distance

A

Euclidean distance is the “straight-line” distance between two points in Euclidean space. In terms of data points, it represents the geometric distance in multidimensional space, calculated using the Pythagorean theorem. It is often used in clustering and classification to determine how similar or dissimilar data points are to each other.
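
The Pythagorean calculation generalizes to any number of dimensions, as in this short sketch (the function name `euclidean_distance` is hypothetical):

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For the points (0, 0) and (3, 4) this gives 5, the hypotenuse of a 3-4-5 right triangle.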