Machine Learning - Supervised Flashcards

Question

K-Nearest Neighbors: Restriction Bias

Answer 1

Low-dimensional datasets

Answer 2

Supervised learning, instance based

Answer 3

K-NN is an algorithm that can be used when you have a bunch of objects that have been classified or labeled in some way, and other similar objects that haven't gotten classified or labeled yet, and you want a way to automatically label them.
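The labeling idea above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy points are hypothetical, not from the card.

```python
import math
from collections import Counter

def knn_predict(labeled, query, k=3):
    """Label `query` by majority vote among its k nearest labeled points."""
    # Sort the (point, label) pairs by Euclidean distance to the query.
    by_distance = sorted(labeled, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters of already-labeled 2-D points.
labeled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
           ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

print(knn_predict(labeled, (1.1, 1.0)))
print(knn_predict(labeled, (4.9, 5.0)))
```

Because every prediction scans all labeled points, this brute-force form is only practical for small datasets.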

Answer 4

Trying to fit a linear continuous function to the data. Univariate or Multivariate.

Answer 5

1: Unable to model complex relationships, 2: Unable to capture nonlinear relationships without first transforming the inputs

Answer 6

Trying to fit a linear continuous function to the data to predict results. Can be univariate or multivariate.
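For the univariate case, fitting the line can be sketched with the ordinary least-squares closed form. The helper name `fit_line` and the sample data are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```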

Answer 7

1: Fitting a line

Answer 8

1: Prefers continuous variables; 2: A first look at a dataset; 3: Numerical data with lots of features

Answer 9

1: Very fast - prediction runs in constant time with respect to the size of the training set, 2: Easy to understand the model, 3: Less prone to overfitting

Answer 10

Low restriction on problems it can solve

Answer 11

Supervised learning, regression class

Answer 12

A kind of regression analysis often used when the dependent variable is dichotomous and scored 0 or 1. It is usually used for predicting whether something will happen or not, such as graduation, business failure, or heart attack: anything that can be expressed as event/non-event. Independent variables may be categorical or continuous in logistic regression analysis.
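A rough sketch of the event/non-event idea with one continuous independent variable follows. It assumes plain batch gradient descent on the log-loss; the names `fit_logistic`, `lr`, and `steps`, and the toy data, are hypothetical choices rather than anything prescribed by the card.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit weight w and bias b so that sigmoid(w*x + b) approximates P(y=1 | x)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: the event (1) happens only for larger x.
xs = [0.5, 1.0, 1.5, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 1.0 + b)
p_high = sigmoid(w * 4.0 + b)
print(p_low, p_high)
```

The predicted probability is thresholded (commonly at 0.5) to decide event versus non-event.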

Answer 13

Multiclass classification. Reducing a classification problem with multiple classes to be predicted to a set of simple classification problems by looking at one class at a time. Then, to determine the final prediction, we take the max of all the predicted values.
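The "take the max" step can be sketched as follows, assuming each class already has its own scoring function. The function `one_vs_all_predict` and the three hand-written scorers are hypothetical stand-ins for trained per-class classifiers.

```python
def one_vs_all_predict(scorers, x):
    """Score x with every per-class scorer and return the highest-scoring class."""
    scores = {label: scorer(x) for label, scorer in scorers.items()}
    return max(scores, key=scores.get)

# Hypothetical per-class scorers; in practice each would be a trained
# binary classifier answering "this class versus everything else".
scorers = {
    "small": lambda x: 1.0 - x,
    "medium": lambda x: 1.0 - abs(x - 5.0),
    "large": lambda x: x - 9.0,
}

print(one_vs_all_predict(scorers, 0.5))
print(one_vs_all_predict(scorers, 5.2))
print(one_vs_all_predict(scorers, 12.0))
```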

Answer 14

Given their simplicity and the assumption that the independent variables are statistically independent, Naive Bayes models are effective classification tools that are easy to use and interpret. Naive Bayes is particularly appropriate when the dimensionality of the independent space is high. For the reasons given above, Naive Bayes can often outperform other more sophisticated classification methods. A variety of methods exist for modeling the conditional distributions of the inputs including normal, lognormal, gamma, and Poisson.
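As a sketch of the idea, the code below models each input with a normal conditional distribution (one of the options listed above) and multiplies the per-input likelihoods, which is exactly where the independence assumption enters. The function names and toy data are hypothetical, and a small constant is added to each variance purely to avoid division by zero.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature (mean, variance) for every class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        # Independence assumption: per-feature log densities simply add up.
        for value, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Hypothetical two-feature training data.
rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 5.0), (4.2, 4.8), (3.8, 5.2)]
labels = ["x", "x", "x", "y", "y", "y"]
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, (1.1, 2.1)))
print(predict_gaussian_nb(model, (4.1, 4.9)))
```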

Answer 15

A variety of methods exist for modeling the conditional distributions of the inputs including normal, lognormal, gamma, and Poisson.

Answer 16

Works on problems where the inputs are independent from each other

Answer 17

1: Easy to use and interpret; 2: Works well with high dimensional problems

Answer 18

Prefers problems where the probability will always be greater than zero for each class

Answer 19

Supervised learning; used for classification; probabilistic approach

Answer 20

In neural networks, the process of calculating the subsequent layers of the network. Each layer depends on the calculations done on the layer before it.
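The layer-by-layer dependency can be sketched as a loop in which each layer's output becomes the next layer's input. This assumes fully connected layers with a tanh activation; the function name `forward` and the weights are hypothetical and untrained.

```python
import math

def forward(layers, inputs):
    """Propagate `inputs` through a list of (weights, biases) layers."""
    activations = inputs
    for weights, biases in layers:
        # Each neuron takes a weighted sum of the previous layer's activations.
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.05]),
]
output = forward(layers, [1.0, 2.0])
print(output)
```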

Answer 21

Interconnected neural cells. With experience, networks can learn, as feedback strengthens or inhibits connections that produce certain results. Computer simulations of neural networks show analogous learning.

Answer 22

1: Prone to overfitting; 2: Long training time; 3: Requires significant computing power for large datasets; 4: Model is essentially unreadable; 5: Works best with "homogeneous" data where features all have similar meanings

Answer 23

With experience, networks can learn, as feedback strengthens or inhibits connections that produce certain results. Each layer depends on the calculations done on the layer before it.

Answer 24

1: Images; 2: Video; 3: "Human-intelligence" type tasks like driving or flying; 4: Robotics

Answer 25

Deep learning

Answer 26

Prefers binary inputs

Answer 27

1: Extremely powerful, can model even very complex relationships; 2: No need to understand the underlying data; 3: Almost works by "magic"

Answer 28

Symmetry breaking for neural networks is achieved by:

Answer 29

Little restriction bias

Answer 30

Supervised learning; nonlinear functional approximation

Answer 31

Ways of encoding the structure (independencies) of a probability distribution into a picture. The two main types of graphical models are directed graphical models and undirected graphical models, probability distributions represented by directed and undirected graphs respectively. Each node in the graph represents a random variable, and a connection between two nodes indicates a possible dependence between the random variables. So, for example, a fully disconnected graph would represent a fully independent set of random variables, meaning the distribution could be fully factored as P(x,y,z,...)=P(x)P(y)P(z)... Note that the graphs represent structures, not probabilities themselves.
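The fully disconnected case can be checked numerically. The sketch below assumes three binary variables with made-up marginals, builds the joint as P(x)P(y)P(z), and confirms that it sums to one and that summing out y and z recovers P(x).

```python
from itertools import product

# Hypothetical marginals for three independent binary variables.
p_x = {0: 0.3, 1: 0.7}
p_y = {0: 0.6, 1: 0.4}
p_z = {0: 0.5, 1: 0.5}

# Fully disconnected graph: the joint factors into a product of marginals.
joint = {
    (x, y, z): p_x[x] * p_y[y] * p_z[z]
    for x, y, z in product(p_x, p_y, p_z)
}

total = sum(joint.values())
# Marginalizing y and z out of the joint should give back P(x = 1).
marginal_x1 = sum(p for (x, _, _), p in joint.items() if x == 1)
print(total, marginal_x1)
```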

Answer 32

A decision tree classifier that produces a "forest of trees", yielding highly accurate models, essentially by iteratively randomizing one input variable at a time in order to learn if this randomization process actually produces a less accurate classifier. If it doesn't, then that variable is ousted from the model.

Answer 33

Based on past user behavior. Each user's history of behaviors (ratings, purchases, or viewing history) is used to make associations between users with similar behavior and between items of interest to the same users. Example: Netflix. Methods: 1. Neighborhood-based methods, based on user-user or item-item distances; 2. Latent factor or reduced-dimension models, which automatically discover a small number of descriptive factors for users and items; 3. Low-rank matrix factorization is the best-known example of reduced-dimension models and is among the most flexible and successful methods underlying recommendation systems. There are many variants of matrix factorization, including probabilistic and Bayesian versions. Restricted Boltzmann machines, a type of deep learning neural network, are another state-of-the-art approach.
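A neighborhood-based method using user-user distances might look like the sketch below. It assumes cosine similarity computed only over items both users have rated, which is a simplification; the user names, item ids, and ratings are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def most_similar_user(ratings, target):
    """Return the other user whose rating history is closest to the target's."""
    others = (user for user in ratings if user != target)
    return max(others, key=lambda u: cosine_similarity(ratings[target], ratings[u]))

# Hypothetical rating histories (item id -> rating on a 1-5 scale).
ratings = {
    "alice": {"m1": 5, "m2": 1, "m3": 4},
    "bob": {"m1": 4, "m2": 1, "m3": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
print(most_similar_user(ratings, "alice"))
```

Items that the most similar user rated highly, and that the target has not seen, become candidate recommendations.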

Answer 34

....Probabilistic and Bayesian versions, Restricted Boltzmann machines, a type of deep learning neural network, are another state-of-the-art approach.

Answer 35

Retailers: Amazon, Target; Movies + Music Sites: Netflix, last.fm, Pandora; Social networks: Facebook, Twitter; Grocery stores: Tesco; Content publishers: Ad networks: Yahoo!, Google; CRM: Next-best offer in marketing decision making

Answer 36

Gathers information (e.g., demographics, genre, keywords, preferences, survey responses) to generate a profile for each user or item. Users are matched to items based on their profiles. Example: Pandora's Music Genome Project.
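Profile matching can be sketched with keyword sets. This assumes profiles are plain sets of tags compared by Jaccard overlap, which is only one of many possible matching rules; the tags and track names are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two keyword sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_item(user_profile, item_profiles):
    """Return the item whose profile overlaps most with the user's profile."""
    return max(item_profiles, key=lambda item: jaccard(user_profile, item_profiles[item]))

# Hypothetical profiles built from keywords and stated preferences.
user_profile = {"jazz", "piano", "instrumental"}
item_profiles = {
    "track_a": {"jazz", "piano", "live"},
    "track_b": {"metal", "guitar"},
    "track_c": {"jazz", "vocal"},
}
print(best_item(user_profile, item_profiles))
```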

Answer 37

We are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function.

Answer 38

A single regression equation is much smaller and less complex than a regression tree, but also tends to be much less accurate.

Answer 39

Decision trees which predict numeric quantities. The leaf nodes of these trees have a numeric quantity instead of a class. This numeric quantity is often decided by taking the average of all training set values to which the leaf node applies.
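The leaf-averaging rule can be sketched with the smallest possible tree: one split and two leaves. The split threshold is supplied by hand here as an assumption; a real regression tree would search for it. The function names and data are hypothetical.

```python
def fit_stump(xs, ys, threshold):
    """Build a one-split tree whose two leaves store the mean training target."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return threshold, sum(left) / len(left), sum(right) / len(right)

def predict_stump(stump, x):
    """Route x to a leaf and return that leaf's stored average."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Hypothetical training data with two clearly separated groups.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(xs, ys, threshold=5)
print(predict_stump(stump, 2), predict_stump(stump, 11))
```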

Answer 40

An S-shaped mathematical curve often used to describe the activation function of a neuron over time.

Answer 41

Variable selection process for multivariate regression. In forward stepwise selection, a seed variable is selected and each additional variable is added to the model, but only kept if it significantly improves goodness of fit (as measured by increases in R^2). Backward selection starts with all variables and removes them one by one until removing an additional one decreases R^2 by a non-trivial amount. Two deficiencies of this method are that the seed chosen disproportionately impacts which variables are kept, and that the decision is made using R^2, not Adjusted R^2. (submitted by Santiago Perez)

Answer 42

We are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output. Categorized into "regression" and "classification" problems.

Answer 43

Can extrapolate information from one dimensional data (input space) and some information about weights and correlative relationships to another dimension (feature space).

Answer 44

Divides an instance space by finding the line that is as far as possible from both classes. This line is called the "maximum-margin hyperplane".

Answer 45

Powerful Jedi machine learning classifier. Among classification algorithms used in supervised machine learning, SVM often produces some of the most accurate classifications. Read more about SVM in the article "The Importance of Location in Real Estate, Weather, and Machine Learning."

Answer 46

When determining the maximum-margin hyperplane for a support vector machine, only the points near the hyperplane are important. These points near the boundary are called the support vectors.

Answer 47

1: Need to select a good kernel function; 2: Model parameters are difficult to interpret; 3: Sometimes numerical stability problems; 4: Requires significant memory and processing power

Answer 48

Divides an instance space by finding the line that is as far as possible from both classes. This line is called the "maximum-margin hyperplane". Only the points near the hyperplane are important. These points near the boundary are called the support vectors.
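To make "only the points near the hyperplane are important" concrete, the sketch below measures each point's distance to a given hyperplane and picks out the closest ones. The hyperplane `w`, `b` is assumed by hand here rather than learned, and the function names and points are hypothetical.

```python
import math

def signed_distance(w, b, point):
    """Signed distance from `point` to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

def support_vectors(w, b, points, tol=1e-9):
    """Return the margin and the points lying closest to the hyperplane."""
    distances = [abs(signed_distance(w, b, p)) for p in points]
    margin = min(distances)
    closest = [p for p, d in zip(points, distances) if d - margin <= tol]
    return margin, closest

# Hypothetical 2-D data: two points per class on either side of x + y = 4.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
w, b = (1.0, 1.0), -4.0  # assumed separating hyperplane, not a trained one
margin, vectors = support_vectors(w, b, points)
print(margin, vectors)
```

Moving or deleting the two outer points would leave this boundary unchanged, which is why only the support vectors matter.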

Answer 49

1: Text classification; 2: Image classification; 3: Handwriting recognition

Answer 50

Since support vector machines use dot-products (just like linear classifiers) when determining the hyperplane, they can be turned into a non-linear classifier by replacing the dot-product with a kernel such as the radial-basis function.
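The swap from dot-product to kernel can be sketched by passing the similarity function in as a parameter. The coefficients and support vectors below are invented for illustration and are not the result of training; `decision_value`, `gamma`, and `bias` are hypothetical names.

```python
import math

def dot(a, b):
    """Plain dot-product, as used by a linear classifier."""
    return sum(x * y for x, y in zip(a, b))

def rbf_kernel(a, b, gamma=1.0):
    """Radial-basis function kernel: similarity decays with squared distance."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

def decision_value(support, x, kernel, bias=0.0):
    """Weighted sum of kernel similarities between x and each support vector."""
    return sum(alpha * label * kernel(vec, x) for alpha, label, vec in support) + bias

# Hypothetical (coefficient, label, vector) triples standing in for a trained model.
support = [(1.0, +1, (0.0, 0.0)), (1.0, -1, (3.0, 3.0))]

near_positive = decision_value(support, (0.5, 0.5), rbf_kernel)
near_negative = decision_value(support, (2.5, 2.5), rbf_kernel)
print(near_positive, near_negative)
```

Passing `dot` instead of `rbf_kernel` to the same function gives the linear version, which shows that the kernel is the only part that changes.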

Answer 51

Open source library for SVMs written in C++ (with a Java version as well). Trains an SVM model, makes predictions, and tests predictions within a dataset, with support for kernel methods such as the radial-basis function.

Answer 52

Works where there is a definite distinction between two classifications

Answer 53

1: Can model complex, nonlinear relationships; 2: Robust to noise (because they maximize margins)

Answer 54

Prefers binary classification problems

Answer 55

Supervised learning for defining a decision boundary


(84 cards)