Machine Learning - Supervised Flashcards

1
Q

Classification

A

We are trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

2
Q

Classification: Linear Binary

A

Binary classification problems arise when we seek to separate two sets of data points in R^n, each corresponding to a given class. We seek to separate the two data sets using simple ‘‘boundaries’’, typically hyperplanes. Once the boundary is found, we can use it to predict the class a new point belongs to, by simply checking on which side of the boundary it falls.
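
A minimal sketch of that side-of-the-boundary check in Python; the weight vector w and offset b below are invented for illustration, standing in for a boundary that has already been found:

import numpy as np

# Hypothetical, already-learned hyperplane w . x + b = 0
w = np.array([2.0, -1.0])
b = 0.5

def predict(x):
    # The sign of w . x + b tells us which side of the hyperplane x falls on
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([1.0, 1.0])))   # +1
print(predict(np.array([-1.0, 3.0])))  # -1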

3
Q

Decision Trees

A

A tree of questions to guide an end user to a conclusion based on values from a single vector of data. The classic example is a medical diagnosis based on a set of symptoms for a particular patient. A common problem in data science is to automatically or semi-automatically generate decision trees based on large sets of data coupled to known conclusions. Example algorithms are CART and ID3. (Submitted by Michael Malak)
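
A minimal sketch using scikit-learn (assumed available); the tiny symptom/diagnosis table is invented purely for illustration, and criterion="entropy" picks an ID3-style split criterion on top of scikit-learn's CART-based implementation:

from sklearn.tree import DecisionTreeClassifier

# Columns: fever, cough, fatigue (toy data)
X = [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 1],
     [0, 0, 0]]
y = ["flu", "cold", "flu", "healthy"]

clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(clf.predict([[1, 0, 0]]))  # diagnosis for a new patient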

4
Q

Decision Trees

A

Each node in the tree tests an attribute; decisions are represented by the edges leading away from that node, and leaf nodes represent the final decision for all instances that reach them.

5
Q

Decision Trees: Best Uses

A

6
Q

Decision Trees: Cons

A

1: Complex trees are hard to interpret; 2: Duplication within the same sub-tree is possible

7
Q

Decision Trees: Dealing with missing values

A

1) Have a specific edge for "no value"; 2) track the number of instances that follow each path and send the instance down the most popular edge; 3) give instances weights and split the instance evenly down each possible edge; once the weighted fragments reach the leaf nodes, recombine their predictions using the weights.

8
Q

Decision Trees: Definition

A

Each node in the tree tests a single attribute; decisions are represented by the edges leading away from that node, and leaf nodes represent the final decision.

9
Q

Decision Trees: Example Applications

A

1: Star classification; 2: Medical diagnosis; 3: Credit risk analysis

10
Q

Decision Trees: Flavors

A

CART, ID3

11
Q

Decision Trees: Pros

A

1: Fast; 2: Robust to noise and missing values; 3: Accurate

12
Q

Decision Trees: Pruning

A

Since decision trees are often created by repeatedly splitting the instances until there is no more information gain, it is possible that some splits were made with too little information gain and overfit the model to the training data. The basic idea of pruning is to check child nodes and, if combining them with their parent would result in an increase in entropy below some threshold, merge them back into the parent.
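
A rough sketch of the quantities involved, assuming entropy-based information gain; a split (or an existing pair of child nodes) whose gain falls below the threshold would be collapsed back into the parent:

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    # Reduction in entropy obtained by splitting `parent` into `children`
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]
print(information_gain(parent, children))  # 1.0, a maximally informative split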

13
Q

Decision Trees: Replicated subtree problem

A

Because only a single attribute is tested at a node, two copies of the same subtree may need to be placed in the tree if the attributes used in that subtree were not tested at the root node.

14
Q

Decision Trees: Restriction Bias

A

15
Q

Decision Trees: Testing nominal values

A

If a node has an edge for each possible value of a nominal attribute, then that attribute will not be tested again further down the tree. If a node groups the values into subsets with an edge per subset, then that attribute may be tested again.

16
Q

Decision Trees: Testing numeric values

A

Numeric attributes can be tested for less than, greater than, equal to, or within some range. A test can result in just two edges or multiple edges, and can also have an edge that represents no value. A numeric attribute may be tested multiple times in a single path.

17
Q

Deep Learning

A

Refers to a class of methods that includes neural networks and deep belief nets. Useful for finding a hierarchy of the most significant features, characteristics, and explanatory variables in complex data sets. Particularly useful in unsupervised machine learning of large unlabeled datasets. The goal is to learn multiple layers of abstraction for some data. For example, to recognize images, we might want to first examine the pixels and recognize edges; then examine the edges and recognize contours; examine the contours to find shapes; the shapes to find objects; and so on.

18
Q

Ensemble Learning

A

19
Q

Hidden Layer

A

The middle layer of a three-layer network: it receives the signals sent by the input layer and performs intermediary processing.

20
Q

K-Nearest Neighbors: Cons

A

1: Performs poorly on high-dimensional datasets; 2: Expensive and slow when predicting new instances; 3: Must define a meaningful distance function

21
Q

K-Nearest Neighbors: Definition

A

K-NN is an algorithm that can be used when you have objects that have been classified or labeled and other, similar objects that haven’t been classified or labeled yet, and you want a way to label them automatically.
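
A minimal hand-rolled sketch, assuming Euclidean distance and a majority vote; the toy points are invented for illustration:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Label x by majority vote among its k nearest neighbors
    dists = np.linalg.norm(np.asarray(X_train) - np.asarray(x), axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]

X_train = [[0, 0], [0, 1], [5, 5], [6, 5]]
y_train = ["a", "a", "b", "b"]
print(knn_predict(X_train, y_train, [5, 6]))  # "b"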

22
Q

K-Nearest Neighbors: Example Applications

A

1: Computer security: intrusion detection; 2: Fault detection in semiconductor manufacturing; 3: Video content retrieval; 4: Gene expression

23
Q

K-Nearest Neighbors: Preference Bias

A

Good for distance-based approximations; good for outlier detection

24
Q

K-Nearest Neighbors: Pros

A

1: Simple; 2: Powerful; 3: Lazy, no training involved; 4: Naturally handles multiclass classification and regression

25
Q

K-Nearest Neighbors: Restriction Bias

A

Low-dimensional datasets

26
Q

K-Nearest Neighbors: Type

A

Supervised learning, instance based

27
Q

KNN

A

K-NN is an algorithm that can be used when you have a bunch of objects that have been classified or labeled in some way, and other similar objects that haven’t been classified or labeled yet, and you want a way to automatically label them.

28
Q

Linear Regression

A

Trying to fit a linear continuous function to the data. Univariate or Multivariate.

29
Q

Linear Regression: Cons

A

1: Unable to model complex relationships, 2: Unable to capture nonlinear relationships without first transforming the inputs

30
Q

Linear Regression: Definition

A

Trying to fit a linear continuous function to the data to predict results. Can be univariate or multivariate.
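
A minimal univariate sketch using ordinary least squares in NumPy; the toy data is invented for illustration:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

A = np.column_stack([np.ones_like(x), x])   # design matrix with an intercept column
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)  # approximately [1.0, 2.0], i.e. y = 1 + 2x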

31
Q

Linear Regression: Example Applications

A

1: Fitting a line

32
Q

Linear Regression: Preference Bias

A

1: Prefers continuous variables; 2: A first look at a dataset; 3: Numerical data with lots of features

33
Q

Linear Regression: Pros

A

1: Very fast; once fit, prediction takes constant time per instance; 2: Easy to understand the model; 3: Less prone to overfitting

34
Q

Linear Regression: Restriction Bias

A

Low restriction on problems it can solve

35
Q

Linear Regression: Type

A

Supervised learning, regression class

36
Q

Logistic Regression

A

A kind of regression analysis often used when the dependent variable is dichotomous and scored 0 or 1. It is usually used for predicting whether something will happen or not, such as graduation, business failure, or heart attack: anything that can be expressed as event/non-event. Independent variables may be categorical or continuous in logistic regression analysis.
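
A minimal sketch with scikit-learn (assumed available); the hours-studied / passed-exam data is invented for illustration:

from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [4], [5], [6]]   # hours studied
y = [0, 0, 0, 1, 1, 1]               # 0 = non-event (fail), 1 = event (pass)

model = LogisticRegression().fit(X, y)
print(model.predict([[3.5]]))        # predicted class
print(model.predict_proba([[3.5]]))  # [P(non-event), P(event)]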

37
Q

Multiclass-Classification: One-vs-all

A

Multiclass classification. Reduces a classification problem with multiple classes to a set of simple binary classification problems by training one classifier per class (that class versus all the others). To determine the final prediction, we take the class whose classifier outputs the maximum predicted value.
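
A small sketch of that reduction, assuming scikit-learn's LogisticRegression as the underlying binary classifier; the one-feature, three-class data is invented for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 1, 2, 2])   # three classes

# One binary classifier per class: class k vs. everything else
classifiers = [LogisticRegression().fit(X, (y == k).astype(int)) for k in range(3)]

def predict(x):
    scores = [clf.predict_proba([x])[0][1] for clf in classifiers]
    return int(np.argmax(scores))   # take the class with the highest predicted value

print(predict([4.5]))  # 2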

38
Q

Naive Bayes

A

39
Q

Naive Bayes: Cons

A

40
Q

Naive Bayes: Definition

A

Given its simplicity and the assumption that the independent variables are statistically independent, Naive Bayes models are effective classification tools that are easy to use and interpret. Naive Bayes is particularly appropriate when the dimensionality of the independent space is high. For these reasons, Naive Bayes can often outperform more sophisticated classification methods. A variety of methods exist for modeling the conditional distributions of the inputs, including normal, lognormal, gamma, and Poisson.
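
A minimal sketch using scikit-learn's GaussianNB (assumed available), which models each input's conditional distribution as a normal distribution, one of the choices mentioned above; the toy data is invented for illustration:

import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 20.0], [1.2, 22.0], [5.0, 80.0], [5.5, 75.0]])
y = np.array(["low", "low", "high", "high"])

model = GaussianNB().fit(X, y)
print(model.predict([[1.1, 21.0]]))  # "low"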

41
Q

Naive Bayes: Example Applications

A

42
Q

Naive Bayes: Flavors

A

A variety of methods exist for modeling the conditional distributions of the inputs including normal, lognormal, gamma, and Poisson.

43
Q

Naive Bayes: Preference Bias

A

Works on problems where the inputs are independent of each other

44
Q

Naive Bayes: Pros

A

1: Easy to use and interpret; 2: Works well with high dimensional problems

45
Q

Naive Bayes: Restriction Bias

A

Prefers problems where the probability will always be greater than zero for each class

46
Q

Naive Bayes: Type

A

Supervised learning; used for classification; probabilistic approach

47
Q

Neural Networks

A

In neural networks, the process of computing each subsequent layer of the network; each layer depends on the calculations done in the layer before it.
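
A rough NumPy sketch of that layer-by-layer computation (a forward pass); the weights are random placeholders, purely for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # input (2 units) -> hidden (3 units)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # hidden (3 units) -> output (1 unit)

x = np.array([0.5, -1.0])
hidden = sigmoid(W1 @ x + b1)       # this layer depends on the input layer
output = sigmoid(W2 @ hidden + b2)  # and this one depends on the hidden layer
print(output)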

48
Q

Neural Networks

A

Interconnected neural cells. With experience, networks can learn, as feedback strengthens or inhibits connections that produce certain results. Computer simulations of neural networks show analogous learning.

49
Q

Neural Networks: Cons

A

1: Prone to overfitting; 2: Long training time; 3: Requires significant computing power for large datasets; 4: Model is essentially unreadable; 5: Works best with “homogeneous” data where features all have similar meanings

50
Q

Neural Networks: Definition

A

With experience, networks can learn, as feedback strengthens or inhibits connections that produce certain results. Each layer depends on the calculations done on the layer before it.

51
Q

Neural Networks: Example Applications

A

1: Images; 2: Video; 3: “Human-intelligence” type tasks like driving or flying; 4: Robotics

52
Q

Neural Networks: Flavors

A

Deep learning

53
Q

Neural Networks: Preference Bias

A

Prefers binary inputs

54
Q

Neural Networks: Pros

A

1: Extremely powerful, can model even very complex relationships; 2: No need to understand the underlying data; 3: Almost works by “magic”

55
Q

Neural Networks: Random Initialization

A

Symmetry breaking for neural networks is achieved by initializing the weights to small random values rather than to zero (or any other identical value), so that different hidden units start out computing different functions and can learn different features.
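
A tiny NumPy sketch of that initialization; epsilon is an arbitrary small constant chosen for illustration:

import numpy as np

rng = np.random.default_rng(42)
epsilon = 0.01
# Small random weights break symmetry; all-zero (or all-equal) weights would
# make every hidden unit compute, and keep computing, exactly the same thing.
W1 = rng.uniform(-epsilon, epsilon, size=(3, 2))
print(W1)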

56
Q

Neural Networks: Restriction Bias

A

Little restriction bias

57
Q

Neural Networks: Type

A

Supervised learning; nonlinear functional approximation

58
Q

Overview of algorithms

A

59
Q

Probabilistic Graphical Model (a.k.a. Graphical Model)

A

Ways of encoding the structure (independencies) of a probability distribution into a picture. The two main types of graphical models are directed graphical models and undirected graphical models, probability distributions represented by directed and undirected graphs respectively. Each node in the graph represents a random variable, and a connection between two nodes indicates a possible dependence between the random variables. So, for example, a fully disconnected graph would represent a fully independent set of random variables, meaning the distribution could be fully factored as P(x,y,z,…)=P(x)P(y)P(z)… Note that the graphs represent structures, not probabilities themselves.

60
Q

Random Forest

A

61
Q

Random Forests

A

An ensemble of decision tree classifiers that produces a “forest of trees”, yielding highly accurate models. Variable importance can be assessed by iteratively randomizing (permuting) one input variable at a time to learn whether this randomization actually produces a less accurate classifier; if it doesn’t, that variable contributes little and can be ousted from the model.
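
A small sketch with scikit-learn (assumed available): a random forest fit on a built-in dataset, plus permutation importance in the spirit of the randomize-one-variable idea above:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

result = permutation_importance(forest, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)  # variables whose shuffling barely hurts accuracy matter little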

62
Q

Recommendation Systems: Collaborative filtering

A

Based on past user behavior. Each user’s history of behaviors (ratings, purchases, or viewing history) is used to make associations between users with similar behavior and between items of interest to the same users. Example: Netflix. Methods: 1. Neighborhood-based methods, based on user-user or item-item distances; 2. Latent factor or reduced-dimension models, which automatically discover a small number of descriptive factors for users and items; 3. Low-rank matrix factorization is the best-known example of reduced-dimension models and is among the most flexible and successful methods underlying recommendation systems. There are many variants of matrix factorization, including probabilistic and Bayesian versions. Restricted Boltzmann machines, a type of deep learning neural network, are another state-of-the-art approach.

63
Q

Recommendation Systems: Collaborative filtering: Matrix Factorization

A

There are many variants of matrix factorization, including probabilistic and Bayesian versions. Restricted Boltzmann machines, a type of deep learning neural network, are another state-of-the-art approach.
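
A toy sketch of low-rank matrix factorization by gradient descent on the observed entries of a ratings matrix; the matrix and hyperparameters below are invented for illustration, and 0 marks an unknown rating:

import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
k, lr, reg = 2, 0.01, 0.02
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factors

for _ in range(2000):
    E = (R - U @ V.T) * mask          # error on observed ratings only
    U += lr * (E @ V - reg * U)
    V += lr * (E.T @ U - reg * V)

print(np.round(U @ V.T, 1))  # predicted ratings, including the missing ones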

64
Q

Recommendation Systems: Companies using them

A

Retailers: Amazon, Target; Movies + music sites: Netflix, last.fm, Pandora; Social networks: Facebook, Twitter; Grocery stores: Tesco; Content publishers and ad networks: Yahoo!, Google; CRM: next-best offer in marketing decision making

65
Q

Recommendation Systems: Content-based filtering

A

Gathers information (e.g., demographics, genre, keywords, preferences, survey responses) to generate a profile for each user or item. Users are matched to items based on their profiles. Example: Pandora’s Music Genome Project.

66
Q

Regression Analysis

A

We are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function.

67
Q

Regression Trees

A

A single regression equation is much smaller and less complex than a regression tree, but also tends to be much less accurate.

68
Q

Regression Trees

A

Decision trees that predict numeric quantities. The leaf nodes of these trees hold a numeric quantity instead of a class; this quantity is often determined by taking the average of all training-set values to which the leaf node applies.

69
Q

Sigmoid Function

A

An S-shaped mathematical curve, often used to describe the activation function of a neuron over time.
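
The standard example is the logistic function sigma(x) = 1 / (1 + e^(-x)), which maps any real input smoothly into the interval (0, 1).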

70
Q

Stepwise Regression

A

Variable selection process for multivariate regression. In forward stepwise selection, a seed variable is selected and each additional variable is entered into the model, but only kept if it significantly improves goodness of fit (as measured by increases in R^2). Backward selection starts with all variables and removes them one by one until removing an additional one decreases R^2 by a non-trivial amount. Two deficiencies of this method are that the seed chosen disproportionately impacts which variables are kept, and that the decision is made using R^2, not adjusted R^2. (Submitted by Santiago Perez)
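
A rough sketch of greedy forward selection with scikit-learn (assumed available); it scores candidates by plain R^2, as the card describes, and the synthetic data is invented for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

def forward_stepwise(X, y, threshold=0.01):
    # Greedily add the feature that most improves R^2; stop when the
    # improvement falls below `threshold`.
    remaining, selected, best_r2 = list(range(X.shape[1])), [], 0.0
    while remaining:
        scores = [(LinearRegression().fit(X[:, selected + [j]], y)
                   .score(X[:, selected + [j]], y), j) for j in remaining]
        r2, j = max(scores)
        if r2 - best_r2 < threshold:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=100)
print(forward_stepwise(X, y))  # expected to keep features 0 and 2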

71
Q

Supervised Learning

A

We are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output. Categorized into “regression” and “classification” problems.

72
Q

Support Vector Machine

A

Can map information from the original data representation (the input space), together with information about weights and correlative relationships, into another representation (the feature space).

73
Q

Support Vector Machine

A

Divides an instance space by finding the line that is as far as possible from both classes. This line is called the “maximum-margin hyperplane”.

74
Q

Support Vector Machine

A

Powerful Jedi machine learning classifier. Among classification algorithms used in supervised machine learning, SVM usually produces the most accurate classifications. Read more about SVM in this article “The Importance of Location in Real Estate, Weather, and Machine Learning.”

75
Q

Support Vector Machine

A

When determining the maximum-margin hyperplane for a support vector machine, only the points near the hyperplane are important. These points near the boundary are called the support vectors.

76
Q

Support Vector Machines: Cons

A

1: Need to select a good kernel function; 2: Model parameters are difficult to interpret; 3: Sometimes numerical stability problems; 4: Requires significant memory and processing power

77
Q

Support Vector Machines: Definition

A

Divides an instance space by finding the line that is as far as possible from both classes. This line is called the “maximum-margin hyperplane”. Only the points near the hyperplane are important. These points near the boundary are called the support vectors.
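
A minimal scikit-learn sketch (assumed available): fit a linear SVM on toy data and inspect the support vectors that define the maximum-margin hyperplane:

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [4, 4], [5, 5]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)                   # only the points near the boundary
print(clf.predict([[1.5, 1.5], [4.5, 4.5]]))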

78
Q

Support Vector Machines: Example Applications

A

1: Text classification; 2: Image classification; 3: Handwriting recognition

79
Q

Support Vector Machine: Kernels

A

Since support vector machines use dot products (just like linear classifiers) when determining the hyperplane, they can be turned into nonlinear classifiers by replacing the dot product with a kernel such as the radial basis function.
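
A small sketch of the substitution itself, comparing the plain dot product with a radial basis function kernel; gamma is an arbitrary value chosen for illustration:

import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def rbf_kernel(x, z, gamma=0.5):
    # Radial basis function: swapping this in for the dot product yields a nonlinear classifier
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

print(linear_kernel([1, 2], [2, 1]))  # 4
print(rbf_kernel([1, 2], [2, 1]))     # exp(-1), about 0.37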

80
Q

Support Vector Machine: libsvm

A

Open-source library for SVMs written in C++ (with a Java version as well). It trains an SVM model, makes predictions, and tests predictions within a dataset, with support for kernel methods such as the radial basis function.

81
Q

Support Vector Machines: Preference Bias

A

Works where there is a definite distinction between two classifications

82
Q

Support Vector Machines: Pros

A

1: Can model complex, nonlinear relationships; 2: Robust to noise (because they maximize margins)

83
Q

Support Vector Machines: Restriction Bias

A

Prefers binary classification problems

84
Q

Support Vector Machines: Type

A

Supervised learning for defining a decision boundary