Classification

Predicting a discrete output. In other words, the task is to map input variables into discrete categories.

Classification: Linear Binary

Binary classification problems arise when we seek to separate two sets of data points in R^n, each corresponding to a given class. We seek to separate the two data sets using simple "boundaries", typically hyperplanes. Once the boundary is found, we can use it to predict the class of a new point by checking on which side of the boundary it falls.
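A minimal sketch of the "which side of the boundary" check, assuming the hyperplane has already been found and is given by a weight vector w and an offset b (both hypothetical values chosen for illustration, not learned from data):

```python
def predict_side(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 the point x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Points exactly on the boundary are assigned to the +1 class by convention here.
    return 1 if score >= 0 else -1


# Hypothetical boundary in R^2: the line x1 - x2 = 0.
w = [1.0, -1.0]
b = 0.0

print(predict_side(w, b, [2.0, 1.0]))  # score = 2 - 1 = 1, so class +1
print(predict_side(w, b, [0.0, 3.0]))  # score = 0 - 3 = -3, so class -1
```

How w and b are found (perceptron, logistic regression, SVM, and so on) is a separate question that this sketch does not address.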

Decision Trees

A tree of questions to guide an end user to a conclusion based on values from a single vector of data. The classic example is a medical diagnosis based on a set of symptoms for a particular patient. A common problem in data science is to automatically or semi-automatically generate decision trees based on large sets of data coupled to known conclusions. Example algorithms are CART and ID3. (Submitted by Michael Malak)

Decision Trees

Each node in the tree tests an attribute. Decisions are represented by the edges leading away from that node, and leaf nodes represent the final decision for all instances that reach them.

Decision Trees: Best Uses

...

Decision Trees: Cons

1: Complex trees are hard to interpret; 2: Duplication within the same sub-tree is possible

Decision Trees: Dealing with missing values

1) Have a specific edge for "no value"; 2) track the number of instances that follow each edge and send an instance with a missing value down the most popular edge; 3) give instances weights and split an instance with a missing value evenly down each possible edge. Once its parts have reached the leaf nodes, recombine them using the weights
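The third strategy can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the tree is a hypothetical nested dict, leaves are plain class labels, and a missing value is represented by the attribute being absent from the instance.

```python
from collections import defaultdict


def classify_weighted(node, instance, weight=1.0, totals=None):
    """Accumulate the weight that reaches each leaf label, splitting evenly on missing values."""
    if totals is None:
        totals = defaultdict(float)
    if not isinstance(node, dict):
        # Leaf node: credit this label with whatever weight arrived here.
        totals[node] += weight
        return totals
    value = instance.get(node["attribute"])
    if value is None:
        # Missing value: split the instance evenly down every edge.
        share = weight / len(node["edges"])
        for child in node["edges"].values():
            classify_weighted(child, instance, share, totals)
    else:
        classify_weighted(node["edges"][value], instance, weight, totals)
    return totals


# Hypothetical one-level tree used only for illustration.
tree = {"attribute": "outlook",
        "edges": {"sunny": "no", "overcast": "yes", "rainy": "yes"}}

totals = classify_weighted(tree, {})  # "outlook" is missing
prediction = max(totals, key=totals.get)
print(dict(totals), prediction)
```

The recombination step here is a simple sum of weights per label followed by picking the heaviest label; other ways of recombining are possible.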

Decision Trees: Definition

Each node in the tree tests a single attribute. Decisions are represented by the edges leading away from that node, and leaf nodes represent the final decision.
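A small sketch of this structure, assuming a hand-written tree stored as nested dicts. The attribute names and labels are invented for illustration and carry no medical meaning:

```python
# Internal nodes are dicts that name the attribute they test;
# each edge maps an attribute value to a child; leaves are plain labels.
tree = {
    "attribute": "fever",
    "edges": {
        "yes": {
            "attribute": "cough",
            "edges": {"yes": "flu", "no": "other infection"},
        },
        "no": "healthy",
    },
}


def classify(node, instance):
    """Follow one edge per tested attribute until a leaf label is reached."""
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["edges"][value]
    return node


print(classify(tree, {"fever": "yes", "cough": "no"}))
print(classify(tree, {"fever": "no", "cough": "yes"}))
```

Algorithms such as CART and ID3 build a tree like this from data; the sketch covers only prediction with a tree that already exists.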

Decision Trees: Example Applications

1: Star classification; 2: Medical diagnosis; 3: Credit risk analysis

Decision Trees: Flavors

CART, ID3

Decision Trees: Pros

1: Fast; 2: Robust to noise and missing values; 3: Accurate

Decision Trees: Pruning

Since decision trees are often created by repeatedly splitting the instances until there is no more information gain, some splits may have been made with too little information gain, which overfits the model to the training data. The basic idea is to check child nodes to see whether combining them with their parent would increase entropy by less than some threshold.
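The entropy check can be sketched like this, assuming each child node is represented simply by the list of class labels that reached it, and that the threshold is a value chosen by the user (both are assumptions for illustration):

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def should_prune(children, threshold):
    """Return True if merging the children into their parent raises entropy by less than threshold.

    children: non-empty list of label lists, one per child node.
    """
    merged = [label for child in children for label in child]
    total = len(merged)
    weighted_child_entropy = sum(len(c) / total * entropy(c) for c in children)
    # This difference equals the information gain of the split being undone.
    increase = entropy(merged) - weighted_child_entropy
    return increase < threshold


# A split that separates the classes perfectly is worth keeping.
print(should_prune([["a", "a"], ["b", "b"]], threshold=0.1))
# A split whose children look just like the parent can be pruned.
print(should_prune([["a", "b"], ["a", "b"]], threshold=0.1))
```

Practical pruning schemes usually judge splits on held-out data or with a complexity penalty; this sketch only shows the entropy comparison described above.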

Decision Trees: Replicated subtree problem

Because only a single attribute is tested at a node, two copies of the same subtree may need to be placed in a tree if the attributes from that subtree were not tested in the root node.

Decision Trees: Restriction Bias

...

Decision Trees: Testing nominal values

If a node has an edge for each possible value of a nominal attribute, then that attribute will not be tested again further down the tree. If a node groups the values into subsets with one edge per subset, then that attribute may be tested again.

Decision Trees: Testing numeric values

Numeric attributes can be tested for less than, greater than, equal to, or within some range. A test can result in just two edges or in multiple edges, and can also have an edge that represents no value. A numeric attribute may be tested multiple times in a single path.

Deep Learning

Refers to a class of methods that includes neural networks and deep belief nets. Useful for finding a hierarchy of the most significant features, characteristics, and explanatory variables in complex data sets. Particularly useful in unsupervised machine learning of large unlabeled datasets. The goal is to learn multiple layers of abstraction for some data. For example, to recognize images, we might want to first examine the pixels and recognize edges; then examine the edges and recognize contours; examine the contours to find shapes; the shapes to find objects; and so on.

Ensemble Learning

...

Hidden Layer

The second layer of a three-layer network. The input layer sends its signals to this layer, which performs intermediary processing.
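A minimal sketch of a forward pass through a three-layer network, showing where the hidden layer sits. The sigmoid activation, the layer sizes, and the weight values are assumptions made for illustration; the definition above does not specify them.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(x, hidden_w, hidden_b)   # intermediary processing
    output = layer(hidden, out_w, out_b)    # final signals
    return hidden, output


# Hypothetical network: 2 inputs, 3 hidden units, 1 output unit.
hidden_w = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
hidden_b = [0.0, -1.0, 0.5]
out_w = [[1.0, -1.0, 0.5]]
out_b = [0.0]

hidden, output = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(hidden, output)
```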

K-Nearest Neighbors: Cons

1: Performs poorly on high-dimensionality datasets; 2: Expensive and slow to predict new instances; 3: Must define a meaningful distance function

K-Nearest Neighbors: Definition

K-NN is an algorithm that can be used when you have objects that have been classified or labeled and other similar objects that haven't been classified or labeled yet, and you want a way to automatically label them.
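A minimal sketch of the labeling step, assuming points are numeric tuples, distance is Euclidean, and the label is chosen by majority vote among the k closest labeled objects. The data below is made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label query by majority vote among its k nearest labeled neighbors.

    train: list of (point, label) pairs, where point is a tuple of numbers.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled objects in two clusters.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]

print(knn_predict(train, (0.5, 0.5)))
print(knn_predict(train, (5.5, 5.5)))
```

There is no training step: all the work happens at prediction time, when every stored point is compared against the query. The sketch does not treat tied votes specially.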

K-Nearest Neighbors: Example Applications

1: Computer security: intrusion detection; 2: Fault detection in semiconductor manufacturing; 3: Video content retrieval; 4: Gene expression

K-Nearest Neighbors: Preference Bias

Good for distance-based approximations; good for outlier detection

K-Nearest Neighbors: Pros

1: Simple; 2: Powerful; 3: Lazy, no training involved; 4: Naturally handles multiclass classification and regression

K-Nearest Neighbors: Restriction Bias

Low-dimensional datasets

K-Nearest Neighbors: Type

Supervised learning, instance based

KNN

K-NN is an algorithm that can be used when you have a bunch of objects that have been classified or labeled in some way, and other similar objects that haven't been classified or labeled yet, and you want a way to automatically label them.

Linear Regression

Trying to fit a linear continuous function to the data. Can be univariate or multivariate.

Linear Regression: Cons

1: Unable to model complex relationships; 2: Unable to capture nonlinear relationships without first transforming the inputs

Linear Regression: Definition

Trying to fit a linear continuous function to the data to predict results. Can be univariate or multivariate.
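For the univariate case, a fit can be sketched with the ordinary least-squares closed form. Least squares is an assumption here, since the definition above does not name a fitting criterion, and the data is made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (univariate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(slope * 4 + intercept)  # predicted value at x = 4
```

The function requires at least two distinct x values; otherwise sxx is zero and the slope is undefined.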