Data Mining / ML Methods Flashcards

1
Q

Discuss the general approach to classification

A

Classification assigns an item to one of a set of known categories based on its features. One general approach: locate the item to be classified in feature space, compare it with the labeled items closest to it, and assign it the group most common among those neighbors. This method is called k-nearest neighbors (KNN). Classification is also used for tasks such as object detection, spam filtering, and cancer diagnosis.
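
The nearest-neighbor idea can be sketched in a few lines of Python. This is a minimal illustration, not a production classifier: the function name knn_classify, the toy "ham"/"spam" points, and the choice of k=3 with Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(query, labeled_points, k=3):
    """Return the majority label among the k labeled points nearest to query."""
    # Sort the labeled points by Euclidean distance to the query point.
    by_distance = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: (feature vector, label).
training = [
    ((1.0, 1.0), "ham"),
    ((1.2, 0.8), "ham"),
    ((0.9, 1.3), "ham"),
    ((5.0, 5.2), "spam"),
    ((5.5, 4.8), "spam"),
    ((4.7, 5.1), "spam"),
]

prediction = knn_classify((1.1, 1.0), training, k=3)
print(prediction)
```

The query point sits next to the three "ham" points, so all of its nearest neighbors vote for that label.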

2
Q

Clustering

A

The groupings are unknown in advance, and the analyst wants to discover whether the objects fall into natural groups. Clustering is unsupervised learning: the data set is unlabeled.
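
One common way to cluster unlabeled points is k-means, which the card does not name; it is used here only as an illustration. The sketch below assumes the number of clusters and the starting centroids are chosen by hand, and the data points are made up.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```

No labels are supplied anywhere; the two groups emerge from the distances alone.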

3
Q

Bayes Theorem

A

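Bayes' theorem gives the probability of a hypothesis given the observed data (the posterior): P(H|D) = P(D|H) P(H) / P(D). Here P(D|H), the likelihood, is the probability of getting the data that you found if the hypothesis is true; P(H) is the prior probability of the hypothesis; and P(D) is the overall probability of the data.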
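
A small worked example may help, assuming the standard form of Bayes' theorem, P(H|D) = P(D|H) P(H) / P(D), for a yes/no hypothesis. The numbers (1% prior, 90% detection rate, 5% false-positive rate) are invented for illustration, and the function name posterior is hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | D) for a binary hypothesis H, via Bayes' theorem."""
    # P(D) = P(D | H) P(H) + P(D | not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical test: 1% of items are positive, the test catches 90% of them,
# and it wrongly flags 5% of the negatives.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a 90% detection rate, a positive result here implies only about a 15% chance the hypothesis is true, because the prior is so low.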

4
Q

Naive Bayes

A

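Estimates the conditional probability of an outcome given the observed features. Naive Bayes is an algorithm that applies Bayes' theorem under the "naive" assumption that the features are independent of one another given the class. A Naive Bayes classifier is an ML model that classifies an object based on its features.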
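
The following is a minimal sketch of a Naive Bayes text classifier, assuming word counts as features, add-one (Laplace) smoothing, and conditional independence of words given the class. The function names and the four tiny training messages are made up for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (list_of_words, label). Returns the counts needed to predict."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in samples:
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def nb_predict(words, model):
    """Pick the class with the highest log posterior (up to a shared constant)."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_prob = math.log(count / total)  # log prior P(class)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for w in words:
            # Add-one smoothing so unseen words do not zero out the product.
            log_prob += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = log_prob
    return max(scores, key=scores.get)

# Hypothetical training messages.
model = train_naive_bayes([
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
])
print(nb_predict(["free", "money", "now"], model))
```

Working in log probabilities avoids multiplying many small numbers together.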

5
Q

PCA (principal component analysis)

A

This is an attempt to find out whether the variables themselves group in any meaningful way. It is a data reduction method used to reduce the dimensionality of large data sets. This is done by transforming a large set of variables into a smaller set that still contains most of the information in the large set.
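
As a rough sketch of the idea, the code below finds the first principal component of a small data set by power iteration on the covariance matrix, then projects each row onto it, turning two variables into one. The function name, the data, and the use of power iteration (rather than a full eigendecomposition) are assumptions for illustration; it also assumes the starting vector is not orthogonal to the leading component.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return the unit direction of greatest variance and each row's score along it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the component: one number per row.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data: two strongly correlated variables.
data = [(2.0, 1.9), (4.0, 4.1), (6.0, 5.8), (8.0, 8.2)]
direction, scores = first_principal_component(data)
print(direction, scores)
```

Because the two variables move together, a single score per row keeps most of the variation in the original pair.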

6
Q

Dimensionality reduction

A

Reduces the number of variables (features) in a data set, and with it the amount of data. PCA is one technique for this.

7
Q

Data reduction

A

Reducing the volume of data held in storage or in a database. The goal is to optimize storage capacity.

8
Q

Hierarchical clustering

A

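An algorithm that groups similar objects into clusters and arranges those clusters in a hierarchy, typically by repeatedly merging the two most similar clusters (agglomerative) or by repeatedly splitting clusters (divisive).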
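
A minimal sketch of the bottom-up (agglomerative) variant, assuming one-dimensional points and single linkage (cluster distance is the gap between the closest pair of members). The function name and data are hypothetical, and the brute-force search is only suitable for tiny inputs.

```python
def agglomerative(points, target_clusters):
    """Start with one cluster per point and repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]
    merges = []  # record of (cluster_a, cluster_b, distance) in merge order
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

# Hypothetical data.
clusters, merges = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], target_clusters=2)
print(clusters)
```

The merges list records the order in which clusters were joined, which is the information a dendrogram displays.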

9
Q

Anomaly detection

A

Identifies rare items that differ markedly from the rest of the data. Can be used to detect fraud. Can be done in R or Tableau, for example with a local outlier factor (LOF) or Alfa function.
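
The local outlier factor compares how dense a point's neighborhood is with the density around its neighbors; scores near 1 are typical and scores well above 1 suggest an outlier. Below is a simplified pure-Python sketch, assuming distinct points, Euclidean distance, and exactly k neighbors per point (ties broken by input order). The function name and data are made up, and this is not the R or Tableau implementation.

```python
import math

def local_outlier_factors(points, k=2):
    """Return one LOF score per point (simplified: exactly k neighbors each)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbor.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def local_reachability_density(i):
        # Reachability distance to neighbor j is at least j's own k-distance.
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        return len(reach) / sum(reach)

    density = [local_reachability_density(i) for i in range(n)]
    # LOF: average neighbor density divided by the point's own density.
    return [
        sum(density[j] for j in neighbors[i]) / (k * density[i])
        for i in range(n)
    ]

# Hypothetical data: a tight square of four points plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = local_outlier_factors(points, k=2)
print(scores)
```

The four clustered points score about 1, while the isolated point scores far higher.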

10
Q

Neural networks

A

An algorithm loosely modeled on the operation of the human brain, used to recognize relationships in data sets.
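
The simplest building block of a neural network is a single artificial neuron (a perceptron): a weighted sum of inputs followed by a threshold. The sketch below trains one neuron to reproduce logical AND; the function names, the learning rate of 1, and the 20 training passes are assumptions for this toy example, and real networks stack many such units.

```python
def neuron_output(weights, bias, inputs):
    """Fire (1) if the weighted sum of inputs plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, epochs=20, learning_rate=1):
    """Perceptron learning rule: nudge weights toward each misclassified example."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - neuron_output(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Truth table for logical AND.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)
print([neuron_output(weights, bias, x) for x, _ in and_examples])
```

The neuron is never told the rule for AND; it recovers the relationship from the labeled examples.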

11
Q

Deep learning

A

Machine learning with neural networks that have many layers. Deep networks are capable of tasks such as text classification. One deep learning architecture is the recurrent neural network (RNN), which works best on sequential data.

12
Q

Decision Trees

A

A tree-like model of alternative decisions and their consequences. It is a sequence of (often binary) decisions based on your data that combine to predict an outcome, branching from one decision to the next.
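
A decision tree can be pictured as nested yes/no questions. The sketch below hard-codes a tiny tree rather than learning one from data; the loan scenario, the feature names "income" and "debt", and the thresholds are all invented for illustration.

```python
# Each internal node asks "is feature < threshold?"; each leaf holds a label.
tree = {
    "feature": "income", "threshold": 50000,
    "below": {"label": "deny"},
    "at_or_above": {
        "feature": "debt", "threshold": 10000,
        "below": {"label": "approve"},
        "at_or_above": {"label": "deny"},
    },
}

def tree_predict(node, sample):
    """Follow one branch per decision until a leaf is reached."""
    while "label" not in node:
        branch = "below" if sample[node["feature"]] < node["threshold"] else "at_or_above"
        node = node[branch]
    return node["label"]

print(tree_predict(tree, {"income": 60000, "debt": 5000}))
```

Each prediction follows a single path from the root to a leaf, so the reasoning behind it can be read off directly.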

13
Q

Optimization Analysis

A

Finding the best value for one or more target variables given certain constraints. It shows what value a variable should have under given conditions or constraints.
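
As a toy illustration, the sketch below maximizes a profit function under a few constraints by trying every whole-number combination. The problem itself (profit 3x + 5y, with x at most 4, y at most 6, and 3x + 2y at most 18) is made up, and brute-force search is used only because the example is tiny; real problems usually call for a dedicated solver.

```python
def best_plan():
    """Try every integer (x, y) that satisfies the constraints and keep the best."""
    best_value, best_point = None, None
    for x in range(0, 5):          # constraint: 0 <= x <= 4
        for y in range(0, 7):      # constraint: 0 <= y <= 6
            if 3 * x + 2 * y <= 18:        # shared-resource constraint
                value = 3 * x + 5 * y      # objective: profit to maximize
                if best_value is None or value > best_value:
                    best_value, best_point = value, (x, y)
    return best_point, best_value

point, value = best_plan()
print(point, value)  # (2, 6) 36
```

The answer says what values the variables should take (x = 2, y = 6) to get the best objective value the constraints allow.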

14
Q

Supervised model versus unsupervised

A

A supervised model is an ML algorithm trained on a labeled data set, as in classification or regression.

An unsupervised model is an ML algorithm that tries to find patterns in unlabeled data. Examples are clustering, anomaly detection, and some neural networks (for example, autoencoders).
