Exam2 Flashcards

1
Q

Naive Bayes predicts…

A

The probability that a new data point has each label (A, B, C, etc.)

2
Q

Regression models are used to predict…

A

responses that take a continuous range of values

3
Q

The adjusted R^2 can be used

A

to compare models with different numbers of terms
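
A minimal sketch of why adjusted R^2 suits this comparison, assuming the usual formula 1 - (1 - R^2)(n - 1)/(n - p - 1) with n samples and p terms. The function name and the numbers are hypothetical, chosen only for illustration.

```python
def adjusted_r2(r2: float, n_samples: int, n_terms: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_terms - 1 <= 0:
        raise ValueError("need more samples than terms + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_terms - 1)


# Hypothetical numbers: the larger model has a slightly higher raw R^2,
# but the penalty for its extra terms gives it a lower adjusted R^2.
small = adjusted_r2(0.80, n_samples=30, n_terms=2)
large = adjusted_r2(0.81, n_samples=30, n_terms=10)
print(small, large)
```

Raw R^2 never decreases when terms are added, so the adjusted version is the fairer yardstick across model sizes.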

4
Q

R^2, Mean Squared Error, Mean absolute error, etc, are examples of

A

goodness measures
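
A rough sketch of three of these goodness measures in plain Python. The data values are made up for illustration, and the function names are not from the course.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observations and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```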

5
Q

For Root Mean Squared Error (RMSE), a value less than ___ is a good sign

A

< 1 standard deviation of the response variable
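
One way to apply this rule of thumb, sketched with made-up numbers. Using the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption; the card does not say which one is meant.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error of the predictions."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Made-up response values and model predictions.
y_true = [10.0, 12.0, 15.0, 19.0, 24.0]
y_pred = [11.0, 12.5, 14.0, 18.0, 25.0]

error = rmse(y_true, y_pred)
spread = statistics.stdev(y_true)  # sample standard deviation (assumption)
looks_reasonable = error < spread
print(error, spread, looks_reasonable)
```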

6
Q

For linear regression, always use…

A

more than 1 input variable

7
Q

Regression trees are

A

a non-parametric method

8
Q

What is a stochastic process

A

a collection of random variables indexed by some variable (e.g., space or time)

9
Q

What is Lazy Learning

A

A model fit using local data; it does not build a general model but instead memorizes the training data

10
Q

What is a technique that is an example of lazy learning

A

K nearest neighbors
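
A small sketch of why K nearest neighbors counts as lazy learning: there is no training step, and all the work happens at prediction time against the stored data. The function name, toy points, and the choice of Euclidean distance with majority voting are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; nothing is fitted ahead
    of time, the stored data is searched on every call.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(train, (1.1, 1.0)), knn_predict(train, (5.0, 5.1)))
```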

11
Q

What is Eager Learning

A

A model fit which is “eager” to produce a general model to fit all data

12
Q

A Gaussian Process for regression produces

A

a probability distribution over functions

13
Q

For Gaussian Processes for regression, a Kernel function…

A

describes how similar each data point is to the others. It is chosen by the modeler and determines the covariance function
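
To make the kernel idea concrete, here is a sketch using the squared-exponential (RBF) kernel, one common choice; the card does not say which kernel the course used, and `length_scale` is a hypothetical parameter name.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential kernel: near 1 for close inputs, near 0 for distant ones."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))


def covariance_matrix(xs, length_scale=1.0):
    """Covariance matrix whose (i, j) entry is the kernel of xs[i] and xs[j]."""
    return [[rbf_kernel(a, b, length_scale) for b in xs] for a in xs]


xs = [0.0, 0.5, 3.0]
K = covariance_matrix(xs)
for row in K:
    print([round(v, 4) for v in row])
```

Inputs 0.0 and 0.5 get a high covariance while 0.0 and 3.0 get a low one, which is how the kernel encodes similarity.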

14
Q

What are 3 main types of pre-processing for AI Model improvement

A
  • Transformations
  • Feature Selection
  • Feature Engineering
15
Q

The selection of a subset of input variables to use in the model is called

A

Feature Selection (AI improvement)

16
Q

A “Transformation” is…

A

a mathematical alteration of the data space intended to improve model goodness, e.g., rotating or stretching the input features

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

PCA stands for ___ and is an example of

A

Principal Components Analysis, a Transformation technique for data pre-processing

Converts a set of correlated input variables into uncorrelated variables (i.e., principal components)

18
Q

What is “Feature Engineering”?

A

Creating a new variable (feature) from existing variables to give the ML model important information; it often adds human context to the data set (AI model improvement)

19
Q

What is a primary purpose of text analytics in model improvement?

A

To perform feature engineering to integrate text data into the ML process

20
Q

When you transform the data using PCA, a common application is

A

Dimensionality reduction: keeping only the most influential (largest-variance) principal components while minimizing information loss

21
Q

In PCA, the ____ are referred to as the principal components

A

Eigenvectors of the covariance matrix (each eigenvalue gives the variance captured along its component)

22
Q

With PCA, the transformed space is…

A

Orthonormal and uncorrelated (0 covariance)
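
A sketch of PCA on 2-D data in plain Python, using the closed-form eigen-decomposition of a 2x2 covariance matrix. The function name and data are made up; a real analysis would normally use a linear algebra library.

```python
import math


def pca_2d(points):
    """Return (eigenvalues, unit eigenvectors, projected scores) for 2-D data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Angle of the major axis; v1 and v2 are orthonormal by construction.
    theta = 0.5 * math.atan2(2 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    scores = [(x * v1[0] + y * v1[1], x * v2[0] + y * v2[1]) for x, y in centered]
    return (lam1, lam2), (v1, v2), scores


# Made-up, strongly correlated inputs.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
(lam1, lam2), (v1, v2), scores = pca_2d(points)
n = len(scores)
cov12 = sum(s1 * s2 for s1, s2 in scores) / (n - 1)
var1 = sum(s1 * s1 for s1, _ in scores) / (n - 1)
print(lam1, lam2, cov12)
```

The covariance between the two score columns comes out as zero up to rounding, and the variance of the first column equals the larger eigenvalue.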

23
Q

Creation of Dummy Variables is considered a ___ technique

A

Transformation

24
Q

Dummy variables are…

A

a numerical equivalent to categorical variables

25
Q

If you have 3 or more categories to convert to a number, it is better to introduce a dummy variable than a scale to avoid…

A

introducing bias (a single numeric scale implies an order and spacing between categories that may not exist)
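
A minimal sketch of dummy-variable (one-hot) encoding for a three-category variable. The helper name and the color values are hypothetical.

```python
def one_hot(values):
    """Map each categorical value to a row of 0/1 dummy variables."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, rows = one_hot(colors)
print(categories)
for row in rows:
    print(row)
```

Each category gets its own column, so no category is treated as "larger" than another, unlike a 1/2/3 scale.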

26
Q

What is the limiting assumption of PCA

A

It is based on covariance, which is the linear statistical variation between 2 variables. The variables could have a more complex/nonlinear correlation in reality

27
Q

Covariance is…

A

the linear statistical variation between 2 variables

28
Q

What are some ML techniques that have built-in feature selection

A

Decision/Regression trees and Linear regression

29
Q

What is a disadvantage of feature selection

A

You’re still losing information

30
Q

What are 3 types of importance measures used for feature selection

A
  • Filter (select features in pre-processing, train on your selection)
  • Wrapper (train on a subset, then add or remove iteratively)
  • Embedded (integrated into learning process, ex Trees)
31
Q

What is the simplest importance measure to use for feature selection

A

Linear statistical importance - squared correlation

32
Q

“Feature Importance” means

A

assigning a numerical importance value to each feature

33
Q

When using squared correlation technique for feature importance, you could drop (input) features that…

A

show a very weak correlation to your target value (output) - ex 6.3
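
A sketch of this filter-style selection using squared Pearson correlation. The feature names, data, and the 0.1 cutoff are all made up for illustration; the card does not give a threshold.

```python
def squared_correlation(xs, ys):
    """Squared Pearson correlation between one feature and the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Made-up data: x1 tracks the target closely, x2 barely at all.
features = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 1.0, 2.0, 1.0, 2.0],
}
target = [2.1, 3.9, 6.2, 7.8, 10.1]

importance = {name: squared_correlation(col, target) for name, col in features.items()}
threshold = 0.1  # hypothetical cutoff
kept = [name for name, score in importance.items() if score >= threshold]
print(importance, kept)
```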

34
Q

Cross Validation can produce more _ models, while ensemble typically has the goal of more _ models

A

Robust, accurate

35
Q

One technique for Cross Validation is

A

K-folds
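
A sketch of how K-fold splitting partitions the data, assuming contiguous, unshuffled folds for simplicity (many implementations shuffle first). The function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs; each sample is tested once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train_idx, test_idx in folds:
    print(test_idx)
```

A model would be trained on each `train` part and scored on the matching `test` part, and the k scores averaged.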

36
Q

_____ is a newer ensemble learning technique

A

Random Forests

37
Q

Ensemble learning often uses ___ methods

A

decision/regression tree

38
Q

By themselves, Trees are considered ____

A

weak learners

39
Q

Trees can be prone to

A

overfitting

40
Q

Bagging can help

A

avoid overfitting

41
Q

Bagging definition in class

A

Each model uses a random subset of training data
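
A sketch of the bagging idea, assuming the usual bootstrap meaning of "random subset": sampling with replacement to the original size. The "model" here is just the sample mean, a stand-in for a tree, and all names are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def bagged_mean(data, n_models=25, seed=0):
    """Fit a trivial model (the mean) on each bootstrap sample, then average."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        sample = bootstrap_sample(data, rng)
        predictions.append(sum(sample) / len(sample))  # stand-in for a tree
    return sum(predictions) / len(predictions)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
print(bagged_mean(data))
```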

42
Q

Boosting can help

A

remove bias

43
Q

What is the Random Forests technique

A

An ensemble method that uses a multitude of decision trees, with probability measures guiding their construction and prediction

44
Q

What technique can reach accuracy on the same level as ANNs?

A

Random Forests

45
Q

CART stands for

A

classification and regression trees

46
Q

Bootstrap Aggregation is another term for

A

Bagging

47
Q

Text Analytics is the process of

A

quantifying information from raw text

48
Q

What are 3 methods of text analytics

A
  • Feature-Value Mapping
  • Similarity Measures
  • Vectorizing
49
Q

Feature-value mapping is the same as…

A

Dummy Variables

50
Q

In text analytics, similarity methods work by…

A

calculating an equivalent distance metric for text variables

51
Q

What is an example of similarity method for text analytics

A

Levenshtein distance
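
A sketch of Levenshtein distance using the standard dynamic-programming recurrence: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed one table row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```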

52
Q

Definition of vectorizing

A

Conversion of raw text data into a numerical equivalent

53
Q

What are 3 ways to vectorize text?

A
  • tokenizing
  • counting
  • normalizing
54
Q

What are “stop words”

A

words that may not be informative in a set of text data, that can be excluded from vectorization
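
A sketch that strings together tokenizing, counting, and normalizing, with stop words dropped along the way. The stop-word list, the vocabulary, and the choice of normalizing by total count (rather than, say, TF-IDF) are assumptions for illustration.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "on"}  # hypothetical, tiny stop-word list


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vectorize(text, vocabulary):
    """Count vocabulary words (stop words removed) and normalize to fractions."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary)
    return [counts[w] / total if total else 0.0 for w in vocabulary]


vocabulary = ["cat", "mat", "sat"]
vector = vectorize("The cat sat on the mat. The cat is happy.", vocabulary)
print(vector)
```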

55
Q

Validation test sets are often used when…

A

detecting or avoiding overfitting while training ANNs

56
Q
A