Naive Bayes predicts…

The probability that a new data point has label A, B, C, etc.

Regression models are used to predict…

responses that take a continuous range of values

The adjusted R^2 can be used

to compare models with different numbers of terms
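
As a rough illustration, the usual textbook form is adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of input terms. The function name and the numbers below are made up for the sketch; they are not from the class material.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Two hypothetical models with the same raw R^2 but different numbers of terms.
small_model = adjusted_r2(0.80, n_samples=50, n_features=2)
large_model = adjusted_r2(0.80, n_samples=50, n_features=10)

# The model with more terms is penalized, so its adjusted R^2 is lower.
print(small_model > large_model)
```

The penalty grows with the number of terms, which is why this measure, unlike plain R^2, can be compared across models of different sizes.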

R^2, Mean Squared Error, Mean absolute error, etc, are examples of

goodness measures

For Root Mean Squared Error (RMSE), a value below ___ is a good sign

1 standard deviation of the response variable
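
One way to read this rule of thumb: a model that always predicts the mean of the response has an RMSE equal to the (population) standard deviation of the response, so an RMSE below that beats the trivial baseline. A minimal sketch with made-up values:

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative data only.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 6.5, 9.5, 10.5]

error = rmse(y_true, y_pred)
spread = statistics.pstdev(y_true)  # population standard deviation of the response

print(error < spread)  # True means the model beats "always predict the mean"
```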

For linear regression, always use…

more than 1 input variable

Regression trees are

a non-parametric method

What is a stochastic process?

A collection of random variables indexed by some variable (e.g., space or time)

What is Lazy Learning

A model fit using local data. It does not build a general model; instead it memorizes the training data and defers the work to prediction time

What is a technique that is an example of lazy learning

K nearest neighbors
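
A minimal sketch of why K nearest neighbors counts as lazy learning: "training" only stores the data, and all of the work happens when a query arrives. The function name, the data, and the choice of Euclidean distance are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # No model is fit ahead of time; distances are computed per query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up training data: two well-separated clusters.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (5.1, 5.0)))
```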

What is Eager Learning

A model fit which is “eager” to produce a general model to fit all data

A Gaussian Process for regression produces

a probability distribution over functions

For Gaussian Processes for regression, a Kernel function…

describes how similar each data point is to the others. It is chosen by the modeler and determines the covariance function
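
To make the kernel idea concrete, here is a sketch using the squared-exponential (RBF) kernel. The notes do not say which kernel the class used, so RBF and the `length_scale` parameter are assumptions; it is simply one common choice.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0):
    """Similarity of two 1-D inputs: 1.0 when identical, near 0 when far apart."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))


def covariance_matrix(xs, length_scale=1.0):
    """Covariance matrix of the GP prior, built by applying the kernel pairwise."""
    return [[rbf_kernel(a, b, length_scale) for b in xs] for a in xs]


xs = [0.0, 0.5, 3.0]  # made-up input locations
K = covariance_matrix(xs)

# Nearby inputs (0.0 and 0.5) get a higher covariance than distant ones (0.0 and 3.0).
print(K[0][1] > K[0][2])
```

Changing the kernel or its length scale changes which functions the Gaussian Process considers likely, which is why the choice matters.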

What are 3 main types of pre-processing for AI Model improvement

- Transformations
- Feature Selection
- Feature Engineering

The selection of a subset of input variables to use in the model is called

Feature Selection (AI improvement)

A “Transformation” is…

a mathematical alteration to the data space to facilitate improving the model goodness; rotating/stretching the input features

PCA stands for ___ and is an example of

Principal Components Analysis, a Transformation technique for data pre-processing

Converts a set of correlated input variables to uncorrelated variables (ie Principal Components)

What is “Feature Engineering”?

Creating a new variable (feature) from existing variables to give the ML model important information; it often adds human context to the data set (AI model improvement)

What is a primary purpose of text analytics in model improvement?

To perform feature engineering to integrate text data into the ML process

When you Transform the data using PCA, common application is

Dimensionality reduction: keep only the most influential (largest-variance) principal components while minimizing information loss

In PCA, the ____ are referred to as the principal components

Eigenvectors of the covariance matrix (the eigenvalues give the variance captured by each component)

With PCA, the transformed space is…

Orthonormal and uncorrelated (0 covariance)
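
A small sketch of PCA restricted to two input variables, where the rotation that diagonalizes the covariance matrix has a closed form. The function names and the data are illustrative assumptions; a real analysis would normally use a linear algebra library.

```python
import math


def covariance(values_a, values_b):
    """Sample covariance of two equally long sequences."""
    n = len(values_a)
    mean_a = sum(values_a) / n
    mean_b = sum(values_b) / n
    return sum((a - mean_a) * (b - mean_b) for a, b in zip(values_a, values_b)) / (n - 1)


def pca_2d(points):
    """Return the two principal directions and the data expressed in them."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    var_x, var_y, cov_xy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)

    # Rotation angle that makes the covariance of the rotated axes zero.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    pc1 = (math.cos(theta), math.sin(theta))   # direction of largest variance
    pc2 = (-math.sin(theta), math.cos(theta))  # orthogonal direction

    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centered]
    return pc1, pc2, scores


# Made-up, strongly correlated data.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

pc1, pc2, scores = pca_2d(points)
s1 = [a for a, _ in scores]
s2 = [b for _, b in scores]

print(abs(covariance(s1, s2)) < 1e-9)            # transformed variables are uncorrelated
print(covariance(s1, s1) > covariance(s2, s2))   # first component carries more variance
```

Dropping `s2` and keeping only `s1` would be the dimension-reduction step: one variable instead of two, with most of the variance retained.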

Creation of Dummy Variables is considered a ___ technique

Transformation

Dummy variables are…

a numerical equivalent to categorical variables

If you have 3 or more categories to convert to a number, it is better to introduce a dummy variable than a scale to avoid…

introducing bias (a numeric scale implies an ordering and spacing between categories that may not exist)
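
A minimal sketch of dummy (one-hot) variables for a three-category feature. The helper name and the color data are made up for illustration.

```python
def one_hot(values):
    """Map each categorical value to a row of 0/1 dummy variables."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, rows = one_hot(colors)

print(categories)
for color, row in zip(colors, rows):
    print(color, row)
```

Encoding the same feature as a scale such as red=1, green=2, blue=3 would tell the model that blue is "three times" red and that green sits between them, which is not something the data says.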

What is the limiting assumption of PCA

It is based on covariance, which is the linear statistical variation between 2 variables. The variables could have a more complex/nonlinear correlation in reality

Covariance is…

the linear statistical variation between 2 variables

What are some ML techniques that have built-in feature selection

Decision/Regression trees and Linear regression

What is a disadvantage of feature selection

You’re still losing information

What are 3 types of importance measures used for feature selection

- Filter (select features in pre-processing, train on your selection)
- Wrapper (train on a subset, then add or remove iteratively)
- Embedded (integrated into learning process, ex Trees)

What is the simplest importance measure to use for feature selection

Linear statistical importance - squared correlation

“Feature Importance” means

assigning a numerical importance value to each feature

When using squared correlation technique for feature importance, you could drop (input) features that…

show a very weak correlation to your target value (output) - ex 6.3
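
A sketch of the squared-correlation filter, assuming Pearson correlation and a cutoff of 0.1. The threshold, feature names, and data are all hypothetical and would need to be chosen for the problem at hand.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up inputs: "size" tracks the target closely, "noise" does not.
features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8]

# Importance of each feature = squared correlation with the target.
importance = {name: pearson_r(column, target) ** 2 for name, column in features.items()}

threshold = 0.1  # hypothetical cutoff
kept = [name for name, score in importance.items() if score >= threshold]
print(kept)
```

Because this measure is linear, a feature with a strong nonlinear relationship to the target could still score low and be dropped by mistake.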

Cross Validation can produce more _ models, while ensemble learning typically aims for more _ models

Robust, accurate

One technique for Cross Validation is

K-folds
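
A minimal sketch of how K-fold splitting partitions the data: each sample lands in the held-out fold exactly once. The generator name is hypothetical, and the sketch does not shuffle, which real use usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A model would be trained and scored once per fold, and the k scores averaged to estimate how well it generalizes.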

_____ is a newer ensemble learning technique

Random Forests

Ensemble learning often uses ___ methods

decision/regression tree

By themselves, Trees are considered ____

weak learners

Trees can be prone to

overfitting

Bagging can help

avoid overfitting

Bagging definition in class

Each model uses a random subset of training data
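
A rough sketch of bagging with very weak learners (one-split regression trees, or "stumps"). It assumes the usual bootstrap form, where each random subset is drawn with replacement and the models' predictions are averaged; all names and data are made up.

```python
import random


def fit_stump(points):
    """Fit a one-split regression tree to (x, y) pairs and return a predictor."""
    points = sorted(points)
    overall_mean = sum(y for _, y in points) / len(points)
    best = (float("inf"), None, overall_mean, overall_mean)
    for i in range(1, len(points)):
        if points[i][0] == points[i - 1][0]:
            continue  # cannot split between identical x values
        threshold = (points[i][0] + points[i - 1][0]) / 2
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best

    def predict(x):
        if threshold is None:  # no valid split found: constant model
            return left_mean
        return left_mean if x <= threshold else right_mean

    return predict


def bagged_stumps(points, n_models=25, seed=0):
    """Train each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]  # sample with replacement
        models.append(fit_stump(sample))
    return lambda x: sum(model(x) for model in models) / len(models)


# Made-up step-shaped data: low response for x < 5, high response otherwise.
points = [(float(x), 0.0) for x in range(5)] + [(float(x), 10.0) for x in range(5, 10)]
bagged = bagged_stumps(points)

print(bagged(1.0) < bagged(8.0))
```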

Boosting can help

reduce bias

What is the Random Forests technique

An ensemble method that uses a multitude of decision trees, with probability measures used in their construction and prediction

Which technique can reach accuracy on the same level as ANNs?

Random Forests

CART stands for

classification and regression trees

Bootstrap Aggregation is another term for

Bagging

Text Analytics is the process of

quantifying information from raw text

What are 3 methods of text analytics

- Feature-Value Mapping
- Similarity Measures
- Vectorizing

Feature-value mapping is the same as…

Dummy Variables

In text analytics, similarity methods work by…

calculating an equivalent distance metric for text variables

What is an example of similarity method for text analytics

Levenshtein distance
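
Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A compact dynamic-programming sketch (the function name is our own):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A small distance means two text values are probably variants of the same thing, for example a misspelling, which is what makes it usable as a distance metric for text variables.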

Definition of vectorizing

Conversion of raw text data into a numerical equivalent

What are 3 ways to vectorize text?

- tokenizing
- counting
- normalizing

What are “stop words”

words that may not be informative in a set of text data, that can be excluded from vectorization
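
A sketch that strings together tokenizing, stop-word removal, counting, and normalizing. The stop-word list and vocabulary are tiny made-up examples, and "normalizing" is taken here to mean dividing counts by the total, which is only one possible reading of the term.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "on", "and"}  # illustrative list, not a standard one


def vectorize(text, vocabulary):
    """Turn raw text into a vector of relative word frequencies."""
    # Tokenize: lowercase, split into words, drop stop words.
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]
    # Count occurrences of each remaining token.
    counts = Counter(tokens)
    # Normalize so the vector does not depend on document length.
    total = sum(counts[word] for word in vocabulary) or 1
    return [counts[word] / total for word in vocabulary]


vocabulary = ["cat", "mat", "sat"]
vector = vectorize("The cat sat on the mat and the cat sat", vocabulary)
print(vector)  # [0.4, 0.2, 0.4]
```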

Validation sets are often used for…

Detecting/avoiding overfitting when training ANNs