Modules Flashcards

(60 cards)

1
Q

Permutation Feature Importance

A

Feature Selection

Permutation feature importance works by randomly shuffling the values of each feature column, one column at a time, and then re-evaluating the model to see how much its performance drops.

The rankings provided by permutation feature importance are often different from the ones you get from Filter Based Feature Selection, which calculates scores before a model is created.

This is because permutation feature importance doesn’t measure the association between a feature and a target value, but instead captures how much influence each feature has on predictions from the model.
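
The idea above can be sketched in plain Python. This is a minimal illustration, not the Studio module's implementation: `toy_model`, the synthetic data, the choice of mean squared error, and the `n_repeats` parameter are all assumptions made for the example.

```python
import random

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Per feature column: average increase in MSE after shuffling that column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild every row with only this one column permuted.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            score = mean_squared_error(targets, [predict(r) for r in permuted])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical already-trained model: relies heavily on feature 0,
# slightly on feature 1, and ignores feature 2.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

Because the scores are computed from the model's predictions, a feature the model ignores gets an importance of zero regardless of how it correlates with the target.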

2
Q

Filter Based Feature Selection

A

Feature Selection

The Filter Based Feature Selection module provides multiple feature selection algorithms to choose from, including correlation methods such as Pearson's or Kendall's correlation, mutual information scores, and chi-squared values.
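
As a rough sketch of one such filter, the snippet below scores each feature by the absolute Pearson correlation with the target and ranks the features, without training any model. The column names and values are made up for illustration, and the helper names are hypothetical.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target):
    """Score each column against the target and sort from strongest to weakest."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Illustrative data only.
columns = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "age": [5.0, 3.0, 4.0, 1.0, 2.0],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]
ranking = rank_features(columns, target)
```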

3
Q

Fisher Linear Discriminant Analysis

A

Feature Selection

Identifies the linear combination of feature variables that can best group data into separate classes.

Captures the combination of features that best separates two or more classes.

This method is often used for dimensionality reduction, because it projects a set of features onto a smaller feature space while preserving the information that discriminates between classes. This not only reduces computational costs for a given classification task, but can help prevent overfitting.
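
A minimal two-class, two-feature sketch of the projection, assuming the classic Fisher criterion (direction proportional to the inverse within-class scatter matrix times the difference of class means). The sample points and function names are illustrative assumptions, not the module's code.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter_matrix(points, mean):
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Return w = Sw^-1 (mean_b - mean_a) for 2-D inputs."""
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, mean_a), scatter_matrix(class_b, mean_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(point, w):
    # Reduce a 2-D point to a single discriminant value.
    return point[0] * w[0] + point[1] * w[1]

class_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
class_b = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]]
w = fisher_direction(class_a, class_b)
proj_a = [project(p, w) for p in class_a]
proj_b = [project(p, w) for p in class_b]
```

Here two features collapse to one projected value per row, and for this toy data the two classes no longer overlap along that single axis.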

4
Q

Synthetic Minority Oversampling Technique (SMOTE)

A

Manipulation

Use the SMOTE module in Azure Machine Learning Studio to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
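
The difference from plain duplication can be sketched as follows: each synthetic case is placed at a random point on the line segment between a minority case and one of its nearest minority neighbours. This is a simplified illustration; the function name, `k`, the seed, and the sample points are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority rows by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [3.0, 3.0]]
new_points = smote(minority, n_new=6)
```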

5
Q

Vowpal Wabbit

A

Text Analytics

Vowpal Wabbit (VW) is a fast, parallel machine learning framework that was developed for distributed computing by Yahoo! Research. Later it was ported to Windows and adapted by John Langford (Microsoft Research) for scientific computing in parallel architectures.

Features of Vowpal Wabbit that are important for machine learning include continuous learning (online learning), dimensionality reduction, and interactive learning. Vowpal Wabbit is also a solution for problems when you cannot fit the model data into memory.

6
Q

Root Mean Square Error

A

Evaluate Model - Regression

Root mean squared error (RMSE) creates a single value that summarizes the error in the model. By squaring the difference, the metric disregards the difference between over-prediction and under-prediction.
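
A short sketch of the calculation, with made-up numbers, showing that over-predicting and under-predicting by the same amount give the same RMSE:

```python
import math

def rmse(y_true, y_pred):
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [10.0, 20.0, 30.0]
over = [12.0, 22.0, 32.0]   # every prediction is 2 too high
under = [8.0, 18.0, 28.0]   # every prediction is 2 too low
rmse_over = rmse(actual, over)
rmse_under = rmse(actual, under)
```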

7
Q

R-Squared

A

Evaluate Model - Regression

The coefficient of determination, often referred to as R², represents the predictive power of the model, usually as a value between 0 and 1. Zero means the model explains none of the variance (it does no better than always predicting the mean); 1 means there is a perfect fit. However, caution should be used in interpreting R² values, as low values can be entirely normal and high values can be suspect.
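
A minimal sketch of the usual formula, one minus the ratio of residual to total sum of squares, on made-up numbers. Note that with this formula a model that predicts worse than the mean yields a negative value.

```python
def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1.0 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(actual, actual)               # perfect fit
mean_only = r_squared(actual, [2.5] * 4)          # always predicts the mean
worse = r_squared(actual, [4.0, 3.0, 2.0, 1.0])   # predictions in reverse order
```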

8
Q

F1 score

A

Evaluate Model - Classification

The F1 score is computed as the harmonic mean of precision and recall. It ranges from 0 to 1, where the ideal value is 1.
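
A small sketch of the F1 calculation for binary labels, using made-up predictions. The function name and the convention of returning 0 when there are no true positives are assumptions for the example.

```python
def f1_score(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # assumed convention: no true positives means a score of 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)  # precision 2/3, recall 1/2
```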

9
Q

k-fold cross-validation

A

Cross Validate Module - Regression / Classification

Cross-validation is a technique often used in machine learning to assess both the variability of a dataset and the reliability of any model trained on that data.

The Cross Validate Model module takes as input a labeled dataset, together with an untrained classification or regression model. It divides the dataset into some number of subsets (folds). For each fold, it trains a model on the remaining folds and tests it on the held-out fold, and then returns a set of accuracy statistics for each fold. By comparing the accuracy statistics for all the folds, you can interpret the quality of the dataset. You can then understand whether the model is susceptible to variations in the data.
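
The fold mechanics can be sketched in plain Python. To keep the example short, the "model" is a baseline that always predicts the mean of its training labels, and the folds are contiguous and unshuffled; both are assumptions for illustration, as are the function names.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_validate(labels, k):
    """Mean absolute error on each held-out fold for a predict-the-mean baseline."""
    fold_errors = []
    for train, test in k_fold_indices(len(labels), k):
        prediction = sum(labels[i] for i in train) / len(train)  # "training"
        mae = sum(abs(labels[i] - prediction) for i in test) / len(test)
        fold_errors.append(mae)
    return fold_errors

fold_errors = cross_validate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], k=3)
```

A large spread between the per-fold errors, as in this toy data, is the kind of signal that suggests the result depends on which rows land in which fold.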

10
Q

Assign Data to Clusters

A

xxx

11
Q

Load Trained Model

A

xxx

12
Q

Partition and Sample

A

xxx

13
Q

Tune Model Hyperparameters

A
Integrated train and tune: You configure a set of parameters to use, and then let the module iterate over multiple combinations, measuring accuracy until it finds a "best" model. With most learner modules, you can choose which parameters should be changed during the training process, and which should remain fixed.
We recommend that you use Cross-Validate Model to establish the goodness of the model given the specified parameters. Use Tune Model Hyperparameters to identify the optimal parameters.
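
A rough sketch of iterating over parameter combinations, assuming an exhaustive grid and a toy nearest-neighbour regressor scored on a separate validation set. The learner, the grid values, and all names here are hypothetical stand-ins, not the module's behaviour.

```python
from itertools import product

def knn_predict(train, x, k, weighted):
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    if weighted:
        # Closer neighbours count more; the small constant avoids division by zero.
        weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * py for w, (_, py) in zip(weights, nearest)) / sum(weights)

def validation_error(train, valid, k, weighted):
    return sum((y - knn_predict(train, x, k, weighted)) ** 2 for x, y in valid) / len(valid)

def grid_search(train, valid, grid):
    """Try every combination in the grid and keep the one with the lowest error."""
    best_error, best_params = None, None
    for k, weighted in product(grid["k"], grid["weighted"]):
        err = validation_error(train, valid, k, weighted)
        if best_error is None or err < best_error:
            best_error, best_params = err, {"k": k, "weighted": weighted}
    return best_error, best_params

train = [(float(x), float(x * x)) for x in range(10)]
valid = [(x + 0.5, (x + 0.5) ** 2) for x in range(0, 9, 2)]
grid = {"k": [1, 2, 4], "weighted": [False, True]}
best_error, best_params = grid_search(train, valid, grid)
```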
14
Q

Build Counting Transform

A

Use the Build Counting Transform module in Azure Machine Learning Studio to analyze training data. From this data, the module builds a count table as well as a set of count-based features that can be used in a predictive model.
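
A simplified sketch of the idea for a binary label: count how often each categorical value occurs with each class, then look those counts up as features. The exact features the module emits are not reproduced here; the feature names and data are illustrative assumptions.

```python
from collections import defaultdict

def build_count_table(values, labels):
    """Map each categorical value to [count with label 0, count with label 1]."""
    table = defaultdict(lambda: [0, 0])
    for value, label in zip(values, labels):
        table[value][label] += 1
    return dict(table)

def count_features(value, table):
    # Values never seen in training get zero counts.
    negatives, positives = table.get(value, [0, 0])
    return {"count_0": negatives, "count_1": positives}

cities = ["NY", "NY", "LA", "LA", "LA", "SF"]
labels = [1, 0, 1, 1, 0, 0]
table = build_count_table(cities, labels)
la_features = count_features("LA", table)
```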

15
Q

Missing Values Scrubber

A

The Missing Values Scrubber module is deprecated.

16
Q

Feature Hashing

A

Feature hashing is used for linguistics, and works by converting unique tokens into integers.
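
A minimal sketch of the hashing trick: each token is hashed to an integer, and that integer picks a slot in a fixed-length count vector. SHA-256 is used here only because it ships with the standard library; it is not claimed to be the hash function the Studio module uses, and `n_buckets` is an arbitrary illustrative size.

```python
import hashlib

def token_index(token, n_buckets):
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets

def hash_features(tokens, n_buckets=16):
    """Turn a list of tokens into a fixed-length vector of bucket counts."""
    vector = [0] * n_buckets
    for token in tokens:
        vector[token_index(token, n_buckets)] += 1
    return vector

vector = hash_features(["the", "cat", "sat", "on", "the", "mat"])
```

The vector length stays fixed no matter how large the vocabulary grows, at the cost of occasional collisions where different tokens share a bucket.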

17
Q

Clean Missing Data

A

Use this module to remove, replace, or infer missing values.
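
A small sketch of the remove and replace options on a single numeric column, with `None` standing in for a missing value. The mode names and the function are hypothetical; inferring values with a model is left out.

```python
def clean_missing(column, mode="mean"):
    present = [v for v in column if v is not None]
    if mode == "remove":
        return present  # drop the missing entries
    if mode == "mean":
        fill = sum(present) / len(present)
    elif mode == "median":
        ordered = sorted(present)
        mid = len(ordered) // 2
        fill = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Replace each missing entry with the chosen substitute value.
    return [fill if v is None else v for v in column]

column = [1.0, None, 3.0, None, 8.0]
with_mean = clean_missing(column, "mean")
with_median = clean_missing(column, "median")
removed = clean_missing(column, "remove")
```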

18
Q

Replace Discrete Values

A

The Replace Discrete Values module in Azure Machine Learning Studio is used to generate a probability score that can be used to represent a discrete value. This score can be useful for understanding the information value of the discrete values.

19
Q

Import Data

A

xxx

20
Q

Latent Dirichlet Transformation

A

Use the Latent Dirichlet Allocation module in Azure Machine Learning Studio to group otherwise unclassified text into a number of categories. Latent Dirichlet Allocation (LDA) is often used in natural language processing (NLP) to find texts that are similar. Another common term is topic modeling.

21
Q

Partition and Sample

A

xxx

22
Q

Convert to Indicator Values

A

Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of this module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.
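
A minimal sketch of the conversion for one categorical column. The output column naming (`name-category`) and the function name are assumptions made for the example.

```python
def to_indicator_columns(values, column_name):
    """Return one 0/1 column per distinct category in the input column."""
    categories = sorted(set(values))
    return {
        f"{column_name}-{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }

colors = ["red", "blue", "red", "green"]
indicators = to_indicator_columns(colors, "color")
```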

23
Q

Clean Missing Data

A

xxx

24
Q

Remove Duplicate Rows

A

xxx

25
Synthetic Minority Oversampling Technique (SMOTE)
xxx
26
Stratified split
xxx
27
Compute Linear Correlation
The Compute Linear Correlation module in Azure Machine Learning Studio is used to compute a set of Pearson correlation coefficients for each possible pair of variables in the input dataset.
28
Export Count Table
The Export Count Table module is provided for backward compatibility with experiments that use the Build Count Table (deprecated) and Count Featurizer (deprecated) modules.
29
Execute Python Script
With Python, you can perform tasks that aren't currently supported by existing Studio modules, such as: visualizing data using matplotlib; using Python libraries to enumerate datasets and models in your workspace; and reading, loading, and manipulating data from sources not supported by the Import Data module.
30
Convert to Indicator Values
The purpose of the Convert to Indicator Values module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.
31
Summarize Data
Summarize Data statistics are useful when you want to understand the characteristics of the complete dataset. For example, you might need to know: How many missing values are there in each column? How many unique values are there in a feature column? What is the mean and standard deviation for each column? The module calculates the important scores for each column, and returns a row of summary statistics for each variable (data column) provided as input.
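The questions listed above can be answered per column with a few lines of Python. This sketch covers only missing count, unique count, mean, and sample standard deviation; the statistic names and the sample column are illustrative assumptions, and `None` stands in for a missing value.

```python
import statistics

def summarize(columns):
    """Return one row of summary statistics per input column."""
    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        summary[name] = {
            "missing": len(values) - len(present),
            "unique": len(set(present)),
            "mean": statistics.mean(present),
            "stdev": statistics.stdev(present),  # sample standard deviation
        }
    return summary

summary = summarize({"age": [20, 30, None, 40, 30]})
```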
32
Test Hypothesis Using t-Test
xxx
33
Remove stop words
Remove words to optimize information retrieval. Remove stop words: Select this option if you want to apply a predefined stopword list to the text column. Stop word removal is performed before any other processes.
34
Lemmatization
Ensure that multiple related words resolve to a single canonical form. Lemmatization converts multiple related words to a single canonical form.
35
Remove special characters
Remove special characters: Use this option to replace any non-alphanumeric special characters with the pipe | character.
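The stop-word and special-character options from the cards above can be sketched together, with stop words removed first and any remaining non-alphanumeric character replaced by a pipe. The tiny stop-word list, the whitespace tokenization, and the function names are assumptions for illustration; the module's predefined list is not reproduced here.

```python
import re

# Tiny illustrative list, not the module's predefined stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to"}

def remove_stop_words(text):
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)

def replace_special_characters(text):
    # Anything that is not a letter, digit, or whitespace becomes a pipe.
    return re.sub(r"[^A-Za-z0-9\s]", "|", text)

def preprocess(text):
    # Stop words go first, then special characters.
    return replace_special_characters(remove_stop_words(text))

cleaned = preprocess("The price of a ticket is $20!")
```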
36
Group data into bins
xxx
37
Group data into bins
xxx
38
Synthetic Minority Oversampling Technique (SMOTE)
xxx
39
Scale and Reduce
xxx
40
Boosted Decision Tree Regression
xxx
41
Online Gradient Descent
xxx
42
Bayesian Linear Regression
xxx
43
Neural Network Regression
xxx
44
Linear Regression
xxx
45
Decision Forest Regression
xxx
46
Clean Missing Data
xxx
47
Multiple Imputation by Chained Equations (MICE)
xxx
48
Equal Width with Custom Start and Stop binning
xxx
49
Entropy MDL binning mode
xxx
50
Apply a Quantiles binning mode with a PQuantile normalization
xxx
51
Entropy MDL binning mode
xxx
52
Synthetic Minority Oversampling Technique (SMOTE)
xxxx
53
Last Observation Carried Forward (LOCF)
xxx
54
Multiple Imputation by Chained Equations (MICE)
xxx
55
Permutation Feature Importance
xxx
56
Edit Metadata
xxx
57
Filter Based Feature Selection
xxx
58
Execute Python Script
xxx
59
Latent Dirichlet Allocation
xxx
60
Continue from page 29
https://www.google.com/search?q=site:https://www.examtopics.com/exams/microsoft/dp-100/+%22studio-module-reference%E2%80%9D&rlz=1C1GCEA_enSE827SE827&sxsrf=ALeKk02QGVPNz-bJYoGOfw2svk4SdpRtPw:1591636394795&ei=qnHeXpGKMNL6qwGZoZGAAQ&start=10&sa=N&ved=2ahUKEwiRh7HP2_LpAhVS_SoKHZlQBBAQ8NMDegQIDBAz&biw=845&bih=927