Booz Terms Flashcards by Mark Schmale

Active Learning

Intelligent sample selection to improve model performance. The samples chosen for labeling are those expected to provide the most information to the learning model.

Agent Based Simulation

Simulates the actions and interactions of autonomous agents.

ANOVA

Hypothesis test for differences among the means of more than two groups.
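
As a rough illustration, the F statistic behind a one-way ANOVA can be computed by hand. The sketch below uses made-up sample values and returns only the F statistic, not a p-value; the function name is hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up measurements for three groups.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F value suggests that at least one group mean differs from the others.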

Association Rule Mining (Apriori)

Data mining technique to identify common co-occurrences of items.
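
A minimal sketch of the co-occurrence counting step, assuming small in-memory transactions. It counts item pairs only and omits the candidate pruning that full Apriori performs; the basket data and function name are made up.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Each unordered pair is counted once per transaction.
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

baskets = [["bread", "milk"], ["bread", "butter", "milk"], ["butter", "milk"]]
pairs = frequent_pairs(baskets, min_support=2)
```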

Bayesian Network

Models conditional probabilities amongst elements, visualized as a Directed Acyclic Graph.

Collaborative Filtering

Also known as ‘Recommendation.’ Suggests or eliminates items from a set by comparing the history of actions that users have performed on items. Finds similar items based on who used them, or similar users based on the items they use.
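
One simple way to find similar items based on who used them is Jaccard similarity over user sets. This is a sketch under that assumption, with a made-up usage history; real recommenders typically use richer similarity measures.

```python
def item_similarity(history, item_a, item_b):
    """Jaccard similarity of two items, based on the sets of users who used them."""
    users_a = {user for user, items in history.items() if item_a in items}
    users_b = {user for user, items in history.items() if item_b in items}
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

# Made-up history: user -> set of items used.
history = {"ann": {"x", "y"}, "bob": {"x", "y", "z"}, "cy": {"z"}}
sim_xy = item_similarity(history, "x", "y")
sim_xz = item_similarity(history, "x", "z")
```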

Coordinate Transformation

Re-expresses data in a different coordinate system (for example, Cartesian to polar) to provide a different perspective on the data.

Deep Learning

Method that learns features that lead to higher-level concept learning. Usually uses very deep neural network architectures.

Design of Experiments

Applies controlled experiments to quantify effects on system output caused by changes to inputs.

Differential Equations

Used to express relationships between functions and their derivatives, for example, change over time.
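
For change over time, a differential equation such as dy/dt = -0.5 y can be solved numerically. The sketch below uses forward Euler, the simplest scheme, on that made-up decay equation and compares the result with the known exact solution.

```python
import math

def euler(f, y0, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with fixed-step forward Euler."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)   # follow the slope for one small step
        t += h
    return y

# Exponential decay: dy/dt = -0.5 * y with y(0) = 1, so y(2) = exp(-1).
approx = euler(lambda t, y: -0.5 * y, y0=1.0, t0=0.0, t1=2.0, steps=1000)
exact = math.exp(-1.0)
```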

Discrete Event Simulation

Simulates a discrete sequence of events where each event occurs at a particular instant in time. The model updates its state only at points in time when events occur.
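
A minimal sketch of the event loop, assuming a priority queue keyed on event time. The rule that each arrival schedules a departure 2.0 time units later is made up for illustration.

```python
import heapq

def run_simulation(events):
    """Process (time, name) events in time order; the clock jumps between events."""
    queue = list(events)
    heapq.heapify(queue)
    clock, log = 0.0, []
    while queue:
        clock, name = heapq.heappop(queue)   # advance straight to the next event
        log.append((clock, name))
        if name == "arrive":
            # Made-up rule: every arrival schedules a departure 2.0 units later.
            heapq.heappush(queue, (clock + 2.0, "depart"))
    return log

log = run_simulation([(1.0, "arrive"), (1.5, "arrive")])
```

State changes only inside the loop body, so nothing is computed for the time between events.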

Discrete Wavelet Transform

Transforms time series data into the frequency domain while preserving locality (time) information.

Ensemble Learning

Learning multiple models and combining output to achieve better performance.

Expert Systems

Systems that use symbolic logic to reason about facts. They emulate human reasoning.

Exponential Smoothing

Smooths a time series by weighting older observations with exponentially decreasing weights. Used to remove artifacts expected from collection error or outliers.
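
A minimal sketch of simple (single) exponential smoothing, assuming the first observation seeds the smoothed series. The smoothing factor alpha and the sample series are made up.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; alpha in (0, 1] weights the newest value."""
    smoothed = [series[0]]   # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# The spike of 50 is damped rather than passed through unchanged.
result = exponential_smoothing([10, 10, 50, 10], alpha=0.5)
```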

Factor Analysis

Describes variability among observed, correlated variables in terms of a potentially lower number of unobserved variables, called factors.

Fast Fourier Transform

Transforms time series from time to frequency domain efficiently. Can also be used for image improvement by spatial transforms.
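
One well-known FFT is the recursive radix-2 Cooley-Tukey algorithm. The sketch below assumes the input length is a power of two and favors clarity over speed.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# One cycle of a sine wave sampled at four points.
spectrum = fft([0, 1, 0, -1])
magnitudes = [abs(c) for c in spectrum]
```

The energy lands in the bins for one cycle per window (indices 1 and 3); the other bins are empty.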

Format Conversion

Creates a standard representation of data regardless of source format. For example, extracting raw UTF-8 encoded text from binary file formats such as Microsoft Word or PDFs.

Fuzzy Logic

Logical reasoning that allows for degrees of truth for a statement.
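
A minimal sketch using the classic Zadeh operators (min for AND, max for OR, complement for NOT), which is one common choice among several. Truth values are degrees between 0 and 1; the example values are made up.

```python
def fuzzy_and(a, b):
    """Degree to which both statements hold (Zadeh minimum)."""
    return min(a, b)

def fuzzy_or(a, b):
    """Degree to which at least one statement holds (Zadeh maximum)."""
    return max(a, b)

def fuzzy_not(a):
    """Degree to which the statement does not hold."""
    return 1.0 - a

warm, humid = 0.7, 0.4          # made-up degrees of truth
muggy = fuzzy_and(warm, humid)  # "warm AND humid"
```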

Gaussian Filtering

Acts to remove noise or blur data.
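
A minimal one-dimensional sketch: each output value is a Gaussian-weighted average of its neighbors. The kernel radius, the sigma default, and the edge handling (clamping to the nearest sample) are assumptions made for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_filter(signal, sigma=1.0, radius=2):
    """Smooth a 1-D signal; indices past the ends are clamped to the edge value."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

smoothed = gaussian_filter([0, 0, 10, 0, 0])   # a single spike gets spread out
```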

Generalized Linear Models

Extends ordinary linear regression to allow for error distributions that are not normal.

Genetic Algorithms

Evolves candidate models over generations using evolution-inspired operators: mutation and crossover of parameters.

Grid Search

Systematic search across discrete parameter values for parameter exploration problems.
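
A minimal sketch, assuming the goal is to minimize an objective over every combination of the listed values. The objective `loss` and the grid values are made up.

```python
from itertools import product

def grid_search(objective, grid):
    """Evaluate every combination in grid and return the lowest-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def loss(x, y):
    # Made-up objective with its minimum at x = 1, y = -2.
    return (x - 1) ** 2 + (y + 2) ** 2

best, score = grid_search(loss, {"x": [0, 1, 2], "y": [-3, -2, -1]})
```

The number of evaluations is the product of the list lengths, so the cost grows quickly as parameters are added.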

Hidden Markov Models

Models sequential data by inferring discrete latent (hidden) states; the observables may be continuous or discrete.

Hierarchical Clustering

Connectivity based clustering approach that sequentially builds bigger (agglomerative) or smaller (divisive) clusters in the data.

K-means and X-means Clustering

Centroid-based clustering algorithms. With K-means the number of clusters is set in advance; with X-means the number of clusters is unknown and is estimated from the data.
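
A minimal sketch of K-means (Lloyd's algorithm) on one-dimensional data, assuming the starting centroids are supplied by the caller. X-means is not shown. The points and starting centroids are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (unchanged if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

centers = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0.0, 5.0])
```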

Linear, Non-linear, and Integer Programming

Set of techniques for minimizing or maximizing a function over a constrained set of input parameters.

Markov Chain Monte Carlo (MCMC)

A method of sampling typically used in Bayesian models to estimate the joint distribution of parameters given the data.

Monte Carlo Methods

Set of computational techniques that use repeated random sampling to obtain numerical estimates.
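
A classic illustration is estimating pi from random points: the fraction of points in the unit square that fall inside the quarter circle approaches pi / 4. The sketch below uses a fixed seed (an assumption, for repeatability).

```python
import random

def estimate_pi(samples, seed=0):
    """Estimate pi from the share of random points inside the unit quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

pi_estimate = estimate_pi(100_000)
```

The error shrinks roughly with the square root of the number of samples.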

Naive Bayes

Predicts classes using Bayes' theorem, which states that the probability of an outcome given a set of features is based on the probability of the features given the outcome. The features are assumed to be conditionally independent given the outcome.

Neural Networks

Learns salient features in data by adjusting weights between nodes through a learning rule.

Outlier Removal

Method for identifying and removing noise or artifacts from data.

Principal Components Analysis

Enables dimensionality reduction by identifying highly correlated dimensions.

Random Search

Randomly adjusts parameters to find a better solution than the best one found so far.

Regression with Shrinkage (Lasso)

A method of variable selection and prediction combined into a possibly biased linear model.

Sensitivity Analysis

Involves testing individual parameters in an analytic or model and observing the magnitude of the effect.

Simulated Annealing

Named after a controlled cooling process in metallurgy. By analogy, it uses a changing temperature, or annealing schedule, to vary algorithmic convergence.
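
A minimal sketch of the idea for a one-dimensional cost function. The geometric cooling schedule, the step size, and the acceptance rule exp(-delta / t) are common choices assumed here, not the only ones.

```python
import math
import random

def simulated_annealing(cost, x0, steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Minimize cost(x), accepting some worse moves while the temperature is high."""
    rng = random.Random(seed)
    x, best = x0, x0
    for i in range(steps):
        # Geometric cooling from t_start toward t_end.
        t = t_start * (t_end / t_start) ** (i / steps)
        candidate = x + rng.uniform(-0.5, 0.5)
        delta = cost(candidate) - cost(x)
        # Always accept improvements; accept worse moves with prob. exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = candidate
        if cost(x) < cost(best):
            best = x
    return best

# Made-up cost with its minimum at x = 3.
best_x = simulated_annealing(lambda x: (x - 3) ** 2, x0=-5.0)
```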

Stepwise Regression

A method of variable selection and prediction. Akaike's information criterion (AIC) is used as the metric for selection. The resulting predictive model is based upon ordinary least squares, or a general linear model with parameter estimation via maximum likelihood.

Stochastic Gradient Descent

General-purpose optimization for learning of neural networks, support vector machines, and logistic regression models.
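
A minimal sketch on a simpler model than those named above: fitting a line y = w x + b by updating the parameters after each sample. The learning rate, epoch count, and noise-free data are made up.

```python
import random

def sgd_linear(data, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)            # visit samples in random order
        for x, y in data:
            err = (w * x + b) - y    # prediction error for this one sample
            w -= lr * err * x        # gradient of 0.5 * err**2 w.r.t. w
            b -= lr * err            # gradient of 0.5 * err**2 w.r.t. b
    return w, b

# Noise-free points on the line y = 2x + 1.
points = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = sgd_linear(points)
```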

Support Vector Machines

Projection of feature vectors using a kernel function into a space where classes are more separable.

Term Frequency / Inverse Document Frequency

A statistic that measures the importance of a term to a document, relative to its frequency across the corpus.
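
A minimal sketch of one common variant: term frequency as a share of the document's tokens, times the natural log of (documents / documents containing the term). Other weightings exist; the tiny corpus is made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({term: (count / len(doc)) * math.log(n / df[term])
                       for term, count in tf.items()})
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
scores = tf_idf(docs)
```

A term that appears in every document, such as "the" here, scores zero, while a term unique to one document scores highest.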

Topic Modeling (Latent Dirichlet Allocation)

Identifies latent topics in text by examining word co-occurrence.

Tree Based Methods

Models structured as graph trees where branches indicate decisions.

T-Test

Hypothesis test for a difference between the means of two groups.
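
A minimal sketch of the test statistic, assuming Welch's form (which does not require equal variances). It returns only the t statistic, not a p-value; the sample values are made up.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    # Standard error of the difference, using each sample's own variance.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

t_stat = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```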

Wrapper Methods

Feature set reduction method that uses the performance of a model trained on a set of features as the measure of that feature set’s quality. Can help identify combinations of features that achieve high performance.

Booz Terms Flashcards

(45 cards)