Active Learning
Intelligently selects which samples to learn from (for example, which unlabeled samples to label next) to improve model performance. Samples are chosen to provide the greatest information to the learning model.
Agent Based Simulation
Simulates the actions and interactions of autonomous agents.
ANOVA
Hypothesis test for differences in means among more than two groups.
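A minimal sketch of the one-way ANOVA F statistic in plain Python, assuming independent groups with roughly equal variances. The function name `one_way_anova_f` is illustrative. It returns only the statistic; converting it to a p-value needs the F distribution (SciPy's `scipy.stats.f_oneway`, for example, reports both).

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups of numbers."""
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # F = (between-group mean square) / (within-group mean square)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f)  # approximately 13.0
```

A large F means the group means differ by more than the within-group noise would suggest.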
Association Rule Mining (Apriori)
Data mining technique to identify items that commonly occur together (co-occurrences), such as products frequently purchased in the same transaction.
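The sketch below implements the frequent-itemset phase of Apriori in plain Python. It grows candidate itemsets one item at a time and prunes any candidate that has an infrequent subset. Deriving association rules (for example, filtering by confidence) from these itemsets is a further step not shown; the function name and toy baskets are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count the transactions that contain each candidate itemset
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=2)
# {bread, milk} and {bread, butter} each appear in 2 baskets; {milk, butter} in only 1
```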
Bayesian Network
Models conditional probabilities among variables, represented as a directed acyclic graph (DAG).
Collaborative Filtering
Also known as ‘Recommendation.’ Suggests or eliminates items from a set by comparing a user’s history of actions against the histories of other users. Finds similar items based on who used them, or similar users based on the items they use.
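A minimal user-based collaborative filtering sketch, assuming explicit numeric ratings stored as nested dicts. It predicts a rating for each item the target user has not rated as a similarity-weighted average of other users’ ratings, using cosine similarity with missing ratings treated as zero. The function names and toy ratings are hypothetical; production systems typically add rating normalization and neighborhood limits.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two sparse rating dicts (missing ratings count as 0)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(ratings, user, top_n=3):
    """Rank items `user` has not rated by similarity-weighted ratings of other users."""
    totals, sim_sums = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:
            continue  # skip users who share no rated items with `user`
        for item, r in their.items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * r
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    predicted = {item: totals[item] / sim_sums[item] for item in totals}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 4, "B": 3, "C": 5},
    "carol": {"A": 1, "D": 4},
}
print(recommend(ratings, "alice"))  # ['C', 'D']
```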
Coordinate Transformation
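Re-expresses data in a different coordinate system (for example, converting Cartesian to polar coordinates), providing a different perspective on the data in which some structure may be easier to see.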
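As a small illustration (the function names are hypothetical), converting Cartesian points to polar coordinates maps points on a ring around the origin to a single radius `r`, a structure that is awkward to describe in x and y alone.

```python
import math

def to_polar(x, y):
    """Cartesian (x, y) -> polar (r, theta), with theta in radians in [-pi, pi]."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    """Polar (r, theta) -> Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

# Points on a circle of radius 2 all map to r == 2, differing only in angle
ring = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (math.sqrt(2), -math.sqrt(2))]
radii = [round(to_polar(x, y)[0], 9) for x, y in ring]
print(radii)  # [2.0, 2.0, 2.0, 2.0]
```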
Deep Learning
Learns a hierarchy of features, in which each layer builds higher-level concepts from the features learned by the layer below. Usually implemented as neural network architectures with many layers.
Design of Experiments
Applies controlled experiments to quantify effects on system output caused by changes to inputs.
Differential Equations
Equations that express relationships between functions and their derivatives, for example, how a quantity changes over time.
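To make the idea concrete, here is a forward Euler sketch, one of the simplest numerical methods for solving an ordinary differential equation dy/dt = f(t, y). It is applied to exponential decay, dy/dt = -0.5 y, because the exact solution y(t) = e^(-0.5 t) is available for comparison; the function name and step count are illustrative.

```python
import math

def euler(f, y0, t0, t1, n_steps):
    """Forward Euler: approximate y(t1) for dy/dt = f(t, y) with y(t0) = y0."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y += h * f(t, y)  # follow the slope at the current point for one step
        t += h
    return y

# Exponential decay dy/dt = -0.5 * y with y(0) = 1, evaluated at t = 2
approx = euler(lambda t, y: -0.5 * y, 1.0, 0.0, 2.0, 1000)
exact = math.exp(-1.0)
print(approx, exact)  # the error is on the order of 1e-4
```

Euler’s error shrinks roughly in proportion to the step size, so higher-order methods (such as Runge-Kutta) are usually preferred in practice.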
Discrete Event Simulation
Simulates a discrete sequence of events where each event occurs at a particular instant in time. The model updates its state only at points in time when events occur.
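A minimal discrete event simulation of a single-server, first-in-first-out queue, assuming fixed arrival times and a constant service time so the result is easy to check by hand. Pending events sit in a priority queue ordered by time, and the clock jumps directly from one event to the next; the function name and event labels are illustrative.

```python
import heapq
from collections import deque

def simulate_queue(arrivals, service_time):
    """Simulate a single-server FIFO queue; return {customer_index: departure_time}."""
    events = [(t, "arrive", i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)  # pending events, earliest first
    waiting, busy, departures = deque(), False, {}
    while events:
        clock, kind, cust = heapq.heappop(events)  # jump the clock to the next event
        if kind == "arrive":
            if busy:
                waiting.append(cust)
            else:
                busy = True
                heapq.heappush(events, (clock + service_time, "depart", cust))
        else:  # "depart": the server finishes a customer
            departures[cust] = clock
            if waiting:
                nxt = waiting.popleft()
                heapq.heappush(events, (clock + service_time, "depart", nxt))
            else:
                busy = False
    return departures

print(simulate_queue([0, 1, 2, 10], service_time=3))
# {0: 3, 1: 6, 2: 9, 3: 13}
```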
Discrete Wavelet Transform
Transforms time series data into the frequency domain while preserving locality information, that is, where in time each frequency component occurs.
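The Haar wavelet is the simplest case of the discrete wavelet transform. This sketch computes a single level (names are illustrative): pairwise scaled sums give a coarse approximation, and pairwise scaled differences give detail coefficients whose positions show where in the series a change occurs. Applying the same step repeatedly to the approximation yields a multi-level transform.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; len(signal) must be even.

    Returns (approximation, detail) from scaled pairwise sums and differences.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)  # scaling that preserves the signal's energy
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [s * (a + b) for a, b in pairs]
    detail = [s * (a - b) for a, b in pairs]
    return approx, detail

signal = [4, 4, 4, 4, 4, 4, 0, 8]
approx, detail = haar_dwt(signal)
# Only the last detail coefficient is nonzero: the jump is in the last pair of samples
```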
Ensemble Learning
Trains multiple models and combines their outputs to achieve better performance than a single model.
Expert Systems
Systems that use symbolic logic to reason about facts, emulating human expert reasoning.
Exponential Smoothing
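Smooths a time series by averaging observations with weights that decrease exponentially with their age. Used to remove artifacts expected from collection error or outliers.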
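A minimal sketch of simple exponential smoothing, assuming the first observation seeds the smoothed series (other initializations exist). The smoothing factor `alpha`, in (0, 1], sets how strongly the newest observation is weighted; smaller values smooth more.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(exponential_smoothing([10, 10, 30, 10, 10], alpha=0.5))
# [10, 10.0, 20.0, 15.0, 12.5]: the outlier at 30 is damped to 20
```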
Factor Analysis
Describes variability among observed, correlated variables in terms of a potentially lower number of unobserved variables, called factors.
Fast Fourier Transform
Efficiently transforms a time series from the time domain to the frequency domain. Can also be applied to images, where transforming to spatial frequencies supports filtering and enhancement.
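A minimal sketch of the recursive radix-2 Cooley-Tukey FFT, assuming the input length is a power of two. It splits the series into even- and odd-indexed samples, transforms each half, and combines them with complex "twiddle" factors; this reuse is what reduces the cost from O(n^2) for a direct DFT to O(n log n). In practice a library routine such as `numpy.fft.fft` would be used.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # first half of the spectrum
        out[k + n // 2] = even[k] - twiddle  # second half reuses the same product
    return out

# One full cosine cycle over 8 samples: energy appears only in bins 1 and 7
x = [math.cos(2 * math.pi * t / 8) for t in range(8)]
magnitudes = [round(abs(c), 6) for c in fft(x)]
print(magnitudes)  # [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
```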
Format Conversion
Creates a standard representation of data regardless of source format. For example, extracting raw UTF-8 encoded text from binary file formats such as Microsoft Word or PDFs.
Fuzzy Logic
Logical reasoning that allows a statement to have a degree of truth between 0 (false) and 1 (true), rather than being strictly true or false.
Gaussian Filtering
Smooths data by convolving it with a Gaussian kernel, which removes noise or blurs the data.
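A minimal one-dimensional Gaussian filter sketch, assuming the kernel is truncated at three standard deviations and that samples beyond either end of the signal repeat the edge value. Names are illustrative. Because the Gaussian is separable, an image can be filtered by applying the same kernel along each axis in turn.

```python
import math

def gaussian_kernel(sigma):
    """Normalized Gaussian weights for integer offsets -r..r, truncated at r = 3 * sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # weights sum to 1, so constant data is unchanged

def gaussian_filter_1d(signal, sigma):
    """Smooth `signal` by convolving it with a Gaussian kernel (requires sigma > 0).

    Indices past either end are clamped, so the edge values are repeated.
    """
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    n = len(signal)
    return [
        sum(k * signal[min(max(i + j - r, 0), n - 1)] for j, k in enumerate(kernel))
        for i in range(n)
    ]

spike = [0, 0, 0, 0, 10, 0, 0, 0, 0]
smoothed = gaussian_filter_1d(spike, sigma=1.0)
# The spike is spread over its neighbors while its total (10) is preserved
```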
Generalized Linear Models
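Generalizes ordinary linear regression to allow error distributions that are not normal (for example, binomial or Poisson), relating the linear predictor to the response through a link function.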
Genetic Algorithms
Evolves a population of candidate models over generations by applying evolution-inspired operators, such as mutation and crossover, to their parameters.
Grid Search
Systematically evaluates every combination of parameter values drawn from discrete sets; used for parameter exploration problems.
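A minimal grid search sketch that uses `itertools.product` to enumerate every combination of candidate values. The quadratic objective is a hypothetical stand-in for something like a model’s validation error; lower scores are treated as better.

```python
from itertools import product

def grid_search(objective, grid):
    """Try every combination in `grid` (name -> list of values); return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)  # lower is better
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical objective with its minimum at x = 2, y = -1
best, score = grid_search(lambda x, y: (x - 2) ** 2 + (y + 1) ** 2,
                          {"x": [0, 1, 2, 3], "y": [-2, -1, 0]})
print(best, score)  # {'x': 2, 'y': -1} 0
```

The number of evaluations is the product of the grid sizes, so the cost grows quickly as parameters are added.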
Hidden Markov Models
Models sequential data as generated by a sequence of discrete latent (hidden) states; the observations themselves may be continuous or discrete.
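One basic HMM computation is the forward algorithm, which sums over all hidden-state paths to give the probability of an observation sequence. The sketch uses discrete observations and a toy two-state model; all state names and probabilities are made up for illustration. Continuous observations would replace the emission table with a probability density.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observations) under a discrete-emission HMM."""
    # alpha[s] = P(observations seen so far, hidden state at this step = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states) * emit_p[s][obs]
            for s in states
        }
    # Summing over the final hidden state marginalizes it out
    return sum(alpha.values())

# Toy two-state model; every number here is made up for illustration
states = ["Rainy", "Sunny"]
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3}, "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
p = forward(["walk", "shop", "clean"], states, start, trans, emit)
print(p)  # approximately 0.033612
```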
Hierarchical Clustering
Connectivity-based clustering approach that sequentially builds bigger (agglomerative) or smaller (divisive) clusters in the data.
K-means and X-means Clustering
Centroid-based clustering algorithms. K-means requires the number of clusters, K, to be set in advance; X-means is used when the number of clusters is unknown and estimates it from the data.
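A minimal K-means sketch (Lloyd’s algorithm) in plain Python. For simplicity it uses the first `k` points as the initial centroids, whereas practical implementations usually use random or k-means++ initialization. X-means, which additionally chooses the number of clusters, is not shown.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (centroids, clusters). Initial centroids = first k points."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty)
        new = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids, clusters

points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # approximately (0.33, 0.33) and (10.33, 10.33)
```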
Linear, Non-linear, and Integer Programming
Set of techniques for minimizing or maximizing a function over a constrained set of input parameters.
Markov Chain Monte Carlo (MCMC)
A method of sampling, typically used in Bayesian models, to estimate the joint (posterior) distribution of parameters given the data.
Monte Carlo Methods
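Set of computational techniques that use repeated random sampling to obtain numerical results, such as estimates of quantities that are difficult to compute analytically.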
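A classic minimal Monte Carlo illustration: estimating pi from the fraction of uniformly random points in the unit square that fall inside the quarter circle of radius 1. A seeded `random.Random` instance keeps the result reproducible; the error shrinks roughly in proportion to 1/sqrt(n).

```python
import math
import random

def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi: 4 * (fraction of random points inside the quarter circle)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / n_samples

estimate = estimate_pi(200_000)
print(estimate, abs(estimate - math.pi))
```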
Naive Bayes
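Predicts classes using Bayes’ theorem, which states that the probability of an outcome given a set of features is proportional to the probability of the features given that outcome multiplied by the prior probability of the outcome. The method is ‘naive’ because it assumes the features are conditionally independent given the class.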
Neural Networks
Learns salient features in data by adjusting weights between nodes through a learning rule.
Outlier Removal
Method for identifying and removing noise or artifacts from data.
Principal Components Analysis
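Enables dimensionality reduction by identifying orthogonal directions (principal components) that capture the most variance in the data; highly correlated dimensions are summarized by a smaller number of components.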
Random Search
Randomly samples parameter values, keeping any that give a better solution than the best found so far.
Regression with Shrinkage (Lasso)
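Combines variable selection and prediction in a single, possibly biased, linear model. Penalizing the absolute size of the coefficients shrinks them toward zero and sets some exactly to zero, removing those variables from the model.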
Sensitivity Analysis
Involves varying individual parameters of an analytic or model and observing the magnitude of the effect on its output.
Simulated Annealing
Named after a controlled cooling process in metallurgy. By analogy, the algorithm uses a decreasing temperature, set by an annealing schedule, to control how readily it accepts worse solutions and thus how it converges.
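A minimal simulated annealing sketch for minimizing a function of one real variable. It always accepts better candidates, accepts worse ones with probability exp(-delta / T), and lowers the temperature T geometrically at each step (the annealing schedule). The step size, starting temperature, cooling rate, and the `bumpy` test function are illustrative assumptions that would need tuning for a real problem.

```python
import math
import random

def simulated_annealing(f, x0, step=1.0, t0=10.0, cooling=0.995, iters=5000, seed=0):
    """Minimize f(x) over real x by simulated annealing; return (best_x, best_f)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)  # random neighbor of the current point
        fc = f(candidate)
        # Always accept improvements; accept a worse move with probability exp(-delta / t)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # geometric annealing schedule: the temperature shrinks every step
    return best_x, best_f

def bumpy(x):
    """Hypothetical test function with many local minima."""
    return x * x + 10 * math.sin(3 * x)

best_x, best_f = simulated_annealing(bumpy, x0=8.0)
print(best_x, best_f)
```

Early on, the high temperature lets the search climb out of local minima; as it cools, the search becomes increasingly greedy.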
Stepwise Regression
A method of variable selection and prediction that adds or removes variables one step at a time. Akaike’s information criterion (AIC) is used as the metric for selection. The resulting predictive model is based on ordinary least squares, or on a generalized linear model with parameters estimated via maximum likelihood.
Stochastic Gradient Descent
General-purpose optimization method for training neural networks, support vector machines, and logistic regression models. Parameters are updated using the gradient computed on one sample (or a small batch) at a time.
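A minimal sketch of stochastic gradient descent, fitting a one-variable linear regression y ≈ w * x + b by updating the parameters after each individual sample rather than after a full pass over the data. The learning rate, epoch count, and noise-free toy data are illustrative assumptions.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.02, epochs=1000, seed=0):
    """Fit y ≈ w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            err = (w * xs[i] + b) - ys[i]  # prediction error on this one sample
            # Gradient of 0.5 * err**2 is err * x for w and err for b
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated from y = 2x + 1
w, b = sgd_linear_regression(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```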
Support Vector Machines
Projects feature vectors, using a kernel function, into a space where classes are more separable, then finds the boundary that separates the classes with the maximum margin.
Term Frequency / Inverse Document Frequency
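A statistic that measures how important a term is to a document within a corpus: it grows with the term’s frequency in the document and shrinks with the number of documents that contain it.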
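A minimal TF-IDF sketch, assuming whitespace tokenization, term frequency as count divided by document length, and inverse document frequency as log(N / df), where N is the number of documents and df the number of documents containing the term. Other weighting and smoothing variants are in common use (scikit-learn’s `TfidfVectorizer`, for example, smooths the IDF by default).

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return a {term: weight} dict per document.

    tf = term count / document length; idf = log(N / df), where df is the
    number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
weights = tf_idf(docs)
# "the" occurs in every document, so its weight is 0; "dog" occurs in only one
```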
Topic Modeling (Latent Dirichlet Allocation)
Identifies latent topics in text by examining word co-occurrence.
Tree Based Methods
Models structured as trees, in which each branch represents a decision.
T-Test
Hypothesis test for a difference in means between two groups.
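A minimal sketch of Welch’s two-sample t statistic, a t-test variant that does not assume equal variances, using only the `statistics` module. It returns the statistic and the Welch-Satterthwaite degrees of freedom; turning these into a p-value requires the t distribution (for example, SciPy’s `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test on independent samples a and b."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)  # -2.0 8.0
```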
Wrapper Methods
Feature set reduction method that uses a model’s performance with a given set of features as the measure of that feature set’s quality. Can help identify combinations of features that achieve high model performance.