Booz Selection Flashcards
(80 cards)
What is the key question for 1. Describe?
How do I develop an understanding of the content of my data?
What is the key question for 1. Describe | Processing?
How do I clean and separate my data?
What is the key question for 1. Describe | Processing | Filtering?
How do I identify data based on its absolute or relative values?
What is the key question for 1. Describe | Processing | Imputation?
How do I fill in missing values in my data?
What is the key question for 1. Describe | Processing | Dimensionality Reduction?
How do I reduce the number of dimensions in my data?
What is the key question for 1. Describe | Processing | Normalization & Transformation?
How do I reconcile duplicate representations in the data?
What is the key question for 1. Describe | Processing | Feature Extraction?
It depends heavily on the domain of the data; a wide variety of methods apply, so there is no single recommended starting point.
For 1. Describe | Processing | Filtering, If you want to add or remove data based on its value, start with:
Relational algebra projection and selection
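A minimal sketch of projection (column selection) and selection (row filtering) using pandas; the DataFrame and column names are hypothetical:

```python
import pandas as pd

# Hypothetical dataset
df = pd.DataFrame({"sensor": ["a", "b", "a"], "value": [1.2, 9.7, 3.4]})

# Projection: keep only the columns of interest
projected = df[["value"]]

# Selection: keep only rows whose value meets an absolute threshold
selected = df[df["value"] > 2.0]
```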
For 1. Describe | Processing | Filtering, If early results are uninformative and duplicative, start with:
Outlier removal, Exponential smoothing, Gaussian filter, Median filter
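A rough sketch of two of these filters applied to a noisy 1-D signal, using SciPy's Gaussian and median filters (the signal itself is made up):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, median_filter

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 6, 200)) + rng.normal(0, 0.3, 200)

smoothed_gaussian = gaussian_filter1d(signal, sigma=3)  # weights neighbors by a Gaussian
smoothed_median = median_filter(signal, size=5)         # robust to spiky outliers
```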
For 1. Describe | Processing | Imputation, If you want to generate values from other observations in your dataset, start with:
Random sampling, Markov chain Monte Carlo (MCMC)
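A minimal sketch of the random-sampling idea: fill each missing entry by drawing from the observed values of the same column (the data is hypothetical; a full MCMC imputer is out of scope here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series([4.1, np.nan, 3.8, np.nan, 5.0])   # hypothetical column with gaps

observed = s.dropna().to_numpy()
filled = s.copy()
filled[s.isna()] = rng.choice(observed, size=s.isna().sum())  # draw from observed values
```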
For 1. Describe | Processing | Imputation, If you want to generate values without using other observations in your dataset, start with:
Mean, Statistical distributions, Regression models
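A sketch of mean imputation with scikit-learn's SimpleImputer (assuming scikit-learn is available; the array X is made up):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])

imputer = SimpleImputer(strategy="mean")   # replace NaNs with the per-column mean
X_filled = imputer.fit_transform(X)
```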
For 1. Describe | Processing | Dimensionality Reduction, If you need to determine whether there is multi-dimensional correlation, start with:
PCA and other factor analysis
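A minimal PCA sketch with scikit-learn (the data array X is hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 10))  # hypothetical 10-feature data

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # how much variance each component captures
```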
For 1. Describe | Processing | Dimensionality Reduction, If you can represent individual observations by membership in a group, start with:
K-means clustering, Canopy clustering
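A sketch of K-means with scikit-learn, where each observation is then represented by its cluster label (data is made up):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 5))   # hypothetical observations

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_   # each observation is now represented by a single group id
```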
For 1. Describe | Processing | Dimensionality Reduction, If you have unstructured text data, start with:
Term Frequency/Inverse Document Frequency (TF-IDF)
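A minimal TF-IDF sketch with scikit-learn (the toy corpus is invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog barked", "the cat and the dog"]  # toy corpus

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)   # sparse matrix: documents x weighted terms
```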
For 1. Describe | Processing | Dimensionality Reduction, If you have a variable number of features but your algorithm requires a fixed number, start with:
Feature hashing
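A sketch of feature hashing with scikit-learn's FeatureHasher, which maps a variable number of features per observation onto a fixed-width vector (the sample dicts are hypothetical):

```python
from sklearn.feature_extraction import FeatureHasher

# Each observation has a variable number of (feature, value) pairs
samples = [{"color=red": 1, "weight": 2.3}, {"color=blue": 1}]

hasher = FeatureHasher(n_features=16)   # fixed-width output regardless of input features
X = hasher.transform(samples)
```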
For 1. Describe | Processing | Dimensionality Reduction, If you are not sure which features are the most important, start with:
Wrapper methods, Sensitivity analysis
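Recursive feature elimination is one common wrapper method; a sketch with scikit-learn (X and y come from a synthetic dataset):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X, y)
print(rfe.support_)   # mask of the features the wrapper judged most important
```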
For 1. Describe | Processing | Dimensionality Reduction, If you need to facilitate understanding of the probability distribution of the space, start with:
Self-organizing maps
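A sketch using the third-party minisom package (an assumption; any SOM implementation works similarly):

```python
import numpy as np
from minisom import MiniSom   # assumes the minisom package is installed

data = np.random.default_rng(0).random((100, 4))   # hypothetical 4-feature data

som = MiniSom(6, 6, 4, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(data, 500)    # fit the 6x6 map to the data distribution
print(som.winner(data[0]))     # grid cell that best matches the first observation
```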
For 1. Describe | Processing | Normalization & Transformation, If you suspect duplicate data elements, start with:
Deduplication
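A minimal deduplication sketch with pandas (the records and key column are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "value": [10, 10, 20]})   # hypothetical records

deduped = df.drop_duplicates()                                # exact duplicate rows
deduped_by_key = df.drop_duplicates(subset="id", keep="first")  # duplicates on a key
```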
For 1. Describe | Processing | Normalization & Transformation, If you want your data to fall within a specified range, start with:
Normalization
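A sketch of min-max normalization to a specified range with scikit-learn (X is a made-up feature):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [5.0], [10.0]])          # hypothetical feature

scaler = MinMaxScaler(feature_range=(0, 1))   # rescale to the desired range
X_scaled = scaler.fit_transform(X)
```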
For 1. Describe | Processing | Normalization & Transformation, If your data is stored in a binary format, start with:
Format Conversion
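A sketch of converting fixed-width binary records into usable values with the standard-library struct module (the record layout here is invented):

```python
import struct

# Hypothetical layout: each record is a 4-byte int followed by an 8-byte float
raw = struct.pack("<id", 42, 3.14) + struct.pack("<id", 7, 2.71)

records = [struct.unpack_from("<id", raw, offset)
           for offset in range(0, len(raw), struct.calcsize("<id"))]
```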
For 1. Describe | Processing | Normalization & Transformation, If you are operating in frequency space, start with:
Fast Fourier Transform (FFT), Discrete wavelet transform
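A minimal FFT sketch with NumPy (the signal is a made-up 5 Hz tone; a wavelet transform would need an extra package such as PyWavelets):

```python
import numpy as np

t = np.linspace(0, 1, 256, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t)            # hypothetical 5 Hz tone

spectrum = np.fft.rfft(signal)                # frequency-domain representation
freqs = np.fft.rfftfreq(signal.size, d=t[1] - t[0])
```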
For 1. Describe | Processing | Normalization & Transformation, If you are operating in Euclidean space, start with:
Coordinate transform
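One simple coordinate transform, Cartesian to polar, as a NumPy sketch (the points are hypothetical):

```python
import numpy as np

x, y = np.array([3.0, 0.0]), np.array([4.0, 2.0])   # hypothetical Cartesian points

r = np.hypot(x, y)            # radius
theta = np.arctan2(y, x)      # angle, handling all quadrants correctly
```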
What is the key question for 1. Describe | Aggregation?
How do I collect and summarize my data?
For 1. Describe | Aggregation, If you are unfamiliar with the dataset, start with:
Basic statistics (count, mean, standard deviation, range) and summary plots (scatter plots, box plots)
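A quick sketch of these summaries with pandas (the DataFrame is hypothetical; the plots assume matplotlib is installed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).normal(size=(100, 3)),
                  columns=["a", "b", "c"])   # hypothetical dataset

print(df.describe())           # count, mean, std, min/max (range) per column
df.plot.scatter(x="a", y="b")  # scatter plot of two columns
df.boxplot()                   # box plots per column
```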