Booz Selection Flashcards

(80 cards)

1
Q

What is the key question for 1. Describe?

A

How do I develop an understanding of the content of my data?

2
Q

What is the key question for 1. Describe | Processing?

A

How do I clean and separate my data?

3
Q

What is the key question for 1. Describe | Processing | Filtering?

A

How do I identify data based on its absolute or relative values?

4
Q

What is the key question for 1. Describe | Processing | Imputation?

A

How do I fill in missing values in my data?

5
Q

What is the key question for 1. Describe | Processing | Dimensionality Reduction?

A

How do I reduce the number of dimensions in my data?

6
Q

What is the key question for 1. Describe | Processing | Normalization & Transformation?

A

How do I reconcile duplicate representations in the data?

7
Q

What is the key question for 1. Describe | Processing | Feature Extraction?

A

Really depends on the domain of the information. Variety of methods.

8
Q

For 1. Describe | Processing | Filtering, If you want to add or remove data based on its value, start with:

A

Relational algebra projection and selection
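
A minimal sketch of the two operations named above, assuming rows are stored as plain dictionaries. The records and field names (`id`, `city`, `temp_c`) are hypothetical and only serve the illustration: selection keeps rows by value, projection keeps columns by name.

```python
# Hypothetical records; the field names are assumptions for illustration.
rows = [
    {"id": 1, "city": "Boston", "temp_c": 21.5},
    {"id": 2, "city": "Austin", "temp_c": 35.0},
    {"id": 3, "city": "Denver", "temp_c": 28.0},
]

def select(records, predicate):
    """Selection: keep only the rows for which predicate(row) is true."""
    return [row for row in records if predicate(row)]

def project(records, fields):
    """Projection: keep only the named columns of every row."""
    return [{name: row[name] for name in fields} for row in records]

# Filter rows by value, then drop the columns that are not needed.
warm = select(rows, lambda row: row["temp_c"] > 25.0)
warm_cities = project(warm, ["city"])
```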

9
Q

For 1. Describe | Processing | Filtering, If early results are uninformative and duplicative, start with:

A

Outlier removal, Exponential smoothing, Gaussian filter, Median filter
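
Two of the filters listed above can be sketched in a few lines. This is an illustrative version under simple assumptions (a one-dimensional list of numbers, a truncated window at the edges, a smoothing factor `alpha` chosen by the caller), not a tuned implementation.

```python
from statistics import median

def median_filter(values, window=3):
    """Replace each point with the median of a centered window.

    The window size must be odd; near the edges the window is
    truncated to the points that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    filtered = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        filtered.append(median(values[lo:hi]))
    return filtered

def exponential_smoothing(values, alpha=0.5):
    """Simple exponential smoothing: s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)  # the first value seeds the series
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

spiky = [1, 1, 9, 1, 1]
cleaned = median_filter(spiky)  # the isolated spike is removed
```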

10
Q

For 1. Describe | Processing | Imputation, If you want to generate values from other observations in your dataset, start with:

A

Random sampling, Markov Chain Monte Carlo (MCMC)

11
Q

For 1. Describe | Processing | Imputation, If you want to generate values without using other observations in your dataset, start with:

A

Mean, Statistical distributions, Regression models
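
Mean imputation, the first option in this answer, can be sketched as follows. The sketch assumes missing values are encoded as `None`, which is a choice made here for illustration.

```python
from statistics import mean

def impute_mean(values):
    """Fill missing entries (assumed to be None) with the mean of the rest."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

filled = impute_mean([1.0, None, 3.0])
```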

12
Q

For 1. Describe | Processing | Dimensionality Reduction, If you need to determine whether there is multi-dimensional correlation, start with:

A

PCA and other factor analysis

13
Q

For 1. Describe | Processing | Dimensionality Reduction, If you can represent individual observations by membership in a group, start with:

A

K-means clustering, Canopy clustering

14
Q

For 1. Describe | Processing | Dimensionality Reduction, If you have unstructured text data, start with:

A

Term Frequency/Inverse Document Frequency (TF-IDF)
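
A minimal sketch of the weighting, assuming documents are already tokenized into lists of words. It uses one common variant (relative term frequency times the natural log of N over document frequency); other variants add smoothing or normalize differently.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["data", "science", "data"], ["data", "cleaning"]]
scores = tf_idf(docs)
```

A term that appears in every document ("data" here) gets weight zero, while terms unique to one document get a positive weight.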

15
Q

For 1. Describe | Processing | Dimensionality Reduction, If you have a variable number of features but your algorithm requires a fixed number, start with:

A

Feature hashing
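
The hashing trick can be sketched as below: any number of tokens is mapped into a vector of fixed length. The bucket count and the use of MD5 as the hash are assumptions made for the illustration; a stable hash is used because Python's built-in `hash()` for strings varies between runs.

```python
import hashlib

def hash_features(tokens, n_buckets=8):
    """Map a variable-length list of tokens to a fixed-length count vector."""
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 4 bytes of the digest as an integer bucket index.
        index = int.from_bytes(digest[:4], "big") % n_buckets
        vector[index] += 1
    return vector

short = hash_features(["red", "blue"])
long = hash_features(["red", "blue", "green", "red", "large", "round"])
```

Both vectors have the same length regardless of how many tokens went in; distinct tokens may collide in one bucket, which is the accepted cost of the method.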

16
Q

For 1. Describe | Processing | Dimensionality Reduction, If you are not sure which features are the most important, start with:

A

Wrapper methods, Sensitivity analysis

17
Q

For 1. Describe | Processing | Dimensionality Reduction, If you need to facilitate understanding of the probability distribution of the space, start with:

A

Self-organizing maps

18
Q

For 1. Describe | Processing | Normalization & Transformation, If you suspect duplicate data elements, start with:

A

Deduplication

19
Q

For 1. Describe | Processing | Normalization & Transformation, If you want your data to fall within a specified range, start with:

A

Normalization
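
One common form of normalization is min-max scaling, sketched here as an illustration. The target range defaults to [0, 1], and the handling of constant input is an assumption of this sketch.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical; map them to the lower bound (a choice made here).
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

scaled = min_max_scale([10, 20, 30])
```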

20
Q

For 1. Describe | Processing | Normalization & Transformation, If your data is stored in a binary format, start with:

A

Format Conversion

21
Q

For 1. Describe | Processing | Normalization & Transformation, If you are operating in frequency space, start with:

A

Fast Fourier Transform (FFT), Discrete wavelet transform

22
Q

For 1. Describe | Processing | Normalization & Transformation, If you are operating in Euclidean space, start with:

A

Coordinate transform

23
Q

What is the key question for 1. Describe | Aggregation?

A

How do I collect and summarize my data?

24
Q

For 1. Describe | Aggregation, If you are unfamiliar with the dataset, start with:

A

Basic statistics: Count, Mean, Standard deviation, Range, Scatter plots, Box plots

25
For 1. Describe | Aggregation, If your approach assumes the data follows a distribution, start with:
Distribution fitting
26
For 1. Describe | Aggregation, If you want to understand all the information available on an entity, start with:
“Baseball card” aggregation
27
What is the key question for 1. Describe | Enrichment?
How do I add new information to my data?
28
For 1. Describe | Enrichment, If you need to keep track of source information or other user-defined parameters, start with:
Annotation
29
For 1. Describe | Enrichment, If you often process certain data fields together or use one field to compute the value of another, start with:
Relational algebra rename, Feature addition (e.g., Geography, Technology, Weather)
30
What is the key question for 2. Discover?
What are the key relationships in the data?
31
What is the key question for 2. Discover | Clustering?
How do I segment the data to find natural groupings?
32
For 2. Discover | Clustering, If you want an ordered set of clusters with variable precision, start with:
Hierarchical
33
For 2. Discover | Clustering, If you have an unknown number of clusters, start with:
X-means, Canopy, Apriori
34
For 2. Discover | Clustering, If you have text data, start with:
Topic modeling
35
For 2. Discover | Clustering, If you have non-elliptical clusters, start with:
Fractal, DBSCAN
36
For 2. Discover | Clustering, If you want soft membership in the clusters, start with:
Gaussian mixture models
37
For 2. Discover | Clustering, If you have a known number of clusters, start with:
K-means
38
What is the key question for 2. Discover | Regression?
How do I determine which variables may be important?
39
For 2. Discover | Regression, If your data has unknown structure, start with:
Tree-based methods
40
For 2. Discover | Regression, If statistical measures of importance are needed, start with:
Generalized linear models
41
For 2. Discover | Regression, If statistical measures of importance are not needed, start with:
Regression with shrinkage (e.g., LASSO, Elastic net), Stepwise regression
42
What is the key question for 2. Discover | Hypothesis Testing?
How do I test ideas?
43
For 2. Discover | Hypothesis Testing, If you want to compare two groups, start with:
T-test
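
A sketch of the test statistic for comparing two groups. The card only names "T-test"; the Welch (unequal-variance) variant is chosen here as an assumption. Only the t statistic is computed, since turning it into a p-value needs the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    return (mean(sample_a) - mean(sample_b)) / standard_error

group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat = welch_t(group_a, group_b)
```

A statistic far from zero suggests the group means differ; how far is "far" depends on the degrees of freedom and the chosen significance level.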
44
For 2. Discover | Hypothesis Testing, If you want to compare multiple groups, start with:
ANOVA
45
What is the key question for 3. Predict?
What are the likely future outcomes?
46
What is the key question for 3. Predict | Classification?
How do I predict group membership?
47
For 3. Predict | Classification, If you have known dependent relationships between variables, start with:
Bayesian network
48
For 3. Predict | Classification, If you are unsure of feature importance, start with:
Neural nets, Random forests, Deep learning
49
For 3. Predict | Classification, If you require a highly transparent model, start with:
Decision trees
50
For 3. Predict | Classification, If you have fewer than 20 data dimensions, start with:
K-nearest neighbors
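
A minimal k-nearest-neighbors classifier, assuming numeric feature tuples and Euclidean distance. The training points and labels are made up for the illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points closest to query."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional training data with two classes.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
near_origin = knn_predict(points, labels, (0.2, 0.3))
far_corner = knn_predict(points, labels, (5.5, 5.5))
```

Because the prediction rests entirely on distances, the method tends to degrade as the number of dimensions grows, which is consistent with the dimension limit stated on the card.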
51
For 3. Predict | Classification, If you have a large dataset with an unknown classification signal, start with:
Naive Bayes
52
For 3. Predict | Classification, If you want to estimate an unobservable state based on observable variables, start with:
Hidden Markov model
53
For 3. Predict | Classification, If you don't know where else to begin, start with:
Support vector machines (SVM), Random forests
54
What is the key question for 3. Predict | Regression?
How do I predict a future value?
55
For 3. Predict | Regression, If the data structure is unknown, start with:
Tree-based methods
56
For 3. Predict | Regression, If you require a highly transparent model, start with:
Generalized linear models
57
For 3. Predict | Regression, If you have fewer than 20 data dimensions, start with:
K-nearest neighbors
58
What is the key question for 3. Predict | Recommendation?
How do I predict relevant conditions?
59
For 3. Predict | Recommendation, If you only have knowledge of how people interact with items, start with:
Collaborative filtering
60
For 3. Predict | Recommendation, If you have a feature vector of item characteristics, start with:
Content-based methods
61
For 3. Predict | Recommendation, If you only have knowledge of how items are connected to one another, start with:
Graph-based methods
62
What is the key question for 4. Advise?
What course of action should I take?
63
What is the key question for 4. Advise | Logical Reasoning?
How do I sort through different evidence?
64
For 4. Advise | Logical Reasoning, If you have expert knowledge to capture, start with:
Expert systems
65
For 4. Advise | Logical Reasoning, If you're looking for basic facts, start with:
Logical reasoning
66
What is the key question for 4. Advise | Optimization?
How do I identify the best course of action when my objective can be expressed as a utility function?
67
For 4. Advise | Optimization, If your problem is represented by a non-deterministic utility function, start with:
Stochastic search
68
For 4. Advise | Optimization, If approximate solutions are acceptable, start with:
Genetic algorithms, Simulated annealing, Gradient search
69
For 4. Advise | Optimization, If your problem is represented by a deterministic utility function, start with:
Linear programming, Integer programming, Non-linear programming
70
For 4. Advise | Optimization, If you have limited resources to search with, start with:
Active learning
71
For 4. Advise | Optimization, If you want to try multiple models, start with:
Ensemble learning
72
What is the key question for 4. Advise | Simulation?
How do I characterize a system that does not have a closed-form representation?
73
For 4. Advise | Simulation, If you must model discrete entities, start with:
Discrete event simulation (DES)
74
For 4. Advise | Simulation, If there are a discrete set of possible states, start with:
Markov models
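
A sketch of a Markov model over a discrete set of states. The two weather states and their transition probabilities are invented for the example; each row of the transition table gives the probability of moving from one state to each possible next state.

```python
def step(distribution, transition):
    """Advance a probability distribution over states by one time step."""
    states = list(transition)
    return {
        nxt: sum(distribution[cur] * transition[cur].get(nxt, 0.0) for cur in states)
        for nxt in states
    }

# Hypothetical transition probabilities; each row sums to 1.
transition = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}
today = {"sunny": 1.0, "rainy": 0.0}
tomorrow = step(today, transition)
day_after = step(tomorrow, transition)
```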
75
For 4. Advise | Simulation, If there are actions and interactions among autonomous entities, start with:
Agent-based simulation
76
For 4. Advise | Simulation, If you do not need to model discrete entities, start with:
Monte Carlo simulation
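
The classic toy case of Monte Carlo simulation is estimating pi by random sampling, sketched below. The fixed seed is an assumption added so that runs are repeatable; the estimate is approximate and improves slowly as the sample count grows.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of random points in the unit square
    that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)  # seeded generator for repeatable runs
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

approx_pi = estimate_pi(100_000)
```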
77
For 4. Advise | Simulation, If you are modeling a complex system with feedback mechanisms between actions, start with:
System dynamics
78
For 4. Advise | Simulation, If you require continuous tracking of system behavior, start with:
Activity-based simulation
79
For 4. Advise | Simulation, If you already have an understanding of what factors govern the system, start with:
ODEs, PDEs
80
For 4. Advise | Simulation, If you have imprecise categories, start with:
Fuzzy logic