Booz Selection Flashcards
(80 cards)
What is the key question for 1. Describe?
How do I develop an understanding of the content of my data?
What is the key question for 1. Describe | Processing?
How do I clean and separate my data?
What is the key question for 1. Describe | Processing | Filtering?
How do I identify data based on its absolute or relative values?
What is the key question for 1. Describe | Processing | Imputation?
How do I fill in missing values in my data?
What is the key question for 1. Describe | Processing | Dimensionality Reduction?
How do I reduce the number of dimensions in my data?
What is the key question for 1. Describe | Processing | Normalization & Transformation?
How do I reconcile duplicate representations in the data?
What is the key question for 1. Describe | Processing | Feature Extraction?
It depends heavily on the domain of the data; a wide variety of methods apply, so there is no single recommended starting point.
For 1. Describe | Processing | Filtering, If you want to add or remove data based on its value, start with:
Relational algebra projection and selection
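A minimal sketch of projection (column selection) and selection (row filtering) using pandas; the DataFrame and column names are hypothetical:

```python
import pandas as pd

# Hypothetical dataset
df = pd.DataFrame({"sensor": ["a", "b", "a"], "value": [1.2, 9.7, 3.4]})

# Projection: keep only the columns of interest
projected = df[["value"]]

# Selection: keep only rows whose value meets an absolute threshold
selected = df[df["value"] > 2.0]
```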
For 1. Describe | Processing | Filtering, If early results are uninformative and duplicative, start with:
Outlier removal, Exponential smoothing, Gaussian filter, Median filter
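A rough sketch of two of these filters applied to a noisy 1-D signal, using SciPy's Gaussian and median filters (the signal itself is made up):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, median_filter

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 6, 200)) + rng.normal(0, 0.3, 200)

smoothed_gaussian = gaussian_filter1d(signal, sigma=3)  # weights neighbors by a Gaussian
smoothed_median = median_filter(signal, size=5)         # robust to spiky outliers
```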
For 1. Describe | Processing | Imputation, If you want to generate values from other observations in your dataset, start with:
Random sampling, Markov chain Monte Carlo (MCMC)
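A minimal sketch of the random-sampling idea: fill each missing entry by drawing from the observed values of the same column (the data is hypothetical; a full MCMC imputer is out of scope here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series([4.1, np.nan, 3.8, np.nan, 5.0])   # hypothetical column with gaps

observed = s.dropna().to_numpy()
filled = s.copy()
filled[s.isna()] = rng.choice(observed, size=s.isna().sum())  # draw from observed values
```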
For 1. Describe | Processing | Imputation, If you want to generate values without using other observations in your dataset, start with:
Mean, Statistical distributions, Regression models
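A sketch of mean imputation with scikit-learn's SimpleImputer (assuming scikit-learn is available; the array X is made up):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])

imputer = SimpleImputer(strategy="mean")   # replace NaNs with the per-column mean
X_filled = imputer.fit_transform(X)
```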
For 1. Describe | Processing | Dimensionality Reduction, If you need to determine whether there is multi-dimensional correlation, start with:
PCA and other factor analysis
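A minimal PCA sketch with scikit-learn (the data array X is hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 10))  # hypothetical 10-feature data

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # how much variance each component captures
```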
For 1. Describe | Processing | Dimensionality Reduction, If you can represent individual observations by membership in a group, start with:
K-means clustering, Canopy clustering
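A sketch of K-means with scikit-learn, where each observation is then represented by its cluster label (data is made up):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 5))   # hypothetical observations

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_   # each observation is now represented by a single group id
```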
For 1. Describe | Processing | Dimensionality Reduction, If you have unstructured text data, start with:
Term Frequency/Inverse Document Frequency (TF-IDF)
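A minimal TF-IDF sketch with scikit-learn (the toy corpus is invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog barked", "the cat and the dog"]  # toy corpus

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)   # sparse matrix: documents x weighted terms
```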
For 1. Describe | Processing | Dimensionality Reduction, If you have a variable number of features but your algorithm requires a fixed number, start with:
Feature hashing
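A sketch of feature hashing with scikit-learn's FeatureHasher, which maps a variable number of features per observation onto a fixed-width vector (the sample dicts are hypothetical):

```python
from sklearn.feature_extraction import FeatureHasher

# Each observation has a variable number of (feature, value) pairs
samples = [{"color=red": 1, "weight": 2.3}, {"color=blue": 1}]

hasher = FeatureHasher(n_features=16)   # fixed-width output regardless of input features
X = hasher.transform(samples)
```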
For 1. Describe | Processing | Dimensionality Reduction, If you are not sure which features are the most important, start with:
Wrapper methods, Sensitivity analysis
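Recursive feature elimination is one common wrapper method; a sketch with scikit-learn (X and y come from a synthetic dataset):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X, y)
print(rfe.support_)   # mask of the features the wrapper judged most important
```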
For 1. Describe | Processing | Dimensionality Reduction, If you need to facilitate understanding of the probability distribution of the space, start with:
Self-organizing maps
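A sketch using the third-party minisom package (an assumption; any SOM implementation works similarly):

```python
import numpy as np
from minisom import MiniSom   # assumes the minisom package is installed

data = np.random.default_rng(0).random((100, 4))   # hypothetical 4-feature data

som = MiniSom(6, 6, 4, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(data, 500)    # fit the 6x6 map to the data distribution
print(som.winner(data[0]))     # grid cell that best matches the first observation
```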
For 1. Describe | Processing | Normalization & Transformation, If you suspect duplicate data elements, start with:
Deduplication
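A minimal deduplication sketch with pandas (the records and key column are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "value": [10, 10, 20]})   # hypothetical records

deduped = df.drop_duplicates()                                # exact duplicate rows
deduped_by_key = df.drop_duplicates(subset="id", keep="first")  # duplicates on a key
```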
For 1. Describe | Processing | Normalization & Transformation, If you want your data to fall within a specified range, start with:
Normalization
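A sketch of min-max normalization to a specified range with scikit-learn (X is a made-up feature):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [5.0], [10.0]])          # hypothetical feature

scaler = MinMaxScaler(feature_range=(0, 1))   # rescale to the desired range
X_scaled = scaler.fit_transform(X)
```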
For 1. Describe | Processing | Normalization & Transformation, If your data is stored in a binary format, start with:
Format Conversion
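A sketch of converting fixed-width binary records into usable values with the standard-library struct module (the record layout here is invented):

```python
import struct

# Hypothetical layout: each record is a 4-byte int followed by an 8-byte float
raw = struct.pack("<id", 42, 3.14) + struct.pack("<id", 7, 2.71)

records = [struct.unpack_from("<id", raw, offset)
           for offset in range(0, len(raw), struct.calcsize("<id"))]
```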
For 1. Describe | Processing | Normalization & Transformation, If you are operating in frequency space, start with:
Fast Fourier Transform (FFT), Discrete wavelet transform
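A minimal FFT sketch with NumPy (the signal is a made-up 5 Hz tone; a wavelet transform would need an extra package such as PyWavelets):

```python
import numpy as np

t = np.linspace(0, 1, 256, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t)            # hypothetical 5 Hz tone

spectrum = np.fft.rfft(signal)                # frequency-domain representation
freqs = np.fft.rfftfreq(signal.size, d=t[1] - t[0])
```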
For 1. Describe | Processing | Normalization & Transformation, If you are operating in Euclidean space, start with:
Coordinate transform
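One simple coordinate transform, Cartesian to polar, as a NumPy sketch (the points are hypothetical):

```python
import numpy as np

x, y = np.array([3.0, 0.0]), np.array([4.0, 2.0])   # hypothetical Cartesian points

r = np.hypot(x, y)            # radius
theta = np.arctan2(y, x)      # angle, handling all quadrants correctly
```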
What is the key question for 1. Describe | Aggregation?
How do I collect and summarize my data?
For 1. Describe | Aggregation, If you are unfamiliar with the dataset, start with:
Basic statistics (count, mean, standard deviation, range) and summary plots (scatter plots, box plots)
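A quick sketch of these summaries with pandas (the DataFrame is hypothetical; the plots assume matplotlib is installed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).normal(size=(100, 3)),
                  columns=["a", "b", "c"])   # hypothetical dataset

print(df.describe())           # count, mean, std, min/max (range) per column
df.plot.scatter(x="a", y="b")  # scatter plot of two columns
df.boxplot()                   # box plots per column
```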