CAP Study Guide Flashcards
(101 cards)
What are the seven CAP domains?
1. Frame the business problem, 2. Frame the analytics problem, 3. Data, 4. Select methodology / approach, 5. Build model, 6. Deploy solution, 7. Model lifecycle
The five E’s are
ethics, education, experience, examination, and effectiveness
One popular way to frame a business opportunity or problem is to obtain reliable information on
the five W’s: who, what, where, when, and why
Five W’s: Who
are the stakeholders who do one or more of the following: fund the project, use it, create it, or are affected by its outcome?
Five W’s: What
problem/function is the project meant to solve/perform?
Five W’s: When
does the problem occur, or function need to be performed? When does the project need to be completed?
Five W’s: Where
does the problem occur? Or where does the function need to be performed? Are the physical and spatial characteristics articulated?
Five W’s: Why
does the problem occur, or function need to occur?
After the initial analysis, it may be necessary to
refine the problem statement to make it more accurate, more appropriate to the stakeholders, or more amenable to available analytic tools/methods.
In framing the analytics problem, one danger we’re trying to avoid is
“anchoring.”
What is “anchoring”?
People have a tendency to hang on to views that they’ve seen and held before, even if they are incorrect.
How can you help mitigate the anchoring effect?
Remind the team that assumptions are initial and preliminary, not finalized views.
Decomposition
the act of breaking down a higher-level requirement to multiple lower-level requirements
A requirement should be
unitary (no conjunctions such as “and,” “but,” or “or”), positive, and testable
What is EDA?
Exploratory data analysis
DBSCAN stands for
Density-based spatial clustering of applications with noise
DBSCAN is a _____-based _____
density-based clustering algorithm
DBSCAN is one of the _____ algorithms
most common clustering algorithms
DBSCAN works by
grouping points that are closely packed (points with many nearby neighbors) and marking as outliers the points that lie alone in low-density regions (whose nearest neighbors are too far away).
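The grouping described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `dbscan`, the parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense region), the `NOISE` label, and the sample points are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # assumed label for outliers


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # low-density point; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a dense point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is itself dense, so keep expanding
    return labels


# Two tight squares of points plus one isolated point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

With these assumed settings, each square forms its own cluster and the isolated point at (50, 50) is marked as noise.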
R squared is a statistic that will give some information about
the goodness of fit of a model.
In regression, the R squared coefficient of determination is a statistical measure of
how well the regression line approximates the real data points.
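One common way to compute this measure is R squared = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the observed values from their mean. The sketch below fits a least-squares line and evaluates that formula; the helper names `fit_line` and `r_squared` and the sample data are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up, nearly linear data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
r2 = r_squared(ys, preds)
print(round(r2, 4))

# Data that lie exactly on a line give an R squared of 1.
ys_exact = [2 * x + 1 for x in xs]
s, b = fit_line(xs, ys_exact)
r2_exact = r_squared(ys_exact, [s * x + b for x in xs])
print(r2_exact)
```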
R squared is also known as
Coefficient of determination
An R squared of 1 indicates that
the regression line perfectly fits the data.
Low R-squared values are
not always bad and high R-squared values are not always good!
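A small, made-up example of a high R squared that is still misleading: fitting a straight line to data generated from y = x^2 gives an R squared of roughly 0.93, yet the residuals follow a clear pattern (positive at both ends, negative in the middle), which suggests the linear model is the wrong shape. The data and variable names below are assumptions chosen for illustration.

```python
# Curved data: y = x^2 for x = 0..10 (made-up example).
xs = list(range(11))
ys = [x ** 2 for x in xs]

# Least-squares straight line through the curved data.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

preds = [slope * x + intercept for x in xs]
residuals = [y - p for y, p in zip(ys, preds)]

# R squared = 1 - SS_res / SS_tot
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

print(round(r2, 3))
print(residuals)  # the sign pattern reveals the systematic misfit
```

Checking a residual plot alongside R squared is one way to catch this kind of problem.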