CRO Principles Flashcards
(32 cards)
What is a strong CRO Hypothesis? (Hint: Structure)
A testable statement predicting an outcome: Changing [Element X] into [Variation Y] for [Audience Segment Z] will result in [Impact on Metric K] because [Rationale based on Data/Insight].
What is the primary purpose and interpretation of an A/A Test?
Purpose: Validate the testing tool setup and methodology, and understand inherent variance (noise). Interpretation: Run identical versions; expect non-significant results for the primary metric approx. 95% of the time (at 95% confidence). Sample Ratio Mismatch (SRM) should also be checked.
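A minimal simulation sketch (assuming Python with NumPy and SciPy; the traffic volume and 5% base rate are illustrative) showing that comparing two identical variants produces p < 0.05 roughly 5% of the time:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
TRUE_RATE, N, RUNS = 0.05, 10_000, 2_000  # identical variants; illustrative sizes

false_positives = 0
for _ in range(RUNS):
    a = rng.binomial(N, TRUE_RATE)  # conversions in "A"
    b = rng.binomial(N, TRUE_RATE)  # conversions in the identical "A" copy
    # Two-proportion comparison via a 2x2 chi-square (equivalent to a z-test here)
    _, p, _, _ = stats.chi2_contingency([[a, N - a], [b, N - b]], correction=False)
    false_positives += p < 0.05

print(f"False positive rate: {false_positives / RUNS:.3f}")  # expect ~0.05
```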
What is the core difference/advantage of a Bayesian approach in A/B testing?
Calculates full probability distributions for metrics, yielding intuitive results such as P(B>A) (probability B is better than A) and Expected Loss. Allows potentially faster decisions (with caution) compared to fixed-horizon frequentist tests.
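A minimal Beta-Binomial sketch of these two quantities (assuming Python with NumPy; the conversion counts and flat Beta(1, 1) priors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative observed data: conversions / visitors per variation
conv_a, n_a = 480, 10_000
conv_b, n_b = 530, 10_000

# Beta(1, 1) prior + binomial likelihood -> Beta posterior per variation
post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

prob_b_better = (post_b > post_a).mean()                 # P(B > A)
expected_loss_b = np.maximum(post_a - post_b, 0).mean()  # expected loss if we ship B

print(f"P(B > A) = {prob_b_better:.3f}, expected loss (B) = {expected_loss_b:.5f}")
```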
What is Sequential Testing in experimentation? (Concept & Benefit)
A method allowing continuous analysis during a test, using statistical stopping boundaries (for significance or futility) while controlling Type I/II error rates. Benefit: Can significantly reduce average test duration vs. fixed-horizon tests.
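A rough sketch of interim looks against a constant Pocock-style boundary (assuming Python with NumPy/SciPy; the five equally spaced looks, the ≈0.0158 nominal level, and the true rates are illustrative assumptions, not a production-grade design):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N_PER_LOOK, LOOKS = 2_000, 5     # 5 equally spaced interim analyses
POCOCK_ALPHA = 0.0158            # Pocock nominal level for 5 looks, overall alpha ~0.05
RATE_A, RATE_B = 0.050, 0.060    # illustrative true rates (a real uplift exists)

conv = np.zeros(2, dtype=int)
n = np.zeros(2, dtype=int)
for look in range(1, LOOKS + 1):
    conv += rng.binomial(N_PER_LOOK, [RATE_A, RATE_B])
    n += N_PER_LOOK
    # Pooled two-proportion z-test at this interim look
    p_pool = conv.sum() / n.sum()
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n[0] + 1 / n[1]))
    z = (conv[1] / n[1] - conv[0] / n[0]) / se
    p = 2 * stats.norm.sf(abs(z))
    if p < POCOCK_ALPHA:         # crossing the boundary -> stop early
        print(f"Stop at look {look}: p = {p:.4f}")
        break
else:
    print("No boundary crossed; stop at the horizon (or earlier for futility)")
```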
What are Multi-armed Bandit algorithms in CRO? (Concept & Use Case)
Algorithms that dynamically allocate more traffic to better-performing variations during the experiment, balancing learning (explore) against maximizing immediate return (exploit). Use Case: Often used for short-term optimizations (e.g., headlines) or personalization.
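A minimal sketch of Thompson sampling, one common bandit algorithm (assuming Python with NumPy; the three arms and their hidden rates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
TRUE_RATES = [0.04, 0.05, 0.065]  # hidden true conversion rates (illustrative)
successes = np.ones(3)            # Beta(1, 1) prior per arm
failures = np.ones(3)

for _ in range(20_000):
    # Thompson sampling: draw one sample per posterior, play the best draw
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))
    reward = rng.random() < TRUE_RATES[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

plays = successes + failures - 2
print("Traffic share per arm:", np.round(plays / plays.sum(), 3))  # best arm dominates
```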
What is Statistical Significance (p-value)?
The probability of observing a difference between variations at least as large as the one measured, purely by random chance, assuming the null hypothesis (no real difference) is true. Threshold (alpha) is often 0.05.
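A minimal sketch of a two-sided, pooled two-proportion z-test (assuming Python with NumPy/SciPy; the counts are illustrative):

```python
import numpy as np
from scipy import stats

# Illustrative counts: conversions and visitors per variation
conv_a, n_a = 200, 5_000   # control: 4.0%
conv_b, n_b = 245, 5_000   # variant: 4.9%

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * stats.norm.sf(abs(z))   # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")  # compare against alpha = 0.05
```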
What is Confidence Level?
Represents the level of certainty that a test result isn’t due to random chance. If you set a 95% confidence level, you accept a 5% risk (alpha) of concluding there’s a difference when there really isn’t one (a false positive).
What is a Confidence Interval?
An estimated range of values (e.g., for uplift or conversion rate), calculated from sample data, that is likely to contain the true population parameter at the specified confidence level. Indicates the precision of the estimate.
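A minimal sketch of a normal-approximation 95% CI for the absolute uplift, reusing illustrative counts like those in the p-value sketch above (assuming Python with NumPy/SciPy):

```python
import numpy as np
from scipy import stats

conv_a, n_a = 200, 5_000   # control
conv_b, n_b = 245, 5_000   # variant
p_a, p_b = conv_a / n_a, conv_b / n_b

diff = p_b - p_a
se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)  # unpooled SE
z = stats.norm.ppf(0.975)                                     # 95% confidence
lo, hi = diff - z * se, diff + z * se

print(f"Absolute uplift: {diff:.4f}, 95% CI: [{lo:.4f}, {hi:.4f}]")
```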
What determines required Sample Size in testing?
Key factors include: Baseline Conversion Rate (BCR), Minimum Detectable Effect (MDE), desired Statistical Power (1 − beta), and Significance Level (alpha). Variance also matters for continuous metrics.
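A minimal sketch of a standard normal-approximation sample-size formula for two proportions (assuming Python with SciPy; the function name and the 5%/10% inputs are illustrative):

```python
from scipy import stats

def sample_size_per_variation(bcr, mde_rel, alpha=0.05, power=0.80):
    """Approximate n per variation for a two-sided two-proportion test."""
    p1 = bcr
    p2 = bcr * (1 + mde_rel)              # MDE expressed as a relative uplift
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2

# Illustrative: 5% baseline, detect a 10% relative uplift at 80% power
print(f"{sample_size_per_variation(0.05, 0.10):,.0f} users per variation")  # ~31,000
```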
What is Statistical Power? (Definition & Implication)
The probability (1 - beta) of correctly rejecting the null hypothesis when it is false (i.e. detecting a real effect if it exists). Implication: Low power increases the risk of Type II errors (false negatives). Typically set at 80%+.
What is Minimum Detectable Effect (MDE)?
The smallest effect size (e.g., relative uplift) that a test is designed to reliably detect at the specified Power and Significance levels. A crucial input for sample size calculation, chosen based on business needs.
What is a Type I Error (False Positive)? (Definition & Risk Control)
Incorrectly rejecting a true null hypothesis (claiming a difference exists when it doesn’t). Risk is controlled by the significance level (alpha, e.g., 0.05).
What is a Type II Error (False Negative)? (Definition & Risk Control)
Failing to reject a false null hypothesis (missing a real difference when it exists). Risk (beta) is controlled by Statistical Power (Power = 1 - beta).
What is Regression to the Mean & its implication in testing?
Statistical tendency for extreme results on initial measurements to be closer to the average on subsequent measurements. Implication: Distrust unusually large effects seen early in a test; wait for sufficient data.
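A minimal simulation sketch of the effect (assuming Python with NumPy; all 100 “variants” share the same true 5% rate, so the early winner’s edge is pure noise):

```python
import numpy as np

rng = np.random.default_rng(3)
TRUE_RATE, VARIANTS = 0.05, 100        # every variant is actually identical

early = rng.binomial(500, TRUE_RATE, VARIANTS) / 500    # first 500 users each
winner = int(np.argmax(early))                          # looks like a big uplift
later = rng.binomial(20_000, TRUE_RATE) / 20_000        # same variant, more data

print(f"Early 'winner' rate: {early[winner]:.3f}")  # inflated by chance
print(f"Same variant later:  {later:.3f}")          # reverts toward 0.05
```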
What is the ‘Multiple Comparisons Problem’ in experiment analysis?
Analyzing multiple metrics or segments increases the overall probability of making at least one Type I error (false positive) purely by chance. Requires adjustments (e.g., Bonferroni, FDR) or pre-specification of hypotheses.
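A minimal sketch of applying both corrections (assuming Python with statsmodels; the raw p-values are illustrative):

```python
from statsmodels.stats.multitest import multipletests

# Illustrative raw p-values from analyzing several metrics/segments
raw_p = [0.004, 0.012, 0.028, 0.220, 0.650]

reject_bonf, p_bonf, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
reject_fdr, p_fdr, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")

print("Bonferroni rejects:", reject_bonf)  # conservative: controls family-wise error
print("FDR (BH) rejects:  ", reject_fdr)   # less strict: controls false discovery rate
```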
What is Simpson’s Paradox & its risk in CRO?
A trend appears in different groups of data but disappears or reverses when the groups are combined. Risk: Aggregate A/B test results can mislead if segments with different baseline rates are disproportionately represented across variations.
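A minimal worked example (plain Python; the segment counts are constructed so B wins within every segment yet loses in aggregate because it received more low-converting mobile traffic):

```python
# Illustrative counts: (conversions, visitors) per variation within each segment
data = {
    "mobile":  {"A": (20, 400),   "B": (96, 1600)},  # A 5.0% vs B 6.0%: B wins
    "desktop": {"A": (160, 1600), "B": (44, 400)},   # A 10.0% vs B 11.0%: B wins
}

totals = {"A": [0, 0], "B": [0, 0]}
for segment, arms in data.items():
    for arm, (conv, n) in arms.items():
        print(f"{segment:8s} {arm}: {conv / n:.1%}")
        totals[arm][0] += conv
        totals[arm][1] += n

# Aggregated, the unbalanced device mix reverses the result: A 9.0% vs B 7.0%
for arm, (conv, n) in totals.items():
    print(f"overall  {arm}: {conv / n:.1%}")
```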
What are Novelty & Learning Effects in testing? (Definition & Mitigation)
Novelty: The initial user reaction (positive or negative) to the change itself. Learning: The time users take to adapt to a change. Mitigation: Run tests long enough for effects to stabilize; segment by user tenure; monitor metrics over time.
What is Twyman’s Law in data analysis?
“Any figure that looks interesting or different is usually wrong.” Implication: Be highly skeptical of surprising or outlier data points/results; rigorously investigate potential errors in tracking setup or analysis.
What is the challenge with Network Effects / Interference in testing?
Occurs when one user’s experience affects another’s (e.g., social platforms, marketplaces), violating the A/B test assumption of independent observations (SUTVA). Requires alternative designs such as cluster or switchback randomization.
What’s a key consideration for testing Non-Binary Metrics? (e.g., AOV, RPU)
Metrics like AOV or RPU are often not normally distributed. Requires appropriate statistical tests (e.g., t-test variants if assumptions are met, or non-parametric tests like Mann-Whitney U) and careful handling of outliers and variance.
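A minimal sketch comparing a non-parametric and a parametric test on a skewed, zero-heavy revenue metric (assuming Python with NumPy/SciPy; the simulated distributions are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Illustrative revenue-per-user samples: heavy-tailed, mostly zeros, not normal
rev_a = rng.lognormal(3.0, 1.0, 4_000) * (rng.random(4_000) < 0.05)
rev_b = rng.lognormal(3.1, 1.0, 4_000) * (rng.random(4_000) < 0.05)

u_stat, p_u = stats.mannwhitneyu(rev_a, rev_b, alternative="two-sided")
print(f"Mann-Whitney U p = {p_u:.4f}")

# Welch's t-test is a common parametric alternative; check its assumptions first
t_stat, p_t = stats.ttest_ind(rev_a, rev_b, equal_var=False)
print(f"Welch t-test   p = {p_t:.4f}")
```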
What is Sample Ratio Mismatch (SRM)?
When the observed ratio of users/sessions assigned to variations significantly deviates from the intended ratio (e.g., not 50/50). Indicates a potential issue with the randomization or data-collection process, invalidating results.
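A minimal sketch of a chi-square SRM check (assuming Python with SciPy; the counts and the p < 0.001 alert threshold are illustrative conventions):

```python
from scipy import stats

observed = [50_550, 49_450]   # users actually assigned to A and B
expected = [50_000, 50_000]   # intended 50/50 split

chi2, p = stats.chisquare(observed, f_exp=expected)
# A very small p-value (commonly p < 0.001) signals an SRM: investigate
# randomization, redirects, and tracking before trusting any results.
print(f"chi2 = {chi2:.2f}, p = {p:.5f}")
```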
How are Jobs-to-be-Done (JTBD) interviews used in CRO research?
To uncover underlying user goals (functional, social, emotional), informing value propositions and identifying unmet needs and optimization opportunities beyond surface-level features or interactions.
What key insights does Session Recording analysis provide for CRO?
Reveals specific user journeys, identifies friction points (hesitation, rage clicks, U-turns), visualizes interaction with dynamic elements, and provides qualitative context for drop-offs seen in quantitative funnel analysis.
What are Heatmaps/Clickmaps used for in CRO research?
To visualize aggregate user attention (heatmaps) and interaction patterns (clickmaps, including ‘dead clicks’ on non-interactive elements), revealing what users see, ignore, and attempt to engage with on a page.