Terms_and_Definitions Flashcards by Lauren Taylor

A/B Test

A method of comparing two versions of a webpage, feature, or app against each other to determine which performs better.

Null Hypothesis (H₀)

States that there is no real difference between the control and test groups; any observed difference is attributed to chance.

Alternative Hypothesis (H₁)

States that there is a real difference between the control and test groups.

P-Value

The probability of observing results at least as extreme as those measured, assuming the null hypothesis is true.
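As a minimal sketch of this definition, the p-value for a difference in conversion rates can be computed with a two-sided, two-proportion z-test. The function name and the counts below are hypothetical, chosen only for illustration, and the normal approximation assumes reasonably large samples.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value under H0: both groups share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate: the best estimate of the common rate if H0 is true
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 200/2000 conversions in control, 250/2000 in test
p = two_proportion_p_value(200, 2000, 250, 2000)
print(f"p-value: {p:.4f}")
```

A small p-value means the observed gap would be unusual if the null hypothesis were true; it is not the probability that the null hypothesis is true.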

Significance Level (α)

The threshold for rejecting the null hypothesis (commonly set at 0.05).

Confidence Interval (CI)

A range of values that is likely to contain the true effect size or metric with a given level of confidence (e.g., 95%).
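One common way to build such an interval for an A/B test is the normal-approximation (Wald) interval for the difference between two conversion rates. The sketch below assumes large samples; `diff_ci` and the counts are hypothetical names and numbers used for illustration.

```python
import math
from statistics import NormalDist

def diff_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for (rate_b - rate_a)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Unpooled standard error of the difference in proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical counts: 200/2000 conversions in control, 250/2000 in test
low, high = diff_ci(200, 2000, 250, 2000)
print(f"95% CI for the lift: ({low:.4f}, {high:.4f})")
```

If the interval excludes zero, the result is consistent with rejecting the null hypothesis at the matching significance level.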

Control Group

The group that does not receive the treatment or variant being tested.

Test Group

The group that receives the treatment or variant being tested.

Randomization

Assigning participants to groups in a way that each participant has an equal chance of being in any group.
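A simple illustration of equal-chance assignment, assuming two groups and a seeded generator so the split is reproducible. The function name and user IDs are hypothetical.

```python
import random

def assign_groups(user_ids, seed=42):
    """Give every user an equal chance of landing in either group."""
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    return {uid: rng.choice(["control", "test"]) for uid in user_ids}

# Hypothetical user IDs
assignment = assign_groups(range(1000))
counts = {g: list(assignment.values()).count(g) for g in ("control", "test")}
print(counts)
```

Because each user is assigned independently, the group sizes are only approximately equal.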

Power Analysis

A calculation to determine the minimum sample size required to detect a given effect size with sufficient power.
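For a conversion-rate test, one standard approximation for the per-group sample size is n = (z_(1-α/2) + z_power)² · (p₁(1-p₁) + p₂(1-p₂)) / (p₁ - p₂)². The sketch below applies that formula; the baseline and target rates are hypothetical, and other formulas (for example, with a pooled variance) give slightly different answers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group to detect p1 vs p2 with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)

# Hypothetical scenario: baseline 10% conversion, hoping to detect 12%
n = sample_size_per_group(0.10, 0.12)
print(f"Users needed per group: {n}")
```

Smaller effects require larger samples: halving the detectable difference roughly quadruples the required sample size.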

Effect Size

The magnitude of the difference between groups (e.g., a 5% increase in conversion rate).

Type I Error

Incorrectly rejecting the null hypothesis (false positive).

Type II Error

Failing to reject the null hypothesis when it is false (false negative).

Bonferroni Correction

A method to adjust significance levels when multiple comparisons are being made: the significance level α is divided by the number of comparisons.
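A short sketch of the correction: with m comparisons, each p-value is compared against α / m instead of α. The p-values below are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return, per test, whether it stays significant after correction."""
    threshold = alpha / len(p_values)  # stricter per-test threshold
    return [p <= threshold for p in p_values]

# Hypothetical p-values from three variants tested against one control
decisions = bonferroni([0.01, 0.04, 0.03])
print(decisions)
```

With three comparisons the per-test threshold is about 0.0167, so only the first hypothetical result remains significant.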

Simpson’s Paradox

A trend appears in different groups of data but disappears or reverses when the groups are combined.
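The reversal is easiest to see with numbers. The counts below are made up for illustration (they are not from any real experiment): variant A has the higher conversion rate in each segment, yet variant B has the higher rate once the segments are combined, because the variants received very different mixes of traffic.

```python
# Made-up counts: (conversions, visitors) per variant within each segment
segments = {
    "mobile":  {"A": (81, 87),   "B": (234, 270)},
    "desktop": {"A": (192, 263), "B": (55, 80)},
}

def rate(conversions, visitors):
    return conversions / visitors

# Winner within each segment
per_segment_winner = {
    seg: max(variants, key=lambda v: rate(*variants[v]))
    for seg, variants in segments.items()
}

# Winner after pooling the segments together
totals = {
    v: tuple(sum(segments[s][v][i] for s in segments) for i in (0, 1))
    for v in ("A", "B")
}
overall_winner = max(totals, key=lambda v: rate(*totals[v]))

print(per_segment_winner, overall_winner)
```

Checking results per segment, and making sure traffic is balanced across segments, helps catch this situation.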

Descriptive Statistics

Summarizing and describing the features of a dataset (e.g., mean, median, mode).

Inferential Statistics

Using a sample to make generalizations about a population (e.g., hypothesis testing, confidence intervals).

Mean

The average value of a dataset.

Median

The middle value in a dataset when ordered.

Mode

The most frequently occurring value in a dataset.

Variance

A measure of how much values in a dataset vary from the mean, calculated as the average squared deviation from the mean.

Standard Deviation

The square root of the variance, representing data dispersion.
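The five descriptive measures defined on these cards (mean, median, mode, variance, standard deviation) can be computed with Python's built-in `statistics` module. The dataset is hypothetical; the population versions (`pvariance`, `pstdev`) divide by n, while `variance` and `stdev` divide by n - 1 for samples.

```python
import statistics

# Hypothetical dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # average value
median = statistics.median(data)        # middle value when ordered
mode = statistics.mode(data)            # most frequent value
variance = statistics.pvariance(data)   # average squared deviation from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

print(mean, median, mode, variance, std_dev)
```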

Z-Test

A hypothesis test for comparing means when the population variance is known.

T-Test

A hypothesis test for comparing means when the population variance is unknown.
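Because the population variance is unknown, the t-test estimates it from the samples. The sketch below computes the statistic and degrees of freedom for Welch's t-test, one common variant that does not assume equal variances. Converting the statistic to a p-value requires the t distribution (available, for example, in scipy.stats), which is left out here. The samples are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    # Sample variances (n - 1 denominator) scaled by sample size
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical samples
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, dof)
```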

ANOVA (Analysis of Variance)

A test to compare the means of three or more groups.

Chi-Square Test

A test for relationships between categorical variables.
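As an illustration, the chi-square test of independence for a 2x2 table compares observed counts with the counts expected if the two variables were unrelated. This sketch is limited to 2x2 tables (one degree of freedom), where the p-value has a closed form via `math.erfc`; the table is hypothetical.

```python
import math

def chi_square_2x2(table):
    """Chi-square statistic and p-value for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical table: rows = group, columns = (converted, did not convert)
stat, p_value = chi_square_2x2([[200, 1800], [250, 1750]])
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
```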

Linear Regression

A method to model the relationship between a dependent variable and one or more independent variables.
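A minimal sketch of the single-predictor case, fitted by ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. The data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```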

Logistic Regression

A regression model used when the dependent variable is categorical, most commonly binary (e.g., converted or not).

Bayesian Statistics

An approach to statistics that incorporates prior beliefs or evidence.

Frequentist Statistics

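A traditional approach to statistics that interprets probability as the long-run frequency of an event over repeated trials and treats parameters as fixed, unknown values.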

SELECT

A SQL command used to retrieve data from a database.

FROM

Specifies the table to retrieve data from.

WHERE

Filters rows based on conditions.

GROUP BY

Groups rows sharing a property for aggregation.

HAVING

Filters grouped rows based on aggregated values.
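The five clauses defined so far (SELECT, FROM, WHERE, GROUP BY, HAVING) can be seen working together in one query. The sketch below runs it against an in-memory SQLite database through Python's `sqlite3` module; the `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ann", 80, "paid"), ("ann", 50, "paid"), ("bob", 120, "refunded"),
     ("bob", 40, "paid"), ("cy", 200, "paid")],
)

query = """
SELECT customer, SUM(amount) AS total   -- columns to retrieve
FROM orders                             -- source table
WHERE status = 'paid'                   -- filter rows before grouping
GROUP BY customer                       -- one group per customer
HAVING SUM(amount) > 100                -- filter groups after aggregation
"""
rows = sorted(conn.execute(query).fetchall())
print(rows)
```

WHERE removes bob's refunded order before any totals are computed; HAVING then drops bob's group because his remaining total is not above 100.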

JOIN

Combines rows from two or more tables based on a related column.

INNER JOIN

Returns rows with matching values in both tables.

LEFT JOIN

Returns all rows from the left table and matching rows from the right table, with nulls where the right table has no match.

RIGHT JOIN

Returns all rows from the right table and matching rows from the left table, with nulls where the left table has no match.

OUTER JOIN

Returns all rows from both tables, with nulls where no match exists (written as FULL OUTER JOIN in most SQL dialects).
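The difference between join types shows up in which unmatched rows survive. This sketch compares INNER JOIN and LEFT JOIN using Python's `sqlite3` module with hypothetical `users` and `orders` tables. RIGHT and FULL OUTER joins follow the same pattern but are omitted because SQLite only supports them from version 3.39.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.execute("INSERT INTO orders VALUES (1, 30)")  # bob has no orders

# INNER JOIN keeps only users that have a matching order
inner = conn.execute(
    "SELECT u.name, o.amount FROM users u "
    "INNER JOIN orders o ON o.user_id = u.id ORDER BY u.id"
).fetchall()

# LEFT JOIN keeps every user; missing order columns become NULL (None)
left = conn.execute(
    "SELECT u.name, o.amount FROM users u "
    "LEFT JOIN orders o ON o.user_id = u.id ORDER BY u.id"
).fetchall()

print(inner)
print(left)
```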

ORDER BY

Sorts the result set by specified columns.

LIMIT

Restricts the number of rows returned in a query.

Subquery

A query nested within another query.

CTE (Common Table Expression)

A named temporary result set, defined with WITH, that can be referenced within a single SQL query.
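To illustrate the last few cards, the sketch below uses a CTE with ORDER BY and LIMIT to find the top customer, and a subquery to find orders above the average amount. It runs on SQLite through Python's `sqlite3` module; the `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 80), ("ann", 50), ("bob", 40), ("bob", 100), ("cy", 200)],
)

# CTE: name an intermediate result, then query it like a table
top_customer = conn.execute("""
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total FROM totals
    ORDER BY total DESC
    LIMIT 1
""").fetchall()

# Subquery: the inner SELECT supplies a value to the outer WHERE
above_average = conn.execute("""
    SELECT customer, amount FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY amount
""").fetchall()

print(top_customer)
print(above_average)
```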

Pandas

A library for data manipulation and analysis.

NumPy

A library for numerical computations.

Matplotlib

A library for creating static visualizations.

Seaborn

A library for statistical data visualization.

scipy.stats

A module of the SciPy library providing statistical functions and tests.

Statsmodels

A Python module for statistical modeling and hypothesis testing.

A/B Test Simulation

A process to mimic test results using random sampling or bootstrapping.
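A minimal bootstrapping sketch, assuming per-user outcomes are stored as 1 (converted) or 0 (did not convert): each group is resampled with replacement many times, and the spread of the resampled differences gives a percentile interval for the lift. The data, function name, and resample count are hypothetical.

```python
import random

def bootstrap_diff_interval(control, test, n_resamples=1000, seed=0):
    """95% percentile interval for (test rate - control rate)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    low = diffs[int(0.025 * n_resamples)]
    high = diffs[int(0.975 * n_resamples) - 1]
    return low, high

# Hypothetical outcomes: 10% conversion in control, 15% in test
control = [1] * 50 + [0] * 450
test = [1] * 75 + [0] * 425
low, high = bootstrap_diff_interval(control, test)
print(f"Bootstrap 95% interval for the lift: ({low:.3f}, {high:.3f})")
```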

Data Visualization

Representing data graphically to communicate insights.

Dashboard

A visual interface that displays key performance metrics and data.

Power BI

A business analytics tool for creating dashboards and visualizations.

Tableau

A software tool for data visualization and business intelligence.

Funnel Analysis

A method to track user journey and identify drop-off points.
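As a sketch, a funnel can be represented as an ordered list of steps with user counts, and the step-to-step conversion rates reveal where the largest drop-off occurs. The step names and counts are hypothetical.

```python
# Hypothetical funnel: (step name, users who reached it)
funnel = [("visit", 1000), ("signup", 400), ("purchase", 100)]

def step_conversion(steps):
    """Conversion rate from each step to the next one."""
    return [
        (f"{a} -> {b}", count_b / count_a)
        for (a, count_a), (b, count_b) in zip(steps, steps[1:])
    ]

rates = step_conversion(funnel)
# The transition with the lowest rate is the biggest drop-off point
worst_step = min(rates, key=lambda r: r[1])
print(rates, worst_step)
```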

Cohort Analysis

Analyzing behavior by grouping users based on shared characteristics.
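A small sketch of the idea, assuming the shared characteristic is signup month and the behavior of interest is whether the user was active in the following month. The records and field layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (user_id, signup_month, active_in_following_month)
users = [
    ("u1", "2024-01", True),
    ("u2", "2024-01", False),
    ("u3", "2024-02", True),
    ("u4", "2024-02", True),
]

def retention_by_cohort(records):
    """Share of each signup cohort that was active the following month."""
    totals = defaultdict(int)
    retained = defaultdict(int)
    for _, cohort, active in records:
        totals[cohort] += 1
        retained[cohort] += int(active)
    return {cohort: retained[cohort] / totals[cohort] for cohort in totals}

retention = retention_by_cohort(users)
print(retention)
```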

Customer Journey

The path a customer takes from initial interaction to conversion.

Clickstream Data

Data collected about user interactions on a website or app.

Hadoop

A framework for distributed storage and processing of large datasets.

Telemetry

The collection of data about the usage of a digital product.

Data Pipeline

A series of steps to process and analyze data from source to destination.

Hypothesis Validation

The process of testing assumptions with data.

Exploratory Data Analysis (EDA)

Initial analysis to summarize data characteristics.

ETL (Extract, Transform, Load)

A process for collecting, transforming, and storing data.
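The three stages can be sketched with the standard library alone: extract rows from CSV text, transform them by casting types and dropping malformed rows, and load them into a SQLite table. The CSV content, table name, and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input, including one malformed row
RAW_CSV = "user_id,amount\n1, 19.99\n2,not_a_number\n3,5.00\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types and skip rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["user_id"]), float(row["amount"])))
        except ValueError:
            continue
    return clean

def load(rows, conn):
    """Load: store the cleaned rows in the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT user_id, amount FROM payments ORDER BY user_id").fetchall()
print(loaded)
```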

Terms_and_Definitions Flashcards

(65 cards)