1 Descriptive and inferential statistics Flashcards
(16 cards)
What is Statistics?
The science of collecting, analyzing, interpreting, presenting, and organizing data.
What are the two main branches of statistics?
Descriptive Statistics and Inferential Statistics.
What is Descriptive Statistics?
Methods for organizing, summarizing, and presenting data in an informative way (e.g., calculating means, creating graphs). It describes the features of a dataset.
What is Inferential Statistics?
Methods used to draw conclusions or make predictions about a larger population based on data collected from a smaller sample.
Define Population and Sample.
Population: The entire group of individuals, objects, or measurements of interest.
Sample: A subset or portion of the population selected for study.
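As a rough illustration of the population/sample distinction, the sketch below simulates a hypothetical population and draws a random sample from it. The data, seed, and sizes are invented for illustration only.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated measurements (illustrative data only).
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# A sample is a subset of the population selected for study.
sample = rng.sample(population, 100)

# The population mean is a parameter (usually unknown in practice);
# the sample mean is a statistic used to estimate it.
population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Using the sample mean to estimate the population mean is a simple case of inferential statistics.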
What are Quantiles, Quartiles, and Percentiles used for?
They describe data by cutting a probability distribution, or an ordered sample, into contiguous intervals that each hold equal probability or an equal number of observations.
Quantiles: The general term for the cut points that divide ordered data into equal-sized groups.
Quartiles: Divide the data into four equal parts (Q1, Q2/Median, Q3).
Percentiles: Divide the data into 100 equal parts.
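A minimal sketch of computing quartiles and percentiles with Python's `statistics.quantiles`. The data are made up, and the `"inclusive"` interpolation method is one of several conventions, so other tools may report slightly different cut points.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18]  # illustrative values

# n=4 returns the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # 4.5 8.0 11.0

# n=100 returns the 99 percentile cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
print(percentiles[89])  # 15.0
```

The second quartile equals the median, which here is also the 50th percentile.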
What is a Boxplot (or Box-and-Whisker Plot)?
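A graphical representation of the distribution of a dataset based on its five-number summary: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. It helps visualize central tendency and spread, and it helps identify potential outliers. Many boxplots draw the whiskers only to the most extreme points within 1.5 × IQR of the box and mark anything beyond as an outlier.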
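The numbers behind a boxplot can be sketched as follows. The 1.5 × IQR outlier rule used here is a common convention and an assumption of this example, not something the card specifies; the data are made up.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 40]  # illustrative values

# Five-number summary: minimum, Q1, median, Q3, maximum.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, q2, q3, max(data))
print(five_number)  # (2, 4.5, 8.0, 11.0, 40)

# Assumed convention: points beyond 1.5 * IQR from the box are flagged.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [40]
```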
What are Frequency and Density Distributions?
Frequency Distribution: A table or graph showing how often different values or ranges of values occur in a dataset.
Density Distribution: Represents the distribution of continuous data, often shown as a smoothed curve (like a PDF). The total area under the curve is 1, and the area over an interval gives the probability of falling in that interval.
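A small sketch of both ideas, assuming made-up data and an interval width of 1.0: counts per interval give a frequency distribution, and dividing each count by (n × width) gives density heights whose total area is 1.

```python
from collections import Counter

data = [1.2, 1.9, 2.1, 2.4, 2.8, 3.0, 3.3, 3.7, 4.5, 4.9]  # illustrative values
bin_width = 1.0  # assumed interval width

# Frequency distribution: number of observations in each interval [k, k + 1).
frequency = dict(sorted(Counter(int(x // bin_width) for x in data).items()))
print(frequency)  # {1: 2, 2: 3, 3: 3, 4: 2}

# Density heights: scale counts so that bar areas (height * width) sum to 1.
n = len(data)
density = {k: count / (n * bin_width) for k, count in frequency.items()}
total_area = sum(height * bin_width for height in density.values())
```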
What are Distribution Histograms and Curves used for?
They are visual tools used to represent the shape, center, and spread of a dataset’s distribution. Histograms use bars for frequency counts in intervals; curves (like density curves) provide a smoothed representation.
What is the basic concept of Probability?
The measure of the likelihood that a specific event will occur. It’s expressed as a number between 0 (impossibility) and 1 (certainty).
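For a concrete case, assuming all outcomes are equally likely (the classical setting, an assumption of this sketch), a probability is the share of favorable outcomes. The two-dice event below is chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Event of interest: the two dice sum to 7.
favorable = [o for o in outcomes if sum(o) == 7]

p = Fraction(len(favorable), len(outcomes))
print(p)  # 1/6
```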
Why are Theoretical Distributions used?
They are used to model and describe random phenomena or variables. They provide a mathematical function that approximates the probability distribution of real-world data (e.g., Normal, Binomial, Poisson distributions).
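As one example of a theoretical distribution, the Binomial probability of exactly k successes in n independent trials with success probability p is C(n, k) · p^k · (1 − p)^(n − k). The helper name `binomial_pmf` below is hypothetical and only for illustration.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 fair coin flips.
p_five = binomial_pmf(5, 10, 0.5)
print(p_five)  # 0.24609375

# The probabilities over all possible counts sum to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
```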
Define PDF, CDF, and ICDF.
PDF (Probability Density Function): For continuous variables, describes the relative likelihood for a random variable to take on a given value. The area under the PDF curve between two points gives the probability of the variable falling within that range.
CDF (Cumulative Distribution Function): Gives the probability that a random variable is less than or equal to a specific value ‘x’.
ICDF (Inverse Cumulative Distribution Function / Quantile Function): Given a probability ‘p’, it returns the value ‘x’ such that P(X ≤ x) = p. It’s the inverse of the CDF.
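The three functions can be tried out on a standard Normal distribution with Python's `statistics.NormalDist`, where `inv_cdf` plays the role of the ICDF. This is a minimal sketch; the evaluation points are arbitrary.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)

# PDF: height of the density curve at a point (not itself a probability).
density_at_zero = z.pdf(0)          # about 0.3989

# CDF: P(X <= x).
p_below = z.cdf(1.96)               # about 0.975

# Area under the PDF between two points, computed as a difference of CDF values.
p_within_one_sd = z.cdf(1) - z.cdf(-1)   # about 0.6827

# ICDF: the x such that P(X <= x) = p; it undoes the CDF.
x_back = z.inv_cdf(p_below)         # about 1.96
```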
What is the Normal Distribution?
A continuous probability distribution that is symmetrical and bell-shaped. It’s defined by its mean (μ) and standard deviation (σ). Many natural phenomena approximate this distribution.
What are Standard Scores (Z-scores) and Standardization?
Standard Score (Z-score): Measures how many standard deviations a specific data point is away from the mean of its distribution. Formula: Z = (X - μ) / σ.
Standardization: The process of converting data points to Z-scores, resulting in a distribution with a mean of 0 and a standard deviation of 1. This allows comparison of scores from different distributions.
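A short sketch of standardization on made-up data. It treats the data as the whole population, so it uses the population standard deviation (`pstdev`) for σ; with a sample, the sample standard deviation would normally be used instead.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

mu = statistics.fmean(data)       # 5.0
sigma = statistics.pstdev(data)   # 2.0 (population standard deviation)

# Z = (X - mu) / sigma for every data point.
z_scores = [(x - mu) / sigma for x in data]
print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# After standardization the mean is 0 and the standard deviation is 1.
z_mean = statistics.fmean(z_scores)
z_sd = statistics.pstdev(z_scores)
```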
What is a QQ Plot (Quantile-Quantile Plot)?
A graphical tool used to assess whether a dataset follows a particular theoretical distribution (often the Normal distribution). It plots the quantiles of the dataset against the quantiles of the theoretical distribution. If the points fall approximately on a straight line, the data are consistent with that distribution.
What is the purpose of Normality Tests?
Formal statistical procedures (like Shapiro-Wilk or Kolmogorov-Smirnov tests) used to determine whether there is significant evidence to reject the hypothesis that a dataset comes from a normally distributed population.
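To show the general shape of such a test, here is a rough standard-library sketch of a one-sample Kolmogorov-Smirnov test against a fully specified Normal distribution. The function name `ks_normal` is hypothetical, the p-value uses the asymptotic Kolmogorov series (an approximation for finite samples), and the sketch assumes μ and σ are given in advance; if they are estimated from the same data, this p-value is too lenient and a corrected test is needed.

```python
import math
from statistics import NormalDist

def ks_normal(data, mu, sigma):
    """KS statistic D and asymptotic p-value against Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    xs = sorted(data)
    n = len(xs)
    # D is the largest gap between the empirical CDF and the theoretical CDF.
    d = max(
        max(i / n - dist.cdf(x), dist.cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )
    # Asymptotic Kolmogorov series, truncated at 100 terms and clamped to [0, 1].
    lam = math.sqrt(n) * d
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Idealized Normal-looking data: exact standard Normal quantiles.
near_normal = [NormalDist().inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
d_good, p_good = ks_normal(near_normal, 0, 1)

# Clearly non-Normal data: evenly spaced values between 0 and 1.
not_normal = [i / 100 for i in range(100)]
d_bad, p_bad = ks_normal(not_normal, 0, 1)
```

A small p-value (as for `not_normal`) is evidence against normality; a large one (as for `near_normal`) means only that the test found no significant evidence against it.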