Flashcards in statistics exam 2 Deck (63):

## association

### values of one variable tend to occur with certain values of another variable; detected when the conditional distributions differ from the marginal distribution and from each other.

## bias

### a condition where the mean of the statistic values differs from the parameter that the statistic estimates

## bivariate data

### data collected on two variables for each individual in a study.

## central limit theorem

### the name of the statement telling us that the sampling distribution of x bar is approximately normal whenever the sample is large and random.
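A quick simulation can make this card concrete. The sketch below is an illustration under assumptions, not part of the original deck. It uses Python's standard library, a hypothetical right-skewed (exponential) population with μ = 1 and σ = 1, and a helper `sample_means` invented for the example. If the theorem applies, the simulated x bars should center near μ, have a spread near σ/√n, and have roughly 95% of their values within two standard deviations, as a normal distribution would.

```python
import math
import random
import statistics


def sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from a random sample of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def draw_exponential(rng):
    """One observation from a hypothetical exponential population (mu = 1, sigma = 1)."""
    return rng.expovariate(1.0)


n = 50
means = sample_means(draw_exponential, n=n, reps=10_000)

center = statistics.fmean(means)   # should be close to mu = 1
spread = statistics.pstdev(means)  # should be close to sigma / sqrt(n)
# For a normal distribution, about 95% of values lie within 2 standard deviations.
within_2sd = sum(abs(m - center) <= 2 * spread for m in means) / len(means)

print(f"mean of x bars: {center:.3f}")
print(f"sd of x bars:   {spread:.3f} (sigma/sqrt(n) = {1 / math.sqrt(n):.3f})")
print(f"within 2 sd:    {within_2sd:.3f}")
```

Even though the population itself is strongly skewed, the sample means pile up symmetrically around μ.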

## conditional distribution

### the distribution of the values in a single row (or a single column) of a two-way table, expressed as percentages of that row's (or column's) total.

## control chart

### a statistical tool for monitoring the input or output of a process

## control limits

### μ - 3σ/√n and μ + 3σ/√n; used to detect out-of-control signals in a control chart.
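As a small worked illustration (hypothetical numbers, not from the deck), the limits can be computed directly. The function name `control_limits` and the process values μ = 10, σ = 0.4, n = 4 are assumptions made for the example. With them, 3σ/√n = 3(0.4)/2 = 0.6, so the limits are 9.4 and 10.6.

```python
import math


def control_limits(mu, sigma, n):
    """Return (lower, upper) three-sigma control limits for an x bar chart."""
    half_width = 3 * sigma / math.sqrt(n)  # three standard deviations of x bar
    return mu - half_width, mu + half_width


# Hypothetical process: center line mu = 10, process sd sigma = 0.4, samples of size 4.
lower, upper = control_limits(mu=10, sigma=0.4, n=4)
print(f"LCL = {lower:.2f}, UCL = {upper:.2f}")
```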

## correlation coefficient

### a measure of the strength of the linear relationship between two quantitative variables.
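One common way to compute r is to average the products of standardized values: r = (1/(n-1)) Σ ((x - x bar)/s_x)((y - y bar)/s_y), where s_x and s_y are the sample standard deviations. The sketch below assumes that formula and uses a small made-up data set; the function name `correlation` is illustrative. A perfectly linear, increasing relationship gives r = 1.

```python
import statistics


def correlation(xs, ys):
    """Sample correlation r: the average product of standardized x and y values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
print(f"r = {correlation(xs, ys):.3f}")
print(f"r for y = 2x + 1: {correlation(xs, [2 * x + 1 for x in xs]):.3f}")
```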

## disjoint events

### events that cannot occur simultaneously

## distribution of a variable

### a list of the possible values of a variable together with the frequency of each value (probabilities can be given instead of frequencies)

## event

### a single outcome or a combination of outcomes from a random phenomenon

## extrapolation

### predicting a Y value using a value of X that is outside of the range of X values used to obtain the regression equation. This prediction could be very far off.

## inference

### using the value of a sample statistic to draw conclusions about the population parameter.

## influential observation

### an observation that substantially alters the values of slope and y intercept in the regression equation when it is included in the computations.
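To see what "substantially alters" can look like, the sketch below fits a least squares line with and without one extra point placed far to the right of the other X values. The data and the helper `fit_line` are hypothetical, chosen only so the effect is easy to see: the slope is 0.6 without the point and turns negative (about -0.18) with it.

```python
import statistics


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y hat = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a_without, b_without = fit_line(xs, ys)

# Add one hypothetical point whose X value is far from the others.
a_with, b_with = fit_line(xs + [15], ys + [1])

print(f"without point: y hat = {a_without:.2f} + {b_without:.2f} x")
print(f"with point:    y hat = {a_with:.2f} + {b_with:.2f} x")
```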

## law of large numbers

### the fact that the average (x bar) of the observed values in a sample tends to get closer and closer to μ as the sample size increases.
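A simulation sketch of this idea, assuming a fair six-sided die (so μ = 3.5) and Python's `random` module: the running mean x bar is printed at increasing sample sizes and should drift toward 3.5. The seed and the sample sizes are arbitrary choices.

```python
import random


def running_means(n_max, seed=1):
    """Return the running average of n_max simulated fair-die rolls."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_max + 1):
        total += rng.randint(1, 6)  # one roll of a fair die; population mean mu = 3.5
        means.append(total / i)
    return means


means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: x bar = {means[n - 1]:.4f}")
```

Early values can wander noticeably; the law only says the average settles down as n grows.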

## laws of probability

### the basis for hypothesis testing and confidence interval estimation

## least squares

### a method for finding the equation of a line that minimizes the sum of squared residuals.
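For a single explanatory variable the minimizing line has a closed form: slope b = Σ(x - x bar)(y - y bar) / Σ(x - x bar)² and intercept a = y bar - b·x bar. The sketch below (made-up data, illustrative function names `least_squares` and `sse`) computes the line, its residuals y - y hat, and the sum of squared residuals, then shows that a slightly steeper line has a larger sum.

```python
import statistics


def least_squares(xs, ys):
    """Return (a, b) for the line y hat = a + b*x that minimizes the sum of squared residuals."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def sse(xs, ys, a, b):
    """Sum of squared residuals (y - y hat)**2 for the line y hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"y hat = {a:.2f} + {b:.2f} x")
print("residuals:", [round(e, 2) for e in residuals])
print(f"SSE = {sse(xs, ys, a, b):.2f}")
# Any other line does worse, for example one with a slightly steeper slope:
print(f"SSE with slope + 0.1 = {sse(xs, ys, a, b + 0.1):.2f}")
```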

## least squares regression line

### the line with the smallest sum of squared residuals

## lurking variable

### a variable that is not measured (not among the variables studied) but may explain the association between two variables that are measured.

## marginal distribution

### the distribution of the values in the "total" row (or the "total" column) of a two-way table, expressed as percentages of the grand total

## mean of the sampling distribution of x bar

### the mean of all the sample means (x bars) from all possible samples of size n from a population; equals μ
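For a tiny population, "all possible samples" can be listed outright. The sketch assumes a hypothetical population {2, 4, 6, 8} and samples of size n = 2 drawn with replacement (the setting in which the standard deviation of x bar is exactly σ/√n). It shows that the mean of all 16 sample means equals μ = 5 and that their standard deviation equals σ/√n.

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8]              # hypothetical population
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation
n = 2

# Every ordered sample of size n drawn with replacement: 4**2 = 16 samples.
x_bars = [statistics.fmean(s) for s in itertools.product(population, repeat=n)]

print(f"number of samples:  {len(x_bars)}")
print(f"mean of all x bars: {statistics.fmean(x_bars)}  (mu = {mu})")
print(f"sd of all x bars:   {statistics.pstdev(x_bars):.4f}  (sigma/sqrt(n) = {sigma / math.sqrt(n):.4f})")
```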

## μ

### the mean of the population

## no association

### a condition where values of one variable occur independently of values of another variable; detected when the conditional distributions of a two-way table equal the marginal distribution (and each other)
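A minimal sketch of that check, assuming two hypothetical two-way tables of counts stored as lists of rows; the helper names are illustrative. Comparing with a tiny tolerance is a simplification: with real sample data the conditional distributions are rarely exactly equal, so deciding whether they "differ" takes judgment.

```python
def row_conditionals(table):
    """Each row's counts divided by that row's total."""
    return [[count / sum(row) for count in row] for row in table]


def column_marginal(table):
    """Column totals divided by the grand total."""
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(col_totals)
    return [total / grand_total for total in col_totals]


def shows_association(table, tol=1e-9):
    """True if some row's conditional distribution differs from the marginal one."""
    marginal = column_marginal(table)
    return any(
        abs(p - m) > tol
        for row in row_conditionals(table)
        for p, m in zip(row, marginal)
    )


# Hypothetical counts: rows are two groups, columns are "yes" / "no".
independent = [[30, 20], [60, 40]]  # both rows split 60% / 40%
associated = [[30, 20], [20, 30]]   # rows split 60/40 versus 40/60

for name, table in (("independent", independent), ("associated", associated)):
    print(name, row_conditionals(table), column_marginal(table), shows_association(table))
```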

## out-of-control process

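### signaled by one sample mean falling outside the control limits (more than three standard deviations of x bar, 3σ/√n, from the center line μ), or by 9 sample means in a row on the same side (all above or all below) of the center line.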
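The two signals can be checked mechanically. This is a sketch under stated assumptions: the function name `out_of_control_signals` and the process values are hypothetical, a point exactly on the center line is assumed to end a run, and once a run reaches 9 points every further point in that run is also flagged.

```python
import math


def out_of_control_signals(x_bars, mu, sigma, n, run_length=9):
    """Return the indices of sample means that give an out-of-control signal."""
    half_width = 3 * sigma / math.sqrt(n)  # distance from center line to each limit
    signals = []
    run_side, run_count = 0, 0
    for i, xb in enumerate(x_bars):
        # Signal 1: a sample mean outside the three-sigma control limits.
        outside = abs(xb - mu) > half_width
        # Signal 2: run_length sample means in a row on the same side of the center line.
        side = (xb > mu) - (xb < mu)  # +1 above, -1 below, 0 exactly on the line
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side, run_count = side, (1 if side != 0 else 0)
        if outside or run_count >= run_length:
            signals.append(i)
    return signals


# Hypothetical process with mu = 10, sigma = 0.4, n = 4 (limits 9.4 and 10.6).
print(out_of_control_signals([10.1, 9.9, 10.7], mu=10, sigma=0.4, n=4))
print(out_of_control_signals([10.1] * 9, mu=10, sigma=0.4, n=4))
```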

## outlier

### an observation that falls outside the overall pattern of the data set

## parameter

### a characteristic of a population that is usually unknown, such as the mean, median, proportion, or standard deviation computed from all the data in the population; a parameter is a fixed number and does not have variability

## parameter symbols

### μ, σ, and p (mean of population, standard deviation of population, proportion of a population)

## positive association

### high values of one variable tend to associate with high values of another variable.

## probability of an outcome

### the proportion of times the outcome would occur in a very long series of repetitions; it indicates how likely the outcome is.

## process

### a sequence of operations used in production, manufacturing, etc.

## process in statistical control

### a process whose inputs and outputs exhibit natural variation when observed over time

## quality control chart

### a chart plotting the means, x bar, of regular samples of size n against time; this chart is used to assess whether the process is in control.

## quantitative bivariate

### the type of data required for regression analysis

## r

### the symbol for correlation coefficient

## r squared

### the percentage of total variation in the response variable, y, that is explained by the regression equation; in other words, the percentage of total variation in the response variable, y, that is explained by the explanatory variable, X.
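As a check on this definition, r² can be computed as 1 - (unexplained variation)/(total variation in Y), that is 1 - SSE / Σ(y - y bar)², and compared with the square of r. The data are made up and the function name `r_squared` is illustrative.

```python
import math
import statistics


def r_squared(xs, ys):
    """Return (r squared, r) for the least squares line of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    total = sum((y - my) ** 2 for y in ys)  # total variation in Y about y bar
    b = sxy / sxx
    a = my - b * mx
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # SSE
    r = sxy / math.sqrt(sxx * total)
    return 1 - unexplained / total, r


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
r2, r = r_squared(xs, ys)
print(f"r^2 = {r2:.3f}  (r = {r:.3f}, r * r = {r * r:.3f})")
```

Here the line explains 60% of the total variation in Y; the remaining 40% is the unexplained variation (SSE).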

## random

### describes a phenomenon whose individual outcomes are uncertain but which nonetheless has a regular distribution of outcomes in a large number of repetitions.

## regression equation

### a formula for a line that models a linear relationship between two quantitative variables

## residual

### the observed y minus the predicted y; denoted y - y hat

## residual plot

### a diagnostic plot of the residuals against the explanatory variable, used to assess how well the regression line fits the data; complete scatter in a shoebox pattern is good, whereas a megaphone pattern indicates unequal variance in the Y's across the levels of X, and curvature in the form of a smile or a frown indicates that a linear model is not the best fit for the data.

## sample mean, x bar

### the random variable of the sampling distribution of x bar

## sample space

### the list of all possible outcomes of a random phenomenon

## sampling distribution

### a distribution of a statistic; a list of all the possible values of a statistic together with the frequency (or probability) of each value

## sampling distribution of x bar

### a list of all the possible values for x bar together with the frequency (or probability) of each value; in other words, the distribution of all x bars from all possible samples

## sampling variability

### the variability of sample results from one sample to the next; something we must measure in order to effectively do inference

## scatterplot

### a two-dimensional plot of two quantitative variables, used to examine the direction, type (form), and strength of the relationship between them.

## Simpson's paradox

### a condition where a comparison of percentages reverses direction when a third (lurking) variable is ignored; in other words, a condition leading to misinterpretation of the direction of association between two variables caused by ignoring a third variable that is associated with both of the reported variables.
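A small numerical sketch with entirely hypothetical counts (two hospitals, patients classified as being in mild or severe condition): hospital A has the higher survival rate within each condition, yet the lower rate once the conditions are combined, because most of A's patients are in severe condition. Patient condition plays the role of the lurking variable here.

```python
# Hypothetical (survived, total) counts by hospital and patient condition.
counts = {
    "A": {"mild": (95, 100), "severe": (500, 900)},
    "B": {"mild": (810, 900), "severe": (50, 100)},
}


def survival_rate(survived, total):
    return survived / total


def overall_rate(by_condition):
    """Survival rate after ignoring (summing over) patient condition."""
    survived = sum(s for s, _ in by_condition.values())
    total = sum(t for _, t in by_condition.values())
    return survived / total


for condition in ("mild", "severe"):
    rate_a = survival_rate(*counts["A"][condition])
    rate_b = survival_rate(*counts["B"][condition])
    print(f"{condition:>6}: A = {rate_a:.1%}, B = {rate_b:.1%}")

print(f"overall: A = {overall_rate(counts['A']):.1%}, B = {overall_rate(counts['B']):.1%}")
```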

## simulation

### using random numbers to imitate chance behavior
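A minimal simulation sketch, assuming a fair die and Python's `random` module: it estimates the probability of getting at least one six in four rolls as a long-run proportion of repetitions and compares it with the exact value 1 - (5/6)⁴ ≈ 0.518. The event, the seed, and the number of repetitions are arbitrary choices made for the illustration.

```python
import random


def estimate_at_least_one_six(reps, rolls=4, seed=2):
    """Proportion of simulated repetitions with at least one six in `rolls` rolls."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.randint(1, 6) == 6 for _ in range(rolls))
        for _ in range(reps)
    )
    return hits / reps


estimate = estimate_at_least_one_six(100_000)
exact = 1 - (5 / 6) ** 4
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```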

## slope

### a measure of the average change in the response variable for every one unit increase in the explanatory or independent variable

## standard deviation (s)

### a measure of the variability of data in a sample about x bar.

## standard deviation of x bar, also called the standard deviation of the sampling distribution of x bar

### a measure of the variability of the values of the statistic x bar about μ; a measure of the variability of the sampling distribution of x bar; in other words, the "average" amount that the statistic, x bar, deviates from its associated parameter. It is computed as σ/√n.

## statistic

### a number computed from sample data (without any knowledge of the value of a parameter) used to estimate the value of the parameter.

## statistic symbols

### x bar, s, p hat (mean of sample, standard deviation of sample, proportion of sample)

## statistical process control

### a procedure used to check a process at regular intervals to detect problems and correct them before they become serious.

## sum of squared residuals (or error)

### the residuals are squared and added; denoted SSE.

## total variation in Y

### the sum of the squared deviations of the Y observations about their mean, y bar

## two-way table

### a table containing counts for two categorical variables. It has r rows and c columns

## unbiased

### a condition where the mean of the statistic values equals the parameter that the statistic estimates

## unexplained variation

### the sum of squared residuals

## X

### the symbol for explanatory variable

## x bar-chart

### a plot of sample means over time used to assess whether a process is in control

## Y

### the symbol for response variable

## y hat

### the symbol for predicted y

