Flashcards in statistics exam 2 Deck (63):
values of one variable tend to occur with certain values of another variable; detected when the conditional distributions differ from the marginal distribution and from each other.
a condition where the mean of the statistic values differs from the parameter that the statistic estimates
data collected on two variables for each individual in a study.
central limit theorem
the name of the statement telling us that the sampling distribution of x bar is approximately normal whenever the sample is large and random.
the distribution of the values in a single row (or a single column) of a two-way table.
a statistical tool for monitoring the input or output of a process
mu - 3 sigma/sqrt(n) and mu + 3 sigma/sqrt(n); used to detect out-of-control signals in a control chart.
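A minimal Python sketch of the control-limit formula on this card. The function name `control_limits` and the example values (process mean 50, process standard deviation 4, samples of size 16) are illustrative assumptions, not part of the deck.

```python
import math

def control_limits(mu, sigma, n):
    """Return the (lower, upper) 3-sigma control limits for means of samples of size n."""
    margin = 3 * sigma / math.sqrt(n)  # three standard deviations of x bar
    return mu - margin, mu + margin

# Hypothetical process: mean 50, standard deviation 4, samples of size 16.
lower, upper = control_limits(mu=50.0, sigma=4.0, n=16)
print(lower, upper)  # 47.0 53.0
```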
a measure of the strength of the linear relationship between two quantitative variables.
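One way to compute this measure from paired data, sketched in Python. The helper name `correlation` and the sample values are assumptions made for illustration; the formula is the usual sum of cross-deviations divided by the square root of the product of the two sums of squared deviations.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfectly linear increasing relationship gives r = 1,
# and a perfectly linear decreasing relationship gives r = -1.
r_up = correlation([1, 2, 3], [2, 4, 6])
r_down = correlation([1, 2, 3], [6, 4, 2])
```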
events that cannot occur simultaneously
distribution of a variable
a list of the possible values of a variable together with the frequency of each value (probabilities can be given instead of frequencies)
a single outcome or a combination of outcomes from a random phenomenon
predicting a Y value using a value of X that is outside of the range of X values used to obtain the regression equation. This prediction could be very far off.
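A small Python sketch of how a prediction could be flagged when it falls into this situation. The function `predict`, the fitted line (slope 0.6, intercept 2.2), and the X range of 1 to 5 are all hypothetical values chosen for illustration.

```python
def predict(x, slope, intercept, x_min, x_max):
    """Return (predicted y, flag), where flag is True if x lies outside
    the range of X values used to fit the line."""
    outside_range = x < x_min or x > x_max
    return intercept + slope * x, outside_range

# Hypothetical line fitted on X values from 1 to 5.
y_inside, flag_inside = predict(3, slope=0.6, intercept=2.2, x_min=1, x_max=5)
y_outside, flag_outside = predict(10, slope=0.6, intercept=2.2, x_min=1, x_max=5)
print(flag_inside, flag_outside)  # False True
```

The flag does not say how wrong the prediction is, only that the line is being used where no data supported it.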
using results from a sample statistic value to draw conclusions about the population parameter.
an observation that substantially alters the values of slope and y intercept in the regression equation when it is included in the computations.
law of large numbers
The fact that the average (x bar) of observed values in a sample will get closer and closer to mu as the sample size increases.
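A simulation sketch in Python of this behavior, using rolls of a fair six-sided die (population mean 3.5). The die example, the function name `running_means`, and the seed are assumptions chosen for illustration; a different seed gives different early values but the same long-run tendency.

```python
import random

def running_means(n_rolls, seed=0):
    """Simulate fair die rolls and return the sample mean after each roll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 1_000, 100_000):
    # The printed sample mean should drift toward 3.5 as n grows.
    print(n, round(means[n - 1], 3))
```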
laws of probability
the basis for hypothesis testing and confidence interval estimation
a method for finding the equation of a line that minimizes the sum of squared residuals.
least squares regression line
the line with the smallest sum of squared residuals
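A minimal Python sketch of fitting this line with the standard closed-form formulas (slope equals the sum of cross-deviations divided by the sum of squared X deviations; intercept equals y bar minus slope times x bar). The function name `least_squares_line` and the five data points are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x bar, y bar)
    return slope, intercept

# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
print(round(slope, 3), round(intercept, 3))  # 0.6 2.2
```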
a variable that is not measured but explains association between two variables that are measured.
the distribution of the values in the "total" row (or the "total" column) of a two-way table
mean of the sampling distribution of x bar
the mean of all the sample means (x bars) from all possible samples of size n from a population; equals mu
the mean of the population
a condition where values of one variable occur independently of values of another variable; detected when the conditional distributions of a two-way table equal the marginal distribution (and each other)
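A Python sketch of this check on a two-way table of counts: compute each row's conditional distribution and compare it with the marginal distribution from the column totals. The function names and both tables are hypothetical. With real sample data the distributions rarely match exactly, so this is only a descriptive comparison.

```python
import math

def row_conditionals(table):
    """Conditional distribution within each row of a table of counts."""
    return [[count / sum(row) for count in row] for row in table]

def column_marginal(table):
    """Marginal distribution taken from the column totals."""
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(col_totals)
    return [t / grand_total for t in col_totals]

def looks_independent(table, tol=1e-9):
    """True when every row's conditional distribution matches the marginal."""
    marginal = column_marginal(table)
    return all(
        math.isclose(p, m, abs_tol=tol)
        for cond in row_conditionals(table)
        for p, m in zip(cond, marginal)
    )

# Hypothetical tables: rows are groups, columns are response categories.
no_association = [[20, 30], [40, 60]]  # both rows are 40% / 60%
association = [[30, 20], [40, 60]]     # rows are 60% / 40% and 40% / 60%
print(looks_independent(no_association), looks_independent(association))  # True False
```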
one sample mean outside three standard deviations of x bar or 9 sample means in a row above or below the center line.
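A Python sketch of scanning a series of sample means for the two signals described on this card. The function name `out_of_control_signals`, the signal labels, and the example values are assumptions. In this sketch, every point that extends a qualifying run is flagged, not just the ninth.

```python
def out_of_control_signals(means, center, lower, upper, run_length=9):
    """Return a list of (index, reason) for each out-of-control signal."""
    signals = []
    run = 0        # length of the current run on one side of the center line
    prev_side = 0  # +1 above center, -1 below, 0 exactly on it
    for i, m in enumerate(means):
        if m < lower or m > upper:
            signals.append((i, "outside control limits"))
        side = (m > center) - (m < center)
        if side != 0 and side == prev_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        prev_side = side
        if run >= run_length:
            signals.append((i, "run on one side of center line"))
    return signals

# Hypothetical chart: center line 50, control limits 47 and 53.
run_example = out_of_control_signals([50.5] * 9, center=50, lower=47, upper=53)
spike_example = out_of_control_signals([50.2, 49.8, 54.0], center=50, lower=47, upper=53)
```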
an observation that falls outside the overall pattern of the data set
a characteristic of a population that is usually unknown; this could be mean, median, proportion, standard deviation computed on all the data from the population; a parameter does not have variability
mu, sigma, and p (mean of population, standard deviation of population, proportion of a population)
high values of one variable tend to associate with high values of another variable.
probability of an outcome
a measure of the proportion of times an outcome occurs in a very long series of repetitions that gives us an indication of the likelihood of the outcome.
sequence of operations used in production, manufacturing, etc.
process in statistical control
a process whose inputs and outputs exhibit only natural variation when observed over time
quality control chart
a chart plotting the means, x bar, of regular samples of size n against time; this chart is used to assess whether the process is in control.
the type of data required for regression analysis
the symbol for correlation coefficient
the percentage of total variation in the response variable, y, that is explained by the regression equation; in other words, the percentage of total variation in the response variable, y, that is explained by the explanatory variable, X.
a phenomenon that describes the uncertainty of individual outcomes but gives a regular distribution of the outcomes in the long run.
a formula for a line that models a linear relationship between two quantitative variables
the observed y minus the predicted y; denoted y-yhat
a diagnostic plot of the explanatory variable versus the residuals, used to assess how well the regression line fits the data. Complete scatter in a shoebox pattern is good, whereas a megaphone pattern denotes unequal variance in the Y's across levels of X, and curvature in the form of a smile or a frown denotes that the linear model is not the best fit for that data.
sample mean, x bar
the random variable of the sampling distribution of x bar
the list of all possible outcomes of a random phenomenon
a distribution of a statistic; a list of all the possible values of a statistic together with the frequency (or probability) of each value
sampling distribution of x bar
a list of all the possible values for x bar together with the frequency (or probability) of each value; in other words, the distribution of all x bars from all possible samples
the variability of sample results from one sample to the next; something we must measure in order to effectively do inference
a two-dimensional plot used to examine the strength of the relationship between two variables as well as its direction and type.
a condition where the percentages reverse when a third (lurking) variable is ignored; in other words, a condition leading to misinterpretation of the direction of association between two variables caused by ignoring a third variable that is associated with both of the reported variables.
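A Python sketch of this reversal using made-up counts (they are not from any real study). Option A has the higher success rate within each group, yet option B has the higher rate once the groups are pooled, because the group acts as the lurking variable. The names `counts`, `rate`, and `pooled_rate` are hypothetical.

```python
# Hypothetical (successes, total) counts for options A and B in two groups.
counts = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, total):
    """Success proportion."""
    return successes / total

def pooled_rate(option):
    """Success proportion for one option when the grouping variable is ignored."""
    successes = sum(counts[g][option][0] for g in counts)
    total = sum(counts[g][option][1] for g in counts)
    return successes / total

for group, by_option in counts.items():
    print(group, rate(*by_option["A"]), rate(*by_option["B"]))  # A is higher in each group
print("pooled", round(pooled_rate("A"), 3), round(pooled_rate("B"), 3))  # B is higher overall
```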
using random numbers to imitate chance behavior
a measure of the average change in the response variable for every one unit increase in the explanatory or independent variable
standard deviation (s)
a measure of the variability of data in a sample about x bar.
standard deviation of x bar, also called the standard deviation of the sampling distribution of x bar
a measure of the variability of the values of the statistic x bar about mu; a measure of the variability of the sampling distribution of x bar; in other words, the "average" amount that the statistic, x bar, deviates from its associated parameter; computed as sigma/sqrt(n)
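A Python sketch that checks these facts by brute force on a tiny population: list every possible sample of size n, compute each x bar, then compare the mean and standard deviation of those x bars with the population mean and with sigma divided by the square root of n. The population values are hypothetical, and the sketch assumes ordered samples drawn with replacement, which is the case where sigma/sqrt(n) holds exactly.

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8]  # hypothetical population
n = 2                      # sample size

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# Every ordered sample of size n drawn with replacement (4 ** 2 = 16 samples).
sample_means = [statistics.fmean(s) for s in itertools.product(population, repeat=n)]

mean_of_x_bar = statistics.fmean(sample_means)
sd_of_x_bar = statistics.pstdev(sample_means)

print(mean_of_x_bar, mu)                   # both equal 5.0
print(sd_of_x_bar, sigma / math.sqrt(n))   # the two values agree
```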
a number computed from sample data (without any knowledge of the value of a parameter) used to estimate the value of the parameter.
x bar, s, p hat (mean of sample, standard deviation of sample, proportion of sample)
statistical process control
a procedure used to check a process at regular intervals to detect problems and correct them before they become serious.
sum of squared residuals (or error)
the residuals are squared and added; denoted SSE.
total variation in Y
the sum of the squared deviations of the Y observations about their mean, y bar
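A Python sketch tying together the sum of squared residuals (SSE), the total variation in Y, and the proportion of variation explained, using the standard identity r-squared = 1 - SSE / (total variation) for a least squares line. The data are hypothetical, and the line y hat = 2.2 + 0.6x is the least squares line for these five points, taken here as already computed.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
slope, intercept = 0.6, 2.2  # least squares line for this data, assumed precomputed

# Residual = observed y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

# Total variation = squared deviations of Y about the mean of Y.
y_bar = sum(ys) / len(ys)
total_variation = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / total_variation
print(round(sse, 3), total_variation, round(r_squared, 3))  # 2.4 6.0 0.6
```

Here 60% of the total variation in Y is explained by the regression on X, and the remaining 40% is the SSE share.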
a table containing counts for two categorical variables. It has r rows and c columns
a condition where the mean of the statistic values equals the parameter that the statistic estimates
the sum of squared residuals
the symbol for explanatory variable
a plot of sample means over time used to assess whether a process is in control
the symbol for response variable
the symbol for predicted y