Flashcards in Exam #1 Deck (82):
a collection of methods for planning studies and experiments, obtaining data, then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions from data.
observations that have been collected, such as measurements or responses
the complete collection of all measurements or data that are being considered
the collection of data from every member of the population
a sub-collection of members selected from a population
a numerical measurement describing some characteristic of a population
A numerical measurement describing some characteristic of a sample
(numerical data) consists of numbers representing counts or measurements
(qualitative data) consists of names or labels that can be separated into different categories distinguished by some non-numerical characteristic
the data may take on any of a finite or "countable" number of possible values (example: the number of eggs a hen lays in a day can be counted: 0, 1, 2, 3, etc.)
the data may take on any value over a continuous range of infinitely many possible values (example: the amount of milk a cow gives in a day cannot be counted; it could be any value in a range, such as 1.666...)
Voluntary Response Sample
the respondents themselves decide whether or not to be included in the sample (such as a call-in or online poll). The people who choose to participate may have different opinions or characteristics than those who do not.
the subjects of the study are asked to report results about themselves, which may lead them to give answers that seem desirable rather than accurate; it is better for the researcher to measure the data directly.
a conclusion is drawn about a large population based on a sample of a small number of subjects, which may or may not represent the population as a whole.
a question used in a survey may contain language which influences the subject's response
Order of Questions or Words
the order of the words in a question or of the possible answers may affect the response
some of the subjects either refuse to respond to a question or are unavailable; this may skew results because those who refuse to talk are likely to be different from those who are willing to talk (this may lead to a voluntary response sample).
a statistical study is sponsored (paid for) by a party that is trying to promote its own interest.
What are the 7 Potential Statistical Flaws?
1. Voluntary Response Sample
2. Reported Results
3. Small Samples
4. Loaded Questions
5. Order of Questions or Words
6. Nonresponse
7. Self-Interest Study
a study in which we observe and measure specific characteristics but do not attempt to modify the subjects of the study
a study in which we apply some treatment to the subjects, then watch and observe the effects of that treatment
What are the 6 different sampling methods?
1. Random sample (a general term; it can be used in conjunction with the others)
2. Simple random sample
3. Systematic sampling
4. Convenience sampling
5. Stratified sampling
6. Cluster sampling
members from the population are selected in such a way that each individual member has an equal chance of being selected
Simple random sample
a sample of size n is selected in such a way that every possible sample of size n has an equal chance of being selected (BEST METHOD)
a starting point is selected randomly, then every kth element in the population is selected
the sample is made up of data that are easy to obtain
the population is divided into at least two different subgroups (strata) so that subjects within the same subgroup share the same characteristic, then a sample is taken from each subgroup
the population is divided into "clusters" (often based on location), then some of those clusters are selected randomly and all the members of the selected clusters are sampled
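The random sampling methods described in these cards can be sketched in Python. This is a minimal illustration that assumes the population is a Python list and uses the standard random module. The function names and the strata/clusters dictionaries are hypothetical and were chosen only for this example. Convenience sampling is not included because it involves no random step.

```python
import random

def simple_random_sample(population, n, rng):
    # rng.sample picks n distinct members so that every subset of size n
    # is equally likely to be chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a random starting position among the first k members,
    # then take every kth member after it.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata (hypothetical) maps a subgroup name to the list of its members;
    # a simple random sample is drawn from each subgroup.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # clusters (hypothetical) maps a cluster name, e.g. a location, to its
    # members; whole clusters are chosen at random and every member is kept.
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(42)  # fixed seed so the example is repeatable
population = list(range(1, 101))
print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 10, rng))
```

Passing in a seeded random.Random object keeps the results repeatable, which makes the behavior easier to check.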
What are the 2 types of error in sampling?
1. Sampling error - the difference between the sample result and the true population result; such an error results naturally from chance sample fluctuations
2. Non-sampling error - the result of human error, including such factors as wrong data entries, computing errors, questions with biased wording, false data provided by respondents, etc.
What are 4 characteristics of data?
a representative value that indicates where the middle of the data set is located (average value)
a measure of the amount that the data values vary
the nature or shape of the spread of the data over the range of values
sample values that lie very far away from the vast majority of the other sample values
a table listing data values (either individually or by class) along with their corresponding frequencies.
Lower class limits
the smallest numbers that can belong to each of the classes
Upper class limits
the largest numbers that can belong to each of the classes
the midpoints of the classes (found by adding the lower class limit to the upper class limit, then dividing by two.)
the difference between two consecutive lower class limits (found by subtracting two consecutive lower class limits)
What is the procedure for constructing a frequency distribution?
1. Decide how many classes you want to use.
2. Calculate the class width: class width = (largest value - smallest value) / number of classes, rounded up to a convenient number
3. Choose the lower limit of the first class (should be the smallest data value or just below)
4. Using the first lower class limit and the class width, list all lower class limits
5. List all upper class limits
6. Tally the data in each class to find the frequency
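The procedure above can be sketched in Python. This sketch assumes whole-number data, so each upper class limit is one less than the next lower class limit. The function name is hypothetical. To round the class width up, it takes the floor and adds 1. With this choice, an exact division still produces classes wide enough to include the maximum value.

```python
def frequency_distribution(data, num_classes):
    """Return a list of (lower_limit, upper_limit, frequency) rows.

    Assumes whole-number data (hypothetical helper for illustration).
    """
    lo, hi = min(data), max(data)
    # Step 2: class width = (max - min) / classes, rounded up to a whole
    # number; floor + 1 also handles an exact division, which would
    # otherwise leave the maximum value outside every class.
    width = (hi - lo) // num_classes + 1
    rows = []
    for i in range(num_classes):
        lower = lo + i * width                          # step 4: lower limits
        upper = lower + width - 1                       # step 5: upper limits
        freq = sum(lower <= x <= upper for x in data)   # step 6: tally
        rows.append((lower, upper, freq))
    return rows

data = [2, 3, 5, 7, 8, 8, 10, 12, 13, 15]
for lower, upper, freq in frequency_distribution(data, 3):
    print(f"{lower}-{upper}: {freq}")
# 2-6: 3
# 7-11: 4
# 12-16: 3
```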
Relative Frequency Distribution
a frequency distribution in which frequencies are replaced by relative frequencies (proportions or percentages)
How do you calculate relative frequency?
Relative frequency = frequency/total number of observations
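The relative frequency formula takes only a few lines. This is a minimal sketch with hypothetical function names. The example uses made-up class frequencies of 3, 4, and 3, for 10 observations in total.

```python
def relative_frequencies(frequencies):
    # Relative frequency = class frequency / total number of observations.
    total = sum(frequencies)
    return [f / total for f in frequencies]

def percentages(frequencies):
    # The same proportions as percentages; multiplying before dividing
    # keeps whole-number percentages exact in floating point.
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]

print(relative_frequencies([3, 4, 3]))  # [0.3, 0.4, 0.3]
print(percentages([3, 4, 3]))           # [30.0, 40.0, 30.0]
```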
a bar graph in which the horizontal axis represents the data classes, the vertical scale represents frequencies, and the height of each bar corresponds to the frequency of its class
Normal (bell shaped) distribution form
the highest frequency occurs in the middle and the frequencies tail off to the left and right
the frequencies of the classes are equal
Skewed right or left
the histogram is not symmetric, with higher frequencies occurring on one side than on the other
What are 4 types of statistical graphs?
1. Frequency Polygon
3. Pareto Charts
4. Pie Chart
a display in which quantitative data are separated into two parts: the stem (the leftmost digits) and the leaf (the rightmost digit). Benefit: it preserves the original data values.
a bar graph used for qualitative data, with the bars arranged in descending order of frequency: the highest frequencies on the left and the lowest on the right.
Measure of Center
a value at the center or middle of a data set
What are the 4 measures of center?
1. Mean (calculated by adding all the data values, then dividing by the number of values)
2. Median (calculated by arranging the data values in ascending order, then selecting the middle value; if there is an even number of values, the median is the mean of the two middle values)
3. Mode (the data value that occurs most frequently in the data set)
4. Midrange - the value midway between the lowest and highest values in the data set (calculated by adding the smallest and largest values, then dividing by 2)
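Python's standard statistics module can compute the four measures of center. This is a minimal sketch with hypothetical helper names. A hand-written mode helper is used because statistics.multimode returns every value when no value repeats, while these cards treat that case as having no mode.

```python
import statistics
from collections import Counter

def modes(data):
    # Value(s) occurring most often; an empty list means no value repeats.
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

def measures_of_center(data):
    return {
        "mean": statistics.mean(data),
        # median averages the two middle values when n is even
        "median": statistics.median(data),
        "mode": modes(data),
        "midrange": (min(data) + max(data)) / 2,
    }

print(measures_of_center([2, 3, 3, 5, 7, 10]))
```

For this data set, the mean is 5, the median is 4.0 (the average of 3 and 5), the mode is 3, and the midrange is 6.0.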
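What is the relationship between the mean, median, and mode in a normal distribution?
The mean, median, and mode are all equal, located at the center.
What is the relationship between the mean, median, and mode in a skewed right distribution?
Typically mode < median < mean (the mean is pulled toward the long right tail).
What is the relationship between the mean, median, and mode in a skewed left distribution?
Typically mean < median < mode (the mean is pulled toward the long left tail).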
the extent to which data values vary from each other
the difference between the maximum and minimum values of a data set
How do you calculate the range?
Range = maximum data value - minimum data value
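The range formula as a one-line Python function (hypothetical name):

```python
def data_range(data):
    # Range = maximum data value - minimum data value
    return max(data) - min(data)

print(data_range([2, 3, 3, 5, 7, 10]))  # 8
```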
a measure of how much the data values deviate from the mean
What is the procedure to calculate the standard deviation of sample data?
1. Compute the sample mean
2. Subtract the mean from each data value, to obtain a list of deviations of the form (x-mean)
3. Square each of the differences obtained in step 2
4. Find the sum of all the squared values
5. Divide the sum by n-1, where n is the number of values in the sample
6. Take the square root
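Here the six steps above are followed one line at a time. This is a sketch with a hypothetical function name. In practice, Python's statistics.stdev gives the same sample result.

```python
import math

def sample_standard_deviation(data):
    n = len(data)
    mean = sum(data) / n                     # 1. compute the sample mean
    deviations = [x - mean for x in data]    # 2. deviations (x - mean)
    squared = [d ** 2 for d in deviations]   # 3. square each deviation
    total = sum(squared)                     # 4. sum the squared values
    variance = total / (n - 1)               # 5. divide by n - 1
    return math.sqrt(variance)               # 6. take the square root

print(round(sample_standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]), 3))  # 2.138
```

For this data set, the mean is 5 and the squared deviations sum to 32, so s = sqrt(32/7).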
What is the procedure to calculate the standard deviation of a population?
Same formula as for the standard deviation of sample data except you divide the sum by N rather than n-1
a measure of variation equal to the square of the standard deviation
How do you calculate variance?
Use the same procedure as for the standard deviation, but do not take the square root.
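The sample and population versions differ only in the divisor, and the variance is the standard deviation before the square root. This sketch uses hypothetical function names and a population flag. Python's statistics module also provides variance, pvariance, stdev, and pstdev for the same calculations.

```python
import math

def variance(data, population=False):
    # Sum of squared deviations from the mean, divided by N for a
    # population or by n - 1 for a sample.
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n if population else n - 1)

def standard_deviation(data, population=False):
    # The standard deviation is the square root of the variance.
    return math.sqrt(variance(data, population))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, population=True))            # 4.0
print(standard_deviation(data, population=True))  # 2.0
```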
Characteristics that apply to data sets whose distributions are approximately bell-shaped:
1. about 68% of all data values fall within 1 s.d. (standard deviation) of the mean.
2. about 95% of all data values fall within 2 s.d. of the mean
3. about 99.7% of all data values fall within 3 s.d. of the mean
What data values are considered unusual?
Values that lie more than 2 s.d. from the mean
How do you find unusual values? What two values do you need to know?
Need to know the mean and the standard deviation.
Maximum usual value = mean + 2(standard deviation)
Minimum usual value = mean - 2(standard deviation)
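The two limits above can be computed directly from the mean and standard deviation. The function names and the example mean of 100 and s.d. of 15 are made up for illustration. A value exactly on a limit is treated as usual here.

```python
def usual_limits(mean, sd):
    # Values within 2 standard deviations of the mean are treated as usual.
    return mean - 2 * sd, mean + 2 * sd

def unusual_values(data, mean, sd):
    low, high = usual_limits(mean, sd)
    return [x for x in data if x < low or x > high]

print(usual_limits(100, 15))                        # (70, 130)
print(unusual_values([65, 90, 130, 140], 100, 15))  # [65, 140]
```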
Coefficient of Variation (definition and formula)
describes the standard deviation as a percentage of the mean (CV = standard deviation/mean x 100%)
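A minimal sketch of the CV formula for sample data, using Python's statistics module. The function name is hypothetical. The formula assumes a positive mean.

```python
import statistics

def coefficient_of_variation(data):
    # CV = sample standard deviation / sample mean * 100%
    return statistics.stdev(data) / statistics.mean(data) * 100

print(round(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]), 2))  # 42.76
```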
Z Score (definition and formula)
(standardized value) the number of standard deviations that a given data value is above or below the mean (z = (x - mean)/standard deviation)
How do you identify unusual values using z Scores?
Unusual values: z score < -2 or z score > 2
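The z score formula and the "beyond ±2" rule as a short sketch. The function names and the example numbers (mean 100, s.d. 15) are hypothetical.

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies above (+) or below (-) the mean.
    return (x - mean) / sd

def is_unusual(x, mean, sd):
    z = z_score(x, mean, sd)
    return z < -2 or z > 2

print(z_score(130, 100, 15))     # 2.0
print(is_unusual(140, 100, 15))  # True
```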
numbers which divide a set of data into 100 groups with about 1% of the values in each group.
If someone is in the 99th percentile, which top percentage of the group are they in?
They would be in the top 1%
If someone scored in the 68th percentile, what does that mean?
They scored higher than about 68% of the class
How do you calculate the percentile corresponding to a data value?
Percentile of value x = (number of values less than x/total number of values) x 100
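The percentile formula as a short sketch. The function name is hypothetical. Rounding half up to the nearest whole number is an assumed convention. Python's built-in round is avoided because it rounds halves to the nearest even number.

```python
import math

def percentile_of(x, data):
    # Percentile of x = (number of values less than x / total values) * 100,
    # rounded half up to the nearest whole number (assumed convention).
    below = sum(1 for v in data if v < x)
    return math.floor(below * 100 / len(data) + 0.5)

scores = list(range(1, 21))  # hypothetical scores 1..20
print(percentile_of(15, scores))  # 70
```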
numbers which divide a set of data into four groups, with about 25% of the values in each group
a list of the minimum value, the three quartiles, and the maximum value for a data set
A graph of the data set consisting of a line extending from the minimum value to the maximum value and a box with lines drawn at each of the three quartiles
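A five-number summary can be built with statistics.quantiles (Python 3.8+), which uses its default "exclusive" interpolation method. There is more than one rule for computing quartiles. This method may give slightly different values than a textbook's hand method, so treat it as one reasonable choice. The function name is hypothetical.

```python
import statistics

def five_number_summary(data):
    # minimum, Q1, Q2 (median), Q3, maximum
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.25, 4.5, 6.75, 8)
```

These five values are exactly what a boxplot draws: the whiskers run from the minimum to the maximum, and the box has lines at Q1, the median, and Q3.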
a process with uncertain results that can be repeated (ex: rolling a die, measuring the head circumference of a baby)
any collection of results or outcomes of a procedure
the set of all possible outcomes for a procedure
What does P(A) mean?
The probability of event A occurring
Relative Frequency Approximation
Conduct or observe a procedure a large number of times, and count the number of times event A occurs. P(A) can then be estimated as the number of times A occurred divided by the total number of trials.
(P(A) = number of times event A occurred/number of times the procedure was repeated)
(most commonly used) assume that a procedure has n different outcomes each with an equal chance of occurring. (P(A) = number of ways A can occur/number of possible outcomes)
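The two approaches can be compared on the event "rolling an even number with one die". This is a sketch. The function names are hypothetical, and the relative frequency estimate comes from a seeded simulation, so it is only close to 1/2 rather than exactly 1/2.

```python
import random
from fractions import Fraction

def classical_probability(event, sample_space):
    # P(A) = number of ways A can occur / number of equally likely outcomes
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

def relative_frequency_probability(event, procedure, trials, rng):
    # Repeat the procedure many times and count how often A occurs;
    # P(A) is approximately (times A occurred) / (number of trials).
    occurrences = sum(1 for _ in range(trials) if event(procedure(rng)))
    return occurrences / trials

def is_even(outcome):
    return outcome % 2 == 0

def roll_die(rng):
    return rng.randint(1, 6)

die = [1, 2, 3, 4, 5, 6]
print(classical_probability(is_even, die))  # 1/2
rng = random.Random(0)
print(relative_frequency_probability(is_even, roll_die, 10_000, rng))
```

The simulated estimate is close to 0.5. As the number of trials grows, the estimate tends to get closer to the classical value.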