Exam #1 Flashcards Preview

Flashcards in Exam #1 Deck (82):
1

Statistics

a collection of methods for planning studies and experiments, obtaining data, then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions from data.

2

Data

observations that have been collected, such as measurements or responses

3

Population

the complete collection of all measurements or data that are being considered

4

Census

the collection of data from every member of the population

5

Sample

a sub-collection of members selected from a population

6

Parameter

a numerical measurement describing some characteristic of a population

7

Statistic

A numerical measurement describing some characteristic of a sample

8

Quantitative Data

(numerical data) consists of numbers representing counts or measurements

9

Categorical Data

(qualitative data) consists of names or labels that can be separated into different categories distinguished by some non-numerical characteristic

10

Discrete Data

the data may take on any of a finite or "countable" number of possible values (example: how many eggs does a hen lay in a day?...can be counted: 1, 2, 3, etc)

11

Continuous Data

the data may take on any value over a continuous range of infinitely many possible values (example: how much milk does a cow give in a day? The amount is measured rather than counted: it could be 1.666666...)

12

Voluntary Response Sample

the respondents themselves decide whether or not to be included in the sample (for example, a call-in phone poll). The people who choose to participate may have different opinions or characteristics than those who do not.

13

Reported Results

the subjects of the study are asked to report results about themselves, which may lead them to give the answers they think are desired; it is better for the researcher to take the measurements directly.

14

Small samples

a conclusion is drawn about a large population based on a sample of a small number of subjects, which may or may not represent the population as a whole.

15

Loaded Questions

a question used in a survey may contain language which influences the subject's response

16

Order of Questions or Words

the order of the words in a question or of the possible answers may affect the response

17

Nonresponse

some of the subjects either refuse to respond to a question or are unavailable; this may skew results because those who refuse to talk are likely to be different from those who are willing to talk (this may lead to a voluntary response sample).

18

Self-Interest Study

a statistical study is sponsored (paid for) by a party that is trying to promote its own interest.

19

What are the 7 Potential Statistical Flaws?

1. Voluntary Response Sample
2. Reported Results
3. Small Samples
4. Loaded Questions
5. Order of Questions or Words
6. Nonresponse
7. Self-Interest Study

20

Observational Study

a study in which we observe and measure specific characteristics but do not attempt to modify the subjects of the study

21

Experiment

a study in which we apply some treatment to the subjects, then watch and observe the effects of that treatment

22

What are the 6 different sampling methods?

1. Random sample (not specific - could be used in conjunction with the others)
2. Simple random sample
3. Systematic sampling
4. Convenience sampling
5. Stratified sampling
6. Cluster sampling

23

Random sample

members from the population are selected in such a way that each individual member has an equal chance of being selected

24

Simple random sample

a sample of size n is selected in such a way that every possible sample of size n has an equal chance of being selected (BEST METHOD)

25

Systematic sampling

a starting point is selected randomly, then every kth element in the population is selected

26

Convenience sampling

data is collected that is easy to access

27

Stratified sampling

the population is divided into at least two different sub-groups that share the same characteristics, then a sample is taken from each subgroup

28

Cluster sampling

the population is divided into "clusters" based on location, then some of those clusters are selected randomly and all the members of the selected clusters are sampled
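
The sampling methods on the preceding cards can be sketched in a few lines of Python. This is only an illustration: the function names, the made-up population of 100 numbered members, and the example strata and clusters are assumptions, not part of the deck.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible group of n members is equally likely to be chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Random starting point among the first k members, then every kth member.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata maps a subgroup label to its members; sample from EVERY subgroup.
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # clusters maps a location label to its members; pick whole clusters
    # at random and keep ALL members of each chosen cluster.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for label in chosen for member in clusters[label]]

rng = random.Random(0)                      # fixed seed so runs are repeatable
population = list(range(1, 101))            # hypothetical members 1..100
strata = {"freshmen": list(range(1, 51)), "seniors": list(range(51, 101))}
clusters = {"north": [1, 2, 3], "south": [4, 5, 6], "east": [7, 8, 9]}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, rng)
```

Note the contrast between the last two: stratified sampling takes some members from every subgroup, while cluster sampling takes every member from some groups.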

29

What are the 2 sampling errors?

1. Sampling error - the difference between the sample result and the true population result: such an error results naturally from sample fluctuations
2. Non-sampling error - the result of human error, including such factors as wrong data entries, computing errors, questions with biased wording, false data provided by respondents, etc.

30

What are 4 characteristics of data?

1. Center
2. Variation
3. Distribution
4. Outliers

31

Center

a representative value that indicates where the middle of the data set is located (average value)

32

Variation

a measure of the amount that the data values vary

33

Distribution

the nature or shape of the spread of the data over the range of values

34

Outliers

sample values that lie very far away from the vast majority of the other sample values

35

Frequency Distribution

a table listing data values (either individually or by class) along with their corresponding frequencies.

36

Lower class limits

the smallest numbers that can belong to each of the classes

37

Upper class limits

the largest numbers that can belong to each of the classes

38

Class midpoints

the midpoints of the classes (found by adding the lower class limit to the upper class limit, then dividing by two.)

39

Class width

the difference between two consecutive lower class limits (found by subtracting two consecutive lower class limits)

40

What is the procedure for constructing a frequency distribution?

1. Decide how many classes you want to use.
2. Calculate the class width: class width = (largest value - smallest value) / number of classes, rounded up to a convenient number
3. Choose the lower limit of the first class (should be the smallest data value or just below)
4. List all lower class limits
5. List all upper class limits
6. Tally up the data in each class to find the frequency
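
The steps above can be sketched as follows. The function name and the data values are made up for illustration, and the sketch assumes whole-number data. It also assumes one particular rounding convention: the width is rounded up to the next whole number (and bumped by one when the division comes out exact) so that the largest value always fits in the last class.

```python
def frequency_distribution(data, num_classes):
    lowest, highest = min(data), max(data)
    # Step 2: class width, rounded up to the next whole number (assumed convention).
    width = (highest - lowest) // num_classes + 1
    # Steps 3-4: first lower limit is the smallest value; the rest follow by width.
    lower_limits = [lowest + i * width for i in range(num_classes)]
    # Step 5: each upper limit sits just below the next lower limit.
    upper_limits = [low + width - 1 for low in lower_limits]
    # Step 6: tally the data values that fall in each class.
    table = []
    for low, high in zip(lower_limits, upper_limits):
        frequency = sum(1 for x in data if low <= x <= high)
        table.append((low, high, frequency))
    return table

data = [12, 15, 21, 22, 27, 30, 33, 35, 38, 41, 44, 49]  # made-up values
for low, high, frequency in frequency_distribution(data, 4):
    print(f"{low}-{high}: {frequency}")
```

With these values the range is 37, so four classes give a width of 10 and the classes 12-21, 22-31, 32-41, and 42-51.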

41

Relative Frequency Distribution

a frequency distribution in which frequencies are replaced by relative frequencies (proportions or percentages)

42

How do you calculate relative frequency?

Relative frequency = frequency/total number of observations
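
As a quick illustration of the formula, the sketch below converts a list of class frequencies into proportions and percentages. The frequencies are made-up values and the function name is an assumption.

```python
def relative_frequencies(frequencies):
    # Divide each class frequency by the total number of observations.
    total = sum(frequencies)
    return [f / total for f in frequencies]

frequencies = [3, 3, 4, 2]                              # made-up class counts
proportions = relative_frequencies(frequencies)
percentages = [round(100 * p, 1) for p in proportions]  # same values as percents
```

The proportions always add up to 1 (or 100% when expressed as percentages), apart from rounding.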

43

Histogram

a bar graph in which the horizontal axis represents the data classes, the vertical axis represents frequencies, and the heights of the bars correspond to the frequencies of the classes

44

Normal (bell shaped) distribution form

the highest frequency occurs in the middle and the frequencies tail off to the left and right

45

Uniform

the frequencies of the classes are equal

46

Skewed right or left

the histogram is not symmetric, with higher frequencies occurring on one side than on the other

47

What are 4 types of statistical graphs?

1. Frequency Polygon
2. Stemplot
3. Pareto Charts
4. Pie Chart

48

Stemplot

a table display in which quantitative data is separated into two parts - the stem (the leftmost digits) and the leaf (the rightmost digit) *benefit: helps preserve the original data

49

Pareto Charts

a bar graph used for qualitative data. The data values are arranged with the highest frequency values to the left and the lowest frequency values to the right.

50

Measure of Center

a value at the center or middle of a data set

51

What are the 4 measures of center?

1. Mean (calculated by adding all the data values, then dividing by the number of values)
2. Median (calculated by arranging the data values in ascending order, then selecting the middle value from the list; with an even number of values, take the mean of the two middle values)
3. Mode (data value that occurs most frequently in the data set)
4. Midrange - the value midway between the lowest and highest values in the data set (calculate by adding the smallest and largest value then dividing by 2)
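
A small worked example of the four measures, using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]                # made-up values, already sorted

mean = statistics.mean(data)              # (2+3+3+5+7+10) / 6
median = statistics.median(data)          # even count: mean of the two middle values
mode = statistics.mode(data)              # most frequent value
midrange = (min(data) + max(data)) / 2    # halfway between smallest and largest
```

Here the mean is 5, the median is 4 (halfway between 3 and 5), the mode is 3, and the midrange is 6, which shows that the four measures need not agree.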

52

What is the relationship between the mean, median, and mode in a normal distribution?

mean=median=mode

53

What is the relationship between the mean, median, and mode in a skewed right distribution?

mode < median < mean

54

What is the relationship between the mean, median, and mode in a skewed left distribution?

mean < median < mode

55

Variation

the extent to which data values vary from each other

56

Range

the difference between the maximum and minimum values of a data set

57

How do you calculate the range?

Range = maximum data value - minimum data value

58

Standard Deviation

a measure of how much the data values deviate from the mean

59

What is the procedure to calculate the standard deviation of sample data?

1. Compute the sample mean
2. Subtract the mean from each data value, to obtain a list of deviations of the form (x-mean)
3. Square each of the differences obtained in step 2
4. Find the sum of all the squared values
5. Divide the sum by n-1, where n is the number of values in the sample
6. Take the square root
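
The six steps map directly onto code. The sketch below follows them one at a time; the function name and the data values are made up for illustration.

```python
import math

def sample_standard_deviation(values):
    n = len(values)
    mean = sum(values) / n                      # step 1: sample mean
    deviations = [x - mean for x in values]     # step 2: (x - mean) for each value
    squared = [d ** 2 for d in deviations]      # step 3: square each deviation
    total = sum(squared)                        # step 4: sum of the squares
    variance = total / (n - 1)                  # step 5: divide by n - 1
    return math.sqrt(variance)                  # step 6: square root

data = [2, 4, 4, 4, 5, 5, 7, 9]                 # made-up values
s = sample_standard_deviation(data)
```

For these values the mean is 5 and the squared deviations sum to 32, so s is the square root of 32/7, roughly 2.14. Python's built-in `statistics.stdev` uses the same n - 1 divisor and can be used as a check.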

60

What is the procedure to calculate the standard deviation of a population?

Same formula as for the standard deviation of sample data except you divide the sum by N rather than n-1

61

Variance

a measure of variation equal to the square of the standard deviation

62

How do you calculate variance?

Use the same formula as for the standard deviation, but do not take the square root.
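
As a hedged illustration, Python's standard `statistics` module has separate functions for the sample and population versions, which makes the difference between the two divisors easy to see. The data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]                     # made-up values

sample_variance = statistics.variance(data)         # divides by n - 1
population_variance = statistics.pvariance(data)    # divides by N
sample_sd = statistics.stdev(data)                  # sqrt of sample variance
population_sd = statistics.pstdev(data)             # sqrt of population variance
```

The squared deviations sum to 32 here, so the sample variance is 32/7 and the population variance is 32/8 = 4. In both cases the standard deviation is the square root of the variance.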

63

Empirical Rule

Characteristics that apply to data sets whose distributions are approximately bell-shaped:
1. about 68% of all data values fall within 1 s.d. (standard deviation) of the mean.
2. about 95% of all data values fall within 2 s.d. of the mean
3. about 99.7% of all data values fall within 3 s.d. of the mean
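
One way to see the rule in action is to simulate bell-shaped data and count how much of it lands within 1, 2, and 3 standard deviations. This is only a sketch: the mean of 50, the standard deviation of 10, the sample size, and the function name are all assumptions chosen for illustration.

```python
import random

def share_within(values, mean, sd, k):
    # Fraction of values that lie within k standard deviations of the mean.
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)

rng = random.Random(1)                 # fixed seed so runs are repeatable
mean, sd = 50, 10                      # made-up parameters
values = [rng.gauss(mean, sd) for _ in range(20_000)]

within_1 = share_within(values, mean, sd, 1)   # should be near 0.68
within_2 = share_within(values, mean, sd, 2)   # should be near 0.95
within_3 = share_within(values, mean, sd, 3)   # should be near 0.997
```

Because the data are random, the three shares will be close to the rule's percentages without matching them exactly.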

64

What data values are considered unusual?

Any values more than 2 s.d. away from the mean

65

How do you find unusual values? What two values do you need to know?

Need to know the mean and the standard deviation.
Mean + 2(standard deviation) = maximum usual value
Mean - 2(standard deviation) = minimum usual value
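
A minimal sketch of this check, assuming a made-up mean of 100 and standard deviation of 15. The function names are hypothetical.

```python
def usual_range(mean, sd):
    # Values between mean - 2*sd and mean + 2*sd are not considered unusual.
    return mean - 2 * sd, mean + 2 * sd

def is_unusual(x, mean, sd):
    low, high = usual_range(mean, sd)
    return x < low or x > high

low, high = usual_range(100, 15)       # made-up mean and standard deviation
```

With these numbers the cutoffs are 70 and 130, so a value of 135 would be flagged as unusual while 128 would not.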

66

Coefficient of Variation (definition and formula)

describes the standard deviation as a percentage of the mean (CV = (standard deviation / mean) x 100%)
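
The coefficient of variation is most useful for comparing spread between data sets measured on different scales. The sketch below uses made-up summary statistics for two hypothetical measurements; the function name is an assumption.

```python
def coefficient_of_variation(sd, mean):
    # Standard deviation expressed as a percentage of the mean.
    return sd / mean * 100

# Made-up summary statistics for two measurements on different scales.
cv_heights = coefficient_of_variation(sd=3, mean=68)
cv_weights = coefficient_of_variation(sd=29, mean=172)
```

Here the second measurement varies more relative to its mean (about 17% versus about 4%), a comparison the raw standard deviations alone could not support because the units differ.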

67

Z Score (definition and formula)

(standardized value) the number of standard deviations that a given data value is above or below the mean (z = (x - mean) / standard deviation)

68

How do you identify unusual values using z Scores?

Unusual values: z score < -2, z score > 2
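
A short sketch of the z score calculation and the unusual-value check, assuming a made-up mean of 100 and standard deviation of 15. The function names are hypothetical.

```python
def z_score(x, mean, sd):
    # Subtract the mean first, then divide by the standard deviation.
    return (x - mean) / sd

def is_unusual_z(z):
    return z < -2 or z > 2

z_high = z_score(135, 100, 15)   # above the mean, so z is positive
z_low = z_score(85, 100, 15)     # below the mean, so z is negative
```

A value of 135 sits about 2.33 standard deviations above the mean and counts as unusual; a value of 85 is exactly 1 standard deviation below the mean and does not.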

69

Percentiles

numbers which divide a set of data into 100 groups with about 1% of the values in each group.

70

If someone is in the 99th percentile, they would be in what percent of something?

They would be in the top 1%

71

If someone scored in the 68th percentile, what does that mean?

they did better than 68% of the class

72

How do you calculate the percentile corresponding to a data value?

Percentile of value x = (number of values less than x/total number of values) x 100
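
The formula translates to a one-line count. The scores below are made up for illustration and the function name is an assumption.

```python
def percentile_of(x, values):
    # Count the values strictly less than x, then convert to a percentage.
    below = sum(1 for v in values if v < x)
    return below * 100 / len(values)

scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]   # made-up exam scores
p = percentile_of(80, scores)
```

Six of the ten scores are below 80, so a score of 80 is at the 60th percentile.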

73

Quartiles

numbers which divide a set of data into four groups, with about 25% of the values in each group

74

5-number summary

a list of the minimum value, the three quartiles, and the maximum value for a data set
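
A sketch of a 5-number summary follows. Textbooks and software use several slightly different conventions for quartiles; this example assumes one common classroom convention, where Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd). The function name and data are made up.

```python
import statistics

def five_number_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                    # values below the median
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        ordered[0],                    # minimum
        statistics.median(lower),      # Q1 (assumed convention)
        statistics.median(ordered),    # Q2, the median
        statistics.median(upper),      # Q3 (assumed convention)
        ordered[-1],                   # maximum
    )

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]   # made-up values
summary = five_number_summary(data)
```

For these nine values the summary is 1, 3.5, 7, 11, 15. Other quartile conventions may give slightly different Q1 and Q3.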

75

Boxplot

A graph of the data set consisting of a line extending from the minimum value to the maximum value and a box with lines drawn at each of the three quartiles

76

Procedure

a process with uncertain results that can be repeated (ex: rolling a die, measuring the head circumference of a baby)

77

Event

any collection of results or outcomes of a procedure

78

Sample Space

the set of all possible outcomes for a procedure

79

What does P(A) mean?

The probability of event A occurring

80

Relative Frequency Approximation

conduct or observe a procedure a large number of times, and count the number of occurrences of event A. Then P(A) can be estimated as the number of times A occurred divided by the total number of trials.
(P(A) = number of times event A occurred / number of times the procedure was repeated)
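
This approach can be illustrated with a simulation. The sketch below repeats a die roll many times and estimates the probability of an even number; the function names, the choice of event, and the number of repetitions are assumptions made for illustration.

```python
import random

def estimate_probability(event, trial, repetitions, rng):
    # Repeat the procedure and count how often the event occurs.
    hits = sum(1 for _ in range(repetitions) if event(trial(rng)))
    return hits / repetitions

def roll_die(rng):
    return rng.randint(1, 6)

def is_even(outcome):
    return outcome % 2 == 0

rng = random.Random(42)                  # fixed seed so runs are repeatable
estimate = estimate_probability(is_even, roll_die, 10_000, rng)
```

The estimate should land close to 0.5 without being exactly 0.5; repeating the procedure more times tends to bring it closer.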

81

Classical approach

(most commonly used) assume that a procedure has n different outcomes each with an equal chance of occurring. (P(A) = number of ways A can occur/number of possible outcomes)
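
When all outcomes are equally likely, the classical formula amounts to counting. The sketch below lists the sample space and counts the outcomes that belong to the event; the function name and the dice examples are assumptions made for illustration.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    # Assumes every outcome in sample_space is equally likely.
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

die = [1, 2, 3, 4, 5, 6]
p_even = classical_probability(lambda o: o % 2 == 0, die)

# Two dice: 36 equally likely ordered pairs.
two_dice = [(a, b) for a in die for b in die]
p_sum_7 = classical_probability(lambda o: sum(o) == 7, two_dice)
```

Three of the six faces are even, giving 1/2, and six of the 36 pairs sum to 7, giving 1/6. Using `Fraction` keeps the answers exact.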

82

Subjective Approach

the probability of an event A is found simply by guessing or estimating its value based on knowledge of the relevant circumstances (educated guess)