Flashcards in Exam #1 Deck (82):

1

## Statistics

### a collection of methods for planning studies and experiments, obtaining data, then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions from data.

2

## Data

### observations that have been collected, such as measurements or responses

3

## Population

### the complete collection of all measurements or data that are being considered

4

## Census

### the collection of data from every member of the population

5

## Sample

### a sub-collection of members selected from a population

6

## Parameter

### a numerical measurement describing some characteristic of a population

7

## Statistic

### A numerical measurement describing some characteristic of a sample

8

## Quantitative Data

### (numerical data) consists of numbers representing counts or measurements

9

## Categorical Data

### (qualitative data) consists of names or labels that can be separated into different categories distinguished by some non-numerical characteristic

10

## Discrete Data

### the data may take on any of a finite or "countable" number of possible values (example: How many eggs does a hen lay in a day? The eggs can be counted: 1, 2, 3, etc.)

11

## Continuous Data

### the data may take on any value over a continuous range of infinitely many possible values (example: How much milk does a cow give in a day? Milk is measured rather than counted: the amount could be 1.666666...)

12

## Voluntary Response Sample

### the respondents themselves decide whether or not to be included in the sample (such as a phone survey). The people who choose to participate may have different opinions/characteristics than those who do not.

13

## Reported Results

### the subjects of the study are asked to report results about themselves, which may lead to them providing desired results; it is better for the researcher to measure the data.

14

## Small samples

### a conclusion is drawn about a large population based on a sample of a small number of subjects, which may or may not represent the population as a whole.

15

## Loaded Questions

### a question used in a survey may contain language which influences the subject's response

16

## Order of Questions or Words

### the order of the words in a question or of the possible answers may affect the response

17

## Nonresponse

### some of the subjects either refuse to respond to a question or are unavailable; this may skew results because those who refuse to talk are likely to be different from those who are willing to talk (this may lead to a voluntary response sample).

18

## Self-Interest Study

### a statistical study is sponsored (paid for) by a party that is trying to promote its own interest.

19

## What are the 7 Potential Statistical Flaws?

###
1. Voluntary Response Sample

2. Reported Results

3. Small Samples

4. Loaded Questions

5. Order of Questions or Words

6. Nonresponse

7. Self-Interest Study

20

## Observational Study

### a study in which we observe and measure specific characteristics but do not attempt to modify the subjects of the study

21

## Experiment

### a study in which we apply some treatment to the subjects, then watch and observe the effects of that treatment

22

## What are the 6 different sampling methods?

###
1. Random sample (not specific - could be used in conjunction with the others)

2. Simple random sample

3. Systematic sampling

4. Convenience sampling

5. Stratified sampling

6. Cluster sampling

23

## Random sample

### members from the population are selected in such a way that each individual member has an equal chance of being selected

24

## Simple random sample

### a sample of size n is selected in such a way that every possible sample of size n has an equal chance of being selected (BEST METHOD)
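
As a rough illustration of this definition, the sketch below draws a simple random sample with Python's standard `random` module. The population of ID numbers, the sample size, and the seed are made up for the example.

```python
import random

# Hypothetical population: ID numbers 1 through 20.
population = list(range(1, 21))

# A seeded generator keeps the example reproducible; drop the seed in practice.
rng = random.Random(42)

# random.sample draws without replacement, and every possible subset
# of size n is equally likely to be the one selected.
n = 5
simple_random_sample = rng.sample(population, n)
print(simple_random_sample)
```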

25

## Systematic sampling

### a starting point is selected randomly, then every kth element in the population is selected
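
A minimal sketch of systematic sampling, assuming a numbered population of 100 members and k = 10 (both values are invented for illustration):

```python
import random

population = list(range(1, 101))  # hypothetical: 100 numbered members
k = 10                            # select every 10th member

rng = random.Random(7)            # seeded only so the example is reproducible
start = rng.randrange(k)          # random starting index from 0 to k - 1

# Take the starting element, then every kth element after it.
systematic_sample = population[start::k]
print(systematic_sample)
```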

26

## Convenience sampling

### data is collected that is easy to access

27

## Stratified sampling

### the population is divided into at least two different sub-groups that share the same characteristics, then a sample is taken from each subgroup
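
One way to picture stratified sampling in code is shown below. The strata (class years), the names, and the per-stratum sample size are assumptions made for the example.

```python
import random

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": ["Ana", "Ben", "Cal", "Dee"],
    "sophomore": ["Eli", "Fay", "Gus", "Hal"],
    "junior": ["Ivy", "Jon", "Kim", "Lee"],
}

rng = random.Random(1)  # seeded only so the example is reproducible
per_stratum = 2

# Draw a random sample from EVERY subgroup.
stratified_sample = {
    name: rng.sample(members, per_stratum) for name, members in strata.items()
}
print(stratified_sample)
```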

28

## Cluster sampling

### the population is divided into "clusters" based on locations, then some of those clusters are selected randomly and all the members from the selected clusters are sampled
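
For contrast with stratified sampling, here is a small sketch of cluster sampling: only some clusters are chosen, but everyone in a chosen cluster is included. The dorm names and members are hypothetical.

```python
import random

# Hypothetical clusters: residents grouped by location.
clusters = {
    "Dorm A": ["Ana", "Ben", "Cal"],
    "Dorm B": ["Dee", "Eli"],
    "Dorm C": ["Fay", "Gus", "Hal"],
    "Dorm D": ["Ivy", "Jon"],
}

rng = random.Random(3)  # seeded only so the example is reproducible

# Randomly select whole clusters, then keep ALL members of each one.
chosen_clusters = rng.sample(sorted(clusters), 2)
cluster_sample = [member for name in chosen_clusters for member in clusters[name]]
print(chosen_clusters, cluster_sample)
```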

29

## What are the 2 sampling errors?

###
1. Sampling error - the difference between the sample result and the true population result; such an error results naturally from chance sample fluctuations

2. Non-sampling error - the result of human error, including such factors as wrong data entries, computing errors, questions with biased wording, false data provided by respondents, etc.

30

## What are 4 characteristics of data?

###
1. Center

2. Variation

3. Distribution

4. Outliers

31

## Center

### a representative value that indicates where the middle of the data set is located (average value)

32

## Variation

### a measure of the amount that the data values vary

33

## Distribution

### the nature or shape of the spread of the data over the range of values

34

## Outliers

### sample values that lie very far away from the vast majority of the other sample values

35

## Frequency Distribution

### a table listing data values (either individually or by class) along with their corresponding frequencies.

36

## Lower class limits

### the smallest numbers that can belong to each of the classes

37

## Upper class limits

### the largest numbers that can belong to each of the classes

38

## Class midpoints

### the midpoints of the classes (found by adding the lower class limit to the upper class limit, then dividing by two.)

39

## Class width

### the difference between two consecutive lower class limits (found by subtracting one lower class limit from the next one)

40

## What is the procedure for constructing a frequency distribution?

###
1. Decide how many classes you want to use.

2. Calculate the class width (class width = (largest value - smallest value) / number of classes, usually rounded up to a convenient number)

3. Choose the lower limit of the first class (should be the smallest data value or just below)

4. List all lower class limits

5. List all upper class limits

6. Tally up the data in each class to find the frequency
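
The six steps above can be sketched in Python for whole-number data. The function name `frequency_distribution` and the scores are invented for the example, and the extra width adjustment is an assumption added so the largest value always lands in the last class.

```python
import math

def frequency_distribution(values, num_classes):
    """Return (lower limit, upper limit, frequency) rows for integer data."""
    smallest, largest = min(values), max(values)

    # Step 2: class width = (largest - smallest) / number of classes, rounded up.
    width = math.ceil((largest - smallest) / num_classes)
    # Assumption: widen by 1 if the classes would still miss the largest value.
    if smallest + width * num_classes - 1 < largest:
        width += 1

    rows = []
    for i in range(num_classes):
        lower = smallest + i * width   # steps 3-4: lower class limits
        upper = lower + width - 1      # step 5: upper class limits
        freq = sum(1 for v in values if lower <= v <= upper)  # step 6: tally
        rows.append((lower, upper, freq))
    return rows

scores = [12, 15, 21, 22, 25, 28, 30, 33, 37, 41, 44, 49]  # hypothetical data
table = frequency_distribution(scores, 4)                  # step 1: 4 classes
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```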

41

## Relative Frequency Distribution

### a frequency distribution in which frequencies are replaced by relative frequencies (proportions or percentages)

42

## How do you calculate relative frequency?

### Relative frequency = frequency/total number of observations
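
A short sketch of this formula, using made-up class labels and frequencies:

```python
# Hypothetical frequency distribution: class label -> frequency.
frequencies = {"12-21": 3, "22-31": 4, "32-41": 3, "42-51": 2}

total = sum(frequencies.values())  # total number of observations

# Relative frequency = frequency / total number of observations.
relative = {label: count / total for label, count in frequencies.items()}

for label, proportion in relative.items():
    print(f"{label}: {proportion:.1%}")
```

The relative frequencies should add up to 1 (or 100%), apart from rounding.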

43

## Histogram

### a bar graph in which the horizontal axis represents the data classes, the vertical axis represents frequencies, and the heights of the bars correspond to the frequencies of the classes

44

## Normal (bell shaped) distribution form

### the highest frequency occurs in the middle and the frequencies tail off to the left and right

45

## Uniform

### the frequencies of the classes are equal

46

## Skewed right or left

### the histogram is not symmetric, with higher frequencies occurring on one side than on the other

47

## What are 4 types of statistical graphs?

###
1. Frequency Polygon

2. Stemplot

3. Pareto Charts

4. Pie Chart

48

## Stemplot

### a table display in which quantitative data is separated into two parts: the stem (the leftmost digits) and the leaf (the rightmost digit). Benefit: it preserves the original data values.

49

## Pareto Charts

### a bar graph used for qualitative data. The data values are arranged with the highest frequency values to the left and the lowest frequency values to the right.

50

## Measure of Center

### a value at the center or middle of a data set

51

## What are the 4 measures of center?

###
1. Mean (calculated by adding all the data values, then dividing by the number of values)

2. Median (calculated by arranging the data values in ascending order, then selecting the middle value from the list; with an even number of values, take the mean of the two middle values)

3. Mode (data value that occurs most frequently in the data set)

4. Midrange - the value midway between the lowest and highest values in the data set (calculate by adding the smallest and largest value then dividing by 2)
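
The four measures can be checked on a small data set with Python's standard `statistics` module. The data values are hypothetical, and the midrange is computed by hand because the module has no function for it.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical data set

mean = statistics.mean(data)            # sum of values / number of values
median = statistics.median(data)        # middle value of the sorted list
mode = statistics.mode(data)            # most frequently occurring value
midrange = (min(data) + max(data)) / 2  # (smallest + largest) / 2

print(mean, median, mode, midrange)
```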

52

## What is the relationship between the mean, median, and mode in a normal distribution?

### mean=median=mode

53

## What is the relationship between the mean, median, and mode in a skewed right distribution?

### mode < median < mean

54

## What is the relationship between the mean, median, and mode in a skewed left distribution?

### mean < median < mode

55

## Variation

### the extent to which data values vary from each other

56

## Range

### the difference between the maximum and minimum values of a data set

57

## How do you calculate the range?

### Range = maximum data value - minimum data value

58

## Standard Deviation

### a measure of how much the data values deviate from the mean

59

## What is the procedure to calculate the standard deviation of sample data?

###
1. Compute the sample mean

2. Subtract the mean from each data value, to obtain a list of deviations of the form (x-mean)

3. Square each of the differences obtained in step 2

4. Find the sum of all the squared values

5. Divide the sum by n-1, where n is the number of values in the sample

6. Take the square root
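
The six steps above translate almost line for line into Python. The data set is made up, and `statistics.stdev` (which also divides by n - 1) is used only as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample
n = len(data)

mean = sum(data) / n                     # step 1: sample mean
deviations = [x - mean for x in data]    # step 2: deviations (x - mean)
squared = [d ** 2 for d in deviations]   # step 3: square each deviation
total = sum(squared)                     # step 4: sum of the squares
sample_variance = total / (n - 1)        # step 5: divide by n - 1
sample_sd = math.sqrt(sample_variance)   # step 6: square root

print(round(sample_sd, 4), round(statistics.stdev(data), 4))
```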

60

## What is the procedure to calculate the standard deviation of a population?

### Same formula as for the standard deviation of sample data except you divide the sum by N rather than n-1

61

## Variance

### a measure of variation equal to the square of the standard deviation

62

## How do you calculate variance?

### Use the same formula as for the standard deviation, but do not take the square root.
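
As a quick sketch of how variance relates to standard deviation, and of the N versus n - 1 divisor, the standard `statistics` module offers both population and sample versions. The data values are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set, mean = 5

# Population versions divide the sum of squared deviations by N.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Sample versions divide by n - 1.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

# In both cases, variance is the square of the standard deviation.
print(population_variance, population_sd)
print(round(sample_variance, 4), round(sample_sd, 4))
```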

63

## Empirical Rule

###
Characteristics that apply to data sets whose distributions are approximately bell-shaped:

1. about 68% of all data values fall within 1 s.d. (standard deviation) of the mean.

2. about 95% of all data values fall within 2 s.d. of the mean

3. about 99.7% of all data values fall within 3 s.d. of the mean

64

## What data values are considered unusual?

### Any values that lie more than 2 s.d. away from the mean

65

## How do you find unusual values? What two values do you need to know?

###
Need to know the mean and the standard deviation.

Mean + 2(standard deviation) = maximum usual value

Mean - 2(standard deviation) = minimum usual value
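
A minimal sketch of this rule, assuming a mean of 100 and a standard deviation of 15 (illustrative numbers only). The helper names `usual_range` and `is_unusual` are invented for the example.

```python
def usual_range(mean, sd):
    """Return (minimum usual value, maximum usual value)."""
    return mean - 2 * sd, mean + 2 * sd

def is_unusual(x, mean, sd):
    """A value is unusual if it falls outside mean +/- 2 standard deviations."""
    low, high = usual_range(mean, sd)
    return x < low or x > high

low, high = usual_range(100, 15)
print(low, high)
print(is_unusual(135, 100, 15), is_unusual(110, 100, 15))
```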

66

## Coefficient of Variation (definition and formula)

### describes the standard deviation as a percentage of the mean (CV = (standard deviation / mean) x 100%)
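
The CV is mainly useful for comparing variation between data sets with different units or scales. The sketch below uses made-up summary statistics for two such data sets.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) x 100, as a percentage."""
    return sd / mean * 100

# Hypothetical summary statistics for two differently scaled data sets.
cv_first = coefficient_of_variation(sd=3.0, mean=60.0)
cv_second = coefficient_of_variation(sd=30.0, mean=150.0)

print(f"{cv_first:.1f}% vs {cv_second:.1f}%")
```

Here the second data set varies more relative to its mean, even though a direct comparison of the two standard deviations would not be meaningful.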

67

## Z Score (definition and formula)

### (standardized value) the number of standard deviations that a given data value is above or below the mean (z = (x - mean) / standard deviation)
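
A short sketch of the z score formula, again assuming an illustrative mean of 100 and standard deviation of 15:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mean, sd = 100, 15  # hypothetical values

for x in (85, 100, 130, 145):
    z = z_score(x, mean, sd)
    label = "unusual" if z < -2 or z > 2 else "usual"
    print(x, z, label)
```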

68

## How do you identify unusual values using z Scores?

### Unusual values: z score < -2, z score > 2

69

## Percentiles

### numbers which divide a set of data into 100 groups with about 1% of the values in each group.

70

## If someone is in the 99th percentile, they would be in what percent of something?

### They would be in the top 1%

71

## If someone scored in the 68th percentile, what does that mean?

### they did better than 68% of the class

72

## How do you calculate the percentile corresponding to a data value?

### Percentile of value x = (number of values less than x/total number of values) x 100
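
The formula can be sketched as a small function. The name `percentile_of` and the scores are hypothetical.

```python
def percentile_of(x, values):
    """Percentile of x = (number of values less than x / total number of values) x 100."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical exam scores
print(percentile_of(85, scores))
```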

73

## Quartiles

### numbers which divide a set of data into four groups, with about 25% of the values in each group

74

## 5-number summary

### a list of the minimum value, the three quartiles, and the maximum value for a data set
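
A possible sketch using the standard `statistics.quantiles` function (Python 3.8+). Textbooks and software use slightly different quartile conventions, so the `"inclusive"` method chosen here is an assumption and may not match a hand calculation from class exactly. The function name `five_number_summary` and the data are invented for the example.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, Q2 (the median), Q3, maximum)."""
    # n=4 splits the data into four groups, giving the three quartiles.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical data set
summary = five_number_summary(data)
print(summary)
```

These five values are also exactly what a boxplot draws.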

75

## Boxplot

### A graph of the data set consisting of a line extending from the minimum value to the maximum value and a box with lines drawn at each of the three quartiles

76

## Procedure

### a process with uncertain results that can be repeated (ex: rolling a die, measuring the head circumference of a baby)

77

## Event

### any collection of results or outcomes of a procedure

78

## Sample Space

### the set of all possible outcomes for a procedure

79

## What does P(A) mean?

### The probability of event A occurring

80

## Relative Frequency Approximation

###
Conduct or observe a procedure a large number of times, and count the number of occurrences of event A. P(A) can then be estimated as the number of times A occurred divided by the total number of trials.

(P(A) = number of times event A occurred/number of times the procedure was repeated)
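
A simulation gives a feel for this approach. The sketch below assumes a fair six-sided die and estimates the probability of rolling a 6; the number of trials and the seed are arbitrary choices.

```python
import random

rng = random.Random(0)  # seeded only so the example is reproducible
trials = 10_000

# Repeat the procedure many times and count how often event A (rolling a 6) occurs.
hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)

# P(A) is approximately (times A occurred) / (times the procedure was repeated).
estimate = hits / trials
print(f"estimated P(6) = {estimate:.3f}, classical value = {1 / 6:.3f}")
```

With more trials, the estimate tends to settle closer to the classical value of 1/6.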

81

## Classical approach

### (most commonly used) assume that a procedure has n different outcomes each with an equal chance of occurring. (P(A) = number of ways A can occur/number of possible outcomes)
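
As a small worked example, assuming a fair six-sided die (so all outcomes are equally likely), the probability of rolling an even number can be computed exactly with the standard `fractions` module:

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]  # all possible outcomes of one roll

# Event A: the roll is an even number.
event_a = [outcome for outcome in sample_space if outcome % 2 == 0]

# P(A) = number of ways A can occur / number of possible outcomes.
p_a = Fraction(len(event_a), len(sample_space))
print(p_a)
```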

82