Mid1 Flashcards Preview

Flashcards in Mid1 Deck (102):
1

Statistics

Way of reasoning, a collection of tools and methods designed to help us better understand the world

2

Descriptive stats

Methods for organizing and summarizing data

3

Inferential stats

Drawing conclusions about populations based on sample data

4

Nominal variable

Categories are unordered (color, gender, ethnicity)

5

Ordinal variable

Categories are ordered (rate my professor, government officials)

6

Discrete quantity

A collection of isolated points on the number line (shoe size, death toll)

7

Continuous quantity

Any value in an interval of numbers on the number line (age, weight, car mileage)

8

Population parameter

Number or calculation that describes or summarizes a population

9

A sample should

Be representative of the entire population

10

Sample statistics

Number or calculation that describes or summarizes a sample

11

Taking a sample and using it to make an inference about your population is

Inferential statistics

12

Data analysis for one categorical variable

Bar chart, pie, frequency table

13

2 categorical variables data analysis

Contingency table

14

One quantitative variable data analysis

Box plot, stemplot, histogram

15

2 quantitative variables data analysis

Scatter plot

16

Exploratory data analysis helps describe

The distribution of a variable

17

Distribution tells us

What values a variable can take and how often it takes these values

18

Voluntary response sampling

People volunteer to be part of the sample

19

Convenience sampling

Selects people that are easiest to reach

20

Simple random sample (SRS)

Each member of the population has an equal chance of being in the sample, and every possible sample of the chosen size is equally likely to be selected

21

Stratified random sample

Population is first separated into groups with similar characteristics, called strata. Then a simple random sample is taken within each stratum. The samples are combined to make the full sample

22

Cluster sample

Population is divided into groups called clusters. We then randomly select clusters and measure all of the individuals within the selected clusters.

23

Systematic random sample

Population is divided into segments of the same length, a starting point is chosen, and then a sample is taken at the same point in each of the segments. (Ex: every 5th student is chosen on a roster)
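
The four sampling methods on cards 20 to 23 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed `seed` argument, and the dictionary layout for strata and clusters are hypothetical choices, not part of the deck.

```python
import random

def simple_random_sample(population, n, seed=0):
    # Every subset of size n is equally likely to be drawn.
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=0):
    # strata: dict mapping a stratum name to its list of members (assumed layout).
    # An SRS is taken inside each stratum, then the pieces are combined.
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, seed=0):
    # clusters: dict mapping a cluster name to its list of members (assumed layout).
    # Whole clusters are chosen at random and everyone in them is measured.
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(population, k, seed=0):
    # Random starting point in the first segment, then every k-th item.
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]
```

For a roster of 20 students, `systematic_sample(roster, 5)` picks one student from each block of five, which matches the "every 5th student" example on card 23.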

24

Multistage random sample

Combining a variety of sampling methods

25

Sampling frame

List of items or subjects you wish to sample from

26

Sampling variability

Each sample will select different people and therefore produce different values for the measured variables (no 2 samples will be identical)

27

Undercoverage bias

Entire population targeted is not reached because of the design of the sample. (Rate my professor)

28

Non-response

Individual selected as part of survey didn't respond

29

Response bias

Interviewees' responses are influenced by the interviewer, or by confusing or leading wording meant to provoke a certain response

30

Retrospective study

Looks back in time. (Using old medical records to study disease)

31

Prospective study

Looks forward in time. Following a group over time

32

Single blinding

Only the patient is unaware of which treatment they are receiving

33

Double blinding

Both patients and evaluators are unaware

34

Placebo

Dummy treatment made from inactive substance

35

Placebo effect

When a dummy treatment improves a patient's condition simply because the patient has the expectation that it will

36

Statistically significant

When comparing two large groups, if the difference between the two is so large it would rarely occur by chance

37

Key concepts in experimental design

Control groups, replication, randomization, blocking

38

Block design

Subjects are grouped into blocks of similar individuals, and treatments are assigned within each block. The purpose is to isolate variability between blocks so we can see the effects of treatments more clearly. Allows for comparison of more than two treatments

39

Matched pairs

A reduced block design in which the block only contains two subjects. Researcher must choose pairs of subjects that are as closely matched as possible (twins). Then randomly assign them into treatment groups

40

Steps in determining statistical significance

1. Determine the null and alternative hypotheses
2. Verify that the necessary data conditions are met, then summarize the data into a test statistic
3. Decide whether or not the result is statistically significant based on the cutoff value
4. Report conclusion in context of the situation

41

Null hypothesis

H0: two variables are not related in the population

42

Alternative hypothesis

Ha: two variables are related in the population

43

Chi square statistic

Measures difference between the observed counts and the counts that would be expected if there were no relationship

44

If the chi-square test statistic is greater than the critical value, then

Since our test stat is greater than the critical value, we reject the null hypothesis in favor of the alternative. The data give evidence of a relationship between the groups and categories

45

Chi-square test statistic less than 3.841

Since our test stat is not greater than the critical value, we fail to reject the null hypothesis. There is not enough evidence to conclude that there is a relationship between the groups and the categories

46

X^2 formula

Sum over all cells of the table of (Observed count - Expected count)^2 / Expected count
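
A worked sketch of the chi-square calculation for a two-way table, with made-up counts. The function names are hypothetical; expected counts use the usual (row total x column total) / n rule, and the result is compared against the 3.841 cutoff from card 45 (the 5% critical value for a 2x2 table).

```python
def expected_counts(table):
    # Expected count for each cell = (row total * column total) / n.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    # Sum of (observed - expected)^2 / expected over every cell.
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[30, 20],
            [20, 30]]          # illustrative counts, not real data
stat = chi_square(observed)    # every expected count is 25, so stat is 4.0
reject_null = stat > 3.841     # True here: evidence of a relationship
```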

47

Weak linear correlation, negative or positive

R is greater than or equal to -0.35 and less than 0

R is greater than zero and less than or equal to 0.35

48

Moderate correlation

R is greater than .35 and less than or equal to .75

R is greater than or equal to -.75 and less than -.35

49

Strong correlation

R is greater than or equal to -1 and less than -.75

R is greater than .75 and less than or equal to 1
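
The cutoffs on cards 47 to 49 can be collected into one small helper. This is a sketch only; the function name and the wording of the returned labels are assumptions, while the 0.35 and 0.75 cutoffs come from the cards.

```python
def describe_correlation(r):
    # r must be a valid correlation coefficient.
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size <= 0.35:
        strength = "weak"
    elif size <= 0.75:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction}"
```

For example, `describe_correlation(0.80)` returns `"strong positive"`, which agrees with card 83.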

50

Linear model

A regression line is a straight line that models the linear relationship between an explanatory variable and a response variable. Only useful when one variable helps to predict the other

51

Leverage

A point has high leverage when its x value is far from the mean of x

52

Outliers

Any data point that stands away from the others

53

Influential points

Removing this point from the data set results in a very different regression model

54

Least-square regression line (LSRL)

Describes how a response variable y changes as an explanatory variable x changes

55

LSRL predicts

A response, y hat, from a given explanatory variable x

56

LSRL formula

Predicted y = y intercept + (slope times x)

57

Formula for y intercept

Y intercept= average of y - (slope times x average)

58

Slope formula

r(Sy/Sx)
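
The slope and intercept formulas on cards 57 and 58 can be checked numerically. The sketch below uses only the standard library; the function names and the sample data are illustrative assumptions.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    # r = sum of (z-score of x * z-score of y) divided by (n - 1).
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

def least_squares_line(xs, ys):
    # slope = r * (Sy / Sx); intercept = mean of y - slope * mean of x.
    slope = correlation(xs, ys) * stdev(ys) / stdev(xs)
    intercept = mean(ys) - slope * mean(xs)
    return intercept, slope

# Points that lie exactly on y = 10 + 2x (illustrative data).
intercept, slope = least_squares_line([1, 2, 3, 4], [12, 14, 16, 18])
```

With these points the fitted line has intercept close to 10 and slope close to 2, up to floating-point rounding.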

59

Residual is

The difference between observed y and predicted y

60

Residual formula

e= y-(y hat)

61

If you make an over prediction for y then

e is negative

62

If you make an underprediction for y then

e is positive

63

Residual plot is

Scatter plot of the (x,residual) pairs (x,e)
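
A small sketch of how the (x, e) pairs for a residual plot could be built from a fitted line. The function name and the two data points are made up for illustration.

```python
def residual_pairs(xs, ys, intercept, slope):
    # For each point: y hat = intercept + slope * x, residual e = y - y hat.
    pairs = []
    for x, y in zip(xs, ys):
        y_hat = intercept + slope * x
        pairs.append((x, y - y_hat))
    return pairs

# Line y hat = 10 + 2x with two illustrative observations.
pairs = residual_pairs([1, 2], [13, 12], intercept=10, slope=2)
```

The first point is underpredicted (y hat = 12, observed 13, e = 1) and the second is overpredicted (y hat = 14, observed 12, e = -2), matching the signs on cards 61 and 62.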

64

If the linear model is a good fit, then in the residual plot

The plot looks random

65

DO NOT USE linear regression if your residual plot has these characteristics:

Unusually large values for your residuals

Nonlinear (curved) patterns

Uneven variation(fanning)

Influential observations

66

R^2 values(Excellent-->Weak)

Excellent: r^2>80%
Good: 50%

67

Coefficient of determination formula

R^2

68

Correlation coefficient formula

r = square root(r^2), taking the same sign as the slope of the regression line

69

Best way to display the distribution of a quantitative variable in a large dataset

Histogram or boxplots

70

Best way to display the distribution of a quantitative variable in a small dataset

Stem plots

71

3 factors in describing the distribution of a quantitative variable

Shape, center, spread

72

Is mean resistant to outliers

No

73

Is median resistant to outliers

Yes
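
Cards 72 and 73 can be seen with a tiny made-up dataset: replacing one value with an outlier moves the mean a lot and leaves the median alone. The variable names and numbers are illustrative only.

```python
from statistics import mean, median

scores = [1, 2, 3, 4, 5]
scores_with_outlier = [1, 2, 3, 4, 100]   # last value replaced by an outlier

mean_before = mean(scores)                    # 3
mean_after = mean(scores_with_outlier)        # 22
median_before = median(scores)                # 3
median_after = median(scores_with_outlier)    # 3
```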

74

Z score function

How far our observation is from the mean, measured in standard deviations

75

A more negative z score means

The value is farther below the mean

76

A more positive z score means

The value is farther above the mean
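
A minimal sketch of the z score calculation, z = (value - mean) / SD. The function name and the example numbers are assumptions for illustration.

```python
def z_score(value, mean, sd):
    # Number of standard deviations the value lies above (+) or below (-) the mean.
    return (value - mean) / sd

above = z_score(85, 70, 10)   # 1.5 standard deviations above the mean
below = z_score(55, 70, 10)   # -1.5, the same distance below the mean
```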

77

When you draw a boxplot, where should the lower whisker end

At the smallest data point not smaller than the lower fence

78

When you draw a boxplot, where should the upper whisker end

At the largest data point not bigger than the upper fence
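
The whisker rule on cards 77 and 78 can be sketched as follows. The deck does not define the fences, so this example assumes the common 1.5 x IQR rule (lower fence = Q1 - 1.5 x IQR, upper fence = Q3 + 1.5 x IQR). Quartile conventions also differ between textbooks; the `method="inclusive"` choice here is an assumption, and the function name is hypothetical.

```python
from statistics import quantiles

def whisker_ends(data):
    # Quartiles; other textbooks may compute Q1 and Q3 slightly differently.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # assumed 1.5 * IQR rule
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return min(inside), max(inside)

low, high = whisker_ends([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

In this made-up dataset the value 30 falls outside the upper fence, so the upper whisker ends at 9 and 30 would be drawn as a separate outlier point.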

79

In a bar graph the x and y values are

X:category
Y: relative frequency

80

In a histogram the y axis is

The frequency (count) of observations that fall in each interval of x

81

Interpret the slope of the equation
Y=10+ 2x

Each 1-unit increase in x is associated with a predicted increase of 2 units in y.

82

Interpret the value of the coefficient of determination 0.65

There is good predictive power. 65% of the variability of y can be explained by x

83

Interpret the value of the correlation coefficient r=.80

There is a strong positive linear relationship between x and y

84

How to calculate the expected count for a cell of a table

(Total of the cell's column x total of the cell's row)/n (the whole sample)

85

A and B are independent if P(A and B)

Is equal to P(A)P(B)

86

If A and B are independent, the probability of A given B is equal to

P(A)

87

If A and B are independent, the probability of A or B is equal to

P(A) + P(B) - P(A)P(B)
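
The independence checks on cards 85 to 87 can be tried with made-up probabilities. The function names, the tolerance argument, and the numbers are illustrative assumptions.

```python
def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    # A and B are independent when P(A and B) equals P(A) * P(B).
    return abs(p_a_and_b - p_a * p_b) <= tol

def p_a_given_b(p_a_and_b, p_b):
    # Conditional probability: P(A given B) = P(A and B) / P(B).
    return p_a_and_b / p_b

def p_a_or_b(p_a, p_b, p_a_and_b):
    # General addition rule; with independence, P(A and B) = P(A) * P(B).
    return p_a + p_b - p_a_and_b

p_a, p_b, p_both = 0.5, 0.4, 0.2                # illustrative values
independent = is_independent(p_a, p_b, p_both)  # True, since 0.5 * 0.4 = 0.2
```

With these numbers P(A given B) works out to 0.5, the same as P(A), and P(A or B) is about 0.7.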

88

Probability of neither A nor B happening

1-P(A or B)

89

Probability of not being in A given that you are in B

P(not A | B) = 1 - P(A | B)
= 1 - [P(A and B)/P(B)]
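
A one-function sketch of the complement rule for conditional probability on card 89. The function name and the example values are assumptions.

```python
def p_not_a_given_b(p_a_and_b, p_b):
    # P(not A given B) = 1 - P(A and B) / P(B).
    return 1 - p_a_and_b / p_b

result = p_not_a_given_b(0.2, 0.4)   # 1 - 0.5, so about 0.5
```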

90

Joint proportion is synonymous with

Relative frequency or probability

91

When we sample without replacement, the probabilities change each time we take out one item. Why and how?

The number of items available is changing. After each draw the denominator of the fraction decreases by one, and the numerator also decreases by one if the item removed was of the type being counted
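
A short sketch of how the fractions shrink when sampling without replacement, using exact fractions from the standard library. The function name and the example (4 red items among 10) are made up for illustration.

```python
from fractions import Fraction

def prob_all_of_type(n_type, n_total, draws):
    # Probability that every one of `draws` items, taken without
    # replacement, is of the chosen type. Numerator and denominator
    # both drop by one after each successful draw.
    prob = Fraction(1)
    for i in range(draws):
        prob *= Fraction(n_type - i, n_total - i)
    return prob

two_reds = prob_all_of_type(4, 10, 2)   # (4/10) * (3/9) = 2/15
```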

92

How to draw a tree diagram

First the nonconditional probabilities, then the conditional probabilities, and then the "and" probabilities

93

What is the distribution of each sample

X~N(mean,SD)

94

Absorbance

The base-10 logarithm of the intensity of the incident light divided by the intensity of the transmitted light
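
The definition above can be written as a one-line calculation, assuming the usual base-10 logarithm. The function name and the intensity values are illustrative.

```python
import math

def absorbance(incident, transmitted):
    # Absorbance = log10(incident intensity / transmitted intensity).
    return math.log10(incident / transmitted)

a = absorbance(100.0, 10.0)   # only a tenth of the light gets through, so A = 1
```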

95

Why is comparing the activity in different classes of cones essential for color discrimination

The trichromatic human vision is based on relative levels of activity in 3 sets of cones that have opsins with different absorption spectra

96

What pigment genes are localized on one chromosome

The genes for 2 of the 3 cone pigments (opsins), those for red and green, both on the X chromosome

97

Does green opsin mean you can see green? Why or why not

No, it is the difference in activity between cones that lets us discriminate between colors. You need both M and L opsins to see green, but the level of activity in each cone is different. The brain compares the differential activity of the neurons to process the signals

98

Protanopia

Loss of long wavelength sensitive cones

99

What is the visual result of protanopia

Difficulty discriminating between red and green

100

Deuteranopia

Loss of medium wavelength sensitive cones

101

Deuteranopia results in

Difficulty discriminating between red and green

102

The three opsins

S - short wavelength (blue)
M - medium wavelength (green)
L - long wavelength (red)