Flashcards in Mid1 Deck (102):

1

## Statistics

### Way of reasoning, a collection of tools and methods designed to help us better understand the world

2

## Descriptive stats

### Methods for organizing and summarizing data

3

## Inferential stats

### Drawing conclusions about populations based on sample data

4

## Nominal variable

### Categories are unordered: color, gender, ethnicity

5

## Ordinal variable

### Categories are ordered (rate my professor, government officials)

6

## Discrete quantity

### Collection of isolated points on the number line (shoe size, death toll)

7

## Continuous quantity

### Any value in an interval of numbers on the number line (age, weight, car mileage)

8

## Population parameter

### Number or calculation that describes or summarizes a population

9

## A sample should

### Be representative of the entire population

10

## Sample statistics

### Number or calculation that describes or summarizes a sample

11

## Take a sample and make an inference about your population. This is

### Inferential statistics

12

## Data analysis for one categorical variable

### Bar chart, pie, frequency table

13

## 2 categorical variables data analysis

### Contingency table

14

## One quantitative variable data analysis

### Box plot, stemplot, histogram

15

## 2 quantitative variables data analysis

### Scatter plot

16

## Exploratory data analysis helps describe

### The distribution of a variable

17

## Distribution tells us

### What values a variable can take and how often it takes these values

18

## Voluntary response sampling

### People volunteer to be part of the sample

19

## Convenience sampling

### Selects people that are easiest to reach

20

## Simple random sample (SRS)

### Every possible sample of a given size is equally likely to be chosen, so each member of the population has an equal chance of being in the sample

21

## Stratified random sample

### Population is first separated into groups with similar characteristics, called strata. Then a simple random sample is taken within each stratum. The samples are combined to make the full sample

22

## Cluster sample

### Population is divided into groups called clusters. We then randomly select clusters and measure all of the individuals within the selected clusters.

23

## Systematic random sample

### Population is divided into segments of the same length, a starting point is chosen, and then a sample is taken at the same point in each of the segments. (Ex: every 5th student is chosen on a roster)
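A minimal Python sketch of three of the sampling methods above (simple random, stratified, systematic). The roster, the strata names, and the function names are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n members so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=0):
    """Take a simple random sample inside each stratum, then combine them."""
    rng = random.Random(seed)
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, n_per_stratum))
    return combined

def systematic_sample(population, step, seed=0):
    """Pick a random start in the first segment, then take every step-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

# Hypothetical roster of 20 students, split into two made-up strata.
roster = [f"student_{i}" for i in range(1, 21)]
strata = {"freshmen": roster[:10], "seniors": roster[10:]}

srs = simple_random_sample(roster, 5)   # 5 students, all equally likely
strat = stratified_sample(strata, 2)    # 2 from each stratum
syst = systematic_sample(roster, 5)     # every 5th student from a random start
```

The seed is fixed only so the sketch is reproducible; a real sample would not reuse a seed.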

24

## Multistage random sample

### Combining a variety of sampling methods

25

## Sampling frame

### List of items or subjects you wish to sample from

26

## Sampling variability

### Each sample will select different people and therefore different values for the measured variables (no 2 samples will be identical)

27

## Undercoverage bias

### Entire population targeted is not reached because of the design of the sample. (Rate my professor)

28

## Non-response

### Individual selected as part of survey didn't respond

29

## Response bias

### Interviewees' responses are influenced by the interviewer, or by confusing wording meant to provoke a certain response

30

## Retrospective study

### Looks back in time. (Using old medical records to study disease)

31

## Prospective study

### Looks forward in time. Following a group over time

32

## Single blinding

### Only the patient is unaware of which treatment they are receiving

33

## Double blinding

### Both patients and evaluators are unaware

34

## Placebo

### Dummy treatment made from inactive substance

35

## Placebo effect

### When a dummy treatment improves a patient's condition simply because the patient has the expectation that it will

36

## Statistically significant

### When comparing two groups, the difference between them is so large that it would rarely occur by chance alone

37

## Key concepts in experimental design

### Control groups, replication, randomization, blocking

38

## Block design

### Subjects are grouped into blocks of similar individuals. The purpose is to isolate the variability between blocks so we can see the effects of treatments more clearly. Allows for comparison of more than two treatments

39

## Matched pairs

### A reduced block design in which the block only contains two subjects. Researcher must choose pairs of subjects that are as closely matched as possible (twins). Then randomly assign them into treatment groups

40

## Steps in determining statistical significance

###
1. Determine the null and alternative hypotheses

2. Summarize the data into a test statistic after first verifying that the necessary data conditions are met

3. Decide whether or not the result is statistically significant based on the cutoff value

4. Report conclusion in context of the situation

41

## Null hypothesis

### Ho: two variables are not related in the population

42

## Alternative hypothesis

### Ha: two variables are related in the population

43

## Chi square statistic

### Measures difference between the observed counts and the counts that would be expected if there were no relationship

44

## If the chi square (X^2) test statistic is greater than the critical value then

### Since our test stat is greater than the critical value, we reject the null hypothesis in favor of the alternative. The data give evidence of a relationship between the groups and categories

45

## Chi square less than 3.841 (the critical value at the 0.05 level with 1 degree of freedom)

### Since our test stat is not greater than the critical value, we fail to reject the null hypothesis. This does not prove the null is true; there is simply not enough evidence to conclude that there is a relationship between the groups and the categories

46

## X^2 formula

### Sum over all cells of (observed count - expected count)^2 / expected count
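A short Python sketch of the chi square calculation, summing the formula over every cell. The observed and expected counts are made up for illustration (a 2x2 table flattened into a list), and 3.841 is used as the cutoff on the assumption of 1 degree of freedom at the 0.05 level.

```python
def chi_square_statistic(observed, expected):
    """Sum (observed - expected)^2 / expected over all cells of the table."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 2x2 table, flattened cell by cell.
observed = [30, 20, 20, 30]
expected = [25, 25, 25, 25]

x2 = chi_square_statistic(observed, expected)   # each cell contributes 25/25 = 1, total 4.0

CRITICAL_VALUE = 3.841          # assumes df = 1 and a 0.05 significance level
reject_null = x2 > CRITICAL_VALUE
```

Here the statistic exceeds the cutoff, so the null hypothesis of no relationship would be rejected for this made-up table.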

47

## Weak linear correlation, negative or positive

###
R is greater than or equal to -0.35 and less than 0

R is greater than zero and less than or equal to 0.35

48

## Moderate correlation

###
R is greater than .35 and less than or equal to .75

R is greater than or equal to -.75 and less than -.35

49

## Strong correlation

###
R is greater than or equal to -1 and less than -.75

R is greater than .75 and less than or equal to 1
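The weak, moderate, and strong cutoffs from the three cards above can be sketched as one Python function. The function name is hypothetical, and the label for r = 0, which the cards do not cover, is an assumption.

```python
def correlation_strength(r):
    """Label r using the deck's cutoffs: |r| <= 0.35 weak, <= 0.75 moderate, else strong."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear correlation"   # assumption: not covered by the cards
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size <= 0.35:
        strength = "weak"
    elif size <= 0.75:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction}"

label = correlation_strength(0.80)   # "strong positive"
```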

50

## Linear model

### A regression line is a straight line that models the linear relationship between an explanatory variable and a response variable. Only useful when one variable helps to predict the other

51

## Leverage

### X values far from the mean of x

52

## Outliers

### Any data point that stands away from the others

53

## Influential points

### Removing this point from the data set results in a very different regression model

54

## Least-square regression line (LSRL)

### Describes how a response variable y changes as an explanatory variable x changes

55

## LSRL predicts

### A response, y hat, from a given explanatory variable x

56

## LSRL formula

### Predicted y = y intercept + (slope times x)

57

## Formula for y intercept

### Y intercept = average of y - (slope times average of x)

58

## Slope formula

### r(Sy/Sx)
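A minimal Python sketch of the slope and y intercept formulas, built from r, Sx, Sy, and the two averages. The data points are made up and chosen to lie exactly on y = 10 + 2x, so the result is easy to check by hand; the function name is hypothetical.

```python
import statistics

def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line from summary statistics."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)   # sample standard deviations
    # Correlation coefficient r from the products of deviations.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    slope = r * s_y / s_x                  # slope = r(Sy/Sx)
    intercept = y_bar - slope * x_bar      # intercept = average of y - slope * average of x
    return intercept, slope

xs = [1, 2, 3, 4, 5]          # hypothetical explanatory values
ys = [12, 14, 16, 18, 20]     # hypothetical responses, exactly 10 + 2x

intercept, slope = lsrl(xs, ys)
predicted = intercept + slope * 6   # predicted y for x = 6
```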

59

## Residual is

### The difference between observed y and predicted y

60

## Residual formula

### e= y-(y hat)

61

## If you make an over prediction for y then

### e is negative

62

## If you make an underprediction for y then

### e is positive
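The sign rule for residuals can be checked with a short Python sketch. It assumes the line predicted y = 10 + 2x as an example model, and the observed points are made up.

```python
def predict(x, intercept=10.0, slope=2.0):
    """Predicted y (y hat) from the example line y hat = 10 + 2x."""
    return intercept + slope * x

def residual(observed_y, predicted_y):
    """e = y - y hat."""
    return observed_y - predicted_y

# Hypothetical observations (x, y).
e_over = residual(14, predict(3))    # predicted 16, observed 14: overprediction, e = -2
e_under = residual(21, predict(4))   # predicted 18, observed 21: underprediction, e = 3
```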

63

## Residual plot is

### Scatter plot of the (x,residual) pairs (x,e)

64

## If the linear model is a good fit then

### The residual plot looks random, with points scattered evenly around zero and no visible pattern

65

## DO NOT USE linear regression if your residual plot has these characteristics:

###
Unusually large values for your residuals

Nonlinear patterns (such as a curve)

Uneven variation (fanning)

Influential observations

66

## R^2 values (Excellent --> Weak)

###
Excellent: r^2>80%

Good: 50%

67

## Coefficient of determination formula

### r^2 (the square of the correlation coefficient r)

68

## Correlation coefficient formula

### r = plus or minus square root(r^2), taking the same sign as the slope of the regression line
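A small Python sketch of recovering r from r^2. Because squaring loses the sign, the sketch takes the sign from the regression slope; the function name and the example numbers are hypothetical.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Square root of r^2, carrying the sign of the regression slope."""
    return math.copysign(math.sqrt(r_squared), slope)

r_negative = r_from_r_squared(0.64, slope=-2.0)   # about -0.8
r_positive = r_from_r_squared(0.64, slope=2.0)    # about 0.8
```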

69

## Best way to display the distribution of a quantitative variable in a large dataset

### Histogram or boxplot

70

## Best way to display the distribution of a quantitative variable that is part of a small dataset

### Stemplot

71

## 3 factors in describing the distribution of a quantitative variable

### Shape, center, spread

72

## Is mean resistant to outliers

### No

73

## Is median resistant to outliers

### Yes

74

## Z score function

### How far our observation is from the mean, measured in standard deviations (SDs)

75

## A more negative z score means

### The value is farther below the mean (any negative z score is below the mean)

76

## A more positive z score means

### The value is farther above the mean (any positive z score is above the mean)
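A minimal Python sketch of the z score as the distance from the mean in standard deviations. The mean of 70 and SD of 10 are made-up values for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical exam scores with mean 70 and SD 10.
z_high = z_score(85, mean=70, sd=10)   # 1.5 SDs above the mean
z_low = z_score(55, mean=70, sd=10)    # 1.5 SDs below the mean
```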

77

## When you draw a boxplot, where should the lower whisker end

### At the smallest data point not smaller than the lower fence

78

## When you draw a boxplot, where should the upper whisker end

### At the largest data point not bigger than the upper fence
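A Python sketch of where the whiskers end. The cards do not define the fences, so this assumes the common convention of Q1 - 1.5 x IQR and Q3 + 1.5 x IQR. The quartiles come from `statistics.quantiles` with its default method, which may differ slightly from a textbook's quartile rule. The data are made up.

```python
import statistics

def whisker_ends(data):
    """Return (lower whisker end, upper whisker end) using assumed 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points that are still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return min(inside), max(inside)

data = [2, 10, 11, 12, 13, 14, 15, 16, 40]   # hypothetical data; 2 and 40 fall outside the fences
low, high = whisker_ends(data)               # whiskers end at 10 and 16
```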

79

## In a bar graph the x and y values are

###
X: category

Y: relative frequency

80

## In a histogram the y axis is

### The frequency (count) of observations whose x values fall in each bin

81

##
Interpret the slope of the equation

Y = 10 + 2x

### Each 1-unit increase in x is associated with a predicted increase of 2 units in y.

82

## Interpret the value of the coefficient of determination 0.65

### The model has good predictive power. 65% of the variability in y can be explained by its linear relationship with x

83

## Interpret the value of the correlation coefficient r=.80

### There is a strong positive linear relationship between x and y

84

## How to calculate the expected count for a cell of a table

### (Column total of the category times the row total) / n (the whole sample)
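A short Python sketch of the expected count calculation for every cell of a contingency table: row total times column total, divided by the overall sample size. The 2x2 table is made up for illustration.

```python
def expected_counts(table):
    """Expected count per cell = (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical observed counts: rows are groups, columns are categories.
observed = [[10, 30],
            [20, 40]]

expected = expected_counts(observed)   # [[12.0, 28.0], [18.0, 42.0]]
```

The expected counts keep the same row and column totals as the observed table.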

85

## A and B are independent if P(A and B)

### It is equal to P(A)P(B)

86

## A and B are independent if the probability of A given B is equal to

### P(A)

87

## A and B are independent if the probability of A or B is equal to

### P(A) + P(B) - P(A)P(B)

88

## Probability of neither A nor B happening

### 1 - P(A or B)

89

## Probability of not being in A given that you are in B

###
P(not A | B) = 1 - (probability of being in A given you are in B)

= 1-[P(A and B)/P(B)]
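The probability rules on the cards above can be checked with exact fractions in Python. P(A) = 1/2 and P(B) = 1/3 are made-up values, and A and B are assumed to be independent.

```python
from fractions import Fraction

p_a = Fraction(1, 2)   # hypothetical P(A)
p_b = Fraction(1, 3)   # hypothetical P(B)

# Independence assumption: P(A and B) = P(A)P(B).
p_a_and_b = p_a * p_b                  # 1/6

# General addition rule; under independence it equals P(A) + P(B) - P(A)P(B).
p_a_or_b = p_a + p_b - p_a_and_b       # 2/3

# "Neither A nor B" is the complement of "A or B".
p_neither = 1 - p_a_or_b               # 1/3

# Conditional probability; equals P(A) because A and B are independent.
p_a_given_b = p_a_and_b / p_b          # 1/2

# Complement rule inside the condition B.
p_not_a_given_b = 1 - p_a_given_b      # 1/2
```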

90

## Joint proportion is synonymous with

### Relative frequency or probability

91

## When we sample without replacement, each time we take out one item the probabilities will change. Why and how?

### The total number of items available drops by one with each draw, so the denominator of the fraction decreases by one. The numerator also decreases by one when the item removed is of the kind being counted
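A minimal Python sketch of sampling without replacement, showing the fraction shrinking by one in both numerator and denominator on each draw. The bag of 3 red and 7 blue items is made up, and the function name is hypothetical.

```python
from fractions import Fraction

def prob_all_red(red, total, draws):
    """Probability that every one of `draws` items taken without replacement is red."""
    p = Fraction(1)
    for i in range(draws):
        # After i reds are removed, red - i reds remain out of total - i items.
        p *= Fraction(red - i, total - i)
    return p

# Hypothetical bag: 3 red items out of 10 in total.
p_two_reds = prob_all_red(red=3, total=10, draws=2)   # 3/10 * 2/9 = 1/15
```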

92

## How to draw a tree diagram

### First the nonconditional probabilities, then the conditional probabilities, and then the "and" probabilities

93

## What is the distribution of each sample

### X~N(mean,SD)

94

## Absorbance

### The log (base 10) of the intensity of the incident light divided by the intensity of the transmitted light
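A short Python sketch of the absorbance definition, assuming a base-10 logarithm as is conventional. The intensity values are made up and in arbitrary units.

```python
import math

def absorbance(incident_intensity, transmitted_intensity):
    """Absorbance = log10(incident intensity / transmitted intensity)."""
    return math.log10(incident_intensity / transmitted_intensity)

# Hypothetical reading: one tenth of the incident light is transmitted.
a = absorbance(100.0, 10.0)   # 1.0
```

If all of the light is transmitted the ratio is 1 and the absorbance is 0.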

95

## Why is comparing the activity in different classes of cones essential for color discrimination

### The trichromatic human vision is based on relative levels of activity in 3 sets of cones that have opsins with different absorption spectra

96

## What pigment genes are localized on one chromosome

### Of the 3 cone pigments (opsins), the genes for red and green, which both sit on the X chromosome

97

## Does green opsin mean you can see green? Why or why not

### No, different activity in cones helps discriminate between colors. You need M and L opsins to see green, but the level of activity in each cone is different. The brain compares the differential activity of the neurons to process the signals

98

## Protanopia

### Loss of long wavelength sensitive cones

99

## What is the visual result of protanopia

### Difficulty discriminating between red and green

100

## Deuteranopia

### Loss of medium wavelength sensitive cones

101

## Deuteranopia results in

### Difficulty discriminating between red and green

102