Test 1 Flashcards

1
Q

What is statistics?

A

The science of reasoning with data

2
Q

What are the two branches of statistics?

A

Descriptive - consists of methods of collecting, organizing, summarizing, and presenting data in an informative way

Inferential - a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data

3
Q

What is data?

A

facts or numbers with context; the observed values of a variable

4
Q

What are the two types of data? Describe them.

A

Qualitative (Categorical) - places an individual into one of several groups. Can be a number, but this number doesn’t quantify anything.

Quantitative (Numerical) - takes numerical values for which arithmetic operations make sense (can be ordered and ranked)

5
Q

What are the types of Qualitative (Categorical) data? Describe them.

A

Nominal - Data that consists of names, labels, or categories only. The data does not have a natural order.

Ordinal - Data that does have a natural order.

6
Q

What are the types of Quantitative (Numerical) data?

A

Discrete - assumes values that can be counted

Continuous - can assume all values in between any two specific values

7
Q

What is an individual?

A

objects described by a set of data (aka subject, experimental unit)

8
Q

What is a variable?

A

any characteristic of an individual

9
Q

What is a value?

A

a possible observation of a variable

10
Q

What is a population?

A

consists of all subjects or items of interest. It is the group being studied.

11
Q

What is a parameter?

A

a numerical measurement describing a population (e.g., the population mean, median, or mode)

12
Q

What is a census?

A

the collection of data from every member of a population

13
Q

What is a sample?

A

a group selected from the population (a subset of the population)

14
Q

What is a statistic?

A

a numerical measurement describing a sample
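To make the parameter/statistic distinction concrete, here is a small sketch. The class of 10 exam scores is made-up data: the mean of all 10 scores is a parameter, while the mean of a random sample of 4 of them is a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: exam scores for every student in a class of 10.
population = [72, 85, 90, 64, 78, 88, 95, 70, 81, 77]

# Parameter: a number describing the whole population.
mu = statistics.mean(population)

# Statistic: the same measurement computed from a sample only.
random.seed(1)  # fixed seed so the draw is reproducible
sample = random.sample(population, 4)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu}")
print(f"sample mean (statistic): {x_bar}")
```

Rerunning with a different seed changes the statistic but never the parameter.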

15
Q

What are the steps of data analysis?

A

1) Collect the information needed
2) Organize and summarize the information
3) Draw conclusions from or analyze the information

16
Q

What are the methods of collecting data?

A

retrospective study, observational study, designed experiment

17
Q

What is a retrospective study including pros and cons?

A

using information or data that is readily available to us
pros - cheap, quick, easy
cons - no control over how or what data was collected

18
Q

What is an observational study including pros and cons?

A

observes individuals or processes and measures variables of interest but does not attempt to influence the responses

pros - can detect associations between variables whose values have already been determined. Can also be used to study variables that are impossible or unethical to control in a lab setting

cons - cannot isolate causes to determine causation

19
Q

What is a confounding (lurking) variable?

A

a variable that has not been accounted for but which is causing a difference in the groups being studied

20
Q

What is a designed experiment including pros and cons?

A

isolates cause and effect by giving the researcher full control over the conditions; good blinding practices and placebos are encouraged

21
Q

What are the goals of experimental design?

A

replication, randomization, control of error
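Randomization usually means assigning subjects to groups by chance. Here is a minimal sketch that splits subjects into a treatment group and a control group at random. The function name `randomly_assign` and the subject names are hypothetical.

```python
import random

def randomly_assign(subjects, seed=None):
    """Split subjects into a treatment group and a control group at random."""
    rng = random.Random(seed)      # optional seed for a reproducible assignment
    shuffled = list(subjects)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

groups = randomly_assign(["Ann", "Ben", "Cal", "Dee", "Eve", "Fay"], seed=42)
print(groups)
```

Because chance alone decides group membership, lurking variables tend to be spread evenly across both groups.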

22
Q

What is a factor?

A

a treatment we may be interested in

23
Q

What is a level?

A

the values (settings) that a factor can take

24
Q

What is a treatment combination (interaction)?

A

a specific combination of levels of each factor
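Treatment combinations can be listed by choosing one level from every factor. The two-factor plant experiment below is hypothetical: 3 fertilizer levels × 2 watering levels give 6 combinations.

```python
from itertools import product

# Hypothetical experiment with two factors and their levels.
factors = {
    "fertilizer": ["none", "low", "high"],   # 3 levels
    "watering": ["daily", "weekly"],         # 2 levels
}

# Each treatment combination picks exactly one level from every factor.
combinations = list(product(*factors.values()))

for combo in combinations:
    print(dict(zip(factors.keys(), combo)))

print(len(combinations), "treatment combinations")   # 3 * 2 = 6
```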

25
Q

What is pairing?

A

finding a similar individual or unit to compare outcomes

26
Q

What are blocks?

A

multi-level pairing

27
Q

What are the goals when choosing a sample?

A
- the sample is representative of the population
- minimize the cost of obtaining the sample (money, time, personnel, etc.)

28
Q

What is bias?

A

a tendency to produce an untrue value

29
Q

What are the types of bias? Describe them.

A

Measurement bias - results from asking questions that do not produce a true answer
Sampling bias - occurs when a sample is used that is not representative of the population

30
Q

What are sources of measurement bias?

A

1) self-reporting of personal data
2) confusing, leading, non-neutral survey questions
3) missing data, precise numbers, percentages, scales used

31
Q

What are sources of sampling bias?

A

1) voluntary response samples (internet polls)

2) convenience sampling (small samples, non-responses, incentivized sampling)

32
Q

What is simple random sampling?

A

Each individual in the population has the same chance of being chosen for the sample. Subjects are selected without replacement. (drawing out of a hat)
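The "drawing out of a hat" idea corresponds to drawing without replacement, which is what Python's `random.sample` does. A minimal sketch follows. The helper name `simple_random_sample` and the ID list are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement, every individual equally likely."""
    rng = random.Random(seed)          # optional seed for a reproducible draw
    return rng.sample(population, n)   # sample() never picks the same item twice

# Hypothetical population: student ID numbers 1 to 100.
students = list(range(1, 101))
chosen = simple_random_sample(students, 10, seed=7)
print(sorted(chosen))
```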

33
Q

What is systematic sampling?

A

selects every kth individual from an ordered list, starting from a randomly chosen position among the first k (like counting off)
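One common way to carry out "counting off" is to compute an interval k = N / n, pick a random start among the first k individuals, and then take every kth one. The sketch below assumes an ordered list and rounds k down. The helper name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Take every k-th individual after a random start, with k = N // n."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n           # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)           # random start in 0 .. k-1
    return population[start::k][:n]    # every k-th individual, trimmed to n

students = list(range(1, 101))        # hypothetical ordered list of IDs
print(systematic_sample(students, 10, seed=3))
```

Only the starting point is random. After that, the selection is fixed by the interval.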

34
Q

What is stratified sampling?

A

dividing a population into subgroups (strata) of similar individuals and then taking a simple random sample from each subgroup (for example, the same number from each)
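A minimal sketch of stratified sampling that draws the same number of individuals from every stratum. Sampling in proportion to stratum size is another common choice that is not shown here. The strata (class years) and the helper name are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman":  [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior":    [f"J{i}" for i in range(20)],
}
sample = stratified_sample(strata, 5, seed=0)
print(sample)
```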

35
Q

What does a distribution allow us to do?

A

Examine the variation of the data

36
Q

What values are included in the 5-number summary?

A

minimum, Q1, median, Q3, maximum
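Here is a sketch that computes the five-number summary with Python's `statistics.quantiles` (Python 3.8+). Quartile conventions differ: a textbook or calculator may give slightly different Q1 and Q3 than the "inclusive" method used here. For example, the "median of each half" rule gives Q1 = 2.5 for the data below.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```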

37
Q

Pros and cons of histograms?

A

pros - 1) good for large data sets
       2) helps focus on the general shape of the data
       3) visual representation of a frequency table

cons - 1) individual data values are hidden
       2) the apparent shape of the distribution changes with the bin width
       3) useful for quantitative data only

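The bin-width con is easy to see by counting the same data into bins of two different widths. The data and the helper `bin_counts` are hypothetical. Bins are half-open, [low, low + width), and empty bins are left out.

```python
from collections import Counter

def bin_counts(data, width, start=0):
    """Count values in bins [start + k*width, start + (k+1)*width); empty bins omitted."""
    counts = Counter((x - start) // width for x in data)
    return {start + k * width: counts[k] for k in sorted(counts)}

data = [1, 2, 2, 3, 7, 8, 8, 9, 9, 12, 15, 18]
print(bin_counts(data, width=5))    # four bins with uneven heights
print(bin_counts(data, width=10))   # two bins: most detail is lost
```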
38
Q

How do we summarize numerical distributions?

A

S - Shape (Distribution)
O - Outliers
C - Center
S - Spread (Variation)

39
Q

What are the types of shapes of distributions?

A

symmetric - uniform, bell shaped, other symmetric shapes

asymmetric - right skewed, left skewed

unimodal, bimodal

40
Q

Pros and cons of boxplots?

A

pros - good for comparing data sets, shows five number summary, identifies outliers

cons - does not show individual values
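Boxplots commonly flag outliers with the 1.5 × IQR rule: any value below Q1 − 1.5·IQR or above Q3 + 1.5·IQR is plotted as a separate point. The rule is a widespread convention and is assumed here, since the card does not name one. The quartiles use the same "inclusive" method as `statistics.quantiles` (Python 3.8+).

```python
import statistics

def iqr_outliers(data):
    """Return values outside the 1.5 * IQR fences (a common boxplot convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```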

41
Q

Pros and cons of stem-and-leaf plots?

A

pros - shows individual values and distribution of data

cons - not good for large data sets
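Here is a small sketch of how a stem-and-leaf plot keeps every individual value. It assumes non-empty, non-negative whole numbers, using the tens digit as the stem and the ones digit as the leaf. The helper name is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return plot lines for non-negative integers (stem = tens, leaf = ones)."""
    groups = defaultdict(list)
    for x in sorted(data):
        groups[x // 10].append(x % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):   # include empty stems
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

for line in stem_and_leaf([12, 15, 21, 27, 27, 43]):
    print(line)
```

With hundreds of values, each row would become a long string of leaves, which is the "not good for large data sets" con.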

42
Q

For a skewed distribution, does mean or median better represent the center?

A

median
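A quick illustration with made-up, right-skewed income data (in thousands of dollars). One very large value pulls the mean far above the typical value, but the median stays put.

```python
import statistics

# Hypothetical right-skewed data: most incomes are modest, one is very large.
incomes = [32, 35, 38, 40, 41, 45, 300]   # thousands of dollars

print("mean:  ", statistics.mean(incomes))    # pulled up by the 300
print("median:", statistics.median(incomes))  # middle value, 40
```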

43
Q

True or False: For a perfectly symmetric distribution, the mean and median are exactly the same.

A

True

44
Q

What is probability?

A

The chance of an event occurring

45
Q

What is a probability experiment?

A

a repeatable process where the results are uncertain

46
Q

What is an outcome?

A

the result of a single trial of a probability experiment

47
Q

What is an event?

A

an outcome or set of outcomes of a probability experiment

48
Q

What is a sample space?

A

set of all possible outcomes of a probability experiment. Usually denoted by S.

49
Q

What is a probability model?

A

a mathematical description of a random phenomenon. Includes a sample space and a way of assigning probabilities to events

50
Q

How do you find a probability using the classical approach?

A

P(E) = (number of outcomes in E) / (total number of outcomes in the sample space S), assuming all outcomes are equally likely
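Here is the classical formula applied to one roll of a fair die, where all six outcomes are equally likely. `Fraction` keeps the answer exact. The helper name `classical_probability` is hypothetical.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(E) = |E| / |S|; only valid when every outcome in S is equally likely."""
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}                  # one roll of a fair die
even = {x for x in S if x % 2 == 0}     # E = "roll is even"
print(classical_probability(even, S))   # 1/2
```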

51
Q

What is the union of two events?

A

the event consisting of all outcomes that are contained in either (or both) of two events. Written E1 ∪ E2 ("E1 or E2").

52
Q

What is the intersection of two events?

A

The event consisting of all outcomes contained in both of two events. Written E1 ∩ E2 ("E1 and E2").

53
Q

What is the complement of an event?

A

The set of outcomes in the sample space that are not contained in the event. Written Ē (E with a bar over it) or E'.
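Union, intersection, and complement map directly onto Python set operators. The die events below are chosen for illustration.

```python
S = {1, 2, 3, 4, 5, 6}      # sample space: one roll of a die
E1 = {2, 4, 6}              # "roll is even"
E2 = {4, 5, 6}              # "roll is at least 4"

union = E1 | E2             # E1 or E2  -> {2, 4, 5, 6}
intersection = E1 & E2      # E1 and E2 -> {4, 6}
complement_E1 = S - E1      # not E1    -> {1, 3, 5}

print(union, intersection, complement_E1)
```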

54
Q

What is PMF?

A

Probability Mass Function; it gives the probability that a discrete random variable, X, is a specific value.
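As an example, here is the PMF of X = number of heads in two fair coin tosses (outcomes HH, HT, TH, TT), stored as a lookup table. Any value not in the table has probability 0, and the probabilities must sum to 1.

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}

def p(x):
    """P(X = x); values outside the table have probability 0."""
    return pmf.get(x, Fraction(0))

assert sum(pmf.values()) == 1   # a valid PMF sums to 1
print(p(1))                     # 1/2
```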

55
Q

What is CDF?

A

The Cumulative Distribution Function; it gives the probability that a random variable X is less than or equal to a certain value x. For a continuous random variable, values can be plugged directly into the CDF without evaluating an integral, because the CDF is already the integral of the PDF from -∞ to x.
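The exponential distribution is used here as an assumed example, since its CDF has the closed form F(x) = 1 − e^(−λx) for x ≥ 0. Probabilities come from plugging values straight into F, and P(a < X ≤ b) = F(b) − F(a).

```python
import math

def exponential_cdf(x, rate):
    """F(x) = P(X <= x) for an exponential random variable with the given rate."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-rate * x)

rate = 0.5
print(exponential_cdf(2, rate))                              # P(X <= 2)
print(exponential_cdf(4, rate) - exponential_cdf(2, rate))   # P(2 < X <= 4)
```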

56
Q

What is the PDF?

A

Probability Density Function; for a continuous random variable X, the probability that X falls between a and b is the area under the PDF from a to b: P(a ≤ X ≤ b) = integral from a to b of f(x) dx
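The sketch below integrates an exponential PDF (an assumed example) numerically with the midpoint rule and checks the area against the closed-form answer. A single PDF value is not a probability: with rate 2, f(0) = 2, which is greater than 1.

```python
import math

def exponential_pdf(x, rate):
    """f(x) = rate * e^(-rate * x) for x >= 0, else 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

rate = 0.5
prob = integrate(lambda x: exponential_pdf(x, rate), 2, 4)   # P(2 <= X <= 4)
exact = math.exp(-rate * 2) - math.exp(-rate * 4)            # closed-form area
print(prob, exact)
```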

57
Q

What do the probabilities on a z-score table represent?

A

The probability that a standard normal value Z is less than the given z (the area to the left of z).
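A left-tail z-table entry can be reproduced with the standard normal CDF, Φ(z) = ½(1 + erf(z/√2)). A value x from a normal distribution with mean μ and standard deviation σ is first converted with z = (x − μ)/σ. Python 3.8+ also provides the same function as `statistics.NormalDist().cdf`.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z, i.e. P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(standard_normal_cdf(1.96), 4))   # 0.975
```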