Exam 1 Flashcards

(91 cards)

1
Q

Data file

A

the format in which statistical data is organized, typically in spreadsheet form. Rows contain measurements for a particular subject, columns contain measurements for a particular characteristic

2
Q

Simulation

A

use of a computer to mimic what would actually happen if you selected a sample and used statistics in real life. These are done when it is not practical to physically perform an experiment. Probability sampling is used in designing simulations

3
Q

Response variable

A

variable we are interested in measuring

4
Q

component

A

what you are simulating through use of a random device

5
Q

trial

A

One repetition of a simulation/experiment

6
Q

Steps for building simulations

A
  1. Identify component to be repeated/simulated
  2. Explain how you will model the component’s outcome
  3. State response variable clearly
  4. Explain how to combine the components into a trial to model the response variable
  5. Run several trials
  6. Collect and summarize the results of the trials
  7. State your conclusion
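A minimal sketch of the seven steps, assuming a made-up question ("on average, how many rolls of a fair die does it take to see a 6?"). The function names, the seed, and the number of trials are illustrative choices, not part of the card.

```python
import random

def run_trial(rng):
    """One trial: repeat the component until a 6 appears."""
    rolls = 0
    while True:
        rolls += 1
        # Steps 1-2: the component is a single die roll, modeled by randint(1, 6)
        if rng.randint(1, 6) == 6:
            # Steps 3-4: the response variable is the number of rolls needed
            return rolls

def simulate(num_trials, seed=0):
    """Steps 5-6: run many trials and summarize them with the mean."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    results = [run_trial(rng) for _ in range(num_trials)]
    return sum(results) / len(results)

average_rolls = simulate(10_000)
# Step 7: state the conclusion based on the summary
print(f"Estimated average number of rolls: {average_rolls:.2f}")
```

The estimate should land near the theoretical value of 6, and it gets closer as the number of trials grows.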
7
Q

3 reasons for studying stats

A
  1. being informed
  2. making good decisions
  3. evaluating decisions that affect you
8
Q

Definition of statistics

A

The science of learning from data in the presence of variability. Variability is everywhere

9
Q

Statistical problem solving process

A
  1. formulate a statistical research question
  2. collect data
  3. analyze data
  4. interpret results
10
Q

Main components of statistics

A
  1. design: plan on how to obtain data to answer the question
  2. description: summarize and analyze the data
  3. probability: determine how sample differs from population
  4. Inference: make decisions and predictions
11
Q

Variable

A

any characteristic observed in a study

12
Q

data

A

the values of a variable for one or more people or things

13
Q

Observation

A

(subject) an individual piece of data

14
Q

data set

A

the collection of all observations for a particular variable

15
Q

Categorical variable

A

(qualitative) A variable whose values fall into distinct categories. Values can still be written as numbers (for example, zip codes) when the numbers act as labels rather than quantities

16
Q

Quantitative variable(and types)

A

a numerical variable

Types
  1. Discrete: values form a set of separate numbers. Typically something we count
  2. Continuous: values form a continuum, with an infinite number of possible values. Typically something we measure
17
Q

Reasons for identifying different data types

A
  1. Choose appropriate graphical display
  2. Choose correct statistical method for inferential procedures

18
Q

W’s and H for data

A

How, What, Where, When, Why, Who

19
Q

Frequency distribution

A

A listing of distinct categories and their frequencies

20
Q

Relative frequency distribution

A

A listing of distinct values and their relative frequencies(proportions and percentages). Used to compare samples of unequal size
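A small sketch of turning raw categorical data into a relative frequency distribution. The sample values and the helper name `relative_frequencies` are made up for illustration.

```python
from collections import Counter

def relative_frequencies(values):
    """Map each distinct value to its proportion of the whole sample."""
    counts = Counter(values)   # the frequency distribution
    total = len(values)
    return {category: count / total for category, count in counts.items()}

# Hypothetical sample of 8 observations
sample = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
rel = relative_frequencies(sample)
print(rel)  # {'red': 0.5, 'blue': 0.25, 'green': 0.25}
```

Because each proportion is divided by the sample size, two samples of different sizes can be compared on the same 0 to 1 scale.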

21
Q

Joint event

A

Event with two or more characteristics

22
Q

How to tell if there is an association or not?

A

Association: relative frequencies differ

No association: relative frequencies are similar

23
Q

Dot plots

A
  • easy to make
  • useful for comparing 2 or more data sets
  • display individual values of data set
  • good for smaller data sets
  • shows raw data
24
Q

Stem plots

A
  • not useful with large data sets
  • Usually displays more info than histograms
  • include raw data
  • useful for comparing 2 or more data sets
  • Have a “stem” (can have more than one digit) and a “leaf” (cannot have more than one digit)
  • arranged in ascending order
  • must have a key
25
Histograms
* analogous to bar charts * horizontal axis has classes of quantitative data * frequency, relative frequency or percent * bars touch * good for larger data sets * good if you need more flexibility
26
Time plots
* show changes over time * vertical axes show each observation * horizontal axes show time when observation was measured * trends can be seen by connecting points
27
what does “n” usually indicate?
sample size
28
Which measures of center are resistant to outliers and which aren't?
* Resistant: Median | * Not resistant: Mean
29
Which measures of center are useful with quantitative data and which are useful with qualitative/categorical data?
Mean and median can only be used with quantitative data. Mode can be used with both
30
What can you know about the distribution if the mean is greater than the median? What about if the mean is less than the median?
Mean is greater than median: right skewed | Mean is less than median: left skewed
31
Measures of variation(purpose and types)
Indicate amount of spread in a distribution. Types: 1. Range: maximum minus minimum, uses only the two most extreme observations, not resistant to outliers 2. Standard deviation: accounts for all observations, indicates how far on average observations lie from the mean, not resistant to outliers 3. Interquartile range (IQR): Q3 minus Q1, the spread of the middle 50% of the data, used with boxplots
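A quick sketch of why range and standard deviation are not resistant to outliers. The data values are made up, and the sample standard deviation (n - 1 in the denominator) is an assumption, since the card does not say which version is meant.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
with_outlier = data + [40]        # same data plus one extreme value

data_range = max(data) - min(data)                      # 9 - 2 = 7
outlier_range = max(with_outlier) - min(with_outlier)   # 40 - 2 = 38

sample_std = statistics.stdev(data)            # sample standard deviation
outlier_std = statistics.stdev(with_outlier)   # grows sharply with the outlier

print(data_range, outlier_range)
print(round(sample_std, 2), round(outlier_std, 2))
```

A single extreme value changes both measures a lot, which is the behavior the card describes as "not resistant".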
32
which types of graphical displays are for quantitative data?
1. dot plots 2. stem and leaf plots 3. histograms 4. time plots
33
Graphical displays for categorical data
1. Frequency distribution 2. Relative frequency distributions 3. Pie charts: use relative frequencies, aka circle graph, difficult to construct by hand, best for data sets with few categories 4. Bar charts: easiest way to graph, horizontal axis is distinct values of categorical data, vertical axis is frequencies or relative frequencies 5. Pareto charts: bar graph with bars ordered from tallest to shortest
34
Response variable
measured to make comparisons between groups
35
Explanatory variable
(predictor) explains or predicts the values of the response variable
36
Association
relationship between 2 variables
37
Contingency table
Frequency distribution for bivariate data, also called a two way or cross tabulation table
38
Conditional proportions
Proportions of the response variable's categories, computed separately within each category of the explanatory variable
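A sketch of computing conditional proportions from a contingency table, assuming rows are categories of the explanatory variable and columns are categories of the response variable. The counts and group names are invented for illustration.

```python
def conditional_proportions(table):
    """For each explanatory group, divide each count by that group's total."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {category: count / total for category, count in counts.items()}
    return result

# Hypothetical two-way table: explanatory = group, response = outcome
table = {
    "treatment": {"improved": 30, "not improved": 20},
    "control": {"improved": 15, "not improved": 35},
}
props = conditional_proportions(table)
print(props["treatment"]["improved"])  # 0.6
print(props["control"]["improved"])    # 0.3
```

Here the proportions differ between groups (0.6 versus 0.3), which is the pattern that suggests an association between the two variables.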
39
Empirical rule
Applies to bell shaped distributions: about 68% of data falls within 1 standard deviation of the mean, about 95% falls within 2 standard deviations, and about 99.7% falls within 3
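The three percentages can be checked against an exact normal curve. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+); the helper name `within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """Probability of falling within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

The computed values round to 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.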
40
Percentile
* measure of relative standing * indicates the value below which a certain percentage of observations fall * resistant to outliers * often preferred over mean and STD * Divides data into 100 equal parts, there are 99 percentiles
41
Types of percentiles
1. Deciles: divide data into tenths 2. Quartiles: divide data into fourths •1st quartile: aka lower quartile, median of lower half of data, divides lower 25% and upper 75% •Second quartile: median •Third quartile: divides bottom 75% from top 25%
42
5 number summary and its graph
1. Minimum 2. Q1 3. Median 4. Q3 5. Maximum represented by a boxplot
43
Interquartile range
* Preferred measure of variation when median is used * IQR=Q3-Q1 * more resistant to outliers than range or standard deviation
44
Finding potential outliers with IQR
1. less than Q1-1.5•IQR | 2. greater than Q3+1.5•IQR
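A sketch of the 1.5•IQR check, assuming quartiles are found as the medians of the lower and upper halves of the sorted data (other quartile conventions exist and can give slightly different cutoffs). The data set and function names are made up.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd count, the middle value is left out of both halves
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

def potential_outliers(values):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [1, 3, 4, 5, 5, 6, 7, 8, 30]   # hypothetical observations
print(quartiles(data))            # (3.5, 5, 7.5)
print(potential_outliers(data))   # [30]
```

With Q1 = 3.5 and Q3 = 7.5, the IQR is 4, so the cutoffs are -2.5 and 13.5 and only the value 30 is flagged.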
45
Difference between potential outlier and outliers
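A potential outlier is an observation flagged by a rule such as the 1.5•IQR criterion; an outlier is an observation that is actually far removed from the rest of the data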
46
SOCS
* Acronym for Shape, Outliers, center, spread | * Use to describe distributions of quantitative data
47
Components of graph shape
Modality: # of peaks, can be unimodal, bimodal or multimodal | Skewness and symmetry
48
Outlier criterion using z scores
|z| > 3 (the observation is more than 3 standard deviations from the mean)
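A sketch of the z-score check, reading the criterion as |z| > 3 with z = (observation - mean) / standard deviation. The mean of 100 and standard deviation of 10 are made-up values.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std

def is_outlier(x, mean, std, cutoff=3):
    """True when x is more than `cutoff` standard deviations from the mean."""
    return abs(z_score(x, mean, std)) > cutoff

print(z_score(130, 100, 10))     # 3.0
print(is_outlier(130, 100, 10))  # False: exactly 3, not more than 3
print(is_outlier(65, 100, 10))   # True: z = -3.5
```

Taking the absolute value of z is what lets the same cutoff catch unusually low values as well as unusually high ones.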
49
How to know whether to use mean or median for measure of center
* Use mean if possible because it takes into account the actual values of all observations * mean is good for symmetric distributions and for data with a small number of discrete values * median is good for skewed distributions or when potential outliers are present
50
What is reported with the mean? With the median?
Mean and standard deviation are reported together while IQR and range are reported with median
51
Probability
The science of uncertainty, used to evaluate and control the likelihood that a statistical inference is correct. It quantifies uncertainty
52
Types of probability
1. Subjective: guessing a probability based off personal judgement 2. Theoretical: Based on formulas 3. Experimental/empirical: results of a random experiment
53
Common cutoff values for an event to be considered “unusual”
1%, 5%, 10%(mainly 5%)
54
Law of large numbers
As the number of repetitions of an experiment grows, the proportion of times an event occurs approaches the probability of that event. This is the basis of the frequentist interpretation: probability is the long-run proportion. Ignores black swan events. Helps understand and visualize the meaning of probability
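A sketch of the long-run idea using simulated fair coin flips. The seed and the flip counts are arbitrary choices for illustration.

```python
import random

def proportion_of_heads(num_flips, seed=1):
    """Simulate fair coin flips and return the proportion that were heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

for n in (10, 1_000, 100_000):
    print(n, proportion_of_heads(n))
```

The proportion can be far from 0.5 with only 10 flips, but it should settle close to 0.5 as the number of flips grows.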
55
Sample space
all possible outcomes for an experiment
56
Ways to visualize a sample space
Tree diagram or Venn diagram
57
Event
A subset of the sample space. A collection of 1 or more outcomes
58
Complement of an event
* The event that A does not occur: all outcomes in the sample space that are not in A * denoted as A^c * P(A^c)=1-P(A)
59
Disjoint events
* aka mutually exclusive events * events that do not have any outcomes in common * events that can't happen at the same time * complementary events are disjoint
60
Intersection
* consists of outcomes that are in both events, the overlap | * disjoint events: P(A and B)=0
61
Union
* A or B | * Outcomes that are in one event or the other, or in both
62
P(A or B)
Disjoint: = P(A)+P(B) | Not disjoint: = P(A)+P(B)-P(A and B)
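A sketch of the addition rule on a small sample space, assuming one roll of a fair die with equally likely outcomes. The events and the helper `prob` are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed example)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3
C = {1, 3}      # disjoint from A

# Not disjoint: subtract the overlap so it is not counted twice
p_a_or_b = prob(A) + prob(B) - prob(A & B)
# Disjoint: the overlap is empty, so the probabilities simply add
p_a_or_c = prob(A) + prob(C)

print(p_a_or_b, prob(A | B))   # 2/3 2/3
print(p_a_or_c, prob(A | C))   # 5/6 5/6
```

In both cases the formula matches the probability obtained by counting the outcomes in the union directly.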
63
Conditional probability
The probability of an event occurring when you know that another event has occurred. P(A|B)=P(A and B)/P(B): the probability that event A will occur given that B has occurred. We are conditioning on event B, meaning B is known to have occurred
64
Formula for intersection of two events using conditional probability
P(A and B)=P(A)•P(B|A) | P(A and B)=P(B)•P(A|B)
65
Methods for determining if events are independent
1. P(A|B)=P(A) 2. P(B|A)=P(B) 3. P(A and B)=P(A)•P(B)
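A sketch of conditional probability and the independence checks, assuming one roll of a fair die with equally likely outcomes. The events and helper names are illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed example)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def conditional(a, b):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)

def independent(a, b):
    """Check method 3: P(A and B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
at_most_2 = {1, 2}

print(conditional(even, greater_than_3))   # 2/3, not equal to P(even) = 1/2
print(independent(even, greater_than_3))   # False
print(conditional(even, at_most_2))        # 1/2, equal to P(even)
print(independent(even, at_most_2))        # True
```

The three methods agree: whenever P(A|B) equals P(A), the product rule P(A and B) = P(A)•P(B) also holds.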
66
Sensitivity
The probability that the test will give a positive result, given that the condition tested for is present P(Positive result|condition present)
67
Specificity
The probability that the test will give a negative result, given that the condition tested for is not present P(Negative result|Condition isn't present)
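A sketch of computing sensitivity and specificity from test results. The counts below are hypothetical, not from any real study.

```python
def sensitivity(true_positives, false_negatives):
    """P(positive result | condition present)."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """P(negative result | condition not present)."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts: 100 people with the condition, 1000 without it
print(sensitivity(true_positives=90, false_negatives=10))    # 0.9
print(specificity(true_negatives=950, false_positives=50))   # 0.95
```

Each measure conditions on a different group: sensitivity uses only people who have the condition, and specificity uses only people who do not.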
68
Parameter
* Numerical summary of a population * Numerical summary of a probability distribution * Denoted by Greek letters
69
Random variable
A numerical measurement of the outcome of a random event
70
Expected value
the mean
71
Mean of a discrete probability distribution
mean=Σ x•p(x) | compute “x•p(x)” for each possible value x, then add the results
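A sketch of the calculation, reading the formula as the sum of x•p(x) over every possible value x. The example distribution (number of heads in two fair coin flips) is an assumed illustration.

```python
def expected_value(distribution):
    """Mean of a discrete distribution given as {value: probability}."""
    total_probability = sum(distribution.values())
    if abs(total_probability - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())

# Number of heads in two fair coin flips
heads_in_two_flips = {0: 0.25, 1: 0.5, 2: 0.25}
mean_heads = expected_value(heads_in_two_flips)
print(mean_heads)  # 1.0
```

Each value is weighted by how likely it is, so the result is a long-run average rather than a simple average of the possible values.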
72
What type of graph represents continuous distributions?
A smooth curve (density curve), with probabilities given by areas under the curve
73
Normal distribution
* used for continuous random variables | * symmetric and bell shaped
74
Properties of empirical rule
1. Data must be unimodal and approximately bell-shaped | 2. Probabilities are approximate
75
Rounding rules when working with normal distributions
Round to 4 decimal places
76
Conditions for binomial distribution
1. Fixed number of trials (n) 2. Each trial has 2 possible outcomes 3. The probability of success (p) is the same for each trial 4. Trials are independent
77
What happens to a binomial distribution if p isn't 0.50?
p<0.5: right skewed p>0.5: left skewed
78
How do you know if n is large enough in a binomial distribution?
np ≥ 15 and n(1-p) ≥ 15 (at least 15 expected successes and 15 expected failures)
79
Mean and standard deviation formulas for binomial distributions
Mean=np | Std=√(np(1-p))
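A sketch that computes the binomial mean and standard deviation, reading the standard deviation as the square root of np(1-p), and applies the large-sample check np ≥ 15 and n(1-p) ≥ 15. The values of n and p are made up.

```python
import math

def binomial_summary(n, p):
    """Return (mean, standard deviation, large-sample check) for a binomial."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    # At least 15 expected successes and 15 expected failures
    large_enough = n * p >= 15 and n * (1 - p) >= 15
    return mean, std, large_enough

mean, std, ok = binomial_summary(n=100, p=0.3)
print(round(mean, 2), round(std, 2), ok)   # 30.0 4.58 True
print(binomial_summary(n=20, p=0.3)[2])    # False: only 6 expected successes
```

When the check passes, the binomial distribution is close to bell shaped even though p is not 0.5.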
80
Ways to obtain information
census, sampling, experimentation
81
Mean and median in symmetric distributions
Mean and median can be used, they should be close in value
82
What is spread measured by?
Standard deviation and IQR
83
How to gauge symmetry
Look at how different the mean and median are
84
What type of statistics is probability?
Inferential
85
How do you measure spread for discrete random variables?
Range
86
What is used to find the center of probability distributions?
Mean
87
Purpose of descriptive statistics
Reduce the data to simple summaries without distorting too much information
88
Types of proportion distributions
1. Population distribution: almost never observed, we learn about it from sample distributions 2. Sample distribution: aka data distribution, consists of sample data you observe and analyze, should resemble population distribution if good sampling techniques were used 3. Sampling distribution: describes long run behavior of the statistic, specifies probabilities for all possible values of the statistic for samples of a given size
89
How to tell if a sampling distribution is approximately normal?
For the sampling distribution of a sample proportion: n•p and n(1-p) are both at least 15
90
Central limit theorem assumptions and conditions for the sampling distribution of p
1. Randomization condition: values are randomly obtained 2. Independence assumption: Sampled values are independent 3. 10% condition: n is no more than 10% of the population 4. Sample size assumption: n has to be large enough to expect at least 15 successes and at least 15 failures
91
Central limit theorem assumptions and conditions for the sampling distributions of the mean of observations
1. Randomization condition: values are sampled randomly 2. Independence assumption: sampled values are independent 3. 10% condition: n is no more than 10% of the population 4. Sample size assumption: There is no one size fits all rule, small samples work if the population is unimodal and symmetric, a larger sample is needed if it is skewed