Deck 1 Flashcards

(58 cards)

1
Q

Distribution skewed to the left

A

A distribution is skewed to the left (negatively skewed) if the left tail is longer than the right tail (e.g. lifespan)

2
Q

Distribution skewed to the right

A

A distribution is skewed to the right (positively skewed) if the right tail is longer than the left tail (e.g. wealth)
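The direction of skew can also be checked numerically. The sketch below is only an illustration, assuming the moment-based skewness coefficient (the mean of the cubed standardised deviations); the function name `skewness` and both data sets are made up for the example.

```python
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: mean of the cubed standardised deviations."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical data: one large value stretches the right tail.
right_skewed = [1, 1, 2, 2, 3, 20]
# Mirroring the data stretches the left tail instead.
left_skewed = [-v for v in right_skewed]

print(skewness(right_skewed))  # positive for a right (positive) skew
print(skewness(left_skewed))   # negative for a left (negative) skew
```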

3
Q

cross sectional data

A

where observations are made at a single point in time. The unit of observation is, for example, individuals, industries, regions, countries.

4
Q

time series data

A

where observations on the same variables are taken repeatedly over time. The unit of observation is a period of time, for example, annual, monthly, weekly, each minute.

5
Q

what is an experiment

A

an activity such as tossing a coin, which has a range of possible outcomes

6
Q

what is a trial

A

a single performance of the experiment

7
Q

what is a sample space

A

the set of all possible outcomes of the experiment. For a single toss of a coin, the sample space is {Heads, Tails}.

8
Q

mutually exclusive outcomes

A

only one can arise from a single trial. You cannot get both heads and tails by tossing a coin!

9
Q

event

A

a combination of outcomes, for example, rolling a die and obtaining an even number; a roulette wheel ball landing on one of the red numbers.

10
Q

rules of probability

A
  • A probability always lies between zero and one
  • The sum of the probabilities of all outcomes in the sample space must equal one
  • The probability of an outcome not occurring must equal one minus the probability of an outcome occurring
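The three rules can be verified on a small example. This is a minimal sketch assuming a fair six-sided die; the variable names are illustrative only.

```python
from fractions import Fraction

# Assumed example: a fair die, so each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Rule 1: every probability lies between zero and one.
all_in_range = all(0 <= p <= 1 for p in die.values())

# Rule 2: the probabilities over the whole sample space sum to one.
total = sum(die.values())

# Rule 3: P(not six) = 1 - P(six).
p_six = die[6]
p_not_six = 1 - p_six

print(all_in_range, total, p_not_six)
```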
11
Q

compound events

A

Compound events arise when we combine events and look for a joint probability, as with ‘and’ events and ‘or’ events.

12
Q

non-disjoint event

A

events that can both occur at the same time. In this case, their sets of outcomes within the sample space overlap.

13
Q

what are ‘or’ events

A

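compound events where event A or event B (or both) can take place. If the events are mutually exclusive, add the probabilities: P(A or B) = P(A) + P(B). If they can occur together, subtract the overlap so it is not counted twice: P(A or B) = P(A) + P(B) − P(A and B).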
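As an illustration of adding probabilities for ‘or’ events, the sketch below assumes one roll of a fair die and two events chosen for the example (an even number; a number above three). Because these two events overlap, the overlap is subtracted once, and the result is compared with a direct count of outcomes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed example)
even = {2, 4, 6}                   # event A
above_three = {4, 5, 6}            # event B

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_or_by_rule = prob(even) + prob(above_three) - prob(even & above_three)

# Direct count of the outcomes in A or B, for comparison.
p_or_by_counting = prob(even | above_three)

print(p_or_by_rule, p_or_by_counting)
```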

14
Q

what are ‘and’ events

A

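compound events where event A and event B both take place, which is only possible if the events are not mutually exclusive. Multiply the probabilities: P(A and B) = P(A) × P(B given A), which reduces to P(A) × P(B) when the events are independent.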
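For ‘and’ events the standard tool is the multiplication rule, P(A and B) = P(A) × P(B given A). The sketch below is a hedged illustration using two assumed examples: two tosses of a fair coin (independent) and two cards drawn without replacement (dependent).

```python
from fractions import Fraction

# Independent events: two tosses of a fair coin.
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head

# Dependent events: two aces drawn from a 52-card deck without replacement.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card already gone
p_two_aces = p_first_ace * p_second_ace_given_first

print(p_two_heads, p_two_aces)
```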

15
Q

random variables

A

A random variable is a variable whose outcome (value it takes) is, at least to some extent, a result of chance, and therefore unpredictable (not perfectly predictable).

16
Q

expected value

A

the mean of a probability distribution: the probability-weighted average of the values the random variable can take.
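For a discrete random variable the expected value is the sum of each value multiplied by its probability. A minimal sketch, assuming a fair six-sided die as the example distribution:

```python
from fractions import Fraction

# Assumed example: the score on one roll of a fair die.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# E[X] = sum of x * P(X = x) over all values x.
expected_value = sum(x * p for x, p in distribution.items())

print(expected_value)
```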

17
Q

binomial distribution

A
  • When the underlying probability experiment has only two possible outcomes
  • Used for the number of "successes" in a fixed number of repeated, independent trials with the same probability of success
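The Binomial probability of k successes in n trials is C(n, k) × p^k × (1 − p)^(n − k). The sketch below assumes independent trials with a constant success probability; the function name `binomial_pmf` and the coin example are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed example: probability of exactly 2 heads in 4 tosses of a fair coin.
p_two_heads_in_four = binomial_pmf(2, 4, 0.5)

# The probabilities over k = 0..4 cover the whole sample space.
total = sum(binomial_pmf(k, 4, 0.5) for k in range(5))

print(p_two_heads_in_four, total)
```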
18
Q

normal distribution

A
  • When many small independent factors influence a variable
  • Use the Normal Distribution to answer questions about the probability of a random variable falling within a range
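The probability of falling within a range is the difference between two cumulative probabilities. A minimal sketch using Python's standard-library `statistics.NormalDist`; the mean of 170 and standard deviation of 10 are hypothetical values chosen for the example.

```python
from statistics import NormalDist

# Hypothetical variable: heights with mean 170 and standard deviation 10.
heights = NormalDist(mu=170, sigma=10)

# P(160 < X < 180) = F(180) - F(160), i.e. within one standard deviation.
p_between = heights.cdf(180) - heights.cdf(160)

print(round(p_between, 4))
```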
19
Q

poisson distribution

A
  • For rare events, when the probability of occurrence is low
  • It is used for binary outcomes (event happens or does not), like the Binomial
  • Use in place of the Normal Approximation to the Binomial when nP < 5
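To see how the Poisson stands in for the Binomial when events are rare, the sketch below compares the two for an assumed case with n = 1000 trials and P = 0.002, so nP = 2. The function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events) when the mean number of events is lam."""
    return exp(-lam) * lam**k / factorial(k)

# Assumed rare-event setting: many trials, low probability, nP = 2 < 5.
n, p = 1000, 0.002
lam = n * p

for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

In this setting the two columns agree closely, which is why the Poisson is a convenient substitute.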
20
Q

central limit theorem

A

If the sample size is large (n > 30), the population does not have to be Normally distributed: the sample mean is (approximately) Normal whatever the shape of the population distribution.
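A small simulation can illustrate the theorem. This sketch assumes a strongly right-skewed population (exponential with mean 1) and samples of size 40; the seed, sample size and number of replications are arbitrary choices for the example.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 40                   # sample size (assumed, above the n > 30 guideline)
replications = 2000

# Draw many samples from a skewed population and record each sample mean.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

# The sample means centre on the population mean (1.0) and their spread
# is close to sigma / sqrt(n) = 1 / sqrt(40).
print(round(mean(sample_means), 3), round(stdev(sample_means), 3))
```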

21
Q

Different kinds of sampling

A
  • Simple random sampling
  • Stratified sampling
  • Cluster sampling
  • Multi-stage sampling.
22
Q

the sampling frame

A

The sampling frame is the list of subjects in the population from which the sample is taken. Ideally, it lists the entire population of interest.

23
Q

simple random sampling

A
  • Need a list of the population, then randomly select sample observations from that list
  • Every member of the population has an equal chance of being selected.
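A simple random sample can be sketched with the standard library's `random.sample`, which draws without replacement so that each member of the list is equally likely to be chosen. The sampling frame of 100 labelled people is hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Hypothetical sampling frame: a list of the whole population.
sampling_frame = [f"person_{i}" for i in range(1, 101)]

# Draw 10 distinct members; each has the same chance of selection.
sample = rng.sample(sampling_frame, k=10)

print(sample)
```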
24
Q

stratified sampling

A
  • Simple random sampling can lead to unrepresentative samples, if we are unlucky.
  • If there is an important variable that affects the main outcome of interest, then we want to see that variable represented appropriately in our sample.
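One way to represent the important variable appropriately is proportional allocation: sample from each stratum in proportion to its share of the population. The sketch below is a minimal illustration under that assumption; the function `stratified_sample`, the urban/rural split and the rounding of stratum sizes are all choices made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sample_size, rng):
    """Proportional stratified sample: simple random sampling within each stratum."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Stratum share of the sample matches its share of the frame
        # (rounded, which is an assumption of this sketch).
        k = round(sample_size * len(members) / len(frame))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical frame: 60 urban and 40 rural residents.
frame = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_sample(frame, lambda unit: unit[0], 10, random.Random(0))

urban_count = sum(1 for unit in sample if unit[0] == "urban")
rural_count = sum(1 for unit in sample if unit[0] == "rural")
print(urban_count, rural_count)
```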
25
advantages of stratified sampling
  • Rules out ‘bad’ (i.e. unrepresentative) samples, so there is less chance of an extreme estimate.
  • You can include in your sample enough subjects in each stratum you want to evaluate.
  • Provides greater precision for the estimates from the sample, since it makes use of additional information.
26
Disadvantages of stratified sampling
  • Need that information in the first place. Need to know:
  • The variable that affects the outcome
  • The proportion in each category of that variable in the population
  • The value of that variable for each member of the sampling frame
27
cluster sampling
  • Used to save the cost of data collection.
  • Using a simple random sample across a wide geographical area can be expensive.
28
advantages of cluster sampling
  • Saves money
  • Is most advantageous when the clusters are similar to each other, but there is variation within clusters in terms of their characteristics.
29
Disadvantages of cluster sampling
Though still providing unbiased estimates, they will be less precise than those from a simple or stratified sample (can compensate for this by increasing the sample size).
30
multi-stage sampling
  • A complicated survey design can use a combination of the above, known as multi-stage sampling.
  • Could stratify the clusters before selecting them, to make sure we get a representative sample of clusters.
  • Within each chosen cluster, could then use simple random sampling to select the respondents, or further stratify within clusters.
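A two-stage design can be sketched as follows: first select clusters at random, then take a simple random sample within each chosen cluster. The villages, household labels and sample sizes below are hypothetical, and the sketch skips the optional stratification of clusters.

```python
import random

rng = random.Random(1)  # fixed seed so the draw is repeatable

# Hypothetical frame: 20 villages (clusters) of 50 households each.
clusters = {
    f"village_{c}": [f"village_{c}/household_{h}" for h in range(50)]
    for c in range(20)
}

# Stage 1: choose 4 clusters at random.
chosen_clusters = rng.sample(sorted(clusters), k=4)

# Stage 2: simple random sample of 10 households within each chosen cluster.
sample = [
    unit
    for name in chosen_clusters
    for unit in rng.sample(clusters[name], k=10)
]

print(chosen_clusters, len(sample))
```

Fieldwork is then confined to four villages rather than spread across all twenty, which is where the cost saving comes from.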
31
convenience sampling
a type of survey sample that is easy to obtain, but non-random.
32
volunteer sampling
most common form of convenience sample.
33
experimental units
The subjects of an experiment; the entities that we measure in an experiment.
34
Treatment
A specific experimental condition imposed on the subjects of the study; the treatments correspond to assigned values of the explanatory variable.
35
Explanatory variable
Defines the groups to be compared amongst the experimental units.
36
response variable
The outcome measured amongst the experimental units, to reveal the effect of the treatment.
37
issues with running experiments
  • Time consuming
  • Expensive
  • Can raise ethical issues
38
quasi experiments
A quasi-experiment uses variation in treatment that arises outside the researcher's control. For example, it would be unethical to deliberately subject (“treat”) individuals to radioactivity just to test its effects, so we cannot randomise pregnant women to radiation exposure. So what do economists do? At a given point in time some women may in reality have been exposed to more radiation than others in an effectively random way, for instance as a consequence of the Chernobyl disaster, and their outcomes can be compared.
39
representativeness
Recall that the goal of experimentation is to analyze the association between the treatment and the response for the population, not just the sample. However, care should be taken to generalize the results of a study only to the population that is represented by the study (i.e. the population from which the sample of subjects was drawn)
40
replication
Replication means repeating the experiment on different samples. Since it is always possible that the observed effects were due to chance alone, replicating the experiment also builds confidence in our conclusions. Replication allows us to attribute observed effects to the treatments rather than to ordinary variability.
41
placebo effect
an improvement in outcomes (e.g. employment) due not to any treatment but only to the subject’s belief that he or she will improve.
42
control groups
The control group provides an estimate of the counterfactual: what would have happened in the absence of treatment.
43
how to make a good table
  • use descriptive headings
  • give units of measurement in headings
  • sizing
  • be consistent with presentation
  • use contrast
  • round numbers
  • align numbers
  • provide column/row totals
  • don't report more data than needed
  • ordering
  • spacing
44
univariate graphs - discrete variables
  • A univariate graph considers only one variable
  • A discrete variable takes a limited number of values or categories
45
bivariate graph - discrete variables
  • contains information on two variables
  • provides information on the relationship between the two variables
46
univariate graphs - continuous variables
  • the values of a continuous variable can be divided into bands
  • the wider the band, the wider the group of values it covers
47
estimation
the process of using sample data to draw inferences about the population
48
point estimate
a single value
49
interval estimate
a range of values
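The difference between a point estimate and an interval estimate can be sketched for a population mean. The example assumes the large-sample Normal approximation, mean ± z × s/√n; with a sample as small as the hypothetical one here, a t-based interval would normally be preferred. The function name `confidence_interval` and the data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(data, level=0.95):
    """Point estimate of the mean and a Normal-approximation interval around it."""
    n = len(data)
    point = mean(data)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    margin = z * stdev(data) / sqrt(n)
    return point, (point - margin, point + margin)

# Hypothetical sample of 10 observations.
data = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46]

point, (low, high) = confidence_interval(data)
print(point, round(low, 2), round(high, 2))
```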
50
unbiased
correct on average - the expected value of the estimate is equal to the true value
51
precise
small sampling variance - estimates vary little from sample to sample, so an unbiased estimate is likely to be close to the true value
52
hypothesis testing
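Hypothesis testing (HT) is about using sample data to decide whether there is enough evidence to reject a hypothesis (the null hypothesis) about the population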
53
type 1 error
rejecting the null when true
54
type 2 error
failing to reject the null when false
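The Type 1 error rate can be illustrated by simulation: when the null is true, a test at the 5% significance level should reject it in roughly 5% of samples. The sketch assumes a two-sided z-test with known standard deviation; the sample size, number of trials and seed are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(0)   # fixed seed so the run is repeatable
alpha = 0.05             # significance level
n = 25                   # sample size (assumed)
critical_z = NormalDist().inv_cdf(1 - alpha / 2)

def rejects_null(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test of H0: mean = mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return abs(z) > critical_z

# Every sample is drawn with the null true, so each rejection is a Type 1 error.
trials = 2000
false_rejections = sum(
    rejects_null([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(trials)
)
type_1_rate = false_rejections / trials

print(type_1_rate)
```

Running the same test on samples drawn with a mean other than zero, and counting the non-rejections, would estimate the Type 2 error rate instead.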
55
between sum of squares - BSS
the variation due to differences between the groups (levels of the factor), i.e. between the group means
56
within sum of squares - WSS
the variation due to differences within the groups (levels of the factor), i.e. of observations around their own group mean
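The two sums of squares add up to the total sum of squares around the grand mean. A minimal sketch with three hypothetical groups of three observations each:

```python
from statistics import mean

# Hypothetical data: three groups (levels of one factor).
groups = {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]}

all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)

# BSS: squared distance of each group mean from the grand mean,
# weighted by group size.
bss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

# WSS: squared distance of each observation from its own group mean.
wss = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

# Total sum of squares around the grand mean.
tss = sum((v - grand_mean) ** 2 for v in all_values)

print(bss, wss, tss)
```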
57
limitations of correlation
  • splitting the sample into two or more groups and comparing means is a good first step, but doesn't give us the full answer
  • we need a way of using all the data to estimate the association between two variables
58
regression
regression measures the effect of x on y and whether it is statistically significant, where x is the explanatory variable and y is the dependent variable
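The estimated effect of x on y in a simple linear regression is the least-squares slope. The sketch below assumes one explanatory variable and uses the usual formulas, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope × x̄. The function name `ols_fit` and the data are hypothetical, and the sketch does not test statistical significance.

```python
from statistics import mean

def ols_fit(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = ols_fit(x, y)
print(slope, intercept)
```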