Sampling and Estimation Flashcards

1
Q

LOS 11. a: Define simple random sampling and a sampling distribution.

A

Simple random sampling is a method of selecting a sample in such a way that each item or person in the population being studied has the same probability of being included in the sample. Items are typically selected with a random number generator. Systematic sampling (selecting every kth element from a list) is a common way to approximate a simple random sample.
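The sketch below contrasts the two selection methods using only Python's standard `random` module. The population of 100 labeled items and the helper names `simple_random_sample` and `systematic_sample` are hypothetical, chosen for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Approximate a simple random sample: pick a random start, then take every kth item."""
    k = len(population) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)          # random starting point within the first interval
    return population[start::k][:n]


population = list(range(1, 101))      # hypothetical population of 100 labeled items
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
```

Systematic sampling behaves like random sampling only if the list order has no periodic pattern that lines up with k.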

2
Q

LOS 11. a: Define simple random sampling and a sampling distribution.

A

A sampling distribution is the distribution of all values that a sample statistic can take on when computed from samples of identical size randomly drawn from the same population.
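With a small, hypothetical population you can list the sampling distribution of the sample mean exactly by enumerating every possible sample of the same size. The sketch assumes sampling without replacement and uses exact fractions so that the probabilities stay readable.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8]   # hypothetical population (mean = 5)
n = 2                       # identical sample size for every sample

# Sample mean of every possible sample of size n drawn without replacement
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Sampling distribution: each value the sample mean can take, with its probability
distribution = {
    m: Fraction(c, len(sample_means))
    for m, c in sorted(Counter(sample_means).items())
}
for value, prob in distribution.items():
    print(f"sample mean = {float(value)}: probability = {prob}")
```

The probability-weighted average of this distribution equals the population mean of 5.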

3
Q

LOS 11. b: Explain sampling error.

A

Sampling error is the difference between a sample statistic and its corresponding population parameter (e.g., the sample mean minus the population mean).
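Here is a one-sample illustration of sampling error. Both the population and the sample values are hypothetical.

```python
from statistics import mean

population = [4, 8, 15, 16, 23, 42]   # hypothetical population
sample = [8, 16, 42]                  # one particular sample drawn from it

mu = mean(population)                 # population parameter (18)
x_bar = mean(sample)                  # sample statistic (22)
sampling_error = x_bar - mu           # sample mean minus population mean
print(mu, x_bar, sampling_error)
```

A different sample would produce a different sampling error, and that variation is exactly what the sampling distribution describes.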

4
Q

LOS 11. c: Distinguish between simple random and stratified random sampling.

Also, what are the steps to create a stratified random sample?

A

Stratified random sampling involves randomly selecting samples proportionally from subgroups that are formed based on one or more distinguishing characteristics, so that the sample will have the same distribution of these characteristics as the overall population.

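Stratified random sampling generally reduces sampling error: its estimates are more precise than those from a simple random sample of the same size.

Step 1: Divide the population into subpopulations (strata) based on one or more classification criteria.

Step 2: Draw a simple random sample from each stratum in proportion to the stratum's size relative to the population.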
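The two steps can be sketched with proportional allocation. The bond universe, its sector strata, and the helper `stratified_sample` are hypothetical. Rounding each stratum's allocation can leave the total slightly off n when the proportions do not divide evenly. This sketch does not correct for that.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Proportional allocation: each stratum contributes about n * (stratum size / population size)."""
    rng = random.Random(seed)
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        k = round(n * len(items) / total)      # Step 2: size allocated to this stratum
        sample[name] = rng.sample(items, k)   # simple random sample within the stratum
    return sample


# Step 1: hypothetical bond universe divided into strata by sector
strata = {
    "government": [f"G{i}" for i in range(500)],
    "corporate": [f"C{i}" for i in range(300)],
    "municipal": [f"M{i}" for i in range(200)],
}
picked = stratified_sample(strata, 50, seed=42)
print({name: len(v) for name, v in picked.items()})
```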

5
Q

LOS 11. d: Distinguish between time-series and cross-sectional data.

A

Time-series data consist of observations taken at specific, equally spaced points in time, typically for a single observational unit.

Example of time-series data: ABC's daily stock prices

Cross-sectional data consist of observations taken at a single point in time across many observational units.

Example of cross-sectional data: free-cash-flow-to-debt ratios for U.S. industrial companies

6
Q

LOS 11. e: Explain the central limit theorem and its importance.

A

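The central limit theorem states that for a population with mean µ and finite variance σ², the sampling distribution of the sample mean for all possible samples of size n will be approximately normal, with mean µ and variance σ²/n, when n is sufficiently large (a common rule of thumb is n ≥ 30). Its importance: this holds regardless of the shape of the population distribution, so we can use the normal distribution to build confidence intervals and test hypotheses about the population mean.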
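A quick simulation can check the theorem's claims. It assumes a deliberately skewed population (exponential with rate 1, so µ = 1 and σ² = 1), draws 20,000 samples of size 30, and compares the mean and variance of the sample means with µ and σ²/n. The seed and trial count are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)
mu, sigma2 = 1.0, 1.0          # exponential(rate=1) population: skewed, mean 1, variance 1
n, trials = 30, 20_000

# Each entry is the mean of one random sample of size n
sample_means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

print("mean of sample means:", round(fmean(sample_means), 3), "theory:", mu)
print("variance of sample means:", round(pvariance(sample_means), 4), "theory:", round(sigma2 / n, 4))
```

A histogram of `sample_means` would appear roughly bell-shaped even though the population itself is heavily skewed.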

7
Q

LOS 11. f: Calculate and interpret the standard error of the sample mean.

A

The standard error of the sample mean is the standard deviation of the distribution of the sample means and is calculated as:

$\sigma_{\bar{X}} = \sigma / \sqrt{n}$, where σ, the population standard deviation, is known

$s_{\bar{X}} = s / \sqrt{n}$, where s, the sample standard deviation, is used because the population standard deviation is unknown.

As n increases, the standard error decreases in proportion to $1/\sqrt{n}$.
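Both versions of the formula use the same arithmetic, shown in the sketch below. The 20% standard deviation and the five sample returns are hypothetical inputs, and `standard_error` is an illustrative helper.

```python
import math
from statistics import stdev


def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# Known population standard deviation (hypothetical sigma = 20%)
print(standard_error(0.20, 25))    # 0.20 / 5
print(standard_error(0.20, 100))   # 0.20 / 10: quadrupling n halves the standard error

# Unknown sigma: substitute the sample standard deviation s (statistics.stdev divides by n - 1)
returns = [0.05, -0.02, 0.08, 0.01, 0.03]   # hypothetical sample of returns
s = stdev(returns)
print(standard_error(s, len(returns)))
```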

8
Q

LOS 11. g: Identify and describe desirable properties of an estimator.

A

Desirable statistical properties of an estimator include:

  1. Unbiasedness (the sign of the estimation error is random; the expected value of the estimator equals the parameter being estimated),
  2. Efficiency (its sampling distribution has a smaller variance than that of any other unbiased estimator),
  3. Consistency (the accuracy of the estimate increases as the sample size increases; the probability that the estimate is close to the parameter approaches 1 as n grows).
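Unbiasedness can be checked exactly on a tiny hypothetical population. The sketch averages a variance estimator over every equally likely sample of size 3 drawn with replacement. It assumes that sampling scheme and compares two estimators: dividing by n − 1, which is unbiased, and dividing by n, which is biased downward by the factor (n − 1)/n. The helper `var_estimate` is illustrative.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 6]                     # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance (7/2)


def var_estimate(sample, ddof):
    """Sample variance dividing by (len(sample) - ddof)."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - ddof)


n = 3
samples = list(product(population, repeat=n))   # all equally likely samples with replacement
expected_unbiased = sum(var_estimate(s, 1) for s in samples) / len(samples)
expected_biased = sum(var_estimate(s, 0) for s in samples) / len(samples)

print(sigma2, expected_unbiased, expected_biased)
```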
9
Q

LOS 11. h: Distinguish between a point estimate and a confidence interval estimate of a population parameter.

A

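Point estimates are single-value estimates of population parameters. An estimator is a formula used to compute a point estimate.

A confidence interval estimate is a range around the point estimate: sample mean ± (reliability factor × standard error), where the reliability factor is $z_{\alpha/2}$ when the sampling distribution is normal and the variance is known.

$z_{\alpha/2}$ = 1.645 (often rounded to 1.65) for a 90% CI; 1.96 for a 95% CI; 2.576 (often rounded to 2.58) for a 99% CI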
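The interval formula can be worked through with hypothetical numbers: a sample mean return of 8%, a known σ of 20%, n = 100, and 95% confidence. The helper `z_confidence_interval` is illustrative.

```python
import math


def z_confidence_interval(x_bar, sigma, n, z):
    """Point estimate +/- reliability factor * standard error."""
    se = sigma / math.sqrt(n)
    return x_bar - z * se, x_bar + z * se


# Hypothetical inputs: standard error = 0.20 / 10 = 0.02, so the half-width = 1.96 * 0.02 = 0.0392
low, high = z_confidence_interval(0.08, 0.20, 100, 1.96)
print(f"95% CI: [{low:.4f}, {high:.4f}]")
```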

10
Q

LOS 11. h: Distinguish between a point estimate and a confidence interval estimate of a population parameter.

A

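A confidence interval is a range of values within which we expect the population parameter to lie, with a degree of confidence of 1 − α. Interpretation: if we drew many samples and built an interval from each one in the same way, about (1 − α) of those intervals would contain the true parameter.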

11
Q

LOS 11. h: Distinguish between a point estimate and a confidence interval estimate of a population parameter. The reliability factor.

A

The reliability factor is a number that depends on the sampling distribution of the point estimate and on the chosen degree of confidence (1 − α). For example, with a normal sampling distribution and 95% confidence, it is $z_{0.025}$ = 1.96.
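For a normal sampling distribution, the reliability factor is a quantile of the standard normal distribution. The sketch derives it with `statistics.NormalDist.inv_cdf` from Python's standard library. The helper name `z_reliability_factor` is illustrative.

```python
from statistics import NormalDist


def z_reliability_factor(confidence):
    """z_{alpha/2}: the standard normal value leaving alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for c in (0.90, 0.95, 0.99):
    print(c, round(z_reliability_factor(c), 3))
```

The printed values (about 1.645, 1.96, and 2.576) match the table of $z_{\alpha/2}$ values quoted for 90%, 95%, and 99% intervals.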

12
Q

LOS 11. i: Describe properties of Student’s t-distribution and calculate and interpret its degrees of freedom.

A

Use the t-distribution when the population is approximately normal and its variance is unknown. The choice matters most when the sample is small (n < 30).

Defined by a single parameter: degrees of freedom (df) = n − 1

Symmetric, with a lower peak and fatter tails than the normal distribution

Student's t-distribution approaches the normal distribution as df increases. As a result, t reliability factors shrink and confidence intervals become narrower as df increases.
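To see how df drives the reliability factor, the sketch computes two-tailed 95% t critical values with only the standard library. It integrates the t density with Simpson's rule and inverts the CDF by bisection. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used. The numerical method, step count, and search bracket here are assumptions made for illustration.

```python
import math


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] and symmetry about 0."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)   # Simpson weights 4, 2, 4, ...
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-tailed reliability factor t_{alpha/2}, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (5, 10, 30, 120):
    print(f"n = {n:>3}, df = {n - 1:>3}, t = {t_critical(0.95, n - 1):.3f}")
```

The critical values fall toward the normal value of 1.96 as df grows, which is why intervals narrow with larger samples.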

13
Q

LOS 11. j: Calculate and interpret a confidence interval for a population mean, given a normal distribution with 1) a known population variance, 2) an unknown population variance, or 3) an unknown variance and a large sample size.

A

For a normally distributed population, a confidence interval for its mean can be constructed using a z-statistic when the variance is known and a t-statistic when the variance is unknown. The z-statistic is also acceptable for a normal population with unknown variance if the sample size is large (n ≥ 30).
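The choice can be written as a small decision function. The normal-population branches follow the card above. The non-normal branches are an assumption drawn from the usual textbook rule: with a large sample the central limit theorem justifies z or t, and with a small sample neither applies. The function name is hypothetical.

```python
def reliability_statistic(normal_population, variance_known, n):
    """Return the statistic for a CI on the mean: 'z', 't', or None if neither is appropriate."""
    large = n >= 30
    if not normal_population and not large:
        return None        # small sample from a non-normal population: no valid z or t interval
    if variance_known:
        return "z"
    return "t"             # unknown variance; z is an acceptable approximation when n is large


print(reliability_statistic(True, False, 12))    # normal, unknown variance, small n
print(reliability_statistic(False, True, 50))    # non-normal, known variance, large n
```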

14
Q

LOS 11. j: Calculate and interpret a confidence interval for a population mean, given a normal distribution with 1) a known population variance, 2) an unknown population variance, or 3) an unknown variance and a large sample size. Chart.

A
15
Q

LOS 11. k: Describe the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.

A

Increasing the sample size will generally improve parameter estimates and narrow confidence intervals. The cost of more data must be weighed against these benefits. Adding data that are not generated by the same distribution will not necessarily improve accuracy or narrow confidence intervals.
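The diminishing return from extra data is easy to quantify. The sketch uses a hypothetical known σ of 20% and 95% confidence. Each fourfold increase in n only halves the interval width.

```python
import math


def ci_width(sigma, n, z=1.96):
    """Full width of a z-based confidence interval: 2 * z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)


for n in (25, 100, 400, 1600):
    print(f"n = {n:>4}: 95% CI width = {ci_width(0.20, n):.4f}")
```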

16
Q

LOS 11. k: Describe the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.

A

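Data-mining bias: repeatedly testing the same data set until a statistically significant relationship turns up ("hitting it over and over until you hit gold"). The search is typically not motivated by a theory (hypothesis). Because the search narrows in on whatever happens to look significant, the relationships found may have occurred purely by chance.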
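The following simulation shows how data mining produces chance findings. Every candidate predictor and the target "returns" are pure noise, yet testing 200 predictors at roughly the 5% level still flags some of them as "significant." The sample sizes, seed, and the approximate cutoff of 2.0 are hypothetical choices.

```python
import math
import random
from statistics import fmean

rng = random.Random(7)
n_obs, n_candidates = 60, 200

# Target "returns" are pure noise: no predictor has a true relationship with them
returns = [rng.gauss(0, 1) for _ in range(n_obs)]


def correlation(x, y):
    """Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def t_stat(r, n):
    """t-statistic for testing whether a correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


significant = 0
for _ in range(n_candidates):
    predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
    if abs(t_stat(correlation(predictor, returns), n_obs)) > 2.0:   # approx. 5% two-tailed cutoff
        significant += 1

print(f"{significant} of {n_candidates} pure-noise predictors look 'significant'")
```

Around 5% of the noise predictors (about 10 of 200) would be expected to pass by chance alone.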

17
Q

LOS 11. k: Describe the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.

A

Sample selection bias: certain data are systematically excluded from the analysis, often because they are unavailable, so the sample is no longer random.

18
Q

LOS 11. k: Describe the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.

A

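Survivorship bias: using only funds that survived to the end of the period (mutual funds, hedge funds, etc.). Because failed or merged funds drop out of the sample, average performance tends to be overstated.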
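The distortion shows up in a simple comparison. The fund names, returns, and survival flags below are hypothetical.

```python
from statistics import fmean

# Hypothetical annual returns; False marks funds that closed or merged during the period
funds = [
    ("Fund A", 0.09, True),
    ("Fund B", 0.07, True),
    ("Fund C", 0.11, True),
    ("Fund D", -0.12, False),
    ("Fund E", -0.05, False),
]

all_funds_avg = fmean(r for _, r, _ in funds)
survivors_avg = fmean(r for _, r, alive in funds if alive)
print(f"All funds: {all_funds_avg:.2%}  Survivors only: {survivors_avg:.2%}")
```

Dropping the two failed funds raises the measured average from 2% to 9%.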

19
Q

LOS 11. k: Describe the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.

A

Look-ahead bias: a test at a point in time uses data that were not available at that time.

A common source is a mismatch in the timing of observations across variables (e.g., stock prices/returns vs. accounting data, which are published with a lag).
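One way to avoid the timing mismatch is to key accounting data to the date it became public rather than the fiscal period end. The EPS figures, dates, and the helper `eps_known_on` are hypothetical.

```python
from datetime import date

# Hypothetical fiscal-year results: (fiscal period end, public release date, EPS)
reports = [
    (date(2022, 12, 31), date(2023, 3, 1), 2.10),
    (date(2023, 12, 31), date(2024, 2, 28), 2.45),
]


def eps_known_on(test_date, use_release_date=True):
    """Latest EPS usable on test_date; keying on period end instead introduces look-ahead bias."""
    key = 1 if use_release_date else 0     # index 1 = release date, index 0 = period end
    available = [r for r in reports if r[key] <= test_date]
    return max(available, key=lambda r: r[key])[2] if available else None


as_of = date(2024, 1, 15)
print("keyed on period end:", eps_known_on(as_of, use_release_date=False))  # 2.45, not yet public
print("keyed on release date:", eps_known_on(as_of))                        # 2.10
```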

20
Q

LOS 11. k: Describe the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.

A

Time-period bias: a relationship found in the period studied does not hold over other time periods.

21
Q

Calculate the z-statistic for a sample mean

A

$z = (\bar{x} - \mu) / (\sigma / \sqrt{n})$: (sample mean or average return − population mean) / (standard deviation / square root of sample size)
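Here is the formula with hypothetical inputs: an average monthly return of 1.5% over 36 months, a hypothesized population mean of 1.0%, and σ of 3%. The helper name is illustrative.

```python
import math


def z_statistic(sample_mean, population_mean, sigma, n):
    """(sample mean - population mean) / (sigma / sqrt(n))"""
    return (sample_mean - population_mean) / (sigma / math.sqrt(n))


# Standard error = 0.03 / 6 = 0.005, and the difference in means = 0.005, so z is about 1.0
print(round(z_statistic(0.015, 0.010, 0.03, 36), 3))
```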

22
Q

Unbiased estimator:

A

One in which the expected value is equal to the parameter that it is intended to estimate