CRP 109 stats Lecture 2 Flashcards

1
Q

z Score

A

-The number of standard deviations that a given value x is above or
below the mean.
-Round z scores to two decimal places.
-It is expressed as a number with no units of measurement.
-If an individual data value is less than the mean, its corresponding z
score is negative.
-Units have now been converted to “standard deviations away from the
mean”, so values from different data sets can be compared.
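As a rough illustration of the definition above, the sketch below computes a z score. It assumes the usual formula z = (x − mean) / standard deviation, which this card does not spell out; the function name `z_score` and the sample numbers are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    # Rounded to two decimal places, following the rounding rule on this card.
    return round((x - mean) / std_dev, 2)

# Hypothetical values: mean 100, standard deviation 15.
below = z_score(85, 100, 15)   # value below the mean gives a negative z score
above = z_score(130, 100, 15)  # value above the mean gives a positive z score
print(below, above)
```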

2
Q

Random Variable

A

A variable, typically represented by x, that has a single
numerical value, determined by chance, for each outcome of a
procedure.

3
Q

Discrete Random Variable

A

Has a collection of values that is finite or
countable (even theoretically)

4
Q

Continuous Random Variable

A

Has infinitely many values, and the collection of
values is not countable.

5
Q

Probability Distribution

A

-A description that gives the probability for each
value of the random variable.
-We use 0+ to represent a probability value that is
positive but very small. Rounding to 0 would be
misleading because it would incorrectly suggest that the
event is impossible

6
Q

Probability Distribution Requirements

A

-There is a numerical (not categorical) random variable x , and its
number values are associated with corresponding probabilities
-The sum of all P(x) values is 1: ΣP(x) = 1
-P(x) is between 0 and 1 inclusive for all values of x
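The three requirements above can be checked mechanically. This is a minimal sketch, assuming the distribution is stored as a dictionary mapping each x value to P(x); the function name and the coin-toss example are hypothetical.

```python
import math

def is_probability_distribution(dist):
    """dist maps each value x of the random variable to its probability P(x)."""
    # Requirement 1: the random variable is numerical, not categorical.
    all_numeric = all(isinstance(x, (int, float)) for x in dist)
    # Requirement 2: the probabilities sum to 1 (allowing for float rounding).
    sums_to_one = math.isclose(sum(dist.values()), 1.0, abs_tol=1e-9)
    # Requirement 3: every P(x) is between 0 and 1 inclusive.
    each_in_range = all(0 <= p <= 1 for p in dist.values())
    return all_numeric and sums_to_one and each_in_range

# Hypothetical example: number of heads in two tosses of a fair coin.
heads = {0: 0.25, 1: 0.50, 2: 0.25}
print(is_probability_distribution(heads))
```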

7
Q

Probability Histogram

A

-A histogram whose vertical scale shows probabilities instead of relative frequencies based on actual sample results.
-The areas of the rectangles are the same as the probabilities from the
corresponding probability distribution table.
-A probability distribution can also be given in the form of a formula.

8
Q

Expected Value (E)

A

-theoretical mean value of the outcomes for infinitely many trials
-Does not need to be a whole number
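A short sketch of the calculation, assuming the standard formula E = Σ[x · P(x)], which this card does not state. The fair-die example is hypothetical and shows an expected value that is not a whole number.

```python
from fractions import Fraction

def expected_value(dist):
    """Theoretical mean of a probability distribution given as {x: P(x)}."""
    return sum(x * p for x, p in dist.items())

# Hypothetical example: one roll of a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
e = expected_value(die)
print(float(e))  # 3.5
```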

9
Q

Bernoulli Trial

A

-A Bernoulli trial is an experiment with only two possible outcomes:
success or failure

10
Q

Binomial probability distribution

A

-Results from a procedure whose outcomes belong to two categories and that meets these requirements:
1. The procedure has a fixed number of Bernoulli trials. One Bernoulli
trial is a single observation.
2. The trials must be independent, meaning that the outcome of any
individual trial does not affect the probabilities in the other trials.
3. Each trial must have all outcomes classified into exactly two categories,
commonly referred to as success and failure.
4. The probability of a success remains the same in all trials

11
Q

Binomial probability distribution notation

A

-S (success) and F (failure)
p = probability of a success in one of the n trials
q = probability of a failure in one of the n trials = 1 − p
n = fixed number of Bernoulli trials
x = specific number of successes in n trials
P(x) = probability of getting exactly x successes among
the n trials
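The notation above plugs into the standard binomial probability formula, P(x) = C(n, x) · p^x · q^(n − x). That formula is an assumption here, since the card lists only the symbols; the function name and the example numbers are hypothetical.

```python
from math import comb

def binomial_probability(n, x, p):
    """P(exactly x successes among n independent Bernoulli trials)."""
    q = 1 - p  # probability of a failure in one trial
    return comb(n, x) * p**x * q**(n - x)

# Hypothetical example: exactly 3 successes in 5 trials with p = 0.5.
prob = binomial_probability(n=5, x=3, p=0.5)
print(prob)  # 0.3125
```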

12
Q

Sampling With/Without Replacement

A

-The binomial distribution will be applicable in cases where we sample
with replacement.
-If we sample from a small finite population without replacement, the
binomial distribution should not be used because the events are not
independent

13
Q

Hypergeometric Distribution

A

If sampling is done without replacement and the outcomes belong to one of two types (success/failure), we can use the hypergeometric
distribution
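A minimal sketch of a hypergeometric probability, assuming the standard counting formula C(A, x) · C(B, n − x) / C(A + B, n) for a population with A successes and B failures. The card does not give this formula, and the parameter names are hypothetical.

```python
from math import comb

def hypergeometric_probability(x, n, successes, failures):
    """P(exactly x successes) when n items are drawn without replacement
    from a population of `successes` + `failures` items."""
    total = successes + failures
    return comb(successes, x) * comb(failures, n - x) / comb(total, n)

# Hypothetical example: 10 items, 4 of them successes; draw 3 without replacement.
prob = hypergeometric_probability(x=2, n=3, successes=4, failures=6)
print(round(prob, 4))  # 0.3
```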

14
Q

Poisson probability distribution

A

-A discrete probability distribution
that applies to occurrences of some event over a specified interval.
1. The random variable x is the number of occurrences of an event in
some interval.
2. The occurrences must be random.
3. The occurrences must be independent of each other.
4. The occurrences must be uniformly distributed over the interval being
used.
-It is determined only by the mean μ.
-The possible values of x have no upper limit.
μ = mean number of occurrences of the event in the interval
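For reference, a sketch of the Poisson probability calculation, assuming the standard formula P(x) = μ^x · e^(−μ) / x!, which this card does not state. The function name and the value μ = 2 are hypothetical.

```python
from math import exp, factorial

def poisson_probability(x, mu):
    """P(exactly x occurrences in an interval) when the mean per interval is mu."""
    return mu**x * exp(-mu) / factorial(x)

# Hypothetical example: a mean of 2 occurrences per interval.
mu = 2
for x in range(4):
    print(x, round(poisson_probability(x, mu), 4))
```

Because x has no upper limit, the probabilities only sum to 1 when all values 0, 1, 2, ... are included.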

15
Q

Poisson Distribution as Approximation to Binomial

A

Requirements:
1. n ≥ 100
2. np ≤ 10
If both requirements hold, use the Poisson distribution with parameter μ = np.
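The sketch below compares an exact binomial probability with its Poisson approximation under the two requirements above. The binomial and Poisson formulas are the standard ones (assumed, not given on this card), and the values n = 100, p = 0.02 are hypothetical.

```python
from math import comb, exp, factorial

def binomial_probability(n, x, p):
    """Exact binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_approximation(n, x, p):
    """Poisson approximation to the binomial, with mu = n * p."""
    if n < 100 or n * p > 10:
        raise ValueError("requirements n >= 100 and np <= 10 are not met")
    mu = n * p
    return mu**x * exp(-mu) / factorial(x)

n, p, x = 100, 0.02, 2  # hypothetical values; np = 2
exact = binomial_probability(n, x, p)
approx = poisson_approximation(n, x, p)
print(round(exact, 4), round(approx, 4))
```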

16
Q

Uniform Distribution

A

-The random variable is continuous (although a uniform distribution can
also be defined for discrete random variables).
-The values of the random variable are spread evenly over the range of
possibilities

17
Q

Density Curves

A

-The graph of any continuous probability distribution is called a density curve.
Properties:
-The total area under the curve is 1.
-There is a correspondence between area and probability: the probability that x falls in an interval equals the area under the curve over that interval.

18
Q

Normal Distribution

A

-The random variable is continuous.
-Graph is symmetric and bell-shaped
-characterized by the population mean, μ, and the population standard deviation, σ

19
Q

Standard Normal Distribution

A

-special normal distribution with the
following additional properties:
-Population mean, μ = 0.
-Population standard deviation, σ = 1.
-Commonly, the z -score is used as the label for the horizontal axis of
the graph.

20
Q

Table A-2: Standard Normal Distribution

A

can be used to determine the area (probability) when given a z
score, or to determine the z score when given an area (probability)
-It is designed only for the standard normal distribution

21
Q

Finding the Area Between Two Values

A

The area corresponding to the region between two z scores can be found by
finding the difference between the two areas found in Table A-2 (z score table)
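The same subtraction can be sketched in code. This assumes the table areas are cumulative areas to the left of each z score, which is what `statistics.NormalDist.cdf` returns; the z scores −1.00 and 1.00 are hypothetical.

```python
from statistics import NormalDist

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Difference of the two cumulative (left-tail) areas.
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

area = area_between(-1.00, 1.00)
print(round(area, 4))
```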

22
Q

Critical Values

A

For the standard normal distribution, a critical value is a z score on the
borderline separating those z scores that are significantly low or
significantly high
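One way to look up such a borderline z score is to invert the cumulative area. This is a sketch assuming the critical value is defined by the area in the right tail; the tail areas 0.025 and 0.05 are hypothetical choices, not values taken from this card.

```python
from statistics import NormalDist

def critical_value(right_tail_area):
    """z score with the given area to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - right_tail_area)

z = critical_value(0.025)
print(round(z, 2))  # 1.96
```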

23
Q

Converting Distributions

A

We can perform a conversion that allows us to “standardize” any
normal distribution so that x values can be transformed to z scores
z = (x − μ) / σ
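A small sketch of standardizing, with the numerator (x − μ) divided as a whole by σ. It checks that the area to the left of x in the original normal distribution matches the area to the left of z in the standard normal distribution; the values μ = 100, σ = 15, x = 115 are hypothetical.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Convert an x value from a normal distribution to a z score."""
    return (x - mu) / sigma

mu, sigma = 100, 15  # hypothetical population mean and standard deviation
x = 115
z = standardize(x, mu, sigma)

p_from_z = NormalDist().cdf(z)           # area left of z, standard normal
p_direct = NormalDist(mu, sigma).cdf(x)  # area left of x, original distribution
print(z, round(p_from_z, 4), round(p_direct, 4))
```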

24
Q

Sampling Distribution of a Statistic

A

-The distribution of all values of the
statistic when all possible samples of the same size n are taken from the same population.
-The statistic can refer to the sample proportion, sample mean, sample
variance, etc

25
Q

Sampling Distribution of the Sample Proportion

A

-p = population proportion; p̂ = sample proportion
-The distribution of sample proportions tends to approximate a normal distribution.
-The mean of the sample proportions is the same as the population proportion p.

26
Q

Sampling Distribution of the Sample Mean

A

-The distribution of sample means tends to approximate a normal distribution.
-The mean of the sample means is the same as the population mean.

27
Q

Sampling Distribution of the Sample Variance

A

-The distribution of sample variances tends to be skewed to the right.
-The mean of the sample variances is the same as the population variance.

28
Q

Estimators

A

-Estimator: A statistic used to infer (estimate) the value of a population parameter.
-Unbiased Estimator: A statistic that targets the value of the corresponding population parameter in the sense that the sampling distribution of the statistic has a mean that is equal to the corresponding population parameter, such as p̂, x̄, s².
-Biased Estimator: A statistic that does not target the value of the corresponding population parameter, such as the median, range, and s.
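To see what "targets the parameter" means, the sketch below lists every sample of size 2 drawn with replacement from a tiny population and averages each statistic over all samples. The population {4, 5, 9} and the variable names are hypothetical; s² uses the usual n − 1 divisor.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

population = [4, 5, 9]  # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)                          # population mean
sigma_sq = sum((x - mu) ** 2 for x in population) / N      # population variance

def sample_variance(sample):
    """Sample variance s^2 with the n - 1 divisor."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)

# All 9 possible samples of size 2, sampling with replacement.
samples = list(product(population, repeat=2))
mean_of_means = sum(Fraction(sum(s), 2) for s in samples) / len(samples)
mean_of_variances = sum(sample_variance(s) for s in samples) / len(samples)
mean_of_sds = sum(sqrt(sample_variance(s)) for s in samples) / len(samples)

print(mean_of_means == mu)            # sample mean targets the population mean
print(mean_of_variances == sigma_sq)  # s^2 targets the population variance
print(mean_of_sds < sqrt(sigma_sq))   # s falls short of sigma on average (biased)
```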
29
Q

Central Limit Theorem (CLT)

A

-For all samples of the same size n with n > 30, the sampling distribution of x̄ can be approximated by a normal distribution with mean μ and standard deviation σ/√n.
-Given any population with any distribution, the distribution of x̄ can be approximated by a normal distribution when the samples are large enough, with n > 30.

30
Q

Standard error of the mean, SEM

A

-The standard deviation of all values of the sample mean x̄. For samples of size n from a population with standard deviation σ, it equals σ/√n.
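A simulation can make this concrete. The sketch below assumes the standard result that the standard error of the mean is σ/√n and draws many samples from a skewed (exponential) population. The seed, sample size, and number of samples are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)     # fixed seed so the run is repeatable
n = 40             # sample size (n > 30)
num_samples = 2000
mu = 1.0           # random.expovariate(1.0) has mean 1 ...
sigma = 1.0        # ... and standard deviation 1

# Mean of each simulated sample of size n.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_sem = statistics.stdev(sample_means)
predicted_sem = sigma / math.sqrt(n)
print(round(observed_mean, 3), round(observed_sem, 3), round(predicted_sem, 3))
```

The observed mean of the sample means should land close to μ, and their standard deviation close to σ/√n, even though the population itself is far from normal.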
31

Q

Applying the CLT

A

1. Population (with any distribution) has mean μ and standard deviation σ.
2. Simple random samples all of the same size n are selected from the population.
Requirement:
-Population has a normal distribution or n > 30

32
Q

Considerations During Problem Solving

A

1. Check Requirements: When working with the mean from a sample, verify that the normal distribution can be used by confirming that the original population has a normal distribution or the sample size is n > 30.
2. Individual Value or Mean from a Sample? Determine whether you are using a normal distribution with a single value x or the mean x̄ from a sample of n values.

33
Q

Normal Quantile (Probability) Plot

A

-A normal quantile plot is a graph of points (x, y) where each x value is from the original set of sample data, and each y value is the corresponding z score that is expected from the standard normal distribution.
-If the points form (approximately) a straight line, then we can assume the data come from a normal distribution.
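The points of such a plot can be sketched as follows. The cumulative areas 1/(2n), 3/(2n), 5/(2n), ... used to obtain the expected z scores are one common convention and are an assumption here, as are the function name and the sample data.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (x, y) pairs: sorted data values paired with expected z scores."""
    n = len(data)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(sorted(data), start=1):
        cumulative_area = (2 * i - 1) / (2 * n)  # 1/(2n), 3/(2n), 5/(2n), ...
        points.append((x, std_normal.inv_cdf(cumulative_area)))
    return points

# Hypothetical sample data.
for x, y in normal_quantile_points([12, 7, 9, 15, 10]):
    print(x, round(y, 2))
```

Plotting these pairs and judging how closely they follow a straight line is the visual check described on this card.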
34

Q

Sample Data From a Normally Distributed Population?

A

1. Histogram: Construct a histogram. If the histogram departs dramatically from a bell shape, conclude that the data do not have a normal distribution.
2. Outliers: Identify outliers. If there is more than one outlier present, conclude that the data might not have a normal distribution.
3. Normal quantile plot: If the histogram is basically symmetric and the number of outliers is 0 or 1, look at a normal quantile plot. Conclude that the population has a normal distribution if the pattern of the points is reasonably close to a straight line.

35
Q

Lognormal Distribution

A

-Many data sets have a distribution that is not normal, but we can transform the data so that the modified values have a normal distribution.
-One common transformation is to transform each value of x by taking its logarithm.
-If the distribution of the logarithms of the values is a normal distribution, the distribution of the original values is called a lognormal distribution.

36
Q

Normal Distribution as Approximation to Binomial: Requirements

A

1. The sample is a simple random sample of size n from a population in which the proportion of successes is p, or the sample is the result of conducting n independent trials of a binomial experiment in which the probability of success is p.
2. np ≥ 5 and nq ≥ 5.
If the above requirements are satisfied, then the binomial probability distribution of the random variable x can be approximated by a normal distribution.

37
Q

Continuity Correction

A

-When using the normal distribution (which is a continuous distribution) as an approximation to the binomial distribution (which is a discrete distribution), a continuity correction is made to a discrete whole number x in the binomial distribution by representing x by the interval from x − 0.5 to x + 0.5.
1. Check the requirements that np ≥ 5 and nq ≥ 5.
2. Find μ = np and σ = √(npq) to be used for the normal distribution.
3. Identify the discrete whole number x that is relevant to the binomial probability problem being considered, and represent that value by the region bounded by x ± 0.5.
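
The procedure above can be sketched as follows for the probability of exactly x successes. The function name and the values n = 100, p = 0.5, x = 50 are hypothetical, and the exact binomial formula used for comparison is the standard one rather than something stated on this card.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_exactly(n, p, x):
    """Approximate P(exactly x successes) using a continuity correction."""
    q = 1 - p
    # Step 1: check the requirements.
    if n * p < 5 or n * q < 5:
        raise ValueError("requirements np >= 5 and nq >= 5 are not met")
    # Step 2: parameters of the approximating normal distribution.
    mu = n * p
    sigma = sqrt(n * p * q)
    normal = NormalDist(mu, sigma)
    # Step 3: represent x by the interval from x - 0.5 to x + 0.5.
    return normal.cdf(x + 0.5) - normal.cdf(x - 0.5)

n, p, x = 100, 0.5, 50  # hypothetical values
approx = normal_approx_exactly(n, p, x)
exact = comb(n, x) * p**x * (1 - p)**(n - x)
print(round(approx, 4), round(exact, 4))
```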