Chapter 4 Flashcards
(19 cards)
Discrete random variable
-countable possible values
Random variable (X): numerical outcome of a random experiment
Ex: consider flipping two coins. The sample space (all possible outcomes) is S = {HH, HT, TH, TT}
-Now define a random variable X as the number of heads. The possible values of X are 0, 1, 2
-Here X is a discrete random variable because it takes only specific, countable values (0, 1, or 2)
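The two-coin example can be enumerated in a few lines of Python. This is only an illustrative sketch: the function name `number_of_heads` is made up here, and the probabilities assume fair coins, which the card does not state.

```python
from itertools import product
from collections import Counter

# Sample space for two coin flips: every ordered pair of H/T.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]

# The random variable X maps each outcome to its number of heads.
def number_of_heads(outcome):
    return outcome.count("H")

possible_values = sorted({number_of_heads(o) for o in sample_space})

# ASSUMPTION: fair coins, so each of the four outcomes has probability 1/4.
counts = Counter(number_of_heads(o) for o in sample_space)
distribution = {x: counts[x] / len(sample_space) for x in possible_values}

print(sample_space)     # ['HH', 'HT', 'TH', 'TT']
print(possible_values)  # [0, 1, 2]
print(distribution)     # {0: 0.25, 1: 0.5, 2: 0.25}
```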
Probability distribution of a discrete random variable
-A probability distribution assigns probabilities to each possible value of X. It must satisfy these two conditions:
- Each probability is between 0 and 1: 0 ≤ P(X = x) ≤ 1 for all x
- The probabilities sum to 1
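The two conditions can be checked mechanically. The sketch below uses a hypothetical helper, `is_valid_distribution`, and reuses the number-of-heads distribution under the assumption of fair coins.

```python
import math

def is_valid_distribution(probabilities):
    """Return True if both conditions for a discrete distribution hold."""
    # Condition 1: every probability lies between 0 and 1 (inclusive).
    each_in_range = all(0 <= p <= 1 for p in probabilities.values())
    # Condition 2: the probabilities sum to 1 (allowing for rounding error).
    sums_to_one = math.isclose(sum(probabilities.values()), 1.0)
    return each_in_range and sums_to_one

# ASSUMPTION: fair coins, X = number of heads in two flips.
heads_distribution = {0: 0.25, 1: 0.5, 2: 0.25}

print(is_valid_distribution(heads_distribution))  # True
print(is_valid_distribution({0: 0.5, 1: 0.7}))     # False (sums to 1.2)
```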
Inclusive inequalities
-include boundary points
-use the symbols less than or equal to (≤) and greater than or equal to (≥) because they include the given number
Ex:
-At most five means X ≤ 5
-At least 10 means X ≥ 10
-No more than eight means X ≤ 8
-No less than three means X ≥ 3
-Less than or equal to seven means X ≤ 7
-Greater than or equal to 2 means X ≥ 2
Ex: X ≥ 50 (means test score must be at least 50 to pass)
Exclusive inequalities
-do not include boundary point
-less than 5 X<5
-more than 10 X>10
-Fewer than 6 X<6
-Up to, but not including 4 X<4
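Whether the boundary point is included changes the probability for a discrete variable. The sketch below uses a hypothetical example that is not from the cards (one roll of a fair six-sided die) to compare inclusive and exclusive versions of the same statement.

```python
from fractions import Fraction

# HYPOTHETICAL example: X is the result of one fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def prob(condition):
    """Add up P(X = x) over every value x that satisfies the condition."""
    return sum(p for x, p in die.items() if condition(x))

at_most_5 = prob(lambda x: x <= 5)    # inclusive: 1, 2, 3, 4, 5
less_than_5 = prob(lambda x: x < 5)   # exclusive: 1, 2, 3, 4
at_least_3 = prob(lambda x: x >= 3)   # inclusive: 3, 4, 5, 6
more_than_3 = prob(lambda x: x > 3)   # exclusive: 4, 5, 6

print(at_most_5, less_than_5)    # 5/6 2/3
print(at_least_3, more_than_3)   # 2/3 1/2
```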
Mean and standard deviation of a discrete random variable
-can only be calculated when we know the probability distribution of the random variable
-mean: calculated by multiplying each value x by P(X = x) and then adding the results
(mean is also called expected value)
-population mean: calculated by taking all the values, adding them up, and dividing by how many values there are
-if a random variable X represents a random observation from a known population, its mean will match the population mean
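As a worked example, the mean of the number-of-heads variable can be computed from its distribution. The card only spells out the rule for the mean; the standard deviation line below uses the standard definition (square root of the probability-weighted squared deviations from the mean), and the probabilities assume fair coins.

```python
import math

# ASSUMPTION: fair coins, X = number of heads in two flips.
distribution = {0: 0.25, 1: 0.5, 2: 0.25}

# Mean (expected value): multiply each x by P(X = x), then add the results.
mean = sum(x * p for x, p in distribution.items())

# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mean)      # 1.0
print(variance)  # 0.5
print(std_dev)
```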
Bernoulli trials
-sequence of trials that satisfies the following conditions
- Each trial has two possible outcomes, success (S) or failure (F)
- Each trial is independent of all other trials (the outcome of one trial does not affect the outcome of another trial)
- The probability of success (p) is the same for each trial, and the probability of failure is 1-p
Ex: flipping a coin
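A sequence of Bernoulli trials can be simulated to see the three conditions in action. This is a rough sketch: `bernoulli_trials` is a made-up helper, and the seed and the choice of 1000 flips are arbitrary.

```python
import random

def bernoulli_trials(n, p, seed=0):
    """Simulate n independent trials, each a success with probability p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each draw is independent of the others and uses the same p.
    return ["S" if rng.random() < p else "F" for _ in range(n)]

# Coin flips, treating heads as success with p = 0.5.
flips = bernoulli_trials(1000, 0.5)
success_rate = flips.count("S") / len(flips)

print(flips[:10])
print(success_rate)  # should be close to 0.5
```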
Binomial distribution
-A way to calculate the probability of getting a certain number of successes (like heads in coin flips) in a fixed number of trials (experiments) where each trial has only two possible outcomes, success or failure
-A binomial distribution arises when we repeat a Bernoulli trial a certain number of times (denoted as n) and are interested in finding the probability of getting a specific number of successes (denoted as k) in those trials
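The card does not give the formula, but the standard binomial probability is P(X = k) = C(n, k) × p^k × (1-p)^(n-k). A minimal sketch, with `binomial_pmf` as an illustrative name:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    # comb(n, k) counts the ways to place k successes among n trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 flips of a fair coin.
print(binomial_pmf(2, 4, 0.5))  # 0.375

# The probabilities over all possible k add up to 1.
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))  # 1.0
```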
Shape of binomial distribution
-when p = 0.5 (probability of success is equal to the probability of failure: 50% chance of success and 50% chance of failure), the binomial distribution is symmetric around the mean
-the mean and median are also equal because the distribution is symmetric
-when p<0.5 (probability of success is less than probability of failure), the distribution is right skewed: failures are more likely than successes and the distribution has a long tail to the right
-when p>0.5 (probability of success is greater than probability of failure), the distribution is left skewed, with a longer tail on the left side
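The direction of the skew can be checked numerically. The sketch below computes a skewness measure (third central moment divided by the cubed standard deviation) directly from the binomial probabilities. The choice of n = 10 and the three p values are arbitrary illustrations.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def skewness(n, p):
    """Skewness of a binomial distribution, computed from its probabilities."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    third = sum((k - mean) ** 3 * pk for k, pk in enumerate(pmf))
    return third / var**1.5

# Positive = right skewed, about zero = symmetric, negative = left skewed.
for p in (0.2, 0.5, 0.8):
    print(p, round(skewness(10, p), 3))
```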
1-2-3 rule
-within one standard deviation of the mean: the range where you would expect to find the number of successes most of the time
-within two standard deviations of the mean: the range where you would expect to find the number of successes almost all the time
-within three standard deviations of the mean: the range where you would expect to find the number of successes nearly always
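To get a feel for "most of the time", "almost all the time", and "nearly always", the probabilities can be computed for a concrete case. The sketch assumes the standard binomial formulas mean = np and standard deviation = sqrt(np(1-p)), which the cards do not state, and uses hypothetical values n = 100 and p = 0.5.

```python
from math import comb, sqrt

n, p = 100, 0.5  # HYPOTHETICAL values chosen for illustration

mean = n * p                  # standard formula for the binomial mean
sd = sqrt(n * p * (1 - p))    # standard formula for the binomial sd

def prob_within(k_sd):
    """P(mean - k_sd*sd <= X <= mean + k_sd*sd) for X ~ Binomial(n, p)."""
    low, high = mean - k_sd * sd, mean + k_sd * sd
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if low <= k <= high
    )

within = {k_sd: prob_within(k_sd) for k_sd in (1, 2, 3)}
for k_sd, prob in within.items():
    print(f"within {k_sd} sd: {prob:.4f}")
```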
Continuous random variables and density curves
-continuous random variables have infinitely many possible values within any given range: decimal numbers can take an infinite number of possible values in a given interval (for example, between 0 and 10 minutes)
-since there are infinitely many possible values, we cannot assign a nonzero probability to any exact value. Instead, we calculate probabilities for ranges of values, such as P(X>180)
Density function
-for a continuous random variable, we express its probability distribution using a density function
-this function defines a curve such that the total area under the curve equals 1
-area under the curve: the probability that a random variable falls within a specific range (between a and b) is equal to the area under the curve between a and b
-Total area equals one
-we cannot talk about the probability that a continuous variable takes any exact value, like P(X=x), because it is zero. We need to calculate the probability over an interval, such as P(a<X<b)
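The area idea can be illustrated with a simple numerical integration. The sketch below assumes a hypothetical waiting time that is equally likely to be anywhere between 0 and 10 minutes (a uniform density). `area_under` is a made-up helper using the midpoint rule.

```python
def uniform_density(x, low=0.0, high=10.0):
    """ASSUMED density: flat at 1/(high - low) on [low, high], zero elsewhere."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def area_under(density, a, b, steps=10_000):
    """Approximate the area under the density between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = area_under(uniform_density, 0, 10)  # whole curve: about 1
p_between = area_under(uniform_density, 2, 5)    # P(2 < X < 5): about 0.3
p_exact = area_under(uniform_density, 3, 3)      # zero-width interval: 0

print(total_area, p_between, p_exact)
```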
Normal distribution
Shape of the curve: for a normal distribution, it is bell shaped and symmetric around the mean
Area under the curve: the total area under the curve is equal to one
Defined by two parameters:
-Mean (centre of the distribution): determines where the peak of the curve is located
-Standard deviation (measures the spread or width of the curve)
-Larger standard deviation= wider flatter curve
-smaller standard deviation= narrow taller curve.
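The effect of the standard deviation on the curve's height can be seen with Python's built-in `statistics.NormalDist`. The means and standard deviations below are arbitrary illustrative values.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)  # smaller standard deviation
wide = NormalDist(mu=0, sigma=2)    # larger standard deviation

# Height of each curve at its peak (the mean).
print(narrow.pdf(0))  # taller peak
print(wide.pdf(0))    # flatter peak, half as high

# Symmetry around the mean: equal heights at equal distances either side.
print(narrow.pdf(-1.5), narrow.pdf(1.5))
```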
Standardization
-any normal distribution can be converted into the standard normal distribution by a process called standardization
-this involves transforming a normal variable X into a new variable Z using the formula Z = (X - mean) / standard deviation
-this shifts the data to have a mean of 0 and scales the data to have a standard deviation of 1
-after standardization any normal variable can be analyzed using the standard normal distribution for which we can find probabilities using a Z table
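A quick check that standardizing really produces a mean of 0 and a standard deviation of 1. The data values are hypothetical, and the sketch uses the population standard deviation (`pstdev`).

```python
from statistics import mean, pstdev

data = [160, 165, 170, 175, 180]  # HYPOTHETICAL observations

mu = mean(data)       # centre of the data
sigma = pstdev(data)  # population standard deviation

# Z = (X - mean) / standard deviation, applied to every value.
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)
print(mean(z_scores))    # about 0
print(pstdev(z_scores))  # about 1
```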
Normal distribution
-bell shaped, symmetric distribution, defined by its mean and standard deviation
-the standard normal distribution is a special case where the mean equals zero and standard deviation equals 1
-The Z table allows you to find probabilities associated with Z scores in the standard normal distribution
Z table
- To find probabilities of a normal random variable, you standardize it to a standard normal random variable and use the Z table
- To find values corresponding to a given probability you use a Z table to find a Z score and then reverse standardize to get the original value.
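Both directions of a Z table lookup can be mimicked in code, with `statistics.NormalDist` standing in for the printed table. The mean of 100 and standard deviation of 15 are hypothetical values used only for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Forward lookup: Z score -> probability of falling below it.
p_below = standard_normal.cdf(1.96)  # about 0.975

# Reverse lookup: probability -> Z score, then reverse standardize.
# HYPOTHETICAL variable: normal with mean 100 and standard deviation 15.
mean, sd = 100, 15
z_90 = standard_normal.inv_cdf(0.90)  # Z score with 90% of the area below it
x_90 = mean + z_90 * sd               # original-scale value: X = mean + Z * sd

print(round(p_below, 4))
print(round(z_90, 4), round(x_90, 2))
```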
Z score
- To find probabilities of a normal random variable, you standardize it to a standard normal random variable and use the Z table
- To find values corresponding to a given probability you use a Z table to find a Z score and then reverse standardize to get the original value.
Checking for normality
-A more accurate method for checking normality is to use a normal probability plot (Q-Q or P-P plot)
-Q-Q plot: compares the observed quantiles of your data against theoretical quantiles. Quantiles are data points that split the data into intervals with equal probabilities. If the data is normally distributed, the points should lie approximately along a straight diagonal line; if the points deviate from the straight line, the data is likely not normally distributed
-P-P plot: compares the cumulative probabilities of the data against the cumulative probabilities of a normal distribution. Similarly, if the data is normal, the points will lie on a straight diagonal line; any significant deviation suggests the data is not normal
Steps for creating a normal probability plot
- Determine sample size (n)
- Sort the data from the smallest to largest values.
- For each data point, find the corresponding expected Z score.
- On the X axis, plot the observed data values; on the Y axis, plot the corresponding expected Z scores.
-If the data is normally distributed, the points will approximately form a straight line. If the points deviate significantly from a straight line, it suggests that the data is not normal
-a long right tail might indicate a right skew
-a long left tail might indicate a left skew
-an S shape might suggest the presence of heavy tails
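The steps for a normal probability plot can be sketched without a plotting library by computing the points and checking how straight they are. Several things here are assumptions rather than part of the cards: the plotting position (i - 0.5) / n used to get each expected Z score (other conventions exist), the sample data, and the use of the correlation coefficient as a rough measure of straightness.

```python
from statistics import NormalDist, mean

def normal_plot_points(data):
    n = len(data)           # step 1: determine sample size
    ordered = sorted(data)  # step 2: sort from smallest to largest
    # Step 3: expected Z score for the i-th smallest value.
    # ASSUMPTION: plotting position (i - 0.5) / n.
    expected_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Step 4: observed values on the X axis, expected Z scores on the Y axis.
    return list(zip(ordered, expected_z))

def pearson_r(points):
    """Correlation of the plotted points; near 1 means close to a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

sample = [4.1, 5.0, 5.6, 6.2, 4.7, 5.3, 5.9, 6.8, 3.5, 5.1]  # HYPOTHETICAL data
points = normal_plot_points(sample)
r = pearson_r(points)

for x, z in points:
    print(f"{x:5.1f}  {z:7.3f}")
print("straightness (r):", round(r, 3))
```

The `points` list can be passed to any plotting tool to draw the actual plot.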