RStudio Code for Intro to Statistics, Module 5 Flashcards

Question 1

Q

What R command should we run before generating random numbers so that we can reproduce the same set of numbers later on?

Answer

A

set.seed(1)
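
A minimal sketch of why this works: resetting the seed to the same value before each call makes rnorm() return the same draws. The seed value 1 comes from the card; the names first_draw and second_draw are only illustrative.

```r
set.seed(1)
first_draw = rnorm(5)

set.seed(1)                # reset the generator to the same starting state
second_draw = rnorm(5)

identical(first_draw, second_draw)   # TRUE
```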

Question 2

Q

What code could we write to create a vector of 10,000 normally distributed values with a mean of 3 and a standard deviation of 1, stored in an object called “norm_dist_1”?

Answer

A

n = 10000
mu_1 = 3
sigma_1 = 1
norm_dist_1 = rnorm(n, mean = mu_1, sd = sigma_1)

Question 3

Q

Now that we have our “norm_dist_1” object, what code would create a second vector of normally distributed values called “norm_dist_2”, with a mean of 5, a standard deviation of 2, and the same number of values as the first (10,000)?

Answer

A

n = 10000
mu_2 = 5
sigma_2 = 2
norm_dist_2 = rnorm(n, mean = mu_2, sd = sigma_2)
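
As a quick sanity check (a sketch, not part of the original cards), the sample mean and standard deviation of each vector should land close to the parameters passed to rnorm(). The seed is added here only to make the check repeatable.

```r
set.seed(1)   # assumption: any fixed seed would do
n = 10000
norm_dist_1 = rnorm(n, mean = 3, sd = 1)
norm_dist_2 = rnorm(n, mean = 5, sd = 2)

# Sample statistics should be close to the requested parameters
c(mean = mean(norm_dist_1), sd = sd(norm_dist_1))   # near 3 and 1
c(mean = mean(norm_dist_2), sd = sd(norm_dist_2))   # near 5 and 2
length(norm_dist_2)                                 # 10000
```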

Question 4

Q

What argument can we pass to the “hist()” function to specify the width of the bins in our plots?

Answer

A

breaks = [sequence of break points spaced one bin width apart]
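
A small sketch of the idea, using made-up data and a bin width of 0.5 (both are assumptions for illustration). The break points have to cover the full range of the data, otherwise hist() stops with an error.

```r
set.seed(1)
x = rnorm(1000)   # illustrative data, not from the cards

# Break points every 0.5 units, wide enough to cover all of x
break_points = seq(from = -5, to = 5, by = 0.5)
h = hist(x, breaks = break_points, plot = FALSE)

diff(h$breaks)   # every bin is 0.5 wide
sum(h$counts)    # 1000: every value falls into some bin
```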

Question 5

Q

Now that we have our two normal distributions, we want to draw them in the same plotting area, with the histogram of “norm_dist_1” above that of “norm_dist_2”. We want the margins around each graph to be 2 units on every side.

The first distribution has a range of [-1.303, 6.728], while the second has a range of [-2.278, 12.302], reflecting the larger standard deviation and mean value in the latter. We want the widths of the bins to be 0.1 units.

What code could we write to plot both distributions, keeping in mind that we want the x-axes to run from -5 to 15?

Answer

A

par(mfrow = c(2, 1),
    mar = c(2, 2, 2, 2))
break_points = seq(from = -5, to = 15, by = 0.1)
hist(norm_dist_1, xlim = c(-5, 15),
     breaks = break_points)
hist(norm_dist_2, xlim = c(-5, 15),
     breaks = break_points)

Question 6

Q

In RStudio, we have a list of mean values (“mu1 = x”, “mu2 = x”, … “mu[n] = x”). When all of the means are the same, what would we expect from our plots if each had a different standard deviation (“sigma1 = 1”, “sigma2 = 1.5”, … “sigma[n] = z”)?

Answer

A

If all of the mean values (represented by “mu1 … mu[n]”) are the same (x), we would expect every bell curve to peak at the same place, but distributions with larger standard deviations will be wider and flatter, and those with smaller standard deviations will be narrower and taller.
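
The "wider and flatter" behaviour can be checked numerically. The sketch below assumes a shared mean of 3 and three illustrative sigma values; it compares the height of each density at its peak and the width of the interval holding the central 95% of values.

```r
mu = 3                  # assumed shared mean
sigmas = c(1, 1.5, 2)   # assumed standard deviations

# Height of each normal density at its peak (x = mu)
peak_heights = dnorm(mu, mean = mu, sd = sigmas)
round(peak_heights, 3)   # 0.399 0.266 0.199

# Width of the central 95% interval for each sigma
widths = qnorm(0.975, mean = mu, sd = sigmas) - qnorm(0.025, mean = mu, sd = sigmas)
round(widths, 2)         # 3.92 5.88 7.84
```

Larger sigma values give a lower peak and a wider interval, while the peak stays at x = mu in every case.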

Question 7

Q

In RStudio, we have a list of mean values (“mu1 = 1”, “mu2 = 2”, … “mu[n] = x”). When all of the means are different, what would we expect from our plots if each had the same standard deviation (“sigma1 = z”, “sigma2 = z”, … “sigma[n] = z”)?

Answer

A

If all of the standard deviations (represented by “sigma1 … sigma[n]”) are the same (z), we would expect each bell curve to be about equally wide and tall, but each peak lies at a different mean, so the curves cover different ranges of x-values even though those ranges are equally wide.

Question 8

Q

When simulating normally distributed data from a linear model, what do the following names typically represent:
(1) b0
(2) b1
(3) sigma
(4) n

Answer

A

(1) intercept
(2) slope
(3) standard deviation (a measure of how spread out the values are)
(4) sample size

Question 9

Q

Repeatedly generate 300 normally distributed random variables with a mu value of 7.0 and a sigma of 2.5.

Calculate the mean of each set of numbers generated.

If this process is repeated 10 times, the end result is 10 different means.

What are the approximate minimum and maximum individual values across these 10 sets of numbers (think, “range”)?

Answer

A

Across 10 sets of 300 values with the given mu and standard deviation, the lowest individual value we can expect to see is about -2, and the highest is about 16.
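
One way to try this out, sketched with replicate() (the seed and the object name sims are assumptions for illustration). The exact minimum and maximum change from run to run.

```r
set.seed(1)
n = 300
mu = 7.0
sigma = 2.5

# 10 sets of 300 draws, stored as the columns of a 300 x 10 matrix
sims = replicate(10, rnorm(n, mean = mu, sd = sigma))

dim(sims)     # 300 10
range(sims)   # smallest and largest individual value across all 10 sets
```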

Question 10

Q

Repeatedly generate 300 normally distributed random variables with a mu value of 7.0 and a sigma of 2.5.

Calculate the mean of each set of numbers generated.

If this process is repeated 10 times, the end result is 10 different means.

What are the approximate minimum and maximum of the means of these 10 sets of numbers?

Answer

A

For the 10 sets of 300 values, with a standard deviation of 2.5 and a mean of 7.0, the lowest sample mean we might expect is about 6.8, while the largest might be about 7.2.
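
A minimal sketch of this simulation, assuming a fixed seed for repeatability. The standard error of the mean, sigma / sqrt(n), indicates how far individual sample means typically fall from 7.0.

```r
set.seed(1)

# Mean of each of 10 sets of 300 draws
sample_means = replicate(10, mean(rnorm(300, mean = 7.0, sd = 2.5)))
range(sample_means)   # smallest and largest of the 10 means

# Standard error of the mean for n = 300
2.5 / sqrt(300)       # about 0.144
```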

Question 11

Q

Repeatedly generate 60 normally distributed random variables with a mu value of 7.0 and a sigma of 2.5.

Calculate the mean of each set of numbers generated.

If this process is repeated 10 times, the end result is 10 different means.

What are the approximate minimum and maximum of the means of these 10 sets of numbers?

Answer

A

For the 10 sets of 60 values, with a standard deviation of 2.5 and a mean of 7.0, the lowest sample mean we might expect is about 6.6, while the largest might be about 7.4.

Question 12

Q

Why is the range in the mean values of 10 different normal distributions slightly larger when we have only 60 iterations versus when we have 300 iterations?

Answer

A

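The means vary less from sample to sample when each sample has 300 values than when it has 60, because the standard error of the mean, sigma / sqrt(n), shrinks as n grows: 2.5 / sqrt(300) is about 0.14, while 2.5 / sqrt(60) is about 0.32. Less variable means produce a smaller range across the 10 samples.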
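
The effect of sample size can be checked by simulation. This sketch compares the theoretical standard error, sigma / sqrt(n), with the observed spread of many simulated means; the choice of 1000 repetitions and the seed are assumptions made only to get a stable estimate.

```r
set.seed(1)
sigma = 2.5

# Theoretical standard error of the mean for each sample size
se_60 = sigma / sqrt(60)     # about 0.323
se_300 = sigma / sqrt(300)   # about 0.144

# Simulated check: 1000 sample means for each sample size
means_60 = replicate(1000, mean(rnorm(60, mean = 7.0, sd = sigma)))
means_300 = replicate(1000, mean(rnorm(300, mean = 7.0, sd = sigma)))

c(sd(means_60), sd(means_300))   # should be close to se_60 and se_300
```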

Question 13

Q

Assume we are simulating data for an ANOVA and have four treatments which each have 50 observations.

We have the parameters b0=2, b1=1, b2=0, b3=-1, and a sigma=0.5.

Which treatments will have the largest and the smallest mean?

Answer

A

Treatment 2 will have the largest mean value (mu2 = (b0 + b1) = (2 + 1) = 3).

Treatment 4 will have the smallest mean value (mu4 = (b0 + b3) = (2 + (-1)) = 1).

Treatments 1 & 3 will have moderate mean values (mu1 = mu3 = b0 = (b0 + b2) = 2).
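
A sketch of how such data might be simulated, assuming the coding used in the answer above (treatment 1 is the baseline with mean b0, and b1 to b3 shift treatments 2 to 4). The object names and the seed are illustrative.

```r
set.seed(1)
n_per_group = 50
b0 = 2; b1 = 1; b2 = 0; b3 = -1
sigma = 0.5

# Treatment labels: 50 observations in each of four groups
treatment = factor(rep(1:4, each = n_per_group))

# True group means under the assumed coding
group_means = c(b0, b0 + b1, b0 + b2, b0 + b3)   # 2 3 2 1

# Response = group mean + normal noise
y = rnorm(4 * n_per_group, mean = group_means[as.integer(treatment)], sd = sigma)

# Sample mean per treatment, close to 2 3 2 1
observed_means = tapply(y, treatment, mean)
round(observed_means, 2)
```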

Question 14

Q

Assuming that all other settings do not change for Question #13, what would be expected to happen if the sigma value increased from 0.5 to 2.0 and the data was resimulated?

Answer

A

The parameters “b0” (intercept) and “b1”, “b2” and “b3” (the coefficients that shift treatments 2, 3 and 4 away from the intercept) would be harder to estimate, because the observations would vary more around each treatment mean.
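
"Harder to estimate" can be made concrete by fitting the same model to data simulated under each sigma and comparing the standard errors of the coefficients. This is a sketch under the same assumed coding (treatment 1 as baseline); simulate_fit is a hypothetical helper, not part of the course code.

```r
set.seed(1)
treatment = factor(rep(1:4, each = 50))
group_means = c(2, 3, 2, 1)   # b0, b0 + b1, b0 + b2, b0 + b3

# Hypothetical helper: simulate responses with the given sigma and fit the model
simulate_fit = function(sigma) {
  y = rnorm(length(treatment), mean = group_means[as.integer(treatment)], sd = sigma)
  lm(y ~ treatment)
}

fit_low = simulate_fit(0.5)
fit_high = simulate_fit(2.0)

# Standard errors of the four estimated coefficients under each sigma
se_low = coef(summary(fit_low))[, "Std. Error"]
se_high = coef(summary(fit_high))[, "Std. Error"]
round(rbind(se_low, se_high), 3)
```

The standard errors under sigma = 2.0 come out roughly four times those under sigma = 0.5, since they scale with the residual standard deviation.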

Question 15

Q

If the values b1, b2 and b3 are the same, but b0 is increased, what would we expect to see?

Answer

A

A similar scatter-plot, except that all of the points would be shifted upward by the amount that b0 increased.
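
The shift can be seen in the treatment means themselves. In this sketch, group_means is a hypothetical helper using the same assumed coding as in Question 13, and the new b0 value of 5 is arbitrary.

```r
b1 = 1; b2 = 0; b3 = -1

# Hypothetical helper: treatment means for a given intercept
group_means = function(b0) c(b0, b0 + b1, b0 + b2, b0 + b3)

group_means(2)                    # 2 3 2 1
group_means(5)                    # 5 6 5 4
group_means(5) - group_means(2)   # 3 3 3 3: every mean moves up by the same amount
```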


(15 cards)