R-Studio Code for Intro to Statistics, Module 4 Flashcards by Joseph Paoli

We have a data set “wild_cats” which has 13 rows and 3 columns. Only the first 5 rows have actual species under the column “Spp” (species are “cheetah”, “lion”, “ocelot”, “lynx”, and “tiger”).
The columns “BodyWt” and “BrainWt” give values for these five species, but rows 6-13 just have the values “NA” under the “BrainWt” column and rows 6-13 under the “Spp” and “BodyWt” columns are blank.
How can we go about shrinking the data set to just the relevant species?

query_cats = is.na(wild_cats$BrainWt)
index_cats = which(query_cats)
wild_cats_new = wild_cats[-index_cats , ]
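
The same filtering can also be sketched in one step with a negated logical vector. This sidesteps a pitfall of negative indexing: if no row contained an NA, which() would return an empty vector and indexing with its negative would return zero rows. The data frame below is a hypothetical stand-in for “wild_cats”; its weights are invented for illustration and are not the course data.

```r
# Hypothetical stand-in for wild_cats: 5 real rows followed by 8 empty ones.
# The weights are made-up values, used only to show the mechanics.
wild_cats_demo = data.frame(
  Spp     = c("cheetah", "lion", "ocelot", "lynx", "tiger", rep("", 8)),
  BodyWt  = c(50, 180, 12, 20, 200, rep(NA, 8)),
  BrainWt = c(120, 240, 60, 75, 260, rep(NA, 8))
)

# TRUE for every row whose BrainWt is present
keep_rows = !is.na(wild_cats_demo$BrainWt)

# Logical indexing keeps only the TRUE rows
wild_cats_demo_new = wild_cats_demo[keep_rows, ]

nrow(wild_cats_demo_new)
```

Either approach leaves the five species rows here; the logical version simply behaves more predictably when there is nothing to remove.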

Now that we have a table only of our five relevant cat species, we want to make a scatter-plot which relates the values in the “BodyWt” and “BrainWt” columns to one another.
We want “BodyWt” to be the x-variable and “BrainWt” to be the y-variable.
What line of code should we write?

plot(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt)

We have generated a scatter-plot relating body weight and brain weight in the “wild_cats_new” data set, but we want to change our open circles on the graph to solid black equilateral triangles.
What is the new line of code we run to generate a graph with such points?

plot(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt ,
pch = 17)

We have generated a scatter-plot relating body weight and brain weight in the “wild_cats_new” data set, but we want to change the label on the x-axis to “Body Weight (kg)” and the label on the y-axis to “Brain Weight (g)”.
What is the new line of code we run to generate a graph with these axis labels?

plot(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt ,
xlab = "Body Weight (kg)" ,
ylab = "Brain Weight (g)")

We have generated a scatter-plot relating body weight and brain weight in the “wild_cats_new” data set, but we want to specify that the x-axis starts at 0 kilograms and runs to 250 kilograms, and the y-axis starts at 150 grams and runs to 1500 grams.
What is the new line of code we run to generate a graph with these axis limits?

plot(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt ,
xlim = c(0 , 250) ,
ylim = c(150 , 1500))

What line of code would we write if we wanted to hard code the word “TIGER” at the coordinates (62 , 1320) on a scatter-plot?

text(x = 62 , y = 1320, labels = "TIGER")

What line of code would we write if we wanted to assign appropriate labels to all five of the wild cat species in our scatter-plot, with the labels themselves appearing to the right of the plotted symbol?

text(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt ,
labels = wild_cats_new$Spp ,
pos = 4)

We have the data set “malaria_Afr” which has nine columns and 1,508 samples.
The columns are:

“outdoor_occupation” (0 or 1),
“microsc” (0 or 1 or NA),
“pcr” (0 or 1 or NA),
“x” (numbers between 702725.8 and 715037.3),
“y” (numbers between 8913068 and 8928818),
“gender” (0 or 1),
“age” (numbers between 0.25 and 90),
“time_Afr” (numbers between 0 and 36), and
“occupation” (14 different jobs listed)

We want to create a table of the data based on the “occupation” column and to call it “occ_table”.
What code should we write?

occ_table = table(malaria_Afr$occupation)
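
As a rough illustration of what table() produces, the sketch below counts a short, hypothetical vector of job names (these are placeholders, not the 14 occupations in “malaria_Afr”). The result holds one count per unique value, which is what barplot() later draws as bar heights.

```r
# Hypothetical occupations, used only to show what table() returns
occupation_demo = c("farmer", "teacher", "farmer",
                    "fisher", "farmer", "teacher")

# One count per unique value, named by that value
occ_demo = table(occupation_demo)

# Look up a single count by name
occ_demo[["farmer"]]
```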

After establishing our “occ_table” from the “malaria_Afr” data set, we want to establish a bar plot to show the occupations the 1,508 individuals sampled in the data work in.
What code should we write?

barplot(occ_table)

After writing “barplot(occ_table)”, we want to make our labels on the x-axis vertical so that they will all fit on the projected plot.
How should the code be altered?

barplot(occ_table, las = 2)

After establishing our bar plot on occupations from the “malaria_Afr” data set with the vertical labels, we want to add an overall title called “Primary Occupation”.
How should the code be altered?

barplot(occ_table, las = 2,
main = "Primary Occupation")

After establishing our “Primary Occupation” bar plot, we decide that we want the bars to be red rather than the default gray.
How should the line of code be altered?

barplot(occ_table, las = 2,
main = "Primary Occupation",
col = "red")

Now that our “Primary Occupation” bar plot has red bars, we decide we want the y-axis to be called “Number of Samples” and the x-axis to be called “Profession”.
How should the line of code be altered?

barplot(occ_table, las = 2,
main = "Primary Occupation",
col = "red",
ylab = "Number of Samples",
xlab = "Profession")

We want to examine how the variable “age” varies based on the variable “occupation” from the “malaria_Afr” data set, and we want to examine this as a box plot. We also want to establish a y-axis label of “Age [years]”.
What would be the correct line of code?

boxplot(age ~ occupation,
data = malaria_Afr, ylab = "Age [years]")

In the “malaria_Afr” data set, the column “gender” uses a binary system where ‘0’ means men and ‘1’ means women. We want to change this to the actual words for the establishment of a box plot.
What is a query sequence we could use to change the numbers to the words ‘Men’ and ‘Women’?

malaria_Afr$gender_M = 'Men'
query_W = malaria_Afr$gender==1
index_W = which(query_W)
malaria_Afr$gender_M[index_W] = 'Women'
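
One way to check a recoding like this is to cross-tabulate the original codes against the new words. The sketch below does so on a short, hypothetical vector (not the real “gender” column) and uses ifelse() as a compact alternative to the query-and-index steps; it follows the coding stated in the question, where 0 means men and 1 means women.

```r
# Hypothetical stand-in for the gender column (0 = men, 1 = women)
gender_demo = c(0, 1, 1, 0, 1)

# ifelse() gives 'Men' where the test is TRUE and 'Women' elsewhere
gender_words = ifelse(gender_demo == 0, 'Men', 'Women')

# Every 0 should sit in the 'Men' column and every 1 in 'Women'
table(gender_demo, gender_words)
```

If any count lands in the wrong cell of the cross-table, the labels have been assigned the wrong way round.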

We want to break our plotting space across four separate graphs, with two columns and two rows.
What is the appropriate line of code?

par(mfrow = c(2 , 2))

We want to break our plotting space across six separate graphs, with two columns and three rows.
What is the appropriate line of code?

par(mfrow = c(3 , 2))

We want to break our plotting space across two separate graphs, which are side by side on the same row in two different columns.
What is the appropriate line of code?

par(mfrow = c(1 , 2))

We want to break our plotting space across four separate graphs, with 2 columns and 2 rows.
For each panel, we want the margins around the graph to be 6 units below each panel, 2 units to the left of each panel, 1 unit above each panel, and 3 units to the right of each panel.
What is the appropriate line of code?

par(mfrow = c(2 , 2),
mar = c(6 , 2, 1, 3))

We want to break our plotting space across four separate graphs, with 2 columns and 2 rows.
To the overall outside margins of the 2-by-2 set of graphs, we want to add 1 unit of space below, 4 units of space to the left, 2 units of space on the top and 3 units of space to the right.
What is the appropriate line of code?

par(mfrow = c(2 , 2),
oma = c(1, 4, 2, 3))
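
Because par() settings stay in effect for every later plot on the same device, one common pattern is to save the previous settings and restore them afterwards. The sketch below combines the layout, inner margins, and outer margins from the cards above; the panel contents are placeholders, and pdf(NULL) is used only so the code runs without opening a plot window.

```r
pdf(NULL)  # null graphics device, so the sketch runs without a display

# par() returns the previous settings, which we keep for later
old_par = par(mfrow = c(2, 2),
              mar = c(6, 2, 1, 3),
              oma = c(1, 4, 2, 3))

# Query the settings now in effect (order: bottom, left, top, right)
mar_now = par("mar")
oma_now = par("oma")

# Four placeholder panels fill the grid row by row
for (i in 1:4) {
  plot(1:10, xlab = paste("Panel", i), ylab = "")
}

par(old_par)          # restore the earlier layout and margins
invisible(dev.off())  # close the null device
```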

What line of code might we write to add the y-axis labels “Time in Africa” and “Age” on the left side of the 2-by-2 block of graphs plotted in RStudio, with “Time in Africa” beside the bottom row and “Age” beside the top row?

mtext(side = 2, outer = TRUE,
line = 2, at = c(0.3 , 0.85),
c("Time in Africa", "Age"))

What line of code might we write to add the labels “By Gender” and “By Occupation” above the left column and right column of the 2-by-2 set of graphs plotted in RStudio?

mtext(side = 3, outer = TRUE,
line = 2, at = c(0.25 , 0.75),
c("By Gender", "By Occupation"))

We have a data set called “climate” which includes 63 samples with 5 variables:

“deltaT” is the difference in temperature between a location in the year 1824 and today, in 2024; negative values mean that it was colder on average 200 years previously.
“sdev” is the standard deviation of “deltaT”; it gives us a measure of the uncertainty in a given sample's “deltaT” value.
“proxy” tells us which type of measurement was used to generate the “deltaT” value; values in “proxy” include ‘Mg/Ca’, ‘Faunal’, ‘Pollen’, and others.
“t_m” is a column which includes ‘1’ if the measurement was done in a marine environment and a ‘0’ if the measurement comes from a terrestrial environment.
“latitude” is the latitude at which the measurements were taken.

We are interested in plotting the relationship between “latitude” and “deltaT”, with the latter being the y-axis variable.
What is the appropriate line of code?

plot(climate$latitude, climate$deltaT)

To our plot relating latitude and change in temperature from the past 200 years, we want to add a red horizontal line at 0 degrees of temperature change, which will better convey that it was colder on average in the past.
What is the appropriate line of code we could write after running our “plot(climate$latitude, climate$deltaT)” command?

abline(h = 0, col = "red")

To our plot relating latitude (x-axis) and change in temperature from the past 200 years (y-axis), we want to add a "Locally Weighted Scatterplot Smoothing" (LOWESS) line. What line of code might we write to establish the list of "x" and "y" values in an object called "low_line"?

low_line = lowess(x = climate$latitude, y = climate$deltaT)

To our plot relating latitude (x-axis) and change in temperature from the past 200 years (y-axis), we want to add a "Locally Weighted Scatterplot Smoothing" (LOWESS) line. Now that we have stored the values needed to plot this line in the object "low_line", what command would we type in RStudio to project it in the plotting area with a line width of 2?

lines(low_line$x , low_line$y, lwd = 2)
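
To see why the stored object can be handed to lines(), it may help to inspect what lowess() returns. The sketch below uses synthetic data (not the “climate” data set; the names ending in "_demo" are hypothetical) and draws to a null device so it runs without opening a plot window.

```r
# Synthetic data for illustration only
set.seed(1)
lat_demo = runif(63, min = -60, max = 60)
dT_demo  = -2 + 0.01 * lat_demo + rnorm(63, sd = 0.5)

low_demo = lowess(x = lat_demo, y = dT_demo)

# A list with components x and y, one smoothed point per input sample,
# ordered by increasing x
str(low_demo)

pdf(NULL)                  # null graphics device
plot(lat_demo, dT_demo)
lines(low_demo, lwd = 2)   # lines() also accepts the x/y list directly
invisible(dev.off())
```

Because the x values come back sorted, lines() connects the smoothed points from left to right rather than zig-zagging across the plot.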

In the scatter-plot relating change in temperature over 200 years to latitude, we want to convey in colors which samples came from marine environments (blue) and which came from terrestrial environments (red). Recall that in the "t_m" column, '1' designates a measurement done in a marine environment, while '0' is a measurement from a terrestrial environment. What would an appropriate list of commands be to change the plotted black circles to red or blue?

climate$color_t = 'red'
query_t = climate$t_m==1
index_t = which(query_t)
climate$color_t[index_t] = 'blue'
plot(climate$latitude, climate$deltaT,
col = climate$color_t)

To our scatter-plot with circles colored according to whether they came from a terrestrial proxy (red) or a marine proxy (blue), we want to add the horizontal axis label "Latitude" and the vertical axis label "Temperature (C)", and we want to scale the size of the plotted samples by the values in the column "sdev" to convey how certain we are of the result reached by the proxy. What would be an appropriate change we could make to the "plot()" command from the previous question?

plot(climate$latitude, climate$deltaT,
col = climate$color_t,
xlab = 'Latitude', ylab = 'Temperature (C)',
cex = 1/climate$sdev)

In the previous question, why did we set the 'cex' argument equal to the inverse of the standard deviation ('1/climate$sdev'), rather than to 'climate$sdev' directly?

By setting 'cex' equal to the inverse of the values in the column 'sdev', we make larger circles convey greater certainty. If we set 'cex' directly equal to 'climate$sdev', points would shrink as their standard deviation got smaller, so the values we felt more sure about would be visually subordinate to those we were less certain of.
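
The effect of inverting the standard deviation can be seen with a few hypothetical values (these are not taken from the “climate” data set). The sketch compares the symbol sizes that the inverse and the direct scaling would each produce.

```r
# Hypothetical standard deviations, from most to least certain
sdev_demo = c(0.5, 1, 2, 4)

# Inverse scaling: a small sdev (high certainty) gives a large symbol
cex_inverse = 1 / sdev_demo

# Direct scaling would do the opposite
cex_direct = sdev_demo

# Side-by-side comparison of the two choices
cbind(sdev_demo, cex_inverse, cex_direct)
```

With inverse scaling, the most certain sample (the smallest standard deviation) gets the largest symbol, which matches the intent described above.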

We want to put together everything we have done so far with plotting the "climate" data set's relationship between latitude and temperature differences for the samples:

(1) We want "latitude" on the x-axis, labeled as such but with the first letter capitalized.
(2) We want "deltaT" on the y-axis and labeled as "Temperature Difference", and we want the range of the y-axis to stretch from -7 to 4.
(3) In reference to the earlier lines of code in which we assigned red and blue to terrestrial and marine proxies respectively, we want those points to be colored as such on the plot.
(4) We want the plotted samples to be larger if they have a lower standard deviation and smaller if their standard deviation is higher.
(5) We want the LOWESS line we established with the earlier code, drawn in green.
(6) We want a horizontal line at a zero-degree temperature change which is black and has a width of 2 units.
(7) We want a legend for the graph at the coordinates (-20, 4), which has "Marine" on top next to a blue circle and "Terrestrial" below it next to a red circle.

What are the lines of code we would need to run to generate this plot?

plot(climate$latitude, climate$deltaT,
col = climate$color_t,
xlab = "Latitude", ylab = "Temperature Difference",
ylim = c(-7, 4), cex = 1/climate$sdev)
lines(low_line$x, low_line$y, col = 'green')
abline(h = 0, col = 'black', lwd = 2)
legend(x = -20, y = 4, col = c('blue', 'red'), pch = 1,
legend = c('Marine', 'Terrestrial'))

(30 cards)