R-Studio Code for Intro to Statistics, Modules 1-3 Flashcards by Joseph Paoli

What is the code used to set the working directory for the RStudio file?

setwd("C:/Users/Joseph Paoli/Downloads/Lessons in R for Stats/ Module ___")
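
setwd() also returns the previous working directory (invisibly), so it is easy to confirm the change with getwd() and to switch back afterwards. A minimal sketch, assuming R's temporary directory (tempdir()) as a stand-in for the Module folder:

```r
# Remember the current working directory; setwd() returns it invisibly
old_wd = setwd(tempdir())

# Confirm where R is now reading and writing files
print(getwd())

# Switch back to the original directory when finished
setwd(old_wd)
```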

What is another way to set the working directory in RStudio, using the lower-right panel of the user interface?
Describe it in three steps.

Click the “Files” tab in the lower-right panel, next to the “Plots” tab.
In the file browser below the blue gear, navigate to the folder you want to use as the working directory.
With that folder open, click the blue gear to open the drop-down menu, then select “Set As Working Directory”.

What is the code used to clear the environment so that the RStudio session starts off blank?

rm(list=ls())
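
To see what the command does, the sketch below creates two throwaway objects (hypothetical names x and y) and then clears them. Note that ls() skips names beginning with a dot unless all.names = TRUE is given, so such hidden objects survive this command.

```r
# Create a few objects so there is something to clear
x = 1
y = "tree"

# ls() lists the object names; rm(list = ...) removes every one of them
rm(list = ls())

# The environment is now empty
print(ls())
```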

We want to generate 100 random numbers drawn uniformly between 1 and 20 (they will be decimals, not whole numbers), and we want to call this object “random_numbers”.
What is the RScript code to do this?

random_numbers=runif(100, 1, 20)
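
Because runif() draws new values every time, set.seed() is often used first so the same numbers come back on every run. If whole numbers are wanted instead, sample() can draw from 1:20 with replacement. A sketch; the seed value and the name random_integers are arbitrary choices for illustration:

```r
# Fix the random seed so the same "random" numbers are produced each run
set.seed(42)

# 100 values drawn uniformly between 1 and 20 (decimals)
random_numbers = runif(100, min = 1, max = 20)

# Whole-number alternative: 100 draws from 1:20, with replacement
random_integers = sample(1:20, size = 100, replace = TRUE)

print(range(random_numbers))
```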

We want to see a summary of the object “random_numbers” (a set of 100 random values between 1 and 20).
What is the RStudio code?

summary(random_numbers)

We want to see a histogram of the object “random_numbers” (a set of 100 random values between 1 and 20).
What is the RStudio code?

hist(random_numbers)
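
hist() also returns the bin information it draws. With plot = FALSE it returns that information without plotting, and the breaks argument sets the bin edges. A sketch with one-unit-wide bins from 1 to 20, regenerating random_numbers so the example runs on its own:

```r
set.seed(1)
random_numbers = runif(100, 1, 20)

# Nineteen bins of width 1 covering 1 to 20; plot = FALSE skips drawing
h = hist(random_numbers, breaks = seq(1, 20, by = 1), plot = FALSE)

h$breaks   # bin edges
h$counts   # number of values falling in each bin
```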

What six statistics are provided when you run the “summary” command?

Minimum Value
1st Quartile
Median
Mean
3rd Quartile
Maximum Value
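
On a small vector whose values are easy to check by hand, summary() returns these six numbers as a named vector, so individual statistics can be pulled out by name:

```r
x = c(2, 4, 6, 8, 10)

s = summary(x)
print(s)

# Individual statistics can be extracted by name
s[["Median"]]   # 6
s[["Mean"]]     # 6
```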

If we generate a histogram and want to save it as an image file to our working directory, how would we do this?

In the “Plots” tab, select the drop-down menu titled “Export” and select “Save as Image…”, then name the file accordingly.
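
The same image can also be saved from code, which helps when the plot has to be regenerated often. A minimal sketch using base R's png() graphics device; the file name "random_numbers_hist.png" is hypothetical:

```r
set.seed(1)
random_numbers = runif(100, 1, 20)

# Open a PNG file in the working directory as the plotting device
png("random_numbers_hist.png", width = 800, height = 600)

# Draw the histogram into the file instead of the Plots tab
hist(random_numbers)

# Close the device so the file is written to disk
dev.off()
```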

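We want to save the summary statistics of the object “random_numbers” so that we can use them to make a table for a later report. We will save them to a file named “rando_values.xls”.
What is the RStudio code to do this?

capture.output(summary(random_numbers), file="rando_values.xls")
Note that capture.output() writes the printed console text, so the file is plain text with an .xls extension rather than a true Excel workbook; Excel can usually open it but may show a format warning.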
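
If a spreadsheet-friendly table is the goal, one option is to turn the summary into a two-column data frame and write it with write.csv(), which Excel opens directly. A sketch; the file name "rando_values.csv" is an assumption that mirrors the card:

```r
set.seed(1)
random_numbers = runif(100, 1, 20)

stats = summary(random_numbers)

# One row per statistic: its name and its value
stats_table = data.frame(statistic = names(stats), value = as.numeric(stats))

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(stats_table, file = "rando_values.csv", row.names = FALSE)
```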

What are the two different ways to import a dataset into RStudio, assuming the data is in a CSV file named “TestData.csv”?

In the “Environment” tab in the upper-right panel, open the “Import Dataset” drop-down menu and select “From Text (base)…”; a sub-window will open displaying the working directory, where the CSV file should be located.
We can type the read.csv() command in the script editor, which would look something like this:
test_data=read.csv("TestData.csv")
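
To try read.csv() without a real data file, the sketch below writes a tiny CSV with made-up columns and values to a temporary file and reads it back:

```r
# Write a small example CSV (hypothetical data) to a temporary file
csv_path = tempfile(fileext = ".csv")
writeLines(c("species,diam_1983", "M.domestica,12.5", "P.armeniaca,9.8"), csv_path)

# read.csv() treats the first line as column names by default (header = TRUE)
test_data = read.csv(csv_path)

str(test_data)
```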

We have a numeric vector called “data” which includes nine observations with the following values:
1 1 1 3 5 7 9 14 23
We want to remove all values less than four to create an abbreviated vector called “modData”.
What is the RStudio code?

modData=data[data>=4]
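
Entering the nine values with c() shows how the logical condition picks out the values to keep:

```r
data = c(1, 1, 1, 3, 5, 7, 9, 14, 23)

# TRUE where a value is at least 4, FALSE otherwise
keep = data >= 4
print(keep)

# Indexing with the logical vector keeps only the TRUE positions
modData = data[keep]
print(modData)   # 5 7 9 14 23
```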

We want to log-transform the values in “modData”.
What is the RStudio code?

log_modData=log(modData)
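
log() uses the natural logarithm (base e) by default. For another base, use log10(), log2(), or the base argument, as in this sketch with the five values of modData:

```r
modData = c(5, 7, 9, 14, 23)

log_modData   = log(modData)             # natural log (base e)
log10_modData = log10(modData)           # base 10
log2_modData  = log(modData, base = 2)   # any base via the base argument

# exp() undoes the natural-log transform
exp(log_modData)
```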

We want to square-root-transform the values in “modData”.
What is the RStudio code?

sqrt_modData=sqrt(modData)

We want to find the average of the set of values in “modData”.
What is the RStudio code?

mean(modData)

We want to find the variance of the set of values in “modData”.
What is the RStudio code?

var(modData)
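
var() returns the sample variance, which divides the sum of squared deviations by n - 1. Writing the formula out by hand for the five values of modData gives the same number:

```r
modData = c(5, 7, 9, 14, 23)

n = length(modData)
m = mean(modData)   # 11.6

# Sum of squared deviations from the mean, divided by n - 1
manual_var = sum((modData - m)^2) / (n - 1)

manual_var     # 51.8
var(modData)   # 51.8
```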

We want to find the standard deviation of the values in the “diam_1983” column of the “tree_diam” data set.

sd(tree_diam$diam_1983)
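
sd() is the square root of var(). A sketch with a small, made-up tree_diam data frame (hypothetical values); na.rm = TRUE tells sd() to skip missing measurements instead of returning NA:

```r
# Hypothetical stand-in for the tree_diam data set
tree_diam = data.frame(
  species   = c("M.domestica", "M.domestica", "P.armeniaca", "P.armeniaca"),
  diam_1983 = c(10, 14, 8, NA)
)

# With a missing value, sd() returns NA unless na.rm = TRUE
sd(tree_diam$diam_1983)
sd(tree_diam$diam_1983, na.rm = TRUE)

# sd() equals the square root of var()
sqrt(var(tree_diam$diam_1983, na.rm = TRUE))
```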

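We want to install and then load the package “COBRA” using the RScript Editor. install.packages() downloads a package from CRAN (the Comprehensive R Archive Network) and typically only needs to be run once per computer, whereas library() must be run in each new session to load the package.
What are the two lines of code in RStudio to do this?

install.packages("COBRA")
library(COBRA)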

Let’s say we have the command “read.table” and we want RStudio to tell us more about how to use it. What is the appropriate code to type in the RScript Editor?

?read.table

Let’s say you want to find a function for inputting data into RStudio, but you don’t know what that function might be. What can you type into the RScript Editor that may help?

help.search("data.input")

Maybe there’s a function like “anova” whose name you know and want to use, but you don’t know which package it belongs to.
What code would report the package that contains the “anova” function?

find("anova")
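
find() searches only the packages already attached to the session (the search path). anova() lives in the stats package, which R attaches at startup, so it is found. For a function in a package that is installed but not loaded, help.search() (or its shorthand ??) searches the help pages of all installed packages instead.

```r
# Which attached packages define "anova"?
find("anova")   # "package:stats"

# For packages that are installed but not loaded, search the help system:
# help.search("anova")
```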

Let’s say we want to run the examples for the function “lm”, which fits a linear model to the data, so that we better understand how to perform such analyses and become familiar with a function we may never have used. What RStudio code could do this for us?

example(lm)

What does the RStudio code “demo(graphics)” generate when run?

“demo(graphics)” generates a series of plots and shows the code to make them in the “Console” window in the lower-left.

Using this code, we can make similar graphics for our own data by copying the examples and editing them.

What commands return the first and last parts of a vector, matrix, table, data frame or function? For the sake of answering, assume the object has the generic name, “data”.

“head(data)” returns the first few elements or rows of “data” (six by default), while “tail(data)” returns the last few (also six by default). The number can be changed with the n argument, e.g. “head(data, n = 10)”.
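
A quick check on the integers 1 to 20:

```r
x = 1:20

head(x)          # 1 2 3 4 5 6
tail(x)          # 15 16 17 18 19 20
head(x, n = 3)   # 1 2 3
```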

We want to know the names of all the columns in a data set called “tree_diameters”. What is the appropriate RScript code?

colnames(tree_diameters)

What two separate commands could we run to produce the number of rows and the number of columns for a data set with the name "tree_diameters" in the console?

nrow(tree_diameters)
ncol(tree_diameters)

What single command could we run to produce the number of rows and the number of columns for a data set with the name "tree_diameters" in the console on the same line?

dim(tree_diameters)
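
dim() returns the same two numbers that nrow() and ncol() give separately, rows first. A sketch with a small hypothetical data frame standing in for tree_diameters:

```r
# Hypothetical 4-row, 3-column stand-in for tree_diameters
tree_diameters = data.frame(
  varieties = c("M.domestica", "P.armeniaca", "C.paradisi", "M.domestica"),
  diam_1983 = c(12.5, 9.8, 15.1, 11.2),
  diam_1985 = c(13.0, 10.4, 15.9, 11.9)
)

nrow(tree_diameters)   # 4
ncol(tree_diameters)   # 3
dim(tree_diameters)    # 4 3 (rows, columns)
```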

We want to examine the internal structure of the R object "tree_diameters". What RScript code can we type?

str(tree_diameters)

What RScript code would one type to find the range of values (minimum and maximum) in the "diam_1983" column of the "tree_diameters" data set, which represents the trees sampled in 1983?

range(tree_diameters$diam_1983)

What code would we type to produce the 10% quantile for the column "diam_1983" in the "tree_diameters" data set? That is, the value that 10% of the samples fall below and the remaining 90% fall above?

quantile(tree_diameters$diam_1983, 0.1)

What code would we type to produce the 90% quantile for the column "diam_1985" in the "tree_diameters" data set? That is, the value that 90% of the samples fall below and the remaining 10% fall above?

quantile(tree_diameters$diam_1985, 0.9)

What code would we type to produce the 1%, 50% and 99% quantiles for the column "diam_1987" in the "tree_diam" data set? That is, the values below which 1%, 50% and 99% of the observations fall?

quantile(tree_diam$diam_1987, c(0.01, 0.50, 0.99))
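
By default quantile() interpolates between observed values, so a requested quantile can fall between two data points. On the integers 1 to 11 the 10%, 50% and 90% quantiles land exactly on observed values, which makes the output easy to check; range() is shown for comparison:

```r
x = 1:11

range(x)                        # 1 11 (minimum and maximum)
quantile(x, c(0.1, 0.5, 0.9))   # 10%: 2, 50%: 6, 90%: 10
```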

We have a column "varieties" in the data set "tree_diameters" which gives the species of the trees in the study (e.g., "M.domestica", "P.armeniaca", "C.paradisi"). What RScript code could we type to see, succinctly in the console, all of the species present under "varieties"?

unique(tree_diameters$varieties)

What RScript code can we write to tabulate the number of samples we have for each species in the "varieties" column in the "tree_diameters" data set?

table(tree_diameters$varieties)
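
On a short hypothetical vector of species names, unique() lists each species once (in order of first appearance) and table() counts the samples per species:

```r
# Hypothetical species column
varieties = c("M.domestica", "C.paradisi", "M.domestica", "P.armeniaca", "M.domestica")

unique(varieties)   # each species once
table(varieties)    # number of samples per species
```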

What RScript code can we type to generate all of the integers from 1 to 10?

indexes=1:10

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to modify it so that only the first seven columns appear in a new data set, "tree_diam_mod1". What RScript code could we write to accomplish this?

indexes_columns=1:7
tree_diam_mod1=tree_diam[, indexes_columns]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to modify it so that only the first nine rows appear in a new data set, "tree_diam_mod2". What RScript code could we write to accomplish this?

indexes_rows=1:9
tree_diam_mod2=tree_diam[indexes_rows, ]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to modify it so that only the first 11 rows and the first 13 columns appear in a new data set, "tree_diam_mod3". What RScript code could we write to accomplish this?

indexes_columns=1:13
indexes_rows=1:11
tree_diam_mod3=tree_diam[indexes_rows, indexes_columns]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to modify it so that only the first, third and sixth columns appear in a new data set, "tree_diam_mod4". What RScript code could we write to accomplish this?

indexes_columns=c(1, 3, 6)
tree_diam_mod4=tree_diam[, indexes_columns]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to modify it so that only the second, fourth and eighth rows appear in a new data set, "tree_diam_mod5". What RScript code could we write to accomplish this?

indexes_rows=c(2, 4, 8)
tree_diam_mod5=tree_diam[indexes_rows, ]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to exclude the first row to generate the data set "tree_diam_mod6". What RScript code could we write to accomplish this?

tree_diam_mod6=tree_diam[-1,]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to exclude the first column to generate the data set "tree_diam_mod7". What RScript code could we write to accomplish this?

tree_diam_mod7=tree_diam[, -1]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to exclude the first, third, and seventh rows to generate the data set "tree_diam_mod8". What RScript code could we write to accomplish this?

indexes_rows=c(1, 3, 7)
tree_diam_mod8=tree_diam[-indexes_rows, ]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to exclude the third, fourth, and ninth columns to generate the data set "tree_diam_mod9". What RScript code could we write to accomplish this?

indexes_columns=c(3, 4, 9)
tree_diam_mod9=tree_diam[, -indexes_columns]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. We want to exclude the first, third and eighth rows, and we want to remove the second, fifth and ninth columns. The new data set is to have the name "tree_diam_mod10". What RScript code could we write to accomplish this?

indexes_rows=c(1, 3, 8)
indexes_columns=c(2, 5, 9)
tree_diam_mod10=tree_diam[-indexes_rows, -indexes_columns]

We have the data set "tree_diam", which has 4,027 rows and 30 columns. Two of the columns have the names "species" and "diam_1983", and we want to exclude the others to generate a new data set with only these two columns and the name "tree_diam_mod11". What RScript code could we write to accomplish this?

retain_columns=c("species", "diam_1983")
tree_diam_mod11=tree_diam[, retain_columns]
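
The indexing patterns from the last several cards can be tried side by side on a small hypothetical data frame. One caveat worth knowing: selecting a single column with [, ] returns a plain vector unless drop = FALSE is added.

```r
# Hypothetical 5-row, 4-column stand-in for tree_diam
tree_diam = data.frame(
  species   = c("A", "B", "C", "D", "E"),
  diam_1983 = c(10, 12, 9, 15, 11),
  diam_1984 = c(11, 13, 9, 16, 12),
  diam_1985 = c(12, 13, 10, 17, 12)
)

first_two_rows = tree_diam[1:2, ]                # keep rows 1 and 2
no_first_col   = tree_diam[, -1]                 # drop column 1
trimmed        = tree_diam[-c(1, 3), -c(2, 4)]   # drop rows 1, 3 and columns 2, 4
by_name        = tree_diam[, c("species", "diam_1983")]

# One column: a vector by default, a data frame with drop = FALSE
class(tree_diam[, "diam_1983"])                 # "numeric"
class(tree_diam[, "diam_1983", drop = FALSE])   # "data.frame"
```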

We have generated a histogram after running the code "hist(tree_data$height)", but we want to spruce it up to have the x-axis label "Tree Species", the y-axis label "Height (m)", and the overall title of "Average Height for Given Tree Species". What RScript code could we write to accomplish this?

hist(tree_data$height, main = "Average Height for Given Tree Species", xlab = "Tree Species", ylab = "Height (m)")
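
A histogram puts the measured variable on the x-axis and the count of observations on the y-axis, so labels describing those two quantities tend to read most clearly. A sketch with hypothetical heights; the label wording is only a suggestion:

```r
# Hypothetical tree heights in metres
tree_data = data.frame(height = c(4.2, 5.1, 5.8, 6.3, 6.9, 7.4, 8.0, 9.6))

hist(tree_data$height,
     main = "Distribution of Tree Heights",
     xlab = "Height (m)",
     ylab = "Number of Trees")
```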

We want to find the standard deviation of the "diam_1984" column for each species in the "tree_diam" data set. What command can we type in RScript to split the data into subsets (one per species), compute a summary statistic for each subset, and return the results in the console?

aggregate(diam_1984~species, data=tree_diam, sd)
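
A small made-up data set shows what aggregate() returns: one row per species, holding the standard deviation of diam_1984 for that species. The formula diam_1984 ~ species reads as "diam_1984 grouped by species", and FUN can be any summary function, such as mean.

```r
# Hypothetical stand-in for tree_diam
tree_diam = data.frame(
  species   = c("M.domestica", "M.domestica", "P.armeniaca", "P.armeniaca"),
  diam_1984 = c(10, 14, 8, 9)
)

# Standard deviation of diam_1984 within each species
sd_by_species = aggregate(diam_1984 ~ species, data = tree_diam, FUN = sd)
print(sd_by_species)

# The same call with mean gives per-species averages
aggregate(diam_1984 ~ species, data = tree_diam, FUN = mean)
```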

(47 cards)