Section 1 - Setting Up R Environment Flashcards

Question 1

Q

What is tidyverse?

Answer

A

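The tidyverse is a collection of R packages (including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats) that share a common design and are used for importing, tidying, transforming, and visualizing data. Loading the tidyverse package attaches these core packages in one step.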
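
A minimal sketch of what this looks like in practice, assuming the tidyverse is already installed (install.packages("tidyverse")). The toy tibble and its column names are made up for illustration.

```r
# Loading the tidyverse attaches its core packages (ggplot2, dplyr, tidyr, ...)
library(tidyverse)

# Names of the packages that belong to the tidyverse
pkgs <- tidyverse_packages()

# Hypothetical toy data, only to show dplyr verbs working after one library() call
toy <- tibble(bar = c("A", "B", "C"), rating = c(3.5, 2.75, 4))

top <- toy %>%
  filter(rating >= 3) %>%   # keep bars rated 3 or higher
  arrange(desc(rating))     # highest rating first
```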

Question 2

Q

What does the "here" package use as the starting point for file paths?

Answer

A

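The project root folder, i.e. the folder that contains the project file (for example, the .Rproj file). Paths built with here() are relative to that folder.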
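
A short sketch of building a path from the project root, assuming the here package is installed. The file data/chocolate.csv is a hypothetical example and does not need to exist for the path to be built.

```r
library(here)

# here() with no arguments returns the project root that the package detected
root <- here()

# Build a path relative to that root; "data/chocolate.csv" is a hypothetical file
csv_path <- here("data", "chocolate.csv")
```

Because the path is anchored at the project root, the same call works regardless of which subfolder the script is run from.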

Question 3

Q

What symbol is used for chaining multiple functions together?

Answer

A

%>%
Example:
chocolate_data <- chocolate %>%
  separate(col = "ingredients",
           into = c("number_ingredients", "list_ingredients"),
           sep = "-") %>%
  mutate(number_ingredients = as.numeric(number_ingredients),
         vanilla = case_when(
           str_detect(list_ingredients, "V") ~ "contains Vanilla",
           TRUE ~ "no vanilla"),
         vanilla = as_factor(vanilla),
         cocoa_percent = str_remove(cocoa_percent, "%"),
         cocoa_percent = as.numeric(cocoa_percent),
         number_ingredients = case_when(
           is.na(number_ingredients) & cocoa_percent == 100 ~ 1,
           TRUE ~ number_ingredients
         ),
         across(where(is.character), as_factor)
  )
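
A smaller sketch of the same idea, assuming the magrittr package is installed (it provides %>% and is also available after library(tidyverse)). The values are arbitrary.

```r
library(magrittr)  # provides %>%

values <- c(1, 4, 9)

# Nested form: has to be read inside-out
nested <- sum(sqrt(values))

# Piped form: read left to right; each result becomes the first argument of the next call
piped <- values %>% sqrt() %>% sum()

# R 4.1 and later also include a native pipe, |>
native <- values |> sqrt() |> sum()
```

All three forms compute the same value; the pipe only changes how the calls are written.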

Question 4

Q

Explain this statement:
chocolate_data <- chocolate %>%
  separate(col = "ingredients",
           into = c("number_ingredients", "list_ingredients"),
           sep = "-") %>%
  mutate(number_ingredients = as.numeric(number_ingredients),
         vanilla = case_when(
           str_detect(list_ingredients, "V") ~ "contains Vanilla",
           TRUE ~ "no vanilla"),
         vanilla = as_factor(vanilla),
         cocoa_percent = str_remove(cocoa_percent, "%"),
         cocoa_percent = as.numeric(cocoa_percent),
         number_ingredients = case_when(
           is.na(number_ingredients) & cocoa_percent == 100 ~ 1,
           TRUE ~ number_ingredients
         ),
         across(where(is.character), as_factor)
  )

Answer

A

This R code cleans a dataset called chocolate using tidyverse functions from tidyr, dplyr, stringr, and forcats. Step by step:
Separate Ingredients: separate() splits the ingredients column at "-" into two new columns, number_ingredients and list_ingredients. Both are character columns at this point, and the original ingredients column is dropped.

Convert to Numeric: Converts number_ingredients from character to numeric.

Create Vanilla Column: Creates a new column vanilla that checks whether "V" appears in list_ingredients. If it does, the value is "contains Vanilla"; otherwise it is "no vanilla". Rows where list_ingredients is missing also fall through to "no vanilla". The column is then converted to a factor.

Clean Cocoa Percent: Removes the "%" from cocoa_percent and converts the result to numeric.

Handle Missing Ingredients: For rows where number_ingredients is NA and cocoa_percent is 100, sets number_ingredients to 1; otherwise, keeps the existing value.

Convert Characters to Factors: across(where(is.character), as_factor) converts every column that is still character at that point, such as list_ingredients, to a factor. Columns that are already numeric or factor (number_ingredients, cocoa_percent, vanilla) are left unchanged.

The result, stored in chocolate_data, is a cleaned dataset with new columns and data types suited to analysis.
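
The steps above can be traced on a tiny example. This is a sketch, assuming the tidyverse is installed; the three-row chocolate tibble is made-up data that stands in for the real dataset and has only the two columns the pipeline touches.

```r
library(tidyverse)

# Hypothetical stand-in for the real dataset
chocolate <- tibble(
  ingredients   = c("3- B,S,C", "4- B,S,C,V", NA),
  cocoa_percent = c("70%", "75%", "100%")
)

chocolate_data <- chocolate %>%
  # Split "3- B,S,C" into "3" and " B,S,C"; an NA input gives NA in both columns
  separate(col = "ingredients",
           into = c("number_ingredients", "list_ingredients"),
           sep = "-") %>%
  mutate(number_ingredients = as.numeric(number_ingredients),
         vanilla = case_when(
           str_detect(list_ingredients, "V") ~ "contains Vanilla",
           TRUE ~ "no vanilla"),
         vanilla = as_factor(vanilla),
         cocoa_percent = str_remove(cocoa_percent, "%"),
         cocoa_percent = as.numeric(cocoa_percent),
         # The third row has no ingredient count but is 100% cocoa, so it becomes 1
         number_ingredients = case_when(
           is.na(number_ingredients) & cocoa_percent == 100 ~ 1,
           TRUE ~ number_ingredients
         ),
         across(where(is.character), as_factor)
  )

# Inspect the resulting column types
glimpse(chocolate_data)
```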

Question 5

Q

What is geom_jitter? geom_smooth?

Answer

A

In R, specifically within the ggplot2 package, geom_jitter and geom_smooth are geometric objects (geoms) used to visualize data in plots.
geom_jitter: Adds a small amount of random noise to data points to prevent overplotting, making it easier to see individual points in dense datasets. It's useful for categorical or discrete data where points might overlap. For example, in a scatterplot, geom_jitter() spreads points slightly around their actual positions.
Example:
ggplot(data, aes(x = category, y = value)) + geom_jitter()
geom_smooth: Adds a smoothed conditional mean (like a regression line or curve) to a plot, showing the trend or relationship in the data. It often includes a confidence interval. You can specify methods like lm (linear model) or loess (local regression).
Example:
ggplot(data, aes(x = x_variable, y = y_variable)) + geom_smooth(method = "lm")
Both are used in ggplot2 to enhance data visualization, with geom_jitter reducing overlap and geom_smooth highlighting trends.
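
The two geoms are often layered in one plot. A minimal sketch, assuming ggplot2 is installed and using its built-in mpg dataset; the jitter width and transparency values are arbitrary choices for illustration.

```r
library(ggplot2)

# mpg ships with ggplot2: displ = engine displacement, hwy = highway miles per gallon
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Spread overlapping points sideways only, so the y values stay exact
  geom_jitter(width = 0.1, height = 0, alpha = 0.5) +
  # Overlay a straight-line fit with its confidence band
  geom_smooth(method = "lm", formula = y ~ x)

# Printing p (or calling print(p)) draws the plot
```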

(5 cards)