r_ggplot Flashcards
(25 cards)
What does the “gg” in “ggplot” stand for?
- Grammar of Graphics
How do you import ggplot?
- library(ggplot2)
What is one limitation of ggplot?
- Works exclusively with data tables
- In these data tables:
- rows have to be observations
- columns have to be variables
What happens if you do the following?
- murders %>% ggplot()
- This renders a blank plot, since no geometry has been defined.
What is the Median Absolute Deviation?
- mad(x)
- robust measure of spread (variability)
- not sensitive to the presence of outliers
- unlike the mean and the standard deviation
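As a quick illustration of that robustness, the sketch below compares sd() and mad() on a small vector before and after one entry is corrupted. The values are made up for the demonstration and are not from any dataset used in these cards.

```r
# Hypothetical data: nine heights in inches (made up for illustration)
x <- c(64, 65, 66, 67, 68, 69, 70, 71, 72)

# Corrupt one entry, e.g. a value typed with an extra digit
x_bad <- x
x_bad[1] <- 640

sd_clean  <- sd(x)
sd_bad    <- sd(x_bad)
mad_clean <- mad(x)      # 1.4826 * median(|x - median(x)|) by default
mad_bad   <- mad(x_bad)

# The standard deviation grows sharply; in this example the MAD is unchanged
print(c(sd_clean = sd_clean, sd_bad = sd_bad))
print(c(mad_clean = mad_clean, mad_bad = mad_bad))
```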
How can you use exploratory data analysis to detect that an error was made?
- A boxplot, histogram, or qq-plot would reveal a clear outlier
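The same check can be done numerically. The sketch below uses boxplot.stats() (shipped with R in grDevices), which returns the points a boxplot would draw beyond its whiskers. The data are hypothetical, with one deliberately wrong entry.

```r
# Hypothetical heights in inches with one data-entry error (640 instead of 64)
x <- c(640, 65, 66, 67, 68, 69, 70, 71, 72)

# Values lying more than 1.5 times the hinge spread beyond the hinges
outliers <- boxplot.stats(x)$out
print(outliers)

# The boxplot itself shows the same point far above the box
boxplot(x, main = "Hypothetical heights with one entry error")
```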
Given the following:
- x <- Galton$child
Do the following:
- Write a function called error_avg that takes a value k and returns the average of the vector x after the first entry is changed to k.
- Show the results for k=10000 and k=-10000.
- error_avg <- function(k){
x[1] <- k
return(mean(x))
}
error_avg(10000)
error_avg(-10000)
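For contrast, here is a minimal sketch that puts a median-based variant next to error_avg. The name error_median is hypothetical, and x is a made-up stand-in vector, not the real Galton$child data.

```r
# Stand-in for Galton$child (hypothetical values, not the real data)
x <- c(61.7, 63.2, 64.2, 65.2, 66.2, 67.2, 68.2, 69.2, 70.2, 71.2)

error_avg <- function(k) {
  x[1] <- k   # modifies a local copy; the global x is left unchanged
  mean(x)
}

# Hypothetical variant: same corruption, but summarised with the median
error_median <- function(k) {
  x[1] <- k
  median(x)
}

# The mean is dragged far away by a single bad entry
print(c(error_avg(10000), error_avg(-10000)))
# The median barely moves
print(c(error_median(10000), error_median(-10000)))
```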
What are layers in ggplot?
- In ggplot, graphs are created by adding layers
- They are added component by component
- Layers can:
- define geometries
- compute summary statistics
- define what scales to use
- even change styles
- To add layers, we use the plus symbol (+)
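A minimal sketch of building a plot layer by layer is shown here. It assumes ggplot2 is installed and uses a small hypothetical data frame df rather than the murders dataset.

```r
library(ggplot2)

# Hypothetical toy data (not the murders dataset)
df <- data.frame(
  population = c(1.2, 3.5, 5.1, 8.4, 19.6),
  total      = c(12, 80, 95, 250, 520),
  region     = c("West", "South", "South", "Northeast", "West")
)

# Start with data and aesthetic mappings only (no geometry yet)
p <- ggplot(df, aes(x = population, y = total))

# Add a geometry layer
p <- p + geom_point(aes(col = region), size = 3)

# Add scales
p <- p + scale_x_log10() + scale_y_log10()

# Add a title and change the style
p <- p + ggtitle("Toy example") + theme_minimal()

print(p)
```

Each `+` returns a new plot object, so intermediate versions of p can be printed at any step to see what a component contributes.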
Create a ggplot scatter plot with the following:
- “murders” dataset
- x-axis: population/10^6
- y-axis: total
murders %>% ggplot() +
geom_point( aes( x = population/10^6, y = total) )
What are the functions to add labels to the x and y axes, and to add a title to the plot?
- x-axis label: xlab("<label>")
- y-axis label: ylab("<label>")
- plot title: ggtitle("<title>")
How do you color points by a category of a variable, such as region? Give an example.
- We have to use a mapping
- To map each point to a color, we need to use aes
- geom_point( aes( col=region), size = 3)
How would you add a line to the plot with the following characteristics:
- Dashed
- Goes through log10(r)
- Darkgrey color
- geom_abline(intercept = log10(r), lty = 2, color = "darkgrey")
How do you do the following:
- Change the legend label from “region” to “Region”
- scale_color_discrete(name = "Region")
How would you do the following:
- change the plot theme to “theme_economist”
- library(ggthemes)
- theme_economist()
Create a plot of the following:
- data: male heights from the heights dataset
- bin width: 1
- color: bars blue with black border
- label: (x-axis) “Male heights in inches”
- title: (plot) “Histogram”
p <- heights %>%
  filter(sex == "Male") %>%
  ggplot(aes(x = height)) +
  geom_histogram(binwidth = 1, fill = "blue", col = "black") +
  xlab("Male heights in inches") +
  ggtitle("Histogram")
Take the previously created plot ‘p’ and make the following change:
- change to smooth density plot with blue color
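- p + geom_density(fill = "blue")
- This assumes p holds only the data and the aes(x = height) mapping; if p already contains the histogram layer, the density curve is drawn on top of it.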
What are the default mean and standard deviation of the theoretical distribution in a qqplot?
- mean = 0
- standard deviation = 1
How would you adjust an existing qqplot “p” to change the default mean and sd to the mean and sd of the “height” variable?
- Create a new object “params” with mean(height) and sd(height)
- Pass the new object “params” to the dparams argument of geom_qq
params <- heights %>%
  filter(sex == "Male") %>%
  summarize(mean = mean(height), sd = sd(height))
p + geom_qq(dparams = params)
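The same idea can be tried without the heights dataset. The sketch below assumes ggplot2 is installed and uses simulated data as a hypothetical stand-in; params is built as a plain list with base R rather than with summarize.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for male heights (hypothetical, not dslabs::heights)
df <- data.frame(height = rnorm(200, mean = 69, sd = 3))

# Parameters of the theoretical normal distribution, estimated from the data
params <- list(mean = mean(df$height), sd = sd(df$height))

# geom_qq needs the variable mapped to the sample aesthetic
p <- ggplot(df, aes(sample = height))

# With matching parameters, points should fall near the identity line
# (geom_abline() defaults to intercept 0 and slope 1)
p_qq <- p + geom_qq(dparams = params) + geom_abline()
print(p_qq)
```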
What is the class of p <- ggplot(murders)?
- ggplot
Using the pipe %>%, create an object p associated with the heights dataset instead of with the murders dataset as in previous exercises.
- p <- heights %>% ggplot()
Create a scatter plot from the murders dataset with the following:
- “total” on the x-axis
- “population” on the y-axis
- label the points with “abb”
- color the labels blue
murders %>% ggplot(aes(x = total, y = population, label = abb)) +
  geom_label(color = "blue")
Create a scatter plot from the murders dataset with the following:
- “total” on the x-axis
- “population” on the y-axis
- label the points with “abb”
- color the labels by region
murders %>% ggplot(aes(x = total, y = population, label = abb, color = region)) +
  geom_label()
Make the following change to the existing ggplot ‘p’:
- Change both axes to be in the log scale. Make sure you do not redefine p - just add the appropriate layers.
- Add a title to the plot “Gun murder data”
p + scale_x_log10() +
scale_y_log10() +
ggtitle("Gun murder data")
Create a ggplot object called p using the pipe to assign the heights data to a ggplot object.
Assign height to the x values through the aes function.
- p <- heights %>% ggplot(aes(x = height))