r_ggplot Flashcards

1
Q

What does the “gg” in “ggplot” stand for?

A
  • Grammar of Graphics
2
Q

How do you load ggplot2?

A
  • library(ggplot2)
3
Q
  • What is one limitation of ggplot?
A
  • Works exclusively with data tables (data frames in tidy format)
  • In these data tables:
    • rows have to be observations
    • columns have to be variables
4
Q

What happens if you do the following?

  • murders %>% ggplot()
A
  • This renders a blank plot, since no geometry has been defined.
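One way to check this, sketched with a small made-up data frame standing in for murders (the name toy and its values are assumptions for illustration): the object returned by ggplot() holds the data but no layers until a geometry is added.

```r
library(ggplot2)

# Hypothetical stand-in for the murders data (values are made up)
toy <- data.frame(population = c(1e6, 5e6, 2e7), total = c(10, 80, 400))

p <- ggplot(toy)      # data only: printing p shows an empty panel
length(p$layers)      # number of layers attached so far

p2 <- p + geom_point(aes(x = population / 10^6, y = total))
length(p2$layers)     # layer count after adding a geometry
```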
5
Q

What is the Median Absolute Deviation (MAD)?

A
  • mad(x)
  • the median of the absolute deviations from the median (R scales it by 1.4826 by default)
  • robust measure of spread (variability)
  • not sensitive to the presence of outliers
    • unlike the standard deviation
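A minimal sketch of that robustness, using a made-up vector in which one entry is then corrupted: the standard deviation changes a lot, while the MAD does not move in this example.

```r
# Made-up sample; the second vector has one corrupted entry
clean <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
dirty <- clean
dirty[9] <- 1000

sd(clean)    # standard deviation of the clean data
sd(dirty)    # inflated by the single outlier
mad(clean)   # median absolute deviation (scaled by 1.4826 by default)
mad(dirty)   # same as mad(clean): the median of |x - median(x)| is still 2
```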
6
Q

How can you use exploratory data analysis to detect that an error was made?

A
  • A boxplot, histogram, or qq-plot would reveal a clear outlier
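As a sketch of the boxplot approach without drawing anything, boxplot.stats() returns the points a boxplot would flag. The heights below are made up, with one value assumed to have been entered in centimetres instead of inches.

```r
# Made-up heights in inches; one value was entered in centimetres by mistake
heights_in <- c(64, 66, 67, 68, 69, 70, 70, 71, 72, 74, 175)

# boxplot.stats() applies the same 1.5 * IQR rule a boxplot uses to flag points
flagged <- boxplot.stats(heights_in)$out
flagged
```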
7
Q

Given the following:

  • x <- Galton$child

Do the following:

  • Write a function called error_avg that takes a value k and returns the average of the vector x after the first entry is changed to k.
  • Show the results for k=10000 and k=-10000.
A
error_avg <- function(k){
  x[1] <- k
  mean(x)
}

error_avg(10000)
error_avg(-10000)
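The size of the effect can be worked out directly: replacing x[1] with k shifts the mean by (k - x[1]) / length(x). A sketch with a made-up vector in place of Galton$child, so it runs without extra packages:

```r
# Made-up stand-in for Galton$child (values are illustrative only)
x <- c(61.7, 64.2, 66.5, 68.0, 69.2, 70.5, 72.3, 73.7)

error_avg <- function(k) {
  x[1] <- k    # modifies a local copy; the global x is untouched
  mean(x)
}

# The shift in the mean equals (k - x[1]) / length(x)
shift <- error_avg(10000) - mean(x)
expected <- (10000 - x[1]) / length(x)
all.equal(shift, expected)
```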

8
Q

What are layers in ggplot?

A
  • In ggplot, graphs are created by adding layers
  • They are added component by component
  • Layers can:
    • define geometries
    • compute summary statistics
    • define what scales to use
    • even change styles
  • To add layers, we use the + symbol
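A minimal sketch of building a plot component by component, using made-up data (toy and its columns are assumptions for illustration):

```r
library(ggplot2)

# Made-up data for illustration only
toy <- data.frame(x = c(1, 10, 100), y = c(2, 15, 180))

p <- ggplot(toy, aes(x = x, y = y))   # data and aesthetic mapping, no geometry yet
p <- p + geom_point(size = 3)         # geometry
p <- p + scale_x_log10()              # scale for the x-axis
p <- p + ggtitle("Toy example")       # title
p <- p + theme_bw()                   # style
```

Each + returns a new plot object, so intermediate steps can be stored and reused.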
9
Q

Create a ggplot scatter plot with the following:

  • “murders” dataset
  • x-axis: population/10^6
  • y-axis: total
A

murders %>% ggplot() +
  geom_point(aes(x = population/10^6, y = total))

10
Q

What are the functions to add labels to the x- and y-axes and to add a title to the plot?

A
  • x-axis label: xlab("<label>")
  • y-axis label: ylab("<label>")
  • plot title: ggtitle("<title>")
11
Q

How do you color points by the categories of a variable, such as region? Give an example.

A
  • We have to use a mapping
  • To map each point to a color, we need to use aes
  • geom_point( aes( col=region), size = 3)
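The difference between mapping a color and setting one can be sketched as follows, with a made-up stand-in for murders (the toy values are assumptions):

```r
library(ggplot2)

# Made-up stand-in for murders (values are illustrative)
toy <- data.frame(
  population = c(1, 5, 20, 8),
  total      = c(10, 80, 400, 120),
  region     = c("South", "West", "South", "Northeast")
)

# Mapped: colour depends on the data, so it goes inside aes() and gets a legend
mapped <- ggplot(toy, aes(population, total)) +
  geom_point(aes(col = region), size = 3)

# Set: one fixed colour for every point, so it goes outside aes() (no legend)
fixed <- ggplot(toy, aes(population, total)) +
  geom_point(col = "blue", size = 3)
```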
12
Q

How would you add a line to the plot with the following characteristics:

  • Dashed
  • Has intercept log10(r) (slope left at its default of 1)
  • Darkgrey color
A
  • geom_abline(intercept = log10(r), lty = 2, color = "darkgrey")
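Why log10(r) works as the intercept, sketched under two assumptions not stated on the card: r is the overall rate (total divided by population in millions), and both axes use log10 scales. Then the line total = r * population becomes log10(total) = log10(r) + log10(population), a line with slope 1. The toy data below is made up.

```r
library(ggplot2)

# Made-up stand-in for murders (population in millions, total counts)
toy <- data.frame(pop_millions = c(1, 5, 20, 8), total = c(10, 80, 400, 120))

# Assumed definition of r: overall rate per million people
r <- sum(toy$total) / sum(toy$pop_millions)

p <- ggplot(toy, aes(pop_millions, total)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  # slope is not given, so geom_abline() uses its default of 1
  geom_abline(intercept = log10(r), lty = 2, color = "darkgrey")
```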
13
Q

How do you do the following:

  • Change the legend label from “region” to “Region”
A
  • scale_color_discrete(name = "Region")
14
Q

How would you do the following:

  • change the plot theme to “theme_economist”
A
  • library(ggthemes)
  • theme_economist()
15
Q

Create a plot of the following:

  • data: male heights from the heights dataset
  • bin width: 1
  • color: bars blue with black border
  • label: (x-axis) “Male heights in inches”
  • title: (plot) “Histogram”
A

p <- heights %>%
  filter(sex == "Male") %>%
  ggplot(aes(x = height)) +
  geom_histogram(binwidth = 1, fill = "blue", col = "black") +
  xlab("Male heights in inches") +
  ggtitle("Histogram")

16
Q

Take the previously created plot ‘p’ and make the following change:

  • change to smooth density plot with blue color
A
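  • p + geom_density(fill = "blue")
    • for a density-only plot, p should hold just the data and aes(x = height), without the histogram layer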
17
Q

What are the default mean and standard deviation of the theoretical normal distribution in a QQ-plot?

A
  • mean = 0
  • standard deviation = 1
18
Q

How would you adjust an existing QQ-plot “p” to use the mean and sd of the “height” variable instead of the defaults?

A
  • Create a new object “params” with mean(height) and sd(height)
  • Pass “params” to the dparams argument of geom_qq

params <- heights %>%
  filter(sex == "Male") %>%
  summarize(mean = mean(height), sd = sd(height))

p + geom_qq(dparams = params)
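A self-contained sketch of the same idea on simulated data (sim and its parameters are assumptions for illustration). Note that geom_qq() needs the sample aesthetic, so the base plot has to map the variable with aes(sample = ...).

```r
library(ggplot2)

set.seed(1)
# Simulated heights in inches (made-up stand-in for the male heights)
sim <- data.frame(height = rnorm(200, mean = 69, sd = 3))

# dparams takes a named list of parameters for the theoretical distribution
params <- list(mean = mean(sim$height), sd = sd(sim$height))

p <- ggplot(sim, aes(sample = height)) +   # geom_qq needs the `sample` aesthetic
  geom_qq(dparams = params) +
  geom_abline()                            # identity line: intercept 0, slope 1
```

With matching parameters, points from roughly normal data should fall close to the identity line.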

19
Q

What is the class of p after p <- ggplot(murders)?

A
  • ggplot
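A quick check, sketched with the built-in mtcars data in place of murders. The full class vector returned by class() may differ between ggplot2 versions, so inherits() is the safer test.

```r
library(ggplot2)

p <- ggplot(mtcars)             # mtcars ships with R; used here in place of murders
class(p)                        # the exact vector depends on the ggplot2 version
is_gg <- inherits(p, "ggplot")  # TRUE if "ggplot" is among the classes
is_gg
```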
20
Q

Using the pipe %>%, create an object p associated with the heights dataset instead of with the murders dataset as in previous exercises.

A
  • p <- heights %>% ggplot()
21
Q

Create a scatter plot from the murders dataset with the following:

  • “total” on the x-axis
  • “population” on the y-axis
  • label the points with “abb”
  • color the labels blue
A

murders %>% ggplot(aes(x = total, y = population, label = abb)) +
  geom_label(color = "blue")

22
Q

Create a scatter plot from the murders dataset with the following:

  • “total” on the x-axis
  • “population” on the y-axis
  • label the points with “abb”
  • color the labels by region
A

murders %>% ggplot(aes(x = total, y = population, label = abb, color = region)) +
  geom_label()

23
Q

Make the following change to the existing ggplot ‘p’:

  • Change both axes to be in the log scale. Make sure you do not redefine p - just add the appropriate layers.
  • Add a title to the plot “Gun murder data”
A

p + scale_x_log10() +
  scale_y_log10() +
  ggtitle("Gun murder data")

24
Q

Create a ggplot object called p using the pipe to assign the heights data to a ggplot object.

Assign height to the x values through the aes function.

A
  • p <- heights %>% ggplot(aes(x = height))
25
Q

Create a ggplot from the “heights” dataset with separate density plots for males and females by defining group by sex

  • 2 ways to do this
A

heights %>%
ggplot(aes(height, group = sex)) +
geom_density()

heights %>%
ggplot(aes(height, color = sex)) +
geom_density()