r_gapminder Flashcards
Show the “country” and “infant_mortality” for the following from the gapminder dataset:
- year == 2015
- country == Sri Lanka and Turkey
gapminder %>%
  filter(year == 2015 & country %in% c("Sri Lanka", "Turkey")) %>%
  select(country, infant_mortality)
Use faceting and ggplot to create a scatter plot with the following:
- filter the “gapminder” dataset to the years 1962 and 2012
- plot “fertility” and “life_expectancy”, with points colored by “continent”
- facet_grid with “continent” in the rows and “year” in the columns
filter(gapminder, year %in% c(1962, 2012)) %>%
  ggplot(aes(fertility, life_expectancy, col = continent)) +
  geom_point() +
  facet_grid(continent ~ year)
Use faceting and ggplot to create a scatter plot with the following:
- filter the “gapminder” dataset to the years 1962 and 2012
- plot “fertility” and “life_expectancy”, with points colored by “continent”
- facet_grid with “year” in the columns
filter(gapminder, year %in% c(1962, 2012)) %>%
  ggplot(aes(fertility, life_expectancy, col = continent)) +
  geom_point() +
  facet_grid(. ~ year)
What are Time Series plots?
- Plots with:
- time on the x-axis
- an outcome or measurement on the y-axis
Create a time series plot with the following:
- Gapminder dataset
- Filter on “country” of United States
- Plot “year” on the x-axis and “fertility” on the y-axis
- Use a geom_line
gapminder %>%
  filter(country == "United States") %>%
  ggplot(aes(year, fertility)) +
  geom_line()
What is a key step you must take when comparing two countries in time series plots?
- By default, ggplot draws a single line through the points for both countries
- To let ggplot know we want two separate lines, we assign each point to a group (one for each country)
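The grouping step described above can be sketched as follows. This is a minimal example that assumes the gapminder data comes from the dslabs package (the cards do not name the package) and reuses the two countries from the next card.

```r
# Assumption: gapminder is the dataset shipped with the dslabs package
library(dslabs)
library(dplyr)
library(ggplot2)

countries <- c("South Korea", "Germany")

# group = country asks geom_line() to draw one line per country
# instead of a single line through all the points
p <- gapminder %>%
  filter(country %in% countries) %>%
  ggplot(aes(year, fertility, group = country)) +
  geom_line()
p
```

With only “group” mapped, both lines are drawn in the same color and no legend is added.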
Create a time series plot for two countries (South Korea, Germany), and plot on “year” and “fertility”
countries <- c("South Korea", "Germany")
gapminder %>%
  filter(country %in% countries) %>%
  ggplot(aes(year, fertility, color = country)) +
  geom_line()
Note: the use of “color” has two effects:
- it groups the data, same as “group = country”
- it also gives each country its own line color and adds a legend
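One way to check this note is to inspect the groups ggplot2 assigns to each row. The sketch below uses a small made-up data frame (the name toy and its values are hypothetical, not from gapminder) and layer_data() from ggplot2 to compare three mappings.

```r
library(ggplot2)

# Hypothetical toy data: two countries measured in the same three years
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year = rep(c(2000, 2005, 2010), times = 2),
  fertility = c(2.1, 1.9, 1.8, 3.0, 2.7, 2.5)
)

p_plain <- ggplot(toy, aes(year, fertility)) + geom_line()
p_group <- ggplot(toy, aes(year, fertility, group = country)) + geom_line()
p_color <- ggplot(toy, aes(year, fertility, color = country)) + geom_line()

# layer_data() returns the data computed for a layer,
# including the group each row was assigned to
groups_plain <- unique(layer_data(p_plain)$group)
groups_group <- unique(layer_data(p_group)$group)
groups_color <- unique(layer_data(p_color)$group)

# One group without a mapping, two groups with either mapping
length(groups_plain)
length(groups_group)
length(groups_color)
```

Mapping “color” to a discrete variable produces the same two groups as mapping “group”, which is why the explicit “group” mapping can be dropped once “color” is used.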
Create a time series with the following:
- labels
- country = countries
- x = 1975, 1965
- y = 60, 72
- Use gapminder dataset
- filter on country
- No legend
- for geom_text
- use size of 5
- label = country
labels <- data.frame(country = countries, x = c(1975, 1965), y = c(60, 72))
gapminder %>%
  filter(country %in% countries) %>%
  ggplot(aes(year, life_expectancy, col = country)) +
  geom_line() +
  geom_text(data = labels, aes(x, y, label = country), size = 5) +
  theme(legend.position = "none")
What is the mode of a normal distribution?
- the average (in a normal distribution the mode, mean, and median coincide)
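A quick numerical check of this answer, as a sketch: the parameters mu and sigma below are hypothetical values chosen for illustration, and optimize() from base R searches for the point where the normal density is highest.

```r
# Hypothetical parameters chosen for illustration
mu <- 1
sigma <- 2

# Search for the x value that maximizes the normal density
fit <- optimize(dnorm, interval = c(-10, 10), mean = mu, sd = sigma, maximum = TRUE)
mode_estimate <- fit$maximum

# The estimate should sit very close to mu, up to the optimizer's tolerance
mode_estimate
```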
What are local modes?
- Locations where a distribution goes up and down again instead of decreasing steadily away from the mode; a distribution with local modes is said to have multiple modes
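To make the idea concrete, here is a minimal sketch that simulates a sample with two humps and looks for local peaks in its estimated density. The mixture (two normal distributions centered at 0 and 6) is a hypothetical example, not taken from the gapminder data.

```r
set.seed(1)

# Hypothetical bimodal sample: a mixture of two normal distributions
x <- c(rnorm(500, mean = 0, sd = 1), rnorm(500, mean = 6, sd = 1))

# Kernel density estimate evaluated on a grid of x values
d <- density(x)
n <- length(d$y)

# A grid point is a local peak if its density is higher than both neighbours
inner <- 2:(n - 1)
is_peak <- d$y[inner] > d$y[inner - 1] & d$y[inner] > d$y[inner + 1]
local_modes <- d$x[inner][is_peak]

# Expect peaks near the two component means; small spurious bumps
# in the tails are possible with other seeds or bandwidths
local_modes
```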
Use the gapminder dataset to do the following:
- Filter to the year stored in “past_year”, with no NA in “gdp”
- Plot “dollars_per_day”
- Histogram with a binwidth of 1 and color black
- Transform the x-axis scale to log base 2
# assumes past_year (a year value) and the dollars_per_day column are already defined
gapminder %>%
  filter(year == past_year, !is.na(gdp)) %>%
  ggplot(aes(dollars_per_day)) +
  geom_histogram(binwidth = 1, color = "black") +
  scale_x_continuous(trans = "log2")
Take the existing plot object ‘p’ and adjust as follows:
- Change to box plot
- Rotate the x-axis labels 90 degrees
p + geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Use the gapminder dataset to do the following:
- Filter to the year stored in “past_year”, with no NA in “gdp”
- Reorder the “region” variable by the median of “dollars_per_day” so the plot is ordered by that median
- Plot “region”, “dollars_per_day” with fill on “continent”
- Box plot
- Rotate the x-axis labels 90 degrees
- xlab
gapminder %>%
  filter(year == past_year, !is.na(gdp)) %>%