Week 4: Data Visualisation and Transformation Flashcards

1
Q

REVERSED

  • Maximise data-to-ink ratio
  • Present more data without losing interpretability
  • Use levels of detail
A

What are 3 distilled principles from Tufte?

2
Q

REVERSED

geom_point() #scatterplot
geom_density_2d() #contour lines from x and y positions
geom_histogram() #histogram, use binwidth to change size of bins
geom_density() #density plot
geom_rug() #adds rug marks of raw data on the axes
geom_boxplot() #boxplot
geom_bar() #barplot, automatically transforms variables to counts using stat_count
geom_col() #column graph, need x and y variables
geom_line() #line chart
geom_label(aes(x=, y=, label=)) #gives a label at the specified x and y position
geom_freqpoly() #same as histogram but uses lines instead of bars, good for overlapping data
geom_area() #area chart
geom_smooth(method = "lm") #draws line of best fit. se=TRUE is the default and also displays the confidence band around the line
geom_abline() #adds a reference line from a slope and intercept, defaults to slope 1 and intercept 0
geom_ribbon(aes(x, ymin, ymax)) #draws a band between lower and upper bounds e.g. confidence interval

A

What are some useful geoms? (15)

3
Q

REVERSED

filter(data, !is.na(column))

A

How do you remove the missing values from a dataframe using filter?

4
Q

REVERSED

Fit a model, get the residuals, and plot the residuals against the x variable

A

How do you “flatten” a graph?

5
Q

REVERSED

more data with less ink on the page

A

What is a high data-ink ratio?

6
Q

REVERSED

ungroup()

A

How do you remove a grouping to return to operations on ungrouped variables?

7
Q

REVERSED

+ xlim(0,5) #removes data outside the limits before plotting

+ coord_cartesian(xlim = c( , ))
this way doesn’t throw away the data outside the limits, used to zoom

A

How do you set axis limits in 2 ways?

8
Q

REVERSED

%in% #in

between(x, left, right) #finds rows where x is between left and right

A

How do you use in and between in R?
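
A small sketch of both operators on a made-up vector (x and the bounds are illustrative assumptions). It uses dplyr's between(), which includes both endpoints.

```r
library(dplyr)

# Hypothetical example vector
x <- c(1, 5, 10)

in_set   <- x %in% c(1, 10)   # is each element one of the listed values?
in_range <- between(x, 1, 5)  # is each element in [1, 5], endpoints included?

print(in_set)
print(in_range)
```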

9
Q

REVERSED

aes(group = ) splits the data into groups by a discrete variable, just as mapping that variable to an aesthetic such as colour would, but it adds no legend or other distinguishing feature to the graph

A

What does the “group” mapping do in a geom?

10
Q

REVERSED

anything from the input can be called in server using input$…

A

How do you call something from the input in the server in shiny?

11
Q

REVERSED

is.na()

A

How do you determine if a value is a missing value in R?

12
Q
Define a plot called distplot in the server by assigning a render function to output$distplot, e.g. output$distplot <- renderPlot(...).
In ui.R, refer to it by its ID as a string, e.g. plotOutput("distplot")
A

How do you define something e.g. a plot in the server and how do you refer to it in the ui in shiny?

13
Q

REVERSED

mutate(data, new_column = …)

A

How do you add a new column to a dataframe?

14
Q

REVERSED

nrow =
ncol =
specifies number of rows or columns

A

What are additional arguments to facet_wrap? (2)

15
Q

REVERSED

The output depends on the user's input. All output must be wrapped in a render function (e.g. renderPlot()), which watches the inputs it uses and updates the output when they change

A

What is reactivity in Shiny?
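
A minimal sketch of a reactive app, assuming the shiny package is installed. The input ID "bins", the output ID "distplot" and the use of the built-in faithful dataset are illustrative choices, not part of the cards.

```r
library(shiny)

# UI: lays out one input widget and one output placeholder
ui <- fluidPage(
  titlePanel("Reactivity sketch"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20)
    ),
    mainPanel(plotOutput("distplot"))  # refers to output$distplot by its ID
  )
)

# Server: the render function re-runs whenever input$bins changes
server <- function(input, output) {
  output$distplot <- renderPlot({
    hist(faithful$waiting, breaks = input$bins)
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive session, runApp(app) would launch the app
```

Because the plot code reads input$bins inside renderPlot(), moving the slider is enough to trigger a redraw.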

16
Q

REVERSED

min_rank(x) #gives 1 to the lowest value and so on
min_rank(desc(x)) #gives 1 to the highest value

A

How do you give a ranking to values in a vector x?
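
A short sketch on a made-up vector with a tie, to show how min_rank() handles equal values (the vector is an illustrative assumption).

```r
library(dplyr)

# Hypothetical example vector containing a tie
x <- c(10, 30, 20, 20)

rank_low_first  <- min_rank(x)        # smallest value gets rank 1
rank_high_first <- min_rank(desc(x))  # largest value gets rank 1

print(rank_low_first)
print(rank_high_first)
```

Tied values share the lowest rank they would occupy, and the next rank is skipped.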

17
Q

REVERSED

filter(dataframe, value1==1, value2==4)

A

How do you select only certain rows of a dataframe based on their values?

18
Q

REVERSED

geom_count, creates a graph with different size dots at each intersection
geom_tile with fill aesthetic darker depending on combinations of each value:
data %>% count(variable1, variable2) %>% ggplot(aes(x = variable1, y = variable2)) + geom_tile(aes(fill = n))

A

What is a good way to display two categorical variables?

19
Q

REVERSED

The mapping in the geom overrides the global mapping in ggplot(), for that layer only

A

If you have different mappings in ggplot() and geom(aes()), which one will override?

20
Q

REVERSED

read_csv("filepath")

A

How do you load a csv into a dataframe in R?

21
Q

REVERSED

As the number of points and number of categories increases, facet grid becomes better

A

When should you use facet grid over mapping by colour?

22
Q

REVERSED

facet_wrap(~variable)
create formula with ~, variable should be discrete

A

How do you use facet wrap?

23
Q

REVERSED

Facet of 4 graphs with very different data but the same line of best fit (y ≈ 3 + 0.5x) and nearly identical summary statistics (correlation ≈ 0.82, R² ≈ 0.67)

A

What is Anscombe’s quartet?

24
Q

REVERSED

mutate(column = ifelse(test, value1, value2))
test is a logical vector, e.g. column < 5
value1 is used where test is TRUE (use NA here to replace the unusual values)
value2 is used where test is FALSE (use the column itself to keep the original values)

A

How do you replace unusual values with missing values in a dataframe?
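
A minimal sketch, assuming a made-up column y where anything above 100 counts as unusual (both the data and the threshold are hypothetical).

```r
library(dplyr)

# Hypothetical data: 120 is treated as an implausible measurement
df <- tibble(y = c(5, 7, 120, 6))

# Where the test is TRUE write NA, otherwise keep the original value
cleaned <- df %>%
  mutate(y = ifelse(y > 100, NA, y))

print(cleaned)
```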

25
Q

REVERSED

Yes to colour, gives continuous scale. No to shape

A

Can you map continuous variables to colour and shape?

26
Q

REVERSED

cut(variable, breaks = c(0, 10, 20), labels = c("", ""))

A

How do you split a variable into different sections and label the sections?
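
A short sketch with made-up values and labels (ages, "low" and "high" are illustrative assumptions). With the default right = TRUE, each interval includes its upper break, so 10 falls in the first section.

```r
# Hypothetical values to split into two labelled sections
ages <- c(3, 10, 12, 20)

# Three breaks define two intervals: (0, 10] and (10, 20]
groups <- cut(ages, breaks = c(0, 10, 20), labels = c("low", "high"))

print(groups)
```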

27
Q

REVERSED

+ scale_y_log10() + scale_x_log10()

A

How do you “straighten” a graph?

28
Q

REVERSED

geom_freqpoly(aes(x=price, y=..density..)) #newer ggplot2 versions write this as y = after_stat(density)

A

How do you get a freqpoly to display density instead of count?

29
Q

REVERSED

cumsum()

A

How do you do a cumulative sum?

30
Q

REVERSED

E.g. colour = variable < 5. Colours each point by whether the condition is TRUE or FALSE

A

How do you map an aesthetic to a function or expression?

31
Q

REVERSED

  1. position on a common scale
  2. position on an unaligned scale
  3. length
  4. tilt/angle
  5. area
  6. depth
  7. colour luminance/colour saturation
  8. curvature/volume
A

What is the ranking of effectiveness for ordered attributes? (8)

32
Q

REVERSED

data %>% group_by(variable) %>% summarise(newname = mean(distance, na.rm=TRUE))

A

How do you get a grouped summary (mean for this question) of a variable grouped by another variable?
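
A minimal sketch on a made-up table (the trips data and its column names are hypothetical), showing the grouped mean with a missing value ignored.

```r
library(dplyr)

# Hypothetical data with one missing distance
trips <- tibble(
  mode     = c("bus", "bus", "train", "train"),
  distance = c(2, NA, 10, 20)
)

# One row per group, with the mean computed after dropping NAs
by_mode <- trips %>%
  group_by(mode) %>%
  summarise(mean_distance = mean(distance, na.rm = TRUE))

print(by_mode)
```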

33
Q

REVERSED

  • Unhelpful errors
  • Dependency hell
  • Unreliable back end
  • Need to run R on a server
  • Reactivity can be slow
A

What are the disadvantages of r shiny? (5)

34
Q

REVERSED

sum(x>10) gives the number of TRUEs in x>10
mean(x>10) gives the proportion of TRUEs in x>10

A

How do you get a count of how many x>10 in a variable? How do you get a proportion?
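
A quick sketch on a made-up vector. It works because TRUE is treated as 1 and FALSE as 0 in arithmetic.

```r
# Hypothetical example vector
x <- c(4, 12, 15, 9, 30)

count_over_10 <- sum(x > 10)   # number of elements greater than 10
prop_over_10  <- mean(x > 10)  # share of elements greater than 10

print(count_over_10)
print(prop_over_10)
```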

35
Q

REVERSED

arrange(data, desc(column1))

A

How do you arrange rows in descending order?

36
Q

REVERSED

tidyverse: library(tidyverse) loads ggplot2 (it can also be loaded on its own with library(ggplot2))

A

What package is ggplot2 in?

37
Q

REVERSED

mean()
sd()

A

How do you get mean and standard deviation?

38
Q

REVERSED

colour = NA

A

How do you change the colour to no colour?

39
Q

REVERSED

Contains the server logic: everything that runs in the background to turn inputs into outputs

A

What is contained in the server.R file in shiny?

40
Q

REVERSED

Put the variable with more levels in the columns

A

Using facet grid, which variable should you put in the rows and which in the columns?

41
Q

REVERSED

arrange(data, column1, column2) 
#orders by column 1 then column 2 in ascending order
A

How do you change the order of rows?

42
Q

REVERSED

Excludes them automatically: filter() keeps only rows where the condition is TRUE, so rows where it is FALSE or NA are dropped

A

What does filter do with NA values?
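
A small sketch on made-up data showing the default behaviour, plus one way to keep the NA rows by asking for them explicitly.

```r
library(dplyr)

# Hypothetical data with one missing value
df <- tibble(x = c(1, NA, 3))

kept         <- filter(df, x > 1)             # NA > 1 is NA, so that row is dropped
kept_with_na <- filter(df, is.na(x) | x > 1)  # explicitly keep the missing row too

print(kept)
print(kept_with_na)
```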

43
Q

REVERSED

  1. electric shock
  2. saturation
  3. length
  4. area
  5. depth
  6. brightness
A

What is the order of how humans perceive sensations? (6)

44
Q

REVERSED

Gives NA when there are missing values
Use na.rm=TRUE to remove missing values prior to calculation

A

What do aggregate functions do when there are missing values? How do you change the default?

45
Q

REVERSED

transmute(data, newcolumn = …)

A

How do you add a new column to a dataframe and keep only the new columns?

46
Q

REVERSED

geom_point (scatterplot)

geom_bin2d() and geom_hex() divide the plane into 2d bins and use fill colour to display how many points are in each bin

use boxplot and divide one continuous variable into a categorical: geom_boxplot(aes(group = cut_width(x, width)))

A

What is a good way to display two continuous variables? (3)

47
Q

REVERSED

library(GGally)
ggpairs(data, columns = c("column1", "column2", "column3"))

A

What is a different way of creating a matrix of plots?

48
Q

REVERSED

  1. spatial region
  2. colour hue
  3. motion
  4. shape
A

What is the ranking of effectiveness for categorical attributes? (4)

49
Q

REVERSED

Provides the schematic for how the application will be presented to the user, e.g. titlePanel(), sidebarLayout(), sidebarPanel(), mainPanel()

A

What is contained in the ui.R file in shiny?

50
Q

REVERSED

<, >, <=, >=, !=, ==

A

What are the comparison operators in R?

51
Q

REVERSED

If it gives information about a variable it goes inside the aes(). If not it must go outside the aes()

A

Does the aesthetic mapping go inside or outside the aes()?

52
Q

REVERSED

tibble(column1= , column2= )
data.frame(column1=, column2= )

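can wrap a column in a conversion function, e.g. column1 = as.character( )

Before R 4.0.0, data.frame automatically set characters as factors unless you set stringsAsFactors=FALSE (now the default). tibble never converts characters to factors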

A

What are 2 codes to make vectors into a dataframe?
How do you specify the type?
What is a difference between the 2 codes?

53
Q

REVERSED

Facets are subplots that each display a subset of the data

A

What is a facet?

54
Q

REVERSED

  • Aesthetics (position, shape, colour, …)
  • Geometric objects (points, lines, bars, …)
  • Scales (continuous, discrete, Cartesian coordinates, …)
  • Facets (small multiples)
  • Statistical transformation (identity, binning, median, …)
  • Coordinate system (Cartesian, polar, parallel, …)
A

What are the layers of Wickham’s grammar of graphics? (6)

55
Q

REVERSED

Like a boxplot but shows the full density plots

A

What is a violin plot?

56
Q

REVERSED

!!vector #bang-bang operator, matches to names of variables in vector

all_of(vector)
any_of(vector)
#all_of() throws an error if any name is missing from the data. any_of() ignores names that are missing

A

How do you select columns by matching them to names contained in a string vector? (3 ways)
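
A minimal sketch of the difference between all_of() and any_of(), assuming a made-up tibble with columns a, b and c and a name "z" that does not exist.

```r
library(dplyr)

# Hypothetical data and name vectors
df     <- tibble(a = 1, b = 2, c = 3)
wanted <- c("a", "c")  # both names exist in df
maybe  <- c("a", "z")  # "z" does not exist in df

strict  <- select(df, all_of(wanted))  # every name must be present
lenient <- select(df, any_of(maybe))   # missing names are silently skipped

# all_of() with a missing name raises an error, captured here for inspection
strict_missing <- tryCatch(select(df, all_of(maybe)), error = function(e) e)

print(names(strict))
print(names(lenient))
```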

57
Q

REVERSED

6 shapes. Additional groups will go unplotted
A

How many shapes can you plot as a mapping? What happens if there are more than that many factors?

58
Q

REVERSED

+ labs(y = "", x = "")

A

How do you label the axes?

59
Q

REVERSED

geom_point(position = "jitter") OR geom_jitter()

A

What are two ways to add jitter to a plot?

60
Q

REVERSED

  • Proximity: things that are spatially near to one another seem to be related
  • Similarity: things that look alike seem to be related
  • Connection: things that are visually tied to one another seem to be related
  • Continuity: partially hidden objects are completed into familiar shapes
  • Closure: incomplete shapes are perceived as complete
  • Figure and ground: visual elements are taken to be either in the foreground or in the background
  • Common fate: elements sharing a direction of movement are perceived as a unit
A

What are the Gestalt principles of relatedness? (7)

61
Q

REVERSED

& #and
| #or
! #not

A

What are the logical operators in R?

62
Q

REVERSED

coord_cartesian() #default
coord_flip() #switches the x and y axis
coord_map() #sets the aspect ratio correctly for maps
coord_quickmap() #sets the aspect ratio correctly for maps, quicker
coord_polar() #uses polar coordinates. use argument (theta = "y") to create a pie chart. otherwise you get a bulls-eye chart
coord_fixed() #forces a specified ratio between the axes, default is 1

A

What are 6 coordinate systems in ggplot?

63
Q

REVERSED

boxplot
freqpoly, map by colour

A

What is a good way to display a categorical variable vs a continuous variable?

64
Q

REVERSED

count(variable1, variable2)
gives number of combinations in a table

A

How do you get the counts of each combination when comparing 2 categorical variables?

65
Q

REVERSED

ui and server (ui.R and server.R)

A

What is the structure of a shiny app?

66
Q

REVERSED

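Describes how humans perceive changes in a sensation compared to how the stimulus actually changes: perceived magnitude follows a power of the physical intensity, with a different exponent for each kind of sensation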

A

What is Stevens’ Psychophysical Power Law?

67
Q

REVERSED

Can’t be at the beginning of a line. Must be at end of previous line

A

When using the + in ggplot, where in the line must it go?

68
Q

REVERSED

data[1:200,]

A

What is the code to keep the first 200 rows and all columns of a dataframe?

69
Q

REVERSED

ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>), stat = <STAT>, position = <POSITION>) +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION>

A

What is the general format of a ggplot code?
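
One possible filled-in instance of the general format, assuming the mpg dataset that ships with ggplot2. The particular geom, stat, position, coordinate system and facet are illustrative choices only.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  # geom with its mapping, stat and position spelled out
  geom_bar(mapping = aes(x = class, fill = drv),
           stat = "count", position = "dodge") +
  # coordinate function
  coord_flip() +
  # facet function
  facet_wrap(~year)

# In an interactive session, print(p) draws the plot
```

Each + sits at the end of a line, so R knows the expression continues.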

70
Q

REVERSED

rename(data, newname = variable1)

A

How do you rename a variable?

71
Q

REVERSED

show.legend = FALSE

A

How do you remove the legend?

72
Q

REVERSED

Uses the statistical transformation stat_count() to count the rows for each value of x. Can override using stat = "identity", for example to use a y value instead of the count

A

How does geom_bar get the count for each variable? How do you override it?

73
Q

REVERSED

facet_grid(variable1~variable2)

A

How do you facet a plot on a combination of 2 variables?

74
Q

REVERSED

ggplot(data) + geom_boxplot(aes(x = reorder(xvariable, yvariable, FUN = median), y = yvariable))

A

How do you reorder boxplots by median?
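
A sketch of what reorder() does to the factor levels before plotting, using made-up groups and values. Only base R is needed for this step.

```r
# Hypothetical groups and measurements
grp <- factor(c("a", "a", "b", "b", "c", "c"))
val <- c(10, 12, 1, 3, 5, 7)

# Levels are sorted by each group's median of val, smallest first
reordered <- reorder(grp, val, FUN = median)

print(levels(grp))        # original alphabetical order
print(levels(reordered))  # order by group median
```

ggplot2 draws boxplots in level order, so the reordered factor gives boxes sorted by median.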

75
Q

REVERSED

geom_bar creates its own bin for NAs
geom_histogram removes missing values (with a warning)

A

What happens to missing values in geom_bar and geom_histogram?

76
Q

REVERSED

max(variable) #gives max value
which.max(variable) #gives location of max value

A

How do you get the max value of a variable and the location of the max value?

77
Q

REVERSED

bar chart

A

What is a good way to visualise a single categorical variable?

78
Q

REVERSED

select(data, variable1, variable2)

A

How do you select only certain columns of a dataframe?

79
Q

REVERSED

The bars are automatically stacked with colours for each object in the variable (position = "stack")

position = "identity" #places each object exactly where it falls in context of graph. Will include overlapping so should use alpha
position = "fill" #works like stacking but makes each set of stacked bars the same height. Makes it easier to compare proportions across groups
position = "dodge" #places overlapping objects beside each other. Makes it easier to compare individual values
A

What happens when you map an aesthetic (colour) to a bar chart? What 3 ways are there to change how this looks?

80
Q

REVERSED

lead(x) removes first number and adds NA to end
lag(x) adds NA to beginning and removes last number

A

What do lead(x) and lag(x) do to a vector x?
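
A short sketch on a made-up vector. The dplyr:: prefix is used because base R's stats package also has a lag() function with different behaviour.

```r
library(dplyr)

# Hypothetical example vector
x <- 1:5

shifted_fwd  <- dplyr::lead(x)  # each position takes the next value
shifted_back <- dplyr::lag(x)   # each position takes the previous value

print(shifted_fwd)
print(shifted_back)

# A common use: the change from the previous value
change <- x - dplyr::lag(x)
print(change)
```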

81
Q

REVERSED

select(data, variable5, variable7, everything()) 
#moves variables 5 and 7 to the start but still keeps all others
A

How do you rearrange the columns to put certain columns at the start?

82
Q

REVERSED

colour #changes colour
size #changes size
alpha #changes transparency between 0 and 1
shape #changes shape
linetype #from 0-6. 0 is blank, 1 is solid, others are dotted or dashed
fill #fill colour of shape or curve
stroke #modify width of border. E.g. on shapes
fontface #character: "plain", "bold", "italic", "bold.italic"
group #groups by a specified variable

A

What are some possible aesthetic mappings? (9)

83
Q

REVERSED

histogram, puts continuous variables into bins 
density plot (smoothed histogram)
A

What is a good way to visualise a single continuous variable? (2)

84
Q

REVERSED

select(data, -(variable1))

A

How do you exclude a column from a dataframe?

85
Q

REVERSED

%/% (integer division)
%% (remainder)

A

What are the modular arithmetic symbols?
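
A small sketch, assuming a time stored as a single number such as 517 for 5:17 (the value is a made-up example). Integer division and remainder split it into hour and minute.

```r
# Hypothetical time stored as HHMM in one number
time <- 517

hour   <- time %/% 100  # integer division keeps the whole part
minute <- time %% 100   # remainder after dividing by 100

print(hour)
print(minute)
```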

86
Q

REVERSED

table(variable)

A

How do you create a table with a count for each value in a variable?

87
Q

REVERSED

starts_with("abc")
ends_with("abc")
contains("abc")
matches("expression")

A

How do you select columns that start with, end with, contain or match a string?