W3: Data Visualization Flashcards

1
Q

What are the 2 methods of scoring scales?

A
  1. Sum the items to get a total (sum) score
  2. Average (mean) the items
2
Q

What are the two ways of averaging item scales using rowMeans()?

A
  1. Plain average: the mean of the items
  2. Average, then multiply by the number of items (puts the score back on the sum-score metric)
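A minimal base-R sketch of the scoring options in cards 1 and 2. The data frame and the item1 to item4 column names are hypothetical, not the course's PSS data.

```r
# Hypothetical item responses: 3 people x 4 items, no missing data
items <- data.frame(
  item1 = c(1, 3, 4),
  item2 = c(2, 3, 5),
  item3 = c(1, 4, 4),
  item4 = c(2, 2, 3)
)

k <- ncol(items)                    # number of items in the scale

sum_score  <- rowSums(items)        # method 1: total sum score
mean_score <- rowMeans(items)       # method 2: plain average of the items
mean_x_k   <- rowMeans(items) * k   # average, then multiply by the number of items

# With no missing data, mean * k reproduces the sum score exactly
print(data.frame(sum_score, mean_score, mean_x_k))
```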
3
Q

What is .SD in data.table?

A
  • Refers to the Subset (S) of the Data (D)
  • On its own, returns all the data you’re working on
    E.g., unicorn[, .SD]
  • .SDcols tells data.table which columns to include in .SD
    db[, StressAVG := rowMeans(.SD, na.rm = TRUE), .SDcols = c("PSS1", "PSS2", "PSS3", "PSS4")]
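A small, self-contained sketch of .SD and .SDcols, assuming the data.table package is installed. The db table and its values are made up; only the PSS column names mirror the card.

```r
library(data.table)

# Hypothetical PSS-style data: 3 people x 4 items, one missing response
db <- data.table(
  ID   = 1:3,
  PSS1 = c(1, 3, 4),
  PSS2 = c(2, NA, 5),
  PSS3 = c(1, 4, 4),
  PSS4 = c(2, 2, 3)
)

# .SD on its own: the whole subset of data (here, every column)
print(db[, .SD])

# .SDcols restricts .SD to the listed columns, so rowMeans() only sees the items
db[, StressAVG := rowMeans(.SD, na.rm = TRUE),
   .SDcols = c("PSS1", "PSS2", "PSS3", "PSS4")]
print(db)
```

Person 2 gets the mean of the three items they answered, because na.rm = TRUE drops the missing PSS2 value.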
4
Q

What function is used to calculate reliability of a scale?

A

psych::alpha()
* computes Cronbach’s alpha
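To see what psych::alpha() summarises, the sketch below computes Cronbach's alpha by hand with the standard formula alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score). The items and the cronbach_alpha() helper are hypothetical; psych::alpha() reports this value as raw_alpha (assumed here to agree for complete data) alongside many other statistics.

```r
# Hypothetical item responses (rows = people, columns = items)
items <- data.frame(
  item1 = c(1, 2, 3, 4, 5),
  item2 = c(2, 2, 3, 5, 5),
  item3 = c(1, 3, 3, 4, 4)
)

# Hypothetical helper: Cronbach's alpha from item and total-score variances
cronbach_alpha <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)                      # number of items
  item_vars <- apply(x, 2, var)     # variance of each item
  total_var <- var(rowSums(x))      # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

alpha_hat <- cronbach_alpha(items)
print(round(alpha_hat, 3))
```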

5
Q

What should you add to psych::alpha() when a scale has reverse-scored items?

A
  • check.keys = TRUE (lets psych::alpha() reverse-key items that are negatively related to the rest of the scale)
    E.g., psych::alpha(as.data.frame(db[, .(PSS1, PSS2r, PSS3r, PSS4)]),
      check.keys = TRUE)
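Reverse-scoring an item by hand, rather than relying on check.keys = TRUE, can be sketched in base R as follows, assuming a 1 to 5 response scale. The item names and values are hypothetical.

```r
# Hypothetical items scored 1 to 5; item2 is worded in the opposite direction
items <- data.frame(
  item1 = c(1, 2, 3, 4, 5),
  item2 = c(5, 4, 4, 2, 1)
)

# Reverse-score: (scale min + scale max) - response, so 1 <-> 5, 2 <-> 4, 3 stays 3
scale_min <- 1
scale_max <- 5
items$item2r <- (scale_min + scale_max) - items$item2

print(items)
# The correlation with item1 flips from negative to positive after reversing
print(cor(items$item1, items$item2))
print(cor(items$item1, items$item2r))
```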
6
Q

What do aesthetics do and what are 4 examples of them in ggplot2?

A
  • Control how geoms are displayed (map variables in the data to visual properties)
  • Examples: size, shape, colour, transparency (alpha)
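A sketch of mapping variables to aesthetics in ggplot2, assuming ggplot2 is installed; the data are made up. Mapped aesthetics go inside aes(), while a fixed setting such as alpha = 0.5 goes outside it.

```r
library(ggplot2)

# Hypothetical data: two groups with x/y scores and a count
d <- data.frame(
  x     = c(1, 2, 3, 4),
  y     = c(2, 4, 3, 5),
  group = factor(c("a", "a", "b", "b")),
  n     = c(10, 20, 30, 40)
)

# colour, shape and size are mapped from variables; alpha is set to a constant
p <- ggplot(d, aes(x, y, colour = group, shape = group, size = n)) +
  geom_point(alpha = 0.5)

print(p)
```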
7
Q

What are 4 common geom_* functions used for univariate graphs?

A

geom_histogram(), geom_density(), geom_dotplot(), geom_qq()

8
Q

What argument does geom_qq() need?

A

The sample aesthetic, z-scored with scale(): aes(sample = scale(variable))
* scale() computes z = (x - mean) / SD
Add geom_abline(intercept = 0, slope = 1): the line on which all points would fall if the data were perfectly normally distributed
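A sketch of a QQ plot on z-scored data, assuming ggplot2 is installed; the simulated stress scores stand in for real data. scale() returns a one-column matrix, so it is converted to a plain vector first.

```r
library(ggplot2)

set.seed(42)
stress <- rnorm(200, mean = 20, sd = 5)   # simulated scores, not course data

# z = (x - mean) / SD; as.numeric() drops the matrix structure scale() returns
z <- as.numeric(scale(stress))

p <- ggplot(data.frame(z = z), aes(sample = z)) +
  geom_qq() +                             # sample vs theoretical normal quantiles
  geom_abline(intercept = 0, slope = 1)   # where points fall if data are normal

print(p)
```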

9
Q

What function is used to check a variable’s distribution?

A

plot(testDistribution()) (from the JWileymisc package)
E.g., plot(testDistribution(db$Stress,
  extremevalues = "theoretical", ev.perc = .005))

10
Q

What 3 plots are shown by plot(testDistribution())?

A

Density plot, rug plot, deviates plot

11
Q

What do you need to do when mapping additional categorical variables onto graphs?

A

Convert the variable into a factor
* db[, sex := factor(sex, levels = c(1, 2),
    labels = c("male", "female"))]
* ggplot(db[!is.na(sex)], aes(Stress, colour = sex)) + geom_density()
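The factor conversion and grouped density plot can be sketched end to end as below, assuming data.table and ggplot2 are installed; the db values are hypothetical.

```r
library(data.table)
library(ggplot2)

# Hypothetical data: sex coded 1/2 with one missing value
db <- data.table(
  Stress = c(10, 12, 15, 20, 22, 25, 18),
  sex    = c(1, 2, 1, 2, 2, 1, NA)
)

# Turn the numeric codes into a labelled factor so ggplot treats sex as categorical
db[, sex := factor(sex, levels = c(1, 2), labels = c("male", "female"))]

# Drop missing sex, then map sex to line colour (one density curve per group)
p <- ggplot(db[!is.na(sex)], aes(Stress, colour = sex)) +
  geom_density()

print(p)
```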

12
Q

When using geom_histogram() to compare groups, what is it more helpful to control?

A

Fill colour (rather than line colour)
E.g., ggplot(db[!is.na(sex)], aes(Stress, fill = sex)) +
  geom_histogram()

13
Q

What argument places the bars side by side when using geom_histogram()?

A

geom_histogram(position = "dodge")
* bars are stacked by default
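A sketch combining the last two cards (fill mapped to a group, bars dodged side by side), using simulated data and assuming ggplot2 is installed. bins = 15 is an arbitrary choice that also avoids ggplot2's default-bins message.

```r
library(ggplot2)

set.seed(1)
# Simulated stress scores for two groups (not course data)
d <- data.frame(
  Stress = c(rnorm(50, 15, 4), rnorm(50, 20, 4)),
  sex    = factor(rep(c("male", "female"), each = 50))
)

# fill colours the bars by group; position = "dodge" puts them side by side
p <- ggplot(d, aes(Stress, fill = sex)) +
  geom_histogram(bins = 15, position = "dodge")

print(p)
```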

14
Q

What are 3 common geoms used for bivariate graphs?

A

geom_point() (scatter plot), geom_line(), geom_bar(stat = "identity") so the values are used as the actual bar heights
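A sketch of geom_bar(stat = "identity") with values that are already summarised, assuming ggplot2 is installed; the group means are hypothetical.

```r
library(ggplot2)

# Hypothetical pre-summarised data: one mean per group
means <- data.frame(
  group  = c("Low SE", "High SE"),
  Stress = c(22.5, 14.0)
)

# stat = "identity" uses the Stress values as bar heights instead of counting rows
p <- ggplot(means, aes(group, Stress)) +
  geom_bar(stat = "identity")

print(p)
```

geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").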

15
Q

What is best practice for data visualization?

A

More data, less ink

16
Q

What are 4 ways to reduce ink and provide more data in graphs?

A
  1. Remove background and borders - theme_pubr()
  2. Remove axis lines - theme(axis.line = element_blank())
  3. Replace geom_bar() with geom_point()
  4. Use shapes for values - scale_shape_manual(
    name = "Sex",
    values = c("male" = 1, "female" = 3))
17
Q

How do you make the axes span only the range of the observed data?

A

Using geom_rangeframe() (from the ggthemes package)

18
Q

What are 2 ways to add [interquartile] break points to axis labels?

A
  1. Using quantile()
    * scale_x_continuous(breaks = as.numeric(quantile(db$Stress)))
    * scale_y_continuous(breaks = as.numeric(quantile(db$SE)))
  2. Using scale_x_discrete() / scale_y_discrete()
    scale_y_continuous(labels = scales::percent) +
    scale_x_discrete(
      breaks = c("High SE", "Low SE"),
      labels = c("High SE (median)", "Low SE (median)"))
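What quantile() supplies as axis breaks, sketched with a toy vector; the scores are hypothetical and ggplot2 is assumed to be installed.

```r
library(ggplot2)

stress <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)   # hypothetical scores

# quantile() returns the min, 25th, 50th, 75th percentiles and max (named "0%", "25%", ...);
# as.numeric() strips the names so the values can be used as axis breaks
brks <- as.numeric(quantile(stress))
print(brks)

p <- ggplot(data.frame(Stress = stress), aes(Stress)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous(breaks = brks)

print(p)
```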
19
Q

What functions are used for a boxplot with the raw data shown?

A

geom_boxplot() + geom_jitter()
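A sketch of a boxplot with the raw points overlaid, using simulated data and assuming ggplot2 is installed. Hiding the boxplot's own outlier points (outlier.shape = NA) is a design choice so that no point is drawn twice.

```r
library(ggplot2)

set.seed(7)
# Simulated scores for two hypothetical groups
d <- data.frame(
  group  = factor(rep(c("Low SE", "High SE"), each = 20)),
  Stress = c(rnorm(20, 22, 4), rnorm(20, 15, 4))
)

p <- ggplot(d, aes(group, Stress)) +
  geom_boxplot(outlier.shape = NA) +      # hide outlier dots; the jitter layer shows every point
  geom_jitter(width = 0.15, height = 0)   # spread points horizontally only, keep true y values

print(p)
```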

20
Q

What are 2 ways to show a mean (or proportion) with its 95% CI on a graph?

A
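  1. stat_summary(fun.data = mean_cl_normal) for a mean and its 95% CI (mean_cl_normal needs the Hmisc package installed)
  2. prop.test() for a proportion and its 95% CI, e.g. inside a data.table call:
    db[, .(
      LL = prop.test(
        x = sum(sex == "female", na.rm = TRUE),
        n = sum(!is.na(sex)), correct = FALSE)$conf.int[1],
      UL = prop.test(
        x = sum(sex == "female", na.rm = TRUE),
        n = sum(!is.na(sex)), correct = FALSE)$conf.int[2])]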
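A sketch of both interval calculations in base R with hypothetical data. prop.test(correct = FALSE) gives the Wilson score interval for a proportion; for a mean, t.test() gives a t-based interval, which is assumed here to match what mean_cl_normal draws.

```r
# Hypothetical sex variable: 30 "female" and 20 "male" plus 2 missing values
sex <- c(rep("female", 30), rep("male", 20), NA, NA)

# 95% CI for the proportion female (no continuity correction)
ci_prop <- prop.test(
  x = sum(sex == "female", na.rm = TRUE),
  n = sum(!is.na(sex)),
  correct = FALSE)$conf.int
print(c(prop = 30 / 50, LL = ci_prop[1], UL = ci_prop[2]))

# For a continuous variable: a t-based 95% CI around the mean
stress <- c(12, 15, 18, 20, 22, 25, 17, 19)
ci_mean <- t.test(stress)$conf.int
print(c(mean = mean(stress), LL = ci_mean[1], UL = ci_mean[2]))
```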
21
Q

What should you do before graphing when all variables are categorical?

A

Make one variable “continuous” by converting it to percentages, e.g., with egltable()
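egltable() comes from the JWileymisc package; as a base-R analogue (not egltable itself), prop.table() can turn counts into percentages. The sex and group variables below are hypothetical.

```r
# Hypothetical categorical data (not course data)
d <- data.frame(
  sex   = c("male", "female", "female", "male", "male", "female"),
  group = c("A", "A", "B", "B", "B", "A")
)

# Percentage of each sex within each group: the categorical sex variable
# becomes a "continuous" percentage that can be plotted as bar heights
tab <- table(d$group, d$sex)
pct <- 100 * prop.table(tab, margin = 1)   # margin = 1: each row sums to 100
print(round(pct, 1))
```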

22
Q

What function produces a multi-panel plot, which is useful when all variables are categorical?

A

facet_grid() for the multi-panel layout, optionally with coord_flip() to swap the axes

23
Q

What is the common graph/geom when all variables are continuous?

A

geom_point(), i.e., a scatter plot

24
Q

What are 4 things you can add to a graph when all variables are continuous?

A
  1. Correlation coefficient and p-value, using
    cor.test(~ SE + Stress, data = db)
  2. Regression line, using
    stat_smooth(method = "lm")
  3. Text annotation, using
    annotate("text", x = max(db$Stress), y = max(db$SE),
    label = "r = -0.65, p < .001",
    size = 6, hjust = 1, vjust = 1)
  4. Marginal histograms, using ggMarginal() from ggExtra (x is the ggplot object)
    ggMarginal(x, type = "histogram")
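A sketch putting items 1 to 3 together, assuming ggplot2 is installed; the data are constructed to have a strong negative correlation. Building the label from cor.test() output avoids typing r and p by hand; ggMarginal() is left out to keep dependencies small.

```r
library(ggplot2)

# Hypothetical scores with a strong negative relationship
d <- data.frame(Stress = 1:10)
d$SE <- 11 - d$Stress + (-1)^d$Stress

# 1. Correlation coefficient and p-value (formula interface, as in the card)
ct <- cor.test(~ SE + Stress, data = d)
p_txt <- ifelse(ct$p.value < .001, "< .001", sprintf("= %.3f", ct$p.value))
r_lab <- sprintf("r = %.2f, p %s", unname(ct$estimate), p_txt)
print(r_lab)

# 2. and 3. Scatter plot with a regression line and the correlation as a text label
p <- ggplot(d, aes(Stress, SE)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  annotate("text", x = max(d$Stress), y = max(d$SE),
           label = r_lab, size = 6, hjust = 1, vjust = 1)

print(p)
```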
25
Q

How do you make more space for long axis labels?

A

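Rotate the axis text, e.g. theme(axis.text.x = element_text(angle = 45, hjust = 1)), or rotate the graph with coord_flip(); ggarrange() can show both versions side by side, titled with ggtitle("rotate text") and ggtitle("rotate graph")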
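The two options sketched in ggplot2, with hypothetical long labels and assuming ggplot2 is installed. To place them side by side, ggpubr::ggarrange(p_text, p_graph) could be used, assuming ggpubr is also installed.

```r
library(ggplot2)

# Hypothetical data with long category labels
d <- data.frame(
  label = c("Very long category label one", "Very long category label two",
            "Very long category label three"),
  value = c(3, 5, 4)
)

base <- ggplot(d, aes(label, value)) + geom_col()

# Option 1: rotate the axis text so long labels do not overlap
p_text <- base +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("rotate text")

# Option 2: rotate (flip) the whole graph so labels run along the vertical axis
p_graph <- base +
  coord_flip() +
  ggtitle("rotate graph")

print(p_text)
print(p_graph)
```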

26
Q

What are 3 ways to improve geom_dotplot visualizations?

A
  1. binwidth = .1 to shrink the dots
  2. alpha = .2 to make the dots transparent
  3. y = jitter to add random noise to the scores
27
Q

What are 2 scenarios in which you would use geom_violin()?

A
  1. For large datasets
  2. To compare distributions across variables
28
Q

What is a benefit of using rowMeans instead of simply adding all variable scores together?

A

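With na.rm = TRUE, it averages each person’s answered items, which effectively imputes that person’s own mean for any missing items (a sum would instead treat missing items as 0 or return NA)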
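A base-R sketch of why this matters, using a hypothetical 4-item scale where one person skipped an item:

```r
# Hypothetical 4-item scale; person 2 skipped item3
items <- data.frame(
  item1 = c(2, 3),
  item2 = c(2, 3),
  item3 = c(2, NA),
  item4 = c(2, 3)
)
k <- ncol(items)

# Summing with na.rm = TRUE treats the missing item as 0, so person 2 looks lower
# (without na.rm = TRUE, rowSums() would return NA for person 2)
sum_na <- rowSums(items, na.rm = TRUE)

# rowMeans() averages only the answered items; multiplying by k is the same as
# filling the missing item with that person's own mean before summing
mean_k <- rowMeans(items, na.rm = TRUE) * k

print(data.frame(sum_na, mean_k))
```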