Summarizing Data Flashcards

1
Q

Describe univariate data

A
  • 1 variable
  • For describing a distribution
2
Q

Describe bivariate and multivariate data

A
  • Bivariate: 2 variables; multivariate: 3 or more variables
  • For exploring relationships between variables
3
Q

Graphs that can be used for any type of categorical or quantitative data:

A
  • Categorical:
    • Bar graph
  • Quantitative:
    • Histogram
    • Dotplot
    • Boxplot
4
Q

Best graph for: 2 quantitative (and 1 categorical sometimes)

A
  • Scatterplot
5
Q

Best graph for: 2 categorical

A

Two-way table
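As a rough illustration of a two-way table, the sketch below counts how often each pair of categories occurs. The data and the helper name `two_way_table` are hypothetical, made up only for this example.

```python
from collections import Counter

# Hypothetical observations: (year of study, preferred study method) per student.
observations = [
    ("first", "flashcards"),
    ("first", "notes"),
    ("second", "flashcards"),
    ("first", "flashcards"),
    ("second", "notes"),
    ("second", "notes"),
]

def two_way_table(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # A Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = two_way_table(observations)
for row, cells in table.items():
    print(row, cells)
```

Each row of the result is one level of the first categorical variable, and each cell is the count for one level of the second.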

6
Q

Best graph for: 1 quantitative, 1 (or 2) categorical

A
  • Stripchart
  • Boxplot
7
Q

How do you determine skew?

A
  • mean > median: suggests right skew
    • Boxplot: the mean lies above the median
  • mean < median: suggests left skew
    • Boxplot: the mean lies below the median
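The mean-versus-median comparison can be sketched in a few lines. This is only a rough rule of thumb, and the function name `skew_hint` and the data are hypothetical.

```python
from statistics import mean, median

def skew_hint(values):
    """Compare mean and median as a rough rule of thumb for skew direction."""
    m, med = mean(values), median(values)
    if m > med:
        return "right skew"
    if m < med:
        return "left skew"
    return "roughly symmetric"

incomes = [20, 22, 25, 27, 30, 120]  # hypothetical data with one large value
print(skew_hint(incomes))
```

Here the single large value pulls the mean above the median, so the hint is "right skew".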
8
Q

Define standard deviation

A
  • Standard deviation: measure of spread about the mean
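To make "spread about the mean" concrete, here is a minimal sketch that computes the standard deviation by hand and checks it against Python's `statistics.stdev`. It assumes the sample form (dividing by n - 1), which the card does not specify. The data and the name `sample_sd` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(values):
    """Sample standard deviation: sqrt of summed squared deviations over n - 1."""
    xbar = mean(values)
    squared_devs = [(x - xbar) ** 2 for x in values]
    return sqrt(sum(squared_devs) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(sample_sd(data), stdev(data))
```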
9
Q

Describe the interquartile ranges of a boxplot

A
  • Each section of the boxplot (each whisker and each half of the box) covers about 25% of the data
    • Other percentiles (e.g., the 85th) cannot be read from a boxplot
  • Q1: median of the values below the median
  • Q2: the median
  • Q3: median of the values above the median
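The median-of-halves description of Q1 and Q3 can be sketched as follows. This assumes the middle value is left out of both halves when the number of values is odd. That is one common convention, and software may use a different one. The name `quartiles` and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values to the left of the median
    upper = ordered[half + n % 2:]  # values to the right (skips the middle value if n is odd)
    return median(lower), median(ordered), median(upper)

scores = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical data
print(quartiles(scores))
```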
10
Q

What is the 5 number summary?

A

Minimum, Q1, Median (M), Q3, Maximum
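A minimal sketch of the five-number summary, assuming Python's `statistics.quantiles` with its default "exclusive" method for the quartiles. Other quartile conventions can give slightly different values. The name `five_number_summary` and the data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quantiles(values, n=4)  # default method is "exclusive"
    return min(values), q1, med, q3, max(values)

scores = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical data
print(five_number_summary(scores))
```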

11
Q

What is IQR and what is it used for?

A
  • IQR = Q3 - Q1
  • Outlier rule: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as potential outliers
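The 1.5 × IQR rule can be sketched as below: compute the IQR, build a lower and an upper fence, and flag anything outside them. The quartiles come from `statistics.quantiles` with its default method, which is an assumption of this example. The name `iqr_outliers` and the data are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

data = [1, 3, 4, 6, 7, 8, 10, 12, 40]  # hypothetical data with one extreme value
print(iqr_outliers(data))
```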
12
Q

Define missing completely at random (MCAR)

A
  • MCAR: the missing values are a truly random subset of the population (e.g., accidentally skipped the question, accidentally dropped the test tube)
13
Q

Define missing not at random (MNAR)

A
  • MNAR: the likelihood of missingness is associated with the values themselves (e.g., a low test score has a higher chance of being missing)
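To see why MNAR matters, the sketch below uses a deliberately extreme, hypothetical mechanism: every score below 50 goes unreported. Real MNAR is usually a higher chance of missingness rather than a hard cutoff. The scores are made up for illustration.

```python
from statistics import mean

true_scores = [35, 42, 48, 55, 63, 70, 78, 85, 91]  # hypothetical full data

# Assumed MNAR mechanism for illustration only: scores below 50 are never reported.
reported = [s if s >= 50 else None for s in true_scores]
observed = [s for s in reported if s is not None]

true_mean = mean(true_scores)
observed_mean = mean(observed)
print(true_mean, observed_mean)
```

Because the low scores are the ones that vanish, the mean of the observed scores overstates the true mean.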
14
Q

Define missing at random (MAR)

A
  • MAR: a second variable influences the likelihood of missingness, but the value of the missing variable itself does not (e.g., females being less likely to report their weight)
15
Q

Describe the 3 ways to handle missing values

A
  1. Deletion
    • Complete case: remove the individual entirely, which decreases the sample size
    • Pairwise: keep the individual and use whatever values are available for each analysis (e.g., 8 individuals, but only 7 data points for income)
  2. Imputation: replace with the mean/most frequent/predicted value (controversial)
  3. Treat as a new category (e.g. a choice of colour is now ‘N/A’)
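The three approaches above can be sketched on a tiny, hypothetical dataset in which `None` marks a missing value. The records, field names, and the choice of mean imputation are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "income": 40, "colour": "red"},
    {"id": 2, "income": None, "colour": "blue"},
    {"id": 3, "income": 60, "colour": None},
    {"id": 4, "income": 50, "colour": "red"},
]

# 1a. Complete-case deletion: drop any individual with a missing value.
complete_cases = [r for r in records if None not in r.values()]

# 1b. Pairwise deletion: for one variable, use every value that is available.
available_incomes = [r["income"] for r in records if r["income"] is not None]

# 2. Imputation: replace a missing income with the mean of the observed incomes.
income_mean = mean(available_incomes)
imputed = [
    {**r, "income": income_mean if r["income"] is None else r["income"]}
    for r in records
]

# 3. New category: recode a missing colour as its own level, "N/A".
recoded = [
    {**r, "colour": "N/A" if r["colour"] is None else r["colour"]}
    for r in records
]

print(len(complete_cases), available_incomes, imputed[1], recoded[2])
```

Complete-case deletion leaves 2 of the 4 individuals, while pairwise deletion still uses 3 income values.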
16
Q

What is metadata?

A
  • Metadata: descriptive paragraph, table of information, or other files that help understand the data
    • Description of data collection
    • Where to access data
    • Known problems
    • Quality check characteristics (number of rows/columns, sum for numerical columns)
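The quality-check characteristics can be computed directly from a data file and compared against what the metadata states. The sketch below uses an inline, hypothetical CSV string in place of a real file. The function name `quality_summary` and the column names are assumptions.

```python
import csv
import io

# Hypothetical CSV content standing in for a real data file.
raw = "id,height,weight\n1,170,65\n2,160,55\n3,180,80\n"

def quality_summary(text, numeric_columns):
    """Return row count, column count, and the sum of each numeric column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {
        "n_rows": len(rows),
        "n_columns": len(rows[0]) if rows else 0,
        "sums": {c: sum(float(r[c]) for r in rows) for c in numeric_columns},
    }

summary = quality_summary(raw, ["height", "weight"])
print(summary)
```

If these numbers differ from the ones recorded in the metadata, the file may be truncated or altered.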