Missing values Flashcards

week 3

1
Q

Which function generates summary statistics and visualizations of a dataset?

A

skim()
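A minimal sketch of calling skim(), assuming the skimr package is installed and using R's built-in airquality data set (which has missing values in Ozone and Solar.R) as an illustrative input.

```r
# Assumes the skimr package is installed: install.packages("skimr")
library(skimr)

# airquality is a built-in R data set with missing values in Ozone and Solar.R
summary_tbl <- skim(airquality)

# Printing shows one row per variable with n_missing, complete_rate,
# mean, sd, quantiles and a small inline histogram for numeric columns
print(summary_tbl)
```

The n_missing and complete_rate columns make skim() a quick first check for missing data before choosing how to handle it.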

2
Q

Which function is useful for adding rows for missing combinations of variables?

A

complete()
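A small sketch of complete() from the tidyr package, assuming tidyr is installed. The sales data frame is hypothetical toy data in which site B has no row for 2021.

```r
# Assumes the tidyr package is installed
library(tidyr)

# Hypothetical toy data: site B has no row for 2021
sales <- data.frame(
  site  = c("A", "A", "B"),
  year  = c(2020, 2021, 2020),
  units = c(5, 7, 3)
)

# Add a row for every site x year combination that is absent;
# the added row (B, 2021) gets NA for units
full <- complete(sales, site, year)
print(full)

# Alternatively, give the added rows a default value instead of NA
filled <- complete(sales, site, year, fill = list(units = 0))
print(filled)
```

Making implicit missing combinations explicit like this means they show up as NA and can then be counted, visualized or imputed.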

3
Q

Which function visualizes missing data?

A

vis_miss()
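A minimal sketch of vis_miss(), assuming the visdat package is installed (vis_miss() is defined in visdat and naniar re-exports it), again using the built-in airquality data.

```r
# Assumes the visdat package is installed (naniar re-exports vis_miss())
library(visdat)

# Heatmap of airquality: each cell is shaded as present or missing,
# and the plot labels report the percentage of missing values
p <- vis_miss(airquality)
print(p)
```

The result is a ggplot object, so it can be customised further with the usual ggplot2 layers and themes.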

4
Q

There are three main types of missing values:

A
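  1. Missing Completely at Random (MCAR): missingness is unrelated to any of the data, observed or unobserved. “The dog eats homework.”
  2. Missing at Random (MAR): missingness depends only on other, observed variables. “The dog eats a particular student’s homework.”
  3. Missing not at Random (MNAR): missingness depends on the unobserved (missing) value itself. “The dog only eats bad homework.”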
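A small base-R simulation contrasting the three mechanisms. The variables (hours studied, always observed, and a homework score) and the missingness rules are hypothetical, chosen only to mirror the dog analogies.

```r
set.seed(1)
n <- 1000

# Hypothetical data: hours studied (always observed) and a homework score
hours <- rnorm(n, mean = 10, sd = 3)
score <- 50 + 3 * hours + rnorm(n, sd = 5)

# MCAR: every score has the same 20% chance of going missing
mcar <- ifelse(runif(n) < 0.2, NA, score)

# MAR: missingness depends only on an observed variable (hours);
# here scores of students who studied under 8 hours go missing
mar <- ifelse(hours < 8, NA, score)

# MNAR: missingness depends on the unobserved value itself;
# here the lowest 20% of scores ("bad homework") go missing
mnar <- ifelse(score < quantile(score, 0.2), NA, score)

# Mean of the observed scores under each mechanism vs the true mean
round(c(true = mean(score),
        mcar = mean(mcar, na.rm = TRUE),
        mar  = mean(mar, na.rm = TRUE),
        mnar = mean(mnar, na.rm = TRUE)), 1)
```

Under MCAR the observed mean stays close to the true mean; under these MAR and MNAR rules it is biased upwards. The MAR bias can in principle be corrected using the observed hours, whereas the MNAR bias cannot be fixed from the observed data alone.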
5
Q

Which command is useful alongside filter() to keep only the complete rows (those with no missing values)?

A

complete.cases()
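A minimal sketch pairing complete.cases() with dplyr's filter(), assuming dplyr is installed and using the built-in airquality data.

```r
# Assumes the dplyr package is installed
library(dplyr)

# complete.cases() returns TRUE for rows with no missing values,
# so filter() keeps only those rows
complete_rows <- filter(airquality, complete.cases(airquality))
nrow(complete_rows)  # 111 of the 153 rows are complete

# Passing specific columns checks completeness on those columns only
ozone_solar <- filter(airquality, complete.cases(Ozone, Solar.R))
```

In base R the equivalent is airquality[complete.cases(airquality), ]. Dropping incomplete rows is only safe when the data are plausibly MCAR; otherwise the remaining rows can give biased results.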

6
Q

Mean value imputation

A

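Replace any missing values in a numeric variable with the mean of the available (observed) values.
For a categorical variable, replace missing values with the modal (i.e. most common) category (level).
Very simple to implement.
Very crude: it can distort the structure of the dataset (e.g. it shrinks the variance and weakens relationships between variables).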
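A base-R sketch of the mean/mode rule above. impute_mean_mode() is a hypothetical helper written for illustration, not a package function, and the toy data frame is made up.

```r
# Hypothetical helper (not from any package): mean imputation for numeric
# columns and modal-category imputation for factor/character columns
impute_mean_mode <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      # Numeric: replace NA with the mean of the observed values
      x[is.na(x)] <- mean(x, na.rm = TRUE)
    } else if (is.factor(x) || is.character(x)) {
      # Categorical: replace NA with the most common level (table() drops NA)
      counts <- table(x)
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}

toy <- data.frame(
  age    = c(20, NA, 40, 30),
  colour = factor(c("red", "blue", NA, "red"))
)
imputed <- impute_mean_mode(toy)
print(imputed)
```

Every gap in a column receives the same value, which is why this approach shrinks the variance and can distort relationships between variables.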

7
Q

Which function can do mean imputation?

A

impute_mean()
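A minimal sketch, assuming impute_mean() is the naniar function that works on a single vector, and that dplyr is installed to apply it across the columns of a data frame.

```r
# Assumes the naniar and dplyr packages are installed
library(naniar)
library(dplyr)

# On a vector, impute_mean() replaces NA with the mean of the observed values
imp <- impute_mean(c(1, NA, 3))
imp  # returns c(1, 2, 3)

# Apply it to every numeric column of a data frame
airquality_imp <- mutate(airquality, across(where(is.numeric), impute_mean))
```

Using across(where(is.numeric), ...) keeps the choice of columns explicit, which matters because mean imputation only makes sense for numeric variables.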

8
Q

Nearest neighbour imputation

A

Impute missing values using records that are similar to the one with missing data. Similarity (or rather dissimilarity) can be measured by calculating a distance between records, for example the Euclidean (straight-line) distance or some other criterion.
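A base-R sketch of k-nearest-neighbour imputation with Euclidean distance. knn_impute() is a hypothetical helper for numeric data frames, not a package function. As simplifying assumptions, only fully observed rows act as neighbours, columns are standardised before computing distances, and each gap is filled with the mean of the k neighbours' values.

```r
# Hypothetical helper: k-nearest-neighbour imputation for numeric data frames
knn_impute <- function(df, k = 3) {
  X <- as.matrix(df)
  # Standardise columns (scale() ignores NAs) so no variable dominates the distance
  Z <- scale(X)
  donors <- which(complete.cases(X))  # only fully observed rows act as neighbours
  for (i in which(!complete.cases(X))) {
    miss <- is.na(X[i, ])
    # Euclidean distance to each donor, using only the columns observed in row i
    d <- sqrt(colSums((t(Z[donors, !miss, drop = FALSE]) - Z[i, !miss])^2))
    nn <- donors[order(d)[seq_len(min(k, length(donors)))]]
    # Fill each missing value with the mean of that column among the k neighbours
    X[i, miss] <- colMeans(X[nn, miss, drop = FALSE])
  }
  as.data.frame(X)
}

# Hypothetical toy data: row 5 is missing its weight
toy <- data.frame(
  height = c(150, 152, 180, 182, 151),
  weight = c(50, 52, 80, 82, NA)
)
knn_result <- knn_impute(toy, k = 2)
print(knn_result)
```

Row 5 has height 151, so its two nearest neighbours are rows 1 and 2 and the imputed weight is their average. Unlike mean imputation, the filled value reflects similar records rather than the whole column.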
