DAV CHAP 4 Flashcards

1
Q

Why do we need to do data pre-processing / handling?

A

Data visualization is all about the data: without interesting data, a visualization is useless.

Data may be stored as subsets or in various formats, so conversion or extraction is needed before visualization can take place.

Data may need to be cleaned of missing values, duplicate values, or outliers.
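
A minimal sketch of the cleaning step, assuming a small made-up list of readings in which `None` marks a missing value. The 1.5 × IQR fence used here is one common rule of thumb for flagging outliers, not something the deck prescribes.

```python
from statistics import quantiles

# Hypothetical raw readings; None marks a missing value.
raw = [12.0, 11.5, None, 12.3, 11.9, 250.0, 12.1, 12.0]

# Step 1: drop missing values.
present = [x for x in raw if x is not None]

# Step 2: flag outliers with the 1.5 * IQR rule (an assumed convention).
q1, _, q3 = quantiles(present, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the values inside the fences.
cleaned = [x for x in present if low <= x <= high]
print(cleaned)
```

Whether an outlier should be removed or kept depends on the data: an extreme value can be a measurement error or the most interesting point in the chart.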

2
Q

What is dirty data?

A

Data that contains errors of some kind, or that is in an unusable or unfriendly format.

3
Q

What are the types of data that need to be cleaned?

A
  1. Data that is not parsed correctly
    - Fix: use a delimiter that is a character not found in the data
  2. Extra characters
    - Fix: remove the extra characters
  3. Duplicate data records
    - Caused by manual entry mistakes or program errors that submit the same record twice
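
The three cases above can be sketched with Python's standard library. The sample export, its column names, and the `clean_amount` helper are all hypothetical, chosen only to illustrate each problem.

```python
import csv
import io

# Hypothetical export: names contain commas, so "|" is used as the
# delimiter because it never appears in the data itself.
raw_text = (
    "name|amount\n"
    "Tan, Alice|$1,200 \n"
    "Lim, Bob|$300\n"
    "Tan, Alice|$1,200 \n"  # duplicate record (e.g. a form submitted twice)
)

# Case 1: parse with a delimiter that is not found in the data.
rows = list(csv.DictReader(io.StringIO(raw_text), delimiter="|"))

def clean_amount(text):
    # Case 2: remove extra characters (currency symbol, separators, spaces).
    return int(text.strip().replace("$", "").replace(",", ""))

# Case 3: drop duplicate records while keeping the original order.
seen = set()
cleaned = []
for row in rows:
    record = (row["name"].strip(), clean_amount(row["amount"]))
    if record not in seen:
        seen.add(record)
        cleaned.append(record)

print(cleaned)
```

Had the file used a comma as the delimiter, "Tan, Alice" would have been split into two fields, which is the parsing problem described in the first case.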
4
Q

What is the difference between joining and unioning data?

A

A join combines data by adding new columns, while a union combines data by adding new rows.
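
A small sketch of the difference, using plain Python lists of dicts as stand-ins for tables. The table names, the `id` key, and the values are hypothetical.

```python
# Hypothetical tables, each row stored as a dict.
sales_jan = [{"id": 1, "sales": 100}, {"id": 2, "sales": 150}]
sales_feb = [{"id": 1, "sales": 120}, {"id": 3, "sales": 90}]
names = {1: "Alice", 2: "Bob"}  # lookup table keyed by id

# Union: both tables share the same columns, so the result gains rows.
union = sales_jan + sales_feb

# Join (inner join on "id"): matching rows gain a new "name" column.
joined = [
    {**row, "name": names[row["id"]]}
    for row in sales_jan
    if row["id"] in names
]

print(len(union), "rows after union")
print(sorted(joined[0]), "columns after join")
```

The union grows from two rows to four with the same two columns, while the join keeps two rows and grows from two columns to three.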
