Course 4: Module 2 Flashcards

(31 cards)

1
Q

Dirty data

A

Data that is incomplete, incorrect, or irrelevant to the problem you’re trying to solve

2
Q

Clean data

A

Data that is complete, correct, and relevant to the problem you’re trying to solve

3
Q

Null

A

An indication that a value does not exist in a dataset

4
Q

Duplicate data

A

Any data record that shows up more than once

5
Q

Outdated data

A

Any data that is old and should be replaced with newer, more accurate information

6
Q

Incomplete data

A

Any data that is missing important fields

7
Q

Incorrect/Inaccurate data

A

Any data that is complete but inaccurate

8
Q

Inconsistent data

A

Any data that uses different formats to represent the same thing

9
Q

Field

A

A single piece of information from a row or column of a spreadsheet

10
Q

Data validation

A

A tool for checking the accuracy and quality of data before adding or importing it
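
As a rough illustration outside a spreadsheet, the same idea can be sketched in Python. The function name `validate_record` and both rules (a non-empty name, an age from 0 to 120) are assumptions made up for this example, not part of the course material.

```python
def validate_record(record):
    """Return a list of problems found in one record (empty list means it passed)."""
    problems = []
    # Assumed rule: the name field must be a non-empty string.
    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name is missing")
    # Assumed rule: age must be a whole number from 0 to 120.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age must be a whole number from 0 to 120")
    return problems


# Only records with no problems would be added or imported.
print(validate_record({"name": "Ana", "age": 34}))
print(validate_record({"name": " ", "age": 150}))
```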

11
Q

Data merging

A

The process of combining two or more datasets into a single dataset
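
A minimal Python sketch of merging, assuming each dataset is a list of dictionaries that share a key field. The helper `merge_datasets` and the sample field names are hypothetical.

```python
def merge_datasets(left, right, key):
    """Combine two lists of dict records into one, matching records on `key`."""
    merged = {}
    for record in left + right:
        # Records sharing a key are folded together; later values win on conflict.
        merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
orders = [{"id": 1, "total": 30}]
print(merge_datasets(customers, orders, "id"))
```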

12
Q

Compatibility

A

How well two or more datasets are able to work together

13
Q

Common mistakes to avoid

A
  • Not checking for spelling errors
  • Forgetting to document errors
  • Not checking for misfielded values
  • Overlooking missing values
  • Only looking at a subset of the data
  • Not fixing the source of the error
  • Not analyzing the system prior to data cleaning
  • Not backing up your data prior to data cleaning
  • Not accounting for data cleaning in your deadlines/process
14
Q

Conditional formatting

A

A spreadsheet tool that changes how cells appear when values meet specific conditions

15
Q

Remove duplicates

A

A tool that automatically searches for and eliminates duplicate entries from a spreadsheet
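
The behavior can be approximated in Python as follows. This is a sketch that assumes rows are lists of hashable values (text, numbers) and that a duplicate means every cell in the row matches; `remove_duplicates` is a name chosen for the example.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row and drop later exact repeats."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [["Ana", "NY"], ["Ben", "LA"], ["Ana", "NY"]]
print(remove_duplicates(rows))
```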

16
Q

Text string

A

A group of characters within a cell, most often composed of letters

17
Q

Split

A

A tool that divides text around a specified character and puts each fragment into a new, separate cell
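
For comparison, Python's built-in `str.split` does much the same thing, returning a list of fragments instead of filling separate cells. The wrapper name `split_text` is only for illustration.

```python
def split_text(text, delimiter):
    """Divide text around the delimiter; each fragment becomes its own list item."""
    return text.split(delimiter)


print(split_text("Chicago,IL,60601", ","))
```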

18
Q

Concatenate

A

A function that joins multiple text strings into a single string
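
A small Python sketch of the same idea. The helper name `concatenate` is an assumption for this example; non-text values are converted to text before joining.

```python
def concatenate(*parts):
    """Join any number of values into a single text string."""
    return "".join(str(part) for part in parts)


print(concatenate("ID-", 42))
```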

19
Q

COUNTIF

A

A function that returns the number of cells in a range that match a specified value
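
An equivalent count can be sketched in Python, assuming the cells are held in a plain list and only exact matches count. The name `count_if` is hypothetical.

```python
def count_if(cells, value):
    """Count how many cells are exactly equal to the given value."""
    return sum(1 for cell in cells if cell == value)


statuses = ["paid", "open", "paid", "void"]
print(count_if(statuses, "paid"))
```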

20
Q

Syntax

A

A predetermined structure that includes all required information and its proper placement

21
Q

LEN

A

A function that tells you the length of a text string by counting the number of characters it contains
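
One common cleaning use is spotting entries whose length differs from the expected one. The Python sketch below assumes a hypothetical helper `wrong_length` and made-up six-character IDs.

```python
def wrong_length(values, expected):
    """Return the values whose character count differs from the expected length."""
    return [value for value in values if len(value) != expected]


ids = ["A1B2C3", "A1B2", "X9Y8Z7"]
print(wrong_length(ids, 6))
```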

22
Q

LEFT

A

A function that gives you a set number of characters from the left side of a text string

23
Q

RIGHT

A

A function that gives you a set number of characters from the right side of a text string
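
LEFT and RIGHT can both be approximated with Python string slicing. The function names `left` and `right` and the sample value are assumptions for this sketch.

```python
def left(text, count):
    """Return the first `count` characters of the text."""
    return text[:count]


def right(text, count):
    """Return the last `count` characters of the text."""
    # text[-0:] would return the whole string, so zero is handled separately.
    return text[-count:] if count > 0 else ""


code = "NY-10001"
print(left(code, 2), right(code, 5))
```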

25
Q

MID

A

A function that gives you a segment from the middle of a text string
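
In spreadsheets, MID takes a starting position counted from 1 and a number of characters. A Python sketch under that assumption (the name `mid` and the sample text are illustrative):

```python
def mid(text, start, count):
    """Return `count` characters beginning at 1-based position `start`."""
    begin = start - 1  # Python indexes from 0, spreadsheets count from 1
    return text[begin:begin + count]


print(mid("NY-10001-US", 4, 5))
```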

26
Q

TRIM

A

A function that removes leading, trailing, and repeated spaces in data
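
A rough Python equivalent, assuming only ordinary space characters need handling (tabs and line breaks are left alone). The helper name `trim` is chosen for the example.

```python
import re


def trim(text):
    """Collapse runs of spaces to one space and strip spaces from both ends."""
    return re.sub(r" {2,}", " ", text).strip(" ")


print(repr(trim("  Ana   Maria  ")))
```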

27
Q

Pivot table

A

A data summarization tool used in data processing to sort, reorganize, group, count, total, or average data
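
The grouping-and-totaling part of a pivot table can be sketched in Python. This assumes rows are dictionaries with numeric values; `pivot_sum` and the field names are hypothetical.

```python
from collections import defaultdict


def pivot_sum(rows, group_field, value_field):
    """Total `value_field` for each distinct value of `group_field`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


sales = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 50},
    {"region": "East", "sales": 25},
]
print(pivot_sum(sales, "region", "sales"))
```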

28
Q

VLOOKUP

A

A function that searches for a certain value in a column to return a corresponding piece of information
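
A simplified Python sketch of the lookup, assuming exact matching only and a table stored as a list of rows. The name `vlookup` mirrors the spreadsheet function but this is not its full behavior.

```python
def vlookup(search_key, table, column_index):
    """Find the first row whose first cell equals search_key and
    return the cell at the 1-based column_index, or None if not found."""
    for row in table:
        if row[0] == search_key:
            return row[column_index - 1]
    return None


prices = [["apple", 0.5], ["pear", 0.75]]
print(vlookup("pear", prices, 2))
```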

29
Q

Data mapping

A

The process of matching fields from one data source to another
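
One simple way to picture a mapping is a dictionary from source field names to destination field names. The sketch below assumes records are dictionaries; `map_fields` and the field names are made up for illustration.

```python
def map_fields(record, field_map):
    """Rename a record's fields using field_map; unmapped fields keep their names."""
    return {field_map.get(name, name): value for name, value in record.items()}


field_map = {"cust_name": "customer_name", "zip": "postal_code"}
print(map_fields({"cust_name": "Ana", "zip": "10001"}, field_map))
```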

30
Q

Compatibility

A

How well two or more datasets are able to work together

31
Q

Schema

A

A way of describing how something is organized