Process Data from Dirty to Clean (Terms) Flashcards

1
Q

A range of values that conveys how likely it is that a statistical estimate reflects the population

A

Confidence interval

2
Q

A character that indicates the beginning or end of a data item

A

Delimiter

3
Q

A data value that cannot be left blank or empty

A

Mandatory

4
Q

A file containing a chronologically ordered list of modifications made to a project

A

Changelog

5
Q

A function that removes leading, trailing, and repeated spaces in data

A

TRIM

6
Q

A function that returns a segment from the middle of a text string

A

MID

7
Q

A function that returns a set number of characters from the left side of a text string

A

LEFT

8
Q

A function that returns a set number of characters from the right side of a text string

A

RIGHT

9
Q

A function that returns the length of a text string by counting the number of characters it contains

A

LEN
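
The text functions on cards 5 through 9 (TRIM, MID, LEFT, RIGHT, LEN) can be approximated in Python. This is a rough sketch, not how any spreadsheet implements them: the helper names `trim`, `left`, `right`, and `mid` are hypothetical, and `mid` is assumed to use 1-based positions as spreadsheets do.

```python
import re


def trim(text: str) -> str:
    """Approximate TRIM: collapse repeated spaces, then drop leading/trailing ones."""
    return re.sub(r" +", " ", text).strip(" ")


def left(text: str, n: int) -> str:
    """Approximate LEFT: the first n characters."""
    return text[:n]


def right(text: str, n: int) -> str:
    """Approximate RIGHT: the last n characters."""
    return text[-n:] if n > 0 else ""


def mid(text: str, start: int, n: int) -> str:
    """Approximate MID: n characters beginning at 1-based position start."""
    return text[start - 1:start - 1 + n]


raw = "  New   York  "
clean = trim(raw)
print(clean)             # New York
print(left(clean, 3))    # New
print(right(clean, 4))   # York
print(mid(clean, 5, 2))  # Yo
print(len(clean))        # 8 (LEN counts the space too)
```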

10
Q

A group of characters within a cell, most often composed of letters

A

Text string

11
Q

A keyword that is added to a SQL SELECT statement to retrieve only non-duplicate entries

A

DISTINCT
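
To see DISTINCT in action, the sketch below runs the same query with and without the keyword against an in-memory SQLite database. The `customers` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO customers (city) VALUES (?)",
    [("Lagos",), ("Lima",), ("Lagos",), ("Oslo",), ("Lima",)],
)

# Without DISTINCT, duplicate entries are returned as-is.
all_rows = [r[0] for r in conn.execute("SELECT city FROM customers ORDER BY city")]

# With DISTINCT, each value appears once.
distinct_rows = [
    r[0] for r in conn.execute("SELECT DISTINCT city FROM customers ORDER BY city")
]
conn.close()

print(all_rows)       # ['Lagos', 'Lagos', 'Lima', 'Lima', 'Oslo']
print(distinct_rows)  # ['Lagos', 'Lima', 'Oslo']
```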

12
Q

A number that contains a decimal

A

Float

13
Q

A process that ensures certain conditions for multiple data fields are satisfied

A

Cross-field validation
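
As one possible illustration of cross-field validation, the sketch below checks conditions that span several fields of a single record. The field names (`boys`, `girls`, `total_students`, `start_date`, `end_date`) and the two rules are assumptions chosen for the example.

```python
from datetime import date


def cross_field_errors(record: dict) -> list:
    """Return a list of rule violations that involve more than one field."""
    errors = []
    # Rule 1 (assumed): the parts must add up to the reported total.
    if record["boys"] + record["girls"] != record["total_students"]:
        errors.append("boys + girls does not equal total_students")
    # Rule 2 (assumed): a period cannot end before it starts.
    if record["end_date"] < record["start_date"]:
        errors.append("end_date is earlier than start_date")
    return errors


good = {"boys": 12, "girls": 14, "total_students": 26,
        "start_date": date(2024, 1, 8), "end_date": date(2024, 6, 14)}
bad = {"boys": 12, "girls": 14, "total_students": 30,
       "start_date": date(2024, 6, 14), "end_date": date(2024, 1, 8)}

print(cross_field_errors(good))  # []
print(cross_field_errors(bad))   # both rules are reported
```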

14
Q

A process to confirm that a data-cleaning effort was well executed and the resulting data is accurate and reliable

A

Verification

15
Q

A process to determine if a survey or experiment has meaningful results

A

Hypothesis testing

16
Q

A professional who develops processes and procedures to effectively store and organize data

A

Data warehousing specialist

17
Q

A professional who transforms data into a useful format for analysis and gives it a reliable infrastructure

A

Data engineer

18
Q

A rule that says the values in a table must match a prescribed pattern

A

Regular expression (RegEx)
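
A small sketch of checking values against a prescribed pattern with Python's `re` module. The pattern (three uppercase letters, a hyphen, four digits) is a hypothetical product-code format invented for this example.

```python
import re

# Hypothetical rule: codes look like "ABC-1234".
PRODUCT_CODE = re.compile(r"[A-Z]{3}-\d{4}")


def matches_pattern(value: str) -> bool:
    """True only if the whole value fits the pattern, not just a part of it."""
    return PRODUCT_CODE.fullmatch(value) is not None


values = ["ABC-1234", "abc-1234", "ABCD-123", "XYZ-0001"]
invalid = [v for v in values if not matches_pattern(v)]
print(invalid)  # ['abc-1234', 'ABCD-123']
```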

19
Q

A spreadsheet function that calculates the number of days, months, or years between two dates

A

DATEDIF
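
The idea behind DATEDIF can be sketched in Python with `datetime.date`. The `datedif` helper below is hypothetical and only handles the "D", "M", and "Y" units; it counts complete months and years, and its handling of end-of-month dates may differ from a real spreadsheet.

```python
from datetime import date


def datedif(start: date, end: date, unit: str) -> int:
    """Days ("D"), complete months ("M"), or complete years ("Y") between two dates."""
    if unit == "D":
        return (end - start).days
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    if unit == "M":
        return months
    if unit == "Y":
        return months // 12
    raise ValueError(f"unsupported unit: {unit}")


start, end = date(2023, 1, 15), date(2024, 3, 10)
print(datedif(start, end, "D"))  # 420
print(datedif(start, end, "M"))  # 13
print(datedif(start, end, "Y"))  # 1
```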

20
Q

A spreadsheet function that counts the total number of values within a specified range

A

COUNTA

21
Q

A spreadsheet function that divides text around a specified character and puts each fragment into a new, separate cell

A

SPLIT

22
Q

A spreadsheet function that joins together two or more text strings

A

CONCATENATE

23
Q

A spreadsheet function that returns the number of cells in a range that match a specified value

A

COUNTIF

24
Q

A spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information

A

VLOOKUP
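
The spreadsheet functions on cards 20 through 24 have rough Python counterparts, sketched below. The sample data is invented, treating `""` and `None` as blank cells is an assumption, and the `vlookup` helper is hypothetical: it does exact matching on the first column only.

```python
cells = ["apple", "", "pear", "apple", None]

# COUNTA: count the cells that hold a value.
counta = sum(1 for c in cells if c not in ("", None))

# COUNTIF: count the cells that match a specified value.
countif = sum(1 for c in cells if c == "apple")

# SPLIT: divide text around a delimiter into separate pieces.
parts = "2024-03-15".split("-")

# CONCATENATE: join two or more text strings.
joined = "".join(["Data", "-", "Clean"])


def vlookup(key, rows, col_index):
    """Search the first column for key; return the value in the 1-based col_index."""
    for row in rows:
        if row[0] == key:
            return row[col_index - 1]
    return None  # no match found


products = [("A101", "Keyboard", 25.0), ("B202", "Monitor", 180.0)]

print(counta, countif)                # 3 2
print(parts)                          # ['2024', '03', '15']
print(joined)                         # Data-Clean
print(vlookup("B202", products, 2))   # Monitor
```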

25
A spreadsheet tool that automatically searches for and eliminates duplicate entries from a spreadsheet
Remove duplicates
26
A spreadsheet tool that changes how cells appear when values meet specific conditions
Conditional formatting
27
A SQL function that adds strings together to create new text strings that can be used as unique keys
CONCAT
28
A SQL function that converts data from one datatype to another
CAST
29
A SQL function that extracts a substring from a string variable
SUBSTR
30
A SQL function that returns the first non-null value in a list
COALESCE
31
A SQL statement that returns records that meet conditions by including an if/then statement in a query
CASE
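
The SQL terms on cards 27 through 31 can be tried together in one query. The sketch below uses Python's built-in `sqlite3` module with a made-up `orders` table. One assumption to note: SQLite's `||` operator stands in for CONCAT here, since CONCAT availability varies by database and version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, first_name TEXT, nickname TEXT, price TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Amara", None, "120.50"), (2, "Benedict", "Ben", "19.99")],
)

query = """
SELECT
    first_name || '_' || CAST(order_id AS TEXT) AS unique_key,  -- concatenation
    CAST(price AS REAL)                          AS price_num,  -- CAST: text to number
    SUBSTR(first_name, 1, 3)                     AS prefix,     -- SUBSTR: first 3 chars
    COALESCE(nickname, first_name)               AS display,    -- first non-null value
    CASE WHEN CAST(price AS REAL) >= 100
         THEN 'high' ELSE 'low' END              AS tier        -- CASE: if/then logic
FROM orders
ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
# ('Amara_1', 120.5, 'Ama', 'Amara', 'high')
# ('Benedict_2', 19.99, 'Ben', 'Ben', 'low')
```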
32
A subset of a text string
Substring
33
A tool for checking the accuracy and quality of data
Data validation
34
A tool for determining how many characters can be keyed into a spreadsheet field
Field length
35
A tool that finds a specified search term and replaces it with something else
Find and replace
36
A value that can’t have a duplicate
Unique
37
A way of selecting a sample from a population so that every possible sample has an equal chance of being chosen
Random sampling
38
An agreement that unites two organizations into a single new one
Merger
39
An indication that a value does not exist in a dataset
Null
40
Any data that has been superseded by newer and more accurate information
Outdated data
41
Any record that inadvertently shares data with another record
Duplicate data
42
Converting data from one type to another
Typecasting
43
Data that is complete but inaccurate
Incorrect/inaccurate data
44
Data that uses different formats to represent the same thing
Inconsistent data
45
Data that is complete, correct, and relevant to the problem being solved
Clean data
46
Data that is incomplete, incorrect, or irrelevant to the problem to be solved
Dirty data
47
Data that is missing important fields
Incomplete data
48
How well two or more datasets are able to work together
Compatibility
49
Nontechnical traits and behaviors that relate to how people work
Soft skills
50
Numerical values that fall between predefined maximum and minimum values
Data range
51
Skills and qualities that can transfer from one job or industry to another
Transferable skills
52
The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle
Data integrity
53
The average number of people who typically complete a survey
Estimated response rate
54
The criteria that determine whether a piece of data is clean and valid
Data constraints
55
The degree to which data conforms to constraints when it is input, collected, or created
Validity
56
The degree to which data conforms to the actual entity being measured or described
Accuracy
57
The degree to which data contains all desired components or measures
Completeness
58
The degree to which data is repeatable from different points of entry or collection
Consistency
59
The maximum amount that sample results are expected to differ from those of the actual population
Margin of error
60
The number of characters in a text string
Length
61
The predetermined structure of a language that includes all required words, symbols, and punctuation, as well as their proper placement
Syntax
62
The probability that a sample size accurately reflects the greater population
Confidence level
63
The probability that a test of significance will recognize an effect that is present
Statistical power
64
The probability that sample results are not due to random chance
Statistical significance
65
The process of changing data to make it more organized and easier to read
Data manipulation
66
The process of combining two or more datasets into a single dataset
Data merging
67
The process of copying data from a storage device to computer memory or from one computer to another
Data transfer
68
The process of matching fields from one data source to another
Data mapping
69
The process of storing data in multiple locations
Data replication
70
The process of testing two variations of the same web page to determine which page is more successful at attracting user traffic and generating revenue
A/B testing