Glossary Flashcards

(406 cards)

1
A/B testing
The process of testing two variations of the same web page to determine which page is more successful at attracting user traffic and generating revenue.
2
Absolute reference
A reference within a function that is locked so that rows and columns won’t change if the function is copied.
3
Access control
Features such as password protection, user permissions, and encryption that are used to protect a spreadsheet.
4
Accuracy
The degree to which data conforms to the actual entity being measured or described.
5
Action-oriented question
A question whose answers lead to change.
6
Administrative metadata
Metadata that indicates the technical source of a digital asset.
7
Aesthetic (R)
A visual property of an object in a plot.
8
Agenda
A list of scheduled appointments.
9
Aggregation
The process of collecting or gathering many separate pieces into a whole.
10
Algorithm
A process or set of rules followed for a specific task.
11
Aliasing
Temporarily naming a table or column in a query to make it easier to read and write.
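As a minimal illustration, the sketch below runs an aliased query through Python’s built-in `sqlite3` module (SQLite syntax; other SQL dialects are similar but may differ in details). The table `customer_purchases` and its columns are hypothetical. `AS` gives the table a short alias (`cp`) and renames the columns in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE customer_purchases (customer_name TEXT, purchase_total REAL)")
conn.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?)",
    [("Ana", 19.99), ("Ben", 5.50)],
)

# "cp" aliases the table; "name" and "total" alias the columns.
cursor = conn.execute(
    "SELECT cp.customer_name AS name, cp.purchase_total AS total "
    "FROM customer_purchases AS cp "
    "ORDER BY total DESC"
)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(column_names)  # ['name', 'total']
print(rows)          # [('Ana', 19.99), ('Ben', 5.5)]
```

The aliases exist only for the duration of the query; the underlying table and column names are unchanged.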
12
Alternative text
Text that provides an alternative to non-text content such as images and videos.
13
Analytical skills
Qualities and characteristics associated with using facts to solve problems.
14
Analytical thinking
The process of identifying and defining a problem, then solving it by using data in an organized step-by-step manner.
15
Annotation
Text that briefly explains data or helps focus the audience on a particular aspect of the data in a visualization.
16
Anscombe’s quartet
Four datasets that have nearly identical summary statistics but look very different when plotted.
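The point of the quartet can be checked directly. This sketch hard-codes the four datasets as Anscombe published them in 1973 and computes each one’s means and Pearson correlation in plain Python. All four print essentially the same numbers, which is why the data has to be plotted to reveal how different the datasets are.

```python
# Anscombe's quartet: datasets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


summary = {}
for name, (xs, ys) in QUARTET.items():
    summary[name] = (mean(xs), mean(ys), pearson_r(xs, ys))
    mx, my, r = summary[name]
    # Every dataset prints: mean x = 9.00, mean y = 7.50, r = 0.82
    print(f"{name}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.2f}")
```

Plotting the same lists as scatterplots shows a roughly linear cloud, a curve, a line with one outlier, and a vertical stack with one extreme point.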
17
Area chart
A data visualization that uses individual data points for a changing variable connected by a continuous line with a filled-in area underneath.
18
Argument (R)
Information needed by a function in R in order to run.
19
Arithmetic operator
An operator used to perform basic math operations such as addition, subtraction, multiplication, and division.
20
Array
A collection of values in spreadsheet cells.
21
Assignment operator
An operator used to assign values to variables and vectors.
22
Attribute
A characteristic or quality of data used to label a column in a table.
23
Audio file
Digitized audio storage usually in an MP3, AAC, or other compressed format.
24
AVERAGE
A spreadsheet function that returns an average of the values from a selected range.
25
AVERAGEIF
A spreadsheet function that returns the average of all cell values from a given range that meet a specified condition.
26
Bad data source
A data source that is not reliable, original, comprehensive, current, and cited (ROCCC).
27
Balance
The design principle of creating aesthetic appeal and clarity in a data visualization by evenly distributing visual elements.
28
Bar graph
A data visualization that uses size to contrast and compare two or more values.
29
Bias
A conscious or subconscious preference in favor of or against a person, group of people, or thing.
30
Big data
Large, complex datasets, typically involving long periods of time, that enable data analysts to address far-reaching business problems.
31
Boolean data
A data type with only two possible values, usually true or false.
32
Borders
Lines that can be added around one or more cells on a spreadsheet.
33
Box plot
A data visualization that summarizes the distribution of values along an axis, showing the median, quartiles, and outliers.
34
Bubble chart
A data visualization that displays individual data points as bubbles, comparing numeric values by their relative size.
35
Bullet graph
A data visualization that displays data as a horizontal bar chart moving toward a desired value.
36
Business metric
A standard of measurement used to solve a business task.
37
Business task
The question or problem data analysis resolves for a business.
38
C#
An object-oriented programming language used to create games and mobile apps in the .NET open source developer platform.
39
C++
An extension of the C programming language that is used to create console games such as those for Xbox.
40
Calculated field
A new field within a pivot table that carries out certain calculations based on the values of other fields.
41
Calculus
A branch of mathematics that involves the study of rates of change and the changes between values that are related by a function.
42
CASE
A SQL expression that adds if/then logic to a query, returning a value based on which condition each row meets.
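A minimal sketch of CASE, run through Python’s built-in `sqlite3` module against a hypothetical `orders` table. The WHEN conditions are checked from top to bottom and the first one that is true wins, so the order of the branches matters; ELSE catches everything else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table, for illustration only.
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 15.0), (2, 120.0), (3, 60.0)])

query = """
SELECT order_id,
       CASE
           WHEN amount >= 100 THEN 'large'
           WHEN amount >= 50  THEN 'medium'
           ELSE 'small'
       END AS order_size
FROM orders
ORDER BY order_id
"""
labels = conn.execute(query).fetchall()
print(labels)  # [(1, 'small'), (2, 'large'), (3, 'medium')]
```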
43
Case study
A common way for employers to assess job skills and gain insight into how a candidate approaches common data-related challenges.
44
CAST
A SQL function that converts data from one datatype to another.
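The sketch below (Python’s `sqlite3` module, hypothetical `raw_prices` table) shows why CAST matters in practice: numbers stored as text sort alphabetically until they are converted. Type names vary by dialect; `REAL` is SQLite’s floating-point type, while other systems use names such as `FLOAT64` or `DECIMAL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table where prices were imported as text.
conn.execute("CREATE TABLE raw_prices (price_text TEXT)")
conn.executemany("INSERT INTO raw_prices VALUES (?)", [("19.99",), ("5",), ("100.5",)])

# Sorting TEXT compares character by character ("1" < "5"), not by numeric value.
as_text = [row[0] for row in conn.execute(
    "SELECT price_text FROM raw_prices ORDER BY price_text")]

# After CAST the values are numbers, so they sort numerically.
as_number = [row[0] for row in conn.execute(
    "SELECT CAST(price_text AS REAL) AS price FROM raw_prices ORDER BY price")]

print(as_text)    # ['100.5', '19.99', '5']
print(as_number)  # [5.0, 19.99, 100.5]
```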
45
Causation
When an action directly leads to an outcome, such as a cause-effect relationship.
46
Cell reference
A cell or a range of cells in a worksheet typically used in formulas and functions.
47
Changelog
A file containing a chronologically ordered list of modifications made to a project.
48
Channel
A visual aspect or variable that represents characteristics of the data in a visualization.
49
Chart
A graphical representation of data from a worksheet.
50
Circle view
A data visualization that shows comparative strength in data.
51
Clean data
Data that is complete, correct, and relevant to the problem being solved.
52
Cloud
A place to keep data online rather than on a computer’s hard drive.
53
Cluster
A collection of data points on a data visualization with similar values.
54
COALESCE
A SQL function that returns the first non-null value in a list.
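A minimal sketch of COALESCE using Python’s `sqlite3` module and a hypothetical `contacts` table. The arguments are checked left to right, and the literal at the end serves as a fallback when every column is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical contacts: some rows lack a mobile number, one lacks both numbers.
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [("Ana", "555-0101", None), ("Ben", None, "555-0199"), ("Cy", None, None)],
)

# COALESCE returns the first argument that is not NULL.
rows = conn.execute(
    "SELECT name, COALESCE(mobile, landline, 'no phone') AS best_phone "
    "FROM contacts ORDER BY name"
).fetchall()
print(rows)  # [('Ana', '555-0101'), ('Ben', '555-0199'), ('Cy', 'no phone')]
```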
55
Code chunk
A piece of code added in an R Markdown file that is used to process, visualize, or analyze data.
56
Coding
The process of writing instructions to a computer in the syntax of a specific programming language.
57
Column chart
A data visualization that uses individual data points for a changing variable represented as vertical columns.
58
Combo chart
A data visualization that combines more than one visualization type.
59
Compatibility
How well two or more datasets are able to work together.
60
Completeness
The degree to which data contains all desired components or measures.
61
Computer programming
The process of giving instructions to a computer in order to perform an action or set of actions.
62
CONCAT
A SQL function that adds strings together to create new text strings that can be used as unique keys.
63
CONCATENATE
A spreadsheet function that joins together two or more text strings.
64
Conditional formatting
A spreadsheet tool that changes how cells appear when values meet specific conditions.
65
Conditional statement
A declaration that if a certain condition holds, then a certain event must take place.
66
Confidence interval
A range of values, calculated from sample data, that is likely to contain the true population value at a given confidence level.
67
Confidence level
The probability that a sample accurately reflects the greater population, usually expressed as a percentage such as 95%.
68
Confirmation bias
The tendency to search for or interpret information in a way that confirms pre-existing beliefs.
69
Consent
The aspect of data ethics that presumes an individual’s right to know how and why their personal data will be used before agreeing to provide it.
70
Consistency
The degree to which data is repeatable from different points of entry or collection.
71
Context
The condition in which something exists or happens.
72
Continuous data
Data that is measured and can have almost any numeric value.
73
CONVERT
A spreadsheet function that converts a value from one unit of measurement to another.
74
Cookie
A small file stored on a computer that contains information about its users.
75
Correlation
The measure of the degree to which two variables change in relationship to each other.
76
COUNT
A spreadsheet function that counts the number of cells within a range that contain numeric values.
77
COUNTA
A spreadsheet function that counts the number of cells within a specified range that are not empty.
78
COUNTIF
A spreadsheet function that returns the number of cells within a range that meet a specified condition, such as matching a value.
79
COUNT DISTINCT
A SQL function that returns the number of distinct (unique) values in a specified column, written COUNT(DISTINCT column).
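The difference between COUNT and COUNT DISTINCT shows up as soon as a value repeats. This sketch uses Python’s `sqlite3` module and a hypothetical `visits` table in which two customers visited twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical visit log: customers 1 and 3 each appear twice.
conn.execute("CREATE TABLE visits (customer_id INTEGER, visit_date TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, "2024-01-01"), (1, "2024-01-05"), (2, "2024-01-02"),
     (3, "2024-01-03"), (3, "2024-01-04")],
)

# COUNT counts every non-null value; COUNT(DISTINCT ...) counts each value once.
all_visits, unique_customers = conn.execute(
    "SELECT COUNT(customer_id), COUNT(DISTINCT customer_id) FROM visits"
).fetchone()
print(all_visits, unique_customers)  # 5 3
```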
80
CRAN (R)
An online archive with R packages, source code, manuals, and documentation.
81
CREATE TABLE
A SQL clause that adds a new table to a database; unlike a temporary table, it persists and can be used by multiple people.
82
Cross-field validation
A process that ensures certain conditions for multiple data fields are satisfied.
83
CSS
A style sheet language used for web page design that controls graphic elements and page presentation.
84
CSV
A delimited text file that uses a comma to separate values.
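To make the role of the comma concrete, this sketch uses Python’s standard `csv` module on a small, made-up CSV string. A value that itself contains a comma is wrapped in double quotes so it is not split, and every value is read back as text.

```python
import csv
import io

# Hypothetical CSV text; the quoted field contains a comma that is data, not a delimiter.
raw = 'name,city\nAna,Paris\n"Lee, Jo",Lyon\n'

rows = list(csv.reader(io.StringIO(raw)))
print(rows)  # [['name', 'city'], ['Ana', 'Paris'], ['Lee, Jo', 'Lyon']]

# Writing the rows back out: by default the writer quotes only fields that need it.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue() == raw)  # True
```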
85
Currency
The aspect of data ethics that presumes individuals should be aware of financial transactions resulting from the use of their personal data and the scale of those transactions.
86
Dashboard
A tool that monitors live incoming data.
87
Data
A collection of facts.
88
Data aggregation
The process of gathering data from multiple sources and combining it into a single summarized collection.
89
Data analysis
The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making.
90
Data analysis process
The six phases of ask, prepare, process, analyze, share, and act, whose purpose is to gain insights that drive informed decision-making.
91
Data analyst
Someone who collects, transforms, and organizes data in order to draw conclusions, make predictions, and drive informed decision-making.
92
Data analytics
The science of data.
93
Data anonymization
The process of protecting people's private or sensitive data by eliminating identifying information.
94
Data bias
When a preference in favor of or against a person, group of people, or thing systematically skews data analysis results in a certain direction.
95
Data blending
A Tableau method that combines data from multiple data sources.
96
Data composition
The process of combining the individual parts in a visualization and displaying them together as a whole.
97
Data constraints
The criteria that determine whether a piece of data is clean and valid.
98
Data design
How information is organized.
99
Data-driven decision-making
Using facts to guide business strategy.
100
Data ecosystem
The various elements that interact with one another in order to produce, manage, store, organize, analyze, and share data.
101
Data element
A piece of information in a dataset.
102
Data engineer
A professional who transforms data into a useful format for analysis and gives it a reliable infrastructure.
103
Data ethics
Well-founded standards of right and wrong that dictate how data is collected, shared, and used.
104
Data frame
A collection of columns containing data similar to a spreadsheet or SQL table.
105
Data governance
A process for ensuring the formal management of a company’s data assets.
106
Data-inspired decision-making
Exploring different data sources to find out what they have in common.
107
Data integrity
The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.
108
Data interoperability
The ability to integrate data from multiple sources and a key factor leading to the successful use of open data among companies and governments.
109
Data life cycle
The sequence of stages that data experiences, which include plan, capture, manage, analyze, archive, and destroy.
110
Data manipulation
The process of changing data to make it more organized and easier to read.
111
Data mapping
The process of matching fields from one data source to another.
112
Data merging
The process of combining two or more datasets into a single dataset.
113
Data model
A tool for organizing data elements and how they relate to one another.
114
Data privacy
Preserving a data subject’s information any time a data transaction occurs.
115
Data range
Numerical values that fall between predefined maximum and minimum values.
116
Data replication
The process of storing data in multiple locations.
117
Data science
A field of study that uses raw data to create new ways of modeling and understanding the unknown.
118
Data security
Protecting data from unauthorized access or corruption by adopting safety measures.
119
Data storytelling
Communicating the meaning of a dataset with visuals and a narrative that are customized for an audience.
120
Data strategy
The management of the people, processes, and tools used in data analysis.
121
Data structure
A format for organizing and storing data.
122
Data transfer
The process of copying data from a storage device to computer memory or from one computer to another.
123
Data type
An attribute that describes a piece of data based on its values, its programming language, or the operations it can perform.
124
Data validation
A tool for checking the accuracy and quality of data.
125
Data validation process
The process of checking and rechecking the quality of data so that it is complete, accurate, secure, and consistent.
126
Data visualization
The graphical representation of data.
127
Data warehousing specialist
A professional who develops processes and procedures to effectively store and organize data.
128
Database
A collection of data stored in a computer system.
129
Dataset
A collection of data that can be manipulated or analyzed as one unit.
130
DATEDIF
A spreadsheet function that calculates the number of days, months, or years between two dates.
131
Decision tree
A tool that helps analysts make decisions about critical features of a visualization.
132
Delimiter
A character that indicates the beginning or end of a data item.
133
Density map
A data visualization that represents concentrations with color representing the number or frequency of data points in a given area on a map.
134
Descriptive metadata
Metadata that describes a piece of data and can be used to identify it at a later point in time.
135
Design thinking
A process used to solve complex problems in a user-centric way.
136
Digital photo
An electronic or computer-based image, usually in BMP or JPG format.
137
Dirty data
Data that is incomplete, incorrect, or irrelevant to the problem to be solved.
138
Discrete data
Data that is counted and has a limited number of values.
139
DISTINCT
A keyword that is added to a SQL SELECT statement to retrieve only non-duplicate entries.
140
Distribution graph
A data visualization that displays the frequency of various outcomes in a sample.
141
Diverging color palette
A color theme that displays two ranges of data values using two different hues with color intensity representing the magnitude of the values.
142
Donut chart
A data visualization where segments of a ring represent data values adding up to a whole.
143
dplyr (R)
An R package in Tidyverse that offers a consistent set of functions to complete common data-manipulation tasks.
144
DROP TABLE
A SQL clause that removes a table, such as a temporary table, from a database.
145
Duplicate data
Any record that inadvertently shares data with another record.
146
Dynamic visualizations
Data visualizations that are interactive or change over time.
147
Elevator pitch
A short statement describing an idea or concept.
148
Emphasis
The design principle of arranging visual elements to focus the audience’s attention on important information in a data visualization.
149
Engagement
Capturing and holding someone’s interest and attention during a data presentation.
150
Equation
A calculation that involves addition, subtraction, multiplication, or division (also called a math expression).
151
Estimated response rate
The average number of people who typically complete a survey.
152
Ethics
Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues.
153
External data
Data that lives and is generated outside of an organization.
154
Facets (R)
A series of functions that splits data into subsets in a matrix of panels.
155
Factor (R)
An object that stores categorical data where the data values are limited and usually based on a finite group such as country or year.
156
Fairness
A quality of data analysis that does not create or reinforce bias.
157
Field
A single piece of information from a row or column of a spreadsheet; in a data table, typically a column in the table.
158
Field length
A tool for determining how many characters can be keyed into a spreadsheet field.
159
Fill handle
A box in the lower-right-hand corner of a selected spreadsheet cell that can be dragged through neighboring cells in order to continue an instruction.
160
Filled map
A data visualization that colors areas in a map based on measurements or dimensions.
161
Filtering
The process of showing only the data that meets specified criteria while hiding the rest.
162
Find and replace
A tool that finds a specified search term and replaces it with something else.
163
First-party data
Data collected by an individual or group using their own resources.
164
Float
A number that contains a decimal.
165
Foreign key
A field within a database table that is a primary key in another table (Refer to primary key).
166
Formula
A set of instructions used to perform a calculation using the data in a spreadsheet.
167
Framework
The context a presentation needs to create logical connections that tie back to the business task and metrics.
168
FROM
The section of a query that indicates from which table(s) to extract the data.
169
Function
A preset command that automatically performs a specific process or task using the data in a spreadsheet.
170
Function (R)
A body of reusable code for performing specific tasks in R.
171
FWF
Fixed-width file: a text file in which each field occupies a fixed number of characters, which enables textual data to be saved in an organized fashion.
172
GAM
Generalized additive model: a method for smoothing plots with a large number of points.
173
Gantt chart
A data visualization that displays the duration of events or activities on a timeline.
174
Gap analysis
A method for examining and evaluating the current state of a process in order to identify opportunities for improvement in the future.
175
Gauge chart
A data visualization that shows a single result within a progressive range of values.
176
GDPR
General Data Protection Regulation: a European Union law created to help protect people and their data.
177
Geolocation
The geographical location of a person or device by means of digital information.
178
Geom (R)
The geometric object used to represent data.
179
ggplot2 (R)
An R package in Tidyverse that creates a variety of data visualizations by applying different visual properties to the data variables in R.
180
Good data source
A data source that is reliable, original, comprehensive, current, and cited (ROCCC).
181
GROUP BY
A SQL clause that groups rows that have the same values from a table into summary rows.
182
HAVING
A SQL clause that adds a filter to the grouped results of a query, rather than to the underlying table, and is used with aggregate functions.
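A minimal sketch contrasting WHERE and HAVING, using Python’s `sqlite3` module and a hypothetical `sales` table: WHERE drops individual rows before they are grouped, while HAVING drops whole groups based on an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales table, for illustration only.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("North", 250), ("South", 80), ("South", 40), ("West", 300)],
)

query = """
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 50          -- filters individual rows before grouping
GROUP BY region
HAVING SUM(amount) > 200   -- filters the groups after aggregation
ORDER BY region
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('North', 350.0), ('West', 300.0)]
```

South is excluded in two steps: WHERE removes its 40 sale, and HAVING then removes the remaining group because its total of 80 is too small.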
183
head() (R)
An R function that returns a preview of the column names and the first few rows of a dataset.
184
Header
The first row in a spreadsheet that labels the type of data in each column.
185
Headline
Text at the top of a visualization that communicates the data being presented.
186
Heat map
A data visualization that uses color contrast to compare categories in a dataset.
187
Highlight table
A data visualization that uses conditional formatting and color on a table.
189
Histogram
A data visualization that shows how often data values fall into certain ranges.
190
HTML (Hypertext Markup Language)
The set of markup symbols or codes used to create a web page.
191
HTML5
A markup language (the fifth major version of HTML) that provides structure for web pages and connects to hosting platforms.
192
Hypothesis
A theory that one might try to prove or disprove with data.
193
Hypothesis testing
A process to determine if a survey or experiment has meaningful results.
194
IDE (Integrated Development Environment)
A software application that brings together all the tools a data analyst may want to use in a single place.
195
Incomplete data
Data that is missing important fields.
196
Inconsistent data
Data that uses different formats to represent the same thing.
197
Incorrect/inaccurate data
Data that is complete but inaccurate.
198
Inline code
Code that can be inserted directly into the text of an R Markdown file.
199
INNER JOIN
A SQL clause that returns records with matching values in both tables.
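A minimal sketch of INNER JOIN using Python’s `sqlite3` module. The `customers` and `orders` tables are hypothetical and deliberately include a customer with no orders and an order whose customer does not exist; neither appears in the result, because INNER JOIN keeps only rows that match in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: Cy has no orders, and order 13 points at a customer who does not exist.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0), (13, 9, 99.0);
""")

matched = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(matched)  # [('Ana', 10), ('Ana', 11), ('Ben', 12)]
```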
200
Inner query
A SQL subquery that is nested inside another SQL statement.
201
Internal data
Data that lives within a company’s own systems.
202
Interpretation bias
The tendency to interpret ambiguous situations in a positive or negative way.
203
Java
A programming language widely used to create enterprise web applications that can run on multiple clients.
204
JOIN
A SQL clause that is used to combine rows from two or more tables based on a related column.
205
Jupyter Notebook
An open-source web application used to create and share documents that contain live code, equations, visualizations, and narrative text.
206
Label
Text in a visualization that identifies a value or describes a scale.
207
Labels and annotations (R)
A group of R functions used for customizing a plot.
208
Leading question
A question that steers people toward a certain response.
209
LEFT
A function that returns a set number of characters from the left side of a text string.
210
LEFT JOIN
A SQL clause that returns all the records from the left table and only the matching records from the right table.
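The same hypothetical `customers` and `orders` tables show how LEFT JOIN differs from an inner join (sketch via Python’s `sqlite3` module). Every customer is kept; Cy, who has no orders, gets NULL (Python `None`) in the order column, while order 13, which matches no customer, is still dropped because it exists only in the right table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: Cy has no orders, and order 13 points at a customer who does not exist.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0), (13, 9, 99.0);
""")

left_joined = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()
print(left_joined)  # [('Ana', 10), ('Ana', 11), ('Ben', 12), ('Cy', None)]
```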
211
Legend
A tool that identifies the meaning of various elements in a data visualization.
212
LEN
A function that returns the length of a text string by counting the number of characters it contains.
213
Length
The number of characters in a text string.
214
Library
A directory containing all of a data analyst’s installed packages.
215
LIMIT
A SQL clause that specifies the maximum number of records returned in a query.
216
Line graph
A data visualization that uses one or more lines to display shifts or changes in data over time.
217
List
A vector whose elements can be of any type.
218
Live data
Data that is automatically updated.
219
Loess smoothing (R)
A process used for smoothing plots with fewer than 1,000 points.
220
Log file
A computer-generated file that records events from operating systems and other software programs.
221
Logical operator
An operator that returns a logical data type.
222
Long data
A dataset in which each row is one time point per subject, so each subject has data in multiple rows.
223
Mandatory
A data value that cannot be left blank or empty.
224
Map
A data visualization that organizes data geographically.
225
Mapping (R)
The process of matching up a specific variable in a dataset with a specific aesthetic.
226
Margin of error
The maximum amount that sample results are expected to differ from those of the actual population.
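As a worked example, the margin of error for a survey proportion is commonly computed with the normal approximation z × sqrt(p × (1 − p) / n), where p is the sample proportion, n is the sample size, and z = 1.96 corresponds to a 95% confidence level. The survey numbers below are hypothetical, and the approximation assumes a simple random sample that is reasonably large.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Margin of error for a sample proportion (normal approximation).

    z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey: 540 of 1,000 respondents answered "yes".
n = 1000
p_hat = 540 / n
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.1%} ± {moe:.1%}, so the interval is [{low:.1%}, {high:.1%}]")
# 54.0% ± 3.1%, so the interval is [50.9%, 57.1%]
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.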
227
Markdown (R)
A syntax for formatting plain text files.
228
Mark
A visual object in a data visualization, such as a point, line, or shape.
229
MATCH
A spreadsheet function used to locate the position of a specific lookup value.
230
Math expression
A calculation that involves addition, subtraction, multiplication, or division (also called an equation).
231
Math function
A function that is used as part of a mathematical formula.
232
Matrix
A two-dimensional collection of data elements with rows and columns.
233
MAX
A spreadsheet function that returns the largest numeric value from a range of cells.
234
MAXIFS
A spreadsheet function that returns the maximum value from a given range that meets a specified condition.
235
McCandless Method
A method for presenting data visualizations that moves from general to specific information.
236
Measurable question
A question whose answers can be quantified and assessed.
237
Mental model
A data analyst’s thought process and approach to a problem.
238
Mentor
Someone who shares knowledge, skills, and experience to help another grow both professionally and personally.
239
Merger
An agreement that unites two organizations into a single new one.
240
Metadata
Data about data.
241
Metadata repository
A database created to store metadata.
242
Metric
A single, quantifiable type of data that is used for measurement.
243
Metric goal
A measurable goal set by a company and evaluated using metrics.
244
MID
A function that returns a segment from the middle of a text string.
245
MIN
A spreadsheet function that returns the smallest numeric value from a range of cells.
246
MINIFS
A spreadsheet function that returns the minimum value from a given range that meets a specified condition.
247
Modulo
An operator that returns the remainder when one number is divided by another; written % in SQL and many programming languages, and %% in R.
248
Movement
The design principle of arranging visual elements to guide the audience’s eyes from one part of a data visualization to another.
249
Naming conventions
Consistent guidelines that describe the content, creation date, and version of a file in its name.
250
Narrative
(Refer to Story)
251
Nested
Code that performs a particular function and is contained within code that performs a broader function.
252
Nested function
A function that is completely contained within another function.
253
Networking
Building relationships by meeting people both in person and online.
254
Nominal data
A type of qualitative data that is categorized without a set order.
255
Normalized database
A database in which only related data is stored in each table.
256
Notebook
An interactive, editable programming environment for creating data reports and showcasing data skills.
257
Null
An indication that a value does not exist in a dataset.
258
Observation
The attributes that describe a piece of data contained in a row of a table.
259
Observer bias
The tendency for different people to observe things differently (also called experimenter bias).
260
Open data
Data that is available to the public.
261
Open-source
Code that is freely available and may be modified and shared by the people who use it.
262
Openness
The aspect of data ethics that promotes the free access, usage, and sharing of data.
263
Operator
A symbol that names the operation or calculation to be performed.
264
ORDER BY
A SQL clause that sorts results returned in a query.
265
Order of operations
Using parentheses to group together spreadsheet values in order to clarify the order in which operations should be performed.
266
Ordinal data
Qualitative data with a set order or scale.
267
Outdated data
Any data that has been superseded by newer and more accurate information.
268
OUTER JOIN
A SQL clause that combines LEFT JOIN and RIGHT JOIN to return all records from both tables, matching rows where possible.
269
Outer query
A SQL statement containing a subquery.
270
Ownership
The aspect of data ethics that presumes individuals own the raw data they provide and have primary control over its usage, processing, and sharing.
271
Package (R)
A unit of reproducible R code.
272
Packed bubble chart
A data visualization that displays data in clustered circles.
273
Pattern
The design principle of using similar visual elements to demonstrate trends and relationships in a data visualization.
274
PHP (Hypertext Preprocessor)
A programming language for web application development.
275
Pie chart
A data visualization that uses segments of a circle to represent the proportions of each data category compared to the whole.
276
Pipe (R)
A tool in R for expressing a sequence of multiple operations, represented with “%>%”.
277
Pivot chart
A chart created from the fields in a pivot table.
278
Pivot table
A data summarization tool used to sort, reorganize, group, count, total, or average data.
279
Pixel
In digital imaging, a small area of illumination on a display screen that, when combined with other adjacent areas, forms a digital image.
280
Population
In data analytics, all possible data values in a dataset.
281
Portfolio
A collection of materials that can be shared with potential employers.
282
Pre-attentive attributes
The elements of a data visualization that an audience recognizes automatically without conscious effort.
283
Primary key
An identifier in a database that references a column in which each value is unique (Refer to foreign key).
284
Problem domain
The area of analysis that encompasses every activity affecting or affected by a problem.
285
Problem types
The various problems that data analysts encounter, including categorizing things, discovering connections, finding patterns, identifying themes, making predictions, and spotting something unusual.
286
Profit margin
A percentage that indicates how many cents of profit have been generated for each dollar of sales.
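A worked example of the calculation, as a minimal sketch: profit margin is profit divided by revenue. The figures are hypothetical, and which costs are subtracted (for example, for gross versus net profit) depends on the business.

```python
def profit_margin(revenue, costs):
    """Profit as a fraction of revenue: (revenue - costs) / revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return (revenue - costs) / revenue


# Hypothetical figures: $50,000 in sales and $42,000 in costs.
margin = profit_margin(50_000, 42_000)
print(f"{margin:.0%}")                                            # 16%
print(f"{margin * 100:.0f} cents of profit per dollar of sales")  # 16 cents of profit per dollar of sales
```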
287
Programming language
A system of words and symbols used to write instructions that computers follow.
288
Proportion
The design principle of using the relative size and arrangement of visual elements to demonstrate information in a data visualization.
289
Python
A general-purpose programming language.
290
Qualitative data
A subjective and explanatory measure of a quality or characteristic.
291
Quantitative data
A specific and objective measure, such as a number, quantity, or range.
292
Query
A request for data or information from a database.
293
Query language
A computer programming language used to communicate with a database.
294
R
A programming language used for statistical analysis, visualization, and other data analysis.
295
R Markdown
A file format for making dynamic documents with R.
296
R Notebook
A document for running code and displaying its output, such as the graphs and charts the code produces.
297
Random sampling
A way of selecting a sample from a population so that every member of the population has an equal chance of being chosen.
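A minimal sketch of simple random sampling with Python’s standard `random` module. `random.sample` draws without replacement, and every member of the population is equally likely to be chosen; the population of customer IDs and the seed value are hypothetical.

```python
import random

# Hypothetical population: 1,000 customer IDs.
population = list(range(1, 1001))

rng = random.Random(42)              # seeded so the same sample can be reproduced
sample = rng.sample(population, 50)  # 50 distinct members, each equally likely to be picked

print(len(sample), len(set(sample)))  # 50 50
```

Seeding a dedicated `random.Random` instance keeps the draw reproducible for a report without affecting other code that uses the module-level generator.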
298
Range
A collection of two or more cells in a spreadsheet.
299
Ranking
A system to position values of a dataset within a scale of achievement or status.
300
Record
A collection of related data in a data table, usually synonymous with row.
301
Redundancy
When the same piece of data is stored in two or more places.
302
Reframing
The process of restating a problem or challenge, then redirecting it toward a potential resolution.
303
Regular expression (RegEx)
A rule that says the values in a table must match a prescribed pattern.
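As an illustration of using a pattern as a validation rule, this sketch uses Python’s standard `re` module. The product-code format is a hypothetical rule invented for the example; `fullmatch` requires the whole value to fit the pattern, so a stray trailing space also fails.

```python
import re

# Hypothetical rule: a product code is three uppercase letters, a hyphen, and four digits.
PRODUCT_CODE = re.compile(r"[A-Z]{3}-[0-9]{4}")

codes = ["ABC-1234", "abc-1234", "AB-12345", "XYZ-0001 "]
valid = [code for code in codes if PRODUCT_CODE.fullmatch(code)]
invalid = [code for code in codes if not PRODUCT_CODE.fullmatch(code)]
print(valid)    # ['ABC-1234']
print(invalid)  # ['abc-1234', 'AB-12345', 'XYZ-0001 ']
```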
304
Relational database
A database that contains a series of tables that can be connected to form relationships
305
Relational operator
An operator used to compare values, also known as a comparator
306
Relativity
The process of considering observations in relation or proportion to something else
307
Relevant question
A question that has significance to the problem to be solved
308
Remove duplicates
A spreadsheet tool that automatically searches for and eliminates duplicate entries from a spreadsheet
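The same idea can be sketched outside a spreadsheet. The rows below are hypothetical, and this version treats a row as a duplicate only when every column matches, keeping the first occurrence in its original position.

```python
# Hypothetical spreadsheet rows: (customer_id, name)
rows = [
    (101, "Ana"),
    (102, "Ben"),
    (101, "Ana"),  # exact duplicate of the first row
    (103, "Cal"),
]

# dict.fromkeys keeps the first occurrence of each row and preserves row order
deduped = list(dict.fromkeys(rows))
print(deduped)  # [(101, 'Ana'), (102, 'Ben'), (103, 'Cal')]
```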
309
Repetition
The design principle of repeating visual elements to demonstrate meaning in a data visualization
310
Report
A static collection of data periodically given to stakeholders
311
Return on investment (ROI)
A formula that uses the metrics of investment and profit to evaluate the success of an investment
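A minimal sketch of the calculation, assuming the common form ROI = (net profit ÷ cost of investment) × 100. The function name `roi_percent` and the campaign figures are hypothetical.

```python
def roi_percent(net_profit: float, investment: float) -> float:
    """Return on investment as a percentage."""
    # Assumed formula: net profit / cost of investment * 100
    if investment == 0:
        raise ValueError("investment must be nonzero")
    return net_profit / investment * 100


# Hypothetical campaign: $2,000 invested, $500 net profit
print(roi_percent(500, 2_000))  # 25.0 (%)
```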
312
Revenue
The total amount of income generated by the sale of goods or services
313
Rhythm
The design principle of creating movement and flow in a data visualization to engage an audience
314
RIGHT
A function that returns a set number of characters from the right side of a text string
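A spreadsheet call such as RIGHT("SKU-48213", 5) returns the last five characters. The hypothetical helper below approximates that behavior with Python slicing. It handles 0 explicitly because `text[-0:]` would return the whole string, and it rejects negative counts, which a spreadsheet would treat as an error.

```python
def right(text: str, num_chars: int) -> str:
    """Approximate a spreadsheet RIGHT(text, num_chars)."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    if num_chars == 0:
        return ""  # text[-0:] would otherwise return the entire string
    # If num_chars exceeds the length, the slice simply returns the whole text
    return text[-num_chars:]


print(right("SKU-48213", 5))  # 48213
```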
315
RIGHT JOIN
A SQL join that returns all records from the right table and only the matching records from the left table
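The semantics can be illustrated with plain Python lists of dicts. This is a naive nested-loop sketch, not how a database engine executes the join. The tables `orders` and `customers` and the helper `right_join` are hypothetical, and `None` stands in for SQL NULL.

```python
# Hypothetical left table (orders) and right table (customers)
orders = [
    {"customer_id": 1, "order_id": "A1"},
    {"customer_id": 2, "order_id": "A2"},
    {"customer_id": 4, "order_id": "A3"},  # no matching customer: dropped
]
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cal"},  # no matching order: still kept
]


def right_join(left, right, key):
    """Return every row of `right`, merged with matching `left` rows."""
    left_cols = {col for row in left for col in row if col != key}
    joined = []
    for r in right:
        matches = [l for l in left if l[key] == r[key]]
        if not matches:
            # No match on the left: left-side columns are filled with None
            joined.append({**dict.fromkeys(left_cols), **r})
        for l in matches:
            joined.append({**l, **r})
    return joined


for row in right_join(orders, customers, "customer_id"):
    print(row)
```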
316
Root cause
The reason why a problem occurs
317
ROUND
A SQL function that returns a number rounded to a certain number of decimal places
318
Ruby
An object-oriented programming language for web application development
319
Sample
In data analytics, a segment of a population that is representative of the entire population
320
Sampling bias
Overrepresenting or underrepresenting certain members of a population as a result of working with a sample that is not representative of the population as a whole
321
Scatterplot
A data visualization that represents relationships between different variables with individual data points without a connecting line
322
Schema
A way of describing how something, such as data, is organized
323
Scope of work (SOW)
An agreed-upon outline of the tasks to be performed during a project
324
Second-party data
Data collected by a group directly from its audience and then sold
325
SELECT
The section of a query that indicates from which column(s) to extract the data
326
SELECT INTO
A SQL clause that copies data from one table into a temporary table without adding the new table to the database
327
Shiny (R)
An R package used to build interactive web apps with R code
328
Small data
Small, specific data points typically involving a short period of time, which are useful for making day-to-day decisions
329
SMART methodology
A tool for determining a question’s effectiveness based on whether it is specific, measurable, action-oriented, relevant, and time-bound
330
Smoothing (R)
A process used to make data visualizations in R clearer and more readable
331
Smoothing line (R)
A line on a data visualization that uses smoothing to represent a trend
332
Social media
Websites and applications through which users create and share content or participate in social networking
333
Soft skills
Nontechnical traits and behaviors that relate to how people work
334
Sort range
A spreadsheet menu function that sorts a specified range and preserves the cells outside the range
335
Sort sheet
A spreadsheet menu function that sorts all data by the ranking of a specific sorted column and keeps data together across rows
336
Sorting
The process of arranging data into a meaningful order to make it easier to understand, analyze, and visualize
337
Specific question
A question that is simple, significant, and focused on a single topic or a few closely related ideas
338
SPLIT
A spreadsheet function that divides text around a specified character and puts each fragment into a new, separate cell
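A rough Python analog, assuming a single-character delimiter. Google Sheets' SPLIT drops empty fragments by default, while `str.split` keeps them, so the hypothetical helper `split_cells` filters them out. Each list item corresponds to one cell.

```python
def split_cells(text: str, delimiter: str) -> list:
    """Approximate a spreadsheet SPLIT with a single-character delimiter."""
    # str.split keeps empty fragments (e.g. from ",,"), so filter them out
    return [part for part in text.split(delimiter) if part]


print(split_cells("Smith,John,,NY", ","))  # ['Smith', 'John', 'NY']
```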
339
Sponsor
A professional advocate who is committed to moving forward the career of another
340
Spotlighting
Scanning through data to quickly identify the most important insights
341
Spreadsheet
A digital worksheet
342
SQL
(Refer to Structured Query Language)
343
Stakeholders
People who invest time and resources into a project and are interested in its outcome
344
Static data
Data that doesn’t change once it has been recorded
345
Static visualization
A data visualization that does not change over time unless it is edited
346
Statistical power
The probability that a test of significance will recognize an effect that is present
347
Statistical significance
The determination that sample results are unlikely to be due to random chance alone
348
Statistics
The study of how to collect, analyze, summarize, and present data
349
Story
The narrative of a data presentation that makes it meaningful and interesting
350
String data type
A sequence of characters and punctuation that contains textual information (also called text data type)
351
Structural metadata
Metadata that indicates how a piece of data is organized and whether it is part of one or more than one data collection
352
Structured data
Data organized in a certain format such as rows and columns
353
Structured Query Language
A computer programming language used to communicate with a database
354
Structured thinking
The process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying options
355
Subquery
A SQL query that is nested inside a larger query
356
SUBSTR
A SQL function that extracts a substring from a string variable
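In most SQL dialects, SUBSTR(string, start, length) counts positions from 1. The hypothetical helper below mimics that with a Python slice. It assumes `start` is at least 1, because dialects differ on how they treat zero or negative starting positions.

```python
def substr(text: str, start: int, length: int) -> str:
    """Mimic SQL SUBSTR(text, start, length) for start >= 1."""
    # SQL positions start at 1; Python indexes start at 0, hence start - 1
    return text[start - 1 : start - 1 + length]


print(substr("Chicago, IL", 1, 7))   # Chicago
print(substr("Chicago, IL", 10, 2))  # IL
```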
357
Substring
A subset of a text string
358
Subtitle
Text that supports a headline by adding context and description
359
SUM
A spreadsheet function that adds the values of a selected range of cells
360
SUMIF
A spreadsheet function that adds numeric data based on one condition
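A spreadsheet formula such as =SUMIF(A2:A5, "East", B2:B5) adds the values in the second range only where the first range meets the condition. The same logic in Python, with hypothetical column data:

```python
# Hypothetical columns: region (condition range) and sales (sum range)
regions = ["East", "West", "East", "North"]
sales = [120.0, 80.0, 45.5, 60.0]

# Add sales only where the matching region equals "East"
east_total = sum(s for r, s in zip(regions, sales) if r == "East")
print(east_total)  # 165.5
```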
361
Summary table
A table used to summarize statistical information about data
362
SUMPRODUCT
A spreadsheet function that multiplies the corresponding elements of arrays and returns the sum of those products
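As an illustration, =SUMPRODUCT(A2:A4, B2:B4) over quantities and unit prices gives a total order value. The hypothetical helper below does the same for two equal-length lists and rejects mismatched sizes, as a spreadsheet would.

```python
def sumproduct(a, b):
    """Multiply corresponding elements of two equal-length arrays and sum the products."""
    if len(a) != len(b):
        raise ValueError("arrays must be the same size")
    return sum(x * y for x, y in zip(a, b))


quantities = [3, 5, 2]
unit_prices = [4.0, 2.5, 10.0]
print(sumproduct(quantities, unit_prices))  # 44.5 (12 + 12.5 + 20)
```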
363
Swift
A programming language for macOS, iOS, watchOS, and tvOS
364
Symbol map
A data visualization that displays a mark over a given longitude and latitude
365
Syntax
The predetermined structure of a language that includes all required words, symbols, and punctuation, as well as their proper placement
366
Tableau
A business intelligence and analytics platform that helps people visualize, understand, and make decisions with data
367
Technical mindset
The ability to break things down into smaller steps or pieces and work with them in an orderly and logical way
368
Temporary table
A database table that is created and exists temporarily on a database server
369
Text data type
A sequence of characters and punctuation that contains textual information (also called string data type)
370
Text string
A group of characters within a cell, most often composed of letters
371
Third-party data
Data provided from outside sources that did not collect it directly
372
Tibble (R)
A streamlined variation of data frames
373
Tidy data (R)
A way of standardizing the organization of data within R
374
Tidyverse (R)
A system of packages in R with a common design philosophy for data manipulation, exploration, and visualization
375
Time-bound question
A question that specifies a timeframe to be studied
376
Transaction transparency
The aspect of data ethics that presumes all data-processing activities and algorithms should be explainable and understood by the individual who provides the data
377
Transferable skills
Skills and qualities that can transfer from one job or industry to another
378
TRIM
A function that removes leading, trailing, and repeated spaces in data
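A minimal Python sketch of the same cleanup, under the assumption that only the space character is trimmed (tabs and line breaks are left alone). The helper name `trim` is hypothetical.

```python
def trim(text: str) -> str:
    """Remove leading, trailing, and repeated spaces."""
    # Splitting on " " yields empty pieces wherever spaces were leading,
    # trailing, or repeated; dropping them and rejoining leaves single spaces
    return " ".join(part for part in text.split(" ") if part)


print(repr(trim("  Jane   Doe ")))  # 'Jane Doe'
```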
379
TSV (Tab-separated values file)
A text file that stores a data table by separating columns of data with tabs
380
Turnover rate
The rate at which employees voluntarily leave a company
381
Typecasting
Converting data from one type to another
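For example, values read from a text file usually arrive as strings and must be converted before arithmetic. In SQL the equivalent conversion is commonly written with CAST. The values below are hypothetical.

```python
# Values read from a text file arrive as strings
raw_quantity = "42"
raw_price = "19.99"

quantity = int(raw_quantity)  # string -> integer
price = float(raw_price)      # string -> floating-point number
label = str(quantity)         # integer -> string

print(quantity * 2, type(price).__name__, label + "!")  # 84 float 42!
```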
382
Unbiased sampling
When the sample of the population being measured is representative of the population as a whole
383
Underscores
Lines used to underline words and connect text characters
384
Unfair question
A question that makes assumptions or is difficult to answer honestly
385
Unique
A value that can’t have a duplicate
386
United States Census Bureau
An agency in the U.S. Department of Commerce that serves as the nation’s leading provider of quality data about its people and economy
387
Unity
The design principle of using visual elements that complement each other to create aesthetic appeal and clarity in a data visualization
388
Unstructured data
Data that is not organized in any easily identifiable manner
389
Validity
The degree to which data conforms to constraints when it is input, collected, or created
390
VALUE
A spreadsheet function that converts a text string that represents a number to a numeric value
391
Variable (R)
A representation of a value in R that can be stored for later use
392
Variety
The design principle of using different kinds of visual elements in a data visualization to engage an audience
393
Vector (R)
A group of data elements of the same type stored in a one-dimensional sequence in R
394
Verification
A process to confirm that a data-cleaning effort was well executed and the resulting data is accurate and reliable
395
Video file
A collection of images, audio files, and other data usually encoded in a compressed format such as MP4, M4V, MOV, AVI, or FLV
396
Vignette (R)
Documentation for an R package that describes the problem the package is designed to solve, explains how its functions can be used, and lists any dependencies on other packages
397
Visual form
The appearance of a data visualization that gives it structure and aesthetic appeal
398
Visualization
(Refer to Data visualization)
399
VLOOKUP
A spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information
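An exact-match VLOOKUP can be sketched in Python. The price table and the helper `vlookup` are hypothetical. The column index is 1-based, as in a spreadsheet, and a missing key raises an error where a spreadsheet would show #N/A.

```python
# Hypothetical lookup range: the first column holds the search keys
price_table = [
    ["P-100", "Widget", 2.50],
    ["P-200", "Gadget", 7.25],
    ["P-300", "Gizmo", 4.00],
]


def vlookup(search_key, table, index):
    """Exact-match lookup: search the first column, return column `index` (1-based)."""
    for row in table:
        if row[0] == search_key:
            return row[index - 1]
    raise KeyError(search_key)  # a spreadsheet would display #N/A


print(vlookup("P-200", price_table, 3))  # 7.25
```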
400
WHERE
The section of a query that specifies criteria that the requested data must meet
401
Wide data
A dataset in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject
402
WITH
A SQL clause that creates a temporary table that can be queried multiple times
403
World Health Organization
An organization whose primary role is to direct and coordinate international health within the United Nations system
404
X-axis
The horizontal line of a graph usually placed at the bottom, which is often used to represent time scales and discrete categories
405
Y-axis
The vertical line of a graph usually placed to the left, which is often used to represent frequencies and other numerical variables
406
YAML
A human-readable language for structuring data, used for configuration files and metadata such as the header of an R Markdown file