Introduction to Data Literacy Flashcards

1
Q

What helps us learn how data can be used to connect the dots and create value?

A

Data Literacy

2
Q

The ability to read, work with, analyze, and communicate insights with data.

A

Data Literacy

3
Q

Three main components of data literacy?

A

Reading data
Working with and analyzing data
Communicating insights with data

4
Q

What does reading data consist of?

A

Identifying data sources
Collecting data
Managing data

5
Q

Allow you to store, organize, and share your data

A

Databases

6
Q

Main tools for communication?

A

Visualizations and Storytelling

7
Q

In the DIKW pyramid, this consists of raw observations or measurements?

A

Data

8
Q

In the DIKW pyramid, this is unorganized and unprocessed, and does not have meaning (yet).

A

Data

9
Q

In the DIKW pyramid, this refers to raw data placed into context.

A

Information

10
Q

In the DIKW pyramid, this is typically done by organizing or aggregating data.

A

Information
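
Study note: a minimal sketch of turning data into information by organizing and aggregating it. The readings and all variable names are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw observations: (city, temperature in deg C).
raw_data = [("Oslo", 4.0), ("Oslo", 6.0), ("Cairo", 30.0), ("Cairo", 34.0)]

# Organize: group the raw values by city.
by_city = defaultdict(list)
for city, temp in raw_data:
    by_city[city].append(temp)

# Aggregate: one average per city puts the raw numbers into context.
information = {city: mean(temps) for city, temps in by_city.items()}
print(information)
```

The individual readings say little on their own; the per-city averages are what a reader can start to interpret.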

11
Q

In the DIKW pyramid, this refers to combining information and making connections to learn and gain meaning.

A

Knowledge

12
Q

In the DIKW pyramid, this is typically done by detecting patterns, making generalizations or predictions.

A

Knowledge

13
Q

In the DIKW pyramid, this is applied knowledge, or knowledge in action, as it allows us to act proactively.

A

Wisdom

14
Q

In the DIKW pyramid, this is typically done by combining knowledge logically to determine the course of action.

A

Wisdom

15
Q

Characteristics of insights?

A

Allow us to get closer to wisdom
Valuable, realistically achieved
Apply knowledge and take action
Approached, but not quite reached

16
Q

The process of using data to make an informed decision about a specific problem and acting upon it.

A

Data-driven decision making

17
Q

5 main steps that underpin every data-driven process:

A

Problem statement
Data Collection
Data Analysis
Communication
Action and reflection

18
Q

Problem statement answers the question:

A

What is the problem that you want to solve?

19
Q

Step in data-driven decision making that guides the data-driven process?

A

Problem statement

20
Q

Typical problem categories:

A

Describing the state of an organization or process
Diagnosing causes of events
Detecting anomalies or predicting events

21
Q

Guiding questions on how to define a problem:

A

What is the current situation?
What do we need to know?
Where do we want to be?

22
Q

A good problem statement is:

A

Clearly defined
Actionable
Realistic

23
Q

Data comes in different forms

A

Images and text
Network and spatial data

24
Q

Different sources of data?

A

Open Data and Internal data

25
Open data includes:
Public databases and records
26
The data type has an effect on:
How to collect the data
How to store the data
How to analyze the data
27
Data in tabular form
Structured Data
28
Easy to search and organize
Structured Data
29
Requires less preprocessing
Structured Data
30
Stored in relational databases
Structured Data
31
Data without pre-defined structure
Unstructured data
32
Difficult to search and organize
Unstructured data
33
Requires more preprocessing
Unstructured Data
34
Stored in document databases
Unstructured Data
35
Examples of structured data
Spreadsheets
Data tables
36
Examples of unstructured data
Images
Videos
Sound
Text
37
Describes something with numbers
Quantitative
38
Can be measured or counted
Quantitative
39
Wider range of statistics and analysis methods
Quantitative
40
Describes something with categories
Qualitative
41
Can be observed
Qualitative
42
More restricted range of statistics and analysis methods
Qualitative
43
Allows the user to store, retrieve, and access the data
Database management system (DBMS)
44
Different types of databases
Relational vs. document databases
Data warehouse vs. data lake
44
Document databases store what type of data?
Unstructured data
45
Relational databases store what type of data?
Structured Data
46
Contains processed, organized data in preparation for future analysis
Data warehouse
47
Used to store raw data that has not been prepared yet.
Data Lake
48
Designing and optimizing database systems is typically the responsibility of a _______
Data engineer
49
Data is stored on remote servers and accessed over the internet
Data storage in cloud
50
Data storage in the cloud has services provided by a specialized third party
True
51
Cost-effective, but still dependent on a third party for security
True
52
The purpose of ___________ is to move data from one database to another.
Pipelines
53
Pipelines can automate data collection and storage via the _____________
ETL Process
54
ETL process stands for?
Extract, Transform, and Load
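
Study note: a minimal sketch of the three ETL steps using only the Python standard library. The in-memory CSV, the `sales` table, and the cleaning rules are assumptions chosen for illustration, not part of the course material.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source (a hypothetical CSV held in memory).
source = io.StringIO("name,amount\n alice ,10\nBob,5\n")
rows = list(csv.DictReader(source))

# Transform: tidy the names and convert amounts to integers.
clean = [(r["name"].strip().title(), int(r["amount"])) for r in rows]

# Load: write the cleaned rows into a target database (in-memory SQLite).
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
target.executemany("INSERT INTO sales VALUES (?, ?)", clean)
target.commit()

loaded = target.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
print(loaded)
```

A real pipeline would typically run on a schedule and read from and write to separate systems; the three stages stay the same.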
55
Making use of pipelines ensures what?
The availability of up-to-date and accurate data
56
Accessing and Retrieving data from databases?
Querying
57
Industry standard for querying?
SQL
58
SQL stands for?
Structured Query Language
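
Study note: a small sketch of querying with SQL, run through Python's built-in `sqlite3` module. The `customers` table and its values are hypothetical.

```python
import sqlite3

# Build a tiny example database in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", "PT", 120.0), ("Ben", "UK", 80.0), ("Caro", "PT", 60.0)],
)

# Query: total spend per country.
query = """
    SELECT country, SUM(spend)
    FROM customers
    GROUP BY country
    ORDER BY country
"""
result = conn.execute(query).fetchall()
print(result)
```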
59
Another way to leverage the data available in databases?
Dashboards
60
Alternative non-technical way of collecting, managing, and sharing data between teams.
Dashboards
61
Provides information at a glance?
Dashboards
62
Receives data from a linked database
Dashboards
63
Data is presented in a very visual way
Dashboards
64
A multipurpose tool used for exploratory analysis of the data and communicating
Dashboards
65
Dirty data is categorized as what?
Incorrect
Incomplete
Inconsistent
66
Caused by human error, technical issues, or issues with the data collection process
Dirty data
67
Consists of data that is incorrect or inconsistent
Data Errors
68
Data errors are typically caused by _____________ in recording the value or the format
Human or technical error
69
Techniques to counter data errors:
If the original value or valid format is known: correct the data
If unknown: drop the data
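
Study note: a sketch of the "correct if the valid format is known, otherwise drop" idea for date values. The list of known formats and the helper `clean_date` are hypothetical.

```python
from datetime import datetime

# Formats we assume are valid for this (made-up) data set.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]

def clean_date(raw):
    """Return the date in ISO format, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None

raw_dates = ["2023-01-05", "05/01/2023", "not a date"]

# Correct values with a known format; drop the ones that cannot be repaired.
cleaned = [d for d in (clean_date(r) for r in raw_dates) if d is not None]
print(cleaned)
```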
70
When data is incomplete, what do we call it?
Missing data
71
Missing data will be problematic if:
Many data points are missing
There are underlying patterns in the missing data
72
What techniques can we use to counter missing data?
Dropping data
Imputation
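
Study note: a sketch contrasting the two techniques on a made-up list of ages. Mean imputation is only one of several possible imputation strategies and is used here as an assumption.

```python
from statistics import mean

# Hypothetical column with missing values recorded as None.
ages = [30, None, 20, None, 40]

# Technique 1: drop the missing entries.
dropped = [a for a in ages if a is not None]

# Technique 2: impute, here by filling gaps with the mean of the known values.
fill_value = mean(dropped)
imputed = [a if a is not None else fill_value for a in ages]

print(dropped)
print(imputed)
```

Dropping shrinks the data set, while imputation keeps its size but introduces estimated values.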
73
Societal bias can be reflected in data
Data Bias
74
Leads to unrepresentative data and results
Data Bias
75
Techniques to counter or avoid data bias:
Sound data collection process
Awareness in conclusions
Explainable AI models
76
Set of techniques to counter data problems
Data Cleaning
77
Important preparation step for any data analysis
Data Cleaning
78
Not all data problems are completely solvable
True
79
Four main types of analytics:
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
80
What is being asked in Descriptive analytics?
What happened?
81
What is being asked in Diagnostic Analytics?
Why is it happening?
82
What is being asked in Predictive Analytics?
What will happen?
83
What is being asked in Prescriptive Analytics?
What should we do?
84
What type of analytics is responsible for finding the root causes of events?
Diagnostic Analytics
85
What type of analytics summarizes and visualizes the data?
Descriptive Analytics
86
What type of analytics identifies the possible outcomes and the probability that they will happen?
Predictive Analytics
87
What type of analytics determines the best course of action given the outcome we want to achieve?
Prescriptive Analytics
88
Common techniques for Descriptive analytics
Descriptive statistics
Visualizations
Outlier detection
EDA
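
Study note: a sketch of descriptive statistics and a simple outlier check. The values are made up, and flagging points more than 2 standard deviations from the mean is a rule of thumb assumed here, not a rule from the cards.

```python
from statistics import mean, median, stdev

# Hypothetical measurements with one suspicious value.
values = [10, 12, 11, 13, 12, 50]

# Descriptive statistics summarize the data.
summary = {
    "mean": mean(values),
    "median": median(values),
    "stdev": stdev(values),
}

# Outlier detection: flag values far from the mean (assumed threshold: 2 stdevs).
z_threshold = 2
outliers = [
    v for v in values
    if abs(v - summary["mean"]) / summary["stdev"] > z_threshold
]

print(summary)
print(outliers)
```

Note how the single large value pulls the mean well above the median.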
89
Why should we use descriptive analytics?
Get to know the data
Investigate relationships in the data
Preparation for more advanced techniques
90
Focuses on exploring the data: assessing main characteristics; finding relationships, patterns, or groups; suggesting hypotheses for future analysis
Exploratory Data Analysis
91
Groundwork for further analysis but also valuable on its own
EDA
92
Why use diagnostic analytics?
Find potential causes of events or reasons for behaviors
Investigate causal relationships
Suggest solutions based on the identified causes
93
Common techniques of Diagnostic Analytics:
Drill-down analytics
Correlation and regression analysis
Hypothesis testing
Root cause analysis
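
Study note: a sketch of correlation analysis using a hand-written Pearson coefficient. The helper `pearson` and the ad-spend and sales figures are hypothetical. A high correlation suggests a relationship worth investigating; it does not prove causation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical weekly figures.
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson(ad_spend, sales)
print(round(r, 3))
```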
94
Formal set of steps to look beyond superficial causes that have a direct effect
Root cause Analysis
95
Steps of Root cause analysis
Define the event
Collect relevant data
Determine contributing factors
Find root causes
Recommend possible solutions
96
Why use Predictive analytics?
Anticipate most likely outcomes
Forecast a process or sequence
Estimate an unknown based on the information that is available
97
Two types of machine learning models:
Classification-based
Regression-based
98
Common techniques used in Predictive Analytics:
Machine learning models
Time series forecasting
Predictive text analysis
99
Predicting housing prices based on neighborhood characteristics
Regression-based
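
Study note: a sketch of a regression-based prediction using ordinary least squares with one feature. Using a single feature (home size in square meters) and these numbers is an assumption made to keep the example short; real models would use several neighborhood characteristics.

```python
# Hypothetical training data: home size (m^2) and price (in thousands).
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

# Fit price = slope * size + intercept by least squares.
n = len(sizes)
mean_x, mean_y = sum(sizes) / n, sum(prices) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    / sum((x - mean_x) ** 2 for x in sizes)
)
intercept = mean_y - slope * mean_x

def predict_price(size):
    """Predict a numeric outcome, which is what makes this regression."""
    return slope * size + intercept

print(predict_price(100))
```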
99
Predicting cancellation of subscriptions
Classification-based
99
Predicting sales revenue over time
Time series Forecasting
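
Study note: a sketch of a very simple time series forecast. A moving average over the last three periods is assumed here as the forecasting method, and the revenue figures are made up; practical forecasts usually also model trend and seasonality.

```python
# Hypothetical monthly sales revenue, oldest first.
revenue = [100, 110, 120, 130, 140, 150]

# Forecast the next month as the average of the most recent window.
window = 3
forecast = sum(revenue[-window:]) / window
print(forecast)
```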
100
Predicting whether an email is spam or not
Predictive text analysis
101
Steps in Predictive Modeling
Define the outcome
Collect and prepare data
Build predictive model
Interpret and evaluate the model
Implement / fine-tune
102
In the predictive modeling phase, data is split into __________________ to build the predictive model
Training and Test Set
103
Predictions are interpreted and evaluated on the test data, using pre-determined metrics like accuracy (the percentage of correct predictions)
True
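
Study note: a sketch of splitting data into training and test sets and measuring accuracy on the test set. The data, the 70/30 split, and the simple threshold "model" are all assumptions for illustration.

```python
import random
from statistics import mean

# Hypothetical labeled data: (feature value, label), label is 1 when value >= 5.
data = [(x, int(x >= 5)) for x in range(10)]

# Shuffle reproducibly, then split 70% training / 30% test.
rng = random.Random(0)
shuffled = data[:]
rng.shuffle(shuffled)
split = len(shuffled) * 7 // 10
train, test = shuffled[:split], shuffled[split:]

# "Build" a toy model on the training set only:
# a threshold halfway between the two class means.
mean_0 = mean([x for x, y in train if y == 0])
mean_1 = mean([x for x, y in train if y == 1])
threshold = (mean_0 + mean_1) / 2

# Evaluate on the held-out test set: accuracy = share of correct predictions.
predictions = [int(x >= threshold) for x, _ in test]
accuracy = mean([int(p == y) for p, (_, y) in zip(predictions, test)])
print(accuracy)
```

Keeping the test set out of model building is what makes the accuracy an honest estimate of performance on new data.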
104
Primary purpose of prescriptive analytics
To help decide what best to do
105
Why use prescriptive analytics?
Make informed, data-driven decisions
Optimize processes
Mitigate risks
106
Common techniques used in Prescriptive Analytics
Rule-based systems
Reinforcement learning
Scenario and simulation analysis
107
Consist of generating a set of rules or decision logic to get the best outcome
Rule-based systems
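
Study note: a sketch of a rule-based system for an inventory decision. The function `recommend_action` and its thresholds are hypothetical.

```python
def recommend_action(stock, daily_demand):
    """Apply fixed decision rules to choose an action (assumed thresholds)."""
    days_left = stock / daily_demand
    if days_left < 3:
        return "reorder now"
    if days_left < 7:
        return "schedule reorder"
    return "no action"

print(recommend_action(10, 5))
print(recommend_action(30, 5))
print(recommend_action(100, 5))
```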
108
An algorithm learns to achieve a particular objective or optimize an outcome by receiving positive and negative feedback when running through a set of actions.
Reinforcement Learning
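
Study note: a toy sketch of learning from positive and negative feedback. It uses an epsilon-greedy strategy on two actions with fixed rewards; the actions, rewards, and parameters are assumptions, and real reinforcement learning problems are far richer.

```python
import random

# Hypothetical environment: action "A" gives positive feedback, "B" negative.
rewards = {"A": 1.0, "B": -1.0}

values = {action: 0.0 for action in rewards}  # estimated value per action
counts = {action: 0 for action in rewards}
rng = random.Random(42)
epsilon = 0.1  # share of steps spent exploring at random

for _ in range(200):
    if rng.random() < epsilon:
        action = rng.choice(list(rewards))        # explore
    else:
        action = max(values, key=values.get)      # exploit best estimate
    reward = rewards[action]
    counts[action] += 1
    # Update the running average of the feedback for this action.
    values[action] += (reward - values[action]) / counts[action]

best_action = max(values, key=values.get)
print(best_action)
```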
109
Running through a set of pre-determined scenarios or simulating multiple outcomes to help select the decision that leads to the best outcome
Scenario and simulation analysis
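
Study note: a sketch of scenario analysis for a stocking decision. The scenarios, prices, and the choice to compare decisions by average profit are assumptions for illustration.

```python
# Pre-determined scenarios: units of demand (hypothetical).
scenarios = {"low": 80, "expected": 100, "high": 120}
# Candidate decisions: units to stock.
decisions = [80, 100, 120]
price, cost = 10, 6

def profit(stock, demand):
    """Revenue from units actually sold minus the cost of all stocked units."""
    return price * min(stock, demand) - cost * stock

# Run every decision through every scenario and average the outcomes.
avg_profit = {
    stock: sum(profit(stock, d) for d in scenarios.values()) / len(scenarios)
    for stock in decisions
}
best_decision = max(avg_profit, key=avg_profit.get)
print(avg_profit)
print(best_decision)
```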
110
Predicts interest based on past behavior and provides recommendations based on predicted interests
Recommendation engine
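
Study note: a sketch of a tiny recommendation engine. It scores unseen items by how much other users' histories overlap with the target user's; the users, items, and scoring rule are hypothetical and much simpler than production systems.

```python
from collections import Counter

# Hypothetical past behavior: courses each user has taken.
history = {
    "ana": {"python", "sql"},
    "ben": {"python", "sql", "tableau"},
    "caro": {"sql", "tableau"},
}

def recommend(user):
    """Rank items the user has not seen, weighted by overlap with other users."""
    seen = history[user]
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(seen & items)   # similarity of past behavior
        for item in items - seen:     # items the user has not seen yet
            scores[item] += overlap
    return [item for item, _ in scores.most_common()]

print(recommend("ana"))
```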