Study Flashcards

(43 cards)

1
Q

Also known as the discovery phase

A

Business understanding

2
Q

Analyst defines the major questions of interest that need to be answered

A

Business understanding

3
Q

The phase of collecting data

A

Data acquisition

4
Q

Alternative names include data cleansing, data wrangling, data munging, and feature engineering

A

Data cleaning

5
Q

When ignored, the results of the analysis may be irrelevant
No single common tool; may use SQL, Python, R, or Excel
Data quality is measured in terms of uniqueness and relevance

A

Data cleaning

6
Q

Analyst begins to understand the basic nature of data and the relationships within
Often relies on visualization tools and numerical summaries such as central tendency and variability
Central tendency is a single value that attempts to describe a set of data by identifying the central position
Variability describes how far apart data points lie from each other and from the center of a distribution

A

Data exploration
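
As a study aid, the numerical summaries named on this card can be sketched with Python's standard `statistics` module. The `orders` data below is hypothetical and exists only for illustration.

```python
import statistics

# Hypothetical sample: daily order counts for one week.
orders = [12, 15, 11, 15, 19, 14, 40]

# Central tendency: single values that describe the "center" of the data.
mean_orders = statistics.mean(orders)      # arithmetic average
median_orders = statistics.median(orders)  # middle value once sorted
mode_orders = statistics.mode(orders)      # most frequent value

# Variability: how far the values spread from each other and from the center.
value_range = max(orders) - min(orders)
std_dev = statistics.stdev(orders)         # sample standard deviation

print(mean_orders, median_orders, mode_orders, value_range, round(std_dev, 2))
```

The single large value (40) pulls the mean above the median, which is one reason exploration usually looks at more than one summary.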

7
Q

Creating models that enable predictions of outcomes of interest
Tools such as Python and R play an important role in automating the training and use of models

A

Predictive modeling

8
Q

Sometimes machine learning is used as a synonym

A

Data mining

9
Q

Ability of computers to look for patterns in large amounts of data
Tools such as Python and R play an important role

A

Data mining

10
Q

An analyst tells the story of the data and uses graphs or interactive dashboards to inform others of the findings from the analyses

A

Reporting and visualization

11
Q

The goal is to provide actionable insights for various stakeholders

A

Reporting and visualization

12
Q

Scope Project
Identify stakeholders and research questions/KPIs
Identify timeline, budget, and participants

A

Business Understanding

13
Q

Gather/collect data from a variety of sources
Provide structure to data accessible via relational databases (SQL)
Build data pipeline (ETL)
Use of API to download data from an external source

A

Data acquisition
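
A minimal sketch of the extract, transform, load (ETL) idea on this card, using Python's built-in `sqlite3` module. The `raw_rows` records, the `sales` table, and the `transform` helper are all hypothetical; in practice the raw rows might come from a file or an external API.

```python
import sqlite3

# Extract: hypothetical raw records standing in for data pulled from a source.
raw_rows = [
    {"customer": " Alice ", "amount": "19.99"},
    {"customer": "Bob", "amount": "5.00"},
    {"customer": "alice", "amount": "7.51"},
]

def transform(row):
    """Normalize customer names and convert amounts from text to numbers."""
    return (row["customer"].strip().title(), float(row["amount"]))

# Load: give the data structure by storing it in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [transform(r) for r in raw_rows])

# The structured data can now be queried with SQL.
totals = conn.execute(
    "SELECT customer, ROUND(SUM(amount), 2) FROM sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()
print(totals)
```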

14
Q

Estimate/project future values or likelihood of an event.
Extend correlations found in EDA to mathematical models
Predict/determine output values based on input values
Cross-validation of predictive models to ensure accuracy.

A

Predictive Modeling
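
To make "predict output values from input values" and "cross-validation" concrete, here is a small sketch in plain Python. It fits a straight line by ordinary least squares and scores it with k-fold cross-validation. The `spend` and `sales` numbers and both function names are hypothetical, and real projects would more likely use a library than hand-written code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cross_validate(xs, ys, k):
    """Return the mean squared error on the held-out fold for each of k folds."""
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))  # every k-th point is held out
        pairs = list(enumerate(zip(xs, ys)))
        train = [p for i, p in pairs if i not in test_idx]
        test = [p for i, p in pairs if i in test_idx]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
        errors.append(mse)
    return errors

# Hypothetical data: advertising spend (x) versus sales (y), roughly y = 2x + 1.
spend = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sales = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2]
fold_errors = cross_validate(spend, sales, k=3)
print(fold_errors)
```

Each fold is scored on points the model never saw during fitting, so consistently low errors suggest the model generalizes beyond its training data.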

15
Q

Creating training and testing datasets to build models from
Identify/detect patterns
Determine if groups (clusters) exist in data
Classify data into groups
Create models that “learn” and improve (e.g., machine/deep learning, AI, etc.)

A

Data mining

16
Q

Tell a story with data
Provide a summary of the analysis
Provide insights to stakeholders
Create insightful graphs that showcase trends and forecasts

A

Reporting and visualization

17
Q

What happened?

A

Descriptive Analytics

18
Q

Why did it happen?

A

Diagnostic Analytics

19
Q

What will happen?

A

Predictive Analytics

20
Q

How can we make it happen?

A

Prescriptive Analytics

21
Q

A relationship between two variables: when one variable changes, you know the degree to which the other variable changes

A

Correlation

22
Q

Is when there is a real-world explanation for why this is logically happening; it implies a cause and effect

A

Causation

23
Q

Which phase of the data analytics life cycle is also known as the discovery phase?

A

Business understanding

24
Q

Which phase of the data analytics life cycle allows an analyst to use graphs or interactive dashboards to tell the story of the data?

A

Reporting and visualization

25
In which phase of the data analytics life cycle does the analyst begin to understand the nature of the data?
Data exploration
26
Which phase of the data analytics life cycle provides structure to data accessible via relational databases?
Data acquisition
27
Which term is defined as a relationship between two variables?
Correlation
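
One common way to measure the degree of a relationship between two variables is the Pearson correlation coefficient, sketched here in plain Python. The `hours` and `scores` data and the `pearson_r` helper are hypothetical, and the card itself does not name a specific coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect positive linear relationship, -1 a perfect negative one."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied versus exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 80, 88]
r = pearson_r(hours, scores)
print(round(r, 3))
```

A value near +1 or -1 shows that the variables move together; on its own it does not show that one causes the other.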
28
A way to graph numerical data in groups or bins that allow bars to represent frequencies
Histogram
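
The binning behind a histogram can be sketched without a plotting library. The `ages` data and the `histogram` helper below are hypothetical; each `#` stands for one value in the bin.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical data: ages of ten survey respondents.
ages = [21, 25, 29, 31, 34, 35, 38, 42, 47, 58]
bins = histogram(ages, bin_width=10)

# Text rendering: bar length represents the frequency in each bin.
for start, count in bins.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```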
29
Provides a concise summary of the quartiles of numerical data (i.e., cut points that divide the data into four segments of 25% each)
Boxplot
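
The numbers a boxplot draws can be computed with `statistics.quantiles` from the standard library. The `values` data is hypothetical, and the 1.5 × IQR rule for flagging outliers is a common convention assumed here, not something the card states.

```python
import statistics

# Hypothetical data set with a few extreme values.
values = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 95]

# Three cut points split the sorted data into four 25% segments.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # interquartile range: the height of the "box"

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, q2, q3, outliers)
```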
30
Colorful graph that can visually show frequency or interaction using a range of colors
Heatmap
31
Two-dimensional graph
Great for visualizing correlation or relationships
Scatterplot
32
Predict an outcome based on a set of predictor variables
Regression
33
Technique in which the analyst wants to assign an item to a specific category based on various conditions
Classification
34
Groupings are unknown and the analyst wishes to determine if the objects belong to any groups
Clustering
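
As one illustration of finding groups that are not known in advance, here is a sketch of k-means on one-dimensional data. The card does not name an algorithm, so k-means is an assumption, as are the `kmeans_1d` helper, the data, and the starting centers.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical data with two visible groups; starting centers are guesses.
points = [1, 2, 3, 10, 11, 12]
final_centers, clusters = kmeans_1d(points, centers=[1, 12])
print(final_centers, clusters)
```

No labels are supplied; the groups emerge from the distances alone, which is what separates clustering from classification.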
35
Looks for trends in data over time
Focused on breaking apart different reasons for the variation (decomposition)
Time series
36
Technique that attempts to group variables into meaningful groups
Principal Component Analysis
37
Tool - Data Science (Deep Learning/AI), Web Development, Embedded System
Python
38
Tool - Data analysis and statistical modeling
R
39
Tool - Can easily perform matrix computation as well as optimization
Python
40
Tool - Consists of many easy-to-use packages
R
41
Key Characteristic: Often numbers or labels, stored in a structured framework of columns and rows relating to pre-set parameters
Typical File Types: Databases
Structured Data File Type
42
Key Characteristic: Loosely organized into categories using meta tags
Typical File Types: JSON, XML, Email, Web pages
Semi-structured Data File Type
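
A short sketch of why JSON counts as semi-structured: the keys act as tags, but records need not share the same fields. The `raw` document and the `columns` list are hypothetical; the code flattens the records into fixed rows, as a structured table would require.

```python
import json

# Hypothetical JSON document: the second record has no "tags" field.
raw = '[{"id": 1, "name": "Ada", "tags": ["vip"]}, {"id": 2, "name": "Lin"}]'
records = json.loads(raw)

# Impose a fixed set of columns; missing fields become None.
columns = ["id", "name", "tags"]
rows = [tuple(rec.get(col) for col in columns) for rec in records]
print(rows)
```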
43
Key Characteristic: Text-heavy information that’s not organized in a clearly defined framework or model
Typical File Types: Audio, Video, Image data, Natural Language, Documents
Unstructured Data File Type