The Data Analytics Journey Flashcards

WGU Class D596

1
Q

Quantitative data

A

Quantitative data represents numerical values that can be measured or counted. It answers questions like “How many?” or “How much?”

2
Q

Discrete data

A

Discrete data consists of countable values that are distinct and separate; they cannot take on values between the defined points.
Examples: the number of students in a class or pets in a home.

3
Q

Continuous data

A

Continuous data is a type of quantitative data that can take on any value within a range.
Examples: height, temperature, time.

4
Q

Categorical data

A

Categorical data represents categories or labels rather than numerical values.
Can be nominal or ordinal

5
Q

Nominal data

A

Categories have no natural order.
Examples: colors, types of pets, marital status, favorite sports, car brands.

6
Q

Ordinal data

A

Categories have a meaningful order, but the intervals between them are not necessarily equal.
Examples: ratings (poor, fair, good, excellent), educational levels (high school, bachelor’s, master’s).
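A minimal Python sketch of what "meaningful order" means in practice: each label is mapped to a rank so the labels can be sorted. The rating order comes from the card's example; the function name `sort_ratings` and the sample responses are made up for illustration.

```python
# Ordinal labels carry an order, so each label is given a rank.
RATING_ORDER = ["poor", "fair", "good", "excellent"]
RANK = {label: position for position, label in enumerate(RATING_ORDER)}


def sort_ratings(ratings):
    """Sort rating labels from lowest to highest using their ordinal rank."""
    return sorted(ratings, key=lambda label: RANK[label])


# Hypothetical survey responses, invented for this example.
responses = ["good", "poor", "excellent", "fair", "good"]
print(sort_ratings(responses))
```

Note that the ranks only encode order. Subtracting them would wrongly imply that the gaps between ratings are equal.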

7
Q

Part to whole

A

Shows how individual parts contribute to the whole. Great for when you want to display proportions or percentages.
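A minimal Python sketch of the part-to-whole idea, turning raw counts into percentage shares of the total. The function name `percent_shares` and the sample counts are made up for illustration.

```python
def percent_shares(counts):
    """Convert raw counts per category into percentage shares of the total."""
    total = sum(counts.values())
    return {category: 100 * value / total for category, value in counts.items()}


# Hypothetical counts, invented for this example.
pet_counts = {"dogs": 6, "cats": 3, "fish": 1}
print(percent_shares(pet_counts))
```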

8
Q

Distribution

A

Shows how values in a dataset are spread across a range. Useful for understanding the spread, skewness, or patterns in your data.
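A minimal Python sketch of how a distribution can be summarized: values are counted into fixed-width bins, which is the tally a histogram draws. The function name `bin_counts`, the bin width, and the sample values are assumptions for illustration.

```python
def bin_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = {}
    for value in values:
        start = (value // bin_width) * bin_width  # left edge of this value's bin
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical ages, invented for this example.
ages = [1, 3, 12, 15, 18, 27]
print(bin_counts(ages, 10))
```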

9
Q

Nominal Comparison

A

Compares values across categorical (nominal) variables that have no specific order. Useful for comparing quantities between categories.

10
Q

Time Series

A

Data collected over time (e.g., daily, monthly, yearly) to track trends or patterns. Useful for analyzing how data changes over time.

11
Q

Correlation

A

Shows the relationship between two variables, indicating whether they move together (positive correlation), move oppositely (negative correlation), or show no relationship.
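A minimal Python sketch of one common way to quantify this relationship, the Pearson correlation coefficient. The card does not name a specific measure, so Pearson's r is an assumption here, as are the function name `pearson_r` and the sample data. Values near +1 indicate a positive correlation, values near -1 a negative one, and values near 0 little or no linear relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical paired measurements, invented for this example.
hours_studied = [1, 2, 3, 4]
exam_scores = [2, 4, 6, 8]
print(pearson_r(hours_studied, exam_scores))
```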

12
Q

Ranking

A

Compares items in a dataset by sorting them in ascending or descending order. Useful for highlighting the relative positions or hierarchy of categories.

13
Q

Deviation

A

Shows how data deviates from a baseline, expected value, or the mean. Useful for highlighting differences or anomalies in the data.

14
Q

What charts are good for deviation?

A

Diverging bar chart
Line chart (with baseline or reference line)
Error bars

15
Q

What charts are good for ranking?

A

Bar chart (sorted by value)
Column chart
Dot plot

16
Q

What charts are good for correlation?

A

Scatter plot
Bubble chart
Heatmap

17
Q

What charts are good for time series?

A

Line chart
Area chart

18
Q

What charts are good for nominal comparison?

A

Bar chart
Column chart

19
Q

What charts are good for distribution?

A

Histogram
Box plot
Violin plot

20
Q

What charts are good for part-to-whole?

A

Pie chart
Donut chart
Stacked bar chart (with percentages)

21
Q

What are visual elements to use when designing charts?

A

Similarity & Contrast
Dominance & Emphasis
Scale & Proportion
Hierarchy
Balance & Symmetry

22
Q

Regression

A

Regression is a technique that allows an analyst to predict an outcome (either numerical or categorical) based on a set of predictor variables.

23
Q

Regression analysis

A

A statistical method that identifies the relationship between variables.

24
Q

Classification

A

A type of supervised machine learning task where the goal is to predict a categorical label for a given input based on a set of features. It involves assigning items to predefined classes or categories based on their characteristics.
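A minimal Python sketch of classification using k-nearest neighbors, chosen here only as one simple example of a classifier. The function name `knn_predict`, the value of `k`, and the labeled points are assumptions for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    labeled points. `train` is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: two features per item, two predefined classes.
training_data = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
print(knn_predict(training_data, (2, 2)))
```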

25
Clustering
An unsupervised learning algorithm used to group data points into clusters based on their similarity without prior knowledge of labels.
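A minimal Python sketch of clustering with k-means on one-dimensional data, chosen as one simple example. The function name `kmeans_1d`, the starting centers, and the sample values are assumptions for illustration. No labels are supplied; groups emerge from similarity (distance) alone.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around `centers` by repeating two steps:
    assign each value to its nearest center, then move each center
    to the mean of the values assigned to it."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # An empty cluster keeps its previous center.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled values and starting centers, invented for this example.
final_centers, final_clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(final_centers, final_clusters)
```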
26
T or F: Decision Trees are an example of Clustering
False. Decision trees are a supervised learning algorithm used for classification and regression tasks. A decision tree creates a model that predicts the value of a target variable by learning simple decision rules inferred from the data.
27
What is the classification process?
The process typically involves data preparation, feature selection/extraction, model training, prediction, and evaluation.
28
Market Basket Analysis
A data mining technique used to understand purchasing behavior by identifying relationships between items in a transaction
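A minimal Python sketch of two measures commonly used in market basket analysis, support and confidence. The card does not name specific measures, so these, the function names, and the sample transactions are assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for transaction in transactions if items <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets, invented for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```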
29
Process Mining
A technique that analyzes event logs from business processes to identify inefficiencies, bottlenecks, or opportunities for optimization.
30
T-Test
A statistical test used to determine if there is a significant difference between the means of two groups.
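A minimal Python sketch that computes only the t statistic for two independent groups, using Welch's form (which does not assume equal variances). The choice of Welch's variant, the function name `welch_t`, and the sample data are assumptions; turning the statistic into a p-value requires the t distribution and is not shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical measurements for two groups, invented for this example.
control = [1, 2, 3, 4, 5]
treatment = [2, 3, 4, 5, 6]
print(welch_t(control, treatment))
```

A larger absolute value of t means the difference between the group means is large relative to the variability within the groups.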
31
Text Mining
The process of extracting meaningful information from unstructured text data.
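A minimal Python sketch of one basic text mining step, counting word frequencies in raw text. The function name `word_counts`, the simple tokenization rule, and the sample sentence are assumptions for illustration; real pipelines usually add steps such as stop-word removal.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical unstructured text, invented for this example.
review = "The cat saw the other cat."
print(word_counts(review).most_common(2))
```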
32
Neural Networks
A machine learning model inspired by the human brain, consisting of layers of interconnected "neurons" that learn patterns in data.
33
Principal Component Analysis (PCA)
A dimensionality reduction technique that simplifies data by converting it into principal components (uncorrelated variables).
34
Supervised Learning
A machine learning approach where the model is trained on labeled data to predict outcomes for new data.
35
Regression ML
A supervised learning technique used to predict a continuous output (numerical value) based on input features.
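A minimal Python sketch of regression with a single input feature, fitting a straight line by ordinary least squares. The function names `fit_line` and `predict` and the sample data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict a continuous output for a new input value."""
    return slope * x + intercept


# Hypothetical training data, invented for this example.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(predict(5, slope, intercept))
```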
36
Unsupervised learning
A machine learning approach where the model works with unlabeled data to discover patterns or structures.
37
Time Series Model
A model designed to analyze and predict values that change over time.
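A minimal Python sketch of one simple time series technique, a moving average that smooths short-term fluctuations so the trend is easier to see. This is a smoothing step, not a full forecasting model; the function name `moving_average`, the window size, and the sample series are assumptions for illustration.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive observations."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly values, invented for this example.
monthly_sales = [1, 2, 3, 4, 5]
print(moving_average(monthly_sales, 3))
```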
38
What are these algorithms examples of: K-Means, DBSCAN, Hierarchical?
Clustering, a type of unsupervised machine learning used to group data points into clusters.
39
What model is used for forecasting and detecting seasonal patterns?
Time Series Model
40
What are decision trees, support vector machines, and k-Nearest Neighbors examples of?
Classification
41
What is used for image and/or speech recognition, or predictive analytics?
Neural networks
42
Google Sheets, MySQL, and sales data are examples of?
Structured data
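A minimal Python sketch of why structured data suits SQL: rows with fixed columns can be loaded into a table and queried directly. SQLite (through Python's built-in `sqlite3` module) stands in for MySQL here; the table name, columns, function name, and rows are assumptions for illustration.

```python
import sqlite3


def total_sales_by_region(rows):
    """Load (region, amount) rows into a table and total them per region."""
    connection = sqlite3.connect(":memory:")  # temporary in-memory database
    connection.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    connection.close()
    return result


# Hypothetical sales rows, invented for this example.
sales_rows = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
print(total_sales_by_region(sales_rows))
```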
43
What is semi-structured data?
Data that does not follow a rigid structure but still has some level of organization, typically using tags or markers to separate elements
44
What are examples of semi-structured data?
JSON files, XML files, emails
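A minimal Python sketch of working with semi-structured data: JSON records share a general shape, but individual records may omit fields, so the code has to tolerate gaps. The function name `extract_field` and the sample records are assumptions for illustration.

```python
import json


def extract_field(raw_json, field):
    """Read one field from every record; records missing it yield None."""
    records = json.loads(raw_json)
    return [record.get(field) for record in records]


# Hypothetical records: the second one has no "email" key.
raw = '[{"name": "Ada", "email": "ada@example.com"}, {"name": "Lin", "tags": ["vip"]}]'
print(extract_field(raw, "email"))
```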
45
What is unstructured data?
Data that does not follow any predefined format or structure, making it difficult to store and analyze in traditional databases.
46
T or F: You use SQL on semi-structured and unstructured data.
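False, as a general rule. Semi-structured and unstructured data are usually handled with tools such as MongoDB or Apache Hive (although Hive's query language, HiveQL, is SQL-like).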
47
What is AutoML?
Automated Machine Learning (AutoML) is a framework or set of tools that automates the process of developing, training, tuning, and deploying machine learning (ML) models.
48
What are keys to managing stakeholders in a project?
Obtain a project sponsor.
Identify project stakeholders and group them by power, influence, and need.
Survey other stakeholders and create an engagement map.
Pinpoint stakeholder frustrations and visions of success when talking with and interviewing stakeholders.
49
What are keys to communicating the data effectively?
Continue learning about the business.
Tie the data to the business question asked.
Avoid granularity.
Make data easy to consume.
Ask for feedback.
Don't discuss technical details unless they are important to the business question.
50
What is the key to persuasion?
Communication, emotional intelligence, active listening, logic and reasoning, interpersonal skills, and negotiation
51
What questions should we ask ourselves when posing a question?
Does the receiver understand what is asked, have you phrased the question based on what the receiver knows or may not know, is the question logical, and is your tone neutral?
52
How do you summarize what you hear?
Restate it in your own words, capturing the intent the speaker is trying to express and reflecting their words and actions to show that you understood the feeling accurately.
53
T or F: Discrete data can be decimals
False. Discrete data are whole numbers
54
T or F: Continuous data can be fractions and decimals
True.
55
What is a data analytics plan?
A data analytics project plan outlines the steps and processes involved in conducting a data analytics project from start to finish.
56
Define an EDARP
An Exploratory Data Analysis Research Plan. It is used to convince the organization of the potential value of your work. It states the objectives and details the path to reaching them.
57
What makes a good data analytics plan?
Scoping meetings, aligning the list of requirements, building a mockup, avoiding committing to deadlines until processing data, creating a UAT document, avoiding feature creep, hosting regular meetings with end-users/stakeholders, releasing a minimum viable product, conducting demo/training, scheduling regroups & adoption, obtaining feedback, building a contingency plan
58
What are examples of classification in ML?
Spam detection: classify email as spam or not spam.
Sentiment analysis: determine whether the sentiment of text is positive, negative, or neutral.
Image recognition: identify images as cats vs. dogs.