6 Types of Analytics Flashcards
What are the four general categories of analyses needed for the exam?
Exploratory data analysis (EDA), performance analysis, trend analysis, link analysis
What is EDA?
A general term for analyses used to understand your data better
What common types of analyses fall under EDA?
Descriptive statistics, relationships, dimension reduction
What are the four categories of descriptive statistics?
- Measures of central tendency
- Measures of dispersion
- Measures of frequency
- Measures of position
What do measures of central tendency include?
- Mean
- Median
- Mode
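As a minimal sketch, the three measures above can be computed with pandas. The `ages` values are hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample values, invented for illustration only.
ages = pd.Series([23, 35, 35, 41, 52])

mean_age = ages.mean()      # arithmetic average
median_age = ages.median()  # middle value of the sorted data
mode_age = ages.mode()      # most frequent value(s); a Series, because ties are possible

print(mean_age, median_age, mode_age.tolist())
```

Note that `.mode()` returns a Series rather than a single number, since a dataset can have more than one mode.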
What do measures of dispersion explain?
How spread out your data is
What are some examples of measures of dispersion?
- Standard deviation
- Range
- Variance
- Min
- Max
- Quartiles
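The measures listed above might be computed in pandas roughly as follows. The `spent` values are hypothetical; note that pandas uses the sample formulas (dividing by n - 1) for standard deviation and variance by default.

```python
import pandas as pd

# Hypothetical spending values, invented for illustration only.
spent = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

std_dev = spent.std()                    # sample standard deviation (ddof=1)
variance = spent.var()                   # sample variance (ddof=1)
value_range = spent.max() - spent.min()  # range = max - min
quartiles = spent.quantile([0.25, 0.5, 0.75])  # Q1, median, Q3

print(std_dev, variance, value_range)
print(quartiles)
```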
How are measures of frequency represented?
Counts, ratios, or percentages
What visualization is often used for measures of frequency?
Bar charts or heat maps
What is the purpose of relationships in EDA?
To see whether there is a correlation between two variables
What is a common visualization used to assess relationships between variables?
Scatter plot
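Alongside a scatter plot, a correlation coefficient gives a numeric check of the same relationship. This is a small sketch on hypothetical data; the `Age` column and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical dataset; the Age column and all values are invented.
MyData = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "Total_Spent": [100.0, 150.0, 200.0, 250.0],
})

# Pearson correlation coefficient between the two variables (-1 to 1).
r = MyData["Age"].corr(MyData["Total_Spent"])
print(r)
```

Because the invented values lie on a straight line, the coefficient here is 1 up to floating-point rounding; real data will rarely be this clean.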
What is dimension reduction?
Simplifying data before analysis by reducing the number of variables (dimensions) while keeping as much of the information as possible
What are some advanced methods for dimension reduction?
- Principal component analysis (PCA)
- Non-negative matrix factorization (NMF)
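To make PCA less abstract, here is a rough sketch that performs it by hand with NumPy's singular value decomposition. The data matrix is hypothetical, and in practice a library class such as scikit-learn's `PCA` would more likely be used; this only illustrates the idea.

```python
import numpy as np

# Hypothetical data: 5 observations of 3 variables, invented for illustration.
X = np.array([
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 1.5],
    [5.0, 10.1, 0.8],
    [6.0, 12.0, 1.2],
])

# Step 1: center each variable on its mean.
X_centered = X - X.mean(axis=0)

# Step 2: SVD; the rows of Vt are the principal directions,
# ordered from most to least variance explained.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: keep the first two components, reducing 3 variables to 2.
n_components = 2
X_reduced = X_centered @ Vt[:n_components].T

# Share of total variance captured by each component.
explained_ratio = S**2 / np.sum(S**2)
print(X_reduced.shape, explained_ratio)
```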
What is a key step before performing analyses on a dataset?
Understanding basic information about the data
What is the function used in Python’s pandas package to get descriptive statistics?
.describe()
What command is used to import the pandas package in Python?
import pandas as pd
What does the command MyData.info() provide?
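A concise summary of the dataset: the number of rows, the column names, the non-null count and data type of each column, and memory usage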
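A short sketch of calling `.info()` on a small dataset. The contents of `MyData` are hypothetical; only the column names `Age_Bracket` and `Total_Spent` come from these cards.

```python
import pandas as pd

# Hypothetical dataset, invented for illustration only.
MyData = pd.DataFrame({
    "Age_Bracket": ["18-25", "26-35", "26-35", "36-45"],
    "Total_Spent": [20.5, 35.0, 12.25, 80.0],
})

# Prints the index range, each column's name, non-null count and dtype,
# and the approximate memory usage.
MyData.info()
```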
What are the common variable types in programming languages?
- Integers
- Floats
- Strings
What does the command MyData['Age_Bracket'].value_counts() do?
It provides counts of each unique value in the Age_Bracket variable
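A small sketch of this frequency count on hypothetical data. Passing `normalize=True` returns proportions instead of raw counts, which covers the "ratios or percentages" form of a frequency measure.

```python
import pandas as pd

# Hypothetical dataset, invented for illustration only.
MyData = pd.DataFrame({
    "Age_Bracket": ["18-25", "26-35", "26-35", "36-45", "26-35"],
})

# Raw counts of each unique value, most frequent first.
counts = MyData["Age_Bracket"].value_counts()

# The same frequencies expressed as proportions of the total.
shares = MyData["Age_Bracket"].value_counts(normalize=True)

print(counts)
print(shares)
```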
What does the command pd.plotting.scatter_matrix(MyData) create?
A scatter plot matrix of all numerical variables
True or False: EDA includes data cleaning and wrangling processes.
False
Fill in the blank: EDA encompasses any preliminary information gathering that you must do before you can jump into what you actually want to know, including frequencies, averages, trends, or the _______.
relationships between your variables
What is the purpose of using scatter plots in EDA?
To visually assess potential correlations between variables
What does the command MyData['Total_Spent'].describe() return?
Descriptive statistics for the Total_Spent variable
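For a numeric column, the result is a Series holding the count, mean, standard deviation, minimum, quartiles, and maximum. A brief sketch, with hypothetical values in `MyData`:

```python
import pandas as pd

# Hypothetical dataset, invented for illustration only.
MyData = pd.DataFrame({"Total_Spent": [10.0, 20.0, 30.0, 40.0, 50.0]})

# Summary statistics for a single numeric column.
stats = MyData["Total_Spent"].describe()
print(stats)

# Individual statistics can be looked up by label.
print(stats["mean"], stats["50%"])
```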