Chapter 2 - Descriptive Analytics I Flashcards

(26 cards)

1
Q

What are the Characteristics that Define the Readiness Level of Data for an Analytics Study? (1-5)

A

Data source reliability - appropriateness of the medium where the data was obtained
Data content accuracy - data are correct and a good match for the analytics problem
Data accessibility - data are readily and easily obtainable
Data security and data privacy - only those who have authority and need to access the data can access it
Data richness - all the required data elements are included in the data set

2
Q

What are the Characteristics that Define the Readiness Level of Data for an Analytics Study? (6-10)

A

Data consistency - means data are accurately collected and combined/merged
Data currency/data timeliness - means the data should be up to date for a given analytics model
Data granularity - requires that the variables and data values be defined at the lowest level of detail for the intended use of the data.
Data validity - the term used to describe a match/mismatch between the actual and expected data values of a given variable.
Data relevancy - means the variables in the data set are all relevant to the study being conducted.

3
Q

What has to match related to data and its usability?

A

The data has to match with the task for which it is intended to be used

4
Q

What does it mean to have data “analytics ready”?

A

It means the data has been transformed into a flat-file format and is ready for ingestion into predictive algorithms.

5
Q

What are the three broad sources of data that business analytics draws from?

A

Unstructured data, categorical structured data, and numerical structured data

6
Q

How is datum defined?

A

Datum is the singular form of data. Data are a collection of facts usually obtained as the result of experiments, observations, transactions, or experiences.

7
Q

What is unstructured data?

A

Data composed of any combination of textual, imagery, voice, and Web content.

8
Q

What is structured data?

A

Categorical or numeric data that is used in data mining algorithms.

9
Q

What are the six elements of the data taxonomy?

A

Categorical Data
Nominal Data
Ordinal Data
Numeric Data
Interval Data
Ratio Data

10
Q

What are the two main categories of structured data and the two subcategories below them?

A

Structured data is either categorical or numerical.
Categorical data is either nominal or ordinal
Numerical data is either interval or ratio
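
The two-level split above can be sketched in code. This is only an illustration: the `Scale` enum, the helper names, and the example variables are assumptions chosen for the sketch and do not come from the chapter.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"    # labels with no inherent order
    ORDINAL = "ordinal"    # labels with a meaningful order
    INTERVAL = "interval"  # equal steps, arbitrary zero point
    RATIO = "ratio"        # equal steps, true zero point


# Hypothetical example variables, one per measurement scale.
EXAMPLES = {
    "marital_status": Scale.NOMINAL,
    "credit_rating": Scale.ORDINAL,
    "temperature_celsius": Scale.INTERVAL,
    "income": Scale.RATIO,
}


def is_categorical(scale: Scale) -> bool:
    """Categorical data is either nominal or ordinal."""
    return scale in (Scale.NOMINAL, Scale.ORDINAL)


def is_numerical(scale: Scale) -> bool:
    """Numerical data is either interval or ratio."""
    return scale in (Scale.INTERVAL, Scale.RATIO)
```

Every scale falls into exactly one of the two main categories, so `is_categorical` and `is_numerical` never agree on the same input.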

11
Q

What are the subcategories for unstructured data?

A

Textual
Multimedia
XML/JSON

12
Q

What are the four steps of data preprocessing?

A

Data consolidation
Data cleaning
Data transformation
Data reduction

13
Q

What is dimensional reduction or variable selection as it relates to data preparation?

A

The reduction of variables that describe the phenomenon from different perspectives down to a manageable size

14
Q

What are the main tasks and methods for Data Consolidation?

A

Tasks: Access and collect the data, select and filter the data
Methods: SQL Queries, domain expertise, software agents

15
Q

What are the main tasks and methods for Data Cleaning?

A

Tasks: Handle missing values, identify and reduce noise, find and eliminate errors
Methods: Fill in missing values, identify outliers and either remove or smooth them, identify erroneous data
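
Two of these methods can be sketched as follows. The sketch assumes mean imputation for missing values and the 1.5 x IQR rule for flagging outliers; both are common choices but only one option each, and the function names are made up for illustration.

```python
from statistics import mean, quantiles


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(fill_missing_with_mean([1, None, 3]))      # [1, 2, 3]
print(iqr_outliers([10, 12, 11, 13, 12, 100]))   # [100]
```

Whether a flagged value is removed, smoothed, or kept is a judgment call that depends on the study.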

16
Q

What are the main tasks and methods for Data Transformation?

A

Tasks: Normalize the data; discretize or aggregate the data; construct new attributes
Methods: Reduce the range of values to a standard range; if needed, convert numeric variables to discrete variables; derive new and more informative variables from existing ones
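
A minimal sketch of two of these methods, assuming min-max scaling to the range 0 to 1 as the normalization and fixed cut points as the discretization. The function names, cut points, and labels are illustrative assumptions.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_value(value, edges, labels):
    """Map a numeric value to a label; edges are ascending upper bounds.

    labels must have exactly one more entry than edges.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


print(min_max_normalize([10, 20, 30]))                        # [0.0, 0.5, 1.0]
print(bin_value(35, [30, 60], ["young", "middle", "senior"]))  # middle
```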

17
Q

What are the main tasks and methods for Data Reduction?

A

Tasks: Reduce number of attributes and records, balance skewed data
Methods: Principal component analysis; sampling; oversample the less represented or undersample the overrepresented classes.
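
Balancing skewed data by undersampling can be sketched like this. The sketch assumes simple random undersampling of every class down to the size of the smallest class; `undersample` and its parameters are hypothetical names used only for illustration.

```python
import random
from collections import defaultdict


def undersample(records, label_of, seed=0):
    """Randomly shrink each class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical data: 8 "no" records and 2 "yes" records.
records = [(i, "no") for i in range(8)] + [(i, "yes") for i in range(8, 10)]
balanced = undersample(records, label_of=lambda r: r[1])
print(len(balanced))  # 4
```

Undersampling discards data, so oversampling the less represented class may be the better choice when records are scarce.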

18
Q

How is dispersion defined?

A

It is the representation of the numerical spread of a given data set.
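
Common measures of dispersion can be computed with Python's standard library. The sketch below assumes population (not sample) variance and standard deviation, and the function name is illustrative.

```python
from statistics import pstdev, pvariance


def dispersion_summary(values):
    """Return the range, population variance, and population standard deviation."""
    return {
        "range": max(values) - min(values),
        "variance": pvariance(values),
        "std_dev": pstdev(values),
    }


summary = dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(summary["range"])  # 7
```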

19
Q

What is the box and whiskers plot?

A

A graphical illustration of both centrality and dispersion of a given data set
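
The numbers a box and whiskers plot draws are often summarized as minimum, first quartile, median, third quartile, and maximum. A minimal sketch follows, assuming the "inclusive" quartile method of `statistics.quantiles`; other quartile conventions give slightly different values, and plotting tools may draw whiskers at 1.5 x IQR rather than at the extremes.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```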

20
Q

What is correlation vs. regression?

A

Correlation is interested in the low-level relationship between two variables, whereas regression is concerned with the relationships between all explanatory variables and the response variable.
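
One way to see the difference is that a correlation coefficient is symmetric: it has no response variable. The sketch below assumes Pearson's r as the correlation measure; the function name is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
# Swapping the arguments gives the same value: neither variable is "the response".
print(pearson_r(x, y) == pearson_r(y, x))  # True
```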

21
Q

What is simple versus multiple regression?

A

A simple regression is built between one response variable and one explanatory variable. Multiple regression is built between one response variable and multiple explanatory variables.
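
A simple regression can be sketched with ordinary least squares, which is assumed here as the fitting method. Multiple regression follows the same idea with several explanatory variables and is usually solved with a linear algebra library, which this sketch omits.

```python
def fit_simple_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# One explanatory variable (x) and one response variable (y).
intercept, slope = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```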

22
Q

What are the three measures used to assess the fit of a regression model? (Revisit)

A

R squared
Overall F-test
Root mean square error (RMSE)
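
Two of these measures are easy to compute by hand. The sketch assumes R squared is defined as 1 minus the ratio of residual to total sum of squares, and that RMSE divides by the number of observations; some texts divide by the degrees of freedom instead. The F-test is left out because it needs an F-distribution table or a statistics library.

```python
from math import sqrt


def r_squared(actual, predicted):
    """Share of the variation in actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error, dividing by the number of observations."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 9]
print(round(r_squared(actual, predicted), 3))  # 0.9
print(round(rmse(actual, predicted), 3))       # 0.707
```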

23
Q

What are the five assumptions to linear regression?

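A

1. Linearity
2. Independence (of errors)
3. Normality (of errors)
4. Constant variance (of errors)
5. No multicollinearity (the explanatory variables are not highly correlated with one another)
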
24
Q

What is time series forecasting?

A

The use of mathematical modeling to predict future values of the variable of interest based on previously observed values.
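
One of the simplest forecasting techniques is a moving average, sketched below. It is used here only as an example, the window size of 3 is an arbitrary assumption, and the function name is illustrative.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Previously observed values predict the next one.
print(moving_average_forecast([10, 12, 14, 16], window=3))  # 14.0
```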

25
Q

What are the three types of business reporting, and what does each report?

A

Metric management reports - outcome-oriented metrics (e.g., KPIs)
Dashboard reports - a range of different performance indicators on one page
Balanced scorecard reports - an integrated view of success in the organization from financial, customer, business process, and learning and growth perspectives

26
Q

What is data visualization?

A

The use of visual representations to explore, make sense of, and communicate data.