Exam 1 Review Flashcards
(56 cards)
The process of extracting portions of a data set that are relevant to the analysis is called
subsetting
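A minimal sketch of subsetting in plain Python. The records and field names are invented for illustration; a real analysis would more likely use a spreadsheet filter or a data-frame library.

```python
# Hypothetical customer records; names and values are made up for illustration.
customers = [
    {"name": "Ana", "region": "West", "spend": 120.0},
    {"name": "Ben", "region": "East", "spend": 45.5},
    {"name": "Cy", "region": "West", "spend": 80.0},
]

# Subset rows: keep only the observations relevant to the analysis.
west = [row for row in customers if row["region"] == "West"]

# Subset columns: keep only the variables of interest.
west_spend = [{"name": row["name"], "spend": row["spend"]} for row in west]

print(west_spend)
```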
The methodology of extracting information and knowledge from data to improve a company’s bottom line and enhance the consumer experience
business analytics
How does business analytics benefit companies? (6)
- develop better marketing strategies
- deepen customer engagement
- enhance efficiency in procurement
- uncover ways to reduce expense
- identify emerging market trends
- mitigate risk and fraud
What topics does business analytics encompass?
- statistics
- computer science
- information systems
What questions do the 3 types of analytics techniques ask?
- Descriptive: What has happened?
- Predictive: What could happen in the future?
- Prescriptive: What should we do?
Data that have been organized, analyzed, and processed in a meaningful and purposeful way
Information
Derived from a blend of data, contextual information, experience, and intuition
Knowledge
Data collected by recording a characteristic of many subjects at the same point in time
cross-sectional data
Data collected over several time periods
Time series data
Provide examples of human-generated and machine-generated, structured and unstructured data
What are the 3 characteristics of big data?
- volume (immense amount)
- velocity (generated at rapid speed)
- variety (different types and forms of data)
When a characteristic of interest differs in kind or degree among various observations
variable
What are the 2 broad types of variable divisions?
- Categorical (qualitative)
- Numerical (quantitative)
What are the 2 types of numerical variables?
provide examples
- continuous
ex: weight, time, height, investment return
- discrete (countable)
ex: number of points or children
What are the 4 measurement scales?
Provide definitions and examples
- nominal (categorical): observations just differ by name
- ordinal (categorical): observations can be categorized or ranked (but differences are meaningless)
ex: ratings
- interval (numerical): observations can be categorized or ranked (differences are meaningful)
ex: temperatures
- ratio (numerical): observations are on interval-scale w/true zero point
ex: grades, weight, time, distance
Process of retrieving, cleansing, integrating, transforming, and enriching data to support subsequent data analysis
Data wrangling
What are the objectives of data wrangling? (3)
- improve data quality
- reduce time and effort required to perform analytics
- help reveal true intelligence in the data
What helps us verify whether the data set is complete or may have missing values?
counting & sorting
What allows us to review the range of values for each variable?
sorting data
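One way counting and sorting might look in practice, as a sketch. The data set is hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical data set; None marks a missing value (an assumption).
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 48000},
]

summary = {}
for variable in ("age", "income"):
    values = [r[variable] for r in records]
    # Sorting the observed values exposes the range of each variable.
    present = sorted(v for v in values if v is not None)
    # Counting shows whether the variable is complete.
    missing = len(values) - len(present)
    summary[variable] = {"missing": missing, "min": present[0], "max": present[-1]}

print(summary)
```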
What are 2 common strategies for dealing with missing values?
Provide definitions and when to use them
- omission (complete-case analysis): exclude missing values
ex: use when amount of missing values is small and expected to be randomly distributed across observations
- imputation: replace missing values
ex: may replace with mean; used when variable w/missing values is deemed important
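The two strategies can be sketched as follows. The rows are hypothetical, and mean imputation is only one of several possible replacement rules.

```python
from statistics import mean

# Hypothetical observations; None marks a missing income (an assumption).
rows = [
    {"id": 1, "income": 50000},
    {"id": 2, "income": None},
    {"id": 3, "income": 70000},
]

# Omission (complete-case analysis): drop observations with a missing value.
complete_cases = [r for r in rows if r["income"] is not None]

# Mean imputation: replace each missing value with the mean of the observed values.
observed_mean = mean(r["income"] for r in complete_cases)
imputed = [
    {**r, "income": r["income"] if r["income"] is not None else observed_mean}
    for r in rows
]

print(len(complete_cases), imputed)
```

Omission shrinks the data set, while imputation keeps every observation at the cost of introducing estimated values.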
Process of converting data from one format or structure to another
Provide Examples
Data transformation
ex: convert dates into seasons; convert values into natural logarithms; combine height and weight to create BMI
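A rough sketch of the three example transformations. The season boundaries (Northern Hemisphere, by calendar month) and the metric units for BMI are assumptions, as are the sample values.

```python
import math
from datetime import date

def season(d):
    """Map a date to a season, assuming Northern Hemisphere month-based seasons."""
    if d.month in (12, 1, 2):
        return "Winter"
    if d.month in (3, 4, 5):
        return "Spring"
    if d.month in (6, 7, 8):
        return "Summer"
    return "Fall"

def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

visit_season = season(date(2024, 7, 15))  # date -> season
log_income = math.log(50000)              # value -> natural logarithm
person_bmi = bmi(70.0, 1.75)              # height and weight -> BMI

print(visit_season, round(log_income, 3), round(person_bmi, 1))
```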
Process of transforming numerical into categorical variables
What are the constraints?
binning
Bins must be consecutive and nonoverlapping
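A small binning sketch, assuming nonnegative ages and hypothetical cut points at 18 and 65. Because each value is placed by its position among sorted edges, the bins are consecutive and nonoverlapping by construction.

```python
import bisect

# Hypothetical cut points: bins are [0, 18), [18, 65), and 65 or older.
edges = [18, 65]
labels = ["Minor", "Adult", "Senior"]

def age_bin(age):
    """Return the label of the single bin that contains age."""
    # bisect_right counts the edges <= age, which is the index of the bin.
    return labels[bisect.bisect_right(edges, age)]

ages = [5, 18, 42, 65, 80]
binned = [age_bin(a) for a in ages]

print(binned)
```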
What are 3 common approaches for transforming categorical data?
Explain/provide examples
- category reduction: combining categories
ex: Mon-Fri = Weekdays; “Other”
- dummy variables: AKA indicator or binary variables; each takes on a value of 1 or 0 to describe two categories of a variable (a variable with n categories needs n - 1 dummy variables)
- category scores: recode categories as numbers
ex: recode satisfaction survey responses to numbers; used when data are ordinal and have natural, ordered categories
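The three approaches might be sketched like this. The category names, the choice of reference category, and the score values are all assumptions made for illustration.

```python
# Category reduction: collapse seven days into two categories.
def day_type(day):
    return "Weekday" if day in {"Mon", "Tue", "Wed", "Thu", "Fri"} else "Weekend"

# Dummy variables: n categories need n - 1 indicators.
def dummies(value, categories):
    # categories[0] is treated as the reference category and gets no indicator.
    others = categories[1:]
    return {f"is_{c}": int(value == c) for c in others}

# Category scores: recode ordered categories as numbers (score values assumed).
SCORES = {"Dissatisfied": 1, "Neutral": 2, "Satisfied": 3}

reduced = day_type("Tue")
encoded = dummies("West", ["East", "West", "South"])
scored = [SCORES[s] for s in ["Satisfied", "Neutral"]]

print(reduced, encoded, scored)
```

An observation in the reference category ("East" here) is identified by all of its dummy variables being 0.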
In addition to binning, another common approach is to create new variables through ____ transformations
mathematical