Exam 1: Lectures 1, 2, 3 Flashcards

(57 cards)

1
Q

Business Analytics

A

refers to the skills, technologies, and practices for continuous iterative exploration and investigation of past business performance (e.g., sales and return on investment) to gain insight and drive business planning

2
Q

Descriptive Analytics

A

Tools that summarize what happened

3
Q

Prescriptive Analytics

A

Statistical techniques that make predictions and then suggest decision options to take advantage of the predictions

4
Q

Predictive Analytics

A

A variety of statistical techniques that analyze data to make predictions about the future

5
Q

Business Analytics Advantages

A

1) Drive Revenue
2) Save Money
3) Encourage Experimentation
4) Side-step Politics
5) Persuade Executives

6
Q

4 Key Challenges in Doing Business Analytics

A

1) Managing 6V’s of Big Data
2) Growth of Unstructured Data
3) Underestimating the Hard Work
4) Hiring the Right Person(s)

7
Q

4 Main Elements of Data-Driven Tasks

A

1) Data Access
2) Data Management
3) Data Analysis
4) Data Presentation

8
Q

Model

A

an abstraction of a real problem that tries to capture the essence and key features of the problem

9
Q

Key Challenges of Managing 6V’s of Big Data

A

1) Volume
2) Velocity
3) Variety
4) Volatility
5) Validity
6) Value

10
Q

Volume

A

Big data implies large volumes of data

11
Q

Velocity

A

It is the speed of data processing

12
Q

Variety

A

Data come from many sources and in many types, both structured and unstructured

13
Q

Volatility

A

It refers to how long data is valid and how long it should be stored

14
Q

Validity

A

Data should be correct and accurate for the intended use

15
Q

Seven step modeling process

A

1) Define the problem
2) Collect and summarize data
3) Develop a model
4) Verify the model
5) Select one or more suitable decisions
6) Present the results to the organization
7) Implement the model and update it over time

16
Q

graphs

A

bar charts, pie charts, histograms, scatter charts, and time series graphs

17
Q

numerical summary measures

A

counts, percentages, averages, and measures of variability

18
Q

tables of summary measures

A

totals, averages, and counts, grouped by categories

19
Q

population

A

includes all of the entities of interest in a study (people, households, machines, etc.)

20
Q

sample

A

a subset of the population, often randomly chosen and preferably representative of the population as a whole

21
Q

Four Scales of Measurement

A

1) Nominal
2) Ordinal
3) Interval
4) Ratio

22
Q

Nominal

A

variables that have two or more categories without any natural order. Two levels: gender (male and female). Multiple levels: marital status (single, married, divorced, widowed)

23
Q

ordinal

A

a categorical variable for which the possible categories are ordered. Example: education level (less than high school, high school, college degree, graduate degree)

24
Q

interval

A

the measure is ordered and the distance between consecutive values is equal; however, there is no natural zero. Example: temperature (the difference between 10°C and 20°C is the same as the difference between 20°C and 30°C)

25
ratio
interval variables with the added condition of a true zero (origin). Examples: money, sales revenue
26
interquartile range
the third quartile minus the first quartile. Thus, it is the range of the middle 50% of the data. It is less sensitive to extreme values than the range.
27
variance
essentially the average of the squared deviations from the mean. If Xi is a typical observation, its squared deviation from the mean is (Xi – mean)²
28
range
the maximum value minus the minimum value
29
standard deviation
the square root of the variance
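The four measures of variability above (interquartile range, variance, range, standard deviation) can be sketched on a small data set. This is only an illustration: the numbers are made up, the quartiles follow the default convention of Python's `statistics.quantiles` (other tools may give slightly different quartiles), and both the plain average of squared deviations and the n - 1 sample version are shown.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

# Range: maximum value minus minimum value
data_range = max(data) - min(data)

# Variance: average of the squared deviations from the mean
mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]
pop_variance = sum(squared_devs) / len(data)             # divide by n
sample_variance = sum(squared_devs) / (len(data) - 1)    # divide by n - 1

# Standard deviation: square root of the variance
pop_std = math.sqrt(pop_variance)

# Interquartile range: third quartile minus first quartile
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, pop_variance, sample_variance, pop_std, iqr)
```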
30
skewness
occurs when there is a lack of symmetry
31
kurtosis
has to do with the “fatness” of the tails of the distribution relative to the tails of a normal distribution
32
Statisticians generally consider a value as an outlier if
it is more than three standard deviations from the mean
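A minimal sketch of the three-standard-deviation rule, assuming made-up data and the sample standard deviation from Python's `statistics` module:

```python
import statistics

values = [10] * 10 + [12] * 10 + [60]  # made-up data with one extreme value

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation

# Flag any value more than three standard deviations from the mean
outliers = [v for v in values if abs(v - mean) > 3 * sd]
print(outliers)
```

In very small samples no value can sit three standard deviations from the mean, so the rule is only informative with a reasonable number of observations.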
33
dummy variable
a 0–1 coded variable for a specific category It is coded as 1 for all observations in that category and 0 for all observations not in that category
34
bin variable
corresponds to a numerical variable that has been categorized into discrete categories
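Dummy and bin variables can both be built with a few lines of code. The sketch below uses made-up data; the category "married" and the age cut-offs (30 and 50) are arbitrary choices for illustration only.

```python
marital_status = ["single", "married", "divorced", "married"]  # made-up data
ages = [23, 35, 47, 61]                                        # made-up data

# Dummy variable: 1 for observations in the category, 0 otherwise
is_married = [1 if status == "married" else 0 for status in marital_status]

def age_bin(age):
    """Place a numerical age into one of three discrete categories."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"

# Bin variable: the numerical variable recoded into categories
age_bins = [age_bin(a) for a in ages]

print(is_married, age_bins)
```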
35
when a distribution has a negative skew, ____ is typically larger than ____ (the order reverses for a positive skew)
median, mean (negative skew); mean, median (positive skew)
36
Two Types of Estimators
1) Point Estimators
2) Interval Estimators
37
Point Estimators
to estimate a population characteristic with a single value
38
Interval Estimators
to estimate a population characteristic with an interval, or range, of values
39
simple random sampling mechanism
In simple random sampling, the sample mean is typically used as a “best guess” for the population mean. This estimate is a point estimate. The accuracy of the point estimate is measured by its standard error, which is the standard deviation of the sampling distribution of the point estimate. A confidence interval (with 95% confidence) for the population mean extends approximately two standard errors on either side of the sample mean. From the central limit theorem, the sampling distribution of X̄ is approximately normal when n is reasonably large, so there is approximately a 95% chance that any particular X̄ will be within two standard errors of the population mean μ.
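As a rough sketch of the point estimate, standard error, and approximate 95% confidence interval, assuming a made-up sample and the usual formula for the standard error of the mean (sample standard deviation divided by the square root of n):

```python
import math
import statistics

# Made-up sample; in practice n should be reasonably large
sample = [52, 48, 55, 60, 47, 51, 58, 49, 53, 50, 56, 54, 46, 57, 52, 59]

n = len(sample)
x_bar = statistics.mean(sample)                       # point estimate of the population mean
std_error = statistics.stdev(sample) / math.sqrt(n)   # standard error of the sample mean

# Approximate 95% confidence interval: two standard errors on either side
lower = x_bar - 2 * std_error
upper = x_bar + 2 * std_error

print(f"point estimate {x_bar:.2f}, approx. 95% CI ({lower:.2f}, {upper:.2f})")
```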
40
In simple random sampling, if we have 10,000 customers and want to select 1,000 of them at random, each customer should have a ___ chance of being selected
1 in 10
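The 1-in-10 figure is just the sample size divided by the population size. A small sketch, assuming hypothetical customer IDs 1 through 10,000 and a fixed seed so the draw is repeatable:

```python
import random

rng = random.Random(42)                    # fixed seed, for repeatability only
customers = list(range(1, 10_001))         # hypothetical customer IDs
selected = rng.sample(customers, 1_000)    # simple random sample, without replacement

chance = 1_000 / len(customers)            # each customer's chance of selection
print(len(selected), chance)
```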
41
typical sampling mistakes
1) Unrepresentative sample
2) Biased respondents
3) Low response rate (non-response bias)
4) Biased questions
42
unrepresentative sample
The sample does not represent the population
43
biased respondents
Respondents answer sensitive questions, such as those about annual income, inaccurately
44
low response rate (non-response bias)
Only a few of the people surveyed actually respond
45
biased questions
Poorly worded questions make it hard to interpret what respondents' answers mean
46
confidence interval
a range of values we are fairly sure our true value lies in
47
systematic sampling
is a type of probability sampling method in which sample members from a larger population are selected according to a random starting point and a fixed periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size
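A minimal sketch of systematic sampling, assuming a hypothetical population of 100 numbered units and a desired sample of 10:

```python
import random

rng = random.Random(7)                 # fixed seed, for repeatability only
population = list(range(100))          # hypothetical unit IDs 0..99
sample_size = 10

# Sampling interval = population size / desired sample size
interval = len(population) // sample_size

# Random starting point within the first interval, then every interval-th unit
start = rng.randrange(interval)
systematic_sample = population[start::interval]

print(interval, start, systematic_sample)
```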
48
stratified sampling
Suppose various subpopulations within the total population can be identified. These subpopulations are called strata. Instead of taking a simple random sample from the entire population, it might make more sense to select a simple random sample from each stratum separately.
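A sketch of stratified sampling with two hypothetical strata. Taking the same 10% from each stratum (proportional allocation) is an assumption made here for illustration; the card only says that a simple random sample is drawn from each stratum separately.

```python
import random

rng = random.Random(1)  # fixed seed, for repeatability only

# Hypothetical strata: unit IDs grouped by region
strata = {
    "north": list(range(0, 60)),
    "south": list(range(60, 100)),
}

# Simple random sample of 10% of the units within each stratum
stratified_sample = {
    name: rng.sample(members, len(members) // 10)
    for name, members in strata.items()
}

print({name: len(units) for name, units in stratified_sample.items()})
```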
49
cluster sampling
the population is separated into clusters, such as cities or city blocks, and then a random sample of the clusters is selected
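Cluster sampling can be sketched the same way. The city blocks and household labels below are hypothetical, and including every unit from each chosen cluster (a one-stage design) is an assumption for illustration.

```python
import random

rng = random.Random(3)  # fixed seed, for repeatability only

# Hypothetical clusters: city blocks and the households in each
clusters = {
    "block A": ["A1", "A2", "A3"],
    "block B": ["B1", "B2"],
    "block C": ["C1", "C2", "C3", "C4"],
    "block D": ["D1", "D2"],
}

# Randomly select clusters, not individual households
chosen_blocks = rng.sample(sorted(clusters), 2)

# One-stage design: every household in a chosen block enters the sample
cluster_sample = [unit for block in chosen_blocks for unit in clusters[block]]

print(chosen_blocks, cluster_sample)
```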
50
p-value
the probability of obtaining a result at least as extreme as what was actually observed, when the null hypothesis is true
51
What % of observations fall within 1, 2, or 3 standard deviations of the mean when a variable x follows a normal distribution
approximately 68%, 95%, and 99.7%
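These percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the exact values differ slightly from the rounded figures on the card.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```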
52
When do we reject or fail to reject the null hypothesis at the 0.05 significance level?
p-value < 0.05: reject; p-value > 0.05: fail to reject
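The decision rule can be sketched as follows. The two-tailed z-test and the test statistics 2.1 and 1.5 are assumptions chosen for illustration; the card itself only states the comparison of the p-value with 0.05.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level

def two_tailed_p_value(z):
    """p-value for a z statistic under a two-tailed alternative (assumed test)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=ALPHA):
    """Apply the rule: reject when the p-value is below the significance level."""
    return "reject" if p_value < alpha else "fail to reject"

p_strong = two_tailed_p_value(2.1)  # hypothetical test statistic
p_weak = two_tailed_p_value(1.5)    # hypothetical test statistic

print(f"{p_strong:.4f} -> {decide(p_strong)}")
print(f"{p_weak:.4f} -> {decide(p_weak)}")
```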
53
hypothesis
a claim that can be tested statistically
54
one-tailed alternative
supported only by evidence in a single direction
55
two-tailed alternative
supported by evidence in either of two directions
56
how to deal with missing values in variables
One option is to simply ignore them. Then you will have to be aware of how the software deals with missing values. Another option is to fill in missing values with the average of nonmissing values. We use this option! A third option is to examine the nonmissing values in the row of a missing value; these values might provide clues on what the missing value should be.
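The second option (filling missing values with the average of the nonmissing values) might look like this in code. The data are made up, and `None` is used here as the marker for a missing value.

```python
import statistics

incomes = [40.0, None, 60.0, 80.0, None]  # made-up data; None marks a missing value

# Average of the nonmissing values
observed = [x for x in incomes if x is not None]
fill_value = statistics.mean(observed)

# Replace each missing value with that average
filled = [fill_value if x is None else x for x in incomes]
print(filled)
```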
57