Chapter 1 Flashcards

(52 cards)

1
Q

Cases

A

The objects described by a set of data.

Ex. customers, companies, subjects in a study, stocks

2
Q

Label

A

Is a SPECIAL VARIABLE used in some data sets to distinguish the different cases

3
Q

Variable

A

Is a characteristic of a case --> different cases can have different values for a variable

4
Q

Observation

A

Describes the data for a particular case

5
Q

Categorical Variable

A

Places a case into one of several groups or categories

Ex. of graphs for categorical variables: bar graphs, pie charts, and Pareto charts

6
Q

Quantitative Variable

A

Takes numerical values for which arithmetic operations, such as adding and averaging, make sense

7
Q

Statistical Software

A

In some statistical software, spaces are not allowed in variable names --> use an underscore instead

8
Q

Ordered Categorical Variable

A

A categorical variable whose values have a natural order. Ex. possible values for a grade: A, B, C, D, etc., because A is better than B, which is better than C, and so on

9
Q

Nominal Variable

A

A categorical variable that is not ordered

10
Q

Instruments

A

Different areas of application (e.g., marketing) can also have their own special variables --> these variables are measured with instruments

11
Q

Rate

A

Computing a rate is one of several ways of adjusting one variable to create another --> sometimes more meaningful than a count
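
A minimal sketch of why a rate can say more than a count. The plant names, accident counts, and employee totals are hypothetical, and "per 1,000 employees" is just one possible scale.

```python
# Hypothetical counts: workplace accidents and employees at two plants.
plants = {
    "Plant A": {"accidents": 12, "employees": 400},
    "Plant B": {"accidents": 15, "employees": 1500},
}


def rate_per_1000(count, total):
    """Adjust a raw count by group size, expressed per 1,000 cases."""
    return 1000 * count / total


rates = {
    name: rate_per_1000(d["accidents"], d["employees"])
    for name, d in plants.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} accidents per 1,000 employees")
```

Plant B has the larger count, but Plant A has the higher rate once plant size is taken into account.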

12
Q

Distribution

A

Describes how the values of a variable vary from case to case

13
Q

Pareto Chart

A

A bar graph for a categorical variable whose categories are ordered from MOST frequent to least frequent --> highlights the most important categories
Ex. frequently used in quality control settings
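
The ordering behind a Pareto chart can be sketched as follows. The defect log is hypothetical, and the cumulative-percent column is an extra that Pareto charts often show, not something stated on the card.

```python
from collections import Counter

# Hypothetical defect log from a quality-control inspection.
defects = [
    "scratch", "dent", "scratch", "misalignment",
    "scratch", "dent", "scratch", "discoloration",
]


def pareto_order(values):
    """Return (category, count, cumulative percent), most frequent first."""
    counts = Counter(values).most_common()  # sorted by descending count
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for category, count in counts:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows


rows = pareto_order(defects)
for category, count, cumulative in rows:
    print(f"{category:<14}{'#' * count:<6}{cumulative:5.1f}%")
```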

14
Q

Histogram

A

The most common graph of the distribution of a quantitative variable, where we group nearby values into classes --> for small data sets a stemplot can be used
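
Grouping nearby values into classes can be sketched like this. The waiting times, the class width of 5, and the function name `histogram_counts` are all assumptions made for illustration.

```python
def histogram_counts(values, start, width, n_classes):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:  # values outside every class are skipped
            counts[index] += 1
    return counts


# Hypothetical data: minutes customers waited in line.
waits = [2, 3, 5, 7, 8, 8, 9, 11, 12, 14, 15, 21]
counts = histogram_counts(waits, start=0, width=5, n_classes=5)

for i, count in enumerate(counts):
    lower = 5 * i
    print(f"{lower:>2} to <{lower + 5:<3}{'#' * count}")
```

The printed bars are tallest for the 5 to <10 class and trail off to the right, which is the kind of overall pattern a histogram is meant to reveal.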

15
Q

How can you describe the overall pattern of a histogram?

A

You can describe the overall pattern of a histogram by its SHAPE, CENTER, and SPREAD

16
Q

Outlier

A

The most important type of deviation–> an individual value that falls outside the overall pattern

17
Q

When is a distribution symmetric?

A

If the right and left sides of the histogram are mirror images of each other

18
Q

Skewed to the right

A

If the right side of the histogram extends much farther out than the left side --> skewed to the left if the left side extends much farther out than the right side

19
Q

Positively skewed

A

Data that skews to the right–> positive skewness is the MOST common type of skewness that we see in real data

20
Q

Time plot

A

Plots each observation against the time at which it was measured --> time on the horizontal scale and the variable you are measuring on the vertical scale

21
Q

Mean

A

The most common measure of center is the ordinary arithmetic average–> NOT a resistant measure of center as it can be influenced by outliers

22
Q

Median

A

The median is the midpoint of a distribution, the number such that half the observations are smaller and half are larger

23
Q

Median Odd

A

When N is odd, the median is the center observation, located (N+1)/2 observations up from the bottom of the ordered list

24
Q

Median Even

A

When N is even, the median is the mean of the two center observations in the ordered list
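
The odd and even cases can be sketched in one small function. This is an illustrative version written for these cards; Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Median by the location rule: sort, then find the center of the list."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[middle]
    # Even n: mean of the two center observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 1, 3]))     # odd number of observations
print(median([4, 1, 3, 2]))  # even number of observations
```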

25
Median vs Mean
The median is more resistant than the mean
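
A quick check of resistance, using hypothetical salaries (in thousands of dollars) and one extreme value added to the list:

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars.
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [400]  # one extreme observation

print("mean:  ", mean(salaries), "->", mean(with_outlier))
print("median:", median(salaries), "->", median(with_outlier))
```

The single outlier more than doubles the mean, while the median moves by only one unit.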
26
Median and Mean in a Symmetric Distribution
They are close together --> if the distribution is exactly symmetric, they are exactly the same
27
Median and Mean in a skewed distribution
The mean is farther out on the long tail than the median
28
The five number summary
Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation --> written in order from smallest to largest --> a boxplot is a graph of the five number summary
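
A sketch of computing the five values in increasing order. It assumes one common textbook convention for quartiles: the first and third quartiles are the medians of the observations below and above the location of the overall median. Statistical software often uses other quartile rules, so results can differ slightly.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) in increasing order."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # For odd n the overall median is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


print(five_number_summary([7, 1, 4, 2, 6, 3, 5]))
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))
```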
29
The five number summary vs. distribution
Not the most common numerical description of a distribution
30
Most common numerical description of a distribution
The mean to measure the center and the standard deviation to measure the spread
31
Standard deviation
Measures spread by calculating how far the observations are from their mean --> should only be used when the mean is chosen as the measure of center
32
n-1
Degrees of freedom of the variance or standard deviation
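
The role of n-1 can be seen in a short sketch of the sample standard deviation. The function name `sample_std` and the data are made up for illustration; the result is compared with `statistics.stdev`, which also divides by n-1.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: square root of the variance with n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    # Sum of squared deviations from the mean, divided by n - 1.
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(sample_std(data), statistics.stdev(data))
```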
33
S=0
Only when there is no spread --> means all the observations have the same value, otherwise S is greater than 0
34
What does it mean if the standard deviation is higher?
S gets larger when the observations are more spread out about their mean
35
Units
S has the same units of measurement as the original observations
36
S and the Mean
Like the mean, S is not resistant --> a few outliers or strong skewness can greatly increase S
37
How do you measure risk in finance
Look at the standard deviation of returns --> large spread --> less predictable --> more risky, BUT the five number summary would be more informative
38
Density curve
A density curve is a mathematical model for the distribution of a quantitative variable
39
What does a density curve describe?
The overall pattern of a distribution. The area under the curve AND within any range of values is the proportion of all observations that fall within that range
40
68-95-99.7 rule
In a Normal distribution, approximately: 68% of observations fall within 1 standard deviation of the mean; 95% of observations fall within 2 standard deviations of the mean; 99.7% of observations fall within 3 standard deviations of the mean
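
The three percentages can be checked as areas under a Normal density curve. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is an assumption.

```python
from statistics import NormalDist


def proportion_within(k, mu=0.0, sigma=1.0):
    """Area under the Normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * proportion_within(k):.1f}%")
```

The exact areas are slightly different from 68, 95, and 99.7, which is why the rule is stated as an approximation.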
41
Z-Score
Standardized value--> tells us how many standard deviations the observation falls away from the mean and in which direction
42
Z-score positive
Observations larger than the mean
43
Z-score negative
Observations smaller than the mean
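
Standardizing can be sketched in a few lines. The exam scores, the mean of 70, and the standard deviation of 10 are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean, with sign."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Hypothetical exam scores with mean 70 and standard deviation 10.
print(z_score(85, mean=70, std_dev=10))  # above the mean, so positive
print(z_score(60, mean=70, std_dev=10))  # below the mean, so negative
```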
44
Sample survey
Collects data from a sample of cases that represents a larger population of cases
45
Observation vs Experiment
In an observational study we do not attempt to influence the responses; in an experiment we deliberately impose a treatment (change) and observe the response
46
Training Data Set
In some studies we use one set of data to generate a set of results. Ex. building a model to predict something
47
Database
An organized collection of data from which data sets for statistical analysis can be extracted
48
Data warehouse
System for organizing, storing, and analyzing complex data
49
Sampling frame
A list of items to be sampled
50
Response rate
The proportion of the original sample who actually provide usable data
51
Undercoverage
Some groups in the population are left out of the process of choosing the sample
52
Nonresponse
Occurs when a case chosen for the sample cannot be contacted or does not cooperate