Recap Statistics Flashcards

1
Q

What is qualitative data?

A

Outcomes are categorical

2
Q

What is nominal?

A

Mutually exclusive categories, labeling

A nominal scale describes a variable with categories that do not have a natural order or ranking. You can code nominal variables with numbers if you want, but the order is arbitrary and any calculations, such as computing a mean, median, or standard deviation, would be meaningless.

Examples of nominal variables include:

genotype, blood type, zip code, gender, race, eye color, political party

3
Q

What is ordinal?

A

Natural ordering (e.g. preference for chocolate)

An ordinal scale is one where the order matters but not the difference between values.

Examples of ordinal variables include:

socioeconomic status (“low income”, “middle income”, “high income”), education level (“high school”, “BS”, “MS”, “PhD”), income level (“less than 50K”, “50K-100K”, “over 100K”), satisfaction rating (“extremely dislike”, “dislike”, “neutral”, “like”, “extremely like”).

Note the differences between adjacent categories do not necessarily have the same meaning. For example, the difference between the two income levels “less than 50K” and “50K-100K” does not have the same meaning as the difference between the two income levels “50K-100K” and “over 100K”.

4
Q

What is quantitative data?

A

Outcomes are numerical

5
Q

What is interval data?

A

An interval scale is one where there is order and the difference between two values is meaningful.

Examples of interval variables include:

temperature (Fahrenheit), temperature (Celsius), pH, SAT score (200-800), credit score (300-850).

6
Q

What is ratio data?

A

A ratio variable has all the properties of an interval variable, and also has a clear definition of 0.0. When the variable equals 0.0, there is none of that variable.

Examples of ratio variables include:

enzyme activity, dose amount, reaction rate, flow rate, concentration, pulse, weight, length, temperature in Kelvin (0.0 Kelvin really does mean “no heat”), survival time.

When working with ratio variables, but not interval variables, the ratio of two measurements has a meaningful interpretation. For example, because weight is a ratio variable, a weight of 4 grams is twice as heavy as a weight of 2 grams. However, a temperature of 10 degrees C should not be considered twice as hot as 5 degrees C. If it were, a conflict would arise, because 10 degrees C is 50 degrees F and 5 degrees C is 41 degrees F, and 50 degrees is clearly not twice 41 degrees. As another example, a pH of 3 is not twice as acidic as a pH of 6, because pH is not a ratio variable.
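
The temperature argument above can be checked numerically. The sketch below is only an illustration; the helper names `c_to_f` and `c_to_k` are made up for this card. It shows that the ratio of two temperatures changes with the unit, while the ratio of two weights does not.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius: float) -> float:
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15


# Interval scale: the ratio depends on the (arbitrary) zero point of the unit.
ratio_c = 10 / 5                    # 2.0 in Celsius
ratio_f = c_to_f(10) / c_to_f(5)    # 50 / 41, about 1.22 in Fahrenheit
ratio_k = c_to_k(10) / c_to_k(5)    # about 1.018 on the Kelvin (ratio) scale

# Ratio scale: the ratio is the same whatever the unit.
ratio_grams = 4 / 2                 # 2.0
ratio_milligrams = 4000 / 2000      # 2.0

print(ratio_c, ratio_f, ratio_k, ratio_grams, ratio_milligrams)
```

Only the Kelvin ratio describes how much "more temperature" there is, because only the Kelvin scale has a true zero.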

7
Q

Which data type has the highest degree of information?

A

Ratio, because it has an absolute zero

All of the arithmetic operations, including multiplication and division, are meaningful.

8
Q

When can a histogram be used?

A

When the data are quantitative (interval or ratio)

9
Q

How is the number of class intervals determined?

A

Depends on the number of observations

10
Q

How is the class interval width determined?

A

(largest observation − smallest observation) / number of classes
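
A minimal sketch of this formula, using made-up observations and a hypothetical helper name `class_width`. In practice the result is usually rounded up to a convenient number so the classes cover the whole range.

```python
def class_width(observations, num_classes):
    """Class interval width: (largest - smallest observation) / number of classes."""
    return (max(observations) - min(observations)) / num_classes


data = [12, 15, 21, 27, 33, 38, 44, 52]  # hypothetical observations
width = class_width(data, num_classes=5)  # (52 - 12) / 5
print(width)  # 8.0
```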

11
Q

What does positive skewness look like?

A

Skewed to the right: the long tail extends to the right, while the more frequent observations sit on the left.

12
Q

What is a modal class?

A

For data grouped into classes, it is the class with the highest frequency.

13
Q

What are descriptive techniques for qualitative data?

A

Bar and pie charts

14
Q

What does the bar chart display?

A
  • emphasizes the frequency of occurrence of the different categories
  • nominal or ordinal variable
15
Q

What does the pie chart display?

A
  • emphasizes the proportion of occurrences of each category
  • nominal or ordinal variable

16
Q

What is an ogive?

A

In statistics, an ogive is a graphic showing the curve of a cumulative distribution function drawn by hand. The points plotted are the upper class limit and the corresponding cumulative frequency.
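
As a rough sketch, the points of an ogive can be built by accumulating the class frequencies. The class limits and frequencies below are invented for illustration.

```python
from itertools import accumulate

upper_limits = [10, 20, 30, 40]  # hypothetical upper class limits
frequencies = [3, 7, 6, 4]       # hypothetical class frequencies

# Running total of the frequencies gives the cumulative frequency per class.
cumulative = list(accumulate(frequencies))

# Each ogive point pairs an upper class limit with its cumulative frequency.
ogive_points = list(zip(upper_limits, cumulative))
print(ogive_points)  # [(10, 3), (20, 10), (30, 16), (40, 20)]
```

Joining these points with line segments gives the ogive.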

17
Q

What is the arithmetic mean?

A

mean = sum of all observations / number of observations

18
Q

How is the sample mean denoted?

A

x̄ (x with a bar over it)

19
Q

How is the population mean denoted?

A

μ (the Greek letter mu)

20
Q

What is the median?

A

value that is in the middle when the measurements are arranged in order of magnitude

21
Q

When is the median used instead of the mean?

A

When there are a few extreme observations (outliers). For example, if a room holds only poor people and one billionaire, the mean wealth would be very high but the median would remain low.
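
The room example can be sketched with Python's standard `statistics` module. The wealth figures are invented for illustration.

```python
from statistics import mean, median

# Four people of modest means and one billionaire (hypothetical figures).
wealth = [20_000, 25_000, 30_000, 35_000, 1_000_000_000]

print(mean(wealth))    # 200022000: pulled far upward by the single extreme value
print(median(wealth))  # 30000: still describes the typical person in the room
```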

22
Q

What is a mode?

A

The mode is the value that occurs most often in a data set. In other words, it is the value with the highest frequency.

For data grouped into classes, the class with the highest frequency is the modal class. If two values share the highest frequency, there is no single mode (the data are bimodal).

23
Q

Geometric mean; when is it used?

A
  • is used when the arithmetic mean is inappropriate
  • e.g. when an average growth rate should be measured (in finance, the return on investment)

The geometric mean can only be calculated for positive numbers and is never greater than the arithmetic mean (the two are equal only when all the values are identical). The arithmetic mean, by contrast, can be calculated for both positive and negative numbers.

24
Q

Formula geometric mean

A

[(1 + R1) × (1 + R2) × (1 + R3) × … × (1 + Rn)]^(1/n) − 1
where:
Ri = return in period i
n = number of returns in the series
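
A minimal sketch of this formula, with a hypothetical helper name and made-up returns. An investment that gains 100% and then loses 50% ends where it started, which the geometric mean reports and the arithmetic mean does not.

```python
import math


def geometric_mean_return(returns):
    """Average growth rate: [(1 + R1) * ... * (1 + Rn)] ** (1 / n) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


returns = [1.0, -0.5]  # hypothetical: +100% in year 1, -50% in year 2

arithmetic = sum(returns) / len(returns)    # 0.25, suggests 25% growth per year
geometric = geometric_mean_return(returns)  # 0.0, the investment did not grow
print(arithmetic, geometric)
```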

25
Q

arithmetic mean

A

the sum of the observations divided by the number of observations

26
Q

What are the measurements of spread?

A

range, variance, SD, coefficient of variation, interquartile range

27
Q

Range

A
  • in a set of measurements, it is the difference between the largest and smallest observations
  • it is sensitive to extreme observations
  • it provides no information on the dispersion of the values between the end points
28
Q

Variance

A
  • is the average squared deviation of the observations from their mean
  • population variance: σ² = Σ(xi − μ)² / N; the sample variance s² uses x̄ in place of μ and divides by n − 1
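
A short sketch of the calculation on made-up data, done by hand and then with the standard `statistics` module. The card's formula divides by n, which is the population variance; the sample variance (`statistics.variance`) divides by n − 1 instead.

```python
from statistics import pstdev, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

# By hand: average squared deviation from the mean.
mu = sum(data) / len(data)                                   # 5.0
by_hand = sum((x - mu) ** 2 for x in data) / len(data)       # 32 / 8 = 4.0

print(by_hand)          # 4.0
print(pvariance(data))  # 4, population variance (divide by n)
print(pstdev(data))     # 2.0, the standard deviation is its square root
print(variance(data))   # 32 / 7, sample variance (divide by n - 1)
```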
29
Q

How do you calculate the SD?

A

Square root of the variance

30
Q

What is the coefficient of variation?

A

CV = SD / mean

The coefficient of variation (CV) is the ratio of the standard deviation to the mean. The higher the coefficient of variation, the greater the level of dispersion around the mean. It is generally expressed as a percentage. … The lower the value of the coefficient of variation, the more precise the estimate.
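
A small sketch with a hypothetical helper and made-up data. It assumes the sample standard deviation is meant, which the card does not specify. Both data sets have the same SD, but the spread is relatively larger around the smaller mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """CV = standard deviation / mean (sample SD assumed here)."""
    return stdev(data) / mean(data)


small = [10, 12, 14]     # mean 12, SD 2
large = [100, 102, 104]  # mean 102, SD 2

cv_small = coefficient_of_variation(small)  # 2 / 12, about 16.7%
cv_large = coefficient_of_variation(large)  # 2 / 102, about 2.0%
print(f"{cv_small:.1%} {cv_large:.1%}")
```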

31
Q

symbol for sample SD

A

s

32
Q

symbol for population SD

A

σ (the Greek letter sigma)

33
Q

If a sample of measurements is bell-shaped, the interval (x̄ − s, x̄ + s) contains approximately what percentage of the measurements?

A

68% of the measurements

34
Q

If a sample of measurements is bell-shaped, the interval (x̄ − 2s, x̄ + 2s) contains approximately what percentage of the measurements?

A

95%

35
Q

If a sample of measurements is bell-shaped, the interval (x̄ − 3s, x̄ + 3s) contains approximately what percentage of the measurements?

A

99.7%
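
The three percentages (68%, 95%, 99.7%) can be reproduced from the normal distribution, taking the normal curve as the model of a bell shape. A quick sketch using the standard library's `NormalDist`; the helper name `share_within` is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```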

36
Q

Chebysheff’s theorem

A

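Chebyshev’s theorem: the proportion of measurements lying within k standard deviations of the mean is at least 1 − 1/k², for k > 1.

It holds for any numerical data set, whereas the empirical rule applies only to bell-shaped distributions.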
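
Chebyshev's theorem says that at least 1 − 1/k² of the measurements lie within k standard deviations of the mean, for any k > 1. A minimal sketch, with a hypothetical function name:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of measurements within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


print(chebyshev_lower_bound(2))  # 0.75: at least 75% within 2 standard deviations
print(chebyshev_lower_bound(3))  # about 0.889: at least 88.9% within 3
```

These guarantees are weaker than the empirical rule's 95% and 99.7% because they have to hold for every distribution, not just bell-shaped ones.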

37
Q

How to locate a percentile

A
  1. sort the measurements in ascending order
  2. Lp (location of the p-th percentile) = (n + 1) × p/100
    n = number of observations
  3. if Lp = 2.75, for instance, the percentile lies 75% of the way from the 2nd value to the 3rd value
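
The three steps can be sketched as follows. The function name and data are made up, and clamping to the smallest or largest value when the location falls outside 1..n is an assumption the card does not cover.

```python
def percentile(data, p):
    """p-th percentile located at (n + 1) * p / 100 in the sorted data."""
    values = sorted(data)             # step 1: sort
    n = len(values)
    location = (n + 1) * p / 100      # step 2: locate (1-based position)
    if location <= 1:                 # assumption: clamp below the first value
        return values[0]
    if location >= n:                 # assumption: clamp above the last value
        return values[-1]
    lower = int(location)             # e.g. 2 when the location is 2.75
    fraction = location - lower       # e.g. 0.75
    # step 3: move `fraction` of the way from the lower value to the next one
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]  # hypothetical, n = 10
print(percentile(data, 25))  # location 2.75 -> 0 + 0.75 * (5 - 0) = 3.75
print(percentile(data, 50))  # location 5.5  -> 8.5 (the median)
print(percentile(data, 75))  # location 8.25 -> 14 + 0.25 * (22 - 14) = 16.0
```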
38
Q

Box plots

A
- is a pictorial display of the main descriptive measures of a data set, using the so-called five-number summary of the data
S = the smallest measurement
Q1 = the lower quartile
Q2 = the median
Q3 = the upper quartile
L = the largest measurement
39
Q

What is an outlier in the Box plot?

A
  • an outlier is a value located more than 1.5 × (Q3 − Q1) away from the box (that is, beyond the whiskers)
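
A rough sketch of this rule on made-up data; the helper name is hypothetical. The quartiles come from `statistics.quantiles`, whose default method interpolates at position (n + 1) × p/100 in the sorted data.

```python
from statistics import quantiles


def box_plot_outliers(data, q1, q3):
    """Values lying more than 1.5 * (Q3 - Q1) outside the box."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


data = [2, 4, 5, 6, 7, 8, 9, 30]   # hypothetical measurements
q1, q2, q3 = quantiles(data, n=4)  # quartiles: 4.25, 6.5, 8.75
outliers = box_plot_outliers(data, q1, q3)
print(q1, q3, outliers)            # 4.25 8.75 [30]
```

Here the fences are 4.25 − 6.75 = −2.5 and 8.75 + 6.75 = 15.5, so only 30 is flagged.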
40
Q

Where to draw the Whisker?

A

To the left of Q1 and to the right of Q3, each extending outward to the most extreme observation that is not an outlier

41
Q

What are descriptive statistics?

A

Methods for organizing, summarizing, and presenting sample data

- graphical and numerical techniques

42
Q

graphical technique

A

Bar and Pie charts, line charts, histogram

43
Q

numerical technique

A
  • central location
  • variability

44
Q

central location/tendency

A

arithmetic mean, median, mode

45
Q

variability

A

range, variance, SD, coefficient of variation, percentile, box plot