Descriptive Stats Flashcards

1
Q

What are measures of central tendency

A

Methods used to determine the central point of a given distribution

2
Q

Mode

A
  • Corresponds to the score with the highest frequency in a frequency distribution (visually, it is
    the score under the highest point of the distribution).
  • For a grouped distribution (histogram), it is defined as the most frequently occurring interval (or
    the mid-point of that interval).
  • It is applicable to almost all kinds of data-sets.
3
Q

Disadvantages of mode

A
  • Lack of reliability
  • Lack of precision in some cases
4
Q

Unimodal distribution

A

Distributions with a single highest value (one peak)

5
Q

Bimodal distribution

A

Distributions with two highest values (two peaks)
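As a rough illustration of the mode cards, the sketch below uses Python's standard `statistics.multimode` to list every score tied for the highest frequency and labels the distribution by how many such scores there are. The helper name `describe_modality` and the sample data are made up for this example, and counting only exact ties as separate modes is a simplifying assumption.

```python
from statistics import multimode


def describe_modality(scores):
    """Return the modes of `scores` and a label for how many modes there are."""
    modes = multimode(scores)  # every value tied for the highest frequency
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")


one_peak = [1, 2, 2, 3, 3, 3, 4]        # hypothetical scores
two_peaks = [1, 1, 1, 2, 3, 4, 4, 4]    # hypothetical scores

print(describe_modality(one_peak))    # ([3], 'unimodal')
print(describe_modality(two_peaks))   # ([1, 4], 'bimodal')
# The mode also works for nominal data, which cannot be ordered.
print(describe_modality(["red", "blue", "red"]))  # (['red'], 'unimodal')
```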

6
Q

Properties of mean

A
  • If a constant is added to (or subtracted from) every score in a distribution, the mean is increased
    (or decreased) by that constant.
  • If every score is multiplied (or divided) by the same constant, the mean will be multiplied (or
    divided) by the same constant.
  • The sum of deviations from the mean will be equal to zero.
  • The sum of squared deviations from the mean will be less than the sum of squared
    deviations around any other point in the distribution.
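The four properties can be checked numerically. This is only a small sketch with made-up scores; the constants 5 and 3 and the helper `squared_deviations_around` are arbitrary choices for the example.

```python
from statistics import mean

scores = [2, 4, 6, 8, 10]  # hypothetical scores
m = mean(scores)

# Adding a constant shifts the mean by that constant.
shifted_mean = mean([x + 5 for x in scores])
# Multiplying by a constant multiplies the mean by that constant.
scaled_mean = mean([x * 3 for x in scores])
# Deviations from the mean cancel out.
sum_of_deviations = sum(x - m for x in scores)


def squared_deviations_around(point):
    """Sum of squared deviations of the scores around an arbitrary point."""
    return sum((x - point) ** 2 for x in scores)


print(m, shifted_mean, scaled_mean)     # 6 11 18
print(sum_of_deviations)                # 0
print(squared_deviations_around(m))     # 40
print(squared_deviations_around(5))     # 45, larger than around the mean
```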
7
Q

Median

A
  • The middle score of a distribution, found after arranging the data in ascending order.
  • Corresponds to the 50th percentile of a distribution.
  • If a distribution has an odd number of scores, the median is literally the middle value
    (provided the data are arranged in ascending order).
  • If a distribution has an even number of scores, the median is the average of the two middle scores.
  • In principle it divides a distribution into two equal halves.
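The odd and even cases can be written out directly. This is a minimal sketch; the function name `median_of` is made up for the example, and Python's built-in `statistics.median` follows the same rule.

```python
def median_of(scores):
    """Median of a non-empty list of numeric scores."""
    if not scores:
        raise ValueError("median_of() needs at least one score")
    ordered = sorted(scores)          # arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the middle score itself
    # even count: average of the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([7, 1, 5]))       # 5
print(median_of([7, 1, 5, 3]))    # 4.0
```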
8
Q

Disadvantages of median

A
  • It is not applicable to all kinds of data-sets
    (e.g., the median cannot be identified for categorical nominal data, as such data cannot be
    logically ordered).
  • The median is more informative if there are not many ties and the distribution is skewed.
9
Q

What happens if there is a great deal of variability

A

No measure of central tendency is very representative of the scores, if the
distribution contains a great deal of variability.

10
Q

Different measures of variability

A
  • Range
  • Semi-Interquartile range
  • Mean deviation
  • Variance
  • Standard deviation.
11
Q

Range

A
  • Evaluates the width of a distribution by subtracting the lowest score (its lower real limit) from
    the highest score (its upper real limit).
  • The advantage is that it captures the whole distribution.
12
Q

Disadvantages of range

A
  • The major disadvantage of range is that, just like mode, it’s unreliable.
  • The range can be changed drastically by removing or adding just one score in the
    distribution.
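A small sketch of the range and its sensitivity to a single score. It assumes scores are measured in whole units, so the real limits sit half a unit beyond the lowest and highest scores; the `unit` parameter and the function name `score_range` are hypothetical.

```python
def score_range(scores, unit=1.0):
    """Range measured between real limits (half a unit beyond each extreme score)."""
    upper_real_limit = max(scores) + unit / 2
    lower_real_limit = min(scores) - unit / 2
    return upper_real_limit - lower_real_limit


scores = [3, 5, 7, 8, 10]             # hypothetical scores
print(score_range(scores))            # 8.0
# Adding one extreme score changes the range drastically.
print(score_range(scores + [40]))     # 38.0
```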
13
Q

Semi-interquartile range

A
  • This type of measure of variability can be used for open-ended distribution.
  • The interquartile range is obtained by subtracting the 25th percentile from the 75th
    percentile. The semi-interquartile range is half the interquartile range.
  • It does not get affected much by addition or subtraction of extreme scores from a
    distribution.
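A sketch of the calculation using `statistics.quantiles` from the Python standard library. Quartile conventions differ slightly between textbooks and software, so the exact percentile values here reflect that function's default method rather than any one course's definition; the data are made up.

```python
from statistics import quantiles


def semi_interquartile_range(scores):
    """Half the distance between the 25th and 75th percentiles."""
    q1, _median, q3 = quantiles(scores, n=4)  # three cut points: Q1, Q2, Q3
    return (q3 - q1) / 2


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]           # hypothetical scores
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]   # one extreme score

print(semi_interquartile_range(scores))         # 3.0
print(semi_interquartile_range(with_extreme))   # 3.0, unaffected by the extreme score
```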
14
Q

Mean deviation

A
  • It takes the absolute distance of every score from the mean of the distribution and averages these distances
15
Q

Deviation score = ?

A

Individual score - Mean

16
Q

Mean deviation calculation

A
  • Compute the deviation score for every score
  • Take the mean of all the absolute deviation scores
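The two steps can be sketched as follows. The function name `mean_deviation` and the scores are made up; the example also shows why absolute values are needed, since the signed deviations always sum to zero.

```python
from statistics import mean


def mean_deviation(scores):
    """Average absolute distance of the scores from their mean."""
    m = mean(scores)
    absolute_deviations = [abs(x - m) for x in scores]
    return mean(absolute_deviations)


scores = [2, 4, 6, 8, 10]  # hypothetical scores
signed = [x - mean(scores) for x in scores]

print(sum(signed))              # 0, signed deviations cancel out
print(mean_deviation(scores))   # 2.4
```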
17
Q

What is variance also referred to as

A

Mean square

18
Q

SS = ?

A

summation of (individual score - mean)^2

19
Q

Variance = ?

A

SS / N
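A minimal sketch of SS and the population variance, cross-checked against `statistics.pvariance` from the Python standard library. The helper names and the scores are made up for the example.

```python
from statistics import mean, pvariance


def sum_of_squares(scores):
    """SS: sum of squared deviations of the scores from their mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)


def population_variance(scores):
    """Variance (mean square): SS divided by the number of scores N."""
    return sum_of_squares(scores) / len(scores)


scores = [2, 4, 6, 8, 10]  # hypothetical scores
print(sum_of_squares(scores))        # 40
print(population_variance(scores))   # 8.0
print(pvariance(scores))             # 8, same value from the standard library
```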

20
Q

Standard deviation

A
  • Calculated by taking the square root of the variance
  • Also called the root mean square deviation
  • Strongly affected by scores with large deviations in the distribution
21
Q

Properties of standard deviation

A
  • If a constant is added (or subtracted) to every score in a distribution, the standard deviation
    is not affected.
  • If every score is multiplied (or divided) by same constant, the standard deviation will be
    multiplied (or divided) by the same constant.
  • The standard deviation from the mean will be smaller than the standard deviation from any
    other point in the distribution.
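These properties can be checked numerically with `statistics.pstdev`. The scores, the constants 5 and 3, and the helper `rms_deviation_around` are arbitrary choices for this sketch.

```python
from math import isclose, sqrt
from statistics import mean, pstdev

scores = [2, 4, 6, 8, 10]  # hypothetical scores
sd = pstdev(scores)

# Adding a constant leaves the standard deviation unchanged.
shifted_sd = pstdev([x + 5 for x in scores])
# Multiplying by a constant multiplies the standard deviation by it.
scaled_sd = pstdev([x * 3 for x in scores])


def rms_deviation_around(point):
    """Root mean square deviation of the scores around an arbitrary point."""
    return sqrt(sum((x - point) ** 2 for x in scores) / len(scores))


print(isclose(shifted_sd, sd))                      # True
print(isclose(scaled_sd, 3 * sd))                   # True
print(rms_deviation_around(mean(scores)) < rms_deviation_around(5))  # True
```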
22
Q

What is positive skewness

A

Positive skewness describes an asymmetrical distribution with a long right tail.

23
Q

What is negative skewness

A

Negative skewness describes an asymmetrical distribution with a long left tail.

24
Q

Skewness = ?

A

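[Summation of (individual score - mean)^3 / N] / (standard deviation)^3. The bracketed term is the average cubed deviation; dividing it by the cubed standard deviation makes the measure unit-free.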
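A sketch of the skewness calculation. It averages the cubed deviations and then, as is conventional for a unit-free value, divides by the cubed population standard deviation; the sign of the result is the same either way. The function name and the data are made up, and the function assumes the scores are not all identical (otherwise the standard deviation is zero).

```python
from statistics import mean, median, pstdev


def skewness(scores):
    """Average cubed deviation from the mean, divided by the cubed standard deviation."""
    m = mean(scores)
    third_moment = sum((x - m) ** 3 for x in scores) / len(scores)
    return third_moment / pstdev(scores) ** 3


right_tailed = [1, 2, 2, 3, 10]              # hypothetical scores, long right tail
left_tailed = [-x for x in right_tailed]     # mirror image, long left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)   # True: positive skew
print(skewness(left_tailed) < 0)    # True: negative skew
print(skewness(symmetric))          # 0.0
# With a positive skew the mean is pulled to the right of the median.
print(mean(right_tailed) > median(right_tailed))   # True
```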

25
Central tendencies of a skewed distribution
- When the distribution is negatively skewed, the mean will be to the left of the median
- When the distribution is positively skewed, the mean will be to the right of the median
26
Important distinction
Two distributions can both be symmetric (i.e., skewness equals zero), unimodal, and bell-shaped and yet not be identical in shape; they can still differ in kurtosis.
27
How can kurtosis be measured
by raising deviations from the mean to the fourth power, taking their average, and then dividing by the square of the population variance.
28
What does negative kurtosis indicate?
relatively thin tails and a lesser peakedness in the middle (a platykurtic distribution).
29
What does positive kurtosis indicate
relatively fat tails and more peakedness in the middle of the distribution (a leptokurtic distribution).
30
What is mesokurtic distribution?
A distribution with the same kurtosis as the normal distribution. The kurtosis measure is set to zero for the normal (mesokurtic) distribution by subtracting 3 in the above formula.
31
What is kurtosis measured relative to?
relative to the kurtosis of a normal distribution, which is 3. Therefore, we are always interested in the “excess” kurtosis.
32
Excess kurtosis= ?
Excess kurtosis = sample kurtosis – 3
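A sketch of the kurtosis calculation described in the cards above: the average fourth-power deviation divided by the squared population variance, minus 3. The function name and both data sets are made up, and the function assumes the scores are not all identical.

```python
from statistics import mean, pvariance


def excess_kurtosis(scores):
    """Average fourth-power deviation / variance squared, minus 3 (the normal value)."""
    m = mean(scores)
    fourth_moment = sum((x - m) ** 4 for x in scores) / len(scores)
    kurtosis = fourth_moment / pvariance(scores) ** 2
    return kurtosis - 3


flat = [1, 2, 3, 4, 5]                        # evenly spread, thin tails
heavy_tails = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]   # most scores at the centre, two far out

print(excess_kurtosis(flat) < 0)          # True: platykurtic
print(excess_kurtosis(heavy_tails) > 0)   # True: leptokurtic
```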
33
What is kurtosis used for quantifying?
non-normality—the deviation from a normal distribution—of a distribution.
34
What does a value of 3 or more indicate?
large departure from normality.
35
What does a very small value of kurtosis indicate?
a deviation from normality, but one that is considered benign.
36
What is population analysis?
statistics applied to the whole data set
37
What is sample analysis
Statistics applied to a subset of the whole data set
38
Sample variance can be
Larger or smaller than population variance
39
What equals the population variance
The average of infinitely many sample variances (each computed with N - 1 in the denominator)
40
What is degree of freedom
The number of deviations that are free to vary
41
df = ?
N-1
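Why N - 1 is used can be shown on a tiny example. Instead of infinitely many samples, this sketch enumerates every possible sample of size 2 (drawn with replacement) from a made-up three-score population: averaging the N - 1 sample variances (`statistics.variance`) recovers the population variance, while dividing by N (`statistics.pvariance` applied to each sample) underestimates it.

```python
from itertools import product
from math import isclose
from statistics import mean, pvariance, variance

population = [2, 4, 6]  # hypothetical population
true_variance = pvariance(population)

# Every ordered sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Sample variance with N - 1 in the denominator (df = N - 1).
avg_unbiased = mean([variance(s) for s in samples])
# Same samples, but dividing by N instead.
avg_biased = mean([pvariance(s) for s in samples])

print(isclose(avg_unbiased, true_variance))   # True
print(avg_biased < true_variance)             # True: dividing by N underestimates
```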
42
What is confidence interval?
the range of likely values of the parameter
43
What is the standard error of the mean
the standard deviation divided by the square root of the sample size (the number of observations in the sample).
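A sketch of the standard error and a rough confidence interval built from it. Two assumptions are made here that the cards do not state: the standard deviation is estimated from the sample with N - 1 (`statistics.stdev`), and the interval uses a normal approximation, which is only reasonable for fairly large samples. The function names and data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def standard_error(scores):
    """Sample standard deviation divided by the square root of the sample size."""
    return stdev(scores) / sqrt(len(scores))


def approx_confidence_interval(scores, level=0.95):
    """Mean plus or minus z * SEM, using a normal approximation (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    m, se = mean(scores), standard_error(scores)
    return m - z * se, m + z * se


scores = [2, 4, 6, 8, 10]  # hypothetical sample
print(standard_error(scores))
print(approx_confidence_interval(scores))
```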
44
What is the variance
the average of the squared deviations from the mean, i.e., the sum of squared deviations divided by the number of scores.
45
What are outliers
those observations that differ strongly (different properties) from the other data points in the sample of a population.
46
Sources of outliers
Human errors (wrong data entry), measurement errors (faulty system/tool), data manipulation errors (faulty data pre-processing), sampling errors (creating samples from heterogeneous sources)
47
Methods for indicating outliers
1. Tukey’s Fences (or Quartile method)
2. Z-Score
3. Local Outlier Factor (LOF)
4. Angle-Based Outlier Detection (ABOD)
5. Silhouette (K-Means Clustering)
6. Confidence Interval (CI) of fit
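As an illustration of the Z-score method from the list, the sketch below flags scores whose distance from the mean exceeds a chosen number of standard deviations. The cutoff of 3 is a common convention rather than something stated in these cards, and the function name and data are made up.

```python
from statistics import mean, pstdev


def zscore_outliers(scores, threshold=3.0):
    """Scores lying more than `threshold` standard deviations from the mean."""
    m, sd = mean(scores), pstdev(scores)
    return [x for x in scores if abs(x - m) / sd > threshold]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]  # hypothetical scores
print(zscore_outliers(scores))   # [100]
```

Note that an extreme score inflates the mean and standard deviation it is judged against, so this method can miss outliers in small samples.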
48
What is H-spread
The length of the box; it is equal to the interquartile range, not the semi-interquartile range
49
What are the inner fences
The outermost limits the whiskers of the plot can reach; scores beyond them are treated as outliers
50
Distance of each inner fence from the edge of the box
1.5 times the H-spread
51
The whiskers do not generally extend to the
inner fences
52
The ends of the upper and lower whiskers are known as
upper adjacent value and lower adjacent value
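The box-plot quantities from the last few cards can be sketched together. This is an approximation under stated assumptions: the box edges are taken from `statistics.quantiles` (textbook hinges can differ slightly), the inner fences sit 1.5 H-spreads beyond the box, and the adjacent values are the most extreme scores still inside the fences. The function name `tukey_summary`, the parameter `k`, and the data are made up.

```python
from statistics import quantiles


def tukey_summary(scores, k=1.5):
    """H-spread, inner fences, adjacent values and outliers for a box plot."""
    q1, _median, q3 = quantiles(scores, n=4)   # approximate edges of the box
    h_spread = q3 - q1                          # length of the box (IQR)
    lower_fence = q1 - k * h_spread
    upper_fence = q3 + k * h_spread
    inside = [x for x in scores if lower_fence <= x <= upper_fence]
    return {
        "h_spread": h_spread,
        "inner_fences": (lower_fence, upper_fence),
        # whiskers end at the most extreme scores inside the fences
        "adjacent_values": (min(inside), max(inside)),
        "outliers": sorted(x for x in scores if x < lower_fence or x > upper_fence),
    }


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical scores
summary = tukey_summary(scores)
print(summary["h_spread"])          # 6.0
print(summary["inner_fences"])      # (-6.0, 18.0)
print(summary["adjacent_values"])   # (1, 10)
print(summary["outliers"])          # [30]
```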