Descriptive Stats Flashcards by Mansi Bhatia

What are measures of central tendency?

Methods used to determine the central point of a given distribution

Mode

The score with the highest frequency in a frequency distribution (visually, the value under the highest peak).
For a grouped distribution (histogram), it is defined as the most frequently occurring interval (or the midpoint of that interval).
It is applicable to almost all kinds of data sets.
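
As a rough illustration of the two definitions on this card, the sketch below finds the mode of ungrouped scores and the modal interval of the same scores after grouping. The scores and the class-interval width of 5 are made-up assumptions, not values from the deck.

```python
from collections import Counter
from statistics import multimode

scores = [2, 3, 3, 5, 7, 3, 5, 9, 8, 6]   # hypothetical scores

# Ungrouped data: the mode is the most frequent score.
# multimode returns every value tied for the highest count.
modes = multimode(scores)

# Grouped data: count scores per class interval, then take the fullest one.
width = 5                                  # assumed class-interval width
bins = Counter((x // width) * width for x in scores)
lower, _ = bins.most_common(1)[0]
modal_interval = (lower, lower + width)    # interval [lower, lower + width)
modal_midpoint = lower + width / 2
```

With these scores the ungrouped mode (3) does not fall inside the modal interval (5 to 10), which is one way the mode can lack precision.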

Disadvantages of mode

Lack of reliability
Lack of precision in some cases

Unimodal distribution

A distribution with a single peak (one mode)

Bimodal distribution

A distribution with two peaks (two modes)

Properties of mean

If a constant is added to (or subtracted from) every score in a distribution, the mean is increased (or decreased) by that constant.
If every score is multiplied (or divided) by the same constant, the mean is multiplied (or divided) by that constant.
The sum of the deviations from the mean is equal to zero.
The sum of squared deviations from the mean is smaller than the sum of squared deviations around any other point in the distribution.
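
The four properties can be checked numerically. The following is a minimal sketch on made-up scores; the constant `c` and the helper `ss_around` are illustrative names, not part of the deck.

```python
from statistics import mean

scores = [4, 8, 6, 2, 10]          # hypothetical scores
m = mean(scores)                   # mean of the original scores
c = 3                              # arbitrary constant

# Adding a constant shifts the mean by that constant.
shifted_mean = mean([x + c for x in scores])

# Multiplying by a constant scales the mean by that constant.
scaled_mean = mean([x * c for x in scores])

# Deviations from the mean cancel out.
sum_of_deviations = sum(x - m for x in scores)

def ss_around(point):
    """Sum of squared deviations of the scores around an arbitrary point."""
    return sum((x - point) ** 2 for x in scores)
```

Comparing `ss_around(m)` with `ss_around(p)` for any other point `p` shows the least-squares property of the mean.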

Median

The middle score of a distribution once the data are arranged in ascending order.
Corresponds to the 50th percentile of the distribution.
If a distribution has an odd number of scores, the median is literally the middle value of the ordered data.
If a distribution has an even number of scores, the median is the average of the two middle scores.
In principle, it divides a distribution into two equal halves.
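
The odd and even cases described above can be sketched as a small function. This is an illustrative implementation on made-up scores; Python's `statistics.median` follows the same rule.

```python
def median(scores):
    """Middle value of the ordered scores; mean of the two middle values if n is even."""
    ordered = sorted(scores)       # arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd count: the single middle score
        return ordered[mid]
    # even count: average of the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([7, 1, 5])       # ordered: 1, 5, 7
even_case = median([7, 1, 5, 3])   # ordered: 1, 3, 5, 7
```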

Disadvantages of median

It is not applicable to all kinds of data sets
(e.g., the median cannot be identified for nominal categorical data, because such data cannot be logically ordered).
The median is more informative when there are not many ties and the distribution is skewed.

What happens if there is a great deal of variability?

No measure of central tendency is very representative of the scores, if the
distribution contains a great deal of variability.

Different measures of variability

Range
Semi-Interquartile range
Mean deviation
Variance
Standard deviation.

Range

Evaluates the width of a distribution by subtracting the lowest score (its lower real limit) from the highest score (its upper real limit).
Its advantage is that it spans the whole distribution.

Disadvantages of range

The major disadvantage of range is that, just like mode, it’s unreliable.
The range can be changed drastically by removing or adding just one score in the
distribution.

Semi-interquartile range

This type of measure of variability can be used for open-ended distribution.
The interquartile range is obtained by subtracting the 25th percentile from the 75th
percentile. The semi-interquartile range is half the interquartile range.
It does not get affected much by addition or subtraction of extreme scores from a
distribution.
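
To see why the semi-interquartile range is more robust than the range, the sketch below computes both before and after replacing the top score with an extreme value. The scores are made up, and the percentile convention (`method="inclusive"` in `statistics.quantiles`) is an assumption; other conventions give slightly different quartiles.

```python
from statistics import quantiles

def semi_interquartile_range(scores):
    """Half the distance between the 25th and 75th percentiles."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")  # assumed convention
    return (q3 - q1) / 2

def value_range(scores):
    """Highest score minus lowest score (real limits ignored for simplicity)."""
    return max(scores) - min(scores)

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]          # hypothetical scores
with_extreme = [2, 4, 4, 5, 7, 8, 9, 10, 100]   # top score replaced by an extreme one

siqr_before = semi_interquartile_range(scores)
siqr_after = semi_interquartile_range(with_extreme)
range_before = value_range(scores)
range_after = value_range(with_extreme)
```

Here one extreme score changes the range from 10 to 98 while the semi-interquartile range stays the same.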

Mean deviation

It takes the absolute distance of every score from the mean of the distribution and averages those distances

Deviation score = ?

Individual score - Mean

Mean deviation calculation

Compute the deviation score for every observation
Take the mean of all the absolute deviation scores
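
A minimal sketch of this calculation on made-up scores follows. It assumes the sign convention "score minus mean" used on the SS card; because absolute values are taken, the sign does not change the result.

```python
scores = [4, 8, 6, 2, 10]                  # hypothetical scores
n = len(scores)
m = sum(scores) / n                        # mean = 6.0

# Step 1: deviation score for every observation.
deviations = [x - m for x in scores]       # [-2.0, 2.0, 0.0, -4.0, 4.0]

# Step 2: mean of the absolute deviation scores.
mean_deviation = sum(abs(d) for d in deviations) / n
```

Without the absolute values the deviations would sum to zero, so their plain mean would say nothing about spread.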

What is variance also referred to as

Mean square

SS = ?

summation of (individual score - mean)^2

Variance = ?

SS / N
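
The SS and variance cards can be worked through on a small made-up data set. The sketch below uses the population form (division by N) given on the card and cross-checks it against `statistics.pvariance`.

```python
from statistics import pvariance

scores = [4, 8, 6, 2, 10]                  # hypothetical scores
n = len(scores)
m = sum(scores) / n                        # mean = 6.0

# SS: summation of (individual score - mean)^2
ss = sum((x - m) ** 2 for x in scores)     # 4 + 4 + 0 + 16 + 16

# Variance ("mean square"): SS divided by N
variance = ss / n

# Standard-library population variance, for comparison
library_variance = pvariance(scores)
```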

Standard deviation

Calculated by taking the square root of the variance
Also called the root-mean-square deviation
Strongly affected by scores with large deviations from the mean

Properties of standard deviation

If a constant is added to (or subtracted from) every score in a distribution, the standard deviation is not affected.
If every score is multiplied (or divided) by the same positive constant, the standard deviation is multiplied (or divided) by that constant.
The standard deviation computed around the mean is smaller than the same quantity computed around any other point in the distribution.
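
These properties can be verified numerically. The sketch below uses made-up scores and the population standard deviation (`statistics.pstdev`); the helper `rms_deviation` is an illustrative name for the root-mean-square deviation around an arbitrary point.

```python
import math
from statistics import pstdev

scores = [4, 8, 6, 2, 10]                  # hypothetical scores, mean = 6
c = 3                                      # arbitrary positive constant

sd = pstdev(scores)                        # square root of the variance (8)
sd_shifted = pstdev([x + c for x in scores])   # adding c: spread unchanged
sd_scaled = pstdev([x * c for x in scores])    # multiplying by c: spread scaled by c

def rms_deviation(point):
    """Root-mean-square deviation of the scores around an arbitrary point."""
    return math.sqrt(sum((x - point) ** 2 for x in scores) / len(scores))
```

`rms_deviation(6)` equals the standard deviation, and any other point gives a larger value.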

What is positive skewness

Positive skewness indicates an asymmetrical distribution with a long right tail.

What is negative skewness

Negative skewness indicates an asymmetrical distribution with a long left tail.

Skewness = ?

Summation of (individual score - mean)^3 / N, divided by the cube of the standard deviation

Central tendencies of a skewed distribution

- When the distribution is negatively skewed, the mean will be to the left of the median
- When the distribution is positively skewed, the mean will be to the right of the median
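
The relation between the sign of the skewness and the position of the mean can be illustrated on made-up data. The sketch below assumes the standardized form of skewness (third central moment divided by the cubed population standard deviation); the sign is the same with or without that standardization.

```python
from statistics import mean, median

def skewness(scores):
    """Third central moment divided by the cubed standard deviation."""
    n = len(scores)
    m = sum(scores) / n
    sd = (sum((x - m) ** 2 for x in scores) / n) ** 0.5
    third_moment = sum((x - m) ** 3 for x in scores) / n
    return third_moment / sd ** 3

right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]     # hypothetical, long right tail
left_tailed = [-x for x in right_tailed]     # mirror image, long left tail

right_summary = (skewness(right_tailed), mean(right_tailed), median(right_tailed))
left_summary = (skewness(left_tailed), mean(left_tailed), median(left_tailed))
```

In the right-tailed data the extreme score pulls the mean (4.75) above the median (3); mirroring the data flips both the sign of the skewness and the order of mean and median.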

Important distinction

Two distributions can both be symmetric (i.e., skewness equals zero), unimodal, and bell-shaped, and yet not be identical in shape.

How can kurtosis be measured?

By raising the deviations from the mean to the fourth power, taking their average, and then dividing by the square of the population variance.

What does negative kurtosis indicate?

Relatively thin tails and less peakedness in the middle (a platykurtic distribution).

What does positive kurtosis indicate

Relatively fat tails and more peakedness in the middle of the distribution (a leptokurtic distribution).

What is mesokurtic distribution?

A distribution with the same kurtosis as the normal distribution: a kurtosis of 3, or zero once 3 is subtracted from the kurtosis measure.

What is kurtosis measured relative to?

Relative to the kurtosis of a normal distribution, which is 3. Therefore, we are always interested in the "excess" kurtosis.

Excess kurtosis= ?

Excess kurtosis = sample kurtosis – 3
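
The kurtosis recipe and the excess-kurtosis correction can be sketched as follows. The two data sets are made up to show one negative and one positive result, and the function names are illustrative.

```python
def kurtosis(scores):
    """Average fourth-power deviation divided by the squared population variance."""
    n = len(scores)
    m = sum(scores) / n
    variance = sum((x - m) ** 2 for x in scores) / n
    fourth_moment = sum((x - m) ** 4 for x in scores) / n
    return fourth_moment / variance ** 2

def excess_kurtosis(scores):
    """Kurtosis relative to the normal distribution, whose kurtosis is 3."""
    return kurtosis(scores) - 3

flat = [1, 2, 3, 4, 5]                        # hypothetical, evenly spread scores
heavy_tailed = [-10, -1, 0, 0, 0, 0, 1, 10]   # hypothetical, a few extreme scores

flat_excess = excess_kurtosis(flat)           # negative: platykurtic
heavy_excess = excess_kurtosis(heavy_tailed)  # positive: leptokurtic
```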

What is kurtosis used for quantifying?

Non-normality, that is, the deviation of a distribution from a normal distribution.

What does a value of 3 or more indicate?

large departure from normality.

What does a very small value of kurtosis indicate?

A deviation from normality, but one that is considered benign.

What is population analysis?

statistics applied to the whole data set

What is sample analysis?

Statistics applied to a subset of the whole data set

Sample variance can be

Larger or smaller than population variance

What equals the population variance?

The average of the sample variances, if infinitely many samples are drawn and their variances (computed with N - 1 in the denominator) are averaged

What is degree of freedom

The number of deviations that are free to vary

df = ?

N-1
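
Why the sample variance divides by N - 1 can be shown exactly on a tiny example. The sketch below assumes a population of six die faces and enumerates every possible sample of size 2 drawn with replacement, so no random simulation is needed; the names and the population are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]            # hypothetical population (die faces)

def variance(scores, ddof):
    """Sum of squares divided by (N - ddof); ddof=0 is the population form."""
    n = len(scores)
    m = Fraction(sum(scores), n)
    ss = sum((x - m) ** 2 for x in scores)
    return ss / (n - ddof)

population_variance = variance(population, 0)

# Every possible sample of size 2, drawn with replacement (36 samples).
samples = list(product(population, repeat=2))

# Average of the sample variances using df = N - 1 ...
average_with_df = sum(variance(s, 1) for s in samples) / len(samples)
# ... and using N in the denominator instead.
average_with_n = sum(variance(s, 0) for s in samples) / len(samples)
```

Individual sample variances are larger or smaller than the population variance, but with N - 1 their average matches it exactly, while dividing by N underestimates it.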

What is confidence interval?

the range of likely values of the parameter

What is the standard error of the mean?

The standard deviation divided by the square root of the sample size (the number of observations in the sample).
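
A short sketch of the calculation on a made-up sample follows. It uses the sample standard deviation (N - 1 in the denominator), and the interval at the end assumes a normal approximation with the multiplier 1.96; for a sample this small a t-based multiplier would normally be used instead.

```python
import math
from statistics import mean, stdev

sample = [4, 8, 6, 2, 10]                  # hypothetical sample
n = len(sample)

sample_mean = mean(sample)                 # 6
sample_sd = stdev(sample)                  # sample SD, df = N - 1
standard_error = sample_sd / math.sqrt(n)  # SD / sqrt(sample size)

# Rough 95% confidence interval for the mean (normal approximation, assumed).
z = 1.96
confidence_interval = (sample_mean - z * standard_error,
                       sample_mean + z * standard_error)
```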

What is the variance

The average of the squared deviations from the mean, taken over the number of observations.

What are outliers

those observations that differ strongly (different properties) from the other data points in the sample of a population.

Sources of outliers

Human errors (wrong data entry), measurement errors (faulty system or tool), data manipulation errors (faulty data pre-processing), sampling errors (creating samples from heterogeneous sources)

Methods for indicating outliers

1. Tukey’s Fences (or Quartile method)
2. Z-Score
3. Local Outlier Factor (LOF)
4. Angle-Based Outlier Detection (ABOD)
5. Silhouette (K-Means Clustering)
6. Confidence Interval (CI) of fit
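
The first two methods in this list can be sketched in a few lines. The code below assumes the conventional multiplier of 1.5 times the interquartile range for the fences, the "inclusive" quartile convention, and a z-score cut-off chosen by the analyst; none of these values come from the deck, and the scores are made up.

```python
from statistics import mean, pstdev, quantiles

def tukey_outliers(scores, k=1.5):
    """Scores beyond the fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 assumed)."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in scores if x < lower_fence or x > upper_fence]

def zscore_outliers(scores, threshold=3.0):
    """Scores whose absolute z-score exceeds the chosen threshold."""
    m = mean(scores)
    sd = pstdev(scores)
    return [x for x in scores if abs(x - m) / sd > threshold]

scores = [2, 4, 4, 5, 7, 8, 9, 10, 100]    # hypothetical scores with one extreme value

by_fences = tukey_outliers(scores)
by_zscore_strict = zscore_outliers(scores, threshold=3.0)
by_zscore_loose = zscore_outliers(scores, threshold=2.5)
```

In this small sample the extreme score inflates the standard deviation so much that its z-score stays below 3, so the stricter z-score rule misses it while the fences flag it.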

What is H-spread

The length of the box; it is equal to the interquartile range, not the semi-interquartile range

What are the inner fences

The outermost limits of the plot

Inner fence is equal to

The whiskers do not generally extend to the

inner fences

End of upper and lower inner fences are known as

upper adjacent value and lower adjacent value

Descriptive Stats Flashcards

(52 cards)