Ch.14, Descriptive Statistics Flashcards by Elise Voyevoda

Define descriptive stats

Describe data in ways that give us a better idea of their characteristics; a number that summarizes a set of data
NOT a correlation statistic (correlations are inferential)

What is the simplest measure of dispersion?

Range: maximum − minimum

What are data matrices?

Putting data into a grid: a matrix
Opportunity to examine all data in one place

Histograms

Graphical display of values where each bar indicates the frequency of a value or range of values
Limitations: the more “accessible” you make a data set, the less information/complexity you’re conveying
Advantages: identifies the mode, helps to identify potential outliers

Binning

Binning in data mining is a data preprocessing technique that involves grouping data into smaller, more manageable categories or bins. It can be used for both numerical and categorical data and can help improve the efficiency and accuracy of data analysis.
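
As a rough illustration of binning numerical data into equal-width bins (the kind of grouping a histogram displays), here is a minimal Python sketch. The function name `bin_counts`, the bin width of 10, and the scores are made up for the example and are not from the course.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical whole-number scores, invented for this example.
scores = [52, 57, 61, 64, 68, 71, 73, 75, 79, 88]
histogram = bin_counts(scores, 10)

# Text histogram: one '#' per score in the bin.
for start, count in histogram.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

The tallest bar (here the 70-79 bin) shows where scores are most frequent.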

Stem-Plots

Both a graph and a chart: it displays each score in a data set so that it visually represents the distribution/frequency of scores
Stem: leading numbers
Leaves: trailing numbers
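
A small Python sketch of a stem plot, assuming non-negative whole-number scores where the stem is the tens part and the leaf is the final digit. The function name `stem_plot` and the scores are invented for illustration.

```python
from collections import defaultdict

def stem_plot(values):
    """Group each score's final digit (leaf) under its leading digits (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(sorted(stems.items()))

# Hypothetical scores, invented for this example.
scores = [52, 57, 61, 64, 68, 71, 73, 75, 79, 88]
plot = stem_plot(scores)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each row keeps every individual score visible while the row lengths show the shape of the distribution.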

What does sigma mean and what is its symbol?

Σ (capital sigma) = sum of all scores

What does x̄ (x with a bar over it) mean?

The mean

Mean, advantages/disadvantages

Advantages: very common, takes into account every entry of a data set
Disadvantages: extremely influenced by outliers, knowledge about individual cases is completely lost with average

Population vs. sample mean

Population Mean: (μ, the Greek letter mu) mean of the entire population, i.e. whatever you’re trying to make a generalization about; you CAN NEVER REALLY KNOW THIS (on charts)
Sample Mean: mean of your sample (on charts)

Median, advantages/disadvantages

Middle value (when entries are ordered from lowest to highest)
At the median, half the data set is below that number and half the data set is above that number
Position of median = (number of entries + 1) / 2
Odd number of entries: median is the middle data entry
Even number of entries: median is the mean of the two middle data entries
Advantages: not influenced by outliers; a reasonable estimate of what most people mean by the center of a distribution (e.g. a “reasonable” average salary in Canada, not including billionaires)
Disadvantages: may not be good to ignore extreme values in all cases
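
The position rule and the odd/even cases can be sketched in Python as follows. This is only an illustration; the function names and numbers are made up.

```python
def median_position(n):
    """1-based position of the median among n ordered entries."""
    return (n + 1) / 2

def median(values):
    ordered = sorted(values)
    n = len(ordered)
    pos = median_position(n)
    if n % 2 == 1:
        # Odd number of entries: pos is a whole number, take that entry.
        return ordered[int(pos) - 1]
    # Even number of entries: pos ends in .5, so average the two neighbours.
    lower = ordered[int(pos) - 1]
    upper = ordered[int(pos)]
    return (lower + upper) / 2

print(median([3, 1, 9, 5, 7]))      # odd count: the middle entry
print(median([1, 3, 5, 7]))         # even count: mean of 3 and 5
print(median([1, 3, 5, 7, 1000]))   # the extreme value does not move the median
```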

Mode, advantages/disadvantages

LEAST USED; the measure for NOMINAL/CATEGORICAL variables
Most frequently occurring value; if there is no entry that is repeated there is no mode
Data can be bimodal (two modes) or multi-modal (3 OR MORE MODES)
Elections use this often to represent which party was chosen the most; also used to ask what the most popular dish at a cafe is
Advantages: gives the most frequently obtained score, which can be useful; not influenced by extreme scores and works when outliers aren’t relevant
Disadvantages: may not represent a large proportion of the scores; other answers might be very frequent as well, and the mode completely ignores those
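
A minimal Python sketch of finding the mode(s), following the card's rule that a data set with no repeated entry has no mode. The function name `modes` and the party labels are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value; an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no entry is repeated, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes(["Party A", "Party B", "Party B", "Party C"]))  # one mode
print(modes([1, 1, 2, 2, 3]))                               # bimodal
print(modes([1, 2, 3]))                                     # no mode
```

Because it only counts labels, this works on categorical answers where a mean or median would be meaningless.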

Advantages/disadvantages of range

Range can never be negative: it ALWAYS HAS TO BE AN ABSOLUTE VALUE
Advantages: includes all the data, simple
Disadvantages: sensitive to small sample sizes (a small sample of a broader population won’t capture the full range, so small samples = less range and a range that is not representative); doesn’t tell you anything about where the bulk of the values are; affected by outliers

What are interquartiles?

INTERQUARTILES SHOW DISPERSION AROUND MEDIAN

What is a quartile?

Quartiles: positions in a range of values representing multiples of 25%

What are the first and third quartiles?

First Quartile: 25% of scores fall below the first quartile, 75% above (Q1: splitting bottom half in half)
Third Quartile: 75% of scores fall below the third quartile, 25% fall above (Q3: splitting top half in half)

What does the second quartile do?

The second quartile is the median: it TELLS YOU WHERE THE MIDDLE IS
The interquartile range is the measure of distance between the first and third quartile (a special kind of range that includes just the middle 50% of values): it shows WHERE THE MIDDLE HALF OF THE DATA IS
(25%, 30%, 30%, 25%) Interquartile Range would be 30%

Deviation Calculation

Deviation is one score’s distance from the mean: the difference between that score and the mean of the data set (standard deviation is calculated from these deviations)
*Deviation shows dispersion around the MEAN rather than the median
Deviation of x (any given score) = x (that score) − x̄ (the mean)
(1) First, find the mean
(2) Then determine the deviation with the above formula
(3) Deviation scores all together should always sum to zero
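
A short Python sketch of the three steps, with made-up scores, to show that the deviations cancel out:

```python
def deviations(scores):
    """Deviation of each score = that score minus the mean."""
    mean = sum(scores) / len(scores)    # step 1: find the mean
    return [x - mean for x in scores]   # step 2: subtract it from each score

# Hypothetical scores, invented for this example (their mean is 5).
scores = [2, 4, 4, 6, 9]
devs = deviations(scores)

print(devs)        # negative below the mean, positive above it
print(sum(devs))   # step 3: the deviations sum to zero
```

With less tidy numbers the sum may come out as a tiny value such as 1e-16 rather than exactly 0, which is floating-point rounding and not a real departure from zero.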

Variance calculation

Isn’t usually reported because it’s not very informative on its own; not as useful as standard deviation
Single number representing the average amount of variation in a set of scores
(1) Find mean of the data set
(2) Find deviation of each entry
(3) Square each deviation
(4) Add to get sum of squares
(5) Divide by n-1 to get the sample variance
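
The five steps above can be sketched in Python as follows. The scores are invented; the standard library's `statistics.variance` (which also divides by n-1) is used only as a cross-check.

```python
import statistics

def sample_variance(scores):
    n = len(scores)
    mean = sum(scores) / n                    # (1) mean of the data set
    devs = [x - mean for x in scores]         # (2) deviation of each entry
    squared = [d ** 2 for d in devs]          # (3) square each deviation
    sum_of_squares = sum(squared)             # (4) sum of squares
    return sum_of_squares / (n - 1)           # (5) divide by n-1

# Hypothetical scores, invented for this example.
scores = [2, 4, 4, 6, 9]
variance = sample_variance(scores)

print(variance)
print(statistics.variance(scores))  # library result for comparison
```

Squaring in step 3 is what stops the positive and negative deviations from cancelling to zero.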

Standard Deviation Calculation

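Roughly the average distance of scores from the mean (the plain average of the deviations is always zero, so they are squared first)
Measure of the spread of scores out from the mean of the sample
Calculate the variance, then take the square root of it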
(1) Find mean of the data set
(2) Find deviation of each entry
(3) Square each deviation
(4) Add to get sum of squares
(5) Divide by n-1 to get the sample variance
(6) Find the square root of the variance
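
A brief Python sketch of step 6, assuming the sample variance from steps 1 to 5 is obtained with the standard library's `statistics.variance` (which divides by n-1). The function name `sample_sd` and the scores are made up.

```python
import math
import statistics

def sample_sd(scores):
    """Sample standard deviation = square root of the sample variance."""
    return math.sqrt(statistics.variance(scores))

# Hypothetical scores, invented for this example.
scores = [2, 4, 4, 6, 9]
sd = sample_sd(scores)

print(round(sd, 3))
print(round(statistics.stdev(scores), 3))  # library result for comparison
```

Taking the square root puts the result back in the original units of the scores, which is one reason it is reported more often than the variance.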

What measures of central tendency do NOIR use?

Nominal: only use the mode
Ordinal: use mode and median (The mean cannot be computed with ordinal data. Finding the mean requires you to perform arithmetic operations like addition and division on the values in the data set. Since the differences between adjacent scores are unknown with ordinal data, these operations cannot be performed for meaningful results)
Interval: use mode, median, and mean
Ratio: use mode, median, and mean

What are confidence intervals?

The range of values that you expect your estimate to fall between a certain percentage of the time if you run your experiment again or re-sample the population in the same way; a range of values, based on sample data, that is likely to contain the true value
The confidence interval provides a sense of the size of any effect. The figures in a confidence interval are expressed in terms of the descriptive statistic to which they apply (percentage, correlation, regression, etc.). This effect size information is missing when a test of significance is used on its own.

Skewed Distribution

Distributions that are not normal; a large number of scores are clumped at one end or the other

Left skewed distribution

majority of the numbers clumped at the right, with the long tail pointing toward the left

Right skewed distribution

Majority of numbers at left, with tail pointing toward right

Median Cut-Point

Cut Point = (n + 1) / 2, where n is the sample size
Provides the location of the median in the data set
Not affected by outliers

Absolute Frequency

Raw counts of the number of cases associated with each value (found by adding up each case)

Relative Frequency

Percentage of cases associated with a particular value or category, i.e. how much of the data responded that way (25% responded this way, 15% responded that way, etc.); the percentages ADD TO 100%

Cumulative Frequency

Sum of cases associated with a value/category and all values/categories below it

Cumulative Percentage

The percent of cases at or below (or at or above) a chosen mark: e.g. what percent of people spent ten hours or less together, or what percent spent ten hours or more together (pick a mark and go up or down). ALWAYS COUNTS FROM LOW TO HIGH
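
The four kinds of frequency can be put side by side in one table. Below is a minimal Python sketch; the function name `frequency_table`, the column names, and the hours data are invented for illustration.

```python
from collections import Counter

def frequency_table(values):
    """One row per distinct value, ordered from low to high."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]  # this value plus everything below it
        rows.append({
            "value": value,
            "absolute": counts[value],                   # raw count
            "relative_pct": 100 * counts[value] / n,     # share of all cases
            "cumulative": running,                       # running count
            "cumulative_pct": 100 * running / n,         # running share
        })
    return rows

# Hypothetical hours spent together, invented for this example.
hours = [5, 5, 10, 10, 10, 15, 20, 20]
table = frequency_table(hours)

for row in table:
    print(row)
```

Reading the row for 10 hours, the cumulative percentage answers "what percent spent ten hours or less together"; the last row always reaches 100%.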

Sum of squares

Dispersion of the data set, found by:
1. subtracting the mean from each number
2. squaring that difference
3. adding those squares
SS = Σ(x − x̄)²

What are Z scores useful for?

Allow us to compare one score to another; can compare a score to a score from another data sample that uses a different scale
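
The card does not give the formula, so this Python sketch assumes the standard definition: z = (score − mean) / standard deviation. The two tests and all their numbers are made up.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: two tests marked on different scales.
z_test_a = z_score(80, mean=70, sd=5)       # test A is out of 100
z_test_b = z_score(650, mean=500, sd=100)   # test B is out of 800

print(z_test_a, z_test_b)
```

Although 650 is the larger raw number, the score on test A is further above its own mean, so it is the stronger result relative to its group.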

Scatter Plots

Visual representation of the relationship between two variables, in which each pair of values is represented as a dot

Line of Best Fit

Line that minimizes the distance between the line and the actual data points
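
One common way to make "minimizes the distance" precise is least squares, which minimizes the sum of squared vertical distances between the line and the points. The card does not say which method the course uses, so treat this Python sketch as an assumption; the data are made up.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squares of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical (x, y) pairs, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)

print(round(slope, 2), round(intercept, 2))
```

A positive slope means the dots trend upward from left to right; the line always passes through the point (mean of x, mean of y).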

Where is the mean located on an asymmetrical distribution?

Mode is always furthest away from the tail
Mean is closest to the tail (most affected by extreme cases: CLOSER TO TAIL = MORE AFFECTED); the mean is better used on normal/symmetrical distributions
Median is always between these two
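
A quick Python check of this ordering on a made-up right-skewed data set (long tail toward the high values), using the standard library's `statistics` module:

```python
import statistics

# Hypothetical right-skewed data: most values are low, a few are very high.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)

# With the tail on the right, the mean is pulled furthest toward it.
print(mode, median, mean)
```

For a left-skewed data set the order reverses, with the mean the lowest of the three.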

What is the standard deviation in a normal distribution?

About two thirds (roughly 68%) of the data in a normal distribution are within ONE standard deviation on either side of the mean (BETWEEN -1 AND +1 STANDARD DEVIATIONS FROM THE MEAN)
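
The "two thirds" figure can be checked with Python's standard library, which provides `statistics.NormalDist` (Python 3.8 or later). The mean of 100 and standard deviation of 15 in the second part are arbitrary example values.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)
within_one_sd = standard.cdf(1) - standard.cdf(-1)
print(round(within_one_sd, 4))

# The proportion is the same for any mean and standard deviation.
other = NormalDist(mu=100, sigma=15)
within_one_sd_other = other.cdf(115) - other.cdf(85)
print(round(within_one_sd_other, 4))
```

Both proportions come out at about 0.68, slightly more than two thirds.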

Q Position

Position of a quartile = quartile number × (n + 1) / 4

IQR

IQR = Q3 − Q1

High-End Outliers

High-end outliers: any scores that are LARGER than (not equal to) Q3 + (1.5 × IQR)

Low-End Outliers

Low-end outliers: any scores that are LESS than (not equal to) Q1 − (1.5 × IQR)
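
A Python sketch of the two outlier rules. It assumes quartiles from the standard library's `statistics.quantiles`, whose default method places quartile k at position k × (n + 1) / 4 and interpolates between neighbouring values; that may differ from the course's own rounding rule for quartile positions. The function names and data are made up.

```python
import statistics

def outlier_fences(values):
    """Return (low fence, high fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    low, high = outlier_fences(values)
    # Strict comparisons: a score equal to a fence is NOT an outlier.
    return [v for v in values if v < low or v > high]

# Hypothetical scores with one extreme value, invented for this example.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
fences = outlier_fences(scores)
outliers = find_outliers(scores)

print(fences)
print(outliers)
```

Here Q1 is 3 and Q3 is 9, so the IQR is 6 and only the score of 50 falls beyond a fence.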

Box and Whisker Plot

“Whiskers”: drawn from the smallest value in the data set that is still within the lower inner fence TO THE LARGEST VALUE IN THE DATA SET THAT IS STILL WITHIN THE UPPER INNER FENCE

Scatterplots

Describe the association between two variables
Each individual dot is a person or dyad: it has two values (an X and a Y)
Show the direction and strength of relationships
Show the degree of covariance

Rounding rule for Q1 and Q3?

round up to nearest 0.5 for Q1, down to nearest 0.5 for Q3

(44 cards)