Basic Descriptive Statistics Flashcards

(55 cards)

1
Q

what are statistics?

A

branch of applied mathematics that involves the collection, description, analysis, and inference of conclusions from quantitative data

2
Q

what are descriptive stats?

A

statistics that summarize or describe the characteristics of a data set

3
Q

what are examples of descriptive stats?

A

measures of central tendency (mean, median, mode), measures of variability (spread), and frequency distribution (count)
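
As a rough illustration of the three families named above, the sketch below computes one or more statistics from each using Python's standard library. The quiz scores are invented for the example.

```python
from collections import Counter
import statistics

# Hypothetical quiz scores, invented for illustration.
scores = [7, 8, 8, 9, 10, 8, 6, 9]

mean_score = statistics.mean(scores)      # central tendency: arithmetic average
median_score = statistics.median(scores)  # central tendency: middle of the sorted values
mode_score = statistics.mode(scores)      # central tendency: most frequent value
score_range = max(scores) - min(scores)   # variability: distance between the extremes
frequencies = Counter(scores)             # frequency distribution: count per value

print(mean_score, median_score, mode_score, score_range)
print(sorted(frequencies.items()))
```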

4
Q

inferential stats?

A

involves the use of a sample to estimate some characteristics in a large population and/or test a research hypothesis about a given population

5
Q

primary data

A

acquired directly from source

6
Q

secondary data (or archival data)

A

collected by someone else

7
Q

nominal data and example

A

categories without any inherent order or meaningful sequence

ex: fruit colors

8
Q

ordinal data and example

A

ranked data

ex: strongly agree to strongly disagree

9
Q

interval data and example

A

data with ordered categories and equal intervals between them with no true zero point

ex: temperature in °C or °F

10
Q

ratio data and example

A

data with ordered categories and equal intervals between them with a true zero

ex: height, weight, age, income

11
Q

measures of central tendency definition

A

univariate statistic that indicates the average observed value of a variable in a data set or the center of the frequency distribution of the data set

12
Q

what is the median?

A

the middle value of the ordered data; it splits the observations 50/50, with half below it and half above

13
Q

if the distribution is symmetric, mean and median are same/different?

A

Same

14
Q

if distribution almost symmetric, mean and median are ___

A

almost the same

15
Q

if the distribution is skewed, the mean is pulled in the direction of…

A

the long tail
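
A small sketch of this effect, using invented income figures: one very large value creates a long right tail and drags the mean toward it, while the median barely moves.

```python
import statistics

# Hypothetical incomes (in thousands); the last value creates a long right tail.
incomes = [30, 32, 35, 36, 38, 40, 250]
without_tail = incomes[:-1]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Without the extreme value the two measures nearly agree.
print(statistics.mean(without_tail), statistics.median(without_tail))
# With it, the mean is pulled far above the median.
print(mean_income, median_income)
```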

16
Q

what does it mean to have an unstable median?

A

the median value of a dataset or distribution fluctuates or is sensitive to small changes in the data

17
Q

is the median resistant to outliers?

A

yes

18
Q

what is the mode?

A

most frequent value (value not the frequency itself)

19
Q

what type of data is mode most commonly used for?

A

nominal because it identifies the most frequent category
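
For nominal data the mode can be found by counting categories. A minimal sketch with made-up fruit colors:

```python
from collections import Counter

# Hypothetical nominal data: colors have no order, so a mean or median makes no sense.
colors = ["red", "yellow", "red", "green", "red", "yellow"]

counts = Counter(colors)
# most_common(1) returns a list holding the single (category, count) pair with the top count.
modal_color, modal_count = counts.most_common(1)[0]

print(modal_color, modal_count)  # the mode is the category, not its count
```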

20
Q

measures of dispersion

A

descriptive stats that describe how similar a set of scores are to each other

21
Q

the more similar the scores are to each other, the ____ the measure of dispersion will be

A

lower

22
Q

the less similar the scores are to each other, the ___ the measure of dispersion will be

A

higher

23
Q

a taller curve has ___ dispersion, a flatter curve has ___ dispersion

A

less; more (for curves covering the same area, a taller curve is narrower)

24
Q

when would you use range?

A

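when you have ordinal data, or when you are presenting your results to people with little or no knowledge of stats

rarely used otherwise, because it depends only on the two extreme values and is insensitive to how the rest of the data are distributed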

25
what is the interquartile range (IQR)?
the range of the middle 50% of the data. It is calculated as the difference between the third quartile (Q3) and the first quartile (Q1)
26
what is the semi-interquartile range (SIR)?
half the interquartile range: (Q3 - Q1) / 2
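
The IQR and SIR can be sketched with the standard library as below. The data are invented, and quartile conventions differ between textbooks and software, so the `"inclusive"` method used here is only one reasonable choice.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical values

# quantiles(n=4) returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # range of the middle 50% of the data
sir = iqr / 2  # semi-interquartile range

print(q1, q3, iqr, sir)
```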
27
what is variance?
the average of the squared deviations from the mean; it summarizes how far the data points tend to lie from the mean (average) of the dataset
28
how do you solve for variance?
subtract the mean from each score and square the result, add up the squared differences, then divide by the number of scores (N for a population, n - 1 for a sample)
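
The calculation can be followed step by step in the sketch below. The scores are invented; the last two lines cross-check the hand calculation against the `statistics` module.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n

# Step 1: subtract the mean from each score and square the difference.
squared_deviations = [(x - mean) ** 2 for x in scores]
# Step 2: add up the squared deviations.
sum_of_squares = sum(squared_deviations)
# Step 3: divide by N (population) or by n - 1 (sample).
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

print(population_variance, statistics.pvariance(scores))
print(sample_variance, statistics.variance(scores))
```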
29
the larger the variance, the ___ the scores deviate away from the mean
more
30
why is the formula for variance of a sample different from formula for population variance?
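because the deviations are measured from the sample mean, which is estimated from the same data and is not a perfect estimate of the population mean; dividing by n - 1 instead of N corrects the resulting underestimate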
31
degree of freedom
the number of values that are free to vary when calculating a statistic. Degrees of freedom help account for the fact that some of the data is already used up in estimating certain values (like the mean), so fewer data points are "free" for calculating other statistics
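
One way to see why estimating the mean "uses up" a value: deviations from the sample mean always sum to zero, so once n - 1 of them are known the last is forced. A small sketch with invented scores:

```python
scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n

deviations = [x - mean for x in scores]

# Deviations from the sample mean sum to zero (up to floating-point rounding).
print(sum(deviations))

# So the last deviation is determined by the other n - 1.
forced_last = -sum(deviations[:-1])
print(forced_last, deviations[-1])
```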
32
what is standard deviation?
the square root of the variance; a measure of how much the values in a dataset tend to differ from the average (mean) value
33
why do we use SD instead of variance?
because variance is expressed in squared units, which are awkward to interpret; SD is in the same units as the original data
34
what is the coefficient of variation (CV)?
shows how much variability or spread there is in a dataset relative to the mean (average) of the data. It’s often used to compare the variability between different datasets, especially when they have different units or scales.
35
formula for CV
CV = (s / x̄) × 100, where s is the standard deviation and x̄ is the mean
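
A minimal sketch of the formula, assuming the sample standard deviation is wanted. The helper name `coefficient_of_variation` and the measurements are invented for the example.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample CV as a percentage: (s / x-bar) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements of the same objects in two different units.
lengths_cm = [10, 20, 30]
lengths_mm = [100, 200, 300]

# The CV is unit-free, so both calls give the same percentage.
print(coefficient_of_variation(lengths_cm))
print(coefficient_of_variation(lengths_mm))
```

Because it divides by the mean, the CV is only meaningful for ratio data with a mean well away from zero.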
36
if data are symmetric, with no serious outliers, which measures of variability should be used?
range and SD
37
if data are skewed and/or have serious outliers, which measure of variability should be used?
IQR
38
if comparing variation across two data sets, what measure of variability should be used?
CV
39
what is the notch (in a box plot)?
a narrowing of the box around the median that marks an approximate confidence interval for the median; it helps you visually assess whether the medians of different groups are likely different or similar
40
if two boxes' notches do not overlap, there is ____ evidence their medians differ
strong
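
A rough sketch of how notches can be compared numerically. It assumes one commonly used convention, median ± 1.57 × IQR / √n (some software uses 1.58), and the `"inclusive"` quartile method; the function names and data are invented for the example.

```python
import math
import statistics

def notch_interval(values):
    """Approximate notch limits: median +/- 1.57 * IQR / sqrt(n) (assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    median = statistics.median(values)
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True if the two notch intervals share any values."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # hypothetical data
group_b = [11, 12, 13, 14, 15, 16, 17, 18, 19]   # same spread, shifted by 10

print(notch_interval(group_a), notch_interval(group_b))
print(notches_overlap(group_a, group_b))  # no overlap suggests the medians differ
```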
41
what are error bars?
help indicate estimated error or uncertainty to give a general sense of how precise a measurement is
42
histogram
bar graph of frequencies or percentages
43
in a negatively skewed distribution, mean __ median
<
44
in a positively skewed distribution, mean __ median
>
45
what is sturges' formula?
used to determine the number of bins or intervals (also called "classes") to use when creating a histogram for a dataset.
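
Sturges' formula is usually written k = 1 + log2(n), rounded up to a whole number of bins. A short sketch (the function name is invented for the example):

```python
import math

def sturges_bins(n):
    """Number of histogram bins suggested by Sturges' formula: 1 + ceil(log2(n))."""
    return 1 + math.ceil(math.log2(n))

for n in (10, 64, 1000):
    print(n, sturges_bins(n))
```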
46
what is the mean center (centroid)?
central tendency of a group of points located on a Cartesian coordinate system; the average position of the points
47
how do you find the mean center?
take the mean of the x values and the mean of the y values; the pair (mean x, mean y) is the mean center
48
weighted mean center
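a mean center in which each point is weighted by an attribute (for example, population), because some points are more important than others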
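
Both centers can be sketched in a few lines. The town coordinates and population weights below are invented for the example.

```python
def mean_center(points):
    """Average the x values and the y values separately."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def weighted_mean_center(points, weights):
    """Like mean_center, but each point counts in proportion to its weight."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y

towns = [(0, 0), (4, 0), (4, 4), (0, 4)]  # hypothetical coordinates
populations = [1, 1, 1, 5]                # hypothetical weights

print(mean_center(towns))                        # (2.0, 2.0)
print(weighted_mean_center(towns, populations))  # (1.0, 3.0), pulled toward (0, 4)
```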
49
what is the median center (Euclidean median)?
the point that minimizes the aggregate (total) Euclidean distance to all of the points
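
Unlike the mean center, the median center has no closed-form formula and is usually found iteratively. The sketch below uses Weiszfeld's algorithm, one common choice; the function names, iteration count, and points are assumptions made for the example, and the simple clamp on zero distances is not a robust treatment of estimates that land exactly on a data point.

```python
import math

def total_distance(center, points):
    """Sum of Euclidean distances from center to every point."""
    return sum(math.dist(center, p) for p in points)

def median_center(points, iterations=200, eps=1e-12):
    """Approximate the Euclidean median with Weiszfeld's iterative algorithm."""
    # Start from the mean center.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        # Weight each point by the inverse of its distance to the current estimate.
        weights = [1 / max(math.dist((x, y), p), eps) for p in points]
        total = sum(weights)
        x = sum(w * p[0] for w, p in zip(weights, points)) / total
        y = sum(w * p[1] for w, p in zip(weights, points)) / total
    return x, y

points = [(0, 0), (4, 0), (0, 4), (4, 4), (20, 2)]  # hypothetical; last point is an outlier
mean_pt = (sum(p[0] for p in points) / 5, sum(p[1] for p in points) / 5)
median_pt = median_center(points)

print(mean_pt, total_distance(mean_pt, points))
print(median_pt, total_distance(median_pt, points))  # smaller total distance
```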
50
what is standard distance?
measures how far the points are spread around the mean center; it is the spatial counterpart of the standard deviation
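
A minimal sketch of standard distance, assuming the form that divides by n (the root-mean-square distance from the mean center). The point sets are invented.

```python
import math

def standard_distance(points):
    """Root-mean-square distance of the points from their mean center."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

tight = [(1, 1), (1, 3), (3, 1), (3, 3)]   # hypothetical cluster
spread = [(0, 0), (0, 4), (4, 0), (4, 4)]  # same center, more dispersed

print(standard_distance(tight), standard_distance(spread))
```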
51
standard deviational ellipse
graphical tool used to represent the variability and directionality of a set of data points in a two-dimensional space
52
kernel density estimation
a method used to estimate and visualize the distribution of data points, creating a smooth curve that represents the underlying distribution
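
The idea can be sketched by placing a small Gaussian "bump" on each observation and averaging the bumps. The data and the hand-picked bandwidth below are assumptions for illustration only.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density as an average of Gaussian bumps."""
    n = len(data)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

data = [1.0, 1.2, 1.9, 2.1, 2.3, 5.0]        # hypothetical observations
density = gaussian_kde(data, bandwidth=0.5)  # bandwidth chosen by hand

# The estimate is high where observations cluster and low in the gaps.
for x in (1.0, 2.0, 3.5, 5.0):
    print(x, round(density(x), 3))
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows the individual points more closely.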
53
what are proportions, what are they useful for, and what is an example?
the share of the total that falls in each category; the proportions sum to 1. Useful for comparing two sets of data with different sizes and category counts. e.g. a different box of marbles gives a yellow proportion of 2/23; for this to be a reasonable comparison, we need to know the totals for both samples
54
what is location quotient?
used to compare the relative concentration of characteristics in a specific area against a broader reference area
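
The location quotient is a ratio of two shares: the local share of some characteristic divided by its share in the reference area. A minimal sketch with invented employment figures (the function and parameter names are made up for the example):

```python
def location_quotient(local_count, local_total, ref_count, ref_total):
    """Local share of a characteristic divided by its share in the reference area."""
    return (local_count / local_total) / (ref_count / ref_total)

# Hypothetical figures: 200 of 1,000 local jobs vs 50,000 of 1,000,000 in the reference area.
lq = location_quotient(local_count=200, local_total=1_000,
                       ref_count=50_000, ref_total=1_000_000)
print(lq)
```

A value above 1 means the characteristic is more concentrated locally than in the reference area; a value below 1 means it is less concentrated.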
55
standard error
the standard deviation of a sample statistic's sampling distribution; it measures how much a sample statistic, such as the sample mean, is expected to vary from the true population parameter
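
For the sample mean, the standard error is commonly estimated as s / √n. A small sketch with invented data (the function name is made up for the example):

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
larger_sample = sample * 4         # same values repeated, so n is four times larger

print(standard_error_of_mean(sample))
print(standard_error_of_mean(larger_sample))  # smaller: more data, more precise mean
```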