Module 2A Visualising Variability Flashcards

1
Q

What is variation in the context of data analysis?

A

The spread or difference between data points in a dataset, showing how much they differ from each other.

2
Q

What is a random variable?

A

A quantity whose value is uncertain and can vary based on chance.

3
Q

What does a frequency distribution describe?

A

The values of a variable and how often each one appears in the data.

4
Q

What is a categorical variable?

A

Data consisting of labels or names, for which arithmetic operations are not meaningful.

Examples include gender, color, or brand names.

5
Q

Define a quantitative variable.

A

Data consisting of numerical values, for which arithmetic operations are meaningful.

Examples include age, height, or income.

6
Q

What is a sample in statistics?

A

A subset of the population that makes data collection feasible

Samples are used to infer characteristics about larger populations.

7
Q

What is relative frequency?

A

The proportion of times a value occurs in a dataset, calculated as: Frequency of a value / Total number of values

8
Q

How is percent frequency calculated?

A

(Frequency of a value / Total number of values) × 100
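
As a minimal sketch of the frequency, relative frequency, and percent frequency calculations, the Python snippet below counts the values of a small categorical variable. The `colors` list is made-up illustration data, not part of the deck.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (illustration only).
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green", "blue", "red"]

counts = Counter(colors)          # frequency distribution: value -> count
total = sum(counts.values())      # total number of values

# Relative frequency = frequency of a value / total number of values
relative = {value: n / total for value, n in counts.items()}
# Percent frequency = relative frequency * 100
percent = {value: 100 * rel for value, rel in relative.items()}

for value in counts:
    print(f"{value}: frequency={counts[value]}, "
          f"relative={relative[value]:.2f}, percent={percent[value]:.0f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on the arithmetic.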

9
Q

What is a probability distribution?

A

It shows how the possible values of a random variable are distributed and the likelihood of each value occurring.

10
Q

What does Benford’s Law state?

A

In many real-world data sets, the first digits 1 through 9 do not occur equally often: the proportion of observations with first digit d is approximately log10(1 + 1/d), so 1 leads about 30% of the time and 9 less than 5%.

This law is often used in fraud detection and data analysis.
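
The "specific distribution" is usually written as P(d) = log10(1 + 1/d) for first digit d. The sketch below compares that expected proportion with the observed first digits of the first 100 powers of 2, a sequence chosen here only as a convenient illustration. The helper names `benford_expected` and `first_digit` are hypothetical.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected proportion of values whose first digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)

def first_digit(n):
    """Leading digit of a positive integer."""
    return int(str(n)[0])

# Illustrative data: the first 100 powers of 2 (an assumption, not from the deck).
values = [2 ** k for k in range(1, 101)]
counts = Counter(first_digit(v) for v in values)
observed = {d: counts[d] / len(values) for d in range(1, 10)}

for d in range(1, 10):
    print(f"digit {d}: observed={observed[d]:.3f}, expected={benford_expected(d):.3f}")
```

In fraud detection, a large gap between the observed and expected columns is a prompt for closer inspection rather than proof of manipulation.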

11
Q

What does skewness represent in a quantitative distribution?

A

The lack of symmetry in a quantitative distribution

It indicates the direction and degree of asymmetry: a right (positive) skew has a longer right tail, and a left (negative) skew has a longer left tail.
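
One common way to put a number on skewness is the moment coefficient, the third central moment divided by the 1.5 power of the second. The sketch below uses that definition as an assumption; the deck does not say which measure the module uses, and the data lists are made up.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 20]          # one large value stretches the right tail
left_skewed = [-x for x in right_skewed]       # mirror image stretches the left tail

print(skewness(symmetric))      # zero for a symmetric data set
print(skewness(right_skewed))   # positive
print(skewness(left_skewed))    # negative
```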

12
Q

What is a frequency polygon?

A

A line graph that shows the distribution of data by plotting the midpoints of each class interval and connecting them with straight lines

13
Q

What is a Trellis Display?

A

A grid of small graphs that shows how data patterns change across different categories or conditions. Each panel uses the same format but shows a different subset of the data.

14
Q

What is the first quartile?

A

25th percentile

Quartiles divide the data set into four equal parts.

15
Q

What is the second quartile also known as?

A

The median

It represents the middle value of the data set.

16
Q

How is the interquartile range calculated?

A

3rd quartile minus 1st quartile

It measures the spread of the middle 50% of the data.
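
The quartile cards can be sketched with Python's standard library (`statistics.quantiles`, available from Python 3.8). Textbooks and software use slightly different quartile conventions; the `method="inclusive"` choice and the data below are assumptions for illustration.

```python
import statistics

# Hypothetical, already-sorted sample (illustration only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1   # interquartile range: 3rd quartile minus 1st quartile

print(f"Q1={q1}, Q2 (median)={q2}, Q3={q3}, IQR={iqr}")
```

With this convention the second quartile matches `statistics.median(data)`, in line with the card that calls the second quartile the median.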

17
Q

What does the mean represent?

A

The sum of the values divided by the sample size

It is a measure of central tendency.

18
Q

How is the median defined?

A

The middle value of the sorted data; if the sample size is even, the average of the two middle values

It is less affected by outliers compared to the mean.

19
Q

What is the mode?

A

The most frequent value(s) in the data set

A data set can have multiple modes or none at all.
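
A short sketch of the three measures of central tendency, using Python's `statistics` module on made-up data. The second list swaps in one extreme value to show why the median is described as less affected by outliers.

```python
import statistics

# Hypothetical samples (illustration only).
data = [2, 3, 3, 5, 7, 10]
with_outlier = [2, 3, 3, 5, 7, 100]   # same data, largest value replaced by an outlier

print(statistics.mean(data), statistics.median(data))                   # 5 and 4.0
print(statistics.mean(with_outlier), statistics.median(with_outlier))   # 20 and 4.0

# multimode returns every most-frequent value, so ties are visible.
print(statistics.multimode(data))           # [3]
print(statistics.multimode([1, 1, 2, 2]))   # [1, 2]
```

The outlier moves the mean from 5 to 20 while the median stays at 4.0.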

20
Q

How is the range calculated?

A

Largest value minus smallest value in the set

It gives a measure of the spread of the data.

21
Q

What does standard deviation measure?

A

The typical distance of the values from the mean, calculated as the square root of the average squared deviation from the mean

It quantifies the amount of variation or dispersion in a set of values.
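
The sketch below computes the range and the standard deviation for a small made-up data set. It shows both the population form (divide by n) and the sample form (divide by n − 1); which one the module intends is not stated in the deck, so both are given.

```python
import math
import statistics

# Hypothetical data (illustration only); the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)          # largest value minus smallest value

mean = statistics.mean(data)
squared_devs = [(x - mean) ** 2 for x in data]

pop_sd = math.sqrt(sum(squared_devs) / len(data))            # population: divide by n
sample_sd = math.sqrt(sum(squared_devs) / (len(data) - 1))   # sample: divide by n - 1

print(f"range={data_range}, population sd={pop_sd:.3f}, sample sd={sample_sd:.3f}")
```

The hand-computed values agree with `statistics.pstdev(data)` and `statistics.stdev(data)` respectively.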

22
Q

What is the Empirical Rule for bell-shaped distributions regarding data values within one standard deviation?

A

About 68% of the data values lie within one standard deviation of the mean

This rule provides a quick estimation of data spread.

23
Q

What percentage of data values lie within two standard deviations of the mean according to the Empirical Rule?

A

About 95%

This helps in understanding the distribution of data points.

24
Q

What percentage of data values lie within three standard deviations of the mean according to the Empirical Rule?

A

About 99.7%

This is known as the 68-95-99.7 rule.
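
The three percentages can be checked against an exact normal distribution. This sketch uses `statistics.NormalDist` (Python 3.8 and later); real bell-shaped data will only match these figures approximately. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```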

25
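Q

What does a Box-and-Whisker Plot use to display data?

A

Summary measures of position and spread: the box spans the first to the third quartile with a line at the median, and the whiskers extend toward the extremes.

Points beyond the whiskers are plotted individually as potential outliers.
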
26
Q

What is a Violin Chart?

A

An advanced visualization that combines a box-and-whisker chart with a rotated and mirrored kernel density chart

It provides a richer representation of the data distribution.

27
Q

What is statistical inference?

A

The process of using data from a sample to draw conclusions or make predictions about a larger population.

28
Q

What is a confidence interval?

A

A range of values within which the true population parameter is expected to lie, at a stated confidence level (for example, 95%).

29
Q

How is a confidence interval on a mean calculated?

A

Sample mean ± margin of error

It reflects the uncertainty associated with the sample mean.

30
Q

What does the margin of error represent?

A

The maximum expected difference between the sample estimate and the true population value at the chosen confidence level, that is, the uncertainty on the parameter.

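
As a sketch of "sample mean ± margin of error", the snippet below builds a 95% confidence interval for a mean using the normal approximation, with margin of error = z × s / √n. The sample values and the 95% level are assumptions for illustration; for small samples a t critical value is normally used instead of z, which would widen the interval slightly.

```python
import math
import statistics

# Hypothetical sample (illustration only).
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
confidence = 0.95

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)                      # sample standard deviation

# Two-sided critical value from the standard normal distribution (about 1.96 for 95%).
z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * s / math.sqrt(n)                     # margin of error
lower, upper = mean - margin, mean + margin

print(f"mean={mean}, margin of error={margin:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

A higher confidence level or a smaller sample gives a larger margin of error and therefore a wider interval.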
31
Q

What is time series data?

A

Data collected or recorded at regular time intervals, showing how values change over time.

It is often used for forecasting and trend analysis.

32
Q

What is a time series chart?

A

A line graph that shows how data points change over time, with time on the x-axis and the measured values on the y-axis.