# Unit 6 Vocabulary Flashcards

1
Q

Skewed Left

A

Also known as negatively skewed. The bulk of the data items are clustered on the positive (right) end of the graph, with the long tail extending to the left.

2
Q

Mean

A

The average value of all the data in a dataset. Calculated by adding up the values of all data items and then dividing by the number of items in the dataset.
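
The calculation described above can be sketched in a few lines of Python. The function name `mean` and the sample scores are made up for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the items divided by the number of items."""
    if not values:
        raise ValueError("mean requires at least one data item")
    return sum(values) / len(values)


scores = [70, 80, 90, 100]  # hypothetical data
print(mean(scores))  # 85.0
```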

2
Q

z-score

A

A value indicating the number of standard deviations a data item is from the mean of its dataset.
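
As a rough sketch, a z-score can be computed by subtracting the mean and dividing by the standard deviation. This example assumes the population standard deviation (`statistics.pstdev`); the card does not say which variant is intended, and the helper name `z_score` and the data are hypothetical.

```python
import statistics


def z_score(x, data):
    """Number of standard deviations that x lies from the mean of data."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("z-score is undefined when all items are equal")
    return (x - mu) / sigma


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, standard deviation 2
print(z_score(9, data))  # 2.0
```

A positive result means the item is above the mean; a negative result means it is below.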

3
Q

Box and Whisker Plot

A

A graphical representation of the five number summary.

5
Q

Upper Quartile

A

The median of the upper half of a dataset.

7
Q

Bivariate

A

Data made up of two paired variables, used to measure the correlation between them.

8
Q

Strong Positive Correlation

A

Indicated by a correlation coefficient as defined below:

{ r | 0.7 < r < 1 }

8
Q

Weak Negative Correlation

A

Indicated by a correlation coefficient as defined below:

{ r | -0.3 < r < -0.1 }

8
Q

Weak Positive Correlation

A

Indicated by a correlation coefficient as defined below:

{ r | 0.1 < r < 0.3 }

9
Q

Maximum

A

The largest data value in a dataset.

9
Q

Neutral Positive Correlation

A

Indicated by a correlation coefficient as defined below:

{ r | 0.4 < r < 0.6 }

10
Q

Correlation Coefficient

A

A statistical measure of how linear a bivariate dataset is. Typically represented with a lowercase r:

{ r | -1 ≤ r ≤ 1 }
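
One common way to compute r is the Pearson formula, sketched below. The card does not name a specific formula, so treating r as the Pearson coefficient is an assumption, and the function name `pearson_r` and the sample data are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of paired deviations from each mean.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("r is undefined when one variable never changes")
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)


hours = [1, 2, 3, 4]   # hypothetical paired data
scores = [2, 4, 6, 8]  # perfectly linear in hours
print(pearson_r(hours, scores))  # 1.0
```

Reversing one list so that it falls as the other rises gives a value of -1 instead.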

10
Q

Lower Quartile

A

The median value of the lower half of a dataset.

11
Q

Skewed Right

A

Also known as positively skewed. The bulk of the data items are clustered on the negative (left) end of the graph, with the long tail extending to the right.

12
Q

Histogram

A

A graphical representation of the clustering of a dataset based on a specified bin width and the number of data items within each bin.
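
The counting behind a histogram can be sketched as follows. This only tallies the items per bin and does not draw anything; the function name `histogram_counts`, the `start` parameter, and the data are assumptions made for the example.

```python
from collections import Counter


def histogram_counts(data, bin_width, start=0):
    """Map each bin's lower edge to the number of items falling in that bin."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Floor division assigns each value to the index of its bin.
    counts = Counter((value - start) // bin_width for value in data)
    return {start + index * bin_width: counts[index] for index in sorted(counts)}


# Hypothetical data with a bin width of 5: bins cover 0-4, 5-9, and 10-14.
print(histogram_counts([1, 2, 7, 8, 9, 12], bin_width=5))  # {0: 2, 5: 3, 10: 1}
```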

14
Q

Bell Curve

A

A graphical representation of the spread of a normal dataset, indicating 1, 2, and 3 standard deviations from the mean.

14
Q

Median

A

The middle data item in a dataset whose items are sorted by value. When the number of items is even, the median is calculated by averaging the two middle items.
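
A minimal sketch of both cases (odd and even item counts) might look like this. The function name `median` and the sample lists are illustrative.

```python
def median(values):
    """Middle item of the sorted data, or the average of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one data item")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: a single middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```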

15
Q

Standard Deviation

A

A statistical measure of the typical distance between the data items in a dataset and the mean. Calculated as the square root of the average squared distance from the mean.
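
The sketch below computes the population standard deviation step by step. Dividing by the number of items (rather than one less, as the sample version does) is an assumption, since the card does not specify which variant is meant; the function name and data are hypothetical.

```python
import math


def population_std_dev(values):
    """Square root of the average squared distance from the mean."""
    if not values:
        raise ValueError("standard deviation requires at least one data item")
    mu = sum(values) / len(values)
    squared_distances = [(x - mu) ** 2 for x in values]
    variance = sum(squared_distances) / len(values)  # assumption: divide by n
    return math.sqrt(variance)


print(population_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```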

16
Q

Strong Negative Correlation

A

Indicated by a correlation coefficient as defined below:

{ r | -1 < r < -0.7 }

17
Q

Causation

A

Causation is when it is proven that a change in one thing causes a change in another. In bivariate data analysis, high correlation is often cited as evidence of a causal relationship, but correlation alone does not imply causation.

17
Q

No Correlation

A

Indicated by a correlation coefficient near or equal to zero.

19
Q

Five Number Summary

A

A measure of a dataset’s spread and distribution accomplished by partitioning the data into quarters:

1. Minimum
2. Lower Quartile
3. Median
4. Upper Quartile
5. Maximum
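
The five values listed above can be computed with a short sketch like the one below. It follows the "median of each half" definition of the quartiles and, for an odd number of items, leaves the overall median out of both halves. That is one common convention, assumed here, and other tools may give slightly different quartiles. The function name and data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least 2 data items")
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: with an odd count, the middle item belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )


print(five_number_summary([1, 3, 4, 6, 7, 8, 10]))  # (1, 3, 6, 8, 10)
```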

21
Q

Minimum

A

The data item with the smallest value in a dataset.

23
Q

Interquartile Range

A

The difference between the upper and lower quartiles of a dataset.

24
Q

Outlier

A

A data item within a dataset whose value is far from the bulk of the other data items' values.
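
The card does not say how far is "far". One widely used convention, assumed here and not taken from the card, flags any item more than 1.5 times the interquartile range below the lower quartile or above the upper quartile. The function name and data in this sketch are hypothetical.

```python
import statistics


def iqr_outliers(data):
    """Items outside the 1.5 x IQR fences (a common convention, assumed here)."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 4:
        raise ValueError("need at least 4 data items")
    half = n // 2
    lower_quartile = statistics.median(ordered[:half])
    upper_quartile = statistics.median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = upper_quartile - lower_quartile
    low_fence = lower_quartile - 1.5 * iqr
    high_fence = upper_quartile + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


print(iqr_outliers([10, 12, 13, 14, 15, 16, 18, 50]))  # [50]
```

In this example the quartiles are 12.5 and 17, so the fences sit at 5.75 and 23.75, and only 50 falls outside them.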

25
Q

Normal Distribution

A

A dataset whose histogram maps closely to a bell curve.

26
Q

Mode

A

The number(s) that appear the most in a dataset. If all items appear only once, then there is no mode defined for that dataset.
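
A small sketch of this rule, including the "no mode" case, could look like the following. The function name `modes` and the sample lists are made up for illustration.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every item appears only once, so no mode is defined
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 2, 3, 3]))  # [2, 3]
print(modes([1, 2, 3]))        # []
```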

28
Q

Neutral Negative Correlation

A

Indicated by a correlation coefficient as defined below:

{ r | -0.4 < r < -0.6 }

29
Q

A

A measure of the range of a dataset.

30
Q

Percentile Ranking

A

The percentage of data items whose values are less than the item being ranked.
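
Following the definition above literally, a percentile ranking can be sketched as a count of smaller items divided by the total. Some textbooks also count half of the tied items, which this sketch does not do; the function name and scores are hypothetical.

```python
def percentile_rank(item, data):
    """Percentage of items in data whose values are less than item."""
    if not data:
        raise ValueError("percentile rank requires at least one data item")
    below = sum(1 for value in data if value < item)
    return 100 * below / len(data)


scores = [50, 60, 70, 80, 90]  # hypothetical data
print(percentile_rank(80, scores))  # 60.0
```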

31
Q

Range

A

The width of a dataset’s values. It is calculated as the difference between the maximum and minimum of a dataset.
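
As a quick illustration, the range is a single subtraction. The function name `data_range` and the values are made up for the example.

```python
def data_range(values):
    """Difference between the maximum and the minimum of the data."""
    if not values:
        raise ValueError("range requires at least one data item")
    return max(values) - min(values)


print(data_range([4, 9, 15, 21]))  # 17
```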

32
Q

Data Distribution

A

A measure of a dataset's clustering and spread.