Organizing, Visualizing, and Describing Data Flashcards
(101 cards)
Absolute dispersion
The amount of variability present without comparison to any reference point or benchmark.
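Range, mean absolute deviation, and standard deviation are common measures of absolute dispersion: each is expressed in the data's own units and is not compared with any benchmark. The following is a minimal Python sketch on a hypothetical sample. The function names `value_range` and `mean_absolute_deviation` are illustrative, not taken from the source.

```python
from statistics import mean, stdev

def value_range(xs):
    """Range: largest observation minus smallest observation."""
    return max(xs) - min(xs)

def mean_absolute_deviation(xs):
    """Average absolute distance of each observation from the arithmetic mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

returns = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample, in percent
print(value_range(returns))              # 6.0
print(mean_absolute_deviation(returns))  # 2.0
print(stdev(returns))                    # sample standard deviation, in percent
```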
Absolute frequency
The actual number of observations counted for each unique value of the variable (also called raw frequency).
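The following is a minimal Python sketch of counting absolute frequencies, using `collections.Counter` on a hypothetical list of credit ratings. The data is illustrative only.

```python
from collections import Counter

ratings = ["A", "B", "A", "C", "B", "A"]  # hypothetical categorical observations
abs_freq = Counter(ratings)               # raw count for each unique value
print(abs_freq)  # Counter({'A': 3, 'B': 2, 'C': 1})
```

The absolute frequencies always sum to the total number of observations.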
Arithmetic mean
The sum of the observations divided by the number of observations.
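The definition translates directly into code. The following is a minimal sketch in which the function name is illustrative; the standard library's `statistics.mean` gives the same result.

```python
def arithmetic_mean(xs):
    """Sum of the observations divided by the number of observations."""
    if not xs:
        raise ValueError("the mean requires at least one observation")
    return sum(xs) / len(xs)

print(arithmetic_mean([3, 5, 10]))  # 6.0
```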
Bar chart
A chart for plotting the frequency distribution of categorical data, where each bar represents a distinct category and each bar’s height is proportional to the frequency of the corresponding category. In technical analysis, a bar chart that plots four bits of data for each time interval—the high, low, opening, and closing prices. A vertical line connects the high and low prices. A cross-hatch left indicates the opening price and a cross-hatch right indicates the closing price.
Bimodal
A distribution that has two most frequently occurring values.
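One way to check for bimodality is to list every value tied for the highest frequency. The following is a minimal sketch using `statistics.multimode` (Python 3.8+) on hypothetical data. Two modes indicate a bimodal distribution.

```python
from statistics import multimode

data = [1, 2, 2, 3, 4, 4, 5]  # hypothetical observations
modes = multimode(data)       # all values tied for the highest frequency
print(modes)                  # [2, 4]
print(len(modes) == 2)        # True: the distribution is bimodal
```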
Box and whisker plot
A graphic for visualizing the dispersion of data across quartiles. It consists of a “box” spanning the first to the third quartile, with a line marking the median, and “whiskers” extending from the box.
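The numbers behind a box and whisker plot can be computed before any drawing. The following is a minimal Python sketch that assumes the whiskers run to the minimum and maximum. Many plotting tools instead cap the whiskers at 1.5 times the interquartile range (IQR) and plot the remaining points as outliers. The function name is hypothetical.

```python
from statistics import quantiles

def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max) for a box and whisker plot.

    Assumes whiskers extend to the minimum and maximum observations.
    """
    q1, q2, q3 = quantiles(xs, n=4)  # default 'exclusive' interpolation method
    return min(xs), q1, q2, q3, max(xs)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations
lo, q1, med, q3, hi = five_number_summary(data)
print(lo, q1, med, q3, hi)           # 1 2.5 5.0 7.5 9
print("IQR (box length):", q3 - q1)  # IQR (box length): 5.0
```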
Bubble line chart
A line chart that uses varying-sized bubbles to represent a third dimension of the data. The bubbles are sometimes color-coded to present additional information.
Categorical data
Values that describe a quality or characteristic of a group of observations and therefore can be used as labels to divide a dataset into groups to summarize and visualize (also called qualitative data).
Chi-square test of independence
A statistical test for detecting a potential association between categorical variables.
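The test compares observed counts in a contingency table with the counts expected if the variables were independent. The expected count for a cell is its row total times its column total, divided by the grand total. The following is a minimal Python sketch of the Pearson statistic on a hypothetical 2x2 table. The function name is illustrative.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows of observed counts (all row and column totals must be > 0).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

observed = [[20, 30],  # hypothetical counts: rows = group, columns = outcome
            [30, 20]]
print(chi_square_statistic(observed))  # (4.0, 1)
```

The statistic is compared with a chi-square critical value for the given degrees of freedom. At the 5% level with 1 degree of freedom that value is about 3.841, so 4.0 would lead to rejecting independence at that level. SciPy's `scipy.stats.chi2_contingency` also returns a p-value. Note that it applies Yates' continuity correction to 2x2 tables by default.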
Clustered bar chart
A bar chart for showing joint frequencies for two categorical variables (also known as a grouped bar chart).
Coefficient of variation
The ratio of a set of observations’ standard deviation to the observations’ mean value.
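Because the coefficient of variation (CV) divides by the mean, it is unitless and allows dispersion to be compared across series of different scale. The following is a minimal Python sketch that assumes the sample standard deviation; use `statistics.pstdev` for the population version. The data is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(xs):
    """Sample standard deviation divided by the mean (unitless)."""
    m = mean(xs)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(xs) / m

a = [10, 12, 14]     # hypothetical series
b = [100, 120, 140]  # same pattern, ten times the scale
print(coefficient_of_variation(a))  # about 0.1667
print(coefficient_of_variation(b))  # about 0.1667
```

The two series have different standard deviations (2.0 and 20.0), but their CVs match because one is simply a rescaled copy of the other.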
Confusion matrix
A grid used for error analysis in classification problems. It presents the counts for four outcomes: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
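The following is a minimal Python sketch of filling in the four cells for a binary classifier. It assumes labels are encoded with `1` as the positive class. The function name and data are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

actual    = [1, 0, 1, 1, 0, 0]  # hypothetical true labels
predicted = [1, 1, 0, 1, 0, 0]  # hypothetical model output
print(confusion_counts(actual, predicted))  # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```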
Contingency table
A table of the frequency distribution of observations classified on the basis of two discrete variables.
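A contingency table can be built by counting each combination of the two variables. The following is a minimal Python sketch on hypothetical (sector, rating) pairs. In pandas, `pandas.crosstab` produces the same kind of table.

```python
from collections import Counter

# Hypothetical (sector, rating) observations for six companies
obs = [("Tech", "A"), ("Tech", "B"), ("Energy", "A"),
       ("Tech", "A"), ("Energy", "B"), ("Energy", "B")]

counts = Counter(obs)               # joint frequency of each (sector, rating) pair
rows = sorted({s for s, _ in obs})  # ['Energy', 'Tech']
cols = sorted({r for _, r in obs})  # ['A', 'B']
table = [[counts[(s, r)] for r in cols] for s in rows]
print(table)  # [[1, 2], [2, 1]]
```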
Continuous data
Data that can be measured and can take on any numerical value in a specified range of values.
Correlation
A measure of the linear relationship between two random variables.
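The usual measure is the Pearson correlation coefficient: covariance divided by the product of the two standard deviations, which always lies between -1 and +1. The following is a minimal Python sketch; the function name is illustrative. On Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
from math import sqrt

def pearson_correlation(x, y):
    """Pearson correlation: covariance / (std_x * std_y).

    The 1/(n-1) factors in the covariance and standard deviations cancel,
    so raw sums of deviations are used. Both series must be non-constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```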
Cost averaging
The periodic investment of a fixed amount of money.
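When the same amount is invested each period, more shares are bought when the price is low. As a result, the average cost per share equals the harmonic mean of the prices, which never exceeds their arithmetic mean. The following is a minimal Python sketch with hypothetical prices; the function name is illustrative.

```python
from statistics import harmonic_mean

def cost_average(amount, prices):
    """Invest a fixed amount at each price; return (shares bought, average cost per share)."""
    shares = sum(amount / p for p in prices)
    avg_cost = amount * len(prices) / shares
    return shares, avg_cost

prices = [10.0, 20.0]  # hypothetical share prices in two periods
shares, avg_cost = cost_average(100.0, prices)
print(shares)                 # 15.0
print(avg_cost)               # about 13.33
print(harmonic_mean(prices))  # about 13.33, versus an arithmetic mean of 15.0
```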
Cross-sectional data
A list of the observations of a specific variable from multiple observational units at a given point in time. The observational units can be individuals, groups, companies, trading markets, regions, etc.
Cumulative absolute frequency
The running total of the absolute frequencies in a frequency distribution, accumulated as one moves from the first bin to the last bin.
Cumulative frequency distribution chart
A chart that plots either the cumulative absolute frequency or the cumulative relative frequency on the y-axis against the upper limit of each interval. It shows the number or the percentage of the observations that lie below a given value.
Cumulative relative frequency
A sequence of partial sums of the relative frequencies in a frequency distribution.
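Both cumulative frequencies are running totals over the bins taken in ascending order. The following is a minimal Python sketch using `itertools.accumulate` on a hypothetical four-bin frequency distribution.

```python
from itertools import accumulate

# Hypothetical frequency distribution: absolute frequency per bin, bins in ascending order
bins = ["0-10", "10-20", "20-30", "30-40"]
abs_freq = [2, 5, 8, 5]
n = sum(abs_freq)

cum_abs = list(accumulate(abs_freq))  # cumulative absolute frequency
cum_rel = [c / n for c in cum_abs]    # cumulative relative frequency
for b, ca, cr in zip(bins, cum_abs, cum_rel):
    print(f"{b:>6}  {ca:3d}  {cr:.0%}")
```

The last cumulative absolute frequency equals the number of observations, and the last cumulative relative frequency equals 1 (100%).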
Data
A collection of numbers, characters, words, and text—as well as images, audio, and video—in a raw or organized format to represent facts or information.
Data table
See two-dimensional rectangular array.
Deciles
Quantiles that divide a distribution into 10 equal parts.
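Nine cut points are needed to split a distribution into 10 equal parts. The following is a minimal Python sketch using `statistics.quantiles` with `n=10` on hypothetical data. The default `'exclusive'` interpolation method is assumed. Other methods and other software can give slightly different cut points, especially for small samples.

```python
from statistics import quantiles

data = list(range(1, 100))       # hypothetical observations 1..99
deciles = quantiles(data, n=10)  # 9 cut points dividing the data into 10 equal parts
print(len(deciles))                        # 9
print(deciles[0], deciles[4], deciles[-1])  # 10.0 50.0 90.0
```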
Descriptive statistics
The study of how data can be summarized effectively.