Chapter 3 Definitions Flashcards
(16 cards)
Histogram
A histogram is a type of bar chart used to show the distribution of continuous data. The x-axis represents class intervals (groups of data), and the y-axis shows the frequency or frequency density. Unlike bar charts, the bars in a histogram touch each other, because the data is continuous with no gaps between intervals. It helps you quickly see how data is spread, where it is concentrated, and whether it is skewed or symmetrical.
Keywords: continuous data, class intervals, frequency/frequency density, distribution of data
Frequency
Frequency means how often a value or range of values appears in a data set. It shows the number of times something happens. In a table or chart, it tells us the count for each category or class. It is useful for spotting patterns, modes, or common values in the data.
Keywords: number of occurrences, how often, count, repeated values, pattern
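Worked example (a minimal sketch, not part of the original cards): counting frequencies with Python's standard `collections.Counter`. The `shoe_sizes` list is made-up illustration data.

```python
from collections import Counter

# Hypothetical data set: shoe sizes recorded for nine students.
shoe_sizes = [5, 6, 6, 7, 7, 7, 8, 8, 9]

# Counter maps each value to the number of times it occurs (its frequency).
frequency = Counter(shoe_sizes)

# Print a simple frequency table in ascending order of value.
for size, count in sorted(frequency.items()):
    print(size, count)

# The value with the highest frequency is the mode.
mode_value, mode_count = frequency.most_common(1)[0]
print("mode:", mode_value, "appears", mode_count, "times")
```

Here size 7 has the highest frequency (3), so it is the mode.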
Frequency density
Frequency density is used in histograms so that bars stay comparable when class widths are unequal. It is calculated by dividing the frequency by the class width (frequency ÷ class width), so the area of each bar, not its height, represents the frequency. This tells us how concentrated the data is within each class interval. It allows fair comparisons between different-sized intervals in continuous data.
Keywords: frequency ÷ class width, histogram, unequal intervals, concentration, continuous data
Class width
Class width is the size of the interval between the lower and upper boundaries of a class in grouped data. You calculate it by subtracting the lower boundary from the upper boundary. It shows how wide each group is in a histogram or frequency table. It’s important for calculating frequency density.
Keywords: upper - lower boundary, interval size, grouped data, histogram, used in frequency density
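Worked example (a sketch with made-up classes): class width and frequency density can be worked out together for a grouped frequency table. The function name `frequency_densities` and the class boundaries are assumptions for illustration only.

```python
def frequency_densities(classes):
    """For each (lower, upper, frequency) class, return (class width, frequency density)."""
    results = []
    for lower, upper, frequency in classes:
        width = upper - lower          # class width = upper boundary - lower boundary
        density = frequency / width    # frequency density = frequency ÷ class width
        results.append((width, density))
    return results

# Hypothetical grouped data with unequal class widths:
# (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

table = frequency_densities(classes)
for (lower, upper, frequency), (width, density) in zip(classes, table):
    print(f"{lower}-{upper}: width {width}, frequency {frequency}, density {density}")
```

The 20-40 class has the largest frequency (16), but the 10-20 class has the highest frequency density (1.2), so its bar is the tallest.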
Underlying feature
An underlying feature is a hidden or natural property of the data that affects how we interpret it. It could be things like skewness, symmetry, or outliers. Understanding the underlying feature helps us choose the right graph or method. It’s not always directly visible, but it influences the shape and spread of data.
Keywords: hidden pattern, data behaviour, skew, shape, affects analysis
Outlier
An outlier is a value that is much higher or lower than the rest of the data. It lies far outside the usual range and can affect averages and spread. Outliers can happen due to errors, rare events, or natural variation. They are important to notice because they can distort results.
Keywords: extreme value, outside range, affects mean, rare, must be checked
Interquartile range
The interquartile range (IQR) is the difference between the upper and lower quartiles (Q3 - Q1). It measures the spread of the middle 50% of the data. It is a reliable measure of spread because it is not affected by extreme values at either end of the data. A smaller IQR means the data is more consistent, while a larger IQR means it’s more spread out.
Keywords: Q3 - Q1, middle 50%, spread, resistant to outliers, consistency
Quartiles
Quartiles split the data into four equal parts. The lower quartile (Q1) marks the 25th percentile, the median (Q2) marks the 50th, and the upper quartile (Q3) marks the 75th. They help describe how the data is spread and where most values lie. Quartiles are used to find the IQR and analyse skewness.
Keywords: Q1, Q2, Q3, divides data into 4 parts, spread, IQR
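Worked example (a sketch, assuming the "median of each half" method): textbooks and software use slightly different quartile conventions, so answers from a calculator or spreadsheet may differ a little. The function name `quartiles` and the data are made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (one common convention)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]            # values below the middle position
    upper_half = ordered[half + n % 2:]    # skips the middle value when n is odd
    return median(lower_half), median(ordered), median(upper_half)

# Hypothetical data set with one extreme value (40).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 40]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1                              # interquartile range = Q3 - Q1
print(q1, q2, q3, iqr)                     # 4 8 12 8
```

Changing the 40 to 400 leaves Q1, Q3 and the IQR unchanged, which is why the IQR is described as resistant to outliers.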
Percentile
A percentile tells you what percentage of values fall below a certain value in the data. For example, the 70th percentile means 70% of the data is below that point. Percentiles are used in exam results, performance scores, and rankings. They help compare values within a distribution.
Keywords: percentage position, rank, data comparison, below, distribution
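Worked example (a minimal sketch): this follows the "percentage of values below" wording on the card. Other percentile conventions exist, so treat this as one illustration. The function name `percent_below` and the scores are made up.

```python
def percent_below(values, point):
    """Return the percentage of values that are strictly below `point`."""
    below = sum(1 for v in values if v < point)
    return 100 * below / len(values)

# Hypothetical exam scores for ten students.
scores = [35, 42, 48, 51, 55, 60, 64, 70, 78, 90]

print(percent_below(scores, 70))   # 70.0
```

Seven of the ten scores are below 70, so a score of 70 sits at the 70th percentile of this data set.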
Upper Quartile (Q3)
The upper quartile is the value at the 75th percentile, meaning 75% of the data lies below it. It marks the upper end of the middle 50% of the data. It’s used to calculate the interquartile range (IQR). The highest 25% of values lie at or above Q3.
Keywords: Q3, 75th percentile, IQR, upper data range, spread
Lower Quartile (Q1)
The lower quartile is the value at the 25th percentile, so 25% of the data lies below it. It marks the lower end of the middle 50% of the data. Like Q3, it’s used in calculating the interquartile range (IQR). The lowest 25% of values lie at or below Q1.
Keywords: Q1, 25th percentile, IQR, lower data range, spread
Anomaly
An anomaly is a value in a data set that doesn’t fit the general pattern or trend. It stands out as unusual compared to the rest of the data. Anomalies may be caused by errors, unusual events, or natural variation. They are important to check because they can affect averages, charts, and analysis.
Keywords: unusual value, breaks pattern, possible error, affects results, must investigate
Box Plot (also called Box-and-Whisker Diagram)
A box plot is a diagram that shows the five-number summary of a data set: minimum, lower quartile (Q1), median (Q2), upper quartile (Q3), and maximum. It uses a box to show the interquartile range and lines (“whiskers”) to show the full spread of the data. Outliers and anomalies can also be shown as separate dots or crosses. It helps quickly identify skewness, spread, and central tendency.
Keywords: five-number summary, Q1 Q2 Q3, whiskers, IQR, data shape
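Worked example (a sketch, assuming quartiles are found as the median of each half of the ordered data): these are the five values you would mark on a box plot. The function name `five_number_summary` and the data are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return the five values a box plot is drawn from."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return {
        "min": ordered[0],                        # end of the left whisker
        "Q1": median(ordered[:half]),             # left edge of the box
        "median": median(ordered),                # line inside the box
        "Q3": median(ordered[half + n % 2:]),     # right edge of the box
        "max": ordered[-1],                       # end of the right whisker
    }

# Hypothetical journey times in minutes.
times = [12, 15, 17, 18, 20, 22, 25, 30]

summary = five_number_summary(times)
print(summary)
```

The box runs from Q1 = 16 to Q3 = 23.5, so its length is the IQR of 7.5, and the whiskers reach out to 12 and 30.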
Stem and Leaf Diagram
A stem and leaf diagram is a way of organising numerical data to show its shape and keep all the original values. The “stem” shows the leading digits (like tens), and the “leaf” shows the last digit (like units). It helps to spot clusters, gaps, and the mode quickly. It’s especially useful for small data sets and is great for comparing two sets of data side-by-side.
Keywords: original data kept, stem = first digits, leaf = last digits, shows shape, small data sets
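Worked example (a minimal sketch, assuming two-digit whole numbers so that stem = tens and leaf = units): the function name `stem_and_leaf` and the marks are made up for illustration. A hand-drawn diagram would also show any empty stems, which this sketch leaves out.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative whole numbers by tens digit (stem) with units digits as leaves."""
    rows = defaultdict(list)
    for value in sorted(values):          # sorting keeps the leaves in order
        stem, leaf = divmod(value, 10)    # e.g. 34 -> stem 3, leaf 4
        rows[stem].append(leaf)
    return dict(rows)

# Hypothetical test marks.
marks = [23, 25, 31, 34, 34, 38, 42, 47, 47, 47, 50]

diagram = stem_and_leaf(marks)
for stem in sorted(diagram):
    print(stem, "|", " ".join(str(leaf) for leaf in diagram[stem]))
print("Key: 2 | 3 means 23")
```

Every original value can be read back from the diagram, and the repeated leaf 7 on stem 4 shows that the mode is 47.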
Continuous data
Continuous data consists of values that can take any number within a given range. These data points are measured rather than counted and can have decimals or fractions, such as height, weight, or temperature. Continuous data is usually collected with measuring instruments, and in principle a value can be recorded to any level of precision, limited only by the accuracy of the measuring tool.
Discontinuous data
Discontinuous data, also called discrete data, consists of separate, distinct values that cannot be broken down into smaller parts meaningfully. These are countable values, such as the number of students in a class or the number of cars in a parking lot. Discontinuous data usually involves whole numbers and cannot take values between these counts.