Organizing, Displaying, and Describing Data Flashcards
What is a variable
- Any characteristic that can & does assume different values for different people, objects, or events being studied
What are the four measurement scales for variables
- Nominal
- Ordinal
- Interval
- Ratio
Describe nominal
- Numbers are simply used as a code to represent characteristics.
- There is no order to the categories.
- The assignment of numbers to categories is arbitrary
- Ex: gender or ethnicity
Describe ordinal
- Numbers represent categories that can be placed in a meaningful numerical order (e.g., from lowest to highest).
- There is no information regarding the size of the interval between the different values.
- The size of the interval may be different between the different categories.
- There is no “true” zero.
- Ex: pain scale where 1 = no pain, 2 = a little pain, 3 = some pain, 4 = a lot of pain
Describe interval
- Numbers can be placed in meaningful order.
- The intervals between the numbers are equal.
- It is possible to add and subtract across an interval scale.
- There is no true zero, so ratios cannot be calculated.
- Ex: Fahrenheit temp., SAT, or GRE
Describe ratio
- Numbers can be placed in meaningful order.
- The intervals between the numbers are equal.
- There is a “true” zero, determined by nature, which represents the absence of the phenomenon.
- Almost all biomedical measures (weight, pulse rate, and cholesterol level) are of ratio scale.
- Ex: weight, age, # of min. spent exercising, cholesterol level, or # of wks pregnant
What is the goal of displaying data
- To get a feeling for the distribution of the data
Define the parts of displaying data
- Central tendency: the typical or central value around which the data cluster
- Dispersion: how the values are spread out
- Shape and skewness: symmetry or asymmetry of the distribution of the values
- Outliers: unusual values that do not fit the pattern of the data
Describe frequency distributions
- A table that shows classes or intervals of data with a count of the number in each class. The frequency (f) of a class is the number of data points in the class.
Define class width
- The distance b/w lower (or upper) limits of consecutive classes
Define range
- The difference b/w the max and min data entries
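- Ex (Python sketch, made-up data): one way to build a frequency distribution from the range and a class width. The data values, the choice of 4 classes, and the round-up rule for the width are assumptions for illustration; textbooks differ on how to pick the width.

```python
data = [12, 15, 21, 22, 25, 28, 31, 33, 37, 40, 41, 48]  # made-up integer values
num_classes = 4                                            # assumed choice

data_range = max(data) - min(data)            # range: max entry minus min entry
class_width = data_range // num_classes + 1   # round up so every value fits in a class

# Lower limits of consecutive classes are one class width apart.
lower_limits = [min(data) + i * class_width for i in range(num_classes)]

# Frequency f of a class: the number of data points that fall in it.
frequencies = [
    sum(1 for x in data if low <= x < low + class_width)
    for low in lower_limits
]

for low, f in zip(lower_limits, frequencies):
    print(f"{low}-{low + class_width - 1}: f = {f}")
```

- The classes here are mutually exclusive and exhaustive, so the frequencies add up to the number of entries.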
Describe histograms
- A way of organizing the data in visual form
- Data have to be at least ordinal in scale
What are the rules for histogram construction
- The values of the variable being graphed are on the x-axis
- Class intervals are used (mutually exclusive, exhaustive, & even widths)
- The bars of the histogram touch
Describe a stem and leaf plot
- Each number is separated into a stem (usually the entry’s leftmost digits) and a leaf (usually the rightmost digit)
- Allows us to see the shape of the data as well as the actual values
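- Ex (Python sketch, made-up data): a stem and leaf plot for two-digit values, taking the tens digit as the stem and the ones digit as the leaf. The data set is an assumption for illustration.

```python
from collections import defaultdict

data = [23, 25, 31, 31, 38, 42, 47, 47, 49, 50]  # made-up two-digit values

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)   # tens digit is the stem, ones digit is the leaf
    stems[stem].append(leaf)

# Print one row per stem; the row lengths show the shape of the data.
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```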
What is the advantage and disadvantage of using a graphical method for describing data
- Advantage: Its visual representation
- Disadvantage: Its unsuitability for making inferences (our main goal)
What are some graphical and tabular methods for describing data
- Frequency distribution table
- Histograms
- Stem and leaf plot
- Pie chart
- Scatter plot
- Time series chart
Describe the differences between mode, median, and mean
- Mode: most frequently recurring value (appropriate for nominal, ordinal, interval, & ratio data); if no entry is repeated then there is no mode
- Median: the value that is in the middle of the distribution (appropriate for ordinal, interval, & ratio data); the middle entry when all entries are put in order, or the mean of the 2 middle values if there is an even # of entries
- Mean: the arithmetic average of the distribution (appropriate for interval & ratio data); sum of all values divided by total entries
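- Ex (Python sketch, made-up data): computing the three measures. The helper `mode_or_none` is hypothetical and written here to follow the "no repeated entry means no mode" rule, since Python's built-in `statistics.mode` returns a value even when nothing repeats.

```python
import statistics
from collections import Counter

scores = [2, 3, 3, 5, 7, 10]  # made-up data

def mode_or_none(values):
    """Return the most frequent value(s), or None when no entry repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return None
    return [v for v, c in counts.items() if c == top]

mode = mode_or_none(scores)          # most frequently recurring value(s)
median = statistics.median(scores)   # even # of entries: mean of the 2 middle values
mean = statistics.mean(scores)       # sum of all values / total entries

print(mode, median, mean)
```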
Define an outlier
- A data entry that is far removed from the other entries in the data set
Of the mean, median, and mode, which are affected by an outlier
- The mean is affected, while the median and mode are largely resistant to extreme values
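- Ex (Python sketch, made-up data): adding one extreme entry to a small data set and recomputing each measure. The values are assumptions for illustration.

```python
import statistics

values = [30, 32, 32, 35, 38]     # made-up data
with_outlier = values + [400]     # same data plus one extreme entry

for label, data in [("without outlier", values), ("with outlier", with_outlier)]:
    print(label,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "mode =", statistics.mode(data))
```

- In this data set the mean moves from about 33 to about 95, the median moves only from 32 to 33.5, and the mode stays at 32.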
Define midrange
- The average of the highest and lowest value in the data set
- Very easy to find but highly affected by extreme values
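- Ex (Python sketch, made-up data): the midrange before and after adding one extreme entry. The function name `midrange` and the values are assumptions for illustration.

```python
def midrange(values):
    """Average of the highest and lowest value in the data set."""
    return (max(values) + min(values)) / 2

values = [30, 32, 32, 35, 38]      # made-up data

print(midrange(values))            # 34.0
print(midrange(values + [400]))    # 215.0
```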
Describe a weighted mean
- It’s the mean of a data set whose entries have varying weights
- Ex: homework is 30%, exams are 50%, and projects are 20% of your final grade
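- Ex (Python sketch): a weighted mean using the grade weights above. The three scores are made up for illustration.

```python
weights = {"homework": 0.30, "exams": 0.50, "projects": 0.20}
scores = {"homework": 90, "exams": 80, "projects": 100}   # made-up scores

# Multiply each entry by its weight, add the products, divide by the total weight.
weighted_sum = sum(scores[k] * weights[k] for k in weights)
weighted_mean = weighted_sum / sum(weights.values())

print(round(weighted_mean, 2))
```

- With these scores the weighted mean is 27 + 40 + 20 = 87, while the unweighted mean of 90, 80, and 100 would be 90.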
What are the measures of dispersion and their goal
- Goal is to get a feeling for the spread of the data
- Range: difference b/w the highest & lowest value in a data set (appropriate for ordinal, interval, & ratio data)
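- Interquartile range: the difference b/w the third and first quartiles (Q3 − Q1), i.e., the spread of the middle 50% of the data (appropriate for ordinal, interval, & ratio data)
- Standard deviation: roughly the typical distance of each point from the mean, computed as the square root of the average squared deviation from the mean (appropriate for interval & ratio data)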
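- Ex (Python sketch, made-up data): the three measures of dispersion. Quartile conventions vary between textbooks and software; this sketch uses the default method of Python's `statistics.quantiles`, which may give slightly different quartiles than a hand method.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data, mean = 5

data_range = max(data) - min(data)             # highest value minus lowest value

q1, q2, q3 = statistics.quantiles(data, n=4)   # three cut points: Q1, median, Q3
iqr = q3 - q1                                  # spread of the middle 50% of the data

pop_sd = statistics.pstdev(data)      # population SD: divides by N
sample_sd = statistics.stdev(data)    # sample SD: divides by N - 1

print(data_range, iqr, pop_sd, round(sample_sd, 3))
```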
Describe symmetrical distributions
- Data are evenly distributed about the center
- There is the same amount of data on the right & left side of the distribution
- Not all symmetrical distributions are “normal”
Describe skewed distributions
- Data are not evenly distributed about the center
- Can be “right skewed” (longer tail toward the high values) or “left skewed” (longer tail toward the low values)