Module 2: Visualizing Data and Outliers Flashcards
(45 cards)
bar graphs are a popular way to summarize which type of data
categorical data.
if bars represent the mean score in each category, lines (called error bars) may be shown on top of the bars to represent ____
standard deviation
histograms are used to depict what kind of data
scale data.
if data is skewed, which measure of central tendency would be used to best describe data
median, because it takes extreme values into account but is not greatly affected by them, since it is the middle value.
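A minimal sketch of why the median is preferred for skewed data. The income values below are made up for illustration; the single extreme value pulls the mean far above the typical observation while the median stays put.

```python
from statistics import mean, median

# Hypothetical right-skewed data: one extreme value (500) among typical ones
incomes = [30, 32, 35, 38, 40, 41, 500]

data_mean = mean(incomes)      # pulled upward by the extreme value
data_median = median(incomes)  # middle value of the sorted data

print(f"mean = {data_mean:.1f}, median = {data_median}")
```

Here the median matches a typical value in the set, while the mean is larger than every observation except the extreme one.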
why would a frequency polygon be used over a histogram?
it can be useful when comparing multiple groups, as adding multiple lines to one graph is easier to interpret than multiple sets of bars.
a one way scatter plot
uses a single axis to display the relative position of each data point in a group. this type of figure can be used with categorical or scale data and can be presented horizontally or vertically.
box plots
only have one axis; they show a summary of the data instead of each data point.
the central box depicts the interquartile range. lines (whiskers) projecting from the box on either side extend to the adjacent values (the most extreme observations in the data set that are no more than 1.5 times the height of the box beyond either quartile). anything beyond the adjacent values is considered an extreme value and is plotted as an individual dot.
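The pieces of a box plot can be computed by hand. This is a sketch only: the data are hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is one of several conventions (other software may give slightly different quartiles).

```python
from statistics import quantiles

# Hypothetical scale data, already sorted for readability
data = [2, 4, 5, 6, 7, 8, 9, 11, 25]

# Quartiles; method="inclusive" is an assumption, conventions vary by software
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # the height of the box

# Whiskers may reach at most 1.5 box heights beyond either quartile
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Adjacent values: the most extreme observations still inside those limits
lower_adjacent = min(x for x in data if x >= lower_limit)
upper_adjacent = max(x for x in data if x <= upper_limit)

# Anything beyond the adjacent values would be drawn as an individual dot
extreme_values = [x for x in data if x < lower_adjacent or x > upper_adjacent]

print(q1, q2, q3, iqr, lower_adjacent, upper_adjacent, extreme_values)
```

Note that the whiskers end at actual observations (the adjacent values), not at the computed limits themselves.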
when can a box plot be used
when there are too many overlapping data points, which would be difficult to interpret as a scatter plot or one way scatter plot
what type of data would you use for a two way scatter plot
scale variables; a two way scatter plot depicts the relationship between two scale variables
line graphs
similar to two way scatter plots in that they represent the relationship between two scale variables; however, for line graphs each point on the x axis has a corresponding y value, which is not a requirement for scatter plots
what’s an outlier
something unusual or different or outside the norm
how would you identify potential outliers
by visualizing my data. extreme positive or negative values are easy to spot in box plots, scatter plots, and histograms
what constitutes an outlier?
- values that are more than two standard deviations above or below the mean
- values that are more than 1.5 times the IQR above Q3 or below Q1 (values outside of the whiskers in a box plot)
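Both rules can be sketched in a few lines. The function names and data here are hypothetical; the sketch assumes the sample standard deviation and the "inclusive" quartile method, and other choices could shift the cutoffs slightly.

```python
from statistics import mean, stdev, quantiles

def sd_outliers(values, k=2):
    """Values more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs(x - m) > k * s]

def iqr_outliers(values, k=1.5):
    """Values more than k * IQR above Q3 or below Q1."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in values if x < q1 - k * iqr or x > q3 + k * iqr]

# Hypothetical data with one unusually large value
scores = [12, 14, 15, 15, 16, 17, 18, 19, 45]

print("2 SD rule:   ", sd_outliers(scores))
print("1.5 IQR rule:", iqr_outliers(scores))
```

The two rules agree on this data set, but they will not always do so: the mean and standard deviation are themselves pulled by extreme values, while the quartiles are not.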
what is a research population?
the group of objects, events, people, procedures, or observations that a researcher is interested in studying
dependent variable
what is being measured or the outcome of a study
approaches to sampling
random sampling and non random sampling
random sampling
random selection is used to choose the people, objects, events, or observations to be included in each sample; each item of interest has an equal chance of being included in the sample
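A minimal sketch of simple random sampling, assuming a hypothetical population of 20 participant IDs. The fixed seed is only there to make the illustration repeatable.

```python
import random

# Hypothetical population: participant IDs 1 through 20
population = list(range(1, 21))

# Seeded generator so the example is reproducible (an illustrative choice)
rng = random.Random(42)

# Draw 5 IDs without replacement; every ID has the same chance of selection
sample = rng.sample(population, k=5)

print(sample)
```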
non random sampling
the items included in the study are selected for a reason (proximity, feasibility); also called non probability sampling
which graph type is best for showing changes over time?
bar chart
line graph
pie chart
histogram
line graph; line graphs are good for showing trends or patterns across time points, like monthly case numbers or yearly vaccination rates
which chart is best for showing the frequency distribution of a continuous variable
histogram; histograms show how often values fall into specific ranges, ideal for continuous variables like height, blood pressure, or income
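The counting behind a histogram can be sketched without a plotting library. The heights and the bin width of 10 cm are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical heights in centimeters
heights_cm = [152, 158, 161, 164, 165, 167, 170, 171, 173, 178, 182, 191]

bin_width = 10  # assumed bin width; a different width changes the shape

# Map each value to the lower edge of its bin, then count values per bin
bin_counts = Counter((h // bin_width) * bin_width for h in heights_cm)

# Text histogram: one row per bin, one '#' per observation
for start in sorted(bin_counts):
    print(f"[{start}, {start + bin_width}): {'#' * bin_counts[start]}")
```

Each row corresponds to one bar of the histogram, and the bar's height is the number of values in that range.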
true or false: a bar chart can be used to display both categorical and numerical data
true
true or false: line graphs should only be used for categorical variables
false, they are used for continuous or ordinal data across a time axis, not for categorical labels
what is discrete data and what kind of visualization should I use for it
discrete data are countable, separate values that cannot be broken into smaller pieces (no decimals or fractions), e.g. the number of people in a household
use bar charts or pie charts; don’t use histograms