Module 2: Visualizing Data and Outliers Flashcards

(45 cards)

1
Q

Bar graphs are a popular way to summarize which type of data?

A

categorical data.

2
Q

If bars represent the mean score in each category, lines (called error bars) may be shown on top of the bars to represent ____

A

standard deviation
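
A minimal sketch of the numbers behind such a chart, assuming made-up scores for two hypothetical groups: the mean gives the bar height and the sample standard deviation gives the error-bar length.

```python
from statistics import mean, stdev

# Hypothetical scores per category (illustrative values only).
scores = {"control": [70, 75, 80], "treatment": [80, 90, 100]}

# For each category: (bar height, error-bar length).
bars = {group: (mean(values), stdev(values)) for group, values in scores.items()}

for group, (height, error) in bars.items():
    print(f"{group}: mean = {height}, SD = {error:.1f}")
```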

3
Q

Histograms are used to depict what kind of data?

A

scale data.

4
Q

If data are skewed, which measure of central tendency best describes the data?

A

The median. Extreme values still count toward its position, but they do not affect it greatly, because it is the middle value of the ordered data.
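
A small sketch of why this matters, assuming a made-up, right-skewed list of values (the variable name and numbers are hypothetical):

```python
from statistics import mean, median

# Hypothetical lengths of stay in days; the last value is an extreme one.
stays = [2, 3, 3, 4, 5, 40]

stays_mean = mean(stays)      # pulled upward by the single extreme value
stays_median = median(stays)  # stays close to the bulk of the data

print(stays_mean, stays_median)
```

Here the mean lands well above five of the six values, while the median stays among them.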

5
Q

why would a frequency polygon be used over a histogram?

A

It can be useful when comparing multiple groups, as multiple lines on one graph are easier to interpret than multiple sets of bars.

6
Q

a one way scatter plot

A

Uses a single axis to display the relative position of each data point in a group. This type of figure can be used with categorical or scale data and can be presented horizontally or vertically.

7
Q

box plots

A

Has only one axis and shows a summary of the data instead of each data point.

The central box depicts the interquartile range. Lines (whiskers) projecting from the box on either side extend to the adjacent values (the most extreme observations in the data set that are no more than 1.5 times the height of the box beyond either quartile). Anything beyond the adjacent values is considered an extreme value and is plotted as an individual dot.
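
The pieces of a box plot can be computed by hand. This is a sketch on made-up data, with a hypothetical helper name; it assumes the "inclusive" quartile method from Python's `statistics` module, and other software may use a slightly different quartile convention.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return the quartiles, adjacent values (whisker ends) and extreme values."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # height of the box
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "adjacent": (min(inside), max(inside)),  # where the whiskers end
        "extreme": [v for v in values if v < low_fence or v > high_fence],
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # hypothetical observations
summary = box_plot_summary(data)
print(summary)
```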

8
Q

When can a box plot be used?

A

When there are too many overlapping data points, which would be difficult to interpret in a scatter plot or one-way scatter plot.

9
Q

What type of data would you use for a two-way scatter plot?

A

Scale variables. A two-way scatter plot depicts the relationship between two scale variables.

10
Q

line graphs

A

Similar to two-way scatter plots in that they represent the relationship between two scale variables. However, in a line graph each point on the x-axis has a corresponding y value, which is not a requirement for scatter plots.

11
Q

What is an outlier?

A

something unusual or different or outside the norm

12
Q

How would you identify potential outliers?

A

By visualizing the data. Extremely high or low values are easy to spot in box plots, scatter plots, and histograms.

13
Q

What constitutes an outlier?

A
  • values that are more than two standard deviations above or below the mean
  • values that are more than 1.5 times the IQR above Q3 or below Q1 (values outside the whiskers in a box plot)
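
Both rules can be applied to the same data to see whether they agree. This is a sketch with hypothetical function names and made-up scores; the quartiles use the "inclusive" method, which is one convention among several.

```python
from statistics import mean, stdev, quantiles

def sd_outliers(values, k=2):
    """Values more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]

def iqr_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

scores = [10, 12, 11, 13, 12, 11, 10, 12, 13, 45]  # hypothetical data

flagged_by_sd = sd_outliers(scores)
flagged_by_iqr = iqr_outliers(scores)
print(flagged_by_sd, flagged_by_iqr)
```

The two rules can disagree on other data sets, because an extreme value inflates the mean and standard deviation but has little effect on the quartiles.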
14
Q

what is a research population?

A

The group of objects, events, people, procedures, or observations that a researcher is interested in studying.

15
Q

dependent variable

A

what is being measured or the outcome of a study

16
Q

approaches to sampling

A

Random sampling and non-random sampling.

17
Q

random sampling

A

Random selection is used to choose the people, objects, events, or observations to be included in each sample. Each item of interest has an equal chance of being included in the sample.
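
A minimal sketch of a simple random sample, assuming a hypothetical sampling frame of 100 ID numbers. The fixed seed is only there to make the sketch repeatable.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 100 people.
population = list(range(1, 101))

rng = random.Random(42)                 # fixed seed, for repeatability only
sample = rng.sample(population, k=10)   # drawn without replacement; every ID has an equal chance

print(sorted(sample))
```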

18
Q

non-random sampling

A

The items included in the study are selected for a reason (e.g. proximity, feasibility). Also called non-probability sampling.

19
Q

which graph type is best for showing changes over time?

bar chart
line graph
pie chart
histogram

A

Line graph. Line graphs are good for showing trends or patterns across time points, such as monthly case numbers or yearly vaccination rates.

20
Q

Which chart is best for showing the frequency distribution of a continuous variable?

A

Histogram. Histograms show how often values fall into specific ranges, which is ideal for continuous variables like height, blood pressure, or income.
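
The counting step behind a histogram can be sketched as follows, assuming made-up heights and a hypothetical helper that groups values into equal-width ranges:

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall into each range [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

heights_cm = [152, 158, 161, 163, 167, 168, 171, 174, 176, 189]  # hypothetical data
histogram = bin_counts(heights_cm, 10)

# Text version of the histogram: one '#' per observation in the range.
for start, n in histogram.items():
    print(f"[{start}, {start + 10}): {'#' * n}")
```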

21
Q

True or false: a bar chart can be used to display both categorical and numerical data.

23
Q

True or false: line graphs should only be used for categorical variables.

A

False. They are used for continuous or ordinal data across a time axis, not categorical labels.

24
Q

What is discrete data, and what kind of visualization should I use for it?

A

Discrete data are countable, separate values that cannot be broken into smaller pieces (no decimals or fractions), e.g. the number of people in a household.

Use bar charts or pie charts; don't use histograms.

25
Q

What is continuous data, and what visualization would I use for it?

A

Measurable values that can be broken down into fractions or decimals, e.g. height, weight, blood pressure, or temperature. Use histograms, line graphs, or scatter plots.

26
Q

Would the number of prescription medications someone takes be discrete or continuous?

A

Discrete

27
Q

Would systolic blood pressure be discrete or continuous?

A

Continuous

28
Q

Would the number of ER visits per year be discrete or continuous?

A

Discrete

29
Q

Choosing the correct graph or statistical test depends on what?

A

The type of variable (categorical, ordinal, continuous), the number of groups, and what you're trying to compare.

30
Q

Best graph type for categorical (nominal) data?

A

Bar chart or pie chart, e.g. gender, blood type.

31
Q

Best graph for ordinal data?

A

Bar chart, or box plot if numeric, e.g. pain scale, satisfaction level.

32
Q

Best chart for one continuous variable?

A

Histogram, line graph, scatter plot, or box plot, e.g. weight, income, time.

33
Q

Best graph for 2 continuous variables?

A

Scatter plot, e.g. BMI vs. blood pressure.

34
Q

What statistical test would you run if you are comparing 2 groups on a continuous outcome?

A

Independent t-test, e.g. comparing average BMI for males vs. females.

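
As a rough sketch of what the test computes, the pooled-variance t statistic can be worked out by hand. The function name and BMI values are hypothetical, and the sketch assumes equal variances in the two groups. It stops at the statistic and degrees of freedom; a p-value would additionally need the t distribution.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df

bmi_group_a = [22.0, 24.0, 26.0, 28.0]   # hypothetical values
bmi_group_b = [20.0, 22.0, 24.0, 26.0]   # hypothetical values

t_stat, df = independent_t(bmi_group_a, bmi_group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```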
35
Q

What statistical analysis would you use if you are comparing 3 or more groups on a continuous outcome?

A

ANOVA, e.g. comparing cholesterol levels across age groups.

36
Q

What statistical test would you use to look at the relationship between 2 continuous variables?

A

Correlation or regression, e.g. time spent exercising and stress level.

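
A sketch of the Pearson correlation coefficient computed from its definition, on made-up exercise and stress values (the function and variable names are hypothetical):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

exercise_hours = [1, 2, 3, 4, 5]   # hypothetical data
stress_score = [9, 8, 6, 5, 2]     # hypothetical data

r = pearson_r(exercise_hours, stress_score)
print(f"r = {r:.3f}")   # a negative r: more exercise goes with lower stress in this made-up data
```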
37
Q

What statistical analysis would you use to compare proportions with categorical variables?

A

Chi-square test, e.g. comparing smoking rates by gender.

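
A sketch of the chi-square statistic for a contingency table, assuming made-up counts and a hypothetical helper. It compares observed counts with the counts expected if the two variables were unrelated, and applies no continuity correction.

```python
def chi_square(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts. Rows: men, women. Columns: smoker, non-smoker.
counts = [[30, 70], [20, 80]]

chi2 = chi_square(counts)
print(f"chi-square = {chi2:.3f}")   # degrees of freedom: (rows - 1) * (columns - 1) = 1
```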
38
Q

What statistical test would you use to predict one variable from another?

A

Regression, e.g. predicting weight from calorie intake.

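
A sketch of simple linear regression by least squares, on made-up calorie and weight values (names and numbers are hypothetical). The fitted line is then used to predict the outcome for a new value.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

calories = [1800, 2000, 2200, 2400]   # hypothetical daily intake
weight_kg = [60, 65, 70, 75]          # hypothetical weights

slope, intercept = fit_line(calories, weight_kg)
predicted = intercept + slope * 2100  # predicted weight at 2100 calories
print(f"weight = {intercept:.1f} + {slope:.3f} * calories; prediction: {predicted:.1f} kg")
```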
39
Q

Four types of bias in research

A

Sampling bias, survivorship bias, response bias, recall bias

40
Q

This bias occurs when each member or item of the relevant population does not have an equal chance of ending up in the sample (also called selection bias).

A

Sampling bias, e.g. only picking people at one location.

41
Q

This bias occurs when participants give answers they believe the researcher wants to hear, or answers they think are socially acceptable.

A

Response bias, e.g. participants may lie on a survey that asks about sexual behaviour or alcohol consumption.

42
Q

This bias occurs when individuals or objects leave the study and the researcher continues to measure the remaining participants without considering those that left.

A

Survivorship bias, e.g. a study explores whether a 12-week exercise program reduces the risk of falls, but the participants who fell dropped out, skewing the results.

43
Q

This bias occurs when participants do not remember past events accurately or omit details.

A

Recall bias, e.g. asking mothers about their comfort levels during birth years after they have given birth.

44
Q

True or false: histograms are for distributions, not relationships.

A

True

45