Module 1 Flashcards

Question

Points to a good graph

Answer 1

For reliable results, the sample in any quantitative analysis should include at least 200 people. Data should come from a neutral source. Questions should not imply or provoke a specific response. Be wary of misleading averages. Tracking information for each year provides a truer picture of the general trends. Avoid biased samples. Look for a truncated axis. Percentages can hide small sample sizes. Watch for comparisons that use different bases. Data fishing (data dredging) is the analysis of large data sets with the goal of finding an association, which is then implied to be a meaningful correlation.

Answer 2

Should convey the general patterns in a set of observations at a single glance. Should be simple, self-explanatory, and clearly labeled.

Answer 3

means of organizing and summarizing observations

Answer 4

Values fall into unordered categories or classes (e.g., sex, blood type). Proportions can be used to summarize them.

Answer 5

When order becomes important, a natural order exists, but we still aren't concerned with the magnitude (pain scale)

Answer 6

Data is arranged from highest to lowest and then assigned a rank (possible causes of death) - disregard magnitude and only care about relative position

Answer 7

Order and magnitude are important - numbers represent actual quantities (unlike ranks) and can only take on specific values (e.g., the number of car accidents in MA in a month). These will be integers. Arithmetic can be applied

Answer 8

Data that represents specific quantities but is not restricted to specific values such as integers (e.g., time). The only limiting factor with continuous data is the degree of accuracy with which it can be measured.

Answer 9

Nominal and ordinal data: a set of classes or categories with the numerical count for each. Discrete or continuous data: 1. Break the range of values into a series of non-overlapping intervals. 2. Count the observations that fall in each interval.
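As a rough illustration of the two steps for discrete or continuous data, the sketch below splits a range into equal-width, non-overlapping intervals and counts the observations in each. The function name `frequency_table`, its parameters, and the ages are made up for this example.

```python
def frequency_table(values, start, width, n_bins):
    """Count observations in n_bins intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts

# Hypothetical ages, grouped into 20-29, 30-39, 40-49, 50-59
ages = [23, 35, 41, 29, 52, 38, 44, 31, 27, 49]
counts = frequency_table(ages, start=20, width=10, n_bins=4)
print(counts)
```

Half-open intervals are one way to guarantee that no observation is counted twice.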

Answer 10

the proportion of the total number of observations that appears in that interval. Useful for comparing sets of data that contain unequal numbers of observations
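A minimal sketch of the calculation, assuming categorical observations; the helper name `relative_frequencies` and the blood-type data are invented for illustration.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return each category's share of the total number of observations."""
    counts = Counter(observations)
    total = len(observations)
    return {value: count / total for value, count in counts.items()}

# Hypothetical sample of eight blood types
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
rel = relative_frequencies(blood_types)
print(rel)
```

Because every value is a share of its own total, two samples of different sizes can be compared directly.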

Answer 11

The cumulative relative frequency is calculated by summing the relative frequencies for the specified interval and all previous ones
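The running sum can be sketched as follows; the interval counts are hypothetical and the intervals are assumed to be listed in increasing order.

```python
from itertools import accumulate

# Hypothetical counts for five ordered, non-overlapping intervals
interval_counts = [2, 5, 8, 4, 1]
total = sum(interval_counts)

relative = [count / total for count in interval_counts]
# Each entry adds the current interval's relative frequency to all previous ones
cumulative = list(accumulate(relative))

for rel_freq, cum_freq in zip(relative, cumulative):
    print(f"{rel_freq:.2f}  {cum_freq:.2f}")
```

The last cumulative value should come out at 1 (100%), up to floating-point rounding.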

Answer 12

Two random variables are stochastically ordered if one is more likely to take on larger values than the other, as formalized by the comparison of their probability distributions or cumulative distribution functions
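One way to make the comparison of cumulative distribution functions concrete is to check it on two samples: sample A looks stochastically larger than sample B if A's empirical cumulative relative frequency never exceeds B's. This is only an illustrative sketch on made-up data; the function names are assumptions, and a check on samples is not a formal proof about the underlying distributions.

```python
def ecdf(sample, t):
    """Empirical cumulative relative frequency: share of observations <= t."""
    return sum(1 for x in sample if x <= t) / len(sample)

def looks_stochastically_larger(a, b):
    """True if the ECDF of a is at or below the ECDF of b at every observed value."""
    points = sorted(set(a) | set(b))
    return all(ecdf(a, t) <= ecdf(b, t) for t in points)

# Hypothetical samples
group_a = [3, 5, 7, 9]
group_b = [1, 2, 4, 6]
print(looks_stochastically_larger(group_a, group_b))
print(looks_stochastically_larger(group_b, group_a))
```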

Answer 13

Pictorial representation of frequency distribution for nominal and ordinal data

Answer 14

Frequency distribution for discrete or continuous data. The total area = 1 or 100%; frequency is associated with the area, not the height of the bar. Relative frequency and absolute frequency histograms will have the same shape.

Answer 15

Superior to histograms for comparing two sets of data. Can also be drawn as a cumulative frequency polygon.

Answer 16

A one-way scatter plot uses a single horizontal axis to display the relative position of each data point in the group. No observations are lost, but they can be difficult to read.

Answer 17

Similar to one-way scatter plots in that they require a single axis; instead of plotting every observation, however, they display only a summary of the data. Adjacent values are the most extreme observations that are not more than 1.5 times the interquartile range beyond the quartiles.
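The summary behind such a plot can be sketched with the standard library. Quartile conventions differ between textbooks and software; this example assumes the default method of Python's `statistics.quantiles`, and the function name `box_plot_summary` and the data are made up.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, IQR, adjacent values, and points beyond the 1.5 x IQR fences."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_adjacent": min(inside),  # most extreme values within the fences
        "upper_adjacent": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Hypothetical observations with one unusually large value
summary = box_plot_summary([2, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```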

Answer 18

used to depict the relationship between two different continuous measurements.

Answer 19

The most frequently used measurement of central tendency - also called the average. Not appropriate for nominal or ordinal data. The mean is extremely sensitive to unusual values.

Answer 20

Can be used for ordinal, discrete and continuous data. It is the 50th percentile of all measurements. With an odd number of observations, it is the [(n + 1)/2]th value in the ordered data; with an even number, it is the average of the two middle values. It takes only the ordering into account and is not as sensitive to unusual values (robust).
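A small sketch of that robustness, using made-up numbers: adding one unusual value moves the mean a long way but barely shifts the median.

```python
import statistics

# Hypothetical measurements, then the same data with one extreme value added
values = [30, 32, 35, 38, 40]
with_outlier = values + [400]

mean_before = statistics.mean(values)
median_before = statistics.median(values)        # odd n: the middle value
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)   # even n: average of the two middle values

print(mean_before, median_before)
print(mean_after, median_after)
```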

Answer 21

It can be used for all types of data. It is the set of values that occurs most frequently.
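Since the mode works even for nominal data, a quick sketch with hypothetical blood types shows that more than one value can tie for most frequent.

```python
import statistics

# Hypothetical nominal data: "A" and "O" each occur twice
blood_types = ["A", "O", "O", "B", "A"]
modes = statistics.multimode(blood_types)  # all values tied for the highest count
print(modes)
```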

Answer 22

Unimodal, symmetric: mean, median, and mode are about the same. Bimodal, symmetric: mean and median are about the same (but that value lies between the two peaks and would be extremely unlikely to occur as a measurement). Asymmetric data: the median is often the best measure.

Answer 23

Difference between the largest observation and the smallest. Highly sensitive to extreme values.

Answer 24

Not as easily impacted by extremes. Calculated by subtracting the 25th percentile of the data from the 75th percentile; consequently, it encompasses the middle 50% of the observations.

Answer 25

quantifies the amount of variability, or spread, around the mean of the measurement

Answer 26

variance is calculated by subtracting the mean of a set of values from each of the observations, squaring these deviations, adding them up, and dividing by 1 less than the number of observations in the data set.
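The steps can be followed one at a time in a short sketch; the data are made up, and the result is compared against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import statistics

# Hypothetical observations
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]             # subtract the mean from each observation
squared = [d ** 2 for d in deviations]            # square the deviations
manual_variance = sum(squared) / (len(data) - 1)  # add up, divide by n - 1

print(manual_variance)
print(statistics.variance(data))
```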

Answer 27

positive square root of the variance. In a comparison of two groups of data, the group with the smaller standard deviation has the more homogeneous observations; the group with the larger standard deviation exhibits a greater amount of variability.

Answer 28

Relates the standard deviation of a set of values to its mean. Most useful for comparing two or more data sets.
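A minimal sketch, assuming the common definition of standard deviation divided by mean, expressed as a percentage. The helper name and both data sets are hypothetical; because the units cancel, measurements on different scales can be compared.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation as a percentage of the mean (unit-free)."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Hypothetical measurements in different units
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 2), round(cv_weights, 2))
```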

Answer 29

this procedure can be applied to data that have been summarized in the form of a frequency distribution. Data that is organized in this way is often referred to as grouped data. Can also be interpreted as a weighted average
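Read as a weighted average, the calculation multiplies each interval's midpoint by its frequency and divides by the total count. The sketch below assumes every observation in an interval sits at the midpoint; the function name and numbers are invented for illustration.

```python
def grouped_mean(midpoints, frequencies):
    """Weighted average of interval midpoints, weighted by interval frequency."""
    total = sum(frequencies)
    weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
    return weighted_sum / total

# Hypothetical intervals 0-10, 10-20, 20-30 with their counts
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
result = grouped_mean(midpoints, frequencies)
print(result)
```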

Answer 30

can be used to summarize the distribution of values instead. Chebychev's inequality is less specific than the empirical rule, but it is true for any set of observations, no matter what its shape
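Chebychev's inequality says that at least 1 - (1/k)^2 of the observations lie within k standard deviations of the mean. The sketch below checks that bound on a deliberately skewed, made-up data set; the function name is an assumption.

```python
import statistics

def proportion_within(data, k):
    """Share of observations within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)

# Hypothetical skewed data with one extreme value
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

for k in (2, 3):
    bound = 1 - 1 / k ** 2
    print(k, proportion_within(data, k), ">=", round(bound, 3))
```

For roughly symmetric, unimodal data the empirical rule gives tighter figures, but the bound here holds whatever the shape.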

(55 cards)