Flashcards in Basic Terms Deck (110):
What are four common sampling techniques?
Convenience sampling, systematic sampling, stratified sampling, and cluster sampling.
What is convenience sampling?
Sampling process by which people are selected in a way that is convenient to the researcher. Not random.
What is systematic sampling?
A sampling method that uses a system to randomize the process. For example, everyone in a population is assigned a number; beginning at a random starting spot, every kth person is chosen. Note that there are three steps.
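The three steps (number everyone, pick a random start, take every kth person) can be sketched in Python. This is only an illustration: the function name `systematic_sample`, the population of 100 numbered people, and k = 10 are made up for the example.

```python
import random

def systematic_sample(population, k, seed=None):
    """Take every k-th item, starting from a random spot among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)       # step 2: random starting position (0 .. k-1)
    return population[start::k]    # step 3: every k-th item from that start

people = list(range(1, 101))       # step 1: everyone is assigned a number (1..100)
sample = systematic_sample(people, k=10, seed=1)
print(sample)
```

With 100 people and k = 10 the sample always has 10 people, spaced 10 apart.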
What is stratified sampling?
People from every applicable stratum are included; samples are taken randomly from every stratum.
What is cluster sampling?
People are randomly assigned to clusters, clusters are then randomly selected, and everyone in a selected cluster is sampled.
What are the four levels of measurement?
Nominal, ordinal, interval, and ratio.
What does nominal measurement mean?
Data is sorted into categories that have no order.
What does ordinal measurement mean?
Data that can be ordered, but differences are mathematically meaningless. For example rank, or grade in school.
What is interval data?
Data can be ordered, differences are mathematically significant, but there is no natural zero! For example, temperature in degrees Celsius or Fahrenheit.
What is ratio data?
Ratio data can be ordered, the differences are mathematically significant, and it has a natural zero. For example, a count of something.
How do quantitative data get grouped versus qualitative data?
Quantitative data is grouped by classes. Qualitative data is grouped in bins.
What is the difference between a statistic and a parameter?
A statistic is a characteristic of a sample. A parameter is a characteristic of a population.
What is the confidence interval?
The range given by an estimate plus or minus its margin of error. For example, 60% plus or minus 2 percentage points gives a confidence interval of 58% to 62%.
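The arithmetic behind the example can be written out as a tiny sketch. The function name `confidence_interval` is made up here, and the numbers are the card's own (60 and 2).

```python
def confidence_interval(estimate, margin_of_error):
    """Return (low, high): the estimate minus and plus the margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error

low, high = confidence_interval(60, 2)   # both in percentage points
print(low, high)
```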
What is the distribution?
A listing of all possible values of a variable, or intervals of a variable, and their frequencies.
What is a variable?
A characteristic of a sample being counted, measured, or categorized.
What types of graphs are used for one-variable quantitative data?
Histograms, box-plots, time charts
What types of graphs are used for qualitative data?
Pie charts and bar graphs
How is numerical data usually summarized?
By measures of center, measures of spread, and relationships.
How is categorical data usually summarized?
By frequency or relative frequency
What are three ways experiments can go bad?
Researchers or subjects know what group they are in, factors uncontrolled for affect the outcome, or there is lack of a control group.
What are three criteria for good data?
No bias in questions or methods, size must be appropriate (not too small), and volunteers must be randomized with a control group.
What are the two types of quantitative data?
Discrete data is countable or finite, such as the sides of a die. Continuous data is not countable and is infinite, such as measurements.
What is the difference between an experiment and an observational study?
Observational studies and experiments both measure specific characteristics, but experiments actually do something to the subjects.
What is the difference between a random and a simple random sample?
A random sample is when every single person has an equal opportunity to be chosen. A simple random sample is when every group of the same size has an equal opportunity to be chosen.
What are two common errors in sampling?
A non-sampling error is an error in calculating or recording data. A sampling error is the difference between the sample (n) and the actual population being studied (N) due to the random chance of sampling. Larger samples help alleviate this error.
What is correlation, positive correlation, and negative correlation?
Correlation is the strength of a relationship between X and Y. In a negative correlation X goes up and Y goes down. In a positive correlation X goes up and Y goes up.
What is regression?
Finding the equation of the line that best fits the data and using it to make predictions for one variable based on another.
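A minimal sketch of fitting and using such a line, assuming "best fits" means ordinary least squares (the card does not name a method). The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]          # made-up values of the known variable
ys = [2, 4, 5, 4, 5]          # made-up values of the variable to predict
slope, intercept = fit_line(xs, ys)
predicted = intercept + slope * 6   # prediction for x = 6
print(slope, intercept, predicted)
```

For this data the fitted line is roughly y = 2.2 + 0.6x.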
What is a proxy?
Something related to what we want to measure but isn’t exactly it.
How do outliers affect measures of center?
Outliers dramatically affect the mean and the range. Outliers do not affect the median or the mode very much.
What is another name for measures of spread?
What is a better measure than range?
Range doesn’t tell us what the core values are. The IQR, the interquartile range, is better.
What are two-way tables?
A table where the possible values of one variable make up the rows and the possible values of another variable make up the columns.
What is the IQR?
The interquartile range is the middle 50% of the data. Calculate the median of the data, then take the median of each half. You get the bottom 25%, the IQR, and the top 25%.
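The median-of-each-half method can be sketched as below. One assumption is labeled in the code: when the number of values is odd, the middle value is left out of both halves (textbooks and software differ on this). The function name `quartiles` and the data are made up.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-each-half method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Assumption: for an odd count, the middle value belongs to neither half
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13, 15])
iqr = q3 - q1          # width of the middle 50% of the data
print(q1, q2, q3, iqr)
```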
What is the variance?
The variance measures how far the data points are from the mean. It is the average of the squared differences between each data point and the mean.
What do outliers change?
The mean, the variation, the standard deviation, and the range. They do not affect the median or the mode.
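A quick check of this with made-up numbers: adding one outlier (100) to a small data set moves the mean, standard deviation, and range a lot, while the median barely changes.

```python
from statistics import mean, median, stdev

data = [10, 12, 13, 15, 16]        # made-up values
with_outlier = data + [100]        # same values plus one outlier

for label, values in (("original", data), ("with outlier", with_outlier)):
    spread = max(values) - min(values)   # the range
    print(label, round(mean(values), 2), median(values),
          round(stdev(values), 2), spread)
```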
What does the size of the standard deviation tell us?
The larger the standard deviation, the bigger the spread the data covers. The smaller the deviation, the smaller the spread of the data.
How can crime be reported differently using the same data?
The number of events (count data) can go up while the rate (per a certain number of people) actually goes down.
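A small sketch of how this happens. All numbers are hypothetical, and the function name `rate_per_100k` is made up: the count of events rises, but the population grows faster, so the rate falls.

```python
def rate_per_100k(events, population):
    """Events per 100,000 people."""
    return events * 100_000 / population

# Hypothetical figures for two years in the same city
count_year1, pop_year1 = 500, 100_000
count_year2, pop_year2 = 600, 150_000

rate_year1 = rate_per_100k(count_year1, pop_year1)
rate_year2 = rate_per_100k(count_year2, pop_year2)
print(count_year1, count_year2)   # the count went up
print(rate_year1, rate_year2)     # the rate went down
```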
What happens when the Y axis is manipulated?
By stretching or compressing the Y axis, you can change the appearance and visual comparison of the data.
What kind of skew is data that extends to the right?
What usually happens when people drop out of studies?
The data is skewed to the right.
What is the empirical rule?
68% of your data will be within one standard deviation of the mean, 95% of the data will be within two standard deviations of the mean, 99.7% of the data will be within three standard deviations of the mean.
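These percentages can be checked against a normal distribution with Python's `statistics.NormalDist`. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The exact shares are a little above 68% and 95%, which the rule rounds.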
What kind of data can we use the empirical rule with?
Data with the normal distribution.
How do we know a piece of data is usual?
It is within two standard deviations from the mean.
How do you figure out how many standard deviations a particular piece of data is?
Subtract the mean from the individual piece of data and divide by the standard deviation.
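As a sketch, with made-up numbers (a value of 85, a mean of 70, and a standard deviation of 10) and a made-up function name `z_score`:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z = z_score(85, mean=70, sd=10)
is_usual = -2 <= z <= 2     # "usual" means within two standard deviations
print(z, is_usual)
```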
What is the coefficient of variation?
A number that allows you to compare variation in two or more sets of data. It is the standard deviation divided by the mean, expressed as a percent.
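A minimal sketch of the comparison, assuming the usual definition (standard deviation divided by the mean, times 100) and the sample standard deviation. The data sets and the function name are made up.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation as a percent of the mean."""
    return stdev(data) / mean(data) * 100

heights_cm = [160, 170, 180]    # made-up data in centimeters
weights_kg = [50, 70, 90]       # made-up data in kilograms

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 1), round(cv_weights, 1))
```

Because the units cancel, the two percents can be compared even though one set is in centimeters and the other in kilograms.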
What is the regression coefficient?
The slope of the regression line. A non-zero slope indicates there is some kind of relationship between the variables.
What does a positive correlation look like on a scatterplot?
The line goes up as it goes to the right.
What does a negative correlation look like on a scatterplot?
As one variable increases, the other decreases, so the line goes down as you move to the right.
What is correlation?
The measure of the way two variables move together.
What is “r”?
The correlation coefficient.
What does an r of one mean?
That there is a perfect (one-to-one) positive linear relationship.
What does an r of -1 mean?
That there is a perfect (one-to-one) negative linear relationship.
What values does the correlation coefficient always need to be between?
The r must always be between -1 and 1.
How do we know how strong a correlation is?
The closer to -1 or 1, the stronger the correlation is.
What does the r of zero look like?
No correlation, no linear relationship. The dots are all over the place and spread out evenly.
What can we learn about correlation from the steepness of the regression line?
What is r squared?
The squared correlation.
What does r squared tell us?
It is always between zero and one, it is in decimal form, and it tells us the accuracy of a guess about one variable if we know the other. For example, a squared correlation of 0.7 means about 70% accuracy when we know the value of variable A but not the value of variable B. More precisely, it is the share of the variation in one variable that is explained by the other.
What does a squared correlation of one mean?
Perfect 100% predictability from one variable to the other.
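A sketch of computing r and r squared, assuming r is the Pearson correlation coefficient (the deck does not name the formula). The function name `pearson_r` and the data are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # made-up data
r_squared = r ** 2
perfect = pearson_r([1, 2, 3], [2, 4, 6])         # points exactly on a line
print(round(r, 3), round(r_squared, 3), perfect)
```

Here r is about 0.77 and r squared is about 0.6, while the points that fall exactly on a rising line give an r of 1.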
What is the formula for a linear relationship?
What four things could a correlation reveal?
A causes B, B causes A, a third variable causes both A and B, or there might be no relationship at all (spurious correlation).
What is a spurious correlation?
A relationship that appears to exist between two variables when there is really no relationship at all.
Why is it important to look at a scatterplot of data?
Because data can be presented in lots of ways with the same set of stats, which could mean entirely different things. See the Datasaurus Dozen.
What is allocation bias?
Researchers putting people into the groups that they think will respond best to an experiment.
What is selection bias?
Subjects put themselves in a particular group
What is randomized block design?
Ensuring subgroups are randomized within the control and experimental groups.
What is a matched pair experiment?
Subject A gets one treatment; subject B gets a second treatment. The pairs are matched on many factors.
What is repeated measures design?
Where one subject gets multiple treatments. Common in the medical environment.
What are two things that can make or break a survey?
The questions actually asked, and who the questioner is.
How should questions in the survey be worded?
In a neutral way
What is non-response bias?
Those who responded and those who didn’t are very different from each other, and that difference is not always reflected in the data.
What is snowball sampling?
When current respondents are asked to help recruit people they know from the population of interest.
What is an example of non-experimental data collection?
How are statistics on censuses different than statistics on samples?
Statistics on censuses focus on whether or not differences in characteristics are big enough to make a difference in some outcome. Statistics on surveys or experiments focus on if there is a relationship between variables.
What are cross tabs?
Is symmetric data always bell shaped?
What is a resistant statistic?
A statistic that is not affected much by skewness. For example, the median is resistant to skewness.
How do you put the standard deviation into perspective?
We look at the value of the mean.
What are four properties of the standard deviation?
It cannot be negative. Its smallest value is zero, which happens when every value in the data set is exactly the same. It is affected by outliers because it is based on distance from the mean. It is always in the same units as the original data. (Don’t confuse it with the sample variance, which is in squared units.)
Why do we need the standard deviation for accuracy?
Because we can’t compare data accurately when we only have the mean.
Can we apply the empirical rule to samples?
No. However, you can use it to check whether your sample is large enough that it compares accurately with a normal population.
Which kind of distribution do we apply the empirical rule to?
A normal distribution
If we do not have a normal distribution how do we summarize the data?
Not with the empirical rule. We use percentiles or a five number summary.
What do we use to compare an individual to a sample or population?
We use a measure of relative standing called a percentile.
How do we calculate percentiles?
Order the data from smallest to largest. Multiply the specific percent, as a decimal, by the number of values. Round. Count that many values in. That value is your percentile for that data set.
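These steps can be sketched as follows. Note that Python's built-in `round` rounds exact halves to the nearest even number, and some textbooks round up instead, so treat the rounding step as an assumption. The function name `percentile_value` and the scores are made up.

```python
def percentile_value(data, percent):
    """Value at the given percentile, following the count-in method."""
    values = sorted(data)                          # order smallest to largest
    position = round(percent * len(values) / 100)  # percent (as a decimal) times count, rounded
    position = max(1, min(position, len(values)))  # stay inside the list
    return values[position - 1]                    # count that many values in

scores = [95, 50, 70, 85, 60, 75, 55, 90, 65, 80]  # made-up test scores
p80 = percentile_value(scores, 80)
print(p80)
```

With 10 scores, 80% of 10 is 8, so the 80th percentile is the 8th value in sorted order.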
How is a percentile reported?
A percentile is a relative standing; it is not a percentage by itself.
What is the five number summary?
The minimum value of a data set, Q1, the median, Q3, and the maximum value of a data set.
Is the interquartile range part of the five number summary?
No, but it may be a better descriptor depending on the data.
What type of graph can be used to present a five number summary?
A box plot.
What does a small IQR mean?
That lots of the data is near the median.
When is the median a better summary of data than the mean?
When the data is skewed.
What statistic needs to be included with a pie chart?
The sample size, n
How are percentages presented on the pie chart?
Name three things that can be presented on a bar graph.
Frequency, relative frequency, and cumulative frequency
What do we need to be careful of when creating a bar graph?
That we don’t squish or stretch out the scale of the X axis.
Where are the bins on a bar graph?
On the bottom for a vertical bar graph, or on the left for a horizontal bar graph.
What are some examples of numerical data?
Counts or measurements
What kind of graph do you use to present numerical data?
A histogram. The data is grouped, not binned.
What are some characteristics of a bar graph?
It uses unconnected bins, the bars do not connect, and the bins are not in a particular order. For example, genres or favorite colors.
What are some characteristics of a histogram?
The data is connected, so the bars are connected, and the bars are in order.
What are three things histograms tell us?
Histograms tell us the shape of the data (how the data is distributed among the groups), the amount of variability, and where the center is.
What does a histogram show?
Histogram shows data at one point in time, time charts show data over time.
What is a statistic versus a parameter?
A statistic is a characteristic of a sample, and a parameter is a characteristic of a population
What are measures of relative standing?
Measures that compare values between, or within, data sets, like the Z score.
What is a Z-score?
The Z score is the number of standard deviations a specific data value (X) is from the mean.
What are normal/usual Z scores?
Z-scores between -2 and 2
How are quartiles and percentiles different?
Percentile is parts out of 100. So the data is divided into 100 parts instead of 4 parts. Note that there are 99 percentiles, not 100.
What does percentile tell us?
The percentile tells how you did compared to everyone else who took a test. It is the number of values less than yours divided by the total number of values. This gives you a decimal, which you can turn into a percentage by multiplying by 100.
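That ratio can be sketched as below. The function name `percentile_rank` and the scores are made up for the example.

```python
def percentile_rank(scores, your_score):
    """Percent of scores that are strictly less than your_score."""
    below = sum(1 for s in scores if s < your_score)   # values less than yours
    return below * 100 / len(scores)                   # ratio, as a percent

scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]     # made-up test scores
rank = percentile_rank(scores, 85)
print(rank)
```

Seven of the ten scores are below 85, so a score of 85 sits at the 70th percentile, whatever percentage of the test questions it represents.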
What is the difference between percentage and percentile?
Percentage is out of the total on the test. Percentile is out of how many people took the test. The percentage you got on the test is irrelevant to the percentile.