Research Methods Flashcards
What are the 3 Steps in the Statistical Process?
1) Collect Data
2) Describe & Summarize the Distribution
3) Interpret - draw general conclusions about the population on the basis of the sample
What is Nominal Data?
Mutually Exclusive groups, lack intrinsic order.
Zoning classification, social security numbers, sex.
What is Ordinal Data?
Ordered categories, implying a ranking of observations. The values themselves are meaningless - only the rank is important.
Letter grades, response scales on a survey 1-5, suitability for development
What is Interval data?
Data with an ordered relationship where the difference between values has meaning, but there is no true zero.
Temperature. The difference between 40 and 30 degrees is the same as between 30 and 20 degrees, but 40 degrees is not twice as hot as 20 degrees.
What is Ratio Data?
Gold standard of measurement. Absolute and relative differences both have meaning because the scale has a true zero.
Distance measurement. 40 - 30 miles is the same difference as 30-20 miles and 40 miles is twice as far as 20 miles.
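A minimal sketch of the interval vs. ratio contrast. It assumes the temperatures are in Fahrenheit (the cards do not say) and uses a hypothetical helper, fahrenheit_to_celsius. Changing units alters a temperature ratio but leaves a distance ratio untouched.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

MILES_TO_KM = 1.609344

# Interval scale: the ratio of two temperatures depends on the unit chosen.
ratio_f = 40.0 / 20.0
ratio_c = fahrenheit_to_celsius(40.0) / fahrenheit_to_celsius(20.0)

# Ratio scale: the ratio of two distances survives a change of unit.
ratio_miles = 40.0 / 20.0
ratio_km = (40.0 * MILES_TO_KM) / (20.0 * MILES_TO_KM)

print(ratio_f, ratio_c)       # the two temperature ratios disagree
print(ratio_miles, ratio_km)  # the two distance ratios agree
```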
What are Quantitative Variables?
Variables where numerical value is meaningful.
Interval or ratio measurement.
Household income, level of pollution in river
What are Qualitative Variables?
Variables where numerical value is not meaningful.
Nominal/Ordinal measurement.
Zoning classification
What are Continuous Variables?
Can take any of an infinite number of values within a range.
Values can be positive or negative.
Most measurements in physical sciences yield continuous variables.
What are Discrete variables?
Countable set of distinct values, usually whole numbers.
Accidents per month - can’t be negative.
What are Binary/dichotomous variables?
Special case of discrete variables which can only take on two values - 0/1 typically.
What are Descriptive Statistics?
Describe the characteristics of the distribution of values in a population or sample.
Ex: on average, AICP test takers in 2018 are 30 years old
What are Inferential Statistics?
Using probability to determine characteristics of a population based on a sample.
Define Distribution
The overall shape of observed data.
Shown as an ordered table, a histogram, or a density plot.
What is the Normal or Gaussian Distribution?
The bell curve.
Distribution is symmetric. The spread around the mean can be related to the proportion of observations.
More specifically, about 95% of the observations that follow a normal distribution are within two standard deviations of the mean.
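The two-standard-deviation rule can be checked with Python's standard library. This is an illustrative sketch using statistics.NormalDist (Python 3.8+); the variable names are made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of observations within two standard deviations of the mean.
within_two_sd = std_normal.cdf(2.0) - std_normal.cdf(-2.0)

# Number of standard deviations that captures exactly 95%.
z_for_95 = std_normal.inv_cdf(0.975)

print(round(within_two_sd, 4))  # 0.9545
print(round(z_for_95, 2))       # 1.96
```

So "two standard deviations" is a convenient rounding of 1.96.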
What is the Symmetric distribution?
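The left and right halves of the distribution mirror each other, so the mean equals the median and an equal number of observations fall below and above the mean.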
What is Central Tendency?
Typical or representative value for the distribution of observed values
What is the Coefficient of Variation?
Measures the relative dispersion around the mean: the standard deviation divided by the mean.
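A small sketch of the calculation. The function name and the commute times are hypothetical, and the sample standard deviation is an assumed choice.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

commute_minutes = [20, 25, 30, 35, 40]  # hypothetical data
cv = coefficient_of_variation(commute_minutes)
print(f"CV = {cv:.3f}")
```

Because the result has no units, it can be used to compare the spread of variables measured on different scales.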
What is a z-score?
This is a standardization of the original variable by subtracting the mean and dividing by the standard deviation.
The z-score in effect transforms the original measure into standard deviation units.
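A minimal sketch of standardization. The data are hypothetical, and using the population standard deviation (pstdev) is an assumption; swap in stdev for a sample.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, population SD 2
z = z_scores(scores)
print(z)
```

A value of 9 becomes a z-score of 2, meaning it lies two standard deviations above the mean.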
What is the inter-quartile range (IQR)?
Alternative measure of dispersion.
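Breaks the ordered data into quartiles; the IQR is the distance between the first quartile (25th percentile) and the third quartile (75th percentile).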
This is visualized in a box plot (also called box and whiskers plot).
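A sketch of the IQR calculation using statistics.quantiles (Python 3.8+). The function name and data are hypothetical, and the "inclusive" quartile method is an assumed choice; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

def interquartile_range(values):
    """Distance between the third and first quartiles."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
iqr = interquartile_range(data)
print(iqr)  # 4.0 (first quartile 3, third quartile 7)
```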
What is the confidence interval?
A range around the sample statistic that is expected to contain the population parameter with a given level of confidence, typically 95% or 99%.
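A rough sketch of a confidence interval for a mean. It uses a normal approximation, which is an assumption that suits larger samples; small samples normally call for the t distribution instead. The function name and the ages are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return center - margin, center + margin

ages = [27, 31, 29, 35, 30, 28, 33, 26, 32, 29]  # hypothetical sample
low_95, high_95 = mean_confidence_interval(ages, 0.95)
low_99, high_99 = mean_confidence_interval(ages, 0.99)
print(f"95%: ({low_95:.2f}, {high_95:.2f})")
print(f"99%: ({low_99:.2f}, {high_99:.2f})")
```

Asking for more confidence widens the interval, so the 99% range contains the 95% range.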
Define Standard Deviation
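A measure of how much the data in a certain collection are scattered around the mean. A low standard deviation means that the data are tightly clustered; a high standard deviation means that they are widely scattered. There are two common formulas for the standard deviation, a definitional form and a computational form; they are algebraically equivalent and yield the same result. (The population version, which divides by N, and the sample version, which divides by n - 1, do differ slightly.)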
Define Variance
The square of the standard deviation. It is the expected (average) squared deviation from the mean. The formula is the same as that for the standard deviation except the “s” variable is squared, and no square root function is performed.
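A short sketch of the link between the two measures, using the statistics module on hypothetical data. It computes both the population versions (divide by N) and the sample versions (divide by n - 1), which differ slightly.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

pop_var = pvariance(data)    # average squared deviation, divides by N
pop_sd = pstdev(data)        # square root of the population variance
sample_var = variance(data)  # divides by n - 1
sample_sd = stdev(data)      # square root of the sample variance

print(pop_var, pop_sd)       # population variance 4, population SD 2
print(sample_var, sample_sd)
```

In both cases the variance equals the standard deviation squared.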
What is a t-test?
Allows us to compare the means of two groups and determine how likely it is that the difference between the two means occurred by chance.
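A sketch of the test statistic for two independent groups. It uses Welch's form, which does not assume equal variances; this is one common variant, chosen here as an assumption. The function name and data are hypothetical. Turning the statistic into a probability requires a t table or a statistics library, which is left out.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(group_a, group_b):
    """Difference in means divided by its estimated standard error."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

group_a = [10, 12, 14]  # hypothetical scores
group_b = [4, 6, 8]     # hypothetical scores
t_stat = welch_t_statistic(group_a, group_b)
print(f"t = {t_stat:.3f}")
```

The larger the absolute value of t, the less likely it is that the difference arose by chance.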
What is the correlated t-test?
Concerned with the difference between the average scores of a single sample of individuals who are assessed at two different times (“before” vs. “after”) or on two different measures. The measures must be correlated (co-related), so the test can also compare the average scores of samples of individuals who are paired in some way (e.g. parent-child).
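A sketch of the paired calculation: the test runs on the per-individual differences rather than on the two sets of scores separately. The function name and the before/after scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return the t statistic and degrees of freedom for paired scores."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1

before = [10, 12, 9, 11]  # hypothetical "before" scores
after = [12, 15, 10, 14]  # hypothetical "after" scores, same individuals
t_paired, dof = paired_t_statistic(before, after)
print(f"t = {t_paired:.3f}, degrees of freedom = {dof}")
```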