# Research and Assessment Methods Flashcards

Qualitative research

- An approach for understanding the meaning individuals and groups ascribe to a human or social problem
- Emerging questions
- Flexible written report
- Analysis building from particular data to general themes (inductive)

Quantitative research

- An approach for testing objective theories by examining the relationships among variables (deductive)
- Numbered data which can be analyzed using statistical procedures
- Structured written report

Mixed methods research

- Collection of both qualitative and quantitative data
- Integrating the two forms of data
- May involve both philosophical assumptions and theoretical frameworks
- Assumes a more complete understanding of a research problem than using one of the approaches alone

Case Study Method

A research method focusing on the study of a single case. Usually it is not designed to compare one individual or group to another, although sometimes a case study may be included in comparative analysis as a key or illustrative example.

Comparative analysis

Analysis where data from different settings or groups at the same point in time or from the same settings or groups over a period of time are analyzed to identify similarities and differences.

Discourse Analysis

A study of the way versions of the world, society, events and psyche are produced in the use of language and discourse. It is often concerned with the construction of subjects within various forms of knowledge/power. Semiotics, deconstruction and narrative analysis are forms of discourse analysis.

e-Research

Also known as e-Science or e-Social Science, it is the harnessing of any digital technology to undertake and promote social research. This includes treating the digital sphere as a site of research by examining social interaction in the e-infrastructure.

Ethnography

A multi-method qualitative approach (participant observation, interviewing, discourse analysis of natural language, and personal documents) that studies people in their “…naturally occurring settings or ‘fields’ by means of methods which capture their social meanings and ordinary activities, involving the researcher participating directly in the setting…”

Field Research

Field research is when a researcher goes to observe an everyday event in the environment where it occurs.

Grounded theory

An inductive form of qualitative research where data collection and analysis are conducted together. Theories remain grounded in the observations rather than generated in the abstract. Grounded theory is an approach that develops the theory from the data collected, rather than applying a theory to the data.

Narrative analysis

Narrative analysis is a form of discourse analysis that seeks to study the textual devices at work in the constructions of process or sequence within a text.

In narrative research the respondent gives a detailed account of themselves and is encouraged to tell their story rather than answer a predetermined list of questions. This method is more successful when people are discussing a life changing event.

Analysis of the narrative tells the researcher about the person’s understanding of the meaning of events in their lives.

What are the three important steps in the statistical process?

(1) collect data (e.g., surveys), covered in Lesson 2; (2) describe and summarize the distribution of the values in the data set; (3) interpret by means of inferential statistics and statistical modeling, i.e., draw general conclusions for the population on the basis of the sample.

What are the 4 different types of measurement?

Nominal data

Ordinal data

Interval data

Ratio data

Nominal data

are classified into mutually exclusive groups or categories and lack intrinsic order. A zoning classification, social security number, and sex are examples of nominal data. The label of the categories does not matter and should not imply any order. So, even if one category might be labeled as 1 and the other as 2, those labels can be switched.

Ordinal data

are ordered categories implying a ranking of the observations. Even though ordinal data may be given numerical values, such as 1, 2, 3, 4, the values themselves are meaningless; only the rank counts. So, even though one might be tempted to infer that 4 is twice 2, this is not correct. Examples of ordinal data are letter grades, suitability for development, and response scales on a survey (e.g., 1 through 5).

Interval data

is data that has an ordered relationship where the difference between values on the scale has a meaningful interpretation. The typical example of interval data is temperature, where the difference between 40 and 30 degrees is the same as between 30 and 20 degrees, but 40 degrees is not twice as warm as 20 degrees, because the zero point of the scale is arbitrary.

Ratio data

is the gold standard of measurement, where both absolute and relative differences have a meaning. The classic example of ratio data is a distance measure, where the difference between 40 and 30 miles is the same as the difference between 30 and 20 miles, and in addition, 40 miles is twice as far as 20 miles.

Quantitative variables

the actual numerical value is meaningful

represent an interval or ratio measurement

(e.g., household income, level of a pollutant in a river)

Qualitative variables

numerical value is not meaningful

correspond to nominal or ordinal measurement

(e.g., a zoning classification)

Continuous variables

can take an infinite number of values, both positive and negative, and with as fine a degree of precision as desired. Most measurements in the physical sciences yield continuous variables.

Discrete variables

can only take on a finite number of distinct values. An example is the count of the number of events, such as the number of accidents per month. Such counts cannot be negative, and only take on integer values, such as 1, 28, or 211. A special case of discrete variables is binary or dichotomous variables, which can only take on two values, typically coded as 0 and 1.

Binary variables

dichotomous variables, which can only take on two values, typically coded as 0 and 1.

population

is the totality of some entity. For example, the total number of planners preparing for the 2018 AICP exam would be a population.

sample

is a subset of the population. For example, 25 candidates selected at random out of the total number of planners preparing for the 2018 AICP exam.

Descriptive Statistics

describe the characteristics of the distribution of values in a population or in a sample. For example, a descriptive statistic such as the mean could be applied to the age distribution in the population of AICP exam takers, providing a summary measure of central tendency (e.g., “on average, AICP test takers in 2018 are 30 years old”). The context will make clear whether the statistic pertains to the population (all values known), or to a sample (only partial observations). The latter is the typical case encountered in practice.

Inferential Statistics

use probability theory to determine characteristics of a population based on observations made on a sample from that population. We infer things about the population based on what is observed in the sample. For example, we could take a sample of 25 test takers and use their average age to say something about the mean age of all the test takers.

Distribution

is the overall shape of all observed data. It can be listed as an ordered table, or graphically represented by a histogram or density plot. A histogram groups observations into bins and shows the count in each bin as a bar. A density plot is a smooth curve. The full distribution is typically too detailed to take in at once, so its characteristics are summarized by descriptive statistics.

In addition to central tendency and dispersion, other characteristics are symmetry or lack thereof (skewness), and the presence of thick tails (kurtosis), i.e., a higher likelihood of extreme values.
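The binning behind a histogram can be sketched in a few lines. This is only an illustration: the ages and the bin width of 10 are invented for the example and do not come from any real data set.

```python
from collections import Counter

# Hypothetical ages, invented for illustration only.
ages = [23, 27, 31, 34, 35, 38, 42, 45, 51, 58]
BIN_WIDTH = 10  # assumed bin width

# Assign each observation to the lower edge of its bin, then count per bin.
histogram = Counter((age // BIN_WIDTH) * BIN_WIDTH for age in ages)

# Print a crude text histogram, one bar per bin.
for lower_edge in sorted(histogram):
    bar = "#" * histogram[lower_edge]
    print(f"{lower_edge}-{lower_edge + BIN_WIDTH - 1}: {bar}")
```

A different bin width would change the shape of the bars, which is one reason a histogram is a summary of the distribution rather than the distribution itself.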

skewness

lack of symmetry in the distribution of the data.

kurtosis

the presence of thick tails in the distribution of the data, i.e., a higher likelihood of extreme values.

range

An important characteristic of the distribution is the range of the data, i.e., the difference between the largest and the smallest value.

Gaussian distribution

Normal or Gaussian distribution, also referred to as the bell curve. This distribution is symmetric and has the additional property that the spread around the mean can be related to the proportion of observations. More specifically, about 95% of the observations that follow a normal distribution are within two standard deviations of the mean. The normal distribution is often used as the reference distribution for statistical inference.

Normal distribution

Normal or Gaussian distribution, also referred to as the bell curve. This distribution is symmetric and has the additional property that the spread around the mean can be related to the proportion of observations. More specifically, about 95% of the observations that follow a normal distribution are within two standard deviations of the mean. The normal distribution is often used as the reference distribution for statistical inference.
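The 95% figure is a rounding. A quick check with Python's standard-library `statistics.NormalDist` (the variable names are just for this sketch) shows that two standard deviations cover slightly more than 95%, while 1.96 standard deviations cover almost exactly 95%.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of observations within two standard deviations of the mean.
share_within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Share within 1.96 standard deviations, the value often used for 95%.
share_within_1_96_sd = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(round(share_within_two_sd, 4))   # 0.9545
print(round(share_within_1_96_sd, 4))  # 0.95
```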

Symmetric distribution

is one where an equal number of observations are below and above the mean (e.g., this is the case for the normal distribution).

Asymmetric distribution

is one where there are either more observations below the mean or more above the mean. An asymmetric distribution is also called skewed.

skewed to the right

when the bulk of the values are below the mean and a long tail extends toward large values. This tends to happen when the distribution is dominated by a few very large values (outliers). For example, this is often the case for housing values in a community, where a few multi-million dollar homes can pull the distribution to the right.

skewed to the left

is the opposite phenomenon, where small values (such as zero) pull the distribution to the left.

Central tendency

is a typical or representative value for the distribution of observed values. There are several ways to measure central tendency, including mean, median, and mode. The central tendency can be applied to the population as a whole, or to a sample from the population. In a descriptive sense, it can be applied to any collection of data. Typically, the terminology will make clear what the context is, i.e., a population mean or a sample average (mean).

mean

is the average of a distribution. It is computed by adding up the values and dividing by the number of observations. For example, the mean of [2, 3, 4, 5] is (2 + 3 + 4 + 5)/4, or 14/4 = 3.5. A weighted mean is used when greater importance is placed on specific entries or when representative values are used for groups of observations. For example, when computing a measure for the mean income among a number of counties, the value for each county could be multiplied by the number of people in the county and the total divided by the combined population, yielding a population-weighted mean. The mean is appropriate for interval and ratio scaled data, but not for ordinal or nominal data.
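As a small sketch of both calculations: the first part reproduces the [2, 3, 4, 5] example, and the second uses two hypothetical counties (incomes and populations invented for illustration) to show how population weights shift the result.

```python
from statistics import mean

# Simple mean of the example values: (2 + 3 + 4 + 5) / 4.
values = [2, 3, 4, 5]
simple_mean = mean(values)

# Hypothetical counties as (mean income, population) pairs.
counties = [(40_000, 1_000), (60_000, 3_000)]

# Population-weighted mean: weight each county's income by its population,
# then divide by the total population.
total_population = sum(population for _, population in counties)
weighted_mean = sum(income * population for income, population in counties) / total_population

# Unweighted mean treats both counties as equally important.
unweighted_mean = mean(income for income, _ in counties)

print(simple_mean)      # 3.5
print(weighted_mean)    # 55000.0
print(unweighted_mean)  # 50000
```

The weighted mean sits closer to 60,000 because the larger county carries three times the weight of the smaller one.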

median

is the middle value of a ranked distribution. The median of [2, 3, 4, 6, 7] is 4. When the number of observations is even, there is no exact middle, and typically the average of the two values just below and just above the middle is used. So, for [2, 3, 4, 5] the median would be (3 + 4)/2, or 3.5. The median is the only suitable measure of central tendency for ordinal data, but it can also be applied to interval and ratio scale data after they are converted to ranked values.
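Both cases from this card, odd and even numbers of observations, can be checked with the standard-library `statistics.median`; this is only a quick sketch of the examples already given.

```python
from statistics import median

# Odd number of observations: the middle ranked value.
odd_median = median([2, 3, 4, 6, 7])

# Even number of observations: average of the two middle values, (3 + 4) / 2.
even_median = median([2, 3, 4, 5])

print(odd_median)   # 4
print(even_median)  # 3.5
```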

mode

is the most frequent value in a distribution. There can be more than one mode for a distribution. For example, the modes of [1, 2, 3, 3, 5, 6, 7, 7] are 3 and 7. The mode is the only measure of central tendency that can be used for nominal data, but it can also be applied to interval and ratio scale data.
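A short sketch using the standard-library `statistics.multimode`, which returns every value tied for most frequent. The zoning labels in the second part are hypothetical and are there only to show that the mode works on nominal categories.

```python
from statistics import multimode

# Numeric example from the card: two values tie for most frequent.
numeric_modes = multimode([1, 2, 3, 3, 5, 6, 7, 7])

# Hypothetical nominal data: zoning classifications of five parcels.
zoning = ["residential", "commercial", "residential", "industrial", "residential"]
zoning_modes = multimode(zoning)

print(numeric_modes)  # [3, 7]
print(zoning_modes)   # ['residential']
```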

symmetry

The mean and median are affected by the symmetry of a distribution. For a symmetric distribution, they tend to be very close, but for skewed distributions, they tend to be different. Specifically, for a distribution that is skewed to the right (a tail of very large values), the mean will tend to be larger than the median, and in a distribution that is skewed to the left (a tail of very small values), the mean will tend to be smaller than the median. In both these cases, the median is typically the preferred measure of central tendency.
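A minimal illustration with hypothetical home values (in thousands of dollars, invented for this sketch): one very expensive home pulls the mean far above the median, while the median stays near the typical value.

```python
from statistics import mean, median

# Hypothetical home values in thousands of dollars; one multi-million dollar home.
home_values = [200, 220, 250, 260, 3000]

mean_value = mean(home_values)      # pulled upward by the 3000 outlier
median_value = median(home_values)  # middle ranked value, unaffected by the outlier

print(mean_value)    # 786
print(median_value)  # 250
```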

Basic Descriptive Statistics - Dispersion

An important characteristic of a distribution is how its values are spread around the central tendency.

The two most commonly used measures to assess this are the variance and the standard deviation.

standard deviation

based on the squared difference from the mean.

The standard deviation is the square root of the variance. As a result, the standard deviation is in the same units as the original variable and is therefore often preferred.

variance

based on the squared difference from the mean

The variance is the average squared deviation from the mean. A larger variance means a greater spread around the mean (flatter distribution), a smaller variance a narrower spread (a spikier distribution).

calculating variance and standard deviation for [1, 2, 3, 4, 5]

- First calculate the mean: (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3.
- The squared deviation for each observation is (1 - 3)^2 = 4, (2 - 3)^2 = 1, (3 - 3)^2 = 0, (4 - 3)^2 = 1, and (5 - 3)^2 = 4.
- The sum of these squared deviations is 4 + 1 + 0 + 1 + 4 = 10.
- The variance is this value divided by the number of observations, or 10/5 = 2.
- The standard deviation is the square root of the variance, or √2 ≈ 1.41.
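The steps above can be traced in code. This sketch follows the list literally, dividing by the number of observations n; the variable names are chosen for illustration.

```python
import math

values = [1, 2, 3, 4, 5]
n = len(values)

# Step 1: the mean.
mean_value = sum(values) / n

# Step 2: squared deviation of each observation from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Step 3: sum of the squared deviations.
sum_of_squares = sum(squared_deviations)

# Step 4: variance, dividing by the number of observations.
variance = sum_of_squares / n

# Step 5: standard deviation, the square root of the variance.
standard_deviation = math.sqrt(variance)

print(mean_value, squared_deviations, sum_of_squares, variance)
print(round(standard_deviation, 2))  # 1.41
```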

n

the number of observations in a data set

Dividing the sum of squared deviations by n is the correct expression for the population variance (its square root gives the population standard deviation), where the mean is assumed to be known.

degree of freedom correction

In practice, we work with samples, where the mean is estimated and not known. Because we have to compute the mean first, we subtract 1 from the number of observations and divide the sum of squared deviations by n - 1 instead of n.

In essence, because we already used the data once to compute the mean, we have to correct for that when we compute an estimate for the variance. As a result, the variance calculated with a degree of freedom correction n - 1 will be slightly larger than the one that uses n.
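Python's standard library exposes both versions, which makes the difference easy to see: `pvariance` and `pstdev` divide by n, while `variance` and `stdev` apply the n - 1 correction. A short sketch on the values [1, 2, 3, 4, 5]:

```python
from statistics import pstdev, pvariance, stdev, variance

values = [1, 2, 3, 4, 5]

# Population versions divide the sum of squared deviations (10) by n = 5.
population_variance = pvariance(values)
population_sd = pstdev(values)

# Sample versions divide by n - 1 = 4 (degree of freedom correction).
sample_variance = variance(values)
sample_sd = stdev(values)

print(population_variance, sample_variance)  # 2 2.5
print(round(population_sd, 2), round(sample_sd, 2))  # 1.41 1.58
```

The gap between the two shrinks as n grows, since dividing by n and by n - 1 becomes nearly the same thing.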

outliers

In a normal or Gaussian distribution, about 95% of the distribution is within two standard deviations below and above the mean. In practice, therefore, observations that lie outside this range are often referred to as outliers.
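A sketch of the two-standard-deviation rule of thumb, using hypothetical observations invented for the example. The sample standard deviation (n - 1) is used here as an assumption; the rule itself does not dictate which version to apply.

```python
from statistics import mean, stdev

# Hypothetical observations; the last value is deliberately extreme.
observations = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]

center = mean(observations)
spread = stdev(observations)  # sample standard deviation (divides by n - 1)

# Bounds two standard deviations below and above the mean.
lower_bound = center - 2 * spread
upper_bound = center + 2 * spread

# Observations outside the bounds are flagged as outliers.
outliers = [x for x in observations if x < lower_bound or x > upper_bound]

print(outliers)  # [40]
```

Note that an extreme value inflates both the mean and the standard deviation it is judged against, so this rule is a rough screen rather than a formal test.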

Coefficient of Variation

measures the relative dispersion from the mean by taking the standard deviation and dividing by the mean.
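Because it is a ratio, the coefficient of variation does not depend on the units of measurement. The sketch below uses a hypothetical helper, `coefficient_of_variation`, and invented income figures; it assumes the population standard deviation (dividing by n).

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form, dividing by n)."""
    return pstdev(values) / mean(values)


# Hypothetical incomes in thousands of dollars, and the same pattern ten times larger.
incomes_a = [30, 40, 50]
incomes_b = [300, 400, 500]

cv_a = coefficient_of_variation(incomes_a)
cv_b = coefficient_of_variation(incomes_b)

# The standard deviations differ by a factor of ten, but the relative spread is the same.
print(round(cv_a, 4), round(cv_b, 4))
```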

Which type of variables can be used with variance, standard deviation, and coefficient of variation calculations?

interval and ratio scaled variables

NOT ordinal or nominal variables